Model Sharding
Performance
Splitting a model across devices for parallelism
What is Model Sharding?
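Model sharding partitions a model's weights across multiple GPUs so that models too large to fit on a single device can be trained or served. Two common forms are pipeline parallelism, which places different layers on different devices, and tensor parallelism, which splits individual weight matrices across devices.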
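As a rough illustration of the tensor-parallel case, the sketch below simulates sharding one linear layer on a single machine with NumPy. The "devices" are just list entries, and the helper names (shard_columns, sharded_forward) are hypothetical, not taken from any library. It only shows that multiplying by column shards and concatenating the results reproduces the unsharded output.

```python
import numpy as np

def shard_columns(weight, num_devices):
    """Split a weight matrix column-wise into one shard per simulated device."""
    # np.split requires the column count to divide evenly by num_devices.
    return np.split(weight, num_devices, axis=1)

def sharded_forward(x, shards):
    """Each simulated device multiplies the full input by its own shard.

    Concatenating the partial outputs stands in for the all-gather that a
    real multi-GPU setup would perform.
    """
    partial_outputs = [x @ shard for shard in shards]
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of 4, hidden size 8
weight = rng.standard_normal((8, 16))  # full weight; real devices would each hold only a shard

shards = shard_columns(weight, num_devices=4)  # four shards of shape (8, 4)
y_sharded = sharded_forward(x, shards)
y_full = x @ weight                            # unsharded reference result

print(np.allclose(y_sharded, y_full))
```

Each shard here holds a quarter of the layer's parameters, which is the memory saving that motivates sharding; the cost in a real system is the communication needed to combine the partial outputs.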
Real-World Examples
- Megatron-LM tensor parallelism
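The split usually described for Megatron-style MLP blocks shards the first weight matrix by columns and the second by rows, so that only one summation across devices is needed per forward pass. The NumPy simulation below is a minimal sketch of that idea under stated assumptions: it is not Megatron-LM code, the function names are hypothetical, ReLU is used in place of GeLU for brevity, and a plain sum stands in for the all-reduce.

```python
import numpy as np

def relu(a):
    """Elementwise nonlinearity (stand-in for GeLU in this sketch)."""
    return np.maximum(a, 0.0)

def mlp_reference(x, w1, w2):
    """Unsharded two-layer MLP used as the ground truth."""
    return relu(x @ w1) @ w2

def mlp_tensor_parallel(x, w1, w2, num_devices):
    """Simulate a column-then-row sharded MLP on one machine."""
    w1_shards = np.split(w1, num_devices, axis=1)  # column shards of the first layer
    w2_shards = np.split(w2, num_devices, axis=0)  # matching row shards of the second layer
    # Each simulated device computes its slice of the hidden activations and
    # its partial contribution to the output, with no communication in between.
    partials = [relu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]
    # Summing the partial outputs stands in for an all-reduce across devices.
    return np.sum(partials, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # batch of 4, model width 8
w1 = rng.standard_normal((8, 32))   # expands to hidden width 32
w2 = rng.standard_normal((32, 8))   # projects back to model width 8

y_parallel = mlp_tensor_parallel(x, w1, w2, num_devices=4)
print(np.allclose(y_parallel, mlp_reference(x, w1, w2)))
```

The column-then-row ordering works because the nonlinearity is elementwise: each device can apply it to its own slice of the hidden activations without seeing the other slices.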