
Model Sharding

Performance

Splitting a model across multiple devices for parallelism

What is Model Sharding?

Model sharding partitions a model's weights across multiple GPUs, typically via pipeline parallelism (placing different layers on different devices) or tensor parallelism (splitting individual weight matrices within a layer), so that models too large to fit on a single device can still be trained or served.
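
Below is a minimal sketch of the tensor-parallel idea for a single linear layer. It is illustrative only: the sizes are toy values, everything runs in one process on CPU, and a plain concatenation stands in for the all-gather collective that a real multi-GPU setup (such as Megatron-LM) would use.

```python
# Minimal sketch of column-wise tensor parallelism for one linear layer.
# Toy sizes; a real setup places each shard on a different GPU and uses
# collective ops (e.g. all-gather) instead of a local concat.
import torch

hidden, out_features, world_size = 8, 16, 2   # hypothetical sizes

# Full weight of a linear layer: (out_features, hidden)
full_weight = torch.randn(out_features, hidden)

# Column-parallel split: each "device" holds a slice of the output dimension.
shards = torch.chunk(full_weight, world_size, dim=0)

x = torch.randn(4, hidden)                    # a batch of activations

# Each shard computes its partial output independently...
partial_outputs = [x @ w.T for w in shards]

# ...and gathering the partials (here: a plain concat) rebuilds the result.
y_sharded = torch.cat(partial_outputs, dim=-1)

# Matches the unsharded computation.
assert torch.allclose(y_sharded, x @ full_weight.T, atol=1e-6)
```

Because each shard's partial output is independent, the per-device memory for this layer drops roughly by a factor of the number of shards, at the cost of a communication step to reassemble the full activation.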

Real-World Examples

  • Megatron-LM tensor parallelism