Knowledge Distillation
Neural Networks
Training a small model (the student) from a large model (the teacher)
What is Knowledge Distillation?
The student model is trained to match the teacher's softened (temperature-scaled) output probabilities, reaching comparable performance in a much smaller model suited to edge deployment and faster inference.
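A minimal sketch of a distillation loss in PyTorch, assuming hypothetical student/teacher logits; the temperature `T` and mixing weight `alpha` are illustrative values, not from the source.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the soft-target loss (teacher's softened outputs) with the hard-label loss."""
    # Soften both distributions with temperature T and compare them with KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude is comparable to the hard loss
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Illustrative usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, num_classes = 8, 10
    teacher_logits = torch.randn(batch, num_classes)  # frozen teacher's predictions
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    labels = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()  # gradients flow only into the student
    print(loss.item())
```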
Real-World Examples
- Compressing BERT to DistilBERT
Related Terms