Quantization
Neural Networks
Reducing numerical precision to shrink models and speed inference
What is Quantization?
Quantization converts a model's weights and activations from 32-bit floating point to lower-precision formats, typically 16-bit floats or 8-bit integers. Hardware support for these narrower types boosts throughput and reduces memory use.
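A minimal sketch of the core idea, symmetric per-tensor INT8 quantization, in NumPy. The function names and the choice of the [-127, 127] clipping range are illustrative, not taken from any particular library:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map float32 values to int8."""
    # Scale so the largest magnitude maps to 127; guard against all-zero input.
    scale = max(np.abs(x).max() / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
print("max abs error:", np.abs(dequantize(q, scale) - weights).max())
```

The rounding step introduces a small approximation error, which is the trade-off quantization makes for a 4x reduction in storage (32-bit to 8-bit) and faster integer arithmetic.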
Real-World Examples
- INT8 inference on CPUs/TPUs (see the sketch after this list)
- Mixed precision training using FP16
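As a concrete example of INT8 inference on CPU, PyTorch provides dynamic quantization, which stores linear-layer weights as int8 and quantizes activations on the fly. A minimal sketch, assuming a recent PyTorch version where `quantize_dynamic` lives under `torch.ao.quantization`:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# A toy float32 model; any model built from nn.Linear layers works the same way.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Convert Linear weights to int8; activations are quantized dynamically at runtime.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, which makes it a common first step before trying static or quantization-aware approaches.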