Quantization

Neural Networks

Reducing numerical precision to shrink models and speed up inference

What is Quantization?

Quantization converts weights and activations from 32-bit floating point to lower-precision formats, typically 16-bit floats (FP16/BF16) or 8-bit integers (INT8). Hardware with native low-precision support can then boost throughput and cut memory use.
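To make the idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. The function names (quantize_int8, dequantize) are illustrative, not from any particular library: each FP32 value is scaled onto the signed range [-127, 127], rounded, and later recovered approximately by multiplying back by the scale.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map FP32 weights onto the signed INT8 range [-127, 127]."""
    # One scale for the whole tensor; the epsilon guards against all-zero input.
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

The rounding step is where precision is lost; the error per value is bounded by half the scale, which is why tensors with large outliers quantize poorly under a single per-tensor scale.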

Real-World Examples

  • INT8 inference on CPUs/TPUs (see the sketch after this list)
  • Mixed-precision training using FP16
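As one concrete illustration of the first bullet, PyTorch's dynamic quantization API converts a model's Linear layers to INT8 for CPU inference; the toy model below is purely illustrative.

```python
import torch
import torch.nn as nn

# A small FP32 model standing in for a real network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Replace Linear layers with INT8 versions; activations are
# quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # same interface, smaller and faster on CPU
```

Dynamic quantization needs no calibration data because activation scales are computed at runtime, which makes it a common first step before trying static (calibrated) quantization.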