High-Fidelity Conversion of Floating-Point Networks for Low-Precision Inference using Distillation

A major challenge for fast, low-power inference on neural network accelerators is the size of the models. There has been a trend in recent years towards deeper and larger networks, which makes this challenge more acute.