Lower Numerical Precision Deep Learning Inference and Training

Most commercial deep learning applications today use 32-bit floating point precision (fp32) for training and inference workloads. Various researchers have demonstrated that both deep learning training and inference can be performed with lower numerical precision: 16-bit multipliers for training and 8-bit or narrower multipliers for inference, with minimal or no loss in accuracy (higher precision, 16 bits rather than 8 bits, is usually needed during training to accurately represent the gradients in the backpropagation phase). These lower numerical precisions, training with 16-bit multipliers accumulated to 32 bits or more and inference with 8-bit multipliers accumulated to 32 bits, will likely become the standard over the next year, in particular for convolutional neural networks (CNNs)…
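To make the arithmetic concrete, below is a minimal sketch (not taken from the whitepaper) of the 8-bit inference scheme described above: fp32 tensors are quantized to int8, the 8-bit products are accumulated in 32-bit integers, and the result is rescaled back to fp32. The symmetric per-tensor scaling, layer shapes, and random data are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x):
    """Map an fp32 tensor to int8 with a symmetric per-tensor scale (illustrative choice)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Random fp32 activations and weights stand in for one fully connected layer.
rng = np.random.default_rng(0)
a_fp32 = rng.standard_normal((4, 64)).astype(np.float32)
w_fp32 = rng.standard_normal((64, 16)).astype(np.float32)

a_q, a_scale = quantize_symmetric(a_fp32)
w_q, w_scale = quantize_symmetric(w_fp32)

# 8-bit multiplies accumulated into 32-bit integers (the int32 cast emulates
# the wider accumulator), then dequantized back to fp32.
acc_int32 = a_q.astype(np.int32) @ w_q.astype(np.int32)
y_quantized = acc_int32.astype(np.float32) * (a_scale * w_scale)

# Reference fp32 result for comparison.
y_fp32 = a_fp32 @ w_fp32
print("max abs error vs fp32:", np.abs(y_quantized - y_fp32).max())
```

The wider 32-bit accumulator is the key detail: individual products fit in 16 bits, but summing many of them would overflow 8 or 16 bits, so accumulation is done at higher precision before rescaling.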
