Resource Efficient Deep Learning

Daniel Soudry
BIU Engineering Building 1103, Room 329
Columbia University, NY, USA

Background: The recent success of deep neural networks (DNNs) relies on large computational resources (memory, energy, area and processing power). These resource requirements are a major bottleneck in our ability to train better models and to use these models on low-power devices (e.g., mobile phones). However, current-generation DNNs seem tremendously wasteful, especially in comparison to the brain (which consumes only 12 W). For example, typical DNNs use 32-bit floating-point operations, while the brain typically operates using binary spikes and with limited synaptic precision. Achieving such low precision in DNNs can significantly reduce memory use and energy consumption and increase speed. However, until recently, 8 bits appeared to be the practical lower limit.

Results: We show that it is possible to quantize the activations and weights of DNNs aggressively (even down to 1 bit) when they are trained by a variant of the backpropagation algorithm, while preserving good performance on various benchmarks (e.g., MNIST, ImageNet). Interestingly, the algorithm originated from first principles: we developed a closed-form analytical approximation to the Bayes rule update of the posterior distribution of the binary DNN weights. At run-time, such a binarized DNN requires 32-fold less memory, is 7 times faster (using dedicated GPU kernels), and is at least 10-fold more energy efficient (using dedicated hardware). This can enable the use of trained DNNs in low-power devices. Additional benefits are expected at train-time from further quantizing the gradients, potentially allowing larger and more sophisticated models to be trained.
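
To make the run-time claims concrete, here is a minimal sketch in plain Python of how 1-bit weights and activations can be stored and multiplied. It is not the training algorithm described in the talk. It assumes sign-based binarization to +1/-1 and a dot product computed with XNOR and a bit count, which is a common way to exploit binary values. The function names (`binarize`, `pack_bits`, `binary_dot`) and the example numbers are hypothetical, chosen only for illustration.

```python
def binarize(values):
    """Map each real value to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(packed_a, packed_b, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and a bit count."""
    mask = (1 << n) - 1
    agree = ~(packed_a ^ packed_b) & mask  # bit is set where the two signs agree
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


# Hypothetical full-precision values, used only to illustrate the idea.
weights = [0.7, -1.2, 0.05, -0.3, 2.1, -0.9, 0.4, -0.1]
activations = [1.5, 0.2, -0.6, -2.0, 0.3, 0.8, -0.4, -1.1]

w_bin = binarize(weights)
a_bin = binarize(activations)

# Reference result using ordinary multiply-accumulate on the +1/-1 values.
reference = sum(w * a for w, a in zip(w_bin, a_bin))

# Same result using only bitwise operations on the packed representation.
fast = binary_dot(pack_bits(w_bin), pack_bits(a_bin), len(w_bin))

# Storage per value: 32 bits for a float32 versus 1 bit after binarization.
memory_ratio = 32 // 1
```

In this sketch, `fast` equals `reference`, and the 32-to-1 storage ratio corresponds to the 32-fold memory reduction quoted above. The speed and energy figures depend on dedicated kernels and hardware and are not reproduced by this toy example.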