Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, it can also cause a considerable drop in accuracy when applied to complex tasks or lightweight DNNs. The proposed procedure is called DNN Quantization with Attention (DQA). In experiments, it outperforms other low-bit quantization techniques on various object recognition benchmarks such as CIFAR10 and ImageNet ILSVRC 2012, achieves almost the same accuracy as a full-precision DNN, and reduces the accuracy drop when quantizing lightweight DNN architectures.

Author(s) : Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux

Keywords : quantization - accuracy - DNN - lightweight - attention
