Quantisation and Pruning for Neural Network Compression and Regularisation
Deep neural networks are typically too computationally expensive to run in real time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks through network pruning and quantisation. We examine their efficacy on large networks such as AlexNet and compare this against recent compact architectures: ShuffleNet and MobileNet. Our results show that pruning and quantisation compress these networks to less than half their original size and improve their efficiency, most notably yielding a 7x speedup on MobileNet. We also demonstrate that pruning, in addition to reducing the number of parameters in a network, can help correct overfitting.
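To make the two techniques named above concrete, the sketch below illustrates unstructured magnitude pruning and uniform quantisation on a single weight matrix. It is a minimal illustration only, not the paper's exact procedure; the `sparsity` and `num_bits` settings, the function names, and the use of NumPy are all assumptions for exposition.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` of the entries are pruned (unstructured pruning).
    Illustrative only; not the paper's exact pruning schedule."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask

def uniform_quantise(weights, num_bits=8):
    """Linearly map weights onto 2**num_bits levels, then map back,
    simulating the precision loss of low-bit storage."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (2 ** num_bits - 1)
    q = np.round((weights - w_min) / scale)
    return q * scale + w_min

# Example: compress a random weight matrix with both techniques.
w = np.random.randn(256, 256).astype(np.float32)
w_compressed = uniform_quantise(magnitude_prune(w, sparsity=0.5), num_bits=8)
```

Pruning shrinks storage because the zeroed weights can be kept in a sparse format, while quantisation shrinks it by storing each surviving weight in fewer bits; the two compose naturally, which is why they are commonly applied together.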