A White Paper on Neural Network Quantization
A Survey of Model Compression and Acceleration for Deep Neural Networks
A Survey of Quantization Methods for Efficient Neural Network Inference
Convolutional Neural Networks with Low-Rank Regularization
Coordinating Filters for Faster Deep Neural Networks
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
Distilling the Knowledge in a Neural Network
FitNets: Hints for Thin Deep Nets
Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Hardware-Oriented Approximation of Convolutional Neural Networks
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
Learning Both Weights and Connections for Efficient Neural Networks
Learning Efficient Convolutional Networks through Network Slimming
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Pruning Filters for Efficient ConvNets
Pruning Convolutional Neural Networks for Resource Efficient Inference
Quantized Convolutional Neural Networks for Mobile Devices
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
Towards Lightweight Convolutional Neural Networks for Object Detection
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Xception: Deep Learning with Depthwise Separable Convolutions