# Accelerate Convolutional Neural Networks
This tool aims to accelerate the test-time computation and decrease the number of parameters of deep CNNs.
## How to use
Use ``accnn.py`` to get a new model by specifying an original model and the speed-up ratio.
You may provide a JSON file to explicitly control the architecture of the new model; otherwise the rank-selection algorithm is used to determine it automatically and the resulting configuration is saved to ``config.json``.
``acc_conv.py`` and ``acc_fc.py`` are invoked automatically when using ``accnn.py``, but they can also be used separately.
## Example
### Speed up the whole network
- Speed up a model by a factor of 2 and use ``rank-selection`` to determine the rank of each layer automatically:
```bash
python accnn.py -m MODEL-PREFIX --save-model new-vgg16 --ratio 2
```
- Use your own configuration file without ``rank-selection``:
```bash
python accnn.py -m MODEL-PREFIX --save-model new-model --config YOUR-CONFIG_JSON
```
### Speed up a single layer
- Decompose a convolutional layer:
```bash
python acc_conv.py -m MODEL-PREFIX --layer LAYER-NAME --K NUM-FILTER --save-model new-model
```
- Decompose a fully-connected layer:
```bash
python acc_fc.py -m MODEL-PREFIX --layer LAYER-NAME --K NUM-HIDDEN --save-model new-model
```
- Use ``--help`` to see more options.
## Results
The experiments are carried out on a single machine with four Nvidia Titan X GPUs. Top-5 accuracy is evaluated on the ImageNet validation dataset. A rough way to reproduce the wall-clock comparison on your own hardware is sketched after the model list below.
| Model  | Top-5 accuracy | Theoretical speed-up | CPU speed-up | GPU speed-up |
| ------ | -------------: | -------------------: | -----------: | -----------: |
| model0 | 89.6%          | 1x                   | 1x           | 1x           |
| model1 | 88.6%          | 2.4x                 | 2.2x         | 1.1x         |
| model2 | 89.8%          | 2.4x                 | 2.2x         | 1.1x         |
| model3 | 87.5%          | 3x                   | 2.6x         | 1.2x         |
| model4 | 89.6%          | 3x                   | 2.6x         | 1.2x         |
* ``model0`` is the original VGG16 model converted directly from the Caffe Model Zoo.
* ``model1`` is the accelerated model built from ``config.json``.
* ``model2`` is the same as ``model1``, fine-tuned on the ImageNet training dataset for 5 epochs.
* ``model3`` is the accelerated model produced by rank-selection with a 3x speed-up target.
* ``model4`` is the same as ``model3``, fine-tuned on the ImageNet training dataset for 5 epochs.
* The GPU experiments are carried out with cuDNN 4.
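To get a rough idea of the wall-clock comparison on your own hardware, you can time a forward pass of the original and the accelerated checkpoints. The snippet below is only a minimal sketch: it assumes MXNet's Module API, an input variable named ``data`` of shape ``(1, 3, 224, 224)``, and hypothetical checkpoint prefixes and epoch numbers; it is not the script used to produce the table above.

```python
import time
import mxnet as mx

def time_forward(prefix, epoch, ctx=mx.cpu(), shape=(1, 3, 224, 224), runs=50):
    """Load a saved checkpoint and return the average forward-pass time in seconds."""
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    mod = mx.mod.Module(symbol=sym, label_names=None, context=ctx)
    mod.bind(data_shapes=[('data', shape)], for_training=False)
    mod.set_params(arg_params, aux_params)
    batch = mx.io.DataBatch(data=[mx.nd.ones(shape)], label=None)
    mod.forward(batch, is_train=False)              # warm-up pass
    mod.get_outputs()[0].wait_to_read()
    start = time.time()
    for _ in range(runs):
        mod.forward(batch, is_train=False)
        mod.get_outputs()[0].wait_to_read()         # force MXNet's lazy evaluation to finish
    return (time.time() - start) / runs

# Hypothetical prefixes/epochs: the original model vs. the output of accnn.py.
t_orig = time_forward('vgg16', 0)
t_fast = time_forward('new-vgg16', 0)
print('CPU speed up: %.2fx' % (t_orig / t_fast))
```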
## Notes
* This tool has been verified on the [VGG-16](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395) model converted from Caffe by the ``caffe_converter`` tool.
* The ``accnn.py`` tool only supports networks with a single input and a single output.
* This tool mainly implements the algorithm of Tai *et al.* [2], which decomposes a convolutional layer into two convolutional layers, factorizing the spatial dimensions as well as the channels. ``acc_conv.py`` provides the function that replaces an ``(N,d,d)`` conv. layer with a ``(K,d,1)`` conv. layer followed by an ``(N,1,d)`` conv. layer (see the first sketch below).
* The ``rank-selection`` tool is based on the related work of Zhang *et al.* [1]: the product of the per-layer PCA energies is used to determine the rank of each layer (see the second sketch below).
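The channel/spatial factorization used by ``acc_conv.py`` can be pictured as a truncated SVD of the original weight tensor. The following is a minimal NumPy sketch of that idea, assuming a square ``(N, C, d, d)`` weight array; the function name ``decompose_conv`` is made up for illustration and this is not the tool's actual implementation.

```python
import numpy as np

def decompose_conv(W, K):
    """Approximate an (N, C, d, d) conv weight by a vertical (K, C, d, 1)
    filter bank followed by a horizontal (N, K, 1, d) filter bank,
    using a rank-K truncated SVD (the scheme of Tai et al. [2])."""
    N, C, d, d2 = W.shape
    assert d == d2, "this sketch assumes square kernels"
    # Rearrange W[n, c, i, j] into a (C*d) x (N*d) matrix M[c*d + i, n*d + j].
    M = W.transpose(1, 2, 0, 3).reshape(C * d, N * d)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep the top-K singular values and split them evenly between the two factors.
    U_k = U[:, :K] * np.sqrt(S[:K])
    V_k = Vt[:K, :] * np.sqrt(S[:K])[:, None]
    W_vertical = U_k.reshape(C, d, K).transpose(2, 0, 1)[..., None]        # (K, C, d, 1)
    W_horizontal = V_k.reshape(K, N, d).transpose(1, 0, 2)[:, :, None, :]  # (N, K, 1, d)
    return W_vertical, W_horizontal

# Quick check: composing the two filter banks reconstructs the original kernels up to rank K.
W = np.random.randn(64, 3, 3, 3)
Wv, Wh = decompose_conv(W, K=5)
W_hat = np.einsum('kcio,nkoj->ncij', Wv, Wh)
print('relative error:', np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```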
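The rank-selection idea can likewise be approximated with a short greedy search. The sketch below is purely illustrative and simplifies aggressively: it assumes each layer's cost is linear in its rank and measures PCA energy on the weights; the function names and the ``costs`` input are hypothetical and do not reflect how the tool actually chooses ranks.

```python
import numpy as np

def pca_energy(W):
    """Cumulative, normalized PCA energy: entry k is the fraction of the total
    squared singular values captured by the top k+1 components."""
    s = np.linalg.svd(W.reshape(W.shape[0], -1), compute_uv=False)
    e = np.cumsum(s ** 2)
    return e / e[-1]

def select_ranks(weights, costs, target_ratio):
    """Greedy rank selection in the spirit of Zhang et al. [1]: repeatedly lower
    the rank of the layer whose PCA-energy loss per unit of saved computation is
    smallest, until the estimated speed-up reaches target_ratio."""
    energy = {name: pca_energy(w) for name, w in weights.items()}
    rank = {name: len(e) for name, e in energy.items()}        # start from full rank
    full_cost = sum(costs[n] * rank[n] for n in rank)

    def penalty(name):
        r = rank[name]
        if r <= 1:
            return np.inf                                      # cannot go lower
        # Energy lost by dropping from rank r to r-1, per unit of cost saved.
        return (energy[name][r - 1] - energy[name][r - 2]) / costs[name]

    while full_cost / sum(costs[n] * rank[n] for n in rank) < target_ratio:
        best = min(rank, key=penalty)
        if penalty(best) == np.inf:
            break                                              # every layer is at rank 1
        rank[best] -= 1
    return rank

# Toy usage with made-up layer names and per-rank costs.
ranks = select_ranks(
    weights={'conv1': np.random.randn(64, 3, 3, 3), 'conv2': np.random.randn(128, 64, 3, 3)},
    costs={'conv1': 1.0, 'conv2': 4.0},
    target_ratio=2.0)
print(ranks)
```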
## References
[1] Zhang, Xiangyu, et al. "Efficient and accurate approximations of nonlinear convolutional networks." arXiv preprint arXiv:1411.4229 (2014).

[2] Tai, Cheng, et al. "Convolutional neural networks with low-rank regularization." arXiv preprint arXiv:1511.06067 (2015).