# SSD: Single Shot MultiBox Object Detector
SSD is a unified framework for object detection with a single network.
You can use this code to train, evaluate, and test models for the object detection task.
### Disclaimer
This is a re-implementation of the original SSD, which is based on Caffe. The official
repository is available [here](https://github.com/weiliu89/caffe/tree/ssd).
The arXiv paper is available [here](http://arxiv.org/abs/1512.02325).
This example is intended to reproduce the detector while fully utilizing the
strengths of MXNet.
* The model is fully compatible with the Caffe version.
* The prediction results are almost identical to the original version. However, due to a different non-maximum suppression implementation, the results might differ slightly.
Due to permission issues, this example is maintained separately in this [repository](https://github.com/zhreshold/mxnet-ssd). Please use that repository's [issue tracker](https://github.com/zhreshold/mxnet-ssd/issues) for questions specific to this example.
### Demo results
![demo1](https://cloud.githubusercontent.com/assets/3307514/19171057/8e1a0cc4-8be0-11e6-9d8f-088c25353b40.png)
![demo2](https://cloud.githubusercontent.com/assets/3307514/19171063/91ec2792-8be0-11e6-983c-773bd6868fa8.png)
![demo3](https://cloud.githubusercontent.com/assets/3307514/19171086/a9346842-8be0-11e6-8011-c17716b22ad3.png)
### mAP
| Model | Training data | Test data | mAP |
|:-----------------:|:----------------:|:---------:|:----:|
| VGG16_reduced 300x300 | VOC07+12 trainval| VOC07 test| 71.57|
### Speed
| Model | GPU | CUDNN | Batch-size | FPS* |
|:---------------------:|:----------------:|:-----:|:----------:|:----:|
| VGG16_reduced 300x300 | TITAN X(Maxwell) | v5.1 | 16 | 95 |
| VGG16_reduced 300x300 | TITAN X(Maxwell) | v5.1 | 8 | 95 |
| VGG16_reduced 300x300 | TITAN X(Maxwell) | v5.1 | 1 | 64 |
| VGG16_reduced 300x300 | TITAN X(Maxwell) | N/A | 8 | 36 |
| VGG16_reduced 300x300 | TITAN X(Maxwell) | N/A | 1 | 28 |
- *Forward time only, data loading and drawing excluded.*
### Getting started
* You will need the Python modules `easydict`, `cv2`, `matplotlib` and `numpy`.
You can install them via pip or package managers such as `apt-get`:
```
sudo apt-get install python-opencv python-matplotlib python-numpy
sudo pip install easydict
```
* Build MXNet: follow the official instructions, and make sure the extra operators for this example are enabled
```
# for Ubuntu/Debian
cp make/config.mk ./config.mk
# edit config.mk with vim or any editor and set
EXTRA_OPERATORS = example/ssd/operator
# or append a line if you already have other EXTRA_OPERATORS directories
EXTRA_OPERATORS += example/ssd/operator
```
Remember to enable CUDA if you want to train on the GPU, since CPU training is
extremely slow. Using CUDNN is optional, but it speeds up training and inference considerably.
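For reference, a minimal sketch of the relevant `config.mk` settings for a GPU build; the CUDA path is an assumption and depends on your installation:
```
# enable CUDA for GPU training (path is an assumption; adjust to your install)
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
# optional: enable CUDNN for faster training and inference
USE_CUDNN = 1
# extra operators required by this example
EXTRA_OPERATORS = example/ssd/operator
```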
### Try the demo
* Download the pretrained model: [`ssd_300.zip`](https://dl.dropboxusercontent.com/u/39265872/ssd_300_vgg16_reduced_voc0712_trainval.zip), and extract it to the `model/` directory. (This model is converted from VGG_VOC0712_SSD_300x300_iter_60000.caffemodel provided by the paper's authors.)
* Run
```
# cd /path/to/mxnet/example/ssd/
# grab demo images
python data/demo/download_demo_images.py
# run demo.py with defaults
python demo.py
# play with examples:
python demo.py --epoch 0 --images ./data/demo/dog.jpg --thresh 0.5
```
* Check `python demo.py --help` for more options.
### Train the model
This example only covers training on the Pascal VOC dataset. Other datasets can
be supported by adding a subclass derived from the `Imdb` class in `dataset/imdb.py`.
See `dataset/pascal_voc.py` for a reference implementation, and the sketch below for the general shape of such a subclass.
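A minimal sketch of a custom dataset wrapper. The method and attribute names follow the pattern used in `dataset/pascal_voc.py`, but they are assumptions here; check `dataset/imdb.py` for the actual interface the training code expects:
```python
# sketch only: names/signatures are assumptions, verify against dataset/imdb.py
import os
import numpy as np
from dataset.imdb import Imdb

class MyDataset(Imdb):
    """Hypothetical dataset that reads images and plain-text labels from root_path."""

    def __init__(self, image_set, root_path):
        super(MyDataset, self).__init__('my_dataset_' + image_set)
        self.root_path = root_path
        self.classes = ['cat', 'dog']              # your label names
        self.num_classes = len(self.classes)
        # one image id per line, e.g. in root_path/train.txt
        with open(os.path.join(root_path, image_set + '.txt')) as f:
            self.image_set_index = [line.strip() for line in f]
        self.num_images = len(self.image_set_index)

    def image_path_from_index(self, index):
        # full path of the image for the given index
        return os.path.join(self.root_path, 'images', self.image_set_index[index] + '.jpg')

    def label_from_index(self, index):
        # one row per object: [class_id, xmin, ymin, xmax, ymax], coordinates normalized to [0, 1]
        label_file = os.path.join(self.root_path, 'labels', self.image_set_index[index] + '.txt')
        return np.loadtxt(label_file).reshape(-1, 5)
```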
* Download the converted pretrained `vgg16_reduced` model [here](https://dl.dropboxusercontent.com/u/39265872/vgg16_reduced.zip), and unzip the `.param` and `.json` files
into the `model/` directory (the default location).
* Download the PASCAL VOC dataset, skip this step if you already have one.
```
cd /path/to/where_you_store_datasets/
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
tar -xvf VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar
```
* We are going to use the `trainval` sets of VOC2007 and VOC2012, which is the common strategy.
The suggested directory structure is to store `VOC2007` and `VOC2012` directories
in the same `VOCdevkit` folder.
* Then link the `VOCdevkit` folder to `data/VOCdevkit` (the default location):
```
ln -s /path/to/VOCdevkit /path/to/mxnet/example/ssd/data/VOCdevkit
```
Using a symbolic link instead of copying the data saves a bit of disk space.
* Start training:
```
# cd /path/to/mxnet/example/ssd
python train.py
```
* By default, this example uses `batch-size=32` and `learning_rate=0.001`.
You might need to adjust these parameters if your hardware configuration differs.
Check `python train.py --help` for more training options. For example, if you have 4 GPUs, use:
```
# note that a perfect training parameter set is yet to be discovered for multi-GPUs
python train.py --gpus 0,1,2,3 --batch-size 128 --lr 0.0005
```
* Memory usage: MXNet is very memory efficient; training the `VGG16_reduced` model with `batch-size` 32 takes around 4684MB without CUDNN.
* Initial learning rate: 0.001 is fine for a single GPU. You can also use 0.0001 for the first couple of epochs and then go back to 0.001 by resuming training with the `--resume` parameter (see the sketch below).
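A sketch of that two-stage schedule. The `--end-epoch` flag and the epoch number passed to `--resume` are assumptions; check `python train.py --help` for the exact options your version supports:
```
# cd /path/to/mxnet/example/ssd
# warm up with a small learning rate for a couple of epochs
python train.py --lr 0.0001 --end-epoch 2
# then resume from the saved checkpoint with the normal learning rate
python train.py --lr 0.001 --resume 2
```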
### Evaluate the trained model
Again, currently only evaluation on PASCAL VOC is supported.
Use:
```
# cd /path/to/mxnet/example/ssd
python evaluate.py --gpus 0,1 --batch-size 128 --epoch 0
```
### Convert model to deploy mode
This simply removes all loss layers and attaches a layer that merges detection results and applies non-maximum suppression.
This is useful when the Python symbol definition is not available at load time, since the deployed network can be restored directly from the saved `.json` and `.params` files (see the loading sketch after the command below).
```
# cd /path/to/mxnet/example/ssd
python deploy.py --num-class 20
```
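For illustration, a minimal sketch of loading a deployed model purely from the saved checkpoint files. The prefix `model/deploy_ssd_300` and epoch `0` are assumptions and depend on how you named and saved the model; the output row layout in the last comment should be verified against `demo.py`:
```python
# sketch only: checkpoint prefix/epoch are assumptions, adjust to your files
import mxnet as mx

prefix, epoch = 'model/deploy_ssd_300', 0
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

# bind the deployed symbol for single-image 300x300 inference on CPU
mod = mx.mod.Module(sym, data_names=['data'], label_names=None, context=mx.cpu())
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 300, 300))])
mod.set_params(arg_params, aux_params, allow_missing=False)

# forward pass on a dummy image; real usage would load and preprocess an image first
batch = mx.io.DataBatch([mx.nd.zeros((1, 3, 300, 300))])
mod.forward(batch)
detections = mod.get_outputs()[0].asnumpy()
# detections[0] is expected to hold rows of [class_id, score, xmin, ymin, xmax, ymax]
print(detections.shape)
```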