There exist good implementations of Faster R-CNN, yet they lack support for recent ConvNet architectures. The aim of reproducing it from scratch is to fully utilize MXNet's engine and parallelization for object detection.
| Indicator | py-faster-rcnn (Caffe) | mx-rcnn (this reproduction) |
| :-------- | :--------------------- | :-------------------------- |
| Speed | 2.5 img/s training, 5 img/s testing | 3.8 img/s training, 12.5 img/s testing |
| Performance | mAP 73.2 | mAP 75.97 |
| Efficiency | 11 GB for Fast R-CNN | 4.6 GB for Fast R-CNN |
| Parallelization | None | 3.8 img/s to 6 img/s on 2 GPUs |
| Extensibility | Old framework and base networks | ResNet |
Notes on the comparison:
* Speed: measured on Ubuntu 14.04.5 with a Titan X GPU and cuDNN enabled; the experiment is VGG-16 end-to-end training.
* Performance: VGG network, trained end-to-end on VOC07trainval+12trainval and tested on VOC07test.
* Efficiency: VGG network; Fast R-CNN is the most memory-expensive stage.
* Parallelization: VGG network (parallelization is limited by bandwidth); ResNet-101 speeds up from 2 img/s to 3.5 img/s.
* Extensibility: py-faster-rcnn does not support ResNet or recent Caffe versions.
| Method | Network | Training Data | Testing Data | Reference | Result |
| :----- | :------ | :------------ | :----------- | :-------- | :----- |
| Faster R-CNN alternate | VGG16 | VOC07 | VOC07test | 69.9 | 69.62 |
| Faster R-CNN end-to-end | VGG16 | VOC07 | VOC07test | 69.9 | 70.23 |
| Faster R-CNN end-to-end | VGG16 | VOC07+12 | VOC07test | 73.2 | 75.97 |
| Faster R-CNN end-to-end | ResNet-101 | VOC07+12 | VOC07test | 76.4 | 79.35 |
| Faster R-CNN end-to-end | VGG16 | COCO train | COCO val | 21.2 | 22.8 |
| Faster R-CNN end-to-end | ResNet-101 | COCO train | COCO val | 27.2 | 26.1 |
The above experiments were conducted with mx-rcnn using an MXNet fork based on the MXNet 0.9.1 nnvm pre-release.
`bash script/vgg_voc07.sh 0,1` (use GPUs 0 and 1)
`bash script/additional_deps.sh` will do the following for you.
`HOME` represents the directory where this file is located. All commands, unless stated otherwise, should be started from `HOME`.
Install additional Python packages: `cython easydict matplotlib scikit-image`.
Run `import mxnet` to confirm the installation.
Command line arguments have the same meaning as in `mxnet/example/image-classification`.
`prefix` refers to the first part of a saved model file name and `epoch` refers to the number in that file name; in `model/final-0000.params`, the prefix is `model/final` and the epoch is 0.
`begin_epoch` means the start of your training process, which will apply to all saved checkpoints.
Use `--prefix final --epoch 0` to access it.
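Concretely, MXNet assembles a checkpoint file name from `prefix` and a zero-padded `epoch`. A minimal sketch of that naming convention (the helper function is illustrative, not part of the repository):

```python
def checkpoint_name(prefix, epoch):
    """MXNet-style checkpoint file name: <prefix>-<epoch padded to 4 digits>.params."""
    return "%s-%04d.params" % (prefix, epoch)

print(checkpoint_name("model/final", 0))  # model/final-0000.params
```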
Run `python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 0 --vis`. Drop `--vis` if you do not have a display or want to save the result to a file instead.
The following tutorial is based on VOC data and the VGG network. Supply `--network resnet` and `--dataset coco` to use other networks and datasets. Refer to `script/vgg_voc07.sh` and the other experiment scripts for examples.
`bash script/get_voc.sh` and `bash script/get_coco.sh` will do the following for you.
The `data` folder will be used to place the training data folders.
`coco/images` and annotation JSONs to
(Skip this if not interested.) All datasets have three attributes:
`2007_trainval` or something similar.
`rpn_data` will be stored.
`dataset_path` could be something like `data/VOCdevkit`, where images, annotations, and results are stored so that many copies of datasets can be linked to the same physical location.
`bash script/get_pretrained_model.sh` will do this for you. If not,
The `model` folder will be used to place model checkpoints along the training process. It is recommended to set `model` as a symbolic link to somewhere else on the hard disk.
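For example, the symbolic link can be set up as follows (`/tmp/mx-rcnn-model` is a placeholder; point it at your actual large disk):

```shell
# Placeholder storage location; replace with a path on your large disk.
STORE=/tmp/mx-rcnn-model
mkdir -p "$STORE"
# Expose the storage directory as ./model (-n avoids nesting on re-runs).
ln -sfn "$STORE" model
```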
`vgg16-0000.params` from the MXNet model gallery to
`resnet-101-0000.params` from ResNet to
`bash script/vgg_alter_voc07.sh 0` (use GPU 0) will do the following for you.
Run `python train_alternate.py`. This will train the VGG network on VOC07 trainval. More control of the training process can be found in the argparse help.
Run `python test.py --prefix model/final --epoch 0` after completing the training process. This will test the VGG network on VOC07 test with the model in `HOME/model/final-0000.params`. Adding `--vis` will turn on visualization, and `-h` will show help, as in the training process.
`bash script/vgg_voc07.sh 0` (use GPU 0) will do the following for you.
Run `python train_end2end.py` to train the VGG network on VOC07 trainval.
Run `python test.py` to test the VGG network on VOC07 test.
`bash script/get_selective.sh` and `bash script/vgg_fast_rcnn.sh 0` (use GPU 0) will do the following for you.
`scipy` is used to load selective search proposals.
`script/get_selective_search.sh` will do this.
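Selective search proposals are commonly distributed as MATLAB `.mat` files, which is why `scipy` is needed to load them. A hedged round-trip sketch (the file name and the `boxes` key are illustrative, not necessarily what `rcnn.dataset` expects):

```python
import numpy as np
from scipy.io import loadmat, savemat

# Fabricate a proposal file for illustration: 2000 boxes of (x1, y1, x2, y2).
boxes = (np.random.rand(2000, 4) * 500).astype(np.float32)
savemat("/tmp/demo_proposals.mat", {"boxes": boxes})

# Loading proposals back with scipy, as the dataset code would.
loaded = loadmat("/tmp/demo_proposals.mat")["boxes"]
print(loaded.shape)  # (2000, 4)
```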
Run `python -m rcnn.tools.train_rcnn --proposal selective_search` to train with the selective search proposals.
Test with `python -m rcnn.tools.test_rcnn --proposal selective_search`.
`script/vgg_fast_rcnn.sh` will train Fast R-CNN on VOC07 and test on VOC07test.
The Region Proposal Network (RPN) solves object detection as a regression problem from the objectness perspective. Bounding boxes are predicted by applying learned bounding-box deltas to base boxes, namely anchor boxes at different positions in the feature maps. The training process directly learns a mapping from raw image intensities to bounding-box transformation targets.
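As an illustration of the box regression described above, here is a minimal NumPy sketch of the standard R-CNN box parameterization (the function name is illustrative; the repository's own implementation lives inside the `rcnn` package):

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply predicted (dx, dy, dw, dh) deltas to anchor boxes.

    anchors: (N, 4) array of (x1, y1, x2, y2) base boxes.
    deltas:  (N, 4) learned transformations: center shifts relative
             to anchor size, log-space scaling of width/height.
    """
    widths = anchors[:, 2] - anchors[:, 0] + 1.0
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_x = anchors[:, 0] + 0.5 * (widths - 1.0)
    ctr_y = anchors[:, 1] + 0.5 * (heights - 1.0)

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    boxes = np.empty_like(anchors, dtype=np.float64)
    boxes[:, 0] = pred_ctr_x - 0.5 * (pred_w - 1.0)
    boxes[:, 1] = pred_ctr_y - 0.5 * (pred_h - 1.0)
    boxes[:, 2] = pred_ctr_x + 0.5 * (pred_w - 1.0)
    boxes[:, 3] = pred_ctr_y + 0.5 * (pred_h - 1.0)
    return boxes

# Zero deltas leave the anchor unchanged.
anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
print(decode_boxes(anchors, np.zeros((1, 4))))
```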
Fast R-CNN treats general object detection as a classification problem and bounding-box prediction as a regression problem. Classifying cropped region feature maps and predicting bounding-box displacements together yield the detection results. Cropping feature maps instead of the image input accelerates computation by reusing the shared convolutional feature maps. Bounding-box displacements are learned simultaneously during training.
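The cropping-and-pooling step can be illustrated with a simplified single-ROI NumPy sketch (the real ROIPooling operator handles batching, rounding, and backpropagation differently; `feat_stride=16` assumes VGG16's conv5_3 output):

```python
import numpy as np

def roi_max_pool(feat, roi, out_size=7, feat_stride=16):
    """Crop one region from a shared conv feature map and max-pool it
    to a fixed out_size x out_size grid.

    feat: (C, H, W) feature map, e.g. VGG16 conv5_3 with stride 16.
    roi:  (x1, y1, x2, y2) in image coordinates.
    """
    # Project image coordinates onto the feature map.
    x1, y1, x2, y2 = [int(round(c / float(feat_stride))) for c in roi]
    crop = feat[:, y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
    c, h, w = crop.shape
    out = np.empty((c, out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        ys, ye = int(np.floor(i * h / out_size)), int(np.ceil((i + 1) * h / out_size))
        for j in range(out_size):
            xs, xe = int(np.floor(j * w / out_size)), int(np.ceil((j + 1) * w / out_size))
            out[:, i, j] = crop[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

feat = np.random.rand(512, 38, 50)          # conv features for a ~600x800 image
pooled = roi_max_pool(feat, (64, 64, 320, 320))
print(pooled.shape)  # (512, 7, 7)
```

Every ROI, whatever its size, comes out as a fixed-size tensor that the fully connected classification and regression heads can consume.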
Faster R-CNN utilizes an alternating optimization process between the RPN and Fast R-CNN: Fast R-CNN weights are used to initialize the RPN for training. The approximate joint (end-to-end) training scheme does not backpropagate the RCNN training error to the RPN.
This repository provides Faster R-CNN as a package named `rcnn`.
`rcnn.core`: core routines in Faster R-CNN training and testing.
`rcnn.cython`: Cython speedups from py-faster-rcnn.
`rcnn.dataset`: dataset library. Base class is
`rcnn.io`: preparing training data.
`rcnn.processing`: data and label processing library.
`rcnn.pycocotools`: Python API from the COCO dataset.
`rcnn.symbol`: symbols and operators.
`rcnn.tools`: training and testing wrappers.
`rcnn.utils`: utilities for training and testing, mostly overloads of MXNet functions.
This repository uses code from MXNet, Fast R-CNN, Faster R-CNN, Caffe, tornadomeet/mx-rcnn, and the MS COCO API.
Training data come from Pascal VOC, ImageNet, and COCO.
Pretrained models come from VGG16 and ResNet.
Thanks to tornadomeet for the end-to-end experiments and to the MXNet contributors for helpful discussions.
The history of this implementation:
`mxnet/example/rcnn` was v1, v2, v3.5, and is now v5.