The examples in this folder demonstrate the inference workflow. Please build the MXNet C++ package as explained in the README file. The executables can be copied directly from mxnet/build/cpp-package/example.
This directory contains the following examples. To run them, ensure that the path to the MXNet shared library is added to the OS-specific environment variable: LD_LIBRARY_PATH on Linux and macOS, and PATH on Windows.
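For example, on Linux (the path below is a placeholder; point it at the directory containing libmxnet.so for your build):

```
# Placeholder path; adjust to the lib directory of your MXNet build.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/mxnet/build/lib
```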
This example demonstrates the image classification workflow with pre-trained models using the MXNet C++ API. The script also supports inference with quantized CNN models generated by oneDNN (see this quantization flow). Compared with the current Python implementation, the C++ API reduces inference latency for most models.
Most CNN models have been tested on Linux systems, and 50,000 images are used to collect the accuracy numbers. Please refer to this README for more details about accuracy.
The following performance numbers are collected using the C++ inference API on an AWS EC2 C5.12xlarge instance. The environment variables are set as below, with OMP_NUM_THREADS set to half the number of vCPUs, i.e. the number of physical cores:
```
export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
export OMP_NUM_THREADS=$(vCPUs/2)
export MXNET_ENGINE_TYPE=NaiveEngine
```
Users are also recommended to use numactl or taskset to bind the running process to the specified cores.
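For example, to pin the dummy-data benchmark to the first 24 cores (the core and NUMA node IDs below are illustrative; pick the ones matching your machine's topology):

```
# Bind both CPU and memory allocation to NUMA node 0 (IDs are illustrative)
numactl --physcpubind=0-23 --membind=0 ./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark

# Or restrict the process to cores 0-23 with taskset
taskset -c 0-23 ./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark
```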
Model | Dataset | BS=1 FP32 / INT8 (imgs/sec) | BS=64 FP32 / INT8 (imgs/sec) |
---|---|---|---|
ResNet18-V1 | Validation Dataset | 369.00 / 778.82 | 799.7 / 2598.04 |
ResNet50-V1 | Validation Dataset | 160.72 / 405.84 | 349.73 / 1297.65 |
ResNet101-V1 | Validation Dataset | 89.56 / 197.55 | 193.25 / 740.47 |
Squeezenet 1.0 | Validation Dataset | 294.46 / 899.28 | 857.70 / 3065.13 |
MobileNet 1.0 | Validation Dataset | 554.94 / 676.59 | 1279.44 / 3393.43 |
MobileNetV2 1.0 | Validation Dataset | 303.40 / 776.40 | 994.25 / 4227.77 |
Inception V3 | Validation Dataset | 108.20 / 219.20 | 232.22 / 870.09 |
ResNet152-V2 | Validation Dataset | 52.28 / 64.62 | 107.03 / 134.04 |
Inception-BN | Validation Dataset | 211.86 / 306.37 | 632.79 / 2115.28 |
The command-line options accepted by this script are shown below:
```
./imagenet_inference --help
Usage: imagenet_inference --symbol_file <model symbol file in json format>
                          --params_file <model params file>
                          --dataset <dataset used to benchmark>
                          --data_nthreads <number of threads for data decoding, default: 60>
                          --input_shape <shape of input image e.g "3 224 224">
                          --rgb_mean <mean value to be subtracted on R/G/B channel e.g "0 0 0">
                          --rgb_std <standard deviation on R/G/B channel e.g "1 1 1">
                          --batch_size <number of images per batch>
                          --num_skipped_batches <skip the number of batches for inference>
                          --num_inference_batches <number of batches used for inference>
                          --data_layer_type <default: "float32", choices: ["float32", "int8", "uint8"]>
                          --gpu <whether to run inference on GPU, default: false>
                          --enableTRT <whether to run inference with TensorRT, default: false>
                          --benchmark <whether to use dummy data to run inference, default: false>
```
Follow the steps below to run inference with more models:

- Download the pre-trained FP32 model (symbol and params files) into the ./model directory.
- Generate the corresponding quantized INT8 model (see the quantization flow linked above) and put it into the ./model directory.
- Prepare the validation dataset and put it into the ./data directory.

The command lines below show how to run inference with the FP32/INT8 resnet50_v1 model. Because the C++ inference script accepts almost the same command line as the corresponding Python script, users can easily move from Python to C++.
```
# FP32 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500

# INT8 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --params_file "./model/resnet50_v1-quantized-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500

# FP32 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark

# INT8 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark
```
For a quick inference test, users can directly run unit_test_imagenet_inference.sh with the command below. The script automatically downloads the pre-trained Inception-BN and resnet50_v1_int8 models and the validation dataset required for inference.
```
./unit_test_imagenet_inference.sh
```
You should see output similar to the following:
```
>>> INFO: FP32 real data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:295: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:443: INFO:Dataset for inference: ./data/val_256_q90.rec
imagenet_inference.cpp:444: INFO:label_name = softmax_label
imagenet_inference.cpp:445: INFO:rgb_mean: (123.68, 116.779, 103.939)
imagenet_inference.cpp:447: INFO:rgb_std: (1, 1, 1)
imagenet_inference.cpp:449: INFO:Image shape: (3, 224, 224)
imagenet_inference.cpp:451: INFO:Finished inference with: 500 images
imagenet_inference.cpp:453: INFO:Batch size = 1 for inference
imagenet_inference.cpp:454: INFO:Accuracy: 0.744
imagenet_inference.cpp:455: INFO:Throughput: xxxx images per second

>>> INFO: FP32 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms

>>> INFO: INT8 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/resnet50_v1_int8-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms
```
To run this example with TensorRT, you can quickly try the following command to benchmark Inception-BN:
```
./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --params_file "./model/Inception-BN-0126.params" --batch_size 16 --num_inference_batches 500 --benchmark --enableTRT
```
Sample output looks like this (running on an AWS P3.2xl machine):
```
imagenet_inference.cpp:302: Loading the model from ./model/Inception-BN-symbol.json
build_subgraph.cc:686: start to execute partition graph.
imagenet_inference.cpp:317: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:424: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:439: benchmark completed!
imagenet_inference.cpp:440: batch size: 16 num batch: 500 throughput: 6284.78 imgs/s latency:0.159115 ms
```
This example demonstrates how to load a pre-trained RNN model and use it to predict the sentiment expressed in a given movie review with the MXNet C++ API. The example can process variable-length inputs.
The example uses a pre-trained RNN model trained on the IMDB dataset. The model was built by following the GluonNLP Sentiment Analysis Tutorial, which takes the 'standard_lstm_lm_200' model available in the Gluon Model Zoo and fine-tunes it for the IMDB dataset.
The model files can be found here.
The example's command line parameters are as shown below:
```
./sentiment_analysis_rnn --help
Usage: sentiment_analysis_rnn
--input  Input movie review. The review can be single line or multiline,
         e.g. "This movie is the best." OR "This movie is the best. The direction is awesome."
[--gpu]  Specify this option if the workflow needs to be run in gpu context
```

If the review is multiline, the example predicts a sentiment score for each line, and the final score is the average of the per-line scores.
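The --gpu option can be exercised like this (assuming MXNet was built with CUDA support; the review text is just an illustration):

```
./sentiment_analysis_rnn --input "This movie is the best." --gpu
```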
The following command runs the example with a movie review containing a single line.
./sentiment_analysis_rnn --input "This movie has the great story"
The above command will output the sentiment score as follows:
```
sentiment_analysis_rnn.cpp:346: Input Line : [This movie has the great story] Score : 0.999898
sentiment_analysis_rnn.cpp:449: The sentiment score between 0 and 1, (1 being positive)=0.999898
```
The following command invokes the example with a multi-line review.
./sentiment_analysis_rnn --input "This movie is the best. The direction is awesome."
The above command outputs the sentiment score for each line in the review and the average score, as follows:
```
Input Line : [This movie is the best] Score : 0.964498
Input Line : [ The direction is awesome] Score : 0.968855
The sentiment score between 0 and 1, (1 being positive)=0.966677
```
Alternatively, you can run the unit_test_sentiment_analysis_rnn.sh script.
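Run it from the example directory:

```
./unit_test_sentiment_analysis_rnn.sh
```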