blob: 37061cec0d7079350ead9edf367f38a62fa768d3 [file] [log] [blame] [view]
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
~
-->
# MXNet C++ Package Inference Workflow Examples
## Building C++ Inference examples
The examples in this folder demonstrate the **inference** workflow. Please build the MXNet C++ Package as explained in the [README](<https://github.com/apache/mxnet/tree/master/cpp-package#building-c-package>) File. You can get the executable files by just copying them from ```mxnet/build/cpp-package/example```
## Examples demonstrating inference workflow
This directory contains following examples. In order to run the examples, ensure that the path to the MXNet shared library is added to the OS specific environment variable viz. **LD\_LIBRARY\_PATH** for Linux, Mac and Ubuntu OS and **PATH** for Windows OS.
## [imagenet_inference.cpp](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/imagenet_inference.cpp>)
This example demonstrates image classification workflow with pre-trained models using MXNet C++ API. Now this script also supports inference with quantized CNN models generated by oneDNN (see this [quantization flow](https://github.com/apache/mxnet/blob/master/example/quantization/README.md)). By using C++ API, the latency of most models will be reduced to some extent compared with current Python implementation.
Most of CNN models have been tested on Linux systems. And 50000 images are used to collect accuracy numbers. Please refer to this [README](https://github.com/apache/mxnet/blob/master/example/quantization/README.md) for more details about accuracy.
The following performance numbers are collected via using C++ inference API on AWS EC2 C5.12xlarge. The environment variables are set like below:
```
export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
export OMP_NUM_THREADS=$(vCPUs/2)
export MXNET_ENGINE_TYPE=NaiveEngine
```
Also users are recommended to use ```numactl``` or ```taskset``` to bind a running process to the specified cores.
| Model | Dataset |BS=1 (imgs/sec) |BS=64 (imgs/sec) |
|:---|:---|:---:|:---:|
| | |FP32 / INT8 | FP32 / INT8 |
| ResNet18-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |369.00 / 778.82|799.7 / 2598.04|
| ResNet50-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |160.72 / 405.84|349.73 / 1297.65 |
| ResNet101-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 89.56 / 197.55| 193.25 / 740.47|
|Squeezenet 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 294.46 / 899.28| 857.70 / 3065.13|
|MobileNet 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |554.94 / 676.59|1279.44 / 3393.43|
|MobileNetV2 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |303.40 / 776.40|994.25 / 4227.77|
|Inception V3|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |108.20 / 219.20 | 232.22 / 870.09 |
|ResNet152-V2|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |52.28 / 64.62|107.03 / 134.04 |
|Inception-BN|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 211.86 / 306.37| 632.79 / 2115.28|
The command line to launch inference by this script can accept are as shown below:
```
./imagenet_inference --help
Usage:
imagenet_inference --symbol_file <model symbol file in json format>
--params_file <model params file>
--dataset <dataset used to benchmark>
--data_nthreads <number of threads for data decoding, default: 60>
--input_shape <shape of input image e.g "3 224 224">]
--rgb_mean <mean value to be subtracted on R/G/B channel e.g "0 0 0">
--rgb_std <standard deviation on R/G/B channel. e.g "1 1 1">
--batch_size <number of images per batch>
--num_skipped_batches <skip the number of batches for inference>
--num_inference_batches <number of batches used for inference>
--data_layer_type <default: "float32", choices: ["float32", "int8", "uint8"]>
--gpu <whether to run inference on GPU, default: false>
--enableTRT <whether to run inference with TensorRT, default: false>"
--benchmark <whether to use dummy data to run inference, default: false>
```
Follow the below steps to do inference with more models.
- Download the pre-trained FP32 models into ```./model``` directory.
- Refer this [README](https://github.com/apache/mxnet/blob/master/example/quantization/README.md) to generate the corresponding quantized models and also put them into ```./model``` directory.
- Prepare [validation dataset](http://data.mxnet.io/data/val_256_q90.rec) and put it into ```./data``` directory.
The below command lines show how to run inference with FP32/INT8 resnet50_v1 model. Because the C++ inference script provides the almost same command line as this [Python script](https://github.com/apache/mxnet/blob/master/example/quantization/imagenet_inference.py) and then users can easily go from Python to C++.
```
# FP32 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500
# INT8 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --params_file "./model/resnet50_v1-quantized-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500
# FP32 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark
# INT8 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark
```
For a quick inference test, users can directly run [unit_test_imagenet_inference.sh](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/unit_test_imagenet_inference.sh>) by using the below command. This script will automatically download the pre-trained **Inception-Bn** and **resnet50_v1_int8** model and **validation dataset** which are required for inference.
```
./unit_test_imagenet_inference.sh
```
And you may get the similiar outputs like below:
```
>>> INFO: FP32 real data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:295: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:443: INFO:Dataset for inference: ./data/val_256_q90.rec
imagenet_inference.cpp:444: INFO:label_name = softmax_label
imagenet_inference.cpp:445: INFO:rgb_mean: (123.68, 116.779, 103.939)
imagenet_inference.cpp:447: INFO:rgb_std: (1, 1, 1)
imagenet_inference.cpp:449: INFO:Image shape: (3, 224, 224)
imagenet_inference.cpp:451: INFO:Finished inference with: 500 images
imagenet_inference.cpp:453: INFO:Batch size = 1 for inference
imagenet_inference.cpp:454: INFO:Accuracy: 0.744
imagenet_inference.cpp:455: INFO:Throughput: xxxx images per second
>>> INFO: FP32 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms
>>> INFO: INT8 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/resnet50_v1_int8-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms
```
For running this example with TensorRT, you can quickly try the following example to run a benchmark test for testing Inception BN:
```
./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --params_file "./model/Inception-BN-0126.params" --batch_size 16 --num_inference_batches 500 --benchmark --enableTRT
```
Sample output will looks like this (the example is running on a AWS P3.2xl machine):
```
imagenet_inference.cpp:302: Loading the model from ./model/Inception-BN-symbol.json
build_subgraph.cc:686: start to execute partition graph.
imagenet_inference.cpp:317: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:424: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:439: benchmark completed!
imagenet_inference.cpp:440: batch size: 16 num batch: 500 throughput: 6284.78 imgs/s latency:0.159115 ms
```
## [sentiment_analysis_rnn.cpp](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/sentiment_analysis_rnn.cpp>)
This example demonstrates how you can load a pre-trained RNN model and use it to predict the sentiment expressed in the given movie review with the MXNet C++ API. The example is capable of processing variable legnth inputs. It performs the following tasks
- Loads the pre-trained RNN model.
- Loads the dictionary file containing the word to index mapping.
- Splits the review in multiple lines separated by "."
- The example predicts the sentiment score for individual lines and outputs the average score.
The example is capable of processing variable length input by implementing following technique:
- The example creates executors for pre-determined input lenghts such as 5, 10, 15, 20, 25, etc called **buckets**.
- Each bucket is identified by **bucket-key** representing the length on input required by corresponding executor.
- For each line in the review, the example finds the number of words in the line and tries to find a closest bucket or executor.
- If the bucket key does not match the number of words in the line, the example pads or trims the input line to match the required length.
The example uses a pre-trained RNN model trained with a IMDB dataset. The RNN model was built by exercising the [GluonNLP Sentiment Analysis Tutorial](<http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html#>). The tutorial uses 'standard_lstm_lm_200' available in Gluon Model Zoo and fine tunes it for the IMDB dataset
The model consists of :
- Embedding Layer
- 2 LSTM Layers with hidden dimension size of 200
- Average pooling layer
- Sigmoid output layer
The model was trained for 10 epochs to achieve 85% test accuracy.
The visual representation of the model is [here](<http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html#Sentiment-analysis-model-with-pre-trained-language-model-encoder>).
The model files can be found here.
- [sentiment_analysis-symbol.json](< https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_analysis-symbol.json>)
- [sentiment_analysis-0010.params](< https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_analysis-0010.params>)
- [sentiment_token_to_idx.txt](<https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_token_to_idx.txt>) Each line of the dictionary file contains a word and a unique index for that word, separated by a space, with a total of 32787 words generated from the training dataset.
The example downloads the above files while running.
The example's command line parameters are as shown below:
```
./sentiment_analysis_rnn --help
Usage:
sentiment_analysis_rnn
--input Input movie review. The review can be single line or multiline.e.g. "This movie is the best." OR "This movie is the best. The direction is awesome."
[--gpu] Specify this option if workflow needs to be run in gpu context
If the review is multiline, the example predicts sentiment score for each line and the final score is the average of scores obtained for each line.
```
The following command line shows running the example with the movie review containing only one line.
```
./sentiment_analysis_rnn --input "This movie has the great story"
```
The above command will output the sentiment score as follows:
```
sentiment_analysis_rnn.cpp:346: Input Line : [This movie has the great story] Score : 0.999898
sentiment_analysis_rnn.cpp:449: The sentiment score between 0 and 1, (1 being positive)=0.999898
```
The following command line shows invoking the example with the multi-line review.
```
./sentiment_analysis_rnn --input "This movie is the best. The direction is awesome."
```
The above command will output the sentiment score for each line in the review and average score as follows:
```
Input Line : [This movie is the best] Score : 0.964498
Input Line : [ The direction is awesome] Score : 0.968855
The sentiment score between 0 and 1, (1 being positive)=0.966677
```
Alternatively, you can run the [unit_test_sentiment_analysis_rnn.sh](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/unit_test_sentiment_analysis_rnn.sh>) script.