| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| ~ |
| --> |
| |
| # MXNet C++ Package Inference Workflow Examples |
| |
| ## Building C++ Inference examples |
| |
The examples in this folder demonstrate the **inference** workflow. First, build the MXNet C++ package as explained in the [README](<https://github.com/apache/mxnet/tree/master/cpp-package#building-c-package>) file. The resulting executables can then be copied from ```mxnet/build/cpp-package/example```.
| |
| ## Examples demonstrating inference workflow |
| |
This directory contains the following examples. Before running them, ensure that the path to the MXNet shared library is added to the OS-specific environment variable: **LD\_LIBRARY\_PATH** on Linux and macOS, and **PATH** on Windows.
| |
| ## [imagenet_inference.cpp](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/imagenet_inference.cpp>) |
| |
This example demonstrates an image classification workflow with pre-trained models using the MXNet C++ API. The script also supports inference with quantized CNN models generated by oneDNN (see this [quantization flow](https://github.com/apache/mxnet/blob/master/example/quantization/README.md)). Using the C++ API reduces the latency of most models to some extent compared with the current Python implementation.
| |
Most of the CNN models have been tested on Linux systems, with 50,000 images used to collect accuracy numbers. Please refer to this [README](https://github.com/apache/mxnet/blob/master/example/quantization/README.md) for more details about accuracy.
| |
The following performance numbers were collected using the C++ inference API on an AWS EC2 C5.12xlarge instance, with the environment variables set as below:
| |
| ``` |
| export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 |
export OMP_NUM_THREADS=$(($(nproc) / 2))  # half the number of vCPUs
| export MXNET_ENGINE_TYPE=NaiveEngine |
| ``` |
Users are also recommended to use ```numactl``` or ```taskset``` to bind the running process to specific cores.
| |
| | Model | Dataset |BS=1 (imgs/sec) |BS=64 (imgs/sec) | |
| |:---|:---|:---:|:---:| |
| | | |FP32 / INT8 | FP32 / INT8 | |
| | ResNet18-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |369.00 / 778.82|799.7 / 2598.04| |
| | ResNet50-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |160.72 / 405.84|349.73 / 1297.65 | |
| | ResNet101-V1 | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 89.56 / 197.55| 193.25 / 740.47| |
| |Squeezenet 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 294.46 / 899.28| 857.70 / 3065.13| |
| |MobileNet 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |554.94 / 676.59|1279.44 / 3393.43| |
| |MobileNetV2 1.0|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |303.40 / 776.40|994.25 / 4227.77| |
| |Inception V3|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |108.20 / 219.20 | 232.22 / 870.09 | |
| |ResNet152-V2|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) |52.28 / 64.62|107.03 / 134.04 | |
| |Inception-BN|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 211.86 / 306.37| 632.79 / 2115.28| |
| |
The command-line options accepted by this script are shown below:
| ``` |
| ./imagenet_inference --help |
| Usage: |
| imagenet_inference --symbol_file <model symbol file in json format> |
| --params_file <model params file> |
| --dataset <dataset used to benchmark> |
| --data_nthreads <number of threads for data decoding, default: 60> |
--input_shape <shape of input image e.g "3 224 224">
| --rgb_mean <mean value to be subtracted on R/G/B channel e.g "0 0 0"> |
| --rgb_std <standard deviation on R/G/B channel. e.g "1 1 1"> |
| --batch_size <number of images per batch> |
| --num_skipped_batches <skip the number of batches for inference> |
| --num_inference_batches <number of batches used for inference> |
| --data_layer_type <default: "float32", choices: ["float32", "int8", "uint8"]> |
| --gpu <whether to run inference on GPU, default: false> |
--enableTRT <whether to run inference with TensorRT, default: false>
| --benchmark <whether to use dummy data to run inference, default: false> |
| ``` |
| |
Follow the steps below to run inference with more models.

- Download the pre-trained FP32 models into the ```./model``` directory.
- Refer to this [README](https://github.com/apache/mxnet/blob/master/example/quantization/README.md) to generate the corresponding quantized models and put them into the ```./model``` directory as well.
- Prepare the [validation dataset](http://data.mxnet.io/data/val_256_q90.rec) and put it into the ```./data``` directory.
| |
The command lines below show how to run inference with the FP32 and INT8 resnet50_v1 models. The C++ inference script provides almost the same command line as this [Python script](https://github.com/apache/mxnet/blob/master/example/quantization/imagenet_inference.py), so users can easily move from Python to C++.
| ``` |
| |
| # FP32 inference |
| ./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500 |
| |
| # INT8 inference |
| ./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --params_file "./model/resnet50_v1-quantized-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500 |
| |
| # FP32 dummy data |
| ./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark |
| |
| # INT8 dummy data |
| ./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark |
| |
| ``` |
For a quick inference test, users can directly run [unit_test_imagenet_inference.sh](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/unit_test_imagenet_inference.sh>) with the command below. This script automatically downloads the pre-trained **Inception-BN** and **resnet50_v1_int8** models and the **validation dataset** required for inference.
| |
| ``` |
| ./unit_test_imagenet_inference.sh |
| ``` |
You should get output similar to the following:
| ``` |
| >>> INFO: FP32 real data |
| imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json |
| imagenet_inference.cpp:295: Loading the model parameters from ./model/Inception-BN-0126.params |
| imagenet_inference.cpp:443: INFO:Dataset for inference: ./data/val_256_q90.rec |
| imagenet_inference.cpp:444: INFO:label_name = softmax_label |
| imagenet_inference.cpp:445: INFO:rgb_mean: (123.68, 116.779, 103.939) |
| imagenet_inference.cpp:447: INFO:rgb_std: (1, 1, 1) |
| imagenet_inference.cpp:449: INFO:Image shape: (3, 224, 224) |
| imagenet_inference.cpp:451: INFO:Finished inference with: 500 images |
| imagenet_inference.cpp:453: INFO:Batch size = 1 for inference |
| imagenet_inference.cpp:454: INFO:Accuracy: 0.744 |
| imagenet_inference.cpp:455: INFO:Throughput: xxxx images per second |
| |
| >>> INFO: FP32 dummy data |
| imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json |
| imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance.. |
| imagenet_inference.cpp:387: benchmark completed! |
| imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms |
| |
| >>> INFO: INT8 dummy data |
| imagenet_inference.cpp:282: Loading the model from ./model/resnet50_v1_int8-symbol.json |
| imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance.. |
| imagenet_inference.cpp:387: benchmark completed! |
| imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms |
| ``` |
To run this example with TensorRT, you can quickly try the following command, which benchmarks Inception-BN:
| ``` |
| ./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --params_file "./model/Inception-BN-0126.params" --batch_size 16 --num_inference_batches 500 --benchmark --enableTRT |
| ``` |
Sample output will look like this (collected on an AWS P3.2xlarge instance):
| ``` |
| imagenet_inference.cpp:302: Loading the model from ./model/Inception-BN-symbol.json |
| build_subgraph.cc:686: start to execute partition graph. |
| imagenet_inference.cpp:317: Loading the model parameters from ./model/Inception-BN-0126.params |
| imagenet_inference.cpp:424: Running the forward pass on model to evaluate the performance.. |
| imagenet_inference.cpp:439: benchmark completed! |
| imagenet_inference.cpp:440: batch size: 16 num batch: 500 throughput: 6284.78 imgs/s latency:0.159115 ms |
| ``` |
| |
| ## [sentiment_analysis_rnn.cpp](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/sentiment_analysis_rnn.cpp>) |
This example demonstrates how to load a pre-trained RNN model and use it to predict the sentiment expressed in a given movie review with the MXNet C++ API. The example is capable of processing variable-length inputs. It performs the following tasks:
- Loads the pre-trained RNN model.
- Loads the dictionary file containing the word-to-index mapping.
- Splits the review into multiple lines separated by ".".
- Predicts the sentiment score for each line and outputs the average score.
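The split-and-average flow above can be sketched as follows; `SplitReview` and `AverageScore` are illustrative names, not the example's actual functions:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a review into lines on '.', skipping empty fragments,
// the way the example feeds one line at a time to the model.
std::vector<std::string> SplitReview(const std::string& review) {
    std::vector<std::string> lines;
    std::stringstream ss(review);
    std::string line;
    while (std::getline(ss, line, '.')) {
        if (line.find_first_not_of(' ') != std::string::npos)
            lines.push_back(line);
    }
    return lines;
}

// Average the per-line scores to get the review-level score.
float AverageScore(const std::vector<float>& scores) {
    if (scores.empty()) return 0.0f;
    float sum = 0.0f;
    for (float s : scores) sum += s;
    return sum / scores.size();
}
```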
| |
The example handles variable-length input using the following technique:
- It creates executors for pre-determined input lengths such as 5, 10, 15, 20, 25, etc., called **buckets**.
- Each bucket is identified by a **bucket key** representing the input length required by the corresponding executor.
- For each line in the review, the example counts the number of words and finds the closest bucket (executor).
- If the bucket key does not match the number of words in the line, the example pads or trims the input line to the required length.
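The bucket lookup and pad/trim steps above can be sketched as follows (assuming buckets are sorted in ascending order; `ClosestBucket`, `FitToBucket`, and the `<eos>` pad token are illustrative, not the example's actual identifiers):

```cpp
#include <string>
#include <vector>

// Pick the smallest bucket that can hold `num_words`; if the line is
// longer than the largest bucket, fall back to the largest one.
int ClosestBucket(const std::vector<int>& buckets, int num_words) {
    for (int b : buckets)
        if (b >= num_words) return b;
    return buckets.back();
}

// Pad with a filler token (or trim) so the line matches the bucket key.
std::vector<std::string> FitToBucket(std::vector<std::string> words,
                                     int bucket_key,
                                     const std::string& pad = "<eos>") {
    if (static_cast<int>(words.size()) > bucket_key)
        words.resize(bucket_key);       // trim long lines
    while (static_cast<int>(words.size()) < bucket_key)
        words.push_back(pad);           // pad short lines
    return words;
}
```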
| |
The example uses a pre-trained RNN model trained on the IMDB dataset. The model was built by following the [GluonNLP Sentiment Analysis Tutorial](<http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html#>). The tutorial takes the 'standard_lstm_lm_200' model available in the Gluon Model Zoo and fine-tunes it for the IMDB dataset.
The model consists of:
- Embedding layer
- 2 LSTM layers with a hidden dimension size of 200
- Average pooling layer
- Sigmoid output layer

The model was trained for 10 epochs to achieve 85% test accuracy.
A visual representation of the model is available [here](<http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html#Sentiment-analysis-model-with-pre-trained-language-model-encoder>).
| |
The model files can be found here:
- [sentiment_analysis-symbol.json](<https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_analysis-symbol.json>)
- [sentiment_analysis-0010.params](<https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_analysis-0010.params>)
- [sentiment_token_to_idx.txt](<https://s3.amazonaws.com/mxnet-cpp/RNN_model/sentiment_token_to_idx.txt>) Each line of the dictionary file contains a word and its unique index, separated by a space; the dictionary holds 32,787 words generated from the training dataset.
The example downloads the above files while running.
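A dictionary file in that "word index" per-line layout could be parsed as sketched below; `LoadVocabulary` is an illustrative name, not the example's actual function:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Read "word index" pairs, one per line, into a word -> index map,
// matching the layout of sentiment_token_to_idx.txt.
std::unordered_map<std::string, int> LoadVocabulary(std::istream& in) {
    std::unordered_map<std::string, int> word_to_idx;
    std::string word;
    int index;
    while (in >> word >> index)
        word_to_idx[word] = index;
    return word_to_idx;
}
```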
| |
| The example's command line parameters are as shown below: |
| |
| ``` |
| ./sentiment_analysis_rnn --help |
| Usage: |
| sentiment_analysis_rnn |
--input Input movie review. The review can be a single line or multiline, e.g. "This movie is the best." OR "This movie is the best. The direction is awesome."
| [--gpu] Specify this option if workflow needs to be run in gpu context |
| If the review is multiline, the example predicts sentiment score for each line and the final score is the average of scores obtained for each line. |
| |
| ``` |
| |
The following command line shows how to run the example with a movie review containing only one line.
| |
| ``` |
| ./sentiment_analysis_rnn --input "This movie has the great story" |
| ``` |
| |
| The above command will output the sentiment score as follows: |
| ``` |
| sentiment_analysis_rnn.cpp:346: Input Line : [This movie has the great story] Score : 0.999898 |
| sentiment_analysis_rnn.cpp:449: The sentiment score between 0 and 1, (1 being positive)=0.999898 |
| ``` |
| |
The following command line shows how to invoke the example with a multi-line review.
| |
| ``` |
| ./sentiment_analysis_rnn --input "This movie is the best. The direction is awesome." |
| ``` |
The above command will output the sentiment score for each line in the review and the average score as follows:
| ``` |
| Input Line : [This movie is the best] Score : 0.964498 |
| Input Line : [ The direction is awesome] Score : 0.968855 |
| The sentiment score between 0 and 1, (1 being positive)=0.966677 |
| ``` |
| |
| Alternatively, you can run the [unit_test_sentiment_analysis_rnn.sh](<https://github.com/apache/mxnet/blob/master/cpp-package/example/inference/unit_test_sentiment_analysis_rnn.sh>) script. |