A long-standing request from MXNet users has been the ability to run parallel inference on a model from multiple threads while sharing the parameters. With this use case in mind, the thread-safe version of CachedOp was added to give MXNet users a way to run multi-threaded inference. This doc discusses the current state of thread safety in MXNet, shows how to use the C API and the thread-safe cached op for multi-threaded inference, and covers the current limitations.
Examining the current state of thread safety in MXNet, we can arrive at the following conclusions:

1. The MXNet dependency engine is thread safe.
2. The graph executor, which backs the Module / symbolic / C predict APIs, is not thread safe.
3. The cached op (the Gluon backend) is not thread safe.
The CachedOpThreadSafe and corresponding C APIs were added to address point 3 above and provide a way for MXNet users to do multi-threaded inference.
```c++
/*!
 * \brief create cached operator, allows to choose thread_safe version
 * of cachedop
 */
MXNET_DLL int MXCreateCachedOp(SymbolHandle handle,
                               int num_flags,
                               const char** keys,
                               const char** vals,
                               CachedOpHandle *out,
                               bool thread_safe DEFAULT(false));
```
To complete this tutorial, you need to build MXNet from source with the C++ package. Please follow the build from source guide and the C++ Package documentation. The summary of those two documents is that you need to build MXNet from source with the USE_CPP_PACKAGE flag set to 1. This example also requires a build with CUDA and cuDNN.
If you have built MXNet from source with CMake, then do the following:

```bash
$ cp build/cpp-package/example/multi_threaded_inference .
```
The example has been tested with models such as imagenet1k-inception-bn, imagenet1k-resnet-50, imagenet1k-resnet-152, and imagenet1k-resnet-18.
To run the multi-threaded inference example, first export LD_LIBRARY_PATH:

```bash
$ export LD_LIBRARY_PATH=<MXNET_LIB_DIR>:$LD_LIBRARY_PATH
```

Then run the example:

```bash
$ ./multi_threaded_inference [model_name] [is_gpu] [file_names]
```
For example:

```bash
$ ./multi_threaded_inference imagenet1k-inception-bn 1 grace_hopper.jpg dog.jpg
```
The above command spawns two threads that share the same cached op and params, runs inference on the GPU, and returns the inference results in the order in which the files are provided.
NOTE: This example demonstrates multi-threaded inference with the cached op. The inference results are reliable only for specific models (e.g. imagenet1k-inception-bn); for other models the results may be less accurate because of differences in the required preprocessing steps, among other factors.
The multi-threaded inference example (multi_threaded_inference.cc) involves the following steps:
The above code parses the arguments and loads the image file into an ndarray with a specific shape. A few settings are fixed and not configurable; for example, static_alloc and static_shape are set to true by default.
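As a rough illustration, the input could be prepared along these lines. This is a minimal sketch: GetImageData is a hypothetical helper that decodes and normalizes the image file, and the shape assumes a 224x224 RGB input.

```c++
#include <string>
#include <vector>
#include "mxnet-cpp/MxNetCpp.h"

// Hypothetical helper that decodes `file_name` and returns normalized
// float pixels in NCHW order; not part of the MXNet API.
std::vector<float> GetImageData(const std::string& file_name);

// Copy the decoded pixels into an eagerly allocated NDArray on the CPU.
mxnet::cpp::NDArray LoadImage(const std::string& file_name) {
  std::vector<float> image_data = GetImageData(file_name);
  mxnet::cpp::NDArray data(mxnet::cpp::Shape(1, 3, 224, 224),
                           mxnet::cpp::Context::cpu(), false);
  data.SyncCopyFromCPU(image_data.data(), image_data.size());
  mxnet::cpp::NDArray::WaitAll();  // make sure the copy has finished
  return data;
}
```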
The above code loads the params and copies the input data and params to the target context.
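A sketch of what that copy can look like, assuming `params` is a map from parameter names to CPU NDArrays (the variable names are illustrative):

```c++
// Copy every parameter to the target context (GPU 0 here) before inference.
mxnet::cpp::Context ctx = mxnet::cpp::Context::gpu(0);
std::map<std::string, mxnet::cpp::NDArray> params_on_ctx;
for (const auto& kv : params) {
  params_on_ctx[kv.first] = kv.second.Copy(ctx);
}
mxnet::cpp::NDArray::WaitAll();  // block until all device copies complete
```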
The above code prepares flag_key_cstrs and flag_val_cstrs to be passed to the cached op. The C API call is made with MXCreateCachedOp. This creates the thread-safe cached op, since thread_safe (the last parameter to MXCreateCachedOp) is set to true. When it is set to false, a CachedOp is created instead of a CachedOpThreadSafe.
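A condensed sketch of that call, assuming `sym` is an mxnet::cpp::Symbol loaded from the model JSON and that only the two static flags are passed (the flag values shown are illustrative):

```c++
// Build C-string arrays for the cached-op flags.
std::vector<std::string> flag_keys = {"static_alloc", "static_shape"};
std::vector<std::string> flag_vals = {"true", "true"};
std::vector<const char*> flag_key_cstrs, flag_val_cstrs;
for (const auto& key : flag_keys) flag_key_cstrs.push_back(key.c_str());
for (const auto& val : flag_vals) flag_val_cstrs.push_back(val.c_str());

// Create the thread-safe cached op: the final argument selects
// CachedOpThreadSafe instead of CachedOp.
CachedOpHandle hdl = nullptr;
int ret = MXCreateCachedOp(sym.GetHandle(), flag_keys.size(),
                           flag_key_cstrs.data(), flag_val_cstrs.data(),
                           &hdl, true);
if (ret < 0) {
  std::cerr << MXGetLastError() << std::endl;
}
```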
The above code creates the lambda function, which takes the thread number as an argument. If random_sleep is set, the thread first sleeps for a random interval between 0 and 5 seconds. It then invokes MXInvokeCachedOp (from the handle, MXNet determines whether to invoke the thread-safe version of the cached op or not).
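A sketch of that worker lambda, assuming the five-argument MXInvokeCachedOp signature from the 1.x C API, with `hdl` the handle created above and `arr_handles[num]` holding the NDArrayHandles of the inputs for thread `num` (both names are illustrative):

```c++
auto run_inference = [&](int num) {
  if (random_sleep) {
    // Sleep for a random duration between 0 and 5 seconds.
    std::this_thread::sleep_for(std::chrono::seconds(std::rand() % 6));
  }
  int num_outputs = 0;
  NDArrayHandle* outputs = nullptr;
  // The handle determines whether the thread-safe cached op is invoked.
  int ret = MXInvokeCachedOp(hdl, arr_handles[num].size(),
                             arr_handles[num].data(),
                             &num_outputs, &outputs);
  if (ret < 0) {
    std::cerr << MXGetLastError() << std::endl;
  }
};
```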
The above code spawns multiple threads, joins them, and waits for all ops to complete. The other alternative is to wait on the output ndarray inside each thread and remove the WaitAll after the join.
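For instance, the spawn-and-join step might look like this (`num_threads` matches the number of input files; the names are illustrative):

```c++
std::vector<std::thread> worker_threads;
for (size_t i = 0; i < num_threads; ++i) {
  worker_threads.emplace_back(run_inference, i);
}
for (auto& t : worker_threads) {
  t.join();
}
// Block until every operation queued by the threads has completed.
mxnet::cpp::NDArray::WaitAll();
```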
The above code outputs the results for the different threads and cleans up the thread-safe cached op.
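The cleanup itself is a single C API call; a minimal sketch:

```c++
// Free the cached op handle once all threads have joined and the
// outputs have been read.
int ret = MXFreeCachedOp(hdl);
if (ret < 0) {
  std::cerr << MXGetLastError() << std::endl;
}
```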
| Models Tested | oneDNN | CUDNN | NO-CUDNN |
| --- | --- | --- | --- |
| imagenet1k-resnet-18 | Yes | Yes | Yes |
| imagenet1k-resnet-152 | Yes | Yes | Yes |
| imagenet1k-resnet-50 | Yes | Yes | Yes |
Calling wait_to_read in individual threads can cause issues; calling invoke from each thread and calling WaitAll after the threads join should still work fine. Future work includes increasing model coverage and addressing most of the limitations mentioned under Current Limitations, except for the training use case. For more updates, please subscribe to discussion activity on the RFC: https://github.com/apache/mxnet/issues/16431.