| Environment Variables |
| ===================== |
| MXNet has several settings that you can change with environment variables. |
| Typically, you don't need to change these settings; they are listed here for reference. |
| |
| ## Set the Number of Threads |
| |
| * MXNET_GPU_WORKER_NTHREADS (default=2) |
| - The maximum number of threads that do the computation job on each GPU. |
| * MXNET_GPU_COPY_NTHREADS (default=1) |
| - The maximum number of threads that do the memory copy job on each GPU. |
| * MXNET_CPU_WORKER_NTHREADS (default=1) |
| - The maximum number of threads that do the CPU computation job. |
| * MXNET_CPU_PRIORITY_NTHREADS (default=4) |
| - The number of threads given to prioritized CPU jobs. |
| * MXNET_CPU_NNPACK_NTHREADS (default=4) |
| - The number of threads used for NNPACK. |
| |
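One way to apply the thread settings above from Python is to place them in `os.environ` before MXNet is imported. This is a minimal sketch: the values are illustrative rather than recommendations, and it assumes the variables are read when the library starts its engine, which is why the `import mxnet` line (left commented out here) comes last.

```python
import os

# Illustrative values only; the defaults listed above are usually fine.
thread_settings = {
    "MXNET_GPU_WORKER_NTHREADS": "2",
    "MXNET_GPU_COPY_NTHREADS": "1",
    "MXNET_CPU_WORKER_NTHREADS": "4",
}

# Environment variable values must be strings.
os.environ.update(thread_settings)

# Assumption: MXNet is imported only after the variables are set, so the
# engine sees them when it starts.
# import mxnet as mx
```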
| ## Memory Options |
| |
| * MXNET_EXEC_ENABLE_INPLACE (default=true) |
| - Whether to enable in-place optimization in symbolic execution. |
| * MXNET_EXEC_MATCH_RANGE (default=10) |
| - The rough matching scale in the symbolic execution memory allocator. |
| - Set this to 0 if you don't want to enable memory sharing between graph nodes (for debugging purposes). |
| * MXNET_EXEC_NUM_TEMP (default=1) |
| - The maximum number of temp workspaces to allocate to each device. |
| - Setting this to a small number can save GPU memory. It will also likely decrease the level of parallelism, which is usually acceptable. |
| * MXNET_GPU_MEM_POOL_RESERVE (default=5) |
| - The percentage of GPU memory to reserve for things other than the GPU array, such as kernel launch or cuDNN handle space. |
| - If you see a strange out-of-memory error from the kernel launch after multiple iterations, try setting this to a larger value. |
| |
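To get a feel for what a given `MXNET_GPU_MEM_POOL_RESERVE` value means in bytes, the percentage can be worked out by hand. The helper below is hypothetical and assumes the reserve is a plain percentage of total device memory; how MXNet accounts for it internally may differ.

```python
def reserved_gpu_bytes(total_bytes: int, reserve_percent: int = 5) -> int:
    """Bytes held back from the GPU array pool (hypothetical helper)."""
    if not 0 <= reserve_percent <= 100:
        raise ValueError("reserve_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large sizes.
    return total_bytes * reserve_percent // 100


total = 16 * 1024**3  # an illustrative 16 GiB device
print(reserved_gpu_bytes(total))       # reserve at the default of 5
print(reserved_gpu_bytes(total, 10))   # reserve after raising the value to 10
```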
| ## Engine Type |
| |
| * MXNET_ENGINE_TYPE (default=ThreadedEnginePerDevice) |
| - The type of underlying execution engine of MXNet. |
| - Choices: |
| - NaiveEngine: A very simple engine that uses the master thread to do computation. |
| - ThreadedEngine: A threaded engine that uses a global thread pool to schedule jobs. |
| - ThreadedEnginePerDevice: A threaded engine that allocates threads per GPU. |
| |
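Because the engine name is a free-form string, a typo is easy to miss. The sketch below checks the name against the three choices listed above before exporting it; `select_engine` is a hypothetical helper, not part of MXNet, and it assumes the variable is set before MXNet is imported.

```python
import os

# The three engine names listed in this section.
VALID_ENGINES = ("NaiveEngine", "ThreadedEngine", "ThreadedEnginePerDevice")


def select_engine(name: str) -> None:
    """Validate the engine name, then export MXNET_ENGINE_TYPE."""
    if name not in VALID_ENGINES:
        raise ValueError(f"unknown engine {name!r}; expected one of {VALID_ENGINES}")
    os.environ["MXNET_ENGINE_TYPE"] = name


# Example: switch to the simple single-threaded engine.
select_engine("NaiveEngine")
```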
| ## Control the Data Communication |
| |
| * MXNET_KVSTORE_REDUCTION_NTHREADS (default=4) |
| - The number of CPU threads used for summing big arrays. |
| * MXNET_KVSTORE_BIGARRAY_BOUND (default=1e6) |
| - The minimum size of a "big array." |
| - When the array size is bigger than this threshold, MXNET_KVSTORE_REDUCTION_NTHREADS threads are used for reduction. |
| * MXNET_ENABLE_GPU_P2P (default=1) |
| - If true, MXNet tries to use GPU peer-to-peer communication, if available, |
| when the kvstore type is `device`. |
| |
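The interaction between `MXNET_KVSTORE_BIGARRAY_BOUND` and `MXNET_KVSTORE_REDUCTION_NTHREADS` can be sketched as a simple threshold test. The function below is a hypothetical illustration, not MXNet's implementation; in particular, falling back to a single thread for arrays at or below the bound is an assumption.

```python
import os


def reduction_threads(array_size: int) -> int:
    """Number of CPU threads a reduction of this size would use (sketch)."""
    # The documented default is written as 1e6, so parse via float first.
    bound = int(float(os.environ.get("MXNET_KVSTORE_BIGARRAY_BOUND", "1e6")))
    nthreads = int(os.environ.get("MXNET_KVSTORE_REDUCTION_NTHREADS", "4"))
    # Only arrays strictly bigger than the bound count as "big arrays".
    # Assumption: smaller arrays are summed on one thread.
    return nthreads if array_size > bound else 1
```

With the defaults, an array of 2,000,000 elements would be reduced with 4 threads, while one of 1,000,000 elements would not count as big.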
| ## Memonger |
| |
| * MXNET_BACKWARD_DO_MIRROR (default=0) |
| - Whether to `mirror` during training to save device memory. |
| - When set to `1`, the graph executor will `mirror` some layers' feature maps during forward propagation and drop others, re-computing the dropped feature maps when they are needed. `MXNET_BACKWARD_DO_MIRROR=1` saves 30%~50% of device memory while retaining about 95% of the running speed. |
| - One extension of `mirror` in MXNet is the [memonger technology](https://arxiv.org/abs/1604.06174), which uses only O(sqrt(N)) memory at 75% of the running speed. |
| |
| ## Control the Profiler |
| |
| When USE_PROFILER is enabled in the Makefile or CMake, the following environment variables can be used to profile the application without changing code. |
| |
| * MXNET_PROFILER_AUTOSTART (default=0) |
| - If set to 1, MXNet starts the profiler automatically. The profiling result is stored in profile.json in the working directory. |
| |
| * MXNET_PROFILER_MODE (default=0) |
| - If set to '0', the profiler records the events of the symbolic operators. |
| - If set to '1', the profiler records the events of all operators. |
| |
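Since the profiler variables work without code changes, one option is to set them only for a child process. The sketch below uses a hypothetical `run_profiled` wrapper around `subprocess.run`; the child here is a stand-in that just echoes the variables, where a real run would launch a training script built with USE_PROFILER.

```python
import os
import subprocess
import sys


def run_profiled(command, mode="1"):
    """Run a command with the profiler variables set for that process only."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["MXNET_PROFILER_AUTOSTART"] = "1"
    env["MXNET_PROFILER_MODE"] = mode
    return subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)


# Stand-in child process that only prints the two variables it received.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['MXNET_PROFILER_AUTOSTART'], "
    "os.environ['MXNET_PROFILER_MODE'])",
]
result = run_profiled(child)
print(result.stdout.strip())
```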
| ## Other Environment Variables |
| |
| * MXNET_CUDNN_AUTOTUNE_DEFAULT (default=0) |
| - The default value of cudnn_tune for convolution layers. |
| - Auto tuning is turned off by default. For benchmarking, set this to 1 to turn it on by default. |
| |
| Settings for Minimum Memory Usage |
| --------------------------------- |
| - Make sure ```min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) = 1``` |
| - The default setting satisfies this. |
| |
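The minimum-memory condition above can be checked programmatically. The function below is a hypothetical helper that reads the two variables from a mapping and falls back to the defaults documented earlier (1 and 2) when a variable is unset.

```python
import os


def uses_minimum_memory_settings(environ=os.environ) -> bool:
    """True if min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) == 1."""
    num_temp = int(environ.get("MXNET_EXEC_NUM_TEMP", "1"))
    gpu_workers = int(environ.get("MXNET_GPU_WORKER_NTHREADS", "2"))
    return min(num_temp, gpu_workers) == 1


# An empty mapping stands for "nothing overridden", i.e. the defaults.
print(uses_minimum_memory_settings({}))
```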
| Settings for More GPU Parallelism |
| --------------------------------- |
| - Set ```MXNET_GPU_WORKER_NTHREADS``` to a number larger than its default of 2. |
| - To reduce memory usage, consider setting ```MXNET_EXEC_NUM_TEMP``` to a small number. |
| - This might not speed things up, especially for image applications, because the GPU is usually fully utilized even with serialized jobs. |