commit | 5ab7f64c2e99890502aa456798b74523e76f3c17 | |
---|---|---|
author | Dick Carter <dcarter@nvidia.com> | Fri Mar 18 16:55:07 2022 -0700 |
committer | GitHub <noreply@github.com> | Fri Mar 18 16:55:07 2022 -0700 |
tree | b69cc76501b2155ada2970cc15e2564de9799e53 | |
parent | 95a6a54adf8b9dfbabe8d558109896e0b88fb9e4 | |
[2.0] [BACKPORT] of [1.x][FEATURE] CUDA graphs support (#19142) (#20324)

* [1.x][FEATURE] CUDA graphs support (#19142)
* Initial cherry-pick
* Store NodeAttrs in OpExecutor
* Do not allow stateful operations in CUDA graphs and provide mechanism for marking ops as safe
* Guard against using ops with synchronization
* Cleaning
* Properly guard graphs
* Limit graphs to CUDA 10.2+
* Fix the compilation when graphs are not available
* Guarding the libcuda.so usage behind RTC compilation flag
* Document the env variables
* Add test
* Fix the test
* Use with_environment
* Fix compile and test_cuda_graphs
* Fix lint
* Mark more ops as not CUDA Graphs compatible
* Mark some linalg ops as not CUDA Graphs compatible
* Marked 2 ops CUDA Graphs incompatible due to cpu->gpu copy
* Mark cuDNN Dropout as fully CUDA Graphs compatible. Reenable tests.
* clang-tidy fixes
* More clang-tidy fixes
* Avoid CUDA_CALL(e): improper macro expansion
* Add compile guard to Dropout's FIsCUDAGraphsCompatible def
* Temporarily add '-s' to pytest serial tests
* Fix DropoutOp.dropout_passthrough_ handling for CUDA Graphs
* Adapt test_gluon_gpu.py::test_cuda_graphs for gluon2.0
* Create CUDA Graph 'dot' files if MXNET_CUDA_GRAPHS_DBG_FILE=<file_prefix>
* Fix clang-tidy
* Fix more clang-tidy
* Skip test_np_standard_binary_funcs test of 0-dim array broadcast
* Improve test_rnn_layers_fp{16,32} invocation
* Run test_rnn_layers_fp32 only when cuDNN is present
* Fix potential out-of-bounds write in count_sketch.cu
* Add temp output to debug centos crash
* Mark InstanceNorm and LeakyRELU as not CUDA Graphs compatible
* Ops calling FStatefulCompute* are not CUDA Graphs compatible by default
* Fix clang-tidy
* Revert "Add temp output to debug centos crash" (this reverts commit e013a85ea599fa761cb98762f11feab6e7d74049)
* Quiet 'unused variable' compilation warning
* Trigger CI
* Check of FCreateOpState removed given new check for FStatefulCompute*
* Revert "Temporarily add '-s' to pytest serial tests" (this reverts commit 5a2f847558a7f55790f1ad1fb5ee930b4ad1a3a9)

Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
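The commit message only names the MXNET_CUDA_GRAPHS_DBG_FILE environment variable explicitly. The sketch below shows how the feature might be exercised from Python against the 1.x-style Gluon API; the MXNET_ENABLE_CUDA_GRAPHS switch and the toy model are assumptions for illustration, not taken from this commit.

```python
# Hedged sketch: exercising CUDA graphs on a hybridized Gluon model.
# MXNET_ENABLE_CUDA_GRAPHS is an assumed enable switch; MXNET_CUDA_GRAPHS_DBG_FILE
# is the debug-dump variable named in the commit message.
import os
os.environ["MXNET_ENABLE_CUDA_GRAPHS"] = "1"             # assumption: feature toggle
os.environ["MXNET_CUDA_GRAPHS_DBG_FILE"] = "cuda_graph"  # dump graph 'dot' files with this prefix

import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(10)
net.initialize(ctx=mx.gpu(0))
# Static allocation and shapes give a fixed execution graph, which graph capture needs.
net.hybridize(static_alloc=True, static_shape=True)

x = mx.nd.ones((32, 64), ctx=mx.gpu(0))
y = net(x)        # first run builds (and may capture) the graph
y = net(x)        # later runs can replay the captured CUDA graph
mx.nd.waitall()
```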
Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scalable to many GPUs and machines.
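As a concrete illustration of that mix, a Gluon network can be run imperatively for easy debugging and then hybridized into a symbolic graph for speed. A minimal sketch follows; the layer sizes and input shape are illustrative.

```python
# Minimal sketch of MXNet's imperative/symbolic mix via Gluon's hybridize().
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(128, activation="relu"),
        nn.Dense(10))
net.initialize()

x = mx.nd.random.uniform(shape=(4, 32))
print(net(x).shape)   # imperative execution: easy to inspect and debug

net.hybridize()       # compile into a symbolic graph
print(net(x).shape)   # later calls run the optimized graph
```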
MXNet is more than a deep learning project. It is a community on a mission of democratizing AI. It is a collection of blueprints and guidelines for building deep learning systems, and interesting insights into DL systems for hackers.
Licensed under an Apache-2.0 license.
Branch | Build Status |
---|---|
master | |
v1.x | |
Channel | Purpose |
---|---|
Follow MXNet Development on Github | See what's going on in the MXNet project. |
MXNet Confluence Wiki for Developers | MXNet developer wiki for information related to project development, maintained by contributors and developers. To request write access, send a request to the dev list. |
dev@mxnet.apache.org mailing list | The “dev list”. Discussions about the development of MXNet. To subscribe, send an email to dev-subscribe@mxnet.apache.org. |
discuss.mxnet.io | Asking & answering MXNet usage questions. |
Apache Slack #mxnet Channel | Connect with MXNet and other Apache developers. To join the MXNet Slack channel, send a request to the dev list. |
Follow MXNet on Social Media | Get updates about new features and events. |
Keep connected with the latest MXNet news and updates.
MXNet emerged from a collaboration among the authors of cxxnet, minerva, and purine2. The project reflects what we have learned from those earlier projects, and it combines aspects of each to achieve flexibility, speed, and memory efficiency.
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.