[2.0] [BACKPORT] of [1.x][FEATURE] CUDA graphs support (#19142) (#20324)

* [1.x][FEATURE] CUDA graphs support (#19142)

* Initial cherry-pick

* Store NodeAttrs in OpExecutor

* Do not allow stateful operations in CUDA graphs and provide mechanism
for marking ops as safe

* Guard against using ops with synchronization

* Cleaning

* Properly guard graphs

* Limit graphs to CUDA 10.2+

* Fix the compilation when graphs are not available

* Guarding the libcuda.so usage behind RTC compilation flag

* Document the env variables

* Add test

* Fix the test

* Use with_environment

* Fix compile and test_cuda_graphs

* Fix lint

* Mark more ops as not CUDA Graphs compatible

* Mark some linalg ops as not CUDA Graphs compatible

* Marked 2 ops CUDA Graphs incompatible due to cpu->gpu copy

* Mark cuDNN Dropout as fully CUDA Graphs compatible.  Reenable tests.

* clang-tidy fixes

* More clang-tidy fixes

* Avoid CUDA_CALL(e): improper macro expansion

* Add compile guard to Dropout's FIsCUDAGraphsCompatible def

* Temporarily add '-s' to pytest serial tests

* Fix DropoutOp.dropout_passthrough_ handling for CUDA Graphs

* Adapt test_gluon_gpu.py::test_cuda_graphs for gluon2.0

* Create CUDA Graph 'dot' files if MXNET_CUDA_GRAPHS_DBG_FILE=<file_prefix>

* Fix clang-tidy

* Fix more clang-tidy

* Skip test_np_standard_binary_funcs test of 0-dim array broadcast

* Improve test_rnn_layers_fp{16,32} invocation

* Run test_rnn_layers_fp32 only when cuDNN is present

* Fix potential out-of-bounds write in count_sketch.cu

* Add temp output to debug centos crash

* Mark InstanceNorm and LeakyRELU as not CUDA Graphs compatible

* Ops calling FStatefulCompute* are not CUDA Graphs compatible by default

* Fix clang-tidy

* Revert "Add temp output to debug centos crash"

This reverts commit e013a85ea599fa761cb98762f11feab6e7d74049.

* Quiet 'unused variable' compilation warning

* Trigger CI

* Check of FCreateOpState removed given new check for FStatefulCompute*

* Revert "Temporarily add '-s' to pytest serial tests"

This reverts commit 5a2f847558a7f55790f1ad1fb5ee930b4ad1a3a9.

Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
37 files changed
tree: b69cc76501b2155ada2970cc15e2564de9799e53
README.md

Apache MXNet (incubating) for Deep Learning

Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scalable to many GPUs and machines.
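
The dependency-scheduler idea described above can be illustrated with a small toy in plain Python. This is only a sketch under stated assumptions: `ToyScheduler`, `push`, `wait_all`, and `run_demo` are hypothetical names invented for this example and are not MXNet's engine API. Each operation declares which variables it reads and writes; the scheduler lets operations run on a thread pool as soon as the operations they depend on have finished, so independent operations may overlap.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class ToyScheduler:
    """Hypothetical toy scheduler for illustration; not MXNet's engine."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_write = {}  # variable name -> future of the last writer
        self._readers = {}     # variable name -> futures reading since that write

    def push(self, fn, reads=(), writes=()):
        """Queue fn; it runs once every conflicting earlier op is done."""
        deps = []
        for var in reads:
            # A read must wait for the last write to the same variable.
            if var in self._last_write:
                deps.append(self._last_write[var])
        for var in writes:
            # A write must wait for the last write and all reads since then.
            if var in self._last_write:
                deps.append(self._last_write[var])
            deps.extend(self._readers.get(var, []))

        def run():
            wait(deps)  # returns immediately when deps is empty
            return fn()

        future = self._pool.submit(run)
        for var in reads:
            self._readers.setdefault(var, []).append(future)
        for var in writes:
            self._last_write[var] = future
            self._readers[var] = []
        return future

    def wait_all(self):
        """Block until every queued operation has finished."""
        self._pool.shutdown(wait=True)


def run_demo():
    env = {}
    sched = ToyScheduler()

    def op(target, compute, reads=()):
        def body():
            env[target] = compute(*(env[name] for name in reads))
        return sched.push(body, reads=reads, writes=[target])

    op("a", lambda: 2)
    op("b", lambda: 3)                             # independent of "a"
    op("c", lambda a, b: a + b, reads=["a", "b"])  # waits for a and b
    op("d", lambda a: a * 10, reads=["a"])         # may overlap with c
    op("a", lambda: 100)                           # waits for c and d to read a
    op("e", lambda c, d: c + d, reads=["c", "d"])
    sched.wait_all()
    return env
```

Operations are pushed in program order, as in imperative code, but only the declared read and write sets constrain execution order. The second write to `a` is delayed until `c` and `d` have read the old value, which is what keeps the result identical to sequential execution.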

MXNet is more than a deep learning project. It is a community on a mission of democratizing AI. It is a collection of blueprints and guidelines for building deep learning systems, and of interesting insights into DL systems for hackers.

Licensed under an Apache-2.0 license.

Features

  • NumPy-like programming interface, integrated with the new, easy-to-use Gluon 2.0 interface. NumPy users can easily adopt MXNet and get started with deep learning.
  • Automatic hybridization provides imperative programming with the performance of traditional symbolic programming.
  • Lightweight, memory-efficient, and portable to smart devices through native cross-compilation support on ARM, and through ecosystem projects such as TVM, TensorRT, and OpenVINO.
  • Scales to multiple GPUs and distributed settings with automatic parallelism through ps-lite, Horovod, and BytePS.
  • Extensible backend that supports full customization, allowing integration with custom accelerator libraries and in-house hardware without the need to maintain a fork.
  • Support for Python, Java, C++, R, Scala, Clojure, Go, JavaScript, Perl, and Julia.
  • Cloud-friendly and directly compatible with AWS and Azure.
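
The hybridization feature listed above rests on a general idea: run imperative code once against placeholder values, record the operations as a graph, and replay that graph on real inputs afterwards. The toy below sketches that idea in plain Python. It is an illustration under assumptions, not Gluon's implementation: `Sym`, `hybridize`, and `model` are hypothetical names, only `+`, `-`, and `*` are traced, and the traced function is assumed to return a value derived from its inputs.

```python
import operator


def _add_node(graph, node):
    """Append a node to the graph and return a placeholder that refers to it."""
    graph.append(node)
    return Sym(graph, len(graph) - 1)


class Sym:
    """Hypothetical placeholder that records arithmetic instead of computing it."""

    def __init__(self, graph, node):
        self.graph = graph
        self.node = node  # index of this value's node in the graph

    def _binary(self, fn, other):
        if not isinstance(other, Sym):
            other = _add_node(self.graph, ("const", other))
        return _add_node(self.graph, ("op", fn, self.node, other.node))

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __sub__(self, other):
        return self._binary(operator.sub, other)

    def __mul__(self, other):
        return self._binary(operator.mul, other)


def hybridize(fn, num_inputs):
    """Trace fn once with placeholders; return a function that replays the graph."""
    graph = []
    inputs = [_add_node(graph, ("input", i)) for i in range(num_inputs)]
    output = fn(*inputs)  # runs the imperative code, recording every operation

    def compiled(*args):
        values = []
        for node in graph:
            if node[0] == "input":
                values.append(args[node[1]])
            elif node[0] == "const":
                values.append(node[1])
            else:
                _, op, lhs, rhs = node
                values.append(op(values[lhs], values[rhs]))
        return values[output.node]

    return compiled


def model(x, y):
    # Ordinary imperative code; nothing here mentions graphs.
    return (x + y) * x - 1


fast_model = hybridize(model, num_inputs=2)
```

Calling `fast_model(2, 3)` replays the recorded graph and gives the same value as `model(2, 3)`. In a real framework the recorded graph is what a graph optimization layer would work on; this toy only replays it unchanged.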

What's New

Ecosystem News

Stay Connected

Channel | Purpose
Follow MXNet Development on GitHub | See what's going on in the MXNet project.
MXNet Confluence Wiki for Developers | MXNet developer wiki for information related to project development, maintained by contributors and developers. To request write access, send a request to the dev list.
dev@mxnet.apache.org mailing list | The "dev list". Discussions about the development of MXNet. To subscribe, send an email to dev-subscribe@mxnet.apache.org.
discuss.mxnet.io | Asking and answering MXNet usage questions.
Apache Slack #mxnet Channel | Connect with MXNet and other Apache developers. To join the MXNet Slack channel, send a request to the dev list.
Follow MXNet on Social Media | Get updates about new features and events.

Social Media

Keep connected with the latest MXNet news and updates.

History

MXNet emerged from a collaboration by the authors of cxxnet, minerva, and purine2. The project reflects what we have learned from the past projects. MXNet combines aspects of each of these projects to achieve flexibility, speed, and memory efficiency.

Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.