  1. fffd168 [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas (#16982) by Rick Zhou · 17 hours ago main
  2. 4c1ebcf [Relax] Implement relax.op.view (#16955) by Eric Lunderberg · 22 hours ago
  3. c0a47ed [CUBLAS][FP8] Enable R.matmul + R.multiply offloading (#16974) by Ivan Sidorenko · 2 days ago nightly
  4. 02c4c55 [SVE] Add codegen support for `vscale_range()` function attribute (#16962) by Andrei Hutu · 2 days ago
  5. 819b002 [Relax] Support nested ModuleList in nn.Module (#16971) by Wuwei Lin · 3 days ago
  6. 28d32b5 [TIR] Support narrow dtype for let binding (#16947) by Siyuan Feng · 4 days ago
  7. 876f528 [LLVM] Stringref API deprecation fixes (#16968) by Anirudh Sundar Subramaniam · 4 days ago
  8. 9cfebca [TVMScript] Fix error reporting inside Macro func (#16967) by Siyuan Feng · 5 days ago
  9. 59ef0ee [Bugfix][ONNX] Improve broadcast and batch_matmul conversion (#16961) by XinhuaHamiMelon · 5 days ago
  10. 944d180 [SVE] Add get_active_lane_mask builtin (#16965) by Luke Hutton · 6 days ago
  11. effa5d7 [CUBLAS] Enable offloading of R.matmul + R.dequantize (#16896) by Ivan Sidorenko · 6 days ago
  12. 20d7696 [Relax] Express dynamic arguments of strided_slice as arguments (#16826) by Eric Lunderberg · 9 days ago
  13. a320b63 [Unity][Cutlass] Fix C source generation of dense operation (#16476) by Jinbae Park · 9 days ago
  14. 6252fa5 [TIR] Enhance CLZ intrinsic support (#16952) by Siyuan Feng · 10 days ago
  15. bc8742b [Misc] Add script for testing release package (#16956) by ysh329 · 10 days ago
  16. c8deb7f Overriding the StructuralEqual() for easy usage (#16908) by sdalvi-quic · 10 days ago
  17. 114ad70 [TOPI] Revert unification of conv2d NHWC hybrid scheduling for `arm_cpu` targets (#16951) by Andrei Hutu · 11 days ago
  18. b4a69de Enable gemv schedule for adreno (#16932) by krishnaraj36 · 11 days ago
  19. c0385c7 [Runtime] Allow offset to be specified in NDArray::CreateView (#16938) by Eric Lunderberg · 11 days ago
  20. dd09c85 [CI] Update image tag to 20240428-060115-0b09ed018 (#16948) by Yong Wu · 11 days ago
  21. 2d7663c [CI] Use LLVM17 for tests on `ci_cpu` (#16931) by Luke Hutton · 11 days ago
  22. e10cdc5 [tir][Compute-at] Make compute-ated block simple when the predicate could be merged (#16945) by wrongtest · 11 days ago
  23. b00fc55 [CI] Enable Conda setup v3 (#16942) by Tianqi Chen · 11 days ago
  24. 081c23b [Relax] Allow PrimValue as index in relax.op.take (#16940) by Eric Lunderberg · 12 days ago
  25. b54f57a [TFLite] Add support for GELU conversion (#16936) by Luke Hutton · 12 days ago
  26. 0b09ed0 [3rdparty] Bump FlashInfer for sampling functions (#16935) by Ruihang Lai · 12 days ago
  27. 63e0a0f [Thrust] Increase static workspace size (#16937) by Ruihang Lai · 12 days ago
  28. 3ff3daa [CI] Upgrade CUDA to 12.4 (#16939) by Yong Wu · 12 days ago
  29. 1453893 [CLML] Fix in clml pattern check condition (#16933) by krishnaraj36 · 13 days ago
  30. 97ff7cc [VM][OPENCL] Take advantage of OpenCL host ptr for improved copy (#16929) by Siva · 13 days ago
  31. 278a6af [Relax][TIR] Introduce new `cumsum` op for gpu (#16934) by Siyuan Feng · 14 days ago
  32. 5bd1047 [SCRIPT][ADRENO] Fix in build config for adreno (#16927) by krishnaraj36 · 14 days ago
  33. 51cfb70 [Fix][Dlight] Fix GeneralReduction for log-sum-exp (#16923) by Ruihang Lai · 2 weeks ago
  34. 39f2482 [Fix] Fix SSA conversion for SizeVar retention (#16924) by Ruihang Lai · 2 weeks ago
  35. 4f8c03f [TVMScript] Support `T.launch_thread` with i64 dtype (#16916) by Siyuan Feng · 2 weeks ago
  36. 5cf4ca6 [Marvell BYOC]: Marvell AI Accelerator Integration - Phase 2 (#16915) by Krishna Bindumadhavan · 2 weeks ago
  37. 2f395f1 [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu` (#16899) by Andrei Hutu · 2 weeks ago
  38. 11f2253 Restore "pytest.mark.gpu" for RELAX tests (#16741) by apeskov · 2 weeks ago
  39. 342f472 [Disco] Improve error message for CallPacked (#16919) by Wuwei Lin · 2 weeks ago
  40. b0143d1 [CMAKE] Make LOG_BEFORE_THROW explicit (#16914) by Tianqi Chen · 3 weeks ago
  41. 29534b7 [SVE] Check for SVE target in VectorizeLoop (#16893) by Elen Kalda · 3 weeks ago
  42. 57316da [Web] Support string[] in setPackedFunc() and exceptionally long arrays (#16910) by Charlie Ruan · 3 weeks ago
  43. 6b77cba [Misc] Enhance Release Note Script and Remove Useless File (#16913) by ysh329 · 3 weeks ago
  44. a2511cc [QoL][Relax] Use SeqExpr in IR types when SeqExpr is required (#16859) by Eric Lunderberg · 3 weeks ago
  45. 2978427 [Relax] Prevent to generate duplicate func in dispatch_sort_scan (#16904) by Siyuan Feng · 3 weeks ago
  46. 6afbc12 [Bugfix][Relax] Raise exception for OOM allocation (#16905) by Eric Lunderberg · 3 weeks ago
  47. 36efa36 [Upd] Fixed lld search in rocm (#16907) by Shrey Gupta · 3 weeks ago
  48. 622bd15 [Relax] Handle binary operations between Tensor and PrimValue (#16827) by Eric Lunderberg · 3 weeks ago
  49. fe52709 [CMAKE] Misc improvment of Util (#16900) by Tianqi Chen · 3 weeks ago
  50. 59376ee [Relax] Allow specifying entry_funcs for BYOC (#16902) by Wuwei Lin · 3 weeks ago
  51. 7dc0472 [Bugfix] CudaDeviceAPI::GetAttr may check kExist when GPUs absent (#16903) by Eric Lunderberg · 3 weeks ago
  52. de91c5c [Bugfix] rocm shared memory issue on MI250 (#16901) by Lesheng Jin · 3 weeks ago
  53. da56c89 [Dlight] Enhance vectorization for gpu matmul (#16894) by Wuwei Lin · 3 weeks ago
  54. b3ffd97 [BYOC] Add layout check and update shape check for cublas FP8 BYOC (#16895) by Wuwei Lin · 3 weeks ago
  55. 857fe61 [Target] Don't register AArch64 target tags without LLVM compiler support (#16897) by Luke Hutton · 3 weeks ago
  56. d030ce2 [TVMScript] Optionally use `ruff format` instead of `black` (#16876) by Eric Lunderberg · 3 weeks ago
  57. 460f6f1 [QoL][Relax] Infer StructInfo for relax::Tuple on construction (#16860) by Eric Lunderberg · 3 weeks ago
  58. 94a44d7 [QoL][Relax] Return well-formed IR from relax::Function::CreateEmpty (#16861) by Eric Lunderberg · 3 weeks ago
  59. 4cb4605 [TVMScript][Bug] Add test case for missing symbolic bounds (#16877) by Eric Lunderberg · 3 weeks ago
  60. 08965f0 [CUBLAS] Set fp32 compute and scale dtypes in fp16 matmul (#16892) by Ivan Sidorenko · 3 weeks ago
  61. 3680a0d [RUNTIME][VULKAN] Support total_global_memory (#16890) by Tianqi Chen · 3 weeks ago
  62. d1ac73c [CUBLAS][FP8] Support e4m3 gemm in cuBLAS BYOC (#16888) by Ivan Sidorenko · 3 weeks ago
  63. 95d6778 [dlight] Add check for matmul dtype and fix reduction rule (#16884) by Wuwei Lin · 3 weeks ago
  64. e738f1d [Relax][Frontend] Fix sort, argsort and topk in nn module (#16886) by Siyuan Feng · 3 weeks ago
  65. cdfdd0e [Contrib] Enable fp16 for thrust sort (#16887) by Siyuan Feng · 3 weeks ago
  66. d4056ca [SVE] Support splitting by vscale in `tir::split` and `te::split` (#16862) by Luke Hutton · 4 weeks ago
  67. f267691 [Relax] Stabilize relax pass mutation order (#16883) by Siyuan Feng · 4 weeks ago
  68. a64d1f1 [TIR] Make T.reinterpret nop when dtype is the same (#16879) by Wuwei Lin · 4 weeks ago
  69. 64911ab [Runtime] Implemented Datatype.itemsize() (#16880) by Wuwei Lin · 4 weeks ago
  70. d0cbb02 [release] Update version to 0.17.dev0 on main branch by Star Yuan · 4 weeks ago v0.17.dev0
  71. 6496903 [release] Update version to 0.16.0 on main branch by Star Yuan · 4 weeks ago v0.16.0 v0.16.0.rc0
  72. 5c80691 [Dlight] Enhance vectorization loading weight for gemv (#16878) by Wuwei Lin · 4 weeks ago
  73. 0a3fe22 [Relax] Enhance symbolic expr estimation in memory planning (#16872) by Ruihang Lai · 4 weeks ago
  74. 3f09e7f [Thrust] Fix thrust workspace allocation (#16873) by Wuwei Lin · 4 weeks ago
  75. 88a1c65 [3rdparty] Bump flashinfer (#16868) by Wuwei Lin · 4 weeks ago
  76. 0aae97d [PageKV] allow PopN to pop all the tokens in last block (#16871) by ZCHNO · 4 weeks ago
  77. 4b90655 [OpenCL] Add OpenCL device for automatic target detection (#16854) by Mengshiun Yu · 4 weeks ago
  78. c67a055 [BugFix][Target] Added null check to fix segfault at ->defined() in cpu.cc DetectSystemTriple() (#16766) by Otto Rasmussen · 4 weeks ago
  79. f9e36fc [3rdparty] Bump FlashInfer (#16866) by Ruihang Lai · 4 weeks ago
  80. 4617efa [Relax] Dispatch sort/scan for non-cuda gpu backends (#16867) by Wuwei Lin · 4 weeks ago
  81. 6748215 [Codegen, CUDA] Add handling of fp8 broadcast / const (#16865) by Wuwei Lin · 4 weeks ago
  82. 2829b59 [TVMScript] Add parser and printer support for e4m3/e5m2 fp8 (#16864) by Wuwei Lin · 4 weeks ago
  83. a482b4c [Picojson] Let the key of objects in json be ordered by default (#16863) by Yixin Dong · 4 weeks ago
  84. 95cb0de [VULKAN] Fix CLZ support for Vulkan (#16858) by Siyuan Feng · 4 weeks ago
  85. 4d4f050 [SVE] Support scalable vectors in LoopVectorizer (#16782) by Elen Kalda · 4 weeks ago
  86. a309b6b [Thrust] Use pointer to tls pool to prevent creating new pool (#16856) by Wuwei Lin · 4 weeks ago
  87. 0594994 [ONNX] Fix interpreting auto_pad parameters in ConvTranspose operator (#16001) by padreofthegame · 4 weeks ago
  88. d1e24ca [Web] Support web indexDB cache for larger model storage (#16733) by Hangrui Cao · 5 weeks ago
  89. 81a8506 [TIR] Use constructor for new PrimFunc in TransformLayout (#16832) by Eric Lunderberg · 5 weeks ago
  90. 97d7a35 Fixing probability comment (#16850) by Thais Camacho · 5 weeks ago
  91. a7be540 [KVCache] Initialize one extra page than specified (#16849) by Ruihang Lai · 5 weeks ago
  92. a156181 [Relax] Fix EliminiateCommonSubexpr removing alloc tensor (#16852) by Wuwei Lin · 5 weeks ago
  93. 3e802d1 [Relax,Topi] Allow passing workspace to thrust to avoid allocations (#16851) by Wuwei Lin · 5 weeks ago
  94. 9b5a7a4 [IR] Provide well-formed intermediate in ApplyPassToFunction (#16843) by Eric Lunderberg · 5 weeks ago
  95. ee3f7bc [MSC][M5.3] Support torch.dynamo for dynamic models (#16772) by Archermmt · 5 weeks ago
  96. b91d4e5 [TVMScript] Produce empty DictAttrs when R.func_attrs is absent (#16844) by Eric Lunderberg · 5 weeks ago
  97. b01de08 [DLight] Fix a corner case for reduction rule (#16848) by Siyuan Feng · 5 weeks ago
  98. ab94ca3 [CI] Disable flaky unit test (#16837) by Eric Lunderberg · 5 weeks ago
  99. c93f0ba [Meta-Schedule][OpenCL] Enable MS tuning for Android OpenCL (#16846) by Egor Churaev · 5 weeks ago
  100. cd08356 [TIR] Fix segfaults from ordering of Let/Assert in MakePackedAPI (#16543) by Eric Lunderberg · 5 weeks ago