Documentation | Contributors | Community | Release Notes
Apache TVM is a compiler stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.
TVM is licensed under the Apache-2.0 license.
Check out the TVM Documentation site for installation instructions, tutorials, examples, and more. The Getting Started with TVM tutorial is a great place to start.
TVM adopts the Apache committer model; we aim to create an open-source project that is maintained and owned by the community. Check out the Contributor Guide.
We learned a lot from the following projects when building TVM.