docs/arch/external_library_dispatch.rst - tvm - Git at Google

 ..  Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

 ..    http://www.apache.org/licenses/LICENSE-2.0

 ..  Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 .. _external-library-dispatch:

 External Library Dispatch (BYOC)
 ================================

 When deploying models, certain operator patterns (e.g., matmul + bias + relu) can be executed
 more efficiently by vendor-optimized libraries such as cuBLAS, CUTLASS, cuDNN, or DNNL. TVM's
 **BYOC (Bring Your Own Codegen)** mechanism identifies these patterns in a Relax module and
 offloads them to external backends, while keeping the rest of the computation on TVM's own
 generated kernels.

 This document explains the BYOC pipeline: how patterns are registered, how subgraphs are
 matched and extracted, how backend code generators are invoked, and how the externally compiled
 code is executed at runtime.


 Overview
 --------

 The BYOC pipeline consists of four stages:

 .. code-block:: text

    IRModule (high-level Relax IR)
         │
         ▼  FuseOpsByPattern              ← match high-level ops, create composite functions
    IRModule (with Composite + Codegen attributes)
         │
         ▼  RunCodegen                    ← invoke backend codegen via FFI
    IRModule (with call_dps_packed to ExternFunc)
    + external runtime Modules
         │
         ▼  LegalizeOps + FuseOps + ...   ← compile remaining ops normally
         │
         ▼  VM compilation                ← link external modules into executable
    Deployable artifact

 Each stage is a Relax transformation pass that operates on the ``IRModule``:

 1. **FuseOpsByPattern** — matches operator subgraphs against registered patterns and groups them
    into composite functions annotated with ``Composite`` and ``Codegen`` attributes.
 2. **MergeCompositeFunctions** (optional) — merges multiple composite functions targeting the same
    backend when inter-operator dependencies allow.
 3. **RunCodegen** — finds all functions with a ``Codegen`` attribute, invokes the corresponding
    backend code generator via FFI, and replaces the original calls with ``call_dps_packed``
    to externally compiled functions.
 4. **Linking** — the resulting external ``runtime.Module``\ s are attached to the ``IRModule``
    as the ``external_mods`` attribute and bundled into the final executable during
    ``relax.build()``.


 Pattern Registration
 --------------------

 Each backend registers the operator patterns it supports in a **global pattern registry**
 (``python/tvm/relax/backend/pattern_registry.py``). The registry is a static table that maps
 pattern names to ``FusionPattern`` objects.

 Registering patterns
 ~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

    from tvm.relax.backend.pattern_registry import register_patterns
    from tvm.relax.backend.patterns import make_matmul_pattern

    register_patterns([
        (
            "cublas.matmul",              # pattern name (prefix = backend)
            *make_matmul_pattern(          # returns (DFPattern, annotation_patterns)
                with_bias=False,
            ),
            _check_matmul,                # check function
        ),
        (
            "cublas.matmul_bias_relu",
            *make_matmul_pattern(
                with_bias=True,
                activation="relax.nn.relu",
            ),
            _check_matmul,
        ),
        # ... more patterns
    ])

 Each entry is a tuple of ``(name, pattern, annotation_patterns, check_func)`` that gets
 converted to a ``FusionPattern`` object. The name prefix (e.g., ``"cublas"``) identifies the
 backend; ``get_patterns_with_prefix("cublas")`` retrieves all patterns for that backend.

 Patterns registered later have **higher priority** — when a subgraph matches multiple patterns,
 the highest-priority match wins.

 Pattern templates
 ~~~~~~~~~~~~~~~~~

 ``python/tvm/relax/backend/patterns.py`` provides reusable templates for common patterns:

 - ``make_matmul_pattern(with_bias, activation, transposed_rhs)`` — matmul with optional bias
   and activation fusion
 - ``make_conv2d_pattern(with_bias, activation)`` — 2D convolution
 - ``make_attention_pattern()`` — multi-head attention
 - ``make_residual_block_pattern()`` — residual connections
 - ``make_layer_norm_pattern()`` / ``make_rms_norm_pattern()`` — normalization layers

 Each template returns ``(DFPattern, Mapping[str, DFPattern])`` — the main pattern and its
 annotation sub-patterns.

 Check functions
 ~~~~~~~~~~~~~~~

 The check function validates whether a matched subgraph can actually be handled by the backend.
 It receives a ``PatternCheckContext`` and returns ``True`` to accept or ``False`` to reject.

 Typical checks include:

 - **Data type support**: verify the operand dtypes are supported (e.g., cuBLAS supports
   float16, float32, int8, bfloat16, float8 for matmul).
 - **Shape constraints**: verify reduction axes are constant, batch dimensions are compatible.
 - **Leaking intermediates**: reject if an intermediate result is used outside the fused group
   (via ``has_leaking_intermediate_variables()``).


 Partitioning
 ------------

 After patterns are registered, a backend provides a **partition function** that applies
 ``FuseOpsByPattern`` to an ``IRModule``:

 .. code-block:: python

    # python/tvm/relax/backend/cuda/cublas.py
    def partition_for_cublas(mod, bind_constants=False):
        patterns = get_patterns_with_prefix("cublas")
        return transform.FuseOpsByPattern(
            patterns, bind_constants=bind_constants, annotate_codegen=True
        )(mod)

 With ``annotate_codegen=True``, each matched subgraph is wrapped in a two-level function
 structure:

 .. code-block:: text

    # Outer function — tagged for the codegen backend
    @R.function
    def fused_relax_matmul_cublas0(args...):
        R.func_attr({"Codegen": "cublas", "global_symbol": "fused_relax_matmul_cublas0"})
        ...
            # Inner function — identifies the specific pattern
            @R.function(private=True)
            def composite(args...):
                R.func_attr({"Composite": "cublas.matmul_bias_relu"})
                lv0 = R.matmul(x, w)
                lv1 = R.add(lv0, bias)
                lv2 = R.nn.relu(lv1)
                return lv2
        ...

 The outer function carries the ``Codegen`` attribute that ``RunCodegen`` uses to dispatch to the
 right backend. The inner function carries the ``Composite`` attribute that the backend codegen
 uses to identify which operation to emit.

 MergeCompositeFunctions
 ~~~~~~~~~~~~~~~~~~~~~~~

 When ``annotate_codegen=False``, ``FuseOpsByPattern`` only creates inner functions with
 ``Composite`` attributes. A separate ``MergeCompositeFunctions`` pass then groups multiple
 composite functions targeting the same backend into a single outer function with ``Codegen``
 and ``global_symbol`` attributes.

 This is useful when multiple sequential operations should be sent to the same backend as a
 single unit (e.g., a sequence of cuBLAS matmuls that share intermediate results). The pass
 checks that merging does not create cyclic dependencies between groups.


 Code Generation
 ---------------

 ``RunCodegen`` (``src/relax/transform/run_codegen.cc``) is the pass that triggers backend
 code generation:

 1. Scan the module for all functions with a ``Codegen`` attribute.
 2. Group them by backend target name.
 3. For each backend, look up the registered codegen function via FFI key
    ``"relax.ext.<backend>"`` (e.g., ``"relax.ext.cublas"``).
 4. Call the codegen function, which returns an array of compiled ``runtime.Module``\ s.
 5. Replace the original function calls with ``call_dps_packed(ExternFunc(...), args)``.
 6. Attach the compiled modules to the ``IRModule`` as the ``external_mods`` attribute.

 Codegen registration
 ~~~~~~~~~~~~~~~~~~~~

 Each backend registers a codegen function via TVM's FFI mechanism:

 .. code-block:: cpp

    // src/relax/backend/contrib/cublas/codegen.cc
    ffi::Array<ffi::Module> CublasCompiler(
        ffi::Array<Function> functions,
        ffi::Map<ffi::String, ffi::Any> options,
        ffi::Map<Constant, ffi::String> constant_names) {
      ffi::Array<ffi::Module> compiled_functions;
      for (const auto& func : functions) {
        CublasJSONSerializer serializer(constant_names, AnalyzeVar2Value(func));
        serializer.serialize(func);
        auto graph_json = serializer.GetJSON();
        auto names = serializer.GetConstantNames();
        const auto pf = ffi::Function::GetGlobalRequired("runtime.CublasJSONRuntimeCreate");
        compiled_functions.push_back(
            pf(GetExtSymbol(func), graph_json, names).cast<ffi::Module>());
      }
      return compiled_functions;
    }

    TVM_FFI_STATIC_INIT_BLOCK() {
      namespace refl = tvm::ffi::reflection;
      refl::GlobalDef().def("relax.ext.cublas", CublasCompiler);
    }

 The codegen function receives:

 - ``functions``: the Relax functions with ``Codegen`` attribute to compile.
 - ``options``: backend-specific compilation options.
 - ``constant_names``: mapping from constant values to their names (for weight handling).

 It returns an array of ``runtime.Module`` objects — one per function — that contain the
 externally compiled code.

 Codegen strategies
 ~~~~~~~~~~~~~~~~~~

 TVM provides two base classes for implementing backend codegens:

 - **JSONSerializer** (``src/relax/backend/contrib/codegen_json/codegen_json.h``): serializes the
   composite function into a JSON graph representation. At runtime, a backend-specific JSON
   runtime module interprets the graph and dispatches to library calls. Used by cuBLAS, cuDNN,
   and most backends.

 - **CSourceCodegen** (``src/relax/backend/contrib/codegen_c/codegen_c.h``): generates C/CUDA
   source code that is compiled and linked. Used when the backend requires ahead-of-time
   compilation.


 Runtime Execution
 -----------------

 After ``RunCodegen``, the original high-level function calls are replaced with:

 .. code-block:: python

    R.call_dps_packed(ExternFunc("fused_relax_matmul_cublas0"), (x, w, bias), ...)

 At runtime, ``call_dps_packed`` invokes the externally compiled function through the
 ``PackedFunc`` interface. The external ``runtime.Module``\ s (produced by the codegen) are
 imported into the final executable during ``relax.build()`` and are available via the module's
 function lookup mechanism.

 For JSON-based backends (cuBLAS, cuDNN), the runtime module deserializes the JSON graph and
 dispatches each node to the corresponding library API call. For source-based backends, the
 compiled native code is called directly.


 Adding a New Backend
 --------------------

 To add support for a new external library:

 1. **Define patterns** in ``python/tvm/relax/backend/<target>/``:

    - Create DFPatterns using templates from ``patterns.py`` or custom patterns.
    - Write check functions to validate dtypes, shapes, and other constraints.
    - Register patterns with ``register_patterns()``.
    - Provide a ``partition_for_<backend>(mod)`` convenience function.

 2. **Implement codegen** in ``src/relax/backend/contrib/<target>/``:

    - Subclass ``JSONSerializer`` or ``CSourceCodegen``.
    - Implement the visitor that converts composite functions to the target format.
    - Register the codegen function as ``"relax.ext.<target>"``.

 3. **Implement runtime** (for JSON-based backends):

    - Create a JSON runtime module that interprets the serialized graph and dispatches
      to the library's API calls.
    - Register the runtime constructor as ``"runtime.<Target>JSONRuntimeCreate"``.


 Supported Backends
 ------------------

 .. list-table::
    :header-rows: 1
    :widths: 15 25 60

    * - Backend
      - Patterns
      - Operations
    * - cuBLAS
      - ``cublas.*``
      - Matmul (with bias, activation, transpose, dequantize variants)
    * - CUTLASS
      - ``cutlass.*``
      - Matmul, conv2d, attention, residual blocks, decode matmul
    * - cuDNN
      - ``cudnn.*``
      - Conv2d (NHWC/NCHW), stacked attention
    * - DNNL
      - ``dnnl.*``
      - Matmul, conv2d (x86 CPU). Codegen exists at C++ level; patterns are
        defined in tests rather than pre-registered.


 Source Code Map
 ---------------

 .. list-table::
    :header-rows: 1
    :widths: 50 50

    * - Path
      - Contents
    * - ``python/tvm/relax/backend/pattern_registry.py``
      - Pattern registry API (register_patterns, get_patterns_with_prefix)
    * - ``python/tvm/relax/backend/patterns.py``
      - Reusable pattern templates (make_matmul_pattern, etc.)
    * - ``python/tvm/relax/backend/cuda/cublas.py``
      - cuBLAS patterns and partition_for_cublas
    * - ``python/tvm/relax/backend/cuda/cutlass.py``
      - CUTLASS patterns and partition_for_cutlass
    * - ``python/tvm/relax/backend/cuda/cudnn.py``
      - cuDNN patterns and partition_for_cudnn
    * - ``src/relax/backend/pattern_registry.cc``
      - Pattern registry C++ implementation
    * - ``src/relax/transform/run_codegen.cc``
      - RunCodegen pass (CodeGenRunner)
    * - ``src/relax/transform/merge_composite_functions.cc``
      - MergeCompositeFunctions pass
    * - ``src/relax/backend/contrib/cublas/codegen.cc``
      - cuBLAS codegen (JSONSerializer-based)
    * - ``src/relax/backend/contrib/cutlass/codegen.cc``
      - CUTLASS codegen
    * - ``src/relax/backend/contrib/codegen_json/codegen_json.h``
      - JSONSerializer base class
    * - ``src/relax/backend/contrib/codegen_c/codegen_c.h``
      - CSourceCodegen base class
	.. Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	.. http://www.apache.org/licenses/LICENSE-2.0

	.. Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.

	.. _external-library-dispatch:

	External Library Dispatch (BYOC)
	================================

	When deploying models, certain operator patterns (e.g., matmul + bias + relu) can be executed
	more efficiently by vendor-optimized libraries such as cuBLAS, CUTLASS, cuDNN, or DNNL. TVM's
	BYOC (Bring Your Own Codegen) mechanism identifies these patterns in a Relax module and
	offloads them to external backends, while keeping the rest of the computation on TVM's own
	generated kernels.

	This document explains the BYOC pipeline: how patterns are registered, how subgraphs are
	matched and extracted, how backend code generators are invoked, and how the externally compiled
	code is executed at runtime.


	Overview
	--------

	The BYOC pipeline consists of four stages:

	.. code-block:: text

	IRModule (high-level Relax IR)
	│
	▼ FuseOpsByPattern ← match high-level ops, create composite functions
	IRModule (with Composite + Codegen attributes)
	│
	▼ RunCodegen ← invoke backend codegen via FFI
	IRModule (with call_dps_packed to ExternFunc)
	+ external runtime Modules
	│
	▼ LegalizeOps + FuseOps + ... ← compile remaining ops normally
	│
	▼ VM compilation ← link external modules into executable
	Deployable artifact

	Each stage is a Relax transformation pass that operates on the ``IRModule``:

	1. FuseOpsByPattern — matches operator subgraphs against registered patterns and groups them
	into composite functions annotated with ``Composite`` and ``Codegen`` attributes.
	2. MergeCompositeFunctions (optional) — merges multiple composite functions targeting the same
	backend when inter-operator dependencies allow.
	3. RunCodegen — finds all functions with a ``Codegen`` attribute, invokes the corresponding
	backend code generator via FFI, and replaces the original calls with ``call_dps_packed``
	to externally compiled functions.
	4. Linking — the resulting external ``runtime.Module``\ s are attached to the ``IRModule``
	as the ``external_mods`` attribute and bundled into the final executable during
	``relax.build()``.


	Pattern Registration
	--------------------

	Each backend registers the operator patterns it supports in a global pattern registry
	(``python/tvm/relax/backend/pattern_registry.py``). The registry is a static table that maps
	pattern names to ``FusionPattern`` objects.

	Registering patterns
	~~~~~~~~~~~~~~~~~~~~

	.. code-block:: python

	from tvm.relax.backend.pattern_registry import register_patterns
	from tvm.relax.backend.patterns import make_matmul_pattern

	register_patterns([
	(
	"cublas.matmul", # pattern name (prefix = backend)
	*make_matmul_pattern( # returns (DFPattern, annotation_patterns)
	with_bias=False,
	),
	_check_matmul, # check function
	),
	(
	"cublas.matmul_bias_relu",
	*make_matmul_pattern(
	with_bias=True,
	activation="relax.nn.relu",
	),
	_check_matmul,
	),
	# ... more patterns
	])

	Each entry is a tuple of ``(name, pattern, annotation_patterns, check_func)`` that gets
	converted to a ``FusionPattern`` object. The name prefix (e.g., ``"cublas"``) identifies the
	backend; ``get_patterns_with_prefix("cublas")`` retrieves all patterns for that backend.

	Patterns registered later have higher priority — when a subgraph matches multiple patterns,
	the highest-priority match wins.

	Pattern templates
	~~~~~~~~~~~~~~~~~

	``python/tvm/relax/backend/patterns.py`` provides reusable templates for common patterns:

	- ``make_matmul_pattern(with_bias, activation, transposed_rhs)`` — matmul with optional bias
	and activation fusion
	- ``make_conv2d_pattern(with_bias, activation)`` — 2D convolution
	- ``make_attention_pattern()`` — multi-head attention
	- ``make_residual_block_pattern()`` — residual connections
	- ``make_layer_norm_pattern()`` / ``make_rms_norm_pattern()`` — normalization layers

	Each template returns ``(DFPattern, Mapping[str, DFPattern])`` — the main pattern and its
	annotation sub-patterns.

	Check functions
	~~~~~~~~~~~~~~~

	The check function validates whether a matched subgraph can actually be handled by the backend.
	It receives a ``PatternCheckContext`` and returns ``True`` to accept or ``False`` to reject.

	Typical checks include:

	- Data type support: verify the operand dtypes are supported (e.g., cuBLAS supports
	float16, float32, int8, bfloat16, float8 for matmul).
	- Shape constraints: verify reduction axes are constant, batch dimensions are compatible.
	- Leaking intermediates: reject if an intermediate result is used outside the fused group
	(via ``has_leaking_intermediate_variables()``).


	Partitioning
	------------

	After patterns are registered, a backend provides a partition function that applies
	``FuseOpsByPattern`` to an ``IRModule``:

	.. code-block:: python

	# python/tvm/relax/backend/cuda/cublas.py
	def partition_for_cublas(mod, bind_constants=False):
	patterns = get_patterns_with_prefix("cublas")
	return transform.FuseOpsByPattern(
	patterns, bind_constants=bind_constants, annotate_codegen=True
	)(mod)

	With ``annotate_codegen=True``, each matched subgraph is wrapped in a two-level function
	structure:

	.. code-block:: text

	# Outer function — tagged for the codegen backend
	@R.function
	def fused_relax_matmul_cublas0(args...):
	R.func_attr({"Codegen": "cublas", "global_symbol": "fused_relax_matmul_cublas0"})
	...
	# Inner function — identifies the specific pattern
	@R.function(private=True)
	def composite(args...):
	R.func_attr({"Composite": "cublas.matmul_bias_relu"})
	lv0 = R.matmul(x, w)
	lv1 = R.add(lv0, bias)
	lv2 = R.nn.relu(lv1)
	return lv2
	...

	The outer function carries the ``Codegen`` attribute that ``RunCodegen`` uses to dispatch to the
	right backend. The inner function carries the ``Composite`` attribute that the backend codegen
	uses to identify which operation to emit.

	MergeCompositeFunctions
	~~~~~~~~~~~~~~~~~~~~~~~

	When ``annotate_codegen=False``, ``FuseOpsByPattern`` only creates inner functions with
	``Composite`` attributes. A separate ``MergeCompositeFunctions`` pass then groups multiple
	composite functions targeting the same backend into a single outer function with ``Codegen``
	and ``global_symbol`` attributes.

	This is useful when multiple sequential operations should be sent to the same backend as a
	single unit (e.g., a sequence of cuBLAS matmuls that share intermediate results). The pass
	checks that merging does not create cyclic dependencies between groups.


	Code Generation
	---------------

	``RunCodegen`` (``src/relax/transform/run_codegen.cc``) is the pass that triggers backend
	code generation:

	1. Scan the module for all functions with a ``Codegen`` attribute.
	2. Group them by backend target name.
	3. For each backend, look up the registered codegen function via FFI key
	``"relax.ext.<backend>"`` (e.g., ``"relax.ext.cublas"``).
	4. Call the codegen function, which returns an array of compiled ``runtime.Module``\ s.
	5. Replace the original function calls with ``call_dps_packed(ExternFunc(...), args)``.
	6. Attach the compiled modules to the ``IRModule`` as the ``external_mods`` attribute.

	Codegen registration
	~~~~~~~~~~~~~~~~~~~~

	Each backend registers a codegen function via TVM's FFI mechanism:

	.. code-block:: cpp

	// src/relax/backend/contrib/cublas/codegen.cc
	ffi::Array<ffi::Module> CublasCompiler(
	ffi::Array<Function> functions,
	ffi::Map<ffi::String, ffi::Any> options,
	ffi::Map<Constant, ffi::String> constant_names) {
	ffi::Array<ffi::Module> compiled_functions;
	for (const auto& func : functions) {
	CublasJSONSerializer serializer(constant_names, AnalyzeVar2Value(func));
	serializer.serialize(func);
	auto graph_json = serializer.GetJSON();
	auto names = serializer.GetConstantNames();
	const auto pf = ffi::Function::GetGlobalRequired("runtime.CublasJSONRuntimeCreate");
	compiled_functions.push_back(
	pf(GetExtSymbol(func), graph_json, names).cast<ffi::Module>());
	}
	return compiled_functions;
	}

	TVM_FFI_STATIC_INIT_BLOCK() {
	namespace refl = tvm::ffi::reflection;
	refl::GlobalDef().def("relax.ext.cublas", CublasCompiler);
	}

	The codegen function receives:

	- ``functions``: the Relax functions with ``Codegen`` attribute to compile.
	- ``options``: backend-specific compilation options.
	- ``constant_names``: mapping from constant values to their names (for weight handling).

	It returns an array of ``runtime.Module`` objects — one per function — that contain the
	externally compiled code.

	Codegen strategies
	~~~~~~~~~~~~~~~~~~

	TVM provides two base classes for implementing backend codegens:

	- JSONSerializer (``src/relax/backend/contrib/codegen_json/codegen_json.h``): serializes the
	composite function into a JSON graph representation. At runtime, a backend-specific JSON
	runtime module interprets the graph and dispatches to library calls. Used by cuBLAS, cuDNN,
	and most backends.

	- CSourceCodegen (``src/relax/backend/contrib/codegen_c/codegen_c.h``): generates C/CUDA
	source code that is compiled and linked. Used when the backend requires ahead-of-time
	compilation.


	Runtime Execution
	-----------------

	After ``RunCodegen``, the original high-level function calls are replaced with:

	.. code-block:: python

	R.call_dps_packed(ExternFunc("fused_relax_matmul_cublas0"), (x, w, bias), ...)

	At runtime, ``call_dps_packed`` invokes the externally compiled function through the
	``PackedFunc`` interface. The external ``runtime.Module``\ s (produced by the codegen) are
	imported into the final executable during ``relax.build()`` and are available via the module's
	function lookup mechanism.

	For JSON-based backends (cuBLAS, cuDNN), the runtime module deserializes the JSON graph and
	dispatches each node to the corresponding library API call. For source-based backends, the
	compiled native code is called directly.


	Adding a New Backend
	--------------------

	To add support for a new external library:

	1. Define patterns in ``python/tvm/relax/backend/<target>/``:

	- Create DFPatterns using templates from ``patterns.py`` or custom patterns.
	- Write check functions to validate dtypes, shapes, and other constraints.
	- Register patterns with ``register_patterns()``.
	- Provide a ``partition_for_<backend>(mod)`` convenience function.

	2. Implement codegen in ``src/relax/backend/contrib/<target>/``:

	- Subclass ``JSONSerializer`` or ``CSourceCodegen``.
	- Implement the visitor that converts composite functions to the target format.
	- Register the codegen function as ``"relax.ext.<target>"``.

	3. Implement runtime (for JSON-based backends):

	- Create a JSON runtime module that interprets the serialized graph and dispatches
	to the library's API calls.
	- Register the runtime constructor as ``"runtime.<Target>JSONRuntimeCreate"``.


	Supported Backends
	------------------

	.. list-table::
	:header-rows: 1
	:widths: 15 25 60

	* - Backend
	- Patterns
	- Operations
	* - cuBLAS
	- ``cublas.*``
	- Matmul (with bias, activation, transpose, dequantize variants)
	* - CUTLASS
	- ``cutlass.*``
	- Matmul, conv2d, attention, residual blocks, decode matmul
	* - cuDNN
	- ``cudnn.*``
	- Conv2d (NHWC/NCHW), stacked attention
	* - DNNL
	- ``dnnl.*``
	- Matmul, conv2d (x86 CPU). Codegen exists at C++ level; patterns are
	defined in tests rather than pre-registered.


	Source Code Map
	---------------

	.. list-table::
	:header-rows: 1
	:widths: 50 50

	* - Path
	- Contents
	* - ``python/tvm/relax/backend/pattern_registry.py``
	- Pattern registry API (register_patterns, get_patterns_with_prefix)
	* - ``python/tvm/relax/backend/patterns.py``
	- Reusable pattern templates (make_matmul_pattern, etc.)
	* - ``python/tvm/relax/backend/cuda/cublas.py``
	- cuBLAS patterns and partition_for_cublas
	* - ``python/tvm/relax/backend/cuda/cutlass.py``
	- CUTLASS patterns and partition_for_cutlass
	* - ``python/tvm/relax/backend/cuda/cudnn.py``
	- cuDNN patterns and partition_for_cudnn
	* - ``src/relax/backend/pattern_registry.cc``
	- Pattern registry C++ implementation
	* - ``src/relax/transform/run_codegen.cc``
	- RunCodegen pass (CodeGenRunner)
	* - ``src/relax/transform/merge_composite_functions.cc``
	- MergeCompositeFunctions pass
	* - ``src/relax/backend/contrib/cublas/codegen.cc``
	- cuBLAS codegen (JSONSerializer-based)
	* - ``src/relax/backend/contrib/cutlass/codegen.cc``
	- CUTLASS codegen
	* - ``src/relax/backend/contrib/codegen_json/codegen_json.h``
	- JSONSerializer base class
	* - ``src/relax/backend/contrib/codegen_c/codegen_c.h``
	- CSourceCodegen base class