| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| |
| .. _tvm-target-specific-overview: |
| |
| Device/Target Interactions |
| ========================== |
| |
| This document is intended for developers interested in understanding |
| how the TVM framework interacts with specific device APIs, or who |
| may want to implement support for a new API or new hardware. |
| |
| There are three main aspects that must be implemented for any new |
| runtime environment. |
| |
| * The :ref:`DeviceAPI <tvm-target-specific-device-api>` class gives a |
| handle to a specific device, and the API used to interact with it. |
| It defines a common interface for querying device parameters |
| (e.g. memory available, number of threads, etc.) and for performing |
| simple actions (e.g. copying memory from the host, or between |
| buffers on the device). |
| |
| * The :ref:`Target <tvm-target-specific-target>` class contains a |
| description of the device on which a function will run. It is |
| exposed both to the target code generators and to the optimization |
| passes. |
| |
| * The :ref:`target code generators <tvm-target-specific-codegen>` |
| construct a :ref:`Module <tvm-runtime-system-module>`, consisting of |
| one or more :ref:`PackedFunc <tvm-runtime-system-packed-func>` |
| objects, from an ``IRModule``. |
| |
| .. _tvm-target-specific-device-api: |
| |
| DeviceAPI |
| --------- |
| |
| The ``DeviceAPI`` represents a handle to a specific hardware device |
| API. (e.g. ``CUDADeviceAPI`` handles all interactions through the |
| CUDA framework.) Most ``DeviceAPI`` methods accept a ``device_id`` |
| parameter to specify which device should be accessed. In Python, |
| these are typically accessed using the :py:func:`tvm.runtime.device` |
| function, which returns a handle to a specific device, accessed |
| through a specific API. (e.g. ``tvm.runtime.device('cuda', 0)`` gives |
| access to physical device ``0``, accessed through the CUDA API.) |
| |
| .. _device_api.h: https://github.com/apache/tvm/blob/main/include/tvm/runtime/device_api.h |
| |
| * Attribute queries - ``GetAttr`` allows different |
| device-specific parameters to be queried, such as the device name, |
| number of threads, etc. The parameters that can be queried are |
| defined in ``enum DeviceAttrKind`` in `device_api.h`_. Not all |
| query-able parameters are supported by all devices. If a parameter |
| cannot be queried (e.g. ``kMaxClockRate`` on Vulkan), or if a |
| parameter isn't applicable (e.g. ``kWarpSize`` on CPU), then those |
| queries should return ``nullptr``. |
| |
| * Setting active device - ``SetDevice`` should set a |
| particular device as being active. If a ``PackedFunc`` generated by |
| the target-specific code generator requires execution on a device, it |
| should run on the active device. |
| |
| * Memory management - Utilities for allocating and deallocating memory |
| on the device. |
| |
| * Allocate data space - ``AllocDataSpace`` and ``FreeDataSpace`` |
| allocate and free space on the device. These allocations can be |
| provided as inputs and outputs to an operator and make up the |
| primary data flow of the operator graph. It must be possible to |
| transfer data between the host and a data space. The return |
| value is an opaque ``void*``. While some implementations return a |
| memory address, this is not required, and the ``void*`` may be an |
| opaque handle that is interpretable only by the device backend |
| that generated it. The ``void*`` is used as an argument to other |
| backend-specific functions, such as ``CopyDataFromTo``. |
| |
| * Allocate work space - ``AllocWorkspace`` and ``FreeWorkspace`` |
| allocate and free space on the device. Unlike data space, these |
| are used for storage of intermediate values within an operator |
| definition, and are not required to be transferable to/from the |
| host device. If a ``DeviceAPI`` subclass does not implement these |
| methods, they will default to calling the corresponding |
| ``DataSpace`` functions. |
| |
| * Copy data - ``CopyDataFromTo`` should copy data from one location |
| to another. The type of copy is determined by the ``dev_from`` |
| and ``dev_to`` parameters. Implementations should support copying |
| memory from CPU to device, from device to CPU, and from one buffer |
| to another on a single device. If the source or destination |
| locations are on the CPU, the corresponding ``void*`` points to a |
| CPU address that can be passed into ``memcpy``. If the source or |
| destination locations are on the device, the corresponding |
| ``void*`` was previously generated by either ``AllocDataSpace`` or |
| ``AllocWorkspace``. |
| |
| These copies are queued to execute on a specific |
| ``TVMStreamHandle``. However, implementations should not assume |
| that CPU buffers remain valid or accessible after the call to |
| ``CopyDataFromTo`` completes, so any reads from or writes to a |
| CPU-side buffer must be finished before the call returns. |
| |
| |
| * Execution stream management - Utilities for handling |
| ``TVMStreamHandle``, which represents parallel streams of execution |
| used to execute commands. |
| |
| * Create stream - ``CreateStream`` and ``FreeStream`` should |
| allocate/free a handle to a stream of execution. If a device |
| implements only a single queue of commands, then ``CreateStream`` |
| should return ``nullptr``. |
| |
| * Set active stream - ``SetStream`` should set a stream as being |
| active. While active, if a ``PackedFunc`` generated by the |
| target-specific code generator requires execution on a device, the work |
| should be submitted to the active stream. |
| |
| * Synchronize to CPU - ``StreamSync`` should synchronize a stream of |
| execution to the CPU. The call to ``StreamSync`` should return |
| once all memory transfers and computations submitted prior to the |
| ``StreamSync`` call have completed. |
| |
| * Synchronize between streams - ``SyncStreamFromTo`` should |
| introduce a synchronization barrier between the source and |
| destination streams. That is, the destination stream may not |
| proceed beyond commands currently queued until the source stream |
| has completed all commands that are currently queued. |
| |
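The attribute-query and memory-management rules above can be illustrated with a small, self-contained C++ sketch. It does not use TVM: ``Device``, ``DeviceKind``, ``AttrKind`` and ``FooDeviceAPI`` are hypothetical, simplified stand-ins, ``std::optional`` plays the role of returning ``nullptr`` for unsupported queries, and ``CopyDataFromTo`` here omits the stream argument (and anything else) that the real interface in `device_api.h`_ takes. "Device" memory is simulated with host buffers behind opaque integer handles, to show that the returned ``void*`` need not be a memory address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for TVM's Device and DeviceAttrKind.
enum class DeviceKind { kCPU, kFoo };
struct Device {
  DeviceKind kind;
  int device_id;
};
enum class AttrKind { kMaxThreadsPerBlock, kMaxClockRate, kWarpSize };

// Toy backend whose "device memory" is host memory behind opaque handles.
class FooDeviceAPI {
 public:
  // Unsupported or inapplicable queries return no value, the analogue of
  // returning nullptr.
  std::optional<std::int64_t> GetAttr(AttrKind kind) const {
    switch (kind) {
      case AttrKind::kMaxThreadsPerBlock:
        return 256;
      default:
        return std::nullopt;  // e.g. clock rate unknown, no warp concept
    }
  }

  // Returns an opaque handle that only this backend can interpret.
  void* AllocDataSpace(std::size_t nbytes) {
    std::uintptr_t id = next_id_++;
    buffers_[id].resize(nbytes);
    return reinterpret_cast<void*>(id);
  }
  void FreeDataSpace(void* handle) { buffers_.erase(Key(handle)); }

  // Workspace falls back to data space, like the default described above.
  void* AllocWorkspace(std::size_t nbytes) { return AllocDataSpace(nbytes); }
  void FreeWorkspace(void* handle) { FreeDataSpace(handle); }

  // CPU-side pointers are real host addresses; device-side pointers are
  // handles previously returned by AllocDataSpace/AllocWorkspace.
  void CopyDataFromTo(const void* from, std::size_t from_offset, void* to,
                      std::size_t to_offset, std::size_t nbytes,
                      Device dev_from, Device dev_to) {
    const std::uint8_t* src = dev_from.kind == DeviceKind::kCPU
                                  ? static_cast<const std::uint8_t*>(from)
                                  : Data(from);
    std::uint8_t* dst = dev_to.kind == DeviceKind::kCPU
                            ? static_cast<std::uint8_t*>(to)
                            : Data(to);
    // Synchronous copy: the CPU buffer is not touched after return.
    std::memcpy(dst + to_offset, src + from_offset, nbytes);
  }

 private:
  static std::uintptr_t Key(const void* handle) {
    return reinterpret_cast<std::uintptr_t>(handle);
  }
  std::uint8_t* Data(const void* handle) {
    auto it = buffers_.find(Key(handle));
    assert(it != buffers_.end() && "unknown device handle");
    return it->second.data();
  }

  std::uintptr_t next_id_ = 1;  // start at 1 so no handle equals nullptr
  std::map<std::uintptr_t, std::vector<std::uint8_t>> buffers_;
};
```

Because this sketch finishes every copy before returning, it trivially meets the rule that CPU buffers need not outlive the call; an asynchronous backend would have to complete its host-side accesses before returning instead.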
| |
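The stream rules can be sketched the same way. In the hypothetical ``FooStreams`` class below, each stream is a FIFO of deferred commands that run only when the stream is synchronized, and a ``nullptr`` handle selects a default stream. ``Submit`` is a made-up hook standing in for work issued by a generated ``PackedFunc``; it is not part of the ``DeviceAPI`` interface. ``SyncStreamFromTo`` is implemented conservatively by draining the source stream on the host, which satisfies the barrier described above; a real backend could instead record an event on the source stream and make the destination wait on it, avoiding a host-side block.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using FooStreamHandle = void*;  // stands in for TVMStreamHandle

// Hypothetical toy model: a stream is a FIFO of deferred commands.
struct FooStream {
  std::vector<std::function<void()>> pending;
};

class FooStreams {
 public:
  // A backend with a single command queue would return nullptr instead.
  FooStreamHandle CreateStream() {
    streams_.push_back(std::make_unique<FooStream>());
    return streams_.back().get();
  }

  void FreeStream(FooStreamHandle handle) {
    StreamSync(handle);  // finish outstanding work before releasing it
    if (active_ == handle) active_ = nullptr;
    for (auto it = streams_.begin(); it != streams_.end(); ++it) {
      if (it->get() == handle) {
        streams_.erase(it);
        return;
      }
    }
  }

  // Later work is submitted to this stream (nullptr selects the default).
  void SetStream(FooStreamHandle handle) {
    active_ = static_cast<FooStream*>(handle);
  }

  // Hypothetical hook: queue a command on the active stream.
  void Submit(std::function<void()> cmd) {
    Resolve(active_)->pending.push_back(std::move(cmd));
  }

  // Returns once every command queued before the call has run.
  void StreamSync(FooStreamHandle handle) {
    std::vector<std::function<void()>> cmds;
    cmds.swap(Resolve(handle)->pending);  // a command may enqueue new work
    for (auto& cmd : cmds) cmd();
  }

  // Conservative barrier: drain `src` on the host, so anything submitted
  // to `dst` afterwards is ordered after src's previously queued work.
  void SyncStreamFromTo(FooStreamHandle src, FooStreamHandle dst) {
    (void)dst;
    StreamSync(src);
  }

 private:
  FooStream* Resolve(FooStreamHandle handle) {
    return handle ? static_cast<FooStream*>(handle) : &default_;
  }

  FooStream default_;
  FooStream* active_ = nullptr;
  std::vector<std::unique_ptr<FooStream>> streams_;
};
```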
| In order to be usable by the TVM framework, the new DeviceAPI should |
| then be registered with the following steps. |
| |
| #. Create a function that instantiates the new DeviceAPI, and returns |
| a pointer to it:: |
| |
|       FooDeviceAPI* FooDeviceAPI::Global() { |
|         static FooDeviceAPI inst; |
|         return &inst; |
|       } |
| |
| #. Register the function with the TVM registry:: |
| |
|       TVM_FFI_STATIC_INIT_BLOCK() { |
|         namespace refl = tvm::ffi::reflection; |
|         refl::GlobalDef().def("device_api.foo", FooDeviceAPI::Global); |
|       } |
| |
| .. _base.h: https://github.com/apache/tvm/blob/main/include/tvm/runtime/base.h |
| |
| #. Add an entry for the new DeviceAPI to the ``TVMDeviceExtType`` enum |
| in `base.h`_. The value should be an unused value greater |
| than ``DLDeviceType::kDLExtDev``, but less than |
| ``DeviceAPIManager::kMaxDeviceAPI``. |
| |
| #. Add a case in ``DeviceName`` in `device_api.h`_ to convert from the |
| enum value to a string representation. This string representation |
| should match the name given to ``GlobalDef().def``, without the |
| ``device_api.`` prefix (e.g. ``foo`` for ``device_api.foo``). |
| |
| #. Add entries to the ``_DEVICE_TYPE_TO_NAME`` and ``_DEVICE_NAME_TO_TYPE`` dictionaries of |
| :py:class:`tvm.runtime.Device` for the new enum value. |
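The registration steps fit together through name-based lookup. The sketch below is a hypothetical, self-contained model rather than TVM's registry: a factory is stored under ``"device_api.foo"``, and in this model a device type is resolved by prefixing ``"device_api."`` to its ``DeviceName`` string, which is why the two names must agree. ``DeviceType``, ``Registry`` and ``GetAPI`` are illustrative names only, and the Python-side dictionaries of the last step are not modeled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical base class and backend; only the name-based wiring matters.
struct DeviceAPI {
  virtual ~DeviceAPI() = default;
  virtual std::string Name() const = 0;
};

struct FooDeviceAPI : DeviceAPI {
  std::string Name() const override { return "foo"; }
  // Step 1 analogue: return a pointer to a single global instance.
  static FooDeviceAPI* Global() {
    static FooDeviceAPI inst;
    return &inst;
  }
};

// Step 3 analogue: hypothetical enum values, not TVM's real ones.
enum class DeviceType { kCPU, kFoo };

// Step 4 analogue: convert the enum value to its string name.
std::string DeviceName(DeviceType type) {
  switch (type) {
    case DeviceType::kCPU: return "cpu";
    case DeviceType::kFoo: return "foo";
  }
  throw std::invalid_argument("unknown device type");
}

using Factory = std::function<DeviceAPI*()>;

std::map<std::string, Factory>& Registry() {
  static std::map<std::string, Factory> registry;
  return registry;
}

// Step 2 analogue: register the factory during static initialization.
const bool kFooRegistered = [] {
  Registry()["device_api.foo"] = [] {
    return static_cast<DeviceAPI*>(FooDeviceAPI::Global());
  };
  return true;
}();

// Resolve a device type by name; if the registered name and DeviceName()
// disagree, the lookup fails.
DeviceAPI* GetAPI(DeviceType type) {
  auto it = Registry().find("device_api." + DeviceName(type));
  if (it == Registry().end()) {
    throw std::runtime_error("no DeviceAPI registered for " + DeviceName(type));
  }
  return it->second();
}
```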
| |
| |
| .. _tvm-target-specific-target: |
| |
| Target Definition |
| ----------------- |
| |
| The ``Target`` object is a lookup table of properties about a physical |
| device, its hardware/driver limits, and its capabilities. The |
| ``Target`` is accessible both during optimization and code generation |
| stages. While the same ``Target`` class is used for all runtime |
| targets, each runtime target may need to add target-specific options. |
| |
| .. _target_kind.cc: https://github.com/apache/tvm/blob/main/src/target/target_kind.cc |
| |
| In `target_kind.cc`_, add a new declaration of |
| ``TVM_REGISTER_TARGET_KIND``, passing a string name of the new target, |
| and the ``TVMDeviceExtType`` or ``DLDeviceType`` enum value for the |
| device on which that target should run. Typically, the target name |
| and the device name will match. (e.g. the ``"cuda"`` target runs on |
| the ``kDLCUDA`` device.) There are exceptions, such as when multiple |
| different code generation targets can run on the same physical device. |
| (e.g. the ``"llvm"`` and ``"c"`` targets both run on the ``kDLCPU`` |
| device type.) |
| |
| All options for a specific target kind are added with the |
| ``add_attr_option`` function, with optional default values. A ``Target`` |
| parser can be added with ``set_target_parser`` to process any |
| parameters that depend on other parameters or are queried from |
| device properties. |
| |
| These option definitions also describe how a string description of a |
| target is unpacked. The unpacking is done by the |
| ``Target::Target(const String&)`` constructor in C++, which accepts a |
| JSON-formatted string and is typically invoked through the |
| :py:class:`tvm.target.Target` Python object. For example, |
| ``tvm.target.Target('{"kind": "cuda", "max_num_threads": 1024}')`` |
| creates a ``cuda`` target while overriding the default maximum number |
| of threads. |
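As a rough model of how options, defaults, and a target parser combine, the following sketch defines a hypothetical ``TargetKind`` whose ``add_attr_option`` and ``set_target_parser`` methods are named after the TVM functions but have simplified signatures, and it stores every attribute as an integer. ``MakeTarget`` applies the defaults, then user overrides (the role the JSON string plays above; JSON parsing itself is omitted), then the parser. The default values and the derived ``"max_warps"`` attribute are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Every attribute is an integer in this simplified, hypothetical model.
using Attrs = std::map<std::string, std::int64_t>;

struct TargetKind {
  std::string name;
  int device_type;                     // hypothetical device enum value
  Attrs defaults;                      // option name -> default value
  std::function<Attrs(Attrs)> parser;  // optional hook for derived values

  TargetKind& add_attr_option(const std::string& key, std::int64_t default_value) {
    defaults[key] = default_value;
    return *this;
  }
  TargetKind& set_target_parser(std::function<Attrs(Attrs)> fn) {
    parser = std::move(fn);
    return *this;
  }
};

struct Target {
  std::string kind;
  Attrs attrs;
};

// Defaults first, then user overrides, then the parser.
Target MakeTarget(const TargetKind& kind, const Attrs& overrides) {
  Attrs attrs = kind.defaults;
  for (const auto& [key, value] : overrides) attrs[key] = value;
  if (kind.parser) attrs = kind.parser(attrs);
  return Target{kind.name, attrs};
}

// Hypothetical "foo" kind: two options plus a parser deriving "max_warps".
TargetKind MakeFooKind() {
  TargetKind kind{"foo", /*device_type=*/0, {}, nullptr};
  kind.add_attr_option("max_num_threads", 256)
      .add_attr_option("thread_warp_size", 32)
      .set_target_parser([](Attrs attrs) {
        attrs["max_warps"] = attrs["max_num_threads"] / attrs["thread_warp_size"];
        return attrs;
      });
  return kind;
}
```

Running the parser after the overrides means derived values always reflect what the user asked for: overriding ``max_num_threads`` to 1024 yields ``max_warps`` of 32 instead of the default-derived 8.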
| |
| In a code generator, the target properties can be accessed using |
| ``target->GetAttr<T>(param_name)`` in C++, or with the |
| ``target.attrs`` dictionary in Python. |
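A typed lookup in the spirit of ``target->GetAttr<T>(param_name)`` can be modeled with ``std::variant`` and ``std::optional``. This is a hypothetical stand-in, not TVM's ``Target`` class: a lookup returns an empty value when the key is missing or holds a different type. The helper ``ChooseThreadsPerBlock`` (also hypothetical) shows a code generator clamping a request to a limit read from the target rather than from the device.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical attribute table holding a few value types.
using AttrValue = std::variant<std::int64_t, bool, std::string>;

class TargetAttrs {
 public:
  void Set(const std::string& key, AttrValue value) {
    attrs_[key] = std::move(value);
  }

  // Empty if the key is missing or holds a value of another type.
  template <typename T>
  std::optional<T> GetAttr(const std::string& key) const {
    auto it = attrs_.find(key);
    if (it == attrs_.end()) return std::nullopt;
    if (const T* value = std::get_if<T>(&it->second)) return *value;
    return std::nullopt;
  }

 private:
  std::map<std::string, AttrValue> attrs_;
};

// Clamp a requested block size to the target's limit. The limit comes from
// the Target, never from querying the device, since compilation may run on
// a different machine. Falls back to 1 thread if the limit is unset.
std::int64_t ChooseThreadsPerBlock(const TargetAttrs& target, std::int64_t requested) {
  std::int64_t limit = target.GetAttr<std::int64_t>("max_num_threads").value_or(1);
  return requested < limit ? requested : limit;
}
```

The fallback of a single thread when ``max_num_threads`` is absent is an arbitrary choice for this sketch.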
| |
| |
| .. _tvm-target-specific-codegen: |
| |
| Target Code Generators |
| ---------------------- |
| |
| The code generators take an optimized ``IRModule`` and convert it |
| into an executable representation. Each code generator must be |
| registered in order to be used by the TVM framework. This is done by |
| registering a function named ``"target.build.foo"``, where ``foo`` is |
| the same name as was used in the ``TVM_REGISTER_TARGET_KIND`` |
| definition above. :: |
| |
|    tvm::runtime::Module GeneratorFooCode(IRModule mod, Target target); |
| |
|    TVM_FFI_STATIC_INIT_BLOCK() { |
|      namespace refl = tvm::ffi::reflection; |
|      refl::GlobalDef().def("target.build.foo", GeneratorFooCode); |
|    } |
| |
| The code generator takes two arguments. The first is the ``IRModule`` |
| to compile, and the second is the ``Target`` that describes the device |
| on which the code should run. Because the environment performing the |
| compilation is not necessarily the same as the environment that will |
| be executing the code, code generators should not perform any |
| attribute lookups on the device itself, and should instead access |
| parameters stored in the ``Target``. |
| |
| Each function in the input ``IRModule`` should be accessible by name |
| in the output ``runtime::Module``. |
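The following self-contained sketch models the build flow described in this section: a code generator is registered under ``"target.build."`` plus the kind name, a dispatcher looks it up from the target kind, and the resulting module exposes each input function under its original name. ``IRModule``, ``Target``, ``Module``, ``Build`` and ``GenerateFooCode`` here are hypothetical simplifications, not TVM classes; the "IR" of each function is just an integer to add to the argument.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real IRModule, Target and runtime Module.
struct IRModule {
  std::map<std::string, int> functions;  // name -> constant to add
};
struct Target {
  std::string kind;
};
struct Module {
  std::map<std::string, std::function<int(int)>> functions;

  // Every function of the input IRModule is reachable by its name.
  std::function<int(int)> GetFunction(const std::string& name) const {
    auto it = functions.find(name);
    if (it == functions.end()) throw std::out_of_range("no function " + name);
    return it->second;
  }
};

using BuildFunc = std::function<Module(const IRModule&, const Target&)>;

std::map<std::string, BuildFunc>& BuildRegistry() {
  static std::map<std::string, BuildFunc> registry;
  return registry;
}

// Hypothetical "foo" code generator: one callable per IR function.
Module GenerateFooCode(const IRModule& mod, const Target& target) {
  (void)target;  // a real generator would read target attributes here
  Module out;
  for (const auto& entry : mod.functions) {
    int addend = entry.second;
    out.functions[entry.first] = [addend](int x) { return x + addend; };
  }
  return out;
}

// Register under "target.build." + kind name during static initialization.
const bool kFooBuildRegistered = [] {
  BuildRegistry()["target.build.foo"] = GenerateFooCode;
  return true;
}();

// Dispatch on the target kind to find the matching code generator.
Module Build(const IRModule& mod, const Target& target) {
  auto it = BuildRegistry().find("target.build." + target.kind);
  if (it == BuildRegistry().end()) {
    throw std::runtime_error("no code generator for " + target.kind);
  }
  return it->second(mod, target);
}
```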