 ..  Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

 ..    http://www.apache.org/licenses/LICENSE-2.0

 ..  Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 .. _deploy-and-integration:

 Deploy Models and Integrate TVM
 ===============================

 This page contains guidelines on how to deploy TVM to various platforms
 as well as how to integrate it with your project.

 .. image::  https://tvm.apache.org/images/release/tvm_flexible.png

 Build the TVM runtime library
 -----------------------------

 .. _build-tvm-runtime-on-target-device:

 Unlike traditional deep learning frameworks, the TVM stack is divided into two major components:

 - The TVM compiler, which handles all compilation and optimization of the model.
 - The TVM runtime, which runs on the target devices.

 To integrate a compiled module, you **do not** need to build the entire
 TVM stack on the target device. You only need to build the TVM compiler stack on your
 desktop and use it to cross-compile modules that are then deployed on the target device.

 On the target device, you only need a lightweight runtime API that can be integrated into various platforms.

 For example, you can run the following commands to build the runtime API
 on a Linux-based embedded system such as a Raspberry Pi:

 .. code:: bash

     git clone --recursive https://github.com/apache/tvm tvm
     cd tvm
     mkdir build
     cp cmake/config.cmake build
     cd build
     cmake ..
     make runtime

 Note that ``make runtime`` builds only the runtime library.

 It is also possible to cross-compile the runtime. Cross-compiling
 the runtime library should not be confused with cross-compiling models
 for embedded devices.

 If you want to include additional runtimes such as OpenCL,
 you can modify ``config.cmake`` to enable these options.
 After you get the TVM runtime library, you can link the compiled library against it.
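
 As a rough sketch of the ``config.cmake`` step, the snippet below flips ``USE_OPENCL`` from ``OFF`` to ``ON`` with ``sed``. It works on a scratch file with two stand-in lines rather than the real ``build/config.cmake``. The scratch directory and the ``build_dir`` variable are assumptions made for illustration.

```shell
# Work on a scratch copy so the sketch is safe to run anywhere.
build_dir="$(mktemp -d)"

# Stand-in for `cp cmake/config.cmake build`; the real file has many more options.
printf 'set(USE_OPENCL OFF)\nset(USE_VULKAN OFF)\n' > "$build_dir/config.cmake"

# Flip the OpenCL option from OFF to ON in place (a .bak backup is kept).
sed -i.bak 's/^set(USE_OPENCL OFF)/set(USE_OPENCL ON)/' "$build_dir/config.cmake"

# Show the resulting setting.
grep USE_OPENCL "$build_dir/config.cmake"
```

 In a real build tree, the same ``sed`` command can be pointed at ``build/config.cmake`` before running ``cmake ..``.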

 .. figure:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/dev/tvm_deploy_crosscompile.svg
    :align: center
    :width: 85%

 A model (whether or not it has been optimized by TVM) can be cross-compiled by TVM for
 a different architecture, such as ``aarch64`` on an ``x86_64`` host. Once the model
 is cross-compiled, a runtime compatible with the target architecture is needed
 to run it.


 Cross compile the TVM runtime for other architectures
 -----------------------------------------------------

 In the example :ref:`above <build-tvm-runtime-on-target-device>`, the runtime library was
 compiled on a Raspberry Pi. The runtime library can be produced much faster on
 hosts that have high-performance processors and ample resources (such as laptops or workstations)
 than on a target device such as a Raspberry Pi. To cross-compile the runtime, the toolchain
 for the target device must be installed. After installing the correct toolchain,
 the main difference compared to compiling natively is passing some additional command
 line arguments to cmake that specify the toolchain to use. For reference,
 building the TVM runtime library for ``aarch64`` on a modern laptop (using 8 threads)
 takes around 20 seconds, versus about 10 minutes on a Raspberry Pi 4.

 Cross-compile for aarch64
 """""""""""""""""""""""""

 .. code-block:: bash

    sudo apt-get update
    sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

 .. code-block:: bash

     cmake .. \
         -DCMAKE_SYSTEM_NAME=Linux \
         -DCMAKE_SYSTEM_VERSION=1 \
         -DCMAKE_C_COMPILER=/usr/bin/aarch64-linux-gnu-gcc \
         -DCMAKE_CXX_COMPILER=/usr/bin/aarch64-linux-gnu-g++ \
         -DCMAKE_FIND_ROOT_PATH=/usr/aarch64-linux-gnu \
         -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
         -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
         -DMACHINE_NAME=aarch64-linux-gnu

     make -j$(nproc) runtime
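
 The same settings can also be kept in a CMake toolchain file and passed with ``-DCMAKE_TOOLCHAIN_FILE``, a standard CMake option. The following is a minimal sketch under that assumption. The file name ``aarch64-toolchain.cmake`` and the scratch location are hypothetical, and the final line only prints the cmake command rather than running it.

```shell
# Write the toolchain file to a scratch directory (location is illustrative).
toolchain_file="$(mktemp -d)/aarch64-toolchain.cmake"

# Same values as the -D flags above, expressed as CMake variables.
cat > "$toolchain_file" <<'EOF'
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_VERSION 1)
set(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
set(CMAKE_FIND_ROOT_PATH /usr/aarch64-linux-gnu)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
EOF

# Print the configure command that would use the file; run it from the build directory.
echo "cmake .. -DCMAKE_TOOLCHAIN_FILE=$toolchain_file -DMACHINE_NAME=aarch64-linux-gnu"
```

 Keeping the settings in a file makes it easier to reuse one configuration across several build directories.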

 For bare metal ARM devices, the following toolchain can be installed instead of ``gcc-aarch64-linux-*``:

 .. code-block:: bash

    sudo apt-get install gcc-multilib-arm-linux-gnueabihf g++-multilib-arm-linux-gnueabihf


 Cross-compile for RISC-V
 """""""""""""""""""""""""

 .. code-block:: bash

    sudo apt-get update
    sudo apt-get install gcc-riscv64-linux-gnu g++-riscv64-linux-gnu


 .. code-block:: bash

     cmake .. \
         -DCMAKE_SYSTEM_NAME=Linux \
         -DCMAKE_SYSTEM_VERSION=1 \
         -DCMAKE_C_COMPILER=/usr/bin/riscv64-linux-gnu-gcc \
         -DCMAKE_CXX_COMPILER=/usr/bin/riscv64-linux-gnu-g++ \
         -DCMAKE_FIND_ROOT_PATH=/usr/riscv64-linux-gnu \
         -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
         -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
         -DMACHINE_NAME=riscv64-linux-gnu

     make -j$(nproc) runtime

 The ``file`` command can be used to query the architecture of the produced runtime.


 .. code-block:: bash

    file libtvm_runtime.so
    libtvm_runtime.so: ELF 64-bit LSB shared object, UCB RISC-V, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=e9ak845b3d7f2c126dab53632aea8e012d89477e, not stripped
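
 When this check needs to run inside a script, the architecture can be picked out of the ``file`` output with a shell ``case`` statement. The helper name ``arch_from_file_output`` is hypothetical, and the match strings assume the wording that common versions of ``file`` print for these ELF targets.

```shell
# Map one line of `file` output to a short architecture name.
arch_from_file_output() {
    case "$1" in
        *"ARM aarch64"*) echo "aarch64" ;;
        *"UCB RISC-V"*)  echo "riscv" ;;
        *"x86-64"*)      echo "x86_64" ;;
        *)               echo "unknown" ;;
    esac
}

# Sample line based on the output shown above (BuildID omitted).
sample="libtvm_runtime.so: ELF 64-bit LSB shared object, UCB RISC-V, version 1 (GNU/Linux), dynamically linked, not stripped"
arch_from_file_output "$sample"
```

 On a real build, the helper can be called as ``arch_from_file_output "$(file libtvm_runtime.so)"``.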


 Optimize and tune models for target devices
 -------------------------------------------

 The easiest and recommended way to test, tune and benchmark TVM kernels on
 embedded devices is through TVM's RPC API.
 The following tutorials cover this workflow:

 - :ref:`tutorial-cross-compilation-and-rpc`
 - :ref:`tutorial-deploy-model-on-rasp`

 Deploy optimized model on target devices
 ----------------------------------------

 After you have finished tuning and benchmarking, you might need to deploy the model on the
 target device without relying on RPC. See the following resources on how to do so.

 .. toctree::
    :maxdepth: 2

    cpp_deploy
    android
    integrate
    hls
    arm_compute_lib
    tensorrt
    vitis_ai
    bnns

 Additional Deployment How-Tos
 -----------------------------

 We have also developed a number of how-tos targeting specific devices, with
 working Python code that can be viewed in a Jupyter notebook. These how-tos
 describe how to prepare and deploy models to many of the supported backends.

 .. toctree::
    :maxdepth: 1

    ../deploy_models/index