| |
| .. DO NOT EDIT. THIS FILE WAS AUTOMATICALLY GENERATED BY |
| .. TVM'S MONKEY-PATCHED VERSION OF SPHINX-GALLERY. TO MAKE |
| .. CHANGES, EDIT THE SOURCE PYTHON FILE: |
| .. "how_to/work_with_microtvm/micro_autotune.py" |
| |
| .. only:: html |
| |
| .. note:: |
| :class: sphx-glr-download-link-note |
| |
| This tutorial can be used interactively with Google Colab! You can also click |
| :ref:`here <sphx_glr_download_how_to_work_with_microtvm_micro_autotune.py>` to run the Jupyter notebook locally. |
| |
| .. image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/utilities/colab_button.svg |
| :align: center |
| :target: https://colab.research.google.com/github/apache/tvm-site/blob/asf-site/docs/_downloads/f83ba3df2d52f9b54cf141114359481a/micro_autotune.ipynb |
| :width: 300px |
| |
| .. rst-class:: sphx-glr-example-title |
| |
| .. _sphx_glr_how_to_work_with_microtvm_micro_autotune.py: |
| |
| |
| .. _tutorial-micro-autotune: |
| |
| 6. Model Tuning with microTVM |
| ============================= |
| **Authors**: |
| `Andrew Reusch <https://github.com/areusch>`_, |
| `Mehrdad Hessar <https://github.com/mehrdadh>`_ |
| |
| This tutorial explains how to autotune a model using the C runtime. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 31-33 |
| |
| .. include:: ../../../../gallery/how_to/work_with_microtvm/install_dependencies.rst |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 34-42 |
| |
| .. code-block:: default |
| |
| |
| |
| # You can skip the following section (installing Zephyr) if the following flag is False. |
| # Installing Zephyr takes ~20 min. |
| import os |
| |
| use_physical_hw = bool(os.getenv("TVM_MICRO_USE_HW")) |
| |
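Note that `bool(os.getenv("TVM_MICRO_USE_HW"))` is true for any non-empty value, so setting the variable to `0` or `false` still enables the hardware path. The sketch below contrasts that check with a stricter parser; `env_flag` is a hypothetical helper written for illustration and is not part of TVM.

```python
import os


def env_flag(name, default=False):
    """Hypothetical helper: interpret an environment variable as a boolean."""
    value = os.getenv(name)
    if value is None:
        return default
    # Only a small set of explicit "truthy" spellings enables the flag.
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Simulate a user who tries to disable hardware with "0".
os.environ["TVM_MICRO_USE_HW"] = "0"

loose = bool(os.getenv("TVM_MICRO_USE_HW"))  # the check used in this tutorial
strict = env_flag("TVM_MICRO_USE_HW")        # the stricter alternative

print("bool(os.getenv(...)):", loose)
print("env_flag(...):", strict)
```

With the tutorial's check, leaving the variable unset (or empty) is the reliable way to stay on the host-only path.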
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 43-45 |
| |
| .. include:: ../../../../gallery/how_to/work_with_microtvm/install_zephyr.rst |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 49-52 |
| |
| Import Python dependencies |
| ------------------------------- |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 52-60 |
| |
| .. code-block:: default |
| |
| import json |
| import numpy as np |
| import pathlib |
| |
| import tvm |
| from tvm.relay.backend import Runtime |
| import tvm.micro.testing |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 61-67 |
| |
| Defining the model |
| ################### |
| |
| To begin, define a model in Relay to be executed on-device. Then create an IRModule from the Relay model and |
| fill its parameters with random numbers. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 67-92 |
| |
| .. code-block:: default |
| |
| |
| data_shape = (1, 3, 10, 10) |
| weight_shape = (6, 3, 5, 5) |
| |
| data = tvm.relay.var("data", tvm.relay.TensorType(data_shape, "float32")) |
| weight = tvm.relay.var("weight", tvm.relay.TensorType(weight_shape, "float32")) |
| |
| y = tvm.relay.nn.conv2d( |
|     data, |
|     weight, |
|     padding=(2, 2), |
|     kernel_size=(5, 5), |
|     kernel_layout="OIHW", |
|     out_dtype="float32", |
| ) |
| f = tvm.relay.Function([data, weight], y) |
| |
| relay_mod = tvm.IRModule.from_expr(f) |
| relay_mod = tvm.relay.transform.InferType()(relay_mod) |
| |
| weight_sample = np.random.rand( |
|     weight_shape[0], weight_shape[1], weight_shape[2], weight_shape[3] |
| ).astype("float32") |
| params = {"weight": weight_sample} |
| |
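As a quick sanity check on the model above, the output shape of the convolution can be computed by hand. This is a minimal sketch, assuming the standard convolution size formula with stride 1 and no dilation (the defaults when these arguments are not passed to `conv2d`); `conv2d_output_shape` is a hypothetical helper, not a TVM function.

```python
def conv2d_output_shape(data_shape, weight_shape, padding, strides=(1, 1), dilation=(1, 1)):
    """Hypothetical helper: output shape of an NCHW conv2d with OIHW weights."""
    batch, _, in_h, in_w = data_shape
    out_channels, _, kernel_h, kernel_w = weight_shape

    def out_size(size, kernel, pad, stride, dil):
        # Standard formula: padding is applied symmetrically on both sides.
        return (size + 2 * pad - dil * (kernel - 1) - 1) // stride + 1

    out_h = out_size(in_h, kernel_h, padding[0], strides[0], dilation[0])
    out_w = out_size(in_w, kernel_w, padding[1], strides[1], dilation[1])
    return (batch, out_channels, out_h, out_w)


# Shapes and padding taken from the model definition in this tutorial.
data_shape = (1, 3, 10, 10)
weight_shape = (6, 3, 5, 5)
output_shape = conv2d_output_shape(data_shape, weight_shape, padding=(2, 2))
print(output_shape)
```

With padding 2 and a 5x5 kernel, the 10x10 input keeps its spatial size, so the result is `(1, 6, 10, 10)`, which matches the shape of the final layout transform in the profiling output later in this tutorial.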
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 93-104 |
| |
| Defining the target |
| ###################### |
| Now we define the TVM target that describes the execution environment. This looks very similar |
| to target definitions from other microTVM tutorials. Alongside this, we pick the C runtime to generate |
| code for our model. |
| |
| When running on physical hardware, choose a target and a board that |
| describe the hardware. Several hardware targets can be selected from the |
| PLATFORM list in this tutorial. You can choose the platform by passing the --platform argument when running |
| this tutorial. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 104-118 |
| |
| .. code-block:: default |
| |
| |
| RUNTIME = Runtime("crt", {"system-lib": True}) |
| TARGET = tvm.micro.testing.get_target("crt") |
| |
| # Compiling for physical hardware |
| # -------------------------------------------------------------------------- |
| # When running on physical hardware, choose a TARGET and a BOARD that describe the hardware. The |
| # STM32L4R5ZI Nucleo target and board is chosen in the example below. |
| if use_physical_hw: |
|     BOARD = os.getenv("TVM_MICRO_BOARD", default="nucleo_l4r5zi") |
|     SERIAL = os.getenv("TVM_MICRO_SERIAL", default=None) |
|     TARGET = tvm.micro.testing.get_target("zephyr", BOARD) |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 119-128 |
| |
| Extracting tuning tasks |
| ######################## |
| Not all operators in the Relay program defined above can be tuned. Some are so trivial that only |
| a single implementation is defined; others don't make sense as tuning tasks. Using |
| `extract_from_program`, you can produce a list of tunable tasks. |
| |
| Because task extraction involves running the compiler, we first configure the compiler's |
| transformation passes; we'll apply the same configuration later on during autotuning. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 128-134 |
| |
| .. code-block:: default |
| |
| |
| pass_context = tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}) |
| with pass_context: |
|     tasks = tvm.autotvm.task.extract_from_program(relay_mod["main"], {}, TARGET) |
| assert len(tasks) > 0 |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 135-145 |
| |
| Configuring microTVM |
| ##################### |
| Before autotuning, we need to define a module loader and a `tvm.autotvm.LocalBuilder`. |
| Then we create a `tvm.autotvm.LocalRunner` that takes the module loader, and use |
| both the builder and the runner to generate multiple measurements for the autotuner. |
| |
| In this tutorial, we can either use the x86 host as an example or use different targets |
| from Zephyr RTOS. If you pass `--platform=host` to this tutorial, it uses x86. You can |
| select other options from the `PLATFORM` list. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 145-183 |
| |
| .. code-block:: default |
| |
| |
| module_loader = tvm.micro.AutoTvmModuleLoader( |
|     template_project_dir=pathlib.Path(tvm.micro.get_microtvm_template_projects("crt")), |
|     project_options={"verbose": False}, |
| ) |
| builder = tvm.autotvm.LocalBuilder( |
|     n_parallel=1, |
|     build_kwargs={"build_option": {"tir.disable_vectorize": True}}, |
|     do_fork=True, |
|     build_func=tvm.micro.autotvm_build_func, |
|     runtime=RUNTIME, |
| ) |
| runner = tvm.autotvm.LocalRunner(number=1, repeat=1, timeout=100, module_loader=module_loader) |
| |
| measure_option = tvm.autotvm.measure_option(builder=builder, runner=runner) |
| |
| # Compiling for physical hardware |
| if use_physical_hw: |
|     module_loader = tvm.micro.AutoTvmModuleLoader( |
|         template_project_dir=pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr")), |
|         project_options={ |
|             "board": BOARD, |
|             "verbose": False, |
|             "project_type": "host_driven", |
|             "serial_number": SERIAL, |
|         }, |
|     ) |
|     builder = tvm.autotvm.LocalBuilder( |
|         n_parallel=1, |
|         build_kwargs={"build_option": {"tir.disable_vectorize": True}}, |
|         do_fork=False, |
|         build_func=tvm.micro.autotvm_build_func, |
|         runtime=RUNTIME, |
|     ) |
|     runner = tvm.autotvm.LocalRunner(number=1, repeat=1, timeout=100, module_loader=module_loader) |
| |
|     measure_option = tvm.autotvm.measure_option(builder=builder, runner=runner) |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 184-188 |
| |
| Run Autotuning |
| ######################### |
| Now we can run autotuning separately on each extracted task on the microTVM device. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 188-206 |
| |
| .. code-block:: default |
| |
| |
| autotune_log_file = pathlib.Path("microtvm_autotune.log.txt") |
| if os.path.exists(autotune_log_file): |
|     os.remove(autotune_log_file) |
| |
| num_trials = 10 |
| for task in tasks: |
|     tuner = tvm.autotvm.tuner.GATuner(task) |
|     tuner.tune( |
|         n_trial=num_trials, |
|         measure_option=measure_option, |
|         callbacks=[ |
|             tvm.autotvm.callback.log_to_file(str(autotune_log_file)), |
|             tvm.autotvm.callback.progress_bar(num_trials, si_prefix="M"), |
|         ], |
|         si_prefix="M", |
|     ) |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 207-213 |
| |
| Timing the untuned program |
| ########################### |
| For comparison, let's compile and run the graph without imposing any autotuning schedules. TVM |
| will select a randomly-tuned implementation for each operator, which should not perform as well as |
| the tuned operator. |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 213-252 |
| |
| .. code-block:: default |
| |
| |
| with pass_context: |
|     lowered = tvm.relay.build(relay_mod, target=TARGET, runtime=RUNTIME, params=params) |
| |
| temp_dir = tvm.contrib.utils.tempdir() |
| project = tvm.micro.generate_project( |
|     str(tvm.micro.get_microtvm_template_projects("crt")), |
|     lowered, |
|     temp_dir / "project", |
|     {"verbose": False}, |
| ) |
| |
| # Compiling for physical hardware |
| if use_physical_hw: |
|     temp_dir = tvm.contrib.utils.tempdir() |
|     project = tvm.micro.generate_project( |
|         str(tvm.micro.get_microtvm_template_projects("zephyr")), |
|         lowered, |
|         temp_dir / "project", |
|         { |
|             "board": BOARD, |
|             "verbose": False, |
|             "project_type": "host_driven", |
|             "serial_number": SERIAL, |
|             "config_main_stack_size": 4096, |
|         }, |
|     ) |
| |
| project.build() |
| project.flash() |
| with tvm.micro.Session(project.transport()) as session: |
|     debug_module = tvm.micro.create_local_debug_executor( |
|         lowered.get_graph_json(), session.get_system_lib(), session.device |
|     ) |
|     debug_module.set_input(**lowered.get_params()) |
|     print("########## Build without Autotuning ##########") |
|     debug_module.run() |
|     del debug_module |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| .. code-block:: none |
| |
| ########## Build without Autotuning ########## |
| Node Name Ops Time(us) Time(%) Shape Inputs Outputs Measurements(us) |
| --------- --- -------- ------- ----- ------ ------- ---------------- |
| tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 302.9 98.717 (1, 2, 10, 10, 3) 2 1 [302.9] |
| tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 2.983 0.972 (1, 6, 10, 10) 1 1 [2.983] |
| tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.954 0.311 (1, 1, 10, 10, 3) 1 1 [0.954] |
| Total_time - 306.836 - - - - - |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 253-256 |
| |
| Timing the tuned program |
| ######################### |
| Once autotuning completes, you can time execution of the entire program using the Debug Runtime: |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 256-295 |
| |
| .. code-block:: default |
| |
| |
| with tvm.autotvm.apply_history_best(str(autotune_log_file)): |
|     with pass_context: |
|         lowered_tuned = tvm.relay.build(relay_mod, target=TARGET, runtime=RUNTIME, params=params) |
| |
| temp_dir = tvm.contrib.utils.tempdir() |
| project = tvm.micro.generate_project( |
|     str(tvm.micro.get_microtvm_template_projects("crt")), |
|     lowered_tuned, |
|     temp_dir / "project", |
|     {"verbose": False}, |
| ) |
| |
| # Compiling for physical hardware |
| if use_physical_hw: |
|     temp_dir = tvm.contrib.utils.tempdir() |
|     project = tvm.micro.generate_project( |
|         str(tvm.micro.get_microtvm_template_projects("zephyr")), |
|         lowered_tuned, |
|         temp_dir / "project", |
|         { |
|             "board": BOARD, |
|             "verbose": False, |
|             "project_type": "host_driven", |
|             "serial_number": SERIAL, |
|             "config_main_stack_size": 4096, |
|         }, |
|     ) |
| |
| project.build() |
| project.flash() |
| with tvm.micro.Session(project.transport()) as session: |
|     debug_module = tvm.micro.create_local_debug_executor( |
|         lowered_tuned.get_graph_json(), session.get_system_lib(), session.device |
|     ) |
|     debug_module.set_input(**lowered_tuned.get_params()) |
|     print("########## Build with Autotuning ##########") |
|     debug_module.run() |
|     del debug_module |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| .. code-block:: none |
| |
| ########## Build with Autotuning ########## |
| Node Name Ops Time(us) Time(%) Shape Inputs Outputs Measurements(us) |
| --------- --- -------- ------- ----- ------ ------- ---------------- |
| tvmgen_default_fused_nn_contrib_conv2d_NCHWc tvmgen_default_fused_nn_contrib_conv2d_NCHWc 100.5 97.31 (1, 6, 10, 10, 1) 2 1 [100.5] |
| tvmgen_default_fused_layout_transform_1 tvmgen_default_fused_layout_transform_1 1.795 1.738 (1, 6, 10, 10) 1 1 [1.795] |
| tvmgen_default_fused_layout_transform tvmgen_default_fused_layout_transform 0.984 0.953 (1, 1, 10, 10, 3) 1 1 [0.984] |
| Total_time - 103.278 - - - - - |
| |
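To compare the two profiles, the speedup can be computed from the times reported above. This is plain arithmetic on the numbers printed in this particular run; the variable names are illustrative, and the measurements will differ between runs and machines.

```python
# Times in microseconds, copied from the two profiling tables above.
untuned_us = {"conv2d": 302.9, "total": 306.836}
tuned_us = {"conv2d": 100.5, "total": 103.278}

# Speedup is the untuned time divided by the tuned time.
speedup = {name: untuned_us[name] / tuned_us[name] for name in untuned_us}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x faster after tuning")
```

For this run the tuned build is roughly 3x faster overall, and nearly all of the gain comes from the convolution kernel; the two layout transforms contribute only a few microseconds either way.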
| |
| |
| |
| |
| .. rst-class:: sphx-glr-timing |
| |
| **Total running time of the script:** ( 1 minutes 31.783 seconds) |
| |
| |
| .. _sphx_glr_download_how_to_work_with_microtvm_micro_autotune.py: |
| |
| .. only:: html |
| |
| .. container:: sphx-glr-footer sphx-glr-footer-example |
| |
| |
| .. container:: sphx-glr-download sphx-glr-download-python |
| |
| :download:`Download Python source code: micro_autotune.py <micro_autotune.py>` |
| |
| .. container:: sphx-glr-download sphx-glr-download-jupyter |
| |
| :download:`Download Jupyter notebook: micro_autotune.ipynb <micro_autotune.ipynb>` |
| |
| |
| .. only:: html |
| |
| .. rst-class:: sphx-glr-signature |
| |
| `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_ |