| |
| .. DO NOT EDIT. THIS FILE WAS AUTOMATICALLY GENERATED BY |
| .. TVM'S MONKEY-PATCHED VERSION OF SPHINX-GALLERY. TO MAKE |
| .. CHANGES, EDIT THE SOURCE PYTHON FILE: |
| .. "tutorial/autotvm_relay_x86.py" |
| |
| .. only:: html |
| |
| .. note:: |
| :class: sphx-glr-download-link-note |
| |
| This tutorial can be used interactively with Google Colab! You can also click |
| :ref:`here <sphx_glr_download_tutorial_autotvm_relay_x86.py>` to run the Jupyter notebook locally. |
| |
| .. image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/utilities/colab_button.svg |
| :align: center |
| :target: https://colab.research.google.com/github/apache/tvm-site/blob/asf-site/docs/_downloads/2f91b1346a0ba21b800081aa15fdaac2/autotvm_relay_x86.ipynb |
| :width: 300px |
| |
| .. rst-class:: sphx-glr-example-title |
| |
| .. _sphx_glr_tutorial_autotvm_relay_x86.py: |
| |
| |
| Compiling and Optimizing a Model with the Python Interface (AutoTVM) |
| ==================================================================== |
| **Author**: |
| `Chris Hoge <https://github.com/hogepodge>`_ |
| |
| In the :ref:`TVMC Tutorial <tvmc_command_line_driver>`, we covered how to compile, run, and tune a |
| pre-trained vision model, ResNet-50 v2, using the command line interface for |
| TVM, TVMC. TVM is more than just a command-line tool, though: it is an |
| optimizing framework with APIs available for a number of different languages |
| that gives you tremendous flexibility in working with machine learning models. |
| |
| In this tutorial we will cover the same ground we did with TVMC, but show how |
| it is done with the Python API. Upon completion of this section, we will have |
| used the Python API for TVM to accomplish the following tasks: |
| |
| * Compile a pre-trained ResNet-50 v2 model for the TVM runtime. |
| * Run a real image through the compiled model, and interpret the output and model |
| performance. |
| * Tune the model on a CPU using TVM. |
| * Re-compile an optimized model using the tuning data collected by TVM. |
| * Run the image through the optimized model, and compare the output and model |
| performance. |
| |
| The goal of this section is to give you an overview of TVM's capabilities and |
| how to use them through the Python API. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 47-57 |
| |
| TVM is a deep learning compiler framework, with a number of different modules |
| available for working with deep learning models and operators. In this |
| tutorial we will work through how to load, compile, and optimize a model |
| using the Python API. |
| |
| We begin by importing a number of dependencies, including ``onnx`` for |
| loading and converting the model, helper utilities for downloading test data, |
| the Python Imaging Library (PIL) for working with the image data, ``numpy`` for pre- |
| and post-processing of the image data, the TVM Relay framework, and the TVM |
| Graph Executor. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 57-66 |
| |
| .. code-block:: default |
| |
| |
| import onnx |
| from tvm.contrib.download import download_testdata |
| from PIL import Image |
| import numpy as np |
| import tvm.relay as relay |
| import tvm |
| from tvm.contrib import graph_executor |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 67-88 |
| |
| Downloading and Loading the ONNX Model |
| -------------------------------------- |
| |
| For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a |
| convolutional neural network that is 50 layers deep and designed to classify |
| images. The model we will be using has been pre-trained on more than a |
| million images with 1000 different classes. The network has an input |
| image size of 224x224. If you are interested in exploring more of how the |
| ResNet-50 model is structured, we recommend downloading |
| `Netron <https://netron.app>`_, a freely available ML model viewer. |
| |
| TVM provides a helper library to download pre-trained models. By providing a |
| model URL, file name, and model type (the ``module`` argument) to ``download_testdata``, |
| TVM will download the model and save it to disk. In the case of an ONNX model, you can |
| then load it into memory using the ``onnx`` library. |
| |
| .. admonition:: Working with Other Model Formats |
| |
| TVM supports many popular model formats. A list can be found in the |
| :ref:`Compile Deep Learning Models <tutorial-frontend>` section of the TVM |
| Documentation. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 88-101 |
| |
| .. code-block:: default |
| |
| |
| model_url = ( |
| "https://github.com/onnx/models/raw/main/" |
| "vision/classification/resnet/model/" |
| "resnet50-v2-7.onnx" |
| ) |
| |
| model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx") |
| onnx_model = onnx.load(model_path) |
| |
| # Seed numpy's RNG to get consistent results |
| np.random.seed(0) |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 102-119 |
| |
| Downloading, Preprocessing, and Loading the Test Image |
| ------------------------------------------------------ |
| |
| Each model is particular when it comes to expected tensor shapes, formats and |
| data types. For this reason, most models require some pre- and |
| post-processing to ensure the input is valid and to interpret the output. |
| TVMC has adopted NumPy's ``.npz`` format for both input and output data. |
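As a rough sketch of that ``.npz`` convention, the snippet below writes an input tensor to an archive and reads it back using NumPy alone. It assumes the array is stored under a key matching the model's input name (``data`` for this model, as used later in this tutorial); the file name ``imagenet_cat.npz`` and the zero-filled placeholder tensor are illustrative only.

```python
import os
import tempfile

import numpy as np

# Placeholder NCHW float32 tensor with the shape ResNet-50 v2 expects.
img_data = np.zeros((1, 3, 224, 224), dtype="float32")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "imagenet_cat.npz")
    # The keyword name becomes the array's key inside the archive; here it is
    # assumed to match the model's input name ("data").
    np.savez(path, data=img_data)

    # Read the archive back; indexing by key loads the array into memory.
    with np.load(path) as archive:
        keys = list(archive.files)
        loaded = archive["data"]

print(keys, loaded.shape, loaded.dtype)
```

The same archive format works for outputs, with one key per output tensor.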
| |
| As input for this tutorial, we will use an image of a cat, but you can feel |
| free to substitute any image of your choosing. |
| |
| .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg |
| :height: 224px |
| :width: 224px |
| :align: center |
| |
| Download the image data, then convert it to a numpy array to use as an input to the model. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 119-138 |
| |
| .. code-block:: default |
| |
| |
| img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg" |
| img_path = download_testdata(img_url, "imagenet_cat.png", module="data") |
| |
| # Resize it to 224x224 |
| resized_image = Image.open(img_path).resize((224, 224)) |
| img_data = np.asarray(resized_image).astype("float32") |
| |
| # Our input image is in HWC layout while ONNX expects CHW input, so convert the array |
| img_data = np.transpose(img_data, (2, 0, 1)) |
| |
| # Normalize according to the ImageNet input specification |
| imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) |
| imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) |
| norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev |
| |
| # Add the batch dimension, as we are expecting 4-dimensional input: NCHW. |
| img_data = np.expand_dims(norm_img_data, axis=0) |
| |
| |
| |
| |
| |
| |
| |
| |
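The preprocessing steps above can be wrapped in a reusable function. The sketch below is one possible version, assuming the input is an ``HxWx3`` ``uint8`` RGB array (as produced by ``np.asarray`` on a PIL image). One difference from the tutorial code: there, the mean and standard deviation arrays default to ``float64``, so the normalized tensor becomes ``float64``; this sketch keeps everything in ``float32`` to match the model's input type. The function name ``preprocess`` is hypothetical.

```python
import numpy as np

# ImageNet per-channel statistics, the same values used in the tutorial code,
# stored as float32 so the arithmetic does not promote to float64.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32").reshape((3, 1, 1))
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype="float32").reshape((3, 1, 1))


def preprocess(hwc_image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array into a normalized 1x3xHxW float32 tensor."""
    chw = np.transpose(hwc_image.astype("float32"), (2, 0, 1))  # HWC -> CHW
    normalized = (chw / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    return np.expand_dims(normalized, axis=0).astype("float32")  # add batch dim: NCHW


# Example with a synthetic all-white 224x224 image instead of the downloaded kitten.
fake_image = np.full((224, 224, 3), 255, dtype=np.uint8)
batch = preprocess(fake_image)
print(batch.shape, batch.dtype)
```

With the real image, you would call ``preprocess(np.asarray(resized_image))``.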
| .. GENERATED FROM PYTHON SOURCE LINES 139-146 |
| |
| Compile the Model With Relay |
| ---------------------------- |
| |
| The next step is to compile the ResNet model. We begin by importing the model |
| to Relay using the ``from_onnx`` importer. We then build the model, with |
| standard optimizations, into a TVM library. Finally, we create a TVM graph |
| executor module from the library. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 146-149 |
| |
| .. code-block:: default |
| |
| |
| target = "llvm" |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 150-162 |
| |
| .. admonition:: Defining the Correct Target |
| |
| Specifying the correct target can have a huge impact on the performance of |
| the compiled module, as it can take advantage of hardware features |
| available on the target. For more information, please refer to |
| :ref:`Auto-tuning a convolutional network for x86 CPU <tune_relay_x86>`. |
| We recommend identifying which CPU you are running, along with its optional |
| features, and setting the target appropriately. For example, you might use |
| ``target = "llvm -mcpu=skylake"`` for some processors, or |
| ``target = "llvm -mcpu=skylake-avx512"`` for processors with the AVX-512 |
| vector instruction set. |
| |
| |
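One way to act on that advice on Linux is to read the CPU feature flags from ``/proc/cpuinfo``. The sketch below is a hypothetical helper, not part of TVM. Its mapping from flags to ``-mcpu`` values (including ``core-avx2`` for AVX2-only machines) is a deliberately crude assumption that ignores vendor and generation differences, so check the result against your actual processor.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags listed in /proc/cpuinfo-style text (Linux format)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def guess_llvm_target(flags: set) -> str:
    """Hypothetical, very rough mapping from CPU flags to an LLVM target string."""
    if "avx512f" in flags:
        return "llvm -mcpu=skylake-avx512"
    if "avx2" in flags:
        return "llvm -mcpu=core-avx2"
    return "llvm"  # generic fallback without a specific -mcpu


# Example with a shortened, made-up cpuinfo excerpt.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
print(guess_llvm_target(cpu_flags(sample)))  # llvm -mcpu=core-avx2
```

On a Linux machine you could pass the contents of ``/proc/cpuinfo`` to ``cpu_flags`` and use the returned string as ``target``.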
| .. GENERATED FROM PYTHON SOURCE LINES 162-176 |
| |
| .. code-block:: default |
| |
| |
| # The input name may vary across model types. You can use a tool |
| # like Netron to check input names |
| input_name = "data" |
| shape_dict = {input_name: img_data.shape} |
| |
| mod, params = relay.frontend.from_onnx(onnx_model, shape_dict) |
| |
| with tvm.transform.PassContext(opt_level=3): |
|     lib = relay.build(mod, target=target, params=params) |
| |
| dev = tvm.device(str(target), 0) |
| module = graph_executor.GraphModule(lib["default"](dev)) |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 177-185 |
| |
| Execute on the TVM Runtime |
| -------------------------- |
| Now that we've compiled the model, we can use the TVM runtime to make |
| predictions with it. To use TVM to run the model and make predictions, we |
| need two things: |
| |
| - The compiled model, which we just produced. |
| - Valid input to the model to make predictions on. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 185-192 |
| |
| .. code-block:: default |
| |
| |
| dtype = "float32" |
| module.set_input(input_name, img_data) |
| module.run() |
| output_shape = (1, 1000) |
| tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy() |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 193-200 |
| |
| Collect Basic Performance Data |
| ------------------------------ |
| We want to collect some basic performance data associated with this |
| unoptimized model and compare it to a tuned model later. To help account for |
| CPU noise, we run the computation in multiple batches in multiple |
| repetitions, then gather some basic statistics on the mean, median, and |
| standard deviation. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 200-217 |
| |
| .. code-block:: default |
| |
| import timeit |
| |
| timing_number = 10 |
| timing_repeat = 10 |
| unoptimized = ( |
| np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)) |
| * 1000 |
| / timing_number |
| ) |
| unoptimized = { |
| "mean": np.mean(unoptimized), |
| "median": np.median(unoptimized), |
| "std": np.std(unoptimized), |
| } |
| |
| print(unoptimized) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| .. code-block:: none |
| |
| {'mean': 503.31041855000313, 'median': 502.7171557000031, 'std': 1.893587549916346} |
| |
| |
| |
| |
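The timing code above can also be written with only the standard library. The sketch below uses a hypothetical helper name, ``benchmark_ms``, and a cheap stand-in workload instead of ``module.run()``. Each ``timeit`` sample is the total time of ``number`` calls, so it is divided by ``number`` to get milliseconds per call, as in the tutorial code.

```python
import statistics
import timeit


def benchmark_ms(fn, number=10, repeat=10):
    """Time fn with timeit and return per-call statistics in milliseconds."""
    # repeat() returns `repeat` totals, each covering `number` calls of fn.
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_call_ms = [t * 1000 / number for t in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        # pstdev is the population standard deviation, matching np.std's default.
        "std": statistics.pstdev(per_call_ms),
    }


# Example with a cheap stand-in workload; with TVM you would pass module.run.
stats = benchmark_ms(lambda: sum(range(1000)), number=5, repeat=3)
print(stats)
```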
| .. GENERATED FROM PYTHON SOURCE LINES 218-227 |
| |
| Postprocess the output |
| ---------------------- |
| |
| As previously mentioned, each model will have its own particular way of |
| providing output tensors. |
| |
| In our case, we need to run some post-processing to render the outputs from |
| ResNet-50 v2 into a more human-readable form, using the lookup table provided |
| for the model. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 227-244 |
| |
| .. code-block:: default |
| |
| |
| from scipy.special import softmax |
| |
| # Download a list of labels |
| labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt" |
| labels_path = download_testdata(labels_url, "synset.txt", module="data") |
| |
| with open(labels_path, "r") as f: |
|     labels = [l.rstrip() for l in f] |
| |
| # Convert the raw output scores into probabilities and rank them |
| scores = softmax(tvm_output) |
| scores = np.squeeze(scores) |
| ranks = np.argsort(scores)[::-1] |
| for rank in ranks[0:5]: |
|     print("class='%s' with probability=%f" % (labels[rank], scores[rank])) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| .. code-block:: none |
| |
| class='n02123045 tabby, tabby cat' with probability=0.621103 |
| class='n02123159 tiger cat' with probability=0.356379 |
| class='n02124075 Egyptian cat' with probability=0.019712 |
| class='n02129604 tiger, Panthera tigris' with probability=0.001215 |
| class='n04040759 radiator' with probability=0.000262 |
| |
| |
| |
| |
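If you would rather not depend on SciPy, the softmax-and-rank step can be sketched with NumPy alone. This version subtracts the maximum logit before exponentiating, a standard trick to avoid overflow. ``scipy.special.softmax`` with its default ``axis=None`` normalizes over the whole array, which for a ``(1, 1000)`` output is equivalent to normalizing over the last axis as done here. The logits and labels below are made up for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


def top_k(scores: np.ndarray, labels, k=5):
    """Return (label, probability) pairs for the k highest scores, best first."""
    ranks = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in ranks]


# Toy example: 4 classes with made-up logits and hypothetical labels.
logits = np.array([[1.0, 3.0, 0.5, 2.0]])
probs = np.squeeze(softmax(logits))
best = top_k(probs, ["a", "b", "c", "d"], k=2)
print(best)
```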
| .. GENERATED FROM PYTHON SOURCE LINES 245-254 |
| |
| This should produce output similar to the following (the exact probabilities may differ slightly): |
| |
| .. code-block:: bash |
| |
| # class='n02123045 tabby, tabby cat' with probability=0.610553 |
| # class='n02123159 tiger cat' with probability=0.367179 |
| # class='n02124075 Egyptian cat' with probability=0.019365 |
| # class='n02129604 tiger, Panthera tigris' with probability=0.001273 |
| # class='n04040759 radiator' with probability=0.000261 |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 256-278 |
| |
| Tune the model |
| -------------- |
| The previous model was compiled to work on the TVM runtime, but did not |
| include any platform-specific optimization. In this section, we will show you |
| how to build an optimized model using TVM to target your working platform. |
| |
| In some cases, we might not get the expected performance when running |
| inferences using our compiled module. In cases like this, we can make use of |
| the auto-tuner to find a better configuration for our model and get a boost |
| in performance. Tuning in TVM refers to the process by which a model is |
| optimized to run faster on a given target. This differs from training or |
| fine-tuning in that it does not affect the accuracy of the model, but only |
| the runtime performance. As part of the tuning process, TVM will try running |
| many different operator implementation variants to see which perform best. |
| The results of these runs are stored in a tuning records file. |
| |
| In the simplest form, tuning requires you to provide three things: |
| |
| - the target specification of the device you intend to run this model on |
| - the path to an output file in which the tuning records will be stored |
| - a path to the model to be tuned. |
| |
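As a loose analogy only (this is not how AutoTVM works internally), the sketch below "tunes" a toy operation: it times several implementation variants that all compute the same result, writes one JSON record per measurement to a records file, and picks the fastest. Like real tuning, the choice changes runtime but not the result. The variant functions and the ``toy-tuning-records.jsonl`` file are hypothetical.

```python
import json
import os
import tempfile
import timeit


# Toy "implementation variants": each computes the same sum of squares.
def squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


def squares_generator(xs):
    return sum(x * x for x in xs)


def squares_map(xs):
    return sum(map(lambda x: x * x, xs))


VARIANTS = {"loop": squares_loop, "generator": squares_generator, "map": squares_map}
data = list(range(10_000))

records_path = os.path.join(tempfile.mkdtemp(), "toy-tuning-records.jsonl")
results = {}
with open(records_path, "w") as records:
    for name, fn in VARIANTS.items():
        # Keep the best of a few repeats to reduce timing noise.
        seconds = min(timeit.Timer(lambda: fn(data)).repeat(repeat=3, number=5))
        results[name] = seconds
        # One JSON record per measured variant, loosely like a tuning log.
        records.write(json.dumps({"variant": name, "seconds": seconds}) + "\n")

best = min(results, key=results.get)
print("fastest variant:", best)
```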
| |
| .. GENERATED FROM PYTHON SOURCE LINES 278-283 |
| |
| .. code-block:: default |
| |
| |
| import tvm.auto_scheduler as auto_scheduler |
| from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner |
| from tvm import autotvm |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 284-294 |
| |
| Set up some basic parameters for the runner. The runner takes compiled code |
| that is generated with a specific set of parameters and measures the |
| performance of it. ``number`` specifies how many times the generated code is |
| run to produce one averaged measurement, while ``repeat`` specifies how many |
| such measurements we will take of each configuration. ``min_repeat_ms`` sets |
| the minimum duration, in milliseconds, of one measurement: if ``number`` runs |
| finish faster than this, ``number`` is increased. This option is necessary |
| for accurate tuning on GPUs, and is not required for CPU tuning. Setting this |
| value to 0 disables it. The ``timeout`` places an upper limit on how long each |
| tested configuration is allowed to run. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 294-309 |
| |
| .. code-block:: default |
| |
| |
| number = 10 |
| repeat = 1 |
| min_repeat_ms = 0 # since we're tuning on a CPU, can be set to 0 |
| timeout = 10 # in seconds |
| |
| # create a TVM runner |
| runner = autotvm.LocalRunner( |
| number=number, |
| repeat=repeat, |
| timeout=timeout, |
| min_repeat_ms=min_repeat_ms, |
| enable_cpu_cache_flush=True, |
| ) |
| |
| |
| |
| |
| |
| |
| |
| |
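To get a feel for what these runner settings cost, the arithmetic can be written out. The sketch below relies on our reading of the AutoTVM runner documentation: each configuration is run ``1 + number * repeat`` times, the first run being a discarded warm-up; with ``min_repeat_ms = 0``, as here, ``number`` is not adjusted. The worst-case bound simply multiplies the trial count by ``timeout`` and ignores build time and other overheads. Both helper names are hypothetical.

```python
def total_runs_per_config(number: int, repeat: int) -> int:
    """Runs of the generated code per configuration: one discarded warm-up run,
    then `repeat` measurements, each averaging `number` runs."""
    return 1 + number * repeat


def worst_case_run_seconds(n_trials: int, timeout: float) -> float:
    """Crude upper bound on measurement time for one task if every trial ran
    until the runner timeout (ignores build time and other overheads)."""
    return n_trials * timeout


# Values used in this tutorial: number=10, repeat=1, timeout=10 s, 20 trials.
print(total_runs_per_config(10, 1))  # 11
print(worst_case_run_seconds(20, 10))  # 200
```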
| .. GENERATED FROM PYTHON SOURCE LINES 310-324 |
| |
| Create a simple structure for holding tuning options. We use an XGBoost |
| algorithm for guiding the search. For a production job, you will want to set |
| the number of trials to be larger than the value of 20 used here. For CPU, we |
| recommend 1500; for GPU, 3000-4000. The number of trials required can depend |
| on the particular model and processor, so it's worth spending some time |
| evaluating performance across a range of values to find the best balance |
| between tuning time and model optimization. Because running tuning is time |
| intensive, we set the number of trials to 20 here, but do not recommend a value this |
| small. The ``early_stopping`` parameter ends the search for a task once that |
| many consecutive trials have failed to improve on the best configuration found. The |
| measure option indicates where trial code will be built, and where it will be |
| run. In this case, we're using the ``LocalRunner`` we just created and a |
| ``LocalBuilder``. The ``tuning_records`` option specifies a file to write |
| the tuning data to. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 324-335 |
| |
| .. code-block:: default |
| |
| |
| tuning_option = { |
| "tuner": "xgb", |
| "trials": 20, |
| "early_stopping": 100, |
| "measure_option": autotvm.measure_option( |
| builder=autotvm.LocalBuilder(build_func="default"), runner=runner |
| ), |
| "tuning_records": "resnet-50-v2-autotuning.json", |
| } |
| |
| |
| |
| |
| |
| |
| |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 336-341 |
| |
| .. admonition:: Defining the Tuning Search Algorithm |
| |
| In this example, the search is guided by an XGBoost-based tuner (``XGBTuner``). |
| Depending on your model complexity and amount of time available, you might |
| want to choose a different algorithm. |
| |
| .. GENERATED FROM PYTHON SOURCE LINES 344-351 |
| |
| .. admonition:: Setting Tuning Parameters |
| |
| In this example, in the interest of time, we set the number of trials and |
| early stopping to 20 and 100. You will likely see larger performance improvements if |
| you set these values higher, but this comes at the expense of time |
| spent tuning. The number of trials required for convergence will vary |
| depending on the specifics of the model and the target platform. |
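One consequence of these settings is that early stopping can never trigger here: 100 trials without improvement cannot occur within a 20-trial budget. The sketch below is a simplified model of a trial-budgeted search with early stopping, not TVM's code; random numbers stand in for measured GFLOPS values.

```python
import random


def simulate_search(n_trial: int, early_stopping: int, seed: int = 0):
    """Simplified model of a trial-budgeted search with early stopping.

    Stops after n_trial trials, or earlier once `early_stopping` consecutive
    trials fail to improve on the best score seen so far.
    """
    rng = random.Random(seed)
    best, best_trial = float("-inf"), 0
    trials_run = 0
    for trial in range(n_trial):
        trials_run += 1
        score = rng.random()  # stand-in for a measured GFLOPS value
        if score > best:
            best, best_trial = score, trial
        if trial - best_trial >= early_stopping:
            break
    return trials_run, best


print(simulate_search(20, 100))  # runs all 20 trials: early stopping cannot fire
print(simulate_search(1000, 3))  # usually stops long before 1000 trials
```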
| |
| .. GENERATED FROM PYTHON SOURCE LINES 351-406 |
| |
| .. code-block:: default |
| |
| |
| # Begin by extracting the tuning tasks from the Relay module imported from ONNX |
| tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params) |
| |
| # Tune the extracted tasks sequentially. |
| for i, task in enumerate(tasks): |
|     prefix = "[Task %2d/%2d] " % (i + 1, len(tasks)) |
| |
|     # choose tuner |
|     tuner = "xgb" |
| |
|     # create tuner |
|     if tuner == "xgb": |
|         tuner_obj = XGBTuner(task, loss_type="reg") |
|     elif tuner == "xgb_knob": |
|         tuner_obj = XGBTuner(task, loss_type="reg", feature_type="knob") |
|     elif tuner == "xgb_itervar": |
|         tuner_obj = XGBTuner(task, loss_type="reg", feature_type="itervar") |
|     elif tuner == "xgb_curve": |
|         tuner_obj = XGBTuner(task, loss_type="reg", feature_type="curve") |
|     elif tuner == "xgb_rank": |
|         tuner_obj = XGBTuner(task, loss_type="rank") |
|     elif tuner == "xgb_rank_knob": |
|         tuner_obj = XGBTuner(task, loss_type="rank", feature_type="knob") |
|     elif tuner == "xgb_rank_itervar": |
|         tuner_obj = XGBTuner(task, loss_type="rank", feature_type="itervar") |
|     elif tuner == "xgb_rank_curve": |
|         tuner_obj = XGBTuner(task, loss_type="rank", feature_type="curve") |
|     elif tuner == "xgb_rank_binary": |
|         tuner_obj = XGBTuner(task, loss_type="rank-binary") |
|     elif tuner == "xgb_rank_binary_knob": |
|         tuner_obj = XGBTuner(task, loss_type="rank-binary", feature_type="knob") |
|     elif tuner == "xgb_rank_binary_itervar": |
|         tuner_obj = XGBTuner(task, loss_type="rank-binary", feature_type="itervar") |
|     elif tuner == "xgb_rank_binary_curve": |
|         tuner_obj = XGBTuner(task, loss_type="rank-binary", feature_type="curve") |
|     elif tuner == "ga": |
|         tuner_obj = GATuner(task, pop_size=50) |
|     elif tuner == "random": |
|         tuner_obj = RandomTuner(task) |
|     elif tuner == "gridsearch": |
|         tuner_obj = GridSearchTuner(task) |
|     else: |
|         raise ValueError("Invalid tuner: " + tuner) |
| |
|     tuner_obj.tune( |
|         n_trial=min(tuning_option["trials"], len(task.config_space)), |
|         early_stopping=tuning_option["early_stopping"], |
|         measure_option=tuning_option["measure_option"], |
|         callbacks=[ |
|             autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix), |
|             autotvm.callback.log_to_file(tuning_option["tuning_records"]), |
|         ], |
|     ) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| .. code-block:: none |
| |
|
[Task 1/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 1/25] Current/Best: 11.84/ 23.31 GFLOPS | Progress: (4/20) | 9.03 s
[Task 1/25] Current/Best: 4.83/ 23.31 GFLOPS | Progress: (8/20) | 11.79 s
[Task 1/25] Current/Best: 21.36/ 23.31 GFLOPS | Progress: (12/20) | 15.45 s
[Task 1/25] Current/Best: 13.72/ 23.31 GFLOPS | Progress: (16/20) | 17.94 s
[Task 1/25] Current/Best: 20.22/ 23.31 GFLOPS | Progress: (20/20) | 21.23 s Done. |
|
[Task 2/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 2/25] Current/Best: 4.90/ 12.97 GFLOPS | Progress: (4/20) | 4.87 s
[Task 2/25] Current/Best: 13.52/ 19.73 GFLOPS | Progress: (8/20) | 6.68 s
[Task 2/25] Current/Best: 16.20/ 19.73 GFLOPS | Progress: (12/20) | 8.36 s
[Task 2/25] Current/Best: 15.19/ 19.73 GFLOPS | Progress: (16/20) | 9.78 s
[Task 2/25] Current/Best: 13.40/ 19.73 GFLOPS | Progress: (20/20) | 11.46 s Done. |
|
[Task 3/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 3/25] Current/Best: 6.91/ 23.32 GFLOPS | Progress: (4/20) | 5.20 s
[Task 3/25] Current/Best: 20.73/ 23.32 GFLOPS | Progress: (8/20) | 7.38 s
[Task 3/25] Current/Best: 7.65/ 23.32 GFLOPS | Progress: (12/20) | 9.90 s
[Task 3/25] Current/Best: 12.72/ 23.32 GFLOPS | Progress: (16/20) | 13.03 s
[Task 3/25] Current/Best: 18.90/ 23.32 GFLOPS | Progress: (20/20) | 15.25 s Done. |
|
[Task 4/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 4/25] Current/Best: 11.60/ 14.46 GFLOPS | Progress: (4/20) | 5.17 s
[Task 4/25] Current/Best: 11.61/ 14.46 GFLOPS | Progress: (8/20) | 8.44 s
[Task 4/25] Current/Best: 12.23/ 14.46 GFLOPS | Progress: (12/20) | 11.45 s
[Task 4/25] Current/Best: 17.67/ 17.67 GFLOPS | Progress: (16/20) | 13.88 s
[Task 4/25] Current/Best: 6.92/ 17.67 GFLOPS | Progress: (20/20) | 19.49 s Done. |
|
[Task 5/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 5/25] Current/Best: 14.70/ 14.70 GFLOPS | Progress: (4/20) | 5.35 s
[Task 5/25] Current/Best: 14.58/ 14.77 GFLOPS | Progress: (8/20) | 7.69 s
[Task 5/25] Current/Best: 17.46/ 18.91 GFLOPS | Progress: (12/20) | 9.62 s
[Task 5/25] Current/Best: 16.30/ 18.91 GFLOPS | Progress: (16/20) | 11.96 s
[Task 5/25] Current/Best: 17.71/ 18.91 GFLOPS | Progress: (20/20) | 14.14 s Done. |
|
[Task 6/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 6/25] Current/Best: 9.26/ 20.53 GFLOPS | Progress: (4/20) | 6.01 s
[Task 6/25] Current/Best: 14.94/ 20.53 GFLOPS | Progress: (8/20) | 8.60 s
[Task 6/25] Current/Best: 11.64/ 20.53 GFLOPS | Progress: (12/20) | 11.08 s
[Task 6/25] Current/Best: 10.89/ 20.53 GFLOPS | Progress: (16/20) | 13.80 s
[Task 6/25] Current/Best: 11.60/ 20.53 GFLOPS | Progress: (20/20) | 17.06 s Done. |
|
[Task 7/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 7/25] Current/Best: 19.65/ 19.65 GFLOPS | Progress: (4/20) | 5.42 s
[Task 7/25] Current/Best: 12.92/ 19.65 GFLOPS | Progress: (8/20) | 7.97 s
[Task 7/25] Current/Best: 9.92/ 20.77 GFLOPS | Progress: (12/20) | 10.60 s
[Task 7/25] Current/Best: 14.00/ 20.77 GFLOPS | Progress: (16/20) | 13.53 s
[Task 7/25] Current/Best: 12.60/ 20.77 GFLOPS | Progress: (20/20) | 16.93 s Done. |
|
[Task 8/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 8/25] Current/Best: 10.41/ 20.22 GFLOPS | Progress: (4/20) | 14.28 s
[Task 8/25] Current/Best: 10.04/ 20.22 GFLOPS | Progress: (8/20) | 21.53 s
[Task 8/25] Current/Best: 11.56/ 20.22 GFLOPS | Progress: (12/20) | 25.93 s
[Task 8/25] Current/Best: 16.20/ 20.22 GFLOPS | Progress: (16/20) | 37.38 s
[Task 8/25] Current/Best: 13.52/ 20.22 GFLOPS | Progress: (20/20) | 40.88 s |
|
[Task 9/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 9/25] Current/Best: 6.85/ 19.06 GFLOPS | Progress: (4/20) | 6.02 s
[Task 9/25] Current/Best: 11.98/ 19.06 GFLOPS | Progress: (8/20) | 15.16 s
[Task 9/25] Current/Best: 9.64/ 19.06 GFLOPS | Progress: (12/20) | 21.03 s
[Task 9/25] Current/Best: 10.25/ 19.06 GFLOPS | Progress: (16/20) | 26.57 s
[Task 9/25] Current/Best: 16.79/ 19.06 GFLOPS | Progress: (20/20) | 29.07 s Done. |
|
[Task 10/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 10/25] Current/Best: 14.24/ 14.24 GFLOPS | Progress: (4/20) | 5.32 s
[Task 10/25] Current/Best: 12.83/ 17.14 GFLOPS | Progress: (8/20) | 7.57 s
[Task 10/25] Current/Best: 10.16/ 17.14 GFLOPS | Progress: (12/20) | 10.17 s
[Task 10/25] Current/Best: 14.79/ 17.14 GFLOPS | Progress: (16/20) | 11.84 s
[Task 10/25] Current/Best: 18.04/ 18.04 GFLOPS | Progress: (20/20) | 14.40 s Done. |
|
[Task 11/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 11/25] Current/Best: 23.38/ 23.38 GFLOPS | Progress: (4/20) | 4.72 s
[Task 11/25] Current/Best: 11.88/ 23.73 GFLOPS | Progress: (8/20) | 7.21 s
[Task 11/25] Current/Best: 8.90/ 23.73 GFLOPS | Progress: (12/20) | 9.54 s
[Task 11/25] Current/Best: 10.59/ 23.73 GFLOPS | Progress: (16/20) | 13.28 s
[Task 11/25] Current/Best: 3.09/ 23.73 GFLOPS | Progress: (20/20) | 16.08 s Done. |
|
[Task 12/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 12/25] Current/Best: 4.09/ 17.37 GFLOPS | Progress: (4/20) | 6.39 s
[Task 12/25] Current/Best: 17.98/ 17.98 GFLOPS | Progress: (8/20) | 10.05 s
[Task 12/25] Current/Best: 13.32/ 17.98 GFLOPS | Progress: (12/20) | 12.79 s
[Task 12/25] Current/Best: 10.69/ 17.98 GFLOPS | Progress: (16/20) | 16.26 s
[Task 12/25] Current/Best: 18.47/ 18.47 GFLOPS | Progress: (20/20) | 18.48 s Done. |
|
[Task 13/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 13/25] Current/Best: 17.90/ 19.23 GFLOPS | Progress: (4/20) | 5.24 s
[Task 13/25] Current/Best: 18.84/ 19.23 GFLOPS | Progress: (8/20) | 8.74 s
[Task 13/25] Current/Best: 9.41/ 19.23 GFLOPS | Progress: (12/20) | 11.38 s
[Task 13/25] Current/Best: 19.25/ 19.25 GFLOPS | Progress: (16/20) | 13.67 s
[Task 13/25] Current/Best: 12.79/ 19.25 GFLOPS | Progress: (20/20) | 16.28 s Done. |
|
[Task 14/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 14/25] Current/Best: 14.23/ 16.92 GFLOPS | Progress: (4/20) | 7.61 s
[Task 14/25] Current/Best: 15.90/ 17.44 GFLOPS | Progress: (8/20) | 9.55 s
[Task 14/25] Current/Best: 16.03/ 17.44 GFLOPS | Progress: (12/20) | 19.71 s
[Task 14/25] Current/Best: 20.22/ 20.22 GFLOPS | Progress: (16/20) | 22.48 s
[Task 14/25] Current/Best: 13.06/ 20.22 GFLOPS | Progress: (20/20) | 28.78 s Done. |
|
[Task 15/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 15/25] Current/Best: 6.16/ 19.06 GFLOPS | Progress: (4/20) | 4.89 s
[Task 15/25] Current/Best: 12.34/ 19.06 GFLOPS | Progress: (8/20) | 6.73 s
[Task 15/25] Current/Best: 7.91/ 20.47 GFLOPS | Progress: (12/20) | 11.19 s
[Task 15/25] Current/Best: 12.01/ 20.47 GFLOPS | Progress: (16/20) | 20.25 s
[Task 15/25] Current/Best: 10.48/ 20.47 GFLOPS | Progress: (20/20) | 22.87 s Done. |
|
[Task 16/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 16/25] Current/Best: 9.89/ 18.02 GFLOPS | Progress: (4/20) | 6.81 s
[Task 16/25] Current/Best: 20.60/ 20.60 GFLOPS | Progress: (8/20) | 9.69 s
[Task 16/25] Current/Best: 3.11/ 20.60 GFLOPS | Progress: (12/20) | 11.64 s
[Task 16/25] Current/Best: 6.62/ 20.60 GFLOPS | Progress: (16/20) | 14.15 s
[Task 16/25] Current/Best: 8.28/ 20.60 GFLOPS | Progress: (20/20) | 15.78 s Done. |
|
[Task 17/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 17/25] Current/Best: 18.34/ 18.34 GFLOPS | Progress: (4/20) | 7.24 s
[Task 17/25] Current/Best: 13.78/ 18.34 GFLOPS | Progress: (8/20) | 9.91 s
[Task 17/25] Current/Best: 9.42/ 19.96 GFLOPS | Progress: (12/20) | 13.06 s
[Task 17/25] Current/Best: 19.28/ 21.98 GFLOPS | Progress: (16/20) | 15.57 s
[Task 17/25] Current/Best: 22.11/ 22.11 GFLOPS | Progress: (20/20) | 18.33 s Done. |
|
[Task 18/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 18/25] Current/Best: 12.01/ 12.44 GFLOPS | Progress: (4/20) | 8.19 s
[Task 18/25] Current/Best: 6.83/ 14.57 GFLOPS | Progress: (8/20) | 10.81 s
[Task 18/25] Current/Best: 15.82/ 15.82 GFLOPS | Progress: (12/20) | 13.12 s
[Task 18/25] Current/Best: 15.86/ 15.86 GFLOPS | Progress: (16/20) | 15.51 s
[Task 18/25] Current/Best: 13.74/ 17.28 GFLOPS | Progress: (20/20) | 17.67 s Done. |
|
[Task 19/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 19/25] Current/Best: 20.32/ 20.32 GFLOPS | Progress: (4/20) | 7.65 s
[Task 19/25] Current/Best: 8.91/ 20.32 GFLOPS | Progress: (8/20) | 11.12 s
[Task 19/25] Current/Best: 3.08/ 20.62 GFLOPS | Progress: (12/20) | 14.23 s
[Task 19/25] Current/Best: 21.75/ 21.75 GFLOPS | Progress: (16/20) | 20.68 s
[Task 19/25] Current/Best: 12.59/ 21.75 GFLOPS | Progress: (20/20) | 24.29 s Done. |
|
[Task 20/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 20/25] Current/Best: 9.98/ 9.98 GFLOPS | Progress: (4/20) | 6.96 s
[Task 20/25] Current/Best: 19.58/ 19.58 GFLOPS | Progress: (8/20) | 10.77 s
[Task 20/25] Current/Best: 6.78/ 19.58 GFLOPS | Progress: (12/20) | 18.84 s
[Task 20/25] Current/Best: 12.25/ 19.58 GFLOPS | Progress: (16/20) | 21.41 s
[Task 20/25] Current/Best: 17.55/ 19.58 GFLOPS | Progress: (20/20) | 33.01 s Done. |
|
[Task 21/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 21/25] Current/Best: 6.26/ 17.77 GFLOPS | Progress: (4/20) | 14.40 s
[Task 21/25] Current/Best: 20.50/ 20.58 GFLOPS | Progress: (8/20) | 17.82 s
[Task 21/25] Current/Best: 21.58/ 21.58 GFLOPS | Progress: (12/20) | 26.96 s
[Task 21/25] Current/Best: 13.23/ 21.58 GFLOPS | Progress: (16/20) | 29.34 s
[Task 21/25] Current/Best: 7.98/ 21.58 GFLOPS | Progress: (20/20) | 40.61 s Done. |
|
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 22/25] Current/Best: 5.10/ 20.93 GFLOPS | Progress: (4/20) | 5.85 s
[Task 22/25] Current/Best: 16.30/ 20.93 GFLOPS | Progress: (8/20) | 8.08 s
[Task 22/25] Current/Best: 15.27/ 20.93 GFLOPS | Progress: (12/20) | 10.14 s
[Task 22/25] Current/Best: 18.37/ 20.93 GFLOPS | Progress: (16/20) | 12.91 s
[Task 22/25] Current/Best: 6.29/ 20.93 GFLOPS | Progress: (20/20) | 15.12 s Done. |
|
[Task 23/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 23/25] Current/Best: 1.55/ 21.50 GFLOPS | Progress: (4/20) | 6.92 s
[Task 23/25] Current/Best: 18.73/ 21.50 GFLOPS | Progress: (8/20) | 13.59 s
[Task 23/25] Current/Best: 3.08/ 21.50 GFLOPS | Progress: (12/20) | 17.61 s
[Task 23/25] Current/Best: 2.69/ 22.94 GFLOPS | Progress: (16/20) | 22.04 s
[Task 23/25] Current/Best: 20.12/ 22.94 GFLOPS | Progress: (20/20) | 24.80 s Done. |
|
[Task 24/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 24/25] Current/Best: 7.31/ 9.96 GFLOPS | Progress: (4/20) | 13.34 s
[Task 24/25] Current/Best: 2.87/ 9.96 GFLOPS | Progress: (8/20) | 16.52 s
[Task 24/25] Current/Best: 9.26/ 9.96 GFLOPS | Progress: (12/20) | 18.50 s
[Task 24/25] Current/Best: 5.71/ 9.96 GFLOPS | Progress: (16/20) | 29.56 s
[Task 24/25] Current/Best: 6.97/ 9.96 GFLOPS | Progress: (20/20) | 40.62 s Done. |
|
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s
[Task 25/25] Current/Best: 8.53/ 8.53 GFLOPS | Progress: (4/20) | 5.80 s
[Task 25/25] Current/Best: 2.59/ 8.53 GFLOPS | Progress: (8/20) | 7.35 s
[Task 25/25] Current/Best: 2.81/ 8.53 GFLOPS | Progress: (12/20) | 18.39 s
[Task 25/25] Current/Best: 1.54/ 8.53 GFLOPS | Progress: (16/20) | 30.71 s
[Task 25/25] Current/Best: 2.99/ 8.53 GFLOPS | Progress: (20/20) | 41.69 s Done. |
| |
| |
| |
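The ``if``/``elif`` chain in the tuning loop maps each XGBoost tuner name to a ``loss_type`` and an optional ``feature_type``. As a sketch of a more compact alternative (a design choice, not necessarily how TVM's own scripts do it), the same mapping can be generated as a lookup table. Only the plain-Python table is shown and tested here; the helper name ``xgb_tuner_kwargs`` is hypothetical.

```python
# Keyword arguments for XGBTuner, keyed by the tuner names used in the loop above.
XGB_TUNER_KWARGS = {}
for base_name, loss_type in [("xgb", "reg"), ("xgb_rank", "rank"), ("xgb_rank_binary", "rank-binary")]:
    # The bare name uses XGBTuner's default feature type.
    XGB_TUNER_KWARGS[base_name] = {"loss_type": loss_type}
    for feature_type in ("knob", "itervar", "curve"):
        XGB_TUNER_KWARGS[f"{base_name}_{feature_type}"] = {
            "loss_type": loss_type,
            "feature_type": feature_type,
        }


def xgb_tuner_kwargs(name: str) -> dict:
    """Look up XGBTuner keyword arguments for a tuner name, or raise ValueError."""
    try:
        return dict(XGB_TUNER_KWARGS[name])
    except KeyError:
        raise ValueError("Invalid tuner: " + name) from None


print(sorted(XGB_TUNER_KWARGS))
```

In the loop, ``tuner_obj = XGBTuner(task, **xgb_tuner_kwargs(tuner))`` would then replace the twelve XGBoost branches, while ``ga``, ``random``, and ``gridsearch`` still need their own branches because they use different tuner classes. Keeping the mapping in data also makes it easy to test without TVM installed.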
.. GENERATED FROM PYTHON SOURCE LINES 407-436

The run above uses a budget of only 20 trials per task. A longer run with a
budget of 1000 trials per task produces output that looks something like the
following; the number of tasks, the throughput, and the timings will vary with
your target and hardware:

.. code-block:: bash

    # [Task 1/24] Current/Best: 10.71/ 21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.
    # [Task 1/24] Current/Best: 9.32/ 24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.
    # [Task 2/24] Current/Best: 22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.
    # [Task 3/24] Current/Best: 32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done.
    # [Task 4/24] Current/Best: 11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done.
    # [Task 5/24] Current/Best: 23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done.
    # [Task 6/24] Current/Best: 38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done.
    # [Task 7/24] Current/Best: 4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done.
    # [Task 8/24] Current/Best: 50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done.
    # [Task 9/24] Current/Best: 50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done.
    # [Task 10/24] Current/Best: 12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done.
    # [Task 11/24] Current/Best: 62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done.
    # [Task 12/24] Current/Best: 36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done.
    # [Task 13/24] Current/Best: 7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done.
    # [Task 14/24] Current/Best: 12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done.
    # [Task 15/24] Current/Best: 31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done.
    # [Task 16/24] Current/Best: 31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done.
    # [Task 17/24] Current/Best: 41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done.
    # [Task 18/24] Current/Best: 15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done.
    # [Task 19/24] Current/Best: 15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done.
    # [Task 20/24] Current/Best: 30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done.
    # [Task 21/24] Current/Best: 46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done.
    # [Task 22/24] Current/Best: 13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done.
    # [Task 23/24] Current/Best: 53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done.
    # [Task 24/24] Current/Best: 25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done.
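
Each progress line reports the throughput of the most recently measured
configuration (``Current``), the best throughput found so far for that task
(``Best``), the number of trials completed out of the trial budget
(``Progress``), and the elapsed time in seconds. As a rough sketch, assuming
exactly the line format shown above, a regular expression can extract these
fields, for example to tabulate the best GFLOPS reached by each task. The
helpers ``parse_progress`` and ``best_per_task`` are hypothetical and are not
part of TVM.

.. code-block:: python

    import re

    # Assumed line format, taken from the example output above.
    PROGRESS_RE = re.compile(
        r"\[Task\s+(\d+)/\s*(\d+)\]\s+Current/Best:\s+([\d.]+)/\s*([\d.]+) GFLOPS"
        r"\s*\|\s*Progress: \((\d+)/(\d+)\)\s*\|\s*([\d.]+) s"
    )


    def parse_progress(line):
        """Return the fields of one progress line as a dict, or None if it does not match."""
        match = PROGRESS_RE.search(line)
        if match is None:
            return None
        task, num_tasks, current, best, done, total, seconds = match.groups()
        return {
            "task": int(task),
            "num_tasks": int(num_tasks),
            "current_gflops": float(current),
            "best_gflops": float(best),
            "trials_done": int(done),
            "trials_total": int(total),
            "elapsed_s": float(seconds),
        }


    def best_per_task(lines):
        """Highest 'Best' GFLOPS seen for each task number."""
        best = {}
        for line in lines:
            record = parse_progress(line)
            if record is not None:
                task = record["task"]
                best[task] = max(best.get(task, 0.0), record["best_gflops"])
        return best


    example_log = [
        "# [Task 1/24] Current/Best: 10.71/ 21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.",
        "# [Task 1/24] Current/Best: 9.32/ 24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.",
        "# [Task 2/24] Current/Best: 22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.",
    ]
    print(best_per_task(example_log))  # {1: 24.18, 2: 177.59}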

.. GENERATED FROM PYTHON SOURCE LINES 438-447

Compiling an Optimized Model with Tuning Data
----------------------------------------------

The tuning process above wrote its tuning records to
``resnet-50-v2-autotuning.json``. The compiler uses these records to generate
high-performance code for the model on your specified target.

Now that tuning data for the model has been collected, we can re-compile the
model using the optimized operators to speed up our computations.

.. GENERATED FROM PYTHON SOURCE LINES 447-455

.. code-block:: default


    # apply_history_best loads the tuning records; while it is active,
    # relay.build uses the best recorded configuration for each tuned
    # operator instead of the default one.
    with autotvm.apply_history_best(tuning_option["tuning_records"]):
        with tvm.transform.PassContext(opt_level=3, config={}):
            lib = relay.build(mod, target=target, params=params)

    # Create a graph executor for the tuned library on the target device.
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))





.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Done.
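
Conceptually, applying the tuning history means that, for each tuned workload,
compilation uses the successful configuration with the lowest mean measured
cost. The sketch below imitates that selection on a few JSON-lines records. The
records and their simplified layout (``input`` as
``[target, task_name, args, kwargs]`` and ``result`` as
``[costs, error_no, all_cost, timestamp]``) are assumptions modelled loosely on
AutoTVM's log format, not a faithful copy of it; real logs carry more fields,
and TVM also takes the target into account. In practice, prefer TVM's own
utilities, such as ``autotvm.record.pick_best``, over parsing logs by hand;
``best_configs`` here is a hypothetical helper.

.. code-block:: python

    import json
    import statistics

    # Hypothetical records loosely modelled on AutoTVM's JSON-lines log format.
    # Assumed layout: "input"  = [target, task_name, args, kwargs]
    #                 "result" = [costs_in_seconds, error_no, all_cost, timestamp]
    RECORDS = """\
    {"input": ["llvm", "conv2d_NCHWc.x86", [1, 64], {}], "config": {"index": 3}, "result": [[0.0021, 0.0023], 0, 1.0, 0]}
    {"input": ["llvm", "conv2d_NCHWc.x86", [1, 64], {}], "config": {"index": 7}, "result": [[0.0012, 0.0014], 0, 1.0, 0]}
    {"input": ["llvm", "conv2d_NCHWc.x86", [1, 64], {}], "config": {"index": 9}, "result": [[1e9], 4, 1.0, 0]}
    {"input": ["llvm", "dense_pack.x86", [1, 2048], {}], "config": {"index": 2}, "result": [[0.0005], 0, 1.0, 0]}
    """


    def best_configs(log_text):
        """Map each (task name, args) workload to the (mean cost, config) pair with the lowest mean cost."""
        best = {}
        for line in log_text.splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            _target, name, args, _kwargs = record["input"]
            costs, error_no = record["result"][0], record["result"][1]
            if error_no != 0:  # skip measurements that failed
                continue
            key = (name, json.dumps(args))
            mean_cost = statistics.mean(costs)
            if key not in best or mean_cost < best[key][0]:
                best[key] = (mean_cost, record["config"])
        return best


    for (name, args), (cost, config) in best_configs(RECORDS).items():
        print("%s %s -> config %s, %.2f ms" % (name, args, config, cost * 1000))

Failed measurements (a non-zero error code) are skipped so that a broken
configuration can never be selected, however its recorded cost looks.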




.. GENERATED FROM PYTHON SOURCE LINES 456-457

Verify that the optimized model runs and produces the same results:

.. GENERATED FROM PYTHON SOURCE LINES 457-470

.. code-block:: default


    # input_name, img_data, labels and softmax are defined earlier in the tutorial.
    # Feed the preprocessed image to the tuned module and run inference.
    module.set_input(input_name, img_data)
    module.run()

    # The model produces one score for each of the 1000 ImageNet classes.
    output_shape = (1, 1000)
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

    # Convert the scores to probabilities and print the five most likely classes.
    scores = softmax(tvm_output)
    scores = np.squeeze(scores)
    ranks = np.argsort(scores)[::-1]
    for rank in ranks[0:5]:
        print("class='%s' with probability=%f" % (labels[rank], scores[rank]))





.. rst-class:: sphx-glr-script-out

.. code-block:: none

    class='n02123045 tabby, tabby cat' with probability=0.621104
    class='n02123159 tiger cat' with probability=0.356378
    class='n02124075 Egyptian cat' with probability=0.019712
    class='n02129604 tiger, Panthera tigris' with probability=0.001215
    class='n04040759 radiator' with probability=0.000262
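
The post-processing above converts the 1000 raw class scores into probabilities
with a softmax and sorts them to find the five most likely classes. The
following NumPy-only sketch shows the same computation; ``softmax_np`` and
``top_k`` are illustrative helpers rather than TVM or SciPy APIs, and random
numbers stand in for ``tvm_output``. Subtracting the maximum score before
exponentiating avoids overflow without changing the result.

.. code-block:: python

    import numpy as np


    def softmax_np(logits, axis=-1):
        """Numerically stable softmax along `axis`."""
        shifted = logits - np.max(logits, axis=axis, keepdims=True)
        exp = np.exp(shifted)
        return exp / np.sum(exp, axis=axis, keepdims=True)


    def top_k(probs, k=5):
        """Indices of the k largest values, highest first."""
        return np.argsort(probs)[::-1][:k]


    # Random stand-in for `tvm_output`, with the same (1, 1000) shape.
    rng = np.random.default_rng(0)
    fake_output = rng.normal(size=(1, 1000)).astype("float32")

    demo_scores = np.squeeze(softmax_np(fake_output))  # shape (1000,)
    for idx in top_k(demo_scores):
        print("class index %d with probability=%f" % (idx, demo_scores[idx]))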



.. GENERATED FROM PYTHON SOURCE LINES 471-480

For reference, the expected output is shown below. The top-5 classes and their
order are the same as above, while the exact probabilities can differ slightly
between runs and machines:

.. code-block:: bash

    # class='n02123045 tabby, tabby cat' with probability=0.610550
    # class='n02123159 tiger cat' with probability=0.367181
    # class='n02124075 Egyptian cat' with probability=0.019365
    # class='n02129604 tiger, Panthera tigris' with probability=0.001273
    # class='n04040759 radiator' with probability=0.000261
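
AutoTVM tuning changes how each operator is scheduled, not what it computes, so
the tuned model should rank the same classes in the same order. The
probabilities may still differ slightly, for example because a different
schedule can accumulate floating-point sums in a different order. The sketch
below checks both conditions using the top-5 values printed above;
``same_top_k`` is a hypothetical helper, the class names are shortened, and the
tolerance of 0.02 is an arbitrary illustrative choice.

.. code-block:: python

    import numpy as np


    def same_top_k(names_a, probs_a, names_b, probs_b, atol=0.02):
        """True if two runs list the same classes in the same order and each
        pair of probabilities differs by at most `atol` (illustrative tolerance)."""
        if list(names_a) != list(names_b):
            return False
        return bool(np.allclose(probs_a, probs_b, rtol=0.0, atol=atol))


    # Top-5 results copied from the two outputs above (class names shortened).
    top5_names = ["tabby", "tiger cat", "Egyptian cat", "tiger", "radiator"]
    tuned_probs = [0.621104, 0.356378, 0.019712, 0.001215, 0.000262]
    reference_probs = [0.610550, 0.367181, 0.019365, 0.001273, 0.000261]

    print(same_top_k(top5_names, tuned_probs, top5_names, reference_probs))  # True
    print(same_top_k(top5_names, tuned_probs, top5_names, reference_probs, atol=0.005))  # False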

.. GENERATED FROM PYTHON SOURCE LINES 482-488

Comparing the Tuned and Untuned Models
--------------------------------------

Next, we collect some basic performance data for the optimized model and
compare it with the measurements taken for the unoptimized model earlier in
this tutorial. The size of the improvement depends on your hardware, the
number of tuning trials and timing iterations, and other factors, but you
should see the optimized model run faster than the unoptimized one.

.. GENERATED FROM PYTHON SOURCE LINES 488-504

.. code-block:: default


    import timeit

    timing_number = 10  # inferences per timed round
    timing_repeat = 10  # number of timed rounds
    # Convert each round's total time (seconds) to milliseconds per inference.
    optimized = (
        np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
        * 1000
        / timing_number
    )
    optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}

    # `unoptimized` holds the same statistics measured for the untuned model earlier.
    print("optimized: %s" % (optimized))
    print("unoptimized: %s" % (unoptimized))





.. rst-class:: sphx-glr-script-out

.. code-block:: none

    optimized: {'mean': 421.9829224799946, 'median': 422.4424844499936, 'std': 2.5120245504977543}
    unoptimized: {'mean': 503.31041855000313, 'median': 502.7171557000031, 'std': 1.893587549916346}
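
Both summaries are in milliseconds per inference: the total time of each round
of ``timing_number`` runs is multiplied by 1000 and divided by
``timing_number``. In the run recorded here, the mean time per inference fell
from about 503 ms to about 422 ms, roughly a 1.19x speedup. The sketch below
wraps the same timing recipe in a reusable helper and computes the speedup from
two summaries; ``summarize_ms`` and ``speedup`` are hypothetical helper names,
and a trivial stand-in workload replaces ``module.run()``.

.. code-block:: python

    import timeit

    import numpy as np


    def summarize_ms(fn, repeat=10, number=10):
        """Time `fn` as the tutorial does: `repeat` rounds of `number` calls each,
        reported in milliseconds per call."""
        round_totals_s = np.array(timeit.Timer(fn).repeat(repeat=repeat, number=number))
        per_call_ms = round_totals_s * 1000 / number
        return {
            "mean": np.mean(per_call_ms),
            "median": np.median(per_call_ms),
            "std": np.std(per_call_ms),
        }


    def speedup(baseline, candidate, stat="median"):
        """How many times faster `candidate` is than `baseline` for one statistic."""
        return baseline[stat] / candidate[stat]


    # Summaries copied from the output above.
    untuned_ms = {"mean": 503.31041855000313, "median": 502.7171557000031, "std": 1.893587549916346}
    tuned_ms = {"mean": 421.9829224799946, "median": 422.4424844499936, "std": 2.5120245504977543}
    print("speedup (mean): %.2fx" % speedup(untuned_ms, tuned_ms, "mean"))  # speedup (mean): 1.19x

    # The helper accepts any zero-argument callable; this stand-in replaces module.run().
    demo_stats = summarize_ms(lambda: sum(range(10000)), repeat=3, number=5)
    print(sorted(demo_stats))  # ['mean', 'median', 'std']

Defaulting to the median keeps a single slow round, for example one disturbed
by a background process, from skewing the comparison.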




.. GENERATED FROM PYTHON SOURCE LINES 505-517

Final Remarks
-------------

In this tutorial, we gave a short example of how to use the TVM Python API
to compile, run, and tune a model. We also discussed the need for pre- and
post-processing of inputs and outputs. After the tuning process, we
demonstrated how to compare the performance of the unoptimized and optimized
models.

Here we presented a simple example using ResNet-50 v2 locally. However, TVM
supports many more features, including cross-compilation, remote execution,
and profiling/benchmarking.


.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 14 minutes 8.890 seconds)


.. _sphx_glr_download_tutorial_autotvm_relay_x86.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: autotvm_relay_x86.py <autotvm_relay_x86.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: autotvm_relay_x86.ipynb <autotvm_relay_x86.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_