| .. note:: |
| :class: sphx-glr-download-link-note |
| |
| Click :ref:`here <sphx_glr_download_tutorial_autotvm_relay_x86.py>` to download the full example code |
| .. rst-class:: sphx-glr-example-title |
| |
| .. _sphx_glr_tutorial_autotvm_relay_x86.py: |
| |
| |
| Compiling and Optimizing a Model with the Python Interface (AutoTVM) |
| ==================================================================== |
| **Author**: |
| `Chris Hoge <https://github.com/hogepodge>`_ |
| |
| In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a |
| pre-trained vision model, ResNet-50 v2 using the command line interface for |
| TVM, TVMC. TVM is more that just a command-line tool though, it is an |
| optimizing framework with APIs available for a number of different languages |
| that gives you tremendous flexibility in working with machine learning models. |
| |
| In this tutorial we will cover the same ground we did with TVMC, but show how |
| it is done with the Python API. Upon completion of this section, we will have |
| used the Python API for TVM to accomplish the following tasks: |
| |
| * Compile a pre-trained ResNet-50 v2 model for the TVM runtime. |
| * Run a real image through the compiled model, and interpret the output and model |
| performance. |
| * Tune the model that model on a CPU using TVM. |
| * Re-compile an optimized model using the tuning data collected by TVM. |
| * Run the image through the optimized model, and compare the output and model |
| performance. |
| |
| The goal of this section is to give you an overview of TVM's capabilites and |
| how to use them through the Python API. |
| |
| TVM is a deep learning compiler framework, with a number of different modules |
| available for working with deep learning models and operators. In this |
| tutorial we will work through how to load, compile, and optimize a model |
| using the Python API. |
| |
| We begin by importing a number of dependencies, including ``onnx`` for |
| loading and converting the model, helper utilities for downloading test data, |
| the Python Image Library for working with the image data, ``numpy`` for pre |
| and post-processing of the image data, the TVM Relay framework, and the TVM |
| Graph Executor. |
| |
| |
| .. code-block:: default |
| |
| |
| import onnx |
| from tvm.contrib.download import download_testdata |
| from PIL import Image |
| import numpy as np |
| import tvm.relay as relay |
| import tvm |
| from tvm.contrib import graph_executor |
| |
| |
| |
| |
| |
| |
| |
| Downloading and Loading the ONNX Model |
| -------------------------------------- |
| |
| For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a |
| convolutional neural network that is 50 layers deep and designed to classify |
| images. The model we will be using has been pre-trained on more than a |
| million images with 1000 different classifications. The network has an input |
| image size of 224x224. If you are interested exploring more of how the |
| ResNet-50 model is structured, we recommend downloading |
| `Netron <https://netron.app>`_, a freely available ML model viewer. |
| |
| TVM provides a helper library to download pre-trained models. By providing a |
| model URL, file name, and model type through the module, TVM will download |
| the model and save it to disk. For the instance of an ONNX model, you can |
| then load it into memory using the ONNX runtime. |
| |
| .. admonition:: Working with Other Model Formats |
| |
| TVM supports many popular model formats. A list can be found in the |
| :ref:`Compile Deep Learning Models <tutorial-frontend>` section of the TVM |
| Documentation. |
| |
| |
| .. code-block:: default |
| |
| |
| model_url = ( |
| "https://github.com/onnx/models/raw/main/" |
| "vision/classification/resnet/model/" |
| "resnet50-v2-7.onnx" |
| ) |
| |
| model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx") |
| onnx_model = onnx.load(model_path) |
| |
| |
| |
| |
| |
| |
| |
| Downloading, Preprocessing, and Loading the Test Image |
| ------------------------------------------------------ |
| |
| Each model is particular when it comes to expected tensor shapes, formats and |
| data types. For this reason, most models require some pre and |
| post-processing, to ensure the input is valid and to interpret the output. |
| TVMC has adopted NumPy's ``.npz`` format for both input and output data. |
| |
| As input for this tutorial, we will use the image of a cat, but you can feel |
| free to substitute this image for any of your choosing. |
| |
| .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg |
| :height: 224px |
| :width: 224px |
| :align: center |
| |
| Download the image data, then convert it to a numpy array to use as an input to the model. |
| |
| |
| .. code-block:: default |
| |
| |
| img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg" |
| img_path = download_testdata(img_url, "imagenet_cat.png", module="data") |
| |
| # Resize it to 224x224 |
| resized_image = Image.open(img_path).resize((224, 224)) |
| img_data = np.asarray(resized_image).astype("float32") |
| |
| # Our input image is in HWC layout while ONNX expects CHW input, so convert the array |
| img_data = np.transpose(img_data, (2, 0, 1)) |
| |
| # Normalize according to the ImageNet input specification |
| imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) |
| imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) |
| norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev |
| |
| # Add the batch dimension, as we are expecting 4-dimensional input: NCHW. |
| img_data = np.expand_dims(norm_img_data, axis=0) |
| |
| |
| |
| |
| |
| |
| |
| Compile the Model With Relay |
| ---------------------------- |
| |
| The next step is to compile the ResNet model. We begin by importing the model |
| to relay using the `from_onnx` importer. We then build the model, with |
| standard optimizations, into a TVM library. Finally, we create a TVM graph |
| runtime module from the library. |
| |
| |
| .. code-block:: default |
| |
| |
| target = "llvm" |
| |
| |
| |
| |
| |
| |
| |
| .. admonition:: Defining the Correct Target |
| |
| Specifying the correct target can have a huge impact on the performance of |
| the compiled module, as it can take advantage of hardware features |
| available on the target. For more information, please refer to |
| :ref:`Auto-tuning a convolutional network for x86 CPU <tune_relay_x86>`. |
| We recommend identifying which CPU you are running, along with optional |
| features, and set the target appropriately. For example, for some |
| processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm |
| -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction |
| set. |
| |
| |
| |
| .. code-block:: default |
| |
| |
| # The input name may vary across model types. You can use a tool |
| # like Netron to check input names |
| input_name = "data" |
| shape_dict = {input_name: img_data.shape} |
| |
| mod, params = relay.frontend.from_onnx(onnx_model, shape_dict) |
| |
| with tvm.transform.PassContext(opt_level=3): |
| lib = relay.build(mod, target=target, params=params) |
| |
| dev = tvm.device(str(target), 0) |
| module = graph_executor.GraphModule(lib["default"](dev)) |
| |
| |
| |
| |
| |
| |
| |
| Execute on the TVM Runtime |
| -------------------------- |
| Now that we've compiled the model, we can use the TVM runtime to make |
| predictions with it. To use TVM to run the model and make predictions, we |
| need two things: |
| |
| - The compiled model, which we just produced. |
| - Valid input to the model to make predictions on. |
| |
| |
| .. code-block:: default |
| |
| |
| dtype = "float32" |
| module.set_input(input_name, img_data) |
| module.run() |
| output_shape = (1, 1000) |
| tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy() |
| |
| |
| |
| |
| |
| |
| |
| Collect Basic Performance Data |
| ------------------------------ |
| We want to collect some basic performance data associated with this |
| unoptimized model and compare it to a tuned model later. To help account for |
| CPU noise, we run the computation in multiple batches in multiple |
| repetitions, then gather some basis statistics on the mean, median, and |
| standard deviation. |
| |
| |
| .. code-block:: default |
| |
| import timeit |
| |
| timing_number = 10 |
| timing_repeat = 10 |
| unoptimized = ( |
| np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)) |
| * 1000 |
| / timing_number |
| ) |
| unoptimized = { |
| "mean": np.mean(unoptimized), |
| "median": np.median(unoptimized), |
| "std": np.std(unoptimized), |
| } |
| |
| print(unoptimized) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| Out: |
| |
| .. code-block:: none |
| |
| {'mean': 492.9028391300023, 'median': 492.6402408000058, 'std': 0.6584335240170025} |
| |
| |
| |
| Postprocess the output |
| ---------------------- |
| |
| As previously mentioned, each model will have its own particular way of |
| providing output tensors. |
| |
| In our case, we need to run some post-processing to render the outputs from |
| ResNet-50 v2 into a more human-readable form, using the lookup-table provided |
| for the model. |
| |
| |
| .. code-block:: default |
| |
| |
| from scipy.special import softmax |
| |
| # Download a list of labels |
| labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt" |
| labels_path = download_testdata(labels_url, "synset.txt", module="data") |
| |
| with open(labels_path, "r") as f: |
| labels = [l.rstrip() for l in f] |
| |
| # Open the output and read the output tensor |
| scores = softmax(tvm_output) |
| scores = np.squeeze(scores) |
| ranks = np.argsort(scores)[::-1] |
| for rank in ranks[0:5]: |
| print("class='%s' with probability=%f" % (labels[rank], scores[rank])) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| Out: |
| |
| .. code-block:: none |
| |
| class='n02123045 tabby, tabby cat' with probability=0.610551 |
| class='n02123159 tiger cat' with probability=0.367180 |
| class='n02124075 Egyptian cat' with probability=0.019365 |
| class='n02129604 tiger, Panthera tigris' with probability=0.001273 |
| class='n04040759 radiator' with probability=0.000261 |
| |
| |
| |
| This should produce the following output: |
| |
| .. code-block:: bash |
| |
| # class='n02123045 tabby, tabby cat' with probability=0.610553 |
| # class='n02123159 tiger cat' with probability=0.367179 |
| # class='n02124075 Egyptian cat' with probability=0.019365 |
| # class='n02129604 tiger, Panthera tigris' with probability=0.001273 |
| # class='n04040759 radiator' with probability=0.000261 |
| |
| Tune the model |
| -------------- |
| The previous model was compiled to work on the TVM runtime, but did not |
| include any platform specific optimization. In this section, we will show you |
| how to build an optimized model using TVM to target your working platform. |
| |
| In some cases, we might not get the expected performance when running |
| inferences using our compiled module. In cases like this, we can make use of |
| the auto-tuner, to find a better configuration for our model and get a boost |
| in performance. Tuning in TVM refers to the process by which a model is |
| optimized to run faster on a given target. This differs from training or |
| fine-tuning in that it does not affect the accuracy of the model, but only |
| the runtime performance. As part of the tuning process, TVM will try running |
| many different operator implementation variants to see which perform best. |
| The results of these runs are stored in a tuning records file. |
| |
| In the simplest form, tuning requires you to provide three things: |
| |
| - the target specification of the device you intend to run this model on |
| - the path to an output file in which the tuning records will be stored |
| - a path to the model to be tuned. |
| |
| |
| |
| .. code-block:: default |
| |
| |
| import tvm.auto_scheduler as auto_scheduler |
| from tvm.autotvm.tuner import XGBTuner |
| from tvm import autotvm |
| |
| |
| |
| |
| |
| |
| |
| Set up some basic parameters for the runner. The runner takes compiled code |
| that is generated with a specific set of parameters and measures the |
| performance of it. ``number`` specifies the number of different |
| configurations that we will test, while ``repeat`` specifies how many |
| measurements we will take of each configuration. ``min_repeat_ms`` is a value |
| that specifies how long need to run configuration test. If the number of |
| repeats falls under this time, it will be increased. This option is necessary |
| for accurate tuning on GPUs, and is not required for CPU tuning. Setting this |
| value to 0 disables it. The ``timeout`` places an upper limit on how long to |
| run training code for each tested configuration. |
| |
| |
| .. code-block:: default |
| |
| |
| number = 10 |
| repeat = 1 |
| min_repeat_ms = 0 # since we're tuning on a CPU, can be set to 0 |
| timeout = 10 # in seconds |
| |
| # create a TVM runner |
| runner = autotvm.LocalRunner( |
| number=number, |
| repeat=repeat, |
| timeout=timeout, |
| min_repeat_ms=min_repeat_ms, |
| enable_cpu_cache_flush=True, |
| ) |
| |
| |
| |
| |
| |
| |
| |
| Create a simple structure for holding tuning options. We use an XGBoost |
| algorithim for guiding the search. For a production job, you will want to set |
| the number of trials to be larger than the value of 10 used here. For CPU we |
| recommend 1500, for GPU 3000-4000. The number of trials required can depend |
| on the particular model and processor, so it's worth spending some time |
| evaluating performance across a range of values to find the best balance |
| between tuning time and model optimization. Because running tuning is time |
| intensive we set number of trials to 10, but do not recommend a value this |
| small. The ``early_stopping`` parameter is the minimum number of trails to |
| run before a condition that stops the search early can be applied. The |
| measure option indicates where trial code will be built, and where it will be |
| run. In this case, we're using the ``LocalRunner`` we just created and a |
| ``LocalBuilder``. The ``tuning_records`` option specifies a file to write |
| the tuning data to. |
| |
| |
| .. code-block:: default |
| |
| |
| tuning_option = { |
| "tuner": "xgb", |
| "trials": 10, |
| "early_stopping": 100, |
| "measure_option": autotvm.measure_option( |
| builder=autotvm.LocalBuilder(build_func="default"), runner=runner |
| ), |
| "tuning_records": "resnet-50-v2-autotuning.json", |
| } |
| |
| |
| |
| |
| |
| |
| |
| .. admonition:: Defining the Tuning Search Algorithm |
| |
| By default this search is guided using an `XGBoost Grid` algorithm. |
| Depending on your model complexity and amount of time available, you might |
| want to choose a different algorithm. |
| |
| .. admonition:: Setting Tuning Parameters |
| |
| In this example, in the interest of time, we set the number of trials and |
| early stopping to 10. You will likely see more performance improvements if |
| you set these values to be higher but this comes at the expense of time |
| spent tuning. The number of trials required for convergence will vary |
| depending on the specifics of the model and the target platform. |
| |
| |
| .. code-block:: default |
| |
| |
| # begin by extracting the tasks from the onnx model |
| tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params) |
| |
| # Tune the extracted tasks sequentially. |
| for i, task in enumerate(tasks): |
| prefix = "[Task %2d/%2d] " % (i + 1, len(tasks)) |
| tuner_obj = XGBTuner(task, loss_type="rank") |
| tuner_obj.tune( |
| n_trial=min(tuning_option["trials"], len(task.config_space)), |
| early_stopping=tuning_option["early_stopping"], |
| measure_option=tuning_option["measure_option"], |
| callbacks=[ |
| autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix), |
| autotvm.callback.log_to_file(tuning_option["tuning_records"]), |
| ], |
| ) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| Out: |
| |
| .. code-block:: none |
| |
|
[Task 1/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 1/25] Current/Best: 23.32/ 23.32 GFLOPS | Progress: (4/10) | 6.33 s
[Task 1/25] Current/Best: 10.32/ 23.32 GFLOPS | Progress: (8/10) | 10.05 s
[Task 1/25] Current/Best: 16.19/ 23.71 GFLOPS | Progress: (10/10) | 11.14 s Done. |
|
[Task 2/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 2/25] Current/Best: 17.22/ 17.22 GFLOPS | Progress: (4/10) | 2.36 s
[Task 2/25] Current/Best: 18.46/ 18.87 GFLOPS | Progress: (8/10) | 3.36 s
[Task 2/25] Current/Best: 5.91/ 18.87 GFLOPS | Progress: (10/10) | 3.88 s Done. |
|
[Task 3/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 3/25] Current/Best: 7.01/ 11.57 GFLOPS | Progress: (4/10) | 3.57 s
[Task 3/25] Current/Best: 12.55/ 22.65 GFLOPS | Progress: (8/10) | 5.88 s
[Task 3/25] Current/Best: 7.98/ 22.65 GFLOPS | Progress: (10/10) | 6.88 s Done. |
|
[Task 4/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 4/25] Current/Best: 13.18/ 17.68 GFLOPS | Progress: (4/10) | 2.51 s
[Task 4/25] Current/Best: 11.04/ 17.68 GFLOPS | Progress: (8/10) | 5.30 s
[Task 4/25] Current/Best: 6.79/ 17.68 GFLOPS | Progress: (10/10) | 6.08 s Done. |
|
[Task 5/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 5/25] Current/Best: 10.02/ 19.43 GFLOPS | Progress: (4/10) | 2.90 s
[Task 5/25] Current/Best: 8.31/ 19.43 GFLOPS | Progress: (8/10) | 4.58 s
[Task 5/25] Current/Best: 20.53/ 20.53 GFLOPS | Progress: (10/10) | 5.19 s Done. |
|
[Task 6/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 6/25] Current/Best: 11.34/ 18.66 GFLOPS | Progress: (4/10) | 2.88 s
[Task 6/25] Current/Best: 18.29/ 18.66 GFLOPS | Progress: (8/10) | 4.62 s
[Task 6/25] Current/Best: 16.10/ 18.66 GFLOPS | Progress: (10/10) | 6.10 s Done. |
|
[Task 7/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 7/25] Current/Best: 5.47/ 18.84 GFLOPS | Progress: (4/10) | 4.24 s
[Task 7/25] Current/Best: 8.38/ 18.84 GFLOPS | Progress: (8/10) | 7.24 s
[Task 7/25] Current/Best: 6.51/ 18.84 GFLOPS | Progress: (10/10) | 8.39 s Done. |
|
[Task 8/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 8/25] Current/Best: 9.91/ 18.74 GFLOPS | Progress: (4/10) | 4.91 s
[Task 8/25] Current/Best: 12.66/ 18.74 GFLOPS | Progress: (8/10) | 7.34 s
[Task 8/25] Current/Best: 14.14/ 18.74 GFLOPS | Progress: (10/10) | 8.83 s Done. |
|
[Task 9/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 9/25] Current/Best: 21.38/ 21.69 GFLOPS | Progress: (4/10) | 4.66 s
[Task 9/25] Current/Best: 12.55/ 21.69 GFLOPS | Progress: (8/10) | 6.91 s
[Task 9/25] Current/Best: 16.36/ 21.69 GFLOPS | Progress: (10/10) | 7.54 s Done. |
|
[Task 10/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 10/25] Current/Best: 6.12/ 14.98 GFLOPS | Progress: (4/10) | 3.08 s
[Task 10/25] Current/Best: 20.87/ 20.87 GFLOPS | Progress: (8/10) | 4.67 s
[Task 10/25] Current/Best: 14.80/ 20.87 GFLOPS | Progress: (10/10) | 5.46 s Done. |
|
[Task 11/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 11/25] Current/Best: 9.46/ 12.35 GFLOPS | Progress: (4/10) | 4.09 s
[Task 11/25] Current/Best: 21.30/ 21.30 GFLOPS | Progress: (8/10) | 6.15 s
[Task 11/25] Current/Best: 16.33/ 21.30 GFLOPS | Progress: (10/10) | 7.23 s Done. |
|
[Task 12/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 12/25] Current/Best: 18.82/ 18.82 GFLOPS | Progress: (4/10) | 5.82 s
[Task 12/25] Current/Best: 10.71/ 22.92 GFLOPS | Progress: (8/10) | 7.52 s
[Task 12/25] Current/Best: 11.69/ 22.92 GFLOPS | Progress: (10/10) | 10.88 s Done. |
|
[Task 13/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 13/25] Current/Best: 21.12/ 23.12 GFLOPS | Progress: (4/10) | 3.47 s
[Task 13/25] Current/Best: 17.31/ 23.12 GFLOPS | Progress: (8/10) | 6.82 s
[Task 13/25] Current/Best: 14.10/ 23.12 GFLOPS | Progress: (10/10) | 8.47 s Done. |
|
[Task 14/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 14/25] Current/Best: 9.35/ 11.64 GFLOPS | Progress: (4/10) | 4.36 s
[Task 14/25] Current/Best: 14.85/ 14.85 GFLOPS | Progress: (8/10) | 6.14 s
[Task 14/25] Current/Best: 14.89/ 14.89 GFLOPS | Progress: (10/10) | 6.87 s
[Task 15/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 15/25] Current/Best: 11.97/ 23.54 GFLOPS | Progress: (4/10) | 3.60 s
[Task 15/25] Current/Best: 7.11/ 23.54 GFLOPS | Progress: (8/10) | 5.61 s
[Task 15/25] Current/Best: 22.41/ 23.54 GFLOPS | Progress: (10/10) | 6.82 s Done. |
|
[Task 16/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 16/25] Current/Best: 8.30/ 20.84 GFLOPS | Progress: (4/10) | 2.56 s
[Task 16/25] Current/Best: 14.28/ 20.84 GFLOPS | Progress: (8/10) | 3.80 s
[Task 16/25] Current/Best: 10.80/ 20.84 GFLOPS | Progress: (10/10) | 5.03 s Done. |
|
[Task 17/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 17/25] Current/Best: 6.16/ 19.25 GFLOPS | Progress: (4/10) | 3.08 s
[Task 17/25] Current/Best: 16.96/ 19.99 GFLOPS | Progress: (8/10) | 5.48 s
[Task 17/25] Current/Best: 17.56/ 19.99 GFLOPS | Progress: (10/10) | 6.65 s Done. |
|
[Task 18/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 18/25] Current/Best: 18.10/ 21.88 GFLOPS | Progress: (4/10) | 2.62 s
[Task 18/25] Current/Best: 7.00/ 21.88 GFLOPS | Progress: (8/10) | 4.80 s
[Task 18/25] Current/Best: 16.30/ 21.88 GFLOPS | Progress: (10/10) | 8.76 s Done. |
|
[Task 19/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 19/25] Current/Best: 20.20/ 20.20 GFLOPS | Progress: (4/10) | 5.72 s Done. |
|
[Task 19/25] Current/Best: 14.34/ 20.20 GFLOPS | Progress: (8/10) | 13.02 s
[Task 19/25] Current/Best: 21.76/ 21.76 GFLOPS | Progress: (10/10) | 13.85 s Done. |
|
[Task 20/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 20/25] Current/Best: 16.50/ 19.24 GFLOPS | Progress: (4/10) | 2.96 s
[Task 20/25] Current/Best: 9.28/ 19.24 GFLOPS | Progress: (8/10) | 7.33 s
[Task 20/25] Current/Best: 18.85/ 19.24 GFLOPS | Progress: (10/10) | 8.03 s
[Task 21/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 21/25] Current/Best: 3.16/ 13.22 GFLOPS | Progress: (4/10) | 3.16 s
[Task 21/25] Current/Best: 15.63/ 15.63 GFLOPS | Progress: (8/10) | 4.86 s
[Task 21/25] Current/Best: 19.85/ 19.85 GFLOPS | Progress: (10/10) | 6.94 s
[Task 22/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 22/25] Current/Best: 9.07/ 16.53 GFLOPS | Progress: (4/10) | 2.80 s Done. |
| Done. |
|
[Task 22/25] Current/Best: 16.59/ 16.82 GFLOPS | Progress: (8/10) | 4.55 s
[Task 22/25] Current/Best: 21.54/ 21.54 GFLOPS | Progress: (10/10) | 5.35 s Done. |
|
[Task 23/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 23/25] Current/Best: 5.04/ 18.72 GFLOPS | Progress: (4/10) | 3.42 s
[Task 23/25] Current/Best: 16.98/ 18.72 GFLOPS | Progress: (8/10) | 6.02 s
[Task 23/25] Current/Best: 23.15/ 23.15 GFLOPS | Progress: (10/10) | 7.03 s Done. |
|
[Task 24/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 24/25] Current/Best: 7.23/ 7.62 GFLOPS | Progress: (4/10) | 213.84 s
[Task 24/25] Current/Best: 5.95/ 7.62 GFLOPS | Progress: (8/10) | 229.95 s
[Task 24/25] Current/Best: 2.97/ 10.19 GFLOPS | Progress: (10/10) | 230.86 s
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/10) | 0.00 s Done. |
|
[Task 25/25] Current/Best: 3.66/ 9.02 GFLOPS | Progress: (4/10) | 21.84 s
[Task 25/25] Current/Best: 5.10/ 9.02 GFLOPS | Progress: (8/10) | 23.44 s
[Task 25/25] Current/Best: 7.89/ 9.02 GFLOPS | Progress: (10/10) | 24.17 s |
| |
| |
| The output from this tuning process will look something like this: |
| |
| .. code-block:: bash |
| |
| # [Task 1/24] Current/Best: 10.71/ 21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done. |
| # [Task 1/24] Current/Best: 9.32/ 24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done. |
| # [Task 2/24] Current/Best: 22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done. |
| # [Task 3/24] Current/Best: 32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done. |
| # [Task 4/24] Current/Best: 11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done. |
| # [Task 5/24] Current/Best: 23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done. |
| # [Task 6/24] Current/Best: 38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done. |
| # [Task 7/24] Current/Best: 4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done. |
| # [Task 8/24] Current/Best: 50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done. |
| # [Task 9/24] Current/Best: 50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done. |
| # [Task 10/24] Current/Best: 12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done. |
| # [Task 11/24] Current/Best: 62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done. |
| # [Task 12/24] Current/Best: 36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done. |
| # [Task 13/24] Current/Best: 7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done. |
| # [Task 14/24] Current/Best: 12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done. |
| # [Task 15/24] Current/Best: 31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done. |
| # [Task 16/24] Current/Best: 31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done. |
| # [Task 17/24] Current/Best: 41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done. |
| # [Task 18/24] Current/Best: 15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done. |
| # [Task 19/24] Current/Best: 15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done. |
| # [Task 20/24] Current/Best: 30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done. |
| # [Task 21/24] Current/Best: 46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done. |
| # [Task 22/24] Current/Best: 13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done. |
| # [Task 23/24] Current/Best: 53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done. |
| # [Task 24/24] Current/Best: 25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done. |
| |
| Compiling an Optimized Model with Tuning Data |
| ---------------------------------------------- |
| |
| As an output of the tuning process above, we obtained the tuning records |
| stored in ``resnet-50-v2-autotuning.json``. The compiler will use the results to |
| generate high performance code for the model on your specified target. |
| |
| Now that tuning data for the model has been collected, we can re-compile the |
| model using optimized operators to speed up our computations. |
| |
| |
| .. code-block:: default |
| |
| |
| with autotvm.apply_history_best(tuning_option["tuning_records"]): |
| with tvm.transform.PassContext(opt_level=3, config={}): |
| lib = relay.build(mod, target=target, params=params) |
| |
| dev = tvm.device(str(target), 0) |
| module = graph_executor.GraphModule(lib["default"](dev)) |
| |
| |
| |
| |
| |
| |
| |
| Verify that the optimized model runs and produces the same results: |
| |
| |
| .. code-block:: default |
| |
| |
| dtype = "float32" |
| module.set_input(input_name, img_data) |
| module.run() |
| output_shape = (1, 1000) |
| tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy() |
| |
| scores = softmax(tvm_output) |
| scores = np.squeeze(scores) |
| ranks = np.argsort(scores)[::-1] |
| for rank in ranks[0:5]: |
| print("class='%s' with probability=%f" % (labels[rank], scores[rank])) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| Out: |
| |
| .. code-block:: none |
| |
| class='n02123045 tabby, tabby cat' with probability=0.610551 |
| class='n02123159 tiger cat' with probability=0.367180 |
| class='n02124075 Egyptian cat' with probability=0.019365 |
| class='n02129604 tiger, Panthera tigris' with probability=0.001273 |
| class='n04040759 radiator' with probability=0.000261 |
| |
| |
| |
| Verifying that the predictions are the same: |
| |
| .. code-block:: bash |
| |
| # class='n02123045 tabby, tabby cat' with probability=0.610550 |
| # class='n02123159 tiger cat' with probability=0.367181 |
| # class='n02124075 Egyptian cat' with probability=0.019365 |
| # class='n02129604 tiger, Panthera tigris' with probability=0.001273 |
| # class='n04040759 radiator' with probability=0.000261 |
| |
| Comparing the Tuned and Untuned Models |
| -------------------------------------- |
| We want to collect some basic performance data associated with this optimized |
| model to compare it to the unoptimized model. Depending on your underlying |
| hardware, number of iterations, and other factors, you should see a performance |
| improvement in comparing the optimized model to the unoptimized model. |
| |
| |
| .. code-block:: default |
| |
| |
| import timeit |
| |
| timing_number = 10 |
| timing_repeat = 10 |
| optimized = ( |
| np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)) |
| * 1000 |
| / timing_number |
| ) |
| optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)} |
| |
| |
| print("optimized: %s" % (optimized)) |
| print("unoptimized: %s" % (unoptimized)) |
| |
| |
| |
| |
| |
| .. rst-class:: sphx-glr-script-out |
| |
| Out: |
| |
| .. code-block:: none |
| |
| optimized: {'mean': 440.2762375400016, 'median': 440.18241799999487, 'std': 1.1120732853729713} |
| unoptimized: {'mean': 492.9028391300023, 'median': 492.6402408000058, 'std': 0.6584335240170025} |
| |
| |
| |
| Final Remarks |
| ------------- |
| |
| In this tutorial, we gave a short example of how to use the TVM Python API |
| to compile, run, and tune a model. We also discussed the need for pre and |
| post-processing of inputs and outputs. After the tuning process, we |
| demonstrated how to compare the performance of the unoptimized and optimize |
| models. |
| |
| Here we presented a simple example using ResNet-50 v2 locally. However, TVM |
| supports many more features including cross-compilation, remote execution and |
| profiling/benchmarking. |
| |
| |
| .. rst-class:: sphx-glr-timing |
| |
| **Total running time of the script:** ( 10 minutes 13.988 seconds) |
| |
| |
| .. _sphx_glr_download_tutorial_autotvm_relay_x86.py: |
| |
| |
| .. only :: html |
| |
| .. container:: sphx-glr-footer |
| :class: sphx-glr-footer-example |
| |
| |
| |
| .. container:: sphx-glr-download |
| |
| :download:`Download Python source code: autotvm_relay_x86.py <autotvm_relay_x86.py>` |
| |
| |
| |
| .. container:: sphx-glr-download |
| |
| :download:`Download Jupyter notebook: autotvm_relay_x86.ipynb <autotvm_relay_x86.ipynb>` |
| |
| |
| .. only:: html |
| |
| .. rst-class:: sphx-glr-signature |
| |
| `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_ |