| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| ================= |
| Debugger |
| ================= |
| |
| TVM Debugger is an interface for debugging TVM's computation graph execution. It provides access to graph structures and tensor values at runtime. |
| |
| ******************************************* |
| Debug Exchange Format |
| ******************************************* |
| |
| 1. Computational Graph |
| ====================== |
| The optimized graph built by Relay is dumped as is, in JSON-serialized |
| format. It contains all the information about the graph. The UX can either |
| use this graph directly or transform it into a format the UX can understand. |
| |
| The Graph JSON format is explained below: |
| |
| 1. ``nodes`` |
| Nodes are either placeholders or computational nodes. They are stored |
| as a list, and each node contains the following information: |
| |
| - ``op`` - Operation type. ``null`` means the node is a placeholder/variable/input node, and ``tvm_op`` means the node can be executed |
| - ``name`` - Name of the node |
| - ``inputs`` - Position of the inputs for this operation. ``inputs`` is a list of tuples of the form (nodeid, index, version). (Optional) |
| - ``attrs`` - Attributes of the node, which contain the following information: |
| |
| - ``flatten_data`` - Whether this data needs to be flattened before execution |
| - ``func_name`` - Fused function name, corresponds to the symbol in the lib generated by the Relay compilation process. |
| - ``num_inputs`` - Number of inputs for this node |
| - ``num_outputs`` - Number of outputs this node produces |
| |
| 2. ``arg_nodes`` |
| ``arg_nodes`` is a list of indices of the nodes that are placeholder/variable/input or constant/param nodes of the graph. |
| |
| 3. ``heads`` |
| ``heads`` is a list of entries that form the output of the graph. |
| |
| 4. ``node_row_ptr`` |
| ``node_row_ptr`` stores the history of the forward path, so you can skip constructing the entire graph in inference tasks. |
| |
| 5. ``attrs`` |
| ``attrs`` can contain version numbers or similar helpful information. |
| |
| - ``storage_id`` - Memory slot id for each node in the storage layout. |
| - ``dtype`` - Datatype of each node (enum value). |
| - ``dltype`` - Datatype of each node in order. |
| - ``shape`` - Shape of each node in order. |
| - ``device_index`` - Device assignment for each entry in the graph. |
| |
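| As a minimal sketch of how the ``inputs`` and ``heads`` entries described above can be read, the Python snippet below resolves each (nodeid, index, version) tuple to a node name. The small ``graph`` dictionary and the helper names ``entry_name`` and ``describe_edges`` are illustrative assumptions, not part of TVM. |
| |

```python
# Hypothetical two-node graph using only the fields described above.
graph = {
    "nodes": [
        {"op": "null", "name": "x", "inputs": []},
        {"op": "tvm_op", "name": "relu0", "inputs": [[0, 0, 0]]},
    ],
    "arg_nodes": [0],
    "heads": [[1, 0, 0]],
}


def entry_name(graph, entry):
    """Turn a (nodeid, index, version) tuple into a 'name:index' label."""
    node_id, index, _version = entry
    return f'{graph["nodes"][node_id]["name"]}:{index}'


def describe_edges(graph):
    """Return ({node name: [input labels]}, [graph output labels])."""
    edges = {}
    for node in graph["nodes"]:
        # "inputs" is optional, so fall back to an empty list.
        edges[node["name"]] = [entry_name(graph, e) for e in node.get("inputs", [])]
    outputs = [entry_name(graph, e) for e in graph["heads"]]
    return edges, outputs


edges, outputs = describe_edges(graph)
print(edges)
print(outputs)
```

| |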
| Example of dumped graph: |
| |
| :: |
| |
| { |
| "nodes": [ # List of nodes |
| { |
| "op": "null", # operation type = null, this is a placeholder/variable/input or constant/param node |
| "name": "x", # Name of the argument node |
| "inputs": [] # inputs for this node, it is empty since this is an argument node |
| }, |
| { |
| "op": "tvm_op", # operation type = tvm_op, this node can be executed |
| "name": "relu0", # Name of the node |
| "attrs": { # Attributes of the node |
| "flatten_data": "0", # Whether this data needs to be flattened |
| "func_name": "fuse_l2_normalize_relu", # Fused function name, corresponds to the symbol in the lib generated by compilation process |
| "num_inputs": "1", # Number of inputs for this node |
| "num_outputs": "1" # Number of outputs this node produces |
| }, |
| "inputs": [[0, 0, 0]] # Position of the inputs for this operation |
| } |
| ], |
| "arg_nodes": [0], # Indices of the nodes that are argument nodes |
| "node_row_ptr": [0, 1, 2], # Row indices for faster depth first search |
| "heads": [[1, 0, 0]], # Position of the output nodes for this operation |
| "attrs": { # Attributes for the graph |
| "storage_id": ["list_int", [1, 0]], # memory slot id for each node in the storage layout |
| "dtype": ["list_int", [0, 0]], # Datatype of each node (enum value) |
| "dltype": ["list_str", [ # Datatype of each node in order |
| "float32", |
| "float32"]], |
| "shape": ["list_shape", [ # Shape of each node in order |
| [1, 3, 20, 20], |
| [1, 3, 20, 20]]], |
| "device_index": ["list_int", [1, 1]] # Device assignment for each node in order |
| } |
| } |
| |
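| The dumped graph can be inspected with the standard ``json`` module. The sketch below uses a comment-free copy of the example above and prints one summary row per node. It assumes that ``node_row_ptr[i]`` is the index of the first entry of node ``i`` in the per-entry ``attrs`` lists, which is an interpretation rather than something stated above. The helper name ``summarize_graph`` is hypothetical. |
| |

```python
import json

# Comment-free copy of the example graph, so that it is valid JSON.
GRAPH_JSON = """
{
  "nodes": [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "tvm_op", "name": "relu0",
     "attrs": {"flatten_data": "0", "func_name": "fuse_l2_normalize_relu",
               "num_inputs": "1", "num_outputs": "1"},
     "inputs": [[0, 0, 0]]}
  ],
  "arg_nodes": [0],
  "node_row_ptr": [0, 1, 2],
  "heads": [[1, 0, 0]],
  "attrs": {
    "storage_id": ["list_int", [1, 0]],
    "dtype": ["list_int", [0, 0]],
    "dltype": ["list_str", ["float32", "float32"]],
    "shape": ["list_shape", [[1, 3, 20, 20], [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]]
  }
}
"""


def summarize_graph(graph_json_str):
    """Return one dict per node with its name, kind, shape, and dtype."""
    graph = json.loads(graph_json_str)
    arg_nodes = set(graph["arg_nodes"])
    row_ptr = graph["node_row_ptr"]
    # Each graph attribute is stored as a [type_tag, values] pair.
    attrs = {key: value[1] for key, value in graph["attrs"].items()}
    summary = []
    for node_id, node in enumerate(graph["nodes"]):
        entry = row_ptr[node_id]  # assumed: first entry index of this node
        summary.append({
            "name": node["name"],
            "op": node["op"],
            "is_arg": node_id in arg_nodes,
            "func_name": node.get("attrs", {}).get("func_name"),
            "shape": attrs["shape"][entry],
            "dltype": attrs["dltype"][entry],
            "storage_id": attrs["storage_id"][entry],
        })
    return summary


for row in summarize_graph(GRAPH_JSON):
    print(row)
```

| |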
| 2. Tensor dumping |
| ================= |
| |
| The tensors received after execution are of type ``tvm.ndarray``. All tensors are |
| saved as binary bytes in serialized format. The resulting bytes can be loaded with the |
| ``load_params`` API. |
| |
| Example of loading the parameters: |
| |
| :: |
| |
| with open(path_params, "rb") as fi: |
|     loaded_params = bytearray(fi.read()) |
| |
| module.load_params(loaded_params) |
| |
| *************************************** |
| How to use Debugger? |
| *************************************** |
| |
| 1. In ``config.cmake`` set the ``USE_GRAPH_RUNTIME_DEBUG`` flag to ``ON`` |
| |
| :: |
| |
| # Whether to enable additional graph debug functions |
| set(USE_GRAPH_RUNTIME_DEBUG ON) |
| |
| 2. Run ``make`` to build TVM so that ``libtvm_runtime.so`` is produced. |
| |
| 3. In the frontend script, instead of |
| ``from tvm.contrib import graph_runtime``, import |
| ``debug_runtime``: |
| ``from tvm.contrib.debugger import debug_runtime as graph_runtime`` |
| |
| :: |
| |
| from tvm.contrib.debugger import debug_runtime as graph_runtime |
| m = graph_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg") |
| # set inputs |
| m.set_input('data', tvm.nd.array(data.astype(dtype))) |
| m.set_input(**params) |
| # execute |
| m.run() |
| tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).asnumpy() |
| |
| The outputs are dumped to a temporary folder under ``/tmp`` or to the |
| folder specified when creating the runtime. |
| |
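| After a run, it can be handy to see what was written to the dump folder. The sketch below walks a dump root and lists every file with its size. The names of the files the debugger writes are not specified here, so the code makes no assumption about them. ``list_dump_files`` is a hypothetical helper, and ``/tmp/tvmdbg`` is the ``dump_root`` used in the example above. |
| |

```python
import os


def list_dump_files(dump_root):
    """Return sorted (relative path, size in bytes) pairs under dump_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(dump_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            found.append((os.path.relpath(path, dump_root), os.path.getsize(path)))
    return sorted(found)


# Prints nothing if the folder does not exist yet.
for rel_path, size in list_dump_files("/tmp/tvmdbg"):
    print(f"{size:>12} bytes  {rel_path}")
```

| |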
| *************************************** |
| Sample Output |
| *************************************** |
| |
| Below is an example of the debugger output. |
| |
| :: |
| |
| Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs |
| ---------               ---                                                                  --------   -------  ----------       --------         -----                ------  ------- |
| 1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1 |
| _contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1 |
| relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1 |
| _contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1 |
| relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1 |
| _contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1 |
| relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1 |
| _contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1 |
| reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1 |
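| |
| To find the most expensive nodes, the printed table can be parsed back into records. The sketch below is one possible approach, not part of the debugger. It assumes that columns are separated by whitespace, that node and op names contain no spaces, and that the shape is the only parenthesized field. ``parse_profile`` is a hypothetical helper, and ``SAMPLE`` holds three rows copied from the output above. |
| |

```python
import re

SAMPLE = """\
Node Name Ops Time(us) Time(%) Start Time End Time Shape Inputs Outputs
--------- --- -------- ------- ---------- -------- ----- ------ -------
1_NCHW1c fuse___layout_transform___4 56.52 0.02 15:24:44.177475 15:24:44.177534 (1, 1, 224, 224) 1 1
_contrib_conv2d_nchwc1 fuse__contrib_conv2d_NCHWc_1 213108.6 58.28 15:24:44.194440 15:24:44.407558 (1, 8, 224, 224, 8) 2 1
reshape1 fuse___layout_transform___broadcast_add_reshape_transpose_reshape 1554.25 0.43 15:24:44.541893 15:24:44.543452 (64, 1, 1) 2 1
"""

# One data row: name, op, time(us), time(%), start, end, (shape), inputs, outputs.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<op>\S+)\s+(?P<time_us>[\d.]+)\s+(?P<time_pct>[\d.]+)\s+"
    r"(?P<start>\S+)\s+(?P<end>\S+)\s+\((?P<shape>[^)]*)\)\s+"
    r"(?P<inputs>\d+)\s+(?P<outputs>\d+)$"
)


def parse_profile(text):
    """Parse data rows of the table; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # not a data row
        row = match.groupdict()
        row["time_us"] = float(row["time_us"])
        row["time_pct"] = float(row["time_pct"])
        row["shape"] = tuple(int(dim) for dim in row["shape"].split(","))
        row["inputs"] = int(row["inputs"])
        row["outputs"] = int(row["outputs"])
        rows.append(row)
    return rows


# Print nodes from slowest to fastest.
rows = parse_profile(SAMPLE)
for row in sorted(rows, key=lambda r: r["time_us"], reverse=True):
    print(f'{row["node"]:<24} {row["time_us"]:>12.2f} us  {row["time_pct"]:>6.2f} %')
```

| Sorting by ``time_us`` puts the convolution nodes at the top, matching the ``Time(%)`` column in the table. |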