 ..  Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

 ..    http://www.apache.org/licenses/LICENSE-2.0

 ..  Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 =================
 Debugger
 =================

 TVM Debugger is an interface for debugging TVM's computation graph execution. It provides access to graph structures and tensor values at TVM runtime.

 *******************************************
 Debug Exchange Format
 *******************************************

 1. Computational Graph
 ======================
 The optimized graph built by Relay is dumped as is, in JSON serialized
 format. It contains all the information about the graph. The UX can
 either use this graph directly or transform it into a format that it
 understands.

 The graph JSON format is explained below.

 1. ``nodes``
 Nodes are either placeholders or computational nodes. They are stored
 as a list. Each node contains the following information:

 -     ``op`` - Operation type. ``null`` means it is a placeholder/variable/input node and ``tvm_op`` means this node can be executed
 -     ``name`` - Name of the node
 -     ``inputs`` - Positions of the inputs for this operation (optional). ``inputs`` is a list of tuples of the form (nodeid, index, version).
 -     ``attrs`` - Attributes of the node, which contain the following information

     -     ``flatten_data`` - Whether this data needs to be flattened before execution
     -     ``func_name`` - Fused function name, corresponding to the symbol in the lib generated by the Relay compilation process
     -     ``num_inputs`` - Number of inputs for this node
     -     ``num_outputs`` - Number of outputs this node produces

 2. ``arg_nodes``
 ``arg_nodes`` is a list of indices of the nodes that are placeholder/variable/input or constant/param nodes of the graph.

 3. ``heads``
 ``heads`` is a list of entries that form the output of the graph.

 4. ``node_row_ptr``
 ``node_row_ptr`` stores the history of the forward path, so you can skip constructing the entire graph in inference tasks.

 5. ``attrs``
 attrs can contain version numbers or similar helpful information.

 - ``storage_id`` - Memory slot id for each node in the storage layout.
 - ``dtype`` - Datatype of each node (enum value).
 - ``dltype`` - Datatype of each node in order.
 - ``shape`` - Shape of each node in order.
 - ``device_index`` - Device assignment for each entry in the graph.
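
 The per-entry attributes above can be looked up for a given node with a few lines of Python. This is a minimal sketch, not the runtime's own code: it assumes that ``node_row_ptr[nodeid] + index`` gives the flat entry position of a node output, and that each attribute is stored as a ``[type_tag, values]`` pair, as in the example of the dumped graph. The helper names ``entry_id`` and ``entry_info`` are hypothetical.

```python
# Hypothetical fragment of a dumped graph, limited to the fields used here.
graph = {
    "node_row_ptr": [0, 1, 2],
    "attrs": {
        "dltype": ["list_str", ["float32", "float32"]],
        "shape": ["list_shape", [[1, 3, 20, 20], [1, 3, 20, 20]]],
        "storage_id": ["list_int", [1, 0]],
    },
}


def entry_id(graph, node_id, index=0):
    """Flat entry position of output `index` of node `node_id` (assumed layout)."""
    return graph["node_row_ptr"][node_id] + index


def entry_info(graph, node_id, index=0):
    """Collect the per-entry attributes for one node output."""
    eid = entry_id(graph, node_id, index)
    attrs = graph["attrs"]
    # Each attribute is stored as [type_tag, values]; the values sit at position 1.
    return {
        "dltype": attrs["dltype"][1][eid],
        "shape": attrs["shape"][1][eid],
        "storage_id": attrs["storage_id"][1][eid],
    }


info = entry_info(graph, 1)
print(info)
```

 For graphs where every node has a single output, the entry position and the node index coincide, which is the case in the example graph.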

 Example of dumped graph:

 ::

     {
       "nodes": [                                    # List of nodes
         {
           "op": "null",                             # operation type = null, this is a placeholder/variable/input or constant/param node
           "name": "x",                              # Name of the argument node
           "inputs": []                              # inputs for this node, empty since this is an argument node
         },
         {
           "op": "tvm_op",                           # operation type = tvm_op, this node can be executed
           "name": "relu0",                          # Name of the node
           "attrs": {                                # Attributes of the node
             "flatten_data": "0",                    # Whether this data needs to be flattened
             "func_name": "fuse_l2_normalize_relu",  # Fused function name, corresponds to the symbol in the lib generated by compilation process
             "num_inputs": "1",                      # Number of inputs for this node
             "num_outputs": "1"                      # Number of outputs this node produces
           },
           "inputs": [[0, 0, 0]]                     # Position of the inputs for this operation
         }
       ],
       "arg_nodes": [0],                             # Indices of the argument nodes
       "node_row_ptr": [0, 1, 2],                    # Row indices for faster depth first search
       "heads": [[1, 0, 0]],                         # Position of the output nodes for this operation
       "attrs": {                                    # Attributes for the graph
         "storage_id": ["list_int", [1, 0]],         # memory slot id for each node in the storage layout
         "dtype": ["list_int", [0, 0]],              # Datatype of each node (enum value)
         "dltype": ["list_str", [                    # Datatype of each node in order
             "float32",
             "float32"]],
         "shape": ["list_shape", [                   # Shape of each node in order
             [1, 3, 20, 20],
             [1, 3, 20, 20]]],
         "device_index": ["list_int", [1, 1]]        # Device assignment for each node in order
       }
     }
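
 As an illustration of how a UX could consume this format, the sketch below loads the example graph and resolves argument nodes, executable nodes, and outputs by name. It is only a sketch under stated assumptions: the ``#`` comments are removed because plain JSON does not allow them, only the fields used here are kept, and ``summarize`` is a hypothetical helper, not part of TVM.

```python
import json

# The example graph with the explanatory "#" comments removed, since JSON
# itself does not allow comments. Only the fields used below are kept.
GRAPH_JSON = """
{
  "nodes": [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "tvm_op", "name": "relu0",
     "attrs": {"flatten_data": "0", "func_name": "fuse_l2_normalize_relu",
               "num_inputs": "1", "num_outputs": "1"},
     "inputs": [[0, 0, 0]]}
  ],
  "arg_nodes": [0],
  "node_row_ptr": [0, 1, 2],
  "heads": [[1, 0, 0]]
}
"""


def summarize(graph):
    """Return (argument names, executable ops, output names) for a graph dict."""
    nodes = graph["nodes"]
    args = [nodes[i]["name"] for i in graph["arg_nodes"]]
    ops = []
    for node in nodes:
        if node["op"] != "tvm_op":
            continue  # placeholders have nothing to execute
        # Each input is (nodeid, index, version); resolve nodeid to a name.
        input_names = [nodes[nid]["name"] for nid, _index, _version in node["inputs"]]
        ops.append((node["name"], node["attrs"]["func_name"], input_names))
    outputs = [nodes[nid]["name"] for nid, _index, _version in graph["heads"]]
    return args, ops, outputs


args, ops, outputs = summarize(json.loads(GRAPH_JSON))
print("arguments:", args)
print("ops:", ops)
print("outputs:", outputs)
```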

 2. Tensor dumping
 =================

 The tensors received after execution are of type ``tvm.ndarray``. All tensors are
 saved as binary bytes in serialized format. The resulting bytes can be loaded with the
 ``load_params`` API.

 Example of loading the parameters:

 ::

     with open(path_params, "rb") as fi:
         loaded_params = bytearray(fi.read())

     module.load_params(loaded_params)

 ***************************************
 How to use Debugger?
 ***************************************

 1. In ``config.cmake`` set the ``USE_GRAPH_RUNTIME_DEBUG`` flag to ``ON``

    ::

        # Whether enable additional graph debug functions
        set(USE_GRAPH_RUNTIME_DEBUG ON)

 2. Run ``make`` to build TVM, which produces ``libtvm_runtime.so``.

 3. In the frontend script, instead of
    ``from tvm.contrib import graph_runtime``, import the
    ``debug_runtime``:
    ``from tvm.contrib.debugger import debug_runtime as graph_runtime``

 ::

     import tvm
     from tvm.contrib.debugger import debug_runtime as graph_runtime
     m = graph_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg")
     # set inputs
     m.set_input('data', tvm.nd.array(data.astype(dtype)))
     m.set_input(**params)
     # execute
     m.run()
     tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).asnumpy()

 The outputs are dumped to a temporary folder under ``/tmp``, or to the
 folder specified when creating the runtime.
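
 To see what a run produced, the dump folder can be walked with the standard library. The sketch below makes no assumption about the names or layout of the dumped files; it simply lists whatever is found. ``list_dump_files`` is a hypothetical helper, and ``/tmp/tvmdbg`` is the ``dump_root`` used in the snippet above.

```python
import os


def list_dump_files(dump_root):
    """Return (relative path, size in bytes) for every file under dump_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(dump_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append((os.path.relpath(full, dump_root), os.path.getsize(full)))
    return sorted(found)


if __name__ == "__main__":
    # Prints nothing if the folder does not exist or is empty.
    for rel_path, size in list_dump_files("/tmp/tvmdbg"):
        print(f"{size:>10}  {rel_path}")
```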

 ***************************************
 Sample Output
 ***************************************

 Below is an example output of the debugger.

 ::

     Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
     ---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
     1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
     _contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
     relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
     _contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
     relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
     _contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
     relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
     _contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
     reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1
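
 A table like this can be post-processed to find the most expensive nodes. The following is a minimal sketch that assumes the output has been captured as text and that columns are separated by at least two spaces, as in the sample; ``parse_rows`` is a hypothetical helper, not a debugger API.

```python
import re

# Three rows from the sample output above, with the column padding narrowed.
SAMPLE = """\
Node Name               Ops                           Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                           --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4   56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc    12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1  213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
"""


def parse_rows(text):
    """Turn the text table into a list of dicts keyed by column header."""
    lines = text.strip().splitlines()
    # Columns are separated by two or more spaces; single spaces occur
    # inside values such as "Start Time" or "(1, 1, 224, 224)".
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[2:]:  # skip the header and the dashed rule
        fields = re.split(r"\s{2,}", line.strip())
        rows.append(dict(zip(header, fields)))
    return rows


rows = parse_rows(SAMPLE)
slowest = max(rows, key=lambda r: float(r["Time(us)"]))
print(slowest["Node Name"], slowest["Time(us)"])
```

 In the full sample, the same approach points at ``_contrib_conv2d_nchwc1``, which accounts for 58.28% of the run time.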