| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| Introduction to Module Serialization |
| ==================================== |
| |
| When to deploy TVM runtime module, no matter whether it is CPU or GPU, TVM only needs one single dynamic |
| shared library. The key is our unified module serialization mechanism. This document will introduce TVM module |
| serialization format standard and implementation details. |
| |
| ********************* |
| Module Export Example |
| ********************* |
| |
| Let us build one ResNet-18 workload for GPU as an example first. |
| |
| .. code:: python |
| |
| from tvm import relay |
| from tvm.relay import testing |
| from tvm.contrib import util |
| import tvm |
| |
| # Resnet18 workload |
| resnet18_mod, resnet18_params = relay.testing.resnet.get_workload(num_layers=18) |
| |
| # build |
| with relay.build_config(opt_level=3): |
| _, resnet18_lib, _ = relay.build_module.build(resnet18_mod, "cuda", params=resnet18_params) |
| |
| # create one tempory directory |
| temp = util.tempdir() |
| |
| # path lib |
| file_name = "deploy.so" |
| path_lib = temp.relpath(file_name) |
| |
| # export library |
| resnet18_lib.export_library(path_lib) |
| |
| # load it back |
| loaded_lib = tvm.runtime.load_module(path_lib) |
| assert loaded_lib.type_key == "library" |
| assert loaded_lib.imported_modules[0].type_key == "cuda" |
| |
| ************* |
| Serialization |
| ************* |
| |
| The entrance API is ``export_library`` of ``tvm.module.Module``. |
| Inside this function, we will do the following steps: |
| |
| 1. Collect all DSO modules (LLVM modules and C modules) |
| |
| 2. Once we have DSO modules, we will call ``save`` function to save them into files. |
| |
| 3. Next, we will check whether we have imported modules, such as CUDA, |
| OpenCL or anything else. We don't restrict the module type here. |
| Once we have imported modules, we will create one file named ``devc.o`` / ``dev.cc`` |
| (so that we could embed the binary blob data of import modules into one dynamic shared library), |
| then call function ``_PackImportsToLLVM`` or ``_PackImportsToC`` to do module serialization. |
| |
| 4. Finally, we call ``fcompile`` which invokes ``_cc.create_shared`` to get |
| dynamic shared library. |
| |
| .. note:: |
| 1. For C source modules, we will compile them and link them together with the DSO module. |
| |
| 2. Use ``_PackImportsToLLVM`` or ``_PackImportsToC`` depends on whether we enable LLVM in TVM. |
| They achieve the same goal in fact. |
| |
| *************************************************** |
| Under the Hood of Serialization and Format Standard |
| *************************************************** |
| |
| As said before, we will do the serialization work in the ``_PackImportsToLLVM`` or ``_PackImportsToC``. |
| They both call ``SerializeModule`` to serialize the runtime module. In ``SerializeModule`` |
| function, we firstly construct one helper class ``ModuleSerializer``. It will take ``module`` to do some |
| initialization work, like marking module index. Then we could use its ``SerializeModule`` to serialize module. |
| |
| For better understanding, let us dig the implementation of this class a little deeper. |
| |
| The following code is used to construct ``ModuleSerializer``: |
| |
| .. code:: c++ |
| |
| explicit ModuleSerializer(runtime::Module mod) : mod_(mod) { |
| Init(); |
| } |
| private: |
| void Init() { |
| CreateModuleIndex(); |
| CreateImportTree(); |
| } |
| |
| In ``CreateModuleIndex()``, We will inspect module import relationship |
| using DFS and create index for them. Note the root module is fixed at |
| location 0. In our example, we have module relationship like this: |
| |
| .. code:: c++ |
| |
| llvm_mod:imported_modules |
| - cuda_mod |
| |
| So LLVM module will have index 0, CUDA module will have index 1. |
| |
| After constructing module index, we will try to construct import tree (``CreateImportTree()``), |
| which will be used to restore module import relationship when we load |
| the exported library back. In our design, we use CSR format to store |
| import tree, each row is parent index, the child indices correspond to its children |
| index. In code, we use ``import_tree_row_ptr_`` and |
| ``import_tree_child_indices_`` to represent them. |
| |
| After initialization, we could serialize module using ``SerializeModule`` function. |
| In its function logic, we will assume the serialization format like this: |
| |
| .. code:: c++ |
| |
| binary_blob_size |
| binary_blob_type_key |
| binary_blob_logic |
| binary_blob_type_key |
| binary_blob_logic |
| ... |
| _import_tree |
| _import_tree_logic |
| |
| ``binary_blob_size`` is the number of blobs we will have in this |
| serialization step. There will be three blobs in our example which |
| are created for LLVM module, CUDA module, and ``_import_tree``, respectively. |
| |
| ``binary_blob_type_key`` is the blob type key of module. For LLVM / C module, whose |
| blob type key is ``_lib``. For CUDA module, it is ``cuda``, which could be got by ``module->type_key()``. |
| |
| ``binary_blob_logic`` is the logic handling of blob. For most of blob (like CUDA, OpenCL), we will call |
| ``SaveToBinary`` function to serialize blob into binary. However, like LLVM / C module, we will only write |
| ``_lib`` to indicate this is a DSO module. |
| |
| .. note:: |
| Whether or not it is required to implement the SaveToBinary virtual function depends on |
| how the module is used. For example, If the module has information we need when we load |
| the dynamic shared library back, we should do. Like CUDA module, we need its binary data |
| passing to GPU driver when we load the dynamic shared library, so we should implement |
| ``SaveToBinary`` to serialize its binary data. But for host module (like DSO), we don't |
| need other information when we load the dynamic shared library, so we don't need to implement |
| ``SaveToBinary``. However, if in the future, we want to record some meta information of DSO module, |
| we could implement ``SaveToBinary`` for DSO module too. |
| |
| Finally, we will write one key ``_import_tree`` unless our module only |
| has one DSO module and it is in the root. It is used to reconstruct the |
| module import relationship when we load the exported library back as said |
| before. The ``import_tree_logic`` is just to write ``import_tree_row_ptr_`` |
| and ``import_tree_child_indices_`` into stream. |
| |
| After this step, we will pack it into a symbol |
| ``runtime::symbol::tvm_dev_mblob`` that can be recovered in the dynamic |
| libary. |
| |
| Now, we complete the serialization part. As you have seen, we could |
| support arbitrary modules to import ideally. |
| |
| **************** |
| Deserialization |
| **************** |
| |
| The entrance API is ``tvm.runtime.load``. This function |
| is to call ``_LoadFromFile`` in fact. If we dig it a little deeper, this is |
| ``Module::LoadFromFile``. In our example, the file is ``deploy.so``, |
| according to the function logic, we will call ``module.loadfile_so`` in |
| ``dso_library.cc``. The key is here: |
| |
| .. code:: c++ |
| |
| // Load the imported modules |
| const char* dev_mblob = reinterpret_cast<const char*>(lib->GetSymbol(runtime::symbol::tvm_dev_mblob)); |
| Module root_mod; |
| if (dev_mblob != nullptr) { |
| root_mod = ProcessModuleBlob(dev_mblob, lib); |
| } else { |
| // Only have one single DSO Module |
| root_mod = Module(n); |
| } |
| |
| As said before, we will pack the blob into the symbol |
| ``runtime::symbol::tvm_dev_mblob``. During deserialization part, we will |
| inspect it. If we have ``runtime::symbol::tvm_dev_mblob``, we will call ``ProcessModuleBlob``, |
| whose logic like this: |
| |
| .. code:: c++ |
| |
| READ(blob_size) |
| READ(blob_type_key) |
| for (size_t i = 0; i < blob_size; i++) { |
| if (blob_type_key == "_lib") { |
| // construct dso module using lib |
| } else if (blob_type_key == "_import_tree") { |
| // READ(_import_tree_row_ptr) |
| // READ(_import_tree_child_indices) |
| } else { |
| // call module.loadbinary_blob_type_key, such as module.loadbinary_cuda |
| // to restore. |
| } |
| } |
| // Using _import_tree_row_ptr and _import_tree_child_indices to |
| // restore module import relationship. The first module is the |
| // root module according to our invariance as said before. |
| return root_module; |
| |
| After this, we will set the ``ctx_address`` to be the ``root_module`` so |
| that allow lookup of symbol from root (so all symbols are visible). |
| |
| Finally, we complete the deserialization part. |