Summary

This RFC describes how TVM integrates with build systems for unconventional platforms, such as those for microcontrollers and for other bare-metal scenarios.

Motivation

Though TVM's primary goal is generating code to implement models from a high-level description, there are several reasons why a user might want to interact with a platform's build system through TVM:

  1. To perform autotuning. TVM's internal operator implementations are merely templates and rely on an automatic search process to arrive at a fast configuration for the template on a given platform. This search process requires that TVM iteratively build and time code on the platform.
  2. To perform remote model execution. A user may wish to try several different models or schedules on a different platform without rewriting the platform-specific code. Users can do this by building generated model code against a generic implementation of the microTVM RPC server.
  3. To debug model execution remotely. Some aspects of model execution are easy to debug using a platform-specific debugger; however, some things, such as analyzing intermediate tensor values, are more easily accomplished with TVM-specific tooling. By leveraging the generic microTVM RPC server used in (2), TVM can provide such tooling in a platform-agnostic way.

TVM currently supports these use cases through a set of interfaces:

  1. tvm.micro.Compiler: used to produce binary and library artifacts.
  2. tvm.micro.Flasher: used to program attached hardware.
  3. tvm.micro.Transport: used to communicate with the on-device microTVM RPC server.

Thus far, these interfaces have been implemented for Zephyr, Mbed OS, and for simulated hardware using a POSIX subprocess. The Flasher and Transport interfaces have proven to be a relatively good fit; however, tvm.micro.Compiler is difficult to implement because it attempts to replicate a platform's build system inside TVM, and TVM does not want to incorporate platform-specific build logic into its codebase.

This proposal unifies these three interfaces to form a “Project-level” interface, recognizing that it's typical to interact with unconventional platforms and their build systems at this level. It simplifies the Compiler interaction into a project-level Build, and adds an explicit generate_project method to the interface. These changes remove the need to build components and drive the link process from TVM.

Additionally, when integrating microTVM with RTOS platforms, a dependency-management question arises: should TVM include all of the Python dependencies needed to interact with each RTOS as dependencies of TVM itself? Given that a majority of TVM's user base does not plan to interact with microcontroller platforms, it seems unreasonable to require everyone to install micro-specific dependencies. PyPI extras could be used for this task, but there is still the issue of conflicts between dependencies listed in extras_require and TVM's core dependencies. For example, if a micro extras_require were added to TVM, and one of those packages required an old version of TensorFlow, it would be impossible for TVM to update its core dependency on TensorFlow without breaking the platform integration. This conflict is at odds with making TVM as easy as possible to integrate with micro platforms.

As a goal, this proposal aims to continue supporting all of the use cases supported today, with these improvements:

  1. Integrating more naturally with build systems typical of embedded platforms.
  2. Allowing TVM to automatically generate projects for platforms and define automated scripts to build such projects.

Guide-level explanation

TVM can interact with platform SDKs via its Project API. Such SDKs are common when working with non-traditional OS platforms, such as those commonly used in embedded systems (e.g. Arduino, Zephyr, iOS). Given a platform-specific implementation of this Project API, TVM can:

  1. Generate projects that integrate implemented TVM models with generic platform runtime components.
  2. Build those generated projects.
  3. Program attached hardware.
  4. Drive remote model execution via the TVM RPC Server interface.

This last capability means that TVM can drive autotuning, remotely perform model inference, and debug models on non-traditional OSes such as Arduino and Zephyr, as well as on mobile platforms such as iOS and Android.

To provide support for a platform, a template project is first defined. Template projects are expected to exist entirely inside a directory and are identified to TVM by the path to the directory locally. Template projects may live either inside the TVM repository (when they can be included in the TVM CI) or in other version control repositories. The template project contains at minimum an implementation of the Project API inside an executable program known as the TVM Project API Server.

To begin working with a particular platform's Project API implementation, the user supplies TVM with the path to the top-level directory. TVM launches an instance of the Project API Server (found at a standard location in that directory). TVM communicates with the Project API Server using JSON-RPC over standard OS pipes.
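
For illustration, a server_info_query exchange over those pipes might look like the following. The envelope is standard JSON-RPC 2.0; the field values in the reply are examples, not normative (the reply structure is the ServerInfo described in the Reference-level explanation):

{"jsonrpc": "2.0", "id": 1, "method": "server_info_query", "params": {}}

{"jsonrpc": "2.0", "id": 1,
 "result": {"is_template": true, "model_library_format_path": null,
            "platform_name": "zephyr", "project_options": [], "protocol_version": 1}}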

TVM supplies generated code to the Project API Server using Model Library Format.

Below is a survey of example workflows used with the Project API Server:

Generating a project

Suggested tvmc command-line:

$ tvmc micro generate-project --template=path/to/template path/to/project
  1. The user imports a model into TVM and builds it using tvm.relay.build.
  2. The user supplies TVM with the path to the template project and a path to a non-existent directory where the generated project should live.
  3. TVM launches a Project API server in the template project.
  4. TVM verifies that the template project is indeed a template by invoking the Project API server server_info_query method.
  5. TVM invokes the Project API server generate_project method to generate the new project.
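
Concretely, steps 1-5 might look like the sketch below in Python. Only tvm.relay.build is existing, stable API here; the tvm.micro.generate_project helper name is an assumption about the client-side entry point:

import tvm
import tvm.relay

# Step 1: import and build the model. `mod` and `params` come from a frontend
# importer such as tvm.relay.frontend.from_tflite.
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    module = tvm.relay.build(mod, target="c", params=params)

# Steps 2-5: TVM launches the template's API server, verifies is_template via
# server_info_query, and invokes generate_project on the user's behalf.
project = tvm.micro.generate_project(
    "path/to/template",  # template project directory
    module,              # built model, exported as Model Library Format
    "path/to/project",   # non-existent directory to fill with the new project
    options={},
)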

Building and Flashing

Suggested tvmc command-line:

$ tvmc micro build path/to/project
$ tvmc micro flash path/to/project
  1. The user follows the steps under Generating a project.
  2. TVM expects the Project API server to copy itself to the generated project. It launches a Project API server in the generated project directory.
  3. TVM verifies that the generated project is not a template by invoking the Project API server server_info_query method. This method also returns options that can be used to customize the build.
  4. TVM invokes the Project API server build method to build the project.
  5. TVM invokes the Project API server flash method to program the attached device. The options can be used to specify a device serial number.
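
On the wire, steps 4 and 5 are two further JSON-RPC calls. The serial_number option below is hypothetical; a real server would only accept it if it were declared in its project_options:

{"jsonrpc": "2.0", "id": 2, "method": "build", "params": {"options": {}}}

{"jsonrpc": "2.0", "id": 3, "method": "flash", "params": {"options": {"serial_number": "096B"}}}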

Host-driven model inference

Suggested tvmc command-line:

$ tvmc run --device=micro --project=path/to/project <run_options>
  1. The user follows the steps under Generating a project.
  2. TVM invokes the Project API server open_transport method to connect to the remote on-device microTVM RPC server.
  3. The microTVM RPC server is attached to a traditional TVM RPC session on the host device.
  4. TVM drives inference on-device using the traditional TVM RPC methods. The Project API server methods read_transport and write_transport are used to receive and send data.
  5. When the inference session is over, TVM invokes the Project API server method close_transport to release any underlying I/O resources, and terminates the Project API server.
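
A host-side sketch of this flow follows, assuming a session helper like tvm.micro.Session wraps the open_transport/read_transport/write_transport calls; every name not defined in this RFC is an assumption:

import tvm.micro

# `project` is the generated project from the previous workflow; `module` is the
# built model and `input_data` an input tensor, both prepared earlier.
with tvm.micro.Session(project.transport()) as session:
    # The session behaves like a traditional TVM RPC session, so the usual graph
    # executor APIs drive inference on-device.
    executor = tvm.micro.create_local_graph_executor(
        module.get_graph_json(), session.get_system_lib(), session.device
    )
    executor.set_input(0, input_data)
    executor.run()
    output = executor.get_output(0)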

AutoTVM

  1. The user supplies a kernel to the AutoTVM tuner for search-based optimization.

  2. AutoTVM generates a set of task configurations, instantiates each task, and then invokes tvm.build to produce a Module for each instantiated task.

  3. AutoTVM produces a Model Library Format archive from the Module for each instantiated task.

  4. AutoTVM passes the Model Library Format archive to the AutoTVM runner. Project API overrides the traditional AutoTVM runner by providing a module_loader. The microTVM module_loader connects to a supervisor TVM RPC server which carries out the microTVM project build as part of the TVM RPC session_constructor. The following steps occur in the session_constructor:

    1. The Model Library Format tar is uploaded to the supervisor.
    2. The user supplies a path, on the supervisor, to a template project.
    3. The supervisor session_constructor performs the steps under Building and Flashing.
    4. The supervisor session_constructor invokes the Project API server open_transport method to connect to the remote device. The session constructor registers a traditional TVM RPC session on the supervisor, and this session is also used by the AutoTVM runner due to the session_constructor mechanism.
  5. The AutoTVM runner measures runtime as normal.

  6. The AutoTVM runner disconnects the session, closing the Project API server on the supervisor.
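
Put together, a tuning session might be configured as in the sketch below. The AutoTvmModuleLoader name is an assumption for the micro module_loader described in step 4; the output_format keyword is the Builder change described in the Reference-level explanation:

from tvm import autotvm
import tvm.micro

# Assumed helper: wraps the session_constructor flow from step 4.
module_loader = tvm.micro.AutoTvmModuleLoader(
    template_project_dir="path/to/template", project_options={}
)
builder = autotvm.LocalBuilder(output_format="mlf")
runner = autotvm.RPCRunner("micro", host="127.0.0.1", port=9190, module_loader=module_loader)

# `tuner` is any AutoTVM tuner constructed from a task, e.g. autotvm.tuner.GATuner(task).
tuner.tune(
    n_trial=10,
    measure_option=autotvm.measure_option(builder=builder, runner=runner),
    callbacks=[autotvm.callback.log_to_file("tuning.log")],
)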

Reference-level explanation

Project API implementation

The Project API is a Remote Procedure Call (RPC)-type mechanism implemented using JSON-RPC. The client and server are implemented in python/tvm/micro/project_api. Tests are implemented in tests/python/unittest/test_micro_project_api.py.

Project API interface

The functions that need to be implemented as part of a Project API server are defined on the ProjectAPIHandler class in python/tvm/micro/project_api/server.py:

import abc
import pathlib
import typing

# ServerInfo, TransportTimeouts, ProjectOption, and the exception types referenced
# below are defined alongside this class in server.py.


class ProjectAPIHandler(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def server_info_query(self) -> ServerInfo:
        raise NotImplementedError()

    @abc.abstractmethod
    def generate_project(self, model_library_format_path : pathlib.Path, standalone_crt_dir : pathlib.Path, project_dir : pathlib.Path, options : dict):
        """Generate a project from the given artifacts, copying ourselves to that project.

        Parameters
        ----------
        model_library_format_path : pathlib.Path
            Path to the Model Library Format tar archive.
        standalone_crt_dir : pathlib.Path
            Path to the root directory of the "standalone_crt" TVM build artifact. This contains the
            TVM C runtime.
        project_dir : pathlib.Path
            Path to a nonexistent directory which should be created and filled with the generated
            project.
        options : dict
            Dict mapping option name to ProjectOption.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def build(self, options : dict):
        """Build the project, enabling the flash() call to made.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the build, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def flash(self, options : dict):
        """Program the project onto the device.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the programming process, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def open_transport(self, options : dict) -> TransportTimeouts:
        """Open resources needed for the transport layer.

        This function might e.g. open files or serial ports needed in write_transport or read_transport.

        Calling this function enables the write_transport and read_transport calls. If the
        transport is already open, this method is a no-op.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence how the transport is opened, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def close_transport(self):
        """Close resources needed to operate the transport layer.
        This function might e.g. close files or serial ports needed in write_transport or read_transport.

        Calling this function disables the write_transport and read_transport calls. If the
        transport is not open, this method is a no-op.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def read_transport(self, n : int, timeout_sec : typing.Optional[float]) -> bytes:
        """Read data from the transport.

        Parameters
        ----------
        n : int
            The exact number of bytes to read from the transport.
        timeout_sec : Union[float, None]
            Number of seconds to wait for at least one byte to be read before timing out. If
            timeout_sec is 0, read should attempt to service the request in a non-blocking fashion.
            If timeout_sec is None, read should block until all `n` bytes of data can be returned.

        Returns
        -------
        bytes :
            Data read from the channel. Should be exactly `n` bytes long.

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or receive
            data due to an underlying I/O problem (i.e. file descriptor closed, cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses without receiving any data.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def write_transport(self, data : bytes, timeout_sec : typing.Optional[float]):
        """Write data to the transport.
        This function should either write all bytes in `data` or raise an exception.

        Parameters
        ----------
        data : bytes
            The data to write over the channel.
        timeout_sec : Union[float, None]
            Number of seconds to wait for all bytes to be written before timing out. If timeout_sec
            is 0, write should attempt to service the request in a non-blocking fashion. If
            timeout_sec is None, write should block until it has written all data.

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or receive
            data due to an underlying I/O problem (i.e. file descriptor closed, cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses before all of `data` could be written.
        """
        raise NotImplementedError()
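
A template project's API server is then just an executable script that subclasses ProjectAPIHandler and hands control to a server loop. The skeleton below is a sketch: server.main, the ServerInfo constructor arguments, and the method bodies are assumptions about the server module, not normative parts of this RFC:

#!/usr/bin/env python3
"""Skeleton TVM Project API Server for a hypothetical platform."""

from tvm.micro.project_api import server


class Handler(server.ProjectAPIHandler):
    def server_info_query(self):
        return server.ServerInfo(
            platform_name="my_platform",     # illustrative slug
            is_template=True,                # e.g. derived from a marker file on disk
            model_library_format_path=None,  # None while this is still a template
            project_options=[],
        )

    def generate_project(self, model_library_format_path, standalone_crt_dir, project_dir, options):
        pass  # copy template sources, extract the Model Library Format archive, copy this script

    def build(self, options):
        pass  # e.g. invoke the platform's build tool as a subprocess

    def flash(self, options):
        pass  # e.g. invoke the platform's flashing tool as a subprocess

    def open_transport(self, options):
        pass  # open the serial port; return a TransportTimeouts describing the channel

    def close_transport(self):
        pass  # close the serial port

    def read_transport(self, n, timeout_sec):
        raise server.TransportClosedError()

    def write_transport(self, data, timeout_sec):
        raise server.TransportClosedError()


if __name__ == "__main__":
    server.main(Handler())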

Project Options

Each Project API server can return project_options as part of the server_info_query response. Users can then supply these options to pass platform SDK-specific settings to each API method.

{"name": "str", "choices": ["str"], "help": "str"}

It's expected that user-facing clients of the Project API could expose these either as command-line flags or accept them via e.g. a JSON or YAML file.
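
For example, a Zephyr-based server might advertise options such as the following (the option names are purely illustrative):

[
    {"name": "zephyr_board", "choices": ["qemu_x86", "nucleo_f746zg"],
     "help": "Name of the Zephyr board to build for."},
    {"name": "verbose", "choices": ["true", "false"],
     "help": "Run the build with verbose output."}
]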

ServerInfo

In response to a server_info_query, an API server should return this structure:

{
    "is_template": "bool",
    "model_library_format_path": "str",
    "platform_name": "str",
    "project_options": ["ProjectOption"],
    "protocol_version": "int"

Its members are documented below:

  • is_template: True when this server lives in a template project. When True, generate_project can be called.
  • model_library_format_path: None when is_template is True; otherwise, the path, relative to the API server, of the Model Library Format archive used to create this project.
  • platform_name: A unique slug identifying this API server.
  • project_options: list of ProjectOption, defined above.
  • protocol_version: Version of the protocol (i.e. of the ProjectAPIHandler interface) supported by the API server.

Changes to AutoTVM

There are two changes to AutoTVM needed to interoperate with the Project API. They are documented in the sections below.

Build Model Library Format artifacts

At present, the AutoTVM Builder produces shared libraries. To interoperate with Project API servers, it needs to produce Model Library Format archives instead. Currently, only fcompile may be given to customize the output format. Builder will accept an additional keyword argument, output_format, which defaults to "so". When "mlf" is given, a Model Library Format .tar will be produced.
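
Under this proposal, selecting the new output format would look like the sketch below; only the output_format keyword is new:

from tvm import autotvm

# Default behavior, unchanged: candidate kernels are built as shared libraries.
builder = autotvm.LocalBuilder()

# Proposed: produce Model Library Format .tar archives for Project API-based runners.
builder = autotvm.LocalBuilder(output_format="mlf")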

Introducing module_loader to the runner

Before TVM measures inference time for a given artifact, it needs to connect to a TVM RPC server and load the generated code. This process will be abstracted behind module_loader. The default implementation is as follows:

import contextlib
import os

# request_remote is defined elsewhere in AutoTVM's measurement code.


class DefaultModuleLoader:
    """See default_module_loader(). A pickleable emulation of the original function closure."""

    def __init__(self, pre_load_function=None) -> None:
        self.pre_load_function = pre_load_function

    @contextlib.contextmanager
    def __call__(self, remote_kwargs, build_result):
        remote = request_remote(**remote_kwargs)
        if self.pre_load_function is not None:
            self.pre_load_function(remote, build_result)

        remote.upload(build_result.filename)
        try:
            yield remote, remote.load_module(os.path.split(build_result.filename)[1])

        finally:
            # clean up remote files
            remote.remove(build_result.filename)
            remote.remove(os.path.splitext(build_result.filename)[0] + ".so")
            remote.remove("")


def default_module_loader(pre_load_function=None):
    """Returns a default function that can be passed as module_loader to run_through_rpc.
    Parameters
    ----------
    pre_load_function : Optional[Function[tvm.rpc.Session, tvm.runtime.Module]]
        Invoked after a session is established and before the default code-loading RPC calls are
        issued. Allows performing pre-upload actions, e.g. resetting the remote runtime environment.
    Returns
    -------
    DefaultModuleLoader :
        A callable that can be passed as module_loader to run_through_rpc.
    """

    # This was a function with a closure before but that couldn't be pickled!
    # We need pickle to work for using python's multiprocessing on some platforms.
    return DefaultModuleLoader(pre_load_function)
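
Inside run_through_rpc, the loader is then used as a context manager, roughly as follows (remote_kwargs and build_result are supplied by the AutoTVM measurement infrastructure):

module_loader = default_module_loader()

with module_loader(remote_kwargs, build_result) as (remote, mod):
    # `remote` is a tvm.rpc.RPCSession; `mod` is the remotely loaded tvm.runtime.Module.
    # AutoTVM builds a time evaluator from `mod` here to measure the candidate kernel.
    ...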

Drawbacks

The main drawback of this approach is added complexity to TVM's autotuning process. In particular, developers have to understand:

  • Use of a separate binary program which may reside in a different virtualenv or be implemented in a different language.
  • Stack traces which may be split across the two binaries.

The author's opinion is that these drawbacks are outweighed by the complexity avoided in not requiring TVM to share a compatible set of Python dependencies with each supported micro platform.

Rationale and alternatives

Choice of JSON-RPC

Several RPC mechanisms were considered for this:

  1. JSON-RPC.
     Pros: human-readable encoding; very simple to implement (could even be done in bash with jq); concise specification; packages available in several popular languages.
     Cons: heavyweight encoding; no streaming facility; implementations aren't as cohesively authored as gRPC's; makes for two RPC implementations checked in to TVM.

  2. gRPC.
     Pros: widely supported, compact encoding; clearly-documented API and good support forums; supports streaming, the most natural way to forward TVM RPC traffic.
     Cons: requires the use of another Python package; requires the use of an IDL compiler; the intended use case (datacenter-scale RPC) is overkill; makes for two RPC implementations checked in to TVM.

  3. TVM RPC.
     Pros: already exists in TVM; some prior art for session forwarding; binary encoding.
     Cons: binary encoding; impossible to use today without compiling TVM; the implementation is designed around TVM's remote inference use cases and will likely change as new demands arise there.

TVM RPC was decided against because it would require TVM to be compiled before the Project API could be used. gRPC was considered, but ultimately rejected because JSON-RPC can be implemented in a single Python file without the added complexity of an IDL compiler.

Transport functions

When generating projects that perform host-driven inference or autotuning, TVM needs some way to communicate with the project's microTVM RPC server. Prior to this RFC, TVM included driver code for various transports (e.g. stdio, PySerial, etc). The Project API places this functionality in the API server, so that TVM doesn't need to include any transport-specific dependencies (e.g. PySerial) in its required Python dependencies.

There are a couple of subtle details that were changed when re-implementing this interface in Project API to reduce the complexity of Project API servers.

Encapsulating binary data

First, JSON-RPC is a text-based protocol, and as such binary data can't be transmitted in it without escaping. To avoid unreadable and oversized payloads over the Project API RPC, an encoding scheme needed to be chosen to encapsulate binary data in the protocol. The desired properties of the encoding scheme are:

  • Representable in JSON without the need for escapes
  • Compact given the above constraint
  • Easy to encode and decode in languages likely to be used for the API server

A common place this problem arises is in transmitting binary data to and from websites: an ASCII alphabet is chosen, and the binary data is translated into that alphabet. Since there aren't enough ASCII characters to represent all 2^8 == 256 values of a byte, a smaller alphabet is chosen, typically with 64 or 85 characters, and the binary data is encoded into it using modular arithmetic. These schemes are referred to as base64 and base85. Python provides standard support for both via the base64 module, so the more widely-used encoding (base64) was chosen to encode binary data in the Project API.
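
Concretely, a write_transport call carrying binary data might be framed as below. The field layout is illustrative; the point is the base64 round-trip:

import base64
import json

payload = b"\x00\x01\xfe\xff"  # raw bytes destined for the device

request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "write_transport",
    "params": {"data": base64.b64encode(payload).decode("ascii"), "timeout_sec": 5.0},
}
line = json.dumps(request)  # safe to serialize: the payload is now plain ASCII

# The server recovers the original bytes with:
data = base64.b64decode(request["params"]["data"])
assert data == payload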

Timeouts

The read and write calls have the following interface semantics:

  • read(n, timeout_sec) -> bytes. Read n bytes, raising IoTimeoutError if timeout_sec elapses before n bytes were read. No timeout if timeout_sec is None.
  • write(data, timeout_sec). Write data, raising IoTimeoutError if timeout_sec elapses before all of data was sent. No timeout if timeout_sec is None.

This is a departure from the previous interface, which allowed the implementation to read or write less data than was requested, returning as soon as possible. This was done initially to match typical UNIX read and write semantics, but it turned out to be tricky to implement with a timeout, for the following reasons:

  1. Because these semantics were expected from the interface, TVM always commanded it to read 128 bytes, even if fewer were needed.
  2. However, not all libraries obeyed these semantics. In particular, PySerial reads data until the timeout occurs, possibly returning fewer than n bytes. The implementation tried to read 1 byte until timeout_sec elapsed, then set timeout_sec to the expected time needed to transmit the remaining n - 1 bytes, and returned whatever it could within that window.
  3. In the case where timeout_sec was None (e.g. when debugging something else), implementations should have been quite simple. However, this wasn't the case, because TVM mostly requested more data than it actually needed. Implementations using PySerial were forced to return 1 byte at a time, causing a lot of log spam given the number of round trips, and implementations using file descriptors were needlessly complicated, since select plus a non-blocking read was needed.

In the new interface, implementers know exactly how much data to read or write and the deadline for the operation. This interface is overall easier to implement and aligns better with PySerial. Implementations can choose the simplest approach, which is particularly beneficial because it would be brittle for API server implementations to depend on Python modules shared with TVM.
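
For example, a PySerial-backed server could implement read_transport with a simple deadline loop. This is a sketch: self._port is assumed to be a serial.Serial instance, and IoTimeoutError is the exception named in the interface above:

import time

import serial  # PySerial: provides serial.Serial for self._port; a dependency of the API server only, not of TVM


def read_transport(self, n, timeout_sec):
    deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
    data = bytearray()
    while len(data) < n:
        if deadline is None:
            self._port.timeout = None  # block until data arrives
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise IoTimeoutError()
            self._port.timeout = remaining  # bound this read by the overall deadline
        data += self._port.read(n - len(data))
    return bytes(data)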

Prior art

The current way of integrating with third-party platforms is via the abstractions in the tvm.micro namespace:

  1. tvm.micro.Compiler: used to produce binary and library artifacts.
  2. tvm.micro.Flasher: used to program attached hardware.
  3. tvm.micro.Transport: used to communicate with the on-device microTVM RPC server.

There are several drawbacks to the present scenario:

  1. Generally speaking, this interface encourages platform-specific code to live in the TVM tree (the main barrier being CI and code review).
  2. The Compiler interface is not the correct abstraction of a platform's build system.
  3. The interface is large and spread across multiple classes, making it difficult to assemble a list of tasks needed to support new platforms.
  4. The implementation is overly complex, making it difficult to actually support such platforms.

Unresolved questions

  1. Is anyone particularly opposed to the RPC mechanism used here?
  2. Does this seem simple for downstream platforms to implement?
  3. Are there missing pieces from this initial implementation we should include?

Future possibilities

In the future, one could consider expanding the API slightly to encompass more platform-specific tasks. So far, the main use case to consider is library generation. For example, suppose someone wanted to use tvmc to produce only a library (not a full project) compatible with a particular platform. TVM could include such code in the mainline codebase, or it could rely on a “plugin” which would implement the Project API. A new method generate_library could be added, and additional metadata could be added to the server_info_query reply to allow the API server to indicate whether libraries or projects or both could be generated.

Note that the Project API does not currently aim to be a generic plugin interface for TVM. Such a solution is beyond the scope of this RFC.