Summary

Add the Khronos Neural Network Exchange Format (NNEF) as a frontend to TVM Relay and Relax.

Motivation

NNEF is an open, standardized format for neural network exchange developed by the Khronos Group since 2018 (https://www.khronos.org/nnef). It is aimed at deploying trained neural networks from deep learning frameworks to proprietary inference engines of neural network hardware vendors. Such inference engines often require an offline compilation step to run models more efficiently, so hardware vendors are looking into open source compiler stacks they can leverage. Hardware vendors may integrate their hardware as a backend into TVM; integrating NNEF as a frontend at the same time would allow them to use TVM as an end-to-end compilation tool starting from a standardized format.

The Khronos Group also maintains a set of tools for handling NNEF models. Since NNEF is mainly a textual format, these include a parser (with C++ and Python interfaces) and conversion tools from other formats. NNEF supports conversion from models of various deep learning frameworks, including Caffe, TensorFlow (including TensorFlow Lite), and all those that support ONNX, such as PyTorch. Because NNEF is similar to a scripting language, models can also be created manually by writing the model text file(s) directly. Manually written models may even be executed or trained in deep learning frameworks (support currently exists for PyTorch).

For example, loading an NNEF model in Python is as simple as follows:

import nnef
graph = nnef.load_graph('example.nnef')

The resulting graph object, which contains tensors and operators, can then be traversed and processed, for example converted into a TVM representation, as done in this PR.
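As an illustration of such a traversal, the following minimal sketch counts how often each operator type occurs in a graph and prints its data flow. It uses stand-in named tuples so that it runs without the nnef package installed; the field names (tensors, operations, inputs, outputs, attribs) are assumptions modeled on the objects returned by nnef.load_graph and should be checked against the NNEF-Tools documentation.

```python
from collections import Counter, namedtuple

# Stand-in types for illustration only; the field names are assumptions,
# not the definitions used by the nnef package.
Tensor = namedtuple("Tensor", ["name", "dtype", "shape"])
Operation = namedtuple("Operation", ["name", "inputs", "outputs", "attribs"])
Graph = namedtuple("Graph", ["name", "tensors", "operations", "inputs", "outputs"])


def count_operators(graph):
    """Return a Counter mapping operator names to their number of occurrences."""
    return Counter(op.name for op in graph.operations)


# A small hand-built graph: external input -> conv with a variable filter -> relu.
example = Graph(
    name="net",
    tensors={
        "x": Tensor("x", "scalar", [1, 3, 224, 224]),
        "w": Tensor("w", "scalar", [16, 3, 3, 3]),
        "y": Tensor("y", "scalar", [1, 16, 222, 222]),
        "z": Tensor("z", "scalar", [1, 16, 222, 222]),
    },
    operations=[
        Operation("external", {}, {"output": "x"}, {"shape": [1, 3, 224, 224]}),
        Operation("variable", {}, {"output": "w"}, {"shape": [16, 3, 3, 3], "label": "w"}),
        Operation("conv", {"input": "x", "filter": "w"}, {"output": "y"}, {}),
        Operation("relu", {"x": "y"}, {"output": "z"}, {}),
    ],
    inputs=["x"],
    outputs=["z"],
)

# Walk the operations in order and show which tensors each one reads and writes.
for op in example.operations:
    print(op.name, list(op.inputs.values()), "->", list(op.outputs.values()))

print(count_operators(example))
```

With a real model, the hand-built example graph would be replaced by the object returned from nnef.load_graph, provided its fields match the assumptions above.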

The NNEF tools also provide a simple C++ based reference implementation for NNEF models, whose main purpose is testing and debugging conversions, and serving as a reference for other, more efficient inference backends. Furthermore, a PyTorch based interpreter is also supported, which is able to execute NNEF models via on-the-fly conversion to PyTorch calls, and can also be used as a (more efficient) reference.

Guide-level explanation

We are going to add support for models in NNEF format. The model may be provided either as an NNEF model folder or as an nnef.Graph object already loaded into memory. The conversion is done via the new frontend functions:

# for relay frontend:
import tvm.relay as relay
mod, params = relay.frontend.from_nnef(model, freeze_vars=False)
  • model: either a string / PathLike to an NNEF model folder, or an nnef.Graph object.
  • freeze_vars: bool (optional), which sets whether the parameters should be considered variables or constants for optimization.
# for relax frontend:
import tvm.relax as relax
import tvm.relax.frontend.nnef
mod = relax.frontend.nnef.from_nnef(model, keep_params_in_input=False)
  • model: either a string / PathLike to an NNEF model folder, or an nnef.Graph object.
  • keep_params_in_input: bool (optional), which sets whether the NNEF variables are converted to constants and folded into the model, or need to be given as inputs.
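Both functions accept the model either as a path or as an already loaded graph. A minimal sketch of how such an argument could be normalized is shown here; resolve_model and fake_load_graph are hypothetical names introduced for illustration, and the loader is passed in as a parameter so the sketch runs without the nnef package. This is not the code used in the PR.

```python
import os
import pathlib


def resolve_model(model, load_graph):
    """Return a graph object for `model`.

    `model` may be a str / os.PathLike pointing to an NNEF model folder, or an
    already loaded graph object. `load_graph` is injected so that the sketch
    stays self-contained; in practice it would be nnef.load_graph.
    """
    if isinstance(model, (str, os.PathLike)):
        return load_graph(os.fspath(model))
    # Anything else is assumed to be a graph that is already in memory.
    return model


def fake_load_graph(path):
    """Hypothetical loader that only records which path was requested."""
    return {"loaded_from": path}


print(resolve_model("path/to/model.nnef", fake_load_graph))
print(resolve_model(pathlib.Path("path/to/model.nnef"), fake_load_graph))
print(resolve_model({"already": "loaded"}, fake_load_graph))
```

Normalizing the argument once at the entry point lets the rest of a converter work on a single graph type, regardless of how the caller supplied the model.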

Example usages (assuming we have a valid NNEF model):

import nnef
from tvm import relay

model_path = 'path/to/model.nnef'

# If modification is warranted the graph can be imported with `nnef.load_graph` 
graph = nnef.load_graph(model_path)

mod, params = relay.frontend.from_nnef(graph)

# Or the converter can read the graph from path as well

mod, params = relay.frontend.from_nnef(model_path)

import nnef
import tvm.relax as relax
import tvm.relax.frontend.nnef

model_path = 'path/to/model.nnef'

# If modification is warranted the graph can be imported with `nnef.load_graph` 
graph = nnef.load_graph(model_path)

mod = relax.frontend.nnef.from_nnef(graph)

# Or the converter can read the graph from path as well
mod = relax.frontend.nnef.from_nnef(model_path)

Reference-level explanation

As this RFC only adds a new frontend, no other features should be affected.

The process of importing an NNEF model consists of:

  • Loading the NNEF model into memory, if a model path is provided, using the nnef.load_graph function to get an nnef.Graph object. After this step, the model may be modified with the functions provided for NNEF models before the final conversion to TVM.
  • Converting the operations of the Graph, setting inputs, and reading parameters one by one.
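The second step can be pictured as a table-driven loop over the operations. The following minimal sketch shows only the general technique and is not the converter implemented in the PR: it builds expression strings instead of TVM expressions, the Operation type is a stand-in with assumed field names, each operation is assumed to have a single output, and CONVERTERS, convert_relu and convert_add are hypothetical names.

```python
from collections import namedtuple

# Stand-in for a graph operation; the field names are assumptions.
Operation = namedtuple("Operation", ["name", "inputs", "outputs", "attribs"])


def convert_relu(args, attribs):
    return f"relu({args['x']})"


def convert_add(args, attribs):
    return f"add({args['x']}, {args['y']})"


# Hypothetical converter table: NNEF operator name -> conversion function.
CONVERTERS = {"relu": convert_relu, "add": convert_add}


def convert_operations(operations, graph_inputs):
    """Convert operations one by one, tracking the expression bound to each tensor."""
    # Graph inputs become free variables of the resulting expression.
    env = {name: name for name in graph_inputs}
    for op in operations:
        if op.name not in CONVERTERS:
            raise NotImplementedError(f"unsupported NNEF operator: {op.name}")
        # Look up the already converted expression for every input tensor.
        args = {key: env[tensor] for key, tensor in op.inputs.items()}
        # This sketch assumes exactly one output tensor per operation.
        (out_name,) = op.outputs.values()
        env[out_name] = CONVERTERS[op.name](args, op.attribs)
    return env


ops = [
    Operation("add", {"x": "a", "y": "b"}, {"z": "c"}, {}),
    Operation("relu", {"x": "c"}, {"y": "d"}, {}),
]
env = convert_operations(ops, ["a", "b"])
print(env["d"])  # relu(add(a, b))
```

A lookup table keeps each operator mapping isolated, so an operator that has no mapping yet fails with a clear error instead of being converted incorrectly.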

Drawbacks

Potential increase in time-cost of unit tests.

Rationale and alternatives

The NNEF frontend is similar to those for ONNX, PyTorch, and TensorFlow. Adding it would increase the number of model formats that TVM can process.

Prior art

We are aware of the following projects that currently support importing NNEF models:

Unresolved questions

  • Whether test cases can make use of prewritten NNEF models (text files with NNEF syntax, such as graph.nnef) as a starting point. Currently our test cases use separate model folders with prewritten model definitions, and we only generate the inputs for those. The ‘tests/python/frontend/nnef/models’ folder contains these test cases.
  • Installation of NNEF and NNEF-Tools in the TVM CI Docker images. We need the Docker images to contain an install script that uses git to add NNEF to the CI environment, along with lint exceptions for .nnef files (mentioned in the previous point). It seems to work when the Docker images are rebuilt from source with the install scripts added, but we are not sure whether this is acceptable.

Future possibilities

The Khronos Group is actively working on the next major update to the NNEF format, whose main purpose is to increase model coverage by adding support for dynamic models and custom operators. In the latter case, more involved compilation of models carries even more potential, so we plan to add support for the next generation as well.

Support for some NNEF operators would only be possible through a more complex mapping to a sequence of TVM operators, and the less widely used ones were not the focus of this initial release. We may add support for such operators in the future if required.