This RFC proposes integrating DNNL into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed when comparing with either MXNet-oneDNN or the TVM auto-scheduler on several popular workloads.
TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from its small runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN is able to infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose integrating oneDNN into TVM via the BYOC framework.
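For reference, the flow we build on is the standard BYOC partitioning pipeline. Below is a minimal sketch using TVM's public Relay passes; it assumes TVM is built with the DNNL codegen enabled (`set(USE_DNNL_CODEGEN ON)` in `config.cmake`), and the toy operator shapes are illustrative only:

```python
import tvm
from tvm import relay

# Toy conv2d + relu module standing in for a real CV workload.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.relu(relay.nn.conv2d(data, weight, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# Standard BYOC partitioning: annotate ops the DNNL codegen supports,
# merge adjacent supported regions, and split them into external
# functions handled by the "dnnl" backend.
seq = tvm.transform.Sequential([
    relay.transform.AnnotateTarget("dnnl"),
    relay.transform.MergeCompilerRegions(),
    relay.transform.PartitionGraph(),
])
mod = seq(mod)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")
```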
Currently, the BYOC homepage provides a simple example of integrating DNNL (now renamed oneDNN) into TVM, but its performance is far from both the TVM auto-scheduler and MXNet-oneDNN, mainly for the following reasons:
We have solved the above issues and observed performance benefits compared with either MXNet-oneDNN or the TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios including latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1) mode.
Hardware config
Compilation config
Runtime config
This proposal provides a new approach to integrating oneDNN into TVM via the DNNL JSON codegen/runtime, applying the following adjustments to tackle the aforementioned issues:
- Add a `SimplifyConsecutiveAdd` pattern to the `simplify_expr` pass, so that the `FoldConstant` pass is able to fuse the pattern `conv-add-add-relu` (which comes from `conv-bias_add-bn-relu`) into `conv-add-relu`, as illustrated in the sketch below.

We have enhanced and updated the support. Currently, the following ops, post-op fusions and datatypes are enhanced or added, and several CV models are verified with the new oneDNN backend. We are going to cover more ops, datatypes and models (denoted with *) in the next step.
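To make the intent of the `SimplifyConsecutiveAdd` change concrete, here is a hedged sketch; it assumes the proposed pattern has landed in the `SimplifyExpr` pass, and the constant shapes are illustrative:

```python
import numpy as np
import tvm
from tvm import relay

# conv2d -> add (bias) -> add (folded batch-norm) -> relu, i.e. the
# conv-add-add-relu pattern produced from conv-bias_add-bn-relu.
data = relay.var("data", shape=(1, 8, 28, 28), dtype="float32")
weight = relay.const(np.random.rand(8, 8, 3, 3).astype("float32"))
bias = relay.const(np.random.rand(8, 1, 1).astype("float32"))
shift = relay.const(np.random.rand(8, 1, 1).astype("float32"))

conv = relay.nn.conv2d(data, weight, padding=(1, 1))
body = relay.nn.relu(relay.add(relay.add(conv, bias), shift))
mod = tvm.IRModule.from_expr(relay.Function([data], body))

# With the proposed pattern, SimplifyExpr rewrites
# add(add(x, c1), c2) into add(x, c1 + c2); FoldConstant then folds
# c1 + c2 into a single constant, leaving conv-add-relu, which maps
# onto a single fused oneDNN primitive.
mod = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.SimplifyExpr(),
    relay.transform.FoldConstant(),
])(mod)
print(mod)  # the two consecutive adds are now a single add
```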
Currently, this is only tested on Intel CPUs.
There are two ways to integrate oneDNN into TVM: "JSON Codegen" and "C Source Codegen". This RFC is developed with "JSON Codegen".
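As a usage note, nothing changes on the inference side: with JSON Codegen the partitioned subgraphs are serialized to a JSON representation at compile time and dispatched to the DNNL JSON runtime when the module runs. Continuing the partitioning sketch above (assuming `lib` was built there with the DNNL codegen enabled), execution uses the ordinary graph executor:

```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# `lib` is the module built in the earlier sketch.
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.set_input("weight", np.random.rand(16, 3, 3, 3).astype("float32"))
module.run()
result = module.get_output(0).numpy()
```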
This RFC aims to enhance the existing "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow to take advantage of both TVM and oneDNN.
The issues behind the poor performance of the existing BYOC integration with DNNL are listed in Motivation and have been solved in this RFC.
More ops, post-op fusions, datatypes and workloads will be supported in the next step.