.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. _relay-op-strategy:
Relay Operator Strategy
=======================
In order to lower Relay operators to the implementations defined in the TOPI
library, a compute function and a schedule function need to be registered to
each Relay operator. However, compute and schedule functions are usually
specialized for each target, and even for the same target, multiple
algorithms and implementations may be available. To deal with this complexity,
we introduce the operator strategy, which allows developers to define a
flexible lowering strategy for each operator and target.
Operator Strategy Design
------------------------
The basic element in operator strategy is an ``OpImplementation``. It includes
a pair of compute and schedule functions, the name of the implementation,
and a priority level (the use of the priority level is explained in
`Select Implementation from Op Strategy`_).
The ``OpStrategy`` includes a list of ``OpSpecialization``. Each ``OpSpecialization``
contains a list of ``OpImplementation`` associated with a ``SpecializedCondition``
(see definition in ``include/tvm/te/schedule.h``). The ``SpecializedCondition``
can be null, indicating the implementations are generally applicable;
otherwise, the implementations are only considered when the specialized
condition is satisfied. ``SpecializedCondition`` consists of a list
of clauses defined in Tensor Expression in conjunctive normal form (CNF) and
only supports conditions on tensor shapes.
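The relationship between these three structures can be pictured with a minimal
pure-Python sketch. It is a simplified model, not TVM's actual classes: the
``Toy*`` names and the ``placeholder`` function are hypothetical, the condition
is modeled as a plain predicate on a shape tuple, and passing it through a
``condition`` argument is an assumption made only for illustration.

.. code:: python

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional, Tuple


    @dataclass
    class ToyOpImplementation:
        """A compute/schedule pair plus a name and a priority level."""
        compute: Callable
        schedule: Callable
        name: str
        plevel: int


    @dataclass
    class ToyOpSpecialization:
        """Implementations sharing one condition (None = generally applicable)."""
        condition: Optional[Callable[[Tuple[int, ...]], bool]]
        implementations: List[ToyOpImplementation] = field(default_factory=list)


    @dataclass
    class ToyOpStrategy:
        """A list of specializations, each holding its own implementations."""
        specializations: List[ToyOpSpecialization] = field(default_factory=list)

        def add_implementation(self, compute, schedule, name="default",
                               plevel=10, condition=None):
            # Reuse the specialization that already holds this condition, if any.
            for spec in self.specializations:
                if spec.condition is condition:
                    break
            else:
                spec = ToyOpSpecialization(condition)
                self.specializations.append(spec)
            spec.implementations.append(
                ToyOpImplementation(compute, schedule, name, plevel))


    def placeholder(*args):
        # Hypothetical stand-in for a real compute or schedule function.
        return None


    def large_m(shape):
        # Hypothetical shape condition: first dimension greater than 16.
        return shape[0] > 16


    strategy = ToyOpStrategy()
    strategy.add_implementation(placeholder, placeholder, name="general")
    strategy.add_implementation(placeholder, placeholder, name="large",
                                plevel=15, condition=large_m)

After these two calls the toy strategy holds two specializations: one with a
null condition for the general implementation, and one guarded by ``large_m``.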
Last, a strategy function, or ``FTVMStrategy``, determines which pair(s) of
compute and schedule functions should be used given a workload, and needs to be
registered to each Relay operator. ``FTVMStrategy`` is a generic function (see
``include/tvm/target/generic_func.h``) that can be overridden for each
target. The function signature is
.. code:: c

    OpStrategy(const Attrs& attrs, const Array<Tensor>& inputs, const Type& out_type, const Target& target)
That is, the function returns an ``OpStrategy`` given the op attributes, input
tensors, output type, and target to compile to.
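The idea of a generic function with per-target overrides can be pictured with
the following toy model. It is only a sketch under stated assumptions:
``ToyGenericFunc`` is a hypothetical class, not TVM's generic function, the
strategy functions return a string instead of an ``OpStrategy``, and passing
the target keys explicitly as the first argument is a simplification.

.. code:: python

    class ToyGenericFunc:
        """Holds a default function plus overrides keyed by target key."""

        def __init__(self, default):
            self.default = default
            self.overrides = {}

        def register(self, keys):
            # Accept either a single key or a list of keys.
            keys = [keys] if isinstance(keys, str) else list(keys)

            def decorator(func):
                for key in keys:
                    self.overrides[key] = func
                return func

            return decorator

        def __call__(self, target_keys, *args):
            # Use the first target key that has an override, else the default.
            for key in target_keys:
                if key in self.overrides:
                    return self.overrides[key](*args)
            return self.default(*args)


    @ToyGenericFunc
    def toy_strategy(attrs, inputs, out_type, target):
        return "generic"


    @toy_strategy.register(["cuda", "gpu"])
    def toy_strategy_cuda(attrs, inputs, out_type, target):
        return "cuda"


    chosen_cuda = toy_strategy(["cuda", "gpu"], None, [], None, "cuda")
    chosen_llvm = toy_strategy(["llvm", "cpu"], None, [], None, "llvm")

In this toy model a target with no registered override falls through to the
default function, which mirrors how generic strategies are reused for targets
that do not customize an operator.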
Write A Strategy Function
-------------------------
We recommend that developers write strategy functions in Python, as
most TOPI compute and schedule functions are written in Python.
In Python, we provide the ``OpStrategy`` class in ``python/tvm/relay/op/op.py``.
It has only one API, which adds an implementation to the strategy:
.. code:: python

    def add_implementation(self, compute, schedule, name="default", plevel=10)
We now take ``topk`` as an example to explain how to write the
``FTVMStrategy`` function:
.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @override_native_generic_func("topk_strategy")
    def topk_strategy(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.topk),
            wrap_topi_schedule(topi.generic.schedule_topk),
            name="topk.generic")
        return strategy

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @topk_strategy.register(["cuda", "gpu"])
    def topk_strategy_cuda(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.cuda.topk),
            wrap_topi_schedule(topi.cuda.schedule_topk),
            name="topk.cuda")
        return strategy
In this example, we use ``topi.cuda.topk`` and ``topi.cuda.schedule_topk``
as the compute and schedule functions for the CUDA or GPU target, and use the
TOPI generic compute and schedule for all other targets.
Note that we use two wrapper functions that wrap the TOPI
compute and schedule functions to conform to the required function signatures
(see ``FTVMCompute`` and ``FTVMSchedule`` in ``include/tvm/relay/op_attr_types.h``).
Usually we need to write a customized compute wrapper function for each operator
to get different fields from the op attributes.
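To illustrate what such a compute wrapper does, here is a minimal sketch that
runs without TVM. The names ``toy_topk`` and ``wrap_compute_topk_sketch`` are
hypothetical, the attributes are modeled with a ``SimpleNamespace``, and the
``(attrs, inputs, out_type)`` call signature of the wrapped function is an
assumption for illustration; the real wrappers operate on tensors rather than
Python lists.

.. code:: python

    from types import SimpleNamespace


    def toy_topk(data, k, is_ascend):
        # Hypothetical stand-in for a TOPI compute; works on a Python list.
        return sorted(data, reverse=not is_ascend)[:k]


    def wrap_compute_topk_sketch(topi_compute):
        """Adapt ``topi_compute`` to an (attrs, inputs, out_type) signature."""

        def _compute_topk(attrs, inputs, out_type):
            # Pull the fields this operator needs out of the op attributes.
            return [topi_compute(inputs[0], attrs.k, attrs.is_ascend)]

        return _compute_topk


    compute = wrap_compute_topk_sketch(toy_topk)
    attrs = SimpleNamespace(k=2, is_ascend=False)
    result = compute(attrs, [[3, 1, 2]], None)

The wrapper keeps the operator-specific attribute handling in one place, so the
same underlying compute function can be reused wherever the uniform signature
is expected.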
The ``topk`` strategy function above is very basic: it adds only one
implementation to the strategy. For many complicated operators, however,
we may need to add multiple implementations that use different algorithms.
For example, we can use both the direct and the Winograd algorithms to
compute a conv2d op. To achieve this, we can write the strategy function
as follows:
.. code:: python

    strategy.add_implementation(
        wrap_compute_conv2d(topi.cuda.conv2d_nchw),
        wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
        name="conv2d_nchw.cuda",
        plevel=10)

    if winograd_condition:
        strategy.add_implementation(
            wrap_compute_conv2d(topi.cuda.conv2d_nchw_winograd),
            wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw_winograd),
            name="conv2d_nchw_winograd.cuda",
            plevel=15)
In this example, we add two implementations to the conv2d strategy, where the
Winograd algorithm is added only when ``winograd_condition`` is true.
The implementation ``"conv2d_nchw_winograd.cuda"`` will be used to compile
conv2d when ``winograd_condition`` is true because it has the higher
priority level (this could change if one of the implementations is an AutoTVM
template; see `Select Implementation from Op Strategy`_ for more
details). Otherwise, ``"conv2d_nchw.cuda"`` is used.
We can extend the example above to third-party library implementations. For
example, we can add an implementation that invokes a kernel in the cblas
library when cblas is included in the target.
.. code:: python

    if "cblas" in target.libs:
        strategy.add_implementation(
            wrap_compute_dense(topi.x86.dense_cblas),
            wrap_topi_schedule(topi.x86.schedule_dense_cblas),
            name="dense_cblas.x86",
            plevel=15)
Further, we can add an implementation specialized for a certain range of shapes.
The code below shows an example of a dense strategy that adds an implementation
specialized for ``m`` greater than 16. The main difference between a
hardcoded Python condition, as in the examples above, and a specialized
condition is that the latter allows TVM to generate multiple kernels when the
input tensors have symbolic shapes. The compile engine will generate a dispatch
function that invokes the specialized kernel when the corresponding condition
is met; otherwise, it invokes the kernel that has no associated specialized
condition (``dense_common`` in this example). This part is still a work in
progress. More details will be provided after it is done.
.. code:: python

    def dense_strategy(attrs, inputs, out_type, target):
        m = inputs[0].shape[0]
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_dense(dense_compute1),
            wrap_topi_schedule(dense_schedule1),
            name="dense_common")

        with tvm.te.SpecializedCondition(m > 16):
            strategy.add_implementation(
                wrap_compute_dense(dense_compute2),
                wrap_topi_schedule(dense_schedule2),
                name="dense_for_large_m",
                plevel=15)

        return strategy
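Since the dispatch mechanism is described as a work in progress, the following
is only a conceptual sketch of what a runtime dispatch over specialized
conditions could look like. ``make_dispatch`` and the two toy kernels are
hypothetical, the kernels return their own names instead of computing anything,
and checking the conditions in list order is an assumption.

.. code:: python

    def make_dispatch(general_kernel, specialized_kernels):
        """Build a dispatcher from (condition, kernel) pairs and a fallback."""

        def dispatch(m):
            # Invoke the first specialized kernel whose condition holds.
            for condition, kernel in specialized_kernels:
                if condition(m):
                    return kernel(m)
            # No condition matched: use the kernel without a condition.
            return general_kernel(m)

        return dispatch


    def dense_common(m):
        return "dense_common"


    def dense_for_large_m(m):
        return "dense_for_large_m"


    dispatch = make_dispatch(dense_common,
                             [(lambda m: m > 16, dense_for_large_m)])
    small = dispatch(8)
    large = dispatch(64)

Here ``m`` is known only when ``dispatch`` is called, which is the situation
that arises with symbolic input shapes.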
Register Strategy Function to An Operator
-----------------------------------------
After we define the strategy function for an operator, we can now
register the strategy function to this operator with
.. code:: python

    register_strategy("topk", strategy.topk_strategy)
However, writing a strategy function for an operator takes considerable effort.
Therefore, we provide two other methods for simpler operators.
First, for operators that have an injective, broadcast, or reduction pattern, we
can call ``register_injective_schedule``, ``register_broadcast_schedule``, and
``register_reduce_schedule`` respectively. The schedule functions for these
patterns are already registered by each target and can be applied to these
operators. We assume the compute function is the same across all targets,
and ``FTVMCompute`` needs to be registered to the op before the schedule is
registered.
.. code:: python

    register_broadcast_schedule("add")
Second, for operators that don't have the common patterns mentioned above
but still have the same compute function for all targets, we can use the
``register_schedule`` API. The ``FTVMSchedule`` function is easier to write,
as we only need to specify which schedule function to use. The following
code snippet shows the ``FTVMSchedule`` function for pooling.
.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @generic_func
    def schedule_pool(attrs, outs, target):
        with target:
            return topi.generic.schedule_pool(outs, attrs.layout)

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @schedule_pool.register("cpu")
    def schedule_pool_cpu(attrs, outs, target):
        ...
After we create the ``FTVMSchedule`` for an operator, we can
register the strategy using ``register_schedule``:
.. code:: python

    register_schedule("nn.max_pool2d", strategy.schedule_pool)
Register Strategies for A New Target
------------------------------------
There are two ways to register strategies for a new target. The more
straightforward one is adding a new target file in the directory
``python/tvm/relay/op/strategy``. You only need to customize the strategy for
ops that have been implemented for this new target and reuse the generic
strategies for the rest.
Alternatively, you can register the strategy for the new target outside the
TVM Python library. The following code snippet shows an example of how to do
so. You can find more examples in ``vta/python/vta/top/op.py``.
.. code:: python

    @relay.op.strategy.conv2d_strategy.register("mytarget")
    def conv2d_strategy_mytarget(attrs, inputs, out_type, target):
        ...
Select Implementation from Op Strategy
--------------------------------------
During compilation, the Relay compile engine needs to determine which
implementation to use for an operator when there are multiple. The selection
policy works as follows.
When the input tensors to an operator or a fused op all have constant shapes,
the compile engine first finds the best implementation based on AutoTVM tuning
logs. If no implementation is an AutoTVM template, or all AutoTVM
templates have fallback configs, the implementation with the highest priority
level is chosen. In this case, implementations with the same priority level
lead to undefined behavior, and any of them might be selected.
The selection policy for ops with symbolic input shapes is still a work in
progress. Currently, if any input tensor has a symbolic shape, only the
implementation with the highest priority level will be used for this operator.
This will be updated after the implementation finishes.
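The policy described above can be summarized in a small pure-Python sketch. It
is a simplified model and not the compile engine's code: ``Candidate`` and
``select_implementation`` are hypothetical names, and representing the tuning
log result as a single ``tuned_cost`` number, where lower is better, is an
assumption made for illustration.

.. code:: python

    from dataclasses import dataclass
    from typing import List, Optional


    @dataclass
    class Candidate:
        name: str
        plevel: int
        # Hypothetical: cost taken from a tuning log; None when the
        # implementation is not a template or only has a fallback config.
        tuned_cost: Optional[float] = None


    def select_implementation(candidates: List[Candidate],
                              has_symbolic_shape: bool = False) -> Candidate:
        if not has_symbolic_shape:
            tuned = [c for c in candidates if c.tuned_cost is not None]
            if tuned:
                # Prefer the tuned implementation with the lowest cost.
                return min(tuned, key=lambda c: c.tuned_cost)
        # Otherwise fall back to the highest priority level.
        return max(candidates, key=lambda c: c.plevel)


    direct_tuned = Candidate("conv2d_nchw.cuda", plevel=10, tuned_cost=0.8)
    direct_untuned = Candidate("conv2d_nchw.cuda", plevel=10)
    winograd = Candidate("conv2d_nchw_winograd.cuda", plevel=15)

    by_log = select_implementation([direct_tuned, winograd])
    by_plevel = select_implementation([direct_untuned, winograd])
    symbolic = select_implementation([direct_tuned, winograd],
                                     has_symbolic_shape=True)

In this sketch a tuned implementation wins over a higher priority level when
shapes are constant, while a symbolic shape forces the choice by priority
level. Ties in priority level are resolved here by list order, which is purely
an artifact of the sketch; the text above treats that case as undefined.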
For debugging purposes, you can add the following lines before you compile the
Relay model to learn which implementation is used for each operator.
.. code:: python

    import logging
    import sys

    logging.getLogger("compile_engine").setLevel(logging.INFO)
    logging.getLogger("compile_engine").addHandler(logging.StreamHandler(sys.stdout))