.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

.. _relay-op-strategy:

Relay Operator Strategy
=======================

In order to lower Relay operators to the implementations defined in the TOPI
library, a compute and a schedule function need to be registered for each Relay
operator. However, compute and schedule functions are usually specialized for
each target, and further, even for the same target, we may have multiple
algorithms and implementations available. To deal with this complexity, we
introduce the operator strategy, which allows developers to define a flexible
lowering strategy for each operator and target.


Operator Strategy Design
------------------------

The basic element in an operator strategy is an ``OpImplementation``. It
includes a pair of compute and schedule functions, the name of the
implementation, and a priority level (the use of the priority level is
explained in `Select Implementation from Op Strategy`_).

An ``OpStrategy`` includes a list of ``OpSpecialization``. Each
``OpSpecialization`` contains a list of ``OpImplementation`` associated with a
``SpecializedCondition`` (see the definition in ``include/tvm/te/schedule.h``).
The ``SpecializedCondition`` can be null, indicating that the implementations
are generally applicable; otherwise, the implementations are only considered
when the specialized condition is satisfied. A ``SpecializedCondition``
consists of a list of clauses defined in Tensor Expression in conjunctive
normal form (CNF) and only supports conditions on tensor shapes.
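
To make the nesting concrete, below is a minimal Python sketch of how these
pieces relate. The classes are illustrative stand-ins for exposition only,
not TVM's actual definitions:

.. code:: python

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional

    @dataclass
    class OpImplementation:          # one concrete lowering choice
        compute: Callable            # FTVMCompute-style function
        schedule: Callable           # FTVMSchedule-style function
        name: str
        plevel: int                  # priority level

    @dataclass
    class OpSpecialization:          # implementations gated by one condition
        condition: Optional[object]  # SpecializedCondition, or None
        implementations: List[OpImplementation] = field(default_factory=list)

    @dataclass
    class OpStrategy:                # all candidates for an (op, target) pair
        specializations: List[OpSpecialization] = field(default_factory=list)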

Finally, a strategy function, or ``FTVMStrategy``, determines which pair(s) of
compute and schedule functions should be used given a workload, and needs to be
registered for each Relay operator. ``FTVMStrategy`` is a generic function (see
``include/tvm/target/generic_func.h``) that can be overridden for each
target. The function signature is

.. code:: c

    OpStrategy(const Attrs& attrs, const Array<Tensor>& inputs, const Type& out_type, const Target& target)

i.e., the function returns an ``OpStrategy`` given the op attributes, input
tensors, output type, and the target to compile to.


Write a Strategy Function
-------------------------

We recommend developers write the strategy function in Python, as
most TOPI compute and schedule functions are written in Python.
In Python, we provide the ``OpStrategy`` class in ``python/tvm/relay/op/op.py``.
It only has one API, which adds an implementation to the strategy:

.. code:: python

    def add_implementation(self, compute, schedule, name="default", plevel=10)


We now take ``topk`` as an example to explain how to write the
``FTVMStrategy`` function:

.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @override_native_generic_func("topk_strategy")
    def topk_strategy(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.topk),
            wrap_topi_schedule(topi.generic.schedule_topk),
            name="topk.generic")
        return strategy

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @topk_strategy.register(["cuda", "gpu"])
    def topk_strategy_cuda(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.cuda.topk),
            wrap_topi_schedule(topi.cuda.schedule_topk),
            name="topk.cuda")
        return strategy

In this example, we use ``topi.cuda.topk`` and ``topi.cuda.schedule_topk``
as the compute and schedule functions for the CUDA or GPU target, while using
the TOPI generic compute and schedule for the remaining targets.
Note that we use two wrapper functions that wrap the TOPI
compute and schedule so that they conform to the required function signatures
(see ``FTVMCompute`` and ``FTVMSchedule`` in ``include/tvm/relay/op_attr_types.h``).
Usually we need to write a customized compute wrapper function for each
operator in order to extract different fields from the op attributes.
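
For illustration, below is a simplified sketch of what these two wrappers may
look like; the actual versions in ``python/tvm/relay/op/strategy/generic.py``
handle more attributes and edge cases:

.. code:: python

    def wrap_topi_schedule(topi_schedule):
        """Adapt a plain TOPI schedule function to the FTVMSchedule signature."""
        def wrapper(attrs, outs, target):
            with target:
                return topi_schedule(outs)
        return wrapper

    def wrap_compute_topk(topi_compute):
        """Adapt a topk compute to the FTVMCompute signature (simplified)."""
        def _compute_topk(attrs, inputs, out_type):
            # Unpack the fields of the topk attributes for the TOPI compute.
            out = topi_compute(inputs[0], int(attrs.k), int(attrs.axis),
                               attrs.ret_type, bool(attrs.is_ascend), attrs.dtype)
            # FTVMCompute must return a list of output tensors.
            return out if isinstance(out, list) else [out]
        return _compute_topk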

The example above shows a very basic strategy function that only
adds one implementation to the strategy. But for many complicated operators,
we may need to add multiple implementations that use different algorithms.
For example, we can use both the direct and the winograd algorithms to
compute a conv2d op. In order to achieve this, we can write the strategy
function as follows:

.. code:: python

    strategy.add_implementation(
        wrap_compute_conv2d(topi.cuda.conv2d_nchw),
        wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
        name="conv2d_nchw.cuda",
        plevel=10)

    if winograd_condition:
        strategy.add_implementation(
            wrap_compute_conv2d(topi.cuda.conv2d_nchw_winograd),
            wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw_winograd),
            name="conv2d_nchw_winograd.cuda",
            plevel=15)

In this example, we add two implementations to the conv2d strategy, where
the winograd algorithm is only added when ``winograd_condition`` is true.
The implementation ``"conv2d_nchw_winograd.cuda"`` will be used to compile
conv2d when ``winograd_condition`` is true, as it has a higher
priority level (this could change if a certain implementation is an AutoTVM
template; see `Select Implementation from Op Strategy`_ for more
details). Otherwise, ``"conv2d_nchw.cuda"`` is used.
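
Note that ``winograd_condition`` above is not a built-in; it is a boolean that
the strategy function computes from the workload. A hypothetical sketch of how
it might be derived (the thresholds are illustrative and differ from TVM's
actual check):

.. code:: python

    # Inside the strategy function; attrs and inputs are its arguments.
    from tvm.topi.utils import get_const_tuple

    kernel = inputs[1]                            # weight tensor, OIHW layout assumed
    _, _, kh, kw = get_const_tuple(kernel.shape)  # static kernel extents
    stride_h, stride_w = attrs.get_int_tuple("strides")
    dilation_h, dilation_w = attrs.get_int_tuple("dilation")
    # Winograd usually pays off only for small square stride-1, dilation-1 kernels.
    winograd_condition = (kh == 3 and kw == 3
                          and stride_h == 1 and stride_w == 1
                          and dilation_h == 1 and dilation_w == 1)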

We can extend the example above to third-party library implementations. For
example, we can add an implementation that invokes a kernel from the cblas
library when cblas is included in the target.

.. code:: python

    if "cblas" in target.libs:
        strategy.add_implementation(
            wrap_compute_dense(topi.x86.dense_cblas),
            wrap_topi_schedule(topi.x86.schedule_dense_cblas),
            name="dense_cblas.x86",
            plevel=15)
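
For reference, ``target.libs`` is populated from the ``-libs`` option in the
target string. A minimal sketch, assuming a TVM version where
``tvm.target.Target`` accepts a target string:

.. code:: python

    import tvm

    # "cblas" ends up in target.libs, making the implementation above eligible.
    target = tvm.target.Target("llvm -libs=cblas")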


Further, we can add an implementation specialized for a certain range of
shapes. The code below shows an example of a dense strategy that adds an
implementation specialized for ``m`` greater than 16. The main difference
between a hardcoded Python condition, as in the examples above, and a
specialized condition is that the latter allows TVM to generate multiple
kernels when the input tensors have symbolic shapes. The compile engine will
generate a dispatch function that invokes the specialized kernel when the
corresponding condition is met; otherwise, it invokes the kernel that has no
associated specialized condition (``dense_common`` in this example). This part
is still a work in progress. More details will be provided after it is done.

.. code:: python

    def dense_strategy(attrs, inputs, out_type, target):
        m = inputs[0].shape[0]
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_dense(dense_compute1),
            wrap_topi_schedule(dense_schedule1),
            name="dense_common")

        with tvm.te.SpecializedCondition(m > 16):
            strategy.add_implementation(
                wrap_compute_dense(dense_compute2),
                wrap_topi_schedule(dense_schedule2),
                name="dense_for_large_m",
                plevel=15)

        return strategy


Register Strategy Function to an Operator
-----------------------------------------

After we define the strategy function for an operator, we can now
register the strategy function to this operator with

.. code:: python

    register_strategy("topk", strategy.topk_strategy)

However, it takes considerable effort to write a strategy function for an
operator. Therefore, we provide two other methods for simpler operators.

First, for operators that have injective, broadcast, or reduction patterns, we
can call ``register_injective_schedule``, ``register_broadcast_schedule``, and
``register_reduce_schedule`` respectively. The schedule functions for these
patterns are already registered by each target and can be applied to these
operators. These APIs assume the compute function is the same across all
targets, and ``FTVMCompute`` needs to be registered to the op before invoking
the register schedule APIs.

.. code:: python

    register_broadcast_schedule("add")
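
As a hypothetical end-to-end sketch for a made-up op (the op name ``myop`` and
the compute body are illustrative; ``register_compute`` is the API that
registers ``FTVMCompute``):

.. code:: python

    from tvm import topi
    from tvm.relay.op import register_compute, register_broadcast_schedule

    # Register the FTVMCompute first; it must return a list of output tensors.
    @register_compute("myop")
    def myop_compute(attrs, inputs, out_type):
        return [topi.add(inputs[0], inputs[1])]

    # Then reuse the per-target broadcast schedules for this op.
    register_broadcast_schedule("myop")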

Second, for operators that don't have these common patterns, but
share the same compute function across all targets, we can use the
``register_schedule`` API. It is easier to write the ``FTVMSchedule`` function,
as we only need to specify which schedule function to use. The following
code snippet shows the ``FTVMSchedule`` function for pooling.

.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @generic_func
    def schedule_pool(attrs, outs, target):
        with target:
            return topi.generic.schedule_pool(outs, attrs.layout)

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @schedule_pool.register("cpu")
    def schedule_pool_cpu(attrs, outs, target):
        ...

After we have created the ``FTVMSchedule`` for an operator, we can
register the strategy using ``register_schedule``:

.. code:: python

    register_schedule("nn.max_pool2d", strategy.schedule_pool)


Register Strategies for a New Target
------------------------------------

There are two ways to register strategies for a new target. The more
straightforward one is adding a new target file in the directory
``python/tvm/relay/op/strategy``. You only need to customize the strategy for
ops that have been implemented for this new target and reuse the generic
strategies for the rest, as shown in the sketch below.
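
For instance, a new file ``python/tvm/relay/op/strategy/mytarget.py`` (the
target name is hypothetical) could look like the following sketch, which
overrides only conv2d and reuses the TOPI generic kernels:

.. code:: python

    # python/tvm/relay/op/strategy/mytarget.py (hypothetical file)
    from tvm import topi
    from .generic import *
    from .. import op as _op

    @conv2d_strategy.register("mytarget")
    def conv2d_strategy_mytarget(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_conv2d(topi.nn.conv2d_nchw),
            wrap_topi_schedule(topi.generic.schedule_conv2d_nchw),
            name="conv2d_nchw.mytarget")
        return strategy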

Alternatively, you can also register the strategy for the new target outside
the TVM python library. The following code snippet shows an example of how to
do so. You can find more examples in ``vta/python/vta/top/op.py``.

.. code:: python

    @relay.op.strategy.conv2d_strategy.register("mytarget")
    def conv2d_strategy_mytarget(attrs, inputs, out_type, target):
        ...


Select Implementation from Op Strategy
--------------------------------------

During compilation, the Relay compile engine needs to determine which
implementation to use for an operator when there are multiple candidates. The
selection policy works as follows.

When the input tensors to an operator or a fused op all have constant shapes,
the compile engine first finds the best implementation based on AutoTVM tuning
logs. If none of the implementations is an AutoTVM template, or all AutoTVM
templates have fallback configs, the implementation with the highest priority
level will be chosen. Implementations with the same priority level in this
case lead to undefined behavior, and any one of them might be selected.

The selection policy for ops with symbolic input shapes is still a work in
progress. Currently, if any input tensor has a symbolic shape, only the
implementation with the highest priority level will be used for this operator.
This will be updated after the implementation finishes.

For debugging purposes, you can add the following lines before you compile the
Relay model to learn which implementation is used for each operator.

.. code:: python

    import logging
    import sys

    logging.getLogger("compile_engine").setLevel(logging.INFO)
    logging.getLogger("compile_engine").addHandler(logging.StreamHandler(sys.stdout))