Provide a standard and easily testable way to inspect features of a given target and provide them to the various parts of TVM which utilise that information.
TVM has multiple ways to define a `Target`'s architectural features for use in deciding on schedules or other calculations. Here are a few of the ways we do this today:

- Inspecting `Target` in utility functions: https://github.com/apache/tvm/blob/d2db9cb0d839e32778f461b77e59f6418282a511/python/tvm/topi/arm_cpu/arm_utils.py#L24-L70
- Inspecting `Target` in utility functions inside legalization code: https://github.com/apache/tvm/blob/02fbaf0ed9120a8f95155e63de42459f230584aa/python/tvm/relay/qnn/op/legalizations.py#L350-L359
- Inspecting `Target` inside the definition of a strategy: https://github.com/apache/tvm/blob/b542724873140bb051492530d97a78b9b7b7983d/python/tvm/relay/op/strategy/arm_cpu.py#L232
- Processing `Target` in a `PackedFunc` (https://github.com/apache/tvm/blob/24e5498021cecca2fe7d44149ce90efe28b6d930/python/tvm/topi/x86/utils.py#L21-L34), the result of which is then used as part of `Op` processing: https://github.com/apache/tvm/blob/24e5498021cecca2fe7d44149ce90efe28b6d930/src/relay/qnn/op/requantize_config.h#L58-L73

This RFC aims to standardise the way in which we convert `Target` attributes into architectural features by processing them ahead of time.
An additional property, `features`, will be added to `Target`. It is created at the time of instantiation and populated with inferred features of the `Target`, such as architectural extensions or bus sizes. The main distinction is that `features` are inferred from the `Target` `attrs` rather than being passed in.
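To make the distinction between passed-in `attrs` and inferred `features` concrete, here is a minimal pure-Python sketch. It does not use TVM; `infer_features` and the `_CPUS_WITH_DSP` table are hypothetical names introduced only for illustration, and a real parser would cover far more CPUs.

```python
from types import MappingProxyType

# Hypothetical lookup table, for illustration only.
_CPUS_WITH_DSP = frozenset({"cortex-m4"})


def infer_features(attrs):
    """Derive a read-only feature mapping from user-supplied attrs (a sketch)."""
    mcpu = attrs.get("mcpu", "")
    mtriple = attrs.get("mtriple", "")
    features = {
        "is_aarch64": mtriple.startswith("aarch64"),
        "has_dsp": mcpu in _CPUS_WITH_DSP,
    }
    # MappingProxyType is a read-only view, mirroring the idea that
    # features are inferred once and never passed in or mutated.
    return MappingProxyType(features)


attrs = {"kind": "c", "mcpu": "cortex-m4"}
features = infer_features(attrs)
print(features["is_aarch64"], features["has_dsp"])
```

The user-facing `attrs` dictionary is left untouched; everything derived from it lives in the separate, read-only mapping.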
The new `features` attribute is illustrated below using examples targeting TVM for Arm(R) Cortex(R)-M4. The `Target` specifies the CPU in its `attrs` and uses that to create the `features` object representing the architectural extensions of the `Target`, which can then be accessed using the `GetFeature` method, similar to `GetAttr`:
    // C++
    Target my_target("c -mcpu=cortex-m4");
    my_target->GetFeature<Bool>("is_aarch64", false);  // false
    my_target->GetFeature<Bool>("has_dsp", false);     // true

    # Python
    my_target = Target("c -mcpu=cortex-m4")
    my_target.features.is_aarch64  # false
    my_target.features.has_dsp     # true
This means that instead of the current:

    isa = arm_isa.IsaAnalyzer(target)
    if isa.has_dsp_support:
        do_dsp_stuff()

the `Target` can be directly inspected:

    if target.features.has_dsp:
        do_dsp_stuff()
The `Target` class, in C++, will have an additional property named `features`:

    class Target {
      ...
      DictAttrs features;
      ...
    };
It will have helper methods similar to those seen in `IRModule` for `DictAttrs`, but with reference to `Feature` rather than `Attr`:

    template <typename TObjectRef>
    Optional<TObjectRef> GetFeature(
        const std::string& attr_key,
        Optional<TObjectRef> default_value = Optional<TObjectRef>(nullptr)) const {
      // Look the key up in `features`, not in the user-supplied `attrs`.
      return features.GetAttr(attr_key, default_value);
    }

    template <typename TObjectRef>
    Optional<TObjectRef> GetFeature(const std::string& attr_key, TObjectRef default_value) const {
      return GetFeature<TObjectRef>(attr_key, Optional<TObjectRef>(default_value));
    }
There will also be a Python class to represent this and allow simple access to the features using the `target.features.<feature>` syntax:

    class TargetFeatures:
        def __init__(self, target):
            self._target = target

        def __getattr__(self, name):
            return _ffi_api.TargetGetFeature(self._target, name)
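The attribute forwarding in that class can be tried without TVM by swapping the FFI call for a stand-in. In the sketch below, `FakeFFI` and the extra `ffi` constructor argument are hypothetical and exist only to make the example self-contained; the assumption that an unknown feature yields `None` belongs to the stand-in and is not a statement about the real `_ffi_api`.

```python
class FakeFFI:
    """Hypothetical stand-in for `_ffi_api`, backed by a plain dict."""

    def __init__(self, features_by_target):
        self._features_by_target = features_by_target

    def TargetGetFeature(self, target, name):
        # Assumption: an unknown feature yields None rather than raising.
        return self._features_by_target[target].get(name)


class TargetFeatures:
    def __init__(self, target, ffi):
        self._target = target
        self._ffi = ffi

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so `_target`
        # and `_ffi` never reach this method once __init__ has run.
        return self._ffi.TargetGetFeature(self._target, name)


ffi = FakeFFI({"c -mcpu=cortex-m4": {"is_aarch64": False, "has_dsp": True}})
features = TargetFeatures("c -mcpu=cortex-m4", ffi)
print(features.has_dsp)  # a feature known to the stand-in backend
print(features.has_dps)  # a misspelt name is forwarded to the backend too
```

Because every unknown attribute is forwarded, a misspelt feature name is not caught on the Python side. Under the stand-in's assumption it comes back as `None`, which is falsy; a stricter variant could raise `AttributeError` instead.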
Centralising `features` on `Target` increases the complexity of each `Target` parser, as it will have to cater for a number of attributes. This is easily avoided by splitting the internal parsers.

Making `features` read-only and derived from the parser limits the flexibility to create an object with specific features for testing. In this case, actual valid `Target`s will have to be used for such testing.
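One way to picture "splitting the internal parsers" is a small dispatch table in which each architecture family has its own parser. This is only a sketch in plain Python: the function names, the predicates and the one-entry DSP table are all hypothetical, not part of the proposal.

```python
def _parse_aarch64(attrs):
    # Hypothetical sub-parser for AArch64 targets.
    return {"is_aarch64": True}


def _parse_cortex_m(attrs):
    # Hypothetical sub-parser for Cortex-M targets; the DSP table is
    # illustrative and deliberately incomplete.
    return {"is_aarch64": False, "has_dsp": attrs.get("mcpu") in {"cortex-m4"}}


# Ordered (predicate, parser) pairs; the first matching predicate wins.
_CPU_PARSERS = [
    (lambda a: a.get("mtriple", "").startswith("aarch64"), _parse_aarch64),
    (lambda a: a.get("mcpu", "").startswith("cortex-m"), _parse_cortex_m),
]


def parse_features(attrs):
    """Dispatch to the first sub-parser whose predicate matches."""
    for matches, parser in _CPU_PARSERS:
        if matches(attrs):
            return parser(attrs)
    return {}  # no parser applies, so no features are inferred


print(parse_features({"kind": "c", "mcpu": "cortex-m4"}))
```

Each sub-parser only has to understand its own family, and each can be tested in isolation by passing it the `attrs` of a valid `Target`.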
If we were to attach all of these directly to `Target` (e.g. `llvm`) as `attrs`, that would drastically increase the number of fields on a given `Target`, and in all cases only a subset would be used, specific to a given CPU/GPU profile:

    my_target = Target("c -mcpu=cortex-m4")
    my_target.is_aarch64  # Extra attribute in `attrs`
Re-using `attrs` becomes confusing to work with alongside the documented `Target` attributes in `target_kind.cc`, or `target_kind.cc` would need to be bloated with every potential feature of a `Target`. The approach of overlapping with `Target` attributes would also increase testing overhead: rather than having a straightforward `attrs` to `features` map to test, you would need to consider which `attrs` could validly mutate. This also introduces user confusion, as `target.mcpu` is no longer the `mcpu` which they passed in.
Another option is to use a standalone function or class across the various areas of the codebase, such as:

    TargetFeatures my_target_features(target);
    my_target_features->is_aarch64;  // false

This means re-processing `Target` whenever a specific attribute is required, but would provide a single source of truth for doing so.
It's potentially possible to recreate the functionality of `features` by populating a larger list of `Target` tags, taking the example of:

    TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64")
        .set_config({{"kind", String("llvm")},
                     {"mtriple", String("aarch64-linux-gnu")},
                     {"mcpu", String("cortex-a72")},
                     {"mattr", Array<String>{"+neon"}},
                     {"num-cores", Integer(4)},
                     {"host", Map<String, ObjectRef>{{"kind", String("llvm")},
                                                     {"mtriple", String("aarch64-linux-gnu")},
                                                     {"mcpu", String("cortex-a72")},
                                                     {"mattr", Array<String>{"+neon"}},
                                                     {"num-cores", Integer(4)}}}});
These are pre-configured `Target`s with various `mtriple`, `mcpu` and `mattr` attributes already set. Once parsed, these can produce a set of architecture features for subsequent steps, such as replacing this check in the operator strategy:

Other tagged `Target`s will likely have the same `mattr` and `mcpu`. Thus, rather than trying to hand-craft the permutations each time, the parser generalises inferring these `features`, augmenting tagged `Target`s.
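The point about tags sharing `mcpu` and `mattr` can be sketched as follows: a tag expands to a config, and one shared parser infers the features from that config, so nothing is hand-written per tag. The second tag, the `features_from_config` helper and the `has_neon` feature name are hypothetical and used only for illustration.

```python
# Tag name -> config. The first entry mirrors the registered tag shown
# earlier (without its "host" entry); the second tag is hypothetical.
TAGS = {
    "raspberry-pi/4b-aarch64": {
        "kind": "llvm",
        "mtriple": "aarch64-linux-gnu",
        "mcpu": "cortex-a72",
        "mattr": ["+neon"],
        "num-cores": 4,
    },
    "example-board/aarch64": {
        "kind": "llvm",
        "mtriple": "aarch64-linux-gnu",
        "mcpu": "cortex-a72",
        "mattr": ["+neon"],
        "num-cores": 2,
    },
}


def features_from_config(config):
    """Infer features from an expanded tag config (illustrative only)."""
    return {
        "is_aarch64": config.get("mtriple", "").startswith("aarch64"),
        "has_neon": "+neon" in config.get("mattr", []),
    }


for tag, config in TAGS.items():
    print(tag, features_from_config(config))
```

Both tags yield the same features even though their `num-cores` differ, because the inference depends only on the `attrs` the parser looks at.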
Taking the example of LLVM, it follows a similar methodology, resulting in a `Features` vector:

- `clang` uses `mtriple` to determine the correct parser to use for the various other options: https://github.com/llvm/llvm-project/blob/2f04e703bff3d9858f53225fa7c780b240c3e247/clang/lib/Driver/ToolChains/Clang.cpp#L324
- `clang` uses the LLVM parsers to determine available features for a given set of `Target` parameters such as `mcpu` and `mtune`: https://github.com/llvm/llvm-project/blob/43d758b142bbdf94a1c55dc0950637ae74f825b9/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
- The LLVM `Features` parsers: https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/AArch64TargetParser.cpp

You can see similar definitions within GCC:
This RFC builds upon the following existing TVM RFCs:
Similar to LLVM and GCC, we may be able to use a custom file format to describe `Target`s more effectively in future, which can be added using the same hooks, allowing for easier contributions.