| # How to Create New Operators (Layers) |
| |
This tutorial walks you through the process of creating new MXNet operators (or layers).
| We've done our best to provide high-speed operators for most common use cases. |
| However, if you're engaged in research, |
| there's a good chance you'll want to define custom layers, |
| like a novel loss function. In these cases, you have two options: |
| |
* Use CustomOp to write new operators using a front-end language (e.g., Python) that run on CPUs or GPUs.
Depending on your implementation, this can range from very fast (if you only use operators under `mx.nd`) to very slow (if you copy the data out with `.asnumpy()`).
| |
* Use C++/mshadow (CUDA). This provides the best performance, but can be difficult
if you're not familiar with MXNet, mshadow, or CUDA.
| |
| ## CustomOp |
| Implementing an operator in Python is simple. |
| As an example, let's create a softmax operator. |
| Start by subclassing `mxnet.operator.CustomOp`, |
| and then override a few methods: |
| |
| ```python |
| import mxnet as mx |
| import numpy as np |
| |
class Softmax(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        x = in_data[0].asnumpy()
        # Subtract the row-wise max before exponentiating, for numerical stability.
        y = np.exp(x - x.max(axis=1).reshape((x.shape[0], 1)))
        y /= y.sum(axis=1).reshape((x.shape[0], 1))
        self.assign(out_data[0], req[0], mx.nd.array(y))
| ``` |
| |
We defined the computation for the forward pass of our operator.
The forward function takes a list of input NDArrays and a list of output NDArrays.
For convenience, we called `.asnumpy()` on the first input NDArray
to convert it to a CPU-based NumPy array.
This can be very slow. If you want the best performance,
keep data in NDArray format and use operators under `mx.nd` for the computation.
| |
Finally, we used `CustomOp.assign` to assign the resulting array `y` to `out_data[0]`. It handles the assignment based on the value of `req`, which can be `'write'`, `'add'`, or `'null'`.
| |
| Then do the same for the backward pass: |
| |
| ```python |
def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
    l = in_data[1].asnumpy().ravel().astype(np.int32)
    y = out_data[0].asnumpy()
    # Gradient of softmax with cross-entropy loss: y - one_hot(label)
    y[np.arange(l.shape[0]), l] -= 1.0
    self.assign(in_grad[0], req[0], mx.nd.array(y))
| ``` |
| |
`Softmax` defines the computation of our custom operator,
but you still need to define its input/output format
by subclassing `mx.operator.CustomOpProp`.
First, register the new operator with the name 'softmax':
| |
| ```python |
| @mx.operator.register("softmax") |
| class SoftmaxProp(mx.operator.CustomOpProp): |
| ``` |
| |
Then, call the base constructor with `need_top_grad=False`
because softmax is a loss layer and doesn't need gradient input from layers above:
| |
| ```python |
| def __init__(self): |
| super(SoftmaxProp, self).__init__(need_top_grad=False) |
| ``` |
| |
| Then declare the input and output: |
| |
| ```python |
| def list_arguments(self): |
| return ['data', 'label'] |
| |
| def list_outputs(self): |
| return ['output'] |
| ``` |
| |
Note that `list_arguments` declares both input data and parameters.
We recommend ordering them as follows: `['input1', 'input2', ... , 'weight1', 'weight2', ...]`
| |
| Next, provide `infer_shape` to declare the shape of the output/weight |
| and check the consistency of the input shapes: |
| |
| ```python |
| def infer_shape(self, in_shape): |
| data_shape = in_shape[0] |
| label_shape = (in_shape[0][0],) |
| output_shape = in_shape[0] |
| return [data_shape, label_shape], [output_shape], [] |
| ``` |
| The first axis of an input/output tensor corresponds to different examples within the batch. |
| The label is a set of integers, one for each data entry, |
| and the output has the same shape as the input. |
| The `infer_shape` function should always return three lists in this order: |
| inputs, outputs, and auxiliary states (which we don't have here), |
| even if one of them is empty. |
| |
| Optionally, you can also define `infer_type` to declare the input and output data type of your operator. Supported types are `np.float32`, `np.float64`, `np.float16`, `np.uint8`, and `np.int32`. |
| |
| ```python |
| def infer_type(self, in_type): |
| dtype = in_type[0] |
| return [dtype, dtype], [dtype], [] |
| ``` |
| |
Finally, define a `create_operator` function that will be called by the back end to create an instance of `Softmax`:
| |
| ```python |
| def create_operator(self, ctx, shapes, dtypes): |
| return Softmax() |
| ``` |
| |
To use the custom operator, create an `mx.sym.Custom` symbol with `op_type` set to the registered name:
| |
| ```python |
| mlp = mx.symbol.Custom(data=fc3, name='softmax', op_type='softmax') |
| ``` |
| |
| Please see the full code for this example [here](https://github.com/dmlc/mxnet/blob/master/example/numpy-ops/custom_softmax.py). |
| |
| ## C++ |
With MXNet v0.9 (the NNVM refactor) or later, creating new operators has become easier.
Operators are now registered with NNVM.
The following code is an example of how to register an operator (check out [src/operator/tensor](https://github.com/dmlc/mxnet/tree/master/src/operator/tensor) for more examples):
| |
| ```c++ |
| NNVM_REGISTER_OP(abs) |
| .MXNET_DESCRIBE("Take absolute value of the src") |
| .set_num_inputs(1) |
| .set_num_outputs(1) |
.set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<1, 1>);
| ``` |
| |
The syntax is simple: we register the operator with a name,
then set the number of inputs and outputs.
You can register attributes under any key (`FInferShape`, for example) on any operator,
without having to modify a central class interface definition.
| |
| ### Operator Attribute System |
| |
One of the biggest improvements brought by NNVM is the operator attribute system,
which is similar to traits for types in languages like C++.
We can register any attribute to any operator with the syntax:
| |
```c++
| NNVM_REGISTER_OP(op-name) |
| .set_attr<AttributeType>("AttributeKey", CorrespondingAttributeObject); |
| ``` |
| |
| These attributes can be retrieved later for various purposes. |
For example, `FInferShape` is used for shape inference, and `FCompute<cpu>` is used to carry out the actual computation on the CPU.
| |
We can register any attribute to any operator,
as long as all attributes registered under the same key have the same type.
The more attributes an operator provides,
the more information the system can use for optimization.
| |
| ### List of basic attributes |
| |
In this section, we will go through the basic attributes MXNet expects for all operators.
| You can find the definition for them in the following two files: |
| |
| - [nnvm/op_attr_types.h](https://github.com/dmlc/nnvm/blob/master/include/nnvm/op_attr_types.h) |
| - [mxnet/op_attr_types.h](https://github.com/dmlc/mxnet/blob/master/include/mxnet/op_attr_types.h) |
| |
| #### Descriptions (Optional) |
| |
`.describe(comment)` adds a comment to the operator. Use `.MXNET_DESCRIBE(comment)` to add the current file name and line number to the comment.
| |
| #### Attribute Parser (Optional) |
| |
Set the attribute parser with `.set_attr_parser(PARSER)`, where `PARSER` is a function with prototype `void(nnvm::NodeAttrs* attrs)`. This function should parse the keyword arguments in `attrs->dict` and store the result in `attrs->parsed`.
| |
Simple arguments can be parsed like this:
```c++
NNVM_REGISTER_OP(scalar_op)
.set_attr_parser(
  [](NodeAttrs* attrs) {
    // Parse the "scalar" keyword argument into a double.
    attrs->parsed = std::stod(attrs->dict["scalar"]);
  });
```
| |
The parsed arguments can then be accessed in other attribute functions with:
```c++
double alpha = nnvm::get<double>(attrs.parsed);
```
| |
More complex ops can use `dmlc::Parameter` and `ParamParser` (defined in operator_common.h) for parsing:
| |
```c++
#include <dmlc/parameter.h>
#include <operator_common.h>

struct ActivationParam : public dmlc::Parameter<ActivationParam> {
  // use int for enumeration
  int act_type;
  DMLC_DECLARE_PARAMETER(ActivationParam) {
    DMLC_DECLARE_FIELD(act_type)
    .add_enum("relu", activation::kReLU)
    .add_enum("sigmoid", activation::kSigmoid)
    .add_enum("tanh", activation::kTanh)
    .add_enum("softrelu", activation::kSoftReLU)
    .describe("Activation function to be applied.");
  }
};
DMLC_REGISTER_PARAMETER(ActivationParam);  // goes in the .cc file

NNVM_REGISTER_OP(Activation)
.set_attr_parser(ParamParser<ActivationParam>);
// access with:
// const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
```
| |
| #### Inputs & Outputs |
| |
The number of inputs/outputs can be set with `.set_num_inputs(n_in)` and `.set_num_outputs(n_out)`,
where `n_in` and `n_out` are integers.
| |
Alternatively, if the number of inputs/outputs is variable and depends on arguments,
you can set `n_in`/`n_out` to functions with prototype `uint32_t(const nnvm::NodeAttrs& attrs)`
that return the number of inputs/outputs based on the parsed arguments (see the sketch at the end of this subsection).
| |
| Outputs can be made invisible to other operators by registering `FNumVisibleOutputs` |
| and returning an integer smaller than `n_out`. |
| |
| Inputs/outputs can be named by registering `FListInputNames` and `FListOutputNames` with prototype `std::vector<std::string>(const NodeAttrs& attrs)`. |
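Putting these together, here is a rough sketch of an op with a variable number of inputs; the op name `my_stack` and the `num_args` keyword argument are hypothetical, used only for illustration:

```c++
// Hypothetical op whose input count comes from a `num_args` keyword argument.
NNVM_REGISTER_OP(my_stack)
.set_attr_parser([](nnvm::NodeAttrs* attrs) {
    attrs->parsed = std::stoi(attrs->dict["num_args"]);
  })
.set_num_inputs([](const nnvm::NodeAttrs& attrs) {
    return static_cast<uint32_t>(nnvm::get<int>(attrs.parsed));
  })
.set_num_outputs(1)
.set_attr<nnvm::FListInputNames>("FListInputNames",
  [](const nnvm::NodeAttrs& attrs) {
    // Name the inputs arg0, arg1, ... based on the parsed argument count.
    uint32_t n = static_cast<uint32_t>(nnvm::get<int>(attrs.parsed));
    std::vector<std::string> names;
    for (uint32_t i = 0; i < n; ++i) {
      names.push_back("arg" + std::to_string(i));
    }
    return names;
  });
```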
| |
| |
| #### Argument Descriptions |
| |
| Set argument descriptions with `.add_argument(name, type, comment)`. |
| This is necessary for operators to be properly called imperatively. |
| |
First, add the NDArray arguments: either `num_inputs` times with type "NDArray",
or once with type "NDArray[]" for ops with variable-length inputs.

Then add keyword arguments with the proper type (float, string, etc.).
Operators that parse keyword arguments with `dmlc::Parameter`
can add argument descriptions in bulk with `.add_arguments(ActivationParam::__FIELDS__())`
(NDArray arguments still need to be added manually with type "NDArray"), as shown below.
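For example, the `Activation` registration from the parser section above could describe its arguments like this (NNVM lets you append attributes to an already registered op in a separate `NNVM_REGISTER_OP` block):

```c++
NNVM_REGISTER_OP(Activation)
// The NDArray argument must be added manually...
.add_argument("data", "NDArray", "Input data to the activation function.")
// ...while the keyword arguments come in bulk from the dmlc::Parameter struct.
.add_arguments(ActivationParam::__FIELDS__());
```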
| |
| #### FInferShape or TIsBackward (for Backward Only Ops) |
| |
Normally, operators need to register `FInferShape` with prototype `bool(const nnvm::NodeAttrs& attrs, std::vector<TShape> *in_attrs, std::vector<TShape> *out_attrs)`. `FInferShape` fills in the unknown shapes (`shape.ndim() == 0`) in `in_attrs`/`out_attrs` based on the known shapes in `in_attrs`/`out_attrs`. Use `ElemwiseShape<n_in, n_out>` for simple operators with uniform shapes.
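As a minimal sketch (not taken verbatim from the MXNet source), a hand-written shape function for a one-input, one-output op whose output shape equals its input shape, which is what `ElemwiseShape<1, 1>` already provides, could look like:

```c++
// Minimal sketch of a hand-written FInferShape, assuming the
// SHAPE_ASSIGN_CHECK helper from operator_common.h.
inline bool MyUnaryShape(const nnvm::NodeAttrs& attrs,
                         std::vector<TShape>* in_attrs,
                         std::vector<TShape>* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1U);
  CHECK_EQ(out_attrs->size(), 1U);
  // Propagate known shapes in both directions.
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, (*in_attrs)[0]);
  SHAPE_ASSIGN_CHECK(*in_attrs, 0, (*out_attrs)[0]);
  // Succeed only when the shapes are fully known.
  return (*out_attrs)[0].ndim() != 0;
}
```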
| |
Operators that are only used in a backward pass can instead register `.set_attr<nnvm::TIsBackward>("TIsBackward", true)`,
and their shapes will be copied from the corresponding forward operators.
| |
| #### FInferType |
| |
| Similar to `FInferShape`, `FInferType` fills unknown types (-1) based on known types. Use `ElemwiseType<n_in, n_out>` for simple operators with uniform types. Operators that registered `TIsBackward` don't need to register this. |
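Similarly, here is a hedged sketch of a hand-written type function equivalent to `ElemwiseType<1, 1>`, assuming the `TYPE_ASSIGN_CHECK` helper from operator_common.h:

```c++
// Minimal sketch of a hand-written FInferType; -1 marks an unknown type.
inline bool MyUnaryType(const nnvm::NodeAttrs& attrs,
                        std::vector<int>* in_attrs,
                        std::vector<int>* out_attrs) {
  // Propagate known types in both directions.
  TYPE_ASSIGN_CHECK(*out_attrs, 0, (*in_attrs)[0]);
  TYPE_ASSIGN_CHECK(*in_attrs, 0, (*out_attrs)[0]);
  return (*out_attrs)[0] != -1;
}
```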
| |
| |
| #### FInplaceOption (Optional) |
| |
`FInplaceOption` with prototype `std::vector<std::pair<int, int> >(const NodeAttrs& attrs)`
specifies which input/output pairs can be computed in-place
and share memory with each other.
Each pair `(i, j)` in the returned list means
that the `i`-th input can share memory with the `j`-th output.
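For example, an elementwise unary op can declare that its single input may share memory with its single output (the `abs` example at the end of this tutorial registers exactly this):

```c++
NNVM_REGISTER_OP(my_unary_op)  // hypothetical op name
.set_attr<nnvm::FInplaceOption>("FInplaceOption",
  [](const NodeAttrs& attrs) {
    // Input 0 may be computed in-place into output 0.
    return std::vector<std::pair<int, int> >{{0, 0}};
  });
```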
| |
| |
| #### FGradient (Optional for imperative use, required for symbolic use) |
| |
If an operator has a gradient, it can be described with `FGradient`, which has the prototype
| |
```c++
| std::vector<nnvm::NodeEntry>(const nnvm::NodePtr& n, |
| const std::vector<nnvm::NodeEntry>& ograds) |
| ``` |
| |
Use the utility functions `ElemwiseGradUseIn{op_name}`, `ElemwiseGradUseOut{op_name}`, and `ElemwiseGradUseNone{op_name}` for ops that need the corresponding forward op's input,
output, or nothing to calculate the gradient.
| |
For more complicated patterns, use `MakeGradNode(op_name, n, heads, dict)` to create gradient entries,
where `heads` are the input entries to the backward op, composed from `ograds` and `n->inputs`, as sketched below.
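As a rough sketch (the op names here are hypothetical), a hand-written `FGradient` built with `MakeGradNode`; this is roughly what `ElemwiseGradUseIn{"_backward_myop"}` would produce:

```c++
NNVM_REGISTER_OP(myop)
.set_attr<nnvm::FGradient>("FGradient",
  [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
    // The backward op receives the output gradient followed by
    // the forward op's first input.
    std::vector<nnvm::NodeEntry> heads(ograds.begin(), ograds.end());
    heads.push_back(n->inputs[0]);
    return MakeGradNode("_backward_myop", n, heads, n->attrs.dict);
  });
```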
| |
| #### FCompute\<xpu\> |
| |
Simple operators can register `FCompute<xpu>` with `.set_attr<FCompute>("FCompute<cpu>", ...)` and `.set_attr<FCompute>("FCompute<gpu>", ...)` for CPU and (optionally) GPU computation.

`FCompute` has the prototype:
| |
| ```c++ |
| void(const nnvm::NodeAttrs& attrs, |
| const OpContext& ctx, |
| const std::vector<TBlob>& inputs, |
| const std::vector<OpReqType>& req, |
| const std::vector<TBlob>& outputs) |
| ``` |
| |
`req` has the same length as `outputs`.
Each entry of `req` specifies
how the corresponding output should be written.
`OpReqType` is defined as:
| |
| ```c++ |
enum OpReqType {
  kNullOp,        // no operation, do not write anything
  kWriteTo,       // write the result directly to the provided space
  kWriteInplace,  // perform an in-place write
  kAddTo          // accumulate the result into the provided space (+=)
};
| ``` |
| |
Normally, the `req` of all `outputs` should be `kWriteTo`,
meaning that the provided `outputs` tensor is a *raw* memory block,
so the operator should write results directly into it.
In some cases, for example, when calculating the gradient tensor,
it is preferable to accumulate into the output
rather than overwrite its contents,
so that no extra space needs to be allocated each time.
In such cases, the corresponding `req` is set to `kAddTo`,
indicating that a `+=` should be used.
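To make this concrete, here is a hedged sketch of an identity `FCompute`, close in spirit to the `UnaryCompute` used by the `abs` example below; `ASSIGN_DISPATCH` from operator_common.h expands to `=` or `+=` depending on `req[0]` and does nothing for `kNullOp`:

```c++
// Minimal sketch: copy input 0 to output 0, honoring the request type.
template<typename xpu>
void IdentityCompute(const nnvm::NodeAttrs& attrs,
                     const OpContext& ctx,
                     const std::vector<TBlob>& inputs,
                     const std::vector<OpReqType>& req,
                     const std::vector<TBlob>& outputs) {
  using namespace mshadow;
  using namespace mshadow::expr;
  Stream<xpu> *s = ctx.get_stream<xpu>();
  MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
    Tensor<xpu, 1, DType> out = outputs[0].FlatTo1D<xpu, DType>(s);
    Tensor<xpu, 1, DType> in = inputs[0].FlatTo1D<xpu, DType>(s);
    // ASSIGN_DISPATCH writes, accumulates, or skips based on req[0].
    ASSIGN_DISPATCH(out, req[0], F<mshadow_op::identity>(in));
  });
}
```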
| |
| ### Example: abs operator |
| |
| ```c++ |
| NNVM_REGISTER_OP(abs) |
| .MXNET_DESCRIBE("Take absolute value of the src") |
| .set_num_inputs(1) |
| .set_num_outputs(1) |
| .set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<1, 1>) |
| .set_attr<nnvm::FInferType>("FInferType", ElemwiseType<1, 1>) |
| .set_attr<nnvm::FInplaceOption>("FInplaceOption", |
| [](const NodeAttrs& attrs){ |
| return std::vector<std::pair<int, int> >{{0, 0}}; |
| }) |
| .set_attr<FCompute>("FCompute<cpu>", UnaryCompute<cpu, mshadow_op::abs>) |
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_abs"})
.add_argument("data", "NDArray", "Source input");
| |
| NNVM_REGISTER_OP(_backward_abs) |
| .set_num_inputs(2) |
| .set_num_outputs(1) |
| .set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<2, 1>) |
| .set_attr<nnvm::FInferType>("FInferType", ElemwiseType<2, 1>) |
| .set_attr<nnvm::FInplaceOption>("FInplaceOption", |
| [](const NodeAttrs& attrs){ |
| return std::vector<std::pair<int, int> >{{0, 0}, {1, 0}}; |
| }) |
| .set_attr<FCompute>("FCompute<cpu>", BinaryCompute<cpu, unary_bwd<mshadow_op::sign> >); |
| ``` |
| |
| ### Legacy Operators |
| |
For the legacy (pre-0.9) way of defining operators in C++, please see:
| - [Developer Guide - Operators](http://mxnet.io/architecture/overview.html#operators-in-mxnet) |
| - [Developer Guide - SimpleOp](http://mxnet.io/architecture/overview.html#simpleop-the-unified-operator-api) |