| # Symbol - Neural network graphs and auto-differentiation |
| |
Besides the tensor computation interface `NDArray`, another main object in MXNet
is the `Symbol`, provided by `mxnet.symbol` (or `mxnet.sym` for short). A symbol
represents a multi-output symbolic expression. Symbols are composed of operators,
such as simple matrix operations (e.g. "+") or a neural network layer
(e.g. a convolution layer). An operator can take several input variables, produce
more than one output variable, and have internal state variables. A variable
can be either free, in which case we bind it with a value later, or the output
of another symbol.
| |
| ## Symbol Composition |
| |
| ### Basic Operators |
| |
The following example composes a simple expression `a+b`. We first create the
placeholders `a` and `b` with names using `mx.sym.Variable`, and then construct
the desired symbol with the operator `+`. When a string name is not given at
creation time, MXNet automatically generates a unique name for the symbol,
as is the case for `c`.
| |
| ```python |
| import mxnet as mx |
| a = mx.sym.Variable('a') |
| b = mx.sym.Variable('b') |
| c = a + b |
| (a, b, c) |
| ``` |
| |
| Most `NDArray` operators can be applied to `Symbol`, for example: |
| |
| ```python |
# element-wise multiplication
| d = a * b |
| # matrix multiplication |
| e = mx.sym.dot(a, b) |
| # reshape |
| f = mx.sym.reshape(d+e, shape=(1,4)) |
| # broadcast |
| g = mx.sym.broadcast_to(f, shape=(2,4)) |
| mx.viz.plot_network(symbol=g) |
| ``` |
| |
| ### Basic Neural Networks |
| |
Besides the basic operators, `Symbol` supports a rich set of neural network
layers. The following code constructs a two-layer fully connected neural network
and then visualizes its structure, given the input data shape.
| |
| ```python |
| net = mx.sym.Variable('data') |
| net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128) |
| net = mx.sym.Activation(data=net, name='relu1', act_type="relu") |
| net = mx.sym.FullyConnected(data=net, name='fc2', num_hidden=10) |
| net = mx.sym.SoftmaxOutput(data=net, name='out') |
| mx.viz.plot_network(net, shape={'data':(100,200)}) |
| ``` |
| |
### Modularized Construction for Deep Networks
| |
For deep networks, such as Google Inception, constructing the graph layer by
layer is painful given the large number of layers. For these networks, we often
modularize the construction. Taking Google Inception as an example, we can
first define a factory function that chains the convolution layer, batch
normalization layer, and ReLU activation layer together:
| |
| ```python |
| def ConvFactory(data, num_filter, kernel, stride=(1,1), pad=(0, 0), name=None, suffix=''): |
| conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad, name='conv_%s%s' %(name, suffix)) |
| bn = mx.sym.BatchNorm(data=conv, name='bn_%s%s' %(name, suffix)) |
| act = mx.sym.Activation(data=bn, act_type='relu', name='relu_%s%s' %(name, suffix)) |
| return act |
| prev = mx.sym.Variable(name="Previous Output") |
| conv_comp = ConvFactory(data=prev, num_filter=64, kernel=(7,7), stride=(2, 2)) |
| shape = {"Previous Output" : (128, 3, 28, 28)} |
| mx.viz.plot_network(symbol=conv_comp, shape=shape) |
| ``` |
| |
Then we define a function that constructs an Inception module based on
`ConvFactory`:
| |
| ```python |
| def InceptionFactoryA(data, num_1x1, num_3x3red, num_3x3, num_d3x3red, num_d3x3, pool, proj, name): |
| # 1x1 |
| c1x1 = ConvFactory(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_1x1' % name)) |
| # 3x3 reduce + 3x3 |
| c3x3r = ConvFactory(data=data, num_filter=num_3x3red, kernel=(1, 1), name=('%s_3x3' % name), suffix='_reduce') |
| c3x3 = ConvFactory(data=c3x3r, num_filter=num_3x3, kernel=(3, 3), pad=(1, 1), name=('%s_3x3' % name)) |
| # double 3x3 reduce + double 3x3 |
| cd3x3r = ConvFactory(data=data, num_filter=num_d3x3red, kernel=(1, 1), name=('%s_double_3x3' % name), suffix='_reduce') |
| cd3x3 = ConvFactory(data=cd3x3r, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_0' % name)) |
| cd3x3 = ConvFactory(data=cd3x3, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_1' % name)) |
| # pool + proj |
| pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name))) |
| cproj = ConvFactory(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_proj' % name)) |
| # concat |
| concat = mx.sym.Concat(*[c1x1, c3x3, cd3x3, cproj], name='ch_concat_%s_chconcat' % name) |
| return concat |
| prev = mx.sym.Variable(name="Previous Output") |
| in3a = InceptionFactoryA(prev, 64, 64, 64, 64, 96, "avg", 32, name="in3a") |
| mx.viz.plot_network(symbol=in3a, shape=shape) |
| ``` |
| |
Finally, we can obtain the whole network by chaining multiple Inception
modules. See a complete example
[here](https://github.com/dmlc/mxnet/blob/master/example/image-classification/symbols/inception-bn.py).
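Chaining is just function composition: each module's output symbol becomes the
next module's input. A minimal sketch (the filter counts here are illustrative,
not taken from the reference network):

```python
data = mx.sym.Variable('data')
in3a = InceptionFactoryA(data, 64, 64, 64, 64, 96, "avg", 32, name="in3a")
in3b = InceptionFactoryA(in3a, 64, 96, 96, 96, 96, "avg", 64, name="in3b")
```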
| |
| ### Group Multiple Symbols |
| |
To construct neural networks with multiple loss layers, we can use
`mxnet.sym.Group` to group multiple symbols together. The following example
groups two outputs:
| |
| ```python |
| net = mx.sym.Variable('data') |
| fc1 = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128) |
| net = mx.sym.Activation(data=fc1, name='relu1', act_type="relu") |
| out1 = mx.sym.SoftmaxOutput(data=net, name='softmax') |
| out2 = mx.sym.LinearRegressionOutput(data=net, name='regression') |
| group = mx.sym.Group([out1, out2]) |
| group.list_outputs() |
| ``` |
| |
| ## Relations to NDArray |
| |
As we have seen, both `Symbol` and `NDArray` provide multi-dimensional array
operations, such as `c = a + b`. We briefly clarify the difference here.

`NDArray` provides an imperative-programming-style interface, in which
computations are evaluated statement by statement. `Symbol` is closer to
declarative programming, in which we first declare the computation and then
evaluate it with data. Other examples in this category include regular
expressions and SQL.
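To make the contrast concrete, here is a minimal sketch of the same addition in
both styles:

```python
import mxnet as mx

# imperative: each statement is evaluated immediately
x = mx.nd.ones((2,3))
y = mx.nd.ones((2,3))
z = x + y                    # z already holds the result
print(z.asnumpy())

# declarative: first declare the computation, then bind data and run
a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = a + b                    # no data is involved yet
ex = c.bind(ctx=mx.cpu(), args={'a': x, 'b': y})
print(ex.forward()[0].asnumpy())
```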
| |
| The pros for `NDArray`: |
| |
- straightforward
- easy to work with native language features (for loops, if-else conditions, ...)
  and libraries (NumPy, ...)
- easy to debug step by step
| |
| The pros for `Symbol`: |
| |
- provides almost all functionalities of `NDArray`, such as `+`, `*`, `sin`, and
  `reshape`
- provides a large number of neural network related operators, such as
  `Convolution`, `Activation`, and `BatchNorm`
- provides automatic differentiation
- easy to construct and manipulate complex computations such as deep neural
  networks
- easy to save, load, and visualize
- easy for the backend to optimize the computation and memory usage
| |
| ## Symbol Manipulation |
| |
One important difference between `Symbol` and `NDArray` is that with `Symbol`
we first declare the computation and then bind it with data to run.

In this section we introduce the functions to manipulate a symbol directly. Note,
however, that most of them are nicely wrapped by the `module` package, so this
section can be safely skipped.
| |
| ### Shape Inference |
| |
For each symbol, we can query its inputs (or arguments) and outputs. We can also
infer the output shape from given input shapes, which facilitates memory
allocation.
| |
| |
| ```python |
| arg_name = c.list_arguments() # get the names of the inputs |
| out_name = c.list_outputs() # get the names of the outputs |
| arg_shape, out_shape, _ = c.infer_shape(a=(2,3), b=(2,3)) |
| {'input' : dict(zip(arg_name, arg_shape)), |
| 'output' : dict(zip(out_name, out_shape))} |
| ``` |
| |
| ### Bind with Data and Evaluate |
| |
The symbol `c` we constructed declares what computation should be run. To
evaluate it, we first need to feed the arguments, namely the free variables,
with data. We can do this with the `bind` method, which accepts a device context
and a `dict` mapping free variable names to `NDArray`s, and returns an
executor. The executor provides the `forward` method for evaluation and the
`outputs` attribute to get all results.
| |
| ```python |
| ex = c.bind(ctx=mx.cpu(), args={'a' : mx.nd.ones([2,3]), |
| 'b' : mx.nd.ones([2,3])}) |
| ex.forward() |
print('number of outputs = %d\nthe first output = \n%s' % (
    len(ex.outputs), ex.outputs[0].asnumpy()))
| ``` |
| |
We can evaluate the same symbol on a GPU with different data:
| |
| ```python |
| ex_gpu = c.bind(ctx=mx.gpu(), args={'a' : mx.nd.ones([3,4], mx.gpu())*2, |
| 'b' : mx.nd.ones([3,4], mx.gpu())*3}) |
| ex_gpu.forward() |
| ex_gpu.outputs[0].asnumpy() |
| ``` |
| |
| ### Load and Save |
| |
Similar to `NDArray`, a `Symbol` object can be serialized either with `pickle`
or with the `save` and `load` methods directly. Unlike the binary format chosen
by `NDArray`, `Symbol` uses the more readable JSON format for serialization. The
`tojson` method returns the JSON string.
| |
| ```python |
| print(c.tojson()) |
| c.save('symbol-c.json') |
| c2 = mx.sym.load('symbol-c.json') |
| c.tojson() == c2.tojson() |
| ``` |
| |
| ## Customized Symbol |
| |
Most operators, such as `mx.sym.Convolution` and `mx.sym.Reshape`, are
implemented in C++ for better performance. MXNet also allows users to write new
operators in a frontend language such as Python, which often makes developing
and debugging much easier.

To implement an operator in Python, we just need to define the two computation
methods `forward` and `backward`, along with several methods for querying
properties, such as `list_arguments` and `infer_shape`.
| |
`NDArray` is the default argument type in both `forward` and `backward`, so we
often implement the computation with `NDArray` operations as well. To show the
flexibility of MXNet, however, we will demonstrate an implementation of a
`softmax` layer using NumPy. Though a NumPy-based operator can only run on the
CPU and loses some optimizations that can be applied to `NDArray`, it enjoys
the rich functionality provided by NumPy.
| |
| We first create a subclass of `mx.operator.CustomOp` and then define `forward` |
| and `backward`. |
| |
| ```python |
import numpy as np

class Softmax(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        x = in_data[0].asnumpy()
        # subtract the row-wise max for numerical stability
        y = np.exp(x - x.max(axis=1).reshape((x.shape[0], 1)))
        y /= y.sum(axis=1).reshape((x.shape[0], 1))
        self.assign(out_data[0], req[0], mx.nd.array(y))

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        l = in_data[1].asnumpy().ravel().astype(np.int32)
        y = out_data[0].asnumpy()
        # gradient of the softmax cross-entropy loss: y - one_hot(l)
        y[np.arange(l.shape[0]), l] -= 1.0
        self.assign(in_grad[0], req[0], mx.nd.array(y))
| ``` |
| |
Here we use `asnumpy` to convert the `NDArray` inputs into `numpy.ndarray`s, and
then use `CustomOp.assign` to assign the results back to the output `NDArray`s
according to the value of `req`, which can be, for example, "write" (overwrite)
or "add" (accumulate).
| |
| Next we create a subclass of `mx.operator.CustomOpProp` for querying the |
| properties. |
| |
| ```python |
| # register this operator into MXNet by name "softmax" |
| @mx.operator.register("softmax") |
| class SoftmaxProp(mx.operator.CustomOpProp): |
| def __init__(self): |
| # softmax is a loss layer so we don't need gradient input |
| # from layers above. |
| super(SoftmaxProp, self).__init__(need_top_grad=False) |
| |
| def list_arguments(self): |
| return ['data', 'label'] |
| |
| def list_outputs(self): |
| return ['output'] |
| |
| def infer_shape(self, in_shape): |
| data_shape = in_shape[0] |
| label_shape = (in_shape[0][0],) |
| output_shape = in_shape[0] |
| return [data_shape, label_shape], [output_shape], [] |
| |
| def create_operator(self, ctx, shapes, dtypes): |
| return Softmax() |
| ``` |
| |
Finally, we can use `mx.sym.Custom` with the registered name to use this operator:

```python
data = mx.sym.Variable('data')
net = mx.sym.Custom(data=data, name='softmax', op_type='softmax')
```
| |
| ## Advanced Usages |
| |
| ### Type Cast |
| |
MXNet uses 32-bit floats by default. Sometimes we want to use a lower precision
data type for a better accuracy-performance trade-off. For example, the Nvidia
Tesla Pascal GPUs (e.g. P100) have improved 16-bit float performance, while the
GTX Pascal GPUs (e.g. GTX 1080) are fast on 8-bit integers.
| |
We can use the `mx.sym.Cast` operator to convert the data type:
| |
| ```python |
| a = mx.sym.Variable('data') |
| b = mx.sym.Cast(data=a, dtype='float16') |
| arg, out, _ = b.infer_type(data='float32') |
| print({'input':arg, 'output':out}) |
| |
| c = mx.sym.Cast(data=a, dtype='uint8') |
| arg, out, _ = c.infer_type(data='int32') |
| print({'input':arg, 'output':out}) |
| ``` |
| |
| ### Variable Sharing |
| |
Sometimes we want to share the contents between several symbols. This can simply
be done by binding these symbols to the same array. In the example below, `a`,
`b`, and `c` are all bound to the same array of 2s, so every element of the
result is `2 + 2*2 = 6`.
| |
| ```python |
| a = mx.sym.Variable('a') |
| b = mx.sym.Variable('b') |
| c = mx.sym.Variable('c') |
| d = a + b * c |
| |
| data = mx.nd.ones((2,3))*2 |
| ex = d.bind(ctx=mx.cpu(), args={'a':data, 'b':data, 'c':data}) |
| ex.forward() |
| ex.outputs[0].asnumpy() |
| ``` |
| |