The Gluon package is a high-level interface for MXNet designed to be easy to use while keeping most of the flexibility of the low-level API. Gluon supports both imperative and symbolic programming, making it easy to train complex models imperatively in Python and then deploy them with a symbolic graph in C++ and Scala.
```python
# import dependencies
from __future__ import print_function
import numpy as np
import mxnet as mx
import mxnet.ndarray as F
import mxnet.gluon as gluon
from mxnet.gluon import nn
from mxnet import autograd
```
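As a quick illustration of the symbolic path mentioned above, here is a minimal sketch: a network built from hybridizable blocks can be compiled into a symbolic graph with hybridize() and saved for deployment with export(). The small HybridSequential model, its layer sizes, and the export prefix are placeholders for illustration only and are not used in the rest of this tutorial.

```python
# Minimal sketch of hybridization; this model is not reused below.
hybrid_net = nn.HybridSequential()
with hybrid_net.name_scope():
    hybrid_net.add(nn.Dense(64, activation='relu'))
    hybrid_net.add(nn.Dense(10))
hybrid_net.initialize()
hybrid_net.hybridize()                    # switch to symbolic execution
out = hybrid_net(mx.nd.ones((1, 100)))    # first call triggers graph compilation
hybrid_net.export('hybrid_net')           # writes hybrid_net-symbol.json and parameters
```

The rest of this section uses plain Blocks and the imperative workflow.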
Neural networks (and other machine learning models) can be defined and trained with the gluon.nn and gluon.rnn packages. A typical training script has the following steps: define the network, initialize its parameters, then loop over the input data, running a forward pass to get the output, computing the loss against the label, backpropagating the gradients, and updating the parameters.
gluon.Block is the basic building block of models. You can define networks by composing and inheriting Block:
```python
class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            # layers created in name_scope will inherit the name space
            # from the parent layer.
            self.conv1 = nn.Conv2D(6, kernel_size=5)
            self.pool1 = nn.MaxPool2D(pool_size=(2, 2))
            self.conv2 = nn.Conv2D(16, kernel_size=5)
            self.pool2 = nn.MaxPool2D(pool_size=(2, 2))
            self.fc1 = nn.Dense(120)
            self.fc2 = nn.Dense(84)
            self.fc3 = nn.Dense(10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
A network must be created and initialized before it can be used:
```python
net = Net()
# Initialize on CPU. Replace with `mx.gpu(0)`, or `[mx.gpu(0), mx.gpu(1)]`,
# etc. to use one or more GPUs.
net.initialize(mx.init.Xavier(), ctx=mx.cpu())
```
Note that because we didn't specify the input sizes of the layers in Net's constructor, the shapes of their parameters cannot be determined at this point. Actual initialization is deferred to the first forward pass, i.e. if you access net.fc1.weight.data() now, an exception will be raised.
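For example (a small sketch; the exact exception class can vary between MXNet versions), the parameter's input dimension is still unknown and accessing its data fails:

```python
print(net.fc1.weight)        # shape not fully known yet; the input dimension shows as 0
try:
    net.fc1.weight.data()    # parameters are not allocated yet -- raises an exception
except Exception as e:       # DeferredInitializationError in recent versions
    print('deferred initialization:', type(e).__name__)
```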
You can trigger the actual initialization by running a forward pass:
```python
data = mx.nd.random_normal(shape=(10, 1, 32, 32))  # dummy data
output = net(data)
```
Or you can specify input sizes when creating layers, e.g. nn.Dense(84, in_units=120) instead of nn.Dense(84).
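For example (a standalone layer, separate from the Net above), a layer whose input size is given up front can be initialized and inspected right away:

```python
fc = nn.Dense(84, in_units=120)    # input size specified explicitly
fc.initialize()
print(fc.weight.data().shape)      # (84, 120) -- no forward pass needed
```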
Loss functions take (output, label) pairs and compute a scalar loss for each sample in the mini-batch. The scalars measure how far each output is from the label.
There are many predefined loss functions in gluon.loss. Here we use SoftmaxCrossEntropyLoss for digit classification.
To compute loss and backprop for one iteration, we do:
```python
label = mx.nd.arange(10)  # dummy label
with autograd.record():
    output = net(data)
    L = gluon.loss.SoftmaxCrossEntropyLoss()
    loss = L(output, label)
    loss.backward()
print('loss:', loss)
print('grad:', net.fc1.weight.grad())
```
Now that the gradients are computed, we just need to update the weights. This is usually done with an update rule like weight = weight - learning_rate * grad / batch_size. Note that we divide the gradient by batch_size because the gradient is aggregated over the entire batch. For example:
```python
lr = 0.01
for p in net.collect_params().values():
    p.data()[:] -= lr / data.shape[0] * p.grad()
```
But sometimes you want fancier update rules, like momentum or Adam, and since this is commonly used functionality, gluon provides a Trainer class for it:
```python
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
with autograd.record():
    output = net(data)
    L = gluon.loss.SoftmaxCrossEntropyLoss()
    loss = L(output, label)
    loss.backward()

# Do the update. Trainer needs to know the batch size of data to normalize
# the gradient by 1/batch_size.
trainer.step(data.shape[0])
```
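Putting the pieces together, a typical training loop looks roughly like the following sketch. The random dataset, batch size, and number of epochs are placeholders; in practice you would load a real dataset, for example through gluon.data.

```python
# Dummy dataset: random images and random integer labels in [0, 10).
dataset = gluon.data.ArrayDataset(
    mx.nd.random_normal(shape=(100, 1, 32, 32)),
    mx.nd.array(np.random.randint(0, 10, size=(100,))))
loader = gluon.data.DataLoader(dataset, batch_size=10, shuffle=True)

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

for epoch in range(2):
    cumulative_loss = 0.0
    for batch_data, batch_label in loader:
        with autograd.record():
            output = net(batch_data)
            loss = loss_fn(output, batch_label)
        loss.backward()
        trainer.step(batch_data.shape[0])   # normalizes gradients by batch size
        cumulative_loss += loss.mean().asscalar()
    print('epoch %d, average loss %.4f' % (epoch, cumulative_loss / len(loader)))
```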