# Handwritten Digit Recognition
This tutorial walks you through a classic computer vision application: identifying
handwritten digits with neural networks.
<!-- ENABLE LANGUAGE BAR -->
## Loading Data
We first fetch the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, which is
commonly used for handwritten digit recognition. Each image in this
dataset is a 28x28 grayscale image with pixel values between 0 and 255.
![png](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/example/mnist.png)
The following code downloads the images and the corresponding labels and loads
them into memory.
```python
import mxnet as mx
mnist = mx.test_utils.get_mnist()
```
```julia
using MXNet
include("mnist-data.jl")
```
```r
require(mxnet)
```
Next we create data iterators for MXNet. A data iterator returns a batch of
examples together with the corresponding labels each time. If the examples are
images, they are represented by a 4-D array with shape `(batch_size,
num_channels, height, width)`. For the MNIST dataset, there is only one color
channel, and both height and width are 28, so the shape is `(batch_size, 1, 28,
28)`. In addition, we usually shuffle the images used for training, which helps
training converge faster.
```python
batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
```
```julia
batch_size = 100
train_provider, eval_provider = get_mnist_providers(batch_size)
```
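To make the iterator's behavior concrete, here is a pure-NumPy sketch of what such an iterator does under the hood. The arrays and the `iterate_minibatches` helper below are illustrative only, not part of MXNet:

```python
import numpy as np

# Toy stand-in for the MNIST arrays: 600 "images" of shape (1, 28, 28).
images = np.zeros((600, 1, 28, 28), dtype=np.float32)
labels = np.arange(600) % 10

def iterate_minibatches(X, y, batch_size, shuffle=False):
    """Yield (data, label) batches, optionally visiting examples in shuffled order."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

batches = list(iterate_minibatches(images, labels, batch_size=100, shuffle=True))
print(len(batches))         # 6 batches of 100 examples each
print(batches[0][0].shape)  # (100, 1, 28, 28)
```

Shuffling only permutes the order in which examples are visited; each epoch still sees every example exactly once.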
## Multilayer Perceptron
We first use a [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron) to solve this problem. We
define the multilayer perceptron using MXNet's symbolic interface. The
following commands create a placeholder variable for the input data and flatten
it from 4-D to 2-D.
```python
data = mx.sym.var('data')
# Flatten the data from 4-D shape into 2-D (batch_size, num_channel*width*height)
data = mx.sym.flatten(data=data)
```
A multilayer perceptron contains several fully-connected layers. Given an
*n x m* input matrix *X*, a fully-connected layer outputs a matrix *Y* of size
*n x k*, where *k* is often called the hidden size. The layer has two learnable
parameters, the *m x k* weight matrix *W* and the *1 x k* bias vector *b*. It
computes the output as *Y = X W + b*.
The output of a fully-connected layer is often fed into an activation layer,
which performs an element-wise operation. Common activation functions include
sigmoid, tanh, and the rectifier (or "relu").
```python
# The first fully-connected layer and the according activation function
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
# The second fully-connected layer and the according activation function
fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
```
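The affine map computed by a fully-connected layer, followed by a relu, can be sketched in plain NumPy. The random inputs below are dummy data, with sizes matching `fc1` above:

```python
import numpy as np

n, m, k = 100, 784, 128   # batch size, input size (28*28), hidden size
X = np.random.randn(n, m).astype(np.float32)
W = np.random.randn(m, k).astype(np.float32) * 0.01  # weight matrix
b = np.zeros(k, dtype=np.float32)                    # bias vector

Y = X @ W + b             # fully-connected layer output, shape (n, k)
A = np.maximum(Y, 0)      # relu: element-wise max with zero

print(Y.shape)            # (100, 128)
```

In MXNet, `W` and `b` are created and learned automatically; only the hidden size `num_hidden` needs to be specified.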
The last fully-connected layer often has a hidden size equal to the number of
classes in the dataset. We then stack a softmax layer on top, which maps the
input into probability scores. During training, a cross-entropy loss is
applied between the output and the label.
```python
# MNIST has 10 classes
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
# Softmax with cross entropy loss
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
```
```julia
# chain multiple layers with the mx.chain macro
mlp = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name=:fc1, num_hidden=128) =>
mx.Activation(name=:relu1, act_type=:relu) =>
mx.FullyConnected(name=:fc2, num_hidden=64) =>
mx.Activation(name=:relu2, act_type=:relu) =>
mx.FullyConnected(name=:fc3, num_hidden=10) =>
mx.SoftmaxOutput(name=:softmax)
```
Now that both the neural network definition and the data iterators are ready, we
can start training. The following commands train the multilayer perceptron on the
MNIST dataset with minibatch stochastic gradient descent (batch size 100,
learning rate 0.1). Training stops after 10 epochs (passes over the data).
```python
import logging
logging.getLogger().setLevel(logging.DEBUG) # logging to stdout
# create a trainable module on CPU
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
mlp_model.fit(train_iter, # training data
eval_data=val_iter, # validation data
optimizer='sgd', # use SGD to train
optimizer_params={'learning_rate':0.1}, # use fixed learning rate
eval_metric='acc', # report accuracy during training
batch_end_callback = mx.callback.Speedometer(batch_size, 100), # output progress for each 100 data batches
num_epoch=10) # train at most 10 data passes
```
```julia
model = mx.FeedForward(mlp, context=mx.cpu())
optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
```
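Spelled out, one minibatch SGD step with the fixed learning rate 0.1 is only a small amount of arithmetic. The parameter vector and gradient below are dummy values for illustration:

```python
import numpy as np

lr = 0.1                              # learning rate, as in the fit() call above
w = np.array([0.5, -0.3, 0.8])        # dummy parameter vector
grad = np.array([0.1, 0.2, -0.4])     # dummy gradient from one minibatch

w_new = w - lr * grad                 # vanilla SGD update
print(w_new)
```

MXNet applies this update to every learnable parameter after each minibatch; variants such as momentum or weight decay add extra terms to the same rule.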
## Convolutional Neural Networks
Note that the fully-connected layer simply reshapes each image into a
vector during training. It ignores the spatial structure: pixels are
correlated along both the horizontal and vertical dimensions. The convolutional
layer addresses this drawback by using a more structured weight *W*. Instead of
a plain matrix-matrix multiplication, it uses a 2-D convolution to obtain the
output.
Besides the convolutional layer, the other major change in a convolutional
neural network is the addition of pooling layers. A pooling layer reduces an
*n x m* patch to a single value, which makes
the network less sensitive to the spatial location of features.
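To illustrate the two operations, here is a minimal pure-NumPy sketch of a valid 2-D convolution and non-overlapping max pooling. The `conv2d` and `max_pool` helpers are for illustration only; the actual network uses MXNet's `Convolution` and `Pooling` operators:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image (no padding)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(img, size=2):
    """Non-overlapping size x size max pooling."""
    H, W = img.shape
    return img[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
k = np.ones((5, 5)) / 25.0                      # 5x5 averaging kernel
feat = conv2d(img, k)                           # shape (2, 2)
pooled = max_pool(feat)                         # shape (1, 1)
print(feat.shape, pooled.shape)
```

A real convolutional layer applies many such kernels (`num_filter` of them), each producing one output channel, and the kernels themselves are the learned parameters.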
The following code defines a convolutional neural network called LeNet:
```python
data = mx.sym.var('data')
# first conv layer
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))
# second conv layer
conv2 = mx.sym.Convolution(data=pool1, kernel=(5,5), num_filter=50)
tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2,2), stride=(2,2))
# first fullc layer
flatten = mx.sym.flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 = mx.sym.Activation(data=fc1, act_type="tanh")
# second fullc
fc2 = mx.sym.FullyConnected(data=tanh3, num_hidden=10)
# softmax loss
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
```
```julia
# input
data = mx.Variable(:data)
# first conv
conv1 = @mx.chain mx.Convolution(data=data, kernel=(5,5), num_filter=20) =>
mx.Activation(act_type=:tanh) =>
mx.Pooling(pool_type=:max, kernel=(2,2),
stride=(2,2))
# second conv
conv2 = @mx.chain mx.Convolution(data=conv1, kernel=(5,5), num_filter=50) =>
mx.Activation(act_type=:tanh) =>
mx.Pooling(pool_type=:max, kernel=(2,2), stride=(2,2))
# first fully-connected
fc1 = @mx.chain mx.Flatten(data=conv2) =>
mx.FullyConnected(num_hidden=500) =>
mx.Activation(act_type=:tanh)
# second fully-connected
fc2 = mx.FullyConnected(data=fc1, num_hidden=10)
# softmax loss
lenet = mx.Softmax(data=fc2, name=:softmax)
```
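As a sanity check on the architecture, the spatial sizes through LeNet can be computed by hand: a 5x5 valid convolution shrinks each side by 4, and a 2x2 stride-2 pooling halves it. A few lines of Python confirm that the flattened input to `fc1` has 50 * 4 * 4 = 800 features:

```python
def conv_out(size, kernel, stride=1):
    """Output size of a valid convolution (no padding)."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output size of a pooling layer."""
    return (size - kernel) // stride + 1

s = 28                  # MNIST input: 28x28
s = conv_out(s, 5)      # conv1: 28 -> 24
s = pool_out(s, 2, 2)   # pool1: 24 -> 12
s = conv_out(s, 5)      # conv2: 12 -> 8
s = pool_out(s, 2, 2)   # pool2:  8 -> 4
print(s, 50 * s * s)    # 4 800: pool2 has 50 channels of 4x4, so fc1 sees 800 inputs
```

MXNet infers these shapes automatically, which is why `num_hidden` is the only size given in the symbol definitions.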
Now train LeNet with the same hyper-parameters as before. Note that if a GPU is
available, it is desirable to use it for the computation, since LeNet is
more complex than the previous multilayer perceptron. To do so, we only need to
change `mx.cpu()` to `mx.gpu()`.
```python
# create a trainable module on CPU; change mx.cpu() to mx.gpu() to use GPU 0
lenet_model = mx.mod.Module(symbol=lenet, context=mx.cpu())
# train with the same hyper-parameters as before
lenet_model.fit(train_iter,
eval_data=val_iter,
optimizer='sgd',
optimizer_params={'learning_rate':0.1},
eval_metric='acc',
batch_end_callback = mx.callback.Speedometer(batch_size, 100),
num_epoch=10)
```
```julia
# fit model
model = mx.FeedForward(lenet, context=mx.gpu())
# optimizer
optimizer = mx.SGD(lr=0.05, momentum=0.9, weight_decay=0.00001)
# fit parameters
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
```
## Predict
After training is done, we can predict on new data. The following code computes
the predicted probability scores for every image; *prob[i][j]* is the
probability that the *i*-th image contains the *j*-th object in the label set.
```python
test_iter = mx.io.NDArrayIter(mnist['test_data'], None, batch_size)
prob = mlp_model.predict(test_iter)
assert prob.shape == (10000, 10)
```
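What turning these probability scores into class predictions boils down to can be sketched in NumPy: take the argmax of each row of the probability matrix, and, when labels are available, compare. The tiny `prob` matrix below is made up for illustration:

```python
import numpy as np

# Dummy probability matrix for 4 "images" over 10 classes.
prob = np.zeros((4, 10))
prob[0, 3] = prob[1, 7] = prob[2, 0] = prob[3, 7] = 1.0
labels = np.array([3, 7, 1, 7])   # ground-truth classes

pred = prob.argmax(axis=1)        # most likely class per image
acc = (pred == labels).mean()     # fraction of correct predictions
print(pred, acc)                  # [3 7 0 7] 0.75
```

This is essentially what `mx.metric.Accuracy` computes in the scoring code below.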
If we have labels for the new images, we can evaluate the prediction accuracy.
```python
test_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
# predict accuracy of mlp
acc = mx.metric.Accuracy()
mlp_model.score(test_iter, acc)
print(acc)
assert acc.get()[1] > 0.96
# predict accuracy for lenet
acc.reset()
lenet_model.score(test_iter, acc)
print(acc)
assert acc.get()[1] > 0.98
```