<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->
<!--- http://www.apache.org/licenses/LICENSE-2.0 -->
<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->
Digit Recognition on MNIST
==========================
In this tutorial, we will work through examples of training a simple
multi-layer perceptron and then a convolutional neural network (the
LeNet architecture) on the [MNIST handwritten digit
dataset](http://yann.lecun.com/exdb/mnist/). The code for this tutorial
can be found in
[examples/mnist](https://github.com/dmlc/MXNet.jl/tree/master/examples/mnist). There are also two Jupyter notebooks that expand a little more on the [MLP](https://github.com/ultradian/julia_notebooks/blob/master/mxnet/mnistMLP.ipynb) and the [LeNet](https://github.com/ultradian/julia_notebooks/blob/master/mxnet/mnistLenet.ipynb), using the more general `ArrayDataProvider`.
Simple 3-layer MLP
------------------
This is a tiny 3-layer MLP that can easily be trained on a CPU. The
script starts with
```julia
using MXNet
```
to load the `MXNet` module. Then we are ready to define the network
architecture via the [symbolic API](../user-guide/overview.md). We start
with a placeholder `data` symbol,
```julia
data = mx.Variable(:data)
```
and then cascading fully-connected layers and activation functions:
```julia
fc1 = mx.FullyConnected(data, name=:fc1, num_hidden=128)
act1 = mx.Activation(fc1, name=:relu1, act_type=:relu)
fc2 = mx.FullyConnected(act1, name=:fc2, num_hidden=64)
act2 = mx.Activation(fc2, name=:relu2, act_type=:relu)
fc3 = mx.FullyConnected(act2, name=:fc3, num_hidden=10)
```
Note that in each composition we take the previous symbol as the first argument,
forming a feedforward chain. The architecture looks like
```
Input --> 128 units (ReLU) --> 64 units (ReLU) --> 10 units
```
where the last 10 units correspond to the 10 output classes (digits
0,...,9). We then add a final `SoftmaxOutput` operation to turn the
10-dimensional prediction into proper probability values for the 10
classes:
```julia
mlp = mx.SoftmaxOutput(fc3, name=:softmax)
```
As we can see, the MLP is just a chain of layers. For this case, we can
also use the `mx.chain` macro. The same architecture above can be
defined as
```julia
mlp = @mx.chain mx.Variable(:data) =>
        mx.FullyConnected(name=:fc1, num_hidden=128) =>
        mx.Activation(name=:relu1, act_type=:relu) =>
        mx.FullyConnected(name=:fc2, num_hidden=64) =>
        mx.Activation(name=:relu2, act_type=:relu) =>
        mx.FullyConnected(name=:fc3, num_hidden=10) =>
        mx.SoftmaxOutput(name=:softmax)
```
After defining the architecture, we are ready to load the MNIST data.
MXNet.jl provides built-in data providers for the MNIST dataset, which
will automatically download the dataset into
`Pkg.dir("MXNet")/data/mnist` if necessary. We wrap the code that
constructs the data providers in `mnist-data.jl` so that it can be
shared by both the MLP example and the LeNet ConvNet example.
```julia
batch_size = 100
include("mnist-data.jl")
train_provider, eval_provider = get_mnist_providers(batch_size)
```
If you need to write your own data provider for a customized data format,
please refer to [`mx.AbstractDataProvider`](@ref).
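For data that already lives in memory as plain Julia arrays, you can often avoid writing a provider at all and use the more general `mx.ArrayDataProvider` mentioned in the notebooks above. Below is a minimal sketch; the random arrays are just stand-ins for real data, and the exact keyword arguments should be checked against the API reference.
```julia
using MXNet

# stand-in data: 784-dimensional features (flattened 28x28 images) and labels 0..9
x = rand(Float32, 784, 1000)
y = rand(0:9, 1000)
# the label name must match the SoftmaxOutput symbol name (:softmax => :softmax_label)
custom_provider = mx.ArrayDataProvider(:data => x, :softmax_label => y;
                                       batch_size=100, shuffle=true)
```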
Given the architecture and data, we can instantiate a *model* to do the
actual training. `mx.FeedForward` is the built-in model that is suitable
for most feed-forward architectures. When constructing the model, we
also specify the *context* on which the computation should be carried
out. Because this is a really tiny MLP, we will just run on a single CPU
device.
```julia
model = mx.FeedForward(mlp, context=mx.cpu())
```
You can use `mx.gpu()` instead, or provide a list of devices (e.g.
`[mx.gpu(0), mx.gpu(1)]`), in which case data parallelization will be used
automatically. But for this tiny example, using a GPU device might not
help.
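For example, a data-parallel model over two GPUs could be constructed like this (a sketch that assumes two GPU devices are actually available):
```julia
# split each mini-batch across two GPUs
model = mx.FeedForward(mlp, context=[mx.gpu(0), mx.gpu(1)])
```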
The last thing we need to specify is the optimization algorithm (a.k.a.
the *optimizer*) to use. We use basic SGD with a fixed learning rate of 0.1,
momentum of 0.9 and weight decay of 0.00001:
```julia
optimizer = mx.SGD(η=0.1, μ=0.9, λ=0.00001)
```
Now we can do the training. Here the `n_epoch` parameter specifies that
we want to train for 20 epochs. We also supply an `eval_data` provider to
monitor the accuracy on the validation set.
```julia
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
```
Here is a sample output:
```
INFO: Start training on [CPU0]
INFO: Initializing parameters...
INFO: Creating KVStore...
INFO: == Epoch 001 ==========
INFO: ## Training summary
INFO: :accuracy = 0.7554
INFO: time = 1.3165 seconds
INFO: ## Validation summary
INFO: :accuracy = 0.9502
...
INFO: == Epoch 020 ==========
INFO: ## Training summary
INFO: :accuracy = 0.9949
INFO: time = 0.9287 seconds
INFO: ## Validation summary
INFO: :accuracy = 0.9775
```
Convolutional Neural Networks
-----------------------------
In the second example, we show a slightly more complicated architecture
that involves convolution and pooling. This architecture for MNIST
is usually called LeNet. The first part of the architecture is
listed below:
```julia
# input
data = mx.Variable(:data)
# first conv
conv1 = @mx.chain mx.Convolution(data, kernel=(5,5), num_filter=20) =>
                  mx.Activation(act_type=:tanh) =>
                  mx.Pooling(pool_type=:max, kernel=(2,2), stride=(2,2))
# second conv
conv2 = @mx.chain mx.Convolution(conv1, kernel=(5,5), num_filter=50) =>
                  mx.Activation(act_type=:tanh) =>
                  mx.Pooling(pool_type=:max, kernel=(2,2), stride=(2,2))
```
We have basically defined two convolution modules. Each convolution module
is a chain of `Convolution`, `tanh` activation and max `Pooling` operations.
Each sample in the MNIST dataset is a 28x28 single-channel grayscale
image. In the tensor format used by `NDArray`, a batch of 100 samples is
a tensor of shape `(28,28,1,100)`. Convolution and pooling operate
on the spatial axes, so `kernel=(5,5)` indicates a square region of
width 5 and height 5. The rest of the architecture follows:
```julia
# first fully-connected
fc1 = @mx.chain mx.Flatten(conv2) =>
                mx.FullyConnected(num_hidden=500) =>
                mx.Activation(act_type=:tanh)
# second fully-connected
fc2 = mx.FullyConnected(fc1, num_hidden=10)
# softmax loss
lenet = mx.SoftmaxOutput(fc2, name=:softmax)
```
Note that a fully-connected operator expects its input to be a matrix,
while the results of spatial convolution and pooling are 4D
tensors. So we explicitly use a `Flatten` operator to flatten the tensor
before connecting it to the `FullyConnected` operator.
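If you want to verify these shapes, you can ask MXNet to infer the output shape of an intermediate symbol for a given input shape. A minimal sketch using `mx.infer_shape` is below; the printed shapes are what we expect from the architecture above, not verified output.
```julia
# infer the shape flowing out of the second conv module for a batch of
# 100 images of size 28x28 with 1 channel (batch dimension last)
_, conv_shapes, _ = mx.infer_shape(conv2, data=(28, 28, 1, 100))
println(conv_shapes)   # expect something like [(4, 4, 50, 100)]
# after Flatten, the fully-connected layers therefore see an (800, 100) matrix
```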
The rest of the network is the same as the previous MLP example. As
before, we can now load the MNIST dataset:
```julia
batch_size = 100
include("mnist-data.jl")
train_provider, eval_provider = get_mnist_providers(batch_size; flat=false)
```
Note that we specified `flat=false` to tell the data provider to produce 4D
tensors instead of 2D matrices, because the convolution operators need
the correct spatial shape information. We then construct a feedforward model
on the GPU and train it.
```julia
# fit model
model = mx.FeedForward(lenet, context=mx.gpu())
# optimizer
optimizer = mx.SGD(η=0.05, μ=0.9, λ=0.00001)
# fit parameters
mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
```
Here is a sample of the output:
```
INFO: == Epoch 001 ==========
INFO: ## Training summary
INFO: :accuracy = 0.6750
INFO: time = 4.9814 seconds
INFO: ## Validation summary
INFO: :accuracy = 0.9712
...
INFO: == Epoch 020 ==========
INFO: ## Training summary
INFO: :accuracy = 1.0000
INFO: time = 4.0086 seconds
INFO: ## Validation summary
INFO: :accuracy = 0.9915
```
Predicting with a trained model
-------------------------------
Predicting with a trained model is very simple. By calling `mx.predict`
with the model and a data provider, we get the model output as a Julia
Array:
```julia
probs = mx.predict(model, eval_provider)
```
The following code shows a naive way of collecting all the labels from the
data provider and computing the prediction accuracy manually:
```julia
using Printf  # @printf lives in the Printf standard library on Julia >= 0.7

# collect all labels from the eval data
labels = reduce(
  vcat,
  copy(mx.get(eval_provider, batch, :softmax_label)) for batch ∈ eval_provider)
# labels are 0...9; shift to 1...10 to match Julia's 1-based indexing
labels .= labels .+ 1
# now compute the accuracy
pred = map(i -> argmax(probs[1:10, i]), 1:size(probs, 2))
correct = sum(pred .== labels)
@printf "Accuracy on eval set: %.2f%%\n" 100correct/length(labels)
```
Alternatively, when the dataset is huge, one can provide a callback to
`mx.predict`; the callback function will then be invoked with the
outputs of each mini-batch. The callback could, for example, write the
data to disk for future inspection. In this case, no value is returned
from `mx.predict`. See also [`mx.predict`](@ref).
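Below is a minimal sketch of this callback form, written as a do-block. It assumes `mx.predict` calls the supplied function once per mini-batch with that mini-batch's output; check the API reference for the exact argument type.
```julia
mx.predict(model, eval_provider) do batch_output
  # batch_output is the model output for one mini-batch; post-process or
  # persist it here instead of accumulating everything in memory
  println(summary(batch_output))
end
```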