# Handwritten Digits Classification Competition

[MNIST](http://yann.lecun.com/exdb/mnist/) is a data set of handwritten digit images created by Yann LeCun. Every digit is represented by a 28x28 pixel image. It has become a standard data set for testing classifiers on simple image input, and neural networks are strong models for such image classification tasks. There is a [long-running competition](https://www.kaggle.com/c/digit-recognizer) on Kaggle using this data set.

We will present the basic usage of [mxnet](https://github.com/dmlc/mxnet/tree/master/R-package) to compete in this challenge.

## Data Loading

First, download the data from [here](https://www.kaggle.com/c/digit-recognizer/data) and place `train.csv` and `test.csv` in your working directory.

Then we can read them into R and convert them to matrices:

```{r, echo=FALSE}
download.file('https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/data/mnist_csv.zip', destfile = 'mnist_csv.zip')
unzip('mnist_csv.zip', exdir = '.')
```

```{r}
require(mxnet)

# The first column of train.csv holds the label; the remaining 784
# columns hold the pixel values. test.csv has no label column.
train <- read.csv("train.csv", header = TRUE)
test <- read.csv("test.csv", header = TRUE)
train <- data.matrix(train)
test <- data.matrix(test)

train.x <- train[, -1]
train.y <- train[, 1]
```
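Before going further, it never hurts to eyeball the data. Below is a minimal sketch (plain base R, nothing from `mxnet`) that renders the first training image; the transpose and column reversal are only there to orient the digit upright.

```{r, eval=FALSE}
# Render the first training example as a 28x28 greyscale image.
img <- matrix(train.x[1, ], nrow = 28, ncol = 28, byrow = TRUE)
image(t(img)[, 28:1], col = grey.colors(256), axes = FALSE,
      main = paste("label:", train.y[1]))
```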

Besides using the CSV files from Kaggle, you can also read the original MNIST data set into R:

```{r, eval=FALSE}
load_image_file <- function(filename) {
  f = file(filename, 'rb')
  # Skip the magic number, then read the image count and dimensions
  # from the IDX header.
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  n = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  nrow = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  ncol = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  x = readBin(f, 'integer', n = n * nrow * ncol, size = 1, signed = FALSE)
  x = matrix(x, ncol = nrow * ncol, byrow = TRUE)
  close(f)
  x
}

load_label_file <- function(filename) {
  f = file(filename, 'rb')
  # Skip the magic number, then read the label count.
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  n = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  y = readBin(f, 'integer', n = n, size = 1, signed = FALSE)
  close(f)
  y
}

train.x <- load_image_file('mnist/train-images-idx3-ubyte')
test.x <- load_image_file('mnist/t10k-images-idx3-ubyte')
train.y <- load_label_file('mnist/train-labels-idx1-ubyte')
test.y <- load_label_file('mnist/t10k-labels-idx1-ubyte')
```

Here every image is represented as a single row in `train.x` and `test`. The greyscale value of each pixel falls in the range [0, 255]; we can linearly rescale it into [0, 1] by

```{r}
train.x <- t(train.x / 255)
test <- t(test / 255)
```

We also transpose the matrices so the input becomes npixel x nexamples, the column-major layout that mxnet accepts (and the convention of R).
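A quick dimension check confirms the layout; each column now holds one 784-pixel example. The expected counts below assume the standard Kaggle split of 42,000 training and 28,000 test examples:

```{r, eval=FALSE}
dim(train.x)  # expect 784 x 42000: one column per training example
dim(test)     # expect 784 x 28000
```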

Looking at the labels, we see that the counts of the ten digits are fairly even:

```{r}
table(train.y)
```

## Network Configuration

Now we have the data. The next step is to configure the structure of our network:

```{r}
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=64)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")
```

1. In `mxnet`, we use its own data type `symbol` to configure the network. `data <- mx.symbol.Variable("data")` uses `data` to represent the input data, i.e. the input layer.

2. Then we define the first hidden layer with `fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)`. This layer takes `data` as its input and is given a name and a number of hidden neurons.

3. The activation is set by `act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")`. The activation function takes the output of the first hidden layer `fc1`.

4. The second hidden layer takes the result from `act1` as its input, with its name set to "fc2" and 64 hidden neurons.

5. The second activation is almost the same as `act1`, except for a different input source and name.

6. Here comes the output layer. Since there are only 10 digits, we set the number of neurons to 10.

7. Finally, we set the activation to softmax to get a probabilistic prediction (a quick way to verify the resulting shapes follows this list).
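If you want to double-check the configuration, `mxnet` can infer the shape of every output from an example input shape. A minimal sketch; the `c(784, 100)` below is an assumption of 784 pixels per example and a batch size of 100:

```{r, eval=FALSE}
# Infer output shapes from an example input shape
# (784 pixels per example, batch size 100).
mx.symbol.infer.shape(softmax, data = c(784, 100))$out.shapes
```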

If you are a big fan of the `%>%` operator, you can also define the network as below:

```{r, eval=FALSE}
library(magrittr)
softmax <- mx.symbol.Variable("data") %>%
  mx.symbol.FullyConnected(name = "fc1", num_hidden = 128) %>%
  mx.symbol.Activation(name = "relu1", act_type = "relu") %>%
  mx.symbol.FullyConnected(name = "fc2", num_hidden = 64) %>%
  mx.symbol.Activation(name = "relu2", act_type = "relu") %>%
  mx.symbol.FullyConnected(name = "fc3", num_hidden = 10) %>%
  mx.symbol.SoftmaxOutput(name = "sm")
```

## Training

We are almost ready for the training process. Before we start the computation, let's decide which device to use:

```{r}
devices <- mx.cpu()
```

Here we assign the CPU to `mxnet`. After all this preparation, you can run the following command to train the neural network! Note that `mx.set.seed`, not R's `set.seed`, is the function that controls the random process in `mxnet`.

```{r}
mx.set.seed(0)
model <- mx.model.FeedForward.create(softmax, X = train.x, y = train.y,
                                     ctx = devices, num.round = 5,
                                     array.batch.size = 100,
                                     learning.rate = 0.07, momentum = 0.9,
                                     eval.metric = mx.metric.accuracy,
                                     initializer = mx.init.uniform(0.07),
                                     batch.end.callback = mx.callback.log.train.metric(100))
```
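The accuracy reported above is measured on the training batches. If you want to track held-out accuracy during training, `mx.model.FeedForward.create` also accepts an evaluation set through its `eval.data` argument. A sketch, where splitting off the last 2,000 examples is purely illustrative:

```{r, eval=FALSE}
# Hold out the last 2000 examples as a validation set (illustrative split).
val <- (ncol(train.x) - 1999):ncol(train.x)
model <- mx.model.FeedForward.create(softmax,
                                     X = train.x[, -val], y = train.y[-val],
                                     eval.data = list(data = train.x[, val],
                                                      label = train.y[val]),
                                     ctx = devices, num.round = 5,
                                     array.batch.size = 100,
                                     learning.rate = 0.07, momentum = 0.9,
                                     eval.metric = mx.metric.accuracy,
                                     initializer = mx.init.uniform(0.07))
```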

## Prediction and Submission

To make predictions, we can simply write

```{r}
preds <- predict(model, test)
dim(preds)
```

It is a matrix with 10 rows and 28000 columns, where each column holds the class probabilities from the output layer for one test example. To extract the most likely label for each example, we can use `max.col` in R, which works row-wise, hence the transpose:

```{r}
pred.label <- max.col(t(preds)) - 1
table(pred.label)
```
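Kaggle withholds the test labels, so before submitting it can be worth scoring the model on the training set as a coarse sanity check. This is an optimistic estimate of accuracy, but it catches gross mistakes:

```{r, eval=FALSE}
# Optimistic sanity check: accuracy on the data the model was trained on.
train.preds <- predict(model, train.x)
mean((max.col(t(train.preds)) - 1) == train.y)
```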

With a little extra effort to match the required CSV format, we can have our submission to the competition!

```{r, eval = FALSE}
submission <- data.frame(ImageId = 1:ncol(test), Label = pred.label)
write.csv(submission, file = 'submission.csv', row.names = FALSE, quote = FALSE)
```

## LeNet

Next we are going to introduce a new network structure: [LeNet](http://yann.lecun.com/exdb/lenet/). It was proposed by Yann LeCun for recognizing handwritten digits. Now we will demonstrate how to construct and train a LeNet in `mxnet`.

First we construct the network:

```{r}
require(mxnet)
# input
data <- mx.symbol.Variable('data')
# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20)
tanh1 <- mx.symbol.Activation(data=conv1, act_type="tanh")
pool1 <- mx.symbol.Pooling(data=tanh1, pool_type="max",
                           kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50)
tanh2 <- mx.symbol.Activation(data=conv2, act_type="tanh")
pool2 <- mx.symbol.Pooling(data=tanh2, pool_type="max",
                           kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 <- mx.symbol.Activation(data=fc1, act_type="tanh")
# second fullc
fc2 <- mx.symbol.FullyConnected(data=tanh3, num_hidden=10)
# loss
lenet <- mx.symbol.SoftmaxOutput(data=fc2)
```
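It is worth tracing the shapes here: a 28x28 input shrinks to 24x24 after the first 5x5 convolution, to 12x12 after 2x2 pooling, to 8x8 after the second convolution, and to 4x4 after the second pooling, so `flatten` feeds 50 * 4 * 4 = 800 values into `fc1`. You can verify this with shape inference; the batch size of 100 below is just an example:

```{r, eval=FALSE}
# In the inferred argument shapes, the first fully-connected weight
# should have 50 * 4 * 4 = 800 on its input axis.
mx.symbol.infer.shape(lenet, data = c(28, 28, 1, 100))$arg.shapes
```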

Then let us reshape the matrices into arrays:

```{r}
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
test.array <- test
dim(test.array) <- c(28, 28, 1, ncol(test))
```
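Again, a quick check that the reshape produced what the convolution layers expect, width x height x channel x batch in mxnet's R convention (the counts assume the standard Kaggle split):

```{r, eval=FALSE}
dim(train.array)  # expect 28 28 1 42000
dim(test.array)   # expect 28 28 1 28000
```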

Next we are going to compare the training speed on different devices, so we define the devices first:

```{r}
n.gpu <- 1
device.cpu <- mx.cpu()
device.gpu <- lapply(0:(n.gpu - 1), function(i) {
  mx.gpu(i)
})
```

As you can see, we can pass a list of devices to ask mxnet to train on multiple GPUs. (You can do something similar with CPUs, but since CPU computation is already multi-threaded internally, there is less to gain than with GPUs.)

We start by training on the CPU. Because this takes a bit of time, we only run it for one iteration.

```{r}
mx.set.seed(0)
tic <- proc.time()
model <- mx.model.FeedForward.create(lenet, X = train.array, y = train.y,
                                     ctx = device.cpu, num.round = 1,
                                     array.batch.size = 100,
                                     learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     batch.end.callback = mx.callback.log.train.metric(100))
print(proc.time() - tic)
```

Training on GPU:

```{r}
mx.set.seed(0)
tic <- proc.time()
model <- mx.model.FeedForward.create(lenet, X = train.array, y = train.y,
                                     ctx = device.gpu, num.round = 5,
                                     array.batch.size = 100,
                                     learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     batch.end.callback = mx.callback.log.train.metric(100))
print(proc.time() - tic)
```

As you can see, training on the GPU is significantly faster!

Finally, we can submit the result to Kaggle again to see the improvement in our ranking!

```{r, eval = FALSE}
preds <- predict(model, test.array)
pred.label <- max.col(t(preds)) - 1
submission <- data.frame(ImageId = 1:ncol(test), Label = pred.label)
write.csv(submission, file = 'submission.csv', row.names = FALSE, quote = FALSE)
```

![](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/knitr/mnistCompetition-kaggle-submission.png)
