| Handwritten Digits Classification Competition |
| ============================================= |
| |
[MNIST](http://yann.lecun.com/exdb/mnist/) is a data set of handwritten digit images created by Yann LeCun. Every digit is represented by a 28x28 greyscale image. It has become a standard data set for testing classifiers on simple image input. Neural networks are without doubt strong models for image classification tasks. There is a [long-term hosted competition](https://www.kaggle.com/c/digit-recognizer) on Kaggle using this data set.
| We will present the basic usage of [mxnet](https://github.com/dmlc/mxnet/tree/master/R-package) to compete in this challenge. |
| |
This tutorial is written in Rmarkdown. You can download the source [here](https://github.com/dmlc/mxnet/blob/master/R-package/vignettes/mnistCompetition.Rmd) and view a
hosted version of the tutorial [here](http://mxnet.readthedocs.io/en/latest/packages/r/mnistCompetition.html).
| |
| ## Data Loading |
| |
First, let us download the data from [here](https://www.kaggle.com/c/digit-recognizer/data) and put the files under the `data/` folder in your working directory.
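If you want to confirm the files are in place before reading them, a quick check with base R (the `data/` layout follows the instruction above):

```{r, eval=FALSE}
# Both files should exist before we try to read them.
file.exists(c("data/train.csv", "data/test.csv"))
```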
| |
Then we can read them into R and convert them to matrices.
| |
| ```{r, eval=FALSE} |
| require(mxnet) |
train <- read.csv("data/train.csv", header=TRUE)
test <- read.csv("data/test.csv", header=TRUE)
| train <- data.matrix(train) |
| test <- data.matrix(test) |
| |
| train.x <- train[,-1] |
| train.y <- train[,1] |
| ``` |
| |
Besides using the CSV files from Kaggle, you can also read the original MNIST data set into R.
| |
| ```{r, eval=FALSE} |
load_image_file <- function(filename) {
  f = file(filename, 'rb')
  # Skip the magic number, then read the number of images and their dimensions.
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  n = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  nrow = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  ncol = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  # Read all pixels as unsigned bytes, one image per row.
  x = readBin(f, 'integer', n = n * nrow * ncol, size = 1, signed = FALSE)
  x = matrix(x, ncol = nrow * ncol, byrow = TRUE)
  close(f)
  x
}

load_label_file <- function(filename) {
  f = file(filename, 'rb')
  # Skip the magic number, then read the number of labels.
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  n = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  y = readBin(f, 'integer', n = n, size = 1, signed = FALSE)
  close(f)
  y
}

train.x <- load_image_file('mnist/train-images-idx3-ubyte')
test.x <- load_image_file('mnist/t10k-images-idx3-ubyte')

train.y <- load_label_file('mnist/train-labels-idx1-ubyte')
test.y <- load_label_file('mnist/t10k-labels-idx1-ubyte')
| ``` |
| |
Here every image is represented as a single row in the train/test matrices. The greyscale value of each pixel falls in the range [0, 255]; we can linearly rescale it into [0, 1] by
| |
| ```{r, eval=FALSE} |
| train.x <- t(train.x/255) |
| test <- t(test/255) |
| ``` |
We also transpose the input matrix to npixel x nexamples, the column-major format accepted by mxnet (and the convention of R).
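A quick sanity check, assuming the standard Kaggle split of 42,000 training and 28,000 test images: after the transpose, rows are pixels and columns are examples.

```{r, eval=FALSE}
# Should print 784 42000 for the training matrix and 784 28000 for the test matrix.
dim(train.x)
dim(test)
```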
| |
Looking at the labels, we see that the counts of the digits are fairly even:
| |
| ```{r, eval=FALSE} |
| table(train.y) |
| ``` |
| |
| ## Network Configuration |
| |
| Now we have the data. The next step is to configure the structure of our network. |
| |
| ```{r, eval=FALSE} |
| data <- mx.symbol.Variable("data") |
| fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128) |
| act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu") |
| fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=64) |
| act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu") |
| fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10) |
| softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm") |
| ``` |
| |
1. In `mxnet`, we use its own data type `symbol` to configure the network. `data <- mx.symbol.Variable("data")` uses `data` to represent the input data, i.e. the input layer.
2. Then we set the first hidden layer with `fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)`. This layer takes `data` as its input, and we specify its name and the number of hidden neurons.
3. The activation is set by `act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")`. The activation function takes the output of the first hidden layer `fc1` as its input.
4. The second hidden layer takes the result from `act1` as its input, with the name "fc2" and 64 hidden neurons.
5. The second activation is almost the same as `act1`, except for a different input source and name.
6. Here comes the output layer. Since there are only 10 digits, we set the number of neurons to 10.
7. Finally, we set the activation to softmax to get a probabilistic prediction.
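To double-check the structure we just configured, we can list every input of the final symbol; this should show `data` plus a weight and bias for each fully connected layer, along with the softmax label (here we use the `arguments` helper from the mxnet R package):

```{r, eval=FALSE}
# Lists the network inputs: "data" plus the fc1/fc2/fc3 weights and biases,
# and the label consumed by the softmax output.
arguments(softmax)
```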
| |
| If you are a big fan of the `%>%` operator, you can also define the network as below: |
| |
| ```{r, eval=FALSE} |
| library(magrittr) |
| softmax <- mx.symbol.Variable("data") %>% |
| mx.symbol.FullyConnected(name = "fc1", num_hidden = 128) %>% |
| mx.symbol.Activation(name = "relu1", act_type = "relu") %>% |
| mx.symbol.FullyConnected(name = "fc2", num_hidden = 64) %>% |
| mx.symbol.Activation(name = "relu2", act_type = "relu") %>% |
| mx.symbol.FullyConnected(name="fc3", num_hidden=10) %>% |
| mx.symbol.SoftmaxOutput(name="sm") |
| ``` |
| |
| ## Training |
| |
We are almost ready for the training process. Before we start the computation, let's decide which device to use.
| |
| ```{r, eval=FALSE} |
| devices <- mx.cpu() |
| ``` |
| |
Here we assign the CPU to `mxnet`. After all this preparation, you can run the following command to train the neural network! Note that `mx.set.seed`, not R's `set.seed`, is the function that controls the random process in `mxnet`.
| |
| ```{r, eval=FALSE} |
| mx.set.seed(0) |
| model <- mx.model.FeedForward.create(softmax, X=train.x, y=train.y, |
| ctx=devices, num.round=10, array.batch.size=100, |
| learning.rate=0.07, momentum=0.9, eval.metric=mx.metric.accuracy, |
| initializer=mx.init.uniform(0.07), |
| batch.end.callback=mx.callback.log.train.metric(100)) |
| ``` |
| |
| ## Prediction and Submission |
| |
To make predictions, we can simply write
| |
| ```{r, eval=FALSE} |
| preds <- predict(model, test) |
| dim(preds) |
| ``` |
| |
It is a matrix with 10 rows and 28000 columns: each column contains the classification probabilities from the output layer for one test example. To extract the predicted label for each example, we can use `max.col` in R on the transposed matrix:
| |
| ```{r, eval=FALSE} |
# max.col returns, for each row of t(preds), the 1-based index of the largest
# probability; subtracting 1 maps indices 1..10 to the digits 0..9.
pred.label <- max.col(t(preds)) - 1
table(pred.label)
| ``` |
| |
With a little extra effort to write it in the required CSV format, we have our submission to the competition!
| |
| ```{r, eval=FALSE} |
| submission <- data.frame(ImageId=1:ncol(test), Label=pred.label) |
| write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE) |
| ``` |
| |
| ## LeNet |
| |
Next we are going to introduce a new network structure: [LeNet](http://yann.lecun.com/exdb/lenet/). It was proposed by Yann LeCun for recognizing handwritten digits. Now we will demonstrate how to construct and train a LeNet in `mxnet`.
| |
| |
| First we construct the network: |
| |
| ```{r} |
| require(mxnet) |
| # input |
| data <- mx.symbol.Variable('data') |
| # first conv |
| conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20) |
| tanh1 <- mx.symbol.Activation(data=conv1, act_type="tanh") |
| pool1 <- mx.symbol.Pooling(data=tanh1, pool_type="max", |
| kernel=c(2,2), stride=c(2,2)) |
| # second conv |
| conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50) |
| tanh2 <- mx.symbol.Activation(data=conv2, act_type="tanh") |
| pool2 <- mx.symbol.Pooling(data=tanh2, pool_type="max", |
| kernel=c(2,2), stride=c(2,2)) |
| # first fullc |
| flatten <- mx.symbol.Flatten(data=pool2) |
| fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500) |
| tanh3 <- mx.symbol.Activation(data=fc1, act_type="tanh") |
| # second fullc |
| fc2 <- mx.symbol.FullyConnected(data=tanh3, num_hidden=10) |
| # loss |
| lenet <- mx.symbol.SoftmaxOutput(data=fc2) |
| ``` |
| |
| Then let us reshape the matrices into arrays: |
| |
| ```{r, eval=FALSE} |
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
test.array <- test
dim(test.array) <- c(28, 28, 1, ncol(test))
| ``` |
| |
Next we are going to compare the training speed on different devices, so we define the devices first:
| |
| ```{r, eval=FALSE} |
| n.gpu <- 1 |
| device.cpu <- mx.cpu() |
| device.gpu <- lapply(0:(n.gpu-1), function(i) { |
| mx.gpu(i) |
| }) |
| ``` |
| |
As you can see, we can pass a list of devices to ask mxnet to train on multiple GPUs (you can do a similar thing for CPUs, but since the internal computation on CPUs is already multi-threaded, there is less gain than with GPUs).
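For example, with two GPUs available (hypothetical here), both contexts go into one list and mxnet splits each batch across them:

```{r, eval=FALSE}
# Hypothetical two-GPU setup; each batch is partitioned across the devices.
device.gpus <- list(mx.gpu(0), mx.gpu(1))
```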
| |
We start by training on the CPU. Because this takes a bit of time, we run it for only one iteration.
| |
| ```{r, eval=FALSE} |
| mx.set.seed(0) |
| tic <- proc.time() |
| model <- mx.model.FeedForward.create(lenet, X=train.array, y=train.y, |
| ctx=device.cpu, num.round=1, array.batch.size=100, |
| learning.rate=0.05, momentum=0.9, wd=0.00001, |
| eval.metric=mx.metric.accuracy, |
| batch.end.callback=mx.callback.log.train.metric(100)) |
| print(proc.time() - tic) |
| ``` |
| |
| Training on GPU: |
| |
| ```{r, eval=FALSE} |
| mx.set.seed(0) |
| tic <- proc.time() |
| model <- mx.model.FeedForward.create(lenet, X=train.array, y=train.y, |
| ctx=device.gpu, num.round=5, array.batch.size=100, |
| learning.rate=0.05, momentum=0.9, wd=0.00001, |
| eval.metric=mx.metric.accuracy, |
| batch.end.callback=mx.callback.log.train.metric(100)) |
| print(proc.time() - tic) |
| ``` |
| |
As you can see, using the GPU gives a significant speedup in training!
Finally, we can submit the result to Kaggle again to see the improvement in our ranking!
| |
| ```{r, eval=FALSE} |
| preds <- predict(model, test.array) |
| pred.label <- max.col(t(preds)) - 1 |
| submission <- data.frame(ImageId=1:ncol(test), Label=pred.label) |
| write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE) |
| ``` |
| |
| ![](../web-data/mxnet/knitr/mnistCompetition-kaggle-submission.png) |