| # Classify Real-world Images with Pre-trained Model |
| |
| |
MXNet is a flexible and efficient deep learning framework. One of the cool things a deep learning
algorithm can do is classify real-world images.
| |
In this example we will show how to use a pre-trained Inception-BatchNorm network to predict the class of a
real-world image. The network architecture is described in [1].
| |
| The pre-trained Inception-BatchNorm network can be downloaded from [this link](http://data.mxnet.io/mxnet/data/Inception.zip). |
This model gives state-of-the-art prediction accuracy on the ImageNet dataset.
| |
| ## Package Loading |
| |
| To get started, we load the `mxnet` package first. |
| |
| ```{r} |
| require(mxnet) |
| ``` |
| |
In this example, we also need the `imager` package to load and preprocess the images in R.
| |
| ```{r} |
| require(imager) |
| ``` |
| |
| ## Load the Pretrained Model |
| |
| |
The code below downloads the pre-trained model and unzips it into the current folder. We can then use the model
loading function to load the model into R.
| |
| ```{r} |
| download.file('http://data.dmlc.ml/data/Inception.zip', destfile = 'Inception.zip') |
| unzip("Inception.zip") |
| model <- mx.model.load("Inception/Inception_BN", iteration = 39) |
| ``` |
| |
We also need to load the mean image, which is used for preprocessing, using `mx.nd.load`.
| |
| ```{r} |
| mean.img <- as.array(mx.nd.load("Inception/mean_224.nd")[["mean_img"]]) |
| ``` |
| |
| ## Load and Preprocess the Image |
| |
Now we are ready to classify a real image. In this example, we simply use the parrots image
from the `imager` package, but you can always change it to other images.
| |
| Load and plot the image: |
| |
| ```{r, fig.align='center'} |
| im <- load.image(system.file("extdata/parrots.png", package = "imager")) |
| plot(im) |
| ``` |
| |
Before feeding the image to the deep network, we need some preprocessing
to make it fit the network's input requirements. The preprocessing
includes cropping, resizing, and subtraction of the mean image.
Because mxnet is deeply integrated with R, we can do all the processing in an R function.
| |
| The preprocessing function: |
| |
| ```{r} |
preproc.image <- function(im, mean.image) {
  # crop the image to a square along its short edge
  shape <- dim(im)
  short.edge <- min(shape[1:2])
  xx <- floor((shape[1] - short.edge) / 2)
  yy <- floor((shape[2] - short.edge) / 2)
  cropped <- crop.borders(im, xx, yy)
  # resize to 224 x 224, the input size expected by the model
  resized <- resize(cropped, 224, 224)
  # convert to array (x, y, channel) and scale pixel values to [0, 255]
  arr <- as.array(resized) * 255
  dim(arr) <- c(224, 224, 3)
  # subtract the mean image passed in as an argument
  normed <- arr - mean.image
  # reshape to the format needed by mxnet (width, height, channel, num)
  dim(normed) <- c(224, 224, 3, 1)
  return(normed)
}
| ``` |
| |
We can now use the preprocessing function to get the normalized image.
| |
| ```{r} |
| normed <- preproc.image(im, mean.img) |
| ``` |
| |
| ## Classify the Image |
| |
Now we are ready to classify the image! We can use the `predict` function
to get the probability over all classes.
| |
| ```{r} |
| prob <- predict(model, X = normed) |
| dim(prob) |
| ``` |
| |
As you can see, ```prob``` is a 1000-by-1 array, which gives the predicted probability
for each of the 1000 image classes.
| |
We can use ```max.col``` on the transpose of `prob` to get the class index.
| |
| ```{r} |
| max.idx <- max.col(t(prob)) |
| max.idx |
| ``` |
| |
The index by itself does not mean much, so let us see what it really corresponds to.
We can read the names of the classes from the following file.
| |
| ```{r} |
| synsets <- readLines("Inception/synset.txt") |
| ``` |
| |
Now let us see what the predicted class really is:
| |
| ```{r} |
| print(paste0("Predicted Top-class: ", synsets[[max.idx]])) |
| ``` |
| |
Admittedly, I did not know what the word meant when I first saw it,
so I searched the web to check it out... and indeed it gets the right answer :)
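If we want more than the single best guess, we can also rank the classes by probability. This is a small sketch beyond the original tutorial, assuming `prob` is the 1000-by-1 probability array and `synsets` holds the class names from above:

```{r}
# indices of the five most probable classes, in decreasing order
top5.idx <- order(prob[, 1], decreasing = TRUE)[1:5]
data.frame(class = synsets[top5.idx], probability = prob[top5.idx, 1])
```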
| |
| ## Extract features |
| |
| |
Besides the final classification result, we can also extract internal features from the network.
First we need to get the feature-layer symbol out of the network internals. Here we use `global_pool_output`
as an example.
| |
| ```{r} |
internals <- model$symbol$get.internals()
fea_symbol <- internals[[match("global_pool_output", internals$outputs)]]
| ``` |
| |
Next, we build a new model using the feature symbol:
| |
| ```{r} |
| model2 <- list(symbol = fea_symbol, |
| arg.params = model$arg.params, |
| aux.params = model$aux.params) |
| |
| class(model2) <- "MXFeedForwardModel" |
| ``` |
| |
Then we can call `predict` on the new model to get the internal outputs.
We need to set `allow.extra.params = TRUE` because some parameters of the full model are not used this time.
| |
| ```{r} |
| global_pooling_feature <- predict(model2, X = normed, allow.extra.params = TRUE) |
| dim(global_pooling_feature) |
| ``` |
| |
| |
| ## Reference |
| |
| |
| [1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015). |
| |
| |
| <!-- INSERT SOURCE DOWNLOAD BUTTONS --> |