# Image Classification
In this tutorial, we show how to assign labels to an image, each with a confidence score. The following figure ([source](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)) shows an example:
<img src=https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/image-classification.png
width=600/>
Get the source code for the tutorial from [GitHub](https://github.com/dmlc/mxnet/tree/master/example/image-classification).
## Training
To train a model on a particular dataset, use the corresponding `train_<dataset>.py` script. For example:
- To train an MLP on MNIST, use this command:
```bash
python train_mnist.py
```
- To save a model checkpoint after each epoch, use this command:
```bash
mkdir model; python train_mnist.py --model-prefix model/mnist
```
- To restart training from the checkpoint saved at epoch 8, use this command (see the sketch after this list):
```bash
python train_mnist.py --model-prefix model/mnist --load-epoch 8
```
- To choose a different initial learning rate and decay it by a factor of 0.9 every half epoch, use this command:
```bash
python train_mnist.py --lr .1 --lr-factor .9 --lr-factor-epoch .5
```
- To train a convolutional neural network (LeNet) on MNIST using GPU 0, use this command:
```bash
python train_mnist.py --network lenet --gpus 0
```
- To use multiple GPUs, specify the list; for example: `--gpus 0,1,3`.
- To see more options, use `--help`.
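The checkpoint flags above map onto callbacks in the legacy `mx.model.FeedForward` API. The following is a minimal sketch, not the actual contents of `train_mnist.py`: the small MLP symbol is only illustrative, and `train_iter`/`val_iter` stand for data iterators like the ones created later in this tutorial.
```python
import mxnet as mx

# An illustrative MLP symbol; train_mnist.py builds its networks similarly.
data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data=data, num_hidden=128)
act1 = mx.symbol.Activation(data=fc1, act_type='relu')
fc2  = mx.symbol.FullyConnected(data=act1, num_hidden=10)
mlp  = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')

prefix = 'model/mnist'
model = mx.model.FeedForward(symbol=mlp, ctx=mx.gpu(0),
                             num_epoch=10, learning_rate=0.1)
# --model-prefix: save a checkpoint at the end of every epoch
model.fit(X=train_iter, eval_data=val_iter,
          epoch_end_callback=mx.callback.do_checkpoint(prefix))
# --load-epoch 8: resume from the checkpoint saved at epoch 8
model = mx.model.FeedForward.load(prefix, 8, ctx=mx.gpu(0))
```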
## Distributed Training
To speed up training, you can train a model using multiple computers.
* Quickly test distributed training on your local computer by using two workers:
```bash
../../tools/launch.py -n 2 python train_mnist.py --kv-store dist_sync
```
You can choose between synchronous SGD (`dist_sync`) and asynchronous SGD (`dist_async`).
* If you have several computers that you can connect to over SSH, and this mxnet folder is accessible on all of them (for example, mounted over NFS; see a tutorial for [Ubuntu](https://help.ubuntu.com/lts/serverguide/network-file-system.html)), you can run a job across these computers. First, save their hostnames in a file, for example:
```bash
$ cat hosts
172.30.0.172
172.30.0.171
```
* Then pass this file using `-H`:
```bash
../../tools/launch.py -n 2 -H hosts python train_mnist.py --kv-store dist_sync
```
* If the mxnet folder isn't available on the other computers, copy the mxnet
library to this example folder:
```bash
cp -r ../../python/mxnet .
cp -r ../../lib/libmxnet.so mxnet
```
Then synchronize this folder to `/tmp/mxnet` on the other computers before running:
```bash
../../tools/launch.py -n 2 -H hosts --sync-dir /tmp/mxnet python train_mnist.py --kv-store dist_sync
```
For more launch options, such as using `YARN`, and for information about how to write a distributed training program, see this [tutorial](http://mxnet.io/how_to/multi_devices.html).
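Inside a training script, the `--kv-store` flag simply selects the key-value store through which gradients are aggregated. A minimal sketch, assuming `model`, `train_iter`, and `val_iter` are created as in the sketch above (the distributed kvstore types only work when the script is started through the launcher):
```python
import mxnet as mx

# 'dist_sync' for synchronous SGD, 'dist_async' for asynchronous SGD
kv = mx.kvstore.create('dist_sync')
model.fit(X=train_iter, eval_data=val_iter, kvstore=kv)
```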
## Generating Predictions
You have several options for generating predictions (see the sketch after this list):
- Use a [pre-trained model](http://mxnet.io/tutorials/python/predict_imagenet.html). More pre-trained models are provided in the [model gallery](https://github.com/dmlc/mxnet-model-gallery).
- Use your own datasets.
- You can also run predictions on various devices, such as [Android/iOS](http://dmlc.ml/mxnet/2015/11/10/deep-learning-in-a-single-file-for-smart-device.html).
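For example, with the legacy `mx.model.FeedForward` API, loading a checkpoint saved during training and generating predictions looks roughly like this; the prefix, epoch, and `val_iter` are placeholders for whatever you trained and saved:
```python
import mxnet as mx

# load the checkpoint saved at epoch 8 under the prefix model/mnist
model = mx.model.FeedForward.load('model/mnist', 8, ctx=mx.cpu())
# predict() returns one row of per-class probabilities per example
prob = model.predict(val_iter)
print(prob.argmax(axis=1)[:10])  # predicted labels of the first 10 examples
```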
### Using Your Own Datasets
There are two ways to feed data into MXNet:
- Pack all examples into one or more compact `recordio` files. For more information, see this [step-by-step tutorial](http://mxnet.io/api/python/io.html#create-a-dataset-using-recordio) and [documentation](http://mxnet.io/architecture/note_data_loading.html). Avoid the common mistake of neglecting to shuffle the image list during packing; otherwise training can fail, with `accuracy` stuck at 0.001 for many rounds. (A `recordio` iterator sketch follows this list.)
**Note:** We automatically download the small datasets, such as `mnist` and `cifar10`.
- For small datasets that can be easily loaded into memory, here is an example:
```python
import mxnet as mx
from sklearn.datasets import fetch_mldata
from sklearn.utils import shuffle

mnist = fetch_mldata('MNIST original', data_home="./mnist")
# shuffle data
X, y = shuffle(mnist.data, mnist.target)
# split the dataset into 50,000 training and 10,000 validation examples
train_data = X[:50000, :].astype('float32')
train_label = y[:50000]
val_data = X[50000:60000, :].astype('float32')
val_label = y[50000:60000]
# normalize pixel values to [0, 1)
train_data[:] /= 256.0
val_data[:] /= 256.0
# create NDArray iterators over the in-memory arrays
batch_size = 100
train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size=batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(val_data, val_label, batch_size=batch_size)
# create the model as usual: model = mx.model.FeedForward(...)
model.fit(X=train_iter, eval_data=val_iter)
```
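For the `recordio` path mentioned above, `mx.io.ImageRecordIter` reads packed record files directly. A sketch, assuming `train.rec` is a file produced by the packing tool:
```python
import mxnet as mx

train_iter = mx.io.ImageRecordIter(
    path_imgrec='train.rec',    # packed recordio file (assumed name)
    data_shape=(3, 28, 28),     # channels, height, width fed to the network
    batch_size=128,
    rand_crop=True,             # random crops for data augmentation
    rand_mirror=True,           # random horizontal flips
    preprocess_threads=4)       # decoding threads; see Improving Performance
```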
## Improving Performance
The following factors can significantly improve performance:
* A fast back end. A fast BLAS library, e.g., openblas, atlas, or mkl, is necessary only if you are training on a CPU. For Nvidia GPUs, we strongly recommend using CUDNN.
* Input data:
  - Data format. Use the `rec` format.
  - The number of threads used for decoding. By default, MXNet uses four CPU threads for decoding images, which can often decode more than 1K images per second. If you are using a low-end CPU or very powerful GPUs, you can increase the number of threads.
  - Data storage location. Any local or distributed file system (HDFS, Amazon S3) should be fine. If multiple computers read the data from a network shared file system (NFS) at the same time, however, you might encounter a problem.
- Data storage location. Any local or distributed file system (HDFS, Amazon
S3) should be fine. If multiple computers read the
data from the network shared file system (NFS) at the same time, however, you might encounter a problem.
  - Batch size. We recommend using the largest batch size that GPU memory can accommodate. A value that is too large might slow down convergence. A safe batch size for CIFAR 10 is approximately 200; for ImageNet 1K, the batch size can exceed 1K.
* If you are using more than one GPU, the right `kvstore` (a sketch follows this list). For more information, see [this guide](http://mxnet.io/how_to/multi_devices.html#distributed-training-with-multiple-machines).
  - For a single computer, the default `local` is often sufficient. For models bigger than 100 MB, such as AlexNet and VGG, you might want to use `local_allreduce_device`, which uses more GPU memory than the other options.
  - For multiple computers, we recommend trying `dist_sync` first. If the model is very large or you are using many computers, you might want to use `dist_async`.
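For instance, a multi-GPU setup with the legacy API is a small change to the earlier sketch; `net`, `train_iter`, and `val_iter` are placeholders for a symbol and iterators defined as above:
```python
import mxnet as mx

# use all four GPUs and the largest batch size that fits in memory
model = mx.model.FeedForward(symbol=net, ctx=[mx.gpu(i) for i in range(4)],
                             num_epoch=50, learning_rate=0.1)
# local_allreduce_device trades extra GPU memory for faster gradient aggregation
model.fit(X=train_iter, eval_data=val_iter,
          kvstore=mx.kvstore.create('local_allreduce_device'))
```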
## Results
- Computers
| name | hardware | software |
| --- | --- | --- |
| GTX980 | Xeon E5-1650 v3, 4 x GTX 980 | GCC 4.8, CUDA 7.5, CUDNN 3 |
| TitanX | dual Xeon E5-2630 v3, 4 x GTX Titan X | GCC 4.8, CUDA 7.5, CUDNN 3 |
| EC2-g2.8x | Xeon E5-2670, 2 x GRID K520, 10G Ethernet | GCC 4.8, CUDA 7.5, CUDNN 3 |
- Datasets
| name | class | image size | training | testing |
| ---- | ----: | ---------: | -------: | ------: |
| CIFAR 10 | 10 | 28 × 28 × 3 | 50,000 | 10,000 |
| ILSVRC 12 | 1,000 | 227 × 227 × 3 | 1,281,167 | 50,000 |
### CIFAR 10
- Command
```bash
python train_cifar10.py --batch-size 128 --lr 0.1 --lr-factor .94 --num-epoch 50
```
- Performance:
| 1 GTX 980 | 2 GTX 980 | 4 GTX 980 |
| --- | --- | --- |
| 842 img/sec | 1640 img/sec | 2943 img/sec |
- Accuracy vs epoch ([interactive figure](https://docs.google.com/spreadsheets/d/1kV2aDUXNyPn3t5nj8UdPA61AdRF4_w1UNmxaqu-cRBA/pubchart?oid=761035336&format=interactive)):
<img src=https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/inception-with-bn-cifar10.png width=400px/>
### ILSVRC 12
<!-- #### Alexnet -->
<!-- `train_imagenet.py` with `--network alexnet` -->
<!-- - time for one epoch: -->
<!-- | 1 x GTX 980 | 2 x GTX 980 | 4 x GTX 980 | -->
<!-- | ----------- | ------------ | ------------ | -->
<!-- | 2,413 sec | 1,244 sec | 906 sec | -->
#### VGG
`train_imagenet.py` with `--network vgg`
- Performance
| Cluster | # machines | # GPUs | batch size | kvstore | epoch time (sec) |
| --- | --- | --- | --- | --- | ---: |
| TitanX | 1 | 1 | 96 | `none` | 14,545 |
| - | - | 2 | - | `local` | 19,692 |
| - | - | 4 | - | - | 20,014 |
| - | - | 2 | - | `local_allreduce_device` | 9,142 |
| - | - | 4 | - | - | 8,533 |
| - | - | - | 384 | - | 5,161 |
#### Inception with Batch Normalization
`train_imagenet.py` with `--network inception-bn`
- Performance
| Cluster | # machines | # GPUs | batch size | kvstore | epoch time (sec) |
| --- | --- | --- | --- | --- | ---: |
| GTX980 | 1 | 1 | 32 | `local` | 13,210 |
| - | - | 2 | 64 | - | 7,198 |
| - | - | 3 | 128 | - | 4,952 |
| - | - | 4 | - | - | 3,589 |
| TitanX | 1 | 1 | 128 | `none` | 10,666 |
| - | - | 2 | - | `local` | 5,161 |
| - | - | 3 | - | - | 3,460 |
| - | - | 4 | - | - | 2,844 |
| - | - | - | 512 | - | 2,495 |
| EC2-g2.8x | 1 | 4 | 144 | `local` | 14,203 |
| - | 10 | 40 | 144 | `dist_sync` | 1,422 |
- Convergence
- `single machine`:
```bash
python train_imagenet.py --batch-size 144 --lr 0.05 --lr-factor .94 \
--gpus 0,1,2,3 --num-epoch 60 --network inception-bn \
--data-dir ilsvrc12/ --model-prefix model/ilsvrc12
```
- `10 x g2.8x`: `hosts` contains the private IPs of the 10 computers
```bash
../../tools/launch.py -H hosts -n 10 --sync-dir /tmp/mxnet \
python train_imagenet.py --batch-size 144 --lr 0.05 --lr-factor .94 \
--gpus 0,1,2,3 --num-epoch 60 --network inception-bn \
--kv-store dist_sync \
--data-dir s3://dmlc/ilsvrc12/ --model-prefix s3://dmlc/model/ilsvrc12
```
**Note:** Occasional Amazon S3 instability might cause training to hang or generate frequent errors. In that case, you can download the data to `/mnt` first.
- Accuracy vs. epoch ([the interactive figure](https://docs.google.com/spreadsheets/d/1AEesHjWUZOzCN0Gp_PYI1Cw4U1kZMKot360p9Fowmjw/pubchart?oid=1740787404&format=interactive)):
<img src=https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/inception-with-bn-imagnet1k.png width=400px/>
## Next Steps
* [MXNet tutorials index](http://mxnet.io/tutorials/index.html)