# CNN Example
---
A convolutional neural network (CNN) is a type of feed-forward artificial neural
network widely used for image and video classification. In this example, we
use a deep CNN model to classify images from the
[CIFAR10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html).
## Running instructions
Please refer to the [installation](installation.html) page for
instructions on building SINGA, and the [quick start](quick-start.html) page
for instructions on starting ZooKeeper.

We provide scripts in *examples/cifar10/* for preparing the training and test datasets:
```shell
# in examples/cifar10
$ cp Makefile.example Makefile
$ make download
$ make create
```
### Training on CPU
We can start the training with

```shell
./bin/singa-run.sh -conf examples/cifar10/job.conf
```
You should see output like:

```
Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 33849)
E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
```
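To track training progress (e.g., to plot loss against step), you can extract the `Train`/`Test` lines from this output. The sketch below assumes the exact line format shown above, `<phase> step-<n>, loss : <x>, accuracy : <y>`. The names `parse_trainer_log` and `Metric` are hypothetical helpers and are not part of SINGA. Other SINGA versions may print different fields.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumes trainer lines in the format shown above, e.g.
# "... trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250"
TRAINER_LINE = re.compile(
    r"(?P<phase>Train|Test) step-(?P<step>\d+), "
    r"loss : (?P<loss>[0-9.]+), accuracy : (?P<acc>[0-9.]+)"
)


class Metric(NamedTuple):
    phase: str       # "Train" or "Test"
    step: int        # step at which the metric was reported
    loss: float
    accuracy: float


def parse_trainer_log(lines: Iterable[str]) -> List[Metric]:
    """Collect loss/accuracy records, ignoring every other log line."""
    metrics = []
    for line in lines:
        match = TRAINER_LINE.search(line)
        if match:
            metrics.append(Metric(
                phase=match["phase"],
                step=int(match["step"]),
                loss=float(match["loss"]),
                accuracy=float(match["acc"]),
            ))
    return metrics


if __name__ == "__main__":
    sample = [
        "E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start",
        "E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900",
        "E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500",
    ]
    for metric in parse_trainer_log(sample):
        print(metric)
```

Server and worker start-up lines do not match the pattern, so they are skipped.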
After a number of training steps (which depends on the configuration), or when the job
finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.
### Training on GPU
Since version 0.2, CNN models can be trained on a GPU using cuDNN. Please refer to
the [GPU page](gpu.html) for details on compiling SINGA with GPU and cuDNN support.
The configuration file is similar to the one for CPU training, except that
cuDNN layers are used and the GPU device is configured.

```shell
./bin/singa-run.sh -conf examples/cifar10/cudnn.conf
```
### Training using Python script
The Python helpers that come with SINGA 0.2 make it easy to configure a training
job. For example, *job.conf* can be replaced with a short Python script,
*cifar10_cnn.py*, whose API follows the [Keras API](http://keras.io/).

```shell
# on CPU
./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
# on GPU
./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn_cudnn.py
```
## Details
To train a model in SINGA, you need to prepare the datasets
and a job configuration that specifies the neural net structure, the training
algorithm (BP or CD), the SGD update algorithm (e.g., Adagrad),
the number of training/test steps, etc.
### Data preparation
Before using SINGA, you need to write a program that converts the dataset
into a format SINGA can read. Please refer to the
[Data Preparation](data.html#example---cifar-dataset) page for details on
preparing the CIFAR10 dataset.
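A conversion program first has to read the raw CIFAR10 files. The sketch below assumes the *binary* CIFAR10 release, in which each record is 1 label byte followed by 3 × 32 × 32 = 3072 pixel bytes (the red plane, then green, then blue). It also computes a per-pixel mean image. We assume this is the kind of data that `image_mean.bin` holds, but we have not checked SINGA's actual file format. The function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# Assumed CIFAR10 binary layout: 1 label byte + 3072 pixel bytes per record.
IMAGE_BYTES = 3 * 32 * 32
RECORD_BYTES = 1 + IMAGE_BYTES


def iter_records(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (label, pixels) pairs from the raw bytes of one binary batch file."""
    if len(buf) % RECORD_BYTES != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    for offset in range(0, len(buf), RECORD_BYTES):
        # Indexing bytes gives an int, so the label is 0..9.
        yield buf[offset], buf[offset + 1: offset + RECORD_BYTES]


def mean_image(buf: bytes) -> List[float]:
    """Per-pixel mean over all images in the buffer (assumed content of a mean file)."""
    sums = [0.0] * IMAGE_BYTES
    count = 0
    for _, pixels in iter_records(buf):
        for i, value in enumerate(pixels):
            sums[i] += value
        count += 1
    if count == 0:
        raise ValueError("buffer contains no records")
    return [s / count for s in sums]
```

To use it, read a file such as `data_batch_1.bin` in binary mode (`open(path, "rb").read()`) and pass the bytes to `iter_records`. Writing the records into SINGA's `kvfile` format is left to the `make create` step.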
### Neural net
Figure 1 shows the net structure of the CNN model used in this example, which
follows [Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg).
The dashed circle represents one feature transformation stage, which generally
has four layers, as shown in the figure. In some stages the rectifier layer and the
normalization layer are omitted or swapped. This example has 3 such stages.
Next, we follow the guides on the [neural net page](neural-net.html)
and the [layer page](layer.html) to write the neural net configuration.
<div style = "text-align: center">
<img src = "../_static/images/example-cnn.png" style = "width: 200px"> <br/>
<strong>Figure 1 - Net structure of the CNN example.</strong></img>
</div>
* We configure two input layers that read the training and test records from disk files. The `exclude` field determines which layer belongs to the training net and which to the test net.

```
layer{
  name: "data"
  type: kRecordInput
  store_conf {
    backend: "kvfile"
    path: "examples/cifar10/train_data.bin"
    mean_file: "examples/cifar10/image_mean.bin"
    batchsize: 64
    random_skip: 5000
    shape: 3
    shape: 32
    shape: 32
  }
  exclude: kTest # exclude this layer for the testing net
}
layer{
  name: "data"
  type: kRecordInput
  store_conf {
    backend: "kvfile"
    path: "examples/cifar10/test_data.bin"
    mean_file: "examples/cifar10/image_mean.bin"
    batchsize: 100
    shape: 3
    shape: 32
    shape: 32
  }
  exclude: kTrain # exclude this layer for the training net
}
```
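The following sketch shows one plausible reading of the training input layer's `batchsize`, `mean_file` and `random_skip` fields. It is an assumption, not SINGA's implementation: it skips a random number of leading records in `[0, random_skip]`, subtracts the mean image from each record, and emits fixed-size batches. `input_batches` is a hypothetical name, and dropping the trailing partial batch is a simplification of this sketch.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Record = Tuple[Sequence[float], int]  # (flattened 3x32x32 image, label)


def input_batches(records: Sequence[Record], mean: Sequence[float], batchsize: int,
                  random_skip: int = 0, rng: Optional[random.Random] = None
                  ) -> Iterator[List[Tuple[List[float], int]]]:
    """Yield mean-subtracted mini-batches (hypothetical model of the input layer)."""
    rng = rng or random.Random()
    # Assumption: random_skip bounds a random offset into the record stream.
    start = rng.randint(0, random_skip) if random_skip > 0 else 0
    batch = []
    for pixels, label in records[start:]:
        centered = [p - m for p, m in zip(pixels, mean)]
        batch.append((centered, label))
        if len(batch) == batchsize:
            yield batch
            batch = []
    # A trailing partial batch is dropped in this sketch.
```

A random starting offset is a common way to make workers that read the same file start at different positions. This is why we read `random_skip` this way here.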
* We configure the layers for the feature transformation as follows
(all of them are built-in SINGA layers; their hyper-parameters are set according to
[Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg)).

```
layer {
  name: "conv1"
  type: kConvolution
  srclayers: "data"
  convolution_conf {... }
  ...
}
layer {
  name: "pool1"
  type: kPooling
  srclayers: "conv1"
  pooling_conf {... }
}
layer {
  name: "relu1"
  type: kReLU
  srclayers:"pool1"
}
layer {
  name: "norm1"
  type: kLRN
  lrn_conf {... }
  srclayers:"relu1"
}
```
The configurations of the other two stages are omitted here.
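When writing the elided `convolution_conf`/`pooling_conf` values, it helps to track how each stage shrinks the 32 × 32 input. The sketch uses the common floor convention `(in + 2*pad - kernel) // stride + 1`. Some frameworks round up for pooling instead, so check SINGA's behaviour. The kernel, stride and padding values below are **hypothetical** and are not the settings used in this example. ReLU and LRN layers do not change the spatial size.

```python
from typing import List


def out_size(in_size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling window (floor convention)."""
    if kernel > in_size + 2 * pad:
        raise ValueError("window is larger than the padded input")
    return (in_size + 2 * pad - kernel) // stride + 1


# HYPOTHETICAL (kernel, stride, pad) per stage, for illustration only.
STAGES = [
    {"conv": (5, 1, 2), "pool": (3, 2, 0)},
    {"conv": (5, 1, 2), "pool": (3, 2, 0)},
    {"conv": (5, 1, 2), "pool": (3, 2, 0)},
]


def trace_sizes(size: int = 32) -> List[int]:
    """Spatial size after each conv + pool stage (ReLU/LRN keep the size)."""
    sizes = [size]
    for stage in STAGES:
        size = out_size(size, *stage["conv"])
        size = out_size(size, *stage["pool"])
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_sizes())
```

With these hypothetical values, the sizes go 32 → 15 → 7 → 3, so `pool3` would feed `ip1` a 3 × 3 map per channel.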
* An [inner product layer](layer.html#innerproductlayer) follows the 3 transformation stages.
It is configured with 10 output units, i.e., the total number of labels. Its weight
matrix Param is configured with a large weight decay scale to reduce over-fitting.

```
layer {
  name: "ip1"
  type: kInnerProduct
  srclayers:"pool3"
  innerproduct_conf {
    num_output: 10
  }
  param {
    name: "w4"
    wd_scale:250
    ...
  }
  param {
    name: "b4"
    ...
  }
}
```
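To see what `wd_scale:250` means, we assume that a Param's effective L2 decay is the updater's `weight_decay` multiplied by that Param's `wd_scale`. With `weight_decay:0.004` from the updater configuration, the decay on `w4` would then be 0.004 × 250 = 1.0. This matches the fully connected layer's weight cost of 1 in Alex's cuda-convnet setting. Below is a minimal plain-SGD sketch under this assumption. `sgd_update` is a hypothetical name.

```python
from typing import List, Sequence


def sgd_update(weights: Sequence[float], grads: Sequence[float], lr: float,
               weight_decay: float, wd_scale: float = 1.0) -> List[float]:
    """One plain SGD step with L2 weight decay.

    Assumption: the decay for a Param is weight_decay * wd_scale.
    """
    decay = weight_decay * wd_scale
    # The gradient of 0.5 * decay * w^2 is decay * w, added to the loss gradient.
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # With zero loss gradient, each weight shrinks by lr * decay = 0.1 * 1.0 of its value.
    print(sgd_update([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.004, wd_scale=250))
```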
* The last layer is a [Softmax loss layer](layer.html#softmaxloss). It takes the output of `ip1` and the labels from the `data` layer as its inputs.

```
layer{
  name: "loss"
  type: kSoftmaxLoss
  softmaxloss_conf{ topk:1 }
  srclayers:"ip1"
  srclayers: "data"
}
```
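The sketch below shows the two quantities this layer reports: softmax cross-entropy loss and top-k accuracy (`topk:1` is read here as top-1). It is a plain-Python illustration, not SINGA's code. It also explains the start of the training log above. With 10 classes and an untrained model that scores every class about equally, the loss is close to ln(10) ≈ 2.3026, which matches the step-0 loss of about 2.3026 in the log.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed stably by shifting by the max logit."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    return math.log(sum(exps)) - math.log(exps[label])


def topk_correct(logits: Sequence[float], label: int, k: int = 1) -> bool:
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return label in ranked[:k]


if __name__ == "__main__":
    print(softmax_cross_entropy([0.0] * 10, 3))  # equals math.log(10)
```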
### Updater
The [normal SGD updater](updater.html#updater) is selected.
The learning rate decreases step-wise, like going down stairs, and is configured using the
[kFixedStep](updater.html#kfixedstep) type.

```
updater{
  type: kSGD
  weight_decay:0.004
  learning_rate {
    type: kFixedStep
    fixedstep_conf:{
      step:0         # lr for step 0-60000 is 0.001
      step:60000     # lr for step 60000-65000 is 0.0001
      step:65000     # lr for step 65000- is 0.00001
      step_lr:0.001
      step_lr:0.0001
      step_lr:0.00001
    }
  }
}
```
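Each `step` entry is paired with the `step_lr` entry at the same position. The active rate is the one for the last boundary that the current step has reached. Here is a minimal sketch of that lookup, using the values from the configuration above. We assume that a boundary step itself already uses the new rate (e.g., step 60000 uses 0.0001), because the comments leave this case ambiguous. `fixed_step_lr` is a hypothetical name.

```python
import bisect
from typing import Sequence

# Values copied from the kFixedStep configuration above.
STEPS = [0, 60000, 65000]
STEP_LR = [0.001, 0.0001, 0.00001]


def fixed_step_lr(step: int, steps: Sequence[int] = STEPS,
                  step_lr: Sequence[float] = STEP_LR) -> float:
    """Learning rate of the last boundary in `steps` that is <= step."""
    if step < steps[0]:
        raise ValueError("step precedes the first boundary")
    # bisect_right counts the boundaries <= step; the last of them is the active one.
    return step_lr[bisect.bisect_right(steps, step) - 1]


if __name__ == "__main__":
    for s in (0, 59999, 60000, 65000, 100000):
        print(s, fixed_step_lr(s))
```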
### TrainOneBatch algorithm
Since the CNN model is a feed-forward model, it is configured to use the
[Back-propagation algorithm](train-one-batch.html#back-propagation).

```
train_one_batch {
  alg: kBP
}
```
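Conceptually, back-propagation runs a forward pass through the layers in order, then a backward pass in reverse order. During the backward pass, each layer computes the gradient of its parameters and passes the gradient of its input to the layer before it. The sketch below shows this on a toy chain of scalar layers `y = w * x` with a squared-error loss. `ScaleLayer` and `train_one_batch_bp` are hypothetical and are not SINGA's classes. In SINGA, the updater applies the computed gradients separately.

```python
from typing import List


class ScaleLayer:
    """Toy layer y = w * x that caches its input for the backward pass."""

    def __init__(self, w: float) -> None:
        self.w = w
        self.x = 0.0
        self.grad_w = 0.0

    def forward(self, x: float) -> float:
        self.x = x
        return self.w * x

    def backward(self, grad_y: float) -> float:
        self.grad_w = grad_y * self.x   # dL/dw for this layer
        return grad_y * self.w          # dL/dx, sent to the previous layer


def train_one_batch_bp(layers: List[ScaleLayer], x: float, target: float) -> float:
    """Forward in layer order, backward in reverse order; returns the loss."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    loss = 0.5 * (out - target) ** 2
    grad = out - target                 # dL/d(out)
    for layer in reversed(layers):
        grad = layer.backward(grad)
    return loss


if __name__ == "__main__":
    net = [ScaleLayer(2.0), ScaleLayer(3.0)]
    print(train_one_batch_bp(net, x=1.0, target=0.0), [l.grad_w for l in net])
```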
### Cluster setting
The following configuration sets up a single worker group and a single server group for training.
The [Training frameworks](frameworks.html) page describes the configurations of several distributed
training frameworks.

```
cluster {
  nworker_groups: 1
  nserver_groups: 1
}
```