# MLP Example
---
The multilayer perceptron (MLP) is a subclass of feed-forward neural networks.
An MLP typically consists of multiple layers, with each layer fully
connected to the next one. In this example, we will use SINGA to train a
[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
## Running instructions
Please refer to the [installation](installation.html) page for
instructions on building SINGA, and the [quick start](quick-start.html)
for instructions on starting ZooKeeper.
We have provided scripts for preparing the training and test datasets in *examples/mnist/*.

    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU
After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

After it is started, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

After training for a certain number of steps (depending on the setting) or when the job
finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.
### Training on GPU
To train this example model on a GPU, just add a field for the GPU device in the
job configuration file:

    # job.conf
    gpu: 0

### Training using Python script
The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example,
the job.conf is replaced with a simple Python script, mnist_mlp.py,
which has about 30 lines of code following the [Keras API](http://keras.io/).

    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py

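For orientation, here is a rough sketch of what a ~30-line, Keras-style definition of this
six-layer MLP looks like. This is plain Keras (via tf.keras), not SINGA's mnist_mlp.py; the
scaled-tanh activation and the optimizer settings below are assumptions for illustration only.

    # Keras-style sketch of the six-layer MLP (illustration only, not SINGA's mnist_mlp.py).
    import tensorflow as tf

    def scaled_tanh(x):
        # Assumed scaled tanh, 1.7159 * tanh(2x/3); SINGA's kSTanh may use different constants.
        return 1.7159 * tf.math.tanh(2.0 * x / 3.0)

    # Hidden layer sizes follow the paper: 2500 -> 2000 -> 1500 -> 1000 -> 500, then 10 outputs.
    model = tf.keras.Sequential(
        [tf.keras.layers.Dense(n, activation=scaled_tanh) for n in (2500, 2000, 1500, 1000, 500)]
        + [tf.keras.layers.Dense(10, activation="softmax")]
    )
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Usage: x_train has shape (N, 784) with normalized pixels, y_train holds digit labels.
    # model.fit(x_train, y_train, batch_size=64, validation_data=(x_test, y_test))
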
## Details
To train a model in SINGA, you need to prepare the datasets,
and a job configuration which specifies the neural net structure, training
algorithm (BP or CD), SGD update algorithm (e.g. Adagrad),
number of training/test steps, etc.
### Data preparation
Before using SINGA, you need to write a program to pre-process the dataset you
use into a format that SINGA can read. Please refer to the
[Data Preparation](data.html) page for details about preparing
the MNIST dataset.
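For concreteness, the sketch below shows the kind of pre-processing involved: it parses the
raw MNIST IDX files into NumPy arrays. The file names are the standard ones from the MNIST
site and may differ from what `make download` produces; this is only an illustration of the
raw format, not SINGA's create tool.

    # Minimal sketch: parse the raw MNIST IDX files into NumPy arrays (illustration only).
    import gzip
    import struct
    import numpy as np

    def load_idx_images(path):
        # IDX image file: magic number, #images, #rows, #cols, then one unsigned byte per pixel.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as f:
            magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
            assert magic == 2051, "not an IDX image file"
            pixels = np.frombuffer(f.read(), dtype=np.uint8)
        return pixels.reshape(num, rows * cols)   # flatten each 28x28 image to 784 values

    def load_idx_labels(path):
        # IDX label file: magic number, #labels, then one unsigned byte per label.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as f:
            magic, num = struct.unpack(">II", f.read(8))
            assert magic == 2049, "not an IDX label file"
            return np.frombuffer(f.read(), dtype=np.uint8)

    images = load_idx_images("train-images-idx3-ubyte.gz")   # shape (60000, 784)
    labels = load_idx_labels("train-labels-idx1-ubyte.gz")   # shape (60000,)
    # The store_conf below uses mean_value/std_value of 127.5, which (assuming the usual
    # (x - mean) / std semantics) maps pixel values from [0, 255] to roughly [-1, 1].
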
### Neural net
<div style = "text-align: center">
<img src = "../_static/images/example-mlp.png" style = "width: 230px">
<br/><strong>Figure 1 - Net structure of the MLP example. </strong></img>
</div>
Figure 1 shows the structure of the simple MLP model, which is constructed following
[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
two layers which represent one feature transformation stage. There are 6 such
stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these circles decrease from
2500->2000->1500->1000->500->10.
Next, we follow the guides in the [neural net page](neural-net.html)
and the [layer page](layer.html) to write the neural net configuration.
* We configure an input layer to read the training/testing records from a disk file.

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/train_data.bin"
            random_skip: 5000
            batchsize: 64
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTest
        }

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/test_data.bin"
            batchsize: 100
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTrain
        }

* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly, as shown below,

        layer {
          name: "fc1"
          type: kInnerProduct
          srclayers: "data"
          innerproduct_conf {
            num_output: 2500
          }
          param {
            name: "w1"
            ...
          }
          param {
            name: "b1"
            ...
          }
        }

with the `num_output` decreasing from 2500 to 10.
* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer
except the last one. It transforms the features via a scaled tanh function
(a NumPy sketch of the full forward pass appears after this list).

        layer {
          name: "tanh1"
          type: kSTanh
          srclayers: "fc1"
        }

* The final [Softmax loss layer](layer.html#softmaxloss) connects
to the data layer (which provides the labels) and the last InnerProductLayer, fc6.

        layer {
          name: "loss"
          type: kSoftmaxLoss
          softmaxloss_conf { topk: 1 }
          srclayers: "fc6"
          srclayers: "data"
        }

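Putting the layers above together, the following NumPy sketch runs one forward pass and
computes the softmax cross-entropy loss and top-1 accuracy reported in the training log. It is
an illustration only: the random weight initialization and the 1.7159 * tanh(2x/3) form of the
scaled tanh are assumptions, not values taken from SINGA.

    # NumPy sketch of the forward pass and loss of the configured net (illustration only).
    import numpy as np

    sizes = [784, 2500, 2000, 1500, 1000, 500, 10]     # input, fc1 ... fc6
    rng = np.random.default_rng(0)
    weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]

    def scaled_tanh(x):
        # Assumed form of the scaled tanh; SINGA's kSTanh may use different constants.
        return 1.7159 * np.tanh(2.0 * x / 3.0)

    def forward(x):
        # fc1..fc5 are each followed by an STanh layer; fc6 feeds the loss directly.
        for w, b in zip(weights[:-1], biases[:-1]):
            x = scaled_tanh(x @ w + b)
        return x @ weights[-1] + biases[-1]            # fc6 logits, shape (batch, 10)

    def softmax_loss_and_accuracy(logits, labels):
        # Softmax cross-entropy plus the top-1 (topk: 1) accuracy printed in the log.
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
        accuracy = (probs.argmax(axis=1) == labels).mean()
        return loss, accuracy

    batch = (rng.integers(0, 256, (64, 784)) - 127.5) / 127.5   # normalized as in store_conf
    labels = rng.integers(0, 10, 64)
    print(softmax_loss_and_accuracy(forward(batch), labels))
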
### Updater
The [normal SGD updater](updater.html#updater) is selected.
The learning rate shrinks by a factor of 0.997 every 60 steps (i.e., one epoch).

    updater {
      type: kSGD
      learning_rate {
        base_lr: 0.001
        type: kStep
        step_conf {
          change_freq: 60
          gamma: 0.997
        }
      }
    }

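In other words, the kStep schedule multiplies the base learning rate by `gamma` once every
`change_freq` steps. The snippet below restates that rule for illustration; it is not code
from SINGA.

    # Restatement of the kStep learning-rate schedule configured above (not SINGA code).
    def step_learning_rate(step, base_lr=0.001, gamma=0.997, change_freq=60):
        # Multiply the base rate by gamma once for every completed block of change_freq steps.
        return base_lr * gamma ** (step // change_freq)

    print(step_learning_rate(0))     # 0.001
    print(step_learning_rate(60))    # 0.000997
    print(step_learning_rate(600))   # 0.001 * 0.997 ** 10
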
### TrainOneBatch algorithm
The MLP model is a feed-forward model, hence the
[Back-propagation algorithm](train-one-batch#back-propagation)
is selected.

    train_one_batch {
      alg: kBP
    }

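For intuition, here is a tiny NumPy sketch of one back-propagation update for the last layer
(fc6 feeding the softmax loss), followed by a vanilla SGD step with the base learning rate from
the updater section. It is a generic textbook illustration, not SINGA's kBP or kSGD
implementation.

    # Generic illustration of one BP + SGD update for the last layer (not SINGA's kBP code).
    import numpy as np

    rng = np.random.default_rng(0)
    h = rng.normal(size=(64, 500))              # output of the last STanh layer for a batch of 64
    labels = rng.integers(0, 10, 64)
    w, b = rng.normal(0, 0.05, (500, 10)), np.zeros(10)

    # Forward: fc6 logits and softmax probabilities.
    logits = h @ w + b
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Backward: the gradient of softmax cross-entropy w.r.t. the logits is (probs - onehot).
    onehot = np.eye(10)[labels]
    dlogits = (probs - onehot) / len(labels)
    dw, db = h.T @ dlogits, dlogits.sum(axis=0)

    # SGD update with the base learning rate from the updater section.
    lr = 0.001
    w -= lr * dw
    b -= lr * db
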
### Cluster setting
The following configuration sets a single worker and server for training.
The [Training frameworks](frameworks.html) page introduces the configuration of several
distributed training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }