# MLP Example

---

Multilayer perceptron (MLP) is a subclass of feed-forward neural networks.
An MLP typically consists of multiple layers, with each layer fully
connected to the next one. In this example, we will use SINGA to train a
[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

## Running instructions

Please refer to the [installation](installation.html) page for
instructions on building SINGA, and the [quick start](quick-start.html)
for instructions on starting zookeeper.


We have provided scripts for preparing the training and test datasets in *examples/mnist/*.

    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

After it is started, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

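
The loss and accuracy values printed in these lines make it easy to track the learning curve.
The following standalone sketch (not part of SINGA) extracts them from a log file; the path
below is a placeholder for wherever your job log is stored.

    # parse_log.py -- minimal sketch for extracting the training curve from
    # SINGA log lines like the ones shown above; the log path is a placeholder.
    import re

    pattern = re.compile(r"Train step-(\d+), loss : ([\d.]+), accuracy : ([\d.]+)")

    steps, losses, accuracies = [], [], []
    with open("/tmp/singa-log/job.log") as f:   # placeholder path
        for line in f:
            m = pattern.search(line)
            if m:
                steps.append(int(m.group(1)))
                losses.append(float(m.group(2)))
                accuracies.append(float(m.group(3)))

    for s, l, a in zip(steps, losses, accuracies):
        print("step %d: loss=%.4f accuracy=%.4f" % (s, l, a))
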
After training for a certain number of steps (depending on the configuration), or once the
job is finished, SINGA will [checkpoint](checkpoint.html) the model parameters.

### Training on GPU

To train this example model on GPU, just add a field to the configuration file
specifying the GPU device,

    # job.conf
    gpu: 0

### Training using Python script

The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example,
the job.conf can be replaced with a simple Python script, mnist_mlp.py,
which has about 30 lines of code following the [Keras API](http://keras.io/).

    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py

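
To give a feel for the Keras-style interface, here is a rough sketch of what such a script
could look like. The module, class, and argument names below (`singa.model`, `Sequential`,
`Dense`, `SGD`, and so on) are illustrative assumptions rather than the exact SINGA 0.2 API;
treat *tool/python/examples/mnist_mlp.py* in the source tree as the authoritative version.

    # Hypothetical sketch of a Keras-style SINGA job script; the module,
    # class and argument names are assumptions, not the exact tool/python API.
    from singa.model import Sequential, Dense, SGD  # assumed imports

    net = Sequential('mlp')
    # six fully connected stages, mirroring the job.conf described below
    for dim in (2500, 2000, 1500, 1000, 500):
        net.add(Dense(dim, activation='stanh'))
    net.add(Dense(10, activation='softmax'))

    # SGD with the same step learning-rate schedule as the Updater section below
    net.compile(loss='categorical_crossentropy',
                optimizer=SGD(lr=0.001, decay_freq=60, gamma=0.997))
    net.fit('examples/mnist/train_data.bin', batchsize=64)
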
## Details

To train a model in SINGA, you need to prepare the datasets and a job
configuration which specifies the neural net structure, the training
algorithm (BP or CD), the SGD update algorithm (e.g. AdaGrad), the
number of training/test steps, etc.

### Data preparation

Before using SINGA, you need to write a program to pre-process the dataset you
use into a format that SINGA can read. Please refer to the
[Data Preparation](data.html) page for details on preparing
the MNIST dataset.

### Neural net

<div style = "text-align: center">
<img src = "../_static/images/example-mlp.png" style = "width: 230px">
<br/><strong>Figure 1 - Net structure of the MLP example. </strong>
</div>


Figure 1 shows the structure of the simple MLP model, which is constructed following
[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
two layers which represent one feature transformation stage. There are 6 such
stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these stages decrease from
2500->2000->1500->1000->500->10.
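
Given the 784-dimensional input (28x28 pixels, see `shape: 784` in the input layer
configuration below) and the layer sizes above, the number of trainable parameters follows
directly. The short sketch below, with purely illustrative stage labels, works it out:

    # Parameter count implied by the layer sizes above; 784 is the input
    # dimension (28x28 MNIST pixels, see "shape: 784" in the config below).
    dims = [784, 2500, 2000, 1500, 1000, 500, 10]

    total = 0
    for i in range(1, len(dims)):
        weights = dims[i - 1] * dims[i]   # fully connected weight matrix
        biases = dims[i]                  # one bias per output unit
        total += weights + biases
        print("stage %d: %7d weights + %4d biases" % (i, weights, biases))
    print("total trainable parameters: %d" % total)   # roughly 12 million
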

Next we follow the guides on the [neural net page](neural-net.html)
and the [layer page](layer.html) to write the neural net configuration.

* We configure an input layer to read the training/testing records from a disk file
  (the effect of the `mean_value` and `std_value` settings is sketched after this list).

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/train_data.bin"
            random_skip: 5000
            batchsize: 64
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTest
        }

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/test_data.bin"
            batchsize: 100
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTrain
        }

* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly to the following,

        layer{
          name: "fc1"
          type: kInnerProduct
          srclayers:"data"
          innerproduct_conf{
            num_output: 2500
          }
          param{
            name: "w1"
            ...
          }
          param{
            name: "b1"
            ...
          }
        }

  with the `num_output` decreasing from 2500 to 10.

* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer
  except the last one. It transforms the features via a scaled tanh function.

        layer{
          name: "tanh1"
          type: kSTanh
          srclayers:"fc1"
        }

* The final [Softmax loss layer](layer.html#softmaxloss) connects
  to the last InnerProductLayer, which outputs the 10 class scores, and to the
  data layer, which provides the ground truth labels.

        layer{
          name: "loss"
          type: kSoftmaxLoss
          softmaxloss_conf{ topk:1 }
          srclayers:"fc6"
          srclayers:"data"
        }

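
The `mean_value` and `std_value` fields of the input layers above shift and scale the raw
pixel values. Assuming the store applies the usual `(pixel - mean_value) / std_value`
transformation (which is what these settings suggest), bytes in [0, 255] are mapped to
roughly [-1, 1], as the small sketch below illustrates.

    # Illustration of the normalization implied by mean_value/std_value = 127.5,
    # assuming the input layer computes (pixel - mean) / std for each pixel.
    mean_value = 127.5
    std_value = 127.5

    for pixel in (0, 64, 127, 191, 255):  # raw grayscale byte values
        normalized = (pixel - mean_value) / std_value
        print("pixel %3d -> %+.3f" % (pixel, normalized))
    # output ranges from -1.0 (black) to +1.0 (white)
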
### Updater

The [normal SGD updater](updater.html#updater) is selected.
The learning rate shrinks by a factor of 0.997 every 60 steps.

    updater{
      type: kSGD
      learning_rate{
        base_lr: 0.001
        type : kStep
        step_conf{
          change_freq: 60
          gamma: 0.997
        }
      }
    }

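
The step schedule above implies that the learning rate is multiplied by `gamma` every
`change_freq` steps, i.e. `lr(t) = base_lr * gamma^(t // change_freq)`. The short sketch
below tabulates the decay under these settings:

    # Learning-rate decay implied by the kStep schedule above:
    # lr(t) = base_lr * gamma ** (t // change_freq)
    base_lr = 0.001
    gamma = 0.997
    change_freq = 60

    def lr_at(step):
        return base_lr * gamma ** (step // change_freq)

    for step in (0, 60, 120, 600, 6000):
        print("step %5d: lr = %.6f" % (step, lr_at(step)))
    # after 6000 steps the rate has decayed to about 0.74 of base_lr
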
### TrainOneBatch algorithm

The MLP model is a feed-forward model, hence the
[Back-propagation algorithm](train-one-batch.html#back-propagation)
is selected.

    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets a single worker and a single server for training.
The [Training frameworks](frameworks.html) page introduces the configurations of a couple of
distributed training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }