
 # CNN Example

 ---

 A convolutional neural network (CNN) is a type of feed-forward artificial neural
 network widely used for image and video classification. In this example, we
 use a deep CNN model to classify images from the
 [CIFAR10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html).


 ## Running instructions

 Please refer to the [installation](installation.html) page for
 instructions on building SINGA, and to the [quick start](quick-start.html)
 page for instructions on starting ZooKeeper.

 Scripts for preparing the training and test datasets are provided in *examples/cifar10/*.

     # in examples/cifar10
     $ cp Makefile.example Makefile
     $ make download
     $ make create


 ### Training on CPU

 Start the training with

     ./bin/singa-run.sh -conf examples/cifar10/job.conf

 You should see output like

     Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
     Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
     E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 33849)
     E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
     E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
     E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
     E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
     E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
     E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
     E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
     E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
     E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
     E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
     E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
     E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
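
As a rough sanity check on the log above: a classifier that assigns equal probability to each of the 10 CIFAR10 classes has a softmax loss of ln(10), about 2.3026, and an expected accuracy of 0.1. The step-0 losses in the log are very close to this value, which is what one would expect before the model has learned anything. A minimal sketch of that arithmetic (the variable names are illustrative and not part of SINGA):

```python
import math

NUM_CLASSES = 10  # CIFAR10 has 10 labels

# Cross-entropy loss when every class gets probability 1/NUM_CLASSES.
chance_loss = -math.log(1.0 / NUM_CLASSES)
# Expected accuracy of a uniform random guess.
chance_accuracy = 1.0 / NUM_CLASSES

print(f"chance loss     = {chance_loss:.6f}")
print(f"chance accuracy = {chance_accuracy:.6f}")
```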

 After a number of training steps (which depends on the configuration), or when the job
 finishes, SINGA [checkpoints](checkpoint.html) the model parameters.

 ### Training on GPU

 Since version 0.2, SINGA can train CNN models on GPUs using cuDNN. Please refer to
 the [GPU page](gpu.html) for details on compiling SINGA with GPU and cuDNN support.
 The configuration file is similar to that for CPU training, except that the
 cuDNN layers are used and the GPU device is configured.

     ./bin/singa-run.sh -conf examples/cifar10/cudnn.conf

 ### Training using Python script

 The Python helpers that come with SINGA 0.2 make it easy to configure a training
 job. For example, *job.conf* can be replaced with a simple Python script that follows the
 [Keras API](http://keras.io/); the MNIST script *mnist_mlp.py* has about 30 lines of code.
 The scripts for this CIFAR10 example are run as follows.

     # on CPU
     ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
     # on GPU
     ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn_cudnn.py

 ## Details

 To train a model in SINGA, you need to prepare the datasets
 and a job configuration that specifies the neural net structure, the training
 algorithm (BP or CD), the SGD update algorithm (e.g. Adagrad),
 the number of training/test steps, etc.

 ### Data preparation

 Before using SINGA, you need to write a program to convert the dataset
 into a format that SINGA can read. See the
 [Data Preparation](data.html#example---cifar-dataset) page for details on
 preparing the CIFAR10 dataset.
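
For a sense of scale, the arithmetic below relates the dataset size to the number of training steps. It assumes the standard CIFAR10 split of 50,000 training and 10,000 test images, and uses the image shape and batch sizes configured in the input layers of this example (64 for training, 100 for testing). The variable names are illustrative only.

```python
import math

TRAIN_IMAGES = 50_000   # standard CIFAR10 training split
TEST_IMAGES = 10_000    # standard CIFAR10 test split
SHAPE = (3, 32, 32)     # channels, height, width (the `shape` fields)

TRAIN_BATCH = 64        # batchsize of the training input layer
TEST_BATCH = 100        # batchsize of the test input layer

# Number of values stored per image record.
values_per_image = math.prod(SHAPE)
# Steps needed to see every training image once (last batch is partial).
steps_per_epoch = math.ceil(TRAIN_IMAGES / TRAIN_BATCH)
# Steps needed for one full pass over the test set.
test_steps_full_pass = math.ceil(TEST_IMAGES / TEST_BATCH)

print(values_per_image)      # 3072
print(steps_per_epoch)       # 782
print(test_steps_full_pass)  # 100
```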

 ### Neural net

 Figure 1 shows the net structure of the CNN model used in this example, which
 follows [Alex's configuration](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg).
 The dashed circle represents one feature transformation stage, which generally
 has four layers as shown in the figure. Sometimes the rectifier layer and the normalization layer
 are omitted or swapped within a stage. This example has 3 such stages.

 Next, we follow the guides on the [neural net page](neural-net.html)
 and the [layer page](layer.html) to write the neural net configuration.

 <div style = "text-align: center">
 <img src = "../_static/images/example-cnn.png" style = "width: 200px"/> <br/>
 <strong>Figure 1 - Net structure of the CNN example.</strong>
 </div>

 * We configure two input layers, one for training and one for testing, to read the records from disk files.

         layer{
           name: "data"
           type: kRecordInput
           store_conf {
             backend: "kvfile"
             path: "examples/cifar10/train_data.bin"
             mean_file: "examples/cifar10/image_mean.bin"
             batchsize: 64
             random_skip: 5000
             shape: 3
             shape: 32
             shape: 32
           }
           exclude: kTest  # exclude this layer for the testing net
         }
         layer{
           name: "data"
           type: kRecordInput
           store_conf {
             backend: "kvfile"
             path: "examples/cifar10/test_data.bin"
             mean_file: "examples/cifar10/image_mean.bin"
             batchsize: 100
             shape: 3
             shape: 32
             shape: 32
           }
           exclude: kTrain # exclude this layer for the training net
         }


 * We configure layers for the feature transformation as follows
 (all layers are built-in layers in SINGA; hyper-parameters of these layers are set according to
 [Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg)).

         layer {
           name: "conv1"
           type: kConvolution
           srclayers: "data"
           convolution_conf {... }
           ...
         }
         layer {
           name: "pool1"
           type: kPooling
           srclayers: "conv1"
           pooling_conf {... }
         }
         layer {
           name: "relu1"
           type: kReLU
           srclayers:"pool1"
         }
         layer {
           name: "norm1"
           type: kLRN
           lrn_conf {... }
           srclayers:"relu1"
         }

   The configurations of the other two stages are omitted here.

 * An [inner product layer](layer.html#innerproductlayer) follows
 the 3 transformation stages. It is
 configured with 10 output units, i.e., the total number of labels. The weight
 matrix Param is configured with a large weight decay scale to reduce over-fitting.

         layer {
           name: "ip1"
           type: kInnerProduct
           srclayers:"pool3"
           innerproduct_conf {
             num_output: 10
           }
           param {
             name: "w4"
             wd_scale:250
             ...
           }
           param {
             name: "b4"
             ...
           }
         }

 * The last layer is a [Softmax loss layer](layer.html#softmaxloss), which takes the *ip1* layer and the *data* layer as its sources.

         layer{
           name: "loss"
           type: kSoftmaxLoss
           softmaxloss_conf{ topk:1 }
           srclayers:"ip1"
           srclayers: "data"
         }
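
To make the role of the loss layer concrete, here is a minimal pure-Python sketch of a softmax cross-entropy loss and a top-k accuracy check for a single example. It illustrates the general technique only and is not SINGA's implementation; the function names and sample scores are hypothetical. It also assumes that `topk` has its usual meaning, so that with `k=1` a prediction counts as correct only when the true label has the highest score.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, label):
    """Cross-entropy loss: negative log-probability of the true label."""
    return -math.log(softmax(scores)[label])


def in_top_k(scores, label, k=1):
    """True if the true label is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]


# Hypothetical scores for 10 classes, e.g. the outputs of the "ip1" layer.
scores = [0.1, 2.0, -1.0, 0.3, 0.0, 1.5, -0.5, 0.2, 0.0, 0.4]
label = 5  # hypothetical ground-truth class

print(softmax_loss(scores, label))
print(in_top_k(scores, label, k=1))  # class 1 scores highest, so False
print(in_top_k(scores, label, k=2))  # class 5 is second highest, so True
```

Averaging `softmax_loss` and the top-k result over a batch gives quantities of the same kind as the `loss` and `accuracy` values printed in the training log.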

 ### Updater

 The [normal SGD updater](updater.html#updater) is selected.
 The learning rate decreases in steps, like going down stairs, and is configured using the
 [kFixedStep](updater.html#kfixedstep) type.

         updater{
           type: kSGD
           weight_decay:0.004
           learning_rate {
             type: kFixedStep
             fixedstep_conf:{
               step:0             # lr for step 0-60000 is 0.001
               step:60000         # lr for step 60000-65000 is 0.0001
                step:65000         # lr for step 65000- is 0.00001
               step_lr:0.001
               step_lr:0.0001
               step_lr:0.00001
             }
           }
         }
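
The schedule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SINGA's code: it assumes that each `step` entry marks the first step at which the matching `step_lr` applies, and that a parameter's effective weight decay is the updater's `weight_decay` multiplied by the parameter's `wd_scale`. The function names are hypothetical.

```python
import bisect

STEPS = [0, 60000, 65000]            # `step` entries of fixedstep_conf
STEP_LRS = [0.001, 0.0001, 0.00001]  # matching `step_lr` entries
WEIGHT_DECAY = 0.004                 # `weight_decay` of the updater


def fixed_step_lr(step):
    """Return the learning rate in effect at a non-negative training step."""
    # bisect_right counts how many boundaries are <= step (assumed semantics).
    index = bisect.bisect_right(STEPS, step) - 1
    return STEP_LRS[index]


def sgd_update(weight, grad, step, wd_scale=1.0):
    """One plain SGD step with L2 weight decay on a single scalar weight."""
    decay = WEIGHT_DECAY * wd_scale  # assumed per-parameter scaling of the decay
    return weight - fixed_step_lr(step) * (grad + decay * weight)


for step in (0, 59999, 60000, 64999, 65000, 70000):
    print(step, fixed_step_lr(step))

# With wd_scale 250 (as for "w4"), the assumed effective decay is 0.004 * 250.
print(sgd_update(weight=0.5, grad=0.1, step=0, wd_scale=250))
```

Under these assumptions, the large `wd_scale` on the final weight matrix makes its decay term much stronger than that of parameters left at the default scale.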

 ### TrainOneBatch algorithm

 The CNN model is a feed-forward model, so it should be configured to use the
 [Back-propagation algorithm](train-one-batch.html#back-propagation).

     train_one_batch {
       alg: kBP
     }

 ### Cluster setting

 The following configuration sets a single worker and a single server for training.
 The [Training frameworks](frameworks.html) page introduces the configurations of several distributed
 training frameworks.

     cluster {
       nworker_groups: 1
       nserver_groups: 1
     }