 # RBM Example

 ---

 This example uses SINGA to train 4 RBM models and one auto-encoder model over the
 [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
 to reduce the dimensionality of the MNIST image feature. The RBM models are trained
 to initialize parameters of the auto-encoder model. This example application is
 from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).

 ## Running instructions

 Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.

 The MNIST dataset has 70,000 handwritten digit images. The
 [data preparation](data.html) page
 has details on converting this dataset into a format that SINGA recognizes. Users can
 simply run the following commands to download and convert the dataset.

     # at SINGA_ROOT/examples/mnist/
     $ cp Makefile.example Makefile
     $ make download
     $ make create

 The training is separated into two phases, namely pre-training and fine-tuning.
 The pre-training phase trains 4 RBMs in sequence,

     # at SINGA_ROOT/
     $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
     $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
     $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
     $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf

 The fine-tuning phase trains the auto-encoder by,

     $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf


 ## Training details

 ### RBM1

 <img src="../_static/images/example-rbm1.png" align="center" width="200px"/>
 <span><strong>Figure 1 - RBM1.</strong></span>

 The neural net structure for training RBM1 is shown in Figure 1.
 The data layer and parser layer provide features for training RBM1.
 The visible layer (connected with the parser layer) of RBM1 accepts the image feature
 (784 dimensions). The hidden layer is set to have 1000 neurons (units).
 These two layers are configured as,

     layer{
       name: "RBMVis"
       type: kRBMVis
       srclayers:"mnist"
       srclayers:"RBMHid"
       rbm_conf{
         hdim: 1000
       }
       param{
         name: "w1"
         init{
           type: kGaussian
           mean: 0.0
           std: 0.1
         }
       }
       param{
         name: "b11"
         init{
           type: kConstant
           value: 0.0
         }
       }
     }

     layer{
       name: "RBMHid"
       type: kRBMHid
       srclayers:"RBMVis"
       rbm_conf{
         hdim: 1000
       }
       param{
         name: "w1_"
         share_from: "w1"
       }
       param{
         name: "b12"
         init{
           type: kConstant
           value: 0.0
         }
       }
     }


 For RBM, the weight matrix is shared by the visible and hidden layers. For instance,
 `w1` is shared by `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
 the `share_from` field to enable [parameter sharing](param.html)
 as shown above for the param `w1` and `w1_`.
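
To make the sharing concrete, here is a minimal sketch in plain Python (not SINGA code) of how a single weight matrix can serve both directions of an RBM. The function names `hidden_probs` and `visible_probs`, the `w[visible][hidden]` orientation, and the toy sizes are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, w, b_hid):
    """Hidden activation probabilities; w is indexed as w[visible][hidden]."""
    return [sigmoid(sum(v[i] * w[i][j] for i in range(len(v))) + b_hid[j])
            for j in range(len(b_hid))]

def visible_probs(h, w, b_vis):
    """Visible activation probabilities, reusing the same w in transposed order."""
    return [sigmoid(sum(h[j] * w[i][j] for j in range(len(h))) + b_vis[i])
            for i in range(len(b_vis))]

# Toy sizes: 3 visible and 2 hidden units (the example uses 784 and 1000).
w = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
b_vis = [0.0, 0.0, 0.0]  # plays the role of "b11"
b_hid = [0.0, 0.0]       # plays the role of "b12"

h = hidden_probs([1.0, 0.0, 1.0], w, b_hid)
v_recon = visible_probs(h, w, b_vis)
```

Both functions read the same `w` object, which is the effect that `share_from` is meant to achieve between `w1` and `w1_`; each layer keeps its own bias.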

 [Contrastive Divergence](train-one-batch.html#contrastive-divergence)
 is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
 Following Hinton's paper, we configure the [updating protocol](updater.html)
 as follows,

     # Updater Configuration
     updater{
       type: kSGD
       momentum: 0.2
       weight_decay: 0.0002
       learning_rate{
         base_lr: 0.1
         type: kFixed
       }
     }
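
For readers unfamiliar with Contrastive Divergence, the following is a rough sketch of a single CD-1 weight update on one example, using the momentum, weight decay and learning rate values from the updater above. It is plain Python, not SINGA's implementation: the function name `cd1_weight_step`, the exact form of the momentum update, and the omission of bias updates are all simplifying assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_weight_step(v0, w, b_vis, b_hid, vel, rng,
                    lr=0.1, momentum=0.2, weight_decay=0.0002):
    """Apply one CD-1 update to w (indexed w[visible][hidden]) in place."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    h0_prob = [sigmoid(sum(v0[i] * w[i][j] for i in range(n_vis)) + b_hid[j])
               for j in range(n_hid)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in h0_prob]
    # Negative phase: reconstruct the visible units, then recompute the hiddens.
    v1 = [sigmoid(sum(h0[j] * w[i][j] for j in range(n_hid)) + b_vis[i])
          for i in range(n_vis)]
    h1_prob = [sigmoid(sum(v1[i] * w[i][j] for i in range(n_vis)) + b_hid[j])
               for j in range(n_hid)]
    for i in range(n_vis):
        for j in range(n_hid):
            # Data statistics minus reconstruction statistics.
            grad = v0[i] * h0_prob[j] - v1[i] * h1_prob[j]
            vel[i][j] = momentum * vel[i][j] + lr * (grad - weight_decay * w[i][j])
            w[i][j] += vel[i][j]
    return v1

rng = random.Random(0)
w = [[0.0, 0.0] for _ in range(3)]    # toy 3 x 2 weight matrix
vel = [[0.0, 0.0] for _ in range(3)]  # momentum buffer, same shape as w
recon = cd1_weight_step([1.0, 0.0, 1.0], w, [0.0] * 3, [0.0] * 2, vel, rng)
```

Weights attached to active visible units move up and those attached to inactive units move down, which is the qualitative behaviour expected from the difference between data and reconstruction statistics.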

 Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
 configure the `workspace` field to specify a path for the checkpoint folder.
 For example, if we configure it as,

     cluster {
       workspace: "examples/rbm/rbm1/"
     }

 Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.

 ### RBM2
 <img src="../_static/images/example-rbm2.png" align="center" width="200px"/>
 <span><strong>Figure 2 - RBM2.</strong></span>

 Figure 2 shows the net structure of training RBM2.
 The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
 is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
 from RBM1.
 The neural net configuration is (with the data layer and parser layer omitted),

     layer{
       name: "Inner1"
       type: kInnerProduct
       srclayers:"mnist"
       innerproduct_conf{
         num_output: 1000
       }
       param{ name: "w1" }
       param{ name: "b12"}
     }

     layer{
       name: "Sigmoid1"
       type: kSigmoid
       srclayers:"Inner1"
     }

     layer{
       name: "RBMVis"
       type: kRBMVis
       srclayers:"Sigmoid1"
       srclayers:"RBMHid"
       rbm_conf{
         hdim: 500
       }
       param{
         name: "w2"
         ...
       }
       param{
         name: "b21"
         ...
       }
     }

     layer{
       name: "RBMHid"
       type: kRBMHid
       srclayers:"RBMVis"
       rbm_conf{
         hdim: 500
       }
       param{
         name: "w2_"
         share_from: "w2"
       }
       param{
         name: "b22"
         ...
       }
     }

 To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,

     checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
     cluster{
       workspace: "examples/rbm/rbm2"
     }

 The workspace is changed for checkpointing `w2`, `b21` and `b22` into
 *examples/rbm/rbm2/*.

 ### RBM3

 <img src="../_static/images/example-rbm3.png" align="center" width="200px"/>
 <span><strong>Figure 3 - RBM3.</strong></span>

 Figure 3 shows the net structure for training RBM3. In this model, a layer with
 250 units is added as the hidden layer of RBM3. The visible units of RBM3
 accept the output from the Sigmoid2 layer. Parameters of Inner1 and Inner2 are set to
 `w1,b12,w2,b22`, which can be loaded from the checkpoint file of RBM2,
 i.e., "examples/rbm/rbm2/".
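
The stacking can be pictured with a short sketch in plain Python (not SINGA code): the already-trained inner-product and sigmoid layers act as a fixed feature extractor whose output becomes the visible input of the RBM being trained. The helper names `inner_product_sigmoid` and `propagate`, the toy sizes and the weight values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inner_product_sigmoid(x, w, b):
    """One Inner + Sigmoid pair; w is indexed as w[input][output]."""
    return [sigmoid(sum(x[i] * w[i][j] for i in range(len(x))) + b[j])
            for j in range(len(b))]

def propagate(x, frozen_layers):
    """Pass x through each (w, b) pair loaded from earlier RBMs, in order."""
    for w, b in frozen_layers:
        x = inner_product_sigmoid(x, w, b)
    return x

# Toy sizes 4 -> 3 -> 2 standing in for 784 -> 1000 -> 500.
w1 = [[0.1, 0.0, -0.1], [0.2, 0.1, 0.0], [0.0, -0.2, 0.3], [0.1, 0.1, 0.1]]
b12 = [0.0, 0.0, 0.0]
w2 = [[0.2, -0.1], [0.0, 0.3], [-0.2, 0.1]]
b22 = [0.0, 0.0]

image = [1.0, 0.0, 1.0, 0.0]
rbm3_visible_input = propagate(image, [(w1, b12), (w2, b22)])
```

In this picture only the new RBM's parameters are learned; the parameters passed in `frozen_layers` come from the checkpoint and stay as loaded.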

 ### RBM4


 <img src="../_static/images/example-rbm4.png" align="center" width="200px"/>
 <span><strong>Figure 4 - RBM4.</strong></span>

 Figure 4 shows the net structure of training RBM4. It is similar to Figure 3,
 but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
 top RBM (RBM4) have stochastic real-valued states drawn from a unit variance
 Gaussian whose mean is determined by the input from the RBM's logistic visible
 units. So we add a `gaussian` field in the RBMHid layer to control the
 sampling distribution (Gaussian or Bernoulli). In addition, this
 RBM has a much smaller learning rate (0.001). The neural net configuration for
 RBM4 and the updating protocol are (with the data layer and parser
 layer omitted),

     # Updater Configuration
     updater{
       type: kSGD
       momentum: 0.9
       weight_decay: 0.0002
       learning_rate{
         base_lr: 0.001
         type: kFixed
       }
     }

     layer{
       name: "RBMVis"
       type: kRBMVis
       srclayers:"Sigmoid3"
       srclayers:"RBMHid"
       rbm_conf{
         hdim: 30
       }
       param{
         name: "w4"
         ...
       }
       param{
         name: "b41"
         ...
       }
     }

     layer{
       name: "RBMHid"
       type: kRBMHid
       srclayers:"RBMVis"
       rbm_conf{
         hdim: 30
         gaussian: true
       }
       param{
         name: "w4_"
         share_from: "w4"
       }
       param{
         name: "b42"
         ...
       }
     }
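
The difference between the two sampling modes can be sketched in plain Python as follows. This is an illustration of the idea described above, not SINGA's code: the function name `sample_hidden` and its arguments are hypothetical, and the `gaussian` flag here simply mirrors the role of the `gaussian` field.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(pre_activations, gaussian, rng):
    """Sample hidden states from the total input to each hidden unit."""
    if gaussian:
        # Real-valued states: unit-variance Gaussian centred on the input.
        return [rng.gauss(a, 1.0) for a in pre_activations]
    # Binary states: Bernoulli with probability sigmoid(input).
    return [1.0 if rng.random() < sigmoid(a) else 0.0 for a in pre_activations]

rng = random.Random(0)
binary_states = sample_hidden([2.0, -2.0, 0.0], gaussian=False, rng=rng)
real_states = sample_hidden([2.0, -2.0, 0.0], gaussian=True, rng=rng)
```

With `gaussian=False` every state is 0 or 1, while with `gaussian=True` the states are unbounded real numbers, which lets the 30-unit top layer carry a continuous low-dimensional code.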

 ### Auto-encoder
 In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
 networks that are initialized using the parameters from the previous 4 RBMs.

 <img src="../_static/images/example-autoencoder.png" align="center" width="500px"/>
 <span><strong>Figure 5 - Auto-Encoders.</strong></span>
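
As a quick sanity check on the unfolding, the layer sizes used by the four RBMs (784, 1000, 500, 250, 30) can be mirrored to list the shapes of the inner-product layers. The snippet below is only an illustrative sketch; the sequential `Inner1` to `Inner8` numbering for the middle layers is an assumption extrapolated from the layer names that appear in this example.

```python
# Sizes taken from the RBM configurations: input plus four hidden layers.
encoder_dims = [784, 1000, 500, 250, 30]
decoder_dims = encoder_dims[::-1]  # the decoder mirrors the encoder

def consecutive_pairs(dims):
    """Return (input size, output size) for each layer in a chain."""
    return list(zip(dims, dims[1:]))

inner_shapes = consecutive_pairs(encoder_dims) + consecutive_pairs(decoder_dims)
for index, (n_in, n_out) in enumerate(inner_shapes, start=1):
    print(f"Inner{index}: {n_in} -> {n_out}")
```

The list contains eight layers, and the last one maps 1000 units back to 784, matching the image dimensionality.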


 Figure 5 shows the neural net structure for training the auto-encoder.
 [Back propagation (kBP)](train-one-batch.html) is
 configured as the algorithm for `TrainOneBatch`. We use the same cluster
 configuration as for the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with
 a fixed learning rate.

     # Updater Configuration
     updater{
       type: kAdaGrad
       learning_rate{
         base_lr: 0.01
         type: kFixed
       }
     }
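
For reference, the textbook AdaGrad rule scales each parameter's step by the square root of its accumulated squared gradients. The sketch below is plain Python written for illustration; the function name `adagrad_update` and the small constant `eps` are assumptions, and SINGA's own updater may differ in details.

```python
import math

def adagrad_update(params, grads, accum, base_lr=0.01, eps=1e-8):
    """Update params in place; accum holds the running sum of squared gradients."""
    for k in range(len(params)):
        accum[k] += grads[k] ** 2
        params[k] -= base_lr * grads[k] / (math.sqrt(accum[k]) + eps)

params = [1.0, -1.0]
accum = [0.0, 0.0]
adagrad_update(params, [2.0, 0.0], accum)  # first step
adagrad_update(params, [2.0, 0.0], accum)  # second step is smaller
```

Even with a fixed `base_lr`, the effective step for a parameter shrinks as its gradient history grows, while parameters with zero gradient are left untouched.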


 According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
 we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
 configuration is (with some of the middle layers omitted),

     layer{ name: "data" }
     layer{ name:"mnist" }
     layer{
       name: "Inner1"
       param{ name: "w1" }
       param{ name: "b12" }
     }
     layer{ name: "Sigmoid1" }
     ...
     layer{
       name: "Inner8"
       innerproduct_conf{
         num_output: 784
         transpose: true
       }
       param{
         name: "w8"
         share_from: "w1"
       }
       param{ name: "b11" }
     }
     layer{ name: "Sigmoid8" }

     # Euclidean Loss Layer Configuration
     layer{
       name: "loss"
       type:kEuclideanLoss
       srclayers:"Sigmoid8"
       srclayers:"mnist"
     }
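
The role of `transpose: true` together with `share_from: "w1"` can be pictured with a toy one-layer auto-encoder in plain Python, in which the decoder reads the encoder's matrix in transposed order and the loss compares the reconstruction with the input. This is a sketch under assumptions: the real network has several layers between Inner1 and Inner8, the function names are hypothetical, and the loss is written as a plain sum of squared differences without any scaling that SINGA may apply.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(x, w1, b12):
    """Encoder side: w1 is indexed as w1[input][hidden]."""
    return [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x))) + b12[j])
            for j in range(len(b12))]

def decode(h, w1, b11):
    """Decoder side: the same w1 read in transposed order, with bias b11."""
    return [sigmoid(sum(h[j] * w1[i][j] for j in range(len(h))) + b11[i])
            for i in range(len(b11))]

def euclidean_loss(recon, target):
    """Sum of squared differences between reconstruction and target."""
    return sum((r - t) ** 2 for r, t in zip(recon, target))

x = [1.0, 0.0, 1.0]
w1 = [[0.0, 0.0] for _ in range(3)]  # untrained toy weights
recon = decode(encode(x, w1, [0.0, 0.0]), w1, [0.0, 0.0, 0.0])
loss = euclidean_loss(recon, x)
```

Feeding both the output of the last sigmoid layer and the original input into the loss, as the two `srclayers` entries do, is what makes the network learn to reconstruct its own input.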

 To load pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as,

     # Checkpoint Configuration
     checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
     checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
     checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
     checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"


 ## Visualization Results

 <div>
 <img src="../_static/images/rbm-weight.PNG" align="center" width="300px"/>

 <img src="../_static/images/rbm-feature.PNG" align="center" width="300px"/>
 <br/>
 <span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
 &nbsp;
 &nbsp;
 &nbsp;
 &nbsp;

 <span><strong>Figure 7 - Top layer features.</strong></span>
 </div>

 Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
 Gabor-like filters are learned. Figure 7 depicts the features extracted from
 the top layer of the auto-encoder, wherein one point represents one image.
 Different colors represent different digits. We can see that most images are
 well clustered according to the ground truth.