# Python Binding
---
The Python binding provides APIs for configuring a training job in the style of
[keras](http://keras.io/), including the configuration of the neural net, the
training algorithm, etc. It replaces the configuration file (e.g., *job.conf*) in
protobuf format, which is typically long and error-prone to prepare. In a later
version, we will add Python functions for interacting with the layer and neural
net objects, which will enable users to train and debug their models
interactively.
Here is the layout of the Python-related code:

```
SINGAROOT/tool/python
|-- pb2 (has job_pb2.py)
|-- singa
|   |-- model.py
|   |-- layer.py
|   |-- parameter.py
|   |-- initialization.py
|   |-- utils
|       |-- utility.py
|       |-- message.py
|-- examples
|   |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
|   |-- datasets
|       |-- cifar10.py
|       |-- mnist.py
```
## Compiling and running instructions

To use the Python APIs, users need to add the following arguments when compiling
SINGA:

```
./configure --enable-python --with-python=PYTHON_DIR
make
```

where PYTHON_DIR is the directory that contains Python.h.
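A quick way to find the directory containing Python.h for the interpreter you intend to use is to query the standard library `sysconfig` module (a general Python tip, not a SINGA feature):

```python
# Print the C header (include) directory of the running Python interpreter;
# this path can be passed as PYTHON_DIR to ./configure --with-python.
import sysconfig

print(sysconfig.get_paths()['include'])
```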
The training program is launched by

```
bin/singa-run.sh -exec <user_main.py>
```

where user_main.py creates the JobProto object and passes it to Driver::Train to
start the training. For example,

```
cd SINGAROOT
bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
```
## Examples
### MLP Example
This example uses the Python APIs to configure and train an MLP model over the MNIST
dataset. The configuration content is the same as that written in *SINGAROOT/examples/mnist/job.conf*.
```
X_train, X_test, workspace = mnist.load_data()
m = Sequential('mlp', sys.argv)
m.add(Dense(2500, init='uniform', activation='tanh'))
m.add(Dense(2000, init='uniform', activation='tanh'))
m.add(Dense(1500, init='uniform', activation='tanh'))
m.add(Dense(1000, init='uniform', activation='tanh'))
m.add(Dense(500, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
sgd = SGD(lr=0.001, lr_type='step')
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
```
### CNN Example
This example uses the Python APIs to configure and train a CNN model over the Cifar10
dataset. The configuration content is the same as that written in *SINGAROOT/examples/cifar10/job.conf*.
```
X_train, X_test, workspace = cifar10.load_data()
m = Sequential('cnn', sys.argv)
m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(64, 5, 1, 2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))
sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
topo = Cluster(workspace)
m.compile(updater=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
```
### RBM Example
This example uses the Python APIs to configure and train an RBM model over the MNIST
dataset. The configuration content is the same as that written in *SINGAROOT/examples/rbm\*.conf*.
```
rbmid = 3
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
m = Energy('rbm'+str(rbmid), sys.argv)
out_dim = [1000, 500, 250]
m.add(RBM(out_dim, w_std=0.1, b_wd=0))
sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)
m.fit(X_train, alg='cd', nb_epoch=6000)
```
### AutoEncoder Example
This example uses the Python APIs to configure and train an autoencoder model over
the MNIST dataset. The configuration content is the same as that written in
*SINGAROOT/examples/autoencoder.conf*.
```
rbmid = 4
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
m = Sequential('autoencoder', sys.argv)
hid_dim = [1000, 500, 250, 30]
m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))
agd = AdaGrad(lr=0.01)
topo = Cluster(workspace)
m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
m.fit(X_train, alg='bp', nb_epoch=12200)
```
### To run SINGA on GPU
Users need to assign a list of GPU IDs to the `device` field in fit() or evaluate().
The number of GPUs must equal the number of workers configured in the cluster
topology.
```
gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
```
### TIPS
Hidden layers for MLP can be configured as
```
for n in [2500, 2000, 1500, 1000, 500]:
    m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
```
The activation layer can be specified separately:
```
m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))
```
Users can also explicitly specify the hyper-parameters of the weights and biases:
```
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
```
Similarly, shared Parameter objects can be reused and updated across layers, e.g.,
for the CNN example:
```
parw = Parameter(init='gauss', std=0.0001)
parb = Parameter(init='const', value=0)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
parw.update(std=0.01)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
```
Data can be added in this way,
```
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)                  # Data layer for training is added
m.evaluate(X_test, ...)              # Data layer for testing is added
```
or this way,
```
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)                       # explicitly add Data layer
m.add(X_test)                        # explicitly add Data layer
```
or with an explicitly configured Store,
```
store = Store(path='train.bin', batch_size=64, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store))  # Data layer is added
store = Store(path='test.bin', batch_size=100, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))   # Data layer is added
```
### Cases to run SINGA
(1) Run SINGA for training
```
m.fit(X_train, nb_epoch=1000)
```
(2) Run SINGA for training and validation
```
m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
```
(3) Run SINGA for test while training
```
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)
```
(4) Run SINGA for test only
Assuming a checkpoint exists after training:
```
result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
```
## Implementation Details
### Layer class
The following layer types inherit from the base Layer class:
* Data
* Dense
* Activation
* Convolution2D
* MaxPooling2D
* AvgPooling2D
* LRN2D
* Dropout
* RBM
* Autoencoder
### Model class
The Model class has `jobconf` (a JobProto) and `layers` (a list of layers). It has
2 subclasses: the Sequential model and the Energy model.
Methods in the Model class
* add
    * adds a Layer to the Model
* compile
    * sets the Updater (i.e., optimizer) and Cluster (i.e., topology) components
* fit
    * sets the training data and parameter values for the training
    * (optional) sets the validation data and parameter values
    * sets the Train_one_batch component
    * specify the `with_test` field if a user wants to run SINGA with test data simultaneously
    * [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.
* evaluate
    * sets the testing data and parameter values for the testing
    * specify the `checkpoint_path` field if a user wants to run SINGA only for testing
    * [TODO] receive test results, e.g., accuracy, loss, ppl, etc.
### Results
fit() and evaluate() return the train/test results as a dictionary containing
* [key]: step number
* [value]: a list of dictionaries with
    * 'acc' for accuracy
    * 'loss' for loss
    * 'ppl' for perplexity
    * 'se' for squared error
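Given this structure, the returned results can be post-processed with plain Python. Below is a minimal sketch; the `results` dictionary and its numbers are made up for illustration, and only its shape follows the description above:

```python
# Hypothetical return value of fit()/evaluate():
# step number -> list of metric dictionaries
results = {
    100: [{'acc': 0.52, 'loss': 1.92}],
    200: [{'acc': 0.71, 'loss': 1.10}],
}

# Extract the loss curve ordered by step number
loss_curve = [(step, d['loss'])
              for step in sorted(results)
              for d in results[step]]
print(loss_curve)  # [(100, 1.92), (200, 1.1)]
```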
### Parameter class
Users need to set the parameter fields and initial values. For example,
* Parameter (fields in Param proto)
    * lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters
    * wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters
* Parameter initialization (fields in ParamGen proto)
    * init = (string) // one of 'uniform', 'constant', 'gaussian'
    * high = (float) // for 'uniform'
    * low = (float) // for 'uniform'
    * value = (float) // for 'constant'
    * mean = (float) // for 'gaussian'
    * std = (float) // for 'gaussian'
* Weight (`w_param`) is 'gaussian' with mean=0 and std=0.01 by default
* Bias (`b_param`) is 'constant' with value=0 by default
* How to update the parameter fields
    * to update the weight, put `w_` in front of the field name
    * to update the bias, put `b_` in front of the field name
There are several ways to set Parameter values:
```
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
```
```
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
```
```
parw = Parameter(init='constant', value=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
```
### Other classes
* Store
* Algorithm
* Updater
* SGD
* AdaGrad
* Cluster