Python Helper

Users can construct a model and run SINGA using Python. Specifically, the Python Helper enables users to generate a JobProto for the model and run Driver::Train or Driver::Test from Python. The Python Helper tool lives in SINGA_ROOT/tool/python, which contains the following directories.

SINGA_ROOT/tool/python
|-- pb2 (has job_pb2.py)
|-- singa 
    |-- model.py 
    |-- layer.py 
    |-- parameter.py 
    |-- initialization.py 
    |-- utils 
        |-- utility.py 
        |-- message.py 
|-- examples 
    |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc. 
    |-- datasets 
        |-- cifar10.py 
        |-- mnist.py 

## 1. Basic User Guide

To use the Python Helper features, users need to add the following option when building SINGA:

./configure --enable-python --with-python=PYTHON_DIR

where PYTHON_DIR contains Python.h.
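For example, on a machine whose Python headers live under /usr/include/python2.7 (a hypothetical path; substitute the directory that holds Python.h on your system):

./configure --enable-python --with-python=/usr/include/python2.7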

(a) How to Run

bin/singa-run.sh -exec user_main.py

The Python script, e.g., user_main.py, creates the JobProto object and passes it to Driver::Train or Driver::Test.
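For instance, a minimal user_main.py might look like the sketch below. It uses only the helper API described in this guide (Sequential, Dense, SGD, Cluster); the import paths are assumptions that follow the directory layout shown above.

import sys
from singa.model import *            # Sequential, Dense, SGD, Cluster, etc.
from examples.datasets import mnist  # dataset helper under tool/python/examples

X_train, X_test, workspace = mnist.load_data()

m = Sequential('mlp', sys.argv)      # sys.argv forwards job arguments to the driver
m.add(Dense(10, init='uniform', activation='softmax'))

m.compile(loss='categorical_crossentropy',
          optimizer=SGD(lr=0.001, lr_type='step'),
          cluster=Cluster(workspace))
m.fit(X_train, nb_epoch=100)         # generates the JobProto and calls Driver::Train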

To run the CIFAR10 example:

cd SINGA_ROOT
bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py 

To run the MNIST example:

cd SINGA_ROOT
bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py 

(b) Class Description

Layer class

The following classes configure field values for a particular layer and generate its LayerProto; a short usage sketch follows the lists below.

  • Data for a data layer.
  • Dense for an inner-product (fully connected) layer.
  • Activation for an activation layer.
  • Convolution2D for a convolution layer.
  • MaxPooling2D for a max pooling layer.
  • AvgPooling2D for an average pooling layer.
  • LRN2D for a normalization (or local response normalization) layer.
  • Dropout for a dropout layer.

In addition, the following classes generate multiple layers for particular models.

  • RBM for constructing the layers of an RBM.
  • Autoencoder for constructing the layers of an autoencoder.
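For instance, assuming m is a Sequential model as in Section 2, each call below configures one layer and appends its LayerProto to the model (argument values are borrowed from the examples in Section 2):

m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))  # convolution layer
m.add(MaxPooling2D(pool_size=(3,3), stride=2))           # max-pooling layer
m.add(Activation('relu'))                                # activation layer
m.add(LRN2D(3, alpha=0.00005, beta=0.75))                # normalization layer
m.add(Dense(10, init='uniform', activation='softmax'))   # inner-product layer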

Model class

The Model class holds jobconf (a JobProto) and layers (a list of layers); a usage sketch follows the method list below.

Methods in Model class

  • add to add a Layer to the model

    • two Model subclasses are provided: the Sequential model and the Energy model
  • compile to configure an optimizer and topology for training.

    • set Updater (i.e., optimizer) and Cluster (i.e., topology) components
  • fit to configure field values for training.

    • set Training data and parameter values for the training
      • (optional) set Validation data and parameter values
    • set the Train_one_batch component
    • set the with_test argument to True to run testing with test data while training
    • return train/validation results, e.g., accuracy, loss, ppl (perplexity), etc.
  • evaluate to configure field values for testing.

    • set Testing data and parameter values for the test
    • specify the checkpoint_path field to run SINGA for testing only
    • return test results, e.g., accuracy, loss, ppl (perplexity), etc.
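Putting the methods together, a typical lifecycle is add, then compile, then fit, and finally evaluate. A sketch with argument values borrowed from the examples in Section 2:

m = Sequential('mlp', sys.argv)                 # Sequential subclass of Model
m.add(Dense(10, init='uniform', activation='softmax'))

sgd = SGD(lr=0.001, lr_type='step')             # Updater (optimizer)
topo = Cluster(workspace)                       # Cluster (topology)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)

m.fit(X_train, nb_epoch=1000, with_test=True)   # training, with testing enabled
result = m.evaluate(X_test, batch_size=100, test_steps=10)  # test results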

(c) To Run SINGA on GPU

Users need to assign a list of GPU IDs to the device field in fit() or evaluate().

For example,

gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
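Similarly, to use more than one GPU, list several device IDs (assuming those devices exist on the node):

gpu_id = [0, 1]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)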

(d) How to set/update parameter values

Users may need to set/update parameter field values.

  • Parameter fields for both Weight and Bias (i.e., fields of ParamProto)

    • lr = (float) : learning rate multiplier, used to scale the learning rate when updating parameters.
    • wd = (float) : weight decay multiplier, used to scale the weight decay when updating parameters.
  • Parameter initialization (fields of ParamGenProto)

    • init = (string) : one of the types, ‘uniform’, ‘constant’, ‘gaussian’
    • scale = (float) : for ‘uniform’, it is used to set low=-scale and high=+scale
    • high = (float) : for ‘uniform’
    • low = (float) : for ‘uniform’
    • value = (float) : for ‘constant’
    • mean = (float) : for ‘gaussian’
    • std = (float) : for ‘gaussian’
  • Weight (w_param) is set as ‘gaussian’ with mean=0 and std=0.01 by default.

  • Bias (b_param) is set as ‘constant’ with value=0 by default.

  • To set/update the parameter fields of either Weight or Bias:

    • for Weight, prefix the field name with w_
    • for Bias, prefix the field name with b_

    For example,

    	m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
    

(e) Results

fit() and evaluate() return training/test results, i.e., a dictionary whose entries are as follows (see the example after this list).

  • [key]: step number
  • [value]: a list of dictionaries
    • ‘acc’ for accuracy
    • ‘loss’ for loss
    • ‘ppl’ for perplexity
    • ‘se’ for squared error
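For example, the returned dictionary can be traversed as follows; this is a sketch assuming the key/value layout above, and which metrics appear depends on the model configuration:

result = m.evaluate(X_test, batch_size=100, test_steps=10)
for step in sorted(result):          # [key]: step number
    for record in result[step]:      # [value]: a list of dictionaries
        print('step %s: acc=%s loss=%s' % (step, record.get('acc'), record.get('loss')))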

## 2. Examples

MLP example (to generate job.conf for MNIST)

X_train, X_test, workspace = mnist.load_data()

m = Sequential('mlp', sys.argv)  

m.add(Dense(2500, init='uniform', activation='tanh'))
m.add(Dense(2000, init='uniform', activation='tanh'))
m.add(Dense(1500, init='uniform', activation='tanh'))
m.add(Dense(1000, init='uniform', activation='tanh'))
m.add(Dense(500,  init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax')) 

sgd = SGD(lr=0.001, lr_type='step')
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)

CNN example (to generate job.conf for cifar10)

X_train, X_test, workspace = cifar10.load_data()

m = Sequential('cnn', sys.argv)

m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))

sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=1000, test_steps=30, test_freq=300)

RBM Example

rbmid = 3
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
m = Energy('rbm'+str(rbmid), sys.argv)   # Energy model for the RBM

out_dim = [1000, 500, 250]
m.add(RBM(out_dim, w_std=0.1, b_wd=0))

sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)
m.fit(X_train, alg='cd', nb_epoch=6000)  # train with contrastive divergence

AutoEncoder Example

rbmid = 4
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
m = Sequential('autoencoder', sys.argv)

hid_dim = [1000, 500, 250, 30]
m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))

agd = AdaGrad(lr=0.01)
topo = Cluster(workspace)
m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
m.fit(X_train, alg='bp', nb_epoch=12200)  # train with back-propagation

## 3. Advanced User Guide

Parameter class

Users can explicitly set/update parameters. There are several ways to set Parameter values, for example:

# (1) construct Parameter objects and pass them to a layer
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))

# (2) set the fields directly with the w_/b_ prefixes
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))

# (3) combine a Parameter object with prefixed fields
parw = Parameter(init='constant', value=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))

Data layer

There are alternative ways to add a Data layer, as shown below. In addition, users can write their own load_data method, following cifar10.py and mnist.py in examples/datasets.

# (1) Data layers are added implicitly by fit() and evaluate()
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)                  # Data layer for training is added
m.evaluate(X_test, ...)              # Data layer for testing is added

# (2) Data layers are added explicitly
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)                       # explicitly add Data layer
m.add(X_test)                        # explicitly add Data layer

# (3) Data layers are constructed from an explicit Store configuration
store = Store(path='train.bin', batch_size=64, ...)        # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store)) # Data layer is added
store = Store(path='test.bin', batch_size=100, ...)        # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))  # Data layer is added

Other Tips

Hidden layers for the MLP can be written as:

for n in [2500, 2000, 1500, 1000, 500]:
  m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))

The Activation layer can be specified separately:

m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))

Users can explicitly specify weights and biases, and their values.

For example, for the MLP:

par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))

For example, for Cifar10:

parw = Parameter(init='gaussian', std=0.0001)
parb = Parameter(init='constant', value=0)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

parw.update(std=0.01)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))

Different Cases to Run SINGA

(1) Run SINGA for training

m.fit(X_train, nb_epoch=1000)

(2) Run SINGA for training and validation

m.fit(X_train, validate_data=X_valid, nb_epoch=1000)

(3) Run SINGA for testing while training

m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)

(4) Run SINGA for testing only, assuming a checkpoint exists after training

result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')