Users can construct a model and run SINGA using Python. Specifically, the Python Helper enables users to generate the JobProto for the model and run Driver::Train or Driver::Test from Python. The Python Helper tool can be found in SINGA_ROOT/tool/python, which consists of the following directories:

```
SINGA_ROOT/tool/python
|-- pb2 (has job_pb2.py)
|-- singa
    |-- model.py
    |-- layer.py
    |-- parameter.py
    |-- initialization.py
    |-- utils
        |-- utility.py
        |-- message.py
|-- examples
    |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
    |-- datasets
        |-- cifar10.py
        |-- mnist.py
```
## 1. Basic User Guide
In order to use the Python Helper features, users need to add the following option when building SINGA:

```
./configure --enable-python --with-python=PYTHON_DIR
```

where PYTHON_DIR contains Python.h.
Then, users can run SINGA with a Python script:

```
bin/singa-run.sh -exec user_main.py
```

The Python code, e.g., user_main.py, creates the JobProto object and passes it to Driver::Train or Driver::Test.
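For concreteness, here is a minimal sketch of what such a user_main.py might look like, assembled from the API calls shown in the examples later in this guide; the import lines, layer sizes, and hyperparameters are illustrative assumptions, not prescribed values:

```python
import sys
from singa.model import *             # assumed import, following tool/python/singa/model.py
from examples.datasets import mnist   # dataset helper shipped in examples/datasets

X_train, X_test, workspace = mnist.load_data()

m = Sequential('mlp', sys.argv)                       # model name and command-line args

m.add(Dense(100, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))

sgd = SGD(lr=0.001, lr_type='step')                   # optimizer (Updater)
topo = Cluster(workspace)                             # topology (Cluster)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)

m.fit(X_train, nb_epoch=1000, with_test=True)         # generates JobProto, runs Driver::Train
result = m.evaluate(X_test, batch_size=100, test_steps=10)
```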
For running the CIFAR10 example:

```
cd SINGA_ROOT
bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
```
For running the MNIST example:

```
cd SINGA_ROOT
bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
```
The following classes configure field values for a particular layer and generate its LayerProto:

- `Data` for a data layer.
- `Dense` for an innerproduct layer.
- `Activation` for an activation layer.
- `Convolution2D` for a convolution layer.
- `MaxPooling2D` for a max pooling layer.
- `AvgPooling2D` for an average pooling layer.
- `LRN2D` for a normalization (or local response normalization) layer.
- `Dropout` for a dropout layer.

In addition, the following classes generate multiple layers for particular models:

- `RBM` for constructing the layers of an RBM.
- `Autoencoder` for constructing the layers of an Autoencoder.

The Model class has `jobconf` (a JobProto) and `layers` (a layer list).
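As a hedged illustration, the sketch below combines several of these layer classes in one model, following the same call patterns as the CIFAR10 example later in this guide; the argument values, and the reading of Convolution2D's positional arguments as number of filters, kernel size, stride, and padding, are assumptions for illustration:

```python
m = Sequential('sketch', sys.argv)
m.add(Convolution2D(32, 5, 1, 2))               # assumed: nb_filter, kernel, stride, pad
m.add(Activation('relu'))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Dropout(0.5))                             # assumed: dropout ratio argument
m.add(Dense(10, activation='softmax'))
```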
Methods in the Model class:

- `add` to add a Layer into the model
    - available in both the `Sequential` model and the `Energy` model
- `compile` to configure an optimizer and topology for training
    - configures the `Updater` (i.e., optimizer) and `Cluster` (i.e., topology) components
- `fit` to configure field values for training
    - configures the `Train_one_batch` component
    - set the `with_test` argument to `True` if users want to run SINGA with test data simultaneously
- `evaluate` to configure field values for test
    - set the `checkpoint_path` field if users want to run SINGA only for test

To run on GPUs, users need to set a list of gpu ids in the `device` field of fit() or evaluate().
For example:

```python
gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
```
Users may need to set/update parameter field values.

Parameter fields for both Weight and Bias (i.e., fields of ParamProto):

- `lr` = (float) : learning rate multiplier, used to scale the learning rate when updating parameters.
- `wd` = (float) : weight decay multiplier, used to scale the weight decay when updating parameters.

Parameter initialization (fields of ParamGenProto):

- `init` = (string) : one of the types, 'uniform', 'constant', 'gaussian'
- `scale` = (float) : for 'uniform', it is used to set `low` = -scale and `high` = +scale
- `high` = (float) : for 'uniform'
- `low` = (float) : for 'uniform'
- `value` = (float) : for 'constant'
- `mean` = (float) : for 'gaussian'
- `std` = (float) : for 'gaussian'

By default, Weight (`w_param`) is initialized as 'gaussian' with `mean`=0 and `std`=0.01, and Bias (`b_param`) is initialized as 'constant' with `value`=0.

In order to set/update the parameter fields of either Weight or Bias, put

- `w_` in front of a field name for Weight
- `b_` in front of a field name for Bias

For example:
```python
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
```
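As a hedged illustration of the `scale` field: with init='uniform' and scale=0.05, the generator draws from `low` = -0.05 to `high` = +0.05. The same pattern appears in the MLP parameter example later in this guide:

```python
# 'uniform' initialization over [-0.05, +0.05], specified via scale alone
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
```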
fit() and evaluate() return training/test results, i.e., a dictionary containing the result values.
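A hedged sketch of consuming the returned dictionary; the exact keys are not specified in this guide, so the loop below is deliberately generic:

```python
result = m.evaluate(X_test, batch_size=100, test_steps=10)
for key, value in result.items():   # iterate whatever result entries SINGA reported
    print(key, value)
```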
MLP example (see `examples/mnist_mlp.py`):

```python
X_train, X_test, workspace = mnist.load_data()

m = Sequential('mlp', sys.argv)

m.add(Dense(2500, init='uniform', activation='tanh'))
m.add(Dense(2000, init='uniform', activation='tanh'))
m.add(Dense(1500, init='uniform', activation='tanh'))
m.add(Dense(1000, init='uniform', activation='tanh'))
m.add(Dense(500, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))

sgd = SGD(lr=0.001, lr_type='step')
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)

m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
```
CNN example (see `examples/cifar10_cnn.py`):

```python
X_train, X_test, workspace = cifar10.load_data()

m = Sequential('cnn', sys.argv)

m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))

sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
topo = Cluster(workspace)
m.compile(updater=sgd, cluster=topo)

m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
```
RBM example:

```python
rbmid = 3
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)

m = Energy('rbm'+str(rbmid), sys.argv)

out_dim = [1000, 500, 250]
m.add(RBM(out_dim, w_std=0.1, b_wd=0))

sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)

m.fit(X_train, alg='cd', nb_epoch=6000)
```
Autoencoder example (see `examples/mnist_ae.py`):

```python
rbmid = 4
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)

m = Sequential('autoencoder', sys.argv)

hid_dim = [1000, 500, 250, 30]
m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))

agd = AdaGrad(lr=0.01)
topo = Cluster(workspace)
m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)

m.fit(X_train, alg='bp', nb_epoch=12200)
```
Users can explicitly set/update parameters. There are several ways to set Parameter values.

(1) Using Parameter objects:

```python
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
```

(2) Using `w_`/`b_` prefixed fields:

```python
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
```

(3) Combining both, where explicitly specified fields update the Parameter object:

```python
parw = Parameter(init='constant', value=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
```
There are alternative ways to add a Data layer. In addition, users can write their own `load_data` method; see `cifar10.py` and `mnist.py` in `examples/datasets` for reference. A sketch of such a method is given after the examples below.

(1) Data layers are added automatically by fit() and evaluate():

```python
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)                  # Data layer for training is added
m.evaluate(X_test, ...)              # Data layer for testing is added
```

(2) Data layers are added explicitly with add():

```python
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)                       # explicitly add Data layer
m.add(X_test)                        # explicitly add Data layer
```

(3) Data layers are constructed with an explicit Store configuration:

```python
store = Store(path='train.bin', batch_size=64, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store))  # Data layer is added
store = Store(path='test.bin', batch_size=100, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))   # Data layer is added
```
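As referenced above, here is a minimal sketch of a custom `load_data` method, modeled on the Store-based pattern in example (3); the file paths, batch sizes, and workspace value are hypothetical placeholders:

```python
def load_data(workspace='examples/datasets/mydata'):  # hypothetical workspace path
    # training data: parameter values are set explicitly via Store
    store_train = Store(path='train.bin', batch_size=64)
    X_train = Data(load='recordinput', phase='train', conf=store_train)
    # test data
    store_test = Store(path='test.bin', batch_size=100)
    X_test = Data(load='recordinput', phase='test', conf=store_test)
    return X_train, X_test, workspace
```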
Hidden layers for the MLP can be written as:

```python
for n in [2500, 2000, 1500, 1000, 500]:
    m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
```
The Activation layer can be specified separately:

```python
m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))
```
Users can explicitly specify weight and bias, and their values. For example, for the MLP:

```python
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
```
For the CIFAR10 example:

```python
parw = Parameter(init='gaussian', std=0.0001)
parb = Parameter(init='constant', value=0)

m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

parw.update(std=0.01)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
```
(1) Run SINGA for training:

```python
m.fit(X_train, nb_epoch=1000)
```
(2) Run SINGA for training and validation:

```python
m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
```
(3) Run SINGA for test while training:

```python
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)
```
(4) Run SINGA for test only, assuming a checkpoint exists after training:

```python
result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
```