# Train-One-Batch
---
For each SGD iteration, every worker calls the `TrainOneBatch` function to
compute gradients of parameters associated with local layers (i.e., layers
dispatched to it). SINGA has implemented two algorithms for the
`TrainOneBatch` function. Users select the corresponding algorithm for
their model in the configuration.
## Basic user guide
### Back-propagation
[BP algorithm](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) is used for
computing gradients of feed-forward models, e.g., [CNN](cnn.html)
and [MLP](mlp.html), as well as [RNN](rnn.html) models in SINGA.

    # in job.conf
    alg: kBP

To use the BP algorithm for the `TrainOneBatch` function, users simply
configure the `alg` field with `kBP`. If a neural net contains user-defined
layers, these layers must be implemented to be consistent with the
implementation of the BP algorithm in SINGA (see below).
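Concretely, consistency with BP means a user-defined layer's `ComputeFeature`
produces the layer's feature from its source layers, and its `ComputeGradient`
fills the gradients of its `Param` objects and of its source layers' features.
A rough sketch is shown below; the class name, members, and exact signatures
are illustrative rather than the precise `Layer` API:

    // hypothetical layer; member names and signatures are illustrative only
    class MyHiddenLayer : public Layer {
     public:
      void ComputeFeature(int flag, Metric* perf) override {
        // data_ = f(W * src.data_ + b), reading features from the source layers
      }
      void ComputeGradient(int flag) override {
        // fill dL/dW and dL/db into the Param objects' grad_ blobs, and
        // write dL/d(src.data_) into the source layers' grad_ blobs
      }
    };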
### Contrastive Divergence
[CD algorithm](http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) is used for
computing gradients of energy models like RBM.

    # job.conf
    alg: kCD
    cd_conf {
      cd_k: 2
    }

To use the CD algorithm for the `TrainOneBatch` function, users simply configure
the `alg` field with `kCD`. Users can also configure the number of Gibbs sampling
steps in the CD algorithm through the `cd_k` field. By default, it is set to 1.
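For example, the following configuration runs CD with a single Gibbs sampling
step per iteration, since `cd_conf` is omitted and `cd_k` takes its default value:

    # in job.conf -- cd_conf omitted, so cd_k defaults to 1
    alg: kCD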
## Advanced user guide
### Implementation of BP
The BP algorithm is implemented in SINGA following the pseudo code below:

    BPTrainOnebatch(step, net) {
      // forward propagate
      foreach layer in net.local_layers() {
        if IsBridgeDstLayer(layer)
          recv data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update
        layer.ComputeFeature(kForward)
        if IsBridgeSrcLayer(layer)
          send layer.data_ to dst layer
      }
      // backward propagate
      foreach layer in reverse(net.local_layers) {
        if IsBridgeSrcLayer(layer)
          recv gradient from the dst layer (i.e., BridgeDstLayer)
          recv response from servers for last update
        layer.ComputeGradient()
        foreach param in layer.params()
          Update(step, param) // send param.grad_ to servers
        if IsBridgeDstLayer(layer)
          send layer.grad_ to src layer
      }
    }

It forward-propagates features through all local layers (locality can be checked
by comparing the layer partition ID with the worker ID) and back-propagates
gradients in the reverse order.
[BridgeSrcLayer](layer.html#bridgesrclayer--bridgedstlayer)
(resp. `BridgeDstLayer`) will be blocked until the feature (resp.
gradient) from the source (resp. destination) layer comes. Parameter gradients
are sent to servers via `Update` function. Updated parameters are collected via
`Collect` function, which will be blocked until the parameter is updated.
[Param](param.html) objects have versions, which can be used to
check whether the `Param` objects have been updated or not.
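For intuition, here is a rough sketch of the blocking behaviour of `Collect`,
assuming the servers bump a `Param`'s version each time they apply an update
(the `version()` accessor and the exact blocking condition are assumptions, not
the verbatim SINGA implementation):

    // sketch only: block until the parameter reflects the update for `step`
    // (hypothetical helper; the real logic lives inside Collect)
    void WaitForParam(Param* param, int step) {
      while (param->version() < step) {  // version() assumed to count update steps
        // receive and apply responses from the parameter servers ...
      }
    }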
Since RNN models are unrolled into feed-forward models, users need to implement
the forward propagation in the recurrent layer's `ComputeFeature` function,
and implement the backward propagation in the recurrent layer's `ComputeGradient`
function. As a result, the whole `TrainOneBatch` runs the
[back-propagation through time (BPTT)](https://en.wikipedia.org/wiki/Backpropagation_through_time) algorithm.
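A rough sketch of such a recurrent layer is shown below; the class name and
member layout are hypothetical, and the method signatures follow the pseudo code
above rather than the exact `Layer` base class:

    // hypothetical recurrent layer; names and signatures are illustrative only
    class SimpleRecurrentLayer : public Layer {
     public:
      void ComputeFeature(int flag, Metric* perf) override {
        // unrolled forward pass: for t = 0 .. window - 1,
        //   h[t] = f(W_x * x[t] + W_h * h[t-1])
        // one ComputeFeature call thus covers the whole unrolled window
      }
      void ComputeGradient(int flag) override {
        // BPTT backward pass: for t = window - 1 .. 0, accumulate dL/dW_x and
        // dL/dW_h across time steps and propagate dL/dh[t-1] to the previous step
      }
    };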
### Implementation of CD
The CD algorithm is implemented in SINGA following the pseudo code below:

    CDTrainOneBatch(step, net) {
      # positive phase
      foreach layer in net.local_layers()
        if IsBridgeDstLayer(layer)
          recv positive phase data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update
        layer.ComputeFeature(kPositive)
        if IsBridgeSrcLayer(layer)
          send positive phase data to dst layer
      # negative phase
      foreach gibbs in [0...cd_conf.cd_k]
        foreach layer in net.local_layers()
          if IsBridgeDstLayer(layer)
            recv negative phase data from the src layer (i.e., BridgeSrcLayer)
          layer.ComputeFeature(kNegative)
          if IsBridgeSrcLayer(layer)
            send negative phase data to dst layer
      # gradient computation and update
      foreach layer in net.local_layers()
        layer.ComputeGradient()
        foreach param in layer.params()
          Update(step, param)
    }

Parameter gradients are computed after the positive phase and negative phase.
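For an RBM, the resulting weight gradient contrasts statistics from the two
phases; this is the standard CD-k approximation of the log-likelihood gradient,
shown here for reference:

    d log p(v) / d w_ij  ≈  <v_i * h_j>_data  -  <v_i * h_j>_k

where `<.>_data` is computed in the positive phase and `<.>_k` after `cd_k`
Gibbs sampling steps in the negative phase.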
### Implementing a new algorithm
SINGA implements BP and CD by creating two subclasses of
the [Worker](../api/classsinga_1_1Worker.html) class:
[BPWorker](../api/classsinga_1_1BPWorker.html)'s `TrainOneBatch` function implements the BP
algorithm; [CDWorker](../api/classsinga_1_1CDWorker.html)'s `TrainOneBatch` function implements the CD
algorithm. To implement a new algorithm for the `TrainOneBatch` function, users
need to create a new subclass of the `Worker`, e.g.,

    class FooWorker : public Worker {
      void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
      void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
    };

The `FooWorker` must implement the above two functions for training one
mini-batch and testing one mini-batch. The `perf` argument is for collecting
training or testing performance, e.g., the objective loss or accuracy. It is
passed to the `ComputeFeature` function of each layer.
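For example, a `TrainOneBatch` body for a BP-like algorithm might follow the
structure of `BPTrainOnebatch` above; the sketch below reuses the names from
that pseudo code (`local_layers`, `params`, `Collect`, `Update`), which may not
match the real C++ API exactly:

    // sketch only; mirrors the BP pseudo code, not the exact SINGA API
    void FooWorker::TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) {
      for (auto* layer : net->local_layers()) {
        for (auto* param : layer->params())
          Collect(param);                        // block until param is up to date
        layer->ComputeFeature(kForward, perf);   // perf accumulates loss/accuracy
      }
      // ...then traverse the layers in reverse order, calling ComputeGradient()
      // and Update(step, param), exactly as in BPTrainOnebatch above
    }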
Users can define configuration fields for the new algorithm,

    # in user.proto
    message FooWorkerProto {
      optional int32 b = 1;
    }

    extend JobProto {
      optional FooWorkerProto foo_conf = 101;
    }

    # in job.proto
    message JobProto {
      ...
      extensions 101 to max;
    }

This is similar to [adding configuration fields for a new layer](layer.html#implementing-a-new-layer-subclass).
To use `FooWorker`, users need to register it in the [main.cc](programming-guide.html)
and configure the `alg` and `foo_conf` fields,

    # in main.cc
    const int kFoo = 3;  // worker ID, must be different from those of CDWorker and BPWorker
    driver.RegisterWorker<FooWorker>(kFoo);

    # in job.conf
    ...
    alg: 3
    [foo_conf] {
      b: 4
    }
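
Inside `FooWorker` (e.g., in its `TrainOneBatch`), the custom configuration can
then be read back through protobuf's standard extension accessor; how the worker
obtains its `JobProto` (the `job_conf()` call below) is an assumption:

    // sketch: reading the foo_conf extension; job_conf() is an assumed accessor
    const FooWorkerProto& conf = job_conf().GetExtension(foo_conf);
    int b = conf.b();  // the value configured as "b: 4" in job.conf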