# Updater
---
Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html)
instance that updates parameters based on gradients.
In this page, the *Basic user guide* describes the configuration of an updater.
The *Advanced user guide* presents details on how to implement a new updater and a new
learning rate changing method.
## Basic user guide
There are many different parameter updating protocols (i.e., subclasses of
`Updater`). They share some configuration fields like
* `type`, an integer for identifying an updater;
* `learning_rate`, configuration for the
  [LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate;
* `weight_decay`, the coefficient for [L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization);
* `momentum`, the coefficient for [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).
If you are not familiar with the above terms, you can find their meanings in
[this page by Karpathy](http://cs231n.github.io/neural-networks-3/#update).
### Configuration of built-in updater classes
#### Updater
The base `Updater` implements the [vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd).
Its configuration type is `kSGD`.
Users need to configure at least the `learning_rate` field.
`momentum` and `weight_decay` are optional fields.

    updater {
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }
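For reference, the standard SGD update with momentum and weight decay is
roughly the following (a sketch; SINGA may differ in details such as where
weight decay is applied):

    grad = grad + weight_decay * param
    v = momentum * v - lr * grad
    param = param + v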
#### AdaGradUpdater
It inherits the base `Updater` to implement the
[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm.
Its type is `kAdaGrad`.
`AdaGradUpdater` is configured similarly to `Updater` except
that `momentum` is not used.
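For reference, the standard AdaGrad rule accumulates squared gradients and
scales each step per parameter (a sketch; the small constant epsilon guards
against division by zero and may be handled differently in SINGA):

    history = history + grad * grad
    param = param - lr * grad / (sqrt(history) + epsilon)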
#### NesterovUpdater
It inherits the base `Updater` to implement the
[Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating protocol.
Its type is `kNesterov`.
`learning_rate` and `momentum` must be configured. `weight_decay` is an
optional configuration field.
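For reference, the Nesterov momentum update of section 3.5 in the cited paper
can be written as (a sketch of the standard formulation):

    v_prev = v
    v = momentum * v - lr * grad
    param = param - momentum * v_prev + (1 + momentum) * v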
#### RMSPropUpdater
It inherits the base `Updater` to implement the
[RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by
[Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf) (slide 29).
Its type is `kRMSProp`.

    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float  # in [0, 1]
      }
    }
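For reference, RMSProp keeps a decaying average of squared gradients, with
`rho` controlling the decay (a sketch; the epsilon term may be handled
differently in SINGA):

    cache = rho * cache + (1 - rho) * grad * grad
    param = param - lr * grad / (sqrt(cache) + epsilon)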
#### AdaDeltaUpdater
It inherits the base `Updater` to implement the
[AdaDelta](http://arxiv.org/abs/1212.5701) updating algorithm.
Its type is `kAdaDelta`.

    updater {
      type: kAdaDelta
      adadelta_conf {
        rho: float  # in [0, 1]
      }
    }
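For reference, AdaDelta keeps decaying averages of both the squared gradients
and the squared updates, so no explicit learning rate is required (a sketch of
the standard rule):

    Eg = rho * Eg + (1 - rho) * grad * grad
    delta = -(sqrt(Ed + epsilon) / sqrt(Eg + epsilon)) * grad
    Ed = rho * Ed + (1 - rho) * delta * delta
    param = param + delta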
#### Adam
It inherits the base `Updater` to implement the
[Adam](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
Its type is `kAdam`.
`beta1` and `beta2` are floats in the range (0, 1), generally close to 1.

    updater {
      type: kAdam
      adam_conf {
        beta1: float  # in (0, 1)
        beta2: float  # in (0, 1)
      }
    }
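For reference, Adam maintains decaying averages of the gradient and its
square, with bias correction at step t (a sketch of the standard rule):

    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    param = param - lr * (m / (1 - beta1^t)) / (sqrt(v / (1 - beta2^t)) + epsilon)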
#### AdaMax
It inherits the base `Updater` to implement the
[AdaMax](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
Its type is `kAdamMax`.
`beta1` and `beta2` are floats in the range (0, 1), generally close to 1.

    updater {
      type: kAdamMax
      adammax_conf {
        beta1: float  # in (0, 1)
        beta2: float  # in (0, 1)
      }
    }
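For reference, AdaMax from the same paper replaces Adam's second moment with
an infinity-norm bound (a sketch of the standard rule):

    m = beta1 * m + (1 - beta1) * grad
    u = max(beta2 * u, abs(grad))
    param = param - (lr / (1 - beta1^t)) * m / u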
### Configuration of learning rate
The `learning_rate` field is configured as,

    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ...  # fields of a specific changing method
    }
The common fields include `type` and `base_lr`. SINGA provides the following
`ChangeMethod`s.
#### kFixed
The `base_lr` is used for all steps.
#### kLinear
The updater should be configured like

    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }
Linear interpolation is used to change the learning rate,

    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
#### kExponential
The updater should be configured like

    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }
The learning rate for `step` is

    lr = base_lr / 2^(step / freq)
#### kInverseT
The updater should be configured like

    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }
The learning rate for `step` is

    lr = base_lr / (1 + step / final_lr)
#### kInverse
The updater should be configured like

    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }
The learning rate for `step` is

    lr = base_lr * (1 + gamma * step)^(-pow)
#### kStep
The updater should be configured like

    learning_rate {
      base_lr: float
      step_conf {
        change_freq: int
        gamma: float
      }
    }
The learning rate for `step` is

    lr = base_lr * gamma^(step / change_freq)
#### kFixedStep
The updater should be configured like

    learning_rate {
      fixedstep_conf {
        step: int
        step_lr: float
        step: int
        step_lr: float
        ...
      }
    }
Denote the i-th tuple as (step[i], step_lr[i]); the learning rate for
`step` is then

    lr = step_lr[k]

where step[k] is the largest configured step that is not larger than `step`.
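For example, with the tuples (0, 0.01) and (1000, 0.001), steps 0 through
999 use learning rate 0.01, and steps from 1000 onward use 0.001.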
## Advanced user guide
### Implementing a new Updater subclass
The base Updater class has one virtual function,

    class Updater {
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;
     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };
It updates the values of `param` based on its gradients. The `step` argument
is used to decide the learning rate, which may change over time (i.e., with
`step`). `grad_scale` scales the original gradient values. This function is
called by a server once it has received all gradients for the same `Param` object.
To implement a new Updater subclass, users must override the `Update` function.

    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };
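As a minimal sketch, an override implementing plain SGD might look as follows;
the accessors `mutable_cpu_data()`, `cpu_grad()`, and `size()` are hypothetical
placeholders for whatever data and gradient access `Param` actually provides:

    void FooUpdater::Update(int step, Param* param, float grad_scale) {
      float lr = lr_gen_.Get(step);  // learning rate for this step
      // Hypothetical Param accessors; substitute the real API.
      float* data = param->mutable_cpu_data();
      const float* grad = param->cpu_grad();
      for (int i = 0; i < param->size(); ++i)
        data[i] -= lr * grad_scale * grad[i];  // vanilla SGD step
    }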
Configuration of this new updater can be declared similarly to that of a new
layer,

    # in user.proto
    message FooUpdaterProto {
      optional int32 c = 1;
    }

    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }
The new updater should be registered in the
[main function](programming-guide.html)

    driver.RegisterUpdater<FooUpdater>("FooUpdater");
Users can then configure the job as

    # in job.conf
    updater {
      user_type: "FooUpdater"  # must match the identifier used for registration
      fooupdater_conf {
        c: 20
      }
    }
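Inside `Update`, the custom field can be read through the protobuf extension;
assuming the parsed `UpdaterProto` is held in the `proto_` member shown above,
this would look like

    int c = proto_.GetExtension(fooupdater_conf).c();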
### Implementing a new LRGenerator subclass
The base `LRGenerator` is declared as,

    virtual float Get(int step);
To implement a subclass, e.g., `FooLRGen`, users should declare it like

    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };
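For illustration, a `Get` that halves the base learning rate every 1000 steps
could be sketched as below; it assumes the base class stores the parsed
configuration in a `proto_` member that exposes `base_lr()`:

    float FooLRGen::Get(int step) {
      // Illustrative schedule: halve the base learning rate every 1000 steps.
      return proto_.base_lr() * std::pow(0.5f, step / 1000);  // needs <cmath>
    }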
Configuration of `FooLRGen` can be defined using a protocol message,

    # in user.proto
    message FooLRProto {
      ...
    }

    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }
The configuration is then like,

    learning_rate {
      user_type: "FooLR"  # must match the identifier used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }
Users have to register this subclass in the main function,

    driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");