---
id: optimizer
title: Optimizer
---

SINGA supports various popular optimizers, including stochastic gradient descent (SGD) with momentum, Adam, RMSProp, and AdaGrad. For each optimizer, a decay scheduler can be used to schedule the learning rate applied in different epochs. The optimizers and decay schedulers are included in `singa/opt.py`.

## Create an optimizer

1. SGD with momentum (the textbook update rules for all four optimizers are sketched after this list)

```python
# define hyperparameter learning rate
lr = 0.001
# define hyperparameter momentum
momentum = 0.9
# define hyperparameter weight decay
weight_decay = 0.0001

from singa import opt
sgd = opt.SGD(lr=lr, momentum=momentum, weight_decay=weight_decay)
```
2. RMSProp

```python
# define hyperparameter learning rate
lr = 0.001
# define hyperparameter rho
rho = 0.9
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

from singa import opt
rmsprop = opt.RMSProp(lr=lr, rho=rho, epsilon=epsilon, weight_decay=weight_decay)
```
3. AdaGrad

```python
# define hyperparameter learning rate
lr = 0.001
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

from singa import opt
adagrad = opt.AdaGrad(lr=lr, epsilon=epsilon, weight_decay=weight_decay)
```
4. Adam

```python
# define hyperparameter learning rate
lr = 0.001
# define hyperparameter beta 1
beta_1 = 0.9
# define hyperparameter beta 2
beta_2 = 0.999
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

from singa import opt
adam = opt.Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon, weight_decay=weight_decay)
```
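For reference, the sketch below shows the textbook update rules that these hyperparameters control, written in plain NumPy. It is illustrative only and not SINGA's implementation; details such as where weight decay and epsilon enter the update may differ.

```python
import numpy as np

# Illustrative textbook update rules (not SINGA's code). All arguments are
# NumPy arrays except the scalar hyperparameters.

def sgd_momentum_step(p, g, v, lr, momentum, weight_decay):
    g = g + weight_decay * p               # L2 regularization (weight decay)
    v = momentum * v + g                   # momentum accumulation
    return p - lr * v, v

def rmsprop_step(p, g, r, lr, rho, epsilon, weight_decay):
    g = g + weight_decay * p
    r = rho * r + (1 - rho) * g * g        # running average of squared gradients
    return p - lr * g / (np.sqrt(r) + epsilon), r

def adagrad_step(p, g, r, lr, epsilon, weight_decay):
    g = g + weight_decay * p
    r = r + g * g                          # accumulated squared gradients
    return p - lr * g / (np.sqrt(r) + epsilon), r

def adam_step(p, g, m, v, t, lr, beta_1, beta_2, epsilon, weight_decay):
    g = g + weight_decay * p
    m = beta_1 * m + (1 - beta_1) * g      # first moment estimate
    v = beta_2 * v + (1 - beta_2) * g * g  # second moment estimate
    m_hat = m / (1 - beta_1 ** t)          # bias correction (t is the step count, starting at 1)
    v_hat = v / (1 - beta_2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + epsilon), m, v
```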

## Create a Decay Scheduler

```python
from singa import opt

# define the initial learning rate
lr_init = 0.001
# define the rate of decay in the decay scheduler
decay_rate = 0.95
# define whether the learning rate schedule is a staircase shape
staircase = True
# define the decay step of the decay scheduler (in this example the lr is decreased every 2 steps)
decay_steps = 2

# create the decay scheduler; the schedule of lr becomes lr_init * (decay_rate ^ (step // decay_steps))
lr = opt.ExponentialDecay(lr_init, decay_steps, decay_rate, staircase)
# use the lr to create an optimizer
sgd = opt.SGD(lr=lr, momentum=0.9, weight_decay=0.0001)
```
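To see what this staircase schedule produces, the small snippet below simply evaluates the formula from the comment above in plain Python (it does not call any SINGA API):

```python
# evaluate lr_init * decay_rate ** (step // decay_steps) for the first steps
lr_init, decay_rate, decay_steps = 0.001, 0.95, 2
for step in range(6):
    print(step, lr_init * decay_rate ** (step // decay_steps))
# steps 0-1 -> 0.001, steps 2-3 -> 0.00095, steps 4-5 -> ~0.0009025
```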

## Use the optimizer in Model API

When we create the model, we need to attach the optimizer to the model.

```python
# create a CNN using the Model API
model = CNN()

# initialize the optimizer and attach it to the model
sgd = opt.SGD(lr=0.005, momentum=0.9, weight_decay=1e-5)
model.set_optimizer(sgd)
```

Then, when we call the model, it runs the `train_one_batch` method, which uses the attached optimizer.
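As a rough sketch of how the attached optimizer is typically used inside `train_one_batch`, the example below defines a small MLP as a stand-in for the CNN above; the layer choices and loss layer name are illustrative assumptions, not taken from this page:

```python
from singa import layer, model

# a small MLP used as a stand-in for the CNN above (layer choices are assumptions)
class MyModel(model.Model):

    def __init__(self, num_classes=10):
        super(MyModel, self).__init__()
        self.linear1 = layer.Linear(64)
        self.relu = layer.ReLU()
        self.linear2 = layer.Linear(num_classes)
        self.softmax_cross_entropy = layer.SoftMaxCrossEntropy()

    def forward(self, x):
        y = self.relu(self.linear1(x))
        return self.linear2(y)

    def train_one_batch(self, x, y):
        out = self.forward(x)
        loss = self.softmax_cross_entropy(out, y)
        # self.optimizer is the optimizer attached via set_optimizer();
        # calling it on the loss runs the backward pass and updates the parameters
        self.optimizer(loss)
        return out, loss
```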

Hence, an example of an iterative loop to optimize the model is:

```python
for b in range(num_train_batch):
    # generate the next mini-batch
    x, y = ...

    # copy the data into the input tensors
    tx.copy_from_numpy(x)
    ty.copy_from_numpy(y)

    # training with one batch
    out, loss = model(tx, ty)
```
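If you want to monitor training, the returned loss tensor can be converted to NumPy for logging, for example with the small sketch below (it assumes `numpy` and `singa.tensor` are imported alongside the loop above):

```python
import numpy as np
from singa import tensor

# inside the training loop above, after out, loss = model(tx, ty):
print("batch %d, training loss = %f" % (b, np.sum(tensor.to_numpy(loss))))
```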