SINGA supports a variety of popular optimizers, including stochastic gradient descent with momentum, Adam, RMSProp, and AdaGrad. For each optimizer, a decay scheduler can be used to adjust the learning rate over the course of training. The optimizers and the decay schedulers are included in `singa/opt.py`.
```python
from singa import opt

# define hyperparameter learning rate
lr = 0.001
# define hyperparameter momentum
momentum = 0.9
# define hyperparameter weight decay
weight_decay = 0.0001

# create the SGD optimizer with momentum
sgd = opt.SGD(lr=lr, momentum=momentum, weight_decay=weight_decay)
```
```python
from singa import opt

# define hyperparameter learning rate
lr = 0.001
# define hyperparameter rho
rho = 0.9
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

# create the RMSProp optimizer
sgd = opt.RMSProp(lr=lr, rho=rho, epsilon=epsilon, weight_decay=weight_decay)
```
```python
from singa import opt

# define hyperparameter learning rate
lr = 0.001
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

# create the AdaGrad optimizer
sgd = opt.AdaGrad(lr=lr, epsilon=epsilon, weight_decay=weight_decay)
```
```python
from singa import opt

# define hyperparameter learning rate
lr = 0.001
# define hyperparameter beta 1
beta_1 = 0.9
# define hyperparameter beta 2
beta_2 = 0.999
# define hyperparameter epsilon
epsilon = 1e-8
# define hyperparameter weight decay
weight_decay = 0.0001

# create the Adam optimizer
sgd = opt.Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon, weight_decay=weight_decay)
```
```python
from singa import opt

# define initial learning rate
lr_init = 0.001
# define the rate of decay in the decay scheduler
decay_rate = 0.95
# define whether the learning rate schedule has a staircase shape
staircase = True
# define the decay step of the decay scheduler (in this example the lr is decayed every 2 steps)
decay_steps = 2

# create the decay scheduler; the schedule of lr becomes lr_init * (decay_rate ^ (step // decay_steps))
lr = opt.ExponentialDecay(lr_init, decay_steps, decay_rate, staircase)
# use the scheduled lr to create an optimizer
sgd = opt.SGD(lr=lr, momentum=0.9, weight_decay=0.0001)
```
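To see how this schedule behaves, here is a small standalone sketch (plain Python, independent of the SINGA API) that evaluates the staircase formula from the comment above for the first few steps:

```python
# Standalone illustration of the staircase exponential decay:
# lr(step) = lr_init * decay_rate ** (step // decay_steps)
lr_init = 0.001
decay_rate = 0.95
decay_steps = 2

for step in range(6):
    # with staircase=True the lr stays constant for decay_steps steps,
    # then drops by a factor of decay_rate
    lr_at_step = lr_init * decay_rate ** (step // decay_steps)
    print(step, lr_at_step)
```

Without the staircase option, the exponent is typically the continuous ratio `step / decay_steps`, giving a smooth decay instead of stepwise drops.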
When we create the model, we need to attach the optimizer to it.
```python
# create a CNN using the Model API
model = CNN()

# initialize the optimizer and attach it to the model
sgd = opt.SGD(lr=0.005, momentum=0.9, weight_decay=1e-5)
model.set_optimizer(sgd)
```
Then, when we call the model, it runs the `train_one_batch` method, which utilizes the attached optimizer.
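To make this concrete, here is a hypothetical sketch of a minimal Model subclass whose `train_one_batch` uses the attached optimizer; the single Linear layer and the softmax cross-entropy loss are assumptions for illustration only, and the actual method in your model may take extra arguments:

```python
from singa import layer, model, autograd

class MyModel(model.Model):
    def __init__(self, num_classes=10):
        super().__init__()
        # a single fully-connected layer, just for illustration
        self.fc = layer.Linear(num_classes)

    def forward(self, x):
        return self.fc(x)

    def train_one_batch(self, x, y):
        # forward pass to compute the prediction
        out = self.forward(x)
        # loss function assumed here; your model may use a different one
        loss = autograd.softmax_cross_entropy(out, y)
        # the optimizer attached via set_optimizer() performs the
        # backward pass and the parameter update
        self.optimizer(loss)
        return out, loss
```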
Hence, an example of an iterative loop to optimize the model is:
```python
for b in range(num_train_batch):
    # generate the next mini-batch
    x, y = ...

    # Copy the data into input tensors
    tx.copy_from_numpy(x)
    ty.copy_from_numpy(y)

    # Training with one batch
    out, loss = model(tx, ty)
```