This document summarizes the APIs used to initialize and update the model weights during training

.. autosummary::
    :nosignatures:

    mxnet.initializer
    mxnet.optimizer
    mxnet.lr_scheduler

and how to develop a new optimization algorithm in MXNet.
Assume there is a pre-defined Symbol and a Module is created for it:

>>> data = mx.symbol.Variable('data')
>>> label = mx.symbol.Variable('softmax_label')
>>> fc = mx.symbol.FullyConnected(data, name='fc', num_hidden=10)
>>> loss = mx.symbol.SoftmaxOutput(fc, label, name='softmax')
>>> mod = mx.mod.Module(loss)
>>> mod.bind(data_shapes=[('data', (128, 20))], label_shapes=[('softmax_label', (128,))])
Next we can initialize the weights with values sampled uniformly from [-1, 1]:
>>> mod.init_params(mx.initializer.Uniform(scale=1.0))
Then we train the model with standard SGD, decreasing the learning rate by a factor of 0.9 every 100 batches:

>>> lr_sch = mx.lr_scheduler.FactorScheduler(step=100, factor=0.9)
>>> mod.init_optimizer(
...     optimizer='sgd', optimizer_params=(('learning_rate', 0.1), ('lr_scheduler', lr_sch)))
Finally, run mod.fit(...) to start training.
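For instance, with a hypothetical data iterator built from random NumPy arrays (used here only to make the call concrete; the epoch count is likewise illustrative), training could look like:

>>> import numpy as np
>>> # random data and labels, purely to have something to feed the module
>>> train_iter = mx.io.NDArrayIter(
...     data=np.random.uniform(size=(1280, 20)).astype('float32'),
...     label=np.random.randint(0, 10, size=(1280,)).astype('float32'),
...     batch_size=128, label_name='softmax_label')
>>> mod.fit(train_iter, num_epoch=5)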
The mxnet.initializer package

.. currentmodule:: mxnet.initializer
The base class Initializer defines the default behaviors to initialize the various parameters other than the weight, such as setting the bias to 1. The other classes then define how to initialize the weight.
.. autosummary::
    :nosignatures:

    Initializer
    Uniform
    Normal
    Load
    Mixed
    Zero
    One
    Constant
    Orthogonal
    Xavier
    MSRAPrelu
    Bilinear
    FusedRNN
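Any of these can be passed to init_params in place of Uniform. For instance, a sketch of re-initializing the module above with Xavier initialization (the magnitude value is just illustrative):

>>> mod.init_params(mx.initializer.Xavier(magnitude=2.0), force_init=True)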
The mxnet.optimizer package

.. currentmodule:: mxnet.optimizer
The base class Optimizer
accepts commonly shared arguments such as learning_rate
and defines the interface. Each of the other classes in this package implements one weight-updating function.
.. autosummary::
    :nosignatures:

    Optimizer
    SGD
    NAG
    RMSProp
    Adam
    AdaGrad
    AdaDelta
    DCASGD
    SGLD
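Each optimizer can be selected by its registered (lowercased) name. As a sketch, switching the module above to Adam (the learning rate value is only illustrative):

>>> mod.init_optimizer(optimizer='adam',
...                    optimizer_params=(('learning_rate', 0.001),), force_init=True)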
The mxnet.lr_scheduler package

.. currentmodule:: mxnet.lr_scheduler
The base class LRScheduler
defines the interface, while other classes implement various schemes to change the learning rate during training.
.. autosummary::
    :nosignatures:

    LRScheduler
    FactorScheduler
    MultiFactorScheduler
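For example, to drop the learning rate at specific batch counts rather than at a fixed interval, a MultiFactorScheduler could be used instead of the FactorScheduler above (the step values and factor are invented for illustration):

>>> multi_sch = mx.lr_scheduler.MultiFactorScheduler(step=[100, 250, 500], factor=0.5)
>>> mod.init_optimizer(optimizer='sgd', force_init=True,
...     optimizer_params=(('learning_rate', 0.1), ('lr_scheduler', multi_sch)))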
Most classes listed in this document are implemented in Python using NDArray, so implementing new weight-updating or initialization functions is straightforward.
For initializer
, create a subclass of Initializer
and define the _init_weight
method. We can also change the default behavior for other parameters by overriding methods such as _init_bias. See initializer.py
for examples.
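As a minimal sketch, a custom initializer might look like the following; the class name SmallUniform and the range [-0.01, 0.01] are invented for illustration:

>>> class SmallUniform(mx.initializer.Initializer):
...     """Fill every weight uniformly in [-0.01, 0.01]."""
...     def _init_weight(self, name, arr):
...         # `arr` is the NDArray holding the parameter called `name`
...         mx.random.uniform(-0.01, 0.01, out=arr)
...
>>> mod.init_params(SmallUniform(), force_init=True)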
For optimizer
, create a subclass of Optimizer
and implement two methods create_state
and update
. Also add the @mx.optimizer.Optimizer.register decorator before the class. See optimizer.py
for examples.
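For example, plain SGD could be rewritten as a custom optimizer along these lines; the class name MySGD is invented, and gradient rescaling, clipping, and weight decay are omitted for brevity:

>>> @mx.optimizer.Optimizer.register
... class MySGD(mx.optimizer.Optimizer):
...     """weight <- weight - learning_rate * gradient"""
...     def create_state(self, index, weight):
...         return None                    # plain SGD keeps no per-weight state
...     def update(self, index, weight, grad, state):
...         self._update_count(index)      # bookkeeping used by lr schedulers
...         lr = self._get_lr(index)       # current learning rate for this weight
...         weight[:] = weight - lr * grad
...
>>> mod.init_optimizer(optimizer='mysgd',
...                    optimizer_params=(('learning_rate', 0.1),), force_init=True)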
For lr_scheduler
, create a subclass of LRScheduler
and then implement the __call__
method. See lr_scheduler.py
for examples.
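As a sketch, a scheduler that halves the learning rate every `step` updates (the class name HalveScheduler and the numbers are invented for illustration) could be written as:

>>> class HalveScheduler(mx.lr_scheduler.LRScheduler):
...     """Halve the learning rate once every `step` updates."""
...     def __init__(self, step, base_lr=0.1):
...         super(HalveScheduler, self).__init__(base_lr)
...         self.step = step
...     def __call__(self, num_update):
...         return self.base_lr * 0.5 ** (num_update // self.step)
...
>>> mod.init_optimizer(optimizer='sgd', force_init=True,
...     optimizer_params=(('learning_rate', 0.1), ('lr_scheduler', HalveScheduler(step=1000))))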
.. automodule:: mxnet.optimizer
    :members:

.. automodule:: mxnet.lr_scheduler
    :members:

.. automodule:: mxnet.initializer
    :members: