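Update function for the RMSPropAlex optimizer.

``RMSPropAlex`` is the centered version of ``RMSProp``: it subtracts the squared running mean of the gradient from the running mean of the squared gradient, so each step is scaled by an estimate of the gradient's variance rather than by its raw second moment.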

Let :math:`E[g^2]_t` denote the decaying average of past squared gradients and :math:`E[g]_t` the decaying average of past gradients. With :math:`g_t` the gradient at step :math:`t`, the states and the update :math:`\Delta_t` evolve as

.. math::

   E[g^2]_t &= \gamma_1 E[g^2]_{t-1} + (1 - \gamma_1) g_t^2 \\
   E[g]_t &= \gamma_1 E[g]_{t-1} + (1 - \gamma_1) g_t \\
   \Delta_t &= \gamma_2 \Delta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t - E[g]_t^2 + \epsilon}} g_t

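
For a single scalar parameter, the three recursions can be sketched in plain Python. This is an illustrative sketch, not the MXNet kernel; the helper name ``rmspropalex_states`` is hypothetical, and its defaults simply use Graves' suggested :math:`\gamma_1` and :math:`\gamma_2` with :math:`\epsilon = 10^{-8}`.

.. code-block:: python

   import math


   def rmspropalex_states(n, g, delta, grad, lr, gamma1=0.95, gamma2=0.9, epsilon=1e-8):
       """One RMSPropAlex recursion step for a scalar (hypothetical helper).

       n     -- E[g^2]_{t-1}, running average of squared gradients
       g     -- E[g]_{t-1}, running average of gradients
       delta -- Delta_{t-1}, the previous update
       Returns the new (n, g, delta).
       """
       n = gamma1 * n + (1.0 - gamma1) * grad * grad
       g = gamma1 * g + (1.0 - gamma1) * grad
       # Centered second moment E[g^2]_t - E[g]_t^2; epsilon sits inside the root.
       denom = math.sqrt(n - g * g + epsilon)
       delta = gamma2 * delta - lr / denom * grad
       return n, g, delta


   # Start from zero states (an assumed initialization) with a unit gradient.
   n, g, delta = rmspropalex_states(0.0, 0.0, 0.0, grad=1.0, lr=1e-4)
   print(n, g, delta)

Here ``n`` and ``g`` both become 0.05, the centered second moment is 0.05 - 0.05^2 = 0.0475, and ``delta`` is about -4.59e-4, roughly 4.59 times the learning rate.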

The update step is

.. math::

   \theta_{t+1} = \theta_t + \Delta_t

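
As a worked example, assume the states :math:`E[g^2]_0`, :math:`E[g]_0` and :math:`\Delta_0` start at zero (an assumption; this page does not specify the initialization). The momentum term then vanishes and the first update reduces to

.. math::

   E[g^2]_1 &= (1 - \gamma_1) g_1^2, \qquad E[g]_1 = (1 - \gamma_1) g_1 \\
   E[g^2]_1 - E[g]_1^2 &= (1 - \gamma_1) g_1^2 - (1 - \gamma_1)^2 g_1^2 = \gamma_1 (1 - \gamma_1) g_1^2 \\
   \Delta_1 &= -\frac{\eta}{\sqrt{\gamma_1 (1 - \gamma_1) g_1^2 + \epsilon}} g_1 \approx -\frac{\eta \, \mathrm{sign}(g_1)}{\sqrt{\gamma_1 (1 - \gamma_1)}}

When :math:`\epsilon \ll \gamma_1 (1 - \gamma_1) g_1^2`, the first step therefore has size about :math:`\eta / \sqrt{0.0475} \approx 4.59\,\eta` for :math:`\gamma_1 = 0.95`, independent of the gradient's magnitude.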

The RMSPropAlex update follows Eqs. (38) to (45) of Alex Graves (2013). Graves suggests setting the decay rate :math:`\gamma_1` to 0.95, the momentum factor :math:`\gamma_2` to 0.9, and the learning rate :math:`\eta` to 0.0001.
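
To see what these settings imply, the following plain-Python sketch (illustrative only, not part of the API) computes the usual rule-of-thumb averaging window :math:`1/(1-\gamma_1)` and iterates the momentum recursion :math:`\Delta \leftarrow \gamma_2 \Delta - s` for a constant increment ``s``, an assumed simplification of the normalized gradient step.

.. code-block:: python

   # Graves' suggested settings, as quoted above.
   gamma1, gamma2, lr = 0.95, 0.9, 1e-4

   # Rule of thumb: an exponential average with decay gamma1 spans
   # roughly 1 / (1 - gamma1) recent gradients.
   window = 1.0 / (1.0 - gamma1)

   # Assumption: a constant per-step increment s (hypothetical), so that
   # delta = gamma2 * delta - s converges to -s / (1 - gamma2).
   s = lr
   delta = 0.0
   for _ in range(200):
       delta = gamma2 * delta - s

   print(round(window, 6), delta / -s)

The averages therefore track roughly the last 20 gradients, and a persistent step direction can be amplified by up to a factor of :math:`1/(1-\gamma_2) = 10`.
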

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Argument
     - Description
   * - ``weight``
     - NDArray-or-Symbol. The weight to update.
   * - ``grad``
     - NDArray-or-Symbol. The gradient.
   * - ``n``
     - NDArray-or-Symbol. The state :math:`E[g^2]`, the decaying average of squared gradients.
   * - ``g``
     - NDArray-or-Symbol. The state :math:`E[g]`, the decaying average of gradients.
   * - ``delta``
     - NDArray-or-Symbol. The state :math:`\Delta`, the momentum-accumulated update.
   * - ``lr``
     - float, required. Learning rate :math:`\eta`.
   * - ``gamma1``
     - float, optional, default=0.949999988 (0.95 in single precision). Decay rate :math:`\gamma_1` of the running averages ``n`` and ``g``.
   * - ``gamma2``
     - float, optional, default=0.899999976 (0.9 in single precision). Momentum factor :math:`\gamma_2` applied to ``delta``.
   * - ``epsilon``
     - float, optional, default=9.99999994e-09. A small constant added under the square root for numerical stability.
   * - ``wd``
     - float, optional, default=0. Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
   * - ``rescale.grad``
     - float, optional, default=1. Rescale the gradient to ``grad = rescale_grad * grad``.
   * - ``clip.gradient``
     - float, optional, default=-1. Clip the gradient to the range ``[-clip_gradient, clip_gradient]``: ``grad = max(min(grad, clip_gradient), -clip_gradient)``. If ``clip_gradient <= 0``, gradient clipping is turned off.
   * - ``clip.weights``
     - float, optional, default=-1. Clip the weights to the range ``[-clip_weights, clip_weights]``: ``weights = max(min(weights, clip_weights), -clip_weights)``. If ``clip_weights <= 0``, weight clipping is turned off.

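
The arguments above can be tied together in a plain-Python sketch of one element-wise update. The name ``rmspropalex_update`` is hypothetical and this is not the MXNet implementation; the R argument names with dots are written with underscores, and the order of rescaling, gradient clipping and weight decay, as well as the form of weight decay (adding ``wd * weight`` to the gradient), are assumptions. As the table describes, clipping is skipped when its threshold is not positive.

.. code-block:: python

   import math


   def rmspropalex_update(weight, grad, n, g, delta, lr, gamma1=0.95, gamma2=0.9,
                          epsilon=1e-8, wd=0.0, rescale_grad=1.0,
                          clip_gradient=-1.0, clip_weights=-1.0):
       """Element-wise RMSPropAlex step on Python lists (hypothetical sketch).

       n, g and delta are updated in place; a new list of weights is returned.
       """
       out = []
       for i, w in enumerate(weight):
           gr = rescale_grad * grad[i]
           if clip_gradient > 0:
               gr = max(min(gr, clip_gradient), -clip_gradient)
           # Assumed L2 weight decay: d/dw of (wd / 2) * w^2 is wd * w,
           # applied after clipping (ordering is an assumption).
           gr += wd * w
           n[i] = gamma1 * n[i] + (1.0 - gamma1) * gr * gr
           g[i] = gamma1 * g[i] + (1.0 - gamma1) * gr
           delta[i] = gamma2 * delta[i] - lr * gr / math.sqrt(n[i] - g[i] * g[i] + epsilon)
           w = w + delta[i]
           if clip_weights > 0:
               w = max(min(w, clip_weights), -clip_weights)
           out.append(w)
       return out


   weight = [0.5, -0.5]
   grad = [2.0, -2.0]
   n, g, delta = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
   # Gradients are rescaled to +/-1.0, then clipped to +/-0.8.
   new_weight = rmspropalex_update(weight, grad, n, g, delta, lr=0.01,
                                   rescale_grad=0.5, clip_gradient=0.8)
   print(new_weight, delta)

Returning a new list instead of overwriting ``weight`` keeps the sketch easy to check; the optimizer states, by contrast, are mutated in place because they must persist between steps.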

The function returns ``out``, the result as an ``mx.ndarray``.