``mx.opt.rmsprop``
====================================
Description
----------------------
Create an RMSProp optimizer with the given parameters.
Reference: Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude[J]. COURSERA: Neural Networks for Machine Learning, 2012, 4(2).
The code follows Eq. (38)-(45) of Alex Graves, 2013, http://arxiv.org/pdf/1308.0850v5.pdf.
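
In the centered form (``centered = TRUE``), the referenced update of Graves (2013), Eq. (38)-(45), can be sketched as below. This is a summary of the cited equations written with the argument names used by this optimizer, not a transcript of the package source.

.. math::

   n_t &= \gamma_1 n_{t-1} + (1 - \gamma_1)\, g_t^2 \\
   \bar{g}_t &= \gamma_1 \bar{g}_{t-1} + (1 - \gamma_1)\, g_t \\
   \Delta_t &= \gamma_2 \Delta_{t-1} - \frac{\eta\, g_t}{\sqrt{n_t - \bar{g}_t^2 + \epsilon}} \\
   w_t &= w_{t-1} + \Delta_t

Here :math:`\eta` is ``learning.rate``, :math:`\gamma_1` and :math:`\gamma_2` are ``gamma1`` and ``gamma2``, :math:`\epsilon` is ``epsilon``, and :math:`g_t` is the rescaled (and optionally clipped) gradient.
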
Usage
----------
.. code:: r

   mx.opt.rmsprop(
     learning.rate = 0.002,
     centered = TRUE,
     gamma1 = 0.95,
     gamma2 = 0.9,
     epsilon = 1e-04,
     wd = 0,
     rescale.grad = 1,
     clip_gradient = -1,
     lr_scheduler = NULL
   )
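
A minimal usage sketch is shown below. The call to ``mx.opt.rmsprop`` follows the signature above; handing the resulting object to ``mx.model.FeedForward.create`` through its ``optimizer`` argument is an assumption about the training front end (many examples pass the optimizer name as a string instead), so that call is left commented out.

.. code:: r

   library(mxnet)

   # Centered RMSProp with gradient clipping; the values are illustrative,
   # not recommendations.
   opt <- mx.opt.rmsprop(
     learning.rate = 0.002,
     centered      = TRUE,
     gamma1        = 0.95,
     gamma2        = 0.9,
     clip_gradient = 5        # clip each gradient element to [-5, 5]
   )

   # Assumption: the training front end accepts an optimizer object directly.
   # If your version only accepts a name, use optimizer = "rmsprop" and pass
   # the tuning parameters as extra arguments instead.
   # model <- mx.model.FeedForward.create(net, X = train.x, y = train.y,
   #                                      ctx = mx.cpu(), num.round = 10,
   #                                      optimizer = opt)
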
Arguments
------------------
+----------------------------------------+------------------------------------------------------------+
| Argument | Description |
+========================================+============================================================+
| ``learning.rate`` | float, default=0.002. |
| | |
| | The initial learning rate. |
+----------------------------------------+------------------------------------------------------------+
| ``centered``                           | logical, default=TRUE.                                     |
|                                        |                                                            |
|                                        | Whether to use the centered variant (Graves, 2013), which  |
|                                        | also tracks a moving average of the gradient itself.       |
+----------------------------------------+------------------------------------------------------------+
| ``gamma1``                             | float, default=0.95.                                       |
|                                        |                                                            |
|                                        | Decay factor for the moving averages of the gradient and   |
|                                        | the squared gradient.                                      |
+----------------------------------------+------------------------------------------------------------+
| ``gamma2``                             | float, default=0.9.                                        |
|                                        |                                                            |
|                                        | "Momentum" factor applied to the update.                   |
+----------------------------------------+------------------------------------------------------------+
| ``epsilon``                            | float, default=1e-04.                                      |
|                                        |                                                            |
|                                        | Small constant added for numerical stability.              |
+----------------------------------------+------------------------------------------------------------+
| ``wd`` | float, default=0.0. |
| | |
|                                        | L2 regularization coefficient added to all the weights.    |
+----------------------------------------+------------------------------------------------------------+
| ``rescale.grad`` | float, default=1.0. |
| | |
|                                        | Rescaling factor applied to the gradient.                  |
+----------------------------------------+------------------------------------------------------------+
| ``clip_gradient`` | float, optional, default=-1 (no clipping if < 0). |
| | |
|                                        | Clip the gradient to [-clip_gradient, clip_gradient].      |
+----------------------------------------+------------------------------------------------------------+
| ``lr_scheduler`` | function, optional. |
| | |
| | The learning rate scheduler. |
+----------------------------------------+------------------------------------------------------------+
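
To make the roles of ``rescale.grad``, ``clip_gradient`` and ``wd`` concrete, the plain-R sketch below shows the gradient preprocessing conventionally applied before the RMSProp statistics are updated. It is an illustration of the argument semantics described above, not the package's actual update code; the helper name ``preprocess_grad`` is made up for this example.

.. code:: r

   # Illustration only: conventional gradient preprocessing for one
   # parameter array ('weight') and its gradient ('grad').
   preprocess_grad <- function(grad, weight,
                               rescale.grad = 1, clip_gradient = -1, wd = 0) {
     g <- rescale.grad * grad                              # rescale
     if (clip_gradient >= 0) {
       g <- pmin(pmax(g, -clip_gradient), clip_gradient)   # clip to [-c, c]
     }
     g + wd * weight                                       # L2 weight-decay term
   }

   preprocess_grad(grad = c(-10, 0.3, 7), weight = c(1, -2, 0.5),
                   rescale.grad = 1 / 128, clip_gradient = 5, wd = 1e-4)
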