``mx.opt.adadelta``
======================================
Description
----------------------
Create an AdaDelta optimizer with the given parameters.
The method is described in Zeiler, M. D. (2012),
*ADADELTA: An adaptive learning rate method.*
http://arxiv.org/abs/1212.5701
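
For orientation, the update rule from the cited paper can be summarised as below. Here :math:`g_t` is the gradient at step :math:`t`, :math:`x_t` the parameter, and :math:`\rho` and :math:`\epsilon` correspond to the ``rho`` and ``epsilon`` arguments. This is a summary of the paper's formulation, not a statement about how this package implements it internally.

.. math::

   E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
   \Delta x_t &= -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
   E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1 - \rho)\, \Delta x_t^2 \\
   x_{t+1} &= x_t + \Delta x_t
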
Usage
----------
.. code:: r

    mx.opt.adadelta(
      rho = 0.9,
      epsilon = 1e-05,
      wd = 0,
      rescale.grad = 1,
      clip_gradient = -1
    )

Arguments
------------------
+----------------------------------------+------------------------------------------------------------+
| Argument                               | Description                                                |
+========================================+============================================================+
| ``rho``                                | float, default=0.90.                                       |
|                                        |                                                            |
|                                        | Decay rate for both squared gradients and delta x.         |
+----------------------------------------+------------------------------------------------------------+
| ``epsilon``                            | float, default=1e-5.                                       |
|                                        |                                                            |
|                                        | The constant epsilon as described in the paper.            |
+----------------------------------------+------------------------------------------------------------+
| ``wd``                                 | float, default=0.0.                                        |
|                                        |                                                            |
|                                        | L2 regularization coefficient added to all the weights.    |
+----------------------------------------+------------------------------------------------------------+
| ``rescale.grad``                       | float, default=1.                                          |
|                                        |                                                            |
|                                        | Rescaling factor applied to the gradient.                  |
+----------------------------------------+------------------------------------------------------------+
| ``clip_gradient``                      | float, default=-1 (no clipping if < 0).                    |
|                                        |                                                            |
|                                        | Clip gradient in range [-clip_gradient, clip_gradient].    |
+----------------------------------------+------------------------------------------------------------+
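
To make the roles of the arguments above concrete, here is a minimal sketch of a single AdaDelta step in plain R, with no dependency on the package. ``adadelta_step`` and the ``state`` list are hypothetical names chosen for illustration and are not part of the library. The order of operations (rescale, then clip, then update) and the way ``wd`` is applied directly to the weights are assumptions; the package's internal implementation may differ.

.. code:: r

    # Hypothetical helper (not part of the package): one AdaDelta step
    # on a numeric vector of weights.
    adadelta_step <- function(weight, grad, state,
                              rho = 0.9, epsilon = 1e-05, wd = 0,
                              rescale.grad = 1, clip_gradient = -1) {
      # Rescale the gradient, e.g. by 1 / batch size.
      grad <- grad * rescale.grad
      # Clip to [-clip_gradient, clip_gradient]; a negative value disables clipping.
      if (clip_gradient >= 0) {
        grad <- pmax(pmin(grad, clip_gradient), -clip_gradient)
      }
      # Running average of squared gradients.
      state$E.g2 <- rho * state$E.g2 + (1 - rho) * grad^2
      # Step size from the ratio of the two running averages.
      delta <- sqrt(state$E.dx2 + epsilon) / sqrt(state$E.g2 + epsilon) * grad
      # Running average of squared updates.
      state$E.dx2 <- rho * state$E.dx2 + (1 - rho) * delta^2
      # Assumption: weight decay is applied as a direct shrinkage of the weights.
      weight <- weight - delta - wd * weight
      list(weight = weight, state = state)
    }

    # Usage: take 100 steps on f(w) = sum(w^2), whose gradient is 2 * w.
    w <- c(1, -2)
    state <- list(E.g2 = rep(0, 2), E.dx2 = rep(0, 2))
    for (i in 1:100) {
      out <- adadelta_step(w, 2 * w, state)
      w <- out$weight
      state <- out$state
    }

Both running averages start at zero, so the first steps are small and grow as ``E.dx2`` accumulates, which is why no learning rate appears among the arguments.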