``mx.symbol.sgd_update``
================================================
Description
----------------------
Update function for Stochastic Gradient Descent (SGD) optimizer.
It updates the weights using::

  weight = weight - learning_rate * (gradient + wd * weight)

However, if the gradient is of ``row_sparse`` storage type and ``lazy_update`` is
True, only the row slices whose indices appear in ``grad.indices`` are updated::

  for row in gradient.indices:
      weight[row] = weight[row] - learning_rate * (gradient[row] + wd * weight[row])

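For intuition, the dense rule can be traced in plain R. This is a minimal
sketch of the arithmetic only, not a call into the operator; the variable
names and values are illustrative.

.. code:: r

  lr <- 0.1   # learning rate
  wd <- 0.01  # weight decay
  weight <- c(1.0, -2.0, 0.5)
  grad   <- c(0.2,  0.4, -0.1)

  # weight = weight - learning_rate * (gradient + wd * weight)
  weight <- weight - lr * (grad + wd * weight)
  print(weight)  # 0.9790 -2.0380 0.5095
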
Usage
----------
.. code:: r

  mx.symbol.sgd_update(...)
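
A minimal symbolic sketch of how the operator is typically wired up; the
variable names and hyperparameter values are illustrative, and the inputs
are declared with ``mx.symbol.Variable`` as usual for the R symbol API.

.. code:: r

  library(mxnet)

  # Declare the inputs as free symbols; they are bound to NDArrays later.
  weight <- mx.symbol.Variable("weight")
  grad   <- mx.symbol.Variable("grad")

  # Build the SGD update node with a fixed learning rate and weight decay.
  updated <- mx.symbol.sgd_update(weight = weight, grad = grad,
                                  lr = 0.01, wd = 1e-4,
                                  name = "sgd_step")
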
Arguments
------------------
+----------------------------------------+------------------------------------------------------------+
| Argument | Description |
+========================================+============================================================+
| ``weight`` | NDArray-or-Symbol. |
| | |
| | Weight |
+----------------------------------------+------------------------------------------------------------+
| ``grad`` | NDArray-or-Symbol. |
| | |
| | Gradient |
+----------------------------------------+------------------------------------------------------------+
| ``lr`` | float, required. |
| | |
| | Learning rate |
+----------------------------------------+------------------------------------------------------------+
| ``wd`` | float, optional, default=0. |
| | |
| | Weight decay augments the objective function with a |
| | regularization term that penalizes large weights. The |
| | penalty scales with the square of the magnitude of each |
| | weight. |
+----------------------------------------+------------------------------------------------------------+
| ``rescale.grad`` | float, optional, default=1. |
| | |
| | Rescale gradient to grad = rescale_grad*grad. |
+----------------------------------------+------------------------------------------------------------+
| ``clip.gradient``                      | float, optional, default=-1.                               |
|                                        |                                                            |
|                                        | Clip the gradient to the range [-clip_gradient,            |
|                                        | clip_gradient]: grad = max(min(grad, clip_gradient),       |
|                                        | -clip_gradient). If clip_gradient <= 0, gradient           |
|                                        | clipping is turned off.                                    |
+----------------------------------------+------------------------------------------------------------+
| ``lazy.update`` | boolean, optional, default=1. |
| | |
| | If true, lazy updates are applied if gradient's stype is |
| | row_sparse. |
+----------------------------------------+------------------------------------------------------------+
| ``name`` | string, optional. |
| | |
| | Name of the resulting symbol. |
+----------------------------------------+------------------------------------------------------------+
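
The two gradient transforms in the table compose before the update rule is
applied. The sketch below assumes rescaling happens before clipping, which
matches how the stock MXNet optimizers document these options; the values
are illustrative.

.. code:: r

  rescale_grad  <- 1 / 32   # e.g. average over a batch of 32
  clip_gradient <- 0.5
  grad <- c(40, -3, 10)

  # grad = rescale_grad * grad
  grad <- rescale_grad * grad
  # grad = max(min(grad, clip_gradient), -clip_gradient), only if enabled
  if (clip_gradient > 0) {
    grad <- pmax(pmin(grad, clip_gradient), -clip_gradient)
  }
  print(grad)  # 0.50000 -0.09375 0.31250
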
Value
----------
``out`` The result mx.symbol.
Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/operator/optimizer_op.cc#L524