``mx.nd.LayerNorm``
======================================
Description
----------------------
Layer normalization.
Normalizes the channels of the input tensor by mean and variance, and applies a scale ``gamma`` as
well as offset ``beta``.
Assume the input has more than one dimension and we normalize along axis 1.
We first compute the mean and variance along this axis and then
compute the normalized output, which has the same shape as the input, as follows:

.. math::

   out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta

Both ``gamma`` and ``beta`` are learnable parameters.
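
As an illustration of the formula above, here is a minimal pure-Python sketch for a 2-D input normalized along axis 1. It is not the MXNet implementation; the helper names ``layer_norm_row`` and ``layer_norm`` are hypothetical, and the use of the population variance (dividing by *k*) is an assumption of this sketch.

```python
import math


def layer_norm_row(row, gamma, beta, eps=1e-5):
    """Normalize one 1-D slice, i.e. the values along the chosen axis."""
    k = len(row)
    mean = sum(row) / k
    # Population variance (divide by k); an assumption of this sketch.
    var = sum((x - mean) ** 2 for x in row) / k
    inv_std = 1.0 / math.sqrt(var + eps)
    # Element-wise scale (gamma) and offset (beta), each of length k.
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


def layer_norm(data, gamma, beta, eps=1e-5):
    """Apply layer normalization along axis 1 of a 2-D nested list."""
    return [layer_norm_row(row, gamma, beta, eps) for row in data]


data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
gamma = [1.0, 1.0, 1.0]  # identity scale
beta = [0.0, 0.0, 0.0]   # zero offset
out = layer_norm(data, gamma, beta)
```

With an identity scale and zero offset, each row of ``out`` has a mean of approximately zero, and the output keeps the shape of the input.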
Unlike BatchNorm and InstanceNorm, the *mean* and *var* are computed along the channel dimension.
Assume the input has size *k* on axis 1, then both ``gamma`` and ``beta``
have shape *(k,)*. If ``output.mean.var`` is set to true, the operator also outputs
``data_mean`` and ``data_std``. Note that no gradient is passed through these two outputs.
The parameter ``axis`` specifies which axis of the input shape denotes
the 'channel' (separately normalized groups). The default is -1, which sets the channel
axis to be the last dimension of the input shape.
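
The handling of ``axis`` and of the extra mean and standard deviation outputs can be sketched in plain Python for a 2-D input. This is only an illustration under stated assumptions: the function name ``layer_norm_2d`` is hypothetical, the population variance is used, and the returned standard deviation is taken to be ``sqrt(var + eps)``, which this page does not confirm.

```python
import math


def layer_norm_2d(data, gamma, beta, axis=-1, eps=1e-5):
    """Layer-normalize a 2-D nested list along `axis`.

    Returns (out, means, stds); the statistics hold one value per
    normalized slice.
    """
    ndim = 2
    if axis < 0:
        axis += ndim  # negative values index from right to left
    if axis not in (0, 1):
        raise ValueError("axis out of range for a 2-D input")

    # Arrange the data so that every inner list runs along the chosen axis.
    slices = data if axis == 1 else [list(col) for col in zip(*data)]

    normalized, means, stds = [], [], []
    for values in slices:
        k = len(values)
        mean = sum(values) / k
        var = sum((x - mean) ** 2 for x in values) / k  # population variance
        std = math.sqrt(var + eps)  # assumption: eps is included in the std
        normalized.append(
            [(x - mean) / std * g + b for x, g, b in zip(values, gamma, beta)]
        )
        means.append(mean)
        stds.append(std)

    # Restore the original layout if the input was transposed above.
    out = normalized if axis == 1 else [list(col) for col in zip(*normalized)]
    return out, means, stds


x = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]
# Default axis=-1: normalize each row; gamma and beta have length 3.
out_last, mean_last, std_last = layer_norm_2d(x, [1.0] * 3, [0.0] * 3)
# axis=0: normalize each column; gamma and beta have length 2.
out_first, mean_first, std_first = layer_norm_2d(x, [1.0] * 2, [0.0] * 2, axis=0)
```

In both calls the length of ``gamma`` and ``beta`` matches the size of the input on the normalized axis, as the description requires.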
Arguments
------------------
+----------------------------------------+------------------------------------------------------------+
| Argument                               | Description                                                |
+========================================+============================================================+
| ``data``                               | NDArray-or-Symbol.                                         |
|                                        |                                                            |
|                                        | Input data to layer normalization.                         |
+----------------------------------------+------------------------------------------------------------+
| ``gamma``                              | NDArray-or-Symbol.                                         |
|                                        |                                                            |
|                                        | The scale array (gamma).                                   |
+----------------------------------------+------------------------------------------------------------+
| ``beta``                               | NDArray-or-Symbol.                                         |
|                                        |                                                            |
|                                        | The offset array (beta).                                   |
+----------------------------------------+------------------------------------------------------------+
| ``axis``                               | int, optional, default='-1'.                               |
|                                        |                                                            |
|                                        | The axis along which to perform layer normalization.       |
|                                        | Usually, this should be the axis of the channel            |
|                                        | dimension. Negative values mean indexing from right        |
|                                        | to left.                                                   |
+----------------------------------------+------------------------------------------------------------+
| ``eps``                                | float, optional, default=9.99999975e-06.                   |
|                                        |                                                            |
|                                        | An `epsilon` parameter to prevent division by 0.           |
+----------------------------------------+------------------------------------------------------------+
| ``output.mean.var``                    | boolean, optional, default=0.                              |
|                                        |                                                            |
|                                        | Output the mean and std calculated along the given axis.   |
+----------------------------------------+------------------------------------------------------------+
Value
----------
``out`` The result mx.ndarray
Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/operator/nn/layer_norm.cc#L156