blob: 92ded6670287309dca67f313395f63b4905bc15f [file] [log] [blame]
{
"cells": [
{
"cell_type": "markdown",
"id": "aa1e3f75",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"\n",
"# RowSparseNDArray - NDArray for Sparse Gradient Updates\n",
"\n",
"## Motivation\n",
"\n",
"Many real world datasets deal with high dimensional sparse feature vectors. When learning\n",
"the weights of models with sparse datasets, the derived gradients of the weights could be sparse.\n",
"\n",
"Let's say we perform a matrix multiplication of ``X`` and ``W``, where ``X`` is a 1x2 matrix, and ``W`` is a 2x3 matrix. Let ``Y`` be the matrix multiplication of the two matrices:"
]
},
{
"cell_type": "markdown",
"id": "bcbf6a8c",
"metadata": {},
"source": [
"```python\n",
"import mxnet as mx\n",
"X = mx.nd.array([[1,0]])\n",
"W = mx.nd.array([[3,4,5], [6,7,8]])\n",
"Y = mx.nd.dot(X, W)\n",
"{'X': X, 'W': W, 'Y': Y}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "747e203a",
"metadata": {},
"source": [
"```\n",
"{'W':\n",
" [[ 3. 4. 5.]\n",
" [ 6. 7. 8.]]\n",
" <NDArray 2x3 @cpu(0)>, 'X':\n",
" [[ 1. 0.]]\n",
" <NDArray 1x2 @cpu(0)>, 'Y':\n",
" [[ 3. 4. 5.]]\n",
" <NDArray 1x3 @cpu(0)>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6bbf5c73",
"metadata": {},
"source": [
"As you can see,"
]
},
{
"cell_type": "markdown",
"id": "e4081d37",
"metadata": {},
"source": [
"```\n",
"Y[0][0] = X[0][0] * W[0][0] + X[0][1] * W[1][0] = 1 * 3 + 0 * 6 = 3\n",
"Y[0][1] = X[0][0] * W[0][1] + X[0][1] * W[1][1] = 1 * 4 + 0 * 7 = 4\n",
"Y[0][2] = X[0][0] * W[0][2] + X[0][1] * W[1][2] = 1 * 5 + 0 * 8 = 5\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "2b4e8866",
"metadata": {},
"source": [
"What about dY / dW, the gradient for ``W``? Let's call it ``grad_W``. To start with, the shape of ``grad_W`` is the same as that of ``W`` as we are taking the derivatives with respect to ``W``, which is 2x3. Then we calculate each entry in ``grad_W`` as follows:"
]
},
{
"cell_type": "markdown",
"id": "e73be284",
"metadata": {},
"source": [
"```\n",
"grad_W[0][0] = X[0][0] = 1\n",
"grad_W[0][1] = X[0][0] = 1\n",
"grad_W[0][2] = X[0][0] = 1\n",
"grad_W[1][0] = X[0][1] = 0\n",
"grad_W[1][1] = X[0][1] = 0\n",
"grad_W[1][2] = X[0][1] = 0\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "d6ffb360",
"metadata": {},
"source": [
"As a matter of fact, you can calculate ``grad_W`` by multiplying the transpose of ``X`` with a matrix of ones:"
]
},
{
"cell_type": "markdown",
"id": "3c9b9e89",
"metadata": {},
"source": [
"```python\n",
"grad_W = mx.nd.dot(X, mx.nd.ones_like(Y), transpose_a=True)\n",
"grad_W\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "d2f60e21",
"metadata": {},
"source": [
"```\n",
"[[ 1. 1. 1.]\n",
" [ 0. 0. 0.]]\n",
"<NDArray 2x3 @cpu(0)>\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6015038c",
"metadata": {},
"source": [
"As you can see, row 0 of ``grad_W`` contains non-zero values while row 1 of ``grad_W`` does not. Why did that happen?\n",
"If you look at how ``grad_W`` is calculated, notice that since column 1 of ``X`` is filled with zeros, row 1 of ``grad_W`` is filled with zeros too.\n",
"\n",
"In the real world, gradients for parameters that interact with sparse inputs ususally have gradients where many row slices are completely zeros. Storing and manipulating such sparse matrices with many row slices of all zeros in the default dense structure results in wasted memory and processing on the zeros. More importantly, many gradient based optimization methods such as SGD, [AdaGrad](https://stanford.edu/~jduchi/projects/DuchiHaSi10_colt.pdf) and [Adam](https://arxiv.org/pdf/1412.6980.pdf)\n",
"take advantage of sparse gradients and prove to be efficient and effective.\n",
"**In MXNet, the ``RowSparseNDArray`` stores the matrix in ``row sparse`` format, which is designed for arrays of which most row slices are all zeros.**\n",
"In this tutorial, we will describe what the row sparse format is and how to use RowSparseNDArray for sparse gradient updates in MXNet.\n",
"\n",
"## Prerequisites\n",
"\n",
"To complete this tutorial, we need:\n",
"\n",
"- MXNet. See the instructions for your operating system in [Setup and Installation](/get_started)\n",
"- [Jupyter](http://jupyter.org/)\n",
" ```\n",
" pip install jupyter\n",
" ```\n",
"- Basic knowledge of NDArray in MXNet. See the detailed tutorial for NDArray in [NDArray - Imperative tensor operations on CPU/GPU](https://mxnet.apache.org/tutorials/basic/ndarray.html)\n",
"- Understanding of [automatic differentiation with autograd](http://gluon.mxnet.io/chapter01_crashcourse/autograd.html)\n",
"- GPUs - A section of this tutorial uses GPUs. If you don't have GPUs on your\n",
"machine, simply set the variable `gpu_device` (set in the GPUs section of this\n",
"tutorial) to `mx.cpu()`\n",
"\n",
"## Row Sparse Format\n",
"\n",
"A RowSparseNDArray represents a multidimensional NDArray of shape `[LARGE0, D1, .. , Dn]` using two separate 1D arrays:\n",
"`data` and `indices`.\n",
"\n",
"- data: an NDArray of any dtype with shape `[D0, D1, ..., Dn]`.\n",
"- indices: a 1D int64 NDArray with shape `[D0]` with values sorted in ascending order.\n",
"\n",
"The ``indices`` array stores the indices of the row slices with **non-zeros**,\n",
"while the values are stored in ``data`` array. The corresponding NDArray `dense` represented by RowSparseNDArray `rsp` has\n",
"\n",
"``dense[rsp.indices[i], :, :, :, ...] = rsp.data[i, :, :, :, ...]``\n",
"\n",
"A RowSparseNDArray is typically used to represent non-zero row slices of a large NDArray of shape `[LARGE0, D1, .. , Dn]` where LARGE0 >> D0 and most row slices are zeros.\n",
"\n",
"Given this two-dimension matrix:"
]
},
{
"cell_type": "markdown",
"id": "87f9939b",
"metadata": {},
"source": [
"```python\n",
"[[ 1, 2, 3],\n",
" [ 0, 0, 0],\n",
" [ 4, 0, 5],\n",
" [ 0, 0, 0],\n",
" [ 0, 0, 0]]\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "47f9a964",
"metadata": {},
"source": [
"The row sparse representation would be:\n",
"- `data` array holds all the non-zero row slices of the array.\n",
"- `indices` array stores the row index for each row slice with non-zero elements."
]
},
{
"cell_type": "markdown",
"id": "b97c8616",
"metadata": {},
"source": [
"```python\n",
"data = [[1, 2, 3], [4, 0, 5]]\n",
"indices = [0, 2]\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "221b914d",
"metadata": {},
"source": [
"`RowSparseNDArray` supports multidimensional arrays. Given this 3D tensor:"
]
},
{
"cell_type": "markdown",
"id": "d9d589ab",
"metadata": {},
"source": [
"```python\n",
"[[[1, 0],\n",
" [0, 2],\n",
" [3, 4]],\n",
"\n",
" [[5, 0],\n",
" [6, 0],\n",
" [0, 0]],\n",
"\n",
" [[0, 0],\n",
" [0, 0],\n",
" [0, 0]]]\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "89697311",
"metadata": {},
"source": [
"The row sparse representation would be (with `data` and `indices` defined the same as above):"
]
},
{
"cell_type": "markdown",
"id": "59e662f7",
"metadata": {},
"source": [
"```python\n",
"data = [[[1, 0], [0, 2], [3, 4]], [[5, 0], [6, 0], [0, 0]]]\n",
"indices = [0, 1]\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "fb122fb4",
"metadata": {},
"source": [
"``RowSparseNDArray`` is a subclass of ``NDArray``. If you query **stype** of a RowSparseNDArray,\n",
"the value will be **\"row_sparse\"**.\n",
"\n",
"## Array Creation\n",
"\n",
"You can create a `RowSparseNDArray` with data and indices by using the `row_sparse_array` function:"
]
},
{
"cell_type": "markdown",
"id": "9cd7118c",
"metadata": {},
"source": [
"```python\n",
"import mxnet as mx\n",
"import numpy as np\n",
"# Create a RowSparseNDArray with python lists\n",
"shape = (6, 2)\n",
"data_list = [[1, 2], [3, 4]]\n",
"indices_list = [1, 4]\n",
"a = mx.nd.sparse.row_sparse_array((data_list, indices_list), shape=shape)\n",
"# Create a RowSparseNDArray with numpy arrays\n",
"data_np = np.array([[1, 2], [3, 4]])\n",
"indices_np = np.array([1, 4])\n",
"b = mx.nd.sparse.row_sparse_array((data_np, indices_np), shape=shape)\n",
"{'a':a, 'b':b}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "c11c28f5",
"metadata": {},
"source": [
"`{'a': <RowSparseNDArray 6x2 @cpu(0)>, 'b': <RowSparseNDArray 6x2 @cpu(0)>}`<!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"## Function Overview\n",
"\n",
"Similar to `CSRNDArray`, the are several functions with `RowSparseNDArray` that behave the same way. In the code blocks below you can try out these common functions:\n",
"\n",
"- **.dtype** - to set the data type\n",
"- **.asnumpy** - to cast as a numpy array for inspecting it\n",
"- **.data** - to access the data array\n",
"- **.indices** - to access the indices array\n",
"- **.tostype** - to set the storage type\n",
"- **.cast_storage** - to convert the storage type\n",
"- **.copy** - to copy the array\n",
"- **.copyto** - to copy to deep copy an existing array\n",
"\n",
"\n",
"## Setting Type\n",
"\n",
"You can create a `RowSparseNDArray` from another specifying the element data type with the option `dtype`, which accepts a numpy type. By default, `float32` is used."
]
},
{
"cell_type": "markdown",
"id": "ded4f787",
"metadata": {},
"source": [
"```python\n",
"# Float32 is used by default\n",
"c = mx.nd.sparse.array(a)\n",
"# Create a 16-bit float array\n",
"d = mx.nd.array(a, dtype=np.float16)\n",
"(c.dtype, d.dtype)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "27137702",
"metadata": {},
"source": [
"`(numpy.float32, numpy.float16)`<!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"## Inspecting Arrays\n",
"\n",
"As with `CSRNDArray`, you can inspect the contents of a `RowSparseNDArray` by filling\n",
"its contents into a dense `numpy.ndarray` using the `asnumpy` function."
]
},
{
"cell_type": "markdown",
"id": "fdfaf20a",
"metadata": {},
"source": [
"```python\n",
"a.asnumpy()\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "7d3aee78",
"metadata": {},
"source": [
"```\n",
"array([[ 0., 0.],\n",
" [ 1., 2.],\n",
" [ 0., 0.],\n",
" [ 0., 0.],\n",
" [ 3., 4.],\n",
" [ 0., 0.]], dtype=float32)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "668b8c87",
"metadata": {},
"source": [
"You can inspect the internal storage of a RowSparseNDArray by accessing attributes such as `indices` and `data`:"
]
},
{
"cell_type": "markdown",
"id": "618ebe43",
"metadata": {},
"source": [
"```python\n",
"# Access data array\n",
"data = a.data\n",
"# Access indices array\n",
"indices = a.indices\n",
"{'a.stype': a.stype, 'data':data, 'indices':indices}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "2a07aa49",
"metadata": {},
"source": [
"```\n",
"{'a.stype': 'row_sparse', 'data':\n",
" [[ 1. 2.]\n",
" [ 3. 4.]]\n",
" <NDArray 2x2 @cpu(0)>, 'indices':\n",
" [1 4]\n",
" <NDArray 2 @cpu(0)>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "72442129",
"metadata": {},
"source": [
"## Storage Type Conversion\n",
"\n",
"You can convert an NDArray to a RowSparseNDArray and vice versa by using the `tostype` function:"
]
},
{
"cell_type": "markdown",
"id": "dae1e7e3",
"metadata": {},
"source": [
"```python\n",
"# Create a dense NDArray\n",
"ones = mx.nd.ones((2,2))\n",
"# Cast the storage type from `default` to `row_sparse`\n",
"rsp = ones.tostype('row_sparse')\n",
"# Cast the storage type from `row_sparse` to `default`\n",
"dense = rsp.tostype('default')\n",
"{'rsp':rsp, 'dense':dense}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "c2d2a9d3",
"metadata": {},
"source": [
"```\n",
"{'dense':\n",
" [[ 1. 1.]\n",
" [ 1. 1.]]\n",
" <NDArray 2x2 @cpu(0)>, 'rsp':\n",
" <RowSparseNDArray 2x2 @cpu(0)>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "58260325",
"metadata": {},
"source": [
"You can also convert the storage type by using the `cast_storage` operator:"
]
},
{
"cell_type": "markdown",
"id": "3c752cea",
"metadata": {},
"source": [
"```python\n",
"# Create a dense NDArray\n",
"ones = mx.nd.ones((2,2))\n",
"# Cast the storage type to `row_sparse`\n",
"rsp = mx.nd.sparse.cast_storage(ones, 'row_sparse')\n",
"# Cast the storage type to `default`\n",
"dense = mx.nd.sparse.cast_storage(rsp, 'default')\n",
"{'rsp':rsp, 'dense':dense}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "045b3b66",
"metadata": {},
"source": [
"```\n",
"{'dense':\n",
" [[ 1. 1.]\n",
" [ 1. 1.]]\n",
" <NDArray 2x2 @cpu(0)>, 'rsp':\n",
" <RowSparseNDArray 2x2 @cpu(0)>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "c89435d2",
"metadata": {},
"source": [
"## Copies\n",
"\n",
"You can use the `copy` method which makes a deep copy of the array and its data, and returns a new array.\n",
"We can also use the `copyto` method or the slice operator `[]` to deep copy to an existing array."
]
},
{
"cell_type": "markdown",
"id": "1a72b335",
"metadata": {},
"source": [
"```python\n",
"a = mx.nd.ones((2,2)).tostype('row_sparse')\n",
"b = a.copy()\n",
"c = mx.nd.sparse.zeros('row_sparse', (2,2))\n",
"c[:] = a\n",
"d = mx.nd.sparse.zeros('row_sparse', (2,2))\n",
"a.copyto(d)\n",
"{'b is a': b is a, 'b.asnumpy()':b.asnumpy(), 'c.asnumpy()':c.asnumpy(), 'd.asnumpy()':d.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "45ab1d1d",
"metadata": {},
"source": [
"```\n",
"{'b is a': False, 'b.asnumpy()': array([[ 1., 1.],\n",
" [ 1., 1.]], dtype=float32), 'c.asnumpy()': array([[ 1., 1.],\n",
" [ 1., 1.]], dtype=float32), 'd.asnumpy()': array([[ 1., 1.],\n",
" [ 1., 1.]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6602105c",
"metadata": {},
"source": [
"If the storage types of source array and destination array do not match,\n",
"the storage type of destination array will not change when copying with `copyto` or the slice operator `[]`. The source array will be temporarily converted to desired storage type before the copy."
]
},
{
"cell_type": "markdown",
"id": "d87156b9",
"metadata": {},
"source": [
"```python\n",
"e = mx.nd.sparse.zeros('row_sparse', (2,2))\n",
"f = mx.nd.sparse.zeros('row_sparse', (2,2))\n",
"g = mx.nd.ones(e.shape)\n",
"e[:] = g\n",
"g.copyto(f)\n",
"{'e.stype':e.stype, 'f.stype':f.stype, 'g.stype':g.stype}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "03795c7e",
"metadata": {},
"source": [
"`{'e.stype': 'row_sparse', 'f.stype': 'row_sparse', 'g.stype': 'default'}`<!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"## Retain Row Slices\n",
"\n",
"You can retain a subset of row slices from a RowSparseNDArray specified by their row indices."
]
},
{
"cell_type": "markdown",
"id": "9fc9d307",
"metadata": {},
"source": [
"```python\n",
"data = [[1, 2], [3, 4], [5, 6]]\n",
"indices = [0, 2, 3]\n",
"rsp = mx.nd.sparse.row_sparse_array((data, indices), shape=(5, 2))\n",
"# Retain row 0 and row 1\n",
"rsp_retained = mx.nd.sparse.retain(rsp, mx.nd.array([0, 1]))\n",
"{'rsp.asnumpy()': rsp.asnumpy(), 'rsp_retained': rsp_retained, 'rsp_retained.asnumpy()': rsp_retained.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "76c3631a",
"metadata": {},
"source": [
"```\n",
"{'rsp.asnumpy()': array([[ 1., 2.],\n",
" [ 0., 0.],\n",
" [ 3., 4.],\n",
" [ 5., 6.],\n",
" [ 0., 0.]], dtype=float32), 'rsp_retained':\n",
" <RowSparseNDArray 5x2 @cpu(0)>, 'rsp_retained.asnumpy()': array([[ 1., 2.],\n",
" [ 0., 0.],\n",
" [ 0., 0.],\n",
" [ 0., 0.],\n",
" [ 0., 0.]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "e4c229fa",
"metadata": {},
"source": [
"## Sparse Operators and Storage Type Inference\n",
"\n",
"Operators that have specialized implementation for sparse arrays can be accessed in ``mx.nd.sparse``. You can read the [mxnet.ndarray.sparse API documentation](http://mxnet.apache.org/api/python/ndarray/sparse.html) to find what sparse operators are available."
]
},
{
"cell_type": "markdown",
"id": "f4f8e349",
"metadata": {},
"source": [
"```python\n",
"shape = (3, 5)\n",
"data = [7, 8, 9]\n",
"indptr = [0, 2, 2, 3]\n",
"indices = [0, 2, 1]\n",
"# A csr matrix as lhs\n",
"lhs = mx.nd.sparse.csr_matrix((data, indices, indptr), shape=shape)\n",
"# A dense matrix as rhs\n",
"rhs = mx.nd.ones((3, 2))\n",
"# row_sparse result is inferred from sparse operator dot(csr.T, dense) based on input stypes\n",
"transpose_dot = mx.nd.sparse.dot(lhs, rhs, transpose_a=True)\n",
"{'transpose_dot': transpose_dot, 'transpose_dot.asnumpy()': transpose_dot.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "44fa267d",
"metadata": {},
"source": [
"```\n",
"{'transpose_dot':\n",
" <RowSparseNDArray 5x2 @cpu(0)>, 'transpose_dot.asnumpy()': array([[ 7., 7.],\n",
" [ 9., 9.],\n",
" [ 8., 8.],\n",
" [ 0., 0.],\n",
" [ 0., 0.]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "3f4ceb8f",
"metadata": {},
"source": [
"For any sparse operator, the storage type of output array is inferred based on inputs. You can either read the documentation or inspect the `stype` attribute of output array to know what storage type is inferred:"
]
},
{
"cell_type": "markdown",
"id": "753e9fd3",
"metadata": {},
"source": [
"```python\n",
"a = transpose_dot.copy()\n",
"b = a * 2 # b will be a RowSparseNDArray since zero multiplied by 2 is still zero\n",
"c = a + mx.nd.ones((5, 2)) # c will be a dense NDArray\n",
"{'b.stype':b.stype, 'c.stype':c.stype}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "91913771",
"metadata": {},
"source": [
"`{'b.stype': 'row_sparse', 'c.stype': 'default'}`<!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"For operators that don't specialize in sparse arrays, you can still use them with sparse inputs with some performance penalty.\n",
"In MXNet, dense operators require all inputs and outputs to be in the dense format.\n",
"\n",
"If sparse inputs are provided, MXNet will convert sparse inputs into dense ones temporarily so that the dense operator can be used.\n",
"\n",
"If sparse outputs are provided, MXNet will convert the dense outputs generated by the dense operator into the provided sparse format.\n",
"\n",
"For operators that don't specialize in sparse arrays, you can still use them with sparse inputs with some performance penalty."
]
},
{
"cell_type": "markdown",
"id": "9eb7825f",
"metadata": {},
"source": [
"```python\n",
"e = mx.nd.sparse.zeros('row_sparse', a.shape)\n",
"d = mx.nd.log(a) # dense operator with a sparse input\n",
"e = mx.nd.log(a, out=e) # dense operator with a sparse output\n",
"{'a.stype':a.stype, 'd.stype':d.stype, 'e.stype':e.stype} # stypes of a and e will be not changed\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "af662085",
"metadata": {},
"source": [
"`{'a.stype': 'row_sparse', 'd.stype': 'default', 'e.stype': 'row_sparse'}` <!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"Note that warning messages will be printed when such a storage fallback event happens. If you are using jupyter notebook, the warning message will be printed in your terminal console.\n",
"\n",
"## Sparse Optimizers\n",
"\n",
"In MXNet, sparse gradient updates are applied when gradient is in `row_sparse` storage and the optimizer is created with `lazy_update=True`.\n",
"The sparse optimizers only update the row slices of the weight and the states whose indices appear\n",
"in `gradient.indices`. For example, the default update rule for SGD optimizer is:"
]
},
{
"cell_type": "markdown",
"id": "3085a095",
"metadata": {},
"source": [
"```\n",
"rescaled_grad = learning_rate * rescale_grad * clip(grad, clip_gradient) + weight_decay * weight\n",
"state = momentum * state + rescaled_grad\n",
"weight = weight - state\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "5e86b2ce",
"metadata": {},
"source": [
"However, with sparse gradient the SGD optimizer uses the following lazy update by default:"
]
},
{
"cell_type": "markdown",
"id": "8bf8f439",
"metadata": {},
"source": [
"```\n",
"for row in grad.indices:\n",
" rescaled_grad[row] = learning_rate * rescale_grad * clip(grad[row], clip_gradient) + weight_decay * weight[row]\n",
" state[row] = momentum[row] * state[row] + rescaled_grad[row]\n",
" weight[row] = weight[row] - state[row]\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "31120202",
"metadata": {},
"source": [
"This means that the lazy update leads to different optimization results if `weight_decay` or `momentum` is non-zero.\n",
"To disable lazy update, please set `lazy_update` to be False when creating the optimizer."
]
},
{
"cell_type": "markdown",
"id": "a6dbbf5d",
"metadata": {},
"source": [
"```python\n",
"# Create weight\n",
"shape = (4, 2)\n",
"weight = mx.nd.ones(shape).tostype('row_sparse')\n",
"# Create gradient\n",
"data = [[1, 2], [4, 5]]\n",
"indices = [1, 2]\n",
"grad = mx.nd.sparse.row_sparse_array((data, indices), shape=shape)\n",
"sgd = mx.optimizer.SGD(learning_rate=0.01, momentum=0.01)\n",
"# Create momentum\n",
"momentum = sgd.create_state(0, weight)\n",
"# Before the update\n",
"{\"grad.asnumpy()\":grad.asnumpy(), \"weight.asnumpy()\":weight.asnumpy(), \"momentum.asnumpy()\":momentum.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "85225832",
"metadata": {},
"source": [
"```\n",
"{'grad.asnumpy()': array([[ 0., 0.],\n",
" [ 1., 2.],\n",
" [ 4., 5.],\n",
" [ 0., 0.]], dtype=float32), 'momentum.asnumpy()': array([[ 0., 0.],\n",
" [ 0., 0.],\n",
" [ 0., 0.],\n",
" [ 0., 0.]], dtype=float32), 'weight.asnumpy()': array([[ 1., 1.],\n",
" [ 1., 1.],\n",
" [ 1., 1.],\n",
" [ 1., 1.]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6ac193fd",
"metadata": {},
"source": [
"```python\n",
"sgd.update(0, weight, grad, momentum)\n",
"# Only row 0 and row 2 are updated for both weight and momentum\n",
"{\"weight.asnumpy()\":weight.asnumpy(), \"momentum.asnumpy()\":momentum.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "5d9b0334",
"metadata": {},
"source": [
"```\n",
"{'momentum.asnumpy()': array([[ 0. , 0. ],\n",
" [-0.01, -0.02],\n",
" [-0.04, -0.05],\n",
" [ 0. , 0. ]], dtype=float32),\n",
" 'weight.asnumpy()': array([[ 1. , 1. ],\n",
" [ 0.99000001, 0.98000002],\n",
" [ 0.95999998, 0.94999999],\n",
" [ 1. , 1. ]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "3a9cb015",
"metadata": {},
"source": [
"Note that only [mxnet.optimizer.SGD](https://mxnet.apache.org/api/python/optimization/optimization.html#mxnet.optimizer.SGD), [mxnet.optimizer.Adam](https://mxnet.apache.org/api/python/optimization/optimization.html#mxnet.optimizer.Adam), and\n",
"[mxnet.optimizer.AdaGrad](https://mxnet.apache.org/api/python/optimization/optimization.html#mxnet.optimizer.AdaGrad) support sparse updates in MXNet.\n",
"\n",
"## Advanced Topics\n",
"\n",
"### GPU Support\n",
"\n",
"By default, RowSparseNDArray operators are executed on CPU. To create a RowSparseNDArray on gpu, we need to explicitly specify the context:\n",
"\n",
"**Note** If a GPU is not available, an error will be reported in the following section. In order to execute it on a cpu, set gpu_device to mx.cpu()."
]
},
{
"cell_type": "markdown",
"id": "0c564eea",
"metadata": {},
"source": [
"```python\n",
"import sys\n",
"gpu_device=mx.gpu() # Change this to mx.cpu() in absence of GPUs.\n",
"try:\n",
" a = mx.nd.sparse.zeros('row_sparse', (100, 100), ctx=gpu_device)\n",
" a\n",
"except mx.MXNetError as err:\n",
" sys.stderr.write(str(err))\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "438e3bac",
"metadata": {},
"source": [
"## Next\n",
"\n",
"[Train a Linear Regression Model with Sparse Symbols](/api/python/docs/tutorials/packages/ndarray/sparse/train.html)\n",
"\n",
"\n",
"<!-- INSERT SOURCE DOWNLOAD BUTTONS -->"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}