{
"cells": [
{
"cell_type": "markdown",
"id": "4325ac4f",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"\n",
"# Train a Linear Regression Model with Sparse Symbols\n",
"In previous tutorials, we introduced `CSRNDArray` and `RowSparseNDArray`,\n",
"the basic data structures for manipulating sparse data.\n",
"MXNet also provides `Sparse Symbol` API, which enables symbolic expressions that handle sparse arrays.\n",
"In this tutorial, we first focus on how to compose a symbolic graph with sparse operators,\n",
"then train a linear regression model using sparse symbols with the Module API.\n",
"\n",
"## Prerequisites\n",
"\n",
"To complete this tutorial, we need:\n",
"\n",
"- MXNet. See the instructions for your operating system in [Setup and Installation](/get_started). \n",
"\n",
"- [Jupyter Notebook](https://jupyter.org/index.html) and [Python Requests](https://3.python-requests.org/) packages."
]
},
{
"cell_type": "markdown",
"id": "645a58f2",
"metadata": {},
"source": [
"```\n",
"pip install jupyter requests\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "e9a31655",
"metadata": {},
"source": [
"- Basic knowledge of Symbol in MXNet. See the detailed tutorial for Symbol in [Symbol - Neural Network Graphs and Auto-differentiation](https://mxnet.apache.org/tutorials/basic/symbol.html).\n",
"\n",
"- Basic knowledge of CSRNDArray in MXNet. See the detailed tutorial for CSRNDArray in [CSRNDArray - NDArray in Compressed Sparse Row Storage Format](/api/python/docs/tutorials/packages/ndarray/sparse/csr.html).\n",
"\n",
"- Basic knowledge of RowSparseNDArray in MXNet. See the detailed tutorial for RowSparseNDArray in [RowSparseNDArray - NDArray for Sparse Gradient Updates](/api/python/docs/tutorials/packages/ndarray/sparse/row_sparse.html).\n",
"\n",
"## Variables\n",
"\n",
"Variables are placeholder for arrays. We can use them to hold sparse arrays too.\n",
"\n",
"### Variable Storage Types\n",
"\n",
"The `stype` attribute of a variable is used to indicate the storage type of the array.\n",
"By default, the `stype` of a variable is \"default\" which indicates the default dense storage format.\n",
"We can specify the `stype` of a variable as \"csr\" or \"row_sparse\" to hold sparse arrays."
]
},
{
"cell_type": "markdown",
"id": "2d135cf8",
"metadata": {},
"source": [
"```python\n",
"import mxnet as mx\n",
"import numpy as np\n",
"import random\n",
"\n",
"# set the seeds for repeatability\n",
"random.seed(42)\n",
"np.random.seed(42)\n",
"mx.random.seed(42)\n",
"\n",
"# Create a variable to hold an NDArray\n",
"a = mx.sym.Variable('a')\n",
"# Create a variable to hold a CSRNDArray\n",
"b = mx.sym.Variable('b', stype='csr')\n",
"# Create a variable to hold a RowSparseNDArray\n",
"c = mx.sym.Variable('c', stype='row_sparse')\n",
"(a, b, c)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "0cb2e1c0",
"metadata": {},
"source": [
"`(<Symbol a>, <Symbol b>, <Symbol c>)` <!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"### Bind with Sparse Arrays\n",
"\n",
"The sparse symbols constructed above declare storage types of the arrays to hold.\n",
"To evaluate them, we need to feed the free variables with sparse data.\n",
"\n",
"You can instantiate an executor from a sparse symbol by using the `simple_bind` method,\n",
"which allocate zeros to all free variables according to their storage types.\n",
"The executor provides `forward` method for evaluation and an attribute\n",
"`outputs` to get all the results. Later, we will show the use of the `backward` method and other methods computing the gradients and updating parameters. A simple example first:"
]
},
{
"cell_type": "markdown",
"id": "23a5e415",
"metadata": {},
"source": [
"```python\n",
"shape = (2,2)\n",
"# Instantiate an executor from sparse symbols\n",
"b_exec = b.simple_bind(ctx=mx.cpu(), b=shape)\n",
"c_exec = c.simple_bind(ctx=mx.cpu(), c=shape)\n",
"b_exec.forward()\n",
"c_exec.forward()\n",
"# Sparse arrays of zeros are bound to b and c\n",
"print(b_exec.outputs, c_exec.outputs)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "492149ca",
"metadata": {},
"source": [
"```\n",
"([\n",
"<CSRNDArray 2x2 @cpu(0)>], [\n",
"<RowSparseNDArray 2x2 @cpu(0)>])\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "b4898449",
"metadata": {},
"source": [
"You can update the array held by the variable by accessing executor's `arg_dict` and assigning new values."
]
},
{
"cell_type": "markdown",
"id": "cf3e7e67",
"metadata": {},
"source": [
"```python\n",
"b_exec.arg_dict['b'][:] = mx.nd.ones(shape).tostype('csr')\n",
"b_exec.forward()\n",
"# The array `b` holds are updated to be ones\n",
"eval_b = b_exec.outputs[0]\n",
"{'eval_b': eval_b, 'eval_b.asnumpy()': eval_b.asnumpy()}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "b276fad0",
"metadata": {},
"source": [
"```\n",
"{'eval_b':\n",
" <CSRNDArray 2x2 @cpu(0)>, 'eval_b.asnumpy()': array([[ 1., 1.],\n",
" [ 1., 1.]], dtype=float32)}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "e5d4acff",
"metadata": {},
"source": [
"## Symbol Composition and Storage Type Inference\n",
"\n",
"### Basic Symbol Composition\n",
"\n",
"The following example builds a simple element-wise addition expression with different storage types.\n",
"The sparse symbols are available in the `mx.sym.sparse` package."
]
},
{
"cell_type": "markdown",
"id": "3e6a739d",
"metadata": {},
"source": [
"```python\n",
"# Element-wise addition of variables with \"default\" stype\n",
"d = mx.sym.elemwise_add(a, a)\n",
"# Element-wise addition of variables with \"csr\" stype\n",
"e = mx.sym.sparse.negative(b)\n",
"# Element-wise addition of variables with \"row_sparse\" stype\n",
"f = mx.sym.sparse.elemwise_add(c, c)\n",
"{'d':d, 'e':e, 'f':f}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "71349473",
"metadata": {},
"source": [
"```\n",
"{'d': <Symbol elemwise_add0>,\n",
" 'e': <Symbol negative0>,\n",
" 'f': <Symbol elemwise_add1>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "f9b6c177",
"metadata": {},
"source": [
"### Storage Type Inference\n",
"\n",
"What will be the output storage types of sparse symbols? In MXNet, for any sparse symbol, the result storage types are inferred based on storage types of inputs.\n",
"You can read the [Sparse Symbol API](/api/python/docs/api/symbol/sparse/index.html) documentation to find what output storage types are. In the example below we will try out the storage types introduced in the Row Sparse and Compressed Sparse Row tutorials: `default` (dense), `csr`, and `row_sparse`."
]
},
{
"cell_type": "markdown",
"id": "d97c7752",
"metadata": {},
"source": [
"```python\n",
"add_exec = mx.sym.Group([d, e, f]).simple_bind(ctx=mx.cpu(), a=shape, b=shape, c=shape)\n",
"add_exec.forward()\n",
"dense_add = add_exec.outputs[0]\n",
"# The output storage type of elemwise_add(csr, csr) will be inferred as \"csr\"\n",
"csr_add = add_exec.outputs[1]\n",
"# The output storage type of elemwise_add(row_sparse, row_sparse) will be inferred as \"row_sparse\"\n",
"rsp_add = add_exec.outputs[2]\n",
"{'dense_add.stype': dense_add.stype, 'csr_add.stype':csr_add.stype, 'rsp_add.stype': rsp_add.stype}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "4dc4199c",
"metadata": {},
"source": [
"```\n",
"{'csr_add.stype': 'csr',\n",
" 'dense_add.stype': 'default',\n",
" 'rsp_add.stype': 'row_sparse'}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "d92876e6",
"metadata": {},
"source": [
"### Storage Type Fallback\n",
"\n",
"For operators that don't specialize in certain sparse arrays, you can still use them with sparse inputs with some performance penalty. In MXNet, dense operators require all inputs and outputs to be in the dense format. If sparse inputs are provided, MXNet will convert sparse inputs into dense ones temporarily so that the dense operator can be used. If sparse outputs are provided, MXNet will convert the dense outputs generated by the dense operator into the provided sparse format. Warning messages will be printed when such a storage fallback event happens."
]
},
{
"cell_type": "markdown",
"id": "7700254c",
"metadata": {},
"source": [
"```python\n",
"# `log` operator doesn't support sparse inputs at all, but we can fallback on the dense implementation\n",
"csr_log = mx.sym.log(a)\n",
"# `elemwise_add` operator doesn't support adding csr with row_sparse, but we can fallback on the dense implementation\n",
"csr_rsp_add = mx.sym.elemwise_add(b, c)\n",
"fallback_exec = mx.sym.Group([csr_rsp_add, csr_log]).simple_bind(ctx=mx.cpu(), a=shape, b=shape, c=shape)\n",
"fallback_exec.forward()\n",
"fallback_add = fallback_exec.outputs[0]\n",
"fallback_log = fallback_exec.outputs[1]\n",
"{'fallback_add': fallback_add, 'fallback_log': fallback_log}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6f8dee09",
"metadata": {},
"source": [
"```\n",
"{'fallback_add':\n",
" [[ 0. 0.]\n",
" [ 0. 0.]]\n",
" <NDArray 2x2 @cpu(0)>, 'fallback_log':\n",
" [[-inf -inf]\n",
" [-inf -inf]]\n",
" <NDArray 2x2 @cpu(0)>}\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6da8f5d1",
"metadata": {},
"source": [
"### Inspecting Storage Types of the Symbol Graph\n",
"\n",
"When the environment variable `MXNET_INFER_STORAGE_TYPE_VERBOSE_LOGGING` is set to `1`, MXNet will log the storage type information of\n",
"operators' inputs and outputs in the computation graph. For example, we can inspect the storage types of\n",
"a linear classification network with sparse operators. Uncomment the line below and inspect your console.:"
]
},
{
"cell_type": "markdown",
"id": "c9e9499b",
"metadata": {},
"source": [
"```python\n",
"# Set logging level for executor\n",
"import mxnet as mx\n",
"import os\n",
"#os.environ['MXNET_INFER_STORAGE_TYPE_VERBOSE_LOGGING'] = \"1\"\n",
"# Data in csr format\n",
"data = mx.sym.var('data', stype='csr', shape=(32, 10000))\n",
"# Weight in row_sparse format\n",
"weight = mx.sym.var('weight', stype='row_sparse', shape=(10000, 2))\n",
"bias = mx.symbol.Variable(\"bias\", shape=(2,))\n",
"dot = mx.symbol.sparse.dot(data, weight)\n",
"pred = mx.symbol.broadcast_add(dot, bias)\n",
"y = mx.symbol.Variable(\"label\")\n",
"output = mx.symbol.SoftmaxOutput(data=pred, label=y, name=\"output\")\n",
"executor = output.simple_bind(ctx=mx.cpu())\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "550aa855",
"metadata": {},
"source": [
"## Training with Module APIs\n",
"\n",
"In the following section we'll walk through how one can implement **linear regression** using sparse symbols and sparse optimizers.\n",
"\n",
"The function you will explore is: *y = x<sub>1</sub> + 2x<sub>2</sub> + ... 100x<sub>100*, where *(x<sub>1</sub>,x<sub>2</sub>, ..., x<sub>100</sub>)* are input features and *y* is the corresponding label.\n",
"\n",
"### Preparing the Data\n",
"\n",
"In MXNet, both [mx.io.LibSVMIter](/api/python/docs/api/mxnet/io/index.html#mxnet.io.LibSVMIter)\n",
"and [mx.io.NDArrayIter](/api/python/docs/api/mxnet/io/index.html#mxnet.io.NDArrayIter)\n",
"support loading sparse data in CSR format. In this example, we'll use the `NDArrayIter`.\n",
"\n",
"You may see some warnings from SciPy. You don't need to worry about those for this example."
]
},
{
"cell_type": "markdown",
"id": "45032b16",
"metadata": {},
"source": [
"```python\n",
"# Random training data\n",
"feature_dimension = 100\n",
"train_data = mx.test_utils.rand_ndarray((1000, feature_dimension), 'csr', 0.01)\n",
"target_weight = mx.nd.arange(1, feature_dimension + 1).reshape((feature_dimension, 1))\n",
"train_label = mx.nd.dot(train_data, target_weight)\n",
"batch_size = 1\n",
"train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size, last_batch_handle='discard', label_name='label')\n",
"```\n"
]
},
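{
"cell_type": "markdown",
"id": "7a1e4c90",
"metadata": {},
"source": [
"To make the target function concrete, the label of a single sparse sample can be worked out by hand. The following is a minimal pure-Python sketch, not MXNet code: the index-to-value dictionary `sample` and the helper `label_for` are hypothetical names introduced only for illustration, and feature indices are assumed to be 1-based to match the formula above.\n",
"\n",
"```python\n",
"# Hypothetical sparse sample: 1-based feature index -> non-zero value\n",
"sample = {3: 0.5, 40: 2.0}\n",
"\n",
"def label_for(sample):\n",
"    # The target weight of feature x_i is i, so only non-zero features contribute\n",
"    return sum(i * value for i, value in sample.items())\n",
"\n",
"y = label_for(sample)\n",
"print(y)  # 3 * 0.5 + 40 * 2.0 = 81.5\n",
"```\n",
"\n",
"Because absent features contribute nothing, the cost of computing a label depends on the number of non-zero entries rather than on `feature_dimension`."
]
},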
{
"cell_type": "markdown",
"id": "5c1681e2",
"metadata": {},
"source": [
"### Defining the Model\n",
"\n",
"Below is an example of a linear regression model specifying the storage type of the variables."
]
},
{
"cell_type": "markdown",
"id": "cf15386b",
"metadata": {},
"source": [
"```python\n",
"initializer = mx.initializer.Normal(sigma=0.01)\n",
"X = mx.sym.Variable('data', stype='csr')\n",
"Y = mx.symbol.Variable('label')\n",
"weight = mx.symbol.Variable('weight', stype='row_sparse', shape=(feature_dimension, 1), init=initializer)\n",
"bias = mx.symbol.Variable('bias', shape=(1, ))\n",
"pred = mx.sym.broadcast_add(mx.sym.sparse.dot(X, weight), bias)\n",
"lro = mx.sym.LinearRegressionOutput(data=pred, label=Y, name=\"lro\")\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "064302d3",
"metadata": {},
"source": [
"The above network uses the following symbols:\n",
"\n",
"1. `Variable X`: The placeholder for sparse data inputs. The `csr` stype indicates that the array to hold is in CSR format.\n",
"\n",
"2. `Variable Y`: The placeholder for dense labels.\n",
"\n",
"3. `Variable weight`: The placeholder for the weight to learn. The `stype` of weight is specified as `row_sparse` so that it is initialized as RowSparseNDArray,\n",
" and the optimizer will perform sparse update rules on it. The `init` attribute specifies what initializer to use for this variable.\n",
"\n",
"4. `Variable bias`: The placeholder for the bias to learn.\n",
"\n",
"5. `sparse.dot`: The dot product operation of `X` and `weight`. The sparse implementation will be invoked to handle `csr` and `row_sparse` inputs.\n",
"\n",
"6. `broadcast_add`: The broadcasting add operation to apply `bias`.\n",
"\n",
"7. `LinearRegressionOutput`: The output layer which computes *l2* loss against its input and the labels provided to it.\n",
"\n",
"### Training the model\n",
"\n",
"Once we have defined the model structure, the next step is to create a module and initialize the parameters and optimizer."
]
},
{
"cell_type": "markdown",
"id": "4bf0a7e4",
"metadata": {},
"source": [
"```python\n",
"# Create module\n",
"mod = mx.mod.Module(symbol=lro, data_names=['data'], label_names=['label'])\n",
"# Allocate memory by giving the input data and label shapes\n",
"mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)\n",
"# Initialize parameters by random numbers\n",
"mod.init_params(initializer=initializer)\n",
"# Use SGD as the optimizer, which performs sparse update on \"row_sparse\" weight\n",
"sgd = mx.optimizer.SGD(learning_rate=0.05, rescale_grad=1.0/batch_size, momentum=0.9)\n",
"mod.init_optimizer(optimizer=sgd)\n",
"```\n"
]
},
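{
"cell_type": "markdown",
"id": "c53d08be",
"metadata": {},
"source": [
"The sparse update mentioned in the comment above can be pictured as follows. This is a simplified pure-Python sketch under stated assumptions, not MXNet's implementation: it ignores momentum and weight decay, and `sparse_sgd_update`, `weight`, and `grad_rows` are hypothetical names. The gradient is assumed to be stored as a mapping from row index to gradient row, in the spirit of a `row_sparse` array, so only rows that appear in the gradient are touched.\n",
"\n",
"```python\n",
"def sparse_sgd_update(weight, grad_rows, learning_rate):\n",
"    # weight: list of rows (each a list of floats), updated in place\n",
"    # grad_rows: dict mapping row index -> gradient row, holding only non-zero rows\n",
"    for row, grad in grad_rows.items():\n",
"        weight[row] = [w - learning_rate * g for w, g in zip(weight[row], grad)]\n",
"    return weight\n",
"\n",
"weight = [[1.0], [1.0], [1.0], [1.0]]\n",
"grad_rows = {1: [2.0], 3: [-4.0]}  # rows 0 and 2 have no gradient\n",
"updated = sparse_sgd_update(weight, grad_rows, learning_rate=0.5)\n",
"print(updated)  # [[1.0], [0.0], [1.0], [3.0]]\n",
"```\n",
"\n",
"With a batch size of 1 and very sparse inputs, as in this tutorial, most rows of the weight receive no gradient in a given step, which is where skipping them can save work."
]
},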
{
"cell_type": "markdown",
"id": "92a05280",
"metadata": {},
"source": [
"Finally, we train the parameters of the model to fit the training data by using the `forward`, `backward`, and `update` methods in Module."
]
},
{
"cell_type": "markdown",
"id": "598e24df",
"metadata": {},
"source": [
"```python\n",
"# Use mean square error as the metric\n",
"metric = mx.metric.create('MSE')\n",
"# Train 10 epochs\n",
"for epoch in range(10):\n",
" train_iter.reset()\n",
" metric.reset()\n",
" for batch in train_iter:\n",
" mod.forward(batch, is_train=True) # compute predictions\n",
" mod.update_metric(metric, batch.label) # accumulate prediction accuracy\n",
" mod.backward() # compute gradients\n",
" mod.update() # update parameters\n",
" print('Epoch %d, Metric = %s' % (epoch, metric.get()))\n",
"assert metric.get()[1] < 1, \"Achieved MSE (%f) is larger than expected (1.0)\" % metric.get()[1] \n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "6b965772",
"metadata": {},
"source": [
"`Epoch 9, Metric = ('mse', 0.35979430613957991)`<!--notebook-skip-line-->\n",
"\n",
"\n",
"\n",
"\n",
"### Training the model with multiple machines or multiple devices\n",
"\n",
"Distributed training with `row_sparse` weights and gradients are supported in MXNet, which significantly reduces communication cost for large models. To train a sparse model with multiple machines, you need to call `prepare` before `forward`, or `save_checkpoint`.\n",
"Please refer to the example in [mxnet/example/sparse/linear_classification](https://github.com/apache/mxnet/tree/master/example/sparse/linear_classification)\n",
"for more details.\n",
"\n",
"<!-- INSERT SOURCE DOWNLOAD BUTTONS -->"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}