{"nbformat": 4, "cells": [{"source": "# Train a Linear Regression Model with Sparse Symbols\nIn previous tutorials, we introduced `CSRNDArray` and `RowSparseNDArray`,\nthe basic data structures for manipulating sparse data.\nMXNet also provides `Sparse Symbol` API, which enables symbolic expressions that handle sparse arrays.\nIn this tutorial, we first focus on how to compose a symbolic graph with sparse operators,\nthen train a linear regression model using sparse symbols with the Module API.\n\n## Prerequisites\n\nTo complete this tutorial, we need:\n\n- MXNet. See the instructions for your operating system in [Setup and Installation](http://mxnet.io/install/index.html). \n\n- [Jupyter Notebook](http://jupyter.org/index.html) and [Python Requests](http://docs.python-requests.org/en/master/) packages.\n```\npip install jupyter requests\n```\n\n- Basic knowledge of Symbol in MXNet. See the detailed tutorial for Symbol in [Symbol - Neural Network Graphs and Auto-differentiation](https://mxnet.incubator.apache.org/tutorials/basic/symbol.html).\n\n- Basic knowledge of CSRNDArray in MXNet. See the detailed tutorial for CSRNDArray in [CSRNDArray - NDArray in Compressed Sparse Row Storage Format](https://mxnet.incubator.apache.org/versions/master/tutorials/sparse/csr.html).\n\n- Basic knowledge of RowSparseNDArray in MXNet. See the detailed tutorial for RowSparseNDArray in [RowSparseNDArray - NDArray for Sparse Gradient Updates](https://mxnet.incubator.apache.org/versions/master/tutorials/sparse/row_sparse.html).\n\n## Variables\n\nVariables are placeholder for arrays. We can use them to hold sparse arrays too.\n\n### Variable Storage Types\n\nThe `stype` attribute of a variable is used to indicate the storage type of the array.\nBy default, the `stype` of a variable is \"default\" which indicates the default dense storage format.\nWe can specify the `stype` of a variable as \"csr\" or \"row_sparse\" to hold sparse arrays.", "cell_type": "markdown", "metadata": {}}, {"source": "import mxnet as mx\n# Create a variable to hold an NDArray\na = mx.sym.Variable('a')\n# Create a variable to hold a CSRNDArray\nb = mx.sym.Variable('b', stype='csr')\n# Create a variable to hold a RowSparseNDArray\nc = mx.sym.Variable('c', stype='row_sparse')\n(a, b, c)", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": " (<Symbol a>, <Symbol b>, <Symbol c>)\n\n\n\n### Bind with Sparse Arrays\n\nThe sparse symbols constructed above declare storage types of the arrays to hold.\nTo evaluate them, we need to feed the free variables with sparse data.\n\nYou can instantiate an executor from a sparse symbol by using the `simple_bind` method,\nwhich allocate zeros to all free variables according to their storage types.\nThe executor provides `forward` method for evaluation and an attribute\n`outputs` to get all the results. Later, we will show the use of the `backward` method and other methods computing the gradients and updating parameters. 
Let's start with a simple example:

```python
shape = (2, 2)
# Instantiate executors from the sparse symbols
b_exec = b.simple_bind(ctx=mx.cpu(), b=shape)
c_exec = c.simple_bind(ctx=mx.cpu(), c=shape)
b_exec.forward()
c_exec.forward()
# Sparse arrays of zeros are bound to b and c
print(b_exec.outputs, c_exec.outputs)
```

    ([
    <CSRNDArray 2x2 @cpu(0)>], [
    <RowSparseNDArray 2x2 @cpu(0)>])

You can update the array held by a variable by accessing the executor's `arg_dict` and assigning new values.

```python
b_exec.arg_dict['b'][:] = mx.nd.ones(shape).tostype('csr')
b_exec.forward()
# The array held by `b` is now all ones
eval_b = b_exec.outputs[0]
{'eval_b': eval_b, 'eval_b.asnumpy()': eval_b.asnumpy()}
```

    {'eval_b': 
     <CSRNDArray 2x2 @cpu(0)>, 'eval_b.asnumpy()': array([[ 1.,  1.],
            [ 1.,  1.]], dtype=float32)}

## Symbol Composition and Storage Type Inference

### Basic Symbol Composition

The following example builds simple element-wise expressions from variables with different storage types.
The sparse operators are available in the `mx.sym.sparse` package.

```python
# Element-wise addition of variables with "default" stype
d = mx.sym.elemwise_add(a, a)
# Element-wise negation of a variable with "csr" stype
e = mx.sym.sparse.negative(b)
# Element-wise addition of variables with "row_sparse" stype
f = mx.sym.sparse.elemwise_add(c, c)
{'d': d, 'e': e, 'f': f}
```

    {'d': <Symbol elemwise_add0>,
     'e': <Symbol negative0>,
     'f': <Symbol elemwise_add1>}

### Storage Type Inference

What will the output storage types of sparse symbols be? In MXNet, for any sparse symbol, the result storage types are inferred based on the storage types of the inputs.
You can read the [Sparse Symbol API](http://mxnet.io/versions/master/api/python/symbol/sparse.html) documentation to find out what the output storage types are. In the example below we will try out the storage types introduced in the Row Sparse and Compressed Sparse Row tutorials: `default` (dense), `csr`, and `row_sparse`.

```python
add_exec = mx.sym.Group([d, e, f]).simple_bind(ctx=mx.cpu(), a=shape, b=shape, c=shape)
add_exec.forward()
# The output storage type of elemwise_add(default, default) is inferred as "default"
dense_add = add_exec.outputs[0]
# The output storage type of negative(csr) is inferred as "csr"
csr_add = add_exec.outputs[1]
# The output storage type of elemwise_add(row_sparse, row_sparse) is inferred as "row_sparse"
rsp_add = add_exec.outputs[2]
{'dense_add.stype': dense_add.stype, 'csr_add.stype': csr_add.stype, 'rsp_add.stype': rsp_add.stype}
```

    {'csr_add.stype': 'csr',
     'dense_add.stype': 'default',
     'rsp_add.stype': 'row_sparse'}

### Storage Type Fallback

For operators that don't have a specialized implementation for certain sparse arrays, you can still use them with sparse inputs, with some performance penalty. In MXNet, dense operators require all inputs and outputs to be in the dense format. If sparse inputs are provided, MXNet will temporarily convert them into dense ones so that the dense operator can be used.
If sparse outputs are provided, MXNet will convert the dense outputs generated by the dense operator into the provided sparse format. Warning messages will be printed when such a storage fallback event happens.

```python
# The `log` operator doesn't support sparse inputs at all, but we can fall back on the dense implementation
csr_log = mx.sym.log(a)
# The `elemwise_add` operator doesn't support adding csr with row_sparse, but we can fall back on the dense implementation
csr_rsp_add = mx.sym.elemwise_add(b, c)
fallback_exec = mx.sym.Group([csr_rsp_add, csr_log]).simple_bind(ctx=mx.cpu(), a=shape, b=shape, c=shape)
fallback_exec.forward()
fallback_add = fallback_exec.outputs[0]
fallback_log = fallback_exec.outputs[1]
{'fallback_add': fallback_add, 'fallback_log': fallback_log}
```

    {'fallback_add': 
     [[ 0.  0.]
      [ 0.  0.]]
     <NDArray 2x2 @cpu(0)>, 'fallback_log': 
     [[-inf -inf]
      [-inf -inf]]
     <NDArray 2x2 @cpu(0)>}
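If you would rather avoid the implicit conversion (and the fallback warnings), you can cast storage explicitly before calling a dense-only operator. The snippet below is a small sketch of that idea, added here for illustration rather than taken from the original example; it reuses the `b` variable and `shape` defined earlier, and the names `dense_b`, `log_b`, and `cast_exec` are ours.

```python
# Explicitly cast the csr variable to dense storage, then apply the dense-only
# `log` operator; no storage fallback is triggered because the input is dense.
dense_b = mx.sym.cast_storage(b, stype='default')
log_b = mx.sym.log(dense_b)
cast_exec = log_b.simple_bind(ctx=mx.cpu(), b=shape)
cast_exec.forward()
# The output is a dense NDArray (here log(0) = -inf, as in the fallback example above)
print(cast_exec.outputs[0].stype)
```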
### Inspecting Storage Types of the Symbol Graph (Work in Progress)

When the environment variable `MXNET_INFER_STORAGE_TYPE_VERBOSE_LOGGING` is set to `1`, MXNet will log the storage type information of
operators' inputs and outputs in the computation graph. For example, we can inspect the storage types of
a linear classification network with sparse operators as follows:

```python
# Set logging level for the executor
import mxnet as mx
import os
os.environ['MXNET_INFER_STORAGE_TYPE_VERBOSE_LOGGING'] = "1"
# Data in csr format
data = mx.sym.var('data', stype='csr', shape=(32, 10000))
# Weight in row_sparse format
weight = mx.sym.var('weight', stype='row_sparse', shape=(10000, 2))
bias = mx.symbol.Variable("bias", shape=(2,))
dot = mx.symbol.sparse.dot(data, weight)
pred = mx.symbol.broadcast_add(dot, bias)
y = mx.symbol.Variable("label")
output = mx.symbol.SoftmaxOutput(data=pred, label=y, name="output")
executor = output.simple_bind(ctx=mx.cpu())
```

## Training with Module APIs

In the following section, we'll walk through how one can implement **linear regression** using sparse symbols and sparse optimizers.

The function you will explore is: *y = x<sub>1</sub> + 2x<sub>2</sub> + ... + 100x<sub>100</sub>*, where *(x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>100</sub>)* are input features and *y* is the corresponding label.

### Preparing the Data

In MXNet, both [mx.io.LibSVMIter](https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.LibSVMIter)
and [mx.io.NDArrayIter](https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.NDArrayIter)
support loading sparse data in CSR format. In this example, we'll use the `NDArrayIter`.
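If your training data is stored on disk in LibSVM format instead, `mx.io.LibSVMIter` can stream it directly as CSR mini-batches. The snippet below is only a sketch of how that might look; the file path `./sample.libsvm` is a hypothetical placeholder, not a file shipped with this tutorial, and the shapes simply mirror the ones used in the next cell.

```python
# Sketch only: read CSR mini-batches from a LibSVM-formatted file.
# './sample.libsvm' is a hypothetical path; point it at your own data.
libsvm_iter = mx.io.LibSVMIter(data_libsvm='./sample.libsvm',
                               data_shape=(100,),   # number of input features
                               batch_size=1)
for batch in libsvm_iter:
    # Each data batch comes back as a CSRNDArray
    print(batch.data[0].stype)
    break
```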
You may see some warnings from SciPy when running the cell below. You don't need to worry about those for this example.

```python
# Random training data
feature_dimension = 100
train_data = mx.test_utils.rand_ndarray((1000, feature_dimension), 'csr', 0.01)
target_weight = mx.nd.arange(1, feature_dimension + 1).reshape((feature_dimension, 1))
train_label = mx.nd.dot(train_data, target_weight)
batch_size = 1
train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size, last_batch_handle='discard', label_name='label')
```

### Defining the Model

Below is an example of a linear regression model that specifies the storage types of its variables.

```python
initializer = mx.initializer.Normal(sigma=0.01)
X = mx.sym.Variable('data', stype='csr')
Y = mx.symbol.Variable('label')
weight = mx.symbol.Variable('weight', stype='row_sparse', shape=(feature_dimension, 1), init=initializer)
bias = mx.symbol.Variable('bias', shape=(1,))
pred = mx.sym.broadcast_add(mx.sym.sparse.dot(X, weight), bias)
lro = mx.sym.LinearRegressionOutput(data=pred, label=Y, name="lro")
```

The above network uses the following symbols:

1. `Variable X`: The placeholder for sparse data inputs. The `csr` stype indicates that the array it holds is in CSR format.

2. `Variable Y`: The placeholder for dense labels.

3. `Variable weight`: The placeholder for the weight to learn. The `stype` of `weight` is specified as `row_sparse` so that it is initialized as a RowSparseNDArray,
   and the optimizer will apply sparse update rules to it. The `init` attribute specifies which initializer to use for this variable. (A quick storage-type check follows after this list.)

4. `Variable bias`: The placeholder for the bias to learn.

5. `sparse.dot`: The dot product of `X` and `weight`. The sparse implementation will be invoked to handle the `csr` and `row_sparse` inputs.

6. `broadcast_add`: The broadcasting add operation used to apply `bias`.

7. `LinearRegressionOutput`: The output layer, which computes the *l2* loss against its input and the labels provided to it.
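Before moving on to training, it can be useful to confirm which storage types the composed graph actually ends up with. The following snippet is our own sanity check rather than part of the original tutorial: it binds the loss symbol with the shapes used in this example and prints the storage types of the arguments, as well as of the gradient allocated for `weight`, which is expected to be `row_sparse`. The name `check_exec` is ours.

```python
# Sanity check (illustrative addition): bind the loss symbol and inspect the
# storage types chosen for the arguments and for the gradient of `weight`.
check_exec = lro.simple_bind(ctx=mx.cpu(),
                             data=(batch_size, feature_dimension),
                             label=(batch_size, 1))
print({name: arr.stype for name, arr in check_exec.arg_dict.items()})
# The gradient w.r.t. `weight` should come out as "row_sparse", which is what
# lets the optimizer perform sparse updates on it during training.
print(check_exec.grad_dict['weight'].stype)
```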
### Training the model

Once we have defined the model structure, the next step is to create a module and initialize the parameters and optimizer.

```python
# Create module
mod = mx.mod.Module(symbol=lro, data_names=['data'], label_names=['label'])
# Allocate memory by giving the input data and label shapes
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
# Initialize parameters with random numbers
mod.init_params(initializer=initializer)
# Use SGD as the optimizer, which performs sparse updates on the "row_sparse" weight
sgd = mx.optimizer.SGD(learning_rate=0.05, rescale_grad=1.0/batch_size, momentum=0.9)
mod.init_optimizer(optimizer=sgd)
```

Finally, we train the parameters of the model to fit the training data by using the `forward`, `backward`, and `update` methods of the Module.

```python
# Use mean squared error as the metric
metric = mx.metric.create('MSE')
# Train for 10 epochs
for epoch in range(10):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        mod.forward(batch, is_train=True)       # compute predictions
        mod.update_metric(metric, batch.label)  # update the evaluation metric
        mod.backward()                          # compute gradients
        mod.update()                            # update parameters
    print('Epoch %d, Metric = %s' % (epoch, metric.get()))
assert metric.get()[1] < 1, "Achieved MSE (%f) is larger than expected (1.0)" % metric.get()[1]
```

    Epoch 0, Metric = ('mse', 886.16457029229127)
    Epoch 1, Metric = ('mse', 173.16523056503445)
    Epoch 2, Metric = ('mse', 71.625164168341811)
    Epoch 3, Metric = ('mse', 29.625375983519298)
    Epoch 4, Metric = ('mse', 12.45004676561909)
    Epoch 5, Metric = ('mse', 6.9090727975622368)
    Epoch 6, Metric = ('mse', 3.0759215722750142)
    Epoch 7, Metric = ('mse', 1.3106610134811276)
    Epoch 8, Metric = ('mse', 0.63063102482907718)
    Epoch 9, Metric = ('mse', 0.35979430613957991)

### Training the model with multiple machines

To train a sparse model with multiple machines, please refer to the example in [mxnet/example/sparse/](https://github.com/apache/incubator-mxnet/tree/master/example/sparse). A rough sketch of the per-worker change is shown below.

<!-- INSERT SOURCE DOWNLOAD BUTTONS -->
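As a rough orientation, the per-worker change relative to the single-machine code above is to initialize the optimizer with a distributed kvstore. The sketch below assumes the job is launched with MXNet's distributed launcher so that a `dist_async` kvstore can be created; the linked example contains the complete, working recipe, including the handling of sparse weight pulls. The name `kv` is ours.

```python
# Rough sketch only (assumes the job was started with MXNet's distributed
# launcher so that a distributed kvstore can be created on each worker):
kv = mx.kvstore.create('dist_async')
mod.init_optimizer(optimizer=sgd, kvstore=kv, force_init=True)
# The forward/backward/update loop above then stays the same on each worker.
```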