 # Linear Regression

 In this tutorial we'll walk through how one can implement *linear regression* using MXNet APIs.

 The function we are trying to learn is: *y = x<sub>1</sub>  +  2x<sub>2</sub>*, where *(x<sub>1</sub>,x<sub>2</sub>)* are input features and *y* is the corresponding label.

 ## Prerequisites

 To complete this tutorial, we need:

 - MXNet. See the instructions for your operating system in [Setup and Installation](http://mxnet.io/install/index.html).

 - [Jupyter Notebook](http://jupyter.org/index.html).

 ```
 $ pip install jupyter
 ```

 To begin, the following code imports the packages we need for this exercise.

 ```python
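 import logging

 import mxnet as mx
 import numpy as np

 # Fix the random seeds so that runs are repeatable. MXNet and NumPy keep
 # separate generators, and the training data below is drawn with NumPy.
 mx.random.seed(42)
 np.random.seed(42)

 # Show the training progress messages emitted by MXNet
 logging.getLogger().setLevel(logging.DEBUG)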
 ```

 ## Preparing the Data

 In MXNet, data is input via **Data Iterators**. Here we will illustrate
 how to encode a dataset into an iterator that MXNet can use. The data used in the example is made up of 2D data points with corresponding real-valued labels.

 ```python
 # Training data: 100 points drawn uniformly from [0, 1) x [0, 1)
 train_data = np.random.uniform(0, 1, [100, 2])
 # Labels follow the target function y = x1 + 2 * x2
 train_label = train_data[:, 0] + 2 * train_data[:, 1]
 batch_size = 1

 # Evaluation data
 eval_data = np.array([[7,2],[6,10],[12,2]])
 eval_label = np.array([11,26,16])
 ```
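
As a quick sanity check, the evaluation labels can be reproduced from the target function. The sketch below uses plain Python (no MXNet required); the helper name `target` is introduced here only for illustration.

```python
def target(x1, x2):
    """Ground-truth function the model is asked to learn: y = x1 + 2 * x2."""
    return x1 + 2 * x2


eval_points = [(7, 2), (6, 10), (12, 2)]
eval_labels = [target(x1, x2) for x1, x2 in eval_points]
print(eval_labels)  # [11, 26, 16]
```

Note that the evaluation points lie well outside the [0, 1) range from which the training data is drawn, so the evaluation measures how well the learned function extrapolates.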

 Once we have the data ready, we need to put it into an iterator and specify
 parameters such as `batch_size` and `shuffle`. `batch_size` specifies the number
 of examples shown to the model each time we update its parameters and `shuffle`
 tells the iterator to randomize the order in which examples are shown to the model.


 ```python
 train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size, shuffle=True, label_name='lin_reg_label')
 eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False, label_name='lin_reg_label')
 ```

 In the above example, we have made use of `NDArrayIter`, which is useful for iterating
 over both numpy ndarrays and MXNet NDArrays. In general, there are different types of iterators in
 MXNet and you can use one based on the type of data you are processing.
 Documentation for iterators can be found [here](http://mxnet.io/api/python/io/io.html).
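
To make the roles of `batch_size` and `shuffle` concrete, here is a minimal plain-Python sketch of the behaviour described above. It is not how `NDArrayIter` is implemented: the function `iterate_batches` and its `seed` parameter are hypothetical names used only for illustration, and a real iterator may treat an incomplete final batch differently.

```python
import random


def iterate_batches(data, labels, batch_size, shuffle=False, seed=None):
    """Yield (data_batch, label_batch) pairs, optionally in shuffled order."""
    indices = list(range(len(data)))
    if shuffle:
        # Shuffle the example order once; data and labels stay paired
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [data[i] for i in batch], [labels[i] for i in batch]


data = [[7, 2], [6, 10], [12, 2]]
labels = [11, 26, 16]
batches = list(iterate_batches(data, labels, batch_size=2))
print(batches)  # [([[7, 2], [6, 10]], [11, 26]), ([[12, 2]], [16])]
```

With `batch_size=1`, as used in this tutorial, every batch holds a single example, so the parameters are updated once per example.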

 ## MXNet Classes

 1. **IO:** As we have already seen, the IO classes work on the data and carry out
    operations such as feeding data in batches and shuffling.

 2. **Symbol:** The actual MXNet neural network is composed using symbols. MXNet has
    different types of symbols, including variable placeholders for input data,
    neural network layers, and operators that manipulate NDArrays.

 3. **Module:** The module class in MXNet is used to define the overall computation.
    It is initialized with the model we want to train and the names of its training
    inputs (data and labels). Additional parameters such as the learning rate and the
    optimization algorithm are supplied when training starts.

 ## Defining the Model

 MXNet uses **Symbols** for defining a model. Symbols are the building blocks
 and make up various components of the model. Symbols are used to define:

 1. **Variables:** A variable is a placeholder for future data. This symbol is used
    to define a spot which will be filled with training data/labels in the future
    when we commence training.
 2. **Neural Network Layers:** The layers of a network or any other type of model are
    also defined by Symbols. Such a symbol takes one or more previous symbols as
    inputs, performs some transformations on them, and creates one or more outputs.
    One such example is the `FullyConnected` symbol which specifies a fully connected
    layer of a neural network.
 3. **Outputs:** Output symbols are MXNet's way of defining a loss. They are
    suffixed with the word "Output" (e.g., the `SoftmaxOutput` layer). You can also
    [create your own loss function](https://github.com/dmlc/mxnet/blob/master/docs/tutorials/r/CustomLossFunction.md#how-to-use-your-own-loss-function).
    Some examples of existing losses are `LinearRegressionOutput`, which computes
    the l2 loss between its input symbol and the labels provided to it, and
    `SoftmaxOutput`, which computes the categorical cross-entropy.

 These and other symbols are chained together, with the output of one symbol
 serving as input to the next, to build the network topology. More information
 about the different types of symbols can be found [here](http://mxnet.io/api/python/symbol/symbol.html).

 ```python
 X = mx.sym.Variable('data')               # placeholder for the input features
 Y = mx.symbol.Variable('lin_reg_label')   # placeholder for the labels
 fully_connected_layer = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
 lro = mx.sym.LinearRegressionOutput(data=fully_connected_layer, label=Y, name="lro")
 ```

 The above network uses the following layers:

 1. `FullyConnected`: The fully connected symbol represents a fully connected layer
    of a neural network (without any activation being applied), which, in essence,
    is just a linear regression on the input attributes. It takes the following
    parameters:

    - `data`: Input to the layer (specifies the symbol whose output should be fed here)
    - `num_hidden`: Number of hidden neurons in the layer, which is the same as the dimensionality
      of the layer's output

 2. `LinearRegressionOutput`: Output layers in MXNet compute training loss, which is
    the measure of inaccuracy in the model's predictions. The goal of training is to minimize the
    training loss. In our example, the `LinearRegressionOutput` layer computes the *l2* loss between
    its input and the labels provided to it. The parameters to this layer are:

    - `data`: Input to this layer (specifies the symbol whose output should be fed here)
    - `label`: The training labels against which we will compare the input to the layer for calculation of l2 loss
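
The computation these two layers describe can be sketched in a few lines of plain Python. This is an illustration under assumptions, not MXNet code: the function names `fully_connected` and `squared_error` are hypothetical, the weights are set by hand to the values we hope training will find, and MXNet's internal scaling of the loss may differ.

```python
def fully_connected(x, weights, bias):
    """Single-output fully connected layer with no activation: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def squared_error(prediction, label):
    """Squared (l2) difference between a prediction and its label."""
    return (prediction - label) ** 2


# Hand-picked parameters matching the target y = x1 + 2 * x2
weights, bias = [1.0, 2.0], 0.0
prediction = fully_connected([7.0, 2.0], weights, bias)
print(prediction)                       # 11.0
print(squared_error(prediction, 11.0))  # 0.0
```

With `num_hidden=1` and two input features, the layer has two weights and one bias, which is exactly the parameterization of a linear regression on two attributes.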

 **Note on naming convention:** the label variable's name should be the same as the
 `label_name` parameter passed to your training data iterator. The default value of
 this is `softmax_label`, but we have updated it to `lin_reg_label` in this
 tutorial as you can see in `Y = mx.symbol.Variable('lin_reg_label')` and
 `train_iter = mx.io.NDArrayIter(..., label_name='lin_reg_label')`.

 Finally, the network is passed to a *Module*, where we specify the symbol
 whose output needs to be minimized (in our case, `lro`) along with the names of
 the data and label variables. The learning rate to be used during optimization
 and the number of epochs to train for are supplied later, when we call `fit()`.

 ```python
 model = mx.mod.Module(
     symbol=lro,                    # network structure
     data_names=['data'],
     label_names=['lin_reg_label']
 )
 ```

 We can visualize the network we created by plotting it:

 ```python
 mx.viz.plot_network(symbol=lro)
 ```

 ## Training the Model

 Once we have defined the model structure, the next step is to train the
 parameters of the model to fit the training data. This is accomplished using the
 `fit()` function of the `Module` class.

 ```python
 model.fit(train_iter, eval_iter,
           optimizer_params={'learning_rate': 0.01, 'momentum': 0.9},
           num_epoch=20,
           eval_metric='mse',
           batch_end_callback=mx.callback.Speedometer(batch_size, 2))
 ```
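
For intuition about what training with these settings involves, the following is a rough plain-Python sketch of per-example stochastic gradient descent with momentum on the same problem. It assumes SGD with momentum as the optimizer, initializes all parameters to zero, and uses the gradient of half the squared error. The function `train_sgd_momentum` is a hypothetical name, and MXNet's actual initialization and gradient scaling may differ, so treat this as an illustration rather than a reproduction of `fit()`.

```python
import random


def train_sgd_momentum(samples, learning_rate=0.01, momentum=0.9,
                       num_epoch=20, seed=0):
    """Fit y = w1*x1 + w2*x2 + b by per-example SGD with momentum."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]    # w1, w2, b (zero initialization is an assumption)
    velocity = [0.0, 0.0, 0.0]  # running momentum term for each parameter
    samples = list(samples)
    for _ in range(num_epoch):
        rng.shuffle(samples)    # new example order every epoch
        for (x1, x2), y in samples:
            error = params[0] * x1 + params[1] * x2 + params[2] - y
            grads = [error * x1, error * x2, error]
            for i in range(3):
                velocity[i] = momentum * velocity[i] - learning_rate * grads[i]
                params[i] += velocity[i]
    return params


# 100 noiseless training points in the unit square, as in the tutorial
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(100)]
samples = [((x1, x2), x1 + 2 * x2) for x1, x2 in points]

w1, w2, b = train_sgd_momentum(samples)
print(w1, w2, b)  # should end up close to 1, 2 and 0
```

Because the labels contain no noise, the parameters can approach the exact solution; with noisy labels the updates would keep fluctuating around it.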

 ## Using a Trained Model (Testing and Inference)

 Once we have a trained model, we can do a couple of things with it: we can either
 use it for inference or evaluate it on test data. The former is shown below:

 ```python
 model.predict(eval_iter).asnumpy()
 ```

 We can also evaluate our model according to some metric. In this example, we are
 evaluating our model's mean squared error (MSE) on the evaluation data.

 ```python
 metric = mx.metric.MSE()
 # score() returns a list of (name, value) pairs; keep the MSE value
 mse = model.score(eval_iter, metric)[0][1]
 print("Achieved {0:.6f} validation MSE".format(mse))
 assert mse < 0.01001, "Achieved MSE (%f) is larger than expected (0.01001)" % mse
 ```

 Let us perturb the evaluation labels by adding 0.1 to each of them and see how the MSE changes:

 ```python
 eval_data = np.array([[7,2],[6,10],[12,2]])
 eval_label = np.array([11.1,26.1,16.1])  # Adding 0.1 to each of the labels
 eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False, label_name='lin_reg_label')
 model.score(eval_iter, metric)
 ```
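
The expected effect of this shift can be worked out by hand. The plain-Python sketch below assumes a model whose predictions match the original labels exactly, which a trained model will only approximate; `mean_squared_error` is a hypothetical helper, not the MXNet metric class.

```python
def mean_squared_error(predictions, labels):
    """Average of the squared differences between predictions and labels."""
    return sum((p - l) ** 2 for p, l in zip(predictions, labels)) / len(labels)


ideal_predictions = [11.0, 26.0, 16.0]  # assumes a perfectly trained model
original_labels = [11.0, 26.0, 16.0]
shifted_labels = [11.1, 26.1, 16.1]

mse_original = mean_squared_error(ideal_predictions, original_labels)
mse_shifted = mean_squared_error(ideal_predictions, shifted_labels)
print(mse_original)  # 0.0
print(mse_shifted)   # approximately 0.01, that is, 0.1 squared
```

A constant offset of 0.1 in every label therefore adds roughly 0.01 to the MSE of an otherwise accurate model.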

 We can also create a custom metric and use it to evaluate a model. More
 information on metrics can be found in the [API documentation](http://mxnet.incubator.apache.org/api/python/metric/metric.html).

 <!-- INSERT SOURCE DOWNLOAD BUTTONS -->