Data and models are not the only components that you need to train a deep learning model. In this notebook, you will learn about the other common components involved in training deep learning models with MXNet: parameter initialization, loss functions, optimizers, and evaluation metrics.
from mxnet import np, npx, gluon
import mxnet as mx
from mxnet.gluon import nn

npx.set_np()
device = mx.cpu()
In a previous notebook, you used net.initialize() to initialize the network before a forward pass. Now, you will learn about initialization in a little more detail.
First, define and initialize the sequential network from earlier. After you initialize it, print the parameters using the collect_params() method.
net = nn.Sequential()
net.add(nn.Dense(5, in_units=3, activation="relu"),
        nn.Dense(25, activation="relu"),
        nn.Dense(2))
net
net.initialize()
params = net.collect_params()

for key, value in params.items():
    print(key, value)
Next, you will print the shapes and parameters after the first forward pass.
x = np.random.uniform(-1, 1, (10, 3))
net(x)  # Forward computation

params = net.collect_params()
for key, value in params.items():
    print(key, value)
MXNet makes initialization easy by providing many common initializers. The subset that you will use in the following sections includes Constant and Normal initialization.
For more information, see Initializers
When you use net.initialize(), MXNet, by default, initializes the weight matrices by drawing random values from a uniform distribution between −0.07 and 0.07, and sets all the bias parameters to 0.
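The snippet below is a minimal sketch that checks this default behavior on a fresh layer; the layer size is arbitrary and only used for illustration.

# Hypothetical check of the default initialization on a fresh layer
check_layer = nn.Dense(4, in_units=3)
check_layer.initialize()           # default: Uniform(-0.07, 0.07) weights, zero biases
print(check_layer.weight.data())   # values should lie within [-0.07, 0.07]
print(check_layer.bias.data())     # all zeros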
To initialize your network using different built-in types, you have to use the init keyword argument in the initialize() method. Here is an example using constant and normal initialization.
from mxnet import init

# Constant init initializes the weights to be a constant value for all the params
# force_reinit=True is needed here because the network was already initialized above
net.initialize(init=init.Constant(3), force_reinit=True, device=device)
print(net[0].weight.data()[0])
If you use Normal to initialize your weights, the values are drawn from a normal distribution with a mean of zero and a standard deviation of sigma. If you have already initialized the weights but want to reinitialize them, set the force_reinit flag to True.
net.initialize(init=init.Normal(sigma=0.2), force_reinit=True, device=device)
print(net[0].weight.data()[0])
So far, you have seen how to create a network and initialize it using the MXNet APIs, and you have learned the basics of using MXNet. When you start training, how do you actually teach the algorithm to learn from the data?
There are three main components for training an algorithm: a loss function, automatic differentiation (autograd), and an optimizer.
You have already learned about autograd in the previous notebook. In this notebook, you will learn more about loss functions and optimizers.
Loss functions are used to train neural networks and help the algorithm learn from the data. The loss function computes the difference between the output of the neural network and the ground truth, and this difference is used to update the neural network weights during training. Next, you will look at a simple example.
Suppose you have a neural network net and the data is stored in a variable nn_input. The data consists of five records (rows) and two features (columns), and the output from the neural network after the first epoch is given by the variable nn_output.
net = gluon.nn.Dense(1)
net.initialize()

nn_input = np.array([[1.2, 0.56],
                     [3.0, 0.72],
                     [0.89, 0.9],
                     [0.89, 2.3],
                     [0.99, 0.52]])

nn_output = net(nn_input)
nn_output
The ground truth values of the data are stored in groundtruth_label:
groundtruth_label = np.array([[0.0083],
                              [0.00382],
                              [0.02061],
                              [0.00495],
                              [0.00639]]).reshape(5, 1)
For this problem, you will use the L2 Loss. L2Loss, also called Mean Squared Error, is a regression loss function that computes the squared distances between the target values and the output of the neural network. It is defined as:
$$L = \frac{1}{2N}\sum_i{\lvert label_i - pred_i \rvert}^2$$
Because of the square operator, the L2 loss function creates larger gradients for predictions that are farther from the target values, and it also gives a smooth loss surface.
def L2Loss(output_values, true_values):
    return np.mean((output_values - true_values) ** 2, axis=1) / 2

L2Loss(nn_output, groundtruth_label)
Now, you can do the same thing using the MXNet API:
from mxnet.gluon import nn, loss as gloss

loss = gloss.L2Loss()
loss(nn_output, groundtruth_label)
A network can improve by iteratively updating its weights to minimize the loss. Some tasks use a combination of multiple loss functions, but often you will just use one. MXNet Gluon provides a number of the most commonly used loss functions. The choice of loss function depends on your network and task; a classification example is shown in the sketch after this list. Some common tasks and loss function pairs include:
regression: L1Loss, L2Loss
classification: SigmoidBinaryCrossEntropyLoss, SoftmaxCrossEntropyLoss
embeddings: HingeLoss
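As a minimal sketch of the classification case (the shapes and values below are made up purely for illustration), SoftmaxCrossEntropyLoss takes the unnormalized scores from the network together with integer class labels:

# Hypothetical example: 3 samples, 4 classes
ce_loss = gloss.SoftmaxCrossEntropyLoss()
logits = np.random.uniform(-1, 1, (3, 4))   # unnormalized scores from a network
class_labels = np.array([0, 2, 3])          # integer class indices
ce_loss(logits, class_labels)               # one loss value per sample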
You can also create custom loss functions using Loss Blocks.
You can inherit from the base Loss class and write your own forward method. The backward propagation will be computed automatically by autograd. However, that only holds true if you can build your loss from existing MXNet operators.
from mxnet.gluon.loss import Loss

class custom_L1_loss(Loss):
    def __init__(self, weight=None, batch_axis=0, **kwargs):
        super(custom_L1_loss, self).__init__(weight, batch_axis, **kwargs)

    def forward(self, pred, label):
        l = np.abs(label - pred)
        l = l.reshape(len(l),)
        return l

L1 = custom_L1_loss()
L1(nn_output, groundtruth_label)
You can compare this with the built-in L1Loss:

l1 = gloss.L1Loss()
l1(nn_output, groundtruth_label)
The loss function determines how much to change the parameters based on how far the model's output is from the ground truth. The optimizer determines how the model weights or parameters are updated based on the loss function. In Gluon, this optimization step is performed by gluon.Trainer.
Here is a basic example of how to create a gluon.Trainer.
from mxnet import optimizer
trainer = gluon.Trainer(net.collect_params(),
                        optimizer="Adam",
                        optimizer_params={
                            "learning_rate": 0.1,
                            "wd": 0.001
                        })
When creating a Gluon Trainer, you must provide the trainer object with the collection of parameters that need to be learned and the optimization algorithm (optimizer) that you want to use for training. The optimizer is used to update the parameters every time trainer.step is called. For more information, see optimizers

curr_weight = net.weight.data()
print(curr_weight)
batch_size = len(nn_input)
# No backward pass has been run in this toy example, so allow stale gradients
trainer.step(batch_size, ignore_stale_grad=True)
print(net.weight.data())
print(curr_weight - net.weight.grad() * 1 / 5)
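For context, here is a minimal sketch of how the loss function, autograd, and the trainer fit together in a single training step; it reuses the nn_input, groundtruth_label, and loss objects defined above, and the values are only illustrative:

from mxnet import autograd

with autograd.record():
    out = net(nn_input)                   # forward pass
    l = loss(out, groundtruth_label)      # compute the loss
l.backward()                              # compute gradients with autograd
trainer.step(len(nn_input))               # update the weights using the optimizer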
MXNet includes a metrics API that you can use to evaluate how your model is performing. This is typically used during training to monitor performance on the validation set. MXNet includes many commonly used metrics; two that you will use below are Accuracy and Precision.
Now, you will define two arrays for a dummy binary classification example.
# Vector of likelihoods for all the classes
pred = np.array([[0.1, 0.9], [0.05, 0.95], [0.83, 0.17], [0.63, 0.37]])

labels = np.array([1, 1, 0, 1])
To calculate the accuracy of your model, instantiate the metric (Accuracy) before the training loop.
from mxnet.gluon.metric import Accuracy

acc = Accuracy()
To run and calculate the updated accuracy for each batch or epoch, you can call the update() method. This method takes labels and predictions, which can be either class indices or vectors of likelihoods for all of the classes.
acc.update(labels=labels, preds=pred)
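To read back the accumulated result, you can call the metric's get() method, which returns the metric name and its current value:

acc.get()   # returns ('accuracy', <current value>)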
In addition to built-in metrics, if you want to create a custom metric, you can use the following skeleton code. This code inherits from the EvalMetric base class.
from mxnet.gluon.metric import EvalMetric

class MyCustomMetric(EvalMetric):
    def __init__(self):
        # EvalMetric expects a metric name
        super().__init__(name="my_custom_metric")

    def update(self, labels, preds):
        pass
Here is an example using the Precision metric. First, define the two values labels and preds.
labels = np.array([0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1])
preds = np.array([0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
Next, define the custom metric class precision and instantiate it:
from mxnet.gluon.metric import EvalMetric

class precision(EvalMetric):
    def __init__(self):
        super().__init__(name="Precision")

    def update(self, labels, preds):
        tp_labels = (labels == 1)                      # positions where the label is positive
        true_positives = sum(preds[tp_labels] == 1)    # predicted positive and label positive
        fp_labels = (labels == 0)                      # positions where the label is negative
        false_positives = sum(preds[fp_labels] == 1)   # predicted positive but label negative
        return true_positives / (true_positives + false_positives)

p = precision()
And finally, call the update method to return the precision result for your data:
p.update(np.array(labels), np.array(preds))
Now that you have learned all the components required to train a neural network, you will see how to load your data using the Gluon API in Step 5: Gluon Datasets and DataLoader