Step 6: Train a Neural Network

Now that you have seen all the necessary components for creating a neural network, you are ready to put all the pieces together and train a model end to end.

1. Data preparation

The typical process for creating and training a model starts with loading and preparing the datasets. For this network you will use a dataset of leaf images that consists of healthy and diseased examples of leaves from twelve different plant species. To get this dataset you have to download and extract it with the following commands.

# Import all the necessary libraries to train
import time
import os
import zipfile

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet.gluon.data.vision import transforms

import matplotlib.pyplot as plt
import numpy as np  # plain NumPy; MXNet's own np namespace is accessed as mx.np below

from prepare_dataset import process_dataset #utility code to rearrange the data

mx.np.random.seed(42)
# Download dataset
url = 'https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/hb74ynkjcn-1.zip'
zip_file_path = mx.gluon.utils.download(url)

os.makedirs('plants', exist_ok=True)

with zipfile.ZipFile(zip_file_path, 'r') as zf:
    zf.extractall('plants')

os.remove(zip_file_path)

Data inspection

If you take a look at the dataset, you will find the following directory structure:

plants
|-- Alstonia Scholaris (P2)
|-- Arjun (P1)
|-- Bael (P4)
    |-- diseased
        |-- 0016_0001.JPG
        |-- .
        |-- .
        |-- .
        |-- 0016_0118.JPG
|-- .
|-- .
|-- .
|-- Mango (P0)
    |-- diseased
    |-- healthy

Each plant species has its own directory, and within each of those directories you will find subdirectories with examples of diseased leaves, healthy leaves, or both. With this dataset you can formulate different classification problems: for example, you can create a multi-class classifier that determines the species of a plant based on its leaves, or a binary classifier that tells you whether the plant is healthy or diseased. Additionally, you can create a multi-class, multi-label classifier that tells you both what species a plant is and whether it is diseased or healthy. In this example you will stick to the simplest classification question, which is whether a plant is healthy or not.

To do this, you need to manipulate the dataset in two ways. First, you need to combine all images into two classes, healthy and diseased, regardless of the species, and then you need to split the data into train, validation, and test sets. We prepared a small utility script that does this to get the dataset ready for you. Once you run this utility code on the data, the images will be organized into one folder per class, so you can use the ImageFolderDataset class to import them from disk into MXNet.

# Call the utility function to rearrange the images
process_dataset('plants')
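The actual prepare_dataset script ships with the course materials, but if you are curious, a rearrangement like this can be sketched as follows. This is a simplified, hypothetical version: the function name rearrange, the 80/10/10 split, and the species prefix on the copied file names are assumptions for illustration, not the real script's behavior.

# Hypothetical sketch: copy every image into datasets/<split>/<condition>,
# ignoring the species, with an 80/10/10 train/validation/test split
import os
import random
import shutil

def rearrange(src_root='plants', dst_root='datasets', splits=(0.8, 0.1)):
    for species in sorted(os.listdir(src_root)):
        for condition in ('healthy', 'diseased'):
            folder = os.path.join(src_root, species, condition)
            if not os.path.isdir(folder):
                continue  # some species only have one of the two conditions
            images = sorted(os.listdir(folder))
            random.shuffle(images)
            n_train = int(len(images) * splits[0])
            n_val = int(len(images) * splits[1])
            groups = {'train': images[:n_train],
                      'validation': images[n_train:n_train + n_val],
                      'test': images[n_train + n_val:]}
            for split, names in groups.items():
                dst = os.path.join(dst_root, split, condition)
                os.makedirs(dst, exist_ok=True)
                for name in names:
                    # Prefix with the species to avoid file-name collisions
                    shutil.copy(os.path.join(folder, name),
                                os.path.join(dst, f"{species}_{name}"))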

The dataset is located in the datasets folder and the new structure looks like this:

datasets
|-- test
    |-- diseased
    |-- healthy
|-- train
|-- validation
    |-- diseased
    |-- healthy
        |-- image1.JPG
        |-- image2.JPG
        |-- .
        |-- .
        |-- .
        |-- imagen.JPG

Now, you need to create three different Dataset objects from the train, validation, and test folders, and the ImageFolderDataset class takes care of inferring the classes from the directory names. If you don't remember how the ImageFolderDataset works, take a look at Step 5 of this course for a deeper description.

# Use ImageFolderDataset to create a Dataset object from directory structure
train_dataset = gluon.data.vision.ImageFolderDataset('./datasets/train')
val_dataset = gluon.data.vision.ImageFolderDataset('./datasets/validation')
test_dataset = gluon.data.vision.ImageFolderDataset('./datasets/test')
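You can quickly verify the inferred classes through the synsets attribute of a dataset; since the classes come from the directory names, they appear in alphabetical order.

# Print the classes inferred from the directory names
print(train_dataset.synsets)  # expected: ['diseased', 'healthy']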

The result from this operation is a different Dataset object for each folder. These objects hold a collection of images and labels, and as such they can be indexed to get the $i$-th element of the dataset. The $i$-th element is a tuple with two objects: the first is the image in array form and the second is the corresponding label for that image.

sample_idx = 888 # pick an arbitrary sample
sample = train_dataset[sample_idx]
data = sample[0]
label = sample[1]

plt.imshow(data.asnumpy())
print(f"Data type: {data.dtype}")
print(f"Label: {label}")
print(f"Label description: {train_dataset.synsets[label]}")
print(f"Image shape: {data.shape}")

As you can see from the plot, the image is very large: 4000 x 6000 pixels. Usually, you downsize images before passing them to a neural network to reduce the training time. It is also customary to make slight random modifications to the images to improve generalization. That is why you add transformations to the data in a process called data augmentation.

You can augment data in MXNet using transforms. For a complete list of all the available transformations in MXNet check out available transforms. It is very common to use more than one transform per image, and it is also common to process transforms sequentially. To this end, you can use the transforms.Compose class. This class is very useful to create a transformation pipeline for your images.

You have to compose two different transformation pipelines, one for training and the other one for validation and testing. This is because each pipeline serves different purposes. You need to downsize, convert to tensor, and normalize images across all the different datasets; however, you typically do not want to randomly flip or add color jitter to the validation or test images, since that could reduce performance.

# Compose a series of transformations to apply to the images

jitter_param = 0.05

# Standard ImageNet mean and std, used to normalize image values in the range (0,1)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

training_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(contrast=jitter_param),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

validation_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

With your augmentations ready, you can create the DataLoaders that use them. To do this, the gluon.data.DataLoader class comes in handy. You have to pass the dataset with the applied transformations (notice the .transform_first() method on the datasets) to gluon.data.DataLoader. Additionally, you need to decide the batch size, which is how many images you will pass to the network at a time, and whether you want to shuffle the dataset.

# Create data loaders
batch_size = 4
train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer),
                                     batch_size=batch_size,
                                     shuffle=True,
                                     try_nopython=True)
validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer),
                                          batch_size=batch_size,
                                          try_nopython=True)
test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer),
                                    batch_size=batch_size,
                                    try_nopython=True)
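
As a quick sanity check, you can pull a single batch from a loader and confirm the shapes the pipeline produces; given Resize followed by CenterCrop(128) and ToTensor, each batch should hold 128 x 128 images in channel-first layout.

# Peek at one batch to confirm the shapes produced by the pipeline
data_batch, label_batch = next(iter(train_loader))
print(data_batch.shape)   # expected: (4, 3, 128, 128)
print(label_batch.shape)  # expected: (4,)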

Now, you can inspect the transformations that were applied to the images. The following utility function plots one batch of transformed images.

# Function to plot a batch of transformed images
def show_batch(batch, columns=4, fig_size=(9, 5), pad=1):
    labels = batch[1].asnumpy()
    batch = batch[0].asnumpy()
    # Undo the mean/std normalization so the images display with natural colors
    batch = batch * np.array(std).reshape(1, 3, 1, 1) + np.array(mean).reshape(1, 3, 1, 1)
    batch = np.clip(batch, 0, 1)  # clip values outside the displayable range
    size = batch.shape[0]
    rows = int(size / columns)
    fig, axes = plt.subplots(rows, columns, figsize=fig_size)
    for ax, img, label in zip(axes.flatten(), batch, labels):
        ax.imshow(np.transpose(img, (1, 2, 0)))  # channel-first to channel-last
        ax.set(title=f"Label: {label}")
    fig.tight_layout(h_pad=pad, w_pad=pad)
    plt.show()

# Grab one batch from the training loader and plot it
batch = next(iter(train_loader))
show_batch(batch)

You can see that the images now share one size and show variations in color and lighting, following the transformations you specified in the pipeline. You are now ready to go to the next step: create the architecture.

2. Create Neural Network

Convolutional neural networks are a great tool to capture the spatial relationships of pixel values within images; for this reason they have become the gold standard for computer vision. In this example you will create a small convolutional neural network using what you learned from Step 2 of this crash course series. First, you can set up two functions that will generate the two types of blocks you intend to use, the convolution block and the dense block. Then you can create an entire network based on these two blocks using a custom class.

# The convolutional block has a convolution layer, a max pooling layer and an optional batch normalization layer
def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):
    conv_block = nn.HybridSequential()
    conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),
                   nn.MaxPool2D(pool_size=4, strides=stride))
    if batch_norm:
        conv_block.add(nn.BatchNorm())
    return conv_block

# The dense block consists of a dense layer and an optional dropout layer
def dense_block(neurons, activation='relu', dropout=0.2):
    dense_block = nn.HybridSequential()
    dense_block.add(nn.Dense(neurons, activation=activation))
    if dropout:
        dense_block.add(nn.Dropout(dropout))
    return dense_block

# Create the neural network blueprint using the blocks
class LeafNetwork(nn.HybridBlock):
    def __init__(self):
        super(LeafNetwork, self).__init__()
        self.conv1 = conv_block(32)
        self.conv2 = conv_block(64)
        self.conv3 = conv_block(128)
        self.flatten = nn.Flatten()
        self.dense1 = dense_block(100)
        self.dense2 = dense_block(10)
        self.dense3 = nn.Dense(2)

    def forward(self, batch):
        batch = self.conv1(batch)
        batch = self.conv2(batch)
        batch = self.conv3(batch)
        batch = self.flatten(batch)
        batch = self.dense1(batch)
        batch = self.dense2(batch)
        batch = self.dense3(batch)

        return batch

You have concluded the architecture part of the network, so now you can build a model from that architecture for training. As you saw in Step 4 of this crash course series, to use the network you need to initialize the parameters and hybridize the model.

# Create the model based on the blueprint provided and initialize the parameters
device = mx.cpu()  # train on CPU here; the next tutorial shows how to speed this up with a GPU

initializer = mx.initializer.Xavier()

model = LeafNetwork()
model.initialize(initializer, device=device)
model.summary(mx.np.random.uniform(size=(4, 3, 128, 128), device=device))
model.hybridize()

3. Choose Optimizer and Loss function

With the network created you can move on to choosing an optimizer and a loss function. The trainer uses these components to make an informed decision on how to tune the parameters so the network fits the final objective better. You can use the gluon.Trainer class to help with optimizing these parameters. The gluon.Trainer class needs two things to work properly: the parameters to tune and the optimizer with its corresponding hyperparameters. The trainer uses the error reported by the loss function to optimize these parameters; when you later call trainer.step(batch_size), the accumulated gradients are normalized by the batch size before the update.

For this particular dataset you will use Stochastic Gradient Descent as the optimizer and Cross Entropy as the loss function.
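As a refresher on what this loss computes: SoftmaxCrossEntropyLoss first turns the network's raw outputs (logits) into class probabilities with softmax, then returns, for the $i$-th example, the negative log-probability $-\log \hat{p}_{i, y_i}$ assigned to the true label $y_i$. Averaged over a batch of $n$ examples this gives $L = -\frac{1}{n}\sum_{i=1}^{n} \log \hat{p}_{i, y_i}$, so minimizing $L$ pushes the network to assign high probability to the correct class.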

# SGD optimizer
optimizer = 'sgd'

# Set parameters
optimizer_params = {'learning_rate': 0.001}

# Define the trainer for the model
trainer = gluon.Trainer(model.collect_params(), optimizer, optimizer_params)

# Define the loss function
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

Finally, you have to set up the training loop, and you need to create a function to evaluate the performance of the network on the validation dataset.

# Function to return the accuracy for the validation and test set
def test(val_data):
    acc = gluon.metric.Accuracy()
    for batch in val_data:
        data = batch[0]
        labels = batch[1]
        outputs = model(data.to_device(device))
        acc.update([labels], [outputs])

    _, accuracy = acc.get()
    return accuracy

4. Training Loop

Now that you have everything set up, you can start training your network. This might take some time to train depending on the hardware, number of layers, batch size and images you use. For this particular case, you will only train for 2 epochs.

# Start the training loop
epochs = 2
accuracy = gluon.metric.Accuracy()
log_interval = 5

for epoch in range(epochs):
    tic = time.time()
    btic = time.time()
    accuracy.reset()

    for idx, batch in enumerate(train_loader):
        data = batch[0]
        label = batch[1]
        with mx.autograd.record():
            outputs = model(data.to_device(device))
            loss = loss_fn(outputs, label.to_device(device))
        mx.autograd.backward(loss)
        trainer.step(batch_size)
        accuracy.update([label], [outputs])
        if log_interval and (idx + 1) % log_interval == 0:
            _, acc = accuracy.get()

            print(f"""Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \
                  batch loss = {loss.mean().item()} | accuracy = {acc}""")
            btic = time.time()

    _, acc = accuracy.get()

    acc_val = test(validation_loader)
    print(f"[Epoch {epoch + 1}] training: accuracy={acc}")
    print(f"[Epoch {epoch + 1}] time cost: {time.time() - tic}")
    print(f"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}")

5. Test on the test set

Now that your network is trained and has reached a decent accuracy, you can evaluate the performance on the test set. For that, you can use the test_loader data loader and the test function you created previously.

test(test_loader)

You now have a trained network that can confidently discriminate between plants that are healthy and ones that are diseased. You can now start your garden and set up cameras to automatically detect plants in distress! Or change your classification problem to create a model that classifies the species of the plants! Either way, you might be able to impress your botanist friends.

6. Save the parameters

If you want to preserve the trained weights of the network you can save the parameters in a file. Later, when you want to use the network to make predictions you can load the parameters back!

# Save the parameters to a file
model.save_parameters('leaf_models.params')
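
A minimal sketch of that loading step, assuming the LeafNetwork class definition is still available and using the same MXNet 2 device keyword as above:

# Recreate the architecture and restore the trained weights
model = LeafNetwork()
model.load_parameters('leaf_models.params', device=device)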

This is the end of this tutorial. To see how you can speed up training by using GPU hardware, continue to the next tutorial.