versions/master/api/python/docs/_sources/tutorials/getting-started/crash-course/7-use-gpus.ipynb - mxnet-site - Git at Google

 {
  "cells": [
   {
    "cell_type": "markdown",
    "id": "50f4ae7c",
    "metadata": {},
    "source": [
     "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
     "<!--- or more contributor license agreements.  See the NOTICE file -->\n",
     "<!--- distributed with this work for additional information -->\n",
     "<!--- regarding copyright ownership.  The ASF licenses this file -->\n",
     "<!--- to you under the Apache License, Version 2.0 (the -->\n",
     "<!--- \"License\"); you may not use this file except in compliance -->\n",
     "<!--- with the License.  You may obtain a copy of the License at -->\n",
     "\n",
     "<!---   http://www.apache.org/licenses/LICENSE-2.0 -->\n",
     "\n",
     "<!--- Unless required by applicable law or agreed to in writing, -->\n",
     "<!--- software distributed under the License is distributed on an -->\n",
     "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
     "<!--- KIND, either express or implied.  See the License for the -->\n",
     "<!--- specific language governing permissions and limitations -->\n",
     "<!--- under the License. -->\n",
     "\n",
     "# Step 7: Load and Run a NN using GPU\n",
     "\n",
     "In this step, you will learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, you may be able to train or perform inference quicker than with central processing units (CPUs).\n",
     "\n",
     "## Prerequisites\n",
     "\n",
     "Before you start the steps, make sure you have at least one Nvidia GPU on your machine and make sure that you have CUDA properly installed. GPUs from AMD and Intel are not supported. Additionally, you will need to install the GPU-enabled version of MXNet. You can find information about how to install the GPU version of MXNet for your system [here](https://mxnet.apache.org/versions/1.4.1/install/ubuntu_setup.html).\n",
     "\n",
     "You can use the following command to view the number GPUs that are available to MXNet."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "c6825db0",
    "metadata": {},
    "outputs": [],
    "source": [
     "from mxnet import np, npx, gluon, autograd\n",
     "from mxnet.gluon import nn\n",
     "import time\n",
     "npx.set_np()\n",
     "\n",
     "npx.num_gpus() #This command provides the number of GPUs MXNet can access"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "80d930e5",
    "metadata": {},
    "source": [
     "## Allocate data to a GPU\n",
     "\n",
     "MXNet's ndarray is very similar to NumPy's. One major difference is that MXNet's ndarray has a `context` attribute specifieing which device an array is on. By default, arrays are stored on `npx.cpu()`. To change it to the first GPU, you can use the following code, `npx.gpu()` or `npx.gpu(0)` to indicate the first GPU."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "27300d20",
    "metadata": {},
    "outputs": [],
    "source": [
     "gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()\n",
     "x = np.ones((3,4), ctx=gpu)\n",
     "x"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "d7022f2f",
    "metadata": {},
    "source": [
     "If you're using a CPU, MXNet allocates data on the main memory and tries to use as many CPU cores as possible.  If there are multiple GPUs, MXNet will tell you which GPUs the ndarray is allocated on.\n",
     "\n",
     "Assuming there is at least two GPUs. You can create another ndarray and assign it to a different GPU. If you only have one GPU, then you will get an error trying to run this code. In the example code here, you will copy `x` to the second GPU, `npx.gpu(1)`:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "373d139a",
    "metadata": {},
    "outputs": [],
    "source": [
     "gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()\n",
     "x.copyto(gpu_1)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "499d6b53",
    "metadata": {},
    "source": [
     "MXNet requries that users explicitly move data between devices. But several operators such as `print`, and `asnumpy`, will implicitly move data to main memory.\n",
     "\n",
     "## Choosing GPU Ids\n",
     "If you have multiple GPUs on your machine, MXNet can access each of them through 0-indexing with `npx`. As you saw before, the first GPU was accessed using `npx.gpu(0)`, and the second using `npx.gpu(1)`. This extends to however many GPUs your machine has. So if your machine has eight GPUs, the last GPU is accessed using `npx.gpu(7)`. This allows you to select which GPUs to use for operations and training. You might find it particularly useful when you want to leverage multiple GPUs while training neural networks.\n",
     "\n",
     "## Run an operation on a GPU\n",
     "\n",
     "To perform an operation on a particular GPU, you only need to guarantee that the input of an operation is already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` module support running on a GPU."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "f3863981",
    "metadata": {},
    "outputs": [],
    "source": [
     "y = np.random.uniform(size=(3,4), ctx=gpu)\n",
     "x + y"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "7849f602",
    "metadata": {},
    "source": [
     "Remember that if the inputs are not on the same GPU, you will get an error.\n",
     "\n",
     "## Run a neural network on a GPU\n",
     "\n",
     "To run a neural network on a GPU, you only need to copy and move the input data and parameters to the GPU. To demonstrate this you can reuse the previously defined LeafNetwork in [Training Neural Networks](6-train-nn.md). The following code example shows this."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "5acf8c51",
    "metadata": {},
    "outputs": [],
    "source": [
     "# The convolutional block has a convolution layer, a max pool layer and a batch normalization layer\n",
     "def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):\n",
     "    conv_block = nn.HybridSequential()\n",
     "    conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),\n",
     "              nn.MaxPool2D(pool_size=4, strides=stride))\n",
     "    if batch_norm:\n",
     "        conv_block.add(nn.BatchNorm())\n",
     "    return conv_block\n",
     "\n",
     "# The dense block consists of a dense layer and a dropout layer\n",
     "def dense_block(neurons, activation='relu', dropout=0.2):\n",
     "    dense_block = nn.HybridSequential()\n",
     "    dense_block.add(nn.Dense(neurons, activation=activation))\n",
     "    if dropout:\n",
     "        dense_block.add(nn.Dropout(dropout))\n",
     "    return dense_block\n",
     "\n",
     "# Create neural network blueprint using the blocks\n",
     "class LeafNetwork(nn.HybridBlock):\n",
     "    def __init__(self):\n",
     "        super(LeafNetwork, self).__init__()\n",
     "        self.conv1 = conv_block(32)\n",
     "        self.conv2 = conv_block(64)\n",
     "        self.conv3 = conv_block(128)\n",
     "        self.flatten = nn.Flatten()\n",
     "        self.dense1 = dense_block(100)\n",
     "        self.dense2 = dense_block(10)\n",
     "        self.dense3 = nn.Dense(2)\n",
     "\n",
     "    def forward(self, batch):\n",
     "        batch = self.conv1(batch)\n",
     "        batch = self.conv2(batch)\n",
     "        batch = self.conv3(batch)\n",
     "        batch = self.flatten(batch)\n",
     "        batch = self.dense1(batch)\n",
     "        batch = self.dense2(batch)\n",
     "        batch = self.dense3(batch)\n",
     "\n",
     "        return batch"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "9cd9d591",
    "metadata": {},
    "source": [
     "Load the saved parameters onto GPU 0 directly as shown below; additionally, you could use `net.collect_params().reset_ctx(gpu)` to change the device."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "c32de982",
    "metadata": {},
    "outputs": [],
    "source": [
     "net.load_parameters('leaf_models.params', ctx=gpu)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "a7b28d5e",
    "metadata": {},
    "source": [
     "Use the following command to create input data on GPU 0. The forward function will then run on GPU 0."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "2e650e01",
    "metadata": {},
    "outputs": [],
    "source": [
     "x = np.random.uniform(size=(1, 3, 128, 128), ctx=gpu)\n",
     "net(x)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "7ad92962",
    "metadata": {},
    "source": [
     "## Training with multiple GPUs\n",
     "\n",
     "Finally, you will see how you can use multiple GPUs to jointly train a neural network through data parallelism. To elaborate on what data parallelism is, assume there are *n* GPUs, then you can split each data batch into *n* parts, and use a GPU on each of these parts to run the forward and backward passes on the seperate chunks of the data.\n",
     "\n",
     "First copy the data definitions with the following commands, and the transform functions from the tutorial [Training Neural Networks](6-train-nn.md)."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "cf7db373",
    "metadata": {},
    "outputs": [],
    "source": [
     "# Import transforms as compose a series of transformations to the images\n",
     "from mxnet.gluon.data.vision import transforms\n",
     "\n",
     "jitter_param = 0.05\n",
     "\n",
     "# mean and std for normalizing image value in range (0,1)\n",
     "mean = [0.485, 0.456, 0.406]\n",
     "std = [0.229, 0.224, 0.225]\n",
     "\n",
     "training_transformer = transforms.Compose([\n",
     "    transforms.Resize(size=224, keep_ratio=True),\n",
     "    transforms.CenterCrop(128),\n",
     "    transforms.RandomFlipLeftRight(),\n",
     "    transforms.RandomColorJitter(contrast=jitter_param),\n",
     "    transforms.ToTensor(),\n",
     "    transforms.Normalize(mean, std)\n",
     "])\n",
     "\n",
     "validation_transformer = transforms.Compose([\n",
     "    transforms.Resize(size=224, keep_ratio=True),\n",
     "    transforms.CenterCrop(128),\n",
     "    transforms.ToTensor(),\n",
     "    transforms.Normalize(mean, std)\n",
     "])\n",
     "\n",
     "# Create data loaders\n",
     "batch_size = 4\n",
     "train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer),batch_size=batch_size, shuffle=True, try_nopython=True)\n",
     "validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)\n",
     "test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "faaced11",
    "metadata": {},
    "source": [
     "### Define a helper function\n",
     "This is the same test function defined previously in the **Step 6**."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "4d3164bf",
    "metadata": {},
    "outputs": [],
    "source": [
     "# Function to return the accuracy for the validation and test set\n",
     "def test(val_data):\n",
     "    acc = gluon.metric.Accuracy()\n",
     "    for batch in val_data:\n",
     "        data = batch[0]\n",
     "        labels = batch[1]\n",
     "        outputs = model(data)\n",
     "        acc.update([labels], [outputs])\n",
     "\n",
     "    _, accuracy = acc.get()\n",
     "    return accuracy"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "9ec76d7e",
    "metadata": {},
    "source": [
     "The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "1ee8babc",
    "metadata": {},
    "outputs": [],
    "source": [
     "# Diff 1: Use two GPUs for training.\n",
     "available_gpus = [npx.gpu(i) for i in range(npx.num_gpus())]\n",
     "num_gpus = 2\n",
     "devices = available_gpus[:num_gpus]\n",
     "print('Using {} GPUs'.format(len(devices)))\n",
     "\n",
     "# Diff 2: reinitialize the parameters and place them on multiple GPUs\n",
     "net.initialize(force_reinit=True, ctx=devices)\n",
     "\n",
     "# Loss and trainer are the same as before\n",
     "loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()\n",
     "optimizer = 'sgd'\n",
     "optimizer_params = {'learning_rate': 0.001}\n",
     "trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)\n",
     "\n",
     "epochs = 2\n",
     "accuracy = gluon.metric.Accuracy()\n",
     "log_interval = 5\n",
     "\n",
     "for epoch in range(10):\n",
     "    train_loss = 0.\n",
     "    tic = time.time()\n",
     "    btic = time.time()\n",
     "    accuracy.reset()\n",
     "    for idx, batch in enumerate(train_loader):\n",
     "        data, label = batch[0], batch[1]\n",
     "\n",
     "        # Diff 3: split batch and load into corresponding devices\n",
     "        data_list = gluon.utils.split_and_load(data, devices)\n",
     "        label_list = gluon.utils.split_and_load(label, devices)\n",
     "\n",
     "        # Diff 4: run forward and backward on each devices.\n",
     "        # MXNet will automatically run them in parallel\n",
     "        with autograd.record():\n",
     "            outputs = [net(X)\n",
     "                      for X in data_list]\n",
     "            losses = [loss_fn(output, label)\n",
     "                      for output, label in zip(outputs, label_list)]\n",
     "        for l in losses:\n",
     "            l.backward()\n",
     "        trainer.step(batch_size)\n",
     "\n",
     "        # Diff 5: sum losses over all devices. Here, the float\n",
     "        # function will copy data into CPU.\n",
     "        train_loss += sum([float(l.sum()) for l in losses])\n",
     "        accuracy.update(label_list, outputs)\n",
     "        if log_interval and (idx + 1) % log_interval == 0:\n",
     "            _, acc = accuracy.get()\n",
     "\n",
     "            print(f\"\"\"Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \\\n",
     "                  batch loss = {train_loss} | accuracy = {acc}\"\"\")\n",
     "            btic = time.time()\n",
     "\n",
     "    _, acc = accuracy.get()\n",
     "\n",
     "    acc_val = test(validation_loader)\n",
     "    print(f\"[Epoch {epoch + 1}] training: accuracy={acc}\")\n",
     "    print(f\"[Epoch {epoch + 1}] time cost: {time.time() - tic}\")\n",
     "    print(f\"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}\")"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "8702061e",
    "metadata": {},
    "source": [
     "## Next steps\n",
     "\n",
     "Now that you have completed training and predicting with a neural network on GPUs, you reached the conclusion of the crash course. Congratulations.\n",
     "If you are keen on studying more, checkout [D2L.ai](https://d2l.ai),\n",
     "[GluonCV](https://cv.gluon.ai/tutorials/index.html), [GluonNLP](https://nlp.gluon.ai),\n",
     "[GluonTS](https://ts.gluon.ai/), [AutoGluon](https://auto.gluon.ai)."
    ]
   }
  ],
  "metadata": {
   "language_info": {
    "name": "python"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 5
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"id": "50f4ae7c",
	"metadata": {},
	"source": [
	"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
	"<!--- or more contributor license agreements. See the NOTICE file -->\n",
	"<!--- distributed with this work for additional information -->\n",
	"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
	"<!--- to you under the Apache License, Version 2.0 (the -->\n",
	"<!--- \"License\"); you may not use this file except in compliance -->\n",
	"<!--- with the License. You may obtain a copy of the License at -->\n",
	"\n",
	"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
	"\n",
	"<!--- Unless required by applicable law or agreed to in writing, -->\n",
	"<!--- software distributed under the License is distributed on an -->\n",
	"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
	"<!--- KIND, either express or implied. See the License for the -->\n",
	"<!--- specific language governing permissions and limitations -->\n",
	"<!--- under the License. -->\n",
	"\n",
	"# Step 7: Load and Run a NN using GPU\n",
	"\n",
	"In this step, you will learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, you may be able to train or perform inference quicker than with central processing units (CPUs).\n",
	"\n",
	"## Prerequisites\n",
	"\n",
	"Before you start the steps, make sure you have at least one Nvidia GPU on your machine and make sure that you have CUDA properly installed. GPUs from AMD and Intel are not supported. Additionally, you will need to install the GPU-enabled version of MXNet. You can find information about how to install the GPU version of MXNet for your system [here](https://mxnet.apache.org/versions/1.4.1/install/ubuntu_setup.html).\n",
	"\n",
	"You can use the following command to view the number GPUs that are available to MXNet."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "c6825db0",
	"metadata": {},
	"outputs": [],
	"source": [
	"from mxnet import np, npx, gluon, autograd\n",
	"from mxnet.gluon import nn\n",
	"import time\n",
	"npx.set_np()\n",
	"\n",
	"npx.num_gpus() #This command provides the number of GPUs MXNet can access"
	]
	},
	{
	"cell_type": "markdown",
	"id": "80d930e5",
	"metadata": {},
	"source": [
	"## Allocate data to a GPU\n",
	"\n",
	"MXNet's ndarray is very similar to NumPy's. One major difference is that MXNet's ndarray has a `context` attribute specifieing which device an array is on. By default, arrays are stored on `npx.cpu()`. To change it to the first GPU, you can use the following code, `npx.gpu()` or `npx.gpu(0)` to indicate the first GPU."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "27300d20",
	"metadata": {},
	"outputs": [],
	"source": [
	"gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()\n",
	"x = np.ones((3,4), ctx=gpu)\n",
	"x"
	]
	},
	{
	"cell_type": "markdown",
	"id": "d7022f2f",
	"metadata": {},
	"source": [
	"If you're using a CPU, MXNet allocates data on the main memory and tries to use as many CPU cores as possible. If there are multiple GPUs, MXNet will tell you which GPUs the ndarray is allocated on.\n",
	"\n",
	"Assuming there is at least two GPUs. You can create another ndarray and assign it to a different GPU. If you only have one GPU, then you will get an error trying to run this code. In the example code here, you will copy `x` to the second GPU, `npx.gpu(1)`:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "373d139a",
	"metadata": {},
	"outputs": [],
	"source": [
	"gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()\n",
	"x.copyto(gpu_1)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "499d6b53",
	"metadata": {},
	"source": [
	"MXNet requries that users explicitly move data between devices. But several operators such as `print`, and `asnumpy`, will implicitly move data to main memory.\n",
	"\n",
	"## Choosing GPU Ids\n",
	"If you have multiple GPUs on your machine, MXNet can access each of them through 0-indexing with `npx`. As you saw before, the first GPU was accessed using `npx.gpu(0)`, and the second using `npx.gpu(1)`. This extends to however many GPUs your machine has. So if your machine has eight GPUs, the last GPU is accessed using `npx.gpu(7)`. This allows you to select which GPUs to use for operations and training. You might find it particularly useful when you want to leverage multiple GPUs while training neural networks.\n",
	"\n",
	"## Run an operation on a GPU\n",
	"\n",
	"To perform an operation on a particular GPU, you only need to guarantee that the input of an operation is already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` module support running on a GPU."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "f3863981",
	"metadata": {},
	"outputs": [],
	"source": [
	"y = np.random.uniform(size=(3,4), ctx=gpu)\n",
	"x + y"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7849f602",
	"metadata": {},
	"source": [
	"Remember that if the inputs are not on the same GPU, you will get an error.\n",
	"\n",
	"## Run a neural network on a GPU\n",
	"\n",
	"To run a neural network on a GPU, you only need to copy and move the input data and parameters to the GPU. To demonstrate this you can reuse the previously defined LeafNetwork in [Training Neural Networks](6-train-nn.md). The following code example shows this."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "5acf8c51",
	"metadata": {},
	"outputs": [],
	"source": [
	"# The convolutional block has a convolution layer, a max pool layer and a batch normalization layer\n",
	"def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):\n",
	" conv_block = nn.HybridSequential()\n",
	" conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),\n",
	" nn.MaxPool2D(pool_size=4, strides=stride))\n",
	" if batch_norm:\n",
	" conv_block.add(nn.BatchNorm())\n",
	" return conv_block\n",
	"\n",
	"# The dense block consists of a dense layer and a dropout layer\n",
	"def dense_block(neurons, activation='relu', dropout=0.2):\n",
	" dense_block = nn.HybridSequential()\n",
	" dense_block.add(nn.Dense(neurons, activation=activation))\n",
	" if dropout:\n",
	" dense_block.add(nn.Dropout(dropout))\n",
	" return dense_block\n",
	"\n",
	"# Create neural network blueprint using the blocks\n",
	"class LeafNetwork(nn.HybridBlock):\n",
	" def __init__(self):\n",
	" super(LeafNetwork, self).__init__()\n",
	" self.conv1 = conv_block(32)\n",
	" self.conv2 = conv_block(64)\n",
	" self.conv3 = conv_block(128)\n",
	" self.flatten = nn.Flatten()\n",
	" self.dense1 = dense_block(100)\n",
	" self.dense2 = dense_block(10)\n",
	" self.dense3 = nn.Dense(2)\n",
	"\n",
	" def forward(self, batch):\n",
	" batch = self.conv1(batch)\n",
	" batch = self.conv2(batch)\n",
	" batch = self.conv3(batch)\n",
	" batch = self.flatten(batch)\n",
	" batch = self.dense1(batch)\n",
	" batch = self.dense2(batch)\n",
	" batch = self.dense3(batch)\n",
	"\n",
	" return batch"
	]
	},
	{
	"cell_type": "markdown",
	"id": "9cd9d591",
	"metadata": {},
	"source": [
	"Load the saved parameters onto GPU 0 directly as shown below; additionally, you could use `net.collect_params().reset_ctx(gpu)` to change the device."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "c32de982",
	"metadata": {},
	"outputs": [],
	"source": [
	"net.load_parameters('leaf_models.params', ctx=gpu)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "a7b28d5e",
	"metadata": {},
	"source": [
	"Use the following command to create input data on GPU 0. The forward function will then run on GPU 0."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "2e650e01",
	"metadata": {},
	"outputs": [],
	"source": [
	"x = np.random.uniform(size=(1, 3, 128, 128), ctx=gpu)\n",
	"net(x)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7ad92962",
	"metadata": {},
	"source": [
	"## Training with multiple GPUs\n",
	"\n",
	"Finally, you will see how you can use multiple GPUs to jointly train a neural network through data parallelism. To elaborate on what data parallelism is, assume there are n GPUs, then you can split each data batch into n parts, and use a GPU on each of these parts to run the forward and backward passes on the seperate chunks of the data.\n",
	"\n",
	"First copy the data definitions with the following commands, and the transform functions from the tutorial [Training Neural Networks](6-train-nn.md)."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "cf7db373",
	"metadata": {},
	"outputs": [],
	"source": [
	"# Import transforms as compose a series of transformations to the images\n",
	"from mxnet.gluon.data.vision import transforms\n",
	"\n",
	"jitter_param = 0.05\n",
	"\n",
	"# mean and std for normalizing image value in range (0,1)\n",
	"mean = [0.485, 0.456, 0.406]\n",
	"std = [0.229, 0.224, 0.225]\n",
	"\n",
	"training_transformer = transforms.Compose([\n",
	" transforms.Resize(size=224, keep_ratio=True),\n",
	" transforms.CenterCrop(128),\n",
	" transforms.RandomFlipLeftRight(),\n",
	" transforms.RandomColorJitter(contrast=jitter_param),\n",
	" transforms.ToTensor(),\n",
	" transforms.Normalize(mean, std)\n",
	"])\n",
	"\n",
	"validation_transformer = transforms.Compose([\n",
	" transforms.Resize(size=224, keep_ratio=True),\n",
	" transforms.CenterCrop(128),\n",
	" transforms.ToTensor(),\n",
	" transforms.Normalize(mean, std)\n",
	"])\n",
	"\n",
	"# Create data loaders\n",
	"batch_size = 4\n",
	"train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer),batch_size=batch_size, shuffle=True, try_nopython=True)\n",
	"validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)\n",
	"test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "faaced11",
	"metadata": {},
	"source": [
	"### Define a helper function\n",
	"This is the same test function defined previously in the Step 6."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "4d3164bf",
	"metadata": {},
	"outputs": [],
	"source": [
	"# Function to return the accuracy for the validation and test set\n",
	"def test(val_data):\n",
	" acc = gluon.metric.Accuracy()\n",
	" for batch in val_data:\n",
	" data = batch[0]\n",
	" labels = batch[1]\n",
	" outputs = model(data)\n",
	" acc.update([labels], [outputs])\n",
	"\n",
	" _, accuracy = acc.get()\n",
	" return accuracy"
	]
	},
	{
	"cell_type": "markdown",
	"id": "9ec76d7e",
	"metadata": {},
	"source": [
	"The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "1ee8babc",
	"metadata": {},
	"outputs": [],
	"source": [
	"# Diff 1: Use two GPUs for training.\n",
	"available_gpus = [npx.gpu(i) for i in range(npx.num_gpus())]\n",
	"num_gpus = 2\n",
	"devices = available_gpus[:num_gpus]\n",
	"print('Using {} GPUs'.format(len(devices)))\n",
	"\n",
	"# Diff 2: reinitialize the parameters and place them on multiple GPUs\n",
	"net.initialize(force_reinit=True, ctx=devices)\n",
	"\n",
	"# Loss and trainer are the same as before\n",
	"loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()\n",
	"optimizer = 'sgd'\n",
	"optimizer_params = {'learning_rate': 0.001}\n",
	"trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)\n",
	"\n",
	"epochs = 2\n",
	"accuracy = gluon.metric.Accuracy()\n",
	"log_interval = 5\n",
	"\n",
	"for epoch in range(10):\n",
	" train_loss = 0.\n",
	" tic = time.time()\n",
	" btic = time.time()\n",
	" accuracy.reset()\n",
	" for idx, batch in enumerate(train_loader):\n",
	" data, label = batch[0], batch[1]\n",
	"\n",
	" # Diff 3: split batch and load into corresponding devices\n",
	" data_list = gluon.utils.split_and_load(data, devices)\n",
	" label_list = gluon.utils.split_and_load(label, devices)\n",
	"\n",
	" # Diff 4: run forward and backward on each devices.\n",
	" # MXNet will automatically run them in parallel\n",
	" with autograd.record():\n",
	" outputs = [net(X)\n",
	" for X in data_list]\n",
	" losses = [loss_fn(output, label)\n",
	" for output, label in zip(outputs, label_list)]\n",
	" for l in losses:\n",
	" l.backward()\n",
	" trainer.step(batch_size)\n",
	"\n",
	" # Diff 5: sum losses over all devices. Here, the float\n",
	" # function will copy data into CPU.\n",
	" train_loss += sum([float(l.sum()) for l in losses])\n",
	" accuracy.update(label_list, outputs)\n",
	" if log_interval and (idx + 1) % log_interval == 0:\n",
	" _, acc = accuracy.get()\n",
	"\n",
	" print(f\"\"\"Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \\\n",
	" batch loss = {train_loss} \| accuracy = {acc}\"\"\")\n",
	" btic = time.time()\n",
	"\n",
	" _, acc = accuracy.get()\n",
	"\n",
	" acc_val = test(validation_loader)\n",
	" print(f\"[Epoch {epoch + 1}] training: accuracy={acc}\")\n",
	" print(f\"[Epoch {epoch + 1}] time cost: {time.time() - tic}\")\n",
	" print(f\"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}\")"
	]
	},
	{
	"cell_type": "markdown",
	"id": "8702061e",
	"metadata": {},
	"source": [
	"## Next steps\n",
	"\n",
	"Now that you have completed training and predicting with a neural network on GPUs, you reached the conclusion of the crash course. Congratulations.\n",
	"If you are keen on studying more, checkout [D2L.ai](https://d2l.ai),\n",
	"[GluonCV](https://cv.gluon.ai/tutorials/index.html), [GluonNLP](https://nlp.gluon.ai),\n",
	"[GluonTS](https://ts.gluon.ai/), [AutoGluon](https://auto.gluon.ai)."
	]
	}
	],
	"metadata": {
	"language_info": {
	"name": "python"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 5
	}