{
"cells": [
{
"cell_type": "markdown",
"id": "2f389ac1",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Use GPUs\n",
"\n",
"We often use GPUs to train and deploy neural networks, because it offers significant more computation power compared to CPUs. In this tutorial we will introduce how to use GPUs with MXNet.\n",
"\n",
"First, make sure you have at least one Nvidia GPU in your machine and CUDA\n",
"properly installed. Other GPUs such as AMD and Intel GPUs are not supported\n",
"yet. Then be sure you have installed the GPU-enabled version of MXNet."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "154a89ab",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "15"
}
},
"outputs": [],
"source": [
"# If you pip installed the plain `mxnet` before, uncomment the\n",
"# following two lines to install the GPU version. You may need to\n",
"# replace `cu92` according to your CUDA version.\n",
"# !pip uninstall mxnet\n",
"# !pip install mxnet-cu92\n",
"\n",
"from mxnet import nd, gpu, gluon, autograd\n",
"from mxnet.gluon import nn\n",
"from mxnet.gluon.data.vision import datasets, transforms\n",
"import time"
]
},
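{
"cell_type": "markdown",
"id": "3f71a2b9",
"metadata": {},
"source": [
"Before going further, it is worth verifying that MXNet can actually see a GPU. The check below is a small sketch that assumes MXNet 1.3 or later, where `mx.context.num_gpus()` is available; it should report at least 1 on a correctly configured machine."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b82d46c1",
"metadata": {},
"outputs": [],
"source": [
"# Count the GPUs visible to MXNet (assumes MXNet >= 1.3).\n",
"import mxnet as mx\n",
"\n",
"print('Number of GPUs detected:', mx.context.num_gpus())"
]
},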
{
"cell_type": "markdown",
"id": "24576b4f",
"metadata": {},
"source": [
"## Allocate data to a GPU\n",
"\n",
"You may notice that MXNet's NDArray is very similar to Numpy. One major difference is NDArray has a `context` attribute that specifies which device this array is on. By default, it is `cpu()`. Now we will change it to the first GPU. You can use `gpu()` or `gpu(0)` to indicate the first GPU."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ce1bbb55",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "10"
}
},
"outputs": [],
"source": [
"x = nd.ones((3,4), ctx=gpu())\n",
"x"
]
},
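{
"cell_type": "markdown",
"id": "7d3e5f20",
"metadata": {},
"source": [
"You can check which device an array lives on through its `context` attribute:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a9c1e47",
"metadata": {},
"outputs": [],
"source": [
"# The array created above was allocated on the first GPU.\n",
"x.context"
]
},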
{
"cell_type": "markdown",
"id": "e23fede0",
"metadata": {},
"source": [
"For a CPU, MXNet will allocate data on main memory, and try to use all CPU cores as possible, even if there is more than one CPU socket. While if there are multiple GPUs, MXNet needs to specify which GPUs the NDArray will be allocated.\n",
"\n",
"Let's assume there is a least one more GPU. We can create another NDArray and assign it there. (If you only have one GPU, then you will see an error). Here we copy `x` to the second GPU, `gpu(1)`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "de41caf7",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "11"
}
},
"outputs": [],
"source": [
"x.copyto(gpu(1))"
]
},
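{
"cell_type": "markdown",
"id": "9e2b7c54",
"metadata": {},
"source": [
"`copyto` always allocates new memory on the target device. If you only want to make sure an array is on a given device, `as_in_context` is a convenient alternative: it returns the array itself when the source and target contexts already match, and copies it otherwise. A minimal example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c47d81f3",
"metadata": {},
"outputs": [],
"source": [
"# `x` is already on gpu(0), so no copy is made here.\n",
"x.as_in_context(gpu(0))"
]
},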
{
"cell_type": "markdown",
"id": "f188aed2",
"metadata": {},
"source": [
"MXNet needs users to explicitly move data between devices. But several operators such as `print`, `asnumpy` and `asscalar`, will implicitly move data to main memory.\n",
"\n",
"## Run an operation on a GPU\n",
"\n",
"To perform an operation on a particular GPU, we only need to guarantee that the inputs of this operation are already on that GPU. The output will be allocated on the same GPU as well. Almost all operators in the `nd` module support running on a GPU."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "6fcd112b",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "21"
}
},
"outputs": [],
"source": [
"y = nd.random.uniform(shape=(3,4), ctx=gpu())\n",
"x + y"
]
},
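{
"cell_type": "markdown",
"id": "2b64d9a8",
"metadata": {},
"source": [
"As noted above, calling `asnumpy` implicitly copies the result back to main memory, since NumPy arrays always live on the CPU:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f18c3a62",
"metadata": {},
"outputs": [],
"source": [
"# The addition runs on the GPU; `asnumpy` then copies the result to main memory.\n",
"(x + y).asnumpy()"
]
},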
{
"cell_type": "markdown",
"id": "c985685e",
"metadata": {},
"source": [
"Remember that if the inputs are not on the same GPU, you will see an error.\n",
"\n",
"## Run a neural network on a GPU\n",
"\n",
"Similarly, to run a neural network on a GPU, we only need to copy/move the input data and parameters to the GPU. Let's reuse the previously defined LeNet."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0e02672a",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "16"
}
},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Flatten(),\n",
" nn.Dense(120, activation=\"relu\"),\n",
" nn.Dense(84, activation=\"relu\"),\n",
" nn.Dense(10))"
]
},
{
"cell_type": "markdown",
"id": "ab70fda7",
"metadata": {},
"source": [
"And then load the saved parameters into GPU 0 directly, or use `net.collect_params().reset_ctx` to change the device."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ed4889cf",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "20"
}
},
"outputs": [],
"source": [
"net.load_parameters('net.params', ctx=gpu(0))"
]
},
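{
"cell_type": "markdown",
"id": "6a0d4e91",
"metadata": {},
"source": [
"The `reset_ctx` alternative mentioned above moves parameters that have already been loaded or initialized. Here the parameters are already on GPU 0, so this simply keeps them there:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d93b7f05",
"metadata": {},
"outputs": [],
"source": [
"# Move (or keep) all parameters of `net` on gpu(0).\n",
"net.collect_params().reset_ctx(gpu(0))"
]
},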
{
"cell_type": "markdown",
"id": "1cd23ffb",
"metadata": {},
"source": [
"Now create input data on GPU 0. The forward function will then run on GPU 0."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "d9f09799",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "22"
}
},
"outputs": [],
"source": [
"x = nd.random.uniform(shape=(1,1,28,28), ctx=gpu(0))\n",
"net(x)"
]
},
{
"cell_type": "markdown",
"id": "696b9176",
"metadata": {},
"source": [
"## [Advanced] Multi-GPU training\n",
"\n",
"Finally, we show how to use multiple GPUs to jointly train a neural network through data parallelism. Let's assume there are *n* GPUs. We split each data batch into *n* parts, and then each GPU will run the forward and backward passes using one part of the data.\n",
"\n",
"Let's first copy the data definitions and the transform function from the [previous tutorial](5-predict.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efa2ae6f",
"metadata": {},
"outputs": [],
"source": [
"batch_size = 256\n",
"transformer = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize(0.13, 0.31)])\n",
"train_data = gluon.data.DataLoader(\n",
" datasets.FashionMNIST(train=True).transform_first(transformer),\n",
" batch_size, shuffle=True, num_workers=4)\n",
"valid_data = gluon.data.DataLoader(\n",
" datasets.FashionMNIST(train=False).transform_first(transformer),\n",
" batch_size, shuffle=False, num_workers=4)"
]
},
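{
"cell_type": "markdown",
"id": "4c8e2b76",
"metadata": {},
"source": [
"Before writing the full loop, it helps to see what `gluon.utils.split_and_load` does: it slices a batch along the first axis and copies each slice to one of the given devices. The small sketch below uses a dummy batch and assumes two GPUs are available (use `[gpu(0)]` if you only have one)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5f90c38",
"metadata": {},
"outputs": [],
"source": [
"# Split a dummy batch of 4 samples evenly across two GPUs;\n",
"# each device receives a slice of 2 samples.\n",
"dummy_batch = nd.zeros((4, 1, 28, 28))\n",
"parts = gluon.utils.split_and_load(dummy_batch, [gpu(0), gpu(1)])\n",
"[(p.shape, p.context) for p in parts]"
]
},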
{
"cell_type": "markdown",
"id": "42167aad",
"metadata": {},
"source": [
"The training loop is quite similar to what we introduced before. The major differences are highlighted in the following code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4c2fd14",
"metadata": {},
"outputs": [],
"source": [
"# Diff 1: Use two GPUs for training.\n",
"devices = [gpu(0), gpu(1)]\n",
"# Diff 2: reinitialize the parameters and place them on multiple GPUs\n",
"net.collect_params().initialize(force_reinit=True, ctx=devices)\n",
"# Loss and trainer are the same as before\n",
"softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()\n",
"trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})\n",
"for epoch in range(10):\n",
" train_loss = 0.\n",
" tic = time.time()\n",
" for data, label in train_data:\n",
" # Diff 3: split batch and load into corresponding devices\n",
" data_list = gluon.utils.split_and_load(data, devices)\n",
" label_list = gluon.utils.split_and_load(label, devices)\n",
" # Diff 4: run forward and backward on each devices.\n",
" # MXNet will automatically run them in parallel\n",
" with autograd.record():\n",
" losses = [softmax_cross_entropy(net(X), y)\n",
" for X, y in zip(data_list, label_list)]\n",
" for l in losses:\n",
" l.backward()\n",
" trainer.step(batch_size)\n",
" # Diff 5: sum losses over all devices\n",
" train_loss += sum([l.sum().asscalar() for l in losses])\n",
" print(\"Epoch %d: loss %.3f, in %.1f sec\" % (\n",
" epoch, train_loss/len(train_data)/batch_size, time.time()-tic))"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}