{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Step 6: Use GPUs to increase efficiency\n",
"\n",
"In this step, you learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, you get significantly more computational power when compared to central processing units (CPUs).\n",
"\n",
"## Prerequisites\n",
"\n",
"Before you start the other steps here, make sure you have at least one Nvidia GPU in your machine and CUDA properly installed. GPUs from AMD and Intel are not supported. Install the GPU-enabled version of MXNet.\n",
"\n",
"Use the following commands to check the number GPUs that are available."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
}
},
"outputs": [],
"source": [
"from mxnet import np, npx, gluon, autograd\n",
"from mxnet.gluon import nn\n",
"import time\n",
"npx.set_np()\n",
"\n",
"npx.num_gpus()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Allocate data to a GPU\n",
"\n",
"MXNet's ndarray is very similar to NumPy. One major difference is MXNet's ndarray has a `context` attribute that specifies which device an array is on. By default, it is on `npx.cpu()`. Change it to the first GPU with the following code. Use `npx.gpu()` or `npx.gpu(0)` to indicate the first GPU."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "10"
}
},
"outputs": [],
"source": [
"gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()\n",
"x = np.ones((3,4), ctx=gpu)\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're using a CPU, MXNet allocates data on main memory and tries to use as many CPU cores as possible. This is true even if there is more than one CPU socket. If there are multiple GPUs, MXNet specifies which GPUs the ndarray is allocated.\n",
"\n",
"Assume there is a least one more GPU. Create another ndarray and assign it there. If you only have one GPU, then you get an error. In the example code here, you copy `x` to the second GPU, `npx.gpu(1)`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "11"
}
},
"outputs": [],
"source": [
"gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()\n",
"x.copyto(gpu_1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MXNet requries that users explicitly move data between devices. But several operators such as `print`, and `asnumpy`, will implicitly move data to main memory.\n",
"\n",
"## Run an operation on a GPU\n",
"\n",
"To perform an operation on a particular GPU, you only need to guarantee that the input of an operation is already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` module support running on a GPU."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "21"
}
},
"outputs": [],
"source": [
"y = np.random.uniform(size=(3,4), ctx=gpu)\n",
"x + y"
]
},
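{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted earlier, a few operations implicitly copy data back to main memory. The following small sketch (assuming `x` is the GPU array created above) shows that `asnumpy` returns a NumPy array in CPU memory and that `print` also pulls the data back to the CPU before displaying it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `asnumpy` copies the array from the GPU into a NumPy array in main (CPU) memory\n",
"x_cpu = x.asnumpy()\n",
"print(type(x_cpu))\n",
"# `print` likewise copies the data to the CPU before displaying it\n",
"print(x)"
]
},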
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember that if the inputs are not on the same GPU, you get an error.\n",
"\n",
"## Run a neural network on a GPU\n",
"\n",
"To run a neural network on a GPU, you only need to copy and move the input data and parameters to the GPU. Reuse the previously defined LeNet. The following code example shows this."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "16"
}
},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Dense(120, activation=\"relu\"),\n",
" nn.Dense(84, activation=\"relu\"),\n",
" nn.Dense(10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the saved parameters into GPU 0 directly as shown here, or use `net.collect_params().reset_ctx` to change the device."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "20"
}
},
"outputs": [],
"source": [
"net.load_parameters('net.params', ctx=gpu)"
]
},
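{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, if the parameters are already loaded on another device, a minimal sketch of the second option mentioned above is to move them with `reset_ctx` (using the `gpu` variable defined earlier):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Move every parameter of the network to the chosen device.\n",
"# This is an alternative to passing `ctx` when calling `load_parameters`.\n",
"net.collect_params().reset_ctx(gpu)"
]
},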
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following command to create input data on GPU 0. The forward function will then run on GPU 0."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "22"
}
},
"outputs": [],
"source": [
"# x = np.random.uniform(size=(1,1,28,28), ctx=gpu)\n",
"# net(x) FIXME"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training with multiple GPUs\n",
"\n",
"Finally, you can see how to use multiple GPUs to jointly train a neural network through data parallelism. Assume there are *n* GPUs. Split each data batch into *n* parts, and then each GPU will run the forward and backward passes using one part of the data.\n",
"\n",
"First copy the data definitions with the following commands, and the transform function from the [Predict tutorial](5-predict.md)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 256\n",
"transformer = gluon.data.vision.transforms.Compose([\n",
" gluon.data.vision.transforms.ToTensor(),\n",
" gluon.data.vision.transforms.Normalize(0.13, 0.31)])\n",
"train_data = gluon.data.DataLoader(\n",
" gluon.data.vision.datasets.FashionMNIST(train=True).transform_first(\n",
" transformer), batch_size, shuffle=True, num_workers=4)\n",
"valid_data = gluon.data.DataLoader(\n",
" gluon.data.vision.datasets.FashionMNIST(train=False).transform_first(\n",
" transformer), batch_size, shuffle=False, num_workers=4)"
]
},
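{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full training loop, the following small sketch (using the `gpu` and `gpu_1` contexts defined earlier) shows how `gluon.utils.split_and_load` divides a single batch across the devices; the training loop below relies on the same call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"devices = [gpu, gpu_1]\n",
"# Take one batch from the training data and split it evenly across the devices\n",
"data, label = next(iter(train_data))\n",
"data_list = gluon.utils.split_and_load(data, devices)\n",
"label_list = gluon.utils.split_and_load(label, devices)\n",
"# Each part holds batch_size / len(devices) examples and lives on its own device\n",
"[d.shape for d in data_list]"
]
},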
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Diff 1: Use two GPUs for training.\n",
"devices = [gpu, gpu_1]\n",
"# Diff 2: reinitialize the parameters and place them on multiple GPUs\n",
"net.collect_params().initialize(force_reinit=True, ctx=devices)\n",
"# Loss and trainer are the same as before\n",
"softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()\n",
"trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})\n",
"for epoch in range(10):\n",
" train_loss = 0.\n",
" tic = time.time()\n",
" for data, label in train_data:\n",
" # Diff 3: split batch and load into corresponding devices\n",
" data_list = gluon.utils.split_and_load(data, devices)\n",
" label_list = gluon.utils.split_and_load(label, devices)\n",
" # Diff 4: run forward and backward on each devices.\n",
" # MXNet will automatically run them in parallel\n",
" with autograd.record():\n",
" losses = [softmax_cross_entropy(net(X), y)\n",
" for X, y in zip(data_list, label_list)]\n",
" for l in losses:\n",
" l.backward()\n",
" trainer.step(batch_size)\n",
" # Diff 5: sum losses over all devices. Here float will copy data\n",
" # into CPU.\n",
" train_loss += sum([float(l.sum()) for l in losses])\n",
" print(\"Epoch %d: loss %.3f, in %.1f sec\" % (\n",
" epoch, train_loss/len(train_data)/batch_size, time.time()-tic))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Now you have completed training and predicting with a neural network by using NP on MXNet and\n",
"Gluon. You can check the guides to these two front ends: [What is NP on MXNet](../np/index.html) and [gluon](../gluon_from_experiment_to_deployment.md)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 4
}