{
"cells": [
{
"cell_type": "markdown",
"id": "82792988",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Step 3: Automatic differentiation with autograd\n",
"\n",
"In this step, you learn how to use the MXNet `autograd` package to perform\n",
"gradient calculations.\n",
"\n",
"## Basic use\n",
"\n",
"To get started, import the `autograd` package with the following code."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9fc8714e",
"metadata": {},
"outputs": [],
"source": [
"from mxnet import np, npx\n",
"from mxnet import autograd\n",
"npx.set_np()"
]
},
{
"cell_type": "markdown",
"id": "21ab66eb",
"metadata": {},
"source": [
"As an example, you could differentiate a function $f(x) = 2 x^2$ with respect to\n",
"parameter $x$. For Autograd, you can start by assigning an initial value of $x$,\n",
"as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9034fd89",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[03:52:06] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU\n"
]
},
{
"data": {
"text/plain": [
"array([[1., 2.],\n",
" [3., 4.]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[1, 2], [3, 4]])\n",
"x"
]
},
{
"cell_type": "markdown",
"id": "de4757f0",
"metadata": {},
"source": [
"After you compute the gradient of $f(x)$ with respect to $x$, you need a place\n",
"to store it. In MXNet, you can tell a ndarray that you plan to store a gradient\n",
"by invoking its `attach_grad` method, as shown in the following example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0b4566f2",
"metadata": {},
"outputs": [],
"source": [
"x.attach_grad()"
]
},
{
"cell_type": "markdown",
"id": "a08f9283",
"metadata": {},
"source": [
"Next, define the function $y=f(x)$. To let MXNet store $y$, so that you can\n",
"compute gradients later, use the following code to put the definition inside an\n",
"`autograd.record()` scope."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "45979fe0",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 2 * x * x"
]
},
{
"cell_type": "markdown",
"id": "29fe50f8",
"metadata": {},
"source": [
"You can invoke back propagation (backprop) by calling `y.backward()`. When $y$\n",
"has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e768e7a",
"metadata": {},
"outputs": [],
"source": [
"y.backward()"
]
},
{
"cell_type": "markdown",
"id": "4c447b69",
"metadata": {},
"source": [
"Next, verify whether this is the expected output. Note that $y=2x^2$ and\n",
"$\\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Check the\n",
"automatically computed results."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "19546bfe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4., 8.],\n",
" [12., 16.]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.grad"
]
},
{
"cell_type": "markdown",
"id": "05373081",
"metadata": {},
"source": [
"Now you get to dive into `y.backward()` by first discussing a bit on gradients. As\n",
"alluded to earlier `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "56634308",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4., 8.],\n",
" [12., 16.]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward()\n",
"x.grad"
]
},
{
"cell_type": "markdown",
"id": "ce8d4363",
"metadata": {},
"source": [
"Additionally, you can only run backward once. Unless you use the flag\n",
"`retain_graph` to be `True`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ef32f6ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 4. 8.]\n",
" [12. 16.]]\n",
"Since you have retained your previous graph you can run backward again\n",
"[[ 4. 8.]\n",
" [12. 16.]]\n",
"However, you can't do backward twice unless you retain the graph.\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward(retain_graph=True)\n",
"print(x.grad)\n",
"print(\"Since you have retained your previous graph you can run backward again\")\n",
"y.backward()\n",
"print(x.grad)\n",
"\n",
"try:\n",
" y.backward()\n",
"except:\n",
" print(\"However, you can't do backward twice unless you retain the graph.\")"
]
},
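{
"cell_type": "markdown",
"id": "rerecord-note",
"metadata": {},
"source": [
"If you prefer not to retain the graph, the usual pattern is simply to re-run the\n",
"forward pass under `autograd.record()` before each call to `backward()`. The\n",
"following is a minimal sketch that uses only the operations shown above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "rerecord-demo",
"metadata": {},
"outputs": [],
"source": [
"# Re-recording the forward pass each time avoids the need for retain_graph=True.\n",
"for _ in range(2):\n",
"    with autograd.record():\n",
"        y = np.sum(2 * x * x)\n",
"    y.backward()\n",
"    print(x.grad)"
]
},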
{
"cell_type": "markdown",
"id": "a93733a7",
"metadata": {},
"source": [
"## Custom MXNet ndarray operations\n",
"\n",
"In order to understand the `backward()` method it is beneficial to first\n",
"understand how you can create custom operations. MXNet operators are classes\n",
"with a forward and backward method. Where the number of args in `backward()`\n",
"must equal the number of items returned in the `forward()` method. Additionally,\n",
"the number of arguments in the `forward()` method must match the number of\n",
"output arguments from `backward()`. You can modify the gradients in backward to\n",
"return custom gradients. For instance, below you can return a different gradient then\n",
"the actual derivative."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8c163e5f",
"metadata": {},
"outputs": [],
"source": [
"class MyFirstCustomOperation(autograd.Function):\n",
" def __init__(self):\n",
" super().__init__()\n",
"\n",
" def forward(self,x,y):\n",
" return 2 * x, 2 * x * y, 2 * y\n",
"\n",
" def backward(self, dx, dxy, dy):\n",
" \"\"\"\n",
" The input number of arguments must match the number of outputs from forward.\n",
" Furthermore, the number of output arguments must match the number of inputs from forward.\n",
" \"\"\"\n",
" return x, y"
]
},
{
"cell_type": "markdown",
"id": "9b617671",
"metadata": {},
"source": [
"Now you can use the first custom operation you have built."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7b08d478",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"True\n"
]
}
],
"source": [
"x = np.random.uniform(-1, 1, (2, 3)) \n",
"y = np.random.uniform(-1, 1, (2, 3))\n",
"x.attach_grad()\n",
"y.attach_grad()\n",
"with autograd.record():\n",
" z = MyFirstCustomOperation()\n",
" z1, z2, z3 = z(x, y)\n",
" out = z1 + z2 + z3 \n",
"out.backward()\n",
"print(np.array_equiv(x.asnumpy(), x.asnumpy()))\n",
"print(np.array_equiv(y.asnumpy(), y.asnumpy()))"
]
},
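{
"cell_type": "markdown",
"id": "true-derivative-note",
"metadata": {},
"source": [
"For comparison, here is a sketch of a custom operation whose `backward()` returns\n",
"the actual derivatives instead of custom gradients. It assumes the\n",
"`save_for_backward`/`saved_tensors` helpers of `autograd.Function` for stashing the\n",
"forward inputs, and `ScaledProduct` is a hypothetical name used only for\n",
"illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "true-derivative-demo",
"metadata": {},
"outputs": [],
"source": [
"class ScaledProduct(autograd.Function):\n",
"    \"\"\"Hypothetical example: forward computes 2 * x * y, backward returns the true derivatives.\"\"\"\n",
"    def forward(self, x, y):\n",
"        self.save_for_backward(x, y)  # assumed helper for stashing the inputs\n",
"        return 2 * x * y\n",
"\n",
"    def backward(self, dz):\n",
"        x, y = self.saved_tensors\n",
"        # d(2xy)/dx = 2y and d(2xy)/dy = 2x, each scaled by the incoming gradient dz.\n",
"        return 2 * y * dz, 2 * x * dz\n",
"\n",
"with autograd.record():\n",
"    w = ScaledProduct()(x, y)\n",
"w.backward()\n",
"# Since w.backward() sums w first, the incoming gradient is all ones,\n",
"# so x.grad should equal 2 * y and y.grad should equal 2 * x.\n",
"print(np.array_equiv(x.grad.asnumpy(), (2 * y).asnumpy()))\n",
"print(np.array_equiv(y.grad.asnumpy(), (2 * x).asnumpy()))"
]
},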
{
"cell_type": "markdown",
"id": "3a765f3e",
"metadata": {},
"source": [
"Alternatively, you may want to have a function which is different depending on\n",
"if you are training or not."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7050f47a",
"metadata": {},
"outputs": [],
"source": [
"def my_first_function(x):\n",
" if autograd.is_training(): # Return something else when training\n",
" return(4 * x)\n",
" else:\n",
" return(x)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "402dccf4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"[[1. 1. 1.]\n",
" [1. 1. 1.]]\n",
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"y = my_first_function(x)\n",
"print(np.array_equiv(y.asnumpy(), x.asnumpy()))\n",
"with autograd.record(train_mode=False):\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)\n",
"with autograd.record(train_mode=True): # train_mode = True by default\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
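{
"cell_type": "markdown",
"id": "is-recording-note",
"metadata": {},
"source": [
"A related helper is `autograd.is_recording()`, which reports whether operations are\n",
"currently being recorded at all, independently of the training mode. The sketch\n",
"below assumes it simply reflects whether you are inside a `record()` scope:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "is-recording-demo",
"metadata": {},
"outputs": [],
"source": [
"def describe_mode():\n",
"    # is_recording(): are we inside an autograd.record() scope?\n",
"    # is_training():  is the recorded computation in training mode?\n",
"    return autograd.is_recording(), autograd.is_training()\n",
"\n",
"print(describe_mode())  # outside any record scope\n",
"with autograd.record(train_mode=False):\n",
"    print(describe_mode())  # recording, but not in training mode\n",
"with autograd.record(train_mode=True):\n",
"    print(describe_mode())  # recording in training mode"
]
},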
{
"cell_type": "markdown",
"id": "ec67e25d",
"metadata": {},
"source": [
"You could create functions with `autograd.record()`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fd6c14aa",
"metadata": {},
"outputs": [],
"source": [
"def my_second_function(x):\n",
" with autograd.record():\n",
" return(2 * x)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c494ab13",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[2. 2. 2.]\n",
" [2. 2. 2.]]\n"
]
}
],
"source": [
"y = my_second_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "354e78f1",
"metadata": {},
"source": [
"You can also combine multiple functions."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "922dc4ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"y = my_second_function(x)\n",
"with autograd.record():\n",
" z = my_second_function(y) + 2\n",
"z.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "cf7af926",
"metadata": {},
"source": [
"Additionally, MXNet records the execution trace and computes the gradient\n",
"accordingly. The following function `f` doubles the inputs until its `norm`\n",
"reaches 1000. Then it selects one element depending on the sum of its elements."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "126401ee",
"metadata": {},
"outputs": [],
"source": [
"def f(a):\n",
" b = a * 2\n",
" while np.abs(b).sum() < 1000:\n",
" b = b * 2\n",
" if b.sum() >= 0:\n",
" c = b[0]\n",
" else:\n",
" c = b[1]\n",
" return c"
]
},
{
"cell_type": "markdown",
"id": "cac69362",
"metadata": {},
"source": [
"In this example, you record the trace and feed in a random value."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "dcf32361",
"metadata": {},
"outputs": [],
"source": [
"a = np.random.uniform(size=2)\n",
"a.attach_grad()\n",
"with autograd.record():\n",
" c = f(a)\n",
"c.backward()"
]
},
{
"cell_type": "markdown",
"id": "5e0deb11",
"metadata": {},
"source": [
"You can see that `b` is a linear function of `a`, and `c` is chosen from `b`.\n",
"The gradient with respect to `a` be will be either `[c/a[0], 0]` or `[0,\n",
"c/a[1]]`, depending on which element from `b` is picked. You see the results of\n",
"this example with this code:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3ff623d4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ True, False])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.grad == c / a"
]
},
{
"cell_type": "markdown",
"id": "a3d27a5b",
"metadata": {},
"source": [
"As you can notice there are 3 values along the dimension 0, so taking a `mean`\n",
"along this axis is the same as summing that axis and multiplying by `1/3`.\n",
"\n",
"## Advanced MXNet ndarray operations with Autograd\n",
"\n",
"You can control gradients for different ndarray operations. For instance,\n",
"perhaps you want to check that the gradients are propagating properly?\n",
"the `attach_grad()` method automatically detaches itself from the gradient.\n",
"Therefore, the input up until y will no longer look like it has `x`. To\n",
"illustrate this notice that `x.grad` and `y.grad` is not the same in the second\n",
"example."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "fe78d425",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14. 14. 14.]\n",
" [14. 14. 14.]]\n",
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" y.attach_grad()\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
{
"cell_type": "markdown",
"id": "1b49d8b3",
"metadata": {},
"source": [
"Is not the same as:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "191556d5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14. 14. 14.]\n",
" [14. 14. 14.]]\n",
"None\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
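{
"cell_type": "markdown",
"id": "pause-note",
"metadata": {},
"source": [
"If instead you want to exclude part of a computation from the recorded graph\n",
"entirely, you can wrap it in an `autograd.pause()` scope inside `autograd.record()`;\n",
"the value computed there is then treated as a constant during backward. The\n",
"following is a minimal sketch, assuming `pause()` behaves this way:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "pause-demo",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
"    y = 3 * x\n",
"    with autograd.pause():\n",
"        # Not recorded: `scale` acts as a constant with respect to the graph.\n",
"        scale = np.abs(y).sum()\n",
"    z = y * scale\n",
"z.backward()\n",
"print(x.grad)  # gradient flows only through y = 3 * x, scaled by the constant"
]
},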
{
"cell_type": "markdown",
"id": "d841412f",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Learn how to initialize weights, choose loss function, metrics and optimizers for training your neural network [Step 4: Necessary components\n",
"to train the neural network](./4-components.ipynb)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}