{
"cells": [
{
"cell_type": "markdown",
"id": "6a14f615",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Step 3: Automatic differentiation with autograd\n",
"\n",
"In this step, you learn how to use the MXNet `autograd` package to perform\n",
"gradient calculations.\n",
"\n",
"## Basic use\n",
"\n",
"To get started, import the `autograd` package with the following code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a41a2302",
"metadata": {},
"outputs": [],
"source": [
"from mxnet import np, npx\n",
"from mxnet import autograd\n",
"npx.set_np()"
]
},
{
"cell_type": "markdown",
"id": "cfb4d4cf",
"metadata": {},
"source": [
"As an example, you could differentiate a function $f(x) = 2 x^2$ with respect to\n",
"parameter $x$. For Autograd, you can start by assigning an initial value of $x$,\n",
"as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a55c84e6",
"metadata": {},
"outputs": [],
"source": [
"x = np.array([[1, 2], [3, 4]])\n",
"x"
]
},
{
"cell_type": "markdown",
"id": "008821e4",
"metadata": {},
"source": [
"After you compute the gradient of $f(x)$ with respect to $x$, you need a place\n",
"to store it. In MXNet, you can tell a ndarray that you plan to store a gradient\n",
"by invoking its `attach_grad` method, as shown in the following example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db3eff4d",
"metadata": {},
"outputs": [],
"source": [
"x.attach_grad()"
]
},
{
"cell_type": "markdown",
"id": "5f1fecaa",
"metadata": {},
"source": [
"Next, define the function $y=f(x)$. To let MXNet store $y$, so that you can\n",
"compute gradients later, use the following code to put the definition inside an\n",
"`autograd.record()` scope."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db23322a",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 2 * x * x"
]
},
{
"cell_type": "markdown",
"id": "9069f608",
"metadata": {},
"source": [
"You can invoke back propagation (backprop) by calling `y.backward()`. When $y$\n",
"has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13350863",
"metadata": {},
"outputs": [],
"source": [
"y.backward()"
]
},
{
"cell_type": "markdown",
"id": "59753b31",
"metadata": {},
"source": [
"Next, verify whether this is the expected output. Note that $y=2x^2$ and\n",
"$\\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Check the\n",
"automatically computed results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f6e7317",
"metadata": {},
"outputs": [],
"source": [
"x.grad"
]
},
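{
"cell_type": "markdown",
"id": "f3a1c2d4",
"metadata": {},
"source": [
"As a quick sanity check, you can also compare the computed gradient with the\n",
"analytical gradient $4x$ elementwise:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7e9b0a2",
"metadata": {},
"outputs": [],
"source": [
"# Elementwise comparison with the analytical gradient 4 * x\n",
"x.grad == 4 * x"
]
},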
{
"cell_type": "markdown",
"id": "813eec0c",
"metadata": {},
"source": [
"Now you get to dive into `y.backward()` by first discussing a bit on gradients. As\n",
"alluded to earlier `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e7fb1d5",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward()\n",
"x.grad"
]
},
{
"cell_type": "markdown",
"id": "fb985663",
"metadata": {},
"source": [
"Additionally, you can only run backward once. Unless you use the flag\n",
"`retain_graph` to be `True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c621c976",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward(retain_graph=True)\n",
"print(x.grad)\n",
"print(\"Since you have retained your previous graph you can run backward again\")\n",
"y.backward()\n",
"print(x.grad)\n",
"\n",
"try:\n",
" y.backward()\n",
"except:\n",
" print(\"However, you can't do backward twice unless you retain the graph.\")"
]
},
{
"cell_type": "markdown",
"id": "e289a1db",
"metadata": {},
"source": [
"## Custom MXNet ndarray operations\n",
"\n",
"In order to understand the `backward()` method it is beneficial to first\n",
"understand how you can create custom operations. MXNet operators are classes\n",
"with a forward and backward method. Where the number of args in `backward()`\n",
"must equal the number of items returned in the `forward()` method. Additionally,\n",
"the number of arguments in the `forward()` method must match the number of\n",
"output arguments from `backward()`. You can modify the gradients in backward to\n",
"return custom gradients. For instance, below you can return a different gradient then\n",
"the actual derivative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5583ca32",
"metadata": {},
"outputs": [],
"source": [
"class MyFirstCustomOperation(autograd.Function):\n",
" def __init__(self):\n",
" super().__init__()\n",
"\n",
" def forward(self,x,y):\n",
" return 2 * x, 2 * x * y, 2 * y\n",
"\n",
" def backward(self, dx, dxy, dy):\n",
" \"\"\"\n",
" The input number of arguments must match the number of outputs from forward.\n",
" Furthermore, the number of output arguments must match the number of inputs from forward.\n",
" \"\"\"\n",
" return x, y"
]
},
{
"cell_type": "markdown",
"id": "309142cc",
"metadata": {},
"source": [
"Now you can use the first custom operation you have built."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f820dd8",
"metadata": {},
"outputs": [],
"source": [
"x = np.random.uniform(-1, 1, (2, 3)) \n",
"y = np.random.uniform(-1, 1, (2, 3))\n",
"x.attach_grad()\n",
"y.attach_grad()\n",
"with autograd.record():\n",
" z = MyFirstCustomOperation()\n",
" z1, z2, z3 = z(x, y)\n",
" out = z1 + z2 + z3 \n",
"out.backward()\n",
"print(np.array_equiv(x.asnumpy(), x.asnumpy()))\n",
"print(np.array_equiv(y.asnumpy(), y.asnumpy()))"
]
},
{
"cell_type": "markdown",
"id": "6c08fbaf",
"metadata": {},
"source": [
"Alternatively, you may want to have a function which is different depending on\n",
"if you are training or not."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57d1d283",
"metadata": {},
"outputs": [],
"source": [
"def my_first_function(x):\n",
" if autograd.is_training(): # Return something else when training\n",
" return(4 * x)\n",
" else:\n",
" return(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "706fc5b0",
"metadata": {},
"outputs": [],
"source": [
"y = my_first_function(x)\n",
"print(np.array_equiv(y.asnumpy(), x.asnumpy()))\n",
"with autograd.record(train_mode=False):\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)\n",
"with autograd.record(train_mode=True): # train_mode = True by default\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
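{
"cell_type": "markdown",
"id": "d4f8a1b3",
"metadata": {},
"source": [
"Besides `is_training()`, the `autograd` package also provides `autograd.is_recording()`\n",
"and the `autograd.pause()` scope. The following minimal sketch shows how you can check\n",
"whether operations are currently being recorded and temporarily pause recording:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2c6d9f0",
"metadata": {},
"outputs": [],
"source": [
"print(autograd.is_recording())  # False: nothing is being recorded here\n",
"with autograd.record():\n",
"    print(autograd.is_recording())  # True: operations in this scope are recorded\n",
"    with autograd.pause():\n",
"        # Recording is temporarily paused, e.g. for logging or metrics\n",
"        print(autograd.is_recording())\n",
"print(autograd.is_recording())  # False again outside the record scope"
]
},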
{
"cell_type": "markdown",
"id": "ae3a7bc5",
"metadata": {},
"source": [
"You could create functions with `autograd.record()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2703873f",
"metadata": {},
"outputs": [],
"source": [
"def my_second_function(x):\n",
" with autograd.record():\n",
" return(2 * x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ba7abb4",
"metadata": {},
"outputs": [],
"source": [
"y = my_second_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "343fa056",
"metadata": {},
"source": [
"You can also combine multiple functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d38f4f7b",
"metadata": {},
"outputs": [],
"source": [
"y = my_second_function(x)\n",
"with autograd.record():\n",
" z = my_second_function(y) + 2\n",
"z.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "ac2db18a",
"metadata": {},
"source": [
"Additionally, MXNet records the execution trace and computes the gradient\n",
"accordingly. The following function `f` doubles the inputs until its `norm`\n",
"reaches 1000. Then it selects one element depending on the sum of its elements."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "880a5bb8",
"metadata": {},
"outputs": [],
"source": [
"def f(a):\n",
" b = a * 2\n",
" while np.abs(b).sum() < 1000:\n",
" b = b * 2\n",
" if b.sum() >= 0:\n",
" c = b[0]\n",
" else:\n",
" c = b[1]\n",
" return c"
]
},
{
"cell_type": "markdown",
"id": "8967c555",
"metadata": {},
"source": [
"In this example, you record the trace and feed in a random value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d7b9e99",
"metadata": {},
"outputs": [],
"source": [
"a = np.random.uniform(size=2)\n",
"a.attach_grad()\n",
"with autograd.record():\n",
" c = f(a)\n",
"c.backward()"
]
},
{
"cell_type": "markdown",
"id": "8759dc87",
"metadata": {},
"source": [
"You can see that `b` is a linear function of `a`, and `c` is chosen from `b`.\n",
"The gradient with respect to `a` be will be either `[c/a[0], 0]` or `[0,\n",
"c/a[1]]`, depending on which element from `b` is picked. You see the results of\n",
"this example with this code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "526bd860",
"metadata": {},
"outputs": [],
"source": [
"a.grad == c / a"
]
},
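{
"cell_type": "markdown",
"id": "a9d3e5c7",
"metadata": {},
"source": [
"To see this more concretely, you can print `a.grad` next to `c / a`; the gradient\n",
"is nonzero only at the position of the selected element of `b`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1f4a8e6",
"metadata": {},
"outputs": [],
"source": [
"# a.grad is nonzero only at the position of the selected element of b\n",
"print(a.grad)\n",
"print(c / a)"
]
},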
{
"cell_type": "markdown",
"id": "85e05094",
"metadata": {},
"source": [
"As you can notice there are 3 values along the dimension 0, so taking a `mean`\n",
"along this axis is the same as summing that axis and multiplying by `1/3`.\n",
"\n",
"## Advanced MXNet ndarray operations with Autograd\n",
"\n",
"You can control gradients for different ndarray operations. For instance,\n",
"perhaps you want to check that the gradients are propagating properly?\n",
"the `attach_grad()` method automatically detaches itself from the gradient.\n",
"Therefore, the input up until y will no longer look like it has `x`. To\n",
"illustrate this notice that `x.grad` and `y.grad` is not the same in the second\n",
"example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47d5f509",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" y.attach_grad()\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
{
"cell_type": "markdown",
"id": "d07133bf",
"metadata": {},
"source": [
"Is not the same as:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3b454a0",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
{
"cell_type": "markdown",
"id": "fc85bec1",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Learn how to initialize weights, choose loss function, metrics and optimizers for training your neural network [Step 4: Necessary components\n",
"to train the neural network](./4-components.ipynb)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}