{
"cells": [
{
"cell_type": "markdown",
"id": "82792988",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Step 3: Automatic differentiation with autograd\n",
"\n",
"In this step, you learn how to use the MXNet `autograd` package to perform\n",
"gradient calculations.\n",
"\n",
"## Basic use\n",
"\n",
"To get started, import the `autograd` package with the following code."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9fc8714e",
"metadata": {},
"outputs": [],
"source": [
"from mxnet import np, npx\n",
"from mxnet import autograd\n",
"npx.set_np()"
]
},
{
"cell_type": "markdown",
"id": "21ab66eb",
"metadata": {},
"source": [
"As an example, you could differentiate a function $f(x) = 2 x^2$ with respect to\n",
"parameter $x$. For Autograd, you can start by assigning an initial value of $x$,\n",
"as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9034fd89",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[03:52:06] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU\n"
]
},
{
"data": {
"text/plain": [
"array([[1., 2.],\n",
" [3., 4.]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[1, 2], [3, 4]])\n",
"x"
]
},
{
"cell_type": "markdown",
"id": "de4757f0",
"metadata": {},
"source": [
"After you compute the gradient of $f(x)$ with respect to $x$, you need a place\n",
"to store it. In MXNet, you can tell a ndarray that you plan to store a gradient\n",
"by invoking its `attach_grad` method, as shown in the following example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0b4566f2",
"metadata": {},
"outputs": [],
"source": [
"x.attach_grad()"
]
},
{
"cell_type": "markdown",
"id": "a08f9283",
"metadata": {},
"source": [
"Next, define the function $y=f(x)$. To let MXNet store $y$, so that you can\n",
"compute gradients later, use the following code to put the definition inside an\n",
"`autograd.record()` scope."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "45979fe0",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 2 * x * x"
]
},
{
"cell_type": "markdown",
"id": "29fe50f8",
"metadata": {},
"source": [
"You can invoke back propagation (backprop) by calling `y.backward()`. When $y$\n",
"has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e768e7a",
"metadata": {},
"outputs": [],
"source": [
"y.backward()"
]
},
{
"cell_type": "markdown",
"id": "4c447b69",
"metadata": {},
"source": [
"Next, verify whether this is the expected output. Note that $y=2x^2$ and\n",
"$\\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Check the\n",
"automatically computed results."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "19546bfe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4., 8.],\n",
" [12., 16.]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.grad"
]
},
{
"cell_type": "markdown",
"id": "05373081",
"metadata": {},
"source": [
"Now you get to dive into `y.backward()` by first discussing a bit on gradients. As\n",
"alluded to earlier `y.backward()` is equivalent to `y.sum().backward()`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "56634308",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4., 8.],\n",
" [12., 16.]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward()\n",
"x.grad"
]
},
{
"cell_type": "markdown",
"id": "ce8d4363",
"metadata": {},
"source": [
"Additionally, you can only run backward once. Unless you use the flag\n",
"`retain_graph` to be `True`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ef32f6ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 4. 8.]\n",
" [12. 16.]]\n",
"Since you have retained your previous graph you can run backward again\n",
"[[ 4. 8.]\n",
" [12. 16.]]\n",
"However, you can't do backward twice unless you retain the graph.\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = np.sum(2 * x * x)\n",
"y.backward(retain_graph=True)\n",
"print(x.grad)\n",
"print(\"Since you have retained your previous graph you can run backward again\")\n",
"y.backward()\n",
"print(x.grad)\n",
"\n",
"try:\n",
" y.backward()\n",
"except:\n",
" print(\"However, you can't do backward twice unless you retain the graph.\")"
]
},
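{
"cell_type": "markdown",
"id": "rerecord-note",
"metadata": {},
"source": [
"If you prefer not to retain the graph, the usual pattern is simply to re-run the\n",
"forward pass under `autograd.record()` before each call to `backward()`. The\n",
"following is a minimal sketch that uses only the operations shown above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "rerecord-demo",
"metadata": {},
"outputs": [],
"source": [
"# Re-recording the forward pass each time avoids the need for retain_graph=True.\n",
"for _ in range(2):\n",
"    with autograd.record():\n",
"        y = np.sum(2 * x * x)\n",
"    y.backward()\n",
"    print(x.grad)"
]
},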
{
"cell_type": "markdown",
"id": "a93733a7",
"metadata": {},
"source": [
"## Custom MXNet ndarray operations\n",
"\n",
"In order to understand the `backward()` method it is beneficial to first\n",
"understand how you can create custom operations. MXNet operators are classes\n",
"with a forward and backward method. Where the number of args in `backward()`\n",
"must equal the number of items returned in the `forward()` method. Additionally,\n",
"the number of arguments in the `forward()` method must match the number of\n",
"output arguments from `backward()`. You can modify the gradients in backward to\n",
"return custom gradients. For instance, below you can return a different gradient then\n",
"the actual derivative."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8c163e5f",
"metadata": {},
"outputs": [],
"source": [
"class MyFirstCustomOperation(autograd.Function):\n",
" def __init__(self):\n",
" super().__init__()\n",
"\n",
" def forward(self,x,y):\n",
" return 2 * x, 2 * x * y, 2 * y\n",
"\n",
" def backward(self, dx, dxy, dy):\n",
" \"\"\"\n",
" The input number of arguments must match the number of outputs from forward.\n",
" Furthermore, the number of output arguments must match the number of inputs from forward.\n",
" \"\"\"\n",
" return x, y"
]
},
{
"cell_type": "markdown",
"id": "9b617671",
"metadata": {},
"source": [
"Now you can use the first custom operation you have built."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7b08d478",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"True\n"
]
}
],
"source": [
"x = np.random.uniform(-1, 1, (2, 3)) \n",
"y = np.random.uniform(-1, 1, (2, 3))\n",
"x.attach_grad()\n",
"y.attach_grad()\n",
"with autograd.record():\n",
" z = MyFirstCustomOperation()\n",
" z1, z2, z3 = z(x, y)\n",
" out = z1 + z2 + z3 \n",
"out.backward()\n",
"print(np.array_equiv(x.asnumpy(), x.asnumpy()))\n",
"print(np.array_equiv(y.asnumpy(), y.asnumpy()))"
]
},
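{
"cell_type": "markdown",
"id": "true-derivative-note",
"metadata": {},
"source": [
"For comparison, here is a sketch of a custom operation whose `backward()` returns\n",
"the actual derivatives instead of custom gradients. It assumes the\n",
"`save_for_backward`/`saved_tensors` helpers of `autograd.Function` for stashing the\n",
"forward inputs, and `ScaledProduct` is a hypothetical name used only for\n",
"illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "true-derivative-demo",
"metadata": {},
"outputs": [],
"source": [
"class ScaledProduct(autograd.Function):\n",
"    \"\"\"Hypothetical example: forward computes 2 * x * y, backward returns the true derivatives.\"\"\"\n",
"    def forward(self, x, y):\n",
"        self.save_for_backward(x, y)  # assumed helper for stashing the inputs\n",
"        return 2 * x * y\n",
"\n",
"    def backward(self, dz):\n",
"        x, y = self.saved_tensors\n",
"        # d(2xy)/dx = 2y and d(2xy)/dy = 2x, each scaled by the incoming gradient dz.\n",
"        return 2 * y * dz, 2 * x * dz\n",
"\n",
"with autograd.record():\n",
"    w = ScaledProduct()(x, y)\n",
"w.backward()\n",
"# Since w.backward() sums w first, the incoming gradient is all ones,\n",
"# so x.grad should equal 2 * y and y.grad should equal 2 * x.\n",
"print(np.array_equiv(x.grad.asnumpy(), (2 * y).asnumpy()))\n",
"print(np.array_equiv(y.grad.asnumpy(), (2 * x).asnumpy()))"
]
},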
{
"cell_type": "markdown",
"id": "3a765f3e",
"metadata": {},
"source": [
"Alternatively, you may want to have a function which is different depending on\n",
"if you are training or not."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7050f47a",
"metadata": {},
"outputs": [],
"source": [
"def my_first_function(x):\n",
" if autograd.is_training(): # Return something else when training\n",
" return(4 * x)\n",
" else:\n",
" return(x)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "402dccf4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"[[1. 1. 1.]\n",
" [1. 1. 1.]]\n",
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"y = my_first_function(x)\n",
"print(np.array_equiv(y.asnumpy(), x.asnumpy()))\n",
"with autograd.record(train_mode=False):\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)\n",
"with autograd.record(train_mode=True): # train_mode = True by default\n",
" y = my_first_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
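{
"cell_type": "markdown",
"id": "is-recording-note",
"metadata": {},
"source": [
"A related helper is `autograd.is_recording()`, which reports whether operations are\n",
"currently being recorded at all, independently of the training mode. The sketch\n",
"below assumes it simply reflects whether you are inside a `record()` scope:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "is-recording-demo",
"metadata": {},
"outputs": [],
"source": [
"def describe_mode():\n",
"    # is_recording(): are we inside an autograd.record() scope?\n",
"    # is_training():  is the recorded computation in training mode?\n",
"    return autograd.is_recording(), autograd.is_training()\n",
"\n",
"print(describe_mode())  # outside any record scope\n",
"with autograd.record(train_mode=False):\n",
"    print(describe_mode())  # recording, but not in training mode\n",
"with autograd.record(train_mode=True):\n",
"    print(describe_mode())  # recording in training mode"
]
},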
{
"cell_type": "markdown",
"id": "ec67e25d",
"metadata": {},
"source": [
"You could create functions with `autograd.record()`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fd6c14aa",
"metadata": {},
"outputs": [],
"source": [
"def my_second_function(x):\n",
" with autograd.record():\n",
" return(2 * x)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c494ab13",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[2. 2. 2.]\n",
" [2. 2. 2.]]\n"
]
}
],
"source": [
"y = my_second_function(x)\n",
"y.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "354e78f1",
"metadata": {},
"source": [
"You can also combine multiple functions."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "922dc4ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"y = my_second_function(x)\n",
"with autograd.record():\n",
" z = my_second_function(y) + 2\n",
"z.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"id": "cf7af926",
"metadata": {},
"source": [
"Additionally, MXNet records the execution trace and computes the gradient\n",
"accordingly. The following function `f` doubles the inputs until its `norm`\n",
"reaches 1000. Then it selects one element depending on the sum of its elements."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "126401ee",
"metadata": {},
"outputs": [],
"source": [
"def f(a):\n",
" b = a * 2\n",
" while np.abs(b).sum() < 1000:\n",
" b = b * 2\n",
" if b.sum() >= 0:\n",
" c = b[0]\n",
" else:\n",
" c = b[1]\n",
" return c"
]
},
{
"cell_type": "markdown",
"id": "cac69362",
"metadata": {},
"source": [
"In this example, you record the trace and feed in a random value."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "dcf32361",
"metadata": {},
"outputs": [],
"source": [
"a = np.random.uniform(size=2)\n",
"a.attach_grad()\n",
"with autograd.record():\n",
" c = f(a)\n",
"c.backward()"
]
},
{
"cell_type": "markdown",
"id": "5e0deb11",
"metadata": {},
"source": [
"You can see that `b` is a linear function of `a`, and `c` is chosen from `b`.\n",
"The gradient with respect to `a` be will be either `[c/a[0], 0]` or `[0,\n",
"c/a[1]]`, depending on which element from `b` is picked. You see the results of\n",
"this example with this code:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3ff623d4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ True, False])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.grad == c / a"
]
},
{
"cell_type": "markdown",
"id": "a3d27a5b",
"metadata": {},
"source": [
"As you can notice there are 3 values along the dimension 0, so taking a `mean`\n",
"along this axis is the same as summing that axis and multiplying by `1/3`.\n",
"\n",
"## Advanced MXNet ndarray operations with Autograd\n",
"\n",
"You can control gradients for different ndarray operations. For instance,\n",
"perhaps you want to check that the gradients are propagating properly?\n",
"the `attach_grad()` method automatically detaches itself from the gradient.\n",
"Therefore, the input up until y will no longer look like it has `x`. To\n",
"illustrate this notice that `x.grad` and `y.grad` is not the same in the second\n",
"example."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "fe78d425",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14. 14. 14.]\n",
" [14. 14. 14.]]\n",
"[[4. 4. 4.]\n",
" [4. 4. 4.]]\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" y.attach_grad()\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
{
"cell_type": "markdown",
"id": "1b49d8b3",
"metadata": {},
"source": [
"Is not the same as:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "191556d5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14. 14. 14.]\n",
" [14. 14. 14.]]\n",
"None\n"
]
}
],
"source": [
"with autograd.record():\n",
" y = 3 * x\n",
" z = 4 * y + 2 * x\n",
"z.backward()\n",
"print(x.grad)\n",
"print(y.grad)"
]
},
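{
"cell_type": "markdown",
"id": "pause-note",
"metadata": {},
"source": [
"If instead you want to exclude part of a computation from the recorded graph\n",
"entirely, you can wrap it in an `autograd.pause()` scope inside `autograd.record()`;\n",
"the value computed there is then treated as a constant during backward. The\n",
"following is a minimal sketch, assuming `pause()` behaves this way:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "pause-demo",
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
"    y = 3 * x\n",
"    with autograd.pause():\n",
"        # Not recorded: `scale` acts as a constant with respect to the graph.\n",
"        scale = np.abs(y).sum()\n",
"    z = y * scale\n",
"z.backward()\n",
"print(x.grad)  # gradient flows only through y = 3 * x, scaled by the constant"
]
},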
{
"cell_type": "markdown",
"id": "d841412f",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Learn how to initialize weights, choose loss function, metrics and optimizers for training your neural network [Step 4: Necessary components\n",
"to train the neural network](./4-components.ipynb)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}