| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "6a14f615", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Step 3: Automatic differentiation with autograd\n", |
| "\n", |
| "In this step, you learn how to use the MXNet `autograd` package to perform\n", |
| "gradient calculations.\n", |
| "\n", |
| "## Basic use\n", |
| "\n", |
| "To get started, import the `autograd` package with the following code." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a41a2302", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet import np, npx\n", |
| "from mxnet import autograd\n", |
| "npx.set_np()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "cfb4d4cf", |
| "metadata": {}, |
| "source": [ |
| "As an example, you could differentiate a function $f(x) = 2 x^2$ with respect to\n", |
| "parameter $x$. For Autograd, you can start by assigning an initial value of $x$,\n", |
| "as follows:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a55c84e6", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x = np.array([[1, 2], [3, 4]])\n", |
| "x" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "008821e4", |
| "metadata": {}, |
| "source": [ |
| "After you compute the gradient of $f(x)$ with respect to $x$, you need a place\n", |
| "to store it. In MXNet, you can tell a ndarray that you plan to store a gradient\n", |
| "by invoking its `attach_grad` method, as shown in the following example." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "db3eff4d", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x.attach_grad()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "5f1fecaa", |
| "metadata": {}, |
| "source": [ |
| "Next, define the function $y=f(x)$. To let MXNet store $y$, so that you can\n", |
| "compute gradients later, use the following code to put the definition inside an\n", |
| "`autograd.record()` scope." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "db23322a", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "with autograd.record():\n", |
| " y = 2 * x * x" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "9069f608", |
| "metadata": {}, |
| "source": [ |
| "You can invoke back propagation (backprop) by calling `y.backward()`. When $y$\n", |
| "has more than one entry, `y.backward()` is equivalent to `y.sum().backward()`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "13350863", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "y.backward()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "59753b31", |
| "metadata": {}, |
| "source": [ |
| "Next, verify whether this is the expected output. Note that $y=2x^2$ and\n", |
| "$\\frac{dy}{dx} = 4x$, which should be `[[4, 8],[12, 16]]`. Check the\n", |
| "automatically computed results." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "6f6e7317", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x.grad" |
| ] |
| }, |
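| { |
| "cell_type": "markdown", |
| "id": "3f9a1c2b", |
| "metadata": {}, |
| "source": [ |
| "As an extra check (added here for illustration), you can compare the stored\n", |
| "gradient directly against the analytical derivative $4x$; every entry should\n", |
| "be `True`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "7b2d4e6f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x.grad == 4 * x" |
| ] |
| }, |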
| { |
| "cell_type": "markdown", |
| "id": "813eec0c", |
| "metadata": {}, |
| "source": [ |
| "Now you get to dive into `y.backward()` by first discussing a bit on gradients. As\n", |
| "alluded to earlier `y.backward()` is equivalent to `y.sum().backward()`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "0e7fb1d5", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "with autograd.record():\n", |
| " y = np.sum(2 * x * x)\n", |
| "y.backward()\n", |
| "x.grad" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "fb985663", |
| "metadata": {}, |
| "source": [ |
| "Additionally, you can only run backward once. Unless you use the flag\n", |
| "`retain_graph` to be `True`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c621c976", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "with autograd.record():\n", |
| " y = np.sum(2 * x * x)\n", |
| "y.backward(retain_graph=True)\n", |
| "print(x.grad)\n", |
| "print(\"Since you have retained your previous graph you can run backward again\")\n", |
| "y.backward()\n", |
| "print(x.grad)\n", |
| "\n", |
| "try:\n", |
| " y.backward()\n", |
| "except:\n", |
| " print(\"However, you can't do backward twice unless you retain the graph.\")" |
| ] |
| }, |
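| { |
| "cell_type": "markdown", |
| "id": "5a6b7c8d", |
| "metadata": {}, |
| "source": [ |
| "Note that `attach_grad()` defaults to `grad_req='write'`, so each call to\n", |
| "`backward()` overwrites the stored gradient; that is why the two gradients\n", |
| "printed above are identical. As a minimal sketch (the names `w` and `v` are\n", |
| "illustrative), you can pass `grad_req='add'` instead to accumulate gradients\n", |
| "across backward passes." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "9e0f1a2b", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "w = np.array([[1, 2], [3, 4]])\n", |
| "w.attach_grad(grad_req='add')  # accumulate instead of overwrite\n", |
| "with autograd.record():\n", |
| "    v = np.sum(2 * w * w)\n", |
| "v.backward(retain_graph=True)\n", |
| "print(w.grad)  # 4 * w\n", |
| "v.backward()\n", |
| "print(w.grad)  # 8 * w: the gradients from both passes are summed" |
| ] |
| }, |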
| { |
| "cell_type": "markdown", |
| "id": "e289a1db", |
| "metadata": {}, |
| "source": [ |
| "## Custom MXNet ndarray operations\n", |
| "\n", |
| "In order to understand the `backward()` method it is beneficial to first\n", |
| "understand how you can create custom operations. MXNet operators are classes\n", |
| "with a forward and backward method. Where the number of args in `backward()`\n", |
| "must equal the number of items returned in the `forward()` method. Additionally,\n", |
| "the number of arguments in the `forward()` method must match the number of\n", |
| "output arguments from `backward()`. You can modify the gradients in backward to\n", |
| "return custom gradients. For instance, below you can return a different gradient then\n", |
| "the actual derivative." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "5583ca32", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class MyFirstCustomOperation(autograd.Function):\n", |
| " def __init__(self):\n", |
| " super().__init__()\n", |
| "\n", |
| " def forward(self,x,y):\n", |
| " return 2 * x, 2 * x * y, 2 * y\n", |
| "\n", |
| " def backward(self, dx, dxy, dy):\n", |
| " \"\"\"\n", |
| " The input number of arguments must match the number of outputs from forward.\n", |
| " Furthermore, the number of output arguments must match the number of inputs from forward.\n", |
| " \"\"\"\n", |
| " return x, y" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "309142cc", |
| "metadata": {}, |
| "source": [ |
| "Now you can use the first custom operation you have built." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "8f820dd8", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x = np.random.uniform(-1, 1, (2, 3)) \n", |
| "y = np.random.uniform(-1, 1, (2, 3))\n", |
| "x.attach_grad()\n", |
| "y.attach_grad()\n", |
| "with autograd.record():\n", |
| " z = MyFirstCustomOperation()\n", |
| " z1, z2, z3 = z(x, y)\n", |
| " out = z1 + z2 + z3 \n", |
| "out.backward()\n", |
| "print(np.array_equiv(x.asnumpy(), x.asnumpy()))\n", |
| "print(np.array_equiv(y.asnumpy(), y.asnumpy()))" |
| ] |
| }, |
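| { |
| "cell_type": "markdown", |
| "id": "1c2d3e4f", |
| "metadata": {}, |
| "source": [ |
| "For contrast, here is a minimal sketch of a custom operation whose\n", |
| "`backward()` returns the actual derivative. The class name `DoubleSquare` and\n", |
| "the variable `w` are illustrative, not part of MXNet: the operation computes\n", |
| "$2x^2$, so its gradient is $4x$ scaled by the incoming gradient." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "2b3c4d5e", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class DoubleSquare(autograd.Function):\n", |
| "    def forward(self, x):\n", |
| "        self.save_for_backward(x)\n", |
| "        return 2 * x * x\n", |
| "\n", |
| "    def backward(self, dz):\n", |
| "        # Chain rule: d(2x^2)/dx = 4x, scaled by the incoming gradient dz.\n", |
| "        x, = self.saved_tensors\n", |
| "        return 4 * x * dz\n", |
| "\n", |
| "w = np.array([[1, 2], [3, 4]])\n", |
| "w.attach_grad()\n", |
| "with autograd.record():\n", |
| "    out = DoubleSquare()(w)\n", |
| "out.backward()\n", |
| "w.grad  # equals 4 * w, matching the basic example above" |
| ] |
| }, |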
| { |
| "cell_type": "markdown", |
| "id": "6c08fbaf", |
| "metadata": {}, |
| "source": [ |
| "Alternatively, you may want to have a function which is different depending on\n", |
| "if you are training or not." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "57d1d283", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "def my_first_function(x):\n", |
| " if autograd.is_training(): # Return something else when training\n", |
| " return(4 * x)\n", |
| " else:\n", |
| " return(x)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "706fc5b0", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "y = my_first_function(x)\n", |
| "print(np.array_equiv(y.asnumpy(), x.asnumpy()))\n", |
| "with autograd.record(train_mode=False):\n", |
| " y = my_first_function(x)\n", |
| "y.backward()\n", |
| "print(x.grad)\n", |
| "with autograd.record(train_mode=True): # train_mode = True by default\n", |
| " y = my_first_function(x)\n", |
| "y.backward()\n", |
| "print(x.grad)" |
| ] |
| }, |
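| { |
| "cell_type": "markdown", |
| "id": "6d7e8f9a", |
| "metadata": {}, |
| "source": [ |
| "You can also inspect both flags directly: `autograd.is_recording()` reports\n", |
| "whether operations are currently being recorded, and `autograd.is_training()`\n", |
| "reports whether train mode is active. A quick check, added here for\n", |
| "illustration:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "0a1b2c3d", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "print(autograd.is_recording(), autograd.is_training())  # False False\n", |
| "with autograd.record(train_mode=False):\n", |
| "    print(autograd.is_recording(), autograd.is_training())  # True False\n", |
| "with autograd.record():\n", |
| "    print(autograd.is_recording(), autograd.is_training())  # True True" |
| ] |
| }, |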
| { |
| "cell_type": "markdown", |
| "id": "ae3a7bc5", |
| "metadata": {}, |
| "source": [ |
| "You could create functions with `autograd.record()`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "2703873f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "def my_second_function(x):\n", |
| " with autograd.record():\n", |
| " return(2 * x)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "1ba7abb4", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "y = my_second_function(x)\n", |
| "y.backward()\n", |
| "print(x.grad)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "343fa056", |
| "metadata": {}, |
| "source": [ |
| "You can also combine multiple functions." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "d38f4f7b", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "y = my_second_function(x)\n", |
| "with autograd.record():\n", |
| " z = my_second_function(y) + 2\n", |
| "z.backward()\n", |
| "print(x.grad)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "ac2db18a", |
| "metadata": {}, |
| "source": [ |
| "Additionally, MXNet records the execution trace and computes the gradient\n", |
| "accordingly. The following function `f` doubles the inputs until its `norm`\n", |
| "reaches 1000. Then it selects one element depending on the sum of its elements." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "880a5bb8", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "def f(a):\n", |
| " b = a * 2\n", |
| " while np.abs(b).sum() < 1000:\n", |
| " b = b * 2\n", |
| " if b.sum() >= 0:\n", |
| " c = b[0]\n", |
| " else:\n", |
| " c = b[1]\n", |
| " return c" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "8967c555", |
| "metadata": {}, |
| "source": [ |
| "In this example, you record the trace and feed in a random value." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "2d7b9e99", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "a = np.random.uniform(size=2)\n", |
| "a.attach_grad()\n", |
| "with autograd.record():\n", |
| " c = f(a)\n", |
| "c.backward()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "8759dc87", |
| "metadata": {}, |
| "source": [ |
| "You can see that `b` is a linear function of `a`, and `c` is chosen from `b`.\n", |
| "The gradient with respect to `a` be will be either `[c/a[0], 0]` or `[0,\n", |
| "c/a[1]]`, depending on which element from `b` is picked. You see the results of\n", |
| "this example with this code:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "526bd860", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "a.grad == c / a" |
| ] |
| }, |
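| { |
| "cell_type": "markdown", |
| "id": "4e5f6a7b", |
| "metadata": {}, |
| "source": [ |
| "For a fuller picture (added for illustration), print both quantities: the\n", |
| "zero entry in `a.grad` marks the element of `b` that was not selected." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "8c9d0e1f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "print(a.grad)\n", |
| "print(c / a)" |
| ] |
| }, |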
| { |
| "cell_type": "markdown", |
| "id": "85e05094", |
| "metadata": {}, |
| "source": [ |
| "As you can notice there are 3 values along the dimension 0, so taking a `mean`\n", |
| "along this axis is the same as summing that axis and multiplying by `1/3`.\n", |
| "\n", |
| "## Advanced MXNet ndarray operations with Autograd\n", |
| "\n", |
| "You can control gradients for different ndarray operations. For instance,\n", |
| "perhaps you want to check that the gradients are propagating properly?\n", |
| "the `attach_grad()` method automatically detaches itself from the gradient.\n", |
| "Therefore, the input up until y will no longer look like it has `x`. To\n", |
| "illustrate this notice that `x.grad` and `y.grad` is not the same in the second\n", |
| "example." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "47d5f509", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "with autograd.record():\n", |
| " y = 3 * x\n", |
| " y.attach_grad()\n", |
| " z = 4 * y + 2 * x\n", |
| "z.backward()\n", |
| "print(x.grad)\n", |
| "print(y.grad)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "d07133bf", |
| "metadata": {}, |
| "source": [ |
| "Is not the same as:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a3b454a0", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "with autograd.record():\n", |
| " y = 3 * x\n", |
| " z = 4 * y + 2 * x\n", |
| "z.backward()\n", |
| "print(x.grad)\n", |
| "print(y.grad)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "fc85bec1", |
| "metadata": {}, |
| "source": [ |
| "## Next steps\n", |
| "\n", |
| "Learn how to initialize weights, choose loss function, metrics and optimizers for training your neural network [Step 4: Necessary components\n", |
| "to train the neural network](./4-components.ipynb)." |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |