versions/master/api/python/docs/_sources/tutorials/getting-started/crash-course/2-create-nn.ipynb - mxnet-site - Git at Google

 {
  "cells": [
   {
    "cell_type": "markdown",
    "id": "f1502018",
    "metadata": {},
    "source": [
     "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
     "<!--- or more contributor license agreements.  See the NOTICE file -->\n",
     "<!--- distributed with this work for additional information -->\n",
     "<!--- regarding copyright ownership.  The ASF licenses this file -->\n",
     "<!--- to you under the Apache License, Version 2.0 (the -->\n",
     "<!--- \"License\"); you may not use this file except in compliance -->\n",
     "<!--- with the License.  You may obtain a copy of the License at -->\n",
     "\n",
     "<!---   http://www.apache.org/licenses/LICENSE-2.0 -->\n",
     "\n",
     "<!--- Unless required by applicable law or agreed to in writing, -->\n",
     "<!--- software distributed under the License is distributed on an -->\n",
     "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
     "<!--- KIND, either express or implied.  See the License for the -->\n",
     "<!--- specific language governing permissions and limitations -->\n",
     "<!--- under the License. -->\n",
     "\n",
     "# Step 2: Create a neural network\n",
     "\n",
     "In this step, you learn how to use NP on Apache MXNet to create neural networks\n",
     "in Gluon. In addition to the `np` package that you learned about in the previous\n",
     "step [Step 1: Manipulate data with NP on MXNet](./1-nparray.ipynb), you also need to\n",
     "import the neural network modules from `gluon`. Gluon includes built-in neural\n",
     "network layers in the following two modules:\n",
     "\n",
     "1. `mxnet.gluon.nn`: NN module that maintained by the mxnet team\n",
     "2. `mxnet.gluon.contrib.nn`: Experiemental module that is contributed by the\n",
     "community\n",
     "\n",
     "Use the following commands to import the packages required for this step."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 1,
    "id": "b5edb85d",
    "metadata": {},
    "outputs": [],
    "source": [
     "from mxnet import np, npx\n",
     "from mxnet.gluon import nn\n",
     "npx.set_np()  # Change MXNet to the numpy-like mode."
    ]
   },
   {
    "cell_type": "markdown",
    "id": "db45d00c",
    "metadata": {},
    "source": [
     "## Create your neural network's first layer\n",
     "\n",
     "In this section, you will create a simple neural network with Gluon. One of the\n",
     "simplest network you can create is a single **Dense** layer or **densely-\n",
     "connected** layer. A dense layer consists of nodes in the input that are\n",
     "connected to every node in the next layer. Use the following code example to\n",
     "start with a dense layer with five output units."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 2,
    "id": "65f9bd2d",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "Dense(-1 -> 5, linear)"
       ]
      },
      "execution_count": 2,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "layer = nn.Dense(5)\n",
     "layer\n",
     "# output: Dense(-1 -> 5, linear)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "1dd2c834",
    "metadata": {},
    "source": [
     "In the example above, the output is `Dense(-1 -> 5, linear)`. The **-1** in the\n",
     "output denotes that the size of the input layer is not specified during\n",
     "initialization.\n",
     "\n",
     "You can also call the **Dense** layer with an `in_units` parameter if you know\n",
     "the shape of your input unit."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 3,
    "id": "76eaacee",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "Dense(3 -> 5, linear)"
       ]
      },
      "execution_count": 3,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "layer = nn.Dense(5,in_units=3)\n",
     "layer"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "41a603cf",
    "metadata": {},
    "source": [
     "In addition to the `in_units` param, you can also add an activation function to\n",
     "the layer using the `activation` param. The Dense layer implements the operation\n",
     "\n",
     "$$output = \\sigma(W \\cdot X + b)$$\n",
     "\n",
     "Call the Dense layer with an `activation` parameter to use an activation\n",
     "function."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 4,
    "id": "69a4b1e9",
    "metadata": {},
    "outputs": [],
    "source": [
     "layer = nn.Dense(5, in_units=3,activation='relu')"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "1e319ccd",
    "metadata": {},
    "source": [
     "Voila! Congratulations on creating a simple neural network. But for most of your\n",
     "use cases, you will need to create a neural network with more than one dense\n",
     "layer or with multiple types of other layers. In addition to the `Dense` layer,\n",
     "you can find more layers at [mxnet nn layers](../../../api/gluon/nn/index.rst#module-mxnet.gluon.nn)\n",
     "\n",
     "So now that you have created a neural network, you are probably wondering how to\n",
     "pass data into your network?\n",
     "\n",
     "First, you need to initialize the network weights, if you use the default\n",
     "initialization method which draws random values uniformly in the range $[-0.7,\n",
     "0.7]$. You can see this in the following example.\n",
     "\n",
     "**Note**: Initialization is discussed at a little deeper detail in the next\n",
     "notebook"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 5,
    "id": "021e754d",
    "metadata": {},
    "outputs": [
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
       "[03:51:58] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU\n"
      ]
     }
    ],
    "source": [
     "layer.initialize()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "2e5925d3",
    "metadata": {},
    "source": [
     "Now that you have initialized your network, you can give it data. Passing data\n",
     "through a network is also called a forward pass. You can do a forward pass with\n",
     "random data, shown in the following example. First, you create a `(10,3)` shape\n",
     "random input `x` and feed the data into the layer to compute the output."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 6,
    "id": "8018b3dd",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "array([[0.00881556, 0.01138476, 0.        , 0.        , 0.01936117],\n",
        "       [0.        , 0.0035577 , 0.06854778, 0.03361227, 0.        ],\n",
        "       [0.05338536, 0.01661206, 0.        , 0.00864646, 0.        ],\n",
        "       [0.        , 0.        , 0.        , 0.03724934, 0.03653988],\n",
        "       [0.        , 0.00657675, 0.00472842, 0.        , 0.04593495],\n",
        "       [0.        , 0.00795121, 0.        , 0.        , 0.0595542 ],\n",
        "       [0.02296758, 0.01650022, 0.        , 0.        , 0.03874438],\n",
        "       [0.02207369, 0.02308735, 0.04558432, 0.01468477, 0.        ],\n",
        "       [0.        , 0.02254252, 0.        , 0.        , 0.03834942],\n",
        "       [0.08092358, 0.04224379, 0.        , 0.        , 0.        ]])"
       ]
      },
      "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "x = np.random.uniform(-1,1,(10,3))\n",
     "layer(x)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "102f8c57",
    "metadata": {},
    "source": [
     "The layer produces a `(10,5)` shape output from your `(10,3)` input.\n",
     "\n",
     "**When you don't specify the `in_unit` parameter, the system  automatically\n",
     "infers it during the first time you feed in data during the first forward step\n",
     "after you create and initialize the weights.**"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 7,
    "id": "b54c523e",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "{'weight': Parameter (shape=(5, 3), dtype=float32),\n",
        " 'bias': Parameter (shape=(5,), dtype=float32)}"
       ]
      },
      "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "layer.params"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "63e986db",
    "metadata": {},
    "source": [
     "The `weights` and `bias` can be accessed using the `.data()` method."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 8,
    "id": "656bda95",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "array([[ 0.01607367,  0.05928481, -0.0319057 ],\n",
        "       [-0.05814854,  0.01664302, -0.02215988],\n",
        "       [-0.04094896,  0.03231322,  0.05914024],\n",
        "       [ 0.05500493,  0.03504761,  0.05073748],\n",
        "       [ 0.00943237, -0.06525595, -0.04184696]])"
       ]
      },
      "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "layer.weight.data()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "011e290e",
    "metadata": {},
    "source": [
     "## Chain layers into a neural network using nn.Sequential\n",
     "\n",
     "Sequential provides a special way of rapidly building networks when when the\n",
     "network architecture follows a common design pattern: the layers look like a\n",
     "stack of pancakes. Many networks follow this pattern: a bunch of layers, one\n",
     "stacked on top of another, where the output of each layer is fed directly to the\n",
     "input to the next layer. To use sequential, simply provide a list of layers\n",
     "(pass in the layers by calling `net.add(<Layer goes here!>`). To do this you can\n",
     "use your previous example of Dense layers and create a 3-layer multi layer\n",
     "perceptron. You can create a sequential block using `nn.Sequential()` method and\n",
     "add layers using `add()` method."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 9,
    "id": "a2ab463c",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "Sequential(\n",
        "  (0): Dense(3 -> 5, Activation(relu))\n",
        "  (1): Dense(-1 -> 25, Activation(relu))\n",
        "  (2): Dense(-1 -> 2, linear)\n",
        ")"
       ]
      },
      "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net = nn.Sequential()\n",
     "\n",
     "net.add(nn.Dense(5,in_units=3,activation='relu'),\n",
     "        nn.Dense(25, activation='relu'), nn.Dense(2))\n",
     "net"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "4ee4727c",
    "metadata": {},
    "source": [
     "The layers are ordered exactly the way you defined your neural network with\n",
     "index starting from 0. You can access the layers by indexing the network using\n",
     "`[]`."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 10,
    "id": "1b66ee0d",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "Dense(-1 -> 25, Activation(relu))"
       ]
      },
      "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net[1]"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "c7f5f323",
    "metadata": {},
    "source": [
     "## Create a custom neural network architecture flexibly\n",
     "\n",
     "`nn.Sequential()` allows you to create your multi-layer neural network with\n",
     "existing layers from `gluon.nn`. It also includes a pre-defined `forward()`\n",
     "function that sequentially executes added layers. But what if the built-in\n",
     "layers are not sufficient for your needs. If you want to create networks like\n",
     "ResNet which has complex but repeatable components, how do you create such a\n",
     "network?\n",
     "\n",
     "In gluon, every neural network layer is defined by using a base class\n",
     "`nn.Block()`. A Block has one main job - define a forward method that takes some\n",
     "input x and generates an output. A Block can just do something simple like apply\n",
     "an activation function. It can combine multiple layers together in a single\n",
     "block or also combine a bunch of other Blocks together in creative ways to\n",
     "create complex networks like Resnet. In this case, you will construct three\n",
     "Dense layers. The `forward()` method can then invoke the layers in turn to\n",
     "generate its output.\n",
     "\n",
     "Create a subclass of `nn.Block` and implement two methods by using the following\n",
     "code.\n",
     "\n",
     "- `__init__` create the layers\n",
     "- `forward` define the forward function."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 11,
    "id": "997c5a6b",
    "metadata": {},
    "outputs": [],
    "source": [
     "class Net(nn.Block):\n",
     "    def __init__(self):\n",
     "        super().__init__()\n",
     "    def forward(self, x):\n",
     "        return x"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 12,
    "id": "b9cc930e",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "MLP(\n",
        "  (dense1): Dense(-1 -> 5, Activation(relu))\n",
        "  (dense2): Dense(-1 -> 25, Activation(relu))\n",
        "  (dense3): Dense(-1 -> 2, linear)\n",
        ")"
       ]
      },
      "execution_count": 12,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "class MLP(nn.Block):\n",
     "    def __init__(self):\n",
     "        super().__init__()\n",
     "        self.dense1 = nn.Dense(5,activation='relu')\n",
     "        self.dense2 = nn.Dense(25,activation='relu')\n",
     "        self.dense3 = nn.Dense(2)\n",
     "\n",
     "    def forward(self, x):\n",
     "        layer1 = self.dense1(x)\n",
     "        layer2 = self.dense2(layer1)\n",
     "        layer3 = self.dense3(layer2)\n",
     "        return layer3\n",
     "\n",
     "net = MLP()\n",
     "net"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 13,
    "id": "f275942e",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "{'weight': Parameter (shape=(5, -1), dtype=float32),\n",
        " 'bias': Parameter (shape=(5,), dtype=float32)}"
       ]
      },
      "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net.dense1.params"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "ab1ec782",
    "metadata": {},
    "source": [
     "Each layer includes parameters that are stored in a `Parameter` class. You can\n",
     "access them using the `params()` method.\n",
     "\n",
     "## Creating custom layers using Parameters (Blocks API)\n",
     "\n",
     "MXNet includes a `Parameter` method to hold your parameters in each layer. You\n",
     "can create custom layers using the `Parameter` class to include computation that\n",
     "may otherwise be not included in the built-in layers. For example, for a dense\n",
     "layer, the weights and biases will be created using the `Parameter` method. But\n",
     "if you want to add additional computation to the dense layer, you can create it\n",
     "using parameter method.\n",
     "\n",
     "Instantiate a parameter, e.g weights with a size `(5,0)` using the `shape`\n",
     "argument."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 14,
    "id": "2c3088e0",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "(Parameter (shape=(5, -1), dtype=<class 'numpy.float32'>),\n",
        " Parameter (shape=(5, -1), dtype=<class 'numpy.float32'>))"
       ]
      },
      "execution_count": 14,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "from mxnet.gluon import Parameter\n",
     "\n",
     "weight = Parameter(\"custom_parameter_weight\",shape=(5,-1))\n",
     "bias = Parameter(\"custom_parameter_bias\",shape=(5,-1))\n",
     "\n",
     "weight,bias"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "e430e51c",
    "metadata": {},
    "source": [
     "The `Parameter` method includes a `grad_req` argument that specifies how you\n",
     "want to capture gradients for this Parameter. Under the hood, that lets gluon\n",
     "know that it has to call `.attach_grad()` on the underlying array. By default,\n",
     "the gradient is updated everytime the gradient is written to the grad\n",
     "`grad_req='write'`.\n",
     "\n",
     "Now that you know how parameters work, you are ready to create your very own\n",
     "fully-connected custom layer.\n",
     "\n",
     "To create the custom layers using parameters, you can use the same skeleton with\n",
     "`nn.Block` base class. You will create a custom dense layer that takes parameter\n",
     "x and returns computed `w*x + b` without any activation function"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 15,
    "id": "9b0c2ca2",
    "metadata": {},
    "outputs": [],
    "source": [
     "class custom_layer(nn.Block):\n",
     "   def __init__(self, out_units, in_units=0):\n",
     "       super().__init__()\n",
     "       self.weight = Parameter(\"weight\", shape=(in_units,out_units), allow_deferred_init=True)\n",
     "       self.bias = Parameter(\"bias\", shape=(out_units,), allow_deferred_init=True)\n",
     "   def forward(self, x):\n",
     "       return np.dot(x, self.weight.data()) + self.bias.data()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "17d3916e",
    "metadata": {},
    "source": [
     "Parameter can be instantiated before the corresponding data is instantiated. For\n",
     "example, when you instantiate a Block but the shapes of each parameter still\n",
     "need to be inferred, the Parameter will wait for the shape to be inferred before\n",
     "allocating memory."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 16,
    "id": "76991cb3",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "array([[-0.05604633, -0.06238654,  0.02687173],\n",
        "       [-0.02687152, -0.04365591, -0.00518382],\n",
        "       [-0.02849396, -0.09980228, -0.00695815],\n",
        "       [-0.04527343, -0.00275569, -0.01376584]])"
       ]
      },
      "execution_count": 16,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "dense = custom_layer(3,in_units=5)\n",
     "dense.initialize()\n",
     "dense(np.random.uniform(size=(4, 5)))"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "fabc9407",
    "metadata": {},
    "source": [
     "Similarly, you can use the following code to implement a famous network called\n",
     "[LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Block` using the built-in\n",
     "`Dense` layer and using `custom_layer` as the last layer"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 17,
    "id": "58855487",
    "metadata": {},
    "outputs": [],
    "source": [
     "class LeNet(nn.Block):\n",
     "   def __init__(self):\n",
     "       super().__init__()\n",
     "       self.conv1 = nn.Conv2D(channels=6, kernel_size=3, activation='relu')\n",
     "       self.pool1 = nn.MaxPool2D(pool_size=2, strides=2)\n",
     "       self.conv2 = nn.Conv2D(channels=16, kernel_size=3, activation='relu')\n",
     "       self.pool2 = nn.MaxPool2D(pool_size=2, strides=2)\n",
     "       self.dense1 = nn.Dense(120, activation=\"relu\")\n",
     "       self.dense2 = nn.Dense(84, activation=\"relu\")\n",
     "       self.dense3 = nn.Dense(10)\n",
     "   def forward(self, x):\n",
     "       x = self.conv1(x)\n",
     "       x = self.pool1(x)\n",
     "       x = self.conv2(x)\n",
     "       x = self.pool2(x)\n",
     "       x = self.dense1(x)\n",
     "       x = self.dense2(x)\n",
     "       x = self.dense3(x)\n",
     "       return x\n",
     "\n",
     "lenet = LeNet()"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 18,
    "id": "d69ba80c",
    "metadata": {},
    "outputs": [],
    "source": [
     "class LeNet_custom(nn.Block):\n",
     "    def __init__(self):\n",
     "        super().__init__()\n",
     "        self.conv1 = nn.Conv2D(channels=6, kernel_size=3, activation='relu')\n",
     "        self.pool1 = nn.MaxPool2D(pool_size=2, strides=2)\n",
     "        self.conv2 = nn.Conv2D(channels=16, kernel_size=3, activation='relu')\n",
     "        self.pool2 = nn.MaxPool2D(pool_size=2, strides=2)\n",
     "        self.dense1 = nn.Dense(120, activation=\"relu\")\n",
     "        self.dense2 = nn.Dense(84, activation=\"relu\")\n",
     "        self.dense3 = custom_layer(10,84)\n",
     "    def forward(self, x):\n",
     "        x = self.conv1(x)\n",
     "        x = self.pool1(x)\n",
     "        x = self.conv2(x)\n",
     "        x = self.pool2(x)\n",
     "        x = self.dense1(x)\n",
     "        x = self.dense2(x)\n",
     "        x = self.dense3(x)\n",
     "        return x\n",
     "\n",
     "lenet_custom = LeNet_custom()"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 19,
    "id": "32c9aff7",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Lenet:\n",
       "[[-0.00081668 -0.00340701  0.00199039 -0.00121501 -0.00063157  0.00081476\n",
       "  -0.0011025  -0.00216652 -0.00015363 -0.00110007]]\n",
       "Custom Lenet:\n",
       "[[-0.02656163  0.04184292 -0.00254037 -0.06494093 -0.00952166 -0.01921579\n",
       "   0.05423243 -0.02774546  0.06823301  0.00313227]]\n"
      ]
     }
    ],
    "source": [
     "image_data = np.random.uniform(-1,1, (1,1,28,28))\n",
     "\n",
     "lenet.initialize()\n",
     "lenet_custom.initialize()\n",
     "\n",
     "print(\"Lenet:\")\n",
     "print(lenet(image_data))\n",
     "\n",
     "print(\"Custom Lenet:\")\n",
     "print(lenet_custom(image_data))"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "aa9cc4bd",
    "metadata": {},
    "source": [
     "You can use `.data` method to access the weights and bias of a particular layer.\n",
     "For example, the following  accesses the first layer's weight and sixth layer's bias."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 20,
    "id": "1c293d07",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "((6, 1, 3, 3), (120,))"
       ]
      },
      "execution_count": 20,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "lenet.conv1.weight.data().shape, lenet.dense1.bias.data().shape"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "0c94cea7",
    "metadata": {},
    "source": [
     "## Using predefined (pretrained) architectures\n",
     "\n",
     "Till now, you have seen how to create your own neural network architectures. But\n",
     "what if you want to replicate or baseline your dataset using some of the common\n",
     "models in computer visions or natural language processing (NLP). Gluon includes\n",
     "common architectures that you can directly use. The Gluon Model Zoo provides a\n",
     "collection of off-the-shelf models e.g. RESNET, BERT etc. These architectures\n",
     "are found at:\n",
     "\n",
     "- [Gluon CV model zoo](https://cv.gluon.ai/model_zoo/index.html)\n",
     "\n",
     "- [Gluon NLP model zoo](https://nlp.gluon.ai/model_zoo/index.html)"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 21,
    "id": "5b7afa70",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Downloading /home/jenkins_slave/.mxnet/models/resnet50_v2-ecdde353.zip00383814-e655-4621-a110-5ffefe3eb69c from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v2-ecdde353.zip...\n"
      ]
     },
     {
      "data": {
       "text/plain": [
        "(1, 1000)"
       ]
      },
      "execution_count": 21,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "from mxnet.gluon import model_zoo\n",
     "\n",
     "net = model_zoo.vision.resnet50_v2(pretrained=True)\n",
     "net.hybridize()\n",
     "\n",
     "dummy_input = np.ones(shape=(1,3,224,224))\n",
     "output = net(dummy_input)\n",
     "output.shape"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "af120df5",
    "metadata": {},
    "source": [
     "## Deciding the paradigm for your network\n",
     "\n",
     "In MXNet, Gluon API (Imperative programming paradigm) provides a user friendly\n",
     "way for quick prototyping, easy debugging and natural control flow for people\n",
     "familiar with python programming.\n",
     "\n",
     "However, at the backend, MXNET can also convert the network using Symbolic or\n",
     "Declarative programming into static graphs with low level optimizations on\n",
     "operators. However, static graphs are less flexible because any logic must be\n",
     "encoded into the graph as special operators like scan, while_loop and cond. It’s\n",
     "also hard to debug.\n",
     "\n",
     "So how can you make use of symbolic programming while getting the flexibility of\n",
     "imperative programming to quickly prototype and debug?\n",
     "\n",
     "Enter **HybridBlock**\n",
     "\n",
     "HybridBlocks can run in a fully imperatively way where you define their\n",
     "computation with real functions acting on real inputs. But they’re also capable\n",
     "of running symbolically, acting on placeholders. Gluon hides most of this under\n",
     "the hood so you will only need to know how it works when you want to write your\n",
     "own layers."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 22,
    "id": "abc3493e",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "HybridSequential(\n",
        "  (0): Dense(3 -> 5, Activation(relu))\n",
        "  (1): Dense(-1 -> 25, Activation(relu))\n",
        "  (2): Dense(-1 -> 2, linear)\n",
        ")"
       ]
      },
      "execution_count": 22,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net_hybrid_seq = nn.HybridSequential()\n",
     "\n",
     "net_hybrid_seq.add(nn.Dense(5,in_units=3,activation='relu'),\n",
     " nn.Dense(25, activation='relu'), nn.Dense(2) )\n",
     "net_hybrid_seq"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "581c517f",
    "metadata": {},
    "source": [
     "To compile and optimize `HybridSequential`, you can call its `hybridize` method."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 23,
    "id": "af36f9fc",
    "metadata": {},
    "outputs": [],
    "source": [
     "net_hybrid_seq.hybridize()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "64823d61",
    "metadata": {},
    "source": [
     "## Creating custom layers using Parameters (HybridBlocks API)\n",
     "\n",
     "When you instantiated your custom layer, you specified the input dimension\n",
     "`in_units` that initializes the weights with the shape specified by `in_units`\n",
     "and `out_units`. If you leave the shape of `in_unit` as unknown, you defer the\n",
     "shape to the first forward pass. For the custom layer, you define the\n",
     "`infer_shape()` method and let the shape be inferred at runtime."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 24,
    "id": "eb9756b1",
    "metadata": {},
    "outputs": [
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
       "/work/mxnet/python/mxnet/util.py:755: UserWarning: Parameter 'weight' is already initialized, ignoring. Set force_reinit=True to re-initialize.\n",
       "  return func(*args, **kwargs)\n",
       "/work/mxnet/python/mxnet/util.py:755: UserWarning: Parameter 'bias' is already initialized, ignoring. Set force_reinit=True to re-initialize.\n",
       "  return func(*args, **kwargs)\n"
      ]
     },
     {
      "data": {
       "text/plain": [
        "array([[-0.07053316, -0.07457963,  0.01166525],\n",
        "       [-0.04170407, -0.07482161,  0.00179428],\n",
        "       [-0.07503258,  0.00660181, -0.01401043],\n",
        "       [-0.02333996, -0.06775613,  0.01459978]])"
       ]
      },
      "execution_count": 24,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "class CustomLayer(nn.HybridBlock):\n",
     "    def __init__(self, out_units, in_units=-1):\n",
     "        super().__init__()\n",
     "        self.weight = Parameter(\"weight\", shape=(in_units, out_units), allow_deferred_init=True)\n",
     "        self.bias = Parameter(\"bias\", shape=(out_units,), allow_deferred_init=True)\n",
     "\n",
     "    def forward(self, x):\n",
     "        print(self.weight.shape, self.bias.shape)\n",
     "        return np.dot(x, self.weight.data()) + self.bias.data()\n",
     "\n",
     "    def infer_shape(self, x):\n",
     "        print(self.weight.shape,x.shape)\n",
     "        self.weight.shape = (x.shape[-1],self.weight.shape[1])\n",
     "        dense = CustomLayer(3)\n",
     "\n",
     "dense.initialize()\n",
     "dense(np.random.uniform(size=(4, 5)))"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "844e4d15",
    "metadata": {},
    "source": [
     "### Performance\n",
     "\n",
     "To get a sense of the speedup from hybridizing, you can compare the performance\n",
     "before and after hybridizing by measuring the time it takes to make 1000 forward\n",
     "passes through the network."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 25,
    "id": "8e0d8df9",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Before hybridizing: 0.6034 sec\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "After hybridizing: 0.2799 sec\n"
      ]
     }
    ],
    "source": [
     "from time import time\n",
     "\n",
     "def benchmark(net, x):\n",
     "    y = net(x)\n",
     "    start = time()\n",
     "    for i in range(1,1000):\n",
     "        y = net(x)\n",
     "    return time() - start\n",
     "\n",
     "x_bench = np.random.normal(size=(1,512))\n",
     "\n",
     "net_hybrid_seq = nn.HybridSequential()\n",
     "\n",
     "net_hybrid_seq.add(nn.Dense(256,activation='relu'),\n",
     "                   nn.Dense(128, activation='relu'),\n",
     "                   nn.Dense(2))\n",
     "net_hybrid_seq.initialize()\n",
     "\n",
     "print('Before hybridizing: %.4f sec'%(benchmark(net_hybrid_seq, x_bench)))\n",
     "net_hybrid_seq.hybridize()\n",
     "print('After hybridizing: %.4f sec'%(benchmark(net_hybrid_seq, x_bench)))"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "6f1f1349",
    "metadata": {},
    "source": [
     "Peeling back another layer, you also have a `HybridBlock` which is the hybrid\n",
     "version of the `Block` API.\n",
     "\n",
     "Similar to the `Blocks` API, you define a `forward` function for `HybridBlock`\n",
     "that takes an input `x`. MXNet takes care of hybridizing the model at the\n",
     "backend so you don't have to make changes to your code to convert it to a\n",
     "symbolic paradigm."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 26,
    "id": "07c6cce6",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Before hybridizing: 0.5799 sec\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "After hybridizing: 0.2603 sec\n"
      ]
     }
    ],
    "source": [
     "from mxnet.gluon import HybridBlock\n",
     "\n",
     "class MLP_Hybrid(HybridBlock):\n",
     "    def __init__(self):\n",
     "        super().__init__()\n",
     "        self.dense1 = nn.Dense(256,activation='relu')\n",
     "        self.dense2 = nn.Dense(128,activation='relu')\n",
     "        self.dense3 = nn.Dense(2)\n",
     "    def forward(self, x):\n",
     "        layer1 = self.dense1(x)\n",
     "        layer2 = self.dense2(layer1)\n",
     "        layer3 = self.dense3(layer2)\n",
     "        return layer3\n",
     "\n",
     "net_hybrid = MLP_Hybrid()\n",
     "net_hybrid.initialize()\n",
     "\n",
     "print('Before hybridizing: %.4f sec'%(benchmark(net_hybrid, x_bench)))\n",
     "net_hybrid.hybridize()\n",
     "print('After hybridizing: %.4f sec'%(benchmark(net_hybrid, x_bench)))"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "12f5f4a9",
    "metadata": {},
    "source": [
     "Given a HybridBlock whose forward computation consists of going through other\n",
     "HybridBlocks, you can compile that section of the network by calling the\n",
     "HybridBlocks `.hybridize()` method.\n",
     "\n",
     "All of MXNet’s predefined layers are HybridBlocks. This means that any network\n",
     "consisting entirely of predefined MXNet layers can be compiled and run at much\n",
     "faster speeds by calling `.hybridize()`.\n",
     "\n",
     "## Saving and Loading your models\n",
     "\n",
     "The Blocks API also includes saving your models during and after training so\n",
     "that you can host the model for inference or avoid training the model again from\n",
     "scratch. Another reason would be to train your model using one language (like\n",
     "Python that has a lot of tools for training) and run inference using a different\n",
     "language.\n",
     "\n",
     "There are two ways to save your model in MXNet.\n",
     "1. Save/load the model weights/parameters only\n",
     "2. Save/load the model weights/parameters and the architectures\n",
     "\n",
     "\n",
     "### 1. Save/load the model weights/parameters only\n",
     "\n",
     "You can use `save_parameters` and `load_parameters` method to save and load the\n",
     "model weights. Take your simplest model `layer` and save your parameters first.\n",
     "The model parameters are the params that you save **after** you train your\n",
     "model."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 27,
    "id": "abcaf4e1",
    "metadata": {},
    "outputs": [],
    "source": [
     "file_name = 'layer.params'\n",
     "layer.save_parameters(file_name)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "70a86a96",
    "metadata": {},
    "source": [
     "And now load this model again. To load the parameters into a model, you will\n",
     "first have to build the model. To do this, you will need to create a simple\n",
     "function to build it."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 28,
    "id": "2bd2f6b1",
    "metadata": {},
    "outputs": [],
    "source": [
     "def build_model():\n",
     "    layer = nn.Dense(5, in_units=3,activation='relu')\n",
     "    return layer\n",
     "\n",
     "layer_new = build_model()"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 29,
    "id": "47024147",
    "metadata": {},
    "outputs": [],
    "source": [
     "layer_new.load_parameters('layer.params')"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "a1647f15",
    "metadata": {},
    "source": [
     "**Note**: The `save_parameters` and `load_parameters` method is used for models\n",
     "that use a `Block` method instead of  `HybridBlock` method to build the model.\n",
     "These models may have complex architectures where the model architectures may\n",
     "change during execution. E.g. if you have a model that uses an if-else\n",
     "conditional statement to choose between two different architectures.\n",
     "\n",
     "### 2. Save/load the model weights/parameters and the architectures\n",
     "\n",
     "For models that use the **HybridBlock**, the model architecture stays static and\n",
     "do no change during execution. Therefore both model parameters **AND**\n",
     "architecture can be saved and loaded using `export`, `imports` methods.\n",
     "\n",
     "Now look at your `MLP_Hybrid` model and export the model using the `export`\n",
     "function. The export function will export the model architecture into a `.json`\n",
     "file and model parameters into a `.params` file."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 30,
    "id": "0311a523",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "('MLP_hybrid-symbol.json', 'MLP_hybrid-0000.params')"
       ]
      },
      "execution_count": 30,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net_hybrid.export('MLP_hybrid')"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 31,
    "id": "12bea807",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "('MLP_hybrid-symbol.json', 'MLP_hybrid-0000.params')"
       ]
      },
      "execution_count": 31,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net_hybrid.export('MLP_hybrid')"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "380e7cdc",
    "metadata": {},
    "source": [
     "Similarly, to load this model back, you can use `gluon.nn.SymbolBlock`. To\n",
     "demonstrate that, load the network serialized above."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 32,
    "id": "f2322422",
    "metadata": {},
    "outputs": [],
    "source": [
     "import warnings\n",
     "with warnings.catch_warnings():\n",
     "    warnings.simplefilter(\"ignore\")\n",
     "    net_loaded = nn.SymbolBlock.imports(\"MLP_hybrid-symbol.json\",\n",
     "                                        ['data'], \"MLP_hybrid-0000.params\",\n",
     "                                        device=None)"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 33,
    "id": "c7bc1d0f",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "array([[ 0.13653663, -0.07247495]])"
       ]
      },
      "execution_count": 33,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "net_loaded(x_bench)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "e829ef5a",
    "metadata": {},
    "source": [
     "## Visualizing your models\n",
     "\n",
     "In MXNet, the `Block.Summary()` method allows you to view the block’s shape\n",
     "arguments and view the block’s parameters. When you combine multiple blocks into\n",
     "a model, the `summary()` applied on the model allows you to view each block’s\n",
     "summary, the total parameters, and the order of the blocks within the model. To\n",
     "do this the `Block.summary()` method requires one forward pass of the data,\n",
     "through your network, in order to create the graph necessary for capturing the\n",
     "corresponding shapes and parameters. Additionally, this method should be called\n",
     "before the hybridize method, since the hybridize method converts the graph into\n",
     "a symbolic one, potentially changing the operations for optimal computation.\n",
     "\n",
     "Look at the following examples\n",
     "\n",
     "- layer: our single layer network\n",
     "- Lenet: a non-hybridized LeNet network\n",
     "- net_Hybrid: our MLP Hybrid network"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 34,
    "id": "16668c82",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "--------------------------------------------------------------------------------\n",
       "        Layer (type)                                Output Shape         Param #\n",
       "================================================================================\n",
       "               Input                                     (10, 3)               0\n",
       "        Activation-1                                     (10, 5)               0\n",
       "             Dense-2                                     (10, 5)              20\n",
       "================================================================================\n",
       "Parameters in forward computation graph, duplicate included\n",
       "   Total params: 20\n",
       "   Trainable params: 20\n",
       "   Non-trainable params: 0\n",
       "Shared params in forward computation graph: 0\n",
       "Unique parameters in model: 20\n",
       "--------------------------------------------------------------------------------\n"
      ]
     }
    ],
    "source": [
     "layer.summary(x)"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 35,
    "id": "7ece38c2",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "--------------------------------------------------------------------------------\n",
       "        Layer (type)                                Output Shape         Param #\n",
       "================================================================================\n",
       "               Input                              (1, 1, 28, 28)               0\n",
       "        Activation-1                              (1, 6, 26, 26)               0\n",
       "            Conv2D-2                              (1, 6, 26, 26)              60\n",
       "         MaxPool2D-3                              (1, 6, 13, 13)               0\n",
       "        Activation-4                             (1, 16, 11, 11)               0\n",
       "            Conv2D-5                             (1, 16, 11, 11)             880\n",
       "         MaxPool2D-6                               (1, 16, 5, 5)               0\n",
       "        Activation-7                                    (1, 120)               0\n",
       "             Dense-8                                    (1, 120)           48120\n",
       "        Activation-9                                     (1, 84)               0\n",
       "            Dense-10                                     (1, 84)           10164\n",
       "            Dense-11                                     (1, 10)             850\n",
       "            LeNet-12                                     (1, 10)               0\n",
       "================================================================================\n",
       "Parameters in forward computation graph, duplicate included\n",
       "   Total params: 60074\n",
       "   Trainable params: 60074\n",
       "   Non-trainable params: 0\n",
       "Shared params in forward computation graph: 0\n",
       "Unique parameters in model: 60074\n",
       "--------------------------------------------------------------------------------\n"
      ]
     }
    ],
    "source": [
     "lenet.summary(image_data)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "da1e1a8f",
    "metadata": {},
    "source": [
     "You are able to print the summaries of the two networks `layer` and `lenet`\n",
     "easily since you didn't hybridize the two networks. However, the last network\n",
     "`net_Hybrid` was hybridized above and throws an `AssertionError` if you try\n",
     "`net_Hybrid.summary(x_bench)`. To print the summary for `net_Hybrid`, call\n",
     "another instance of the same network and instantiate it for our summary and then\n",
     "hybridize it"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 36,
    "id": "6f429bc7",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "--------------------------------------------------------------------------------\n",
       "        Layer (type)                                Output Shape         Param #\n",
       "================================================================================\n",
       "               Input                                    (1, 512)               0\n",
       "        Activation-1                                    (1, 256)               0\n",
       "             Dense-2                                    (1, 256)          131328\n",
       "        Activation-3                                    (1, 128)               0\n",
       "             Dense-4                                    (1, 128)           32896\n",
       "             Dense-5                                      (1, 2)             258\n",
       "        MLP_Hybrid-6                                      (1, 2)               0\n",
       "================================================================================\n",
       "Parameters in forward computation graph, duplicate included\n",
       "   Total params: 164482\n",
       "   Trainable params: 164482\n",
       "   Non-trainable params: 0\n",
       "Shared params in forward computation graph: 0\n",
       "Unique parameters in model: 164482\n",
       "--------------------------------------------------------------------------------\n"
      ]
     }
    ],
    "source": [
     "net_hybrid_summary = MLP_Hybrid()\n",
     "\n",
     "net_hybrid_summary.initialize()\n",
     "\n",
     "net_hybrid_summary.summary(x_bench)\n",
     "\n",
     "net_hybrid_summary.hybridize()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "f76b4ea5",
    "metadata": {},
    "source": [
     "## Next steps:\n",
     "\n",
     "Now that you have created a neural network, learn how to automatically compute\n",
     "the gradients in [Step 3: Automatic differentiation with autograd](./3-autograd.ipynb)."
    ]
   }
  ],
  "metadata": {
   "language_info": {
    "name": "python"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 5
 }