| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "c0cb2eee", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Step 2: Create a neural network\n", |
| "\n", |
| "In this step, you learn how to use NP on Apache MXNet to create neural networks\n", |
| "in Gluon. In addition to the `np` package that you learned about in the previous\n", |
| "step [Step 1: Manipulate data with NP on MXNet](./1-nparray.ipynb), you also need to\n", |
| "import the neural network modules from `gluon`. Gluon includes built-in neural\n", |
| "network layers in the following two modules:\n", |
| "\n", |
| "1. `mxnet.gluon.nn`: NN module that maintained by the mxnet team\n", |
| "2. `mxnet.gluon.contrib.nn`: Experiemental module that is contributed by the\n", |
| "community\n", |
| "\n", |
| "Use the following commands to import the packages required for this step." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "9547d35b", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet import np, npx\n", |
| "from mxnet.gluon import nn\n", |
| "npx.set_np() # Change MXNet to the numpy-like mode." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "1375e3da", |
| "metadata": {}, |
| "source": [ |
| "## Create your neural network's first layer\n", |
| "\n", |
| "In this section, you will create a simple neural network with Gluon. One of the\n", |
| "simplest network you can create is a single **Dense** layer or **densely-\n", |
| "connected** layer. A dense layer consists of nodes in the input that are\n", |
| "connected to every node in the next layer. Use the following code example to\n", |
| "start with a dense layer with five output units." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "146eba7b", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer = nn.Dense(5)\n", |
| "layer\n", |
| "# output: Dense(-1 -> 5, linear)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "b6f12663", |
| "metadata": {}, |
| "source": [ |
| "In the example above, the output is `Dense(-1 -> 5, linear)`. The **-1** in the\n", |
| "output denotes that the size of the input layer is not specified during\n", |
| "initialization.\n", |
| "\n", |
| "You can also call the **Dense** layer with an `in_units` parameter if you know\n", |
| "the shape of your input unit." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "fafe5f0c", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer = nn.Dense(5,in_units=3)\n", |
| "layer" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "5a594649", |
| "metadata": {}, |
| "source": [ |
| "In addition to the `in_units` param, you can also add an activation function to\n", |
| "the layer using the `activation` param. The Dense layer implements the operation\n", |
| "\n", |
| "$$output = \\sigma(W \\cdot X + b)$$\n", |
| "\n", |
| "Call the Dense layer with an `activation` parameter to use an activation\n", |
| "function." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "40d35eba", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer = nn.Dense(5, in_units=3,activation='relu')" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "0c314d58", |
| "metadata": {}, |
| "source": [ |
| "Voila! Congratulations on creating a simple neural network. But for most of your\n", |
| "use cases, you will need to create a neural network with more than one dense\n", |
| "layer or with multiple types of other layers. In addition to the `Dense` layer,\n", |
| "you can find more layers at [mxnet nn layers](../../../api/gluon/nn/index.rst#module-mxnet.gluon.nn)\n", |
| "\n", |
| "So now that you have created a neural network, you are probably wondering how to\n", |
| "pass data into your network?\n", |
| "\n", |
| "First, you need to initialize the network weights, if you use the default\n", |
| "initialization method which draws random values uniformly in the range $[-0.7,\n", |
| "0.7]$. You can see this in the following example.\n", |
| "\n", |
| "**Note**: Initialization is discussed at a little deeper detail in the next\n", |
| "notebook" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "71c4fcf3", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer.initialize()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "02d13745", |
| "metadata": {}, |
| "source": [ |
| "Now that you have initialized your network, you can give it data. Passing data\n", |
| "through a network is also called a forward pass. You can do a forward pass with\n", |
| "random data, shown in the following example. First, you create a `(10,3)` shape\n", |
| "random input `x` and feed the data into the layer to compute the output." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "fcdc2359", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "x = np.random.uniform(-1,1,(10,3))\n", |
| "layer(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "60dab21f", |
| "metadata": {}, |
| "source": [ |
| "The layer produces a `(10,5)` shape output from your `(10,3)` input.\n", |
| "\n", |
| "**When you don't specify the `in_unit` parameter, the system automatically\n", |
| "infers it during the first time you feed in data during the first forward step\n", |
| "after you create and initialize the weights.**" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "57758745", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer.params" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "557b88bb", |
| "metadata": {}, |
| "source": [ |
| "The `weights` and `bias` can be accessed using the `.data()` method." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "1e0f04ee", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer.weight.data()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "895217d0", |
| "metadata": {}, |
| "source": [ |
| "## Chain layers into a neural network using nn.Sequential\n", |
| "\n", |
| "Sequential provides a special way of rapidly building networks when when the\n", |
| "network architecture follows a common design pattern: the layers look like a\n", |
| "stack of pancakes. Many networks follow this pattern: a bunch of layers, one\n", |
| "stacked on top of another, where the output of each layer is fed directly to the\n", |
| "input to the next layer. To use sequential, simply provide a list of layers\n", |
| "(pass in the layers by calling `net.add(<Layer goes here!>`). To do this you can\n", |
| "use your previous example of Dense layers and create a 3-layer multi layer\n", |
| "perceptron. You can create a sequential block using `nn.Sequential()` method and\n", |
| "add layers using `add()` method." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "e83bd19f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net = nn.Sequential()\n", |
| "\n", |
| "net.add(nn.Dense(5,in_units=3,activation='relu'),\n", |
| " nn.Dense(25, activation='relu'), nn.Dense(2))\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "7c3c9b14", |
| "metadata": {}, |
| "source": [ |
| "The layers are ordered exactly the way you defined your neural network with\n", |
| "index starting from 0. You can access the layers by indexing the network using\n", |
| "`[]`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a3267d8d", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net[1]" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "c47a18d0", |
| "metadata": {}, |
| "source": [ |
| "## Create a custom neural network architecture flexibly\n", |
| "\n", |
| "`nn.Sequential()` allows you to create your multi-layer neural network with\n", |
| "existing layers from `gluon.nn`. It also includes a pre-defined `forward()`\n", |
| "function that sequentially executes added layers. But what if the built-in\n", |
| "layers are not sufficient for your needs. If you want to create networks like\n", |
| "ResNet which has complex but repeatable components, how do you create such a\n", |
| "network?\n", |
| "\n", |
| "In gluon, every neural network layer is defined by using a base class\n", |
| "`nn.Block()`. A Block has one main job - define a forward method that takes some\n", |
| "input x and generates an output. A Block can just do something simple like apply\n", |
| "an activation function. It can combine multiple layers together in a single\n", |
| "block or also combine a bunch of other Blocks together in creative ways to\n", |
| "create complex networks like Resnet. In this case, you will construct three\n", |
| "Dense layers. The `forward()` method can then invoke the layers in turn to\n", |
| "generate its output.\n", |
| "\n", |
| "Create a subclass of `nn.Block` and implement two methods by using the following\n", |
| "code.\n", |
| "\n", |
| "- `__init__` create the layers\n", |
| "- `forward` define the forward function." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "884ed058", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class Net(nn.Block):\n", |
| " def __init__(self):\n", |
| " super().__init__()\n", |
| " def forward(self, x):\n", |
| " return x" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c456e4e5", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class MLP(nn.Block):\n", |
| " def __init__(self): super().__init__() self.dense1 = nn.Dense(5,activation='relu') self.dense2 = nn.Dense(25,activation='relu') self.dense3 = nn.Dense(2)\n", |
| " def forward(self, x): layer1 = self.dense1(x) layer2 = self.dense2(layer1) layer3 = self.dense3(layer2) return layer3 net = MLP()\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "7f5de477", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.dense1.params" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "6440c6cf", |
| "metadata": {}, |
| "source": [ |
| "Each layer includes parameters that are stored in a `Parameter` class. You can\n", |
| "access them using the `params()` method.\n", |
| "\n", |
| "## Creating custom layers using Parameters (Blocks API)\n", |
| "\n", |
| "MXNet includes a `Parameter` method to hold your parameters in each layer. You\n", |
| "can create custom layers using the `Parameter` class to include computation that\n", |
| "may otherwise be not included in the built-in layers. For example, for a dense\n", |
| "layer, the weights and biases will be created using the `Parameter` method. But\n", |
| "if you want to add additional computation to the dense layer, you can create it\n", |
| "using parameter method.\n", |
| "\n", |
| "Instantiate a parameter, e.g weights with a size `(5,0)` using the `shape`\n", |
| "argument." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c34c6d9b", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet.gluon import Parameter\n", |
| "\n", |
| "weight = Parameter(\"custom_parameter_weight\",shape=(5,-1))\n", |
| "bias = Parameter(\"custom_parameter_bias\",shape=(5,-1))\n", |
| "\n", |
| "weight,bias" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "16e94c0f", |
| "metadata": {}, |
| "source": [ |
| "The `Parameter` method includes a `grad_req` argument that specifies how you\n", |
| "want to capture gradients for this Parameter. Under the hood, that lets gluon\n", |
| "know that it has to call `.attach_grad()` on the underlying array. By default,\n", |
| "the gradient is updated everytime the gradient is written to the grad\n", |
| "`grad_req='write'`.\n", |
| "\n", |
| "Now that you know how parameters work, you are ready to create your very own\n", |
| "fully-connected custom layer.\n", |
| "\n", |
| "To create the custom layers using parameters, you can use the same skeleton with\n", |
| "`nn.Block` base class. You will create a custom dense layer that takes parameter\n", |
| "x and returns computed `w*x + b` without any activation function" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "3522ae45", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class custom_layer(nn.Block):\n", |
| " def __init__(self, out_units, in_units=0):\n", |
| " super().__init__()\n", |
| " self.weight = Parameter(\"weight\", shape=(in_units,out_units), allow_deferred_init=True)\n", |
| " self.bias = Parameter(\"bias\", shape=(out_units,), allow_deferred_init=True)\n", |
| " def forward(self, x):\n", |
| " return np.dot(x, self.weight.data()) + self.bias.data()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "aefd1ca0", |
| "metadata": {}, |
| "source": [ |
| "Parameter can be instantiated before the corresponding data is instantiated. For\n", |
| "example, when you instantiate a Block but the shapes of each parameter still\n", |
| "need to be inferred, the Parameter will wait for the shape to be inferred before\n", |
| "allocating memory." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "be6ca85d", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "dense = custom_layer(3,in_units=5)\n", |
| "dense.initialize()\n", |
| "dense(np.random.uniform(size=(4, 5)))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "b700626d", |
| "metadata": {}, |
| "source": [ |
| "Similarly, you can use the following code to implement a famous network called\n", |
| "[LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Block` using the built-in\n", |
| "`Dense` layer and using `custom_layer` as the last layer" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c49d6c55", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class LeNet(nn.Block):\n", |
| " def __init__(self):\n", |
| " super().__init__()\n", |
| " self.conv1 = nn.Conv2D(channels=6, kernel_size=3, activation='relu')\n", |
| " self.pool1 = nn.MaxPool2D(pool_size=2, strides=2)\n", |
| " self.conv2 = nn.Conv2D(channels=16, kernel_size=3, activation='relu')\n", |
| " self.pool2 = nn.MaxPool2D(pool_size=2, strides=2)\n", |
| " self.dense1 = nn.Dense(120, activation=\"relu\")\n", |
| " self.dense2 = nn.Dense(84, activation=\"relu\")\n", |
| " self.dense3 = nn.Dense(10)\n", |
| " def forward(self, x):\n", |
| " x = self.conv1(x)\n", |
| " x = self.pool1(x)\n", |
| " x = self.conv2(x)\n", |
| " x = self.pool2(x)\n", |
| " x = self.dense1(x)\n", |
| " x = self.dense2(x)\n", |
| " x = self.dense3(x)\n", |
| " return x\n", |
| "\n", |
| "lenet = LeNet()" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "25c50406", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class LeNet_custom(nn.Block):\n", |
| " def __init__(self):\n", |
| " super().__init__()\n", |
| " self.conv1 = nn.Conv2D(channels=6, kernel_size=3, activation='relu')\n", |
| " self.pool1 = nn.MaxPool2D(pool_size=2, strides=2)\n", |
| " self.conv2 = nn.Conv2D(channels=16, kernel_size=3, activation='relu')\n", |
| " self.pool2 = nn.MaxPool2D(pool_size=2, strides=2)\n", |
| " self.dense1 = nn.Dense(120, activation=\"relu\")\n", |
| " self.dense2 = nn.Dense(84, activation=\"relu\")\n", |
| " self.dense3 = custom_layer(10,84)\n", |
| " def forward(self, x):\n", |
| " x = self.conv1(x)\n", |
| " x = self.pool1(x)\n", |
| " x = self.conv2(x)\n", |
| " x = self.pool2(x)\n", |
| " x = self.dense1(x)\n", |
| " x = self.dense2(x)\n", |
| " x = self.dense3(x)\n", |
| " return x\n", |
| "\n", |
| "lenet_custom = LeNet_custom()" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "87c65f5f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "image_data = np.random.uniform(-1,1, (1,1,28,28))\n", |
| "\n", |
| "lenet.initialize()\n", |
| "lenet_custom.initialize()\n", |
| "\n", |
| "print(\"Lenet:\")\n", |
| "print(lenet(image_data))\n", |
| "\n", |
| "print(\"Custom Lenet:\")\n", |
| "print(lenet_custom(image_data))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "e4e64da1", |
| "metadata": {}, |
| "source": [ |
| "You can use `.data` method to access the weights and bias of a particular layer.\n", |
| "For example, the following accesses the first layer's weight and sixth layer's bias." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "e1853d93", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "lenet.conv1.weight.data().shape, lenet.dense1.bias.data().shape" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "b8738336", |
| "metadata": {}, |
| "source": [ |
| "## Using predefined (pretrained) architectures\n", |
| "\n", |
| "Till now, you have seen how to create your own neural network architectures. But\n", |
| "what if you want to replicate or baseline your dataset using some of the common\n", |
| "models in computer visions or natural language processing (NLP). Gluon includes\n", |
| "common architectures that you can directly use. The Gluon Model Zoo provides a\n", |
| "collection of off-the-shelf models e.g. RESNET, BERT etc. These architectures\n", |
| "are found at:\n", |
| "\n", |
| "- [Gluon CV model zoo](https://cv.gluon.ai/model_zoo/index.html)\n", |
| "\n", |
| "- [Gluon NLP model zoo](https://nlp.gluon.ai/model_zoo/index.html)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "997784f1", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet.gluon import model_zoo\n", |
| "\n", |
| "net = model_zoo.vision.resnet50_v2(pretrained=True)\n", |
| "net.hybridize()\n", |
| "\n", |
| "dummy_input = np.ones(shape=(1,3,224,224))\n", |
| "output = net(dummy_input)\n", |
| "output.shape" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "1831d3d4", |
| "metadata": {}, |
| "source": [ |
| "## Deciding the paradigm for your network\n", |
| "\n", |
| "In MXNet, Gluon API (Imperative programming paradigm) provides a user friendly\n", |
| "way for quick prototyping, easy debugging and natural control flow for people\n", |
| "familiar with python programming.\n", |
| "\n", |
| "However, at the backend, MXNET can also convert the network using Symbolic or\n", |
| "Declarative programming into static graphs with low level optimizations on\n", |
| "operators. However, static graphs are less flexible because any logic must be\n", |
| "encoded into the graph as special operators like scan, while_loop and cond. It’s\n", |
| "also hard to debug.\n", |
| "\n", |
| "So how can you make use of symbolic programming while getting the flexibility of\n", |
| "imperative programming to quickly prototype and debug?\n", |
| "\n", |
| "Enter **HybridBlock**\n", |
| "\n", |
| "HybridBlocks can run in a fully imperatively way where you define their\n", |
| "computation with real functions acting on real inputs. But they’re also capable\n", |
| "of running symbolically, acting on placeholders. Gluon hides most of this under\n", |
| "the hood so you will only need to know how it works when you want to write your\n", |
| "own layers." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "e217cad5", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_hybrid_seq = nn.HybridSequential()\n", |
| "\n", |
| "net_hybrid_seq.add(nn.Dense(5,in_units=3,activation='relu'),\n", |
| " nn.Dense(25, activation='relu'), nn.Dense(2) )\n", |
| "net_hybrid_seq" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "08cd6852", |
| "metadata": {}, |
| "source": [ |
| "To compile and optimize `HybridSequential`, you can call its `hybridize` method." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "3dcaa94d", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_hybrid_seq.hybridize()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "9fc3b313", |
| "metadata": {}, |
| "source": [ |
| "## Creating custom layers using Parameters (HybridBlocks API)\n", |
| "\n", |
| "When you instantiated your custom layer, you specified the input dimension\n", |
| "`in_units` that initializes the weights with the shape specified by `in_units`\n", |
| "and `out_units`. If you leave the shape of `in_unit` as unknown, you defer the\n", |
| "shape to the first forward pass. For the custom layer, you define the\n", |
| "`infer_shape()` method and let the shape be inferred at runtime." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "73cbfe4f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "class CustomLayer(nn.HybridBlock):\n", |
| " def __init__(self, out_units, in_units=-1):\n", |
| " super().__init__()\n", |
| " self.weight = Parameter(\"weight\", shape=(in_units, out_units), allow_deferred_init=True)\n", |
| " self.bias = Parameter(\"bias\", shape=(out_units,), allow_deferred_init=True)\n", |
| "\n", |
| " def forward(self, x):\n", |
| " print(self.weight.shape, self.bias.shape)\n", |
| " return np.dot(x, self.weight.data()) + self.bias.data()\n", |
| "\n", |
| " def infer_shape(self, x):\n", |
| " print(self.weight.shape,x.shape)\n", |
| " self.weight.shape = (x.shape[-1],self.weight.shape[1])\n", |
| " dense = CustomLayer(3)\n", |
| "\n", |
| "dense.initialize()\n", |
| "dense(np.random.uniform(size=(4, 5)))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "e29cb405", |
| "metadata": {}, |
| "source": [ |
| "### Performance\n", |
| "\n", |
| "To get a sense of the speedup from hybridizing, you can compare the performance\n", |
| "before and after hybridizing by measuring the time it takes to make 1000 forward\n", |
| "passes through the network." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "d4ea5ef4", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from time import time\n", |
| "\n", |
| "def benchmark(net, x):\n", |
| " y = net(x)\n", |
| " start = time()\n", |
| " for i in range(1,1000):\n", |
| " y = net(x)\n", |
| " return time() - start\n", |
| "\n", |
| "x_bench = np.random.normal(size=(1,512))\n", |
| "\n", |
| "net_hybrid_seq = nn.HybridSequential()\n", |
| "\n", |
| "net_hybrid_seq.add(nn.Dense(256,activation='relu'),\n", |
| " nn.Dense(128, activation='relu'),\n", |
| " nn.Dense(2))\n", |
| "net_hybrid_seq.initialize()\n", |
| "\n", |
| "print('Before hybridizing: %.4f sec'%(benchmark(net_hybrid_seq, x_bench)))\n", |
| "net_hybrid_seq.hybridize()\n", |
| "print('After hybridizing: %.4f sec'%(benchmark(net_hybrid_seq, x_bench)))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "948f2311", |
| "metadata": {}, |
| "source": [ |
| "Peeling back another layer, you also have a `HybridBlock` which is the hybrid\n", |
| "version of the `Block` API.\n", |
| "\n", |
| "Similar to the `Blocks` API, you define a `forward` function for `HybridBlock`\n", |
| "that takes an input `x`. MXNet takes care of hybridizing the model at the\n", |
| "backend so you don't have to make changes to your code to convert it to a\n", |
| "symbolic paradigm." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "d6b0fdaf", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet.gluon import HybridBlock\n", |
| "\n", |
| "class MLP_Hybrid(HybridBlock):\n", |
| " def __init__(self):\n", |
| " super().__init__()\n", |
| " self.dense1 = nn.Dense(256,activation='relu')\n", |
| " self.dense2 = nn.Dense(128,activation='relu')\n", |
| " self.dense3 = nn.Dense(2)\n", |
| " def forward(self, x):\n", |
| " layer1 = self.dense1(x)\n", |
| " layer2 = self.dense2(layer1)\n", |
| " layer3 = self.dense3(layer2)\n", |
| " return layer3\n", |
| "\n", |
| "net_hybrid = MLP_Hybrid()\n", |
| "net_hybrid.initialize()\n", |
| "\n", |
| "print('Before hybridizing: %.4f sec'%(benchmark(net_hybrid, x_bench)))\n", |
| "net_hybrid.hybridize()\n", |
| "print('After hybridizing: %.4f sec'%(benchmark(net_hybrid, x_bench)))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "3ffcfea2", |
| "metadata": {}, |
| "source": [ |
| "Given a HybridBlock whose forward computation consists of going through other\n", |
| "HybridBlocks, you can compile that section of the network by calling the\n", |
| "HybridBlocks `.hybridize()` method.\n", |
| "\n", |
| "All of MXNet’s predefined layers are HybridBlocks. This means that any network\n", |
| "consisting entirely of predefined MXNet layers can be compiled and run at much\n", |
| "faster speeds by calling `.hybridize()`.\n", |
| "\n", |
| "## Saving and Loading your models\n", |
| "\n", |
| "The Blocks API also includes saving your models during and after training so\n", |
| "that you can host the model for inference or avoid training the model again from\n", |
| "scratch. Another reason would be to train your model using one language (like\n", |
| "Python that has a lot of tools for training) and run inference using a different\n", |
| "language.\n", |
| "\n", |
| "There are two ways to save your model in MXNet.\n", |
| "1. Save/load the model weights/parameters only\n", |
| "2. Save/load the model weights/parameters and the architectures\n", |
| "\n", |
| "\n", |
| "### 1. Save/load the model weights/parameters only\n", |
| "\n", |
| "You can use `save_parameters` and `load_parameters` method to save and load the\n", |
| "model weights. Take your simplest model `layer` and save your parameters first.\n", |
| "The model parameters are the params that you save **after** you train your\n", |
| "model." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "0dc3bdcf", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "file_name = 'layer.params'\n", |
| "layer.save_parameters(file_name)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f1a7b39d", |
| "metadata": {}, |
| "source": [ |
| "And now load this model again. To load the parameters into a model, you will\n", |
| "first have to build the model. To do this, you will need to create a simple\n", |
| "function to build it." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "6126b43f", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "def build_model():\n", |
| " layer = nn.Dense(5, in_units=3,activation='relu')\n", |
| " return layer\n", |
| "\n", |
| "layer_new = build_model()" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "56940832", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer_new.load_parameters('layer.params')" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "52bb26e9", |
| "metadata": {}, |
| "source": [ |
| "**Note**: The `save_parameters` and `load_parameters` method is used for models\n", |
| "that use a `Block` method instead of `HybridBlock` method to build the model.\n", |
| "These models may have complex architectures where the model architectures may\n", |
| "change during execution. E.g. if you have a model that uses an if-else\n", |
| "conditional statement to choose between two different architectures.\n", |
| "\n", |
| "### 2. Save/load the model weights/parameters and the architectures\n", |
| "\n", |
| "For models that use the **HybridBlock**, the model architecture stays static and\n", |
| "do no change during execution. Therefore both model parameters **AND**\n", |
| "architecture can be saved and loaded using `export`, `imports` methods.\n", |
| "\n", |
| "Now look at your `MLP_Hybrid` model and export the model using the `export`\n", |
| "function. The export function will export the model architecture into a `.json`\n", |
| "file and model parameters into a `.params` file." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "19e6b3e5", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_hybrid.export('MLP_hybrid')" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "445cea58", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_hybrid.export('MLP_hybrid')" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "54e2b5f7", |
| "metadata": {}, |
| "source": [ |
| "Similarly, to load this model back, you can use `gluon.nn.SymbolBlock`. To\n", |
| "demonstrate that, load the network serialized above." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "9967d4bf", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import warnings\n", |
| "with warnings.catch_warnings():\n", |
| " warnings.simplefilter(\"ignore\")\n", |
| " net_loaded = nn.SymbolBlock.imports(\"MLP_hybrid-symbol.json\",\n", |
| " ['data'], \"MLP_hybrid-0000.params\",\n", |
| " ctx=None)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "61b32669", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_loaded(x_bench)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "67c40a64", |
| "metadata": {}, |
| "source": [ |
| "## Visualizing your models\n", |
| "\n", |
| "In MXNet, the `Block.Summary()` method allows you to view the block’s shape\n", |
| "arguments and view the block’s parameters. When you combine multiple blocks into\n", |
| "a model, the `summary()` applied on the model allows you to view each block’s\n", |
| "summary, the total parameters, and the order of the blocks within the model. To\n", |
| "do this the `Block.summary()` method requires one forward pass of the data,\n", |
| "through your network, in order to create the graph necessary for capturing the\n", |
| "corresponding shapes and parameters. Additionally, this method should be called\n", |
| "before the hybridize method, since the hybridize method converts the graph into\n", |
| "a symbolic one, potentially changing the operations for optimal computation.\n", |
| "\n", |
| "Look at the following examples\n", |
| "\n", |
| "- layer: our single layer network\n", |
| "- Lenet: a non-hybridized LeNet network\n", |
| "- net_Hybrid: our MLP Hybrid network" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "ada000b7", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "layer.summary(x)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "dcad5130", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "lenet.summary(image_data)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "bf70330e", |
| "metadata": {}, |
| "source": [ |
| "You are able to print the summaries of the two networks `layer` and `lenet`\n", |
| "easily since you didn't hybridize the two networks. However, the last network\n", |
| "`net_Hybrid` was hybridized above and throws an `AssertionError` if you try\n", |
| "`net_Hybrid.summary(x_bench)`. To print the summary for `net_Hybrid`, call\n", |
| "another instance of the same network and instantiate it for our summary and then\n", |
| "hybridize it" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c0116e41", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net_hybrid_summary = MLP_Hybrid()\n", |
| "\n", |
| "net_hybrid_summary.initialize()\n", |
| "\n", |
| "net_hybrid_summary.summary(x_bench)\n", |
| "\n", |
| "net_hybrid_summary.hybridize()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "b2a99c98", |
| "metadata": {}, |
| "source": [ |
| "## Next steps:\n", |
| "\n", |
| "Now that you have created a neural network, learn how to automatically compute\n", |
| "the gradients in [Step 3: Automatic differentiation with autograd](./3-autograd.ipynb)." |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |