| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "ae65888d", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Layers and Blocks\n", |
| "\n", |
| "<!-- adapted from diveintodeeplearning -->\n", |
| "\n", |
| "As network complexity increases, we move from designing single to entire layers\n", |
| "of neurons.\n", |
| "\n", |
| "Neural network designs like\n", |
| "[ResNet-152](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)\n", |
| "have a fair degree of regularity. They consist of *blocks* of repeated (or at\n", |
| "least similarly designed) layers; these blocks then form the basis of more\n", |
| "complex network designs.\n", |
| "\n", |
| "In this section, we'll talk about how to write code that makes such blocks on\n", |
| "demand, just like a Lego factory generates blocks which can be combined to\n", |
| "produce terrific artifacts.\n", |
| "\n", |
| "We start with a very simple block, namely the block for a multilayer\n", |
| "perceptron. A common strategy would be to construct a two-layer network as\n", |
| "follows:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 1, |
| "id": "61fd937b", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "1" |
| } |
| }, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "[02:44:10] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU\n" |
| ] |
| }, |
| { |
| "data": { |
| "text/plain": [ |
| "array([[ 0.01755518, -0.03655044, 0.02134394, 0.05062544, -0.03138772,\n", |
| " 0.02526961, -0.05070939, -0.04725305, -0.04259878, -0.01433349],\n", |
| " [-0.02081584, -0.04386109, 0.03806401, 0.06545892, -0.00233529,\n", |
| " 0.05181789, -0.04411978, -0.08837319, -0.04048262, -0.04407067]])" |
| ] |
| }, |
| "execution_count": 1, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "import mxnet as mx\n", |
| "from mxnet import np, npx\n", |
| "from mxnet.gluon import nn, Block, Parameter, Constant\n", |
| "\n", |
| "\n", |
| "x = np.random.uniform(size=(2, 20))\n", |
| "\n", |
| "net = nn.Sequential()\n", |
| "net.add(nn.Dense(256, activation='relu'))\n", |
| "net.add(nn.Dense(10))\n", |
| "net.initialize()\n", |
| "net(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "18308e83", |
| "metadata": {}, |
| "source": [ |
| "This generates a network with a hidden layer of $256$ units, followed by a ReLU\n", |
| "activation and another $10$ units governing the output. In particular, we used\n", |
| "the [nn.Sequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential)\n", |
| "constructor to generate an empty network into which we then inserted both\n", |
| "layers. What exactly happens inside `nn.Sequential`\n", |
| "has remained rather mysterious so far. In the following we will see that this\n", |
| "really just constructs a block that is a container for other blocks. These\n", |
| "blocks can be combined into larger artifacts, often recursively. The diagram\n", |
| "below shows how:\n", |
| "\n", |
| "![Blocks can be used recursively to form larger artifacts](/_static/blocks.svg)\n", |
| "\n", |
| "In the following we will explain the various steps needed to go from defining\n", |
| "layers to defining blocks (of one or more layers):\n", |
| "\n", |
| "1. Blocks take data as input.\n", |
| "1. Blocks store state in the form of parameters that are inherent to the block.\n", |
| " For instance, the block above contains two hidden layers, and we need a\n", |
| " place to store parameters for it.\n", |
| "1. Blocks produce meaningful output. This is typically encoded in what\n", |
| " we will call the `forward` function. It allows us to invoke a block via\n", |
| " `net(X)` to obtain the desired output. What happens behind the scenes is\n", |
| " that it invokes `forward` to perform forward propagation (also called\n", |
| " forward computation).\n", |
| "1. Blocks initialize the parameters in a lazy fashion as part of the first\n", |
| " `forward` call.\n", |
| "1. Blocks calculate a gradient with regard to their input when invoking\n", |
| " `backward`. Typically this is automatic.\n", |
| "\n", |
| "## A Sequential Block\n", |
| "\n", |
| "The [Block](../../../../api/gluon/block.rst#mxnet.gluon.Block) class is a\n", |
| "generic component describing data flow. When the data flows through a sequence\n", |
| "of blocks, each block applied to the output of the one before with the first\n", |
| "block being applied on the input data itself, we have a special kind of block,\n", |
| "namely the `Sequential` block.\n", |
| "\n", |
| "`Sequential` has helper methods to manage the sequence, with `add` being the\n", |
| "main one of interest allowing you to append blocks in sequence. Once the\n", |
| "operations have been added, the forward computation of the model applies the\n", |
| "blocks on the input data in the order they were added. Below, we implement a\n", |
| "`MySequential` class that has the same functionality as the `Sequential` class.\n", |
| "This may help you understand more clearly how the `Sequential` class works." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "id": "fed70c21", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "3" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "class MySequential(Block):\n", |
| " def __init__(self):\n", |
| " super(MySequential, self).__init__()\n", |
| " self._layers = []\n", |
| "\n", |
| " def add(self, block):\n", |
| " # Here, block is an instance of a Block subclass, and we assume it has a unique name. We save it in the\n", |
| " # member variable _layers of the Block class, and its type is List. When the MySequential instance\n", |
| " # calls the initialize function, the system automatically initializes all members of _layers.\n", |
| " self._layers.append(block)\n", |
| " self.register_child(block)\n", |
| "\n", |
| " def forward(self, x):\n", |
| " # OrderedDict guarantees that members will be traversed in the order they were added.\n", |
| " for block in self._children.values():\n", |
| " x = block()(x)\n", |
| " return x" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "367494f9", |
| "metadata": {}, |
| "source": [ |
| "At its core is the `add` method. It adds any block to the ordered dictionary of\n", |
| "children. These are then executed in sequence when forward propagation is\n", |
| "invoked. Let's see what the MLP looks like now." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 3, |
| "id": "20e66b47", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "4" |
| } |
| }, |
| "outputs": [ |
| { |
| "data": { |
| "text/plain": [ |
| "array([[ 0.00443557, 0.04548185, -0.0020682 , 0.02690094, -0.03028263,\n", |
| " -0.00693051, -0.0526855 , 0.07452135, 0.05914916, 0.00598204],\n", |
| " [-0.01268907, 0.01685787, -0.0594459 , 0.00842154, -0.07675594,\n", |
| " 0.05660268, -0.03371882, 0.02692341, 0.06052345, -0.03554988]])" |
| ] |
| }, |
| "execution_count": 3, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "net = MySequential()\n", |
| "net.add(nn.Dense(256, activation='relu'))\n", |
| "net.add(nn.Dense(10))\n", |
| "net.initialize()\n", |
| "net(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "12363d96", |
| "metadata": {}, |
| "source": [ |
| "Indeed, it is no different than It can observed here that the use of the\n", |
| "`MySequential` class is no different from the use of the Sequential class.\n", |
| "\n", |
| "\n", |
| "## A Custom Block\n", |
| "\n", |
| "It is easy to go beyond simple concatenation with `Sequential`. The\n", |
| "`Block` class provides the functionality required to make such customizations.\n", |
| "`Block` has a model constructor provided in the `nn` module, which we can\n", |
| "inherit to define the model we want. The following inherits the `Block` class to\n", |
| "construct the multilayer perceptron mentioned at the beginning of this section.\n", |
| "The `MLP` class defined here overrides the `__init__` and `forward` functions\n", |
| "of the Block class. They are used to create model parameters and define forward\n", |
| "computations, respectively. Forward computation is also forward propagation." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 4, |
| "id": "366062b9", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "1" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "class MLP(nn.Block):\n", |
| " # Declare a layer with model parameters. Here, we declare two fully\n", |
| " # connected layers.\n", |
| "\n", |
| " def __init__(self, **kwargs):\n", |
| " # Call the constructor of the MLP parent class Block to perform the\n", |
| " # necessary initialization. In this way, other function parameters can\n", |
| " # also be specified when constructing an instance, such as the model\n", |
| " # parameter, params, described in the following sections.\n", |
| " super(MLP, self).__init__(**kwargs)\n", |
| " self.hidden = nn.Dense(256, activation='relu') # Hidden layer\n", |
| " self.output = nn.Dense(10) # Output layer\n", |
| "\n", |
| " # Define the forward computation of the model, that is, how to return the\n", |
| " # required model output based on the input x.\n", |
| "\n", |
| " def forward(self, x):\n", |
| " hidden_out = self.hidden(x)\n", |
| " return self.output(hidden_out)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "1162b5a6", |
| "metadata": {}, |
| "source": [ |
| "Let's look at it a bit more closely. The `forward` method invokes a network\n", |
| "simply by evaluating the hidden layer `self.hidden(x)` and subsequently by\n", |
| "evaluating the output layer `self.output( ... )`. This is what we expect in the\n", |
| "forward pass of this block.\n", |
| "\n", |
| "In order for the block to know what it needs to evaluate, we first need to\n", |
| "define the layers. This is what the `__init__` method does. It first\n", |
| "initializes all of the Block-related parameters and then constructs the\n", |
| "requisite layers. This attaches the coresponding layers and the required\n", |
| "parameters to the class. Note that there is no need to define a backpropagation\n", |
| "method in the class. The system automatically generates the `backward` method\n", |
| "needed for back propagation by automatically finding the gradient (see the tutorial\n", |
| "on [autograd](../../autograd/index.ipynb)). The same applies to the\n", |
| "[initialize](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Block.initialize)\n", |
| "method, which is generated automatically. Let's try\n", |
| "this out:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 5, |
| "id": "7c046162", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "2" |
| } |
| }, |
| "outputs": [ |
| { |
| "data": { |
| "text/plain": [ |
| "array([[-0.00342566, 0.01969311, -0.00701348, -0.0205308 , -0.06892128,\n", |
| " 0.013377 , 0.01372573, 0.02246431, -0.06926788, 0.10998657],\n", |
| " [-0.05067467, -0.02051876, -0.02446966, 0.03369933, -0.04971834,\n", |
| " 0.02028071, -0.03136851, 0.04077387, -0.07098567, 0.0912057 ]])" |
| ] |
| }, |
| "execution_count": 5, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "net = MLP()\n", |
| "net.initialize()\n", |
| "net(x)" |
| ] |
| }, |
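| { |
| "cell_type": "markdown", |
| "id": "b7e4c9a2", |
| "metadata": {}, |
| "source": [ |
| "As noted above, we did not have to write a `backward` method ourselves. As a\n", |
| "minimal sketch (using `autograd`, as in the tutorial linked above), gradients\n", |
| "for this block can be obtained like this:\n", |
| "\n", |
| "```python\n", |
| "from mxnet import autograd\n", |
| "\n", |
| "with autograd.record():\n", |
| "    y = net(x).sum()\n", |
| "y.backward()                            # backward is generated automatically\n", |
| "print(net.hidden.weight.grad().shape)   # gradient of the hidden layer's weight\n", |
| "```" |
| ] |
| }, |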
| { |
| "cell_type": "markdown", |
| "id": "bde1858d", |
| "metadata": {}, |
| "source": [ |
| "As explained above, the `Block` class can be quite versatile in terms of what it\n", |
| "does. For instance, its subclass can be a layer (such as the `Dense` class\n", |
| "provided by Gluon), it can be a model (such as the `MLP` class we just derived),\n", |
| "or it can be a part of a model (this is what typically happens when designing\n", |
| "very deep networks). Throughout this chapter we will see how to use this with\n", |
| "great flexibility.\n", |
| "\n", |
| "\n", |
| "## Coding with `Blocks`\n", |
| "\n", |
| "### Blocks\n", |
| "The [Sequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) class\n", |
| "can make model construction easier and does not require you to define the\n", |
| "`forward` method; however, directly inheriting from\n", |
| "its parent class, [Block](../../../../api/gluon/block.rst#mxnet.gluon.Block), can greatly\n", |
| "expand the flexibility of model construction. For example, implementing the\n", |
| "`forward` method means you can introduce control flow in the network.\n", |
| "\n", |
| "### Constant parameters\n", |
| "Now we'd like to introduce the notation of a *constant* parameter. These are\n", |
| "parameters that are not used when invoking backpropagation. This sounds very\n", |
| "abstract but here's what's really going on.\n", |
| "Assume that we have some function\n", |
| "\n", |
| "$$f(\\mathbf{x},\\mathbf{w}) = 3 \\cdot \\mathbf{w}^\\top \\mathbf{x}.$$\n", |
| "\n", |
| "In this case $3$ is a constant parameter. We could change $3$ to something else,\n", |
| "say $c$ via\n", |
| "\n", |
| "$$f(\\mathbf{x},\\mathbf{w}) = c \\cdot \\mathbf{w}^\\top \\mathbf{x}.$$\n", |
| "\n", |
| "Nothing has really changed, except that we can adjust the value of $c$. It is\n", |
| "still a constant as far as $\\mathbf{w}$ and $\\mathbf{x}$ are concerned. However,\n", |
| "Gluon doesn't know about this unless we create it with `get_constant`\n", |
| "(this makes the code go faster, too, since we're not sending the Gluon engine\n", |
| "on a wild goose chase after a parameter that doesn't change)." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "id": "33a8b75d", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "5" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "class FancyMLP(nn.Block):\n", |
| " def __init__(self, **kwargs):\n", |
| " super(FancyMLP, self).__init__(**kwargs)\n", |
| "\n", |
| " # Random weight parameters created with the get_constant are not\n", |
| " # iterated during training (i.e. constant parameters).\n", |
| " self.rand_weight = Constant(np.random.uniform(size=(20, 20)))\n", |
| " self.dense = nn.Dense(20, activation='relu')\n", |
| "\n", |
| " def forward(self, x):\n", |
| " x = self.dense(x)\n", |
| " # Use the constant parameters created, as well as the ReLU and dot\n", |
| " # functions of NDArray.\n", |
| "\n", |
| " x = npx.relu(np.dot(x, self.rand_weight.data()) + 1)\n", |
| " # Re-use the fully connected layer. This is equivalent to sharing\n", |
| " # parameters with two fully connected layers.\n", |
| " x = self.dense(x)\n", |
| " # Here in the control flow, we need to call `item` to return the\n", |
| " # scalar for comparison.\n", |
| "\n", |
| " while npx.norm(x).item() > 1:\n", |
| " x /= 2\n", |
| " if npx.norm(x).item() < 0.8:\n", |
| " x *= 10\n", |
| " return x.sum()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "843e2877", |
| "metadata": {}, |
| "source": [ |
| "In this `FancyMLP` model, we used constant weight `rand_weight` (note that it is\n", |
| "not a model parameter), performed a matrix multiplication operation (`nd.dot`),\n", |
| "and reused the *same* `Dense` layer. Note that this is very different from using\n", |
| "two dense layers with different sets of parameters. Instead, we used the same\n", |
| "network twice. Quite often in deep networks one also says that the parameters\n", |
| "are *tied* when one wants to express that multiple parts of a network share the\n", |
| "same parameters. Let's see what happens if we construct it and feed data through\n", |
| "it." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "id": "779f85f5", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "6" |
| } |
| }, |
| "outputs": [ |
| { |
| "data": { |
| "text/plain": [ |
| "array(27.227854)" |
| ] |
| }, |
| "execution_count": 7, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "net = FancyMLP()\n", |
| "net.initialize()\n", |
| "net(x)" |
| ] |
| }, |
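| { |
| "cell_type": "markdown", |
| "id": "c3f1d8e6", |
| "metadata": {}, |
| "source": [ |
| "A quick sketch to make the parameter tying visible: however many times\n", |
| "`forward` applies `self.dense`, the network holds only a single weight/bias\n", |
| "pair for it, as listing the parameters of `net` shows.\n", |
| "\n", |
| "```python\n", |
| "# The reused Dense layer contributes one weight and one bias, listed once.\n", |
| "for name, param in net.collect_params().items():\n", |
| "    print(name, param.shape)\n", |
| "```" |
| ] |
| }, |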
| { |
| "cell_type": "markdown", |
| "id": "d4a2588c", |
| "metadata": {}, |
| "source": [ |
| "There's no reason why we couldn't mix and match these ways of building a\n", |
| "network. Obviously the example below resembles a [Rube Goldberg\n", |
| "Machine](https://en.wikipedia.org/wiki/Rube_Goldberg_machine). That said, it\n", |
| "combines examples for building a block from individual blocks,\n", |
| "which in turn, may be blocks themselves. Furthermore, we can even combine\n", |
| "multiple strategies inside the same forward function. To demonstrate this,\n", |
| "here's the network." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "id": "59626660", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "7" |
| } |
| }, |
| "outputs": [ |
| { |
| "data": { |
| "text/plain": [ |
| "array(19.840149)" |
| ] |
| }, |
| "execution_count": 8, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "class NestMLP(nn.Block):\n", |
| " def __init__(self, **kwargs):\n", |
| " super(NestMLP, self).__init__(**kwargs)\n", |
| " self.net = nn.Sequential()\n", |
| " self.net.add(nn.Dense(64, activation='relu'),\n", |
| " nn.Dense(32, activation='relu'))\n", |
| " self.dense = nn.Dense(16, activation='relu')\n", |
| "\n", |
| " def forward(self, x):\n", |
| " return self.dense(self.net(x))\n", |
| "\n", |
| "chimera = nn.Sequential()\n", |
| "chimera.add(NestMLP(), nn.Dense(20), FancyMLP())\n", |
| "\n", |
| "chimera.initialize()\n", |
| "chimera(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "14ab4937", |
| "metadata": {}, |
| "source": [ |
| "## Hybridization\n", |
| "\n", |
| "The reader may be starting to think about the efficiency of this Python code.\n", |
| "After all, we have lots of dictionary lookups, code execution, and lots of\n", |
| "other Pythonic things going on in what is supposed to be a high performance\n", |
| "deep learning library. The problems of Python's [Global Interpreter\n", |
| "Lock](https://wiki.python.org/moin/GlobalInterpreterLock) are well\n", |
| "known.\n", |
| "\n", |
| "In the device of deep learning, we often have highly performant GPUs that\n", |
| "depend on CPUs running Python to tell them what to do. This mismatch can\n", |
| "manifest in the form of GPU starvation when the CPUs can not provide\n", |
| "instruction fast enough. We can improve this situation by deferring to a more\n", |
| "performant language instead of Python when possible.\n", |
| "\n", |
| "Gluon does this by allowing for [Hybridization](hybridize.ipynb). In it, the\n", |
| "Python interpreter executes the block the first time it's invoked. The Gluon\n", |
| "runtime records what is happening and the next time around it short circuits\n", |
| "any calls to Python. This can accelerate things considerably in some cases but\n", |
| "care needs to be taken with [control flow](../../autograd/index.ipynb#Advanced:-Using-Python-control-flow)." |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |