{
"cells": [
{
"cell_type": "markdown",
"id": "f30f1834",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"\n",
"# Custom Layers\n",
"\n",
"While Gluon API for Apache MxNet comes with [a decent number of pre-defined layers](https://mxnet.apache.org/versions/master/api/python/docs/api/gluon/nn/index.html), at some point one may find that a new layer is needed. Adding a new layer in Gluon API is straightforward, yet there are a few things that one needs to keep in mind.\n",
"\n",
"In this article, I will cover how to create a new layer from scratch, how to use it, what are possible pitfalls and how to avoid them.\n",
"\n",
"## The simplest custom layer\n",
"\n",
"To create a new layer in Gluon API, one must create a class that inherits from [Block](https://github.com/apache/incubator-mxnet/blob/c9818480680f84daa6e281a974ab263691302ba8/python/mxnet/gluon/block.py#L128) class. This class provides the most basic functionality, and all pre-defined layers inherit from it directly or via other subclasses. Because each layer in Apache MxNet inherits from `Block`, words \"layer\" and \"block\" are used interchangeable inside of the Apache MxNet community.\n",
"\n",
"The only instance method needed to be implemented is [forward(self, x)](https://github.com/apache/incubator-mxnet/blob/c9818480680f84daa6e281a974ab263691302ba8/python/mxnet/gluon/block.py#L909), which defines what exactly your layer is going to do during forward propagation. Notice, that it doesn't require to provide what the block should do during back propogation. Back propogation pass for blocks is done by Apache MxNet for you. \n",
"\n",
"In the example below, we define a new layer and implement `forward()` method to normalize input data by fitting it into a range of [0, 1]."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "573b9b80",
"metadata": {},
"outputs": [],
"source": [
"# Do some initial imports used throughout this tutorial \n",
"from __future__ import print_function\n",
"import mxnet as mx\n",
"from mxnet import np, npx, gluon, autograd\n",
"from mxnet.gluon.nn import Dense\n",
"mx.np.random.seed(1) # Set seed for reproducable results"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "db9888d0",
"metadata": {},
"outputs": [],
"source": [
"class NormalizationLayer(gluon.Block):\n",
" def __init__(self):\n",
" super(NormalizationLayer, self).__init__()\n",
"\n",
" def forward(self, x):\n",
" return (x - np.min(x)) / (np.max(x) - np.min(x))"
]
},
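{
"cell_type": "markdown",
"id": "a1b2c3d4",
"metadata": {},
"source": [
"Although we only implemented `forward()`, the backward pass comes for free. The snippet below is a small sketch (not part of the original tutorial) that calls the layer under `autograd.record()` and asks for gradients, just to illustrate that Apache MXNet generates the backward pass automatically:\n",
"\n",
"```python\n",
"layer = NormalizationLayer()\n",
"x = np.array([1.0, 2.0, 3.0])\n",
"x.attach_grad()           # track gradients with respect to x\n",
"with autograd.record():   # record the forward computation\n",
"    y = layer(x)\n",
"y.backward()              # the backward pass is generated by MXNet\n",
"print(y, x.grad)\n",
"```"
]
},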
{
"cell_type": "markdown",
"id": "46244001",
"metadata": {},
"source": [
"The rest of methods of the `Block` class are already implemented, and majority of them are used to work with parameters of a block. There is one very special method named [hybridize()](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L384), though, which I am going to cover before moving to a more complex example of a custom layer.\n",
"\n",
"## Hybridization and the difference between Block and HybridBlock\n",
"\n",
"Looking into implementation of [existing layers](https://mxnet.apache.org/versions/master/api/python/docs/api/gluon/nn/index.html), one may find that more often a block inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428), instead of directly inheriting from `Block`.\n",
"\n",
"The reason for that is that `HybridBlock` allows to write custom layers in imperative programming style, while computing in a symbolic way. It unifies the flexibility of imperative programming with the performance benefits of symbolic programming. You can learn more about the difference between symbolic and imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).\n",
"\n",
"Hybridization is a process that Apache MxNet uses to create a symbolic graph of a forward computation. This allows to increase computation performance by optimizing the computational symbolic graph. Once the symbolic graph is created, Apache MxNet caches and reuses it for subsequent computations.\n",
"\n",
"Hybridization of HybridBlock.forward is based on a deferred computation mode in the MXNet backend, which enables recording computation via tracing in the mxnet.nd and mxnet.np interfaces. The recorded computation can be exported to a symbolic representation and is used for optimized execution with the CachedOp.\n",
"\n",
"As tracing is based on the imperative APIs, users can access shape information of the arrays. As x.shape for some array x is a python tuple, any use of that shape will be a constant in the recorded graph and may limit the recorded graph to be used with inputs of the same shape only.\n",
"\n",
"Knowing this, we can rewrite our example layer, using HybridBlock:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1a1131a2",
"metadata": {},
"outputs": [],
"source": [
"class NormalizationHybridLayer(gluon.HybridBlock):\n",
" def __init__(self):\n",
" super(NormalizationHybridLayer, self).__init__()\n",
"\n",
" def forward(self, x):\n",
" return (x - np.min(x)) / (np.max(x) - np.min(x))"
]
},
{
"cell_type": "markdown",
"id": "d93b13e7",
"metadata": {},
"source": [
"Thanks to inheriting from HybridBlock, one can easily do forward pass on a given ndarray, either on CPU or GPU:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "42d97894",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[04:45:49] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU\n"
]
},
{
"data": {
"text/plain": [
"array([0. , 0.5, 1. ])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"layer = NormalizationHybridLayer()\n",
"layer(np.array([1, 2, 3], device=mx.cpu()))"
]
},
{
"cell_type": "markdown",
"id": "1e9adf50",
"metadata": {},
"source": [
"Output:"
]
},
{
"cell_type": "markdown",
"id": "a297d9d5",
"metadata": {},
"source": [
"```bash\n",
"[0. 0.5 1. ]\n",
"```\n"
]
},
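{
"cell_type": "markdown",
"id": "b2c3d4e5",
"metadata": {},
"source": [
"To see the hybridization points from above in action, here is a short sketch (the `ShapeDependentLayer` below is a hypothetical example, not part of the tutorial). Calling `hybridize()` makes MXNet trace the forward computation on the first call and cache the resulting graph, and any use of `x.shape` inside `forward()` is recorded as a plain Python constant, tying the cached graph to that input shape:\n",
"\n",
"```python\n",
"layer = NormalizationHybridLayer()\n",
"layer.hybridize()                     # the first call traces and caches the graph\n",
"print(layer(np.array([1., 2., 3.])))  # later calls reuse the cached graph\n",
"\n",
"# Hypothetical illustration of the shape caveat: x.shape is a Python tuple,\n",
"# so the batch size below is baked into the traced graph as a constant.\n",
"class ShapeDependentLayer(gluon.HybridBlock):\n",
"    def forward(self, x):\n",
"        return x / x.shape[0]\n",
"\n",
"shape_layer = ShapeDependentLayer()\n",
"shape_layer.hybridize()\n",
"print(shape_layer(np.ones((2, 3))))   # graph recorded with the batch size fixed to 2\n",
"```"
]
},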
{
"cell_type": "markdown",
"id": "1bfa10f9",
"metadata": {},
"source": [
"As a rule of thumb, one should always implement custom layers by inheriting from `HybridBlock`. This allows to have more flexibility, and doesn't affect execution speed once hybridization is done. \n",
"\n",
"Unfortunately, at the moment of writing this tutorial, NLP related layers such as [RNN](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.RNN), [GRU](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.GRU) and [LSTM](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.LSTM) are directly inhereting from the `Block` class via common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.\n",
"\n",
"It is important to notice that hybridization has nothing to do with computation on GPU. One can train both hybridized and non-hybridized networks on both CPU and GPU, though hybridized networks would work faster. Though, it is hard to say in advance how much faster it is going to be.\n",
"\n",
"## Adding a custom layer to a network\n",
"\n",
"While it is possible, custom layers are rarely used separately. Most often they are used with predefined layers to create a neural network. Output of one layer is used as an input of another layer.\n",
"\n",
"Depending on which class you used as a base one, you can use either [Sequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or [HybridSequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, one adds dependencies of one layer's input from another layer's output. It is worth noting, that both `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively. \n",
"\n",
"Below is an example of how to create a simple neural network with a custom layer. In this example, `NormalizationHybridLayer` gets as an input the output from `Dense(5)` layer and pass its output as an input to `Dense(1)` layer."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "91a879f3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.13601449],\n",
" [ 0.26103738],\n",
" [-0.05046429],\n",
" [-1.2375476 ],\n",
" [-0.15506989]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks\n",
"net.add(Dense(5)) # Add Dense layer with 5 neurons\n",
"net.add(NormalizationHybridLayer()) # Add a custom layer\n",
"net.add(Dense(1)) # Add Dense layer with 1 neurons\n",
"\n",
"\n",
"net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers\n",
"net.hybridize() # Create, optimize and cache computational graph\n",
"input = np.random.uniform(low=-10, high=10, size=(5, 2)) # Create 5 random examples with 2 feature each in range [-10, 10]\n",
"net(input)"
]
},
{
"cell_type": "markdown",
"id": "0e8c69a0",
"metadata": {},
"source": [
"Output:"
]
},
{
"cell_type": "markdown",
"id": "0a6d1e5a",
"metadata": {},
"source": [
"```bash\n",
"[[-0.13601446]\n",
" [ 0.26103732]\n",
" [-0.05046433]\n",
" [-1.2375476 ]\n",
" [-0.15506986]]\n",
"```\n"
]
},
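{
"cell_type": "markdown",
"id": "c3d4e5f6",
"metadata": {},
"source": [
"As mentioned earlier, it is hard to predict how much faster a hybridized network will run. A rough way to check on your own hardware is to time the forward pass before and after calling `hybridize()`. The sketch below (with a hypothetical `time_forward` helper, not part of the tutorial) does exactly that; treat the numbers as indicative only, since they depend on the hardware and the MXNet build:\n",
"\n",
"```python\n",
"import time\n",
"\n",
"def time_forward(block, x, repeats=100):\n",
"    block(x)                           # warm-up run\n",
"    npx.waitall()                      # wait for asynchronous execution to finish\n",
"    start = time.time()\n",
"    for _ in range(repeats):\n",
"        block(x)\n",
"    npx.waitall()\n",
"    return time.time() - start\n",
"\n",
"x = np.random.uniform(low=-10, high=10, size=(256, 2))\n",
"timed_net = gluon.nn.HybridSequential()\n",
"timed_net.add(Dense(5), NormalizationHybridLayer(), Dense(1))\n",
"timed_net.initialize(mx.init.Xavier(magnitude=2.24))\n",
"print('imperative:', time_forward(timed_net, x))\n",
"timed_net.hybridize()                  # subsequent calls reuse the cached graph\n",
"print('hybridized:', time_forward(timed_net, x))\n",
"```"
]
},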
{
"cell_type": "markdown",
"id": "d0ac3a17",
"metadata": {},
"source": [
"## Parameters of a custom layer\n",
"\n",
"Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](../../../../api/gluon/parameter.rst#gluon-parameter) class inside of Apache MXNet neural network."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2eda00a9",
"metadata": {},
"outputs": [],
"source": [
"class NormalizationHybridLayer(gluon.HybridBlock):\n",
" def __init__(self, hidden_units, scales):\n",
" super(NormalizationHybridLayer, self).__init__()\n",
" self.hidden_units = hidden_units\n",
" self.weights = gluon.Parameter('weights',\n",
" shape=(hidden_units, -1),\n",
" allow_deferred_init=True)\n",
"\n",
" self.scales = gluon.Parameter('scales',\n",
" shape=scales.shape,\n",
" init=mx.init.Constant(scales), # Convert to regular list to make this object serializable\n",
" differentiable=False)\n",
" \n",
" def forward(self, x):\n",
" normalized_data = (x - np.min(x)) / (np.max(x) - np.min(x))\n",
" weighted_data = npx.fully_connected(normalized_data, self.weights.data(), num_hidden=self.hidden_units, no_bias=True)\n",
" scaled_data = np.multiply(self.scales.data(), weighted_data)\n",
" return scaled_data\n",
" \n",
" def infer_shape(self, x, *args):\n",
" self.weights.shape = (self.hidden_units, x.shape[x.ndim-1])"
]
},
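{
"cell_type": "markdown",
"id": "d4e5f6a7",
"metadata": {},
"source": [
"Before looking at the code above in detail, here is a quick sketch (not part of the original tutorial) showing that the shape of `weights` stays partially unknown until the first forward pass triggers `infer_shape`:\n",
"\n",
"```python\n",
"layer = NormalizationHybridLayer(hidden_units=5, scales=np.array([2]))\n",
"layer.initialize()\n",
"print(layer.weights.shape)             # (5, -1): second dimension still deferred\n",
"layer(np.random.uniform(size=(3, 4)))  # the first forward pass calls infer_shape\n",
"print(layer.weights.shape)             # (5, 4): inferred from the input\n",
"```"
]
},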
{
"cell_type": "markdown",
"id": "c0e6a8e5",
"metadata": {},
"source": [
"In the example above 2 set of parameters are defined:\n",
"1. Parameter `weights` is trainable. Its shape is unknown during construction phase and will be infered on the first run of forward propogation; \n",
"1. Parameter `scale` is a constant that doesn't change. Its shape is defined during construction.\n",
"\n",
"Notice a few aspects of this code:\n",
"* Shape is not provided when creating `weights`. Instead it is going to be infered from the shape of the input by `infer_shape` method.\n",
"* `Scales` parameter is initialized and marked as `differentiable=False`.\n",
"\n",
"Running forward pass on this network is very similar to the previous example, so instead of just doing one forward pass, let's run whole training for a few epochs to show that `scales` parameter doesn't change during the training while `weights` parameter is changing."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9290db62",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=========== Parameters after forward pass ===========\n",
"\n",
"0.weight = [[-0.37410027 -0.46096736]\n",
" [ 0.66630214 0.06184483]\n",
" [-0.59595966 -0.37006742]\n",
" [-0.69023466 0.62416655]\n",
" [ 0.27584368 -0.7502517 ]]\n",
"\n",
"0.bias = [0. 0. 0. 0. 0.]\n",
"\n",
"1.weights = [[-0.3983642 -0.505708 -0.02425683 -0.3133553 -0.35161012]\n",
" [ 0.6467543 0.3918715 -0.6154656 -0.20702496 -0.4243446 ]\n",
" [ 0.6077331 0.03922009 0.13425875 0.5729856 -0.14446527]\n",
" [-0.3572498 0.18545026 -0.09098256 0.5106366 -0.35151464]\n",
" [-0.39846328 0.22245121 0.13075739 0.33387476 -0.10088372]]\n",
"\n",
"1.scales = [2.]\n",
"\n",
"2.weight = [[-0.44562677 -0.51679957 0.53975904 -0.58389556 0.22734201]]\n",
"\n",
"2.bias = [0.]\n",
"\n",
"=========== Parameters after backward pass ===========\n",
"\n",
"0.weight = [[-0.342965 -0.44048083]\n",
" [ 0.6648274 0.06087447]\n",
" [-0.64949214 -0.40121523]\n",
" [-0.65432864 0.64779216]\n",
" [ 0.3275343 -0.71624005]]\n",
"\n",
"0.bias = [-0.00656384 0.0003109 0.0247198 -0.0075696 -0.01089726]\n",
"\n",
"1.weights = [[-0.29839832 -0.47213346 0.08348035 -0.2324698 -0.27368507]\n",
" [ 0.76268613 0.43080837 -0.49052128 -0.11322092 -0.3339738 ]\n",
" [ 0.48665085 -0.00144657 0.00376363 0.4750142 -0.23885089]\n",
" [-0.22626658 0.22944227 0.05018322 0.6166192 -0.24941103]\n",
" [-0.44946212 0.20532274 0.07579394 0.29261002 -0.14063816]]\n",
"\n",
"1.scales = [2.]\n",
"\n",
"2.weight = [[-0.19393581 -0.4308293 0.28927413 -0.52694815 0.22539496]]\n",
"\n",
"2.bias = [-0.17333615]\n",
"\n"
]
}
],
"source": [
"def print_params(title, net):\n",
" \"\"\"\n",
" Helper function to print out the state of parameters of NormalizationHybridLayer\n",
" \"\"\"\n",
" print(title)\n",
" hybridlayer_params = {k: v for k, v in net.collect_params().items()}\n",
" \n",
" for key, value in hybridlayer_params.items():\n",
" print('{} = {}\\n'.format(key, value.data()))\n",
"\n",
"net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks\n",
"net.add(Dense(5)) # Add Dense layer with 5 neurons\n",
"net.add(NormalizationHybridLayer(hidden_units=5, \n",
" scales = np.array([2]))) # Add a custom layer\n",
"net.add(Dense(1)) # Add Dense layer with 1 neurons\n",
"\n",
"\n",
"net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers\n",
"net.hybridize() # Create, optimize and cache computational graph\n",
"\n",
"input = np.random.uniform(low=-10, high=10, size=(5, 2)) # Create 5 random examples with 2 feature each in range [-10, 10]\n",
"label = np.random.uniform(low=-1, high=1, size=(5, 1))\n",
"\n",
"mse_loss = gluon.loss.L2Loss() # Mean squared error between output and label\n",
"trainer = gluon.Trainer(net.collect_params(), # Init trainer with Stochastic Gradient Descent (sgd) optimization method and parameters for it\n",
" 'sgd', \n",
" {'learning_rate': 0.1, 'momentum': 0.9 })\n",
" \n",
"with autograd.record(): # Autograd records computations done on NDArrays inside \"with\" block \n",
" output = net(input) # Run forward propogation\n",
" \n",
" print_params(\"=========== Parameters after forward pass ===========\\n\", net) \n",
" loss = mse_loss(output, label) # Calculate MSE\n",
" \n",
"loss.backward() # Backward computes gradients and stores them as a separate array within each NDArray in .grad field\n",
"trainer.step(input.shape[0]) # Trainer updates parameters of every block, using .grad field using oprimization method (sgd in this example)\n",
" # We provide batch size that is used as a divider in cost function formula\n",
"print_params(\"=========== Parameters after backward pass ===========\\n\", net)"
]
},
{
"cell_type": "markdown",
"id": "0194c38f",
"metadata": {},
"source": [
"Output:\n",
"\n",
"```bash\n",
"=========== Parameters after forward pass ===========\n",
"\n",
"hybridsequential94_normalizationhybridlayer0_weights = \n",
"[[-0.3983642 -0.505708 -0.02425683 -0.3133553 -0.35161012]\n",
" [ 0.6467543 0.3918715 -0.6154656 -0.20702496 -0.4243446 ]\n",
" [ 0.6077331 0.03922009 0.13425875 0.5729856 -0.14446527]\n",
" [-0.3572498 0.18545026 -0.09098256 0.5106366 -0.35151464]\n",
" [-0.39846328 0.22245121 0.13075739 0.33387476 -0.10088372]]\n",
"\n",
"hybridsequential94_normalizationhybridlayer0_scales = \n",
"[2.]\n",
"\n",
"=========== Parameters after backward pass ===========\n",
"\n",
"hybridsequential94_normalizationhybridlayer0_weights = \n",
"[[-0.29839832 -0.47213346 0.08348035 -0.2324698 -0.27368504]\n",
" [ 0.76268613 0.43080837 -0.49052125 -0.11322092 -0.3339738 ]\n",
" [ 0.48665082 -0.00144657 0.00376363 0.47501418 -0.23885089]\n",
" [-0.22626656 0.22944227 0.05018325 0.6166192 -0.24941102]\n",
" [-0.44946212 0.20532274 0.07579394 0.29261002 -0.14063817]]\n",
"\n",
"hybridsequential94_normalizationhybridlayer0_scales = \n",
"[2.]\n",
"``` \n",
"\n",
"\n",
"As it is seen from the output above, `weights` parameter has been changed by the training and `scales` not.\n",
"\n",
"## Conclusion\n",
"\n",
"One important quality of a Deep learning framework is extensibility. Empowered by flexible abstractions, like `Block` and `HybridBlock`, one can easily extend Apache MxNet functionality to match its needs."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}