{
"cells": [
{
"cell_type": "markdown",
"id": "4bd1d0e8",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Custom Layers\n",
"\n",
"<!-- adapted from diveintodeeplearning -->\n",
"\n",
"One of the reasons for the success of deep learning can be found in the wide range of re-usable layers that can be used in a deep network. This allows for a tremendous degree of customization and adaptation. Sooner or later you will encounter a layer that doesn't exist yet in Gluon or one that you want to create. This is when it's time to build a custom layer. This section shows you how.\n",
"\n",
"Defining a layer is as easy as subclassing [nn.Block](/api/gluon/mxnet.gluon.nn.Block.html#mxnet.gluon.nn.Block) or [nn.HybridBlock](/api/gluon/mxnet.gluon.nn.HybridBlock.html#mxnet.gluon.nn.HybridBlock) and implementing `forward` or `hybrid_forward`, respectively. To take advantage of the performance gains with `nn.HybridBlock` see the section on [Hybridization](hybridize.html). \n",
"\n",
"Note that we've gone through rationale for defining layers, but `nn.Block`'s work even for non-sequential network. In fact, you can use a `Block` to encapsualte any re-usable architecture you want.\n",
"\n",
"We will discuss making custom layers using `nn.Block` below. \n",
"\n",
"## Layers without Parameters\n",
"\n",
"Since this is slightly intricate, we start with a custom layer that doesn't have any inherent parameters. Our first step is very similar to when we [introduced blocks](nn.md) previously. The following `CenteredLayer` class constructs a layer that subtracts the mean from the input. We build it by inheriting from the `Block` class and overriding the `forward` and `__init__` methods."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "44392544",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "1"
}
},
"outputs": [],
"source": [
"from mxnet import gluon, nd\n",
"from mxnet.gluon import nn\n",
"\n",
"\n",
"class CenteredLayer(nn.Block):\n",
" def __init__(self, **kwargs):\n",
" super(CenteredLayer, self).__init__(**kwargs)\n",
"\n",
" def forward(self, x):\n",
" return x - x.mean()"
]
},
{
"cell_type": "markdown",
"id": "19047ebe",
"metadata": {},
"source": [
"To see how it works let's feed some data into the layer."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "46a22ea6",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
}
},
"outputs": [],
"source": [
"layer = CenteredLayer()\n",
"layer(nd.array([1, 2, 3, 4, 5]))\n",
"print(layer)"
]
},
{
"cell_type": "markdown",
"id": "5a3d98c4",
"metadata": {},
"source": [
"We can also use it to construct more complex models."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4da4ce80",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "3"
}
},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"net.add(nn.Dense(128), \n",
" CenteredLayer())\n",
"net.initialize()\n",
"print(net)"
]
},
{
"cell_type": "markdown",
"id": "55f40d53",
"metadata": {},
"source": [
"Let's see whether the centering layer did its job. For that we send random data through the network and check whether the mean is $0$. Note that since we're dealing with floating point numbers, we're going to see a very small albeit typically nonzero number."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "89b950c3",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "4"
}
},
"outputs": [],
"source": [
"y = net(nd.random.uniform(shape=(4, 8)))\n",
"y.mean().asscalar()"
]
},
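{
"cell_type": "markdown",
"id": "3f8a1c2b",
"metadata": {},
"source": [
"As an aside, the same layer can be written as an `nn.HybridBlock` to get the performance gains mentioned above. The following is a minimal sketch: `hybrid_forward` takes an extra argument `F`, which stands for either the `nd` or the `sym` module depending on whether the block has been hybridized."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b9d2e4a",
"metadata": {},
"outputs": [],
"source": [
"class HybridCenteredLayer(nn.HybridBlock):\n",
"    def __init__(self, **kwargs):\n",
"        super(HybridCenteredLayer, self).__init__(**kwargs)\n",
"\n",
"    def hybrid_forward(self, F, x):\n",
"        # F is mxnet.nd before hybridization and mxnet.sym after\n",
"        return x - F.mean(x)\n",
"\n",
"\n",
"hybrid_layer = HybridCenteredLayer()\n",
"hybrid_layer.hybridize()  # compile the computation into a static graph\n",
"print(hybrid_layer(nd.array([1, 2, 3, 4, 5])))"
]
},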
{
"cell_type": "markdown",
"id": "72e4cf22",
"metadata": {},
"source": [
"## Layers with Parameters\n",
"\n",
"Now that we know how to define layers in principle, let's define layers with parameters. These can be adjusted through training. In order to simplify things for an avid deep learning researcher, the [Parameter](/api/gluon/mxnet.gluon.parameter.html) class and the `ParameterDict` dictionary provide some basic housekeeping functionality. In particular, they govern access, initialization, sharing, saving and loading model parameters. For instance, this way we don't need to write custom serialization routines for each new custom layer.\n",
"\n",
"We can access the parameters via the `params` variable of the `ParameterDict` in `Block`. The parameter dictionary is just that - a dictionary that maps string type parameter names to model parameters in the `Parameter` type. We can create a `Parameter` instance from `ParameterDict` via the `get` function which attempts to retrieve a parameter, or create it if not found."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "72354ad3",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "7"
}
},
"outputs": [],
"source": [
"params = gluon.ParameterDict()\n",
"params.get('param2', shape=(2, 3))\n",
"print(params)"
]
},
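{
"cell_type": "markdown",
"id": "6c1e3f5d",
"metadata": {},
"source": [
"To confirm the retrieve-or-create behavior, a second call to `get` with the same name should hand back the very same `Parameter` object instead of creating a new one. A quick check (sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d2f4a6b",
"metadata": {},
"outputs": [],
"source": [
"p1 = params.get('param2', shape=(2, 3))\n",
"p2 = params.get('param2')\n",
"print(p1 is p2)  # True: 'get' retrieved the existing parameter"
]
},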
{
"cell_type": "markdown",
"id": "e72e7a8f",
"metadata": {},
"source": [
"Let's use this to implement our own version of the dense layer. It has two parameters - bias and weight. To make it a bit nonstandard, we bake in the ReLU activation as default. Next, we implement a fully connected layer with both weight and bias parameters. It uses ReLU as an activation function, where `in_units` and `units` are the number of inputs and the number of outputs, respectively."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "ffa97a9a",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "19"
}
},
"outputs": [],
"source": [
"class MyDense(nn.Block):\n",
"\n",
" def __init__(self, units, in_units, **kwargs):\n",
" # units: the number of outputs in this layer\n",
" # in_units: the number of inputs in this layer\n",
"\n",
" super(MyDense, self).__init__(**kwargs)\n",
" self.weight = self.params.get('weight', shape=(in_units, units))\n",
" self.bias = self.params.get('bias', shape=(units,))\n",
"\n",
" def forward(self, x):\n",
" linear = nd.dot(x, self.weight.data()) + self.bias.data()\n",
" return nd.relu(linear)"
]
},
{
"cell_type": "markdown",
"id": "db42d28c",
"metadata": {},
"source": [
"Naming the parameters allows us to access them by name through dictionary lookup later. It's a good idea to give them instructive names. Next, we instantiate the `MyDense` class and access its model parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9277bff0",
"metadata": {},
"outputs": [],
"source": [
"dense = MyDense(units=3, in_units=5)\n",
"dense.params"
]
},
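{
"cell_type": "markdown",
"id": "8e3a5b7c",
"metadata": {},
"source": [
"Note that the keys shown above carry an automatically generated block prefix (for example `mydense0_`) in front of the names we chose. A small sketch of looking parameters up by name; the exact prefix depends on how many `MyDense` blocks have been created so far:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f4b6c8d",
"metadata": {},
"outputs": [],
"source": [
"# The fully qualified names include the auto-generated prefix\n",
"for name in dense.params.keys():\n",
"    print(name)\n",
"\n",
"# 'get' prepends the prefix, so this is the same Parameter object\n",
"# as the attribute we created in __init__\n",
"print(dense.weight is dense.params.get('weight'))"
]
},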
{
"cell_type": "markdown",
"id": "2387ae81",
"metadata": {},
"source": [
"We can directly carry out forward calculations using custom layers."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "396fad9f",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "20"
}
},
"outputs": [],
"source": [
"dense.initialize()\n",
"dense(nd.random.uniform(shape=(2, 5)))\n",
"print(dense)"
]
},
{
"cell_type": "markdown",
"id": "11ecd539",
"metadata": {},
"source": [
"We can also construct models using custom layers. Once we have that we can use it just like the built-in dense layer. The only exception is that in our case, shape inference is not automatic as we have explicitly defined the shape of the weight matrix during initialization."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "ccfb3936",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "19"
}
},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"net.add(MyDense(8, in_units=64),\n",
" MyDense(1, in_units=8))\n",
"net.initialize()\n",
"net(nd.random.uniform(shape=(2, 64)))\n",
"print(net)"
]
}
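,
{
"cell_type": "markdown",
"id": "a15c7d9e",
"metadata": {},
"source": [
"For comparison, the built-in `nn.Dense` defers shape inference to the first forward pass, so its `in_units` never has to be given explicitly. A quick illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b26d8eaf",
"metadata": {},
"outputs": [],
"source": [
"# The built-in Dense layer allocates its weights lazily:\n",
"# parameter shapes are only inferred when data first flows through\n",
"builtin_net = nn.Sequential()\n",
"builtin_net.add(nn.Dense(8),\n",
"                nn.Dense(1))\n",
"builtin_net.initialize()  # no shapes known yet\n",
"builtin_net(nd.random.uniform(shape=(2, 64)))  # shapes inferred here\n",
"print(builtin_net)"
]
},
{
"cell_type": "markdown",
"id": "c37e9fb1",
"metadata": {},
"source": [
"Finally, because `MyDense` registers its parameters through `self.params`, saving and loading work without any custom serialization code, just as promised above. A minimal sketch; the file name is illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d48fa1c2",
"metadata": {},
"outputs": [],
"source": [
"net.save_parameters('mydense.params')  # illustrative file name\n",
"\n",
"# Load the saved weights into a fresh network of the same architecture\n",
"net_copy = nn.Sequential()\n",
"net_copy.add(MyDense(8, in_units=64),\n",
"             MyDense(1, in_units=8))\n",
"net_copy.load_parameters('mydense.params')\n",
"\n",
"x = nd.random.uniform(shape=(2, 64))\n",
"print((net(x) - net_copy(x)).abs().sum().asscalar())  # expect 0.0"
]
}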
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}