{
"cells": [
{
"cell_type": "markdown",
"id": "cbf11d31",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Create a neural network\n",
"\n",
"Now let's look how to create neural networks in Gluon. In addition the NDArray package (`nd`) that we just covered, we now will also import the neural network `nn` package from `gluon`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9b9c386e",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
}
},
"outputs": [],
"source": [
"from mxnet import nd\n",
"from mxnet.gluon import nn"
]
},
{
"cell_type": "markdown",
"id": "a1e7d529",
"metadata": {},
"source": [
"## Create your neural network's first layer\n",
"\n",
"Let's start with a dense layer with 2 output units.\n",
"<!-- mention what the none and the linear parts mean? -->"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "4ce11cb4",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "31"
}
},
"outputs": [],
"source": [
"layer = nn.Dense(2)\n",
"layer"
]
},
{
"cell_type": "markdown",
"id": "ed76262a",
"metadata": {},
"source": [
"Then initialize its weights with the default initialization method, which draws random values uniformly from $[-0.7, 0.7]$."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "6d215548",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "32"
}
},
"outputs": [],
"source": [
"layer.initialize()"
]
},
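{
"cell_type": "markdown",
"id": "b3a1f0e1",
"metadata": {},
"source": [
"The call above uses the default initializer. As a sketch (using a fresh `layer2`, a name introduced here just for illustration), the explicit equivalent passes `init.Uniform(scale=0.07)` to `initialize()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7d2e3f4",
"metadata": {},
"outputs": [],
"source": [
"from mxnet import init\n",
"\n",
"# A fresh layer for illustration; passing the initializer explicitly\n",
"# is equivalent to the default used by `initialize()`.\n",
"layer2 = nn.Dense(2)\n",
"layer2.initialize(init.Uniform(scale=0.07))"
]
},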
{
"cell_type": "markdown",
"id": "f0892060",
"metadata": {},
"source": [
"Then we do a forward pass with random data. We create a $(3,4)$ shape random input `x` and feed into the layer to compute the output."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "7f9f3d5b",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "34"
}
},
"outputs": [],
"source": [
"x = nd.random.uniform(-1,1,(3,4))\n",
"layer(x)"
]
},
{
"cell_type": "markdown",
"id": "70c64d26",
"metadata": {},
"source": [
"As can be seen, the layer's input limit of 2 produced a $(3,2)$ shape output from our $(3,4)$ input. Note that we didn't specify the input size of `layer` before (though we can specify it with the argument `in_units=4` here), the system will automatically infer it during the first time we feed in data, create and initialize the weights. So we can access the weight after the first forward pass:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "c2b6a50d",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "35"
}
},
"outputs": [],
"source": [
"layer.weight.data()"
]
},
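{
"cell_type": "markdown",
"id": "d5e6f7a8",
"metadata": {},
"source": [
"As a sketch of the alternative mentioned above (using a fresh `layer3`, a name introduced here for illustration), specifying `in_units=4` up front makes the weight shape available right after initialization, with no forward pass needed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9f0a1b2",
"metadata": {},
"outputs": [],
"source": [
"# With `in_units` given explicitly, the (2, 4) weight is created\n",
"# at initialization time rather than on the first forward pass.\n",
"layer3 = nn.Dense(2, in_units=4)\n",
"layer3.initialize()\n",
"layer3.weight.data().shape"
]
},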
{
"cell_type": "markdown",
"id": "340bf945",
"metadata": {},
"source": [
"## Chain layers into a neural network\n",
"\n",
"Let's first consider a simple case that a neural network is a chain of layers. During the forward pass, we run layers sequentially one-by-one. The following code implements a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Sequential`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88aceeff",
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"# Add a sequence of layers.\n",
"net.add(# Similar to Dense, it is not necessary to specify the input channels\n",
" # by the argument `in_channels`, which will be automatically inferred\n",
" # in the first forward pass. Also, we apply a relu activation on the\n",
" # output. In addition, we can use a tuple to specify a non-square\n",
" # kernel size, such as `kernel_size=(2,4)`\n",
" nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n",
" # One can also use a tuple to specify non-symmetric pool and stride sizes\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" # The dense layer will automatically reshape the 4-D output of last\n",
" # max pooling layer into the 2-D shape: (x.shape[0], x.size/x.shape[0])\n",
" nn.Dense(120, activation=\"relu\"),\n",
" nn.Dense(84, activation=\"relu\"),\n",
" nn.Dense(10))\n",
"net"
]
},
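{
"cell_type": "markdown",
"id": "a2b3c4d5",
"metadata": {},
"source": [
"As a quick check of the arithmetic (for a $28\\times 28$ input, as used below, with the default stride of 1 and no padding): the first $5\\times 5$ convolution gives $28-5+1=24$, the $2\\times 2$ pooling halves that to $12$, the $3\\times 3$ convolution gives $12-3+1=10$, and the second pooling halves that to $5$. The first dense layer therefore sees $16\\times 5\\times 5=400$ flattened features."
]
},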
{
"cell_type": "markdown",
"id": "65f3354f",
"metadata": {},
"source": [
"<!--Mention the tuple option for kernel and stride as an exercise for the reader? Or leave it out as too much info for now?-->\n",
"\n",
"The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following codes show how to initialize the weights and run the forward pass."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4b217de",
"metadata": {},
"outputs": [],
"source": [
"net.initialize()\n",
"# Input shape is (batch_size, color_channels, height, width)\n",
"x = nd.random.uniform(shape=(4,1,28,28))\n",
"y = net(x)\n",
"y.shape"
]
},
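{
"cell_type": "markdown",
"id": "f1e2d3c4",
"metadata": {},
"source": [
"As a quick sanity check of the claim above (a sketch we add here): both the whole network and a single layer are instances of `nn.Block`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4c5d6e7",
"metadata": {},
"outputs": [],
"source": [
"# Both `nn.Sequential` and `nn.Dense` derive from `nn.Block`.\n",
"isinstance(net, nn.Block), isinstance(nn.Dense(2), nn.Block)"
]
},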
{
"cell_type": "markdown",
"id": "11ef05f2",
"metadata": {},
"source": [
"We can use `[]` to index a particular layer. For example, the following\n",
"accesses the 1st layer's weight and 6th layer's bias."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd37889a",
"metadata": {},
"outputs": [],
"source": [
"(net[0].weight.data().shape, net[5].bias.data().shape)"
]
},
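{
"cell_type": "markdown",
"id": "c8d9e0f1",
"metadata": {},
"source": [
"Besides indexing individual layers, Gluon can list every parameter of the network at once through `collect_params()` (a standard `nn.Block` method), which is handy for checking what the first forward pass created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2e3f4a5",
"metadata": {},
"outputs": [],
"source": [
"# All weights and biases of `net`, keyed by parameter name.\n",
"net.collect_params()"
]
},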
{
"cell_type": "markdown",
"id": "dd18f332",
"metadata": {},
"source": [
"## Create a neural network flexibly\n",
"\n",
"In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers.\n",
"Now let's introduce another way to construct a network with a flexible forward function.\n",
"\n",
"To do it, we create a subclass of `nn.Block` and implement two methods:\n",
"\n",
"- `__init__` create the layers\n",
"- `forward` define the forward function."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "31c26b8d",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "6"
}
},
"outputs": [],
"source": [
"class MixMLP(nn.Block):\n",
" def __init__(self, **kwargs):\n",
" # Run `nn.Block`'s init method\n",
" super(MixMLP, self).__init__(**kwargs)\n",
" self.blk = nn.Sequential()\n",
" self.blk.add(nn.Dense(3, activation='relu'),\n",
" nn.Dense(4, activation='relu'))\n",
" self.dense = nn.Dense(5)\n",
" def forward(self, x):\n",
" y = nd.relu(self.blk(x))\n",
" print(y)\n",
" return self.dense(y)\n",
"\n",
"net = MixMLP()\n",
"net"
]
},
{
"cell_type": "markdown",
"id": "f4d17c1d",
"metadata": {},
"source": [
"In the sequential chaining approach, we can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, we used `print` to get the intermediate results and `nd.relu` to apply relu activation. So this approach provides a more flexible way to define the forward function.\n",
"\n",
"The usage of `net` is similar as before."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e11abda",
"metadata": {},
"outputs": [],
"source": [
"net.initialize()\n",
"x = nd.random.uniform(shape=(2,2))\n",
"net(x)"
]
},
{
"cell_type": "markdown",
"id": "f44bb603",
"metadata": {},
"source": [
"Finally, let's access a particular layer's weight"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8f4ec9c2",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "8"
}
},
"outputs": [],
"source": [
"net.blk[1].weight.data()"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}