"cells": [
"cell_type": "markdown",
"id": "dadf3446",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"<!--- -->\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"# Create a neural network\n",
"Now let's look at how to create neural networks in Gluon. In addition to the NDArray package (`nd`) that we just covered, we will now also import the neural network package `nn` from `gluon`."
"cell_type": "code",
"execution_count": 2,
"id": "ce7c82ed",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
"outputs": [],
"source": [
"from mxnet import nd\n",
"from mxnet.gluon import nn"
"cell_type": "markdown",
"id": "6ddef1f0",
"metadata": {},
"source": [
"## Create your neural network's first layer\n",
"Let's start with a dense layer with 2 output units.\n",
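"When displayed, the layer prints as `Dense(None -> 2, linear)`: `None` means that the number of input units has not been specified yet and will be inferred later, and `linear` means that no activation function is applied."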
"cell_type": "code",
"execution_count": 31,
"id": "6dc844d5",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "31"
"outputs": [],
"source": [
"layer = nn.Dense(2)\n",
"layer"
"cell_type": "markdown",
"id": "005d9230",
"metadata": {},
"source": [
"Then we initialize its weights with the default initialization method, which draws random values uniformly from $[-0.07, 0.07]$ (the bias starts at zero)."
"cell_type": "code",
"execution_count": 32,
"id": "5758202b",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "32"
"outputs": [],
"source": [
"layer.initialize()"
"cell_type": "markdown",
"id": "f740f932",
"metadata": {},
"source": [
"Then we do a forward pass with random data: we create a random input `x` of shape $(3,4)$ and feed it into the layer to compute the output."
"cell_type": "code",
"execution_count": 34,
"id": "69055a09",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "34"
"outputs": [],
"source": [
"x = nd.random.uniform(-1,1,(3,4))\n",
"layer(x)"
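To see what this forward pass computes, here is a NumPy sketch (not MXNet's actual implementation) of a `Dense(2)` layer applied to a $(3,4)$ input. It assumes Gluon's conventions of a weight matrix of shape `(units, in_units)`, weights drawn uniformly from $[-0.07, 0.07]$ and a zero bias; the seed and variable names are illustrative only.

```python
import numpy as np

# NumPy sketch (not MXNet code) of what a Dense(2) layer computes on a (3, 4) batch.
rng = np.random.default_rng(0)

batch_size, in_units, units = 3, 4, 2
x = rng.uniform(-1, 1, size=(batch_size, in_units))

# Assumed Gluon layout: weight has shape (units, in_units).
# Assumed defaults: uniform weights in [-0.07, 0.07], zero bias.
weight = rng.uniform(-0.07, 0.07, size=(units, in_units))
bias = np.zeros(units)

# Fully connected transform y = x W^T + b with no activation ("linear").
y = x @ weight.T + bias
print(y.shape)  # (3, 2)
```

Each of the 2 output units is a weighted sum of the 4 input features, which is why the batch dimension of 3 is kept and only the feature dimension changes.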
"cell_type": "markdown",
"id": "116e31b6",
"metadata": {},
"source": [
"As can be seen, the layer's 2 output units turned our $(3,4)$ input into a $(3,2)$ output. Note that we didn't specify the input size of `layer` beforehand (we could have, with the argument `in_units=4`); the system infers it automatically the first time we feed in data, and only then creates and initializes the weights. So we can access the weight after the first forward pass:"
"cell_type": "code",
"execution_count": 35,
"id": "99204996",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "35"
"outputs": [],
"source": [
"layer.weight.data()"
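The deferred initialization described above can be mimicked in a few lines. The `LazyDense` class below is a hypothetical toy written with NumPy, not Gluon's code: it leaves the weight unset until the first call, reads the input size from the data, and only then creates a `(units, in_units)` weight.

```python
import numpy as np


class LazyDense:
    """Hypothetical toy layer illustrating deferred shape inference.

    Not Gluon's implementation; it only mimics creating the weight
    the first time data is seen.
    """

    def __init__(self, units, in_units=0, scale=0.07, seed=0):
        self.units = units
        self.in_units = in_units  # 0 means "unknown, infer on first call"
        self.scale = scale
        self.rng = np.random.default_rng(seed)
        self.weight = None
        self.bias = None
        if in_units:
            self._create_params()

    def _create_params(self):
        shape = (self.units, self.in_units)
        self.weight = self.rng.uniform(-self.scale, self.scale, size=shape)
        self.bias = np.zeros(self.units)

    def __call__(self, x):
        if self.weight is None:
            # First forward pass: infer the input size from the data.
            self.in_units = x.shape[1]
            self._create_params()
        return x @ self.weight.T + self.bias


layer = LazyDense(2)
print(layer.weight)        # None: the input size is not known yet
out = layer(np.ones((3, 4)))
print(layer.weight.shape)  # (2, 4)
print(out.shape)           # (3, 2)
```

In the sketch, passing `in_units` skips the inference step, which is the role the `in_units=4` argument plays for `nn.Dense`.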
"cell_type": "markdown",
"id": "4223d6ee",
"metadata": {},
"source": [
"## Chain layers into a neural network\n",
"Let's first consider the simple case in which a neural network is a chain of layers. During the forward pass, we run the layers sequentially, one by one. The following code implements a famous network called LeNet through `nn.Sequential`."
"cell_type": "code",
"execution_count": null,
"id": "cd261b4d",
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential()\n",
"# Add a sequence of layers.\n",
"net.add(# Similar to Dense, it is not necessary to specify the input channels\n",
" # by the argument `in_channels`, which will be automatically inferred\n",
" # in the first forward pass. Also, we apply a relu activation on the\n",
" # output. In addition, we can use a tuple to specify a non-square\n",
" # kernel size, such as `kernel_size=(2,4)`\n",
" nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n",
" # One can also use a tuple to specify non-symmetric pool and stride sizes\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n",
" nn.MaxPool2D(pool_size=2, strides=2),\n",
" # The dense layer will automatically reshape the 4-D output of the last\n",
" # max pooling layer into the 2-D shape: (x.shape[0], x.size/x.shape[0])\n",
" nn.Dense(120, activation=\"relu\"),\n",
" nn.Dense(84, activation=\"relu\"),\n",
" nn.Dense(10))\n",
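The comment about the dense layer reshaping its input can be checked by hand. Assuming Gluon's defaults for this network (stride 1 and no padding for `Conv2D`, rounding down for `MaxPool2D`) and the $28 \times 28$ single-channel input used in this section, each spatial side shrinks as follows. `conv_out` is a hypothetical helper, and NumPy stands in for the reshape the first `Dense` layer performs.

```python
import numpy as np


def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one axis of a convolution or pooling window.
    return (size + 2 * padding - kernel) // stride + 1


side = 28
side = conv_out(side, 5)      # Conv2D(channels=6, kernel_size=5): 24
side = conv_out(side, 2, 2)   # MaxPool2D(pool_size=2, strides=2): 12
side = conv_out(side, 3)      # Conv2D(channels=16, kernel_size=3): 10
side = conv_out(side, 2, 2)   # MaxPool2D(pool_size=2, strides=2): 5

# 4-D output of the last pooling layer for a batch of 4 images.
pooled = np.zeros((4, 16, side, side))
# Flatten to (x.shape[0], x.size / x.shape[0]) before Dense(120).
flat = pooled.reshape(pooled.shape[0], -1)
print(side, flat.shape)  # 5 (4, 400)
```

So under these assumptions `Dense(120)` sees 400 input features per image, a number it infers on the first forward pass.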
"cell_type": "markdown",
"id": "3b57b1bb",
"metadata": {},
"source": [
"The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following code shows how to initialize the weights and run the forward pass."
"cell_type": "code",
"execution_count": null,
"id": "fa8c45c1",
"metadata": {},
"outputs": [],
"source": [
"net.initialize()\n",
"# Input shape is (batch_size, color_channels, height, width)\n",
"x = nd.random.uniform(shape=(4,1,28,28))\n",
"y = net(x)\n",
"y.shape"
"cell_type": "markdown",
"id": "d2db1658",
"metadata": {},
"source": [
"We can use `[]` to index a particular layer. For example, the following\n",
"accesses the 1st layer's weight and 6th layer's bias."
"cell_type": "code",
"execution_count": null,
"id": "0c4f2799",
"metadata": {},
"outputs": [],
"source": [
"(net[0].weight.data().shape, net[5].bias.data().shape)"
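The shapes of `net[0]`'s weight and `net[5]`'s bias follow from each layer's parameter layout. The pure-Python sketch below is a hypothetical helper, not part of Gluon; it assumes `Conv2D` weights are laid out as `(channels, in_channels, kernel_h, kernel_w)`, `Dense` weights as `(units, in_units)`, one bias entry per output channel or unit, and the same default stride and padding as above.

```python
def lenet_param_shapes(in_channels=1, size=28):
    """Return (weight_shape, bias_shape) per layer, or None for pooling layers."""
    spec = [("conv", 6, 5), ("pool", 2), ("conv", 16, 3), ("pool", 2),
            ("dense", 120), ("dense", 84), ("dense", 10)]
    shapes = []
    channels, features = in_channels, None
    for kind, *args in spec:
        if kind == "conv":
            out_channels, k = args
            # Assumed Conv2D weight layout: (channels, in_channels, kh, kw)
            shapes.append(((out_channels, channels, k, k), (out_channels,)))
            channels, size = out_channels, size - k + 1  # stride 1, no padding
        elif kind == "pool":
            (p,) = args
            shapes.append(None)  # pooling has no parameters
            size = (size - p) // p + 1
        else:
            (units,) = args
            if features is None:
                features = channels * size * size  # flattened 4-D input
            # Assumed Dense weight layout: (units, in_units)
            shapes.append(((units, features), (units,)))
            features = units
    return shapes


shapes = lenet_param_shapes()
print(shapes[0][0], shapes[5][1])  # (6, 1, 5, 5) (84,)
```

Pooling layers appear in `net` (as indices 1 and 3) even though they hold no parameters, which is why the sixth layer, `net[5]`, is `Dense(84)`.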
"cell_type": "markdown",
"id": "b742bd85",
"metadata": {},
"source": [
"## Create a neural network flexibly\n",
"In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers.\n",
"Now let's introduce another way to construct a network with a flexible forward function.\n",
"To do it, we create a subclass of `nn.Block` and implement two methods:\n",
"- `__init__` creates the layers\n",
"- `forward` defines the forward function."
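To see how these two methods fit together, here is a stripped-down imitation of the pattern in plain NumPy. `Block`, `Affine`, `Sequential` and `ReluThenSum` are hypothetical classes, not Gluon's: calling a block runs its `forward`, `Sequential` supplies a fixed `forward` that runs its children in order, and a subclass may put arbitrary Python in its own `forward`.

```python
import numpy as np


class Block:
    """Hypothetical minimal stand-in for nn.Block: calling it runs forward()."""

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError


class Affine(Block):
    # Toy layer with fixed parameters so the result is easy to check by hand.
    def __init__(self, weight, bias):
        self.weight = np.asarray(weight, dtype=float)
        self.bias = np.asarray(bias, dtype=float)

    def forward(self, x):
        return x @ self.weight.T + self.bias


class Sequential(Block):
    # forward() is fixed: run the children in the order they were added.
    def __init__(self):
        self.children = []

    def add(self, *blocks):
        self.children.extend(blocks)

    def forward(self, x):
        for block in self.children:
            x = block(x)
        return x


class ReluThenSum(Block):
    # __init__ creates the layers; forward mixes them with plain NumPy calls.
    def __init__(self):
        self.body = Sequential()
        self.body.add(Affine([[1.0, -1.0]], [0.0]), Affine([[2.0]], [1.0]))

    def forward(self, x):
        y = np.maximum(self.body(x), 0)  # elementwise ReLU
        return y.sum()


net = ReluThenSum()
print(net(np.array([[3.0, 1.0], [0.0, 2.0]])))  # 5.0
```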
"cell_type": "code",
"execution_count": 6,
"id": "e9b85246",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "6"
"outputs": [],
"source": [
"class MixMLP(nn.Block):\n",
"    def __init__(self, **kwargs):\n",
"        # Run `nn.Block`'s init method\n",
"        super(MixMLP, self).__init__(**kwargs)\n",
"        self.blk = nn.Sequential()\n",
"        self.blk.add(nn.Dense(3, activation='relu'),\n",
"                     nn.Dense(4, activation='relu'))\n",
"        self.dense = nn.Dense(5)\n",
"\n",
"    def forward(self, x):\n",
"        y = nd.relu(self.blk(x))\n",
"        print(y)\n",
"        return self.dense(y)\n",
"\n",
"net = MixMLP()\n",
"net"
"cell_type": "markdown",
"id": "55424520",
"metadata": {},
"source": [
"In the sequential chaining approach, we can only add instances whose base class is `nn.Block` and run them in order in the forward pass. In this example, we used `print` to inspect the intermediate result and `nd.relu` to apply a ReLU activation, neither of which is a layer. This approach therefore provides a more flexible way to define the forward function.\n",
"The usage of `net` is the same as before."
"cell_type": "code",
"execution_count": null,
"id": "81472305",
"metadata": {},
"outputs": [],
"source": [
"net.initialize()\n",
"x = nd.random.uniform(shape=(2,2))\n",
"net(x)"
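The shapes flowing through `MixMLP` can be traced with a NumPy sketch. It is not Gluon code: the helper `dense_params` is hypothetical, and it assumes the `(units, in_units)` weight layout, uniform $[-0.07, 0.07]$ weights and zero biases. With a $(2,2)$ input, the intermediate result printed by `forward` has shape $(2,4)$, the output has shape $(2,5)$, and the weight of `net.blk[1]` has shape $(4,3)$.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_params(units, in_units, scale=0.07):
    # Assumed layout (units, in_units); uniform weights, zero bias.
    return rng.uniform(-scale, scale, size=(units, in_units)), np.zeros(units)


def relu(z):
    return np.maximum(z, 0)


x = rng.uniform(size=(2, 2))
w1, b1 = dense_params(3, 2)  # Dense(3, activation='relu'), in_units inferred as 2
w2, b2 = dense_params(4, 3)  # Dense(4, activation='relu'), like net.blk[1]
w3, b3 = dense_params(5, 4)  # Dense(5), no activation

h = relu(relu(x @ w1.T + b1) @ w2.T + b2)  # self.blk(x)
y = relu(h)                                # the extra nd.relu in forward()
out = y @ w3.T + b3                        # self.dense(y)
print(y.shape, out.shape, w2.shape)        # (2, 4) (2, 5) (4, 3)
```

Because the second layer of `blk` already applies ReLU, the extra `nd.relu` in `forward` leaves the values unchanged (`y` equals `h` in the sketch).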
"cell_type": "markdown",
"id": "cea065b0",
"metadata": {},
"source": [
"Finally, let's access a particular layer's weight:"
"cell_type": "code",
"execution_count": 8,
"id": "6bb337c9",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "8"
"outputs": [],
"source": [
"net.blk[1].weight.data()"
"metadata": {
"language_info": {
"name": "python"
"nbformat": 4,
"nbformat_minor": 5