| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "cbf11d31", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Create a neural network\n", |
| "\n", |
| "Now let's look how to create neural networks in Gluon. In addition the NDArray package (`nd`) that we just covered, we now will also import the neural network `nn` package from `gluon`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "id": "9b9c386e", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "2" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "from mxnet import nd\n", |
| "from mxnet.gluon import nn" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "a1e7d529", |
| "metadata": {}, |
| "source": [ |
| "## Create your neural network's first layer\n", |
| "\n", |
| "Let's start with a dense layer with 2 output units.\n", |
| "<!-- mention what the none and the linear parts mean? -->" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 31, |
| "id": "4ce11cb4", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "31" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer = nn.Dense(2)\n", |
| "layer" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "ed76262a", |
| "metadata": {}, |
| "source": [ |
| "Then initialize its weights with the default initialization method, which draws random values uniformly from $[-0.7, 0.7]$." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 32, |
| "id": "6d215548", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "32" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer.initialize()" |
| ] |
| }, |
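| { |
| "cell_type": "markdown", |
| "id": "b3e1a9f2", |
| "metadata": {}, |
| "source": [ |
| "The default can also be overridden by passing an initializer explicitly. As a quick illustration (on a separate throwaway layer, so the weights of `layer` above are untouched), we can draw the weights from a normal distribution instead:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c7d2e8a1", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet import init\n", |
| "\n", |
| "# A separate layer initialized from a normal distribution instead of the default uniform\n", |
| "layer_normal = nn.Dense(2)\n", |
| "layer_normal.initialize(init=init.Normal(sigma=0.1))" |
| ] |
| }, |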
| { |
| "cell_type": "markdown", |
| "id": "f0892060", |
| "metadata": {}, |
| "source": [ |
| "Then we do a forward pass with random data. We create a $(3,4)$ shape random input `x` and feed into the layer to compute the output." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 34, |
| "id": "7f9f3d5b", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "34" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "x = nd.random.uniform(-1,1,(3,4))\n", |
| "layer(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "70c64d26", |
| "metadata": {}, |
| "source": [ |
| "As can be seen, the layer's input limit of 2 produced a $(3,2)$ shape output from our $(3,4)$ input. Note that we didn't specify the input size of `layer` before (though we can specify it with the argument `in_units=4` here), the system will automatically infer it during the first time we feed in data, create and initialize the weights. So we can access the weight after the first forward pass:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 35, |
| "id": "c2b6a50d", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "35" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer.weight.data()" |
| ] |
| }, |
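| { |
| "cell_type": "markdown", |
| "id": "d4f8b2c6", |
| "metadata": {}, |
| "source": [ |
| "As a small aside, if we do specify `in_units` up front, the weight shape is known immediately, before any data is fed in:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "e9a3c5d7", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# With `in_units` given, the (2, 4) weight shape is fixed at construction time\n", |
| "layer2 = nn.Dense(2, in_units=4)\n", |
| "layer2.weight" |
| ] |
| }, |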
| { |
| "cell_type": "markdown", |
| "id": "340bf945", |
| "metadata": {}, |
| "source": [ |
| "## Chain layers into a neural network\n", |
| "\n", |
| "Let's first consider a simple case that a neural network is a chain of layers. During the forward pass, we run layers sequentially one-by-one. The following code implements a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Sequential`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "88aceeff", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net = nn.Sequential()\n", |
| "# Add a sequence of layers.\n", |
| "net.add(# Similar to Dense, it is not necessary to specify the input channels\n", |
| " # by the argument `in_channels`, which will be automatically inferred\n", |
| " # in the first forward pass. Also, we apply a relu activation on the\n", |
| " # output. In addition, we can use a tuple to specify a non-square\n", |
| " # kernel size, such as `kernel_size=(2,4)`\n", |
| " nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n", |
| " # One can also use a tuple to specify non-symmetric pool and stride sizes\n", |
| " nn.MaxPool2D(pool_size=2, strides=2),\n", |
| " nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n", |
| " nn.MaxPool2D(pool_size=2, strides=2),\n", |
| " # The dense layer will automatically reshape the 4-D output of last\n", |
| " # max pooling layer into the 2-D shape: (x.shape[0], x.size/x.shape[0])\n", |
| " nn.Dense(120, activation=\"relu\"),\n", |
| " nn.Dense(84, activation=\"relu\"),\n", |
| " nn.Dense(10))\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "65f3354f", |
| "metadata": {}, |
| "source": [ |
| "<!--Mention the tuple option for kernel and stride as an exercise for the reader? Or leave it out as too much info for now?-->\n", |
| "\n", |
| "The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following codes show how to initialize the weights and run the forward pass." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "f4b217de", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.initialize()\n", |
| "# Input shape is (batch_size, color_channels, height, width)\n", |
| "x = nd.random.uniform(shape=(4,1,28,28))\n", |
| "y = net(x)\n", |
| "y.shape" |
| ] |
| }, |
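| { |
| "cell_type": "markdown", |
| "id": "f1b6d8e3", |
| "metadata": {}, |
| "source": [ |
| "To see what each layer does to the data, we can run the input through the network one layer at a time and print the intermediate shapes (a small sketch; `nn.Sequential` supports indexing and iteration over its layers):" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a8c4e2f9", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Trace the shape transformation layer by layer\n", |
| "h = x\n", |
| "for blk in net:\n", |
| "    h = blk(h)\n", |
| "    print(blk.name, 'output shape:', h.shape)" |
| ] |
| }, |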
| { |
| "cell_type": "markdown", |
| "id": "11ef05f2", |
| "metadata": {}, |
| "source": [ |
| "We can use `[]` to index a particular layer. For example, the following\n", |
| "accesses the 1st layer's weight and 6th layer's bias." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "bd37889a", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "(net[0].weight.data().shape, net[5].bias.data().shape)" |
| ] |
| }, |
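| { |
| "cell_type": "markdown", |
| "id": "b5d7f3a8", |
| "metadata": {}, |
| "source": [ |
| "To inspect all parameters at once rather than layer by layer, Gluon also provides `collect_params`, which returns every parameter of the network keyed by name:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c9e1a4b6", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.collect_params()" |
| ] |
| }, |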
| { |
| "cell_type": "markdown", |
| "id": "dd18f332", |
| "metadata": {}, |
| "source": [ |
| "## Create a neural network flexibly\n", |
| "\n", |
| "In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers.\n", |
| "Now let's introduce another way to construct a network with a flexible forward function.\n", |
| "\n", |
| "To do it, we create a subclass of `nn.Block` and implement two methods:\n", |
| "\n", |
| "- `__init__` create the layers\n", |
| "- `forward` define the forward function." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "id": "31c26b8d", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "6" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "class MixMLP(nn.Block):\n", |
| " def __init__(self, **kwargs):\n", |
| " # Run `nn.Block`'s init method\n", |
| " super(MixMLP, self).__init__(**kwargs)\n", |
| " self.blk = nn.Sequential()\n", |
| " self.blk.add(nn.Dense(3, activation='relu'),\n", |
| " nn.Dense(4, activation='relu'))\n", |
| " self.dense = nn.Dense(5)\n", |
| " def forward(self, x):\n", |
| " y = nd.relu(self.blk(x))\n", |
| " print(y)\n", |
| " return self.dense(y)\n", |
| "\n", |
| "net = MixMLP()\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f4d17c1d", |
| "metadata": {}, |
| "source": [ |
| "In the sequential chaining approach, we can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, we used `print` to get the intermediate results and `nd.relu` to apply relu activation. So this approach provides a more flexible way to define the forward function.\n", |
| "\n", |
| "The usage of `net` is similar as before." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "7e11abda", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.initialize()\n", |
| "x = nd.random.uniform(shape=(2,2))\n", |
| "net(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f44bb603", |
| "metadata": {}, |
| "source": [ |
| "Finally, let's access a particular layer's weight" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "id": "8f4ec9c2", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "8" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "net.blk[1].weight.data()" |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |