| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "cbf11d31", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Create a neural network\n", |
| "\n", |
| "Now let's look how to create neural networks in Gluon. In addition the NDArray package (`nd`) that we just covered, we now will also import the neural network `nn` package from `gluon`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "id": "9b9c386e", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "2" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "from mxnet import nd\n", |
| "from mxnet.gluon import nn" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "a1e7d529", |
| "metadata": {}, |
| "source": [ |
| "## Create your neural network's first layer\n", |
| "\n", |
| "Let's start with a dense layer with 2 output units.\n", |
| "<!-- mention what the none and the linear parts mean? -->" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 31, |
| "id": "4ce11cb4", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "31" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer = nn.Dense(2)\n", |
| "layer" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "ed76262a", |
| "metadata": {}, |
| "source": [ |
| "Then initialize its weights with the default initialization method, which draws random values uniformly from $[-0.7, 0.7]$." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 32, |
| "id": "6d215548", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "32" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer.initialize()" |
| ] |
| }, |
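| { |
| "cell_type": "markdown", |
| "id": "b3e1a9f2", |
| "metadata": {}, |
| "source": [ |
| "The default can also be overridden by passing an initializer explicitly. As a quick illustration (on a separate throwaway layer, so the weights of `layer` above are untouched), we can draw the weights from a normal distribution instead:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c7d2e8a1", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from mxnet import init\n", |
| "\n", |
| "# A separate layer initialized from a normal distribution instead of the default uniform\n", |
| "layer_normal = nn.Dense(2)\n", |
| "layer_normal.initialize(init=init.Normal(sigma=0.1))" |
| ] |
| }, |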
| { |
| "cell_type": "markdown", |
| "id": "f0892060", |
| "metadata": {}, |
| "source": [ |
| "Then we do a forward pass with random data. We create a $(3,4)$ shape random input `x` and feed into the layer to compute the output." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 34, |
| "id": "7f9f3d5b", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "34" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "x = nd.random.uniform(-1,1,(3,4))\n", |
| "layer(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "70c64d26", |
| "metadata": {}, |
| "source": [ |
| "As can be seen, the layer's input limit of 2 produced a $(3,2)$ shape output from our $(3,4)$ input. Note that we didn't specify the input size of `layer` before (though we can specify it with the argument `in_units=4` here), the system will automatically infer it during the first time we feed in data, create and initialize the weights. So we can access the weight after the first forward pass:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 35, |
| "id": "c2b6a50d", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "35" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "layer.weight.data()" |
| ] |
| }, |
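| { |
| "cell_type": "markdown", |
| "id": "d4f8b2c6", |
| "metadata": {}, |
| "source": [ |
| "As a small aside, if we do specify `in_units` up front, the weight shape is known immediately, before any data is fed in:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "e9a3c5d7", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# With `in_units` given, the (2, 4) weight shape is fixed at construction time\n", |
| "layer2 = nn.Dense(2, in_units=4)\n", |
| "layer2.weight" |
| ] |
| }, |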
| { |
| "cell_type": "markdown", |
| "id": "340bf945", |
| "metadata": {}, |
| "source": [ |
| "## Chain layers into a neural network\n", |
| "\n", |
| "Let's first consider a simple case that a neural network is a chain of layers. During the forward pass, we run layers sequentially one-by-one. The following code implements a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) through `nn.Sequential`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "88aceeff", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net = nn.Sequential()\n", |
| "# Add a sequence of layers.\n", |
| "net.add(# Similar to Dense, it is not necessary to specify the input channels\n", |
| " # by the argument `in_channels`, which will be automatically inferred\n", |
| " # in the first forward pass. Also, we apply a relu activation on the\n", |
| " # output. In addition, we can use a tuple to specify a non-square\n", |
| " # kernel size, such as `kernel_size=(2,4)`\n", |
| " nn.Conv2D(channels=6, kernel_size=5, activation='relu'),\n", |
| " # One can also use a tuple to specify non-symmetric pool and stride sizes\n", |
| " nn.MaxPool2D(pool_size=2, strides=2),\n", |
| " nn.Conv2D(channels=16, kernel_size=3, activation='relu'),\n", |
| " nn.MaxPool2D(pool_size=2, strides=2),\n", |
| " # The dense layer will automatically reshape the 4-D output of last\n", |
| " # max pooling layer into the 2-D shape: (x.shape[0], x.size/x.shape[0])\n", |
| " nn.Dense(120, activation=\"relu\"),\n", |
| " nn.Dense(84, activation=\"relu\"),\n", |
| " nn.Dense(10))\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "65f3354f", |
| "metadata": {}, |
| "source": [ |
| "<!--Mention the tuple option for kernel and stride as an exercise for the reader? Or leave it out as too much info for now?-->\n", |
| "\n", |
| "The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following codes show how to initialize the weights and run the forward pass." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "f4b217de", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.initialize()\n", |
| "# Input shape is (batch_size, color_channels, height, width)\n", |
| "x = nd.random.uniform(shape=(4,1,28,28))\n", |
| "y = net(x)\n", |
| "y.shape" |
| ] |
| }, |
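| { |
| "cell_type": "markdown", |
| "id": "f1b6d8e3", |
| "metadata": {}, |
| "source": [ |
| "To see what each layer does to the data, we can run the input through the network one layer at a time and print the intermediate shapes (a small sketch; `nn.Sequential` supports indexing and iteration over its layers):" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "a8c4e2f9", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Trace the shape transformation layer by layer\n", |
| "h = x\n", |
| "for blk in net:\n", |
| "    h = blk(h)\n", |
| "    print(blk.name, 'output shape:', h.shape)" |
| ] |
| }, |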
| { |
| "cell_type": "markdown", |
| "id": "11ef05f2", |
| "metadata": {}, |
| "source": [ |
| "We can use `[]` to index a particular layer. For example, the following\n", |
| "accesses the 1st layer's weight and 6th layer's bias." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "bd37889a", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "(net[0].weight.data().shape, net[5].bias.data().shape)" |
| ] |
| }, |
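| { |
| "cell_type": "markdown", |
| "id": "b5d7f3a8", |
| "metadata": {}, |
| "source": [ |
| "To inspect all parameters at once rather than layer by layer, Gluon also provides `collect_params`, which returns every parameter of the network keyed by name:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c9e1a4b6", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.collect_params()" |
| ] |
| }, |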
| { |
| "cell_type": "markdown", |
| "id": "dd18f332", |
| "metadata": {}, |
| "source": [ |
| "## Create a neural network flexibly\n", |
| "\n", |
| "In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers.\n", |
| "Now let's introduce another way to construct a network with a flexible forward function.\n", |
| "\n", |
| "To do it, we create a subclass of `nn.Block` and implement two methods:\n", |
| "\n", |
| "- `__init__` create the layers\n", |
| "- `forward` define the forward function." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "id": "31c26b8d", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "6" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "class MixMLP(nn.Block):\n", |
| " def __init__(self, **kwargs):\n", |
| " # Run `nn.Block`'s init method\n", |
| " super(MixMLP, self).__init__(**kwargs)\n", |
| " self.blk = nn.Sequential()\n", |
| " self.blk.add(nn.Dense(3, activation='relu'),\n", |
| " nn.Dense(4, activation='relu'))\n", |
| " self.dense = nn.Dense(5)\n", |
| " def forward(self, x):\n", |
| " y = nd.relu(self.blk(x))\n", |
| " print(y)\n", |
| " return self.dense(y)\n", |
| "\n", |
| "net = MixMLP()\n", |
| "net" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f4d17c1d", |
| "metadata": {}, |
| "source": [ |
| "In the sequential chaining approach, we can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, we used `print` to get the intermediate results and `nd.relu` to apply relu activation. So this approach provides a more flexible way to define the forward function.\n", |
| "\n", |
| "The usage of `net` is similar as before." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "7e11abda", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "net.initialize()\n", |
| "x = nd.random.uniform(shape=(2,2))\n", |
| "net(x)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f44bb603", |
| "metadata": {}, |
| "source": [ |
| "Finally, let's access a particular layer's weight" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "id": "8f4ec9c2", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "8" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "net.blk[1].weight.data()" |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |