{"nbformat": 4, "cells": [{"source": "# Gluon - Neural network building blocks\n\nGluon package is a high-level interface for MXNet designed to be easy to use while\nkeeping most of the flexibility of low level API. Gluon supports both imperative\nand symbolic programming, making it easy to train complex models imperatively\nin Python and then deploy with symbolic graph in C++ and Scala.", "cell_type": "markdown", "metadata": {}}, {"source": "# import dependencies\nfrom __future__ import print_function\nimport numpy as np\nimport mxnet as mx\nimport mxnet.ndarray as F\nimport mxnet.gluon as gluon\nfrom mxnet.gluon import nn\nfrom mxnet import autograd", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "Neural networks (and other machine learning models) can be defined and trained\nwith `gluon.nn` and `gluon.rnn` package. A typical training script has the following\nsteps:\n\n- Define network\n- Initialize parameters\n- Loop over inputs\n- Forward input through network to get output\n- Compute loss with output and label\n- Backprop gradient\n- Update parameters with gradient descent.\n\n\n## Define Network\n\n`gluon.Block` is the basic building block of models. You can define networks by\ncomposing and inheriting `Block`:", "cell_type": "markdown", "metadata": {}}, {"source": "class Net(gluon.Block):\n def __init__(self, **kwargs):\n super(Net, self).__init__(**kwargs)\n with self.name_scope():\n # layers created in name_scope will inherit name space\n # from parent layer.\n self.conv1 = nn.Conv2D(6, kernel_size=5)\n self.pool1 = nn.MaxPool2D(pool_size=(2,2))\n self.conv2 = nn.Conv2D(16, kernel_size=5)\n self.pool2 = nn.MaxPool2D(pool_size=(2,2))\n self.fc1 = nn.Dense(120)\n self.fc2 = nn.Dense(84)\n self.fc3 = nn.Dense(10)\n\n def forward(self, x):\n x = self.pool1(F.relu(self.conv1(x)))\n x = self.pool2(F.relu(self.conv2(x)))\n # 0 means copy over size from corresponding dimension.\n # -1 means infer size from the rest of dimensions.\n x = x.reshape((0, -1))\n x = F.relu(self.fc1(x))\n x = F.relu(self.fc2(x))\n x = self.fc3(x)\n return x", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "## Initialize Parameters\n\nA network must be created and initialized before it can be used:", "cell_type": "markdown", "metadata": {}}, {"source": "net = Net()\n# Initialize on CPU. Replace with `mx.gpu(0)`, or `[mx.gpu(0), mx.gpu(1)]`,\n# etc to use one or more GPUs.\nnet.initialize(mx.init.Xavier(), ctx=mx.cpu())", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "Note that because we didn't specify input size to layers in Net's constructor,\nthe shape of parameters cannot be determined at this point. Actual initialization\nis deferred to the first forward pass, i.e. if you access `net.fc1.weight.data()`\nnow an exception will be raised.\n\nYou can actually initialize the weights by running a forward pass:", "cell_type": "markdown", "metadata": {}}, {"source": "data = mx.nd.random_normal(shape=(10, 1, 32, 32)) # dummy data\noutput = net(data)", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "Or you can specify input size when creating layers, i.e. `nn.Dense(84, in_units=120)`\ninstead of `nn.Dense(84)`.\n\n## Loss Functions\n\nLoss functions take (output, label) pairs and compute a scalar loss for each sample\nin the mini-batch. 
## Loss Functions

Loss functions take (output, label) pairs and compute a scalar loss for each sample in the mini-batch. The scalar measures how far each output is from the label.

There are many predefined loss functions in `gluon.loss`. Here we use `SoftmaxCrossEntropyLoss` for digit classification.

To compute the loss and backpropagate for one iteration, we run:

```python
label = mx.nd.arange(10)  # dummy label
with autograd.record():
    output = net(data)
    L = gluon.loss.SoftmaxCrossEntropyLoss()
    loss = L(output, label)
    loss.backward()
print('loss:', loss)
print('grad:', net.fc1.weight.grad())
```

## Updating the Weights

Now that the gradients are computed, we just need to update the weights. This is usually done with an update rule like `weight = weight - learning_rate * grad / batch_size`. Note that we divide the gradient by `batch_size` because the gradient is aggregated over the entire batch. For example, a plain SGD update can be written as:

```python
lr = 0.01
for p in net.collect_params().values():
    p.data()[:] -= lr / data.shape[0] * p.grad()
```

But sometimes you want fancier update rules, like momentum or Adam, and since this is a commonly used functionality, Gluon provides a `Trainer` class for it:

```python
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

with autograd.record():
    output = net(data)
    L = gluon.loss.SoftmaxCrossEntropyLoss()
    loss = L(output, label)
    loss.backward()

# Do the update. Trainer needs to know the batch size of the data to normalize
# the gradient by 1/batch_size.
trainer.step(data.shape[0])
```
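Putting all of the steps together, a complete training loop might look like the following minimal sketch. The randomly generated dataset, the batch size, and the number of epochs are placeholders chosen for illustration; in practice you would plug in a real data pipeline such as `gluon.data.vision.MNIST`:

```python
# A minimal end-to-end training loop (sketch only). The random dataset,
# batch size and epoch count below stand in for a real data pipeline.
batch_size = 10
num_epochs = 2

# 100 random 1x32x32 "images" with random integer labels in [0, 10).
X = mx.nd.random_normal(shape=(100, 1, 32, 32))
y = mx.nd.floor(mx.nd.random_uniform(low=0, high=10, shape=(100,)))
dataset = gluon.data.ArrayDataset(X, y)
dataloader = gluon.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Start from a freshly initialized network so the sketch is self-contained.
net = Net()
net.initialize(mx.init.Xavier(), ctx=mx.cpu())
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

for epoch in range(num_epochs):
    cumulative_loss = 0.0
    for data, label in dataloader:
        with autograd.record():
            output = net(data)
            loss = loss_fn(output, label)
        loss.backward()
        trainer.step(batch_size)  # normalizes the gradient by 1/batch_size
        cumulative_loss += loss.mean().asscalar()
    print('epoch %d, average loss %.4f' % (epoch, cumulative_loss / len(dataloader)))
```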