{"nbformat": 4, "cells": [{"source": "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n<!--- or more contributor license agreements. See the NOTICE file -->\n<!--- distributed with this work for additional information -->\n<!--- regarding copyright ownership. The ASF licenses this file -->\n<!--- to you under the Apache License, Version 2.0 (the -->\n<!--- \"License\"); you may not use this file except in compliance -->\n<!--- with the License. You may obtain a copy of the License at -->\n\n<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n\n<!--- Unless required by applicable law or agreed to in writing, -->\n<!--- software distributed under the License is distributed on an -->\n<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n<!--- KIND, either express or implied. See the License for the -->\n<!--- specific language governing permissions and limitations -->\n<!--- under the License. -->\n\n\n# Fine-tune with Pretrained Models\n\nMany of the exciting deep learning algorithms for computer vision require\nmassive datasets for training. The most popular benchmark dataset,\n[ImageNet](http://www.image-net.org/), for example, contains one million images\nfrom one thousand categories. But for any practical problem, we typically have\naccess to comparatively small datasets. In these cases, if we were to train a\nneural network's weights from scratch, starting from random initialized\nparameters, we would overfit the training set badly.\n\nOne approach to get around this problem is to first pretrain a deep net on a\nlarge-scale dataset, like ImageNet. Then, given a new dataset, we can start\nwith these pretrained weights when training on our new task. This process is\ncommonly called _fine-tuning_. There are a number of variations of fine-tuning.\nSometimes, the initial neural network is used only as a _feature extractor_.\nThat means that we freeze every layer prior to the output layer and simply learn\na new output layer. In [another document](https://github.com/dmlc/mxnet-notebooks/blob/master/python/how_to/predict.ipynb), we explained how to\ndo this kind of feature extraction. Another approach is to update all of\nthe network's weights for the new task, and that's the approach we demonstrate in\nthis document.\n\nTo fine-tune a network, we must first replace the last fully-connected layer\nwith a new one that outputs the desired number of classes. We initialize its\nweights randomly. Then we continue training as normal. Sometimes it's common to\nuse a smaller learning rate based on the intuition that we may already be close\nto a good result.\n\nIn this demonstration, we'll fine-tune a model pretrained on ImageNet to the\nsmaller caltech-256 dataset. Following this example, you can fine-tune to other\ndatasets, even for strikingly different applications such as face\nidentification.\n\nWe will show that, even with simple hyper-parameters setting, we can match and\neven outperform state-of-the-art results on caltech-256.\n\n```eval_rst\n.. list-table::\n :header-rows: 1\n\n * - Network \n - Accuracy \n * - Resnet-50 \n - 77.4% \n * - Resnet-152 \n - 86.4% \n```\n\n## Prepare data\n\nWe follow the standard protocol to sample 60 images from each class as the\ntraining set, and the rest for the validation set. We resize images into 256x256\nsize and pack them into the rec file. 
In this demonstration, we'll fine-tune a model pretrained on ImageNet to the
smaller caltech-256 dataset. Following this example, you can fine-tune on other
datasets, even for strikingly different applications such as face
identification.

We will show that, even with a simple hyper-parameter setting, we can match and
even outperform state-of-the-art results on caltech-256.

```eval_rst
.. list-table::
   :header-rows: 1

   * - Network
     - Accuracy
   * - Resnet-50
     - 77.4%
   * - Resnet-152
     - 86.4%
```

## Prepare data

We follow the standard protocol of sampling 60 images from each class as the
training set, with the rest as the validation set. We resize the images to
256x256 and pack them into RecordIO (`.rec`) files. The scripts to prepare the
data are as follows.

> To run the following bash script on Windows, please install [Cygwin](https://cygwin.com/install.html).

```sh
wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar
tar -xf 256_ObjectCategories.tar

mkdir -p caltech_256_train_60
for i in 256_ObjectCategories/*; do
    c=`basename $i`
    mkdir -p caltech_256_train_60/$c
    for j in `ls $i/*.jpg | shuf | head -n 60`; do
        mv $j caltech_256_train_60/$c/
    done
done

python ~/mxnet/tools/im2rec.py --list --recursive caltech-256-60-train caltech_256_train_60/
python ~/mxnet/tools/im2rec.py --list --recursive caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-train caltech_256_train_60/
```

The following code downloads the pregenerated rec files. It may take a few minutes.

```python
import os, sys

if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urlretrieve(url, filename)

download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
```

Next, we define the function which returns the data iterators:

```python
import mxnet as mx

def get_iterators(batch_size, data_shape=(3, 224, 224)):
    train = mx.io.ImageRecordIter(
        path_imgrec = './caltech-256-60-train.rec',
        data_name   = 'data',
        label_name  = 'softmax_label',
        batch_size  = batch_size,
        data_shape  = data_shape,
        shuffle     = True,
        rand_crop   = True,
        rand_mirror = True)
    val = mx.io.ImageRecordIter(
        path_imgrec = './caltech-256-60-val.rec',
        data_name   = 'data',
        label_name  = 'softmax_label',
        batch_size  = batch_size,
        data_shape  = data_shape,
        rand_crop   = False,
        rand_mirror = False)
    return (train, val)
```
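As a quick sanity check (an addition to the tutorial, not part of the original),
we can pull a single batch from the training iterator and confirm that the
shapes match what the network expects:

```python
# Inspect one batch to verify the iterator configuration (hypothetical check).
train, val = get_iterators(batch_size=16)
batch = train.next()              # a DataBatch holding lists of NDArrays
print(batch.data[0].shape)        # expected: (16, 3, 224, 224)
print(batch.label[0].shape)       # expected: (16,)
train.reset()                     # rewind so training starts from the first batch
```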
We then download a pretrained 50-layer ResNet model and load it into memory. Note
that if `load_checkpoint` reports an error, we can remove the downloaded files
and try `get_model` again.

```python
def get_model(prefix, epoch):
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))

get_model('http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
```

## Train

We first define a function that replaces the last fully-connected layer of a given network.

```python
def get_fine_tune_model(symbol, arg_params, num_classes, layer_name='flatten0'):
    """
    symbol: the pretrained network symbol
    arg_params: the argument parameters of the pretrained model
    num_classes: the number of classes for the fine-tune dataset
    layer_name: the name of the layer before the last fully-connected layer
    """
    all_layers = symbol.get_internals()
    net = all_layers[layer_name+'_output']
    net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
    net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
    # drop the pretrained output layer's weights; 'fc1' will be re-initialized
    new_args = {k: arg_params[k] for k in arg_params if 'fc1' not in k}
    return (net, new_args)
```

Now we create a module. Note that we pass the existing parameters from the loaded
model via the `arg_params` argument. The parameters of the last fully-connected
layer will be randomly initialized by the `initializer`.

```python
import logging
head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)

def fit(symbol, arg_params, aux_params, train, val, batch_size, num_gpus):
    devs = [mx.gpu(i) for i in range(num_gpus)]
    mod = mx.mod.Module(symbol=symbol, context=devs)
    mod.fit(train, val,
        num_epoch=8,
        arg_params=arg_params,
        aux_params=aux_params,
        allow_missing=True,
        batch_end_callback = mx.callback.Speedometer(batch_size, 10),
        kvstore='device',
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.01},
        initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2),
        eval_metric='acc')
    metric = mx.metric.Accuracy()
    # score() returns a list of (name, value) pairs; return the accuracy value
    return mod.score(val, metric)[0][1]
```

Then we can start training. We use an AWS EC2 g2.8xlarge instance, which has 8 GPUs.

```python
num_classes = 256
batch_per_gpu = 16
num_gpus = 8

(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)

batch_size = batch_per_gpu * num_gpus
(train, val) = get_iterators(batch_size)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.77, "Low validation accuracy."
```

You will see that, after only 8 epochs, we can get 78% validation accuracy. This
matches the state-of-the-art results of training on caltech-256 alone,
e.g. [VGG](http://www.robots.ox.ac.uk/~vgg/research/deep_eval/).
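The `fit()` function above returns only the validation score, so the trained
module itself is discarded. If you want to keep the fine-tuned weights, one
option (a sketch, not part of the original tutorial; the `resnet-50-caltech`
prefix is an arbitrary name) is to add a checkpoint callback inside `fit()`:

```python
# Inside fit(), pass an extra callback to mod.fit(...):
#     epoch_end_callback=mx.callback.do_checkpoint('resnet-50-caltech'),
# This writes 'resnet-50-caltech-symbol.json' once, plus a
# 'resnet-50-caltech-%04d.params' file after every epoch.

# The fine-tuned model can then be reloaded exactly like the pretrained one:
sym_ft, arg_ft, aux_ft = mx.model.load_checkpoint('resnet-50-caltech', 8)
```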
Next, we try another pretrained model. This one was trained on the complete
ImageNet dataset, which is 10x larger than the ImageNet 1K-class version, and
it uses a 3x deeper ResNet architecture.

```python
get_model('http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.86, "Low validation accuracy."
```

As you can see, even after a single epoch, this model reaches 83% validation
accuracy. After 8 epochs, the validation accuracy increases to 86.4%.

<!-- INSERT SOURCE DOWNLOAD BUTTONS -->
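Finally, to actually use a fine-tuned model for prediction, you can bind a
module for inference and run a forward pass. The sketch below is an addition to
this tutorial; it assumes you saved a checkpoint under the hypothetical prefix
`resnet-152-caltech` (e.g. via the checkpoint callback shown earlier):

```python
import numpy as np

# Reload the (hypothetical) fine-tuned checkpoint and run inference.
sym_ft, arg_ft, aux_ft = mx.model.load_checkpoint('resnet-152-caltech', 8)
mod = mx.mod.Module(symbol=sym_ft, context=mx.gpu(0))
mod.bind(data_shapes=val.provide_data, label_shapes=val.provide_label,
         for_training=False)
mod.set_params(arg_ft, aux_ft)

val.reset()
batch = val.next()
mod.forward(batch, is_train=False)
probs = mod.get_outputs()[0].asnumpy()   # shape: (batch_size, 256)
print(np.argmax(probs, axis=1)[:10])     # predicted class ids for 10 images
```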