| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "97626fb7", |
| "metadata": {}, |
| "source": [ |
| "<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n", |
| "<!--- or more contributor license agreements. See the NOTICE file -->\n", |
| "<!--- distributed with this work for additional information -->\n", |
| "<!--- regarding copyright ownership. The ASF licenses this file -->\n", |
| "<!--- to you under the Apache License, Version 2.0 (the -->\n", |
| "<!--- \"License\"); you may not use this file except in compliance -->\n", |
| "<!--- with the License. You may obtain a copy of the License at -->\n", |
| "\n", |
| "<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n", |
| "\n", |
| "<!--- Unless required by applicable law or agreed to in writing, -->\n", |
| "<!--- software distributed under the License is distributed on an -->\n", |
| "<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n", |
| "<!--- KIND, either express or implied. See the License for the -->\n", |
| "<!--- specific language governing permissions and limitations -->\n", |
| "<!--- under the License. -->\n", |
| "\n", |
| "# Image Augmentation\n", |
| "\n", |
| "Image augmentation expands the size of a\n", |
| "training data set by applying a series of random changes to the training images\n", |
| "to produce similar, but different, training examples. Given its popularity in\n", |
| "computer vision, the `mxnet.gluon.data.vision.transforms` module provides\n", |
| "multiple pre-defined image augmentation methods. In this section we will briefly\n", |
| "go through this module.\n", |
| "\n", |
| "First, import the modules required for this section." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "62f526a0", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "from matplotlib import pyplot as plt\n", |
| "from mxnet import image\n", |
| "from mxnet.gluon import data as gdata, utils\n", |
| "```\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "3a9d52a0", |
| "metadata": {}, |
| "source": [ |
| "Then read the sample $400\\times 500$ image." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "e0edaab6", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "utils.download('https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/doc/cat.jpg')\n", |
| "img = image.imread('cat.jpg')\n", |
| "plt.imshow(img.asnumpy())\n", |
| "plt.show()\n", |
| "```\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "be2c3331", |
| "metadata": {}, |
| "source": [ |
| "In addition, we define a function to draw a list of images." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "d8a47803", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "def show_images(imgs, num_rows, num_cols, scale=2):\n", |
| "    # Size the figure so that each grid cell is roughly scale x scale inches.\n", |
| "    figsize = (num_cols * scale, num_rows * scale)\n", |
| "    # squeeze=False keeps `axes` two-dimensional even for a single row or column.\n", |
| "    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize, squeeze=False)\n", |
| "    for i in range(num_rows):\n", |
| "        for j in range(num_cols):\n", |
| "            axes[i][j].imshow(imgs[i * num_cols + j].asnumpy())\n", |
| "            axes[i][j].axes.get_xaxis().set_visible(False)\n", |
| "            axes[i][j].axes.get_yaxis().set_visible(False)\n", |
| "    return axes\n", |
| "```\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "f40c59d1", |
| "metadata": {}, |
| "source": [ |
| "Most image augmentation methods have a certain degree of randomness. To make it\n", |
| "easier for us to observe the effect of image augmentation, we next define the\n", |
| "auxiliary function `apply`. This function runs the image augmentation method\n", |
| "`aug` multiple times on the input image `img` and shows all results." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "5ad94efb", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "def apply(img, aug, num_rows=2, num_cols=4, scale=3):\n", |
| "    # Run the (random) augmentation once per grid cell and plot every result.\n", |
| "    Y = [aug(img) for _ in range(num_rows * num_cols)]\n", |
| "    show_images(Y, num_rows, num_cols, scale)\n", |
| "```\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "8add2cb0", |
| "metadata": {}, |
| "source": [ |
| "## Flip and Crop\n", |
| "\n", |
| "Flipping the image left and right usually does not change the\n", |
| "category of the object. This is one of the earliest and most widely used methods\n", |
| "of image augmentation. Next, we use the `transforms` module to create a\n", |
| "`RandomFlipLeftRight` instance, which flips the image left and right with a\n", |
| "probability of 50%." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "bcdcf9e5", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "apply(img, gdata.vision.transforms.RandomFlipLeftRight())\n", |
| "```\n" |
| ] |
| }, |
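| { |
| "cell_type": "markdown", |
| "id": "7c1e4a90", |
| "metadata": {}, |
| "source": [ |
| "As a rough illustration of what a random left-right flip does, the sketch below\n", |
| "reverses the width axis of a height x width x channel NumPy array with\n", |
| "probability `p`. The helper `random_flip_left_right` is hypothetical and is not\n", |
| "part of MXNet; it only mirrors the behavior described above and makes no claim\n", |
| "about how `RandomFlipLeftRight` is implemented internally.\n", |
| "\n", |
| "```python\n", |
| "import numpy as np\n", |
| "\n", |
| "\n", |
| "def random_flip_left_right(arr, p=0.5, rng=None):\n", |
| "    # Hypothetical helper: flip an (H, W, C) array along its width axis\n", |
| "    # with probability p, otherwise return the input unchanged.\n", |
| "    rng = np.random.default_rng() if rng is None else rng\n", |
| "    if rng.random() < p:\n", |
| "        return arr[:, ::-1, :]  # reverse the column order\n", |
| "    return arr\n", |
| "\n", |
| "\n", |
| "toy = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # tiny 2x3 single-channel image\n", |
| "flipped = random_flip_left_right(toy, p=1.0)  # p=1.0 forces the flip\n", |
| "print(flipped[:, :, 0])\n", |
| "```\n" |
| ] |
| }, |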
| { |
| "cell_type": "markdown", |
| "id": "22db0bcf", |
| "metadata": {}, |
| "source": [ |
| "Flipping up and down is not as commonly used as flipping left and right.\n", |
| "However, at least for this example image, flipping up and down does not hinder\n", |
| "recognition. Next, we create a `RandomFlipTopBottom` instance for a 50% chance\n", |
| "of flipping the image up and down." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "378bba68", |
| "metadata": {}, |
| "source": [ |
| "```python\n", |
| "apply(img, gdata.vision.transforms.RandomFlipTopBottom())\n", |
| "```\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "8ce05128", |
| "metadata": {}, |
| "source": [ |
| "In the example image we used, the cat is in the middle of the image, but this\n", |
| "may not be the case for all images. In the [Pooling Layer](https://d2l.ai/chapter_convolutional-neural-networks/pooling.html) section of the d2l.ai book, we explain that the pooling layer can reduce the sensitivity of the convolutional\n", |
| "layer to the target location. In addition, we can make objects appear at\n", |
| "different positions in the image in different proportions by randomly cropping\n", |
| "the image. This can also reduce the sensitivity of the model to the target\n", |
| "position.\n", |
| "\n", |
| "In the following code, we randomly crop a region whose area is 10%\n", |
| "to 100% of the original area and whose width-to-height ratio is\n", |
| "randomly selected between 0.5 and 2. Then, the width and height of the\n", |
| "region are both scaled to 200 pixels. Unless otherwise stated, a random number\n", |
| "between $a$ and $b$ in this section refers to a continuous value obtained by\n", |
| "uniform sampling in the interval $[a,b]$." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "id": "2d18aef2", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "7" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "shape_aug = gdata.vision.transforms.RandomResizedCrop(\n", |
| "    (200, 200), scale=(0.1, 1), ratio=(0.5, 2))\n", |
| "apply(img, shape_aug)" |
| ] |
| }, |
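| { |
| "cell_type": "markdown", |
| "id": "b4d2f6a1", |
| "metadata": {}, |
| "source": [ |
| "To make the sampling step of a random resized crop more concrete, the sketch below\n", |
| "draws an area fraction and a width-to-height ratio uniformly and solves for the\n", |
| "crop size. The helper `sample_crop_size` is hypothetical, and clamping the crop\n", |
| "to the image bounds is a simplifying assumption; the sketch also assumes that 400\n", |
| "is the image height and 500 the width. It is not a description of the\n", |
| "`RandomResizedCrop` internals.\n", |
| "\n", |
| "```python\n", |
| "import math\n", |
| "import random\n", |
| "\n", |
| "\n", |
| "def sample_crop_size(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0), rng=random):\n", |
| "    # Hypothetical helper: draw an area fraction and an aspect ratio uniformly,\n", |
| "    # then solve crop_w * crop_h = area and crop_w / crop_h = aspect.\n", |
| "    area = rng.uniform(*scale) * height * width\n", |
| "    aspect = rng.uniform(*ratio)\n", |
| "    crop_w = min(width, int(round(math.sqrt(area * aspect))))\n", |
| "    crop_h = min(height, int(round(math.sqrt(area / aspect))))\n", |
| "    # Clamp to at least one pixel (simplification for this sketch).\n", |
| "    return max(1, crop_h), max(1, crop_w)\n", |
| "\n", |
| "\n", |
| "random.seed(0)\n", |
| "crop_h, crop_w = sample_crop_size(400, 500)\n", |
| "print(crop_h, crop_w)\n", |
| "```\n", |
| "\n", |
| "The sampled region would then be resized to the requested output size, here\n", |
| "$200\\times 200$ pixels." |
| ] |
| }, |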
| { |
| "cell_type": "markdown", |
| "id": "755a8a17", |
| "metadata": {}, |
| "source": [ |
| "## Change Color\n", |
| "\n", |
| "Another augmentation method is changing colors. We can change\n", |
| "four aspects of the image color: brightness, contrast, saturation, and hue. In\n", |
| "the example below, we randomly change the brightness of the image to a value\n", |
| "between 50% ($1-0.5$) and 150% ($1+0.5$) of the original brightness." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "id": "0eab1915", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "8" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "apply(img, gdata.vision.transforms.RandomBrightness(0.5))" |
| ] |
| }, |
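| { |
| "cell_type": "markdown", |
| "id": "e93a5c17", |
| "metadata": {}, |
| "source": [ |
| "The brightness range above can be sketched with plain NumPy: draw a factor\n", |
| "uniformly from $[1-0.5, 1+0.5]$ and scale every pixel by it. The helper\n", |
| "`random_brightness` is hypothetical, and clipping to the 8-bit range is an\n", |
| "assumption of this sketch rather than a statement about what\n", |
| "`RandomBrightness` does internally.\n", |
| "\n", |
| "```python\n", |
| "import numpy as np\n", |
| "\n", |
| "\n", |
| "def random_brightness(arr, brightness=0.5, rng=None):\n", |
| "    # Hypothetical helper: scale all pixels by a factor drawn uniformly from\n", |
| "    # [1 - brightness, 1 + brightness], then clip back to the 8-bit range.\n", |
| "    rng = np.random.default_rng() if rng is None else rng\n", |
| "    factor = rng.uniform(1 - brightness, 1 + brightness)\n", |
| "    scaled = arr.astype(np.float32) * factor\n", |
| "    return np.clip(scaled, 0, 255).astype(np.uint8), factor\n", |
| "\n", |
| "\n", |
| "toy = np.full((2, 2, 3), 100, dtype=np.uint8)  # uniform gray test image\n", |
| "brighter, factor = random_brightness(toy, rng=np.random.default_rng(0))\n", |
| "print(factor, brighter[0, 0])\n", |
| "```\n" |
| ] |
| }, |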
| { |
| "cell_type": "markdown", |
| "id": "365f2503", |
| "metadata": {}, |
| "source": [ |
| "Similarly, we can randomly change the hue of the image." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 9, |
| "id": "c8b3891a", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "9" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "apply(img, gdata.vision.transforms.RandomHue(0.5))" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "d3f9873b", |
| "metadata": {}, |
| "source": [ |
| "We can also create a `RandomColorJitter` instance to randomly change the\n", |
| "`brightness`, `contrast`, `saturation`, and `hue` of the image at the same\n", |
| "time." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 10, |
| "id": "0906ac9e", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "10" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "color_aug = gdata.vision.transforms.RandomColorJitter(\n", |
| "    brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)\n", |
| "apply(img, color_aug)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "491ed0c4", |
| "metadata": {}, |
| "source": [ |
| "## Overlaying Multiple Image Augmentation Methods\n", |
| "\n", |
| "In practice, we usually overlay\n", |
| "multiple image augmentation methods. We can combine the different image\n", |
| "augmentation methods defined above and apply them to each image by using a\n", |
| "`Compose` instance." |
| ] |
| }, |
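| { |
| "cell_type": "markdown", |
| "id": "5f08d3b2", |
| "metadata": {}, |
| "source": [ |
| "Conceptually, chaining augmentations amounts to applying a list of callables in\n", |
| "order, each one receiving the output of the previous one. The sketch below\n", |
| "shows this idea with a hypothetical `compose` function and toy transforms on a\n", |
| "list of pixel values; it is a stand-in for illustration, not the MXNet\n", |
| "`Compose` class.\n", |
| "\n", |
| "```python\n", |
| "def compose(transforms):\n", |
| "    # Hypothetical stand-in for a Compose-style container: apply each\n", |
| "    # callable in order, feeding the output of one into the next.\n", |
| "    def composed(x):\n", |
| "        for transform in transforms:\n", |
| "            x = transform(x)\n", |
| "        return x\n", |
| "    return composed\n", |
| "\n", |
| "\n", |
| "# Toy transforms on a list of pixel values, standing in for real augmentations.\n", |
| "def flip(pixels):\n", |
| "    return pixels[::-1]\n", |
| "\n", |
| "\n", |
| "def brighten(pixels):\n", |
| "    return [min(255, p + 10) for p in pixels]\n", |
| "\n", |
| "\n", |
| "pipeline = compose([flip, brighten])\n", |
| "print(pipeline([0, 100, 250]))  # [255, 110, 10]\n", |
| "```\n", |
| "\n", |
| "Because the transforms run in sequence, their order can matter: here the flip is\n", |
| "applied before the brightness change." |
| ] |
| }, |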
| { |
| "cell_type": "code", |
| "execution_count": 11, |
| "id": "bd8eb588", |
| "metadata": { |
| "attributes": { |
| "classes": [], |
| "id": "", |
| "n": "11" |
| } |
| }, |
| "outputs": [], |
| "source": [ |
| "augs = gdata.vision.transforms.Compose([\n", |
| "    gdata.vision.transforms.RandomFlipLeftRight(), color_aug, shape_aug])\n", |
| "apply(img, augs)" |
| ] |
| } |
| ], |
| "metadata": { |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |