{
"cells": [
{
"cell_type": "markdown",
"id": "d83be71e",
"metadata": {},
"source": [
"<!--- Licensed to the Apache Software Foundation (ASF) under one -->\n",
"<!--- or more contributor license agreements. See the NOTICE file -->\n",
"<!--- distributed with this work for additional information -->\n",
"<!--- regarding copyright ownership. The ASF licenses this file -->\n",
"<!--- to you under the Apache License, Version 2.0 (the -->\n",
"<!--- \"License\"); you may not use this file except in compliance -->\n",
"<!--- with the License. You may obtain a copy of the License at -->\n",
"\n",
"<!--- http://www.apache.org/licenses/LICENSE-2.0 -->\n",
"\n",
"<!--- Unless required by applicable law or agreed to in writing, -->\n",
"<!--- software distributed under the License is distributed on an -->\n",
"<!--- \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->\n",
"<!--- KIND, either express or implied. See the License for the -->\n",
"<!--- specific language governing permissions and limitations -->\n",
"<!--- under the License. -->\n",
"\n",
"# Image Augmentation\n",
"\n",
"Image augmentation technology expands the scale of\n",
"training data sets by making a series of random changes to the training images\n",
"to produce similar, but different, training examples. Given its popularity in\n",
"computer vision, the `mxnet.gluon.data.vision.transforms` model provides\n",
"multiple pre-defined image augmentation methods. In this section we will briefly\n",
"go through this module.\n",
"\n",
"First, import the module required for this section."
]
},
{
"cell_type": "markdown",
"id": "3cc9aca0",
"metadata": {},
"source": [
"```python\n",
"from matplotlib import pyplot as plt\n",
"from mxnet import image\n",
"from mxnet.gluon import data as gdata, utils\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "82e82283",
"metadata": {},
"source": [
"Then read the sample $400\\times 500$ image."
]
},
{
"cell_type": "markdown",
"id": "59746d1e",
"metadata": {},
"source": [
"```python\n",
"utils.download('https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/doc/cat.jpg')\n",
"img = image.imread('cat.jpg')\n",
"plt.imshow(img.asnumpy())\n",
"plt.show()\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "73e3505c",
"metadata": {},
"source": [
"In addition, we define a function to draw a list of images."
]
},
{
"cell_type": "markdown",
"id": "40bb750c",
"metadata": {},
"source": [
"```python\n",
"def show_images(imgs, num_rows, num_cols, scale=2):\n",
" figsize = (num_cols * scale, num_rows * scale)\n",
" _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)\n",
" for i in range(num_rows):\n",
" for j in range(num_cols):\n",
" axes[i][j].imshow(imgs[i * num_cols + j].asnumpy())\n",
" axes[i][j].axes.get_xaxis().set_visible(False)\n",
" axes[i][j].axes.get_yaxis().set_visible(False)\n",
" return axes\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "723187a4",
"metadata": {},
"source": [
"Most image augmentation methods have a certain degree of randomness. To make it\n",
"easier for us to observe the effect of image augmentation, we next define the\n",
"auxiliary function `apply`. This function runs the image augmentation method\n",
"`aug` multiple times on the input image `img` and shows all results."
]
},
{
"cell_type": "markdown",
"id": "8ba366d4",
"metadata": {},
"source": [
"```python\n",
"def apply(img, aug, num_rows=2, num_cols=4, scale=3):\n",
" Y = [aug(img) for _ in range(num_rows * num_cols)]\n",
" show_images(Y, num_rows, num_cols, scale)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "e0c61dd1",
"metadata": {},
"source": [
"## Flip and Crop\n",
"\n",
"Flipping the image left and right usually does not change the\n",
"category of the object. This is one of the earliest and most widely used methods\n",
"of image augmentation. Next, we use the `transforms` module to create the\n",
"`RandomFlipLeftRight` instance, which introduces a 50% chance that the image is\n",
"flipped left and right."
]
},
{
"cell_type": "markdown",
"id": "1fc543e1",
"metadata": {},
"source": [
"```python\n",
"apply(img, gdata.vision.transforms.RandomFlipLeftRight())\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "94a923d2",
"metadata": {},
"source": [
"Flipping up and down is not as commonly used as flipping left and right.\n",
"However, at least for this example image, flipping up and down does not hinder\n",
"recognition. Next, we create a `RandomFlipTopBottom` instance for a 50% chance\n",
"of flipping the image up and down."
]
},
{
"cell_type": "markdown",
"id": "40d29a4c",
"metadata": {},
"source": [
"```python\n",
"apply(img, gdata.vision.transforms.RandomFlipTopBottom())\n",
"```\n"
]
},
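{
"cell_type": "markdown",
"id": "sketch-flip",
"metadata": {},
"source": [
"As a rough illustration of what the two flip transforms do, the following\n",
"sketch flips a tiny image stored as a plain list of rows. The function names\n",
"and the parameters `p` and `rng` are hypothetical and exist only for this\n",
"example; MXNet's own transforms operate on `NDArray` images.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"def random_flip_left_right(img, p=0.5, rng=random):\n",
"    # Reverse every row (mirror left to right) with probability p.\n",
"    if rng.random() < p:\n",
"        return [row[::-1] for row in img]\n",
"    return img\n",
"\n",
"\n",
"def random_flip_top_bottom(img, p=0.5, rng=random):\n",
"    # Reverse the order of the rows (mirror top to bottom) with probability p.\n",
"    if rng.random() < p:\n",
"        return img[::-1]\n",
"    return img\n",
"\n",
"\n",
"tiny = [[1, 2, 3],\n",
"        [4, 5, 6]]\n",
"print(random_flip_left_right(tiny, p=1.0))  # p=1.0 forces the flip\n",
"print(random_flip_top_bottom(tiny, p=1.0))\n",
"```\n",
"\n",
"With the default `p=0.5`, roughly half of the calls return the input unchanged."
]
},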
{
"cell_type": "markdown",
"id": "073eaa78",
"metadata": {},
"source": [
"In the example image we used, the cat is in the middle of the image, but this\n",
"may not be the case for all images. In the [Pooling Layer](https://d2l.ai/chapter_convolutional-neural-networks/pooling.html) section of the d2l.ai book, we explain that the pooling layer can reduce the sensitivity of the convolutional\n",
"layer to the target location. In addition, we can make objects appear at\n",
"different positions in the image in different proportions by randomly cropping\n",
"the image. This can also reduce the sensitivity of the model to the target\n",
"position.\n",
"\n",
"In the following code, we randomly crop a region with an area of 10%\n",
"to 100% of the original area, and the ratio of width to height of the region is\n",
"randomly selected from between 0.5 and 2. Then, the width and height of the\n",
"region are both scaled to 200 pixels. Unless otherwise stated, the random number\n",
"between $a$ and $b$ in this section refers to a continuous value obtained by\n",
"uniform sampling in the interval $[a,b]$."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6a22b23e",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "7"
}
},
"outputs": [],
"source": [
"shape_aug = gdata.vision.transforms.RandomResizedCrop(\n",
" (200, 200), scale=(0.1, 1), ratio=(0.5, 2))\n",
"apply(img, shape_aug)"
]
},
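{
"cell_type": "markdown",
"id": "sketch-crop",
"metadata": {},
"source": [
"The sampling procedure described above can be sketched in plain Python. This is\n",
"only an illustration of the description in the text, not MXNet's actual\n",
"implementation: the helper name `sample_crop_box` is hypothetical, and the image\n",
"is assumed to be 400 pixels high and 500 pixels wide.\n",
"\n",
"```python\n",
"import math\n",
"import random\n",
"\n",
"\n",
"def sample_crop_box(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0), rng=random):\n",
"    # Return (x0, y0, w, h) for a random crop inside a height x width image.\n",
"    area = rng.uniform(*scale) * height * width  # target crop area in pixels\n",
"    aspect = rng.uniform(*ratio)                 # target width / height\n",
"    # Solve w * h = area and w / h = aspect, then clamp to the image bounds.\n",
"    # Clamping can shrink the area or change the aspect ratio slightly.\n",
"    w = min(width, max(1, round(math.sqrt(area * aspect))))\n",
"    h = min(height, max(1, round(math.sqrt(area / aspect))))\n",
"    x0 = rng.randint(0, width - w)   # left edge, inclusive bounds\n",
"    y0 = rng.randint(0, height - h)  # top edge, inclusive bounds\n",
"    return x0, y0, w, h\n",
"\n",
"\n",
"random.seed(0)\n",
"for _ in range(3):\n",
"    print(sample_crop_box(400, 500))\n",
"```\n",
"\n",
"Each sampled box would then be cut out of the image and resized to the\n",
"requested output size, here 200 by 200 pixels."
]
},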
{
"cell_type": "markdown",
"id": "b1c97cbe",
"metadata": {},
"source": [
"## Change Color\n",
"\n",
"Another augmentation method is changing colors. We can change\n",
"four aspects of the image color: brightness, contrast, saturation, and hue. In\n",
"the example below, we randomly change the brightness of the image to a value\n",
"between 50% ($1-0.5$) and 150% ($1+0.5$) of the original image."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "74435f36",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "8"
}
},
"outputs": [],
"source": [
"apply(img, gdata.vision.transforms.RandomBrightness(0.5))"
]
},
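{
"cell_type": "markdown",
"id": "sketch-brightness",
"metadata": {},
"source": [
"The brightness change above amounts to multiplying pixel values by a random\n",
"factor. The sketch below shows the idea on a short list of 8-bit values. The\n",
"function name `random_brightness` is hypothetical, and rounding and clipping to\n",
"the range 0 to 255 are assumptions made for this example rather than a\n",
"description of what MXNet does internally.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"def random_brightness(pixels, brightness=0.5, rng=random):\n",
"    # Draw a factor uniformly from [1 - brightness, 1 + brightness].\n",
"    factor = rng.uniform(1 - brightness, 1 + brightness)\n",
"    # Scale each value, then round and clip to the 8-bit range (assumption).\n",
"    scaled = [min(255, max(0, round(v * factor))) for v in pixels]\n",
"    return scaled, factor\n",
"\n",
"\n",
"random.seed(0)\n",
"out, factor = random_brightness([0, 64, 128, 255])\n",
"print(factor, out)\n",
"```\n",
"\n",
"A factor below 1 darkens the image and a factor above 1 brightens it."
]
},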
{
"cell_type": "markdown",
"id": "a0c4774e",
"metadata": {},
"source": [
"Similarly, we can randomly change the hue of the image."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0e1e08de",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "9"
}
},
"outputs": [],
"source": [
"apply(img, gdata.vision.transforms.RandomHue(0.5))"
]
},
{
"cell_type": "markdown",
"id": "2302032a",
"metadata": {},
"source": [
"We can also create a `RandomColorJitter` instance and set how to randomly change\n",
"the `brightness`, `contrast`, `saturation`, and `hue` of the image at the same\n",
"time."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a3c94df7",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "10"
}
},
"outputs": [],
"source": [
"color_aug = gdata.vision.transforms.RandomColorJitter(\n",
" brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)\n",
"apply(img, color_aug)"
]
},
{
"cell_type": "markdown",
"id": "7e39687f",
"metadata": {},
"source": [
"## Overlying Multiple Image Augmentation Methods\n",
"\n",
"In practice, we will overlay\n",
"multiple image augmentation methods. We can overlay the different image\n",
"augmentation methods defined above and apply them to each image by using a\n",
"`Compose` instance."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "05efb2a2",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "11"
}
},
"outputs": [],
"source": [
"augs = gdata.vision.transforms.Compose([\n",
" gdata.vision.transforms.RandomFlipLeftRight(), color_aug, shape_aug])\n",
"apply(img, augs)"
]
}
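,
{
"cell_type": "markdown",
"id": "sketch-compose",
"metadata": {},
"source": [
"Conceptually, chaining transforms is just function composition applied in list\n",
"order. The following minimal sketch uses toy transforms on a single row of\n",
"pixel values; `compose`, `flip`, `brighten`, and `crop` are hypothetical names\n",
"for this example and are not part of MXNet.\n",
"\n",
"```python\n",
"def compose(transforms):\n",
"    # Return a function that applies each transform in list order.\n",
"    def pipeline(x):\n",
"        for transform in transforms:\n",
"            x = transform(x)\n",
"        return x\n",
"    return pipeline\n",
"\n",
"\n",
"def flip(row):\n",
"    return row[::-1]                       # mirror the row\n",
"\n",
"\n",
"def brighten(row):\n",
"    return [min(255, v + 10) for v in row]  # add 10, capped at 255\n",
"\n",
"\n",
"def crop(row):\n",
"    return row[:2]                         # keep the first two values\n",
"\n",
"\n",
"augment = compose([flip, brighten, crop])\n",
"print(augment([10, 20, 30]))  # prints [40, 30]\n",
"```\n",
"\n",
"Because the transforms run in order, changing the order of the list can change\n",
"the result."
]
}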
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}