 {"nbformat": 4, "cells": [{"source": "# Record IO - Pack free-format data in binary files\n\nThis tutorial walks through the Python interface for reading and writing\nRecordIO files. It is useful when you need more control over the\ndetails of the data pipeline, for example when you need to augment an image and its label\ntogether for detection and segmentation, or when you need a custom data iterator\nfor triplet sampling and negative sampling.\n\nSet up the environment first:", "cell_type": "markdown", "metadata": {}}, {"source": "%matplotlib inline\nfrom __future__ import print_function\nimport mxnet as mx\nimport numpy as np\nimport matplotlib.pyplot as plt", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "The relevant code is under `mx.recordio`. There are two classes: `MXRecordIO`,\nwhich supports sequential read and write, and `MXIndexedRecordIO`, which\nsupports random read and sequential write.\n\n## MXRecordIO\n\nFirst let's take a look at `MXRecordIO`. We open a file `tmp.rec` and write 5\nstrings to it:", "cell_type": "markdown", "metadata": {}}, {"source": "record = mx.recordio.MXRecordIO('tmp.rec', 'w')\nfor i in range(5):\n    # Records are raw bytes; a bytes literal works on Python 2 and 3.5+.\n    record.write(b'record_%d' % i)\nrecord.close()", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "Then we can read it back by opening the same file with 'r':", "cell_type": "markdown", "metadata": {}}, {"source": "record = mx.recordio.MXRecordIO('tmp.rec', 'r')\nwhile True:\n    item = record.read()\n    if not item:\n        break\n    print(item)\nrecord.close()", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "## MXIndexedRecordIO\n\nSometimes you need random access for more complex tasks. `MXIndexedRecordIO` is\ndesigned for this. 
Here we create an indexed record file `tmp.rec` and a corresponding\nindex file `tmp.idx`:", "cell_type": "markdown", "metadata": {}}, {"source": "record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')\nfor i in range(5):\n    record.write_idx(i, b'record_%d' % i)\nrecord.close()", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "We can then access records with keys:", "cell_type": "markdown", "metadata": {}}, {"source": "record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')\nrecord.read_idx(3)", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "You can list all keys with:", "cell_type": "markdown", "metadata": {}}, {"source": "record.keys", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "## Packing and Unpacking Data\n\nEach record in a .rec file can contain arbitrary binary data, but machine\nlearning data typically has a label/data structure. `mx.recordio` also contains\na few utility functions for packing such data, namely: `pack`, `unpack`,\n`pack_img`, and `unpack_img`.\n\n### Binary Data\n\n`pack` and `unpack` store a float label (or a 1-D array of floats) together with\nbinary data:\n\n- pack:", "cell_type": "markdown", "metadata": {}}, {"source": "# pack\ndata = b'data'\nlabel1 = 1.0\nheader1 = mx.recordio.IRHeader(flag=0, label=label1, id=1, id2=0)\ns1 = mx.recordio.pack(header1, data)\nprint('float label:', repr(s1))\nlabel2 = [1.0, 2.0, 3.0]\nheader2 = mx.recordio.IRHeader(flag=0, label=label2, id=2, id2=0)\ns2 = mx.recordio.pack(header2, data)\nprint('array label:', repr(s2))", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "- unpack:", "cell_type": "markdown", "metadata": {}}, {"source": "print(*mx.recordio.unpack(s1))\nprint(*mx.recordio.unpack(s2))", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "### Image Data\n\n`pack_img` and 
`unpack_img` are used for packing and unpacking image data. Records packed by\n`pack_img` can be loaded by `mx.io.ImageRecordIter`.\n\n- pack images", "cell_type": "markdown", "metadata": {}}, {"source": "data = np.ones((3,3,1), dtype=np.uint8)\nlabel = 1.0\nheader = mx.recordio.IRHeader(flag=0, label=label, id=0, id2=0)\ns = mx.recordio.pack_img(header, data, quality=100, img_fmt='.jpg')\nprint(repr(s))", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "- unpack images", "cell_type": "markdown", "metadata": {}}, {"source": "print(*mx.recordio.unpack_img(s))", "cell_type": "code", "execution_count": null, "outputs": [], "metadata": {}}, {"source": "\n<!-- INSERT SOURCE DOWNLOAD BUTTONS -->\n\n", "cell_type": "markdown", "metadata": {}}], "metadata": {"display_name": "", "name": "", "language": "python"}, "nbformat_minor": 2}