# Record IO - Pack free-format data in binary files
This tutorial walks through the Python interface for reading and writing
record IO files. It is useful when you need more control over the details of
the data pipeline, for example when you need to augment an image and its label
together for detection or segmentation, or when you need a custom data iterator
for triplet sampling or negative sampling.
Set up the environment first:
%matplotlib inline
from __future__ import print_function
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
The relevant code is under `mx.recordio`. There are two classes: `MXRecordIO`,
which supports sequential read and write, and `MXIndexedRecordIO`, which
supports random read and sequential write.
## MXRecordIO
First let's take a look at `MXRecordIO`. We open a file `tmp.rec` and write 5
strings to it:
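record = mx.recordio.MXRecordIO('tmp.rec', 'w')
for i in range(5):
    record.write(b'record_%d' % i)  # payload is passed as bytes
record.close()  # flush and close before reopening the file for reading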
Then we can read it back by opening the same file with 'r':
record = mx.recordio.MXRecordIO('tmp.rec', 'r')
while True:
    item = record.read()
    if not item:
        # an empty result means there are no more records
        break
    print(item)
record.close()
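To build intuition for what sequential access means here, the following is a minimal pure-Python sketch of a length-prefixed record stream. It is an illustration only: the helper names `write_record` and `read_records` are hypothetical, and the layout (a 4-byte little-endian length followed by the payload) is an assumption chosen for simplicity, not MXNet's actual on-disk format.

```python
import io
import struct

def write_record(stream, payload):
    """Append one record: a 4-byte little-endian length, then the payload bytes."""
    stream.write(struct.pack('<I', len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield payloads in the order they were written until the stream ends."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # end of stream
        (length,) = struct.unpack('<I', prefix)
        yield stream.read(length)

buf = io.BytesIO()  # in-memory stand-in for a file such as tmp.rec
for i in range(5):
    write_record(buf, b'record_%d' % i)

buf.seek(0)  # rewind, as if reopening the file for reading
items = list(read_records(buf))
print(items)
```

Because the length prefix marks where each record ends, payloads can be arbitrary bytes. The cost is that reaching a given record means reading past every record written before it.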
## MXIndexedRecordIO
Sometimes you need random access for more complex tasks. `MXIndexedRecordIO` is
designed for this. Here we create an indexed record file `tmp.rec` and a
corresponding index file `tmp.idx`:
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
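for i in range(5):
    record.write_idx(i, b'record_%d' % i)  # key i, payload passed as bytes
record.close()  # flush both files before reopening them for reading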
We can then access records with keys:
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')
print(record.read_idx(3))  # fetch the record stored under key 3
You can list all keys with:
print(record.keys)
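Conceptually, an index maps each key to the position of its record in the data file, so a reader can seek straight to that record instead of scanning from the start. The sketch below illustrates the idea in pure Python. The class `TinyIndexedLog` and its byte layout are hypothetical and are not how `MXIndexedRecordIO` is implemented; only the method and attribute names are chosen to echo the calls used in this section.

```python
import io
import struct

class TinyIndexedLog(object):
    """Hypothetical toy store: sequential writes, random reads by key."""

    def __init__(self):
        self.data = io.BytesIO()  # stand-in for the record file
        self.index = {}           # stand-in for the index file: key -> byte offset
        self.keys = []            # keys in insertion order

    def write_idx(self, key, payload):
        # Remember where this record starts, then append it at the end.
        offset = self.data.seek(0, io.SEEK_END)
        self.index[key] = offset
        self.keys.append(key)
        self.data.write(struct.pack('<I', len(payload)) + payload)

    def read_idx(self, key):
        # Jump directly to the stored offset; no earlier record is read.
        self.data.seek(self.index[key])
        (length,) = struct.unpack('<I', self.data.read(4))
        return self.data.read(length)

log = TinyIndexedLog()
for i in range(5):
    log.write_idx(i, b'record_%d' % i)

print(log.read_idx(3))
print(log.keys)
```

Keeping the offsets separate from the data is what allows writes to stay sequential while reads can happen in any order, which is the access pattern a sampling-based iterator needs.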
## Packing and Unpacking Data
Each record in a .rec file can contain arbitrary binary data, but machine
learning data typically has a label/data structure. `mx.recordio` also contains
a few utility functions for packing such data, namely: `pack`, `unpack`,
`pack_img`, and `unpack_img`.
### Binary Data
`pack` and `unpack` store a float label (or a 1-D array of floats) together
with binary data:
- pack:
# pack
data = b'data'  # binary payload, passed as bytes
label1 = 1.0
header1 = mx.recordio.IRHeader(flag=0, label=label1, id=1, id2=0)
s1 = mx.recordio.pack(header1, data)
print('float label:', repr(s1))
label2 = [1.0, 2.0, 3.0]
header2 = mx.recordio.IRHeader(flag=0, label=label2, id=2, id2=0)
s2 = mx.recordio.pack(header2, data)
print('array label:', repr(s2))
- unpack:
print(mx.recordio.unpack(s1))
print(mx.recordio.unpack(s2))
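As a rough mental model of what packing does, a fixed-size header holding the four `IRHeader` fields can be serialized with `struct` and prepended to the payload. The sketch below handles only a scalar float label. The helper names `toy_pack` and `toy_unpack`, the namedtuple `ToyHeader`, and the format string are assumptions for illustration; the byte layout that `mx.recordio.pack` actually produces may differ.

```python
import struct
from collections import namedtuple

# Hypothetical header mirroring the field names flag, label, id, id2.
ToyHeader = namedtuple('ToyHeader', ['flag', 'label', 'id', 'id2'])

# Assumed layout: uint32 flag, float32 label, two uint64 ids, little-endian.
HEADER_FORMAT = '<IfQQ'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def toy_pack(header, payload):
    """Serialize the header fields and prepend them to the payload bytes."""
    return struct.pack(HEADER_FORMAT, *header) + payload

def toy_unpack(blob):
    """Split a packed blob back into its header and payload."""
    header = ToyHeader(*struct.unpack(HEADER_FORMAT, blob[:HEADER_SIZE]))
    return header, blob[HEADER_SIZE:]

packed = toy_pack(ToyHeader(flag=0, label=1.0, id=1, id2=0), b'data')
header, payload = toy_unpack(packed)
print(header, payload)
```

A fixed-size header keeps unpacking cheap: the reader always knows where the payload starts without parsing it.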
### Image Data
`pack_img` and `unpack_img` are used for packing image data. Records packed by
`pack_img` can be loaded by `mx.io.ImageRecordIter`.
- pack images
data = np.ones((3,3,1), dtype=np.uint8)
label = 1.0
header = mx.recordio.IRHeader(flag=0, label=label, id=0, id2=0)
s = mx.recordio.pack_img(header, data, quality=100, img_fmt='.jpg')
- unpack images
print(mx.recordio.unpack_img(s))