This document summarizes supported data formats and iterator APIs to read the data including
.. autosummary:: :nosignatures: mxnet.io mxnet.recordio mxnet.image
First, let's see how to write an iterator for a new data format. The following iterator can be used to train a symbol whose input data variable has name data
and input label variable has name softmax_label
. The iterator also provides information about the batch, including the shapes and name.
>>> nd_iter = mx.io.NDArrayIter(data={'data':mx.nd.ones((100,10))}, ... label={'softmax_label':mx.nd.ones((100,))}, ... batch_size=25) >>> print(nd_iter.provide_data) [DataDesc[data,(25, 10L),<type 'numpy.float32'>,NCHW]] >>> print(nd_iter.provide_label) [DataDesc[softmax_label,(25,),<type 'numpy.float32'>,NCHW]]
Let's see a complete example of how to use data iterator in model training.
>>> data = mx.sym.Variable('data') >>> label = mx.sym.Variable('softmax_label') >>> fullc = mx.sym.FullyConnected(data=data, num_hidden=1) >>> loss = mx.sym.SoftmaxOutput(data=data, label=label) >>> mod = mx.mod.Module(loss, data_names=['data'], label_names=['softmax_label']) >>> mod.bind(data_shapes=nd_iter.provide_data, label_shapes=nd_iter.provide_label) >>> mod.fit(nd_iter, num_epoch=2)
A detailed tutorial is available at Iterators - Loading data.
.. currentmodule:: mxnet
.. autosummary:: :nosignatures: io.NDArrayIter io.CSVIter io.ImageRecordIter io.ImageRecordUInt8Iter io.MNISTIter recordio.MXRecordIO recordio.MXIndexedRecordIO image.ImageIter
Data structures and other iterators provided in the mxnet.io
packages.
.. autosummary:: :nosignatures: io.DataDesc io.DataBatch io.DataIter io.ResizeIter io.PrefetchingIter io.MXDataIter
A list of image modification functions provided by mxnet.image
.
.. autosummary:: :nosignatures: image.imdecode image.scale_down image.resize_short image.fixed_crop image.random_crop image.center_crop image.color_normalize image.random_size_crop image.ResizeAug image.RandomCropAug image.RandomSizedCropAug image.CenterCropAug image.RandomOrderAug image.ColorJitterAug image.LightingAug image.ColorNormalizeAug image.HorizontalFlipAug image.CastAug image.CreateAugmenter
Functions to read and write RecordIO files.
.. autosummary:: :nosignatures: recordio.pack recordio.unpack recordio.unpack_img recordio.pack_img
Writing a new data iterator in Python is straightforward. Most MXNet training/inference programs accept an iterable object with provide_data
and provide_label
properties. This tutorial explains how to write an iterator from scratch.
The following example demonstrates how to combine multiple data iterators into a single one. It can be used for multiple modality training such as image captioning, in which images are read by ImageRecordIter
while documents are read by CSVIter
class MultiIter: def __init__(self, iter_list): self.iters = iter_list def next(self): batches = [i.next() for i in self.iters] return DataBatch(data=[*b.data for b in batches], label=[*b.label for b in batches]) def reset(self): for i in self.iters: i.reset() @property def provide_data(self): return [*i.provide_data for i in self.iters] @property def provide_label(self): return [*i.provide_label for i in self.iters] iter = MultiIter([mx.io.ImageRecordIter('image.rec'), mx.io.CSVIter('txt.csv')])
Parsing and performing another pre-processing such as augmentation may be expensive. If performance is critical, we can implement a data iterator in C++. Refer to src/io for examples.
By default, the backend engine treats the first dimension of each data and label variable in data iterators as the batch size (i.e. NCHW
or NT
layout). In order to override the axis for batch size, the provide_data
(and provide_label
if there is label) properties should include the layouts. This is especially useful in RNN since TNC
layouts are often more efficient. For example:
@property def provide_data(self): return [DataDesc(name='seq_var', shape=(seq_length, batch_size), layout='TN')]
The backend engine will recognize the index of N
in the layout
as the axis for batch size.
.. automodule:: mxnet.io :members: .. automodule:: mxnet.image :members: .. automodule:: mxnet.recordio :members: