versions/1.0.0/_sources/api/python/io/io.txt - mxnet-site - Git at Google

 # Data Loading API

 ## Overview

 This document summarizes supported data formats and iterator APIs to read the
 data including

 ```eval_rst
 .. autosummary::
     :nosignatures:

     mxnet.io
     mxnet.recordio
     mxnet.image
 ```

 First, let's see how to write an iterator for a new data format.
 The following iterator can be used to train a symbol whose input data variable has
 name `data` and input label variable has name `softmax_label`.
 The iterator also provides information about the batch, including the
 shapes and name.

 ```python
 >>> nd_iter = mx.io.NDArrayIter(data={'data':mx.nd.ones((100,10))},
 ...                             label={'softmax_label':mx.nd.ones((100,))},
 ...                             batch_size=25)
 >>> print(nd_iter.provide_data)
 [DataDesc[data,(25, 10L),<type 'numpy.float32'>,NCHW]]
 >>> print(nd_iter.provide_label)
 [DataDesc[softmax_label,(25,),<type 'numpy.float32'>,NCHW]]
 ```

 Let's see a complete example of how to use data iterator in model training.
 ```python
 >>> data = mx.sym.Variable('data')
 >>> label = mx.sym.Variable('softmax_label')
 >>> fullc = mx.sym.FullyConnected(data=data, num_hidden=1)
 >>> loss = mx.sym.SoftmaxOutput(data=fullc, label=label)
 >>> mod = mx.mod.Module(loss, data_names=['data'], label_names=['softmax_label'])
 >>> mod.bind(data_shapes=nd_iter.provide_data, label_shapes=nd_iter.provide_label)
 >>> mod.fit(nd_iter, num_epoch=2)
 ```

 A detailed tutorial is available at
 [Iterators - Loading data](http://mxnet.io/tutorials/basic/data.html).

 ## Data iterators

 ```eval_rst
     .. currentmodule:: mxnet
 ```

 ```eval_rst
 .. autosummary::
     :nosignatures:

     io.NDArrayIter
     io.CSVIter
     io.LibSVMIter
     io.ImageRecordIter
     io.ImageRecordUInt8Iter
     io.MNISTIter
     recordio.MXRecordIO
     recordio.MXIndexedRecordIO
     image.ImageIter
     image.ImageDetIter
 ```

 ## Helper classes and functions


 Data structures and other iterators provided in the ``mxnet.io`` packages.

 ```eval_rst
 .. autosummary::
     :nosignatures:

     io.DataDesc
     io.DataBatch
     io.DataIter
     io.ResizeIter
     io.PrefetchingIter
     io.MXDataIter
 ```

 Functions to read and write RecordIO files.

 ```eval_rst
 .. autosummary::
     :nosignatures:

     recordio.pack
     recordio.unpack
     recordio.unpack_img
     recordio.pack_img
 ```

 ## Develop a new iterator

 Writing a new data iterator in Python is straightforward. Most MXNet
 training/inference programs accept an iterable object with ``provide_data``
 and ``provide_label`` properties.
 This [tutorial](http://mxnet.io/tutorials/basic/data.html) explains how to
 write an iterator from scratch.

 The following example demonstrates how to combine
 multiple data iterators into a single one. It can be used for multiple
 modality training such as image captioning, in which images are read by
 ``ImageRecordIter`` while documents are read by ``CSVIter``

 ```python
 class MultiIter:
     def __init__(self, iter_list):
         self.iters = iter_list
     def next(self):
         batches = [i.next() for i in self.iters]
         return DataBatch(data=[*b.data for b in batches],
                          label=[*b.label for b in batches])
     def reset(self):
         for i in self.iters:
             i.reset()
     @property
     def provide_data(self):
         return [*i.provide_data for i in self.iters]
     @property
     def provide_label(self):
         return [*i.provide_label for i in self.iters]

 iter = MultiIter([mx.io.ImageRecordIter('image.rec'), mx.io.CSVIter('txt.csv')])
 ```

 Parsing and performing another pre-processing such as augmentation may be expensive.
 If performance is critical, we can implement a data iterator in C++. Refer to
 [src/io](https://github.com/dmlc/mxnet/tree/master/src/io) for examples.

 ### Change batch layout

 By default, the backend engine treats the first dimension of each data and label variable in data
 iterators as the batch size (i.e. `NCHW` or `NT` layout). In order to override the axis for batch size,
 the `provide_data` (and `provide_label` if there is label) properties should include the layouts. This
 is especially useful in RNN since `TNC` layouts are often more efficient. For example:

 ```python
 @property
 def provide_data(self):
     return [DataDesc(name='seq_var', shape=(seq_length, batch_size), layout='TN')]
 ```
 The backend engine will recognize the index of `N` in the `layout` as the axis for batch size.

 ## API Reference

 <script type="text/javascript" src='../../_static/js/auto_module_index.js'></script>

 ```eval_rst
 .. automodule:: mxnet.io
     :members:
 .. automodule:: mxnet.recordio
     :members:
 ```
 <script>auto_index("api-reference");</script>
	# Data Loading API

	## Overview

	This document summarizes supported data formats and iterator APIs to read the
	data including

	```eval_rst
	.. autosummary::
	:nosignatures:

	mxnet.io
	mxnet.recordio
	mxnet.image
	```

	First, let's see how to write an iterator for a new data format.
	The following iterator can be used to train a symbol whose input data variable has
	name `data` and input label variable has name `softmax_label`.
	The iterator also provides information about the batch, including the
	shapes and name.

	```python
	>>> nd_iter = mx.io.NDArrayIter(data={'data':mx.nd.ones((100,10))},
	... label={'softmax_label':mx.nd.ones((100,))},
	... batch_size=25)
	>>> print(nd_iter.provide_data)
	[DataDesc[data,(25, 10L),<type 'numpy.float32'>,NCHW]]
	>>> print(nd_iter.provide_label)
	[DataDesc[softmax_label,(25,),<type 'numpy.float32'>,NCHW]]
	```

	Let's see a complete example of how to use data iterator in model training.
	```python
	>>> data = mx.sym.Variable('data')
	>>> label = mx.sym.Variable('softmax_label')
	>>> fullc = mx.sym.FullyConnected(data=data, num_hidden=1)
	>>> loss = mx.sym.SoftmaxOutput(data=fullc, label=label)
	>>> mod = mx.mod.Module(loss, data_names=['data'], label_names=['softmax_label'])
	>>> mod.bind(data_shapes=nd_iter.provide_data, label_shapes=nd_iter.provide_label)
	>>> mod.fit(nd_iter, num_epoch=2)
	```

	A detailed tutorial is available at
	[Iterators - Loading data](http://mxnet.io/tutorials/basic/data.html).

	## Data iterators

	```eval_rst
	.. currentmodule:: mxnet
	```

	```eval_rst
	.. autosummary::
	:nosignatures:

	io.NDArrayIter
	io.CSVIter
	io.LibSVMIter
	io.ImageRecordIter
	io.ImageRecordUInt8Iter
	io.MNISTIter
	recordio.MXRecordIO
	recordio.MXIndexedRecordIO
	image.ImageIter
	image.ImageDetIter
	```

	## Helper classes and functions


	Data structures and other iterators provided in the ``mxnet.io`` packages.

	```eval_rst
	.. autosummary::
	:nosignatures:

	io.DataDesc
	io.DataBatch
	io.DataIter
	io.ResizeIter
	io.PrefetchingIter
	io.MXDataIter
	```

	Functions to read and write RecordIO files.

	```eval_rst
	.. autosummary::
	:nosignatures:

	recordio.pack
	recordio.unpack
	recordio.unpack_img
	recordio.pack_img
	```

	## Develop a new iterator

	Writing a new data iterator in Python is straightforward. Most MXNet
	training/inference programs accept an iterable object with ``provide_data``
	and ``provide_label`` properties.
	This [tutorial](http://mxnet.io/tutorials/basic/data.html) explains how to
	write an iterator from scratch.

	The following example demonstrates how to combine
	multiple data iterators into a single one. It can be used for multiple
	modality training such as image captioning, in which images are read by
	``ImageRecordIter`` while documents are read by ``CSVIter``

	```python
	class MultiIter:
	def __init__(self, iter_list):
	self.iters = iter_list
	def next(self):
	batches = [i.next() for i in self.iters]
	return DataBatch(data=[*b.data for b in batches],
	label=[*b.label for b in batches])
	def reset(self):
	for i in self.iters:
	i.reset()
	@property
	def provide_data(self):
	return [*i.provide_data for i in self.iters]
	@property
	def provide_label(self):
	return [*i.provide_label for i in self.iters]

	iter = MultiIter([mx.io.ImageRecordIter('image.rec'), mx.io.CSVIter('txt.csv')])
	```

	Parsing and performing another pre-processing such as augmentation may be expensive.
	If performance is critical, we can implement a data iterator in C++. Refer to
	[src/io](https://github.com/dmlc/mxnet/tree/master/src/io) for examples.

	### Change batch layout

	By default, the backend engine treats the first dimension of each data and label variable in data
	iterators as the batch size (i.e. `NCHW` or `NT` layout). In order to override the axis for batch size,
	the `provide_data` (and `provide_label` if there is label) properties should include the layouts. This
	is especially useful in RNN since `TNC` layouts are often more efficient. For example:

	```python
	@property
	def provide_data(self):
	return [DataDesc(name='seq_var', shape=(seq_length, batch_size), layout='TN')]
	```
	The backend engine will recognize the index of `N` in the `layout` as the axis for batch size.

	## API Reference

	<script type="text/javascript" src='../../_static/js/auto_module_index.js'></script>

	```eval_rst
	.. automodule:: mxnet.io
	:members:
	.. automodule:: mxnet.recordio
	:members:
	```
	<script>auto_index("api-reference");</script>