|  | <!DOCTYPE html> | 
|  |  | 
|  | <html lang="en"> | 
|  | <head> | 
|  | <meta charset="utf-8"/> | 
|  | <meta content="IE=edge" http-equiv="X-UA-Compatible"/> | 
|  | <meta content="width=device-width, initial-scale=1" name="viewport"/> | 
|  | <title>MXNet Scala Data Loading API — mxnet  documentation</title> | 
|  | <link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" rel="stylesheet"/> | 
|  | <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css" rel="stylesheet"/> | 
|  | <link href="../../_static/basic.css" rel="stylesheet" type="text/css"/> | 
|  | <link href="../../_static/pygments.css" rel="stylesheet" type="text/css"/> | 
|  | <link href="../../_static/mxnet.css" rel="stylesheet" type="text/css"> | 
|  | <script type="text/javascript"> | 
|  | var DOCUMENTATION_OPTIONS = { | 
|  | URL_ROOT:    '../../', | 
|  | VERSION:     '', | 
|  | COLLAPSE_INDEX: false, | 
|  | FILE_SUFFIX: '.html', | 
|  | HAS_SOURCE:  true, | 
|  | SOURCELINK_SUFFIX: '' | 
|  | }; | 
|  | </script> | 
|  | <script src="https://code.jquery.com/jquery-1.11.1.min.js" type="text/javascript"></script> | 
|  | <script src="../../_static/underscore.js" type="text/javascript"></script> | 
|  | <script src="../../_static/searchtools_custom.js" type="text/javascript"></script> | 
|  | <script src="../../_static/doctools.js" type="text/javascript"></script> | 
|  | <script src="../../_static/selectlang.js" type="text/javascript"></script> | 
|  | <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> | 
|  | <script type="text/javascript"> jQuery(function() { Search.loadIndex("/searchindex.js"); Search.init();}); </script> | 
|  | <script> | 
|  | (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ | 
|  | (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new | 
|  | Date();a=s.createElement(o), | 
|  | m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) | 
|  | })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); | 
|  |  | 
|  | ga('create', 'UA-96378503-1', 'auto'); | 
|  | ga('send', 'pageview'); | 
|  |  | 
|  | </script> | 
|  | <!-- --> | 
|  | <!-- <script type="text/javascript" src="../../_static/jquery.js"></script> --> | 
|  | <!-- --> | 
|  | <!-- <script type="text/javascript" src="../../_static/underscore.js"></script> --> | 
|  | <!-- --> | 
|  | <!-- <script type="text/javascript" src="../../_static/doctools.js"></script> --> | 
|  | <!-- --> | 
|  | <!-- <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> --> | 
|  | <!-- --> | 
|  | <link href="index.html" rel="up" title="MXNet - Scala API"/> | 
|  | <link href="ndarray.html" rel="next" title="NDArray API"> | 
|  | <link href="symbol.html" rel="prev" title="MXNet Scala Symbolic API"> | 
|  | <link href="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet-icon.png" rel="icon" type="image/png"/> | 
|  | </link></link></link></head> | 
|  | <body background="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet-background-compressed.jpeg" role="document"> | 
|  | <div class="content-block"><div class="navbar navbar-fixed-top"> | 
|  | <div class="container" id="navContainer"> | 
|  | <div class="innder" id="header-inner"> | 
|  | <h1 id="logo-wrap"> | 
|  | <a href="../../" id="logo"><img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet_logo.png"/></a> | 
|  | </h1> | 
|  | <nav class="nav-bar" id="main-nav"> | 
|  | <a class="main-nav-link" href="../../install/index.html">Install</a> | 
|  | <a class="main-nav-link" href="../../tutorials/index.html">Tutorials</a> | 
|  | <span id="dropdown-menu-position-anchor"> | 
|  | <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Gluon <span class="caret"></span></a> | 
|  | <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu"> | 
|  | <li><a class="main-nav-link" href="../../gluon/index.html">About</a></li> | 
|  | <li><a class="main-nav-link" href="http://gluon.mxnet.io">Tutorials</a></li> | 
|  | </ul> | 
|  | </span> | 
|  | <span id="dropdown-menu-position-anchor"> | 
|  | <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">API <span class="caret"></span></a> | 
|  | <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu"> | 
|  | <li><a class="main-nav-link" href="../../api/python/index.html">Python</a></li> | 
|  | <li><a class="main-nav-link" href="../../api/scala/index.html">Scala</a></li> | 
|  | <li><a class="main-nav-link" href="../../api/r/index.html">R</a></li> | 
|  | <li><a class="main-nav-link" href="../../api/julia/index.html">Julia</a></li> | 
|  | <li><a class="main-nav-link" href="../../api/c++/index.html">C++</a></li> | 
|  | <li><a class="main-nav-link" href="../../api/perl/index.html">Perl</a></li> | 
|  | </ul> | 
|  | </span> | 
|  | <span id="dropdown-menu-position-anchor-docs"> | 
|  | <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Docs <span class="caret"></span></a> | 
|  | <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu-docs"> | 
|  | <li><a class="main-nav-link" href="../../faq/index.html">FAQ</a></li> | 
|  | <li><a class="main-nav-link" href="../../architecture/index.html">Architecture</a></li> | 
|  | <li><a class="main-nav-link" href="https://github.com/apache/incubator-mxnet/tree/1.0.0/example">Examples</a></li> | 
|  | <li><a class="main-nav-link" href="../../model_zoo/index.html">Model Zoo</a></li> | 
|  | </ul> | 
|  | </span> | 
|  | <a class="main-nav-link" href="https://github.com/dmlc/mxnet">Github</a> | 
|  | <span id="dropdown-menu-position-anchor-community"> | 
|  | <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Community <span class="caret"></span></a> | 
|  | <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu-community"> | 
|  | <li><a class="main-nav-link" href="../../community/index.html">Community</a></li> | 
|  | <li><a class="main-nav-link" href="../../community/contribute.html">Contribute</a></li> | 
|  | <li><a class="main-nav-link" href="../../community/powered_by.html">Powered By</a></li> | 
|  | </ul> | 
|  | </span> | 
|  | <a class="main-nav-link" href="http://discuss.mxnet.io">Discuss</a> | 
|  | <span id="dropdown-menu-position-anchor-version" style="position: relative"><a href="#" class="main-nav-link dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="true">Versions(1.0.0)<span class="caret"></span></a><ul id="package-dropdown-menu" class="dropdown-menu"><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/>1.1.0</a></li><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/versions/1.0.0/index.html>1.0.0</a></li><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/versions/0.12.1/index.html>0.12.1</a></li><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/versions/0.12.0/index.html>0.12.0</a></li><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/versions/0.11.0/index.html>0.11.0</a></li><li><a class="main-nav-link" href=https://mxnet.incubator.apache.org/versions/master/index.html>master</a></li></ul></span></nav> | 
|  | <script> function getRootPath(){ return "../../" } </script> | 
|  | <div class="burgerIcon dropdown"> | 
|  | <a class="dropdown-toggle" data-toggle="dropdown" href="#" role="button">☰</a> | 
|  | <ul class="dropdown-menu" id="burgerMenu"> | 
|  | <li><a href="../../install/index.html">Install</a></li> | 
|  | <li><a class="main-nav-link" href="../../tutorials/index.html">Tutorials</a></li> | 
|  | <li class="dropdown-submenu"> | 
|  | <a href="#" tabindex="-1">Community</a> | 
|  | <ul class="dropdown-menu"> | 
|  | <li><a href="../../community/index.html" tabindex="-1">Community</a></li> | 
|  | <li><a href="../../community/contribute.html" tabindex="-1">Contribute</a></li> | 
|  | <li><a href="../../community/powered_by.html" tabindex="-1">Powered By</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li class="dropdown-submenu"> | 
|  | <a href="#" tabindex="-1">API</a> | 
|  | <ul class="dropdown-menu"> | 
|  | <li><a href="../../api/python/index.html" tabindex="-1">Python</a> | 
|  | </li> | 
|  | <li><a href="../../api/scala/index.html" tabindex="-1">Scala</a> | 
|  | </li> | 
|  | <li><a href="../../api/r/index.html" tabindex="-1">R</a> | 
|  | </li> | 
|  | <li><a href="../../api/julia/index.html" tabindex="-1">Julia</a> | 
|  | </li> | 
|  | <li><a href="../../api/c++/index.html" tabindex="-1">C++</a> | 
|  | </li> | 
|  | <li><a href="../../api/perl/index.html" tabindex="-1">Perl</a> | 
|  | </li> | 
|  | </ul> | 
|  | </li> | 
|  | <li class="dropdown-submenu"> | 
|  | <a href="#" tabindex="-1">Docs</a> | 
|  | <ul class="dropdown-menu"> | 
|  | <li><a href="../../tutorials/index.html" tabindex="-1">Tutorials</a></li> | 
|  | <li><a href="../../faq/index.html" tabindex="-1">FAQ</a></li> | 
|  | <li><a href="../../architecture/index.html" tabindex="-1">Architecture</a></li> | 
|  | <li><a href="https://github.com/apache/incubator-mxnet/tree/1.0.0/example" tabindex="-1">Examples</a></li> | 
|  | <li><a href="../../model_zoo/index.html" tabindex="-1">Model Zoo</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li><a href="../../architecture/index.html">Architecture</a></li> | 
|  | <li><a class="main-nav-link" href="https://github.com/dmlc/mxnet">Github</a></li> | 
|  | <li id="dropdown-menu-position-anchor-version-mobile" class="dropdown-submenu" style="position: relative"><a href="#" tabindex="-1">Versions(1.0.0)</a><ul class="dropdown-menu"><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/>1.1.0</a></li><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/versions/1.0.0/index.html>1.0.0</a></li><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/versions/0.12.1/index.html>0.12.1</a></li><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/versions/0.12.0/index.html>0.12.0</a></li><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/versions/0.11.0/index.html>0.11.0</a></li><li><a tabindex="-1" href=https://mxnet.incubator.apache.org/versions/master/index.html>master</a></li></ul></li></ul> | 
|  | </div> | 
|  | <div class="plusIcon dropdown"> | 
|  | <a class="dropdown-toggle" data-toggle="dropdown" href="#" role="button"><span aria-hidden="true" class="glyphicon glyphicon-plus"></span></a> | 
|  | <ul class="dropdown-menu dropdown-menu-right" id="plusMenu"></ul> | 
|  | </div> | 
|  | <div id="search-input-wrap"> | 
|  | <form action="../../search.html" autocomplete="off" class="" method="get" role="search"> | 
|  | <div class="form-group inner-addon left-addon"> | 
|  | <i class="glyphicon glyphicon-search"></i> | 
|  | <input class="form-control" name="q" placeholder="Search" type="text"/> | 
|  | </div> | 
|  | <input name="check_keywords" type="hidden" value="yes"/> | 
|  | <input name="area" type="hidden" value="default"> | 
|  | </input></form> | 
|  | <div id="search-preview"></div> | 
|  | </div> | 
|  | <div id="searchIcon"> | 
|  | <span aria-hidden="true" class="glyphicon glyphicon-search"></span> | 
|  | </div> | 
|  | <!-- <div id="lang-select-wrap"> --> | 
|  | <!--   <label id="lang-select-label"> --> | 
|  | <!--     <\!-- <i class="fa fa-globe"></i> -\-> --> | 
|  | <!--     <span></span> --> | 
|  | <!--   </label> --> | 
|  | <!--   <select id="lang-select"> --> | 
|  | <!--     <option value="en">Eng</option> --> | 
|  | <!--     <option value="zh">中文</option> --> | 
|  | <!--   </select> --> | 
|  | <!-- </div> --> | 
|  | <!--     <a id="mobile-nav-toggle"> | 
|  | <span class="mobile-nav-toggle-bar"></span> | 
|  | <span class="mobile-nav-toggle-bar"></span> | 
|  | <span class="mobile-nav-toggle-bar"></span> | 
|  | </a> --> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <script type="text/javascript"> | 
|  | $('body').css('background', 'white'); | 
|  | </script> | 
|  | <div class="container"> | 
|  | <div class="row"> | 
|  | <div aria-label="main navigation" class="sphinxsidebar leftsidebar" role="navigation"> | 
|  | <div class="sphinxsidebarwrapper"> | 
|  | <ul class="current"> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../python/index.html">Python Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../r/index.html">R Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../julia/index.html">Julia Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../c++/index.html">C++ Documents</a></li> | 
|  | <li class="toctree-l1 current"><a class="reference internal" href="index.html">Scala Documents</a><ul class="current"> | 
|  | <li class="toctree-l2 current"><a class="reference internal" href="index.html#scala-api-reference">Scala API Reference</a><ul class="current"> | 
|  | <li class="toctree-l3"><a class="reference internal" href="module.html">Module API is a flexible high-level interface for training neural networks.</a></li> | 
|  | <li class="toctree-l3"><a class="reference internal" href="model.html">Model API is an alternate simple high-level interface for training neural networks.</a></li> | 
|  | <li class="toctree-l3"><a class="reference internal" href="symbol.html">Symbolic API performs operations on NDArrays to assemble neural networks from layers.</a></li> | 
|  | <li class="toctree-l3 current"><a class="current reference internal" href="">IO Data Loading API performs parsing and data loading.</a><ul> | 
|  | <li class="toctree-l4"><a class="reference internal" href="#data-iterator-parameters">Data Iterator Parameters</a></li> | 
|  | <li class="toctree-l4"><a class="reference internal" href="#create-a-data-iterator">Create a Data Iterator</a></li> | 
|  | <li class="toctree-l4"><a class="reference internal" href="#how-to-get-data">How to Get Data</a></li> | 
|  | <li class="toctree-l4"><a class="reference internal" href="#create-a-dataset-using-recordio">Create a Dataset Using RecordIO</a></li> | 
|  | <li class="toctree-l4"><a class="reference internal" href="#next-steps">Next Steps</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li class="toctree-l3"><a class="reference internal" href="ndarray.html">NDArray API performs vector/matrix/tensor operations.</a></li> | 
|  | <li class="toctree-l3"><a class="reference internal" href="kvstore.html">KVStore API performs multi-GPU and multi-host distributed training.</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li class="toctree-l2"><a class="reference internal" href="index.html#resources">Resources</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../perl/index.html">Perl Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../../faq/index.html">HowTo Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../../architecture/index.html">System Documents</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../../tutorials/index.html">Tutorials</a></li> | 
|  | <li class="toctree-l1"><a class="reference internal" href="../../community/index.html">Community</a></li> | 
|  | </ul> | 
|  | </div> | 
|  | </div> | 
|  | <div class="content"> | 
|  | <div class="page-tracker"></div> | 
|  | <div class="section" id="mxnet-scala-data-loading-api"> | 
|  | <span id="mxnet-scala-data-loading-api"></span><h1>MXNet Scala Data Loading API<a class="headerlink" href="#mxnet-scala-data-loading-api" title="Permalink to this headline">¶</a></h1> | 
|  | <p>This topic introduces the data input method for MXNet. MXNet uses an iterator to provide data to the neural network.  Iterators do some preprocessing and generate batches for the neural network.</p> | 
|  | <p>MXNet provides basic iterators for MNIST and RecordIO images. To hide the cost of I/O, MXNet uses a prefetch strategy that enables parallelism for the learning process and data fetching. Data is automatically fetched by an independent thread.</p> | 
|  | <p>Topics:</p> | 
|  | <ul class="simple"> | 
|  | <li><a class="reference external" href="#data-iterator-parameters">Data Iterator Parameters</a> explains the groups of parameters that a data iterator accepts.</li> | 
|  | <li><a class="reference external" href="#create-a-data-iterator">Create a Data Iterator</a> shows how to create a data iterator in MXNet for Scala.</li> | 
|  | <li><a class="reference external" href="#how-to-get-data">How to Get Data</a> points to the available datasets and the data preparation tools.</li> | 
|  | <li><a class="reference external" href="https://mxnet.incubator.apache.org/api/scala/docs/index.html#ml.dmlc.mxnet.IO$">IO API Reference</a> explains the IO API.</li> | 
|  | </ul> | 
|  | <div class="section" id="data-iterator-parameters"> | 
|  | <span id="data-iterator-parameters"></span><h2>Data Iterator Parameters<a class="headerlink" href="#data-iterator-parameters" title="Permalink to this headline">¶</a></h2> | 
|  | <p>To create a data iterator, you typically need to provide five parameters:</p> | 
|  | <ul class="simple"> | 
|  | <li><strong>Dataset Param</strong> provides basic information about the dataset, e.g., file path, input shape.</li> | 
|  | <li><strong>Batch Param</strong> provides information required to form a batch, e.g., batch size.</li> | 
|  | <li><strong>Augmentation Param</strong> tells MXNet which augmentation operations (e.g., crop or mirror) to perform on an input image.</li> | 
|  | <li><strong>Backend Param</strong> controls the behavior of the back-end threads to hide the cost of data loading.</li> | 
|  | <li><strong>Auxiliary Param</strong> provides options for checking and debugging.</li> | 
|  | </ul> | 
|  | <p>You <em>must</em> provide the <strong>Dataset Param</strong> and <strong>Batch Param</strong>; otherwise, MXNet can’t create the data batch. Provide the other parameters as required by your algorithm and performance needs. The sections that follow explain the options in more detail and give examples.</p> | 
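|  | <p>As a minimal sketch of this rule, the iterator below sets only the required groups: the Dataset parameters (<code class="docutils literal"><span class="pre">path_imgrec</span></code> and <code class="docutils literal"><span class="pre">data_shape</span></code>) and the Batch parameter (<code class="docutils literal"><span class="pre">batch_size</span></code>). The import assumes the <code class="docutils literal"><span class="pre">ml.dmlc.mxnet</span></code> package named in the IO API reference link, and the file path is a placeholder that must point to an existing RecordIO file.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>import ml.dmlc.mxnet.IO | 
|  |  | 
|  | // Only the required Dataset and Batch parameters are given; | 
|  | // every optional parameter keeps its default value. | 
|  | val minimalIter = IO.ImageRecordIter(Map( | 
|  |   "path_imgrec" -> "data/cifar/train.rec",  // Dataset parameter (placeholder path) | 
|  |   "data_shape" -> "(3,28,28)",              // Dataset parameter | 
|  |   "batch_size" -> "100"))                   // Batch parameter | 
|  | </pre></div> | 
|  | </div> | 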
|  | </div> | 
|  | <div class="section" id="create-a-data-iterator"> | 
|  | <span id="create-a-data-iterator"></span><h2>Create a Data Iterator<a class="headerlink" href="#create-a-data-iterator" title="Permalink to this headline">¶</a></h2> | 
|  | <p>The IO API provides a simple way to create a data iterator in Scala. | 
|  | The following example code shows how to create a CIFAR data iterator.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>     <span class="k">val</span> <span class="n">dataiter</span> <span class="k">=</span> <span class="nc">IO</span><span class="o">.</span><span class="nc">ImageRecordIter</span><span class="o">(</span><span class="nc">Map</span><span class="o">(</span> | 
|  | <span class="c1">// Utility Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// Name of the data, should match the name of the data input of the network</span> | 
|  | <span class="c1">// data_name='data',</span> | 
|  | <span class="c1">// Utility Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// Name of the label, should match the name of the label parameter of the network</span> | 
|  | <span class="c1">// Usually, if the loss layer is named 'foo', then the label input has the name</span> | 
|  | <span class="c1">// 'foo_label', unless overwritten</span> | 
|  | <span class="c1">// label_name='softmax_label',</span> | 
|  | <span class="c1">// Dataset Parameter</span> | 
|  | <span class="c1">// Required</span> | 
|  | <span class="c1">// Path to the RecordIO data file; make sure the file already exists</span> | 
|  | <span class="s">"path_imgrec"</span> <span class="o">-></span> <span class="s">"data/cifar/train.rec"</span><span class="o">,</span> | 
|  | <span class="c1">// Dataset Parameter</span> | 
|  | <span class="c1">// Required</span> | 
|  | <span class="c1">// Shape of each image after preprocessing</span> | 
|  | <span class="s">"data_shape"</span> <span class="o">-></span> <span class="s">"(3,28,28)"</span><span class="o">,</span> | 
|  | <span class="c1">// Batch Parameter</span> | 
|  | <span class="c1">// Required</span> | 
|  | <span class="c1">// Number of images in a batch</span> | 
|  | <span class="s">"batch_size"</span> <span class="o">-></span> <span class="s">"100"</span><span class="o">,</span> | 
|  | <span class="c1">// Augmentation Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// If mean_img is provided, the mean value is subtracted from each pixel of every image</span> | 
|  | <span class="s">"mean_img"</span> <span class="o">-></span> <span class="s">"data/cifar/cifar10_mean.bin"</span><span class="o">,</span> | 
|  | <span class="c1">// Augmentation Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// randomly crop a patch of the data_shape from the original image</span> | 
|  | <span class="s">"rand_crop"</span> <span class="o">-></span> <span class="s">"True"</span><span class="o">,</span> | 
|  | <span class="c1">// Augmentation Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// randomly mirror the image horizontally</span> | 
|  | <span class="s">"rand_mirror"</span> <span class="o">-></span> <span class="s">"True"</span><span class="o">,</span> | 
|  | <span class="c1">// Augmentation Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// randomly shuffle the data</span> | 
|  | <span class="s">"shuffle"</span> <span class="o">-></span> <span class="s">"False"</span><span class="o">,</span> | 
|  | <span class="c1">// Backend Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// Preprocessing thread number</span> | 
|  | <span class="s">"preprocess_threads"</span> <span class="o">-></span> <span class="s">"4"</span><span class="o">,</span> | 
|  | <span class="c1">// Backend Parameter</span> | 
|  | <span class="c1">// Optional</span> | 
|  | <span class="c1">// Prefetch buffer size</span> | 
|  | <span class="s">"prefetch_buffer"</span> <span class="o">-></span> <span class="s">"1"</span><span class="o">))</span> | 
|  | </pre></div> | 
|  | </div> | 
|  | <p>First, explicitly specify the kind of data (MNIST, ImageRecord, etc.) to fetch. Then, provide the options for the dataset, batching, image augmentation, multi-threaded processing, and prefetching operations. The code automatically validates the parameters. If a required parameter is missing, MXNet returns an error.</p> | 
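|  | <p>Once the iterator exists, it is typically consumed batch by batch. The sketch below is an illustration, not the definitive usage: it assumes that the Scala <code class="docutils literal"><span class="pre">DataIter</span></code> behaves like a standard Scala iterator of batches (<code class="docutils literal"><span class="pre">hasNext</span></code>, <code class="docutils literal"><span class="pre">next()</span></code>), that it offers <code class="docutils literal"><span class="pre">reset()</span></code>, and that each batch exposes <code class="docutils literal"><span class="pre">data</span></code> and <code class="docutils literal"><span class="pre">label</span></code> sequences. Check these names against the IO API reference for your MXNet version.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>import ml.dmlc.mxnet.IO | 
|  |  | 
|  | val dataiter = IO.ImageRecordIter(Map( | 
|  |   "path_imgrec" -> "data/cifar/train.rec", | 
|  |   "data_shape" -> "(3,28,28)", | 
|  |   "batch_size" -> "100")) | 
|  |  | 
|  | // Assumption: DataIter can be walked like a Scala Iterator of batches. | 
|  | var batchCount = 0 | 
|  | while (dataiter.hasNext) { | 
|  |   val batch = dataiter.next() | 
|  |   // Assumption: data and label hold the arrays for this batch. | 
|  |   val images = batch.data.head | 
|  |   val labels = batch.label.head | 
|  |   batchCount += 1 | 
|  | } | 
|  | println(s"Read $batchCount batches") | 
|  |  | 
|  | // Rewind the iterator before the next pass over the data. | 
|  | dataiter.reset() | 
|  | </pre></div> | 
|  | </div> | 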
|  | </div> | 
|  | <div class="section" id="how-to-get-data"> | 
|  | <span id="how-to-get-data"></span><h2>How to Get Data<a class="headerlink" href="#how-to-get-data" title="Permalink to this headline">¶</a></h2> | 
|  | <p>We provide <a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/scala-package/core/scripts">scripts</a> to download MNIST data and CIFAR10 ImageRecord data. If you want to create your own dataset, we recommend using the Image RecordIO data format.</p> | 
|  | </div> | 
|  | <div class="section" id="create-a-dataset-using-recordio"> | 
|  | <span id="create-a-dataset-using-recordio"></span><h2>Create a Dataset Using RecordIO<a class="headerlink" href="#create-a-dataset-using-recordio" title="Permalink to this headline">¶</a></h2> | 
|  | <p>RecordIO implements a file format for a sequence of records. We recommend storing images as records and packing them together. The benefits include:</p> | 
|  | <ul class="simple"> | 
|  | <li>Storing images in a compact format, such as JPEG, greatly reduces the size of the dataset on the disk.</li> | 
|  | <li>Packing data together allows continuous reading on the disk.</li> | 
|  | <li>RecordIO has a simple way to partition the data, which simplifies distributed settings. We provide an example later.</li> | 
|  | </ul> | 
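|  | <p>To illustrate the partitioning benefit, the sketch below gives each worker its own slice of one RecordIO file. It assumes that <code class="docutils literal"><span class="pre">ImageRecordIter</span></code> accepts the <code class="docutils literal"><span class="pre">num_parts</span></code> and <code class="docutils literal"><span class="pre">part_index</span></code> options and that the <code class="docutils literal"><span class="pre">ml.dmlc.mxnet</span></code> package is on the classpath. The worker count and rank are hypothetical values that a real job would obtain from its launcher.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>import ml.dmlc.mxnet.IO | 
|  |  | 
|  | // Hypothetical values: this process is worker 0 of 4. | 
|  | val numWorkers = 4 | 
|  | val workerRank = 0 | 
|  |  | 
|  | // Each worker reads only its own part of the same .rec file. | 
|  | val partIter = IO.ImageRecordIter(Map( | 
|  |   "path_imgrec" -> "data/cifar/train.rec", | 
|  |   "data_shape" -> "(3,28,28)", | 
|  |   "batch_size" -> "100", | 
|  |   "num_parts" -> numWorkers.toString,    // assumed option: number of partitions | 
|  |   "part_index" -> workerRank.toString))  // assumed option: partition read by this worker | 
|  | </pre></div> | 
|  | </div> | 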
|  | <p>We provide the <a class="reference external" href="https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc">im2rec tool</a> so you can create your own Image RecordIO dataset. The following walkthrough shows you how.</p> | 
|  | <div class="section" id="prerequisites"> | 
|  | <span id="prerequisites"></span><h3>Prerequisites<a class="headerlink" href="#prerequisites" title="Permalink to this headline">¶</a></h3> | 
|  | <p>Download the data. You don’t need to resize the images manually. You can use <code class="docutils literal"><span class="pre">im2rec</span></code> to resize them automatically. For details, see “Extension: Multiple Labels for a Single Image,” later in this topic.</p> | 
|  | </div> | 
|  | <div class="section" id="step-1-make-an-image-list-file"> | 
|  | <span id="step-1-make-an-image-list-file"></span><h3>Step 1. Make an Image List File<a class="headerlink" href="#step-1-make-an-image-list-file" title="Permalink to this headline">¶</a></h3> | 
|  | <p>After you download the data, you need to make an image list file.  The format is:</p> | 
|  | <div class="highlight-python"><div class="highlight"><pre><span></span>    integer_image_index \t label_index \t path_to_image | 
|  | </pre></div> | 
|  | </div> | 
|  | <p>Typically, the program takes the list of names of all of the images, shuffles them, then separates them into two lists: a training filename list and a testing filename list. Write each list in the format shown above.</p> | 
|  | <p>This is an example file:</p> | 
|  | <div class="highlight-bash"><div class="highlight"><pre><span></span>    <span class="m">95099</span>  <span class="m">464</span>     n04467665_17283.JPEG | 
|  | <span class="m">10025081</span>        <span class="m">412</span>     ILSVRC2010_val_00025082.JPEG | 
|  | <span class="m">74181</span>   <span class="m">789</span>     n01915811_2739.JPEG | 
|  | <span class="m">10035553</span>        <span class="m">859</span>     ILSVRC2010_val_00035554.JPEG | 
|  | <span class="m">10048727</span>        <span class="m">929</span>     ILSVRC2010_val_00048728.JPEG | 
|  | <span class="m">94028</span>   <span class="m">924</span>     n01980166_4956.JPEG | 
|  | <span class="m">1080682</span> <span class="m">650</span>     n11807979_571.JPEG | 
|  | <span class="m">972457</span>  <span class="m">633</span>     n07723039_1627.JPEG | 
|  | <span class="m">7534</span>    <span class="m">11</span>      n01630670_4486.JPEG | 
|  | <span class="m">1191261</span> <span class="m">249</span>     n12407079_5106.JPEG | 
|  | </pre></div> | 
|  | </div> | 
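|  | <p>The shuffle, split, and write steps can be sketched in plain Scala as follows. This is an illustration under stated assumptions rather than a tool shipped with MXNet: the object name, the helper names, the file names, and the labels are all hypothetical, and the index is simply the position of each image in its list.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>import java.io.PrintWriter | 
|  | import scala.util.Random | 
|  |  | 
|  | object MakeImageList { | 
|  |   // Formats (relative path, label) pairs as "index \t label \t path" lines. | 
|  |   def toListLines(items: Seq[(String, Int)]): Seq[String] = | 
|  |     items.zipWithIndex.map { case ((path, label), idx) => s"$idx\t$label\t$path" } | 
|  |  | 
|  |   // Shuffles with a fixed seed, then splits into (training, testing) lists. | 
|  |   def shuffleAndSplit[A](items: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = { | 
|  |     val shuffled = new Random(seed).shuffle(items) | 
|  |     shuffled.splitAt((items.size * trainFraction).toInt) | 
|  |   } | 
|  |  | 
|  |   // Writes one list line per row of the output file. | 
|  |   def writeLines(file: String, lines: Seq[String]): Unit = { | 
|  |     val out = new PrintWriter(file) | 
|  |     try lines.foreach(line => out.println(line)) finally out.close() | 
|  |   } | 
|  |  | 
|  |   def main(args: Array[String]): Unit = { | 
|  |     // Hypothetical image paths and labels, for illustration only. | 
|  |     val items = Seq("cat/001.jpg" -> 0, "cat/002.jpg" -> 0, "dog/001.jpg" -> 1, "dog/002.jpg" -> 1) | 
|  |     val (train, test) = shuffleAndSplit(items, trainFraction = 0.75, seed = 42L) | 
|  |     writeLines("train.lst", toListLines(train)) | 
|  |     writeLines("test.lst", toListLines(test)) | 
|  |   } | 
|  | } | 
|  | </pre></div> | 
|  | </div> | 
|  | <p>A fixed seed keeps the split reproducible, so the training and testing lists can be regenerated without images moving between them.</p> | 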
|  | </div> | 
|  | <div class="section" id="step-2-create-the-binary-file"> | 
|  | <span id="step-2-create-the-binary-file"></span><h3>Step 2. Create the Binary File<a class="headerlink" href="#step-2-create-the-binary-file" title="Permalink to this headline">¶</a></h3> | 
|  | <p>To generate the binary file, use <code class="docutils literal"><span class="pre">im2rec</span></code> in the tool folder. <code class="docutils literal"><span class="pre">im2rec</span></code> takes the path of the <em>image list file</em> you generated, the <em>root path</em> of the images, and the <em>output file path</em> as input. This process usually takes several hours, so be patient.</p> | 
|  | <p>A sample command:</p> | 
|  | <div class="highlight-bash"><div class="highlight"><pre><span></span>    ./bin/im2rec image.lst image_root_dir output.bin <span class="nv">resize</span><span class="o">=</span><span class="m">256</span> | 
|  | </pre></div> | 
|  | </div> | 
|  | <p>For more details, run <code class="docutils literal"><span class="pre">./bin/im2rec</span></code>.</p> | 
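|  | <p>If the conversion is one step of a Scala data pipeline, the same command can be launched from code. The sketch below uses the standard <code class="docutils literal"><span class="pre">scala.sys.process</span></code> package; the object and helper names are hypothetical, and the argument order simply mirrors the sample command above.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>import scala.sys.process._ | 
|  |  | 
|  | object RunIm2rec { | 
|  |   // Builds the arguments in the order of the sample command: | 
|  |   // list file, image root, output file, then key=value options. | 
|  |   def im2recCommand(listFile: String, imageRoot: String, output: String, | 
|  |                     options: Seq[(String, String)]): Seq[String] = | 
|  |     Seq("./bin/im2rec", listFile, imageRoot, output) ++ | 
|  |       options.map { case (key, value) => s"$key=$value" } | 
|  |  | 
|  |   def main(args: Array[String]): Unit = { | 
|  |     val cmd = im2recCommand("image.lst", "image_root_dir", "output.bin", Seq("resize" -> "256")) | 
|  |     // Runs the command, waits for it to finish, and returns its exit code. | 
|  |     val exitCode = cmd.! | 
|  |     require(exitCode == 0, s"im2rec exited with code $exitCode") | 
|  |   } | 
|  | } | 
|  | </pre></div> | 
|  | </div> | 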
|  | </div> | 
|  | <div class="section" id="extension-multiple-labels-for-a-single-image"> | 
|  | <span id="extension-multiple-labels-for-a-single-image"></span><h3>Extension: Multiple Labels for a Single Image<a class="headerlink" href="#extension-multiple-labels-for-a-single-image" title="Permalink to this headline">¶</a></h3> | 
|  | <p>The <code class="docutils literal"><span class="pre">im2rec</span></code> tool and <code class="docutils literal"><span class="pre">IO.ImageRecordIter</span></code> have multi-label support for a single image. | 
|  | For example, if you have four labels for a single image, you can use the following procedure to use the RecordIO tools.</p> | 
|  | <ol> | 
|  | <li><p class="first">Write the image list files as follows:</p> | 
|  | <div class="highlight-python"><div class="highlight"><pre><span></span>    integer_image_index \t label_1 \t label_2 \t   label_3 \t label_4 \t path_to_image | 
|  | </pre></div> | 
|  | </div> | 
|  | </li> | 
|  | <li><p class="first">Run <code class="docutils literal"><span class="pre">im2rec</span></code>, adding <code class="docutils literal"><span class="pre">label_width=4</span></code> to the command arguments, for example:</p> | 
|  | </li> | 
|  | </ol> | 
|  | <div class="highlight-bash"><div class="highlight"><pre><span></span>         ./bin/im2rec image.lst image_root_dir output.bin <span class="nv">resize</span><span class="o">=</span><span class="m">256</span> <span class="nv">label_width</span><span class="o">=</span><span class="m">4</span> | 
|  | </pre></div> | 
|  | </div> | 
|  | <ol class="simple" start="3"> | 
|  | <li>In the iterator generation code, set <code class="docutils literal"><span class="pre">label_width=4</span></code> and set <code class="docutils literal"><span class="pre">path_imglist</span></code> to the path of your <code class="docutils literal"><span class="pre">image.lst</span></code> file, for example:</li> | 
|  | </ol> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>         <span class="k">val</span> <span class="n">dataiter</span> <span class="k">=</span> <span class="nc">IO</span><span class="o">.</span><span class="nc">ImageRecordIter</span><span class="o">(</span><span class="nc">Map</span><span class="o">(</span> | 
|  | <span class="s">"path_imgrec"</span> <span class="o">-></span> <span class="s">"data/cifar/train.rec"</span><span class="o">,</span> | 
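|  | <span class="s">"data_shape"</span> <span class="o">-></span> <span class="s">"(3,28,28)"</span><span class="o">,</span> | 
|  | <span class="s">"batch_size"</span> <span class="o">-></span> <span class="s">"100"</span><span class="o">,</span> | 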
|  | <span class="s">"path_imglist"</span> <span class="o">-></span> <span class="s">"data/cifar/image.lst"</span><span class="o">,</span> | 
|  | <span class="s">"label_width"</span> <span class="o">-></span> <span class="s">"4"</span> | 
|  | <span class="o">))</span> | 
|  | </pre></div> | 
|  | </div> | 
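|  | <p>The multi-label list format from the first step of this procedure can be produced with a small helper. The sketch below is plain Scala; the object name, the helper name, the labels, and the image path are hypothetical and serve only to show the tab-separated layout.</p> | 
|  | <div class="highlight-scala"><div class="highlight"><pre><span></span>object MultiLabelList { | 
|  |   // One line per image: index, then every label, then the path, all tab-separated. | 
|  |   def toLine(index: Int, labels: Seq[Int], path: String): String = | 
|  |     (Seq(index.toString) ++ labels.map(_.toString) ++ Seq(path)).mkString("\t") | 
|  |  | 
|  |   def main(args: Array[String]): Unit = { | 
|  |     // Four labels for a single image, matching label_width=4. | 
|  |     println(toLine(0, Seq(1, 0, 3, 2), "images/0001.jpg")) | 
|  |   } | 
|  | } | 
|  | </pre></div> | 
|  | </div> | 
|  | <p>The number of labels on every line should equal the <code class="docutils literal"><span class="pre">label_width</span></code> value passed to both <code class="docutils literal"><span class="pre">im2rec</span></code> and the iterator.</p> | 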
|  | </div> | 
|  | </div> | 
|  | <div class="section" id="next-steps"> | 
|  | <span id="next-steps"></span><h2>Next Steps<a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h2> | 
|  | <ul class="simple"> | 
|  | <li><a class="reference internal" href="ndarray.html"><em>NDArray API</em></a> for vector/matrix/tensor operations</li> | 
|  | <li><a class="reference internal" href="kvstore.html"><em>KVStore API</em></a> for multi-GPU and multi-host distributed training</li> | 
|  | </ul> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div aria-label="main navigation" class="sphinxsidebar rightsidebar" role="navigation"> | 
|  | <div class="sphinxsidebarwrapper"> | 
|  | <h3><a href="../../index.html">Table Of Contents</a></h3> | 
|  | <ul> | 
|  | <li><a class="reference internal" href="#">MXNet Scala Data Loading API</a><ul> | 
|  | <li><a class="reference internal" href="#data-iterator-parameters">Data Iterator Parameters</a></li> | 
|  | <li><a class="reference internal" href="#create-a-data-iterator">Create a Data Iterator</a></li> | 
|  | <li><a class="reference internal" href="#how-to-get-data">How to Get Data</a></li> | 
|  | <li><a class="reference internal" href="#create-a-dataset-using-recordio">Create a Dataset Using RecordIO</a><ul> | 
|  | <li><a class="reference internal" href="#prerequisites">Prerequisites</a></li> | 
|  | <li><a class="reference internal" href="#step-1-make-an-image-list-file">Step 1. Make an Image List File</a></li> | 
|  | <li><a class="reference internal" href="#step-2-create-the-binary-file">Step 2. Create the Binary File</a></li> | 
|  | <li><a class="reference internal" href="#extension-multiple-labels-for-a-single-image">Extension: Multiple Labels for a Single Image</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li><a class="reference internal" href="#next-steps">Next Steps</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | </ul> | 
|  | </div> | 
|  | </div> | 
|  | </div><div class="footer"> | 
|  | <div class="section-disclaimer"> | 
|  | <div class="container"> | 
|  | <div> | 
|  | <img height="60" src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/apache_incubator_logo.png"/> | 
|  | <p> | 
|  | Apache MXNet is an effort undergoing incubation at The Apache Software Foundation (ASF), <strong>sponsored by the <i>Apache Incubator</i></strong>. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. | 
|  | </p> | 
|  | <p> | 
|  | "Copyright © 2017, The Apache Software Foundation | 
|  | Apache MXNet, MXNet, Apache, the Apache feather, and the Apache MXNet project logo are either registered trademarks or trademarks of the Apache Software Foundation." | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | </div> <!-- pagename != index --> | 
|  | </div> | 
|  | <script crossorigin="anonymous" integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script> | 
|  | <script src="../../_static/js/sidebar.js" type="text/javascript"></script> | 
|  | <script src="../../_static/js/search.js" type="text/javascript"></script> | 
|  | <script src="../../_static/js/navbar.js" type="text/javascript"></script> | 
|  | <script src="../../_static/js/clipboard.min.js" type="text/javascript"></script> | 
|  | <script src="../../_static/js/copycode.js" type="text/javascript"></script> | 
|  | <script src="../../_static/js/page.js" type="text/javascript"></script> | 
|  | <script type="text/javascript"> | 
|  | $('body').ready(function () { | 
|  | $('body').css('visibility', 'visible'); | 
|  | }); | 
|  | </script> | 
|  | </body> | 
|  | </html> |