<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Model Configuration &mdash; incubator-singa 0.3.0 documentation</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="top" title="incubator-singa 0.3.0 documentation" href="../index.html"/>
<script src="../_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> incubator-singa
<img src="../_static/singa.png" class="logo" />
</a>
<div class="version">
0.3.0
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="../downloads.html">Download SINGA</a></li>
<li class="toctree-l1"><a class="reference internal" href="index.html">Documentation</a></li>
</ul>
<p class="caption"><span class="caption-text">Development</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../develop/schedule.html">Development Schedule</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/how-contribute.html">How to Contribute to SINGA</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/contribute-code.html">How to Contribute Code</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/contribute-docs.html">How to Contribute Documentation</a></li>
</ul>
<p class="caption"><span class="caption-text">Community</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../community/source-repository.html">Source Repository</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/mail-lists.html">Project Mailing Lists</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/issue-tracking.html">Issue Tracking</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/team-list.html">The SINGA Team</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">incubator-singa</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Model Configuration</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="model-configuration">
<span id="model-configuration"></span><h1>Model Configuration<a class="headerlink" href="#model-configuration" title="Permalink to this headline"></a></h1>
<hr class="docutils" />
<p>SINGA uses the stochastic gradient descent (SGD) algorithm to train the parameters
of deep learning models. In each SGD iteration, a
<a class="reference external" href="architecture.html">Worker</a> computes the
gradients of the parameters over the NeuralNet, and an <a class="reference external" href="#">Updater</a> updates the parameter
values based on those gradients. Hence the model configuration mainly consists of these
three parts. We introduce the NeuralNet, Worker and Updater in the
following paragraphs and describe their configurations. All model
configuration is specified in the model.conf file in the user-provided
workspace folder. E.g., the <a class="reference external" href="https://github.com/apache/incubator-singa/tree/master/examples/cifar10">cifar10 example folder</a>
has a model.conf file.</p>
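<p>To give a sense of the overall file layout, a minimal model.conf skeleton might look as follows. This is an illustrative sketch only; the exact field names and nesting are defined by the protocol buffers discussed below and may differ across SINGA versions:</p>
<div class="highlight-default"><div class="highlight"><pre>name: "example-model"   # model name
train_steps: 1000       # total training iterations
alg: kBP                # gradient calculation algorithm (see Worker)
updater {               # parameter updating (see Updater)
  type: kSGD
  base_lr: 0.01
}
layer {                 # model structure (see NeuralNet)
  # layer configuration
}
layer {
  # layer configuration
}
</pre></div></div>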
<div class="section" id="neuralnet">
<span id="neuralnet"></span><h2>NeuralNet<a class="headerlink" href="#neuralnet" title="Permalink to this headline"></a></h2>
<div class="section" id="uniform-model-neuralnet-representation">
<span id="uniform-model-neuralnet-representation"></span><h3>Uniform model (neuralnet) representation<a class="headerlink" href="#uniform-model-neuralnet-representation" title="Permalink to this headline"></a></h3>
<p><img src = "../_static/images/model-categorization.png" style = "width: 400px"> Fig. 1:
Deep learning model categorization</img></p>
<p>Many deep learning models have been proposed. Fig. 1 categorizes
popular deep learning models based on their layer connections. The
<a class="reference external" href="https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h">NeuralNet</a>
abstraction of SINGA consists of multiple layers linked by directed connections. This
abstraction is able to represent models from all three categories.</p>
<ul class="simple">
<li>For the feed-forward models, their connections are already directed.</li>
<li>For the RNN models, we unroll them into directed connections, as shown in
Fig. 3.</li>
<li>For the undirected connections in RBM, DBM, etc., we replace each undirected
connection with two directed connections, as shown in Fig. 2.</li>
</ul>
<div style = "height: 200px">
<div style = "float:left; text-align: center">
<img src = "../_static/images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: Unroll RBM </img>
</div>
<div style = "float:left; text-align: center; margin-left: 40px">
<img src = "../_static/images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: Unroll RNN </img>
</div>
</div><p>Specifically, the NeuralNet class is defined in
<a class="reference external" href="https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h">neuralnet.h</a> :</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">...</span>
<span class="n">vector</span><span class="o">&lt;</span><span class="n">Layer</span><span class="o">*&gt;</span> <span class="n">layers_</span><span class="p">;</span>
<span class="o">...</span>
</pre></div>
</div>
<p>The Layer class is defined in
<a class="reference external" href="https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h">base_layer.h</a>:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Layer</span><span class="o">*&gt;</span> <span class="n">srclayers_</span><span class="p">,</span> <span class="n">dstlayers_</span><span class="p">;</span>
<span class="n">LayerProto</span> <span class="n">layer_proto_</span><span class="p">;</span> <span class="o">//</span> <span class="n">layer</span> <span class="n">configuration</span><span class="p">,</span> <span class="n">including</span> <span class="n">meta</span> <span class="n">info</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="p">,</span> <span class="n">name</span>
<span class="o">...</span>
</pre></div>
</div>
<p>The connections to other layers are kept in <code class="docutils literal"><span class="pre">srclayers_</span></code> and <code class="docutils literal"><span class="pre">dstlayers_</span></code>.
Since there are many different feature transformations, there are
correspondingly many different Layer implementations. Layers whose feature
transformation functions have parameters hold Param
instances in the layer class, e.g.,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Param</span> <span class="n">weight</span><span class="p">;</span>
</pre></div>
</div>
</div>
<div class="section" id="configure-the-structure-of-a-neuralnet-instance">
<span id="configure-the-structure-of-a-neuralnet-instance"></span><h3>Configure the structure of a NeuralNet instance<a class="headerlink" href="#configure-the-structure-of-a-neuralnet-instance" title="Permalink to this headline"></a></h3>
<p>To train a deep learning model, the first step is to write the configurations
for the model structure, i.e., the layers and connections for the NeuralNet.
Like <a class="reference external" href="http://caffe.berkeleyvision.org/">Caffe</a>, we use the <a class="reference external" href="https://developers.google.com/protocol-buffers/">Google Protocol
Buffer</a> to define the
configuration protocol. The
<a class="reference external" href="https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto">NetProto</a>
specifies the configuration fields for a NeuralNet instance,</p>
<div class="highlight-default"><div class="highlight"><pre>message NetProto {
  repeated LayerProto layer = 1;
  ...
}
</pre></div></div>
<p>The configuration is then</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">layer</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">layer</span> <span class="n">configuration</span>
<span class="p">}</span>
<span class="n">layer</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">layer</span> <span class="n">configuration</span>
<span class="p">}</span>
<span class="o">...</span>
</pre></div>
</div>
<p>To configure the model structure, we just configure each layer involved in the model.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">message</span> <span class="n">LayerProto</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">the</span> <span class="n">layer</span> <span class="n">name</span> <span class="n">used</span> <span class="k">for</span> <span class="n">identification</span>
<span class="n">required</span> <span class="n">string</span> <span class="n">name</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="o">//</span> <span class="n">source</span> <span class="n">layer</span> <span class="n">names</span>
<span class="n">repeated</span> <span class="n">string</span> <span class="n">srclayers</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="o">//</span> <span class="n">parameters</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="p">,</span> <span class="n">weight</span> <span class="n">matrix</span> <span class="ow">or</span> <span class="n">bias</span> <span class="n">vector</span>
<span class="n">repeated</span> <span class="n">ParamProto</span> <span class="n">param</span> <span class="o">=</span> <span class="mi">12</span><span class="p">;</span>
<span class="o">//</span> <span class="n">the</span> <span class="n">layer</span> <span class="nb">type</span> <span class="kn">from</span> <span class="nn">the</span> <span class="n">enum</span> <span class="n">above</span>
<span class="n">required</span> <span class="n">LayerType</span> <span class="nb">type</span> <span class="o">=</span> <span class="mi">20</span><span class="p">;</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">convolution</span> <span class="n">layer</span>
<span class="n">optional</span> <span class="n">ConvolutionProto</span> <span class="n">convolution_conf</span> <span class="o">=</span> <span class="mi">30</span><span class="p">;</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">concatenation</span> <span class="n">layer</span>
<span class="n">optional</span> <span class="n">ConcateProto</span> <span class="n">concate_conf</span> <span class="o">=</span> <span class="mi">31</span><span class="p">;</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">dropout</span> <span class="n">layer</span>
<span class="n">optional</span> <span class="n">DropoutProto</span> <span class="n">dropout_conf</span> <span class="o">=</span> <span class="mi">33</span><span class="p">;</span>
<span class="o">...</span>
<span class="p">}</span>
</pre></div>
</div>
<p>A sample configuration for a feed-forward model looks like:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">layer</span> <span class="p">{</span>
<span class="n">name</span> <span class="p">:</span> <span class="s2">&quot;input&quot;</span>
<span class="nb">type</span> <span class="p">:</span> <span class="n">kRecordInput</span>
<span class="p">}</span>
<span class="n">layer</span> <span class="p">{</span>
<span class="n">name</span> <span class="p">:</span> <span class="s2">&quot;ip&quot;</span>
<span class="nb">type</span> <span class="p">:</span> <span class="n">kInnerProduct</span>
<span class="n">srclayers</span> <span class="p">:</span> <span class="s2">&quot;input&quot;</span>
<span class="n">param</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">parameter</span>
<span class="p">}</span>
<span class="n">innerproduct_conf</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">this</span> <span class="n">specific</span> <span class="n">layer</span>
<span class="p">}</span>
<span class="o">...</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The layer type list is defined in
<a class="reference external" href="https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto">LayerType</a>.
One type (kFoo) corresponds to one child class of Layer (FooLayer) and one
configuration field (foo_conf). All built-in layers are introduced in the <a class="reference external" href="layer.html">layer page</a>.</p>
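<p>For instance, following this naming convention, a dropout layer would be configured roughly as below. This is an illustrative sketch; the layer names are hypothetical and the concrete fields inside dropout_conf are defined in model.proto:</p>
<div class="highlight-default"><div class="highlight"><pre>layer {
  name : "drop1"
  type : kDropout
  srclayers : "ip"
  dropout_conf {
    # fields of DropoutProto, e.g., the dropout ratio
  }
}
</pre></div></div>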
</div>
</div>
<div class="section" id="worker">
<span id="worker"></span><h2>Worker<a class="headerlink" href="#worker" title="Permalink to this headline"></a></h2>
<p>At the beginning, the Worker initializes the values of the Param instances of
each layer, either randomly (according to a user-configured distribution) or by
loading them from a <a class="reference external" href="#">checkpoint file</a>. In each training iteration, the worker
visits the layers of the neural network to compute the gradients of each
layer&#8217;s Param instances. Corresponding to the three categories of models, there
are three different algorithms to compute the gradients of a neural network.</p>
<ol class="simple">
<li>Back-propagation (BP) for feed-forward models</li>
<li>Back-propagation through time (BPTT) for recurrent neural networks</li>
<li>Contrastive divergence (CD) for RBM, DBM, etc models.</li>
</ol>
<p>SINGA provides these three algorithms as three Worker implementations.
Users only need to specify in the model.conf file which algorithm
should be used. The configuration protocol is</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">message</span> <span class="n">ModelProto</span> <span class="p">{</span>
<span class="o">...</span>
<span class="n">enum</span> <span class="n">GradCalcAlg</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">BP</span> <span class="n">algorithm</span> <span class="k">for</span> <span class="n">feed</span><span class="o">-</span><span class="n">forward</span> <span class="n">models</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="p">,</span> <span class="n">CNN</span><span class="p">,</span> <span class="n">MLP</span><span class="p">,</span> <span class="n">RNN</span>
<span class="n">kBP</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="o">//</span> <span class="n">BPTT</span> <span class="k">for</span> <span class="n">recurrent</span> <span class="n">neural</span> <span class="n">networks</span>
<span class="n">kBPTT</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="o">//</span> <span class="n">CD</span> <span class="n">algorithm</span> <span class="k">for</span> <span class="n">RBM</span><span class="p">,</span> <span class="n">DBM</span> <span class="n">etc</span><span class="o">.</span><span class="p">,</span> <span class="n">models</span>
<span class="n">kCd</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">gradient</span> <span class="n">calculation</span> <span class="n">algorithm</span>
<span class="n">required</span> <span class="n">GradCalcAlg</span> <span class="n">alg</span> <span class="o">=</span> <span class="mi">8</span> <span class="p">[</span><span class="n">default</span> <span class="o">=</span> <span class="n">kBP</span><span class="p">];</span>
<span class="o">...</span>
<span class="p">}</span>
</pre></div>
</div>
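<p>For example, to train a feed-forward model with back-propagation, the model.conf file would simply contain the line:</p>
<div class="highlight-default"><div class="highlight"><pre>alg: kBP
</pre></div></div>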
<p>These algorithms override the TrainOneBatch function of the Worker. E.g., the
BPWorker implements it as</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">void</span> <span class="n">BPWorker</span><span class="p">::</span><span class="n">TrainOneBatch</span><span class="p">(</span><span class="nb">int</span> <span class="n">step</span><span class="p">,</span> <span class="n">Metric</span><span class="o">*</span> <span class="n">perf</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Forward</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">kTrain</span><span class="p">,</span> <span class="n">train_net_</span><span class="p">,</span> <span class="n">perf</span><span class="p">);</span>
<span class="n">Backward</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">train_net_</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The Forward function passes the raw input features of one mini-batch through
all layers, and the Backward function visits the layers in reverse order to
compute the gradients of the loss w.r.t each layer&#8217;s feature and each layer&#8217;s
Param objects. Different algorithms would visit the layers in different orders.
Some may traverse the neural network multiple times, e.g., the CDWorker&#8217;s
TrainOneBatch function is:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">void</span> <span class="n">CDWorker</span><span class="p">::</span><span class="n">TrainOneBatch</span><span class="p">(</span><span class="nb">int</span> <span class="n">step</span><span class="p">,</span> <span class="n">Metric</span><span class="o">*</span> <span class="n">perf</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PositivePhase</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">kTrain</span><span class="p">,</span> <span class="n">train_net_</span><span class="p">,</span> <span class="n">perf</span><span class="p">);</span>
<span class="n">NegativePhase</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">kTrain</span><span class="p">,</span> <span class="n">train_net_</span><span class="p">,</span> <span class="n">perf</span><span class="p">);</span>
<span class="n">GradientPhase</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">train_net_</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Each <code class="docutils literal"><span class="pre">*Phase</span></code> function would visit all layers one or multiple times.
All algorithms will finally call two functions of the Layer class:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span> <span class="o">/**</span>
<span class="o">*</span> <span class="n">Transform</span> <span class="n">features</span> <span class="kn">from</span> <span class="nn">connected</span> <span class="n">layers</span> <span class="n">into</span> <span class="n">features</span> <span class="n">of</span> <span class="n">this</span> <span class="n">layer</span><span class="o">.</span>
<span class="o">*</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">phase</span> <span class="n">kTrain</span><span class="p">,</span> <span class="n">kTest</span><span class="p">,</span> <span class="n">kPositive</span><span class="p">,</span> <span class="n">etc</span><span class="o">.</span>
<span class="o">*/</span>
<span class="n">virtual</span> <span class="n">void</span> <span class="n">ComputeFeature</span><span class="p">(</span><span class="n">Phase</span> <span class="n">phase</span><span class="p">,</span> <span class="n">Metric</span><span class="o">*</span> <span class="n">perf</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="o">/**</span>
<span class="o">*</span> <span class="n">Compute</span> <span class="n">gradients</span> <span class="k">for</span> <span class="n">parameters</span> <span class="p">(</span><span class="ow">and</span> <span class="n">connected</span> <span class="n">layers</span><span class="p">)</span><span class="o">.</span>
<span class="o">*</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">phase</span> <span class="n">kTrain</span><span class="p">,</span> <span class="n">kTest</span><span class="p">,</span> <span class="n">kPositive</span><span class="p">,</span> <span class="n">etc</span><span class="o">.</span>
<span class="o">*/</span>
<span class="n">virtual</span> <span class="n">void</span> <span class="n">ComputeGradient</span><span class="p">(</span><span class="n">Phase</span> <span class="n">phase</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</pre></div>
</div>
<p>All <a class="reference external" href="#">Layer implementations</a> must implement the above two functions.</p>
</div>
<div class="section" id="updater">
<span id="updater"></span><h2>Updater<a class="headerlink" href="#updater" title="Permalink to this headline"></a></h2>
<p>Once the gradients of parameters are computed, the Updater will update
parameter values. There are many SGD variants for updating parameters, like
<a class="reference external" href="http://arxiv.org/pdf/1212.5701v1.pdf">AdaDelta</a>,
<a class="reference external" href="http://www.magicbroom.info/Papers/DuchiHaSi10.pdf">AdaGrad</a>,
<a class="reference external" href="http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">RMSProp</a>,
<a class="reference external" href="http://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;user=DJ8Ep8YAAAAJ&amp;amp;citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C">Nesterov</a>
and SGD with momentum. The core functions of the Updater are:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">/**</span>
<span class="o">*</span> <span class="n">Update</span> <span class="n">parameter</span> <span class="n">values</span> <span class="n">based</span> <span class="n">on</span> <span class="n">gradients</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">step</span> <span class="n">training</span> <span class="n">step</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">param</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">the</span> <span class="n">Param</span> <span class="nb">object</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">grad_scale</span> <span class="n">scaling</span> <span class="n">factor</span> <span class="k">for</span> <span class="n">the</span> <span class="n">gradients</span>
<span class="o">*/</span>
<span class="n">void</span> <span class="n">Update</span><span class="p">(</span><span class="nb">int</span> <span class="n">step</span><span class="p">,</span> <span class="n">Param</span><span class="o">*</span> <span class="n">param</span><span class="p">,</span> <span class="nb">float</span> <span class="n">grad_scale</span><span class="o">=</span><span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
<span class="o">/**</span>
<span class="o">*</span> <span class="nd">@param</span> <span class="n">step</span> <span class="n">training</span> <span class="n">step</span>
<span class="o">*</span> <span class="nd">@return</span> <span class="n">the</span> <span class="n">learning</span> <span class="n">rate</span> <span class="k">for</span> <span class="n">this</span> <span class="n">step</span>
<span class="o">*/</span>
<span class="nb">float</span> <span class="n">GetLearningRate</span><span class="p">(</span><span class="nb">int</span> <span class="n">step</span><span class="p">);</span>
</pre></div>
</div>
<p>SINGA provides several built-in updaters and learning rate change methods.
Users can configure them according to the UpdaterProto</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">message</span> <span class="n">UpdaterProto</span> <span class="p">{</span>
<span class="n">enum</span> <span class="n">UpdaterType</span><span class="p">{</span>
<span class="o">//</span> <span class="n">normal</span> <span class="n">SGD</span> <span class="k">with</span> <span class="n">momentum</span> <span class="ow">and</span> <span class="n">weight</span> <span class="n">decay</span>
<span class="n">kSGD</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="o">//</span> <span class="n">adaptive</span> <span class="n">subgradient</span><span class="p">,</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">www</span><span class="o">.</span><span class="n">magicbroom</span><span class="o">.</span><span class="n">info</span><span class="o">/</span><span class="n">Papers</span><span class="o">/</span><span class="n">DuchiHaSi10</span><span class="o">.</span><span class="n">pdf</span>
<span class="n">kAdaGrad</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="o">//</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">www</span><span class="o">.</span><span class="n">cs</span><span class="o">.</span><span class="n">toronto</span><span class="o">.</span><span class="n">edu</span><span class="o">/~</span><span class="n">tijmen</span><span class="o">/</span><span class="n">csc321</span><span class="o">/</span><span class="n">slides</span><span class="o">/</span><span class="n">lecture_slides_lec6</span><span class="o">.</span><span class="n">pdf</span>
<span class="n">kRMSProp</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="o">//</span> <span class="n">Nesterov</span> <span class="n">first</span> <span class="n">optimal</span> <span class="n">gradient</span> <span class="n">method</span>
<span class="n">kNesterov</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">updater</span> <span class="nb">type</span>
<span class="n">required</span> <span class="n">UpdaterType</span> <span class="nb">type</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">[</span><span class="n">default</span><span class="o">=</span><span class="n">kSGD</span><span class="p">];</span>
<span class="o">//</span> <span class="n">configuration</span> <span class="k">for</span> <span class="n">RMSProp</span> <span class="n">algorithm</span>
<span class="n">optional</span> <span class="n">RMSPropProto</span> <span class="n">rmsprop_conf</span> <span class="o">=</span> <span class="mi">50</span><span class="p">;</span>
<span class="n">enum</span> <span class="n">ChangeMethod</span> <span class="p">{</span>
<span class="n">kFixed</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">kInverseT</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">kInverse</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">kExponential</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">kLinear</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">kStep</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="n">kFixedStep</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">//</span> <span class="n">change</span> <span class="n">method</span> <span class="k">for</span> <span class="n">learning</span> <span class="n">rate</span>
<span class="n">required</span> <span class="n">ChangeMethod</span> <span class="n">lr_change</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">[</span><span class="n">default</span> <span class="o">=</span> <span class="n">kFixed</span><span class="p">];</span>
<span class="n">optional</span> <span class="n">FixedStepProto</span> <span class="n">fixedstep_conf</span> <span class="o">=</span> <span class="mi">40</span><span class="p">;</span>
<span class="o">...</span>
<span class="n">optional</span> <span class="nb">float</span> <span class="n">momentum</span> <span class="o">=</span> <span class="mi">31</span> <span class="p">[</span><span class="n">default</span> <span class="o">=</span> <span class="mi">0</span><span class="p">];</span>
<span class="n">optional</span> <span class="nb">float</span> <span class="n">weight_decay</span> <span class="o">=</span> <span class="mi">32</span> <span class="p">[</span><span class="n">default</span> <span class="o">=</span> <span class="mi">0</span><span class="p">];</span>
<span class="o">//</span> <span class="n">base</span> <span class="n">learning</span> <span class="n">rate</span>
<span class="n">optional</span> <span class="nb">float</span> <span class="n">base_lr</span> <span class="o">=</span> <span class="mi">34</span> <span class="p">[</span><span class="n">default</span> <span class="o">=</span> <span class="mi">0</span><span class="p">];</span>
<span class="p">}</span>
</pre></div>
</div>
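<p>Using only the fields shown in the proto above, a typical updater section of model.conf could look like the following sketch (the numeric values are illustrative, not recommendations):</p>
<div class="highlight-default"><div class="highlight"><pre>updater {
  type: kSGD
  base_lr: 0.01
  momentum: 0.9
  weight_decay: 0.0005
  lr_change: kFixed
}
</pre></div></div>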
</div>
<div class="section" id="other-model-configuration-fields">
<span id="other-model-configuration-fields"></span><h2>Other model configuration fields<a class="headerlink" href="#other-model-configuration-fields" title="Permalink to this headline"></a></h2>
<p>Some other important configuration fields for training a deep learning model
are listed below:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">//</span> <span class="n">model</span> <span class="n">name</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="p">,</span> <span class="s2">&quot;cifar10-dcnn&quot;</span><span class="p">,</span> <span class="s2">&quot;mnist-mlp&quot;</span>
<span class="n">string</span> <span class="n">name</span><span class="p">;</span>
<span class="o">//</span> <span class="n">displaying</span> <span class="n">training</span> <span class="n">info</span> <span class="k">for</span> <span class="n">every</span> <span class="n">this</span> <span class="n">number</span> <span class="n">of</span> <span class="n">iterations</span><span class="p">,</span> <span class="n">default</span> <span class="ow">is</span> <span class="mi">0</span>
<span class="n">int32</span> <span class="n">display_freq</span><span class="p">;</span>
<span class="o">//</span> <span class="n">total</span> <span class="n">num</span> <span class="n">of</span> <span class="n">steps</span><span class="o">/</span><span class="n">iterations</span> <span class="k">for</span> <span class="n">training</span>
<span class="n">int32</span> <span class="n">train_steps</span><span class="p">;</span>
<span class="o">//</span> <span class="n">do</span> <span class="n">test</span> <span class="k">for</span> <span class="n">every</span> <span class="n">this</span> <span class="n">number</span> <span class="n">of</span> <span class="n">training</span> <span class="n">iterations</span><span class="p">,</span> <span class="n">default</span> <span class="ow">is</span> <span class="mi">0</span>
<span class="n">int32</span> <span class="n">test_freq</span><span class="p">;</span>
<span class="o">//</span> <span class="n">run</span> <span class="n">test</span> <span class="k">for</span> <span class="n">this</span> <span class="n">number</span> <span class="n">of</span> <span class="n">steps</span><span class="o">/</span><span class="n">iterations</span><span class="p">,</span> <span class="n">default</span> <span class="ow">is</span> <span class="mf">0.</span>
<span class="o">//</span> <span class="n">The</span> <span class="n">test</span> <span class="n">dataset</span> <span class="n">has</span> <span class="n">test_steps</span> <span class="o">*</span> <span class="n">batchsize</span> <span class="n">instances</span><span class="o">.</span>
<span class="n">int32</span> <span class="n">test_steps</span><span class="p">;</span>
<span class="o">//</span> <span class="n">do</span> <span class="n">checkpoint</span> <span class="k">for</span> <span class="n">every</span> <span class="n">this</span> <span class="n">number</span> <span class="n">of</span> <span class="n">training</span> <span class="n">steps</span><span class="p">,</span> <span class="n">default</span> <span class="ow">is</span> <span class="mi">0</span>
<span class="n">int32</span> <span class="n">checkpoint_freq</span><span class="p">;</span>
</pre></div>
</div>
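<p>Putting these fields together, a model configuration fragment might look like the following sketch (all values are illustrative examples, not recommendations):</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>// hypothetical model configuration fragment; values are illustrative
name: "cifar10-dcnn"   // model name
train_steps: 1000      // total number of training iterations
display_freq: 50       // print training info every 50 iterations
test_freq: 300         // run a test every 300 training iterations
test_steps: 100        // each test runs for 100 iterations
checkpoint_freq: 300   // create a checkpoint every 300 training steps
</pre></div>
</div>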
<p>The page on <a class="reference external" href="checkpoint.html">checkpoint and restore</a> has details on checkpoint-related fields.</p>
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2016 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'../',
VERSION:'0.3.0',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>
<div class="rst-versions shift-up" data-toggle="rst-versions" role="note" aria-label="versions">
<img src="../_static/apache.jpg">
<span class="rst-current-version" data-toggle="rst-current-version">
<span class="fa fa-book"> incubator-singa </span>
v: 0.3.0
<span class="fa fa-caret-down"></span>
</span>
<div class="rst-other-versions">
<dl>
<dt>Languages</dt>
<dd><a href="../../en/index.html">English</a></dd>
<dd><a href="../../zh/index.html">中文</a></dd>
<dd><a href="../../jp/index.html">日本語</a></dd>
<dd><a href="../../kr/index.html">한국어</a></dd>
</dl>
</div>
</div>
<a href="https://github.com/apache/incubator-singa">
<img style="position: absolute; top: 0; right: 0; border: 0; z-index: 10000;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_orange_ff7600.png"
alt="Fork me on GitHub">
</a>
</body>
</html>