<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Recurrent Neural Networks for Language Modelling &mdash; incubator-singa 0.3.0 documentation</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="top" title="incubator-singa 0.3.0 documentation" href="../index.html"/>
<script src="../_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> incubator-singa
<img src="../_static/singa.png" class="logo" />
</a>
<div class="version">
0.3.0
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="../downloads.html">Download SINGA</a></li>
<li class="toctree-l1"><a class="reference internal" href="index.html">Documentation</a></li>
</ul>
<p class="caption"><span class="caption-text">Development</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../develop/schedule.html">Development Schedule</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/how-contribute.html">How to Contribute to SINGA</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/contribute-code.html">How to Contribute Code</a></li>
<li class="toctree-l1"><a class="reference internal" href="../develop/contribute-docs.html">How to Contribute Documentation</a></li>
</ul>
<p class="caption"><span class="caption-text">Community</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../community/source-repository.html">Source Repository</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/mail-lists.html">Project Mailing Lists</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/issue-tracking.html">Issue Tracking</a></li>
<li class="toctree-l1"><a class="reference internal" href="../community/team-list.html">The SINGA Team</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">incubator-singa</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Recurrent Neural Networks for Language Modelling</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="recurrent-neural-networks-for-language-modelling">
<span id="recurrent-neural-networks-for-language-modelling"></span><h1>Recurrent Neural Networks for Language Modelling<a class="headerlink" href="#recurrent-neural-networks-for-language-modelling" title="Permalink to this headline"></a></h1>
<hr class="docutils" />
<p>Recurrent Neural Networks (RNNs) are widely used for modelling sequential data,
such as music and sentences. In this example, we use SINGA to train the
<a class="reference external" href="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf">RNN model</a>
proposed by Tomas Mikolov for <a class="reference external" href="https://en.wikipedia.org/wiki/Language_model">language modeling</a>.
The training objective (loss) is
to minimize the <a class="reference external" href="https://en.wikipedia.org/wiki/Perplexity">perplexity per word</a>, which
is equivalent to maximizing the probability of predicting the next word given the current word in
a sentence.</p>
<p>Unlike the <a class="reference external" href="cnn.html">CNN</a>, <a class="reference external" href="mlp.html">MLP</a>
and <a class="reference external" href="rbm.html">RBM</a> examples, which use built-in
layers and data records,
none of the layers in this example are built-in. This example therefore shows
how users can implement their own layers and data records.</p>
<div class="section" id="running-instructions">
<span id="running-instructions"></span><h2>Running instructions<a class="headerlink" href="#running-instructions" title="Permalink to this headline"></a></h2>
<p>In <em>SINGA_ROOT/examples/rnnlm/</em>, scripts are provided to run the training job.
First, the data is prepared by</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ cp Makefile.example Makefile
$ make download
$ make create
</pre></div>
</div>
<p>Second, to compile the source code under <em>examples/rnnlm/</em>, run</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ make rnnlm
</pre></div>
</div>
<p>An executable file <em>rnnlm.bin</em> will be generated.</p>
<p>Third, the training is started by passing <em>rnnlm.bin</em> and the job configuration
to <em>singa-run.sh</em>,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span># at SINGA_ROOT/
# export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
$ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf
</pre></div>
</div>
</div>
<div class="section" id="implementations">
<span id="implementations"></span><h2>Implementations<a class="headerlink" href="#implementations" title="Permalink to this headline"></a></h2>
<p><img src="../_static/images/rnnlm.png" align="center" width="400px"/>
<span><strong>Figure 1 - Net structure of the RNN model.</strong></span></p>
<p>The neural net structure is shown in Figure 1. Word records are loaded by
<code class="docutils literal"><span class="pre">DataLayer</span></code>. For every iteration, at most <code class="docutils literal"><span class="pre">max_window</span></code> word records are
processed. If a sentence-ending character is read, the <code class="docutils literal"><span class="pre">DataLayer</span></code> stops
loading immediately. <code class="docutils literal"><span class="pre">EmbeddingLayer</span></code> looks up a word embedding matrix to extract
feature vectors for the words loaded by the <code class="docutils literal"><span class="pre">DataLayer</span></code>. These features are transformed by the
<code class="docutils literal"><span class="pre">HiddenLayer</span></code>, which propagates them from left to right, so the
output feature for the word at position k is influenced by the words from position 0 to
k-1. Finally, <code class="docutils literal"><span class="pre">LossLayer</span></code> computes the cross-entropy loss (see below)
by predicting the next word of each word.
The cross-entropy loss is computed as</p>
<p><code class="docutils literal"><span class="pre">$$L(w_t)=-log</span> <span class="pre">P(w_{t+1}|w_t)$$</span></code></p>
<p>Given <code class="docutils literal"><span class="pre">$w_t$</span></code>, evaluating the above equation requires computing a probability distribution over all words in the vocabulary,
which is time consuming.
The <a class="reference external" href="https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz">RNNLM Toolkit</a>
accelerates the computation by factorizing it as</p>
<p><code class="docutils literal"><span class="pre">$$P(w_{t+1}|w_t)</span> <span class="pre">=</span> <span class="pre">P(C_{w_{t+1}}|w_t)</span> <span class="pre">*</span> <span class="pre">P(w_{t+1}|C_{w_{t+1}})$$</span></code></p>
<p>Words from the vocabulary are partitioned into a user-defined number of classes.
The first term on the right-hand side predicts the class of the next word; the second
term predicts the next word given its class. Both the number of classes and
the number of words in one class are much smaller than the vocabulary size, so the probabilities
can be computed much faster. For instance, with the 3,720-word vocabulary of this example
partitioned into 100 classes, each prediction involves a distribution over 100 classes plus
the few dozen words of one class, instead of all 3,720 words.</p>
<p>The perplexity per word is computed by,</p>
<p><code class="docutils literal"><span class="pre">$$PPL</span> <span class="pre">=</span> <span class="pre">10^{-</span> <span class="pre">avg_t</span> <span class="pre">log_{10}</span> <span class="pre">P(w_{t+1}|w_t)}$$</span></code></p>
<div class="section" id="data-preparation">
<span id="data-preparation"></span><h3>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this headline"></a></h3>
<p>We use a small dataset provided by the <a class="reference external" href="https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz">RNNLM Toolkit</a>.
It has 10,000 training sentences, with 71,350 words in total and 3,720 unique words.
The subsequent steps follow the instructions in
<a class="reference external" href="data.html">Data Preparation</a> to convert the
raw data into records and insert them into data stores.</p>
<div class="section" id="download-source-data">
<span id="download-source-data"></span><h4>Download source data<a class="headerlink" href="#download-source-data" title="Permalink to this headline"></a></h4>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># in SINGA_ROOT/examples/rnnlm/</span>
<span class="n">cp</span> <span class="n">Makefile</span><span class="o">.</span><span class="n">example</span> <span class="n">Makefile</span>
<span class="n">make</span> <span class="n">download</span>
</pre></div>
</div>
</div>
<div class="section" id="define-record-format">
<span id="define-record-format"></span><h4>Define record format<a class="headerlink" href="#define-record-format" title="Permalink to this headline"></a></h4>
<p>We define the word record as follows,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># in SINGA_ROOT/examples/rnnlm/rnnlm.proto</span>
<span class="n">message</span> <span class="n">WordRecord</span> <span class="p">{</span>
<span class="n">optional</span> <span class="n">string</span> <span class="n">word</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">word_index</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">class_index</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">class_start</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">class_end</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>It includes the word string and its index in the vocabulary.
Words in the vocabulary are sorted based on their frequency in the training dataset.
The sorted list is cut into 100 sublists such that each sublist accounts for 1/100 of the total
word frequency. Each sublist is called a class.
Hence each word has a <code class="docutils literal"><span class="pre">class_index</span></code> (in [0, 100)). The <code class="docutils literal"><span class="pre">class_start</span></code> is the index
of the first word in the same class as <code class="docutils literal"><span class="pre">word</span></code>. The <code class="docutils literal"><span class="pre">class_end</span></code> is the index of
the first word in the next class.</p>
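<p>As a toy illustration (all values below are made up), filling a <code class="docutils literal"><span class="pre">WordRecord</span></code>
with the generated protobuf setters could look like,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>// Hypothetical values for illustration only.
WordRecord record;
record.set_word("bank");     // the word string
record.set_word_index(42);   // index in the frequency-sorted vocabulary
record.set_class_index(3);   // the class (sublist) this word falls into
record.set_class_start(40);  // index of the first word of class 3
record.set_class_end(55);    // index of the first word of class 4
</pre></div>
</div>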
</div>
<div class="section" id="create-data-stores">
<span id="create-data-stores"></span><h4>Create data stores<a class="headerlink" href="#create-data-stores" title="Permalink to this headline"></a></h4>
<p>We use code from the RNNLM Toolkit to read words and sort them into classes.
The main function in <em>create_store.cc</em> first creates the word classes based on the training
dataset. It then calls the following function to create a data store for each of the
training, validation and test datasets.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="nb">int</span> <span class="n">create_data</span><span class="p">(</span><span class="n">const</span> <span class="n">char</span> <span class="o">*</span><span class="n">input_file</span><span class="p">,</span> <span class="n">const</span> <span class="n">char</span> <span class="o">*</span><span class="n">output_file</span><span class="p">);</span>
</pre></div>
</div>
<p><code class="docutils literal"><span class="pre">input</span></code> is the path to training/validation/testing text file from the RNNLM Toolkit, <code class="docutils literal"><span class="pre">output</span></code> is output store file.
This function starts with</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">singa</span><span class="p">::</span><span class="n">io</span><span class="p">::</span><span class="n">KVFile</span> <span class="n">store</span><span class="p">;</span>
<span class="n">store</span><span class="o">.</span><span class="n">Open</span><span class="p">(</span><span class="n">output</span><span class="p">,</span> <span class="n">signa</span><span class="p">::</span><span class="n">io</span><span class="p">::</span><span class="n">kCreate</span><span class="p">);</span>
</pre></div>
</div>
<p>Then it reads the words one by one. For each word it creates a <code class="docutils literal"><span class="pre">WordRecord</span></code> instance,
and inserts it into the store,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="nb">int</span> <span class="n">wcnt</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="o">//</span> <span class="n">word</span> <span class="n">count</span>
<span class="n">WordRecord</span> <span class="n">wordRecord</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">readWord</span><span class="p">(</span><span class="n">wordstr</span><span class="p">,</span> <span class="n">fin</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">feof</span><span class="p">(</span><span class="n">fin</span><span class="p">))</span> <span class="k">break</span><span class="p">;</span>
<span class="o">...//</span> <span class="n">fill</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">wordRecord</span><span class="p">;</span>
<span class="n">string</span> <span class="n">val</span><span class="p">;</span>
<span class="n">wordRecord</span><span class="o">.</span><span class="n">SerializeToString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">val</span><span class="p">);</span>
<span class="nb">int</span> <span class="n">length</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">BUFFER_LEN</span><span class="p">,</span> <span class="s2">&quot;</span><span class="si">%05d</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">wcnt</span><span class="o">++</span><span class="p">);</span>
<span class="n">store</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">string</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">length</span><span class="p">),</span> <span class="n">val</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Compilation and running commands are provided in the <em>Makefile.example</em>.
After executing</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">make</span> <span class="n">create</span>
</pre></div>
</div>
<p><em>train_data.bin</em>, <em>test_data.bin</em> and <em>valid_data.bin</em> will be created.</p>
</div>
</div>
<div class="section" id="layer-implementation">
<span id="layer-implementation"></span><h3>Layer implementation<a class="headerlink" href="#layer-implementation" title="Permalink to this headline"></a></h3>
<p>Four user-defined layers are implemented for this application.
Following the guide for implementing <a class="reference external" href="layer#implementing-a-new-layer-subclass">new Layer subclasses</a>,
we extend the <a class="reference external" href="../api/classsinga_1_1LayerProto.html">LayerProto</a>
to include the configuration messages of the user-defined layers as shown below
(three of these layers have specific configurations),</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="s2">&quot;job.proto&quot;</span><span class="p">;</span> <span class="o">//</span> <span class="n">Layer</span> <span class="n">message</span> <span class="k">for</span> <span class="n">SINGA</span> <span class="ow">is</span> <span class="n">defined</span>
<span class="o">//</span><span class="n">For</span> <span class="n">implementation</span> <span class="n">of</span> <span class="n">RNNLM</span> <span class="n">application</span>
<span class="n">extend</span> <span class="n">singa</span><span class="o">.</span><span class="n">LayerProto</span> <span class="p">{</span>
<span class="n">optional</span> <span class="n">EmbeddingProto</span> <span class="n">embedding_conf</span> <span class="o">=</span> <span class="mi">101</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">LossProto</span> <span class="n">loss_conf</span> <span class="o">=</span> <span class="mi">102</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">DataProto</span> <span class="n">data_conf</span> <span class="o">=</span> <span class="mi">103</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>In the subsequent sections, we describe the implementation of each layer,
including its configuration message.</p>
<div class="section" id="rnnlayer">
<span id="rnnlayer"></span><h4>RNNLayer<a class="headerlink" href="#rnnlayer" title="Permalink to this headline"></a></h4>
<p>This is the base layer of all other layers in this application. It is defined
as follows,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">RNNLayer</span> <span class="p">:</span> <span class="n">virtual</span> <span class="n">public</span> <span class="n">Layer</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
<span class="n">inline</span> <span class="nb">int</span> <span class="n">window</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">window_</span><span class="p">;</span> <span class="p">}</span>
<span class="n">protected</span><span class="p">:</span>
<span class="nb">int</span> <span class="n">window_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
</div>
<p>In this application, different iterations may process different numbers of words,
because sentences have different lengths.
The <code class="docutils literal"><span class="pre">DataLayer</span></code> decides the effective window size. All other layers query their source layers for the
effective window size and reset <code class="docutils literal"><span class="pre">window_</span></code> in their <code class="docutils literal"><span class="pre">ComputeFeature</span></code> functions, as sketched below.</p>
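<p>A minimal sketch of this pattern (assuming the first source layer is an <code class="docutils literal"><span class="pre">RNNLayer</span></code>) is,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>// inside a subclass's ComputeFeature(int flag, const vector&lt;Layer*&gt;&amp; srclayers):
// query the source layer for the effective window of this iteration
auto* src = dynamic_cast&lt;RNNLayer*&gt;(srclayers[0]);
window_ = src-&gt;window();  // reset window_ before processing the features
</pre></div>
</div>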
</div>
<div class="section" id="datalayer">
<span id="datalayer"></span><h4>DataLayer<a class="headerlink" href="#datalayer" title="Permalink to this headline"></a></h4>
<p><code class="docutils literal"><span class="pre">DataLayer</span></code> loads the word records.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">DataLayer</span> <span class="p">:</span> <span class="n">public</span> <span class="n">RNNLayer</span><span class="p">,</span> <span class="n">singa</span><span class="p">::</span><span class="n">InputLayer</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
<span class="n">void</span> <span class="n">Setup</span><span class="p">(</span><span class="n">const</span> <span class="n">LayerProto</span><span class="o">&amp;</span> <span class="n">proto</span><span class="p">,</span> <span class="n">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Layer</span><span class="o">*&gt;&amp;</span> <span class="n">srclayers</span><span class="p">)</span> <span class="n">override</span><span class="p">;</span>
<span class="n">void</span> <span class="n">ComputeFeature</span><span class="p">(</span><span class="nb">int</span> <span class="n">flag</span><span class="p">,</span> <span class="n">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Layer</span><span class="o">*&gt;&amp;</span> <span class="n">srclayers</span><span class="p">)</span> <span class="n">override</span><span class="p">;</span>
<span class="nb">int</span> <span class="n">max_window</span><span class="p">()</span> <span class="n">const</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">max_window_</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">private</span><span class="p">:</span>
<span class="nb">int</span> <span class="n">max_window_</span><span class="p">;</span>
<span class="n">singa</span><span class="p">::</span><span class="n">io</span><span class="p">::</span><span class="n">Store</span><span class="o">*</span> <span class="n">store_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
</div>
<p>The Setup function gets the user-configured maximum window size.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">max_window_</span> <span class="o">=</span> <span class="n">proto</span><span class="o">.</span><span class="n">GetExtension</span><span class="p">(</span><span class="n">input_conf</span><span class="p">)</span><span class="o">.</span><span class="n">max_window</span><span class="p">();</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">ComputeFeature</span></code> function loads at most max_window records. It could also
stop when the sentence ending character is encountered.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">...//</span> <span class="n">shift</span> <span class="n">the</span> <span class="n">last</span> <span class="n">record</span> <span class="n">to</span> <span class="n">the</span> <span class="n">first</span>
<span class="n">window_</span> <span class="o">=</span> <span class="n">max_window_</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="nb">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">max_window_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="o">//</span> <span class="n">load</span> <span class="n">record</span><span class="p">;</span> <span class="k">break</span> <span class="k">if</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">the</span> <span class="n">ending</span> <span class="n">character</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The configuration of <code class="docutils literal"><span class="pre">DataLayer</span></code> is like</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">name</span><span class="p">:</span> <span class="s2">&quot;data&quot;</span>
<span class="n">user_type</span><span class="p">:</span> <span class="s2">&quot;kData&quot;</span>
<span class="p">[</span><span class="n">data_conf</span><span class="p">]</span> <span class="p">{</span>
<span class="n">path</span><span class="p">:</span> <span class="s2">&quot;examples/rnnlm/train_data.bin&quot;</span>
<span class="n">max_window</span><span class="p">:</span> <span class="mi">10</span>
<span class="p">}</span>
</pre></div>
</div>
</div>
<div class="section" id="embeddinglayer">
<span id="embeddinglayer"></span><h4>EmbeddingLayer<a class="headerlink" href="#embeddinglayer" title="Permalink to this headline"></a></h4>
<p>This layer gets records from <code class="docutils literal"><span class="pre">DataLayer</span></code>. For each record, the word index is
parsed and used to get the corresponding word feature vector from the embedding
matrix.</p>
<p>The class is declared as follows,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">EmbeddingLayer</span> <span class="p">:</span> <span class="n">public</span> <span class="n">RNNLayer</span> <span class="p">{</span>
<span class="o">...</span>
<span class="n">const</span> <span class="n">std</span><span class="p">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Param</span><span class="o">*&gt;</span> <span class="n">GetParams</span><span class="p">()</span> <span class="n">const</span> <span class="n">override</span> <span class="p">{</span>
<span class="n">std</span><span class="p">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Param</span><span class="o">*&gt;</span> <span class="n">params</span><span class="p">{</span><span class="n">embed_</span><span class="p">};</span>
<span class="k">return</span> <span class="n">params</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">private</span><span class="p">:</span>
<span class="nb">int</span> <span class="n">word_dim_</span><span class="p">,</span> <span class="n">vocab_size_</span><span class="p">;</span>
<span class="n">Param</span><span class="o">*</span> <span class="n">embed_</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">embed_</span></code> field is a matrix whose values are parameter to be learned.
The matrix size is <code class="docutils literal"><span class="pre">vocab_size_</span></code> x <code class="docutils literal"><span class="pre">word_dim_</span></code>.</p>
<p>The Setup function reads configurations for <code class="docutils literal"><span class="pre">word_dim_</span></code> and <code class="docutils literal"><span class="pre">vocab_size_</span></code>. Then
it allocates feature Blob for <code class="docutils literal"><span class="pre">max_window</span></code> words and setups <code class="docutils literal"><span class="pre">embed_</span></code>.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="nb">int</span> <span class="n">max_window</span> <span class="o">=</span> <span class="n">srclayers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">data</span><span class="p">(</span><span class="n">this</span><span class="p">)</span><span class="o">.</span><span class="n">shape</span><span class="p">()[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">word_dim_</span> <span class="o">=</span> <span class="n">proto</span><span class="o">.</span><span class="n">GetExtension</span><span class="p">(</span><span class="n">embedding_conf</span><span class="p">)</span><span class="o">.</span><span class="n">word_dim</span><span class="p">();</span>
<span class="n">data_</span><span class="o">.</span><span class="n">Reshape</span><span class="p">(</span><span class="n">vector</span><span class="o">&lt;</span><span class="nb">int</span><span class="o">&gt;</span><span class="p">{</span><span class="n">max_window</span><span class="p">,</span> <span class="n">word_dim_</span><span class="p">});</span>
<span class="o">...</span>
<span class="n">embed_</span><span class="o">-&gt;</span><span class="n">Setup</span><span class="p">(</span><span class="n">vector</span><span class="o">&lt;</span><span class="nb">int</span><span class="o">&gt;</span><span class="p">{</span><span class="n">vocab_size_</span><span class="p">,</span> <span class="n">word_dim_</span><span class="p">});</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">ComputeFeature</span></code> function simply copies the feature vector from the <code class="docutils literal"><span class="pre">embed_</span></code>
matrix into the feature Blob.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># reset effective window size</span>
<span class="n">window_</span> <span class="o">=</span> <span class="n">datalayer</span><span class="o">-&gt;</span><span class="n">window</span><span class="p">();</span>
<span class="n">auto</span> <span class="n">records</span> <span class="o">=</span> <span class="n">datalayer</span><span class="o">-&gt;</span><span class="n">records</span><span class="p">();</span>
<span class="o">...</span>
<span class="k">for</span> <span class="p">(</span><span class="nb">int</span> <span class="n">t</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">t</span> <span class="o">&lt;</span> <span class="n">window_</span><span class="p">;</span> <span class="n">t</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">int</span> <span class="n">idx</span> <span class="o">&lt;-</span> <span class="n">word</span> <span class="n">index</span>
<span class="n">Copy</span><span class="p">(</span><span class="n">words</span><span class="p">[</span><span class="n">t</span><span class="p">],</span> <span class="n">embed</span><span class="p">[</span><span class="n">idx</span><span class="p">]);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">ComputeGradient</span></code> function copies back the gradients to the <code class="docutils literal"><span class="pre">embed_</span></code> matrix.</p>
<p>The configuration for <code class="docutils literal"><span class="pre">EmbeddingLayer</span></code> is like,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">user_type</span><span class="p">:</span> <span class="s2">&quot;kEmbedding&quot;</span>
<span class="p">[</span><span class="n">embedding_conf</span><span class="p">]</span> <span class="p">{</span>
<span class="n">word_dim</span><span class="p">:</span> <span class="mi">15</span>
<span class="n">vocab_size</span><span class="p">:</span> <span class="mi">3720</span>
<span class="p">}</span>
<span class="n">srclayers</span><span class="p">:</span> <span class="s2">&quot;data&quot;</span>
<span class="n">param</span> <span class="p">{</span>
<span class="n">name</span><span class="p">:</span> <span class="s2">&quot;w1&quot;</span>
<span class="n">init</span> <span class="p">{</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">kUniform</span>
<span class="n">low</span><span class="p">:</span><span class="o">-</span><span class="mf">0.3</span>
<span class="n">high</span><span class="p">:</span><span class="mf">0.3</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
</div>
<div class="section" id="hiddenlayer">
<span id="hiddenlayer"></span><h4>HiddenLayer<a class="headerlink" href="#hiddenlayer" title="Permalink to this headline"></a></h4>
<p>This layer unrolls the recurrent connections for at most max_window times.
The feature at position t is computed based on the feature from the embedding layer (position t)
and the feature at position t-1 of this layer. The formula is</p>
<p><code class="docutils literal"><span class="pre">$$f[t]=\sigma</span> <span class="pre">(f[t-1]*W+src[t])$$</span></code></p>
<p>where <code class="docutils literal"><span class="pre">$W$</span></code> is a matrix with <code class="docutils literal"><span class="pre">word_dim_</span></code> x <code class="docutils literal"><span class="pre">word_dim_</span></code> parameters.</p>
<p>This layer is the key one to study if you want to implement your own recurrent
neural network following this design.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">HiddenLayer</span> <span class="p">:</span> <span class="n">public</span> <span class="n">RNNLayer</span> <span class="p">{</span>
<span class="o">...</span>
<span class="n">const</span> <span class="n">std</span><span class="p">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Param</span><span class="o">*&gt;</span> <span class="n">GetParams</span><span class="p">()</span> <span class="n">const</span> <span class="n">override</span> <span class="p">{</span>
<span class="n">std</span><span class="p">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Param</span><span class="o">*&gt;</span> <span class="n">params</span><span class="p">{</span><span class="n">weight_</span><span class="p">};</span>
<span class="k">return</span> <span class="n">params</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">private</span><span class="p">:</span>
<span class="n">Param</span><span class="o">*</span> <span class="n">weight_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">Setup</span></code> function setups the weight matrix as</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">weight_</span><span class="o">-&gt;</span><span class="n">Setup</span><span class="p">(</span><span class="n">std</span><span class="p">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="nb">int</span><span class="o">&gt;</span><span class="p">{</span><span class="n">word_dim</span><span class="p">,</span> <span class="n">word_dim</span><span class="p">});</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">ComputeFeature</span></code> function gets the effective window size (<code class="docutils literal"><span class="pre">window_</span></code>) from its source layer
i.e., the embedding layer. Then it propagates the feature from position 0 to position
<code class="docutils literal"><span class="pre">window_</span></code> -1. The detailed descriptions for this process are illustrated as follows.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">void</span> <span class="n">HiddenLayer</span><span class="p">::</span><span class="n">ComputeFeature</span><span class="p">()</span> <span class="p">{</span>
<span class="k">for</span><span class="p">(</span><span class="nb">int</span> <span class="n">t</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">t</span> <span class="o">&lt;</span> <span class="n">window_size</span><span class="p">;</span> <span class="n">t</span><span class="o">++</span><span class="p">){</span>
<span class="k">if</span><span class="p">(</span><span class="n">t</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">Copy</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">t</span><span class="p">],</span> <span class="n">src</span><span class="p">[</span><span class="n">t</span><span class="p">]);</span>
<span class="k">else</span>
<span class="n">data</span><span class="p">[</span><span class="n">t</span><span class="p">]</span><span class="o">=</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">W</span> <span class="o">+</span> <span class="n">src</span><span class="p">[</span><span class="n">t</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The <code class="docutils literal"><span class="pre">ComputeGradient</span></code> function computes the gradient of the loss w.r.t. W and the source layer.
Particularly, for each position k, since data[k] contributes to data[k+1] and the feature
at position k in its destination layer (the loss layer), grad[k] should contains the gradient
from two parts. The destination layer has already computed the gradient from the loss layer into
grad[k]; In the <code class="docutils literal"><span class="pre">ComputeGradient</span></code> function, we need to add the gradient from position k+1.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">void</span> <span class="n">HiddenLayer</span><span class="p">::</span><span class="n">ComputeGradient</span><span class="p">(){</span>
<span class="o">...</span>
<span class="k">for</span> <span class="p">(</span><span class="nb">int</span> <span class="n">k</span> <span class="o">=</span> <span class="n">window_</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="n">k</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">k</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">k</span> <span class="o">&lt;</span> <span class="n">window_</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">grad</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">+=</span> <span class="n">dot</span><span class="p">(</span><span class="n">grad</span><span class="p">[</span><span class="n">k</span> <span class="o">+</span> <span class="mi">1</span><span class="p">],</span> <span class="n">weight</span><span class="o">.</span><span class="n">T</span><span class="p">());</span> <span class="o">//</span> <span class="n">add</span> <span class="n">gradient</span> <span class="kn">from</span> <span class="nn">position</span> <span class="n">t</span><span class="o">+</span><span class="mf">1.</span>
<span class="p">}</span>
<span class="n">grad</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=...</span> <span class="o">//</span> <span class="n">compute</span> <span class="n">gL</span><span class="o">/</span><span class="n">gy</span><span class="p">[</span><span class="n">t</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">t</span><span class="p">]</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">W</span><span class="o">+</span><span class="n">src</span><span class="p">[</span><span class="n">t</span><span class="p">]</span>
<span class="p">}</span>
<span class="n">gweight</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">Slice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">window_</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">T</span><span class="p">(),</span> <span class="n">grad</span><span class="o">.</span><span class="n">Slice</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">window_</span><span class="p">));</span>
<span class="n">Copy</span><span class="p">(</span><span class="n">gsrc</span><span class="p">,</span> <span class="n">grad</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>After the loop, we have the gradient of the loss w.r.t. y[k], which is used to
compute the gradients of W and src[k].</p>
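<p>In symbols, the loop implements (roughly) the back-propagation-through-time recurrence,</p>
<p><code class="docutils literal"><span class="pre">$$grad[k]</span> <span class="pre">\leftarrow</span> <span class="pre">grad[k]</span> <span class="pre">+</span> <span class="pre">grad[k+1]</span> <span class="pre">W^T,</span> <span class="pre">\qquad</span> <span class="pre">gW</span> <span class="pre">=</span> <span class="pre">\sum_{k</span> <span class="pre">\ge</span> <span class="pre">1}</span> <span class="pre">data[k-1]^T</span> <span class="pre">grad[k]$$</span></code></p>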
</div>
<div class="section" id="losslayer">
<span id="losslayer"></span><h4>LossLayer<a class="headerlink" href="#losslayer" title="Permalink to this headline"></a></h4>
<p>This layer computes the cross-entropy loss and <code class="docutils literal"><span class="pre">$\log_{10}P(w_{t+1}|w_t)$</span></code> (which
can be averaged over all words by users to get the PPL value).</p>
<p>There are two configuration fields to be specified by users.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">message</span> <span class="n">LossProto</span> <span class="p">{</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">nclass</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">optional</span> <span class="n">int32</span> <span class="n">vocab_size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>There are two weight matrices to be learned,</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">LossLayer</span> <span class="p">:</span> <span class="n">public</span> <span class="n">RNNLayer</span> <span class="p">{</span>
<span class="o">...</span>
<span class="n">private</span><span class="p">:</span>
<span class="n">Param</span><span class="o">*</span> <span class="n">word_weight_</span><span class="p">,</span> <span class="o">*</span><span class="n">class_weight_</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>The ComputeFeature function computes the two probabilities as follows,</p>
<p><code class="docutils literal"><span class="pre">$$P(C_{w_{t+1}}|w_t)</span> <span class="pre">=</span> <span class="pre">Softmax(w_t</span> <span class="pre">*</span> <span class="pre">class\_weight_)$$</span></code>
<code class="docutils literal"><span class="pre">$$P(w_{t+1}|C_{w_{t+1}})</span> <span class="pre">=</span> <span class="pre">Softmax(w_t</span> <span class="pre">*</span> <span class="pre">word\_weight[class\_start:class\_end])$$</span></code></p>
<p><code class="docutils literal"><span class="pre">$w_t$</span></code> is the feature from the hidden layer for the k-th word, its ground truth
next word is <code class="docutils literal"><span class="pre">$w_{t+1}$</span></code>. The first equation computes the probability distribution over all
classes for the next word. The second equation computes the
probability distribution over the words in the ground truth class for the next word.</p>
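<p>A self-contained sketch of this two-step computation for a single position (the helper and its
inputs are hypothetical; the real layer operates on Blobs and batches),</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;vector&gt;

// Numerically stable softmax over a vector of logits.
std::vector&lt;double&gt; Softmax(const std::vector&lt;double&gt;&amp; logits) {
  double mx = logits[0];
  for (double v : logits) mx = std::max(mx, v);
  std::vector&lt;double&gt; p(logits.size());
  double sum = 0.0;
  for (size_t i = 0; i &lt; logits.size(); i++)
    sum += (p[i] = std::exp(logits[i] - mx));
  for (double&amp; v : p) v /= sum;
  return p;
}

// Loss at one position: -log(P(class of next word) * P(next word | class)).
// class_logits : w_t * class_weight_                  (one logit per class)
// word_logits  : w_t * word_weight restricted to [class_start, class_end)
// true_class   : ground-truth class index of the next word
// word_offset  : ground-truth word index minus class_start
double PositionLoss(const std::vector&lt;double&gt;&amp; class_logits, int true_class,
                    const std::vector&lt;double&gt;&amp; word_logits, int word_offset) {
  double pclass = Softmax(class_logits)[true_class];
  double pword = Softmax(word_logits)[word_offset];
  return -std::log(pclass * pword);
}
</pre></div>
</div>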
<p>The ComputeGradient function computes the gradients w.r.t. the source layer
(i.e., the hidden layer) and the two weight matrices.</p>
</div>
</div>
<div class="section" id="updater-configuration">
<span id="updater-configuration"></span><h3>Updater Configuration<a class="headerlink" href="#updater-configuration" title="Permalink to this headline"></a></h3>
<p>We employ the kFixedStep learning-rate change method; the
configuration is as follows. We decay the learning rate once the performance stops
improving on the validation dataset. With the steps and rates below, iterations [0, 48810)
use learning rate 0.1, iterations [48810, 56945) use 0.05, and so on.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">updater</span><span class="p">{</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">kSGD</span>
<span class="n">learning_rate</span> <span class="p">{</span>
<span class="nb">type</span><span class="p">:</span> <span class="n">kFixedStep</span>
<span class="n">fixedstep_conf</span><span class="p">:{</span>
<span class="n">step</span><span class="p">:</span><span class="mi">0</span>
<span class="n">step</span><span class="p">:</span><span class="mi">48810</span>
<span class="n">step</span><span class="p">:</span><span class="mi">56945</span>
<span class="n">step</span><span class="p">:</span><span class="mi">65080</span>
<span class="n">step</span><span class="p">:</span><span class="mi">73215</span>
<span class="n">step_lr</span><span class="p">:</span><span class="mf">0.1</span>
<span class="n">step_lr</span><span class="p">:</span><span class="mf">0.05</span>
<span class="n">step_lr</span><span class="p">:</span><span class="mf">0.025</span>
<span class="n">step_lr</span><span class="p">:</span><span class="mf">0.0125</span>
<span class="n">step_lr</span><span class="p">:</span><span class="mf">0.00625</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
</div>
<div class="section" id="trainonebatch-function">
<span id="trainonebatch-function"></span><h3>TrainOneBatch() Function<a class="headerlink" href="#trainonebatch-function" title="Permalink to this headline"></a></h3>
<p>We use the BP (back-propagation) algorithm to train the RNN model here. The
corresponding configuration is shown below.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># In job.conf file</span>
<span class="n">train_one_batch</span> <span class="p">{</span>
<span class="n">alg</span><span class="p">:</span> <span class="n">kBackPropagation</span>
<span class="p">}</span>
</pre></div>
</div>
</div>
<div class="section" id="cluster-configuration">
<span id="cluster-configuration"></span><h3>Cluster Configuration<a class="headerlink" href="#cluster-configuration" title="Permalink to this headline"></a></h3>
<p>The default cluster configuration can be used, i.e., a single worker and a single server
in a single process.</p>
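<p>For reference, making this default explicit in <em>job.conf</em> would look roughly as below
(a sketch; the field names are assumed from SINGA's ClusterProto and should be checked against the proto definition),</p>
<div class="highlight-default"><div class="highlight"><pre><span></span># in job.conf; assumed field names, shown with their single-process defaults
cluster {
  nworker_groups: 1
  nserver_groups: 1
  nworkers_per_group: 1
  nservers_per_group: 1
  workspace: "examples/rnnlm/"
}
</pre></div>
</div>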
</div>
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2016 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'../',
VERSION:'0.3.0',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>
<div class="rst-versions shift-up" data-toggle="rst-versions" role="note" aria-label="versions">
<img src="../_static/apache.jpg">
<span class="rst-current-version" data-toggle="rst-current-version">
<span class="fa fa-book"> incubator-singa </span>
v: 0.3.0
<span class="fa fa-caret-down"></span>
</span>
<div class="rst-other-versions">
<dl>
<dt>Languages</dt>
<dd><a href="../../en/index.html">English</a></dd>
<dd><a href="../../zh/index.html">中文</a></dd>
<dd><a href="../../jp/index.html">日本語</a></dd>
<dd><a href="../../kr/index.html">한국어</a></dd>
</dl>
</div>
</div>
<a href="https://github.com/apache/incubator-singa">
<img style="position: absolute; top: 0; right: 0; border: 0; z-index: 10000;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_orange_ff7600.png"
alt="Fork me on GitHub">
</a>
</body>
</html>