<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="">
<meta name="author" content="">
<link rel="icon" href="../../favicon.ico">
<title>Joshua Documentation | Grammar Packing</title>
<!-- Bootstrap core CSS -->
<link href="/dist/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="/joshua6.css" rel="stylesheet">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="blog-nav">
<!-- <a class="blog-nav-item active" href="#">Joshua</a> -->
<a class="blog-nav-item" href="/">Joshua</a>
<!-- <a class="blog-nav-item" href="/6.0/whats-new.html">New features</a> -->
<a class="blog-nav-item" href="/language-packs/">Language packs</a>
<a class="blog-nav-item" href="/data/">Datasets</a>
<a class="blog-nav-item" href="/support/">Support</a>
<a class="blog-nav-item" href="/contributors.html">Contributors</a>
</nav>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-sm-2">
<div class="sidebar-module">
<!-- <h4>About</h4> -->
<center>
<img src="/images/joshua-logo-small.png" />
<p>Joshua machine translation toolkit</p>
</center>
</div>
<hr>
<center>
<a href="/releases/current/" target="_blank"><button class="button">Download Joshua 6.0.5</button></a>
<br />
<a href="/releases/runtime/" target="_blank"><button class="button">Runtime only version</button></a>
<p>Released November 5, 2015</p>
</center>
<hr>
<!-- <div class="sidebar-module"> -->
<!-- <span id="download"> -->
<!-- <a href="http://joshua-decoder.org/downloads/joshua-6.0.tgz">Download</a> -->
<!-- </span> -->
<!-- </div> -->
<div class="sidebar-module">
<h4>Using Joshua</h4>
<ol class="list-unstyled">
<li><a href="/6.0/install.html">Installation</a></li>
<li><a href="/6.0/quick-start.html">Quick Start</a></li>
</ol>
</div>
<hr>
<div class="sidebar-module">
<h4>Building new models</h4>
<ol class="list-unstyled">
<li><a href="/6.0/pipeline.html">Pipeline</a></li>
<li><a href="/6.0/tutorial.html">Tutorial</a></li>
<li><a href="/6.0/faq.html">FAQ</a></li>
</ol>
</div>
<!--
<div class="sidebar-module">
<h4>Phrase-based</h4>
<ol class="list-unstyled">
<li><a href="/6.0/phrase.html">Training</a></li>
</ol>
</div>
-->
<hr>
<div class="sidebar-module">
<h4>Advanced</h4>
<ol class="list-unstyled">
<li><a href="/6.0/bundle.html">Building language packs</a></li>
<li><a href="/6.0/decoder.html">Decoder options</a></li>
<li><a href="/6.0/file-formats.html">File formats</a></li>
<li><a href="/6.0/packing.html">Packing TMs</a></li>
<li><a href="/6.0/large-lms.html">Building large LMs</a></li>
</ol>
</div>
<hr>
<div class="sidebar-module">
<h4>Developer</h4>
<ol class="list-unstyled">
<li><a href="https://github.com/joshua-decoder/joshua">Github</a></li>
<li><a href="http://cs.jhu.edu/~post/joshua-docs">Javadoc</a></li>
<li><a href="https://groups.google.com/forum/?fromgroups#!forum/joshua_developers">Mailing list</a></li>
</ol>
</div>
</div><!-- /.blog-sidebar -->
<div class="col-sm-8 blog-main">
<div class="blog-title">
<h2>Grammar Packing</h2>
</div>
<div class="blog-post">
<p>Grammar packing refers to the process of taking a textual grammar
output by <a href="thrax.html">Thrax</a> (or Moses, for phrase-based models) and
efficiently encoding it so that it can be loaded
<a href="https://aclweb.org/anthology/W/W12/W12-3134.pdf">very quickly</a>.
Packing the grammar results in significantly faster load times for
very large grammars. Packing is done automatically by the
<a href="pipeline.html">Joshua pipeline</a>, but you can also run the packer
manually.</p>
<p>The packing script can be found at
<code class="highlighter-rouge">$JOSHUA/scripts/support/grammar-packer.pl</code>. See that script for
example usage. To use the packed grammar, edit your Joshua config file and
replace the <code class="highlighter-rouge">tm</code> path to the compressed text-file grammar with the path
to the packed grammar directory (Joshua will automatically detect that
it is packed, since a packed grammar is a directory).</p>
<p>Packing the grammar requires first sorting it by the rules' source side,
which can take quite a bit of temporary space.</p>
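<p>Because the sort writes to temporary space, it can be worth checking how much room the temporary directory has before packing a large grammar. The following is a minimal sketch and not part of Joshua: the helper name <code class="highlighter-rouge">free_kb</code> is hypothetical, and how much space a given grammar actually needs is not something this check can tell you.</p>

```shell
# Hypothetical helper (not part of Joshua): print the free space, in
# kilobytes, on the filesystem that holds directory $1.
# "df -Pk" is the POSIX portable format; its second output line carries
# the available space in column 4.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example: report the space available under /tmp.
echo "free space under /tmp: $(free_kb /tmp) KB"
```

<p>Point the helper at whichever directory you intend to use for temporary files.</p>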
<p><em>CAVEAT</em>: You may run into problems packing very large Hiero
grammars. Email the support list if you do.</p>
<h3 id="examples">Examples</h3>
<p>A Hiero grammar, using the compressed text file version:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>tm = hiero -owner pt -maxspan 20 -path grammar.filtered.gz
</code></pre>
</div>
<p>Pack it:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/support/grammar-packer.pl grammar.filtered.gz grammar.packed
</code></pre>
</div>
<p>Pack a really big grammar:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/support/grammar-packer.pl -m 30g grammar.filtered.gz grammar.packed
</code></pre>
</div>
<p>Be a little more verbose:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/support/grammar-packer.pl -m 30g -v grammar.filtered.gz grammar.packed
</code></pre>
</div>
<p>Use a different location for temporary files:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/support/grammar-packer.pl -T /local grammar.filtered.gz grammar.packed
</code></pre>
</div>
<p>Update the config file line:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>tm = hiero -owner pt -maxspan 20 -path grammar.packed
</code></pre>
</div>
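<p>The config update can also be scripted. The following is a minimal sketch under stated assumptions, not a Joshua tool: the function name <code class="highlighter-rouge">use_packed_grammar</code> is hypothetical, and it assumes each grammar is declared on a single line that starts with <code class="highlighter-rouge">tm =</code> and carries a <code class="highlighter-rouge">-path</code> argument, as in the examples above.</p>

```shell
# Hypothetical helper (not part of Joshua): read a config on stdin and
# write it to stdout with one grammar path swapped for another.
#   $1 = current grammar path (e.g. grammar.filtered.gz)
#   $2 = packed grammar directory (e.g. grammar.packed)
use_packed_grammar() {
  awk -v old="$1" -v new="$2" '
    # Only lines that declare a translation model are candidates.
    /^tm[ \t]*=/ {
      # Look for "-path" followed by exactly the old path and replace it.
      for (i = NF - 1; i > 0; i--)
        if ($i == "-path" && $(i + 1) == old) $(i + 1) = new
    }
    # Every line is printed, changed or not.
    { print }
  '
}

# Example: rewrite a single config line.
echo 'tm = hiero -owner pt -maxspan 20 -path grammar.filtered.gz' |
  use_packed_grammar grammar.filtered.gz grammar.packed
```

<p>Only the matching <code class="highlighter-rouge">tm</code> line is rewritten; all other lines pass through unchanged. Write the output to a new file rather than overwriting the original config in place.</p>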
<h3 id="using-multiple-packed-grammars-joshua-605">Using multiple packed grammars (Joshua 6.0.5)</h3>
<p>Packed grammars serialize their vocabularies, which previously prevented the use of multiple
packed grammars during decoding. As of Joshua 6.0.5, multiple packed grammars can be used during decoding, provided they share the same serialized vocabulary.
This is achieved by packing the grammars jointly using a revised packing CLI.</p>
<p>To pack multiple grammars:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/support/grammar-packer.pl grammar1.filtered.gz grammar2.filtered.gz [...] grammar1.packed grammar2.packed [...]
</code></pre>
</div>
<p>This will produce one packed grammar per input grammar, all sharing the same vocabulary. To use them in the decoder, put this in your <code class="highlighter-rouge">joshua.config</code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>tm = hiero -owner pt -maxspan 20 -path grammar1.packed
tm = hiero -owner pt2 -maxspan 20 -path grammar2.packed
</code></pre>
</div>
<p>Note the different owners.
If you try to load multiple packed grammars that do not share the same
vocabulary, the decoder will throw a RuntimeException at load time:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>Exception in thread "main" java.lang.RuntimeException: Trying to load multiple packed grammars with different vocabularies! Have you packed them jointly?
</code></pre>
</div>
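<p>To catch this before starting the decoder, you can compare the serialized vocabularies directly. The following is a minimal sketch, not part of Joshua: the function name <code class="highlighter-rouge">same_vocabulary</code> is hypothetical, and it assumes each packed grammar directory stores its vocabulary in a file named <code class="highlighter-rouge">vocabulary</code>. Check the contents of your own packed directories before relying on it.</p>

```shell
# Hypothetical helper (not part of Joshua): succeed if two packed grammar
# directories carry byte-identical vocabulary files.
# ASSUMPTION: the vocabulary is stored as "<packed dir>/vocabulary".
same_vocabulary() {
  cmp -s "$1/vocabulary" "$2/vocabulary"
}

# Example: warn when two grammars were apparently not packed jointly.
if same_vocabulary grammar1.packed grammar2.packed; then
  echo "vocabularies match"
else
  echo "vocabularies differ or are missing; pack the grammars jointly"
fi
```

<p>A byte-for-byte comparison is stricter than strictly necessary, but it is cheap and errs on the side of reporting a mismatch.</p>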
</div>
</div><!-- /.row -->
</div><!-- /.container -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script src="../../dist/js/bootstrap.min.js"></script>
<!-- <script src="../../assets/js/docs.min.js"></script> -->
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<!-- <script src="../../assets/js/ie10-viewport-bug-workaround.js"></script>
-->
<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=8264132;
var sc_invisible=1;
var sc_security="4b97fe2d";
</script>
<script type="text/javascript" src="http://www.statcounter.com/counter/counter.js"></script>
<noscript>
<div class="statcounter">
<a title="hit counter joomla"
href="http://statcounter.com/joomla/"
target="_blank">
<img class="statcounter"
src="http://c.statcounter.com/8264132/0/4b97fe2d/1/"
alt="hit counter joomla" />
</a>
</div>
</noscript>
<!-- End of StatCounter Code for Default Guide -->
</body>
</html>