<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="">
<meta name="author" content="">
<link rel="icon" href="../../favicon.ico">
<title>Joshua Documentation | Z-MERT</title>
<!-- Bootstrap core CSS -->
<link href="/dist/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="/joshua6.css" rel="stylesheet">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="blog-nav">
<!-- <a class="blog-nav-item active" href="#">Joshua</a> -->
<a class="blog-nav-item" href="/">Joshua</a>
<!-- <a class="blog-nav-item" href="/6.0/whats-new.html">New features</a> -->
<a class="blog-nav-item" href="/language-packs/">Language packs</a>
<a class="blog-nav-item" href="/data/">Datasets</a>
<a class="blog-nav-item" href="/support/">Support</a>
<a class="blog-nav-item" href="/contributors.html">Contributors</a>
</nav>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-sm-2">
<div class="sidebar-module">
<!-- <h4>About</h4> -->
<center>
<img src="/images/joshua-logo-small.png" />
<p>Joshua machine translation toolkit</p>
</center>
</div>
<hr>
<center>
<a href="/releases/current/" target="_blank"><button class="button">Download Joshua 6.0.5</button></a>
<br />
<a href="/releases/runtime/" target="_blank"><button class="button">Runtime only version</button></a>
<p>Released November 5, 2015</p>
</center>
<hr>
<!-- <div class="sidebar-module"> -->
<!-- <span id="download"> -->
<!-- <a href="http://joshua-decoder.org/downloads/joshua-6.0.tgz">Download</a> -->
<!-- </span> -->
<!-- </div> -->
<div class="sidebar-module">
<h4>Using Joshua</h4>
<ol class="list-unstyled">
<li><a href="/6.0/install.html">Installation</a></li>
<li><a href="/6.0/quick-start.html">Quick Start</a></li>
</ol>
</div>
<hr>
<div class="sidebar-module">
<h4>Building new models</h4>
<ol class="list-unstyled">
<li><a href="/6.0/pipeline.html">Pipeline</a></li>
<li><a href="/6.0/tutorial.html">Tutorial</a></li>
<li><a href="/6.0/faq.html">FAQ</a></li>
</ol>
</div>
<!--
<div class="sidebar-module">
<h4>Phrase-based</h4>
<ol class="list-unstyled">
<li><a href="/6.0/phrase.html">Training</a></li>
</ol>
</div>
-->
<hr>
<div class="sidebar-module">
<h4>Advanced</h4>
<ol class="list-unstyled">
<li><a href="/6.0/bundle.html">Building language packs</a></li>
<li><a href="/6.0/decoder.html">Decoder options</a></li>
<li><a href="/6.0/file-formats.html">File formats</a></li>
<li><a href="/6.0/packing.html">Packing TMs</a></li>
<li><a href="/6.0/large-lms.html">Building large LMs</a></li>
</ol>
</div>
<hr>
<div class="sidebar-module">
<h4>Developer</h4>
<ol class="list-unstyled">
<li><a href="https://github.com/joshua-decoder/joshua">Github</a></li>
<li><a href="http://cs.jhu.edu/~post/joshua-docs">Javadoc</a></li>
<li><a href="https://groups.google.com/forum/?fromgroups#!forum/joshua_developers">Mailing list</a></li>
</ol>
</div>
</div><!-- /.blog-sidebar -->
<div class="col-sm-8 blog-main">
<div class="blog-title">
<h2>Z-MERT</h2>
</div>
<div class="blog-post">
<p>This document describes how to run Z-MERT manually. Z-MERT is Joshua’s minimum error-rate
training (MERT) module, written by Omar F. Zaidan. It is designed so that a different decoder can be
dropped in easily, and it supports objective functions other than BLEU.</p>
<p>(Section (1) in <code class="highlighter-rouge">$JOSHUA/examples/ZMERT/README_ZMERT.txt</code> is an expanded version of this section.)</p>
<p>Z-MERT is run by launching its driver program (<code class="highlighter-rouge">ZMERT.java</code>), which expects a config file as
its main argument. The config file can specify any subset of Z-MERT’s 20-some
parameters. For a full list of those parameters and their default values, run Z-MERT with a single
<code class="highlighter-rouge">-h</code> argument as follows:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>java -cp $JOSHUA/bin joshua.zmert.ZMERT -h
</code></pre>
</div>
<p>So what does a Z-MERT config file look like?</p>
<p>Examine the file <code class="highlighter-rouge">examples/ZMERT/ZMERT_config_ex2.txt</code>. You will find that it
specifies the following “main” MERT parameters:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>(*) -dir dirPrefix: working directory
(*) -s sourceFile: source sentences (foreign sentences) of the MERT dataset
(*) -r refFile: target sentences (reference translations) of the MERT dataset
(*) -rps refsPerSen: number of reference translations per sentence
(*) -p paramsFile: file containing parameter names, initial values, and ranges
(*) -maxIt maxMERTIts: maximum number of MERT iterations
(*) -ipi initsPerIt: number of intermediate initial points per iteration
(*) -cmd commandFile: name of file containing commands to run the decoder
(*) -decOut decoderOutFile: name of the output file produced by the decoder
(*) -dcfg decConfigFile: name of decoder config file
(*) -N N: size of N-best list (per sentence) generated in each MERT iteration
(*) -v verbosity: output verbosity level (0-2; higher value =&gt; more verbose)
(*) -seed seed: seed used to initialize the random number generator
</code></pre>
</div>
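<p>As a rough sketch, a config file that sets these parameters might look like the following. It assumes one option per line, with <code class="highlighter-rouge">#</code> starting a comment. All file names and values are illustrative placeholders, not the contents of the shipped <code class="highlighter-rouge">ZMERT_config_ex2.txt</code>, so check that file for the real settings.</p>
<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical Z-MERT config file (illustrative values only)
-dir    examples/ZMERT/ZMERT_example   # working directory
-s      src.txt                        # source sentences (ignored with an external decoder)
-r      ref                            # reference translations
-rps    4                              # reference translations per sentence
-p      params.txt                     # parameter names, initial values, and ranges
-maxIt  2                              # maximum number of MERT iterations
-ipi    20                             # intermediate initial points per iteration
-cmd    decoder_command_ex2.txt        # file containing the decoder command
-decOut nbest.out                      # output file produced by the decoder
-dcfg   joshua.config                  # decoder config file
-N      300                            # N-best list size per sentence
-v      1                              # verbosity level (0-2)
-seed   12341234                       # random number generator seed
</code></pre>
</div>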
<p>(Note that the <code class="highlighter-rouge">-s</code> parameter is only used if Z-MERT is running Joshua as an
internal decoder. If Joshua is run as an external decoder, as is the case in
this README, then this parameter is ignored.)</p>
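<p>To test Z-MERT on the 100-sentence test set of example2, provide this config
file to Z-MERT as follows (the paths are relative, so run the command from the <code class="highlighter-rouge">$JOSHUA</code> directory):</p>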
<div class="highlighter-rouge"><pre class="highlight"><code>java -cp bin joshua.zmert.ZMERT -maxMem 500 examples/ZMERT/ZMERT_config_ex2.txt &gt; examples/ZMERT/ZMERT_example/ZMERT.out
</code></pre>
</div>
<p>This will run Z-MERT for a couple of iterations on the data from the example2
folder. (The source and reference files from example2 have been copied into the
<code class="highlighter-rouge">ZMERT_example</code> folder and renamed <code class="highlighter-rouge">src.txt</code> and <code class="highlighter-rouge">ref.*</code>,
so that all the files Z-MERT needs are in one place.) Once the run
is complete, inspect the log file (<code class="highlighter-rouge">ZMERT.out</code>) to see what Z-MERT did.
If everything goes well, the run should take a few minutes, of
which more than 95% is spent waiting for Joshua to finish
decoding the sentences (once per iteration).</p>
<p>The output file you get should be equivalent to <code class="highlighter-rouge">ZMERT.out.verbosity1</code>. If you
rerun the experiment with the verbosity (-v) argument set to 2 instead of 1,
the output file you get should be equivalent to <code class="highlighter-rouge">ZMERT.out.verbosity2</code>, which has
more interesting details about what Z-MERT does.</p>
<p>Notice the additional <code class="highlighter-rouge">-maxMem</code> argument. It tells Z-MERT not to
hold on to memory while the decoder is running (during which time Z-MERT
would be idle). The value 500 limits Z-MERT to a maximum of 500 MB.
For more details on this issue, see section (4) in Z-MERT’s README.</p>
<p>A quick note about Z-MERT’s interaction with the decoder. If you examine the
file <code class="highlighter-rouge">decoder_command_ex2.txt</code>, which is provided as the commandFile (<code class="highlighter-rouge">-cmd</code>)
argument in Z-MERT’s config file, you’ll find that it contains the command one would
use to run the decoder. Z-MERT launches the commandFile as an external
process and assumes that it will run the decoder to produce translations.
After launching this external
process, Z-MERT waits for it to finish, then uses the resulting output file for
parameter tuning (in addition to the output files from previous iterations).
The command file here has only a single command, but your command file could
have multiple lines. Either way, make sure the command file itself is executable.</p>
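<p>For illustration, a single-command commandFile could look roughly like the sketch below. The decoder class name, the JVM option, the argument order (decoder config, source file, output file), and all paths are assumptions made for this sketch. Consult <code class="highlighter-rouge">decoder_command_ex2.txt</code> for the actual command.</p>
<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical decoder command file (class name, arguments, and paths are assumptions)
java -Xmx1g -cp bin joshua.decoder.JoshuaDecoder \
  examples/ZMERT/ZMERT_example/joshua.config \
  examples/ZMERT/ZMERT_example/src.txt \
  examples/ZMERT/ZMERT_example/nbest.out
</code></pre>
</div>
<p>On a Unix-like system, the file can be made executable with <code class="highlighter-rouge">chmod</code> (the path shown assumes the command file lives in the <code class="highlighter-rouge">ZMERT_example</code> folder):</p>
<div class="highlighter-rouge"><pre class="highlight"><code>chmod +x examples/ZMERT/ZMERT_example/decoder_command_ex2.txt
</code></pre>
</div>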
<p>Notice that the Z-MERT arguments <code class="highlighter-rouge">decConfigFile</code> and <code class="highlighter-rouge">decoderOutFile</code> (<code class="highlighter-rouge">-dcfg</code> and
<code class="highlighter-rouge">-decOut</code>) must match the two corresponding Joshua arguments in the commandFile’s (<code class="highlighter-rouge">-cmd</code>) single
command. Also, the Z-MERT argument <code class="highlighter-rouge">-N</code> must match the value of <code class="highlighter-rouge">top_n</code> in
Joshua’s config file, which is the file named by the Z-MERT argument decConfigFile (<code class="highlighter-rouge">-dcfg</code>).</p>
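<p>To make the correspondence concrete, here is a sketch of how the values line up across the files. The file names and the value 300 are placeholders, and the <code class="highlighter-rouge">top_n=300</code> key=value syntax for Joshua’s config file is an assumption. Only the requirement that the values agree comes from the description above.</p>
<div class="highlighter-rouge"><pre class="highlight"><code># In the Z-MERT config file (hypothetical values)
-dcfg   joshua.config   # same config file that the command file passes to Joshua
-decOut nbest.out       # same output file that the command file tells Joshua to write
-N      300             # must equal top_n in joshua.config

# In joshua.config (assumed syntax)
top_n=300
</code></pre>
</div>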
<p>For more details on Z-MERT, refer to <code class="highlighter-rouge">$JOSHUA/examples/ZMERT/README_ZMERT.txt</code>.</p>
</div>
</div><!-- /.row -->
</div><!-- /.container -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script src="../../dist/js/bootstrap.min.js"></script>
<!-- <script src="../../assets/js/docs.min.js"></script> -->
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<!-- <script src="../../assets/js/ie10-viewport-bug-workaround.js"></script>
-->
<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=8264132;
var sc_invisible=1;
var sc_security="4b97fe2d";
</script>
<script type="text/javascript" src="http://www.statcounter.com/counter/counter.js"></script>
<noscript>
<div class="statcounter">
<a title="hit counter joomla"
href="http://statcounter.com/joomla/"
target="_blank">
<img class="statcounter"
src="http://c.statcounter.com/8264132/0/4b97fe2d/1/"
alt="hit counter joomla" />
</a>
</div>
</noscript>
<!-- End of StatCounter Code for Default Guide -->
</body>
</html>