tag	7d0d15c8f2f02f25ea5c85652ba3fc3dd48291eb
tagger	Matt Post <post@cs.jhu.edu>	Wed Nov 04 14:39:50 2015 -0500
object	71583d64b064fe141c07b02f011e0907c2ce278d

commit	71583d64b064fe141c07b02f011e0907c2ce278d
author	Matt Post <post@cs.jhu.edu>	Wed Nov 04 14:18:33 2015 -0500
committer	Matt Post <post@cs.jhu.edu>	Wed Nov 04 14:18:33 2015 -0500
tree	627eee78e925ae106ce044c83fed4742eba16f24
parent	b99e1c588961e53d2c77cb3787d9237e1f581947

README.md

Welcome to Joshua

Joshua is a statistical machine translation toolkit for both phrase-based (new in version 6.0) and syntax-based decoding. It can be run with pre-built language packs available for download, and can also be used to build models for new language pairs. Among the many features of Joshua are:

Support for both phrase-based and syntax-based decoding models
Translation of weighted input lattices
Thrax: a Hadoop-based, scalable grammar extractor
A sparse feature architecture supporting an arbitrary number of features

The latest release of Joshua is always linked directly from the home page.

New in 6.0

Joshua 6.0 includes the following new features:

A fast phrase-based decoder with the ability to read Moses phrase tables
Large speed improvements compared to the previous syntax-based decoder
Special input handling
A host of bugfixes and stability improvements

Working with “language packs”

Joshua includes a number of “language packs”, which are pre-built models that allow you to use the translation system as a black box, without worrying too much about how machine translation works. You can browse the models available for download on the Joshua website.

Building new models

Joshua includes a pipeline script that allows you to build new models, provided you have training data. This pipeline can be run (more or less) by invoking a single command, which handles data preparation, alignment, phrase-table or grammar construction, and tuning of the model parameters. See the documentation for a walkthrough and more information about the many available options.

Quick start

Running the decoder in any form requires setting a few basic environment variables: $JAVA_HOME, $JOSHUA, and potentially $MOSES.

export JAVA_HOME=/path/to/java  # maybe /usr/java/home
export JOSHUA=/path/to/joshua
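Before compiling, it can help to confirm that these variables really point at existing directories. The following is a minimal sketch, not part of Joshua itself: the function name check_joshua_env is hypothetical, and it checks only the two variables that are always required ($MOSES is left out because it is optional).

```shell
# Hypothetical helper (not shipped with Joshua): verify that the required
# environment variables are set and point at existing directories.
check_joshua_env() {
  status=0
  for var in JAVA_HOME JOSHUA; do
    # Look up the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "not a directory: $var=$value" >&2
      status=1
    fi
  done
  return $status
}
```

Calling check_joshua_env after the export lines returns a non-zero status and prints one message per problem if anything is unset or mistyped.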

You might also find it helpful to set these:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Then, compile Joshua by typing:

cd $JOSHUA
ant

The basic method for invoking the decoder looks like this:

cat SOURCE | $JOSHUA/bin/joshua -m MEM -c CONFIG OPTIONS > OUTPUT
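As an illustration, the invocation above could be wrapped in a small shell function that fills in the placeholders. This is a sketch under assumptions: the function name decode_file is hypothetical, the default memory value of 4g is an assumed setting rather than a documented default, and no OPTIONS are passed.

```shell
# Hypothetical wrapper around the invocation shown above.
# Usage: decode_file SOURCE CONFIG OUTPUT [MEM]
decode_file() {
  src=$1
  config=$2
  out=$3
  mem=${4:-4g}   # assumed default; choose a value that suits your model

  # Fail early with a clear message instead of starting the decoder.
  [ -r "$src" ] || { echo "cannot read source: $src" >&2; return 1; }
  [ -r "$config" ] || { echo "cannot read config: $config" >&2; return 1; }

  # Same pipeline as above: source text on stdin, translations on stdout.
  cat "$src" | "$JOSHUA/bin/joshua" -m "$mem" -c "$config" > "$out"
}
```

With $JOSHUA set, a call such as decode_file input.txt joshua.config output.txt would decode input.txt using the named configuration file; all three file names here are placeholders.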

Some example usage scenarios and scripts can be found in the examples/ directory.