commit | 9e43cf6e5a9c7dbcc48f0b97b3d6e6f23205f794 | |
---|---|---|
author | Matt Post <post@cs.jhu.edu> | Thu Feb 23 16:54:42 2017 -0500 |
committer | Matt Post <post@cs.jhu.edu> | Thu Feb 23 16:54:42 2017 -0500 |
tree | 804875e1b7117bdd6a49aed42ddb5e469402b6ab | |
parent | d0ad680c78f73a47a54ebdbccbf29d30aa7b5354 | |
now supports relative paths

Joshua now computes all paths read from config files relative to the location of joshua.config. This eases building language packs and loading models from a library because you no longer have to worry about cd()ing to the correct place first. Any relative paths (e.g., those provided in language packs) will be canonicalized by prefixing the full path to the parent directory containing joshua.config. Absolute paths are not affected.
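The path rule described in this commit message can be illustrated with a small shell sketch. This is not Joshua's actual implementation (which lives in the Java decoder); `resolve_config_path` is a hypothetical helper name, and the sketch only assumes the behavior stated above: relative paths are prefixed with the directory containing joshua.config, and absolute paths are returned unchanged.

```shell
# Hypothetical helper illustrating the rule from the commit message.
# Usage: resolve_config_path /path/to/joshua.config some/path
resolve_config_path() {
  config_file=$1
  path=$2
  case "$path" in
    /*)
      # Absolute paths are not affected.
      printf '%s\n' "$path"
      ;;
    *)
      # Relative paths are prefixed with the full path to the
      # directory that contains the config file.
      config_dir=$(cd "$(dirname "$config_file")" && pwd) || return 1
      printf '%s/%s\n' "$config_dir" "$path"
      ;;
  esac
}
```

For example, with a config file at /models/es-en/joshua.config (an illustrative location), `resolve_config_path /models/es-en/joshua.config lm.kenlm` would print /models/es-en/lm.kenlm regardless of the current working directory, provided that directory exists.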
Joshua is a statistical machine translation toolkit for both phrase-based (new in version 6.0) and syntax-based decoding. It can be run with pre-built language packs available for download, and can also be used to build models for new language pairs. Among the many features of Joshua are:
The latest release of Joshua is always linked directly from the home page.
Joshua 6.X includes the following new features:
Joshua requires a Java JDK, version 1.8 or later.
Running the decoder in any form requires setting a few basic environment variables: $JAVA_HOME, $JOSHUA, and, for certain (optional) portions of the model-training pipeline, potentially $MOSES.
export JAVA_HOME=/path/to/java # maybe /usr/java/home
export JOSHUA=/path/to/joshua
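Before building or decoding, it can help to confirm that these variables are actually set. The following is a minimal sketch; `check_joshua_env` is a hypothetical helper and is not part of the Joshua distribution. It treats $JAVA_HOME and $JOSHUA as required and $MOSES as optional, following the description above.

```shell
# Hypothetical helper: report which required variables are missing.
# Returns 0 if JAVA_HOME and JOSHUA are both set, 1 otherwise.
check_joshua_env() {
  missing=0
  for name in JAVA_HOME JOSHUA; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: \$$name is not set" >&2
      missing=1
    fi
  done
  # MOSES is only needed for optional parts of the training pipeline.
  if [ -z "${MOSES:-}" ]; then
    echo "note: \$MOSES is not set (only needed for optional pipeline steps)" >&2
  fi
  return "$missing"
}
```

Calling `check_joshua_env || exit 1` at the top of a setup script would stop early when a required variable is missing.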
You might also find it helpful to set these:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
Then, compile Joshua by typing:
cd $JOSHUA
mvn clean package
You also need to download and compile KenLM and Thrax:
bash ./download-deps.sh
The basic method for invoking the decoder looks like this:
cat SOURCE | $JOSHUA/bin/joshua-decoder -m MEM -c CONFIG OPTIONS > OUTPUT
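As a sketch, the invocation pattern above can be wrapped in a small function. `translate_file` is a hypothetical name, and the wrapper uses only the `-m` and `-c` flags shown in the template; any further OPTIONS are left out because they depend on your model.

```shell
# Hypothetical wrapper around the invocation pattern shown above.
# Arguments: source file, memory setting, config file, output file.
translate_file() {
  src=$1
  mem=$2
  config=$3
  out=$4
  # Feed the source text on stdin and capture translations from stdout.
  "$JOSHUA/bin/joshua-decoder" -m "$mem" -c "$config" < "$src" > "$out"
}
```

A call might look like `translate_file input.txt 4g joshua.config output.txt`, where the file names are placeholders and the `4g` memory value is an assumption about a reasonable setting rather than a documented default.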
Some example usage scenarios and scripts can be found in the examples/ directory.
If you plan to work on the decoder, we suggest you use Eclipse. You can get started by typing:
mvn eclipse:eclipse
Joshua includes a number of “language packs”, which are pre-built models that allow you to use the translation system as a black box, without worrying too much about how machine translation works. You can browse the models available for download on the Joshua website.
Joshua includes a pipeline script that allows you to build new models, provided you have training data. This pipeline can be run (more or less) by invoking a single command, which handles data preparation, alignment, phrase-table or grammar construction, and tuning of the model parameters. See the documentation for a walkthrough and more information about the many available options.
Joshua is licensed and released under the permissive Apache License v2.0, a copy of which ships with the Joshua source code.