layout: default6 title: Language packs

The simplest way to use Joshua is to use the provided “language packs”, which are pre-built models that enable translation in for particular language pairs. You can download and unpack each model and then run the included script to translate new sentences.

It is important to note the assumptions underlying the translation engine:

  • Joshua takes input on STDIN and outputs translations to STDOUT.

  • Joshua expects its input to be one plain-text, UTF-8 encoded sentence per UNIX-delimited line. If you are translating documents, you must perform sentence segmentation yourself.

  • Additionally, the input must be tokenized. To tokenize your data, you can use the script provided in each language pack.

Runtime decoder

The default release version of Joshua includes hundreds of megabytes of dependencies used for building models from your own language pairs. If you only wish to run language packs (effectively using Joshua as a black-box translation engine), you can install a “light” version of Joshua that includes only dependencies needed to run the translation piece. See the notes on the installation page.

Available language packs

For information about English Paraphrase Language packs, please see this page.

Have a request? Please email Matt Post.