| Apache Joshua Language Pack |
| =========================== |
| |
| Thanks for downloading the Apache Joshua <SOURCE>--<TARGET> |
| language pack. This language pack provides a machine translation |
| system for automatically translating sentences from <SOURCE> to |
| sentences in <TARGET>. Joshua language packs have no external |
| dependencies, and can be run straight from the provided JAR file. |
| |
| Please see the "Quick Start" section below to learn how to run the |
| language pack. Additional runtime options can be found in the "Runtime |
| Options" section. For complete documentation, please visit the Joshua |
| website, https://joshua.apache.org/ |
| |
| For information on the data used to construct this language pack, |
| please see the CREDITS file, and to see its performance on a range of |
| different publicly available test sets, see BENCHMARKS. |
| |
| A small number of example sentences are provided in 'example.<SRC>', along |
| with a human reference translation for each in 'example.<TRG>'. |
| |
| If you have comments, questions, or concerns, please email Joshua user |
| support: dev@joshua.apache.org. |
| |
| This language pack was released on <DATE>. |
| |
| Quick Start |
| ----------- |
| To run the language pack, invoke the command |
| |
| joshua [OPTIONS ...] |
| |
| The Joshua decoder will start running, accepting input from STDIN and writing to |
| STDOUT. Joshua expects its input in the form of a single sentence per line. Each |
| sentence should first be piped through `prepare.sh`, which normalizes and |
| tokenizes the input for the language pack's source language. |
| |
| cat example.<SRC> | prepare.sh | joshua > output.<TRG> |
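
Since a human reference translation is provided for each example sentence, it can help to view the decoder output next to it. The following is a minimal sketch, not part of the language pack: `side_by_side` is a hypothetical helper name, and it assumes both files hold one sentence per line, in the same order, with no tab characters inside a sentence.

```shell
#!/bin/sh
# Print each system translation next to its human reference, numbered from 0.
# Assumes one sentence per line in both files and no tabs inside sentences.
side_by_side() {
    output_file=$1
    reference_file=$2
    # paste joins line N of both files with a tab; awk then labels the halves.
    paste "$output_file" "$reference_file" |
        awk -F '\t' '{ printf "%d\n  system:    %s\n  reference: %s\n", NR - 1, $1, $2 }'
}

# Example call (replace the placeholders with your language codes):
#   side_by_side output.<TRG> example.<TRG>
```

The numbering starts at 0 to match the 0-indexed sentence numbers Joshua uses in its output format.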
| |
| It takes some time (sometimes as much as a minute) to load all of the models |
| into memory, which means there is high latency from startup until the first |
| translation. To pay this cost only once, Joshua can also be run in server |
| mode, providing either a direct TCP/IP interface or a Google Translate-style |
| RESTful API. To run Joshua as a TCP/IP server, add the |
| option |
| |
| joshua -server-port 5674 |
| |
| You can then connect directly to the socket using nc or telnet: |
| |
| cat example.<SRC> | prepare.sh | nc localhost 5674 > output.<TRG> |
| |
| You can enable the RESTful interface instead by also passing `-server-type http`: |
| |
| joshua -server-port 5674 -server-type http |
| |
| The RESTful interface is used when running the browser demo (see web/index.html) |
| or when using the Joshua Translation Engine. |
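
Because loading the models can take a while, a script that starts the server and sends text straight away may try to connect before the port is open. The sketch below polls the port first. `wait_for_port` is a hypothetical helper, not something shipped with Joshua; it assumes your `nc` supports the `-z` (scan only) flag, and it assumes the server starts accepting connections only once it is ready to translate, which this README does not confirm.

```shell
#!/bin/sh
# Wait until something accepts TCP connections on host:port, or give up.
# Returns 0 once the port is open, 1 if it is still closed after max_tries.
wait_for_port() {
    host=$1
    port=$2
    max_tries=${3:-60}    # one attempt per second; 60 is an arbitrary default
    tries=0
    # nc -z only checks whether a connection can be opened; it sends no data.
    while ! nc -z "$host" "$port" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example use (placeholders as elsewhere in this README):
#   joshua -server-port 5674 &
#   wait_for_port localhost 5674 120 &&
#       cat example.<SRC> | prepare.sh | nc localhost 5674 > output.<TRG>
```

If `nc` is missing or lacks `-z`, the function simply reports failure after the last attempt rather than hanging.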
| |
| Web demo |
| -------- |
| See the web/ directory for a web-based AJAX interface to Joshua. One feature of this |
| approach is that it permits you to add custom words and phrases to Joshua, which then |
| become translation options for the decoder. To start the web demo: |
| |
| 1. Start Joshua in HTTP server mode, as described above |
| |
| ./joshua -server-port 5674 -server-type http |
| |
| This starts Joshua running on port 5674. |
| |
| 2. Load the web interface, passing it the server and port you are using |
| |
| firefox "web/index.html?server=localhost&port=5674" |
| |
| Any browser will do. |
| |
| You can then translate text in the main box, entering one sentence per line. You can |
| also experiment with adding words and phrases. These are saved to a custom grammar, |
| whose rules can be managed on the "Rules" tab. |
| |
| Runtime options |
| --------------- |
| By default, the language pack runs with a single thread and with options set to |
| balance speed and accuracy. These and many other runtime options can be changed |
| with the following arguments and parameters to the Joshua invocation |
| demonstrated above. |
| |
| - `-v 1` |
| |
| Be more verbose in output. |
| |
| - `-threads N` |
| |
| N is the number of simultaneous decoding threads to launch. If this option is |
| omitted from the command line and the configuration file, the default number of |
| threads, which is 1, will be used. |
| |
| Decoded outputs are assembled in order, and Joshua has to hold on to the |
| complete target hypergraph until it is ready to be processed for output. Too |
| many simultaneous threads could therefore use a lot of memory if one long |
| sentence causes many later sentences to queue up behind it. We have run Joshua |
| with as many as 48 threads without any problems of this kind, but it is worth |
| keeping in mind. |
| |
| - `-pop-limit N` |
| |
| This controls how many hypotheses Joshua explores. You can make Joshua faster |
| (but less accurate) by reducing N, and you can make it more accurate (but |
| slower) by increasing N. We suggest values of N as low as 5 and as high as |
| 1000. The default is 100. |
| |
| - `-output-format "formatting string"` |
| |
| Specify the output format string, in which the following variables are |
| interpolated: |
| |
| %i : the 0-indexed input sentence number |
| %s : the best translation, lower-cased and tokenized |
| %c : the model cost |
| %f : the values of the features of the best translation |
| %S : the best translation, denormalized and re-cased |
| |
| The default value is "%S". |
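
The options above can be combined in a single invocation. As a sketch, the commented command below asks for the sentence number, the re-cased translation, and the model cost on each line, separated by ` ||| `. The separator is our own choice for illustration, and the sketch assumes that literal text in the format string is copied to the output unchanged. `to_tsv` is a hypothetical helper that turns such lines into tab-separated fields for further processing.

```shell
#!/bin/sh
# Convert "ID ||| TRANSLATION ||| COST" lines on stdin into tab-separated lines.
# Lines that do not have exactly three fields are skipped.
to_tsv() {
    awk -F ' [|][|][|] ' 'BEGIN { OFS = "\t" } NF == 3 { print $1, $2, $3 }'
}

# Example use (placeholders as elsewhere in this README; the thread count and
# pop limit are arbitrary values within the ranges discussed above):
#   cat example.<SRC> | prepare.sh |
#       joshua -threads 4 -pop-limit 50 -output-format "%i ||| %S ||| %c" |
#       to_tsv > output.tsv
```

A translation that itself contains ` ||| ` would yield more than three fields and be dropped, so pick a different separator if your data can contain that sequence.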