| Running the Joshua Decoder: |
| --------------------------- |
| |
| If you wish to run the complete machine translation pipeline, Joshua includes a |
| black-box implementation that enables the entire pipeline to be run by typing |
| a single restartable command. See the documentation for a walkthrough and more |
| information about the many options available to the pipeline. |
| |
| - web: http://joshua-decoder.org/5.0/pipeline.html |
| - local mirror: ./joshua-decoder.org/5.0/pipeline.html |
| |
| Manually Running the Joshua Decoder: |
| ------------------------------------ |
| |
| To run the decoder, first set these environment variables: |
| |
| export JAVA_HOME=/path/to/java # maybe /usr/java/home |
| export JOSHUA=/path/to/joshua |
| |
| You might also find it helpful to set these: |
| |
| export LC_ALL=en_US.UTF-8 |
| export LANG=en_US.UTF-8 |
| |
| Then, compile Joshua by typing: |
| |
| cd $JOSHUA |
| ant devel |
| ant all |
| |
| Some issues you may encounter: |
| 1. You get a message stating: "no ken in java.library.path" |
| Symptoms: for some reason your Ken library didn't build. |
| Causes: you are missing the boost library |
| Reference URLs: |
| https://groups.google.com/forum/#!topic/joshua_support/SiGO41tkpsw |
| http://stackoverflow.com/questions/12583080/c-library-in-using-boost-library |
| Corrective Action: |
| Install Boost library: It may be as easy as sudo yum install boost-devel, otherwise, download boost from source, build and install. |
| |
| The basic method for invoking the decoder looks like this: |
| |
| cat SOURCE | JOSHUA -c CONFIG > OUTPUT |
| |
| You can test this using the sample configuration files and inputs can be found |
| in the example/ directory. For example, type: |
| |
| cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config |
| |
| The decoder output will load the language model and translation models defined |
| in the configuration file, and will then decode the five sentences in the |
| example file. |
| |
| There are a variety of command line options that you can feed to Joshua. |
| For example, you can enable multithreaded decoding with the -threads N flag: |
| |
| cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -threads 5 |
| |
| The configuration file defines many additional parameters, all of which can be |
| overridden on the command line by using the format -PARAMETER value. For |
| example, to output the top 10 hypotheses instead of just the top 1 specified in |
| the configuration file, use -top-n N: |
| |
| cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -top-n 10 |
| |
| Parameters, whether in the configuration file or on the command line, are |
| converted to a canonical internal representation that ignores hyphens, |
| underscores, and case. So, for example, the following parameters are all |
| equivalent: |
| |
| {top-n, topN, top_n, TOP_N, t-o-p-N} |
| {poplimit, pop-limit, pop-limit, popLimit} |
| |
| and so on. For an example of parameters, see the Joshua configuration file |
| template in $JOSHUA/scripts/training/templates/tune/joshua.config or the online |
| documentation at joshua-decoder.org/4.0/decoder.html. There is a wealth of |
| information in the online documentation. |
| |
| After you have successfully run the decoding example above, we recommend that |
| you take a look at the Joshua pipeline script, which allows you to do full |
| end-to-end training of a translation model. It is stored in |
| |
| $JOSHUA/examples |
| |
| |
| |