| == 4.01 == |
| |
| - Bugfixes: |
| |
| - If the training data were split into 10 or more pieces, the alignments were combined |
| in the wrong order (due to a lexicographic instead of numeric sort on a file glob). |
| |
| - Removed "thrax" dependency from "all" target |
| |
| - Fixed bug in MERT and PRO that waited for all the decoder STDOUT before moving on |
| to STDERR. With large enough dev sets, this could hang indefinitely. |
| |
| - Updated berkeleyLM to version 1.1 (r463) with built-in thread-safe caching. |
| |
| |
| == 4.0 (July 2, 2012) =========================================== |
| |
| The main features of this release are described in |
| |
| "Joshua 4.0: Packing, PRO, and Paraphrasing." |
| Juri Ganitkevitch, Yuan Cao, Jonny Weese, Matt Post, and Chris Callison-Burch. |
| NAACL Workshop on Statistical Machine Translation, June, 2012. |
| |
| They include: |
| |
| - Significantly improved and expanded documentation (both user and developer) |
| |
| See http://joshua-decoder.org/4.0 or ./joshua-decoder.org/4.0/index.html (local mirror) |
| |
| - Synchronous parsing |
| |
| Joshua will compute the best synchronous derivation over a pair of |
| sentences. Pass the sentences in in the form |
| |
| source sentence ||| target sentence |
| |
| and set the parameter "parse = true" (either from the config file or |
| command-line). |
| |
| - PRO implementation |
| |
| We include an implementation of Pairwise Ranking Optimization (PRO, |
| Hopkins & May, EMNLP 2011). It can be activated by passing "--tuner |
| pro" to the pipeline script. |
| |
| - Grammar packing |
| |
| We include an efficient grammar representation that can be used to |
| greatly reduce the memory footprint of large grammars. |
| |
| - Numerous bugfixes |
| |
| == 3.2 (February 17, 2012) ====================================== |
| |
| - Pop-limit pruning. |
| |
| Pruning can now be specified with a single parameter "pop-limit" |
| parameter, which limits the number of pops from the cube pruning |
| candidate list at the span level. This replaces the beam and |
| threshold pruning that was governed by four parameters (fuzz1, |
| fuzz2, relative_threshold, and max_n_rules), whose performance and |
| interaction was somewhat difficult to characterize. The pop-limit |
| allows a simple relationship between decoding time and model score |
| to be defined. |
| |
| Setting "pop-limit" in the configuration file or from the command |
| line turns off beam-and-threshold pruning, and its use is |
| recommended. The default setting is to use a pop-limit of 100. |
| |
| - Multiple language model support |
| |
| You can now specify an arbitrary number of language models. See the |
| documentation in |
| |
| $JOSHUA/scripts/training/templates/mert/joshua.config |
| |
| for information on how to do this. You can also specify multiple |
| --lmfile flags to the pipeline.pl script. |
| |
| - Multiple optimizer + test runs (--optimizer-runs N), averaging the |
| results at the end (Clark et al., ACL 2011) |
| |
| - Added support for BerkeleyLM (Pauls and Klein, ACL 2011) |
| |
| - Support for lattice decoding (thanks to Lane Schwartz and the |
| miniSCALE 2012 team) |
| |
| - Pipeline script: |
| |
| - Removed all external dependencies (e.g., Moses, SRILM) |
| |
| - Reorganized the training data |
| |
| - Permit multiple test runs with subsequent --test FILE --name NAME |
| calls to the pipeline |
| |
| - GIZA++ runs are parallelized if more than one thread is permitted |
| (--threads N, N >=2 ) |
| |
| - Numerous bugfixes |
| |
| - Hadoop cluster rollout is now a single instance (slower but |
| doesn't require error-prone server setup) |
| |
| - Parameters |
| |
| - Joshua now dies if it encounters unknown parameters on the command |
| line or config file |
| |
| - Parameters are now normalized to remove hyphens (-) and |
| underscores (_) and to flatten case, permitting you to specify any |
| of, for example, {pop-limit, popLimit, pop_limit, ...} |
| |
| - Lots of reorganization and purging of old code |
| |
| |
| 3.1 |
| ============================= |
| |
| - Fixed multithreading. Use -threads N from the command line or |
| configuration file to spawn N parallel decoding threads. |
| |
| - Configuration file parameters can now be overridden from the command |
| line. The format is |
| |
| -parameter value |
| |
| Among these must be the configuration file itself, which can be |
| referred to with -config, or -c for short. |
| |
| 3.0 |
| === |
| |
| - Added the highly parameterizable Hadoop-based Thrax grammar |
| extractor, which extracts both Hiero and SAMT grammars. |
| |
| - Incorporated a black-box pipeline script at |
| $JOSHUA/scripts/training/pipeline.pl |
| |
| - Moved development to github.com. |