| - [ ] language model is built incorrectly when starting at MERT with |
| a parsed corpus (maybe SAMT should expect a plain corpus and a .parsed one) |
| - [ ] add recasing with recursive call to pipeline.pl (provide a 1-1 |
| alignment) |
| - [ ] pipeline should output a script that can be easily |
| used to decode another test set |
| - [ ] add tree output for test sets |
| - [ ] run MERT multiple times |
| - [X] hadoop cluster roll-out |
| - [X] rm -r hadoop directory after retrieving grammar successfully |
| - [ ] change qsub arg defaults when doing SAMT |
| - [ ] don't put number in train files if maxlen == 0 |
| - [ ] should be easier to stop and start runs (locations of canonical files) |
| - [ ] add in kenlm binarization of the language model |
| - [ ] better tokenization (e.g., URL-aware) |