 - [ ] language model is built incorrectly when starting at MERT with
   a parsed corpus (maybe SAMT should expect a plain corpus and a .parsed one)
 - [ ] add recasing with recursive call to pipeline.pl (provide a 1-1
   alignment)
 - [ ] pipeline should output a script that can be easily
   used to decode another test set
 - [ ] add tree output for test sets
 - [ ] run MERT multiple times
 - [X] hadoop cluster roll-out
 - [X] rm -r hadoop directory after retrieving grammar successfully
 - [ ] change qsub arg defaults when doing SAMT
 - [ ] don't put number in train files if maxlen == 0
 - [ ] should be easier to stop and start runs (locations of canonical files)
 - [ ] add in kenlm binarization of the language model
 - [ ] better tokenization (e.g., URL-aware)