An example of how to build a model using the Fisher Spanish CALLHOME corpus
A) Download the corpus:
1) mkdir $HOME/git
2) cd $HOME/git
3) curl -o fisher-callhome-corpus.zip https://codeload.github.com/joshua-decoder/fisher-callhome-corpus/legacy.zip/master
4) unzip fisher-callhome-corpus.zip
5) mv joshua-decoder-*/ fisher-callhome-corpus
6) # Set environment variable SPANISH=$HOME/git/fisher-callhome-corpus
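The SPANISH variable mentioned in the steps above can be set as in the following minimal sketch. It assumes a Bourne-style shell (sh or bash) and that the corpus was unpacked and renamed under $HOME/git as described.

```shell
# Point SPANISH at the unpacked corpus directory
# (assumes the download and rename steps above were followed).
export SPANISH="$HOME/git/fisher-callhome-corpus"

# Print the value so a typo is easy to spot.
echo "SPANISH=$SPANISH"
```

To keep the setting across sessions, the export line can be added to your shell startup file.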
B) Download and install Joshua:
1) cd /directory/to/install/
2) git clone https://github.com/joshua-decoder/joshua.git
3) cd joshua
4) # Set environment variable JAVA_HOME=/path/to/java # Try $(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
5) # Set environment variable JOSHUA=/directory/to/install/joshua
6) ant devel
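The two environment variables in the steps above could be set as in this sketch, run before "ant devel". The JOSHUA path is the placeholder from the steps and must be replaced with the real clone location. The JAVA_HOME lookup relies on "readlink -f", which is available in GNU coreutils but may be missing on other systems, so treat it as an assumption.

```shell
# Placeholder path from the steps above; replace with the real clone location.
export JOSHUA=/directory/to/install/joshua

# Derive JAVA_HOME from the javac on the PATH, but only if JAVA_HOME is
# not already set and javac can be found (readlink -f is a GNU extension).
if [ -z "${JAVA_HOME:-}" ] && command -v javac >/dev/null 2>&1; then
    JAVA_HOME=$(readlink -f "$(command -v javac)" | sed "s:/bin/javac::")
    export JAVA_HOME
fi

echo "JOSHUA=$JOSHUA"
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
```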
C) Train the model:
1) mkdir -p $HOME/expts/joshua && cd $HOME/expts/joshua
2) $JOSHUA/bin/pipeline.pl \
     --rundir 1 \
     --readme "Baseline Hiero run" \
     --source es \
     --target en \
     --lm-gen srilm \
     --witten-bell \
     --corpus $SPANISH/corpus/asr/callhome_train \
     --corpus $SPANISH/corpus/asr/fisher_train \
     --tune $SPANISH/corpus/asr/fisher_dev \
     --test $SPANISH/corpus/asr/callhome_devtest \
     --lm-order 3
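Before starting a long training run, it can help to confirm that the corpus prefixes passed to pipeline.pl point at real files. The sketch below is a hypothetical helper, not part of Joshua. It assumes that each prefix is completed with the --source and --target values as file extensions, and that the target side may be either a single file or several numbered reference files. Check the actual file names in your copy of the corpus.

```shell
# Hypothetical helper (not part of Joshua): succeed if a corpus prefix has
# a source-side file and at least one target-side file.
check_prefix() {
    cp_prefix=$1
    cp_src=$2
    cp_tgt=$3
    if [ ! -e "$cp_prefix.$cp_src" ]; then
        echo "missing: $cp_prefix.$cp_src" >&2
        return 1
    fi
    # Accept either prefix.TARGET or numbered references such as prefix.TARGET.0
    # (an assumption about the corpus layout).
    for cp_file in "$cp_prefix.$cp_tgt"*; do
        if [ -e "$cp_file" ]; then
            return 0
        fi
    done
    echo "missing: $cp_prefix.$cp_tgt*" >&2
    return 1
}

# Check the four prefixes used in the pipeline.pl command above,
# but only when SPANISH has been set.
if [ -n "${SPANISH:-}" ]; then
    for name in callhome_train fisher_train fisher_dev callhome_devtest; do
        check_prefix "$SPANISH/corpus/asr/$name" es en ||
            echo "check $name before running pipeline.pl" >&2
    done
fi
```

The helper only reports problems and does not stop the shell, so it can be pasted into an interactive session without side effects.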