This example is intended to guide people who want to build practical STT (speech-to-text) models with MXNet. With the rich functionality and convenience explained above, you can build your own speech recognition models more easily than with earlier examples.
The input data are described in a JSON file, Libri_sample.json, as follows.
You can download the two wave files above from this. Put them under /path/to/yourproject/Libri_sample/.
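If you want to describe your own recordings in the same way, a description file can be generated with a few lines of Python. The sketch below is not part of the example; it assumes the one-JSON-object-per-line layout with the keys `duration`, `text` and `key` that the ba-dls-deepspeech tooling produces, so check the key names against your copy of Libri_sample.json. The helper names `describe_wav` and `write_description` are hypothetical.

```python
import json
import wave


def describe_wav(path, transcript):
    """Build one description entry for a WAV file.

    Assumption: each entry holds the clip length in seconds ("duration"),
    its transcript ("text") and the path to the audio file ("key").
    """
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / float(wav.getframerate())
    return {"duration": duration, "text": transcript, "key": path}


def write_description(entries, out_path):
    """Write the entries as one JSON object per line."""
    with open(out_path, "w") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")
```

Calling `write_description([describe_wav(path, text), ...], "my_corpus.json")` with your own paths and transcripts would then yield a file that can be compared line by line with Libri_sample.json.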
[Notice] The included configuration file “default.cfg” describes DeepSpeech2 with slight changes. You can test the original DeepSpeech2 (“deepspeech.cfg”) by changing a few lines of the cfg file:
Checkpoints of the model will be saved at every n-th epoch.
You can (re-)train saved models by loading checkpoints (starting from 0). To do this, you only need to modify two lines of the file “default.cfg”.
You can predict (or test) audio files by specifying the mode, model, and test data in the file “default.cfg”.
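Switching between training, loading a checkpoint, and prediction comes down to editing a couple of values in the cfg file, which can also be scripted. The following is only a sketch: the section name `common`, the keys `mode` and `model_file`, and the mode values `train`, `load` and `predict` are assumptions made for illustration, so replace them with the names actually used in your “default.cfg”. The helper `switch_mode` is hypothetical.

```python
import configparser
import io

# Assumed layout, for illustration only; the real default.cfg has more options.
SAMPLE_CFG = """
[common]
mode = train
model_file = my_model
"""

ALLOWED_MODES = ("train", "load", "predict")  # assumed mode names


def switch_mode(cfg_text, mode, model_file=None):
    """Return cfg_text with the mode (and optionally the model) replaced."""
    if mode not in ALLOWED_MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    parser.set("common", "mode", mode)
    if model_file is not None:
        parser.set("common", "model_file", model_file)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(switch_mode(SAMPLE_CFG, "predict", "checkpoints/my_model"))
```

Note that `configparser` drops comments when it writes a file back out, so for a heavily commented cfg file editing the two lines by hand may be preferable.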
Train and test your own models by preparing two files.
Run the following line after preparing the files.
You can prepare the full LibriSpeech dataset by following the instructions at https://github.com/baidu-research/ba-dls-deepspeech
Replace Baidu's flac_to_wav.sh script with the flac_to_wav.sh in this repository to avoid a bug.
    git clone https://github.com/baidu-research/ba-dls-deepspeech
    cd ba-dls-deepspeech
    ./download.sh
    cp -f /path/to/example/flac_to_wav.sh ./
    ./flac_to_wav.sh
    python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/train-clean-100 train_corpus.json
    python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/dev-clean validation_corpus.json
    python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/test-clean test_corpus.json
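Once the three corpus files exist, a quick sanity check can catch an incomplete conversion before training starts. The sketch below counts the utterances in a description file and adds up their length; it assumes one JSON object per line with a `duration` field in seconds, which you should verify against the generated files. The function name `summarize_corpus` is hypothetical.

```python
import json


def summarize_corpus(desc_path):
    """Return (number of utterances, total audio length in hours).

    Assumption: every non-empty line is a JSON object that carries a
    "duration" value in seconds.
    """
    count = 0
    total_seconds = 0.0
    with open(desc_path) as desc:
        for line in desc:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            count += 1
            total_seconds += float(entry["duration"])
    return count, total_seconds / 3600.0
```

For example, `summarize_corpus("train_corpus.json")` should report roughly 100 hours for train-clean-100; a much smaller total suggests that some files were not converted.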