This example is intended to guide people who want to make practical STT (speech-to-text) models with MXNet. With the rich functionality and convenience described above, you can build your own speech recognition models more easily than with previous examples.
Input data are described in a JSON file, Libri_sample.json, as follows.
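As a rough sketch of working with such a manifest (the field names `key`, `duration`, and `text`, and the one-JSON-object-per-line layout, are assumptions here, not guaranteed to match the shipped Libri_sample.json):

```python
import json

# Hypothetical manifest entries: one dict per utterance, with an assumed
# schema of a wav path ("key"), length in seconds ("duration"), and transcript ("text").
entries = [
    {"key": "./Libri_sample/sample_01.wav", "duration": 2.1, "text": "example transcript one"},
    {"key": "./Libri_sample/sample_02.wav", "duration": 2.9, "text": "example transcript two"},
]

# Write one JSON object per line, then read the manifest back.
with open("Libri_sample.json", "w") as f:
    for e in entries:
        f.write(json.dumps(e) + "\n")

with open("Libri_sample.json") as f:
    manifest = [json.loads(line) for line in f]

print(len(manifest), manifest[0]["key"])
```

Check the actual Libri_sample.json shipped with the example for the exact field names before preparing your own data.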
You can download the two wave files above from this link. Put them under /path/to/yourproject/Libri_sample/.
[Notice] The included configuration file "default.cfg" describes DeepSpeech2 with slight changes. You can test the original DeepSpeech2 ("deepspeech.cfg") by changing a few lines of the cfg file:
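As an illustration only (the section and field names below are assumptions; consult deepspeech.cfg for the real keys), the architecture-related lines you would edit might look like:

```ini
; hypothetical [arch] entries controlling the recurrent stack
[arch]
num_rnn_layer = 7
num_hidden_rnn_list = [1760, 1760, 1760, 1760, 1760, 1760, 1760]
```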
A checkpoint of the model will be saved every n-th epoch.
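For example, the checkpoint interval might be set like this (the field name is an assumption; check the train-related section of "default.cfg" for the exact key):

```ini
[train]
; save a model checkpoint once per epoch (assumed field name)
save_checkpoint_every_n_epoch = 1
```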
You can (re-)train saved models by loading checkpoints (starting from epoch 0). To do this, you need to modify only two lines of the file "default.cfg".
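A sketch of those two lines, assuming the config uses a mode switch plus a model-file field (both names and the checkpoint name are assumptions):

```ini
[common]
; switch from training from scratch to loading a saved checkpoint
mode = load
; name of the saved checkpoint to resume from (hypothetical value)
model_file = deep_bucket-0001
```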
You can predict (or test) on audio files by specifying the mode, model, and test data in the file "default.cfg".
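A hedged sketch of a prediction setup (field names such as `mode`, `model_file`, and `test_json` are assumptions; verify them against the shipped "default.cfg"):

```ini
[common]
; run inference instead of training
mode = predict
; saved checkpoint to load (hypothetical value)
model_file = deep_bucket-0001

[data]
; JSON manifest describing the audio to transcribe (assumed field name)
test_json = ./Libri_sample.json
```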
Train and test your own models by preparing two files.
Run the following command after preparing the files.