| <!DOCTYPE html> |
| |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"/> |
| <meta content="IE=edge" http-equiv="X-UA-Compatible"/> |
| <meta content="width=device-width, initial-scale=1" name="viewport"/> |
| <meta content="Speech LSTM" property="og:title"> |
| <meta content="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/og-logo.png" property="og:image"> |
| <meta content="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/og-logo.png" property="og:image:secure_url"> |
| <meta content="Speech LSTM" property="og:description"/> |
| <title>Speech LSTM — mxnet documentation</title> |
| <link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" rel="stylesheet"/> |
| <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css" rel="stylesheet"/> |
| <link href="../../_static/basic.css" rel="stylesheet" type="text/css"> |
| <link href="../../_static/pygments.css" rel="stylesheet" type="text/css"> |
| <link href="../../_static/mxnet.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript"> |
| var DOCUMENTATION_OPTIONS = { |
| URL_ROOT: '../../', |
| VERSION: '', |
| COLLAPSE_INDEX: false, |
| FILE_SUFFIX: '.html', |
| HAS_SOURCE: true, |
| SOURCELINK_SUFFIX: '.txt' |
| }; |
| </script> |
| <script src="https://code.jquery.com/jquery-1.11.1.min.js" type="text/javascript"></script> |
| <script src="../../_static/underscore.js" type="text/javascript"></script> |
| <script src="../../_static/searchtools_custom.js" type="text/javascript"></script> |
| <script src="../../_static/doctools.js" type="text/javascript"></script> |
| <script src="../../_static/selectlang.js" type="text/javascript"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> |
| <script type="text/javascript"> jQuery(function() { Search.loadIndex("/versions/1.0.0/searchindex.js"); Search.init();}); </script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new |
| Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-96378503-1', 'auto'); |
| ga('send', 'pageview'); |
| |
| </script> |
| <link href="../../genindex.html" rel="index" title="Index"> |
| <link href="../../search.html" rel="search" title="Search"/> |
| <link href="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet-icon.png" rel="icon" type="image/png"/> |
| </head> |
| <body background="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet-background-compressed.jpeg" role="document"> |
| <div class="content-block"><div class="navbar navbar-fixed-top"> |
| <div class="container" id="navContainer"> |
| <div class="innder" id="header-inner"> |
| <h1 id="logo-wrap"> |
| <a href="../../" id="logo"><img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mxnet_logo.png"/></a> |
| </h1> |
| <nav class="nav-bar" id="main-nav"> |
| <a class="main-nav-link" href="/versions/1.0.0/install/index.html">Install</a> |
| <span id="dropdown-menu-position-anchor"> |
| <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Gluon <span class="caret"></span></a> |
| <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu"> |
| <li><a class="main-nav-link" href="/versions/1.0.0/tutorials/gluon/gluon.html">About</a></li> |
| <li><a class="main-nav-link" href="https://www.d2l.ai/">Dive into Deep Learning</a></li> |
| <li><a class="main-nav-link" href="https://gluon-cv.mxnet.io">GluonCV Toolkit</a></li> |
| <li><a class="main-nav-link" href="https://gluon-nlp.mxnet.io/">GluonNLP Toolkit</a></li> |
| </ul> |
| </span> |
| <span id="dropdown-menu-position-anchor"> |
| <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">API <span class="caret"></span></a> |
| <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu"> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/python/index.html">Python</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/c++/index.html">C++</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/julia/index.html">Julia</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/perl/index.html">Perl</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/r/index.html">R</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/scala/index.html">Scala</a></li> |
| </ul> |
| </span> |
| <span id="dropdown-menu-position-anchor-docs"> |
| <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Docs <span class="caret"></span></a> |
| <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu-docs"> |
| <li><a class="main-nav-link" href="/versions/1.0.0/faq/index.html">FAQ</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/tutorials/index.html">Tutorials</a></li> |
| <li><a class="main-nav-link" href="https://github.com/apache/incubator-mxnet/tree/v1.0.0/example">Examples</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/architecture/index.html">Architecture</a></li> |
| <li><a class="main-nav-link" href="https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+Home">Developer Wiki</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/model_zoo/index.html">Model Zoo</a></li> |
| <li><a class="main-nav-link" href="https://github.com/onnx/onnx-mxnet">ONNX</a></li> |
| </ul> |
| </span> |
| <span id="dropdown-menu-position-anchor-community"> |
| <a aria-expanded="true" aria-haspopup="true" class="main-nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu-community"> |
| <li><a class="main-nav-link" href="http://discuss.mxnet.io">Forum</a></li> |
| <li><a class="main-nav-link" href="https://github.com/apache/incubator-mxnet/tree/v1.0.0">Github</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/community/contribute.html">Contribute</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/community/powered_by.html">Powered By</a></li> |
| </ul> |
| </span> |
| <span id="dropdown-menu-position-anchor-version" style="position: relative"><a href="#" class="main-nav-link dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="true">1.0.0<span class="caret"></span></a><ul id="package-dropdown-menu" class="dropdown-menu"><li><a href="/">master</a></li><li><a href="/versions/1.7.0/">1.7.0</a></li><li><a href=/versions/1.6.0/>1.6.0</a></li><li><a href=/versions/1.5.0/>1.5.0</a></li><li><a href=/versions/1.4.1/>1.4.1</a></li><li><a href=/versions/1.3.1/>1.3.1</a></li><li><a href=/versions/1.2.1/>1.2.1</a></li><li><a href=/versions/1.1.0/>1.1.0</a></li><li><a href=/versions/1.0.0/>1.0.0</a></li><li><a href=/versions/0.12.1/>0.12.1</a></li><li><a href=/versions/0.11.0/>0.11.0</a></li></ul></span></nav> |
| <script> function getRootPath(){ return "../../" } </script> |
| <div class="burgerIcon dropdown"> |
| <a class="dropdown-toggle" data-toggle="dropdown" href="#" role="button">☰</a> |
| <ul class="dropdown-menu" id="burgerMenu"> |
| <li><a href="/versions/1.0.0/install/index.html">Install</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/tutorials/index.html">Tutorials</a></li> |
| <li class="dropdown-submenu dropdown"> |
| <a aria-expanded="true" aria-haspopup="true" class="dropdown-toggle burger-link" data-toggle="dropdown" href="#" tabindex="-1">Gluon</a> |
| <ul class="dropdown-menu navbar-menu" id="package-dropdown-menu"> |
| <li><a class="main-nav-link" href="/versions/1.0.0/tutorials/gluon/gluon.html">About</a></li> |
| <li><a class="main-nav-link" href="http://gluon.mxnet.io">The Straight Dope (Tutorials)</a></li> |
| <li><a class="main-nav-link" href="https://gluon-cv.mxnet.io">GluonCV Toolkit</a></li> |
| <li><a class="main-nav-link" href="https://gluon-nlp.mxnet.io/">GluonNLP Toolkit</a></li> |
| </ul> |
| </li> |
| <li class="dropdown-submenu"> |
| <a aria-expanded="true" aria-haspopup="true" class="dropdown-toggle burger-link" data-toggle="dropdown" href="#" tabindex="-1">API</a> |
| <ul class="dropdown-menu"> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/python/index.html">Python</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/c++/index.html">C++</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/julia/index.html">Julia</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/perl/index.html">Perl</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/r/index.html">R</a></li> |
| <li><a class="main-nav-link" href="/versions/1.0.0/api/scala/index.html">Scala</a></li> |
| </ul> |
| </li> |
| <li class="dropdown-submenu"> |
| <a aria-expanded="true" aria-haspopup="true" class="dropdown-toggle burger-link" data-toggle="dropdown" href="#" tabindex="-1">Docs</a> |
| <ul class="dropdown-menu"> |
| <li><a href="/versions/1.0.0/faq/index.html" tabindex="-1">FAQ</a></li> |
| <li><a href="/versions/1.0.0/tutorials/index.html" tabindex="-1">Tutorials</a></li> |
| <li><a href="https://github.com/apache/incubator-mxnet/tree/v1.0.0/example" tabindex="-1">Examples</a></li> |
| <li><a href="/versions/1.0.0/architecture/index.html" tabindex="-1">Architecture</a></li> |
| <li><a href="https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+Home" tabindex="-1">Developer Wiki</a></li> |
| <li><a href="/versions/1.0.0/model_zoo/index.html" tabindex="-1">Gluon Model Zoo</a></li> |
| <li><a href="https://github.com/onnx/onnx-mxnet" tabindex="-1">ONNX</a></li> |
| </ul> |
| </li> |
| <li class="dropdown-submenu dropdown"> |
| <a aria-haspopup="true" class="dropdown-toggle burger-link" data-toggle="dropdown" href="#" role="button" tabindex="-1">Community</a> |
| <ul class="dropdown-menu"> |
| <li><a href="http://discuss.mxnet.io" tabindex="-1">Forum</a></li> |
| <li><a href="https://github.com/apache/incubator-mxnet/tree/v1.0.0" tabindex="-1">Github</a></li> |
| <li><a href="/versions/1.0.0/community/contribute.html" tabindex="-1">Contribute</a></li> |
| <li><a href="/versions/1.0.0/community/powered_by.html" tabindex="-1">Powered By</a></li> |
| </ul> |
| </li> |
| <li id="dropdown-menu-position-anchor-version-mobile" class="dropdown-submenu" style="position: relative"><a href="#" tabindex="-1">1.0.0</a><ul class="dropdown-menu"><li><a tabindex="-1" href=/>master</a></li><li><a tabindex="-1" href=/versions/1.6.0/>1.6.0</a></li><li><a tabindex="-1" href=/versions/1.5.0/>1.5.0</a></li><li><a tabindex="-1" href=/versions/1.4.1/>1.4.1</a></li><li><a tabindex="-1" href=/versions/1.3.1/>1.3.1</a></li><li><a tabindex="-1" href=/versions/1.2.1/>1.2.1</a></li><li><a tabindex="-1" href=/versions/1.1.0/>1.1.0</a></li><li><a tabindex="-1" href=/versions/1.0.0/>1.0.0</a></li><li><a tabindex="-1" href=/versions/0.12.1/>0.12.1</a></li><li><a tabindex="-1" href=/versions/0.11.0/>0.11.0</a></li></ul></li></ul> |
| </div> |
| <div class="plusIcon dropdown"> |
| <a class="dropdown-toggle" data-toggle="dropdown" href="#" role="button"><span aria-hidden="true" class="glyphicon glyphicon-plus"></span></a> |
| <ul class="dropdown-menu dropdown-menu-right" id="plusMenu"></ul> |
| </div> |
| <div id="search-input-wrap"> |
| <form action="../../search.html" autocomplete="off" class="" method="get" role="search"> |
| <div class="form-group inner-addon left-addon"> |
| <i class="glyphicon glyphicon-search"></i> |
| <input class="form-control" name="q" placeholder="Search" type="text"/> |
| </div> |
| <input name="check_keywords" type="hidden" value="yes"> |
| <input name="area" type="hidden" value="default"/> |
| </form> |
| <div id="search-preview"></div> |
| </div> |
| <div id="searchIcon"> |
| <span aria-hidden="true" class="glyphicon glyphicon-search"></span> |
| </div> |
| </div> |
| </div> |
| </div> |
| <script type="text/javascript"> |
| $('body').css('background', 'white'); |
| </script> |
| <div class="container"> |
| <div class="row"> |
| <div aria-label="main navigation" class="sphinxsidebar leftsidebar" role="navigation"> |
| <div class="sphinxsidebarwrapper"> |
| <ul> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/python/index.html">Python Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/r/index.html">R Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/julia/index.html">Julia Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/c++/index.html">C++ Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/scala/index.html">Scala Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../api/perl/index.html">Perl Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../faq/index.html">HowTo Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../architecture/index.html">System Documents</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../index.html">Tutorials</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../../community/index.html">Community</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="content"> |
| <div class="page-tracker"></div> |
| <div class="section" id="speech-lstm"> |
| <span id="speech-lstm"></span><h1>Speech LSTM<a class="headerlink" href="#speech-lstm" title="Permalink to this headline">¶</a></h1> |
| <p>You can get the source code for these examples on <a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo">GitHub</a>.</p> |
| <div class="section" id="speech-acoustic-modeling-example"> |
| <span id="speech-acoustic-modeling-example"></span><h2>Speech Acoustic Modeling Example<a class="headerlink" href="#speech-acoustic-modeling-example" title="Permalink to this headline">¶</a></h2> |
| <p>The example folder contains the following files for speech recognition:</p> |
| <ul class="simple"> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/lstm_proj.py">lstm_proj.py</a>: Functions for building an LSTM network with and without a projection layer.</li> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/io_util.py">io_util.py</a>: Wrapper functions for <code class="docutils literal"><span class="pre">DataIter</span></code> over speech data.</li> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/train_lstm_proj.py">train_lstm_proj.py</a>: A script for training an LSTM acoustic model.</li> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/decode_mxnet.py">decode_mxnet.py</a>: A script for decoding an LSTMP (LSTM with projection) acoustic model.</li> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/default.cfg">default.cfg</a>: Configuration for training on the <code class="docutils literal"><span class="pre">AMI</span></code> SDM1 dataset. You can use it as a template for writing other configuration files.</li> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/python_wrap">python_wrap</a>: C wrappers for Kaldi C++ code, built into a .so file. The Python code that loads the .so file and calls the C wrapper functions is in <code class="docutils literal"><span class="pre">io_func/feat_readers/reader_kaldi.py</span></code>.</li> |
| </ul> |
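<p><code class="docutils literal">lstm_proj.py</code> builds LSTMs with and without a projection layer. In an LSTM with projection (LSTMP), the cell output is multiplied by a smaller projection matrix before it is fed back as the recurrent input, so the recurrent weight matrices shrink from cell_dim x cell_dim to cell_dim x proj_dim. The pure-Python sketch below shows one LSTMP time step for illustration only: it omits peephole connections, and the parameter layout and function names (<code>make_lstmp_params</code>, <code>lstmp_step</code>) are hypothetical, not taken from <code class="docutils literal">lstm_proj.py</code>.</p>

```python
import math
import random


def matvec(w, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def vec_add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_lstmp_params(input_dim, cell_dim, proj_dim, seed=0):
    """Hypothetical layout: (W_x, W_r, bias) per gate plus a projection matrix."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "g", "o"):
        params[gate] = (
            random_matrix(cell_dim, input_dim, rng),  # acts on the input frame x_t
            random_matrix(cell_dim, proj_dim, rng),   # acts on the projected state r_{t-1}
            [0.0] * cell_dim,                         # bias
        )
    # Maps the cell output m_t (size cell_dim) to r_t (size proj_dim).
    params["proj"] = random_matrix(proj_dim, cell_dim, rng)
    return params


def lstmp_step(params, x, r_prev, c_prev):
    """One LSTMP time step without peephole connections; returns (r_t, c_t)."""
    def preact(gate):
        w_x, w_r, b = params[gate]
        return vec_add(matvec(w_x, x), matvec(w_r, r_prev), b)

    i = sigmoid(preact("i"))                 # input gate
    f = sigmoid(preact("f"))                 # forget gate
    g = [math.tanh(z) for z in preact("g")]  # candidate cell values
    o = sigmoid(preact("o"))                 # output gate
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    m = [oj * math.tanh(cj) for oj, cj in zip(o, c)]  # cell output, size cell_dim
    r = matvec(params["proj"], m)            # projected recurrent state, size proj_dim
    return r, c


# Run a few dummy 40-dimensional frames through a small cell.
params = make_lstmp_params(input_dim=40, cell_dim=8, proj_dim=3)
r, c = [0.0] * 3, [0.0] * 8
for t in range(5):
    frame = [0.1 * t] * 40
    r, c = lstmp_step(params, frame, r, c)
print(len(r), len(c))  # 3 8
```

<p>Only <code>r</code>, not the full cell output <code>m</code>, is carried to the next time step; that is where the parameter saving comes from.</p>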
| <p>Connect to Kaldi:</p> |
| <ul class="simple"> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/decode_mxnet.sh">decode_mxnet.sh</a>: Called by Kaldi to decode an acoustic model trained by MXNet (select the <code class="docutils literal"><span class="pre">simple</span></code> method for decoding).</li> |
| </ul> |
| <p>A full recipe:</p> |
| <ul class="simple"> |
| <li><a class="reference external" href="https://github.com/dmlc/mxnet/tree/master/example/speech-demo/run_ami.sh">run_ami.sh</a>: A full recipe to train and decode an acoustic model on AMI. It takes features and alignments from Kaldi, trains an acoustic model, and decodes it.</li> |
| </ul> |
| <p>To run the speech acoustic modeling example, follow these steps.</p> |
| <div class="section" id="build-kaldi"> |
| <span id="build-kaldi"></span><h3>Build Kaldi<a class="headerlink" href="#build-kaldi" title="Permalink to this headline">¶</a></h3> |
| <p>Build Kaldi as shared libraries if you have not already done so.</p> |
| <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="nb">cd</span> kaldi/src |
| ./configure --shared <span class="c1"># and other options that you need</span> |
| make depend |
| make |
| </pre></div> |
| </div> |
| </div> |
| <div class="section" id="build-the-python-wrapper"> |
| <span id="build-the-python-wrapper"></span><h3>Build the Python Wrapper<a class="headerlink" href="#build-the-python-wrapper" title="Permalink to this headline">¶</a></h3> |
| <ol class="simple"> |
| <li>Copy or link the <code class="docutils literal"><span class="pre">python_wrap</span></code> folder from the speech demo directory to <code class="docutils literal"><span class="pre">kaldi/src</span></code>.</li> |
| <li>Compile the code in <code class="docutils literal"><span class="pre">python_wrap/</span></code>:</li> |
| </ol> |
| <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="nb">cd</span> kaldi/src/python_wrap/ |
| make |
| </pre></div> |
| </div> |
| </div> |
| <div class="section" id="extract-features-and-prepare-frame-level-labels"> |
| <span id="extract-features-and-prepare-frame-level-labels"></span><h3>Extract Features and Prepare Frame-level Labels<a class="headerlink" href="#extract-features-and-prepare-frame-level-labels" title="Permalink to this headline">¶</a></h3> |
| <p>The acoustic models take Mel filter-bank or MFCC features as input. They also need frame-level labels, which you generate from the text transcriptions by running forced alignment with Kaldi. For example, to work on the <code class="docutils literal"><span class="pre">AMI</span></code> <code class="docutils literal"><span class="pre">SDM1</span></code> data, you can run <code class="docutils literal"><span class="pre">kaldi/egs/ami/s5/run_sdm.sh</span></code>. Before running the recipe, configure the paths in <code class="docutils literal"><span class="pre">kaldi/egs/ami/s5/cmd.sh</span></code> and <code class="docutils literal"><span class="pre">kaldi/egs/ami/s5/run_sdm.sh</span></code>. Refer to Kaldi’s documentation for details.</p> |
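<p>Forced alignment assigns one label to every feature frame, so the label sequence for an utterance must have exactly as many entries as its feature matrix has rows. The sketch below shows that bookkeeping, assuming Kaldi's default framing (25 ms windows, 10 ms shift, edges snipped) and 16 kHz audio; check your recipe's <code>conf/</code> files in case they override these values. The function names are hypothetical helpers, not part of Kaldi or the speech demo.</p>

```python
def num_frames(num_samples, sample_rate=16000, frame_length_ms=25, frame_shift_ms=10):
    """Number of feature frames with Kaldi-style 'snip edges' framing."""
    frame_length = sample_rate * frame_length_ms // 1000  # 400 samples at 16 kHz
    frame_shift = sample_rate * frame_shift_ms // 1000    # 160 samples at 16 kHz
    if num_samples >= frame_length:
        return 1 + (num_samples - frame_length) // frame_shift
    return 0  # too short to fill a single window


def check_frame_labels(num_feature_frames, frame_labels):
    """Frame-level training needs exactly one label per feature frame."""
    if len(frame_labels) != num_feature_frames:
        raise ValueError(
            "got %d labels for %d frames" % (len(frame_labels), num_feature_frames)
        )


# A 3-second utterance at 16 kHz: 1 + (48000 - 400) // 160 = 298 frames.
print(num_frames(48000))  # 298
check_frame_labels(298, [0] * 298)  # passes silently
```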
| <p>By default, <code class="docutils literal"><span class="pre">run_sdm.sh</span></code> generates the forced-alignment labels in stage 7 and saves them in <code class="docutils literal"><span class="pre">exp/sdm1/tri3a_ali</span></code>. It also generates 13-dimensional MFCC features. You can train with these MFCC features, or you can compute Mel filter-bank features yourself. For example, a script like the following computes Mel filter-bank features with Kaldi:</p> |
| <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="ch">#!/bin/bash -u</span> |
| |
| . ./cmd.sh |
| . ./path.sh |
| |
| <span class="c1"># SDM - Single Distant Microphone</span> |
| <span class="nv">micid</span><span class="o">=</span><span class="m">1</span> <span class="c1">#which mic from array should be used?</span> |
| <span class="nv">mic</span><span class="o">=</span>sdm<span class="nv">$micid</span> |
| |
| <span class="c1"># Set bash to 'debug' mode: print each command (-x) and exit on</span> |
| <span class="c1"># errors (-e), undefined variables (-u), and failures anywhere in a pipeline (-o pipefail).</span> |
| <span class="nb">set</span> -euxo pipefail |
| |
| <span class="c1"># Path where AMI gets downloaded (or where locally available):</span> |
| <span class="nv">AMI_DIR</span><span class="o">=</span><span class="nv">$PWD</span>/wav_db <span class="c1"># default location</span> |
| <span class="nv">data_dir</span><span class="o">=</span><span class="nv">$PWD</span>/data/<span class="nv">$mic</span> |
| |
| <span class="c1"># make filter bank data</span> |
| <span class="k">for</span> dset in train dev eval<span class="p">;</span> <span class="k">do</span> |
| steps/make_fbank.sh --nj <span class="m">48</span> --cmd <span class="s2">"</span><span class="nv">$train_cmd</span><span class="s2">"</span> <span class="nv">$data_dir</span>/<span class="nv">$dset</span> <span class="se">\</span> |
| <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/log <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/data-fbank |
| steps/compute_cmvn_stats.sh <span class="nv">$data_dir</span>/<span class="nv">$dset</span> <span class="se">\</span> |
| <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/log <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/data |
| |
| apply-cmvn --utt2spk<span class="o">=</span>ark:<span class="nv">$data_dir</span>/<span class="nv">$dset</span>/utt2spk <span class="se">\</span> |
| scp:<span class="nv">$data_dir</span>/<span class="nv">$dset</span>/cmvn.scp scp:<span class="nv">$data_dir</span>/<span class="nv">$dset</span>/feats.scp <span class="se">\</span> |
| ark,scp:<span class="nv">$data_dir</span>/<span class="nv">$dset</span>/feats-cmvn.ark,<span class="nv">$data_dir</span>/<span class="nv">$dset</span>/feats-cmvn.scp |
| |
| mv <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/feats-cmvn.scp <span class="nv">$data_dir</span>/<span class="nv">$dset</span>/feats.scp |
| <span class="k">done</span> |
| </pre></div> |
| </div> |
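<p><code class="docutils literal">make_fbank.sh</code> computes log energies of overlapping triangular filters whose centers are evenly spaced on the mel scale. The pure-Python sketch below lays out such a filter bank, assuming Kaldi's mel formula mel(f) = 1127 ln(1 + f/700) and half-overlapping bins; the bin count and frequency range used here (23 bins from 20 Hz to 8 kHz) are illustrative, and your recipe's fbank configuration determines the real values. The helper names are hypothetical.</p>

```python
import math


def hz_to_mel(f):
    """Mel scale in the form Kaldi uses: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def mel_bin_edges(num_bins, low_hz, high_hz):
    """(left, center, right) mel edges for each triangular filter."""
    mel_low, mel_high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    delta = (mel_high - mel_low) / (num_bins + 1)  # adjacent filters overlap by half
    return [
        (mel_low + b * delta, mel_low + (b + 1) * delta, mel_low + (b + 2) * delta)
        for b in range(num_bins)
    ]


def triangle_weight(freq_hz, edges):
    """Weight of one triangular filter at a frequency; 0 outside the filter."""
    left, center, right = edges
    m = hz_to_mel(freq_hz)
    rising = (m - left) / (center - left)
    falling = (right - m) / (right - center)
    return max(0.0, min(rising, falling))


edges = mel_bin_edges(num_bins=23, low_hz=20.0, high_hz=8000.0)
centers_hz = [mel_to_hz(center) for _, center, _ in edges]
print(len(centers_hz))  # 23
print([round(f) for f in centers_hz[:5]])  # low filters are narrow and closely spaced
```

<p>Because the spacing is uniform in mel rather than in Hz, the filters get wider toward high frequencies, which is why filter-bank features emphasize the low-frequency detail that matters most for speech.</p>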
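| <p><code class="docutils literal"><span class="pre">apply-cmvn</span></code> performs cepstral mean and variance normalization (CMVN). The script above computes the statistics per speaker, and because <code class="docutils literal"><span class="pre">apply-cmvn</span></code> normalizes variances only when <code class="docutils literal"><span class="pre">--norm-vars=true</span></code> is given, it normalizes only the means. For neural network training, it is more common to compute the statistics over the whole training corpus (global CMVN) and normalize both means and variances before feeding the features to the network:</p> |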
| <div class="highlight-bash"><div class="highlight"><pre><span></span>compute-cmvn-stats scp:data/sdm1/train_fbank/feats.scp data/sdm1/train_fbank/cmvn_g.ark |
| mkdir -p data/sdm1/train_fbank_gcmvn  <span class="c1"># make sure the output directory exists</span> |
| apply-cmvn --norm-vars<span class="o">=</span>true data/sdm1/train_fbank/cmvn_g.ark scp:data/sdm1/train_fbank/feats.scp <span class="se">\</span> |
|     ark,scp:data/sdm1/train_fbank_gcmvn/feats.ark,data/sdm1/train_fbank_gcmvn/feats.scp |
| </pre></div> |
| </div> |
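<p>The difference between per-speaker and global normalization can be sketched in pure Python. Kaldi accumulates sufficient statistics (frame counts, sums, and sums of squares) and applies them with <code class="docutils literal">apply-cmvn</code>; this sketch computes the same per-dimension means and variances directly, so it is an illustration of the arithmetic, not Kaldi's implementation. Utterances are represented as lists of feature vectors, and all names are hypothetical.</p>

```python
import math
from collections import defaultdict


def mean_and_std(frames):
    """Per-dimension mean and standard deviation of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    std = [math.sqrt(v) if v > 0 else 1.0 for v in var]  # avoid division by zero
    return mean, std


def normalize(frames, mean, std, norm_vars):
    """Subtract the mean and, if norm_vars is set, divide by the standard deviation."""
    if norm_vars:
        return [[(x - m) / s for x, m, s in zip(f, mean, std)] for f in frames]
    return [[x - m for x, m in zip(f, mean)] for f in frames]


def global_cmvn(utts, norm_vars=True):
    """utts maps utterance ID to frames; one set of statistics for the whole corpus."""
    all_frames = [f for frames in utts.values() for f in frames]
    mean, std = mean_and_std(all_frames)
    return {u: normalize(fr, mean, std, norm_vars) for u, fr in utts.items()}


def per_speaker_cmvn(utts, utt2spk, norm_vars=False):
    """Statistics pooled per speaker; norm_vars=False normalizes means only."""
    by_spk = defaultdict(list)
    for u, frames in utts.items():
        by_spk[utt2spk[u]].extend(frames)
    stats = {spk: mean_and_std(fr) for spk, fr in by_spk.items()}
    return {
        u: normalize(fr, *stats[utt2spk[u]], norm_vars=norm_vars)
        for u, fr in utts.items()
    }


utts = {"utt1": [[1.0], [3.0]], "utt2": [[5.0], [7.0]]}
utt2spk = {"utt1": "spk1", "utt2": "spk2"}
print(per_speaker_cmvn(utts, utt2spk)["utt1"])  # [[-1.0], [1.0]]
print(global_cmvn(utts)["utt1"])  # both frames lie below the corpus mean of 4.0
```

<p>With per-speaker statistics, each speaker's features are centered on that speaker's own mean; with global statistics, differences between speakers are preserved and only the corpus-wide offset and scale are removed.</p>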
| <p>Note that Kaldi’s scripts look for features in <code class="docutils literal"><span class="pre">feats.scp</span></code>. Make sure the normalized features are organized the way Kaldi expects them when you decode.</p> |
| <p>Finally, put the features and labels together in a file so that MXNet can find them. Specifically, for each data set (train, dev, eval), create a file similar to <code class="docutils literal"><span class="pre">train_mxnet.feats</span></code> with the following contents:</p> |
| <div class="highlight-default"><div class="highlight"><pre><span></span>TRANSFORM scp:feat.scp |
| scp:label.scp |
| </pre></div> |
| </div> |
| <p><code class="docutils literal"><span class="pre">TRANSFORM</span></code> is the transformation to apply to the features; the default is <code class="docutils literal"><span class="pre">NO_FEATURE_TRANSFORM</span></code>. The <code class="docutils literal"><span class="pre">scp:</span></code> prefix is Kaldi’s script-file syntax. <code class="docutils literal"><span class="pre">feat.scp</span></code> is typically <code class="docutils literal"><span class="pre">data/sdm1/train/feats.scp</span></code>, and <code class="docutils literal"><span class="pre">label.scp</span></code> is converted from the forced-alignment labels in <code class="docutils literal"><span class="pre">exp/sdm1/tri3a_ali</span></code>. Because forced alignments are generated only for the training data, the training set is split 90/10, and the 10% holdout serves as the dev (validation) set.</p> |
| <p>The <a class="reference external" href="https://github.com/dmlc/mxnet/blob/master/example/speech-demo/run_ami.sh">run_ami.sh</a> script performs this split and writes the files in the format MXNet expects. It actually runs the full pipeline, including training and decoding the acoustic model, so set the paths in the script correctly before running it. If it runs successfully, you can skip the following sections.</p> |
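<p>For reference, the 90/10 holdout split and the two-line feats file can be sketched in Python. This is an illustrative sketch, not the logic of <code class="docutils literal">run_ami.sh</code>: it assumes a random per-utterance split (the real script may split differently, for example by speaker), and the helper names and file names are hypothetical. Whatever split you use, apply the same utterance subset to both the feature and the label scp files so that they stay paired.</p>

```python
import random


def read_scp(lines):
    """Parse Kaldi script-file lines of the form 'utt_id rxfilename'."""
    entries = []
    for line in lines:
        line = line.strip()
        if line:
            utt_id, rxfilename = line.split(None, 1)
            entries.append((utt_id, rxfilename))
    return entries


def split_utterances(utt_ids, holdout_fraction=0.1, seed=0):
    """Shuffle utterance IDs reproducibly and hold out a fraction as a dev set."""
    ids = sorted(utt_ids)
    random.Random(seed).shuffle(ids)
    n_dev = int(round(len(ids) * holdout_fraction))
    return set(ids[n_dev:]), set(ids[:n_dev])


def subset_scp(entries, keep_ids):
    """Keep the entries whose utterance ID is in keep_ids, preserving order."""
    return ["%s %s" % (u, rx) for u, rx in entries if u in keep_ids]


def mxnet_feats_file(feat_scp, label_scp, transform="NO_FEATURE_TRANSFORM"):
    """Contents of a two-line 'TRANSFORM scp:feat.scp' / 'scp:label.scp' file."""
    return "%s scp:%s\nscp:%s\n" % (transform, feat_scp, label_scp)


# Twenty dummy utterances; real scp lines point into Kaldi archives.
feats = read_scp(["utt%02d feats.ark:%d" % (i, 100 * i) for i in range(20)])
train_ids, dev_ids = split_utterances(u for u, _ in feats)
print(len(train_ids), len(dev_ids))  # 18 2
train_lines = subset_scp(feats, train_ids)  # write these to the train feature scp
print(mxnet_feats_file("train_tr.scp", "train_tr_labels.scp"), end="")
```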
| </div> |
| <div class="section" id="run-mxnet-acoustic-model-training"> |
| <span id="run-mxnet-acoustic-model-training"></span><h3>Run MXNet Acoustic Model Training<a class="headerlink" href="#run-mxnet-acoustic-model-training" title="Permalink to this headline">¶</a></h3> |
| <ol class="simple"> |
| <li>Return to the speech demo directory in MXNet. Make a copy of <code class="docutils literal"><span class="pre">default.cfg</span></code>, and edit the necessary parameters, such as the path to the dataset you just prepared.</li> |
| <li>Run <code class="docutils literal"><span class="pre">python</span> <span class="pre">train_lstm_proj.py</span> <span class="pre">--configfile=your-config.cfg</span></code>. For help, run <code class="docutils literal"><span class="pre">python</span> <span class="pre">train_lstm_proj.py</span> <span class="pre">--help</span></code>. You can set configuration parameters in <code class="docutils literal"><span class="pre">default.cfg</span></code>, in your custom config file, and on the command line (e.g., <code class="docutils literal"><span class="pre">--train_batch_size=50</span></code>). Command-line values override those in the custom config file, which in turn override those in <code class="docutils literal"><span class="pre">default.cfg</span></code>.</li> |
| </ol> |
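<p>The override order can be sketched with Python’s standard <code>configparser</code> module. This is a minimal illustration, not the argument handling in <code class="docutils literal">train_lstm_proj.py</code>: the section and option names, their values, and the rule that <code>--train_batch_size</code> maps to section <code>train</code>, option <code>batch_size</code> are all assumptions made for the example.</p>

```python
import configparser


def layered_config(default_text, custom_text, cli_args):
    """Merge three config sources; each later source overrides the earlier ones."""
    cfg = configparser.ConfigParser()
    cfg.read_string(default_text)  # 1. defaults (the role default.cfg plays)
    cfg.read_string(custom_text)   # 2. your copy of the config overrides the defaults
    for arg in cli_args:           # 3. command-line flags override both files
        key, value = arg.lstrip("-").split("=", 1)
        # Hypothetical mapping: "train_batch_size" -> section "train", option "batch_size".
        section, option = key.split("_", 1)
        if not cfg.has_section(section):
            cfg.add_section(section)
        cfg.set(section, option, value)
    return cfg


default_cfg = """
[train]
batch_size = 40
learning_rate = 0.002
"""
custom_cfg = """
[train]
batch_size = 20
"""
cfg = layered_config(default_cfg, custom_cfg, ["--train_batch_size=50"])
print(cfg.get("train", "batch_size"))     # 50
print(cfg.get("train", "learning_rate"))  # 0.002
```

<p>Reading every source into one parser means an option you never override keeps its default, while a section that appears in several sources is merged rather than replaced.</p>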
| <p>Here are some example outputs from training on the TIMIT dataset:</p> |
| <div class="highlight-default"><div class="highlight"><pre><span></span>Example output for TIMIT: |
| Summary of dataset ================== |
| bucket of len 100 : 3 samples |
| bucket of len 200 : 346 samples |
| bucket of len 300 : 1496 samples |
| bucket of len 400 : 974 samples |
| bucket of len 500 : 420 samples |
| bucket of len 600 : 90 samples |
| bucket of len 700 : 11 samples |
| bucket of len 800 : 2 samples |
| Summary of dataset ================== |
| bucket of len 100 : 0 samples |
| bucket of len 200 : 28 samples |
| bucket of len 300 : 169 samples |
| bucket of len 400 : 107 samples |
| bucket of len 500 : 41 samples |
| bucket of len 600 : 6 samples |
| bucket of len 700 : 3 samples |
| bucket of len 800 : 0 samples |
| 2016-04-21 20:02:40,904 Epoch[0] Train-Acc_exlude_padding=0.154763 |
| 2016-04-21 20:02:40,904 Epoch[0] Time cost=91.574 |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">44</span><span class="p">,</span><span class="mi">419</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="n">Validation</span><span class="o">-</span><span class="n">Acc_exlude_padding</span><span class="o">=</span><span class="mf">0.353552</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">04</span><span class="p">:</span><span class="mi">17</span><span class="p">,</span><span class="mi">290</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="n">Train</span><span class="o">-</span><span class="n">Acc_exlude_padding</span><span class="o">=</span><span class="mf">0.447318</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">04</span><span class="p">:</span><span class="mi">17</span><span class="p">,</span><span class="mi">290</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="n">Time</span> <span class="n">cost</span><span class="o">=</span><span class="mf">92.870</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">04</span><span class="p">:</span><span class="mi">20</span><span class="p">,</span><span class="mi">738</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="n">Validation</span><span class="o">-</span><span class="n">Acc_exlude_padding</span><span class="o">=</span><span class="mf">0.506458</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">05</span><span class="p">:</span><span class="mi">53</span><span class="p">,</span><span class="mi">127</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="n">Train</span><span class="o">-</span><span class="n">Acc_exlude_padding</span><span class="o">=</span><span class="mf">0.557543</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">05</span><span class="p">:</span><span class="mi">53</span><span class="p">,</span><span class="mi">128</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="n">Time</span> <span class="n">cost</span><span class="o">=</span><span class="mf">92.390</span> |
| <span class="mi">2016</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">21</span> <span class="mi">20</span><span class="p">:</span><span class="mi">05</span><span class="p">:</span><span class="mi">56</span><span class="p">,</span><span class="mi">568</span> <span class="n">Epoch</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="n">Validation</span><span class="o">-</span><span class="n">Acc_exlude_padding</span><span class="o">=</span><span class="mf">0.548100</span> |
| </pre></div> |
| </div> |
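<p>The "Summary of dataset" lines in the log above report how many utterances were assigned to each length bucket (100 to 800 frames). The following minimal sketch shows that kind of assignment. It assumes each utterance goes to the smallest bucket whose length is at least the utterance length, and that utterances longer than the largest bucket are dropped. The function name <code class="docutils literal">summarize_buckets</code> and the example lengths are hypothetical, and this is not necessarily how the example's data iterator does it.</p>

```python
import bisect


def summarize_buckets(lengths, buckets):
    """Count utterances per length bucket.

    Each utterance (length in frames) is assigned to the smallest bucket
    whose size is >= its length. Utterances longer than the largest
    bucket are skipped here; that policy is an assumption of this sketch.
    """
    buckets = sorted(buckets)
    counts = {b: 0 for b in buckets}  # insertion order = bucket order
    for n in lengths:
        i = bisect.bisect_left(buckets, n)  # index of first bucket >= n
        if i < len(buckets):
            counts[buckets[i]] += 1
    return counts


if __name__ == "__main__":
    buckets = [100, 200, 300, 400, 500, 600, 700, 800]
    lengths = [95, 150, 200, 201, 299, 750, 900]  # hypothetical frame counts
    print("Summary of dataset ==================")
    for size, count in summarize_buckets(lengths, buckets).items():
        print("bucket of len %d : %d samples" % (size, count))
```

<p>Short utterances in a bucket are padded up to the bucket length. This keeps each bucket's unrolled network a fixed size, at the cost of some computation on padded frames.</p>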
| <p>The log above shows only the first three epochs. The final frame accuracy after training was approximately 62%.</p> |
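<p>The metric in the log, <code class="docutils literal">Acc_exlude_padding</code> (spelled as it appears in the log), is a frame accuracy that ignores padded frames. The following minimal NumPy sketch shows such a metric. It assumes padded frames carry label 0 and that the model emits one row of class scores per frame. The function name and array layout are illustrative assumptions, not the example's actual code.</p>

```python
import numpy as np


def frame_accuracy_excluding_padding(scores, labels, pad_label=0):
    """Frame accuracy computed over non-padded frames only.

    scores: (num_frames, num_classes) array of per-frame class scores.
    labels: (num_frames,) array of integer frame labels.
    pad_label: value assumed to mark padded frames (an assumption; this
        label must not also be used for a real class).
    Returns (num_correct, num_counted) so totals can be summed over batches.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    predicted = scores.argmax(axis=1)
    keep = labels != pad_label  # mask that drops padded frames
    num_correct = int((predicted[keep] == labels[keep]).sum())
    return num_correct, int(keep.sum())


if __name__ == "__main__":
    scores = np.array([[0.1, 0.7, 0.2],
                       [0.2, 0.1, 0.7],
                       [0.6, 0.3, 0.1],
                       [0.9, 0.05, 0.05]])
    labels = np.array([1, 1, 2, 0])  # the last frame is padding
    correct, total = frame_accuracy_excluding_padding(scores, labels)
    print("Acc excluding padding = %.3f" % (correct / total))
```

<p>Returning counts rather than a ratio lets the accuracy be accumulated correctly across batches of different bucket lengths.</p>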
| </div> |
| <div class="section" id="run-decode-on-the-trained-acoustic-model"> |
| <span id="run-decode-on-the-trained-acoustic-model"></span><h3>Run Decode on the Trained Acoustic Model<a class="headerlink" href="#run-decode-on-the-trained-acoustic-model" title="Permalink to this headline">¶</a></h3> |
| <ol class="simple"> |
| <li>Estimate senone priors by running <code class="docutils literal"><span class="pre">python</span> <span class="pre">make_stats.py</span> <span class="pre">--configfile=your-config.cfg</span> <span class="pre">|</span> <span class="pre">copy-feats</span> <span class="pre">ark:-</span> <span class="pre">ark:label_mean.ark</span></code>. First edit the necessary items in your config file, such as the path to the training dataset. This command writes the label counts to <code class="docutils literal"><span class="pre">label_mean.ark</span></code>.</li> |
| <li>Create links to the necessary Kaldi decode setup (e.g., the <code class="docutils literal"><span class="pre">local/</span></code> and <code class="docutils literal"><span class="pre">utils/</span></code> directories), then run <code class="docutils literal"><span class="pre">./run_ami.sh</span> <span class="pre">--model</span> <span class="pre">prefix</span> <span class="pre">model</span> <span class="pre">--num_epoch</span> <span class="pre">num</span></code>.</li> |
| </ol> |
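<p>The senone priors from step 1 are needed because the LSTM outputs senone posteriors \(p(s \mid x)\), while an HMM decoder expects likelihoods. The standard hybrid conversion divides by the prior, \(p(x \mid s) \propto p(s \mid x) / p(s)\). In the log domain, this means subtracting the log prior. The following minimal NumPy sketch shows that conversion. It assumes the counts from <code class="docutils literal">label_mean.ark</code> have already been loaded into an array. The function names, the prior floor, and the toy numbers are illustrative assumptions, not what <code class="docutils literal">run_ami.sh</code> does internally.</p>

```python
import numpy as np


def log_priors_from_counts(counts, floor=1e-10):
    """Turn per-senone label counts into log prior probabilities.

    The floor avoids log(0) for senones never seen in training
    (the floor value is an illustrative assumption).
    """
    counts = np.asarray(counts, dtype=np.float64)
    priors = counts / counts.sum()
    return np.log(np.maximum(priors, floor))


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Convert per-frame log posteriors log p(s|x) into scaled log
    likelihoods log p(s|x) - log p(s). These equal log p(x|s) up to a
    per-frame constant, which does not affect decoding."""
    return np.asarray(log_posteriors) - log_priors


if __name__ == "__main__":
    counts = [600, 300, 100]                        # toy counts for 3 senones
    log_priors = log_priors_from_counts(counts)
    log_post = np.log(np.array([[0.5, 0.3, 0.2]]))  # one frame of posteriors
    print(scaled_log_likelihoods(log_post, log_priors))
```

<p>The division boosts rare senones relative to frequent ones. Without it, the decoder would be biased toward the most common training labels.</p>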
| <p>Here are the word error rates (WER) on the TIMIT and AMI test sets, using the default setup (a three-layer LSTM with projection layers):</p> |
| <table border="1" class="docutils"> |
| <colgroup> |
| <col width="50%"/> |
| <col width="50%"/> |
| </colgroup> |
| <thead valign="bottom"> |
| <tr class="row-odd"><th class="head">Corpus</th> |
| <th class="head">WER (%)</th> |
| </tr> |
| </thead> |
| <tbody valign="top"> |
| <tr class="row-even"><td>TIMIT</td> |
| <td>18.9</td> |
| </tr> |
| <tr class="row-odd"><td>AMI</td> |
| <td>51.7 (42.2)</td> |
| </tr> |
| </tbody> |
| </table> |
| <p>For AMI, the figure in parentheses (42.2) was evaluated on non-overlapped speech only. For comparison, the Kaldi HMM baseline WER was 67.2%, and the DNN baseline WER was 57.5%.</p> |
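<p>WER is the word-level edit distance (substitutions, deletions, and insertions) between the decoded hypothesis and the reference transcript, divided by the number of reference words. Kaldi computes it with its own scoring scripts. The following is only an illustrative Python sketch of that definition. The function name and example sentences are hypothetical.</p>

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print("WER = %.1f%%" % (100 * word_error_rate(ref, hyp)))
```

<p>Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.</p>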
| </div> |
| </div> |
| <div class="section" id="next-steps"> |
| <span id="next-steps"></span><h2>Next Steps<a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h2> |
| <div class="toctree-wrapper compound"> |
| <ul> |
| <li class="toctree-l1"><a class="reference external" href="/versions/1.0.0/tutorials/index.html">MXNet tutorials index</a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div aria-label="main navigation" class="sphinxsidebar rightsidebar" role="navigation"> |
| <div class="sphinxsidebarwrapper"> |
| <h3><a href="../../index.html">Table Of Contents</a></h3> |
| <ul> |
| <li><a class="reference internal" href="#">Speech LSTM</a><ul> |
| <li><a class="reference internal" href="#speech-acoustic-modeling-example">Speech Acoustic Modeling Example</a><ul> |
| <li><a class="reference internal" href="#build-kaldi">Build Kaldi</a></li> |
| <li><a class="reference internal" href="#build-the-python-wrapper">Build the Python Wrapper</a></li> |
| <li><a class="reference internal" href="#extract-features-and-prepare-frame-level-labels">Extract Features and Prepare Frame-level Labels</a></li> |
| <li><a class="reference internal" href="#run-mxnet-acoustic-model-training">Run MXNet Acoustic Model Training</a></li> |
| <li><a class="reference internal" href="#run-decode-on-the-trained-acoustic-model">Run Decode on the Trained Acoustic Model</a></li> |
| </ul> |
| </li> |
| <li><a class="reference internal" href="#next-steps">Next Steps</a></li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div><div class="footer"> |
| <div class="section-disclaimer"> |
| <div class="container"> |
| <div> |
| <img height="60" src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/apache_incubator_logo.png"/> |
| <p> |
| Apache MXNet is an effort undergoing incubation at The Apache Software Foundation (ASF), <strong>sponsored by the <i>Apache Incubator</i></strong>. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. |
| </p> |
| <p> |
| Copyright © 2017-2018, The Apache Software Foundation. |
| Apache MXNet, MXNet, Apache, the Apache feather, and the Apache MXNet project logo are either registered trademarks or trademarks of the Apache Software Foundation. |
| </p> |
| </div> |
| </div> |
| </div> |
| </div> <!-- pagename != index --> |
| </div> |
| <script crossorigin="anonymous" integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script> |
| <script src="../../_static/js/sidebar.js" type="text/javascript"></script> |
| <script src="../../_static/js/search.js" type="text/javascript"></script> |
| <script src="../../_static/js/navbar.js" type="text/javascript"></script> |
| <script src="../../_static/js/clipboard.min.js" type="text/javascript"></script> |
| <script src="../../_static/js/copycode.js" type="text/javascript"></script> |
| <script src="../../_static/js/page.js" type="text/javascript"></script> |
| <script src="../../_static/js/docversion.js" type="text/javascript"></script> |
| <script type="text/javascript"> |
| $('body').ready(function () { |
| $('body').css('visibility', 'visible'); |
| }); |
| </script> |
| </body> |
| </html> |