A Dockerfile to run Kaldi on YARN needs two parts:
1. Base libraries that Kaldi depends on:
   - OS base image, for example nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
   - Libraries and packages Kaldi depends on, for example python, g++ and make. GPU support additionally needs CUDA, cuDNN, etc.
   - Compiling Kaldi.
2. Libraries to access HDFS:
   - JDK
   - Hadoop
Here's an example of a base image (with GPU support) that installs Kaldi:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

RUN apt-get clean && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        sudo \
        openjdk-8-jdk \
        iputils-ping \
        g++ \
        make \
        automake \
        autoconf \
        bzip2 \
        unzip \
        wget \
        sox \
        libtool \
        git \
        subversion \
        python2.7 \
        python3 \
        zlib1g-dev \
        ca-certificates \
        patch \
        ffmpeg \
        vim && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python2.7 /usr/bin/python

RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \
    cd /opt/kaldi && \
    cd /opt/kaldi/tools && \
    ./extras/install_mkl.sh && \
    make -j $(nproc) && \
    cd /opt/kaldi/src && \
    ./configure --shared --use-cuda && \
    make depend -j $(nproc) && \
    make -j $(nproc)
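Once the image is built, a quick check can confirm that the Kaldi build actually produced binaries. The sketch below is a hypothetical helper (check_kaldi_build is not part of Kaldi or Submarine). It assumes Kaldi sits under /opt/kaldi as in the Dockerfile, and the three binaries it looks for are only a representative sample of what `make` builds in src/.

```shell
# Hypothetical helper: verify that a few representative Kaldi binaries exist
# and are executable under the given Kaldi root (default: /opt/kaldi).
# Returns 0 if all are present, 1 otherwise (missing ones are listed on stderr).
check_kaldi_build() {
  local kaldi_root="${1:-/opt/kaldi}"
  local missing=0
  local bin
  # Illustrative subset of the binaries produced by `make` in src/.
  for bin in src/bin/compute-wer \
             src/featbin/compute-mfcc-feats \
             src/nnet3bin/nnet3-train; do
    if [ ! -x "$kaldi_root/$bin" ]; then
      echo "missing: $kaldi_root/$bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside a running container you could call check_kaldi_build and then run Kaldi's own cuda-compiled binary (under src/bin), which exits with status 0 only when Kaldi was compiled with CUDA support.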
On top of the above image, add files and install the packages needed to access HDFS:
RUN apt-get update && apt-get install -y openjdk-8-jdk wget

# Install hadoop
ENV HADOOP_VERSION="3.2.1"
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \
    tar zxf hadoop-${HADOOP_VERSION}.tar.gz && \
    ln -s hadoop-${HADOOP_VERSION} hadoop-current && \
    rm hadoop-${HADOOP_VERSION}.tar.gz
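The hadoop and hdfs command-line tools are easier to use once the usual Hadoop environment variables point at the unpacked release. The following is a minimal sketch under stated assumptions: the hadoop-current symlink is assumed to end up at /hadoop-current (the Dockerfile fragment sets no WORKDIR, so this depends on the working directory of the base image), HADOOP_CONF_DIR is assumed to be the etc/hadoop directory of a stock Hadoop release, and the function name setup_hadoop_env is hypothetical.

```shell
# Hypothetical helper: export the variables the Hadoop CLI expects.
# Assumes the release was unpacked and symlinked as /hadoop-current.
setup_hadoop_env() {
  local hadoop_root="${1:-/hadoop-current}"
  export HADOOP_HOME="$hadoop_root"
  # Stock Hadoop releases keep their configuration under etc/hadoop;
  # an existing HADOOP_CONF_DIR (e.g. mounted cluster configs) is kept.
  export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$hadoop_root/etc/hadoop}"
  # Same JDK path as the ENV JAVA_HOME line in the Dockerfile, unless already set.
  export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}"
  # Put the hadoop/hdfs launchers on PATH once; repeated calls are no-ops.
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;
    *) export PATH="$HADOOP_HOME/bin:$PATH" ;;
  esac
}
```

With the cluster's configuration files in HADOOP_CONF_DIR, a call such as setup_hadoop_env followed by hdfs dfs -ls / would list the root of HDFS from inside the container.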
Build the image and push it to your own Docker registry: use docker build ... and docker push ... to finish this step.
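One way to script this step is sketched below, using only docker build -t and docker push. The registry host, image name and tag are placeholders to replace with your own; the function name build_and_push and its DRY_RUN switch are hypothetical conveniences, not part of any existing tooling.

```shell
# Hypothetical helper: build an image from a context directory with a
# registry-qualified tag, then push it. Set DRY_RUN=1 to only print the commands.
build_and_push() {
  local registry="$1" name="$2" tag="$3" context="${4:-.}"
  if [ -z "$registry" ] || [ -z "$name" ] || [ -z "$tag" ]; then
    echo "usage: build_and_push REGISTRY NAME TAG [CONTEXT_DIR]" >&2
    return 2
  fi
  local image="$registry/$name:$tag"
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"   # prefix each command with echo instead of executing it
  fi
  # Tag with the registry host so the image can be pushed without re-tagging.
  $run docker build -t "$image" "$context" || return 1
  $run docker push "$image"
}
```

For example, DRY_RUN=1 build_and_push registry.example.com:5000 kaldi-latest-gpu-base 0.0.1 . prints the two docker commands without running them; drop DRY_RUN=1 to actually build and push.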
We provide the following examples for building Kaldi Docker images.
For the latest Kaldi
Under the docker/ directory, modify CLUSTER_NAME in build-all.sh as needed to have installation permissions, then run build-all.sh to build the Docker images. It builds the following images:
- kaldi-latest-gpu-base:0.0.1: base Docker image that includes Hadoop, Kaldi and the GPU base libraries, as well as the thchs30 model. (No liability.)

You can also use the prebuilt images on Docker Hub for convenience: