This example trains the RNN model proposed by Tomas Mikolov for language modeling over a text dataset containing 71,350 words, provided by the RNNLM Toolkit. The training objective (loss) is to minimize the perplexity per word, which is equivalent to maximizing the probability of predicting the next word given the current word in a sentence. The purpose of this example is to show users how to implement and use their own layers for RNNs in SINGA. The example RNN model consists of six layers, namely RnnDataLayer, WordLayer, RnnLabelLayer, EmbeddingLayer, HiddenLayer, and OutputLayer.
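The equivalence between the two formulations can be made explicit. The following is the standard definition of per-word perplexity (a textbook formula, not taken from the toolkit's source) for a corpus of N words:

```latex
% Per-word perplexity, where p(w_t | w_{t-1}) is the model's
% predicted probability of the next word given the current word:
\mathrm{ppl} = \exp\Big(-\frac{1}{N}\sum_{t=1}^{N}\log p(w_t \mid w_{t-1})\Big)
% Since exp is monotonic, minimizing ppl is the same as maximizing
% the average log-probability (1/N) * sum_t log p(w_t | w_{t-1}).
```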
The files in this folder include the model source code, a Makefile.example for compiling it, and an example job configuration file job.conf.
To use the RNNLM dataset, we can download it and create the DataShard by typing:
```
# in rnnlm/ folder
cp Makefile.example Makefile
make download
make create
```
The Makefile.example contains instructions for compiling the source code.
```
# in rnnlm/ folder
cp Makefile.example Makefile
make rnnlm
```
This will generate an executable file, rnnlm.bin.
Make sure that the example job configuration file, job.conf, is present in the rnnlm/ folder.
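For orientation, SINGA job configurations are written in protobuf text format. The skeleton below is only a hedged sketch; the field values are placeholders and the per-layer settings are omitted, so treat the provided job.conf as the authoritative version:

```
name: "rnnlm"        # job name (placeholder value)
train_steps: 10000   # number of training iterations (placeholder value)
disp_freq: 100       # how often to display loss and ppl (placeholder value)
neuralnet {
  # one layer block per model layer, e.g. RnnDataLayer, WordLayer,
  # RnnLabelLayer, EmbeddingLayer, HiddenLayer, OutputLayer
}
updater {
  # learning-rate and update-rule settings
}
```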
Before running SINGA, we need to export LD_LIBRARY_PATH to include the directory containing libsinga.so, as follows.
```
# at the root folder of SINGA
export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
```
Then, we can run SINGA as follows.
```
# at the root folder of SINGA
./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf
```
You will see the values of the loss and the per-word perplexity (ppl) at each training step.
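As a sanity check on the reported numbers, the relationship between the average cross-entropy loss and perplexity can be reproduced offline. The sketch below is a conceptual illustration in plain Python, not SINGA code; it assumes the loss is the mean negative log-probability of the next word:

```python
import math

def perplexity_from_loss(avg_neg_log_prob):
    """Per-word perplexity is the exponential of the average
    negative log-probability (cross-entropy) per word."""
    return math.exp(avg_neg_log_prob)

# Example: next-word probabilities a model assigned to a 4-word sequence.
probs = [0.2, 0.05, 0.1, 0.4]
loss = -sum(math.log(p) for p in probs) / len(probs)  # average cross-entropy
print(loss, perplexity_from_loss(loss))  # lower loss <=> lower ppl
```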