This example trains a multi-layer RNN (Elman, GRU, or LSTM) on the Penn Treebank (PTB) language modeling benchmark.
The model obtains a state-of-the-art result on PTB using an LSTM, reaching a test perplexity of ~72. On WikiText-2 it reaches ~97 ppl, outperforming the basic LSTM (99.3) and approaching the Variational LSTM (96.3).
The following techniques are used to reach these results: weight tying between the word embedding and softmax weights (`--tied`) and dropout regularization (`--dropout`); a sketch of weight tying is shown below.
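Weight tying shares a single parameter matrix between the input embedding and the output softmax layer, which is why the tied runs below use `--nhid` equal to `--emsize`. The snippet below is a minimal PyTorch-style sketch of the idea; the class and argument names are illustrative, and the actual implementation in train.py may differ.

```python
import torch.nn as nn

class TiedRNNModel(nn.Module):
    """Illustrative sketch of an LSTM language model with optional weight tying
    (names are hypothetical, not necessarily those used in train.py)."""
    def __init__(self, ntoken, emsize, nhid, nlayers, dropout=0.5, tied=False):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, emsize)      # word embeddings
        self.rnn = nn.LSTM(emsize, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)           # softmax weights
        if tied:
            # --tied: reuse the embedding matrix as the output projection,
            # which requires the hidden and embedding sizes to match.
            assert nhid == emsize, "--tied requires --nhid == --emsize"
            self.decoder.weight = self.encoder.weight

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        return self.decoder(self.drop(output)), hidden
```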
The PTB data is the processed version from (Mikolov et al., 2010):

```bash
bash get_ptb_data.sh
python data.py
```
The WikiText-2 data is downloaded from the WikiText long-term dependency language modeling dataset:

```bash
bash get_wikitext2_data.sh
```
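Both download scripts produce plain-text train/valid/test splits that the training script reads through a prefix such as `./data/ptb.` (see the runs below). As a rough illustration of the word-level preprocessing step, assuming split files named like `ptb.train.txt`, a vocabulary can be built and the text mapped to integer ids as follows; the real data.py may differ.

```python
def build_corpus(prefix):
    """Hypothetical sketch: build a word-to-id vocabulary while tokenizing each
    split, appending an <eos> marker at the end of every line."""
    vocab = {}

    def tokenize(path):
        ids = []
        with open(path, encoding='utf-8') as f:
            for line in f:
                for word in line.split() + ['<eos>']:
                    if word not in vocab:
                        vocab[word] = len(vocab)
                    ids.append(vocab[word])
        return ids

    splits = {name: tokenize(prefix + name + '.txt')
              for name in ('train', 'valid', 'test')}
    return vocab, splits

# e.g. build_corpus('./data/ptb.') expects ptb.train.txt, ptb.valid.txt, ptb.test.txt
```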
Example runs and their results:

```bash
python train.py --data ./data/ptb. --cuda --tied --nhid 650 --emsize 650 --dropout 0.5     # Test ppl of 75.3 on PTB
python train.py --data ./data/ptb. --cuda --tied --nhid 1500 --emsize 1500 --dropout 0.65  # Test ppl of 72.0 on PTB
python train.py --data ./data/wikitext-2/wiki. --cuda --tied --nhid 256 --emsize 256       # Test ppl of 97.07 on WikiText-2
```
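The reported test perplexities are the exponential of the average per-token cross-entropy (negative log-likelihood) on the test set:

```python
import math

def perplexity(total_nll, ntokens):
    """Perplexity = exp(average negative log-likelihood per token).
    An average loss of about 4.28 nats/token corresponds to ppl ~72."""
    return math.exp(total_nll / ntokens)
```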
```bash
python train.py --help
```

gives the following arguments:

```
Optional arguments:
  -h, --help            show this help message and exit
  --data DATA           location of the data corpus
  --model MODEL         type of recurrent net (rnn_tanh, rnn_relu, lstm, gru)
  --emsize EMSIZE       size of word embeddings
  --nhid NHID           number of hidden units per layer
  --nlayers NLAYERS     number of layers
  --lr LR               initial learning rate
  --clip CLIP           gradient clipping
  --epochs EPOCHS       upper epoch limit
  --batch_size N        batch size
  --bptt BPTT           sequence length
  --dropout DROPOUT     dropout applied to layers (0 = no dropout)
  --tied                tie the word embedding and softmax weights
  --cuda                whether to use the GPU
  --log-interval N      report interval
  --save SAVE           path to save the final model
```
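The `--batch_size` and `--bptt` options control how the token stream is cut up for truncated backpropagation through time. A common approach, shown below as a hedged PyTorch-style sketch with illustrative names (not necessarily what train.py does), is to reshape the corpus into `batch_size` parallel columns and then read it in windows of `bptt` time steps, with targets shifted by one token.

```python
import torch

def batchify(ids, batch_size):
    """Trim the id stream so it divides evenly, then arrange it as
    batch_size parallel columns (shape: [nbatch, batch_size])."""
    data = torch.tensor(ids, dtype=torch.long)
    nbatch = data.size(0) // batch_size
    data = data[:nbatch * batch_size]
    return data.view(batch_size, -1).t().contiguous()

def get_batch(source, i, bptt):
    """Take a window of up to bptt steps as input; targets are the same
    window shifted one position ahead."""
    seq_len = min(bptt, len(source) - 1 - i)
    return source[i:i + seq_len], source[i + 1:i + 1 + seq_len]
```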