This example trains a multi-layer RNN (Elman, GRU, or LSTM) on the WikiText-2 language modeling benchmark.
Using an LSTM, the model obtains a test perplexity (ppl) of about 107 on WikiText-2.
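For readers unfamiliar with the metric, perplexity is the exponential of the average per-token negative log-likelihood. The sketch below is a minimal illustration, not code from train.py; the function name `perplexity` and the sample inputs are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to
    each correct next token (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that gives every correct token probability 1/4 is as
# uncertain as a uniform choice among 4 words.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

By the same relation, a perplexity of 107.49 corresponds to an average loss of roughly 4.68 nats per token.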
The following techniques have been adopted for SOTA results:
The WikiText-2 data comes from the WikiText Long Term Dependency Language Modeling Dataset. The training script automatically loads the dataset into $PWD/data.
Example runs and the results:
    python train.py --cuda --tied --nhid 200 --emsize 200 --epochs 20 --dropout 0.2    # Test ppl of 107.49
    python train.py --cuda --tied --nhid 650 --emsize 650 --epochs 40 --dropout 0.5    # Test ppl of 91.51
    python train.py --cuda --tied --nhid 1500 --emsize 1500 --epochs 60 --dropout 0.65 # Test ppl of 88.42
    python train.py --export-model # hybridize and export model graph. See below for visualization options.
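The training runs above all pass --tied with --nhid equal to --emsize. As a rough illustration of why tying matters, the sketch below counts only the embedding and output-layer parameters with and without tying. It is not taken from train.py: the helper name is hypothetical, the vocabulary size of 33,278 is an assumed figure for WikiText-2, and the requirement that the two sizes match is an assumption of this sketch.

```python
def embedding_and_decoder_params(vocab_size, emsize, nhid, tied):
    """Count parameters in the word embedding and softmax layer only."""
    embedding = vocab_size * emsize      # one emsize-dim vector per word
    decoder_bias = vocab_size            # one bias per output word
    if tied:
        # Assumption of this sketch: sharing one matrix needs equal widths.
        if emsize != nhid:
            raise ValueError("tying assumes emsize == nhid")
        decoder_weight = 0               # reuses the embedding matrix
    else:
        decoder_weight = vocab_size * nhid
    return embedding + decoder_weight + decoder_bias

VOCAB = 33278  # assumed WikiText-2 vocabulary size

untied = embedding_and_decoder_params(VOCAB, 650, 650, tied=False)
tied = embedding_and_decoder_params(VOCAB, 650, 650, tied=True)
print("parameters saved by tying:", untied - tied)
```

Under these assumptions the saving equals one full vocabulary-by-embedding matrix, which grows with both the vocabulary and the hidden size.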
python train.py --help gives the following arguments:
    usage: train.py [-h] [--model MODEL] [--emsize EMSIZE] [--nhid NHID]
                    [--nlayers NLAYERS] [--lr LR] [--clip CLIP] [--epochs EPOCHS]
                    [--batch_size N] [--bptt BPTT] [--dropout DROPOUT] [--tied]
                    [--cuda] [--log-interval N] [--save SAVE] [--gctype GCTYPE]
                    [--gcthreshold GCTHRESHOLD] [--hybridize] [--static-alloc]
                    [--static-shape] [--export-model]

    MXNet Autograd RNN/LSTM Language Model on Wikitext-2.

    optional arguments:
      -h, --help            show this help message and exit
      --model MODEL         type of recurrent net (rnn_tanh, rnn_relu, lstm, gru)
      --emsize EMSIZE       size of word embeddings
      --nhid NHID           number of hidden units per layer
      --nlayers NLAYERS     number of layers
      --lr LR               initial learning rate
      --clip CLIP           gradient clipping
      --epochs EPOCHS       upper epoch limit
      --batch_size N        batch size
      --bptt BPTT           sequence length
      --dropout DROPOUT     dropout applied to layers (0 = no dropout)
      --tied                tie the word embedding and softmax weights
      --cuda                Whether to use gpu
      --log-interval N      report interval
      --save SAVE           path to save the final model
      --gctype GCTYPE       type of gradient compression to use, takes `2bit` or
                            `none` for now.
      --gcthreshold GCTHRESHOLD
                            threshold for 2bit gradient compression
      --hybridize           whether to hybridize in mxnet>=1.3 (default=False)
      --static-alloc        whether to use static-alloc hybridize in mxnet>=1.3
                            (default=False)
      --static-shape        whether to use static-shape hybridize in mxnet>=1.3
                            (default=False)
      --export-model        export a symbol graph and exit (default=False)
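To make the --batch_size and --bptt options more concrete, the sketch below shows one common way a flat token stream is cut into parallel columns and then into fixed-length training sequences. It uses plain Python lists and is only an assumed illustration of the scheme; the helper names `batchify` and `get_batch` are hypothetical and the code is not taken from train.py.

```python
def batchify(tokens, batch_size):
    """Split a flat token stream into batch_size parallel columns.

    Returns a list of rows; row t holds the t-th token of every column.
    Tokens that do not fill a complete row are dropped.
    """
    nbatch = len(tokens) // batch_size
    tokens = tokens[:nbatch * batch_size]
    columns = [tokens[i * nbatch:(i + 1) * nbatch] for i in range(batch_size)]
    return [list(row) for row in zip(*columns)]

def get_batch(rows, i, bptt):
    """Return (data, target) starting at row i, at most bptt rows long.

    The target is the data shifted by one time step, so the model
    learns to predict the next token in each column.
    """
    seq_len = min(bptt, len(rows) - 1 - i)
    data = rows[i:i + seq_len]
    target = rows[i + 1:i + 1 + seq_len]
    return data, target

tokens = list(range(26))               # stand-in for word indices
rows = batchify(tokens, batch_size=4)  # 6 rows of 4; 2 leftover tokens dropped
for i in range(0, len(rows) - 1, 3):
    data, target = get_batch(rows, i, bptt=3)
    print(i, data, target)
```

In this scheme a larger bptt lets gradients flow across more time steps per update, at the cost of more memory per batch.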
You can visualize the exported graph with mxnet.viz.plot_network, which relies only on the graphviz Python package. Alternatively, if mxboard is installed, use the following approach for interactive visualization.
    #!python
    import mxnet, mxboard
    with mxboard.SummaryWriter(logdir='./model-graph') as sw:
        sw.add_graph(mxnet.sym.load('./model-symbol.json'))

    #!/bin/bash
    tensorboard --logdir=./model-graph/