Restricted Boltzmann machine (RBM)

An example of a binary RBM [1] trained on the MNIST dataset. The RBM is implemented as a custom operator, and a Gluon block is also provided. binary_rbm.py contains the RBM implementation; binary_rbm_module.py and binary_rbm_gluon.py train it on MNIST using the Module API and the Gluon API, respectively. The MNIST data is downloaded automatically.
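
To make the training loop concrete, here is a minimal NumPy sketch of a binary RBM with the two conditionals p(h|v) and p(v|h) and a persistent contrastive divergence (PCD-k) update, the algorithm that the --k flag below refers to. This is an illustrative sketch under simplified assumptions (no momentum, plain SGD), not the custom-operator implementation in binary_rbm.py; all names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Binary RBM with visible bias a, hidden bias b, and weights W."""
    def __init__(self, num_visible, num_hidden):
        self.W = rng.normal(0.0, 0.01, (num_visible, num_hidden))
        self.a = np.zeros(num_visible)
        self.b = np.zeros(num_hidden)

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.b)        # p(h = 1 | v)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.a)      # p(v = 1 | h)
        return p, (rng.random(p.shape) < p).astype(float)

    def pcd_update(self, v_data, v_chain, k=30, lr=0.1):
        """One PCD-k step; returns the new state of the persistent chain."""
        ph_data, _ = self.sample_h(v_data)      # positive phase from the data
        v = v_chain
        for _ in range(k):                      # k Gibbs steps on the chain
            _, h = self.sample_h(v)
            _, v = self.sample_v(h)
        ph_model, _ = self.sample_h(v)          # negative phase from the chain
        n = v_data.shape[0]
        self.W += lr * (v_data.T @ ph_data - v.T @ ph_model) / n
        self.a += lr * (v_data.mean(axis=0) - v.mean(axis=0))
        self.b += lr * (ph_data.mean(axis=0) - ph_model.mean(axis=0))
        return v                                # the chain persists across updates
```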

Training progress is monitored by estimating the log-likelihood with annealed importance sampling (AIS) [2,3]. With the default hyperparameters, training takes about 25 minutes on a GTX 1080 Ti, and the resulting log-likelihood is around -70 on both the training and test sets.
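
AIS estimates the intractable partition function Z by annealing from a tractable base distribution to the trained model through a sequence of intermediate distributions. The sketch below (continuing the NumPy sketch above) anneals from an independent-Bernoulli base model over the visibles; the base bias a_base, the linear schedule, and the helper names are assumptions of this sketch, not the code in binary_rbm.py.

```python
def ais_log_z(rbm, a_base, num_steps=10, num_runs=1000, burn_in=10):
    """Estimate log Z of `rbm` by annealing from a zero-weight base model
    whose visible bias `a_base` can be fit to the mean of the data."""
    nv, _ = rbm.W.shape
    betas = np.linspace(0.0, 1.0, num_steps + 1)

    def log_p_star(v, beta):
        # Unnormalized log-probability of v under the intermediate
        # distribution, with the hidden units summed out analytically.
        return ((1 - beta) * (v @ a_base) + beta * (v @ rbm.a)
                + np.logaddexp(0.0, beta * (v @ rbm.W + rbm.b)).sum(axis=1))

    # Exact samples from the base model: independent Bernoulli visibles.
    v = (rng.random((num_runs, nv)) < sigmoid(a_base)).astype(float)
    log_w = np.zeros(num_runs)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        log_w += log_p_star(v, b1) - log_p_star(v, b0)
        for _ in range(burn_in):  # Gibbs transitions leaving p_{b1} invariant
            ph = sigmoid(b1 * (v @ rbm.W + rbm.b))
            h = (rng.random(ph.shape) < ph).astype(float)
            pv = sigmoid((1 - b1) * a_base + b1 * (rbm.a + h @ rbm.W.T))
            v = (rng.random(pv.shape) < pv).astype(float)

    log_z_base = np.logaddexp(0.0, a_base).sum()  # closed form for the base
    m = log_w.max()
    return log_z_base + m + np.log(np.exp(log_w - m).mean())

def avg_log_likelihood(rbm, v, log_z):
    # Mean log p(v) over a batch of binary images, given the log Z estimate.
    return (v @ rbm.a + np.logaddexp(0.0, v @ rbm.W + rbm.b).sum(axis=1)
            - log_z).mean()
```

In the actual scripts, the number of intermediate distributions and the burn-in steps per distribution are controlled by the --ais-intermediate-steps and --ais-burn-in-steps flags shown below.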

Here are some samples generated by the RBM with the default hyperparameters (see samples.png). The samples (right) are obtained by 3000 steps of Gibbs sampling starting from randomly chosen real images (left).
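
Such samples can be produced with a few lines on top of the BinaryRBM sketch above, assuming a trained rbm and an array of real MNIST images (both names are placeholders for this sketch):

```python
# `rbm` is a trained BinaryRBM from the sketch above; `images` is assumed to
# be an (n, 784) array of MNIST pixels in [0, 1].
v = (images > 0.5).astype(float)   # binarized real images (left panel)
for _ in range(3000):              # 3000 Gibbs steps, as in the caption
    _, h = rbm.sample_h(v)
    _, v = rbm.sample_v(h)
samples = v                        # generated samples (right panel)
```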

Usage:

python binary_rbm_gluon.py --help
usage: binary_rbm_gluon.py [-h] [--num-hidden NUM_HIDDEN] [--k K]
                           [--batch-size BATCH_SIZE] [--num-epoch NUM_EPOCH]
                           [--learning-rate LEARNING_RATE]
                           [--momentum MOMENTUM]
                           [--ais-batch-size AIS_BATCH_SIZE]
                           [--ais-num-batch AIS_NUM_BATCH]
                           [--ais-intermediate-steps AIS_INTERMEDIATE_STEPS]
                           [--ais-burn-in-steps AIS_BURN_IN_STEPS] [--cuda]
                           [--no-cuda] [--device-id DEVICE_ID]
                           [--data-loader-num-worker DATA_LOADER_NUM_WORKER]

Restricted Boltzmann machine learning MNIST

optional arguments:
  -h, --help            show this help message and exit
  --num-hidden NUM_HIDDEN
                        number of hidden units
  --k K                 number of Gibbs sampling steps used in the PCD
                        algorithm
  --batch-size BATCH_SIZE
                        batch size
  --num-epoch NUM_EPOCH
                        number of epochs
  --learning-rate LEARNING_RATE
                        learning rate for stochastic gradient descent
  --momentum MOMENTUM   momentum for the stochastic gradient descent
  --ais-batch-size AIS_BATCH_SIZE
                        batch size for AIS to estimate the log-likelihood
  --ais-num-batch AIS_NUM_BATCH
                        number of batches for AIS to estimate the log-
                        likelihood
  --ais-intermediate-steps AIS_INTERMEDIATE_STEPS
                        number of intermediate distributions for AIS to
                        estimate the log-likelihood
  --ais-burn-in-steps AIS_BURN_IN_STEPS
                        number of burn-in steps for each intermediate
                        distribution of AIS to estimate the log-likelihood
  --cuda                train on GPU with CUDA
  --no-cuda             train on CPU
  --device-id DEVICE_ID
                        GPU device id
  --data-loader-num-worker DATA_LOADER_NUM_WORKER
                        number of multithreading workers for the data loader

Default:

Namespace(ais_batch_size=100, ais_burn_in_steps=10, ais_intermediate_steps=10, ais_num_batch=10, batch_size=80, cuda=True, data_loader_num_worker=4, device_id=0, k=30, learning_rate=0.1, momentum=0.3, num_epoch=130, num_hidden=500)
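
Any default can be overridden on the command line, for instance to train a larger model on the second GPU:

```
python binary_rbm_gluon.py --num-hidden 1000 --device-id 1
```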

[1] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science 313, 5786 (2006).
[2] R. M. Neal, "Annealed importance sampling," Statistics and Computing 11, 2 (2001).
[3] R. Salakhutdinov and I. Murray, "On the quantitative analysis of deep belief networks," in Proceedings of the 25th International Conference on Machine Learning (ICML '08) (2008).