examples/cifar_distributed_cnn/README.md

Image Classification using Convolutional Neural Networks

Examples inside this folder show how to train CNN models using SINGA for image classification.

  • data includes the scripts for preprocessing image datasets. Currently, MNIST, CIFAR10 and CIFAR100 are included.

  • model includes the CNN model construction code. Each model is defined as a subclass of Module that wraps its neural network operations. The computational graph is then enabled to optimize memory usage and efficiency.

  • autograd includes the code to train CNN models by calling the neural network operations imperatively. In this mode, the computational graph is not created.

  • train_cnn.py is the training script, which controls the training flow by running backpropagation and applying SGD updates.

  • train_multiprocess.py is the script for distributed training on a single node with multiple GPUs; it uses Python's multiprocessing module and NCCL.

  • train_mpi.py is the script for distributed training (among multiple nodes) using MPI and NCCL for communication.

  • benchmark.py tests the training throughput using ResNet50 as the workload.
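
The training flow described above (forward pass, backpropagation, SGD update) and the throughput measurement can be illustrated with a minimal, framework-free sketch. This is not SINGA's API and not the code in train_cnn.py or benchmark.py: the function name `sgd_train`, the toy linear model, and all hyperparameters are assumptions chosen only to make the loop structure visible.

```python
import random
import time


def sgd_train(xs, ys, lr=0.1, epochs=200, batch_size=8, seed=0):
    """Fit y ~ w * x + b with mini-batch SGD (hypothetical toy example).

    Returns the learned w and b plus the measured throughput in
    samples per second.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    samples_seen = 0
    start = time.perf_counter()

    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order every epoch
        for i in range(0, len(indices), batch_size):
            batch = indices[i:i + batch_size]

            # Forward pass: residuals of the mean-squared-error loss.
            errs = [w * xs[j] + b - ys[j] for j in batch]

            # Backward pass: gradients of mean(err ** 2) w.r.t. w and b,
            # derived by hand for this tiny model.
            grad_w = 2.0 * sum(e * xs[j] for e, j in zip(errs, batch)) / len(batch)
            grad_b = 2.0 * sum(errs) / len(batch)

            # SGD update: step against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
            samples_seen += len(batch)

    elapsed = time.perf_counter() - start
    throughput = samples_seen / max(elapsed, 1e-9)
    return w, b, throughput


# Noise-free synthetic data generated from w = 1.5, b = 0.3.
xs = [i / 32.0 - 1.0 for i in range(64)]
ys = [1.5 * x + 0.3 for x in xs]

w, b, throughput = sgd_train(xs, ys)
print(f"w={w:.4f} b={b:.4f} throughput={throughput:.0f} samples/s")
```

In the actual scripts the model is a CNN and the gradients come from the framework rather than hand-written formulas, but the loop has the same shape: iterate over mini-batches, compute the loss, backpropagate, and update the parameters. Throughput is simply the number of processed samples divided by wall-clock time.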