
PyTorch DistributedDataParallel (DDP) Example

Usage

This is a simple MNIST example that shows how to train a distributed PyTorch model with the DistributedDataParallel (DDP) method and track its metrics and parameters with submarine-sdk.
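To make the data-parallel idea concrete, here is a toy sketch in plain Python. It is not PyTorch code and it is not taken from mnist_distributed.py. It only simulates the pattern DDP is built on: each worker computes a gradient on its own shard of the data, the gradients are averaged across workers, and every replica applies the same update. All function names (`shard`, `local_gradient`, `all_reduce_mean`, `train`) and the one-parameter model are hypothetical and chosen for illustration. Real DDP runs one process per worker and uses a communication backend instead of a Python list.

```python
# Toy simulation of data-parallel training. Assumption: a one-parameter
# model y = w * x stands in for the network, and a Python list stands in
# for the separate worker processes that real DDP would use.

def shard(data, world_size):
    """Split the data round-robin so each rank gets a disjoint shard."""
    return [data[rank::world_size] for rank in range(world_size)]

def local_gradient(w, samples):
    """Gradient of the mean squared error of y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

def all_reduce_mean(values):
    """Stand-in for all-reduce: every rank receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(data, world_size=2, lr=0.01, steps=200):
    shards = shard(data, world_size)
    weights = [0.0] * world_size  # one replica per rank, identical init
    for _ in range(steps):
        # Each rank computes a gradient on its own shard only.
        grads = [local_gradient(weights[r], shards[r]) for r in range(world_size)]
        # Gradients are averaged, so all replicas see the same value.
        synced = all_reduce_mean(grads)
        # Every replica applies the same update and stays in sync.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights

data = [(x, 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
weights = train(data)
```

Because every replica starts from the same value and applies the same averaged gradient, the entries of `weights` stay identical throughout training, which is the property that lets DDP treat the replicas as one model.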

How to execute

  1. Point this terminal's Docker CLI at minikube's Docker daemon (only needed once per terminal session)
eval $(minikube -p minikube docker-env)
  2. Build the Docker image
./dev-support/examples/mnist-pytorch/DDP/build.sh
  3. Submit a POST request
./dev-support/examples/mnist-pytorch/DDP/post.sh