Run a TensorFlow Experiment on the Submarine UI

Steps to run a TensorFlow Experiment

  • Click + New Experiment on the “Experiment” page.

  • Click Define your experiment

  • Enter a name for the experiment, such as “mnist-example”

  • Command: python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150

  • Image: apache/submarine:tf-mnist-with-summaries-1.0

  • Click Next Step

  • Choose Distributed Tensorflow

  • Click Add new spec twice to add two new specs (roles)

  • Set one spec to Worker and the other to PS; leave the rest of the parameters unchanged

  • Click Next Step. You can review your parameters before submitting the job.

    The review page should look like this:

    Name:                   mnist-example-111
    Command:                python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150
    Image:                  apache/submarine:tf-mnist-with-summaries-1.0
    Environment Variables:
    Ps:                     cpu=1,nvidia.com/gpu=0,memory=1024M
    Worker:                 cpu=1,nvidia.com/gpu=0,memory=1024M
  • Click Submit. The job is submitted and the new experiment appears in the Experiment list, where you can view its logs and other details directly from the UI.
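
If you prefer to assemble the form values in a script before pasting them into the UI, the Command and resource strings used above can be built and checked as in the following minimal sketch. The helpers build_command and parse_resources are hypothetical names introduced here for illustration; they are not part of Submarine and do not call any Submarine API.

```python
import shlex


def build_command(script, **flags):
    """Join a script path and keyword flags into one command line.

    Hypothetical helper, not part of Submarine.
    """
    parts = ["python", script]
    parts += [f"--{name}={value}" for name, value in flags.items()]
    # Quote each part so values with spaces would survive a shell.
    return " ".join(shlex.quote(part) for part in parts)


def parse_resources(spec):
    """Split a string like 'cpu=1,memory=1024M' into a dict of strings.

    Hypothetical helper, not part of Submarine.
    """
    return dict(item.split("=", 1) for item in spec.split(","))


# Values taken from the steps above.
command = build_command(
    "/var/tf_mnist/mnist_with_summaries.py",
    log_dir="/train/log",
    learning_rate=0.01,
    batch_size=150,
)
resources = parse_resources("cpu=1,nvidia.com/gpu=0,memory=1024M")

print(command)
print(resources)
```

The printed command can be pasted into the Command field, and the parsed dictionary is a quick way to confirm that the Ps and Worker resource strings say what you expect before you submit.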