Click + New Experiment on the “Experiment” page.
Click Define your experiment
Give the experiment a name, such as “mnist-example”
Command: python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150
Image: apache/submarine:tf-mnist-with-summaries-1.0
Click Next Step
Choose Distributed Tensorflow
Click Add new spec twice to add two new specs (roles)
Set one to Worker and the other to PS; leave the rest of the parameters unchanged
Click Next Step to review your parameters before submitting the job.
It should look like this:
Name | mnist-example-111
---|---
Command | python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150
Image | apache/submarine:tf-mnist-with-summaries-1.0
Environment Variables |
Ps | cpu=1,nvidia.com/gpu=0,memory=1024M
Worker | cpu=1,nvidia.com/gpu=0,memory=1024M
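
The Ps and Worker rows show each role's resources as a comma-separated list of key=value pairs. As a minimal sketch of how to read that string, the helper below splits it into a dictionary. The function name `parse_resources` is hypothetical and is not part of Submarine; it only illustrates the format shown on the review page.

```python
def parse_resources(spec: str) -> dict[str, str]:
    """Split a 'key=value,key=value' resource string into a dict.

    Hypothetical helper for illustration; not a Submarine API.
    """
    resources = {}
    for item in spec.split(","):
        # partition keeps keys such as 'nvidia.com/gpu' intact
        key, _, value = item.strip().partition("=")
        resources[key] = value
    return resources


# The resource string shown for both Ps and Worker on the review page
ps_resources = parse_resources("cpu=1,nvidia.com/gpu=0,memory=1024M")
print(ps_resources)
```

Values are kept as strings so that units such as `1024M` are preserved as written.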
Click Submit
The job is submitted, and the new experiment appears as running in the Experiment list.
You can get logs and other details directly from the UI.
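
To keep the values entered in the form in one place, they can be sketched as a plain Python structure. This is an illustration only: the dictionary layout and the `build_command` helper are assumptions made for this example and do not represent Submarine's API or request schema. The command, image, and resource strings are the ones used in the steps above.

```python
def build_command(log_dir: str, learning_rate: float, batch_size: int) -> str:
    """Assemble the training command from the three flags used in this walkthrough.

    Hypothetical helper; the script path is the one entered in the Command field.
    """
    return (
        "python /var/tf_mnist/mnist_with_summaries.py "
        f"--log_dir={log_dir} --learning_rate={learning_rate} --batch_size={batch_size}"
    )


# Illustrative summary of the form fields, not a Submarine request body
experiment = {
    "name": "mnist-example",
    "image": "apache/submarine:tf-mnist-with-summaries-1.0",
    "command": build_command("/train/log", 0.01, 150),
    "specs": {
        "Ps": "cpu=1,nvidia.com/gpu=0,memory=1024M",
        "Worker": "cpu=1,nvidia.com/gpu=0,memory=1024M",
    },
}

for field, value in experiment.items():
    print(f"{field}: {value}")
```

Writing the parameters down this way makes it easier to compare them against the review page before clicking Submit.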