Environments

Red Hat 7
Linux Kernel 3.10.0
Docker 1.12.3
Hadoop 2.6.5
Docker image ytensorflow:0.2.1, Dockerfile

Use slider to run a tensorflow cluster

Make sure Slider works correctly; see the Slider Start
Download app-packages/tensorflow to $SLIDER_HOME/app-packages/tensorflow
Put your tensorflow scripts under app-packages/tensorflow/package/files
In “appConfig.default.json”, set “site.global.hadoop.conf”, “site.global.user.scripts.entry”, “site.global.user.data.dir”, and “site.global.user.checkpoint.dir” to match your environment
Adjust the resources in resources.default.json if needed
In most cases, metainfo.json does not need to be changed
Start your tensorflow cluster

cd $SLIDER_HOME/app-packages/tensorflow
slider create [app-name] --appdef . --template appConfig.default.json --resources resources.default.json
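
Before running the create command, the four settings named in the steps above have to be filled in. The following is a minimal sketch of doing that with a script rather than by hand. It assumes the settings live under a top-level "global" object in appConfig.default.json, as is common for Slider app configs; check your copy of the file before relying on it. The helper names and all values in EXAMPLE_SETTINGS are hypothetical placeholders, not defaults shipped with the package.

```python
import copy
import json

# The four keys this README asks you to set.
REQUIRED_KEYS = (
    "site.global.hadoop.conf",
    "site.global.user.scripts.entry",
    "site.global.user.data.dir",
    "site.global.user.checkpoint.dir",
)

# Placeholder values only; replace them with paths from your own cluster.
EXAMPLE_SETTINGS = {
    "site.global.hadoop.conf": "/etc/hadoop/conf",
    "site.global.user.scripts.entry": "mnist.py",
    "site.global.user.data.dir": "hdfs:///user/me/mnist-data",
    "site.global.user.checkpoint.dir": "hdfs:///user/me/mnist-ckp",
}


def apply_settings(config, settings):
    """Return a copy of config with the settings merged into "global".

    Assumption: the settings belong under a top-level "global" object.
    """
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    updated = copy.deepcopy(config)
    updated.setdefault("global", {}).update(settings)
    return updated


def update_file(path, settings):
    """Rewrite the JSON file at path with the settings applied."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(apply_settings(config, settings), f, indent=2)
        f.write("\n")
```

Calling update_file("appConfig.default.json", EXAMPLE_SETTINGS) from the app-packages/tensorflow directory would rewrite the file in place; keep a backup, since the original key order and formatting are not preserved.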

Use ytensorflow to run a tensorflow cluster

Introduction

ytensorflow (the tensorflow on YARN admin client) is used to submit and manage tensorflow clusters on YARN. It aims to make submitting a cluster easier.

Command

ytensorflow cluster -start ./config.json -files ./mnist.py
ytensorflow cluster -stop <appName>
ytensorflow cluster -status <appName>
ytensorflow version

User scripts requirements

The framework generates the following arguments and passes them to the user script. Your script should accept them and use them in the right places, as “mnist.py” does.

job_name, worker or ps
task_index
ps_hosts
worker_hosts
data_dir, directory where the user data is stored
ckp_dir, directory for storing the checkpoints
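
As an illustration of how a user script might receive these arguments, here is a minimal sketch using argparse. It assumes the framework passes them as --name=value command-line flags and that ps_hosts and worker_hosts are comma-separated host:port lists; neither detail is stated above, so compare against mnist.py before relying on it. The function names parse_cluster_args and cluster_spec are hypothetical.

```python
import argparse


def parse_cluster_args(argv=None):
    """Parse the six framework-generated arguments.

    Assumption: they arrive as --name=value flags.
    """
    parser = argparse.ArgumentParser(description="tensorflow on YARN user script")
    parser.add_argument("--job_name", choices=["worker", "ps"], required=True)
    parser.add_argument("--task_index", type=int, required=True)
    parser.add_argument("--ps_hosts", required=True)
    parser.add_argument("--worker_hosts", required=True)
    parser.add_argument("--data_dir", required=True)
    parser.add_argument("--ckp_dir", required=True)
    args = parser.parse_args(argv)
    # Assumption: host lists are comma-separated "host:port" entries.
    args.ps_hosts = args.ps_hosts.split(",")
    args.worker_hosts = args.worker_hosts.split(",")
    return args


def cluster_spec(args):
    """Build the job-name-to-hosts mapping.

    A dict of this shape can be passed to tf.train.ClusterSpec.
    """
    return {"ps": args.ps_hosts, "worker": args.worker_hosts}
```

A script would call parse_cluster_args() once at startup, then branch on args.job_name: a ps task typically just serves parameters, while a worker task reads from args.data_dir and writes checkpoints to args.ckp_dir.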