Environments
- Red Hat 7
- Linux Kernel 3.10.0
- Docker 1.12.3
- Hadoop 2.6.5
- Docker image ytensorflow:0.2.1 (see the Dockerfile)
Use Slider to run a TensorFlow cluster
- Make sure Slider itself works correctly; see the Slider Start guide
- Download app-packages/tensorflow to $SLIDER_HOME/app-packages/tensorflow
- Put your TensorFlow scripts under app-packages/tensorflow/package/files
- In appConfig.default.json, set "site.global.hadoop.conf", "site.global.user.scripts.entry", "site.global.user.data.dir" and "site.global.user.checkpoint.dir" to match your environment
- Adjust the resources in resources.default.json if needed
- In most cases, metainfo.json does not need to be changed
- Start your TensorFlow cluster:
cd $SLIDER_HOME/app-packages/tensorflow
slider create [app-name] --appdef . --template appConfig.default.json --resources resources.default.json
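The configuration step above can also be scripted. The following is a minimal Python sketch, not part of the app package, that fills in the four site.global.* keys. It assumes the keys sit in the top-level "global" section of appConfig.default.json; the helper names and all example values are hypothetical placeholders.

```python
import json


def set_site_options(app_config, hadoop_conf, scripts_entry, data_dir, checkpoint_dir):
    """Return a copy of app_config with the four site.global.* keys set."""
    updated = dict(app_config)
    # Assumption: the keys belong to the top-level "global" section.
    global_section = dict(updated.get("global", {}))
    global_section.update({
        "site.global.hadoop.conf": hadoop_conf,
        "site.global.user.scripts.entry": scripts_entry,
        "site.global.user.data.dir": data_dir,
        "site.global.user.checkpoint.dir": checkpoint_dir,
    })
    updated["global"] = global_section
    return updated


def rewrite_config_file(path, **options):
    """Load a JSON config file, set the options, and write it back."""
    with open(path) as handle:
        config = json.load(handle)
    with open(path, "w") as handle:
        json.dump(set_site_options(config, **options), handle, indent=2)


# In-memory demonstration with placeholder values (not real paths).
example = set_site_options(
    {"global": {}},
    hadoop_conf="/etc/hadoop/conf",
    scripts_entry="mnist.py",
    data_dir="hdfs:///user/example/mnist-data",
    checkpoint_dir="hdfs:///user/example/mnist-ckp",
)
print(json.dumps(example, indent=2))
```

To edit the real file, call rewrite_config_file("appConfig.default.json", ...) with the same keyword arguments before running slider create.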
Use ytensorflow to run a TensorFlow cluster
Introduction
ytensorflow (the TensorFlow on YARN admin client) is used to submit and manage TensorFlow clusters on YARN. It aims to make submission easier.
Command
ytensorflow cluster -start ./config.json -files ./mnist.py
ytensorflow cluster -stop <appName>
ytensorflow cluster -status <appName>
ytensorflow version
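If you want to drive ytensorflow from a Python script, the sketch below builds the argument lists for the cluster subcommands shown above. It uses only the flags listed here; the helper name is hypothetical, and it is an assumption that -files applies only to -start and takes a single path, as in the example command.

```python
import shlex


def cluster_command(action, target, files=None):
    """Build the argv list for `ytensorflow cluster -<action> <target>`.

    action: "start", "stop" or "status".
    target: the config file for "start", the application name otherwise.
    files:  optional user script path, only meaningful for "start" (assumption).
    """
    if action not in ("start", "stop", "status"):
        raise ValueError("unknown action: %r" % action)
    argv = ["ytensorflow", "cluster", "-" + action, target]
    if files is not None:
        if action != "start":
            raise ValueError("-files is only used together with -start")
        argv += ["-files", files]
    return argv


# Print the command line instead of running it, so this works without ytensorflow.
start = cluster_command("start", "./config.json", files="./mnist.py")
print(" ".join(shlex.quote(part) for part in start))
```

On a machine where ytensorflow is installed, the returned list can be passed to subprocess.run(argv, check=True).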
User scripts requirements
The framework generates the following arguments and passes them to the user script. Use them in the appropriate places in your script, as mnist.py does.
- job_name, either worker or ps
- task_index
- ps_hosts
- worker_hosts
- data_dir, directory where the user data is stored
- ckp_dir, directory for storing the checkpoints
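As an illustration of how a user script might consume these arguments, here is a minimal sketch using Python's argparse. It assumes the framework passes them as --name=value flags and that ps_hosts and worker_hosts are comma-separated host:port lists; neither detail is confirmed here, so check mnist.py for the exact form. The function names and sample values are hypothetical.

```python
import argparse


def split_hosts(value):
    """Split a comma-separated host list, dropping empty entries."""
    return [host for host in value.split(",") if host]


def parse_cluster_args(argv=None):
    """Parse the framework-generated arguments (assumed --name=value form)."""
    parser = argparse.ArgumentParser(description="TensorFlow cluster task")
    parser.add_argument("--job_name", choices=["worker", "ps"], required=True)
    parser.add_argument("--task_index", type=int, required=True)
    parser.add_argument("--ps_hosts", type=split_hosts, default=[])
    parser.add_argument("--worker_hosts", type=split_hosts, default=[])
    parser.add_argument("--data_dir", default="")
    parser.add_argument("--ckp_dir", default="")
    # Ignore any extra flags the script defines for its own use.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Demonstration with made-up values; a real script would call
# parse_cluster_args() with no argument to read sys.argv.
demo = parse_cluster_args([
    "--job_name=worker",
    "--task_index=0",
    "--ps_hosts=ps0.example:2222",
    "--worker_hosts=w0.example:2222,w1.example:2222",
    "--data_dir=/tmp/data",
    "--ckp_dir=/tmp/ckp",
])
print(demo.job_name, demo.task_index, demo.ps_hosts, demo.worker_hosts)
```

The two host lists are what a script would typically use to describe the cluster, while job_name and task_index identify the role of the current process within it.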