app-packages/tensorflow/README.md

Environments

  • Redhat 7
  • Linux Kernel 3.10.0
  • Docker 1.12.3
  • Hadoop 2.6.5
  • Docker image ytensorflow:0.2.1, Dockerfile

Use slider to run a tensorflow cluster

  1. Make sure slider works correctly; see the Slider Start
  2. Download app-packages/tensorflow to $SLIDER_HOME/app-packages/tensorflow
  3. Put your tensorflow scripts under app-packages/tensorflow/package/files
  4. In “appConfig.default.json”, set “site.global.hadoop.conf”, “site.global.user.scripts.entry”, “site.global.user.data.dir” and “site.global.user.checkpoint.dir” to match your environment
  5. Adjust the resources in resources.default.json if needed
  6. In most cases, metainfo.json does not need to be changed
  7. Start your tensorflow cluster
cd $SLIDER_HOME/app-packages/tensorflow
slider create [app-name] --appdef . --template appConfig.default.json --resources resources.default.json
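
Step 4 above can also be scripted. The following is a minimal Python sketch, not part of the package. It assumes that the four settings live in the top-level "global" section of appConfig.default.json, as in typical slider app configs. The helper names set_global_options and update_config_file, and all example values, are hypothetical.

```python
import copy
import json


def set_global_options(app_config, overrides):
    """Return a copy of app_config with overrides merged into its "global" section.

    Assumption: the site.global.* keys are stored under a top-level "global" key.
    """
    updated = copy.deepcopy(app_config)
    updated.setdefault("global", {}).update(overrides)
    return updated


def update_config_file(path, overrides):
    """Read a JSON config file, apply the overrides, and write it back."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(set_global_options(config, overrides), f, indent=2)


# Hypothetical values; replace them with paths that exist on your cluster.
example_overrides = {
    "site.global.hadoop.conf": "/etc/hadoop/conf",
    "site.global.user.scripts.entry": "mnist.py",
    "site.global.user.data.dir": "hdfs:///user/me/mnist-data",
    "site.global.user.checkpoint.dir": "hdfs:///user/me/mnist-ckp",
}
```

Calling update_config_file("appConfig.default.json", example_overrides) before running slider create would apply the four settings in one step. Check the key layout against your own copy of the file first.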

Use ytensorflow to run a tensorflow cluster

Introduction

ytensorflow (the tensorflow on YARN admin client) is used to submit and manage tensorflow clusters on YARN. It aims to make submission easier.

Command

ytensorflow cluster -start ./config.json -files ./mnist.py
ytensorflow cluster -stop <appName>
ytensorflow cluster -status <appName>
ytensorflow version

User scripts requirements

The framework generates the following arguments and passes them to the user script. Use them in the right positions, as “mnist.py” does.

  • job_name, either worker or ps
  • task_index
  • ps_hosts
  • worker_hosts
  • data_dir, directory where the user data is stored
  • ckp_dir, directory for storing the checkpoints
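
As an illustration of how a user script might consume these arguments, here is a minimal Python sketch. It assumes the framework passes them as command-line flags of the form --name=value and that ps_hosts and worker_hosts are comma-separated host:port lists; neither detail is confirmed above, so compare with “mnist.py” before relying on it. The function names parse_cluster_args and build_cluster_spec are hypothetical.

```python
import argparse


def parse_cluster_args(argv=None):
    """Parse the framework-generated arguments (assumed to be --name=value flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--job_name", choices=["worker", "ps"], required=True)
    parser.add_argument("--task_index", type=int, default=0)
    parser.add_argument("--ps_hosts", default="")
    parser.add_argument("--worker_hosts", default="")
    parser.add_argument("--data_dir", default="")
    parser.add_argument("--ckp_dir", default="")
    # Tolerate extra flags that the script itself may define elsewhere.
    args, _unknown = parser.parse_known_args(argv)
    return args


def build_cluster_spec(args):
    """Map each job name to its list of host:port strings.

    Assumption: hosts are given as a comma-separated list.
    """
    def split_hosts(hosts):
        return [h for h in hosts.split(",") if h]

    return {
        "ps": split_hosts(args.ps_hosts),
        "worker": split_hosts(args.worker_hosts),
    }
```

In a tensorflow 1.x script, the dictionary returned by build_cluster_spec could be passed to tf.train.ClusterSpec, and job_name and task_index to tf.train.Server, while data_dir and ckp_dir would be used for reading input data and saving checkpoints.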