[REEF-1511] Add timeout for Task shutdown during IMRU recovery

During IMRU FT recovery, tasks that the driver is supposed to
close sometimes don't report back, causing the system to hang.
This change adds a timeout for tasks closed by the driver, so that
the evaluators of unresponsive tasks are shut down once the
timeout expires.

The average task closing time is recorded in the TaskManager
and serves as a reference for defining the timeout on the fly.
In case the average is read before any data has accumulated,
or the average is too low in some scenarios, a configurable
MinTaskWaitingForCloseTimeout ensures that the driver waits
long enough before killing the evaluators.
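
The timeout rule described above can be sketched roughly as follows.
This is an illustration only, not the code in this change: the class
and method names are hypothetical, the sketch is written in Java, and
combining the two values as max(average, minimum) is an assumption
about how the recorded average and MinTaskWaitingForCloseTimeout
interact.

```java
/** Hypothetical sketch of the timeout rule; not REEF's actual TaskManager. */
final class TaskCloseTimeoutSketch {
  // Stands in for the configurable MinTaskWaitingForCloseTimeout (milliseconds).
  private final long minTimeoutMs;
  private long totalClosingMs = 0;
  private long closedTasks = 0;

  TaskCloseTimeoutSketch(final long minTimeoutMs) {
    this.minTimeoutMs = minTimeoutMs;
  }

  /** Records how long one task took to close after the driver asked it to. */
  synchronized void recordClosingTime(final long closingMs) {
    totalClosingMs += closingMs;
    closedTasks++;
  }

  /**
   * Assumed rule: use the running average, but never go below the minimum.
   * With no samples yet, the average is treated as 0, so the minimum applies.
   */
  synchronized long effectiveTimeoutMs() {
    final long average = closedTasks == 0 ? 0 : totalClosingMs / closedTasks;
    return Math.max(average, minTimeoutMs);
  }

  /** True if a task asked to close at closeRequestedAtMs has waited too long. */
  boolean isTimedOut(final long closeRequestedAtMs, final long nowMs) {
    return nowMs - closeRequestedAtMs > effectiveTimeoutMs();
  }
}
```

Under these assumptions, a driver would check isTimedOut for each task
still waiting to close and shut down the evaluator of any task for
which it returns true.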

JIRA:
  [REEF-1511](https://issues.apache.org/jira/browse/REEF-1511)

Pull request:
  This closes #1201
7 files changed
tree: 3d0be75077d82c19c744a02e26014ed8ede705f6
  1. bin/
  2. dev/
  3. lang/
  4. website/
  5. .gitattributes
  6. .gitignore
  7. .travis.yml
  8. appveyor.yml
  9. Doxyfile
  10. HEADER
  11. LICENSE
  12. NOTICE
  13. pom.xml
  14. README.md
README.md

Apache REEF™

Apache REEF™ (Retainable Evaluator Execution Framework) is a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos. For example, Microsoft Azure Stream Analytics is built on REEF and Hadoop.

Online Documentation

Detailed information on REEF can be found in the following places:

The developer mailing list is the best way to reach REEF’s developers when the above aren’t sufficient.

Build Status

| Component | OS      | Status       |
|-----------|---------|--------------|
| REEF Java | Ubuntu  | Build Status |
| REEF.NET  | Windows | Build status |

Building REEF

|                        | Java          | .NET        |
|------------------------|---------------|-------------|
| Build & run unit tests | java\BUILD.md | cs\BUILD.md |

Releases

downloads NuGet package

License

Apache 2.0