[REEF-1691] Don't request extra evaluators if evaluator failed at WaitingForEvaluator state

When Evaluators fail in both the WaitingForEvaluator and TaskRunning
states, recovery uses _failedEvaluatorsCount in EvaluatorManager to decide
how many new Evaluators to request. That count includes failures in both
states, even though replacements have already been requested for the
Evaluators that failed in the WaitingForEvaluator state, so recovery
requests more Evaluators than it needs.

The fix is to request, in recovery, only as many Evaluators as failed
during or after task submission.
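
The counting rule can be illustrated with a minimal Java sketch. It is
not the actual EvaluatorManager code: the class RecoveryRequestSketch,
the SUBMITTING_TASKS state, and all method names are hypothetical and
exist only to show why the total failure count over-requests.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical illustration only; not the REEF EvaluatorManager. */
final class RecoveryRequestSketch {
  /** Assumed states; only the first and last are named in this change. */
  enum SystemState { WAITING_FOR_EVALUATOR, SUBMITTING_TASKS, TASK_RUNNING }

  // Number of failed Evaluators observed in each state.
  private final Map<SystemState, Integer> failures = new EnumMap<>(SystemState.class);

  /**
   * Records a failure and returns how many replacements to request right away.
   * A failure while still waiting for Evaluators is replaced immediately.
   */
  int onEvaluatorFailed(SystemState state) {
    failures.merge(state, 1, Integer::sum);
    return state == SystemState.WAITING_FOR_EVALUATOR ? 1 : 0;
  }

  /** Total failures in all states (what the old behavior requested in recovery). */
  int totalFailed() {
    return failures.values().stream().mapToInt(Integer::intValue).sum();
  }

  /** Failures during or after task submission (what recovery should request). */
  int toRequestInRecovery() {
    return totalFailed() - failures.getOrDefault(SystemState.WAITING_FOR_EVALUATOR, 0);
  }
}
```

With two failures in WAITING_FOR_EVALUATOR and three in TASK_RUNNING, the
sketch asks for three Evaluators in recovery rather than five, because the
first two were already replaced when they failed.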

This change also adds a test case for failed evaluators in both
WaitingForEvaluator and TaskRunning states.

JIRA:
  [REEF-1691](https://issues.apache.org/jira/browse/REEF-1691)

This closes #1210
5 files changed
tree: 4bf8e4715f3358111450f1d3cf852b6506b15b36
  1. bin/
  2. dev/
  3. lang/
  4. website/
  5. .gitattributes
  6. .gitignore
  7. .travis.yml
  8. appveyor.yml
  9. Doxyfile
  10. HEADER
  11. LICENSE
  12. NOTICE
  13. pom.xml
  14. README.md
README.md

Apache REEF™

Apache REEF™ (Retainable Evaluator Execution Framework) is a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos. For example, Microsoft Azure Stream Analytics is built on REEF and Hadoop.

Online Documentation

Detailed information on REEF can be found in the following places:

The developer mailing list is the best way to reach REEF's developers when the above aren't sufficient.

Build Status

| Component | OS | Status |
|---|---|---|
| REEF Java | Ubuntu | Build Status |
| REEF.NET | Windows | Build status |

Building REEF

| | Java | .NET |
|---|---|---|
| Build & run unit tests | java\BUILD.md | cs\BUILD.md |

Releases

downloads NuGet package

License

Apache 2.0