commit | be68df345c7dcb3af4c14a7720b6e385e9ad3837 | |
---|---|---|
author | Julia Wang <jwang98052@yahoo.com> | Wed Jan 04 17:19:53 2017 -0800 |
committer | Mariia Mykhailova <mariia@apache.org> | Tue Jan 17 11:26:14 2017 -0800 |
tree | 4bf8e4715f3358111450f1d3cf852b6506b15b36 | |
parent | 1396fb3dc7a4b9c739e245d260320eb0d3096357 | |
[REEF-1691] Don't request extra Evaluators if an Evaluator failed in the WaitingForEvaluator state

When Evaluators fail in both the WaitingForEvaluator and TaskRunning states, recovery uses _failedEvaluatorsCount in EvaluatorManager to decide how many new Evaluators to request. That count includes Evaluators that failed in either state, but replacements for the Evaluators that failed in the WaitingForEvaluator state have already been requested, so recovery asks for too many.

The fix is to request new Evaluators during recovery only for those that failed during or after task submission.

This change also adds a test case for Evaluators failing in both the WaitingForEvaluator and TaskRunning states.

JIRA: [REEF-1691](https://issues.apache.org/jira/browse/REEF-1691)

This closes #1210
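The counting fix described in the commit message can be illustrated with a small Java sketch. This is not REEF's code: the commit refers to _failedEvaluatorsCount in EvaluatorManager, while the class RecoveryCounter, the enum SystemState and its values, and all method names below are hypothetical stand-ins. The sketch assumes failures are tracked per state, so recovery can exclude the ones whose replacements were already requested.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the REEF-1691 counting fix. RecoveryCounter,
// SystemState and the method names are illustrative assumptions,
// not REEF's actual API.
final class RecoveryCounter {

    // Simplified, hypothetical states in which an Evaluator can fail.
    enum SystemState { WAITING_FOR_EVALUATOR, SUBMITTING_TASKS, TASKS_RUNNING }

    private final Map<SystemState, Integer> failuresByState = new EnumMap<>(SystemState.class);

    // Record one failed Evaluator under the state the system was in.
    void onEvaluatorFailed(SystemState state) {
        failuresByState.merge(state, 1, Integer::sum);
    }

    // Total failures across all states: what a single counter such as
    // _failedEvaluatorsCount would report.
    int totalFailed() {
        return failuresByState.values().stream().mapToInt(Integer::intValue).sum();
    }

    // New Evaluators to request during recovery. A replacement for each
    // Evaluator that failed in WAITING_FOR_EVALUATOR is assumed to have been
    // requested already, so only failures during or after task submission count.
    int evaluatorsToRequestInRecovery() {
        return totalFailed() - failuresByState.getOrDefault(SystemState.WAITING_FOR_EVALUATOR, 0);
    }

    public static void main(String[] args) {
        RecoveryCounter counter = new RecoveryCounter();
        counter.onEvaluatorFailed(SystemState.WAITING_FOR_EVALUATOR);
        counter.onEvaluatorFailed(SystemState.SUBMITTING_TASKS);
        counter.onEvaluatorFailed(SystemState.TASKS_RUNNING);
        System.out.println("total failed: " + counter.totalFailed());                 // total failed: 3
        System.out.println("to request: " + counter.evaluatorsToRequestInRecovery()); // to request: 2
    }
}
```

With one failure in each state, the single total is 3, but recovery only needs 2 new Evaluators, because the replacement for the Evaluator that failed while waiting was already requested. Requesting the full total is the over-allocation the commit removes.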
Apache REEF™ (Retainable Evaluator Execution Framework) is a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos. For example, Microsoft Azure Stream Analytics is built on REEF and Hadoop.
Detailed information on REEF can be found in the following places:
The developer mailing list is the best way to reach REEF's developers when the above aren't sufficient.
Component | OS | Status |
---|---|---|
REEF Java | Ubuntu | |
REEF.NET | Windows | |
Task | Java | .NET |
---|---|---|
Build & run unit tests | java\BUILD.md | cs\BUILD.md |