Some TaskStatus messages will arrive with the reason field set to a value that allows frameworks to display better error messages and to implement special behaviour for some of the reasons.
For most reasons, the message field of the TaskStatus message will give a more detailed, human-readable error description.
Not all status updates will contain a reason.
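As a minimal sketch of how a scheduler might use these two fields, the Python below builds a display string from a status update. The Status class is a hypothetical stand-in that carries only the fields discussed here; it is not the Mesos API, and the message text in the usage lines is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    """Hypothetical stand-in for a TaskStatus message (not the Mesos API)."""
    task_id: str
    state: str
    reason: Optional[str] = None   # not every update carries a reason
    message: Optional[str] = None  # human-readable detail, if present


def describe(status: Status) -> str:
    """Build a display string, tolerating a missing reason or message."""
    text = f"task {status.task_id} is {status.state}"
    if status.reason is not None:
        text += f" ({status.reason})"
    if status.message:
        text += f": {status.message}"
    return text


# Usage: one update with a reason and message, one with neither.
print(describe(Status("t1", "TASK_RUNNING", "REASON_RECONCILIATION", "example detail")))
print(describe(Status("t2", "TASK_RUNNING")))
```

Treating both fields as optional keeps the display code from failing on updates that carry no reason.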
Frameworks that implement their own executors are free to set the reason field on any status messages they produce.
Note that executors cannot generally rely on the scheduler seeing the status update with the reason set by the executor, since only the latest update for each different task state is stored and re-transmitted. See in particular the description of REASON_RECONCILIATION below.
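The effect of keeping only the latest update per task state can be illustrated with a toy model. This is a sketch under stated assumptions, not how Mesos stores updates internally: the dictionary, the record function, and the reason name REASON_SET_BY_EXECUTOR are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Toy model: one stored update per (task_id, state). A newer update for the
# same pair replaces the older one, including its reason.
stored: Dict[Tuple[str, str], Optional[str]] = {}


def record(task_id: str, state: str, reason: Optional[str]) -> None:
    """Keep only the most recent reason for this task and state."""
    stored[(task_id, state)] = reason


# The executor reports TASK_RUNNING with its own (hypothetical) reason.
record("t1", "TASK_RUNNING", "REASON_SET_BY_EXECUTOR")
# A later update for the same task and state arrives with another reason.
record("t1", "TASK_RUNNING", "REASON_RECONCILIATION")

# Only the latest reason for that state is left to re-transmit.
print(stored[("t1", "TASK_RUNNING")])  # prints REASON_RECONCILIATION
```

In this model the executor's reason is gone once the second update is recorded, which is why a scheduler should not depend on ever seeing it.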
Most reasons describe conditions that can only be detected in the master or agent code, and will accompany automatically generated status updates from either of these.
For consistency with the existing usages of the different task reasons, we recommend that executors restrict themselves to the following subset if they use a non-default reason in their status updates.
The reason REASON_COMMAND_EXECUTOR_FAILED is deprecated and will be removed in the future. It should not be referenced by newly written code.
The reasons REASON_CONTAINER_LIMITATION, REASON_INVALID_FRAMEWORKID, REASON_SLAVE_UNKNOWN, REASON_TASK_UNKNOWN and REASON_EXECUTOR_UNREGISTERED are not used as of Mesos 1.4.
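A framework that wants to flag these reasons in its own code could keep them in a small lookup. The sketch below uses only the reason names listed above; the function name and the returned labels are hypothetical.

```python
# Reason names copied from the text above.
DEPRECATED_REASONS = frozenset({"REASON_COMMAND_EXECUTOR_FAILED"})
UNUSED_REASONS = frozenset({
    "REASON_CONTAINER_LIMITATION",
    "REASON_INVALID_FRAMEWORKID",
    "REASON_SLAVE_UNKNOWN",
    "REASON_TASK_UNKNOWN",
    "REASON_EXECUTOR_UNREGISTERED",
})


def reason_note(reason: str) -> str:
    """Label a reason as deprecated, unused, or not covered by this lookup."""
    if reason in DEPRECATED_REASONS:
        return "deprecated"
    if reason in UNUSED_REASONS:
        return "unused as of Mesos 1.4"
    return "not listed as deprecated or unused"
```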
For these status updates, the reason indicates why the task state changed. Typically, a given reason will always appear together with the same state.
Typically they are generated by Mesos when an error occurs that prevents the executor from sending its own status update messages.
Below, a partition-aware framework means a framework which has the Capability::PARTITION_AWARE capability bit set in its FrameworkInfo. Messages generated on the master will have the source field set to SOURCE_MASTER and messages generated on the agent will have it set to SOURCE_AGENT in the v1 API or SOURCE_SLAVE in the v0 API.
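The two conventions above can be sketched as small helpers. This assumes the source value and the capability names are available as plain strings; both function names are hypothetical, and a real framework would read these values from the protobuf messages instead.

```python
from typing import Iterable


def origin(source: str) -> str:
    """Map a status update's source value to the component that produced it."""
    if source == "SOURCE_MASTER":
        return "master"
    # The agent is named SOURCE_AGENT in the v1 API and SOURCE_SLAVE in v0.
    if source in ("SOURCE_AGENT", "SOURCE_SLAVE"):
        return "agent"
    return "other"


def is_partition_aware(capabilities: Iterable[str]) -> bool:
    """True if the framework's capability names include PARTITION_AWARE."""
    return "PARTITION_AWARE" in set(capabilities)
```

Accepting both agent spellings lets the same handler serve schedulers written against either API version.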
As of Mesos 1.4, the following reasons are being used.
TASK_FAILED
TASK_KILLED
TASK_ERROR
TASK_LOST
TASK_DROPPED
TASK_UNREACHABLE
TASK_GONE
These reasons do not cause a state change, and will be sent along with the last known state of the task. The reason field indicates why the status update was sent.