blob: 3eaeff8a0633085056e60f926adbe82656dc8b14 [file] [log] [blame]
%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
\subsection{\varOrchestrator~(\varOR)}
There is one \varOrchestrator~per \varDUCC~cluster.
The duties of the \varOrchestrator~are:
\textit{
\input{part5/c01-OR.tex}
}
The \varOrchestrator~provides essential functionality for operation of the
\varDUCC~system.
It is configurable and tunable.
The \varOrchestrator~receives user requests to start, stop and modify
Jobs, Services and Reservations. It manages the life-cycles of these
entities, each deployed to a managed cluster of machines
(nodes, computers).
The \varOrchestrator~both publishes and receive reports.
The \varOrchestrator~publication is also known as the \varORmap, which is
the final authority on the state of Jobs, Reservations, and Services.
All other \varDUCC~components respect the state published by the
\varOrchestrator~and each carries out its assigned duties accordingly.
\subsubsection{Controller}
\label{Controller}
The \varOrchestrator~Controller responsibilities entail receiving user
submitted requests and processing them to completion in accordance with
an instance of the appropriate state machine.
User submitted requests comprise:
\begin{itemize}
\item Job \{ Start, Stop, Modify \}
\item Reservation \{ Start, Stop, Modify \}
\item Service \{ Start, Stop, Modify \}
\item Individual Process \{ Stop \}
\end{itemize}
The Controller responsibilities further entail receiving status messages
from other \varDUCC~components and advancing the state machines of user
submitted Jobs, Reservations, and Services as necessary.
Additionally and importantly, the Controller is the final authority
for the \varDUCC~system state comprising active Jobs, Reservations, and
Services. The Controller publishes \varDUCC~system state at regular intervals
for consumption and use by all other \varDUCC~components.
\subsubsection{Authenticator}
The authenticator determines whether or not the requesting user is a
\varDUCC~administrator. Such users have special privileges, such as:
\begin{itemize}
\item the ability to control \varDUCC~system functions
\item the ability to act on behalf of other users
\end{itemize}
The file \varDuccAdministrators~comprises the list of privileged \varDUCC~users.
\subsubsection{Validation}
Each request to submit, cancel or modify is validated against a set of
criteria that define acceptableness. In the case of missing information,
a default value may be employed.
Presently, the following keys are validated by the \varOrchestrator:
\begin{itemize}
\item \varProcessThreadCount
\item \varNumberOfInstances
\item \varSchedulingClass
\end{itemize}
Other keys are validated by the \varCommandLineInterface, prior to arrival
at the \varOrchestrator.
\subsubsection{Factory}
Once accepted, submit requests proceed through a corresponding factory
to have a state machine representation entered into the published
\varORmap~with initial state of \varReceived. The request remains
active until it advances to final state \varCompleted.
Each factory-created representation comprises appropriate information as follows:
\begin{itemize}
\item
\begin{itemize}
\item standard information
\begin{itemize}
\item unique identifier (assigned by \varDUCC)
\item type {Job, Reservation, Service}
\item user name
\item submitting \varPID
\item date of submission
\item date of completion (initially \varNull)
\item description (text supplied by user)
\end{itemize}
\item scheduling information
\begin{itemize}
\item scheduling class
\item scheduling priority
\item scheduling max shares
\item scheduling min shares
\item scheduling threads per share
\item scheduling memory size
\item scheduling memory units
\end{itemize}
\item job driver information
\begin{itemize}
\item java command
\item java classpath
\item environment variables
\item user log directory
\item MQ broker
\item MQ queue
\item \varCollectionReader~descriptor
\item \varCollectionReader~overrides
\item getMeta timeout value
\item work item processing timeout value
\item work item processing exception handler
\item node identity
\item \varLinuxControlGroup~limits
\item state
\end{itemize}
\item job process information (one or more instances)
\begin{itemize}
\item java command
\item java classpath
\item environment variables
\item user log directory
\item MQ broker
\item MQ queue
\item deployment descriptor or aggregate data
\item initialization failure limits
\item node identity
\item \varLinuxControlGroup~limits
\item state
\item service dependencies
\end{itemize}
\item service information (one or more instances)
\begin{itemize}
\item See job process information above.
\end{itemize}
\item managed reservation information
TBD
\item unmanaged reservation information
TBD
\end{itemize}
\end{itemize}
\subsubsection{Checkpoint Supervisor}
The Checkpoint Supervisor provides functions to save and restore state information
to/from persistent storage. State is stored whenever a significant change occurs.
State is restored at \varOrchestrator~boot time.
Saving and restoration of state facilitates reasonable continuity of service
between \varOrchestrator~lifetimes.
\subsubsection{State Supervisor}
The State Supervisor receives and examines publications from other
\varDUCC~components, records and distributes pertinent information obtained
or derived, and advances state machines appropriately.
Publications are received from these components:
\begin{itemize}
\item Job Driver(s)
\item Resource Manager
\item Services Manager
\item Agent(s) Inventory
\end{itemize}
Information from these sources is recorded in the \varORmap.
Based on information derived from all sources, the
\varOrchestrator~advances the state machines of currently active
entities (Jobs, Reservations, Services).
Once the \varCompleted~state is reached, the
entity is no longer active on the cluster.
Note that \varORmap~is, in-turn, published at regular intervals
for use by the other \varDUCC~singleton and distributed components.
The \varORmap~is the "final authority" on the state of
each Job, Reservation and Service currently or formerly deployed.
See \ref{Controller} \varController.
\subsubsection{State Accounting Supervisor}
The State Accounting Supervisor manages finite state machine for
Jobs, Services, and Reservations. It provides functions to:
\begin{description}
\item Advance from the current state to a next valid state
\item Advance from the current state immediately to the \varCompleted~state
\end{description}
\subsubsection{\varLinuxControlGroup~Supervisor}
The \varLinuxControlGroup~Supervisor assigns a maximum size (in bytes) and a composite
unique identity to each \varShare. This information is published for use
by Agents to enforce \varLinuxControlGroup~limitations on storage used by the corresponding
running entity (for example, \varUIMA~pipeline).
Employing \varLinuxControlGroups~ is analogous to defining virtual machines of a certain
size such that exceeding limits causes only the offending process to suffer
any performance penalties, while other co-located well-behaved processes
run unaffected.
\subsubsection{Host Supervisor}
The Host Supervisor is responsible for obtaining sufficient resource for
deploying the Job Drivers for all submitted Jobs. It interacts with the
Resource Manager to allocate and de-allocate resources for this purpose.
It assigns a \varJdShare~to each active Job.
A \varJdShare~is a \varLinuxControlGroup~controlled \varShare~of sufficient size into which a Job
Driver can be deployed. A \varJdShare~is usually significantly smaller than
a normal \varShare.
\subsubsection{Logging / As-User}
The Logging and As-User modules permit the \varOrchestrator~to write logging data into
a file contained in "user-space", meaning a file into a directory writable
by the submitting user, during processing of the submitted entity
(Job, Managed Reservation...).
The Logging module also facilitates the recording to persistent storage of noteworthy
events occurring during the \varOrchestrator~lifetime. Noteworthiness is configurable
as various levels, such as \varINFO, \varDEBUG~and \varTRACE.
\subsubsection{Administrators}
The Administrators module grants users defined in the \varDuccAdministrators~file
special privileges, such a being able cancel to any user's Job.
\subsubsection{Maintenance}
The maintenance thread wakes-up at regular intervals to perform the following
tasks:
\begin{description}
\item Health
The \varOrchestrator~automatically caps Jobs and Services that exceed initialization
error thresholds, and cancels those that exceed processing error thresholds.
\item MQ Reaper
The \varOrchestrator~cleans-up unused \varJobDriver~AMQ permanent queues for Jobs that have completed.
\item Publication Pruning
The \varOrchestrator~regularly publishes state for all active entities (Jobs, Reservations,
Services). It also publishes state for recently completed ones. Pruning removes
from regular \varOrchestrator~publication completed entities that have been competed past a
time threshold, nominally one minute.
\item Node Accounting
This module keeps track of each node's state, up or down. Nodes that do
not report for a time exceeding a threshold, typically a few minutes,
are considered down. This information is used for Jobs whose Job Driver
advanced to the \varCompleted~state, whereby corresponding Job Processes on
nodes that are reported down are marked as stopped by the \varOrchestrator, as opposed
to waiting (potentially forever) for the corresponding Agent to report.
This prevents Jobs from becoming unnecessarily stuck in the completing
state.
\end{description}