%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
\subsection{\varJobDriver (\varJD)}
There is one \varJobDriver~per \varJob.
The duties of the \varJobDriver~are:
\textit{
\input{part5/c01-JD.tex}
}
The \varJobDriver~comprises a container into which the user-specified
\varCollectionReader~is deployed.
The \varJobDriver~interacts with the user-specified
\varCollectionReader~to fetch \varCASes~(or \varWorkItems) for
processing by a corresponding \varPipeline.
The \varJobDriver~is deployed into a \varResourceManager-allocated
\varJdShare~managed by a \varDUCC~\varAgent.
The \varJobDriver~is subdivided into several responsibility areas:
\begin{itemize}
\item Core
\item Client
\item Config
\item Event
\end{itemize}
\subsubsection{Core}
\begin{itemize}
\item{Job Driver}
\begin{description}
\item The \varJobDriver~module is the main thread.
It sets up and executes the \varJobDriver~runtime environment.
\begin{itemize}
\item{initialize}
\begin{description}
\item The initialize method sets up all the machinery to
fetch and process \varCASes.
\end{description}
\item{run}
\begin{description}
\item The run method manages all the machinery to
fetch and process \varCASes.
\end{description}
\begin{itemize}
\item{wait for eligibility}
\begin{description}
\item Do not start queuing \varWorkItems~until at least one \varJobProcess~has initialized.
\end{description}
\item{initialize \varUIMAAS~client}
\begin{description}
\item Create a client instance, and one client thread for each corresponding \varJobProcess~thread.
\end{description}
\item{queue \varWorkItems}
\begin{description}
\item Until a termination condition arises, repeat the following cycle: queue one \varWorkItem~for each thread, sleep for an interval, then recheck for termination.
\end{description}
\end{itemize}
\end{itemize}
\end{description}
\item{Job Driver Component}
\begin{description}
\item This module initializes the \varJobDriver~function,
receives and evaluates \varORmap{}s with respect to
continuance or termination of the \varJobDriver,
and triggers publication of \varJobDriver~status reports.
\end{description}
\item{Job Driver Terminate Exception}
\begin{description}
\item This module provides a means to identify the exception, and where
possible its cause, when the \varJobDriver~terminates abnormally.
\end{description}
\item{Synchronized Stats}
\begin{description}
\item This module provides a method for the \varJobDriver~to accumulate
statistics in a thread-safe manner.
The following \varWorkItem~statistics are maintained and produced:
\end{description}
\begin{itemize}
\item{number of \varWorkItems}
\item{minimum time to complete a \varWorkItem}
\item{maximum time to complete a \varWorkItem}
\item{average time to complete a \varWorkItem}
\item{standard deviation of time to complete a \varWorkItem}
\end{itemize}
\end{itemize}
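
The Synchronized Stats item above lists five \varWorkItem~statistics that must be accumulated safely from several threads.
The following is a minimal sketch of one way to do that, not the \varDUCC~implementation.
The class name \texttt{WorkItemStats}, its method names, the millisecond unit, and the choice of a running (Welford) update with a population standard deviation are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of a thread-safe statistics accumulator.
 * Names, units and the population standard deviation are assumptions,
 * not taken from the DUCC source.
 */
final class WorkItemStats {
    private long count;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Record the completion time of one work item (assumed milliseconds). */
    synchronized void add(double millis) {
        count++;
        min = Math.min(min, millis);
        max = Math.max(max, millis);
        // Welford's update: no need to store individual samples.
        double delta = millis - mean;
        mean += delta / count;
        m2 += delta * (millis - mean);
    }

    synchronized long getCount() { return count; }
    synchronized double getMin() { return count == 0 ? Double.NaN : min; }
    synchronized double getMax() { return count == 0 ? Double.NaN : max; }
    synchronized double getMean() { return count == 0 ? Double.NaN : mean; }

    /** Population standard deviation (divides by count, not count - 1). */
    synchronized double getStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) throws InterruptedException {
        WorkItemStats stats = new WorkItemStats();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            // Each worker reports 1000 simulated completion times.
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= 1000; i++) {
                    stats.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("count=" + stats.getCount()
                + " min=" + stats.getMin()
                + " max=" + stats.getMax()
                + " mean=" + stats.getMean()
                + " stddev=" + stats.getStdDev());
    }
}
```

Every accessor is synchronized on the same object as \texttt{add}, so a reader never observes a count that is out of step with the mean.
A lock-free design is possible but would make the five values harder to keep mutually consistent.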
\subsubsection{Client}
\begin{itemize}
\item{Callback State}
\begin{description}
\item This module tracks \varWorkItem~queuing state.
\item Possible states are:
\begin{itemize}
\item \varPendingQueued
\item \varPendingAssigned
\item \varNotPending
\end{itemize}
\end{description}
\item{\varCAS~Dispatch Map}
\begin{description}
\item This module tracks \varWorkItems.
It comprises a map of \varWorkItems~which includes
node and \varLinux~process identity.
\end{description}
\item{\varCAS~Limbo}
\begin{description}
\item Manage incomplete \varWorkItems.
This module ensures that \varWorkItems~are not simultaneously processed
by multiple \varUIMA~pipelines.
It does not release \varWorkItems~for retry processing elsewhere until
confirmation is received that the previous attempt has been terminated.
\end{description}
\item{\varCAS~Source}
\begin{description}
\item Manage \varCASes.
Employ the user provided \varCR~to fetch
\varCASes~as needed to keep the available \varUIMA~pipelines full
until all \varCASes~have been processed.
Save and restore \varCASes~that were
pre-empted during periods of \varJP~contraction, for example.
\end{description}
\item{\varCAS~Tuple}
\begin{description}
\item Manage \varCAS~instance with meta-data.
\begin{itemize}
\item \varUIMA~\varCAS~object.
\item \varDUCC~assigned sequence number.
\item \varCAS~retry status.
\item \varJob~identifier.
\end{itemize}
\end{description}
\item{Client Thread Factory}
\begin{description}
\item Produce one \varUIMAAS~client thread instance for each \varCAS~in progress.
\end{description}
\item{Dynamic Thread Pool Executor}
\begin{description}
\item Maintain a client-side thread pool for processing \varCASes.
Each thread in the pool is assigned and tracks one \varCAS~sent~out
for processing via \varSendAndReceiveCAS.
Each thread in the pool is re-usable once processing for the
associated \varCAS~is completed.
The thread pool expands and contracts in correlation with
the number of \varResourceManager~assigned \varShares.
\end{description}
\item{\varWorkItem}
\begin{description}
\item The \varWorkItem~represents one \varCAS~to be processed, normally by one of the
distributed \varUIMA~pipelines.
\begin{itemize}
\item Manage and track the lifecycle steps of a \varWorkItem:
\begin{itemize}
\item start
\item getCas
\item \varSendAndReceiveCAS
\item ended or exception
\end{itemize}
\end{itemize}
\end{description}
\item{\varWorkItem~Factory}
\begin{description}
\item Create a new \varWorkItem~for a given CasTuple.
Associate a \varWorkItem~with a \varUIMAAS~client thread.
\end{description}
\item{\varWorkItem~Listener}
\begin{description}
\item
\begin{itemize}
\item onBeforeMessageSend
\begin{description}
\item Process the callback indicating that the \varWorkItem~has been placed on the MQ queue and
is waiting to be taken by a \varJP.
\end{description}
\item onBeforeProcessCAS
\begin{description}
\item Process the callback indicating that the \varWorkItem~has been taken from the MQ queue and
is active in a \varUIMA~pipeline.
The associated node and \varLinux~process identity are provided.
\end{description}
\end{itemize}
\end{description}
\end{itemize}
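
The Callback State and \varWorkItem~Listener items above can be read together as a small state machine.
The sketch below illustrates that reading and is not the \varDUCC~or \varUIMAAS~API.
The class \texttt{TrackedWorkItem}, its accessors and the callback signatures are hypothetical.
It also assumes that each pending state names the callback still awaited, so a \varWorkItem~starts out awaiting the queued callback; the source does not state this mapping.

```java
/** The three queuing states, spelled as Java enum constants (assumed naming). */
enum CallbackState { PENDING_QUEUED, PENDING_ASSIGNED, NOT_PENDING }

/**
 * Hypothetical sketch of per-work-item callback tracking.
 * Not the DUCC WorkItem class; signatures are illustrative only.
 */
final class TrackedWorkItem {
    private final int seqNo;
    // Assumption: a new item is waiting for the "queued" callback.
    private CallbackState state = CallbackState.PENDING_QUEUED;
    private String node;   // null until a pipeline takes the item
    private long pid = -1; // -1 until a pipeline takes the item

    TrackedWorkItem(int seqNo) {
        this.seqNo = seqNo;
    }

    /** The item has been placed on the queue; now await assignment. */
    synchronized void onBeforeMessageSend() {
        // Only advance; a late callback must not undo an assignment.
        if (state == CallbackState.PENDING_QUEUED) {
            state = CallbackState.PENDING_ASSIGNED;
        }
    }

    /** A pipeline has taken the item; record where it is running. */
    synchronized void onBeforeProcessCAS(String node, long pid) {
        this.node = node;
        this.pid = pid;
        state = CallbackState.NOT_PENDING;
    }

    synchronized CallbackState getState() { return state; }
    synchronized String getNode() { return node; }
    synchronized long getPid() { return pid; }
    int getSeqNo() { return seqNo; }

    public static void main(String[] args) {
        TrackedWorkItem item = new TrackedWorkItem(1);
        System.out.println(item.getState());
        item.onBeforeMessageSend();
        System.out.println(item.getState());
        item.onBeforeProcessCAS("node-a", 4321L);
        System.out.println(item.getState() + " on " + item.getNode()
                + " pid " + item.getPid());
    }
}
```

Recording the node and process identity at assignment time is what would let a dispatch map, such as the one described above, locate a \varWorkItem~when its process must be terminated before a retry.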
\subsubsection{Config}
The \varJobDriver~publishes reports at configurable intervals:
\begin{itemize}
\item Job Driver Status Report:
a report on the \varJobDriver-managed,
\varCollectionReader-sourced \varCASes~(or \varWorkItems).
Information includes the number of \varWorkItems~to process in total, the number finished,
the number failed, the number retried, and other status.
\end{itemize}
\subsubsection{Event}
The module receives and handles publication events:
\begin{itemize}
\item \varORmap
The \varOrchestrator~notification comprising the \varORmap~is the
``final authority'' on the state of each Job, Reservation and Service
currently or formerly deployed to the \varDUCC-managed cluster.
\end{itemize}