blob: 3db3c902bab1650df4e3c0dd8520d5f830f5d439 [file] [log] [blame]
%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
\subsection{\varAgent}
There is one \varAgent~per~\varNodeMachineComputer~per \varDUCC~cluster.
The \varAgents~collectively provide essential functionality for operation
of the \varDUCC~system.
The duties of the \varAgent~are:
\textit{
\input{part5/c01-Agent.tex}
}
At the direction of the \varProcessManager, each \varAgent~is
instructed to manage its assigned \varShares~by means of
\varLinuxControlGroups, and by injecting into them local
process elements comprising
Job Drivers, Job Processes, Service Processes, and Managed Processes.
The \varAgent~is subdivided into several responsibility areas:
\begin{itemize}
\item Core
\item Config
\item Deploy
\item Event
\item Exceptions
\item Launcher
\item Metrics Collectors
\item Monitor
\item Processors
\end{itemize}
\subsubsection{Core}
The \varAgent~publishes information about the state of the
\varNodeMachineComputer~it controls.
It also receives publications which it interprets to control
processes deployed thereon.
It also monitors activity on the \varNodeMachineComputer~and
ensures that only sanctioned processes are running.
The \varAgent~is normally launched at \varDUCC~system
start-up time.
However, \varAgents~may be started/stopped independently over time.
\varDUCC~is only able to deploy user submitted applications to a
\varNodeMachineComputer~upon which there exists an active \varAgent.
\subsubsection{Config}
\begin{itemize}
\item Agent Configuration
The \varAgent configures itself according to the
\varDuccProperties~file. Aspects include:
\begin{itemize}
\item launcher.thread.pool.size
\item launcher.process.stop.timeout
\item rogue.process.exclusion.filter
Processes in this list are exempt for rogue process detection
and termination.
\item rogue.process.user.exclusion.filter
Users in this list are exempt for rogue process detection
and termination.
\end{itemize}
The \varAgent publishes reports at configurable intervals:
\begin{itemize}
\item Node Inventory
Node Inventory is a report on the \varAgent-managed processes
on this node.
\item Node Metrics
Node Metrics is a report on the \varAgent-observed metrics
on this node.
\end{itemize}
\end{itemize}
\subsubsection{Deploy}
\begin{itemize}
\item Managed UIMA Service
The module is the \varAgent-managed integration between
\varUIMAAS~and the user supplied application code which is
deployed thereto.
\end{itemize}
\subsubsection{Event}
\begin{itemize}
\item Event Listener
The module handles publication events:
\begin{itemize}
\item Process Start
A notification from the \varProcessManager~to start a user submitted
process constrained to a \varResourceManager~allocated number of \varShares.
\item Process Stop
A notification from the \varProcessManager~to stop a user submitted
process.
\item Process Modify
A notification from the \varProcessManager~to modify a user submitted
process.
\item Process Purge
A notification from the \varProcessManager~to purge a user submitted
process.
\item Job State
A notification from the \varProcessManager~comprising abbreviated
state of the \varDUCC-managed collection of entities:
\varJobs, \varReservations~and \varServices.
\end{itemize}
\end{itemize}
\subsubsection{Launcher}
The modules comprising the Launcher package are tasked with
starting user processes on the \varAgent-managed \varNodeMachineComputer.
The modules are:
\begin{itemize}
\item CGroups Manager
This module provides functionality to partition the \varAgent-managed
\varNodeMachineComputer~into \varShares, each \varShare~with limits on one
or more aspects, including but not limited to memory and swap space.
The CGroups Manager essentially starts, maintains, and stops instant
virtual machines in correspondence with \varResourceManager~allocated
\varShares~into which user submitted processes are launched.
\item Command Executor
This module is the base class that provides functionality to
launch a user specified process within the
\varResourceManager~allocated \varShare.
\item \varDUCC~Command Executor
This module launches a user specified process within the
\varResourceManager~allocated \varShare.
The process may be constrained by a \varLinuxControlGroup~and
may be spawned as the submitting \varUser.
\item \varJVM~Args Parser
The \varJVM~Args Parser module extracts user specified \varJVM~arguments
for use in building an \varAgent-launchable subprocess comprising
the user specified executable code.
\item Launcher
The Managed Process module provides virtual \varAgent~capability.
This module comprises a method used to launch multiple Agents
on the same physical machine.
It allows for the scale up Agents on a single machine to simulate load.
Each Agent instance assumes a given name and IP address.
\item Managed Process
The Managed Process module manages a state machine for each
\varAgent-managed user process. The states comprise:
\begin{itemize}
\item Starting
\item Initializing
\item Ready
\item Failed
\item Stopped
\end{itemize}
\item Process Stream Consumer
The Process Stream Consumer module captures and redirects user process output
to a log file.
\end{itemize}
\subsubsection{Metrics Collectors}
The modules comprising the Metrics Collectors package observe, calculate
or otherwise gather specific metrics. Metrics collected are relative to
these main categories:
\begin{itemize}
\item Garbage Collection Statistics
\item Node CPU, Node CPU Usage, Node CPU Utilization
\item Node Load Average
\item Node Memory Info
\item Node Users
\item Process CPU Usage
\item Process Major Faults
\item Process Resident Memory
\item Process Swap Usage
\end{itemize}
\subsubsection{Monitor}
The modules comprising the Monitor package observe various states and
trigger actions when specific events occur.
\begin{itemize}
\item \varAgent~Monitor
When the \varAgent~detects problems with the network, broker, or ping
functions it terminates all \varAgent~deployed processes.
\item Rogue Process Detector
The \varAgent~detects aliens processes, those not expected for running
the \varOS~or \varDUCC~or user processes deployed by \varDUCC.
According to policy, the \varAgent~may take one or more actions:
\begin{itemize}
\item log an \varAlienDetected~event
\item send notification to subscribers of alien detection events
\item with root privilege, signal the alien process to terminate
\end{itemize}
\end{itemize}
\subsubsection{Processors}
The modules comprising the Processors package assemble information for
consideration when carrying out the \varAgent~duties as well as for publication
to other interested \varDUCC~daemons. Information collected are relative to
these main categories:
\begin{itemize}
\item Linux Node Metrics
\item Linux Process Metrics
\item Node Inventory
\item Node Metrics
\item Process Lifecycle
\item Process Metrics
\end{itemize}