%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
% Create well-known link to this spot for HTML version
\ifpdf
\else
\HCode{<a name='DUCC_OVERVIEW'></a>}
\fi
\chapter{DUCC Overview}
\section{What is DUCC?}
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system
providing tooling, management, and scheduling facilities to automate the scale-out of
applications written to the UIMA framework.

Core UIMA provides a generalized framework for applications that process unstructured
information such as human language, but does not provide a scale-out mechanism. UIMA-AS provides
a scale-out mechanism to distribute UIMA pipelines over a cluster of computing resources, but
does not provide job or cluster management of the resources. DUCC defines a formal job model
that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster
management services to automate the scale-out of UIMA pipelines over computing clusters.
\section{DUCC Job Model}
The Job Model defines the steps necessary to scale up a UIMA pipeline using DUCC. The goal of
DUCC is to allow the application logic to remain unchanged.

The DUCC Job Model consists of standard UIMA components: a Collection Reader (CR), a CAS
Multiplier (CM), application logic implemented as one or more Analysis Engines (AEs), and a CAS
Consumer (CC). In principle any CR or CM will work with DUCC, but DUCC is all about scale-out;
to achieve good scale-out these components must be constructed in a specific way.

The Collection Reader builds input CASs and forwards them to the UIMA pipelines. In the DUCC
model, the CR is run in a process separate from the rest of the pipeline. In fact, in all but the
smallest clusters it is run on a different physical machine than the rest of the pipeline. To
achieve scalability, the CR must create very small CASs that do not contain application data,
but which contain references to data; for instance, file names. Ideally, the CR should be
runnable in a process not much larger than the smallest Java virtual machine. Later sections
demonstrate methods for achieving this.
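
For illustration, the following is a minimal sketch of such a reference-only
Collection Reader. Only the standard UIMA \texttt{CollectionReader\_ImplBase}
API is assumed; the class name and the \texttt{InputDirectory} parameter are
hypothetical.
\begin{verbatim}
// Illustrative reference-only CR: each CAS carries a file path, not file data.
import java.io.File;
import java.io.IOException;
import org.apache.uima.cas.CAS;
import org.apache.uima.collection.CollectionException;
import org.apache.uima.collection.CollectionReader_ImplBase;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.util.Progress;
import org.apache.uima.util.ProgressImpl;

public class FileRefReader extends CollectionReader_ImplBase {
    private File[] files;   // the work items to be distributed
    private int next = 0;

    public void initialize() throws ResourceInitializationException {
        // "InputDirectory" is a hypothetical configuration parameter.
        String dir = (String) getConfigParameterValue("InputDirectory");
        files = new File(dir).listFiles();
    }

    public void getNext(CAS cas) throws IOException, CollectionException {
        // The CAS holds only a reference (a path), so it stays very small.
        cas.setDocumentText(files[next++].getAbsolutePath());
    }

    public boolean hasNext() {
        return files != null && next < files.length;
    }

    public Progress[] getProgress() {
        return new Progress[] { new ProgressImpl(next,
            files == null ? 0 : files.length, Progress.ENTITIES) };
    }

    public void close() { }
}
\end{verbatim}
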
Each pipeline must contain at least one CAS Multiplier, which receives the CASs from the CR.
The CM encapsulates the knowledge of how to interpret the data references in the small CASs
sent by the CR and how to deliver the referenced data to the application pipeline. DUCC packages the CM,
AE(s), and CC into a single process, multiple instances of which are then deployed over the
cluster.
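
A minimal sketch of such a CM follows; it assumes only the standard UIMA
\texttt{JCasMultiplier\_ImplBase} API (the component's descriptor must also
declare that it outputs new CASs). The class pairs with the hypothetical
reader sketched above.
\begin{verbatim}
// Illustrative CM: resolves the file-path reference sent by the CR and
// emits a new CAS containing the actual document text for the AEs.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.AbstractCas;
import org.apache.uima.jcas.JCas;

public class FileRefMultiplier extends JCasMultiplier_ImplBase {
    private String pendingText;  // data fetched for the current work item

    public void process(JCas jcas) throws AnalysisEngineProcessException {
        try {
            // The incoming CAS holds only a file path; fetch the data here.
            byte[] bytes = Files.readAllBytes(Paths.get(jcas.getDocumentText()));
            pendingText = new String(bytes);
        } catch (IOException e) {
            throw new AnalysisEngineProcessException(e);
        }
    }

    public boolean hasNext() {
        return pendingText != null;
    }

    public AbstractCas next() {
        JCas out = getEmptyJCas();        // provided by the base class
        out.setDocumentText(pendingText); // full document, for the pipeline
        pendingText = null;
        return out;
    }
}
\end{verbatim}
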
DUCC does not provide any mechanism for receiving output CASs. Each application must
supply its own CAS Consumer, which serializes the output of the pipelines for
consumption by other entities (as serialized CASs, for example).
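
A minimal sketch of such a consumer, serializing each CAS as XMI with the
standard \texttt{XmiCasSerializer}, might look as follows; the class name and
the \texttt{OutputDirectory} parameter are hypothetical.
\begin{verbatim}
// Illustrative CC: writes each finished CAS to disk as an XMI file.
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.collection.CasConsumer_ImplBase;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.ResourceProcessException;

public class XmiWriterConsumer extends CasConsumer_ImplBase {
    private String outputDir;
    private int seq = 0;

    public void initialize() throws ResourceInitializationException {
        // "OutputDirectory" is a hypothetical configuration parameter.
        outputDir = (String) getConfigParameterValue("OutputDirectory");
    }

    public void processCas(CAS cas) throws ResourceProcessException {
        try (OutputStream out =
                new FileOutputStream(outputDir + "/doc-" + (seq++) + ".xmi")) {
            XmiCasSerializer.serialize(cas, out);  // standard UIMA XMI output
        } catch (Exception e) {
            throw new ResourceProcessException(e);
        }
    }
}
\end{verbatim}
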
A DUCC job therefore consists of a small specification containing the following items
(a minimal example follows the list):
\begin{itemize}
\item The name of a resource containing the CR descriptor.
\item The name of a resource containing the CM descriptor.
\item The name of a resource containing the AE descriptor.
\item The name of a resource containing the CC descriptor.
\item Other information required to parametrize the above and to identify the job,
such as the log directory, working directory, desired scale-out, and so on. These are
described in detail in subsequent sections.
\end{itemize}
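
For illustration, a minimal specification might look like the following. The
property names are those documented for the DUCC CLI; all descriptor names,
paths, and values here are examples only.
\begin{verbatim}
description           Example DUCC job
driver_descriptor_CR  org.example.FileRefReader
process_descriptor_CM org.example.FileRefMultiplier
process_descriptor_AE org.example.MyAnalysisEngine
process_descriptor_CC org.example.XmiWriterConsumer
log_directory         /home/user/ducc/logs
working_directory     /home/user/ducc/work
process_memory_size   4
process_thread_count  4
scheduling_class      normal
\end{verbatim}
Such a file would typically be submitted with the \texttt{ducc\_submit} CLI
(e.g., \texttt{ducc\_submit --specification example.job}).
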
On job submission, DUCC examines the job specification and automatically creates a scaled-out
UIMA-AS service with a single process executing the CR as a UIMA-AS client, and as many
processes as possible executing the combined CM, AE, and CC pipeline as UIMA-AS service
instances.

DUCC provides other facilities in support of scale-out:
\begin{itemize}
\item The ability to reserve all or part of a node in the cluster.
\item Automated management of services required in support of jobs.
\item The ability to schedule and execute arbitrary processes on nodes in the
cluster.
\item Debugging tools and support.
\item A web server to display and manage work and cluster status.
\item A CLI and a Java API to support the above (a brief API sketch follows this list).
\end{itemize}
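
As an illustration of the Java API, the sketch below submits a job
programmatically. The \texttt{DuccJobSubmit} class is part of the DUCC CLI
library; treat the exact constructor and method names as assumptions to be
verified against your DUCC release.
\begin{verbatim}
// Illustrative sketch of programmatic submission via the DUCC Java API.
import java.util.Properties;
import org.apache.uima.ducc.cli.DuccJobSubmit;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("driver_descriptor_CR", "org.example.FileRefReader");
        props.setProperty("process_descriptor_AE", "org.example.MyAnalysisEngine");
        props.setProperty("process_memory_size", "4");

        DuccJobSubmit submit = new DuccJobSubmit(props);
        if (submit.execute()) {            // true if the job was accepted
            System.out.println("Submitted job " + submit.getDuccId());
        }
    }
}
\end{verbatim}
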
\section{DUCC: From UIMA to Full Scale-Out}
In this section we demonstrate the progression of a simple UIMA pipeline to a fully
scaled-out job running under DUCC.
\paragraph{UIMA Pipelines}
A normal UIMA pipeline
contains a Collection Reader, one or more Analysis Engines connected in a pipeline, and a CAS
Consumer as shown in \hyperref[fig:UIMA-pipeline]{Figure~\ref{fig:UIMA-pipeline}}.
\begin{figure}[H]
\centering
% \includegraphics[bb=0 0 575 310, width=5.5in]{images/uima-pipeline.jpg}
\includegraphics[width=5.5in]{images/uima-pipeline.jpg}
\caption{Standard UIMA Pipeline}
\label{fig:UIMA-pipeline}
\end{figure}
\paragraph{UIMA-AS Scaled Pipeline}
With UIMA-AS the CR is separated into a discrete process and a CAS Multiplier is introduced
into the pipeline as an interface between the CR and the pipeline, as shown in
\hyperref[fig:UIMA-AS-pipeline]{Figure~\ref{fig:UIMA-AS-pipeline}} below.
Multiple pipelines are serviced by the
CR and are scaled-out over a computing cluster. The difficulty with this model is that each
user is individually responsible for finding and scheduling computing nodes, installing
communication software such as ActiveMQ, and generally managing the distributed job and
associated hardware.
\begin{figure}[H]
\centering
% \includegraphics[bb=0 0 584 341, width=5.5in]{images/uima-as-pipeline.jpg}
\includegraphics[width=5.5in]{images/uima-as-pipeline.jpg}
\caption{UIMA Pipeline As Scaled by UIMA-AS}
\label{fig:UIMA-AS-pipeline}
\end{figure}
\paragraph{UIMA-AS Pipeline Scaled By DUCC}
DUCC is a UIMA and UIMA-AS-aware cluster manager. To scale out work under DUCC, the developer
tells DUCC what the parts of the application are, and DUCC does the work to build the
scale-out via UIMA-AS, to find and schedule resources, to deploy the parts of the application
over the cluster, and to manage the job while it executes.

On job submission, the DUCC Command Line Interface (CLI) inspects the XML defining the analytic
and generates a UIMA-AS Deployment Descriptor (DD). The DD establishes some number of pipeline
threads per process (as indicated in the DUCC job parameters), and generates job-unique queues.
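
The generated DD is ordinary UIMA-AS deployment XML. A hand-written equivalent
might look like the following sketch; the queue name, broker URL, descriptor
name, and scale-out values are illustrative.
\begin{verbatim}
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">
  <name>example-job</name>
  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="4"/>
    <service>
      <!-- job-unique queue; broker URL is an example -->
      <inputQueue endpoint="ducc.jd.queue.12345"
                  brokerURL="tcp://broker.example.com:61616"/>
      <topDescriptor>
        <import location="CM-AE-CC-aggregate.xml"/>
      </topDescriptor>
      <analysisEngine>
        <!-- one pipeline instance per thread, per the job parameters -->
        <scaleout numberOfInstances="4"/>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
\end{verbatim}
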
Under DUCC, the Collection Reader is executed in a process called the Job Driver (or JD). The
pipelines are executed in one or more processes called Job Processes (or JPs). The JD
process provides a thin wrapper over the CR to enable communication with DUCC. The JD uses the
CR to implement a UIMA-AS client delivering CASs to the multiple (scaled-out) pipelines, as
shown in \hyperref[fig:UIMA-AS-pipeline-DUCC]{Figure~\ref{fig:UIMA-AS-pipeline-DUCC}} below.
\begin{figure}[H]
\centering
% \includegraphics[bb=0 0 571 311, width=5.5in]{images/ducc-sequential.jpg}
\includegraphics[width=5.5in]{images/ducc-sequential.jpg}
\caption{UIMA Pipeline As Automatically Scaled Out By DUCC}
\label{fig:UIMA-AS-pipeline-DUCC}
\end{figure}
\paragraph{UIMA-AS Pipeline with User-Supplied DD Scaled By DUCC}
Application programmers may supply their own Deployment Descriptors to control intra-process
threading and scale-out. If a DD is supplied in the job parameters, DUCC uses it instead of
generating one, as depicted in \hyperref[fig:UIMA-AS-pipeline-DUCC-DD]{Figure~\ref{fig:UIMA-AS-pipeline-DUCC-DD}} below.
\begin{figure}[H]
\centering
% \includegraphics[bb=0 0 571 316,width=5.5in]{images/ducc-parallel.jpg}
\includegraphics[width=5.5in]{images/ducc-parallel.jpg}
\caption{UIMA Pipeline With User-Supplied DD as Automatically Scaled Out By DUCC}
\label{fig:UIMA-AS-pipeline-DUCC-DD}
\end{figure}
\section{Error Management}
DUCC provides a number of facilities to assist error management; illustrative per-job
threshold settings follow the list:
\begin{itemize}
\item DUCC uses the UIMA-AS error-handling facilities to reflect errors from the Job Processes
to the Job Driver. The JD wrapper implements logic to enforce error thresholds, to identify
and log errors, and to reflect job problems in the DUCC Web Server. All error thresholds are
configurable both globally and on a per-job basis.
\item Error and timeout thresholds are implemented for both the initialization phase of a pipeline
and the execution phase.
\item Retry-after-error is supported: if a process has a failure on some CAS after
initialization is successful, the process is terminated and all affected CASs are retried, up to some
configurable threshold.
\item DUCC ensures that processes can successfully initialize before fully scaling out a job,
to ensure a cluster is not overwhelmed with errant processes.
\item Various error conditions encountered while a job is running will prevent the errant job
from continuing to scale out, and can result in termination of the job.
\end{itemize}
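
For illustration, thresholds such as these can be overridden per job in the job
specification. The property names below are taken from the DUCC job
documentation, but verify them against your release; the values are examples only.
\begin{verbatim}
# max process initialization failures before the job is terminated
process_initialization_failures_cap = 10
# max process failures after successful initialization
process_failures_limit = 15
# per-work-item timeout, in minutes
process_per_item_time_max = 60
\end{verbatim}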
\section{Cluster and Job Management}
DUCC supports management of multiple jobs and multiple users in a distributed cluster:
\begin{description}
\item[Multiple User Support] DUCC runs all work under the identity of the submitting user. Logs
are written with the user's credentials into the user's file space designated at job
submission.
\item[Fair-Share Scheduling] DUCC provides a Fair-Share scheduler to equitably share
resources among multiple users. The scheduler also supports semi-permanent reservation of
full or partial machines.
\item[Service Management] DUCC provides a Service Manager capable of automatically starting, stopping, and
otherwise managing and querying both UIMA-AS and non-UIMA-AS services in support of jobs.
\item[Job Lifetime Management and Orchestration] DUCC includes an Orchestrator to manage the
lifetimes of all entities in the system.
\item[Node Sharing] DUCC allocates processes from one or more users on a node, each with a specified
amount of memory. DUCC's preferred mechanism for constraining memory use is Linux
Control Groups (CGroups). For nodes that do not support CGroups, DUCC agents monitor
RAM use and kill processes that exceed their share size by more than a configurable tolerance.
\item[DUCC Agents] DUCC Agents manage each node's local resources and all
processes started by DUCC. Each node in a cluster has exactly one Agent. The Agent
\begin{itemize}
\item Monitors and reports node capabilities (memory, etc.) and performance data (CPU busy,
swap, etc.).
\item Starts, stops, and monitors all processes on behalf of users.
\item Patrols the node for ``foreign'' (non-DUCC) processes, reporting them to the
Web Server, and optionally reaping them.
\item Ensures job processes do not exceed their declared memory requirements
through the use of Linux CGroups.
\end{itemize}
\item[DUCC Web Server] DUCC provides a web server displaying all aspects of the system:
\begin{itemize}
\item All jobs in the system, their current state, resource usage, etc.
\item All reserved resources and associated information (owner, etc.),
including the ability to request and cancel reservations.
\item All services, including the ability to start, stop, and modify
service definitions.
\item All nodes in the system and their status, usage, etc.
\item The status of all DUCC management processes.
\item Access to documentation.
\end{itemize}
\item[Cluster Management Support] DUCC provides system management support (illustrative
commands follow this list) to:
\begin{itemize}
\item Start, stop, and query full DUCC systems.
\item Start, stop, and quiesce individual DUCC components.
\item Add and delete nodes from the DUCC system.
\item Discover DUCC processes (e.g., after partial failures).
\item Find and kill errant job processes belonging to individual users.
\item Monitor and display inter-DUCC messages.
\end{itemize}
\end{description}
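
For illustration, the administrative scripts shipped with DUCC include
\texttt{start\_ducc}, \texttt{stop\_ducc}, and \texttt{check\_ducc}; the exact
options vary by release, so the flags below are examples only.
\begin{verbatim}
start_ducc       # start the DUCC management processes and all agents
check_ducc       # discover and verify DUCC processes across the cluster
stop_ducc -a     # stop all DUCC components
\end{verbatim}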
\section{Security Measures}
The following DUCC security measures are provided:
\begin{description}
\item[command line interface] The CLI employs HTTP to send requests
to the DUCC controller. The CLI creates and employs public and private
security keys in the user's home directory for authentication of HTTP
requests. The controller validates requests via these same security keys.
\item[webserver] The webserver facilitates operational control, and
therefore authentication is desirable.
\begin{description}
\item[\textit{user}] Each user can control certain aspects of
his or her own active submissions.
\item[\textit{admin}] Each administrator can control certain
aspects of any user's active submissions, as well as modify some
DUCC operational characteristics.
\end{description}
A simple interface is provided so that an installation can plug in a
site-specific authentication mechanism based on userid and password;
a hypothetical sketch of such a plug-in follows this list.
\item[ActiveMQ] TBD.
\end{description}
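
The authentication plug-in point can be pictured as a small interface such as
the hypothetical sketch below; this is not DUCC's actual interface, only an
illustration of the shape a site-specific plug-in takes.
\begin{verbatim}
// Hypothetical sketch only -- not DUCC's actual plug-in interface.
// A site implementation would typically delegate to LDAP, PAM, etc.
public interface SiteAuthenticator {
    /** Return true if the userid/password pair is valid at this site. */
    boolean authenticate(String userid, String password);

    /** Return true if the authenticated user has administrator rights. */
    boolean isAdministrator(String userid);
}
\end{verbatim}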
\section{Security Issues}
The following DUCC security issues should be considered:
\begin{description}
\item[submit transmission ``sniffed''] If the transmission of the DUCC submit
command is ``sniffed'', the user authentication mechanism is compromised
and user masquerading is possible. That is, the userid encryption mechanism
can be exploited such that user A can submit a job pretending to be user B.
\item[user \textit{ducc} password compromised] If the \textit{ducc}
user password is compromised, the root-privileged command
\textbf{ducc\_ling} can be used to become any other user except root.
\item[user \textit{root} password compromised] If the
\textit{root} user password is compromised, DUCC provides no protection.
That is, compromising the root user is equivalent to compromising the DUCC
user password.
\end{description}