%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
\section{Overview.}
A DUCC service is defined by the following two criteria:
\begin{itemize}
\item A service is one or more long-running processes that await requests
and return something in response.
\item A service that is managed by DUCC is accompanied by a small program called a
``pinger'' that the DUCC Service Manager uses to gauge the availability and health of
the service. This pinger must always be present. DUCC will supply a default
pinger for UIMA-AS services if none is specified.
Users may provide their own ``pingers'' by supplying a Java class that implements
the pinger API. This is referred to as a ``custom'' pinger in this document.
There are a number of service registration options which allow
specification and parametrization of custom pingers.
\end{itemize}
The pinger API enables the following functions for custom pingers:
\begin{itemize}
\item increase and decrease the number of service instances,
\item manage failure restart policies,
\item enable and disable service autostart,
\item notify the Service Manager of the date of last use of a service,
\item notify the Service Manager of the health and availability of a service,
\item return a string for display in the DUCC web server showing relevant service information.
\end{itemize}
A service is usually a UIMA-AS service, but DUCC supports any arbitrary process as a service.
The DUCC Service Manager implements several high-level functions:
\begin{itemize}
\item Ensure services are available for jobs before allowing the jobs to start.
\item Enable fast-fail for jobs which reference services which are unavailable.
\item Start a service when it is referenced by a job, and stop it when no longer needed.
\item Optionally start a service when DUCC is booted.
\item Ensure services remain operational across failures.
\item Report service failures.
\item Run service pingers and respond to the pinger API as needed.
\end{itemize}
When work enters the system with a declared dependency on a service, one of the following
actions is taken:
\begin{itemize}
\item If the service is not registered, the work request is automatically canceled (to avoid
wasting resources on a job that is known to be unable to succeed).
\item If the service is registered but not running, the Service Manager attempts to start it; the job
remains queued until the service is started and its pinger reports good health.
\item If the service exists but cannot be started, the work remains queued and error
status is shown in the web server. Once the service is working again the
work is allowed to proceed. (Jobs already running are not directly affected, unless they
also cannot access the service.)
\item If the service processes are running but the pinger reports failure contacting the service,
the work remains queued with error status shown in the webserver. Once the service
pinger indicates the service is functional again the work is allowed to proceed.
\end{itemize}
\section{Service Types.}
\label{sec:services.types}
DUCC supports two types of services: UIMA-AS and CUSTOM:
\begin{description}
\item[UIMA-AS] This is a normal UIMA-AS service. DUCC fully supports all aspects of UIMA-AS
services with minimal effort from developers. A default pinger is supplied by DUCC
for UIMA-AS services. It is legal to define a custom pinger for a UIMA-AS service.
\item[CUSTOM] This is any arbitrary service. Developers must provide a custom pinger
and declare the pinger in the service registration.
\end{description}
DUCC also supports services that are not managed by DUCC. These are known as {\em ping-only}
services. The registration for a ping-only service contains only keywords needed to support a
pinger, which communicates with the non-DUCC service. Ping-only services must
be defined as custom services; there is no default pinger provided for ping-only services.
\section{Service Instance IDs}
\label{sec:service.service.ids}
DUCC 2.0.0 introduces support for constant service instance IDs. As a service is being
started, the SM assigns monotonically increasing IDs to each service instance, starting
with ID 0, up to the maximum number of instances started.
If an instance exits unexpectedly, the SM re-spawns it (unless a failure threshold has been
exceeded). The new instance is assigned the same instance ID as the instance it replaces.
This ensures that, for example, instance ``three'' is always restarted as instance ``three'';
the ID is maintained across failures and SM restarts.
The instance ID is communicated to the process through the environment with the key
{\tt DUCC\_SERVICE\_INSTANCE}. This key may also be used in service registrations if it
is desired to pass the instance ID via parameters of some sort. For example:
\begin{verbatim}
service_jvm_args -DSERVICE_ID=${DUCC_SERVICE_INSTANCE}
process_executable_args -i ${DUCC_SERVICE_INSTANCE}
\end{verbatim}
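On the receiving side, a Java service might read its instance ID either from the system property
set through {\em service\_jvm\_args} or directly from the environment. The following is a minimal
sketch under that assumption; the class and method names are hypothetical, and {\tt SERVICE\_ID}
is only the example property name used above.
\begin{verbatim}
// Hypothetical helper showing how a service process might pick up the
// DUCC-assigned instance ID.  DUCC_SERVICE_INSTANCE is the environment key
// described above; SERVICE_ID is only the example property name from the
// registration fragment (-DSERVICE_ID=${DUCC_SERVICE_INSTANCE}).
class InstanceId {
    static final int UNKNOWN = -1;

    // Parse an ID string; return UNKNOWN if it is missing or malformed.
    static int parse(String value) {
        if (value == null) return UNKNOWN;
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return UNKNOWN;
        }
    }

    // Prefer the system property, fall back to the environment.
    static int lookup() {
        int id = parse(System.getProperty("SERVICE_ID"));
        return (id != UNKNOWN) ? id : parse(System.getenv("DUCC_SERVICE_INSTANCE"));
    }
}
\end{verbatim}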
\section{Service References and Endpoints}
\label{sec:service.endpoints}
Services are identified by an entity called a {\em service endpoint}. Jobs and other
services use the registered service endpoint to indicate dependencies on specific
services.
A service endpoint is of the form
\begin{verbatim}
<service-type>:<unique id>
\end{verbatim}
The {\em service-type} must be either UIMA-AS or CUSTOM.
The {\em unique id} is any string needed to ensure the service is
uniquely named. For UIMA-AS services, the unique ID must be the same as the
service endpoint specified in service's DD XML descriptor. The UIMA-AS
service endpoint is always of the form:
\begin{verbatim}
queue-name:broker-url
\end{verbatim}
where {\em queue-name} is the name of the ActiveMQ queue used by the service, and {\em broker-url}
is the ActiveMQ broker URL. Sample DUCC Service endpoints:
\begin{verbatim}
UIMA-AS:WikipediaSearchServices:tcp://broker1:61616
UIMA-AS:GoogleSearchServices:http://broker2:61618
\end{verbatim}
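Because a broker URL itself contains colons, code that takes an endpoint apart must split only on
the first colon at each level. The sketch below illustrates this for the two forms described above;
the {\tt ServiceEndpoint} class is hypothetical and is not a DUCC API.
\begin{verbatim}
// Hypothetical parser for DUCC service endpoints of the form
//   <service-type>:<unique id>
// For UIMA-AS endpoints the unique id is further split into
//   queue-name:broker-url
// Only the first colon is used at each level, because the broker URL
// (e.g. tcp://broker1:61616) itself contains colons.
class ServiceEndpoint {
    final String type;      // "UIMA-AS" or "CUSTOM"
    final String uniqueId;  // everything after the first colon

    ServiceEndpoint(String type, String uniqueId) {
        this.type = type;
        this.uniqueId = uniqueId;
    }

    static ServiceEndpoint parse(String endpoint) {
        String[] parts = endpoint.split(":", 2);   // split at the first colon only
        if (parts.length != 2 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Malformed endpoint: " + endpoint);
        }
        if (!parts[0].equals("UIMA-AS") && !parts[0].equals("CUSTOM")) {
            throw new IllegalArgumentException("Unknown service type: " + parts[0]);
        }
        return new ServiceEndpoint(parts[0], parts[1]);
    }

    // For UIMA-AS: the ActiveMQ queue name (text before the next colon).
    String queueName() {
        return uniqueId.split(":", 2)[0];
    }

    // For UIMA-AS: the ActiveMQ broker URL (everything after the queue name).
    String brokerUrl() {
        String[] qb = uniqueId.split(":", 2);
        return qb.length == 2 ? qb[1] : "";
    }
}
\end{verbatim}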
Jobs or other services may register dependencies on specific services by listing one or more
service endpoints in their specifications. See the
\hyperref[sec:cli.ducc-submit]{\em job } and
\hyperref[sec:cli.ducc-services]{\em services } CLI descriptions for details.
A service is registered with DUCC using the \hyperref[sec:cli.ducc-services]{ducc\_services}
API/CLI. Service registrations are persisted by DUCC and survive DUCC and cluster restarts.
\section{Service Management Policies}
\label{sec:service.management-policy}
The Service Manager implements these policies for managing services:
\begin{description}
\item[Autostarted Services] An autostarted service is automatically started when the DUCC
system is first booted. If an instance should die, DUCC automatically restarts the
instance and continually maintains the registered number of service instances.
By default, to handle fatal errors in {\em autostarted} services, the Service Manager maintains a time
window in which only a specific number of instance failures may occur. If the number of
failures within that window of time is excessive, DUCC sets a {\em disabled} flag and
no longer restarts instances. Instances that do not fail are left running. The {\em
disabled} flag must be manually reset once the problem is resolved before new instances
can be started.
The default failure policy is implemented in the service pinger. Service
owners may redefine the default policy by supplying their own pingers for a service.
\item[Reference-started Services] A reference-started service is a registered service that
is started only when referenced by another job or service. If the service is already
started, the dependent job/service is marked ``Services Available'' and can be scheduled. If
not, the service registry is checked and if a matching enabled service is found, it is
started by DUCC. While the service is being started, jobs are held ``Waiting For Services''
to ensure the service is viable. Once the service has completed initialization and the pinger
indicates it is viable, all work waiting on it is then marked ``Services Available'' and
started.
To handle fatal errors in {\em reference-started} services, the Service Manager maintains
a time window in which only a specific number of instance failures may occur. If the
number of failures within that window of time is excessive, DUCC sets a {\em disabled}
flag and no longer restarts instances. Instances that do not fail are left running. The
{\em disabled} flag must be manually reset once the problem is resolved before new
instances can be started. This default policy may be overridden by custom pingers.
When the last job or service that references the on-demand service exits, a timer is
established to keep the service alive for a while, in anticipation that it will be needed
again soon. When the keep-alive timer expires, and there are no more dependent jobs or
services, the reference-started service is automatically stopped to free up its resources
for other work. The time the service is allowed to remain alive is known as its
{\em linger} time and can be controlled with the {\em service\_linger} keyword in the
service registration.
\item[Manually started services] A service may be started via the CLI if it is not
already running and in the absence of references by other work. A service which is
manually started by the CLI can only be stopped manually by the CLI.
As is the case for {\em autostarted} and {\em reference-started} services, failed
instances will be restarted unless the number of failures within the failure window
is exceeded and the {\em disabled} flag is set.
\item[Ping-Only Services]
\phantomsection\label{subsub:services.ping-only}
Ping-only services consist of only
a ping thread. The service itself is not managed in any way by DUCC. This is useful for
managing dependencies on services that are not under DUCC control: the pinger is used
to assess the viability of the external service and prevent dependent jobs from
continuing if the service is unavailable.
Only CUSTOM services may be defined as ping-only services in this version of DUCC.
\end{description}
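The failure-window policy used for autostarted, reference-started, and manually started services
can be sketched as a sliding window of failure timestamps. This is an illustration only, assuming a
failure limit and a window length in milliseconds; the class is hypothetical, and DUCC's own
implementation (and the units of {\em instance\_failures\_window}) may differ.
\begin{verbatim}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a failure-window policy: at most `limit` instance failures are
// tolerated within a sliding window of `windowMillis`.  Once exceeded, the
// service is flagged disabled and stays disabled until reset manually.
// Class and method names are hypothetical, not DUCC's.
class FailureWindow {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> failures = new ArrayDeque<>();
    private boolean disabled = false;

    FailureWindow(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Record a failure at time `now` (ms); returns true if the failed
    // instance may be restarted.
    boolean recordFailure(long now) {
        failures.addLast(now);
        // Drop failures that have slid out of the window.
        while (!failures.isEmpty() && now - failures.peekFirst() >= windowMillis) {
            failures.removeFirst();
        }
        if (failures.size() > limit) {
            disabled = true;   // instances that did not fail keep running
        }
        return !disabled;
    }

    boolean isDisabled() { return disabled; }

    // Models the manual "enable" issued through the CLI once the problem is fixed.
    void reset() {
        disabled = false;
        failures.clear();
    }
}
\end{verbatim}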
\paragraph{Dynamically Changing Service Policies}
A service may be {\em stopped}; that is, no instances are running. This state can occur
if the service has experienced too many errors within its failure window, in which case
the service is {\em disabled}, or because the service is not {\em autostarted} or {\em referenced} by
other work.
If a manual {\em stop} is issued the service will be automatically {\em disabled} to ensure it
cannot be restarted (by {\em reference} or at boot with {\em autostart}) without manual
intervention.
In all cases, if a service is {\em disabled}, it must be manually {\em enabled} using the CLI.
It is possible, via the CLI, to dynamically switch any service from any management policy
to any other policy, as shown in the following table.
See the \hyperref[sec:cli.ducc-services]{\em Service CLI } reference for details on the various
commands described in this section.
\begin{tabular}{| l | l | p{6cm} | p{6cm} |}
\hline
Current Mode & Desired Mode & Action & Notes \\
\hline
\hline
Autostart & Manual & Use CLI to modify registration to {\em autostart false}. & Service does not stop until requested by CLI. Service will not start at DUCC boot.\\
\hline
Autostart & Reference & Use CLI to modify registration to {\em autostart false} and {\em observe references}. & Service stops after last reference exits, plus {\em linger} time.\\
\hline
Autostart & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\
\hline
Reference & Autostart & Use CLI to modify registration to {\em autostart true}. & Service continues to run after last reference exits. Service is always started at DUCC boot. \\
\hline
Reference & Manual & Use CLI to {\em ignore references}. & Service continues to run after last reference exits. \\
\hline
Reference & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\
\hline
Manual & Autostart & Use CLI to modify registration to {\em autostart true}. & Service will be started at DUCC boot. \\
\hline
Manual & Reference & Use CLI to {\em observe references}. & Service will stop after last referencing job exits, plus {\em linger} time. \\
\hline
Manual & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\
\hline
{\em Stopped} & Autostart & Use CLI to modify registration to {\em autostart true}. & Service will start immediately. It may be necessary to {\em enable} the service as well.\\
\hline
{\em Stopped} & Reference & Submit a job or service that references the service. & It may be necessary to {\em enable} the service as well.
The service will stop after the last referencing work exits, plus {\em linger}. \\
\hline
{\em Stopped} & Manual & Use CLI to start the service. & The CLI start will also {\em enable} the service if necessary. \\
\hline
\end{tabular}
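Several rows of the table stop a service after the last reference exits, plus its {\em linger}
time. A minimal sketch of that bookkeeping, assuming times are supplied in milliseconds, follows;
{\tt LingerTracker} is hypothetical and is not the Service Manager's implementation.
\begin{verbatim}
// Sketch of the reference-started "linger" behavior: when the last
// referencing job or service exits, the service keeps running for its
// linger time and is stopped only if no new reference arrives.
// Time is passed in explicitly (ms) so the logic is easy to test.
class LingerTracker {
    private final long lingerMillis;
    private int references = 0;
    private long lastReleaseTime = -1;   // -1: no linger timer running

    LingerTracker(long lingerMillis) { this.lingerMillis = lingerMillis; }

    void addReference() {
        references++;
        lastReleaseTime = -1;            // a new reference cancels any pending stop
    }

    void removeReference(long now) {
        if (references > 0 && --references == 0) {
            lastReleaseTime = now;       // last reference gone: start the linger timer
        }
    }

    // True once there are no references and the linger time has expired.
    boolean shouldStop(long now) {
        return references == 0 && lastReleaseTime >= 0
            && now - lastReleaseTime >= lingerMillis;
    }
}
\end{verbatim}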
\section{Service Pingers}
\label{sec:service.pingers}
A service pinger is a small program that queries a service on behalf of the DUCC Service
Manager. A default pinger is provided for UIMA-AS services and provides the following
functions:
\begin{itemize}
\item Determine if the service is responsive by issuing a UIMA-AS ``get-meta'' call
to the service.
\item Determine the health of the service by issuing a JMX call to the UIMA-AS broker
to collect queueing statistics.
\item Manage the failure window of the service.
\item Return a string with basic ActiveMQ statistics about the service, or
error information if the service is deemed unusable.
\item Return the date of last use of the service (as determined by presence or
absence of service producers attached to the service queue).
\end{itemize}
Users may supply their own pingers. The following additional functions are available for
pingers. Note that a {\em custom} pinger MAY be supplied for UIMA-AS services, and
MUST be supplied for CUSTOM services. Custom pingers use the Service Manager's
``pinger'' API to perform the following tasks:
\begin{itemize}
\item Inform the Service Manager if the service is viable.
\item Inform the Service Manager if the service is ``healthy''. Service ``health''
is a heuristic used in the DUCC Web server as an alert that a service
is responding but may
not be performing well.
\item Manage service failure policies. Default failure-window policy is
provided to all pingers by the DUCC API handler (optional).
\item Return a string describing current service status, for use by the
web server.
\item Instruct the service manager to increase the number of instances (optional).
\item Instruct the service manager to decrease the number of instances (optional).
\item Enable and disable the service's autostart flag (optional).
\item Enable logging of a service's health and state (optional).
\item Return date of last-use to the Service Manager for display in the
webserver (optional).
\end{itemize}
\subsection{The Pinger API}
Pingers are passed static information about the service at pinger-initialization
time, and subsequently, current state of the service is provided on each call (ping).
Information provided at initialization follows. Most of this is
provided in fields in the {\em AServicePing} base class. See the Javadoc for
specific field names and types.
\subsubsection{Pinger Initialization Data}
Data provided once, during pinger initialization, includes:
\begin{description}
\item[Arguments] This is the {\em service\_ping\_arguments} string from the
service registration.
\item[Endpoint] This is the CUSTOM:string or UIMA-AS:string endpoint provided
in the service registration.
\item[Monitor Rate] This is the rate at which the pinger will be called by
the SM, as provided in DUCC's configuration.
\item[Service ID] This is the \hyperref[sec:service.service.ids]{unique numeric service ID} assigned to the service
by DUCC.
\item[Log Enabled] Whether the service log is enabled, as specified by the
{\em service\_ping\_dolog} registration parameter.
\item[Maximum Allowed Failures] This is the value of the {\em instance\_failures\_limit}
parameter, provided by DUCC configuration and optionally overridden by the
service registration.
\item[Instance Failure Window] This is the value of the {\em instance\_failures\_window}
parameter, provided by DUCC configuration and optionally overridden by the
service registration.
\item[Autostart Enabled] This indicates whether the service registration currently
has the {\em autostart} flag enabled.
\item[Last Use] This is the time of last known use of the service, persisted and
maintained over SM restarts. It is 0 if unknown or the service has never been
used.
\end{description}
\subsubsection{Pinger Dynamic Data}
Dynamic information provided to the pinger in each call (ping) consists of:
\begin{description}
\item[All Instance Information] This is an array consisting of the unique integer
IDs of all running processes implementing the service. This includes instances
which may not be currently viable for some reason (still initializing, for example).
\item[Active Instance Information] This is an array consisting of the unique integer
IDs of the service processes that have advanced to the Running state. This is a subset of
``All Instance Information''.
\item[Reference Information] This is an array consisting of the unique integer
IDs of all DUCC work (Jobs, other Services, etc.) currently referencing the
service.
\item[Autostart Enabled] The current state of the service's autostart flag.
\item[Run Failures] This is the total number of instance failures for the
service since the last start of the SM.
\end{description}
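As an example of combining the dynamic data, a pinger can find the instances that are running but
not yet viable by subtracting the active IDs from all IDs. The sketch assumes the IDs arrive as
{\tt long[]} arrays, which is an assumption; consult the {\em AServicePing} Javadoc for the actual
field names and types.
\begin{verbatim}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: instances that are running but not yet in Running state
// (for example, still initializing), in the order they were supplied.
class InstanceInfo {
    static List<Long> notYetActive(long[] allInstances, long[] activeInstances) {
        Set<Long> active = new HashSet<>();
        for (long id : activeInstances) active.add(id);
        List<Long> pending = new ArrayList<>();
        for (long id : allInstances) {
            if (!active.contains(id)) pending.add(id);
        }
        return pending;
    }
}
\end{verbatim}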
Only a Java API is supported.
\subsection{Declaring a Pinger in a Service}
The following registration options are used for declaring and configuring pingers. Any of these
may be dynamically modified with the service CLI's {\em$--$modify} option. Dynamically changing
these causes the current pinger to be terminated and restarted with the new configuration. See
\hyperref[sec:cli.ducc-services]{ducc\_services} for details of the options:
\begin{itemize}
\item service\_ping\_arguments
\item service\_ping\_class
\item service\_ping\_classpath
\item service\_ping\_jvmargs
\item service\_ping\_timeout
\item service\_ping\_dolog
\item instance\_failures\_window
\item instance\_failures\_limit
\end{itemize}
\subsection{Implementing a Pinger}
Pingers must extend the abstract class {\tt org.apache.uima.ducc.cli.AServicePing}. See the
Javadoc for the details of this class.
Below is a sample CUSTOM pinger for a hypothetical service that returns four integers in
response to a ping. It illustrates simple use of the three required methods, {\em init()},
{\em stop()}, and {\em getStatistics()}.
\begin{figure}[H]
\begin{verbatim}
import java.io.DataInputStream;
import java.net.Socket;

import org.apache.uima.ducc.cli.AServicePing;
import org.apache.uima.ducc.cli.ServiceStatistics;

public class CustomPing
    extends AServicePing
{
    String host;
    String port;

    public void init(String args, String endpoint) throws Exception {
        // Parse the service endpoint, which is a String of the form
        //    CUSTOM:host:port
        String[] parts = endpoint.split(":");
        host = parts[1];
        port = parts[2];
    }

    public void stop() { }

    // The hypothetical service sends little-endian longs; swap the bytes
    // into Java's big-endian order.
    private long readLong(DataInputStream dis) throws Exception {
        return Long.reverseBytes(dis.readLong());
    }

    public ServiceStatistics getStatistics() {
        // Contact the service, interpret the results, and return a state
        // object for the service.  Start as not alive and not healthy.
        ServiceStatistics stats = new ServiceStatistics(false, false, "<NA>");
        // try-with-resources closes the socket after every ping.
        try ( Socket sock = new Socket(host, Integer.parseInt(port)) ) {
            DataInputStream dis = new DataInputStream(sock.getInputStream());
            long stat1 = readLong(dis);  long stat2 = readLong(dis);
            long stat3 = readLong(dis);  long stat4 = readLong(dis);
            stats.setAlive(true);  stats.setHealthy(true);
            stats.setInfo( "S1[" + stat1 + "] S2[" + stat2 +
                           "] S3[" + stat3 + "] S4[" + stat4 + "]" );
        } catch ( Throwable t ) {
            t.printStackTrace();
            stats.setInfo(t.getMessage());
        }
        return stats;
    }
}
\end{verbatim}
\caption{Sample CUSTOM Service Pinger}
\label{fig:service.custom.pinger}
\end{figure}
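Before registering the pinger, it can help to test it against a throwaway stand-in for the service.
The hypothetical {\tt FakeStatsService} below is not part of DUCC: it listens on a free local port
and writes four longs in little-endian byte order, the format assumed by the pinger's
{\tt readLong()}, and {\tt readStats()} decodes them the same way the pinger does.
\begin{verbatim}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical stand-in for the service in the figure: it accepts one
// connection and writes its longs in little-endian order.
class FakeStatsService implements Runnable {
    private final ServerSocket server;
    private final long[] stats;

    FakeStatsService(long... stats) throws IOException {
        this.server = new ServerSocket(0);   // 0 = pick any free port
        this.stats = stats;
    }

    int port() { return server.getLocalPort(); }

    public void run() {
        try (ServerSocket s = server;
             Socket client = s.accept();
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            for (long v : stats) {
                // writeLong is big-endian, so swapping first yields little-endian bytes
                out.writeLong(Long.reverseBytes(v));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Same decoding as the pinger: read four little-endian longs.
    static long[] readStats(String host, int port) throws IOException {
        try (Socket sock = new Socket(host, port);
             DataInputStream dis = new DataInputStream(sock.getInputStream())) {
            long[] result = new long[4];
            for (int i = 0; i < 4; i++) {
                result[i] = Long.reverseBytes(dis.readLong());
            }
            return result;
        }
    }
}
\end{verbatim}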
\subsection{Building And Testing Your Pinger}
This section provides the information needed to use the pinger API and build a
custom pinger.
\paragraph{1. Establish a compilation CLASSPATH} One DUCC jar is required in the CLASSPATH to build your pinger:
\begin{verbatim}
DUCC_HOME/lib/uima-ducc-cli.jar
\end{verbatim}
This provides the definition for the {\em AServicePing} and {\em ServiceStatistics} classes.
\paragraph{2. Create a registration}Next, create a service registration for the pinger. While
debugging, it is useful to set the directive
\begin{verbatim}
service_ping_dolog = true
\end{verbatim}
This will log any output from {\tt System.out.println()} to the declared log directory
for the service. If not specified in the registration, this directory is:
\begin{verbatim}
$HOME/ducc/logs/S-<serviceid>/services
\end{verbatim}
where {\tt$<$serviceid$>$} is the DUCC-assigned ID of your service.
Once the pinger is debugged you may want to turn logging off.
\begin{verbatim}
service_ping_dolog = false
\end{verbatim}
If your pinger requires a different version of Java than is used by DUCC, include a
setting for the JAVA\_HOME variable in the environment option.
A sample service registration may look something like the following. Note that you do not need
to include any of the DUCC jars in the classpath for the pinger. DUCC will add the jars it
requires to interact with the pinger automatically. (However you may need other jars to
provide UIMA, UIMA-AS, ActiveMQ, Spring, or other function.)
\begin{verbatim}
bash-3.2$ cat myping.svc
description = Ping-only service
service_request_endpoint = CUSTOM:localhost:7175
service_ping_class = CustomPing
service_ping_classpath = /myhome
service_ping_dolog = true
service_ping_timeout = 500
service_ping_arguments = Arg1 Arg2
service_ping_jvmargs = -Xmx50M
environment = JAVA_HOME=/share/jdk1.8 OTHER_VARIABLE=something
\end{verbatim}
\paragraph{3. Register and start the service and pinger} Register your custom service with a
registration containing lines similar to those above, then start it. As soon as the service
instance is in DUCC state {\em Running}, the SM starts the pinger.
Check the web server to make sure the service ``comes alive''. Check your pinger's
debugging log if it doesn't. Once registered, you can dynamically modify and restart the pinger at any time without
re-registering the service or restarting the service by use of the {\tt $--$modify} option of the
\hyperref[sec:cli.ducc-services]{\em ducc\_services CLI:}
\begin{verbatim}
ducc_services --modify <serviceid> --service_ping_dolog true
ducc_services --modify <serviceid> --service_ping_class OtherCustomPing \
              --service_ping_classpath /myhome
\end{verbatim}
where $<$serviceid$>$ is the ID returned when you registered the service.
\paragraph{4. If all else fails ...}
If your pinger does not work and you cannot determine the reason, be sure you enable {\em service\_ping\_dolog} and
look in your log directory, as most problems with pingers are reflected there. As a last resort, you can
inspect the Service Manager's log in
\begin{verbatim}
$DUCC_HOME/logs/sm.log
\end{verbatim}
\subsection{Globally Registered Pingers}
\label{subsec:services.pingers}
A user-built pinger may be registered with DUCC so that it can be globally used by any DUCC service. To do
this, a registration file containing only pinger-specific parameters is created in DUCC's run-time
directory. Such a pinger may then be designated for a service by using its registered filename
instead of its class in the {\em service\_ping\_class} field of a registration. There is no API or
CLI to register such a pinger; only a DUCC administrator may create a global ping registration.
A globally-registered pinger may then be designated to run as a thread inside the SM or as a
process spawned and managed by the SM. A pinger that runs in a thread in the SM is
called an {\em internal} pinger, and one that runs in a process is called an {\em external}
pinger. An {\em internal} pinger generally has nearly unmeasurable impact on the system,
whereas {\em external} pingers will occupy full JVMs with processes of 50-100MB or more.
A service may override any of the options of a globally-registered {\em external} pinger,
thus allowing significant reuse of existing code. Only the {\em service\_ping\_arguments}
of an {\em internal} pinger may be overridden however.
The default UIMA-AS pinger is permanently registered as an {\em internal} pinger.
Globally registered pingers use a special boolean property, not supported by the
{\em ducc\_services} API/CLI, ``internal'', to determine whether the pinger is
to be run internally to SM or as an external process. Only the DUCC administrator
may update a global pinger's registration to ``internal'', to ensure such pingers
are properly vetted and approved by the installation.
More details on registering global pingers are found in the
\hyperref[chap:sm]{\em Administration section} of this document.
\section{Sample Pinger}
A sample custom UIMA-AS pinger is provided in the Examples directory shipped
with DUCC in
\begin{verbatim}
DUCC_HOME/examples/src/org/apache/uima/ducc/ping
\end{verbatim}
This pinger increases or decreases the number of service instances based
on the queue statistics found by querying ActiveMQ. The goal of this
pinger is to keep the ActiveMQ ``enqueued time'' no greater than
some multiple of the average service time for a single item. The factor
used is a parameter passed in with the argument string.
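A rough illustration of this goal, not the sample pinger's actual logic, compares the observed
enqueue time with {\em goal} times the average service time. The names, the assumption that both
times share the same units, and the shrink threshold are all hypothetical.
\begin{verbatim}
// Sketch of the goal test: keep the enqueued time at or below
// goal * (average service time for one item).  Returns +1 to suggest
// growing, -1 to suggest shrinking, 0 to hold.  Shrinking only below half
// the limit is an illustrative hysteresis choice, not DUCC's.
class GoalCheck {
    static int suggest(double avgEnqueueTime, double avgServiceTime, double goal) {
        if (avgServiceTime <= 0) return 0;           // no measurements yet
        double limit = goal * avgServiceTime;
        if (avgEnqueueTime > limit) return +1;       // items wait too long: grow
        if (avgEnqueueTime < limit / 2) return -1;   // well under the goal: may shrink
        return 0;
    }
}
\end{verbatim}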
\subsection{Using the Sample Pinger}
The following arguments may be specified to use the sample pinger with any UIMA-AS service. The
{\em service\_ping\_arguments} are specific to this pinger.
\begin{verbatim}
service_ping_class=org.apache.uima.ducc.ping.SamplePing
service_ping_arguments=meta-timeout=15010,broker-jmx-port=1099,window=5,min=1,
max=20,max-growth=3,fast-shrink=true,goal=2.5
service_ping_classpath = ${DUCC_HOME}/lib/uima-ducc/examples/*:
${DUCC_HOME}/apache-uima/lib/*:
${DUCC_HOME}/apache-uima/apache-activemq/lib/*:
${DUCC_HOME}/lib/springframework/*
service_ping_dolog=True
service_ping_timeout=10000
instance_failures_window = ${ducc.sm.instance.failure.window}
instance_failures_limit = ${ducc.sm.instance.failure.max}
\end{verbatim}
The full source for the sample pinger is found in
\begin{verbatim}
DUCC_HOME/examples/src/org/apache/uima/ducc/ping/SamplePing.java
\end{verbatim}
The following arguments are accepted by this pinger and may be specified in a single
comma-delimited string containing the following initialization parameters:
\begin{description}
\item[meta-timeout] Defines how long to wait for {\em get\_meta} to return.
\item[broker-jmx-port] Defines the JMX port of the service's broker.
\item[window] Defines the shrinkage/growth window size, in minutes.
\item[enable-log] Enable extra logging.
\item[min] The minimum number of service instances to maintain.
\item[max] The maximum number of service instances to allow.
\item[max-growth] The maximum number of instances to grow in a
single request.
\item[fast-shrink] If set, allow services to shrink if the
queue depth is 0, even if consumers are connected. Otherwise
we do not shrink if consumers are attached to the queue.
\item[goal] The multiple of the average service time used as the target for the
ActiveMQ Broker's {\em average enqueue} time, which the pinger attempts to maintain by
managing the number of instances.
\end{description}
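The argument string can be parsed as comma-delimited {\tt key=value} pairs. The sketch below is
hypothetical, not code from {\tt SamplePing.java}; in particular, treating a bare key such as
{\em enable-log} as {\tt true} is an assumption.
\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for a comma-delimited service_ping_arguments string
// such as "meta-timeout=15010,broker-jmx-port=1099,window=5".  A key given
// without "=value" (e.g. "enable-log") is treated as the flag value "true".
class PingArgs {
    static Map<String, String> parse(String args) {
        Map<String, String> result = new HashMap<>();
        if (args == null) return result;
        for (String item : args.split(",")) {
            item = item.trim();
            if (item.isEmpty()) continue;
            int eq = item.indexOf('=');
            if (eq < 0) {
                result.put(item, "true");
            } else {
                result.put(item.substring(0, eq).trim(), item.substring(eq + 1).trim());
            }
        }
        return result;
    }

    // Integer-valued argument, or `dflt` if the key is absent.
    static int getInt(Map<String, String> args, String key, int dflt) {
        String v = args.get(key);
        return (v == null) ? dflt : Integer.parseInt(v);
    }
}
\end{verbatim}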
\subsection{Understanding the Sample Pinger}
The best way to understand this pinger is to examine the code itself in the
Examples directory. Here we provide a brief line-by-line synopsis of the code.
\paragraph{void init(String args, String ep)}
This required method examines the service arguments and endpoint and establishes a monitor
to issue {\em get-meta} calls to the service and {\em JMX} calls to the
ActiveMQ broker. The argument string {\em args} is described above. The
endpoint {\em ep} is the service endpoint used to register the service.
\paragraph{Lines 100-119}
These lines parse the endpoint {\em ep} into its components, the
UIMA-AS queue name and the URL to the service broker.
\paragraph{Lines 121-125}
These lines disable most UIMA-AS logging as these messages can be quite
numerous. However, during debugging it may be desired to change the logging
levels here.
\paragraph{Lines 130-172}
These lines parse the service argument string {\em args} into its constituent
parts and place the values in variables. They initialize the expansion
and deletion window and normalize it to one slot per minute, regardless of
the actual ping rate.
The window normalization uses the DUCC-supplied value {\em monitor\_rate}
to determine the number of slots in the windows.
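The normalization can be pictured as a ring buffer with one slot per minute. Assuming the monitor
rate is the interval between pings in milliseconds (check {\em ducc.properties} for the actual
units), each slot absorbs {\tt 60000 / monitor\_rate} pings. {\tt MinuteWindow} is a hypothetical
illustration, not the sample pinger's code.
\begin{verbatim}
// Ring buffer with one slot per minute, fed once per ping.
// pingsPerSlot = 60000 / monitorRateMillis (at least 1), so the window
// spans `windowMinutes` minutes whatever the ping rate is.
class MinuteWindow {
    private final double[] slots;
    private final int pingsPerSlot;
    private int slot = 0;          // current slot index
    private int pingsInSlot = 0;   // pings accumulated in the current slot
    private double sum = 0;        // running sum for the current slot

    MinuteWindow(int windowMinutes, long monitorRateMillis) {
        slots = new double[windowMinutes];
        pingsPerSlot = (int) Math.max(1, 60000 / monitorRateMillis);
    }

    int pingsPerSlot() { return pingsPerSlot; }

    // Add one sample; when a minute's worth of pings is collected, store
    // their average in the current slot and advance to the next slot.
    void addSample(double value) {
        sum += value;
        if (++pingsInSlot == pingsPerSlot) {
            slots[slot] = sum / pingsPerSlot;
            slot = (slot + 1) % slots.length;
            sum = 0;
            pingsInSlot = 0;
        }
    }

    // Average over all slots (slots not yet filled count as 0).
    double windowAverage() {
        double total = 0;
        for (double s : slots) total += s;
        return total / slots.length;
    }
}
\end{verbatim}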
\paragraph{Lines 176-177}
These lines initialize the DUCC-supplied {\em UimaAsServiceMonitor} that
queries the UIMA-AS queues, and reset the queue statistics via JMX so the
monitor can make accurate measurements.
\paragraph{Lines 181-187}
These lines implement the required {\em stop} method which is invoked when
the Service Manager needs to stop the pinger for any reason. They stop the
ActiveMQ queue monitor and emit a shutdown message.
\paragraph{Lines 191-240}
These lines define the required {\em getStatistics} method. This
method collects ActiveMQ statistics, issues {\em get-meta} to the
service to see if it is responding, sets the formatted information
string into the ping reply, and invokes the code to calculate a
potential redeployment of service instances.
\paragraph{Lines 245-248}
These lines override the optional {\em getLastUse} method which
simply returns the time of last known use of the service. The actual
value is calculated in the pinger-specific {\em calculateNewDeployment}
method, described below.
\paragraph{Lines 253-298}
These lines define the pinger-specific {\em calculateNewDeployment}
method. This is invoked after {\em get-meta} is called and after the
UIMA-AS queue has been queried in ActiveMQ. This is the key method of
this pinger. It uses information passed in on the last ping from the
Service Manager in conjunction with information in the ActiveMQ queue
to determine if more or fewer service instances are needed to meet the
performance goals. If fewer instances are needed, it selects specific
instances to stop. The method is
\hyperref[subsec:services.calculate-new]{\em described in detail} below.
\paragraph{Lines 407-410}
These lines override the optional {\em getAdditions} method. The method
returns the number of new service instances required to meet performance
goals, as calculated in
\hyperref[subsec:services.calculate-new]{\em calculateNewDeployment}.
Regardless of what this method returns, the Service Manager may choose
not to start new instances, based on its configured maximum,
{\em ducc.sm.max.instances} as defined in {\em ducc.properties}.
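Bounding a growth request by the pinger's {\em max} and {\em max-growth} arguments might look like
the following sketch; the names are hypothetical. The Service Manager's own
{\em ducc.sm.max.instances} limit is applied by the SM afterwards and is not modeled here.
\begin{verbatim}
// Sketch of bounding a growth request.  `needed` is what the deployment
// calculation wants; the result never exceeds the per-request max-growth,
// never pushes the total above the pinger's `max` argument, and is never
// negative.
class Growth {
    static int additions(int needed, int current, int max, int maxGrowth) {
        int room = Math.max(0, max - current);   // head-room under `max`
        return Math.max(0, Math.min(needed, Math.min(maxGrowth, room)));
    }
}
\end{verbatim}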
\paragraph{Lines 416-419}
These lines override the optional {\em getDeletions} method. This
method returns the specific service instances to be stopped, if any.
The DUCC-assigned unique IDs of all service instances are passed in to
the pinger on each ping. These IDs increase monotonically over time,
so pingers may assume that lower numbers represent older
instances.
\paragraph{Lines 429-480}
These lines define a class used as a call-back on the UIMA-AS
{\em get-meta} requests to determine the host and PID of the
service instance responding to the {\em get-meta}. If the
{\em get-meta} request time out, this information can be used to
help identify ailing or overloaded service instances.
\subsection{Calculating New Deployments in the Pinger}
\label{subsec:services.calculate-new}
This section details the use of ActiveMQ queue statistics
in conjunction with the Service Manager data to calculate the number
of service instances to add or remove.
It is important that this code be very careful about ``smoothing'' the
performance statistics to keep growth and shrinkage stable. Things
to take into consideration include:
\begin{enumerate}
\item Immediately after a new service instance becomes available to
serve, if there is demand for this service, the ActiveMQ statistics
will fluctuate for a few minutes until traffic stabilizes. Thus
decisions based on these statistics must reflect history as well as
current information.
\item Immediately after a client begins to use a service, the statistics
will also fluctuate, again requiring smoothing.
\item The DUCC work dispatching model will not over-dispatch work to the
job processes. Thus actual demand on a service is a function of the
number of actively deployed and initialized JPs. If the number of
JPs decreases due to preemption, demand on the service by that job
will decrease proportionally. Similarly, demand can increase as the
job expands.
It is common for demand on a service to ramp up slowly as
a job enters the system, and then to increase rapidly as the job completes its
initialization phase and its number of processes starts to double. Thus, the ActiveMQ statistics
can be quite erratic for a while, until the job stabilizes.
This again requires some sort of smoothing of the data when making
decisions about service growth and shrinkage.
\end{enumerate}
To handle this data smoothing, the SamplePing class uses two time-based {\em windows}, one for
growth and one for shrinkage. The window size is defined by the service ping argument {\em window}.
Each period, if more service instances are needed, a mark is made in the current slot of the
{\em expansion window}; otherwise the current slot is cleared. Similarly, each period, if fewer
instances are needed, a mark is made in the current slot of the {\em shrinkage window}; otherwise
the current slot is cleared.
After the marks are made, if all slots of the {\em expansion window} are filled,
a request for new processes is made; thus, a short period of increased demand does not
destabilize the system with a request for services that may be of little use.
Additionally, when a request is made, the number of new processes requested is
capped by the ping argument {\em max-growth} to ensure that the service
grows smoothly. Finally, if the service is already at its configured maximum
number of instances, defined by the {\em max} parameter, no additional instances
are requested.
Similarly, the {\em shrinkage window} is used to govern shrinkage. All slots must be
filled, indicating the service has been over-provisioned for a while, before a request
is made to delete instances. The number of instances is never reduced below the
configured {\em min} value. As well, this particular pinger never shrinks by more than
a single instance at a time, on the reasoning that it is more costly to start a new
service instance than to keep an extra one running for too long. Only if the extra
instances see no long-term use (as determined by the window) are they removed.
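The window mechanism above can be sketched in Java as follows. This is a minimal illustration, not the SamplePing code: the class {\em SmoothingWindow} and its methods are hypothetical names, and the sketch assumes one slot is marked or cleared per ping period, with slots reused in round-robin order.

\begin{verbatim}
// Hypothetical sketch of an expansion or shrinkage window; the names are
// illustrative and do not come from the SamplePing source.
class SmoothingWindow {
    private final boolean[] slots;  // one slot per ping period
    private int current = 0;        // slot to update on the next period

    SmoothingWindow(int size) {
        slots = new boolean[size];
    }

    // The condition (growth or shrinkage needed) held this period.
    void mark()  { slots[current] = true;  advance(); }

    // The condition did not hold this period.
    void clear() { slots[current] = false; advance(); }

    private void advance() { current = (current + 1) % slots.length; }

    // True only if the condition held in each of the last slots.length periods.
    boolean isFull() {
        for (boolean s : slots) {
            if (!s) return false;
        }
        return true;
    }

    // Growth request capped by max-growth and by the room left under max.
    static int cappedGrowth(int wanted, int instances, int maxGrowth, int max) {
        int room = Math.max(0, max - instances);
        return Math.max(0, Math.min(wanted, Math.min(maxGrowth, room)));
    }

    // Shrink by at most one instance, and never below min.
    static int cappedShrink(int instances, int min) {
        return instances > min ? 1 : 0;
    }
}
\end{verbatim}

With a window of size 3, three consecutive calls to {\em mark} make {\em isFull} return true, and a single {\em clear} afterward makes it false again, so one quiet period is enough to postpone a request.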
Given this introduction, we describe the key method in detail.
\paragraph{Lines 262-277}
These lines extract four quantities from the ActiveMQ statistics:
\begin{enumerate}
\item Average enqueue time, {\em eT}
\item Current queue depth, {\em Q}
\item The current number of service consumers, {\em cc}
\item The current number of service producers, {\em pc}
\end{enumerate}
The code then gets the DUCC IDs of all the currently started service
instances, and the number of instances that are started but still in
their ``initialization'' phase. This is important because instances that
are still initializing are not servicing the queue, but will soon start
to do so. The current ActiveMQ statistics do NOT yet reflect this, however;
they reflect only the instances that are actually serving.
Finally, if there are service producers, we note the time of day to
return to the SM as the last known use of this service by some process.
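For illustration, the per-ping inputs described above might be held as follows. This is a hypothetical Java sketch: the class {\em PingSnapshot} and its methods are not part of the DUCC API, and the field names simply follow the symbols used in this section.

\begin{verbatim}
// Hypothetical holder for the values gathered on each ping.
class PingSnapshot {
    final double eT;          // average enqueue time (ActiveMQ)
    final long Q;             // current queue depth (ActiveMQ)
    final int cc;             // consumer count (ActiveMQ)
    final int pc;             // producer count (ActiveMQ)
    final long[] ids;         // DUCC IDs of started instances (SM)
    final int initializing;   // started but still initializing (SM)

    PingSnapshot(double eT, long Q, int cc, int pc,
                 long[] ids, int initializing) {
        this.eT = eT;
        this.Q = Q;
        this.cc = cc;
        this.pc = pc;
        this.ids = ids;
        this.initializing = initializing;
    }

    // Instances actually pulling work from the queue.
    int serving() {
        return ids.length - initializing;
    }

    // Last-use bookkeeping: any connected producer counts as use "now".
    static long lastUse(long previous, int pc, long now) {
        return pc > 0 ? now : previous;
    }
}
\end{verbatim}

The {\em serving} count matters because only instances that have finished initializing are reflected in the ActiveMQ statistics.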
\paragraph{Line 267}
This line calculates the number of Java threads per service instance, needed to calculate the
maximum capacity of the service in its current deployment.
(Note that in each UIMA-AS service, UIMA-AS itself occupies one thread, used to
manage the service, and this thread manifests itself as a consumer
on the queue.)
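One way to compute the thread count is sketched below, under the assumption that each serving instance contributes one queue consumer per processing thread plus one for the UIMA-AS management thread. The class and method names are hypothetical, and the actual pinger may derive the value differently.

\begin{verbatim}
// Hypothetical sketch: assumes consumers = active * (nthreads + 1).
class ThreadEstimate {
    static int threadsPerInstance(int cc, int active) {
        if (active <= 0) return 0;            // nothing is serving yet
        return Math.max(0, cc / active - 1);  // drop the management consumer
    }
}
\end{verbatim}

For example, 12 consumers across 3 serving instances gives $12/3 - 1 = 3$ processing threads per instance.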
\paragraph{Line 301}
This declares {\em new\_ni}, the number of additional instances, if any.
At the end of this method, new\_ni will either be 0 or $>$0.
\paragraph{Lines 303-312}
If the current queue depth is 0 (Q $==$ 0), we know a number of things:
\begin{enumerate}
\item The service is not under-provisioned; there is no work queued and
waiting for some service. We therefore do not need to expand.
\item If there are no consumers, i.e. no clients that need work done,
we are potentially over-provisioned, so we fill in a slot in the
shrinkage window.
If there {\em are} consumers, we may not want to
shrink because it is possible that one of the service instances is
busy; we cannot tell. So we allow the {\em fast-shrink}
ping argument to govern whether or not connected consumers may
prevent service shrinkage.
\end{enumerate}
There is nothing else that can be said about a service if its
current queue depth is 0.
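The empty-queue rule reduces to a small predicate. In this hypothetical Java sketch, {\em fastShrink} stands for the {\em fast-shrink} ping argument, read as allowing shrinkage even while consumers are connected; the expansion slot is assumed to be cleared unconditionally whenever $Q = 0$.

\begin{verbatim}
// Hypothetical sketch of the Q == 0 case; names are illustrative.
class EmptyQueueRule {
    // With an empty queue no work is waiting, so the expansion slot is
    // always cleared; this method decides only the shrinkage slot.
    // Mark for shrinkage if no consumers are connected, or if
    // fast-shrink says connected consumers should not block shrinking.
    static boolean markShrink(int consumers, boolean fastShrink) {
        return consumers == 0 || fastShrink;
    }
}
\end{verbatim}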
\paragraph{Lines 312-360}
If the queue depth is non-zero we are able to calculate the total
service capacity and the amount each instance contributes to the
total capacity. From this we can determine
\begin{enumerate}
\item whether the service is performing at or near its goal,
\item if the service is performing worse than its goal, how many
new instances are needed to meet the goal, and
\item if the service is performing better than its goal, how many
instances can be given up and still meet the goal.
\end{enumerate}
Details follow.
\paragraph{Lines 314 and 315}
The average time a single instance takes to serve a single request, {\em Ti}, is given
by the simple formula
\begin{verbatim}
Ti = (eT / Q) * active
\end{verbatim}
where
\begin{description}
\item[eT] is the average time an item stays in queue (from AMQ),
\item[Q] is the current queue depth (from AMQ),
\item[active] is the current number of service instances (from SM)
\end{description}
Because each instance runs {\em nthreads} threads in parallel, the time taken by a single thread, {\em Tt}, is given by
\begin{verbatim}
Tt = Ti * nthreads
\end{verbatim}
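The two formulas translate directly into code. The Java sketch below is illustrative and its class name is hypothetical. One reading of the first formula, noted in the comments, is Little's law: the queue drains at about $Q/eT$ requests per unit time, shared among {\em active} instances.

\begin{verbatim}
// Hypothetical sketch of the Ti and Tt formulas.
class ServiceTimes {
    // Ti = (eT / Q) * active. The queue drains at about Q / eT requests
    // per unit time; each of the active instances handles 1/active of that.
    static double perInstance(double eT, long Q, int active) {
        return (eT / Q) * active;
    }

    // Tt = Ti * nthreads: nthreads requests are in flight per instance.
    static double perThread(double Ti, int nthreads) {
        return Ti * nthreads;
    }
}
\end{verbatim}

For example, with $eT = 2000$ ms, $Q = 100$, {\em active} $= 4$, and {\em nthreads} $= 4$, we get $Ti = 80$ ms and $Tt = 320$ ms.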
\paragraph{Lines 319 and 320}
We want the average time an item stays in queue, {\em eT}, to approach
\begin{verbatim}
g = Tt * goal
\end{verbatim}
where {\em goal} is given by the ping arguments. The
current ratio of actual queueing time to desired queueing time is then given by
\begin{verbatim}
r = eT / g
\end{verbatim}
Because the DUCC job driver never over-commits, the current demand remains
constant unless the jobs using the service expand or contract (relatively
rare events). We can therefore state that the number of service instances
required is directly proportional to {\em r}.
If $r > 1$, we may need more instances to meet our {\em goal}; if
$r < 1$, we may be over-provisioned.
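Continuing the example, the following hypothetical Java sketch computes the target and the ratio, assuming the target queueing time is {\em g} $=$ {\em Tt} $\times$ {\em goal}, which is how the formula for {\em r} reads in context.

\begin{verbatim}
// Hypothetical sketch of the goal ratio; names are illustrative.
class GoalRatio {
    // g = Tt * goal: desired time in queue, as a multiple of Tt.
    static double target(double Tt, double goal) {
        return Tt * goal;
    }

    // r = eT / g: above 1, requests wait longer than desired.
    static double ratio(double eT, double g) {
        return eT / g;
    }
}
\end{verbatim}

With $Tt = 320$ ms and {\em goal} $= 5$, $g = 1600$ ms; an observed $eT = 2000$ ms then gives $r = 1.25$, suggesting about 25\% more capacity is needed.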
\paragraph{Lines 325-347}
If $r > 1$, we may be under-provisioned. We calculate the number of required
instances by multiplying the current number of instances by {\em r} and rounding down.
We account for instances that we know are starting but not yet started,
cap the result at the maximum number of instances per service, and cap it again
at the maximum growth per cycle.
If we still require additions, we make a mark in the expansion window;
otherwise we clear the current slot of the expansion window.
\paragraph{Lines 349-360}
If $r < 1$, we need to calculate shrinkage. Because starting instances
is expensive, we conservatively require $r < 0.5$ before making a mark
in the shrinkage window.
Otherwise we clear the current slot of the shrinkage window.
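The two branches can be sketched as below. This is a hypothetical Java illustration, not the pinger's code: it assumes {\em active} counts serving instances, {\em starting} counts instances still initializing, and {\em max} and {\em maxGrowth} stand for the {\em max} and {\em max-growth} ping arguments.

\begin{verbatim}
// Hypothetical sketch of the r > 1 and r < 1 branches.
class Provisioning {
    // r > 1: new instances to request this period (0 if none).
    static int additions(int active, int starting, double r,
                         int max, int maxGrowth) {
        int required = (int) Math.floor(active * r); // total needed at rate r
        int wanted = required - active - starting;   // less serving and starting
        int room = max - (active + starting);        // headroom under max
        return Math.max(0, Math.min(wanted, Math.min(room, maxGrowth)));
    }

    // r < 1: only r < 0.5 counts as over-provisioned.
    static boolean markShrink(double r) {
        return r < 0.5;
    }
}
\end{verbatim}

For instance, with 4 active instances, none starting, and $r = 1.25$, {\em additions} returns $\lfloor 4 \times 1.25 \rfloor - 4 = 1$; with $r = 3.0$ the raw demand of 8 new instances is capped to 2 by a {\em maxGrowth} of 2.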
\paragraph{Lines 367-396}
Finally we sum across the shrinkage and expansion windows. If either
window is full, we schedule growth (line 375, set the variable {\em additions})
or shrinkage (line 388, set {\em deletions}).
Note that to schedule shrinkage, we must choose a specific instance. In this
case we choose the {\em newest} instance, i.e. the one with the largest
DUCC ID, as it is the most likely not to have finished initializing, or perhaps not to
have ``warmed up'' (i.e. caches filled, etc.). We could choose more than
one but this pinger is conservative and only shrinks by one instance
each time.
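Selecting the instance to stop reduces to taking the largest DUCC ID, subject to the {\em min} limit. The Java sketch below is hypothetical (names are illustrative) and returns at most one ID, matching this pinger's one-at-a-time policy.

\begin{verbatim}
// Hypothetical sketch of choosing the single instance to stop.
class ShrinkChoice {
    // Returns the newest (largest) DUCC ID, or null if stopping one
    // instance would take the service below min.
    static Long newestInstance(long[] ids, int min) {
        if (ids.length == 0 || ids.length <= min) return null;
        long newest = ids[0];
        for (long id : ids) {
            newest = Math.max(newest, id);
        }
        return newest;
    }
}
\end{verbatim}

Given IDs 12, 47, and 31 with a {\em min} of 1, the sketch returns 47, the newest instance.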
\subsection{Summary of Sample Pinger}
This pinger illustrates these functions over and above the functions provided
by the default UIMA-AS pinger:
\begin{enumerate}
\item Use of pinger-specific arguments
\item Use of information provided by the SM on each ping (active service
instances, total service instances)
\item Use of performance information acquired from ActiveMQ
\item Requesting new service instances from the SM
\item Requesting that the SM remove instances
\item Setting the last use of a service
\end{enumerate}
It illustrates one mechanism for smoothing growth and shrinkage of a service
to prevent thrashing in your system.
It illustrates one mechanism for determining the actual performance of
a service by analyzing ActiveMQ queueing statistics.
It illustrates the use of ``globally registered pingers.''