| % |
| % Licensed to the Apache Software Foundation (ASF) under one |
| % or more contributor license agreements. See the NOTICE file |
| % distributed with this work for additional information |
| % regarding copyright ownership. The ASF licenses this file |
| % to you under the Apache License, Version 2.0 (the |
| % "License"); you may not use this file except in compliance |
| % with the License. You may obtain a copy of the License at |
| % |
| % http://www.apache.org/licenses/LICENSE-2.0 |
| % |
| % Unless required by applicable law or agreed to in writing, |
| % software distributed under the License is distributed on an |
| % "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| % KIND, either express or implied. See the License for the |
| % specific language governing permissions and limitations |
| % under the License. |
| % |
| |
| \section{Overview} |
| A DUCC service is defined by the following two criteria: |
| \begin{itemize} |
| \item A service is one or more long-running processes that await requests |
| and return something in response. |
| \item A service that is managed by DUCC is accompanied by a small program called a |
| ``pinger'' that the DUCC Service Manager uses to gauge the availability and health of |
| the service. This pinger must always be present. DUCC will supply a default |
| pinger for UIMA-AS services if none is specified. |
| |
| Users may supply their own ``pingers'' by supplying a Java class that implements |
| the pinger API. This is referred to as a ``custom'' pinger in this document. |
| There are a number of service registration options which allow |
| specification and parametrization of custom pingers. |
| |
| \end{itemize} |
| The pinger API enables the following functions for custom pingers: |
| \begin{itemize} |
| \item increase and decrease the number of service instances, |
| \item manage failure restart policies, |
| \item enable and disable service autostart, |
| \item notify the Service Manager of the date of last use of a service, |
| \item notify the Service Manager of the health and availability of a service, |
| \item return a string for display in the DUCC Web server to show relevant service information. |
| \end{itemize} |
| |
| |
| A service is usually a UIMA-AS service, but DUCC supports any arbitrary process as a service. |
| |
| The DUCC Service Manager implements several high-level functions: |
| |
| \begin{itemize} |
| \item Ensure services are available for jobs before allowing the jobs to start. |
| \item Enable fast-fail for jobs which reference services which are unavailable. |
| \item Start a service when it is referenced by a job, and stop it when no longer needed. |
| \item Optionally start a service when DUCC is booted. |
| \item Ensure services remain operational across failures. |
| \item Report service failures. |
| \item Run service pingers and respond to the pinger API as needed. |
| \end{itemize} |
| |
| When work enters the system with a declared dependency on a service, one of the following |
| actions is taken: |
| \begin{itemize} |
| \item If the service is not registered, the work request is automatically canceled (to avoid |
| wasting resources on a job that is known to be unable to succeed). |
| \item If the service is registered but not running, the Service Manager attempts to start it; the job |
| remains queued until the service is started and its pinger reports good health. |
| \item If the service exists but cannot be started, the work remains queued and error |
| status is shown in the web server. Once the service is working again the |
| work is allowed to proceed. (Jobs already running are not directly affected, unless they |
| also cannot access the service.) |
| \item If the service processes are running but the pinger reports failure contacting the service, |
| the work remains queued with error status shown in the webserver. Once the service |
| pinger indicates the service is functional again the work is allowed to proceed. |
| \end{itemize} |
| |
| \section{Service Types} |
| \label{sec:services.types} |
| DUCC supports two types of services: UIMA-AS and CUSTOM: |
| |
| \begin{description} |
| \item[UIMA-AS] This is a normal UIMA-AS service. DUCC fully supports all aspects of UIMA-AS |
| services with minimal effort from developers. A default pinger is supplied by DUCC |
| for UIMA-AS services. It is legal to define a custom pinger for a UIMA-AS service. |
| |
| \item[CUSTOM] This is any arbitrary service. Developers must provide a custom pinger |
| and declare the pinger in the service registration. |
| \end{description} |
| |
| DUCC also supports services that are not managed by DUCC. These are known as {\em ping-only} |
| services. The registration for a ping-only service contains only keywords needed to support a |
| pinger, which communicates with the non-DUCC service. Ping-only services must |
| be defined as custom services; there is no default pinger provided for ping-only services. |
| |
| \section{Service Instance IDs} |
| \label{sec:service.service.ids} |
| DUCC 2.0.0 introduces support for constant service instance IDs. As a service is being |
| started, the SM assigns monotonically increasing IDs to each service instance, starting |
| with ID 0, up to the maximum number of instances started. |
| |
| If an instance exits unexpectedly, the SM re-spawns it (unless a failure threshold has been |
| exceeded). The new instance is assigned the same instance ID as the instance it replaces. |
| This ensures that, for example, instance ``three'' is always started as instance ``three'', |
| maintained constant over failures and SM restarts. |
| |
| The instance ID is communicated to the process through the environment with the key |
| {\tt DUCC\_SERVICE\_INSTANCE}. This key may also be used in service registrations if it |
| is desired to pass the instance ID via parameters of some sort. For example: |
| \begin{verbatim} |
| service_jvm_args -DSERVICE_ID=${DUCC_SERVICE_INSTANCE} |
| process_executable_args -i ${DUCC_SERVICE_INSTANCE} |
| \end{verbatim} |
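
A service process can recover its instance ID from the environment at startup. The following is a minimal sketch, not part of DUCC: the class {\tt InstanceId} and its fallback value of $-1$ are hypothetical, and the sketch assumes the value of {\tt DUCC\_SERVICE\_INSTANCE} is a decimal integer.

```java
import java.util.Map;

// Hypothetical helper (not a DUCC class): resolves the instance ID that the
// Service Manager passes to a service process through its environment.
final class InstanceId {
    static final String ENV_KEY = "DUCC_SERVICE_INSTANCE";

    /** Returns the instance ID found in env, or -1 if it is absent or not numeric. */
    static int fromEnvironment(Map<String, String> env) {
        String raw = env.get(ENV_KEY);
        if (raw == null) {
            return -1;                       // not started by DUCC, or key not set
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return -1;                       // assumption: IDs are decimal integers
        }
    }

    public static void main(String[] args) {
        // Read the real process environment when run as a service instance.
        System.out.println("service instance: " + fromEnvironment(System.getenv()));
    }
}
```

Passing the environment as a map, rather than calling {\tt System.getenv()} inside the helper, keeps the lookup easy to exercise outside of DUCC.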
| |
| \section{Service References and Endpoints} |
| \label{sec:service.endpoints} |
| Services are identified by an entity called a {\em service endpoint}. Jobs and other |
| services use the registered service endpoint to indicate dependencies on specific |
| services. |
| |
| A service endpoint is of the form |
| \begin{verbatim} |
| <service-type>:<unique id> |
| \end{verbatim} |
| |
| The {\em service-type} must be either UIMA-AS or CUSTOM. |
| |
| The {\em unique id} is any string needed to ensure the service is |
| uniquely named. For UIMA-AS services, the unique ID must be the same as the |
| service endpoint specified in the service's DD XML descriptor. The UIMA-AS |
| service endpoint is always of the form: |
| \begin{verbatim} |
| queue-name:broker-url |
| \end{verbatim} |
| where {\em queue-name} is the name of the ActiveMQ queue used by the service, and {\em broker-url} |
| is the ActiveMQ broker URL. Sample DUCC Service endpoints: |
| \begin{verbatim} |
| UIMA-AS:WikipediaSearchServices:tcp://broker1:61616 |
| UIMA-AS:GoogleSearchServices:http://broker2:61618 |
| \end{verbatim} |
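
Because a broker URL itself contains colons, code that takes an endpoint apart should split only at the first separator of each level. The following sketch illustrates this; the class {\tt ServiceEndpoint} is hypothetical and not part of the DUCC API, and it assumes that queue names contain no colon.

```java
// Hypothetical value class (not a DUCC class) that splits a DUCC service
// endpoint of the form <service-type>:<unique id>.
final class ServiceEndpoint {
    final String type;      // "UIMA-AS" or "CUSTOM"
    final String uniqueId;  // everything after the first ':'

    private ServiceEndpoint(String type, String uniqueId) {
        this.type = type;
        this.uniqueId = uniqueId;
    }

    static ServiceEndpoint parse(String endpoint) {
        int sep = endpoint.indexOf(':');
        if (sep <= 0 || sep == endpoint.length() - 1) {
            throw new IllegalArgumentException("Expected <service-type>:<unique id>: " + endpoint);
        }
        String type = endpoint.substring(0, sep);
        if (!type.equals("UIMA-AS") && !type.equals("CUSTOM")) {
            throw new IllegalArgumentException("Unknown service type: " + type);
        }
        return new ServiceEndpoint(type, endpoint.substring(sep + 1));
    }

    /** UIMA-AS only: the ActiveMQ queue name (assumed to contain no ':'). */
    String queueName() {
        return uniqueId.substring(0, uimaAsSeparator());
    }

    /** UIMA-AS only: the broker URL, which may itself contain ':' characters. */
    String brokerUrl() {
        return uniqueId.substring(uimaAsSeparator() + 1);
    }

    private int uimaAsSeparator() {
        int sep = uniqueId.indexOf(':');
        if (!type.equals("UIMA-AS") || sep <= 0) {
            throw new IllegalStateException("Not a queue-name:broker-url endpoint: " + uniqueId);
        }
        return sep;
    }

    public static void main(String[] args) {
        ServiceEndpoint e = parse("UIMA-AS:WikipediaSearchServices:tcp://broker1:61616");
        System.out.println(e.type + " | " + e.queueName() + " | " + e.brokerUrl());
    }
}
```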
| |
| Jobs or other services may register dependencies on specific services by listing one or more |
| service endpoints in their specifications. See the |
| \hyperref[sec:cli.ducc-submit]{\em job } and |
| \hyperref[sec:cli.ducc-services]{\em services } CLI descriptions for details. |
| |
| A service is registered with DUCC using the \hyperref[sec:cli.ducc-services]{ducc\_services} |
| API/CLI. Service registrations are persisted by DUCC and last over DUCC and cluster restarts. |
| |
| \section{Service Management Policies} |
| \label{sec:service.management-policy} |
| |
| The Service Manager implements these policies for managing services: |
| \begin{description} |
| |
| \item[Autostarted Services] An autostarted service is automatically started when the DUCC |
| system is first booted. If an instance should die, DUCC automatically restarts the |
| instance and continually maintains the registered number of service instances. |
| |
| By default, to handle fatal errors in {\em autostarted} services, the Service Manager maintains a time |
| window in which only a specific number of instance failures may occur. If the number of |
| failures within that window of time is excessive, DUCC will set a {\em disabled} flag and |
| no longer restart instances. Instances that do not fail are left running. The {\em |
| disabled} flag must be manually reset once the problem is resolved before new instances |
| can be started. |
| |
| The default failure policy is implemented in the service pinger. Service |
| owners may redefine the default policy by supplying their own pingers for a service. |
| |
| \item[Reference-started Services] A reference-started service is a registered service that |
| is started only when referenced by another job or service. If the service is already |
| started, the dependent job/service is marked ``Services Available'' and can be scheduled. If |
| not, the service registry is checked and if a matching enabled service is found, it is |
| started by DUCC. While the service is being started, jobs are held ``Waiting For Services'' |
| to ensure the service is viable. Once the service has completed initialization and the pinger |
| indicates it is viable, all work waiting on it is then marked ``Services Available'' and |
| started. |
| |
| To handle fatal errors in {\em reference-started} services, the Service Manager maintains |
| a time window in which only a specific number of instance failures may occur. If the |
| number of failures within that window of time is excessive, DUCC will set a {\em disabled} |
| flag and no longer restart instances. Instances that do not fail are left running. The |
| {\em disabled} flag must be manually reset once the problem is resolved before new |
| instances can be started. This default policy may be overridden by custom pingers. |
| |
| When the last job or service that references the reference-started service exits, a timer is |
| established to keep the service alive for a while, in anticipation that it will be needed |
| again soon. When the keep-alive timer expires, and there are no more dependent jobs or |
| services, the reference-started service is automatically stopped to free up its resources |
| for other work. The time the service is allowed to remain alive is known as its |
| {\em linger} time and can be controlled with the {\em service\_linger} keyword in the |
| service registration. |
| |
| \item[Manually started services] A service may be started via the CLI if it is not |
| already running and in the absence of references by other work. A service which is |
| manually started by the CLI can only be stopped manually by the CLI. |
| |
| As is the case for {\em autostarted} and {\em reference-started} services, failed |
| instances will be restarted unless the number of failures within the failure window |
| is exceeded and the {\em disabled} flag is set. |
| |
| \item[Ping-Only Services] |
| \phantomsection\label{subsub:services.ping-only} |
| Ping-only services consist of only |
| a ping thread. The service itself is not managed in any way by DUCC. This is useful for |
| managing dependencies on services that are not under DUCC control: the pinger is used |
| to assess the viability of the external service and prevent dependent jobs from |
| continuing if the service is unavailable. |
| |
| Only CUSTOM services may be defined as ping-only services in this version of DUCC. |
| |
| \end{description} |
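
The failure-window policy described above can be sketched as a sliding window over failure timestamps. This is an illustration only, not the DUCC implementation: the class {\tt FailureWindow} is hypothetical, time is expressed in milliseconds for simplicity, and ``excessive'' is assumed to mean more failures than the configured limit within the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch (not DUCC code) of a failure-window restart policy.
final class FailureWindow {
    private final int maxFailures;       // failures tolerated inside one window
    private final long windowMillis;     // window length (assumed unit: ms)
    private final Deque<Long> failureTimes = new ArrayDeque<>();
    private boolean disabled = false;

    FailureWindow(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    /** Records an instance failure; returns true if the instance may be restarted. */
    boolean recordFailure(long nowMillis) {
        // Forget failures that have aged out of the window.
        while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() >= windowMillis) {
            failureTimes.removeFirst();
        }
        failureTimes.addLast(nowMillis);
        if (failureTimes.size() > maxFailures) {
            disabled = true;             // stays set until manually re-enabled
        }
        return !disabled;
    }

    boolean isDisabled() {
        return disabled;
    }

    /** Models the manual reset of the disabled flag once the problem is resolved. */
    void enable() {
        disabled = false;
        failureTimes.clear();
    }

    public static void main(String[] args) {
        FailureWindow w = new FailureWindow(2, 60_000L);
        System.out.println(w.recordFailure(0L));      // first failure
        System.out.println(w.recordFailure(1_000L));  // second failure
        System.out.println(w.recordFailure(2_000L));  // third failure inside the window
    }
}
```

Once the flag is set, the sketch refuses further restarts even after old failures age out, mirroring the requirement that a {\em disabled} service be re-enabled manually.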
| |
| \paragraph{Dynamically Changing Service Policies} |
| A service may be {\em stopped}; that is, no instances are running. This state can occur |
| if the service has experienced too many errors within its failure window, in which case |
| the service is {\em disabled}, or because the service is not {\em autostarted} or {\em referenced} by |
| other work. |
| |
| If a manual {\em stop} is issued, the service will be automatically {\em disabled} to ensure it |
| cannot be restarted (by {\em reference} or at boot with {\em autostart}) without manual |
| intervention. |
| |
| In all cases, if a service is {\em disabled}, it must be manually {\em enabled} using the CLI. |
| |
| It is possible, via the CLI, to dynamically switch any service from any management policy |
| to any other policy, as shown in the following table. |
| |
| See the \hyperref[sec:cli.ducc-services]{\em Service CLI } reference for details on the various |
| commands described in this section. |
| |
| \begin{tabular}{| l | l | p{6cm} | p{6cm} |} |
| \hline |
| Current Mode & Desired Mode & Action & Notes \\ |
| \hline |
| \hline |
| Autostart & Manual & Use CLI to modify registration to {\em autostart false}. & Service does not stop until requested by CLI. Service will not start at DUCC boot.\\ |
| \hline |
| Autostart & Reference & Use CLI to modify registration to {\em autostart false} and {\em observe references}. & Service stops after last reference exits, plus {\em linger} time.\\ |
| \hline |
| Autostart & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\ |
| \hline |
| Reference & Autostart & Use CLI to modify registration to {\em autostart true}. & Service continues to run after last reference exits. Service always started at DUCC boot. \\ |
| \hline |
| Reference & Manual & Use CLI to {\em ignore references}. & Service continues to run after last reference exits. \\ |
| \hline |
| Reference & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\ |
| \hline |
| Manual & Autostart & Use CLI to modify registration to {\em autostart true}. & Service will be started at DUCC boot. \\ |
| \hline |
| Manual & Reference & Use CLI to {\em observe references}. & Service will stop after last referencing job exits, plus {\em linger} time. \\ |
| \hline |
| Manual & {\em Stopped} & Use CLI to stop the service. & The CLI stop will by necessity {\em disable} the service to ensure it remains stopped. \\ |
| \hline |
| {\em Stopped} & Autostart & Use CLI to modify registration to {\em autostart true}. & Service will start immediately. It may be necessary to {\em enable} the service as well.\\ |
| \hline |
| {\em Stopped} & Reference & Submit a job or service that references the service. & It may be necessary to {\em enable} the service as well. |
| The service will stop after the last referencing work exits, plus {\em linger}. \\ |
| \hline |
| {\em Stopped} & Manual & Use CLI to start the service. & The CLI start will also {\em enable} the service if necessary. \\ |
| \hline |
| \end{tabular} |
| |
| \section{Service Pingers} |
| \label{sec:service.pingers} |
| A service pinger is a small program that queries a service on behalf of the DUCC Service |
| Manager. A default pinger is provided for UIMA-AS services and provides the following |
| functions: |
| \begin{itemize} |
| \item Determine if the service is responsive by issuing a UIMA-AS ``get-meta'' call |
| to the service. |
| \item Determine the health of the service by issuing a JMX call to the UIMA-AS broker |
| to collect queueing statistics. |
| \item Manage the failure window of the service. |
| \item Return a string with basic ActiveMQ statistics about the service, or |
| error information if the service is deemed unusable. |
| \item Return the date of last use of the service (as determined by presence or |
| absence of service producers attached to the service queue). |
| \end{itemize} |
| |
| Users may supply their own pingers. The following additional functions are available for |
| pingers. Note that a {\em custom} pinger MAY be supplied for UIMA-AS services, and |
| MUST be supplied for CUSTOM services. Custom pingers use the Service Manager's |
| ``pinger'' API to perform the following tasks: |
| \begin{itemize} |
| \item Inform the Service Manager if the service is viable. |
| \item Inform the Service Manager if the service is ``healthy''. Service ``health'' |
| is a heuristic used in the DUCC Web server as an alert that a service |
| is responding but may |
| not be performing well. |
| \item Manage service failure policies. A default failure-window policy is |
| provided to all pingers by the DUCC API handler (optional). |
| \item Return a string describing current service status, for use by the |
| web server. |
| \item Instruct the service manager to increase the number of instances (optional). |
| \item Instruct the service manager to decrease the number of instances (optional). |
| \item Enable and disable the service's autostart flag (optional). |
| \item Enable logging of a service's health and state (optional). |
| \item Return date of last-use to the Service Manager for display in the |
| webserver (optional). |
| \end{itemize} |
| |
| \subsection{The Pinger API} |
| |
| Pingers are passed static information about the service at pinger-initialization |
| time, and subsequently, current state of the service is provided on each call (ping). |
| |
| Information provided at initialization follows. Most of this is |
| provided in fields in the {\em AServicePing} base class. See the Javadoc for |
| specific field names and types. |
| |
| \subsubsection{Pinger Initialization Data} |
| Data provided once, during pinger initialization, includes: |
| \begin{description} |
| \item[Arguments] This is the {\em service\_ping\_arguments} string from the |
| service registration. |
| \item[Endpoint] This is the CUSTOM:string or UIMA-AS:string endpoint provided |
| in the service registration. |
| \item[Monitor Rate] This is the rate at which the pinger will be called by |
| the SM, as provided in DUCC's configuration. |
| \item[Service ID] This is the \hyperref[sec:service.service.ids]{unique numeric service ID} assigned to the service |
| by DUCC. |
| \item[Log Enabled] Whether the service log is enabled, as specified by the |
| {\em service\_ping\_dolog} registration parameter. |
| \item[Maximum Allowed Failures] This is the value of the {\em instance\_failures\_limit} |
| parameter, provided by DUCC configuration and optionally overridden by the |
| service registration. |
| \item[Instance Failure Window] This is the value of the {\em instance\_failures\_window} |
| parameter, provided by DUCC configuration and optionally overridden by the |
| service registration. |
| \item[Autostart Enabled] This indicates whether the service registration currently |
| has the {\em autostart} flag enabled. |
| \item[Last Use] This is the time of last known use of the service, persisted and |
| maintained over SM restarts. It is 0 if unknown or the service has never been |
| used. |
| \end{description} |
| |
| \subsubsection{Pinger Dynamic Data} |
| |
| Dynamic information provided to the pinger in each call (ping) consists of: |
| \begin{description} |
| \item[All Instance Information] This is an array consisting of the unique integer |
| IDs of all running processes implementing the service. This includes instances |
| which may not be currently viable for some reason (still initializing, for example). |
| |
| \item[Active Instance Information] This is an array consisting of the unique integer |
| IDs of all running processes implementing the service. This is a subset of |
| ``All Instance Information'' and includes only the service instances that are advanced |
| to Running state. |
| |
| \item[Reference Information] This is an array consisting of the unique integer |
| IDs of all DUCC work (Jobs, other Services, etc.) currently referencing the |
| service. |
| |
| \item[Autostart Enabled] The current state of the service's autostart flag. |
| |
| \item[Run Failures] This is the total number of instance failures for the |
| service since the last start of the SM. |
| \end{description} |
| |
| Only a Java API is supported. |
| |
| \subsection{Declaring a Pinger in a Service} |
| |
| The following registration options are used for declaring and configuring pingers. Any of these |
| may be dynamically modified with the service CLI's {\em$--$modify} option. Dynamically changing |
| these causes the current pinger to be terminated and restarted with the new configuration. See |
| \hyperref[sec:cli.ducc-services]{ducc\_services} for details of the options: |
| \begin{itemize} |
| \item service\_ping\_arguments |
| \item service\_ping\_class |
| \item service\_ping\_classpath |
| \item service\_ping\_jvm\_args |
| \item service\_ping\_timeout |
| \item service\_ping\_dolog |
| \item instance\_failures\_window |
| \item instance\_failures\_limit |
| \end{itemize} |
| |
| |
| \subsection{Implementing a Pinger} |
| Pingers must implement the class {\tt org.apache.uima.ducc.cli.AServicePing}. See the |
| Javadoc for the details of this class. |
| |
| Below is a sample CUSTOM pinger for a hypothetical service that returns four integers in |
| response to a ping. It illustrates simple use of the three required methods, {\em init()}, |
| {\em stop()}, and {\em getStatistics()}. |
| |
| \begin{figure}[H] |
| \begin{verbatim} |
| import java.io.DataInputStream; |
| import java.net.Socket; |
| import org.apache.uima.ducc.cli.AServicePing; |
| import org.apache.uima.ducc.cli.ServiceStatistics; |
| |
| public class CustomPing |
|     extends AServicePing |
| { |
|     String host; |
|     String port; |
| |
|     public void init(String args, String endpoint) throws Exception { |
|         // Parse the service endpoint, which is a String of the form |
|         //    CUSTOM:host:port |
|         // so parts[0] is the service type, parts[1] the host, parts[2] the port. |
|         String[] parts = endpoint.split(":"); |
|         host = parts[1]; |
|         port = parts[2]; |
|     } |
| |
|     public void stop() { } |
| |
|     // DataInputStream reads big-endian; reversing the bytes interprets the |
|     // value as little-endian, as sent by the hypothetical service. |
|     private long readLong(DataInputStream dis) throws Exception { |
|         return Long.reverseBytes(dis.readLong()); |
|     } |
| |
|     public ServiceStatistics getStatistics() { |
|         // Contact the service, interpret the results, and return a state |
|         // object for the service. Start out "not alive, not healthy". |
|         ServiceStatistics stats = new ServiceStatistics(false, false,"<NA>"); |
|         // try-with-resources closes the socket and stream after every ping. |
|         try ( Socket sock = new Socket(host, Integer.parseInt(port)); |
|               DataInputStream dis = new DataInputStream(sock.getInputStream()) ) { |
| |
|             long stat1 = readLong(dis); long stat2 = readLong(dis); |
|             long stat3 = readLong(dis); long stat4 = readLong(dis); |
| |
|             stats.setAlive(true); stats.setHealthy(true); |
|             stats.setInfo( "S1[" + stat1 + "] S2[" + stat2 + |
|                            "] S3[" + stat3 + "] S4[" + stat4 + "]" ); |
|         } catch ( Throwable t) { |
|             // Any failure leaves the service marked not alive. |
|             t.printStackTrace(); |
|             stats.setInfo(t.getMessage()); |
|         } |
|         return stats; |
|     } |
| } |
| \end{verbatim} |
| \caption{Sample CUSTOM Service Pinger} |
| \label{fig:service.custom.pinger} |
| |
| \end{figure} |
| |
| \subsection{Building And Testing Your Pinger} |
| This section provides the information needed to use the pinger API and build a |
| custom pinger. |
| |
| \paragraph{1. Establish a compilation CLASSPATH} One DUCC jar is required in the CLASSPATH to build your pinger: |
| \begin{verbatim} |
| DUCC_HOME/lib/uima-ducc-cli.jar |
| \end{verbatim} |
| This provides the definition for the {\em AServicePing} and {\em ServiceStatistics} classes. |
| |
| \paragraph{2. Create a registration} Next, create a service registration for the pinger. While |
| debugging, it is useful to set the directive |
| \begin{verbatim} |
| service_ping_dolog = true |
| \end{verbatim} |
| This will log any output from {\tt System.out.println()} to the declared log directory |
| for the service. If not specified in the registration, this directory is: |
| \begin{verbatim} |
| $HOME/ducc/logs/S-<serviceid>/services |
| \end{verbatim} |
| where {\tt$<$serviceid$>$} is the DUCC-assigned ID of your service. |
| |
| Once the pinger is debugged you may want to turn logging off. |
| \begin{verbatim} |
| service_ping_dolog = false |
| \end{verbatim} |
| |
| If your pinger requires a different version of Java than is used by DUCC, include a |
| setting for the JAVA\_HOME variable in the environment option. |
| |
| A sample service registration may look something like the following. Note that you do not need |
| to include any of the DUCC jars in the classpath for the pinger. DUCC will add the jars it |
| requires to interact with the pinger automatically. (However you may need other jars to |
| provide UIMA, UIMA-AS, ActiveMQ, Spring, or other function.) |
| \begin{verbatim} |
| bash-3.2$ cat myping.svc |
| |
| description = Ping-only service |
| service_request_endpoint = CUSTOM:localhost:7175 |
| service_ping_class = CustomPing |
| service_ping_classpath = /myhome/CustomPing.class |
| service_ping_dolog = true |
| service_ping_timeout = 500 |
| service_ping_arguments = Arg1 Arg2 |
| service_ping_jvm_args = -Xmx50M |
| environment = JAVA_HOME=/share/jdk1.8 OTHER_VARIABLE=something |
| \end{verbatim} |
| |
| \paragraph{3. Register and start the service and pinger} Register and start your custom service using |
| a registration containing lines similar to those above. As soon as the service instance is in |
| DUCC state {\em Running} the SM starts the pinger. |
| |
| |
| Check the web server to make sure the service ``comes alive''. Check your pinger's |
| debugging log if it doesn't. Once registered, you can dynamically modify and restart the pinger at any time without |
| re-registering the service or restarting the service by use of the {\tt $--$modify} option of the |
| \hyperref[sec:cli.ducc-services]{\em ducc\_services CLI:} |
| \begin{verbatim} |
| ducc_services --modify <serviceid> --service_ping_dolog true |
| ducc_services --modify <serviceid> --service_ping_class OtherCustomPing |
| --service_ping_classpath /myhome/OtherCustomPing.class |
| |
| \end{verbatim} |
| where $<$serviceid$>$ is the ID returned when you registered the service. |
| |
| \paragraph{4. If all else fails ...} |
| If your pinger does not work and you cannot determine the reason, be sure you enable {\em service\_ping\_dolog} and |
| look in your log directory, as most problems with pingers are reflected there. As a last resort, you can |
| inspect the Service Manager's log in |
| \begin{verbatim} |
| $DUCC_HOME/logs/sm.log |
| \end{verbatim} |
| |
| \subsection{Globally Registered Pingers} |
| \label{subsec:services.pingers} |
| |
| A user-built pinger may be registered with DUCC so that it can be globally used by any DUCC service. To do |
| this, a registration file containing only pinger-specific parameters is created in DUCC's run-time |
| directory. Such a pinger may then be designated for a service by using its registered filename |
| instead of its class in the {\em service\_ping\_class} field of a registration. There is no API or |
| CLI to register such a pinger; only a DUCC administrator may create a global ping registration. |
| |
| A globally-registered pinger may then be designated to run as a thread inside the SM or as a |
| process spawned and managed by the SM. A pinger that runs in a thread in the SM is |
| called an {\em internal} pinger, and one that runs in a process is called an {\em external} |
| pinger. An {\em internal} pinger generally has nearly unmeasurable impact on the system, |
| whereas {\em external} pingers will occupy full JVMs with processes of 50-100MB or more. |
| |
| A service may override any of the options of a globally-registered {\em external} pinger, |
| thus allowing significant reuse of existing code. Only the {\em service\_ping\_arguments} |
| of an {\em internal} pinger may be overridden however. |
| |
| The default UIMA-AS pinger is permanently registered as an {\em internal} pinger. |
| |
| Globally registered pingers use a special boolean property, not supported by the |
| {\em ducc\_services} API/CLI, ``internal'', to determine whether the pinger is |
| to be run internally to SM or as an external process. Only the DUCC administrator |
| may update a global pinger's registration to ``internal'', to ensure such pingers |
| are properly vetted and approved by the installation. |
| |
| More details on registering global pingers are found in the |
| \hyperref[chap:sm]{\em Administration section} of this document. |
| |
| \section{Sample Pinger} |
| |
| A sample custom UIMA-AS pinger is provided in the Examples directory shipped |
| with DUCC in |
| \begin{verbatim} |
| DUCC_HOME/examples/src/org/apache/uima/ducc/ping |
| \end{verbatim} |
| |
| This pinger increases or decreases the number of service instances based |
| on the queue statistics found by querying ActiveMQ. The goal of this |
| pinger is to maintain the ActiveMQ ``enqueued time'' to be no more than |
| some multiple of the average service time for a single item. The factor |
| used is a parameter passed in with the argument string. |
| |
| \subsection{Using the Sample Pinger} |
| The following arguments may be specified to use the sample pinger with any UIMA-AS service. The |
| {\em service\_ping\_arguments} are specific to this pinger. |
| \begin{verbatim} |
| service_ping_class=org.apache.uima.ducc.ping.SamplePing |
| service_ping_arguments=meta-timeout=15010,broker-jmx-port=1099,window=5,min=1, |
| max=20,max-growth=3,fast-shrink=true,goal=2.5 |
| service_ping_classpath = ${DUCC_HOME}/lib/uima-ducc/examples/*: |
| ${DUCC_HOME}/apache-uima/lib/*: |
| ${DUCC_HOME}/apache-uima/apache-activemq/lib/*: |
| ${DUCC_HOME}/lib/springframework/* |
| service_ping_dolog=True |
| service_ping_timeout=10000 |
| |
| instance_failures_window = ${ducc.sm.instance.failure.window} |
| instance_failures_limit = ${ducc.sm.instance.failure.max} |
| \end{verbatim} |
| |
| The full source for the sample pinger is found in |
| \begin{verbatim} |
| DUCC_HOME/examples/src/org/apache/uima/ducc/ping/SamplePing.java |
| \end{verbatim} |
| |
| The following arguments are accepted by this pinger and may be specified in a single |
| comma-delimited string containing the following initialization parameters: |
| \begin{description} |
| \item[meta-timeout] Defines how long to wait for {\em get\_meta} to return. |
| \item[broker-jmx-port] Defines the JMX port of the service's broker. |
| \item[window] Defines the shrinkage/growth window size, in minutes. |
| \item[enable-log] Enable extra logging. |
| \item[min] The minimum number of service instances to maintain. |
| \item[max] The maximum number of service instances to allow. |
| \item[max-growth] The maximum number of instances to grow in a |
| single request. |
| \item[fast-shrink] If set, allow services to shrink if the |
| queue depth is 0, even if consumers are connected. Otherwise |
| we do not shrink if consumers are attached to the queue. |
| \item[goal] The multiplier of the ActiveMQ Broker's {\em average enqueue} |
| time to attempt to maintain by managing the number of instances. |
| \end{description} |
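
An argument string in this comma-delimited {\tt key=value} form can be broken into a map with a few lines of Java. The sketch below is illustrative and is not taken from {\tt SamplePing.java}: the class {\tt PingArguments} is hypothetical, and treating a bare key with no {\tt =} as the value {\tt true} is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not DUCC code): splits a service_ping_arguments string
// such as "window=5,min=1,max=20" into key/value pairs.
final class PingArguments {
    static Map<String, String> parse(String args) {
        Map<String, String> out = new LinkedHashMap<>();   // keeps declaration order
        if (args == null) {
            return out;
        }
        for (String token : args.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;                                  // tolerate stray commas
            }
            int eq = t.indexOf('=');
            if (eq < 0) {
                out.put(t, "true");                        // assumption: bare key is a flag
            } else {
                out.put(t.substring(0, eq).trim(), t.substring(eq + 1).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parse(
            "meta-timeout=15010,broker-jmx-port=1099,window=5,min=1,"
            + "max=20,max-growth=3,fast-shrink=true,goal=2.5");
        double goal = Double.parseDouble(parsed.get("goal"));
        int window = Integer.parseInt(parsed.get("window"));
        System.out.println("goal=" + goal + " window=" + window);
    }
}
```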
| |
| |
| \subsection{Understanding the Sample Pinger} |
| |
| The best way to understand this pinger is to examine the code itself in the |
| Examples directory. Here we provide a brief line-by-line synopsis of the code. |
| |
| \paragraph{void init(String args, String ep)} |
| This required method examines the service arguments and endpoint and establishes a monitor |
| to issue {\em get-meta} calls to the service and {\em JMX} calls to the |
| ActiveMQ broker. The argument string {\em args} is described above. The |
| endpoint {\em ep} is the service endpoint used to register the service. |
| |
| \paragraph{Lines 100-119} |
| These lines parse the endpoint {\em ep} into its components, comprising the |
| UIMA-AS queue name and the URL to the service broker. |
| |
| \paragraph{Lines 121-125} |
| These lines disable most UIMA-AS logging as these messages can be quite |
| numerous. However, during debugging it may be desired to change the logging |
| levels here. |
| |
| \paragraph{Lines 130-172} |
| These lines parse the service argument string {\em args} into its constituent |
| parts and place the values in variables. They initialize the expansion |
| and deletion window and normalize it to one slot per minute, regardless of |
| the actual ping rate. |
| |
| The window normalization uses the DUCC-supplied value {\em monitor\_rate} |
| to determine the number of slots in the windows. |
| |
| \paragraph{Lines 176-177} |
| These lines initialize the DUCC-supplied {\em UimaAsServiceMonitor} that |
| queries the UIMA-AS queues, and reset the queue statistics via JMX so the |
| monitor can make accurate measurements. |
| |
| \paragraph{Lines 181-187} |
| These lines implement the required {\em stop} method which is invoked when |
| the Service Manager needs to stop the pinger for any reason. They stop the |
| ActiveMQ queue monitor and emit a shutdown message. |
| |
| \paragraph{Lines 191-240} |
| These lines define the required {\em getStatistics} method. This |
| method collects ActiveMQ statistics, issues {\em get-meta} to the |
| service to see if it is responding, sets the formatted information |
| string into the ping reply, and invokes the code to calculate a |
| potential redeployment of service instances. |
| |
| \paragraph{Lines 245-248} |
| These lines override the optional {\em getLastUse} method which |
| simply returns the time of last known use of the service. The actual |
| value is calculated in the pinger-specific {\em calculateNewDeployment} |
| method, described below. |
| |
| \paragraph{Lines 253-298} |
| These lines define the pinger-specific {\em calculateNewDeployment} |
| method. This is invoked after {\em get-meta} is called and after the |
| UIMA-AS queue has been queried in ActiveMQ. This is the key method of |
| this pinger. It uses information passed in on the last ping from the |
| Service Manager in conjunction with information in the ActiveMQ queue |
| to determine if more or fewer service instances are needed to meet the |
| performance goals. If fewer instances are needed, it selects specific |
| instances to stop. The method is |
| \hyperref[subsec:services.calculate-new]{\em described in detail} below. |
| |
| \paragraph{Lines 407-410} |
| These lines override the optional {\em getAdditions} method. The method |
| returns the number of new service instances required to meet performance |
| goals, as calculated in |
| \hyperref[subsec:services.calculate-new]{\em calculateNewDeployment}. |
| |
| Regardless of what this method returns, the Service Manager may choose |
| not to start new instances, based on its configured maximum, |
| {\em ducc.sm.max.instances} as defined in {\em ducc.properties}. |
| |
| \paragraph{Lines 416-419} |
| These lines override the optional {\em getDeletions} method. This |
| method returns the specific service instances to be stopped, if any. |
| |
| The DUCC-assigned unique IDs of all service instances are passed in to |
| the pinger on each ping. These IDs are monotonically increasing |
| over time, so pingers may assume that lower numbers represent older |
| instances. |
| |
| |
| \paragraph{Lines 429-480} |
| These lines define a class used as a call-back on the UIMA-AS |
| {\em get-meta} requests to determine the host and PID of the |
| service instance responding to the {\em get-meta}. If the |
| {\em get-meta} request should time out, this information can be used to |
| help identify ailing or overloaded service instances. |
| |
| \subsection{Calculating New Deployments in the Pinger} |
| \label{subsec:services.calculate-new} |
| |
| This section details the use of ActiveMQ queue statistics |
| in conjunction with the Service Monitor data to calculate the number |
| of service instances to increase or decrease. |
| |
| It is important that this code be very careful about ``smoothing'' the |
| performance statistics to keep growth and shrinkage stable. Things |
| to take into consideration include: |
| \begin{enumerate} |
| \item Immediately after a new service instance becomes available to |
| serve, if there is demand for this service, the ActiveMQ statistics |
| will fluctuate for a few minutes until traffic stabilizes. Thus |
| decisions based on these statistics must reflect history as well as |
| current information. |
| |
| \item Immediately after a client begins to use a service, the statistics |
| will also fluctuate, again requiring smoothing. |
| |
| \item The DUCC work dispatching model will not over-dispatch work to the |
| job processes. Thus actual demand on a service is a function of the |
| number of actively deployed and initialized JPs. If the number of |
| JPs decreases due to preemption, demand on the service by that job |
| will decrease proportionally. Similarly, demand can increase as the |
| job expands. |
| |
| It is common for demand on a service to ramp up slowly as |
| a job enters the system, and increase rapidly as a job completes its |
| initialization phase and starts to double. Thus, the ActiveMQ statistics |
| can be quite erratic for a while, until the job stabilizes. |
| |
| This again requires some sort of smoothing of the data when making |
| decisions about service growth and shrinkage. |
| \end{enumerate} |
| |
| To handle this data smoothing, the SamplePing class uses two time-based {\em windows}, one for |
| growth, and one for shrinkage, to keep growth and shrinkage stable. The window size is defined |
| in the service ping argument {\em window}. |
| Each window period, if more |
| services are needed, a mark is made in the current slot of the {\em expansion window}; otherwise |
| the current slot is cleared. Similarly, each period, if fewer services are needed, a mark is |
| made in the {\em shrinkage window}; otherwise, the current slot is cleared. |
| |
| After the marks are made, if the {\em expansion window} has all slots filled, |
| a request for new processes is made; thus, a short period of increased demand does not |
| destabilize the system with a request for services that may be of little use. |
| Additionally, when a request is made, the number of new processes requested is |
| capped by the ping argument {\em max-growth} to ensure that the service |
| grows smoothly. Finally, if the service is already at its configured maximum |
| number of instances, defined by the {\em max} parameter, no additional instances |
| are requested. |
| |
| Similarly, the {\em shrinkage window} is used to govern shrinkage. All slots must be |
| filled, indicating the service has been over-provisioned for a while, before a request |
| is made to delete instances. The number of instances is never reduced below the |
| configured {\em min} value. As well, this particular pinger never shrinks by more than |
| a single instance at a time, on the reasoning that it is more costly to start a new |
| service than to maintain one for too long. Only if there is no long-term use of |
| the extra instances are they reduced (as |
| determined by the window). |
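| |
| The window mechanism described above can be sketched as follows. This is a |
| minimal illustration rather than the SamplePing code: the class name |
| {\em SmoothingWindow} and its methods are hypothetical, and it assumes one slot |
| is recorded per period in round-robin order. |

```java
import java.util.Arrays;

/** Hypothetical illustration of a slot-based smoothing window; not the SamplePing source. */
final class SmoothingWindow {
    private final int[] slots;   // one slot per period; 1 = marked, 0 = clear
    private int cursor = 0;      // index of the current slot

    SmoothingWindow(int size) {
        slots = new int[size];
    }

    /** Mark or clear the current slot, then advance to the next one. */
    void record(boolean needed) {
        slots[cursor] = needed ? 1 : 0;
        cursor = (cursor + 1) % slots.length;
    }

    /** True only when every slot is marked, i.e. the condition held for the whole window. */
    boolean isFull() {
        return Arrays.stream(slots).sum() == slots.length;
    }

    public static void main(String[] args) {
        SmoothingWindow expansion = new SmoothingWindow(3);
        expansion.record(true);
        expansion.record(true);
        System.out.println(expansion.isFull()); // false: only two of three slots are marked
        expansion.record(true);
        System.out.println(expansion.isFull()); // true: all three slots are marked
    }
}
```

| A pinger built this way would keep one such window for expansion and one for |
| shrinkage, and act only when the corresponding window is full. |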
| |
| Given this introduction, we describe the key method in detail. |
| |
| \paragraph{Lines 262-277} |
| These lines extract four quantities from the ActiveMQ statistics: |
| \begin{enumerate} |
| \item Average enqueue time, {\em eT} |
| \item Current queue depth, {\em Q} |
| \item The current number of service consumers {\em cc} |
| \item The current number of service producers {\em pc} |
| \end{enumerate} |
| |
| The code then gets the DUCC IDs of all the currently started service |
| instances, and the number of instances that are started but still in |
| their ``initialization'' phase. This is important because instances that |
| are still initializing are not servicing the queue, but will soon start |
| to do so. The current ActiveMQ statistics do NOT yet reflect |
| this, however; they reflect only the instances that are actually serving. |
| |
| Finally, if there are service producers, we note the time of day to |
| return to the SM as the last known use of this service by some process. |
| |
| \paragraph{Line 267} |
| This line calculates the number of Java threads per service instance, needed to calculate the |
| maximum capacity of the service in its current deployment. |
| |
| (Note that in each UIMA-AS service, UIMA-AS itself occupies one thread, used to |
| manage the service, and this thread manifests itself as a consumer |
| on the queue.) |
| |
| \paragraph{Line 301} |
| This declares {\em new\_ni}, the number of additional instances, if any. |
| At the end of this method, new\_ni will either be 0 or $>$0. |
| |
| \paragraph{Lines 303-312} |
| If the current queue depth is 0 (Q $==$ 0), we know a number of things: |
| \begin{enumerate} |
| \item The service is not under-provisioned; there is no work queued and |
| waiting for some service. We therefore do not need to expand. |
| \item If there are no consumers, i.e. no clients that need work done, |
| we are potentially over-provisioned, so we fill in a slot in the |
| shrinkage window. |
| |
| If there {\em are} consumers, we may not want to |
| shrink because it is possible that one of the service instances is |
| busy; we cannot tell. So we allow the {\em fast-shrink} |
| ping argument to govern whether or not connected consumers may |
| prevent service shrinkage. |
| \end{enumerate} |
| |
| There is nothing else that can be said about a service if its |
| current queue depth is 0. |
| |
| \paragraph{Lines 312-360} |
| |
| If the queue depth is non-zero we are able to calculate the total |
| service capacity and the amount each instance contributes to the |
| total capacity. From this we can determine |
| \begin{enumerate} |
| \item whether the service is performing at or near its goal, |
| \item if the service is performing worse than its goal, how many |
| new instances are needed to meet the goal, and |
| \item if the service is performing better than its goal, how many |
| instances can be given up and still meet the goal. |
| \end{enumerate} |
| |
| Details follow. |
| |
| \paragraph{Lines 314 and 315} |
| The average time a single instance takes to serve a single request, {\em Ti} is given |
| by the simple formula |
| |
| \begin{verbatim} |
| Ti = (eT / Q) * active |
| \end{verbatim} |
| where |
| |
| \begin{description} |
| \item[eT] is the average time an item stays in queue (from AMQ), |
| \item[Q] is the current queue depth (from AMQ), |
| \item[active] is the current number of service instances (from SM) |
| \end{description} |
| |
| Therefore the time taken by a single thread, {\em Tt}, is given by |
| \begin{verbatim} |
| Tt = Ti * nthreads |
| \end{verbatim} |
| |
| \paragraph{Lines 319 and 320} |
| We want the average enqueue time {\em eT} to stay close to the target {\em g}, |
| defined as |
| \begin{verbatim} |
| g = Tt * goal |
| \end{verbatim} |
| |
| where {\em goal} is given by the ping arguments. The |
| current ratio of actual service time to desired is then given by |
| \begin{verbatim} |
| r = eT / g |
| \end{verbatim} |
| |
| Because we know that the DUCC job driver will never over-commit; that is, |
| we know the current demand will remain constant unless the jobs using the |
| service expand or contract (which are relatively rare events), we can state |
| that the number of service instances required is directly proportional |
| to {\em r}. |
| |
| If $r > 1$ we may need more instances to meet our {\em goal} and if |
| $r < 1$ we may be over-provisioned. |
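| |
| As a worked illustration, the formulas above can be combined into a single |
| calculation. The sketch below assumes that the target {\em g} is {\em Tt} multiplied |
| by {\em goal}, which is how we read the preceding formulas; the class name |
| {\em GoalRatio} and the sample numbers are hypothetical. |

```java
/** Hypothetical worked example of the ratio calculation; assumes g = Tt * goal. */
final class GoalRatio {
    /**
     * eT       average enqueue time reported by ActiveMQ
     * q        current queue depth (must be non-zero)
     * active   current number of service instances
     * nthreads threads per service instance
     * goal     multiplier taken from the ping arguments
     */
    static double ratio(double eT, long q, int active, int nthreads, double goal) {
        double ti = (eT / q) * active;  // average time per request for one instance
        double tt = ti * nthreads;      // average time per request for one thread
        double g = tt * goal;           // assumed target enqueue time
        return eT / g;                  // r > 1: may need more instances; r < 1: may have too many
    }

    public static void main(String[] args) {
        // eT = 6000, Q = 12, 2 instances, 2 threads each, goal = 1.5
        System.out.println(ratio(6000.0, 12, 2, 2, 1.5)); // 2.0
    }
}
```

| Under this reading {\em eT} cancels out, so {\em r} reduces to the queue depth |
| divided by the product of the instance count, the threads per instance, and {\em goal}. |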
| |
| \paragraph{Lines 325-347} |
| If $r > 1$ we may be under-provisioned. We calculate the number of required |
| instances by multiplying the current instances by {\em r} and rounding down. |
| We account for instances that we know are starting but not yet started, |
| cap on max instances per service, and again on max growth per cycle. |
| |
| If we still require additions, we make a mark in the expansion window, |
| otherwise we clear the expansion window. |
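| |
| One plausible reading of this growth calculation is sketched below. It is not |
| the SamplePing code: the class name {\em GrowthSketch}, the parameter names, and |
| the order in which the caps are applied are assumptions made for illustration. |

```java
/** Hypothetical sketch of the growth calculation; the order of the caps is an assumption. */
final class GrowthSketch {
    /**
     * serving       instances currently serving the queue
     * initializing  instances started but still initializing
     * r             ratio of actual to desired enqueue time
     * max           ping argument "max"
     * maxGrowth     ping argument "max-growth"
     */
    static int additions(int serving, int initializing, double r, int max, int maxGrowth) {
        int required = (int) Math.floor(serving * r);     // instances needed to meet the goal
        int wanted = required - serving - initializing;   // discount instances already on the way
        int room = max - (serving + initializing);        // never exceed the configured maximum
        int capped = Math.min(Math.min(wanted, room), maxGrowth);
        return Math.max(capped, 0);                       // never report a negative addition
    }

    public static void main(String[] args) {
        // 4 serving, 1 initializing, r = 2.0, max = 10, max-growth = 3
        System.out.println(additions(4, 1, 2.0, 10, 3)); // 3
    }
}
```

| A non-zero result would then be recorded as a mark in the expansion window |
| rather than acted on immediately. |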
| |
| \paragraph{Lines 349-360} |
| If $r < 1$ we need to calculate shrinkage. Because starting instances |
| is expensive we conservatively use $r < .5$ instead and make a mark |
| in the shrinkage window. |
| |
| Otherwise we clear the mark in the shrinkage window. |
| |
| \paragraph{Lines 367-396} |
| Finally we sum across the shrinkage and expansion windows. If either |
| window is full, we schedule growth (line 375, set the variable {\em additions}) |
| or shrinkage (line 388, set {\em deletions}). |
| |
| Note that to schedule shrinkage, we must choose a specific instance. In this |
| case we choose the {\em newest} instance, i.e. the one with the largest |
| DUCC ID, as it is most likely not to have initialized, or perhaps not to |
| have ``warmed up'' (i.e. caches filled, etc.). We could choose more than |
| one but this pinger is conservative and only shrinks by one instance |
| each time. |
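| |
| The selection rule can be sketched as follows, assuming the instance IDs are |
| available as an array of longs. The class name {\em ShrinkChoice} and the method |
| signature are hypothetical; the check against {\em min} reflects the rule that |
| the service is never reduced below its configured minimum. |

```java
import java.util.OptionalLong;
import java.util.stream.LongStream;

/** Hypothetical sketch: pick at most one instance to stop, preferring the newest. */
final class ShrinkChoice {
    /** Returns the largest (newest) ID, or empty if stopping one would go below min. */
    static OptionalLong pickForDeletion(long[] instanceIds, int min) {
        if (instanceIds.length <= min) {
            return OptionalLong.empty();          // already at or below the minimum
        }
        return LongStream.of(instanceIds).max();  // largest DUCC ID is the newest instance
    }

    public static void main(String[] args) {
        long[] ids = {101L, 105L, 103L};
        System.out.println(pickForDeletion(ids, 1).getAsLong()); // 105
    }
}
```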
| |
| \subsection{Summary of Sample Pinger} |
| This pinger illustrates these functions over and above the functions provided |
| by the default UIMA-AS pinger: |
| \begin{enumerate} |
| \item Use of pinger-specific arguments |
| \item Use of information provided by SM on each ping (service instances |
| active, total service instances) |
| \item Use of performance information acquired from ActiveMQ |
| \item Requesting new service instances of the SM |
| \item Requesting that instances be removed by SM |
| \item Setting of last-use of a service |
| \end{enumerate} |
| |
| It illustrates one mechanism for smoothing growth and shrinkage of a service |
| to prevent thrashing in your system. |
| |
| It illustrates one mechanism for determining the actual performance of |
| a service by analyzing ActiveMQ queueing statistics. |
| |
| It illustrates the use of ``globally registered pingers.'' |