<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[ | |
<!ENTITY % uimaents SYSTEM "../entities.ent"> | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<chapter id="ugr.async.mt"> | |
<title>Monitoring and Tuning</title> | |
<para> | |
UIMA AS deployments can involve many separate parts running on many | |
different machines. Monitoring facilities and tools built into UIMA AS help | |
in collecting information on the performance of these parts. You can | |
use the monitoring information to identify deployment issues, such as | |
bottlenecks, and address these with various approaches that alter the | |
deployment choices; this is what we mean by "tuning the deployment". | |
</para> | |
<para> | |
Monitoring happens in several parts: | |
<itemizedlist> | |
<listitem><para>Each node running a JVM hosting UIMA AS services or clients provides | |
JMX information tracking many items of interest.</para></listitem> | |
<listitem> | |
<para>UIMA AS services include some of these measurements in the information
passed back to their clients, along with the returned CAS. This allows
clients to collect and aggregate measurements over a cluster of remotely-deployed
components.</para>
</listitem> | |
<listitem> | |
<para>UIMA AS includes a Monitor component that can optionally be turned on to
sample the JMX data at
a specified interval and write the results into the UIMA log (or to the
console if no log is configured) in one of two formats: one
convenient for reading, the other convenient for importing into
a spreadsheet program.</para>
</listitem> | |
</itemizedlist> | |
</para> | |
<para>Tuning a UIMA AS application is done using several approaches: | |
<itemizedlist> | |
<listitem><para>changing the topology of the scaleout - for instance, allocating more
nodes to some parts, fewer to others</para></listitem>
<listitem> | |
<para>adjusting deployment parameters, such as the number of CASes in a CasPool, or | |
the number of threads assigned to do various tasks</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
<para> | |
In addition, tuning can involve changing the analytic algorithms
themselves, but that is beyond the scope of this chapter.
</para> | |
<section id="ugr.async.mt.monitoring"> | |
<title>Monitoring</title> | |
<bridgehead>JMX</bridgehead>
<para>JMX (Java Management Extensions) is a standard Java mechanism that
is used to monitor and control Java applications. A standard tool
provided with most Java distributions, called
<code>jconsole</code>, is a GUI-based application that can connect to
a JVM and display the information JMX is providing, and also control
the application in application-defined ways.</para>
<para>JMX information is provided by a hierarchy of JMX Beans. More
background and information on JMX and the jconsole tool are available on the web.</para>
<para>This section will first describe the basic JMX Beans, and then | |
later describe a UIMA AS monitor tool that can sample the values of these beans at | |
a specified interval and write the results to the UIMA log in various | |
formats.</para> | |
<section id="ugr.async.mt.jmx_monitoring"> | |
<title>JMX Information from UIMA AS</title> | |
<para>JMX information is provided by every UIMA AS service or client as it runs.
Each item provided is either an instantaneous measurement
(e.g., the number of items in a queue) or an accumulating measurement
(e.g., the number of CASes processed). Accumulating measures
can be reset to 0 using standard JMX mechanisms.</para>
<para> | |
JMX information is provided on a JVM basis; a JVM can be hosting 0 or more | |
UIMA AS Services and/or clients. A UIMA AS Service is defined as a component | |
that connects to a queue | |
and accepts CASes to process. A UIMA AS Client, in contrast, sends CASes to | |
be processed; it can be a top level client, or | |
a UIMA AS Service having one or more AS Aggregate delegates, to which it is | |
sending CASes to be processed. | |
</para> | |
<para> | |
UIMA AS Services send | |
some of their measurements back to the UIMA AS Clients that sent them CASes; those | |
clients incorporate these measurements into aggregate statistics that they provide. | |
This allows accumulating information among components deployed over many nodes | |
interconnected on a network. | |
</para> | |
<para> | |
Some JMX measurement items are constant, and document various settings, descriptors, | |
names, etc., in use by the (one or more) UIMA AS services and/or | |
clients running on this JVM.</para> | |
<para>Some time measurements are associated with running some process. These,
where possible, are CPU times, as measured by the thread or threads running the process, using the
<code>ThreadMXBean</code> class. Some JVMs do not support thread-based CPU time, however. In that
case, wall-clock time is used instead.</para>
<para>
If the process is multi-threaded, and the CPU has multiple cores,
you can get time measurements that exceed the wall-clock interval, because the process consumes
CPU time on multiple threads at once.</para>
<para>Timing information not associated with running code, such as idle time, is measured as wall-clock time.</para> | |
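<para>The choice between thread CPU time and wall-clock time described above can be sketched
with the standard <code>java.lang.management</code> API. This is a minimal illustration, not the UIMA AS
implementation; the class <code>ThreadTimeSketch</code> and its method names are hypothetical.</para>
<programlisting language="java"><![CDATA[
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch; not part of UIMA AS.
public class ThreadTimeSketch {

    /** True when the JVM can report CPU time for the current thread and the feature is enabled. */
    static boolean cpuTimeAvailable(ThreadMXBean bean) {
        return bean.isCurrentThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled();
    }

    /** Reads a timestamp in nanoseconds: thread CPU time when available, otherwise wall-clock time. */
    static long readNanos(ThreadMXBean bean) {
        return cpuTimeAvailable(bean) ? bean.getCurrentThreadCpuTime() : System.nanoTime();
    }

    /** Runs the task on the calling thread and returns the time it consumed, in nanoseconds. */
    static long measureNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long start = readNanos(bean);
        task.run();
        return readNanos(bean) - start;
    }

    public static void main(String[] args) {
        long nanos = measureNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            // Print the result so the loop has an observable effect
            System.out.println("sum = " + sum);
        });
        System.out.println("time consumed (ms) = " + nanos / 1_000_000.0);
    }
}
]]></programlisting>
<para>Because CPU time is read per thread, summing the values from several threads that ran concurrently
can give a total larger than the elapsed wall-clock interval.</para>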
<para>The following sections describe the JMX Beans implemented by UIMA AS. The | |
Notes in the tables include the following flags: | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">inst/acc/const</emphasis> - instantaneous, accumulating, or constant measurement</para> | |
</listitem> | |
<listitem> | |
<para><emphasis role="bold">sent</emphasis> - sent up to the invoking client with returning CAS</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
<section id="ugr.async.mt.jmx_monitoring.service"> | |
<title>UIMA AS Services JMX measures</title> | |
<para>The next 4 tables detail the JMX measures provided by UIMA AS services.</para> | |
<section id="ugr.async.mt.jmx_monitoring.constant.service"> | |
<title>Service information</title> | |
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>state</entry> | |
<entry>The state of the service (Running, Initializing, Disabled, Stopping, Failed)</entry> | |
<entry>string</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>input queueName</entry> | |
<entry>The name of the input queue</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>reply queueName</entry> | |
<entry>The internally generated name of the reply queue</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>broker URL</entry> | |
<entry>The URL of the JMS queue broker</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>deployment descriptor</entry> | |
<entry>The path to the deployment descriptor for this service</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>is CAS Multiplier</entry> | |
<entry>is this Service a CAS Multiplier</entry> | |
<entry>boolean</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>is top level</entry> | |
<entry>is this Service a top level service, meaning that it connects to | |
an input queue on a queue broker</entry> | |
<entry>boolean</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>service key</entry> | |
<entry>The key name used in the associated Analysis Engine aggregate that specifies | |
this as a delegate</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>is Aggregate</entry> | |
<entry>is this service an AS Aggregate (i.e., has delegates and | |
is marked async="true")</entry> | |
<entry>boolean</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>analysisEngine instance count</entry> | |
<entry>The number of replications of the AS Primitive</entry> | |
<entry>count</entry> | |
<entry>const</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.async.mt.jmx_monitoring.service.performance"> | |
<title>Service Performance Measurements</title> | |
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>number of CASes processed</entry> | |
<entry>The number of CASes processed by a component</entry> | |
<entry>count - CASes</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>cas deserialization time</entry> | |
<entry>The thread time spent deserializing CASes (receiving, either from client, or replies from delegates)</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>cas serialization time</entry> | |
<entry>The thread time spent serializing CASes (sending, either to delegates or back to client)</entry> | |
<entry>milli seconds</entry>
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>analysis time</entry> | |
<entry>The thread time spent in AS Primitive analytics</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>idle time</entry> | |
<entry>The wall clock time a service has been idle. The measure starts
after a reply is sent and runs until the next request is received, and excludes
serialization/deserialization times.</entry>
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>cas pool wait time</entry> | |
<entry>The time spent waiting for a CAS to become available in the CAS Pool</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>shadow cas pool wait time</entry> | |
<entry>A shadow cas pool is established for services which are Cas Multipliers. | |
This is the time spent waiting for a CAS to become available in the Shadow CAS Pool.</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>time spent in CM getNext</entry> | |
<entry>The time spent inside Cas Multipliers, getting another CAS.
Whether this time includes the time
spent waiting for a CAS to become available in the CAS Pool is not specified.</entry>
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>process thread count</entry> | |
<entry>The number of threads available to process requests</entry> | |
<entry>count</entry> | |
<entry>inst</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.async.mt.jmx_monitoring.service.internal.queues"> | |
<title>Co-located Service Queues</title> | |
<para>Co-located services use light-weight, internal (not JMS) queues.
These provide measures similar to those used with JMS queues,
for both the input queues and the reply (output) queues:
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>consumer count</entry> | |
<entry>The number of threads configured to read the queue</entry> | |
<entry>count</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>dequeue count</entry> | |
<entry>The number of CASes that have been read from this queue</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>queue size</entry> | |
<entry>The number of CASes in the queue</entry> | |
<entry>count</entry> | |
<entry>inst</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</para> | |
</section> | |
<section id="ugr.async.mt.jmx_monitoring.service.error"> | |
<title>Service Error Measurements</title> | |
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>process Errors</entry> | |
<entry>The number of process errors</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>getMetadata Errors</entry> | |
<entry>The number of getMetadata errors</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>cpc Errors</entry> | |
<entry>The number of Collection Process Complete (cpc) errors</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
</section> | |
<section id="ugr.async.mt.jmx_monitoring.client"> | |
<title>Application Client information</title> | |
<para>This section describes monitoring
information provided by the UIMA AS Client APIs.
Any code that uses the <link linkend="ugr.ref.async.api.organization">UIMA AS Client APIs</link>,
such as the example application
client <code>RunRemoteAsyncAE</code>, will have a set of these
JMX measures. Currently no additional
tooling (beyond standard tools like <code>jconsole</code>) is provided to
view these.
</para>
<section id="ugr.async.mt.jmx_monitoring.client.measures"> | |
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>application Name</entry> | |
<entry>A user-supplied string identifying the application</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>service queue name</entry> | |
<entry>The name of the service queue this client connects to</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>serialization method</entry> | |
<entry>either xmi or binary. This is the serialization the client will use to send | |
CASes to the service, and also tells the service which serialization to use | |
in sending the CASes back.</entry> | |
<entry>string</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>cas pool size</entry> | |
<entry>This client's cas pool size, limiting the number of simultaneous outstanding requests in process</entry> | |
<entry>count</entry> | |
<entry>const</entry> | |
</row> | |
<row> | |
<entry>total number of CASes processed</entry> | |
<entry>count of the total number of CASes sent from this client. Note: in the case | |
where the service is a Cas Multiplier, the "child" CASes are not included in this count.</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>total time to process</entry> | |
<entry>total thread time spent in processing all CASes, including time in remote delegates</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average process time</entry> | |
<entry>total time to process / total number of CASes processed</entry>
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max process time</entry> | |
<entry>maximum thread time spent in processing a CAS, including time in remote delegates</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total serialization time</entry> | |
<entry>total thread time spent in serializing, both to delegates | |
(and recursively, to their delegates) and replies back to senders</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average serialization time</entry> | |
<entry>average thread time spent in serializing a CAS, both to delegates | |
(and recursively, to their delegates) and replies back to senders</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max serialization time</entry> | |
<entry>maximum thread time spent in serializing a CAS, both to delegates | |
(and recursively, to their delegates) and replies back to senders</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total deserialization time</entry> | |
<entry>total thread time spent in deserializing, both replies from delegates and CASes from upper | |
level components being sent to lower level ones.</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average deserialization time</entry> | |
<entry>average thread time spent in deserializing, both replies from delegates and CASes from upper | |
level components being sent to lower level ones.</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max deserialization time</entry> | |
<entry>maximum thread time spent in deserializing, both replies from delegates and CASes from upper | |
level components being sent to lower level ones.</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total idle time</entry> | |
<entry>total wall clock time a top-level service thread has been idle since the thread was last used. | |
If there is more than one service thread, this number is the sum.</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average idle time</entry> | |
<entry>average wall clock time all top-level service threads have been idle since they were last used</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max idle time</entry> | |
<entry>maximum wall clock time a top-level service thread has been idle since the thread was last used</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total time waiting for reply</entry> | |
<entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, until that CAS | |
is returned. Any generated CASes from Cas Multipliers are not counted in this measurement.</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average time waiting for reply</entry> | |
<entry>average wall clock time from the time a CAS is sent until the reply is received</entry>
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max time waiting for reply</entry> | |
<entry>maximum wall clock time from the time a CAS is sent until the reply is received</entry>
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total response latency time</entry> | |
<entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, including | |
the serialization and deserialization times at the client, until that CAS | |
is returned. Any generated CASes from Cas Multipliers are not counted in this measurement.</entry> | |
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average response latency time</entry> | |
<entry>average wall clock time, measured from the time a CAS is sent to the top-level queue, including | |
the serialization and deserialization times at the client, until that CAS | |
is returned.</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max response latency time</entry> | |
<entry>maximum wall clock time, measured from the time a CAS is sent to the top-level queue, including | |
the serialization and deserialization times at the client, until that CAS | |
is returned.</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total time waiting for CAS</entry> | |
<entry>total wall-clock time spent waiting for a
free CAS to be available in the client's CAS pool, before
sending the CAS to the input queue for the top level service.</entry>
<entry>milli seconds</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>average time waiting for CAS</entry> | |
<entry>average wall-clock time spent waiting for a | |
free CAS to be available in the client's CAS pool</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>max time waiting for CAS</entry> | |
<entry>maximum wall-clock time spent waiting for a | |
free CAS to be available in the client's CAS pool</entry> | |
<entry>milli seconds</entry> | |
<entry>inst</entry> | |
</row> | |
<row> | |
<entry>total number of CASes requested</entry> | |
<entry>total number of CASes fetched from the CAS pool</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.async.mt.jmx_monitoring.client.error"> | |
<title>Client Error Measurements</title> | |
<informaltable frame="all"> | |
<tgroup cols="4" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<colspec colname="c4" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry> | |
<entry align="center">Notes</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>getMeta Timeout Error Count</entry> | |
<entry>number of times a getMeta timed out</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>getMeta Error Count</entry> | |
<entry>number of times a getMeta request returned with an error</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>process Timeout Error Count</entry> | |
<entry>number of times a process call timed out</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
<row> | |
<entry>process Error Count</entry> | |
<entry>number of times a process call returned with an error</entry> | |
<entry>count</entry> | |
<entry>acc</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
</section> | |
</section> | |
<section id="ugr.async.mt.jmx_sampling"> | |
<title>Logging Sampled JMX information at intervals</title> | |
<para> | |
A common tuning procedure is to run a deployment for a fairly long time with a | |
typical load, and to see what and where hot spots develop. During this process, | |
it is sometimes useful to convert accumulating measurements into averages, perhaps | |
averages per CAS processed. | |
</para> | |
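<para>Converting accumulating measurements into per-interval averages amounts to subtracting two
successive samples and dividing by the number of CASes processed between them. The following is a
minimal sketch of that arithmetic; the class <code>IntervalAverageSketch</code>, the
<code>Sample</code> holder, and the sample values are hypothetical and are not part of UIMA AS.</para>
<programlisting language="java"><![CDATA[
// Hypothetical sketch; not part of UIMA AS.
public class IntervalAverageSketch {

    /** One reading of two accumulating measures. */
    static final class Sample {
        final long casesProcessed;   // accumulating count of CASes processed
        final long analysisMillis;   // accumulating analysis time, in milliseconds

        Sample(long casesProcessed, long analysisMillis) {
            this.casesProcessed = casesProcessed;
            this.analysisMillis = analysisMillis;
        }
    }

    /** CASes processed between two samples. */
    static long casesInInterval(Sample earlier, Sample later) {
        return later.casesProcessed - earlier.casesProcessed;
    }

    /** Average analysis time per CAS between two samples; 0 if no CAS completed in the interval. */
    static double avgAnalysisMillisPerCas(Sample earlier, Sample later) {
        long cases = casesInInterval(earlier, later);
        if (cases <= 0) {
            // Nothing processed, or the accumulating measures were reset between samples
            return 0.0;
        }
        return (double) (later.analysisMillis - earlier.analysisMillis) / cases;
    }

    public static void main(String[] args) {
        Sample t0 = new Sample(100, 4_000);
        Sample t1 = new Sample(150, 6_500);
        System.out.println(casesInInterval(t0, t1));          // prints 50
        System.out.println(avgAnalysisMillisPerCas(t0, t1));  // prints 50.0
    }
}
]]></programlisting>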
<para> | |
UIMA AS includes a monitor component, <code>org.apache.uima.aae.jmx.monitor.JmxMonitor</code>,
to sample JMX measures at specified intervals,
compute various averages, and write the results into the UIMA log (or to the console
if no log is configured). The monitor program can be automatically enabled for any deployed service
by specifying <code>-D</code> parameters on the JVM command
line that launches the service, or it can be run stand-alone. When run stand-alone, you provide an
argument specifying the JVM it should connect to for the JMX information. It only connects
to one JVM per run; typically, you would connect it to the top-level service.
</para> | |
<para> | |
The monitor outputs information for that service and its immediate delegates (local or remote); however, it | |
includes information from the complete recursive chain of delegates when computing its measures. You can | |
get detailed monitoring for sub-services by starting or attaching a monitor to those sub-services. | |
</para> | |
<para> | |
ActiveMQ uses Queue Brokers to manage the JMS queues used by UIMA AS. These brokers have JMX information | |
that is useful in tuning applications. The Monitor program identifies the Queue Broker being used by the | |
service, and connects to it and incorporates information about queue lengths (both the input queue | |
and the reply queue) into its measurements. | |
</para> | |
<section id="ugr.async.mt.jmx_sampling.configuring"> | |
<title>Configuring JVM to run the monitor</title> | |
<para>Specify the following JVM System Variable parameters to configure a UIMA AS Client or Service to enable | |
sampling and logging of JMX measures: | |
<itemizedlist> | |
<listitem><para><code>-Duima.jmx.monitor.interval=1000</code> - (optional; default is 1000) specifies the | |
sampling interval in milliseconds</para></listitem> | |
<listitem><para><code>-Duima.jmx.monitor.formatter=&lt;CustomFormatterClassName&gt;</code> - specifies the class name of the formatter used to write the results</para></listitem>
<listitem><para><code>-Dcom.sun.management.jmxremote</code> - enable JMX</para></listitem> | |
<listitem><para><code>-Dcom.sun.management.jmxremote.port=8009</code></para></listitem> | |
<listitem><para><code>-Dcom.sun.management.jmxremote.authenticate=false</code></para></listitem> | |
<listitem><para><code>-Dcom.sun.management.jmxremote.ssl=false</code></para></listitem> | |
</itemizedlist> | |
This configures JMX to run on port 8009 with no authentication, and sets the sampling interval to 1 second, | |
and specifies a custom formatter class name. | |
</para> | |
<para>There are two <code>formatter-classes</code> provided with UIMA AS: | |
<itemizedlist> | |
<listitem><para><code>org.apache.uima.aae.jmx.monitor.BasicUimaJmxMonitorListener</code> -
this is a multi-line formatter that formats for human-readable output</para></listitem>
<listitem><para><code>org.apache.uima.aae.jmx.monitor.SingleLineUimaJmxMonitorListener</code> -
this is a formatter that produces one line per interval, suitable for importing into
a spreadsheet program.</para></listitem>
</itemizedlist> | |
Both of these log to the UIMA log at the INFO log level. | |
</para> | |
<para>You can also write your own formatter. The monitor provides an API to plug in a custom formatter
for displaying service metrics. A custom formatter must implement the <code>JmxMonitorListener</code> interface.
See the method <code>startMonitor</code> in the class <code>UIMA_Service</code> for an
example of how custom JMX Listeners are plugged into the monitor.
</para> | |
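<para>Once a JVM has been started with the JMX remote settings listed in this section, its MBeans can
also be read programmatically with the standard <code>javax.management</code> API. The sketch below
lists the MBeans matching a name pattern and prints their readable attributes. It is an illustration
only: the class <code>JmxDumpSketch</code> is hypothetical, the host and port are supplied by the caller,
and no assumption is made about the MBean names UIMA AS registers, so the default pattern matches every MBean.</para>
<programlisting language="java"><![CDATA[
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch; not part of UIMA AS.
public class JmxDumpSketch {

    /** Returns the names of all MBeans matching the pattern, sorted for stable output. */
    static Set<String> beanNames(MBeanServerConnection conn, String pattern) throws Exception {
        Set<String> names = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    /** Prints every readable attribute of every matching MBean. */
    static void dump(MBeanServerConnection conn, String pattern) throws Exception {
        for (String name : beanNames(conn, pattern)) {
            ObjectName objectName = new ObjectName(name);
            System.out.println(name);
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(objectName).getAttributes()) {
                if (!attr.isReadable()) {
                    continue;
                }
                try {
                    System.out.println("  " + attr.getName() + " = "
                            + conn.getAttribute(objectName, attr.getName()));
                } catch (Exception e) {
                    // Some attributes cannot be read on every JVM; report and move on
                    System.out.println("  " + attr.getName() + " = (unavailable)");
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: MBean name pattern; "*:*" matches every registered MBean
        String pattern = args.length > 0 ? args[0] : "*:*";
        if (args.length > 1) {
            // args[1]: host:port of the remote JVM, for example localhost:8009
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + args[1] + "/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                dump(connector.getMBeanServerConnection(), pattern);
            }
        } else {
            // No remote address given: inspect this JVM's own platform MBean server
            dump(ManagementFactory.getPlatformMBeanServer(), pattern);
        }
    }
}
]]></programlisting>
<para>Run without arguments, the sketch dumps the MBeans of its own JVM, which is a quick way to check it
before pointing it at a remote service.</para>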
</section> | |
<section id="ugr.async.mt.jmx_sampling.standalone"> | |
<title>Running the Monitor program standalone</title> | |
<para>The monitor program can be started separately and pointed to a running UIMA AS Client or Service. | |
To start the program, invoke Java with the following classpath and parameters: | |
<itemizedlist> | |
<listitem> | |
<para>ClassPath:</para> | |
<itemizedlist> | |
<listitem><para>%UIMA_HOME%/lib/uimaj-as-activemq.jar</para></listitem> | |
<listitem><para>%UIMA_HOME%/lib/uimaj-as-core.jar</para></listitem> | |
<listitem><para>%UIMA_HOME%/lib/uima-core.jar</para></listitem> | |
<listitem><para>%UIMA_HOME%/apache-activemq-4.1.1/apache-activemq-4.1.1.jar</para></listitem> | |
</itemizedlist> | |
</listitem> | |
<listitem> | |
<para>Parameters:</para> | |
<itemizedlist> | |
<listitem><para><code>-Djava.util.logging.config.file=%UIMA_HOME%/config/MonitorLogger.properties</code>
- specifies the logging configuration file, which controls where the information is written</para></listitem>
<listitem><para><code>org.apache.uima.aae.jmx.monitor.JmxMonitor</code> - | |
the class whose main method is invoked</para></listitem> | |
<listitem><para><code>uri</code> - the URI of the jmx instance to monitor.</para></listitem> | |
<listitem><para><code>interval</code> - the (optional) | |
sampling interval, in milliseconds (default = 1000)</para></listitem> | |
</itemizedlist> | |
</listitem> | |
</itemizedlist> | |
</para> | |
<para>When run in this manner, it is not (currently) possible to specify the | |
log message formatting class; the multi-line output format is always used.</para> | |
</section> | |
<section id="ugr.async.mt.jmx_sampling.output"> | |
<title>Monitoring output</title> | |
<para>The monitoring program combines information from the JMX measures, including the associated | |
Queue Broker, sampling accumulating measurements at the specified sampling interval, and produces | |
the following outputs: | |
<informaltable frame="all"> | |
<tgroup cols="3" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="5*"/> | |
<colspec colname="c3" colwidth="1*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
<entry align="center">Units</entry>
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>Input queue depth</entry> | |
<entry>number of CASes waiting to be processed by a service</entry> | |
<entry>count</entry> | |
</row> | |
<row> | |
<entry>Reply queue depth</entry> | |
<entry>number of CASes returned to the client but not yet picked up by the client</entry> | |
<entry>count</entry> | |
</row> | |
<row> | |
<entry>CASes processed in interval</entry> | |
<entry>Number of CASes processed in this sampling interval</entry> | |
<entry>count</entry> | |
</row> | |
<row> | |
<entry>Idle time in interval</entry> | |
<entry>The total time this service has been idle during this interval</entry> | |
<entry>milli seconds</entry> | |
</row> | |
<row> | |
<entry>Analysis time in interval</entry> | |
<entry>The sum of the times spent in analysis by the service during this interval, | |
including analysis time spent in delegates, recursively</entry> | |
<entry>milli seconds</entry> | |
</row> | |
<row> | |
<entry>Cas Pool free Cas Count</entry> | |
<entry>Number of available CASes in the Cas Pool at the end of the interval</entry> | |
<entry>count</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</para> | |
<para>In addition to the performance metrics, the monitor also provides basic service information:
<itemizedlist> | |
<listitem> | |
<para>Service name</para> | |
</listitem> | |
<listitem> | |
<para>Is service top level</para> | |
</listitem> | |
<listitem> | |
<para>Is service remote</para> | |
</listitem> | |
<listitem> | |
<para>Is service a cas multiplier</para> | |
</listitem> | |
<listitem> | |
<para>Number of processing threads</para> | |
</listitem> | |
<listitem> | |
<para>Service uptime (milliseconds)</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
</section> | |
</section> | |
</section> | |
<section id="ugr.async.mt.tuning"> | |
<title>Tuning</title> | |
<section id="ugr.async.mt.tuning.approach"> | |
<title>Tuning procedure</title> | |
<para>This section is a cookbook of best practices for tuning a UIMA AS deployment. The summary information | |
provided by the Monitor program is used to guide the tuning.</para> | |
<para>The main metric for detecting an overloaded service is the input queue depth. If it is high or growing, the service
cannot keep up with the load: more CASes are arriving at the queue than the service can process.
Consider increasing the number of instances of the service within the JVM (if it runs on a multi-core machine with
spare capacity), or deploying additional instances of the service, for example on other machines.</para>
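<para>As an illustration of the first option, replication within a JVM is requested in the service's deployment
descriptor through the <code>scaleout</code> element. The fragment below is a minimal sketch, not a complete
descriptor: the delegate key <code>SlowAnnotator</code> and the instance count are hypothetical values chosen
for the example.</para>
<programlisting><![CDATA[<analysisEngine async="true">
  <delegates>
    <!-- hypothetical delegate key; 4 is an illustrative count, not a recommendation -->
    <analysisEngine key="SlowAnnotator">
      <scaleout numberOfInstances="4"/>
    </analysisEngine>
  </delegates>
</analysisEngine>]]></programlisting>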
<para>The main metric for detecting an idle service is the idle time. If it is high, the service may not be
receiving enough CASes. This can be caused by a bottleneck in the service's client; supporting evidence for this
is a high reply queue depth for the client, which indicates that the client is overloaded.
Ideally, the idle time should be zero, which means that the service continually receives enough CASes
to process.</para>
<para>A Cas Pool free Cas Count of 0 can point to a bottleneck in a service's client; supporting
evidence for this can be a high idle time. In this case, the service does not have enough CASes in its pool and is
forced to wait. Remember that a CAS is not returned to the service's CAS pool until the client signals that it can be.
A typical cause is a slow client (look for evidence such as a high reply queue depth). Consider
increasing the size of the service's CAS pool, and check the client's metrics to determine why it is slow.</para>
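<para>A sketch of enlarging the service's CAS pool in the deployment descriptor follows. The value shown is
illustrative only; each additional CAS in the pool occupies heap in the service's JVM, so it is reasonable to
increase the size in small steps while watching the Cas Pool free Cas Count.</para>
<programlisting><![CDATA[<deployment protocol="jms" provider="activemq">
  <!-- 8 is an illustrative value, not a recommendation -->
  <casPool numberOfCASes="8"/>
  <!-- service element omitted from this sketch -->
</deployment>]]></programlisting>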
</section> | |
<section id="ugr.async.mt.tuning.settings"> | |
<title>Tuning Settings</title> | |
<para>This section has a list of the tuning parameters and a description of what they do and how they interact.</para> | |
<informaltable frame="all"> | |
<tgroup cols="2" colsep="1" rowsep="1"> | |
<colspec colname="c1" colwidth="2*"/> | |
<colspec colname="c2" colwidth="4*"/> | |
<thead> | |
<row> | |
<entry align="center">Name</entry> | |
<entry align="center">Description</entry> | |
</row> | |
</thead> | |
<tbody> | |
<row> | |
<entry>number of services on different machines started</entry> | |
<entry>You can adjust the number of machines assigned to a particular service, | |
even dynamically, by just starting / stopping additional servers that specify | |
the same input queue.</entry> | |
</row> | |
<row> | |
<entry>number of instances of a service</entry> | |
<entry>This is similar to the number of services on different machines started, above, | |
but specifies replication of an AS Primitive within one JVM. This is useful for making | |
use of multi-core machines sharing a common memory - large tables that might be | |
part of the analysis algorithm can be shared by all instances.</entry> | |
</row> | |
<row> | |
<entry>CAS pool size</entry> | |
<entry>This size limits the number of CASes being processed asynchronously.</entry> | |
</row> | |
<row> | |
<entry>casMultiplier poolSize</entry> | |
<entry>This size limits the number of CASes generated by a CAS Multiplier that are being processed asynchronously.</entry> | |
</row> | |
<row> | |
<entry>Service input queue prefetch</entry> | |
<entry>If set to a value n greater than 0, allows up to n CASes to be pulled into one service provider at a time.
This can increase throughput but can hurt latency, since one service instance may have several CASes pulled into it
and queued up while another instance of the service is "starved" and sits idle.</entry>
</row> | |
<row> | |
<entry>Specifying async="true"/"false" on an aggregate</entry> | |
<entry>The default is false, because there is less overhead (no queues are set up, etc.). Setting this to | |
"true" allows multiple CASes to flow simultaneously in the aggregate.</entry> | |
</row> | |
<row> | |
<entry>remoteReplyQueueScaleout</entry> | |
<entry>This parameter indicates the number of threads that will be deployed to read from the remote reply queue.
Set this to a value greater than 1 if the deserialization time of replies is a bottleneck.</entry>
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
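<para>The fragment below sketches where several of these settings appear in a deployment descriptor. It is an
illustration under stated assumptions rather than a complete or recommended configuration: the queue names,
broker URL, descriptor location, delegate key, and all numeric values are hypothetical.</para>
<programlisting><![CDATA[<service>
  <!-- Service input queue prefetch; endpoint and brokerURL are placeholders -->
  <inputQueue endpoint="MyServiceQueue" brokerURL="tcp://localhost:61616" prefetch="1"/>
  <topDescriptor>
    <import location="MyAggregate.xml"/>
  </topDescriptor>
  <!-- async="true" lets multiple CASes flow simultaneously in the aggregate -->
  <analysisEngine async="true">
    <!-- applies only when this engine is a CAS Multiplier -->
    <casMultiplier poolSize="5"/>
    <delegates>
      <!-- two threads read replies coming back from this remote delegate -->
      <remoteAnalysisEngine key="RemoteAnnotator" remoteReplyQueueScaleout="2">
        <inputQueue endpoint="RemoteAnnotatorQueue" brokerURL="tcp://localhost:61616"/>
      </remoteAnalysisEngine>
    </delegates>
  </analysisEngine>
</service>]]></programlisting>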
</section> | |
</section> | |
<section id="ugr.async.mt.limits"> | |
<title>Limitations</title> | |
<para>The current (2.3.0) implementation has the following limitations: | |
<itemizedlist> | |
<listitem><para>Monitoring program</para> | |
<itemizedlist> | |
<listitem><para>The monitoring program reads the JMS Queue Broker URL
from the configuration information provided by JMX for the UIMA AS Service
being monitored. It uses this information to connect to JMX on that broker, but
assumes that JMX is set up on the default port (1099). This port is
currently hardcoded into the Monitor program, so be aware of this if you
change the JMX port number on the JMS Queue Broker (a parameter in
ActiveMQ's configuration for the broker).
</para></listitem> | |
<listitem><para>When the Monitor program is run as a stand-alone program, | |
it is not (currently) possible to specify alternatives for the | |
log message formatting class; the multi-line output format is always used.</para></listitem> | |
</itemizedlist> | |
</listitem> | |
</itemizedlist> | |
</para> | |
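<para>For reference, the broker-side JMX port mentioned above is typically set through the
<code>managementContext</code> element in ActiveMQ's XML configuration. The fragment below is a sketch based on
standard ActiveMQ configuration, with a placeholder broker name and all other broker settings omitted; leaving
<code>connectorPort</code> at 1099 keeps the broker's JMX connector on the port the Monitor program assumes.</para>
<programlisting><![CDATA[<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost" useJmx="true">
  <managementContext>
    <!-- 1099 matches the port the Monitor program assumes -->
    <managementContext createConnector="true" connectorPort="1099"/>
  </managementContext>
</broker>]]></programlisting>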
</section> | |
</chapter> |