Apache UIMA Asynchronous Scaleout (UIMA-AS) Version 2.3.0-incubating README | |
--------------------------------------------------------------------------- | |
0. Changes in the 2.3.0 release | |
- UIMA-AS "graduated" from Sandbox, now a top-level (within UIMA) add-on | |
- UIMA-AS repackaged - no longer includes the base UIMA. You must download a comparable version of base UIMA, separately | |
- Many improvements in error handling and recovery | |
- General hardening / bug fixes | |
- Many improvements in JMX reporting | |
- Optimizations and tunings for larger deployments | |
- Support for late binding of broker urls and ports | |
- Support for passing configuration information in URLs to ActiveMQ | |
- Delta Cas format support added, including Delta Binary serialization where the type systems are identical | |
- Multiple remote CAS multiplier support added | |
- Using Java queuing instead of embedded broker for local services (performance improvement) | |
- Support for high volume - multiple concurrent listener threads | |
- Initial support for Apache Camel and OSGI enablement added | |
- New launcher allows specifying directories containing Jars; all Jars found will be added to the classpath | |
1. Contents of Apache UIMA-AS binary distribution | |
The Apache UIMA-AS binary distribution includes | |
- Apache UIMA Asynchronous Scaleout extensions | |
- Saxon | |
- Apache ActiveMQ | |
- Spring Framework | |
It does not include the base UIMA binary distribution, which it depends on. | |
You should download and unzip that distribution, and then download and unzip the | |
UIMA-AS distribution over the base distribution.
UIMA-AS components include: | |
bin/startBroker.sh/bat: starts the ActiveMQ broker, which must be running before | |
UIMA AS services can be deployed. | |
bin/deployAsyncService.sh/bat: deploys an AnalysisEngine as a UIMA-AS | |
service. Takes one or more UIMA-AS Deployment Descriptors as arguments. | |
bin/runRemoteAsyncAE.sh/bat: Calls a UIMA-AS service. Takes arguments specifying the | |
location of the service, and an optional CollectionReader descriptor file used to | |
obtain the CASes to be processed by the service. | |
docs/pdf/uima_async_scaleout.pdf: UIMA-AS documentation, including the specification | |
for the deployment descriptor file syntax. | |
examples/deploy/as/... (Sample Deployment Descriptors) | |
Deploy_RoomNumberAnnotator.xml: Deploys Room Number Annotator Primitive AE | |
Deploy_MeetingDetectorTAE.xml: Deploys Meeting Detector Aggregate AE with all | |
delegates in the same JVM. | |
Deploy_MeetingDetectorTAE_Whiteboard.xml: Deploys Meeting Detector Aggregate AE | |
using the whiteboard Flow Controller. | |
Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml: Deploys Meeting Detector Aggregate AE | |
that uses remotely deployed RoomNumberAnnotator. | |
Deploy_MeetingDetectorTAE_3MeetingAnnotator.xml: Deploys Meeting Detector Aggregate AE | |
with three instances of the MeetingAnnotator component. | |
Deploy_MeetingDetectorTAE_Sync_3Instances.xml: Deploys 3 instances of the | |
Meeting Detector as a Synchronous Aggregate (meaning the delegate AEs do not each | |
get their own input queue). | |
Deploy_MeetingAnnotator.xml: Deploys C++ Meeting Annotator. Note: requires | |
installation of uimacpp SDK into $UIMA_HOME. | |
MeetingFinderAggregate.xml: Aggregate descriptor that uses the same components as the
CPE examples MeetingFinderCPE* in base UIMA. | |
Deploy_MeetingFinder.xml: Deploys MeetingFinderAggregate illustrating scalability and | |
error handling similar to the CPM examples; see Section 4 on migration below. | |
descriptors/as/... (Other Sample Descriptors for use with UIMA AS) | |
MeetingDetectorAsyncAE.xml: Specifier that can be used to call a UIMA AS | |
Service from an existing UIMA application; see Section 3.5 below.
2. Installation and Setup | |
2.1 Supported Platforms | |
UIMA AS requires Java 5 or later. It has been tested with Sun Java 5 on Windows XP and Linux.
Other platforms and Java (5+) implementations should work, but have not been significantly tested. | |
2.2. Environment Variables | |
After you have unpacked the UIMA AS distribution, you must set the following
environment variables (the same as for a normal Apache UIMA setup):
* Set JAVA_HOME to the directory of the JRE installation you want to use for UIMA.
* Set UIMA_HOME to the apache-uima-as directory of your unpacked Apache UIMA distribution.
* Append UIMA_HOME/bin to your PATH.
Note: The Mac OS X operating system has special procedures for setting up global environment | |
variables; see http://developer.apple.com/qa/qa2001/qa1067.html for how to do this. | |
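On Linux or another Unix-like system with a Bourne-style shell, the three settings above might look like the following sketch. Both directory paths are placeholders chosen for illustration, not defaults; substitute your own locations.

```shell
# Placeholder locations (assumptions); replace with your actual install paths.
JAVA_HOME="/opt/java/jre5"
UIMA_HOME="/opt/apache-uima-as"

# Append UIMA_HOME/bin so that startBroker.sh, deployAsyncService.sh and
# runRemoteAsyncAE.sh can be found without typing their full path.
PATH="$PATH:$UIMA_HOME/bin"

export JAVA_HOME UIMA_HOME PATH
```

Placing these lines in your shell startup file makes them available in every new terminal window.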
2.3 Running the Setup Script | |
You must run the script UIMA_HOME/bin/adjustExamplePaths.bat (or .sh). This updates | |
paths in the examples based on the actual UIMA_HOME directory path. | |
2.4 Setting up Eclipse | |
Eclipse users should install the UIMA Eclipse Plugins and UIMA Examples Project using the | |
procedure described in Chapter 3 of the Apache UIMA Overview and Setup guide, | |
which you can find online at http://incubator.apache.org/uima; click on Documentation -> | |
HTML Online Version -> Overview and Setup -> 3. Eclipse IDE setup for UIMA. | |
However, since UIMA AS requires Java 5, you must be sure to set up your uimaj-examples Eclipse | |
project to use a version 5 (or later) JRE, and you must set your compiler compliance level to 5.0. To do | |
this go to Window->Preferences and navigate to the Java->Compiler page. Remember to | |
run the base Eclipse using Java 5 (or later), as well. | |
3. Getting Started | |
3.1 Starting the ActiveMQ Broker | |
UIMA AS services require an available ActiveMQ broker on which to create/register
the service request queue. If no broker is available, start a new one, either on the machine
the services will run on or on another machine. To do so, first set the environment variable
ACTIVEMQ_BASE to a writable directory, or simply cd to a writable directory,
and run:
startBroker.sh/bat | |
The first time it is run, this script creates a new directory $ACTIVEMQ_BASE/amq (or ./amq)
and copies default configuration files there. The configuration files can then be
customized to modify broker behavior for subsequent startups. | |
Note: only one broker can be started at a time on the same machine with the same | |
configuration file, or on different machines from the same writable directory. | |
When the broker starts it will print a message such as: | |
INFO TransportServerThreadSupport - Listening for connections at: tcp://yourHostname:61616 | |
Note this URL since you will need it to run services and clients. | |
The tcp listening port must be exposed to any clients or services using the broker. | |
To connect to a broker running behind a firewall using HTTP tunneling, see section 3.6 below. | |
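As a rough sketch of the steps in this section on a Unix-like system, the following sets ACTIVEMQ_BASE to a writable directory and then starts the broker. The directory name uima-broker is a placeholder chosen for illustration, and the script is run only if it can be found on the PATH.

```shell
# Placeholder writable directory (an assumption); any writable location works.
ACTIVEMQ_BASE="${ACTIVEMQ_BASE:-${TMPDIR:-/tmp}/uima-broker}"
mkdir -p "$ACTIVEMQ_BASE"
export ACTIVEMQ_BASE

# Start the broker only if the script is on the PATH (see section 2.2).
if command -v startBroker.sh >/dev/null 2>&1; then
    startBroker.sh
else
    echo "startBroker.sh not found; append \$UIMA_HOME/bin to PATH first" >&2
fi
```

If ACTIVEMQ_BASE is already set in your environment, the sketch leaves it unchanged.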
3.2 Deploying an Analysis Engine as a UIMA AS Asynchronous Service | |
a. Create a Deployment Descriptor. | |
Examples can be found in the examples/deploy/as directory, | |
and the syntax is documented in docs/pdf/uima_async_scaleout.pdf. | |
Note: The deployment descriptor may contain a broker placeholder with the
syntax ${defaultBrokerURL}. The placeholder is replaced at runtime with an actual broker
URL taken from Java system properties. The brokerURL attribute of the <inputQueue ...>
element is optional; if it is absent, a default of tcp://localhost:61616 is used.
The examples assume the broker is listening on tcp://localhost:61616. | |
b. Run the command: | |
deployAsyncService.sh/cmd [testDD.xml] [-brokerURL url] | |
The argument to the command is the deployment descriptor you created in step (a). An optional argument | |
-brokerURL specifies a URL of the broker that the service will use to create connections to queues. This | |
argument takes effect only if your deployment descriptor does not explicitly name the broker URL in the | |
<inputQueue ...> xml element *or* the brokerURL attribute is set to a placeholder ${defaultBrokerURL}. | |
Omitting brokerURL or using a placeholder is a way to keep your deployment descriptors portable. You don't | |
need to edit your deployment descriptors when switching brokers. | |
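A minimal sketch of step (b) on a Unix-like system follows. The host name brokerhost is a placeholder, and testDD.xml is simply the descriptor name used in the command synopsis above; neither is shipped with the distribution. The command line is printed first, and executed only if the script is on the PATH.

```shell
# Placeholder values (assumptions); substitute your own broker and descriptor.
BROKER_URL="tcp://brokerhost:61616"
DESCRIPTOR="testDD.xml"

# Assemble the command line described above and show it.
DEPLOY_CMD="deployAsyncService.sh $DESCRIPTOR -brokerURL $BROKER_URL"
echo "$DEPLOY_CMD"

# Run it only when the script is actually available.
if command -v deployAsyncService.sh >/dev/null 2>&1; then
    deployAsyncService.sh "$DESCRIPTOR" -brokerURL "$BROKER_URL"
fi
```

Switching to another broker then only requires changing BROKER_URL, provided the descriptor omits brokerURL or uses the ${defaultBrokerURL} placeholder.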
Note: If you use import by name in your deployment descriptor, UIMA AS searches the CLASSPATH | |
as well as directories on UIMA_DATAPATH to resolve the import. | |
Note: The deployAsyncService.sh/cmd scripts launch the UimaBootstrap main program, which loads UIMA jars
dynamically from the UIMA_HOME/lib, UIMA_HOME/apache-activemq-4.1.1, UIMA_HOME/apache-activemq-4.1.1/lib,
and UIMA_HOME/apache-activemq-4.1.1/lib/optional directories. If you want to use a different
version of ActiveMQ, set the ACTIVEMQ_HOME environment variable to the location of
the ActiveMQ installation you intend to use. Also, if you want to deploy your own annotator that is
installed in a directory other than UIMA_HOME/lib, set the UIMA_CLASSPATH
environment variable to point to one or more classpath entries; each entry can be
a directory of Java class files, a Jar file, or a directory of Jar files (in which case all the
Jars are added in an arbitrary order). Separate multiple entries using File.pathSeparator.
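To illustrate the three kinds of UIMA_CLASSPATH entries, here is a sketch for a Unix-like system, where File.pathSeparator is ":" (on Windows it is ";"). All three paths are hypothetical and stand in for wherever your own annotator is installed.

```shell
# Hypothetical locations of your own annotator code (assumptions).
CLASSES_DIR="/home/me/myannotator/classes"   # directory of Java class files
SINGLE_JAR="/home/me/myannotator/dep.jar"    # a single Jar file
JAR_DIR="/home/me/myannotator/lib"           # directory of Jars; all are added

# Join the entries with ":" (File.pathSeparator on Unix-like systems).
UIMA_CLASSPATH="$CLASSES_DIR:$SINGLE_JAR:$JAR_DIR"
export UIMA_CLASSPATH
```

Because Jars found in a directory entry are added in an arbitrary order, list a Jar explicitly if its position on the classpath matters.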
Note: Both UIMA AS client and UIMA AS service by default add a time-to-live (TTL) to | |
every request message. This enables expiration of messages that are not consumed. | |
Currently, the UIMA AS client multiplies the Process Timeout value by 10 and uses | |
this as TTL. The UIMA AS service sets TTL to the Timeout value multiplied by the number | |
of outstanding requests. To disable TTL, set the system property NoTTL (-DNoTTL). A convenient
way to do this is to add -DNoTTL to the environment variable UIMA_JVM_OPTS
before running deployAsyncService and/or runRemoteAsyncAE.
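The client-side rule above can be worked through with a small sketch. The timeout value 30000 is a made-up example, not a default, and is in whatever unit your Process Timeout is configured in. The last lines show one way to add -DNoTTL to UIMA_JVM_OPTS without discarding options that are already set.

```shell
# Hypothetical Process Timeout value (an assumption for illustration).
PROCESS_TIMEOUT=30000

# Client-side TTL as described above: the Process Timeout multiplied by 10.
CLIENT_TTL=$((PROCESS_TIMEOUT * 10))
echo "client TTL: $CLIENT_TTL"

# To disable TTL instead, append -DNoTTL to any existing UIMA_JVM_OPTS.
UIMA_JVM_OPTS="${UIMA_JVM_OPTS:+$UIMA_JVM_OPTS }-DNoTTL"
export UIMA_JVM_OPTS
```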
3.3 Calling a UIMA AS Asynchronous Service | |
To test a remote UIMA service you can use the script: | |
runRemoteAsyncAE.sh/cmd brokerUrl endpoint | |
This connects to a remote AE at the specified brokerUrl and endpoint (which must match
the inputQueue endpoint in the remote AE service's deployment descriptor).
Some of the optional arguments to runRemoteAsyncAE are:
-c Specifies a CollectionReader descriptor. The client will obtain CASes from the | |
CollectionReader and send them to the service for processing. If this option | |
is omitted, one empty CAS will be sent to the service (useful for services | |
containing a CAS Multiplier acting as a collection reader). | |
-d Specifies a deployment descriptor. The specified service will be deployed before processing | |
begins, and the service will be undeployed after processing completes. | |
Multiple -d entries can be given. | |
-o Specifies an Output Directory. All CASes received by the client's | |
CallbackListener will be serialized to XMI in the specified OutputDir. | |
If omitted, no XMI files will be output. | |
The full set of arguments is displayed if you run the command with no arguments.
3.4 Quick Test of an async service | |
Start two terminal windows, each with an environment set up as described in section 2.2.
* In the first terminal window start the broker (as described in section 3.1), | |
by running the commands: | |
cd some-writable-directory | |
startBroker.sh/bat | |
* In the second terminal window, deploy a sample service and send it some CASes: | |
cd $UIMA_HOME/examples/deploy/as | |
runRemoteAsyncAE.sh/cmd tcp://localhost:61616 MeetingDetectorTaeQueue \ | |
-d Deploy_MeetingDetectorTAE.xml \ | |
-c $UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml | |
If you get an UnsupportedClassVersionError, Java 5 is probably not being used. | |
If the driver fails to find the input data, adjustExamplePaths was probably not run. | |
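As a variation on the quick test, the following sketch adds the -o option from section 3.3 so that the returned CASes are written out as XMI. The output directory name is a placeholder, and creating it beforehand is a precaution rather than a documented requirement. The command is skipped if the script is not on the PATH.

```shell
# Hypothetical output directory (an assumption); XMI files would be written here.
OUT_DIR="${TMPDIR:-/tmp}/uima-as-xmi"
mkdir -p "$OUT_DIR"

# Same quick test as above, with -o added. Runs in a subshell so that the
# current directory of your terminal is left unchanged.
if command -v runRemoteAsyncAE.sh >/dev/null 2>&1; then
    (
        cd "$UIMA_HOME/examples/deploy/as" &&
        runRemoteAsyncAE.sh tcp://localhost:61616 MeetingDetectorTaeQueue \
            -d Deploy_MeetingDetectorTAE.xml \
            -c "$UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml" \
            -o "$OUT_DIR"
    )
fi
```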
3.5 Calling a UIMA AS Asynchronous Service from an Existing UIMA Application | |
You can also call a UIMA AS Service from the DocumentAnalyzer or any other UIMA | |
application using a new JMS client Service Descriptor | |
(see section 1.7 in the UIMA Asynchronous Scaleout documentation). | |
However, note that this is a synchronous interface, that is, it will process only | |
one CAS at a time, so it will not take advantage of the scalability that UIMA AS | |
provides. To process more than one CAS at a time, you can use the Asynchronous | |
UIMA AS Client as described in section 3.3, or write your own application using the | |
UIMA AS Client APIs. | |
An example JMS client service descriptor is provided in | |
examples/descriptors/as/MeetingDetectorAsyncAE.xml | |
The JMS service makes use of the customResourceSpecifier capability in Apache UIMA. | |
For more information on the customResourceSpecifier see the "Custom Resource Specifiers" | |
section in the Apache UIMA Reference manual. | |
3.6 Firewalls between clients and services | |
A service running behind a firewall can be accessed as long as its input queue | |
is on a broker that is accessible. For example, the service can register with a
public broker running outside the firewall. Alternatively the broker may be configured | |
to tunnel over HTTP. For details see http://activemq.apache.org/configuring-transports.html | |
Note: There are bugs in the standard ActiveMQ HTTP connector core libraries (which we have
patched) associated with CASes larger than 64KB and with double-byte characters.
The ActiveMQ jars distributed with UIMA AS include the bug fixes described in | |
http://issues.apache.org/activemq/browse/AMQ-1308 | |
3.7 Monitoring a broker and its queues | |
When the broker starts it will print a message such as: | |
INFO ManagementContext - JMX consoles can connect to service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi | |
Connect a JMX console to this service with: | |
$JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi | |
(Note: jconsole is available in Java SDK (not JRE) distributions from Sun) | |
If your console is not on the same machine as the broker, replace localhost by | |
the name of the broker's machine. The default ports for the broker (61616) and | |
for the JMX server (1099) can be overridden in the broker configuration file | |
created when the broker is first started. For more details see | |
http://activemq.apache.org/jmx.html | |
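When the console runs on a different machine, the JMX service URL can be assembled from the broker's host name and JMX port, as in this sketch. The host name brokerhost is a placeholder; 1099 is the default JMX port mentioned above. The sketch only prints the jconsole command so that you can inspect it before running it.

```shell
# Placeholder host (an assumption); 1099 is the default JMX port noted above.
BROKER_HOST="brokerhost"
JMX_PORT=1099

# Same URL as printed by the broker, with localhost replaced by the host name.
JMX_URL="service:jmx:rmi:///jndi/rmi://$BROKER_HOST:$JMX_PORT/jmxrmi"

# Print the command to run.
echo "\$JAVA_HOME/bin/jconsole $JMX_URL"
```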
3.8 Monitoring UIMA AS service | |
UIMA AS service monitoring is available via JMX and jConsole. To enable this, please set the following | |
before starting a service: | |
set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=<uniquePortNumber, e.g. 8009> | |
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false | |
Connect a jConsole to this JMX service as described in 3.7 above using the appropriate port, e.g. | |
$JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:8009/jmxrmi | |
Under the MBeans tab, expand org.apache.uima in the left panel to view UIMA components enabled for JMX monitoring. | |
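The set command shown above uses Windows syntax. On a Unix-like system, a comparable setting might look like the following sketch. The helper function jmx_opts is hypothetical and exists only to make the port easy to change; 8009 is the example port from above.

```shell
# Hypothetical helper: build the JMX options for the given port number.
jmx_opts() {
    printf '%s' "-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=$1 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"
}

# Quote the value because it contains spaces, then export it before
# starting the service.
UIMA_JVM_OPTS="$(jmx_opts 8009)"
export UIMA_JVM_OPTS
```

When several services run on the same machine, call the helper with a different port for each one.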
3.9 Stopping UIMA AS service | |
A service can be stopped from a command line or remotely using jConsole and JMX. When the service is | |
launched it displays a prompt on stdout: | |
Enter 'q' to quiesce and stop the service or 's' to stop it now: | |
It reads stdin expecting either 'q' or 's', ignoring all other characters. When 'q' is typed, the | |
service will quiesce and stop. As part of this, the service closes its input queue and waits until all CASes | |
still "in-play" are finished. When the input CAS is returned the service stops. When 's' is typed the service | |
closes its input queue and immediately releases all CASes being processed and stops. | |
To stop a UIMA AS process remotely, or when the process runs in the background, use jConsole and JMX.
Launch jConsole using the approach described in 3.8 above. Once the connection is established, open the following in the left pane:
org.apache.uima | |
ee.jms.service | |
<Your Annotator Name> Uima EE Service | |
Controller | |
Operations | |
Here you will find two buttons labeled: | |
CompleteProcessingAndStop | |
StopNow | |
CompleteProcessingAndStop will initiate quiesce while StopNow will initiate a hard stop. | |
4. Migration from CPM to UIMA-AS | |
Migrating a collection processing engine from the CPM to UIMA-AS is straightforward. | |
First, migrate the CPE descriptor to a standard UIMA aggregate descriptor: | |
create a UIMA aggregate that includes all the components specified in the CPE descriptor. | |
Transfer any parameter overrides in the CPE descriptor to the aggregate descriptor. | |
Note that the aggregate descriptor must set <multipleDeploymentAllowed> to false
to be consistent with collection reader and CAS consumer delegates. | |
Second, test this aggregate descriptor by instantiating the aggregate and sending it a single CAS. | |
The contents of the CAS are not important; its purpose is to start the collection reader delegate | |
which will then create the actual CASes to be processed by the other aggregate components. | |
The CAS Visual Debugger, CVD, is a useful tool for doing this test. | |
Next, create a UIMA-AS deployment descriptor that specifies desired scaleout and error handling. | |
Vinci services are still supported, although it is recommended to replace them with UIMA-AS | |
services to enable more efficient load balancing and greater scaleout capability. | |
An example of this kind of migration is embodied by the sample descriptors: | |
Original: | |
$UIMA_HOME/examples/descriptors/collection_processing_engine/MeetingFinderCPE_Integrated.xml | |
Migrated: | |
$UIMA_HOME/examples/deploy/as/MeetingFinderAggregate.xml | |
$UIMA_HOME/examples/deploy/as/Deploy_MeetingFinder.xml | |
5. Known problems/limitations with Release 2.3.0 | |
1. No automatic refresh for broken connections with temp reply queues. | |
2. When connecting to an AMQ broker behind a firewall, avoid using the maxInactivityDuration=0 | |
decoration on the brokerURL (see: http://activemq.apache.org/configuring-wire-formats.html) | |
as it turns off AMQ 'keep alive' messaging. Without these, a firewall may assume a connection | |
has become stale and close its ports. | |
3. UIMA AS does not (yet) support PEAR specifiers. The deployment descriptor should not point to a PEAR file.
Instead, PEARs must be unzipped and the classpath adjusted to point to the required resources.
4. To monitor multiple UIMA AS services on the same machine, each must be assigned a unique | |
JMX port (see section 3.8). | |
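For item 2 above, a broker URL can be checked for the problematic decoration before it is used, as in this sketch. The function name disables_keepalive and the host name in the example URLs are hypothetical. The example URLs assume the usual ActiveMQ form of the option, wireFormat.maxInactivityDuration, appended to the URL as a query parameter.

```shell
# Hypothetical helper: succeeds (returns 0) if the broker URL sets
# maxInactivityDuration to exactly 0, which turns off keep-alive messaging.
disables_keepalive() {
    case "$1" in
        *maxInactivityDuration=0 | *maxInactivityDuration=0\&*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a placeholder host name.
URL="tcp://brokerhost:61616?wireFormat.maxInactivityDuration=0"
if disables_keepalive "$URL"; then
    echo "warning: $URL disables keep-alive; a firewall may close the connection" >&2
fi
```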
For up-to-date information on UIMA-AS issues, see | |
http://issues.apache.org/jira/secure/BrowseProject.jspa?id=12310570 | |
Crypto Notice | |
------------- | |
This distribution includes cryptographic software. The country in | |
which you currently reside may have restrictions on the import, | |
possession, use, and/or re-export to another country, of | |
encryption software. BEFORE using any encryption software, please | |
check your country's laws, regulations and policies concerning the | |
import, possession, or use, and re-export of encryption software, to | |
see if this is permitted. See <http://www.wassenaar.org/> for more | |
information. | |
The U.S. Government Department of Commerce, Bureau of Industry and | |
Security (BIS), has classified this software as Export Commodity | |
Control Number (ECCN) 5D002.C.1, which includes information security | |
software using or performing cryptographic functions with asymmetric | |
algorithms. The form and manner of this Apache Software Foundation | |
distribution makes it eligible for export under the License Exception | |
ENC Technology Software Unrestricted (TSU) exception (see the BIS | |
Export Administration Regulations, Section 740.13) for both object | |
code and source code. | |
The following provides more details on the included cryptographic | |
software: | |
This distribution includes portions of Apache ActiveMQ, which, in | |
turn, is classified as being controlled under ECCN 5D002. | |
Disclaimer | |
----------- | |
Apache UIMA is an effort undergoing incubation at The Apache Software | |
Foundation (ASF). Incubation is required of all newly accepted projects | |
until a further review indicates that the infrastructure, communications, | |
and decision making process have stabilized in a manner consistent with | |
other successful ASF projects. While incubation status is not necessarily | |
a reflection of the completeness or stability of the code, it does | |
indicate that the project has yet to be fully endorsed by the ASF. |