Apache UIMA Asynchronous Scaleout (UIMA-AS) Version 2.3.0-incubating README
---------------------------------------------------------------------------
0. Changes in the 2.3.0 release
- UIMA-AS "graduated" from Sandbox, now a top-level (within UIMA) add-on
- UIMA-AS repackaged - no longer includes the base UIMA. You must download a comparable version of base UIMA, separately
- Many improvements in error handling and recovery
- General hardening / bug fixes
- Many improvements in JMX reporting
- Optimizations and tunings for larger deployments
- Support for late binding of broker urls and ports
- Support for passing configuration information in URLs to ActiveMQ
- Delta Cas format support added, including Delta Binary serialization where the type systems are identical
- Multiple remote CAS multiplier support added
- Using Java queuing instead of embedded broker for local services (performance improvement)
- Support for high volume - multiple concurrent listener threads
- Initial support for Apache Camel and OSGI enablement added
- New launcher allows specifying directories containing Jars; all Jars found will be added to the classpath
1. Contents of Apache UIMA-AS binary distribution
The Apache UIMA-AS binary distribution includes
- Apache UIMA Asynchronous Scaleout extensions
- Saxon
- Apache ActiveMQ
- Spring Framework
It does not include the base UIMA binary distribution, which it depends on.
You should download and unzip that distribution, and then download and unzip the
UIMA-AS distribution over the base distribution.
UIMA-AS components include:
bin/startBroker.sh/bat: starts the ActiveMQ broker, which must be running before
UIMA AS services can be deployed.
bin/deployAsyncService.sh/bat: deploys an AnalysisEngine as a UIMA-AS
service. Takes one or more UIMA-AS Deployment Descriptors as arguments.
bin/runRemoteAsyncAE.sh/bat: Calls a UIMA-AS service. Takes arguments specifying the
location of the service, and an optional CollectionReader descriptor file used to
obtain the CASes to be processed by the service.
docs/pdf/uima_async_scaleout.pdf: UIMA-AS documentation, including the specification
for the deployment descriptor file syntax.
examples/deploy/as/... (Sample Deployment Descriptors)
Deploy_RoomNumberAnnotator.xml: Deploys Room Number Annotator Primitive AE
Deploy_MeetingDetectorTAE.xml: Deploys Meeting Detector Aggregate AE with all
delegates in the same JVM.
Deploy_MeetingDetectorTAE_Whiteboard.xml: Deploys Meeting Detector Aggregate AE
using the whiteboard Flow Controller.
Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml: Deploys Meeting Detector Aggregate AE
that uses remotely deployed RoomNumberAnnotator.
Deploy_MeetingDetectorTAE_3MeetingAnnotator.xml: Deploys Meeting Detector Aggregate AE
with three instances of the MeetingAnnotator component.
Deploy_MeetingDetectorTAE_Sync_3Instances.xml: Deploys 3 instances of the
Meeting Detector as a Synchronous Aggregate (meaning the delegate AEs do not each
get their own input queue).
Deploy_MeetingAnnotator.xml: Deploys C++ Meeting Annotator. Note: requires
installation of uimacpp SDK into $UIMA_HOME.
MeetingFinderAggregate.xml: Aggregate descriptor that uses the same components as the
CPE examples MeetingFinderCPE* in base UIMA.
Deploy_MeetingFinder.xml: Deploys MeetingFinderAggregate illustrating scalability and
error handling similar to the CPM examples; see Section 4 on migration below.
descriptors/as/... (Other Sample Descriptors for use with UIMA AS)
MeetingDetectorAsyncAE.xml: Specifier that can be used to call a UIMA AS
Service from an existing UIMA application; see Section 3.5 below.
2. Installation and Setup
2.1 Supported Platforms
UIMA AS requires Java 5 or later. It has been tested with Sun Java 5 on Windows XP and Linux.
Other platforms and Java (5+) implementations should work, but have not been significantly tested.
2.2 Environment Variables
After you have unpacked the UIMA AS distribution, you must set the following
environment variables (the same as for normal Apache UIMA setup):
* Set JAVA_HOME to the directory of the JRE installation you would like to use for UIMA.
* Set UIMA_HOME to the apache-uima-as directory of your unpacked Apache UIMA distribution.
* Append UIMA_HOME/bin to your PATH.
Note: The Mac OS X operating system has special procedures for setting up global environment
variables; see http://developer.apple.com/qa/qa2001/qa1067.html for how to do this.
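On a Unix-like shell, the three settings above might look like the following sketch.
The two directory paths are placeholders (assumptions), not locations required by UIMA AS.

```shell
# Placeholder install locations (assumptions); substitute your own paths.
JAVA_HOME="/usr/lib/jvm/java-5"
UIMA_HOME="/opt/apache-uima-as"
export JAVA_HOME UIMA_HOME

# Append the UIMA AS scripts directory to the command search path.
PATH="$PATH:$UIMA_HOME/bin"
export PATH
```

Placing these lines in your shell startup file keeps them in effect for new terminal windows.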
2.3 Running the Setup Script
You must run the script UIMA_HOME/bin/adjustExamplePaths.bat (or .sh). This updates
paths in the examples based on the actual UIMA_HOME directory path.
2.4 Setting up Eclipse
Eclipse users should install the UIMA Eclipse Plugins and UIMA Examples Project using the
procedure described in Chapter 3 of the Apache UIMA Overview and Setup guide,
which you can find online at http://incubator.apache.org/uima; click on Documentation ->
HTML Online Version -> Overview and Setup -> 3. Eclipse IDE setup for UIMA.
However, since UIMA AS requires Java 5, you must be sure to set up your uimaj-examples Eclipse
project to use a version 5 (or later) JRE, and you must set your compiler compliance level to 5.0. To do
this go to Window->Preferences and navigate to the Java->Compiler page. Remember to
run the base Eclipse using Java 5 (or later), as well.
3. Getting Started
3.1 Starting the ActiveMQ Broker
UIMA AS services require an available ActiveMQ broker on which to create/register
the service request queue. If no broker is available, start a new broker on the machine
where the services will run, or on another machine. To do this, first set the environment
variable ACTIVEMQ_BASE to a writable directory, or simply cd to a writable directory,
and then run:
startBroker.sh/bat
The first time it is run, this script creates a new directory $ACTIVEMQ_BASE/amq (or ./amq)
and copies the default configuration files there. The configuration files can then be
customized to modify broker behavior for subsequent startups.
Note: only one broker can be started at a time on the same machine with the same
configuration file, or on different machines from the same writable directory.
When the broker starts it will print a message such as:
INFO: Listening for connections at: tcp://yourHostname:61616
Note this URL since you will need it to run services and clients.
The tcp listening port must be exposed to any clients or services using the broker.
To connect to a broker running behind a firewall using HTTP tunneling, see section 3.6 below.
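If you capture the broker's startup output, the URL can be picked out of the
"Listening for connections" line with standard tools. The sketch below assumes the
message has exactly the format shown above; the variable names are illustrative only.

```shell
# Sample startup line, in the message format shown above.
log_line='INFO: Listening for connections at: tcp://yourHostname:61616'

# Keep only the URL that follows "Listening for connections at: ".
broker_url=$(printf '%s\n' "$log_line" |
  sed -n 's/.*Listening for connections at: *\([^ ]*\).*/\1/p')

echo "$broker_url"
```

The resulting value is what you would pass as the broker URL when deploying services or running clients.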
3.2 Deploying an Analysis Engine as a UIMA AS Asynchronous Service
a. Create a Deployment Descriptor.
Examples can be found in the examples/deploy/as directory,
and the syntax is documented in docs/pdf/uima_async_scaleout.pdf.
Note: The deployment descriptor may contain a broker placeholder with
this syntax: ${defaultBrokerURL}. The placeholder is replaced at runtime with an actual broker
URL. The value for the placeholder comes from System properties. The brokerURL attribute of the
<inputQueue ...> element is optional. If not present, a default of tcp://localhost:61616 will be used.
The examples assume the broker is listening on tcp://localhost:61616.
b. Run the command:
deployAsyncService.sh/cmd [testDD.xml] [-brokerURL url]
The argument to the command is the deployment descriptor you created in step (a). An optional argument
-brokerURL specifies a URL of the broker that the service will use to create connections to queues. This
argument takes effect only if your deployment descriptor does not explicitly name the broker URL in the
<inputQueue ...> xml element *or* the brokerURL attribute is set to a placeholder ${defaultBrokerURL}.
Omitting brokerURL or using a placeholder is a way to keep your deployment descriptors portable. You don't
need to edit your deployment descriptors when switching brokers.
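The precedence described above can be modeled with a small shell function. This is only
an illustration of the rule as stated in this README, not the code UIMA AS runs; the
function name resolve_broker_url is hypothetical, and the fallback to the default URL when
neither source supplies a value is an assumption.

```shell
PLACEHOLDER='${defaultBrokerURL}'     # literal placeholder text, not expanded by the shell
DEFAULT_URL='tcp://localhost:61616'

# resolve_broker_url DESCRIPTOR_VALUE [CLI_VALUE]
# DESCRIPTOR_VALUE: brokerURL attribute from the descriptor ('' if omitted).
# CLI_VALUE: value given with -brokerURL ('' if omitted).
resolve_broker_url() {
  descriptor_value=$1
  cli_value=${2:-}
  if [ -n "$descriptor_value" ] && [ "$descriptor_value" != "$PLACEHOLDER" ]; then
    # An explicit URL in the descriptor wins; -brokerURL has no effect.
    printf '%s\n' "$descriptor_value"
  elif [ -n "$cli_value" ]; then
    # Attribute omitted or set to the placeholder: -brokerURL applies.
    printf '%s\n' "$cli_value"
  else
    # Assumption: fall back to the documented default.
    printf '%s\n' "$DEFAULT_URL"
  fi
}

resolve_broker_url '${defaultBrokerURL}' 'tcp://hostB:61616'
```

With the placeholder in the descriptor, the same descriptor file can be deployed against different brokers by changing only the command-line argument.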
Note: If you use import by name in your deployment descriptor, UIMA AS searches the CLASSPATH
as well as directories on UIMA_DATAPATH to resolve the import.
Note: The deployAsyncService.sh/cmd scripts launch the UimaBootstrap main program, which loads UIMA jars
dynamically from the UIMA_HOME/lib, UIMA_HOME/apache-activemq-4.1.1, UIMA_HOME/apache-activemq-4.1.1/lib,
and UIMA_HOME/apache-activemq-4.1.1/lib/optional directories. If you want to use a different
version of ActiveMQ, set the ACTIVEMQ_HOME environment variable to the location of the
ActiveMQ you intend to use. Also, if you want to deploy your own annotator that is
installed in a directory other than UIMA_HOME/lib, set the UIMA_CLASSPATH
environment variable to point to the directories that contain your jar files. You
may specify multiple directories separated by File.pathSeparator; each directory's contained
JARs will be added to the class path. The paths can also contain jar files.
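As a sketch, a UIMA_CLASSPATH that combines one directory of jars and one individual jar
could be set as follows on a Unix-like system. Both paths are hypothetical.

```shell
# Hypothetical locations of your own annotator jars (assumptions).
annotator_dir="/home/me/annotators/lib"
extra_jar="/home/me/tools/my-annotator.jar"

# File.pathSeparator is ':' on Unix-like systems and ';' on Windows.
UIMA_CLASSPATH="$annotator_dir:$extra_jar"
export UIMA_CLASSPATH

echo "$UIMA_CLASSPATH"
```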
Note: Both UIMA AS client and UIMA AS service by default add a time-to-live (TTL) to
every request message. This enables expiration of messages that are not consumed.
Currently, the UIMA AS client multiplies the Process Timeout value by 10 and uses
this as TTL. The UIMA AS service sets TTL to the Timeout value multiplied by the number
of outstanding requests. To disable TTL, set the system property -DNoTTL. A convenient
way to do this is to add -DNoTTL to the environment variable UIMA_JVM_OPTS
before running deployAsyncService and/or runRemoteAsyncAE.
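To make the TTL arithmetic concrete, the sketch below works through both rules with
made-up numbers and then shows one way to add -DNoTTL without discarding JVM options
that are already set. The timeout values and request count are assumptions chosen only
for illustration; the unit is whatever unit the timeouts are configured in.

```shell
# Illustrative values (assumptions).
process_timeout=5000        # client-side Process Timeout
service_timeout=5000        # service-side Timeout
outstanding_requests=3      # requests the service currently has outstanding

# Client rule: Process Timeout multiplied by 10.
client_ttl=$((process_timeout * 10))
# Service rule: Timeout multiplied by the number of outstanding requests.
service_ttl=$((service_timeout * outstanding_requests))

echo "client TTL:  $client_ttl"
echo "service TTL: $service_ttl"

# Disable TTL: append -DNoTTL, keeping any options that are already present.
UIMA_JVM_OPTS="${UIMA_JVM_OPTS:+$UIMA_JVM_OPTS }-DNoTTL"
export UIMA_JVM_OPTS
```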
3.3 Calling a UIMA AS Asynchronous Service
To test a remote UIMA service you can use the script:
runRemoteAsyncAE.sh/cmd brokerUrl endpoint
This connects to a remote AE at the specified brokerUrl and endpoint (which must match
the inputQueue endpoint in the remote AE service's deployment descriptor).
Some of the optional arguments to runRemoteAsyncAE are:
-c Specifies a CollectionReader descriptor. The client will obtain CASes from the
CollectionReader and send them to the service for processing. If this option
is omitted, one empty CAS will be sent to the service (useful for services
containing a CAS Multiplier acting as a collection reader).
-d Specifies a deployment descriptor. The specified service will be deployed before processing
begins, and the service will be undeployed after processing completes.
Multiple -d entries can be given.
-o Specifies an Output Directory. All CASes received by the client's
CallbackListener will be serialized to XMI in the specified OutputDir.
If omitted, no XMI files will be output.
To see the full set of arguments, run the command with no arguments.
3.4 Quick Test of an async service
Start two terminal windows, each with its environment set up as described in section 2.2.
* In the first terminal window start the broker (as described in section 3.1),
by running the commands:
cd some-writable-directory
startBroker.sh/bat
* In the second terminal window, deploy a sample service and send it some CASes:
cd $UIMA_HOME/examples/deploy/as
runRemoteAsyncAE.sh/cmd tcp://localhost:61616 MeetingDetectorTaeQueue \
-d Deploy_MeetingDetectorTAE.xml \
-c $UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml
If you get an UnsupportedClassVersionError, Java 5 is probably not being used.
If the driver fails to find the input data, adjustExamplePaths was probably not run.
3.5 Calling a UIMA AS Asynchronous Service from an Existing UIMA Application
You can also call a UIMA AS Service from the DocumentAnalyzer or any other UIMA
application using a new JMS client. However, note that this is a synchronous interface,
that is, it will process only one CAS at a time, so it will not take advantage of the
scalability that UIMA AS provides. To process more than one CAS at a
time, you must use the Asynchronous UIMA AS Client as described in section 3.3.
An example JMS client service descriptor is provided in
examples/descriptors/as/MeetingDetectorAsyncAE.xml
The JMS service makes use of the customResourceSpecifier capability in Apache UIMA.
For more information on the customResourceSpecifier see the "Custom Resource Specifiers"
section in the Apache UIMA Reference manual.
3.6 Firewalls between clients and services
A service running behind a firewall can be accessed as long as its input queue
is on a broker that is accessible. For example, the service can register with a
public broker running outside the firewall.
By default, the reply queue used by an aggregate when calling a remote delegate is located
on the host where the aggregate is running. This will not work if a firewall blocks
the service from replying to this reply queue, or if for any other reason the host name or
IP address of the aggregate's host is not accessible to the service.
There are two ways to fix this problem, the easiest being to specify that the reply queue
should be created on the service's broker. This is done by adding
<replyQueue location="remote"/>
to the remoteAnalysisEngine definition for the remote delegate.
The client API used by runRemoteAsyncAE always creates a reply queue on the service's broker.
These "remote" reply queues are JMS temporary queues, which means that they will be
deleted when the client aggregate or application terminates.
A more complicated approach is for the client to use an HTTP connector. In this case
UIMA AS always creates reply queues on the service's broker.
Note: There are bugs in the standard ActiveMQ HTTP connector core libraries (which we have
patched) associated with CASes larger than 64KB
and with double-byte characters. The ActiveMQ jars distributed with UIMA AS
include the bug fixes described in http://issues.apache.org/activemq/browse/AMQ-1308
3.7 Monitoring a broker and its queues
When the broker starts it will print a message such as:
INFO ManagementContext - JMX consoles can connect to service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
Connect a JMX console to this service with:
$JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
(Note: jconsole is available in Java SDK (not JRE) distributions from Sun)
If your console is not on the same machine as the broker, replace localhost with
the name of the broker's machine. For more details see http://activemq.apache.org/jmx.html
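When the broker runs on another machine, a small helper can build the service URL in the
format shown above. The function name jmx_url and the host name brokerhost are
hypothetical; 1099 is the port from the sample broker message.

```shell
# jmx_url HOST PORT: print a JMX service URL in the format shown above.
jmx_url() {
  printf 'service:jmx:rmi:///jndi/rmi://%s:%s/jmxrmi\n' "$1" "$2"
}

# "brokerhost" is a hypothetical machine name.
jmx_url brokerhost 1099
```

The printed URL can then be given to jconsole in place of the localhost URL above.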
3.8 Monitoring UIMA AS service
UIMA AS service monitoring is available via JMX and jConsole. To enable this, please set the following
before starting a service:
set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8009
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
The above configures the JMX Server to run on port 8009 and enables remote monitoring of the process. Connect
jConsole using an approach similar to the one described in 3.7 above, but this time using the following:
$JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:8009/jmxrmi
In the left pane locate and expand org.apache.uima to view UIMA components enabled for JMX monitoring.
3.9 Stopping UIMA AS service
A service can be stopped from a command line or remotely using jConsole and JMX. When the service is
launched it displays a prompt on stdout:
Enter 'q' to quiesce and stop the service or 's' to stop it now:
It reads stdin expecting either 'q' or 's', ignoring all other characters. When 'q' is typed, the
service will quiesce and stop. As part of this, the service closes its input queue and waits until all CASes
still "in-play" are finished. When the input CAS is returned the service stops. When 's' is typed the service
closes its input queue and immediately releases all CASes being processed and stops.
To stop a UIMA AS process remotely, or if the process runs in the background, use jConsole and JMX. Launch
jConsole using the approach described in 3.8 above. Once the connection is created, in the left pane open:
org.apache.uima
ee.jms.service
<Your Annotator Name> Uima EE Service
Controller
Operations
Here you will find two buttons labeled:
CompleteProcessingAndStop
StopNow
CompleteProcessingAndStop will initiate quiesce while StopNow will initiate a hard stop.
4. Migration from CPM to UIMA-AS
Migrating a collection processing engine from the CPM to UIMA-AS is straightforward.
First, migrate the CPE descriptor to a standard UIMA aggregate descriptor:
create a UIMA aggregate that includes all the components specified in the CPE descriptor.
Transfer any parameter overrides in the CPE descriptor to the aggregate descriptor.
Note that the aggregate descriptor must set <multipleDeploymentAllowed> to false
to be consistent with collection reader and CAS consumer delegates.
Second, test this aggregate descriptor by instantiating the aggregate and sending it a single CAS.
The contents of the CAS are not important; its purpose is to start the collection reader delegate
which will then create the actual CASes to be processed by the other aggregate components.
The CAS Visual Debugger, CVD, is a useful tool for doing this test.
Next, create a UIMA-AS deployment descriptor that specifies desired scaleout and error handling.
Vinci services are still supported, although it is recommended to replace them with UIMA-AS
services to enable more efficient load balancing and greater scaleout capability.
An example of this kind of migration is embodied by the sample descriptors:
Original:
$UIMA_HOME/examples/descriptors/collection_processing_engine/MeetingFinderCPE_Integrated.xml
Migrated:
$UIMA_HOME/examples/deploy/as/MeetingFinderAggregate.xml
$UIMA_HOME/examples/deploy/as/Deploy_MeetingFinder.xml
5. Known problems/limitations with Release 2.3.0
1. No automatic refresh for broken connections with temp reply queues.
2. When connecting to an AMQ broker behind a firewall, avoid using the maxInactivityDuration=0
decoration on the brokerURL (see: http://activemq.apache.org/configuring-wire-formats.html)
as it turns off AMQ 'keep alive' messaging. A firewall may close ports if it is
configured to detect stale connections.
3. UIMA AS does not (yet) support PEAR specifiers. The deployment descriptor should not point to a PEAR file.
Instead, PEARs must be unzipped and the classpath adjusted to point to the required resources.
4. To run multiple UIMA AS services on the same machine, a unique JMX port must be provided
by setting the following environment variable:
set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=<unique port#> -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
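The setting above uses Windows "set" syntax. On a Unix-like shell, one way to produce the
same options for different ports is a small helper such as the sketch below. The function
name jmx_opts and the port number 8010 are illustrative assumptions.

```shell
# jmx_opts PORT: print the JMX system properties shown above for the given port.
jmx_opts() {
  printf '%s %s %s %s\n' \
    "-Dcom.sun.management.jmxremote" \
    "-Dcom.sun.management.jmxremote.port=$1" \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Dcom.sun.management.jmxremote.ssl=false"
}

# In the terminal for one service (8010 is an example port):
UIMA_JVM_OPTS=$(jmx_opts 8010)
export UIMA_JVM_OPTS
```

Run the same two lines with a different port number in the terminal used for each additional service.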
For up-to-date information on UIMA-AS issues, see
http://issues.apache.org/jira/secure/BrowseProject.jspa?id=12310570
Crypto Notice
-------------
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
This distribution includes portions of Apache ActiveMQ, which, in
turn, is classified as being controlled under ECCN 5D002.
Disclaimer
-----------
Apache UIMA is an effort undergoing incubation at The Apache Software
Foundation (ASF). Incubation is required of all newly accepted projects
until a further review indicates that the infrastructure, communications,
and decision making process have stabilized in a manner consistent with
other successful ASF projects. While incubation status is not necessarily
a reflection of the completeness or stability of the code, it does
indicate that the project has yet to be fully endorsed by the ASF.