Apache UIMA (Unstructured Information Management Architecture) v2.2.0
Release Notes
-----------------------------------------------------------------------

CONTENTS
1. What is UIMA?
2. Major Changes in this Release
3. Migrating from IBM UIMA to Apache UIMA
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems that
analyze large volumes of unstructured information in order to discover knowledge
that is relevant to an end user. UIMA is a framework and SDK for developing such
applications. An example UIM application might ingest plain text and identify
entities, such as persons, places, organizations; or relations, such as
works-for or located-at. UIMA enables such an application to be decomposed into
components, for example "language identification" ->
"language specific segmentation" -> "sentence boundary detection" ->
"entity detection (person/place names etc.)". Each component must implement
interfaces defined by the framework and must provide self-describing metadata
via XML descriptor files. The framework manages these components and the data
flow between them. Components are written in Java or C++; the data that
flows between components is designed for efficient mapping between these
languages. UIMA additionally provides capabilities to wrap components as
network services, and can scale to very large volumes by replicating processing
pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA
specification (that specification is, in turn, being developed concurrently by
a technical committee within OASIS, a standards organization). We invite and
encourage you to participate in both the implementation and specification
efforts.

UIMA is a component framework for analyzing unstructured content such as text,
audio and video. It comprises an SDK and tooling for composing and running
analytic components written in Java and C++, with some support for Perl,
Python and TCL.

2. Major Changes in this Release

This section describes what has changed between version 2.1 and version 2.2 of
Apache UIMA.

2.1 PEAR Runtime

It is now possible to run installed PEAR files directly, without any manual setup.
When you package a PEAR file with Apache UIMA 2.2, a descriptor will be created
from which you can create an analysis engine. You no longer need to worry about
classpath setup for the PEAR; this is now handled by the framework. You can also
refer to several PEARs from an aggregate analysis engine descriptor, without
additional setup.

2.2 Class Loading Improvements for JCas Cover Classes

When an aggregate analysis engine contains one or more PEARs, each with its
own classpath, the JCas will handle the class loading so that each annotator
will see the set of cover classes that were loaded with its own class loader.
So if you package your analysis in a PEAR, you can now be sure that the correct
version of the JCas cover classes will be used in your annotator. This fixes a
long-standing issue where Analysis Engines that used different, incompatible
versions of JCas cover classes could not be combined into an Aggregate. In particular,
you can now add new features to the document annotation and use the JCas to access them.

2.3 CPE Descriptors now support <import>

You can now use <import location="..."/> or <import name="..."/> in CPE Descriptors,
and these will be resolved in the same way as in other component descriptors.

3. Migrating from IBM UIMA to Apache UIMA

This section describes how to move from pre-Apache versions of UIMA to the
Apache version (starting with Apache UIMA 2.1).

Note: Back up your files before running the migration utility. The tool
updates files in place in the directories where it finds them.

The migration utility is run by executing the script file
apache-uima/bin/ibmUimaToApacheUima.bat (Windows) or
apache-uima/bin/ibmUimaToApacheUima.sh (UNIX). You must pass one argument: the
directory containing the files that you want to be migrated. Subdirectories
will be processed recursively.

The script scans your files and applies the necessary updates, for example
replacing the com.ibm package names with the new org.apache package names.
The script will only attempt to modify files with the extensions: java, xml,
xmi, wsdd, properties, launch, bat, cmd, sh, ksh, or csh; and files with no
extension. Also, files with size greater than 1,000,000 bytes will be skipped.
(If you want the script to modify files with other extensions, you can edit
the script file and change the -ext argument appropriately.)
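
The file-selection rules above can be pictured with a small, standalone Java
sketch. It is not the migration utility's code: the class name, the method
names, and the case-insensitive extension match are assumptions made for
illustration, and the single package-prefix substitution stands in for the
larger set of updates the real tool applies.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Apache UIMA.
final class MigrationFilterSketch {

    // Extensions listed in these release notes as eligible for modification.
    static final Set<String> EXTENSIONS = Set.of(
            "java", "xml", "xmi", "wsdd", "properties", "launch",
            "bat", "cmd", "sh", "ksh", "csh");

    // Files larger than this many bytes are skipped.
    static final long MAX_BYTES = 1_000_000L;

    // fileName is a bare file name, not a full path.
    static boolean isEligible(String fileName, long sizeInBytes) {
        if (sizeInBytes > MAX_BYTES) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return true; // files with no extension are also processed
        }
        // Assumption: extensions are compared without regard to case.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    // Simplified stand-in for the package rename described above.
    static String renamePackages(String line) {
        return line.replace("com.ibm.uima", "org.apache.uima");
    }

    public static void main(String[] args) {
        System.out.println(isEligible("MyAnnotator.java", 2_048)); // true
        System.out.println(isEligible("logo.png", 2_048));         // false
        System.out.println(isEligible("README", 2_048));           // true
        System.out.println(isEligible("huge.xml", 1_000_001));     // false
        System.out.println(
                renamePackages("import com.ibm.uima.jcas.tcas.DocumentAnnotation;"));
        // import org.apache.uima.jcas.tcas.DocumentAnnotation;
    }
}
```

A file of exactly 1,000,000 bytes passes the size check in this sketch, since
only files greater than that size are described as skipped.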

If the migration tool reports warnings, there may be a few additional steps to
take. The following two sections explain some simple manual changes that you
might need to make to your code.

3.1. JCas Cover Classes for DocumentAnnotation

If you have run JCasGen, it is likely that you have the classes
com.ibm.uima.jcas.tcas.DocumentAnnotation and
com.ibm.uima.jcas.tcas.DocumentAnnotation_Type as part of your code. This
package name is no longer valid, and the migration utility does not move your
files between directories, so it is unable to fix this.

If you have not made manual modifications to these classes, the best solution
is usually to just delete these two classes (and their containing package).
There is a default version in the uima-document-annotation.jar file that is
included in Apache UIMA. If you have made custom changes, do not delete the
files; instead, move them to the correct package,
org.apache.uima.jcas.tcas. For more information about JCas and
DocumentAnnotation please see Section 5.5.4,
"Adding Features to DocumentAnnotation" in the UIMA References manual
(docs/html/references/references.html).

3.2. JCas.getDocumentAnnotation

The deprecated method JCas.getDocumentAnnotation has been removed. Its use
must be replaced with JCas.getDocumentAnnotationFs. The method
JCas.getDocumentAnnotationFs() returns type TOP, so your code must cast this to
type DocumentAnnotation. The reasons for this are described in Section
5.5.4, "Adding Features to DocumentAnnotation" in the UIMA References manual
(docs/html/references/references.html).
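
The cast described above might look like the following sketch. So that it
compiles without the UIMA jars, the nested TOP and DocumentAnnotation classes
and the JCasView interface are simplified stand-ins, not the real types; in
actual code you would use org.apache.uima.jcas.JCas,
org.apache.uima.jcas.cas.TOP and org.apache.uima.jcas.tcas.DocumentAnnotation.
The helper name documentLanguage is hypothetical.

```java
// Hypothetical illustration only; the nested types are stand-ins for UIMA classes.
final class DocumentAnnotationCastSketch {

    /** Stand-in for org.apache.uima.jcas.cas.TOP. */
    static class TOP { }

    /**
     * Stand-in for org.apache.uima.jcas.tcas.DocumentAnnotation.
     * The real class is a subtype of TOP further down the hierarchy.
     */
    static class DocumentAnnotation extends TOP {
        private final String language;

        DocumentAnnotation(String language) {
            this.language = language;
        }

        String getLanguage() {
            return language;
        }
    }

    /** Stand-in exposing only the JCas method discussed in this section. */
    interface JCasView {
        TOP getDocumentAnnotationFs();
    }

    static String documentLanguage(JCasView jcas) {
        // getDocumentAnnotationFs() is declared to return TOP, so an explicit
        // downcast is needed to reach DocumentAnnotation's own methods.
        DocumentAnnotation doc = (DocumentAnnotation) jcas.getDocumentAnnotationFs();
        return doc.getLanguage();
    }

    public static void main(String[] args) {
        JCasView jcas = () -> new DocumentAnnotation("en");
        System.out.println(documentLanguage(jcas)); // en
    }
}
```

In migrated code, only the line containing the cast changes; the rest of the
annotator continues to work with the DocumentAnnotation reference as before.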

3.3. Rare Cases Where Additional Manual Migration is Necessary

For most users there should not be any additional migration steps necessary.
However, if the migration tool reported an additional warning or if you are
having trouble getting your code to compile or run after running the migration,
please see Section 1.4.2, "Rare Cases Where Additional Manual Migration is
Necessary," in the Overview and Setup manual
(docs/html/overview_and_setup/overview_and_setup.html).

4. How to Get Involved

The Apache UIMA project needs and appreciates contributions of all kinds,
including documentation help, source code and feedback. If you are interested
in contributing, please visit http://incubator.apache.org/uima/get-involved.html.

5. How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any
issues you find at http://issues.apache.org/jira/browse/uima.

6. List of JIRA Issues Fixed in this Release

Release Notes - UIMA - Version 2.2

** Sub-task
* [UIMA-326] - Add Out-of-typesystem Data Support to XMI Serialization
* [UIMA-343] - Framework support for import in CPE descriptor
* [UIMA-344] - CPE GUI should create <import> elements instead of <include>
* [UIMA-345] - Documentation for <import> in CPE Descriptor

** Bug
* [UIMA-32] - CPE GUI doesn't parse ${CPM_HOME} variable
* [UIMA-194] - Tools highlight incorrect annotation offsets due to XML serialization bug in Sun Java 1.4.2
* [UIMA-269] - Test PEAR Files don't run
* [UIMA-270] - When CVD run with -desc option, status bar still says "(No AE Loaded)"
* [UIMA-271] - PEAR Installer doesn't enable "Install" button if PEAR file name is input by keyboard
* [UIMA-303] - Problems with BoundedQueue.dequeue(timeout)
* [UIMA-316] - CVD does not display auto-indexes correctly
* [UIMA-329] - extractAndBuild scripts need to check for presence of JAI libraries
* [UIMA-330] - Calling reconfigure() on aggregate AE doesn't call reconfigure() on FlowController.
* [UIMA-336] - Schema validation fails for service client descriptors
* [UIMA-347] - Custom indexes defined in C++ annotators are ignored
* [UIMA-356] - fix IBM dependency in CVD log properties file
* [UIMA-359] - Blob serialization problems
* [UIMA-362] - CVD UIMA about box is editable
* [UIMA-364] - CDE add type button and other actions broken
* [UIMA-365] - call tae.destroy() in AnalysisEngine_implTest to close open file handles
* [UIMA-367] - Deadlock can occur in MultiprocessingAnalysisEngine_impl.setResultSpecification
* [UIMA-370] - Migration tool gets ProgressImpl wrong
* [UIMA-374] - CPE GUI left in bad state if you open a CPE descriptor that refers to a nonexistent component descriptor
* [UIMA-376] - README refers to outdated GUI label
* [UIMA-383] - Duplicate operationalProperties element in example descriptor ex2/RoomNumberAnnotator.xml
* [UIMA-385] - setUimaClasspath script has extra space at end of set PATH command, making last path entry invalid
* [UIMA-387] - XMI Serializer can write invalid control characters
* [UIMA-389] - AnnotationBase.getSofa() throws ClassCastException
* [UIMA-392] - Eclipse Plugin packaging not working correctly
* [UIMA-393] - ibmUimaToApacheUIMA.sh migration script doesn't work
* [UIMA-394] - sofa2jcasMap not consistently set
* [UIMA-396] - Javadoc for Feature.isMultipleReferencesAllowed is incorrect
* [UIMA-397] - JSR47Logger_implTest failing with Sun Java 6
* [UIMA-400] - Fix Eclipse plugin
* [UIMA-402] - Adding Remote SOAP AE to Aggregate in CDE causes validation error
* [UIMA-404] - try to cast NoClassDefFoundError to Exception
* [UIMA-410] - Type priority test case failing with IBM JDK 1.5.0_5ea
* [UIMA-411] - PearInstallerTest fails when running from mvn install target - caused by class loading issues in the PEAR verification code
* [UIMA-414] - Component Descriptor Editor not marking editor as "changed" if an override is added to an existing parameter having overrides.
* [UIMA-415] - Component Descriptor Editor fails when removing parameter override
* [UIMA-421] - CVD broken after restructuring
* [UIMA-422] - update UIMA DocBook version and Date
* [UIMA-424] - update UIMA Framework version to 2.2
* [UIMA-426] - Component Descriptor Editor feature to edit parts which require other parts for context is broken - CDE won't start up
* [UIMA-427] - CVD throws NPE when descriptor file should be loaded
* [UIMA-429] - Running an AE in CVD resets the document text (making it scroll to the end).
* [UIMA-433] - CAS Editor changing documents does not always set dirty flag
* [UIMA-435] - Update runtime plugin manifest package list for CVD package name change
* [UIMA-437] - Annotators are not prevented from calling CAS.release()
* [UIMA-440] - CAS heap doesn't grow correctly when first page exceeded
* [UIMA-442] - FileUtilsTest fails on Linux
* [UIMA-443] - fix flow ResultSpec handling
* [UIMA-449] - XMI serialization does not work with Sun Java 1.5.0_12
* [UIMA-452] - Cas Editor: Actions to modify annotation spans do not check bounds
* [UIMA-455] - Unused import com.sun.org.apache.bcel.internal.generic.ISTORE in CasEditor causes build break with IBM JVM
* [UIMA-459] - References html file has 0 bytes after clean build
* [UIMA-462] - CDE: when saving a remote delegate, where the remote is registered but not running, gets an internal CDE error
* [UIMA-464] - ClassCastException thrown when using subiterator and moveTo()
* [UIMA-467] - TypeSystemUtils.typeSystem2TypeSystemDescription produces invalid output for arrays with elementType specified
* [UIMA-468] - race condition in JCasImpl initializing static array
* [UIMA-469] - not all jars in the lib directory of a PEAR project are added to the PEAR CLASSPATH automatically
* [UIMA-474] - Log messages for duplicate resource declarations have their arguments switched
* [UIMA-476] - FSArray causes error in SOAP service
* [UIMA-479] - fix test class names that do not end with "Test"
* [UIMA-480] - DocumentAnalyzer interactive mode only eligible if an input data directory is specified
* [UIMA-484] - Clean build fails on Saxon download (tmp dir does not exist)
* [UIMA-486] - CVD error message box cannot be closed with OK button
* [UIMA-488] - CVD doesn't handle Errors that are thrown by an AE
* [UIMA-489] - Windows .bat files should use "endlocal" command
* [UIMA-490] - release number in wrong format
* [UIMA-491] - CPE GUI doesn't handle spaces in component descriptor file paths
* [UIMA-492] - uimaj-cpe test failures on some machines when run from maven
* [UIMA-494] - AnalysisEngineDescription_impl indirectly uses problematic method URL.equals()
* [UIMA-496] - PEAR API does not delete the PEAR ID subdirectory before the new content is installed
* [UIMA-508] - Docbook build tool - not updating the olink databases unless running the full 4-book build
* [UIMA-514] - remove source jars from binary distribution
* [UIMA-517] - CDE has internal bug - shows up in Error log when using the Add delegate to an aggregate when picking top level project
* [UIMA-519] - Infinite Loop in AnnotationIndexImpl tree()
* [UIMA-520] - Calling CasCreationUtils to produce a custom resource is ignoring the passed in ResourceManager in some cases

** Improvement
* [UIMA-53] - Add Flow.aborted() method
* [UIMA-125] - Apache UIMA client should be able to communicate with IBM UIMA (1.x or 2.0) service
* [UIMA-236] - Part of getting better results from Docbook - upgrade to current version (4.5 and 1.72.0) and FOP 0.93
* [UIMA-238] - make docbook build script skip build if output exists and target date is later than dependent source dates (normal "make" behavior)
* [UIMA-307] - Fix CVD screenshots
* [UIMA-337] - Should log process begin/end for service adapters
* [UIMA-338] - Add method XMLParser.parseFlowControllerDescription
* [UIMA-339] - Support MBean Name Prefix in the additional parameters map passed to produceAE
* [UIMA-348] - CollectionProcessComplete should execute in fixedFlow order if there is a fixedFlow
* [UIMA-353] - Expose ResourceManager.setCasManager
* [UIMA-354] - UIMA datapath support for pear files
* [UIMA-355] - Eclipse PDE nature for org.apache.uima.runtime project
* [UIMA-358] - Add JMX MBeans for CAS Pools
* [UIMA-363] - add log level configuration possibility for CVD
* [UIMA-366] - Rename plugin directories from xxxxx.version to xxxx_version
* [UIMA-368] - Allow setting logger config file and other JVM system properties in scripts/bat files
* [UIMA-372] - remove deprecated methods in testcases
* [UIMA-375] - Paragraph on "Eclipse has a steep learning curve..." repeated inside one section
* [UIMA-378] - CDE plugin: change some private members to protected so that derived classes can work with them
* [UIMA-380] - runCPE utility should report initialization time and processing time separately
* [UIMA-381] - Rename CVD packages to more intuitive name
* [UIMA-386] - Switching to use correct class loader
* [UIMA-388] - When CollectionReader wrapped as CAS Multiplier, if a second process call comes in, call reconfigure
* [UIMA-401] - Make DocBook build work out of the box in Eclipse
* [UIMA-406] - Continue restructuring of CVD code
* [UIMA-408] - Make more CASImpl methods private, have clients use ll APIs.
* [UIMA-409] - Reorganization of TypeSystemImpl, CASImpl, FSClassRegistry, adding new CASMetadata class
* [UIMA-419] - Reduce space used for casAddr to JCas object map by a factor of 4 or more
* [UIMA-425] - CVD should have close method that doesn't shut down JVM
* [UIMA-436] - Eclipse Runtime Plugin: add line to permit Fragments to add to API for other tooling
* [UIMA-439] - Docbooks: support scale= in pdfs, convert to 0.93 FOP, fix scaling of many images
* [UIMA-445] - add CasEditor abstract to the Sandbox web page.
* [UIMA-460] - Change CAS Editor Docs project to be a general sandbox docs project
* [UIMA-461] - Have docbuild ant script check for JVM version 5 or better
* [UIMA-465] - Need getViewIterator() method to work with a variable number of views
* [UIMA-499] - Make it easier for users to view UIMA JavaDocs from Eclipse
* [UIMA-500] - Reduce excessive synch lock contention caused by calls to ll_isValidTypeCode that are not needed
* [UIMA-501] - Cas Editor: Finish the about dialog
* [UIMA-507] - Remove ref to gutenberg.org to avoid licensing entanglement possibility

** New Feature
* [UIMA-325] - Enhance XMI Serializer to support merging multiple XMI documents into a single CAS
* [UIMA-327] - Flow Controller API extensions in support of more complex flow options
* [UIMA-331] - Provide/extend a built-in flow controller that can be configured to do ParallelStep or to continue after error
* [UIMA-341] - Support <import> in CPE Descriptor
* [UIMA-342] - make jcasgen able to use other templates
* [UIMA-351] - UIMA pear runtime
* [UIMA-352] - Allow custom service adapters to be plugged in
* [UIMA-377] - add API to build PEAR packages
* [UIMA-413] - Allow RunAE to use XMI format XML CAS for input and output
* [UIMA-416] - CVD should be able to read and write XMI documents
* [UIMA-418] - add new UIMA analysis example descriptor
* [UIMA-446] - Create FS variables project in sandbox

** Task
* [UIMA-309] - Change version number to 2.2-SNAPSHOT (post-2.1.0 release)
* [UIMA-473] - Update README and RELEASE_NOTES