<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/version_3_users_guide/overview/">
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="uv3.overview">
<title>Overview of UIMA Version 3</title>
<titleabbrev>Overview</titleabbrev>
<para>UIMA Version 3 adds significant new functionality to the Java SDK while remaining
backward compatible with Version 2. Much of this new functionality is enabled by a change in
how Feature Structures are represented internally. In Version 3, Feature Structures are
ordinary Java objects and are subject to Java garbage collection.</para>
<blockquote><para>
In contrast, version 2 stored Feature Structure data in special internal
arrays of <code>int</code>s and other data types.
In version 2, any Java object representing a Feature Structure was merely
a thin wrapper that forwarded each access to these internal data representations.
</para>
</blockquote>
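<para>To make the contrast concrete, the following plain-Java sketch models both storage styles
for a hypothetical type with <code>begin</code> and <code>end</code> features. It is an illustration
under simplifying assumptions, not UIMA code: the classes <code>IntHeap</code>,
<code>V2StyleSpan</code>, and <code>V3StyleSpan</code> are invented for this example, and the real
version 2 internals are considerably more involved.</para>
<informalexample>
<programlisting><![CDATA[import java.util.Arrays;

/**
 * Hypothetical illustration (not UIMA code) of the two storage styles,
 * for a type with two int features, begin and end.
 */
class StorageStyles {

    // Version 2 style: feature values live in one shared int "heap";
    // a Java wrapper holds only an address and forwards each read to the heap.
    static final class IntHeap {
        private int[] cells = new int[16];
        private int next = 1; // address 0 is reserved to mean "null"

        int allocate(int begin, int end) {
            if (next + 2 > cells.length) {
                cells = Arrays.copyOf(cells, cells.length * 2);
            }
            int addr = next;
            cells[addr] = begin;
            cells[addr + 1] = end;
            next += 2;
            return addr;
        }

        int get(int addr, int offset) {
            return cells[addr + offset];
        }

        // Space is only reclaimed wholesale, when the whole heap is reset.
        void reset() {
            next = 1;
        }
    }

    static final class V2StyleSpan {
        private final IntHeap heap;
        private final int addr; // the only per-object state is the address

        V2StyleSpan(IntHeap heap, int addr) {
            this.heap = heap;
            this.addr = addr;
        }

        int getBegin() { return heap.get(addr, 0); }
        int getEnd()   { return heap.get(addr, 1); }
    }

    // Version 3 style: feature values are ordinary fields of an ordinary
    // object, which the garbage collector reclaims once it is unreachable.
    static final class V3StyleSpan {
        private final int begin;
        private final int end;

        V3StyleSpan(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        int getBegin() { return begin; }
        int getEnd()   { return end; }
    }
}]]></programlisting>
</informalexample>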
<para>If an application uses JCas, its JCas classes must be migrated, but this can often
be done automatically. In Version 3, the JCas classes ending in "_Type" are no longer used, and the
main JCas class definitions are much simpler.</para>
<blockquote><para>
If an application doesn't use JCas classes, no migration is needed.
Otherwise, the JCas classes can be migrated in several ways:
<variablelist>
<!-- <varlistentry>
<term><emphasis role="strong">automatically (if running with a Java Development Kit, which includes a Java compiler),
when starting the application</emphasis></term>
<listitem><para>This is the easiest, and works for normal JCas classes, but incurs a migration cost
at every startup.</para></listitem>
</varlistentry> -->
<varlistentry>
<term><emphasis role="strong">generating during build</emphasis></term>
<listitem>
<para>If the project is built with Maven, its JCas classes may already be generated from the
type descriptions by UIMA's JCasGen Maven plugin. If so, just rebuild the project; the Version 3
JCasGen plugin generates the new JCas classes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">running the migration utility</emphasis></term>
<listitem>
<para>This is the recommended way if you can't regenerate the classes from the type descriptions.</para>
<para>The utility does the work of migrating and produces new versions of the JCas classes, which
must replace the existing ones. It allows complex existing JCas classes to be migrated, with
developer assistance where needed. Once this is done, the application incurs no migration cost at startup.</para>
<para>The migration tool can use existing source or compiled JCas classes as input, and
can migrate classes contained within Jars or PEARs.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">regenerating the JCas classes using the JCasGen tool</emphasis></term>
<listitem><para>The JCasGen tool (available as an Eclipse or Maven plugin, or as a stand-alone application)
generates Version 3 JCas classes from the XML descriptors.</para>
<para>This is perfectly adequate for migrating non-customized JCas classes. When run from the
UIMA Eclipse plugin for editing XML component descriptors, it attempts to merge customizations
with the generated code. However, its merging is not as comprehensive as that of the migration tool,
which parses the Java source code.</para></listitem>
</varlistentry>
</variablelist>
</para></blockquote>
<para>Migration of JCas classes is the first step needed to start using UIMA version 3.
See the later chapter on migration for details on using the migration tool.
</para>
<section id="uv3.overview.new">
<title>What's new in UIMA Java SDK version 3</title>
<titleabbrev>What's new</titleabbrev>
<para>The major improvements in version 3 include:
<variablelist>
<varlistentry>
<term><emphasis role="strong">Support for arbitrary Java objects, transportable in the CAS</emphasis></term>
<listitem>
<para>Users can define additional UIMA Types whose JCas implementations include arbitrary
Java objects; these objects are serialized to, and deserialized from, ordinary CAS-transportable
data. A following chapter on Custom Java Objects describes this new facility.</para>
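<para>As a rough illustration of the idea, the sketch below shows a hypothetical class whose
working state is a Java <code>HashMap</code>, kept alongside two <code>int</code> arrays that stand in
for CAS-transportable features. The class and its <code>saveToCasData</code> and
<code>initFromCasData</code> methods are invented names; the actual interface and method names that
UIMA calls around serialization are described in the Custom Java Objects chapter.</para>
<informalexample>
<programlisting><![CDATA[import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the UIMA API): a type whose Java-side state is a
 * HashMap, backed by two int arrays standing in for CAS-transportable
 * features that the CAS knows how to serialize.
 */
class IntMapHolder {
    // Stand-ins for two integer-array features carried in the CAS.
    int[] keysFeature = new int[0];
    int[] valuesFeature = new int[0];

    // The Java object the application works with; never serialized directly.
    private final Map<Integer, Integer> map = new HashMap<>();

    Map<Integer, Integer> getMap() {
        return map;
    }

    /** Before serialization: copy the Java object into the transportable features. */
    void saveToCasData() {
        keysFeature = new int[map.size()];
        valuesFeature = new int[map.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            keysFeature[i] = e.getKey();
            valuesFeature[i] = e.getValue();
            i++;
        }
    }

    /** After deserialization: rebuild the Java object from the transportable features. */
    void initFromCasData() {
        map.clear();
        for (int i = 0; i < keysFeature.length; i++) {
            map.put(keysFeature[i], valuesFeature[i]);
        }
    }
}]]></programlisting>
</informalexample>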
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">New UIMA built-in types, built using the custom Java object support</emphasis></term>
<listitem>
<para>The custom Java object support described above, which allows arbitrary Java objects to be
transported in the CAS, is used to implement several new built-in UIMA types. These are implemented
in a "lazy" style, avoiding extra computation until it is needed.
<variablelist>
<varlistentry>
<term><emphasis role="strong">FSArrayList</emphasis></term>
<listitem>
<para>A Java ArrayList of Feature Structures. The JCas class implements the List API.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">IntegerArrayList</emphasis></term>
<listitem>
<para>A variable-length int array. It supports <code>OfInt</code> iterators.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">FSHashSet</emphasis></term>
<listitem>
<para>A Java HashSet containing Feature Structures. The JCas class implements the Set API.</para>
</listitem>
</varlistentry>
</variablelist>
</para>
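<para>The lazy style can be pictured with the following sketch of a hypothetical
<code>LazyIntList</code> class; it is not the real <code>IntegerArrayList</code> implementation.
It assumes the data arrives as a fixed-size array, builds its growable working copy only on first
use, and returns a <code>java.util.PrimitiveIterator.OfInt</code> so callers can iterate without
boxing each value.</para>
<informalexample>
<programlisting><![CDATA[import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch (not UIMA's IntegerArrayList): a variable-length int
 * array whose working copy is built lazily from the "CAS" data on first use.
 */
class LazyIntList {
    private final int[] casData; // stand-in for data that arrived in the CAS
    private int[] elements;      // growable working copy; null until needed
    private int size;

    LazyIntList(int[] casData) {
        this.casData = casData;
    }

    // Build the working copy on first access only ("lazy" style).
    private void ensureLoaded() {
        if (elements == null) {
            elements = Arrays.copyOf(casData, Math.max(4, casData.length));
            size = casData.length;
        }
    }

    boolean isLoaded() {
        return elements != null;
    }

    void add(int value) {
        ensureLoaded();
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }

    // Answering size() does not force the working copy to be built.
    int size() {
        return elements == null ? casData.length : size;
    }

    // An OfInt iterator hands out primitive ints, avoiding Integer boxing.
    PrimitiveIterator.OfInt iterator() {
        ensureLoaded();
        return new PrimitiveIterator.OfInt() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < size;
            }

            @Override
            public int nextInt() {
                if (pos >= size) {
                    throw new NoSuchElementException();
                }
                return elements[pos++];
            }
        };
    }
}]]></programlisting>
</informalexample>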
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Select framework for accessing Feature Structures</emphasis></term>
<listitem>
<para>
A new <emphasis>select framework</emphasis> provides a concise way to work with Feature Structure
data stored in the CAS or in other collections. It is integrated with the Java 8 <emphasis>stream</emphasis>
framework, while providing additional capabilities supported by UIMA, such as moving
both forwards and backwards while iterating, moving to specific positions, and various kinds
of specialized Annotation selection, such as selecting the Annotations covered by another Annotation.
</para>
<para>This user's guide has a chapter devoted to this new framework.
</para>
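<para>To show what some of these selections mean, here is a plain-Java sketch over a hypothetical
<code>Span</code> class, assuming the usual annotation ordering (begin ascending, then end descending).
It is not how the select framework is implemented, and the exact UIMA semantics (for example, how the
bounding annotation itself or spans with equal bounds are treated) are defined in the select chapter.
The <code>nonOverlapping</code> rule here, which skips any span that begins before the end of the
last kept span, is one simple interpretation.</para>
<informalexample>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical plain-Java sketch (not UIMA code) of select-style operations. */
class SelectSemantics {
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    // Typical annotation-index order: begin ascending, then end descending.
    static final Comparator<Span> INDEX_ORDER =
        Comparator.<Span>comparingInt(s -> s.begin)
                  .thenComparing((a, b) -> Integer.compare(b.end, a.end));

    // Spans lying within the bounds of the given span, excluding that span itself.
    static List<Span> coveredBy(List<Span> index, Span bound) {
        return index.stream()
            .filter(s -> s != bound && s.begin >= bound.begin && s.end <= bound.end)
            .sorted(INDEX_ORDER)
            .collect(Collectors.toList());
    }

    // Scanning in index order, keep a span only if it starts at or after
    // the end of the previously kept span.
    static List<Span> nonOverlapping(List<Span> sorted) {
        List<Span> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Span s : sorted) {
            if (s.begin >= lastEnd) {
                kept.add(s);
                lastEnd = s.end;
            }
        }
        return kept;
    }

    // The same spans, traversed from last to first.
    static List<Span> backwards(List<Span> sorted) {
        List<Span> reversed = new ArrayList<>(sorted);
        Collections.reverse(reversed);
        return reversed;
    }
}]]></programlisting>
</informalexample>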
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Elimination of ConcurrentModificationException while iterating over UIMA indexes</emphasis></term>
<listitem>
<para>The index and iteration mechanisms have been improved: it is now allowed to modify the indexes while
iterating over them (the iteration is unaffected by the modification).</para>
<para>Note that the automatic index corruption avoidance introduced in recent versions of UIMA may
automatically remove Feature Structures from the indexes and add them back when the user updates
a Feature of a Feature Structure that is used in an index specification for inclusion or ordering.
This means that even setting a Feature value can modify the indexes during an iteration.</para>
<blockquote><para>In version 2, you would accomplish this with a two-pass scheme.
Pass 1 iterates and merely collects the Feature Structures to be updated into a Java collection of some kind.
Pass 2 uses a plain Java iterator over that collection and modifies the Feature Structures and/or the UIMA indexes.
This is no longer needed in version 3: UIMA iterators use a copy-on-write technique that allows the indexes
to be updated, while doing only the minimal copying needed to continue iterating over the original index.</para>
</blockquote>
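<para>The difference can be illustrated with standard JDK collections, as an analogy only. A plain
<code>ArrayList</code> fails fast when modified during iteration, the version 2 workaround is the
two-pass scheme, and the JDK's <code>CopyOnWriteArrayList</code> lets an iteration continue over its
original snapshot while the list is modified. Unlike the minimal copying described for UIMA indexes,
<code>CopyOnWriteArrayList</code> copies its whole backing array on every write; the class
<code>IterateAndModify</code> is hypothetical.</para>
<informalexample>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Plain-Java analogy (not UIMA code) for modifying a collection during iteration. */
class IterateAndModify {

    // Removing from an ArrayList inside a for-each loop throws
    // ConcurrentModificationException on the next iteration step.
    static boolean failsFast(List<Integer> values) {
        List<Integer> list = new ArrayList<>(values);
        try {
            for (Integer v : list) {
                if (v % 2 == 0) {
                    list.remove(v); // remove(Object), since v is an Integer
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Version 2 style: pass 1 only collects, pass 2 modifies.
    static List<Integer> twoPassRemoveEvens(List<Integer> values) {
        List<Integer> list = new ArrayList<>(values);
        List<Integer> toRemove = new ArrayList<>();
        for (Integer v : list) {     // pass 1: collect
            if (v % 2 == 0) {
                toRemove.add(v);
            }
        }
        for (Integer v : toRemove) { // pass 2: modify
            list.remove(v);
        }
        return list;
    }

    // Copy-on-write style: modify during iteration. The iterator walks the
    // snapshot taken when the loop started, so it visits each original
    // element exactly once and never sees the additions.
    static int visitWhileModifying(List<Integer> values, List<Integer> result) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>(values);
        int visited = 0;
        for (Integer v : list) {
            visited++;
            if (v % 2 == 0) {
                list.remove(v);
            } else {
                list.add(v * 10);
            }
        }
        result.addAll(list);
        return visited;
    }
}]]></programlisting>
</informalexample>
<para>With the input 1 through 5, <code>visitWhileModifying</code> visits exactly the five original
elements even though elements are removed and added inside the loop.</para>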
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Automatic garbage collection of unreferenced Feature Structures</emphasis></term>
<listitem>
<para>This allows temporary Feature Structures to be created, with their space reclaimed
automatically once they are no longer needed. In version 2, space was reclaimed only when a
CAS was reset at the end of processing.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Better performance</emphasis></term>
<listitem>
<para>The internal design has been extensively reworked to align
with trends in computer hardware over the last 10-15 years. In particular, space and time
tradeoffs are adjusted in favor of using more memory for better locality of reference,
which improves performance. In addition, many internal algorithms (such as those managing
Feature Structure indexes) have been improved.
</para>
<para>Type system implementations are reused where possible, reducing the footprint in many scaled-out cases.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Backwards compatible</emphasis></term>
<listitem>
<para>Version 3 is intended to be binary backwards compatible: the goal is that you should be able to run
existing applications without recompiling them, except for the need to migrate or regenerate
any user-supplied JCas classes. Utilities are provided to do most of the necessary JCas migration automatically.</para>
</listitem>
</varlistentry>
<!-- <varlistentry>
<term><emphasis role="strong">New facility: unique IDs for selected Feature Structures</emphasis></term>
<listitem>
<para>User-defined Types can use a special, reserved feature name, uimaOID, to store
a unique ID which is generated by UIMA. To enable this for a type, the user defines the
uimaOID feature; if present, UIMA will assign a
unique (for this CAS) OID to this feature when creating the Feature Structure.
This OID is composed of an incrementing number prefixed by the CAS's OID prefix.
</para>
<para>When UIMA sends a CAS to a remote service, it generates an incremented prefix OID that is
unique for this CAS and sends that along with the CAS; this prefix OID is used by the remote
service when it needs to generate a unique OID.
</para>
<para>A new CAS API allows retrieving indexed Feature Structures using this OID. This capability
is built lazily on first use, as a special hash table mapping these ids to Feature Structures. Note that
this table is built using Java WeakReferences and therefore will not block garbage collecting those
Feature Structures which are subsequently removed from the index and have no other references to them.
</para>
</listitem>
</varlistentry> -->
<varlistentry>
<term><emphasis role="strong">Integration with Java 8</emphasis></term>
<listitem>
<para>Version 3 requires Java 8 as the minimum level. Some of version 3's new facilities, such as the
<code>select</code> framework for accessing Feature Structures from CASs or other collections,
integrate with new Java 8 library facilities such as <code>Streams</code> and <code>Spliterators</code>.</para>
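<para>The general Java 8 mechanism behind this kind of integration is small: a collection that can
supply a <code>Spliterator</code> can be turned into a <code>Stream</code> with
<code>StreamSupport.stream</code>. The sketch below applies it to a hypothetical
<code>SortedWords</code> collection. It illustrates the Java 8 mechanism only and makes no claim about
how UIMA's <code>select</code> framework is built internally.</para>
<informalexample>
<programlisting><![CDATA[import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical ordered collection (not UIMA code) exposed as a Java 8 Stream. */
class SortedWords implements Iterable<String> {
    private final String[] words;

    SortedWords(String... words) {
        this.words = words.clone();
        Arrays.sort(this.words); // keep the contents in natural (sorted) order
    }

    @Override
    public Iterator<String> iterator() {
        return Arrays.asList(words).iterator();
    }

    // Report the exact size and ordering so stream operations can rely on them.
    @Override
    public Spliterator<String> spliterator() {
        return Spliterators.spliterator(iterator(), words.length,
            Spliterator.ORDERED | Spliterator.NONNULL);
    }

    // false = a sequential (not parallel) stream
    Stream<String> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    // Example use: any Stream operation now works, here mapping and collecting.
    static List<Integer> wordLengths(SortedWords words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }
}]]></programlisting>
</informalexample>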
</listitem>
</varlistentry>
</variablelist>
</para>
<para>To give a small taste of what Java 8 integration provides,
here is an example using the new <code>select</code> framework, where the task is to compute
<itemizedlist spacing="compact">
<listitem><para>a Set of all the types found
<itemizedlist spacing="compact">
<listitem><para>in a UIMA index</para></listitem>
<listitem><para>under some top-most type "MyType"</para></listitem>
<listitem><para>occurring as Annotations within a particular bounding Annotation</para></listitem>
<listitem><para>that are non-overlapping</para></listitem>
</itemizedlist>
</para></listitem>
</itemizedlist>
</para>
<para>
Here is the Java code using the new <code>select</code> framework together with Java 8 streaming functions:
<informalexample> <?dbfo keep-together="always"?>
<programlisting>Set&lt;Type&gt; foundTypes =
    myIndex.select(MyType.class)
           .coveredBy(myBoundingAnnotation)
           .nonOverlapping()
           .map(fs -> fs.getType())
           .collect(Collectors.toCollection(TreeSet::new));
</programlisting>
</informalexample>
</para>
<para>
Another example computes, for each category, the average length of the annotations in that category.
Here we assume that <code>MyType</code> is an <code>Annotation</code> type with a String-valued
feature named <code>category</code>, accessed through the JCas getter <code>getCategory()</code>:
<informalexample> <?dbfo keep-together="always"?>
<programlisting>Map&lt;String, Double&gt; avgLengthByCategory =
    myIndex.select(MyType.class)
           .collect(Collectors
               .groupingBy(MyType::getCategory,
                           Collectors.averagingDouble(f ->
                               (double) (f.getEnd() - f.getBegin()))));
</programlisting>
</informalexample>
</para>
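<para>The grouping logic itself can be tried out without a CAS. In this plain-Java sketch, the
hypothetical <code>Mention</code> class stands in for <code>MyType</code>, with begin and end offsets
and a String category; <code>averageLengths</code> uses the same <code>groupingBy</code> and
<code>averagingDouble</code> collectors as the example above.</para>
<informalexample>
<programlisting><![CDATA[import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Plain-Java stand-in (not UIMA code) for the grouping example. */
class AverageLengthByCategory {
    // Hypothetical substitute for an Annotation type with a "category" feature.
    static final class Mention {
        private final int begin;
        private final int end;
        private final String category;

        Mention(int begin, int end, String category) {
            this.begin = begin;
            this.end = end;
            this.category = category;
        }

        int getBegin() { return begin; }
        int getEnd() { return end; }
        String getCategory() { return category; }
    }

    // Group by category, then average the span lengths within each group.
    static Map<String, Double> averageLengths(List<Mention> mentions) {
        return mentions.stream()
            .collect(Collectors.groupingBy(Mention::getCategory,
                Collectors.averagingDouble(
                    (Mention m) -> (double) (m.getEnd() - m.getBegin()))));
    }

    // Example data: two "person" mentions of lengths 5 and 3, one "place" of length 10.
    static Map<String, Double> demo() {
        return averageLengths(Arrays.asList(
            new Mention(0, 5, "person"),
            new Mention(10, 13, "person"),
            new Mention(20, 30, "place")));
    }
}]]></programlisting>
</informalexample>
<para>For the example data, the averages are 4.0 for "person" and 10.0 for "place".
<code>Collectors.averagingInt</code> applied to <code>m.getEnd() - m.getBegin()</code> would produce
the same <code>Double</code> averages without the explicit cast.</para>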
</section>
<section id="uv3.overview.java8">
<title>Java 8 is required</title>
<para>The UIMA Java SDK Version 3 requires Java 8 or later.</para>
</section>
</chapter>