blob: 90417c5f00e68e47f0912c97c84370843d2645bb [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/version_3_users_guide/backwards_compatibility/">
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="uv3.backwards_compatibility">
<title>Backwards Compatibility</title>
<titleabbrev>Backwards Compatibility</titleabbrev>
<para>Because users have made substantial investment in developing
applications using the UIMA framework, a goal of version 3 is to protect this investment, by enabling
Annotators and applications developed under previous versions to be able to be used in
subsequent versions of the framework.</para>
<para>To this end, version 3 is designed to be backwards compatible,
except for needing:
<itemizedlist>
<listitem>
<para>possibly a recompilation (due to some rearrangements of many classes and interfaces)</para>
</listitem>
<listitem>
<para>a new set of User-defined JCas classes (if these were
previously being used). The creation of these Cas classes
can be done by regenerating them using JCasGen, or by
using a migration tool that handles converting the existing JCas classes.
A later chapter covers how to upgrade the JCas classes.</para>
</listitem>
</itemizedlist>
There are some additional exceptions, described in the following sections.
</para>
<section id="uv3.backwards_compatibility.jcas">
<title>JCas and non-JCas APIs</title>
<para>The JCas class changes include no longer needing or using the <code>Xyz_Type</code>
sister classes for each main JCas class. User code is unlikely to access these sister classes.
The JCas API method to access this sister class now throws a UnsupportedOperation exception.
</para>
<para>The non-JCas Java cover classes for the built-in UIMA types remain, for backwards compatibility.
So, if you have code that casts a Feature Structure instance to AnnotationImpl (a now deprecated
version 2 non-JCas Java cover class), that will continue to work.
</para>
<section id="uv3.backwards_compatibility.jcas.names">
<title>Additional reserved names in the JCas generated classes</title>
<titleabbrev>JCas reserved names</titleabbrev>
<para>Names beginning with "_" (underscore) are being used by the new JCas implementation, so you should not
name things with this convention. If you do, please insure your names are not colliding with the names
being used by the generated JCas files.
</para>
</section>
</section>
<section id="uv3.backwards_compatibility.serialization">
<title>Serialization forms</title>
<para>
The backwards compatibility extends to the serialized forms, so that it should be
possible to have a UIMA-AS services working with a client, where the client is a version 3 instance, but the
server is still a version 2 (or vice versa).</para>
<section id="uv3.backwards_compatibility.serialization.deltas">
<title>Delta CAS Version 2 Binary deserialization not supported</title>
<para>The binary serialization forms, including Compressed Binary Form 4, build an
internal model of the v2 CAS in order to be able to deserialize v2 generated
versions. For <emphasis role="strong">delta</emphasis> CAS, this model cannot be accurately built, because version 3
excludes from the model all unreachable Feature Structures, so in most cases it
won't match the version 2 layout.
</para>
<para>Version 3 will throw an exception if delta CAS deserialization of a version 2
binary delta CAS is attempted.
</para>
</section>
</section>
<section id="uv3.backwards_compatibility.low_level_apis">
<title>APIs for creating and modifying Feature Structures</title>
<para>There are 3 sets of APIs for creating and modifying Feature Structures;
all are supported in V3.
<itemizedlist spacing="compact">
<listitem>
<para>Using the JCas classes</para>
</listitem>
<listitem>
<para>Using the normal CAS interface with Type and Feature objects</para>
</listitem>
<listitem>
<para>Using the low level CAS interface with int codes for Types and Features</para>
</listitem>
</itemizedlist>
</para>
<para>Version 3 retains all 3 sets, to enable backward compatibility.</para>
<para>The low level CAS interface was originally provided to enable a extra-high-performance
(but without compile-time type safety checks) mode.
In Version 3, this mode is actually somewhat slower than the others, and no longer has any advantages.
</para>
<para>Using the low level CAS interface also sometimes blocks one of the new features of Version 3 -
namely, automatic
garbage collection of unreachable Feature Structures. This is because creating a Feature Structure using the
low level API creates the Java object for that Feature Structure, but returns an "int" handle to it.
In order to be able to find the Feature Structure, given that int handle, an entry is made in an internal map.
This map holds a reference to this Feature Structure, which prevents it from being garbage collected (until of coursse,
the CAS is reset).
</para>
<para>The normal CAS APIs allow writing Annotators where the type system is unknown at compile time; these
are fully supported.</para>
</section>
<section id="uv3.backwards_compatibility.preserve_v2_ids">
<title>Preserving V2 ids, with low level CAS Api accessibility</title>
<titleabbrev>Preserving V2 Ids</titleabbrev>
<para>
Some V2 applications make use of the Feature Structure address, using these as an integer identifier
and using the low level CAS APIs to access the Feature Structure, given this integer. These applications
also often use the stability of these ids across some serialization/deserializations.
</para>
<para>
Normally in V3, deserialization of CASs having these IDs occurs without preserving the IDs, and without setting
up the low level CAS APIs to be able to access these using them. If an existing application depends
on the low level access via the address, a special mode, called <code>V2IdRefs</code>, can be specified, which will support this.
It comes at a cost however, which is that all new Feature Structures created (or deserialized) will be added to
an internal table to enable the low level CAS getFSForRef(int) method to work. As a result, these Feature Structures
are not eligible for garbage collection.
</para>
<para>This mode is set on individual CASs via a new API; a default value may optionally be specified.
Once set on a CAS, it remains until set to a different value;
CAS Reset does not affect the setting, nor does checking it into / out of a CAS Pool.
</para>
<para>When a new CAS is created, this mode is set according to two sources:
<itemizedlist>
<listitem><para>a <code>-Duima.default_v2_id_references</code> system property, read once when the
UIMA framework classes are loaded.</para>
</listitem>
<listitem><para>A run-time value kept per thread, managed by an API on the LowLevelCAS interface. The
setting is inherited by any child threads the thread creates, and
overrides the system property if used.</para></listitem>
<listitem><para>If neither of these are used, then the default is to not be in the sepcial v2-mode.</para></listitem>
</itemizedlist>
</para>
<para>The APIs for this are part of the LowLevelCAS. The controlling APIs all return an instance
of AutoClosableNoException, which can be used to reset the setting to its previous value.
A recommended way of using these is with the Java <code>try with resources</code> construct:
<programlisting>try (AutoClosableNoException w = llcas.ll_enableV2IdRefs) {
... some operations
} // automatically restores previous value
</programlisting>
</para>
<para>LowLevelCas instance APIs for enabling/disabling this mode on a particular CAS:</para>
<programlisting>// set the mode
AutoClosable ll_enableV2IdRefs()
// same, but with explicit set or reset of the mode
AutoClosableNoException ll_enableV2IdRefs(true/false)
// return true if the mode is enabled
boolean is_ll_enableV2IdRefs()
</programlisting>
<para>Static LowLevelCas APIs for setting the default value for this mode for new CASs on a particular thread:</para>
<programlisting>
// set the default
AutoClosableNoException LowLeveCas.ll_defaultV2IdRefs()
// same, but with explicit set or reset of the mode
AutoClosableNoException LowLeveCas.ll_defaultV2IdRefs(true/false)
// return true if the mode is enabled
boolean LowLeveCas.is_ll_defaultV2IdRefs()
</programlisting>
<para>This mode modifies multiple things in the operation of UIMA V3.</para>
<itemizedlist>
<listitem><para>Newly created Feature structures
have IDs which match what UIMA V2 references (the "addresses")
would be. For serialized forms (except Xmi), these IDs
match the (imputed) v2 IDs of the serialized form.
</para>
<para>Newly created Feature Structures, including those created when deserializing,
are added to an internal map which maps the ID to the Feature Structure instance.
Feature Structures may be located by ID using the LowLevelCAS API
<code>getFSForRef().</code>
</para>
<para>
In order for this to work correctly, the mode must be set while the CAS is empty.
If the mode is attempted to be set on a non-empty CAS, an IllegalStateException
is thrown.
<!-- Serialized forms like Xmi and XCAS have the ids
included in the external form;
-->
</para>
</listitem>
<listitem>
<para>This mode modifies serialization (except for XCas, Xmi, and Compressed form 6, which in V2 are
implemented to just serialize reachable Feature Structures) to include non-reachable FSs.
</para>
</listitem>
<listitem><para>Note: This does not affect the <code>select</code> framework results - unreachable Feature Structures
are not included.</para></listitem>
</itemizedlist>
</section>
<section id="uv3.backwards_compatibility.PEARs">
<title>PEAR support</title>
<para>Pears are supported in Version 3. If they use JCas, their JCas classes need to be migrated.
</para>
<para>When a PEAR contains a JCas class definition different from the surrounding non-PEAR context,
each Feature Structure instance within that PEAR has a lazily-created "dual" representation using
the PEAR's JCas class definition. The UIMA framework things storing references to Feature Structures
are modified to store the non-PEAR version of the Feature Structure, but to return (when in
a particular PEAR component in the pipeline) the dual version. The intent is that this be
"invisible" to the PEAR's annotators. Both of these representations share the same
underlying CAS data, so modifications to one are seen in the other.
</para>
<para>If a user builds code that holds onto Feature Structure references, outside of
annotators (e.g., as a shared External Resource), and sets and references these from
both outside and inside one (or more) PEARs, they should adopt a strategy of storing
the non-PEAR form. To get the non-PEAR form from a Feature Structure, use the method
<code>myFeatureStructure._maybeGetBaseForPearFs()</code>.
</para>
<blockquote><para>Similarly, if code running in an Annotator within a PEAR wants to
work with a Feature Structure extracted from non-UIMA managed data outside of annotators
(e.g., such as a shared External Resource) where the form stored is the non-PEAR form,
you can convert to the PEAR form using the method
<code>myFeatureStructure.__maybeGetPearFs()</code>. This method checks to see if
the processing context of the pipeline is currently within a PEAR, and if that PEAR has
a different definition for that JCas class, and if so, it returns that version of the
Feature Structure.
</para></blockquote>
<para>The new Java Object support does not support multiple,
different JCas class definitions for the same
UIMA Type, inside and outside of the PEAR context.
If this is detected, a runtime exception is thrown.
</para>
<para>The workaround for this is to manually merge any JCas class definitions for the same class.</para>
</section>
<section id="uv3.backwards_compatibility.toString">
<title>toString()</title>
<para>
The formatting of various UIMA artifacts, including Feature Structures,
has changed somewhat, to be more informative. This may impact
situations such as testing, where the exact string representations
are being compared.
</para>
<para>A special global Java property, -Duima.v2_pretty_print_format can be set to have the toString() operation
for Feature Sructures print in the V2 style.
</para>
</section>
<section id="uv3.backwards_compatibility.logging">
<title>Logging configuration is somewhat different</title>
<para>The default logging configuration in v2 was to use Java Util Logging (the logger built into Java).
For v3, the default is to use SLF4J which, in turn, picks a back-end logger, depending on what it finds
in the class path.</para>
<para>This change was done to permit easier integration of UIMA as a library running within other frameworks.
</para>
<para>V3 UIMA logger includes the APIs like info(..), warn(..) etc., that are part of the
SLF4j APIs. In addition, these are augmented with the Java 8 style lambda arguments that
were introduced in log4j-2, for more concise and efficient log message computation.</para>
<para>The new UIMA Logger APIs (e.g. logger.info(...), logger.warn(...))
use the SLF4j and other modern logger substitutable notation of "{}", as opposed
to the style adopted by the original Java logger, of "{nnn}". All modern loggers have switched to
this. </para>
<para>The technique for (optionally) reporting the class and method (and sometimes, line number)
was changed to conform to current logger conventions - whereby the loggers themselves obtain this
information from the call stack. The V2 calls which pass in the sourceClass and sourceMethod information
have this information ignored, but replaced with what the loggers obtain from the stack track.
In some cases, where the callers in V2 were not actually passing in the correct class/method information,
this will result in a different log record.</para>
<para>
For more details, please see the logging chapter.
</para>
</section>
<section id="uv3.backwards_compatibility.typesystem_sharing">
<title>Type System sharing</title>
<para>Type System definitions are shared when they are equal. After type systems
have been built up from type definitions, at "commit" time, a check is made to see if an
identical type system already exists (same types and features).
This is often the case when a UIMA application is
scaling up by adding multiple pipelines, all using the same type system.
</para>
<para>If an identical committed type system already exists, then the commit operation returns
it, and the one just built is discarded. Normally, this is not an issue.
However, some application code may save references to the type system object or to defined types and features.
These references end up pointing to the discarded version, when the commit operation finds an already
committed equal version.</para>
<para>Application code may code around this by re-acquiring references to the type system object, and to any type and feature
objects, if the type system instance object returned from <code>commit</code> is not identical (==) to the
one being committed.
The type system commit APIs are changed to return the type system - either the one being committed, or an
already existing equal committed type system. So when coding
<code>myTypesystem.commit();</code> if you later refer to <code>myTypesystem</code>, change this to
<code>myTypesystem = myTypesystem.commit();</code>, to keep the variable <code>myTypesystem</code> always
referring to to the committed type system.
</para>
</section>
<!-- Not implemented
<section id="uv3.backwards_compatibility.deserializing_0_length">
<title>Deserializing 0 length items in a CAS</title>
<para>In V3, 0-length arrays and lists belonging to one CAS are deserialized into a shared Java object. Since
0-length arrays and lists are
immutable, this should normally not matter. It could cause problems if the application somehow is
depending on object non-identity equality for these items.
</para>
</section>
-->
<section id="uv3.backwards_compatibility.checking">
<title>Some checks moved to native Java</title>
<para>In the interest of performance, some duplicate checks, such as whether an array index is within bounds,
have been removed from UIMA when they are already being checked by the underlying Java runtime.
This has affected some of the internal APIs, such as the JCas's <code>checkArrayBounds</code>
which was removed because it was no longer being used.
</para>
</section>
<section id="uv3.backwards_compatibility.class_hierarchy">
<title>Some class hierarchies have been modified</title>
<para>The various JCas Classes implementing the built-ins for arrays have some additional
interfaces added, grouping them into <code>CommonPrimitiveArray</code> or
<code>CommonArray</code>. These changes are internal, and should not affect users.
</para>
</section>
<section id="uv3.backwards_compatibility.multiple_type_systems_with_common_jcas">
<title>Enabling multiple versions of type systems to work with a single common JCas class</title>
<titleabbrev>Multi-TypeSystems single JCas</titleabbrev>
<para>Some applications may use a JCas class definition, defining for type T features f1, f2, f3 (for example), in a mode
where under a single class loader (for example, in one Java application), multiple CASs are loaded and processed,
where each CAS might have other versions of the type system, defining for type T a subset of the features in the JCas.
</para>
<para>In order to make this scenario possible, v3 takes an extra step, right before type system commit time, of
loading the JCas classes corresponding to the types, and then augmenting the type definitions with additional
features defined in the JCas but not in the type description. After this is done, the type system is committed,
and offsets are assigned to the JCas class that are constant, even when a subsequent type system is loaded
that defines more features (provided that no new features are introduced).</para>
<para>This feature represents a trade-off between highly efficient, locked-down offsets for features, and
some limited flexibility to handle a somewhat common use case where additional features exist in the JCas.
The JCas loading code always checks to insure compatibility between the offsets in the JCas classes, as first set up,
and any subsequent type system being used with that JCas.
</para>
<para>This accommodation doesn't handle many possible scenarios. Some of these include situations where a
supertype might subsequently add extra feature slots, or the features end up after merging to have a different
ordering.</para>
<para>For cases where this accommodation is insufficient, the workaround is to run separate UIMA applications, each
under its own class loader, for the incompatible situations.</para>
<para>PEARs, because they are loaded lazily after the type system has been committed, do not support this kind of
augmentation of types from the Pear-specific JCas class definition.</para>
</section>
</chapter>