// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[_uv3.backwards_compatibility]]
= Backwards Compatibility
Because users have invested substantially in developing applications with the UIMA framework, a goal of version 3 is to protect this investment by enabling Annotators and applications developed under previous versions to be used in subsequent versions of the framework.
To this end, version 3 is designed to be backwards compatible, except for needing:
* possibly a recompilation (due to some rearrangements of many classes and interfaces)
* a new set of User-defined JCas classes (if these were previously being used). The creation of these CAS classes can be done by regenerating them using JCasGen, or by using a migration tool that handles converting the existing JCas classes. A later chapter covers how to upgrade the JCas classes.
There are some additional exceptions, described in the following sections.
[[_uv3.backwards_compatibility.jcas]]
== JCas and non-JCas APIs
The JCas class changes include no longer needing or using the `Xyz_Type` sister classes for each main JCas class.
User code is unlikely to access these sister classes.
The JCas API method to access this sister class now throws an UnsupportedOperation exception.
The non-JCas Java cover classes for the built-in UIMA types remain, for backwards compatibility.
So, if you have code that casts a Feature Structure instance to AnnotationImpl (a now deprecated version 2 non-JCas Java cover class), that will continue to work.
[[_uv3.backwards_compatibility.jcas.names]]
=== Additional reserved names in the JCas generated classes
// <titleabbrev>JCas reserved names</titleabbrev>
Names beginning with "_" (underscore) are being used by the new JCas implementation, so you should not name things with this convention.
If you do, please ensure that your names do not collide with the names used by the generated JCas files.
[[_uv3.backwards_compatibility.serialization]]
== Serialization forms
The backwards compatibility extends to the serialized forms, so that it should be possible to have a UIMA-AS service working with a client, where the client is a version 3 instance, but the server is still a version 2 (or vice versa).
[[_uv3.backwards_compatibility.serialization.deltas]]
=== Delta CAS Version 2 Binary deserialization not supported
The binary serialization forms, including Compressed Binary Form 4, build an internal model of the v2 CAS in order to be able to deserialize v2 generated versions.
For *delta* CAS, this model cannot be accurately built, because version 3 excludes from the model all unreachable Feature Structures, so in most cases it won't match the version 2 layout.
Version 3 will throw an exception if delta CAS deserialization of a version 2 binary delta CAS is attempted.
[[_uv3.backwards_compatibility.low_level_apis]]
== APIs for creating and modifying Feature Structures
There are 3 sets of APIs for creating and modifying Feature Structures; all are supported in V3.
* Using the JCas classes
* Using the normal CAS interface with Type and Feature objects
* Using the low level CAS interface with int codes for Types and Features
Version 3 retains all 3 sets, to enable backward compatibility.
The low level CAS interface was originally provided to enable an extra-high-performance mode (but without compile-time type safety checks).
In Version 3, this mode is actually somewhat slower than the others, and no longer has any advantages.
Using the low level CAS interface also sometimes blocks one of the new features of Version 3 - namely, automatic garbage collection of unreachable Feature Structures.
This is because creating a Feature Structure using the low level API creates the Java object for that Feature Structure, but returns an "int" handle to it.
In order to be able to find the Feature Structure, given that int handle, an entry is made in an internal map.
This map holds a reference to this Feature Structure, which prevents it from being garbage collected (until of course, the CAS is reset).
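The retention effect described above can be illustrated with a minimal sketch. This is not UIMA code: `HandleTable` and its methods are hypothetical stand-ins that only model an int-handle table holding strong references until a reset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not UIMA code: models an internal table that maps
// int handles to the Java objects created through a low level API.
class HandleTable {
    private final Map<Integer, Object> byHandle = new HashMap<>();
    private int nextHandle = 1;

    // Registers the object and returns an int handle for it. The map entry is
    // a strong reference, so the object stays reachable while the entry exists.
    int register(Object fs) {
        int handle = nextHandle++;
        byHandle.put(handle, fs);
        return handle;
    }

    // Finds the object again, given only its int handle (null if unknown).
    Object lookup(int handle) {
        return byHandle.get(handle);
    }

    // Models a CAS reset: clearing the table releases the references, which
    // makes the objects eligible for garbage collection again.
    void reset() {
        byHandle.clear();
        nextHandle = 1;
    }

    int size() {
        return byHandle.size();
    }
}
```

As long as an entry remains in such a table, the garbage collector cannot reclaim the object, even if no user code refers to it any more.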
The normal CAS APIs allow writing Annotators where the type system is unknown at compile time; these are fully supported.
[[_uv3.backwards_compatibility.preserve_v2_ids]]
== Preserving V2 ids, with low level CAS Api accessibility
// <titleabbrev>Preserving V2 Ids</titleabbrev>
Some V2 applications make use of the Feature Structure address, using these as an integer identifier and using the low level CAS APIs to access the Feature Structure, given this integer.
These applications also often use the stability of these ids across some serialization/deserializations.
Normally in V3, deserialization of CASs having these IDs occurs without preserving the IDs, and without setting up the low level CAS APIs to be able to access these using them.
If an existing application depends on the low level access via the address, a special mode, called ``V2IdRefs``, can be specified, which will support this.
It comes at a cost however, which is that all new Feature Structures created (or deserialized) will be added to an internal table to enable the low level CAS `getFSForRef(int)` method to work.
As a result, these Feature Structures are not eligible for garbage collection.
This mode is set on individual CASs via a new API; a default value may optionally be specified.
Once set on a CAS, it remains until set to a different value; CAS Reset does not affect the setting, nor does checking it into / out of a CAS Pool.
When a new CAS is created, this mode is set according to two sources:
* a `-Duima.default_v2_id_references` system property, read once when the UIMA framework classes are loaded.
* A run-time value kept per thread, managed by an API on the LowLevelCAS interface. The setting is inherited by any child threads the thread creates, and overrides the system property if used.
* If neither of these is used, then the default is to not be in the special v2-mode.
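The precedence among these sources can be sketched as follows. This is an illustration only, not the framework's implementation: the class `V2IdRefsDefaults` and its members are hypothetical, and parsing the property with `Boolean.getBoolean` is an assumption, since this chapter does not say how the property value is interpreted.

```java
// Hypothetical sketch, not UIMA code: resolves the mode for a newly created CAS.
class V2IdRefsDefaults {
    // Stand-in for the value read once when the framework classes are loaded.
    // Assumption: the property is treated as a boolean "true"/"false" value.
    static final boolean FROM_SYSTEM_PROPERTY =
            Boolean.getBoolean("uima.default_v2_id_references");

    // Per-thread override; null means "not set on this thread".
    // InheritableThreadLocal hands the current value to child threads.
    static final InheritableThreadLocal<Boolean> PER_THREAD =
            new InheritableThreadLocal<>();

    // The per-thread value, when present, overrides the system property;
    // with neither set, the result is false (not in the special v2 mode).
    static boolean resolve(Boolean perThread, boolean fromSystemProperty) {
        return perThread != null ? perThread : fromSystemProperty;
    }

    static boolean modeForNewCas() {
        return resolve(PER_THREAD.get(), FROM_SYSTEM_PROPERTY);
    }
}
```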
The APIs for this are part of the LowLevelCAS.
The controlling APIs all return an instance of `AutoCloseableNoException`, which can be used to reset the setting to its previous value.
A recommended way of using these is with the Java try-with-resources construct:
[source]
----
try (AutoCloseableNoException w = llcas.ll_enableV2IdRefs()) {
... some operations
} // automatically restores previous value
----
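Why the previous value is restored automatically can be seen in a small self-contained model. The names `ScopedSetting` and `ToyCas` are hypothetical stand-ins, not the UIMA classes; the sketch only shows the general pattern of returning a closeable handle whose `close()` declares no checked exception and puts the old value back.

```java
// Hypothetical stand-in for the returned handle type: close() is narrowed so
// that it declares no checked exception, so no catch block is required.
interface ScopedSetting extends AutoCloseable {
    @Override
    void close();
}

// Hypothetical stand-in for a CAS that carries a single boolean mode.
class ToyCas {
    private boolean v2IdRefs = false;

    // Sets the mode and returns a handle whose close() restores the old value.
    ScopedSetting enableV2IdRefs(boolean enable) {
        final boolean previous = v2IdRefs;
        v2IdRefs = enable;
        return () -> v2IdRefs = previous;
    }

    boolean isV2IdRefs() {
        return v2IdRefs;
    }

    // Returns the mode as seen inside the try block and after leaving it.
    static boolean[] demo() {
        ToyCas cas = new ToyCas();
        boolean inside = false;
        try (ScopedSetting s = cas.enableV2IdRefs(true)) {
            inside = cas.isV2IdRefs();
        } // close() runs here and restores the previous value
        return new boolean[] { inside, cas.isV2IdRefs() };
    }
}
```

Because the handle captures the value that was in effect before the call, nested uses unwind in the right order when the blocks exit.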
LowLevelCAS instance APIs for enabling/disabling this mode on a particular CAS:
[source]
----
// set the mode
AutoCloseableNoException ll_enableV2IdRefs()
// same, but with explicit set or reset of the mode
AutoCloseableNoException ll_enableV2IdRefs(true/false)
// return true if the mode is enabled
boolean is_ll_enableV2IdRefs()
----
Static LowLevelCAS APIs for setting the default value for this mode for new CASs on a particular thread:
[source]
----
// set the default
AutoCloseableNoException LowLevelCAS.ll_defaultV2IdRefs()
// same, but with explicit set or reset of the mode
AutoCloseableNoException LowLevelCAS.ll_defaultV2IdRefs(true/false)
// return true if the mode is enabled
boolean LowLevelCAS.is_ll_defaultV2IdRefs()
----
This mode modifies multiple things in the operation of UIMA V3.
* Newly created Feature structures have IDs which match what UIMA V2 references (the "addresses") would be. For serialized forms (except Xmi), these IDs match the (imputed) v2 IDs of the serialized form.
+
Newly created Feature Structures, including those created when deserializing, are added to an internal map which maps the ID to the Feature Structure instance.
Feature Structures may be located by ID using the LowLevelCAS API `getFSForRef()`.
+
In order for this to work correctly, the mode must be set while the CAS is empty.
If an attempt is made to set the mode on a non-empty CAS, an IllegalStateException is thrown.
* This mode modifies serialization (except for XCas, XMI, and Compressed form 6, which in V2 are implemented to just serialize reachable Feature Structures) to include non-reachable FSs.
* Note: This does not affect the `select` framework results - unreachable Feature Structures are not included.
[[_uv3.backwards_compatibility.pears]]
== PEAR support
PEARs are supported in Version 3.
If they use JCas, their JCas classes need to be migrated.
When a PEAR contains a JCas class definition different from the surrounding non-PEAR context, each Feature Structure instance within that PEAR has a lazily-created "dual" representation using the PEAR's JCas class definition.
The UIMA framework components that store references to Feature Structures are modified to store the non-PEAR version of the Feature Structure, but to return the dual version when running in a particular PEAR component in the pipeline.
The intent is that this be "invisible" to the PEAR's annotators.
Both of these representations share the same underlying CAS data, so modifications to one are seen in the other.
If a user builds code that holds onto Feature Structure references, outside of annotators (e.g., as a shared External Resource), and sets and references these from both outside and inside one (or more) PEARs, they should adopt a strategy of storing the non-PEAR form.
To get the non-PEAR form from a Feature Structure, use the method ``myFeatureStructure._maybeGetBaseForPearFs()``.
Similarly, if code running in an Annotator within a PEAR wants to work with a Feature Structure taken from data not managed by UIMA, outside of annotators (e.g., a shared External Resource), where the form stored is the non-PEAR form, you can convert to the PEAR form using the method ``myFeatureStructure._maybeGetPearFs()``.
This method checks to see if the processing context of the pipeline is currently within a PEAR, and if that PEAR has a different definition for that JCas class, and if so, it returns that version of the Feature Structure.
The new Java Object support does not support multiple, different JCas class definitions for the same UIMA Type, inside and outside of the PEAR context.
If this is detected, a runtime exception is thrown.
The workaround for this is to manually merge any JCas class definitions for the same class.
[[_uv3.backwards_compatibility.tostring]]
== toString()
The formatting of various UIMA artifacts, including Feature Structures, has changed somewhat, to be more informative.
This may impact situations such as testing, where the exact string representations are being compared.
A special global Java property, `-Duima.v2_pretty_print_format`, can be set to have the toString() operation for Feature Structures print in the V2 style.
[[_uv3.backwards_compatibility.logging]]
== Logging configuration is somewhat different
The default logging configuration in v2 was to use Java Util Logging (the logger built into Java). For v3, the default is to use SLF4J which, in turn, picks a back-end logger, depending on what it finds in the class path.
This change was done to permit easier integration of UIMA as a library running within other frameworks.
The V3 UIMA logger includes APIs such as info(..) and warn(..) that are part of the SLF4J APIs.
In addition, these are augmented with the Java 8 style lambda arguments that were introduced in log4j-2, for more concise and efficient log message computation.
The new UIMA Logger APIs (e.g. logger.info(...), logger.warn(...)) use the "{}" substitution notation of SLF4J and other modern loggers, as opposed to the "{nnn}" style adopted by the original Java logger.
All modern loggers have switched to this.
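The difference between the two placeholder styles, and the benefit of lambda arguments, can be sketched with the Java standard library alone. `PlaceholderStyles` and its methods are hypothetical helpers written for this illustration; `positional` is a simplified imitation of "{}" substitution and not the formatter used by SLF4J or UIMA.

```java
import java.text.MessageFormat;
import java.util.function.Supplier;

// Hypothetical helpers for illustration only.
class PlaceholderStyles {
    // Original Java logger style: numbered "{nnn}" placeholders, formatted
    // here with java.text.MessageFormat.
    static String numbered(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    // Simplified imitation of the "{}" style: each "{}" is replaced by the
    // next argument, in order. Real logger formatters handle more cases.
    static String positional(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break; // more arguments than placeholders
            }
            out.append(pattern, from, at).append(arg);
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    // Lambda style: the message is computed only when the level is enabled,
    // so an expensive computation is skipped for suppressed log levels.
    static String lazy(boolean levelEnabled, Supplier<String> message) {
        return levelEnabled ? message.get() : null;
    }
}
```

For example, `numbered("Processed {0} in {1}", "doc1", "5ms")` and `positional("Processed {} in {}", "doc1", "5ms")` build the same text from differently written patterns.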
The technique for (optionally) reporting the class and method (and sometimes, line number) was changed to conform to current logger conventions - whereby the loggers themselves obtain this information from the call stack.
The V2 calls that pass in the sourceClass and sourceMethod information have this information ignored and replaced with what the loggers obtain from the stack trace.
In some cases, where the callers in V2 were not actually passing in the correct class/method information, this will result in a different log record.
For more details, please see the logging chapter.
[[_uv3.backwards_compatibility.typesystem_sharing]]
== Type System sharing
Type System definitions are shared when they are equal.
After type systems have been built up from type definitions, at "commit" time, a check is made to see if an identical type system already exists (same types and features). This is often the case when a UIMA application is scaling up by adding multiple pipelines, all using the same type system.
If an identical committed type system already exists, then the commit operation returns it, and the one just built is discarded.
Normally, this is not an issue.
However, some application code may save references to the type system object or to defined types and features.
These references end up pointing to the discarded version, when the commit operation finds an already committed equal version.
Application code may code around this by re-acquiring references to the type system object, and to any type and feature objects, if the type system instance object returned from `commit` is not identical (==) to the one being committed.
The type system commit APIs are changed to return the type system - either the one being committed, or an already existing equal committed type system.
So when coding `myTypesystem.commit();` if you later refer to ``myTypesystem``, change this to ``myTypesystem = myTypesystem.commit();``, to keep the variable `myTypesystem` always referring to the committed type system.
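The reason for reassigning the variable can be seen in a toy model of the sharing behavior. `ToyTypeSystem` is a hypothetical class, not the UIMA type system implementation; it only mimics a `commit` that returns an already committed equal instance when one exists. The sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy, not UIMA code: a "type system" that is just a set of
// type names, with commit() sharing equal committed instances.
class ToyTypeSystem {
    // Committed instances, keyed by their content.
    private static final Map<Set<String>, ToyTypeSystem> COMMITTED = new HashMap<>();

    private final Set<String> typeNames = new TreeSet<>();

    ToyTypeSystem addType(String name) {
        typeNames.add(name);
        return this;
    }

    // Returns this instance if no equal type system was committed before;
    // otherwise returns the earlier instance, and this one is discarded.
    ToyTypeSystem commit() {
        ToyTypeSystem existing = COMMITTED.putIfAbsent(new TreeSet<>(typeNames), this);
        return existing != null ? existing : this;
    }
}
```

In this model, a second instance built with the same type names gets the first instance back from `commit()`, so any reference kept to the second instance points at the discarded copy unless the variable is reassigned.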
[[_uv3.backwards_compatibility.checking]]
== Some checks moved to native Java
In the interest of performance, some duplicate checks, such as whether an array index is within bounds, have been removed from UIMA when they are already being checked by the underlying Java runtime.
This has affected some of the internal APIs, such as the JCas's `checkArrayBounds` which was removed because it was no longer being used.
[[_uv3.backwards_compatibility.class_hierarchy]]
== Some class hierarchies have been modified
The various JCas Classes implementing the built-ins for arrays have some additional interfaces added, grouping them into `CommonPrimitiveArray` or ``CommonArray``.
These changes are internal, and should not affect users.
[[_uv3.backwards_compatibility.multiple_type_systems_with_common_jcas]]
== Enabling multiple versions of type systems to work with a single common JCas class
// <titleabbrev>Multi-TypeSystems single JCas</titleabbrev>
Some applications may use a JCas class definition, defining for type T features f1, f2, f3 (for example), in a mode where under a single class loader (for example, in one Java application), multiple CASs are loaded and processed, where each CAS might have other versions of the type system, defining for type T a subset of the features in the JCas.
In order to make this scenario possible, v3 takes an extra step, right before type system commit time, of loading the JCas classes corresponding to the types, and then augmenting the type definitions with additional features defined in the JCas but not in the type description.
After this is done, the type system is committed, and offsets are assigned to the JCas class that are constant, even when a subsequent type system is loaded that defines more features (provided that no new features are introduced).
This feature represents a trade-off between highly efficient, locked-down offsets for features, and some limited flexibility to handle a somewhat common use case where additional features exist in the JCas.
The JCas loading code always checks to ensure compatibility between the offsets in the JCas classes, as first set up, and any subsequent type system being used with that JCas.
This accommodation doesn't handle many possible scenarios.
Some of these include situations where a supertype might subsequently add extra feature slots, or the features end up after merging to have a different ordering.
For cases where this accommodation is insufficient, the workaround is to run separate UIMA applications, each under its own class loader, for the incompatible situations.
PEARs, because they are loaded lazily after the type system has been committed, do not support this kind of augmentation of types from the PEAR-specific JCas class definition.