<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY imgroot "images/version_3_users_guide/custom_java_objects/"> | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent"> | |
%uimaents; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<chapter id="uv3.custom_java_objects"> | |
<title>Defining CAS-transported custom Java objects</title> | |
<titleabbrev>CAS Java Objects</titleabbrev> | |
<para>One of the goals of v3 is to support more of the Java collection framework within the CAS, | |
to enable users to conveniently build more complex models that could be transported by the CAS. | |
For example, a user might want to store a Java "Set" object, representing a set of Feature Structures. | |
Or a user might want to use an adjustable array, like Java's ArrayList. | |
</para> | |
<para>In the version 2 implementation of JCas, users can already add arbitrary Java objects to their
JCas class definitions as fields, but these fields are not transported with the CAS (for instance, during
serialization). Furthermore, in version 2, the JCas instance you get when accessing a Feature Structure
may, in some edge cases, be a fresh instance, losing any previously computed value held in a Java field.
In contrast, in version 3 each Feature Structure in a CAS is always represented by the same unique Java object
(because that is the only way a Feature Structure is stored).
</para> | |
<para>Version 3 has a new capability that enables converting arbitrary Java objects
that might be part of a
JCas class definition into "ordinary" CAS values that can be transported with the CAS.
This is done using a set of conventions which the framework follows, and which developers writing these
classes make use of; they include
two marker Java interfaces, and two methods that are called when serializing and deserializing.
<blockquote><para> | |
The marker interfaces identify those JCas classes which need these extra methods called. The extra methods are | |
methods implemented by the creator of these JCas classes, which marshal/unmarshal CAS feature data to/from the | |
Java Object this class is supporting. | |
</para></blockquote> | |
</para> | |
<para> | |
Storing the Java Object data as the value of a normal CAS Feature means that it gets "transported" in a portable
way with the CAS: it can be saved to external storage and read back in later, sent to remote services, etc.
</para> | |
<section id="uv3.custom_java_objects.tutorial"> | |
<title>Tutorial example</title> | |
<para>Here's a tutorial example on how to design and implement your own special Java object. For this example, | |
we'll imagine we need to implement a map from FeatureStructures to FeatureStructures.</para> | |
<figure id="uv3.custom_java_objects.5_steps"> | |
<title>Creating a custom Java CAS-stored Object</title> | |
<mediaobject> | |
<imageobject> | |
<imagedata width="5.5in" format="PNG" fileref="&imgroot;5_steps.png"/> | |
</imageobject> | |
<textobject><phrase>5 steps to creating a custom CAS transportable Java Object</phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
<para> | |
Step 1 is deciding on the Java Object implementation to use. We could define a special class, but in this case
we'll just use the ordinary Java <code>HashMap&lt;TOP, TOP&gt;</code>.</para>
<para>Step 2 is deciding on the CAS Feature Structure representation of this. For this example, let's design this | |
to represent the serialized form of the hashmap as 2 FSArrays, one for the keys, and one for the values. | |
We could also use just one array and intermingle the keys and values. It's up to the designer of this new JCas | |
class to decide how to do this.</para> | |
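<para>As a rough illustration of the single-array alternative mentioned above, the following sketch flattens a map
into one array laid out as key, value, key, value, and rebuilds it again. It uses plain Java only: the class
<code>InterleavedMapCodec</code> and its methods are hypothetical names, <code>Object[]</code> stands in for an FSArray,
and arbitrary objects stand in for Feature Structures. It is not UIMA API.</para>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: Object[] plays the role of an FSArray, and plain
// objects play the role of Feature Structures. Not part of UIMA.
final class InterleavedMapCodec {
  private InterleavedMapCodec() {}

  /** Flattens a map into one array laid out as [k0, v0, k1, v1, ...]. */
  static Object[] toInterleaved(Map<?, ?> map) {
    Object[] out = new Object[map.size() * 2];
    int i = 0;
    for (Map.Entry<?, ?> e : map.entrySet()) {
      out[i++] = e.getKey();
      out[i++] = e.getValue();
    }
    return out;
  }

  /** Rebuilds a map from the interleaved layout produced by toInterleaved. */
  static Map<Object, Object> fromInterleaved(Object[] data) {
    if (data.length % 2 != 0) {
      throw new IllegalArgumentException("expected an even number of elements");
    }
    Map<Object, Object> map = new LinkedHashMap<>();
    for (int i = 0; i < data.length; i += 2) {
      map.put(data[i], data[i + 1]);
    }
    return map;
  }
}
```

<para>The two-array design used in the rest of this tutorial keeps keys and values separately typed; the interleaved
design needs only one feature but relies on the even/odd position convention.</para>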
<para>Step 3 is defining the UIMA Type for this. Let's call it FS2FSmap. It will have 2 Features: an FSArray | |
for the keys, and another FSArray for the values. Let's name those features "keys" and "values". | |
Notice that there's no mention of the Java object in the UIMA Type definition. | |
</para> | |
<para>Step 4 is to run JCasGen on this type to get an initial version of the JCas class. Of course, it will be missing
the Java HashMap, but we'll add that in the next step.</para>
<variablelist><varlistentry><term>Step 5: modify 3 aspects of the generated JCas class.</term> | |
<listitem> | |
<para> | |
<variablelist> | |
<varlistentry> | |
<term>1. Mark the class with one of two interfaces:</term> | |
<listitem> | |
<itemizedlist spacing="compact"> | |
<listitem><para><code>UimaSerializable</code></para></listitem> | |
<listitem><para><code>UimaSerializableFSs</code></para></listitem> | |
</itemizedlist> | |
<para>These identify this JCas class as needing the calls that marshal/unmarshal the data between the
Java Object and the normal CAS data features. Use the second form if the data includes any
Feature Structure references. In our example, the data does include Feature Structure references, | |
so we add <code>implements UimaSerializableFSs</code> to our JCas class.</para> | |
</listitem> | |
</varlistentry> | |
<varlistentry> | |
<term>2. Add the Java Object as a field to the class</term> | |
<listitem> | |
<!-- create a blank line --> | |
<variablelist><varlistentry><term></term><listitem><para></para></listitem></varlistentry></variablelist> | |
<para>We'll define a new field: | |
<programlisting>private final Map&lt;TOP, TOP&gt; fs2fsMap = new HashMap&lt;&gt;();</programlisting></para>
</listitem> | |
</varlistentry> | |
<varlistentry> | |
<term>3. Implement two methods to marshal/unmarshal the Java Object data to the CAS Data Features</term> | |
<listitem> | |
<!-- create a blank line --> | |
<variablelist><varlistentry><term></term><listitem><para></para></listitem></varlistentry></variablelist> | |
<para>Now, we need to add the code that translates between the two UIMA Features | |
"keys" and "values" and the map, and vice-versa. We put this code into two methods, | |
called <code>_init_from_cas_data</code> and <code>_save_to_cas_data</code>. | |
These are special methods that are part of this new framework extension; | |
they are called by the framework at critical times during deserialization and serialization. | |
Their purpose is to encapsulate all that is needed to convert between the transportable
normal CAS data and the Java Object(s).</para>
<para>In this example, the <code>_init_from_cas_data</code> method would iterate over the two Features, | |
together, and add each key value pair to the Java Object. Likewise, the | |
<code>_save_to_cas_data</code> would first create two FSArray objects for the keys and values, | |
and then iterate over the hash map and extract these and set them into the key and value arrays. | |
<programlisting>// Rebuild the Java map from the "keys" and "values" features.
public void _init_from_cas_data() {
  FSArray keys = getKeys();
  FSArray values = getValues();
  fs2fsMap.clear();
  for (int i = keys.size() - 1; i &gt;= 0; i--) {
    fs2fsMap.put(keys.get(i), values.get(i));
  }
}

// Copy the Java map into two new FSArrays and store them in the features.
// Entry here is java.util.Map.Entry.
public void _save_to_cas_data() {
  int i = 0;
  FSArray keys = new FSArray(this, fs2fsMap.size());
  FSArray values = new FSArray(this, fs2fsMap.size());
  for (Entry&lt;TOP, TOP&gt; entry : fs2fsMap.entrySet()) {
    keys.set(i, entry.getKey());
    values.set(i, entry.getValue());
    i++;
  }
  setKeys(keys);
  setValues(values);
}
</programlisting> | |
</para> | |
<para>Beyond this simple implementation, various optimizations can be done.
A typical one is to treat the case where no updates were made as a special
case (one which might occur frequently), and in that case have the
<code>_save_to_cas_data</code> operation do nothing, since the original CAS data is still valid.</para>
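<para>One way to picture the "no updates" optimization is a dirty flag that every mutating method sets and that the
save method checks first. The sketch below is a plain-Java stand-in, not UIMA code: the class
<code>DirtyTrackingMap</code>, its methods, and the <code>saveCount</code> counter are hypothetical, and two
<code>String</code> arrays stand in for the "keys" and "values" features.</para>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JCas class holding a Java map; not UIMA API.
final class DirtyTrackingMap {
  private final Map<String, String> map = new HashMap<>();
  private String[] keys = new String[0];    // stands in for the "keys" feature
  private String[] values = new String[0];  // stands in for the "values" feature
  private boolean dirty = false;            // true once the map differs from the arrays
  private int saveCount = 0;                // counts real marshalling passes (illustration only)

  void put(String key, String value) {
    map.put(key, value);
    dirty = true;  // every mutating method must set the flag
  }

  String get(String key) {
    return map.get(key);
  }

  int saveCount() {
    return saveCount;
  }

  /** Plays the role of _init_from_cas_data: rebuilds the map from the arrays. */
  void initFromCasData() {
    map.clear();
    for (int i = 0; i < keys.length; i++) {
      map.put(keys[i], values[i]);
    }
    dirty = false;  // map and arrays now agree
  }

  /** Plays the role of _save_to_cas_data: skips the work when nothing changed. */
  void saveToCasData() {
    if (!dirty) {
      return;  // the stored arrays are still valid
    }
    String[] k = new String[map.size()];
    String[] v = new String[map.size()];
    int i = 0;
    for (Map.Entry<String, String> e : map.entrySet()) {
      k[i] = e.getKey();
      v[i] = e.getValue();
      i++;
    }
    keys = k;
    values = v;
    dirty = false;
    saveCount++;
  }
}
```

<para>The flag is only safe if all mutation paths go through methods that set it; exposing the underlying map directly
would defeat the check.</para>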
<para>One additional "boilerplate" method is required for all of these classes:</para> | |
<para><code> public FeatureStructureImplC _superClone() {return clone();}</code></para> | |
</listitem> | |
</varlistentry> | |
</variablelist> | |
</para> | |
</listitem> | |
</varlistentry></variablelist> | |
<para>For custom types which hold collections of Feature Structures, you can have those participate in the | |
<code>Select</code> framework, by implementing the optional Interface <code>SelectViaCopyToArray</code>.</para> | |
<para>For more examples, please see the implementations of the semi-built-in classes described in the | |
following section.</para> | |
</section> | |
<section id="uv3.custom_java_objects.new_semibuiltins"> | |
<title>Additional semi-built-in UIMA Types for some common Java Objects</title> | |
<titleabbrev>semi-built-in UIMA Types</titleabbrev> | |
<para>Some additional semi-built-in UIMA types are defined in Version 3 using this new mechanism. They work fully in | |
Java, and are serialized or transported to non-Java frameworks as ordinary CAS objects.</para> | |
<para>Semi-built-in means that the JCas cover classes for these are defined as part of the core Java classes, | |
but the types themselves are not "built-in". They may be added to any type system by importing them by name using the import statement:
<programlisting>&lt;import name="org.apache.uima.semibuiltins"/&gt;</programlisting>
If you have a Java project whose classpath includes uimaj-core, and you run the Component Descriptor Editor
Eclipse plugin tool on a descriptor which includes a type system, you can configure this import by selecting
Add on the Import type system subpanel, choosing import by name, and selecting org.apache.uima.semibuiltins.
(Note: this will not show up if your project doesn't include uimaj-core on its build path.) | |
</para> | |
<section id="uv3.custom_java_objects.semibuiltin_fsarraylist"> | |
<title>FSArrayList</title> | |
<titleabbrev>FSArrayList</titleabbrev> | |
<para><code>org.apache.uima.jcas.cas.FSArrayList</code> is like the current FSArray, except that it implements the List API and supports | |
adding to the array, with automatic resizing, like an ArrayList in Java. It is implemented internally | |
using a Java ArrayList.</para> | |
<para>The CAS data form is held in a plain FSArray feature.</para> | |
<para>The <code>equals()</code> method returns true if both FSArrayList objects have the same size and their
contents are <code>equal</code> item by item.
The list of supported operations includes all of the operations of | |
the Java <code>List</code> interface. This object also includes the <code>select</code> methods, so it | |
can be used as a source for the <code>select</code> framework.</para> | |
</section> | |
<section id="uv3.custom_java_objects.semibuiltin_integerarraylists"> | |
<title>IntegerArrayList</title> | |
<titleabbrev>IntegerArrayList</titleabbrev> | |
<para><code>org.apache.uima.jcas.cas.IntegerArrayList</code> is like the current IntegerArray, except that it implements the List API and supports | |
adding to the array, with automatic resizing, like an ArrayList in Java.</para> | |
<para>The CAS data form is held in a plain IntegerArray feature.</para> | |
<para>The <code>equals()</code> method returns true if both IntegerArrayList objects have
the same size and their
contents are <code>equal</code> item by item.
The list of supported operations includes a subset of the operations of | |
the Java <code>List</code> interface, where certain values are changed to Java primitive | |
<code>ints</code>. To support the <code>Iterable</code> interface, there is | |
a version of <code>iterator()</code> where the result is "boxed" into an | |
Integer. For efficiency, there's also a method intListIterator, which returns | |
an instance of IntListIterator, which permits iterating forwards and backwards, without | |
boxing.</para> | |
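<para>The difference between the boxed and unboxed iteration styles can be sketched in plain Java. The class
<code>IntListSketch</code> below is a hypothetical stand-in, not the UIMA <code>IntegerArrayList</code>, and it uses the
JDK's <code>PrimitiveIterator.OfInt</code> in place of UIMA's <code>IntListIterator</code>; unlike the latter, it only
iterates forwards.</para>

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical stand-in for a growable int list; not UIMA API.
final class IntListSketch implements Iterable<Integer> {
  private int[] data = new int[4];
  private int size = 0;

  /** Appends a value, doubling the backing array when it is full. */
  void add(int value) {
    if (size == data.length) {
      data = Arrays.copyOf(data, size * 2);
    }
    data[size++] = value;
  }

  int size() {
    return size;
  }

  /** Unboxed iteration: nextInt() returns a primitive int. */
  PrimitiveIterator.OfInt intIterator() {
    return new PrimitiveIterator.OfInt() {
      private int pos = 0;

      @Override
      public boolean hasNext() {
        return pos < size;
      }

      @Override
      public int nextInt() {
        if (pos >= size) {
          throw new NoSuchElementException();
        }
        return data[pos++];
      }
    };
  }

  /** Boxed iteration for Iterable: the inherited next() wraps nextInt() in an Integer. */
  @Override
  public Iterator<Integer> iterator() {
    return intIterator();
  }
}
```

<para>A for-each loop over such a list goes through the boxed <code>iterator()</code>, while a loop calling
<code>nextInt()</code> directly avoids allocating <code>Integer</code> objects.</para>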
</section> | |
<section id="uv3.custom_java_objects.semibuiltin_FSHashSet"> | |
<title>FSHashSet and FSLinkedHashSet</title> | |
<para><code>org.apache.uima.jcas.cas.FSHashSet</code> and <code>org.apache.uima.jcas.cas.FSLinkedHashSet</code> | |
store Feature Structures in a (Linked) HashSet, using whatever is defined
as the Feature Structure's <code>equals</code> and <code>hashCode</code>.
<blockquote>
<para>You may customize the particular equals and hashCode by creating a wrapper class that
is a subclass of the type of interest which forwards to the underlying Feature Structure, but
has its own definition of <code>equals</code> and <code>hashCode</code>.
</para> | |
</blockquote> | |
</para> | |
<para>The CAS data form is held in an FSArray consisting of the members of the set.</para> | |
<para>If you want a predictable iteration order, use FSLinkedHashSet instead of FSHashSet.
</para> | |
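<para>The wrapper idea and the effect of a linked set can be sketched with ordinary Java classes. Everything in this
sketch is hypothetical and only illustrates the pattern: <code>Token</code> stands in for a Feature Structure type
with identity-based equality, <code>TokenByText</code> is a wrapper subclass with its own <code>equals</code> and
<code>hashCode</code>, and a JDK <code>LinkedHashSet</code> stands in for FSLinkedHashSet.</para>

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Feature Structure type; equality is by identity.
class Token {
  private final String text;

  Token(String text) {
    this.text = text;
  }

  String getText() {
    return text;
  }
}

// Wrapper subclass: forwards to the wrapped Token, but compares by text.
final class TokenByText extends Token {
  private final Token delegate;

  TokenByText(Token delegate) {
    super(delegate.getText());
    this.delegate = delegate;
  }

  @Override
  String getText() {
    return delegate.getText();  // forward to the underlying object
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TokenByText && getText().equals(((TokenByText) o).getText());
  }

  @Override
  public int hashCode() {
    return getText().hashCode();
  }
}

final class SetOrderDemo {
  private SetOrderDemo() {}

  /** Returns the distinct texts in first-insertion order, using a linked set of wrappers. */
  static List<String> distinctTextsInInsertionOrder(String... texts) {
    Set<TokenByText> set = new LinkedHashSet<>();
    for (String t : texts) {
      set.add(new TokenByText(new Token(t)));
    }
    List<String> out = new ArrayList<>();
    for (TokenByText t : set) {
      out.add(t.getText());
    }
    return out;
  }
}
```

<para>With a plain hash set the same elements would be retained, but the iteration order would not be guaranteed to
match the insertion order.</para>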
</section> | |
<section id="uv3.custom_java_objects.semibuiltin_Int2FS"> | |
<title>Int2FS Int to Feature Structure map</title> | |
<para>Some applications find it convenient to have a map from ints to Feature Structures. | |
In UIMA V2, they made use of the low level CAS APIs that allowed | |
getting a Feature Structure from an int id using <code>ll_getFSForRef(int)</code>.
</para> | |
<para>In v3, use of the low level APIs in this manner | |
can be enabled, but is discouraged, because it prevents garbage collection of non-reachable Feature Structures. | |
</para> | |
<para><code>org.apache.uima.jcas.cas.Int2FS&lt;T&gt;</code> maps from <code>int</code>s to Feature Structures
of type T.
This provides an alternative way to have int -&gt; FS maps,
with the user controlling exactly what gets added to them,
and with support for removing entries and clearing the map under application control.
</para>
<para>The <code>iterator()</code> method returns an Iterator over <code>IntEntry&lt;T&gt;</code> objects; these
are like Java <code>Entry&lt;K, V&gt;</code> objects except the key is an int.</para>
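<para>The general shape of such a map can be sketched in plain Java. The class <code>IntKeyMapSketch</code> and its
nested <code>IntEntry</code> are hypothetical stand-ins, not the UIMA <code>Int2FS</code> implementation; for brevity
the sketch is backed by a JDK <code>TreeMap</code>, so keys are boxed internally and iteration is in ascending key
order, neither of which is claimed for the real class.</para>

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an int-keyed map with explicit put/remove/clear; not UIMA API.
final class IntKeyMapSketch<T> implements Iterable<IntKeyMapSketch.IntEntry<T>> {

  /** Like java.util.Map.Entry, except that the key is a primitive int. */
  static final class IntEntry<T> {
    private final int key;
    private final T value;

    IntEntry(int key, T value) {
      this.key = key;
      this.value = value;
    }

    int getKey() {
      return key;
    }

    T getValue() {
      return value;
    }
  }

  private final Map<Integer, T> map = new TreeMap<>();

  T put(int key, T value) {
    return map.put(key, value);
  }

  T get(int key) {
    return map.get(key);
  }

  T remove(int key) {
    return map.remove(key);
  }

  void clear() {
    map.clear();
  }

  int size() {
    return map.size();
  }

  /** Iterates over the entries, exposing each key as a primitive int. */
  @Override
  public Iterator<IntEntry<T>> iterator() {
    final Iterator<Map.Entry<Integer, T>> it = map.entrySet().iterator();
    return new Iterator<IntEntry<T>>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public IntEntry<T> next() {
        Map.Entry<Integer, T> e = it.next();
        return new IntEntry<>(e.getKey(), e.getValue());
      }
    };
  }
}
```

<para>Because entries are added and removed only by the application, objects that are no longer referenced elsewhere
can be dropped from the map and become eligible for garbage collection.</para>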
</section> | |
</section> | |
<section id="uv3.custom_java_objects.design"> | |
<title>Design for reuse</title> | |
<para>While it is possible to have a single custom JCas class implement multiple Java Objects, this is | |
typically not a good design practice, as it reduces reusability. It is usually better to | |
implement one custom Java object per JCas class, with an associated UIMA type, and have that as | |
the reusable entity. | |
</para> | |
</section> | |
</chapter> | |