<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/references/ref.cas/" >
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ugr.ref.cas">
<title>CAS Reference</title>
<para>The CAS (Common Analysis System) is the part of the Unstructured Information
Management Architecture (UIMA) that is concerned with creating and handling the data
that annotators manipulate.</para>
<para>Java users typically use the JCas (Java interface to the CAS) when manipulating
objects in the CAS. This chapter describes an alternative interface to the CAS which
allows discovery and specification of types and features at run time. It is recommended
when the calling code cannot know ahead of time which type system it will be dealing
with.</para>
<para>Use of the CAS as described here is also recommended (or necessary) when components add
to the definitions of types of other components. This UIMA feature allows users to add features
to a type that was already defined elsewhere. When this feature is used in conjunction with the
JCas, it can lead to problems with class loading. This is because different JCas representations
of a single type are generated by the different components, and only one of them is loaded
(unless you are using PEAR descriptors). Note:
we do not recommend that you add features to pre-existing types. A type should be defined in one
place only, and then there is no problem with using the JCas. However, if you do use this feature,
do not use the JCas. Similarly, if you distribute your components for inclusion in somebody else's
UIMA application, and you're not sure that they won't add features to your types, do not use the
JCas for the same reasons.
</para>
<section id="ugr.ref.cas.javadocs">
<title>Javadocs</title>
<para>The subdirectory <literal>docs/api</literal> contains the Javadocs for
all the classes, methods, and constants of the APIs discussed here. Please
refer to them for details, specifically in the
packages <literal>org.apache.uima.cas.*</literal>.</para>
</section>
<section id="ugr.ref.cas.overview">
<title>CAS Overview</title>
<para>There are three<footnote><para>A fourth part, the Subject of Analysis,
is discussed in <olink targetdoc="&uima_docs_tutorial_guides;"
/> <olink targetdoc="&uima_docs_tutorial_guides;"
targetptr="ugr.tug.aas"/>.</para></footnote> main parts to the CAS: the type system, data creation and
manipulation, and indexing. We will start with a brief
description of these components.</para>
<section id="ugr.ref.cas.type_system">
<title>The Type System</title>
<para>The type system specifies what kind of data you will be able to manipulate in your
annotators. The type system defines two kinds of entities, types and features. Types
are arranged in a single inheritance tree and define the kinds of entities (objects)
you can manipulate in the CAS. Features optionally specify slots or fields within a
type. In Java terms, a CAS Type corresponds to a Java class, and its CAS
Features to fields within that class. A critical difference is that CAS types have no
methods; they are just data structures with named slots (features). These features can
have as values primitive things like integers, floating point numbers, and strings,
and they also can hold references to other instances of objects in the CAS. We call
instances of the data structures declared by the type system <quote>feature
structures</quote> (not to be confused with <quote>features</quote>). Feature
structures are similar to the many variants of record structures found in computer
science.<footnote><para> The name <quote>feature structure</quote> comes from
terminology used in linguistics.</para></footnote></para>
<para>Each CAS Type defines a supertype; it is a subtype of that supertype. This means
that any features that the supertype defines are features of the subtype; in other
words, it inherits its supertype&apos;s features. Only single inheritance is
supported; a type&apos;s feature set is the union of all of the features in its
supertype hierarchy. There is a built-in type called uima.cas.TOP; this is the top,
root node of the inheritance tree. It defines no features.</para>
<para>The values that can be stored in features are either built-in primitive values or
references to other feature structures. The primitive values are
<literal>boolean</literal>, <literal>byte</literal>,
<literal>short</literal> (16 bit integers), <literal>integer</literal> (32
bit), <literal>long</literal> (64 bit), <literal>float</literal> (32 bit),
<literal>double</literal> (64 bit floats) and strings; the official names of these
are <literal>uima.cas.Boolean</literal>, <literal>uima.cas.Byte</literal>,
<literal>uima.cas.Short</literal>, <literal>uima.cas.Integer</literal>,
<literal>uima.cas.Long</literal>, <literal>uima.cas.Float</literal>,
<literal>uima.cas.Double</literal> and <literal>uima.cas.String</literal>.
The strings are Java strings, and characters are Java characters. Technically, this means
that characters are UTF-16 code units, which are not quite the same as Unicode characters
(code points). This distinction should make no difference for almost all applications.
The CAS also defines other basic built-in types for arrays of these, plus arrays of
references to other objects, called <literal>uima.cas.IntegerArray</literal>,
<literal>uima.cas.FloatArray</literal>,
<literal>uima.cas.StringArray</literal>,
<literal>uima.cas.FSArray</literal>, etc.</para>
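<para>The difference between UTF-16 code units and Unicode code points can be seen in plain
Java. The following is a minimal sketch that uses no UIMA API; the class name
<literal>Utf16Demo</literal> and its helper methods are made up for this illustration.</para>
<programlisting><![CDATA[final class Utf16Demo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair of two chars.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(codeUnits(clef));   // prints 2
        System.out.println(codePoints(clef));  // prints 1
    }
}]]></programlisting>
<para>For text that stays within the Basic Multilingual Plane the two counts agree, which is
why the distinction rarely matters in practice.</para>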
<para>The CAS also defines a built-in type called
<literal>uima.tcas.Annotation</literal> which inherits from
<literal>uima.cas.AnnotationBase</literal> which in turn inherits from
<literal>uima.cas.TOP</literal>. There are two features defined by this type,
called <literal>begin</literal> and <literal>end</literal>, both of which are
integer valued.</para>
</section>
<section id="ugr.ref.cas.creating_accessing_manipulating_data">
<title>Creating, accessing and manipulating data</title>
<titleabbrev>Creating/Accessing/Changing data</titleabbrev>
<para>
Creating and accessing data in the CAS requires knowledge about the types and features
defined in the type system. The idea is similar to other data access APIs, such as the XML
DOM or SAX APIs, or database access APIs such as JDBC. Unlike those APIs, however, the
CAS does not use the names of type system entities directly in the APIs. Rather, you use
the type system to access type and feature entities by name, then use these entities in the
data manipulation APIs. This can be compared to the Java reflection APIs: the type system
is comparable to the Java class loader, and the type and feature objects to the
<literal>java.lang.Class</literal> and <literal>java.lang.reflect.Field</literal> classes.
</para>
<para>
Why does it have to be this complicated? You wouldn&apos;t normally use reflection to create a
Java object, either. As mentioned earlier, the JCas provides the more straightforward
method to manipulate CAS data. The CAS access methods described here need only be used for
generic types of applications that need to be able to handle any kind of data (e.g., generic
tooling) or when the JCas may not be used for other reasons. The generic kinds of applications
are exactly the ones where you would use the reflection API in Java as well.
</para>
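<para>The comparison with Java reflection can be made concrete with a small sketch. It uses
only <literal>java.lang.reflect</literal>, not the CAS APIs, and the <literal>Person</literal>
class and <literal>setAndGet</literal> helper are hypothetical names chosen for this
illustration. The name is used once to obtain a <literal>Field</literal> handle, and the handle
is then used to read and write the data, much as type and feature objects are used with the
CAS.</para>
<programlisting><![CDATA[import java.lang.reflect.Field;

final class ReflectionAnalogy {
    // Hypothetical data class; stands in for a CAS type with one feature.
    public static final class Person {
        public String firstName;
    }

    // Step 1: resolve the field handle by name (like looking up a Feature).
    // Step 2: use the handle, not the name, to write and read the data.
    static String setAndGet(Object target, String fieldName, String value) {
        try {
            Field field = target.getClass().getField(fieldName);
            field.set(target, value);
            return (String) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No accessible field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAndGet(new Person(), "firstName", "Ada")); // prints Ada
    }
}]]></programlisting>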
</section>
<section id="ugr.ref.cas.creating_using_indexes">
<title>Creating and using indexes</title>
<para>Each view of a CAS provides a set of indexes for that view. Instances of Types (that is, Feature
Structures) can be added to a view&apos;s indexes. These indexes provide
a way for annotators to locate existing data in the CAS, using a specific index (or the
method <literal>getAllIndexedFS</literal> of the object <literal>FSIndexRepository</literal>) to
retrieve the Feature Structures that were previously created.
Newly created Feature Structures are not automatically added to the indexes; you choose which
Feature Structures to add and use one of several APIs to add them.
</para>
<para>Indexes are named and are associated with a CAS Type; they are used to index
instances of that CAS type (including instances of that type&apos;s subtypes). If
you are using multiple views (see <olink
targetdoc="&uima_docs_tutorial_guides;"/> <olink
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>),
each view contains a separate instantiation of all of the indexes.
To access an index, you
minimally need to know its name. A CAS view provides an index repository which you can
query for indexes for that view. Once you have a handle to an index, you can get
information about the feature structures in the index, the size of the index, as well
as an iterator over the feature structures.</para>
<para>There are three kinds of indexes:
<itemizedlist spacing="compact">
<listitem>
<para>bag - no ordering</para>
</listitem>
<listitem>
<para>set - uses a user-specified set of keys to define equality; holds one instance of the set of equal items.</para>
</listitem>
<listitem>
<para>sorted - uses a user-specified set of keys to define ordering.</para>
</listitem>
</itemizedlist>
</para>
<para>For set indexes, the comparator keys are augmented with an implicit additional field: the type of the
feature structure. This means that an index over Annotations (of which Token is a subtype), with the "begin" value as its key,
will behave as follows:
<itemizedlist>
<listitem><para>If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes,
only one of them will be in the index.</para>
</listitem>
<listitem><para>If you make one Token and one Annotation, both having a begin value of 17, and add both of them to the indexes,
both of them will be in the index (because the types are different).
</para></listitem>
</itemizedlist>
</para>
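<para>The effect of the implicit type key can be modeled with a
<literal>java.util.TreeSet</literal>. This is only a sketch of the behavior described above,
not how the framework implements set indexes; <literal>Fs</literal>, <literal>KEYS</literal>
and <literal>indexedCount</literal> are hypothetical names.</para>
<programlisting><![CDATA[import java.util.Comparator;
import java.util.TreeSet;

final class SetIndexModel {
    // Hypothetical stand-in for an indexed feature structure.
    static final class Fs {
        final String typeName;
        final int begin;

        Fs(String typeName, int begin) {
            this.typeName = typeName;
            this.begin = begin;
        }
    }

    // Equality for the modeled set index: the declared key ("begin")
    // plus the implicit extra key, the type of the feature structure.
    static final Comparator<Fs> KEYS =
            Comparator.comparingInt((Fs fs) -> fs.begin).thenComparing(fs -> fs.typeName);

    // Adds the items to a fresh set and returns how many were kept.
    static int indexedCount(Fs... items) {
        TreeSet<Fs> index = new TreeSet<>(KEYS);
        for (Fs fs : items) {
            index.add(fs);
        }
        return index.size();
    }

    public static void main(String[] args) {
        // Two Tokens with the same begin value: only one is kept.
        System.out.println(indexedCount(new Fs("Token", 17), new Fs("Token", 17)));      // prints 1
        // A Token and an Annotation with the same begin value: both are kept.
        System.out.println(indexedCount(new Fs("Token", 17), new Fs("Annotation", 17))); // prints 2
    }
}]]></programlisting>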
<para>Indexes are defined in the XML descriptor metadata for the application. Each CAS
View has its own, separate instantiation of indexes based on these definitions,
kept in the view's index repository. When you obtain an index, it is always from a
particular CAS view's index repository.
When you index an item, it is always added to all indexes where it
belongs, within just the view's repository. You can specify different repositories
(associated with different CAS views) to use; a given Feature Structure instance
may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</para>
<para>Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.</para>
<para>You can also get iterators from indexes;
iterators allow you to enumerate the feature structures in an index. There are two kinds of iterators supported:
the regular Java iterator API, and a specific FS iterator API
where the usual Java iterator APIs (<literal>hasNext()</literal> and <literal>next()</literal>)
are augmented by <literal>isValid()</literal>, <literal>moveToNext()</literal> and <literal>moveToPrevious()</literal> (which do
not return an element) and <literal>get()</literal>. Finally, there is a <literal>moveTo(FeatureStructure)</literal>
API, which, for sorted indexes, moves the iteration point to the left-most (among otherwise "equal") item
in the index which compares "equal" to the given FeatureStructure, using the index's defined comparator.
</para>
<para>
Which API style you use is up to you,
but we do not recommend mixing the styles as the results are sometimes unexpected. If you
just want to iterate over an index from start to finish, either style is equally appropriate.
If you also use <literal>moveTo(FeatureStructure fs)</literal> and
<literal>moveToPrevious()</literal>, it is better to use the special FS iterator style.
</para>
<note><para>The reason not to mix these styles is that you might expect
<literal>next()</literal> followed by <literal>moveToPrevious()</literal> to always work. It does not, because
<literal>next()</literal> returns the "current" element and then advances to the next position, which might be
beyond the last element. At that point, the iterator becomes "invalid", and
<literal>moveToNext()</literal> and <literal>moveToPrevious()</literal> no longer move the iterator. You can still
call <literal>moveToFirst()</literal>, <literal>moveToLast()</literal>, or <literal>moveTo(FS)</literal> on the iterator to reset it.</para></note>
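<para>The pitfall described in the note can be reproduced with a toy cursor. The class below
is a hypothetical model written for this illustration, assuming only the behavior stated in
the note; it is not the UIMA <literal>FSIterator</literal> implementation.</para>
<programlisting><![CDATA[import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Toy cursor that models only the behavior described in the note.
final class CursorModel<T> {
    private final List<T> items;
    private int pos = 0;

    CursorModel(List<T> items) {
        this.items = items;
    }

    boolean isValid() {
        return pos >= 0 && pos < items.size();
    }

    // Java iterator style: return the current element, then advance.
    T next() {
        if (!isValid()) {
            throw new NoSuchElementException();
        }
        return items.get(pos++);
    }

    // FS iterator style: return the current element without moving.
    T get() {
        if (!isValid()) {
            throw new NoSuchElementException();
        }
        return items.get(pos);
    }

    // Moves back one position; has no effect once the cursor is invalid.
    void moveToPrevious() {
        if (isValid()) {
            pos--;
        }
    }

    // Resets the cursor to the last element.
    void moveToLast() {
        pos = items.size() - 1;
    }

    public static void main(String[] args) {
        CursorModel<String> it = new CursorModel<>(Arrays.asList("a", "b"));
        it.next();
        it.next();                         // now past the last element
        it.moveToPrevious();               // ignored: the cursor is invalid
        System.out.println(it.isValid());  // prints false
        it.moveToLast();                   // explicit reset
        System.out.println(it.get());      // prints b
    }
}]]></programlisting>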
<para>Indexes are created by specifying them in the annotator&apos;s or
aggregate&apos;s resource descriptor. An index specification includes its name,
the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.
The keys are used for set and sorted indexes, and specify what values are used for
ordering, or (for sets) what values are used to determine set equality.
When a CAS pipeline is created, all index
specifications are combined; duplicate definitions (having the same name) are
allowed only if their definitions are the same. </para>
<para>Feature structure instances need to be explicitly added to the index repository by a
method call. Feature structures that are not indexed will not be visible to other
annotators (unless they can be reached through a feature of
another feature structure that is indexed, or through a chain of such references).</para>
<para>The framework defines an unnamed bag index which indexes all types. The
only access provided for this index is the getAllIndexedFS(type) method on the
index repository, which returns an iterator over all indexed instances of the
specified type (including its subtypes) for that CAS View.
</para>
<para>The framework defines one standard, built-in annotation index, called
AnnotationIndex, which indexes the <literal>uima.tcas.Annotation</literal>
type: all feature structures of type <literal>uima.tcas.Annotation</literal> or
its subtypes are automatically indexed with this built-in index.</para>
<para>This index orders annotations first by the value of the
<quote>begin</quote> feature (in ascending order), then by the value of the
<quote>end</quote> feature (in descending order), and finally by the
Type Priority. This ordering ensures that
longer annotations starting at the same spot come before shorter ones. For Subjects
of Analysis other than Text, this may not be an appropriate index.</para>
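<para>The begin and end part of this ordering can be sketched as a plain Java comparison.
The sketch below works on hypothetical <literal>{begin, end}</literal> pairs rather than real
annotations, and it leaves out the final tie-breaker, the Type Priority.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AnnotationOrderModel {
    // Compares {begin, end} pairs: begin ascending, then end descending.
    // The final tie-breaker, type priority, is not modeled here.
    static int compare(int[] a, int[] b) {
        if (a[0] != b[0]) {
            return Integer.compare(a[0], b[0]);
        }
        return Integer.compare(b[1], a[1]);
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<int[]> sorted(List<int[]> spans) {
        List<int[]> copy = new ArrayList<>(spans);
        copy.sort(AnnotationOrderModel::compare);
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(
                new int[] {5, 9}, new int[] {0, 3}, new int[] {0, 13});
        for (int[] span : sorted(spans)) {
            System.out.println(span[0] + ".." + span[1]);
        }
        // prints 0..13, then 0..3, then 5..9
    }
}]]></programlisting>
<para>The longer span starting at 0 comes before the shorter one, so an enclosing annotation
is visited before the annotations it covers.</para>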
<para>In addition to normal iterators, there is a <literal>select</literal> API, documented
in the Version 3 User's Guide, which provides additional capabilities for accessing
Feature Structures via the indexes.</para>
</section>
</section>
<section id="ugr.ref.cas.builtin_types">
<title>Built-in CAS Types</title>
<para>The CAS has two kinds of built-in types &ndash; primitive and non-primitive. The
primitive types are:
<itemizedlist spacing="compact">
<listitem><para>uima.cas.Boolean</para></listitem>
<listitem><para>uima.cas.Byte</para></listitem>
<listitem><para>uima.cas.Short</para></listitem>
<listitem><para>uima.cas.Integer</para></listitem>
<listitem><para>uima.cas.Long</para></listitem>
<listitem><para>uima.cas.Float</para></listitem>
<listitem><para>uima.cas.Double</para></listitem>
<listitem><para>uima.cas.String</para></listitem>
</itemizedlist></para>
<para>The <literal>Byte</literal>, <literal>Short</literal>, <literal>Integer</literal>, and <literal>Long</literal> types are
all signed integer types, of length 8, 16, 32, and 64 bits. The
<literal>Float</literal> type is 32 bit floating point, and the
<literal>Double</literal> type is 64 bit floating point. The
<literal>String</literal> type can be subtyped to create sets of allowed values; see
<olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.xml.component_descriptor.type_system.string_subtypes"/>.
These subtypes can be used to specify the range of a String-valued feature. They act like
Strings, but have additional checking to ensure that any value set into them
is one of the allowed values, or null (which is the value if it is not set).
Note that the other primitive types cannot be used
as a supertype for another type definition; only
<literal>uima.cas.String</literal> can be subtyped.</para>
<para>The non-primitive types exist in a type hierarchy; the top of the hierarchy is the
type <literal>uima.cas.TOP</literal>. All other non-primitive types inherit from
some supertype.</para>
<para>There are 9 built-in array types. The size of an array is specified when it is
created and is fixed from then on. They are named:
<itemizedlist spacing="compact">
<listitem><para>uima.cas.BooleanArray</para></listitem>
<listitem><para>uima.cas.ByteArray</para></listitem>
<listitem><para>uima.cas.ShortArray</para></listitem>
<listitem><para>uima.cas.IntegerArray</para></listitem>
<listitem><para>uima.cas.LongArray</para></listitem>
<listitem><para>uima.cas.FloatArray</para></listitem>
<listitem><para>uima.cas.DoubleArray</para></listitem>
<listitem><para>uima.cas.StringArray</para></listitem>
<listitem><para>uima.cas.FSArray</para></listitem>
</itemizedlist></para>
<para>The <literal>uima.cas.FSArray</literal> type is an array whose elements are
arbitrary other feature structures (instances of non-primitive types).</para>
<para>The JCas cover classes for the array types support the Iterable API, so you may
write enhanced for loops over instances of these. For example:
<programlisting>FSArray&lt;MyType&gt; myArray = ...
for (MyType fs : myArray) {
  some_method(fs);
}</programlisting>
</para>
<para>There are 3 built-in types associated with the artifact being analyzed:
<itemizedlist spacing="compact">
<listitem><para>uima.cas.AnnotationBase</para></listitem>
<listitem><para>uima.tcas.Annotation</para></listitem>
<listitem><para>uima.tcas.DocumentAnnotation</para></listitem>
</itemizedlist></para>
<para>The <literal>AnnotationBase</literal> type defines one system-used feature
which specifies for an annotation the subject of analysis (Sofa) to which it refers. The
Annotation type extends from this and defines 2 features, taking
<literal>uima.cas.Integer</literal> values, called <literal>begin</literal>
and <literal>end</literal>. The <literal>begin</literal> feature typically
identifies the start of a span of text the annotation covers; the
<literal>end</literal> feature identifies the end. The values refer to character
offsets; the starting index is 0. An annotation of the word <quote>CAS</quote> in a text
<quote>CAS Reference</quote> would have a start index of 0, and an end index of 3; the
difference between end and start is the length of the span the annotation refers
to.</para>
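<para>The offset convention can be checked with <literal>String.substring</literal>, which
uses the same begin-inclusive, end-exclusive positions. This is a plain Java sketch;
<literal>SpanDemo</literal> and <literal>coveredText</literal> are names invented for the
illustration.</para>
<programlisting><![CDATA[final class SpanDemo {
    // Returns the text covered by the span from begin (inclusive) to end (exclusive).
    static String coveredText(String document, int begin, int end) {
        return document.substring(begin, end);
    }

    public static void main(String[] args) {
        String document = "CAS Reference";
        int begin = 0;
        int end = 3;
        System.out.println(coveredText(document, begin, end)); // prints CAS
        System.out.println(end - begin);                       // prints 3, the span length
    }
}]]></programlisting>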
<para>Annotations are always with respect to some Sofa (Subject of Analysis &ndash; see
<olink targetdoc="&uima_docs_tutorial_guides;"/>
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>
.</para>
<note><para>Artifacts which are not text strings may have a different interpretation of
the meaning of begin and end, or may define their own kind of annotation, extending from
<literal>AnnotationBase</literal>. </para></note>
<para id="ugr.ref.cas.document_annotation">The <literal>DocumentAnnotation</literal> type has one special instance. It is
a subtype of the Annotation type, and the built-in definition defines one feature,
<literal>language</literal>, which is a string indicating the language of the
document in the CAS. The value of this language feature is used by the system to control
flow among annotators when the <quote>CapabilityLanguageFlow</quote> mode is used,
allowing the flow to skip over annotators that don&apos;t process particular
languages. Users may extend this type by adding additional features to it, using the XML
Descriptor element for defining a type.</para>
<note><para>
We do <emphasis>not</emphasis> recommend extending the <literal>DocumentAnnotation</literal>
type. If you do, you must <emphasis>not</emphasis> use the JCas, for the reasons stated
earlier.
</para></note>
<para>Each CAS view has a different associated instance of the
<literal>DocumentAnnotation</literal> type. On the CAS, use
<literal>getDocumentAnnotation()</literal> to access the
<literal>DocumentAnnotation</literal>.</para>
<para>There are also built-in types supporting linked lists, similar to the ones available in
Java and other programming languages. Their use is
constrained by the usual properties of linked lists: not very space efficient, no (efficient)
random access, but an easy choice if you don't know how long your list will be ahead of time. The
implementation is type specific; there are different list building objects for the
Float, Integer, and String primitive types, plus one for general feature structures. Here are the type names:
<itemizedlist spacing="compact">
<listitem><para>uima.cas.FloatList</para></listitem>
<listitem><para>uima.cas.IntegerList</para></listitem>
<listitem><para>uima.cas.StringList</para></listitem>
<listitem><para>uima.cas.FSList</para>
<para></para></listitem>
<listitem><para>uima.cas.EmptyFloatList</para></listitem>
<listitem><para>uima.cas.EmptyIntegerList</para></listitem>
<listitem><para>uima.cas.EmptyStringList</para></listitem>
<listitem><para>uima.cas.EmptyFSList</para>
<para></para></listitem>
<listitem><para>uima.cas.NonEmptyFloatList</para></listitem>
<listitem><para>uima.cas.NonEmptyIntegerList</para></listitem>
<listitem><para>uima.cas.NonEmptyStringList</para></listitem>
<listitem><para>uima.cas.NonEmptyFSList</para></listitem>
</itemizedlist></para>
<para>For each of the element kinds <literal>Float</literal>,
<literal>Integer</literal>, <literal>String</literal> and
<literal>FeatureStructure</literal>, there is a base type, for instance,
<literal>uima.cas.FloatList</literal>. For each of these, there are two subtypes,
corresponding to a non-empty element, and a marker that serves to indicate the end of the
list, or an empty list. The non-empty types define two features &ndash;
<literal>head</literal> and <literal>tail</literal>. The head feature holds the
particular value for that part of the list. The tail refers to the next list object
(either a non-empty one or the empty version to indicate the end of the list).</para>
<para>For JCas users, the NonEmptyXyzList classes have a 3 argument constructor
that lets you specify the head and tail values. The JCas
cover classes for these implement
a <code>push(item)</code> method which creates a new non-empty node, sets the <code>head</code> value
to <code>item</code> and the tail to the node it is called on, and returns the new node.
These classes also implement Iterable, so you can use the enhanced Java <code>for</code> loop.
The iterator stops when it gets to the end of the list, determined by either the tail being null or
the element being one of the EmptyXyzList elements.
Here's a StringList example:
<programlisting>StringList sl = jcas.emptyStringList();
sl = sl.push("2");
sl = sl.push("1");
for (String s : sl) {
  someMethod(s); // some sample use
}</programlisting>
</para>
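<para>The head and tail structure and the <code>push</code> behavior can be modeled in a few
lines of plain Java. The class below is a toy model written for this illustration, with the
hypothetical name <literal>ConsList</literal>; it is not one of the UIMA list classes.</para>
<programlisting><![CDATA[import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy model of the empty / non-empty list node pattern.
final class ConsList implements Iterable<String> {
    // Marker node that stands for the empty list and for the end of a list.
    static final ConsList EMPTY = new ConsList(null, null);

    final String head;    // value held by this node (unused for EMPTY)
    final ConsList tail;  // next node, or EMPTY at the end of the list

    private ConsList(String head, ConsList tail) {
        this.head = head;
        this.tail = tail;
    }

    // Creates a new non-empty node in front of the node it is called on.
    ConsList push(String item) {
        return new ConsList(item, this);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private ConsList node = ConsList.this;

            @Override
            public boolean hasNext() {
                // Stop on a null tail or on the empty marker.
                return node != null && node != EMPTY;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String value = node.head;
                node = node.tail;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        ConsList list = EMPTY.push("2").push("1");
        System.out.println(String.join(",", list)); // prints 1,2
    }
}]]></programlisting>
<para>Because <code>push</code> adds at the front, elements are pushed in reverse of the
order in which they should be iterated.</para>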
<para>There are no other built-in types. Users are free to define their own type systems,
building upon these types.</para>
</section>
<section id="ugr.ref.cas.accessing_the_type_system">
<title>Accessing the type system</title>
<para>
During annotator processing, or outside an annotator, access the type system by calling
<literal>CAS.getTypeSystem()</literal>.
</para>
<para>However, CAS annotators implement an additional method,
<literal>typeSystemInit()</literal>, which is called by the UIMA framework before the
annotator&apos;s process method. This method, implemented by the annotator writer,
is passed a reference to the CAS&apos;s type system metadata. The method typically uses
the type system APIs to obtain type and feature objects corresponding to all the types
and features the annotator will be using in its process method. This initialization
step should not be done during an annotator&apos;s initialize method since the type
system can change after the initialize method is called; it should not be done during the
process method, since this is presumably work that is identical for each incoming
document, and so should be performed only when the type system changes (which will be a
rare event). The UIMA framework guarantees it will call the <literal>typeSystemInit
</literal>method of an annotator whenever the type system changes, before calling the
annotator&apos;s <literal>process()</literal> method.</para>
<para>The initialization done by <literal>typeSystemInit()</literal> is done by the
UIMA framework when you use the JCas APIs; you only need to provide a
<literal>typeSystemInit()</literal> method, as described here, when you are not using
the JCas approach.</para>
<section id="ugr.ref.cas.type_system.printer_example">
<title>TypeSystemPrinter example</title>
<para>Here is a code fragment that, given a CAS Type System, will print a list of all
types.</para>
<programlisting>// Get all type names from the type system
// and print them to stdout.
private void listTypes1(TypeSystem ts) {
  System.out.println("Types in the type system:");
  for (Type t : ts) {
    // Print its name.
    System.out.println(t.getName());
  }
}</programlisting>
<para>This method is passed the type system as a parameter. The type system provides
an iterator
over all the types, which the enhanced for loop uses. If you run this against a CAS created with no additional
user-defined types, you should see something like this on the console:</para>
<programlisting>Types in the type system:
uima.cas.Boolean
uima.cas.Byte
uima.cas.Short
uima.cas.Integer
uima.cas.Long
uima.cas.ArrayBase
...
</programlisting>
<para>If the type system had user-defined types these would show up too. Note that some
of these types are not directly creatable &ndash; they are types used by the framework
in the type hierarchy (e.g. uima.cas.ArrayBase).</para>
<para>CAS type names include a name-space prefix. The components of a type name are
separated by the dot (.). A type name component must start with a Unicode letter,
followed by an arbitrary sequence of letters, digits and the underscore (_). By
convention, the last component of a type name starts with an uppercase letter, the
rest start with a lowercase letter.</para>
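<para>The naming rule above can be approximated with a regular expression. The sketch below
is an assumption about how to encode the rule, not the check the framework itself performs:
it treats <quote>letter</quote> as the Unicode category <literal>L</literal> and
<quote>digit</quote> as the category <literal>Nd</literal>, and the class name
<literal>TypeNameCheck</literal> is hypothetical.</para>
<programlisting><![CDATA[import java.util.regex.Pattern;

final class TypeNameCheck {
    // One component: a Unicode letter, then letters, digits, or underscores.
    private static final String COMPONENT = "\\p{L}[\\p{L}\\p{Nd}_]*";

    // Full name: one or more components separated by dots.
    private static final Pattern NAME =
            Pattern.compile(COMPONENT + "(\\." + COMPONENT + ")*");

    static boolean isWellFormed(String typeName) {
        return NAME.matcher(typeName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("com.xyz.proj.Entity")); // prints true
        System.out.println(isWellFormed("com.1xyz.Entity"));     // prints false
    }
}]]></programlisting>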
<para>Listing the type names is mildly useful, but it would be even better if we could see
the inheritance relation between the types. The following code prints the
inheritance tree in indented format.</para>
<programlisting>private static final int INDENT = 2;

private void listTypes2(TypeSystem ts) {
  // Get the root of the inheritance tree.
  Type top = ts.getTopType();
  // Recursively print the tree.
  printInheritanceTree(ts, top, 0);
}

private void printInheritanceTree(TypeSystem ts, Type type, int level) {
  indent(level); // Print indentation.
  System.out.println(type.getName());
  // Get a vector of the immediate subtypes.
  Vector subTypes =
      ts.getDirectlySubsumedTypes(type);
  ++level; // Increase the indentation level.
  for (int i = 0; i &lt; subTypes.size(); i++) {
    // Print the subtypes.
    printInheritanceTree(ts, (Type) subTypes.get(i), level);
  }
}

// A simple, inefficient indenter
private void indent(int level) {
  int spaces = level * INDENT;
  for (int i = 0; i &lt; spaces; i++) {
    System.out.print(" ");
  }
}</programlisting>
<para> This example shows that you can traverse the type hierarchy by starting at the top
with TypeSystem.getTopType and by retrieving subtypes with
<literal>TypeSystem.getDirectlySubsumedTypes()</literal>.</para>
<para>The type system APIs (see the Javadocs) also let you access the features, as well as
the allowed value type for each feature. Here is sample code which prints out all the
features of all the types, together with the allowed value types (the feature
<quote>range</quote>). Each feature has a <quote>domain</quote> which is the type
where it is defined, as well as a <quote>range</quote>.
<programlisting>private void listFeatures2(TypeSystem ts) {
  Iterator featureIterator = ts.getFeatures();
  Feature f;
  System.out.println("Features in the type system:");
  while (featureIterator.hasNext()) {
    f = (Feature) featureIterator.next();
    System.out.println(
        f.getShortName() + ": " +
        f.getDomain() + " -&gt; " + f.getRange());
  }
  System.out.println();
}</programlisting></para>
<para>We can ask a feature object for its domain (the type it is defined on) and its range
(the type of the value of the feature). The terminology derives from the fact that
features can be viewed as functions on subspaces of the object space.</para>
</section>
<section id="ugr.ref.cas.cas_apis_create_modify_feature_structures">
<title>Using the CAS APIs to create and modify feature structures</title>
<titleabbrev>Using CAS APIs: Feature Structures</titleabbrev>
<para>Assume a type system declaration that defines two types: Entity and Person.
Entity has no features defined within it but inherits from uima.tcas.Annotation
&ndash; so it has the begin and end features. Person is, in turn, a subtype of Entity,
and adds firstName and lastName features. CAS type systems are declaratively
specified using XML; the format of this XML is described in <olink
targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.xml.component_descriptor.type_system"/>.
<programlisting><![CDATA[<!-- Type System Definition -->
<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.xyz.proj.Entity</name>
      <description />
      <supertypeName>uima.tcas.Annotation</supertypeName>
    </typeDescription>
    <typeDescription>
      <name>com.xyz.proj.Person</name>
      <description />
      <supertypeName>com.xyz.proj.Entity</supertypeName>
      <features>
        <featureDescription>
          <name>firstName</name>
          <description />
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>lastName</name>
          <description />
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>]]></programlisting></para>
<para>
To be able to access types and features, we need to know their names. The CAS interface defines
constants that hold the names of built-in types and features, for example
<literal>CAS.TYPE_NAME_INTEGER</literal>. It is good programming practice to create such
constants for the types and features you define, for your own use as well as for others who will
be using your annotators.
</para>
<programlisting>/** Entity type name constant. */
public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";
/** Person type name constant. */
public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";
/** First name feature name constant. */
public static final String FIRST_NAME_FEAT_NAME = "firstName";
/** Last name feature name constant. */
public static final String LAST_NAME_FEAT_NAME = "lastName";</programlisting>
<para>Next we define type and feature member variables; these will hold the values of the
type and feature objects needed by the CAS APIs, to be assigned during
<literal>typeSystemInit()</literal>.</para>
<programlisting>// Type system object variables
private TypeSystem typeSystem;
private Type entityType;
private Type personType;
private Feature firstNameFeature;
private Feature lastNameFeature;
private Type stringType;</programlisting>
<para>The type system does not throw an exception if we ask for something that is
not known; it simply returns null. The code therefore checks for this and throws a proper
exception. We require all these types and features to be defined for the annotator to
work. One might imagine situations where certain computations are predicated on some type
or feature being defined in the type system, but that is not the case here.</para>
<programlisting>// Get a type object corresponding to a name.
// If it doesn&apos;t exist, throw an exception.
private Type initType(String typeName)
throws AnnotatorInitializationException {
Type type = typeSystem.getType(typeName);
if (type == null) {
throw new AnnotatorInitializationException(
AnnotatorInitializationException.TYPE_NOT_FOUND,
new Object[] { this.getClass().getName(), typeName });
}
return type;
}
// We add similar code for retrieving feature objects.
// Get a feature object from a name and a type object.
// If it doesn&apos;t exist, throw an exception.
private Feature initFeature(String featName, Type type)
throws AnnotatorInitializationException {
Feature feat = type.getFeatureByBaseName(featName);
if (feat == null) {
throw new AnnotatorInitializationException(
AnnotatorInitializationException.FEATURE_NOT_FOUND,
new Object[] { this.getClass().getName(), featName });
}
return feat;
}</programlisting>
<para>Using these two functions, code for initializing the type system described
above would be:
<programlisting>public void typeSystemInit(TypeSystem aTypeSystem)
throws AnalysisEngineProcessException {
this.typeSystem = aTypeSystem;
// Set type system member variables.
this.entityType = initType(ENTITY_TYPE_NAME);
this.personType = initType(PERSON_TYPE_NAME);
this.firstNameFeature =
initFeature(FIRST_NAME_FEAT_NAME, personType);
this.lastNameFeature =
initFeature(LAST_NAME_FEAT_NAME, personType);
this.stringType = initType(CAS.TYPE_NAME_STRING);
}</programlisting></para>
<para>Note that we initialize the string type by using a type name constant from the
CAS.</para>
</section>
</section>
<section id="ugr.ref.cas.creating_feature_structures">
<title>Creating feature structures</title>
<para>To create feature structures in JCas, we use the Java <quote>new</quote>
operator. In the CAS, we use one of several different API methods on the CAS object,
depending on which of the 10 basic kinds of feature structures we are creating (a plain
feature structure, or an instance of the built-in primitive type arrays or FSArray).
There is also a method to create an instance of a
<literal>uima.tcas.Annotation</literal>, setting the begin and end
values.</para>
<para>Once a feature structure is created, it needs to be added to the CAS indexes (unless
it will be accessed via some reference from another accessible feature structure).
Assuming <literal>aCAS</literal> holds a reference to a CAS, and <literal>token</literal> holds a
reference to a newly created feature structure, the following code adds that
feature structure to all the relevant CAS indexes:</para>
<programlisting> // Add the token to the index repository.
aCAS.addFsToIndexes(token);</programlisting>
<para>There is also a corresponding <literal>removeFsFromIndexes(token)</literal>
method on CAS objects.</para>
<para>As of version 2.4.1, there are two methods you can use on an index repository
to efficiently bulk-remove all
instances of particular types of feature structures from a particular view. One of these,
<code>aCas.getIndexRepository().removeAllIncludingSubtypes(aType)</code>, removes all instances of a particular
type, including instances of its subtypes. The other,
<code>aCas.getIndexRepository().removeAllExcludingSubtypes(aType)</code>, removes only instances of exactly the
specified type. In both cases, the removal is done from the particular view of the CAS referenced
by <code>aCas</code>.</para>
<section id="ugr.ref.cas.updating_indexed_feature_structures">
<title>Updating indexed feature structures</title>
<para>Version 2.7.0 added protection for indexes when feature structure key
value features are updated. By default this protection is automatic, but
at some performance cost. Users may optimize this further.</para>
<para>Protection is needed because some of the indexes (the Sorted and Set types) use comparators
that depend on the values of particular key features. If these values
need to be changed after the feature structure is added to the indexes,
the correct way to do this is to:
<orderedlist spacing="compact">
<listitem><para>completely remove the item from all indexes where it is indexed, in all views
where it is indexed,</para>
</listitem>
<listitem><para>update the value of the features being used as keys,</para></listitem>
<listitem><para>add the item back to the indexes, in all views.</para></listitem>
</orderedlist></para>
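<para>The need for these three steps is not specific to UIMA; any sorted container that orders
its elements by a mutable value behaves the same way. The following self-contained sketch uses a plain
<literal>java.util.TreeSet</literal> as a stand-in for a sorted index. The class <literal>Item</literal>,
its <literal>key</literal> field, and the helper methods are hypothetical names chosen for illustration;
none of them are part of the UIMA API.</para>
<programlisting><![CDATA[import java.util.Comparator;
import java.util.TreeSet;

/** Illustrative model only: a TreeSet plays the role of a sorted index. */
class SortedIndexUpdateSketch {

    /** Hypothetical stand-in for a feature structure with one key feature. */
    static final class Item {
        int key;                 // the value the "index" sorts on
        final String label;      // used only to make the output readable
        Item(int key, String label) { this.key = key; this.label = label; }
    }

    /** Creates the stand-in index, ordered by key and then by label. */
    static TreeSet<Item> newIndex() {
        return new TreeSet<>(Comparator
            .comparingInt((Item i) -> i.key)
            .thenComparing((Item i) -> i.label));
    }

    /** The three steps: remove, update the key, add back. */
    static void updateKey(TreeSet<Item> index, Item item, int newKey) {
        index.remove(item);      // 1. remove while the old key is still in place
        item.key = newKey;       // 2. update the key value
        index.add(item);         // 3. add back, so the item lands in its new position
    }

    /** Concatenates the labels in index order. */
    static String order(TreeSet<Item> index) {
        StringBuilder sb = new StringBuilder();
        for (Item i : index) {
            sb.append(i.label);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeSet<Item> index = newIndex();
        Item a = new Item(1, "a");
        index.add(a);
        index.add(new Item(5, "b"));
        index.add(new Item(9, "c"));

        updateKey(index, a, 7);
        System.out.println(order(index));        // prints "bac"
        System.out.println(index.contains(a));   // prints "true"
    }
}]]></programlisting>
<para>If the key were changed in place, without the remove and add-back, the item would stay at its
old position in the tree, and later lookups or removals might fail to find it.</para>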
<note><para>It&rsquo;s OK to change feature values which are not used in determining
sort ordering (or set membership), without removing and re-adding back to the index.
</para></note>
<!-- <para>To completely remove an item from the indexes may entail removing it multiple times, if it was
added multiple times and (as of version 2.7.0) the JVM global property
<code>uima.allow_duplicate_add_to_indexes</code> is true.</para> -->
<para>The automatic protection checks for updates of
features being used as keys, and if it finds an update like this for a feature structure that
is in the indexes, it removes the feature structure from the indexes, does the update,
and adds it back. It will do this for every feature update. This is not
efficient when multiple features are being updated; in that case it would be better to
remove the feature structure, do all the updates to all the features needing updates, and then
do a single add-back operation.</para>
<para>This is supported in user code by the method <code>protectIndexes</code>,
available in both the CAS and JCas interfaces.
Here are two ways
of using it, one with try / finally and the other with a Runnable:
<programlisting>// an approach using try / finally
AutoCloseable ac = my_cas.protectIndexes(); // my_cas is a CAS or a JCas
try {
... arbitrary user code which updates features
which may be "keys" in one or more indexes
} finally {
ac.close();
}
// This can more compactly be written using the auto-close feature of try:
try (AutoCloseable ac = my_cas.protectIndexes()) {
... arbitrary user code which updates features
which may be "keys" in one or more indexes
}
// an approach using a Runnable, written in Java 8 lambda syntax
my_cas.protectIndexes(() -> {
... arbitrary user code updating "key" features,
but no checked exceptions are permitted
});</programlisting></para>
<para>The <code>protectIndexes</code> implementation only removes feature structures that
have features being updated which are used as keys in some index(es). At the end of the scope
of the protectIndexes, it adds all of these back. It also skips removing feature structures
from bag indexes, since these have no keys.</para>
<para>Within a <code>protectIndexes</code> block, do not do any operations which depend on the
indexes being valid, such as creating and using an iterator. This is because the removed FSs
are only added back at the end of the protectIndexes block.</para>
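<para>The deferred add-back can be pictured with a small model. The sketch below is not how UIMA
implements <code>protectIndexes</code>; it is a toy sorted index, written in plain Java, whose
<literal>protect()</literal> scope remembers each item it had to remove and adds all of them back when the
scope is closed. The names <literal>ToyIndex</literal>, <literal>Scope</literal>, <literal>Item</literal>
and <literal>setKey</literal> are hypothetical.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Illustrative model only; this is not UIMA's implementation. */
class ProtectScopeSketch {

    /** Hypothetical stand-in for a feature structure with one key feature. */
    static final class Item {
        int key;
        final String label;
        Item(int key, String label) { this.key = key; this.label = label; }
    }

    /** Toy sorted index with a scope that defers add-backs until it is closed. */
    static final class ToyIndex {
        private final TreeSet<Item> sorted = new TreeSet<>(Comparator
            .comparingInt((Item i) -> i.key)
            .thenComparing((Item i) -> i.label));
        private List<Item> removed;      // non-null only while a scope is open

        void add(Item item) { sorted.add(item); }

        boolean contains(Item item) { return sorted.contains(item); }

        /** Opens a protection scope. */
        Scope protect() {
            removed = new ArrayList<>();
            return new Scope();
        }

        /** Updates a key, removing the item first if it is currently indexed. */
        void setKey(Item item, int newKey) {
            boolean wasIndexed = sorted.remove(item);  // must precede the key change
            item.key = newKey;
            if (!wasIndexed) {
                return;                  // not indexed, or already removed in this scope
            }
            if (removed != null) {
                removed.add(item);       // inside a scope: defer the add-back
            } else {
                sorted.add(item);        // no scope: add back immediately
            }
        }

        /** Concatenates the labels in index order. */
        String order() {
            StringBuilder sb = new StringBuilder();
            for (Item i : sorted) {
                sb.append(i.label);
            }
            return sb.toString();
        }

        /** Closing the scope adds every removed item back exactly once. */
        final class Scope implements AutoCloseable {
            @Override
            public void close() {        // declares no checked exception
                sorted.addAll(removed);
                removed = null;
            }
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Item a = new Item(1, "a");
        index.add(a);
        index.add(new Item(5, "b"));

        try (ToyIndex.Scope scope = index.protect()) {
            index.setKey(a, 9);
            index.setKey(a, 7);
            System.out.println(index.contains(a));  // prints "false": not yet added back
        }
        System.out.println(index.contains(a));      // prints "true"
        System.out.println(index.order());          // prints "ba"
    }
}]]></programlisting>
<para>Inside the scope the item is missing from the toy index, which is why index-dependent
operations such as iteration should wait until the scope has been closed.</para>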
<para>The JVM property <code>-Duima.report_fs_update_corrupts_index</code> will generate a log entry
every time the framework finds (and automatically surrounds with a remove / add-back) an update to
a feature which could corrupt the index. The log entries can be identified by scanning for messages
starting with <code>While FS was in the index, the feature</code> - the message goes on to identify
the feature in question. Users can use these reports to find the places in their code where
they can either change the design to avoid updating these values after the item is indexed, or
surround the updates with their own <code>protectIndexes</code> blocks.</para>
<para>Out of the box, the UIMA framework runs with automatic (but somewhat inefficient) protection.
To improve upon this,
users can take the following steps:
<itemizedlist>
<listitem><para>Turn on reporting using a global JVM flag <code>
-Duima.report_fs_update_corrupts_index</code>.
This will cause a message to be logged each time the automatic protection is being invoked,
and allows the user to find the spots to improve.</para>
</listitem>
<listitem><para>Improve each spot, perhaps by surrounding the update code with a protectIndexes
block, or by rearranging code to reduce updating feature values used as index keys.</para>
</listitem>
<listitem><para>Once the code is no longer generating any reports, you can turn off the
automatic protection for production runs using the JVM global property
<code>-Duima.disable_auto_protect_indexes</code>, and rely on the protectIndexes blocks.
If protection is disabled, then the corruption detection is skipped, making the production
runs perhaps a bit faster, although this is not significant in most cases.</para></listitem>
<listitem><para>For automated build systems, there&rsquo;s a JVM parameter,
<code>-Duima.exception_when_fs_update_corrupts_index</code>, which will throw an
exception if any automatic recovery situation is encountered. You can use this
in build/test scenarios to ensure
(after adding all needed protectIndexes blocks) that the code remains safe for
turning off the checking in production runs.</para></listitem>
</itemizedlist>
</para>
</section>
</section>
<section id="ugr.ref.cas.accessing_modifying_features_of_feature_structures">
<title>Accessing or modifying features of feature structures</title>
<titleabbrev>Accessing or modifying Features</titleabbrev>
<para>Values of individual features for a feature structure can be set or referenced,
using a set of methods that depend on the type of value that feature is declared to have.
There are methods on FeatureStructure for this: getBooleanValue, getByteValue,
getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue,
getStringValue, and getFeatureValue (which means to get a value which in turn is a
reference to a feature structure). There are corresponding <quote>setter</quote>
methods, as well. These methods on the feature structure object take as arguments the
feature object retrieved earlier in the typeSystemInit method.</para>
<para>Using the previous example, with the type system initialized with type personType
and feature lastNameFeature, here&apos;s a sample code fragment that gets and sets
that feature:</para>
<programlisting>// Assume aPerson is a variable holding an object of type Person
// get the lastNameFeature value from the feature structure
String lastName = aPerson.getStringValue(lastNameFeature);
// set the lastNameFeature value
aPerson.setStringValue(lastNameFeature, newStringValueForLastName);</programlisting>
<para>The getters and setters for each of the primitive types are defined in the Javadocs
as methods of the FeatureStructure interface.</para>
</section>
<section id="ugr.ref.cas.indexes_and_iterators">
<title>Indexes and Iterators</title>
<para>Each CAS can have many indexes associated with it; each CAS View contains
a complete set of instantiations of the indexes. Each index is represented by an
instance of the type org.apache.uima.cas.FSIndex. You use the object
org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to
retrieve instances of indexes. There are methods that let you select the index
by name, by type, or by both name and type. Since each index is already associated with a type,
passing both a name and a type is valid only if the type passed in is the same
type or a subtype of the one declared in the index specification for the named index. If you
pass in a subtype, the returned FSIndex object refers to an index that will return only
items belonging to that subtype (or subtypes of that subtype).</para>
<para>The returned FSIndex objects are used, in turn, to create iterators.
There is also a method on the Index Repository, <literal>getAllIndexedFS</literal>,
which will return an iterator over all indexed Feature Structures (for that CAS View),
in no particular order. The iterators
created can be used like common Java iterators, to sequentially retrieve items
indexed. If the index represents a sorted index, the items are returned in a sorted
order, where the sort order is specified in the XML index definition. This XML is part of
the Component Descriptor, see <olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.xml.component_descriptor.aes.index"/>.</para>
<para>In UIMA V3, Feature structures may be added to or removed from indexes while iterating
over them. If this happens, any iterators already created will continue to operate over the
before-modification version of the index, unless or until the iterator is re-synchronized with the current
value of the index via one of the following specific 3 iterator API calls:
moveToFirst, moveToLast, or moveTo(FeatureStructure).
ConcurrentModificationException is no longer thrown in UIMA v3.
</para>
<para>Features that are used as the "keys" of an index may be updated on feature structures that are being iterated over.
If this is done, UIMA will protect the indexes (to prevent index corruption) by automatically removing the
Feature Structure from the indexes,
updating the field, and adding the FS back to the index (possibly in a new position).
This automatic remove / add-back operation no longer makes the iterator throw a ConcurrentModificationException
(as it did in UIMA Version 2) if the iterator is incremented or decremented;
existing iterators will continue to operate as if no index modification occurred.
</para>
<!-- <para>As of version 2.7.0, a new method on FSIndex, <code>withSnapshotIterators(),</code>
allows creating a light-weight FSIndex based on the original FSIndex
that supports doing arbitrary index operations while iterating, and will not throw
<code>ConcurrentModificationException</code>. Iterators obtained from this instance use a
<emphasis>snapshot</emphasis> technique - they create a snapshot of the original index when the
iterator is created, and then use that snapshot while operating, so the iteration is unaffected by any
modifications to the actual index.</para> -->
<section id="ugr.ref.cas.index.built_in_indexes">
<title>Built-in Indexes</title>
<para>An unnamed built-in bag index exists which holds all feature structures which are indexed.
The only access to this index is the method getAllIndexedFS(Type) which returns an iterator
over all indexed Feature Structures.</para>
<para>The CAS also contains a built-in index for the type <literal>uima.tcas.Annotation</literal>, which sorts
annotations in the order in which they appear in the document. Annotations are sorted first by increasing
<literal>begin</literal> position. Ties are then broken by <emphasis>decreasing</emphasis>
<literal>end</literal> position (so that longer annotations come first). Annotations that match in both
their <literal>begin</literal> and <literal>end</literal> features are sorted using the Type Priority,
if any are defined
(see <olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.xml.component_descriptor.aes.type_priority"/>).</para>
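<para>The ordering rules above can be written down as a comparator. The following plain Java sketch
models them with a hypothetical <literal>Span</literal> class and a list of type names standing in for a
type priority list, on the assumption that every type which occurs is listed and that earlier entries sort
first. It is a model of the ordering only and does not use the UIMA API.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative model of the annotation ordering; not UIMA code. */
class AnnotationOrderSketch {

    /** Hypothetical stand-in for an annotation: type name, begin and end. */
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
        @Override
        public String toString() { return type + "[" + begin + "," + end + "]"; }
    }

    /** Begin ascending, then end descending, then position in typePriority. */
    static Comparator<Span> annotationOrder(List<String> typePriority) {
        return (a, b) -> {
            if (a.begin != b.begin) {
                return Integer.compare(a.begin, b.begin);   // earlier begin first
            }
            if (a.end != b.end) {
                return Integer.compare(b.end, a.end);       // longer span first
            }
            return Integer.compare(typePriority.indexOf(a.type),
                                   typePriority.indexOf(b.type));
        };
    }

    /** Returns the spans in sorted order, rendered as strings. */
    static List<String> sorted(List<Span> spans, List<String> typePriority) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(annotationOrder(typePriority));
        List<String> out = new ArrayList<>();
        for (Span s : copy) {
            out.add(s.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("Token", 0, 5),
            new Span("Sentence", 0, 20),
            new Span("Token", 6, 10),
            new Span("Paragraph", 0, 20));
        List<String> priority = Arrays.asList("Paragraph", "Sentence", "Token");
        // prints [Paragraph[0,20], Sentence[0,20], Token[0,5], Token[6,10]]
        System.out.println(sorted(spans, priority));
    }
}]]></programlisting>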
</section>
<section id="ugr.ref.cas.index.adding_to_indexes">
<title>Adding Feature Structures to the Indexes</title>
<para>Feature Structures are added to the indexes by various APIs. These add the Feature Structure to
<emphasis>all</emphasis> indexes that are defined for the type of that FeatureStructure (or any of its
supertypes), in a particular view.
Note that you should not add a Feature Structure to the indexes until you have set values for all
of the features that may be used as sort keys in an index.</para>
<para>There are multiple APIs for adding FSs to the index.
<itemizedlist>
<listitem><para>(preferred) myFeatureStructure.addToIndexes(). This adds the feature structure instance to the
view in which it was originally created.</para>
</listitem>
<listitem><para>(preferred) myFeatureStructure.addToIndexes(JCas or CAS). This adds the feature structure instance to the
view represented by the argument.</para>
</listitem>
<listitem><para>(older form) casView.addFsToIndexes(myFeatureStructure) or jcasView.addFsToIndexes(myFeatureStructure).
This adds the feature structure instance to the
view represented by the cas (or jcas).</para>
</listitem>
<listitem><para>(older form) fsIndexRepositoryView.addFsToIndexes(myFeatureStructure).
This adds the feature structure instance to the
view represented by the fsIndexRepository instance.</para>
</listitem>
</itemizedlist>
</para>
</section>
<section id="ugr.ref.cas.index.iterators">
<title>Iterators over UIMA Indexes</title>
<para>Iterators are objects that implement the interface <literal>org.apache.uima.cas.FSIterator</literal>. This interface
extends <literal>java.util.Iterator</literal> and provides the normal Java iterator methods, plus
additional ones that allow moving both forwards and backwards.</para>
<para>UIMA indexes implement <literal>Iterable</literal>, so you can use an index directly in a Java enhanced for loop.</para>
</section>
<section id="ugr.ref.cas.index.annotation_index">
<title>Special iterators for Annotation types</title>
<para>Note: we recommend using the UIMA V3 select framework, instead of the following.
It implements all of the following capabilities, and more, in a uniform manner.</para>
<para>The built-in index over the <literal>uima.tcas.Annotation</literal> type
named <quote><literal>AnnotationIndex</literal></quote> has additional
capabilities. To use them, you first get a reference to this built-in index using
either the <literal>getAnnotationIndex</literal> method on a CAS View object, or
by asking the <literal>FSIndexRepository</literal> object for an index having the
particular name <quote>AnnotationIndex</quote>, for example:
<programlisting>AnnotationIndex idx = aCAS.getAnnotationIndex();
// or you can iterate over a specific subtype of Annotation:
AnnotationIndex idx = aCAS.getAnnotationIndex(aType); </programlisting></para>
<para>This object can be used to produce several additional kinds of iterators. It can
produce unambiguous iterators; these skip over annotations until they find one whose
start position is equal to or greater than the end position of
the previously returned annotation.</para>
<para>It can also produce several kinds of subiterators; these are iterators whose
annotations fall within the span of another annotation. This kind of iterator can
also have the unambiguous property, if desired. It also can be
<quote>strict</quote> or not; strict means that the returned annotation lies
completely within the span of the controlling annotation. Non-strict only implies
that the beginning of the returned annotation falls within the span of the
controlling annotation.</para>
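<para>The selection rules for unambiguous iteration and for strict and non-strict subiterators can be
sketched over a plain list of spans. The code below is a model written for this explanation, not the
UIMA implementation; the <literal>Span</literal> class and method names are hypothetical, the input is
assumed to be already in annotation index order, and the treatment of a span that begins exactly at the
end of the controlling span is an assumption of the sketch.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of the selection rules; not UIMA code. */
class AnnotationIterationSketch {

    /** Hypothetical stand-in for an annotation. */
    static final class Span {
        final String name;
        final int begin;
        final int end;
        Span(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Keeps a span only if it begins at or after the end of the last span kept. */
    static List<String> unambiguous(List<Span> sortedSpans) {
        List<String> out = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Span s : sortedSpans) {
            if (s.begin >= lastEnd) {
                out.add(s.name);
                lastEnd = s.end;
            }
        }
        return out;
    }

    /**
     * Selects spans relative to a controlling span.
     * Non-strict: the span begins inside the controlling span.
     * Strict: the span also ends inside the controlling span.
     */
    static List<String> within(List<Span> sortedSpans, Span control, boolean strict) {
        List<String> out = new ArrayList<>();
        for (Span s : sortedSpans) {
            if (s == control) {
                continue;                        // skip the controlling span itself
            }
            boolean beginsInside = s.begin >= control.begin && s.begin <= control.end;
            boolean endsInside = s.end <= control.end;
            if (beginsInside && (!strict || endsInside)) {
                out.add(s.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Span sentence = new Span("S", 0, 20);
        List<Span> spans = Arrays.asList(
            sentence,
            new Span("A", 0, 5),
            new Span("B", 3, 8),
            new Span("C", 6, 10),
            new Span("D", 18, 25),
            new Span("E", 22, 30));

        System.out.println(unambiguous(spans.subList(1, spans.size()))); // prints [A, C, D]
        System.out.println(within(spans, sentence, true));               // prints [A, B, C]
        System.out.println(within(spans, sentence, false));              // prints [A, B, C, D]
    }
}]]></programlisting>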
<para>There is also a method which produces an <literal>AnnotationTree</literal>
object, which contains nodes representing the results of doing a strict,
unambiguous subiterator over the span of some controlling annotation. For more
details, please refer to the Javadocs for the
<literal>org.apache.uima.cas.text</literal> package.</para>
</section>
<section id="ugr.ref.cas.index.constraints_and_filtered_iterators">
<title>Constraints and Filtered iterators</title>
<para>Note: for new code, consider using the select framework plus Streams, instead of
the following.</para>
<para>There is a set of API calls that build constraint objects. These objects can be
used directly to test if a particular feature structure matches (satisfies) the
constraint, or they can be passed to the createFilteredIterator method to create an
iterator that skips over instances which fail to satisfy the constraint.</para>
<para>It is possible to specify a feature value located by following a chain of
references starting from the feature structure being tested. Here&apos;s a
scenario to explore this concept. Let&apos;s suppose you have the following type
system (namespaces are omitted for clarity):
<blockquote>
<para><emphasis role="bold">Token</emphasis>, having a feature PartOfSpeech
which holds a reference to another type (POS)</para>
<para><emphasis role="bold">POS</emphasis> (a type with many subtypes, each
representing a different part of speech)</para>
<para><emphasis role="bold">Noun</emphasis> (a subtype of POS)</para>
<para><emphasis role="bold">ProperName</emphasis> (a subtype of Noun),
having a feature Class which holds an integer value encoding some information
about the proper noun.</para></blockquote></para>
<para>If you want to filter Token instances, such that only those tokens get through
which are proper names of class 3 (for example), you would need a test that started with
a Token instance, followed its PartOfSpeech reference to another instance (the
ProperName instance) and then tested the Class feature of that instance for a value
equal to 3.</para>
<para>To support this, the filtering approach has components that specify tests, and
components that specify <quote>paths</quote>. The tests that can be done include
testing references to type instances to see if they are instances of some type or its
subtypes; this is done with a FSTypeConstraint constraint. Other tests check for
equality or, for numeric values, ranges.</para>
<para>Each test may be combined with a path &ndash; to get to the value to test. Tests that
start from a feature structure instance can be combined with <quote>and</quote> and <quote>or</quote> connectors.
The Javadocs for these are in the package org.apache.uima.cas in the classes that end
in Constraint, plus the classes ConstraintFactory, FeaturePath and CAS.
Here&apos;s an example; assume the variable cas holds a reference to a CAS instance.
<programlisting>// Start by getting the constraint factory from the CAS.
ConstraintFactory cf = cas.getConstraintFactory();
// To specify a path to an item to test, you start by
// creating an empty path.
FeaturePath path = cas.createFeaturePath();
// Add POS feature to path, creating one-element path.
path.addFeature(posFeat);
// You can extend the chain arbitrarily by adding additional
// features.
// Create a new type constraint.
// Type constraints will check that structures
// they match against have a type at least as specific
// as the type specified in the constraint.
FSTypeConstraint nounConstraint = cf.createTypeConstraint();
// Set the type (by default it is TOP).
// This succeeds if the type being tested by this constraint
// is nounType or a subtype of nounType.
nounConstraint.add(nounType);
// Embed the noun constraint under the pos path.
// This means, associate the test with the path, so it tests the
// proper value.
// The result is a test which will
// match a feature structure that has a posFeat defined
// which has a value which is an instance of a nounType or
// one of its subtypes.
FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);
// Create a type constraint for token (or a subtype of it)
FSTypeConstraint tokenConstraint = cf.createTypeConstraint();
// Set the type.
tokenConstraint.add(tokenType);
// Create the final constraint by conjoining the two constraints.
FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);
// Create a filtered iterator from some annotation iterator.
FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);</programlisting>
</para></section></section>
<section id="ugr.ref.cas.guide_to_javadocs">
<title>The CAS APIs &ndash; a guide to the Javadocs</title>
<titleabbrev>CAS API&apos;s Javadocs</titleabbrev>
<para>The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text. Most
of the APIs described here are in the cas package. The cas.impl package contains classes
used in serializing and deserializing (reading and writing external representations) the
CAS in various formats, for
transporting the CAS among local and remote annotators, or for storing the CAS in
permanent storage. The cas.text package contains the APIs that extend the CAS to support
artifact (including <quote>text</quote>) analysis.</para>
<section id="ugr.ref.cas.javadocs.cas_package">
<title>APIs in the CAS package</title>
<para>The main objects implementing the APIs discussed here are shown in the diagram
below. The hierarchy represents that there is a way to get from an upper object to an
instance of the lower object, usually by using a method on the upper object; this is not
an inheritance hierarchy.
<figure id="ugr.ref.cas.fig.api_hierarchy">
<title>CAS Object hierarchy</title>
<mediaobject>
<imageobject>
<imagedata width="5.8in" format="JPG"
fileref="&imgroot;image001.png"/>
</imageobject>
<textobject><phrase>CAS object hierarchy</phrase></textobject>
</mediaobject>
</figure> </para>
<para>The main interface is the CAS interface. This has most of the functionality of the
CAS, except for the type system metadata access, and the indexing access. JCas and CAS
are alternative representations and API approaches to the CAS; each has a method to
get the other. You can mix JCas and CAS APIs in your application as needed. To use the
JCas APIs, you have to create the Java classes that correspond to the CAS types, and
include them in the Java class path of the application. If you have a CAS object, you can
get a JCas object by using the getJCas() method call on the CAS object; likewise, you
can get the CAS object from a JCas by using the getCAS() method call on the JCas object.
There is also a low level CAS interface that is not part of the official API, and is
intended for internal use only &ndash; it is not documented here.</para>
<para>The type system metadata APIs are found in the TypeSystem interface. The objects
defining each type and feature are defined by the interfaces Type and Feature. The
Type interface has methods to see what types subsume other types, to iterate over the
types available, and to extract information about the types, including what
features it has. The Feature interface has methods that get what type it belongs to,
its name, and its range (the kind of values it can hold).</para>
<para>The FSIndexRepository gives you access to methods to get instances of indexes, and
also provides access to the iterator over all indexed feature structures:
<literal>getAllIndexedFS(aType)</literal>.
The FSIndex and AnnotationIndex objects give you methods to create instances of
iterators.</para>
<para>Iterators and the CAS methods that create new feature structures return
FeatureStructure objects. These objects can be used to set and get the values of
defined features within them.</para>
</section>
</section>
<section id="ugr.ref.cas.typemerging">
<title>Type Merging</title>
<para>When annotators are combined in an aggregate, their defined type systems are merged.
This is designed to support independent development of annotator components. The merge
results in a single defined type system for CASes that flow through a particular set of
annotators.</para>
<para>The basic operation of a type system merge is to iterate through all the defined types,
and if two annotators define the same fully qualified type name,
to take the features defined for those types
and form a logical union of those features. This operation requires that same-named features
have the same range type names. The resulting type system has features comprising the union
of all features over all the various definitions for this type in different annotators.
</para>
<para>Feature merging checks that all features having the same name in a type have
identical range types; otherwise an error is signaled.</para>
<para>Types are combined for merging when their fully qualified names are the same.
Two different definitions can be merged even if their supertype definitions do not match, if
one supertype subsumes the other supertype; otherwise an error is signaled. Likewise, two types
with the same name can be merged only if their features can be merged.
</para>
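<para>The feature part of the merge can be sketched in a few lines of plain Java. This is a simplified
model written for this explanation, not the UIMA merge code: each definition of a type is represented as a
map from feature name to range type name, the method name <literal>mergeFeatures</literal> is hypothetical,
and the supertype check described above is left out.</para>
<programlisting><![CDATA[import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model of merging the features of one type; not UIMA code. */
class TypeMergeSketch {

    /**
     * Merges two definitions of the same type.
     * Each map goes from feature name to range type name.
     * Throws if a feature appears in both with different range types.
     */
    static Map<String, String> mergeFeatures(Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> e : second.entrySet()) {
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("Feature " + e.getKey()
                    + " has conflicting range types: "
                    + existing + " and " + e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> personA = new LinkedHashMap<>();
        personA.put("firstName", "uima.cas.String");

        Map<String, String> personB = new LinkedHashMap<>();
        personB.put("firstName", "uima.cas.String");
        personB.put("lastName", "uima.cas.String");

        // prints [firstName, lastName]
        System.out.println(mergeFeatures(personA, personB).keySet());
    }
}]]></programlisting>
<para>Merging a definition in which <literal>firstName</literal> had a different range type would
raise the exception, which corresponds to the error case described above.</para>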
</section>
<section id="ugr.ref.cas.limitedmultipleaccess">
<title>Limited multi-thread access to read-only CASs</title>
<para>Some applications may find it useful to scale up pipelines and run these in parallel.</para>
<para>
Generally, CASs are not thread-safe, and only one thread at a time may operate on a given CAS. In many
scenarios, a CAS may be initialized and then filled with Feature Structures, and after some point,
no more updates to that particular CAS will be done.</para>
<para>
If a CAS is no longer going to be changed, it is possible to
access it on multiple threads in a read-only mode, simultaneously, with some limitations. Limitations
arise because some UIMA Framework activities may update internal CAS data structures.</para>
<para>Operational data is updated while running a pipeline when a PEAR is entered or exited,
because PEARs establish new class loaders and can potentially switch the JCas classes being used
(this happens because the class loaders might define different JCas cover classes
implementing the same UIMA type).
Because of this, you cannot have multiple pipelines accessing a CAS in read-only mode if one or more of those
pipelines contains a PEAR. There are other edge cases where this may happen as well; for example, if you are
running a pipeline with an Extension Class Loader,
and have a callback routine loaded under a different class loader, UIMA will switch the JCas classes when
calling the callback.
</para>
</section>
</chapter>