 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
 <!ENTITY imgroot "images/references/ref.cas/" >
 <!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
 %uimaents;
 ]>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <chapter id="ugr.ref.cas">
   <title>CAS Reference</title>

   <para>The CAS (Common Analysis System) is the part of the Unstructured Information
     Management Architecture (UIMA) that is concerned with creating and handling the data
     that annotators manipulate.</para>

   <para>Java users typically use the JCas (Java interface to the CAS) when manipulating
     objects in the CAS. This chapter describes an alternative interface to the CAS which
     allows discovery and specification of types and features at run time. It is recommended
     when the calling code cannot know ahead of time which type system it will be dealing
     with.</para>

   <para>Use of the CAS as described here is also recommended (or necessary) when components add
   to the definitions of types of other components.  This UIMA feature allows users to add features
   to a type that was already defined elsewhere.  When this feature is used in conjunction with the
   JCas, it can lead to problems with class loading.  This is because different JCas representations
   of a single type are generated by the different components, and only one of them is loaded
   (unless you are using Pear descriptors).  Note:
   we do not recommend that you add features to pre-existing types.  A type should be defined in one
   place only, and then there is no problem with using the JCas.  However, if you do use this feature,
   do not use the JCas.  Similarly, if you distribute your components for inclusion in somebody else's
   UIMA application, and you're not sure that they won't add features to your types, do not use the
   JCas for the same reasons.
   </para>

   <section id="ugr.ref.cas.javadocs">
     <title>Javadocs</title>

     <para>The subdirectory <literal>docs/api</literal> contains the documentation
       details of all the classes, methods, and constants for the APIs discussed here. Please
       refer to this for details on the methods, classes and constants, specifically in the
       packages <literal>org.apache.uima.cas.*</literal>.</para>
   </section>

   <section id="ugr.ref.cas.overview">
     <title>CAS Overview</title>

     <para>There are three<footnote><para>A fourth part, the Subject of Analysis,
       is discussed in <olink targetdoc="&uima_docs_tutorial_guides;"
         /> <olink targetdoc="&uima_docs_tutorial_guides;"
         targetptr="ugr.tug.aas"/>.</para></footnote> main parts to the CAS: the type system, data creation and
       manipulation, and indexing.  We will start with a brief
       description of these components.</para>
     <section id="ugr.ref.cas.type_system">
       <title>The Type System</title>

       <para>The type system specifies what kind of data you will be able to manipulate in your
         annotators. The type system defines two kinds of entities, types and features. Types
         are arranged in a single inheritance tree and define the kinds of entities (objects)
         you can manipulate in the CAS. Features optionally specify slots or fields within a
         type. The correspondence to Java is to equate a CAS Type to a Java Class, and the CAS
         Features to fields within the type. A critical difference is that CAS types have no
         methods; they are just data structures with named slots (features). These features can
         have as values primitive things like integers, floating point numbers, and strings,
         and they also can hold references to other instances of objects in the CAS. We call
         instances of the data structures declared by the type system <quote>feature
         structures</quote> (not to be confused with <quote>features</quote>). Feature
         structures are similar to the many variants of record structures found in computer
         science.<footnote><para> The name <quote>feature structure</quote> comes from
         terminology used in linguistics.</para></footnote></para>

       <para>Each CAS Type defines a supertype; it is a subtype of that supertype. This means
         that any features that the supertype defines are features of the subtype; in other
         words, it inherits its supertype&apos;s features. Only single inheritance is
         supported; a type&apos;s feature set is the union of all of the features in its
         supertype hierarchy. There is a built-in type called uima.cas.TOP; this is the top,
         root node of the inheritance tree. It defines no features.</para>

       <para>The values that can be stored in features are either built-in primitive values or
         references to other feature structures. The primitive values are
         <literal>boolean</literal>, <literal>byte</literal>,
         <literal>short</literal> (16 bit integers), <literal>integer</literal> (32
         bit), <literal>long</literal> (64 bit), <literal>float</literal> (32 bit),
         <literal>double</literal> (64 bit floats) and strings; the official names of these
         are <literal>uima.cas.Boolean</literal>, <literal>uima.cas.Byte</literal>,
         <literal>uima.cas.Short</literal>, <literal>uima.cas.Integer</literal>,
         <literal>uima.cas.Long</literal>, <literal>uima.cas.Float</literal>,
         <literal>uima.cas.Double</literal> and <literal>uima.cas.String</literal>.
         The strings are Java strings, and characters are Java characters.  Technically, this means
         that characters are UTF-16 code units, which is not quite the same as a Unicode character:
         a character outside the Basic Multilingual Plane occupies two code units.
         This distinction should make no difference for almost all applications.
         The CAS also defines other basic built-in types for arrays of these, plus arrays of
         references to other objects, called <literal>uima.cas.IntegerArray</literal>,
         <literal>uima.cas.FloatArray</literal>,
         <literal>uima.cas.StringArray</literal>,
         <literal>uima.cas.FSArray</literal>, etc.</para>
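       <para>The difference between Java characters and Unicode characters can be checked
         with plain Java strings. The following is a minimal sketch that uses no UIMA API;
         the class name <literal>CodeUnitDemo</literal> and the sample character are
         chosen only for illustration.</para>

       <programlisting><![CDATA[class CodeUnitDemo {
  // Number of UTF-16 code units; this is what offsets into a Java string count.
  static int codeUnits(String s) {
    return s.length();
  }

  // Number of Unicode code points; a supplementary character counts once.
  static int codePoints(String s) {
    return s.codePointCount(0, s.length());
  }

  public static void main(String[] args) {
    String plain = "CAS";
    // U+1D11E (musical symbol G clef) lies outside the Basic Multilingual Plane,
    // so Java stores it as a surrogate pair of two code units.
    String clef = new String(Character.toChars(0x1D11E));
    System.out.println(codeUnits(plain) + " " + codePoints(plain)); // 3 3
    System.out.println(codeUnits(clef) + " " + codePoints(clef));   // 2 1
  }
}]]></programlisting>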

       <para>The CAS also defines a built-in type called
         <literal>uima.tcas.Annotation</literal> which inherits from
         <literal>uima.cas.AnnotationBase</literal> which in turn inherits from
         <literal>uima.cas.TOP</literal>. There are two features defined by this type,
         called <literal>begin</literal> and <literal>end</literal>, both of which are
         integer valued.</para>

     </section>

     <section id="ugr.ref.cas.creating_accessing_manipulating_data">
       <title>Creating, accessing and manipulating data</title>
       <titleabbrev>Creating/Accessing/Changing data</titleabbrev>

       <para>
         Creating and accessing data in the CAS requires knowledge about the types and features
         defined in the type system.  The idea is similar to other data access APIs, such as the XML
         DOM or SAX APIs, or database access APIs such as JDBC.  Contrary to those APIs, however, the
         CAS does not use the names of type system entities directly in the APIs.  Rather, you use
         the type system to access type and feature entities by name, then use these entities in the
         data manipulation APIs.  This can be compared to the Java reflection APIs: the type system
         is comparable to the Java class loader, and the type and feature objects to the
         <literal>java.lang.Class</literal> and <literal>java.lang.reflect.Field</literal> classes.
       </para>

       <para>
         Why does it have to be this complicated?  You wouldn&apos;t normally use reflection to create a
         Java object, either.  As mentioned earlier, the JCas provides the more straightforward
         method to manipulate CAS data.  The CAS access methods described here need only be used for
         generic types of applications that need to be able to handle any kind of data (e.g., generic
         tooling) or when the JCas may not be used for other reasons.  The generic kinds of applications
         are exactly the ones where you would use the reflection API in Java as well.
       </para>

     </section>

     <section id="ugr.ref.cas.creating_using_indexes">
       <title>Creating and using indexes</title>

       <para>Each view of a CAS provides a set of indexes for that view. Instances of Types (that is, Feature
         Structures) can be added to a view&apos;s indexes. These indexes provide
         a way for annotators to locate existing data in the CAS, using a specific index (or the
         method <literal>getAllIndexedFS</literal> of the object <literal>FSIndexRepository</literal>) to
         retrieve the Feature Structures that were previously created.
         Newly created Feature Structures are not automatically added to the indexes; you choose which
         Feature Structures to add and use one of several APIs to add them.
         </para>

       <para>Indexes are named and are associated with a CAS Type; they are used to index
         instances of that CAS type (including instances of that type&apos;s subtypes). If
         you are using multiple views (see <olink
           targetdoc="&uima_docs_tutorial_guides;"/> <olink
           targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>),
         each view contains a separate instantiation of all of the indexes.
         To access an index, you
         minimally need to know its name. A CAS view provides an index repository which you can
         query for indexes for that view. Once you have a handle to an index, you can get
         information about the feature structures in the index, the size of the index, as well
         as an iterator over the feature structures.</para>

       <para>There are three kinds of indexes:
         <itemizedlist spacing="compact">
           <listitem>
             <para>bag - no ordering</para>
           </listitem>
           <listitem>
             <para>set - uses a user-specified set of keys to define equality; holds only one instance out of any set of equal items.</para>
           </listitem>
           <listitem>
             <para>sorted - uses a user-specified set of keys to define ordering.</para>
           </listitem>
         </itemizedlist>
       </para>

       <para>For set indexes, the comparator keys are augmented with an implicit additional field - the type of the
         feature structure.  This means that a set index over Annotation (which has a subtype Token), with the "begin" feature
         as its key, will behave as follows:

         <itemizedlist>
           <listitem><para>If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes,
             only one of them will be in the index.</para>
           </listitem>
           <listitem><para>If you make 1 Token and 1 Annotation, both having a begin value of 17, and add both of them to the indexes,
             both of them will be in the index (because the types are different).
           </para></listitem>
         </itemizedlist>
       </para>
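
       <para>The two cases above can be modeled in a few lines of plain Java. This is only a
         sketch of the equality rule, not UIMA code: the classes <literal>SetIndexModel</literal>
         and <literal>Fs</literal> are invented for illustration, and the key is assumed to be
         the single feature <literal>begin</literal>.</para>

       <programlisting><![CDATA[import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetIndexModel {
  // Hypothetical stand-in for a feature structure: a type name plus a begin value.
  static final class Fs {
    final String type;
    final int begin;

    Fs(String type, int begin) {
      this.type = type;
      this.begin = begin;
    }
  }

  private final Set<String> seenKeys = new HashSet<>();
  private final List<Fs> entries = new ArrayList<>();

  // Adds fs unless an entry with the same type and the same key value is present.
  boolean add(Fs fs) {
    String key = fs.type + "@" + fs.begin; // implicit type field plus the "begin" key
    if (!seenKeys.add(key)) {
      return false;
    }
    entries.add(fs);
    return true;
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    SetIndexModel index = new SetIndexModel();
    index.add(new Fs("Token", 17));
    index.add(new Fs("Token", 17));      // equal to the first entry, so not added
    index.add(new Fs("Annotation", 17)); // different type, so it is added
    System.out.println(index.size());    // 2
  }
}]]></programlisting>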

       <para>Indexes are defined in the XML descriptor metadata for the application. Each CAS
         View has its own, separate instantiation of indexes based on these definitions,
         kept in the view's index repository. When you obtain an index, it is always from a
         particular CAS view's index repository.
         When you index an item, it is always added to all indexes where it
         belongs, within just the view's repository. You can specify different repositories
         (associated with different CAS views) to use; a given Feature Structure instance
         may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</para>

       <para>Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.</para>

       <para>You can also get iterators from indexes;
         iterators allow you to enumerate the feature structures in an index.  There are two kinds of iterators supported:
         the regular Java iterator API, and a specific FS iterator API
         where the usual Java iterator APIs (<literal>hasNext()</literal> and <literal>next()</literal>)
         are augmented by <literal>isValid()</literal>, <literal>moveToNext() / moveToPrevious()</literal> (which does
         not return an element) and <literal>get()</literal>.  Finally, there is a <literal>moveTo(FeatureStructure)</literal>
         API, which, for sorted indexes, moves the iteration point to the left-most (among otherwise "equal") item
         in the index which compares "equal" to the given FeatureStructure, using the index's defined comparator.
       </para>

       <para>
         Which API style you use is up to you,
         but we do not recommend mixing the styles as the results are sometimes unexpected.  If you
         just want to iterate over an index from start to finish, either style is equally appropriate.
         If you also use <literal>moveTo(FeatureStructure fs)</literal> and
         <literal>moveToPrevious()</literal>, it is better to use the special FS iterator style.
       </para>

       <note><para>The reason not to mix these styles is that you might expect
         next() followed by moveToPrevious() to always work.  It does not, because
         next() returns the "current" element and then advances to the next position, which might be
         beyond the last element.  At that point, the iterator becomes "invalid", and
         moveToNext and moveToPrevious no longer move the iterator.  To reset it,
         call moveToFirst(), moveToLast(), or moveTo(FS).</para></note>
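
       <para>The behavior described in this note can be illustrated with a small stand-alone
         model. The class <literal>FsIteratorModel</literal> below is hypothetical and is not
         the UIMA iterator implementation; it only mirrors the rules stated above for a plain
         Java list.</para>

       <programlisting><![CDATA[import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

class FsIteratorModel<T> {
  private final List<T> items;
  private int pos = 0; // the iterator is valid while 0 <= pos < items.size()

  FsIteratorModel(List<T> items) {
    this.items = items;
  }

  boolean isValid() {
    return pos >= 0 && pos < items.size();
  }

  T get() {
    if (!isValid()) {
      throw new NoSuchElementException();
    }
    return items.get(pos);
  }

  // Relative moves do nothing once the iterator is invalid.
  void moveToNext() { if (isValid()) pos++; }
  void moveToPrevious() { if (isValid()) pos--; }

  // Absolute moves reposition the iterator regardless of its state.
  void moveToFirst() { pos = 0; }
  void moveToLast() { pos = items.size() - 1; }

  // Java-style access: return the current element, then advance.
  boolean hasNext() { return isValid(); }
  T next() {
    T current = get();
    pos++;
    return current;
  }

  public static void main(String[] args) {
    FsIteratorModel<String> it = new FsIteratorModel<>(Arrays.asList("a", "b"));
    it.next();                        // returns "a"
    it.next();                        // returns "b" and steps past the last element
    it.moveToPrevious();              // no effect: the iterator is already invalid
    System.out.println(it.isValid()); // false
    it.moveToLast();                  // explicit repositioning makes it valid again
    System.out.println(it.get());     // b
  }
}]]></programlisting>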

       <para>Indexes are created by specifying them in the annotator&apos;s or
         aggregate&apos;s resource descriptor. An index specification includes its name,
         the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.
         The keys are used for set and sorted indexes, and specify what values are used for
         ordering, or (for sets) what values are used to determine set equality.
         When a CAS pipeline is created, all index
         specifications are combined; duplicate definitions (having the same name) are
         allowed only if their definitions are the same. </para>

       <para>Feature structure instances need to be explicitly added to the index repository by a
         method call. Feature structures that are not indexed will not be visible to other
         annotators, unless they can be reached through a reference held by a feature of
         another feature structure that is indexed, or through a chain of such references.</para>

       <para>The framework defines an unnamed bag index which indexes all types.  The
         only access provided for this index is the getAllIndexedFS(type) method on the
         index repository, which returns an iterator over all indexed instances of the
         specified type (including its subtypes) for that CAS View.
       </para>

       <para>The framework defines one standard, built-in annotation index, called
         AnnotationIndex, which indexes the <literal>uima.tcas.Annotation</literal>
         type: all feature structures of type <literal>uima.tcas.Annotation</literal> or
         its subtypes are automatically indexed with this built-in index.</para>

       <para>The ordering relation used by this index is to first order by the value of the
         <quote>begin</quote> feature (in ascending order), then by the value of the
         <quote>end</quote> feature (in descending order), and finally by the
         Type Priority. This ordering ensures that
         longer annotations starting at the same spot come before shorter ones. For Subjects
         of Analysis other than Text, this may not be an appropriate index.</para>
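
       <para>As a rough illustration of this ordering, the following plain Java sketch sorts a
         few spans with an equivalent comparison. It does not use the UIMA index classes; the
         <literal>Span</literal> class is made up, and type priority is modeled as a simple
         integer where a smaller value is assumed to sort first.</para>

       <programlisting><![CDATA[import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnnotationOrderModel {
  // Hypothetical stand-in for an annotation with an integer type priority.
  static final class Span {
    final String type;
    final int begin;
    final int end;
    final int priority;

    Span(String type, int begin, int end, int priority) {
      this.type = type;
      this.begin = begin;
      this.end = end;
      this.priority = priority;
    }
  }

  // begin ascending, then end descending, then type priority.
  static int compare(Span a, Span b) {
    if (a.begin != b.begin) {
      return Integer.compare(a.begin, b.begin);
    }
    if (a.end != b.end) {
      return Integer.compare(b.end, a.end);
    }
    return Integer.compare(a.priority, b.priority);
  }

  // Returns a printable label for each span, in index order.
  static List<String> sortedTypes(List<Span> spans) {
    List<Span> copy = new ArrayList<>(spans);
    copy.sort(AnnotationOrderModel::compare);
    List<String> names = new ArrayList<>();
    for (Span s : copy) {
      names.add(s.type + "[" + s.begin + "," + s.end + "]");
    }
    return names;
  }

  public static void main(String[] args) {
    List<Span> spans = Arrays.asList(
        new Span("Token", 4, 13, 2),
        new Span("Token", 0, 3, 2),
        new Span("Sentence", 0, 13, 1),
        new Span("Document", 0, 13, 0));
    // Longer spans starting at the same offset come first; ties fall back to priority.
    System.out.println(sortedTypes(spans));
    // [Document[0,13], Sentence[0,13], Token[0,3], Token[4,13]]
  }
}]]></programlisting>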

       <para>In addition to normal iterators, there is a <literal>select</literal> API, documented
        in the Version 3 Users guide, which provides additional capabilities for accessing
        Feature Structures via the indexes.</para>

     </section>
   </section>

   <section id="ugr.ref.cas.builtin_types">
     <title>Built-in CAS Types</title>

     <para>The CAS has two kinds of built-in types &ndash; primitive and non-primitive. The
       primitive types are:

       <itemizedlist spacing="compact">
         <listitem><para>uima.cas.Boolean</para></listitem>
         <listitem><para>uima.cas.Byte</para></listitem>
         <listitem><para>uima.cas.Short</para></listitem>
         <listitem><para>uima.cas.Integer</para></listitem>
         <listitem><para>uima.cas.Long</para></listitem>
         <listitem><para>uima.cas.Float</para></listitem>
         <listitem><para>uima.cas.Double</para></listitem>
         <listitem><para>uima.cas.String</para></listitem>
       </itemizedlist></para>

     <para>The <literal>Byte</literal>, <literal>Short</literal>, <literal>Integer</literal>,
       and <literal>Long</literal> types are
       all signed integer types, of length 8, 16, 32, and 64 bits, respectively. The
       <literal>Float</literal> type is 32 bit floating point, and the
       <literal>Double</literal> type is 64 bit floating point. The
       <literal>String</literal> type can be subtyped to create sets of allowed values; see
         <olink targetdoc="&uima_docs_ref;"
         targetptr="ugr.ref.xml.component_descriptor.type_system.string_subtypes"/>.
       These subtypes can be used to specify the range of a String-valued feature. They act like
       Strings, but have additional checking to ensure that a value set into them
       is one of the allowed values, or null (which is the value if it is not set).
       Note that the other primitive types cannot be used
       as a supertype for another type definition; only
       <literal>uima.cas.String</literal> can be sub-typed.</para>

     <para>The non-primitive types exist in a type hierarchy; the top of the hierarchy is the
       type <literal>uima.cas.TOP</literal>. All other non-primitive types inherit from
       some supertype.</para>

     <para>There are 9 built-in array types. These arrays have a size specified when they are
       created; the size is fixed at creation time. They are named:

       <itemizedlist spacing="compact">
         <listitem><para>uima.cas.BooleanArray</para></listitem>
         <listitem><para>uima.cas.ByteArray</para></listitem>
         <listitem><para>uima.cas.ShortArray</para></listitem>
         <listitem><para>uima.cas.IntegerArray</para></listitem>
         <listitem><para>uima.cas.LongArray</para></listitem>
         <listitem><para>uima.cas.FloatArray</para></listitem>
         <listitem><para>uima.cas.DoubleArray</para></listitem>
         <listitem><para>uima.cas.StringArray</para></listitem>
         <listitem><para>uima.cas.FSArray</para></listitem>
       </itemizedlist></para>

     <para>The <literal>uima.cas.FSArray</literal> type is an array whose elements are
       arbitrary other feature structures (instances of non-primitive types).</para>

     <para>The JCas cover classes for the array types support the Iterable API, so you may
     write extended for loops over instances of these.  For example:
     <programlisting>FSArray&lt;MyType&gt; myArray = ...
 for (MyType fs : myArray) {
   some_method(fs);
 }</programlisting>
     </para>

     <para>There are 3 built-in types associated with the artifact being analyzed:

       <itemizedlist spacing="compact">
         <listitem><para>uima.cas.AnnotationBase</para></listitem>
         <listitem><para>uima.tcas.Annotation</para></listitem>
         <listitem><para>uima.tcas.DocumentAnnotation</para></listitem>
       </itemizedlist></para>

     <para>The <literal>AnnotationBase</literal> type defines one system-used feature
       which specifies for an annotation the subject of analysis (Sofa) to which it refers. The
       Annotation type extends from this and defines 2 features, taking
       <literal>uima.cas.Integer</literal> values, called <literal>begin</literal>
       and <literal>end</literal>. The <literal>begin</literal> feature typically
       identifies the start of a span of text the annotation covers; the
       <literal>end</literal> feature identifies the end. The values refer to character
       offsets; the starting index is 0. An annotation of the word <quote>CAS</quote> in a text
       <quote>CAS Reference</quote> would have a start index of 0, and an end index of 3; the
       difference between end and start is the length of the span the annotation refers
       to.</para>
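
     <para>The offset arithmetic in this example can be checked with ordinary Java string
       operations. This is a sketch only; the class and method names are invented, and no
       CAS is involved.</para>

     <programlisting><![CDATA[class OffsetDemo {
  // Returns the text covered by an annotation with the given begin and end offsets.
  static String coveredText(String documentText, int begin, int end) {
    return documentText.substring(begin, end);
  }

  public static void main(String[] args) {
    String text = "CAS Reference";
    int begin = 0; // offset of the first covered character
    int end = 3;   // offset just past the last covered character
    System.out.println(coveredText(text, begin, end)); // CAS
    System.out.println(end - begin);                   // 3
  }
}]]></programlisting>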

     <para>Annotations are always with respect to some Sofa (Subject of Analysis &ndash; see
         <olink targetdoc="&uima_docs_tutorial_guides;"/>
         <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>).</para>
     <note><para>Artifacts which are not text strings may have a different interpretation of
     the meaning of begin and end, or may define their own kind of annotation, extending from
     <literal>AnnotationBase</literal>. </para></note>

     <para id="ugr.ref.cas.document_annotation">The <literal>DocumentAnnotation</literal> type has one special instance. It is
       a subtype of the Annotation type, and the built-in definition defines one feature,
       <literal>language</literal>, which is a string indicating the language of the
       document in the CAS. The value of this language feature is used by the system to control
       flow among annotators when the <quote>CapabilityLanguageFlow</quote> mode is used,
       allowing the flow to skip over annotators that don&apos;t process particular
       languages. Users may extend this type by adding additional features to it, using the XML
       Descriptor element for defining a type.</para>

     <note><para>
       We do <emphasis>not</emphasis> recommend extending the <literal>DocumentAnnotation</literal>
       type.  If you do, you must <emphasis>not</emphasis> use the JCas, for the reasons stated
       earlier.
     </para></note>

     <para>Each CAS view has a different associated instance of the
       <literal>DocumentAnnotation</literal> type.  On the CAS, use
       <literal>getDocumentAnnotation()</literal> to access the
       <literal>DocumentAnnotation</literal>.</para>

     <para>There are also built-in types supporting linked lists, similar to the ones available in
     Java and other programming languages. Their use is
       constrained by the usual properties of linked lists: not very space efficient, no (efficient)
       random access, but an easy choice if you don't know how long your list will be ahead of time. The
       implementation is type specific; there are different list building objects for each of
       the primitive types, plus one for general feature structures. Here are the type names:
       <itemizedlist spacing="compact">
         <listitem><para>uima.cas.FloatList</para></listitem>
         <listitem><para>uima.cas.IntegerList</para></listitem>
         <listitem><para>uima.cas.StringList</para></listitem>
         <listitem><para>uima.cas.FSList</para>
           <para></para></listitem>
         <listitem><para>uima.cas.EmptyFloatList</para></listitem>
         <listitem><para>uima.cas.EmptyIntegerList</para></listitem>
         <listitem><para>uima.cas.EmptyStringList</para></listitem>
         <listitem><para>uima.cas.EmptyFSList</para>
           <para></para></listitem>
         <listitem><para>uima.cas.NonEmptyFloatList</para></listitem>
         <listitem><para>uima.cas.NonEmptyIntegerList</para></listitem>
         <listitem><para>uima.cas.NonEmptyStringList</para></listitem>
         <listitem><para>uima.cas.NonEmptyFSList</para></listitem>

       </itemizedlist></para>

     <para>For each of the four element kinds, <literal>Float</literal>,
       <literal>Integer</literal>, <literal>String</literal> and
       <literal>FeatureStructure</literal>, there is a base type, for instance,
       <literal>uima.cas.FloatList</literal>. For each of these, there are two subtypes,
       corresponding to a non-empty element, and a marker that serves to indicate the end of the
       list, or an empty list. The non-empty types define two features &ndash;
       <literal>head</literal> and <literal>tail</literal>. The head feature holds the
       particular value for that part of the list. The tail refers to the next list object
       (either a non-empty one or the empty version to indicate the end of the list).</para>
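
     <para>The head and tail structure can be pictured with a small stand-alone model. The
       classes below are hypothetical and only imitate the shape of the built-in list types
       for integers; they are not the UIMA classes.</para>

     <programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;

class IntListModel {
  // Base type: either an empty marker or a non-empty node.
  static class IntList {
    // Creates a new non-empty node in front of this list and returns it.
    NonEmptyIntList push(int item) {
      return new NonEmptyIntList(item, this);
    }
  }

  // Marks the empty list, and therefore also the end of a list.
  static final class EmptyIntList extends IntList {
  }

  // Holds one value (head) and a reference to the rest of the list (tail).
  static final class NonEmptyIntList extends IntList {
    final int head;
    final IntList tail;

    NonEmptyIntList(int head, IntList tail) {
      this.head = head;
      this.tail = tail;
    }
  }

  // Walks the list until it reaches an empty marker or a null tail.
  static List<Integer> toJavaList(IntList list) {
    List<Integer> out = new ArrayList<>();
    IntList node = list;
    while (node instanceof NonEmptyIntList) {
      NonEmptyIntList cell = (NonEmptyIntList) node;
      out.add(cell.head);
      node = cell.tail;
    }
    return out;
  }

  public static void main(String[] args) {
    IntList list = new EmptyIntList();
    list = list.push(2);
    list = list.push(1);
    System.out.println(toJavaList(list)); // [1, 2]
  }
}]]></programlisting>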

     <para>For JCas users, the constructors of the NonEmptyXyzList classes include a 3 argument version
     in which you may specify the head and tail values.  The JCas
     cover classes for these implement
     a <code>push(item)</code> method which creates a new non-empty node, sets its <code>head</code> value
     to <code>item</code> and its tail to the node it is called on, and returns the new node.
     These classes also implement Iterable, so you can use the enhanced Java <code>for</code> loop.
     The iterator stops when it gets to the end of the list, determined by either the tail being null or
     the element being one of the EmptyXyzList elements.
     Here's a StringList example:
     <programlisting>StringList sl = jcas.emptyStringList();
 sl = sl.push("2");
 sl = sl.push("1");

 for (String s : sl) {
   someMethod(s);  // some sample use
 }</programlisting>

     </para>

     <para>There are no other built-in types. Users are free to define their own type systems,
       building upon these types.</para>

   </section>

   <section id="ugr.ref.cas.accessing_the_type_system">
     <title>Accessing the type system</title>

     <para>
       During annotator processing, or outside an annotator, access the type system by calling
       <literal>CAS.getTypeSystem()</literal>.
     </para>

     <para>However, CAS annotators implement an additional method,
       <literal>typeSystemInit()</literal>, which is called by the UIMA framework before the
       annotator&apos;s process method. This method, implemented by the annotator writer,
       is passed a reference to the CAS&apos;s type system metadata. The method typically uses
       the type system APIs to obtain type and feature objects corresponding to all the types
       and features the annotator will be using in its process method. This initialization
       step should not be done during an annotator&apos;s initialize method since the type
       system can change after the initialize method is called; it should not be done during the
       process method, since this is presumably work that is identical for each incoming
       document, and so should be performed only when the type system changes (which will be a
       rare event). The UIMA framework guarantees it will call the
       <literal>typeSystemInit()</literal> method of an annotator whenever the type system changes, before calling the
       annotator&apos;s <literal>process()</literal> method.</para>

     <para>The initialization done by <literal>typeSystemInit()</literal> is done by the
       UIMA framework when you use the JCas APIs; you only need to provide a
       <literal>typeSystemInit()</literal> method, as described here, when you are not using
       the JCas approach.</para>

     <section id="ugr.ref.cas.type_system.printer_example">
       <title>TypeSystemPrinter example</title>

       <para>Here is a code fragment that, given a CAS Type System, will print a list of all
         types.</para>


       <programlisting>// Get all type names from the type system
 // and print them to stdout.
 private void listTypes1(TypeSystem ts) {
   // Header line, matching the sample console output.
   System.out.println("Types in the type system:");
   for (Type t : ts) {
     // print its name.
     System.out.println(t.getName());
   }
 }</programlisting>

       <para>This method is passed the type system as a parameter.  From the type system, we can
         get an iterator
         over all the types; the enhanced for loop does this implicitly. If you run this against a
         CAS created with no additional
         user-defined types, you should see something like this on the console:</para>

       <programlisting>Types in the type system:
 uima.cas.Boolean
 uima.cas.Byte
 uima.cas.Short
 uima.cas.Integer
 uima.cas.Long
 uima.cas.ArrayBase
 ...
         </programlisting>

       <para>If the type system had user-defined types these would show up too. Note that some
         of these types are not directly creatable &ndash; they are types used by the framework
         in the type hierarchy (e.g. uima.cas.ArrayBase).</para>

       <para>CAS type names include a name-space prefix. The components of a type name are
         separated by the dot (.). A type name component must start with a Unicode letter,
         followed by an arbitrary sequence of letters, digits and the underscore (_). By
         convention, the last component of a type name starts with an uppercase letter, the
         rest start with a lowercase letter.</para>
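
       <para>The naming rule can be expressed as a regular expression. The following sketch is
         an illustration only and is not part of the UIMA API; the class name
         <literal>TypeNameCheck</literal> is invented, and it assumes that
         <quote>letters</quote> and <quote>digits</quote> mean the Unicode categories
         <literal>L</literal> and <literal>Nd</literal>.</para>

       <programlisting><![CDATA[import java.util.regex.Pattern;

class TypeNameCheck {
  // One name component: a Unicode letter, then letters, digits or underscores.
  private static final String COMPONENT = "\\p{L}[\\p{L}\\p{Nd}_]*";

  // Components are separated by single dots.
  private static final Pattern TYPE_NAME =
      Pattern.compile(COMPONENT + "(\\." + COMPONENT + ")*");

  // Checks only the syntax of the name, not whether the type exists.
  static boolean isWellFormed(String name) {
    return TYPE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("uima.tcas.Annotation")); // true
    System.out.println(isWellFormed("com.xyz.proj.Entity"));  // true
    System.out.println(isWellFormed("com..Entity"));          // false
    System.out.println(isWellFormed("9lives.Type"));          // false
  }
}]]></programlisting>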

       <para>Listing the type names is mildly useful, but it would be even better if we could see
         the inheritance relation between the types. The following code prints the
         inheritance tree in indented format.</para>


       <programlisting>private static final int INDENT = 2;
 private void listTypes2(TypeSystem ts) {
   // Get the root of the inheritance tree.
   Type top = ts.getTopType();
   // Recursively print the tree.
   printInheritanceTree(ts, top, 0);
 }

 private void printInheritanceTree(TypeSystem ts, Type type, int level) {
   indent(level); // Print indentation.
   System.out.println(type.getName());
   // Get a vector of the immediate subtypes.
   Vector&lt;Type&gt; subTypes =
     ts.getDirectlySubsumedTypes(type);
   ++level; // Increase the indentation level.
   for (Type subType : subTypes) {
     // Print the subtypes.
     printInheritanceTree(ts, subType, level);
   }
 }

 // A simple, inefficient indenter
 private void indent(int level) {
   int spaces = level * INDENT;
   for (int i = 0; i &lt; spaces; i++) {
     System.out.print(" ");
   }
 }</programlisting>

       <para>This example shows that you can traverse the type hierarchy by starting at the top
         with <literal>TypeSystem.getTopType()</literal> and by retrieving subtypes with
         <literal>TypeSystem.getDirectlySubsumedTypes()</literal>.</para>

       <para>The type system API (see the Javadocs) also lets you access the features, along with
         the allowed value type of each feature. Here is sample code which prints out all the
         features of all the types, together with the allowed value types (the feature
         <quote>range</quote>). Each feature has a <quote>domain</quote> which is the type
         where it is defined, as well as a <quote>range</quote>.


         <programlisting>private void listFeatures2(TypeSystem ts) {
   Iterator&lt;Feature&gt; featureIterator = ts.getFeatures();
   Feature f;
   System.out.println("Features in the type system:");
   while (featureIterator.hasNext()) {
     f = featureIterator.next();
     System.out.println(
       f.getShortName() + ": " +
       f.getDomain() + " -&gt; " + f.getRange());
   }
   System.out.println();
 }</programlisting></para>

       <para>We can ask a feature object for its domain (the type it is defined on) and its range
         (the type of the value of the feature). The terminology derives from the fact that
         features can be viewed as functions on subspaces of the object space.</para>

     </section>

     <section id="ugr.ref.cas.cas_apis_create_modify_feature_structures">
       <title>Using the CAS APIs to create and modify feature structures</title>
       <titleabbrev>Using CAS APIs: Feature Structures</titleabbrev>

       <para>Assume a type system declaration that defines two types: Entity and Person.
         Entity has no features defined within it but inherits from uima.tcas.Annotation
         &ndash; so it has the begin and end features. Person is, in turn, a subtype of Entity,
         and adds firstName and lastName features. CAS type systems are declaratively
         specified using XML; the format of this XML is described in <olink
           targetdoc="&uima_docs_ref;"
           targetptr="ugr.ref.xml.component_descriptor.type_system"/>.


         <programlisting><![CDATA[<!-- Type System Definition -->
 <typeSystemDescription>
   <types>
     <typeDescription>
       <name>com.xyz.proj.Entity</name>
       <description />
       <supertypeName>uima.tcas.Annotation</supertypeName>
     </typeDescription>
     <typeDescription>
       <name>com.xyz.proj.Person</name>
       <description />
       <supertypeName>com.xyz.proj.Entity</supertypeName>
       <features>
         <featureDescription>
           <name>firstName</name>
           <description />
           <rangeTypeName>uima.cas.String</rangeTypeName>
         </featureDescription>
         <featureDescription>
           <name>lastName</name>
           <description />
           <rangeTypeName>uima.cas.String</rangeTypeName>
         </featureDescription>
       </features>
     </typeDescription>
   </types>
 </typeSystemDescription>]]></programlisting></para>

   <para>
     To be able to access types and features, we need to know their names.  The CAS interface defines
     constants that hold the names of built-in types and features, such as
     <literal>CAS.TYPE_NAME_INTEGER</literal>.  It is good programming practice to create such
     constants for the types and features you define, for your own use as well as for others who will
     be using your annotators.
   </para>


       <programlisting>/** Entity type name constant. */
 public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";

 /** Person type name constant. */
 public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";

 /** First name feature name constant. */
 public static final String FIRST_NAME_FEAT_NAME = "firstName";

 /** Last name feature name constant. */
 public static final String LAST_NAME_FEAT_NAME = "lastName";</programlisting>

       <para>Next we define type and feature member variables; these will hold the values of the
         type and feature objects needed by the CAS APIs, to be assigned during
         <literal>typeSystemInit()</literal>.</para>


       <programlisting>// Type system object variables
 private Type entityType;
 private Type personType;
 private Feature firstNameFeature;
 private Feature lastNameFeature;
 private Type stringType;</programlisting>

       <para>The type system does not throw an exception if we ask for something that is
         not known; it simply returns null. The code therefore checks for null and throws a proper
         exception.  We require all these types and features to be defined for the annotator to
         work.  One might imagine situations where certain computations are predicated on some type
         or feature being defined in the type system, but that is not the case here.</para>


       <programlisting>// Get a type object corresponding to a name.
 // If it doesn&apos;t exist, throw an exception.
 private Type initType(String typeName)
   throws AnnotatorInitializationException {
   Type type = this.typeSystem.getType(typeName);
   if (type == null) {
     throw new AnnotatorInitializationException(
       AnnotatorInitializationException.TYPE_NOT_FOUND,
       new Object[] { this.getClass().getName(), typeName });
   }
   return type;
 }

 // We add similar code for retrieving feature objects.
 // Get a feature object from a name and a type object.
 // If it doesn&apos;t exist, throw an exception.
 private Feature initFeature(String featName, Type type)
   throws AnnotatorInitializationException {
   Feature feat = type.getFeatureByBaseName(featName);
   if (feat == null) {
     throw new AnnotatorInitializationException(
       AnnotatorInitializationException.FEATURE_NOT_FOUND,
       new Object[] { this.getClass().getName(), featName });
   }
   return feat;
 }</programlisting>

       <para>Using these two functions, code for initializing the type system described
         above would be:


         <programlisting>public void typeSystemInit(TypeSystem aTypeSystem)
     throws AnalysisEngineProcessException {
   this.typeSystem = aTypeSystem;
   // Set type system member variables.
   this.entityType = initType(ENTITY_TYPE_NAME);
   this.personType = initType(PERSON_TYPE_NAME);
   this.firstNameFeature =
     initFeature(FIRST_NAME_FEAT_NAME, personType);
   this.lastNameFeature =
     initFeature(LAST_NAME_FEAT_NAME, personType);
   this.stringType = initType(CAS.TYPE_NAME_STRING);
 }</programlisting></para>

       <para>Note that we initialize the string type by using a type name constant from the
         CAS.</para>

     </section>
   </section>

   <section id="ugr.ref.cas.creating_feature_structures">
     <title>Creating feature structures</title>

     <para>To create feature structures in JCas, we use the Java <quote>new</quote>
       operator. In the CAS, we use one of several different API methods on the CAS object,
       depending on which of the 10 basic kinds of feature structures we are creating (a plain
       feature structure, or an instance of the built-in primitive type arrays or FSArray).
       There is also a method to create an instance of a
       <literal>uima.tcas.Annotation</literal>, setting the begin and end
       values.</para>

     <para>Once a feature structure is created, it needs to be added to the CAS indexes (unless
       it will be accessed via some reference from another accessible feature structure). The
       CAS provides an API for this. Assuming <literal>aCAS</literal> holds a reference to a CAS, and
       <literal>token</literal> holds a reference to a newly created feature structure, the following
       code adds that feature structure to all the relevant CAS indexes:</para>


     <programlisting>    // Add the token to the index repository.
     aCAS.addFsToIndexes(token);</programlisting>

     <para>There is also a corresponding <literal>removeFsFromIndexes(token)</literal>
       method on CAS objects.</para>

     <para>As of version 2.4.1, there are two methods you can use on an index repository
     to efficiently bulk-remove all
     instances of particular types of feature structures from a particular view.  One of these,
     <code>aCas.getIndexRepository().removeAllIncludingSubtypes(aType)</code>, removes all instances of a particular
     type, including instances of its subtypes.  The other,
     <code>aCas.getIndexRepository().removeAllExcludingSubtypes(aType)</code>, removes only the instances
     of exactly the specified type.  In both cases, the removal is done from the particular view of the CAS referenced
     by <code>aCas</code>.</para>

     <section id="ugr.ref.cas.updating_indexed_feature_structures">
     <title>Updating indexed feature structures</title>
     <para>Version 2.7.0 added protection for indexes when feature structure key
     value features are updated.  By default this protection is automatic, but
     at some performance cost.  Users may optimize this further.</para>

     <para>Protection is needed because some of the indexes (the Sorted and Set types) use comparators
     that read the values of particular features (the index keys). If these values
     need to be changed after the feature structure is added to the indexes,
     the correct way to do this is to:
     <orderedlist spacing="compact">
       <listitem><para>completely remove the item from all indexes where it is indexed, in all views
       where it is indexed,</para>
       </listitem>
       <listitem><para>update the value of the features being used as keys,</para></listitem>
       <listitem><para>add the item back to the indexes, in all views.</para></listitem>
     </orderedlist></para>
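     <para>The three steps above can be illustrated with a minimal sketch in plain Java. It does not
     use the UIMA API: a <code>java.util.TreeSet</code> stands in for a sorted index, and the
     class and member names (<code>SortedIndexSketch</code>, <code>Item</code>,
     <code>updateKey</code>) are hypothetical. The point is only that the item is removed while its
     old key is still valid, and added back after the key has changed.</para>

     <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java stand-in for a sorted index; not part of the UIMA API. */
class SortedIndexSketch {

  /** A minimal "feature structure" with one mutable key feature. */
  static final class Item {
    final String label;
    int begin; // the feature used as the sort key

    Item(String label, int begin) {
      this.label = label;
      this.begin = begin;
    }
  }

  // Sorted by the key feature; the label breaks ties so distinct items are kept.
  final TreeSet<Item> index = new TreeSet<>(
      Comparator.comparingInt((Item i) -> i.begin)
          .thenComparing((Item i) -> i.label));

  /** Remove, update the key, then add back. */
  void updateKey(Item item, int newBegin) {
    boolean wasIndexed = index.remove(item); // 1. remove while the old key is still valid
    item.begin = newBegin;                   // 2. update the key feature
    if (wasIndexed) {
      index.add(item);                       // 3. add back under the new key
    }
  }

  /** Labels in index order, for inspection. */
  List<String> labels() {
    List<String> out = new ArrayList<>();
    for (Item i : index) {
      out.add(i.label);
    }
    return out;
  }

  /** Small demonstration: moves "a" from the front to the middle. */
  static List<String> demo() {
    SortedIndexSketch s = new SortedIndexSketch();
    Item a = new Item("a", 0);
    s.index.add(a);
    s.index.add(new Item("b", 5));
    s.index.add(new Item("c", 10));
    s.updateKey(a, 7);
    return s.labels();
  }
}
```
]]></programlisting>

     <para>If the key were changed while the item was still in the set, the set's internal ordering
     would no longer match the comparator, which is the kind of corruption the protection is
     meant to prevent.</para>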

       <note><para>It&rsquo;s OK to change feature values which are not used in determining
       sort ordering (or set membership), without removing the feature structure and adding it back to the index.
       </para></note>

     <!-- <para>To completely remove an item from the indexes may entail removing it multiple times, if it was
     added multiple times and (as of version 2.7.0) the JVM global property
     <code>uima.allow_duplicate_add_to_indexes</code> is true.</para> -->

     <para>The automatic protection checks for updates of
     features being used as keys, and if it finds an update like this for a feature structure that
     is in the indexes, it removes the feature structure from the indexes, does the update,
     and adds it back.  It will do this for every feature update.  This is obviously not
     efficient when multiple features are being updated; in that case it would be better to
     remove the feature structure, do all the updates to all the features needing updates, and then
     do a single add-back operation.</para>

     <para>This is supported in user code by the method <code>protectIndexes</code>,
     available in both the CAS and JCas interfaces.
     Here are two ways
     of using it, one with a try / finally and the other with a Runnable:
             <programlisting>// an approach using try / finally
 AutoCloseable ac = my_cas.protectIndexes();  // my_cas is a CAS or a JCas
 try {
    ...  arbitrary user code which updates features
         which may be "keys" in one or more indexes
 } finally {
   ac.close();
 }

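 // This can be written more compactly using try-with-resources.
 // Note: AutoCloseable.close() is declared to throw Exception, so the
 // enclosing method must catch or declare it.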

 try (AutoCloseable ac = my_cas.protectIndexes()) {
    ...  arbitrary user code which updates features
         which may be "keys" in one or more indexes
 }

 // an approach using a Runnable, written in Java 8 lambda syntax
 my_cas.protectIndexes(() -> {
   ... arbitrary user code updating "key" features,
       but no checked exceptions are permitted
   });</programlisting></para>

     <para>The <code>protectIndexes</code> implementation only removes feature structures that
     have features being updated which are used as keys in some index(es). At the end of the scope
     of the protectIndexes, it adds all of these back.  It also skips removing feature structures
     from bag indexes, since these have no keys.</para>
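     <para>The deferred add-back can be modeled with a small plain-Java sketch. This is not the UIMA
     implementation and uses no UIMA classes; <code>ProtectBlockSketch</code>,
     <code>Scope</code>, <code>protect</code> and <code>setKey</code> are hypothetical names
     chosen for illustration. It contrasts one add-back per update (outside a scope) with a single
     add-back when the scope closes.</para>

     <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java model of deferred add-back; not the UIMA implementation. */
class ProtectBlockSketch {

  static final class Item {
    final String label;
    int key;

    Item(String label, int key) {
      this.label = label;
      this.key = key;
    }
  }

  /** An AutoCloseable whose close() declares no checked exception. */
  interface Scope extends AutoCloseable {
    @Override
    void close();
  }

  private final TreeSet<Item> sorted = new TreeSet<>(
      Comparator.comparingInt((Item i) -> i.key).thenComparing((Item i) -> i.label));
  private List<Item> removed; // non-null only while a scope is open
  int addBacks = 0;           // counts add-back operations, for illustration

  void add(Item item) {
    sorted.add(item);
  }

  /** Opens a scope; items removed inside it are added back once, on close. */
  Scope protect() {
    removed = new ArrayList<>();
    return () -> {
      for (Item item : removed) {
        sorted.add(item);
        addBacks++;
      }
      removed = null;
    };
  }

  /** Updates the key, removing the item from the index first if it is indexed. */
  void setKey(Item item, int newKey) {
    boolean wasIndexed = sorted.remove(item);
    item.key = newKey;
    if (!wasIndexed) {
      return; // not indexed, or already removed earlier in this scope
    }
    if (removed != null) {
      removed.add(item); // inside a scope: defer the add-back
    } else {
      sorted.add(item);  // no scope: add back immediately, once per update
      addBacks++;
    }
  }

  /** Returns {add-backs without a scope, add-backs with a scope, final key}. */
  static int[] demo() {
    ProtectBlockSketch idx = new ProtectBlockSketch();
    Item a = new Item("a", 1);
    idx.add(a);
    idx.setKey(a, 2);
    idx.setKey(a, 3);
    int unprotected = idx.addBacks;
    try (Scope scope = idx.protect()) {
      idx.setKey(a, 4);
      idx.setKey(a, 5);
      idx.setKey(a, 6);
    }
    return new int[] {unprotected, idx.addBacks - unprotected, idx.sorted.first().key};
  }
}
```
]]></programlisting>

     <para>In this model, two unprotected updates cost two add-backs, while three updates inside
     the scope cost one.</para>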

     <para>Within a <code>protectIndexes</code> block, do not do any operations which depend on the
     indexes being valid, such as creating and using an iterator.  This is because the removed FSs
     are only added back at the end of the protectIndexes block.</para>

     <para>The JVM property <code>-Duima.report_fs_update_corrupts_index</code> will generate a log entry
     every time the framework finds (and automatically surrounds with a remove / add-back) an update to
     a feature which could corrupt the index.  The log entries can be identified by scanning for messages
     starting with <code>While FS was in the index, the feature</code> - the message goes on to identify
     the feature in question.  Users can use these reports to find the places in their code where
     they can either change the design to avoid updating these values after the item is indexed, or
     surround the updates with their own <code>protectIndexes</code> blocks.</para>

     <para>Initially, the out-of-the-box defaults
     for the UIMA framework will run with an automatic (but somewhat inefficient) protection.  To improve upon this,
     users would:
     <itemizedlist>
       <listitem><para>Turn on reporting using a global JVM flag <code>
       -Duima.report_fs_update_corrupts_index</code>.
       This will cause a message to be logged each time the automatic protection is being invoked,
       and allows the user to find the spots to improve.</para>
       </listitem>
       <listitem><para>Improve each spot, perhaps by surrounding the update code with a protectIndexes
       block, or by rearranging code to reduce updating feature values used as index keys.</para>
       </listitem>
       <listitem><para>Once the code is no longer generating any reports, you can turn off the
       automatic protection for production runs using the JVM global property
       <code>-Duima.disable_auto_protect_indexes</code>, and rely on the protectIndexes blocks.
       If protection is disabled, then the corruption detection is skipped, making the production
       runs perhaps a bit faster, although this is not significant in most cases.</para></listitem>
       <listitem><para>For automated build systems, there&rsquo;s a JVM parameter,
       <code>-Duima.exception_when_fs_update_corrupts_index</code>, which will throw an
       exception if any automatic recovery situation is encountered.  You can use this
       in build/test scenarios to ensure
       (after adding all needed protectIndexes blocks) that the code remains safe for
       turning off the checking in production runs.</para></listitem>

     </itemizedlist>
     </para>

     </section>
   </section>

   <section id="ugr.ref.cas.accessing_modifying_features_of_feature_structures">
     <title>Accessing or modifying features of feature structures</title>
     <titleabbrev>Accessing or modifying Features</titleabbrev>

     <para>Values of individual features for a feature structure can be set or referenced,
       using a set of methods that depend on the type of value that feature is declared to have.
       There are methods on FeatureStructure for this: getBooleanValue, getByteValue,
       getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue,
       getStringValue, and getFeatureValue (which means to get a value which in turn is a
       reference to a feature structure). There are corresponding <quote>setter</quote>
       methods, as well. These methods on the feature structure object take as arguments the
       feature object retrieved earlier in the typeSystemInit method.</para>

     <para>Using the previous example, with the type system initialized with type personType
       and feature lastNameFeature, here&apos;s a sample code fragment that gets and sets
       that feature:</para>


     <programlisting>// Assume aPerson is a variable holding an object of type Person
 // get the lastNameFeature value from the feature structure
 String lastName = aPerson.getStringValue(lastNameFeature);
 // set the lastNameFeature value
 aPerson.setStringValue(lastNameFeature, newStringValueForLastName);</programlisting>

     <para>The getters and setters for each of the primitive types are documented in the Javadocs
       as methods of the FeatureStructure interface.</para>

   </section>

   <section id="ugr.ref.cas.indexes_and_iterators">
     <title>Indexes and Iterators</title>

     <para>Each CAS can have many indexes associated with it; each CAS View contains
       a complete set of instantiations of the indexes.   Each index is represented by an
       instance of the type org.apache.uima.cas.FSIndex. You use the object
       org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to
       retrieve instances of indexes. There are methods that let you select the index
       by name, by type, or by both name and type. Since each index is already associated with a type,
       passing both a name and a type is valid only if the type passed in is the same
       type or a subtype of the one declared in the index specification for the named index. If you
       pass in a subtype, the returned FSIndex object refers to an index that will return only
       items belonging to that subtype (or subtypes of that subtype).</para>

     <para>The returned FSIndex objects are used, in turn, to create iterators.
       There is also a method on the Index Repository, <literal>getAllIndexedFS</literal>,
       which will return an iterator over all indexed Feature Structures (for that CAS View),
       in no particular order.  The iterators
       created can be used like common Java iterators, to sequentially retrieve items
       indexed. If the index represents a sorted index, the items are returned in a sorted
       order, where the sort order is specified in the XML index definition. This XML is part of
       the Component Descriptor, see <olink targetdoc="&uima_docs_ref;"
         targetptr="ugr.ref.xml.component_descriptor.aes.index"/>.</para>

     <para>In UIMA V3, feature structures may be added to or removed from indexes while iterating
       over them.  If this happens, any iterators already created will continue to operate over the
       before-modification version of the index, unless or until the iterator is re-synchronized with the current
       value of the index via one of the following three iterator API calls:
       moveToFirst, moveToLast, or moveTo(FeatureStructure).
       ConcurrentModificationException is no longer thrown in UIMA V3.
     </para>

     <para>While iterating, you may update features of a feature structure that are used as
     <quote>keys</quote> of an index.
     If this is done, UIMA protects the indexes (to prevent index corruption) by automatically removing the
     feature structure from the indexes,
     updating the feature, and adding the feature structure back to the indexes (possibly in a new position).
     This automatic remove / add-back operation no longer makes the iterator throw a ConcurrentModificationException
     (as it did in UIMA Version 2) if the iterator is incremented or decremented;
     existing iterators will continue to operate as if no index modification occurred.
     </para>

     <!-- <para>As of version 2.7.0, a new method on FSIndex, <code>withSnapshotIterators(),</code>
     allows creating a light-weight FSIndex based on the original FSIndex
     that supports doing arbitrary index operations while iterating, and will not throw
     <code>ConcurrentModificationException</code>.  Iterators obtained from this instance use a
     <emphasis>snapshot</emphasis> technique - they create a snapshot of the original index when the
     iterator is created, and then use that snapshot while operating, so the iteration is unaffected by any
     modifications to the actual index.</para>  -->

     <section id="ugr.ref.cas.index.built_in_indexes">
       <title>Built-in Indexes</title>

       <para>An unnamed built-in bag index exists which holds all feature structures which are indexed.
       The only access to this index is the method getAllIndexedFS(Type) which returns an iterator
       over all indexed Feature Structures.</para>

       <para>The CAS also contains a built-in index for the type <literal>uima.tcas.Annotation</literal>, which sorts
         annotations in the order in which they appear in the document. Annotations are sorted first by increasing
         <literal>begin</literal> position. Ties are then broken by <emphasis>decreasing</emphasis>
         <literal>end</literal> position (so that longer annotations come first). Annotations that match in both
         their <literal>begin</literal> and <literal>end</literal> features are sorted using the type priorities,
         if any are defined
         (see <olink targetdoc="&uima_docs_ref;"
           targetptr="ugr.ref.xml.component_descriptor.aes.type_priority"/>).</para>
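       <para>The ordering just described can be written as a comparator. The following is a minimal
         sketch in plain Java, not the UIMA implementation: <code>Span</code>, <code>rank</code>
         and <code>compare</code> are hypothetical names, and type priorities are modeled as a
         simple list in which earlier entries sort first (an assumption made for illustration).</para>

       <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the annotation sort order; not part of the UIMA API. */
class AnnotationOrderSketch {

  static final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
      this.type = type;
      this.begin = begin;
      this.end = end;
    }

    @Override
    public String toString() {
      return type + "[" + begin + "," + end + "]";
    }
  }

  /** Position of a type in the priority list; unlisted types rank last. */
  static int rank(List<String> priorities, String type) {
    int i = priorities.indexOf(type);
    return i < 0 ? priorities.size() : i;
  }

  static int compare(Span a, Span b, List<String> priorities) {
    if (a.begin != b.begin) {
      return Integer.compare(a.begin, b.begin); // increasing begin
    }
    if (a.end != b.end) {
      return Integer.compare(b.end, a.end);     // decreasing end: longer first
    }
    return Integer.compare(rank(priorities, a.type), rank(priorities, b.type));
  }

  /** Returns the spans in index order, rendered as strings. */
  static List<String> sorted(List<Span> spans, List<String> priorities) {
    List<Span> copy = new ArrayList<>(spans);
    copy.sort((a, b) -> compare(a, b, priorities));
    List<String> out = new ArrayList<>();
    for (Span s : copy) {
      out.add(s.toString());
    }
    return out;
  }

  static List<String> demo() {
    List<Span> spans = Arrays.asList(
        new Span("Token", 6, 10),
        new Span("Token", 0, 5),
        new Span("Sentence", 0, 20),
        new Span("Entity", 0, 5));
    return sorted(spans, Arrays.asList("Sentence", "Entity", "Token"));
  }
}
```
]]></programlisting>

       <para>In the demonstration, the sentence comes first because it is the longest annotation
         starting at position 0; the entity and the token covering the same span are ordered by
         the priority list.</para>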
     </section>


     <section id="ugr.ref.cas.index.adding_to_indexes">
       <title>Adding Feature Structures to the Indexes</title>

       <para>Feature Structures are added to the indexes by various APIs. These add the Feature Structure to
         <emphasis>all</emphasis> indexes that are defined for the type of that FeatureStructure (or any of its
         supertypes), in a particular view.
         Note that you should not add a Feature Structure to the indexes until you have set values for all
         of the features that may be used as sort keys in an index.</para>

       <para>There are multiple APIs for adding FSs to the index.
         <itemizedlist>
           <listitem><para>(preferred) myFeatureStructure.addToIndexes(). This adds the feature structure instance to the
           view in which it was originally created.</para>
           </listitem>
           <listitem><para>(preferred) myFeatureStructure.addToIndexes(JCas or CAS). This adds the feature structure instance to the
             view represented by the argument.</para>
           </listitem>
           <listitem><para>(older form) casView.addFsToIndexes(myFeatureStructure) or jcasView.addFsToIndexes(myFeatureStructure).
             This adds the feature structure instance to the
             view represented by the cas (or jcas).</para>
           </listitem>
           <listitem><para>(older form) fsIndexRepositoryView.addFsToIndexes(myFeatureStructure).
             This adds the feature structure instance to the
             view represented by the fsIndexRepository instance.</para>
           </listitem>
         </itemizedlist>
       </para>
     </section>

     <section id="ugr.ref.cas.index.iterators">
       <title>Iterators over UIMA Indexes</title>


       <para>Iterators are objects implementing the interface <literal>org.apache.uima.cas.FSIterator</literal>.
         This interface extends <literal>java.util.Iterator</literal> and provides the normal Java iterator methods, plus
         additional ones that allow moving both forwards and backwards.</para>

       <para>UIMA indexes implement <literal>Iterable</literal>, so you can use an index directly in a Java
         enhanced for loop.</para>

     </section>

     <section id="ugr.ref.cas.index.annotation_index">
       <title>Special iterators for Annotation types</title>

       <para>Note: we recommend using the UIMA V3 select framework, instead of the following.
         It implements all of the following capabilities, and more, in a uniform manner.</para>

       <para>The built-in index over the <literal>uima.tcas.Annotation</literal> type
         named <quote><literal>AnnotationIndex</literal></quote> has additional
         capabilities. To use them, you first get a reference to this built-in index using
         either the <literal>getAnnotationIndex</literal> method on a CAS View object, or
         by asking the <literal>FSIndexRepository</literal> object for an index having the
         particular name <quote>AnnotationIndex</quote>, for example:

         <programlisting>AnnotationIndex idx = aCAS.getAnnotationIndex();
 // or you can iterate over a specific subtype of Annotation:
 AnnotationIndex idx = aCAS.getAnnotationIndex(aType); </programlisting></para>

       <para>This object can be used to produce several additional kinds of iterators. It can
         produce unambiguous iterators; an unambiguous iterator skips over elements until it finds
         one whose begin position is equal to or greater than the end position of
         the previously returned annotation.</para>
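       <para>The skipping rule can be sketched in a few lines of plain Java. This models only the
         rule stated above and does not call the UIMA API; the class and method names are
         hypothetical, and each annotation is reduced to a <code>{begin, end}</code> pair already
         in index order.</para>

       <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of unambiguous iteration; not part of the UIMA API. */
class UnambiguousSketch {

  /** Each span is {begin, end}; the input must already be in index order. */
  static List<int[]> unambiguous(List<int[]> sortedSpans) {
    List<int[]> out = new ArrayList<>();
    int lastEnd = Integer.MIN_VALUE;
    for (int[] span : sortedSpans) {
      // Keep a span only if it begins at or after the end of the last kept span.
      if (span[0] >= lastEnd) {
        out.add(span);
        lastEnd = span[1];
      }
    }
    return out;
  }

  /** Renders the kept spans as "begin-end" separated by spaces. */
  static String demo() {
    List<int[]> spans = new ArrayList<>();
    spans.add(new int[] {0, 10});
    spans.add(new int[] {2, 5});
    spans.add(new int[] {10, 12});
    spans.add(new int[] {11, 15});
    spans.add(new int[] {12, 20});
    StringBuilder sb = new StringBuilder();
    for (int[] s : unambiguous(spans)) {
      sb.append(s[0]).append('-').append(s[1]).append(' ');
    }
    return sb.toString().trim();
  }
}
```
]]></programlisting>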

       <para>It can also produce several kinds of subiterators; these are iterators whose
         annotations fall within the span of another annotation. This kind of iterator can
         also have the unambiguous property, if desired. It also can be
         <quote>strict</quote> or not; strict means that the returned annotation lies
         completely within the span of the controlling annotation. Non-strict only implies
         that the beginning of the returned annotation falls within the span of the
         controlling annotation.</para>
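       <para>The difference between strict and non-strict can be made concrete with a plain-Java
         sketch. It is not the UIMA subiterator: the names are hypothetical, annotations are
         reduced to <code>{begin, end}</code> pairs, and treating the controlling span as
         half-open (begin inclusive, end exclusive) for the <quote>begins within</quote> test is
         an assumption of this sketch, not a statement about the framework.</para>

       <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of strict vs. non-strict subiteration; not part of the UIMA API. */
class SubiteratorSketch {

  /** Returns the spans selected for the controlling span {begin, end}. */
  static List<int[]> within(List<int[]> spans, int[] bound, boolean strict) {
    List<int[]> out = new ArrayList<>();
    for (int[] span : spans) {
      // Assumption of this sketch: "begins within" uses a half-open interval.
      boolean beginsInside = span[0] >= bound[0] && span[0] < bound[1];
      boolean endsInside = span[1] <= bound[1];
      if (beginsInside && (!strict || endsInside)) {
        out.add(span);
      }
    }
    return out;
  }

  /** Renders the selected spans as "begin-end" separated by spaces. */
  static String demo(boolean strict) {
    List<int[]> spans = new ArrayList<>();
    spans.add(new int[] {5, 12});  // begins before the controlling span
    spans.add(new int[] {10, 15}); // lies completely inside
    spans.add(new int[] {18, 25}); // begins inside, ends outside
    spans.add(new int[] {21, 22}); // begins after the controlling span
    int[] bound = {10, 20};
    StringBuilder sb = new StringBuilder();
    for (int[] s : within(spans, bound, strict)) {
      sb.append(s[0]).append('-').append(s[1]).append(' ');
    }
    return sb.toString().trim();
  }
}
```
]]></programlisting>

       <para>With the controlling span 10 to 20, the non-strict selection keeps both 10-15 and
         18-25, while the strict selection keeps only 10-15.</para>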

       <para>There is also a method which produces an <literal>AnnotationTree</literal>
         object, which contains nodes representing the results of doing a strict,
         unambiguous subiterator over the span of some controlling annotation. For more
         details, please refer to the Javadocs for the
         <literal>org.apache.uima.cas.text</literal> package.</para>

     </section>

     <section id="ugr.ref.cas.index.constraints_and_filtered_iterators">
       <title>Constraints and Filtered iterators</title>

       <para>Note: for new code, consider using the select framework plus Streams, instead of
         the following.</para>

       <para>There is a set of API calls that build constraint objects. These objects can be
         used directly to test if a particular feature structure matches (satisfies) the
         constraint, or they can be passed to the createFilteredIterator method to create an
         iterator that skips over instances which fail to satisfy the constraint.</para>

       <para>It is possible to specify a feature value located by following a chain of
         references starting from the feature structure being tested. Here&apos;s a
         scenario to explore this concept. Let&apos;s suppose you have the following type
         system (namespaces are omitted for clarity):

         <blockquote>
           <para><emphasis role="bold">Token</emphasis>, having a feature PartOfSpeech
             which holds a reference to another type (POS)</para>

           <para><emphasis role="bold">POS</emphasis> (a type with many subtypes, each
             representing a different part of speech)</para>

           <para><emphasis role="bold">Noun</emphasis> (a subtype of POS)</para>

           <para><emphasis role="bold">ProperName</emphasis> (a subtype of Noun),
             having a feature Class which holds an integer value encoding some information
             about the proper noun.</para></blockquote></para>

       <para>If you want to filter Token instances, such that only those tokens get through
         which are proper names of class 3 (for example), you would need a test that started with
         a Token instance, followed its PartOfSpeech reference to another instance (the
         ProperName instance) and then tested the Class feature of that instance for a value
         equal to 3.</para>

       <para>To support this, the filtering approach has components that specify tests, and
         components that specify <quote>paths</quote>. The tests that can be done include
         testing references to type instances to see if they are instances of some type or its
         subtypes; this is done with a FSTypeConstraint constraint. Other tests check for
         equality or, for numeric values, ranges.</para>

       <para>Each test may be combined with a path that leads to the value to test. Tests that
         start from a feature structure instance can be combined with <literal>and</literal> and
         <literal>or</literal> connectors.
         The Javadocs for these are in the package org.apache.uima.cas, in the classes whose names end
         in Constraint, plus the classes ConstraintFactory, FeaturePath and CAS.
         Here&apos;s an example; assume the variable cas holds a reference to a CAS instance.


         <programlisting>// Start by getting the constraint factory from the CAS.
 ConstraintFactory cf = cas.getConstraintFactory();

 // To specify a path to an item to test, you start by
 // creating an empty path.
 FeaturePath path = cas.createFeaturePath();

 // Add POS feature to path, creating one-element path.
 path.addFeature(posFeat);

 // You can extend the chain arbitrarily by adding additional
 // features.

 // Create a new type constraint.

 // Type constraints will check that structures
 // they match against have a type at least as specific
 // as the type specified in the constraint.
 FSTypeConstraint nounConstraint = cf.createTypeConstraint();

 // Set the type (by default it is TOP).
 // This succeeds if the type being tested by this constraint
 // is nounType or a subtype of nounType.
 nounConstraint.add(nounType);

 // Embed the noun constraint under the pos path.
 // This means, associate the test with the path, so it tests the
 // proper value.

 // The result is a test which will
 // match a feature structure that has a posFeat defined
 // which has a value which is an instance of a nounType or
 // one of its subtypes.
 FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);

 // Create a type constraint for token (or a subtype of it)
 FSTypeConstraint tokenConstraint = cf.createTypeConstraint();

 // Set the type.
 tokenConstraint.add(tokenType);

 // Create the final constraint by conjoining the two constraints.
 FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);

 // Create a filtered iterator from some annotation iterator.
 FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);</programlisting>
         </para></section></section>

   <section id="ugr.ref.cas.guide_to_javadocs">
     <title>The CAS APIs &ndash; a guide to the Javadocs</title>
     <titleabbrev>CAS APIs Javadocs</titleabbrev>

     <para>The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text. Most
       of the APIs described here are in the cas package. The cas.impl package contains classes
       used in serializing and deserializing (reading and writing external representations of) the
       CAS in various formats, for
       transporting the CAS among local and remote annotators, or for storing the CAS in
       permanent storage. The cas.text package contains the APIs that extend the CAS to support
       artifact (including <quote>text</quote>) analysis.</para>

     <section id="ugr.ref.cas.javadocs.cas_package">
       <title>APIs in the CAS package</title>

       <para>The main objects implementing the APIs discussed here are shown in the diagram
         below. The hierarchy represents that there is a way to get from an upper object to an
         instance of the lower object, usually by using a method on the upper object; this is not
         an inheritance hierarchy.
         <figure id="ugr.ref.cas.fig.api_hierarchy">
           <title>CAS Object hierarchy</title>
           <mediaobject>
             <imageobject>
               <imagedata width="5.8in" format="JPG"
                 fileref="&imgroot;image001.png"/>
             </imageobject>
             <textobject><phrase>CAS object hierarchy</phrase></textobject>
           </mediaobject>
         </figure> </para>

       <para>The main Interface is the CAS interface. This has most of the functionality of the
         CAS, except for the type system metadata access, and the indexing access. JCas and CAS
         are alternative representations and API approaches to the CAS; each has a method to
         get the other. You can mix JCas and CAS APIs in your application as needed. To use the
         JCas APIs, you have to create the Java classes that correspond to the CAS types, and
         include them in the Java class path of the application. If you have a CAS object, you can
         get a JCas object by using the getJCas() method call on the CAS object; likewise, you
         can get the CAS object from a JCas by using the getCAS() method call on the JCas object.
         There is also a low level CAS interface that is not part of the official API, and is
         intended for internal use only &ndash; it is not documented here.</para>

       <para>The type system metadata APIs are found in the TypeSystem interface. Each type and
         feature is represented by an object implementing the Type or Feature interface. The
         TypeSystem interface has methods to test whether one type subsumes another and to iterate
         over the available types. The Type interface has methods to extract information about a
         type, including what features it has. The Feature interface has methods that return the
         type it belongs to (its domain), its name, and its range (the kind of values it can hold).</para>

       <para>The FSIndexRepository provides methods to get instances of indexes, and
         also provides an iterator over all indexed feature structures of a given type:
         <literal>getAllIndexedFS(aType)</literal>.
         The FSIndex and AnnotationIndex objects provide methods to create
         iterators.</para>

       <para>Iterators and the CAS methods that create new feature structures return
         FeatureStructure objects. These objects can be used to set and get the values of
         defined features within them.</para>
     </section>
   </section>

   <section id="ugr.ref.cas.typemerging">
     <title>Type Merging</title>

     <para>When annotators are combined in an aggregate, their defined type systems are merged.
     This is designed to support independent development of annotator components.  The merge
     results in a single defined type system for CASes that flow through a particular set of
     annotators.</para>

     <para>The basic operation of a type system merge is to iterate through all the defined types,
     and if two annotators define the same fully qualified type name,
     to take the features defined for those types
     and form a logical union of those features.  This operation requires that same-named features
     have the same range type names.  The resulting type system has features comprising the union
     of all features over all the various definitions for this type in different annotators.
     </para>

     <para>Feature merging checks that all features having the same name in a type have
     identical range types; otherwise an error is signaled.</para>
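
     <para>The feature union and range check described above can be sketched in plain Java.
     This is a minimal illustration only, not the UIMA implementation: the class
     <literal>FeatureMergeSketch</literal>, the method <literal>mergeFeatures</literal>, and the
     representation of a type's features as a map from feature name to range type name are all
     assumptions made for this example.</para>

     <programlisting><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it is not part of the UIMA API.
final class FeatureMergeSketch {

    /**
     * Returns the union of two feature maps (feature name -> range type name).
     * Throws IllegalStateException if the same feature name appears in both
     * maps with different range type names.
     */
    static Map<String, String> mergeFeatures(Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> e : second.entrySet()) {
            // putIfAbsent returns the range already recorded for this name, if any.
            String existingRange = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existingRange != null && !existingRange.equals(e.getValue())) {
                throw new IllegalStateException("Feature '" + e.getKey()
                        + "' has conflicting range types: "
                        + existingRange + " and " + e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two definitions of the same (hypothetical) type from different annotators.
        Map<String, String> a = new LinkedHashMap<>();
        a.put("name", "uima.cas.String");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("name", "uima.cas.String");
        b.put("age", "uima.cas.Integer");
        System.out.println(mergeFeatures(a, b));
    }
}
```
]]></programlisting>

     <para>In this sketch, a feature defined by both annotators with the same range appears once
     in the result, while a feature defined with two different ranges causes the merge to fail.</para>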

     <para>Types are combined for merging when their fully qualified names are the same.
     Two different definitions can be merged even if their supertype definitions do not match,
     provided that one supertype subsumes the other; otherwise an error is signaled.  Likewise, two types
     with the same name can be merged only if their features can be merged.
     </para>
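
     <para>The supertype compatibility rule can be sketched as follows. This is an illustration
     under stated assumptions, not the UIMA implementation: the class
     <literal>SupertypeMergeSketch</literal>, its methods, and the map from each type name to its
     parent type name are hypothetical. The sketch also assumes that, when one supertype subsumes
     the other, the merged type keeps the more specific of the two; the text above does not state
     which supertype is kept.</para>

     <programlisting><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; it is not part of the UIMA API.
final class SupertypeMergeSketch {

    /**
     * Returns true if 'ancestor' is 'type' itself or is reachable from 'type'
     * by following parent links. Assumes the parent links contain no cycles.
     */
    static boolean subsumes(Map<String, String> parentOf, String ancestor, String type) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the supertype for the merged type, or throws if neither
     * supertype subsumes the other. Keeping the more specific supertype
     * is an assumption of this sketch.
     */
    static String mergeSupertypes(Map<String, String> parentOf,
                                  String first, String second) {
        if (subsumes(parentOf, first, second)) {
            return second;
        }
        if (subsumes(parentOf, second, first)) {
            return first;
        }
        throw new IllegalStateException(
                "Incompatible supertypes: " + first + " and " + second);
    }

    public static void main(String[] args) {
        // Hand-written fragment of a type hierarchy (child -> parent).
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("uima.tcas.Annotation", "uima.cas.AnnotationBase");
        parentOf.put("uima.cas.AnnotationBase", "uima.cas.TOP");
        System.out.println(
                mergeSupertypes(parentOf, "uima.cas.TOP", "uima.tcas.Annotation"));
    }
}
```
]]></programlisting>

     <para>Two definitions whose supertypes lie on different branches of the hierarchy are rejected
     by this sketch, which corresponds to the error case described above.</para>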
     </section>

   <section id="ugr.ref.cas.limitedmultipleaccess">
     <title>Limited multi-thread access to read-only CASs</title>

     <para>Some applications may find it useful to scale up pipelines and run these in parallel.</para>
     <para>
     Generally, a CAS is not thread-safe, and only one thread at a time may operate on it.  In many
     scenarios, a CAS is initialized and filled with Feature Structures, and after some point
     no more updates to that particular CAS are made.</para>

     <para>
     If a CAS is no longer going to be changed, it is possible to
     access it on multiple threads in a read-only mode, simultaneously, with some limitations.  Limitations
     arise because some UIMA Framework activities may update internal CAS data structures.</para>

     <para>Operational data is updated while running a pipeline when a PEAR is entered or exited,
     because PEARs establish new class loaders and can potentially switch the JCas classes being used
     (the class loaders might define different JCas cover classes
     implementing the same UIMA type).
     Because of this, multiple pipelines cannot simultaneously access a CAS in read-only mode if one or more of those
     pipelines contains a PEAR. There are other edge cases where this may happen as well; for example, if you are
     running a pipeline with an Extension Class Loader
     and have a callback routine loaded under a different class loader, UIMA will switch the JCas classes when
     calling the callback.
     </para>
     </section>
 </chapter>