<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" > | |
%uimaents; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<chapter id="ugr.ref.jcas"> | |
<title>JCas Reference</title> | |
<para>The CAS is a system for sharing data among annotators, consisting of data structures | |
(definable at run time), sets of indexes over these data, metadata describing these, subjects of | |
analysis, and a high | |
performance serialization/deserialization mechanism. JCas provides Java approach to | |
accessing CAS data, and is based on using generated, specific Java classes for each CAS | |
type.</para> | |
<para>Annotators process one CAS per call to their process method. During processing, | |
annotators can retrieve feature structures from the passed in CAS, add new ones, modify | |
existing ones, and use and update CAS indexes. Of course, an annotator can also use plain | |
Java Objects in addition; but the data in the CAS is what is shared among annotators within | |
an application.</para> | |
<para>All the facilities present in the APIs for the CAS are available when using the JCas | |
APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a | |
JCas (and vice-versa). The JCas APIs often have helper methods that make using this | |
interface more convenient for Java developers.</para> | |
<para>The data in the CAS are typed objects having fields. JCas uses a set of generated Java | |
classes (each corresponding to a particular CAS type) with <quote>getter</quote> and | |
<quote>setter</quote> methods for the features, plus a constructor so new instances can | |
be made. The Java classes don't actually store the data in the class instance; | |
instead, the getters and setters forward to the underlying CAS data representation. | |
Because of this, applications which use the JCas interface can share data with annotators | |
using plain CAS (i.e., not using the JCas approach). </para> | |
<para>Users can modify the JCas generated | |
Java classes by adding fields to them; this allows arbitrary non-CAS data to also be | |
represented within the JCas objects, as well; however, the non-CAS data stored in the JCas | |
object instances cannot be shared with annotators using the plain CAS.</para> | |
<para>Data in the CAS initially has no corresponding JCas type instances; these are created | |
as needed at the first reference. This means, if your annotator is passed a large CAS having | |
millions of CAS feature structures, but you only reference a few of them, and no previously | |
created Java JCas object instances were created by upstream annotators, the only Java | |
objects that will be created will be those that correspond to the CAS feature structures | |
that you reference.</para> | |
<para>The JCas class Java source files are generated from XML type system descriptions. The | |
JCasGen utility does the work of generating the corresponding Java Class Model for the CAS | |
types. There are a variety of ways JCasGen can be run; these are described later. You | |
include the generated classes with your UIMA component, and you can publish these classes | |
for others who might want to use your type system.</para> | |
<para>The specification of the type system in XML can be written using a conventional text | |
editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA | |
descriptors.</para> | |
<para>Changes to the type system are done by changing the XML and regenerating the | |
corresponding Java Class Models. Of course, once you've published your type system | |
for others to use, you should be careful that any changes you make don't adversely | |
impact the users. Additional features can be added to existing types without breaking | |
other code.</para> | |
<para>A separate Java class is generated for each type; this type implements the CAS | |
FeatureStructure interface, as well as having the special getters and setters for the | |
included features. In the current implementation, an additional helper class per type is | |
also generated. The generated Java classes have methods (getters and setters) for the | |
fields as defined in the XML type specification. Descriptor comments are reflected in the | |
generated Java code as Java-doc style comments.</para> | |
<section id="ugr.ref.jcas.name_spaces"> | |
<title>Name Spaces</title> | |
<para>Full Type names consist of a <quote>namespace</quote> prefix dotted with a simple | |
name. Namespaces are used like packages to avoid collisions between types that are | |
defined by different people at different times. The namespace is used as the Java | |
package name for generated Java files.</para> | |
<para>Type names used in the CAS correspond to the generated Java classes directly. If the | |
CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the | |
package com.myCompany.myProject, and the class is ExampleClass.</para> | |
<para> | |
An exception to this rule is the built-in types | |
starting with <literal>uima.cas </literal>and <literal>uima.tcas</literal>; | |
these names are mapped to Java packages named | |
<literal>org.apache.uima.jcas.cas</literal> and | |
<literal>org.apache.uima.jcas.tcas</literal>.</para> | |
</section> | |
<section id="ugr.ref.jcas.use_of_description"> | |
<title>XML description element</title> | |
<titleabbrev>Use of XML Description</titleabbrev> | |
<para>Each XML type specification can have <description ... | |
> tags. The description for a type will be copied into the generated Java code, as a | |
Javadoc style comment for the class. When writing these descriptions in the XML type | |
specification file, you might want to use html tags, as allowed in Javadocs.</para> | |
<para>If you use the Component Description Editor, you can write the html tags normally, | |
for instance, <quote><h1>My Title</h1></quote>. The Component | |
Descriptor Editor will take care of coverting the actual descriptor source so that it | |
has the leading <quote><</quote> character written as <quote>&lt;</quote>, | |
to avoid confusing the XML type specification. For example, <p> would be written | |
in the source of the descriptor as &lt;p>. Any characters used in the Javadoc | |
comment must of course be from the character set allowed by the XML type specification. | |
These specifications often start with the line <?xml version=<quote>1.0</quote> | |
encoding=<quote>UTF-8</quote> ?>, which means you can use any of the UTF-8 | |
characters.</para> | |
</section> | |
<section id="ugr.ref.jcas.mapping_built_ins"> | |
<title>Mapping built-in CAS types to Java types</title> | |
<para>The built-in primitive CAS types map to Java types as follows:</para> | |
<programlisting>uima.cas.Boolean → boolean | |
uima.cas.Byte → byte | |
uima.cas.Short → short | |
uima.cas.Integer → int | |
uima.cas.Long → long | |
uima.cas.Float → float | |
uima.cas.Double → double | |
uima.cas.String → String</programlisting> | |
</section> | |
<section id="ugr.ref.jcas.augmenting_generated_code"> | |
<title>Augmenting the generated Java Code</title> | |
<para>The Java Class Models generated for each type can be augmented by the user. Typical | |
augmentations include adding additional (non-CAS) fields and methods, and import | |
statements that might be needed to support these. Commonly added methods include | |
additional constructors (having different parameter signatures), and | |
implementations of toString().</para> | |
<para>To augment the code, just edit the generated Java source code for the class named the | |
same as the CAS type. Here's an example of an additional method you might add; the | |
various getter methods are retrieving values from the instance:</para> | |
<programlisting>public String toString() { // for debugging | |
return "XsgParse " | |
+ getslotName() + ": " | |
+ getheadWord().getCoveredText() | |
+ " seqNo: " + getseqNo() | |
+ ", cAddr: " + id | |
+ ", size left mods: " + getlMods().size() | |
+ ", size right mods: " + getrMods().size(); | |
}</programlisting> | |
<section id="ugr.ref.jcas.data_persistence"> | |
<title>Persistence of additional data</title> | |
<para>If you add custom instance fields to JCas cover classes, these exist in the JCas cover object instance, | |
but not in the CAS itself. Each time a CAS object is referenced (by an iterator, or by following a Feature | |
Structure reference), a new JCas cover object instance may be created. If you need these values, you can (a) | |
make them CAS values if possible, or (b) hold a reference to the the particular JCas cover object instance in | |
your Java code. For some simple cases, setting the the performance tuning option JCAS_CACHE_ENABLE (see | |
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="tug.application.pto"/>) | |
to true | |
will cause the same JCas cover object that was previously used for a particular CAS Feature Structure to be | |
reused. However, this capability won't work when other factors interfere with the ability to reuse the same | |
object. Pear isolation is an example of this.</para> | |
<para>Because of this, and because the JCas Cache holds on to the JCas cover objects beyond their useful life and | |
prevents them from being garbage collected, it is normally recommended running with the | |
JCAS_CACHE_ENABLE set to "false".</para> | |
</section> | |
<section id="ugr.ref.jcas.keeping_augmentations_when_regenerating"> | |
<title>Keeping hand-coded augmentations when regenerating</title> | |
<para>If the type system specification changes, you have to re-run the JCasGen | |
generator. This will produce updated Java for the Class Models that capture the | |
changed specification. If you have previously augmented the source for these Java | |
Class Models, your changes must be merged with the newly (re)generated Java source | |
code for the Class Models. This can be done by hand, or you can run the version of JCasGen | |
that is integrated with Eclipse, and use automatic merging that is done using Eclipse's EMF | |
plug-in. You can obtain Eclipse and the needed EMF plug-in from <ulink | |
url="http://www.eclipse.org/"/>.</para> | |
<para>If you run the generator version that works without using Eclipse, it will not | |
merge Java source changes you may have previously made; if you want them retained, | |
you'll have to do the merging by hand.</para> | |
<para>The Java source merging will keep additional constructors, additional fields, | |
and any changes you may have made to the readObject method (see below). Merging will | |
<emphasis>not</emphasis> delete classes in the target corresponding to deleted CAS types, which no longer | |
are in the source – you should delete these by hand.</para> | |
<warning><para>The merging supports Java 1.4 syntactic constructs only. | |
JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to | |
only Java 1.4 constructs, the merge will work. If you use Java 5 or later specific syntax or constructs, the merge | |
operation will likely fail to merge properly.</para></warning> | |
</section> | |
<section id="ugr.ref.jcas.additional_constructors"> | |
<title>Additional Constructors</title> | |
<para>Any additional constructors that you add must include the JCas argument. The | |
first line of your constructor is required to be</para> | |
<programlisting>this(jcas); // run the standard constructor</programlisting> | |
<para>where jcas is the passed in JCas reference. If the type you're defining | |
extends <literal>uima.tcas.Annotation</literal>, JCasGen will automatically | |
add a constructor which takes 2 additional parameters – the begin and end Java | |
int values, and set the <literal>uima.tcas.Annotation</literal> | |
<literal>begin</literal> and <literal>end</literal> fields.</para> | |
<para>Here's an example: If you're defining a type MyType which has a | |
feature parent, you might make an additional constructor which has an additional | |
argument of parent:</para> | |
<programlisting>MyType(JCas jcas, MyType parent) { | |
this(jcas); // run the standard constructor | |
setParent(parent); // set the parent field from the parameter | |
}</programlisting> | |
<section id="ugr.ref.jcas.using_readobject"> | |
<title>Using readObject</title> | |
<para>Fields defined by augmenting the Java Class Model to include additional | |
fields represent data that exist for this class in Java, in a local JVM (Java Virtual | |
Machine), but do not exist in the CAS when it is passed to other environments (for | |
example, passing to a remote annotator).</para> | |
<para>A problem can arise when new instances are created, perhaps by the underlying | |
system when it iterates over an index, which is: how to insure that any additional | |
non-CAS fields are properly initialized. To allow for arbitrary initialization | |
at instance creation time, an initialization method in the Java Class Model, | |
called readObject is used. The generated default for this method is to do nothing, | |
but it is one of the methods that you can modify – to do whatever | |
initialization might be needed. It is called with 0 parameters, during the | |
constructor for the object, after the basic object fields have been set up. It can | |
refer to fields in the CAS using the getters and setters, and other fields in the Java | |
object instance being initialized.</para> | |
<para>A pre-existing CAS feature structure could exist if a CAS was being passed to | |
this annotator; in this case the JCas system calls the readObject method when | |
creating the corresponding Java instance for the first time for the CAS feature | |
structure. This can happen at two points: when a new object is being returned from an | |
iterator over a CAS index, or a getter method is getting a field for the first time | |
whose value is a feature structure.</para> | |
</section> | |
</section> | |
<section id="ugr.ref.jcas.modifying_generated_items"> | |
<title>Modifying generated items</title> | |
<para>The following modifications, if made in generated items, will be preserved when | |
regenerating.</para> | |
<para>The public/private etc. flags associated with methods (getters and setters). | |
You can change the default (<quote>public</quote>) if needed.</para> | |
<para><quote>final</quote> or <quote>abstract</quote> can be added to the type | |
itself, with the usual semantics.</para> | |
</section> | |
</section> | |
<section id="ugr.ref.jcas.merging_types_from_other_specs"> | |
<title>Merging types</title> | |
<titleabbrev>Merging Types</titleabbrev> | |
<para>Type definitions are merged by the framework from all the components being run together.</para> | |
<section id="ugr.ref.jcas.merging_types.aggregates_and_cpes"> | |
<title>Aggregate AEs and CPEs as sources of types</title> | |
<para>When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the | |
UIMA framework will build a merged type system (Note: this <quote>merge</quote> is merging types, not to be | |
confused with merging Java source code, discussed above). This merged type system has all the types of every | |
component used in the application. In addition, application code can use UIMA Framework APIs to read and merge | |
type descriptions, manually.</para> | |
<para>In most cases, each type system can have its own Java Class Models generated individually, perhaps at an | |
earlier time, and the resulting class files (or .jar files containing these class files) can be put in the | |
class path to enable JCas.</para> | |
<para>However, it is possible that there may be multiple definitions of the same CAS type, each of which might | |
have different features defined. In this case, the UIMA framework will create a merged type by accumulating | |
all the defined features for a particular type into that type's type definition. However, the JCas | |
classes for these types are not automatically merged, which can create some issues for JCas users, as | |
discussed in the next section.</para> | |
</section> | |
<section id="ugr.ref.jcas.merging_types.jcasgen_support"> | |
<title>JCasGen support for type merging</title> | |
<para>When there are multiple definitions of the same CAS type with different features defined, then JCasGen | |
can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types, | |
which can then be shared by all the components. | |
Directions for running JCasGen can be found in <olink | |
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>. This is typically done by the person who | |
is assembling the Aggregate Analysis Engine or Collection Processing Engine. The resulting merged Java | |
Class Model will then contain get and set methods for the complete set of features. These Java classes must | |
then be made available in the class path, <emphasis>replacing</emphasis> the pre-merge versions of the | |
classes.</para> | |
<para>If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the | |
merged versions, as described in section <xref | |
linkend="ugr.ref.jcas.keeping_augmentations_when_regenerating"/>, above. If just one of the | |
pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the | |
file system where the generated output will go, and the -merge option for JCasGen will automatically | |
merge the hand-modifications with the generated code. If | |
<emphasis>both</emphasis> pre-merged versions had hand-modifications, then these modifications must | |
be manually merged.</para> | |
<para>An alternative to this is packaging the components as individual PEAR files, each with their own | |
version of the JCas generated Classes. The Framework (as of release 2.2) can run PEAR files using the | |
pear file descriptor, and supply each component with its particular version of the JCas generated class.</para> | |
</section> | |
<section id="ugr.ref.jcas.impact_of_type_merging_on_composability"> | |
<title>Impact of Type Merging on Composability of Annotators</title> | |
<titleabbrev>Type Merging impacts on Composability</titleabbrev> | |
<para>The recommended approach in UIMA is to build and maintain type systems as separate components, which are | |
imported by Annotators. Using this approach, Type Merging does not occur because the Type System and its JCas | |
classes are centrally managed and shared by the annotators.</para> | |
<para>If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator | |
redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the | |
reusability of your annotator, unless your component is used as a PEAR file.</para> | |
<para>If not using PEAR file packaging isolation capability, whenever | |
anyone wants to combine your annotator with another annotator that uses a different version of | |
the same Type, they will need to be aware of all of the issues described in the previous section. They will need | |
to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java | |
classes and to not include the pre-merge classes. (To enable this, you should package these classes | |
separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you | |
have done hand-modifications to your JCas classes, the person assembling your annotator will need to | |
properly merge those changes. These issues significantly complicate the task of combining annotators, and | |
will cause your annotator not to be as easily reusable as other UIMA annotators. </para> | |
</section> | |
<section id="ugr.ref.jcas.documentannotation_issues"> | |
<title>Adding Features to DocumentAnnotation</title> | |
<para>There is one built-in type, <literal>uima.tcas.DocumentAnnotation</literal>, | |
to which applications can add additional features. (All other built-in types | |
are "feature-final" and you cannot add additional features to them.) Frequently, | |
additional features are added to <literal>uima.tcas.DocumentAnnotation</literal> | |
to provide a place to store document-level metadata.</para> | |
<para>For the same reasons mentioned in the previous section, adding features to | |
DocumentAnnotation is not recommended if you are using JCas. Instead, it is recommended | |
that you define your own type for storing your document-level metadata. You can create | |
an instance of this type and add it to the indexes in the usual way. You can then | |
retrieve this instance using the iterator returned from the method<literal>getAllIndexedFS(type)</literal> | |
on an instance of a JFSIndexRepository object. | |
(As of UIMA v2.1, you do not have to declare a custom index in your descriptor to | |
get this to work).</para> | |
<para>If you do choose to add features to DocumentAnnotation, there are additional issues to | |
be aware of. The UIMA SDK provides the JCas cover class for the built-in definition of | |
DocumentAnnotation, in the separate jar file <literal>uima-document-annotation.jar</literal>. | |
If you add additional features to DocumentAnnotation, you must remove this jar file | |
from your classpath, because you will not want to use the default JCas cover class. | |
You will need to re-run JCasGen as described in <xref | |
linkend="ugr.ref.jcas.merging_types.jcasgen_support"/>. JCasGen will generate a new cover | |
class for DocumentAnnotation, which you must place in your classpath in lieu of the version | |
in <literal>uima-document-annotation.jar</literal>.</para> | |
<para>Also, this is the reason why the method <literal>JCas.getDocumentAnnotationFs()</literal> returns | |
type <literal>TOP</literal>, rather than type <literal>DocumentAnnotation</literal>. Because the | |
<literal>DocumentAnnotation</literal> class can be replaced by users, it is not part of | |
<literal>uima-core.jar</literal> and so the core UIMA framework cannot have any references | |
to it. In your code, you may <quote>cast</quote> the result of <literal>JCas.getDocumentAnnotationFs()</literal> | |
to type <literal>DocumentAnnotation</literal>, which must be available on the classpath either via | |
<literal>uima-document-annotation.jar</literal> or by including a custom version that you have generated using JCasGen.</para> | |
</section> | |
</section> | |
<section id="ugr.ref.jcas.using_within_an_annotator"> | |
<title>Using JCas within an Annotator</title> | |
<para>To use JCas within an annotator, you must include the generated Java classes output | |
from JCasGen in the class path.</para> | |
<para>An annotator written using JCas is built by defining a class for the annotator that | |
extends JCasAnnotator_ImplBase. The process method for this annotator is | |
written</para> | |
<programlisting>public void process(JCas jcas) | |
throws AnalysisEngineProcessException { | |
... // body of annotator goes here | |
}</programlisting> | |
<para>The process method is passed the JCas instance to use as a parameter.</para> | |
<para>The JCas reference is used throughout the annotator to refer to the particular JCas | |
instance being worked on. In pooled or multi-threaded implementations, there will be a | |
separate JCas for each thread being (simultaneously) worked on.</para> | |
<para>You can do several kinds of operations using the JCas APIs: create new feature | |
structures (instances of CAS types) (using the new operator), access existing feature | |
structures passed to your annotator in the JCas (for example, by using the next method of | |
an iterator over the feature structures), get and set the fields of a particular | |
instance of a feature structure, and add and remove feature structure instances from | |
the CAS indexes. To support iteration, there are also functions to get and use indexes | |
and iterators over the instances in a JCas.</para> | |
<section id="ugr.ref.jcas.new_instances"> | |
<title>Creating new instances using the Java <quote>new</quote> operator</title> | |
<titleabbrev>Creating new instances</titleabbrev> | |
<para>The new operator creates new instances of JCas types. It takes at least one | |
parameter, the JCas instance in which the type is to be created. For example, if there | |
was a type Meeting defined, you can create a new instance of it using: | |
<programlisting>Meeting m = new Meeting(jcas);</programlisting></para> | |
<para>Other variations of constructors can be added in custom code; the single | |
parameter version is the one automatically generated by JCasGen. For types that are | |
subtypes of Annotation, JCasGen also generates an additional constructor with | |
additional <quote>begin</quote> and <quote>end</quote> arguments.</para> | |
</section> | |
<section id="ugr.ref.jcas.getters_and_setters"> | |
<title>Getters and Setters</title> | |
<para>If the CAS type Meeting had fields location and time, you could get or set these by | |
using getter or setter methods. These methods have names formed by splicing together | |
the word <quote>get</quote> or <quote>set</quote> followed by the field name, with | |
the first letter of the field name capitalized. For instance | |
<programlisting>getLocation()</programlisting></para> | |
<para>The getter forms take no parameters and return the value of the field; the setter | |
forms take one parameter, the value to set into the field, and return void.</para> | |
<para>There are built-in CAS types for arrays of integers, strings, floats, and | |
feature structures. For fields whose values are these types of arrays, there is an | |
alternate form of getters and setters that take an additional parameter, written as | |
the first parameter, which is the index in the array of an item to get or set.</para> | |
</section> | |
<section id="ugr.ref.jcas.obtaining_refs_to_indexes"> | |
<title>Obtaining references to Indexes</title> | |
<para>The only way to access instances (not otherwise referenced from other | |
instances) passed in to your annotator in its JCas is to use an iterator over some | |
index. Indexes in the CAS are specified in the annotator descriptor. Indexes have a | |
name; text annotators have a built-in, standard index over all annotations.</para> | |
<para>To get an index, first get the JFSIndexRepository from the JCas using the method | |
jcas.getJFSIndexRepository(). Here are the calls to get indexes:</para> | |
<programlisting>JFSIndexRepository ir = jcas.getJFSIndexRepository(); | |
ir.getIndex(name-of-index) // get the index by its name, a string | |
ir.getIndex(name-of-index, Foo.type) // filtered by specific type | |
ir.getAnnotationIndex() // get AnnotationIndex | |
ir.getAnnotationIndex(Foo.type) // filtered by specific type</programlisting> | |
<para>For convenience, the getAnnotationIndex method is available directly on the JCas object | |
instance; the implementation merely forwards to the associated index repository.</para> | |
<para>Filtering types have to be a subtype of the type specified for this index in its | |
index specification. They can be written as either Foo.type or if you have an instance | |
of Foo, you can write</para> | |
<programlisting>fooInstance.jcasType.casType. </programlisting> | |
<para>Foo is (of course) an example of the name of the type.</para> | |
</section> | |
<section id="ugr.ref.jcas.adding_removing_instances_to_indexes"> | |
<title>Adding (and removing) instances to (from) indexes</title> | |
<titleabbrev>Updating Indexes</titleabbrev> | |
<para>CAS indexes are maintained automatically by the CAS. But you must add any | |
instances of feature structures you want the index to find, to the indexes by using the | |
call:</para> | |
<programlisting>myInstance.addToIndexes();</programlisting> | |
<para>Do this after setting all features in the instance <emphasis role="bold-italic">which could be used in indexing</emphasis>, for example, in | |
determining the sorting order. After indexing, do not change the values of these | |
particular features because the indexes will not be updated. If you need to change the | |
values, you must first remove the instance from the CAS indexes, change the values, | |
and then add the instance back. To remove an instance from the indexes, use the method: | |
<programlisting>myInstance.removeFromIndexes();</programlisting></para> | |
<note><para>It's OK to change feature values which are not used in determining | |
sort ordering (or set membership), without removing and re-adding back to the index. | |
</para></note> | |
<para>When writing a Multi-View component, you may need to index instances in multiple | |
CAS views. The methods above use the indexes associated with the current JCas object. | |
There is a variation of the <literal>addToIndexes / removeFromIndexes</literal> methods which | |
takes one argument: a reference to a JCas object holding the view in which you want to | |
index this instance. | |
<programlisting>myInstance.addToIndexes(anotherJCas) | |
myInstance.removeFromIndexes(anotherJCas)</programlisting> | |
</para> | |
<para> | |
You can also explicitly add instances to other views using the addFsToIndexes method on | |
other JCas (or CAS) objects. For instance, if you had 2 other CAS views (myView1 and | |
myView2), in which you wanted to index myInstance, you could write:</para> | |
<programlisting>myInstance.addToIndexes(); //addToIndexes used with the new operator | |
myView1.addFsToIndexes(myInstance); // index myInstance in myView1 | |
myView2.addFsToIndexes(myInstance); // index myInstance in myView2</programlisting> | |
<para> | |
The rules for determining which index to use with a particular JCas object are designed to | |
behave the way most would think they should; if you need specific behavior, you can always | |
explicitly designate which view the index adding and removing operations should work on. | |
</para> | |
<para> | |
The rules are: | |
If the instance is a subtype of AnnotationBase, then the view is the view associated with the | |
annotation as specified in the feature holding the view reference in AnnotationBase. | |
Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the | |
instance's constructor. | |
Otherwise, if the instance was created by getting a feature value from some other instance, whose range | |
type is a feature structure, then the view is the same as the referring instance. | |
Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index, | |
then it is the view associated with the index. | |
</para> | |
</section> | |
<section id="ugr.ref.jcas.using_iterators"> | |
<title>Using Iterators</title> | |
<para>Once you have an index obtained from the JCas, you can get an iterator from the | |
index; here is an example:</para> | |
<programlisting>FSIndexRepository ir = jcas.getFSIndexRepository(); | |
FSIndex myIndex = ir.getIndex("myIndexName"); | |
FSIterator myIterator = myIndex.iterator(); | |
JFSIndexRepository ir = jcas.getJFSIndexRepository(); | |
FSIndex myIndex = ir.getIndex("myIndexName", Foo.type); // filtered | |
FSIterator myIterator = myIndex.iterator();</programlisting> | |
<para>Iterators work like normal Java iterators, but are augmented to support | |
additional capabilities. Iterators are described in the CAS Reference, <olink | |
targetdoc="&uima_docs_ref;" | |
targetptr="ugr.ref.cas.indexes_and_iterators"/>.</para> | |
</section> | |
<section id="ugr.ref.jcas.class_loaders"> | |
<title>Class Loaders in UIMA</title> | |
<para>The basic concept of a UIMA application includes assembling engines into a flow. | |
The application made up of these Engines are run within the UIMA Framework, either by | |
the Collection Processing Manager, or by using more basic UIMA Framework | |
APIs.</para> | |
<para>The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the | |
capability to load multiple applications, in a way where each one is isolated from the | |
others, by using a separate class loader for each application. For instance, one set | |
of UIMA Framework Classes could be shared by multiple sets of application - specific | |
classes, even if these application-specific classes had the same names but were | |
different versions.</para> | |
<section id="ugr.ref.jcas.class_loaders.optional"> | |
<title>Use of Class Loaders is optional</title> | |
<para>The UIMA framework will use a specific ClassLoader, based on how | |
ResourceManager instances are used. Specific ClassLoaders are only created if | |
you specify an ExtensionClassPath as part of the ResourceManager. If you do not | |
need to support multiple applications within one UIMA framework within a JVM, | |
don't specify an ExtensionClassPath; in this case, the classloader used | |
will be the one used to load the UIMA framework - usually the overall application | |
class loader.</para> | |
<para>Of course, you should not run multiple UIMA applications together, in this | |
way, if they have different class definitions for the same class name. This | |
includes the JCas <quote>cover</quote> classes. This case might arise, for | |
instance, if both applications extended | |
<literal>uima.tcas.DocumentAnnotation</literal> in differing, | |
incompatible ways. Each application would need its own definition of this class, | |
but only one could be loaded (unless you specify ExtensionClassPath in the | |
ResourceManager which will cause the UIMA application to load its private | |
versions of its classes, from its classpath).</para> | |
</section> | |
</section> | |
<section id="ugr.ref.jcas.accessing_jcas_objects_outside_uima_components"> | |
<title>Issues accessing JCas objects outside of UIMA Engine Components</title> | |
<para>If you are using the ExtensionClassPaths, the JCas cover classes are loaded | |
under a class loader created by the ResourceManager part of the UIMA Framework. | |
If you reference the same JCas | |
classes outside of any UIMA component, for instance, in top level application code, | |
the JCas classes used by that top level application code also must be in the class path | |
for the application code.</para> | |
<para>Alternatively, you could do all the JCas processing inside a UIMA component (and do no | |
processing using JCas outside of the UIMA pipeline).</para> | |
</section> | |
</section> | |
<section id="ugr.ref.jcas.setting_up_classpath"> | |
<title>Setting up Classpath for JCas</title> | |
<para>The JCas Java classes generated by JCasGen are typically compiled and put into a JAR | |
file, which, in turn, is put into the application's class path.</para> | |
<para>This JAR file must be generated from the application's merged type system. | |
This is most conveniently done by opening the top level descriptor used by the | |
application in the Component Descriptor Editor tool, and pressing the Run-JCasGen | |
button on the Type System Definition page.</para> | |
</section> | |
<section id="ugr.ref.jcas.pear_support"> | |
<title>PEAR isolation</title> | |
<para> | |
As of version 2.2, the framework supports component descriptors which are PEAR descriptors. | |
These descriptors define components plus include information on the class path needed to | |
run them. The framework uses the class path information to set up a localized class path, just | |
for code running within the PEAR context. This allows PEAR files requiring different | |
versions of common code to work well together, even if the class names in the different versions | |
have the same names. | |
</para> | |
<para>The mechanism used to switch the class loaders when entering a PEAR-packaged annotator in | |
a flow depends on the framework knowing if JCas is being used within that annotator code. The | |
framework will know this if the particular view being passed has had a previous call to | |
getJCas(), or if the particular annotator is marked as a JCas-using one (by having it extend the | |
class <code>JCasAnnotator_ImplBase).</code></para> | |
</section> | |
</chapter> |