
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <title>UIMA References</title><link rel="stylesheet" type="text/css" href="css/stylesheet-html.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="UIMA References" id="d5e1"><div xmlns:d="http://docbook.org/ns/docbook" class="titlepage"><div><div><h1 class="title">UIMA References</h1></div><div><div class="authorgroup">
       <h3 class="corpauthor">Written and maintained by the Apache UIMA&#8482; Development Community</h3>
     </div></div><div><p class="releaseinfo">Version 3.1.1</p></div><div><p class="copyright">Copyright &copy; 2006, 2019 The Apache Software Foundation</p></div><div><p class="copyright">Copyright &copy; 2004, 2006 International Business Machines Corporation</p></div><div><div class="legalnotice" title="Legal Notice"><a name="d5e8"></a>
       <p> </p>
       <p title="License and Disclaimer">
         <b>License and Disclaimer.&nbsp;</b>

         The ASF licenses this documentation
            to you under the Apache License, Version 2.0 (the
            "License"); you may not use this documentation except in compliance
            with the License.  You may obtain a copy of the License at

          </p><div class="blockquote"><blockquote class="blockquote">
            <a class="ulink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
          </blockquote></div><p title="License and Disclaimer">

            Unless required by applicable law or agreed to in writing,
            this documentation and its contents are distributed under the License
            on an
            "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
            KIND, either express or implied.  See the License for the
            specific language governing permissions and limitations
            under the License.

       </p>
       <p> </p>
       <p> </p>
       <p title="Trademarks">
         <b>Trademarks.&nbsp;</b>
         All terms mentioned in the text that are known to be trademarks or
         service marks have been appropriately capitalized.  Use of such terms
         in this book should not be regarded as affecting the validity of
         the trademark or service mark.

       </p>
     </div></div><div><p class="pubdate">November, 2019</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.ref.javadocs">1. Javadocs</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.javadocs.libraries">1.1. Using named Eclipse User Libraries</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.xml.component_descriptor">2. Component Descriptor Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.notation">2.1. Notation</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.imports">2.2. Imports</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.type_system">2.3. Type System Descriptors</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.type_system.imports">2.3.1. Imports</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.type_system.types">2.3.2. Types</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.type_system.features">2.3.3. Features</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.type_system.string_subtypes">2.3.4. String Subtypes</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.aes">2.4. Analysis Engine Descriptors</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.aes.primitive">2.4.1. Primitive Analysis Engine Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.aes.aggregate">2.4.2. Aggregate Analysis Engine Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.aes.configuration_parameters">2.4.3. Configuration Parameters</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.flow_controller">2.5. 
Flow Controller Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.collection_processing_parts">2.6. Collection Processing Component Descriptors</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader">2.6.1. Collection Reader Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.collection_processing_parts.cas_initializer">2.6.2. CAS Initializer Descriptors (deprecated)</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.collection_processing_parts.cas_consumer">2.6.3. CAS Consumer Descriptors</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.service_client">2.7. Service Client Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.component_descriptor.custom_resource_specifiers">2.8. Custom Resource Specifiers</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.xml.cpe_descriptor">3. CPE Descriptor Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.overview">3.1. CPE Overview</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.notation">3.2. Notation</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.imports">3.3. Imports</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor">3.4. CPE Descriptor Overview</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.collection_reader">3.5. Collection Reader</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.collection_reader.error_handling">3.5.1. Error handling for Collection Readers</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors">3.6. 
CAS Processors</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual">3.6.1. Specifying an Individual CAS Processor</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.operational_parameters">3.7. CPE Operational Parameters</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.resource_manager_configuration">3.8. Resource Manager Configuration</a></span></dt><dt><span class="section"><a href="#ugr.ref.xml.cpe_descriptor.descriptor.example">3.9. Example CPE Descriptor</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.cas">4. CAS Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.javadocs">4.1. Javadocs</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.overview">4.2. CAS Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.type_system">4.2.1. The Type System</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.creating_accessing_manipulating_data">4.2.2. Creating/Accessing/Changing data</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.creating_using_indexes">4.2.3. Creating and using indexes</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.cas.builtin_types">4.3. Built-in CAS Types</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.accessing_the_type_system">4.4. Accessing the type system</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.type_system.printer_example">4.4.1. TypeSystemPrinter example</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.cas_apis_create_modify_feature_structures">4.4.2. Using CAS APIs: Feature Structures</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.cas.creating_feature_structures">4.5. 
Creating feature structures</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.updating_indexed_feature_structures">4.5.1. Updating indexed feature structures</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.cas.accessing_modifying_features_of_feature_structures">4.6. Accessing or modifying Features</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.indexes_and_iterators">4.7. Indexes and Iterators</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.index.built_in_indexes">4.7.1. Built-in Indexes</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.index.adding_to_indexes">4.7.2. Adding Feature Structures to the Indexes</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.index.iterators">4.7.3. Iterators over UIMA Indexes</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.index.annotation_index">4.7.4. Special iterators for Annotation types</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.index.constraints_and_filtered_iterators">4.7.5. Constraints and Filtered iterators</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.cas.guide_to_javadocs">4.8. CAS API's Javadocs</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.cas.javadocs.cas_package">4.8.1. APIs in the CAS package</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.cas.typemerging">4.9. Type Merging</a></span></dt><dt><span class="section"><a href="#ugr.ref.cas.limitedmultipleaccess">4.10. Limited multi-thread access to read-only CASs</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.jcas">5. JCas Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.jcas.name_spaces">5.1. Name Spaces</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.use_of_description">5.2. Use of XML Description</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.mapping_built_ins">5.3. 
Mapping built-in CAS types to Java types</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.augmenting_generated_code">5.4. Augmenting the generated Java Code</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.jcas.keeping_augmentations_when_regenerating">5.4.1. Keeping hand-coded augmentations when regenerating</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.additional_constructors">5.4.2. Additional Constructors</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.modifying_generated_items">5.4.3. Modifying generated items</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.jcas.merging_types_from_other_specs">5.5. Merging Types</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.jcas.merging_types.aggregates_and_cpes">5.5.1. Aggregate AEs and CPEs as sources of types</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.merging_types.jcasgen_support">5.5.2. JCasGen support for type merging</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.impact_of_type_merging_on_composability">5.5.3. Type Merging impacts on Composability</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.documentannotation_issues">5.5.4. Adding Features to DocumentAnnotation</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.jcas.using_within_an_annotator">5.6. Using JCas within an Annotator</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.jcas.new_instances">5.6.1. Creating new instances</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.getters_and_setters">5.6.2. Getters and Setters</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.obtaining_refs_to_indexes">5.6.3. Obtaining references to Indexes</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.adding_removing_instances_to_indexes">5.6.4. Updating Indexes</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.using_iterators">5.6.5. 
Using Iterators</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.class_loaders">5.6.6. Class Loaders in UIMA</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.accessing_jcas_objects_outside_uima_components">5.6.7. Issues accessing JCas objects outside of UIMA Engine Components</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.jcas.setting_up_classpath">5.7. Setting up Classpath for JCas</a></span></dt><dt><span class="section"><a href="#ugr.ref.jcas.pear_support">5.8. PEAR isolation</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.pear">6. PEAR Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.pear.packaging_a_component">6.1. Packaging a UIMA component</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.pear.creating_pear_structure">6.1.1. Creating the PEAR structure</a></span></dt><dt><span class="section"><a href="#ugr.ref.pear.populating_pear_structure">6.1.2. Populating the PEAR structure</a></span></dt><dt><span class="section"><a href="#ugr.ref.pear.creating_installation_descriptor">6.1.3. Creating the installation descriptor</a></span></dt><dt><span class="section"><a href="#ugr.ref.pear.installation_descriptor">6.1.4. Installation Descriptor: template</a></span></dt><dt><span class="section"><a href="#ugr.ref.pear.packaging_into_1_file">6.1.5. Packaging the PEAR structure into one file</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.pear.installing">6.2. Installing a PEAR package</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.pear.installing_pear_using_API">6.2.1. Installing a PEAR file using the PEAR APIs</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.pear.specifier">6.3. PEAR package descriptor</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.xmi">7. XMI CAS Serialization Reference</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xmi.xmi_tag">7.1. 
XMI Tag</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.feature_structures">7.2. Feature Structures</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.primitive_features">7.3. Primitive Features</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.reference_features">7.4. Reference Features</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.array_and_list_features">7.5. Array and List Features</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.xmi.array_and_list_features.as_multi_valued_properties">7.5.1. Arrays and Lists as Multi-Valued Properties</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.array_and_list_features.as_1st_class_objects">7.5.2. Arrays and Lists as First-Class Objects</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.null_array_list_elements">7.5.3. Null Array/List Elements</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.xmi.sofas_views">7.6. Subjects of Analysis (Sofas) and Views</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.linking_to_ecore_type_system">7.7. Linking XMI docs to Ecore Type System</a></span></dt><dt><span class="section"><a href="#ugr.ref.xmi.delta">7.8. Delta CAS XMI Format</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.compress">8. Compressed Binary CASes</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.compress.overview">8.1. Binary CAS Compression overview</a></span></dt><dt><span class="section"><a href="#ugr.ref.compress.usage">8.2. Using Compressed Binary CASes</a></span></dt><dt><span class="section"><a href="#ugr.ref.compress.simple-deltas">8.3. Simple Delta CAS serialization</a></span></dt><dt><span class="section"><a href="#ugr.ref.compress.use-cases">8.4. Use Case cookbook</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.json">9. JSON support</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.json.overview">9.1. 
JSON serialization support overview</a></span></dt><dt><span class="section"><a href="#ug.ref.json.cas">9.2. JSON CAS Serialization</a></span></dt><dd><dl><dt><span class="section"><a href="#ug.ref.json.cas.bigpic">9.2.1. The Big Picture</a></span></dt><dt><span class="section"><a href="#ug.ref.json.cas.context">9.2.2. The _context section</a></span></dt><dt><span class="section"><a href="#ug.ref.json.cas.featurestructures">9.2.3. Serializing Feature Structures</a></span></dt></dl></dd><dt><span class="section"><a href="#ug.ref.json.cas.featurestructures.organization">9.3. Organizing the Feature Structures</a></span></dt><dt><span class="section"><a href="#ug.ref.json.cas.features">9.4. Additional JSON CAS Serialization features</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.json.delta">9.4.1. Delta CAS</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.json.usage">9.5. Using JSON CAS serialization</a></span></dt><dt><span class="section"><a href="#ugr.ref.json.descriptionserialization">9.6. JSON serialization for UIMA descriptors</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.config">10. Setup and Configuration</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.config.properties">10.1. UIMA JVM Configuration Properties</a></span></dt><dt><span class="section"><a href="#ugr.ref.config.protect-index">10.2. Configuring index protection</a></span></dt><dt><span class="section"><a href="#ugr.ref.config.property-table">10.3. Properties Table</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ref.resources">11. UIMA Resources</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.resources.overview">11.1. What is a UIMA Resource?</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ref.resources.resource-inner-implementations">11.1.1. 
Resource Inner Implementations</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ref.resources.sharing-across-pipelines">11.2. Sharing Resources</a></span></dt><dt><span class="section"><a href="#ugr.ref.resources.external-resource-multiple-parameterized-instances">11.3. External Resources support for multiple Parameterized Instances</a></span></dt></dl></dd></dl></div>


   <div class="chapter" title="Chapter&nbsp;1.&nbsp;Javadocs" id="ugr.ref.javadocs"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;1.&nbsp;Javadocs</h2></div></div></div>


   <p>The details of all the public APIs for UIMA are contained in the API Javadocs. These are located in the docs/api
     directory; the top level to open in your browser is called <a class="ulink" href="api/index.html" target="_top">api/index.html</a>.</p>

   <p>Eclipse supports the ability to attach the Javadocs to your project. The Javadoc should already be attached
     to the <code class="literal">uimaj-examples</code> project, if you followed the setup instructions in <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview &amp; SDK Setup</a> <a href="overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section&nbsp;3.2, &#8220;Setting up Eclipse to view Example Code&#8221;</a>. To attach
     Javadocs to your own Eclipse project, use the following instructions.</p>

   <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>As an alternative, you can add the UIMA source to the UIMA binary distribution. If you
   do this, the Javadocs are available automatically (so you can skip the following
   setup), and you can also step through the UIMA framework code while debugging.
   To add the source, follow the instructions as described in the setup chapter:
   <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview &amp; SDK Setup</a>
   <a href="overview_and_setup.html#ugr.ovv.eclipse_setup.adding_source" class="olink">Section&nbsp;3.3, &#8220;Adding the UIMA source code to the jar files&#8221;</a>.</p></div>

   <p>To add the Javadocs, open a project that refers to the UIMA APIs in its class path, and open the project properties. Then pick
     Java Build Path. Pick the "Libraries" tab and select one of the UIMA library entries (if, for
     instance, uima-core.jar is not in this list, it's unlikely your code will compile). Each library entry has a small "&gt;"
     sign on its left; click it to expand the view and see the Javadoc location. Highlight the Javadoc location and press Edit
     to add a reference to the Javadocs in the following dialog:


     </p><div class="screenshot">
     <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/references/ref.javadocs/image002.jpg" width="574" alt="Screenshot of attaching Javadoc to source in Eclipse"></td></tr></table></div>
   </div>

   <p>Once you do this, Eclipse can show you Javadocs for UIMA APIs as you work. To see the Javadoc for a UIMA API, you
     can hover over the API class or method, or select it and press shift-F2, or use the menu Navigate <span class="symbol">&#8594;</span>
     Open External Javadoc, or open the Javadoc view (Window <span class="symbol">&#8594;</span> Show View <span class="symbol">&#8594;</span> Other
     <span class="symbol">&#8594;</span> Java <span class="symbol">&#8594;</span> Javadoc).</p>

   <p>In a similar manner, you can attach the source for the UIMA framework, if you download the source
     distribution. The source corresponding to particular
     releases is available from the Apache UIMA web site (<a class="ulink" href="http://uima.apache.org" target="_top">http://uima.apache.org</a>) on the
     downloads page.</p>

   <div class="section" title="1.1.&nbsp;Using named Eclipse User Libraries"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.javadocs.libraries">1.1.&nbsp;Using named Eclipse User Libraries</h2></div></div></div>

   <p>You can also create a named "user library" in Eclipse containing the UIMA Jars, and attach the Javadocs (or
   optionally, the sources); this named library is saved in the Eclipse workspace.  Once created, it can be
   added to the classpath of newly created Eclipse projects.</p>

   <p>Use the menu option Project <span class="symbol">&#8594;</span> Properties
   <span class="symbol">&#8594;</span> Java Build Path, and then pick the Libraries tab, and click the Add Library button. Then select
   User Libraries, click "Next", and pick the library you created for the UIMA Jars.</p>

   <p>To create this library in the workspace,
     use the same menu picks as above, but after you select the User Libraries and click "Next", you can click the "New Library..."
     button to define your new library.  You use the "Add Jars" button and multi-select all the Jars in the lib directory
     of the UIMA binary distribution.  Then you add the Javadoc attachment for each Jar.  The path to use is
     file:/ -- insert the path to your install of UIMA -- /docs/api.  After you do this for the first Jar, you can
     copy this string to the clipboard and paste it into the rest of the Jars.</p>
     </div>
 </div>
   <div class="chapter" title="Chapter&nbsp;2.&nbsp;Component Descriptor Reference" id="ugr.ref.xml.component_descriptor"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;2.&nbsp;Component Descriptor Reference</h2></div></div></div>


   <p>This chapter is the reference guide for the UIMA SDK's Component Descriptor XML
     schema. A <span class="emphasis"><em>Component Descriptor</em></span> (also sometimes called a
     <span class="emphasis"><em>Resource Specifier</em></span> in the code) is an XML file that either (a)
     completely describes a component, including all information needed to construct the
     component and interact with it, or (b) specifies how to connect to and interact with an
     existing component that has been published as a remote service.
     <span class="emphasis"><em>Component</em></span> (also called <span class="emphasis"><em>Resource</em></span>) is a
     general term for modules produced by UIMA developers and used by UIMA applications. The
     types of Components are: Analysis Engines, Collection Readers, CAS
     Initializers<sup>[<a name="d5e71" href="#ftn.d5e71" class="footnote">1</a>]</sup>, CAS Consumers, and Collection Processing Engines.
     However, Collection Processing Engine Descriptors are significantly different in
     format and are covered in a separate chapter, <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter&nbsp;3, <i>Collection Processing Engine Descriptor Reference</i></a>.</p>

   <p><a class="xref" href="#ugr.ref.xml.component_descriptor.notation" title="2.1.&nbsp;Notation">Section&nbsp;2.1, &#8220;Notation&#8221;</a> describes the notation used in this
     chapter.</p>

   <p><a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a> describes the UIMA SDK's
     <span class="emphasis"><em>import</em></span> syntax, used to allow XML descriptors to import
     information from other XML files, to allow sharing of information between several XML
     descriptors.</p>

   <p><a class="xref" href="#ugr.ref.xml.component_descriptor.aes" title="2.4.&nbsp;Analysis Engine Descriptors">Section&nbsp;2.4, &#8220;Analysis Engine Descriptors&#8221;</a> describes the XML format for <span class="emphasis"><em>Analysis Engine
     Descriptors</em></span>. These are descriptors that completely describe Analysis
     Engines, including all information needed to construct and interact with them.</p>

   <p><a class="xref" href="#ugr.ref.xml.component_descriptor.collection_processing_parts" title="2.6.&nbsp;Collection Processing Component Descriptors">Section&nbsp;2.6, &#8220;Collection Processing Component Descriptors&#8221;</a> describes the XML format for
     <span class="emphasis"><em>Collection Processing Component Descriptors</em></span>. This includes
     Collection Reader, CAS Initializer, and CAS Consumer Descriptors.</p>

   <p><a class="xref" href="#ugr.ref.xml.component_descriptor.service_client" title="2.7.&nbsp;Service Client Descriptors">Section&nbsp;2.7, &#8220;Service Client Descriptors&#8221;</a> describes the XML format for
     <span class="emphasis"><em>Service Client Descriptors</em></span>, which specify how to connect to and
     interact with resources deployed as remote services.</p>

    <p><a class="xref" href="#ugr.ref.xml.component_descriptor.custom_resource_specifiers" title="2.8.&nbsp;Custom Resource Specifiers">Section&nbsp;2.8, &#8220;Custom Resource Specifiers&#8221;</a> describes the XML format for
     <span class="emphasis"><em>Custom Resource Specifiers</em></span>, which allow you to plug in your
     own Java class as a UIMA Resource.</p>

   <div class="section" title="2.1.&nbsp;Notation"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.notation">2.1.&nbsp;Notation</h2></div></div></div>


     <p>This chapter uses an informal notation to specify the syntax of Component
       Descriptors. The formal syntax is defined by an XML schema definition, which is
       contained in the file <code class="literal">resourceSpecifierSchema.xsd</code>,
       located in the <code class="literal">uima-core.jar</code> file.</p>

     <p>The notation used in this chapter is:</p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>An ellipsis (...) inside an element body indicates
       that the substructure of that element has been omitted (to be described in another
       section of this chapter). An example of this would be:


       </p><pre class="programlisting">&lt;analysisEngineMetaData&gt;
 ...
 &lt;/analysisEngineMetaData&gt;</pre><p>
       An ellipsis immediately after an element indicates that the element type may be
       repeated arbitrarily many times. For example:


       </p><pre class="programlisting">&lt;parameter&gt;[String]&lt;/parameter&gt;
 &lt;parameter&gt;[String]&lt;/parameter&gt;
 ...</pre><p>
       indicates that there may be arbitrarily many parameter elements in this
       context.</p></li><li class="listitem"><p>Bracketed expressions (e.g. <code class="literal">[String]</code>)
         indicate the type of value that may be used at that location.</p></li><li class="listitem"><p>A vertical bar, as in <code class="literal">true|false</code>, indicates
         alternatives. This can be applied to literal values, bracketed type names, and
         elements.</p></li><li class="listitem"><p>Which elements are optional and which are required is specified in
         prose, not in the syntax definition. </p></li></ul></div>
   </div>

   <div class="section" title="2.2.&nbsp;Imports"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.imports">2.2.&nbsp;Imports</h2></div></div></div>


     <p>The UIMA SDK defines a particular syntax for XML descriptors to import information
       from other XML files. When one of the following appears in an XML descriptor:


       </p><pre class="programlisting">&lt;import location="[URL]" /&gt; or
 &lt;import name="[Name]" /&gt;</pre><p>
       it indicates that information from a separate XML file is being imported. Note that
       imports are allowed only in certain places in the descriptor. In the remainder of this
       chapter, it will be indicated at which points imports are allowed.</p>

     <p>If an import specifies a <code class="literal">location</code> attribute, the value of
       that attribute specifies the URL at which the XML file to import will be found. This can be
       a relative URL, which will be resolved relative to the descriptor containing the
       <code class="literal">import</code> element, or an absolute URL. Relative URLs can be written
       without a protocol/scheme (e.g., <span class="quote">&#8220;<span class="quote">file:</span>&#8221;</span>), and without a host machine
       name. In this case the relative URL might look something like
       <code class="literal">org/apache/myproj/MyTypeSystem.xml</code>.</p>

     <p>An absolute URL is written with one of the following prefixes, followed by a path
       such as <code class="literal">org/apache/myproj/MyTypeSystem.xml</code>:

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>file:/ <span class="symbol">&#8592;</span> has no network
         address</p></li><li class="listitem"><p>file:/// <span class="symbol">&#8592;</span> has an empty network address</p></li><li class="listitem"><p>file://some.network.address/</p></li></ul></div>

     <p>For more information about URLs, please read the javadoc information for the Java
       class <span class="quote">&#8220;<span class="quote">URL</span>&#8221;</span>.</p>
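
     <p>As a rough illustration of how a relative <code class="literal">location</code> is resolved against the
       descriptor that contains the import, the following sketch uses the standard
       <code class="literal">java.net.URI</code> class. It is not the UIMA implementation: the class name
       <code class="literal">ImportLocationSketch</code>, the helper <code class="literal">resolve</code>, and the
       example file paths are hypothetical and chosen only for this example.</p>

     <pre class="programlisting">import java.net.URI;

public class ImportLocationSketch {

    // Hypothetical helper: resolves an import location against the URL
    // of the descriptor that contains the import element.
    static String resolve(String descriptorUrl, String location) {
        return URI.create(descriptorUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // Assumed location of the importing descriptor.
        String descriptor = "file:/home/me/descriptors/MyAnnotator.xml";

        // A relative location is resolved relative to the descriptor.
        System.out.println(resolve(descriptor, "org/apache/myproj/MyTypeSystem.xml"));
        // prints file:/home/me/descriptors/org/apache/myproj/MyTypeSystem.xml

        // An absolute location is used as written.
        System.out.println(resolve(descriptor, "file:/opt/types/MyTypeSystem.xml"));
        // prints file:/opt/types/MyTypeSystem.xml
    }
}</pre>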

     <p>If an import specifies a <code class="literal">name</code> attribute, the value of that
       attribute should take the form of a Java-style dotted name (e.g.
       <code class="literal">org.apache.myproj.MyTypeSystem</code>). An .xml file with this name
       will be searched for in the classpath or datapath (described below). As in Java, the dots
       in the name will be converted to file path separators. So an import specifying the
       example name in this paragraph will result in a search for
       <code class="literal">org/apache/myproj/MyTypeSystem.xml</code> in the classpath or
       datapath.</p>
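
     <p>For example, a by-name import of the type system named in the paragraph above
       could be sketched as follows. The comment restates the lookup described there; whether
       the file is actually found depends on how the classpath and datapath are set up.</p>

     <pre class="programlisting">&lt;imports&gt;
  &lt;!-- Searched for as org/apache/myproj/MyTypeSystem.xml in the classpath or datapath --&gt;
  &lt;import name="org.apache.myproj.MyTypeSystem"/&gt;
&lt;/imports&gt;</pre>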

     <p><a name="ugr.ref.xml.component_descriptor.datapath"></a>The datapath works similarly to the classpath but can be set programmatically
       through the resource manager API. Application developers can specify a datapath
       during initialization, using the following code:


       </p><pre class="programlisting">
 ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
 resMgr.setDataPath(yourPathString);
 AnalysisEngine ae =
   UIMAFramework.produceAnalysisEngine(desc, resMgr, null);
 </pre>

     <p>The default datapath for the entire JVM can be set via the
       <code class="literal">uima.datapath</code> Java system property, but this feature should
       only be used for standalone applications that don't need to run in the same JVM as
       other code that may need a different datapath.</p>

     <p>The value of a name or location attribute may be parameterized with references to external
     override variables using the <code class="literal">${variable-name}</code> syntax.
     </p><pre class="programlisting">&lt;import location="Annotator${with}ExternalOverrides.xml" /&gt;</pre><p>
 	If a variable is undefined the value is left unmodified and a warning message identifies the missing
 	variable.</p>

     <p>Previous versions of UIMA also supported XInclude. That support didn't work in
       many situations, and it is no longer supported. To include other files, please use
       &lt;import&gt;.</p>


   </div>

   <div class="section" title="2.3.&nbsp;Type System Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.type_system">2.3.&nbsp;Type System Descriptors</h2></div></div></div>


     <p>A Type System Descriptor is used to define the types and features that can be
       represented in the CAS. A Type System Descriptor can be imported into an Analysis Engine
       or Collection Processing Component Descriptor.</p>

     <p>The basic structure of a Type System Descriptor is as follows:


       </p><pre class="programlisting">&lt;typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier"&gt;

   &lt;name&gt; [String] &lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;imports&gt;
     &lt;import ...&gt;
     ...
   &lt;/imports&gt;

   &lt;types&gt;
     &lt;typeDescription&gt;
       ...
     &lt;/typeDescription&gt;

     ...

   &lt;/types&gt;

 &lt;/typeSystemDescription&gt;</pre>

     <p>All of the subelements are optional.</p>

     <div class="section" title="2.3.1.&nbsp;Imports"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.type_system.imports">2.3.1.&nbsp;Imports</h3></div></div></div>


       <p>The <code class="literal">imports</code> section allows this descriptor to import
         types from other type system descriptors. The import syntax is described in <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a>. A type system may import any number of other type
         systems and then define additional types which refer to imported types. Circular
         imports are allowed.</p>
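
       <p>A minimal sketch of a descriptor that imports another type system and extends one of
         its types might look like this. All names here
         (<code class="literal">ExtendedTypeSystem</code>,
         <code class="literal">org.myorg.base.BaseTypeSystem</code>,
         <code class="literal">org.myorg.base.Token</code>, and
         <code class="literal">org.myorg.ext.NumberToken</code>) are hypothetical, and the example
         assumes that the imported descriptor defines <code class="literal">org.myorg.base.Token</code>.</p>

       <pre class="programlisting">&lt;typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier"&gt;
  &lt;name&gt;ExtendedTypeSystem&lt;/name&gt;

  &lt;imports&gt;
    &lt;!-- Hypothetical descriptor assumed to define org.myorg.base.Token --&gt;
    &lt;import name="org.myorg.base.BaseTypeSystem"/&gt;
  &lt;/imports&gt;

  &lt;types&gt;
    &lt;typeDescription&gt;
      &lt;name&gt;org.myorg.ext.NumberToken&lt;/name&gt;
      &lt;description&gt;A token that represents a number.&lt;/description&gt;
      &lt;!-- The supertype comes from the imported type system --&gt;
      &lt;supertypeName&gt;org.myorg.base.Token&lt;/supertypeName&gt;
    &lt;/typeDescription&gt;
  &lt;/types&gt;
&lt;/typeSystemDescription&gt;</pre>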
     </div>

     <div class="section" title="2.3.2.&nbsp;Types"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.type_system.types">2.3.2.&nbsp;Types</h3></div></div></div>


       <p>The <code class="literal">types</code> element contains zero or more
         <code class="literal">typeDescription</code> elements. Each
         <code class="literal">typeDescription</code> has the form:


         </p><pre class="programlisting">&lt;typeDescription&gt;
   &lt;name&gt;[TypeName]&lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;supertypeName&gt;[TypeName]&lt;/supertypeName&gt;
   &lt;features&gt;
     ...
   &lt;/features&gt;
 &lt;/typeDescription&gt;</pre>

       <p>The name element contains the name of the type. A
         <code class="literal">[TypeName]</code> is a dot-separated list of names, where each name
         consists of a letter followed by any number of letters, digits, or underscores.
         <code class="literal">TypeNames</code> are case sensitive. Letter and digit are as defined
         by Java; therefore, any Unicode letter or digit may be used (subject to the character
         encoding defined by the descriptor file's XML header). The name following the
         final dot is considered to be the <span class="quote">&#8220;<span class="quote">short name</span>&#8221;</span> of the type; the
         preceding portion is the namespace (analogous to the package.class syntax used in
         Java). Namespaces beginning with uima are reserved and should not be used. Examples
         of valid type names are:</p>

       <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>test.TokenAnnotation</p>
         </li><li class="listitem"><p>org.myorg.TokenAnnotation</p></li><li class="listitem"><p>com.my_company.proj123.TokenAnnotation </p></li></ul></div>

       <p>These would all be considered distinct types since they have different
         namespaces. Best practice here is to follow the normal Java naming conventions of
         having namespaces be all lowercase, with the short type names having an initial
         capital, but this is not mandated, so <code class="literal">ABC.mYtyPE</code> is an allowed
         type name. Type names without namespaces (e.g.
         <code class="literal">TokenAnnotation</code> alone) are allowed but discouraged, because
         naming conflicts can then result when combining annotators that use different
         type systems.</p>

       <p>The <code class="literal">description</code> element contains a textual description
         of the type. The <code class="literal">supertypeName</code> element contains the name of the
         type from which it inherits (this can be set to the name of another user-defined type,
         or it may be set to any built-in type which may be subclassed, such as
         <code class="literal">uima.tcas.Annotation</code> for a new annotation
         type or <code class="literal">uima.cas.TOP</code> for a new type that is not
         an annotation). All three of these elements are required.</p>

     </div>

     <div class="section" title="2.3.3.&nbsp;Features"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.type_system.features">2.3.3.&nbsp;Features</h3></div></div></div>


       <p>The <code class="literal">features</code> element of a
         <code class="literal">typeDescription</code> is required only if the type we are specifying
         introduces new features. If the <code class="literal">features</code> element is present,
         it contains zero or more <code class="literal">featureDescription</code> elements, each of
         which has the form:</p>


       <pre class="programlisting">&lt;featureDescription&gt;
   &lt;name&gt;[Name]&lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;rangeTypeName&gt;[Name]&lt;/rangeTypeName&gt;
   &lt;elementType&gt;[Name]&lt;/elementType&gt;
   &lt;multipleReferencesAllowed&gt;true|false&lt;/multipleReferencesAllowed&gt;
 &lt;/featureDescription&gt;</pre>

       <p>A feature's name follows the same rules as a type short name &#8211; a letter
         followed by any number of letters, digits, or underscores. Feature names are case
         sensitive.</p>

       <p>The feature's <code class="literal">rangeTypeName</code> specifies the type of
         value that the feature can take. This may be the name of any type defined in your type
         system, or one of the predefined types. All of the predefined types have names that are
         prefixed with <code class="literal">uima.cas</code> or <code class="literal">uima.tcas</code>,
         for example:


         </p><pre class="programlisting">uima.cas.TOP
 uima.cas.String
 uima.cas.Long
 uima.cas.FSArray
 uima.cas.StringList
 uima.tcas.Annotation</pre><p>
         For a complete list of predefined types, see the CAS API documentation.</p>

       <p>The <code class="literal">elementType</code> of a feature is optional, and applies only
         when the <code class="literal">rangeTypeName</code> is
         <code class="literal">uima.cas.FSArray</code> or <code class="literal">uima.cas.FSList</code>.
         The <code class="literal">elementType</code> specifies what type of value can be assigned as
         an element of the array or list. This must be the name of a non-primitive type. If
         omitted, it defaults to <code class="literal">uima.cas.TOP</code>, meaning that any
         FeatureStructure can be assigned as an element of the array or list. Note: depending on
         the CAS Interface that you use in your code, this constraint may or may not be
         enforced.
         Note: At run time, the elementType is available from a runtime Feature object
             (using the <code class="literal">a_feature_object.getRange().getComponentType()</code> method)
             only when specified for the <code class="literal">uima.cas.FSArray</code> ranges; it isn't
             available for <code class="literal">uima.cas.FSList</code> ranges.
         </p>


       <p>The <code class="literal">multipleReferencesAllowed</code> element is optional, and
         applies only when the <code class="literal">rangeTypeName</code> is an array or list type (it
         applies to arrays and lists of primitive as well as non-primitive types). Setting
         this to false (the default) indicates that this feature has exclusive ownership of
         the array or list, so changes to the array or list are localized. Setting this to true
         indicates that the array or list may be shared, so changes to it may affect other
         objects in the CAS. Note: there is currently no guarantee that the framework will
         enforce this restriction. However, this setting may affect how the CAS is
         serialized.</p>
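
       <p>Putting these elements together, a type that introduces two features could be sketched
         as follows. The type and feature names
         (<code class="literal">org.myorg.Sentence</code>,
         <code class="literal">sentenceNumber</code>, <code class="literal">tokens</code>, and
         <code class="literal">org.myorg.TokenAnnotation</code>) are hypothetical and serve only
         to illustrate the form; the example assumes that
         <code class="literal">org.myorg.TokenAnnotation</code> is defined elsewhere in the type system.</p>

       <pre class="programlisting">&lt;typeDescription&gt;
  &lt;name&gt;org.myorg.Sentence&lt;/name&gt;
  &lt;description&gt;A sentence and the tokens it contains.&lt;/description&gt;
  &lt;supertypeName&gt;uima.tcas.Annotation&lt;/supertypeName&gt;
  &lt;features&gt;
    &lt;featureDescription&gt;
      &lt;name&gt;sentenceNumber&lt;/name&gt;
      &lt;description&gt;Position of the sentence in the document.&lt;/description&gt;
      &lt;rangeTypeName&gt;uima.cas.Long&lt;/rangeTypeName&gt;
    &lt;/featureDescription&gt;
    &lt;featureDescription&gt;
      &lt;name&gt;tokens&lt;/name&gt;
      &lt;description&gt;Tokens covered by this sentence.&lt;/description&gt;
      &lt;rangeTypeName&gt;uima.cas.FSArray&lt;/rangeTypeName&gt;
      &lt;!-- Array elements are restricted to this (hypothetical) type --&gt;
      &lt;elementType&gt;org.myorg.TokenAnnotation&lt;/elementType&gt;
      &lt;!-- false is the default: this feature owns the array exclusively --&gt;
      &lt;multipleReferencesAllowed&gt;false&lt;/multipleReferencesAllowed&gt;
    &lt;/featureDescription&gt;
  &lt;/features&gt;
&lt;/typeDescription&gt;</pre>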

     </div>

     <div class="section" title="2.3.4.&nbsp;String Subtypes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.type_system.string_subtypes">2.3.4.&nbsp;String Subtypes</h3></div></div></div>


       <p>There is one other special type that you can declare &#8211; a subset of the String
         type that specifies a restricted set of allowed values. This is useful for features
         that can have only certain String values, such as parts of speech. Here is an example of
         how to declare such a type:</p>


       <pre class="programlisting">&lt;typeDescription&gt;
   &lt;name&gt;PartOfSpeech&lt;/name&gt;
   &lt;description&gt;A part of speech.&lt;/description&gt;
   &lt;supertypeName&gt;uima.cas.String&lt;/supertypeName&gt;
   &lt;allowedValues&gt;
     &lt;value&gt;
       &lt;string&gt;NN&lt;/string&gt;
       &lt;description&gt;Noun, singular or mass.&lt;/description&gt;
     &lt;/value&gt;
     &lt;value&gt;
       &lt;string&gt;NNS&lt;/string&gt;
       &lt;description&gt;Noun, plural.&lt;/description&gt;
     &lt;/value&gt;
     &lt;value&gt;
       &lt;string&gt;VB&lt;/string&gt;
       &lt;description&gt;Verb, base form.&lt;/description&gt;
     &lt;/value&gt;
     ...
   &lt;/allowedValues&gt;
 &lt;/typeDescription&gt;</pre>
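
       <p>Since a feature's <code class="literal">rangeTypeName</code> may be any type defined in
         your type system, a feature restricted to these values could presumably be declared as
         sketched here. The feature name <code class="literal">partOfSpeech</code> is a
         hypothetical choice.</p>

       <pre class="programlisting">&lt;featureDescription&gt;
  &lt;name&gt;partOfSpeech&lt;/name&gt;
  &lt;description&gt;Part-of-speech tag, limited to the allowed values of PartOfSpeech.&lt;/description&gt;
  &lt;!-- Refers to the String subtype declared above --&gt;
  &lt;rangeTypeName&gt;PartOfSpeech&lt;/rangeTypeName&gt;
&lt;/featureDescription&gt;</pre>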

     </div>
   </div>

   <div class="section" title="2.4.&nbsp;Analysis Engine Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.aes">2.4.&nbsp;Analysis Engine Descriptors</h2></div></div></div>


     <p>Analysis Engine (AE) descriptors completely describe Analysis Engines. There
       are two basic types of Analysis Engines &#8211; <span class="emphasis"><em>Primitive</em></span> and
       <span class="emphasis"><em>Aggregate</em></span>. A <span class="emphasis"><em>Primitive</em></span> Analysis
       Engine is a container for a single <span class="emphasis"><em>annotator</em></span>, whereas an
       <span class="emphasis"><em>Aggregate</em></span> Analysis Engine is composed of a collection of other
       Analysis Engines. (For more information on this and other terminology, see <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview &amp; SDK Setup</a> <a href="overview_and_setup.html#ugr.ovv.conceptual" class="olink">Chapter&nbsp;2, <i>UIMA Conceptual Overview</i></a>).</p>

     <p>Both Primitive and Aggregate Analysis Engines have descriptors, and the two types
       of descriptors have some similarities and some differences. <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive" title="2.4.1.&nbsp;Primitive Analysis Engine Descriptors">Section&nbsp;2.4.1, &#8220;Primitive Analysis Engine Descriptors&#8221;</a>
       discusses Primitive Analysis Engine descriptors.  <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate" title="2.4.2.&nbsp;Aggregate Analysis Engine Descriptors">Section&nbsp;2.4.2, &#8220;Aggregate Analysis Engine Descriptors&#8221;</a> then
       describes how Aggregate Analysis Engine descriptors are different.</p>

     <div class="section" title="2.4.1.&nbsp;Primitive Analysis Engine Descriptors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.aes.primitive">2.4.1.&nbsp;Primitive Analysis Engine Descriptors</h3></div></div></div>


       <div class="section" title="2.4.1.1.&nbsp;Basic Structure"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.primitive.basic">2.4.1.1.&nbsp;Basic Structure</h4></div></div></div>


         <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;analysisEngineDescription
         xmlns="http://uima.apache.org/resourceSpecifier"&gt;
   &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;

   &lt;primitive&gt;true&lt;/primitive&gt;
   &lt;annotatorImplementationName&gt; [String] &lt;/annotatorImplementationName&gt;

   &lt;analysisEngineMetaData&gt;
     ...
   &lt;/analysisEngineMetaData&gt;

   &lt;externalResourceDependencies&gt;
     ...
   &lt;/externalResourceDependencies&gt;

   &lt;resourceManagerConfiguration&gt;
     ...
   &lt;/resourceManagerConfiguration&gt;

 &lt;/analysisEngineDescription&gt;</pre>

         <p>The document begins with a standard XML header. The recommended root tag is
           <code class="literal">&lt;analysisEngineDescription&gt;</code>, although
           <code class="literal">&lt;taeDescription&gt;</code> is also allowed for backwards
           compatibility.</p>

         <p>Within the root element we declare that we are using the XML namespace
           <code class="literal">http://uima.apache.org/resourceSpecifier</code>. It is
           required that this namespace be used; otherwise, the descriptor will not be able to
           be validated for errors.</p>

         <p>The first subelement,
           <code class="literal">&lt;frameworkImplementation&gt;</code>, currently must have
           the value <code class="literal">org.apache.uima.java</code> or
           <code class="literal">org.apache.uima.cpp</code>. In future versions, there may be
           other framework implementations, or perhaps implementations produced by other
           vendors.</p>

         <p>The second subelement, <code class="literal">&lt;primitive&gt;</code>, contains
           the Boolean value <code class="literal">true</code>, indicating that this XML document
           describes a <span class="emphasis"><em>Primitive</em></span> Analysis Engine.</p>

         <p>The next subelement,
           <code class="literal">&lt;annotatorImplementationName&gt;</code>, is how the UIMA framework
           determines which annotator class to use. This should contain a fully-qualified
           Java class name for Java implementations, or the name of a .dll or .so file for C++
           implementations.</p>

         <p>The <code class="literal">&lt;analysisEngineMetaData&gt;</code> element contains
           descriptive information about the analysis engine and what it does. It is
           described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.metadata" title="2.4.1.2.&nbsp;Analysis Engine MetaData">Section&nbsp;2.4.1.2, &#8220;Analysis Engine MetaData&#8221;</a>.</p>

         <p>The <code class="literal">&lt;externalResourceDependencies&gt;</code> and
           <code class="literal">&lt;resourceManagerConfiguration&gt;</code> elements declare
           the external resource files that the analysis engine relies
           upon. They are optional and are described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies" title="2.4.1.8.&nbsp;External Resource Dependencies">Section&nbsp;2.4.1.8, &#8220;External Resource Dependencies&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>.</p>
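
         <p>As a rough illustration, the required top-level elements of the skeleton above might
           be filled in as follows. The class name
           <code class="literal">org.myorg.TokenAnnotator</code> and the engine name are
           hypothetical, and the remaining metadata is elided just as in the skeleton.</p>

         <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;analysisEngineDescription
        xmlns="http://uima.apache.org/resourceSpecifier"&gt;
  &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;

  &lt;primitive&gt;true&lt;/primitive&gt;
  &lt;!-- Hypothetical fully-qualified Java class name of the annotator --&gt;
  &lt;annotatorImplementationName&gt;org.myorg.TokenAnnotator&lt;/annotatorImplementationName&gt;

  &lt;analysisEngineMetaData&gt;
    &lt;name&gt;Token Annotator&lt;/name&gt;
    ...
  &lt;/analysisEngineMetaData&gt;

&lt;/analysisEngineDescription&gt;</pre>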

         </div>

         <div class="section" title="2.4.1.2.&nbsp;Analysis Engine MetaData"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.metadata">2.4.1.2.&nbsp;Analysis Engine MetaData</h4></div></div></div>


           <pre class="programlisting">&lt;analysisEngineMetaData&gt;
   &lt;name&gt; [String] &lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;configurationParameters&gt; ...  &lt;/configurationParameters&gt;

   &lt;configurationParameterSettings&gt;
     ...
   &lt;/configurationParameterSettings&gt;

   &lt;typeSystemDescription&gt; ... &lt;/typeSystemDescription&gt;

   &lt;typePriorities&gt; ... &lt;/typePriorities&gt;

   &lt;fsIndexCollection&gt; ... &lt;/fsIndexCollection&gt;

   &lt;capabilities&gt; ... &lt;/capabilities&gt;

   &lt;operationalProperties&gt; ... &lt;/operationalProperties&gt;

 &lt;/analysisEngineMetaData&gt;</pre>

           <p>The <code class="literal">analysisEngineMetaData</code> element contains four
             simple string fields &#8211; <code class="literal">name</code>,
             <code class="literal">description</code>, <code class="literal">version</code>, and
             <code class="literal">vendor</code>. Only the <code class="literal">name</code> field is
             required, but providing values for the other fields is recommended. The
             <code class="literal">name</code> field is just a descriptive name meant to be read by
             users; it does not need to be unique across all Analysis Engines.</p>

           <p>Configuration parameters are described in
             <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.configuration_parameters" title="2.4.3.&nbsp;Configuration Parameters">Section&nbsp;2.4.3, &#8220;Configuration Parameters&#8221;</a>.</p>

           <p>The other sub-elements,
             <code class="literal">typeSystemDescription</code>,
             <code class="literal">typePriorities</code>, <code class="literal">fsIndexCollection</code>,
             <code class="literal">capabilities</code>, and
             <code class="literal">operationalProperties</code>, are described in the following
             sections. The only one of these that is required is
             <code class="literal">capabilities</code>; the others are optional.</p>

         </div>

           <div class="section" title="2.4.1.3.&nbsp;Type System Definition"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.type_system">2.4.1.3.&nbsp;Type System Definition</h4></div></div></div>


             <pre class="programlisting">&lt;typeSystemDescription&gt;

   &lt;name&gt; [String] &lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;imports&gt;
     &lt;import ...&gt;
     ...
   &lt;/imports&gt;

   &lt;types&gt;
     &lt;typeDescription&gt;
       ...
     &lt;/typeDescription&gt;

     ...

   &lt;/types&gt;

 &lt;/typeSystemDescription&gt;</pre>

             <p>A <code class="literal">typeSystemDescription</code> element defines a type
               system for an Analysis Engine. The syntax for the element is described in <a class="xref" href="#ugr.ref.xml.component_descriptor.type_system" title="2.3.&nbsp;Type System Descriptors">Section&nbsp;2.3, &#8220;Type System Descriptors&#8221;</a>.</p>

             <p>The recommended usage is to <code class="literal">import</code> an external type
               system, using the import syntax described in <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a>
               of this chapter. For example:


               </p><pre class="programlisting">&lt;typeSystemDescription&gt;
   &lt;imports&gt;
     &lt;import location="MySharedTypeSystem.xml"/&gt;
   &lt;/imports&gt;
 &lt;/typeSystemDescription&gt;</pre>

             <p>This allows several AEs to share a single type system definition. The file
               <code class="literal">MySharedTypeSystem.xml</code> would then contain the full
               type system information, including the <code class="literal">name</code>,
               <code class="literal">description</code>, <code class="literal">vendor</code>,
               <code class="literal">version</code>, and <code class="literal">types</code>.</p>

           </div>
           <div class="section" title="2.4.1.4.&nbsp;Type Priority Definition"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.type_priority">2.4.1.4.&nbsp;Type Priority Definition</h4></div></div></div>


             <pre class="programlisting">&lt;typePriorities&gt;
   &lt;name&gt; [String] &lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;imports&gt;
     &lt;import ...&gt;
     ...
   &lt;/imports&gt;

   &lt;priorityLists&gt;
     &lt;priorityList&gt;
       &lt;type&gt;[TypeName]&lt;/type&gt;
       &lt;type&gt;[TypeName]&lt;/type&gt;
         ...
     &lt;/priorityList&gt;

     ...

   &lt;/priorityLists&gt;
 &lt;/typePriorities&gt;</pre>

             <p>The <code class="literal">&lt;typePriorities&gt;</code> element contains
               zero or more <code class="literal">&lt;priorityList&gt;</code> elements; each
               <code class="literal">&lt;priorityList&gt;</code> contains zero or more types.
               Like a type system, a type priorities definition may also declare a name,
               description, version, and vendor, and may import other type priorities. See
                 <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a> for the import syntax.</p>

             <p>Type priority is used when iterating over feature structures in the CAS.
               For example, if the CAS contains a <code class="literal">Sentence</code> annotation
               and a <code class="literal">Paragraph</code> annotation with the same span of text
               (i.e. a one-sentence paragraph), which annotation should be returned first
               by an iterator? Probably the Paragraph, since it is conceptually
               <span class="quote">&#8220;<span class="quote">bigger,</span>&#8221;</span> but the framework does not know that and must be
               explicitly told that the Paragraph annotation has priority over the Sentence
               annotation, like this:


               </p><pre class="programlisting">&lt;typePriorities&gt;
   &lt;priorityList&gt;
     &lt;type&gt;org.myorg.Paragraph&lt;/type&gt;
     &lt;type&gt;org.myorg.Sentence&lt;/type&gt;
   &lt;/priorityList&gt;
 &lt;/typePriorities&gt;</pre>

             <p>All of the <code class="literal">&lt;priorityList&gt;</code> elements defined
               in the descriptor (and in all component descriptors of an aggregate analysis
               engine descriptor) are merged to produce a single priority list.</p>

             <p>Subtypes of types specified here are also ordered, unless overridden by
               another user-specified type ordering. For example, if you specify type A
               comes before type B, then subtypes of A will come before subtypes of B, unless
               there is an overriding specification which declares some subtype of B comes
               before some subtype of A.</p>

             <p>If there are inconsistencies between priority lists (type A declared
               before type B in one priority list, and type B declared before type A in
               another), the framework will throw an exception.</p>

             <p>User-defined indexes may declare whether or not they use the type priority;
               see the next section.</p>
           </div>

           <div class="section" title="2.4.1.5.&nbsp;Index Definition"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.index">2.4.1.5.&nbsp;Index Definition</h4></div></div></div>


             <pre class="programlisting">&lt;fsIndexCollection&gt;

   &lt;name&gt;[String]&lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;imports&gt;
     &lt;import ...&gt;
     ...
   &lt;/imports&gt;

   &lt;fsIndexes&gt;

     &lt;fsIndexDescription&gt;
       ...
     &lt;/fsIndexDescription&gt;

     &lt;fsIndexDescription&gt;
       ...
     &lt;/fsIndexDescription&gt;

   &lt;/fsIndexes&gt;

 &lt;/fsIndexCollection&gt;</pre>

             <p>The <code class="literal">fsIndexCollection</code> element declares <span class="emphasis"><em>Feature Structure
               Indexes</em></span>, each of which defines an index that holds feature structures of a given type.
               Information in the CAS is always accessed through an index. There is a built-in default annotation
               index declared which can be used to access instances of type
               <code class="literal">uima.tcas.Annotation</code> (or its subtypes), sorted based on their
               <code class="literal">begin</code> and <code class="literal">end</code> features, and the type priority ordering (if specified).
               For all other types, there is a
               default, unsorted (bag) index. If there is a need for a specialized index it must be declared in this
               element of the descriptor. See <a href="references.html#ugr.ref.cas.indexes_and_iterators" class="olink">Section&nbsp;4.7, &#8220;Indexes and Iterators&#8221;</a> for details on FS indexes.</p>

             <p>Like type systems and type priorities, an
               <code class="literal">fsIndexCollection</code> can declare a
               <code class="literal">name</code>, <code class="literal">description</code>,
               <code class="literal">vendor</code>, and <code class="literal">version</code>, and may
               import other <code class="literal">fsIndexCollection</code>s. The import syntax is
               described in <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a>.</p>

             <p>An <code class="literal">fsIndexCollection</code> may also define zero or more
               <code class="literal">fsIndexDescription</code> elements, each of which defines a
               single index. Each <code class="literal">fsIndexDescription</code> has the form:


               </p><pre class="programlisting">&lt;fsIndexDescription&gt;

   &lt;label&gt;[String]&lt;/label&gt;
   &lt;typeName&gt;[TypeName]&lt;/typeName&gt;
   &lt;kind&gt;sorted|bag|set&lt;/kind&gt;

   &lt;keys&gt;

     &lt;fsIndexKey&gt;
       &lt;featureName&gt;[Name]&lt;/featureName&gt;
       &lt;comparator&gt;standard|reverse&lt;/comparator&gt;
     &lt;/fsIndexKey&gt;

     &lt;fsIndexKey&gt;
       &lt;typePriority/&gt;
     &lt;/fsIndexKey&gt;

     ...

   &lt;/keys&gt;
 &lt;/fsIndexDescription&gt;</pre>

             <p>The <code class="literal">label</code> element defines the name by which
               applications and annotators refer to this index. The
               <code class="literal">typeName</code> element contains the name of the type that will
               be contained in this index. This must match one of the type names defined in the
               <code class="literal">&lt;typeSystemDescription&gt;</code>.</p>

             <p>There are three possible values for the
               <code class="literal">&lt;kind&gt;</code> of index. Sorted indexes enforce an
               ordering of feature structures, based on defined keys.  Bag indexes do
               not enforce ordering, and have no defined keys. Set indexes do not
               enforce ordering, but use defined keys to specify equivalence classes;
               addToIndexes will not add a Feature Structure to a set index if its keys
               match those of an entry of the same type already in the index.
               If the <code class="literal">&lt;kind&gt;</code> element is omitted, it will default to
               sorted, which is the most common type of index.</p>

             <p>Prior to version 2.7.0, the bag and sorted indexes stored duplicate entries for the
             same FS if it was added to the indexes multiple times. As of version 2.7.0, a
             second or subsequent add-to-index operation has no effect. As a consequence,
             a remove operation now guarantees that the particular FS is removed, whereas
             previously it could only be said that one of perhaps many duplicate entries was removed.
             Since sending to remote annotators only adds entries to indexes at most once, this
             behavior is consistent with that.</p>

             <p>Note that even after this change, bag and set indexes still differ in meaning.
             The set index uses equal defined key values plus the type of the Feature Structure to determine equivalence classes for Feature Structures, and
             will not add a Feature Structure if it has the same type and equal key values as an entry already in the index.</p>
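
             <p>To make the set-index behavior concrete, here is a sketch of a set index over a
               hypothetical type <code class="literal">org.myorg.Entity</code> with a hypothetical
               feature <code class="literal">entityId</code>; the label is likewise made up. Under
               the rule above, adding a second <code class="literal">org.myorg.Entity</code> whose
               <code class="literal">entityId</code> equals that of an entry already in the index
               would leave the index unchanged.</p>

             <pre class="programlisting">&lt;fsIndexDescription&gt;
  &lt;label&gt;EntityById&lt;/label&gt;
  &lt;typeName&gt;org.myorg.Entity&lt;/typeName&gt;
  &lt;kind&gt;set&lt;/kind&gt;
  &lt;keys&gt;
    &lt;!-- Entries of the same type with equal entityId values are treated as equivalent --&gt;
    &lt;fsIndexKey&gt;
      &lt;featureName&gt;entityId&lt;/featureName&gt;
      &lt;comparator&gt;standard&lt;/comparator&gt;
    &lt;/fsIndexKey&gt;
  &lt;/keys&gt;
&lt;/fsIndexDescription&gt;</pre>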

             <p>It is possible, however, that users may be depending on having multiple instances of
             the identical FeatureStructure in the indexes. Therefore, UIMA provides
              a JVM property,
             <code class="literal">uima.allow_duplicate_add_to_indexes</code>, which (if defined when UIMA is loaded) restores the previous behavior.</p>

             <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If duplicates are allowed, then the proper way to update an indexed Feature Structure is to
               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>remove <span class="bold"><strong>*all*</strong></span> instances of the FS to be
                   updated </p></li><li class="listitem"><p>update the features</p></li><li class="listitem"><p>re-add the Feature Structure to the indexes (perhaps multiple times, depending on the
                 details of your logic).</p></li></ul></div></div>

             <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>There is usually no need to explicitly declare a Bag index in your descriptor.
               As of UIMA v2.1, if you do not declare any index for a type (or any of its
               supertypes), a Bag index will be automatically created if an instance of that type is added to the indexes.</p></div>

             <p>A Sorted or Set index may define zero or more <span class="emphasis"><em>keys</em></span>. These keys
               determine the sort order of the feature structures within a sorted index, and
               partially determine equality for set indexes (the equality measure always includes testing that the types are the same).
               Bag indexes do not use keys, and
 			  equality is determined by Feature Structure identity (that is, two elements
 			  are considered equal if and only if they are exactly the same feature structure,
 			  located in the same place in the CAS). Keys are
               ordered by precedence &#8211; the first key is evaluated first, and
               subsequent keys are evaluated only if necessary.</p>

             <p>Each key is represented by an <code class="literal">fsIndexKey</code> element.
               Most <code class="literal">fsIndexKey</code> elements contain a
               <code class="literal">featureName</code> and a <code class="literal">comparator</code>.
               The <code class="literal">featureName</code> must match the name of one of the
               features for the type specified in the
               <code class="literal">&lt;typeName&gt;</code> element for this index. The
               comparator defines how the features will be compared &#8211; a value of
               <code class="literal">standard</code> means that features will be compared using the
               standard comparison for their data type (e.g. for numerical types, smaller
               values precede larger values, and for string types, Unicode string
               comparison is performed). A value of <code class="literal">reverse</code> means that
               features will be compared using the reverse of the standard comparison (e.g.
               for numerical types, larger values precede smaller values, etc.). For Set
               indexes, the comparator direction is ignored &#8211; the keys are only used
               for the equality testing.</p>

             <p>Each key used in comparisons must refer to a feature whose range type is
               Boolean, Byte, Short, Integer, Long, Float, Double, or String.
               </p>

             <p>There is a second type of key, one that contains only a
               <code class="literal">&lt;typePriority/&gt;</code> element. When this key is used, it
               indicates that Feature Structures will be compared using the type priorities
               declared in the <code class="literal">&lt;typePriorities&gt;</code> section of the
               descriptor.</p>
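
             <p>As an illustrative sketch of the two kinds of key described above, the following sorted index
               orders Feature Structures by ascending <code class="literal">begin</code>, then by descending
               <code class="literal">end</code>, and finally by type priority. The label
               <code class="literal">TokenIndex</code> and the type name are hypothetical, the type is assumed to have
               Integer-valued <code class="literal">begin</code> and <code class="literal">end</code> features, and the
               enclosing <code class="literal">label</code>, <code class="literal">typeName</code>,
               <code class="literal">kind</code> and <code class="literal">keys</code> elements are assumed to follow
               the index definition format given earlier in this section.</p>

             <pre class="programlisting">&lt;fsIndex&gt;
   &lt;label&gt;TokenIndex&lt;/label&gt;
   &lt;typeName&gt;org.myorg.TokenAnnotation&lt;/typeName&gt;
   &lt;kind&gt;sorted&lt;/kind&gt;
   &lt;keys&gt;
     &lt;!-- first key: smaller begin values come first --&gt;
     &lt;fsIndexKey&gt;
       &lt;featureName&gt;begin&lt;/featureName&gt;
       &lt;comparator&gt;standard&lt;/comparator&gt;
     &lt;/fsIndexKey&gt;
     &lt;!-- second key, used only when begin is equal: larger end values come first --&gt;
     &lt;fsIndexKey&gt;
       &lt;featureName&gt;end&lt;/featureName&gt;
       &lt;comparator&gt;reverse&lt;/comparator&gt;
     &lt;/fsIndexKey&gt;
     &lt;!-- last key: fall back to the declared type priorities --&gt;
     &lt;fsIndexKey&gt;
       &lt;typePriority/&gt;
     &lt;/fsIndexKey&gt;
   &lt;/keys&gt;
 &lt;/fsIndex&gt;</pre>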

           </div>

           <div class="section" title="2.4.1.6.&nbsp;Capabilities"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.capabilities">2.4.1.6.&nbsp;Capabilities</h4></div></div></div>


             <pre class="programlisting">&lt;capabilities&gt;
   &lt;capability&gt;

     &lt;inputs&gt;
       &lt;type allAnnotatorFeatures="true|false"&gt;[TypeName]&lt;/type&gt;
       ...
       &lt;feature&gt;[TypeName]:[Name]&lt;/feature&gt;
       ...
     &lt;/inputs&gt;

     &lt;outputs&gt;
       &lt;type allAnnotatorFeatures="true|false"&gt;[TypeName]&lt;/type&gt;
       ...
       &lt;feature&gt;[TypeName]:[Name]&lt;/feature&gt;
       ...
     &lt;/outputs&gt;

     &lt;inputSofas&gt;
       &lt;sofaName&gt;[name]&lt;/sofaName&gt;
       ...
     &lt;/inputSofas&gt;

     &lt;outputSofas&gt;
       &lt;sofaName&gt;[name]&lt;/sofaName&gt;
       ...
     &lt;/outputSofas&gt;

     &lt;languagesSupported&gt;
       &lt;language&gt;[ISO Language ID]&lt;/language&gt;
         ...
     &lt;/languagesSupported&gt;
   &lt;/capability&gt;

   &lt;capability&gt;
     ...
   &lt;/capability&gt;

   ...

 &lt;/capabilities&gt;</pre>

             <p>The capabilities definition is used by the UIMA Framework in several
               ways, including setting up the Results Specification for process calls,
               routing control for aggregates based on language, and as part of the Sofa
               mapping function.</p>

             <p>The <code class="literal">capabilities</code> element contains one or more
               <code class="literal">capability</code> elements. In Version 2 and onwards, only one
               capability set should be used (multiple sets will continue to work for a while,
               but they are not supported in a logically consistent way).
               </p>

             <p>Each <code class="literal">capability</code> contains
               <code class="literal">inputs</code>, <code class="literal">outputs</code>,
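               <code class="literal">languagesSupported</code>, <code class="literal">inputSofas</code>, and <code class="literal">outputSofas</code>.
               The <code class="literal">&lt;inputs&gt;</code> and <code class="literal">&lt;outputs&gt;</code> elements are required (though they may be empty);
               <code class="literal">&lt;languagesSupported&gt;</code>, <code class="literal">&lt;inputSofas&gt;</code>,
               and <code class="literal">&lt;outputSofas&gt;</code> are optional.</p>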

             <p>Both inputs and outputs may contain a mixture of type and feature
               elements.</p>

             <p><code class="literal">&lt;type...&gt;</code> elements contain the name of one
               of the types defined in the type system or one of the built in types. Declaring a
               type as an input means that this component expects instances of this type to be
               in the CAS when it receives it to process. Declaring a type as an output means
               that this component creates new instances of this type in the CAS.</p>

             <p>There is an optional attribute
               <code class="literal">allAnnotatorFeatures</code>, which defaults to false if
               omitted. The Component Descriptor Editor tool defaults this to true when a new
               type is added to the list of inputs and/or outputs. When this attribute is true,
               it specifies that all of the type's features are also declared as input or
               output. Otherwise, the features that are required as inputs or populated as
               outputs must be explicitly specified in feature elements.</p>

             <p><code class="literal">&lt;feature...&gt;</code> elements contain the
               <span class="quote">&#8220;<span class="quote">fully-qualified</span>&#8221;</span> feature name, which is the type name
               followed by a colon, followed by the feature name, e.g.
               <code class="literal">org.myorg.TokenAnnotation:lemma</code>.
               <code class="literal">&lt;feature...&gt;</code> elements in the
               <code class="literal">&lt;inputs&gt;</code> section must also have a corresponding
               type declared as an input. In output sections, this is not required. If the type
               is not specified as an output, but a feature for that type is, this means that
               existing instances of the type have the values of the specified features
               updated. Any type mentioned in a <code class="literal">&lt;feature&gt;</code>
               element must be either specified as an input or an output or both.</p>
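
             <p>A minimal sketch of these rules, assuming a hypothetical lemmatizer: it expects
               <code class="literal">org.myorg.TokenAnnotation</code> instances to be present in the CAS and
               fills in their <code class="literal">lemma</code> feature. Because the type is listed only as an
               input, the output feature declaration says that existing instances are updated rather than new ones
               created. The choice of <code class="literal">en</code> as the supported language is also only an assumption for
               illustration.</p>

             <pre class="programlisting">&lt;capabilities&gt;
   &lt;capability&gt;
     &lt;inputs&gt;
       &lt;!-- allAnnotatorFeatures is omitted, so it defaults to false --&gt;
       &lt;type&gt;org.myorg.TokenAnnotation&lt;/type&gt;
     &lt;/inputs&gt;
     &lt;outputs&gt;
       &lt;!-- the type itself is not an output: existing tokens are updated --&gt;
       &lt;feature&gt;org.myorg.TokenAnnotation:lemma&lt;/feature&gt;
     &lt;/outputs&gt;
     &lt;languagesSupported&gt;
       &lt;language&gt;en&lt;/language&gt;
     &lt;/languagesSupported&gt;
   &lt;/capability&gt;
 &lt;/capabilities&gt;</pre>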

             <p><code class="literal">&lt;language&gt;</code> elements contain one of the ISO language
               identifiers, such as <code class="literal">en</code> for English, or
               <code class="literal">en-US</code> for the United States dialect of English.</p>

             <p>The list of language codes can be found here: <a class="ulink" href="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt" target="_top">http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt</a>
               and the country codes here:
               <a class="ulink" href="http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html" target="_top">http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html</a>
               </p>

             <p><code class="literal">&lt;inputSofas&gt;</code> and
               <code class="literal">&lt;outputSofas&gt;</code> declare sofa names used by this
               component. All Sofa names must be unique within a particular capability set. A
               Sofa name must be an input or an output, and cannot be both. It is an error to have a
               Sofa name declared as an input in one capability set, and also have it declared
               as an output in another capability set.</p>

             <p>A <code class="literal">&lt;sofaName&gt;</code> is written as a simple
               Java-style identifier, without any periods in the name, except that it may be
               written to end in <span class="quote">&#8220;<span class="quote"><code class="literal">.*</code></span>&#8221;</span>. If written in this
               manner, it specifies a set of Sofa names, all of which start with the base name
               (the part before the .*) followed by a period and then an arbitrary Java
               identifier (without periods). This form is used to specify in the descriptor
               that the component could generate an arbitrary number of Sofas, the exact
               names and numbers of which are unknown before the component is run.</p>
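
             <p>The following sketch shows both forms of Sofa name. The names
               <code class="literal">source</code> and <code class="literal">translation</code> are hypothetical.
               Under the rules above, the second declaration would cover output Sofas such as
               <code class="literal">translation.de</code> or <code class="literal">translation.fr</code>,
               whose exact names are not known until the component runs.</p>

             <pre class="programlisting">&lt;capability&gt;
   &lt;!-- inputs and outputs are required, but may be empty --&gt;
   &lt;inputs/&gt;
   &lt;outputs/&gt;
   &lt;inputSofas&gt;
     &lt;sofaName&gt;source&lt;/sofaName&gt;
   &lt;/inputSofas&gt;
   &lt;outputSofas&gt;
     &lt;!-- base name, a period, then any identifier without periods --&gt;
     &lt;sofaName&gt;translation.*&lt;/sofaName&gt;
   &lt;/outputSofas&gt;
 &lt;/capability&gt;</pre>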

           </div>

           <div class="section" title="2.4.1.7.&nbsp;OperationalProperties"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.operational_properties">2.4.1.7.&nbsp;OperationalProperties</h4></div></div></div>


             <p>Components can declare operational properties that are
               useful in deployment. The following are available:</p>


             <pre class="programlisting">&lt;operationalProperties&gt;
   &lt;modifiesCas&gt; true|false &lt;/modifiesCas&gt;
   &lt;multipleDeploymentAllowed&gt; true|false &lt;/multipleDeploymentAllowed&gt;
   &lt;outputsNewCASes&gt; true|false &lt;/outputsNewCASes&gt;
 &lt;/operationalProperties&gt;</pre>

             <p><code class="literal">modifiesCas</code>, if false, indicates that this
               component does not modify the CAS. If it is not specified, the default value is
               true except for CAS Consumer components.</p>

             <p><code class="literal">multipleDeploymentAllowed</code>, if true, allows the
               component to be deployed multiple times to increase performance through
               scale-out techniques. If it is not specified, the default value is true,
               except for CAS Consumer and Collection Reader components.</p>

             <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you wrap one or more CAS Consumers inside an aggregate as the only
             components, you must explicitly specify in the aggregate the
             <code class="literal">multipleDeploymentAllowed</code> property as false (assuming the CAS Consumer
             components take the default here); otherwise the framework will complain about inconsistent
             settings for these.</p></div>

             <p><code class="literal">outputsNewCASes</code>, if true, allows the component to
               create new CASes during processing, for example to break a large artifact into
               smaller pieces. See <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.cm" class="olink">Chapter&nbsp;7, <i>CAS Multiplier Developer's Guide</i></a> for details.</p>
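
             <p>As a sketch, a hypothetical component that splits a large artifact into smaller
               pieces might declare its properties as follows. All three values are written out
               explicitly here instead of relying on defaults; this is an illustrative choice, not a requirement stated above.</p>

             <pre class="programlisting">&lt;operationalProperties&gt;
   &lt;!-- the component writes to the CAS it receives --&gt;
   &lt;modifiesCas&gt;true&lt;/modifiesCas&gt;
   &lt;!-- several instances may be deployed for scale-out --&gt;
   &lt;multipleDeploymentAllowed&gt;true&lt;/multipleDeploymentAllowed&gt;
   &lt;!-- the component creates new CASes during processing --&gt;
   &lt;outputsNewCASes&gt;true&lt;/outputsNewCASes&gt;
 &lt;/operationalProperties&gt;</pre>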
           </div>

           <div class="section" title="2.4.1.8.&nbsp;External Resource Dependencies"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies">2.4.1.8.&nbsp;External Resource Dependencies</h4></div></div></div>


             <pre class="programlisting">&lt;externalResourceDependencies&gt;
   &lt;externalResourceDependency&gt;
     &lt;key&gt;[String]&lt;/key&gt;
     &lt;description&gt;[String] &lt;/description&gt;
     &lt;interfaceName&gt;[String]&lt;/interfaceName&gt;
     &lt;optional&gt;true|false&lt;/optional&gt;
   &lt;/externalResourceDependency&gt;

   &lt;externalResourceDependency&gt;
     ...
   &lt;/externalResourceDependency&gt;

   ...

 &lt;/externalResourceDependencies&gt;</pre>

             <p>A primitive annotator may declare zero or more
               <code class="literal">&lt;externalResourceDependency&gt;</code> elements. Each
               dependency has the following elements:

               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">key</code> &#8211; the
                 string by which the annotator code will attempt to access the resource. Must
                 be unique within this annotator.</p></li><li class="listitem"><p><code class="literal">description</code> &#8211; a textual
                   description of the dependency.</p></li><li class="listitem"><p><code class="literal">interfaceName</code> &#8211; the
                   fully-qualified name of the Java interface through which the annotator
                   will access the data. This is optional. If not specified, the annotator
                   can only get an InputStream to the data.</p></li><li class="listitem"><p><code class="literal">optional</code> &#8211; whether the
                   resource is optional. If false, an exception will be thrown if no resource
                   is assigned to satisfy this dependency. Defaults to false. </p>
                   </li></ul></div>
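
             <p>A minimal sketch of a single required dependency. The key
               <code class="literal">TokenDictionary</code> and the interface name
               <code class="literal">org.myorg.mycomponent.Dictionary</code> are hypothetical names chosen
               for illustration.</p>

             <pre class="programlisting">&lt;externalResourceDependencies&gt;
   &lt;externalResourceDependency&gt;
     &lt;!-- the annotator code looks the resource up by this key --&gt;
     &lt;key&gt;TokenDictionary&lt;/key&gt;
     &lt;description&gt;Dictionary used to look up tokens.&lt;/description&gt;
     &lt;!-- optional: without it, only an InputStream is available --&gt;
     &lt;interfaceName&gt;org.myorg.mycomponent.Dictionary&lt;/interfaceName&gt;
     &lt;!-- false: an exception is thrown if nothing is bound to the key --&gt;
     &lt;optional&gt;false&lt;/optional&gt;
   &lt;/externalResourceDependency&gt;
 &lt;/externalResourceDependencies&gt;</pre>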

           </div>

           <div class="section" title="2.4.1.9.&nbsp;Resource Manager Configuration"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration">2.4.1.9.&nbsp;Resource Manager Configuration</h4></div></div></div>


             <pre class="programlisting">&lt;resourceManagerConfiguration&gt;

   &lt;name&gt;[String]&lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;imports&gt;
     &lt;import ...&gt;
     ...
   &lt;/imports&gt;

   &lt;externalResources&gt;

     &lt;externalResource&gt;
       &lt;name&gt;[String]&lt;/name&gt;
       &lt;description&gt;[String]&lt;/description&gt;
       &lt;fileResourceSpecifier&gt;
         &lt;fileUrl&gt;[URL]&lt;/fileUrl&gt;
       &lt;/fileResourceSpecifier&gt;
       &lt;implementationName&gt;[String]&lt;/implementationName&gt;
     &lt;/externalResource&gt;
     ...
   &lt;/externalResources&gt;

   &lt;externalResourceBindings&gt;
     &lt;externalResourceBinding&gt;
       &lt;key&gt;[String]&lt;/key&gt;
       &lt;resourceName&gt;[String]&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;
     ...
   &lt;/externalResourceBindings&gt;

 &lt;/resourceManagerConfiguration&gt;</pre>

             <p>This element declares external resources and binds them to
               annotators' external resource dependencies.</p>

             <p>The <code class="literal">resourceManagerConfiguration</code> element may
               optionally contain an <code class="literal">import</code>, which allows resource
               definitions to be stored in a separate (shareable) file. See <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a> for details.</p>

             <p>The <code class="literal">externalResources</code> element contains zero or
               more <code class="literal">externalResource</code> elements, each of which
               consists of:

               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">name</code> &#8211; the
                 name of the resource. This name is referred to in the bindings (see below).
                 Resource names need to be unique within any Aggregate Analysis Engine or
                 Collection Processing Engine, so the Java-like
                 <code class="literal">org.myorg.mycomponent.MyResource</code> syntax is
                 recommended.</p></li><li class="listitem"><p><code class="literal">description</code> &#8211; English
                   description of the resource.</p></li><li class="listitem"><p>Resource Specifier &#8211;
                   Declares the location of the resource. There are different
                   possibilities for how this is done (see below).</p></li><li class="listitem"><p><code class="literal">implementationName</code> &#8211; The
                   fully-qualified name of the Java class that will be instantiated from the
                   resource data. This is optional; if not specified, the resource will be
                   accessible as an input stream to the raw data. If specified, the Java class
                   must implement the <code class="literal">interfaceName</code> that is
                   specified in the External Resource Dependency to which it is bound.
                   </p></li></ul></div>

             <p>One possibility for the resource specifier is a
               <code class="literal">&lt;fileResourceSpecifier&gt;</code>, as shown above. This
               simply declares a URL to the resource data. This support is built on the Java
               class URL and its method URL.openStream(); it supports the protocols
               <span class="quote">&#8220;<span class="quote">file</span>&#8221;</span>, <span class="quote">&#8220;<span class="quote">http</span>&#8221;</span> and <span class="quote">&#8220;<span class="quote">jar</span>&#8221;</span> (for
               referring to files in jars) by default, and you can plug in handlers for other
               protocols. The URL has to start with file: (or some other protocol). It is
               relative to either the classpath or the <span class="quote">&#8220;<span class="quote">data path</span>&#8221;</span>. The data
               path works like the classpath but can be set programmatically via
               <code class="literal">ResourceManager.setDataPath()</code>. Setting the Java
               System property <code class="literal">uima.datapath</code> also works.</p>

             <p><code class="literal">file:com/apache.d.txt</code> is a relative path;
               relative paths for resources are resolved using the classpath and/or the
               datapath. For the file protocol, URLs starting with file:/ or file:/// are
               absolute. Note that <code class="literal">file://org/apache/d.txt</code> is NOT an
               absolute path starting with <span class="quote">&#8220;<span class="quote">org</span>&#8221;</span>. The <span class="quote">&#8220;<span class="quote">//</span>&#8221;</span>
               indicates that what follows is a host name. Therefore, an attempt to use this URL
               fails with a complaint that it cannot connect to the host <span class="quote">&#8220;<span class="quote">org</span>&#8221;</span>.
               </p>
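
             <p>The three cases can be put side by side as follows. The paths are hypothetical and
               serve only to contrast the URL forms described above.</p>

             <pre class="programlisting">&lt;!-- Relative: resolved using the classpath and/or the data path --&gt;
 &lt;fileResourceSpecifier&gt;
   &lt;fileUrl&gt;file:org/myorg/dict.txt&lt;/fileUrl&gt;
 &lt;/fileResourceSpecifier&gt;

 &lt;!-- Absolute: starts with file:/ or file:/// --&gt;
 &lt;fileResourceSpecifier&gt;
   &lt;fileUrl&gt;file:///opt/myorg/dict.txt&lt;/fileUrl&gt;
 &lt;/fileResourceSpecifier&gt;

 &lt;!-- Probably a mistake: "org" is read as a host name, not a directory --&gt;
 &lt;fileResourceSpecifier&gt;
   &lt;fileUrl&gt;file://org/myorg/dict.txt&lt;/fileUrl&gt;
 &lt;/fileResourceSpecifier&gt;</pre>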

 			<p>The URL value may contain references to external override variables using the
  		      <code class="literal">${variable-name}</code> syntax,
 			  e.g. <code class="literal">file:com/${dictUrl}.txt</code>.
 			  If a variable is undefined the value is left unmodified and a warning message
  		      identifies the missing variable.
 			  </p>

             <p>Another option is a
               <code class="literal">&lt;fileLanguageResourceSpecifier&gt;</code>, which is
               intended to support resources, such as dictionaries, that depend on the
               language of the document being processed. Instead of a single URL, a prefix and
               suffix are specified, like this:


               </p><pre class="programlisting">&lt;fileLanguageResourceSpecifier&gt;
   &lt;fileUrlPrefix&gt;file:FileLanguageResource_implTest_data_&lt;/fileUrlPrefix&gt;
   &lt;fileUrlSuffix&gt;.dat&lt;/fileUrlSuffix&gt;
 &lt;/fileLanguageResourceSpecifier&gt;</pre>

             <p>The URL of the actual resource is then formed by concatenating the prefix,
               the language of the document (as an ISO language code, e.g.
               <code class="literal">en</code> or <code class="literal">en-US</code>
               &#8211; see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.capabilities" title="2.4.1.6.&nbsp;Capabilities">Section&nbsp;2.4.1.6, &#8220;Capabilities&#8221;</a> for more
               information), and the suffix.</p>
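
             <p>As a worked example of the concatenation rule only: with the prefix and suffix shown
               above, a document whose language is <code class="literal">en</code> would use the
               same URL as this plain file resource specifier.</p>

             <pre class="programlisting">&lt;fileResourceSpecifier&gt;
   &lt;!-- prefix + "en" + suffix --&gt;
   &lt;fileUrl&gt;file:FileLanguageResource_implTest_data_en.dat&lt;/fileUrl&gt;
 &lt;/fileResourceSpecifier&gt;</pre>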

 		    <p>A third option is a <code class="literal">customResourceSpecifier</code>, which allows
 			  you to plug in an arbitrary Java class.  See <a class="xref" href="#ugr.ref.xml.component_descriptor.custom_resource_specifiers" title="2.8.&nbsp;Custom Resource Specifiers">Section&nbsp;2.8, &#8220;Custom Resource Specifiers&#8221;</a>
 			  for more information.</p>

             <p>The <code class="literal">externalResourceBindings</code> element declares
               which resources are bound to which dependencies. Each
               <code class="literal">externalResourceBinding</code> consists of:

               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">key</code> &#8211;
                 identifies the dependency. For a binding declared in a primitive analysis
                 engine descriptor, this must match the value of the
                 <code class="literal">key</code> element of one of the
                 <code class="literal">externalResourceDependency</code> elements. Bindings
                 may also be specified in aggregate analysis engine descriptors, in which
                 case a compound key is used
                 &#8211; see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings" title="2.4.2.4.&nbsp;External Resource Bindings">Section&nbsp;2.4.2.4, &#8220;External Resource Bindings&#8221;</a>.</p></li><li class="listitem"><p><code class="literal">resourceName</code> &#8211; the name of
                   the resource satisfying the dependency. This must match the value of the
                   <code class="literal">name</code> element of one of the
                   <code class="literal">externalResource</code> declarations. </p>
                   </li></ul></div>

             <p>A given resource dependency may only be bound to one external resource;
               one external resource may be bound to many dependencies &#8211; to allow
               resource sharing.</p>
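
             <p>Putting the pieces together, a sketch of a configuration that declares one resource and
               binds it to one dependency might look like this. The file URL, the implementation class
               <code class="literal">org.myorg.mycomponent.DictionaryImpl</code> and the key
               <code class="literal">TokenDictionary</code> are hypothetical; the key is assumed to be
               declared by an <code class="literal">externalResourceDependency</code> in the same
               primitive descriptor.</p>

             <pre class="programlisting">&lt;resourceManagerConfiguration&gt;

   &lt;externalResources&gt;
     &lt;externalResource&gt;
       &lt;name&gt;org.myorg.mycomponent.MyResource&lt;/name&gt;
       &lt;description&gt;Dictionary data loaded from a file.&lt;/description&gt;
       &lt;fileResourceSpecifier&gt;
         &lt;!-- relative URL: resolved using the classpath and/or data path --&gt;
         &lt;fileUrl&gt;file:org/myorg/dict.txt&lt;/fileUrl&gt;
       &lt;/fileResourceSpecifier&gt;
       &lt;!-- must implement the interfaceName of the bound dependency --&gt;
       &lt;implementationName&gt;org.myorg.mycomponent.DictionaryImpl&lt;/implementationName&gt;
     &lt;/externalResource&gt;
   &lt;/externalResources&gt;

   &lt;externalResourceBindings&gt;
     &lt;externalResourceBinding&gt;
       &lt;!-- matches the key of an externalResourceDependency --&gt;
       &lt;key&gt;TokenDictionary&lt;/key&gt;
       &lt;!-- matches the name of the externalResource above --&gt;
       &lt;resourceName&gt;org.myorg.mycomponent.MyResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;
   &lt;/externalResourceBindings&gt;

 &lt;/resourceManagerConfiguration&gt;</pre>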
           </div>

           <div class="section" title="2.4.1.10.&nbsp;Environment Variable References"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.environment_variable_references">2.4.1.10.&nbsp;Environment Variable References</h4></div></div></div>


             <p>In several places throughout the descriptor, it is possible to reference
               environment variables. In Java, these are actually references to Java system
               properties. To reference system environment variables from a Java analysis
               engine you must pass the environment variables into the Java virtual machine
               by using the <code class="literal">-D</code> option on the <code class="literal">java</code>
               command line.</p>

             <p>The syntax for environment variable references is
               <code class="literal">&lt;envVarRef&gt;[VariableName]&lt;/envVarRef&gt;</code>,
               where [VariableName] is any valid Java system property name. Environment
               variable references are valid in the following places:

               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>The value of a
                 configuration parameter (String-valued parameters only)</p>
                 </li><li class="listitem"><p>The
                   <code class="literal">&lt;annotatorImplementationName&gt;</code> element
                   of a primitive AE descriptor</p></li><li class="listitem"><p>The <code class="literal">&lt;name&gt;</code> element within
                   <code class="literal">&lt;analysisEngineMetaData&gt;</code></p>
                   </li><li class="listitem"><p>Within a
                   <code class="literal">&lt;fileResourceSpecifier&gt;</code> or
                   <code class="literal">&lt;fileLanguageResourceSpecifier&gt;</code>
                   </p></li></ul></div>

             <p>For example, if the value of a configuration parameter were specified as
               <code class="literal">&lt;string&gt;&lt;envVarRef&gt;TEMP_DIR&lt;/envVarRef&gt;/temp.dat&lt;/string&gt;</code>,
               and the value of the <code class="literal">TEMP_DIR</code> Java System property were
               <code class="literal">c:/temp</code>, then the configuration parameter's
               value would evaluate to <code class="literal">c:/temp/temp.dat</code>.</p>
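
             <p>In context, such a reference might appear in a parameter setting like the sketch below.
               The parameter name <code class="literal">TempFile</code> is hypothetical, and the surrounding
               <code class="literal">nameValuePair</code> structure is assumed to follow the configuration
               parameter settings format described elsewhere in this chapter. The property itself would be
               supplied when starting the JVM, for example with <code class="literal">-DTEMP_DIR=c:/temp</code>.</p>

             <pre class="programlisting">&lt;nameValuePair&gt;
   &lt;name&gt;TempFile&lt;/name&gt;
   &lt;value&gt;
     &lt;!-- envVarRef is replaced by the TEMP_DIR Java system property --&gt;
     &lt;string&gt;&lt;envVarRef&gt;TEMP_DIR&lt;/envVarRef&gt;/temp.dat&lt;/string&gt;
   &lt;/value&gt;
 &lt;/nameValuePair&gt;</pre>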

             <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The Component Descriptor Editor does not support
               environment variable references.  If you need to, however, you
               can use the <code class="code">source</code> tab view in the CDE to manually
               add this notation.
               </p></div>

           </div>
         </div>
         <div class="section" title="2.4.2.&nbsp;Aggregate Analysis Engine Descriptors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate">2.4.2.&nbsp;Aggregate Analysis Engine Descriptors</h3></div></div></div>


           <p>Aggregate Analysis Engines do not contain an annotator, but instead
             contain one or more component (also called <span class="emphasis"><em>delegate</em></span>)
             analysis engines.</p>

           <p>Aggregate Analysis Engine Descriptors maintain most of the same structure
             as Primitive Analysis Engine Descriptors. The differences are:</p>

           <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>An Aggregate Analysis Engine Descriptor
             contains the element
             <code class="literal">&lt;primitive&gt;false&lt;/primitive&gt;</code> rather
             than <code class="literal">&lt;primitive&gt;true&lt;/primitive&gt;</code>.
             </p></li><li class="listitem"><p>An Aggregate Analysis Engine Descriptor must not include a
               <code class="literal">&lt;annotatorImplementationName&gt;</code>
               element.</p></li><li class="listitem"><p>In place of the
               <code class="literal">&lt;annotatorImplementationName&gt;</code>, an Aggregate
               Analysis Engine Descriptor must have a
               <code class="literal">&lt;delegateAnalysisEngineSpecifiers&gt;</code>
               element. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.delegates" title="2.4.2.1.&nbsp;Delegate Analysis Engine Specifiers">Section&nbsp;2.4.2.1, &#8220;Delegate Analysis Engine Specifiers&#8221;</a>.</p>
               </li><li class="listitem"><p>An Aggregate Analysis Engine Descriptor may provide a
               <code class="literal">&lt;flowController&gt;</code> element immediately
               following the
               <code class="literal">&lt;delegateAnalysisEngineSpecifiers&gt;</code>. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.flow_controller" title="2.4.2.2.&nbsp;FlowController">Section&nbsp;2.4.2.2, &#8220;FlowController&#8221;</a>.</p></li><li class="listitem"><p>Under the <code class="literal">analysisEngineMetaData</code> element, an Aggregate
               Analysis Engine Descriptor may specify an additional element,
               <code class="literal">&lt;flowConstraints&gt;</code>. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints" title="2.4.2.3.&nbsp;FlowConstraints">Section&nbsp;2.4.2.3, &#8220;FlowConstraints&#8221;</a>. Typically only one
               of <code class="literal">&lt;flowController&gt;</code> and
               <code class="literal">&lt;flowConstraints&gt;</code> is specified. If both are
               specified, the <code class="literal">&lt;flowController&gt;</code> takes
               precedence, and the flow controller implementation can use the information
               specified in the <code class="literal">&lt;flowConstraints&gt;</code> as part of
               its configuration input.</p></li><li class="listitem"><p>An Aggregate Analysis Engine Descriptor must not contain a
               <code class="literal">&lt;typeSystemDescription&gt;</code> element. The Type
               System of the Aggregate Analysis Engine is derived by merging the Type Systems
               of the Analysis Engines that the aggregate contains.</p></li><li class="listitem"><p>Within aggregate Analysis Engine Descriptors,
               <code class="literal">&lt;configurationParameter&gt;</code> elements may define
               <code class="literal">&lt;overrides&gt;</code>. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides" title="2.4.3.3.&nbsp;Configuration Parameter Overrides">Section&nbsp;2.4.3.3, &#8220;Configuration Parameter Overrides&#8221;</a>.</p></li><li class="listitem"><p>External Resource Bindings can bind resources to
               dependencies declared by any delegate AE within the aggregate. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings" title="2.4.2.4.&nbsp;External Resource Bindings">Section&nbsp;2.4.2.4, &#8220;External Resource Bindings&#8221;</a>.</p>
               </li><li class="listitem"><p>An additional optional element,
               <code class="literal">&lt;sofaMappings&gt;</code>, may be included. </p>
               </li></ul></div>

           <div class="section" title="2.4.2.1.&nbsp;Delegate Analysis Engine Specifiers"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.delegates">2.4.2.1.&nbsp;Delegate Analysis Engine Specifiers</h4></div></div></div>


             <pre class="programlisting">&lt;delegateAnalysisEngineSpecifiers&gt;

   &lt;delegateAnalysisEngine key="[String]"&gt;
     &lt;analysisEngineDescription&gt;...&lt;/analysisEngineDescription&gt; |
     &lt;import .../&gt;
   &lt;/delegateAnalysisEngine&gt;

   &lt;delegateAnalysisEngine key="[String]"&gt;
     ...
   &lt;/delegateAnalysisEngine&gt;

   ...

 &lt;/delegateAnalysisEngineSpecifiers&gt;</pre>

             <p>The <code class="literal">delegateAnalysisEngineSpecifiers</code> element
               contains one or more <code class="literal">delegateAnalysisEngine</code>
               elements. Each of these must have a unique key, and must contain
               either:</p>

             <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>A complete
               <code class="literal">analysisEngineDescription</code> element describing the
               delegate analysis engine <span class="bold"><strong>OR</strong></span></p>
               </li><li class="listitem"><p>An <code class="literal">import</code> element giving the name or
                 location of the XML descriptor for the delegate analysis engine (see <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a>).</p></li></ul></div>

             <p>The latter is the much more common usage, and is the only form supported by
               the Component Descriptor Editor tool.</p>
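
            <p>As a minimal sketch, an aggregate with two delegates might declare them as
              follows. The keys (<code class="literal">tokenizer</code> and
              <code class="literal">tagger</code>), the descriptor file name, and the
              package-style descriptor name are hypothetical; they only illustrate the two
              import forms (by location and by name):</p>

            <pre class="programlisting">&lt;delegateAnalysisEngineSpecifiers&gt;

  &lt;!-- hypothetical delegate, imported by relative location --&gt;
  &lt;delegateAnalysisEngine key="tokenizer"&gt;
    &lt;import location="Tokenizer.xml"/&gt;
  &lt;/delegateAnalysisEngine&gt;

  &lt;!-- hypothetical delegate, imported by name --&gt;
  &lt;delegateAnalysisEngine key="tagger"&gt;
    &lt;import name="org.myorg.Tagger"/&gt;
  &lt;/delegateAnalysisEngine&gt;

&lt;/delegateAnalysisEngineSpecifiers&gt;</pre>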
           </div>
           <div class="section" title="2.4.2.2.&nbsp;FlowController"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.flow_controller">2.4.2.2.&nbsp;FlowController</h4></div></div></div>


             <pre class="programlisting">&lt;flowController key="[String]"&gt;
     &lt;flowControllerDescription&gt;...&lt;/flowControllerDescription&gt; |
     &lt;import .../&gt;
   &lt;/flowController&gt;</pre>

             <p>The optional <code class="literal">flowController</code> element identifies
               the descriptor of the FlowController component that will be used to determine
               the order in which delegate Analysis Engines are called.</p>

             <p>The <code class="literal">key</code> attribute is optional, but recommended; it
               assigns the FlowController an identifier that can be used for configuration
               parameter overrides, Sofa mappings, or external resource bindings. The key
               must not be the same as any of the delegate analysis engine keys.</p>

             <p>As with the <code class="literal">delegateAnalysisEngine</code> element, the
               <code class="literal">flowController</code> element may contain either a complete
               <code class="literal">flowControllerDescription</code> or an
               <code class="literal">import</code>, but the import is recommended. The Component
               Descriptor Editor tool only supports imports here.</p>
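
            <p>A minimal sketch of this element, assuming a hypothetical key
              <code class="literal">myFlowController</code> and a hypothetical FlowController
              descriptor file named <code class="literal">MyFlowController.xml</code>:</p>

            <pre class="programlisting">&lt;!-- key and file name are illustrative assumptions --&gt;
&lt;flowController key="myFlowController"&gt;
  &lt;import location="MyFlowController.xml"/&gt;
&lt;/flowController&gt;</pre>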

           </div>
           <div class="section" title="2.4.2.3.&nbsp;FlowConstraints"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints">2.4.2.3.&nbsp;FlowConstraints</h4></div></div></div>


             <p>If a <code class="literal">&lt;flowController&gt;</code> is not specified, the
               order in which delegate Analysis Engines are called within the aggregate
               Analysis Engine is specified using the
               <code class="literal">&lt;flowConstraints&gt;</code> element, which must occur
               immediately following the
               <code class="literal">configurationParameterSettings</code> element. If a
               <code class="literal">&lt;flowController&gt;</code> is specified, then the
               <code class="literal">&lt;flowConstraints&gt;</code> are optional. They can be
               used to pass an ordering of delegate keys to the
               <code class="literal">&lt;flowController&gt;</code>.</p>

             <p>There are two options for flow constraints --
               <code class="literal">&lt;fixedFlow&gt;</code> or
               <code class="literal">&lt;capabilityLanguageFlow&gt;</code>. Each is discussed
               in a separate section below.</p>

             <div class="section" title="Fixed Flow"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.fixed_flow">Fixed Flow</h5></div></div></div>


               <pre class="programlisting">&lt;flowConstraints&gt;
   &lt;fixedFlow&gt;
     &lt;node&gt;[String]&lt;/node&gt;
     &lt;node&gt;[String]&lt;/node&gt;
     ...
   &lt;/fixedFlow&gt;
 &lt;/flowConstraints&gt;</pre>

               <p>The <code class="literal">flowConstraints</code> element must be included
                 immediately following the
                 <code class="literal">configurationParameterSettings</code> element.</p>

               <p>To request a fixed flow, the <code class="literal">flowConstraints</code>
                 element contains a single <code class="literal">fixedFlow</code> element. The
                 only other option currently available is
                 <code class="literal">capabilityLanguageFlow</code>, described below.</p>

               <p>The <code class="literal">fixedFlow</code> element contains one or more
                 <code class="literal">node</code> elements, each of which contains an identifier
                 which must match the key of a delegate analysis engine specified in the
                 <code class="literal">delegateAnalysisEngineSpecifiers</code>
                 element.</p>
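
              <p>For instance, assuming an aggregate whose delegates were declared with the
                hypothetical keys <code class="literal">tokenizer</code> and
                <code class="literal">tagger</code>, a fixed flow that runs the tokenizer first
                could be sketched as:</p>

              <pre class="programlisting">&lt;flowConstraints&gt;
  &lt;fixedFlow&gt;
    &lt;!-- each node names a delegate key; the keys here are hypothetical --&gt;
    &lt;node&gt;tokenizer&lt;/node&gt;
    &lt;node&gt;tagger&lt;/node&gt;
  &lt;/fixedFlow&gt;
&lt;/flowConstraints&gt;</pre>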

             </div>
             <div class="section" title="Capability Language Flow"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.capability_language_flow">Capability Language Flow</h5></div></div></div>


               <pre class="programlisting">&lt;flowConstraints&gt;
   &lt;capabilityLanguageFlow&gt;
     &lt;node&gt;[String]&lt;/node&gt;
     &lt;node&gt;[String]&lt;/node&gt;
     ...
   &lt;/capabilityLanguageFlow&gt;
 &lt;/flowConstraints&gt;</pre>

               <p>If you use <code class="literal">&lt;capabilityLanguageFlow&gt;</code>,
                 the delegate Analysis Engines named by the
                 <code class="literal">&lt;node&gt;</code> elements are called in the given order,
                 except that a delegate Analysis Engine is skipped if any of the following are
                 true (according to that Analysis Engine's declared output
                 capabilities):</p>

               <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>It cannot produce any of the aggregate
                 Analysis Engine's output capabilities for the language of the
                 current document.</p></li><li class="listitem"><p>All of the output capabilities have already been
                   produced by an earlier Analysis Engine in the flow. </p></li></ul></div>

               <p>For example, if two annotators produce
                 <code class="literal">org.myorg.TokenAnnotation</code> feature structures for
                 the same language, these feature structures will only be produced by the
                 first annotator in the list.</p>

               <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The flow analysis uses the specific types that are specified in the
               output capabilities, without any expansion for subtypes.  So, if you expect
               a type TT and another type SubTT (which is a subtype of TT) in the output, you
               must include both of them in the output capabilities.</p></div>
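
              <p>To illustrate the note above, a delegate that produces both a type and one of
                its subtypes would list each type explicitly in its output capabilities. The
                following is only a sketch; the type names
                <code class="literal">org.myorg.TT</code> and
                <code class="literal">org.myorg.SubTT</code> and the language
                <code class="literal">en</code> are hypothetical:</p>

              <pre class="programlisting">&lt;capabilities&gt;
  &lt;capability&gt;
    &lt;inputs/&gt;
    &lt;outputs&gt;
      &lt;!-- list the supertype and the subtype separately; subtypes are not implied --&gt;
      &lt;type&gt;org.myorg.TT&lt;/type&gt;
      &lt;type&gt;org.myorg.SubTT&lt;/type&gt;
    &lt;/outputs&gt;
    &lt;languagesSupported&gt;
      &lt;language&gt;en&lt;/language&gt;
    &lt;/languagesSupported&gt;
  &lt;/capability&gt;
&lt;/capabilities&gt;</pre>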
             </div>
           </div>

           <div class="section" title="2.4.2.4.&nbsp;External Resource Bindings"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings">2.4.2.4.&nbsp;External Resource Bindings</h4></div></div></div>


             <p>Aggregate analysis engine descriptors can declare resource bindings
               that bind resources to dependencies declared in any of the delegate analysis
               engines (or their subcomponents, recursively) within that aggregate. This
               allows resource sharing. Any binding at this level overrides (supersedes)
               any binding specified by a contained component or its subcomponents,
               recursively.</p>

             <p>For example, consider an aggregate Analysis Engine Descriptor that
               contains delegate Analysis Engines with keys
               <code class="literal">annotator1</code> and <code class="literal">annotator2</code> (as
               declared in the <code class="literal">&lt;delegateAnalysisEngine&gt;</code>
               element &#8211; see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.delegates" title="2.4.2.1.&nbsp;Delegate Analysis Engine Specifiers">Section&nbsp;2.4.2.1, &#8220;Delegate Analysis Engine Specifiers&#8221;</a>),
               where <code class="literal">annotator1</code> declares a resource dependency with
               key <code class="literal">myResource</code> and <code class="literal">annotator2</code>
               declares a resource dependency with key <code class="literal">someResource</code>.</p>

             <p>Within that aggregate Analysis Engine Descriptor, the following
               <code class="literal">resourceManagerConfiguration</code> would bind both of
               those dependencies to a single external resource file.</p>


             <pre class="programlisting">&lt;resourceManagerConfiguration&gt;

   &lt;externalResources&gt;
     &lt;externalResource&gt;
       &lt;name&gt;ExampleResource&lt;/name&gt;
       &lt;fileResourceSpecifier&gt;
         &lt;fileUrl&gt;file:MyResourceFile.dat&lt;/fileUrl&gt;
       &lt;/fileResourceSpecifier&gt;
     &lt;/externalResource&gt;
   &lt;/externalResources&gt;

   &lt;externalResourceBindings&gt;
     &lt;externalResourceBinding&gt;
       &lt;key&gt;annotator1/myResource&lt;/key&gt;
       &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;
     &lt;externalResourceBinding&gt;
       &lt;key&gt;annotator2/someResource&lt;/key&gt;
       &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;
   &lt;/externalResourceBindings&gt;

 &lt;/resourceManagerConfiguration&gt;</pre>

             <p>The syntax for the <code class="literal">externalResources</code> declaration
               is exactly the same as described previously. In the resource bindings note the
               use of the compound keys, e.g. <code class="literal">annotator1/myResource</code>.
               This identifies the resource dependency key
               <code class="literal">myResource</code> within the annotator with key
               <code class="literal">annotator1</code>. Compound resource dependencies can be
               multiple levels deep to handle nested aggregate analysis engines.</p>
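
            <p>As a sketch of a multi-level key, assume that
              <code class="literal">annotator1</code> is not a direct delegate but sits inside
              a nested aggregate whose key, <code class="literal">innerAggregate</code>, is
              hypothetical. A binding declared in the outermost aggregate would then name each
              level in turn:</p>

            <pre class="programlisting">&lt;externalResourceBinding&gt;
  &lt;!-- innerAggregate is an assumed key for a nested aggregate delegate --&gt;
  &lt;key&gt;innerAggregate/annotator1/myResource&lt;/key&gt;
  &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
&lt;/externalResourceBinding&gt;</pre>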
           </div>

           <div class="section" title="2.4.2.5.&nbsp;Sofa Mappings"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.sofa_mappings">2.4.2.5.&nbsp;Sofa Mappings</h4></div></div></div>


             <p>Sofa mappings are specified between Sofa names declared in this
               aggregate descriptor as part of the
               <code class="literal">&lt;capability&gt;</code> section, and the Sofa names
               declared in the delegate components. For purposes of the mapping, all the
               declarations of Sofas in any of the capability sets contained within the
               <code class="literal">&lt;capabilities&gt;</code> element are considered
               together.</p>


             <pre class="programlisting">&lt;sofaMappings&gt;
   &lt;sofaMapping&gt;
     &lt;componentKey&gt;[keyName]&lt;/componentKey&gt;
     &lt;componentSofaName&gt;[sofaName]&lt;/componentSofaName&gt;
     &lt;aggregateSofaName&gt;[sofaName]&lt;/aggregateSofaName&gt;
   &lt;/sofaMapping&gt;
   ...
 &lt;/sofaMappings&gt;</pre>

             <p>The &lt;componentSofaName&gt; may be omitted in the case where the
               component is not aware of Multiple Views or Sofas. In this case, the UIMA
               framework will arrange for the specified &lt;aggregateSofaName&gt; to be
               the one visible to the delegate component.</p>

             <p>The &lt;componentKey&gt; is the key name for the component as specified
               in the list of delegate components for this aggregate.</p>

             <p>The sofaNames used must be declared as input or output sofas in some
               capability set.</p>
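
            <p>A minimal sketch with two mappings follows. The delegate keys
              (<code class="literal">annotator1</code>,
              <code class="literal">annotator2</code>) and the Sofa names
              (<code class="literal">inputText</code>,
              <code class="literal">translatedDocument</code>) are hypothetical. The second
              mapping omits <code class="literal">&lt;componentSofaName&gt;</code>, as would be
              done for a delegate that is not aware of multiple Sofas:</p>

            <pre class="programlisting">&lt;sofaMappings&gt;

  &lt;!-- annotator1 sees the aggregate's "translatedDocument" Sofa under the name "inputText" --&gt;
  &lt;sofaMapping&gt;
    &lt;componentKey&gt;annotator1&lt;/componentKey&gt;
    &lt;componentSofaName&gt;inputText&lt;/componentSofaName&gt;
    &lt;aggregateSofaName&gt;translatedDocument&lt;/aggregateSofaName&gt;
  &lt;/sofaMapping&gt;

  &lt;!-- annotator2 is assumed to be unaware of Sofas, so no componentSofaName is given --&gt;
  &lt;sofaMapping&gt;
    &lt;componentKey&gt;annotator2&lt;/componentKey&gt;
    &lt;aggregateSofaName&gt;translatedDocument&lt;/aggregateSofaName&gt;
  &lt;/sofaMapping&gt;

&lt;/sofaMappings&gt;</pre>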
           </div>
         </div>

         <div class="section" title="2.4.3.&nbsp;Configuration Parameters"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.aes.configuration_parameters">2.4.3.&nbsp;Configuration Parameters</h3></div></div></div>

           <p>Configuration parameters may be declared and set in both Primitive and
           Aggregate descriptors. Parameters set in an aggregate may override parameters set in one or
           more of its delegates.
           </p>
         <div class="section" title="2.4.3.1.&nbsp;Configuration Parameter Declaration"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration">2.4.3.1.&nbsp;Configuration Parameter Declaration</h4></div></div></div>


           <p>Configuration Parameters are made available to annotator
             implementations and applications by the following interfaces:
             </p><div class="itemizedlist"><ul class="itemizedlist" type="circle" compact><li class="listitem" style="list-style-type: circle"><p>
             <code class="literal">AnnotatorContext</code> <sup>[<a name="d5e690" href="#ftn.d5e690" class="footnote">2</a>]</sup> (passed as an argument to the
             initialize() method of a version 1 annotator)</p>
             </li><li class="listitem" style="list-style-type: circle"><p>
             <code class="literal">ConfigurableResource</code> (every Analysis Engine
             implements this interface)</p>
             </li><li class="listitem" style="list-style-type: circle"><p>
             <code class="literal">UimaContext</code> (passed
             as an argument to the initialize() method of a version 2 annotator) (you can get
             this from any resource, including Analysis Engines, using the method
             <code class="literal">getUimaContext</code>()).</p>
             </li></ul></div>

           <p>Use AnnotatorContext within version 1 annotators and UimaContext for
             version 2 annotators and outside of annotators (for instance, in CasConsumers,
             or the containing application) to access configuration parameters.</p>

           <p>Configuration parameters are set from the corresponding elements in the
             XML descriptor for the application. If you need to programmatically change
             parameter settings within an application, you can use methods in
             ConfigurableResource; if you do this, you need to call reconfigure()
             afterwards to have the UIMA framework notify all the contained analysis
             components that the parameter configuration has changed (the analysis
             engine's reinitialize() methods will be called). Note that in the current
             implementation, only integrated deployment components have configuration
             parameters passed to them; remote components obtain their parameters from
             their remote startup environment. This will likely change in the
             future.</p>

           <p>There are two ways to specify the
             <code class="literal">&lt;configurationParameters&gt;</code> section &#8211; as a
             list of configuration parameters or a list of groups. A list of parameters, which
             are not part of any group, looks like this:


             </p><pre class="programlisting">&lt;configurationParameters&gt;
   &lt;configurationParameter&gt;
     &lt;name&gt;[String]&lt;/name&gt;
     &lt;externalOverrideName&gt;[String]&lt;/externalOverrideName&gt;
     &lt;description&gt;[String]&lt;/description&gt;
     &lt;type&gt;String|Integer|Float|Boolean&lt;/type&gt;
     &lt;multiValued&gt;true|false&lt;/multiValued&gt;
     &lt;mandatory&gt;true|false&lt;/mandatory&gt;
     &lt;overrides&gt;
       &lt;parameter&gt;[String]&lt;/parameter&gt;
       &lt;parameter&gt;[String]&lt;/parameter&gt;
         ...
     &lt;/overrides&gt;
   &lt;/configurationParameter&gt;
   &lt;configurationParameter&gt;
     ...
   &lt;/configurationParameter&gt;
     ...
 &lt;/configurationParameters&gt;</pre>

           <p>For each configuration parameter, the following are specified:</p>

           <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>name</strong></span>
             &#8211; the name by which the annotator code refers to the parameter
             (required). All parameters declared in an analysis engine descriptor must have
             distinct names. The name is composed of normal Java identifier characters.</p>
             </li><li class="listitem"><p><span class="bold"><strong>externalOverrideName</strong></span> &#8211; the
               name of a property in an external settings file that if defined overrides
               any value set in this descriptor or in its parent. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides" title="2.4.3.4.&nbsp;External Configuration Parameter Overrides">Section&nbsp;2.4.3.4, &#8220;External Configuration Parameter Overrides&#8221;</a>
               for a discussion of external configuration parameter overrides.
               (optional)</p></li><li class="listitem"><p><span class="bold"><strong>description</strong></span> &#8211; a
               natural language description of the intent of the parameter
               (optional)</p></li><li class="listitem"><p><span class="bold"><strong>type</strong></span> &#8211; the data
               type of the parameter's value &#8211; must be one of
               <code class="literal">String</code>, <code class="literal">Integer</code>,
               <code class="literal">Float</code>, or <code class="literal">Boolean</code>
               (required).</p></li><li class="listitem"><p><span class="bold"><strong>multiValued</strong></span> &#8211;
               <code class="literal">true</code> if the parameter can take multiple values (an
               array), <code class="literal">false</code> if the parameter takes only a single value
               (optional, defaults to false).</p></li><li class="listitem"><p><span class="bold"><strong>mandatory</strong></span> &#8211;
               <code class="literal">true</code> if a value must be provided for the parameter
               (optional, defaults to false).</p></li><li class="listitem"><p><span class="bold"><strong>overrides</strong></span> &#8211; this
               is used only in aggregate Analysis Engines, but is included here for
               completeness. See <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides" title="2.4.3.3.&nbsp;Configuration Parameter Overrides">Section&nbsp;2.4.3.3, &#8220;Configuration Parameter Overrides&#8221;</a>
               for a discussion of configuration parameter overriding in aggregate
               Analysis Engines (optional).</p></li></ul></div>
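
          <p>As a concrete sketch of an ungrouped declaration, the following declares one
            single-valued and one multi-valued parameter. The parameter names
            <code class="literal">MaxTokenLength</code> and
            <code class="literal">StopWords</code> are hypothetical:</p>

          <pre class="programlisting">&lt;configurationParameters&gt;

  &lt;!-- hypothetical single-valued, optional Integer parameter --&gt;
  &lt;configurationParameter&gt;
    &lt;name&gt;MaxTokenLength&lt;/name&gt;
    &lt;description&gt;Longest token, in characters, that is kept.&lt;/description&gt;
    &lt;type&gt;Integer&lt;/type&gt;
    &lt;multiValued&gt;false&lt;/multiValued&gt;
    &lt;mandatory&gt;false&lt;/mandatory&gt;
  &lt;/configurationParameter&gt;

  &lt;!-- hypothetical multi-valued, mandatory String parameter --&gt;
  &lt;configurationParameter&gt;
    &lt;name&gt;StopWords&lt;/name&gt;
    &lt;description&gt;Words to ignore.&lt;/description&gt;
    &lt;type&gt;String&lt;/type&gt;
    &lt;multiValued&gt;true&lt;/multiValued&gt;
    &lt;mandatory&gt;true&lt;/mandatory&gt;
  &lt;/configurationParameter&gt;

&lt;/configurationParameters&gt;</pre>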

           <p>A list of groups looks like this:


             </p><pre class="programlisting">&lt;configurationParameters defaultGroup="[String]"
     searchStrategy="none|default_fallback|language_fallback" &gt;

   &lt;commonParameters&gt;
     [zero or more parameters]
   &lt;/commonParameters&gt;

   &lt;configurationGroup names="name1 name2 name3 ..."&gt;
     [zero or more parameters]
   &lt;/configurationGroup&gt;

   &lt;configurationGroup names="name4 name5 ..."&gt;
     [zero or more parameters]
   &lt;/configurationGroup&gt;

   ...

 &lt;/configurationParameters&gt;</pre>

          <p>Both the <code class="literal">&lt;commonParameters&gt;</code> and
             <code class="literal">&lt;configurationGroup&gt;</code> elements contain zero or
             more <code class="literal">&lt;configurationParameter&gt;</code> elements, with
             the same syntax described above.</p>

           <p>The <code class="literal">&lt;commonParameters&gt;</code> element declares
             parameters that exist in all groups. Each
             <code class="literal">&lt;configurationGroup&gt;</code> element has a names
             attribute, which contains a list of group names separated by whitespace (space
             or tab characters). Names consist of any number of non-whitespace characters;
             however the Component Descriptor Editor tool restricts this to be normal Java
             identifiers, including the period (.) and the dash (-). One configuration group
             will be created for each name, and all of the groups will contain the same set of
             parameters.</p>

           <p>The <code class="literal">defaultGroup</code> attribute specifies the name of the
             group to be used in the case where an annotator does a lookup for a configuration
             parameter without specifying a group name. It may also be used as a fallback if the
             annotator specifies a group that does not exist &#8211; see below.</p>

           <p>The <code class="literal">searchStrategy</code> attribute determines the action
             to be taken when the context is queried for the value of a parameter belonging to a
             particular configuration group, if that group does not exist or does not contain
             a value for the requested parameter. There are currently three possible values:

             </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>none</strong></span>
               &#8211; there is no fallback; return null if there is no value in the exact group
               specified by the user.</p></li><li class="listitem"><p><span class="bold"><strong>default_fallback</strong></span>
                 &#8211; if there is no value found in the specified group, look in the default
                 group (as defined by the <code class="literal">defaultGroup</code> attribute).</p>
                 </li><li class="listitem"><p><span class="bold"><strong>language_fallback</strong></span>
                 &#8211; this setting allows for a specific use of configuration parameter
                 groups where the group names correspond to ISO language and country codes
                 (for an example, see below). The fallback sequence is:
                 <code class="literal">&lt;lang&gt;_&lt;country&gt;_&lt;region&gt; <span class="symbol">&#8594;</span>
                 &lt;lang&gt;_&lt;country&gt; <span class="symbol">&#8594;</span> &lt;lang&gt; <span class="symbol">&#8594;</span>
                 &lt;default&gt;.</code> </p></li></ul></div><p>
             </p>

           <div class="section" title="Example"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration.example">Example</h5></div></div></div>


             <pre class="programlisting">&lt;configurationParameters defaultGroup="en"
         searchStrategy="language_fallback"&gt;

   &lt;commonParameters&gt;
     &lt;configurationParameter&gt;
       &lt;name&gt;DictionaryFile&lt;/name&gt;
       &lt;description&gt;Location of dictionary for this
            language&lt;/description&gt;
       &lt;type&gt;String&lt;/type&gt;
       &lt;multiValued&gt;false&lt;/multiValued&gt;
       &lt;mandatory&gt;false&lt;/mandatory&gt;
     &lt;/configurationParameter&gt;
   &lt;/commonParameters&gt;

   &lt;configurationGroup names="en de en-US"/&gt;

   &lt;configurationGroup names="zh"&gt;
     &lt;configurationParameter&gt;
       &lt;name&gt;DBC_Strategy&lt;/name&gt;
       &lt;description&gt;Strategy for dealing with double-byte
           characters.&lt;/description&gt;
       &lt;type&gt;String&lt;/type&gt;
       &lt;multiValued&gt;false&lt;/multiValued&gt;
       &lt;mandatory&gt;false&lt;/mandatory&gt;
     &lt;/configurationParameter&gt;
   &lt;/configurationGroup&gt;

 &lt;/configurationParameters&gt;</pre>

             <p>In this example, we are declaring a <code class="literal">DictionaryFile</code>
               parameter that can have a different value for each of the languages that our AE
               supports
               &#8211; English (general), German, U.S. English, and Chinese. For Chinese
               only, we also declare a <code class="literal">DBC_Strategy</code>
               parameter.</p>

             <p>We are using the <code class="literal">language_fallback</code> search
               strategy, so if an annotator requests the dictionary file for the
               <code class="literal">en-GB</code> (British English) group, we will fall back to the
               more general <code class="literal">en</code> group.</p>

             <p>Since we have defined <code class="literal">en</code> as the default group, this
               value will be returned if the context is queried for the
               <code class="literal">DictionaryFile</code> parameter without specifying any
               group name, or if a nonexistent group name is specified.</p>
           </div>
         </div>

         <div class="section" title="2.4.3.2.&nbsp;Configuration Parameter Settings"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.configuration_parameter_settings">2.4.3.2.&nbsp;Configuration Parameter Settings</h4></div></div></div>


           <p>For configuration parameters that are not part of any group, the
             <code class="literal">&lt;configurationParameterSettings&gt;</code> element
             looks like this:


             </p><pre class="programlisting">&lt;configurationParameterSettings&gt;
   &lt;nameValuePair&gt;
     &lt;name&gt;[String]&lt;/name&gt;
     &lt;value&gt;
       &lt;string&gt;[String]&lt;/string&gt;  |
       &lt;integer&gt;[Integer]&lt;/integer&gt; |
       &lt;float&gt;[Float]&lt;/float&gt; |
       &lt;boolean&gt;true|false&lt;/boolean&gt;  |
       &lt;array&gt; ... &lt;/array&gt;
     &lt;/value&gt;
   &lt;/nameValuePair&gt;

   &lt;nameValuePair&gt;
     ...
   &lt;/nameValuePair&gt;
   ...
 &lt;/configurationParameterSettings&gt;</pre>

           <p>There are zero or more <code class="literal">nameValuePair</code> elements. Each
             <code class="literal">nameValuePair</code> contains a name (which refers to one of the
             configuration parameters) and a value for that parameter.</p>

           <p>The <code class="literal">value</code> element contains an element that matches
             the type of the parameter. For single-valued parameters, this is either
             <code class="literal">&lt;string&gt;</code>, <code class="literal">&lt;integer&gt;</code>
             , <code class="literal">&lt;float&gt;</code>, or
             <code class="literal">&lt;boolean&gt;</code>. For multi-valued parameters, this is
             an <code class="literal">&lt;array&gt;</code> element, which then contains zero or
             more instances of the appropriate type of primitive value, e.g.:


             </p><pre class="programlisting">&lt;array&gt;&lt;string&gt;One&lt;/string&gt;&lt;string&gt;Two&lt;/string&gt;&lt;/array&gt;</pre>
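
          <p>Putting the single-valued and multi-valued forms together, a complete settings
            block might look like the sketch below. It assumes that a hypothetical
            single-valued <code class="literal">Integer</code> parameter named
            <code class="literal">MaxTokenLength</code> and a hypothetical multi-valued
            <code class="literal">String</code> parameter named
            <code class="literal">StopWords</code> have been declared in the same
            descriptor:</p>

          <pre class="programlisting">&lt;configurationParameterSettings&gt;

  &lt;!-- single value: the child element matches the declared type --&gt;
  &lt;nameValuePair&gt;
    &lt;name&gt;MaxTokenLength&lt;/name&gt;
    &lt;value&gt;&lt;integer&gt;64&lt;/integer&gt;&lt;/value&gt;
  &lt;/nameValuePair&gt;

  &lt;!-- multiple values: an array of elements of the declared type --&gt;
  &lt;nameValuePair&gt;
    &lt;name&gt;StopWords&lt;/name&gt;
    &lt;value&gt;
      &lt;array&gt;&lt;string&gt;the&lt;/string&gt;&lt;string&gt;a&lt;/string&gt;&lt;/array&gt;
    &lt;/value&gt;
  &lt;/nameValuePair&gt;

&lt;/configurationParameterSettings&gt;</pre>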

           <p>For parameters declared in configuration groups the
             <code class="literal">&lt;configurationParameterSettings&gt;</code> element
             looks like this:


             </p><pre class="programlisting">&lt;configurationParameterSettings&gt;

   &lt;settingsForGroup name="[String]"&gt;
     [one or more &lt;nameValuePair&gt; elements]
   &lt;/settingsForGroup&gt;

   &lt;settingsForGroup name="[String]"&gt;
     [one or more &lt;nameValuePair&gt; elements]
   &lt;/settingsForGroup&gt;

 ...

 &lt;/configurationParameterSettings&gt;</pre><p>
             where each <code class="literal">&lt;settingsForGroup&gt;</code> element has a name
             that matches one of the configuration groups declared under the
             <code class="literal">&lt;configurationParameters&gt;</code> element and contains
             the parameter settings for that group.</p>

           <div class="section" title="Example"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.xml.component_descriptor.aes.configuration_parameter_settings.example">Example</h5></div></div></div>


             <p>Here are the settings that correspond to the parameter declarations in
               the previous example:


               </p><pre class="programlisting">&lt;configurationParameterSettings&gt;

   &lt;settingsForGroup name="en"&gt;
     &lt;nameValuePair&gt;
       &lt;name&gt;DictionaryFile&lt;/name&gt;
       &lt;value&gt;&lt;string&gt;resourcesEnglishdictionary.dat&lt;/string&gt;&lt;/value&gt;
     &lt;/nameValuePair&gt;
   &lt;/settingsForGroup&gt;

   &lt;settingsForGroup name="en-US"&gt;
     &lt;nameValuePair&gt;
       &lt;name&gt;DictionaryFile&lt;/name&gt;
       &lt;value&gt;&lt;string&gt;resourcesEnglish_USdictionary.dat&lt;/string&gt;&lt;/value&gt;
     &lt;/nameValuePair&gt;
   &lt;/settingsForGroup&gt;

   &lt;settingsForGroup name="de"&gt;
     &lt;nameValuePair&gt;
       &lt;name&gt;DictionaryFile&lt;/name&gt;
       &lt;value&gt;&lt;string&gt;resourcesDeutschdictionary.dat&lt;/string&gt;&lt;/value&gt;
     &lt;/nameValuePair&gt;
   &lt;/settingsForGroup&gt;

   &lt;settingsForGroup name="zh"&gt;
     &lt;nameValuePair&gt;
       &lt;name&gt;DictionaryFile&lt;/name&gt;
       &lt;value&gt;&lt;string&gt;resourcesChinesedictionary.dat&lt;/string&gt;&lt;/value&gt;
     &lt;/nameValuePair&gt;

     &lt;nameValuePair&gt;
       &lt;name&gt;DBC_Strategy&lt;/name&gt;
       &lt;value&gt;&lt;string&gt;default&lt;/string&gt;&lt;/value&gt;
     &lt;/nameValuePair&gt;

   &lt;/settingsForGroup&gt;

 &lt;/configurationParameterSettings&gt;</pre>
           </div>
           </div>

           <div class="section" title="2.4.3.3.&nbsp;Configuration Parameter Overrides"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides">2.4.3.3.&nbsp;Configuration Parameter Overrides</h4></div></div></div>


             <p>In an aggregate Analysis Engine Descriptor, each
               <code class="literal">&lt;configurationParameter&gt;</code> element should
               contain an <code class="literal">&lt;overrides&gt;</code> element, with the
               following syntax:</p>


             <pre class="programlisting">&lt;overrides&gt;

   &lt;parameter&gt;
     [delegateAnalysisEngineKey]/[parameterName]
   &lt;/parameter&gt;

   &lt;parameter&gt;
     [delegateAnalysisEngineKey]/[parameterName]
   &lt;/parameter&gt;
   ...

 &lt;/overrides&gt;</pre>

             <p>Since aggregate Analysis Engines have no code associated with them, the
               only way in which their configuration parameters can affect their processing
               is by overriding the parameter values of one or more delegate analysis
               engines. The <code class="literal">&lt;overrides&gt;</code> element determines
               which parameters, in which delegate Analysis Engines, are overridden by this
               configuration parameter.</p>

             <p>For example, consider an aggregate Analysis Engine Descriptor that
               contains delegate Analysis Engines with keys
               <code class="literal">annotator1</code> and <code class="literal">annotator2</code> (as
               declared in the &lt;delegateAnalysisEngine&gt; element &#8211; see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.aggregate.delegates" title="2.4.2.1.&nbsp;Delegate Analysis Engine Specifiers">Section&nbsp;2.4.2.1, &#8220;Delegate Analysis Engine Specifiers&#8221;</a>) and also declares a
               configuration parameter as follows:


               </p><pre class="programlisting">&lt;configurationParameter&gt;
   &lt;name&gt;AggregateParam&lt;/name&gt;
   &lt;type&gt;String&lt;/type&gt;
   &lt;overrides&gt;
     &lt;parameter&gt;annotator1/param1&lt;/parameter&gt;
     &lt;parameter&gt;annotator2/param2&lt;/parameter&gt;
   &lt;/overrides&gt;
 &lt;/configurationParameter&gt;</pre>

             <p>The value of the <code class="literal">AggregateParam</code> parameter
               (whether assigned in the aggregate descriptor or at runtime by an
               application) will override the value of parameter
               <code class="literal">param1</code> in <code class="literal">annotator1</code> and also
               override the value of parameter <code class="literal">param2</code> in
               <code class="literal">annotator2</code>. No other parameters will be
               affected.  Note that <code class="literal">AggregateParam</code> may itself be overridden by a
               parameter in an outer aggregate that has this aggregate as one of its delegates.
             </p>
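             <p>The fan-out described above can be pictured with a small, framework-independent
               sketch. It is a toy model only, not the UIMA implementation: the class
               <code class="literal">OverrideFanOutSketch</code>, its methods, and the default values
               are hypothetical names chosen for illustration, and delegates are modelled as plain maps.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of aggregate-to-delegate overrides; not the UIMA implementation. */
final class OverrideFanOutSketch {

  /**
   * Applies one aggregate parameter value to every target listed in its
   * overrides element. Each target has the form "delegateKey/parameterName".
   */
  static void applyOverride(String value, List<String> targets,
                            Map<String, Map<String, String>> delegates) {
    for (String target : targets) {
      int slash = target.indexOf('/');
      String delegateKey = target.substring(0, slash);
      String paramName = target.substring(slash + 1);
      // Only the named parameter of the named delegate is changed.
      delegates.get(delegateKey).put(paramName, value);
    }
  }

  /** Builds the two delegates from the example, with made-up default values. */
  static Map<String, Map<String, String>> exampleDelegates() {
    Map<String, String> annotator1 = new LinkedHashMap<>();
    annotator1.put("param1", "a1-default");
    annotator1.put("other", "untouched");
    Map<String, String> annotator2 = new LinkedHashMap<>();
    annotator2.put("param2", "a2-default");
    Map<String, Map<String, String>> delegates = new LinkedHashMap<>();
    delegates.put("annotator1", annotator1);
    delegates.put("annotator2", annotator2);
    return delegates;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> delegates = exampleDelegates();
    // AggregateParam overrides annotator1/param1 and annotator2/param2.
    applyOverride("shared-value",
        Arrays.asList("annotator1/param1", "annotator2/param2"), delegates);
    System.out.println(delegates);
  }
}
```

             <p>In this model the parameter <code class="literal">other</code> of
               <code class="literal">annotator1</code> keeps its own value, because it is not listed
               as an override target.</p>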

             <p>Prior to release 2.4.1, if an aggregate Analysis Engine descriptor
               declared a configuration parameter with no explicit overrides, that
               parameter would override any parameters having the same name within any
               delegate analysis engine. Starting with release 2.4.1, support for this
               usage has been dropped.</p>

           </div>


           <div class="section" title="2.4.3.4.&nbsp;External Configuration Parameter Overrides"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides">2.4.3.4.&nbsp;External Configuration Parameter Overrides</h4></div></div></div>


             <p>
             External parameter overrides are usually declared in primitive descriptors as a way to
             easily modify the parameters in some or all of an application's annotators.
             By using external settings files and shared parameter names, the configuration
             information can be specified without regard for a particular descriptor hierarchy.
             </p>

             <p>
             Configuration parameter declarations in primitive and aggregate descriptors may
             include an <code class="literal">&lt;externalOverrideName&gt;</code> element,
             which specifies the name of a property that may be defined in an external settings file.
             If this element is present and an entry for its name is found in a settings
             file, the entry's value overrides the value otherwise specified for this parameter.
             </p>

             <p>
             The value overrides any value set in this descriptor or set by an override in a parent
             aggregate.  In primitive descriptors the value set by an external override is always
             applied.  In aggregate descriptors the value set by an external override applies to the
             aggregate parameter, and is passed down to the overridden delegate parameters in the
             usual way, i.e. only if the delegate's parameter has not been set by an external override.
             </p>

             <p>
             In the absence of external overrides,
             parameter evaluation can be viewed as proceeding from the primitive descriptor up through
             any aggregates containing overrides, taking the last setting found.  With external
             overrides the search ends with the first external override found that has a value
             assigned by a settings file.
             </p>
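             <p>
             The evaluation order described above can be sketched as a walk from the primitive
             descriptor outward. This is a simplified model under stated assumptions, not the
             framework's code: the class <code class="literal">ParameterEvaluationSketch</code>
             and its <code class="literal">Level</code> type are hypothetical, and each descriptor
             level is reduced to an optional value plus an optional external override name.
             </p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of parameter evaluation; not the UIMA implementation. */
final class ParameterEvaluationSketch {

  /** One descriptor level: its own setting and its optional external override name. */
  static final class Level {
    final String descriptorValue;      // null if this level sets nothing
    final String externalOverrideName; // null if no external override is declared

    Level(String descriptorValue, String externalOverrideName) {
      this.descriptorValue = descriptorValue;
      this.externalOverrideName = externalOverrideName;
    }
  }

  /**
   * Evaluates a parameter. The list runs from the primitive descriptor (first)
   * to the outermost aggregate (last); settings holds the external settings.
   */
  static String evaluate(List<Level> primitiveToOutermost, Map<String, String> settings) {
    String result = null;
    for (Level level : primitiveToOutermost) {
      if (level.externalOverrideName != null) {
        String external = settings.get(level.externalOverrideName);
        if (external != null) {
          return external; // first external override with a value ends the search
        }
      }
      if (level.descriptorValue != null) {
        result = level.descriptorValue; // an outer setting replaces an inner one
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Level> chain = Arrays.asList(
        new Level("primitive", "ext.primitive"),
        new Level("aggregate", "ext.aggregate"),
        new Level("outer", null));
    Map<String, String> settings = new HashMap<>();
    System.out.println(evaluate(chain, settings));
    settings.put("ext.aggregate", "from-settings-file");
    System.out.println(evaluate(chain, settings));
  }
}
```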

             <p>
             The same external name may be used for multiple parameters;
             the effect of this is that one setting will override multiple parameters.
             </p>

             <p>
             The settings for all descriptors in a pipeline are usually loaded from one or more files
             whose names are obtained from the Java system property <span class="emphasis"><em>UimaExternalOverrides</em></span>.
             The value of the property must be a comma-separated list of resource names.  If the name
             has a prefix of "file:" or no prefix, the filesystem is searched.  If the name has a
             prefix of "path:" the rest must be a Java-style dotted name, similar to the name
             attribute for descriptor imports.  The dots are replaced by file separators and a suffix
             of ".settings" is appended before searching the datapath and classpath.
             For example: <code class="literal">-DUimaExternalOverrides=/data/file1.settings,file:relative/file2.settings,path:org.apache.uima.resources.file3</code>.
             </p>
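             <p>
             As a rough illustration of the naming rules above, the following sketch splits a
             property value and reports where each entry would be looked up. It is not the
             framework's resolver: the class <code class="literal">ExternalOverridesNamesSketch</code>
             is hypothetical, trimming of whitespace around entries is an assumption, and
             <code class="literal">/</code> stands in for the platform file separator so that the
             result is the same on every system.
             </p>

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates the resource-name rules; not the UIMA resolver. */
final class ExternalOverridesNamesSketch {

  /** Describes, for each entry of the comma-separated list, what would be searched. */
  static List<String> toLookupNames(String propertyValue) {
    List<String> result = new ArrayList<>();
    for (String raw : propertyValue.split(",")) {
      String name = raw.trim(); // assumption: surrounding whitespace is not significant
      if (name.isEmpty()) {
        continue;
      }
      if (name.startsWith("path:")) {
        // Dotted name: dots become separators and ".settings" is appended.
        String dotted = name.substring("path:".length());
        result.add("datapath/classpath: " + dotted.replace('.', '/') + ".settings");
      } else if (name.startsWith("file:")) {
        result.add("filesystem: " + name.substring("file:".length()));
      } else {
        result.add("filesystem: " + name); // no prefix behaves like "file:"
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String value = "/data/file1.settings,file:relative/file2.settings,"
        + "path:org.apache.uima.resources.file3";
    for (String entry : toLookupNames(value)) {
      System.out.println(entry);
    }
  }
}
```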

             <p>
             Override settings may also be specified when creating an analysis engine by putting a
             <code class="literal">Settings</code> object in the additional parameters map for the
             <code class="literal">produceAnalysisEngine</code> method.  In this case the
             Java system property <span class="emphasis"><em>UimaExternalOverrides</em></span> is ignored.
             </p><pre class="programlisting">  // Construct an analysis engine that uses two settings files.
   // The file loaded first takes precedence: the first value assigned to a key wins.
   Settings extSettings =
       UIMAFramework.getResourceSpecifierFactory().createSettings();
   for (String fname : new String[] { "externalOverride.settings",
                                      "default.settings" }) {
     // try-with-resources closes the stream even if load() throws
     try (FileInputStream fis = new FileInputStream(fname)) {
       extSettings.load(fis);
     }
   }
   Map&lt;String,Object&gt; aeParms = new HashMap&lt;String,Object&gt;();
   aeParms.put(Resource.PARAM_EXTERNAL_OVERRIDE_SETTINGS, extSettings);
   AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc, aeParms);
             </pre><p>
             </p>

             <p>
             These external settings consist of key-value pairs stored in a
             file using the UTF-8 character encoding, and written in a style similar to that
             of Java properties files.
             </p><div class="itemizedlist"><ul class="itemizedlist" type="circle" compact><li class="listitem" style="list-style-type: circle"><p>
             Leading whitespace is ignored.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Comment lines start with '#' or '!'.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             The key and value are separated by whitespace, '=' or ':'.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Keys must contain at least one character and only letters, digits, or the characters '. / - ~ _'.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             If a line ends with '\' it is extended with the following line (after removing any
             leading whitespace).
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Whitespace is trimmed from both keys and values.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Duplicate assignments to a key are ignored &#8211; once a value is assigned to a key it cannot be changed.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Values may reference other settings using the syntax '${key}'.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             Array values are represented as a list of strings separated by commas or line breaks,
             and bracketed by the '[ ]' characters.  The value must start with an '[' and is
             terminated by the first unescaped ']' which must be at the end of a line.
             The elements of an array (and hence the array size) may be indirectly specified using
             the '${key}' syntax but the brackets '[ ]' must be explicitly specified.
             </p></li><li class="listitem" style="list-style-type: circle"><p>
             In values, the special characters '$ { } [ , ] \' are treated as regular characters if
             preceded by the escape character '\'.
             </p></li></ul></div><p>
       </p><pre class="programlisting">
 key1  :  value1
  key2 =  value  2
   key3   element2, element3, element4
  # Next assignment is ignored as key3 has already been set
 key3  :   value ignored
 key4  =  [ array element1, ${key3}, element5
            element6 ]
 key5     value with a reference ${key1} to key1
 key6  :  long value string \
          continued from previous line (with leading whitespace stripped)
 key7  =  value without a reference \${not-a-key}
 key8     \[ value that is not an array ]
 key9  :  [ array element1\, with embedded comma, element2 ]
 </pre><p>
             </p>

             <p>
             Multiple settings files are allowed; they are loaded in order, such that
             early ones take precedence over later ones, following the first-assignment-wins rule.
             So, if you have lots of settings,
             you can put the defaults in one file and then, in an earlier file, override just the
             ones you need to.
             </p>
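             <p>
             To make the syntax rules and the load order concrete, here is a deliberately
             simplified reader. It is a sketch, not the framework's
             <code class="literal">Settings</code> implementation: the class
             <code class="literal">SettingsSketch</code> is hypothetical, it takes file contents as
             strings, it does not handle array values or '\' escapes inside values, it expands
             '${key}' references one level only, and leaving an unknown reference unchanged is an
             assumption.
             </p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified settings reader; array values and escapes in values are not handled. */
final class SettingsSketch {
  private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");
  private final Map<String, String> values = new LinkedHashMap<>();

  private static boolean isKeyChar(char c) {
    return Character.isLetterOrDigit(c) || "./-~_".indexOf(c) >= 0;
  }

  /** Loads the text of one settings file; keys that already have a value keep it. */
  void load(String text) {
    String[] lines = text.split("\\r?\\n");
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
        continue; // blank line or comment
      }
      // A trailing backslash joins the next line, minus its leading whitespace.
      while (line.endsWith("\\")) {
        String next = (i + 1 < lines.length) ? lines[++i].trim() : "";
        line = line.substring(0, line.length() - 1) + next;
      }
      int end = 0;
      while (end < line.length() && isKeyChar(line.charAt(end))) {
        end++;
      }
      if (end == 0) {
        continue; // a key needs at least one character
      }
      String key = line.substring(0, end);
      String rest = line.substring(end).trim();
      if (rest.startsWith("=") || rest.startsWith(":")) {
        rest = rest.substring(1).trim(); // drop the separator character
      }
      values.putIfAbsent(key, rest); // first assignment wins
    }
  }

  /** Returns the value for a key with ${name} references expanded, or null if unset. */
  String get(String key) {
    String raw = values.get(key);
    if (raw == null) {
      return null;
    }
    Matcher m = REFERENCE.matcher(raw);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String referenced = values.get(m.group(1));
      // Assumption: an unknown reference is left exactly as written.
      String replacement = (referenced != null) ? referenced : m.group();
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    SettingsSketch settings = new SettingsSketch();
    settings.load("model.dir = /opt/models\nmodel.file = ${model.dir}/ner.bin\n"); // overrides
    settings.load("model.dir = /usr/share/models\nlanguage = en\n");               // defaults
    System.out.println(settings.get("model.file"));
    System.out.println(settings.get("language"));
  }
}
```

             <p>
             Because the overriding file is loaded first, its <code class="literal">model.dir</code>
             entry is kept and the later default for the same key is ignored.
             </p>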

             <p>
             An external override name may be specified for a parameter declared in a group, but if
             the parameter is in the common group or the group is declared with multiple names, the
             external name is shared amongst all, i.e. these parameters cannot be given group-specific values.
             </p>
           </div>

           <div class="section" title="2.4.3.5.&nbsp;Direct Access to External Configuration Parameters"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_access">2.4.3.5.&nbsp;Direct Access to External Configuration Parameters</h4></div></div></div>


             <p>
             Annotators and flow controllers can directly access these shared configuration
             parameters from their UimaContext.
             Direct access means an access where the key to select the shared parameter is the
             parameter name as specified in the external configuration settings file.
 			</p><pre class="programlisting">
 String value = aContext.getSharedSettingValue(paramName);
 String values[] = aContext.getSharedSettingArray(arrayParamName);
 String allNames[] = aContext.getSharedSettingNames();
 			</pre><p>
             Java code called by an annotator or flow controller in the same thread or a child thread
             can use the <code class="literal">UimaContextHolder</code> to get the annotator's UimaContext and
             hence access the shared configuration parameters.
 			</p><pre class="programlisting">
 UimaContext uimaContext = UimaContextHolder.getUimaContext();
 if (uimaContext != null) {
   value = uimaContext.getSharedSettingValue(paramName);
 }
 			</pre><p>
 			The UIMA framework puts the context in an InheritableThreadLocal variable.  The value
 			will be null if <code class="literal">getUimaContext</code> is not invoked by an annotator or flow
 			controller on the same thread or a child thread.
             </p>
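             <p>
             The thread behaviour described above follows from how
             <code class="literal">InheritableThreadLocal</code> works in Java. The sketch below
             shows that mechanism in isolation. It is not the framework's
             <code class="literal">UimaContextHolder</code>: the class
             <code class="literal">ContextHolderSketch</code> is hypothetical and a plain string
             stands in for the UimaContext.
             </p>

```java
/** Stand-in for a context holder; a String plays the role of the UimaContext. */
final class ContextHolderSketch {
  private static final InheritableThreadLocal<String> CONTEXT =
      new InheritableThreadLocal<>();

  static void setContext(String context) {
    CONTEXT.set(context);
  }

  static void clearContext() {
    CONTEXT.remove();
  }

  static String getContext() {
    return CONTEXT.get();
  }

  /** Starts a child thread, reads the context there, and returns what it saw. */
  static String readFromChildThread() {
    final String[] seen = new String[1];
    Thread child = new Thread(() -> seen[0] = getContext());
    child.start();
    try {
      child.join(); // join makes the child's write to seen[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for child thread", e);
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println("before set: " + readFromChildThread());
    setContext("annotator-context");
    System.out.println("after set:  " + readFromChildThread());
    clearContext();
  }
}
```

             <p>
             A child thread copies the value held by its parent at the moment the child is
             created, which is why code running outside an annotator's thread family sees null.
             </p>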
           </div>

           <div class="section" title="2.4.3.6.&nbsp;Other Uses for External Configuration Parameters"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.component_descriptor.aes.other_uses_for_external_configuration_parameters">2.4.3.6.&nbsp;Other Uses for External Configuration Parameters</h4></div></div></div>

 			<p>
             Explicit references to shared configuration parameters can be specified as part of the
             value of the name and location attributes of the <code class="literal">import</code> element
 			and in the value of the fileUrl for a <code class="literal">fileResourceSpecifier</code>
 			(see <a class="xref" href="#ugr.ref.xml.component_descriptor.imports" title="2.2.&nbsp;Imports">Section&nbsp;2.2, &#8220;Imports&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>).
             </p>
 		  </div>

         </div>
       </div>


   <div class="section" title="2.5.&nbsp;Flow Controller Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.flow_controller">2.5.&nbsp;Flow Controller Descriptors</h2></div></div></div>


     <p>The basic structure of a Flow Controller Descriptor is as follows:


       </p><pre class="programlisting">&lt;?xml version="1.0" ?&gt;
 &lt;flowControllerDescription
     xmlns="http://uima.apache.org/resourceSpecifier"&gt;

   &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;

   &lt;implementationName&gt;[ClassName]&lt;/implementationName&gt;

   &lt;processingResourceMetaData&gt;
     ...
   &lt;/processingResourceMetaData&gt;

   &lt;externalResourceDependencies&gt;
     ...
   &lt;/externalResourceDependencies&gt;

   &lt;resourceManagerConfiguration&gt;
     ...
   &lt;/resourceManagerConfiguration&gt;

 &lt;/flowControllerDescription&gt;</pre>

     <p>The <code class="literal">frameworkImplementation</code> element must always be set to
       the value <code class="literal">org.apache.uima.java</code>.</p>

     <p>The <code class="literal">implementationName</code> element must contain the
       fully-qualified class name of the Flow Controller implementation. This must name a
       class that implements the <code class="literal">FlowController</code> interface.</p>

     <p>The <code class="literal">processingResourceMetaData</code> element contains
       essentially the same information as a Primitive Analysis Engine Descriptor's
       <code class="literal">analysisEngineMetaData</code> element, described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.metadata" title="2.4.1.2.&nbsp;Analysis Engine MetaData">Section&nbsp;2.4.1.2, &#8220;Analysis Engine MetaData&#8221;</a>.</p>

     <p>The <code class="literal">externalResourceDependencies</code> and
       <code class="literal">resourceManagerConfiguration</code> elements are exactly the same as
       in Primitive Analysis Engine Descriptors (see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies" title="2.4.1.8.&nbsp;External Resource Dependencies">Section&nbsp;2.4.1.8, &#8220;External Resource Dependencies&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>).</p>

   </div>

   <div class="section" title="2.6.&nbsp;Collection Processing Component Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.collection_processing_parts">2.6.&nbsp;Collection Processing Component Descriptors</h2></div></div></div>


     <p>There are three types of Collection Processing Components &#8211; Collection
       Readers, CAS Initializers (deprecated as of UIMA Version 2), and CAS Consumers. Each
       type of component has a corresponding descriptor. The structure of these descriptors
       is very similar to that of primitive Analysis Engine Descriptors.</p>

     <div class="section" title="2.6.1.&nbsp;Collection Reader Descriptors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader">2.6.1.&nbsp;Collection Reader Descriptors</h3></div></div></div>


       <p>The basic structure of a Collection Reader descriptor is as follows:


         </p><pre class="programlisting">&lt;?xml version="1.0" ?&gt;
 &lt;collectionReaderDescription
     xmlns="http://uima.apache.org/resourceSpecifier"&gt;

   &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;
   &lt;implementationName&gt;[ClassName]&lt;/implementationName&gt;

   &lt;processingResourceMetaData&gt;
     ...
   &lt;/processingResourceMetaData&gt;

   &lt;externalResourceDependencies&gt;
    ...
   &lt;/externalResourceDependencies&gt;

   &lt;resourceManagerConfiguration&gt;

    ...

   &lt;/resourceManagerConfiguration&gt;

 &lt;/collectionReaderDescription&gt;</pre>

       <p>The <code class="literal">frameworkImplementation</code> element must always be set
         to the value <code class="literal">org.apache.uima.java</code>.</p>

       <p>The <code class="literal">implementationName</code> element contains the
         fully-qualified class name of the Collection Reader implementation. This must name
         a class that implements the <code class="literal">CollectionReader</code>
         interface.</p>

       <p>The <code class="literal">processingResourceMetaData</code> element contains
         essentially the same information as a Primitive Analysis Engine
         Descriptor's <code class="literal">analysisEngineMetaData</code> element:


         </p><pre class="programlisting">&lt;processingResourceMetaData&gt;

   &lt;name&gt; [String] &lt;/name&gt;
   &lt;description&gt;[String]&lt;/description&gt;
   &lt;version&gt;[String]&lt;/version&gt;
   &lt;vendor&gt;[String]&lt;/vendor&gt;

   &lt;configurationParameters&gt;
      ...
   &lt;/configurationParameters&gt;

   &lt;configurationParameterSettings&gt;
     ...
   &lt;/configurationParameterSettings&gt;

   &lt;typeSystemDescription&gt;
    ...
   &lt;/typeSystemDescription&gt;

   &lt;typePriorities&gt;
    ...
   &lt;/typePriorities&gt;

   &lt;fsIndexes&gt;
    ...
   &lt;/fsIndexes&gt;

   &lt;capabilities&gt;
    ...
   &lt;/capabilities&gt;

 &lt;/processingResourceMetaData&gt;</pre>

       <p>The contents of these elements are the same as those described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.metadata" title="2.4.1.2.&nbsp;Analysis Engine MetaData">Section&nbsp;2.4.1.2, &#8220;Analysis Engine MetaData&#8221;</a>, with the exception that the capabilities
         section should not declare any inputs (because the Collection Reader is always the
         first component to receive the CAS).</p>

       <p>The <code class="literal">externalResourceDependencies</code> and
         <code class="literal">resourceManagerConfiguration</code> elements are exactly the same
         as in the Primitive Analysis Engine Descriptors (see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies" title="2.4.1.8.&nbsp;External Resource Dependencies">Section&nbsp;2.4.1.8, &#8220;External Resource Dependencies&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>).</p>

     </div>
     <div class="section" title="2.6.2.&nbsp;CAS Initializer Descriptors (deprecated)"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.collection_processing_parts.cas_initializer">2.6.2.&nbsp;CAS Initializer Descriptors (deprecated)</h3></div></div></div>


       <p>The basic structure of a CAS Initializer Descriptor is as follows:


         </p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;casInitializerDescription
     xmlns="http://uima.apache.org/resourceSpecifier"&gt;

   &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;
   &lt;implementationName&gt;[ClassName] &lt;/implementationName&gt;

   &lt;processingResourceMetaData&gt;
     ...
   &lt;/processingResourceMetaData&gt;

   &lt;externalResourceDependencies&gt;
     ...
   &lt;/externalResourceDependencies&gt;

   &lt;resourceManagerConfiguration&gt;
     ...
   &lt;/resourceManagerConfiguration&gt;

 &lt;/casInitializerDescription&gt;</pre>

       <p>The <code class="literal">frameworkImplementation</code> element must always be set
         to the value <code class="literal">org.apache.uima.java</code>.</p>

       <p>The <code class="literal">implementationName</code> element contains the
         fully-qualified class name of the CAS Initializer implementation. This must name a
         class that implements the <code class="literal">CasInitializer</code> interface.</p>

       <p>The <code class="literal">processingResourceMetaData</code> element contains
         essentially the same information as a Primitive Analysis Engine
         Descriptor's <code class="literal">analysisEngineMetaData</code> element,
         as described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.metadata" title="2.4.1.2.&nbsp;Analysis Engine MetaData">Section&nbsp;2.4.1.2, &#8220;Analysis Engine MetaData&#8221;</a>, with the exception of some
         changes to the capabilities section. A CAS Initializer's capabilities
         element looks like this:


         </p><pre class="programlisting">&lt;capabilities&gt;
   &lt;capability&gt;
     &lt;outputs&gt;
       &lt;type allAnnotatorFeatures="true|false"&gt;[String]&lt;/type&gt;
       &lt;type&gt;[TypeName]&lt;/type&gt;
       ...
       &lt;feature&gt;[TypeName]:[Name]&lt;/feature&gt;
       ...
     &lt;/outputs&gt;

     &lt;outputSofas&gt;
       &lt;sofaName&gt;[name]&lt;/sofaName&gt;
       ...
     &lt;/outputSofas&gt;

     &lt;mimeTypesSupported&gt;
       &lt;mimeType&gt;[MIME Type]&lt;/mimeType&gt;
       ...
     &lt;/mimeTypesSupported&gt;
   &lt;/capability&gt;

   &lt;capability&gt;
     ...
   &lt;/capability&gt;
   ...
 &lt;/capabilities&gt;</pre>

       <p>The differences between a CAS Initializer's capabilities declaration
         and an Analysis Engine's capabilities declaration are that the CAS Initializer does not
         declare any input CAS types and features or input Sofas (because it is always the first
         to operate on a CAS), it doesn't have a language specifier, and that the CAS
         Initializer may declare a set of MIME types that it supports for its input documents.
         Examples include: text/plain, text/html, and application/pdf. For a list of MIME
         types see <a class="ulink" href="http://www.iana.org/assignments/media-types/" target="_top">http://www.iana.org/assignments/media-types/</a>. This
         information is currently provided for users only; the framework does not
         use it for anything. This may change in future versions.</p>

       <p>The <code class="literal">externalResourceDependencies</code> and
         <code class="literal">resourceManagerConfiguration</code> elements are exactly the same
         as in the Primitive Analysis Engine Descriptors (see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies" title="2.4.1.8.&nbsp;External Resource Dependencies">Section&nbsp;2.4.1.8, &#8220;External Resource Dependencies&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>).</p>

     </div>
     <div class="section" title="2.6.3.&nbsp;CAS Consumer Descriptors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.component_descriptor.collection_processing_parts.cas_consumer">2.6.3.&nbsp;CAS Consumer Descriptors</h3></div></div></div>


       <p>The basic structure of a CAS Consumer Descriptor is as follows:


         </p><pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;casConsumerDescription
     xmlns="http://uima.apache.org/resourceSpecifier"&gt;

   &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;

   &lt;implementationName&gt;[ClassName]&lt;/implementationName&gt;

   &lt;processingResourceMetaData&gt;
     ...
   &lt;/processingResourceMetaData&gt;

   &lt;externalResourceDependencies&gt;
     ...
   &lt;/externalResourceDependencies&gt;

   &lt;resourceManagerConfiguration&gt;
     ...
   &lt;/resourceManagerConfiguration&gt;
 &lt;/casConsumerDescription&gt;</pre>

         <p>The <code class="literal">frameworkImplementation</code> element currently must
           have the value <code class="literal">org.apache.uima.java</code>, or
            <code class="literal">org.apache.uima.cpp</code>.</p>

         <p>The next subelement, <code class="literal">&lt;annotatorImplementationName&gt;</code>, is how the UIMA framework
           determines which annotator class to use. This should contain a fully-qualified
           Java class name for Java implementations, or the name of a .dll or .so file for C++
           implementations.</p>
       <p>The <code class="literal">frameworkImplementation</code> element must always be set
         to the value <code class="literal">org.apache.uima.java</code>.</p>

       <p>The <code class="literal">implementationName</code> element must contain the
         fully-qualified class name of the CAS Consumer implementation, or the name
         of a .dll or .so file for C++ implementations.  For Java, the named class must
         implement the <code class="literal">CasConsumer</code> interface.</p>

       <p>The <code class="literal">processingResourceMetaData</code> element contains
         essentially the same information as a Primitive Analysis Engine Descriptor's
         <code class="literal">analysisEngineMetaData</code> element, described in <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.metadata" title="2.4.1.2.&nbsp;Analysis Engine MetaData">Section&nbsp;2.4.1.2, &#8220;Analysis Engine MetaData&#8221;</a>, except that the CAS Consumer Descriptor's
         <code class="literal">capabilities</code> element should not declare outputs or
         outputSofas (since CAS Consumers do not modify the CAS).</p>

       <p>The <code class="literal">externalResourceDependencies</code> and
         <code class="literal">resourceManagerConfiguration</code> elements are exactly the same
         as in Primitive Analysis Engine Descriptors (see <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies" title="2.4.1.8.&nbsp;External Resource Dependencies">Section&nbsp;2.4.1.8, &#8220;External Resource Dependencies&#8221;</a> and <a class="xref" href="#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" title="2.4.1.9.&nbsp;Resource Manager Configuration">Section&nbsp;2.4.1.9, &#8220;Resource Manager Configuration&#8221;</a>).</p>

     </div>
   </div>

   <div class="section" title="2.7.&nbsp;Service Client Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.service_client">2.7.&nbsp;Service Client Descriptors</h2></div></div></div>


     <p>Service Client Descriptors specify only a location of a remote service. They are
       therefore much simpler in structure. In the UIMA SDK, a Service Client Descriptor that
       refers to a valid Analysis Engine or CAS Consumer service can be used in place of the
       actual Analysis Engine or CAS Consumer Descriptor. The UIMA SDK will handle the details
       of calling the remote service. (For details on <span class="emphasis"><em>deploying</em></span> an
       Analysis Engine or CAS Consumer as a service, see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a>.)</p>

     <p>The UIMA SDK is extensible to support different types of remote services. In future
       versions, there may be different variations of service client descriptors that cater
       to different types of services. For now, the only type of service client descriptor is
       the <code class="literal">uriSpecifier</code>, which supports the SOAP and Vinci
       protocols.</p>


     <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier"&gt;
   &lt;resourceType&gt;AnalysisEngine | CasConsumer &lt;/resourceType&gt;
   &lt;uri&gt;[URI]&lt;/uri&gt;
   &lt;protocol&gt;SOAP | SOAPwithAttachments | Vinci&lt;/protocol&gt;
   &lt;timeout&gt;[Integer]&lt;/timeout&gt;
   &lt;parameters&gt;
     &lt;parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/&gt;
     &lt;parameter name="VNS_PORT" value="9000"/&gt;
     &lt;parameter name="GetMetaDataTimeout" value="[Integer]"/&gt;
   &lt;/parameters&gt;
 &lt;/uriSpecifier&gt;</pre>

     <p>The <code class="literal">resourceType</code> element is required for new descriptors,
       but is currently allowed to be omitted for backward compatibility. It specifies the
       type of component (Analysis Engine or CAS Consumer) that is implemented by the service
       endpoint described by this descriptor.</p>

     <p>The <code class="literal">uri</code> element contains the URI for the web service. (Note
       that in the case of Vinci, this will be the service name, which is looked up in the Vinci
       Naming Service.)</p>

     <p>The <code class="literal">protocol</code> element may be set to SOAP,
       SOAPwithAttachments, or Vinci; other protocols may be added later. These specify the
       particular data transport format that will be used.</p>

     <p>The <code class="literal">timeout</code> element is optional. If present, it specifies
       the number of milliseconds to wait for a request to be processed before an exception is
       thrown. A value of zero or less will wait forever. If no timeout is specified, a default
       value (currently 60 seconds) will be used.</p>
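    <p>The three cases for the <code class="literal">timeout</code> element can be summarized
      in a few lines. This is only a sketch of the rule as stated above: the class
      <code class="literal">TimeoutSketch</code> and the use of
      <code class="literal">Long.MAX_VALUE</code> as a "wait forever" marker are assumptions
      made for illustration, not framework API.</p>

```java
/** Encodes the timeout rule from the text; not the UIMA implementation. */
final class TimeoutSketch {
  static final long DEFAULT_TIMEOUT_MS = 60_000L;  // 60 seconds, per the text
  static final long WAIT_FOREVER = Long.MAX_VALUE; // hypothetical marker value

  /**
   * Returns the effective timeout in milliseconds.
   * A null argument means the timeout element is absent from the descriptor.
   */
  static long effectiveTimeoutMillis(Integer configured) {
    if (configured == null) {
      return DEFAULT_TIMEOUT_MS; // no timeout element: use the default
    }
    if (configured.intValue() <= 0) {
      return WAIT_FOREVER;       // zero or less: wait indefinitely
    }
    return configured.longValue();
  }

  public static void main(String[] args) {
    System.out.println(effectiveTimeoutMillis(null));
    System.out.println(effectiveTimeoutMillis(1500));
    System.out.println(effectiveTimeoutMillis(0) == WAIT_FOREVER);
  }
}
```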

     <p>The parameters element is optional. If present, it can specify values for each
       of the following:
     </p>
     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">VNS_HOST</code>: host name for the Vinci naming service.
       </p></li><li class="listitem"><p><code class="literal">VNS_PORT</code>: port number for the Vinci naming service.
       </p></li><li class="listitem"><p><code class="literal">GetMetaDataTimeout</code>: timeout period (in milliseconds) for
           the GetMetaData call.  If not specified, the default is 60 seconds.  This may need
           to be set higher if there are a lot of clients competing for connections to the service.
       </p></li></ul></div>

    <p>If <code class="literal">VNS_HOST</code> and <code class="literal">VNS_PORT</code> are not specified
      in the descriptor, their values are taken from the Java system properties
      set on the command line with
      <code class="literal">-DVNS_HOST=&lt;host&gt;</code> and/or
      <code class="literal">-DVNS_PORT=&lt;port&gt;</code>. If neither the descriptor
      nor a system property supplies a value, the defaults are
      <code class="literal">localhost</code> for <code class="literal">VNS_HOST</code> and
      <code class="literal">9000</code> for <code class="literal">VNS_PORT</code>.</p>
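    <p>The lookup order for the Vinci naming service settings (descriptor parameter, then
      Java system property, then built-in default) can be sketched as follows. The class
      <code class="literal">VnsSettingsSketch</code> and its methods are hypothetical, and the
      descriptor's <code class="literal">parameters</code> element is modelled as a plain map;
      this is not how the framework itself is structured.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrates the lookup order for VNS_HOST and VNS_PORT; not the UIMA code. */
final class VnsSettingsSketch {

  /** Descriptor parameter first, then system property, then the default. */
  static String resolve(String name, Map<String, String> descriptorParams,
                        String defaultValue) {
    String fromDescriptor = descriptorParams.get(name);
    if (fromDescriptor != null) {
      return fromDescriptor;
    }
    String fromSystem = System.getProperty(name); // set with -Dname=value
    if (fromSystem != null) {
      return fromSystem;
    }
    return defaultValue;
  }

  static String vnsHost(Map<String, String> descriptorParams) {
    return resolve("VNS_HOST", descriptorParams, "localhost");
  }

  static int vnsPort(Map<String, String> descriptorParams) {
    return Integer.parseInt(resolve("VNS_PORT", descriptorParams, "9000"));
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    System.out.println(vnsHost(params) + ":" + vnsPort(params));
    params.put("VNS_HOST", "vns.example.org"); // hypothetical host name
    System.out.println(vnsHost(params) + ":" + vnsPort(params));
  }
}
```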

     <p>For details on how to deploy and call Analysis Engine and CAS Consumer services, see
         <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a>.</p>

   </div>

   <div class="section" title="2.8.&nbsp;Custom Resource Specifiers"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.component_descriptor.custom_resource_specifiers">2.8.&nbsp;Custom Resource Specifiers</h2></div></div></div>

 	<p>A Custom Resource Specifier allows you to plug in your own Java class as a UIMA Resource.
 		For example, you can support a new service protocol by plugging in a Java class that implements
 		the UIMA <code class="literal">AnalysisEngine</code> interface and communicates with the remote service.</p>

 	<p>A Custom Resource Specifier has the following format:</p>
     <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier"&gt;
   &lt;resourceClassName&gt;[Java Class Name]&lt;/resourceClassName&gt;
   &lt;parameters&gt;
     &lt;parameter name="[String]" value="[String]"/&gt;
     &lt;parameter name="[String]" value="[String]"/&gt;
   &lt;/parameters&gt;
 &lt;/customResourceSpecifier&gt;</pre>

 	<p>The <code class="literal">resourceClassName</code> element must contain the fully-qualified name of a Java class
 	that can be found in the classpath (including the UIMA extension classpath, if you have specified one using
 	the <code class="literal">ResourceManager.setExtensionClassPath</code> method).  This class must implement the
 	UIMA <code class="literal">Resource</code> interface.</p>

 	<p>When an application calls the <code class="literal">UIMAFramework.produceResource</code> method and passes a
 	<code class="literal">CustomResourceSpecifier</code>, the UIMA framework will load the named class and call its
 	<code class="literal">initialize(ResourceSpecifier,Map)</code> method, passing the <code class="literal">CustomResourceSpecifier</code>
 	as the first argument.  Your class can override the <code class="literal">initialize</code> method and use the
 	<code class="literal">CustomResourceSpecifier</code> API to get access to the <code class="literal">parameter</code> names and values
 	specified in the XML.</p>
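
 	<p>As an illustration only, a filled-in specifier might look like the sketch below. The class
 	name <code class="literal">org.example.MyProtocolAnalysisEngine</code> and the parameter names
 	<code class="literal">endpoint</code> and <code class="literal">timeoutMillis</code> are hypothetical;
 	they are not part of UIMA, and their meaning is defined entirely by your own class.</p>
     <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier"&gt;
   &lt;!-- Hypothetical class: must be on the classpath and implement Resource --&gt;
   &lt;resourceClassName&gt;org.example.MyProtocolAnalysisEngine&lt;/resourceClassName&gt;
   &lt;parameters&gt;
     &lt;!-- Hypothetical parameters: read by the class in its initialize method --&gt;
     &lt;parameter name="endpoint" value="http://localhost:8080/analyze"/&gt;
     &lt;parameter name="timeoutMillis" value="30000"/&gt;
   &lt;/parameters&gt;
 &lt;/customResourceSpecifier&gt;</pre>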

 	<p>If you are using a custom resource specifier to plug in a class that implements a new service protocol,
 	your class must also implement the <code class="literal">AnalysisEngine</code> interface.  Generally it should also
 	extend <code class="literal">AnalysisEngineImplBase</code>.  The key methods that should be implemented are
 	<code class="literal">getMetaData</code>, <code class="literal">processAndOutputNewCASes</code>,
 	<code class="literal">collectionProcessComplete</code>, and <code class="literal">destroy</code>.</p>
   </div>
 <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e71" href="#d5e71" class="para">1</a>] </sup>This component is deprecated and should not be used in new
     development.</p></div><div class="footnote"><p><sup>[<a id="ftn.d5e690" href="#d5e690" class="para">2</a>] </sup>Deprecated; use
             UimaContext instead.</p></div></div></div>
   <div class="chapter" title="Chapter&nbsp;3.&nbsp;Collection Processing Engine Descriptor Reference" id="ugr.ref.xml.cpe_descriptor"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;3.&nbsp;Collection Processing Engine Descriptor Reference</h2></div></div></div>


   <p>A UIMA <span class="emphasis"><em>Collection Processing Engine</em></span> (CPE) is a combination
     of UIMA components assembled to analyze a collection of artifacts. A CPE is an
     instantiation of the UIMA <span class="emphasis"><em>Collection Processing Architecture</em></span>,
     which defines the collection processing components, interfaces, and APIs. A CPE is
     executed by a UIMA framework component called the <span class="emphasis"><em>Collection Processing
     Manager</em></span> (CPM), which provides a number of services for deploying CPEs,
     running CPEs, and handling errors.</p>

   <p>A CPE can be assembled programmatically within a Java application, or it can be
     assembled declaratively via a CPE configuration specification, called a CPE
     Descriptor. This chapter describes the format of the CPE Descriptor.</p>

   <p>Details about the CPE, including its function, sub-components, APIs, and related
     tools, can be found in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Developer's Guide</i></a>. Here we briefly summarize the CPE to define terms and
     provide context for the later sections that describe the CPE Descriptor.</p>

   <div class="section" title="3.1.&nbsp;CPE Overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.overview">3.1.&nbsp;CPE Overview</h2></div></div></div>


     <div class="figure"><a name="ugr.ref.xml.cpe_descriptor.overview.fig.runtime"></a><div class="figure-contents">

       <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/references/ref.xml.cpe_descriptor/image002.png" width="574" alt="CPE Runtime Overview diagram"></td></tr></table></div>
     </div><p class="title"><b>Figure&nbsp;3.1.&nbsp;CPE Runtime Overview</b></p></div><br class="figure-break">

     <p>An illustration of the CPE runtime is shown in <a class="xref" href="#ugr.ref.xml.cpe_descriptor.overview.fig.runtime" title="Figure&nbsp;3.1.&nbsp;CPE Runtime Overview">Figure&nbsp;3.1, &#8220;CPE Runtime Overview&#8221;</a>. Some of the CPE components, such as the
       <span class="emphasis"><em>queues</em></span> and <span class="emphasis"><em>processing pipelines</em></span>, are
       internal to the CPE, but their behavior and deployment may be configured using the CPE
       Descriptor. Other CPE components, such as the <span class="emphasis"><em>Collection
       Reader</em></span> and <span class="emphasis"><em>CAS Processors</em></span>, are defined and
       configured externally from the CPE and then plugged in to the CPE to create the overall
       engine. The parts of a CPE are:

       </p><div class="variablelist"><dl><dt><span class="term">Collection Reader</span></dt><dd><p>understands the native data collection format and iterates
             over the collection producing subjects of analysis</p></dd><dt><span class="term">CAS Initializer<sup>[<a name="d5e1067" href="#ftn.d5e1067" class="footnote">3</a>]</sup>
             </span></dt><dd><p>initializes a CAS with a subject of analysis</p>
             </dd><dt><span class="term">Artifact Producer</span></dt><dd><p>asynchronously pulls CASes from the Collection Reader,
             creates batches of CASes and puts them into the work queue</p></dd><dt><span class="term">Work Queue</span></dt><dd><p>shared queue containing batches of CASes queued by the Artifact
             Producer for analysis by Analysis Engines</p>
           </dd><dt><span class="term">B1-Bn</span></dt><dd><p>individual batches containing 1 or more CASes</p>
             </dd><dt><span class="term">AE1-AEn</span></dt><dd><p>Analysis Engines arranged by a CPE descriptor</p>
             </dd><dt><span class="term">Processing Pipelines</span></dt><dd><p>each pipeline runs in a separate thread and contains a
             replicated set of the Analysis Engines running in the defined sequence</p>
             </dd><dt><span class="term">Output Queue</span></dt><dd><p>holds batches of CASes with analysis results intended for CAS
             Consumers</p></dd><dt><span class="term">CAS Consumers</span></dt><dd><p>perform collection level analysis over the CASes and extract
             analysis results, e.g., creating indexes or databases</p></dd></dl></div><p>
       </p>
   </div>

   <div class="section" title="3.2.&nbsp;Notation"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.notation">3.2.&nbsp;Notation</h2></div></div></div>


     <p>CPE Descriptors are XML files. This chapter uses an informal notation to specify
       the syntax of CPE Descriptors.</p>

     <p>The notation used in this chapter is:

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>An ellipsis (...) inside an element body indicates
         that the substructure of that element has been omitted (to be described in another
         section of this chapter). An example of this would be:


         </p><pre class="programlisting">&lt;collectionReader&gt;
 ...
 &lt;/collectionReader&gt;</pre>
         </li><li class="listitem"><p>An ellipsis immediately after an element indicates that the
           element type may be repeated arbitrarily many times. For example:


           </p><pre class="programlisting">&lt;parameter&gt;[String]&lt;/parameter&gt;
 &lt;parameter&gt;[String]&lt;/parameter&gt;
 ...</pre><p>
           indicates that there may be arbitrarily many parameter elements in this
           context.</p></li><li class="listitem"><p>An ellipsis inside an element's start tag means that the attributes
           associated with that element are defined later, e.g.:

           </p><pre class="programlisting">&lt;casProcessor ...&gt;</pre>
           </li><li class="listitem"><p>Bracketed expressions (e.g. <code class="literal">[String]</code>)
           indicate the type of value that may be used at that location.</p></li><li class="listitem"><p>A vertical bar, as in <code class="literal">true|false</code>, indicates
           alternatives. This can be applied to literal values, bracketed type names, and
           elements. </p></li></ul></div>

     <p>Which elements are optional and which are required is specified in prose, not in the
       syntax definition.</p>

   </div>

   <div class="section" title="3.3.&nbsp;Imports"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.imports">3.3.&nbsp;Imports</h2></div></div></div>


     <p>As of version 2.2, a CPE Descriptor can use the same <code class="literal">import</code> mechanism
       as other component descriptors.  This allows referring to component
       descriptors using either relative paths (resolved relative to the location of the CPE descriptor)
       or the classpath/datapath.  For details see <a href="references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, <i>Component Descriptor Reference</i></a>.</p>
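
     <p>As a brief sketch, the two forms of <code class="literal">import</code> could be written as
       follows inside a <code class="literal">&lt;descriptor&gt;</code> element. The file path and the
       dotted name are hypothetical placeholders. The first form is resolved relative to the CPE
       descriptor; the second is looked up by name on the classpath/datapath.</p>
     <pre class="programlisting">&lt;!-- By location: path relative to the CPE descriptor (hypothetical file) --&gt;
 &lt;descriptor&gt;
     &lt;import location="descriptors/MyCollectionReader.xml"/&gt;
 &lt;/descriptor&gt;

 &lt;!-- By name: resolved through the classpath/datapath (hypothetical name) --&gt;
 &lt;descriptor&gt;
     &lt;import name="org.example.descriptors.MyCollectionReader"/&gt;
 &lt;/descriptor&gt;</pre>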

     <p>The following older syntax is still supported, but <span class="emphasis"><em>not recommended</em></span>:

       </p><pre class="programlisting">&lt;descriptor&gt;
     &lt;include href="[URL or File]"/&gt;
 &lt;/descriptor&gt;</pre>

     <p>The <code class="literal">[URL or File]</code> value of the <code class="literal">href</code> attribute
       is a URL or a filename for the descriptor of the
       incorporated component. The framework first attempts to resolve the value as a URL.</p>

     <p>
       Relative paths in an <code class="literal">include</code> are resolved relative to the current working directory
       (NOT the CPE descriptor location as is the case for <code class="literal">import</code>).
       A filename relative to another directory can be specified using the <code class="literal">CPM_HOME</code>
       variable, e.g.,
     </p><pre class="programlisting">&lt;descriptor&gt;
     &lt;include href="${CPM_HOME}/desc_dir/descriptor.xml"/&gt;
 &lt;/descriptor&gt;</pre><p>

       In this case, the value for the <code class="literal">CPM_HOME</code> variable must be
       provided to the CPE by specifying it on the Java command line, e.g.,

     </p><pre class="programlisting">java -DCPM_HOME="C:/Program Files/apache/uima/cpm" ...</pre><p>

   </p>

   </div>

   <div class="section" title="3.4.&nbsp;CPE Descriptor Overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor">3.4.&nbsp;CPE Descriptor Overview</h2></div></div></div>


     <p>A CPE Descriptor consists of information describing the following four main
       elements.</p>

     <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The <span class="emphasis"><em>Collection Reader</em></span>, which
       is responsible for gathering artifacts and initializing the Common Analysis
       Structure (CAS) used to support processing in the UIMA collection processing
       engine.</p></li><li class="listitem"><p>The <span class="emphasis"><em>CAS Processors</em></span>, responsible for
         analyzing individual artifacts, analyzing across artifacts, and extracting
         analysis results. CAS Processors include <span class="emphasis"><em>Analysis Engines</em></span>
         and <span class="emphasis"><em>CAS Consumers</em></span>.</p></li><li class="listitem"><p>Operational parameters of the <span class="emphasis"><em>Collection Processing
         Manager</em></span> (CPM), such as checkpoint frequency and deployment
         mode.</p></li><li class="listitem"><p>Resource Manager Configuration (optional). </p></li></ol></div>

     <p>The CPE Descriptor has the following high level skeleton:


       </p><pre class="programlisting">&lt;?xml version="1.0"?&gt;
 &lt;cpeDescription&gt;
    &lt;collectionReader&gt;
 ...
    &lt;/collectionReader&gt;
    &lt;casProcessors&gt;
 ...
    &lt;/casProcessors&gt;
    &lt;cpeConfig&gt;
 ...
    &lt;/cpeConfig&gt;
    &lt;resourceManagerConfiguration&gt;
 ...
    &lt;/resourceManagerConfiguration&gt;
 &lt;/cpeDescription&gt;</pre>

     <p>Details of each of the four main elements are described in the sections that
       follow.</p>
  </div>
     <div class="section" title="3.5.&nbsp;Collection Reader"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor.collection_reader">3.5.&nbsp;Collection Reader</h2></div></div></div>


       <p>The <code class="literal">&lt;collectionReader&gt;</code> section identifies the
         Collection Reader and optional CAS Initializer that are to be used in the CPE. The
         Collection Reader is responsible for retrieval of artifacts from a collection
         outside of the CPE, and the optional CAS Initializer (deprecated as of UIMA Version 2)
         is responsible for initializing the CAS with the artifact.</p>

       <p>A Collection Reader may initialize the CAS itself, in which case it does not
         require a CAS Initializer. This should be clearly specified in the documentation for
         the Collection Reader. Specifying a CAS Initializer for a Collection Reader that
         does not make use of a CAS Initializer will not cause an error, but the specified CAS
         Initializer will not be used.</p>

       <p>The complete structure of the <code class="literal">&lt;collectionReader&gt;</code>
         section is:


         </p><pre class="programlisting">&lt;collectionReader&gt;
   &lt;collectionIterator&gt;
     &lt;descriptor&gt;
       &lt;import ...&gt; | &lt;include .../&gt;
     &lt;/descriptor&gt;
     &lt;configurationParameterSettings&gt;...&lt;/configurationParameterSettings&gt;
     &lt;sofaNameMappings&gt;...&lt;/sofaNameMappings&gt;
   &lt;/collectionIterator&gt;
   &lt;casInitializer&gt;
     &lt;descriptor&gt;
       &lt;import ...&gt; | &lt;include .../&gt;
     &lt;/descriptor&gt;
     &lt;configurationParameterSettings&gt;...&lt;/configurationParameterSettings&gt;
     &lt;sofaNameMappings&gt;...&lt;/sofaNameMappings&gt;
   &lt;/casInitializer&gt;
 &lt;/collectionReader&gt;</pre>

       <p>The <code class="literal">&lt;collectionIterator&gt;</code> identifies the
         descriptor for the Collection Reader, and the <code class="literal">&lt;casInitializer&gt;
         </code>identifies the descriptor for the CAS Initializer. The format and
         details of the Collection Reader and CAS Initializer descriptors are described in
           <a href="references.html#ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader" class="olink">Section&nbsp;2.6.1, &#8220;Collection Reader Descriptors&#8221;</a>
         . The <code class="literal">&lt;configurationParameterSettings&gt; </code>and the
         <code class="literal">&lt;sofaNameMappings&gt;</code> elements are described in the next
         section.</p>
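
       <p>For illustration, a minimal <code class="literal">&lt;collectionReader&gt;</code> section
         for a Collection Reader that initializes the CAS itself, and therefore needs no CAS
         Initializer, might look like the sketch below. The descriptor path is a hypothetical
         placeholder.</p>
       <pre class="programlisting">&lt;collectionReader&gt;
   &lt;collectionIterator&gt;
     &lt;descriptor&gt;
       &lt;!-- Hypothetical path, resolved relative to the CPE descriptor --&gt;
       &lt;import location="descriptors/MyCollectionReader.xml"/&gt;
     &lt;/descriptor&gt;
   &lt;/collectionIterator&gt;
   &lt;!-- No casInitializer: this reader is assumed to populate the CAS itself --&gt;
 &lt;/collectionReader&gt;</pre>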

       <div class="section" title="3.5.1.&nbsp;Error handling for Collection Readers"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.collection_reader.error_handling">3.5.1.&nbsp;Error handling for Collection Readers</h3></div></div></div>


         <p>The CPM will abort if the Collection Reader throws a large number of
           consecutive exceptions (default = 100). This default can be changed by using the
           Java initialization parameter <code class="literal">-DMaxCRErrorThreshold
           xxx</code>.</p>
       </div>
     </div>

     <div class="section" title="3.6.&nbsp;CAS Processors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors">3.6.&nbsp;CAS Processors</h2></div></div></div>


       <p>The <code class="literal">&lt;casProcessors&gt;</code> section identifies the
         components that perform the analysis on the input data, including CAS analysis
         (Analysis Engines) and analysis results extraction (CAS Consumers). The CAS
         Consumers may also perform collection level analysis, where the analysis is
         performed (or aggregated) over multiple CASes. The basic structure of the CAS
         Processors section is:


         </p><pre class="programlisting">&lt;casProcessors
     dropCasOnException="true|false"
     casPoolSize="[Number]"
     processingUnitThreadCount="[Number]"&gt;

   &lt;casProcessor ...&gt;
         ...
   &lt;/casProcessor&gt;

   &lt;casProcessor ...&gt;
         ...
   &lt;/casProcessor&gt;
     ...
 &lt;/casProcessors&gt;</pre>

       <p>The <code class="literal">&lt;casProcessors&gt;</code> section has two mandatory
         attributes and one optional attribute that configure the characteristics of the CAS
         Processor flow in the CPE. The first mandatory attribute is a casPoolSize, which
         defines the fixed number of CAS instances that the CPM will create and use during
         processing. All CAS instances are maintained in a CAS Pool with a check-in and
         check-out access. Each CAS is checked-out from the CAS Pool by the Collection Reader
         and initialized with an initial subject of analysis. The CAS is checked-in into the
         CAS Pool when it is completely processed, at the end of the processing chain. A larger
         CAS Pool size will result in more memory being used by the CPM. CAS objects can be large
         and care should be taken to determine the optimum size of the CAS Pool, weighing memory
         tradeoffs with performance.</p>

       <p>The second mandatory <code class="literal">&lt;casProcessors&gt;</code> attribute
         is <code class="literal">processingUnitThreadCount</code>, which specifies the number of
         replicated <span class="emphasis"><em>Processing Pipelines</em></span>. Each Processing
         Pipeline runs in its own thread. The CPM takes CASes from the work queue and submits
         each CAS to one of the Processing Pipelines for analysis. A Processing Pipeline
         contains one or more Analysis Engines invoked in a given sequence. If more than one
         Processing Pipeline is specified, the CPM replicates instances of each Analysis
         Engine defined in the CPE descriptor. Each Processing Pipeline thread runs
         independently, consuming CASes from the work queue and depositing CASes with analysis
         results onto the output queue. On multiprocessor machines, multiple Processing
         Pipelines can run in parallel, improving overall throughput of the CPM.</p>
       <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The number of Processing Pipelines should be equal to or greater than the CAS
       Pool size. </p></div>

       <p>Elements in the pipeline (each represented by a &lt;casProcessor&gt; element)
         may indicate that they do not permit multiple deployment in their Analysis Engine
         descriptor. If so, even though multiple pipelines are being used, all CASes passing
         through the pipelines will be routed through one instance of these marked Engines.
         </p>

       <p>The final, optional, &lt;casProcessors&gt; attribute is
         <code class="literal">dropCasOnException</code>. It defines a policy that determines what
         happens with the CAS when an exception happens during processing. If the value of this
         attribute is set to true and an exception happens, the CPM will notify all registered
         listeners of the exception (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.cpe.using_listeners" class="olink">Section&nbsp;2.3.1, &#8220;Using Listeners&#8221;</a>), clear the CAS and check the CAS
         back into the CAS Pool so that it can be re-used. The presumption is that an exception
         may leave the CAS in an inconsistent state and therefore that CAS should not be allowed
         to move through the processing chain. When this attribute is omitted the CPM's
         default is the same as specifying
         <code class="literal">dropCasOnException="false"</code>.</p>
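
       <p>Putting the three attributes together, an opening
         <code class="literal">&lt;casProcessors&gt;</code> tag might be written as in the sketch
         below. The numeric values are arbitrary illustrations, not recommendations; suitable
         values depend on available memory and processors.</p>
       <pre class="programlisting">&lt;!-- Illustrative values only: 4 CAS instances, 4 pipeline threads,
      and CASes are dropped (cleared and returned to the pool) on exception --&gt;
 &lt;casProcessors
     dropCasOnException="true"
     casPoolSize="4"
     processingUnitThreadCount="4"&gt;
   &lt;casProcessor ...&gt;
         ...
   &lt;/casProcessor&gt;
 &lt;/casProcessors&gt;</pre>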

       <div class="section" title="3.6.1.&nbsp;Specifying an Individual CAS Processor"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual">3.6.1.&nbsp;Specifying an Individual CAS Processor</h3></div></div></div>


         <p>The CAS Processors that make up the Processing Pipeline and the CAS Consumer
           pipeline are specified with the <code class="literal">&lt;casProcessor&gt;</code>
           entity, which appears within the <code class="literal">&lt;casProcessors&gt;</code>
           entity. It may appear multiple times, once for each CAS Processor specified for
           this CPE.</p>

         <p>The order of the <code class="literal">&lt;casProcessor&gt;</code> entities with
           the <code class="literal">&lt;casProcessors&gt;</code> section specifies the order in
           which the CAS Processors will run. Although CAS Consumers are usually put at the end
           of the pipeline, they need not be. Also, Aggregate Analysis Engines may include CAS
           Consumers.</p>

         <p>The overall format of the <code class="literal">&lt;casProcessor&gt;</code> entity
           is:


           </p><pre class="programlisting">&lt;casProcessor deployment="local|remote|integrated" name="[String]" &gt;
     &lt;descriptor&gt;
       &lt;import ...&gt; | &lt;include .../&gt;
     &lt;/descriptor&gt;
     &lt;configurationParameterSettings&gt;...&lt;/configurationParameterSettings&gt;
     &lt;sofaNameMappings&gt;...&lt;/sofaNameMappings&gt;
     &lt;runInSeparateProcess&gt;...&lt;/runInSeparateProcess&gt;
     &lt;deploymentParameters&gt;...&lt;/deploymentParameters&gt;
     &lt;filter/&gt;
     &lt;errorHandling&gt;...&lt;/errorHandling&gt;
     &lt;checkpoint batch="[Number]"/&gt;
 &lt;/casProcessor&gt;</pre>

         <p>The <code class="literal">&lt;casProcessor&gt;</code> element has two mandatory
           attributes, <code class="literal">deployment</code> and <code class="literal">name</code>. The
           mandatory <code class="literal">name</code> attribute specifies a unique string
           identifying the CAS Processor.</p>

         <p>The mandatory <code class="literal">deployment</code> attribute specifies the CAS
           Processor deployment mode. Currently, three deployment options are supported:

           </p><div class="variablelist"><dl><dt><span class="term">integrated</span></dt><dd><p>indicates <span class="emphasis"><em>integrated</em></span> deployment
                 of the CAS Processor. The CPM deploys and collocates the CAS Processor in the
                 same process space as the CPM. This type of deployment is recommended to
                 increase the performance of the CPE. However, it is NOT recommended to
                 deploy annotators containing JNI this way. Such CAS Processors may cause a
                 fatal exception and force the JVM to exit without cleanup (bringing down the
                 CPM). Any UIMA SDK compliant pure Java CAS Processors may be safely deployed
                 this way.</p>
                 <p>The descriptor for an integrated deployment can, in fact, be a remote
                   service descriptor. When used this way, however, the CPM error recovery
                   options (see below) operate in the integrated mode, which means that many
                   of the retry options are not available.</p></dd><dt><span class="term">remote</span></dt><dd><p>indicates <span class="emphasis"><em>non-managed</em></span>
                 deployment of the CAS Processor. The CAS Processor descriptor referenced
                 in the <code class="literal">&lt;descriptor&gt;</code> element must be a Vinci
                 <span class="emphasis"><em>Service Client Descriptor</em></span>, which identifies a
                 remotely deployed CAS Processor service (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a>). The CPM
                 assumes that the CAS Processor is already running as a remote service and
                 will connect to it using the URI provided in the client service descriptor.
                 The lifecycle of a remotely deployed CAS Processor is not managed by the CPM,
                 so appropriate infrastructure should be in place to start/restart such CAS
                 Processors when necessary. This deployment provides fault isolation and
                 is implementation (i.e., programming language) neutral.</p>
                 </dd><dt><span class="term">local</span></dt><dd><p>indicates <span class="emphasis"><em>managed</em></span> deployment of
                 the CAS Processor. The CAS Processor descriptor referenced in the
                 <code class="literal">&lt;descriptor&gt;</code> element must be a Vinci
                 <span class="emphasis"><em>Service Deployment Descriptor</em></span>, which configures
                 a CAS Processor for deployment as a Vinci service (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a>). The CPM
                 deploys the CAS Processor in a separate process and manages the life cycle
                 (start/stop) of the CAS Processor. Communication between the CPM and the
                 CAS Processor is done with Vinci. When the CPM completes processing, the
                 process containing the CAS Processor is terminated. This deployment mode
                 insulates the CPM from the CAS Processor, creating a more robust deployment
                 at the cost of a small communication overhead. On multiprocessor machines,
                 the separate processes may run concurrently and improve overall
                 throughput.</p></dd></dl></div>
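
         <p>As a sketch of the most common case, an <span class="emphasis"><em>integrated</em></span>
           CAS Processor could be declared as shown below. The name and the descriptor path are
           hypothetical, and the remaining child elements are elided using the notation of this
           chapter.</p>
         <pre class="programlisting">&lt;!-- Runs in the same process as the CPM; the descriptor is an ordinary
      Analysis Engine descriptor (hypothetical path) --&gt;
 &lt;casProcessor deployment="integrated" name="MyAnnotator"&gt;
     &lt;descriptor&gt;
       &lt;import location="descriptors/MyAnnotatorAE.xml"/&gt;
     &lt;/descriptor&gt;
     ...
 &lt;/casProcessor&gt;</pre>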

         <p>A number of elements may appear within the
           <code class="literal">&lt;casProcessor&gt;</code> element.</p>

         <div class="section" title="3.6.1.1.&nbsp;<descriptor&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.descriptor">3.6.1.1.&nbsp;&lt;descriptor&gt; Element</h4></div></div></div>


           <p>The <code class="literal">&lt;descriptor&gt;</code> element is mandatory. It
             identifies the descriptor for the referenced CAS Processor using the syntax
             described in <a href="references.html#ugr.ref.xml.component_descriptor.aes" class="olink">Section&nbsp;2.4, &#8220;Analysis Engine Descriptors&#8221;</a>.

             </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>For
               <span class="emphasis"><em><code class="literal">remote</code></em></span> CAS Processors, the
               referenced descriptor must be a Vinci <span class="emphasis"><em>Service Client
               Descriptor</em></span>, which identifies a remotely deployed CAS Processor
               service.</p></li><li class="listitem"><p>For <span class="emphasis"><em>local</em></span> CAS Processors, the
                 referenced descriptor must be a Vinci <span class="emphasis"><em>Service Deployment
                 Descriptor</em></span>.</p></li><li class="listitem"><p>For <span class="emphasis"><em>integrated</em></span> CAS Processors,
                 the referenced descriptor must be an Analysis Engine Descriptor
                 (primitive or aggregate). </p></li></ul></div><p> </p>

           <p>See <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a> for more
             information on creating these descriptors and deploying services.</p>

         </div>

         <div class="section" title="3.6.1.2.&nbsp;<configurationParameterSettings&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.configuration_parameter_settings">3.6.1.2.&nbsp;&lt;configurationParameterSettings&gt; Element</h4></div></div></div>


           <p>This element provides a way to override the contained Analysis
             Engine's parameter settings. Any entry specified here must already be
             defined; values specified replace the corresponding values for each
             parameter. <span class="bold-italic">For CAS Processors, this mechanism
             is only available when they are deployed in <span class="quote">&#8220;<span class="quote">integrated</span>&#8221;</span>
             mode.</span> For Collection Readers and Initializers, it is always
             available.</p>

           <p>The content of this element is identical to the component descriptor for
             specifying parameters (in the case where no parameter groups are
             specified)<sup>[<a name="d5e1266" href="#ftn.d5e1266" class="footnote">4</a>]</sup>. Here is an example:


             </p><pre class="programlisting">&lt;configurationParameterSettings&gt;
   &lt;nameValuePair&gt;
     &lt;name&gt;CivilianTitles&lt;/name&gt;
     &lt;value&gt;
       &lt;array&gt;
         &lt;string&gt;Mr.&lt;/string&gt;
         &lt;string&gt;Ms.&lt;/string&gt;
         &lt;string&gt;Mrs.&lt;/string&gt;
         &lt;string&gt;Dr.&lt;/string&gt;
       &lt;/array&gt;
     &lt;/value&gt;
   &lt;/nameValuePair&gt;
   ...
 &lt;/configurationParameterSettings&gt;</pre>
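
           <p>Single-valued parameters follow the same pattern without the
             <code class="literal">&lt;array&gt;</code> wrapper. The sketch below assumes that the
             component's descriptor already declares a String parameter named
             <code class="literal">Language</code> and an Integer parameter named
             <code class="literal">MaxTokens</code>; both names are hypothetical.</p>
           <pre class="programlisting">&lt;configurationParameterSettings&gt;
   &lt;nameValuePair&gt;
     &lt;name&gt;Language&lt;/name&gt;       &lt;!-- hypothetical String parameter --&gt;
     &lt;value&gt;
       &lt;string&gt;en&lt;/string&gt;
     &lt;/value&gt;
   &lt;/nameValuePair&gt;
   &lt;nameValuePair&gt;
     &lt;name&gt;MaxTokens&lt;/name&gt;      &lt;!-- hypothetical Integer parameter --&gt;
     &lt;value&gt;
       &lt;integer&gt;500&lt;/integer&gt;
     &lt;/value&gt;
   &lt;/nameValuePair&gt;
 &lt;/configurationParameterSettings&gt;</pre>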

         </div>

         <div class="section" title="3.6.1.3.&nbsp;<sofaNameMappings&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings">3.6.1.3.&nbsp;&lt;sofaNameMappings&gt; Element</h4></div></div></div>


           <p>This optional element maps the Sofa names defined in the
             component, or the default Sofa name (if the component does not declare any Sofa
             names), to Sofa names in the CPE. The form of this element is:


             </p><pre class="programlisting">&lt;sofaNameMappings&gt;
   &lt;sofaNameMapping cpeSofaName="a_CPE_name"
                    componentSofaName="a_component_Name"/&gt;
   ...
 &lt;/sofaNameMappings&gt;</pre>

           <p>There can be any number of
             <code class="literal">&lt;sofaNameMapping&gt;</code> elements contained in the
             <code class="literal">&lt;sofaNameMappings&gt;</code> element. The
             <code class="literal">componentSofaName</code> attribute is optional; leave it out to
             specify a mapping for the <code class="literal">_InitialView</code>, that is, for
             Single-View components.</p>
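
           <p>As a minimal sketch, the two alternatives below show both forms of the
             mapping. All Sofa names used here (<code class="literal">TranslatedText</code>,
             <code class="literal">EnglishDocument</code>, <code class="literal">SourceText</code>)
             are hypothetical and chosen only for illustration. The two
             <code class="literal">&lt;sofaNameMappings&gt;</code> elements would appear in the
             definitions of two different CAS Processors:</p>
           <pre class="programlisting">&lt;!-- Multi-view component: its Sofa "EnglishDocument" is bound to the
     CPE Sofa "TranslatedText" (both names are hypothetical) --&gt;
&lt;sofaNameMappings&gt;
  &lt;sofaNameMapping cpeSofaName="TranslatedText"
                   componentSofaName="EnglishDocument"/&gt;
&lt;/sofaNameMappings&gt;

&lt;!-- Single-View component: componentSofaName is omitted, so the mapping
     applies to the component's _InitialView --&gt;
&lt;sofaNameMappings&gt;
  &lt;sofaNameMapping cpeSofaName="SourceText"/&gt;
&lt;/sofaNameMappings&gt;</pre>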

         </div>

         <div class="section" title="3.6.1.4.&nbsp;<runInSeparateProcess&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.run_in_separate_process">3.6.1.4.&nbsp;&lt;runInSeparateProcess&gt; Element</h4></div></div></div>


           <p>The <code class="literal">&lt;runInSeparateProcess&gt;</code> element is
             mandatory for <code class="literal">local</code> CAS Processors, but should not appear
             for <code class="literal">remote</code> or <code class="literal">integrated</code> CAS
             Processors. It enables the CPM to create external processes using the provided
             runtime environment. Applications launched this way communicate with the CPM
             using the Vinci protocol and connectivity is enabled by a local instance of the
             VNS that the CPM manages. Since communication is based on Vinci, the application
             need not be implemented in Java. Any language for which Vinci provides support
             may be used to create an application, and the CPM will seamlessly communicate
             with it. The overall structure of this element is:


             </p><pre class="programlisting">&lt;runInSeparateProcess&gt;
     &lt;exec dir="[String]" executable="[String]"&gt;
         &lt;env key="[String]" value="[String]"/&gt;
         ...
         &lt;arg&gt;[String]&lt;/arg&gt;
         ...
     &lt;/exec&gt;
 &lt;/runInSeparateProcess&gt;</pre>

           <p>The <code class="literal">&lt;exec&gt;</code> element provides information
             about how to execute the referenced CAS Processor. Two attributes are defined
             for the <code class="literal">&lt;exec&gt;</code> element. The
             <code class="literal">dir</code> attribute is currently not used &#8211; it is reserved
             for future functionality. The <code class="literal">executable</code> attribute
             specifies the actual Vinci service executable that will be run by the CPM, e.g.,
             <code class="literal">java</code>, a batch script, an application (.exe), etc. The
             executable must be specified with a fully qualified path, or be found in the
             <code class="literal">PATH</code> of the CPM.</p>

           <p>The <code class="literal">&lt;exec&gt;</code> element has two elements within it
             that define parameters used to construct the command line for executing the CAS
             Processor. These elements must be listed in the order in which they should be
             defined for the CAS Processor.</p>

           <p>The optional <code class="literal">&lt;env&gt;</code> element is used to set an
             environment variable. The variable <code class="literal">key</code> will be set to
             <code class="literal">value</code>. For example,


             </p><pre class="programlisting">&lt;env key="CLASSPATH" value="C:\Java\lib"/&gt;</pre><p>
             will set the environment variable <code class="literal">CLASSPATH</code> to the value
             <code class="literal">C:\Java\lib</code>. The <code class="literal">&lt;env&gt;</code>
             element may be repeated to set multiple environment variables. All of the
             key/value pairs will be added to the environment by the CPM prior to launching the
             executable.</p>
           <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The CPM actually adds ALL system environment variables when it
           launches the program. It queries the Operating System for its current system
           variables and one by one adds them to the program's process
           configuration.</p></div>

           <p>The <code class="literal">&lt;arg&gt;</code> element is used to specify arbitrary
             string arguments that will appear on the command line when the CPM runs the
             command specified in the <code class="literal">executable</code> attribute.</p>

           <p>For example, the following would be used to invoke the UIMA Java
             implementation of the Vinci service wrapper on a Java CAS Processor:


             </p><pre class="programlisting">&lt;runInSeparateProcess&gt;
     &lt;exec executable="java"&gt;
         &lt;arg&gt;-DVNS_HOST=localhost&lt;/arg&gt;
         &lt;arg&gt;-DVNS_PORT=9099&lt;/arg&gt;
         &lt;arg&gt;org.apache.uima.reference_impl.analysis_engine.service.
 vinci.VinciAnalysisEngineService_impl&lt;/arg&gt;
         &lt;arg&gt;C:uimadescdeployCasProcessor.xml&lt;/arg&gt;
     &lt;/exec&gt;
 &lt;/runInSeparateProcess&gt;</pre>

           <p>This will cause the CPM to run the following command line when starting the
             CAS Processor:


             </p><pre class="programlisting">java -DVNS_HOST=localhost -DVNS_PORT=9099
   org.apache.uima.reference_impl.analysis_engine.service.vinci.\\
               VinciAnalysisEngineService_impl
   C:uimadescdeployCasProcessor.xml</pre>

           <p>The first argument specifies that the Vinci Naming Service is running on the
             <code class="literal">localhost</code>. The second argument specifies that the Vinci
             Naming Service port number is <code class="literal">9099</code>. The third argument
             (split over 2 lines in this documentation)
             identifies the UIMA implementation of the Vinci service wrapper. This class
             contains the <code class="literal">main</code> method that will execute. That main
             method in turn takes a single argument &#8211; the filename for the CAS Processor
             service deployment descriptor. Thus the last argument identifies the Vinci
             service deployment descriptor file for the CAS Processor. Since this is the same
             descriptor file specified earlier in the
             <code class="literal">&lt;descriptor&gt;</code> element, the string
             <code class="literal">${descriptor}</code> can be used to refer to the descriptor,
             e.g.:


             </p><pre class="programlisting">&lt;arg&gt;${descriptor}&lt;/arg&gt;</pre>

           <p>The CPM will expand this out to the service deployment descriptor file
             referenced in the <code class="literal">&lt;descriptor&gt;</code> element.</p>
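
           <p>Putting these pieces together, a complete element might look like the
             following sketch. It reuses the argument values from the example above and
             adds one <code class="literal">&lt;env&gt;</code> element; the
             <code class="literal">CLASSPATH</code> value is a placeholder, not a real path,
             and must be replaced with a class path that contains the UIMA libraries and
             the CAS Processor's classes:</p>
           <pre class="programlisting">&lt;runInSeparateProcess&gt;
    &lt;exec executable="java"&gt;
        &lt;!-- environment variables are set before the executable is launched --&gt;
        &lt;env key="CLASSPATH" value="[class path for the service]"/&gt;
        &lt;!-- arguments appear on the command line in the order listed --&gt;
        &lt;arg&gt;-DVNS_HOST=localhost&lt;/arg&gt;
        &lt;arg&gt;-DVNS_PORT=9099&lt;/arg&gt;
        &lt;arg&gt;org.apache.uima.reference_impl.analysis_engine.service.
vinci.VinciAnalysisEngineService_impl&lt;/arg&gt;
        &lt;!-- expanded by the CPM to the file named in &lt;descriptor&gt; --&gt;
        &lt;arg&gt;${descriptor}&lt;/arg&gt;
    &lt;/exec&gt;
&lt;/runInSeparateProcess&gt;</pre>
           <p>As in the earlier example, the class name is split over two lines only for
             display in this documentation.</p>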

         </div>

         <div class="section" title="3.6.1.5.&nbsp;<deploymentParameters&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.deployment_parameters">3.6.1.5.&nbsp;&lt;deploymentParameters&gt; Element</h4></div></div></div>


           <p>The <code class="literal">&lt;deploymentParameters&gt;</code> element defines
             a number of deployment parameters that control how the CPM will interact with the
             CAS Processor. This element has the following overall form:


             </p><pre class="programlisting">&lt;deploymentParameters&gt;
     &lt;parameter name="[String]" value="..." type="string|integer" /&gt;
     ...
 &lt;/deploymentParameters&gt;</pre>

           <p>The <code class="literal">name</code> attribute identifies the parameter, the
             <code class="literal">value</code> attribute specifies the value that will be assigned
             to the parameter, and the <code class="literal">type</code> attribute indicates the
             type of the parameter, either <code class="literal">string</code> or
             <code class="literal">integer</code>. The available parameters include:

             </p><div class="variablelist"><dl><dt><span class="term">service-access</span></dt><dd><p>string parameter whose value must be
                   <span class="quote">&#8220;<span class="quote">exclusive</span>&#8221;</span>, if present. This parameter is only
                   effective for remote deployments. It modifies the Vinci service
                   connections to be preallocated and dedicated, one service instance per
                   pipeline. It is only relevant for non-integrated deployment modes. If
                   there are fewer service instances available (and alive &#8211;
                   responding to a <span class="quote">&#8220;<span class="quote">ping</span>&#8221;</span> request) than there are pipelines,
                   the number of pipelines (the number of concurrent threads) is reduced to
                   match the number of available instances. If not specified, the VNS is
                   queried each time a service is needed, and a <span class="quote">&#8220;<span class="quote">random</span>&#8221;</span>
                   instance is assigned from the pool of available instances. If a service
                   dies during processing, the CPM will use its normal error handling
                   procedures to attempt to reconnect. The number of attempts is specified
                   in the CPE descriptor for each CAS Processor using the
                   <code class="literal">&lt;maxConsecutiveRestarts value="10"
                   action="kill-pipeline"
                   waitTimeBetweenRetries="50"/&gt;</code> XML element. The
                   <span class="quote">&#8220;<span class="quote">value</span>&#8221;</span> attribute is the number of reconnection tries;
                   the <span class="quote">&#8220;<span class="quote">action</span>&#8221;</span> says what to do if the retries exceed the
                   limit. The <span class="quote">&#8220;<span class="quote">kill-pipeline</span>&#8221;</span> action stops the pipeline
                   that was associated with the failing service (other pipelines will
                   continue to work). The CAS in process within a killed pipeline will be
                   dropped. These events are communicated to the application using the
                   normal event listener mechanism. The
                   <code class="literal">waitTimeBetweenRetries</code> attribute says how many
                   milliseconds to wait between attempts to reconnect.</p>
                   </dd><dt><span class="term">vnsHost</span></dt><dd><p>(Deprecated) string parameter specifying the VNS host,
                   e.g., <code class="literal">localhost</code> for local CAS Processors, host
                   name or IP address of VNS host for remote CAS Processors. This parameter is
                   deprecated; use the parameter specification instead inside the Vinci
                   <span class="emphasis"><em>Service Client Descriptor</em></span>, if needed. It is
                   ignored for integrated and local deployments. If present, for remote
                   deployments, it specifies the VNS Host to use, unless that is specified in
                   the Vinci <span class="emphasis"><em>Service Client Descriptor</em></span>.</p>
                   </dd><dt><span class="term">vnsPort</span></dt><dd><p>(Deprecated) integer parameter specifying the VNS port
                   number. This parameter is deprecated; use the parameter specification
                   instead inside the Vinci <span class="emphasis"><em>Service Client
                   Descriptor,</em></span> if needed. It is ignored for integrated and
                   local deployments. If present, for remote deployments, it specifies the
                   VNS Port number to use, unless that is specified in the Vinci
                   <span class="emphasis"><em>Service Client Descriptor.</em></span></p>
                   </dd></dl></div>

           <p>For example, the following parameters might be used with a CAS Processor
             deployed in local mode:


             </p><pre class="programlisting">&lt;deploymentParameters&gt;
   &lt;parameter name="service-access" value="exclusive" type="string"/&gt;
 &lt;/deploymentParameters&gt;</pre>
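
           <p>For a remote CAS Processor whose Vinci <span class="emphasis"><em>Service Client
             Descriptor</em></span> does not itself name the VNS, the deprecated
             parameters could be supplied as in the following sketch. The host name
             <code class="literal">vns.example.org</code> and the port number are hypothetical
             values chosen for illustration:</p>
           <pre class="programlisting">&lt;deploymentParameters&gt;
  &lt;!-- deprecated: prefer the Vinci Service Client Descriptor --&gt;
  &lt;parameter name="vnsHost" value="vns.example.org" type="string"/&gt;
  &lt;parameter name="vnsPort" value="9000" type="integer"/&gt;
  &lt;!-- dedicate one service instance to each pipeline --&gt;
  &lt;parameter name="service-access" value="exclusive" type="string"/&gt;
&lt;/deploymentParameters&gt;</pre>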

         </div>

         <div class="section" title="3.6.1.6.&nbsp;<filter&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.filter">3.6.1.6.&nbsp;&lt;filter&gt; Element</h4></div></div></div>


           <p>The &lt;filter&gt; element is a required element but currently should be
             left empty. This element is reserved for future use.</p>

         </div>

         <div class="section" title="3.6.1.7.&nbsp;<errorHandling&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling">3.6.1.7.&nbsp;&lt;errorHandling&gt; Element</h4></div></div></div>


           <p>The mandatory <code class="literal">&lt;errorHandling&gt;</code> element
             defines error and restart policies for the CAS Processor. Each CAS Processor may
             define different actions in the event of errors and restarts. The CPM monitors
             and logs errant behaviors and attempts to recover the component based on the
             policies specified in this element.</p>

           <p>There are two kinds of faults:

             </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>One kind only occurs with non-integrated CAS
               Processors &#8211; this fault is either a timeout attempting to launch or
               connect to the non-integrated component, or some other kind of connection
               related exception (for instance, the network connection might timeout or get
               reset).</p></li><li class="listitem"><p>The other kind happens when the CAS Processor component (an
                 Annotator, for example) throws any kind of exception. This kind may occur
                 with any kind of deployment, integrated or not. </p></li></ol></div>

           <p>The &lt;errorHandling&gt; element has specifications for each of these kinds of
             faults. The format of this element is:


             </p><pre class="programlisting">&lt;errorHandling&gt;
   &lt;maxConsecutiveRestarts action="continue|disable|terminate"
                            value="[Number]"/&gt;
   &lt;errorRateThreshold action="continue|disable|terminate" value="[Rate]"/&gt;
   &lt;timeout max="[Number]"/&gt;
 &lt;/errorHandling&gt;</pre>

           <p>The mandatory <code class="literal">&lt;maxConsecutiveRestarts&gt;</code>
             element applies only to faults of the first kind, and therefore, only applies to
             non-integrated deployments. If such a fault occurs, a retry is attempted, up to
             <code class="literal">value="[Number]"</code> times. This retry resets the
             connection (if one was made) and attempts to reconnect and perhaps re-launch
             (see below for details). The original CAS (not a partially updated one) is sent to
             the CAS Processor as part of the retry, once the deployed component has been
             successfully restarted or reconnected to.</p>

           <p>The <code class="literal">action</code> attribute specifies the action to take
             when the threshold specified by the <code class="literal">value="[Number]"</code> is
             exceeded. The possible actions are:

             </p><div class="variablelist"><dl><dt><span class="term">continue</span></dt><dd><p>skip any further processing for this CAS by this CAS
                   Processor, and pass the CAS to the next CAS Processor in the Pipeline.
                   </p>
                   <p>The <span class="quote">&#8220;<span class="quote">restart</span>&#8221;</span> action is done, because it is needed
                     for the next CAS.</p>

                   <p>If <code class="literal">dropCasOnException="true"</code> is set, the CPM
                     will NOT pass the CAS to the next CAS Processor in the chain. Instead, the
                     CPM will abort processing of this CAS, release the CAS back to the CAS
                     Pool and will process the next CAS in the queue.</p>

                   <p>The counter counting the restarts toward the threshold is only
                     reset after a CAS is successfully processed.</p></dd><dt><span class="term">disable</span></dt><dd><p>the current CAS is handled just as in the
                   <code class="literal">continue</code> case, but in addition, the CAS Processor
                   is marked so that its <span class="emphasis"><em>process()</em></span> method will not be
                   called again (i.e., it will be <span class="quote">&#8220;<span class="quote">skipped</span>&#8221;</span> for future
                   CASes)</p></dd><dt><span class="term">terminate</span></dt><dd><p>the CPM will terminate all processing and exit.</p>
                   </dd></dl></div>

           <p>The definition of an error for the
             <code class="literal">&lt;maxConsecutiveRestarts&gt;</code> element differs
             slightly for each of the three CAS Processor deployment modes:
             </p><div class="variablelist"><dl><dt><span class="term">local</span></dt><dd><p>Local CAS Processors experience two general error
                   types:
                   </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>launch errors &#8211; errors associated with
                       launching a process</p></li><li class="listitem"><p>processing errors &#8211; errors associated with
                       sending Vinci commands to the process</p></li></ul></div>

                   <p>A launch error is defined by a failure of the process to
                     successfully register with the local VNS within a default time window.
                     The current timeout is 15 minutes. Multiple local CAS Processors are
                     launched sequentially, with a subsequent processor launched
                     immediately after its previous processor successfully registers
                     with the VNS.</p>

                   <p>A processing error is detected if a connection to the CAS Processor
                     is lost or if the processing time exceeds a specified timeout
                     value.</p>

                   <p>For local CAS Processors, the
                     &lt;maxConsecutiveRestarts&gt; element specifies the number of
                     consecutive attempts made to launch the CAS Processor at CPM startup or
                     after the CPM has lost a connection to the CAS Processor.</p>
                   </dd><dt><span class="term">remote</span></dt><dd><p>For remote CAS Processors, the
                   &lt;maxConsecutiveRestarts&gt; element applies to errors from
                   sending Vinci commands. An error is detected if a connection to the CAS
                   Processor is lost, or if the processing time exceeds the timeout value
                   specified in the &lt;timeout&gt; element (see below).</p>
                   </dd><dt><span class="term">integrated</span></dt><dd><p>Although mandatory, the
                   &lt;maxConsecutiveRestarts&gt; element is NOT used for integrated CAS
                   Processors, because Integrated CAS Processors are not
                   re-instantiated/restarted on exceptions. This setting is ignored by
                   the CPM for Integrated CAS Processors but it is required. A future version
                   of the CPM will make this element mandatory for remote and local CAS
                   Processors only.</p></dd></dl></div>

           <p>The mandatory <code class="literal">&lt;errorRateThreshold&gt;</code> element
             is used for all faults &#8211; both those above, and exceptions thrown by the CAS
             Processor itself. It specifies the number of retries for exceptions thrown by
             the CAS Processor itself, a maximum error rate, and the corresponding action to
             take when this rate is exceeded. The <code class="literal">value</code> attribute
             specifies the error rate in terms of errors per sample size in the form
             <span class="quote">&#8220;<span class="quote"><code class="literal">N/M</code></span>&#8221;</span>, where <code class="literal">N</code> is the
             number of errors and <code class="literal">M</code> is the sample size, defined in terms
             of the number of documents.</p>

           <p>The first number is also used to indicate the maximum number of retries. If
             this number is less than the <code class="literal">&lt;maxConsecutiveRestarts
             value="[Number]"&gt;</code> value, it will override it, reducing the number of
             <span class="quote">&#8220;<span class="quote">restarts</span>&#8221;</span> attempted. A retry is done only if
             <code class="literal">dropCasOnException</code> is false. If it is set to true, no retry
             occurs, but the error is counted.</p>

           <p>When the number of errors counted within a sample of <code class="literal">M</code>
             documents exceeds <code class="literal">N</code>, the action
             specified by the <code class="literal">action</code> attribute is taken. The possible
             actions and their meaning are the same as described above for the
             <code class="literal">&lt;maxConsecutiveRestarts&gt;</code> element:
             </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p><code class="literal">continue</code></p></li><li class="listitem"><p><code class="literal">disable</code></p></li><li class="listitem"><p><code class="literal">terminate</code></p></li></ul></div>

           <p>The <code class="literal">dropCasOnException="true"</code> attribute of the
             <code class="literal">&lt;casProcessors&gt;</code> element modifies the action
             taken for continue and disable, in the same manner as above. For example:


             </p><pre class="programlisting">&lt;errorRateThreshold value="3/1000" action="disable"/&gt;</pre><p>
             specifies that each error thrown by the CAS Processor itself will be retried up to
             3 times (if <code class="literal">dropCasOnException</code> is false) and the CAS
             Processor will be disabled if the error rate exceeds 3 errors in 1000
             documents.</p>

           <p>If a document causes an error and the error rate threshold for the CAS
             Processor is not exceeded, the CPM increments the CAS Processor's error
             count and retries processing that document (if
             <code class="literal">dropCasOnException</code> is false). The retry means that the
             CPM calls the CAS Processor's process() method again, passing in as an
             argument the same CAS that previously caused an exception.</p>
           <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The CPM does not attempt to rollback any partial changes that may have
           been applied to the CAS in the previous process() call. </p></div>

           <p>Errors are accumulated across documents. For example, assume the error
             rate threshold is <code class="literal">3/1000</code>. The same document may fail three
             times before finally succeeding on the fourth try, but the error count is now 3. If
             one more error occurs within the current sample of 1000 documents, the error rate
             threshold will be exceeded and the specified action will be taken. If no more
             errors occur within the current sample, the error counter is reset to 0 for the
             next sample of 1000 documents.</p>

           <p>The <code class="literal">&lt;timeout&gt;</code> element is a mandatory element.
             Although mandatory for all CAS Processors, this element is only relevant for
             local and remote CAS Processors. For integrated CAS Processors, this element is
             ignored. In the current CPM implementation the integrated CAS Processor
             process() method is not subject to timeouts.</p>

           <p>The <code class="literal">max</code> attribute specifies the maximum amount of
             time in milliseconds the CPM will wait for a process() method to complete. When
             exceeded, the CPM will generate an exception and will treat this as an error
             subject to the threshold defined in the
             <code class="literal">&lt;errorRateThreshold&gt;</code> element above, including
             doing retries.</p>
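
           <p>To make the interaction of the three elements concrete, here is a sketch of
             a filled-in <code class="literal">&lt;errorHandling&gt;</code> element for a
             non-integrated CAS Processor. The numeric values are illustrative
             assumptions, not recommended settings:</p>
           <pre class="programlisting">&lt;errorHandling&gt;
  &lt;!-- connection or launch faults: retry up to 3 times in a row,
       then stop all processing --&gt;
  &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
  &lt;!-- all faults: disable this CAS Processor once more than 10 errors
       are counted within a sample of 1000 documents --&gt;
  &lt;errorRateThreshold action="disable" value="10/1000"/&gt;
  &lt;!-- treat a process() call that takes longer than 60000 ms
       (one minute) as an error --&gt;
  &lt;timeout max="60000"/&gt;
&lt;/errorHandling&gt;</pre>
           <p>Because the first number of the error rate (10) is not less than the
             <code class="literal">&lt;maxConsecutiveRestarts&gt;</code> value (3), it does not
             reduce the number of restarts attempted in this sketch.</p>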

           <div class="section" title="Retry action taken on a timeout"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling.timeout_retry_action">Retry action taken on a timeout</h5></div></div></div>


             <p>The action taken depends on whether the CAS Processor is local (managed)
               or remote (unmanaged). Local CAS Processors (which are services) are killed
               and restarted, and a new connection to them is established. For remote CAS
               Processors, the connection to them is dropped, and a new connection is
               established (which may actually connect to a different instance of the
               remote service, if it has multiple instances).</p>
           </div>
         </div>

         <div class="section" title="3.6.1.8.&nbsp;<checkpoint&gt; Element"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.checkpoint">3.6.1.8.&nbsp;&lt;checkpoint&gt; Element</h4></div></div></div>


           <p>The <code class="literal">&lt;checkpoint&gt;</code> element is an optional
             element used to improve the performance of CAS Consumers. It has a single
             attribute, <code class="literal">batch</code>, which specifies the number of CASes in a
             batch, e.g.:


             </p><pre class="programlisting">&lt;checkpoint batch="1000"/&gt;</pre>

           <p>sets the batch size to 1000 CASes. The batch size is the interval used to mark a
             point in processing requiring special handling. The CAS Processor's
             <code class="literal">batchProcessComplete()</code> method will be called by the CPM
             when this mark is reached so that the processor can take appropriate action. This
             mark could be used as a mechanism to buffer up results in CAS Consumers and perform
             time-consuming operations, such as check-pointing, that should not be done on a
             per-document basis.</p>

         </div>
       </div>
     </div>

     <div class="section" title="3.7.&nbsp;CPE Operational Parameters"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor.operational_parameters">3.7.&nbsp;CPE Operational Parameters</h2></div></div></div>


       <p>The parameters for configuring the overall CPE and CPM are specified in the
         <code class="literal">&lt;cpeConfig&gt;</code> section. The overall format of this
         section is:


         </p><pre class="programlisting">&lt;cpeConfig&gt;
   &lt;startAt&gt;[NumberOrID]&lt;/startAt&gt;

   &lt;numToProcess&gt;[Number]&lt;/numToProcess&gt;

   &lt;outputQueue dequeueTimeout="[Number]" queueClass="[ClassName]" /&gt;

   &lt;checkpoint file="[File]" time="[Number]" batch="[Number]"/&gt;

   &lt;timerImpl&gt;[ClassName]&lt;/timerImpl&gt;

   &lt;deployAs&gt;vinciService|interactive|immediate|single-threaded
   &lt;/deployAs&gt;

 &lt;/cpeConfig&gt;</pre>

       <p>This section of the CPE descriptor allows for defining the starting entity, the
         number of entities to process, a checkpoint file and frequency, a pluggable timer, an
         optional output queue implementation, and finally a mode of operation. The mode of
         operation determines how the CPM interacts with users and other systems.</p>

       <p>The <code class="literal">&lt;startAt&gt;</code> element is an optional element. It
         defines the starting entity in the collection at which the CPM should start
         processing.</p>

       <p>The implementation in the CPM passes this argument to the Collection Reader
         as the value of the parameter <span class="quote">&#8220;<span class="quote"><code class="literal">startNumber</code></span>&#8221;</span>.
         The CPM does not do anything else with this parameter; in particular, the CPM has no
         ability to skip to a specific document - that function, if available, is only provided
         by a particular Collection Reader implementation.</p>

       <p>If the <code class="literal">&lt;startAt&gt;</code> element is used, the Collection
         Reader descriptor must define a single-valued configuration parameter with the
         name <code class="literal">startNumber</code>. It can declare this value to be of any type;
         the value passed in this XML element must be convertible to that type.</p>

       <p>A typical use is to declare this to be an integer type, and to pass the sequential
         document number where processing should start. An alternative implementation
         might take a specific document ID; the collection reader could search through its
         collection until it reaches this ID and then start there.</p>

       <p>This parameter will only make sense if the particular collection reader is
         implemented to use the <code class="literal">startNumber</code> configuration
         parameter.</p>
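
       <p>As a sketch of the typical integer case, a Collection Reader descriptor could
         declare the parameter as shown below, using the usual
         <code class="literal">&lt;configurationParameter&gt;</code> declaration form of UIMA
         component descriptors. The description text is an assumption added for
         illustration, and the reader's own code must still read and act on the
         value:</p>
       <pre class="programlisting">&lt;configurationParameters&gt;
  &lt;configurationParameter&gt;
    &lt;name&gt;startNumber&lt;/name&gt;
    &lt;description&gt;Sequential number of the first document to read&lt;/description&gt;
    &lt;!-- any type may be declared; the startAt value must convert to it --&gt;
    &lt;type&gt;Integer&lt;/type&gt;
    &lt;!-- the parameter must be single-valued --&gt;
    &lt;multiValued&gt;false&lt;/multiValued&gt;
    &lt;mandatory&gt;false&lt;/mandatory&gt;
  &lt;/configurationParameter&gt;
&lt;/configurationParameters&gt;</pre>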

       <p>The <code class="literal">&lt;numToProcess&gt;</code> element is an optional
         element. It specifies the total number of entities to process. Use -1 to indicate ALL.
         If not defined, the number of entities to process will be taken from the Collection
         Reader configuration. If present, this value overrides the Collection Reader
         configuration.</p>
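
       <p>For instance, the following fragment is a sketch with illustrative values
         only. It asks the Collection Reader to start at entity 101, provided the
         reader honors <code class="literal">startNumber</code>, and limits the run to 500
         entities:</p>
       <pre class="programlisting">&lt;cpeConfig&gt;
  &lt;!-- passed to the Collection Reader as the startNumber parameter --&gt;
  &lt;startAt&gt;101&lt;/startAt&gt;
  &lt;!-- overrides the Collection Reader configuration; use -1 for all --&gt;
  &lt;numToProcess&gt;500&lt;/numToProcess&gt;
  ...
&lt;/cpeConfig&gt;</pre>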

       <p>The <code class="literal">&lt;outputQueue&gt;</code> element is an optional element.
         It enables plugging in a custom implementation for the Output Queue. When omitted,
         the CPM will use a default output queue that is based on a First-In First-Out (FIFO)
         model.</p>

       <p>The UIMA SDK provides a second implementation for the Output Queue that can be
         plugged in to the CPM, named <span class="quote">&#8220;<span class="quote">
         <code class="literal">org.apache.uima.collection.impl.cpm.engine.SequencedQueue</code>
         </span>&#8221;</span>.</p>

       <p>This implementation supports handling very large documents that are split into
         <span class="quote">&#8220;<span class="quote">chunks</span>&#8221;</span>; it provides a delivery mechanism that ensures the
         sequential order of the chunks using information carried in the CAS metadata. This
         metadata, which is required for this implementation to work correctly, must be added
         as an instance of a Feature Structure of type
         <code class="literal">org.apache.es.tt.DocumentMetaData</code> and referred to by an
         additional feature named <code class="literal">esDocumentMetaData</code> in the special
         instance of <code class="literal">uima.tcas.DocumentAnnotation</code> that is
         associated with the CAS. This is usually done by the Collection Reader; the instance
         contains the following features:

         </p><div class="variablelist"><dl><dt><span class="term">sequenceNumber</span></dt><dd><p>[Number] the sequential number of a chunk, starting at 1. If
               not a chunk (i.e. complete document), the value should be 0.</p>
               </dd><dt><span class="term">documentId</span></dt><dd><p>[Number] current document id. Chunks belonging to the same
               document have identical document id.</p></dd><dt><span class="term">isCompleted</span></dt><dd><p>[Number] 1 if the chunk is the last in a sequence, 0
               otherwise.</p></dd><dt><span class="term">url</span></dt><dd><p>[String] document url.</p></dd><dt><span class="term">throttleID</span></dt><dd><p>[String] special attribute currently used by
               OmniFind.</p></dd></dl></div>

       <p>This implementation of a sequenced queue supports proper sequencing of CASes in
         CPM deployments that use document chunking. Chunking is a technique of splitting
         large documents into pieces to reduce overall memory consumption. Chunking does not
         depend on the number of CASes in the CAS Pool. It works equally well with one or more
         CASes in the CAS Pool. Each chunk is packaged in a separate CAS and placed in the Work
         Queue. If the CAS Pool is depleted, the CollectionReader thread is suspended until a
         CAS is released back to the pool by the processing threads. A document may be split into
         1, 2, 3 or more chunks that are analyzed independently. In order to reconstruct the
         document correctly, the CAS Consumer can depend on receiving the chunks in the same
         sequential order that the chunks were <span class="quote">&#8220;<span class="quote">produced</span>&#8221;</span>, when this
         sequenced queue implementation is used. To plug in this sequenced queue to the CPM use
         the following specification:


         </p><pre class="programlisting">&lt;outputQueue dequeueTimeout="100000" queueClass=
 "org.apache.uima.collection.impl.cpm.engine.SequencedQueue"/&gt;</pre><p>

         where the mandatory <code class="literal">queueClass</code> attribute defines the name of
         the class and the second mandatory attribute, <code class="literal">dequeueTimeout</code>
         specifies the maximum number of milliseconds to wait for the expected chunk.</p>

       <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The value for this timeout must be carefully determined to avoid
       excessive occurrences of timeouts. Typically, the size of a chunk and the type of
       analysis being done are the most important factors when deciding on the value for the
       timeout. The larger the chunk and the more complicated the analysis, the more time it takes
       for the chunk to go from source to sink. You may specify 0, in which case the timeout is
       disabled; that is, it is equivalent to an infinitely long timeout.</p></div>

       <p>If the chunk doesn't arrive in the configured time window, the entire
         document is presumed to be invalid and the CAS is dropped from further processing.
         This action occurs regardless of any other error action specification. The
         SequencedQueue invalidates the document by adding the offending document's
         metadata to a local cache of invalid documents.</p>

       <p>If the timeout occurs, the CPM notifies all registered listeners (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.cpe.using_listeners" class="olink">Section&nbsp;2.3.1, &#8220;Using Listeners&#8221;</a>) by calling
         entityProcessComplete(). As part of this call, the SequencedQueue passes null
         instead of a CAS as the first argument, together with a special exception,
         CPMChunkTimeoutException. Null is passed because the timeout fires precisely when
         the expected chunk has not been received within the configured timeout window, so
         there is no CAS available when the timeout event
         occurs.</p>

       <p>The CPMChunkTimeoutException object includes an API that allows the listener
         to retrieve the offending document id as well as the other metadata attributes as
         defined above. These attributes are part of each chunk's metadata and are added
         by the Collection Reader.</p>
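
       <p>The timeout notification described above might be handled roughly as in the
         following sketch. The class name <code class="literal">ChunkTimeoutListener</code> is
         hypothetical, and the exception is recognized by its simple class name only so that
         the sketch does not have to assume its package or accessor names; consult the
         Javadocs for the actual API of CPMChunkTimeoutException.</p>

       <pre class="programlisting">import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.collection.EntityProcessStatus;
import org.apache.uima.collection.StatusCallbackListener;

// Hypothetical listener: logs chunk timeouts reported by the SequencedQueue.
public class ChunkTimeoutListener implements StatusCallbackListener {

  public void entityProcessComplete(CAS aCas, EntityProcessStatus aStatus) {
    if (!aStatus.isException()) {
      return; // normal completion, nothing to report
    }
    List&lt;?&gt; exceptions = aStatus.getExceptions();
    for (Object e : exceptions) {
      String kind = e.getClass().getSimpleName();
      // For a chunk timeout there is no CAS, so aCas is expected to be null.
      if (aCas == null &amp;&amp; "CPMChunkTimeoutException".equals(kind)) {
        System.err.println("Chunk timed out, document invalidated: " + e);
      } else {
        System.err.println("Processing error: " + e);
      }
    }
  }

  // The remaining callbacks are not needed for this sketch.
  public void initializationComplete() {}
  public void batchProcessComplete() {}
  public void collectionProcessComplete() {}
  public void paused() {}
  public void resumed() {}
  public void aborted() {}
}</pre>

       <p>Because the first argument can be null in this situation, a listener should check
         it before touching the CAS.</p>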

       <p>Each chunk that SequencedQueue works on is subjected to a test to determine if the
         chunk belongs to an invalid document. This test checks the chunk's metadata
         against the data in the local cache. If there is a match, the chunk is dropped. This
         check is performed only for chunks; complete documents are not subject to
         it.</p>

       <p>If there is an exception during the processing of a chunk, the CPM sends a
         notification to all registered listeners. The notification includes the CAS and an
         exception. When the listener notification is completed, the CPM also sends separate
         notifications, containing the CAS, to the Artifact Producer and the
         SequencedQueue. The intent is to stop adding new chunks to the Work Queue that belong
         to an <span class="quote">&#8220;<span class="quote">invalid</span>&#8221;</span> document and also to deal with chunks that are
         en-route, being processed by the processing threads.</p>

       <p>In response to the notification, the Artifact Producer will drop and release
         back to the CAS Pool all CASes that belong to an <span class="quote">&#8220;<span class="quote">invalid</span>&#8221;</span> document.
         Currently, there is no support in the CollectionReader's API to tell it to stop
         generating chunks. The CollectionReader keeps producing the chunks but the
         Artifact Producer immediately drops/releases them to the CAS Pool. Before the CAS is
         released back to the CAS Pool, the Artifact Producer sends a notification to all
         registered listeners. This notification includes the CAS and an exception,
         SkipCasException.</p>

       <p>In response to the notification of an exception involving a chunk, the
         SequencedQueue retrieves from the CAS the metadata and adds it to its local cache of
         <span class="quote">&#8220;<span class="quote">invalid</span>&#8221;</span> documents. All chunks de-queued from the OutputQueue and
         belonging to <span class="quote">&#8220;<span class="quote">invalid</span>&#8221;</span> documents will be dropped and released back to
         the CAS Pool. Before dropping the CAS, the CPM sends a notification to all registered
         listeners. The notification includes the CAS and a SkipCasException.</p>

       <p>The <code class="literal">&lt;checkpoint&gt;</code> element is an optional element.
         It specifies a CPE checkpoint file, checkpoint frequency, and strategy for
         checkpoints (time or count based). At checkpoint time, the CPM saves status
         information and statistics to the checkpoint file. The checkpoint file is specified
         in the <code class="literal">file</code> attribute, which has the same form as the
         <code class="literal">href</code> attribute of the <code class="literal">&lt;include&gt;</code>
         element described in <a class="xref" href="#ugr.ref.xml.cpe_descriptor.imports" title="3.3.&nbsp;Imports">Section&nbsp;3.3, &#8220;Imports&#8221;</a>. The
         <code class="literal">time</code> attribute indicates that a checkpoint should be taken
         every <code class="literal">[Number]</code> seconds, and the <code class="literal">batch</code>
         attribute indicates that a checkpoint should be taken every
         <code class="literal">[Number]</code> batches.</p>
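
       <p>As an illustration of the two strategies, the following fragments sketch a
         time-based and a batch-based checkpoint. The file name
         <code class="literal">cpe_checkpoint.dat</code> and the numeric values are made up for
         this example.</p>

       <pre class="programlisting">&lt;!-- time-based: checkpoint every 300 time units (see the time attribute above) --&gt;
&lt;checkpoint file="cpe_checkpoint.dat" time="300"/&gt;

&lt;!-- count-based: checkpoint every 50 batches --&gt;
&lt;checkpoint file="cpe_checkpoint.dat" batch="50"/&gt;</pre>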

       <p>The <code class="literal">&lt;timerImpl&gt;</code> element is optional. It is used to
         identify a custom timer plug-in class to generate time stamps during the CPM
         execution. The value of the element is a Java class name.</p>

       <p>The <code class="literal">&lt;deployAs&gt;</code> element indicates the type of CPM
         deployment. Valid contents for this element include:

         </p><div class="variablelist"><dl><dt><span class="term">vinciService</span></dt><dd><p>Vinci service exposing APIs for stop, pause, resume, and
               getStats</p></dd><dt><span class="term">interactive</span></dt><dd><p>provide command line menus (start, stop, pause,
               resume)</p></dd><dt><span class="term">immediate</span></dt><dd><p>run the CPM without menus or a service API</p></dd><dt><span class="term">single-threaded</span></dt><dd><p>run the CPM in a single threaded mode. In this mode, the
               Collection Reader, the Processing Pipeline, and the CAS Consumer Pipeline
               are all running in one thread without the work queue and the output
               queue.</p></dd></dl></div>
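
       <p>Putting these elements together, a <code class="literal">&lt;cpeConfig&gt;</code>
         block for a single-threaded run with a custom timer might look like the sketch
         below. The class name <code class="literal">org.example.MyTimer</code> is
         hypothetical; the other values mirror the example descriptor in this chapter.</p>

       <pre class="programlisting">&lt;cpeConfig&gt;
  &lt;numToProcess&gt;1&lt;/numToProcess&gt;
  &lt;deployAs&gt;single-threaded&lt;/deployAs&gt;
  &lt;checkpoint file="" time="3000"/&gt;
  &lt;!-- hypothetical timer plug-in class --&gt;
  &lt;timerImpl&gt;org.example.MyTimer&lt;/timerImpl&gt;
&lt;/cpeConfig&gt;</pre>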

     </div>

     <div class="section" title="3.8.&nbsp;Resource Manager Configuration"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor.resource_manager_configuration">3.8.&nbsp;Resource Manager Configuration</h2></div></div></div>


       <p>External resource bindings for the CPE may optionally be specified in an
         element:


         </p><pre class="programlisting">&lt;resourceManagerConfiguration href="..."/&gt;</pre>

       <p>For an introduction to external resources, refer to <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aae.accessing_external_resource_files" class="olink">Section&nbsp;1.5.4, &#8220;Accessing External Resources&#8221;</a>.</p>

       <p>In the <code class="literal">resourceManagerConfiguration</code> element, the value
         of the href attribute refers to another file that contains definitions and bindings
         for the external resources used by the CPE. The format of this file is the same as the XML
         snippet in <a href="references.html#ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings" class="olink">Section&nbsp;2.4.2.4, &#8220;External Resource Bindings&#8221;</a>.
         For example, in a CPE containing an aggregate analysis engine with two annotators,
         and a CAS Consumer, the following resource manager configuration file would bind
         external resource dependencies in all three components to the same physical
         resource:


         </p><pre class="programlisting">&lt;resourceManagerConfiguration&gt;

   &lt;!-- Declare Resource --&gt;

   &lt;externalResources&gt;
     &lt;externalResource&gt;
       &lt;name&gt;ExampleResource&lt;/name&gt;
       &lt;fileResourceSpecifier&gt;
         &lt;fileUrl&gt;file:MyResourceFile.dat&lt;/fileUrl&gt;
       &lt;/fileResourceSpecifier&gt;
     &lt;/externalResource&gt;
   &lt;/externalResources&gt;

   &lt;!-- Bind component resource dependencies to ExampleResource --&gt;

   &lt;externalResourceBindings&gt;
     &lt;externalResourceBinding&gt;
       &lt;key&gt;MyAE/annotator1/myResourceKey&lt;/key&gt;
       &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;

     &lt;externalResourceBinding&gt;
       &lt;key&gt;MyAE/annotator2/someResourceKey&lt;/key&gt;
       &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;

     &lt;externalResourceBinding&gt;
       &lt;key&gt;MyCasConsumer/otherResourceKey&lt;/key&gt;
       &lt;resourceName&gt;ExampleResource&lt;/resourceName&gt;
     &lt;/externalResourceBinding&gt;

   &lt;/externalResourceBindings&gt;

 &lt;/resourceManagerConfiguration&gt;</pre>

       <p>In this example, <code class="literal">MyAE</code> and
         <code class="literal">MyCasConsumer</code> are the names of the Analysis Engine and CAS
         Consumer, as specified by the name attributes of the CPE's
         <code class="literal">&lt;casProcessor&gt;</code> elements.
         <code class="literal">annotator1</code> and <code class="literal">annotator2</code> are the
         annotator keys specified within the Aggregate AE Descriptor, and
         <code class="literal">myResourceKey</code>, <code class="literal">someResourceKey</code>, and
         <code class="literal">otherResourceKey</code> are the keys of the resource dependencies
         declared in the individual annotator and CAS Consumer descriptors.</p>

     </div>

     <div class="section" title="3.9.&nbsp;Example CPE Descriptor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xml.cpe_descriptor.descriptor.example">3.9.&nbsp;Example CPE Descriptor</h2></div></div></div>


       <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
 &lt;cpeDescription&gt;
   &lt;collectionReader&gt;
     &lt;collectionIterator&gt;
       &lt;descriptor&gt;
         &lt;import location=
            "../collection_reader/FileSystemCollectionReader.xml"/&gt;
       &lt;/descriptor&gt;
     &lt;/collectionIterator&gt;
   &lt;/collectionReader&gt;
   &lt;casProcessors dropCasOnException="true" casPoolSize="1"
       processingUnitThreadCount="1"&gt;
     &lt;casProcessor deployment="integrated"
       name="Aggregate TAE - Name Recognizer and Person Title Annotator"&gt;
       &lt;descriptor&gt;
         &lt;import location=
            "../analysis_engine/NamesAndPersonTitles_TAE.xml"/&gt;
       &lt;/descriptor&gt;
       &lt;deploymentParameters/&gt;
       &lt;filter/&gt;
       &lt;errorHandling&gt;
         &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
         &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
         &lt;timeout max="100000"/&gt;
       &lt;/errorHandling&gt;
       &lt;checkpoint batch="1"/&gt;
     &lt;/casProcessor&gt;
     &lt;casProcessor deployment="integrated" name="Annotation Printer"&gt;
       &lt;descriptor&gt;
         &lt;import location="../cas_consumer/AnnotationPrinter.xml"/&gt;
       &lt;/descriptor&gt;
       &lt;deploymentParameters/&gt;
       &lt;filter/&gt;
       &lt;errorHandling&gt;
         &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
         &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
         &lt;timeout max="100000"/&gt;
       &lt;/errorHandling&gt;
       &lt;checkpoint batch="1"/&gt;
     &lt;/casProcessor&gt;
   &lt;/casProcessors&gt;
   &lt;cpeConfig&gt;
     &lt;numToProcess&gt;1&lt;/numToProcess&gt;
     &lt;deployAs&gt;immediate&lt;/deployAs&gt;
     &lt;checkpoint file="" time="3000"/&gt;
     &lt;timerImpl/&gt;
   &lt;/cpeConfig&gt;
 &lt;/cpeDescription&gt;</pre>
     </div>

 <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e1067" href="#d5e1067" class="para">3</a>] </sup>Deprecated</p></div><div class="footnote"><p><sup>[<a id="ftn.d5e1266" href="#d5e1266" class="para">4</a>] </sup>An earlier UIMA version required these to have a
             suffix of <span class="quote">&#8220;<span class="quote">_p</span>&#8221;</span>, e.g., <span class="quote">&#8220;<span class="quote">string_p</span>&#8221;</span>. This is no
             longer required, but this format is accepted, also, for backward
             compatibility.</p></div></div></div>
   <div class="chapter" title="Chapter&nbsp;4.&nbsp;CAS Reference" id="ugr.ref.cas"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;4.&nbsp;CAS Reference</h2></div></div></div>


   <p>The CAS (Common Analysis System) is the part of the Unstructured Information
     Management Architecture (UIMA) that is concerned with creating and handling the data
     that annotators manipulate.</p>

   <p>Java users typically use the JCas (Java interface to the CAS) when manipulating
     objects in the CAS. This chapter describes an alternative interface to the CAS which
     allows discovery and specification of types and features at run time. It is recommended
     when the calling code cannot know ahead of time which type system it will be dealing
     with.</p>

   <p>Use of the CAS as described here is also recommended (or necessary) when components add
   to the definitions of types of other components.  This UIMA feature allows users to add features
   to a type that was already defined elsewhere.  When this feature is used in conjunction with the
   JCas, it can lead to problems with class loading.  This is because different JCas representations
   of a single type are generated by the different components, and only one of them is loaded
   (unless you are using Pear descriptors).  Note:
   we do not recommend that you add features to pre-existing types.  A type should be defined in one
   place only, and then there is no problem with using the JCas.  However, if you do use this feature,
   do not use the JCas.  Similarly, if you distribute your components for inclusion in somebody else's
   UIMA application, and you're not sure that they won't add features to your types, do not use the
   JCas for the same reasons.
   </p>

   <div class="section" title="4.1.&nbsp;Javadocs"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.javadocs">4.1.&nbsp;Javadocs</h2></div></div></div>


     <p>The subdirectory <code class="literal">docs/api</code> contains the documentation
       details of all the classes, methods, and constants for the APIs discussed here. Please
       refer to this for details on the methods, classes and constants, specifically in the
       packages <code class="literal">org.apache.uima.cas.*</code>.</p>
   </div>

   <div class="section" title="4.2.&nbsp;CAS Overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.overview">4.2.&nbsp;CAS Overview</h2></div></div></div>


     <p>There are three<sup>[<a name="d5e1615" href="#ftn.d5e1615" class="footnote">5</a>]</sup> main parts to the CAS: the type system, data creation and
       manipulation, and indexing.  We will start with a brief
       description of these components.</p>
     <div class="section" title="4.2.1.&nbsp;The Type System"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.type_system">4.2.1.&nbsp;The Type System</h3></div></div></div>


       <p>The type system specifies what kind of data you will be able to manipulate in your
         annotators. The type system defines two kinds of entities, types and features. Types
         are arranged in a single inheritance tree and define the kinds of entities (objects)
         you can manipulate in the CAS. Features optionally specify slots or fields within a
         type. The correspondence to Java is to equate a CAS Type to a Java Class, and the CAS
         Features to fields within the type. A critical difference is that CAS types have no
         methods; they are just data structures with named slots (features). These features can
         have as values primitive things like integers, floating point numbers, and strings,
         and they also can hold references to other instances of objects in the CAS. We call
         instances of the data structures declared by the type system <span class="quote">&#8220;<span class="quote">feature
         structures</span>&#8221;</span> (not to be confused with <span class="quote">&#8220;<span class="quote">features</span>&#8221;</span>). Feature
         structures are similar to the many variants of record structures found in computer
         science.<sup>[<a name="d5e1624" href="#ftn.d5e1624" class="footnote">6</a>]</sup></p>

       <p>Each CAS Type defines a supertype; it is a subtype of that supertype. This means
         that any features that the supertype defines are features of the subtype; in other
         words, it inherits its supertype's features. Only single inheritance is
         supported; a type's feature set is the union of all of the features in its
         supertype hierarchy. There is a built-in type called uima.cas.TOP; this is the top,
         root node of the inheritance tree. It defines no features.</p>

       <p>The values that can be stored in features are either built-in primitive values or
         references to other feature structures. The primitive values are
         <code class="literal">boolean</code>, <code class="literal">byte</code>,
         <code class="literal">short</code> (16 bit integers), <code class="literal">integer</code> (32
         bit), <code class="literal">long</code> (64 bit), <code class="literal">float</code> (32 bit),
         <code class="literal">double</code> (64 bit floats) and strings; the official names of these
         are <code class="literal">uima.cas.Boolean</code>, <code class="literal">uima.cas.Byte</code>,
         <code class="literal">uima.cas.Short</code>, <code class="literal">uima.cas.Integer</code>,
         <code class="literal">uima.cas.Long</code>, <code class="literal">uima.cas.Float</code>,
         <code class="literal">uima.cas.Double</code> and <code class="literal">uima.cas.String</code>.
         The strings are Java strings, and characters are Java characters.  Technically, this means
         that characters are UTF-16 code units, which are not quite the same as Unicode characters:
         a character outside the Basic Multilingual Plane occupies two code units.
         This distinction should make no difference for almost all applications.
         The CAS also defines other basic built-in types for arrays of these, plus arrays of
         references to other objects, called <code class="literal">uima.cas.IntegerArray</code>,
         <code class="literal">uima.cas.FloatArray</code>,
         <code class="literal">uima.cas.StringArray</code>,
         <code class="literal">uima.cas.FSArray</code>, etc.</p>

       <p>The CAS also defines a built-in type called
         <code class="literal">uima.tcas.Annotation</code> which inherits from
         <code class="literal">uima.cas.AnnotationBase</code> which in turn inherits from
         <code class="literal">uima.cas.TOP</code>. There are two features defined by this type,
         called <code class="literal">begin</code> and <code class="literal">end</code>, both of which are
         integer valued.</p>
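
       <p>To make the inheritance rules concrete, here is a sketch of a type description
         for a hypothetical type <code class="literal">org.example.Token</code>. It
         inherits <code class="literal">begin</code> and <code class="literal">end</code> from
         <code class="literal">uima.tcas.Annotation</code> and adds one String-valued feature of
         its own. The type and feature names are invented for this example.</p>

       <pre class="programlisting">&lt;typeDescription&gt;
  &lt;name&gt;org.example.Token&lt;/name&gt;
  &lt;description&gt;A hypothetical token annotation.&lt;/description&gt;
  &lt;supertypeName&gt;uima.tcas.Annotation&lt;/supertypeName&gt;
  &lt;features&gt;
    &lt;featureDescription&gt;
      &lt;name&gt;partOfSpeech&lt;/name&gt;
      &lt;description&gt;Part-of-speech tag of the token.&lt;/description&gt;
      &lt;rangeTypeName&gt;uima.cas.String&lt;/rangeTypeName&gt;
    &lt;/featureDescription&gt;
  &lt;/features&gt;
&lt;/typeDescription&gt;</pre>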

     </div>

     <div class="section" title="4.2.2.&nbsp;Creating, accessing and manipulating data"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.creating_accessing_manipulating_data">4.2.2.&nbsp;Creating, accessing and manipulating data</h3></div></div></div>


       <p>
         Creating and accessing data in the CAS requires knowledge about the types and features
         defined in the type system.  The idea is similar to other data access APIs, such as the XML
         DOM or SAX APIs, or database access APIs such as JDBC.  Contrary to those APIs, however, the
         CAS does not use the names of type system entities directly in the APIs.  Rather, you use
         the type system to access type and feature entities by name, then use these entities in the
         data manipulation APIs.  This can be compared to the Java reflection APIs: the type system
         is comparable to the Java class loader, and the type and feature objects to the
         <code class="literal">java.lang.Class</code> and <code class="literal">java.lang.reflect.Field</code> classes.
       </p>
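
       <p>The reflection-like pattern described above can be sketched as follows. The
         type <code class="literal">org.example.Person</code> and its
         <code class="literal">name</code> feature are hypothetical and are assumed to be
         defined in the type system of the CAS that is passed in.</p>

       <pre class="programlisting">import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.TypeSystem;

public class GenericCasAccess {

  // Creates a feature structure of a type that is known only by name.
  public static FeatureStructure createPerson(CAS cas, String personName) {
    TypeSystem typeSystem = cas.getTypeSystem();

    // Step 1: look up the type and feature objects by name.
    Type personType = typeSystem.getType("org.example.Person"); // hypothetical type
    if (personType == null) {
      throw new IllegalStateException("Type org.example.Person is not defined");
    }
    Feature nameFeature = personType.getFeatureByBaseName("name"); // hypothetical feature

    // Step 2: use those objects in the data manipulation APIs.
    FeatureStructure person = cas.createFS(personType);
    person.setStringValue(nameFeature, personName);
    return person;
  }
}</pre>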

       <p>
         Why does it have to be this complicated?  You wouldn't normally use reflection to create a
         Java object, either.  As mentioned earlier, the JCas provides the more straightforward
         method to manipulate CAS data.  The CAS access methods described here need only be used for
         generic types of applications that need to be able to handle any kind of data (e.g., generic
         tooling) or when the JCas may not be used for other reasons.  The generic kinds of applications
         are exactly the ones where you would use the reflection API in Java as well.
       </p>

     </div>

     <div class="section" title="4.2.3.&nbsp;Creating and using indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.creating_using_indexes">4.2.3.&nbsp;Creating and using indexes</h3></div></div></div>


       <p>Each view of a CAS provides a set of indexes for that view. Instances of Types (that is, Feature
         Structures) can be added to a view's indexes. These indexes provide
         a way for annotators to locate existing data in the CAS, using a specific index (or the
         method <code class="literal">getAllIndexedFS</code> of the object <code class="literal">FSIndexRepository</code>) to
         retrieve the Feature Structures that were previously created.
         Newly created Feature Structures are not automatically added to the indexes; you choose which
         Feature Structures to add and use one of several APIs to add them.
         </p>

       <p>Indexes are named and are associated with a CAS Type; they are used to index
         instances of that CAS type (including instances of that type's subtypes). If
         you are using multiple views (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.mvs" class="olink">Chapter&nbsp;6, <i>Multiple CAS Views of an Artifact</i></a>),
         each view contains a separate instantiation of all of the indexes.
         To access an index, you
         minimally need to know its name. A CAS view provides an index repository which you can
         query for indexes for that view. Once you have a handle to an index, you can get
         information about the feature structures in the index, the size of the index, as well
         as an iterator over the feature structures.</p>

       <p>There are three kinds of indexes:
         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem">
             <p>bag - no ordering</p>
           </li><li class="listitem">
             <p>set - uses a user-specified set of keys to define equality; holds one instance of the set of equal items.</p>
           </li><li class="listitem">
             <p>sorted - uses a user-specified set of keys to define ordering.</p>
           </li></ul></div><p>
       </p>

       <p>For set indexes, the comparator keys are augmented with an implicit additional field - the type of the
         feature structure.  This means that an index over Annotations, having subtype Token, and a key of the "begin" value,
         will behave as follows:

         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes,
             only one of them will be in the index.</p>
           </li><li class="listitem"><p>If you make 1 Token and 1 Annotation, both having a begin value of 17, and add both of them to the indexes,
             both of them will be in the index (because the types are different).
           </p></li></ul></div><p>
       </p>

       <p>Indexes are defined in the XML descriptor metadata for the application. Each CAS
         View has its own, separate instantiation of indexes based on these definitions,
         kept in the view's index repository. When you obtain an index, it is always from a
         particular CAS view's index repository.
         When you index an item, it is always added to all indexes where it
         belongs, within just the view's repository. You can specify different repositories
         (associated with different CAS views) to use; a given Feature Structure instance
         may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</p>

       <p>Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.</p>

       <p>You can also get iterators from indexes;
         iterators allow you to enumerate the feature structures in an index.  There are two kinds of iterators supported:
         the regular Java iterator API, and a specific FS iterator API
         where the usual Java iterator APIs (<code class="literal">hasNext()</code> and <code class="literal">next()</code>)
         are augmented by <code class="literal">isValid()</code>, <code class="literal">moveToNext() / moveToPrevious()</code> (which does
         not return an element) and <code class="literal">get()</code>.  Finally, there is a <code class="literal">moveTo(FeatureStructure)</code>
         API, which, for sorted indexes, moves the iteration point to the left-most (among otherwise "equal") item
         in the index which compares "equal" to the given FeatureStructure, using the index's defined comparator.
       </p>

       <p>
         Which API style you use is up to you,
         but we do not recommend mixing the styles as the results are sometimes unexpected.  If you
         just want to iterate over an index from start to finish, either style is equally appropriate.
         If you also use <code class="literal">moveTo(FeatureStructure fs)</code> and
         <code class="literal">moveToPrevious()</code>, it is better to use the special FS iterator style.
       </p>

       <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The reason to not mix these styles is that you might be thinking that
         next() followed by moveToPrevious() would always work.  This is not true, because
         next() returns the "current" element, and advances to the next position, which might be
         beyond the last element.  At that point, the iterator becomes "invalid", and
         moveToNext and moveToPrevious no longer move the iterator.  You can reset it by
         calling moveToFirst(), moveToLast(), or moveTo(FS).</p></div>
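
       <p>The two iteration styles can be compared side by side in the sketch below, which
         walks the built-in annotation index of a view twice. The class and method names
         are illustrative only.</p>

       <pre class="programlisting">import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.cas.text.AnnotationIndex;

public class IterationStyles {

  public static void printAll(CAS view) {
    AnnotationIndex&lt;AnnotationFS&gt; index = view.getAnnotationIndex();

    // Style 1: plain Java iteration (hasNext()/next() behind the enhanced for loop).
    for (AnnotationFS annotation : index) {
      System.out.println(annotation.getBegin() + "-" + annotation.getEnd());
    }

    // Style 2: FS iterator style; get() reads the current element without moving.
    FSIterator&lt;AnnotationFS&gt; it = index.iterator();
    for (it.moveToFirst(); it.isValid(); it.moveToNext()) {
      AnnotationFS annotation = it.get();
      System.out.println(annotation.getBegin() + "-" + annotation.getEnd());
    }
  }
}</pre>

       <p>Each loop sticks to a single style, which avoids the invalid-iterator situation
         described in the note.</p>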

       <p>Indexes are created by specifying them in the annotator's or
         aggregate's resource descriptor. An index specification includes its name,
         the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.
         The keys are used for set and sorted indexes, and specify what values are used for
         ordering, or (for sets) what values are used to determine set equality.
         When a CAS pipeline is created, all index
         specifications are combined; duplicate definitions (having the same name) are
         allowed only if their definitions are the same. </p>
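
       <p>For orientation, an index specification in a descriptor might look like the
         following sketch, which declares a sorted index over a hypothetical type
         <code class="literal">org.example.Token</code> keyed on its
         <code class="literal">begin</code> feature. The label and type name are invented for
         this example; see the index definition section of the component descriptor
         reference for the authoritative element list.</p>

       <pre class="programlisting">&lt;fsIndexCollection&gt;
  &lt;fsIndexes&gt;
    &lt;fsIndexDescription&gt;
      &lt;label&gt;TokensByBegin&lt;/label&gt;
      &lt;typeName&gt;org.example.Token&lt;/typeName&gt;
      &lt;kind&gt;sorted&lt;/kind&gt;
      &lt;keys&gt;
        &lt;fsIndexKey&gt;
          &lt;featureName&gt;begin&lt;/featureName&gt;
          &lt;comparator&gt;standard&lt;/comparator&gt;
        &lt;/fsIndexKey&gt;
      &lt;/keys&gt;
    &lt;/fsIndexDescription&gt;
  &lt;/fsIndexes&gt;
&lt;/fsIndexCollection&gt;</pre>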

       <p>Feature structure instances need to be explicitly added to the index repository by a
         method call. Feature structures that are not indexed will not be visible to other
         annotators unless they can be reached through a reference, or a chain of references,
         from some other feature structure that is indexed.</p>

       <p>The framework defines an unnamed bag index which indexes all types.  The
         only access provided for this index is the getAllIndexedFS(type) method on the
         index repository, which returns an iterator over all indexed instances of the
         specified type (including its subtypes) for that CAS View.
       </p>
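
       <p>A minimal sketch of explicit indexing followed by retrieval through
         <code class="literal">getAllIndexedFS</code> is shown below. The class and method
         names are illustrative, and the view is assumed to hold a text Sofa so that
         annotations can be created in it.</p>

       <pre class="programlisting">import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;

public class IndexingSketch {

  // Adds one annotation to the view's indexes, then counts all indexed annotations.
  public static int addAndCount(CAS view) {
    Type annotationType = view.getTypeSystem().getType("uima.tcas.Annotation");

    // Creating the feature structure does not index it ...
    AnnotationFS span = view.createAnnotation(annotationType, 0, 5);
    // ... it becomes visible through the indexes only after this call.
    view.addFsToIndexes(span);

    int count = 0;
    FSIterator&lt;FeatureStructure&gt; all =
        view.getIndexRepository().getAllIndexedFS(annotationType);
    while (all.hasNext()) {
      all.next();
      count++;
    }
    return count;
  }
}</pre>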

       <p>The framework defines one standard, built-in annotation index, called
         AnnotationIndex, which indexes the <code class="literal">uima.tcas.Annotation</code>
         type: all feature structures of type <code class="literal">uima.tcas.Annotation</code> or
         its subtypes are automatically indexed with this built-in index.</p>

       <p>This index orders annotations first by the value of the
         <span class="quote">&#8220;<span class="quote">begin</span>&#8221;</span> feature (in ascending order), then by the value of the
         <span class="quote">&#8220;<span class="quote">end</span>&#8221;</span> feature (in descending order), and finally by the
         Type Priority. This ordering ensures that
         longer annotations starting at the same spot come before shorter ones. For Subjects
         of Analysis other than Text, this may not be an appropriate index.</p>

       <p>In addition to normal iterators, there is a <code class="literal">select</code> API, documented
        in the Version 3 User's Guide, which provides additional capabilities for accessing
        Feature Structures via the indexes.</p>

     </div>
   </div>

   <div class="section" title="4.3.&nbsp;Built-in CAS Types"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.builtin_types">4.3.&nbsp;Built-in CAS Types</h2></div></div></div>


     <p>The CAS has two kinds of built-in types &#8211; primitive and non-primitive. The
       primitive types are:

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>uima.cas.Boolean</p></li><li class="listitem"><p>uima.cas.Byte</p></li><li class="listitem"><p>uima.cas.Short</p></li><li class="listitem"><p>uima.cas.Integer</p></li><li class="listitem"><p>uima.cas.Long</p></li><li class="listitem"><p>uima.cas.Float</p></li><li class="listitem"><p>uima.cas.Double</p></li><li class="listitem"><p>uima.cas.String</p></li></ul></div>

     <p>The <code class="literal">Byte, Short, Integer, </code>and<code class="literal"> Long</code> types are
       all signed integer types, of length 8, 16, 32, and 64 bits, respectively. The
       <code class="literal">Float</code> type is 32 bit floating point, and the
       <code class="literal">Double</code> type is 64 bit floating point. The
       <code class="literal">String</code> type can be subtyped to create sets of allowed values; see
         <a href="references.html#ugr.ref.xml.component_descriptor.type_system.string_subtypes" class="olink">Section&nbsp;2.3.4, &#8220;String Subtypes&#8221;</a>.
       These subtypes can be used to specify the range of a String-valued feature. They act like
       Strings, but have additional checking to ensure that any value set into them
       is one of the allowed values, or null (which is the value if it is not set).
       Note that the other primitive types cannot be used
       as a supertype for another type definition; only
       <code class="literal">uima.cas.String</code> can be subtyped.</p>

     <p>The non-primitive types exist in a type hierarchy; the top of the hierarchy is the
       type <code class="literal">uima.cas.TOP</code>. All other non-primitive types inherit from
       some supertype.</p>

     <p>There are 9 built-in array types. The size of an array is specified when it is
       created and cannot be changed afterwards. They are named:

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>uima.cas.BooleanArray</p></li><li class="listitem"><p>uima.cas.ByteArray</p></li><li class="listitem"><p>uima.cas.ShortArray</p></li><li class="listitem"><p>uima.cas.IntegerArray</p></li><li class="listitem"><p>uima.cas.LongArray</p></li><li class="listitem"><p>uima.cas.FloatArray</p></li><li class="listitem"><p>uima.cas.DoubleArray</p></li><li class="listitem"><p>uima.cas.StringArray</p></li><li class="listitem"><p>uima.cas.FSArray</p></li></ul></div>

     <p>The <code class="literal">uima.cas.FSArray</code> type is an array whose elements are
       arbitrary other feature structures (instances of non-primitive types).</p>

     <p>The JCas cover classes for the array types support the Iterable API, so you may
     write extended for loops over instances of these.  For example:
     </p><pre class="programlisting">FSArray&lt;MyType&gt; myArray = ...
 for (MyType fs : myArray) {
   some_method(fs);
 }</pre><p>
     </p>

     <p>There are 3 built-in types associated with the artifact being analyzed:

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>uima.cas.AnnotationBase</p></li><li class="listitem"><p>uima.tcas.Annotation</p></li><li class="listitem"><p>uima.tcas.DocumentAnnotation</p></li></ul></div>

     <p>The <code class="literal">AnnotationBase</code> type defines one system-used feature
       which specifies for an annotation the subject of analysis (Sofa) to which it refers. The
       Annotation type extends from this and defines 2 features, taking
       <code class="literal">uima.cas.Integer</code> values, called <code class="literal">begin</code>
       and <code class="literal">end</code>. The <code class="literal">begin</code> feature typically
       identifies the start of a span of text the annotation covers; the
       <code class="literal">end</code> feature identifies the end. The values refer to character
       offsets; the starting index is 0. An annotation of the word <span class="quote">&#8220;<span class="quote">CAS</span>&#8221;</span> in a text
       <span class="quote">&#8220;<span class="quote">CAS Reference</span>&#8221;</span> would have a start index of 0, and an end index of 3; the
       difference between end and start is the length of the span the annotation refers
       to.</p>
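     <p>The offset arithmetic can be checked with ordinary Java string operations. The sketch below uses
       no UIMA classes; <code class="literal">OffsetSketch</code> and <code class="literal">coveredText</code>
       are hypothetical names used only for illustration, and the sketch assumes the document text is held
       in a Java <code class="literal">String</code>.</p>

```java
// Plain-Java illustration of begin/end character offsets; no UIMA classes are used.
final class OffsetSketch {

  // Returns the text covered by a span: begin is the offset of the first
  // character, end is the offset just past the last character.
  static String coveredText(String documentText, int begin, int end) {
    return documentText.substring(begin, end);
  }

  public static void main(String[] args) {
    String text = "CAS Reference";
    int begin = 0;
    int end = 3;
    System.out.println(coveredText(text, begin, end)); // prints "CAS"
    System.out.println(end - begin);                   // prints 3, the span length
  }
}
```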

     <p>Annotations are always with respect to some Sofa (Subject of Analysis &#8211; see
         <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
         <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, <i>Annotations, Artifacts, and Sofas</i></a>).</p>
     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Artifacts which are not text strings may have a different interpretation of
     the meaning of begin and end, or may define their own kind of annotation, extending from
     <code class="literal">AnnotationBase</code>. </p></div>

     <p><a name="ugr.ref.cas.document_annotation"></a>The <code class="literal">DocumentAnnotation</code> type has one special instance. It is
       a subtype of the Annotation type, and the built-in definition defines one feature,
       <code class="literal">language</code>, which is a string indicating the language of the
       document in the CAS. The value of this language feature is used by the system to control
       flow among annotators when the <span class="quote">&#8220;<span class="quote">CapabilityLanguageFlow</span>&#8221;</span> mode is used,
       allowing the flow to skip over annotators that don't process particular
       languages. Users may extend this type by adding additional features to it, using the XML
       Descriptor element for defining a type.</p>

     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
       We do <span class="emphasis"><em>not</em></span> recommend extending the <code class="literal">DocumentAnnotation</code>
       type.  If you do, you must <span class="emphasis"><em>not</em></span> use the JCas, for the reasons stated
       earlier.
     </p></div>

     <p>Each CAS view has a different associated instance of the
       <code class="literal">DocumentAnnotation</code> type.  On the CAS, use
       <code class="literal">getDocumentAnnotation()</code> to access the
       <code class="literal">DocumentAnnotation</code>.</p>

     <p>There are also built-in types supporting linked lists, similar to the ones available in
     Java and other programming languages. Their use is
       constrained by the usual properties of linked lists: not very space efficient, no (efficient)
       random access, but an easy choice if you don't know how long your list will be ahead of time. The
       implementation is type specific; there are different list building objects for the
       Float, Integer, and String primitive types, plus one for general feature structures. Here are the type names:
       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>uima.cas.FloatList</p></li><li class="listitem"><p>uima.cas.IntegerList</p></li><li class="listitem"><p>uima.cas.StringList</p></li><li class="listitem"><p>uima.cas.FSList</p>
           <p></p></li><li class="listitem"><p>uima.cas.EmptyFloatList</p></li><li class="listitem"><p>uima.cas.EmptyIntegerList</p></li><li class="listitem"><p>uima.cas.EmptyStringList</p></li><li class="listitem"><p>uima.cas.EmptyFSList</p>
           <p></p></li><li class="listitem"><p>uima.cas.NonEmptyFloatList</p></li><li class="listitem"><p>uima.cas.NonEmptyIntegerList</p></li><li class="listitem"><p>uima.cas.NonEmptyStringList</p></li><li class="listitem"><p>uima.cas.NonEmptyFSList</p></li></ul></div>

     <p>For each of the element kinds <code class="literal">Float</code>,
       <code class="literal">Integer</code>, <code class="literal">String</code>, and
       <code class="literal">FS</code> (general feature structures), there is a base type, for instance,
       <code class="literal">uima.cas.FloatList</code>. Each base type has two subtypes:
       one for a non-empty element, and one that serves as a marker indicating the end of the
       list, or an empty list. The non-empty types define two features &#8211;
       <code class="literal">head</code> and <code class="literal">tail</code>. The head feature holds the
       particular value for that part of the list. The tail refers to the next list object
       (either a non-empty one or the empty version to indicate the end of the list).</p>
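     <p>The head/tail structure can be modeled in a few lines of plain Java. The sketch below is not the
       UIMA implementation: <code class="literal">ConsListSketch</code> and its <code class="literal">Node</code>
       class are hypothetical, with a single shared <code class="literal">EMPTY</code> node standing in for the
       empty-list marker type.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the head/tail list structure described above.
// Node is a hypothetical class, not one of the UIMA list types.
final class ConsListSketch {

  static final class Node {
    final String head; // value held at this position (unused in the empty marker)
    final Node tail;   // next list object (null only in the empty marker)

    private Node(String head, Node tail) {
      this.head = head;
      this.tail = tail;
    }

    // Marker that represents the empty list and the end of every list.
    static final Node EMPTY = new Node(null, null);

    boolean isEmpty() {
      return this == EMPTY;
    }

    // Creates a new non-empty node whose tail is the node it is called on.
    Node push(String item) {
      return new Node(item, this);
    }
  }

  // Walks the list from the first node until the empty marker is reached.
  static List<String> toList(Node list) {
    List<String> out = new ArrayList<>();
    for (Node n = list; n != null && !n.isEmpty(); n = n.tail) {
      out.add(n.head);
    }
    return out;
  }

  public static void main(String[] args) {
    Node list = Node.EMPTY.push("2").push("1");
    System.out.println(toList(list)); // prints [1, 2]
  }
}
```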

     <p>For JCas users, the new operator for the NonEmptyXyzList classes includes a 3 argument version
     where you may specify the head and tail values as part of the constructor.  The JCas
     cover classes for these implement
     a <code class="code">push(item)</code> method which creates a new non-empty node, sets the <code class="code">head</code> value
     to <code class="code">item</code>, and the tail to the node it is called on, and returns the new node.
     These classes also implement Iterable, so you can use the enhanced Java <code class="code">for</code> operator.
     The iterator stops when it gets to the end of the list, determined by either the tail being null or
     the element being one of the EmptyXyzList elements.
     Here's a StringList example:
     </p><pre class="programlisting">StringList sl = jcas.emptyStringList();
 sl = sl.push("2");
 sl = sl.push("1");

 for (String s : sl) {
   someMethod(s);  // some sample use
 }</pre><p>

     </p>

     <p>There are no other built-in types. Users are free to define their own type systems,
       building upon these types.</p>

   </div>

   <div class="section" title="4.4.&nbsp;Accessing the type system"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.accessing_the_type_system">4.4.&nbsp;Accessing the type system</h2></div></div></div>


     <p>
       During annotator processing, or outside an annotator, access the type system by calling
       <code class="literal">CAS.getTypeSystem()</code>.
     </p>

     <p>However, CAS annotators implement an additional method,
       <code class="literal">typeSystemInit()</code>, which is called by the UIMA framework before the
       annotator's process method. This method, implemented by the annotator writer,
       is passed a reference to the CAS's type system metadata. The method typically uses
       the type system APIs to obtain type and feature objects corresponding to all the types
       and features the annotator will be using in its process method. This initialization
       step should not be done during an annotator's initialize method since the type
       system can change after the initialize method is called; it should not be done during the
       process method, since this is presumably work that is identical for each incoming
       document, and so should be performed only when the type system changes (which will be a
       rare event). The UIMA framework guarantees it will call the <code class="literal">typeSystemInit
       </code>method of an annotator whenever the type system changes, before calling the
       annotator's <code class="literal">process()</code> method.</p>

     <p>The initialization done by <code class="literal">typeSystemInit()</code> is done by the
       UIMA framework when you use the JCas APIs; you only need to provide a
       <code class="literal">typeSystemInit()</code> method, as described here, when you are not using
       the JCas approach.</p>

     <div class="section" title="4.4.1.&nbsp;TypeSystemPrinter example"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.type_system.printer_example">4.4.1.&nbsp;TypeSystemPrinter example</h3></div></div></div>


       <p>Here is a code fragment that, given a CAS Type System, will print a list of all
         types.</p>


       <pre class="programlisting">// Get all type names from the type system
 // and print them to stdout.
 private void listTypes1(TypeSystem ts) {
   System.out.println("Types in the type system:");
   for (Type t : ts) {
     // print its name.
     System.out.println(t.getName());
   }
 }</pre>

       <p>This method is passed the type system as a parameter.  From the type system, we can
         get an iterator
         over all the types. If we run this against a CAS created with no additional
         user-defined types, we should see something like this on the console:</p>

       <pre class="programlisting">Types in the type system:
 uima.cas.Boolean
 uima.cas.Byte
 uima.cas.Short
 uima.cas.Integer
 uima.cas.Long
 uima.cas.ArrayBase
 ...
         </pre>

       <p>If the type system had user-defined types these would show up too. Note that some
         of these types are not directly creatable &#8211; they are types used by the framework
         in the type hierarchy (e.g. uima.cas.ArrayBase).</p>

       <p>CAS type names include a name-space prefix. The components of a type name are
         separated by the dot (.). A type name component must start with a Unicode letter,
         followed by an arbitrary sequence of letters, digits and the underscore (_). By
         convention, the last component of a type name starts with an uppercase letter, the
         rest start with a lowercase letter.</p>
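       <p>The naming rule can be expressed as a regular expression. The following sketch checks only the
         rule as stated in the paragraph above; it is not the check the framework itself performs, and
         <code class="literal">TypeNameSketch</code> and <code class="literal">isWellFormed</code> are
         hypothetical names. It assumes that <span class="quote">&#8220;<span class="quote">letters</span>&#8221;</span>
         and <span class="quote">&#8220;<span class="quote">digits</span>&#8221;</span> mean the Unicode letter and
         decimal digit categories.</p>

```java
import java.util.regex.Pattern;

// Sketch of the type-name rule described above; not the framework's own check.
final class TypeNameSketch {

  // One name component: a Unicode letter, then letters, digits, or underscores.
  private static final String COMPONENT = "\\p{L}[\\p{L}\\p{Nd}_]*";

  // A full name: one or more components separated by dots.
  private static final Pattern TYPE_NAME =
      Pattern.compile(COMPONENT + "(\\." + COMPONENT + ")*");

  static boolean isWellFormed(String name) {
    return TYPE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("uima.tcas.Annotation")); // true
    System.out.println(isWellFormed("com.xyz.1proj.Entity")); // false: component starts with a digit
    System.out.println(isWellFormed("com.xyz..Entity"));      // false: empty component
  }
}
```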

       <p>Listing the type names is mildly useful, but it would be even better if we could see
         the inheritance relation between the types. The following code prints the
         inheritance tree in indented format.</p>


       <pre class="programlisting">private static final int INDENT = 2;
 private void listTypes2(TypeSystem ts) {
   // Get the root of the inheritance tree.
   Type top = ts.getTopType();
   // Recursively print the tree.
   printInheritanceTree(ts, top, 0);
 }

 private void printInheritanceTree(TypeSystem ts, Type type, int level) {
   indent(level); // Print indentation.
   System.out.println(type.getName());
   // Get the immediate subtypes.
   List&lt;Type&gt; subTypes =
     ts.getDirectlySubsumedTypes(type);
   ++level; // Increase the indentation level.
   for (Type subType : subTypes) {
     // Print the subtypes.
     printInheritanceTree(ts, subType, level);
   }
 }

 // A simple, inefficient indenter
 private void indent(int level) {
   int spaces = level * INDENT;
   for (int i = 0; i &lt; spaces; i++) {
     System.out.print(" ");
   }
 }</pre>

       <p> This example shows that you can traverse the type hierarchy by starting at the top
         with TypeSystem.getTopType and by retrieving subtypes with
         <code class="literal">TypeSystem.getDirectlySubsumedTypes()</code>.</p>

       <p>The type system API (see the Javadocs) also lets you access the features of a type, as well as
         the allowed value type for each feature. Here is sample code which prints out all the
         features of all the types, together with the allowed value types (the feature
         <span class="quote">&#8220;<span class="quote">range</span>&#8221;</span>). Each feature has a <span class="quote">&#8220;<span class="quote">domain</span>&#8221;</span> which is the type
         where it is defined, as well as a <span class="quote">&#8220;<span class="quote">range</span>&#8221;</span>.


         </p><pre class="programlisting">private void listFeatures2(TypeSystem ts) {
   // Iterate over every feature defined in the type system.
   Iterator&lt;Feature&gt; featureIterator = ts.getFeatures();
   System.out.println("Features in the type system:");
   while (featureIterator.hasNext()) {
     Feature f = featureIterator.next();
     // Print the feature name, its domain type, and its range type.
     System.out.println(
       f.getShortName() + ": " +
       f.getDomain() + " -&gt; " + f.getRange());
   }
   System.out.println();
 }</pre>

       <p>We can ask a feature object for its domain (the type it is defined on) and its range
         (the type of the value of the feature). The terminology derives from the fact that
         features can be viewed as functions on subspaces of the object space.</p>

     </div>

     <div class="section" title="4.4.2.&nbsp;Using the CAS APIs to create and modify feature structures"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.cas_apis_create_modify_feature_structures">4.4.2.&nbsp;Using the CAS APIs to create and modify feature structures</h3></div></div></div>


       <p>Assume a type system declaration that defines two types: Entity and Person.
         Entity has no features defined within it but inherits from uima.tcas.Annotation
         &#8211; so it has the begin and end features. Person is, in turn, a subtype of Entity,
         and adds firstName and lastName features. CAS type systems are declaratively
         specified using XML; the format of this XML is described in <a href="references.html#ugr.ref.xml.component_descriptor.type_system" class="olink">Section&nbsp;2.3, &#8220;Type System Descriptors&#8221;</a>.


         </p><pre class="programlisting">&lt;!-- Type System Definition --&gt;
 &lt;typeSystemDescription&gt;
   &lt;types&gt;
     &lt;typeDescription&gt;
       &lt;name&gt;com.xyz.proj.Entity&lt;/name&gt;
       &lt;description /&gt;
       &lt;supertypeName&gt;uima.tcas.Annotation&lt;/supertypeName&gt;
     &lt;/typeDescription&gt;
     &lt;typeDescription&gt;
       &lt;name&gt;com.xyz.proj.Person&lt;/name&gt;
       &lt;description /&gt;
       &lt;supertypeName&gt;com.xyz.proj.Entity&lt;/supertypeName&gt;
       &lt;features&gt;
         &lt;featureDescription&gt;
           &lt;name&gt;firstName&lt;/name&gt;
           &lt;description /&gt;
           &lt;rangeTypeName&gt;uima.cas.String&lt;/rangeTypeName&gt;
         &lt;/featureDescription&gt;
         &lt;featureDescription&gt;
           &lt;name&gt;lastName&lt;/name&gt;
           &lt;description /&gt;
           &lt;rangeTypeName&gt;uima.cas.String&lt;/rangeTypeName&gt;
         &lt;/featureDescription&gt;
       &lt;/features&gt;
     &lt;/typeDescription&gt;
   &lt;/types&gt;
 &lt;/typeSystemDescription&gt;</pre>

   <p>
     To be able to access types and features, we need to know their names.  The CAS interface defines
     constants that hold the names of built-in types and features, such as
     <code class="literal">CAS.TYPE_NAME_INTEGER</code>.  It is good programming practice to create such
     constants for the types and features you define, for your own use as well as for others who will
     be using your annotators.
   </p>


       <pre class="programlisting">/** Entity type name constant. */
 public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";

 /** Person type name constant. */
 public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";

 /** First name feature name constant. */
 public static final String FIRST_NAME_FEAT_NAME = "firstName";

 /** Last name feature name constant. */
 public static final String LAST_NAME_FEAT_NAME = "lastName";</pre>

       <p>Next we define type and feature member variables; these will hold the values of the
         type and feature objects needed by the CAS APIs, to be assigned during
         <code class="literal">typeSystemInit()</code>.</p>


       <pre class="programlisting">// Type system object variables
 private TypeSystem typeSystem;
 private Type entityType;
 private Type personType;
 private Feature firstNameFeature;
 private Feature lastNameFeature;
 private Type stringType;</pre>

       <p>The type system does not throw an exception if we ask for something that is
         not known; it simply returns null. The code therefore checks for this and throws a proper
         exception.  We require all these types and features to be defined for the annotator to
         work.  One might imagine situations where certain computations are predicated on some type
         or feature being defined in the type system, but that is not the case here.</p>


       <pre class="programlisting">// Get a type object corresponding to a name.
 // If it doesn't exist, throw an exception.
 private Type initType(String typeName)
   throws AnnotatorInitializationException {
   Type type = typeSystem.getType(typeName);
   if (type == null) {
     throw new AnnotatorInitializationException(
       AnnotatorInitializationException.TYPE_NOT_FOUND,
       new Object[] { this.getClass().getName(), typeName });
   }
   return type;
 }

 // We add similar code for retrieving feature objects.
 // Get a feature object from a name and a type object.
 // If it doesn't exist, throw an exception.
 private Feature initFeature(String featName, Type type)
   throws AnnotatorInitializationException {
   Feature feat = type.getFeatureByBaseName(featName);
   if (feat == null) {
     throw new AnnotatorInitializationException(
       AnnotatorInitializationException.FEATURE_NOT_FOUND,
       new Object[] { this.getClass().getName(), featName });
   }
   return feat;
 }</pre>

       <p>Using these two functions, code for initializing the type system described
         above would be:


         </p><pre class="programlisting">public void typeSystemInit(TypeSystem aTypeSystem)
     throws AnalysisEngineProcessException {
   this.typeSystem = aTypeSystem;
   try {
     // Set type system member variables.
     this.entityType = initType(ENTITY_TYPE_NAME);
     this.personType = initType(PERSON_TYPE_NAME);
     this.firstNameFeature =
       initFeature(FIRST_NAME_FEAT_NAME, personType);
     this.lastNameFeature =
       initFeature(LAST_NAME_FEAT_NAME, personType);
     this.stringType = initType(CAS.TYPE_NAME_STRING);
   } catch (AnnotatorInitializationException e) {
     // Report the failure using the exception type this method declares.
     throw new AnalysisEngineProcessException(e);
   }
 }</pre>

       <p>Note that we initialize the string type by using a type name constant from the
         CAS.</p>

     </div>
   </div>

   <div class="section" title="4.5.&nbsp;Creating feature structures"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.creating_feature_structures">4.5.&nbsp;Creating feature structures</h2></div></div></div>


     <p>To create feature structures in JCas, we use the Java <span class="quote">&#8220;<span class="quote">new</span>&#8221;</span>
       operator. In the CAS, we use one of several different API methods on the CAS object,
       depending on which of the 10 basic kinds of feature structures we are creating (a plain
       feature structure, or an instance of the built-in primitive type arrays or FSArray).
       There is also a method to create an instance of a
       <code class="literal">uima.tcas.Annotation</code>, setting the begin and end
       values.</p>

     <p>Once a feature structure is created, it needs to be added to the CAS indexes (unless
       it will be accessed via some reference from another accessible feature structure). The
       CAS provides an API for this. Assuming aCAS holds a reference to a CAS, and token holds a
       reference to a newly created feature structure, the following code adds that
       feature structure to all the relevant CAS indexes:</p>


     <pre class="programlisting">    // Add the token to the index repository.
     aCAS.addFsToIndexes(token);</pre>

     <p>There is also a corresponding <code class="literal">removeFsFromIndexes(token)</code>
       method on CAS objects.</p>

     <p>As of version 2.4.1, there are two methods you can use on an index repository
     to efficiently bulk-remove all
     instances of particular types of feature structures from a particular view.  One of these,
     <code class="code">aCas.getIndexRepository().removeAllIncludingSubtypes(aType)</code>, removes all instances of a particular
     type, including instances of subtypes of the specified type.  The other,
     <code class="code">aCas.getIndexRepository().removeAllExcludingSubtypes(aType)</code>, removes only the instances whose type is
     exactly the specified type.  In both cases, the removal is done from the particular view of the CAS referenced
     by aCas.</p>

     <div class="section" title="4.5.1.&nbsp;Updating indexed feature structures"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.updating_indexed_feature_structures">4.5.1.&nbsp;Updating indexed feature structures</h3></div></div></div>

     <p>Version 2.7.0 added protection for indexes against updates to features that are
     used as index keys in feature structures that are already indexed.  By default this
     protection is automatic, but comes at some performance cost.  Users may optimize this further.</p>

     <p>Protection is needed because some of the indexes (the Sorted and Set types) use comparators defined
     to use values of the particular features; if these values
     need to be changed after the feature structure is added to the indexes,
     the correct way to do this is to:
     </p><div class="orderedlist"><ol class="orderedlist" type="1" compact><li class="listitem"><p>completely remove the item from all indexes where it is indexed, in all views
       where it is indexed,</p>
       </li><li class="listitem"><p>update the value of the features being used as keys,</p></li><li class="listitem"><p>add the item back to the indexes, in all views.</p></li></ol></div>

       <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>It&#8217;s OK to change feature values which are not used in determining
       sort ordering (or set membership), without removing and re-adding back to the index.
       </p></div>


     <p>The automatic protection checks for updates of
     features being used as keys, and if it finds an update like this for a feature structure that
     is in the indexes, it removes the feature structure from the indexes, does the update,
     and adds it back.  It will do this for every feature update.  This is not
     efficient when multiple features are being updated; in that case it would be better to
     remove the feature structure, do all the updates to all the features needing updates, and then
     do a single add-back operation.</p>
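     <p>The remove, update, add-back sequence applies to any sorted collection whose comparator reads a
     mutable field. The sketch below is an analogy only: it uses <code class="code">java.util.TreeSet</code> as a
     stand-in for a sorted index, and <code class="code">KeyUpdateSketch</code>, <code class="code">Item</code>, and
     <code class="code">updateKey</code> are hypothetical names, not UIMA APIs.</p>

```java
import java.util.Comparator;
import java.util.TreeSet;

// Plain-Java analogy: a TreeSet stands in for a sorted index whose
// comparator reads a mutable "key" field. None of this is UIMA code.
final class KeyUpdateSketch {

  static final class Item {
    int key;

    Item(int key) {
      this.key = key;
    }
  }

  // Creates an empty "index" sorted by the key field.
  static TreeSet<Item> newIndex() {
    return new TreeSet<>(Comparator.comparingInt((Item i) -> i.key));
  }

  // Safe update: remove while the old key is still in place,
  // change the key, then add the item back under the new key.
  static void updateKey(TreeSet<Item> index, Item item, int newKey) {
    index.remove(item);
    item.key = newKey;
    index.add(item);
  }

  public static void main(String[] args) {
    TreeSet<Item> index = newIndex();
    Item a = new Item(10);
    Item b = new Item(20);
    Item c = new Item(30);
    index.add(a);
    index.add(b);
    index.add(c);

    updateKey(index, a, 25);
    System.out.println(index.first() == b); // prints true: b now sorts first
    System.out.println(index.contains(a));  // prints true: a is still findable
  }
}
```

     <p>Changing <code class="code">key</code> directly, without the remove and add-back, would leave the item at
     a position in the tree that no longer matches its key, so later lookups and ordered iteration could
     give wrong results.</p>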

     <p>This is supported in user code by the <code class="code">protectIndexes</code> method,
     available in both the CAS and JCas interfaces.

     Here are two ways
     of using this, one with a try / finally (also shown in its compact try-with-resources form) and the other with a Runnable:
             </p><pre class="programlisting">// an approach using try / finally
 AutoCloseable ac = my_cas.protectIndexes();  // my_cas is a CAS or a JCas
 try {
    ...  arbitrary user code which updates features
         which may be "keys" in one or more indexes
 } finally {
   ac.close();
 }

 // This can more compactly be written using the auto-close feature of try:

 try (AutoCloseable ac = my_cas.protectIndexes()) {
    ...  arbitrary user code which updates features
         which may be "keys" in one or more indexes
 }

 // an approach using a Runnable, written in Java 8 lambda syntax
 my_cas.protectIndexes(() -&gt; {
   ... arbitrary user code updating "key" features,
       but no checked exceptions are permitted
   });</pre>

     <p>The <code class="code">protectIndexes</code> implementation only removes feature structures that
     have features being updated which are used as keys in some index(es). At the end of the scope
     of the protectIndexes, it adds all of these back.  It also skips removing feature structures
     from bag indexes, since these have no keys.</p>

     <p>Within a <code class="code">protectIndexes</code> block, do not do any operations which depend on the
     indexes being valid, such as creating and using an iterator.  This is because the removed FSs
     are only added back at the end of the protectIndexes block.</p>

     <p>The JVM property <code class="code">-Duima.report_fs_update_corrupts_index</code> will generate a log entry
     every time the framework finds (and automatically surrounds with a remove and add-back) an update to
     a feature which could corrupt the index.  The log entries can be identified by scanning for messages
     starting with <code class="code">While FS was in the index, the feature</code> - the message goes on to identify
     the feature in question.  Users can use these reports to find the places in their code where
     they can either change the design to avoid updating these values after the item is indexed, or
     surround the updates with their own <code class="code">protectIndexes</code> blocks.</p>

     <p>Initially, the out-of-the-box defaults
     for the UIMA framework will run with an automatic (but somewhat inefficient) protection.  To improve upon this,
     users would:
     </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Turn on reporting using a global JVM flag <code class="code">
       -Duima.report_fs_update_corrupts_index</code>.
       This will cause a message to be logged each time the automatic protection is being invoked,
       and allows the user to find the spots to improve.</p>
       </li><li class="listitem"><p>Improve each spot, perhaps by surrounding the update code with a protectIndexes
       block, or by rearranging code to reduce updating feature values used as index keys.</p>
       </li><li class="listitem"><p>Once the code is no longer generating any reports, you can turn off the
       automatic protection for production runs using the JVM global property
       <code class="code">-Duima.disable_auto_protect_indexes</code>, and rely on the protectIndexes blocks.
       If protection is disabled, then the corruption detection is skipped, making the production
       runs perhaps a bit faster, although this is not significant in most cases.</p></li><li class="listitem"><p>For automated build systems, there&#8217;s a JVM parameter,
       <code class="code">-Duima.exception_when_fs_update_corrupts_index</code>, which will throw an
       exception if any automatic recovery situation is encountered.  You can use this
       in build/test scenarios to ensure
       (after adding all needed protectIndexes blocks) that the code remains safe for
       turning off the checking in production runs.</p></li></ul></div><p>
     </p>

     </div>
   </div>

   <div class="section" title="4.6.&nbsp;Accessing or modifying features of feature structures"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.accessing_modifying_features_of_feature_structures">4.6.&nbsp;Accessing or modifying features of feature structures</h2></div></div></div>


     <p>Values of individual features for a feature structure can be set or referenced,
       using a set of methods that depend on the type of value that feature is declared to have.
       There are methods on FeatureStructure for this: getBooleanValue, getByteValue,
       getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue,
       getStringValue, and getFeatureValue (which means to get a value which in turn is a
       reference to a feature structure). There are corresponding <span class="quote">&#8220;<span class="quote">setter</span>&#8221;</span>
       methods, as well. These methods on the feature structure object take as arguments the
       feature object retrieved earlier in the typeSystemInit method.</p>

     <p>Using the previous example, with the type system initialized with type personType
       and feature lastNameFeature, here's a sample code fragment that gets and sets
       that feature:</p>


     <pre class="programlisting">// Assume aPerson is a variable holding an object of type Person
 // get the lastNameFeature value from the feature structure
 String lastName = aPerson.getStringValue(lastNameFeature);
 // set the lastNameFeature value
 aPerson.setStringValue(lastNameFeature, newStringValueForLastName);</pre>

     <p>The getters and setters for each of the primitive types are documented in the Javadocs
       as methods of the FeatureStructure interface.</p>

   </div>

   <div class="section" title="4.7.&nbsp;Indexes and Iterators"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.indexes_and_iterators">4.7.&nbsp;Indexes and Iterators</h2></div></div></div>


     <p>Each CAS can have many indexes associated with it; each CAS View contains
       a complete set of instantiations of the indexes.   Each index is represented by an
       instance of the type org.apache.uima.cas.FSIndex. You use the object
       org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to
       retrieve instances of indexes. There are methods that let you select the index
       by name, by type, or by both name and type. Since each index is already associated with a type,
       passing both a name and a type is valid only if the type passed in is the same
       type or a subtype of the one declared in the index specification for the named index. If you
       pass in a subtype, the returned FSIndex object refers to an index that will return only
       items belonging to that subtype (or subtypes of that subtype).</p>

     <p>The returned FSIndex objects are used, in turn, to create iterators.
       There is also a method on the Index Repository, <code class="literal">getAllIndexedFS</code>,
       which will return an iterator over all indexed Feature Structures (for that CAS View),
       in no particular order.  The iterators
       created can be used like common Java iterators, to sequentially retrieve items
       indexed. If the index represents a sorted index, the items are returned in a sorted
       order, where the sort order is specified in the XML index definition. This XML is part of
       the Component Descriptor; see <a href="references.html#ugr.ref.xml.component_descriptor.aes.index" class="olink">Section&nbsp;2.4.1.5, &#8220;Index Definition&#8221;</a>.</p>

     <p>In UIMA V3, Feature Structures may be added to or removed from indexes while iterating
       over them.  If this happens, any iterators already created continue to operate over the
       before-modification version of the index, unless or until the iterator is re-synchronized with the current
       state of the index via one of these three iterator API calls:
       moveToFirst, moveToLast, or moveTo(FeatureStructure).
       ConcurrentModificationException is no longer thrown in UIMA V3.
     </p>
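
     <p>The snapshot behavior described above can be pictured with a plain-Java analogy. The following is only a
       sketch and does not use the UIMA API: it assumes that
       <code class="literal">java.util.concurrent.CopyOnWriteArrayList</code>, whose iterators also work on a
       snapshot and never throw ConcurrentModificationException, is a reasonable stand-in for an index. The class
       and method names are invented for illustration.</p>

     <pre class="programlisting">import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationSketch {

  /** Iterates over a snapshot while the list is modified; returns what the iterator saw. */
  static String iterateWhileAdding(List&lt;String&gt; index) {
    StringBuilder seen = new StringBuilder();
    Iterator&lt;String&gt; it = index.iterator();  // the snapshot is taken here
    while (it.hasNext()) {
      String item = it.next();
      seen.append(item).append(' ');
      index.add(item + "-copy");             // modification during iteration: no exception
    }
    return seen.toString().trim();
  }

  public static void main(String[] args) {
    List&lt;String&gt; index = new CopyOnWriteArrayList&lt;&gt;(List.of("a", "b", "c"));
    System.out.println(iterateWhileAdding(index)); // prints "a b c": only the original items
    System.out.println(index.size());              // prints 6: the additions are in the list
  }
}</pre>

     <p>In this analogy, asking the list for a fresh iterator plays the role that moveToFirst, moveToLast, or
       moveTo(FeatureStructure) plays for a UIMA iterator: it is the point at which the current contents
       become visible.</p>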

     <p>While iterating, you may update features of a Feature Structure that are used as the "keys" of an index.
     If this is done, UIMA protects the indexes (to prevent index corruption) by automatically removing the
     Feature Structure from the indexes,
     updating the feature, and adding the Feature Structure back to the indexes (possibly in a new position).
     Unlike in UIMA Version 2, this automatic remove / add-back operation does not make the iterator throw a
     ConcurrentModificationException when it is subsequently incremented or decremented;
     existing iterators continue to operate as if no index modification occurred.
     </p>


     <div class="section" title="4.7.1.&nbsp;Built-in Indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.index.built_in_indexes">4.7.1.&nbsp;Built-in Indexes</h3></div></div></div>


       <p>An unnamed built-in bag index exists which holds all feature structures which are indexed.
       The only access to this index is the method getAllIndexedFS(Type) which returns an iterator
       over all indexed Feature Structures.</p>

       <p>The CAS also contains a built-in index for the type <code class="literal">uima.tcas.Annotation</code>, which sorts
         annotations in the order in which they appear in the document. Annotations are sorted first by increasing
         <code class="literal">begin</code> position. Ties are then broken by <span class="emphasis"><em>decreasing</em></span>
         <code class="literal">end</code> position (so that longer annotations come first). Annotations that match in both
         their <code class="literal">begin</code> and <code class="literal">end</code> features are sorted using the type priorities,
         if any are defined
         (see <a href="references.html#ugr.ref.xml.component_descriptor.aes.type_priority" class="olink">Section&nbsp;2.4.1.4, &#8220;Type Priority Definition&#8221;</a>).</p>
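
       <p>The ordering rules above can be written down as a comparator. This is a plain-Java sketch, not the
         framework's implementation: <code class="literal">Span</code> is a hypothetical stand-in for an annotation,
         and its <code class="literal">typeRank</code> field is an assumed integer encoding of type priority
         (a lower rank sorts first).</p>

       <pre class="programlisting">import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnnotationOrderSketch {

  /** Hypothetical stand-in for an annotation. */
  static final class Span {
    final String type;
    final int begin;
    final int end;
    final int typeRank;  // assumed encoding of type priority; lower sorts first
    Span(String type, int begin, int end, int typeRank) {
      this.type = type; this.begin = begin; this.end = end; this.typeRank = typeRank;
    }
  }

  /** begin ascending, then end descending (longer first), then type rank ascending. */
  static final Comparator&lt;Span&gt; ANNOTATION_ORDER =
      Comparator.comparingInt((Span s) -&gt; s.begin)
          .thenComparing(Comparator.comparingInt((Span s) -&gt; s.end).reversed())
          .thenComparingInt(s -&gt; s.typeRank);

  /** Returns the type names in sorted order, leaving the input untouched. */
  static List&lt;String&gt; sortedTypes(List&lt;Span&gt; spans) {
    List&lt;Span&gt; copy = new ArrayList&lt;&gt;(spans);
    copy.sort(ANNOTATION_ORDER);
    List&lt;String&gt; names = new ArrayList&lt;&gt;();
    for (Span s : copy) {
      names.add(s.type + "[" + s.begin + "," + s.end + ")");
    }
    return names;
  }

  public static void main(String[] args) {
    List&lt;Span&gt; spans = List.of(
        new Span("Token", 0, 4, 2),
        new Span("Sentence", 0, 20, 1),
        new Span("Token", 5, 9, 2),
        new Span("Word", 0, 4, 1));
    // prints [Sentence[0,20), Word[0,4), Token[0,4), Token[5,9)]
    System.out.println(sortedTypes(spans));
  }
}</pre>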
     </div>


     <div class="section" title="4.7.2.&nbsp;Adding Feature Structures to the Indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.index.adding_to_indexes">4.7.2.&nbsp;Adding Feature Structures to the Indexes</h3></div></div></div>


       <p>Feature Structures are added to the indexes by various APIs. These add the Feature Structure to
         <span class="emphasis"><em>all</em></span> indexes that are defined for the type of that FeatureStructure (or any of its
         supertypes), in a particular view.
         Note that you should not add a Feature Structure to the indexes until you have set values for all
         of the features that may be used as sort keys in an index.</p>

       <p>There are multiple APIs for adding FSs to the index.
         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>(preferred) myFeatureStructure.addToIndexes(). This adds the feature structure instance to the
           view in which it was originally created.</p>
           </li><li class="listitem"><p>(preferred) myFeatureStructure.addToIndexes(JCas or CAS). This adds the feature structure instance to the
             view represented by the argument.</p>
           </li><li class="listitem"><p>(older form) casView.addFsToIndexes(myFeatureStructure) or jcasView.addFsToIndexes(myFeatureStructure).
             This adds the feature structure instance to the
             view represented by the cas (or jcas).</p>
           </li><li class="listitem"><p>(older form) fsIndexRepositoryView.addFsToIndexes(myFeatureStructure).
             This adds the feature structure instance to the
             view represented by the fsIndexRepository instance.</p>
           </li></ul></div><p>
       </p>
     </div>

     <div class="section" title="4.7.3.&nbsp;Iterators over UIMA Indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.index.iterators">4.7.3.&nbsp;Iterators over UIMA Indexes</h3></div></div></div>


       <p>Iterators are objects implementing the interface <code class="literal">org.apache.uima.cas.FSIterator</code>. This interface
         extends <code class="literal">java.util.Iterator</code> and provides the normal Java iterator methods, plus
         additional ones that allow moving both forwards and backwards.</p>

       <p>UIMA indexes implement <code class="literal">Iterable</code>, so you can use an index directly in a Java enhanced for loop.</p>

     </div>

     <div class="section" title="4.7.4.&nbsp;Special iterators for Annotation types"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.index.annotation_index">4.7.4.&nbsp;Special iterators for Annotation types</h3></div></div></div>


       <p>Note: we recommend using the UIMA V3 select framework, instead of the following.
         It implements all of the following capabilities, and more, in a uniform manner.</p>

       <p>The built-in index over the <code class="literal">uima.tcas.Annotation</code> type
         named <span class="quote">&#8220;<span class="quote"><code class="literal">AnnotationIndex</code></span>&#8221;</span> has additional
         capabilities. To use them, you first get a reference to this built-in index using
         either the <code class="literal">getAnnotationIndex</code> method on a CAS View object, or
         by asking the <code class="literal">FSIndexRepository</code> object for an index having the
         particular name <span class="quote">&#8220;<span class="quote">AnnotationIndex</span>&#8221;</span>, for example:

         </p><pre class="programlisting">AnnotationIndex idx = aCAS.getAnnotationIndex();
 // or get the index restricted to a specific subtype of Annotation
 // (a different variable name is used so that both lines compile together):
 AnnotationIndex subtypeIdx = aCAS.getAnnotationIndex(aType);</pre>

       <p>This object can be used to produce several additional kinds of iterators. It can
         produce unambiguous iterators; such an iterator skips over elements until it finds one whose
         start position is equal to or greater than the end position of
         the previously returned annotation.</p>
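
       <p>The skipping rule can be sketched in plain Java as follows. This is an illustration only and does not
         use the UIMA API; <code class="literal">Span</code> is a hypothetical stand-in for an annotation, and the
         input list is assumed to be already in annotation-index order.</p>

       <pre class="programlisting">import java.util.ArrayList;
import java.util.List;

public class UnambiguousSketch {

  /** Hypothetical stand-in for an annotation with begin/end offsets. */
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
    @Override public String toString() { return "[" + begin + "," + end + ")"; }
  }

  /** Keeps a span only if it begins at or after the end of the previously kept span. */
  static List&lt;Span&gt; unambiguous(List&lt;Span&gt; sortedSpans) {
    List&lt;Span&gt; result = new ArrayList&lt;&gt;();
    int lastEnd = Integer.MIN_VALUE;
    for (Span s : sortedSpans) {
      if (s.begin &gt;= lastEnd) {
        result.add(s);
        lastEnd = s.end;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List&lt;Span&gt; spans = List.of(
        new Span(0, 10), new Span(0, 4), new Span(5, 9), new Span(10, 15), new Span(12, 20));
    System.out.println(unambiguous(spans)); // prints [[0,10), [10,15)]
  }
}</pre>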

       <p>It can also produce several kinds of subiterators; these are iterators whose
         annotations fall within the span of another annotation. This kind of iterator can
         also have the unambiguous property, if desired. It also can be
         <span class="quote">&#8220;<span class="quote">strict</span>&#8221;</span> or not; strict means that the returned annotation lies
         completely within the span of the controlling annotation. Non-strict only implies
         that the beginning of the returned annotation falls within the span of the
         controlling annotation.</p>
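
       <p>The difference between strict and non-strict can be modeled in a few lines of plain Java. This is a
         simplified sketch, not the framework's subiterator: <code class="literal">Span</code> is a hypothetical
         stand-in for an annotation, and treating the end of the controlling span as exclusive for the
         <span class="quote">&#8220;<span class="quote">begins inside</span>&#8221;</span> test is an assumption made here. The sketch also
         ignores type priorities and the question of whether the controlling annotation itself is returned.</p>

       <pre class="programlisting">import java.util.ArrayList;
import java.util.List;

public class SubiteratorSketch {

  /** Hypothetical stand-in for an annotation with begin/end offsets. */
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
    @Override public String toString() { return "[" + begin + "," + end + ")"; }
  }

  /** Selects the candidates that fall within 'control' under the strict or non-strict rule. */
  static List&lt;Span&gt; within(Span control, List&lt;Span&gt; candidates, boolean strict) {
    List&lt;Span&gt; result = new ArrayList&lt;&gt;();
    for (Span s : candidates) {
      boolean beginsInside = s.begin &gt;= control.begin &amp;&amp; s.begin &lt; control.end;
      boolean endsInside = s.end &lt;= control.end;
      // non-strict: only the beginning must be inside; strict: the end must be inside too
      if (beginsInside &amp;&amp; (!strict || endsInside)) {
        result.add(s);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Span control = new Span(10, 20);
    List&lt;Span&gt; candidates = List.of(new Span(10, 15), new Span(18, 25), new Span(20, 22));
    System.out.println(within(control, candidates, true));  // prints [[10,15)]
    System.out.println(within(control, candidates, false)); // prints [[10,15), [18,25)]
  }
}</pre>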

       <p>There is also a method which produces an <code class="literal">AnnotationTree</code>
         object, which contains nodes representing the results of doing a strict,
         unambiguous subiterator over the span of some controlling annotation. For more
         details, please refer to the Javadocs for the
         <code class="literal">org.apache.uima.cas.text</code> package.</p>

     </div>

     <div class="section" title="4.7.5.&nbsp;Constraints and Filtered iterators"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.index.constraints_and_filtered_iterators">4.7.5.&nbsp;Constraints and Filtered iterators</h3></div></div></div>


       <p>Note: for new code, consider using the select framework plus Streams, instead of
         the following.</p>

       <p>There is a set of API calls that build constraint objects. These objects can be
         used directly to test if a particular feature structure matches (satisfies) the
         constraint, or they can be passed to the createFilteredIterator method to create an
         iterator that skips over instances which fail to satisfy the constraint.</p>

       <p>It is possible to specify a feature value located by following a chain of
         references starting from the feature structure being tested. Here's a
         scenario to explore this concept. Let's suppose you have the following type
         system (namespaces are omitted for clarity):

         </p><div class="blockquote"><blockquote class="blockquote">
           <p><span class="bold"><strong>Token</strong></span>, having a feature PartOfSpeech
             which holds a reference to another type (POS)</p>

           <p><span class="bold"><strong>POS</strong></span> (a type with many subtypes, each
             representing a different part of speech)</p>

           <p><span class="bold"><strong>Noun</strong></span> (a subtype of POS)</p>

           <p><span class="bold"><strong>ProperName</strong></span> (a subtype of Noun),
             having a feature Class which holds an integer value encoding some information
             about the proper noun.</p></blockquote></div>

       <p>If you want to filter Token instances, such that only those tokens get through
         which are proper names of class 3 (for example), you would need a test that started with
         a Token instance, followed its PartOfSpeech reference to another instance (the
         ProperName instance) and then tested the Class feature of that instance for a value
         equal to 3.</p>
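
       <p>Before turning to the constraint API, it may help to see the same test written against ordinary Java
         objects. The classes below are hypothetical stand-ins for the types in the scenario (the field
         <code class="literal">nameClass</code> models the Class feature); they are not UIMA or JCas classes, and
         the sketch only shows the logic of following a reference and then testing a value.</p>

       <pre class="programlisting">import java.util.List;
import java.util.stream.Collectors;

public class ProperNameFilterSketch {

  // Hypothetical plain-Java stand-ins for the types in the scenario.
  static class POS {}
  static class Noun extends POS {}
  static class ProperName extends Noun {
    final int nameClass;      // models the "Class" feature
    ProperName(int nameClass) { this.nameClass = nameClass; }
  }
  static class Token {
    final String text;
    final POS partOfSpeech;   // models the PartOfSpeech reference
    Token(String text, POS partOfSpeech) { this.text = text; this.partOfSpeech = partOfSpeech; }
  }

  /** Follows Token -&gt; PartOfSpeech, checks the type, then tests the Class value. */
  static boolean isProperNameOfClass(Token t, int wanted) {
    return t.partOfSpeech instanceof ProperName
        &amp;&amp; ((ProperName) t.partOfSpeech).nameClass == wanted;
  }

  /** Returns the texts of all tokens that pass the test. */
  static List&lt;String&gt; properNamesOfClass(List&lt;Token&gt; tokens, int wanted) {
    return tokens.stream()
        .filter(t -&gt; isProperNameOfClass(t, wanted))
        .map(t -&gt; t.text)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List&lt;Token&gt; tokens = List.of(
        new Token("Paris", new ProperName(3)),
        new Token("city", new Noun()),
        new Token("Alice", new ProperName(1)));
    System.out.println(properNamesOfClass(tokens, 3)); // prints [Paris]
  }
}</pre>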

       <p>To support this, the filtering approach has components that specify tests, and
         components that specify <span class="quote">&#8220;<span class="quote">paths</span>&#8221;</span>. The tests that can be done include
         testing references to type instances to see if they are instances of some type or its
         subtypes; this is done with a FSTypeConstraint constraint. Other tests check for
         equality or, for numeric values, ranges.</p>

       <p>Each test may be combined with a path that leads to the value to test. Tests that
         start from a feature structure instance can be combined with <code class="literal">and</code> and <code class="literal">or</code> connectors.
         The Javadocs for these are in the package org.apache.uima.cas, in the classes whose names end
         in Constraint, plus the classes ConstraintFactory, FeaturePath and CAS.
         Here's an example; assume the variable cas holds a reference to a CAS instance.


         </p><pre class="programlisting">// Start by getting the constraint factory from the CAS.
 ConstraintFactory cf = cas.getConstraintFactory();

 // To specify a path to an item to test, you start by
 // creating an empty path.
 FeaturePath path = cas.createFeaturePath();

 // Add POS feature to path, creating one-element path.
 path.addFeature(posFeat);

 // You can extend the chain arbitrarily by adding additional
 // features.

 // Create a new type constraint.

 // Type constraints will check that structures
 // they match against have a type at least as specific
 // as the type specified in the constraint.
 FSTypeConstraint nounConstraint = cf.createTypeConstraint();

 // Set the type (by default it is TOP).
 // This succeeds if the type being tested by this constraint
 // is nounType or a subtype of nounType.
 nounConstraint.add(nounType);

 // Embed the noun constraint under the pos path.
 // This means, associate the test with the path, so it tests the
 // proper value.

 // The result is a test which will
 // match a feature structure that has a posFeat defined
 // which has a value which is an instance of a nounType or
 // one of its subtypes.
 FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);

 // Create a type constraint for token (or a subtype of it)
 FSTypeConstraint tokenConstraint = cf.createTypeConstraint();

 // Set the type.
 tokenConstraint.add(tokenType);

 // Create the final constraint by conjoining the embedded noun
 // constraint (not the bare nounConstraint) with the token constraint.
 FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);

 // Create a filtered iterator from some annotation iterator.
 FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);</pre><p>
         </p></div></div>

   <div class="section" title="4.8.&nbsp;The CAS APIs &#8211; a guide to the Javadocs"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.guide_to_javadocs">4.8.&nbsp;The CAS APIs &#8211; a guide to the Javadocs</h2></div></div></div>


     <p>The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text. Most
       of the APIs described here are in the cas package. The cas.impl package contains classes
       used in serializing and deserializing (reading and writing external representations) the
       CAS in various formats, for
       transporting the CAS among local and remote annotators, or for storing the CAS in
       permanent storage. The cas.text package contains the APIs that extend the CAS to support
       artifact (including <span class="quote">&#8220;<span class="quote">text</span>&#8221;</span>) analysis.</p>

     <div class="section" title="4.8.1.&nbsp;APIs in the CAS package"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.cas.javadocs.cas_package">4.8.1.&nbsp;APIs in the CAS package</h3></div></div></div>


       <p>The main objects implementing the APIs discussed here are shown in the diagram
         below. The hierarchy represents that there is a way to get from an upper object to an
         instance of the lower object, usually by using a method on the upper object; this is not
         an inheritance hierarchy.
         </p><div class="figure"><a name="ugr.ref.cas.fig.api_hierarchy"></a><div class="figure-contents">

           <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/references/ref.cas/image001.png" width="574" alt="CAS object hierarchy"></td></tr></table></div>
         </div><p class="title"><b>Figure&nbsp;4.1.&nbsp;CAS Object hierarchy</b></p></div><p><br class="figure-break"> </p>

       <p>The main interface is the CAS interface. This has most of the functionality of the
         CAS, except for the type system metadata access, and the indexing access. JCas and CAS
         are alternative representations and API approaches to the CAS; each has a method to
         get the other. You can mix JCas and CAS APIs in your application as needed. To use the
         JCas APIs, you have to create the Java classes that correspond to the CAS types, and
         include them in the Java class path of the application. If you have a CAS object, you can
         get a JCas object by using the getJCas() method call on the CAS object; likewise, you
         can get the CAS object from a JCas by using the getCas() method call on the JCas object.
         There is also a low level CAS interface that is not part of the official API, and is
         intended for internal use only; it is not documented here.</p>

       <p>The type system metadata APIs are found in the TypeSystem interface. The objects
         defining each type and feature are defined by the interfaces Type and Feature. The
         Type interface has methods to see what types subsume other types, to iterate over the
         types available, and to extract information about the types, including what
         features it has. The Feature interface has methods that get what type it belongs to,
         its name, and its range (the kind of values it can hold).</p>

       <p>The FSIndexRepository gives you access to methods to get instances of indexes, and
         also provides access to the iterator over all indexed feature structures:
         <code class="literal">getAllIndexedFS(aType)</code>.
         The FSIndex and AnnotationIndex objects give you methods to create instances of
         iterators.</p>

       <p>Iterators and the CAS methods that create new feature structures return
         FeatureStructure objects. These objects can be used to set and get the values of
         defined features within them.</p>
     </div>
   </div>

   <div class="section" title="4.9.&nbsp;Type Merging"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.typemerging">4.9.&nbsp;Type Merging</h2></div></div></div>


     <p>When annotators are combined in an aggregate, their defined type systems are merged.
     This is designed to support independent development of annotator components.  The merge
     results in a single defined type system for CASes that flow through a particular set of
     annotators.</p>

     <p>The basic operation of a type system merge is to iterate through all the defined types,
     and if two annotators define the same fully qualified type name,
     to take the features defined for those types
     and form a logical union of those features.  This operation requires that same-named features
     have the same range type names.  The resulting type system has features comprising the union
     of all features over all the various definitions for this type in different annotators.
     </p>

     <p>Feature merging checks that all features having the same name in a type have
     identical range types; otherwise an error is signaled.</p>

     <p>Types are combined for merging when their fully qualified names are the same.
     Two different definitions can be merged even if their supertype definitions do not match, if
     one supertype subsumes the other supertype; otherwise an error is signaled.  Likewise, two types
     with the same name can be merged only if their features can be merged.
     </p>
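
     <p>The feature-union rule can be illustrated with a small plain-Java sketch. It is not the framework's merge
     code: each type definition is reduced here to a map from feature name to range type name, the method name
     <code class="literal">mergeFeatures</code> is invented for illustration, and supertype checking is left out.</p>

     <pre class="programlisting">import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMergeSketch {

  /**
   * Merges the features (feature name -&gt; range type name) of two definitions
   * of the same fully qualified type. Same-named features must agree on range type.
   */
  static Map&lt;String, String&gt; mergeFeatures(Map&lt;String, String&gt; a, Map&lt;String, String&gt; b) {
    Map&lt;String, String&gt; merged = new LinkedHashMap&lt;&gt;(a);
    for (Map.Entry&lt;String, String&gt; e : b.entrySet()) {
      // putIfAbsent returns the range type already recorded, or null if the feature is new
      String existing = merged.putIfAbsent(e.getKey(), e.getValue());
      if (existing != null &amp;&amp; !existing.equals(e.getValue())) {
        throw new IllegalStateException("Feature '" + e.getKey()
            + "' has incompatible range types: " + existing + " vs. " + e.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map&lt;String, String&gt; first = new LinkedHashMap&lt;&gt;();
    first.put("lastName", "uima.cas.String");
    Map&lt;String, String&gt; second = new LinkedHashMap&lt;&gt;();
    second.put("lastName", "uima.cas.String");
    second.put("age", "uima.cas.Integer");
    // prints {lastName=uima.cas.String, age=uima.cas.Integer}
    System.out.println(mergeFeatures(first, second));
  }
}</pre>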
     </div>

   <div class="section" title="4.10.&nbsp;Limited multi-thread access to read-only CASs"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.cas.limitedmultipleaccess">4.10.&nbsp;Limited multi-thread access to read-only CASs</h2></div></div></div>


     <p>Some applications may find it useful to scale up pipelines and run these in parallel.</p>
     <p>
     Generally, CASs are not threadsafe, and only one thread at a time may operate on a given CAS.  In many
     scenarios, a CAS may be initialized and then filled with Feature Structures, and after some point,
     no more updates to that particular CAS will be done.</p>

     <p>
     If a CAS is no longer going to be changed, it is possible to
     access it on multiple threads in a read-only mode, simultaneously, with some limitations.  Limitations
     arise because some UIMA Framework activities may update internal CAS data structures.</p>

     <p>Operational data is updated while running a pipeline when a PEAR is entered or exited,
     because PEARs establish new class loaders and can potentially switch the JCas classes being used
     (this happens because the class loaders might define different JCas cover classes
     implementing the same UIMA type).
     Because of this, you cannot have multiple pipelines accessing a CAS in read-only mode if one or more of those
     pipelines contains a PEAR. There are other edge cases where this may happen as well; for example, if you are
     running a pipeline with an Extension Class Loader,
     and have a callback routine loaded under a different class loader, UIMA will switch the JCas classes when
     calling the callback.
     </p>
     </div>
 <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e1615" href="#d5e1615" class="para">5</a>] </sup>A fourth part, the Subject of Analysis,
       is discussed in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, <i>Annotations, Artifacts, and Sofas</i></a>.</p></div><div class="footnote"><p><sup>[<a id="ftn.d5e1624" href="#d5e1624" class="para">6</a>] </sup> The name <span class="quote">&#8220;<span class="quote">feature structure</span>&#8221;</span> comes from
         terminology used in linguistics.</p></div></div></div>
   <div class="chapter" title="Chapter&nbsp;5.&nbsp;JCas Reference" id="ugr.ref.jcas"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;5.&nbsp;JCas Reference</h2></div></div></div>


   <p>The CAS is a system for sharing data among annotators, consisting of data structures
     (definable at run time), sets of indexes over these data, metadata describing these, subjects of
     analysis, and a high
     performance serialization/deserialization mechanism. JCas provides a Java approach to
     accessing CAS data, and is based on using generated, specific Java classes for each CAS
     type.</p>

   <p>Annotators process one CAS per call to their process method. During processing,
     annotators can retrieve feature structures from the passed in CAS, add new ones, modify
     existing ones, and use and update CAS indexes. Of course, an annotator can also use plain
     Java Objects in addition; but the data in the CAS is what is shared among annotators within
     an application.</p>

   <p>All the facilities present in the APIs for the CAS are available when using the JCas
     APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a
     JCas (and vice-versa). The JCas APIs often have helper methods that make using this
     interface more convenient for Java developers.</p>

   <p>The data in the CAS are typed objects having fields. JCas uses a set of generated Java
     classes (each corresponding to a particular CAS type) with <span class="quote">&#8220;<span class="quote">getter</span>&#8221;</span> and
     <span class="quote">&#8220;<span class="quote">setter</span>&#8221;</span> methods for the features, plus a constructor so new instances can
     be made. The Java classes store the data in the class instance.</p>

     <p>Users can modify the JCas generated
     Java classes by adding fields to them; this allows arbitrary non-CAS data to be
     represented within the JCas objects as well. However, the non-CAS data stored in the JCas
     object instances cannot be shared with annotators using the plain CAS, unless special
     provision is made; see the chapter in the v3 user's guide on storing arbitrary
     Java objects in the CAS.</p>

   <p>The JCas class Java source files are generated from XML type system descriptions. The
     JCasGen utility does the work of generating the corresponding Java Class Model for the CAS
     types. There are a variety of ways JCasGen can be run; these are described later. You
     include the generated classes with your UIMA component, and you can publish these classes
     for others who might want to use your type system.</p>

   <p>JCas classes are not required for all UIMA types.  Those types which don't have
     corresponding JCas classes use the nearest JCas class corresponding to a type in their superchain.</p>

   <p>The specification of the type system in XML can be written using a conventional text
     editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA
     descriptors.</p>

   <p>Changes to the type system are done by changing the XML and regenerating the
     corresponding Java Class Models. Of course, once you've published your type system
     for others to use, you should be careful that any changes you make don't adversely
     impact the users. Additional features can be added to existing types without breaking
     other code.</p>

   <p>A separate Java class is generated for each type; this class implements the CAS
     FeatureStructure interface, as well as having the special getters and setters for the
     included features. The generated Java classes have methods (getters and setters) for the
     fields as defined in the XML type specification. Descriptor comments are reflected in the
     generated Java code as Javadoc-style comments.</p>


   <div class="section" title="5.1.&nbsp;Name Spaces"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.name_spaces">5.1.&nbsp;Name Spaces</h2></div></div></div>


     <p>Full Type names consist of a <span class="quote">&#8220;<span class="quote">namespace</span>&#8221;</span> prefix dotted with a simple
       name. Namespaces are used like packages to avoid collisions between types that are
       defined by different people at different times. The namespace is used as the Java
       package name for generated Java files.</p>

     <p>Type names used in the CAS correspond to the generated Java classes directly. If the
       CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the
       package com.myCompany.myProject, and the class is ExampleClass.</p>

     <p>
       An exception to this rule is the built-in types
       starting with <code class="literal">uima.cas </code>and <code class="literal">uima.tcas</code>;
       these names are mapped to Java packages named
       <code class="literal">org.apache.uima.jcas.cas</code> and
       <code class="literal">org.apache.uima.jcas.tcas</code>.</p>
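
     <p>The naming rule, including the exception for built-in types, can be summarized in a short sketch. The
       class and method names below are invented for illustration and are not part of UIMA. The sketch covers
       only types that map to Java classes; the built-in primitive types listed in Section&nbsp;5.3 map to Java
       primitives instead.</p>

     <pre class="programlisting">public class JCasNameSketch {

  /** Returns the fully qualified Java class name for a fully qualified CAS type name. */
  static String javaClassName(String casTypeName) {
    if (casTypeName.startsWith("uima.cas.") || casTypeName.startsWith("uima.tcas.")) {
      // built-in types live under org.apache.uima.jcas.cas and org.apache.uima.jcas.tcas
      return "org.apache.uima.jcas." + casTypeName.substring("uima.".length());
    }
    return casTypeName; // the namespace is used directly as the Java package
  }

  public static void main(String[] args) {
    // prints com.myCompany.myProject.ExampleClass
    System.out.println(javaClassName("com.myCompany.myProject.ExampleClass"));
    // prints org.apache.uima.jcas.tcas.Annotation
    System.out.println(javaClassName("uima.tcas.Annotation"));
  }
}</pre>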

   </div>

   <div class="section" title="5.2.&nbsp;XML description element"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.use_of_description">5.2.&nbsp;XML description element</h2></div></div></div>


     <p>Each XML type specification can have &lt;description ...
       &gt; tags. The description for a type will be copied into the generated Java code, as a
       Javadoc style comment for the class. When writing these descriptions in the XML type
       specification file, you might want to use HTML tags, as allowed in Javadocs.</p>

     <p>If you use the Component Descriptor Editor, you can write the HTML tags normally,
       for instance, <span class="quote">&#8220;<span class="quote">&lt;h1&gt;My Title&lt;/h1&gt;</span>&#8221;</span>. The Component
       Descriptor Editor will take care of converting the actual descriptor source so that it
       has the leading <span class="quote">&#8220;<span class="quote">&lt;</span>&#8221;</span> character written as <span class="quote">&#8220;<span class="quote">&amp;lt;</span>&#8221;</span>,
       to avoid confusing the XML type specification. For example, &lt;p&gt; would be written
       in the source of the descriptor as &amp;lt;p&gt;. Any characters used in the Javadoc
       comment must of course be from the character set allowed by the XML type specification.
       These specifications often start with the line &lt;?xml version=<span class="quote">&#8220;<span class="quote">1.0</span>&#8221;</span>
       encoding=<span class="quote">&#8220;<span class="quote">UTF-8</span>&#8221;</span> ?&gt;, which means you can use any of the UTF-8
       characters.</p>

   </div>

   <div class="section" title="5.3.&nbsp;Mapping built-in CAS types to Java types"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.mapping_built_ins">5.3.&nbsp;Mapping built-in CAS types to Java types</h2></div></div></div>


     <p>The built-in primitive CAS types map to Java types as follows:</p>


     <pre class="programlisting">uima.cas.Boolean <span class="symbol">&#8594;</span> boolean
 uima.cas.Byte    <span class="symbol">&#8594;</span> byte
 uima.cas.Short   <span class="symbol">&#8594;</span> short
 uima.cas.Integer <span class="symbol">&#8594;</span> int
 uima.cas.Long    <span class="symbol">&#8594;</span> long
 uima.cas.Float   <span class="symbol">&#8594;</span> float
 uima.cas.Double  <span class="symbol">&#8594;</span> double
 uima.cas.String  <span class="symbol">&#8594;</span> String</pre>

   </div>

   <div class="section" title="5.4.&nbsp;Augmenting the generated Java Code"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.augmenting_generated_code">5.4.&nbsp;Augmenting the generated Java Code</h2></div></div></div>


     <p>The Java Class Models generated for each type can be augmented by the user. Typical
       augmentations include adding additional (non-CAS) fields and methods, and import
       statements that might be needed to support these. Commonly added methods include
       additional constructors (having different parameter signatures), and
       implementations of toString().</p>

     <p>To augment the code, just edit the generated Java source code for the class named the
       same as the CAS type. Here's an example of an additional method you might add; the
       various getter methods are retrieving values from the instance:</p>


     <pre class="programlisting">public String toString() { // for debugging
   return "XsgParse "
     + getslotName() + ": "
     + getheadWord().getCoveredText()
     + " seqNo: " + getseqNo()
     + ", cAddr: " + id
     + ", size left mods: " + getlMods().size()
     + ", size right mods: " + getrMods().size();
 }</pre>


     <div class="section" title="5.4.1.&nbsp;Keeping hand-coded augmentations when regenerating"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.keeping_augmentations_when_regenerating">5.4.1.&nbsp;Keeping hand-coded augmentations when regenerating</h3></div></div></div>


       <p>If the type system specification changes, you have to re-run the JCasGen
         generator. This will produce updated Java for the Class Models that capture the
         changed specification. If you have previously augmented the source for these Java
         Class Models, your changes must be merged with the newly (re)generated Java source
         code for the Class Models. This can be done by hand, or you can run the version of JCasGen
         that is integrated with Eclipse, and use automatic merging that is done using Eclipse's EMF
         plug-in. You can obtain Eclipse and the needed EMF plug-in from <a class="ulink" href="http://www.eclipse.org/" target="_top">http://www.eclipse.org/</a>.</p>

       <p>If you run the generator version that works without using Eclipse, it will not
         merge Java source changes you may have previously made; if you want them retained,
         you'll have to do the merging by hand.</p>

       <p>The Java source merging will keep additional constructors, additional fields,
         and any changes you may have made to the readObject method (see below). Merging will
         <span class="emphasis"><em>not</em></span> delete classes in the target that correspond to deleted CAS types
         (types that are no longer in the source); you should delete these by hand.</p>

       <div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>The merging supports Java 1.4 syntactic constructs only.
         JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to
         only Java 1.4 constructs, the merge will work.  If you use Java 5 or later specific syntax or constructs, the merge
         operation will likely fail to merge properly.</p></div>
     </div>

     <div class="section" title="5.4.2.&nbsp;Additional Constructors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.additional_constructors">5.4.2.&nbsp;Additional Constructors</h3></div></div></div>


       <p>Any additional constructors that you add must include the JCas argument. The
         first line of your constructor is required to be</p>


       <pre class="programlisting">this(jcas);        // run the standard constructor</pre>

       <p>where jcas is the passed in JCas reference. If the type you're defining
         extends <code class="literal">uima.tcas.Annotation</code>, JCasGen will automatically
         add a constructor which takes 2 additional parameters &#8211; the begin and end Java
         int values &#8211; and sets the <code class="literal">uima.tcas.Annotation</code>
         <code class="literal">begin</code> and <code class="literal">end</code> fields.</p>

       <p>Here's an example: If you're defining a type MyType which has a
         feature parent, you might make an additional constructor which has an additional
         argument of parent:</p>


       <pre class="programlisting">MyType(JCas jcas, MyType parent) {
   this(jcas);        // run the standard constructor
   setParent(parent); // set the parent field from the parameter
 }</pre>

       <div class="section" title="5.4.2.1.&nbsp;Using readObject"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.jcas.using_readobject">5.4.2.1.&nbsp;Using readObject</h4></div></div></div>


         <p>Fields defined by augmenting the Java Class Model to include additional
           fields represent data that exist for this class in Java, in a local JVM (Java Virtual
           Machine), but do not exist in the CAS when it is passed to other environments (for
           example, passing to a remote annotator).</p>

         <p>A problem can arise when new instances are created, perhaps by the underlying
           system when it iterates over an index: how to ensure that any additional
           non-CAS fields are properly initialized. To allow for arbitrary initialization
           at instance creation time, the Java Class Model has an initialization method
           called readObject. The generated default for this method is to do nothing,
           but it is one of the methods that you can modify &#8211; to do whatever
           initialization might be needed. It is called with 0 parameters, during the
           constructor for the object, after the basic object fields have been set up. It can
           refer to fields in the CAS using the getters and setters, and other fields in the Java
           object instance being initialized.</p>

         <p>A pre-existing CAS feature structure could exist if a CAS was being passed to
           this annotator; in this case the JCas system calls the readObject method when
           creating the corresponding Java instance for the first time for the CAS feature
           structure. This can happen at two points: when a new object is being returned from an
           iterator over a CAS index, or a getter method is getting a field for the first time
           whose value is a feature structure.</p>
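
         <p>The call order described above can be modeled in plain Java. The following is a
           minimal sketch, not JCasGen output: <code class="literal">StubFeatureStructure</code>,
           <code class="literal">MeetingModel</code> and the <code class="literal">debugLog</code>
           field are hypothetical stand-ins, and a plain <code class="literal">Object</code> takes
           the place of the JCas argument so that the example compiles without UIMA on the
           class path.</p>

         <pre class="programlisting">// Hypothetical stand-in for the superclass of a generated JCas class; not part of UIMA.
class StubFeatureStructure {
  StubFeatureStructure(Object jcas) {
    // a real superclass would set up the basic object fields here
  }
}

// Models an augmented Java Class Model that carries one extra non-CAS field.
class MeetingModel extends StubFeatureStructure {
  private StringBuilder debugLog;   // non-CAS field: exists only in the local JVM

  MeetingModel(Object jcas) {
    super(jcas);    // basic object fields are set up first
    readObject();   // then the initialization hook runs, with 0 parameters
  }

  // The generated default does nothing; this version initializes the extra field.
  private void readObject() {
    debugLog = new StringBuilder();
  }

  StringBuilder getDebugLog() {
    return debugLog;
  }
}</pre>

         <p>Because the hook runs inside the constructor, the extra field is already initialized
           when <code class="literal">new MeetingModel(...)</code> returns, whichever code path
           created the instance.</p>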

       </div>
     </div>

     <div class="section" title="5.4.3.&nbsp;Modifying generated items"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.modifying_generated_items">5.4.3.&nbsp;Modifying generated items</h3></div></div></div>


       <p>The following modifications, if made in generated items, will be preserved when
         regenerating.</p>

       <p>The public/private etc. flags associated with methods (getters and setters).
         You can change the default (<span class="quote">&#8220;<span class="quote">public</span>&#8221;</span>) if needed.</p>

       <p><span class="quote">&#8220;<span class="quote">final</span>&#8221;</span> or <span class="quote">&#8220;<span class="quote">abstract</span>&#8221;</span> can be added to the type
         itself, with the usual semantics.</p>

     </div>
   </div>

   <div class="section" title="5.5.&nbsp;Merging types"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.merging_types_from_other_specs">5.5.&nbsp;Merging types</h2></div></div></div>


     <p>Type definitions are merged by the framework from all the components being run together.</p>

     <div class="section" title="5.5.1.&nbsp;Aggregate AEs and CPEs as sources of types"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.merging_types.aggregates_and_cpes">5.5.1.&nbsp;Aggregate AEs and CPEs as sources of types</h3></div></div></div>


       <p>When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the
         UIMA framework will build a merged type system (Note: this <span class="quote">&#8220;<span class="quote">merge</span>&#8221;</span> is merging types, not to be
         confused with merging Java source code, discussed above). This merged type system has all the types of every
         component used in the application.  In addition, application code can use UIMA Framework APIs to read and merge
         type descriptions, manually.</p>

       <p>In most cases, each type system can have its own Java Class Models generated individually, perhaps at an
         earlier time, and the resulting class files (or .jar files containing these class files) can be put in the
         class path to enable JCas.</p>

       <p>However, it is possible that there may be multiple definitions of the same CAS type, each of which might
         have different features defined. In this case, the UIMA framework will create a merged type by accumulating
         all the defined features for a particular type into that type's type definition. However, the JCas
         classes for these types are not automatically merged, which can create some issues for JCas users, as
         discussed in the next section.</p>
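
       <p>As a rough illustration of this accumulation, the sketch below unions the feature names of two
         definitions of the same type. It is a simplified model in plain Java, not the framework's
         implementation: the class name <code class="literal">FeatureMergeSketch</code>, the method
         <code class="literal">mergeFeatures</code>, and the feature names are made up for the example, and
         feature range types are ignored.</p>

       <pre class="programlisting">import java.util.Arrays;
import java.util.stream.Stream;

// Simplified model of feature accumulation; names are hypothetical.
class FeatureMergeSketch {

  // Returns the union of the feature names of two definitions of the same
  // type, keeping first-seen order and dropping duplicates.
  static String[] mergeFeatures(String[] first, String[] second) {
    return Stream.concat(Arrays.stream(first), Arrays.stream(second))
                 .distinct()
                 .toArray(String[]::new);
  }

  public static void main(String[] args) {
    String[] fromComponentA = {"location", "time"};
    String[] fromComponentB = {"time", "organizer"};
    // prints [location, time, organizer]
    System.out.println(Arrays.toString(mergeFeatures(fromComponentA, fromComponentB)));
  }
}</pre>

       <p>In this model, a JCas class generated from only one component's definition would be missing the
         accessors for the features contributed by the other component.</p>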

     </div>

     <div class="section" title="5.5.2.&nbsp;JCasGen support for type merging"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.merging_types.jcasgen_support">5.5.2.&nbsp;JCasGen support for type merging</h3></div></div></div>


       <p>When there are multiple definitions of the same CAS type with different features defined, then JCasGen
         can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types,
         which can then be shared by all the components.
         Directions for running JCasGen can be found in <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.jcasgen" class="olink">Chapter&nbsp;8, <i>JCasGen User's Guide</i></a>. This is typically done by the person who
         is assembling the Aggregate Analysis Engine or Collection Processing Engine. The resulting merged Java
         Class Model will then contain get and set methods for the complete set of features. These Java classes must
         then be made available in the class path, <span class="emphasis"><em>replacing</em></span> the pre-merge versions of the
         classes.</p>

       <p>If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the
         merged versions, as described in section <a class="xref" href="#ugr.ref.jcas.keeping_augmentations_when_regenerating" title="5.4.1.&nbsp;Keeping hand-coded augmentations when regenerating">Section&nbsp;5.4.1, &#8220;Keeping hand-coded augmentations when regenerating&#8221;</a>, above. If just one of the
         pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the
         file system where the generated output will go, and the -merge option for JCasGen will automatically
         merge the hand-modifications with the generated code. If
         <span class="emphasis"><em>both</em></span> pre-merged versions had hand-modifications, then these modifications must
         be manually merged.</p>

       <p>An alternative to this is packaging the components as individual PEAR files, each with their own
       version of the JCas generated Classes.  The Framework (as of release 2.2) can run PEAR files using the
       pear file descriptor, and supply each component with its particular version of the JCas generated class.</p>

     </div>

     <div class="section" title="5.5.3.&nbsp;Impact of Type Merging on Composability of Annotators"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.impact_of_type_merging_on_composability">5.5.3.&nbsp;Impact of Type Merging on Composability of Annotators</h3></div></div></div>


       <p>The recommended approach in UIMA is to build and maintain type systems as separate components, which are
         imported by Annotators. Using this approach, Type Merging does not occur because the Type System and its JCas
         classes are centrally managed and shared by the annotators.</p>

       <p>If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator
         redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the
         reusability of your annotator, unless your component is used as a PEAR file.</p>

       <p>If you are not using the PEAR packaging isolation capability, then whenever
         anyone wants to combine your annotator with another annotator that uses a different version of
         the same Type, they will need to be aware of all of the issues described in the previous section. They will need
         to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java
         classes and to not include the pre-merge classes. (To enable this, you should package these classes
         separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you
         have done hand-modifications to your JCas classes, the person assembling your annotator will need to
         properly merge those changes. These issues significantly complicate the task of combining annotators, and
         will cause your annotator not to be as easily reusable as other UIMA annotators. </p>

     </div>

     <div class="section" title="5.5.4.&nbsp;Adding Features to DocumentAnnotation"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.documentannotation_issues">5.5.4.&nbsp;Adding Features to DocumentAnnotation</h3></div></div></div>


       <p>There is one built-in type, <code class="literal">uima.tcas.DocumentAnnotation</code>,
         to which applications can add additional features.  (All other built-in types
         are "feature-final" and you cannot add additional features to them.)  Frequently,
         additional features are added to <code class="literal">uima.tcas.DocumentAnnotation</code>
         to provide a place to store document-level metadata.</p>

       <p>For the same reasons mentioned in the previous section, adding features to
         DocumentAnnotation is not recommended if you are using JCas.  Instead, it is recommended
         that you define your own type for storing your document-level metadata.  You can create
         an instance of this type and add it to the indexes in the usual way.  You can then
         retrieve this instance using the iterator returned from the method <code class="literal">getAllIndexedFS(type)</code>
         on an instance of a JFSIndexRepository object.
         (As of UIMA v2.1, you do not have to declare a custom index in your descriptor to
         get this to work).</p>

       <p>If you do choose to add features to DocumentAnnotation, there are additional issues to
         be aware of.  The UIMA SDK provides the JCas cover class for the built-in definition of
         DocumentAnnotation, in the separate jar file <code class="literal">uima-document-annotation.jar</code>.
         If you add additional features to DocumentAnnotation, you must remove this jar file
         from your classpath, because you will not want to use the default JCas cover class.
         You will need to re-run JCasGen as described in <a class="xref" href="#ugr.ref.jcas.merging_types.jcasgen_support" title="5.5.2.&nbsp;JCasGen support for type merging">Section&nbsp;5.5.2, &#8220;JCasGen support for type merging&#8221;</a>.  JCasGen will generate a new cover
         class for DocumentAnnotation, which you must place in your classpath in lieu of the version
         in <code class="literal">uima-document-annotation.jar</code>.</p>

       <p>Also, this is the reason why the method <code class="literal">JCas.getDocumentAnnotationFs()</code> returns
         type <code class="literal">TOP</code>, rather than type <code class="literal">DocumentAnnotation</code>.  Because the
         <code class="literal">DocumentAnnotation</code> class can be replaced by users, it is not part of
         <code class="literal">uima-core.jar</code> and so the core UIMA framework cannot have any references
         to it.  In your code, you may <span class="quote">&#8220;<span class="quote">cast</span>&#8221;</span> the result of <code class="literal">JCas.getDocumentAnnotationFs()</code>
         to type <code class="literal">DocumentAnnotation</code>, which must be available on the classpath either via
         <code class="literal">uima-document-annotation.jar</code> or by including a custom version that you have generated using JCasGen.</p>
     </div>

   </div>

   <div class="section" title="5.6.&nbsp;Using JCas within an Annotator"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.using_within_an_annotator">5.6.&nbsp;Using JCas within an Annotator</h2></div></div></div>


     <p>To use JCas within an annotator, you must include the generated Java classes output
       from JCasGen in the class path.</p>

     <p>An annotator written using JCas is built by defining a class for the annotator that
       extends JCasAnnotator_ImplBase. The process method for this annotator is
       written as follows:</p>

     <pre class="programlisting">public void process(JCas jcas)
      throws AnalysisEngineProcessException {
   ... // body of annotator goes here
 }</pre>

     <p>The process method is passed the JCas instance to use as a parameter.</p>

     <p>The JCas reference is used throughout the annotator to refer to the particular JCas
       instance being worked on. In pooled or multi-threaded implementations, there will be a
       separate JCas for each thread being (simultaneously) worked on.</p>

     <p>You can do several kinds of operations using the JCas APIs: create new feature
       structures (instances of CAS types) (using the new operator), access existing feature
       structures passed to your annotator in the JCas (for example, by using the next method of
       an iterator over the feature structures), get and set the fields of a particular
       instance of a feature structure, and add and remove feature structure instances from
       the CAS indexes. To support iteration, there are also functions to get and use indexes
       and iterators over the instances in a JCas.</p>

     <div class="section" title="5.6.1.&nbsp;Creating new instances using the Java &#8220;new&#8221; operator"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.new_instances">5.6.1.&nbsp;Creating new instances using the Java <span class="quote">&#8220;<span class="quote">new</span>&#8221;</span> operator</h3></div></div></div>


       <p>The new operator creates new instances of JCas types. It takes at least one
         parameter, the JCas instance in which the type is to be created. For example, if a
         type Meeting is defined, you can create a new instance of it using:

         </p><pre class="programlisting">Meeting m = new Meeting(jcas);</pre>

       <p>Other variations of constructors can be added in custom code; the single
         parameter version is the one automatically generated by JCasGen. For types that are
         subtypes of Annotation, JCasGen also generates an additional constructor with
         additional <span class="quote">&#8220;<span class="quote">begin</span>&#8221;</span> and <span class="quote">&#8220;<span class="quote">end</span>&#8221;</span> arguments.</p>

     </div>
     <div class="section" title="5.6.2.&nbsp;Getters and Setters"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.getters_and_setters">5.6.2.&nbsp;Getters and Setters</h3></div></div></div>


       <p>If the CAS type Meeting had fields location and time, you could get or set these by
         using getter or setter methods. These methods have names formed by splicing together
         the word <span class="quote">&#8220;<span class="quote">get</span>&#8221;</span> or <span class="quote">&#8220;<span class="quote">set</span>&#8221;</span> followed by the field name, with
         the first letter of the field name capitalized. For instance

         </p><pre class="programlisting">getLocation()</pre>

       <p>The getter forms take no parameters and return the value of the field; the setter
         forms take one parameter, the value to set into the field, and return void.</p>

       <p>There are built-in CAS types for arrays of integers, strings, floats, and
         feature structures. For fields whose values are these types of arrays, there is an
         alternate form of getters and setters that take an additional parameter, written as
         the first parameter, which is the index in the array of an item to get or set.</p>
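
       <p>The shapes of these methods can be sketched with a small stand-in class. This is an illustrative
         assumption, not JCasGen output: <code class="literal">MeetingAccessorsSketch</code> and its
         <code class="literal">attendees</code> field are hypothetical, and a plain Java
         <code class="literal">String[]</code> is used in place of the built-in CAS string array type.</p>

       <pre class="programlisting">// Hypothetical stand-in that mimics the accessor shapes described above.
class MeetingAccessorsSketch {
  private String location;
  private String[] attendees;   // stands in for a CAS array-valued field

  // Plain forms: "get"/"set" plus the field name with its first letter capitalized.
  String getLocation() { return location; }
  void setLocation(String v) { location = v; }

  // Whole-array forms for the array-valued field.
  String[] getAttendees() { return attendees; }
  void setAttendees(String[] v) { attendees = v; }

  // Indexed forms: the array index is the first parameter.
  String getAttendees(int i) { return attendees[i]; }
  void setAttendees(int i, String v) { attendees[i] = v; }
}</pre>

       <p>With this stand-in, <code class="literal">m.setAttendees(1, "Ann")</code> sets one element of the
         array, and <code class="literal">m.getAttendees(1)</code> reads it back.</p>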

     </div>

     <div class="section" title="5.6.3.&nbsp;Obtaining references to Indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.obtaining_refs_to_indexes">5.6.3.&nbsp;Obtaining references to Indexes</h3></div></div></div>


       <p>The only way to access instances (not otherwise referenced from other
         instances) passed in to your annotator in its JCas is to use an iterator over some
         index. Indexes in the CAS are specified in the annotator descriptor. Indexes have a
         name; text annotators have a built-in, standard index over all annotations.</p>

       <p>To get an index, first get the JFSIndexRepository from the JCas using the method
         jcas.getJFSIndexRepository(). Here are the calls to get indexes:</p>


       <pre class="programlisting">JFSIndexRepository ir = jcas.getJFSIndexRepository();

 ir.getIndex(name-of-index) // get the index by its name, a string
 ir.getIndex(name-of-index, Foo.type) // filtered by specific type

 ir.getAnnotationIndex()      // get AnnotationIndex
 jcas.getAnnotationIndex()    // get directly from jcas
 ir.getAnnotationIndex(Foo.type)      // filtered by specific type
 jcas.getAnnotationIndex(Foo.class)   // better: filtered by JCas class</pre>

       <p>For convenience, the getAnnotationIndex method is available directly on the JCas object
       instance; the implementation merely forwards to the associated index repository.</p>

       <p>A filtering type must be a subtype of the type specified for this index in its
         index specification. It can be written as Foo.type or, if you have an instance
         of Foo, as</p>

       <pre class="programlisting">fooInstance.getClass()</pre>

       <p>Foo is (of course) an example of the name of the type.</p>

     </div>
     <div class="section" title="5.6.4.&nbsp;Adding (and removing) instances to (from) indexes"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.adding_removing_instances_to_indexes">5.6.4.&nbsp;Adding (and removing) instances to (from) indexes</h3></div></div></div>


       <p>CAS indexes are maintained automatically by the CAS. But you must add any
         instances of feature structures you want the index to find, to the indexes by using the
         call:</p>

       <pre class="programlisting">myInstance.addToIndexes();</pre>

       <p>Do this after setting all features in the instance <span class="bold-italic">which could be used in indexing</span>,
         for example, in determining the sorting order.
         See <a class="xref" href="#ugr.ref.cas.updating_indexed_feature_structures" title="4.5.1.&nbsp;Updating indexed feature structures">Section&nbsp;4.5.1, &#8220;Updating indexed feature structures&#8221;</a> for details
         on updating indexed feature structures.
       </p>

       <p>When writing a Multi-View component, you may need to index instances in multiple
         CAS views. The methods above use the indexes associated with the current JCas object.
         There is a variation of the <code class="literal">addToIndexes / removeFromIndexes</code> methods which
         takes one argument: a reference to a JCas object holding the view in which you want to
         index this instance.
         </p><pre class="programlisting">myInstance.addToIndexes(anotherJCas)
 myInstance.removeFromIndexes(anotherJCas)</pre><p>
       </p>

       <p>
         You can also explicitly add instances to other views using the addFsToIndexes method on
         other JCas (or CAS) objects. For instance, if you had 2 other CAS views (myView1 and
         myView2), in which you wanted to index myInstance, you could write:</p>

       <pre class="programlisting">myInstance.addToIndexes(); // index in the view used with the new operator
 myView1.addFsToIndexes(myInstance); // index myInstance in myView1
 myView2.addFsToIndexes(myInstance); // index myInstance in myView2</pre>

       <p>
         The rules for determining which index to use with a particular JCas object are designed to
         behave the way most would think they should; if you need specific behavior, you can always
         explicitly designate which view the index adding and removing operations should work on.
       </p>

       <p>
         The rules are applied in the following order; the first one that matches determines the view:
       </p>
       <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">
           <p>If the instance is a subtype of AnnotationBase, then the view is the view associated with the
           annotation as specified in the feature holding the view reference in AnnotationBase.</p>
         </li><li class="listitem">
           <p>Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the
           instance's constructor.</p>
         </li><li class="listitem">
           <p>Otherwise, if the instance was created by getting a feature value from some other instance, whose range
           type is a feature structure, then the view is the same as the referring instance.</p>
         </li><li class="listitem">
           <p>Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index,
           then it is the view associated with the index.</p>
         </li></ol></div>

       <p>As of release 2.4.1, there are two efficient bulk-remove methods to remove all instances of a given type,
       or all instances of a given type and its subtypes.
         These are invoked on an instance of an IndexRepository,
       for a particular view.  For example, to remove all instances of Token from a particular JCas instance:
             </p>
        <pre class="programlisting">// any one of the following:
 jcas.removeAllIncludingSubtypes(Token.type);
 jcas.removeAllIncludingSubtypes(aTokenInstance.getTypeIndexID());
 jcas.getFSIndexRepository()
     .removeAllIncludingSubtypes(jcas.getCasType(Token.type));</pre>

     </div>

     <div class="section" title="5.6.5.&nbsp;Using Iterators"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.using_iterators">5.6.5.&nbsp;Using Iterators</h3></div></div></div>


       <p>This section describes obtaining and using iterators.  However, it is recommended that
         you instead use the select framework, described in a chapter in the version 3 user's guide.</p>

       <p>Once you have an index obtained from the JCas, you can get an iterator from the
         index; here is an example:</p>


       <pre class="programlisting">// using the FSIndexRepository
 FSIndexRepository ir = jcas.getFSIndexRepository();
 FSIndex myIndex = ir.getIndex("myIndexName");
 FSIterator myIterator = myIndex.iterator();

 // alternatively, using the JFSIndexRepository to filter by type
 JFSIndexRepository jir = jcas.getJFSIndexRepository();
 FSIndex myFilteredIndex = jir.getIndex("myIndexName", Foo.type); // filtered
 FSIterator myFilteredIterator = myFilteredIndex.iterator();</pre>

       <p>Iterators work like normal Java iterators, but are augmented to support
         additional capabilities. Iterators are described in the CAS Reference, <a href="references.html#ugr.ref.cas.indexes_and_iterators" class="olink">Section&nbsp;4.7, &#8220;Indexes and Iterators&#8221;</a>.</p>

     </div>

     <div class="section" title="5.6.6.&nbsp;Class Loaders in UIMA"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.class_loaders">5.6.6.&nbsp;Class Loaders in UIMA</h3></div></div></div>


       <p>The basic concept of a UIMA application includes assembling engines into a flow.
         The application made up of these Engines is run within the UIMA Framework, either by
         the Collection Processing Manager, or by using more basic UIMA Framework
         APIs.</p>

       <p>The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the
         capability to load multiple applications, in a way where each one is isolated from the
         others, by using a separate class loader for each application. For instance, one set
         of UIMA Framework Classes could be shared by multiple sets of application-specific
         classes, even if these application-specific classes had the same names but were
         different versions.</p>

       <div class="section" title="5.6.6.1.&nbsp;Use of Class Loaders is optional"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.jcas.class_loaders.optional">5.6.6.1.&nbsp;Use of Class Loaders is optional</h4></div></div></div>


         <p>The UIMA framework will use a specific ClassLoader, based on how
           ResourceManager instances are used. Specific ClassLoaders are only created if
           you specify an ExtensionClassPath as part of the ResourceManager. If you do not
           need to support multiple applications within one UIMA framework within a JVM,
           don't specify an ExtensionClassPath; in this case, the classloader used
           will be the one used to load the UIMA framework - usually the overall application
           class loader.</p>

         <p>Of course, you should not run multiple UIMA applications together, in this
           way, if they have different class definitions for the same class name. This
           includes the JCas <span class="quote">&#8220;<span class="quote">cover</span>&#8221;</span> classes. This case might arise, for
           instance, if both applications extended
           <code class="literal">uima.tcas.DocumentAnnotation</code> in differing,
           incompatible ways. Each application would need its own definition of this class,
           but only one could be loaded (unless you specify ExtensionClassPath in the
           ResourceManager which will cause the UIMA application to load its private
           versions of its classes, from its classpath).</p>
       </div>
     </div>

     <div class="section" title="5.6.7.&nbsp;Issues accessing JCas objects outside of UIMA Engine Components"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.jcas.accessing_jcas_objects_outside_uima_components">5.6.7.&nbsp;Issues accessing JCas objects outside of UIMA Engine Components</h3></div></div></div>


       <p>If you are using the ExtensionClassPaths, the JCas cover classes are loaded
         under a class loader created by the ResourceManager part of the UIMA Framework.
         If you reference the same JCas
         classes outside of any UIMA component, for instance, in top level application code,
         the JCas classes used by that top level application code also must be in the class path
         for the application code.</p>

       <p>Alternatively, you could do all the JCas processing inside a UIMA component (and do no
         processing using JCas outside of the UIMA pipeline).</p>

     </div>
   </div>

   <div class="section" title="5.7.&nbsp;Setting up Classpath for JCas"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.setting_up_classpath">5.7.&nbsp;Setting up Classpath for JCas</h2></div></div></div>


     <p>The JCas Java classes generated by JCasGen are typically compiled and put into a JAR
       file, which, in turn, is put into the application's class path.</p>

     <p>This JAR file must be generated from the application's merged type system.
       This is most conveniently done by opening the top level descriptor used by the
       application in the Component Descriptor Editor tool, and pressing the Run-JCasGen
       button on the Type System Definition page.</p>

   </div>

   <div class="section" title="5.8.&nbsp;PEAR isolation"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.jcas.pear_support">5.8.&nbsp;PEAR isolation</h2></div></div></div>

     <p>
       As of version 2.2, the framework supports component descriptors which are PEAR descriptors.
       These descriptors define components plus include information on the class path needed to
       run them.  The framework uses the class path information to set up a localized class path, just
       for code running within the PEAR context.  This allows PEAR files requiring different
       versions of common code to work well together, even if classes in the different versions
       have the same names.
     </p>

     <p>The mechanism used to switch the class loaders when entering a PEAR-packaged annotator in
     a flow depends on the framework knowing if JCas is being used within that annotator code.  The
     framework will know this if the particular view being passed has had a previous call to
     getJCas(), or if the particular annotator is marked as a JCas-using one (by having it extend the
     class <code class="code">JCasAnnotator_ImplBase</code>).</p>

   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;6.&nbsp;PEAR Reference" id="ugr.ref.pear"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;6.&nbsp;PEAR Reference</h2></div></div></div>


 	<p>
 		A PEAR (Processing Engine ARchive) file is a standard package
 		for UIMA components. This chapter describes the PEAR 1.0 structure and
 		specification.
 	</p>

 	<p>
 		The PEAR package can be used for distribution and reuse by other
 		components or applications. It also allows applications and
 		tools to manage UIMA components automatically for verification,
 		deployment, invocation, testing, etc.
 	</p>

 	<p>
 		Currently, there is an Eclipse plugin and a command line tool
 		available to create PEAR packages for standard UIMA components.
 		Please refer to
 		<a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a>
 		<a href="tools.html#ugr.tools.pear.packager" class="olink">Chapter&nbsp;9, <i>PEAR Packager User's Guide</i></a>
 		for more information about these tools.
 	</p>

   <p>
     PEARs distributed to new targets can be installed at those targets.
     UIMA includes a tool for installing PEARs; see
 		<a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a>
     <a href="tools.html#ugr.tools.pear.installer" class="olink">Chapter&nbsp;11, <i>PEAR Installer User's Guide</i></a> for
     more information about installing PEARs.
   </p>

   <p>
     An installed PEAR can be used as a component within a UIMA pipeline
     by specifying the PEAR descriptor that is created when the PEAR is
     installed.  See
     <a href="references.html#ugr.ref.pear.specifier" class="olink">Section&nbsp;6.3, &#8220;PEAR package descriptor&#8221;</a>.
   </p>

 	<div class="section" title="6.1.&nbsp;Packaging a UIMA component"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.pear.packaging_a_component">6.1.&nbsp;Packaging a UIMA component</h2></div></div></div>


 		<p>
 			To illustrate how a PEAR file is created and how it is
 			structured internally, this section describes the
 			steps used to package a UIMA component as a valid PEAR file.
 			The PEAR packaging process consists of the following steps:

 			</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
 					<p>
 						<a class="xref" href="#ugr.ref.pear.creating_pear_structure" title="6.1.1.&nbsp;Creating the PEAR structure">Section&nbsp;6.1.1, &#8220;Creating the PEAR structure&#8221;</a>
 					</p>
 				</li><li class="listitem">
 					<p>
 						<a class="xref" href="#ugr.ref.pear.populating_pear_structure" title="6.1.2.&nbsp;Populating the PEAR structure">Section&nbsp;6.1.2, &#8220;Populating the PEAR structure&#8221;</a>
 					</p>
 				</li><li class="listitem">
 					<p>
 						<a class="xref" href="#ugr.ref.pear.creating_installation_descriptor" title="6.1.3.&nbsp;Creating the installation descriptor">Section&nbsp;6.1.3, &#8220;Creating the installation descriptor&#8221;</a>
 					</p>
 				</li><li class="listitem">
 					<p>
 						<a class="xref" href="#ugr.ref.pear.packaging_into_1_file" title="6.1.5.&nbsp;Packaging the PEAR structure into one file">Section&nbsp;6.1.5, &#8220;Packaging the PEAR structure into one file&#8221;</a>
 					</p>
 				</li></ul></div><p>
 		</p>

 		<div class="section" title="6.1.1.&nbsp;Creating the PEAR structure"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.creating_pear_structure">6.1.1.&nbsp;Creating the PEAR structure</h3></div></div></div>


 			<p>
 				The first step in the PEAR creation process is to create
 				a PEAR structure. The PEAR structure is a structured
 				tree of folders and files, including the following
 				elements:

 				</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
 						<p>
 							Required Elements:

 							</p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
 									<p>
 										The
 										<span class="bold"><strong>
 											metadata
 										</strong></span>
 										folder which contains the PEAR
 										installation descriptor and
 										properties files.
 									</p>
 								</li><li class="listitem">
 									<p>
 										The installation descriptor (
 										<span class="bold"><strong>
 											metadata/install.xml
 										</strong></span>
 										)
 									</p>
 								</li><li class="listitem">
 									<p>
 										A UIMA analysis engine
 										descriptor and its required
 										code, delegates (if any), and
 										resources
 									</p>
 								</li></ul></div><p>
 						</p>
 					</li><li class="listitem">
 						<p>
 							Optional Elements:

 							</p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
 									<p>
 										The desc folder to contain
 										descriptor files of analysis
 										engines, delegate analysis
 										engines (all levels), and other
 										components (Collection Readers,
 										CAS Consumers, etc.).
 									</p>
 								</li><li class="listitem">
 									<p>
 										The src folder to contain the
 										source code
 									</p>
 								</li><li class="listitem">
 									<p>
 										The bin folder to contain
 										executables, scripts, class
 										files, dlls, shared libraries,
 										etc.
 									</p>
 								</li><li class="listitem">
 									<p>
 										The lib folder to contain jar
 										files.
 									</p>
 								</li><li class="listitem">
 									<p>
 										The doc folder containing
 										documentation materials,
 										preferably accessible through an
 										index.html.
 									</p>
 								</li><li class="listitem">
 									<p>
 										The data folder to contain data
 										files (e.g. for testing).
 									</p>
 								</li><li class="listitem">
 									<p>
 										The conf folder to contain
 										configuration files.
 									</p>
 								</li><li class="listitem">
 									<p>
 										The resources folder to contain
 										other resources and
 										dependencies.
 									</p>
 								</li><li class="listitem">
 									<p>
 										Other user-defined folders or
 										files are allowed, but should be
 										avoided.
 									</p>
 								</li></ul></div><p>
 						</p>
 					</li></ul></div><p>
 			</p>

 			<div class="figure"><a name="ugr.ref.pear.fig.pear_structure"></a><div class="figure-contents">

 				<div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="297"><tr><td><img src="images/references/ref.pear/image002.jpg" width="297" alt="diagram of the PEAR structure"></td></tr></table></div>
 			</div><p class="title"><b>Figure&nbsp;6.1.&nbsp;The PEAR Structure</b></p></div><br class="figure-break">
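 			<p>
 				As a rough illustration of this step, the sketch below creates
 				the folders listed above and checks for the required
 				metadata/install.xml file. The class
 				<code class="literal">PearSkeleton</code> and its methods are
 				hypothetical helpers written for this example; they are not part
 				of the UIMA API, and the PEAR packaging tools normally do this work.
 			</p>

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of the UIMA API.
class PearSkeleton {

    // Folder names taken from the list above; only "metadata" is required.
    static final String[] FOLDERS = {
        "metadata", "desc", "src", "bin", "lib", "doc", "data", "conf", "resources"
    };

    // Creates the PEAR folders under the given root folder.
    static void create(File pearRoot) throws IOException {
        for (String name : FOLDERS) {
            File dir = new File(pearRoot, name);
            dir.mkdirs();
            if (!dir.isDirectory()) {
                throw new IOException("Cannot create folder " + dir);
            }
        }
    }

    // Returns true when the required metadata/install.xml file is present.
    static boolean hasInstallationDescriptor(File pearRoot) {
        return new File(new File(pearRoot, "metadata"), "install.xml").isFile();
    }

    public static void main(String[] args) throws IOException {
        File root = new File(args.length == 0 ? "myPearRoot" : args[0]);
        create(root);
        System.out.println("install.xml present: " + hasInstallationDescriptor(root));
    }
}
```

 			<p>
 				A freshly created skeleton has no installation descriptor yet, so
 				the check only succeeds once metadata/install.xml has been added.
 			</p>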

 		</div>
 		<div class="section" title="6.1.2.&nbsp;Populating the PEAR structure"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.populating_pear_structure">6.1.2.&nbsp;Populating the PEAR structure</h3></div></div></div>


 			<p>
 				After creating the PEAR structure, the component's
 				descriptor files, code files, resource files, and any
 				other files and folders are copied into the
 				corresponding folders of the PEAR structure. The
 				developer should make sure that the code works with
 				this layout of files and folders, and that there are no
 				broken links. Although it is strongly discouraged, the
 				optional elements of the PEAR structure can be replaced
 				by other user-defined files and folders, if required for
 				the component to work properly.
 			</p>
 			<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>
 				<p>
 					The PEAR structure must be self-contained. For
 					example, this means that the component must run
 					properly regardless of the location of the PEAR
 					root folder. If the developer needs to use an absolute
 					path in configuration or descriptor files, then
 					these files should be placed in the
 					<span class="quote">&#8220;<span class="quote">conf</span>&#8221;</span>
 					or
 					<span class="quote">&#8220;<span class="quote">desc</span>&#8221;</span>
 					folder, with the path of the PEAR root folder replaced by
 					the string
 					<span class="quote">&#8220;<span class="quote">$main_root</span>&#8221;</span>.
 					The tools that deploy and use PEAR files should
 					localize the files in the
 					<span class="quote">&#8220;<span class="quote">conf</span>&#8221;</span>
 					and
 					<span class="quote">&#8220;<span class="quote">desc</span>&#8221;</span>
 					folders by replacing the string
 					<span class="quote">&#8220;<span class="quote">$main_root</span>&#8221;</span>
 					with the local absolute path of the PEAR root
 					folder. The
 					<span class="quote">&#8220;<span class="quote">$main_root</span>&#8221;</span>
 					macro can also be used in the installation
 					descriptor (install.xml).
 				</p>
 			</div>

 			<p>
 				Currently there are three types of component packages
 				depending on their deployment:
 			</p>

 			<div class="section" title="6.1.2.1.&nbsp;Standard Type"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.package_type.standard">6.1.2.1.&nbsp;Standard Type</h4></div></div></div>


 				<p>
 					A component package with the
 					<span class="bold"><strong>standard</strong></span>
 					type must be a valid Analysis Engine, and all the
 					required files to deploy it locally must be included
 					in the PEAR package.
 				</p>

 			</div>
 			<div class="section" title="6.1.2.2.&nbsp;Service Type"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.package_type.service">6.1.2.2.&nbsp;Service Type</h4></div></div></div>


 				<p>
 					A component package with the
 					<span class="bold"><strong>service</strong></span>
 					type must be deployable locally as a supported UIMA
 					service (e.g. Vinci). In this case, all the required
 					files to deploy it locally must be included in the
 					PEAR package.
 				</p>

 			</div>
 			<div class="section" title="6.1.2.3.&nbsp;Network Type"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.package_type.network">6.1.2.3.&nbsp;Network Type</h4></div></div></div>


 				<p>
 					A component package with the network type is not
 					deployed locally but rather in a
 					<span class="quote">&#8220;<span class="quote">remote</span>&#8221;</span>
 					environment. It's accessed as a network AE
 					(e.g. Vinci Service). The component owner has the
 					responsibility to start the service and make sure
 					it's up and running before it's used by
 					others (like a webmaster who makes sure the web
 					site is up and running). In this case, the PEAR
 					package does not have to contain files required for
 					deployment, but must contain the network AE
 					descriptor (see
 					<a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aae.creating_xml_descriptor" class="olink">Section&nbsp;1.1.4, &#8220;Creating the XML Descriptor&#8221;</a>
 					) and the &lt;DESC&gt; tag in the installation
 					descriptor must point to the network AE descriptor.
 					For more information about Network Analysis Engines,
 					please refer to
 					<a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.remote_services" class="olink">Section&nbsp;3.6, &#8220;Working with Remote Services&#8221;</a>
 					.
 				</p>

 			</div>
 		</div>

 		<div class="section" title="6.1.3.&nbsp;Creating the installation descriptor"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.creating_installation_descriptor">6.1.3.&nbsp;Creating the installation descriptor</h3></div></div></div>


 			<p>
 				The installation descriptor is an XML file called
 				install.xml under the metadata folder of the PEAR
 				structure. It is also called the InsD. The InsD XML file
 				should be created using the UTF-8 file encoding. The InsD
 				should contain the following sections:
 			</p>

 			<div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
 					<p>
 						&lt;OS&gt;: This section is used to specify
 						supported operating systems
 					</p>
 				</li><li class="listitem">
 					<p>
 						&lt;TOOLKITS&gt;: This section is used to
 						specify toolkits, such as JDK, needed by the
 						component.
 					</p>
 				</li><li class="listitem">
 					<p>
 						&lt;SUBMITTED_COMPONENT&gt;: This is the most
 						important section in the Installation
 						Descriptor. It's used to specify required
 						information about the component. See
 						<a class="xref" href="#ugr.ref.pear.installation_descriptor" title="6.1.4.&nbsp; Documented template for the installation descriptor:">Section&nbsp;6.1.4, &#8220;Installation Descriptor: template&#8221;</a>
 						for detailed information about this section.
 					</p>
 				</li><li class="listitem">
 					<p>
 						&lt;INSTALLATION&gt;: This section is explained
 						in
 						<a class="xref" href="#ugr.ref.pear.installing" title="6.2.&nbsp;Installing a PEAR package">Section&nbsp;6.2, &#8220;Installing a PEAR package&#8221;</a>.
 					</p>
 				</li></ul></div>

 		</div>

 		<div class="section" title="6.1.4.&nbsp; Documented template for the installation descriptor:"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.installation_descriptor">6.1.4.&nbsp;
 				Documented template for the installation descriptor:
 			</h3></div></div></div>


 			<p>
 				The following is a sample
 				<span class="quote">&#8220;<span class="quote">documented template</span>&#8221;</span>
 				which describes content of the installation descriptor
 				install.xml:
 			</p>


 			<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
 &lt;!-- Installation Descriptor Template --&gt;
 &lt;COMPONENT_INSTALLATION_DESCRIPTOR&gt;
   &lt;!-- Specifications of OS names, including version, etc. --&gt;
   &lt;OS&gt;
     &lt;NAME&gt;OS_Name_1&lt;/NAME&gt;
     &lt;NAME&gt;OS_Name_2&lt;/NAME&gt;
   &lt;/OS&gt;
   &lt;!-- Specifications of required standard toolkits --&gt;
   &lt;TOOLKITS&gt;
     &lt;JDK_VERSION&gt;JDK_Version&lt;/JDK_VERSION&gt;
   &lt;/TOOLKITS&gt;

   &lt;!-- There are 2 types of variables that are used in the InsD:
        a) $main_root , which will be substituted with the real path to the
                  main component root directory after installing the
                  main (submitted) component
        b) $component_id$root, which will be substituted with the real path
           to the root directory of a given delegate component after
           installing the given delegate component --&gt;

   &lt;!-- Specification of submitted component (AE)             --&gt;
   &lt;!-- Note: submitted_component_id is assigned by developer; --&gt;
   &lt;!--       XML descriptor file name is set by developer.    --&gt;
   &lt;!-- Important: ID element should be the first in the       --&gt;
   &lt;!--            SUBMITTED_COMPONENT section.                --&gt;
   &lt;!-- Submitted component may include optional specification --&gt;
   &lt;!-- of Collection Reader that can be used for testing the  --&gt;
   &lt;!-- submitted component.                                   --&gt;
   &lt;!-- Submitted component may include optional specification --&gt;
   &lt;!-- of CAS Consumer that can be used for testing the       --&gt;
   &lt;!-- submitted component.                                   --&gt;

   &lt;SUBMITTED_COMPONENT&gt;
     &lt;ID&gt;submitted_component_id&lt;/ID&gt;
     &lt;NAME&gt;Submitted component name&lt;/NAME&gt;
     &lt;DESC&gt;$main_root/desc/ComponentDescriptor.xml&lt;/DESC&gt;

     &lt;!-- deployment options:                                   --&gt;
     &lt;!-- a) "standard" is deploying AE locally                 --&gt;
     &lt;!-- b) "service"  is deploying AE locally as a service,   --&gt;
     &lt;!--    using specified command (script)                   --&gt;
     &lt;!-- c) "network"  is deploying a pure network AE, which   --&gt;
     &lt;!--    is running somewhere on the network                --&gt;

     &lt;DEPLOYMENT&gt;standard | service | network&lt;/DEPLOYMENT&gt;

     &lt;!-- Specifications for "service" deployment option only   --&gt;
     &lt;SERVICE_COMMAND&gt;$main_root/bin/startService.bat&lt;/SERVICE_COMMAND&gt;
     &lt;SERVICE_WORKING_DIR&gt;$main_root&lt;/SERVICE_WORKING_DIR&gt;
     &lt;SERVICE_COMMAND_ARGS&gt;

       &lt;ARGUMENT&gt;
         &lt;VALUE&gt;1st_parameter_value&lt;/VALUE&gt;
         &lt;COMMENTS&gt;1st parameter description&lt;/COMMENTS&gt;
       &lt;/ARGUMENT&gt;

       &lt;ARGUMENT&gt;
         &lt;VALUE&gt;2nd_parameter_value&lt;/VALUE&gt;
         &lt;COMMENTS&gt;2nd parameter description&lt;/COMMENTS&gt;
       &lt;/ARGUMENT&gt;

     &lt;/SERVICE_COMMAND_ARGS&gt;

     &lt;!-- Specifications for "network" deployment option only   --&gt;

     &lt;NETWORK_PARAMETERS&gt;
       &lt;VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" /&gt;
     &lt;/NETWORK_PARAMETERS&gt;

     &lt;!-- General specifications                                --&gt;

     &lt;COMMENTS&gt;Main component description&lt;/COMMENTS&gt;

     &lt;COLLECTION_READER&gt;
       &lt;COLLECTION_ITERATOR_DESC&gt;
         $main_root/desc/CollIterDescriptor.xml
       &lt;/COLLECTION_ITERATOR_DESC&gt;

       &lt;CAS_INITIALIZER_DESC&gt;
         $main_root/desc/CASInitializerDescriptor.xml
       &lt;/CAS_INITIALIZER_DESC&gt;
     &lt;/COLLECTION_READER&gt;

     &lt;CAS_CONSUMER&gt;
       &lt;DESC&gt;$main_root/desc/CASConsumerDescriptor.xml&lt;/DESC&gt;
     &lt;/CAS_CONSUMER&gt;

   &lt;/SUBMITTED_COMPONENT&gt;
   &lt;!-- Specifications of the component installation process --&gt;
   &lt;INSTALLATION&gt;
     &lt;!-- List of delegate components that should be installed together --&gt;
     &lt;!-- with the main submitted component (for aggregate components)  --&gt;
     &lt;!-- Important: ID element should be the first in each             --&gt;
     &lt;!--            DELEGATE_COMPONENT section.                        --&gt;
     &lt;DELEGATE_COMPONENT&gt;
       &lt;ID&gt;first_delegate_component_id&lt;/ID&gt;
       &lt;NAME&gt;Name of first required separate component&lt;/NAME&gt;
     &lt;/DELEGATE_COMPONENT&gt;

     &lt;DELEGATE_COMPONENT&gt;
       &lt;ID&gt;second_delegate_component_id&lt;/ID&gt;
       &lt;NAME&gt;Name of second required separate component&lt;/NAME&gt;
     &lt;/DELEGATE_COMPONENT&gt;

     &lt;!-- Specifications of local path names that should be replaced --&gt;
     &lt;!-- with real path names after the main component as well as   --&gt;
     &lt;!-- all required delegate (library) components are installed.  --&gt;
     &lt;!-- &lt;FILE&gt; and &lt;REPLACE_WITH&gt; values may use the $main_root or --&gt;
     &lt;!-- one of the $component_id$root variables.                   --&gt;
     &lt;!-- Important: ACTION element should be the first in each      --&gt;
     &lt;!--            PROCESS section.                                --&gt;

     &lt;PROCESS&gt;
       &lt;ACTION&gt;find_and_replace_path&lt;/ACTION&gt;
       &lt;PARAMETERS&gt;
         &lt;FILE&gt;$main_root/desc/ComponentDescriptor.xml&lt;/FILE&gt;
         &lt;FIND_STRING&gt;../resources/dict/&lt;/FIND_STRING&gt;
         &lt;REPLACE_WITH&gt;$main_root/resources/dict/&lt;/REPLACE_WITH&gt;
         &lt;COMMENTS&gt;Specify actual dictionary location in XML component
           descriptor
         &lt;/COMMENTS&gt;
       &lt;/PARAMETERS&gt;
     &lt;/PROCESS&gt;

     &lt;PROCESS&gt;
       &lt;ACTION&gt;find_and_replace_path&lt;/ACTION&gt;
       &lt;PARAMETERS&gt;
         &lt;FILE&gt;$main_root/desc/DelegateComponentDescriptor.xml&lt;/FILE&gt;
         &lt;FIND_STRING&gt;
 local_root_directory_for_1st_delegate_component/resources/dict/
         &lt;/FIND_STRING&gt;
         &lt;REPLACE_WITH&gt;
           $first_delegate_component_id$root/resources/dict/
         &lt;/REPLACE_WITH&gt;
         &lt;COMMENTS&gt;
           Specify actual dictionary location in the descriptor of the 1st
           delegate component
         &lt;/COMMENTS&gt;
       &lt;/PARAMETERS&gt;
     &lt;/PROCESS&gt;

     &lt;!-- Specifications of environment variables that should be set prior
          to running the main component and all other reused components.
          &lt;VAR_VALUE&gt; values may use the $main_root or one of the
          $component_id$root variables. --&gt;

     &lt;PROCESS&gt;
       &lt;ACTION&gt;set_env_variable&lt;/ACTION&gt;
       &lt;PARAMETERS&gt;
         &lt;VAR_NAME&gt;env_variable_name&lt;/VAR_NAME&gt;
         &lt;VAR_VALUE&gt;env_variable_value&lt;/VAR_VALUE&gt;
         &lt;COMMENTS&gt;Set environment variable value&lt;/COMMENTS&gt;
       &lt;/PARAMETERS&gt;
     &lt;/PROCESS&gt;

   &lt;/INSTALLATION&gt;
 &lt;/COMPONENT_INSTALLATION_DESCRIPTOR&gt;</pre>

       <div class="section" title="6.1.4.1.&nbsp;The SUBMITTED_COMPONENT section"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.installation_descriptor.submitted_component">6.1.4.1.&nbsp;The SUBMITTED_COMPONENT section</h4></div></div></div>


         <p>The SUBMITTED_COMPONENT section of the installation descriptor
           (install.xml) is used to specify required information about the UIMA component.
           Before explaining the details, let's clarify the concept of component ID and
           <span class="quote">&#8220;<span class="quote">macros</span>&#8221;</span> used in the installation descriptor. The component ID
           element should be the <span class="bold"><strong>first element </strong></span>in the
           SUBMITTED_COMPONENT section.</p>

         <p>The component ID is a string that uniquely identifies the component. It should
           follow the Java package naming convention (e.g.
           com.company_name.project_name.etc.mycomponent).</p>

         <p>Macros are variables such as $main_root, used to represent a string such as the
           full path of a certain directory.</p>

         <p>The values of these macros are defined by the PEAR installation process, when the
           PEAR is installed, and represent the values local to that particular installation.
           The values are stored in the <code class="literal">metadata/PEAR.properties</code> file that is
           generated during PEAR installation.
           The tools and applications that use and deploy PEAR files replace these macros with
           the corresponding values in the local environment as part of the deployment
           process in the files included in the conf and desc folders.</p>

         <p>Currently, there are two types of macros:</p>

         <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>$main_root, which represents the local absolute
           path of the main component root directory after deployment. </p></li><li class="listitem"><p>$<span class="emphasis"><em>component_id</em></span>$root, which
             represents the local absolute path to the root directory of the component which
             has <span class="emphasis"><em>component_id </em></span> as component ID. This component could
             be, for instance, a delegate component. </p></li></ul></div>

         <p>For example, if some part of a descriptor needs to have a path to the data
           subdirectory of the PEAR, you write <code class="literal">$main_root/data</code>. If
           your PEAR refers to a delegate component having the ID
           <span class="quote">&#8220;<span class="quote"><code class="literal">my.comp.Dictionary</code></span>&#8221;</span>, and you need to
           specify a path to one of this component's subdirectories, e.g.
           <code class="literal">resources/dict</code>, you write
           <code class="literal">$my.comp.Dictionary$root/resources/dict</code>. </p>
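
         <p>The effect of macro replacement can be pictured with the following minimal
           sketch. It assumes plain literal string substitution; the class
           <code class="literal">PearMacros</code>, its methods, and the
           <code class="literal">/opt/pears/...</code> paths are made up for this example,
           and the actual PEAR installer may implement the replacement differently.</p>

```java
// Hypothetical helper for illustration only; not part of the UIMA API.
class PearMacros {

    // Replaces every $main_root occurrence with the local absolute path.
    static String expandMainRoot(String text, String mainRootPath) {
        return text.replace("$main_root", mainRootPath);
    }

    // Replaces $component_id$root for one component with its local root path.
    static String expandComponentRoot(String text, String componentId,
                                      String componentRootPath) {
        return text.replace("$" + componentId + "$root", componentRootPath);
    }

    public static void main(String[] args) {
        // Example install locations; any absolute paths would do.
        System.out.println(expandMainRoot("$main_root/data", "/opt/pears/main"));
        System.out.println(expandComponentRoot(
                "$my.comp.Dictionary$root/resources/dict",
                "my.comp.Dictionary", "/opt/pears/dict"));
    }
}
```

         <p>With the example paths used here, the two calls print
           <code class="literal">/opt/pears/main/data</code> and
           <code class="literal">/opt/pears/dict/resources/dict</code>.</p>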

       </div>
       <div class="section" title="6.1.4.2.&nbsp;The ID, NAME, and DESC tags"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.installation_descriptor.id_name_desc">6.1.4.2.&nbsp;The ID, NAME, and DESC tags</h4></div></div></div>


         <p>These tags specify the component ID, name, and descriptor path,
           as follows:


           </p><pre class="programlisting">&lt;SUBMITTED_COMPONENT&gt;
   &lt;ID&gt;submitted_component_id&lt;/ID&gt;
   &lt;NAME&gt;Submitted component name&lt;/NAME&gt;
   &lt;DESC&gt;$main_root/desc/ComponentDescriptor.xml&lt;/DESC&gt;</pre>

       </div>
       <div class="section" title="6.1.4.3.&nbsp;Tags related to deployment types"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.installation_descriptor.deployment_type">6.1.4.3.&nbsp;Tags related to deployment types</h4></div></div></div>


         <p>As mentioned before, there are currently three types of PEAR packages,
           depending on the following deployment types:</p>
         <div class="section" title="Standard Type"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.pear.installation_descriptor.deployment_type.standard">Standard Type</h5></div></div></div>


           <p>A component package with the <span class="bold"><strong>standard</strong></span>
             type must be a valid UIMA Analysis Engine, and all the required files to deploy it
             must be included in the PEAR package. This deployment type should be specified as
             follows:


             </p><pre class="programlisting">&lt;DEPLOYMENT&gt;standard&lt;/DEPLOYMENT&gt;</pre>
         </div>
         <div class="section" title="Service Type"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.pear.installation_descriptor.deployment_type.service">Service Type</h5></div></div></div>


           <p>A component package with the <span class="bold"><strong>service</strong></span>
             type must be deployable locally as a supported UIMA service (e.g. Vinci). The
             installation descriptor must include the path for the executable or script to
             start the service including its arguments, and the working directory from where
             to launch it, following this template:


             </p><pre class="programlisting">&lt;DEPLOYMENT&gt;service&lt;/DEPLOYMENT&gt;
 &lt;SERVICE_COMMAND&gt;$main_root/bin/startService.bat&lt;/SERVICE_COMMAND&gt;
 &lt;SERVICE_WORKING_DIR&gt;$main_root&lt;/SERVICE_WORKING_DIR&gt;
 &lt;SERVICE_COMMAND_ARGS&gt;
   &lt;ARGUMENT&gt;
     &lt;VALUE&gt;1st_parameter_value&lt;/VALUE&gt;
     &lt;COMMENTS&gt;1st parameter description&lt;/COMMENTS&gt;
   &lt;/ARGUMENT&gt;
   &lt;ARGUMENT&gt;
     &lt;VALUE&gt;2nd_parameter_value&lt;/VALUE&gt;
     &lt;COMMENTS&gt;2nd parameter description&lt;/COMMENTS&gt;
   &lt;/ARGUMENT&gt;
 &lt;/SERVICE_COMMAND_ARGS&gt;</pre>

         </div>
         <div class="section" title="Network Type"><div class="titlepage"><div><div><h5 class="title" id="ugr.ref.pear.installation_descriptor.deployment_type.network">Network Type</h5></div></div></div>


           <p>A component package with the network type is not deployed locally, but
             rather in a <span class="quote">&#8220;<span class="quote">remote</span>&#8221;</span> environment. It's accessed as a
             network AE (e.g. Vinci Service). In this case, the PEAR package does not have to
             contain files required for deployment, but must contain the network AE
             descriptor. The &lt;DESC&gt; tag in the installation descriptor (see
             <a class="xref" href="#ugr.ref.pear.installation_descriptor.id_name_desc" title="6.1.4.2.&nbsp;The ID, NAME, and DESC tags">Section&nbsp;6.1.4.2, &#8220;The ID, NAME, and DESC tags&#8221;</a>)
             must point to the network AE descriptor. Here is a template in the case of
             Vinci services:


             </p><pre class="programlisting">&lt;DEPLOYMENT&gt;network&lt;/DEPLOYMENT&gt;
 &lt;NETWORK_PARAMETERS&gt;
   &lt;VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" /&gt;
 &lt;/NETWORK_PARAMETERS&gt;</pre>
         </div>
       </div>
       <div class="section" title="6.1.4.4.&nbsp;The Collection Reader and CAS Consumer tags"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.installation_descriptor.collection_reader_cas_consumer">6.1.4.4.&nbsp;The Collection Reader and CAS Consumer tags</h4></div></div></div>


         <p>These optional sections of the installation descriptor specify a
           Collection Reader or CAS Consumer that can be used with the packaged analysis
           engine, for example to test it.</p>

       </div>
       <div class="section" title="6.1.4.5.&nbsp;The INSTALLATION section"><div class="titlepage"><div><div><h4 class="title" id="ugr.ref.pear.installation_descriptor.installation">6.1.4.5.&nbsp;The INSTALLATION section</h4></div></div></div>


         <p>The &lt;INSTALLATION&gt; section specifies the external dependencies of
           the component and the operations that should be performed during the PEAR package
           installation.</p>

         <p>The component dependencies are specified in the
           &lt;DELEGATE_COMPONENT&gt; sub-sections, as shown in the installation
           descriptor template above.</p>

         <p>Important: The ID element should be the first element in each
           &lt;DELEGATE_COMPONENT&gt; sub-section.</p>

         <p>The &lt;INSTALLATION&gt; section may specify the following operations:

           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Setting environment variables that are
             required to run the installed component.
             </p>
             <p>This is also how you specify additional classpaths
           for a Java component: by setting an environment variable
             named CLASSPATH.  The <code class="literal">buildComponentClasspath</code> method
           of the PackageBrowser class builds a classpath string from what it finds in
           the CLASSPATH specification here, and adds a classpath entry for every
           Jar in the <code class="literal">lib</code> directory.  Because of this, there is no need
             to specify classpath entries for Jars in the lib directory when using
             the Eclipse plugin PEAR packager or the Maven PEAR packager.</p>

             <div class="blockquote"><blockquote class="blockquote"><p>When specifying the value of the CLASSPATH environment
             variable, use the semicolon ";" as the separator character, regardless of the
             target Operating System conventions.  This delimiter will be replaced with
             the right one for the Operating System during PEAR installation.</p>
             </blockquote></div>

             <p>If your component needs to set the UIMA datapath you must specify the necessary
             datapath setting using an environment variable with the key <code class="literal">uima.datapath</code>.
             When such a key is specified the <code class="literal">getComponentDataPath</code> method of the
             PackageBrowser class will return the specified datapath settings for your component.
             </p>

             <div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>Do not put UIMA Framework Jars into the lib directory of your
             PEAR; doing so will cause system failures due to class loading issues.</p></div>
             <p>Note that you can use <span class="quote">&#8220;<span class="quote">macros</span>&#8221;</span>, like
               $main_root or $component_id$root, in the VAR_VALUE element of the
               &lt;PARAMETERS&gt; sub-section.</p></li><li class="listitem"><p>Finding and replacing string expressions in files.</p>
               <p>Note that you can use the <span class="quote">&#8220;<span class="quote">macros</span>&#8221;</span> in the FILE
               and REPLACE_WITH elements of the &lt;PARAMETERS&gt; sub-section. </p>
               </li></ul></div>
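
         <p>The CLASSPATH separator rule can be pictured with the small sketch below.
           It only shows the idea of writing entries with ";" and converting them to the
           platform separator later. The class <code class="literal">PearClasspath</code>
           and the Jar name <code class="literal">extra.jar</code> are invented for this
           example, and the installer's real implementation may differ.</p>

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of the UIMA API.
class PearClasspath {

    // Converts the ";" separated value written in install.xml into a
    // classpath string that uses the given platform separator.
    static String localize(String declaredClasspath, char platformSeparator) {
        return declaredClasspath.replace(';', platformSeparator);
    }

    public static void main(String[] args) {
        // Value as it might be written in a VAR_VALUE element (example only).
        String declared = "$main_root/bin;$main_root/lib/extra.jar";
        // File.pathSeparatorChar is ':' on Unix-like systems and ';' on Windows.
        System.out.println(localize(declared, File.pathSeparatorChar));
    }
}
```

         <p>Writing the value with ";" keeps the installation descriptor portable; the
           macros in the value are resolved separately, as described above.</p>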

         <p>Important: the ACTION element should always be the first element in each
           &lt;PROCESS&gt; sub-section.</p>

         <p>By default, the PEAR Installer will try to process every file in the desc and
           conf directories of the PEAR package in order to find the <span class="quote">&#8220;<span class="quote">macros</span>&#8221;</span>
           and replace them with actual path expressions. In addition to this, the installer
           will process the files specified in the
           &lt;INSTALLATION&gt; section.</p>

         <p>Important: all XML files which are going to be processed should be created
           using UTF-8 or UTF-16 file encoding. All other text files which are going to be
           processed should be created using the ASCII file encoding.</p>
       </div>
     </div>

     <div class="section" title="6.1.5.&nbsp;Packaging the PEAR structure into one file"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.packaging_into_1_file">6.1.5.&nbsp;Packaging the PEAR structure into one file</h3></div></div></div>


       <p>The last step of the PEAR process is to simply <span class="bold"><strong>
         zip</strong></span> the content of the PEAR root folder (<span class="bold"><strong>not
         including the root folder itself</strong></span>) to a PEAR file with the extension <span class="quote">&#8220;<span class="quote">.pear</span>&#8221;</span>.</p>

       <p>To do this you can either use the PEAR packaging tools that are described in <span class="quote">&#8220;<span class="quote"><a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.pear.packager" class="olink">Chapter&nbsp;9, <i>PEAR Packager User's Guide</i></a></span>&#8221;</span> or you can use the PEAR packaging API that is shown below.</p>

       <p>
       To use the PEAR packaging API you first have to create the necessary information for the PEAR package:
         </p><pre class="programlisting">// define PEAR data
String componentID = "AnnotComponentID";
String mainComponentDesc = "desc/mainComponentDescriptor.xml";
String classpath = "$main_root/bin;";
String datapath = "$main_root/resources;";
String mainComponentRoot = "/home/user/develop/myAnnot";
String targetDir = "/home/user/develop";
Properties annotatorProperties = new Properties();
annotatorProperties.setProperty("sysProperty1", "value1");</pre><p>

   	    To create a complete PEAR package in one step call:
   	    </p><pre class="programlisting">PackageCreator.generatePearPackage(
    componentID, mainComponentDesc, classpath, datapath,
    mainComponentRoot, targetDir, annotatorProperties);</pre><p>
         The created PEAR package has the file name &lt;componentID&gt;.pear and is located in the &lt;targetDir&gt;.
         </p>
         <p>
         To create just the PEAR installation descriptor in the main component root directory call:
         </p><pre class="programlisting">PackageCreator.createInstallDescriptor(componentID, mainComponentDesc,
    classpath, datapath, mainComponentRoot, annotatorProperties);</pre><p>

   	    To package a PEAR file with an existing installation descriptor call:
         </p><pre class="programlisting">PackageCreator.createPearPackage(componentID, mainComponentRoot,
    targetDir);</pre><p>
         The created PEAR package has the file name &lt;componentID&gt;.pear and is located in the &lt;targetDir&gt;.
   	  </p>
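
       <p>
         For readers who want to see what the zipping step amounts to, the following is a minimal sketch that uses only
         <code class="literal">java.util.zip</code>. It assumes the PEAR root folder already contains a complete PEAR
         structure, including the installation descriptor. The names <code class="literal">PearZipSketch</code> and
         <code class="literal">zipPearRoot</code> are hypothetical and not part of the UIMA API.
       </p>
       <pre class="programlisting">import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class PearZipSketch {
    // Zips the content of pearRoot (not pearRoot itself) into targetFile.
    static void zipPearRoot(File pearRoot, File targetFile) throws IOException {
        try (ZipOutputStream zip =
                 new ZipOutputStream(new FileOutputStream(targetFile))) {
            addChildren(pearRoot, "", zip);
        }
    }

    // Adds all files below dir; entry names are relative to the PEAR root.
    private static void addChildren(File dir, String prefix, ZipOutputStream zip)
            throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        for (File child : children) {
            String name = prefix + child.getName();
            if (child.isDirectory()) {
                addChildren(child, name + "/", zip);
            } else {
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(child.toPath(), zip);
                zip.closeEntry();
            }
        }
    }
}</pre>
       <p>
         In this sketch the target file should be located outside the PEAR root folder, otherwise the archive
         would try to include itself. Empty directories are not written to the archive.
       </p>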

     </div>
   </div>
   <div class="section" title="6.2.&nbsp;Installing a PEAR package"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.pear.installing">6.2.&nbsp;Installing a PEAR package</h2></div></div></div>


     <p>The installation of a PEAR package can be done using
       the PEAR installer tool (see <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.pear.installer" class="olink">Chapter&nbsp;11, <i>PEAR Installer User's Guide</i></a>), or by an application using
       the PEAR APIs directly.</p>

     <p>During the PEAR installation, the PEAR file is extracted to the installation directory and the PEAR macros
     in the descriptors are replaced with the corresponding paths. At the end of the installation, the PEAR verification
     is called to check whether the installed PEAR package can be started successfully. The PEAR verification uses the classpath,
     datapath and system property settings of the PEAR package to verify the PEAR content. Necessary Java library
     path settings for native libraries, PATH variable settings or system environment variables cannot be recognized
     automatically; the user must take care of those manually.</p>

     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>By default, PEAR packages are not installed directly into the specified installation directory. Instead, for each PEAR
     a subdirectory named after the PEAR's ID is created, and the PEAR package is installed there. If that PEAR installation
     directory already exists, the old content is automatically deleted before the new content is installed.</p></div>

     <div class="section" title="6.2.1.&nbsp;Installing a PEAR file using the PEAR APIs"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.pear.installing_pear_using_API">6.2.1.&nbsp;Installing a PEAR file using the PEAR APIs</h3></div></div></div>


       <p>The example below shows how to use the PEAR APIs to install a
       PEAR package and access the installed PEAR package data. For more details about the PackageBrowser API,
       please refer to the Javadocs for the org.apache.uima.pear.tools package.

       </p><pre class="programlisting">File installDir = new File("/home/user/uimaApp/installedPears");
 File pearFile = new File("/home/user/uimaApp/testpear.pear");
 boolean doVerification = true;

 try {
   // install PEAR package
   PackageBrowser instPear = PackageInstaller.installPackage(
  	installDir, pearFile, doVerification);

   // retrieve installed PEAR data
   // PEAR package classpath
   String classpath = instPear.buildComponentClassPath();
   // PEAR package datapath
   String datapath = instPear.getComponentDataPath();
   // PEAR package main component descriptor
   String mainComponentDescriptor = instPear
      	.getInstallationDescriptor().getMainComponentDesc();
   // PEAR package component ID
   String mainComponentID = instPear
      	.getInstallationDescriptor().getMainComponentId();
   // PEAR package pear descriptor
   String pearDescPath = instPear.getComponentPearDescPath();

   // print out settings
   System.out.println("PEAR package class path: " + classpath);
   System.out.println("PEAR package datapath: " + datapath);
   System.out.println("PEAR package mainComponentDescriptor: "
    	+ mainComponentDescriptor);
   System.out.println("PEAR package mainComponentID: "
    	+ mainComponentID);
   System.out.println("PEAR package specifier path: " + pearDescPath);

 } catch (PackageInstallerException ex) {
   // PEAR installation failed
   ex.printStackTrace();
   System.out.println("PEAR installation failed");
 } catch (IOException ex) {
   // reading the installed PEAR settings failed
   ex.printStackTrace();
   System.out.println("Error retrieving installed PEAR settings");
 }</pre>

 	  <p>
 	    The example below shows how to run a PEAR package after it has been installed using the PEAR API. It uses the
 	    PEAR specifier that was generated automatically during the PEAR installation.
 	    For more details about the APIs, please refer to the Javadocs.


       </p><pre class="programlisting">File installDir = new File("/home/user/uimaApp/installedPears");
 File pearFile = new File("/home/user/uimaApp/testpear.pear");
 boolean doVerification = true;

 try {

   // Install PEAR package
   PackageBrowser instPear = PackageInstaller.installPackage(
   	installDir, pearFile, doVerification);

   // Create a default resource manager
   ResourceManager rsrcMgr = UIMAFramework.newDefaultResourceManager();

   // Create analysis engine from the installed PEAR package using
   // the created PEAR specifier
   XMLInputSource in =
         new XMLInputSource(instPear.getComponentPearDescPath());
   ResourceSpecifier specifier =
         UIMAFramework.getXMLParser().parseResourceSpecifier(in);
   AnalysisEngine ae =
         UIMAFramework.produceAnalysisEngine(specifier, rsrcMgr, null);

   // Create a CAS with a sample document text
   CAS cas = ae.newCAS();
   cas.setDocumentText("Sample text to process");
   cas.setDocumentLanguage("en");

   // Process the sample document
   ae.process(cas);
 } catch (Exception ex) {
   ex.printStackTrace();
 }</pre>

     </div>

   </div>

     <div class="section" title="6.3.&nbsp;PEAR package descriptor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.pear.specifier">6.3.&nbsp;PEAR package descriptor</h2></div></div></div>


     <p>
        To run an installed PEAR package directly in the UIMA framework the <code class="literal">pearSpecifier</code>
        XML descriptor can be used. Typically, such a specifier is generated automatically during the PEAR installation
        and contains all the necessary information to run the installed PEAR package. Settings for system environment
        variables, system PATH settings or Java library path settings cannot be recognized
        automatically and must be set manually when the JVM is started.
     </p>

     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The PEAR may contain specifications for "environment variables" and their settings.
       When such a PEAR is run
     directly in the UIMA framework, those settings (except for Classpath and Data Path) are converted
     to Java System properties, and set to the specified values.  Java cannot set true environment variables;
     if such a setting is needed, the application must arrange for it before invoking Java.</p>

     <p>The Classpath and Data Path settings are used by UIMA to configure a special Resource Manager
     that is used when code from this PEAR is being run.</p></div>

           <p>
        The generated PEAR descriptor
        is located in the component root directory of the installed PEAR package and has a filename like
        &lt;componentID&gt;_pear.xml.
     </p>
     <p>
        The PEAR package descriptor looks like:
     </p>
     <pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
 &lt;pearSpecifier xmlns="http://uima.apache.org/resourceSpecifier"&gt;
    &lt;pearPath&gt;/home/user/uimaApp/installedPears/testpear&lt;/pearPath&gt;
    &lt;parameters&gt;   &lt;!-- optional --&gt;
       &lt;parameter&gt; &lt;!-- any number, repeated --&gt;
         &lt;name&gt;name-of-the-parameter&lt;/name&gt;
         &lt;value&gt;string-value&lt;/value&gt;
       &lt;/parameter&gt;
    &lt;/parameters&gt;
 &lt;/pearSpecifier&gt;</pre>
     <p>
        The <code class="literal">pearPath</code> setting in the descriptor must point to the component root directory
        of the installed PEAR package.
     </p>
     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>

       <p>
          It is not possible to share resources between PEAR Analysis Engines that are instantiated using the PEAR
          descriptor. The PEAR runtime created for each PEAR descriptor has its own specific ResourceManager
          (unless exactly the same Classpath and Data Path are being used).
       </p>
     </div>

     <p>The optional <code class="literal">parameters</code> section, if used, specifies parameter values,
       which are used to customize / override parameter values in the PEAR descriptor.
       External Settings overrides continue to work for PEAR descriptors, and have precedence, if specified.
     </p>

   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;7.&nbsp;XMI CAS Serialization Reference" id="ugr.ref.xmi"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;7.&nbsp;XMI CAS Serialization Reference</h2></div></div></div>


   <p>This is the specification for the mapping of the UIMA CAS into the XMI (XML Metadata
     Interchange<sup>[<a name="d5e2511" href="#ftn.d5e2511" class="footnote">7</a>]</sup>) format. XMI is an OMG standard for expressing object graphs in
     XML. The UIMA SDK provides support for XMI through the classes
     <code class="literal">org.apache.uima.cas.impl.XmiCasSerializer</code> and
     <code class="literal">org.apache.uima.cas.impl.XmiCasDeserializer</code>.</p>

   <div class="section" title="7.1.&nbsp;XMI Tag"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.xmi_tag">7.1.&nbsp;XMI Tag</h2></div></div></div>


     <p>The outermost tag is &lt;XMI&gt; and must include a version number and XML
       namespace attribute:


       </p><pre class="programlisting">&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"&gt;
   &lt;!-- CAS Contents here --&gt;
 &lt;/xmi:XMI&gt;</pre>

     <p>XML namespaces<sup>[<a name="d5e2521" href="#ftn.d5e2521" class="footnote">8</a>]</sup> are used throughout. The <span class="quote">&#8220;<span class="quote">xmi</span>&#8221;</span> namespace prefix is used to
       identify elements and attributes that are defined by the XMI specification. The XMI
       document will also define one namespace prefix for each CAS namespace, as described in
       the next section.</p>

   </div>

   <div class="section" title="7.2.&nbsp;Feature Structures"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.feature_structures">7.2.&nbsp;Feature Structures</h2></div></div></div>


     <p>UIMA Feature Structures are mapped to XML elements. The name of the element is
       formed from the CAS type name, making use of XML namespaces as follows.</p>

     <p>The CAS type namespace is converted to an XML namespace URI by the following rule:
       replace all dots with slashes, prepend http:///, and append .ecore.</p>

     <p>This mapping was chosen because it is the default mapping used by the Eclipse
       Modeling Framework (EMF)<sup>[<a name="d5e2529" href="#ftn.d5e2529" class="footnote">9</a>]</sup> to create namespace URIs from Java package names. The use of
       the http scheme is a common convention, and does not imply any HTTP communication. The
       .ecore suffix is due to the fact that the recommended type system definition for a
       namespace is an ECore model, see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.xmi_emf" class="olink">Chapter&nbsp;8, <i>XMI and EMF Interoperability</i></a>.</p>

     <p>Consider the CAS type name <span class="quote">&#8220;<span class="quote">org.myproj.Foo</span>&#8221;</span>. The CAS namespace
       (<span class="quote">&#8220;<span class="quote">org.myproj</span>&#8221;</span>) is converted to the XML namespace URI
       http:///org/myproj.ecore.</p>

     <p>The XML element name is then formed by concatenating the XML namespace prefix
       (which is an arbitrary token, but typically we use the last component of the CAS
       namespace) with the type name (excluding the namespace).</p>

     <p>So the example <span class="quote">&#8220;<span class="quote">org.myproj.Foo</span>&#8221;</span> FeatureStructure is written to
       XMI as:


       </p><pre class="programlisting">&lt;xmi:XMI
     xmi:version="2.0"
     xmlns:xmi="http://www.omg.org/XMI"
     xmlns:myproj="http:///org/myproj.ecore"&gt;
   ...
   &lt;myproj:Foo xmi:id="1"/&gt;
   ...
 &lt;/xmi:XMI&gt;</pre>

     <p>The xmi:id attribute is only required if this object will be referred to from
       elsewhere in the XMI document. If provided, the xmi:id must be unique among all
       feature structures in the document.</p>

     <p>All namespace prefixes (e.g. <span class="quote">&#8220;<span class="quote">myproj</span>&#8221;</span>) in this example must be
       bound to URIs using the <span class="quote">&#8220;<span class="quote">xmlns...</span>&#8221;</span> attribute, as defined by the XML
       namespaces specification.</p>
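
     <p>The naming rules in this section can be summarized in a short sketch. It is an illustration of the rules as
       stated above, not the code used by <code class="literal">XmiCasSerializer</code>. The class and method names are
       hypothetical, and type names without a namespace are rejected because this section does not describe them.</p>
     <pre class="programlisting">class XmiNameSketch {
    // Returns the CAS namespace, e.g. "org.myproj" for "org.myproj.Foo".
    static String casNamespace(String casTypeName) {
        int lastDot = casTypeName.lastIndexOf('.');
        if (lastDot == -1) {
            throw new IllegalArgumentException(
                "Type name has no namespace: " + casTypeName);
        }
        return casTypeName.substring(0, lastDot);
    }

    // Dots become slashes, "http:///" is prepended, ".ecore" is appended.
    static String namespaceUri(String casTypeName) {
        return "http:///" + casNamespace(casTypeName).replace('.', '/') + ".ecore";
    }

    // Prefix (here: the last namespace component), ":", short type name.
    static String elementName(String casTypeName) {
        String ns = casNamespace(casTypeName);
        String prefix = ns.substring(ns.lastIndexOf('.') + 1);
        String shortName = casTypeName.substring(ns.length() + 1);
        return prefix + ":" + shortName;
    }

    public static void main(String[] args) {
        System.out.println(namespaceUri("org.myproj.Foo")); // http:///org/myproj.ecore
        System.out.println(elementName("org.myproj.Foo"));  // myproj:Foo
    }
}</pre>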
   </div>

   <div class="section" title="7.3.&nbsp;Primitive Features"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.primitive_features">7.3.&nbsp;Primitive Features</h2></div></div></div>


     <p>CAS features of primitive types (String, Boolean, Byte, Short, Integer, Long,
       Float, or Double) can be mapped either to XML attributes or XML elements. For example, a
       CAS FeatureStructure of type org.myproj.Foo, with features:


       </p><pre class="programlisting">begin   = 14
 end     = 19
 myFeature = "bar"</pre><p>
       could be mapped to:


       </p><pre class="programlisting">&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
     xmlns:myproj="http:///org/myproj.ecore"&gt;
   ...
   &lt;myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/&gt;
   ...
 &lt;/xmi:XMI&gt;</pre><p>
       or equivalently:


       </p><pre class="programlisting">&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
     xmlns:myproj="http:///org/myproj.ecore"&gt;
   ...
   &lt;myproj:Foo xmi:id="1"&gt;
     &lt;begin&gt;14&lt;/begin&gt;
     &lt;end&gt;19&lt;/end&gt;
     &lt;myFeature&gt;bar&lt;/myFeature&gt;
   &lt;/myproj:Foo&gt;
   ...
 &lt;/xmi:XMI&gt;</pre>

     <p>The attribute serialization is preferred for compactness, but either
       representation is allowable. Mixing the two styles is allowed; some features can be
       represented as attributes and others as elements.</p>

   </div>

   <div class="section" title="7.4.&nbsp;Reference Features"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.reference_features">7.4.&nbsp;Reference Features</h2></div></div></div>


     <p>CAS features that are references to other feature structures (excluding arrays
       and lists, which are handled separately) are serialized as ID references.</p>

     <p>If we add to the previous CAS example a feature structure of type org.myproj.Baz,
       with feature <span class="quote">&#8220;<span class="quote">myFoo</span>&#8221;</span> that is a reference to the Foo object, the
       serialization would be:


       </p><pre class="programlisting">&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
     xmlns:myproj="http:///org/myproj.ecore"&gt;
   ...
   &lt;myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/&gt;
   &lt;myproj:Baz xmi:id="2" myFoo="1"/&gt;
   ...
 &lt;/xmi:XMI&gt;</pre>

     <p>As with primitive-valued features, it is permitted to use an element rather than an
       attribute. However, the syntax is slightly different:</p>


     <pre class="programlisting">&lt;myproj:Baz xmi:id="2"&gt;
    &lt;myFoo href="#1"/&gt;
 &lt;/myproj:Baz&gt;</pre>

     <p>Note that in the attribute representation, a reference feature is
       indistinguishable from an integer-valued feature, so the meaning cannot be
       determined without prior knowledge of the type system. The element representation is
       unambiguous.</p>

   </div>

   <div class="section" title="7.5.&nbsp;Array and List Features"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.array_and_list_features">7.5.&nbsp;Array and List Features</h2></div></div></div>


     <p>For a CAS feature whose range type is one of the CAS array or list types, the XMI serialization depends on the
       setting of the <span class="quote">&#8220;<span class="quote">multipleReferencesAllowed</span>&#8221;</span> attribute for that feature in the UIMA Type System
       Description (see <a href="references.html#ugr.ref.xml.component_descriptor.type_system.features" class="olink">Section&nbsp;2.3.3, &#8220;Features&#8221;</a>).</p>

     <p>An array or list with multipleReferencesAllowed = false (the default) is serialized as a
       <span class="quote">&#8220;<span class="quote">multi-valued</span>&#8221;</span> property in XMI. An array or list with multipleReferencesAllowed = true is
       serialized as a first-class object. Details are described below.</p>

     <div class="section" title="7.5.1.&nbsp;Arrays and Lists as Multi-Valued Properties"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xmi.array_and_list_features.as_multi_valued_properties">7.5.1.&nbsp;Arrays and Lists as Multi-Valued Properties</h3></div></div></div>


       <p>A multi-valued property is the most natural XMI representation in most cases. Consider the
         example where the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the
         integer array {2,4,6}. This can be mapped to:

         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3" myIntArray="2 4 6"/&gt;</pre><p> or
         equivalently:


         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3"&gt;
   &lt;myIntArray&gt;2&lt;/myIntArray&gt;
   &lt;myIntArray&gt;4&lt;/myIntArray&gt;
   &lt;myIntArray&gt;6&lt;/myIntArray&gt;
 &lt;/myproj:Baz&gt;</pre><p>
         </p>

       <p>Note that String arrays whose elements contain embedded spaces MUST use the latter mapping.</p>

       <p>FSArray or FSList features are serialized in a similar way. For example an FSArray feature that contains
         references to the elements with xmi:id's <span class="quote">&#8220;<span class="quote">13</span>&#8221;</span> and <span class="quote">&#8220;<span class="quote">42</span>&#8221;</span> could be
         serialized as:

         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3" myFsArray="13 42"/&gt;</pre><p> or:


         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3"&gt;
   &lt;myFsArray href="#13"/&gt;
   &lt;myFsArray href="#42"/&gt;
 &lt;/myproj:Baz&gt;</pre><p>
         </p>
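
       <p>The rule that String arrays with embedded spaces must use child elements can be expressed as a simple check.
         The sketch below is hypothetical (class and method names are invented here) and does not claim to mirror the
         serializer's implementation. It additionally treats null and empty strings as unsuitable for the attribute
         form, because they would vanish in a space-separated list; that is an assumption of this sketch, not a
         statement from the text above. XML escaping of the values is not handled.</p>
       <pre class="programlisting">class MultiValuedFormSketch {
    // True if every value can be written in the space-separated attribute form.
    static boolean fitsAttributeForm(String[] values) {
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                return false; // assumption of this sketch
            }
            for (char c : value.toCharArray()) {
                if (Character.isWhitespace(c)) {
                    return false; // embedded whitespace: use child elements
                }
            }
        }
        return true;
    }

    // Builds the attribute value, e.g. "2 4 6" for {"2", "4", "6"}.
    static String toAttributeValue(String[] values) {
        if (!fitsAttributeForm(values)) {
            throw new IllegalArgumentException("Use the child element form");
        }
        return String.join(" ", values);
    }
}</pre>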
     </div>

     <div class="section" title="7.5.2.&nbsp;Arrays and Lists as First-Class Objects"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xmi.array_and_list_features.as_1st_class_objects">7.5.2.&nbsp;Arrays and Lists as First-Class Objects</h3></div></div></div>


       <p>The multi-valued-property representation described in the previous section does not allow multiple
         references to an array or list object. Therefore, it cannot be used for features that are defined to allow
         multiple references (i.e. features for which multipleReferencesAllowed = true in the Type System
         Description).</p>

       <p>When multipleReferencesAllowed is set to true, array and list features are serialized as references,
         and the array or list objects are serialized as separate objects in the XMI. Consider again the example where
         the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the integer array
         {2,4,6}. If myIntArray is defined with multipleReferencesAllowed=true, the serialization will be as
         follows:

         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3" myIntArray="4"/&gt;</pre><p> or:


         </p><pre class="programlisting">&lt;myproj:Baz xmi:id="3"&gt;
   &lt;myIntArray href="#4"/&gt;
 &lt;/myproj:Baz&gt;</pre><p>
         with the array object serialized as

         </p><pre class="programlisting">&lt;cas:IntegerArray xmi:id="4" elements="2 4 6"/&gt;</pre><p> or:


         </p><pre class="programlisting">&lt;cas:IntegerArray xmi:id="4"&gt;
   &lt;elements&gt;2&lt;/elements&gt;
   &lt;elements&gt;4&lt;/elements&gt;
   &lt;elements&gt;6&lt;/elements&gt;
 &lt;/cas:IntegerArray&gt;</pre>

       <p>Note that in this case, the XML element name is formed from the CAS type name (e.g.
         <span class="quote">&#8220;<span class="quote"><code class="literal">uima.cas.IntegerArray</code></span>&#8221;</span>) in the same way as for other
         FeatureStructures. The elements of the array are serialized either as a space-separated attribute named
         <span class="quote">&#8220;<span class="quote">elements</span>&#8221;</span> or as a series of child elements named <span class="quote">&#8220;<span class="quote">elements</span>&#8221;</span>.</p>

       <p>List nodes are just standard FeatureStructures with <span class="quote">&#8220;<span class="quote">head</span>&#8221;</span> and <span class="quote">&#8220;<span class="quote">tail</span>&#8221;</span>
         features, and are serialized using the normal FeatureStructure serialization. For example, an
         IntegerList with the values 2, 4, and 6 would be serialized as the four objects:


         </p><pre class="programlisting">&lt;cas:NonEmptyIntegerList xmi:id="10" head="2" tail="11"/&gt;
 &lt;cas:NonEmptyIntegerList xmi:id="11" head="4" tail="12"/&gt;
 &lt;cas:NonEmptyIntegerList xmi:id="12" head="6" tail="13"/&gt;
 &lt;cas:EmptyIntegerList xmi:id="13"/&gt;</pre>

       <p>This representation of arrays and lists allows multiple references to an array or list. It also allows a feature
         with range type TOP to refer to an array or list. However, it is a very unnatural representation in XMI and does
         not support interoperability with other XMI-based systems, so we instead recommend using the
         multi-valued-property representation described in the previous section whenever it is possible.</p>

       <p>When a feature is specified in the descriptor without a multipleReferencesAllowed attribute, or with the
       attribute specified as <code class="code">false</code>, but the framework discovers multiple references during
       serialization, it writes a message to the log saying so (look for the phrase
       "serialized in duplicate").  The serialization will continue, but the multiply-referenced items will
       be serialized in duplicate.</p>
     </div>

     <div class="section" title="7.5.3.&nbsp;Null Array/List Elements"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.xmi.null_array_list_elements">7.5.3.&nbsp;Null Array/List Elements</h3></div></div></div>


       <p>In UIMA, an element of an FSArray or FSList may be null. In XMI, multi-valued properties do not permit null
         values. As a workaround for this, we use a dummy instance of the special type cas:NULL, which has xmi:id 0.
         In the following example, the <span class="quote">&#8220;<span class="quote">myFsArray</span>&#8221;</span> feature refers to an FSArray whose
         second element is null:


         </p><pre class="programlisting">&lt;cas:NULL xmi:id="0"/&gt;
 &lt;myproj:Baz xmi:id="3"&gt;
   &lt;myFsArray href="#13"/&gt;
   &lt;myFsArray href="#0"/&gt;
   &lt;myFsArray href="#42"/&gt;
 &lt;/myproj:Baz&gt;</pre>

     </div>

   </div>

   <div class="section" title="7.6.&nbsp;Subjects of Analysis (Sofas) and Views"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.sofas_views">7.6.&nbsp;Subjects of Analysis (Sofas) and Views</h2></div></div></div>


     <p>A UIMA CAS contains one or more subjects of analysis (Sofas). These are serialized no
       differently from any other feature structure. For example:


       </p><pre class="programlisting">&lt;?xml version="1.0"?&gt;
 &lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
     xmlns:cas="http:///uima/cas.ecore"&gt;
   &lt;cas:Sofa xmi:id="1" sofaNum="1"
       text="the quick brown fox jumps over the lazy dog."/&gt;
 &lt;/xmi:XMI&gt;</pre>

     <p>Each Sofa defines a separate View. Feature Structures in the CAS can be members of
       one or more views. (A Feature Structure that is a member of a view is indexed in its
       IndexRepository, but that is an implementation detail.)</p>

     <p>In the XMI serialization, views are represented as first-class objects. Each
       View has an (optional) <span class="quote">&#8220;<span class="quote">sofa</span>&#8221;</span> feature, which references a sofa, and
       a multi-valued reference to the members of the View. For example:</p>


     <pre class="programlisting">&lt;cas:View sofa="1" members="3 7 21 39 61"/&gt;</pre>

     <p>Here the integers 3, 7, 21, 39, and 61 refer to the xmi:id fields of the objects that
       are members of this view.</p>
   </div>

   <div class="section" title="7.7.&nbsp;Linking an XMI Document to its Ecore Type System"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.linking_to_ecore_type_system">7.7.&nbsp;Linking an XMI Document to its Ecore Type System</h2></div></div></div>


     <p>If the CAS Type System has been saved to an Ecore file (as described in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.xmi_emf" class="olink">Chapter&nbsp;8, <i>XMI and EMF Interoperability</i></a>), it is possible to store a
       link from an XMI document to that Ecore type system. This is done using an xsi:schemaLocation attribute
       on the root XMI element.</p>

     <p>The xsi:schemaLocation attribute is a space-separated list that represents a
       mapping from namespace URI (e.g. http:///org/myproj.ecore) to the physical URI of the
       .ecore file containing the type system for that namespace. For example:


       </p><pre class="programlisting">xsi:schemaLocation=
   "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore"</pre><p>
       would indicate that the definition for the org.myproj CAS types is contained in the file
       <code class="literal">c:/typesystems/myproj.ecore</code>. You can specify a different
       mapping for each of your CAS namespaces, using a space separated list. For details see
       Budinsky et al. <span class="emphasis"><em>Eclipse Modeling Framework</em></span>.</p>
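
     <p>To make the pair structure of the attribute value concrete, here is a minimal sketch that splits an
       xsi:schemaLocation value into namespace URI and location pairs. The class name
       <code class="literal">SchemaLocationSketch</code> is hypothetical; the sketch only illustrates the format
       described above and is not how UIMA or EMF read the attribute.</p>
     <pre class="programlisting">import java.util.Properties;

class SchemaLocationSketch {
    // Splits an xsi:schemaLocation value into (namespace URI, location) pairs.
    static Properties parse(String schemaLocation) {
        String[] tokens = schemaLocation.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected pairs of namespace URI and location");
        }
        Properties mapping = new Properties();
        for (int i = 0; i != tokens.length; i += 2) {
            // tokens[i] is the namespace URI, tokens[i + 1] its .ecore location
            mapping.setProperty(tokens[i], tokens[i + 1]);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Properties mapping = parse(
            "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore");
        System.out.println(mapping.getProperty("http:///org/myproj.ecore"));
    }
}</pre>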
   </div>

   <div class="section" title="7.8.&nbsp;Delta CAS XMI Format"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.xmi.delta">7.8.&nbsp;Delta CAS XMI Format</h2></div></div></div>


    <p>
    The Delta CAS XMI serialization format is designed primarily to reduce the serialization overhead when calling annotators
    configured as services. Only Feature Structures and Views that are new or modified by the service
    are serialized and returned by the service.
    </p>
    <p>
    The classes <code class="literal">org.apache.uima.cas.impl.XmiCasSerializer</code> and
     <code class="literal">org.apache.uima.cas.impl.XmiCasDeserializer</code> support serialization of only the modifications to the CAS.
     A caller is expected to set a marker to indicate the point from which changes to the CAS are to be tracked.
    </p>
    <p>
    A Delta CAS XMI document contains only the Feature Structures and Views that have been added or modified.
    The new and modified Feature Structures are represented in exactly the same format as in a complete CAS serialization.
    The <code class="literal">cas:View</code> element has been extended with three additional attributes to represent modifications to
    View membership. These new attributes are <code class="literal">added_members</code>, <code class="literal">deleted_members</code> and
    <code class="literal">reindexed_members</code>. For example:
    </p>
     <pre class="programlisting">&lt;cas:View sofa="1" added_members="63 77"
           deleted_members="7 61" reindexed_members="39" /&gt;</pre>
     <p>
     Here the integers 63 and 77 are the xmi:id values of objects that have been newly added to this View,
     7 and 61 are the xmi:id values of objects that have been removed from this View, and 39 is the xmi:id of an object to be reindexed in this View.
     </p>
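
     <p>
     The effect of a delta on View membership can be pictured as simple set arithmetic: the new membership is the
     old membership plus the added members minus the deleted members. The sketch below is a conceptual illustration
     only and says nothing about how the UIMA deserializer is implemented. The class name
     <code class="literal">DeltaViewSketch</code> is hypothetical, reindexed members are assumed to leave membership
     unchanged, and the original order of the members is not preserved.
     </p>
     <pre class="programlisting">import java.util.BitSet;

class DeltaViewSketch {
    // Parses a space-separated list of xmi:id values such as "3 7 21".
    static BitSet parseIds(String ids) {
        BitSet result = new BitSet();
        for (String token : ids.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.set(Integer.parseInt(token));
            }
        }
        return result;
    }

    // New membership = members + added members - deleted members.
    static BitSet applyDelta(String members, String added, String deleted) {
        BitSet result = parseIds(members);
        result.or(parseIds(added));
        result.andNot(parseIds(deleted));
        return result;
    }

    public static void main(String[] args) {
        // members of the full View, then the delta from the example above
        System.out.println(applyDelta("3 7 21 39 61", "63 77", "7 61"));
        // prints {3, 21, 39, 63, 77}
    }
}</pre>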
   </div>
 <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e2511" href="#d5e2511" class="para">7</a>] </sup> For details on XMI see Grose et al. <span class="emphasis"><em>Mastering
     XMI. Java Programming with XMI, XML, and UML. </em></span>John Wiley &amp; Sons, Inc.
     2002.</p></div><div class="footnote"><p><sup>[<a id="ftn.d5e2521" href="#d5e2521" class="para">8</a>] </sup>http://www.w3.org/TR/xml-names11/</p>
       </div><div class="footnote"><p><sup>[<a id="ftn.d5e2529" href="#d5e2529" class="para">9</a>] </sup> For details on EMF and Ecore see Budinsky et
       al. <span class="emphasis"><em>Eclipse Modeling Framework 2.0</em></span>. Addison-Wesley.
       2006.</p></div></div></div>
   <div class="chapter" title="Chapter&nbsp;8.&nbsp;Compressed Binary CASes" id="ugr.ref.compress"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;8.&nbsp;Compressed Binary CASes</h2></div></div></div>


   <div class="section" title="8.1.&nbsp;Binary CAS Compression overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.compress.overview">8.1.&nbsp;Binary CAS Compression overview</h2></div></div></div>


     <p>UIMA has a proprietary binary serialization format, used internally
     for several things, including communicating with embedded C++ annotators using
     UIMA-CPP.  This binary format is also selectable for use with UIMA-AS.  Its use
     requires that the source and target systems implement the identical type system
     (because the type system is not sent, and internal coding is used within the
     format that is keyed to the particular type system).</p>

     <p>Starting with version 2.4.1, two additional forms of binary serialization are added.
     Both compress the data being serialized; typical size ratios can approach 50 : 1,
     depending on the exact contents of the CAS, when compared with normal binary serialization.
     </p>

     <p>The two forms are called 4 and 6, for historical/internal reasons.  The serialized form
     of each is fixed, but not currently standardized, and the form being used is encoded in the header so
     that the appropriate deserializer can be chosen.  Both forms include support for Delta CAS
     being returned from a service.</p>

     <p>Form 6 builds on form 4, and adds: serializing only those feature structures which
     are reachable (that is, in some index, or referenced by other reachable feature structures),
     and type filtering.</p>

     <p>Type filtering takes a source type system and a target type system, and for serializing
     (source to target), sends the binary representation of reachable feature structures in the target's type system.
     For deserializing (reading a target into a source), the filtering takes the specification being read
     as being encoded using the target's type system, and translates that into the source's type system.
     In this process, types which exist in the source but not the target are skipped (when serializing);
     types which exist in the target, but not the source are skipped when deserializing.

     Features that exist in some
     source type but not in the version of the same type in the target are skipped (when serializing)
     or set to default values (i.e., 0 or null) when being deserialized.</p>
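     <p>
     The feature-level rule can be pictured with a small sketch. It is only an illustration of the rule as described
     above, not the UIMA implementation: the class, the method names, and the feature names used as sample input
     are all hypothetical, and real type systems carry far more information than a list of feature names.
     </p>

```java
import java.util.Arrays;

/** Hypothetical illustration of the feature filtering rule; not the UIMA implementation. */
final class FeatureFilterSketch {

    /** Features of a source type that are written when serializing: those the target type also has. */
    static String[] serialized(String[] sourceFeatures, String[] targetFeatures) {
        return Arrays.stream(sourceFeatures)
                     .filter(f -> Arrays.asList(targetFeatures).contains(f))
                     .toArray(String[]::new);
    }

    /** Source features missing from the target: skipped on write, defaulted (0 or null) on read. */
    static String[] skippedOrDefaulted(String[] sourceFeatures, String[] targetFeatures) {
        return Arrays.stream(sourceFeatures)
                     .filter(f -> !Arrays.asList(targetFeatures).contains(f))
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] source = {"begin", "end", "confidence"}; // made-up feature names
        String[] target = {"begin", "end"};
        System.out.println(Arrays.toString(serialized(source, target)));
        System.out.println(Arrays.toString(skippedOrDefaulted(source, target)));
    }
}
```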

     <p>There are two main use cases for using compressed forms.  The first one is for communicating with
     UIMA-AS remote services (not yet implemented).

     </p>

     <p>The second use case is for saving compressed representations of CASes to other media, such as disk files,
     where they can be deserialized later for use in other UIMA applications.</p>

   </div>


   <div class="section" title="8.2.&nbsp;Using Compressed Binary CASes"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.compress.usage">8.2.&nbsp;Using Compressed Binary CASes</h2></div></div></div>


     <p>The main user interface for serializing a CAS using compression is to use one of the
     static methods named serializeWithCompression in Serialization.  If you pass a Type System argument representing
     a target type system, then form 6 compression is used; otherwise form 4 is used.
     To get the benefit of only serializing reachable Feature Structure instances, without type mapping
     (which is only in form 6), pass a type system argument which is null.
     </p>

     <p>To deserialize into a CAS without type mapping, use one of the deserializeCAS methods in Serialization.
     There are multiple forms of this method, depending on the arguments.  The forms which take extra arguments,
     including a ReuseInfo, may only be used with serialized forms created with form 6 compression.
     The plain form of deserializeCAS works with all forms of binary serialization, compressed and non-compressed, by examining a common
     header which identifies the form of binary serialization used; however, form 6 requires
     additional arguments, so the plain form will fail on it, and you need to use the other deserializeCAS form.</p>

     <p>Form 6 has an additional object, ReuseInfo, which holds information which
     is required for subsequent Delta CAS format serializations / deserializations.
     It can speed up subsequent serializations of the same
     CAS (before it is further updated), for instance, if an application is sending the CAS to multiple services in parallel.
     The serializeWithCompression method returns this object when form 6 is being used.

     </p>
     <p>In addition, the CasIOUtils class offers static load and save methods, which can be used with the SerialFormat
     enum to serialize and deserialize to URLs or streams; see the Javadocs for details.</p>
   </div>

   <div class="section" title="8.3.&nbsp;Simple Delta CAS serialization"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.compress.simple-deltas">8.3.&nbsp;Simple Delta CAS serialization</h2></div></div></div>

     <p>Use form 4 for this.  Form 6 also supports delta CAS, but it requires extra bookkeeping on both sides.
     On the receiver side, an instance of ReuseInfo must be saved when deserializing a CAS which will
     later be delta serialized back to the sender,
     and that same instance must then be used for the delta serialization.
     On the sender side, the original serialization
     must also save an instance of ReuseInfo and use it when deserializing the delta CAS.
     </p>

     <p>Form 4 may not be as efficient as form 6 in that it does not filter the CASes
     either by type system or by only sending reachable Feature Structure
     instances.  But, it doesn't require a ReuseInfo object when doing delta serialization or
     deserialization,
     so it may be more convenient to use when saving
     delta CASes to files (as opposed to the other use case of
     a remote service returning delta CASes to a remote client).</p>
   </div>

   <div class="section" title="8.4.&nbsp;Use Case cookbook"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.compress.use-cases">8.4.&nbsp;Use Case cookbook</h2></div></div></div>

     <p>
     Here are some use cases, together with a suggested approach and example of how to use the APIs.
     </p>

     <p>
       <span class="strong"><strong>Save a CAS to an output stream, using form 4 (no type system filtering):</strong></span>
     </p>
           <pre class="programlisting">// set up an output stream.  In this example, an internal byte array.
 ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
 Serialization.serializeWithCompression(casSrc, baos);
   // or
 CasIOUtils.save(casSrc, baos, SerialFormat.COMPRESSED);
 </pre>

       <p><span class="strong"><strong>Deserialize from a stream into an existing CAS:</strong></span></p>
       <pre class="programlisting">// assume the stream is a byte array input stream
 // For example, one could be created
 //   from the above ByteArrayOutputStream as follows:
 ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
 // Deserialize into a cas having the identical type system
 Serialization.deserializeCAS(cas, bais);
   // or
 CasIOUtils.load(bais, cas);
 </pre>

 <p>Note that the <code class="code">deserializeCAS(cas, inputStream)</code> method is a general way to
 deserialize into a CAS from an inputStream for all forms of binary serialized data
 (with exceptions as noted above).
 The method reads a common header, and based on what it finds, selects the appropriate
 deserialization routine.</p>

 <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The <code class="code">deserializeCAS</code> method with just 2 arguments doesn't support type filtering or
 delta CAS deserialization for form 6. To do those, see the example below.
 </p>
 </div>

 <p><span class="strong"><strong>Serialize to an output stream, filtering out some types and/or features:</strong></span>
 </p>
 <p>
 To do this, an additional input specifying the Type System of the target must
 be supplied; this Type System should be a subset of the source CAS's.
 The output argument (<code class="code">baos</code> in the example) may be an OutputStream, a DataOutputStream, or a File.
 </p>

 <pre class="programlisting">// set up an output stream.  In this example, an internal byte array.
 ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
 Serialization.serializeWithCompression(cas, baos, tgtTypeSystem);
 </pre>

 <p><span class="strong"><strong>Deserialize with type filtering:</strong></span></p>
 <p>There are 2 type systems involved here: one is the receiving CAS, and the other is the type system
 used to decode the serialized form.  This may optionally be stored with the serialized form:</p>
 <pre class="programlisting">CasIOUtils.save(cas, out, SerialFormat.COMPRESSED_FILTERED_TS);
 </pre>
 <p>and/or it can be supplied at load time.  Here are two examples of supplying this at load time:</p>
 <pre class="programlisting">CasIOUtils.load(input, cas, typeSystem);
 CasIOUtils.load(input, type_system_serialized_form_input, cas);
 </pre>

 <p>The reuseInfo must be null unless you are
 deserializing a delta CAS, in which case it must be the reuse info captured when
 the original CAS was serialized out.
 If the target type system is identical to the one in the CAS, you may pass null for it.
 </p>
 <pre class="programlisting">ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
 Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
 </pre>
 </div>


 </div>
   <div class="chapter" title="Chapter&nbsp;9.&nbsp;JSON Serialization of CASs and UIMA Description objects" id="ugr.ref.json"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;9.&nbsp;JSON Serialization of CASs and UIMA Description objects</h2></div></div></div>


   <div class="section" title="9.1.&nbsp;JSON serialization support overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.json.overview">9.1.&nbsp;JSON serialization support overview</h2></div></div></div>


     <p>Applications are moving to the "cloud", and new applications are being rapidly developed that are hooking
     things up using various mashup techniques.  New standards and conventions are emerging to support this kind
     of application development, such as REST services.
     JSON is now a popular way for services to communicate;
     its popularity is rising (in 2014) while XML is falling.</p>

     <p>Starting with version 2.7.0, JSON style serialization (but not (yet) deserialization)
     for CASs and UIMA descriptions is supported.
     The exact format of the serialization is configurable in several aspects.
     The implementation is built on top of the Jackson JSON generation library.
     </p>

     <p>The next section discusses serialization for CASes, while a later section describes serialization
     of description objects, such as type system descriptions.</p>
   </div>

   <div class="section" title="9.2.&nbsp;JSON CAS Serialization"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ug.ref.json.cas">9.2.&nbsp;JSON CAS Serialization</h2></div></div></div>


     <p>CASs primarily consist of collections of Feature Structures (FSs). Similar to XMI serialization, JSON
     serialization skips serializing unreachable FSs, outputting only those FSs that are found in the indexes (these are called
     <span class="emphasis"><em>roots</em></span>), plus all of
     the FSs that are referenced via some chain of references, from the roots.
     </p>

     <p>To support the kinds of things users do with FSs,
     the serialized form may be augmented to include additional information beyond the FSs.</p>
     <p>For traditional UIMA implementations, the serialized formats mostly assumed that the receivers had access to
     a type system description, which specified details of the types of each feature value.  For JSON serialization,
     some of this information can be included directly in the serialization.</p>

     <p>This abbreviated type system information is one kind of additional information that can be included;
     here's a summary list of the various kinds of additional information you can add to the serialization:</p>
     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
         <p>having a way to identify which fields in a FS should be treated as references to other FSs, or
         as representing serialized binary data from UIMA byte arrays.</p>
       </li><li class="listitem">
         <p>something like XML namespaces to allow the use of short type names in the serialization while handling name
         collisions</p>
       </li><li class="listitem">
         <p>enough of the UIMA type hierarchy to allow the common operation of iterating over a type together
         with all of its subtypes</p>
       </li><li class="listitem"><p>A way to identify which FSs were "added-to-the-indexes" (separately, per CAS View)
       and therefore serve as roots when
       iterating over types.</p>
       </li><li class="listitem"><p>An identification of the associated type system definition</p></li></ul></div>

     <p>Simple JSON serialization does not have a convention for supporting these, but many extensions do.
     We borrow some of the concepts in the JSON-LD (linked data) standard in providing this
     additional information.</p>

     <div class="section" title="9.2.1.&nbsp;The Big Picture"><div class="titlepage"><div><div><h3 class="title" id="ug.ref.json.cas.bigpic">9.2.1.&nbsp;The Big Picture</h3></div></div></div>


 	    <p>CAS JSON serialization consists of several parts: an optional _context, the set of Feature Structures,
 	    and (if doing a delta serialization) information about changes to what was indexed.</p>

 	    <div class="figure"><a name="ug.ref.json.fig.bigpic"></a><div class="figure-contents">

 		    <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="347"><tr><td><img src="images/references/ref.json/big_picture2.png" width="347" alt="The big picture showing the parts of serialization, with the _context optional."></td></tr></table></div>
       </div><p class="title"><b>Figure&nbsp;9.1.&nbsp;The major sections of JSON serialization</b></p></div><br class="figure-break">

     <p>The serializer can be configured to omit
     the _context or parts of the _context for cases where that information isn't needed.  The index changes
     information is only included if Delta CAS serialization is specified.  Note that Delta CAS support
     is incomplete; so this information is just for planning purposes.</p>
     </div>

     <div class="section" title="9.2.2.&nbsp;The _context section"><div class="titlepage"><div><div><h3 class="title" id="ug.ref.json.cas.context">9.2.2.&nbsp;The _context section</h3></div></div></div>

           <p>The _context section has entries for each used type as well as some special additional entries.
           Each entry for a type has multiple sub-entries, identified
           by a key-name.  Each sub-entry can be selectively omitted if not needed.


           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>_type_system</strong></span> - a URI of the type system information</p></li><li class="listitem"><p><span class="bold"><strong>_types</strong></span> - information about each used type
               </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><span class="bold"><strong>_id</strong></span> - the type's fully qualified UIMA type name</p></li><li class="listitem"><p><span class="bold"><strong>_feature_types</strong></span> - a map from features of this type to
 		                                information about the type of the value of the feature</p></li><li class="listitem"><p><span class="bold"><strong>_subtypes</strong></span> - an array of used subtype short-names</p></li></ul></div><p>
             </p></li></ul></div><p>
           </p>


 			    <p>Here's an example:</p>
 			    <div class="informalexample">

           <pre class="programlisting">"_context" : {
   "_type_system" : "URI to the type system information",
   "_types" : {
     "A_Typical_User_or_built_in_Type" : {
       "_id" : "org.apache.uima.test.A_Typical_User_or_built_in_Type",
       "_feature_types" : {
            "sofa"         : "_ref",
            "aFS"          : "_ref",
            "an_array"     : "_array",
            "a_byte_array" : "_byte_array"},
       "_subtypes" : [ "subtype1", "subtype2", ... ] },
     "Sofa" : {
       "_id" : "uima.cas.Sofa",
       "_feature_types" : {"sofaArray" : "_ref"} }
   }
 }</pre></div>

       <p>The <span class="bold"><strong>_type_system</strong></span> is an optional URI that references a UIMA type system description that
       defines the types for the CAS being serialized.</p>

       <p>In the <span class="bold"><strong>_types</strong></span> section, the key (e.g. "Sofa" or "A_Typical_User_or_built_in_Type") is the "short" name
       for the type used in the serialization.
       It is either just
       the last segment of the full type name (e.g. for the type x.y.z.TypeName, it's TypeName), or,
       if the name would collide with another type name if just the last segment
       was used (example:  some.package.cname.Foo,  and some.other.package.cname.Foo), then the key is made up of
       the next-to-last segment, with an optional suffixed incrementing integer in case of collisions on that name,
       a colon (:) and then the last name.</p>

       <div class="blockquote"><blockquote class="blockquote"><p>In this example, since the next to last segment of both names is
       "cname", one namespace name would be "cname", and the other would be "cname1".  The keys in this case would be
       cname:Foo and cname1:Foo.</p></blockquote></div>
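       <p>The naming rule can be sketched in Java as follows. This is only an illustration of the rule as described
       here: the class and its methods are hypothetical, the real serializer may assign namespaces in a different
       order, and the sketch assumes that the input names are distinct and package-qualified.</p>

```java
/** Hypothetical sketch of the short-name rule; the real serializer may differ in details. */
final class ShortTypeNames {

    static String lastSegment(String dotted) {
        return dotted.substring(dotted.lastIndexOf('.') + 1);
    }

    static String packageOf(String fullName) {
        int dot = fullName.lastIndexOf('.');
        return dot == -1 ? "" : fullName.substring(0, dot);
    }

    /** Returns the _types key for each fully qualified name, in input order. */
    static String[] keysFor(String[] fullNames) {
        int n = fullNames.length;
        String[] keys = new String[n];
        String[] packages = new String[n];   // packages that were given a namespace
        String[] namespaces = new String[n]; // namespace assigned to packages[j]
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            String last = lastSegment(fullNames[i]);
            if (!collides(fullNames, i, last)) {
                keys[i] = last; // unique last segment: use it alone
                continue;
            }
            String pkg = packageOf(fullNames[i]);
            String ns = null;
            for (int j = 0; j < assigned; j++) {
                if (packages[j].equals(pkg)) {
                    ns = namespaces[j]; // package already has a namespace
                }
            }
            if (ns == null) {
                ns = freshNamespace(namespaces, assigned, lastSegment(pkg));
                packages[assigned] = pkg;
                namespaces[assigned] = ns;
                assigned++;
            }
            keys[i] = ns + ":" + last;
        }
        return keys;
    }

    /** True if some other type has the same last segment. */
    static boolean collides(String[] fullNames, int self, String last) {
        for (int j = 0; j < fullNames.length; j++) {
            if (j != self && lastSegment(fullNames[j]).equals(last)) {
                return true;
            }
        }
        return false;
    }

    /** Appends 1, 2, ... to the base name until the namespace is unused. */
    static String freshNamespace(String[] namespaces, int assigned, String base) {
        String candidate = base;
        for (int suffix = 1; inUse(namespaces, assigned, candidate); suffix++) {
            candidate = base + suffix;
        }
        return candidate;
    }

    static boolean inUse(String[] namespaces, int assigned, String candidate) {
        for (int j = 0; j < assigned; j++) {
            if (namespaces[j].equals(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

       <p>Calling <code class="code">ShortTypeNames.keysFor</code> with the two colliding names from the example plus
       x.y.z.TypeName yields cname:Foo, cname1:Foo and TypeName.</p>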

       <p>The value of the _id is the fully qualified name of the type.</p>

       <p>The <span class="bold"><strong>_feature_types</strong></span> values of _ref, _array, and _byte_array indicate that the corresponding values
       of the named features need special handling
       when deserialized.
       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>_ref</strong></span> - used when features are serialized as numbers, but they are to be
       interpreted as references to other FSs whose <code class="code">id</code> is the number.  UIMA lists and arrays of
       FSs are marked with _ref; if the value is a JSON array, the elements of the array will be either
       numbers (to be interpreted as references), or embedded serializations of FSs.</p></li><li class="listitem"><p><span class="bold"><strong>_array</strong></span> - used when features are serialized as JSON
         arrays containing embedded values,
       unless the corresponding UIMA object has
       multiple references, in which case it is serialized as a FS reference which looks like a single number.
       If a feature is marked with _array, then a non-array, single number should be interpreted as the
       <code class="code">id</code> of the feature structure that is the array or the first element of the list of items.
       This designation is used for both UIMA arrays and lists.</p>

       <p>This designation is for arrays and lists of primitive values, except for byte arrays.
       In the case of FS arrays and lists, the _ref designation is used instead of this to indicate that the
       resulting values in a JSON array that look like numbers should be interpreted as references.</p></li><li class="listitem"><p><span class="bold"><strong>_byte_array</strong></span> - _byte_array features are serialized as numbers (if they are a
       reference to a separate object) or as strings (if embedded).  The strings are to be decoded into
       binary byte arrays using the Base64 encoding (the standard one used by Jackson to serialize binary data).</p></li></ul></div><p>
       </p>
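       <p>For a client written in Java, decoding an embedded _byte_array value might look like the sketch below.
       The helper class is hypothetical. It assumes the string uses the standard Base64 alphabet with padding and
       no line breaks, which is what the basic decoder in <code class="code">java.util.Base64</code> accepts; check
       this against the Jackson variant actually configured in your serializer.</p>

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical client-side helper, not a UIMA class. */
final class ByteArrayFeature {

    /** Decodes an embedded _byte_array feature value (a JSON string) into bytes. */
    static byte[] decode(String jsonStringValue) {
        return Base64.getDecoder().decode(jsonStringValue);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        // Encode with the JDK encoder to obtain a sample value, then decode it again.
        String encoded = Base64.getEncoder().encodeToString(original);
        System.out.println(encoded);
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```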

       <p>
       Note that single element arrays are <span class="emphasis"><em>not</em></span> unwrapped, as in some other JSON serializations, to enable distinguishing
       references to arrays from embedded arrays.
       </p>

       <p><span class="bold"><strong>_subtypes</strong></span> is a list of the type's used subtypes.  A type is <span class="emphasis"><em>used</em></span>
        if it is the type of a Feature Structure
       being serialized,
       or if it is in the supertype chain of some Feature Structure which is serialized.  If a type has no
       used subtypes, this element is omitted.
       The names are represented as the "short" name.  Users typically use this information
       to construct support for iterators over a type which includes all of its subtypes.</p>


       <div class="section" title="9.2.2.1.&nbsp;Omitting parts of the _context section"><div class="titlepage"><div><div><h4 class="title" id="ug.ref.json.cas.context.omit">9.2.2.1.&nbsp;Omitting parts of the _context section</h4></div></div></div>

           <p>It is possible to selectively omit some of the
           _context sections (or the entire _context), via configuration.
           Here's an example:</p>

           <div class="informalexample">

           <pre class="programlisting">// make a new instance to hold the serialization configuration
 JsonCasSerializer jcs = new JsonCasSerializer();
 // Omit the expanded type names information
 jcs.setJsonContext(JsonContextFormat.omitExpandedTypeNames);</pre></div>

           <p>See the Javadocs for <code class="code">JsonContextFormat</code> for how to specify the parts.</p>
       </div>

     </div>

     <div class="section" title="9.2.3.&nbsp;Serializing Feature Structures"><div class="titlepage"><div><div><h3 class="title" id="ug.ref.json.cas.featurestructures">9.2.3.&nbsp;Serializing Feature Structures</h3></div></div></div>


     <p>Feature Structures themselves are represented as JSON objects consisting of field - value pairs, where the
     fields correspond to UIMA Features, and the values are the values of the features.
     </p>

     <p>The various kinds of values for a UIMA feature are represented by their natural JSON counterpart.
     UIMA primitive boolean values are represented by JSON true and false literals. UIMA Strings are
     represented as JSON strings.  Numbers are represented by JSON numbers.
     Byte Arrays are represented by the Jackson standard binary encoding (base64 encoding), written as JSON strings.
     References to other Feature Structures are also represented as JSON integer numbers, the values of which are
     interpreted as ids of the referred-to
     FSs.  These ids are treated in the same manner as the xmi:ids of XMI Serialization.  Arrays and Lists when
     embedded (see following section) are represented as JSON arrays using the [] notation.</p>

     <p>Besides the feature values defined for a Feature Structure, an additional special feature
     may be serialized:  _type.
     The _type is the type name, written using the short format.  This is automatically included when the type cannot
     easily be
     inferred from other contextual information.
     </p>

     <p>Here's an example, with some comments which, since JSON doesn't support comments, are just here for explanation:</p>
               <div class="informalexample">
     <pre class="programlisting">{ "_type" : "Annotation", // _type may be omitted
   "feat1" : true,   // boolean value represented as true or false
   "feat2" : 123,    // could be a number or a reference to FS with id 123
   "feat3" : "b3axgh"//could be a string or a base64 encoded byte array
 }</pre></div>


     <div class="section" title="9.2.3.1.&nbsp;Embedding normally referenced values"><div class="titlepage"><div><div><h4 class="title" id="ug.ref.json.cas.featurestructures.embedding">9.2.3.1.&nbsp;Embedding normally referenced values</h4></div></div></div>


       <p>Consider a FS which has a feature that refers to another FS.  This can be serialized in one of two ways:</p>
       <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>the value of the feature can be coded as an <code class="code">id</code> (a number), where the number is the <code class="code">id</code> of the
         referred-to FS.</p></li><li class="listitem"><p>The value of the feature can be coded as the serialization of the referred-to FS.</p></li></ul></div>

       <p>
       This second way of encoding is often done by JSON style serializations, and is called "embedding".  Referred-to
       FSs may be embedded if there are no other references to the embedded FS.  Multiple references may arise due to
       having a FS referenced as a "root" in some CAS View, or being used as a value in a FS feature.</p>

       <p>Following the XMI conventions, UIMA arrays and lists which are
       identified as singly referenced by either the static or dynamic method (see below) are embedded
       directly as the value of a feature.  In this case, the JSON serialization writes out the value of the feature
       as a JSON array.  Otherwise, the value is written out as a FS reference, and a separate serialization occurs of
       the list elements or the array.</p>

       <p>In addition to arrays and lists, FSs which are identified as singly referenced from another FS are
       serialized as the embedded value of the referring feature.
       This is also done (when using the dynamic method) for singly referenced rooted instances.
       </p>
       <p>
       If a FS is multiply referenced, the serialization in these
       cases is just the numeric value of the <code class="code">id</code> of the FS.</p>
       </div>

       <div class="section" title="9.2.3.2.&nbsp;Dynamic vs Static multiple-references and embedding"><div class="titlepage"><div><div><h4 class="title" id="ug.ref.json.cas.featurestructures.dynamicstatic">9.2.3.2.&nbsp;Dynamic vs Static multiple-references and embedding</h4></div></div></div>


       <p>There are two methods of determining if a particular FS or list or array can be embedded.

       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>dynamic</strong></span> - calculates at serialization time whether or not there
         are multiple references to a given FS.</p></li><li class="listitem"><p><span class="bold"><strong>static</strong></span> - looks in the type system definition to see if
         the feature is marked with &lt;multipleReferencesAllowed&gt;.
         </p><div class="itemizedlist"><ul class="itemizedlist" type="circle" compact><li class="listitem"><p><code class="code">multipleReferencesAllowed</code> false <span class="symbol">&#8594;</span> use the embedded style</p></li><li class="listitem"><p><code class="code">multipleReferencesAllowed</code> true <span class="symbol">&#8594;</span> use separate objects</p></li></ul></div><p>
         Note that since this flag is not available for
         references to FSs from View indexes, any FS that is indexed in any view is considered (if using static mode)
         to be multipleReferencesAllowed.
         </p></li></ul></div><p>
       </p>

       <p>Delta serialization only supports the static method; this mode is forced on if delta serialization
       is specified.</p>

       <p>Dynamic embedding is enabled by default for JSON, but may be disabled via configuration.</p>
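       <p>The two decision rules can be summarized in a short sketch. The class, method, and parameter names are
       hypothetical and only restate the rules described in this section; in particular, the reference count in the
       dynamic case is assumed to include references from View indexes as well as from FS features.</p>

```java
/** Hypothetical sketch of the two embedding rules; names are illustrative only. */
final class EmbeddingRule {

    /** Dynamic method: embed when exactly one reference was counted at serialization time. */
    static boolean embedDynamic(int referenceCount) {
        return referenceCount == 1;
    }

    /**
     * Static method: embed unless the feature declares multipleReferencesAllowed,
     * or the FS is indexed in some view (which is treated as multipleReferencesAllowed).
     */
    static boolean embedStatic(boolean multipleReferencesAllowed, boolean indexedInSomeView) {
        return !(multipleReferencesAllowed || indexedInSomeView);
    }

    public static void main(String[] args) {
        System.out.println(embedDynamic(1));            // singly referenced
        System.out.println(embedStatic(false, true));   // indexed, so not embedded
    }
}
```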
     </div>

     <div class="section" title="9.2.3.3.&nbsp;Embedded Arrays and Lists"><div class="titlepage"><div><div><h4 class="title" id="ug.ref.json.cas.featurestructures.embeddedArraysLists">9.2.3.3.&nbsp;Embedded Arrays and Lists</h4></div></div></div>


     <p>When static embedding is being used, a case can arise where some feature is marked to have only
     singly referenced FS values, but that value may actually be multiply referenced.  This is detected during
     serialization, and a message is issued if an error handler has been specified to the serializer.
     The serialization continues, however.  In the case of an Array, the value of the array is embedded
     in the serialization and the fact that these were referring to the same object is lost.
     In the case of a list, if any element in the list
     has multiple references (for example,  if the list has back-references, loops, etc.),
     the serialization of the list is truncated at the point where the multiple reference
     occurs.</p>

     <div class="blockquote"><blockquote class="blockquote"><p>Note that you can correctly serialize arbitrarily linked complex list structures created
     using the built-in list types only if you use dynamic embedding, or
     specify <code class="code">multipleReferencesAllowed</code> = true.</p></blockquote></div>


     <p>Embedded list or array values are both serialized using the JSON array notation; as a result, these
     alternative representations are not distinguished in the JSON serialization.</p>
     </div>

     <div class="section" title="9.2.3.4.&nbsp;Omitting null values"><div class="titlepage"><div><div><h4 class="title" id="ug.ref.json.cas.featurestructures.null">9.2.3.4.&nbsp;Omitting null values</h4></div></div></div>


       <p>Following the conventions established in XMI serialization, features with <code class="code">null</code> values have their
       key-value pairs omitted from the FS serialization when the type of the feature value is:
       </p>
     <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem">
         <p>a Feature Structure Reference</p>
       </li><li class="listitem">
         <p>a String ( whose value is <code class="code">null</code>, not "" (a 0-length String))</p>
       </li><li class="listitem">
         <p>an embedded Array or List (where the entire array and/or list is <code class="code">null</code>)</p>
       </li></ul></div>

     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Inside arrays or lists of FSs, references which are being serialized
     as references have a <code class="code">null</code> reference coded as the number 0; references which are embedded are serialized as
     <code class="code">null</code>.</p></div>

     <p>Configuring the serializer with <code class="code">setOmit0Values(true)</code> causes
     additional primitive features (byte/short/int/long/float/double) to be omitted, when their values are 0 or 0.0</p>
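     <p>Taken together, the omission rules amount to a small predicate. The sketch below is hypothetical and models
     feature values as plain Java objects, which is a simplification; it treats boolean values as never omitted,
     since they are not in the list of primitive types above.</p>

```java
/** Hypothetical sketch of the omission rules; not the UIMA implementation. */
final class OmitRule {

    /**
     * @param value       the feature value: null, a String, a Number, a Boolean, ...
     * @param omit0Values mirrors the setOmit0Values(true) configuration
     * @return true if the key-value pair would be left out of the serialized FS
     */
    static boolean omitted(Object value, boolean omit0Values) {
        if (value == null) {
            return true; // null FS reference, null String, or null embedded array/list
        }
        if (omit0Values && value instanceof Number) {
            return ((Number) value).doubleValue() == 0.0; // 0 or 0.0
        }
        return false; // includes "" (a 0-length String), which is kept
    }

    public static void main(String[] args) {
        System.out.println(omitted(null, false));
        System.out.println(omitted("", false));
        System.out.println(omitted(0, true));
    }
}
```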

     </div>

     </div>

     </div>

     <div class="section" title="9.3.&nbsp;Organizing the Feature Structures"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ug.ref.json.cas.featurestructures.organization">9.3.&nbsp;Organizing the Feature Structures</h2></div></div></div>


     <p>The set of all FSs being serialized is divided into two parts.  The first part represents
     all FSs that are root FSs, in that they were in one or more indexes at the time of serialization.  The second part
     represents feature structures that are multiply referenced, or are referenced via a chain of references from the
     root FSs.  The same feature structure can appear in both parts.  The elements in the second part are actual
     serialized FSs, whereas the elements in the first part are either references to the corresponding FSs in the
     second part, if they exist there, or the actual embedded serialized FSs.  Each actual serialized FS
     appears only once across the two parts.</p>

               <div class="informalexample">
     <pre class="programlisting">"_views" : {
   "_InitialView" : {
      "theFirstType" : [  { ... fs1 ...}, 123, 456, { ... fsn ...} ],
      "anotherType"  : [  { ... fs1 ...}, ... { ... fsn ...} ]
       ...     // more types which have roots in view "12"
          },
   "AnotherView" : {
      "theFirstType" : [  { ... fsv1 ...}, 123, { ... fsvn ...} ],
      "anotherType"  : [  { ... fsv1 ...}, ... { ... fsvn ...} ]
       ...     // more types which have roots in view "25"
          },
    ...        // more views
 },

 "_referenced_fss" : {
   "12" : {"_type" : "Sofa",  "sofaNum" : 1,  "sofaID" : "_InitialView" },
   "25" : {"_type" : "Sofa",  "sofaNum" : 2,  "sofaID" : "AnotherView" },

   "123" : { ... fs-123 ... },
   "456" : { ... fs-456 ... },
   ...
 }</pre></div>

     <p>The first part map is made up of multiple maps, one for each separate CAS View.
     The outer map is keyed by the <code class="code">id</code> of the corresponding SofaFS (or 0, if there is no corresponding SofaFS).
     For each view, the value is a map whose key is a used Type, and the values are an array of instances
     of FSs of that type which were found in some index; these are the "root" FSs.  Only root instances
     of a particular type are included in this array.
     </p>


     <p>The second part map has keys which are the <code class="code">id</code> value of the FSs, and values which are
     a map of key-value pairs corresponding to the feature-values of that FS.
     In this case, the _type extra feature is added to record the type.</p>


     <p>The _views map, keyed by view and type name, has all the FSs (as a JSON array) for that type that were in
     one or more indexes in any View.  If a FS in this array is not multiply referenced (using dynamic mode),
     then it is embedded here. Otherwise, only the reference (a simple number representing the <code class="code">id</code> of that FS) is serialized for that FS.</p>
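     <p>The embed-or-reference decision for the root FSs can be sketched as follows.  This is a simplified,
     hypothetical model in plain Java, not the serializer's code: it handles one type in one view, represents each FS as a
     <code class="code">Map</code>, and assumes the set of multiply referenced ids has already been computed.
     The names <code class="code">ViewsLayoutSketch</code> and <code class="code">rootEntries</code> are invented for the example.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the UIMA API.
final class ViewsLayoutSketch {

  /**
   * Builds the array of root entries for one type in one view.
   * Multiply referenced FSs go into referencedFss (keyed by id) and are
   * represented in the returned list by their numeric id; all others are embedded.
   */
  static List<Object> rootEntries(List<Integer> indexedIds,
                                  Map<Integer, Map<String, Object>> fsById,
                                  Set<Integer> multiplyReferenced,
                                  Map<String, Map<String, Object>> referencedFss) {
    List<Object> entries = new ArrayList<>();
    for (Integer id : indexedIds) {
      Map<String, Object> fs = fsById.get(id);
      if (multiplyReferenced.contains(id)) {
        referencedFss.put(String.valueOf(id), fs); // serialized once, in the second part
        entries.add(id);                           // only the id appears under _views
      } else {
        entries.add(fs);                           // embedded directly under _views
      }
    }
    return entries;
  }
}
```

     <p>With indexed ids 1 and 123, where only 123 is multiply referenced, the returned list holds the embedded map for 1
     followed by the number 123, and the second-part map gains the key <code class="code">"123"</code>.</p>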


     </div>


     <div class="section" title="9.4.&nbsp;Additional JSON CAS Serialization features"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ug.ref.json.cas.features">9.4.&nbsp;Additional JSON CAS Serialization features</h2></div></div></div>


     <p>JSON serialization also supports several additional features, including:</p>
     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
         <p>Type and feature filtering: only types and features that exist in a specified type system description
         are serialized.</p>
       </li><li class="listitem">
         <p>An ErrorHandler; this will be called in various error situations, including when
         serializing in static mode an array or list value for a feature marked <code class="code">multipleReferencesAllowed = false</code>
         is found to have multiple references.</p>
       </li><li class="listitem">
         <p>A switch to control omitting of numeric features that have 0 values (default is to include these).
         See the <code class="code">setOmit0Values(true_or_false)</code> method in JsonCasSerializer.</p>
       </li><li class="listitem">
         <p>a pretty printing flag (default is not to do pretty-printing)</p>
       </li></ul></div>
     <p>See the Javadocs for JsonCasSerializer for details.</p>

     <div class="section" title="9.4.1.&nbsp;Delta CAS"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.json.delta">9.4.1.&nbsp;Delta CAS</h3></div></div></div>


       <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Delta CAS support is incomplete, and is not supported as of release 2.7.0, but may
       be supported in later releases.  The information here is just for planning purposes.</p></div>

       <p><span class="bold"><strong>_delta_cas</strong></span> is present only when a delta CAS serialization is being performed.
       This serializes just the
       changes in the CAS since a Mark was set; so for cases where a large CAS is deserialized into a service, which
       then does a relatively small amount of additions and modifications, only those changes are serialized.
       The values of the keys are arrays of the ids of FSs that were added to the indexes,
       removed from the indexes, or reindexed.</p>

       <p>This mode requires the static embeddability mode.  When specified, a <code class="code">_delta_cas</code> key-value
       is added to the serialization at the end,
       which lists the FSs (by <code class="code">id</code>) that were added, removed, or reindexed, since the mark was set.
       Additional extra information, created when the CAS was previously deserialized and the mark set,
       must be passed to the serializer, in the form of an instance of <code class="code">XmiSerializationSharedData</code>,
       or JsonSerializationSharedData (not yet defined as of release 2.7.0).</p>

       <p>Here's what the last part of the serialization looks like, when Delta CAS is specified:
                 </p><div class="informalexample">
       <pre class="programlisting">"_delta_cas" : {
   "added_members" : [  123, ... ],
   "deleted_members" : [  456, ... ],
   "reindexed_members" : [] }</pre></div><p>
       </p>


     </div>
   </div>


   <div class="section" title="9.5.&nbsp;Using JSON CAS serialization"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.json.usage">9.5.&nbsp;Using JSON CAS serialization</h2></div></div></div>


     <p>The support is built on top of the Jackson JSON serialization
     package, and follows Jackson conventions for configuration.</p>

     <p>The serialization APIs are in the JsonCasSerializer class.</p>

     <p>Although there are some static short-cut methods for common use cases, the basic operations needed
     to serialize a CAS as JSON are:</p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
         <p>Make an instance of the <code class="code">JsonCasSerializer</code> class.  This will serve to collect configuration information.</p>
       </li><li class="listitem">
         <p>Do any additional configuration needed.  See the Javadocs for details.
         The following objects can be configured:</p>
         <div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
             <p>The <code class="code">JsonCasSerializer</code> object: here you can specify the kind of JSON formatting, what to serialize,
             whether or not delta serialization is wanted, prettyprinting, and more.</p>
           </li><li class="listitem">
             <p>The underlying <code class="code">JsonFactory</code> object from Jackson.  Normally, you won't need to configure this.
             If you do, you can create your own instance of this object and configure it and use it in the
             serialization.</p>
           </li><li class="listitem">
             <p>The underlying <code class="code">JsonGenerator</code> from Jackson. Normally, you won't need to configure this.
             If you do, you can get the instance the serializer will be using and configure that.</p>
           </li></ul></div>
       </li><li class="listitem">
         <p>Once all the configuration is done, call <code class="code">serialize(...)</code> on this instance;
         it creates a one-time-use
         inner class instance where the actual serialization is done.  The <code class="code">serialize(...)</code> method is thread-safe, in that the same
         JsonCasSerializer instance (after it has been configured) can kick off multiple
         (identically configured) serializations
         on different threads at the same time.</p>
         <p>The serialize call follows the Jackson conventions, taking one of 3 specifications of where to serialize to:
         a Writer, an OutputStream, or a File.</p>
       </li></ul></div>

     <p>Here's an example:</p>
               <div class="informalexample">
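     <pre class="programlisting">import java.io.StringWriter;
 import org.apache.uima.json.JsonCasSerializer;

 // cas is an existing CAS instance
 JsonCasSerializer jcs = new JsonCasSerializer();
 jcs.setPrettyPrint(true); // do some configuration
 StringWriter sw = new StringWriter();
 jcs.serialize(cas, sw); // serialize into sw</pre></div>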

     <p>The JsonCasSerializer class also has some static convenience methods for JSON serialization, for the
     most common configuration cases; please see the Javadocs for details. These are named jsonSerialize, to
     distinguish them from the non-static serialize methods.</p>

     <p>Many of the common configuration methods return the instance, so they can be chained together.
     For example, if <code class="code">jcs</code> is an instance of the JsonCasSerializer, you can write
     <code class="code">jcs.setPrettyPrint(true).setOmit0Values(true);</code> to configure both of these.</p>


   </div>

   <div class="section" title="9.6.&nbsp;JSON serialization for UIMA descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.json.descriptionserialization">9.6.&nbsp;JSON serialization for UIMA descriptors</h2></div></div></div>


     <p>UIMA descriptors are things like analysis engine descriptors, type system descriptors, etc.
     UIMA has an internal form for these, typically named UIMA <span class="emphasis"><em>description</em></span>s;
     these can be serialized out as XML using a <code class="code">toXML</code> method.
     JSON support adds the ability to serialize these as JSON objects, as well.  It may be of use, for example,
     to have the full type system description for a UIMA pipeline available in JSON notation.
     </p>

     <p>The class JsonMetaDataSerializer defines a set of static <code class="code">toJSON</code> methods that serialize UIMA description objects.
     Each takes as arguments the description object to be serialized and one of the standard
     serialization targets that Jackson supports (File, Writer, or OutputStream).  There is also
     an optional prettyprint flag (default is no prettyprinting).</p>

     <p>The resulting JSON serialization is just a straightforward serialization of the description object,
     having the same fields as the XML serialization of it.</p>

     <p>Here's what a small TypeSystem description looks like, serialized:</p>

               <div class="informalexample">
     <pre class="programlisting">{"typeSystemDescription" :
   {"name" : "casTestCaseTypesystem",
    "description" : "Type system description for CAS test cases.",
    "version" : "1.0",
    "vendor" : "Apache Software Foundation",
    "types" : [
      {"typeDescription" :
        {"name" : "Token",
         "description" : "",
          "supertypeName" : "uima.tcas.Annotation",
          "features" : [
            {"featureDescription" :
              {"name" : "type",
               "description" : "",
               "rangeTypeName" :
               "TokenType" } },
            {"featureDescription" :
              {"name" : "tokenFloatFeat",
               "description" : "",
               "rangeTypeName" : "uima.cas.Float" } } ] } },
      {"typeDescription" :
        {"name" : "TokenType",
         "description" : "",
         "supertypeName" : "uima.cas.TOP" } } ] } }</pre></div>

     <p>Here's a sample of code to serialize a UIMA description object held in the variable <code class="code">tsd</code>, with
     and without pretty printing:</p>


           <div class="informalexample">
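     <pre class="programlisting">import java.io.StringWriter;
 import org.apache.uima.json.JsonMetaDataSerializer;

 // tsd is an existing UIMA description object, e.g. a type system description
 StringWriter sw = new StringWriter();
 JsonMetaDataSerializer.toJSON(tsd, sw); // no prettyprinting

 sw = new StringWriter();
 JsonMetaDataSerializer.toJSON(tsd, sw, true); // prettyprinting</pre></div>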
   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;10.&nbsp;UIMA Setup and Configuration" id="ugr.ref.config"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;10.&nbsp;UIMA Setup and Configuration</h2></div></div></div>


   <div class="section" title="10.1.&nbsp;UIMA JVM Configuration Properties"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.config.properties">10.1.&nbsp;UIMA JVM Configuration Properties</h2></div></div></div>


     <p> Some updates change UIMA's behavior between released versions.  For example, sometimes an error check
   is enhanced, and this can cause something that was previously incorrect, but not checked, to now signal an error.
   Often, users will want these kinds of errors to be ignored, at least for a while, to give them time to
   analyze and correct the issues.
     </p>

     <p>
       To enable users to gradually address these issues, there are some global JVM properties
   for UIMA that can restore earlier behaviors, in some cases.
   These are detailed in the table below.  Additionally, there are other JVM properties that can
   be used in checking and optimizing some performance trade-offs, such as the automatic index protection.
   For the most part, you don't need to assign any values to these properties,
   just define them.  For example, to disable the enhanced check that ensures you
   don't add a subtype of AnnotationBase to the wrong View, add the JVM argument
   <code class="code">-Duima.disable_enhanced_check_wrong_add_to_index</code>.
   This removes the enhanced
   checking added in version 2.7.0 (the previously existing partial checking is
   still there, though).
     </p>
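     <p>As a rough illustration of what "just define them" means on the Java side: a property passed as
     <code class="code">-Dname</code> with no value is visible to <code class="code">System.getProperty</code> as an empty
     string rather than <code class="code">null</code>.  The helper below is a hypothetical sketch, not UIMA's own lookup code;
     the class and method names are invented, and how the framework treats an explicitly assigned value is not covered here.</p>

```java
// Hypothetical illustration only; not part of the UIMA API.
final class DefinedOnlyFlagSketch {

  /** True when the JVM property is defined at all, with or without a value. */
  static boolean isDefined(String propertyName) {
    return System.getProperty(propertyName) != null;
  }

  public static void main(String[] args) {
    String name = "uima.disable_enhanced_check_wrong_add_to_index";
    System.out.println(name + " defined: " + isDefined(name));
  }
}
```

     <p>Running this class with <code class="code">-Duima.disable_enhanced_check_wrong_add_to_index</code> on the command
     line reports the property as defined; running it without that argument reports it as not defined.</p>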
   </div>

   <div class="section" title="10.2.&nbsp;Configuring index protection"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.config.protect-index">10.2.&nbsp;Configuring index protection</h2></div></div></div>


     <p>A feature added in version 2.7.0 optionally checks for invalid feature updates
     which could corrupt indexes.  Because this checking can slightly slow down performance, there are
     global JVM properties to control it.  The suggested way to operate with these is as follows.
     </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>At the beginning, run with automatic protection enabled (the default), but
 	    turn on explicit reporting (<code class="code">-Duima.report_fs_update_corrupts_index</code>)</p></li><li class="listitem"><p>For all reported instances, examine your code to see if you can restructure to
 	    do the updates before adding the FS to the indexes.  Where you cannot, surround the code doing
 	    these updates with a try / finally or block form of <code class="code">protectIndexes()</code>,
 	    which is described in
 	     <a class="xref" href="#ugr.ref.cas.updating_indexed_feature_structures" title="4.5.1.&nbsp;Updating indexed feature structures">Section&nbsp;4.5.1, &#8220;Updating indexed feature structures&#8221;</a> (and also is similarly available with JCas).
 	    </p></li><li class="listitem"><p>After no further reports, for maximum performance, leave in the protections
 	    you may have installed in the above step, and then disable the reporting and runtime checking,
 	    using the JVM argument
 	    <code class="code">-Duima.disable_auto_protect_indexes</code>, and removing (if present)
 	    <code class="code">-Duima.report_fs_update_corrupts_index</code>.</p></li></ul></div><p>
     One additional JVM property, <code class="code">-Duima.exception_when_fs_update_corrupts_index</code>,
     is intended to be used in automated build / testing configurations.  It causes the framework to throw
     a UIMARuntimeException if an update that could corrupt the indexes occurs outside of
     a <code class="code">protectIndexes</code> block,
     rather than recovering from it automatically.
     </p>
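     <p>The kind of corruption being guarded against can be illustrated with an analogy in plain Java, using a
     <code class="code">java.util.TreeSet</code> as a stand-in for a sorted index.  This is not UIMA's index implementation,
     and the class <code class="code">SortedIndexCorruptionSketch</code> with its nested <code class="code">Fs</code> type is
     invented for the example.  Changing a key field while the object is in the set can leave it where lookups no longer find it;
     removing it first, updating, and adding it back (in a try / finally) keeps the set consistent.</p>

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical analogy only; not part of the UIMA API.
final class SortedIndexCorruptionSketch {

  /** Stand-in for a feature structure with one key feature. */
  static final class Fs {
    int begin;
    Fs(int begin) { this.begin = begin; }
  }

  /** Stand-in for a sorted index keyed on "begin". */
  static TreeSet<Fs> newIndex() {
    return new TreeSet<>(Comparator.comparingInt((Fs fs) -> fs.begin));
  }

  /** Updates the key while the FS is still indexed; returns whether a lookup still finds it. */
  static boolean updateInPlace(TreeSet<Fs> index, Fs fs, int newBegin) {
    fs.begin = newBegin; // the set is not told about the change
    return index.contains(fs);
  }

  /** Removes the FS, updates the key, and adds it back; returns whether a lookup finds it. */
  static boolean updateProtected(TreeSet<Fs> index, Fs fs, int newBegin) {
    boolean wasIndexed = index.remove(fs);
    try {
      fs.begin = newBegin;
    } finally {
      if (wasIndexed) {
        index.add(fs); // re-added at the position matching the new key
      }
    }
    return index.contains(fs);
  }
}
```

     <p>With three entries added in the order 5, 10, 15, moving the first one to 20 in place makes the lookup fail even though
     the set still reports three elements, whereas the protected variant finds it again as the last element.</p>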
   </div>

   <div class="section" title="10.3.&nbsp;Properties Table"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.config.property-table">10.3.&nbsp;Properties Table</h2></div></div></div>


     <p>This table describes the various JVM defined properties; specify these on the Java command line
     using -Dxxxxxx, where the xxxxxx is one of
     the properties starting with <code class="code">uima.</code> from the table below.</p>
     <div class="informaltable">
      <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="Title"><col class="Description"><col class="Version"></colgroup><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="bold"><strong>Title</strong></span></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="bold"><strong>Property Name &amp; Description</strong></span></td><td style="border-bottom: 0.5pt solid black; "><span class="bold"><strong>Since Version</strong></span></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>Use built-in Java Logger as default back-end</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.use_jul_as_default_uima_logger</code></p>

                   <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-5381" target="_top">UIMA-5381</a>.
                   The standard UIMA logger uses an slf4j implementation, which, in turn hooks up to
                   a back end implementation based on what can be found in the class path (see slf4j documentation).
                   If no backend implementation is found, the slf4j default is to use a NOP logger back end
                   which discards all logging.</p>

                   <p>When this flag is specified, the behavior of the UIMA logger
                         is altered to use the built-in-to-Java logging implementation
                         as the back end for the UIMA logger.
                   </p></td><td style="border-bottom: 0.5pt solid black; "><p>3.0.0</p></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>XML: enable doctype declarations</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.xml.enable.doctype_decl</code> (default is false)</p>

            <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-6064" target="_top">UIMA-6064</a>.
            Normally, this is turned off to avoid exposure to malicious XML; see
            <a class="ulink" href="https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing" target="_top">
              XML External Entity processing vulnerability</a>.
            </p>
            </td><td style="border-bottom: 0.5pt solid black; "><p>2.10.4, 3.1.0</p></td></tr><tr><td style="border-bottom: 0.5pt solid black; " colspan="3" align="center"><span class="bold"><strong>Index protection properties</strong></span></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>Report Illegal Index-key Feature Updates</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.report_fs_update_corrupts_index</code> (default is not to report)</p>

                   <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-4135" target="_top">UIMA-4135</a>.
                         Updating Features which are used in Set and Sorted
                         indexes as "keys" may corrupt the indexes, if the Feature Structure (FS)
                         has been added to the indexes.  To update these, you must first
                         completely remove the FS from the indexes in all views, then do the updates, and then
                         add it back.  UIMA now checks for this (unless specifically disabled, see below)
                         and, if this property is defined, logs a WARN message for each occurrence that is
                         not inside an explicit <code class="code">protectIndexes</code> block
                         (see the Javadocs for the CAS / JCas <code class="code">protectIndexes</code> methods).</p>
                    <p>To scan the logs for these reports, search for instances of lines having the string
                          <code class="code">While FS was in the index, the feature</code></p>

                    <p>Specifying this property overrides <code class="code">uima.disable_auto_protect_indexes</code>.</p>

                    <p>Users would run with this property defined, and then for high performance,
                         would use the report to manually change their code to avoid the problem or
                         to wrap the updates with a <code class="code">protectIndexes</code> kind of protection (see the
                         reference manual, in the CAS or JCas chapters, for examples of user code doing this),
                         and then run with the protection turned off (see below).

                         </p></td><td style="border-bottom: 0.5pt solid black; "><p>2.7.0</p></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>Throw exception on illegal Index-key Feature Updates</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.exception_when_fs_update_corrupts_index</code> (default is false)</p>

                   <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-4150" target="_top">UIMA-4150</a>.
                         Throws a UIMARuntimeException if a feature of an indexed FS that is used as a key in one or more
                         indexes is updated outside of an explicit <code class="code">protectIndexes</code> block.
                         This is intended for use in automated build and test environments,
                         to provide a strong signal if this kind of mistake gets into the build.
                         If it is not set, then the other properties specify if corruption should be checked for,
                         recovered automatically, and / or reported.</p>

                    <p>Specifying this property also forces <code class="code">uima.report_fs_update_corrupts_index</code>
                          to true even if it was set to false.</p>

                    </td><td style="border-bottom: 0.5pt solid black; "><p>2.7.0</p></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>Disable the index corruption checking</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.disable_auto_protect_indexes</code></p>

                   <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-4135" target="_top">UIMA-4135</a>.
                         After you have fixed all reported issues identified with the above report,
                         you may set this property to omit this check, which may slightly improve
                         performance.</p>
                   <p>Note that this property is ignored if <code class="code">-Duima.exception_when_fs_update_corrupts_index</code>
                   or <code class="code">-Duima.report_fs_update_corrupts_index</code> is specified.</p>
            </td><td style="border-bottom: 0.5pt solid black; "><p>2.7.0</p></td></tr><tr><td style="border-bottom: 0.5pt solid black; " colspan="3" align="center"><span class="bold"><strong>Measurement / Tracing properties</strong></span></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p>Trace Feature Structure Creation/Updating</p></td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><p><code class="code">uima.trace_fs_creation_and_updating</code></p>
                   <p>This causes a trace file to be produced in the current working directory.
                   The file has one line for each Feature Structure that is created, and includes
                   information on the cas/cas-view, and the features that are set for the Feature Structure.
                   There is, additionally, one line for each Feature Structure update.
                   Updates that occur next to the trace information for the same Feature Structure are combined.
                   </p>

                   <p>This can generate a lot of output, and definitely slows down execution.</p>
             </td><td style="border-bottom: 0.5pt solid black; "><p>2.10.1</p></td></tr><tr><td style="border-right: 0.5pt solid black; "><p>Measure index flattening optimization</p></td><td style="border-right: 0.5pt solid black; "><p><code class="code">uima.measure.flatten_index</code></p>

                   <p>See <a class="ulink" href="https://issues.apache.org/jira/browse/UIMA-4357" target="_top">UIMA-4357</a>.
                         This writes a short report to System.out when the JVM shuts down.
                         The report has some statistics about the automatic management of
                         flattened index creation and use.</p>

            </td><td style=""><p>2.8.0</p></td></tr></tbody></table>
    </div>
    <p>Some additional global flags intended for helping v3 migration are documented in the V3 user's guide.</p>
   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;11.&nbsp;UIMA Resources" id="ugr.ref.resources"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;11.&nbsp;UIMA Resources</h2></div></div></div>


   <div class="section" title="11.1.&nbsp;What is a UIMA Resource?"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.resources.overview">11.1.&nbsp;What is a UIMA Resource?</h2></div></div></div>

     <p>UIMA uses the term <code class="code">Resource</code> to describe all UIMA components
     that can be acquired by an application or by other resources.</p>

     <div class="figure"><a name="ref.resource.fig.kinds"></a><div class="figure-contents">

       <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="297"><tr><td><img src="images/references/ref.resources/res_resource_kinds.png" width="297" alt="Resource Kinds, a partial list"></td></tr></table></div>
     </div><p class="title"><b>Figure&nbsp;11.1.&nbsp;Resource Kinds</b></p></div><br class="figure-break">

     <p>There are many kinds of resources; here's a list of the main kinds:
       </p><div class="variablelist"><dl><dt><span class="term"><span class="strong"><strong>Annotator</strong></span></span></dt><dd><p>a user-written component that receives a CAS, does some processing, and returns the possibly
           updated CAS.  Variants include CollectionReaders, CAS Consumers, and CAS Multipliers.</p></dd><dt><span class="term"><span class="strong"><strong>Flow Controller</strong></span></span></dt><dd><p>a user-written component controlling the flow of CASes within an aggregate.</p></dd><dt><span class="term"><span class="strong"><strong>External Resource</strong></span></span></dt><dd><p>a user-written component. Variants include:
             </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Data - includes special lifecycle call to load data</p></li><li class="listitem"><p>Parameterized - allows multiple instantiations with simple string parameter variants;
                 example: a dictionary, that has variants in content for different languages</p></li><li class="listitem"><p>Configurable - supports configuration from the XML specifier</p></li></ul></div><p>
           </p></dd></dl></div><p>
     </p>

    <div class="section" title="11.1.1.&nbsp;Resource Inner Implementations"><div class="titlepage"><div><div><h3 class="title" id="ugr.ref.resources.resource-inner-implementations">11.1.1.&nbsp;Resource Inner Implementations</h3></div></div></div>


       <p>Many of the resource kinds include in their specification a (possibly optional) element, which is
       the name of a Java class which implements the resource.  We will call this class the "inner implementation".</p>

       <p>The UIMA framework creates instances of Resource from resource specifiers, by calling
       the framework's <code class="code">produceResource(specifier, additional_parameters)</code> method.
       This call produces an instance of Resource.  </p>

       <div class="blockquote"><blockquote class="blockquote">
         <p>
           For example, calling produceResource on an AnalysisEngineDescription produces an instance of
           AnalysisEngine.  This, in turn, will have a reference to the user-written inner implementation class
           specified by the <code class="code">annotatorImplementationName</code>.
         </p>
         <p>External resource descriptors may include an <code class="code">implementationName</code> element.
 	        Calling produceResource on an ExternalResourceDescription produces an instance of Resource;
 	        the resource obtained by subsequent calls to <code class="code">getResource(...)</code>
 	        is dependent on the particular descriptor, and may be an instance of
 	        the inner implementation class.
         </p>
       </blockquote></div>

       <p>For external resources, each resource specifier kind handles the case where
       the inner implementation is omitted.  If it is supplied, the named class must implement
       the interface specified in the bindings for this resource. In addition, the particular specifier kind may
       further restrict the kinds of classes the user supplies as the implementationName.
       </p>

       <p>Some examples of this further restriction:
         </p><div class="variablelist"><dl><dt><span class="term"><span class="strong"><strong>customResource</strong></span></span></dt><dd><p>the class must also implement the Resource interface</p></dd><dt><span class="term"><span class="strong"><strong>dataResource</strong></span></span></dt><dd><p>the class must also implement the SharedResourceObject interface</p></dd></dl></div><p>
       </p>

     </div>

   </div>

   <div class="section" title="11.2.&nbsp;Sharing Resources, even across pipelines"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.resources.sharing-across-pipelines">11.2.&nbsp;Sharing Resources, even across pipelines</h2></div></div></div>


     <p>UIMA applications run one or more UIMA Pipelines.  Each pipeline has a top-level Analysis Engine, which
     may be an aggregation of many other Analysis Engine components.  The UIMA framework instantiates Annotator
     resources as specified to configure the pipelines.</p>

     <p>Sometimes, many identical pipelines are created (for example,
     in order to exploit multi-core hardware by processing multiple CASes in parallel). In this case, the framework
     would produce multiple instances of those Annotator resources; these are implemented as multiple instances
     of the same Java class.</p>

     <p>Sets of External Resources plus a CAS Pool and UIMA Extension ClassLoader are set up and kept,
        per instance of a ResourceManager;
     this instance serves to allow sharing of these items across one or more pipelines.

     </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
         <p>The UIMA Extension ClassLoader (if specified) is used to find the resources to be loaded
         by the framework</p>
       </li><li class="listitem">
         <p>The <code class="code">External Resources</code> are specified by a pipeline's resource configuration.</p>
       </li><li class="listitem">
         <p>The CAS Pool is a pool of CASs all with identical type systems and index definitions, associated
         with a pipeline.</p>
       </li></ul></div><p> </p>

     <p>When setting up a pipeline, the UIMA Framework's <code class="code">produceResource</code>
     or one of its specialized variants is called, and a new
     ResourceManager is created and used for that pipeline.  However, in many cases, it may be advantageous to
     share the same Resources across multiple pipelines; this can be done by passing a common instance of the
     ResourceManager to the pipeline creation methods (using the additional parameters of the produceResource method).</p>

     <p>
       To handle additional use cases, the ResourceManager has a <code class="code">copy()</code> method which creates a copy of the
       ResourceManager instance.  The new instance is created with a null CAS Manager; if you want to share
       the CAS Pool, you have to set the new instance's CAS Manager to the original one: <code class="code">newRM.setCasManager(originalRM.getCasManager())</code>.
       You may also set the Extension Class Loader in the new instance (PEAR wrappers use this to allow
       PEARs to have their own classpath).  See the Javadocs for details.
     </p>

   </div>

   <div class="section" title="11.3.&nbsp;External Resources support for multiple Parameterized Instances"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ref.resources.external-resource-multiple-parameterized-instances">11.3.&nbsp;External Resources support for multiple Parameterized Instances</h2></div></div></div>

     <p>A typical external resource gets a single instantiation, shared with all users of a particular
     ResourceManager.
     Sometimes, multiple instantiations of the same resource may be useful.  The framework supports this for
     ParameterizedDataResources.  One kind is supplied with UIMA: the fileLanguageResourceSpecifier.
     This works by having each call to <code class="code">getResource(name, extra_keys[])</code> use the extra keys to select a particular
     instance.  On the first call for a particular instance, the named resource derives a data resource from
     the extra keys and initializes a new instance by calling its <code class="code">load</code> method with
     that data resource.
     </p>

     <p>For example, the fileLanguageResourceSpecifier uses the language code to find a resource to load,
       going through a process with many defaults and fallbacks.
     </p>
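     <p>The select-or-create behavior described above can be illustrated with a small plain-Java model.
     This is only a sketch of the idea, not the framework's implementation: the class names
     <code class="code">ParameterizedResourceSketch</code>, <code class="code">Registry</code> and
     <code class="code">LoadedResource</code>, the cache-key format, and the derived <code class="code">.dat</code>
     name are all hypothetical and are not part of the UIMA API. It assumes that an instance is identified by
     the resource name together with its extra keys, and that loading happens once, on first request.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java model of per-key lazy instantiation; none of these names are UIMA API. */
public class ParameterizedResourceSketch {

    /** Hypothetical stand-in for a resource instance that was initialized by a load step. */
    static final class LoadedResource {
        final String source;

        LoadedResource(String source) {
            this.source = source;
        }
    }

    /** Hypothetical registry holding one instance per distinct combination of extra keys. */
    static final class Registry {
        private final Map<String, LoadedResource> instances = new ConcurrentHashMap<>();
        final AtomicInteger loadCount = new AtomicInteger();

        /** Returns the instance selected by the extra keys, creating it on first use. */
        LoadedResource getResource(String name, String[] extraKeys) {
            String cacheKey = name + "|" + String.join("|", extraKeys);
            return instances.computeIfAbsent(cacheKey, k -> load(name, extraKeys));
        }

        /** Assumed load step: derives a made-up data source name from the extra keys. */
        private LoadedResource load(String name, String[] extraKeys) {
            loadCount.incrementAndGet();
            return new LoadedResource(name + "_" + String.join("_", extraKeys) + ".dat");
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        LoadedResource en1 = registry.getResource("Dictionary", new String[] {"en"});
        LoadedResource en2 = registry.getResource("Dictionary", new String[] {"en"});
        LoadedResource de = registry.getResource("Dictionary", new String[] {"de"});

        System.out.println(en1 == en2);               // true: same keys select the same instance
        System.out.println(en1 == de);                // false: different keys select a different instance
        System.out.println(registry.loadCount.get()); // 2: one load per distinct key set
    }
}
```

     <p>In this model, repeated requests with the same extra keys return the instance created on the first
     request, while a different key (here a different language code) triggers one further load.</p>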

   </div>

 </div>
 </div></body></html>