
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
 <!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
 <!ENTITY tp "ugr.ref.xml.component_descriptor.">
 %uimaents;
 ]>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <chapter id="ugr.ref.xml.component_descriptor">
   <title>Component Descriptor Reference</title>

   <para>This chapter is the reference guide for the UIMA SDK&apos;s Component Descriptor XML
     schema. A <emphasis>Component Descriptor</emphasis> (also sometimes called a
     <emphasis>Resource Specifier</emphasis> in the code) is an XML file that either (a)
     completely describes a component, including all information needed to construct the
     component and interact with it, or (b) specifies how to connect to and interact with an
     existing component that has been published as a remote service.
     <emphasis>Component</emphasis> (also called <emphasis>Resource</emphasis>) is a
     general term for modules produced by UIMA developers and used by UIMA applications. The
     types of Components are: Analysis Engines, Collection Readers, CAS
     Initializers<footnote><para>This component is deprecated and should not be used in new
     development.</para></footnote>, CAS Consumers, and Collection Processing Engines.
     However, Collection Processing Engine Descriptors are significantly different in
     format and are covered in a separate chapter, <olink targetdoc="&uima_docs_ref;"
       targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>

   <para><xref linkend="&tp;notation"/> describes the notation used in this
     chapter.</para>

   <para><xref linkend="&tp;imports"/> describes the UIMA SDK&apos;s
     <emphasis>import</emphasis> syntax, which lets XML descriptors import
     information from other XML files so that it can be shared between several XML
     descriptors.</para>

   <para><xref linkend="&tp;aes"/> describes the XML format for <emphasis>Analysis Engine
     Descriptors</emphasis>. These are descriptors that completely describe Analysis
     Engines, including all information needed to construct and interact with them.</para>

   <para><xref linkend="&tp;collection_processing_parts"/> describes the XML format for
     <emphasis>Collection Processing Component Descriptors</emphasis>. This includes
     Collection Reader, CAS Initializer, and CAS Consumer Descriptors.</para>

   <para><xref linkend="&tp;service_client"/> describes the XML format for
     <emphasis>Service Client Descriptors</emphasis>, which specify how to connect to and
     interact with resources deployed as remote services.</para>

    <para><xref linkend="&tp;custom_resource_specifiers"/> describes the XML format for
     <emphasis>Custom Resource Specifiers</emphasis>, which allow you to plug in your
     own Java class as a UIMA Resource.</para>

   <section id="&tp;notation">
     <title>Notation</title>

     <para>This chapter uses an informal notation to specify the syntax of Component
       Descriptors. The formal syntax is defined by an XML schema definition, which is
       contained in the file <literal>resourceSpecifierSchema.xsd</literal>,
       located in the <literal>uima-core.jar</literal> file.</para>

     <para>The notation used in this chapter is:</para>

     <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates
       that the substructure of that element has been omitted (to be described in another
       section of this chapter). An example of this would be:


       <programlisting>&lt;analysisEngineMetaData&gt;
 ...
 &lt;/analysisEngineMetaData&gt;</programlisting>
       An ellipsis immediately after an element indicates that the element type may be
       repeated arbitrarily many times. For example:


       <programlisting>&lt;parameter&gt;[String]&lt;/parameter&gt;
 &lt;parameter&gt;[String]&lt;/parameter&gt;
 ...</programlisting>
       indicates that there may be arbitrarily many parameter elements in this
       context.</para></listitem>

       <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)
         indicate the type of value that may be used at that location.</para></listitem>

       <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates
         alternatives. This can be applied to literal values, bracketed type names, and
         elements.</para></listitem>

       <listitem><para>Which elements are optional and which are required is specified in
         prose, not in the syntax definition. </para></listitem></itemizedlist>
   </section>

   <section id="&tp;imports">
     <title>Imports</title>

     <para>The UIMA SDK defines a particular syntax for XML descriptors to import information
       from other XML files. When one of the following appears in an XML descriptor:


       <programlisting>&lt;import location="[URL]" /&gt; or
 &lt;import name="[Name]" /&gt;</programlisting>
       it indicates that information from a separate XML file is being imported. Note that
       imports are allowed only in certain places in the descriptor. The remainder of this
       chapter indicates the points at which imports are allowed.</para>

     <para>If an import specifies a <literal>location</literal> attribute, the value of
       that attribute specifies the URL at which the XML file to import will be found. This can be
       a relative URL, which will be resolved relative to the descriptor containing the
       <literal>import</literal> element, or an absolute URL. Relative URLs can be written
       without a protocol/scheme (e.g., <quote>file:</quote>), and without a host machine
       name. In this case the relative URL might look something like
       <literal>org/apache/myproj/MyTypeSystem.xml</literal>.</para>

     <para>An absolute URL is written with one of the following prefixes, followed by a path
       such as <literal>org/apache/myproj/MyTypeSystem.xml</literal>:

       <itemizedlist spacing="compact"><listitem><para>file:/ &larr; has no network
         address</para></listitem>
         <listitem><para>file:/// &larr; has an empty network address</para></listitem>
         <listitem><para>file://some.network.address/</para></listitem>
         </itemizedlist></para>

     <para>For more information about URLs, please read the javadoc information for the Java
       class <quote>URL</quote>.</para>

     <para>If an import specifies a <literal>name</literal> attribute, the value of that
       attribute should take the form of a Java-style dotted name (e.g.
       <literal>org.apache.myproj.MyTypeSystem</literal>). An .xml file with this name
       will be searched for in the classpath or datapath (described below). As in Java, the dots
       in the name will be converted to file path separators. So an import specifying the
       example name in this paragraph will result in a search for
       <literal>org/apache/myproj/MyTypeSystem.xml</literal> in the classpath or
       datapath.</para>

     <para id="&tp;datapath">The datapath works similarly to the classpath but can be set programmatically
       through the resource manager API. Application developers can specify a datapath
       during initialization, using the following code:


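       <programlisting>
 // desc is the ResourceSpecifier parsed from the descriptor.
 // setDataPath may throw java.net.MalformedURLException.
 ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
 resMgr.setDataPath(yourPathString);
 AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc, resMgr, null);
 </programlisting></para>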

     <para>The default datapath for the entire JVM can be set via the
       <literal>uima.datapath</literal> Java system property, but this feature should
       only be used for standalone applications that don&apos;t need to run in the same JVM as
       other code that may need a different datapath.</para>
     <para>Previous versions of UIMA also supported XInclude. That support didn't work in
       many situations, and it is no longer supported. To include other files, please use
       &lt;import&gt;.</para>
     <!--
     <para>The UIMA SDK also supports XInclude, a W3C candidate recommendation,
     to include XML files within other XML files.  However, it is recommended that the import syntax be used instead, as it
     is more flexible and better supports tool developers.</para>

     <note><para>UIMA tools for editing XML
     descriptors do not support the use of xi:include because they cannot correctly
     determine what parts of a descriptor are updatable, and what parts are included
     from other files.  They do support the
     use of &lt;import&gt;.
     </para></note>

     <para>To use XInclude, you first must include the XInclude
     namespace in your document&apos;s root element, e.g.:</para>

     <programlisting>&lt;analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier" xmlns:xi="http://www.w3.org/2001/XInclude"&gt;</programlisting>

     <para>Then, you can include a file using the syntax <literal>&lt;xi:include
     href="[URL]"/&gt;</literal></para>

     <para>where [URL] can be any relative or absolute URL referring
     to another XML document.  The referred-to
     document must be a valid XML document, meaning that it must consist of exactly
     one root element and must define all of the namespace prefixes that it uses.  The default namespace (generally <literal>http://uima.apache.org/resourceSpecifier</literal>) will be
     inherited from the parent document.   When UIMA parses the XML document, it will automatically replace the <literal>&lt;xi:include&gt; </literal>element with the entire XML document
     referred to by the href.  For more
     information on XInclude see
     <a href="http://www.w3.org/TR/xinclude/">http://www.w3.org/TR/xinclude/</a>.</para>
     -->

   </section>

   <section id="&tp;type_system">
     <title>Type System Descriptors</title>

     <para>A Type System Descriptor is used to define the types and features that can be
       represented in the CAS. A Type System Descriptor can be imported into an Analysis Engine
       or Collection Processing Component Descriptor.</para>

     <para>The basic structure of a Type System Descriptor is as follows:


       <programlisting><![CDATA[<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">

   <name> [String] </name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <imports>
     <import ...>
     ...
   </imports>

   <types>
     <typeDescription>
       ...
     </typeDescription>

     ...

   </types>

 </typeSystemDescription>]]></programlisting></para>

     <para>All of the subelements are optional.</para>

     <section id="&tp;type_system.imports">
       <title>Imports</title>

       <para>The <literal>imports</literal> section allows this descriptor to import
         types from other type system descriptors. The import syntax is described in <xref
           linkend="&tp;imports"/>. A type system may import any number of other type
         systems and then define additional types which refer to imported types. Circular
         imports are allowed.</para>
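
       <para>As an illustrative sketch, an <literal>imports</literal> section that uses
         both import forms might look like the following. The file name
         <literal>CommonTypes.xml</literal> is hypothetical; the dotted name is the example
         name used earlier in this chapter.

         <programlisting><![CDATA[]]></programlisting></para>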
     </section>

     <section id="&tp;type_system.types">
       <title>Types</title>

       <para>The <literal>types</literal> element contains zero or more
         <literal>typeDescription</literal> elements. Each
         <literal>typeDescription</literal> has the form:


         <programlisting><![CDATA[<typeDescription>
   <name>[TypeName]</name>
   <description>[String]</description>
   <supertypeName>[TypeName]</supertypeName>
   <features>
     ...
   </features>
 </typeDescription>]]></programlisting></para>

       <para>The name element contains the name of the type. A
         <literal>[TypeName]</literal> is a dot-separated list of names, where each name
         consists of a letter followed by any number of letters, digits, or underscores.
         <literal>TypeNames</literal> are case sensitive. Letter and digit are as defined
         by Java; therefore, any Unicode letter or digit may be used (subject to the character
         encoding defined by the descriptor file&apos;s XML header). The name following the
         final dot is considered to be the <quote>short name</quote> of the type; the
         preceding portion is the namespace (analogous to the package.class syntax used in
         Java). Namespaces beginning with uima are reserved and should not be used. Examples
         of valid type names are:</para>

       <itemizedlist spacing="compact"><listitem><para>test.TokenAnnotation</para>
         </listitem>

         <listitem><para>org.myorg.TokenAnnotation</para></listitem>

         <listitem><para>com.my_company.proj123.TokenAnnotation </para></listitem>
         </itemizedlist>

       <para>These would all be considered distinct types since they have different
         namespaces. Best practice here is to follow the normal Java naming conventions of
         having namespaces be all lowercase, with the short type names having an initial
         capital, but this is not mandated, so <literal>ABC.mYtyPE</literal> is an allowed
         type name. Type names without namespaces (e.g.
         <literal>TokenAnnotation</literal> alone) are allowed but discouraged, because
         naming conflicts can then result when combining annotators that use different
         type systems.</para>

       <para>The <literal>description</literal> element contains a textual description
         of the type. The <literal>supertypeName</literal> element contains the name of the
         type from which it inherits (this can be set to the name of another user-defined type,
         or it may be set to any built-in type which may be subclassed, such as
         <literal>uima.tcas.Annotation</literal> for a new annotation
         type or <literal>uima.cas.TOP</literal> for a new type that is not
         an annotation). All three of these elements are required.</para>
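
       <para>For illustration, a minimal <literal>typeDescription</literal> for a new
         annotation type could be written as shown here. The type name reuses one of the
         example names above, and the description text is made up for this sketch. The
         <literal>features</literal> element is left out because this type introduces no
         new features.

         <programlisting><![CDATA[<typeDescription>
   <name>org.myorg.TokenAnnotation</name>
   <description>A single token in the document text.</description>
   <supertypeName>uima.tcas.Annotation</supertypeName>
 </typeDescription>]]></programlisting></para>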

     </section>

     <section id="&tp;type_system.features">
       <title>Features</title>

       <para>The <literal>features</literal> element of a
         <literal>typeDescription</literal> is required only if the type we are specifying
         introduces new features. If the <literal>features</literal> element is present,
         it contains zero or more <literal>featureDescription</literal> elements, each of
         which has the form:</para>


       <programlisting><![CDATA[<featureDescription>
   <name>[Name]</name>
   <description>[String]</description>
   <rangeTypeName>[Name]</rangeTypeName>
   <elementType>[Name]</elementType>
   <multipleReferencesAllowed>true|false</multipleReferencesAllowed>
 </featureDescription>]]></programlisting>

       <para>A feature&apos;s name follows the same rules as a type short name &ndash; a letter
         followed by any number of letters, digits, or underscores. Feature names are case
         sensitive.</para>

       <para>The feature&apos;s <literal>rangeTypeName</literal> specifies the type of
         value that the feature can take. This may be the name of any type defined in your type
         system, or one of the predefined types. All of the predefined types have names that are
         prefixed with <literal>uima.cas</literal> or <literal>uima.tcas</literal>,
         for example:


         <programlisting>uima.cas.TOP
 uima.cas.String
 uima.cas.Long
 uima.cas.FSArray
 uima.cas.StringList
 uima.tcas.Annotation</programlisting>
         For a complete list of predefined types, see the CAS API documentation.</para>

       <para>The <literal>elementType</literal> of a feature is optional, and applies only
         when the <literal>rangeTypeName</literal> is
         <literal>uima.cas.FSArray</literal> or <literal>uima.cas.FSList</literal>.
         The <literal>elementType</literal> specifies what type of value can be assigned as
         an element of the array or list. This must be the name of a non-primitive type. If
         omitted, it defaults to <literal>uima.cas.TOP</literal>, meaning that any
         FeatureStructure can be assigned as an element of the array or list. Note: depending on
         the CAS Interface that you use in your code, this constraint may or may not be
         enforced.
         Note: At run time, the elementType is available from a runtime Feature object
         (using the <literal>a_feature_object.getRange().getComponentType()</literal> method)
         only when specified for <literal>uima.cas.FSArray</literal> ranges; it isn't
         available for <literal>uima.cas.FSList</literal> ranges.
         </para>


       <para>The <literal>multipleReferencesAllowed</literal> element is optional, and
         applies only when the <literal>rangeTypeName</literal> is an array or list type (it
         applies to arrays and lists of primitive as well as non-primitive types). Setting
         this to false (the default) indicates that this feature has exclusive ownership of
         the array or list, so changes to the array or list are localized. Setting this to true
         indicates that the array or list may be shared, so changes to it may affect other
         objects in the CAS. Note: there is currently no guarantee that the framework will
         enforce this restriction. However, this setting may affect how the CAS is
         serialized.</para>
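
       <para>The following sketch shows how these elements might fit together in a
         <literal>features</literal> element. The feature names
         (<literal>partOfSpeech</literal>, <literal>tokens</literal>), their descriptions,
         and the element type <literal>org.myorg.TokenAnnotation</literal> are hypothetical
         and chosen only for illustration.

         <programlisting><![CDATA[<features>
   <featureDescription>
     <name>partOfSpeech</name>
     <description>The part of speech of the token.</description>
     <rangeTypeName>uima.cas.String</rangeTypeName>
   </featureDescription>
   <featureDescription>
     <name>tokens</name>
     <description>The tokens covered by this annotation.</description>
     <rangeTypeName>uima.cas.FSArray</rangeTypeName>
     <!-- elementType applies because the range is an FSArray -->
     <elementType>org.myorg.TokenAnnotation</elementType>
     <!-- false is the default: this feature owns the array exclusively -->
     <multipleReferencesAllowed>false</multipleReferencesAllowed>
   </featureDescription>
 </features>]]></programlisting></para>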

     </section>

     <section id="&tp;type_system.string_subtypes">
       <title>String Subtypes</title>

       <para>There is one other special type that you can declare &ndash; a subset of the String
         type that specifies a restricted set of allowed values. This is useful for features
         that can have only certain String values, such as parts of speech. Here is an example of
         how to declare such a type:</para>


       <programlisting><![CDATA[<typeDescription>
   <name>PartOfSpeech</name>
   <description>A part of speech.</description>
   <supertypeName>uima.cas.String</supertypeName>
   <allowedValues>
     <value>
       <string>NN</string>
       <description>Noun, singular or mass.</description>
     </value>
     <value>
       <string>NNS</string>
       <description>Noun, plural.</description>
     </value>
     <value>
       <string>VB</string>
       <description>Verb, base form.</description>
     </value>
     ...
   </allowedValues>
 </typeDescription>]]></programlisting>
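
       <para>A feature can then name such a type as its range, in the same way as any other
         type defined in the type system. The feature name <literal>pos</literal> below is
         hypothetical, and the fragment is only a sketch of how the
         <literal>PartOfSpeech</literal> type declared above might be used.

         <programlisting><![CDATA[<featureDescription>
   <name>pos</name>
   <description>The part of speech, restricted to the allowed values.</description>
   <rangeTypeName>PartOfSpeech</rangeTypeName>
 </featureDescription>]]></programlisting></para>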

     </section>
   </section>

   <section id="&tp;aes">
     <title>Analysis Engine Descriptors</title>

     <para>Analysis Engine (AE) descriptors completely describe Analysis Engines. There
       are two basic types of Analysis Engines &ndash; <emphasis>Primitive</emphasis> and
       <emphasis>Aggregate</emphasis>. A <emphasis>Primitive</emphasis> Analysis
       Engine is a container for a single <emphasis>annotator</emphasis>, whereas an
       <emphasis>Aggregate</emphasis> Analysis Engine is composed of a collection of other
       Analysis Engines. (For more information on this and other terminology, see <olink
         targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.conceptual"/>).</para>

     <para>Both Primitive and Aggregate Analysis Engines have descriptors, and the two types
       of descriptors have some similarities and some differences. <xref linkend="&tp;aes.primitive"/>
       discusses Primitive Analysis Engine descriptors.  <xref linkend="&tp;aes.aggregate"/> then
       describes how Aggregate Analysis Engine descriptors are different.</para>

     <section id="&tp;aes.primitive">
       <title>Primitive Analysis Engine Descriptors</title>

       <section id="&tp;aes.primitive.basic">
         <title>Basic Structure</title>


         <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <analysisEngineDescription
         xmlns="http://uima.apache.org/resourceSpecifier">
   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

   <primitive>true</primitive>
   <annotatorImplementationName> [String] </annotatorImplementationName>

   <analysisEngineMetaData>
     ...
   </analysisEngineMetaData>

   <externalResourceDependencies>
     ...
   </externalResourceDependencies>

   <resourceManagerConfiguration>
     ...
   </resourceManagerConfiguration>

 </analysisEngineDescription>]]></programlisting>

         <para>The document begins with a standard XML header. The recommended root tag is
           <literal>&lt;analysisEngineDescription&gt;</literal>, although
           <literal>&lt;taeDescription&gt;</literal> is also allowed for backwards
           compatibility.</para>

         <para>Within the root element we declare that we are using the XML namespace
           <literal>http://uima.apache.org/resourceSpecifier</literal>. This
           namespace is required; without it, the descriptor cannot be validated for
           errors.</para>

         <para>The first subelement,
           <literal>&lt;frameworkImplementation&gt;</literal>, currently must have
           the value <literal>org.apache.uima.java</literal> or
           <literal>org.apache.uima.cpp</literal>. In future versions, there may be
           other framework implementations, or perhaps implementations produced by other
           vendors.</para>

         <para>The second subelement, <literal>&lt;primitive&gt;</literal>, contains
           the Boolean value <literal>true</literal>, indicating that this XML document
           describes a <emphasis>Primitive</emphasis> Analysis Engine.</para>

         <para>The next subelement,
           <literal>&lt;annotatorImplementationName&gt;</literal>, is how the UIMA framework
           determines which annotator class to use. This should contain a fully-qualified
           Java class name for Java implementations, or the name of a .dll or .so file for C++
           implementations.</para>

         <para>The <literal>&lt;analysisEngineMetaData&gt;</literal> object contains
           descriptive information about the analysis engine and what it does. It is
           described in <xref linkend="&tp;aes.metadata"/>.</para>

         <para>The <literal>&lt;externalResourceDependencies&gt;</literal> and
           <literal>&lt;resourceManagerConfiguration&gt;</literal> elements declare
           the external resource files that the analysis engine relies
           upon. They are optional and are described in <xref
             linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
             linkend="&tp;aes.primitive.resource_manager_configuration"/>.</para>
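
         <para>Putting these pieces together, a small primitive descriptor might look like
           the sketch below. The annotator class name
           <literal>org.myorg.TokenAnnotator</literal> and the engine name are hypothetical.
           The optional resource elements are omitted, and the body of
           <literal>capabilities</literal> is elided using this chapter&apos;s ellipsis
           notation because its structure is described in another section.

           <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <analysisEngineDescription
         xmlns="http://uima.apache.org/resourceSpecifier">
   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

   <primitive>true</primitive>
   <!-- hypothetical fully-qualified Java class name of the annotator -->
   <annotatorImplementationName>org.myorg.TokenAnnotator</annotatorImplementationName>

   <analysisEngineMetaData>
     <name>Example Token Annotator</name>
     <capabilities>
       ...
     </capabilities>
   </analysisEngineMetaData>

 </analysisEngineDescription>]]></programlisting></para>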

         </section>

         <section id="&tp;aes.metadata">
           <title>Analysis Engine MetaData</title>


           <programlisting><![CDATA[<analysisEngineMetaData>
   <name> [String] </name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <configurationParameters> ...  </configurationParameters>

   <configurationParameterSettings>
     ...
   </configurationParameterSettings>

   <typeSystemDescription> ... </typeSystemDescription>

   <typePriorities> ... </typePriorities>

   <fsIndexCollection> ... </fsIndexCollection>

   <capabilities> ... </capabilities>

   <operationalProperties> ... </operationalProperties>

 </analysisEngineMetaData>]]></programlisting>

           <para>The <literal>analysisEngineMetaData</literal> element contains four
             simple string fields &ndash; <literal>name</literal>,
             <literal>description</literal>, <literal>version</literal>, and
             <literal>vendor</literal>. Only the <literal>name</literal> field is
             required, but providing values for the other fields is recommended. The
             <literal>name</literal> field is just a descriptive name meant to be read by
             users; it does not need to be unique across all Analysis Engines.</para>

           <para>The other sub-elements &ndash;
             <literal>configurationParameters</literal>,
             <literal>configurationParameterSettings</literal>,
             <literal>typeSystemDescription</literal>,
             <literal>typePriorities</literal>, <literal>fsIndexCollection</literal>,
             <literal>capabilities</literal> and
             <literal>operationalProperties</literal> are described in the following
             sections. The only one of these that is required is
             <literal>capabilities</literal>; the others are optional.</para>

         </section>

         <section id="&tp;aes.configuration_parameter_declaration">
           <title>Configuration Parameter Declaration</title>

           <para>Configuration Parameters are made available to annotator
             implementations and applications by the following interfaces:
             <literal>AnnotatorContext</literal> <footnote><para>Deprecated; use
             UimaContext instead.</para></footnote> (passed as an argument to the
             <literal>initialize()</literal> method of a version 1 annotator),
             <literal>ConfigurableResource</literal> (every Analysis Engine
             implements this interface), and <literal>UimaContext</literal> (passed
             as an argument to the <literal>initialize()</literal> method of a version 2
             annotator; you can also get it from any resource, including Analysis Engines,
             using the method <literal>getUimaContext()</literal>).</para>

           <para>Use AnnotatorContext within version 1 annotators and UimaContext for
             version 2 annotators and outside of annotators (for instance, in CasConsumers,
             or the containing application) to access configuration parameters.</para>

           <para>Configuration parameters are set from the corresponding elements in the
             XML descriptor for the application. If you need to programmatically change
             parameter settings within an application, you can use methods in
             ConfigurableResource; if you do this, you need to call reconfigure()
             afterwards to have the UIMA framework notify all the contained analysis
             components that the parameter configuration has changed (the analysis
             engine&apos;s reinitialize() methods will be called). Note that in the current
             implementation, only integrated deployment components have configuration
             parameters passed to them; remote components obtain their parameters from
             their remote startup environment. This will likely change in the
             future.</para>

           <para>There are two ways to specify the
             <literal>&lt;configurationParameters&gt;</literal> section &ndash; as a
             list of configuration parameters or a list of groups. A list of parameters, which
             are not part of any group, looks like this:


             <programlisting><![CDATA[<configurationParameters>
   <configurationParameter>
     <name>[String]</name>
     <description>[String]</description>
     <type>String|Integer|Float|Boolean</type>
     <multiValued>true|false</multiValued>
     <mandatory>true|false</mandatory>
     <overrides>
       <parameter>[String]</parameter>
       <parameter>[String]</parameter>
         ...
     </overrides>
   </configurationParameter>
   <configurationParameter>
     ...
   </configurationParameter>
     ...
 </configurationParameters>]]></programlisting></para>

           <para>For each configuration parameter, the following are specified:</para>

           <itemizedlist><listitem><para><emphasis role="bold">name</emphasis>
             &ndash; the name by which the annotator code refers to the parameter
             (required). All parameters declared in an analysis engine descriptor must
             have distinct names. The name is composed of normal Java identifier
             characters.</para>
             </listitem>

             <listitem><para><emphasis role="bold">description</emphasis> &ndash; a
               natural language description of the intent of the parameter
               (optional)</para></listitem>

             <listitem><para><emphasis role="bold">type</emphasis> &ndash; the data
               type of the parameter&apos;s value &ndash; must be one of
               <literal>String</literal>, <literal>Integer</literal>,
               <literal>Float</literal>, or <literal>Boolean</literal>
               (required).</para></listitem>

             <listitem><para><emphasis role="bold">multiValued</emphasis> &ndash;
               <literal>true</literal> if the parameter can take multiple values (an
               array), <literal>false</literal> if the parameter takes only a single value
               (optional, defaults to false).</para></listitem>

             <listitem><para><emphasis role="bold">mandatory</emphasis> &ndash;
               <literal>true</literal> if a value must be provided for the parameter
               (optional, defaults to false).</para></listitem>

             <listitem><para><emphasis role="bold">overrides</emphasis> &ndash; this
               is used only in aggregate Analysis Engines, but is included here for
               completeness. See <xref
                 linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>
               for a discussion of configuration parameter overriding in aggregate
               Analysis Engines. (optional) </para></listitem></itemizedlist>
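
           <para>As a concrete sketch of the ungrouped form, the declaration below defines
             one mandatory single-valued parameter and one optional multi-valued parameter.
             The parameter names (<literal>DateFormat</literal>,
             <literal>Keywords</literal>) and their descriptions are hypothetical.

             <programlisting><![CDATA[<configurationParameters>
   <configurationParameter>
     <name>DateFormat</name>
     <description>Pattern used to recognize dates.</description>
     <type>String</type>
     <multiValued>false</multiValued>
     <mandatory>true</mandatory>
   </configurationParameter>
   <configurationParameter>
     <name>Keywords</name>
     <description>Words that the annotator should look for.</description>
     <type>String</type>
     <!-- multi-valued: the annotator receives an array of Strings -->
     <multiValued>true</multiValued>
     <mandatory>false</mandatory>
   </configurationParameter>
 </configurationParameters>]]></programlisting></para>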

           <para>A list of groups looks like this:


             <programlisting><![CDATA[<configurationParameters defaultGroup="[String]"
     searchStrategy="none|default_fallback|language_fallback" >

   <commonParameters>
     [zero or more parameters]
   </commonParameters>

   <configurationGroup names="name1 name2 name3 ...">
     [zero or more parameters]
   </configurationGroup>

   <configurationGroup names="name4 name5 ...">
     [zero or more parameters]
   </configurationGroup>

   ...

 </configurationParameters>]]></programlisting></para>

           <para>Both the <literal>&lt;commonParameters&gt;</literal> and
             <literal>&lt;configurationGroup&gt;</literal> elements contain zero or
             more <literal>&lt;configurationParameter&gt;</literal> elements, with
             the same syntax described above.</para>

           <para>The <literal>&lt;commonParameters&gt;</literal> element declares
             parameters that exist in all groups. Each
             <literal>&lt;configurationGroup&gt;</literal> element has a names
             attribute, which contains a list of group names separated by whitespace (space
             or tab characters). Names consist of any number of non-whitespace characters;
             however the Component Descriptor Editor tool restricts this to be normal Java
             identifiers, including the period (.) and the dash (-). One configuration group
             will be created for each name, and all of the groups will contain the same set of
             parameters.</para>

           <para>The <literal>defaultGroup</literal> attribute specifies the name of the
             group to be used in the case where an annotator does a lookup for a configuration
             parameter without specifying a group name. It may also be used as a fallback if the
             annotator specifies a group that does not exist &ndash; see below.</para>

           <para>The <literal>searchStrategy</literal> attribute determines the action
             to be taken when the context is queried for the value of a parameter belonging to a
             particular configuration group, if that group does not exist or does not contain
             a value for the requested parameter. There are currently three possible values:

             <itemizedlist><listitem><para><emphasis role="bold">none</emphasis>
               &ndash; there is no fallback; return null if there is no value in the exact group
               specified by the user.</para></listitem>

               <listitem><para><emphasis role="bold">default_fallback</emphasis>
                 &ndash; if there is no value found in the specified group, look in the default
                 group (as defined by the <literal>defaultGroup</literal> attribute).</para>
                 </listitem>

               <listitem><para><emphasis role="bold">language_fallback</emphasis>
                 &ndash; this setting allows for a specific use of configuration parameter
                 groups where the group names correspond to ISO language and country codes
                 (for an example, see below). The fallback sequence is:
                 <literal>&lt;lang&gt;_&lt;country&gt;_&lt;region&gt; &rarr;
                 &lt;lang&gt;_&lt;country&gt; &rarr; &lt;lang&gt; &rarr;
                 &lt;default&gt;.</literal> </para></listitem></itemizedlist>
             </para>
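           <para>As an illustration of <literal>default_fallback</literal>, here is a minimal
             sketch. The group names <literal>standard</literal> and <literal>strict</literal>
             and the parameter name <literal>Threshold</literal> are hypothetical and are not
             part of UIMA:

             <programlisting><![CDATA[<configurationParameters defaultGroup="standard"
     searchStrategy="default_fallback">

   <commonParameters>
     <configurationParameter>
       <name>Threshold</name>
       <description>Hypothetical cutoff value</description>
       <type>Float</type>
       <multiValued>false</multiValued>
       <mandatory>false</mandatory>
     </configurationParameter>
   </commonParameters>

   <configurationGroup names="standard strict"/>

 </configurationParameters>]]></programlisting>
             With this declaration, a lookup of <literal>Threshold</literal> in the
             <literal>strict</literal> group that finds no value there would fall back to
             the value set for the <literal>standard</literal> group.</para>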

           <section id="&tp;aes.configuration_parameter_declaration.example">
             <title>Example</title>


             <programlisting><![CDATA[<configurationParameters defaultGroup="en"
         searchStrategy="language_fallback">

   <commonParameters>
     <configurationParameter>
       <name>DictionaryFile</name>
       <description>Location of dictionary for this
            language</description>
       <type>String</type>
       <multiValued>false</multiValued>
       <mandatory>false</mandatory>
     </configurationParameter>
   </commonParameters>

   <configurationGroup names="en de en-US"/>

   <configurationGroup names="zh">
     <configurationParameter>
       <name>DBC_Strategy</name>
       <description>Strategy for dealing with double-byte
           characters.</description>
       <type>String</type>
       <multiValued>false</multiValued>
       <mandatory>false</mandatory>
     </configurationParameter>
   </configurationGroup>

 </configurationParameters>]]></programlisting>

             <para>In this example, we are declaring a <literal>DictionaryFile</literal>
               parameter that can have a different value for each of the languages that our AE
               supports
               &ndash; English (general), German, U.S. English, and Chinese. For Chinese
               only, we also declare a <literal>DBC_Strategy</literal>
               parameter.</para>

             <para>We are using the <literal>language_fallback</literal> search
               strategy, so if an annotator requests the dictionary file for the
               <literal>en-GB</literal> (British English) group, we will fall back to the
               more general <literal>en</literal> group.</para>

             <para>Since we have defined <literal>en</literal> as the default group, this
               value will be returned if the context is queried for the
               <literal>DictionaryFile</literal> parameter without specifying any
               group name, or if a nonexistent group name is specified.</para>
           </section>
         </section>

         <section id="&tp;aes.configuration_parameter_settings">
           <title>Configuration Parameter Settings</title>

           <para>If no configuration groups were declared, the
             <literal>&lt;configurationParameterSettings&gt;</literal> element
             looks like this:


             <programlisting><![CDATA[<configurationParameterSettings>
   <nameValuePair>
     <name>[String]</name>
     <value>
       <string>[String]</string>  |
       <integer>[Integer]</integer> |
       <float>[Float]</float> |
       <boolean>true|false</boolean>  |
       <array> ... </array>
     </value>
   </nameValuePair>

   <nameValuePair>
     ...
   </nameValuePair>
   ...
 </configurationParameterSettings>]]></programlisting></para>

           <para>There are zero or more <literal>nameValuePair</literal> elements. Each
             <literal>nameValuePair</literal> contains a name (which refers to one of the
             configuration parameters) and a value for that parameter.</para>

           <para>The <literal>value</literal> element contains an element that matches
             the type of the parameter. For single-valued parameters, this is either
             <literal>&lt;string&gt;</literal>, <literal>&lt;integer&gt;</literal>
             , <literal>&lt;float&gt;</literal>, or
             <literal>&lt;boolean&gt;</literal>. For multi-valued parameters, this is
             an <literal>&lt;array&gt;</literal> element, which then contains zero or
             more instances of the appropriate type of primitive value, e.g.:


             <programlisting>&lt;array&gt;&lt;string&gt;One&lt;/string&gt;&lt;string&gt;Two&lt;/string&gt;&lt;/array&gt;</programlisting></para>
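           <para>Putting these pieces together, a settings element without groups might look
             like the following sketch. The parameter names <literal>MaxTokens</literal>,
             <literal>CaseSensitive</literal>, and <literal>StopWords</literal> are
             hypothetical; each is assumed to match a declared parameter of the corresponding
             type (Integer, Boolean, and multi-valued String, respectively):

             <programlisting><![CDATA[<configurationParameterSettings>
   <nameValuePair>
     <name>MaxTokens</name>
     <value><integer>500</integer></value>
   </nameValuePair>

   <nameValuePair>
     <name>CaseSensitive</name>
     <value><boolean>false</boolean></value>
   </nameValuePair>

   <nameValuePair>
     <name>StopWords</name>
     <value>
       <array><string>the</string><string>of</string></array>
     </value>
   </nameValuePair>
 </configurationParameterSettings>]]></programlisting></para>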

           <para>If configuration groups were declared, then the
             <literal>&lt;configurationParameterSettings&gt;</literal> element
             looks like this:


             <programlisting><![CDATA[<configurationParameterSettings>

   <settingsForGroup name="[String]">
     [one or more <nameValuePair> elements]
   </settingsForGroup>

   <settingsForGroup name="[String]">
     [one or more <nameValuePair> elements]
   </settingsForGroup>

 ...

 </configurationParameterSettings>]]></programlisting>
             where each <literal>&lt;settingsForGroup&gt;</literal> element has a name
             that matches one of the configuration groups declared under the
             <literal>&lt;configurationParameters&gt;</literal> element and contains
             the parameter settings for that group.</para>

           <section id="&tp;aes.configuration_parameter_settings.example">
             <title>Example</title>

             <para>Here are the settings that correspond to the parameter declarations in
               the previous example:


               <programlisting><![CDATA[<configurationParameterSettings>

   <settingsForGroup name="en">
     <nameValuePair>
       <name>DictionaryFile</name>
       <value><string>resourcesEnglishdictionary.dat</string></value>
     </nameValuePair>
   </settingsForGroup>

   <settingsForGroup name="en-US">
     <nameValuePair>
       <name>DictionaryFile</name>
       <value><string>resourcesEnglish_USdictionary.dat</string></value>
     </nameValuePair>
   </settingsForGroup>

   <settingsForGroup name="de">
     <nameValuePair>
       <name>DictionaryFile</name>
       <value><string>resourcesDeutschdictionary.dat</string></value>
     </nameValuePair>
   </settingsForGroup>

   <settingsForGroup name="zh">
     <nameValuePair>
       <name>DictionaryFile</name>
       <value><string>resourcesChinesedictionary.dat</string></value>
     </nameValuePair>

     <nameValuePair>
       <name>DBC_Strategy</name>
       <value><string>default</string></value>
     </nameValuePair>

   </settingsForGroup>

 </configurationParameterSettings>]]></programlisting></para>
           </section>
           </section>

           <section id="&tp;aes.type_system">
             <title>Type System Definition</title>


             <programlisting><![CDATA[<typeSystemDescription>

   <name> [String] </name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <imports>
     <import ...>
     ...
   </imports>

   <types>
     <typeDescription>
       ...
     </typeDescription>

     ...

   </types>

 </typeSystemDescription>]]></programlisting>

             <para>A <literal>typeSystemDescription</literal> element defines a type
               system for an Analysis Engine. The syntax for the element is described in <xref
                 linkend="&tp;type_system"/>.</para>

             <para>The recommended usage is to <literal>import</literal> an external type
               system, using the import syntax described in <xref linkend="&tp;imports"/>
               of this chapter. For example:


               <programlisting>&lt;typeSystemDescription&gt;
   &lt;imports&gt;
     &lt;import location="MySharedTypeSystem.xml"/&gt;
   &lt;/imports&gt;
 &lt;/typeSystemDescription&gt;</programlisting></para>

             <para>This allows several AEs to share a single type system definition. The file
               <literal>MySharedTypeSystem.xml</literal> would then contain the full
               type system information, including the <literal>name</literal>,
               <literal>description</literal>, <literal>vendor</literal>,
               <literal>version</literal>, and <literal>types</literal>.</para>
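             <para>As a rough sketch, the shared file might look like the following. The
               type <literal>org.myorg.TokenAnnotation</literal>, its
               <literal>lemma</literal> feature, and the name, version, and vendor values
               are illustrative assumptions only:

               <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
 <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
   <name>MySharedTypeSystem</name>
   <description>Types shared by several AEs (illustrative)</description>
   <version>1.0</version>
   <vendor>org.myorg</vendor>

   <types>
     <typeDescription>
       <name>org.myorg.TokenAnnotation</name>
       <description>A single token</description>
       <supertypeName>uima.tcas.Annotation</supertypeName>
       <features>
         <featureDescription>
           <name>lemma</name>
           <description>Base form of the token</description>
           <rangeTypeName>uima.cas.String</rangeTypeName>
         </featureDescription>
       </features>
     </typeDescription>
   </types>
 </typeSystemDescription>]]></programlisting></para>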

           </section>
           <section id="&tp;aes.type_priority">
             <title>Type Priority Definition</title>


             <programlisting><![CDATA[<typePriorities>
   <name> [String] </name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <imports>
     <import ...>
     ...
   </imports>

   <priorityLists>
     <priorityList>
       <type>[TypeName]</type>
       <type>[TypeName]</type>
         ...
     </priorityList>

     ...

   </priorityLists>
 </typePriorities>]]></programlisting>

             <para>The <literal>&lt;typePriorities&gt;</literal> element contains
               zero or more <literal>&lt;priorityList&gt;</literal> elements; each
               <literal>&lt;priorityList&gt;</literal> contains zero or more types.
               Like a type system, a type priorities definition may also declare a name,
               description, version, and vendor, and may import other type priorities. See
                 <xref linkend="&tp;imports"/> for the import syntax.</para>

             <para>Type priority is used when iterating over feature structures in the CAS.
               For example, if the CAS contains a <literal>Sentence</literal> annotation
               and a <literal>Paragraph</literal> annotation with the same span of text
               (i.e. a one-sentence paragraph), which annotation should be returned first
               by an iterator? Probably the Paragraph, since it is conceptually
               <quote>bigger,</quote> but the framework does not know that and must be
               explicitly told that the Paragraph annotation has priority over the Sentence
               annotation, like this:


               <programlisting>&lt;typePriorities&gt;
   &lt;priorityList&gt;
     &lt;type&gt;org.myorg.Paragraph&lt;/type&gt;
     &lt;type&gt;org.myorg.Sentence&lt;/type&gt;
   &lt;/priorityList&gt;
 &lt;/typePriorities&gt;</programlisting></para>

             <para>All of the <literal>&lt;priorityList&gt;</literal> elements defined
               in the descriptor (and in all component descriptors of an aggregate analysis
               engine descriptor) are merged to produce a single priority list.</para>

             <para>Subtypes of types specified here are also ordered, unless overridden by
               another user-specified type ordering. For example, if you specify type A
               comes before type B, then subtypes of A will come before subtypes of B, unless
               there is an overriding specification which declares some subtype of B comes
               before some subtype of A.</para>

             <para>If there are inconsistencies between priority lists (type A declared
               before type B in one priority list, and type B declared before type A in
               another), the framework will throw an exception.</para>
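             <para>For illustration, the following sketch reuses the type names from the
               example above and shows two lists that contradict each other. Merging them
               would be expected to fail, because the first list places
               <literal>Paragraph</literal> before <literal>Sentence</literal> and the
               second reverses that order:

               <programlisting><![CDATA[<typePriorities>
   <priorityList>
     <type>org.myorg.Paragraph</type>
     <type>org.myorg.Sentence</type>
   </priorityList>

   <!-- Conflicts with the list above: Sentence is placed before Paragraph -->
   <priorityList>
     <type>org.myorg.Sentence</type>
     <type>org.myorg.Paragraph</type>
   </priorityList>
 </typePriorities>]]></programlisting></para>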

             <para>User-defined indexes may declare whether or not they use the type
               priority; see the next section.</para>
           </section>

           <section id="&tp;aes.index">
             <title>Index Definition</title>


             <programlisting><![CDATA[<fsIndexCollection>

   <name>[String]</name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <imports>
     <import ...>
     ...
   </imports>

   <fsIndexes>

     <fsIndexDescription>
       ...
     </fsIndexDescription>

     <fsIndexDescription>
       ...
     </fsIndexDescription>

   </fsIndexes>

 </fsIndexCollection>]]></programlisting>

             <para>The <literal>fsIndexCollection</literal> element declares <emphasis>Feature Structure
               Indexes</emphasis>, each of which defines an index that holds feature structures of a given type.
               Information in the CAS is always accessed through an index. There is a built-in default annotation
               index declared which can be used to access instances of type
               <literal>uima.tcas.Annotation</literal> (or its subtypes), sorted based on their
               <literal>begin</literal> and <literal>end</literal> features. For all other types, there is a
               default, unsorted (bag) index. If there is a need for a specialized index it must be declared in this
               element of the descriptor. See <olink targetdoc="&uima_docs_ref;"
                 targetptr="ugr.ref.cas.indexes_and_iterators"/> for details on FS indexes.</para>

             <para>Like type systems and type priorities, an
               <literal>fsIndexCollection</literal> can declare a
               <literal>name</literal>, <literal>description</literal>,
               <literal>vendor</literal>, and <literal>version</literal>, and may
               import other <literal>fsIndexCollection</literal>s. The import syntax is
               described in <xref linkend="&tp;imports"/>.</para>

             <para>An <literal>fsIndexCollection</literal> may also define zero or more
               <literal>fsIndexDescription</literal> elements, each of which defines a
               single index. Each <literal>fsIndexDescription</literal> has the form:


               <programlisting><![CDATA[<fsIndexDescription>

   <label>[String]</label>
   <typeName>[TypeName]</typeName>
   <kind>sorted|bag|set</kind>

   <keys>

     <fsIndexKey>
       <featureName>[Name]</featureName>
       <comparator>standard|reverse</comparator>
     </fsIndexKey>

     <fsIndexKey>
       <typePriority/>
     </fsIndexKey>

     ...

   </keys>
 </fsIndexDescription>]]></programlisting></para>

             <para>The <literal>label</literal> element defines the name by which
               applications and annotators refer to this index. The
               <literal>typeName</literal> element contains the name of the type that will
               be contained in this index. This must match one of the type names defined in the
               <literal>&lt;typeSystemDescription&gt;</literal>.</para>

             <para>There are three possible values for the
               <literal>&lt;kind&gt;</literal> of index. Sorted indexes enforce an
               ordering of feature structures, and may contain duplicates. Bag indexes do
               not enforce ordering, and also may contain duplicates. Set indexes do not
               enforce ordering and may not contain duplicates. If the
               <literal>&lt;kind&gt;</literal> element is omitted, it will default to
               sorted, which is the most common type of index.</para>

             <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.
               As of UIMA v2.1, if you do not declare any index for a type (or any of its
               supertypes), a Bag index will be automatically created.</para></note>

             <para>An index may define zero or more <emphasis>keys</emphasis>. These keys
               determine the sort order of the feature structures within a sorted index, and
               determine equality for set indexes. Bag indexes do not use keys, and
 			  equality is determined by Feature Structure identity (that is, two elements
 			  are considered equal if and only if they are exactly the same feature structure,
 			  located in the same place in the CAS). Keys are
               ordered by precedence &ndash; the first key is evaluated first, and
               subsequent keys are evaluated only if necessary.</para>

             <para>Each key is represented by an <literal>fsIndexKey</literal> element.
               Most <literal>fsIndexKey</literal> elements contain a
               <literal>featureName</literal> and a <literal>comparator</literal>.
               The <literal>featureName</literal> must match the name of one of the
               features for the type specified in the
               <literal>&lt;typeName&gt;</literal> element for this index. The
               comparator defines how the features will be compared &ndash; a value of
               <literal>standard</literal> means that features will be compared using the
               standard comparison for their data type (e.g. for numerical types, smaller
               values precede larger values, and for string types, Unicode string
               comparison is performed). A value of <literal>reverse</literal> means that
               features will be compared using the reverse of the standard comparison (e.g.
               for numerical types, larger values precede smaller values, etc.). For Set
               indexes, the comparator direction is ignored &ndash; the keys are only used
               for the equality testing.</para>

             <para>Each key used in comparisons must refer to a feature whose range type is
               String, Float, or Integer.</para>

             <para>There is a second type of key, one which contains only the
               <literal>&lt;typePriority/&gt;</literal> element. When this key is used, it
               indicates that Feature Structures will be compared using the type priorities
               declared in the <literal>&lt;typePriorities&gt;</literal> section of the
               descriptor.</para>
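             <para>The following sketch combines these elements. The labels
               <literal>TokenIndex</literal> and <literal>LemmaSet</literal>, the type
               <literal>org.myorg.TokenAnnotation</literal>, and its
               <literal>lemma</literal> feature are hypothetical; the type is assumed to be
               a subtype of <literal>uima.tcas.Annotation</literal>, so that it inherits
               the <literal>begin</literal> and <literal>end</literal> features:

               <programlisting><![CDATA[<fsIndexes>

   <!-- Sorted index: ascending begin, then descending end, then type priority -->
   <fsIndexDescription>
     <label>TokenIndex</label>
     <typeName>org.myorg.TokenAnnotation</typeName>
     <kind>sorted</kind>
     <keys>
       <fsIndexKey>
         <featureName>begin</featureName>
         <comparator>standard</comparator>
       </fsIndexKey>
       <fsIndexKey>
         <featureName>end</featureName>
         <comparator>reverse</comparator>
       </fsIndexKey>
       <fsIndexKey>
         <typePriority/>
       </fsIndexKey>
     </keys>
   </fsIndexDescription>

   <!-- Set index: the key is used only to decide equality -->
   <fsIndexDescription>
     <label>LemmaSet</label>
     <typeName>org.myorg.TokenAnnotation</typeName>
     <kind>set</kind>
     <keys>
       <fsIndexKey>
         <featureName>lemma</featureName>
         <comparator>standard</comparator>
       </fsIndexKey>
     </keys>
   </fsIndexDescription>

 </fsIndexes>]]></programlisting></para>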

           </section>

           <section id="&tp;aes.capabilities">
             <title>Capabilities</title>


             <programlisting><![CDATA[<capabilities>
   <capability>

     <inputs>
       <type allAnnotatorFeatures="true|false">[TypeName]</type>
       ...
       <feature>[TypeName]:[Name]</feature>
       ...
     </inputs>

     <outputs>
       <type allAnnotatorFeatures="true|false">[TypeName]</type>
       ...
       <feature>[TypeName]:[Name]</feature>
       ...
     </outputs>

     <inputSofas>
       <sofaName>[name]</sofaName>
       ...
     </inputSofas>

     <outputSofas>
       <sofaName>[name]</sofaName>
       ...
     </outputSofas>

     <languagesSupported>
       <language>[ISO Language ID]</language>
         ...
     </languagesSupported>
   </capability>

   <capability>
     ...
   </capability>

   ...

 </capabilities>]]></programlisting>

             <para>The capabilities definition is used by the UIMA Framework in several
               ways, including setting up the Results Specification for process calls,
               routing control for aggregates based on language, and as part of the Sofa
               mapping function.</para>

             <para>The <literal>capabilities</literal> element contains one or more
               <literal>capability</literal> elements. In Version 2 and onwards, only one
               capability set should be used (multiple sets will continue to work for a while,
               but they are not supported in a logically consistent way).
               <!-- Because you can therefore
               declare multiple capability sets, you can use this to model component behavior

               that for a given set of inputs, produces a particular set of outputs. --></para>

             <para>Each <literal>capability</literal> contains
               <literal>inputs</literal>, <literal>outputs</literal>,
               <literal>languagesSupported</literal>, <literal>inputSofas</literal>,
               and <literal>outputSofas</literal>. The <literal>&lt;inputs&gt;</literal>
               and <literal>&lt;outputs&gt;</literal> elements are required (though they
               may be empty); <literal>&lt;languagesSupported&gt;</literal>,
               <literal>&lt;inputSofas&gt;</literal>,
               and <literal>&lt;outputSofas&gt;</literal> are optional.</para>

             <para>Both inputs and outputs may contain a mixture of type and feature
               elements.</para>

             <para><literal>&lt;type...&gt;</literal> elements contain the name of one
               of the types defined in the type system or one of the built in types. Declaring a
               type as an input means that this component expects instances of this type to be
               in the CAS when it receives it to process. Declaring a type as an output means
               that this component creates new instances of this type in the CAS.</para>

             <para>There is an optional attribute
               <literal>allAnnotatorFeatures</literal>, which defaults to false if
               omitted. The Component Descriptor Editor tool defaults this to true when a new
               type is added to the list of inputs and/or outputs. When this attribute is true,
               it specifies that all of the type&apos;s features are also declared as input or
               output. Otherwise, the features that are required as inputs or populated as
               outputs must be explicitly specified in feature elements.</para>

             <para><literal>&lt;feature...&gt;</literal> elements contain the
               <quote>fully-qualified</quote> feature name, which is the type name
               followed by a colon, followed by the feature name, e.g.
               <literal>org.myorg.TokenAnnotation:lemma</literal>.
               <literal>&lt;feature...&gt;</literal> elements in the
               <literal>&lt;inputs&gt;</literal> section must also have a corresponding
               type declared as an input. In output sections, this is not required. If the type
               is not specified as an output, but a feature for that type is, this means that
               existing instances of the type have the values of the specified features
               updated. Any type mentioned in a <literal>&lt;feature&gt;</literal>
               element must be either specified as an input or an output or both.</para>
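             <para>As a sketch of these rules, consider a hypothetical lemmatizer that
               reads existing tokens and fills in their <literal>lemma</literal> feature.
               The type <literal>org.myorg.TokenAnnotation</literal> and the feature
               <literal>partOfSpeech</literal> are assumed names. The type appears only as
               an input, so the output feature means that existing instances are updated
               rather than new ones created:

               <programlisting><![CDATA[<capabilities>
   <capability>

     <inputs>
       <type>org.myorg.TokenAnnotation</type>
       <feature>org.myorg.TokenAnnotation:partOfSpeech</feature>
     </inputs>

     <outputs>
       <!-- Type not listed as an output: lemma is set on existing tokens -->
       <feature>org.myorg.TokenAnnotation:lemma</feature>
     </outputs>

     <languagesSupported>
       <language>en</language>
     </languagesSupported>

   </capability>
 </capabilities>]]></programlisting></para>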

             <para><literal>&lt;language&gt;</literal> elements contain one of the ISO language
               identifiers, such as <literal>en</literal> for English, or
               <literal>en-US</literal> for the United States dialect of English.</para>

             <para>The list of language codes can be found here: <ulink
                 url="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt"/>
               and the country codes here:
               <ulink
                 url="http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html"/>
               </para>

             <para><literal>&lt;inputSofas&gt;</literal> and
               <literal>&lt;outputSofas&gt;</literal> declare sofa names used by this
               component. All Sofa names must be unique within a particular capability set. A
               Sofa name must be an input or an output, and cannot be both. It is an error to have a
               Sofa name declared as an input in one capability set, and also have it declared
               as an output in another capability set.</para>

             <para>A <literal>&lt;sofaName&gt;</literal> is written as a simple
               Java-style identifier, without any periods in the name, except that it may be
               written to end in <quote><literal>.*</literal></quote>. If written in this
               manner, it specifies a set of Sofa names, all of which start with the base name
               (the part before the .*) followed by a period and then an arbitrary Java
               identifier (without periods). This form is used to specify in the descriptor
               that the component could generate an arbitrary number of Sofas, the exact
               names and numbers of which are unknown before the component is run.</para>
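             <para>For example, a component that reads one Sofa and may produce any number
               of derived Sofas could be sketched as follows. The Sofa names
               <literal>SourceDocument</literal> and <literal>Translation</literal> are
               hypothetical; the <literal>.*</literal> form would cover names such as
               <literal>Translation.de</literal> or <literal>Translation.fr</literal>:

               <programlisting><![CDATA[<capability>
   <inputs/>
   <outputs/>

   <inputSofas>
     <sofaName>SourceDocument</sofaName>
   </inputSofas>

   <outputSofas>
     <!-- Any Sofa named "Translation." followed by a Java identifier -->
     <sofaName>Translation.*</sofaName>
   </outputSofas>
 </capability>]]></programlisting></para>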

           </section>

           <section id="&tp;aes.operational_properties">
             <title>OperationalProperties</title>

             <para>Components can declare operational properties that can be
               useful in deployment. The following are available:</para>


             <programlisting><![CDATA[<operationalProperties>
   <modifiesCas> true|false </modifiesCas>
   <multipleDeploymentAllowed> true|false </multipleDeploymentAllowed>
   <outputsNewCASes> true|false </outputsNewCASes>
 </operationalProperties>]]></programlisting>

             <para><literal>modifiesCas</literal>, if false, indicates that this
               component does not modify the CAS. If it is not specified, the default value is
               true except for CAS Consumer components.</para>

             <para><literal>multipleDeploymentAllowed</literal>, if true, allows the
               component to be deployed multiple times to increase performance through
               scale-out techniques. If it is not specified, the default value is true,
               except for CAS Consumer and Collection Reader components.</para>

             <note><para>If you wrap one or more CAS Consumers inside an aggregate as the only
             components, you must explicitly specify in the aggregate the
             <literal>multipleDeploymentAllowed</literal> property as false (assuming the CAS Consumer
             components take the default here); otherwise the framework will complain about inconsistent
             settings for these.</para></note>

             <para><literal>outputsNewCASes</literal>, if true, allows the component to
               create new CASes during processing, for example to break a large artifact into
               smaller pieces. See <olink targetdoc="&uima_docs_tutorial_guides;"
                 targetptr="ugr.tug.cm"/> for details.</para>
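             <para>As an illustration of the note above, an aggregate whose only
               components are CAS Consumers might declare its operational properties as in
               this sketch. The values for <literal>modifiesCas</literal> and
               <literal>outputsNewCASes</literal> are assumptions for consumers that only
               read the CAS:

               <programlisting><![CDATA[<operationalProperties>
   <modifiesCas>false</modifiesCas>
   <!-- Must be stated explicitly to agree with the CAS Consumer default -->
   <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
   <outputsNewCASes>false</outputsNewCASes>
 </operationalProperties>]]></programlisting></para>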
           </section>

           <section id="&tp;aes.primitive.external_resource_dependencies">
             <title>External Resource Dependencies</title>


             <programlisting><![CDATA[<externalResourceDependencies>
   <externalResourceDependency>
     <key>[String]</key>
     <description>[String] </description>
     <interfaceName>[String]</interfaceName>
     <optional>true|false</optional>
   </externalResourceDependency>

   <externalResourceDependency>
     ...
   </externalResourceDependency>

   ...

 </externalResourceDependencies>]]></programlisting>

             <para>A primitive annotator may declare zero or more
               <literal>&lt;externalResourceDependency&gt;</literal> elements. Each
               dependency has the following elements:

               <itemizedlist><listitem><para><literal>key</literal> &ndash; the
                 string by which the annotator code will attempt to access the resource. Must
                 be unique within this annotator.</para></listitem>

                 <listitem><para><literal>description</literal> &ndash; a textual
                   description of the dependency</para></listitem>

                 <listitem><para><literal>interfaceName</literal> &ndash; the
                   fully-qualified name of the Java interface through which the annotator
                   will access the data. This is optional. If not specified, the annotator
                   can only get an InputStream to the data.</para></listitem>

                 <listitem><para><literal>optional</literal> &ndash; whether the
                   resource is optional. If false, an exception will be thrown if no resource
                   is assigned to satisfy this dependency. Defaults to false. </para>
                   </listitem></itemizedlist></para>
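             <para>A minimal sketch of a single dependency follows. The key
               <literal>Dictionary</literal> and the interface
               <literal>org.myorg.mycomponent.DictionaryResource</literal> are
               hypothetical names chosen for illustration:

               <programlisting><![CDATA[<externalResourceDependencies>
   <externalResourceDependency>
     <key>Dictionary</key>
     <description>Word list used for lookups (illustrative)</description>
     <interfaceName>org.myorg.mycomponent.DictionaryResource</interfaceName>
     <!-- false: an exception is thrown if no resource is bound to this key -->
     <optional>false</optional>
   </externalResourceDependency>
 </externalResourceDependencies>]]></programlisting></para>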

           </section>

           <section id="&tp;aes.primitive.resource_manager_configuration">
             <title>Resource Manager Configuration</title>


             <programlisting><![CDATA[<resourceManagerConfiguration>

   <name>[String]</name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <imports>
     <import ...>
     ...
   </imports>

   <externalResources>

     <externalResource>
       <name>[String]</name>
       <description>[String]</description>
       <fileResourceSpecifier>
         <fileUrl>[URL]</fileUrl>
       </fileResourceSpecifier>
       <implementationName>[String]</implementationName>
     </externalResource>
     ...
   </externalResources>

   <externalResourceBindings>
     <externalResourceBinding>
       <key>[String]</key>
       <resourceName>[String]</resourceName>
     </externalResourceBinding>
     ...
   </externalResourceBindings>

 </resourceManagerConfiguration>]]></programlisting>

             <para>This element declares external resources and binds them to
               annotators&apos; external resource dependencies.</para>

             <para>The <literal>resourceManagerConfiguration</literal> element may
               optionally contain an <literal>import</literal>, which allows resource
               definitions to be stored in a separate (shareable) file. See <xref
                 linkend="&tp;imports"/> for details.</para>

             <para>The <literal>externalResources</literal> element contains zero or
               more <literal>externalResource</literal> elements, each of which
               consists of:

               <itemizedlist><listitem><para><literal>name</literal> &ndash; the
                 name of the resource. This name is referred to in the bindings (see below).
                 Resource names need to be unique within any Aggregate Analysis Engine or
                 Collection Processing Engine, so the Java-like
                 <literal>org.myorg.mycomponent.MyResource</literal> syntax is
                 recommended.</para></listitem>

                 <listitem><para><literal>description</literal> &ndash; English
                   description of the resource</para></listitem>

                 <listitem><para>Resource Specifier &ndash;
                   Declares the location of the resource. There are different
                   possibilities for how this is done (see below).</para></listitem>

                 <listitem><para><literal>implementationName</literal> &ndash; The
                   fully-qualified name of the Java class that will be instantiated from the
                   resource data. This is optional; if not specified, the resource will be
                   accessible as an input stream to the raw data. If specified, the Java class
                   must implement the <literal>interfaceName</literal> that is
                   specified in the External Resource Dependency to which it is bound.
                   </para></listitem></itemizedlist></para>

            <para>One possibility for the resource specifier is a
              <literal>&lt;fileResourceSpecifier&gt;</literal>, as shown above. This
              simply declares a URL to the resource data. This support is built on the Java
              class <literal>java.net.URL</literal> and its method
              <literal>openStream()</literal>; it supports the protocols
              <quote>file</quote>, <quote>http</quote> and <quote>jar</quote> (for
              referring to files in jars) by default, and you can plug in handlers for other
              protocols. The URL must start with <literal>file:</literal> (or some other
              protocol). A relative URL is resolved against either the classpath or the
              <quote>data path</quote>. The data
              path works like the classpath but can be set programmatically via
              <literal>ResourceManager.setDataPath()</literal>. Setting the Java
              System property <literal>uima.datapath</literal> also works.</para>

            <para><literal>file:org/apache/d.txt</literal> is a relative path;
              relative paths for resources are resolved using the classpath and/or the
              datapath. For the file protocol, URLs starting with
              <literal>file:/</literal> or <literal>file:///</literal> are
              absolute. Note that <literal>file://org/apache/d.txt</literal> is NOT an
              absolute path starting with <quote>org</quote>. The <quote>//</quote>
              indicates that what follows is a host name. Therefore, if you try to use this
              URL, it will fail with a complaint that it cannot connect to the host
              <quote>org</quote>.
              </para>
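
            <para>As a minimal sketch of the difference, the following two specifiers use
              the illustrative path <literal>org/apache/d.txt</literal>. The first is
              resolved using the classpath and/or the data path; the second names an
              absolute location on the local file system.</para>

            <programlisting><![CDATA[<!-- relative: resolved using the classpath and/or the data path -->
<fileResourceSpecifier>
  <fileUrl>file:org/apache/d.txt</fileUrl>
</fileResourceSpecifier>

<!-- absolute: a path on the local file system -->
<fileResourceSpecifier>
  <fileUrl>file:/org/apache/d.txt</fileUrl>
</fileResourceSpecifier>]]></programlisting>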

             <para>Another option is a
               <literal>&lt;fileLanguageResourceSpecifier&gt;</literal>, which is
               intended to support resources, such as dictionaries, that depend on the
               language of the document being processed. Instead of a single URL, a prefix and
               suffix are specified, like this:


               <programlisting><![CDATA[<fileLanguageResourceSpecifier>
   <fileUrlPrefix>file:FileLanguageResource_implTest_data_</fileUrlPrefix>
   <fileUrlSuffix>.dat</fileUrlSuffix>
 </fileLanguageResourceSpecifier>]]></programlisting></para>

             <para>The URL of the actual resource is then formed by concatenating the prefix,
               the language of the document (as an ISO language code, e.g.
               <literal>en</literal> or <literal>en-US</literal>
               &ndash; see <xref linkend="&tp;aes.capabilities"/> for more
               information), and the suffix.</para>
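
            <para>For instance, with the prefix and suffix from the listing above, plain
              concatenation gives the following URLs for two illustrative document
              languages. This only illustrates the concatenation rule described here.</para>

            <programlisting><![CDATA[en     ->  file:FileLanguageResource_implTest_data_en.dat
en-US  ->  file:FileLanguageResource_implTest_data_en-US.dat]]></programlisting>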

 		    <para>A third option is a <literal>customResourceSpecifier</literal>, which allows
 			  you to plug in an arbitrary Java class.  See <xref linkend="&tp;custom_resource_specifiers"/>
 			  for more information.</para>

             <para>The <literal>externalResourceBindings</literal> element declares
               which resources are bound to which dependencies. Each
               <literal>externalResourceBinding</literal> consists of:

               <itemizedlist><listitem><para><literal>key</literal> &ndash;
                 identifies the dependency. For a binding declared in a primitive analysis
                 engine descriptor, this must match the value of the
                 <literal>key</literal> element of one of the
                 <literal>externalResourceDependency</literal> elements. Bindings
                 may also be specified in aggregate analysis engine descriptors, in which
                 case a compound key is used
                &ndash; see <xref
                  linkend="&tp;aes.aggregate.external_resource_bindings"/>.</para></listitem>

                 <listitem><para><literal>resourceName</literal> &ndash; the name of
                   the resource satisfying the dependency. This must match the value of the
                   <literal>name</literal> element of one of the
                   <literal>externalResource</literal> declarations. </para>
                   </listitem></itemizedlist></para>

            <para>A given resource dependency may be bound to only one external resource.
              One external resource, however, may be bound to many dependencies, which
              allows resource sharing.</para>
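
            <para>The following sketch shows such sharing within a primitive analysis engine
              descriptor. The dependency keys <literal>dictionaryA</literal> and
              <literal>dictionaryB</literal> and the resource name are hypothetical; the
              keys are assumed to match <literal>key</literal> elements of
              <literal>externalResourceDependency</literal> declarations in the same
              descriptor.</para>

            <programlisting><![CDATA[<externalResourceBindings>
  <!-- two dependencies bound to the same external resource -->
  <externalResourceBinding>
    <key>dictionaryA</key>
    <resourceName>org.myorg.mycomponent.MyResource</resourceName>
  </externalResourceBinding>
  <externalResourceBinding>
    <key>dictionaryB</key>
    <resourceName>org.myorg.mycomponent.MyResource</resourceName>
  </externalResourceBinding>
</externalResourceBindings>]]></programlisting>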
           </section>

           <section id="&tp;aes.environment_variable_references">
             <title>Environment Variable References</title>

             <para>In several places throughout the descriptor, it is possible to reference
               environment variables. In Java, these are actually references to Java system
               properties. To reference system environment variables from a Java analysis
               engine you must pass the environment variables into the Java virtual machine
               by using the <literal>-D</literal> option on the <literal>java</literal>
               command line.</para>

            <para>The syntax for environment variable references is
              <literal>&lt;envVarRef&gt;[VariableName]&lt;/envVarRef&gt;</literal>,
              where <literal>[VariableName]</literal> is any valid Java system property
              name. Environment
              variable references are valid in the following places:

               <itemizedlist spacing="compact"><listitem><para>The value of a
                 configuration parameter (String-valued parameters only)</para>
                 </listitem>

                 <listitem><para>The
                   <literal>&lt;annotatorImplementationName&gt;</literal> element
                   of a primitive AE descriptor</para></listitem>

                 <listitem><para>The <literal>&lt;name&gt;</literal> element within
                   <literal>&lt;analysisEngineMetaData&gt;</literal></para>
                   </listitem>

                 <listitem><para>Within a
                   <literal>&lt;fileResourceSpecifier&gt;</literal> or
                   <literal>&lt;fileLanguageResourceSpecifier&gt;</literal>
                   </para></listitem></itemizedlist></para>

            <para>For example, if the value of a configuration parameter were specified as
              <literal>&lt;string&gt;&lt;envVarRef&gt;TEMP_DIR&lt;/envVarRef&gt;/temp.dat&lt;/string&gt;</literal>,
              and the value of the <literal>TEMP_DIR</literal> Java System property were
              <literal>c:/temp</literal>, then the configuration parameter&apos;s
              value would evaluate to <literal>c:/temp/temp.dat</literal>.</para>
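
            <para>In context, such a reference might appear in a configuration parameter
              setting as sketched below. The parameter name <literal>TempFile</literal> is
              hypothetical, and the JVM is assumed to have been started with
              <literal>-DTEMP_DIR=c:/temp</literal>.</para>

            <programlisting><![CDATA[<configurationParameterSettings>
  <nameValuePair>
    <name>TempFile</name>
    <value>
      <!-- evaluates to c:/temp/temp.dat when TEMP_DIR is c:/temp -->
      <string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>]]></programlisting>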

             <note><para>The Component Descriptor Editor does not support
               environment variable references.  If you need to, however, you
               can use the <code>source</code> tab view in the CDE to manually
               add this notation.
               </para></note>

           </section>
         </section>
         <section id="&tp;aes.aggregate">
           <title>Aggregate Analysis Engine Descriptors</title>

           <para>Aggregate Analysis Engines do not contain an annotator, but instead
             contain one or more component (also called <emphasis>delegate</emphasis>)
             analysis engines.</para>

           <para>Aggregate Analysis Engine Descriptors maintain most of the same structure
             as Primitive Analysis Engine Descriptors. The differences are:</para>

           <itemizedlist><listitem><para>An Aggregate Analysis Engine Descriptor
             contains the element
             <literal>&lt;primitive&gt;false&lt;/primitive&gt;</literal> rather
             than <literal>&lt;primitive&gt;true&lt;/primitive&gt;</literal>.
             </para></listitem>

             <listitem><para>An Aggregate Analysis Engine Descriptor must not include a
               <literal>&lt;annotatorImplementationName&gt;</literal>
               element.</para></listitem>

             <listitem><para>In place of the
               <literal>&lt;annotatorImplementationName&gt;</literal>, an Aggregate
               Analysis Engine Descriptor must have a
               <literal>&lt;delegateAnalysisEngineSpecifiers&gt;</literal>
               element. See <xref linkend="&tp;aes.aggregate.delegates"/>.</para>
               </listitem>

            <listitem><para>An Aggregate Analysis Engine Descriptor may provide a
              <literal>&lt;flowController&gt;</literal> element immediately
              following the
              <literal>&lt;delegateAnalysisEngineSpecifiers&gt;</literal>. See <xref
                linkend="&tp;aes.aggregate.flow_controller"/>.</para></listitem>

            <listitem><para>Under the <literal>analysisEngineMetaData</literal> element, an
              Aggregate Analysis Engine Descriptor may specify an additional element,
              <literal>&lt;flowConstraints&gt;</literal>. See <xref
                linkend="&tp;aes.aggregate.flow_constraints"/>. Typically only one
              of <literal>&lt;flowController&gt;</literal> and
              <literal>&lt;flowConstraints&gt;</literal> is specified. If both are
              specified, the <literal>&lt;flowController&gt;</literal> takes
              precedence, and the flow controller implementation can use the information
              specified in the <literal>&lt;flowConstraints&gt;</literal> as part of
              its configuration input.</para></listitem>

            <listitem><para>An Aggregate Analysis Engine Descriptor must not contain a
              <literal>&lt;typeSystemDescription&gt;</literal> element. The Type
              System of the Aggregate Analysis Engine is derived by merging the Type Systems
              of the Analysis Engines that the aggregate contains.</para></listitem>

            <listitem><para>Within Aggregate Analysis Engine Descriptors,
              <literal>&lt;configurationParameter&gt;</literal> elements may define
              <literal>&lt;overrides&gt;</literal>. See <xref
                linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>.</para></listitem>

             <listitem><para>External Resource Bindings can bind resources to
               dependencies declared by any delegate AE within the aggregate. See <xref
                 linkend="&tp;aes.aggregate.external_resource_bindings"/>.</para>
               </listitem>

            <listitem><para>An additional optional element,
              <literal>&lt;sofaMappings&gt;</literal>, may be included. See <xref
                linkend="&tp;aes.aggregate.sofa_mappings"/>.</para>
              </listitem></itemizedlist>
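
          <para>Putting these differences together, the overall shape of an aggregate
            descriptor can be sketched as follows. This is an outline assembled from the
            rules above, not a complete descriptor; elided content is shown as
            <literal>...</literal>.</para>

          <programlisting><![CDATA[<analysisEngineDescription
    xmlns="http://uima.apache.org/resourceSpecifier">

  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>

  <!-- replaces annotatorImplementationName -->
  <delegateAnalysisEngineSpecifiers>
    ...
  </delegateAnalysisEngineSpecifiers>

  <!-- optional -->
  <flowController>
    ...
  </flowController>

  <analysisEngineMetaData>
    ...
    <!-- no typeSystemDescription; optional flowConstraints -->
    <flowConstraints>
      ...
    </flowConstraints>
    ...
  </analysisEngineMetaData>

  <resourceManagerConfiguration>
    ...
  </resourceManagerConfiguration>

  <!-- optional -->
  <sofaMappings>
    ...
  </sofaMappings>

</analysisEngineDescription>]]></programlisting>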

           <section id="&tp;aes.aggregate.delegates">
             <title>Delegate Analysis Engine Specifiers</title>


             <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>

   <delegateAnalysisEngine key="[String]">
     <analysisEngineDescription>...</analysisEngineDescription> |
     <import .../>
   </delegateAnalysisEngine>

   <delegateAnalysisEngine key="[String]">
     ...
   </delegateAnalysisEngine>

   ...

 </delegateAnalysisEngineSpecifiers>]]></programlisting>

             <para>The <literal>delegateAnalysisEngineSpecifiers</literal> element
               contains one or more <literal>delegateAnalysisEngine</literal>
               elements. Each of these must have a unique key, and must contain
               either:</para>

             <itemizedlist><listitem><para>A complete
               <literal>analysisEngineDescription</literal> element describing the
               delegate analysis engine <emphasis role="bold">OR</emphasis></para>
               </listitem>

               <listitem><para>An <literal>import</literal> element giving the name or
                 location of the XML descriptor for the delegate analysis engine (see <xref
                   linkend="&tp;imports"/>).</para></listitem></itemizedlist>

             <para>The latter is the much more common usage, and is the only form supported by
               the Component Descriptor Editor tool.</para>
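
            <para>A minimal sketch of the import form is shown below. The keys, file name, and
              descriptor name are hypothetical; the first import refers to the delegate
              descriptor by location and the second by name (see <xref
                linkend="&tp;imports"/>).</para>

            <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>

  <delegateAnalysisEngine key="annotator1">
    <import location="Annotator1.xml"/>
  </delegateAnalysisEngine>

  <delegateAnalysisEngine key="annotator2">
    <import name="org.myorg.descriptors.Annotator2"/>
  </delegateAnalysisEngine>

</delegateAnalysisEngineSpecifiers>]]></programlisting>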
           </section>
           <section id="&tp;aes.aggregate.flow_controller">
             <title>FlowController</title>


             <programlisting><![CDATA[<flowController key="[String]">
     <flowControllerDescription>...</flowControllerDescription> |
     <import .../>
   </flowController>]]></programlisting>

            <para>The optional <literal>flowController</literal> element identifies
              the descriptor of the FlowController component that will be used to determine
              the order in which delegate Analysis Engines are called.</para>

             <para>The <literal>key</literal> attribute is optional, but recommended; it
               assigns the FlowController an identifier that can be used for configuration
               parameter overrides, Sofa mappings, or external resource bindings. The key
               must not be the same as any of the delegate analysis engine keys.</para>

             <para>As with the <literal>delegateAnalysisEngine</literal> element, the
               <literal>flowController</literal> element may contain either a complete
               <literal>flowControllerDescription</literal> or an
               <literal>import</literal>, but the import is recommended. The Component
               Descriptor Editor tool only supports imports here.</para>
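
            <para>For example, a flow controller could be imported as sketched below. The key
              <literal>flowController1</literal> and the descriptor file name are
              hypothetical; the key is chosen so that it differs from every delegate
              analysis engine key.</para>

            <programlisting><![CDATA[<flowController key="flowController1">
  <import location="MyFlowController.xml"/>
</flowController>]]></programlisting>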

           </section>
           <section id="&tp;aes.aggregate.flow_constraints">
             <title>FlowConstraints</title>

             <para>If a <literal>&lt;flowController&gt;</literal> is not specified, the
               order in which delegate Analysis Engines are called within the aggregate
               Analysis Engine is specified using the
               <literal>&lt;flowConstraints&gt;</literal> element, which must occur
               immediately following the
               <literal>configurationParameterSettings</literal> element. If a
               <literal>&lt;flowController&gt;</literal> is specified, then the
               <literal>&lt;flowConstraints&gt;</literal> are optional. They can be
               used to pass an ordering of delegate keys to the
               <literal>&lt;flowController&gt;</literal>.</para>

             <para>There are two options for flow constraints --
               <literal>&lt;fixedFlow&gt;</literal> or
               <literal>&lt;capabilityLanguageFlow&gt;</literal>. Each is discussed
               in a separate section below.</para>

             <section id="&tp;aes.aggregate.flow_constraints.fixed_flow">
               <title>Fixed Flow</title>


               <programlisting><![CDATA[<flowConstraints>
   <fixedFlow>
     <node>[String]</node>
     <node>[String]</node>
     ...
   </fixedFlow>
 </flowConstraints>]]></programlisting>

               <para>The <literal>flowConstraints</literal> element must be included
                 immediately following the
                 <literal>configurationParameterSettings</literal> element.</para>

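              <para>For a fixed flow, the <literal>flowConstraints</literal> element must
                contain a <literal>fixedFlow</literal> element. The alternative,
                <literal>capabilityLanguageFlow</literal>, is described in <xref
                  linkend="&tp;aes.aggregate.flow_constraints.capability_language_flow"/>.</para>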

               <para>The <literal>fixedFlow</literal> element contains one or more
                 <literal>node</literal> elements, each of which contains an identifier
                 which must match the key of a delegate analysis engine specified in the
                 <literal>delegateAnalysisEngineSpecifiers</literal>
                 element.</para>
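
              <para>As an illustration, assuming an aggregate whose delegates are declared
                with the hypothetical keys <literal>annotator1</literal> and
                <literal>annotator2</literal>, the following fixed flow calls
                <literal>annotator1</literal> first and <literal>annotator2</literal>
                second:</para>

              <programlisting><![CDATA[<flowConstraints>
  <fixedFlow>
    <node>annotator1</node>
    <node>annotator2</node>
  </fixedFlow>
</flowConstraints>]]></programlisting>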

             </section>
             <section
               id="&tp;aes.aggregate.flow_constraints.capability_language_flow">
               <title>Capability Language Flow</title>


               <programlisting><![CDATA[<flowConstraints>
   <capabilityLanguageFlow>
     <node>[String]</node>
     <node>[String]</node>
     ...
   </capabilityLanguageFlow>
 </flowConstraints>]]></programlisting>

              <para>If you use <literal>&lt;capabilityLanguageFlow&gt;</literal>,
                the delegate Analysis Engines named by the
                <literal>&lt;node&gt;</literal> elements are called in the given order,
                except that a delegate Analysis Engine is skipped if either of the following
                is true (according to that Analysis Engine&apos;s declared output
                capabilities):</para>

               <itemizedlist><listitem><para>It cannot produce any of the aggregate
                 Analysis Engine&apos;s output capabilities for the language of the
                 current document.</para></listitem>

                 <listitem><para>All of the output capabilities have already been
                   produced by an earlier Analysis Engine in the flow. </para></listitem>
                 </itemizedlist>

               <para>For example, if two annotators produce
                 <literal>org.myorg.TokenAnnotation</literal> feature structures for
                 the same language, these feature structures will only be produced by the
                 first annotator in the list.</para>

               <note><para>The flow analysis uses the specific types that are specified in the
               output capabilities, without any expansion for subtypes.  So, if you expect
               a type TT and another type SubTT (which is a subtype of TT) in the output, you
               must include both of them in the output capabilities.</para></note>
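
              <para>A sketch of the token example above, using the hypothetical delegate keys
                <literal>tokenizerA</literal>, <literal>tokenizerB</literal>, and
                <literal>tagger</literal>: if both tokenizers declare
                <literal>org.myorg.TokenAnnotation</literal> as an output for the
                language of the current document, only <literal>tokenizerA</literal> is
                called and <literal>tokenizerB</literal> is skipped.</para>

              <programlisting><![CDATA[<flowConstraints>
  <capabilityLanguageFlow>
    <node>tokenizerA</node>
    <node>tokenizerB</node>
    <node>tagger</node>
  </capabilityLanguageFlow>
</flowConstraints>]]></programlisting>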
             </section>
           </section>

           <section id="&tp;aes.aggregate.configuration_parameter_overrides">
             <title>Configuration Parameter Overrides</title>

            <para>In an aggregate Analysis Engine Descriptor, each
              <literal>&lt;configurationParameter&gt;</literal> element should
              contain an <literal>&lt;overrides&gt;</literal> element, with the
              following syntax:</para>


             <programlisting><![CDATA[<overrides>

   <parameter>
     [delegateAnalysisEngineKey]/[parameterName]
   </parameter>

   <parameter>
     [delegateAnalysisEngineKey]/[parameterName]
   </parameter>
   ...

 </overrides>]]></programlisting>

            <para>Since aggregate Analysis Engines have no code associated with them, the
              only way in which their configuration parameters can affect their processing
              is by overriding the parameter values of one or more delegate analysis
              engines. The <literal>&lt;overrides&gt;</literal> element determines
              which parameters, in which delegate Analysis Engines, are overridden by this
              configuration parameter.</para>

            <para>For example, consider an aggregate Analysis Engine Descriptor that
              contains delegate Analysis Engines with keys
              <literal>annotator1</literal> and <literal>annotator2</literal> (as
              declared in the <literal>&lt;delegateAnalysisEngine&gt;</literal> element
              &ndash; see <xref
                linkend="&tp;aes.aggregate.delegates"/>) and also declares a
              configuration parameter as follows:


               <programlisting><![CDATA[<configurationParameter>
   <name>AggregateParam</name>
   <type>String</type>
   <overrides>
     <parameter>annotator1/param1</parameter>
     <parameter>annotator2/param2</parameter>
   </overrides>
 </configurationParameter>]]></programlisting></para>

             <para>The value of the <literal>AggregateParam</literal> parameter
               (whether assigned in the aggregate descriptor or at runtime by an
               application) will override the value of parameter
               <literal>param1</literal> in <literal>annotator1</literal> and also
               override the value of parameter <literal>param2</literal> in
               <literal>annotator2</literal>. No other parameters will be
               affected.</para>
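
            <para>To assign the value in the aggregate descriptor itself, a setting along the
              following lines could be used. The value <literal>someValue</literal> is a
              placeholder; with the overrides declared above, it would be passed to
              <literal>param1</literal> in <literal>annotator1</literal> and to
              <literal>param2</literal> in <literal>annotator2</literal>.</para>

            <programlisting><![CDATA[<configurationParameterSettings>
  <nameValuePair>
    <name>AggregateParam</name>
    <value>
      <string>someValue</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>]]></programlisting>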

             <para>For historical reasons only, if an aggregate Analysis Engine descriptor
               declares a configuration parameter with no explicit overrides, that
               parameter will override any parameters having the same name within any
               delegate analysis engine. This usage is strongly discouraged. The UIMA SDK
               currently supports this usage but logs a warning message to the log file. This
               support may be dropped in future versions.</para>

           </section>

           <section id="&tp;aes.aggregate.external_resource_bindings">
             <title>External Resource Bindings</title>

            <para>Aggregate analysis engine descriptors can declare resource bindings
              that bind resources to dependencies declared in any of the delegate analysis
              engines (or their subcomponents, recursively) within that aggregate. This
              allows resource sharing. Any binding at this level overrides (supersedes)
              any binding specified by a contained component or its subcomponents,
              recursively.</para>

            <para>For example, consider an aggregate Analysis Engine Descriptor that
              contains delegate Analysis Engines with keys
              <literal>annotator1</literal> and <literal>annotator2</literal> (as
              declared in the <literal>&lt;delegateAnalysisEngine&gt;</literal>
              element &ndash; see <xref linkend="&tp;aes.aggregate.delegates"/>),
              where <literal>annotator1</literal> declares a resource dependency with
              key <literal>myResource</literal> and <literal>annotator2</literal>
              declares a resource dependency with key
              <literal>someResource</literal>.</para>

             <para>Within that aggregate Analysis Engine Descriptor, the following
               <literal>resourceManagerConfiguration</literal> would bind both of
               those dependencies to a single external resource file.</para>


             <programlisting><![CDATA[<resourceManagerConfiguration>

   <externalResources>
     <externalResource>
       <name>ExampleResource</name>
       <fileResourceSpecifier>
         <fileUrl>file:MyResourceFile.dat</fileUrl>
       </fileResourceSpecifier>
     </externalResource>
   </externalResources>

   <externalResourceBindings>
     <externalResourceBinding>
       <key>annotator1/myResource</key>
       <resourceName>ExampleResource</resourceName>
     </externalResourceBinding>
     <externalResourceBinding>
       <key>annotator2/someResource</key>
       <resourceName>ExampleResource</resourceName>
     </externalResourceBinding>
   </externalResourceBindings>

 </resourceManagerConfiguration>]]></programlisting>

            <para>The syntax for the <literal>externalResources</literal> declaration
              is exactly the same as described previously. In the resource bindings, note the
              use of compound keys, e.g. <literal>annotator1/myResource</literal>.
              This identifies the resource dependency key
              <literal>myResource</literal> within the annotator with key
              <literal>annotator1</literal>. Compound keys can be
              multiple levels deep to handle nested aggregate analysis engines.</para>
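
            <para>As a sketch of a deeper key, suppose <literal>annotator1</literal> were
              itself contained in a nested aggregate whose delegate key is the hypothetical
              <literal>innerAggregate</literal>. A binding declared in the outermost
              aggregate would then be assumed to name the dependency with one key segment
              per level:</para>

            <programlisting><![CDATA[<externalResourceBinding>
  <!-- [outer delegate key]/[inner delegate key]/[dependency key] -->
  <key>innerAggregate/annotator1/myResource</key>
  <resourceName>ExampleResource</resourceName>
</externalResourceBinding>]]></programlisting>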
           </section>

           <section id="&tp;aes.aggregate.sofa_mappings">
             <title>Sofa Mappings</title>

            <para>Sofa mappings are specified between Sofa names declared in this
              aggregate descriptor as part of the
              <literal>&lt;capability&gt;</literal> section, and the Sofa names
              declared in the delegate components. For purposes of the mapping, all the
              declarations of Sofas in any of the capability sets contained within the
              <literal>&lt;capabilities&gt;</literal> element are considered
              together.</para>


             <programlisting><![CDATA[<sofaMappings>
   <sofaMapping>
     <componentKey>[keyName]</componentKey>
     <componentSofaName>[sofaName]</componentSofaName>
     <aggregateSofaName>[sofaName]</aggregateSofaName>
   </sofaMapping>
   ...
 </sofaMappings>]]></programlisting>

            <para>The <literal>&lt;componentSofaName&gt;</literal> may be omitted in the
              case where the
              component is not aware of Multiple Views or Sofas. In this case, the UIMA
              framework will arrange for the specified
              <literal>&lt;aggregateSofaName&gt;</literal> to be
              the one visible to the delegate component.</para>

            <para>The <literal>&lt;componentKey&gt;</literal> is the key name for the
              component as specified
              in the list of delegate components for this aggregate.</para>

            <para>The Sofa names used must be declared as input or output Sofas in some
              capability set.</para>
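
            <para>A sketch with hypothetical names: the first mapping makes the aggregate
              Sofa <literal>translatedText</literal> appear to
              <literal>annotator1</literal> under the name
              <literal>inputText</literal>; the second omits
              <literal>&lt;componentSofaName&gt;</literal> because
              <literal>annotator2</literal> is assumed not to be aware of Sofas.</para>

            <programlisting><![CDATA[<sofaMappings>
  <sofaMapping>
    <componentKey>annotator1</componentKey>
    <componentSofaName>inputText</componentSofaName>
    <aggregateSofaName>translatedText</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>annotator2</componentKey>
    <aggregateSofaName>translatedText</aggregateSofaName>
  </sofaMapping>
</sofaMappings>]]></programlisting>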
           </section>
         </section>
       </section>


   <section id="&tp;flow_controller">
     <title>Flow Controller Descriptors</title>

     <para>The basic structure of a Flow Controller Descriptor is as follows:


       <programlisting><![CDATA[<?xml version="1.0" ?>
 <flowControllerDescription
     xmlns="http://uima.apache.org/resourceSpecifier">

   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

   <implementationName>[ClassName]</implementationName>

   <processingResourceMetaData>
     ...
   </processingResourceMetaData>

   <externalResourceDependencies>
     ...
   </externalResourceDependencies>

   <resourceManagerConfiguration>
     ...
   </resourceManagerConfiguration>

 </flowControllerDescription>]]></programlisting></para>

     <para>The <literal>frameworkImplementation</literal> element must always be set to
       the value <literal>org.apache.uima.java</literal>.</para>

     <para>The <literal>implementationName</literal> element must contain the
       fully-qualified class name of the Flow Controller implementation. This must name a
       class that implements the <literal>FlowController</literal> interface.</para>

     <para>The <literal>processingResourceMetaData</literal> element contains
       essentially the same information as a Primitive Analysis Engine Descriptor&apos;s
       <literal>analysisEngineMetaData</literal> element, described in <xref
         linkend="&tp;aes.metadata"/>.</para>

    <para>The <literal>externalResourceDependencies</literal> and
      <literal>resourceManagerConfiguration</literal> elements are exactly the same as
      in Primitive Analysis Engine Descriptors (see <xref
        linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
        linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

   </section>

   <section id="&tp;collection_processing_parts">
     <title>Collection Processing Component Descriptors</title>

     <para>There are three types of Collection Processing Components &ndash; Collection
       Readers, CAS Initializers (deprecated as of UIMA Version 2), and CAS Consumers. Each
       type of component has a corresponding descriptor. The structure of these descriptors
       is very similar to that of primitive Analysis Engine Descriptors.</para>

     <section id="&tp;collection_processing_parts.collection_reader">
       <title>Collection Reader Descriptors</title>

       <para>The basic structure of a Collection Reader descriptor is as follows:


         <programlisting><![CDATA[<?xml version="1.0" ?>
 <collectionReaderDescription
     xmlns="http://uima.apache.org/resourceSpecifier">

   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
   <implementationName>[ClassName]</implementationName>

   <processingResourceMetaData>
     ...
   </processingResourceMetaData>

   <externalResourceDependencies>
    ...
   </externalResourceDependencies>

   <resourceManagerConfiguration>

    ...

   </resourceManagerConfiguration>

 </collectionReaderDescription>]]></programlisting></para>

       <para>The <literal>frameworkImplementation</literal> element must always be set
         to the value <literal>org.apache.uima.java</literal>.</para>

       <para>The <literal>implementationName</literal> element contains the
         fully-qualified class name of the Collection Reader implementation. This must name
         a class that implements the <literal>CollectionReader</literal>
         interface.</para>

      <para>The <literal>processingResourceMetaData</literal> element contains
        essentially the same information as a Primitive Analysis Engine
        Descriptor&apos;s <literal>analysisEngineMetaData</literal> element:


         <programlisting><![CDATA[<processingResourceMetaData>

   <name> [String] </name>
   <description>[String]</description>
   <version>[String]</version>
   <vendor>[String]</vendor>

   <configurationParameters>
      ...
   </configurationParameters>

   <configurationParameterSettings>
     ...
   </configurationParameterSettings>

   <typeSystemDescription>
    ...
   </typeSystemDescription>

   <typePriorities>
    ...
   </typePriorities>

   <fsIndexes>
    ...
   </fsIndexes>

   <capabilities>
    ...
   </capabilities>

 </processingResourceMetaData>]]></programlisting></para>

      <para>The contents of these elements are the same as those described in <xref
          linkend="&tp;aes.metadata"/>, with the exception that the capabilities
        section should not declare any inputs (because the Collection Reader is always the
        first component to receive the CAS).</para>
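
      <para>For example, a Collection Reader&apos;s capabilities might be declared as
        sketched below, with outputs only. The type name
        <literal>org.myorg.DocumentInfo</literal> is hypothetical.</para>

      <programlisting><![CDATA[<capabilities>
  <capability>
    <!-- no inputs: the Collection Reader is the first to receive the CAS -->
    <inputs/>
    <outputs>
      <type allAnnotatorFeatures="true">org.myorg.DocumentInfo</type>
    </outputs>
  </capability>
</capabilities>]]></programlisting>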

      <para>The <literal>externalResourceDependencies</literal> and
        <literal>resourceManagerConfiguration</literal> elements are exactly the same
        as in the Primitive Analysis Engine Descriptors (see <xref
          linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
          linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

     </section>
     <section id="&tp;collection_processing_parts.cas_initializer">
       <title>CAS Initializer Descriptors (deprecated)</title>

       <para>The basic structure of a CAS Initializer Descriptor is as follows:


         <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <casInitializerDescription
     xmlns="http://uima.apache.org/resourceSpecifier">

   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
   <implementationName>[ClassName] </implementationName>

   <processingResourceMetaData>
     ...
   </processingResourceMetaData>

   <externalResourceDependencies>
     ...
   </externalResourceDependencies>

   <resourceManagerConfiguration>
     ...
   </resourceManagerConfiguration>

 </casInitializerDescription>]]></programlisting></para>

       <para>The <literal>frameworkImplementation</literal> element must always be set
         to the value <literal>org.apache.uima.java</literal>.</para>

       <para>The <literal>implementationName</literal> element contains the
         fully-qualified class name of the CAS Initializer implementation. This must name a
         class that implements the <literal>CasInitializer</literal> interface.</para>

      <para>The <literal>processingResourceMetaData</literal> element contains
        essentially the same information as a Primitive Analysis Engine
        Descriptor&apos;s <literal>analysisEngineMetaData</literal> element,
        as described in <xref linkend="&tp;aes.metadata"/>, with the exception of some
        changes to the capabilities section. A CAS Initializer&apos;s capabilities
        element looks like this:


         <programlisting><![CDATA[<capabilities>
   <capability>
     <outputs>
       <type allAnnotatorFeatures="true|false">[String]</type>
       <type>[TypeName]</type>
       ...
       <feature>[TypeName]:[Name]</feature>
       ...
     </outputs>

     <outputSofas>
       <sofaName>[name]</sofaName>
       ...
     </outputSofas>

     <mimeTypesSupported>
       <mimeType>[MIME Type]</mimeType>
       ...
     </mimeTypesSupported>
   </capability>

   <capability>
     ...
   </capability>
   ...
 </capabilities>]]></programlisting></para>

       <para>A CAS Initializer&apos;s capabilities declaration differs from an Analysis
         Engine&apos;s capabilities declaration in three ways: the CAS Initializer does not
         declare any input CAS types, features, or Sofas (because it is always the first
         to operate on a CAS); it does not have a language specifier; and it may declare a
         set of MIME types that it supports for its input documents.
         Examples include text/plain, text/html, and application/pdf. For a list of MIME
         types see <ulink url="http://www.iana.org/assignments/media-types/"/>. This
         information is currently only for users&apos; reference; the framework does not
         use it for anything. This may change in future versions.</para>
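
       <para>As an illustration, a CAS Initializer that reads plain-text or HTML input and
         populates a single type might declare capabilities along the following lines. This
         is only a sketch: the type name <literal>org.example.Paragraph</literal> and the
         Sofa name <literal>plainText</literal> are hypothetical placeholders, not names
         defined by UIMA.

         <programlisting><![CDATA[<capabilities>
  <capability>
    <outputs>
      <!-- hypothetical type name, used for illustration only -->
      <type allAnnotatorFeatures="true">org.example.Paragraph</type>
    </outputs>

    <outputSofas>
      <!-- hypothetical Sofa name -->
      <sofaName>plainText</sofaName>
    </outputSofas>

    <mimeTypesSupported>
      <mimeType>text/plain</mimeType>
      <mimeType>text/html</mimeType>
    </mimeTypesSupported>
  </capability>
</capabilities>]]></programlisting></para>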

       <para>The <literal>externalResourceDependencies</literal> and
         <literal>resourceManagerConfiguration</literal> elements are exactly the same
         as in the Primitive Analysis Engine Descriptors (see <xref
           linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
           linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

     </section>
     <section id="&tp;collection_processing_parts.cas_consumer">
       <title>CAS Consumer Descriptors</title>

       <para>The basic structure of a CAS Consumer Descriptor is as follows:


         <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <casConsumerDescription
     xmlns="http://uima.apache.org/resourceSpecifier">

   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

   <implementationName>[ClassName]</implementationName>

   <processingResourceMetaData>
     ...
   </processingResourceMetaData>

   <externalResourceDependencies>
     ...
   </externalResourceDependencies>

   <resourceManagerConfiguration>
     ...
   </resourceManagerConfiguration>
 </casConsumerDescription>]]></programlisting></para>

       <para>The <literal>frameworkImplementation</literal> element currently must
         have the value <literal>org.apache.uima.java</literal> or
         <literal>org.apache.uima.cpp</literal>.</para>

       <para>The <literal>implementationName</literal> element must contain the
         fully-qualified class name of the CAS Consumer implementation, or the name
         of a .dll or .so file for C++ implementations.  For Java, the named class must
         implement the <literal>CasConsumer</literal> interface.</para>

       <para>The <literal>processingResourceMetaData</literal> element contains
         essentially the same information as a Primitive Analysis Engine Descriptor&apos;s
         <literal>analysisEngineMetaData</literal> element, described in <xref
           linkend="&tp;aes.metadata"/>, except that the CAS Consumer Descriptor&apos;s
         <literal>capabilities</literal> element should not declare outputs or
         outputSofas (since CAS Consumers do not modify the CAS).</para>

       <para>The <literal>externalResourceDependencies</literal> and
         <literal>resourceManagerConfiguration</literal> elements are exactly the same
         as in Primitive Analysis Engine Descriptors (see <xref
           linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref
           linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>
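
       <para>The following sketch shows what a small Java CAS Consumer descriptor might look
         like when filled in. The class name
         <literal>org.example.ExampleCasConsumer</literal> and the type name
         <literal>org.example.Paragraph</literal> are hypothetical, and elements that are
         not needed for the illustration have been left out on the assumption that they are
         optional. Note that the capability declares inputs only.

         <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<casConsumerDescription
    xmlns="http://uima.apache.org/resourceSpecifier">

  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

  <!-- hypothetical class; for Java it must implement CasConsumer -->
  <implementationName>org.example.ExampleCasConsumer</implementationName>

  <processingResourceMetaData>
    <name>Example CAS Consumer</name>
    <capabilities>
      <capability>
        <inputs>
          <!-- hypothetical type name -->
          <type allAnnotatorFeatures="true">org.example.Paragraph</type>
        </inputs>
        <!-- no outputs or outputSofas: a CAS Consumer does not modify the CAS -->
      </capability>
    </capabilities>
  </processingResourceMetaData>
</casConsumerDescription>]]></programlisting></para>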

     </section>
   </section>

   <section id="&tp;service_client">
     <title>Service Client Descriptors</title>

     <para>Service Client Descriptors specify only the location of a remote service and are
       therefore much simpler in structure. In the UIMA SDK, a Service Client Descriptor that
       refers to a valid Analysis Engine or CAS Consumer service can be used in place of the
       actual Analysis Engine or CAS Consumer Descriptor. The UIMA SDK will handle the details
       of calling the remote service. (For details on <emphasis>deploying</emphasis> an
       Analysis Engine or CAS Consumer as a service, see <olink targetdoc="&uima_docs_tutorial_guides;"
         targetptr="ugr.tug.application.remote_services"/>.)</para>

     <para>The UIMA SDK is extensible to support different types of remote services. In future
       versions, there may be different variations of service client descriptors that cater
       to different types of services. For now, the only type of service client descriptor is
       the <literal>uriSpecifier</literal>, which supports the SOAP and Vinci
       protocols.</para>


     <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
   <resourceType>AnalysisEngine | CasConsumer </resourceType>
   <uri>[URI]</uri>
   <protocol>SOAP | SOAPwithAttachments | Vinci</protocol>
   <timeout>[Integer]</timeout>
   <parameters>
     <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
     <parameter name="VNS_PORT" value="9000"/>
     <parameter name="GetMetaDataTimeout" value="[Integer]"/>
   </parameters>
 </uriSpecifier>]]></programlisting>

     <para>The <literal>resourceType</literal> element is required for new descriptors,
       but is currently allowed to be omitted for backward compatibility. It specifies the
       type of component (Analysis Engine or CAS Consumer) that is implemented by the service
       endpoint described by this descriptor.</para>

     <para>The <literal>uri</literal> element contains the URI for the web service. (Note
       that in the case of Vinci, this will be the service name, which is looked up in the Vinci
       Naming Service.)</para>

     <para>The <literal>protocol</literal> element may be set to SOAP,
       SOAPwithAttachments, or Vinci; other protocols may be added later. These specify the
       particular data transport format that will be used.</para>

     <para>The <literal>timeout</literal> element is optional. If present, it specifies
       the number of milliseconds to wait for a request to be processed before an exception is
       thrown. A value of zero or less means wait indefinitely. If no timeout is specified, a
       default value (currently 60 seconds) will be used.</para>
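
     <para>For instance, a client descriptor for an Analysis Engine deployed as a SOAP
       service, with a two-minute timeout, might look like the sketch below. The URI is a
       hypothetical placeholder; substitute the address at which your service is actually
       deployed.</para>

     <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceType>AnalysisEngine</resourceType>
  <!-- hypothetical service address -->
  <uri>http://localhost:8080/axis/services/ExampleAnalysisEngine</uri>
  <protocol>SOAP</protocol>
  <!-- wait up to 120000 ms (two minutes) for each request -->
  <timeout>120000</timeout>
</uriSpecifier>]]></programlisting>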

     <para>The <literal>parameters</literal> element is optional. If present, it can
       specify values for each of the following:
     </para>
     <itemizedlist>
       <listitem><para><literal>VNS_HOST</literal>: host name for the Vinci naming service.
       </para></listitem>
       <listitem><para><literal>VNS_PORT</literal>: port number for the Vinci naming service.
       </para></listitem>
       <listitem><para><literal>GetMetaDataTimeout</literal>: timeout period (in milliseconds) for
           the GetMetaData call.  If not specified, the default is 60 seconds.  This may need
           to be set higher if there are a lot of clients competing for connections to the service.
       </para></listitem>
     </itemizedlist>

     <para>If <literal>VNS_HOST</literal> and <literal>VNS_PORT</literal> are not specified
       in the descriptor, their values are taken from the
       parameters passed on the Java command line using the
       <literal>-DVNS_HOST=&lt;host&gt;</literal> and/or
       <literal>-DVNS_PORT=&lt;port&gt;</literal> system arguments. If a value is given
       neither in the descriptor nor as a system argument, the defaults are
       <literal>localhost</literal> for <literal>VNS_HOST</literal> and
       <literal>9000</literal> for <literal>VNS_PORT</literal>.</para>
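
     <para>As a sketch, a Vinci client descriptor that sets all three parameters explicitly
       could look like this. The service name
       <literal>org.example.ExampleService</literal> and the host name
       <literal>vns.example.org</literal> are hypothetical. Because the values are given in
       the descriptor, the <literal>-DVNS_HOST</literal> and <literal>-DVNS_PORT</literal>
       system arguments and the built-in defaults are not consulted.</para>

     <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceType>AnalysisEngine</resourceType>
  <!-- for Vinci, the service name looked up in the Vinci Naming Service (hypothetical) -->
  <uri>org.example.ExampleService</uri>
  <protocol>Vinci</protocol>
  <timeout>60000</timeout>
  <parameters>
    <!-- hypothetical naming service host -->
    <parameter name="VNS_HOST" value="vns.example.org"/>
    <parameter name="VNS_PORT" value="9000"/>
    <!-- allow up to 120000 ms for the GetMetaData call -->
    <parameter name="GetMetaDataTimeout" value="120000"/>
  </parameters>
</uriSpecifier>]]></programlisting>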

     <para>For details on how to deploy and call Analysis Engine and CAS Consumer services, see
         <olink targetdoc="&uima_docs_tutorial_guides;"
         targetptr="ugr.tug.application.remote_services"/>.</para>

   </section>

   <section id="&tp;custom_resource_specifiers">
     <title>Custom Resource Specifiers</title>
 	<para>A Custom Resource Specifier allows you to plug in your own Java class as a UIMA Resource.
 		For example, you can support a new service protocol by plugging in a Java class that implements
 		the UIMA <literal>AnalysisEngine</literal> interface and communicates with the remote service.</para>

 	<para>A Custom Resource Specifier has the following format:</para>
     <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
 <customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
   <resourceClassName>[Java Class Name]</resourceClassName>
   <parameters>
     <parameter name="[String]" value="[String]"/>
     <parameter name="[String]" value="[String]"/>
   </parameters>
 </customResourceSpecifier>]]></programlisting>

 	<para>The <literal>resourceClassName</literal> element must contain the fully-qualified name of a Java class
 	that can be found in the classpath (including the UIMA extension classpath, if you have specified one using
 	the <literal>ResourceManager.setExtensionClassPath</literal> method).  This class must implement the
 	UIMA <literal>Resource</literal> interface.</para>

 	<para>When an application calls the <literal>UIMAFramework.produceResource</literal> method and passes a
 	<literal>CustomResourceSpecifier</literal>, the UIMA framework will load the named class and call its
 	<literal>initialize(ResourceSpecifier,Map)</literal> method, passing the <literal>CustomResourceSpecifier</literal>
 	as the first argument.  Your class can override the <literal>initialize</literal> method and use the
 	<literal>CustomResourceSpecifier</literal> API to get access to the <literal>parameter</literal> names and values
 	specified in the XML.</para>
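
 	<para>A filled-in specifier might look like the following sketch. The class name
 	<literal>org.example.ExampleProtocolAnalysisEngine</literal> and the parameter names
 	<literal>serviceHost</literal> and <literal>servicePort</literal> are hypothetical; the framework
 	does not interpret the parameters itself, so their names and meanings are whatever your class's
 	<literal>initialize</literal> method chooses to read.</para>
     <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <!-- hypothetical class; must be on the classpath and implement Resource -->
  <resourceClassName>org.example.ExampleProtocolAnalysisEngine</resourceClassName>
  <parameters>
    <!-- hypothetical parameters, read by the class in initialize() -->
    <parameter name="serviceHost" value="service.example.org"/>
    <parameter name="servicePort" value="8123"/>
  </parameters>
</customResourceSpecifier>]]></programlisting>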

 	<para>If you are using a custom resource specifier to plug in a class that implements a new service protocol,
 	your class must also implement the <literal>AnalysisEngine</literal> interface.  Generally it should also
 	extend <literal>AnalysisEngineImplBase</literal>.  The key methods that should be implemented are
 	<literal>getMetaData</literal>, <literal>processAndOutputNewCASes</literal>,
 	<literal>collectionProcessComplete</literal>, and <literal>destroy</literal>.</para>
   </section>
 </chapter>