Issue #331: Convert remaining documentation to asciidoc

- Convert the remaining DocBook documentation to AsciiDoc
- Set up the Asciidoctor build
- Remove the old DocBook documentation
diff --git a/aggregate-uimaj-docbooks/pom.xml b/aggregate-uimaj-docbooks/pom.xml
deleted file mode 100644
index 33ce823..0000000
--- a/aggregate-uimaj-docbooks/pom.xml
+++ /dev/null
@@ -1,73 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements.  See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership.  The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License.  You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied.  See the License for the
-   specific language governing permissions and limitations
-   under the License.    
--->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-  <modelVersion>4.0.0</modelVersion>
-
-  <parent>
-    <groupId>org.apache.uima</groupId>
-    <artifactId>uimaj-parent</artifactId>
-    <version>3.5.0-SNAPSHOT</version>
-    <relativePath>../uimaj-parent/pom.xml</relativePath>
-  </parent>
-
-  <artifactId>aggregate-uimaj-docbooks</artifactId>
-  <packaging>pom</packaging>
-  <name>Apache UIMA Aggregate: ${project.artifactId}</name>
-  <url>${uimaWebsiteUrl}</url>
-
-  <build>
-    <plugins>
-      <plugin>
-        <artifactId>maven-resources-plugin</artifactId>
-        <executions>
-          <execution>
-            <goals>
-              <goal>copy-resources</goal>
-            </goals>
-            <phase>process-resources</phase>
-            <configuration>
-              <outputDirectory>${project.build.directory}</outputDirectory>
-              <resources>
-                <resource>
-                  <filtering>true</filtering>
-                  <directory>src/main/resources</directory>
-                  <includes>
-                    <include>**/*.*</include>
-                  </includes>
-                </resource>
-              </resources>
-            </configuration>
-          </execution>
-        </executions>
-      </plugin>
-    </plugins>
-  </build>
-
-  <modules>
-    <!-- Legacy Docbook -->
-    <module>../uima-docbook-overview-and-setup</module>
-    <module>../uima-docbook-references</module>
-    <module>../uima-docbook-tools</module>
-    <module>../uima-docbook-tutorials-and-users-guides</module>
-    <!-- Converted to Asciidoc -->
-    <module>../uima-doc-v3-users-guide</module>
-    <module>../uima-doc-v3-maintainers-guide</module>
-  </modules>
-</project>
\ No newline at end of file
diff --git a/aggregate-uimaj-docbooks/src/main/resources/d/index.html b/aggregate-uimaj-docbooks/src/main/resources/d/index.html
deleted file mode 100644
index f320158..0000000
--- a/aggregate-uimaj-docbooks/src/main/resources/d/index.html
+++ /dev/null
@@ -1,87 +0,0 @@
-
-<html>
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-<title>Apache UIMA Documentation Overview</title>
-<link rel="stylesheet" href="css/stylesheet-html.css" type="text/css">
-<meta name="generator" content="DocBook XSL Stylesheets V1.70.0">
-</head>
-<body bgcolor="white" text="black" link="#0000FF" vlink="#840084"
-  alink="#0000FF">
-<div class="book" lang="en" id="d0e2">
-<div class="titlepage">
-<div>
-<div>
-<h1 class="title"><a name="d0e2"></a>UIMA Documentation Overview</h1>
-</div>
-<div>
-<div class="authorgroup">
-<p>Authors: The Apache UIMA Development Community</p>
-</div>
-</div>
-<div>
-<p class="releaseinfo">Version ${project.version}</p>
-</div>
-<div>
-<p class="copyright">Copyright &copy; 2006, ${buildYear} The Apache Software
-Foundation</p>
-</div>
-<div>
-<p class="copyright">Copyright &copy; 2004, 2006 International
-Business Machines Corporation</p>
-</div>
-<div>
-<div class="legalnotice"><a name="d0e15"></a>
-<p><b>License and Disclaimer.&nbsp;</b>The ASF licenses this
-documentation to you under the Apache License, Version 2.0 (the
-"License"); you may not use this documentation except in compliance with
-the License. You may obtain a copy of the License at</p>
-<div class="blockquote">
-<blockquote class="blockquote"><a
-  href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a></blockquote>
-</div>
-<p>Unless required by applicable law or agreed to in writing, this
-documentation and its contents are distributed under the License on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
-express or implied. See the License for the specific language governing
-permissions and limitations under the License.</p>
-<p></p>
-<p><b>Trademarks.&nbsp;</b>All terms mentioned in the text that are
-known to be trademarks or service marks have been appropriately
-capitalized. Use of such terms in this book should not be regarded as
-affecting the validity of the the trademark or service mark.</p>
-</div>
-</div>
-<div>
-<p class="pubdate">
-${buildMonth}, ${buildYear}
-</p>
-</div>
-</div>
-<hr>
-</div>
-
-<h3>Books in the UIMA Documentation</h3>
-
-<p>
-  The UIMA documentation is available in both PDF and HTML formats.  It is
-  divided into four books.
-</p>
-<dl>
-<dt><span class="chapter"><a href="overview_and_setup.html">1. Overview &amp; Setup</a></span></dt>
-  <dd>This includes overview material, a general guide to all the rest of the documentation,
-  and some setup instructions if you are using the Eclipse IDE.</dd>
-<dt><span class="chapter"><a href="tutorials_and_users_guides.html">2. UIMA Tutorial and Developers' Guides</a></span></dt>
-  <dd>This is a set of tutorial chapters and some general overview guides to the major features and capabilities of UIMA.</dd>
-<dt><span class="chapter"><a href="tools.html">3. UIMA Tools Guide and Reference</a></span></dt>
-  <dd>UIMA comes with a set of tools, which are described in this book.</dd>
-<dt><span class="chapter"><a href="references.html">4. UIMA References</a></span></dt>
-  <dd>Reference materials, covering the XML descriptors, the major APIs, the data interchange formats,
-    and the component packaging.</dd>
-</dl>
-
-<h3>Release Notes</h3>
-<p>Click <a href="RELEASE_NOTES.html">RELEASE_NOTES.html</a> for the ${project.version} Apache UIMA JAVA SDK
-release notes.</p>
-
-</html>
diff --git a/aggregate-uimaj/pom.xml b/aggregate-uimaj/pom.xml
index f09129c..4f0ae3a 100644
--- a/aggregate-uimaj/pom.xml
+++ b/aggregate-uimaj/pom.xml
@@ -43,8 +43,9 @@
     <module>../uimaj-component-test-util</module>
     <module>../jVinci</module>
     <module>../aggregate-uimaj-eclipse-plugins</module>
-    <module>../aggregate-uimaj-docbooks</module>
-    <!--module>distr-superPom</module-->
+    <module>../uimaj-documentation</module>
+    <module>../uima-doc-v3-users-guide</module>
+    <module>../uima-doc-v3-maintainers-guide</module>
     <module>../uimaj-document-annotation</module>
     <module>../PearPackagingMavenPlugin</module>
     <module>../jcasgen-maven-plugin</module>
diff --git a/pom.xml b/pom.xml
index 14a88d6..1324556 100644
--- a/pom.xml
+++ b/pom.xml
@@ -286,7 +286,6 @@
                   <exclude>jVinci/**</exclude>
                   <exclude>PearPackagingMavenPlugin/**</exclude>
                   <exclude>jcasgen-maven-plugin/**</exclude>
-                  <exclude>uima-docbook-*/**</exclude>
                   <exclude>uima-doc-*/**</exclude>
                   <exclude>uimaj-adapter-*/**</exclude>
                   <exclude>uimaj-bom/**</exclude>
diff --git a/uima-doc-v3-users-guide/pom.xml b/uima-doc-v3-users-guide/pom.xml
index f7782fc..81fb224 100644
--- a/uima-doc-v3-users-guide/pom.xml
+++ b/uima-doc-v3-users-guide/pom.xml
@@ -76,7 +76,6 @@
             <docinfo1>true</docinfo1>
             <project-version>${project.version}</project-version>
             <revnumber>${project.version}</revnumber>
-            <product-name>Apache UIMA Version 3 User's Guide</product-name>
             <product-website-url>https://uima.apache.org</product-website-url>
             <icons>font</icons>
           </attributes>
diff --git a/uima-doc-v3-users-guide/src/docs/asciidoc/version_3_users_guide.adoc b/uima-doc-v3-users-guide/src/docs/asciidoc/version_3_users_guide.adoc
index 253ad04..7cf1923 100644
--- a/uima-doc-v3-users-guide/src/docs/asciidoc/version_3_users_guide.adoc
+++ b/uima-doc-v3-users-guide/src/docs/asciidoc/version_3_users_guide.adoc
@@ -15,7 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-= Apache UIMA™
+= Apache UIMA™ - UIMA 3 User's Guide
 :Author: Apache UIMA™ Development Community
 :toc-title: UIMA 3 User's Guide
 
diff --git a/uima-docbook-overview-and-setup/pom.xml b/uima-docbook-overview-and-setup/pom.xml
deleted file mode 100644
index 2d7f4ec..0000000
--- a/uima-docbook-overview-and-setup/pom.xml
+++ /dev/null
@@ -1,50 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements.  See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership.  The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License.  You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied.  See the License for the
-   specific language governing permissions and limitations
-   under the License.    
--->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-	<modelVersion>4.0.0</modelVersion>
-  
-  <parent>
-    <groupId>org.apache.uima</groupId>
-    <artifactId>uimaj-parent</artifactId>
-    <version>3.5.0-SNAPSHOT</version>
-    <relativePath>../uimaj-parent/pom.xml</relativePath>
-  </parent>
-  
-	<artifactId>uima-docbook-overview-and-setup</artifactId>
-	<packaging>pom</packaging>
-	<name>Apache UIMA SDK Documentation - overview and setup</name>	
-  <url>${uimaWebsiteUrl}</url>
-   
-  <properties>
-    <!-- next property is the name of the top file under src/docbook without trailing .xml -->
-    <bookNameRoot>overview_and_setup</bookNameRoot>
-  </properties>
- 	
-  <repositories>
-    <repository>
-      <id>apache.snapshots</id>
-      <name>Apache Snapshot Repository</name>
-      <url>https://repository.apache.org/snapshots</url>
-      <releases>
-        <enabled>false</enabled>
-      </releases>
-    </repository>
-  </repositories>  
-</project>
\ No newline at end of file
diff --git a/uima-docbook-overview-and-setup/src/docbook/conceptual_overview.xml b/uima-docbook-overview-and-setup/src/docbook/conceptual_overview.xml
deleted file mode 100644
index 5501fdf..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/conceptual_overview.xml
+++ /dev/null
@@ -1,989 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY key_concepts "Key UIMA Concepts Introduced in this Section:">

-<!ENTITY imgroot "images/overview-and-setup/conceptual_overview_files/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ovv.conceptual">

-  <title>UIMA Conceptual Overview</title>

-  

-  <para>UIMA is an open, industrial-strength, scaleable and extensible platform for

-    creating, integrating and deploying unstructured information management solutions

-    from powerful text or multi-modal analysis and search components. </para>

-  

-  <para>The Apache UIMA project is an implementation of the Java UIMA framework available

-    under the Apache License, providing a common foundation for industry and academia to

-    collaborate and accelerate the world-wide development of technologies critical for

-    discovering vital knowledge present in the fastest growing sources of information

-    today.</para>

-  

-  <para>This chapter presents an introduction to many essential UIMA concepts. It is meant to

-    provide a broad overview to give the reader a quick sense of UIMA&apos;s basic

-    architectural philosophy and the UIMA SDK&apos;s capabilities. </para>

-  

-  <para>This chapter provides a general orientation to UIMA and makes liberal reference to

-    the other chapters in the UIMA SDK documentation set, where the reader may find detailed

-    treatments of key concepts and development practices. It may be useful to refer to <olink

-      targetdoc="&uima_docs_overview;" targetptr="ugr.glossary"/>, to become familiar

-    with the terminology in this overview.</para>

-  

-  <section id="ugr.ovv.conceptual.uima_introduction">

-    <title>UIMA Introduction</title>

-    <figure id="ugr.ovv.conceptual.fig.bridge">

-      <title>UIMA helps you build the bridge between the unstructured and structured

-        worlds</title>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.5in" format="PNG" fileref="&imgroot;image002.png"/>

-        </imageobject>

-        <textobject><phrase>Picture of a bridge between unstructured information

-          artifacts and structured metadata about those artifacts</phrase>

-        </textobject>

-      </mediaobject>

-    </figure>

-    

-    <para> Unstructured information represents the largest, most current and fastest

-      growing source of information available to businesses and governments. The web is just

-      the tip of the iceberg. Consider the mounds of information hosted in the enterprise and

-      around the world and across different media including text, voice and video. The

-      high-value content in these vast collections of unstructured information is,

-      unfortunately, buried in lots of noise. Searching for what you need or doing

-      sophisticated data mining over unstructured information sources presents new

-      challenges. </para>

-    

-    <para>An unstructured information management (UIM) application may be generally

-      characterized as a software system that analyzes large volumes of unstructured

-      information (text, audio, video, images, etc.) to discover, organize and deliver

-      relevant knowledge to the client or application end-user. An example is an application

-      that processes millions of medical abstracts to discover critical drug interactions.

-      Another example is an application that processes tens of millions of documents to

-      discover key evidence indicating probable competitive threats. </para>

-    

-    <para>First and foremost, the unstructured data must be analyzed to interpret, detect

-      and locate concepts of interest, for example, named entities like persons,

-      organizations, locations, facilities, products etc., that are not explicitly tagged

-      or annotated in the original artifact. More challenging analytics may detect things

-      like opinions, complaints, threats or facts. And then there are relations, for

-      example, located in, finances, supports, purchases, repairs etc. The list of concepts 

-      important for applications to discover in unstructured content is large, varied and 

-      often domain specific. 

-      Many different component analytics may solve different parts of the overall analysis task. 

-      These component analytics must interoperate and must be easily combined to facilitate 

-      the development of UIM applications.</para>

-    

-    <para>The result of analysis are used to populate structured forms so that conventional 

-      data processing and search technologies 

-      like search engines, database engines or OLAP

-      (On-Line Analytical Processing, or Data Mining) engines 

-      can efficiently deliver the newly discovered content in response to the client requests 

-      or queries.</para>

-    

-    <para>In analyzing unstructured content, UIM applications make use of a variety of

-      analysis technologies including:</para>

-    

-    <itemizedlist spacing="compact">

-      <listitem><para>Statistical and rule-based Natural Language Processing

-        (NLP)</para>

-      </listitem>

-      <listitem><para>Information Retrieval (IR)</para>

-      </listitem>

-      <listitem><para>Machine learning</para>

-      </listitem>

-      <listitem><para>Ontologies</para>

-      </listitem>

-      <listitem><para>Automated reasoning and</para>

-      </listitem>

-      <listitem><para>Knowledge Sources (e.g., CYC, WordNet, FrameNet, etc.)</para>

-      </listitem>

-      

-    </itemizedlist>

-    

-    <para>Specific analysis capabilities using these technologies are developed 

-      independently using different techniques, interfaces and platforms.

-      </para>

-    

-    <para>The bridge from the unstructured world to the structured world is built through the

-      composition and deployment of these analysis capabilities. This integration is often

-      a costly challenge. </para>

-    

-    <para>The Unstructured Information Management Architecture (UIMA) is an architecture

-      and software framework that helps you build that bridge. It supports creating,

-      discovering, composing and deploying a broad range of analysis capabilities and

-      linking them to structured information services.</para>

-    

-    <para>UIMA allows development teams to match the right skills with the right parts of a

-      solution and helps enable rapid integration across technologies and platforms using a

-      variety of different deployment options. These ranging from tightly-coupled

-      deployments for high-performance, single-machine, embedded solutions to parallel

-      and fully distributed deployments for highly flexible and scaleable

-      solutions.</para>

-    

-  </section>

-  

-  <section id="ugr.ovv.conceptual.architecture_framework_sdk">

-    <title>The Architecture, the Framework and the SDK</title>

-    <para>UIMA is a software architecture which specifies component interfaces, data

-      representations, design patterns and development roles for creating, describing,

-      discovering, composing and deploying multi-modal analysis capabilities.</para>

-    

-    <para>The <emphasis role="bold">UIMA framework</emphasis> provides a run-time

-      environment in which developers can plug in their UIMA component implementations and

-      with which they can build and deploy UIM applications. The framework is not specific to

-      any IDE or platform. Apache hosts a Java and (soon) a C++ implementation of the UIMA

-      Framework.</para>

-    

-    <para>The <emphasis role="bold">UIMA Software Development Kit (SDK)</emphasis>

-      includes the UIMA framework, plus tools and utilities for using UIMA. Some of the

-      tooling supports an Eclipse-based ( <ulink url="http://www.eclipse.org/"/>)

-      development environment. </para>

-    

-  </section>

-  

-  <section id="ugr.ovv.conceptual.analysis_basics">

-    <title>Analysis Basics</title>

-    <note><title>&key_concepts;</title><para>Analysis Engine, Document, Annotator, Annotator

-      Developer, Type, Type System, Feature, Annotation, CAS, Sofa, JCas, UIMA

-      Context.</para>

-    </note>

-    

-    <section id="ugr.ovv.conceptual.aes_annotators_and_analysis_results">

-      <title>Analysis Engines, Annotators &amp; Results</title>

-      <figure id="ugr.ovv.conceptual.metadata_in_cas">

-        <title>Objects represented in the Common Analysis Structure (CAS)</title>

-        <mediaobject>

-          <imageobject role="html">

-            <imagedata format="PNG" width="594px" align="center" fileref="&imgroot;image004.png"/>

-          </imageobject>

-          <imageobject role="fo">

-            <imagedata format="PNG" width="5.5in" align="center" fileref="&imgroot;image004.png"/>

-          </imageobject>          

-          <textobject><phrase>Picture of some text, with a hierarchy of discovered

-            metadata about words in the text, including some image of a person as metadata

-            about that name.</phrase>

-          </textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>UIMA is an architecture in which basic building blocks called Analysis Engines

-        (AEs) are composed to analyze a document and infer and record descriptive attributes

-        about the document as a whole, and/or about regions therein. This descriptive

-        information, produced by AEs is referred to generally as <emphasis role="bold">

-        analysis results</emphasis>. Analysis results typically represent meta-data

-        about the document content. One way to think about AEs is as software agents that

-        automatically discover and record meta-data about original content.</para>

-      

-      <para>UIMA supports the analysis of different modalities including text, audio and

-        video. The majority of examples we provide are for text. We use the term <emphasis

-          role="bold">document, </emphasis>therefore, to generally refer to any unit of

-        content that an AE may process, whether it is a text document or a segment of audio, for

-        example. See the <olink targetdoc="&uima_docs_tutorial_guides;"/>

-        <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.mvs"/> for more information on multimodal processing

-        in UIMA.</para>

-      

-      <para>Analysis results include different statements about the content of a document.

-        For example, the following is an assertion about the topic of a document:</para>

-      

-      

-      <programlisting>(1) The Topic of document D102 is "CEOs and Golf".</programlisting>

-      

-      <para>Analysis results may include statements describing regions more granular than

-        the entire document. We use the term <emphasis role="bold">span</emphasis> to

-        refer to a sequence of characters in a text document. Consider that a document with the

-        identifier D102 contains a span, <quote>Fred Centers</quote> starting at

-        character position 101. An AE that can detect persons in text may represent the

-        following statement as an analysis result:</para>

-      

-      

-      <programlisting>(2) The span from position 101 to 112 in document D102 denotes a Person</programlisting>

-      

-      <para>In both statements 1 and 2 above there is a special pre-defined term or what we call

-        in UIMA a <emphasis role="bold">Type</emphasis>. They are

-        <emphasis>Topic</emphasis> and <emphasis>Person</emphasis> respectively.

-        UIMA types characterize the kinds of results that an AE may create &ndash; more on

-        types later.</para>

-      

-      <para>Other analysis results may relate two statements. For example, an AE might

-        record in its results that two spans are both referring to the same person:</para>

-      

-      

-      <programlisting>(3) The Person denoted by span 101 to 112 and 

-  the Person denoted by span 141 to 143 in document D102 

-  refer to the same Entity.</programlisting>

-      

-      <para>The above statements are some examples of the kinds of results that AEs may record

-        to describe the content of the documents they analyze. These are not meant to indicate

-        the form or syntax with which these results are captured in UIMA &ndash; more on that

-        later in this overview.</para>

-      

-      <para>The UIMA framework treats Analysis engines as pluggable, composible,

-        discoverable, managed objects. At the heart of AEs are the analysis algorithms that

-        do all the work to analyze documents and record analysis results. </para>

-      

-      <para>UIMA provides a basic component type intended to house the core analysis

-        algorithms running inside AEs. Instances of this component are called <emphasis

-          role="bold">Annotators</emphasis>. The analysis algorithm developer&apos;s

-        primary concern therefore is the development of annotators. The UIMA framework

-        provides the necessary methods for taking annotators and creating analysis

-        engines.</para>

-      

-      <para>In UIMA the person who codes analysis algorithms takes on the role of the

-          <emphasis role="bold">Annotator Developer</emphasis>. <olink

-          targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> 

-          in <olink targetdoc="&uima_docs_tutorial_guides;"/> will take the reader

-        through the details involved in creating UIMA annotators and analysis

-        engines.</para>

-      

-      <para>At the most primitive level an AE wraps an annotator adding the necessary APIs and

-        infrastructure for the composition and deployment of annotators within the UIMA

-        framework. The simplest AE contains exactly one annotator at its core. Complex AEs

-        may contain a collection of other AEs each potentially containing within them other

-        AEs. </para>

-    </section>

-    

-    <section id="ugr.ovv.conceptual.representing_results_in_cas">

-      <title>Representing Analysis Results in the CAS</title>

-      

-      <para>How annotators represent and share their results is an important part of the UIMA

-        architecture. UIMA defines a <emphasis role="bold">Common Analysis Structure

-        (CAS)</emphasis> precisely for these purposes.</para>

-      

-      <para>The CAS is an object-based data structure that allows the representation of

-        objects, properties and values. Object types may be related to each other in a

-        single-inheritance hierarchy. The CAS logically (if not physically) contains the

-        document being analyzed. Analysis developers share and record their analysis

-        results in terms of an object model within the CAS. <footnote><para> We have plans to

-        extend the representational capabilities of the CAS and align its semantics with the

-        semantics of the OMG&apos;s Essential Meta-Object Facility (EMOF) and with the

-        semantics of the Eclipse Modeling Framework&apos;s ( <ulink

-          url="http://www.eclipse.org/emf/"/>) Ecore semantics and XMI-based

-        representation.</para> </footnote> </para>

-      

-      <para>The UIMA framework includes an implementation and interfaces to the CAS. For a

-        more detailed description of the CAS and its interfaces see <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>.</para>

-      

-      <para>A CAS that logically contains statement 2 (repeated here for your

-        convenience)</para>

-      

-      

-      <programlisting>(2) The span from position 101 to 112 in document D102 denotes a Person</programlisting>

-      

-      <para>would include objects of the Person type. For each person found in the body of a

-        document, the AE would create a Person object in the CAS and link it to the span of text

-        where the person was mentioned in the document.</para>

-      

-      <para>While the CAS is a general purpose data structure, UIMA defines a

-        few basic types and affords the developer the ability to extend these to define an

-        arbitrarily rich <emphasis role="bold">Type System</emphasis>. You can think of a

-        type system as an object schema for the CAS.</para>

-      

-      <para>A type system defines the various types of objects that may be discovered in 

-        documents by AE's that subscribe to that type system.</para>

-      

-      <para>As suggested above, Person may be defined as a type. Types have properties or

-          <emphasis role="bold">features</emphasis>. So for example,

-        <emphasis>Age</emphasis> and <emphasis>Occupation</emphasis> may be defined as

-        features of the Person type.</para>

-      

-      <para>Other types might be <emphasis>Organization, Company, Bank, Facility, Money,

-        Size, Price, Phone Number, Phone Call, Relation, Network Packet, Product, Noun

-        Phrase, Verb, Color, Parse Node, Feature Weight Array</emphasis> etc.</para>

-      

-      <para>There are no limits to the different types that may be defined in a type system. A

-        type system is domain and application specific.</para>

-      

-      <para>Types in a UIMA type system may be organized into a taxonomy. For example,

-        <emphasis>Company</emphasis> may be defined as a subtype of

-        <emphasis>Organization</emphasis>. <emphasis>NounPhrase</emphasis> may be a

-        subtype of a <emphasis>ParseNode</emphasis>.</para>

-      

-      <section id="ugr.ovv.conceptual.annotation_type">

-        <title>The Annotation Type</title>

-        

-        <para>A general and common type used in artifact analysis and from which additional

-          types are often derived is the <emphasis role="bold">annotation</emphasis>

-          type. </para>

-        

-        <para>The annotation type is used to annotate or label regions of an artifact. Common

-          artifacts are text documents, but they can be other things, such as audio streams.

-          The annotation type for text includes two features, namely

-          <emphasis>begin</emphasis> and <emphasis>end</emphasis>. Values of these

-          features represent integer offsets in the artifact and delimit a span. Any

-          particular annotation object identifies the span it annotates with the

-          <emphasis>begin</emphasis> and <emphasis>end</emphasis> features.</para>

-        

-        <para>The key idea here is that the annotation type is used to identify and label or

-          <quote>annotate</quote> a specific region of an artifact.</para>

-        

-        <para>Consider that the Person type is defined as a subtype of annotation. An

-          annotator, for example, can create a Person annotation to record the discovery of a

-          mention of a person between position 141 and 143 in document D102. The annotator can

-          create another person annotation to record the detection of a mention of a person in

-          the span between positions 101 and 112. </para>

-      </section>

-      <section id="ugr.ovv.conceptual.not_just_annotations">

-        <title>Not Just Annotations</title>

-        

-        <para>While the annotation type is a useful type for annotating regions of a

-          document, annotations are not the only kind of types in a CAS. A CAS is a general

-          representation scheme and may store arbitrary data structures to represent the

-          analysis of documents.</para>

-        

-        <para>As an example, consider statement 3 above (repeated here for your

-          convenience).</para>

-        

-        

-        <programlisting>(3) The Person denoted by span 101 to 112 and 

-  the Person denoted by span 141 to 143 in document D102 

-  refer to the same Entity.</programlisting>

-        

-        <para>This statement mentions two person annotations in the CAS; the first, call it

-          P1, delimiting the span from 101 to 112, and the other, call it P2, delimiting the span

-          from 141 to 143. Statement 3 asserts explicitly that these two spans refer to the

-          same entity. This means that while there are two expressions in the text

-          represented by the annotations P1 and P2, each refers to one and the same person.

-          </para>

-        

-        <para>The Entity type may be introduced into a type system to capture this kind of

-          information. The Entity type is not an annotation. It is intended to represent an

-          object in the domain which may be referred to by different expressions (or

-          mentions) occurring multiple times within a document (or across documents within

-          a collection of documents). The Entity type has a feature named

-          <emphasis>occurrences</emphasis>. This feature is used to point to all the

-          annotations believed to label mentions of the same entity.</para>

-        

-        <para>Consider that the spans annotated by P1 and P2 were <quote>Fred

-          Center</quote> and <quote>He</quote> respectively. The annotator might create

-          a new Entity object called

-          <code>FredCenter</code>. To represent the relationship in statement 3 above,

-          the annotator may link FredCenter to both P1 and P2 by making them values of its

-          <emphasis>occurrences</emphasis> feature.</para>
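
The relationship between the two Person annotations and the Entity can be sketched in plain Java. This is only an illustration of the data model described above, not the UIMA CAS or JCas API: the class names `Annotation`, `Person`, `Entity` and the helper `buildFredCenter` are hypothetical stand-ins, and only the offsets (101 to 112, 141 to 143) and the `occurrences` feature come from the text.

```java
import java.util.ArrayList;
import java.util.List;

final class EntitySketch {

    /** Hypothetical stand-in for the annotation type: a span with begin and end offsets. */
    static class Annotation {
        final int begin;
        final int end;

        Annotation(int begin, int end) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid span: " + begin + " to " + end);
            }
            this.begin = begin;
            this.end = end;
        }
    }

    /** Person is modeled as a subtype of annotation, as in the text. */
    static class Person extends Annotation {
        Person(int begin, int end) {
            super(begin, end);
        }
    }

    /** Entity is not an annotation; it points to the annotations that mention it. */
    static class Entity {
        final String name;
        final List<Annotation> occurrences = new ArrayList<>();

        Entity(String name) {
            this.name = name;
        }
    }

    /** Builds the structure asserted by statement 3. */
    static Entity buildFredCenter() {
        Person p1 = new Person(101, 112); // the mention "Fred Center"
        Person p2 = new Person(141, 143); // the mention "He"
        Entity fredCenter = new Entity("FredCenter");
        fredCenter.occurrences.add(p1);
        fredCenter.occurrences.add(p2);
        return fredCenter;
    }

    public static void main(String[] args) {
        Entity entity = buildFredCenter();
        for (Annotation a : entity.occurrences) {
            System.out.println(entity.name + " is mentioned in span " + a.begin + " to " + a.end);
        }
    }
}
```

Keeping `Entity` outside the `Annotation` hierarchy mirrors the point of this section: the entity has no span of its own and is linked to text only through its occurrences.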

-        

-        <para> <xref linkend="ugr.ovv.conceptual.metadata_in_cas"/> also

-          illustrates that an entity may be linked to annotations referring to regions of

-          image documents as well. To do this the annotation type would have to be extended

-          with the appropriate features to point to regions of an image.</para>

-      </section>

-      

-      <section id="ugr.ovv.conceptual.multiple_views_within_a_cas">

-        <title>Multiple Views within a CAS</title>

-        

-        <para>UIMA supports the simultaneous analysis of multiple views of a document. This

-          support comes in handy for processing multiple forms of the artifact, for example, the audio

-          and the closed captioned views of a single speech stream, or the tagged and detagged 

-          views of an HTML document.</para>

-        

-        <para>AEs analyze one or more views of a document. Each view contains a specific

-            <emphasis role="bold">subject of analysis (Sofa)</emphasis>, plus a set of

-          indexes holding metadata indexed by that view. The CAS, overall, holds one or more

-          CAS Views, plus the descriptive objects that represent the analysis results for

-          each. </para>

-        

-        <para>Another common example of using CAS Views is for different translations of a

-          document. Each translation may be represented with a different CAS View. Each

-          translation may be described by a different set of analysis results. For more

-          details on CAS Views and Sofas see <olink

-            targetdoc="&uima_docs_tutorial_guides;"/> <olink

-            targetdoc="&uima_docs_tutorial_guides;"

-            targetptr="ugr.tug.mvs"/> and <olink

-            targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>. </para>

-      </section>

-    </section>

-    

-    <section id="ugr.ovv.conceptual.interacting_with_cas_and_external_resources">

-      <title>Interacting with the CAS and External Resources</title>

-      <titleabbrev>Using CASes and External Resources</titleabbrev>

-      

-      <para>The two main interfaces that a UIMA component developer interacts with are the

-        CAS and the UIMA Context.</para>

-      

-      <para>UIMA provides an efficient implementation of the CAS with multiple programming

-        interfaces. Through these interfaces, the annotator developer interacts with the

-        document and reads and writes analysis results. The CAS interfaces provide a suite of

-        access methods that allow the developer to obtain indexed iterators to the different

-        objects in the CAS. See <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.cas"/>. While many objects may exist in a CAS, the annotator

-        developer can obtain a specialized iterator to all Person objects associated with a

-        particular view, for example.</para>

-      

-      <para>For Java annotator developers, UIMA provides the JCas. This interface provides

-        the Java developer with a natural interface to CAS objects. Each type declared in the

-        type system appears as a Java Class; the UIMA framework renders the Person type as a

-        Person class in Java. As the analysis algorithm detects mentions of persons in the

-        documents, it can create Person objects in the CAS. For more details on how to interact

-        with the CAS using this interface, refer to <olink targetdoc="&uima_docs_ref;"

-        /> <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.jcas"/>.</para>

-      

-      <para>The component developer, in addition to interacting with the CAS, can access

-        external resources through the framework&apos;s resource manager interface

-        called the <emphasis role="bold">UIMA Context</emphasis>. This interface, among

-        other things, can ensure that different annotators working together in an aggregate

-        flow may share the same instance of an external file or remote resource accessed

-        via its URL, for example. For details on using

-        the UIMA Context see <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aae"/>.</para>

-      

-    </section>

-    <section id="ugr.ovv.conceptual.component_descriptors">

-      <title>Component Descriptors</title>

-      <para>UIMA defines interfaces for a small set of core components that users of the

-        framework provide implementations for. Annotators and Analysis Engines are two of

-        the basic building blocks specified by the architecture. Developers implement them

-        to build and compose analysis capabilities and ultimately applications.</para>

-      

-      <para>There are other components in addition to these, which we will learn about

-        later, but for every component specified in UIMA there are two parts required for its

-        implementation:</para>

-      

-      <orderedlist spacing="compact">

-        <listitem><para>the declarative part and</para></listitem>

-        <listitem><para>the code part.</para></listitem>

-      </orderedlist>

-      

-      <para>The declarative part contains metadata describing the component, its

-        identity, structure and behavior and is called the <emphasis role="bold">

-        Component Descriptor</emphasis>. Component descriptors are represented in XML.

-        The code part implements the algorithm. The code part may be a program in Java.</para>

-      

-      <para>As a developer using the UIMA SDK, to implement a UIMA component it is always the

-        case that you will provide two things: the code part and the Component Descriptor.

-        Note that when you are composing an engine, the code may be already provided in

-        reusable subcomponents. In these cases you may not be developing new code but rather

-        composing an aggregate engine by pointing to other components where the code has been

-        included.</para>

-      

-      <para>Component descriptors are represented in XML and aid in component discovery,

-        reuse, composition and development tooling. The UIMA SDK provides tools for easily

-        creating and maintaining the component descriptors that relieve the developer from

-        editing XML directly. This tool is described briefly in <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aae"/>, and more

-        thoroughly in <olink targetdoc="&uima_docs_tools;"/>

-        <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>

-        .</para>

-      

-      <para>Component descriptors contain standard metadata including the

-        component&apos;s name, author, version, and a reference to the class that

-        implements the component.</para>

-      

-      <para>In addition to these standard fields, a component descriptor identifies the

-        type system the component uses and the types it requires in an input CAS and the types it

-        plans to produce in an output CAS.</para>

-      

-      <para>For example, an AE that detects person types may require as input a CAS that

-        includes a tokenization and deep parse of the document. The descriptor refers to a

-        type system to make the component&apos;s input requirements and output types

-        explicit. In effect, the descriptor includes a declarative description of the

-        component&apos;s behavior and can be used to aid in component discovery and

-        composition based on desired results. UIMA analysis engines provide an interface

-        for accessing the component metadata represented in their descriptors. For more

-        details on the structure of UIMA component descriptors refer to <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor"/>.</para>

-      

-    </section>

-  </section>

-  <section id="ugr.ovv.conceptual.aggregate_analysis_engines">

-    <title>Aggregate Analysis Engines</title>

-    

-    <note><title>&key_concepts;</title><para>Aggregate Analysis Engine, Delegate Analysis Engine,

-      Tightly and Loosely Coupled, Flow Specification, Analysis Engine Assembler</para>

-    </note>

-    

-    <figure id="ugr.ovv.conceptual.sample_aggregate">

-      <title>Sample Aggregate Analysis Engine</title>

-      <mediaobject>

-        <imageobject role="html">

-          <imagedata width="588px" format="PNG" fileref="&imgroot;image006.png"/>

-        </imageobject>

-        <imageobject role="fo">

-          <imagedata width="5.5in" format="PNG" fileref="&imgroot;image006.png"/>

-        </imageobject>

-        <textobject><phrase>Picture of multiple parts (a language identifier,

-          tokenizer, part of speech annotator, shallow parser, and named entity detector)

-          strung together into a flow, and all of them wrapped as a single aggregate object,

-          which produces as annotations the union of all the results of the individual

-          annotator components ( tokens, parts of speech, names, organizations, places,

-          persons, etc.)</phrase>

-        </textobject>

-      </mediaobject>

-    </figure>

-    

-    <para>A simple or primitive UIMA Analysis Engine (AE) contains a single annotator. AEs,

-      however, may be defined to contain other AEs organized in a workflow. These more complex

-      analysis engines are called <emphasis role="bold">Aggregate Analysis

-      Engines.</emphasis> </para>

-    

-    <para>Annotators tend to perform fairly granular functions, for example language

-      detection, tokenization or part of speech detection. 

-    These functions typically address just part of an overall analysis task. A workflow 

-      of component engines may be orchestrated to perform more complex tasks.</para>

-    

-    <para>An AE that performs named entity detection, for example, may

-      include a pipeline of annotators starting with language detection feeding

-      tokenization, then part-of-speech detection, then deep grammatical parsing and then

-      finally named-entity detection. Each step in the pipeline is required by the

-      subsequent analysis. For example, the final named-entity annotator can only do its

-      analysis if the previous deep grammatical parse was recorded in the CAS.</para>

-    

-    <para>Aggregate AEs are built to encapsulate potentially complex internal structure

-      and insulate it from users of the AE. In our example, the aggregate analysis engine

-      developer acquires the internal components, defines the necessary flow

-      between them and publishes the resulting AE. Consider the simple example illustrated

-      in <xref linkend="ugr.ovv.conceptual.sample_aggregate"/> where

-      <quote>MyNamed-EntityDetector</quote> is composed of a linear flow of more

-      primitive analysis engines.</para>

-    

-    <para>Users of this AE need not know how it is constructed internally but only need its name

-      and its published input requirements and output types. These must be declared in the

-      aggregate AE&apos;s descriptor. Aggregate AE&apos;s descriptors declare the components

-      they contain and a <emphasis role="bold">flow specification</emphasis>. The flow

-      specification defines the order in which the internal component AEs should be run. The

-      internal AEs specified in an aggregate are also called the <emphasis role="bold">

-      delegate analysis engines.</emphasis> The term "delegate" is used because aggregate AEs 

-      are thought to "delegate" functions to their internal AEs.</para>
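
The idea of an aggregate delegating to its internal engines in a declared order can be sketched as follows. This is a minimal illustration in plain Java, not the UIMA API: the `Cas` and `Aggregate` classes, the `runExample` helper and the result labels are assumptions made for the example, and the fixed list order stands in for a simple linear flow specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class AggregateSketch {

    /** Hypothetical stand-in for a CAS: the document text plus recorded result labels. */
    static class Cas {
        final String text;
        final List<String> results = new ArrayList<>();

        Cas(String text) {
            this.text = text;
        }
    }

    /** An aggregate hands the CAS to each delegate in the declared order. */
    static class Aggregate implements Consumer<Cas> {
        private final List<Consumer<Cas>> delegates;

        Aggregate(List<Consumer<Cas>> delegates) {
            this.delegates = new ArrayList<>(delegates);
        }

        @Override
        public void accept(Cas cas) {
            for (Consumer<Cas> delegate : delegates) {
                delegate.accept(cas);
            }
        }
    }

    /** Runs a three-step linear flow and returns the enriched CAS. */
    static Cas runExample() {
        Consumer<Cas> languageDetector = cas -> cas.results.add("language");
        Consumer<Cas> tokenizer = cas -> cas.results.add("tokens");
        Consumer<Cas> namedEntityDetector = cas -> {
            // A later step depends on results recorded by an earlier one.
            if (!cas.results.contains("tokens")) {
                throw new IllegalStateException("tokens must be recorded first");
            }
            cas.results.add("named entities");
        };

        Aggregate aggregate = new Aggregate(
                Arrays.asList(languageDetector, tokenizer, namedEntityDetector));
        Cas cas = new Cas("Fred Center works here.");
        aggregate.accept(cas);
        return cas;
    }

    public static void main(String[] args) {
        System.out.println(runExample().results);
    }
}
```

Because `Aggregate` implements the same interface as its delegates, a caller cannot tell whether it is using a primitive or an aggregate engine, which is the encapsulation described above.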

-    

-    <para>

-      In UIMA 2.0, the developer can implement a "Flow Controller" and include it as part 

-      of an aggregate AE by referring to it in the aggregate AE's descriptor. 

-      The flow controller is responsible for computing the "flow", that is, 

-      for determining the order in which the delegate AEs will process the CAS. 

-      The Flow Controller has access to the CAS and any external resources it may require 

-      for determining the flow. It can do this dynamically at run-time, it can 

-      make multi-step decisions and it can consider any sort of flow specification 

-      included in the aggregate AE's descriptor. See

-      <olink targetdoc="&uima_docs_tutorial_guides;"/> 

-      <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc"/> 

-      for details on the UIMA Flow Controller interface.

-    </para>

-    

-    <para>We refer to the development role associated with building an aggregate from

-      delegate AEs as the <emphasis role="bold">Analysis Engine Assembler</emphasis>

-      .</para>

-    

-    <para>The UIMA framework, given an aggregate analysis engine descriptor, will run all

-      delegate AEs, ensuring that each one gets access to the CAS in the sequence produced by

-      the flow controller. The UIMA framework is equipped to handle different

-      deployments where the delegate engines, for example, are <emphasis role="bold">

-      tightly-coupled</emphasis> (running in the same process) or <emphasis role="bold">

-      loosely-coupled</emphasis> (running in separate processes or even on different

-      machines). The framework supports a number of remote protocols for loose coupling

-      deployments of aggregate analysis engines.</para>

-    

-    <para>The UIMA framework facilitates the deployment of AEs as remote services by using an

-      adapter layer that automatically creates the necessary infrastructure in response to

-      a declaration in the component&apos;s descriptor. For more details on creating

-      aggregate analysis engines refer to <olink targetdoc="&uima_docs_ref;"

-        /> <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor"/>. The component descriptor editor tool

-      assists in the specification of aggregate AEs from a repository of available engines.

-      For more details on this tool refer to <olink targetdoc="&uima_docs_tools;"

-        /> <olink targetdoc="&uima_docs_tools;"

-        targetptr="ugr.tools.cde"/>.</para>

-    

-    <para>The UIMA framework implementation has two built-in flow implementations: one

-      that supports a linear flow between components, and one with conditional branching

-      based on the language of the document. It also supports user-provided flow

-      controllers, as described in <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.fc"/>. Furthermore, the application developer is

-      free to create multiple AEs and provide their own logic to combine the AEs in arbitrarily

-      complex flows. For more details on this the reader may refer to <olink

-        targetdoc="&uima_docs_tutorial_guides;"/> <olink

-        targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.application.using_aes"/>.</para>

-    

-  </section>

-  

-  <section id="ugr.ovv.conceptual.applicaiton_building_and_collection_processing">

-    <title>Application Building and Collection Processing</title>

-    

-    <note><title>&key_concepts;</title><para>Process Method, Collection Processing Architecture,

-      Collection Reader, CAS Consumer, CAS Initializer, Collection Processing Engine,

-      Collection Processing Manager.</para></note>

-    

-    <section id="ugr.ovv.conceptual.using_framework_from_an_application">

-      <title>Using the framework from an Application</title>

-      

-      <figure id="ugr.ovv.conceptual.application_factory_ae">

-        <title>Using UIMA Framework to create and interact with an Analysis Engine</title>

-        <mediaobject>

-          <imageobject role="html">

-            <imagedata width="618px" align="center" format="PNG" fileref="&imgroot;image008.png"/>

-          </imageobject>

-          <imageobject role="fo">

-            <imagedata width="5.5in" align="center" format="PNG" fileref="&imgroot;image008.png"/>

-          </imageobject>

-          <textobject><phrase>Picture of application interacting with UIMA&apos;s

-            factory to produce an analysis engine, which acts as a container for annotators,

-            and interfaces with the application via the process and getMetaData methods

-            among others.</phrase>

-          </textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>As mentioned above, the basic AE interface may be thought of as simply CAS in/CAS

-        out.</para>

-      

-      <para>The application is responsible for interacting with the UIMA framework to

-        instantiate an AE, create or acquire an input CAS, initialize the input CAS with a

-        document and then pass it to the AE through the <emphasis role="bold">process

-        method</emphasis>. This interaction with the framework is illustrated in <xref

-          linkend="ugr.ovv.conceptual.application_factory_ae"/>. </para>

-      

-      <para>The UIMA AE Factory takes the declarative information from the Component

-        Descriptor and the class files implementing the annotator, and instantiates the AE

-        instance, setting up the CAS and the UIMA Context.</para>

-      

-      <para>The AE, possibly calling many delegate AEs internally, performs the overall

-        analysis and its process method returns the CAS containing new analysis results.

-        </para>

-      

-      <para>The application then decides what to do with the returned CAS. There are many

-        possibilities. For instance the application could: display the results, store the

-        CAS to disk for post processing, extract and index analysis results as part of a search

-        or database application etc.</para>

-      

-      <para>The UIMA framework provides methods to support the application developer in

-        creating and managing CASes and instantiating, running and managing AEs. Details

-        may be found in <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.application"/>.</para>

-    </section>

-    

-    <section id="ugr.ovv.conceptual.graduating_to_collection_processing">

-      <title>Graduating to Collection Processing</title>

-      <figure id="ugr.ovv.conceptual.fig.cpe">

-        <title>High-Level UIMA Component Architecture from Source to Sink</title>

-        <mediaobject>

-          <imageobject role="html">

-            <imagedata width="578px" format="PNG" align="center" fileref="&imgroot;image010.png"/>

-          </imageobject>

-          <imageobject role="fo">

-            <imagedata width="5.5in" format="PNG" align="center" fileref="&imgroot;image010.png"/>

-          </imageobject>

-        </mediaobject>

-      </figure>

-      

-      <para>Many UIM applications analyze entire collections of documents. They connect to

-        different document sources and do different things with the results. But in the

-        typical case, the application must generally follow these logical steps:

-        

-        <orderedlist spacing="compact">

-          <listitem><para>Connect to a physical source</para></listitem>

-          <listitem><para>Acquire a document from the source</para></listitem>

-          <listitem><para>Initialize a CAS with the document to be analyzed</para>

-            </listitem>

-          <listitem><para>Send the CAS to a selected analysis engine</para></listitem>

-          <listitem><para>Process the resulting CAS</para></listitem>

-          <listitem><para>Go back to 2 until the collection is processed</para>

-            </listitem>

-          <listitem><para>Do any final processing required after all the documents in the

-            collection have been analyzed</para></listitem>

-        </orderedlist> </para>
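
The logical steps listed above can be sketched as a simple loop. This is plain Java for illustration only and does not use the UIMA Collection Processing API: the `Cas` class, the `analyze` method (a whitespace token counter standing in for an analysis engine) and `processCollection` are hypothetical names, and an in-memory iterator stands in for the physical source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class CollectionLoopSketch {

    /** Hypothetical stand-in for a CAS holding one document and one analysis result. */
    static class Cas {
        String documentText;
        int tokenCount;
    }

    /** Stand-in for an analysis engine: counts whitespace-separated tokens. */
    static void analyze(Cas cas) {
        String trimmed = cas.documentText.trim();
        cas.tokenCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Steps 2 to 6: iterate over the source, analyze each document, collect results. */
    static List<Integer> processCollection(Iterator<String> source) {
        List<Integer> consumed = new ArrayList<>();
        while (source.hasNext()) {             // step 6: repeat until the collection is done
            String document = source.next();   // step 2: acquire a document
            Cas cas = new Cas();               // step 3: initialize a CAS with the document
            cas.documentText = document;
            analyze(cas);                      // step 4: send the CAS to an analysis engine
            consumed.add(cas.tokenCount);      // step 5: process the resulting CAS
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Step 1: "connect" to a source, here just an in-memory list.
        List<String> source = Arrays.asList("Fred Center is here", "He left");
        List<Integer> counts = processCollection(source.iterator());

        // Step 7: final processing after the whole collection has been analyzed.
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        System.out.println("documents=" + counts.size() + " tokens=" + total);
    }
}
```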

-      

-      <para>UIMA supports UIM application development for this general type of processing

-        through its <emphasis role="bold">Collection Processing

-        Architecture</emphasis>.</para>

-      

-      <para>As part of the collection processing architecture UIMA introduces two primary

-        components in addition to the annotator and analysis engine. These are the <emphasis

-          role="bold">Collection Reader</emphasis> and the <emphasis role="bold">CAS

-        Consumer</emphasis>. The complete flow from source, through document analysis,

-        and to CAS Consumers supported by UIMA is illustrated in <xref

-          linkend="ugr.ovv.conceptual.fig.cpe"/>.</para>

-      

-      <para>The Collection Reader&apos;s job is to connect to and iterate through a source

-        collection, acquiring documents and initializing CASes for analysis. </para>

-      

-      <!--

-      <para>Since the structure, access and iteration methods for

-      physical document sources vary independently from the format of stored

-      documents, UIMA defines another type of component called a <emphasis role="bold">CAS Initializer</emphasis>.  

-      The CAS Initializer&apos;s job is specific to a

-      document format and specialized logic for mapping that format to a CAS. In the

-      simplest case a CAS Initializer may take the document provided by the containing

-      Collection Reader and insert it as a subject of analysis (or Sofa) in the

-      CAS.  A more advanced scenario is one

-      where the CAS Initializer may be implemented to handle documents that conform to

-      a certain XML schema and map some subset of the XML tags to CAS types and then

-      insert the de-tagged document content as the subject of analysis.  Collection Readers may reuse plug-in CAS

-      Initializers for different document formats.</para>

-      -->

-      

-      <para>CAS Consumers, as the name suggests, function at the end of the flow. Their job is

-        to do the final CAS processing. A CAS Consumer may be implemented, for example, to

-        index CAS contents in a search engine, extract elements of interest and populate a

-        relational database or serialize and store analysis results to disk for subsequent

-        and further analysis. </para>

-      

-      <para>A UIMA <emphasis role="bold">Collection Processing Engine</emphasis> (CPE)

-        is an aggregate component that specifies a <quote>source to sink</quote> flow from a

-        Collection Reader through a set of analysis engines and then to a set of CAS Consumers.

-        </para>

-      

-      <para>CPEs are specified by XML files called CPE Descriptors. These are declarative

-        specifications that point to their contained components (Collection Readers,

-        analysis engines and CAS Consumers) and indicate a flow among them. The flow

-        specification allows for filtering capabilities to, for example, skip over AEs

-        based on CAS contents. Details about the format of CPE Descriptors may be found in

-        <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.

-        </para>

-      

-      <figure id="ugr.ovv.conceptual.fig.cpm">

-        <title>Collection Processing Manager in UIMA Framework</title>

-        <mediaobject>

-          <imageobject role="html">

-            <imagedata width="576px" align="center" format="PNG" fileref="&imgroot;image012.png"/>

-          </imageobject>

-          <imageobject role="fo">

-            <imagedata width="5.5in" align="center" format="PNG" fileref="&imgroot;image012.png"/>

-          </imageobject>

-          <textobject><phrase>box and arrows picture of application using CPE factory to

-            instantiate a Collection Processing Engine, and that engine interacting with

-            the application.</phrase></textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>The UIMA framework includes a <emphasis role="bold">Collection Processing

-        Manager</emphasis> (CPM). The CPM is capable of reading a CPE descriptor, and

-        deploying and running the specified CPE. <xref

-          linkend="ugr.ovv.conceptual.fig.cpm"/> illustrates the role of the CPM

-        in the UIMA Framework.</para>

-      

-      <para>Key features of the CPM are failure recovery, CAS management and scale-out.

-        </para>

-      

-      <para>Collections may be large and take considerable time to analyze. A configurable

-        behavior of the CPM is to log faults on single document failures while continuing to

-        process the collection. This behavior is commonly used because analysis components

-        often tend to be the weakest link -- in practice they may choke on strangely formatted

-        content. </para>

-      

-      <para>This deployment option requires that the CPM run in a separate process or a

-        machine distinct from the CPE components. A CPE may be configured to run with a variety

-        of deployment options that control the features provided by the CPM. For details see

-        <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>

-        .</para>

-      

-      <para>The UIMA SDK also provides a tool called the CPE Configurator. This tool provides

-        the developer with a user interface that simplifies the process of connecting up all

-        the components in a CPE and running the result. For details on using the CPE

-        Configurator see <olink targetdoc="&uima_docs_tools;"

-        /> <olink targetdoc="&uima_docs_tools;"

-          targetptr="ugr.tools.cpe"/>. This tool currently does not provide

-        access to the full set of CPE deployment options supported by the CPM; however, you can

-        configure other parts of the CPE descriptor by editing it directly. For details on how

-        to create and run CPEs refer to <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.cpe"/>.</para>

-      

-    </section>

-    

-  </section>

-  

-  <section id="ugr.ovv.conceptual.exploiting_analysis_results">

-    <title>Exploiting Analysis Results</title>

-    

-    <note><title>&key_concepts;</title><para>Semantic Search, XML Fragment Queries.</para>

-    </note>

-    

-    <section id="ugr.ovv.conceptual.semantic_search">

-      <title>Semantic Search</title>

-      

-      <para>In a simple UIMA Collection Processing Engine (CPE), a Collection Reader reads

-        documents from the file system and initializes CASs with their content. These are

-        then fed to an AE that annotates tokens and sentences. The CASs, now enriched with

-        token and sentence information, are passed to a CAS Consumer that populates a search

-        engine index. </para>

-      

-      <para>The search engine query processor can then use the token index to provide basic

-        key-word search. For example, given a query <quote>center</quote> the search

-        engine would return all the documents that contained the word

-        <quote>center</quote>.</para>

-      

-      <para><emphasis role="bold">Semantic Search</emphasis> is a search paradigm that

-        can exploit the additional metadata generated by analytics like a UIMA CPE.</para>

-      

-      <para>Consider that we plugged a named-entity recognizer into the CPE described

-        above. Assume this analysis engine is capable of detecting in documents and

-        annotating in the CAS mentions of persons and organizations.</para>

-      

-      <para>Complementing the named-entity recognizer, we add a CAS Consumer that extracts, in

-        addition to token and sentence annotations, the person and organization annotations added to

-        the CASs by the named-entity detector. It then feeds these into the semantic search

-        engine&apos;s index.</para>

-      

-      <para>A semantic search engine can exploit

-        this additional information from the CAS to support more powerful queries. For

-        example, imagine a user is looking for documents that mention an organization with

-        <quote>center</quote> in its name but is not sure of the full or precise name of the

-        organization. A key-word search on <quote>center</quote> would likely produce far

-        too many documents because <quote>center</quote> is a common and ambiguous term.

-        A semantic search engine might support a query language called

-        <emphasis role="bold">XML Fragments</emphasis>. This query language is

-        designed to exploit the CAS annotations entered in its index. The XML Fragment query,

-        for example,

-        

-        

-        <programlisting>&lt;organization&gt; center &lt;/organization&gt;</programlisting>

-        will produce first only documents that contain <quote>center</quote> where it

-        appears as part of a mention annotated as an organization by the named-entity

-        recognizer. This will likely be a much shorter list of documents more precisely

-        matching the user&apos;s interest.</para>

-      

-      <para>Consider taking this one step further. We add a relationship recognizer that

-        annotates mentions of the CEO-of relationship. We configure the CAS Consumer so that

-        it sends these new relationship annotations to the semantic search index as well.

-        With these additional analysis results in the index we can submit queries like

-        

-        

-        <programlisting>&lt;ceo_of&gt;

-    &lt;person&gt; center &lt;/person&gt;

-    &lt;organization&gt; center &lt;/organization&gt;

-&lt;/ceo_of&gt;</programlisting>

-        This query will precisely target documents that contain a mention of an organization

-        with <quote>center</quote> as part of its name where that organization is mentioned

-        as part of a

-        <code>CEO-of</code> relationship annotated by the relationship

-        recognizer.</para>

-      

-      <para>For more details about using UIMA and Semantic Search see the section on

-        integrating text analysis and search in <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.application"/>.</para>

-    </section>

-    

-    <section id="ugr.ovv.conceptual.databases">

-      <title>Databases</title>

-      

-      <para>Search engine indices are not the only place to deposit analysis results for use

-        by applications. Another classic example is populating databases. While many

-        approaches are possible with varying degrees of flexibility and performance, all are

-        highly dependent on application specifics. We included a simple sample CAS Consumer

-        that provides the basics for getting your analysis results into a relational

-        database. It extracts annotations from a CAS and writes them to a relational

-        database, using the open source Apache Derby database.</para>

-    </section>

-  </section>

-  

-  <section id="ugr.ovv.conceptual.multimodal_processing">

-    <title>Multimodal Processing in UIMA</title>

-    <para>In previous sections we&apos;ve seen how the CAS is initialized with an initial

-      artifact that will be subsequently analyzed by Analysis engines and CAS Consumers. The

-      first Analysis engine may make some assertions about the artifact, for example, in the

-      form of annotations. Subsequent Analysis engines will make further assertions about

-      both the artifact and previous analysis results, and finally one or more CAS Consumers

-      will extract information from these CASs for structured information storage.</para>

-    <figure id="ugr.ovv.conceptual.fig.multiple_sofas">

-      <title>Multiple Sofas in support of multi-modal analysis of an audio stream. Some

-        engines work on the audio <quote>view</quote>, some on the text

-        <quote>view</quote> and some on both.</title>

-      <mediaobject>

-        <imageobject role="html">

-          <imagedata width="576px" format="PNG" align="center" fileref="&imgroot;image014.png"/>

-        </imageobject>

-        <imageobject role="fo">

-          <imagedata width="5.5in" format="PNG" align="center" fileref="&imgroot;image014.png"/>

-        </imageobject>

-        <textobject><phrase>Picture showing audio on the left broken into segments by a

-          segmentation component, then sent to multiple analysis pipelines in parallel,

-          some processing the raw audio, others processing the recognized speech as

-          text.</phrase></textobject>

-      </mediaobject>

-    </figure>

-    <para>Consider a processing pipeline, illustrated in <xref

-        linkend="ugr.ovv.conceptual.fig.multiple_sofas"/>, that starts with an

-      audio recording of a conversation, transcribes the audio into text, and then extracts

-      information from the text transcript. Analysis Engines at the start of the pipeline are

-      analyzing an audio subject of analysis, and later analysis engines are analyzing a text

-      subject of analysis. The CAS Consumer will likely want to build a search index from

-      concepts found in the text to the original audio segment covered by the concept.</para>

-    

-    <para>What becomes clear from this relatively simple scenario is that the CAS must be

-      capable of simultaneously holding multiple subjects of analysis. Some analysis

-      engines will analyze only one subject of analysis, some will analyze one and create

-      another, and some will need to access multiple subjects of analysis at the same time.

-      </para>

-    

-    <para>The support in UIMA for multiple subjects of analysis is called <emphasis

-        role="bold">Sofa</emphasis> support; Sofa is an acronym which is derived from

-        <emphasis role="underline">S</emphasis>ubject <emphasis role="underline">

-      of</emphasis> <emphasis role="underline">A</emphasis>nalysis, which is a physical 

-      representation of an artifact (e.g., the detagged text of a web-page, the HTML 

-      text of the same web-page, the audio segment of a video, the close-caption text 

-      of the same audio segment). A Sofa may

-      be associated with CAS Views. A particular CAS will have one or more views, each view

-      corresponding to a particular subject of analysis, together with a set of the defined

-      indexes that index the metadata (that is, Feature Structures) created in that view.</para>

-    

-    <para>Analysis results can be indexed in, or <quote>belong</quote> to, a specific view.

-      UIMA components may be written in <quote>Multi-View</quote> mode - able to create and

-      access multiple Sofas at the same time, or in <quote>Single-View</quote> mode, simply

-      receiving a particular view of the CAS corresponding to a particular single Sofa. For

-      single-view mode components, it is up to the person assembling the component to supply

-      the needed information to ensure that a particular view is passed to the component at run

-      time. This is done using XML descriptors for Sofa mapping (see <olink

-        targetdoc="&uima_docs_tutorial_guides;"/> <olink

-        targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.mvs.sofa_name_mapping"/>).</para>

-    

-    <para>Multi-View capability brings benefits to text-only processing as well. An input

-      document can be transformed from one format to another. Examples of this include

-      transforming text from HTML to plain text or from one natural language to another.

-      </para>

-  </section>

-  

-  <section id="ugr.ovv.conceptual.next_steps">

-    <title>Next Steps</title>

-    

-    <para>This chapter presented a high-level overview of UIMA concepts. Along the way, it

-      pointed to other documents in the UIMA SDK documentation set where the reader can find

-      details on how to apply the related concepts in building applications with the UIMA

-      SDK.</para>

-    

-    <para>At this point the reader may return to the documentation guide in <olink

-        targetdoc="&uima_docs_overview;" targetptr="ugr.project_overview_doc_use"/>

-      to learn how they might proceed in getting started using UIMA.</para>

-    

-    <para>For a more detailed overview of the UIMA architecture, framework and development

-      roles we refer the reader to the following paper:</para>

-    

-    <para>D. Ferrucci and A. Lally, <quote>Building an example application using the

-      Unstructured Information Management Architecture,</quote> <emphasis>IBM Systems

-      Journal</emphasis> <emphasis role="bold">43</emphasis>, No. 3, 455-475 (2004).

-      </para>

-    

-    <para>This paper can be found online at <ulink

-        url="http://www.research.ibm.com/journal/sj43-3.html"/></para>

-  </section>

-  

-</chapter>

diff --git a/uima-docbook-overview-and-setup/src/docbook/eclipse_setup.xml b/uima-docbook-overview-and-setup/src/docbook/eclipse_setup.xml
deleted file mode 100644
index d3010c9..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/eclipse_setup.xml
+++ /dev/null
@@ -1,431 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY imgroot "images/overview-and-setup/eclipse_setup_files/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ovv.eclipse_setup">

-  <title>Setting up the Eclipse IDE to work with UIMA</title>

-  <titleabbrev>Eclipse IDE setup for UIMA</titleabbrev>

-  

-  <para>This chapter describes how to set up the UIMA SDK to work with Eclipse. Eclipse (<ulink

-      url="&url_eclipse;"/>) is a popular open-source Integrated Development

-    Environment for many things, including Java. The UIMA SDK does not require that you use

-    Eclipse. However, we recommend that you do use Eclipse because some useful UIMA SDK tools

-    run as plug-ins to the Eclipse platform and because the UIMA SDK examples are provided in a

-    form that's easy to import into your Eclipse environment.</para>

-  

-  <para>If you are not planning on using the UIMA SDK with Eclipse, you may skip this chapter and

-    read <olink targetdoc="&uima_docs_tutorial_guides;"/>

-    <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/>

-    next.</para>

-  

-  <para>This chapter provides instructions for

-    

-    <itemizedlist spacing="compact"><listitem><para>installing Eclipse, </para>

-      </listitem>

-      

-      <listitem><para>installing the UIMA SDK's Eclipse plugins into your Eclipse

-        environment, and </para></listitem>

-      

-      <listitem><para>importing the example UIMA code into an Eclipse project. </para>

-        </listitem></itemizedlist></para>

-  

-  <para>The UIMA Eclipse plugins are designed to be used with Eclipse version 4.10 (2018-12) or

-    later.

-  </para>

-  

-  <note><para>You will need to run Eclipse with Java 1.8 or later in order

-  to use the UIMA Eclipse plugins.</para></note>

-  

-  <section id="ugr.ovv.eclipse_setup.installation">

-    <title>Installation</title>

-    <section id="ugr.ovv.eclipse_setup.install_eclipse">

-      <title>Install Eclipse</title>

-      

-      <itemizedlist spacing="compact"><listitem><para>Go to <ulink

-          url="&url_eclipse;"/> and follow the instructions there to download Eclipse.

-        </para></listitem>

-        

-        <listitem><para>We recommend using the latest release level. 

-          Navigate to the Eclipse Release version you

-          want and download the archive for your platform.</para></listitem>

-        

-        <listitem><para>Unzip the archive to install Eclipse somewhere, e.g., c:\</para>

-          </listitem>

-        

-        <listitem><para>Eclipse has a bit of a learning curve. If you plan to make

-          significant use of Eclipse, check out the tutorial under the help menu. It is well

-          worth the effort. There are also books you can get that describe Eclipse and its

-        use.</para></listitem></itemizedlist>

-      

-      <para>The first time Eclipse starts up it will take a bit longer as it completes its

-        installation. A <quote>welcome</quote> page will come up. After you are through

-        reading the welcome information, click on the arrow to exit the welcome page and get to

-        the main Eclipse screens.</para>

-    </section>

-    

-    <section id="ugr.ovv.eclipse_setup.install_uima_eclipse_plugins">

-      <title>Installing the UIMA Eclipse Plugins</title>

-      

-      <para>The best way to do this is to use the Eclipse Install New Software mechanism, because that will 

-        ensure that all needed prerequisites are also installed.  See below for an alternative,

-        manual approach.</para>

-      

-        <note><para>If your computer is on an internet connection which uses a proxy server, you can

-        configure Eclipse to know about that. Put your proxy settings into Eclipse using the

-        Eclipse preferences by accessing the menus: Window &rarr; Preferences... &rarr;

-        Install/Update, and Enable HTTP proxy connection under the Proxy Settings with the

-        information about your proxy. </para></note>

-

-      

-      <para>To use the Eclipse Install New Software mechanism, start Eclipse, and then pick the menu 

-        <command>Help &rarr; Install New Software...</command>.  On the next page, enter

-        the following URL in the "Work with" box and press enter:

-        <itemizedlist>

-          <listitem><para><code>https://www.apache.org/dist/uima/eclipse-update-site/</code> or</para></listitem>

-          <listitem><para><code>https://www.apache.org/dist/uima/eclipse-update-site-v3/</code></para></listitem>

-        </itemizedlist>

-        Choose the second URL if you are working with the core UIMA Java SDK at version 3 or later.

-       </para>

-      

-      <para>Now select the plugin tools you wish to install, and click Next, and follow the 

-        remaining panels to install the UIMA plugins.  </para>

-    </section>

-

-    <!--      

-    <section id="ugr.ovv.eclipse_setup.install_emf">

-      <title>Manual Install additional Eclipse component: EMF</title>

-      <para>You can skip this section if you installed EMF using the above process.</para>

-      

-      <warning><para>EMF comes in many versions; <emphasis role="bold">you must install

-      the version that corresponds to the level of Eclipse that you are running.</emphasis>

-      This is automatically done for you if you install it using the Eclipse update mechanism,

-      described below. If you separately download an EMF package, you will need to verify it is

-      the version that corresponds to the level of Eclipse you are running, before installing

-      it.</para></warning>

-      

-      <para>Before installing EMF using these instructions, please go to <ulink

-          url="&url_emf;"/> and read the installation instructions, and then click on the

-        "Update Manager" link to see what url to use in the next step, where you use the built-in

-        facilities in Eclipse to find and install new features. </para>

-      

-      <para> The exact way to install EMF changes from time to time. In the next few paragraphs,

-        we try to give instructions that should work for most versions. Please see the end of

-        this section for shortcut instructions for the current version of Eclipse at the time

-        of this writing, Eclipse 3.3. </para>

-      

-      <para>Activate the software feature finding by using the menu: Help &rarr; Software

-        Updates &rarr; Find and Install. Select <quote>Search for new features to

-        install</quote>, push <quote>Next</quote>. Specify the update sites to use to

-        search for EMF, making sure the <quote>Ignore features not applicable to this

-        environment</quote> box is checked (at the bottom of the dialog), and push

-        <quote>Finish</quote>. A good site to use is one of the Discovery Sites (e.g. Callisto or Europa) - which has a

-        collection of Eclipse components including EMF. </para>

-            

-      <para>This will launch a search for updates to Eclipse; it may show a list of update site

-        mirrors &ndash; click OK. When it finishes, it shows a list of possible updates in an

-        expandable tree. Expand the tree nodes to find EMF SDK. The specific level may vary

-        from the level shown below as newer versions are released.</para>

-      

-      <informalfigure>

-        <mediaobject>

-          <imageobject>

-            <imagedata width="4in" format="JPG" fileref="&imgroot;image002.jpg"/>

-          </imageobject>

-          <textobject><phrase>Screenshot showing search results for EMF</phrase>

-          </textobject>

-        </mediaobject>

-      </informalfigure>

-      

-      <para>Click <quote>Next</quote>. Then pick Eclipse Modeling Framework (EMF), and

-        push <quote>Next</quote>, accept any licensing agreements, etc., until it

-        finishes the installation. It may say it's an <quote>unsigned feature</quote>;

-        proceed by clicking <quote>Install</quote>. If it recommends restarting, you may

-        do that.</para>

-      

-      <para>This will install EMF, without any extras. (If you want the whole EMF system,

-        including source and documentation, you can pick the <quote>EMF SDK</quote> and the

-        <quote>Examples for Eclipse Modeling Framework</quote>.)</para>

-      

-      <section id="ugr.ovv.eclipse_setup.install_emf_shortcut">

-        <title>EMF Installation Shortcut for Eclipse 3.2</title>

-        <para>Since Eclipse 3.2, all major Eclipse sub-projects coordinate their

-          release timeframes and publish the consolidated releases. The code name

-          for 3.2 was Callisto, the one for 3.3 is Europa.  You can

-          easily install EMF via the release discovery site as follows.

-          <orderedlist>

-            <listitem><para> From the Eclipse menu, select Help/Software Updates/Find

-              and Install.../Search for new features to install. </para></listitem>

-            <listitem><para> Check the "[release name] discovery site", push "Next". </para>

-              </listitem>

-            <listitem><para> Select a convenient mirror site. </para></listitem>

-            <listitem><para> Check the EMF box under "Models and model development"

-              </para></listitem>

-            <listitem><para> Follow the instructions for the rest of the install. </para>

-              </listitem>

-          </orderedlist> </para>

-      </section>

-      

-    </section>

-     -->

-    

-    <section id="ugr.ovv.eclipse_setup.install_uima_sdk">

-      <title>Install the UIMA SDK</title>

-      <para>If you haven't already done so, please download and install the UIMA SDK from

-          <ulink url="&url_apache_uima_download;"/>.  Be sure to set the environment variable

-          UIMA_HOME to point to the root of the installed UIMA SDK and run the

-          <literal>adjustExamplePaths.bat</literal> or <literal>adjustExamplePaths.sh</literal>

-          script, as explained in the README.</para>

-

-      <para>The environment variable UIMA_HOME is used by the command-line scripts in the

-          %UIMA_HOME%/bin directory as well as by Eclipse run configurations in the uimaj-examples

-          sample project.</para>

-

-    </section>
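The setup described above can be sketched for a Unix-like shell as follows. This is a minimal sketch under stated assumptions: the install directory `/opt/apache-uima` is a placeholder, and the location of `adjustExamplePaths.sh` under the SDK's `bin` directory is assumed rather than stated in this chapter, so consult the README if the script is not found there.

```shell
# Assumption: the SDK was unpacked to /opt/apache-uima; substitute your own install directory.
: "${UIMA_HOME:=/opt/apache-uima}"
export UIMA_HOME

# Assumption: adjustExamplePaths.sh sits in the SDK's bin directory; check the README if it does not.
if [ -x "$UIMA_HOME/bin/adjustExamplePaths.sh" ]; then
  "$UIMA_HOME/bin/adjustExamplePaths.sh"
else
  echo "adjustExamplePaths.sh not found under $UIMA_HOME/bin; see the README"
fi
```

Exporting the variable in a shell startup file makes it visible to the command-line scripts and, if Eclipse is launched from that environment, to the run configurations as well.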

-    

-    <section id="ugr.ovv.eclipse_setup.install_uima_eclipse_plugins_manually">

-      <title>Installing the UIMA Eclipse Plugins, manually</title>

-      

-      <para>If you installed the UIMA plugins using the update mechanism above, please skip this section.</para>

-      

-      <para>If you are unable to use the Eclipse Update mechanism to install the UIMA plugins, you 

-        can do this manually.  In the directory %UIMA_HOME%/eclipsePlugins (where

-        %UIMA_HOME% is the directory in which you installed the UIMA SDK), you will see a set of folders. Copy these

-        to your %ECLIPSE_HOME%/dropins directory (%ECLIPSE_HOME% is where you

-        installed Eclipse).</para>

-      

-    </section>
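On a Unix-like system, the manual copy described above might look like the following sketch. The function name `install_uima_plugins` is hypothetical and introduced only for illustration; the two arguments stand for the UIMA SDK and Eclipse install directories.

```shell
# Hypothetical helper: copy every plugin folder from the SDK into Eclipse's dropins directory.
# $1 = UIMA SDK install directory, $2 = Eclipse install directory (both supplied by the caller).
install_uima_plugins() {
  src="$1/eclipsePlugins"
  dest="$2/dropins"
  # Stop early if the SDK does not contain the expected plugin directory.
  [ -d "$src" ] || { echo "no eclipsePlugins directory in $1"; return 1; }
  mkdir -p "$dest"
  # Copy the contents of eclipsePlugins, not the directory itself.
  cp -R "$src"/. "$dest"/
}

# Usage, assuming both variables are set in your environment:
# install_uima_plugins "$UIMA_HOME" "$ECLIPSE_HOME"
```

After copying, restart Eclipse with the `-clean` option so that it re-examines its plugins.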

-    

-    <section id="ugr.ovv.eclipse_setup.start_eclipse">

-      <title>Start Eclipse</title>

-      <para>If you have Eclipse running, restart it (shut it down, and start it again) using

-        the

-        <code>-clean</code> option; you can do this by running the command

-        <command>eclipse -clean</command> (see explanation in the next section) in the

-        directory where you installed Eclipse. You may want to set up a desktop shortcut at

-        this point for Eclipse.</para>

-      

-      <section id="ugr.ovv.eclipse_setup.special_startup_parameter_clean">

-        <title>Special startup parameter for Eclipse: -clean</title>

-        <para>If you have modified the plugin structure (for example, by copying files directly in the

-          file system) after you started Eclipse for the first time, please include

-          the <quote>-clean</quote> parameter in the startup arguments to Eclipse,

-          <emphasis>one time</emphasis> (after any plugin modifications were done). This

-          is needed because Eclipse may not notice the changes you made, otherwise. This

-          parameter forces Eclipse to reexamine all of its plugins at startup and recompute

-          any cached information about them.</para>

-      </section>

-      

-    </section>

-  </section>

-  <section id="ugr.ovv.eclipse_setup.example_code">

-    <title>Setting up Eclipse to view Example Code</title>

-    <para>Later chapters refer to example code. Here's how to create a special project in Eclipse to

-      hold the examples.</para>

-    

-    <itemizedlist spacing="compact"><listitem><para>In Eclipse, if the Java

-      perspective is not already open, switch to it by going to Window &rarr; Open Perspective

-      &rarr; Java.</para></listitem>

-      

-      <listitem><para>Set up a class path variable named UIMA_HOME, whose value is the

-        directory where you installed the UIMA SDK. This is done as follows:

-        

-        <itemizedlist><listitem><para>Go to Window &rarr; Preferences &rarr; Java

-          &rarr; Build Path &rarr; Classpath Variables.</para></listitem>

-          

-          <listitem><para>Click <quote>New</quote></para></listitem>

-          

-          <listitem><para>Enter UIMA_HOME (all capitals, exactly as written) in the

-            <quote>Name</quote> field.</para></listitem>

-          

-          <listitem><para>Enter your installation directory (e.g. <literal>C:/Program

-            Files/apache-uima</literal>) in the <quote>Path</quote> field</para>

-            </listitem>

-          

-          <listitem><para>Click <quote>OK</quote> in the <quote>New Variable

-            Entry</quote> dialog</para></listitem>

-          

-          <listitem><para>Click <quote>OK</quote> in the <quote>Preferences</quote>

-            dialog</para></listitem>

-          

-          <listitem><para>If it asks you if you want to do a full build, click

-            <quote>Yes</quote> </para></listitem></itemizedlist></para>

-        </listitem>

-      

-      <listitem><para>Select the File &rarr; Import menu option</para></listitem>

-      

-      <listitem><para>Select <quote>General/Existing Project into Workspace</quote> and click

-        the <quote>Next</quote> button.</para></listitem>

-      

-      <listitem><para>Click <quote>Browse</quote> and browse to the

-        %UIMA_HOME%/examples directory</para></listitem>

-      

-      <listitem><para>Click <quote>Finish.</quote> This will create a new project called

-        <quote>uimaj-examples</quote> in your Eclipse workspace. There should be no

-        compilation errors. </para></listitem></itemizedlist>

-    

-    <para>To verify that you have set up the project correctly, check that there are no error

-      messages in the <quote>Problems</quote> view.</para>

-    

-  </section>

-   

-  <section id="ugr.ovv.eclipse_setup.adding_source">

-    <title>Adding the UIMA source code to the jar files</title>

-    

-    <note><para>If you are running a current version of Eclipse, and have the m2e (Maven extensions for Eclipse) 

-    plugin installed, Eclipse should be able to automatically download the source for the jars, so you may not need

-    to do anything special (it does take a few seconds, and you need an internet connection).</para></note>

-    <para>Otherwise, if you would like to be able to jump to the UIMA source code in Eclipse or to step

-    through it with the debugger, you can add the UIMA source code directly to the jar files.  This is

-    done via a shell script that comes with the source distribution.  To add the source code

-    to the jars, you need to:

-    </para>

-    

-    <itemizedlist>

-    

-    <listitem>

-    <para>

-    Download and unpack the UIMA source distribution.

-    </para>

-    </listitem>

-    

-    <listitem>

-    <para>

-    Download and install the UIMA binary distribution (the UIMA_HOME environment variable needs

-    to be set to point to where you installed the UIMA binary distribution).    

-    </para>

-    </listitem>

-    

-    <listitem>

-      <para>"cd" to the root directory of the source distribution</para>

-    </listitem>

-    

-    <listitem>

-    <para>

-    Execute the <command>src\main\readme_src\addSourceToJars</command> script in the root directory of the 

-    source distribution.

-    </para>

-    </listitem>

-    

-    </itemizedlist>

-    

-    <para>

-    This adds the source code to the jar files, and it will then be automatically available

-    from Eclipse.  There is no further Eclipse setup required.

-    </para>

-  

-  </section>

-  

-  

-  <section id="ugr.ovv.eclipse_setup.linking_uima_javadocs">

-     <title>Attaching UIMA Javadocs</title>

-     

-     <para>The binary distribution also includes the UIMA Javadocs.  They are

-       attached to the UIMA library Jar files in the uimaj-examples project described

-       above.  You can attach the Javadocs to your own project as well.  

-     </para>

-    

-     <note><para>If you attached the source as described in the previous section, you 

-     don't need to attach the Javadocs because the source includes the Javadoc comments.</para></note>

-     

-     <para>Attaching the Javadocs enables Javadoc help for UIMA APIs.  After they are 

-       attached, if you hover your mouse

-     over a UIMA API element, the corresponding Javadoc will appear.  

-       You can then press <quote>F2</quote> to make the hover "stick", or 

-       <quote>Shift-F2</quote> to open the default 

-       web-browser on your system to let you browse the entire Javadoc information 

-       for that element.

-     </para>

-     <para>If this pop-up behavior is something you don't want, you can turn it off

-     in the Eclipse preferences, in the menu Window &rarr; Preferences &rarr;

-       Java &rarr; Editors &rarr; hovers.

-     </para>

-    

-     <para>Eclipse also has a Javadoc "view" which you can show, using the Window &rarr;

-     Show View &rarr; Javadoc.</para>

-   

-     <para>See <olink targetdoc="&uima_docs_ref;"/>

-     <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs.libraries"/>

-     for information on how to set up a UIMA "library" with the Javadocs attached, which

-     can be reused for other projects in your Eclipse workspace.</para>

-                

-     <para>You can attach the Javadocs to each UIMA library jar you think you might be 

-       interested in.  It makes the most sense

-       for uima-core.jar, since you'll probably use the core APIs most of all.

-     </para>

-     

-     <para>Here's a screenshot of what you should see when you hover your mouse pointer over the

-     class name <quote>CAS</quote> in the source code.

-     </para>

-

-       <informalfigure>

-         <mediaobject>

-           <imageobject>

-             <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>

-           </imageobject>

-           <textobject><phrase>Screenshot of mouse-over for UIMA APIs</phrase>

-           </textobject>

-         </mediaobject>

-       </informalfigure>

-       

-   </section>

-   

-  <section id="ugr.ovv.eclipse_setup.running_external_tools_from_eclipse">

-    <title>Running external tools from Eclipse</title>

-    

-    <para>You can run many tools without using Eclipse at all, by using the shell scripts in the

-      UIMA SDK's bin directory. In addition, many tools can be run from inside Eclipse;

-      examples are the Document Analyzer, CPE Configurator, CAS Visual Debugger, 

-      and JCasGen. The uimaj-examples project provides Eclipse launch

-      configurations that make this easy to do.</para>

-    

-    <para>To run these tools from Eclipse:</para>

-    

-    <itemizedlist spacing="compact"><listitem><para>If the Java perspective is not

-      already open, switch to it by going to Window &rarr; Open Perspective &rarr;

-      Java.</para></listitem>

-      

-      <listitem><para>Go to Run &rarr; Run... </para></listitem>

-      

-      <listitem><para>In the window that appears, select <quote>UIMA CPE GUI</quote>,

-        <quote>UIMA CAS Visual Debugger</quote>, <quote>UIMA JCasGen</quote>, or 

-        <quote>UIMA Document Analyzer</quote>

-        from the list of run configurations on the left. (If you don't see these, please

-        select the uimaj-examples project and choose File

-        &rarr; Refresh).</para></listitem>

-      

-      <listitem><para>Press the <quote>Run</quote> button. The tools should start. Close

-        the tools by clicking the <quote>X</quote> in the upper right corner on the GUI.

-        </para></listitem></itemizedlist>

-    

-    <para>For instructions on using the Document Analyzer and CPE Configurator, 

-      in the <olink targetdoc="&uima_docs_tools;"/> book see <olink

-        targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>, and

-        <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>. For

-      instructions on using the CAS Visual Debugger and JCasGen, see <olink

-        targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cvd"/> and

-        <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>.</para>

-    

-  </section>

-  

-</chapter>

diff --git a/uima-docbook-overview-and-setup/src/docbook/faqs.xml b/uima-docbook-overview-and-setup/src/docbook/faqs.xml
deleted file mode 100644
index f6cd391..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/faqs.xml
+++ /dev/null
@@ -1,411 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.faqs">

-  <title>UIMA Frequently Asked Questions (FAQs)</title>

-  <titleabbrev>UIMA FAQs</titleabbrev>

-

-  <variablelist>

-    <varlistentry id="ugr.faqs.what_is_uima">

-    <term><emphasis role="bold">What is UIMA?</emphasis></term>

-        <listitem><para>UIMA stands for Unstructured Information Management

-          Architecture. It is a component software architecture for the development,

-          discovery, composition and deployment of multi-modal analytics for the analysis

-          of unstructured information.</para>

-          <para>UIMA processing occurs through a series of modules called 

-            <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. The result of analysis is an assignment of semantics to the elements of

-            unstructured data, for example, the indication that the phrase

-            <quote>Washington</quote> refers to a person&apos;s name or that it refers to a

-            place.</para>

-          

-          <para>An analysis engine&apos;s output can be saved in conventional structures,

-            for example, relational databases or search engine indices, where the content

-            of the original unstructured information may be efficiently accessed

-            according to its inferred semantics. </para>

-          

-          <para>UIMA supports developers in creating,

-            integrating, and deploying components across platforms and among dispersed

-            teams working to develop unstructured information management

-            applications.</para>

-        </listitem>

-      </varlistentry>

-      <varlistentry id="ugr.faqs.pronounce">

-        <term><emphasis role="bold">How do you pronounce UIMA?</emphasis></term>

-        <listitem><para>You &ndash; eee &ndash; muh. 

-        <!-- Or, in IPA notation, /juːiːmə/ (which does not

-        display correctly in our PDF documentation, so it's commented out). --></para></listitem>

-      </varlistentry>

-      <varlistentry id="ugr.faqs.difference_apache_uima">

-        <term><emphasis role="bold">What&apos;s the difference between UIMA and Apache UIMA?</emphasis></term>

-        <listitem><para>UIMA is an architecture which specifies component interfaces,

-          design patterns, data representations and development roles.</para>

-          

-          <para>Apache UIMA is an open source, Apache-licensed software project.  It includes run-time

-            frameworks in Java and C++, APIs and tools for implementing, composing, packaging

-            and deploying UIMA components.</para>

-          

-          <para>The UIMA run-time framework allows developers to plug in their components

-            and applications and run them on different platforms and according to different

-            deployment options that range from tightly-coupled (running in the same

-            process space) to loosely-coupled (distributed across different processes or

-            machines for greater scale, flexibility and recoverability).</para>

-            

-          <para>The UIMA project has several significant subprojects, including UIMA-AS (for flexibly

-          scaling out UIMA pipelines over clusters of machines), uimaFIT (a Java-centric set of

-          friendlier interfaces for using UIMA without the XML descriptors, plus many convenience

-          methods), UIMA-DUCC (for managing clusters of machines running scaled-out UIMA "jobs"

-          in a "fair" way), RUTA (Eclipse-based tooling and a runtime framework for development

-          of rule-based Annotators), and Addons (where you can find many

-          extensions).</para>

-        </listitem>

-      </varlistentry> 

-      

-      <varlistentry id="ugr.faqs.what_is_an_annotation">

-        

-        <term><emphasis role="bold">What is an Annotation?</emphasis></term>

-        <listitem><para>An annotation is metadata that is associated with a region of a

-          document. It is often a label, typically represented as a string of characters. The

-          region may be the whole document. </para>

-          

-          <para>An example is the label <quote>Person</quote> associated with the span of

-            text <quote>George Washington</quote>. We say that <quote>Person</quote>

-            annotates <quote>George Washington</quote> in the sentence <quote>George

-            Washington was the first president of the United States</quote>. The

-            association of the label

-            <quote>Person</quote> with a particular span of text is an annotation. Another

-            example may have an annotation represent a topic, like <quote>American

-            Presidents</quote> and be used to label an entire document.</para>

-          

-          <para>Annotations are not limited to regions of texts. An annotation may annotate

-            a region of an image or a segment of audio. The same concepts apply.</para>

-        </listitem>

-      </varlistentry>

- 

-  

-      <varlistentry id="ugr.faqs.what_is_the_cas">

-        <term><emphasis role="bold">What is the CAS?</emphasis></term>

-        <listitem><para>The CAS stands for Common Analysis Structure. It provides

-          cooperating UIMA components with a common representation and mechanism for

-          shared access to the artifact being analyzed (e.g., a document, audio file, video

-          stream etc.) and the current analysis results.</para></listitem>

-      </varlistentry>

-      <varlistentry id="ugr.faqs.what_does_the_cas_contain">

-        <term><emphasis role="bold">What does the CAS contain?</emphasis></term>

-        <listitem><para>The CAS is a data structure for which UIMA provides multiple

-          interfaces. It contains and provides the analysis algorithm or application

-          developer with access to</para>

-          

-          <itemizedlist spacing="compact">

-            

-            <listitem><para>the subject of analysis (the artifact being analyzed, like

-              the document),</para></listitem>

-            

-            <listitem><para>the analysis results or metadata (e.g., annotations, parse

-              trees, relations, entities etc.),</para></listitem>

-            

-            <listitem><para>indices to the analysis results, and</para></listitem>

-            

-            <listitem><para>the type system (a schema for the analysis results).</para>

-            </listitem>

-          </itemizedlist>

-          

-          <para>A CAS can hold multiple versions of the artifact being analyzed (for

-            instance, a raw HTML document, and a detagged version, or an English version and a

-            corresponding German version, or an audio sample, and the text that

-            corresponds, etc.). For each version there is a separate instance of the results

-            indices.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.only_annotations">

-        <term><emphasis role="bold">Does the CAS only contain Annotations?</emphasis></term>

-        <listitem><para>No. The CAS contains the artifact being analyzed plus the analysis

-          results. Analysis results are those metadata recorded by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> in the

-          CAS. The most common form of analysis result is the addition of an annotation. But an

-          analysis engine may write any structure that conforms to the CAS&apos;s type

-          system into the CAS. These may not be annotations but may be other things, for

-          example links between annotations and properties of objects associated with

-          annotations.</para>

-          <para>The CAS may have multiple representations of the artifact being analyzed, each one

-            represented in the CAS as a particular Subject of Analysis, or <link linkend="ugr.faqs.what_is_a_sofa">Sofa</link>.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.just_xml">

-        <term><emphasis role="bold">Is the CAS just XML?</emphasis></term>

-        <listitem><para>No, in fact there are many possible representations of the CAS. If all

-          of the <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> are running in the same process, an efficient, in-memory

-          data object is used. If a CAS must be sent to an analysis engine on a remote machine, it

-          can be done via an XML or a binary serialization of the CAS. </para>

-          

-          <para>The UIMA framework provides multiple serialization and de-serialization methods

-            in various formats, including XML.  See the Javadocs for the CasIOUtils class.

-            </para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.what_is_a_type_system">

-        <term><emphasis role="bold">What is a Type System?</emphasis></term>

-        <listitem><para>Think of a type system as a schema or class model for the <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. It defines

-          the types of objects and their properties (or features) that may be instantiated in

-          a CAS. A specific CAS conforms to a particular type system. UIMA components declare

-          their input and output with respect to a type system. </para>

-          

-          <para>Type Systems include the definitions of types, their properties, range

-            types (these can restrict the value of properties to other types) and

-            single-inheritance hierarchy of types.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.what_is_a_sofa">

-        <term><emphasis role="bold">What is a Sofa?</emphasis></term>

-        <listitem><para>Sofa stands for &ldquo;Subject of Analysis&rdquo;. A <link linkend="ugr.faqs.what_is_the_cas">CAS</link> is

-          associated with a single artifact being analyzed by a collection of UIMA analysis

-          engines. But a single artifact may have multiple independent views, each of which

-          may be analyzed separately by a different set of <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. For example,

-          a document may have different translations, each of which is associated

-          with the original document but each potentially analyzed by different engines. A

-          CAS may have multiple Views, each containing a different Subject of Analysis

-          corresponding to some version of the original artifact. This feature is ideal for

-          multi-modal analysis, where for example, one view of a video stream may be the video

-          frames and the other the closed captions.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.annotator_versus_ae">

-        <term><emphasis role="bold">What's the difference between an Annotator and an Analysis

-          Engine?</emphasis></term>

-        <listitem><para>In the terminology of UIMA, an annotator is simply some code that

-          analyzes documents and outputs <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on the content of the documents. The

-          UIMA framework takes the annotator, together with metadata describing such

-          things as the input requirements and output types of the annotator, and produces

-          an analysis engine. </para>

-          

-          <para>Analysis Engines contain the framework-provided infrastructure that

-            allows them to be easily combined with other analysis engines in different flows

-            and according to different deployment options (collocated or as web services,

-            for example). </para>

-          

-          <para>Analysis Engines are the framework-generated objects that an Application

-            interacts with. An Annotator is a user-written class that implements one of

-            the supported Annotator interfaces.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.web_services">

-        <term><emphasis role="bold">Are UIMA analysis engines web services?</emphasis></term>

-        <listitem><para>They can be deployed as such. Deploying an analysis engine as a web

-          service is one of the deployment options supported by the UIMA framework.</para>

-        </listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.stateless_aes">

-        <term><emphasis role="bold">Do Analysis Engines have to be

-          &quot;stateless&quot;?</emphasis></term>

-        <listitem><para>This is a user-specifiable option. The XML metadata for the

-          component includes an

-          <code>operationalProperties</code> element which can specify if multiple

-          deployment is allowed. If true, then a particular instance of an Engine might not

-          see all the CASes being processed. If false, then that component will see all of the

-          CASes being processed. In this case, it can accumulate state information among all

-          the CASes. Typically, Analysis Engines in the main analysis pipeline are marked

-          multipleDeploymentAllowed = true. The CAS Consumer component, on the other hand,

-          defaults to having this property set to false, and is typically associated with

-          some resource like a database or search engine that aggregates analysis results

-          across an entire collection.</para>

-          

-          <para>Analysis Engine developers are encouraged not to maintain state between

-            documents that would prevent their engine from working as advertised if

-            operated in a parallelized environment.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.uddi">

-        <term><emphasis role="bold">Is engine meta-data compatible with web services and

-          UDDI?</emphasis></term>

-        <listitem><para>All UIMA component implementations are associated with Component

-          Descriptors, which represent metadata describing various properties of the

-          component to support discovery, reuse, validation, automatic composition and

-          development tooling. In principle, UIMA component descriptors are compatible

-          with web services and UDDI. However, the UIMA framework currently uses its own XML

-          representation for component metadata. It would not be difficult to convert

-          between UIMA&apos;s XML representation and other standard representations.</para>

-        </listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.scaling">

-        <term><emphasis role="bold">How do you scale a UIMA application?</emphasis></term>

-        <listitem><para>The UIMA framework allows components such as 

-          <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> and

-          CAS Consumers to be easily deployed as services or in other containers and managed

-          by systems middleware designed to scale. UIMA applications tend to naturally

-          scale out across documents, allowing many documents to be analyzed in

-          parallel.</para>

-          <para>The UIMA-AS project has extensive capabilities to flexibly scale a UIMA

-            pipeline across multiple machines.  The UIMA-DUCC project supports a 

-            unified management of large clusters of machines running multiple "jobs" 

-            each consisting of a pipeline with data sources and sinks.</para>

-          <para>Within the core UIMA framework, there is a component called the CPM (Collection Processing

-            Manager) which has features and configuration settings for scaling an

-            application to increase its throughput and recoverability; 

-            the CPM was the earlier version of scaleout technology, and has been 

-            superseded by the UIMA-AS effort (although it is still supported).</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.embedding">

-        <term><emphasis role="bold">What does it mean to embed UIMA in systems middleware?</emphasis></term>

-        <listitem><para>An example of an embedding would be the deployment of a UIMA analysis

-          engine as an Enterprise Java Bean inside an application server such as IBM

-          WebSphere. Such an embedding allows the deployer to take advantage of the features

-          and tools provided by WebSphere for achieving scalability, service management,

-          recoverability etc. UIMA is independent of any particular systems middleware, so

-          <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link> could be deployed on other application servers as well.</para>

-        </listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.cpm_versus_cpe">

-        <term><emphasis role="bold">How is the CPM different from a CPE?</emphasis></term>

-        <listitem><para>These name complementary aspects of collection processing. The CPM

-          (Collection Processing <emphasis role="bold">Manager</emphasis>) is the part of 

-          the UIMA framework that manages the execution of a workflow of UIMA

-          components orchestrated to analyze a large collection of documents. The UIMA

-          developer does not implement or describe a CPM. It is a piece of infrastructure code

-          that handles CAS transport, instance management, batching, check-pointing,

-          statistics collection and failure recovery in the execution of a collection

-          processing workflow.</para>

-          

-          <para>A Collection Processing Engine (CPE) is a component created by the framework

-            from a specific CPE descriptor. A CPE descriptor refers to a series of UIMA

-            components including a Collection Reader, CAS Initializer, Analysis

-            Engine(s) and CAS Consumers. These components are organized in a work flow and

-            define a collection analysis job or CPE. A CPE acquires documents from a source

-            collection, initializes CASs with document content, performs document

-            analysis and then produces collection level results (e.g., search engine

-            index, database etc.). The CPM is the execution engine for a CPE.</para>

-        </listitem>

-      </varlistentry>

-      

-      <!-- 

-      <varlistentry id="ugr.faqs.semantic_search">

-        <term><emphasis role="bold">What is Semantic Search and what is its relationship to

-          UIMA?</emphasis></term>

-        <listitem><para>Semantic Search refers to a document search paradigm that allows

-          users to search based not just on the keywords contained in the documents, but also

-          on the semantics associated with the text by <link linkend="ugr.faqs.annotator_versus_ae">analysis engines</link>. UIMA applications

-          perform analysis on text documents and generate semantics in the form of

-          <link linkend="ugr.faqs.what_is_an_annotation">annotations</link> on regions of text. For example, a UIMA analysis engine may discover

-          the text <quote>First Financial Bank</quote> to refer to an organization and

-          annotate it as such. With traditional keyword search, the query

-          <command>first</command> will return all documents that contain that word.

-          <command>First</command> is a frequent and ambiguous term &ndash; it occurs a lot

-          and can mean different things in different places. If the user is looking for

-          organizations that contain that word <command>first</command> in their names,

-          s/he will likely have to sift through lots of documents containing the word

-          <quote>first</quote> used in different ways. Semantic Search exploits the

-          results of analysis to allow more precise queries. For example, the semantic

-          search query <emphasis>&lt;organization&gt; first

-          &lt;/organization&gt;</emphasis> will rank first documents that contain the

-          word <quote>first</quote> as part of the name of an organization. The UIMA SDK

-          documentation demonstrates how UIMA applications can be built using semantic

-          search. It provides details about the XML Fragment Query language. This is the

-          particular query language used by the semantic search engine that comes with the

-          SDK.</para>

-          </listitem>

-      </varlistentry>

-       

-      

-      <varlistentry id="ugr.faqs.xml_fragment_not_xml">

-        <term><emphasis role="bold">Is an XML Fragment Query valid XML?</emphasis></term>

-        <listitem><para>Not necessarily. The XML Fragment Query syntax is used to formulate

-          queries interpreted by the semantic search engine that ships with the UIMA SDK.

-          This query language relies on basic XML syntax as an intuitive way to describe

-          hierarchical patterns of annotations that may occur in a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. The language

-          deviates from valid XML in order to support queries over

-          <quote>overlapping</quote> or <quote>cross-over</quote> annotations and

-          other features that affect the interpretation of the query by the query processor.

-          For example, it admits notations in the query to indicate whether a keyword or an

-          annotation is optional or required to match a document.</para></listitem>

-      </varlistentry>

-      -->

-      

-      <varlistentry id="ugr.faqs.modalities_other_than_text">

-        <term><emphasis role="bold">Does UIMA support modalities other than text?</emphasis></term>

-        <listitem><para>The UIMA architecture supports the development, discovery,

-          composition and deployment of multi-modal analytics including text, audio and

-          video. Applications that process text, speech and video have been developed using

-          UIMA. This release of the SDK, however, does not include examples of these

-          multi-modal applications. </para>

-          

-          <para>It does however include documentation and programming examples for using

-            the key feature required for building multi-modal applications. UIMA supports

-            multiple subjects of analysis or <link linkend="ugr.faqs.what_is_a_sofa">Sofas</link>. These allow multiple views of a single

-            artifact to be associated with a <link linkend="ugr.faqs.what_is_the_cas">CAS</link>. For example, if an artifact is a video

-            stream, one Sofa could be associated with the video frames and another with the

-            closed-captions text. UIMA&apos;s multiple Sofa feature is included and

-            described in this release of the SDK.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.compare">

-        <term><emphasis role="bold">How does UIMA compare to other similar work?</emphasis></term>

-        <listitem><para>A number of different frameworks for NLP have preceded UIMA. Two of

-          them were developed at IBM Research and represent UIMA&apos;s early roots. For

-          details please refer to the UIMA article that appears in the IBM Systems Journal

-          Vol. 43, No. 3 (<ulink

-            url="http://www.research.ibm.com/journal/sj/433/ferrucci.html"/>

-          ).</para>

-          

-          <para>UIMA has advanced the state of the art along a number of dimensions

-            including: support for distributed deployments in different middleware

-            environments, easy framework embedding in different software product

-            platforms (key for commercial applications), broader architectural coverage

-            with its collection processing architecture, support for

-            multiple-modalities, support for efficient integration across programming

-            languages, support for a modern software engineering discipline calling out

-            different roles in the use of UIMA to develop applications, the extensive use of

-            descriptive component metadata to support development tooling, component

-            discovery and composition. (Please note that not all of these features are

-            available in this release of the SDK.)</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.open_source">

-        <term><emphasis role="bold">Is UIMA Open Source?</emphasis></term>

-        <listitem><para>Yes. As of version 2, UIMA development has moved to Apache and is being

-          developed within the Apache open source processes. It is licensed under the Apache

-          version 2 license. 

-            </para>

-        </listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.levels_required">

-        <term><emphasis role="bold">What Java level and OS are required for the UIMA SDK?</emphasis></term>

-        <listitem><para>As of release 3.0.0, the UIMA SDK requires Java 1.8.

-          It has been tested mainly on Windows and Linux platforms, with some

-          testing on Mac OS X. Other

-          platforms and JDK implementations will likely work, but have

-          not been as extensively tested.</para></listitem>

-      </varlistentry>

-      

-      <varlistentry id="ugr.faqs.building_apps_on_top_of_uima">

-        <term><emphasis role="bold">Can I build my UIM application on top of UIMA?</emphasis></term>

-        <listitem><para>Yes. Apache UIMA is licensed under the Apache version 2 license,

-          enabling you to build and distribute applications which include the framework.

-          </para></listitem>

-      </varlistentry>

-      

-

- </variablelist>

-</chapter>

diff --git a/uima-docbook-overview-and-setup/src/docbook/glossary.xml b/uima-docbook-overview-and-setup/src/docbook/glossary.xml
deleted file mode 100644
index 24e7ec8..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/glossary.xml
+++ /dev/null
@@ -1,582 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<glossary id="ugr.glossary">

-  <title>Glossary: Key Terms &amp; Concepts</title>

-  <titleabbrev>Glossary</titleabbrev>

- <!-- 

-  <para></para>

-  <glossary id="ugr.glossary.glossary">

-   -->

-    <!--

-    <glossentry id="ugr.glossary.">

-      <glossterm></glossterm>

-      <glosssee otherterm="ugr.glossary."></glosssee>

-      <glossdef>

-        <para></para>

-        <glossseealso otherterm="ugr.glossary."/>

-      </glossdef>

-    </glossentry>

-      -->

-       <glossentry id="ugr.glossary.aggregate">

-      <glossterm>Aggregate &ae;</glossterm>

-      <glossdef>

-        <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>

- made up of multiple subcomponent

-&ae;s arranged in a flow.  The

-flow can be one of the two built-in flows, or a custom flow provided by the user.</para>

-      </glossdef>

-    </glossentry> 

-    

-  <glossentry id="ugr.glossary.analysis_engine">

-    <glossterm>&ae;</glossterm>

-    <glossdef><para>A program that analyzes artifacts (e.g. documents) and infers information about

-them, and which implements the UIMA &ae; interface Specification. It

-does not matter how the program is built, with what framework or whether or not

-it contains component (<quote>sub</quote>) &ae;s.</para>

-    </glossdef>

-  </glossentry>

-

-  

-    <glossentry id="ugr.glossary.annotation">

-      <glossterm>Annotation</glossterm>

-      <glossdef>

-        <para>The association of metadata, such as a label, with a region of text (or other

-type of artifact). For example, the label <quote>Person</quote> associated with a

-region of text <quote>John Doe</quote> constitutes an annotation. We say

-<quote>Person</quote> annotates the span of text from X to Y containing exactly

-<quote>John Doe</quote>. An annotation is represented as a special

-          <glossterm linkend="ugr.glossary.type">type</glossterm> 

-

-in a UIMA <glossterm linkend="ugr.glossary.type_system">type system</glossterm>.

-           It is the type used to record

-the labeling of regions of a <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.

-          Annotations are <glossterm linkend="ugr.glossary.feature_structure">Feature Structures</glossterm>

-          whose <glossterm linkend="ugr.glossary.type">Type</glossterm> is Annotation or a subtype

-          of that.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.annotator">

-      <glossterm>Annotator</glossterm>

-      <glossdef>

-        <para>A software

-component that implements the UIMA annotator interface. Annotators are

-implemented to produce and record annotations over regions of an artifact

-(e.g., text document, audio, and video).</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.application">

-      <glossterm>Application</glossterm>

-      <glossdef>

-        <para>An application is the outer containing code that invokes

-        the UIMA framework functions to instantiate an 

-        <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm> or a

-        <glossterm linkend="ugr.glossary.cpe">Collection Processing Engine</glossterm> from a particular 

-        descriptor, and run it.</para>

-      </glossdef>

-    </glossentry>

-

-      <glossentry id="ugr.glossary.apache_uima_java_framework">

-      <glossterm>Apache UIMA Java Framework</glossterm>

-      <glossdef>

-        <para>A Java-based implementation of the <glossterm linkend="ugr.glossary.uima">UIMA</glossterm>

-         architecture.  It provides a run-time environment in which developers can plug in and run their UIMA component 

-         implementations and with which they can build and deploy UIM applications.  The framework is the

-         core part of the <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>

-      </glossdef>

-    </glossentry>

-

-    <glossentry id="ugr.glossary.apache_uima_sdk">

-      <glossterm>Apache UIMA Software Development Kit (SDK)</glossterm>

-      <glossdef>

-        <para>The SDK for which you are now reading the documentation.  The SDK includes the framework

-          plus additional components such as tooling and examples.  Some of the tooling is Eclipse-based 

-          (<ulink url="http://www.eclipse.org/"/>).</para>

-      </glossdef>

-    </glossentry>

-    

-      <glossentry id="ugr.glossary.cas">

-      <glossterm>CAS</glossterm>

-      <glossdef>

-        <para>The UIMA Common Analysis Structure is

-the primary data structure which UIMA analysis components use to represent and

-share analysis results.  It contains:</para>

-

-<itemizedlist><listitem><para>The artifact. This is the object

-being analyzed such as a text document or audio or video stream. The CAS

-projects one or more views of the artifact. Each view is referred to as a 

-  <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para></listitem>

-

-

-<listitem><para>A type system description &ndash;

-indicating the types, subtypes, and their features. </para></listitem>

-

-

-<listitem><para>Analysis metadata &ndash; <quote>standoff</quote>

-annotations describing the artifact or a region of the artifact </para></listitem>

-

-

-<listitem><para>An index repository to support

-efficient access to and iteration over the results of analysis.

-</para></listitem></itemizedlist>

-

-<para>UIMA&apos;s primary interface to this structure is provided by

-a class called the Common Analysis System. We use <quote>CAS</quote> to refer to

-both the structure and system. Where the common analysis structure is used

-through a different interface, the particular implementation of the structure

-is indicated. For example, the <glossterm linkend="ugr.glossary.jcas">JCas</glossterm> is a native Java object

-representation of the contents of the common analysis structure.</para>

-

-<para>A CAS can have multiple views; each view has a unique

-representation of the artifact, and has its own index repository, representing

-results of analysis for that representation of the artifact.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cas_consumer">

-      <glossterm>CAS Consumer</glossterm>

-      <glossdef>

-        <para>A component that

-receives each CAS in the collection, usually after it has been processed by an 

-          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>. It is responsible for taking the results from

-the CAS and using them for some purpose, perhaps storing selected results into

-a database, for instance.  The CAS

-Consumer may also perform collection-level analysis, saving these results in an

-application-specific, aggregate data structure.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cas_initializer">

-      <glossterm>CAS Initializer (deprecated)</glossterm>

-      <glossdef>

-        <para>Prior to version 2, this was the component that took an 

-          undefined input form and produced a particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.

-          For version 2, this has been replaced with using any <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>

-          which takes a particular <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> and creates a

-          new output Sofa.  For example, if the document is HTML, an &ae; might 

-          create a Sofa which is a detagged version of an input CAS View, perhaps also

-creating annotations derived from the tags. For example &lt;p&gt; tags

-might be translated into Paragraph annotations in the CAS.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cas_multiplier">

-      <glossterm>CAS Multiplier</glossterm>

-      <glossdef>

-        <para>A component, implemented by a UIMA developer,

-that takes a CAS as input and produces 0 or more new CASes as output.  Common use cases for a CAS Multiplier

-          include creating alternative versions of an input <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm> 

-          (see <glossterm linkend="ugr.glossary.cas_initializer">CAS Initializer</glossterm>), and breaking 

-          a large input CAS into smaller pieces, each of which is emitted as a

-separate output CAS.  There are other

-uses, however, such as aggregating input CASes into a single output CAS.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cas_processor">

-      <glossterm>CAS Processor</glossterm>

-      <glossdef>

-        <para>A component of a Collection Processing Engine (CPE) that

-takes a CAS as input and returns a CAS as output. There are two types of CAS

-Processors: <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and 

-          <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cas_view">

-      <glossterm>CAS View</glossterm>

-      <glossdef>

-        <para>A CAS Object which shares the base CAS and type system

-definition and index specifications, but has a unique index repository and a

-particular <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.   Views are named, and applications and

-annotators can dynamically create additional views whenever they are needed.

-Annotations are made with respect to one view.  Feature structures can have references to feature structures 

-          indexed in other views, as needed.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cde">

-      <glossterm>CDE</glossterm>

-      <glossdef>

-        <para>The Component Descriptor Editor. This

-is the Eclipse tool that lets you conveniently edit the UIMA descriptors; 

-          see <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cpe">

-      <glossterm>Collection Processing Engine (CPE)</glossterm>

-      <glossdef>

-        <para>Performs Collection Processing

-through the combination of a 

-          <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>,

-          zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,

- and zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s.

-The Collection Processing Manager (CPM) manages the execution of the engine.</para>

-        <para>The CPE also refers to the XML specification of the Collection Processing

-        engine.  The CPM reads a CPE specification and instantiates a CPE instance from it,

-        and runs it.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.cpm">

-      <glossterm>Collection Processing Manager (CPM)</glossterm>

-      <glossdef>

-        <para>The part of the framework that

-manages the execution of collection processing, routing CASes from the 

-          <glossterm linkend="ugr.glossary.collection_reader">Collection Reader</glossterm>

-          

-to zero or more <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s

-and then to zero or more <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumer</glossterm>s. The CPM

-provides feedback such as performance statistics and error reporting and supports

-other features such as parallelization and error handling.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.collection_reader">

-      <glossterm>Collection Reader</glossterm>

-      <glossdef>

-        <para>A component

-that reads documents from some source, for example a file system or database.

-The collection reader initializes a CAS with this document.  

-          Each document is returned as a CAS that may then be processed by 

-          an <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>. If the task of populating a CAS

-from the document is complex, you may use an arbitrarily complex chain of 

-          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s and have the last one

-          create and initialize a new <glossterm linkend="ugr.glossary.sofa">Sofa</glossterm>.</para>

-      </glossdef>

-    </glossentry>

-

-<!--   

-    <glossentry id="ugr.glossary.fact_search">

-      <glossterm>Fact Search</glossterm>

-      <glossdef>

-        <para>A search that given a fact pattern, returns facts

-extracted from a collection of documents by a set of &ae;s that

-match the fact pattern.</para>

-      </glossdef>

-    </glossentry>

-   -->

-   

-    <glossentry id="ugr.glossary.feature_structure">

-      <glossterm>Feature Structure</glossterm>

-      <glossdef>

-        <para>An instance of a <glossterm linkend="ugr.glossary.type">Type</glossterm>.

-        Feature Structures are kept in the <glossterm linkend="ugr.glossary.cas">CAS</glossterm>, and may

-        (optionally) be added to the defined <glossterm linkend="ugr.glossary.index">indexes</glossterm>.

-        Feature Structures may contain references to other Feature Structures.

-        Feature Structures whose type is Annotation, or a subtype of it, are referred to as 

-        <glossterm linkend="ugr.glossary.annotation">annotations</glossterm>.</para>

-      </glossdef>

-    </glossentry>

-    

-    <glossentry id="ugr.glossary.feature">

-      <glossterm>Feature</glossterm>

-      <glossdef>

-        <para>A data member or attribute of a type.  Each feature itself has an

-associated range type, the type of the value that it can hold.  In the

-database analogy where types are tables, features are columns.

-        In the world of structured data types, each feature is a <quote>field</quote>,

-        or data member.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.flow_controller">

-      <glossterm>Flow Controller</glossterm>

-      <glossdef>

-        <para>A component which implements the interfaces needed

-to specify a custom flow within an <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.hybrid_analysis_engine">

-      <glossterm>Hybrid &ae;</glossterm>

-      <glossdef>

-        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm> 

-          where more than one of its component &ae;s are deployed in

-the same address space and one or more are deployed remotely (part tightly and

-part loosely-coupled).</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.index">

-      <glossterm>Index</glossterm>

-      <glossdef>

-        <para>Data in the CAS can only be retrieved using Indexes.  

-          Indexes are analogous to the indexes that are

-specified on tables of a database.  Indexes belong to Index Repositories;

-there is one Repository for each

-view of the CAS.  Indexes are specified

-to retrieve instances of some CAS Type (including its subtypes), and can be

-optionally sorted in a user-definable way. 

-          For example, all types derived from the UIMA

-built-in type <literal>uima.tcas.Annotation</literal> contain begin

-and end features, which mark the begin and end offsets in the text where this

-annotation occurs.  There is a built-in index of Annotations that specifies that

-annotations are retrieved sequentially by sorting first on the value of the begin 

-feature (ascending) and then by the value of the end feature (descending).

-In this case, iterating over the annotations, one first obtains annotations that 

-come sequentially first in the text, while favoring longer annotations, in the case

-where two annotations start at the same offset.  Users can define their own indexes

-as well.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.jcas">

-      <glossterm>JCas</glossterm>

-      <glossdef>

-        <para>A Java object interface to the contents of the CAS.  

-          This interface uses additional generated Java classes, where each type in the CAS

-is represented as a Java class with the same name, each feature is represented with

-a getter and setter method, and each instance of a type is represented as a

-Java object of the corresponding Java class.</para>

-      </glossdef>

-    </glossentry>

-  

-<!-- 

-    <glossentry id="ugr.glossary.keyword_search">

-      <glossterm>Keyword Search</glossterm>

-      <glossdef>

-        <para>The standard search method where one supplies words (or <quote>keywords</quote>)

-and candidate documents are returned.</para>

-      </glossdef>

-    </glossentry>

- -->

- <!--   

-    <glossentry id="ugr.glossary.knowledge_base">

-      <glossterm>Knowledge Base</glossterm>

-      <glossdef>

-        <para>A collection of data that may be interpreted as a

-set of facts and rules considered true in a possible world.</para>

-      </glossdef>

-    </glossentry>

-   -->

-   

-    <glossentry id="ugr.glossary.loosely_coupled_analysis_engine">

-      <glossterm>Loosely-Coupled &ae;</glossterm>

-      <glossdef>

-        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>

-         where no two of its component &ae;s run in the

-same address space but where each is remote with respect to the others that

-make up the aggregate. Loosely coupled engines are ideal for using 

-          remote &ae; services that are

-not locally available, or for quickly assembling and testing functionality in

-cross-language, cross-platform distributed environments. They also better enable

-distributed scalable implementations where quick recoverability may have a

-greater impact on overall throughput than analysis speed.</para>

-      </glossdef>

-    </glossentry>

-  

-<!--

-    <glossentry id="ugr.glossary.ontology">

-      <glossterm>Ontology</glossterm>

-      <glossdef>

-        <para>The part of a knowledge base that defines the semantics of the data

-axiomatically.</para>

-      </glossdef>

-    </glossentry>

- -->

-   

-    <glossentry id="ugr.glossary.pear">

-      <glossterm>PEAR</glossterm>

-      <glossdef>

-        <para>An archive file that packages up a UIMA component with its code,

-descriptor files and other resources required to install and run it in another

-environment. You can generate PEAR files using utilities that come with the

-UIMA SDK.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.primitive_analysis_engine">

-      <glossterm>Primitive &ae;</glossterm>

-      <glossdef>

-        <para>An <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm> 

-          that is composed of a single 

-          <glossterm linkend="ugr.glossary.annotator">Annotator</glossterm>; one that has

-no component (or <quote>sub</quote>) &ae;s inside of it; 

-contrast with

-          <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>.</para>

-      </glossdef>

-    </glossentry>

- 

- <!--  

-    <glossentry id="ugr.glossary.semantic_search">

-      <glossterm>Semantic Search</glossterm>

-      <glossdef>

-        <para>A search where the semantic intent of the query is

-specified using one or more entity or relation specifiers.  For example,

-one could specify that they are looking for a person (named) <quote>Bush.</quote>

-Such a query would then not return results about the kind of bushes that grow

-in your garden but rather just persons named Bush.</para>

-      </glossdef>

-    </glossentry>

- -->

- 

-    <glossentry id="ugr.glossary.structured_information">

-      <glossterm>Structured Information</glossterm>

-      <glossdef>

-        <para>Items stored in structured resources such as

-search engine indices, databases or knowledge bases. The canonical example of

-structured information is the database table. Each element of information in

-the database is associated with a precisely defined schema where each table

-column heading indicates its precise semantics, defining exactly how the

-information should be interpreted by a computer program or end-user.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.sofa">

-      <glossterm>Subject of Analysis (Sofa)</glossterm>

-      <glossdef>

-        <para>A piece of

-data (e.g., text document, image, audio segment, or video segment), which is intended

-for analysis by UIMA analysis components.  It belongs to a 

-          <glossterm linkend="ugr.glossary.cas_view">CAS View</glossterm> which has the same name; there

-          is a one-to-one correspondence between these.  There can be multiple Sofas contained within

-one CAS, each one representing a different view of the original artifact &ndash; for example,

-an audio file could be the original artifact, and also be one Sofa, and another

-could be the output of a voice-recognition component, where the Sofa would be

-the corresponding text document. Sofas may be analyzed independently or

-simultaneously; they all co-exist within the CAS.  </para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.tightly_coupled_analysis_engine">

-      <glossterm>Tightly-Coupled &ae;</glossterm>

-      <glossdef>

-        <para>An <glossterm linkend="ugr.glossary.aggregate">&aae;</glossterm>

- where all of its component &ae;s run in the same address space.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.type">

-      <glossterm>Type</glossterm>

-      <glossdef>

-        <para>A specification of an object in the

-          <glossterm linkend="ugr.glossary.cas">CAS</glossterm> used to store the results of

-analysis.  Types are defined using inheritance, so some types may be

-defined purely for the sake of defining other types, and are in this sense <quote>abstract

-types.</quote>  Types usually contain 

-          <glossterm linkend="ugr.glossary.feature">Feature</glossterm>s, which are attributes, or

-properties of the type.  A type is roughly equivalent to a class in an

-object oriented programming language, or a table in a database.  Instances of types in the CAS

-          may be indexed for retrieval.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.type_system">

-      <glossterm>Type System</glossterm>

-      <glossdef>

-        <para>A collection of related <glossterm linkend="ugr.glossary.type">types</glossterm>.

-          All components that can access the CAS,

-including <glossterm linkend="ugr.glossary.application">Applications</glossterm>,

-          <glossterm linkend="ugr.glossary.analysis_engine">Analysis Engine</glossterm>s,

-          <glossterm linkend="ugr.glossary.collection_reader">Collection Readers</glossterm>,

-          <glossterm linkend="ugr.glossary.flow_controller">Flow Controllers</glossterm>, or

-          <glossterm linkend="ugr.glossary.cas_consumer">CAS Consumers</glossterm>

-declare the type system that they use. Type systems are shared across &ae;s, allowing the outputs 

-          of one &ae; to be read as input by another &ae;.

-A type system is roughly analogous to a set of related classes in object

-oriented programming, or a set of related tables in a database.  The type

-system / type / feature terminology comes from computational linguistics.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.unstructured_information">

-      <glossterm>Unstructured Information</glossterm>

-      <glossdef>

-        <para>The canonical example of unstructured

-information is the natural language text document. The intended meaning of a

-document's content is only implicit and its precise interpretation by a

-computer program requires some degree of analysis to explicate the document's

-semantics. Other examples include audio, video and images. Contrast with

-<glossterm linkend="ugr.glossary.structured_information">Structured Information</glossterm>.

-        </para>          

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.uima">

-      <glossterm>UIMA</glossterm>

-      <glossdef>

-        <para>UIMA is an acronym that stands for Unstructured Information Management Architecture; 

-          it is a software architecture which specifies component interfaces, design patterns

-and development roles for creating, describing, discovering, composing and

-deploying multi-modal analysis capabilities.  The UIMA specification is being developed by a 

-        technical committee at <ulink url="http://www.oasis-open.org/committees/uima">OASIS</ulink>.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.uima_java_framework">

-      <glossterm>UIMA Java Framework</glossterm>

-      <glossdef>

-        <para>See <glossterm linkend="ugr.glossary.apache_uima_java_framework">Apache UIMA Java Framework</glossterm>.</para>

-        <para/>

-      </glossdef>

-    </glossentry>

-

-    <glossentry id="ugr.glossary.uima_sdk">

-      <glossterm>UIMA SDK</glossterm>

-      <glossdef>

-        <para>See <glossterm linkend="ugr.glossary.apache_uima_sdk">Apache UIMA SDK</glossterm>.</para>

-        <para/>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.xcas">

-      <glossterm>XCAS</glossterm>

-      <glossdef>

-        <para>An XML representation of the CAS. The XCAS can be used for saving

-and restoring CASes to and from streams. The UIMA SDK provides XCAS serialization and

-de-serialization methods for CASes.  This is an older serialization format and

-new UIMA code should use the standard <glossterm linkend="ugr.glossary.xmi">XMI</glossterm>

-format instead.</para>

-      </glossdef>

-    </glossentry>

-  

-    <glossentry id="ugr.glossary.xmi">

-      <glossterm>XML Metadata Interchange (XMI)</glossterm>

-      <glossdef>

-        <para>An OMG standard for representing

-object graphs in XML, which UIMA uses to serialize analysis results from the

-CAS to an XML representation.  The UIMA SDK provides XMI serialization and

-de-serialization methods for CASes.</para>

-      </glossdef>

-    </glossentry>

-

-

-  <!--  

-    <glossentry id="ugr.glossary.">

-      <glossterm></glossterm>

-      <glossdef>

-        <para></para>

-      </glossdef>

-    </glossentry>

-  -->

-  

-  </glossary>

-

- <!-- 

-</chapter>

-   -->

diff --git a/uima-docbook-overview-and-setup/src/docbook/known_issues.xml b/uima-docbook-overview-and-setup/src/docbook/known_issues.xml
deleted file mode 100644
index 2fcfd08..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/known_issues.xml
+++ /dev/null
@@ -1,68 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.issues">

-  <title>Known Issues</title>

-  <titleabbrev>Known Issues</titleabbrev>

-

-  <variablelist>

-    <varlistentry id="ugr.issues.cr_to_xml">

-    <term><emphasis role="bold">Sun Java 1.4.2_12 doesn't serialize CR characters to XML</emphasis></term>

-        <listitem>

-        <para>(Note: Apache UIMA now requires Java 1.5, so this issue is moot.) The XML serialization support in Sun Java 1.4.2_12 doesn't serialize CR characters to 

-        XML. As a result, if the document text contains CR characters, XCAS or XMI serialization 

-        will cause them to be lost, resulting in incorrect annotation offsets. This is exposed in 

-        the DocumentAnalyzer, with the highlighting being incorrect if the input document contains 

-        CR characters. </para>

-        </listitem>

-      </varlistentry>

-    <varlistentry id="ugr.issues.jcasgen_java_1.4">

-      <term><emphasis role="bold">JCasGen merge facility only supports Java levels 1.4 or earlier</emphasis></term>

-      <listitem>

-        <para>JCasGen has a facility to merge in user (hand-coded) changes with the code generated

-          by JCasGen.  This merging supports Java 1.4 constructs only.  JCasGen generates Java 1.4 

-          compliant code, so as long as any code you change here also only uses Java 1.4 constructs, the 

-      merge will work, even if you're using Java 5 or later.  

-          If you use syntactic structures particular to Java 5 or later, the merge

-        operation will likely fail to merge properly.</para>

-      </listitem>

-    </varlistentry>

-    <varlistentry id="ugr.issues.libgcj.4.1.2">

-      <term><emphasis role="bold">Descriptor editor in Eclipse tooling does not work with libgcj 4.1.2</emphasis></term>

-      <listitem>

-        <para>The descriptor editor in the Eclipse tooling does not work with libgcj 4.1.2, and

-        possibly other versions of libgcj.  This is apparently due to a bug in the implementation of

-        their XML library, which results in a class cast error.  libgcj is used as the default

-        JVM for Eclipse in Ubuntu (and other Linux distributions?).  The workaround is to use a

-        different JVM to start Eclipse.</para>

-      </listitem>

-    </varlistentry>

-      <!--

-      <varlistentry>

-      <term><emphasis role="bold"></emphasis></term>

-      <listitem><para></para></listitem>

-      </varlistentry>

-      -->

- </variablelist>

-</chapter>

diff --git a/uima-docbook-overview-and-setup/src/docbook/overview_and_setup.xml b/uima-docbook-overview-and-setup/src/docbook/overview_and_setup.xml
deleted file mode 100644
index 54ca57b..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/overview_and_setup.xml
+++ /dev/null
@@ -1,34 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<book lang="en" >

-  <title>UIMA Overview &amp; SDK Setup</title>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>

-

-  <toc/>

-

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="project_overview.xml" />

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="conceptual_overview.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="eclipse_setup.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="faqs.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="known_issues.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="glossary.xml"/>

-</book>

diff --git a/uima-docbook-overview-and-setup/src/docbook/project_overview.xml b/uima-docbook-overview-and-setup/src/docbook/project_overview.xml
deleted file mode 100644
index 8ffec29..0000000
--- a/uima-docbook-overview-and-setup/src/docbook/project_overview.xml
+++ /dev/null
@@ -1,731 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.project_overview">

-  <title>UIMA Overview</title>

-  <titleabbrev>Overview</titleabbrev>

-  

-  <para>The Unstructured Information Management Architecture (UIMA) is an architecture and software framework

-    for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and

-    integrating them with search technologies.  The architecture is undergoing a standardization effort, 

-    referred to as the <emphasis>UIMA specification</emphasis>, by a technical committee within

-    <ulink url="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=uima">OASIS</ulink>.  

-    </para>

-  

-  <para>The <emphasis>Apache UIMA</emphasis> framework is an Apache licensed, open source implementation of the

-    UIMA Architecture, and provides a run-time environment in which developers can plug in

-    and run their UIMA component implementations and with which they can build and deploy UIM applications. The

-    framework itself is not specific to any IDE or platform.</para>

-  

-  <para>It includes an all-Java implementation of the

-    UIMA framework for the development, description, composition and deployment of UIMA components and

-    applications. It also provides the developer with an Eclipse-based (<ulink url="http://www.eclipse.org/"/>

-    ) development environment that includes a set of tools and utilities for using UIMA. It also includes 

-    a C++ version of the framework, and

-    enablements for Annotators built in Perl, Python, and TCL.</para>

-  

-  <para>This chapter is the intended starting point for readers that are new to the Apache UIMA Project. It includes

-    this introduction and the following sections:</para> 

-  <itemizedlist>

-    <listitem>

-      <para> <xref linkend="ugr.project_overview_doc_overview"/> provides a list of the books and topics included in

-        the Apache UIMA documentation with a brief summary of each. </para>

-    </listitem>

-    <listitem>

-      <para> <xref linkend="ugr.project_overview_doc_use"/> describes a recommended path through the

-        documentation to help get the reader up and running with UIMA. </para>

-    </listitem>

-  </itemizedlist>

-    

-    <para>The main website for Apache UIMA is <ulink url="http://uima.apache.org"/>.  Here you 

-    can find out many things, including:

-     <itemizedlist spacing="compact">

-       <listitem><para>how to download (both the binary and source distributions)</para></listitem>

-       <listitem><para>how to participate in the development</para></listitem>

-       <listitem><para>mailing lists - including the user list used like a forum for questions and answers</para></listitem>

-       <listitem><para>a Wiki where you can find and contribute all kinds of information, including tips and best practices</para></listitem>

-       <listitem><para>a sandbox - a subproject for potential new additions to Apache UIMA or to subprojects of it.  Things here

-       are works in progress, and may (or may not) be included in releases.</para></listitem>

-       <listitem><para>links to conferences</para></listitem>

-     </itemizedlist>

-      </para>

- 

-  <section id="ugr.project_overview_doc_overview">

-    <title>Apache UIMA Project Documentation Overview</title>

-    <para> The user documentation for UIMA is organized into several parts.

-      <itemizedlist spacing="compact">

-        <listitem>

-          <para> Overviews - this documentation </para>

-        </listitem>

-        <listitem>

-          <para> Eclipse Tooling Installation and Setup - also in this document </para>

-        </listitem>

-        <listitem>

-          <para> Tutorials and Developer's Guides </para>

-        </listitem>

-        <listitem>

-          <para> Tools Users' Guides </para>

-        </listitem>

-        <listitem>

-          <para> References </para>

-        </listitem>

-        <listitem>

-          <para>Version 3 users-guide</para>

-        </listitem>

-      </itemizedlist> </para>

-    

-    <para>

-    The first 2 parts make up this book; the last 4 have individual 

-    books.  The books are provided both as

-    (somewhat large) html files, viewable in browsers, and also as PDF files.  

-    The documentation is fully hyperlinked, with tables of contents.  The PDF versions are set up to 

-    print nicely - they have page numbers included on the cross references within a book. </para>

-    

-    <para>If you view the PDF files inside

-    a browser that supports embedded viewing of PDF, the hyperlinks between different PDF books may work (not 

-    all browsers have been tested...).</para>

-    

-    <para>The following set of tables gives a more detailed overview of the various parts of the

-    documentation.

-    </para>

-    

-    <section id="ugr.project_overview_overview">

-      <title>Overviews</title>

-      

-      <informaltable frame="all" rowsep="1" colsep="1">

-        <tgroup cols="2">

-          <colspec colnum="1" colname="col1" colwidth="1*"/>

-          <colspec colnum="2" colname="col2" colwidth="2.5*"/>

-          <tbody>

-            <row>

-              <entry><emphasis>Overview of the Documentation</emphasis>

-              </entry>

-              <entry>

-                <para>What you are currently reading.  Lists the documents provided in the Apache 

-                UIMA documentation set and provides

-                 a recommended path through the documentation for getting started using

-                  UIMA.  It includes release notes and provides a brief high-level description of 

-                  the different software modules included in the

-                  Apache UIMA Project.  See <xref linkend="ugr.project_overview_doc_overview"/>.</para>

-              </entry>

-            </row>

-            <row>

-              <entry><emphasis>Conceptual Overview</emphasis>

-              </entry>

-              <entry>Provides a broad conceptual overview of the UIMA component architecture; includes

-                references to the other documents in the documentation set that provide more detail.

-                See <xref linkend="ugr.ovv.conceptual"/>.</entry>

-            </row>

-            <row>

-              <entry><emphasis>UIMA FAQs</emphasis>

-              </entry>

-              <entry>Frequently Asked Questions about general UIMA concepts. (Not a programming

-                resource.)  See <xref linkend="ugr.faqs"/>.</entry>

-            </row>

-            <row>

-              <entry><emphasis>Known Issues</emphasis>

-              </entry>

-              <entry>Known issues and problems with the UIMA SDK.  See <xref linkend="ugr.issues"/>.</entry>

-            </row>

-            <row>

-              <entry><emphasis>Glossary</emphasis>

-              </entry>

-              <entry>UIMA terms and concepts and their basic definitions.  See <xref linkend="ugr.glossary"/>.</entry>

-            </row>

-          </tbody>

-        </tgroup>

-      </informaltable>

-    </section>

-    

-    <section id="ugr.project_overview_setup">

-      <title>Eclipse Tooling Installation and Setup</title>

-      <para>Provides step-by-step instructions for installing Apache UIMA in the Eclipse Integrated

-        Development Environment.  See <xref linkend="ugr.ovv.eclipse_setup"/>.</para>

-    </section>

-    

-    <section id="ugr.project_overview_tutorials_dev_guides">

-      <title>Tutorials and Developer&apos;s Guides</title>

-      <informaltable>

-        <tgroup cols="2">

-          <colspec colnum="1" colname="col1" colwidth="1*"/>

-          <colspec colnum="2" colname="col2" colwidth="2.5*"/>

-          <tbody>

-            <row id="ugr.project_overview_tutorial_annotator">

-              <entry><emphasis>Annotators and Analysis Engines</emphasis>

-              </entry>

-              <entry>Tutorial-style guide for building UIMA annotators and analysis engines. This chapter

-                introduces the developer to creating type systems and using UIMA&apos;s common data structure,

-                the CAS or Common Analysis Structure. It demonstrates how to use built in tools to specify and create

-                basic UIMA analysis components.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tutorial_cpe">

-              <entry><emphasis>Building UIMA Collection Processing Engines</emphasis>

-              </entry>

-              <entry>Tutorial-style guide for building UIMA collection processing engines. These

-               manage the

-                analysis of collections of documents from source to sink.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tutorial_application_development">

-              <entry><emphasis>Developing Complete Applications</emphasis>

-              </entry>

-              <entry>Tutorial-style guide on using the UIMA APIs to create, run and manage UIMA components from

-                your application. Also describes APIs for saving and restoring the contents of a CAS using an XML

-                format called <trademark class="registered"> XMI</trademark>.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.application"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_guide_flow_controller">

-              <entry><emphasis>Flow Controller</emphasis>

-              </entry>

-              <entry>When multiple components are combined in an Aggregate, each CAS flows among the various

-                components. UIMA provides two built-in flows, and also allows custom flows to be

-                implemented.  See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_guide_multiple_sofas">

-              <entry><emphasis>Developing Applications using Multiple Subjects of Analysis</emphasis>

-              </entry>

-              <entry>A single CAS may be associated with multiple subjects of analysis (Sofas). These are useful

-                for representing and analyzing different formats or translations of the same document. For

-                multi-modal analysis, Sofas are good for different modal representations of the same stream

-                (e.g., audio and closed captions). This chapter provides the developer with details on how to use

-                multiple Sofas in an application.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_guide_multiple_views">

-              <entry><emphasis>Multiple CAS Views of an Artifact</emphasis>

-              </entry>

-              <entry>UIMA provides an extension to the basic model of the CAS which supports 

-              analysis of multiple views of the same artifact, all contained within the CAS. This 

-              chapter describes the concepts, terminology, and the API and XML extensions that 

-              enable this.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_guide_cas_multiplier">

-              <entry><emphasis>CAS Multiplier</emphasis>

-              </entry>

-              <entry>A component may add additional CASes into the workflow. This may be useful to break up a large

-                artifact into smaller units, or to create a new CAS that collects information from multiple other

-                CASes.  See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_xmi_emf">

-              <entry><emphasis>XMI and EMF Interoperability</emphasis>

-              </entry>

-              <entry>The UIMA Type system and the contents of the CAS itself can be externalized using the XMI

-                standard for XML MetaData. Eclipse Modeling Framework (EMF) tooling can be used to develop

-                applications that use this information.  See 

-                <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.xmi_emf"/>.</entry>

-            </row>

-          </tbody>

-        </tgroup>

-      </informaltable>

-    </section>

-    

-    <section id="ugr.project_overview_tool_guides">

-      <title>Tools Users&apos; Guides</title>

-      

-      <informaltable>

-        <tgroup cols="2">

-          <colspec colnum="1" colname="col1" colwidth="1*"/>

-          <colspec colnum="2" colname="col2" colwidth="2.5*"/>

-          <tbody>

-            <row id="ugr.project_overview_tools_component_descriptor_editor">

-              <entry><emphasis>Component Descriptor Editor</emphasis>

-              </entry>

-              <entry>Describes the features of the Component Descriptor Editor Tool. This tool provides a GUI for

-                specifying the details of UIMA component descriptors, including those for Analysis Engines

-                (primitive and aggregate), Collection Readers, CAS Consumers and Type Systems.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_cpe_configurator">

-              <entry><emphasis>Collection Processing Engine Configurator</emphasis>

-              </entry>

-              <entry>Describes the User Interfaces and features of the CPE Configurator tool. This tool allows the

-                user to select and configure the components of a Collection Processing Engine and then to run the

-                engine.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_pear_packager">

-              <entry><emphasis>PEAR Packager</emphasis>

-              </entry>

-              <entry>Describes how to use the PEAR Packager utility. This utility enables developers to produce an

-                archive file for an analysis engine that includes all required resources for installing that

-                analysis engine in another UIMA environment.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.packager"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_pear_installer">

-              <entry><emphasis>PEAR Installer</emphasis>

-              </entry>

-              <entry>Describes how to use the PEAR Installer utility. This utility installs and verifies an

-                analysis engine from an archive file (PEAR) with all its resources in the right place so it is ready to

-                run.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.installer"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_pear_merger">

-              <entry><emphasis>PEAR Merger</emphasis>

-              </entry>

-              <entry>Describes how to use the PEAR Merger utility, which does a simple merge of multiple PEAR

-                packages into one.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.merger"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_document_analyzer">

-              <entry><emphasis>Document Analyzer</emphasis>

-              </entry>

-              <entry>Describes the features of a tool for applying a UIMA analysis engine to a set of documents and

-                viewing the results.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_cas_visual_debugger">

-              <entry><emphasis>CAS Visual Debugger</emphasis>

-              </entry>

-              <entry>Describes the features of a tool for viewing the detailed structure and contents of a CAS. Good

-                for debugging.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cvd"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_jcasgen">

-              <entry><emphasis>JCasGen</emphasis>

-              </entry>

-              <entry>Describes how to run the JCasGen utility, which automatically builds Java classes that

-                correspond to a particular CAS Type System.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_tools_xml_cas_viewer">

-              <entry><emphasis>XML CAS Viewer</emphasis>

-              </entry>

-              <entry>Describes how to run the supplied viewer to view externalized XML forms of CASes. This viewer

-                is used in the examples.  See 

-                <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.annotation_viewer"/>.</entry>

-            </row>

-          </tbody>

-        </tgroup>

-      </informaltable>

-    </section>

-    

-    <section id="ugr.project_overview_reference">

-      <title>References</title>

-      <informaltable>

-        <tgroup cols="2">

-          <colspec colnum="1" colname="col1" colwidth="1*"/>

-          <colspec colnum="2" colname="col2" colwidth="2.5*"/>

-          <tbody>

-            <row id="ugr.project_overview_javadocs">

-              <entry><emphasis>Introduction to the UIMA API Javadocs</emphasis>

-              </entry>

-              <entry>Javadocs detailing the UIMA programming interfaces.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_xml_ref_component_descriptor">

-              <entry><emphasis>XML: Component Descriptor</emphasis>

-              </entry>

-              <entry>Provides detailed XML format for all the UIMA component descriptors, except the CPE (see

-                next).  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_xml_ref_collection_processing_engine_descriptor">

-              <entry><emphasis>XML: Collection Processing Engine Descriptor</emphasis>

-              </entry>

-              <entry>Provides detailed XML format for the Collection Processing Engine descriptor.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_cas">

-              <entry><emphasis>CAS</emphasis>

-              </entry>

-              <entry>Provides detailed description of the principal CAS interface.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_jcas">

-              <entry><emphasis>JCas</emphasis>

-              </entry>

-              <entry>Provides details on the JCas, a native Java interface to the CAS.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_ref_pear">

-              <entry><emphasis>PEAR Reference</emphasis>

-              </entry>

-              <entry>Provides detailed description of the deployable archive format for UIMA

-                components.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.pear"/>.</entry>

-            </row>

-            <row id="ugr.project_overview_xmi_cas_serialization">

-              <entry><emphasis>XMI CAS Serialization Reference</emphasis>

-              </entry>

-              <entry>Provides a detailed description of the XMI format used to serialize the contents of a

-                CAS.  See 

-                <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xmi"/>.</entry>

-            </row>

-          </tbody>

-        </tgroup>

-      </informaltable>

-    </section>

-    

-    <section id="ugr.project_overview_v3">

-      <title>Version 3 User's Guide</title>

-      <para>This book describes Version 3's features, capabilities, and differences from Version 2.

-        </para>

-    </section>

-    

-  </section>

-  

-  <section id="ugr.project_overview_doc_use">

-    <!-- _crossRef358 -->

-    <title>How to use the Documentation</title>

-    <orderedlist>

-      <listitem>

-        <para>Explore this chapter to get an overview of the different documents that are included with Apache UIMA.</para>

-      </listitem>

-      <listitem>

-        <para> Read <olink targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.conceptual"/> to get a broad

-          view of the basic UIMA concepts and philosophy with reference to the other documents included in the

-          documentation set which provide greater detail. </para>

-      </listitem>

-      <listitem>

-        <para> For more general information on the UIMA architecture and how it has been used, refer to the IBM Systems

-          Journal special issue on Unstructured Information Management, on-line at <ulink

-            url="http://www.research.ibm.com/journal/sj43-3.html"/> or to the section of the UIMA project

-          website that lists other publications. </para>

-      </listitem>

-      <listitem>

-        <para> Set up Apache UIMA in your Eclipse environment. To do this, follow the instructions in <xref

-            linkend="ugr.ovv.eclipse_setup"/>. </para>

-      </listitem>

-      <listitem>

-        <para> Develop sample UIMA annotators, run them and explore the results. Read <olink

-            targetdoc="&uima_docs_tutorial_guides;"/> <olink

-            targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> and follow it like a tutorial

-          to learn how to develop your first UIMA annotator and set up and run your first UIMA analysis engines.

-          <itemizedlist>

-            <listitem>

-              <para> As part of this you will use a few tools including

-                <itemizedlist>

-                  <listitem>

-                    <para> The UIMA Component Descriptor Editor, described in more detail in <olink

-                        targetdoc="&uima_docs_tools;"/> <olink

-                        targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/> and </para>

-                  </listitem>

-                  <listitem>

-                    <para> The Document Analyzer, described in more detail in <olink

-                        targetdoc="&uima_docs_tools;"/> <olink

-                        targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>. </para>

-                  </listitem>

-                  

-                </itemizedlist> </para>

-              

-            </listitem>

-            <listitem>

-              <para>While following along in <olink targetdoc="&uima_docs_tutorial_guides;"/>

-                <olink targetdoc="&uima_docs_tutorial_guides;"

-                  targetptr="ugr.tug.aae"/>, reference documents that may help are:

-                <itemizedlist>

-                  <listitem>

-                    <para> <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-                        targetptr="ugr.ref.xml.component_descriptor"/> for understanding the analysis

-                      engine descriptors </para>

-                  </listitem>

-                  <listitem>

-                    <para> <olink targetdoc="&uima_docs_ref;"/> 

-                      <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/> for

-                      understanding the JCas </para>

-                  </listitem>

-                </itemizedlist> </para>

-            </listitem>

-          </itemizedlist> </para>

-      </listitem>

-      <listitem>

-        <para> Learn how to create, run and manage a UIMA analysis engine as part of an application. 

-          Connect your analysis engine to the provided semantic search engine to learn how a

-          complete analysis and search application may be built with Apache UIMA. <olink

-            targetdoc="&uima_docs_tutorial_guides;"/> <olink

-            targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.application"/> will guide you

-          through this process.

-          <itemizedlist>

-            <listitem>

-              <para> As part of this you will use the document analyzer (described in more detail in <olink

-                  targetdoc="&uima_docs_tools;"/> <olink

-                  targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>) and semantic search

-                GUI tools (see <olink targetdoc="&uima_docs_tutorial_guides;"/>

-                <olink targetdoc="&uima_docs_tutorial_guides;"

-                  targetptr="ugr.tug.application.search.query_tool"/>). </para>

-            </listitem>

-          </itemizedlist> </para>

-      </listitem>

-      <listitem>

-        <para> Pat yourself on the back. Congratulations! If you reached this step successfully, then you have an

-          appreciation for the UIMA analysis engine architecture. You will have built a few sample annotators,

-          deployed UIMA analysis engines to analyze a few documents, searched over the results using the built-in

-          semantic search engine and viewed the results through a built-in viewer

-          &ndash; all as part of a simple but complete application. </para>

-      </listitem>

-      <listitem>

-        <para> Develop and run a Collection Processing Engine (CPE) to analyze and gather the results of an entire

-          collection of documents. <olink targetdoc="&uima_docs_tutorial_guides;"/>

-          <olink targetdoc="&uima_docs_tutorial_guides;"

-            targetptr="ugr.tug.cpe"/> will guide you through this process.

-          <itemizedlist>

-            <listitem>

-              <para> As part of this you will use the CPE Configurator tool. For details see <olink

-                  targetdoc="&uima_docs_tools;"/> <olink

-                  targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>. </para>

-            </listitem>

-            <listitem>

-              <para> You will also learn about CPE Descriptors. The detailed format for these may be found in <olink

-                  targetdoc="&uima_docs_ref;"/> <olink

-                  targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. </para>

-            </listitem>

-          </itemizedlist> </para>

-      </listitem>

-      <listitem>

-        <para> Learn how to package up an analysis engine for easy installation into another UIMA environment.

-            <olink targetdoc="&uima_docs_tools;"/>

-            <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.packager"/> and <olink

-            targetdoc="&uima_docs_tools;"/> <olink

-            targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.installer"/> will teach you how to

-          create UIMA analysis engine archives so that you can easily share your components with a broader

-          community. </para>

-      </listitem>

-    </orderedlist>

-  </section>

-  

-  <section id="ugr.project_overview_changes_from_previous">

-      <title>Changes from UIMA Version 2</title>

-      <para>See the separate document Version 3 User's Guide.</para>

-  </section> 

-     

-  <section id="ugr.project_overview_migrating_from_v2_to_v3">

-    <title>Migrating existing UIMA pipelines from Version 2 to Version 3</title>

-    <para>The format of JCas classes changed when going from version 2 to version 3. 

-          If you had JCas classes for user types, these need to be regenerated using the 

-          version 3 JCasGen tooling or Maven plugin.  Alternatively, these can be 

-          migrated without regenerating; the migration preserves any customization 

-          users may have added to the JCas classes.</para>

-          

-    <para>The Version 3 User's Guide has a chapter detailing the migration, including

-      a description of the migration tool to aid in this process.</para>

-  </section>

-  

-  <section id="ugr.project_overview_summary">

-    <title>Apache UIMA Summary</title>

-    <section id="ugr.ovv.summary.general">

-      <title>General</title>

-      <para>UIMA supports the development, discovery, composition and deployment of multi-modal

-        analytics for the analysis of unstructured information and its integration with search

-        technologies.</para>

-      

-      <para>Apache UIMA includes APIs and tools for creating analysis components. Examples of analysis components include

-        tokenizers, summarizers, categorizers, parsers, named-entity detectors etc. Tutorial examples are

-        provided with Apache UIMA; additional components are available from the community. </para>

-    </section>

-    <section id="ugr.ovv.summary.programming_language_support">

-      <title>Programming Language Support</title>

-      <para>UIMA supports the development and integration of analysis algorithms developed in different

-        programming languages. </para>

-      

-      <para>The Apache UIMA project is both a Java framework and a matching C++

-        enablement layer, which allows annotators to be written in C++ and have access to a C++ version of the CAS. The

-        C++ enablement layer also enables annotators to be written in Perl, Python, and TCL, and to interoperate with

-        those written in other languages. <!--Documentation for this is provided here (link to be filled in).-->

-        </para>

-      

-    </section>

-    <section id="ugr.ovv.general.summary.multi_modal_support">

-      <title>Multi-Modal Support</title>

-      <para>The UIMA architecture supports the development, discovery, composition and deployment of

-        multi-modal analytics, including text, audio and video. <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/> discusses this in more

-        detail.</para>

-    </section>

-  </section>

-      

-  <section id="ugr.project_overview_summary_sdk_capabilities">

-    <title>Summary of Apache UIMA Capabilities</title>

-    <informaltable frame="all" rowsep="1" colsep="1">

-      <tgroup cols="2">

-        <colspec colnum="1" colname="col1" colwidth=".75*"/>

-        <colspec colnum="2" colname="col2" colwidth="*"/>

-        <tbody>

-          <row>

-            <entry role="tableSubhead">Module</entry>

-            <entry role="tableSubhead">Description</entry>

-          </row>

-          <row>

-            <entry>UIMA Framework Core</entry>

-            <entry>

-              <para>A framework integrating core functions for creating, deploying, running and managing UIMA

-                components, including analysis engines and Collection Processing Engines in collocated and/or

-                distributed configurations. </para>

-              

-              <para>The framework includes an implementation of core components for transport layer adaptation,

-                CAS management, workflow management based on declarative specifications, resource management,

-                configuration management, logging, and other functions.</para>

-            </entry>

-          </row>

-          <row>

-            <entry>C++ and other programming language Interoperability</entry>

-            

-            <entry>

-              <para>Includes C++ CAS and supports the creation of UIMA compliant C++ components that can be

-                deployed in the UIMA run-time through a built-in JNI adapter. This includes high-speed binary

-                serialization.</para>

-              

-              <para>Includes support for creating service-based UIMA engines. This is ideal for

-                wrapping existing code written in different languages.</para>

-            </entry>

-          </row>

-          <row>

-            <entry role="tableSubhead">Framework Services and APIs</entry>

-            <entry role="tableSubhead">Note that interfaces of these components are available to the developer

-              but different implementations are possible in different implementations of the UIMA

-              framework.</entry>

-          </row>

-          <row>

-            <entry>CAS</entry>

-            <entry>These classes provide the developer with typed access to the Common Analysis Structure (CAS),

-              including type system schema, elements, subjects of analysis and indices. The multiple subjects of

-              analysis (Sofas) mechanism supports the independent or simultaneous analysis of multiple views of

-              the same artifacts (e.g. documents), supporting multi-lingual and multi-modal analysis.</entry>

-          </row>

-          <row>

-            <entry>JCas</entry>

-            <entry>An alternative interface to the CAS, providing Java-based UIMA Analysis components with

-              native Java object access to CAS types and their attributes or features, using the

-              JavaBeans conventions of getters and setters.</entry>

-          </row>

-          

-          <row>

-            <entry>Collection Processing Management (CPM)</entry>

-            <entry>Core functions for running UIMA collection processing engines in collocated and/or

-              distributed configurations. The CPM provides scalability across parallel processing pipelines,

-              check-pointing, performance monitoring and recoverability.</entry>

-          </row>

-          <row>

-            <entry>Resource Manager</entry>

-            <entry>Provides UIMA components with run-time access to external resource handling capabilities

-              such as resource naming, sharing, and caching. </entry>

-          </row>

-          <row>

-            <entry>Configuration Manager</entry>

-            <entry>Provides UIMA components with run-time access to their configuration parameter settings.

-              </entry>

-          </row>

-          <row>

-            <entry>Logger</entry>

-            <entry>Provides access to a common logging facility.</entry>

-          </row>

-          <row>

-            <entry namest="col1" nameend="col2" align="center" role="tableSubhead"> Tools and Utilities

-              </entry>

-          </row>

-          <row>

-            <entry>JCasGen</entry>

-            <entry>Utility for generating a Java object model for CAS types from a UIMA XML type system

-              definition.</entry>

-          </row>

-          <row>

-            <entry>Saving and Restoring CAS contents</entry>

-            <entry>APIs in the core framework support saving and restoring the contents of a CAS to streams 

-              in multiple formats, including XMI, binary, and compressed forms.  

-              These APIs are collected into the CasIOUtils class.</entry>

-          </row>

-          <row>

-            <entry>PEAR Packager for Eclipse</entry>

-            <entry>Tool for building a UIMA component archive to facilitate porting, registering, installing and

-              testing components.</entry>

-          </row>

-          <row>

-            <entry>PEAR Installer</entry>

-            <entry>Tool for installing and verifying a UIMA component archive in a UIMA installation.</entry>

-          </row>

-          <row>

-            <entry>PEAR Merger</entry>

-            <entry>Utility that combines multiple PEARs into one.</entry>

-          </row>

-          <row>

-            <entry>Component Descriptor Editor</entry>

-            <entry>Eclipse Plug-in for specifying and configuring component descriptors for UIMA analysis

-              engines as well as other UIMA component types including Collection Readers and CAS

-              Consumers.</entry>

-          </row>

-          <row>

-            <entry>CPE Configurator</entry>

-            <entry>Graphical tool for configuring Collection Processing Engines and applying them to

-              collections of documents.</entry>

-          </row>

-          <row>

-            <entry>Java Annotation Viewer</entry>

-            <entry>Viewer for exploring annotations and related CAS data.</entry>

-          </row>

-          <row>

-            <entry>CAS Visual Debugger</entry>

-            <entry>GUI Java application that provides developers with a detailed visual view of the contents of a

-              CAS.</entry>

-          </row>

-          <row>

-            <entry>Document Analyzer</entry>

-            <entry>GUI Java application that applies analysis engines to sets of documents and shows results in a

-              viewer.</entry>

-          </row>

-          <row>

-            <entry>CAS Editor</entry>

-            <entry>Eclipse plug-in that lets you edit the contents of a CAS</entry>

-          </row>

-          <row>

-            <entry>UIMA Pipeline Eclipse Launcher</entry>

-            <entry>Eclipse plug-in that lets you configure Eclipse launchers for UIMA pipelines</entry>

-          </row>

-          <row>

-            <entry namest="col1" nameend="col2" align="center" role="tableSubhead"> Example Analysis

-              Components </entry>

-          </row>

-          <row>

-            <entry>Database Writer</entry>

-            <entry>CAS Consumer that writes the content of selected CAS types into a relational database, using

-              JDBC. This code is in cpe/PersonTitleDBWriterCasConsumer. </entry>

-          </row>

-          <row>

-            <entry>Annotators</entry>

-            <entry> Set of simple annotators meant for pedagogical purposes. Includes: Date/time, Room-number,

-              Regular expression, Tokenizer, and Meeting-finder annotator. There are sample CAS Multipliers

-              as well. </entry>

-          </row>

-          <row>

-            <entry>Flow Controllers</entry>

-            <entry> There is a sample flow-controller based on the whiteboard concept of sending the CAS to whatever

-              annotator hasn't yet processed it, when that annotator's inputs are available in the CAS. </entry>

-          </row>

-          <row>

-            <entry>XMI Collection Reader, CAS Consumer</entry>

-            <entry>Reads and writes the CAS in the XMI format</entry>

-          </row>

-          

-          <row>

-            <entry>File System Collection Reader</entry>

-            <entry> Simple Collection Reader for pulling documents from the file system and initializing CASes.

-              </entry>

-          </row>

-        </tbody>

-      </tgroup>

-    </informaltable>

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/pom.xml b/uima-docbook-references/pom.xml
deleted file mode 100644
index 3b0a8f0..0000000
--- a/uima-docbook-references/pom.xml
+++ /dev/null
@@ -1,50 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements.  See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership.  The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License.  You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied.  See the License for the
-   specific language governing permissions and limitations
-   under the License.    
--->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-  <modelVersion>4.0.0</modelVersion>
-
-  <parent>
-    <groupId>org.apache.uima</groupId>
-    <artifactId>uimaj-parent</artifactId>
-    <version>3.5.0-SNAPSHOT</version>
-    <relativePath>../uimaj-parent/pom.xml</relativePath>
-  </parent>
-
-  <artifactId>uima-docbook-references</artifactId>
-  <packaging>pom</packaging>
-  <name>Apache UIMA SDK Documentation - references</name>
-  <url>${uimaWebsiteUrl}</url>
-
-  <properties>
-    <!-- next property is the name of the top file under src/docbook without trailing .xml -->
-    <bookNameRoot>references</bookNameRoot>
-  </properties>
-
-  <repositories>
-    <repository>
-      <id>apache.snapshots</id>
-      <name>Apache Snapshot Repository</name>
-      <url>https://repository.apache.org/snapshots</url>
-      <releases>
-        <enabled>false</enabled>
-      </releases>
-    </repository>
-  </repositories>
-</project>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.cas.xml b/uima-docbook-references/src/docbook/ref.cas.xml
deleted file mode 100644
index 00a0c86..0000000
--- a/uima-docbook-references/src/docbook/ref.cas.xml
+++ /dev/null
@@ -1,1228 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.cas/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.cas">

-  <title>CAS Reference</title>

-  

-  <para>The CAS (Common Analysis System) is the part of the Unstructured Information

-    Management Architecture (UIMA) that is concerned with creating and handling the data

-    that annotators manipulate.</para>

-  

-  <para>Java users typically use the JCas (Java interface to the CAS) when manipulating

-    objects in the CAS. This chapter describes an alternative interface to the CAS which

-    allows discovery and specification of types and features at run time. It is recommended

-    for use when the using code cannot know ahead of time the type system it will be dealing

-    with.</para>

-    

-  <para>Use of the CAS as described here is also recommended (or necessary) when components add

-  to the definitions of types of other components.  This UIMA feature allows users to add features

-  to a type that was already defined elsewhere.  When this feature is used in conjunction with the

-  JCas, it can lead to problems with class loading.  This is because different JCas representations

-  of a single type are generated by the different components, and only one of them is loaded 

-  (unless you are using Pear descriptors).  Note:

-  we do not recommend that you add features to pre-existing types.  A type should be defined in one

-  place only, and then there is no problem with using the JCas.  However, if you do use this feature,

-  do not use the JCas.  Similarly, if you distribute your components for inclusion in somebody else's

-  UIMA application, and you're not sure that they won't add features to your types, do not use the

-  JCas for the same reasons.

-  </para>

-  

-  <section id="ugr.ref.cas.javadocs">

-    <title>Javadocs</title>

-    

-    <para>The subdirectory <literal>docs/api</literal> contains the documentation

-      details of all the classes, methods, and constants for the APIs discussed here. Please

-      refer to this for details on the methods, classes and constants, specifically in the

-      packages <literal>org.apache.uima.cas.*</literal>.</para>

-  </section>

-  

-  <section id="ugr.ref.cas.overview">

-    <title>CAS Overview</title>

-    

-    <para>There are three<footnote><para>A fourth part, the Subject of Analysis,

-      is discussed in <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.aas"/>.</para></footnote> main parts to the CAS: the type system, data creation and

-      manipulation, and indexing.  We will start with a brief

-      description of these components.</para>

-    <section id="ugr.ref.cas.type_system">

-      <title>The Type System</title>

-      

-      <para>The type system specifies what kind of data you will be able to manipulate in your

-        annotators. The type system defines two kinds of entities, types and features. Types

-        are arranged in a single inheritance tree and define the kinds of entities (objects)

-        you can manipulate in the CAS. Features optionally specify slots or fields within a

-        type. The correspondence to Java is to equate a CAS Type to a Java Class, and the CAS

-        Features to fields within that class. A critical difference is that CAS types have no

-        methods; they are just data structures with named slots (features). These features can

-        have as values primitive things like integers, floating point numbers, and strings,

-        and they also can hold references to other instances of objects in the CAS. We call

-        instances of the data structures declared by the type system <quote>feature

-        structures</quote> (not to be confused with <quote>features</quote>). Feature

-        structures are similar to the many variants of record structures found in computer

-        science.<footnote><para> The name <quote>feature structure</quote> comes from

-        terminology used in linguistics.</para></footnote></para>

-      

-      <para>Each CAS Type defines a supertype; it is a subtype of that supertype. This means

-        that any features that the supertype defines are features of the subtype; in other

-        words, it inherits its supertype&apos;s features. Only single inheritance is

-        supported; a type&apos;s feature set is the union of all of the features in its

-        supertype hierarchy. There is a built-in type called uima.cas.TOP; this is the top,

-        root node of the inheritance tree. It defines no features.</para>

-      

-      <para>The values that can be stored in features are either built-in primitive values or

-        references to other feature structures. The primitive values are

-        <literal>boolean</literal>, <literal>byte</literal>,

-        <literal>short</literal> (16 bit integers), <literal>integer</literal> (32

-        bit), <literal>long</literal> (64 bit), <literal>float</literal> (32 bit),

-        <literal>double</literal> (64 bit floats) and strings; the official names of these

-        are <literal>uima.cas.Boolean</literal>, <literal>uima.cas.Byte</literal>,

-        <literal>uima.cas.Short</literal>, <literal>uima.cas.Integer</literal>,

-        <literal>uima.cas.Long</literal>, <literal>uima.cas.Float</literal>

-        ,<literal> uima.cas.Double</literal> and <literal>uima.cas.String</literal>

-        . The strings are Java strings, and characters are Java characters.  Technically, this means

-        that characters are UTF-16 code units, which is not quite the same as a Unicode character.

-        This distinction should make no difference for almost all applications.

-        The CAS also defines other basic built-in types for arrays of these, plus arrays of

-        references to other objects, called <literal>uima.cas.IntegerArray</literal>

-        ,<literal> uima.cas.FloatArray</literal>,

-        <literal>uima.cas.StringArray</literal>,

-        <literal>uima.cas.FSArray</literal>, etc.</para>

-      

-      <para>The CAS also defines a built-in type called

-        <literal>uima.tcas.Annotation</literal> which inherits from

-        <literal>uima.cas.AnnotationBase</literal> which in turn inherits from

-        <literal>uima.cas.TOP</literal>. There are two features defined by this type,

-        called <literal>begin</literal> and <literal>end</literal>, both of which are

-        integer valued.</para>
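The inheritance rule described in this section can be modeled in a few lines of plain Java. This is only an illustrative sketch, not UIMA code: the `TypeSystemSketch` and `TypeDef` names, the `example.Token` type and its `partOfSpeech` feature are hypothetical, and features are reduced to bare names. It shows that a type's feature set is the union of the features declared along its supertype chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TypeSystemSketch {
  // Hypothetical model of a CAS type: a name, one supertype, and its own features.
  static final class TypeDef {
    final String name;
    final TypeDef supertype; // null only for the root type
    final List<String> ownFeatures;

    TypeDef(String name, TypeDef supertype, String... ownFeatures) {
      this.name = name;
      this.supertype = supertype;
      this.ownFeatures = Arrays.asList(ownFeatures);
    }

    // Inherited features come first, followed by the features declared here.
    List<String> allFeatures() {
      List<String> all =
          supertype == null ? new ArrayList<String>() : supertype.allFeatures();
      all.addAll(ownFeatures);
      return all;
    }
  }

  public static void main(String[] args) {
    TypeDef top = new TypeDef("uima.cas.TOP", null); // root: defines no features
    TypeDef base = new TypeDef("uima.cas.AnnotationBase", top, "sofa");
    TypeDef annotation = new TypeDef("uima.tcas.Annotation", base, "begin", "end");
    TypeDef token = new TypeDef("example.Token", annotation, "partOfSpeech");
    System.out.println(token.allFeatures());
  }
}
```

Because only single inheritance is supported, walking one supertype link at a time is enough to collect every inherited feature.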

-      

-    </section>

-    

-    <section id="ugr.ref.cas.creating_accessing_manipulating_data">

-      <title>Creating, accessing and manipulating data</title>

-      <titleabbrev>Creating/Accessing/Changing data</titleabbrev>

-      

-      <para>

-        Creating and accessing data in the CAS requires knowledge about the types and features 

-        defined in the type system.  The idea is similar to other data access APIs, such as the XML

-        DOM or SAX APIs, or database access APIs such as JDBC.  Contrary to those APIs, however, the

-        CAS does not use the names of type system entities directly in the APIs.  Rather, you use

-        the type system to access type and feature entities by name, then use these entities in the

-        data manipulation APIs.  This can be compared to the Java reflection APIs: the type system

-        is comparable to the Java class loader, and the type and feature objects to the

-        <literal>java.lang.Class</literal> and <literal>java.lang.reflect.Field</literal> classes.

-      </para>

-      

-      <para>

-        Why does it have to be this complicated?  You wouldn&apos;t normally use reflection to create a

-        Java object, either.  As mentioned earlier, the JCas provides the more straightforward

-        method to manipulate CAS data.  The CAS access methods described here need only be used for

-        generic types of applications that need to be able to handle any kind of data (e.g., generic

-        tooling) or when the JCas may not be used for other reasons.  The generic kinds of applications

-        are exactly the ones where you would use the reflection API in Java as well.

-      </para>

-      

-    </section>

-    

-    <section id="ugr.ref.cas.creating_using_indexes">

-      <title>Creating and using indexes</title>

-      

-      <para>Each view of a CAS provides a set of indexes for that view. Instances of Types (that is, Feature

-        Structures) can be added to a view&apos;s indexes. These indexes provide

-        a way for annotators to locate existing data in the CAS, using a specific index (or the

-        method <literal>getAllIndexedFS</literal> of the object <literal>FSIndexRepository</literal>) to

-        retrieve the Feature Structures that were previously created.

-        Newly created Feature Structures are not automatically added to the indexes; you choose which

-        Feature Structures to add and use one of several APIs to add them. 

-        </para>

-      

-      <para>Indexes are named and are associated with a CAS Type; they are used to index

-        instances of that CAS type (including instances of that type&apos;s subtypes). If

-        you are using multiple views (see <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>),

-        each view contains a separate instantiation of all of the indexes.

-        To access an index, you

-        minimally need to know its name. A CAS view provides an index repository which you can

-        query for indexes for that view. Once you have a handle to an index, you can get

-        information about the feature structures in the index, the size of the index, as well

-        as an iterator over the feature structures.</para>

-        

-      <para>There are three kinds of indexes:

-        <itemizedlist spacing="compact">

-          <listitem>

-            <para>bag - no ordering</para>

-          </listitem>

-          <listitem>

-            <para>set - uses a user-specified set of keys to define equality; holds one instance of the set of equal items.</para>

-          </listitem>

-          <listitem>

-            <para>sorted - uses a user-specified set of keys to define ordering.</para>

-          </listitem>          

-        </itemizedlist>

-      </para>

-      

-      <para>For set indexes, the comparator keys are augmented with an implicit additional field - the type of the

-        feature structure.  This means that an index over Annotations, having subtype Token, and a key of the "begin" value,

-        will behave as follows:

-        

-        <itemizedlist>

-          <listitem><para>If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes,

-            only one of them will be in the index.</para>

-          </listitem>

-          <listitem><para>If you make 1 Token and 1 Annotation, both having a begin value of 17, and add both of them to the indexes,

-            both of them will be in the index (because the types are different).

-          </para></listitem>

-        </itemizedlist> 

-      </para>
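The set-index behavior described above can be imitated in plain Java. The following is a minimal sketch under stated assumptions, not the framework's implementation: `SetIndexSketch` and `Fs` are hypothetical names, and a `TreeSet` whose comparator uses the `begin` key plus the type name stands in for a set index with its implicit type key.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SetIndexSketch {
  // Hypothetical stand-in for a feature structure: a type name and a "begin" value.
  static final class Fs {
    final String type;
    final int begin;

    Fs(String type, int begin) {
      this.type = type;
      this.begin = begin;
    }
  }

  // The declared key is "begin"; the type name models the implicit additional key.
  static final Comparator<Fs> KEYS =
      Comparator.<Fs>comparingInt(fs -> fs.begin).thenComparing(fs -> fs.type);

  // Adds every item to a fresh set "index" and reports how many were retained.
  static int indexedCount(Fs... items) {
    TreeSet<Fs> index = new TreeSet<>(KEYS);
    for (Fs fs : items) {
      index.add(fs); // an item comparing equal to an existing entry is not added
    }
    return index.size();
  }

  public static void main(String[] args) {
    // Two Tokens with begin 17: only one is kept.
    System.out.println(indexedCount(new Fs("Token", 17), new Fs("Token", 17)));
    // One Token and one Annotation with begin 17: both are kept.
    System.out.println(indexedCount(new Fs("Token", 17), new Fs("Annotation", 17)));
  }
}
```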

-      

-      <para>Indexes are defined in the XML descriptor metadata for the application. Each CAS

-        View has its own, separate instantiation of indexes based on these definitions, 

-        kept in the view's index repository. When you obtain an index, it is always from a

-        particular CAS view's index repository. 

-        When you index an item, it is always added to all indexes where it

-        belongs, within just the view's repository. You can specify different repositories

-        (associated with different CAS views) to use; a given Feature Structure instance 

-        may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</para>

-

-      <para>Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.</para>

-            

-      <para>You can also get iterators from indexes; 

-        iterators allow you to enumerate the feature structures in an index.  There are two kinds of iterators supported:

-        the regular Java iterator API, and a specific FS iterator API

-        where the usual Java iterator APIs (<literal>hasNext()</literal> and <literal>next()</literal>)

-        are augmented by <literal>isValid()</literal>, <literal>moveToNext() / moveToPrevious()</literal> (which does

-        not return an element) and <literal>get()</literal>.  Finally, there is a <literal>moveTo(FeatureStructure)</literal>

-        API, which, for sorted indexes, moves the iteration point to the left-most (among otherwise "equal") item

-        in the index which compares "equal" to the given FeatureStructure, using the index's defined comparator.

-      </para>

-      

-      <para>  

-        Which API style you use is up to you,

-        but we do not recommend mixing the styles as the results are sometimes unexpected.  If you

-        just want to iterate over an index from start to finish, either style is equally appropriate.

-        If you also use <literal>moveTo(FeatureStructure fs)</literal> and 

-        <literal>moveToPrevious()</literal>, it is better to use the special FS iterator style.

-      </para>

-      

-      <note><para>The reason to not mix these styles is that you might be thinking that

-        next() followed by moveToPrevious() would always work.  This is not true, because

-        next() returns the "current" element, and advances to the next position, which might be

-        beyond the last element.  At that point, the iterator becomes "invalid", and 

-        moveToNext and moveToPrevious no longer move the iterator.  But you can

-        call these methods on the iterator &mdash; moveToFirst(), moveToLast(), or moveTo(FS) &mdash; to reset it.</para></note>

-      

-      <para>Indexes are created by specifying them in the annotator&apos;s or

-        aggregate&apos;s resource descriptor. An index specification includes its name,

-        the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.

-        The keys are used for set and sorted indexes, and specify what values are used for 

-        ordering, or (for sets) what values are used to determine set equality. 

-        When a CAS pipeline is created, all index

-        specifications are combined; duplicate definitions (having the same name) are

-        allowed only if their definitions are the same. </para>

-      

-      <para>Feature structure instances need to be explicitly added to the index repository by a

-        method call. Feature structures that are not indexed will not be visible to other

-        annotators (unless they are reachable through a reference from a feature of

-        another feature structure, which is indexed, or through a chain of these).</para>

-      

-      <para>The framework defines an unnamed bag index which indexes all types.  The

-        only access provided for this index is the getAllIndexedFS(type) method on the

-        index repository, which returns an iterator over all indexed instances of the

-        specified type (including its subtypes) for that CAS View.

-      </para>

-      

-      <para>The framework defines one standard, built-in annotation index, called

-        AnnotationIndex, which indexes the <literal>uima.tcas.Annotation</literal>

-        type: all feature structures of type <literal>uima.tcas.Annotation</literal> or

-        its subtypes are automatically indexed with this built-in index.</para>

-      

-      <para>The ordering relation used by this index is to first order by the value of the

-        <quote>begin</quote> features (in ascending order) and then by the value of the

-        <quote>end</quote> feature (in descending order), and then, finally, by the 

-        Type Priority. This ordering ensures that

-        longer annotations starting at the same spot come before shorter ones. For Subjects

-        of Analysis other than Text, this may not be an appropriate index.</para>
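The ordering relation described above can be expressed as an ordinary Java comparator. This sketch is an illustration only and does not use the UIMA API: `AnnotationOrderSketch` and `Span` are hypothetical names, and the final type-priority key is left out because it depends on the configured type priorities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class AnnotationOrderSketch {
  // Hypothetical stand-in for an annotation: only its begin and end offsets.
  static final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + begin + "," + end + ")";
    }
  }

  // begin ascending, then end descending; the type-priority key is omitted here.
  static final Comparator<Span> ORDER =
      Comparator.<Span>comparingInt(s -> s.begin)
          .thenComparing(Comparator.<Span>comparingInt(s -> s.end).reversed());

  // Returns a new list holding the given spans in index order.
  static List<Span> sorted(Span... spans) {
    List<Span> result = new ArrayList<>(Arrays.asList(spans));
    result.sort(ORDER);
    return result;
  }

  public static void main(String[] args) {
    // The longer span starting at 0 sorts before the shorter one.
    System.out.println(sorted(new Span(5, 8), new Span(0, 3), new Span(0, 13)));
  }
}
```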

-        

-      <para>In addition to normal iterators, there is a <literal>select</literal> API, documented

-       in the Version 3 Users guide, which provides additional capabilities for accessing

-       Feature Structures via the indexes.</para>  

-      

-    </section>

-  </section>

-  

-  <section id="ugr.ref.cas.builtin_types">

-    <title>Built-in CAS Types</title>

-    

-    <para>The CAS has two kinds of built-in types &ndash; primitive and non-primitive. The

-      primitive types are:

-      

-      <itemizedlist spacing="compact">

-        <listitem><para>uima.cas.Boolean</para></listitem>

-        <listitem><para>uima.cas.Byte</para></listitem>

-        <listitem><para>uima.cas.Short</para></listitem>

-        <listitem><para>uima.cas.Integer</para></listitem>

-        <listitem><para>uima.cas.Long</para></listitem>

-        <listitem><para>uima.cas.Float</para></listitem>

-        <listitem><para>uima.cas.Double</para></listitem>

-        <listitem><para>uima.cas.String</para></listitem>

-      </itemizedlist></para>

-    

-    <para>The <literal>Byte, Short, Integer, </literal>and<literal> Long</literal> are

-      all signed integer types, of length 8, 16, 32, and 64 bits. The

-      <literal>Double</literal> type is 64 bit floating point. The

-      <literal>String</literal> type can be subtyped to create sets of allowed values; see

-        <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor.type_system.string_subtypes"/>.

-      These types can be used to specify the range of a String-valued feature. They act like

-      Strings, but have additional checking to ensure that any value set into them

-      conforms to one of the allowed values, or to null (which is the value if it is not set). 

-      Note that the other primitive types cannot be used

-      as a supertype for another type definition; only

-      <literal>uima.cas.String</literal> can be sub-typed.</para>

-    

-    <para>The non-primitive types exist in a type hierarchy; the top of the hierarchy is the

-      type <literal>uima.cas.TOP</literal>. All other non-primitive types inherit from

-      some supertype.</para>

-    

-    <para>There are 9 built-in array types. These arrays have a size specified when they are

-      created; the size is fixed at creation time. They are named:

-      

-      <itemizedlist spacing="compact">

-        <listitem><para>uima.cas.BooleanArray</para></listitem>

-        <listitem><para>uima.cas.ByteArray</para></listitem>

-        <listitem><para>uima.cas.ShortArray</para></listitem>

-        <listitem><para>uima.cas.IntegerArray</para></listitem>

-        <listitem><para>uima.cas.LongArray</para></listitem>

-        <listitem><para>uima.cas.FloatArray</para></listitem>

-        <listitem><para>uima.cas.DoubleArray</para></listitem>

-        <listitem><para>uima.cas.StringArray</para></listitem>

-        <listitem><para>uima.cas.FSArray</para></listitem>

-      </itemizedlist></para>

-    

-    <para>The <literal>uima.cas.FSArray</literal> type is an array whose elements are

-      arbitrary other feature structures (instances of non-primitive types).</para>

-      

-    <para>The JCas cover classes for the array types support the Iterable API, so you may

-    write extended for loops over instances of these.  For example:

-    <programlisting>FSArray&lt;MyType&gt; myArray = ...

-for (MyType fs : myArray) {

-  some_method(fs);

-}</programlisting>

-    </para>

-    

-    <para>There are 3 built-in types associated with the artifact being analyzed:

-      

-      <itemizedlist spacing="compact">

-        <listitem><para>uima.cas.AnnotationBase</para></listitem>

-        <listitem><para>uima.tcas.Annotation</para></listitem>

-        <listitem><para>uima.tcas.DocumentAnnotation</para></listitem>

-      </itemizedlist></para>

-    

-    <para>The <literal>AnnotationBase</literal> type defines one system-used feature

-      which specifies for an annotation the subject of analysis (Sofa) to which it refers. The

-      Annotation type extends from this and defines 2 features, taking

-      <literal>uima.cas.Integer</literal> values, called <literal>begin</literal>

-      and <literal>end</literal>. The <literal>begin</literal> feature typically

-      identifies the start of a span of text the annotation covers; the

-      <literal>end</literal> feature identifies the end. The values refer to character

-      offsets; the starting index is 0. An annotation of the word <quote>CAS</quote> in a text

-      <quote>CAS Reference</quote> would have a start index of 0, and an end index of 3; the

-      difference between end and start is the length of the span the annotation refers

-      to.</para>
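The offset convention in the example above maps directly onto Java's `String.substring`, whose end index is likewise exclusive. A small sketch follows; the class and method names are hypothetical and no UIMA classes are involved.

```java
final class OffsetSketch {
  // Returns the text covered by an annotation with the given begin and end offsets.
  static String coveredText(String documentText, int begin, int end) {
    return documentText.substring(begin, end); // begin is inclusive, end is exclusive
  }

  // The length of the covered span is the difference between end and begin.
  static int spanLength(int begin, int end) {
    return end - begin;
  }

  public static void main(String[] args) {
    String text = "CAS Reference";
    System.out.println(coveredText(text, 0, 3)); // the word "CAS"
    System.out.println(spanLength(0, 3));
  }
}
```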

-    

-    <para>Annotations are always with respect to some Sofa (Subject of Analysis &ndash; see

-        <olink targetdoc="&uima_docs_tutorial_guides;"/>

-        <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>

-      ).</para>

-    <note><para>Artifacts which are not text strings may have a different interpretation of

-    the meaning of begin and end, or may define their own kind of annotation, extending from

-    <literal>AnnotationBase</literal>. </para></note>

-    

-    <para id="ugr.ref.cas.document_annotation">The <literal>DocumentAnnotation</literal> type has one special instance. It is

-      a subtype of the Annotation type, and the built-in definition defines one feature,

-      <literal>language</literal>, which is a string indicating the language of the

-      document in the CAS. The value of this language feature is used by the system to control

-      flow among annotators when the <quote>CapabilityLanguageFlow</quote> mode is used,

-      allowing the flow to skip over annotators that don&apos;t process particular

-      languages. Users may extend this type by adding additional features to it, using the XML

-      Descriptor element for defining a type.</para>

-      

-    <note><para>

-      We do <emphasis>not</emphasis> recommend extending the <literal>DocumentAnnotation</literal>

-      type.  If you do, you must <emphasis>not</emphasis> use the JCas, for the reasons stated

-      earlier.

-    </para></note>

-    

-    <para>Each CAS view has a different associated instance of the

-      <literal>DocumentAnnotation</literal> type.  On the CAS, use 

-      <literal>getDocumentAnnotation()</literal> to access the

-      <literal>DocumentAnnotation</literal>.</para>

-    

-    <para>There are also built-in types supporting linked lists, similar to the ones available in

-    Java and other programming languages. Their use is

-      constrained by the usual properties of linked lists: not very space efficient, no (efficient)

-      random access, but an easy choice if you don't know how long your list will be ahead of time. The

-      implementation is type specific; there are different list building objects for each of

-      the primitive types, plus one for general feature structures. Here are the type names:

-      <itemizedlist spacing="compact">

-        <listitem><para>uima.cas.FloatList</para></listitem>

-        <listitem><para>uima.cas.IntegerList</para></listitem>

-        <listitem><para>uima.cas.StringList</para></listitem>

-        <listitem><para>uima.cas.FSList</para>

-          <para></para></listitem>

-        <listitem><para>uima.cas.EmptyFloatList</para></listitem>

-        <listitem><para>uima.cas.EmptyIntegerList</para></listitem>

-        <listitem><para>uima.cas.EmptyStringList</para></listitem>

-        <listitem><para>uima.cas.EmptyFSList</para>

-          <para></para></listitem>

-        <listitem><para>uima.cas.NonEmptyFloatList</para></listitem>

-        <listitem><para>uima.cas.NonEmptyIntegerList</para></listitem>

-        <listitem><para>uima.cas.NonEmptyStringList</para></listitem>

-        <listitem><para>uima.cas.NonEmptyFSList</para></listitem>

-        

-      </itemizedlist></para>

-    

-    <para>For the primitive types <literal>Float</literal>,

-      <literal>Integer</literal>, <literal>String</literal> and

-      <literal>FeatureStructure</literal>, there is a base type, for instance,

-      <literal>uima.cas.FloatList</literal>. For each of these, there are two subtypes,

-      corresponding to a non-empty element, and a marker that serves to indicate the end of the

-      list, or an empty list. The non-empty types define two features &ndash;

-      <literal>head</literal> and <literal>tail</literal>. The head feature holds the

-      particular value for that part of the list. The tail refers to the next list object

-      (either a non-empty one or the empty version to indicate the end of the list).</para>
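The head/tail structure described above can be modelled in plain Java. The following is only a minimal sketch: the classes `Node`, `Empty` and `NonEmpty` are hypothetical stand-ins, not the UIMA list types. They only illustrate how a list is walked until the empty marker (or a null tail) is reached.

```java
import java.util.ArrayList;
import java.util.List;

final class StringListSketch {
  /** Hypothetical base type, playing the role of uima.cas.StringList. */
  abstract static class Node {}

  /** Marker for the end of a list, or for an empty list. */
  static final class Empty extends Node {}

  /** A list element: head holds the value, tail refers to the next node. */
  static final class NonEmpty extends Node {
    final String head;
    final Node tail;

    NonEmpty(String head, Node tail) {
      this.head = head;
      this.tail = tail;
    }
  }

  /** Collects the head values by following the tail references. */
  static List<String> toJavaList(Node list) {
    List<String> out = new ArrayList<>();
    Node cur = list;
    // instanceof is false for both the Empty marker and a null tail.
    while (cur instanceof NonEmpty) {
      NonEmpty n = (NonEmpty) cur;
      out.add(n.head);
      cur = n.tail;
    }
    return out;
  }
}
```

Because every step follows one `tail` reference, reaching the n-th element costs n steps, which is the "no efficient random access" property mentioned earlier.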

-    

-    <para>For JCas users, the new operator for the NonEmptyXyzList classes includes a 3 argument version

-    where you may specify the head and tail values as part of the constructor.  The JCas 

-    cover classes for these implement

-    a <code>push(item)</code> method which creates a new non-empty node, sets the <code>head</code> value

-    to <code>item</code>, and the tail to the node it is called on, and returns the new node.

-    These classes also implement Iterable, so you can use the enhanced Java <code>for</code> operator.

-    The iterator stops when it gets to the end of the list, determined by either the tail being null or 

-    the element being one of the EmptyXyzList elements.

-    Here's a StringList example:

-    <programlisting>StringList sl = jcas.emptyStringList();

-sl = sl.push("2");

-sl = sl.push("1");

-

-for (String s : sl) {

-  someMethod(s);  // some sample use

-}</programlisting>

-    

-    </para>

-    

-    <para>There are no other built-in types. Users are free to define their own type systems,

-      building upon these types.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.cas.accessing_the_type_system">

-    <title>Accessing the type system</title>

-    

-    <para>

-      During annotator processing, or outside an annotator, access the type system by calling 

-      <literal>CAS.getTypeSystem()</literal>.

-    </para>

-    

-    <para>However, CAS annotators implement an additional method,

-      <literal>typeSystemInit()</literal>, which is called by the UIMA framework before the

-      annotator&apos;s process method. This method, implemented by the annotator writer,

-      is passed a reference to the CAS&apos;s type system metadata. The method typically uses

-      the type system APIs to obtain type and feature objects corresponding to all the types

-      and features the annotator will be using in its process method. This initialization

-      step should not be done during an annotator&apos;s initialize method since the type

-      system can change after the initialize method is called; it should not be done during the

-      process method, since this is presumably work that is identical for each incoming

-      document, and so should be performed only when the type system changes (which will be a

-      rare event). The UIMA framework guarantees it will call the

-      <literal>typeSystemInit()</literal> method of an annotator whenever the type system changes, before calling the

-      annotator&apos;s <literal>process()</literal> method.</para>

-    

-    <para>The initialization done by <literal>typeSystemInit()</literal> is done by the

-      UIMA framework when you use the JCas APIs; you only need to provide a

-      <literal>typeSystemInit()</literal> method, as described here, when you are not using

-      the JCas approach.</para>

-    

-    <section id="ugr.ref.cas.type_system.printer_example">

-      <title>TypeSystemPrinter example</title>

-      

-      <para>Here is a code fragment that, given a CAS Type System, will print a list of all

-        types.</para>

-      

-      

-      <programlisting>// Get all type names from the type system

-// and print them to stdout.

-private void listTypes1(TypeSystem ts) {

-  System.out.println("Types in the type system:");

-  for (Type t : ts) {

-    // print its name.

-    System.out.println(t.getName());

-  }

-}</programlisting>

-      

-      <para>This method is passed the type system as a parameter. The type system is iterable,

-        so the loop visits every type it defines.

-        If you run this against a CAS created with no additional

-        user-defined types, you should see something like this on the console:</para>

-      

-      <programlisting>Types in the type system: 

-uima.cas.Boolean 

-uima.cas.Byte

-uima.cas.Short 

-uima.cas.Integer 

-uima.cas.Long 

-uima.cas.ArrayBase 

-...

-        </programlisting>

-      

-      <para>If the type system had user-defined types these would show up too. Note that some

-        of these types are not directly creatable &ndash; they are types used by the framework

-        in the type hierarchy (e.g. uima.cas.ArrayBase).</para>

-      

-      <para>CAS type names include a name-space prefix. The components of a type name are

-        separated by the dot (.). A type name component must start with a Unicode letter,

-        followed by an arbitrary sequence of letters, digits and the underscore (_). By

-        convention, the last component of a type name starts with an uppercase letter, the

-        rest start with a lowercase letter.</para>
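The naming rule above can be written down as a regular expression. This is a sketch only: the helper `TypeNameCheck` is hypothetical and not part of UIMA, and the framework's own validation may differ in details (for example in which digit characters it accepts).

```java
import java.util.regex.Pattern;

final class TypeNameCheck {
  // One component: a Unicode letter, then letters, digits or underscores.
  private static final String COMPONENT = "\\p{L}[\\p{L}\\p{Nd}_]*";

  // A full name: one or more components separated by single dots.
  private static final Pattern TYPE_NAME =
      Pattern.compile(COMPONENT + "(\\." + COMPONENT + ")*");

  /** Returns true if the whole string follows the naming rule. */
  static boolean isValidTypeName(String name) {
    return TYPE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidTypeName("uima.tcas.Annotation"));
    System.out.println(isValidTypeName("com. xyz.proj.Person"));
  }
}
```

The check covers only the syntax rule; the uppercase/lowercase convention for components is not enforced.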

-      

-      <para>Listing the type names is mildly useful, but it would be even better if we could see

-        the inheritance relation between the types. The following code prints the

-        inheritance tree in indented format.</para>

-      

-      

-      <programlisting>private static final int INDENT = 2;

-private void listTypes2(TypeSystem ts) {

-  // Get the root of the inheritance tree.

-  Type top = ts.getTopType();

-  // Recursively print the tree.

-  printInheritanceTree(ts, top, 0);

-}

-

-private void printInheritanceTree(TypeSystem ts, Type type, int level) {

-  indent(level); // Print indentation.

-  System.out.println(type.getName());

-  // Get a vector of the immediate subtypes.

-  Vector<Type> subTypes =

-    ts.getDirectlySubsumedTypes(type);

-  ++level; // Increase the indentation level.

-  for (int i = 0; i &lt; subTypes.size(); i++) {

-    // Print the subtypes.

-    printInheritanceTree(ts, (Type) subTypes.get(i), level);

-  }

-}

-  

-// A simple, inefficient indenter

-private void indent(int level) {

-  int spaces = level * INDENT;

-  for (int i = 0; i &lt; spaces; i++) {

-    System.out.print(" ");

-  }

-}</programlisting>

-      

-      <para> This example shows that you can traverse the type hierarchy by starting at the top

-        with TypeSystem.getTopType and by retrieving subtypes with

-        <literal>TypeSystem.getDirectlySubsumedTypes()</literal>.</para>

-      

-      <para>The type system APIs described in the Javadocs also let you access the features, as well as

-        the allowed value type of each feature. Here is sample code which prints out all the

-        features of all the types, together with the allowed value types (the feature

-        <quote>range</quote>). Each feature has a <quote>domain</quote> which is the type

-        where it is defined, as well as a <quote>range</quote>.

-        

-        

-        <programlisting>private void listFeatures2(TypeSystem ts) {

-  Iterator featureIterator = ts.getFeatures();

-  Feature f;

-  System.out.println("Features in the type system:");

-  while (featureIterator.hasNext()) {

-    f = (Feature) featureIterator.next();

-    System.out.println(

-      f.getShortName() + ": " +

-      f.getDomain() + " -&gt; " + f.getRange());

-  }

-  System.out.println();

-}</programlisting></para>

-      

-      <para>We can ask a feature object for its domain (the type it is defined on) and its range

-        (the type of the value of the feature). The terminology derives from the fact that

-        features can be viewed as functions on subspaces of the object space.</para>

-      

-    </section>

-    

-    <section id="ugr.ref.cas.cas_apis_create_modify_feature_structures">

-      <title>Using the CAS APIs to create and modify feature structures</title>

-      <titleabbrev>Using CAS APIs: Feature Structures</titleabbrev>

-      

-      <para>Assume a type system declaration that defines two types: Entity and Person.

-        Entity has no features defined within it but inherits from uima.tcas.Annotation

-        &ndash; so it has the begin and end features. Person is, in turn, a subtype of Entity,

-        and adds firstName and lastName features. CAS type systems are declaratively

-        specified using XML; the format of this XML is described in <olink

-          targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.type_system"/>.

-        

-        

-        <programlisting><![CDATA[<!-- Type System Definition -->

-<typeSystemDescription>

-  <types>

-    <typeDescription>

-      <name>com.xyz.proj.Entity</name>

-      <description />

-      <supertypeName>uima.tcas.Annotation</supertypeName>

-    </typeDescription>

-    <typeDescription>

-      <name>com.xyz.proj.Person</name>

-      <description />

-      <supertypeName>com.xyz.proj.Entity</supertypeName>

-      <features>

-        <featureDescription>

-          <name>firstName</name>

-          <description />

-          <rangeTypeName>uima.cas.String</rangeTypeName>

-        </featureDescription>

-        <featureDescription>

-          <name>lastName</name>

-          <description />

-          <rangeTypeName>uima.cas.String</rangeTypeName>

-        </featureDescription>

-      </features>

-    </typeDescription>

-  </types>

-</typeSystemDescription>]]></programlisting></para>

-      

-  <para>

-    To be able to access types and features, we need to know their names.  The CAS interface defines

-    constants that hold the names of built-in types and features, for example

-    <literal>CAS.TYPE_NAME_INTEGER</literal>.  It is good programming practice to create such

-    constants for the types and features you define, for your own use as well as for others who will

-    be using your annotators.

-  </para>

-      

-      

-      <programlisting>/** Entity type name constant. */

-public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";

-  

-/** Person type name constant. */

-public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";

-

-/** First name feature name constant. */

-public static final String FIRST_NAME_FEAT_NAME = "firstName";

-

-/** Last name feature name constant. */

-public static final String LAST_NAME_FEAT_NAME = "lastName";</programlisting>

-      

-      <para>Next we define type and feature member variables; these will hold the values of the

-        type and feature objects needed by the CAS APIs, to be assigned during

-        <literal>typeSystemInit()</literal>.</para>

-      

-      

-      <programlisting>// Type system object variables

-private TypeSystem typeSystem;

-private Type entityType;

-private Type personType;

-private Feature firstNameFeature;

-private Feature lastNameFeature;

-private Type stringType;</programlisting>

-      

-      <para>The type system does not throw an exception if we ask for something that is

-        not known; it simply returns null. The code therefore checks for this and throws a proper

-        exception.  We require all these types and features to be defined for the annotator to

-        work.  One might imagine situations where certain computations are predicated on some type

-        or feature being defined in the type system, but that is not the case here.</para>

-      

-      

-      <programlisting>// Get a type object corresponding to a name.

-// If it doesn&apos;t exist, throw an exception.

-private Type initType(String typeName)

-  throws AnnotatorInitializationException {

-  Type type = typeSystem.getType(typeName);

-  if (type == null) {

-    throw new AnnotatorInitializationException(

-      AnnotatorInitializationException.TYPE_NOT_FOUND,

-      new Object[] { this.getClass().getName(), typeName });

-  }

-  return type;

-}

-

-// We add similar code for retrieving feature objects.

-// Get a feature object from a name and a type object.

-// If it doesn&apos;t exist, throw an exception.

-private Feature initFeature(String featName, Type type)

-  throws AnnotatorInitializationException {

-  Feature feat = type.getFeatureByBaseName(featName);

-  if (feat == null) {

-    throw new AnnotatorInitializationException(

-      AnnotatorInitializationException.FEATURE_NOT_FOUND,

-      new Object[] { this.getClass().getName(), featName });

-  }

-  return feat;

-}</programlisting>

-      

-      <para>Using these two functions, code for initializing the type system described

-        above would be:

-        

-        

-        <programlisting>public void typeSystemInit(TypeSystem aTypeSystem)

-    throws AnalysisEngineProcessException {

-  this.typeSystem = aTypeSystem;

-  // Set type system member variables.

-  this.entityType = initType(ENTITY_TYPE_NAME);

-  this.personType = initType(PERSON_TYPE_NAME);

-  this.firstNameFeature =

-    initFeature(FIRST_NAME_FEAT_NAME, personType);

-  this.lastNameFeature =

-    initFeature(LAST_NAME_FEAT_NAME, personType);

-  this.stringType = initType(CAS.TYPE_NAME_STRING);

-}</programlisting></para>

-      

-      <para>Note that we initialize the string type by using a type name constant from the

-        CAS.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.ref.cas.creating_feature_structures">

-    <title>Creating feature structures</title>

-    

-    <para>To create feature structures in JCas, we use the Java <quote>new</quote>

-      operator. In the CAS, we use one of several different API methods on the CAS object,

-      depending on which of the 10 basic kinds of feature structures we are creating (a plain

-      feature structure, or an instance of the built-in primitive type arrays or FSArray).

-      There is also a method to create an instance of a

-      <literal>uima.tcas.Annotation</literal>, setting the begin and end

-      values.</para>

-    

-    <para>Once a feature structure is created, it needs to be added to the CAS indexes (unless

-      it will be accessed via some reference from another accessible feature structure). The

-      CAS provides this API: Assuming aCAS holds a reference to a CAS, and token holds a

-      reference to a newly created feature structure, here&apos;s the code to add that

-      feature structure to all the relevant CAS indexes:</para>

-    

-    

-    <programlisting>    // Add the token to the index repository.

-    aCAS.addFsToIndexes(token);</programlisting>

-    

-    <para>There is also a corresponding <literal>removeFsFromIndexes(token)</literal>

-      method on CAS objects.</para>

-    

-    <para>As of version 2.4.1, there are two methods you can use on an index repository 

-    to efficiently bulk-remove all

-    instances of particular types of feature structures from a particular view.  One of these, 

-    <code>aCas.getIndexRepository().removeAllIncludingSubtypes(aType)</code> removes all instances of a particular

-    type, including instances which are subtypes of the specified type.  The other, 

-    <code>aCas.getIndexRepository().removeAllExcludingSubtypes(aType)</code> removes all instances of a particular

-    type, only.  In both cases, the removal is done from the particular view of the CAS referenced

-    by aCas.</para>

-    

-    <section id="ugr.ref.cas.updating_indexed_feature_structures">

-    <title>Updating indexed feature structures</title>

-    <para>Version 2.7.0 added protection for indexes when feature structure key

-    value features are updated.  By default this protection is automatic, but 

-    at some performance cost.  Users may optimize this further.</para>

-    

-    <para>Protection is needed because some of the indexes (the Sorted and Set types) use comparators defined

-    to use values of the particular features; if these values 

-    need to be changed after the feature structure is added to the indexes, 

-    the correct way to do this is to:

-    <orderedlist spacing="compact">

-      <listitem><para>completely remove the item from all indexes where it is indexed, in all views

-      where it is indexed,</para>       

-      </listitem>

-      <listitem><para>update the value of the features being used as keys,</para></listitem>

-      <listitem><para>add the item back to the indexes, in all views.</para></listitem> 

-    </orderedlist></para>
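The three steps can be illustrated with a plain `java.util.TreeSet` standing in for a sorted index. This is a sketch under stated assumptions: `Item`, `newIndex` and `updateKey` are hypothetical names, and a real UIMA index repository spans several indexes and views rather than a single set.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SortedIndexUpdateSketch {
  /** Hypothetical stand-in for a feature structure with one key feature. */
  static final class Item {
    int key;

    Item(int key) {
      this.key = key;
    }
  }

  /** Stand-in for a sorted index: orders items by their key value. */
  static TreeSet<Item> newIndex() {
    return new TreeSet<>(Comparator.comparingInt((Item i) -> i.key));
  }

  /** Remove, update the key, add back. */
  static void updateKey(TreeSet<Item> index, Item item, int newKey) {
    // Step 1: remove while the old key is still in place, so the
    // comparator can locate the entry.
    boolean wasIndexed = index.remove(item);
    // Step 2: change the key value.
    item.key = newKey;
    // Step 3: add back so the item is filed under the new key.
    if (wasIndexed) {
      index.add(item);
    }
  }
}
```

Changing `item.key` without the surrounding remove and add would leave the entry filed under its old position, so later lookups by the new key could miss it. That is the kind of corruption the protection guards against.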

-            

-      <note><para>It&rsquo;s OK to change feature values which are not used in determining

-      sort ordering (or set membership), without removing and re-adding back to the index.

-      </para></note>

-      

-    <!-- <para>To completely remove an item from the indexes may entail removing it multiple times, if it was 

-    added multiple times and (as of version 2.7.0) the JVM global property 

-    <code>uima.allow_duplicate_add_to_indexes</code> is true.</para> -->

-    

-    <para>The automatic protection checks for updates of

-    features being used as keys, and if it finds an update like this for a feature structure that

-    is in the indexes, it removes the feature structure from the indexes, does the update,

-    and adds it back.  It will do this for every feature update.  This is obviously not 

-    efficient when multiple features are being updated; in that case it would be better to

-    remove the feature structure, do all the updates to all the features needing updates, and then

-    do a single add-back operation.</para>

-   

-    <para>This is supported in user&rsquo;s code by using the new method <code>protectIndexes</code> 

-    available in both the CAS and JCas interface.    

-    

-    Here are two ways

-    of using this, one with a try / finally and the other with a Runnable:

-            <programlisting>// an approach using try / finally

-AutoCloseable ac = my_cas.protectIndexes();  // my_cas is a CAS or a JCas

-try {

-   ...  arbitrary user code which updates features

-        which may be "keys" in one or more indexes

-} finally {

-  ac.close();

-}

-

-// This can more compactly be written using the auto-close feature of try:

-

-try (AutoCloseable ac = my_cas.protectIndexes()) {

-   ...  arbitrary user code which updates features 

-        which may be "keys" in one or more indexes

-}

-

-// an approach using a Runnable, written in Java 8 lambda syntax

-my_cas.protectIndexes(() -> {

-  ... arbitrary user code updating "key" features,

-      but no checked exceptions are permitted

-  });</programlisting></para>

-    

-    <para>The <code>protectIndexes</code> implementation only removes feature structures that

-    have features being updated which are used as keys in some index(es). At the end of the scope

-    of the protectIndexes, it adds all of these back.  It also skips removing feature structures

-    from bag indexes, since these have no keys.</para>

-    

-    <para>Within a <code>protectIndexes</code> block, do not do any operations which depend on the 

-    indexes being valid, such as creating and using an iterator.  This is because the removed FSs 

-    are only added back at the end of the protectIndexes block.</para>

-

-    <para>The JVM property <code>-Duima.report_fs_update_corrupts_index</code> will generate a log entry

-    every time the framework finds (and automatically surrounds with a remove - add-back) an update to

-    a feature which could corrupt the index.  The log entries can be identified by scanning for messages

-    starting with <code>While FS was in the index, the feature</code> - the message goes on to identify

-    the feature in question.  Users can use these reports to find the places in their code where 

-    they can either change the design to avoid updating these values after the item is indexed, or

-    surround the updates with their own <code>protectIndexes</code> blocks.</para>

-    

-    <para>Initially, the out-of-the-box defaults

-    for the UIMA framework will run with an automatic (but somewhat inefficient) protection.  To improve upon this,

-    users would:

-    <itemizedlist>

-      <listitem><para>Turn on reporting using a global JVM flag <code>

-      -Duima.report_fs_update_corrupts_index</code>.

-      This will cause a message to be logged each time the automatic protection is being invoked,

-      and allows the user to find the spots to improve.</para>

-      </listitem>

-      <listitem><para>Improve each spot, perhaps by surrounding the update code with a protectIndexes

-      block, or by rearranging code to reduce updating feature values used as index keys.</para>

-      </listitem>

-      <listitem><para>Once the code is no longer generating any reports, you can turn off the

-      automatic protection for production runs using the JVM global property

-      <code>-Duima.disable_auto_protect_indexes</code>, and rely on the protectIndexes blocks.

-      If protection is disabled, then the corruption detection is skipped, making the production 

-      runs perhaps a bit faster, although this is not significant in most cases.</para></listitem>

-      <listitem><para>For automated build systems, there&rsquo;s a JVM parameter, 

-      <code>-Duima.exception_when_fs_update_corrupts_index</code>, which will throw an

-      exception if any automatic recovery situation is encountered.  You can use this 

-      in build/test scenarios to ensure

-      (after adding all needed protectIndexes blocks) that the code remains safe for 

-      turning off the checking in production runs.</para></listitem>

-      

-    </itemizedlist>

-    </para>

-        

-    </section>

-  </section>

-  

-  <section id="ugr.ref.cas.accessing_modifying_features_of_feature_structures">

-    <title>Accessing or modifying features of feature structures</title>

-    <titleabbrev>Accessing or modifying Features</titleabbrev>

-    

-    <para>Values of individual features for a feature structure can be set or referenced,

-      using a set of methods that depend on the type of value that feature is declared to have.

-      There are methods on FeatureStructure for this: getBooleanValue, getByteValue,

-      getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue,

-      getStringValue, and getFeatureValue (which means to get a value which in turn is a

-      reference to a feature structure). There are corresponding <quote>setter</quote>

-      methods, as well. These methods on the feature structure object take as arguments the

-      feature object retrieved earlier in the typeSystemInit method.</para>

-    

-    <para>Using the previous example, with the type system initialized with type personType

-      and feature lastNameFeature, here&apos;s a sample code fragment that gets and sets

-      that feature:</para>

-    

-    

-    <programlisting>// Assume aPerson is a variable holding an object of type Person

-// get the lastNameFeature value from the feature structure

-String lastName = aPerson.getStringValue(lastNameFeature);

-// set the lastNameFeature value

-aPerson.setStringValue(lastNameFeature, newStringValueForLastName);</programlisting>

-    

-    <para>The getters and setters for each of the primitive types are defined in the Javadocs

-      as methods of the FeatureStructure interface.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.cas.indexes_and_iterators">

-    <title>Indexes and Iterators</title>

-    

-    <para>Each CAS can have many indexes associated with it; each CAS View contains 

-      a complete set of instantiations of the indexes.   Each index is represented by an

-      instance of the type org.apache.uima.cas.FSIndex. You use the object

-      org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to

-      retrieve instances of indexes. There are methods that let you select the index

-      by name, by type, or by both name and type. Since each index is already associated with a type, 

-      passing both a name and a type is valid only if the type passed in is the same

-      type or a subtype of the one declared in the index specification for the named index. If you

-      pass in a subtype, the returned FSIndex object refers to an index that will return only

-      items belonging to that subtype (or subtypes of that subtype).</para>

-    

-    <para>The returned FSIndex objects are used, in turn, to create iterators. 

-      There is also a method on the Index Repository, <literal>getAllIndexedFS</literal>, 

-      which will return an iterator over all indexed Feature Structures (for that CAS View),

-      in no particular order.  The iterators

-      created can be used like common Java iterators, to sequentially retrieve items

-      indexed. If the index represents a sorted index, the items are returned in a sorted

-      order, where the sort order is specified in the XML index definition. This XML is part of

-      the Component Descriptor, see <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor.aes.index"/>.</para>

-       

-    <para>In UIMA V3, Feature structures may be added to or removed from indexes while iterating

-      over them.  If this happens, any iterators already created will continue to operate over the

-      before-modification version of the index, unless or until the iterator is re-synchronized with the current

-      value of the index via one of the following specific 3 iterator API calls: 

-      moveToFirst, moveToLast, or moveTo(FeatureStructure).

-      ConcurrentModificationException is no longer thrown in UIMA v3.

-    </para>

-    

-    <para>While iterating over feature structures, you may update features that are used as the "keys" of an index.

-    If this is done, UIMA will protect the indexes (to prevent index corruption) by automatically removing the 

-    Feature Structure from the indexes, 

-    updating the field, and adding the FS back to the index (possibly in a new position).  

-    This automatic remove / add-back operation no longer makes the iterator throw a ConcurrentModificationException

-    (as it did in UIMA Version 2) if the iterator is incremented or decremented;

-    existing iterators will continue to operate as if no index modification occurred.

-    </para>   

-      

-    <!-- <para>As of version 2.7.0, a new method on FSIndex, <code>withSnapshotIterators(),</code> 

-    allows creating a light-weight FSIndex based on the original FSIndex 

-    that supports doing arbitrary index operations while iterating, and will not throw 

-    <code>ConcurrentModificationException</code>.  Iterators obtained from this instance use a 

-    <emphasis>snapshot</emphasis> technique - they create a snapshot of the original index when the 

-    iterator is created, and then use that snapshot while operating, so the iteration is unaffected by any

-    modifications to the actual index.</para>  -->

-

-    <section id="ugr.ref.cas.index.built_in_indexes">

-      <title>Built-in Indexes</title>

-      

-      <para>An unnamed built-in bag index exists which holds all feature structures which are indexed.

-      The only access to this index is the method getAllIndexedFS(Type) which returns an iterator

-      over all indexed Feature Structures.</para>

-      

-      <para>The CAS also contains a built-in index for the type <literal>uima.tcas.Annotation</literal>, which sorts

-        annotations in the order in which they appear in the document. Annotations are sorted first by increasing

-        <literal>begin</literal> position. Ties are then broken by <emphasis>decreasing</emphasis>

-        <literal>end</literal> position (so that longer annotations come first). Annotations that match in both

-        their <literal>begin</literal> and <literal>end</literal> features are sorted using the Type Priority,

-        if any are defined

-        (see <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.type_priority"/> )</para>
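The first two sort criteria can be expressed as a Java comparator. This is a sketch, not the framework's implementation: `Span` is a hypothetical stand-in that carries only the two offsets, and the final tie-break by Type Priority is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AnnotationOrderSketch {
  /** Hypothetical stand-in for an annotation: begin and end offsets only. */
  static final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + begin + "," + end + ")";
    }
  }

  // Increasing begin first; ties broken by decreasing end (longer first).
  static final Comparator<Span> DOCUMENT_ORDER =
      Comparator.comparingInt((Span s) -> s.begin)
          .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());

  /** Returns a sorted copy, leaving the input list untouched. */
  static List<Span> sorted(List<Span> spans) {
    List<Span> copy = new ArrayList<>(spans);
    copy.sort(DOCUMENT_ORDER);
    return copy;
  }
}
```

With this ordering, an enclosing annotation that starts at the same position as a shorter one comes before it, which is convenient when walking nested structures such as sentences and their tokens.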

-    </section>

-

-    

-    <section id="ugr.ref.cas.index.adding_to_indexes">

-      <title>Adding Feature Structures to the Indexes</title>

-

-      <para>Feature Structures are added to the indexes by various APIs. These add the Feature Structure to

-        <emphasis>all</emphasis> indexes that are defined for the type of that FeatureStructure (or any of its

-        supertypes), in a particular view. 

-        Note that you should not add a Feature Structure to the indexes until you have set values for all

-        of the features that may be used as sort keys in an index.</para>

-      

-      <para>There are multiple APIs for adding FSs to the index.

-        <itemizedlist>

-          <listitem><para>(preferred) myFeatureStructure.addToIndexes(). This adds the feature structure instance to the

-          view in which it was originally created.</para>

-          </listitem>

-          <listitem><para>(preferred) myFeatureStructure.addToIndexes(JCas or CAS). This adds the feature structure instance to the

-            view represented by the argument.</para>

-          </listitem>

-          <listitem><para>(older form) casView.addFsToIndexes(myFeatureStructure) or jcasView.addFsToIndexes(myFeatureStructure). 

-            This adds the feature structure instance to the

-            view represented by the cas (or jcas).</para>

-          </listitem>

-          <listitem><para>(older form) fsIndexRepositoryView.addFsToIndexes(myFeatureStructure). 

-            This adds the feature structure instance to the

-            view represented by the fsIndexRepository instance.</para>

-          </listitem>

-        </itemizedlist>

-      </para>

-    </section>

-        

-    <section id="ugr.ref.cas.index.iterators">

-      <title>Iterators over UIMA Indexes</title>

-

-      

-      <para>Iterators are objects of type <literal>org.apache.uima.cas.FSIterator</literal>. This interface

-        extends <literal>java.util.Iterator</literal>, so it provides the normal Java iterator methods, plus

-        additional ones that allow moving both forwards and backwards.</para>

-        

-      <para>UIMA indexes implement <literal>Iterable</literal>, so you can use an index directly in a Java enhanced <code>for</code> loop.</para>

-        

-    </section>

-    

-    <section id="ugr.ref.cas.index.annotation_index">

-      <title>Special iterators for Annotation types</title>

-      

-      <para>Note: we recommend using the UIMA V3 select framework, instead of the following.

-        It implements all of the following capabilities, and more, in a uniform manner.</para>

-      

-      <para>The built-in index over the <literal>uima.tcas.Annotation</literal> type

-        named <quote><literal>AnnotationIndex</literal></quote> has additional

-        capabilities. To use them, you first get a reference to this built-in index using

-        either the <literal>getAnnotationIndex</literal> method on a CAS View object, or

-        by asking the <literal>FSIndexRepository</literal> object for an index having the

-        particular name <quote>AnnotationIndex</quote>, for example:        

-        

-        <programlisting>AnnotationIndex idx = aCAS.getAnnotationIndex(); 

-// or you can iterate over a specific subtype of Annotation:        

-AnnotationIndex subtypeIdx = aCAS.getAnnotationIndex(aType); </programlisting></para>

-      

-      <para>This object can be used to produce several additional kinds of iterators. It can

-        produce unambiguous iterators; these skip over annotations until they find one whose

-        start position is equal to or greater than the end position of

-        the previously returned annotation.</para>

-      

-      <para>It can also produce several kinds of subiterators; these are iterators whose

-        annotations fall within the span of another annotation. This kind of iterator can

-        also have the unambiguous property, if desired. It also can be

-        <quote>strict</quote> or not; strict means that the returned annotation lies

-        completely within the span of the controlling annotation. Non-strict only implies

-        that the beginning of the returned annotation falls within the span of the

-        controlling annotation.</para>
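-      <para>The unambiguous, strict and non-strict rules described above can be illustrated with a minimal
-        plain-Java sketch. It does not use the UIMA API: the class SpanSketch, its nested Span class and
-        both methods are hypothetical names invented for this illustration. Annotations are reduced to
-        begin and end offsets, the input is assumed to be sorted by begin offset, and the handling of an
-        annotation that begins exactly at the end of the controlling span is an assumption.</para>

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: Span is a hypothetical stand-in for an annotation. */
final class SpanSketch {

    /** A character span with a begin and an end offset. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Unambiguous: keep a span only if it starts at or after the end of the last kept span. */
    static List<Span> unambiguous(List<Span> sortedByBegin) {
        List<Span> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Span s : sortedByBegin) {
            if (s.begin >= lastEnd) {
                kept.add(s);
                lastEnd = s.end;
            }
        }
        return kept;
    }

    /**
     * Strict: the whole span must lie within the controlling span.
     * Non-strict: only the begin offset must fall within it.
     */
    static List<Span> covered(List<Span> sortedByBegin, Span controlling, boolean strict) {
        List<Span> kept = new ArrayList<>();
        for (Span s : sortedByBegin) {
            // Assumption: a span starting exactly at controlling.end counts as outside.
            boolean beginsInside = s.begin >= controlling.begin && s.begin < controlling.end;
            boolean endsInside = s.end <= controlling.end;
            if (beginsInside && (!strict || endsInside)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```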

-      

-      <para>There is also a method which produces an <literal>AnnotationTree</literal>

-        object, which contains nodes representing the results of doing a strict,

-        unambiguous subiterator over the span of some controlling annotation. For more

-        details, please refer to the Javadocs for the

-        <literal>org.apache.uima.cas.text</literal> package.</para>

-      

-    </section>

-    

-    <section id="ugr.ref.cas.index.constraints_and_filtered_iterators">

-      <title>Constraints and Filtered iterators</title>

-      

-      <para>Note: for new code, consider using the select framework plus Streams, instead of

-        the following.</para>

-        

-      <para>There is a set of API calls that build constraint objects. These objects can be

-        used directly to test if a particular feature structure matches (satisfies) the

-        constraint, or they can be passed to the createFilteredIterator method to create an

-        iterator that skips over instances which fail to satisfy the constraint.</para>

-      

-      <para>It is possible to specify a feature value located by following a chain of

-        references starting from the feature structure being tested. Here&apos;s a

-        scenario to explore this concept. Let&apos;s suppose you have the following type

-        system (namespaces are omitted for clarity):

-        

-        <blockquote>

-          <para><emphasis role="bold">Token</emphasis>, having a feature PartOfSpeech

-            which holds a reference to another type (POS)</para>

-          

-          <para><emphasis role="bold">POS</emphasis> (a type with many subtypes, each

-            representing a different part of speech)</para>

-          

-          <para><emphasis role="bold">Noun</emphasis> (a subtype of POS)</para>

-          

-          <para><emphasis role="bold">ProperName</emphasis> (a subtype of Noun),

-            having a feature Class which holds an integer value encoding some information

-            about the proper noun.</para></blockquote></para>

-      

-      <para>If you want to filter Token instances, such that only those tokens get through

-        which are proper names of class 3 (for example), you would need a test that started with

-        a Token instance, followed its PartOfSpeech reference to another instance (the

-        ProperName instance) and then tested the Class feature of that instance for a value

-        equal to 3.</para>

-      

-      <para>To support this, the filtering approach has components that specify tests, and

-        components that specify <quote>paths</quote>. The tests that can be done include

-        testing references to type instances to see if they are instances of some type or its

-        subtypes; this is done with a FSTypeConstraint constraint. Other tests check for

-        equality or, for numeric values, ranges.</para>

-      

-      <para>Each test may be combined with a path that leads to the value to test. Tests that

-        start from a feature structure instance can be combined with <literal>and</literal> and <literal>or</literal> connectors.

-        The Javadocs for these are in the package org.apache.uima.cas in the classes that end

-        in Constraint, plus the classes ConstraintFactory, FeaturePath and CAS.

-        Here&apos;s an example; assume the variable cas holds a reference to a CAS instance.

-        

-        

-        <programlisting>// Start by getting the constraint factory from the CAS.

-ConstraintFactory cf = cas.getConstraintFactory();

-

-// To specify a path to an item to test, you start by

-// creating an empty path.

-FeaturePath path = cas.createFeaturePath();

-

-// Add POS feature to path, creating one-element path.

-path.addFeature(posFeat);

-

-// You can extend the chain arbitrarily by adding additional

-// features.

-

-// Create a new type constraint.  

-

-// Type constraints will check that structures

-// they match against have a type at least as specific

-// as the type specified in the constraint.

-FSTypeConstraint nounConstraint = cf.createTypeConstraint();

-

-// Set the type (by default it is TOP).

-// This succeeds if the type being tested by this constraint

-// is nounType or a subtype of nounType.

-nounConstraint.add(nounType);

-

-// Embed the noun constraint under the pos path.

-// This means, associate the test with the path, so it tests the

-// proper value.

-

-// The result is a test which will

-// match a feature structure that has a posFeat defined

-// which has a value which is an instance of a nounType or

-// one of its subtypes.

-FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);

-

-// Create a type constraint for token (or a subtype of it)

-FSTypeConstraint tokenConstraint = cf.createTypeConstraint();

-

-// Set the type.

-tokenConstraint.add(tokenType);

-

-// Create the final constraint by conjoining the two constraints.

-FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);

-

-// Create a filtered iterator from some annotation iterator.

-FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);</programlisting>

-        </para></section></section>

-  

-  <section id="ugr.ref.cas.guide_to_javadocs">

-    <title>The CAS APIs &ndash; a guide to the Javadocs</title>

-    <titleabbrev>CAS APIs Javadocs</titleabbrev>

-    

-    <para>The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text. Most

-      of the APIs described here are in the cas package. The cas.impl package contains classes

-      used in serializing and deserializing (reading and writing external representations) the

-      CAS in various formats, for

-      transporting the CAS among local and remote annotators, or for storing the CAS in

-      permanent storage. The cas.text package contains the APIs that extend the CAS to support

-      artifact (including <quote>text</quote>) analysis.</para>

-    

-    <section id="ugr.ref.cas.javadocs.cas_package">

-      <title>APIs in the CAS package</title>

-      

-      <para>The main objects implementing the APIs discussed here are shown in the diagram

-        below. The hierarchy represents that there is a way to get from an upper object to an

-        instance of the lower object, usually by using a method on the upper object; this is not

-        an inheritance hierarchy.

-        <figure id="ugr.ref.cas.fig.api_hierarchy">

-          <title>CAS Object hierarchy</title>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="5.8in" format="JPG"

-                fileref="&imgroot;image001.png"/>

-            </imageobject>

-            <textobject><phrase>CAS object hierarchy</phrase></textobject>

-          </mediaobject>

-        </figure> </para>

-      

-      <para>The main Interface is the CAS interface. This has most of the functionality of the

-        CAS, except for the type system metadata access, and the indexing access. JCas and CAS

-        are alternative representations and API approaches to the CAS; each has a method to

-        get the other. You can mix JCas and CAS APIs in your application as needed. To use the

-        JCas APIs, you have to create the Java classes that correspond to the CAS types, and

-        include them in the Java class path of the application. If you have a CAS object, you can

-        get a JCas object by using the getJCas() method call on the CAS object; likewise, you

-        can get the CAS object from a JCas by using the getCAS() method call on the JCas object.

-        There is also a low level CAS interface that is not part of the official API, and is

-        intended for internal use only &ndash; it is not documented here.</para>

-      

-      <para>The type system metadata APIs are found in the TypeSystem interface. The objects

-        defining each type and feature are defined by the interfaces Type and Feature. The

-        Type interface has methods to see what types subsume other types, to iterate over the

-        types available, and to extract information about the types, including what

-        features it has. The Feature interface has methods that get what type it belongs to,

-        its name, and its range (the kind of values it can hold).</para>

-      

-      <para>The FSIndexRepository gives you access to methods to get instances of indexes, and

-        also provides access to the iterator over all indexed feature structures: 

-        <literal>getAllIndexedFS(aType)</literal>.

-        The FSIndex and AnnotationIndex objects give you methods to create instances of

-        iterators.</para>

-      

-      <para>Iterators and the CAS methods that create new feature structures return

-        FeatureStructure objects. These objects can be used to set and get the values of

-        defined features within them.</para>

-    </section>

-  </section>

-  

-  <section id="ugr.ref.cas.typemerging">

-    <title>Type Merging</title>

-    

-    <para>When annotators are combined in an aggregate, their defined type systems are merged.

-    This is designed to support independent development of annotator components.  The merge

-    results in a single defined type system for CASes that flow through a particular set of

-    annotators.</para>

-    

-    <para>The basic operation of a type system merge is to iterate through all the defined types,

-    and if two annotators define the same fully qualified type name, 

-    to take the features defined for those types

-    and form a logical union of those features.  This operation requires that same-named features

-    have the same range type names.  The resulting type system has features comprising the union

-    of all features over all the various definitions for this type in different annotators.

-    </para>

-    

-    <para>Feature merging checks that all features having the same name in a type have an

-    identical range type; otherwise an error is signaled.</para>

-    

-    <para>Types are combined for merging when their fully qualified names are the same.

-    Two different definitions can be merged even if their supertype definitions do not match, if

-    one supertype subsumes the other supertype; otherwise an error is signaled.  Likewise, two types

-    with the same name can be merged only if their features can be merged.

-    </para>
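-    <para>The feature-union rule described above can be sketched in plain Java. This is an illustration
-    under stated assumptions, not the framework's merge code: the class TypeMergeSketch and the method
-    mergeFeatures are hypothetical, each definition of a type is reduced to a map from feature name to
-    range type name, and supertype reconciliation is left out.</para>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: merges two definitions of the same fully qualified type name. */
final class TypeMergeSketch {

    /**
     * Returns the union of the two feature maps (feature name -> range type name).
     * Signals an error when a same-named feature has different range types.
     */
    static Map<String, String> mergeFeatures(Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> feature : second.entrySet()) {
            // putIfAbsent returns the range already recorded for this name, or null if new.
            String existingRange = merged.putIfAbsent(feature.getKey(), feature.getValue());
            if (existingRange != null && !existingRange.equals(feature.getValue())) {
                throw new IllegalStateException("Feature '" + feature.getKey()
                        + "' has incompatible range types: " + existingRange
                        + " vs " + feature.getValue());
            }
        }
        return merged;
    }
}
```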

-    </section>

-    

-  <section id="ugr.ref.cas.limitedmultipleaccess">

-    <title>Limited multi-thread access to read-only CASs</title>

-    

-    <para>Some applications may find it useful to scale up pipelines and run these in parallel.</para>

-    <para>

-    Generally, a CAS is not thread-safe, and only one thread at a time may operate on it.  In many

-    scenarios, a CAS may be initialized and then filled with Feature Structures, and after some point,

-    no more updates to that particular CAS will be done.</para>

-    

-    <para>

-    If a CAS is no longer going to be changed, it is possible to 

-    access it on multiple threads in a read-only mode, simultaneously, with some limitations.  Limitations 

-    arise because some UIMA Framework activities may update internal CAS data structures.</para>

-    

-    <para>Operational data is updated while running a pipeline when a PEAR is entered or exited, 

-    because PEARs establish new class loaders and can potentially switch the JCas classes being used

-    (this happens because the class loaders might define different JCas cover classes

-    implementing the same UIMA type).

-    Because of this, you cannot have multiple pipelines accessing a CAS in read-only mode if one or more of those

-    pipelines contains a PEAR. There are other edge cases where this may happen as well; for example, if you are 

-    running a pipeline with an Extension Class Loader, 

-    and have a callback routine loaded under a different class loader, UIMA will switch the JCas classes when

-    calling the callback.

-    </para>

-    </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.compress.xml b/uima-docbook-references/src/docbook/ref.compress.xml
deleted file mode 100644
index 84ab76e..0000000
--- a/uima-docbook-references/src/docbook/ref.compress.xml
+++ /dev/null
@@ -1,212 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
-<!ENTITY imgroot "images/references/ref.compress/">
-<!ENTITY tp "ugr.ref.compress.">
-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.ref.compress">
-  <title>Compressed Binary CASes</title>
-
-  <section id="ugr.ref.compress.overview">
-    <title>Binary CAS Compression overview</title>
-     
-    <para>UIMA has a proprietary binary serialization format, used internally
-    for several things, including communicating with embedded C++ annotators using
-    UIMA-CPP.  This binary format is also selectable for use with UIMA-AS.  Its use
-    requires that the source and target systems implement the identical type system
-    (because the type system is not sent, and internal coding is used within the
-    format that is keyed to the particular type system).</para>
-    
-    <para>Starting with version 2.4.1, two additional forms of binary serialization are added.
-    Both compress the data being serialized; typical size ratios can approach 50 : 1,
-    depending on the exact contents of the CAS, when compared with normal binary serialization.
-    </para>
-    
-    <para>The two forms are called 4 and 6, for historical/internal reasons.  The serialized form
-    of each is fixed, but not currently standardized, and the form being used is encoded in the header so 
-    that the appropriate deserializer can be chosen.  Both forms include support for Delta CAS
-    being returned from a service.</para>
-
-    <para>Form 6 builds on form 4, and adds: serializing only those feature structures which
-    are reachable (that is, in some index, or referenced by other reachable feature structures),
-    and type filtering.</para>
-    
-    <para>Type filtering takes a source type system and a target type system, and for serializing 
-    (source to target), sends the binary representation of reachable feature structures in the target's type system.
-    For deserializing (reading a target into a source), the filtering takes the specification being read
-    as being encoded using the target's type system, and translates that into the source's type system.
-    In this process, types which exist in the source but not the target are skipped (when serializing); 
-    types which exist in the target, but not the source are skipped when deserializing.  
-    <!-- Note that this 
-    never happens when the target is a remote service, as the client type system is guaranteed to be a superset
-    of the service's due to type merging that UIMA does when starting up pipelines.  
-     -->
-    Features that exist in some
-    source type but not in the version of the same type in the target are skipped (when serializing)
-    or set to default values (i.e., 0 or null) when being deserialized.</para>
-
-    <para>There are two main use cases for using compressed forms.  The first one is for communicating with 
-    UIMA-AS remote services (not yet implemented).
-    <!--   
-    Form 6 is automatically used when binary is selected as the method
-    in the &lt;serializer> element in the UIMA-AS deployment descriptor.  It is used with delta CAS
-    support for the returned CAS, and with type filtering - sending to the remote service only those
-    types and features it defines in its type system.
-     -->
-    </para>
-    
-    <para>The second use case is for saving compressed representations of CASes to other media, such as disk files,
-    where they can be deserialized later for use in other UIMA applications.</para>
-    
-  </section>
-
-  
-  <section id="ugr.ref.compress.usage">
-    <title>Using Compressed Binary CASes</title>
-    
-    <para>The main user interface for serializing a CAS using compression is to use one of the 
-    static methods named serializeWithCompression in Serialization.  If you pass a Type System argument representing
-    a target type system, then form 6 compression is used; otherwise form 4 is used.  
-    To get the benefit of only serializing reachable Feature Structure instances, without type mapping 
-    (which is only in form 6), pass a type system argument which is null.     
-    </para>
-    
-    <para>To deserialize into a CAS without type mapping, use one of the deserialize methods in Serialization.  
-    There are multiple forms of this method, depending on the arguments.  The forms which take extra arguments,
-    including a ReuseInfo, may only be used with serialized forms created with form 6 compression.  
-    The plain form of deserialize handles binary serialization, compressed and non-compressed, by examining a common
-    header which identifies the form of binary serialization used.  The exception is form 6: because it requires
-    additional arguments, the plain form fails for it, and you need to use the other deserialize form.</para>
-    
-    <para>Form 6 has an additional object, ReuseInfo, which holds information which 
-    is required for subsequent Delta CAS format serializations / deserializations.
-    It can speed up subsequent serializations of the same 
-    CAS (before it is further updated), for instance, if an application is sending the CAS to multiple services in parallel.  
-    The serializeWithCompression method returns this object when form 6 is being used. 
-    <!--
-    This object is also
-    used when deserializing delta CASs being returned from services:  internally, it is saved on the client side
-    when serializing a CAS to a remote service; it is saved on the service side after 
-    deserialization an incoming CAS.  The server-side instance of ReuseInfo is provided when that CAS is being 
-    serialized and returned to the client in delta-cas format, and the client-side instance of it is used when deserializing the delta CAS.
-    This is all done under the covers by the UIMA-AS implementation.
-    --> 
-    </para>
-    <para>In addition, the CasIOUtils class offers static load and save methods, which can be used with the SerialFormat
-    enum to serialize and deserialize to URLs or streams; see the Javadocs for details.</para> 
-  </section>
-
-  <section id="ugr.ref.compress.simple-deltas">
-    <title>Simple Delta CAS serialization</title>
-    <para>Use Form 4 for this, because form 6 supports delta CAS but requires 
-    that at the time of deserialization of a CAS (on the receiver side) which will later be delta serialized
-    back to the sender, 
-    an instance of the ReuseInfo must be saved, and that
-    same instance then used for delta serialization; furthermore, the original serialization 
-    (on the sender side)
-    also must save an instance of the ReuseInfo and use this when deserializing the delta CAS.
-    </para>
-    
-    <para>Form 4 may not be as efficient as form 6 in that it does not filter the CASes 
-    either by type system or by only sending reachable Feature Structure
-    instances.  However, it doesn't require a ReuseInfo object when doing delta serialization or
-    deserialization,  
-    so it may be more convenient to use when saving
-    delta CASes to files (as opposed to the other use case of 
-    a remote service returning delta CASes to a remote client).</para> 
-  </section>
-  
-  <section id="ugr.ref.compress.use-cases">
-    <title>Use Case cookbook</title>
-    <para>
-    Here are some use cases, together with a suggested approach and example of how to use the APIs.
-    </para>
-    
-    <para>
-      <emphasis role="strong">Save a CAS to an output stream, using form 4 (no type system filtering):</emphasis>
-    </para>
-          <programlisting>// set up an output stream.  In this example, an internal byte array.
-ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
-Serialization.serializeWithCompression(casSrc, baos);
-  // or
-CasIOUtils.save(casSrc, baos, SerialFormat.COMPRESSED);
-</programlisting>
- 
-      <para><emphasis role="strong">Deserialize from a stream into an existing CAS:</emphasis></para>
-      <programlisting>// assume the stream is a byte array input stream
-// For example, one could be created 
-//   from the above ByteArrayOutputStream as follows:
-ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
-// Deserialize into a cas having the identical type system
-Serialization.deserializeCAS(cas, bais);
-  // or
-CasIOUtils.load(bais, cas);
-</programlisting>
-
-<para>Note that the <code>deserializeCAS(cas, inputStream)</code> method is a general way to
-deserialize into a CAS from an inputStream for all forms of binary serialized data
-(with exceptions as noted above).
-The method reads a common header, and based on what it finds, selects the appropriate
-deserialization routine.</para>
-
-<note><para>The <code>deserializeCAS</code> method with just 2 arguments doesn't support type filtering or
-delta CAS deserialization for form 6. To do those, see the example below. 
-</para>
-</note>
-
-<para><emphasis role="strong">Serialize to an output stream, filtering out some types and/or features:</emphasis>
-</para>
-<para>
-To do this, an additional input specifying the Type System of the target must
-be supplied; this Type System should be a subset of the source CAS's.
-The <code>out</code> parameter may be an OutputStream, a DataOutputStream, or a File.
-</para>
-
-<programlisting>// set up an output stream (the out argument).  In this example, an internal byte array.
-ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
-Serialization.serializeWithCompression(cas, baos, tgtTypeSystem);
-</programlisting>
-
-<para><emphasis role="strong">Deserialize with type filtering:</emphasis></para>
-<para>There are 2 type systems involved here: one is that of the receiving CAS, and the other is the type system
-used to decode the serialized form.  The latter may optionally be stored with the serialized form:</para>
-<programlisting>CasIOUtils.save(cas, out, SerialFormat.COMPRESSED_FILTERED_TS);
-</programlisting>
-<para>and/or it can be supplied at load time.  Here are two examples of supplying this at load time:</para>
-<programlisting>CasIOUtils.load(input, cas, typeSystem); 
-CasIOUtils.load(input, type_system_serialized_form_input, cas);
-</programlisting>
-
-<para>To control both the target type system and delta handling, use the four-argument form of
-<code>deserializeCAS</code>. The reuseInfo must be null unless 
-deserializing a delta CAS, in which case it must be the reuse info captured when 
-the original CAS was serialized out. 
-If the target type system is identical to the one in the CAS, you may pass null for it.
-</para>
-<programlisting>ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
-Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
-</programlisting> 
-</section>
-  
-
-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.config.xml b/uima-docbook-references/src/docbook/ref.config.xml
deleted file mode 100644
index 270f9d7..0000000
--- a/uima-docbook-references/src/docbook/ref.config.xml
+++ /dev/null
@@ -1,292 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.config/">

-<!ENTITY tp "ugr.ref.config.">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.config">

-  <title>UIMA Setup and Configuration</title>

-  <titleabbrev>Setup and Configuration</titleabbrev>

-

-  <section id="ugr.ref.config.properties">

-    <title>UIMA JVM Configuration Properties</title>

-    

-    <para> Some updates change UIMA's behavior between released versions.  For example, sometimes an error check

-  is enhanced, and this can cause something that was previously incorrect but not checked to now signal an error.

-  Often, users will want these kinds of things to be ignored, at least for a while, to give them time to 

-  analyze and correct the issues.

-    </para> 

-    

-    <para>

-      To enable users to gradually address these issues, there are some global JVM properties

-  for UIMA that can restore earlier behaviors, in some cases.  

-  These are detailed in the table below.  Additionally, there are other JVM properties that can

-  be used in checking and optimizing some performance trade-offs, such as the automatic index protection.

-  For the most part, you don't need to assign any values to these properties,

-  just define them.  For example, to disable the enhanced check that ensures you 

-  don't add a subtype of AnnotationBase to the wrong View, you could

-  add the JVM argument <code>-Duima.disable_enhanced_check_wrong_add_to_index</code>.  

-  This would remove the enhanced

-  checking for this, added in version 2.7.0 (the previously existing partial checking is

-  still there, though).  

-    </para>
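-    <para>As an illustration of what it means to define a property without assigning it a value, the
-    following minimal sketch tests for the presence of a JVM property rather than for a particular value.
-    The class JvmFlagSketch and the method isDefined are hypothetical and are not part of UIMA; how the
-    framework itself reads these properties is not shown here.</para>

```java
/** Hypothetical helper, not part of UIMA: checks whether a presence-only JVM flag is set. */
final class JvmFlagSketch {

    static boolean isDefined(String propertyName) {
        // Passing -Dname with no value defines the property as an empty string,
        // so the check is for presence, not for a specific value.
        return System.getProperty(propertyName) != null;
    }

    public static void main(String[] args) {
        String flag = "uima.disable_enhanced_check_wrong_add_to_index";
        System.out.println(flag + " defined: " + isDefined(flag));
    }
}
```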

-  </section>   

- 

-  <section id="ugr.ref.config.protect-index">

-    <title>Configuring index protection</title>

-    

-    <para>A new feature in version 2.7.0 optionally can include checking for invalid feature updates 

-    which could corrupt indexes.  Because this checking can slightly slow down performance, there are 

-    global JVM properties to control it.  The suggested way to operate with these is as follows.

-    <itemizedlist>

-	    <listitem><para>At the beginning, run with automatic protection enabled (the default), but

-	    turn on explicit reporting (<code>-Duima.report_fs_update_corrupts_index</code>)</para></listitem>

-	    <listitem><para>For all reported instances, examine your code to see if you can restructure to

-	    do the updates before adding the FS to the indexes.  Where you cannot, surround the code doing 

-	    these updates with a try / finally or block form of <code>protectIndexes()</code>, 

-	    which is described in  

-	     <xref linkend="ugr.ref.cas.updating_indexed_feature_structures"/> (and also is similarly available with JCas). 

-	    </para></listitem>

-	    <listitem><para>After no further reports, for maximum performance, leave in the protections 

-	    you may have installed in the above step, and then disable the reporting and runtime checking, 

-	    using the JVM argument  

-	    <code>-Duima.disable_auto_protect_indexes</code>, and removing (if present) 

-	    <code>-Duima.report_fs_update_corrupts_index</code>.</para></listitem>

-    </itemizedlist>

-    One additional JVM property, <code>-Duima.throw_exception_when_fs_update_corrupts_index</code>, 

-    is intended to be used in automated build / testing configurations.  It causes the framework to throw

-    a UIMARuntimeException if an update outside of a <code>protectIndexes</code> block occurs 

-    that could corrupt the indexes,

-    rather than "recovering" this.  

-    </para>

-  </section>

-  

-  <section id="ugr.ref.config.property-table">

-    <title>Properties Table</title>

-      

-    <para>This table describes the various JVM defined properties; specify these on the Java command line

-    using -Dxxxxxx, where the xxxxxx is one of

-    the properties starting with <code>uima.</code> from the table below.</para>  

-    <informaltable frame="all" rowsep="1" colsep="1">

-     <tgroup cols="3">

-       <colspec colnum="1" colname="Title" colwidth="1*"/>

-       <colspec colnum="2" colname="Description" colwidth="3*"/>

-       <colspec colnum="3" colname="Version"  colwidth= "0.5*"/>

-       

-       <spanspec spanname="fullwidth" namest="Title" nameend="Version" align="center"/>

-        

-       <tbody>

-         <row>

-           <entry><emphasis role="bold">Title</emphasis></entry>

-           <entry><emphasis role="bold">Property Name &amp; Description</emphasis></entry>

-           <entry><emphasis role="bold">Since Version</emphasis></entry>

-         </row>

-

-

-         <!-- ******************************************************************************* -->

-         <row>

-           <entry><para>Use built-in Java Logger as default back-end</para></entry>

-           

-           <entry><para><code>uima.use_jul_as_default_uima_logger</code></para>

-           

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-5381">UIMA-5381</ulink>.

-                  The standard UIMA logger uses an slf4j implementation, which, in turn, hooks up to 

-                  a back end implementation based on what can be found in the class path (see slf4j documentation).

-                  If no backend implementation is found, the slf4j default is to use a NOP logger back end 

-                  which discards all logging.</para>

-                  

-                  <para>When this flag is specified, the behavior of the UIMA logger 

-                        is altered to use the built-in-to-Java logging implementation 

-                        as the back end for the UIMA logger.

-                  </para></entry>

-           <entry><para>3.0.0</para></entry>

-         </row>

-

-         <!-- ******************************************************************************* -->

-         <!-- 

-         <row>

-           <entry><para>Allow duplicate addToIndexes for identical Feature Structures</para></entry>

-           

-           <entry><para><code>uima.allow_duplicate_add_to_indexes</code> (default is false)</para>

-           

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4135">UIMA-4135</ulink>

-                        and <ulink url="https://issues.apache.org/jira/browse/UIMA-3399">UIMA-3399</ulink>.

-                        As of version 2.7.0, adding a particular Feature Structure

-                        to the indexes more than once is ignored.  The old behavior

-                        may be restored by this property.</para></entry>

-           <entry><para>2.7.0</para></entry>

-         </row>

-         -->

-         

-         <!-- ******************************************************************************* -->

-         <!-- 

-         <row>

-           <entry><para>adding Annotation to wrong View</para></entry>

-           

-           <entry><para><code>uima.disable_enhanced_check_wrong_add_to_index</code></para>

-           

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4099">UIMA-4099</ulink>.

-                        Feature Structures which are subtypes of AnnotationBase             

-                        may only be added to the View corresponding to their

-                        Sofa reference.  From version 2.7.0, there is additional 

-                        checking of this which can be disabled if needed 

-                        for backward compatibility.</para></entry>

-           <entry><para>2.7.0</para></entry>

-         </row>

-         -->

-

-         <!-- ******************************************************************************* -->

-         <row>

-           <entry><para>XML: enable doctype declarations</para></entry>

-           <entry><para><code>uima.xml.enable.doctype_decl</code> (default is false)</para>

-

-           <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-6064">UIMA-6064</ulink>.

-           Normally, this is turned off to avoid exposure to malicious XML; see

-           <ulink url="https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing">

-             XML External Entity processing vulnerability</ulink>.

-           </para>

-           </entry>

-           

-           <entry><para>2.10.4, 3.1.0</para></entry>

-         </row>

-         

-         <row>

-           <entry spanname="fullwidth"><emphasis role="bold">Index protection properties</emphasis></entry>

-         </row>         

-         <!-- ******************************************************************************* -->

-         <row>

-           <entry><para>Report Illegal Index-key Feature Updates</para></entry>

-           

-           <entry><para><code>uima.report_fs_update_corrupts_index</code> (default is not to report)</para>

-                      

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4135">UIMA-4135</ulink>.

-                        Updating Features which are used in Set and Sorted

-                        indexes as "keys" may corrupt the indexes, if the Feature Structure (FS)

-                        has been added to the indexes.  To update these, you must first

-                        completely remove the FS from the indexes in all views, then do the updates, and then

-                        add it back.  UIMA now checks for this (unless specifically disabled, see below),

-                        and, if this property is set, logs a WARN message for each occurrence unless

-                        the update is wrapped in an explicit <code>protectIndexes</code> block (see the Javadocs for the CAS / JCas

-                        <code>protectIndexes</code> methods).</para>

-                   <para>To scan the logs for these reports, search for instances of lines having the string 

-                         <code>While FS was in the index, the feature</code></para>

-                   

-                   <para>Specifying this property overrides <code>uima.disable_auto_protect_indexes</code>.</para>

-                         

-                   <para>Users would run with this property defined, and then for high performance, 

-                        would use the report to manually change their code to avoid the problem or 

-                        to wrap the updates with a <code>protectIndexes</code> kind of protection (see the

-                        reference manual, in the CAS or JCas chapters, for examples of user code doing this), 

-                        and then run with the protection turned off (see below).

-                        

-                        </para></entry>

-                        

-           <entry><para>2.7.0</para></entry>

-         </row>

-

-         <!-- ******************************************************************************* -->

-         <row>

-           <entry><para>Throw exception on illegal Index-key Feature Updates</para></entry>

-           

-           <entry><para><code>uima.exception_when_fs_update_corrupts_index</code> (default is false)</para>

-                      

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4150">UIMA-4150</ulink>.

-                        Throws a UIMARuntimeException if an Indexed FS feature used as a key in one or more 

-                        indexes is updated, outside of an explicit <code>protectIndexes</code> block.

-                        This is intended for use in automated build and test environments,

-                        to provide a strong signal if this kind of mistake gets into the build.

-                        If it is not set, then the other properties specify if corruption should be checked for, 

-                        recovered automatically, and / or reported.</para>

-                   

-                   <para>Specifying this property also forces <code>uima.report_fs_update_corrupts_index</code>

-                         to true even if it was set to false.</para>

-                         

-                   </entry>

-                        

-           <entry><para>2.7.0</para></entry>

-         </row>

-         

-         <!-- ******************************************************************************* -->

-         <row>

-           <entry><para>Disable the index corruption checking</para></entry>

-           

-           <entry><para><code>uima.disable_auto_protect_indexes</code></para>

-                      

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4135">UIMA-4135</ulink>.

-                        After you have fixed all reported issues identified with the above report,

-                        you may set this property to omit this check, which may slightly improve

-                        performance.</para>

-                  <para>Note that this property is ignored if <code>-Duima.exception_when_fs_update_corrupts_index</code>

-                  or <code>-Duima.report_fs_update_corrupts_index</code> is specified.</para>

-           </entry>

-                        

-           <entry><para>2.7.0</para></entry>

-         </row>

-

-

-         <row>

-           <entry spanname="fullwidth"><emphasis role="bold">Measurement / Tracing properties</emphasis></entry>

-         </row>         

-         <!-- ******************************************************************************* -->

-      

-         <row>

-           <entry><para>Trace Feature Structure Creation/Updating</para></entry>

-           

-           <entry><para><code>uima.trace_fs_creation_and_updating</code></para>

-                  <para>This causes a trace file to be produced in the current working directory.

-                  The file has one line for each Feature Structure that is created, and includes

-                  information on the cas/cas-view, and the features that are set for the Feature Structure.

-                  There is, additionally, one line for each Feature Structure update.

-                  Updates that occur next to trace information for the same Feature Structure are combined.

-                  </para>

-           

-                  <para>This can generate a lot of output, and definitely slows down execution.</para>

-            </entry>

-            

-            <entry><para>2.10.1</para></entry>

-         </row>    

-         

-                    

-         <row>

-           <entry><para>Measure index flattening optimization</para></entry>

-           

-           <entry><para><code>uima.measure.flatten_index</code></para>

-                      

-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4357">UIMA-4357</ulink>.

-                        This creates a short report to System.out when Java shuts down.

-                        The report has some statistics about the automatic management of 

-                        flattened index creation and use.</para>

-          

-           </entry>

-                        

-           <entry><para>2.8.0</para></entry>

-         </row>         

-          -->

-

-       </tbody>

-     </tgroup>

-   </informaltable>

-   <para>Some additional global flags intended for helping v3 migration are documented in the V3 user's guide.</para> 

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.javadocs.xml b/uima-docbook-references/src/docbook/ref.javadocs.xml
deleted file mode 100644
index 9718940..0000000
--- a/uima-docbook-references/src/docbook/ref.javadocs.xml
+++ /dev/null
@@ -1,89 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.javadocs/">

-<!ENTITY tp "ugr.ref.javadocs.">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.javadocs">

-  <title>Javadocs</title>

-  

-  <para>The details of all the public APIs for UIMA are contained in the API Javadocs. These are located in the docs/api

-    directory; the top level to open in your browser is called <ulink url="api/index.html"/>.</para>

-  

-  <para>Eclipse supports the ability to attach the Javadocs to your project. The Javadoc should already be attached

-    to the <literal>uimaj-examples</literal> project, if you followed the setup instructions in <olink

-      targetdoc="&uima_docs_overview;"/> <olink

-      targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.eclipse_setup.example_code"/>. To attach

-    Javadocs to your own Eclipse project, use the following instructions.</para>

-  

-  <note><para>As an alternative, you can add the UIMA source to the UIMA binary distribution; if you

-  do this, you will not only have the Javadocs automatically available (you can skip the following

-  setup), but will also be able to step through the UIMA framework code while debugging.

-  To add the source, follow the instructions as described in the setup chapter: 

-  <olink targetdoc="&uima_docs_overview;"/>

-  <olink targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.eclipse_setup.adding_source"/>.</para></note>

-  

-  <para>To add the Javadocs, open a project which is referring to the UIMA APIs in its class path, and open the project properties. Then pick

-    Java Build Path. Pick the "Libraries" tab and select one of the UIMA library entries (if you don't have, for

-    instance, uima-core.jar in this list, it's unlikely your code will compile). Each library entry has a small ">"

-    sign on its left - click that to expand the view to see the Javadoc location. If you highlight that and press edit - you

-    can add a reference to the Javadocs, in the following dialog:

-    

-    

-    <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of attaching Javadoc to source in Eclipse</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-  

-  <para>Once you do this, Eclipse can show you Javadocs for UIMA APIs as you work. To see the Javadoc for a UIMA API, you

-    can hover over the API class or method, or select it and press shift-F2, or use the menu Navigate &rarr;

-    Open External Javadoc, or open the Javadoc view (Window &rarr; Show View &rarr; Other

-    &rarr; Java &rarr; Javadoc).</para>

-  

-  <para>In a similar manner, you can attach the source for the UIMA framework, if you download the source

-    distribution. The source corresponding to particular

-    releases is available from the Apache UIMA web site (<ulink url="http://uima.apache.org"/>) on the

-    downloads page.</para>

-  

-  <section id="ugr.ref.javadocs.libraries">

-    <title>Using named Eclipse User Libraries</title>

-  <para>You can also create a named "user library" in Eclipse containing the UIMA Jars, and attach the Javadocs (or

-  optionally, the sources); this named library is saved in the Eclipse workspace.  Once created, it can be

-  added to the classpath of newly created Eclipse projects.</para> 

-  

-  <para>Use the menu option Project &rarr; Properties

-  &rarr; Java Build Path, and then pick the Libraries tab, and click the Add Library button. Then select

-  User Libraries, click "Next", and pick the library you created for the UIMA Jars.</para> 

-  

-  <para>To create this library in the workspace,

-    use the same menu picks as above, but after you select the User Libraries and click "Next", you can click the "New Library..."

-    button to define your new library.  You use the "Add Jars" button and multi-select all the Jars in the lib directory

-    of the UIMA binary distribution.  Then you add the Javadoc attachment for each Jar.  The path to use is

-    file:/ -- insert the path to your install of UIMA -- /docs/api.  After you do this for the first Jar, you can

-    copy this string to the clipboard and paste it into the rest of the Jars.</para>

-    </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.jcas.xml b/uima-docbook-references/src/docbook/ref.jcas.xml
deleted file mode 100644
index 52158ae..0000000
--- a/uima-docbook-references/src/docbook/ref.jcas.xml
+++ /dev/null
@@ -1,677 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.jcas">

-  <title>JCas Reference</title>

-  

-  <para>The CAS is a system for sharing data among annotators, consisting of data structures

-    (definable at run time), sets of indexes over these data, metadata describing these, subjects of

-    analysis, and a high

-    performance serialization/deserialization mechanism. JCas provides a Java approach to

-    accessing CAS data, and is based on using generated, specific Java classes for each CAS

-    type.</para>

-  

-  <para>Annotators process one CAS per call to their process method. During processing,

-    annotators can retrieve feature structures from the passed in CAS, add new ones, modify

-    existing ones, and use and update CAS indexes. Of course, an annotator can also use plain

-    Java Objects in addition; but the data in the CAS is what is shared among annotators within

-    an application.</para>

-  

-  <para>All the facilities present in the APIs for the CAS are available when using the JCas

-    APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a

-    JCas (and vice-versa). The JCas APIs often have helper methods that make using this

-    interface more convenient for Java developers.</para>

-  

-  <para>The data in the CAS are typed objects having fields. JCas uses a set of generated Java

-    classes (each corresponding to a particular CAS type) with <quote>getter</quote> and

-    <quote>setter</quote> methods for the features, plus a constructor so new instances can

-    be made. The Java classes store the data in the class instance.</para>

-  

-    <para>Users can modify the JCas generated

-    Java classes by adding fields to them; this allows arbitrary non-CAS data to also be

-    represented within the JCas objects; however, the non-CAS data stored in the JCas

-    object instances cannot be shared with annotators using the plain CAS, unless special

-    provision is made - see the chapter in the v3 user's guide on storing arbitrary

-    Java objects in the CAS.</para>

-  

-  <para>The JCas class Java source files are generated from XML type system descriptions. The

-    JCasGen utility does the work of generating the corresponding Java Class Model for the CAS

-    types. There are a variety of ways JCasGen can be run; these are described later. You

-    include the generated classes with your UIMA component, and you can publish these classes

-    for others who might want to use your type system.</para>

-    

-  <para>JCas classes are not required for all UIMA types.  Those types which don&apos;t have 

-    corresponding JCas classes use the nearest JCas class corresponding to a type in their superchain.</para>

-  

-  <para>The specification of the type system in XML can be written using a conventional text

-    editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA

-    descriptors.</para>

-  

-  <para>Changes to the type system are done by changing the XML and regenerating the

-    corresponding Java Class Models. Of course, once you&apos;ve published your type system

-    for others to use, you should be careful that any changes you make don&apos;t adversely

-    impact the users. Additional features can be added to existing types without breaking

-    other code.</para>

-  

-  <para>A separate Java class is generated for each type; this type implements the CAS

-    FeatureStructure interface, as well as having the special getters and setters for the

-    included features. The generated Java classes have methods (getters and setters) for the

-    fields as defined in the XML type specification. Descriptor comments are reflected in the

-    generated Java code as Java-doc style comments.</para>

-  

-  

-  <section id="ugr.ref.jcas.name_spaces">

-    <title>Name Spaces</title>

-    

-    <para>Full Type names consist of a <quote>namespace</quote> prefix dotted with a simple

-      name. Namespaces are used like packages to avoid collisions between types that are

-      defined by different people at different times. The namespace is used as the Java

-      package name for generated Java files.</para>

-      

-    <para>Type names used in the CAS correspond to the generated Java classes directly. If the

-      CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the

-      package com.myCompany.myProject, and the class is ExampleClass.</para>

-      

-    <para>

-      An exception to this rule is the built-in types

-      starting with <literal>uima.cas</literal> and <literal>uima.tcas</literal>;

-      these names are mapped to Java packages named

-      <literal>org.apache.uima.jcas.cas</literal> and

-      <literal>org.apache.uima.jcas.tcas</literal>.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.jcas.use_of_description">

-    <title>XML description element</title>

-    <titleabbrev>Use of XML Description</titleabbrev>

-    

-    <para>Each XML type specification can have &lt;description ...

-      &gt; tags. The description for a type will be copied into the generated Java code, as a

-      Javadoc style comment for the class. When writing these descriptions in the XML type

-      specification file, you might want to use html tags, as allowed in Javadocs.</para>

-    

-    <para>If you use the Component Descriptor Editor, you can write the html tags normally,

-      for instance, <quote>&lt;h1&gt;My Title&lt;/h1&gt;</quote>. The Component

-      Descriptor Editor will take care of converting the actual descriptor source so that it

-      has the leading <quote>&lt;</quote> character written as <quote>&amp;lt;</quote>,

-      to avoid confusing the XML type specification. For example, &lt;p&gt; would be written

-      in the source of the descriptor as &amp;lt;p&gt;. Any characters used in the Javadoc

-      comment must of course be from the character set allowed by the XML type specification.

-      These specifications often start with the line &lt;?xml version=<quote>1.0</quote>

-      encoding=<quote>UTF-8</quote> ?&gt;, which means you can use any of the UTF-8

-      characters.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.jcas.mapping_built_ins">

-    <title>Mapping built-in CAS types to Java types</title>

-    

-    <para>The built-in primitive CAS types map to Java types as follows:</para>

-    

-    

-    <programlisting>uima.cas.Boolean &rarr; boolean

-uima.cas.Byte    &rarr; byte

-uima.cas.Short   &rarr; short

-uima.cas.Integer &rarr; int

-uima.cas.Long    &rarr; long

-uima.cas.Float   &rarr; float

-uima.cas.Double  &rarr; double

-uima.cas.String  &rarr; String</programlisting>

-    

-  </section>

-  

-  <section id="ugr.ref.jcas.augmenting_generated_code">

-    <title>Augmenting the generated Java Code</title>

-    

-    <para>The Java Class Models generated for each type can be augmented by the user. Typical

-      augmentations include adding additional (non-CAS) fields and methods, and import

-      statements that might be needed to support these. Commonly added methods include

-      additional constructors (having different parameter signatures), and

-      implementations of toString().</para>

-    

-    <para>To augment the code, just edit the generated Java source code for the class named the

-      same as the CAS type. Here&apos;s an example of an additional method you might add; the

-      various getter methods are retrieving values from the instance:</para>

-    

-    

-    <programlisting>public String toString() { // for debugging

-  return "XsgParse "

-    + getslotName() + ": "

-    + getheadWord().getCoveredText()

-    + " seqNo: " + getseqNo()

-    + ", cAddr: " + id

-    + ", size left mods: " + getlMods().size()

-    + ", size right mods: " + getrMods().size();

-}</programlisting>

- 

-   <!-- does not apply for v3

-    <section id="ugr.ref.jcas.data_persistence">

-      <title>Persistence of additional data</title>

-      <para>If you add custom instance fields to JCas cover classes, these exist in the JCas cover object instance,

-        but not in the CAS itself. Each time a CAS object is referenced (by an iterator, or by following a Feature

-        Structure reference), a new JCas cover object instance may be created. If you need these values, you can (a)

-        make them CAS values if possible, or (b) hold a reference to the particular JCas cover object instance in

-        your Java code. For some simple cases, setting the performance tuning option JCAS_CACHE_ENABLE (see

-          <olink targetdoc="&uima_docs_tutorial_guides;"/>

-          <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="tug.application.pto"/>)

-         to true

-        will cause the same JCas cover object that was previously used for a particular CAS Feature Structure to be

-        reused. However, this capability won't work when other factors interfere with the ability to reuse the same

-        object.  Pear isolation is an example of this.</para>

-      <para>Because of this, and because the JCas Cache holds on to the JCas cover objects beyond their useful life and

-        prevents them from being garbage collected, it is normally recommended to run with

-        JCAS_CACHE_ENABLE set to "false".</para>

-    </section>   

-     -->

-     

-    <section id="ugr.ref.jcas.keeping_augmentations_when_regenerating">

-      <title>Keeping hand-coded augmentations when regenerating</title>

-      

-      <para>If the type system specification changes, you have to re-run the JCasGen

-        generator. This will produce updated Java for the Class Models that capture the

-        changed specification. If you have previously augmented the source for these Java

-        Class Models, your changes must be merged with the newly (re)generated Java source

-        code for the Class Models. This can be done by hand, or you can run the version of JCasGen

-        that is integrated with Eclipse, and use automatic merging that is done using Eclipse&apos;s EMF

-        plug-in. You can obtain Eclipse and the needed EMF plug-in from <ulink

-          url="http://www.eclipse.org/"/>.</para>

-      

-      <para>If you run the generator version that works without using Eclipse, it will not

-        merge Java source changes you may have previously made; if you want them retained,

-        you&apos;ll have to do the merging by hand.</para>

-      

-      <para>The Java source merging will keep additional constructors, additional fields,

-        and any changes you may have made to the readObject method (see below). Merging will

-        <emphasis>not</emphasis> delete classes in the target corresponding to deleted CAS types, which no longer

-        are in the source &ndash; you should delete these by hand.</para>

-      

-      <warning><para>The merging supports Java 1.4 syntactic constructs only.  

-        JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to 

-        only Java 1.4 constructs, the merge will work.  If you use Java 5 or later specific syntax or constructs, the merge

-        operation will likely fail to merge properly.</para></warning>

-    </section>

-    

-    <section id="ugr.ref.jcas.additional_constructors">

-      <title>Additional Constructors</title>

-      

-      <para>Any additional constructors that you add must include the JCas argument. The

-        first line of your constructor is required to be</para>

-      

-      

-      <programlisting>this(jcas);        // run the standard constructor</programlisting>

-      

-      <para>where jcas is the passed in JCas reference. If the type you&apos;re defining

-        extends <literal>uima.tcas.Annotation</literal>, JCasGen will automatically

-        add a constructor which takes 2 additional parameters &ndash; the begin and end Java

-        int values, and sets the <literal>uima.tcas.Annotation</literal>

-        <literal>begin</literal> and <literal>end</literal> fields.</para>

-      

-      <para>Here&apos;s an example: If you&apos;re defining a type MyType which has a

-        feature parent, you might make an additional constructor which has an additional

-        argument of parent:</para>

-      

-      

-      <programlisting>MyType(JCas jcas, MyType parent) {

-  this(jcas);        // run the standard constructor

-  setParent(parent); // set the parent field from the parameter

-}</programlisting>

-      

-      <section id="ugr.ref.jcas.using_readobject">

-        <title>Using readObject</title>

-        

-        <para>Fields defined by augmenting the Java Class Model to include additional

-          fields represent data that exist for this class in Java, in a local JVM (Java Virtual

-          Machine), but do not exist in the CAS when it is passed to other environments (for

-          example, passing to a remote annotator).</para>

-        

-        <para>A problem can arise when new instances are created, perhaps by the underlying

-          system when it iterates over an index, which is: how to ensure that any additional

-          non-CAS fields are properly initialized. To allow for arbitrary initialization

-          at instance creation time, an initialization method in the Java Class Model,

-          called readObject is used. The generated default for this method is to do nothing,

-          but it is one of the methods that you can modify &ndash; to do whatever

-          initialization might be needed. It is called with 0 parameters, during the

-          constructor for the object, after the basic object fields have been set up. It can

-          refer to fields in the CAS using the getters and setters, and other fields in the Java

-          object instance being initialized.</para>

-        

-        <para>A pre-existing CAS feature structure could exist if a CAS was being passed to

-          this annotator; in this case the JCas system calls the readObject method when

-          creating the corresponding Java instance for the first time for the CAS feature

-          structure. This can happen at two points: when a new object is being returned from an

-          iterator over a CAS index, or a getter method is getting a field for the first time

-          whose value is a feature structure.</para>

-        

-      </section>

-    </section>

-    

-    <section id="ugr.ref.jcas.modifying_generated_items">

-      <title>Modifying generated items</title>

-      

-      <para>The following modifications, if made in generated items, will be preserved when

-        regenerating.</para>

-      

-      <para>The public/private etc. flags associated with methods (getters and setters).

-        You can change the default (<quote>public</quote>) if needed.</para>

-      

-      <para><quote>final</quote> or <quote>abstract</quote> can be added to the type

-        itself, with the usual semantics.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.ref.jcas.merging_types_from_other_specs">

-    <title>Merging types</title>

-    <titleabbrev>Merging Types</titleabbrev>

-    <para>Type definitions are merged by the framework from all the components being run together.</para>

-    

-    <section id="ugr.ref.jcas.merging_types.aggregates_and_cpes">

-      <title>Aggregate AEs and CPEs as sources of types</title>

-      

-      <para>When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the

-        UIMA framework will build a merged type system (Note: this <quote>merge</quote> is merging types, not to be

-        confused with merging Java source code, discussed above). This merged type system has all the types of every

-        component used in the application.  In addition, application code can use UIMA Framework APIs to read and merge

-        type descriptions, manually.</para>

-      

-      <para>In most cases, each type system can have its own Java Class Models generated individually, perhaps at an

-        earlier time, and the resulting class files (or .jar files containing these class files) can be put in the

-        class path to enable JCas.</para>

-      

-      <para>However, it is possible that there may be multiple definitions of the same CAS type, each of which might

-        have different features defined. In this case, the UIMA framework will create a merged type by accumulating

-        all the defined features for a particular type into that type&apos;s type definition. However, the JCas

-        classes for these types are not automatically merged, which can create some issues for JCas users, as

-        discussed in the next section.</para>

-

-    </section>

-    

-    <section id="ugr.ref.jcas.merging_types.jcasgen_support">

-      <title>JCasGen support for type merging</title>

-      

-      <para>When there are multiple definitions of the same CAS type with different features defined, then JCasGen

-        can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types,

-        which can then be shared by all the components. 

-        Directions for running JCasGen can be found in <olink

-          targetdoc="&uima_docs_tools;"/> <olink

-          targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>. This is typically done by the person who

-        is assembling the Aggregate Analysis Engine or Collection Processing Engine. The resulting merged Java

-        Class Model will then contain get and set methods for the complete set of features. These Java classes must

-        then be made available in the class path, <emphasis>replacing</emphasis> the pre-merge versions of the

-        classes.</para>

-      

-      <para>If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the

-        merged versions, as described in section <xref

-          linkend="ugr.ref.jcas.keeping_augmentations_when_regenerating"/>, above. If just one of the

-        pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the

-        file system where the generated output will go, and the -merge option for JCasGen will automatically

-        merge the hand-modifications with the generated code. If

-        <emphasis>both</emphasis> pre-merged versions had hand-modifications, then these modifications must

-        be manually merged.</para>

-      

-      <para>An alternative to this is packaging the components as individual PEAR files, each with their own

-      version of the JCas generated Classes.  The Framework (as of release 2.2) can run PEAR files using the 

-      pear file descriptor, and supply each component with its particular version of the JCas generated class.</para>

-      

-    </section>

-    

-    <section id="ugr.ref.jcas.impact_of_type_merging_on_composability">

-      <title>Impact of Type Merging on Composability of Annotators</title>

-      <titleabbrev>Type Merging impacts on Composability</titleabbrev>

-      

-      <para>The recommended approach in UIMA is to build and maintain type systems as separate components, which are

-        imported by Annotators. Using this approach, Type Merging does not occur because the Type System and its JCas

-        classes are centrally managed and shared by the annotators.</para>

-      

-      <para>If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator

-        redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the

-        reusability of your annotator, unless your component is used as a PEAR file.</para>

-      

-      <para>If you are not using the PEAR packaging isolation capability, whenever

-        anyone wants to combine your annotator with another annotator that uses a different version of

-        the same Type, they will need to be aware of all of the issues described in the previous section. They will need

-        to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java

-        classes and to not include the pre-merge classes. (To enable this, you should package these classes

-        separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you

-        have done hand-modifications to your JCas classes, the person assembling your annotator will need to

-        properly merge those changes. These issues significantly complicate the task of combining annotators, and

-        will cause your annotator not to be as easily reusable as other UIMA annotators. </para>

-      

-    </section>

-    

-    <section id="ugr.ref.jcas.documentannotation_issues">

-      <title>Adding Features to DocumentAnnotation</title>

-      

-      <para>There is one built-in type, <literal>uima.tcas.DocumentAnnotation</literal>, 

-        to which applications can add additional features.  (All other built-in types

-        are "feature-final" and you cannot add additional features to them.)  Frequently,

-        additional features are added to <literal>uima.tcas.DocumentAnnotation</literal> 

-        to provide a place to store document-level metadata.</para>

-      

-      <para>For the same reasons mentioned in the previous section, adding features to 

-        DocumentAnnotation is not recommended if you are using JCas.  Instead, it is recommended

-        that you define your own type for storing your document-level metadata.  You can create 

-        an instance of this type and add it to the indexes in the usual way.  You can then

-        retrieve this instance using the iterator returned from the method <literal>getAllIndexedFS(type)</literal>

-        on an instance of a JFSIndexRepository object.

-        (As of UIMA v2.1, you do not have to declare a custom index in your descriptor to

-        get this to work).</para>

-      

-      <para>If you do choose to add features to DocumentAnnotation, there are additional issues to

-        be aware of.  The UIMA SDK provides the JCas cover class for the built-in definition of

-        DocumentAnnotation, in the separate jar file <literal>uima-document-annotation.jar</literal>.

-        If you add additional features to DocumentAnnotation, you must remove this jar file

-        from your classpath, because you will not want to use the default JCas cover class.

-        You will need to re-run JCasGen as described in <xref

-          linkend="ugr.ref.jcas.merging_types.jcasgen_support"/>.  JCasGen will generate a new cover

-        class for DocumentAnnotation, which you must place in your classpath in lieu of the version

-        in <literal>uima-document-annotation.jar</literal>.</para>

-        

-      <para>Also, this is the reason why the method <literal>JCas.getDocumentAnnotationFs()</literal> returns

-        type <literal>TOP</literal>, rather than type <literal>DocumentAnnotation</literal>.  Because the

-        <literal>DocumentAnnotation</literal> class can be replaced by users, it is not part of

-        <literal>uima-core.jar</literal> and so the core UIMA framework cannot have any references

-        to it.  In your code, you may <quote>cast</quote> the result of <literal>JCas.getDocumentAnnotationFs()</literal> 

-        to type <literal>DocumentAnnotation</literal>, which must be available on the classpath either via 

-        <literal>uima-document-annotation.jar</literal> or by including a custom version that you have generated using JCasGen.</para>

-    </section>

-    

-  </section>

-  

-  <section id="ugr.ref.jcas.using_within_an_annotator">

-    <title>Using JCas within an Annotator</title>

-    

-    <para>To use JCas within an annotator, you must include the generated Java classes output

-      from JCasGen in the class path.</para>

-    

-    <para>An annotator written using JCas is built by defining a class for the annotator that

-      extends JCasAnnotator_ImplBase. The process method for this annotator is

-      written as follows:</para>

-    

-    <programlisting>public void process(JCas jcas)

-     throws AnalysisEngineProcessException {

-  ... // body of annotator goes here

-}</programlisting>

-    

-    <para>The process method is passed the JCas instance to use as a parameter.</para>

-    

-    <para>The JCas reference is used throughout the annotator to refer to the particular JCas

-      instance being worked on. In pooled or multi-threaded implementations, there will be a

-      separate JCas for each thread being (simultaneously) worked on.</para>

-    

-    <para>You can do several kinds of operations using the JCas APIs: create new feature

-      structures (instances of CAS types) (using the new operator), access existing feature

-      structures passed to your annotator in the JCas (for example, by using the next method of

-      an iterator over the feature structures), get and set the fields of a particular

-      instance of a feature structure, and add and remove feature structure instances from

-      the CAS indexes. To support iteration, there are also functions to get and use indexes

-      and iterators over the instances in a JCas.</para>

-    

-    <section id="ugr.ref.jcas.new_instances">

-      <title>Creating new instances using the Java <quote>new</quote> operator</title>

-      <titleabbrev>Creating new instances</titleabbrev>

-      

-      <para>The new operator creates new instances of JCas types. It takes at least one

-        parameter, the JCas instance in which the type is to be created. For example, if there

-        was a type Meeting defined, you can create a new instance of it using:

-        

-        <programlisting>Meeting m = new Meeting(jcas);</programlisting></para>

-      

-      <para>Other variations of constructors can be added in custom code; the single

-        parameter version is the one automatically generated by JCasGen. For types that are

-        subtypes of Annotation, JCasGen also generates an additional constructor with

-        additional <quote>begin</quote> and <quote>end</quote> arguments.</para>

-      

-    </section>

-    <section id="ugr.ref.jcas.getters_and_setters">

-      <title>Getters and Setters</title>

-      

-      <para>If the CAS type Meeting had fields location and time, you could get or set these by

-        using getter or setter methods. These methods have names formed by splicing together

-        the word <quote>get</quote> or <quote>set</quote> followed by the field name, with

-        the first letter of the field name capitalized. For instance

-        

-        <programlisting>getLocation()</programlisting></para>

-      

-      <para>The getter forms take no parameters and return the value of the field; the setter

-        forms take one parameter, the value to set into the field, and return void.</para>

-      

-      <para>There are built-in CAS types for arrays of integers, strings, floats, and

-        feature structures. For fields whose values are these types of arrays, there is an

-        alternate form of getters and setters that take an additional parameter, written as

-        the first parameter, which is the index in the array of an item to get or set.</para>

-      

-    </section>

-    

-    <section id="ugr.ref.jcas.obtaining_refs_to_indexes">

-      <title>Obtaining references to Indexes</title>

-      

-      <para>The only way to access instances (not otherwise referenced from other

-        instances) passed in to your annotator in its JCas is to use an iterator over some

-        index. Indexes in the CAS are specified in the annotator descriptor. Indexes have a

-        name; text annotators have a built-in, standard index over all annotations.</para>

-      

-      <para>To get an index, first get the JFSIndexRepository from the JCas using the method

-        jcas.getJFSIndexRepository(). Here are the calls to get indexes:</para>

-      

-      

-      <programlisting>JFSIndexRepository ir = jcas.getJFSIndexRepository();

-

-ir.getIndex(name-of-index) // get the index by its name, a string

-ir.getIndex(name-of-index, Foo.type) // filtered by specific type

-

-ir.getAnnotationIndex()      // get AnnotationIndex

-jcas.getAnnotationIndex()    // get directly from jcas

-ir.getAnnotationIndex(Foo.type)      // filtered by specific type

-jcas.getAnnotationIndex(Foo.class)   // better</programlisting>

-      

-      <para>For convenience, the getAnnotationIndex method is available directly on the JCas object

-      instance; the implementation merely forwards to the associated index repository.</para>

-      

-      <para>Filtering types have to be a subtype of the type specified for this index in its

-        index specification. They can be written as either Foo.type or if you have an instance

-        of Foo, you can write</para>

-      

-      <programlisting>fooInstance.getClass()</programlisting>

-      

-      <para>Foo is (of course) an example of the name of the type.</para>

-      

-    </section>

-    <section id="ugr.ref.jcas.adding_removing_instances_to_indexes">

-      <title>Adding (and removing) instances to (from) indexes</title>

-      <titleabbrev>Updating Indexes</titleabbrev>

-      

-      <para>CAS indexes are maintained automatically by the CAS. But you must add any

-        instances of feature structures you want the index to find, to the indexes by using the

-        call:</para>

-      

-      <programlisting>myInstance.addToIndexes();</programlisting>

-      

-      <para>Do this after setting all features in the instance <emphasis role="bold-italic">which could be used in indexing</emphasis>, 

-        for example, in determining the sorting order.   

-        See <xref linkend="ugr.ref.cas.updating_indexed_feature_structures"/> for details

-        on updating indexed feature structures.

-      </para>

-        

-      <para>When writing a Multi-View component, you may need to index instances in multiple

-        CAS views. The methods above use the indexes associated with the current JCas object.

-        There is a variation of the <literal>addToIndexes / removeFromIndexes</literal> methods which

-        takes one argument: a reference to a JCas object holding the view in which you want to 

-        index this instance.

-        <programlisting>myInstance.addToIndexes(anotherJCas)

-myInstance.removeFromIndexes(anotherJCas)</programlisting>

-      </para>

-      

-      <para>

-        You can also explicitly add instances to other views using the addFsToIndexes method on

-        other JCas (or CAS) objects. For instance, if you had 2 other CAS views (myView1 and

-        myView2), in which you wanted to index myInstance, you could write:</para>

-      

-      <programlisting>myInstance.addToIndexes(); // index in the view myInstance was created in

-myView1.addFsToIndexes(myInstance); // index myInstance in myView1

-myView2.addFsToIndexes(myInstance); // index myInstance in myView2</programlisting>

-      

-      <para>

-        The rules for determining which index to use with a particular JCas object are designed to

-        behave the way most would think they should; if you need specific behavior, you can always 

-        explicitly designate which view the index adding and removing operations should work on.

-      </para>

-      

-      <para>

-        The rules are:

-        If the instance is a subtype of AnnotationBase, then the view is the view associated with the 

-        annotation as specified in the feature holding the view reference in AnnotationBase.

-        Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the 

-        instance's constructor.

-        Otherwise, if the instance was created by getting a feature value from some other instance, whose range

-        type is a feature structure, then the view is the same as the referring instance.

-        Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index,

-        then it is the view associated with the index.

-      </para>

-      

-      <para>As of release 2.4.1, there are two efficient bulk-remove methods to remove all instances of a given type, 

-      or all instances of a given type and its subtypes.

-        These are invoked on an instance of an IndexRepository,

-      for a particular view.  For example, to remove all instances of Token from a particular JCas instance:

-            </para> 

-       <programlisting>jcas.removeAllIncludingSubtypes(Token.type);  // or

-jcas.removeAllIncludingSubtypes(aTokenInstance.getTypeIndexID());  // or

-jcas.getFSIndexRepository().

-       removeAllIncludingSubtypes(jcas.getCasType(Token.type));

-</programlisting>

-

-    </section>

-    

-    <section id="ugr.ref.jcas.using_iterators">

-      <title>Using Iterators</title>

-      

-      <para>This section describes obtaining and using iterators.  However, it is recommended that instead 

-        you use the select framework, described in a chapter in the version 3 user's guide.</para>

-        

-      <para>Once you have an index obtained from the JCas, you can get an iterator from the

-        index; here is an example:</para>

-      

-      

-      <programlisting>FSIndexRepository ir = jcas.getFSIndexRepository();

-FSIndex myIndex = ir.getIndex("myIndexName");

-FSIterator myIterator = myIndex.iterator();

-// Alternatively, using the JCas index repository with a type filter:

-JFSIndexRepository ir = jcas.getJFSIndexRepository();

-FSIndex myIndex = ir.getIndex("myIndexName", Foo.type); // filtered

-FSIterator myIterator = myIndex.iterator();</programlisting>

-      

-      <para>Iterators work like normal Java iterators, but are augmented to support

-        additional capabilities. Iterators are described in the CAS Reference, <olink

-          targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.cas.indexes_and_iterators"/>.</para>

-      

-    </section>

-    

-    <section id="ugr.ref.jcas.class_loaders">

-      <title>Class Loaders in UIMA</title>

-      

-      <para>The basic concept of a UIMA application includes assembling engines into a flow.

-        The application made up of these Engines is run within the UIMA Framework, either by

-        the Collection Processing Manager, or by using more basic UIMA Framework

-        APIs.</para>

-      

-      <para>The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the

-        capability to load multiple applications, in a way where each one is isolated from the

-        others, by using a separate class loader for each application. For instance, one set

-        of UIMA Framework Classes could be shared by multiple sets of application-specific

-        classes, even if these application-specific classes had the same names but were

-        different versions.</para>

-      

-      <section id="ugr.ref.jcas.class_loaders.optional">

-        <title>Use of Class Loaders is optional</title>

-        

-        <para>The UIMA framework will use a specific ClassLoader, based on how

-          ResourceManager instances are used. Specific ClassLoaders are only created if

-          you specify an ExtensionClassPath as part of the ResourceManager. If you do not

-          need to support multiple applications within one UIMA framework within a JVM,

-          don&apos;t specify an ExtensionClassPath; in this case, the classloader used

-          will be the one used to load the UIMA framework - usually the overall application

-          class loader.</para>

-        

-        <para>Of course, you should not run multiple UIMA applications together, in this

-          way, if they have different class definitions for the same class name. This

-          includes the JCas <quote>cover</quote> classes. This case might arise, for

-          instance, if both applications extended

-          <literal>uima.tcas.DocumentAnnotation</literal> in differing,

-          incompatible ways. Each application would need its own definition of this class,

-          but only one could be loaded (unless you specify ExtensionClassPath in the

-          ResourceManager which will cause the UIMA application to load its private

-          versions of its classes, from its classpath).</para>

-      </section>

-    </section>

-    

-    <section id="ugr.ref.jcas.accessing_jcas_objects_outside_uima_components">

-      <title>Issues accessing JCas objects outside of UIMA Engine Components</title>

-      

-      <para>If you are using the ExtensionClassPaths, the JCas cover classes are loaded

-        under a class loader created by the ResourceManager part of the UIMA Framework.

-        If you reference the same JCas

-        classes outside of any UIMA component, for instance, in top level application code,

-        the JCas classes used by that top level application code also must be in the class path

-        for the application code.</para>

-      

-      <para>Alternatively, you could do all the JCas processing inside a UIMA component (and do no

-        processing using JCas outside of the UIMA pipeline).</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.ref.jcas.setting_up_classpath">

-    <title>Setting up Classpath for JCas</title>

-    

-    <para>The JCas Java classes generated by JCasGen are typically compiled and put into a JAR

-      file, which, in turn, is put into the application&apos;s class path.</para>

-    

-    <para>This JAR file must be generated from the application&apos;s merged type system.

-      This is most conveniently done by opening the top level descriptor used by the

-      application in the Component Descriptor Editor tool, and pressing the Run-JCasGen

-      button on the Type System Definition page.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.jcas.pear_support">

-    <title>PEAR isolation</title>

-    <para>

-      As of version 2.2, the framework supports component descriptors which are PEAR descriptors. 

-      These descriptors define components plus include information on the class path needed to 

-      run them.  The framework uses the class path information to set up a localized class path, just

-      for code running within the PEAR context.  This allows PEAR files requiring different 

-      versions of common code to work well together, even if classes in the different versions

-      have the same names. 

-    </para>

-    

-    <para>The mechanism used to switch the class loaders when entering a PEAR-packaged annotator in

-    a flow depends on the framework knowing if JCas is being used within that annotator code.  The

-    framework will know this if the particular view being passed has had a previous call to 

-    getJCas(), or if the particular annotator is marked as a JCas-using one (by having it extend the

-    class <code>JCasAnnotator_ImplBase</code>).</para>

-    

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.json.xml b/uima-docbook-references/src/docbook/ref.json.xml
deleted file mode 100644
index 3bcf68b..0000000
--- a/uima-docbook-references/src/docbook/ref.json.xml
+++ /dev/null
@@ -1,682 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.json/">

-<!ENTITY tp "ugr.ref.json.">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.json">

-  <title>JSON Serialization of CASs and UIMA Description objects</title>

-  <titleabbrev>JSON support</titleabbrev>

-

-  <section id="ugr.ref.json.overview">

-    <title>JSON serialization support overview</title>

-     

-    <para>Applications are moving to the "cloud", and new applications are being rapidly developed that are hooking

-    things up using various mashup techniques.  New standards and conventions are emerging to support this kind

-    of application development, such as REST services.  

-    JSON is now a popular way for services to communicate; 

-    as of 2014 its popularity is rising while that of XML is falling.</para>

-    

-    <para>Starting with version 2.7.0, JSON style serialization (but not (yet) deserialization) 

-    for CASs and UIMA descriptions is supported. 

-    The exact format of the serialization is configurable in several aspects.  

-    The implementation is built on top of the Jackson JSON generation library.   

-    </para>

-    

-    <para>The next section discusses serialization for CASes, while a later section describes serialization

-    of description objects, such as type system descriptions.</para>

-  </section>

-     

-  <section id="ug.ref.json.cas">

-    <title>JSON CAS Serialization</title>

-    

-    <para>CASs primarily consist of collections of Feature Structures (FSs). Similar to XMI serialization, JSON

-    serialization skips serializing unreachable FSs, outputting only those FSs that are found in the indexes (these are called

-    <emphasis>roots</emphasis>), plus all of  

-    the FSs that are referenced via some chain of references, from the roots.  

-    </para>

-    

-    <para>To support the kinds of things users do with FSs, 

-    the serialized form may be augmented to include additional information beyond the FSs.</para>

-    <para>For traditional UIMA implementations, the serialized formats mostly assumed that the receivers had access to

-    a type system description, which specified details of the types of each feature value.  For JSON serialization,

-    some of this information can be included directly in the serialization.</para>

-        

-    <para>This abbreviated type system information is one kind of additional information that can be included; 

-    here's a summary list of the various kinds of additional information you can add to the serialization:</para>

-    <itemizedlist>

-      <listitem>

-        <para>having a way to identify which fields in a FS should be treated as references to other FSs, or

-        as representing serialized binary data from UIMA byte arrays.</para>

-      </listitem>

-      <listitem>

-        <para>something like XML namespaces to allow the use of short type names in the serialization while handling name

-        collisions</para>

-      </listitem>

-      <listitem>

-        <para>enough of the UIMA type hierarchy to allow the common operation of iterating over a type together 

-        with all of its subtypes</para>

-      </listitem>

-      <listitem><para>A way to identify which FSs were "added-to-the-indexes" (separately, per CAS View) 

-      and therefore serve as roots when 

-      iterating over types.</para>

-      </listitem>

-      <listitem><para>An identification of the associated type system definition</para></listitem>

-    </itemizedlist>

-    

-    <para>Simple JSON serialization does not have a convention for supporting these, but many extensions do.

-    We borrow some of the concepts in the JSON-LD (linked data) standard in providing this 

-    additional information.</para>

-    

-    <section id="ug.ref.json.cas.bigpic">

-      <title>The Big Picture</title>

-

-	    <para>CAS JSON serialization consists of several parts: an optional _context, the set of Feature Structures,

-	    and (if doing a delta serialization) information about changes to what was indexed.</para>

-	    

-	    <figure id="ug.ref.json.fig.bigpic">

-	      <title>The major sections of JSON serialization</title>

-		    <mediaobject>

-	        <imageobject>

-	          <imagedata width="3.5in" format="PNG" fileref="&imgroot;big_picture2.png"/>

-	        </imageobject>

-	        <textobject><phrase>The big picture showing the parts of serialization, with the _context optional.</phrase>

-	        </textobject>

-	      </mediaobject>

-      </figure>

-	       

-    <para>The serializer can be configured to omit

-    the _context or parts of the _context for cases where that information isn't needed.  The index changes

-    information is only included if Delta CAS serialization is specified.  Note that Delta CAS support

-    is incomplete; so this information is just for planning purposes.</para>

-    </section>

-    

-    <section id="ug.ref.json.cas.context">

-      <title>The _context section</title>

-          <para>The _context section has entries for each used type as well as some special additional entries. 

-          Each entry for a type has multiple sub-entries, identified

-          by a key-name.  Each sub-entry can be selectively omitted if not needed.

-          

-          

-          <itemizedlist>

-            <listitem><para><emphasis role="bold">_type_system</emphasis> - a URI of the type system information</para></listitem>

-            <listitem><para><emphasis role="bold">_types</emphasis> - information about each used type

-              <itemizedlist>

-		            <listitem><para><emphasis role="bold">_id</emphasis> - the type's fully qualified UIMA type name</para></listitem>

-		            <listitem><para><emphasis role="bold">_feature_types</emphasis> - a map from features of this type to 

-		                                information about the type of the value of the feature</para></listitem>

-		            <listitem><para><emphasis role="bold">_subtypes</emphasis> - an array of used subtype short-names</para></listitem>

-              </itemizedlist>

-            </para></listitem>

-          </itemizedlist>

-          </para>

-          

-			    

-			    <para>Here's an example:</para>

-			    <informalexample>  <!-- does a keep-together -->

-			    <?dbfo keep-together="always" ?>

-          <programlisting>"_context" : {

-  "_type_system" : "URI to the type system information",

-  "_types" : {

-    "A_Typical_User_or_built_in_Type" : {

-      "_id" : "org.apache.uima.test.A_Typical_User_or_built_in_Type", 

-      "_feature_types" : {

-           "sofa"         : "_ref", 

-           "aFS"          : "_ref", 

-           "an_array"     : "_array",

-           "a_byte_array" : "_byte_array"},

-      "_subtypes" : [ "subtype1", "subtype2", ... ] }, 

-    "Sofa" : {

-      "_id" : "uima.cas.Sofa", 

-      "_feature_types" : {"sofaArray" : "_ref"} }

-  }

-}</programlisting></informalexample>

-

-      <para>The <emphasis role="bold">_type_system</emphasis> is an optional URI that references a UIMA type system description that

-      defines the types for the CAS being serialized.</para>

-      

-      <para>In the <emphasis role="bold">_types</emphasis> section, the key (e.g. "Sofa" or "A_Typical_User_or_built_in_Type") is the "short" name 

-      for the type used in the serialization.  

-      It is either just

-      the last segment of the full type name (e.g. for the type x.y.z.TypeName, it's TypeName), or, 

-      if using just the last segment would collide with another type name

-      (example:  some.package.cname.Foo and some.other.package.cname.Foo), the key is made up of

-      the next-to-last segment, with an optional suffixed incrementing integer in case of collisions on that name,

-      a colon (:), and then the last segment.</para>

-      

-      <blockquote><para>In this example, since the next to last segment of both names is

-      "cname", one namespace name would be "cname", and the other would be "cname1".  The keys in this case would be

-      cname:Foo and cname1:Foo.</para></blockquote>
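
-      <para>As a sketch of this naming rule, the _types entries for the two colliding types could look roughly 

-      like the following.  Which of the two types receives the suffixed name is an assumption made here for illustration,

-      and the other sub-entries are left out:</para>

-      <informalexample>

-      <programlisting>"_types" : {

-  "cname:Foo"  : { "_id" : "some.package.cname.Foo" },

-  "cname1:Foo" : { "_id" : "some.other.package.cname.Foo" }

-}</programlisting></informalexample>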

-      

-      <para>The value of the _id is the fully qualified name of the type.</para>

-      

-      <para>The <emphasis role="bold">_feature_types</emphasis> values of _ref, _array, and _byte_array indicate the corresponding values 

-      of the named features need special handling 

-      when deserialized.

-      <itemizedlist>

-        <listitem><para><emphasis role="bold">_ref</emphasis> - used when features are serialized as numbers, but they are to be

-      interpreted as references to other FSs whose <code>id</code> is the number.  UIMA lists and arrays of 

-      FSs are marked with _ref; if the value is a JSON array, the elements of the array will be either

-      numbers (to be interpreted as references), or embedded serializations of FSs.</para></listitem>

-      

-        <listitem><para><emphasis role="bold">_array</emphasis> - used when features are serialized as JSON 

-        arrays containing embedded values, 

-      unless the corresponding UIMA object has

-      multiple references, in which case it is serialized as a FS reference which looks like a single number.

-      If a feature is marked with _array, then a non-array, single number should be interpreted as the

-      <code>id</code> of the feature structure that is the array or the first element of the list of items.

-      This designation is used for both UIMA arrays and lists.</para>

-      

-      <para>This designation is for arrays and lists of primitive values, except for byte arrays.  

-      In the case of FS arrays and lists, the _ref designation is used instead of this to indicate that the 

-      resulting values in a JSON array that look like numbers should be interpreted as references.</para></listitem>

-      

-      <listitem><para><emphasis role="bold">_byte_array</emphasis> - _byte_array features are serialized as numbers (if they are a 

-      reference to a separate object), or as strings (if embedded).  The strings are to be decoded into

-      binary byte arrays using the Base64 encoding (the standard one used by Jackson to serialize binary data).</para></listitem>

-      </itemizedlist>  

-      </para>

-      

-      <para>

-      Note that single element arrays are <emphasis>not</emphasis> unwrapped, as in some other JSON serializations, to enable distinguishing

-      references to arrays from embedded arrays.

-      </para>
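
-      <para>For illustration only, a Feature Structure that uses the feature names from the _context example above 

-      might be serialized as shown here.  The values are made up, and the comments are explanatory and not part of the JSON.

-      By contrast, a value of <code>"an_array" : 7</code> would be read as a reference to a separately 

-      serialized array whose <code>id</code> is 7.</para>

-      <informalexample>

-      <programlisting>{ "sofa"         : 12,      // _ref: the id of another FS

-  "aFS"          : 123,     // _ref: the id of another FS

-  "an_array"     : [ 7 ],   // _array: an embedded one-element array, not unwrapped

-  "a_byte_array" : "AQID"   // _byte_array: Base64 encoding of the bytes 1, 2, 3

-}</programlisting></informalexample>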

-

-      <para><emphasis role="bold">_subtypes</emphasis> are a list of the type's used subtypes.  A type is <emphasis>used</emphasis>

-       if it is the type of a Feature Structure

-      being serialized,

-      or if it is in the supertype chain of some Feature Structure which is serialized.  If a type has no

-      used subtypes, this element is omitted.

-      The names are represented as the "short" name.  Users typically use this information

-      to construct support for iterators over a type which includes all of its subtypes.</para>

-

-     

-      

-      <!-- 

-      <para>_supertypes are a list of the type's supertypes, in the natural nearest to farthest order.

-      The list is truncated as soon as a type is mentioned which already (in a previous _supertypes)

-      has had its supertypes enumerated.</para> -->

-      

-      <section id="ug.ref.json.cas.context.omit">

-          <title>Omitting parts of the _context section</title>

-          <para>It is possible to selectively omit some of the 

-          _context sections (or the entire _context), via configuration.  

-          Here's an example:</para>

-

-          <informalexample>  <!-- does a keep-together -->

-          

-          <programlisting>// make a new instance to hold the serialization configuration           

-JsonCasSerializer jcs = new JsonCasSerializer();  

-// Omit the expanded type names information

-jcs.setJsonContext(JsonContextFormat.omitExpandedTypeNames);</programlisting></informalexample>

-         

-          <para>See the Javadocs for <code>JsonContextFormat</code> for how to specify the parts.</para>

-      </section>

-      

-    </section>  <!-- of context -->

-    

-    <section id="ug.ref.json.cas.featurestructures">

-      <title>Serializing Feature Structures</title>

-      

-    <para>Feature Structures themselves are represented as JSON objects consisting of field-value pairs, where the 

-    fields correspond to UIMA Features, and the values are the values of the features. 

-    </para>

-    

-    <para>The various kinds of values for a UIMA feature are represented by their natural JSON counterpart.

-    UIMA primitive boolean values are represented by JSON true and false literals. UIMA Strings are 

-    represented as JSON strings.  Numbers are represented by JSON numbers. 

-    Byte Arrays are represented by the Jackson standard binary encoding (base64 encoding), written as JSON strings. 

-    References to other Feature Structures are also represented as JSON integer numbers, the values of which are 

-    interpreted as ids of the referred-to

-    FSs.  These ids are treated in the same manner as the xmi:ids of XMI Serialization.  Arrays and Lists when

-    embedded (see following section) are represented as JSON arrays using the [] notation.</para>

-    

-    <para>Besides the feature values defined for a Feature Structure, an additional special feature

-    may be serialized:  _type.  

-    The _type is the type name, written using the short format.  This is automatically included when the type cannot 

-    easily be

-    inferred from other contextual information. 

-    </para>

-    

-    <para>Here's an example, with some comments which, since JSON doesn't support comments, are just here for explanation:</para>

-              <informalexample>  <!-- does a keep-together -->

-    <programlisting>{ "_type" : "Annotation", // _type may be omitted

-  "feat1" : true,   // boolean value represented as true or false

-  "feat2" : 123,    // could be a number or a reference to FS with id 123

-  "feat3" : "b3axgh" // could be a string or a base64 encoded byte array

-}</programlisting></informalexample>    

-        

-    

-    <section id="ug.ref.json.cas.featurestructures.embedding">

-      <title>Embedding normally referenced values</title>

-      

-      <para>Consider a FS which has a feature that refers to another FS.  This can be serialized in one of two ways:</para>

-      <itemizedlist spacing="compact">

-        <listitem><para>The value of the feature can be coded as an <code>id</code> (a number), where the number is the <code>id</code> of the

-        referred-to FS.</para></listitem>

-        <listitem><para>The value of the feature can be coded as the serialization of the referred-to FS.</para></listitem>

-      </itemizedlist>

-      

-      <para>

-      This second way of encoding is often done by JSON style serializations, and is called "embedding".  Referred-to 

-      FSs may be embedded if there are no other references to the embedded FS.  Multiple references may arise due to

-      having a FS referenced as a "root" in some CAS View, or being used as a value in a FS feature.</para>

-      

-      <para>Following the XMI conventions, UIMA arrays and lists which are 

-      identified as singly referenced by either the static or dynamic method (see below) are embedded

-      directly as the value of a feature.  In this case, the JSON serialization writes out the value of the feature

-      as a JSON array.  Otherwise, the value is written out as a FS reference, and a separate serialization occurs of 

-      the list elements or the array.</para>

-      

-      <para>In addition to arrays and lists, FSs which are identified as singly referenced from another FS are

-      serialized as the embedded value of the referring feature.  

-      This is also done (when using the dynamic method) for singly referenced rooted instances.

-      </para>

-      <para>  

-      If a FS is multiply referenced, the serialization in these

-      cases is just the numeric value of the <code>id</code> of the FS.</para>
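
-      <para>The two encodings can be sketched as follows, using a hypothetical type <code>Relation</code> with a 

-      hypothetical feature <code>arg</code>; the comments are explanatory and not part of the JSON, and any _type entry 

-      for the embedded FS is left out:</para>

-      <informalexample>

-      <programlisting>// embedded: the referred-to FS has no other references

-{ "_type" : "Relation",

-  "arg"   : { "begin" : 0, "end" : 5 } }

-

-// by reference: the referred-to FS is multiply referenced and has the id 123

-{ "_type" : "Relation",

-  "arg"   : 123 }</programlisting></informalexample>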

-      </section>

-      

-      <section id="ug.ref.json.cas.featurestructures.dynamicstatic">

-        <title>Dynamic vs Static multiple-references and embedding</title>

-      

-      <para>There are two methods of determining if a particular FS or list or array can be embedded.

-      

-      <itemizedlist>

-        <listitem><para><emphasis role="bold">dynamic</emphasis> - calculates at serialization time whether or not there

-        are multiple references to a given FS.</para></listitem>

-        <listitem><para><emphasis role="bold">static</emphasis> - looks in the type system definition to see if 

-        the feature is marked with &lt;multipleReferencesAllowed&gt;.  

-        <itemizedlist spacing="compact">

-          <listitem><para><code>multipleReferencesAllowed</code> false &rarr; use the embedded style</para></listitem>

-          <listitem><para><code>multipleReferencesAllowed</code> true &rarr; use separate objects</para></listitem>

-        </itemizedlist>

-        Note that since this flag is not available for 

-        references to FSs from View indexes, any FS that is indexed in any view is considered (if using static mode)

-        to be multipleReferencesAllowed. 

-        </para></listitem>

-      </itemizedlist>

-      </para>

-       

-      <para>Delta serialization only supports the static method; this mode is forced on if delta serialization

-      is specified.</para>

-       

-      <para>Dynamic embedding is enabled by default for JSON, but may be disabled via configuration.</para>

-    </section>

-    

-    <section id="ug.ref.json.cas.featurestructures.embeddedArraysLists">

-      <title>Embedded Arrays and Lists</title>

-    

-    <para>When static embedding is being used, a case can arise where some feature is marked to have only 

-    singly referenced FS values, but that value may actually be multiply referenced.  This is detected during 

-    serialization, and a message is issued if an error handler has been specified to the serializer.

-    The serialization continues, however.  In the case of an Array, the value of the array is embedded

-    in the serialization and the fact that these were referring to the same object is lost.

-    In the case of a list, if any element in the list

-    has multiple references (for example,  if the list has back-references, loops, etc.), 

-    the serialization of the list is truncated at the point where the multiple reference

-    occurs.</para>

-    

-    <blockquote><para>Note that you can correctly serialize arbitrarily linked complex list structures created 

-    using the built-in list types only if you use dynamic embedding, or 

-    specify <code>multipleReferencesAllowed</code> = true.</para></blockquote> 

-    

-    

-    <para>Embedded list or array values are both serialized using the JSON array notation; as a result, these

-    alternative representations are not distinguished in the JSON serialization.</para>

-    </section>  <!-- end of embedded Arraylists-->

-    

-    <section id="ug.ref.json.cas.featurestructures.null">

-      <title>Omitting null values</title>

-    

-      <para>Following the conventions established in XMI serialization, features with <code>null</code> values have their

-      key-value pairs omitted from the FS serialization when the type of the feature value is:

-      </para>

-    <itemizedlist spacing="compact">

-      <listitem>

-        <para>a Feature Structure Reference</para>

-      </listitem>

-      <listitem>

-        <para>a String (whose value is <code>null</code>, not "" (a 0-length String))</para>

-      </listitem>

-            <listitem>

-        <para>an embedded Array or List (where the entire array and/or list is <code>null</code>)</para>

-      </listitem>

-    </itemizedlist>

-    

-    <note><para>Inside arrays or lists of FSs, references which are being serialized

-    as references have a <code>null</code> reference coded as the number 0; references which are embedded are serialized as 

-    <code>null</code>.</para></note>

-    

-    <para>Configuring the serializer with <code>setOmit0Values(true)</code> causes

-    additional primitive features (byte/short/int/long/float/double) to be omitted when their values are 0 or 0.0.</para>
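
-    <para>As an illustration, consider a FS of a hypothetical type whose features have the values 

-    aRef = <code>null</code>, aString = <code>null</code>, emptyString = "" and count = 0.  Under the rules above, 

-    it would be expected to serialize roughly as follows (any _type entry is left out, and the comments are 

-    explanatory and not part of the JSON):</para>

-    <informalexample>

-    <programlisting>// default configuration: the null reference and the null String are omitted

-{ "emptyString" : "", "count" : 0 }

-

-// with setOmit0Values(true): the 0-valued numeric feature is omitted as well

-{ "emptyString" : "" }</programlisting></informalexample>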

-        

-    </section>

-        

-    </section>

-    

-    </section>

-        

-    <section id="ug.ref.json.cas.featurestructures.organization">

-      <title>Organizing the Feature Structures</title>

-    

-    <para>The set of all FSs being serialized is divided into two parts.  The first part represents

-    all FSs that are root FSs, in that they were in one or more indexes at the time of serialization.  The second part

-    represents feature structures that are multiply referenced, or are referenced via a chain of references from the

-    root FSs.  The same feature structure can appear in both lists.  The elements in the second part are actual 

-    serialized FSs, whereas the elements in the first part are either references to the corresponding FSs in the

-    second part, if they exist, or the actual embedded serialized FSs.  Actual embedded serialized FSs only

-    exist once in the two parts.</para>

-    

-              <informalexample>  <!-- does a keep-together -->

-    <programlisting>"_views" : {

-  "_InitialView" : {

-     "theFirstType" : [  { ... fs1 ...}, 123, 456, { ... fsn ...} ],

-     "anotherType"  : [  { ... fs1 ...}, ... { ... fsn ...} ]

-      ...     // more types which have roots in view "12"

-         },

-  "AnotherView" : {

-     "theFirstType" : [  { ... fsv1 ...}, 123, { ... fsvn ...} ],

-     "anotherType"  : [  { ... fsv1 ...}, ... { ... fsvn ...} ]

-      ...     // more types which have roots in view "25"

-         },

-   ...        // more views         

-}, 

-

-"_referenced_fss" : {

-  "12" : {"_type" : "Sofa",  "sofaNum" : 1,  "sofaID" : "_InitialView" },

-  "25" : {"_type" : "Sofa",  "sofaNum" : 2,  "sofaID" : "AnotherView" },

-  

-  "123" : { ... fs-123 ... },

-  "456" : { ... fs-456 ... },

-  ...

-}</programlisting></informalexample>

-    

-    <para>The first part map is made up of multiple maps, one for each separate CAS View.

-    The outer map is keyed by the <code>id</code> of the corresponding SofaFS (or 0, if there is no corresponding SofaFS).

-    For each view, the value is a map whose key is a used Type, and the values are an array of instances

-    of FSs of that type which were found in some index; these are the "root" FSs.  Only root instances

-    of a particular type are included in this array.  

-    </para>

-    

-        

-    <para>The second part map has keys which are the <code>id</code> value of the FSs, and values which are 

-    a map of key-value pairs corresponding to the feature-values of that FS.

-    In this case, the _type extra feature is added to record the type.</para>

-    

-    <!--       

-    <para>

-    FSs can be referenced either from being the value of some Feature in an FS, or 

-    by being in one or more indexes at the time of serialization. 

-    If there are more than 1 of these kinds of references, then the FS is

-    multiply referenced.  All FSs that are multiply referenced are not "embedded" at their reference points;

-    instead each reference point has a value which is the <code>id</code> of that FS, and which can be used as the "key"

-    in the _referenced_fss map to obtain the actual serialized form of the FS.</para>

-     -->

-    

-    <para>The _views map, keyed by view and type name, has all the FSs (as a JSON array) for that type that were in

-    one or more indexes in any View.  If a FS in this array is not multiply referenced (using dynamic mode), 

-    then it is embedded here. Otherwise, only the reference (a simple number representing the <code>id</code> of that FS) is serialized for that FS.</para>

-    

-    <!-- 

-    <para>Here's a picture of what the _types map would look like with two views, with Sofa ids of 22 and 99.</para>

-    

-    <figure>

-      <title>Feature Structure Organization</title>

-        <mediaobject>

-          <imageobject>

-            <imagedata width="2.5in" format="PNG" fileref="&imgroot;multi.view.png"/>

-          </imageobject>

-          <textobject><phrase>Multi-view serialized Feature Structures</phrase></textobject>

-        </mediaobject>

-    </figure>   

-          -->

-               

-    </section>  <!-- of representing the collection of FSs -->

-

-    <!--  

-    <section id="ug.ref.json.cas.casviews">

-      <title>CAS Views</title>

-      <para>The last section in a JSON CAS serialization is the _cas_views.  

-      This contains for each view, an array of IDs of FSs that the serializing application added to its indexes.  

-        These arrays are stored in a map, with the key being the <code>id</code> for the Sofa FS associated with the view, or

-        "0" for the edge case where no Sofa has (yet) been created for the initial view (this is an edge case that

-        normally will not occur). For

-        delta-cas serialization (where only changes are being serialized), this array is replaced with a map 

-        of 3 keys:  "added-members", "deleted-members", and "reindexed-members", 

-        the values of which are arrays of IDs of FSs which were added, removed, or reindexed by the application

-        doing the serializing.</para>

-        

-      <para>In some cases, the receiving application may only want to index FSs 

-      that the serializing application had indexed, but otherwise, this information is not of much use 

-      to that application, as that application would likely have its own indexing needs.  .</para>

-    

-    </section> <!- - end of cas views - ->

-     -->

-    <!--  

-    <section id="ug.ref.json.cas.xmidiff">

-      <title>Differences with XMI serialization</title>

-      

-      <para>JSON serialization shares the same implementation core with XMI serialization,

-      except of course for serializing in JSON formats vs XMI/XML formats.

-      One area where it differs is in the treatment of so-called out-of-type-system data.</para>

-      

-      <para>XMI deserialization can be specified with a "lenient" flag, which allows the incoming data to 

-    include types and features which are not present in the type system being deserialized into. These 

-    data are called "out-of-type-system" data (oots).  The XMI serialization merges the oots data back into the

-    the output serialization (if not doing a delta-cas serialization), thus preserving types and features in the

-    serialization it doesn't have definitions for.</para>

-    

-    <para>

-    JSON serialization doesn't support this, mainly because there's no type information available for the 

-    oots data, and the JSON _context information for these types can't be generated.</para>

-     -->      

-        

-

-     

-    <section id="ug.ref.json.cas.features">

-      <title>Additional JSON CAS Serialization features</title>

-

-    <para>JSON serialization also supports several additional features, including:</para>

-    <itemizedlist>

-      <!-- 

-      <listitem>

-        <para>)Delta CAS: only serializes changes since the mark was set</para>

-      </listitem>

-       -->

-      <listitem>

-        <para>Type and feature filtering: only types and features that exist in a specified type system description 

-        are serialized.</para>

-      </listitem>

-      <listitem>

-        <para>An ErrorHandler; this will be called in various error situations, including when 

-        serializing in static mode an array or list value for a feature marked <code>multipleReferencesAllowed = false</code>

-        is found to have multiple references.</para>

-      </listitem>

-      <listitem>

-        <para>A switch to control omitting of numeric features that have 0 values (default is to include these).

-        See the <code>setOmit0Values(true_or_false)</code> method in JsonCasSerializer.</para>

-      </listitem>

-      <listitem>

-        <para>A pretty-printing flag (default is not to do pretty-printing).</para>

-      </listitem>

-    </itemizedlist>

-    <para>See the Javadocs for JsonCasSerializer for details.</para>

-    

-    <section id="ugr.ref.json.delta">

-      <title>Delta CAS</title>

-      

-      <note><para>Delta CAS support is incomplete, and is not supported as of release 2.7.0, but may

-      be supported in later releases.  The information here is just for planning purposes.</para></note>

-      

-      <para><emphasis role="bold">_delta_cas</emphasis> is present only when a delta CAS serialization is being performed.  

-      This serializes just the 

-      changes in the CAS since a Mark was set; so for cases where a large CAS is deserialized into a service, which

-      then does a relatively small amount of additions and modifications, only those changes are serialized.

-      The values of the keys are arrays of the ids of FSs that were added to the indexes, 

-      removed from the indexes, or reindexed.</para>

-      

-      <para>This mode requires the static embeddability mode.  When specified, a <code>_delta_cas</code> key-value 

-      is added to the serialization at the end, 

-      which lists the FSs (by <code>id</code>) that were added, removed, or reindexed, since the mark was set.  

-      Additional extra information, created when the CAS was previously deserialized and the mark set, 

-      must be passed to the serializer, in the form of an instance of <code>XmiSerializationSharedData</code>,

-      or JsonSerializationSharedData (not yet defined as of release 2.7.0).</para>

-      

-      <para>Here's what the last part of the serialization looks like, when Delta CAS is specified:

-                <informalexample>  <!-- does a keep-together -->

-      <programlisting>"_delta_cas" : {

-  "added_members" : [  123, ... ],

-  "deleted_members" : [  456, ... ],

-  "reindexed_members" : [] }</programlisting></informalexample>

-      </para>

-      

-      

-    </section>

-  </section>

-

-  

-  <section id="ugr.ref.json.usage">

-    <title>Using JSON CAS serialization</title>

-    

-    <para>The support is built on top of the Jackson JSON serialization

-    package.  We follow Jackson conventions for configuring.</para>

-    

-    <para>The serialization APIs are in the JsonCasSerializer class.</para>

-    

-    <para>Although there are some static short-cut methods for common use cases, the basic operations needed

-    to serialize a CAS as JSON are:</para>

-    

-    <itemizedlist>

-      <listitem>

-        <para>Make an instance of the <code>JsonCasSerializer</code> class.  This will serve to collect configuration information.</para>

-      </listitem>

-      <listitem>

-        <para>Do any additional configuration needed.  See the Javadocs for details.  

-        The following objects can be configured:</para>

-        <itemizedlist>

-          <listitem>

-            <para>The <code>JsonCasSerializer</code> object: here you can specify the kind of JSON formatting, what to serialize,

-            whether or not delta serialization is wanted, prettyprinting, and more.</para>

-          </listitem>

-          <listitem>

-            <para>The underlying <code>JsonFactory</code> object from Jackson.  Normally, you won't need to configure this.

-            If you do, you can create your own instance of this object and configure it and use it in the

-            serialization.</para>

-          </listitem>

-          <listitem>

-            <para>The underlying <code>JsonGenerator</code> from Jackson. Normally, you won't need to configure this.

-            If you do, you can get the instance the serializer will be using and configure that.</para>

-          </listitem>

-        </itemizedlist>

-      </listitem>

-      <listitem>

-        <para>Once all the configuration is done, call serialize(...) on this instance; 

-        it creates a one-time-use

-        inner class instance where the actual serialization is done.  The serialize(...) method is thread-safe, in that the same 

-        JsonCasSerializer instance (after it has been configured) can kick off multiple 

-        (identically configured) serializations 

-        on different threads at the same time.</para>

-        <para>The serialize call follows the Jackson conventions, taking one of 3 specifications of where to serialize to:

-        a Writer, an OutputStream, or a File.</para>

-      </listitem>

-    </itemizedlist>

-    

-    <para>Here's an example:</para>

-              <informalexample>  <!-- does a keep-together -->

-    <programlisting>JsonCasSerializer jcs = new JsonCasSerializer();

-jcs.setPrettyPrint(true); // do some configuration

-StringWriter sw = new StringWriter();                          

-jcs.serialize(cas, sw); // serialize into sw</programlisting></informalexample>

-    

-    <para>The JsonCasSerializer class also has some static convenience methods for JSON serialization, for the

-    most common configuration cases; please see the Javadocs for details. These are named jsonSerialize, to 

-    distinguish them from the non-static serialize methods.</para>

-

-    <para>Many of the common configuration methods return the instance, so they can be chained together.

-    For example, if <code>jcs</code> is an instance of the JsonCasSerializer, you can write

-    <code>jcs.setPrettyPrint(true).setOmit0Values(true);</code> to configure both of these.</para>

-

-   <!--  of configuring  -->

-

-  </section> <!--  of JSON Cas Serialization -->

-  

-  <section id="ugr.ref.json.descriptionserialization">

-    <title>JSON serialization for UIMA descriptors</title>

-    

-    <para>UIMA descriptors are things like analysis engine descriptors, type system descriptors, etc. 

-    UIMA has an internal form for these, typically named UIMA <emphasis>description</emphasis>s; 

-    these can be serialized out as XML using a <code>toXML</code> method.  

-    JSON support adds the ability to serialize these as JSON objects, as well.  It may be of use, for example,

-    to have the full type system description for a UIMA pipeline available in JSON notation.

-    </para>

-    

-    <para>The class JsonMetaDataSerializer defines a set of static methods that serialize UIMA description objects

-    using a toJSON method that takes as an argument the description object to be serialized, and the standard

-    set of serialization targets that Jackson supports (File, Writer, or OutputStream).  There is also

-    an optional prettyprint flag (default is no prettyprinting).</para>

-    

-    <para>The resulting JSON serialization is just a straightforward serialization of the description object,

-    having the same fields as the XML serialization of it.</para>

-    

-    <para>Here's what a small TypeSystem description looks like, serialized:</para>

-    

-              <informalexample>  <!-- does a keep-together -->

-    <programlisting>{"typeSystemDescription" : 

-  {"name" : "casTestCaseTypesystem",  

-   "description" : "Type system description for CAS test cases.",  

-   "version" : "1.0",  

-   "vendor" : "Apache Software Foundation",  

-   "types" : [

-     {"typeDescription" : 

-       {"name" : "Token",  

-        "description" : "",  

-         "supertypeName" : "uima.tcas.Annotation",  

-         "features" : [

-           {"featureDescription" : 

-             {"name" : "type",  

-              "description" : "",  

-              "rangeTypeName" : "TokenType" } }, 

-           {"featureDescription" : 

-             {"name" : "tokenFloatFeat",  

-              "description" : "",  

-              "rangeTypeName" : "uima.cas.Float" } } ] } }, 

-     {"typeDescription" : 

-       {"name" : "TokenType",  

-        "description" : "",  

-        "supertypeName" : "uima.cas.TOP" } } ] } }</programlisting></informalexample>

-        

-    <para>Here's a sample of code to serialize a UIMA description object held in the variable <code>tsd</code>, with 

-    and without pretty printing:</para>

-    

- 

-          <informalexample>  <!-- does a keep-together -->

-    <programlisting>StringWriter sw = new StringWriter();                               

-JsonMetaDataSerializer.toJSON(tsd, sw); // no prettyprinting

-

-sw = new StringWriter();             

-JsonMetaDataSerializer.toJSON(tsd, sw, true); // prettyprinting</programlisting></informalexample>

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.pear.xml b/uima-docbook-references/src/docbook/ref.pear.xml
deleted file mode 100644
index 47fb02d..0000000
--- a/uima-docbook-references/src/docbook/ref.pear.xml
+++ /dev/null
@@ -1,1016 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.pear/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-	Licensed to the Apache Software Foundation (ASF) under one

-	or more contributor license agreements.  See the NOTICE file

-	distributed with this work for additional information

-	regarding copyright ownership.  The ASF licenses this file

-	to you under the Apache License, Version 2.0 (the

-	"License"); you may not use this file except in compliance

-	with the License.  You may obtain a copy of the License at

-	

-	http://www.apache.org/licenses/LICENSE-2.0

-	

-	Unless required by applicable law or agreed to in writing,

-	software distributed under the License is distributed on an

-	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-	KIND, either express or implied.  See the License for the

-	specific language governing permissions and limitations

-	under the License.

--->

-<chapter id="ugr.ref.pear">

-	<title>PEAR Reference</title>

-

-	<para>

-		A PEAR (Processing Engine ARchive) file is a standard package

-		for UIMA components. This chapter describes the PEAR 1.0 structure and

-		specification.

-	</para>

-

-	<para>

-		The PEAR package can be used for distribution and reuse by other

-		components or applications. It also allows applications and

-		tools to manage UIMA components automatically for verification,

-		deployment, invocation, testing, etc.

-	</para>

-

-	<para>

-		Currently, there is an Eclipse plugin and a command line tool

-		available to create PEAR packages for standard UIMA components.

-		Please refer to

-		<olink targetdoc="&uima_docs_tools;"/>

-		<olink targetdoc="&uima_docs_tools;"

-			targetptr="ugr.tools.pear.packager" />

-		for more information about these tools.

-	</para>

-  

-  <para>

-    PEARs distributed to new targets can be installed at those targets.

-    UIMA includes a tool for installing PEARs; see 

-		<olink targetdoc="&uima_docs_tools;"/>

-    <olink targetdoc="&uima_docs_tools;"

-      targetptr="ugr.tools.pear.installer"/> for 

-    more information about installing PEARs.

-  </para>

-  

-  <para>

-    An installed PEAR can be used as a component within a UIMA pipeline,

-    by specifying the pear descriptor that is created when

-    installing the pear.  See

-    <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.pear.specifier"/>.

-  </para>

-

-	<section id="ugr.ref.pear.packaging_a_component">

-		<title>Packaging a UIMA component</title>

-

-		<para>

-			For the purpose of describing the process of creating a PEAR

-			file and its internal structure, this section describes the

-			steps used to package a UIMA component as a valid PEAR file.

-			The PEAR packaging process consists of the following steps:

-

-			<itemizedlist>

-				<listitem>

-					<para>

-						<xref

-							linkend="ugr.ref.pear.creating_pear_structure" />

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						<xref

-							linkend="ugr.ref.pear.populating_pear_structure" />

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						<xref

-							linkend="ugr.ref.pear.creating_installation_descriptor" />

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						<xref

-							linkend="ugr.ref.pear.packaging_into_1_file" />

-					</para>

-				</listitem>

-			</itemizedlist>

-		</para>

-

-		<section id="ugr.ref.pear.creating_pear_structure">

-			<title>Creating the PEAR structure</title>

-

-			<para>

-				The first step in the PEAR creation process is to create

-				a PEAR structure. The PEAR structure is a structured

-				tree of folders and files, including the following

-				elements:

-

-				<itemizedlist>

-					<listitem>

-						<para>

-							Required Elements:

-

-							<itemizedlist>

-								<listitem>

-									<para>

-										The

-										<emphasis role="bold">

-											metadata

-										</emphasis>

-										folder which contains the PEAR

-										installation descriptor and

-										properties files.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The installation descriptor (

-										<emphasis role="bold">

-											metadata/install.xml

-										</emphasis>

-										)

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										A UIMA analysis engine

-										descriptor and its required

-										code, delegates (if any), and

-										resources

-									</para>

-								</listitem>

-							</itemizedlist>

-						</para>

-					</listitem>

-

-					<listitem>

-						<para>

-							Optional Elements:

-

-							<itemizedlist>

-								<listitem>

-									<para>

-										The desc folder to contain

-										descriptor files of analysis

-										engines, delegate analysis

-										engines (all levels), and other

-										components (Collection Readers,

-										CAS Consumers, etc).

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The src folder to contain the

-										source code

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The bin folder to contain

-										executables, scripts, class

-										files, dlls, shared libraries,

-										etc.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The lib folder to contain jar

-										files.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The doc folder containing

-										documentation materials,

-										preferably accessible through an

-										index.html.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The data folder to contain data

-										files (e.g. for testing).

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The conf folder to contain

-										configuration files.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										The resources folder to contain

-										other resources and

-										dependencies.

-									</para>

-								</listitem>

-

-								<listitem>

-									<para>

-										Other user-defined folders or

-										files are allowed, but should be

-										avoided.

-									</para>

-								</listitem>

-							</itemizedlist>

-						</para>

-					</listitem>

-				</itemizedlist>

-			</para>

-

-			<figure id="ugr.ref.pear.fig.pear_structure">

-				<title>The PEAR Structure</title>

-				<mediaobject>

-					<imageobject>

-						<imagedata width="3in" format="JPG"

-							fileref="&imgroot;image002.jpg" />

-					</imageobject>

-					<textobject>

-						<phrase>diagram of the PEAR structure</phrase>

-					</textobject>

-				</mediaobject>

-			</figure>

-

-		</section>

-		<section id="ugr.ref.pear.populating_pear_structure">

-			<title>Populating the PEAR structure</title>

-

-			<para>

-				After creating the PEAR structure, the component&apos;s

-				descriptor files, code files, resources files, and any

-				other files and folders are copied into the

-				corresponding folders of the PEAR structure. The

-				developer should make sure that the code would work with

-				this layout of files and folders, and that there are no

-				broken links. Although it is strongly discouraged, the

-				optional elements of the PEAR structure can be replaced

-				by other user-defined files and folders, if required for

-				the component to work properly.

-			</para>

-			<note>

-				<para>

-					The PEAR structure must be self-contained. For

-					example, this means that the component must run

-					properly independently from the PEAR root folder

-					location. If the developer needs to use an absolute

-					path in configuration or descriptor files, then

-					he/she should put these files in the

-					<quote>conf</quote>

-					or

-					<quote>desc</quote>

-					folder and replace the path of the PEAR root folder with

-					the string

-					<quote>$main_root</quote>

-					. The tools that deploy and use PEAR files should

-					localize the files in the

-					<quote>conf</quote>

-					and

-					<quote>desc</quote>

-					folders by replacing the string

-					<quote>$main_root</quote>

-					with the local absolute path of the PEAR root

-					folder. The

-					<quote>$main_root</quote>

-					macro can also be used in the Installation

-					descriptor (install.xml).

-				</para>

-			</note>

-

-			<para>

-				Currently there are three types of component packages

-				depending on their deployment:

-			</para>

-

-			<section id="ugr.ref.pear.package_type.standard">

-				<title>Standard Type</title>

-

-				<para>

-					A component package with the

-					<emphasis role="bold">standard</emphasis>

-					type must be a valid Analysis Engine, and all the

-					required files to deploy it locally must be included

-					in the PEAR package.

-				</para>

-

-			</section>

-			<section id="ugr.ref.pear.package_type.service">

-				<title>Service Type</title>

-

-				<para>

-					A component package with the

-					<emphasis role="bold">service</emphasis>

-					type must be deployable locally as a supported UIMA

-					service (e.g. Vinci). In this case, all the required

-					files to deploy it locally must be included in the

-					PEAR package.

-				</para>

-

-			</section>

-			<section id="ugr.ref.pear.package_type.network">

-				<title>Network Type</title>

-

-				<para>

-					A component package with the network type is not

-					deployed locally but rather in the

-					<quote>remote</quote>

-					environment. It&apos;s accessed as a network AE

-					(e.g. Vinci Service). The component owner has the

-					responsibility to start the service and make sure

-					it&apos;s up and running before it&apos;s used by

-					others (like a webmaster that makes sure the web

-					site is up and running). In this case, the PEAR

-					package does not have to contain files required for

-					deployment, but must contain the network AE

-					descriptor (see

-					<olink targetdoc="&uima_docs_tutorial_guides;"

-          /> <olink targetdoc="&uima_docs_tutorial_guides;"

-						targetptr="ugr.tug.aae.creating_xml_descriptor" />

-					) and the &lt;DESC&gt; tag in the installation

-					descriptor must point to the network AE descriptor.

-					For more information about Network Analysis Engines,

-					please refer to

-					<olink targetdoc="&uima_docs_tutorial_guides;"

-          /> <olink targetdoc="&uima_docs_tutorial_guides;"

-						targetptr="ugr.tug.application.remote_services" />

-					.

-				</para>

-

-			</section>

-		</section>

-

-		<section id="ugr.ref.pear.creating_installation_descriptor">

-			<title>Creating the installation descriptor</title>

-

-			<para>

-				The installation descriptor is an XML file called

-				install.xml under the metadata folder of the PEAR

-				structure. It&apos;s also called InsD. The InsD XML file

-				should be created in the UTF-8 file encoding. The InsD

-				should contain the following sections:

-			</para>

-

-			<itemizedlist>

-				<listitem>

-					<para>

-						&lt;OS&gt;: This section is used to specify

-						supported operating systems

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						&lt;TOOLKITS&gt;: This section is used to

-						specify toolkits, such as JDK, needed by the

-						component.

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						&lt;SUBMITTED_COMPONENT&gt;: This is the most

-						important section in the Installation

-						Descriptor. It&apos;s used to specify required

-						information about the component. See

-						<xref

-							linkend="ugr.ref.pear.installation_descriptor" />

-						for detailed information about this section.

-					</para>

-				</listitem>

-

-				<listitem>

-					<para>

-						&lt;INSTALLATION&gt;: This section is explained

-						in section

-						<xref linkend="ugr.ref.pear.installing" />

-						.

-					</para>

-				</listitem>

-			</itemizedlist>

-

-		</section>

-

-		<section id="ugr.ref.pear.installation_descriptor">

-			<title>

-				Documented template for the installation descriptor:

-			</title>

-			<titleabbrev>Installation Descriptor: template</titleabbrev>

-

-			<para>

-				The following is a sample

-				<quote>documented template</quote>

-				which describes content of the installation descriptor

-				install.xml:

-			</para>

-

-

-			<programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>

-<!-- Installation Descriptor Template -->

-<COMPONENT_INSTALLATION_DESCRIPTOR>

-  <!-- Specifications of OS names, including version, etc. -->

-  <OS>

-    <NAME>OS_Name_1</NAME>

-    <NAME>OS_Name_2</NAME>

-  </OS>

-  <!-- Specifications of required standard toolkits -->

-  <TOOLKITS>

-    <JDK_VERSION>JDK_Version</JDK_VERSION>

-  </TOOLKITS>

-

-  <!-- There are 2 types of variables that are used in the InsD:

-       a) $main_root , which will be substituted with the real path to the

-                 main component root directory after installing the

-                 main (submitted) component

-       b) $component_id$root, which will be substituted with the real path

-          to the root directory of a given delegate component after

-          installing the given delegate component -->

-

-  <!-- Specification of submitted component (AE)             -->

-  <!-- Note: submitted_component_id is assigned by developer; -->

-  <!--       XML descriptor file name is set by developer.    -->

-  <!-- Important: ID element should be the first in the       -->

-  <!--            SUBMITTED_COMPONENT section.                -->

-  <!-- Submitted component may include optional specification -->

-  <!-- of Collection Reader that can be used for testing the  -->

-  <!-- submitted component.                                   -->

-  <!-- Submitted component may include optional specification -->

-  <!-- of CAS Consumer that can be used for testing the       -->

-  <!-- submitted component.                                   -->

-

-  <SUBMITTED_COMPONENT>

-    <ID>submitted_component_id</ID>

-    <NAME>Submitted component name</NAME>

-    <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>

-

-    <!-- deployment options:                                   -->

-    <!-- a) "standard" is deploying AE locally                 -->

-    <!-- b) "service"  is deploying AE locally as a service,   -->

-    <!--    using specified command (script)                   -->

-    <!-- c) "network"  is deploying a pure network AE, which   -->

-    <!--    is running somewhere on the network                -->

-

-    <DEPLOYMENT>standard | service | network</DEPLOYMENT>

-

-    <!-- Specifications for "service" deployment option only   -->

-    <SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>

-    <SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>

-    <SERVICE_COMMAND_ARGS>

-

-      <ARGUMENT>

-        <VALUE>1st_parameter_value</VALUE>

-        <COMMENTS>1st parameter description</COMMENTS>

-      </ARGUMENT>

-

-      <ARGUMENT>

-        <VALUE>2nd_parameter_value</VALUE>

-        <COMMENTS>2nd parameter description</COMMENTS>

-      </ARGUMENT>

-

-    </SERVICE_COMMAND_ARGS>

-

-    <!-- Specifications for "network" deployment option only   -->

-

-    <NETWORK_PARAMETERS>

-      <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />

-    </NETWORK_PARAMETERS>

-

-    <!-- General specifications                                -->

-

-    <COMMENTS>Main component description</COMMENTS>

-

-    <COLLECTION_READER>

-      <COLLECTION_ITERATOR_DESC>

-        $main_root/desc/CollIterDescriptor.xml

-      </COLLECTION_ITERATOR_DESC>

-

-      <CAS_INITIALIZER_DESC>

-        $main_root/desc/CASInitializerDescriptor.xml

-      </CAS_INITIALIZER_DESC>

-    </COLLECTION_READER>

-

-    <CAS_CONSUMER>

-      <DESC>$main_root/desc/CASConsumerDescriptor.xml</DESC>

-    </CAS_CONSUMER>

-

-  </SUBMITTED_COMPONENT>

-  <!-- Specifications of the component installation process -->

-  <INSTALLATION>

-    <!-- List of delegate components that should be installed together -->

-    <!-- with the main submitted component (for aggregate components)  -->

-    <!-- Important: ID element should be the first in each             -->

-

-    <!--            DELEGATE_COMPONENT section.                        -->

-    <DELEGATE_COMPONENT>

-      <ID>first_delegate_component_id</ID>

-      <NAME>Name of first required separate component</NAME>

-    </DELEGATE_COMPONENT>

-

-    <DELEGATE_COMPONENT>

-      <ID>second_delegate_component_id</ID>

-      <NAME>Name of second required separate component</NAME>

-    </DELEGATE_COMPONENT>

-

-    <!-- Specifications of local path names that should be replaced -->

-    <!-- with real path names after the main component as well as   -->

-    <!-- all required delegate (library) components are installed.  -->

-    <!-- <FILE> and <REPLACE_WITH> values may use the $main_root or -->

-    <!-- one of the $component_id$root variables.                   -->

-    <!-- Important: ACTION element should be the first in each      -->

-    <!--            PROCESS section.                                -->

-

-    <PROCESS>

-      <ACTION>find_and_replace_path</ACTION>

-      <PARAMETERS>

-        <FILE>$main_root/desc/ComponentDescriptor.xml</FILE>

-        <FIND_STRING>../resources/dict/</FIND_STRING>

-        <REPLACE_WITH>$main_root/resources/dict/</REPLACE_WITH>

-        <COMMENTS>Specify actual dictionary location in XML component

-          descriptor

-        </COMMENTS>

-      </PARAMETERS>

-    </PROCESS>

-

-    <PROCESS>

-      <ACTION>find_and_replace_path</ACTION>

-      <PARAMETERS>

-        <FILE>$main_root/desc/DelegateComponentDescriptor.xml</FILE>

-        <FIND_STRING>

-local_root_directory_for_1st_delegate_component/resources/dict/

-        </FIND_STRING>

-        <REPLACE_WITH>

-          $first_delegate_component_id$root/resources/dict/

-        </REPLACE_WITH>

-        <COMMENTS>

-          Specify actual dictionary location in the descriptor of the 1st

-          delegate component

-        </COMMENTS>

-      </PARAMETERS>

-    </PROCESS>

-

-    <!-- Specifications of environment variables that should be set prior

-         to running the main component and all other reused components.

-         <VAR_VALUE> values may use the $main_root or one of the

-         $component_id$root variables. -->

-

-    <PROCESS>

-      <ACTION>set_env_variable</ACTION>

-      <PARAMETERS>

-        <VAR_NAME>env_variable_name</VAR_NAME>

-        <VAR_VALUE>env_variable_value</VAR_VALUE>

-        <COMMENTS>Set environment variable value</COMMENTS>

-      </PARAMETERS>

-    </PROCESS>

-

-  </INSTALLATION>

-</COMPONENT_INSTALLATION_DESCRIPTOR>]]></programlisting>

-      

-      <section id="ugr.ref.pear.installation_descriptor.submitted_component">

-        <title>The SUBMITTED_COMPONENT section</title>

-        

-        <para>The SUBMITTED_COMPONENT section of the installation descriptor

-          (install.xml) is used to specify required information about the UIMA component.

-          Before explaining the details, let&apos;s clarify the concept of component ID and

-          <quote>macros</quote> used in the installation descriptor. The component ID

-          element should be the <emphasis role="bold">first element </emphasis>in the

-          SUBMITTED_COMPONENT section.</para>

-        

-        <para>The component id is a string that uniquely identifies the component. It should

-          use the Java package naming convention (e.g.

-          com.company_name.project_name.etc.mycomponent).</para>

-        

-        <para>Macros are variables such as $main_root, used to represent a string such as the

-          full path of a certain directory.</para>

-        

-        <para>The values of these macros are defined by the PEAR installation process, when the

-          PEAR is installed, and represent the values local to that particular installation.

-          The values are stored in the <literal>metadata/PEAR.properties</literal> file that is 

-          generated during PEAR installation.

-          The tools and applications that use and deploy PEAR files replace these macros with

-          the corresponding values in the local environment as part of the deployment

-          process in the files included in the conf and desc folders.</para>

-        

-        <para>Currently, there are two types of macros:</para>

-        

-        <itemizedlist>

-          

-          <listitem><para>$main_root, which represents the local absolute

-          path of the main component root directory after deployment. </para></listitem>

-          

-          <listitem><para>$<emphasis>component_id</emphasis>$root, which

-            represents the local absolute path to the root directory of the component which

-            has <emphasis>component_id </emphasis> as component ID. This component could

-            be, for instance, a delegate component. </para></listitem></itemizedlist>

-        

-        <para>For example, if some part of a descriptor needs to have a path to the data

-          subdirectory of the PEAR, you write <literal>$main_root/data</literal>. If

-          your PEAR refers to a delegate component having the ID

-          <quote><literal>my.comp.Dictionary</literal></quote>, and you need to

-          specify a path to one of this component&apos;s subdirectories, e.g.

-          <literal>resources/dict</literal>, you write

-          <literal>$my.comp.Dictionary$root/resources/dict</literal>. </para>
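The macro substitution described above can be pictured with a minimal Java sketch. It is only an illustration under stated assumptions, not the PEAR installer's actual implementation: the class name PearMacroSketch, the method expandMacros, and the installation paths are all hypothetical, and only a single delegate component is handled.

```java
// Hypothetical sketch: expands the two PEAR macro types in a string.
// This is NOT the UIMA installer code; names and paths are assumptions.
class PearMacroSketch {

    // Replaces $component_id$root for one delegate and $main_root
    // with the local absolute paths chosen at installation time.
    static String expandMacros(String text, String mainRoot,
                               String delegateId, String delegateRoot) {
        return text
            .replace("$" + delegateId + "$root", delegateRoot)
            .replace("$main_root", mainRoot);
    }

    public static void main(String[] args) {
        String mainRoot = "/opt/pears/my.annot";               // assumed install dir
        String dictRoot = "/opt/pears/my.comp.Dictionary";     // assumed delegate dir

        // prints "/opt/pears/my.annot/data"
        System.out.println(expandMacros("$main_root/data",
            mainRoot, "my.comp.Dictionary", dictRoot));

        // prints "/opt/pears/my.comp.Dictionary/resources/dict"
        System.out.println(expandMacros("$my.comp.Dictionary$root/resources/dict",
            mainRoot, "my.comp.Dictionary", dictRoot));
    }
}
```

Plain string replacement is enough here because the two macro forms never overlap: one ends in _root and the other in $root.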

-        

-      </section>

-      <section id="ugr.ref.pear.installation_descriptor.id_name_desc">

-        <title>The ID, NAME, and DESC tags</title>

-        

-        <para>These tags are used to specify the component ID, Name, and descriptor path

-          using the corresponding tags as follows:

-          

-          

-          <programlisting><![CDATA[<SUBMITTED_COMPONENT>

-  <ID>submitted_component_id</ID>

-  <NAME>Submitted component name</NAME>

-  <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>]]></programlisting></para>

-        

-      </section>

-      <section id="ugr.ref.pear.installation_descriptor.deployment_type">

-        <title>Tags related to deployment types</title>

-        

-        <para>As mentioned before, there are currently three types of PEAR packages,

-          depending on the following deployment types:</para>

-        <section

-          id="ugr.ref.pear.installation_descriptor.deployment_type.standard">

-          <title>Standard Type</title>

-          

-          <para>A component package with the <emphasis role="bold">standard</emphasis>

-            type must be a valid UIMA Analysis Engine, and all the required files to deploy it

-            must be included in the PEAR package. This deployment type should be specified as

-            follows:

-            

-            

-            <programlisting><![CDATA[<DEPLOYMENT>standard</DEPLOYMENT>]]></programlisting></para>

-        </section>

-        <section

-          id="ugr.ref.pear.installation_descriptor.deployment_type.service">

-          <title>Service Type</title>

-          

-          <para>A component package with the <emphasis role="bold">service</emphasis>

-            type must be deployable locally as a supported UIMA service (e.g. Vinci). The

-            installation descriptor must include the path for the executable or script to

-            start the service including its arguments, and the working directory from where

-            to launch it, following this template:

-            

-            

-            <programlisting><![CDATA[<DEPLOYMENT>service</DEPLOYMENT>

-<SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>

-<SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>

-<SERVICE_COMMAND_ARGS>

-  <ARGUMENT>

-    <VALUE>1st_parameter_value</VALUE>

-    <COMMENTS>1st parameter description</COMMENTS>

-  </ARGUMENT>

-  <ARGUMENT>

-    <VALUE>2nd_parameter_value</VALUE>

-    <COMMENTS>2nd parameter description</COMMENTS>

-  </ARGUMENT>

-</SERVICE_COMMAND_ARGS>]]></programlisting></para>

-          

-        </section>

-        <section

-          id="ugr.ref.pear.installation_descriptor.deployment_type.network">

-          <title>Network Type</title>

-          

-          <para>A component package with the network type is not deployed locally, but

-            rather in a <quote>remote</quote> environment. It&apos;s accessed as a

-            network AE (e.g. Vinci Service). In this case, the PEAR package does not have to

-            contain files required for deployment, but must contain the network AE

-            descriptor. The &lt;DESC&gt; tag in the installation descriptor (see

-            <xref linkend="ugr.ref.pear.installation_descriptor.id_name_desc"/>) must point to the network AE descriptor. Here is a template in the case of

-            Vinci services:

-            

-            

-            <programlisting><![CDATA[<DEPLOYMENT>network</DEPLOYMENT>

-<NETWORK_PARAMETERS>

-  <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />

-</NETWORK_PARAMETERS>]]></programlisting></para>

-        </section>

-      </section>

-      <section

-        id="ugr.ref.pear.installation_descriptor.collection_reader_cas_consumer">

-        <title>The Collection Reader and CAS Consumer tags</title>

-        

-        <para>These sections of the installation descriptor specify an optional

-          Collection Reader or CAS Consumer that can be used with the packaged analysis

-          engine, for example when testing it.</para>

-        

-      </section>

-      <section id="ugr.ref.pear.installation_descriptor.installation">

-        <title>The INSTALLATION section</title>

-        

-        <para>The &lt;INSTALLATION&gt; section specifies the external dependencies of

-          the component and the operations that should be performed during the PEAR package

-          installation.</para>

-        

-        <para>The component dependencies are specified in the

-          &lt;DELEGATE_COMPONENT&gt; sub-sections, as shown in the installation

-          descriptor template above.</para>

-        

-        <para>Important: The ID element should be the first element in each

-          &lt;DELEGATE_COMPONENT&gt; sub-section.</para>

-        

-        <para>The &lt;INSTALLATION&gt; section may specify the following operations:

-          

-          <itemizedlist><listitem><para>Setting environment variables that are

-            required to run the installed component.

-            </para>

-            <para>This is also how you specify additional classpaths

-          for a Java component - by specifying the setting of an environment variable 

-            named CLASSPATH.  The <literal>buildComponentClasspath</literal> method 

-          of the PackageBrowser class builds a classpath string from what it finds in 

-          the CLASSPATH specification here, plus adds a classpath entry for all

-          Jars in the <literal>lib</literal> directory.  Because of this, there is no need

-            to specify Class Path entries for Jars in the lib directory, when using

-            the Eclipse plugin pear packager or the Maven Pear Packager.</para>

-            

-            <blockquote><para>When specifying the value of the CLASSPATH environment 

-            variable, use the semicolon ";" as the separator character, regardless of the

-            target Operating System conventions.  This delimiter will be replaced with 

-            the right one for the Operating System during PEAR installation.</para>

-            </blockquote>

-            

-            <para>If your component needs to set the UIMA datapath you must specify the necessary 

-            datapath setting using an environment variable with the key <literal>uima.datapath</literal>.

-            When such a key is specified the <literal>getComponentDataPath</literal> method of the 

-            PackageBrowser class will return the specified datapath settings for your component.

-            </para>

-            

-            <warning><para>Do not put UIMA Framework Jars into the lib directory of your

-            PEAR; doing so will cause system failures due to class loading issues.</para></warning>

-            </listitem>

-            <listitem><para>Note that you can use <quote>macros</quote>, like

-              $main_root or $component_id$root in the VAR_VALUE element of the

-              &lt;PARAMETERS&gt; sub-section.</para></listitem>

-            

-            <listitem><para>Finding and replacing string expressions in files.</para>

-              </listitem>

-            

-            <listitem><para>Note that you can use the <quote>macros</quote> in the FILE

-              and REPLACE_WITH elements of the &lt;PARAMETERS&gt; sub-section. </para>

-              </listitem></itemizedlist></para>
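The CLASSPATH separator rule from the list above can be sketched in Java as follows. This is a simplified illustration, not the installer's real logic: the class name ClasspathSeparatorSketch and the method localizeClasspath are hypothetical, and the sketch assumes the declared value never contains a literal semicolon inside a path.

```java
import java.io.File;

// Hypothetical sketch: converts the ";"-separated CLASSPATH value declared
// in install.xml to the separator of the target platform.
class ClasspathSeparatorSketch {

    // The separator is passed in explicitly so the behavior is easy to check;
    // an installer would typically use File.pathSeparator.
    static String localizeClasspath(String declaredClasspath, String separator) {
        return declaredClasspath.replace(";", separator);
    }

    public static void main(String[] args) {
        String declared = "$main_root/bin;$main_root/lib/extra.jar";
        // Output depends on the platform the sketch runs on.
        System.out.println(localizeClasspath(declared, File.pathSeparator));
    }
}
```

Declaring the value with one fixed separator keeps a single install.xml usable on every operating system; the platform-specific form is produced only at installation time.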

-        

-        <para>Important: the ACTION element should always be the first element in each

-          &lt;PROCESS&gt; sub-section.</para>

-        

-        <para>By default, the PEAR Installer will try to process every file in the desc and

-          conf directories of the PEAR package in order to find the <quote>macros</quote>

-          and replace them with actual path expressions. In addition to this, the installer

-          will process the files specified in the

-          &lt;INSTALLATION&gt; section.</para>

-        

-        <para>Important: all XML files which are going to be processed should be created

-          using UTF-8 or UTF-16 file encoding. All other text files which are going to be

-          processed should be created using the ASCII file encoding.</para>

-      </section>

-    </section>

-    

-    <section id="ugr.ref.pear.packaging_into_1_file">

-      <title>Packaging the PEAR structure into one file</title>

-      

-      <para>The last step of the PEAR packaging process is to <emphasis role="bold">

-        zip</emphasis> the content of the PEAR root folder (<emphasis role="bold">not

-        including the root folder itself</emphasis>) to a PEAR file with the extension <quote>.pear</quote>.</para>

-        

-      <para>To do this you can either use the PEAR packaging tools that are described in <quote><olink targetdoc="&uima_docs_tools;"

-      /> <olink targetdoc="&uima_docs_tools;"

-      targetptr="ugr.tools.pear.packager"/></quote> or you can use the PEAR packaging API that is shown below.</para>

-      

-      <para>

-      To use the PEAR packaging API you first have to create the necessary information for the PEAR package:

-        <programlisting>    //define PEAR data  

-    String componentID = "AnnotComponentID";

-    String mainComponentDesc = "desc/mainComponentDescriptor.xml";

-    String classpath ="$main_root/bin;";

-    String datapath ="$main_root/resources;";

-    String mainComponentRoot = "/home/user/develop/myAnnot";

-    String targetDir = "/home/user/develop";

-    Properties annotatorProperties = new Properties();

-    annotatorProperties.setProperty("sysProperty1", "value1");</programlisting>

-  	    

-  	    To create a complete PEAR package in one step call:

-  	    <programlisting>PackageCreator.generatePearPackage(

-   componentID, mainComponentDesc, classpath, datapath, 

-   mainComponentRoot, targetDir, annotatorProperties);</programlisting>

-        The created PEAR package has the file name &lt;componentID>.pear and is located in the &lt;targetDir>.

-        </para>      

-        <para>

-        To create just the PEAR installation descriptor in the main component root directory call:

-        <programlisting>PackageCreator.createInstallDescriptor(componentID, mainComponentDesc,

-   classpath, datapath, mainComponentRoot, annotatorProperties);</programlisting>

-  	    

-  	    To package a PEAR file with an existing installation descriptor call:

-        <programlisting>PackageCreator.createPearPackage(componentID, mainComponentRoot,

-   targetDir);</programlisting>

-        The created PEAR package has the file name &lt;componentID>.pear and is located in the &lt;targetDir>.

-  	  </para>

-  	  

-    </section>

-  </section>

-  <section id="ugr.ref.pear.installing">

-    <title>Installing a PEAR package</title>

-    

-    <para>The installation of a PEAR package can be done using 

-      the PEAR installer tool (see <olink targetdoc="&uima_docs_tools;" 

-      /> <olink targetdoc="&uima_docs_tools;"

-          targetptr="ugr.tools.pear.installer"/>), or by an application using

-      the PEAR APIs directly.</para>

-    

-    <para>During the PEAR installation the PEAR file is extracted to the installation directory and the PEAR macros 

-    in the descriptors are updated with the corresponding path. At the end of the installation the PEAR verification 

-    is called to check if the installed PEAR package can be started successfully. The PEAR verification uses the classpath,

-    datapath and the system property settings of the PEAR package to verify the PEAR content. Necessary Java library 

-    path settings for native libraries, PATH variable settings or system environment variables cannot be recognized 

-    automatically and the user must take care of that manually.</para>

-

-    <note><para>By default the PEAR packages are not installed directly to the specified installation directory. For each PEAR

-    a subdirectory with the name of the PEAR's ID is created where the PEAR package is installed to. If the PEAR installation 

-    directory already exists, the old content is automatically deleted before the new content is installed.</para></note>

-      

-    <section id="ugr.ref.pear.installing_pear_using_API">

-      <title>Installing a PEAR file using the PEAR APIs</title>

-    

-      <para>The example below shows how to use the PEAR APIs to install a 

-      PEAR package and access the installed PEAR package data. For more details about the PackageBrowser API, 

-      please refer to the Javadocs for the org.apache.uima.pear.tools package.

-      

-      <programlisting>File installDir = new File("/home/user/uimaApp/installedPears");

-File pearFile = new File("/home/user/uimaApp/testpear.pear");

-boolean doVerification = true;

-

-try {

-  // install PEAR package

-  PackageBrowser instPear = PackageInstaller.installPackage(

- 	installDir, pearFile, doVerification);

-

-  // retrieve installed PEAR data

-  // PEAR package classpath

-  String classpath = instPear.buildComponentClassPath();

-  // PEAR package datapath

-  String datapath = instPear.getComponentDataPath();

-  // PEAR package main component descriptor

-  String mainComponentDescriptor = instPear

-     	.getInstallationDescriptor().getMainComponentDesc();

-  // PEAR package component ID

-  String mainComponentID = instPear

-     	.getInstallationDescriptor().getMainComponentId();

-  // PEAR package pear descriptor

-  String pearDescPath = instPear.getComponentPearDescPath();

-

-  // print out settings

-  System.out.println("PEAR package class path: " + classpath);

-  System.out.println("PEAR package datapath: " + datapath);

-  System.out.println("PEAR package mainComponentDescriptor: " 

-   	+ mainComponentDescriptor);

-  System.out.println("PEAR package mainComponentID: " 

-   	+ mainComponentID);

-  System.out.println("PEAR package specifier path: " + pearDescPath); 	

-

-  } catch (PackageInstallerException ex) {

-    // catch PackageInstallerException - PEAR installation failed

-    ex.printStackTrace();

-    System.out.println("PEAR installation failed");

-  } catch (IOException ex) {

-    ex.printStackTrace();

-    System.out.println("Error retrieving installed PEAR settings");

-  }</programlisting></para>

-	  

-	  <para>

-	    To run a PEAR package after it was installed using the PEAR API, see the example below. It uses the 

-	    generated PEAR specifier that was automatically created during the PEAR installation. 

-	    For more details about the APIs please refer to the Javadocs.

-	  

-

-      <programlisting>File installDir = new File("/home/user/uimaApp/installedPears");

-File pearFile = new File("/home/user/uimaApp/testpear.pear");

-boolean doVerification = true;

-

-try {

-

-  // Install PEAR package

-  PackageBrowser instPear = PackageInstaller.installPackage(

-  	installDir, pearFile, doVerification);

-

-  // Create a default resource manager

-  ResourceManager rsrcMgr = UIMAFramework.newDefaultResourceManager();

-

-  // Create analysis engine from the installed PEAR package using

-  // the created PEAR specifier

-  XMLInputSource in = 

-        new XMLInputSource(instPear.getComponentPearDescPath());

-  ResourceSpecifier specifier =

-        UIMAFramework.getXMLParser().parseResourceSpecifier(in);

-  AnalysisEngine ae = 

-        UIMAFramework.produceAnalysisEngine(specifier, rsrcMgr, null);

-

-  // Create a CAS with a sample document text

-  CAS cas = ae.newCAS();

-  cas.setDocumentText("Sample text to process");

-  cas.setDocumentLanguage("en");

-

-  // Process the sample document

-  ae.process(cas);

-  } catch (Exception ex) {

-         ex.printStackTrace();

-  }</programlisting></para>

-   

-    </section>

-

-  </section>

-  

-    <section id="ugr.ref.pear.specifier">

-    <title>PEAR package descriptor</title>

-    

-    <para>

-       To run an installed PEAR package directly in the UIMA framework the <literal>pearSpecifier</literal> 

-       XML descriptor can be used. Typically during the PEAR installation such a specifier is automatically generated 

-       and contains all the necessary information to run the installed PEAR package. Settings for system environment

-       variables, system PATH settings or Java library path settings cannot be recognized

-       automatically and must be set manually when the JVM is started. 

-    </para>

-      

-    <note><para>The PEAR may contain specifications for "environment variables" and their settings.  

-      When such a PEAR is run

-    directly in the UIMA framework, those settings (except for Classpath and Data Path) are converted

-    to Java System properties, and set to the specified values.  Java cannot set true environment variables;

-    if such a setting is needed, the application would need to arrange to do this prior to invoking Java.</para>

-    

-    <para>The Classpath and Data Path settings are used by UIMA to configure a special Resource Manager

-    that is used when code from this PEAR is being run.</para></note>  

-

-          <para>

-       The generated PEAR descriptor

-       is located in the component root directory of the installed PEAR package and has a filename like 

-       &lt;componentID&gt;_pear.xml.

-    </para>

-    <para>

-       The PEAR package descriptor looks like:

-    </para>

-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>

-<pearSpecifier xmlns="http://uima.apache.org/resourceSpecifier">

-   <pearPath>/home/user/uimaApp/installedPears/testpear</pearPath>

-   <pearParameters>     <!-- optional -->

-      <nameValuePair>   <!-- any number, repeated -->

-         <name>param1</name>

-         <value><string>stringVal1</string></value>

-      </nameValuePair>

-   </pearParameters>

-   <parameters>         <!-- optional legacy string-valued parameters -->

-      <parameter>       <!-- any number, repeated -->

-        <name>name-of-the-parameter</name>

-        <value>string-value</value>

-      </parameter>

-   </parameters>

-</pearSpecifier>]]></programlisting>

-    <para>

-       The <literal>pearPath</literal> setting in the descriptor must point to the component root directory 

-       of the installed PEAR package.

-    </para>

-    <note>

-      <!--para>  should now be possible (11/2008) 

-         When a PEAR descriptor is used within an aggregate AE as primitive AE, it is not possible to 

-         override the PEAR AE configuration parameter settings with the aggregate AE descriptor. 

-      </para-->

-      <para>

-         It is not possible to share resources between PEAR Analysis Engines that are instantiated using the PEAR

-         descriptor. The PEAR runtime created for each PEAR descriptor has its own specific ResourceManager

-         (unless exactly the same Classpath and Data Path are being used).

-      </para>

-    </note>

-    

-    <para>The optional <literal>pearParameters</literal> section, if used, specifies parameter values,

-      which are used to customize / override parameter values in the PEAR descriptor. The format

-      for parameter values used here is the same as in component parameters (see 

-      <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.aes.configuration_parameter_settings"/>).

-      External Settings overrides continue to work for PEAR descriptors, and have precedence, if specified.

-    </para>

-

-    <para>Additionally, there can be a <literal>parameters</literal> section. This section supports

-    only string-valued parameters. Up to Apache UIMA version 2.10.4, this was the only way of

-    providing PEAR parameters. Starting with Apache UIMA version 2.10.4, this way of specifying

-    parameters is deprecated and should no longer be used. Support for it will eventually be removed

-    in a future version of Apache UIMA. Parameters set in the <literal>pearParameters</literal> have

-    precedence over parameters defined in the <literal>parameters</literal> section. For the time being,

-    both sections can be present simultaneously in a PEAR specifier.

-    </para>

-          

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.resources.xml b/uima-docbook-references/src/docbook/ref.resources.xml
deleted file mode 100644
index b845ad3..0000000
--- a/uima-docbook-references/src/docbook/ref.resources.xml
+++ /dev/null
@@ -1,186 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.resources/">

-<!ENTITY tp "ugr.ref.resources.">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.resources">

-  <title>UIMA Resources</title>

-  <titleabbrev>UIMA Resources</titleabbrev>

-

-  <section id="ugr.ref.resources.overview">

-    <title>What is a UIMA Resource?</title>

-    <para>UIMA uses the term <code>Resource</code> to describe all UIMA components

-    that can be acquired by an application or by other resources.</para>

-    

-    <figure id="ref.resource.fig.kinds">

-      <title>Resource Kinds</title>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3in" format="PNG" fileref="&imgroot;res_resource_kinds.png"/>

-        </imageobject>

-        <textobject><phrase>Resource Kinds, a partial list</phrase>

-        </textobject>

-      </mediaobject>

-    </figure>

-    

-    <para>There are many kinds of resources; here's a list of the main kinds:

-      <variablelist>

-    

-        <varlistentry>

-          <term><emphasis role="strong">Annotator</emphasis></term>

-          <listitem><para>a user written component, receives a CAS, does some processing, and returns the possibly

-          updated CAS.  Variants include CollectionReaders, CAS Consumers, CAS Multipliers.</para></listitem>

-        </varlistentry>

-    

-        <varlistentry>

-          <term><emphasis role="strong">Flow Controller</emphasis></term>

-          <listitem><para>a user written component controlling the flow of CASes within an aggregate.</para></listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term><emphasis role="strong">External Resource</emphasis></term>

-          <listitem><para>a user written component. Variants include:

-            <itemizedlist spacing="compact">

-              <listitem><para>Data - includes special lifecycle call to load data</para></listitem>

-              <listitem><para>Parameterized - allows multiple instantiations with simple string parameter variants;

-                example: a dictionary, that has variants in content for different languages</para></listitem>

-              <listitem><para>Configurable - supports configuration from the XML specifier</para></listitem>

-            </itemizedlist>

-          </para></listitem>

-        </varlistentry>

-      </variablelist>

-    </para>

-

-   <section id="ugr.ref.resources.resource-inner-implementations">

-      <title>Resource Inner Implementations</title>

-      

-      <para>Many of the resource kinds include in their specification a (possibly optional) element, which is 

-      the name of a Java class which implements the resource.  We will call this class the "inner implementation".</para>

-      

-      <para>The UIMA framework creates instances of Resource from resource specifiers, by calling 

-      the framework's <code>produceResource(specifier, additional_parameters)</code> method.

-      This call produces an instance of Resource.  </para>

-      

-      <blockquote>

-        <para>   

-          For example, calling produceResource on an AnalysisEngineDescription produces an instance of

-          AnalysisEngine.  This, in turn, will have a reference to the user-written inner implementation class

-          specified by the <code>annotatorImplementationName</code>.

-        </para>

-        <para>External resource descriptors may include an <code>implementationName</code> element.

-	        Calling produceResource on an ExternalResourceDescription produces an instance of Resource;

-	        the resource obtained by subsequent calls to <code>getResource(...)</code> 

-	        is dependent on the particular descriptor, and may be an instance of

-	        the inner implementation class. 

-        </para>

-      </blockquote>

-      

-      <para>For external resources, each resource specifier kind handles the case where 

-      the inner implementation is omitted.  If it is supplied, the named class must implement

-      the interface specified in the bindings for this resource. In addition, the particular specifier kind may 

-      further restrict the kinds of classes the user supplies as the implementationName.

-      </para>

-      

-      <para>Some examples of this further restriction:

-        <variablelist>

-          <varlistentry>

-            <term><emphasis role="strong">customResource</emphasis></term>

-            <listitem><para>the class must also implement the Resource interface</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term><emphasis role="strong">dataResource</emphasis></term>

-            <listitem><para>the class must also implement the SharedResourceObject interface</para></listitem>

-          </varlistentry>

-        </variablelist>

-      </para>

-      

-    </section>

-   

-  </section>

-  

-  <section id="ugr.ref.resources.sharing-across-pipelines">

-    <title>Sharing Resources, even across pipelines</title>

-    <titleabbrev>Sharing Resources</titleabbrev>

-    <para>UIMA applications run one or more UIMA Pipelines.  Each pipeline has a top-level Analysis Engine, which

-    may be an aggregation of many other Analysis Engine components.  The UIMA framework instantiates Annotator 

-    resources as specified to configure the pipelines.</para>

-    

-    <para>Sometimes, many identical pipelines are created (for example,

-    in order to exploit multi-core hardware by processing multiple CASes in parallel). In this case, the framework

-    would produce multiple instances of those Annotator resources; these are implemented as multiple instances

-    of the same Java class.</para>

-    

-    <para>Sets of External Resources plus a CAS Pool and UIMA Extension ClassLoader are set up and kept, 

-       per instance of a ResourceManager; 

-    this instance serves to allow sharing of these items across one or more pipelines.

-    

-    <itemizedlist>

-      <listitem>

-        <para>The UIMA Extension ClassLoader (if specified) is used to find the resources to be loaded

-        by the framework</para>

-      </listitem>

-      <listitem>

-        <para>The <code>External Resources</code> are specified by a pipeline's resource configuration.</para>

-      </listitem>

-      <listitem>

-        <para>The CAS Pool is a pool of CASs all with identical type systems and index definitions, associated 

-        with a pipeline.</para>

-      </listitem>

-    </itemizedlist> </para>

-    

-    <para>When setting up a pipeline, the UIMA Framework's <code>produceResource</code> 

-    or one of its specialized variants is called, and a new

-    ResourceManager is created and used for that pipeline.  However, in many cases, it may be advantageous to

-    share the same Resources across multiple pipelines; this is easily doable by passing a common instance of the

-    ResourceManager to the pipeline creation methods (using the additional parameters of the produceResource method).</para>

-

-    <para>

-      To handle additional use cases, the ResourceManager has a <code>copy()</code> method which creates a copy of the

-      Resource Manager instance.  The new instance is created with a null CAS Manager; if you want to share the

-      CAS Pool, you have to copy the CAS Manager: <code>newRM.setCasManager(originalRM.getCasManager())</code>.

-      You also may set the Extension Class Loader in the new instance (PEAR wrappers use this to allow

-      PEARs to have their own classpath).  See the Javadocs for details.

-    </para>

-          

-  </section>

-     

-  <section id="ugr.ref.resources.external-resource-multiple-parameterized-instances">

-    <title>External Resources support for multiple Parameterized Instances</title>

-    <para>A typical external resource gets a single instantiation, shared with all users of a particular

-    ResourceManager.

-    Sometimes, multiple instantiations may be useful (of the same resource).  The framework supports this for 

-    ParameterizedDataResources.  There's one kind supplied with UIMA - the fileLanguageResourceSpecifier.

-    This works by having each call to getResource(name, extra_keys[]) use the extra keys to select a particular

-    instance.  On the first call for a particular instance, the named resource uses the extra keys to 

-    initialize a new instance by calling its <code>load</code> method with a data resource derived from the 

-    extra keys by the named resource.

-    </para>

-   

-    <para>For example, the fileLanguageResourceSpecifier uses the language code and goes through 

-      a process with lots of defaulting and fall back to find a resource to load, based on the language code.

-    </para>

-    

-  </section>

-    

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.xmi.xml b/uima-docbook-references/src/docbook/ref.xmi.xml
deleted file mode 100644
index 03e5e85..0000000
--- a/uima-docbook-references/src/docbook/ref.xmi.xml
+++ /dev/null
@@ -1,381 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.xmi">

-  <title>XMI CAS Serialization Reference</title>

-  

-  <para>This is the specification for the mapping of the UIMA CAS into the XMI (XML Metadata

-    Interchange<footnote><para> For details on XMI see Grose et al. <emphasis>Mastering

-    XMI. Java Programming with XMI, XML, and UML. </emphasis>John Wiley &amp; Sons, Inc.

-    2002.</para></footnote>) format. XMI is an OMG standard for expressing object graphs in

-    XML. The UIMA SDK provides support for XMI through the classes

-    <literal>org.apache.uima.cas.impl.XmiCasSerializer</literal> and

-    <literal>org.apache.uima.cas.impl.XmiCasDeserializer</literal>.</para>

-  

-  <section id="ugr.ref.xmi.xmi_tag">

-    <title>XMI Tag</title>

-    

-    <para>The outermost tag is &lt;XMI&gt; and must include a version number and XML

-      namespace attribute:

-      

-      

-      <programlisting>&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"&gt;

-  &lt;!-- CAS Contents here --&gt;

-&lt;/xmi:XMI&gt;</programlisting></para>

-    

-    <para>XML namespaces<footnote><para>http://www.w3.org/TR/xml-names11/</para>

-      </footnote> are used throughout. The <quote>xmi</quote> namespace prefix is used to

-      identify elements and attributes that are defined by the XMI specification. The XMI

-      document will also define one namespace prefix for each CAS namespace, as described in

-      the next section.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.xmi.feature_structures">

-    <title>Feature Structures</title>

-    

-    <para>UIMA Feature Structures are mapped to XML elements. The name of the element is

-      formed from the CAS type name, making use of XML namespaces as follows.</para>

-    

-    <para>The CAS type namespace is converted to an XML namespace URI by the following rule:

-      replace all dots with slashes, prepend http:///, and append .ecore.</para>

-    

-    <para>This mapping was chosen because it is the default mapping used by the Eclipse

-      Modeling Framework (EMF)<footnote><para> For details on EMF and Ecore see Budinsky et

-      al. <emphasis>Eclipse Modeling Framework 2.0</emphasis>. Addison-Wesley.

-      2006.</para></footnote> to create namespace URIs from Java package names. The use of

-      the http scheme is a common convention, and does not imply any HTTP communication. The

-      .ecore suffix is due to the fact that the recommended type system definition for a

-      namespace is an ECore model, see <olink targetdoc="&uima_docs_tutorial_guides;"

-      /> <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.xmi_emf"/>.</para>

-    

-    <para>Consider the CAS type name <quote>org.myproj.Foo</quote>. The CAS namespace

-      (<quote>org.myproj</quote>) is converted to the XML namespace URI

-      http:///org/myproj.ecore.</para>
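The conversion rule above (replace all dots with slashes, prepend http:///, append .ecore) can be sketched as a small Java helper. The class and method names below are hypothetical and are not part of the UIMA API. The sketch covers only the rule as stated in the text; type names without a namespace are not handled here.

```java
// Hypothetical helper illustrating the namespace rule; not a UIMA class.
public final class XmiNamespaces {
    private XmiNamespaces() {}

    /** Returns the namespace part of a fully qualified CAS type name, e.g. "org.myproj". */
    public static String casNamespace(String typeName) {
        int lastDot = typeName.lastIndexOf('.');
        // A type name without a dot has no namespace; that case is outside this sketch.
        return lastDot < 0 ? "" : typeName.substring(0, lastDot);
    }

    /** Applies the rule: dots become slashes, "http:///" is prepended, ".ecore" is appended. */
    public static String toXmlNamespaceUri(String casNamespace) {
        return "http:///" + casNamespace.replace('.', '/') + ".ecore";
    }

    public static void main(String[] args) {
        String namespace = casNamespace("org.myproj.Foo");
        System.out.println(toXmlNamespaceUri(namespace)); // prints http:///org/myproj.ecore
    }
}
```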

-    

-    <para>The XML element name is then formed by concatenating the XML namespace prefix

-      (which is an arbitrary token, but typically we use the last component of the CAS

-      namespace) with the type name (excluding the namespace).</para>

-    

-    <para>So the example <quote>org.myproj.Foo</quote> FeatureStructure is written to

-      XMI as:

-      

-      

-      <programlisting>&lt;xmi:XMI 

-    xmi:version="2.0" 

-    xmlns:xmi="http://www.omg.org/XMI" 

-    xmlns:myproj="http:///org/myproj.ecore"&gt;

-  ...

-  &lt;myproj:Foo xmi:id="1"/&gt;

-  ...

-&lt;/xmi:XMI&gt;</programlisting></para>

-    

-    <para>The xmi:id attribute is only required if this object will be referred to from

-      elsewhere in the XMI document. If provided, the xmi:id must be unique for each

-      feature structure.</para>

-    

-    <para>All namespace prefixes (e.g. <quote>myproj</quote>) in this example must be

-      bound to URIs using the <quote>xmlns...</quote> attribute, as defined by the XML

-      namespaces specification.</para>

-  </section>

-  

-  <section id="ugr.ref.xmi.primitive_features">

-    <title>Primitive Features</title>

-    

-    <para>CAS features of primitive types (String, Boolean, Byte, Short, Integer, Long,

-      Float, or Double) can be mapped either to XML attributes or XML elements. For example, a

-      CAS FeatureStructure of type org.myproj.Foo, with features:

-      

-      

-      <programlisting>begin   = 14

-end     = 19

-myFeature = "bar"</programlisting>

-      could be mapped to:

-      

-      

-      <programlisting>&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"

-    xmlns:myproj="http:///org/myproj.ecore"&gt;

-  ...

-  &lt;myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/&gt;

-  ...

-&lt;/xmi:XMI&gt;</programlisting>

-      or equivalently:

-      

-      

-      <programlisting><![CDATA[<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"

-    xmlns:myproj="http:///org/myproj.ecore">

-  ...

-  <myproj:Foo xmi:id="1">

-    <begin>14</begin>

-    <end>19</end>

-    <myFeature>bar</myFeature>

-  </myproj:Foo>

-  ...

-</xmi:XMI>]]></programlisting></para>

-    

-    <para>The attribute serialization is preferred for compactness, but either

-      representation is allowable. Mixing the two styles is allowed; some features can be

-      represented as attributes and others as elements.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.xmi.reference_features">

-    <title>Reference Features</title>

-    

-    <para>CAS features that are references to other feature structures (excluding arrays

-      and lists, which are handled separately) are serialized as ID references.</para>

-    

-    <para>If we add to the previous CAS example a feature structure of type org.myproj.Baz,

-      with feature <quote>myFoo</quote> that is a reference to the Foo object, the

-      serialization would be:

-      

-      

-      <programlisting><![CDATA[<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"

-    xmlns:myproj="http:///org/myproj.ecore">

-  ...

-  <myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/>

-  <myproj:Baz xmi:id="2" myFoo="1"/>

-  ...

-</xmi:XMI>]]></programlisting></para>

-    

-    <para>As with primitive-valued features, it is permitted to use an element rather than an

-      attribute. However, the syntax is slightly different:</para>

-    

-    

-    <programlisting>&lt;myproj:Baz xmi:id="2"&gt;

-   &lt;myFoo href="#1"/&gt;

-&lt;/myproj:Baz&gt;</programlisting>

-    

-    <para>Note that in the attribute representation, a reference feature is

-      indistinguishable from an integer-valued feature, so the meaning cannot be

-      determined without prior knowledge of the type system. The element representation is

-      unambiguous.</para>

-    

-  </section>

-  

-  <section id="ugr.ref.xmi.array_and_list_features">

-    <title>Array and List Features</title>

-    

-    <para>For a CAS feature whose range type is one of the CAS array or list types, the XMI serialization depends on the

-      setting of the <quote>multipleReferencesAllowed</quote> attribute for that feature in the UIMA Type System

-      Description (see <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor.type_system.features"/>).</para>

-    

-    <para>An array or list with multipleReferencesAllowed = false (the default) is serialized as a

-      <quote>multi-valued</quote> property in XMI. An array or list with multipleReferencesAllowed = true is

-      serialized as a first-class object. Details are described below.</para>

-    

-    <section id="ugr.ref.xmi.array_and_list_features.as_multi_valued_properties">

-      <title>Arrays and Lists as Multi-Valued Properties</title>

-      

-      <para>In XMI, a multi-valued property is the most natural XMI representation for most cases. Consider the

-        example where the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the

-        integer array {2,4,6}. This can be mapped to:

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3" myIntArray="2 4 6"/&gt;</programlisting> or

-        equivalently:

-        

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3"&gt;

-  &lt;myIntArray&gt;2&lt;/myIntArray&gt;

-  &lt;myIntArray&gt;4&lt;/myIntArray&gt;

-  &lt;myIntArray&gt;6&lt;/myIntArray&gt;

-&lt;/myproj:Baz&gt;</programlisting>

-        </para>

-      

-      <para>Note that String arrays whose elements contain embedded spaces MUST use the latter mapping.</para>
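A short sketch can illustrate why the attribute form fails for strings with embedded spaces. It assumes a reader that splits the attribute value on whitespace; the class and method names are hypothetical, and this is not the UIMA deserializer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of reading a space-separated multi-valued attribute.
public final class MultiValuedAttributeSketch {
    private MultiValuedAttributeSketch() {}

    /** Splits an attribute value such as "2 4 6" into its whitespace-separated items. */
    public static List<String> splitAttributeValue(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // An integer array survives the round trip.
        System.out.println(splitAttributeValue("2 4 6")); // prints [2, 4, 6]

        // A two-element String array {"New York", "Boston"} written as one attribute
        // value can no longer be told apart from a three-element array.
        String attribute = String.join(" ", "New York", "Boston");
        System.out.println(splitAttributeValue(attribute)); // prints [New, York, Boston]
    }
}
```

With child elements, each array element has its own start and end tag, so embedded spaces stay inside a single element.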

-      

-      <para>FSArray or FSList features are serialized in a similar way. For example an FSArray feature that contains

-        references to the elements with xmi:id&apos;s <quote>13</quote> and <quote>42</quote> could be

-        serialized as:

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3" myFsArray="13 42"/&gt;</programlisting> or:

-        

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3"&gt;

-  &lt;myFsArray href="#13"/&gt;

-  &lt;myFsArray href="#42"/&gt;

-&lt;/myproj:Baz&gt;</programlisting>

-        </para>

-    </section>

-    

-    <section id="ugr.ref.xmi.array_and_list_features.as_1st_class_objects">

-      <title>Arrays and Lists as First-Class Objects</title>

-      

-      <para>The multi-valued-property representation described in the previous section does not allow multiple

-        references to an array or list object. Therefore, it cannot be used for features that are defined to allow

-        multiple references (i.e. features for which multipleReferencesAllowed = true in the Type System

-        Description).</para>

-      

-      <para>When multipleReferencesAllowed is set to true, array and list features are serialized as references,

-        and the array or list objects are serialized as separate objects in the XMI. Consider again the example where

-        the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the integer array

-        {2,4,6}. If myIntArray is defined with multipleReferencesAllowed=true, the serialization will be as

-        follows:

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3" myIntArray="4"/&gt;</programlisting> or:

-        

-        

-        <programlisting>&lt;myproj:Baz xmi:id="3"&gt;

-  &lt;myIntArray href="#4"/&gt;

-&lt;/myproj:Baz&gt;</programlisting>

-        with the array object serialized as

-        

-        <programlisting>&lt;cas:IntegerArray xmi:id="4" elements="2 4 6"/&gt;</programlisting> or:

-        

-        

-        <programlisting>&lt;cas:IntegerArray xmi:id="4"&gt;

-  &lt;elements&gt;2&lt;/elements&gt;

-  &lt;elements&gt;4&lt;/elements&gt;

-  &lt;elements&gt;6&lt;/elements&gt;

-&lt;/cas:IntegerArray&gt;</programlisting></para>

-      

-      <para>Note that in this case, the XML element name is formed from the CAS type name (e.g.

-        <quote><literal>uima.cas.IntegerArray</literal></quote>) in the same way as for other

-        FeatureStructures. The elements of the array are serialized either as a space-separated attribute named

-        <quote>elements</quote> or as a series of child elements named <quote>elements</quote>.</para>

-      

-      <para>List nodes are just standard FeatureStructures with <quote>head</quote> and <quote>tail</quote>

-        features, and are serialized using the normal FeatureStructure serialization. For example, an

-        IntegerList with the values 2, 4, and 6 would be serialized as the four objects:

-        

-        

-        <programlisting>&lt;cas:NonEmptyIntegerList xmi:id="10" head="2" tail="11"/&gt;

-&lt;cas:NonEmptyIntegerList xmi:id="11" head="4" tail="12"/&gt;

-&lt;cas:NonEmptyIntegerList xmi:id="12" head="6" tail="13"/&gt;

-&lt;cas:EmptyIntegerList xmi:id="13"/&gt;</programlisting></para>

-      

-      <para>This representation of arrays allows multiple references to an array or list. It also allows a feature

-        with range type TOP to refer to an array or list. However, it is a very unnatural representation in XMI and does

-        not support interoperability with other XMI-based systems, so we instead recommend using the

-        multi-valued-property representation described in the previous section whenever it is possible.</para>

-      

-      <para>When a feature is specified in the descriptor without a multipleReferencesAllowed attribute, or with the 

-      attribute specified as <code>false</code>, but the framework discovers multiple references during 

-      serialization, it will issue a message to the log saying that it discovered this (look for the phrase

-      "serialized in duplicate").  The serialization will continue, but the multiply-referenced items will 

-      be serialized in duplicate.</para>

-    </section>

-    

-    <section id="ugr.ref.xmi.null_array_list_elements">

-      <title>Null Array/List Elements</title>

-      

-      <para>In UIMA, an element of an FSArray or FSList may be null. In XMI, multi-valued properties do not permit null

-        values. As a workaround for this, we use a dummy instance of the special type cas:NULL, which has xmi:id 0.

-        For example, in the following example the <quote>myFsArray</quote> feature refers to an FSArray whose

-        second element is null:

-        

-        

-        <programlisting>&lt;cas:NULL xmi:id="0"/&gt;

-&lt;myproj:Baz xmi:id="3"&gt;

-  &lt;myFsArray href="#13"/&gt;

-  &lt;myFsArray href="#0"/&gt;

-  &lt;myFsArray href="#42"/&gt;

-&lt;/myproj:Baz&gt;</programlisting></para>

-      

-    </section>
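As a rough illustration of the null-element convention above, the following sketch resolves a list of references (either bare ids such as `13` or `href` values such as `#13`) into ids, mapping the reserved xmi:id 0 of cas:NULL to a Java null. The class and method names are hypothetical, and this is not the logic of the UIMA deserializer itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the UIMA API.
final class XmiRefListSketch {
    // xmi:id reserved for the cas:NULL dummy instance, as described in the text
    static final int NULL_ID = 0;

    /** Splits a space-separated reference list; a reference to id 0 becomes a null entry. */
    static List<Integer> parseRefs(String refs) {
        List<Integer> ids = new ArrayList<>();
        String trimmed = refs.trim();
        if (trimmed.isEmpty()) {
            return ids;
        }
        for (String token : trimmed.split("\\s+")) {
            // Accept both attribute-style ids ("13") and href-style ids ("#13")
            String digits = token.startsWith("#") ? token.substring(1) : token;
            int id = Integer.parseInt(digits);
            ids.add(id == NULL_ID ? null : id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseRefs("#13 #0 #42")); // prints [13, null, 42]
    }
}
```

A caller would still need to look up each non-null id among the deserialized objects; that lookup is left out here.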

-    

-  </section>

-  

-  <section id="ugr.ref.xmi.sofas_views">

-    <title>Subjects of Analysis (Sofas) and Views</title>

-    

-    <para>A UIMA CAS contains one or more subjects of analysis (Sofas). These are serialized no

-      differently from any other feature structure. For example:

-      

-      

-      <programlisting>&lt;?xml version="1.0"?&gt;

-&lt;xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"

-    xmlns:cas="http:///uima/cas.ecore"&gt;

-  &lt;cas:Sofa xmi:id="1" sofaNum="1"

-      text="the quick brown fox jumps over the lazy dog."/&gt;

-&lt;/xmi:XMI&gt;</programlisting></para>

-    

-    <para>Each Sofa defines a separate View. Feature Structures in the CAS can be members of

-      one or more views. (A Feature Structure that is a member of a view is indexed in its

-      IndexRepository, but that is an implementation detail.)</para>

-    

-    <para>In the XMI serialization, views will be represented as first-class objects. Each

-      View has an (optional) <quote>sofa</quote> feature, which references a sofa, and

-      a multi-valued reference to the members of the View. For example:</para>

-    

-    

-    <programlisting>&lt;cas:View sofa="1" members="3 7 21 39 61"/&gt;</programlisting>

-    

-    <para>Here the integers 3, 7, 21, 39, and 61 refer to the xmi:id fields of the objects that

-      are members of this view.</para>    

-  </section>

-  

-  <section id="ugr.ref.xmi.linking_to_ecore_type_system">

-    <title>Linking an XMI Document to its Ecore Type System</title>

-    <titleabbrev>Linking XMI docs to Ecore Type System</titleabbrev>

-    

-    <para>If the CAS Type System has been saved to an Ecore file (as described in <olink

-        targetdoc="&uima_docs_tutorial_guides;"/> <olink

-        targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.xmi_emf"/>), it is possible to store a

-      link from an XMI document to that Ecore type system. This is done using an xsi:schemaLocation attribute 

-      on the root XMI element.</para>

-    

-    <para>The xsi:schemaLocation attribute is a space-separated list that represents a

-      mapping from namespace URI (e.g. http:///org/myproj.ecore) to the physical URI of the

-      .ecore file containing the type system for that namespace. For example:

-      

-      

-      <programlisting>xsi:schemaLocation=

-  "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore"</programlisting>

-      would indicate that the definition for the org.myproj CAS types is contained in the file

-      <literal>c:/typesystems/myproj.ecore</literal>. You can specify a different

-      mapping for each of your CAS namespaces, using a space separated list. For details see

-      Budinsky et al. <emphasis>Eclipse Modeling Framework</emphasis>.</para>

-  </section>
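The pairwise structure of xsi:schemaLocation can be sketched as follows: the space-separated tokens are read two at a time, the first being the namespace URI and the second the physical URI of the .ecore file. The class name is hypothetical, and this is only an illustration of the attribute's layout, not how EMF or UIMA reads it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the namespace-to-location pairing.
final class SchemaLocationSketch {
    /** Parses "ns1 loc1 ns2 loc2 ..." into an ordered namespace -> location map. */
    static Map<String, String> parse(String schemaLocation) {
        Map<String, String> mapping = new LinkedHashMap<>();
        String trimmed = schemaLocation.trim();
        if (trimmed.isEmpty()) {
            return mapping;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("schemaLocation must contain namespace/location pairs");
        }
        for (int i = 0; i < tokens.length; i += 2) {
            mapping.put(tokens[i], tokens[i + 1]);
        }
        return mapping;
    }

    public static void main(String[] args) {
        System.out.println(parse("http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore"));
    }
}
```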

-  

-  <section id="ugr.ref.xmi.delta">

-   <title>Delta CAS XMI Format</title>

-   <titleabbrev>Delta CAS XMI Format</titleabbrev>

-   <para>

-   The Delta CAS XMI serialization format is designed primarily to reduce the serialization overhead when calling annotators 

-   configured as services. Only Feature Structures and Views that are new or modified by the service  

-   are serialized and returned by the service.  

-   </para>

-   <para>

-   The classes <literal>org.apache.uima.cas.impl.XmiCasSerializer</literal> and

-    <literal>org.apache.uima.cas.impl.XmiCasDeserializer</literal> support serialization of only the modifications to the CAS. 

-    A caller is expected to set a marker to indicate the point from which changes to the CAS are to be tracked.

-   </para>

-   <para>

-   A Delta CAS XMI document contains only the Feature Structures and Views that have been added or modified.

-   The new and modified Feature Structures are represented in exactly the same format as in a complete CAS serialization.

-   The <literal> cas:View </literal> element has been extended with three additional attributes to represent modifications to 

-   View membership. These new attributes are <literal>added_members</literal>, <literal>deleted_members</literal> and 

-   <literal>reindexed_members</literal>. For example:

-   </para>

-    <programlisting>&lt;cas:View sofa="1" added_members="63 77" 

-          deleted_members="7 61" reindexed_members="39" /&gt;</programlisting>

-    <para>

-    Here the integers 63 and 77 are the xmi:id fields of the objects that have been newly added to this View,

-    7 and 61 are the xmi:id fields of the objects that have been removed from this View, and 39 is the xmi:id of an object to be reindexed in this View.

-    </para>

-  </section>
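To make the effect of the delta attributes concrete, here is a minimal sketch that applies `added_members` and `deleted_members` to the member set of a View. It assumes the caller already holds the previous member ids; the class and method names are hypothetical, and the actual merge performed by `XmiCasDeserializer` is not shown. Reindexing does not change membership, so `reindexed_members` is ignored here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of delta View membership, not UIMA code.
final class DeltaViewSketch {
    /** Parses a space-separated list of xmi:id values. */
    static Set<Integer> parseIds(String value) {
        Set<Integer> ids = new LinkedHashSet<>();
        String trimmed = value.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                ids.add(Integer.parseInt(token));
            }
        }
        return ids;
    }

    /** Returns the members after removing the deleted ids and adding the new ones. */
    static Set<Integer> applyDelta(Set<Integer> members, String added, String deleted) {
        Set<Integer> result = new LinkedHashSet<>(members);
        result.removeAll(parseIds(deleted));
        result.addAll(parseIds(added));
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> before = parseIds("3 7 21 39 61");
        System.out.println(applyDelta(before, "63 77", "7 61")); // prints [3, 21, 39, 63, 77]
    }
}
```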

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml b/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml
deleted file mode 100644
index 95d31e0..0000000
--- a/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml
+++ /dev/null
@@ -1,2520 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" > 

-<!ENTITY tp "ugr.ref.xml.component_descriptor."> 

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.xml.component_descriptor">

-  <title>Component Descriptor Reference</title>

-  

-  <para>This chapter is the reference guide for the UIMA SDK&apos;s Component Descriptor XML

-    schema. A <emphasis>Component Descriptor</emphasis> (also sometimes called a

-    <emphasis>Resource Specifier</emphasis> in the code) is an XML file that either (a)

-    completely describes a component, including all information needed to construct the

-    component and interact with it, or (b) specifies how to connect to and interact with an

-    existing component that has been published as a remote service.

-    <emphasis>Component</emphasis> (also called <emphasis>Resource</emphasis>) is a

-    general term for modules produced by UIMA developers and used by UIMA applications. The

-    types of Components are: Analysis Engines, Collection Readers, CAS

-    Initializers<footnote><para>This component is deprecated and should not be used in new

-    development.</para></footnote>, CAS Consumers, and Collection Processing Engines.

-    However, Collection Processing Engine Descriptors are significantly different in

-    format and are covered in a separate chapter, <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>

-  

-  <para><xref linkend="&tp;notation"/> describes the notation used in this

-    chapter.</para>

-  

-  <para><xref linkend="&tp;imports"/> describes the UIMA SDK&apos;s

-    <emphasis>import</emphasis> syntax, used to allow XML descriptors to import

-    information from other XML files, to allow sharing of information between several XML

-    descriptors.</para>

-  

-  <para><xref linkend="&tp;aes"/> describes the XML format for <emphasis>Analysis Engine

-    Descriptors</emphasis>. These are descriptors that completely describe Analysis

-    Engines, including all information needed to construct and interact with them.</para>

-  

-  <para><xref linkend="&tp;collection_processing_parts"/> describes the XML format for

-    <emphasis>Collection Processing Component Descriptors</emphasis>. This includes

-    Collection Iterator, CAS Initializer, and CAS Consumer Descriptors.</para>

-  

-  <para><xref linkend="&tp;service_client"/> describes the XML format for

-    <emphasis>Service Client Descriptors</emphasis>, which specify how to connect to and

-    interact with resources deployed as remote services.</para>

-

-   <para><xref linkend="&tp;custom_resource_specifiers"/> describes the XML format for

-    <emphasis>Custom Resource Specifiers</emphasis>, which allow you to plug in your

-    own Java class as a UIMA Resource.</para>

-	  

-  <section id="&tp;notation">

-    <title>Notation</title>

-    

-    <para>This chapter uses an informal notation to specify the syntax of Component

-      Descriptors. The formal syntax is defined by an XML schema definition, which is

-      contained in the file <literal>resourceSpecifierSchema.xsd</literal>,  

-      located in the <literal>uima-core.jar</literal> file.</para>

-    

-    <para>The notation used in this chapter is:</para>

-    

-    <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates

-      that the substructure of that element has been omitted (to be described in another

-      section of this chapter). An example of this would be:

-      

-      

-      <programlisting>&lt;analysisEngineMetaData&gt;

-...

-&lt;/analysisEngineMetaData&gt;</programlisting>

-      An ellipsis immediately after an element indicates that the element type may be

-      repeated arbitrarily many times. For example:

-      

-      

-      <programlisting>&lt;parameter&gt;[String]&lt;/parameter&gt;

-&lt;parameter&gt;[String]&lt;/parameter&gt;

-...</programlisting>

-      indicates that there may be arbitrarily many parameter elements in this

-      context.</para></listitem>

-      

-      <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)

-        indicate the type of value that may be used at that location.</para></listitem>

-      

-      <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates

-        alternatives. This can be applied to literal values, bracketed type names, and

-        elements.</para></listitem>

-      

-      <listitem><para>Which elements are optional and which are required is specified in

-        prose, not in the syntax definition. </para></listitem></itemizedlist>

-  </section>

-  

-  <section id="&tp;imports">

-    <title>Imports</title>

-    

-    <para>The UIMA SDK defines a particular syntax for XML descriptors to import information

-      from other XML files. When one of the following appears in an XML descriptor:

-      

-      

-      <programlisting>&lt;import location="[URL]" /&gt; or

-&lt;import name="[Name]" /&gt;</programlisting>

-      it indicates that information from a separate XML file is being imported. Note that

-      imports are allowed only in certain places in the descriptor. In the remainder of this

-      chapter, it will be indicated at which points imports are allowed.</para>

-    

-    <para>If an import specifies a <literal>location</literal> attribute, the value of

-      that attribute specifies the URL at which the XML file to import will be found. This can be

-      a relative URL, which will be resolved relative to the descriptor containing the

-      <literal>import</literal> element, or an absolute URL. Relative URLs can be written

-      without a protocol/scheme (e.g., <quote>file:</quote>), and without a host machine

-      name. In this case the relative URL might look something like

-      <literal>org/apache/myproj/MyTypeSystem.xml</literal>.</para>

-    

-    <para>An absolute URL is written with one of the following prefixes, followed by a path

-      such as <literal>org/apache/myproj/MyTypeSystem.xml</literal>:

-      

-      <itemizedlist spacing="compact"><listitem><para>file:/ &larr; has no network

-        address</para></listitem>

-        <listitem><para>file:/// &larr; has an empty network address</para></listitem>

-        <listitem><para>file://some.network.address/</para></listitem>

-        </itemizedlist></para>

-    

-    <para>For more information about URLs, please read the javadoc information for the Java

-      class <quote>URL</quote>.</para>

-    

-    <para>If an import specifies a <literal>name</literal> attribute, the value of that

-      attribute should take the form of a Java-style dotted name (e.g.

-      <literal>org.apache.myproj.MyTypeSystem</literal>). An .xml file with this name

-      will be searched for in the classpath or datapath (described below). As in Java, the dots

-      in the name will be converted to file path separators. So an import specifying the

-      example name in this paragraph will result in a search for

-      <literal>org/apache/myproj/MyTypeSystem.xml</literal> in the classpath or

-      datapath.</para>
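The name-to-path conversion described above can be sketched in a few lines. The class name is hypothetical, and the lookup in `main` uses only the standard system class loader; it does not reproduce the datapath search of the UIMA resource manager.

```java
import java.net.URL;

// Hypothetical helper showing how a dotted import name maps to a resource path.
final class ImportNameSketch {
    /** Converts a Java-style dotted name into the .xml resource path that is searched for. */
    static String toResourcePath(String dottedName) {
        return dottedName.replace('.', '/') + ".xml";
    }

    public static void main(String[] args) {
        String path = toResourcePath("org.apache.myproj.MyTypeSystem");
        System.out.println(path); // prints org/apache/myproj/MyTypeSystem.xml
        // Classpath-only lookup; the result is null when no such resource exists.
        URL url = ClassLoader.getSystemResource(path);
        System.out.println(url);
    }
}
```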

-    

-    <para id="&tp;datapath">The datapath works similarly to the classpath but can be set programmatically

-      through the resource manager API. Application developers can specify a datapath

-      during initialization, using the following code:

-      

-      

-      <programlisting>

-ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();

-resMgr.setDataPath(yourPathString);

-AnalysisEngine ae = 

-  UIMAFramework.produceAnalysisEngine(desc, resMgr, null);

-</programlisting></para>

-    

-    <para>The default datapath for the entire JVM can be set via the

-      <literal>uima.datapath</literal> Java system property, but this feature should

-      only be used for standalone applications that don&apos;t need to run in the same JVM as

-      other code that may need a different datapath.</para>

-

-    <para>The value of a name or location attribute may be parameterized with references to external

-    override variables using the <literal>${variable-name}</literal> syntax.

-    <programlisting>&lt;import location="Annotator${with}ExternalOverrides.xml" /&gt;</programlisting>

-	If a variable is undefined the value is left unmodified and a warning message identifies the missing

-	variable.</para>
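The substitution behaviour can be approximated with the sketch below. It is an assumption-laden illustration: the class name is hypothetical, the variables come from a plain map rather than from UIMA's external override settings, and an undefined reference is left in place individually (the text does not say whether other references in the same value are still expanded).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${variable-name} expansion, not the UIMA implementation.
final class OverrideVariableSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Expands ${name} references; undefined ones are kept verbatim and reported on stderr. */
    static String expand(String value, Map<String, String> variables) {
        Matcher matcher = VARIABLE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = variables.get(matcher.group(1));
            if (replacement == null) {
                System.err.println("warning: undefined variable " + matcher.group(1));
                replacement = matcher.group(0); // keep the original ${...} text
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = java.util.Collections.singletonMap("with", "With");
        System.out.println(expand("Annotator${with}ExternalOverrides.xml", vars));
    }
}
```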

-

-    <para>Previous versions of UIMA also supported XInclude. That support didn't work in

-      many situations, and it is no longer supported. To include other files, please use

-      &lt;import&gt;.</para>

-

-    <!--

-    <para>The UIMA SDK also supports XInclude, a W3C candidate recommendation,

-    to include XML files within other XML files.  However, it is recommended that the import syntax be used instead, as it

-    is more flexible and better supports tool developers.</para>

-    

-    <note><para>UIMA tools for editing XML

-    descriptors do not support the use of xi:include because they cannot correctly

-    determine what parts of a descriptor are updatable, and what parts are included

-    from other files.  They do support the

-    use of &lt;import&gt;.

-    </para></note>

-    

-    <para>To use XInclude, you first must include the XInclude

-    namespace in your document&apos;s root element, e.g.:</para>

-    

-    <programlisting>&lt;analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier" xmlns:xi="http://www.w3.org/2001/XInclude"&gt;</programlisting>

-    

-    <para>Then, you can include a file using the syntax <literal>&lt;xi:include

-    href="[URL]"/&gt;</literal></para>

-    

-    <para>where [URL] can be any relative or absolute URL referring

-    to another XML document.  The referred-to

-    document must be a valid XML document, meaning that it must consist of exactly

-    one root element and must define all of the namespace prefixes that it uses.  The default namespace (generally <literal>http://uima.apache.org/resourceSpecifier</literal>) will be

-    inherited from the parent document.   When UIMA parses the XML document, it will automatically replace the <literal>&lt;xi:include&gt; </literal>element with the entire XML document

-    referred to by the href.  For more

-    information on XInclude see 

-    <a href="http://www.w3.org/TR/xinclude/">http://www.w3.org/TR/xinclude/</a>.</para>

-    -->

-    

-  </section>

-  

-  <section id="&tp;type_system">

-    <title>Type System Descriptors</title>

-    

-    <para>A Type System Descriptor is used to define the types and features that can be

-      represented in the CAS. A Type System Descriptor can be imported into an Analysis Engine

-      or Collection Processing Component Descriptor.</para>

-    

-    <para>The basic structure of a Type System Descriptor is as follows:

-      

-      

-      <programlisting><![CDATA[<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">

-

-  <name> [String] </name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor> 

-

-  <imports>

-    <import ...>

-    ...

-  </imports> 

-

-  <types>

-    <typeDescription>

-      ...

-    </typeDescription>

-

-    ...

-

-  </types>

-

-</typeSystemDescription>]]></programlisting></para>

-    

-    <para>All of the subelements are optional.</para>

-    

-    <section id="&tp;type_system.imports">

-      <title>Imports</title>

-      

-      <para>The <literal>imports</literal> section allows this descriptor to import

-        types from other type system descriptors. The import syntax is described in <xref

-          linkend="&tp;imports"/>. A type system may import any number of other type

-        systems and then define additional types which refer to imported types. Circular

-        imports are allowed.</para>

-    </section>

-    

-    <section id="&tp;type_system.types">

-      <title>Types</title>

-      

-      <para>The <literal>types</literal> element contains zero or more

-        <literal>typeDescription</literal> elements. Each

-        <literal>typeDescription</literal> has the form:

-        

-        

-        <programlisting><![CDATA[<typeDescription>

-  <name>[TypeName]</name>

-  <description>[String]</description>

-  <supertypeName>[TypeName]</supertypeName>

-  <features>

-    ...

-  </features>

-</typeDescription>]]></programlisting></para>

-      

-      <para>The name element contains the name of the type. A

-        <literal>[TypeName]</literal> is a dot-separated list of names, where each name

-        consists of a letter followed by any number of letters, digits, or underscores.

-        <literal>TypeNames</literal> are case sensitive. Letter and digit are as defined

-        by Java; therefore, any Unicode letter or digit may be used (subject to the character

-        encoding defined by the descriptor file&apos;s XML header). The name following the

-        final dot is considered to be the <quote>short name</quote> of the type; the

-        preceding portion is the namespace (analogous to the package.class syntax used in

-        Java). Namespaces beginning with uima are reserved and should not be used. Examples

-        of valid type names are:</para>

-      

-      <itemizedlist spacing="compact"><listitem><para>test.TokenAnnotation</para>

-        </listitem>

-        

-        <listitem><para>org.myorg.TokenAnnotation</para></listitem>

-        

-        <listitem><para>com.my_company.proj123.TokenAnnotation </para></listitem>

-        </itemizedlist>

-      

-      <para>These would all be considered distinct types since they have different

-        namespaces. Best practice here is to follow the normal Java naming conventions of

-        having namespaces be all lowercase, with the short type names having an initial

-        capital, but this is not mandated, so <literal>ABC.mYtyPE</literal> is an allowed

-        type name. Type names without namespaces (e.g.

-        <literal>TokenAnnotation</literal> alone) are allowed but discouraged, because

-        naming conflicts can then result when combining annotators that use different

-        type systems.</para>
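The naming rules above can be expressed as a small validator. This is a sketch with hypothetical names: it checks each dot-separated part using Java's own letter and digit definitions, works on `char` values (so supplementary Unicode characters are not handled), and does not check for the reserved uima namespaces.

```java
// Hypothetical validator for the TypeName rules described in the text.
final class TypeNameSketch {
    /** A name is a letter followed by any number of letters, digits, or underscores. */
    static boolean isValidName(String name) {
        if (name.isEmpty() || !Character.isLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return false;
            }
        }
        return true;
    }

    /** A type name is a dot-separated list of valid names. */
    static boolean isValidTypeName(String typeName) {
        // The -1 limit keeps empty trailing parts, so "a." and "a..b" are rejected
        for (String part : typeName.split("\\.", -1)) {
            if (!isValidName(part)) {
                return false;
            }
        }
        return true;
    }

    /** The short name is the part after the final dot. */
    static String shortName(String typeName) {
        return typeName.substring(typeName.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        System.out.println(isValidTypeName("com.my_company.proj123.TokenAnnotation")); // prints true
        System.out.println(shortName("org.myorg.TokenAnnotation")); // prints TokenAnnotation
    }
}
```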

-      

-      <para>The <literal>description</literal> element contains a textual description

-        of the type. The <literal>supertypeName</literal> element contains the name of the

-        type from which it inherits (this can be set to the name of another user-defined type,

-        or it may be set to any built-in type which may be subclassed, such as

-        <literal>uima.tcas.Annotation</literal> for a new annotation

-        type or <literal>uima.cas.TOP</literal> for a new type that is not

-        an annotation). All three of these elements are required.</para>

-      

-    </section>

-    

-    <section id="&tp;type_system.features">

-      <title>Features</title>

-      

-      <para>The <literal>features</literal> element of a

-        <literal>typeDescription</literal> is required only if the type we are specifying

-        introduces new features. If the <literal>features</literal> element is present,

-        it contains zero or more <literal>featureDescription</literal> elements, each of

-        which has the form:</para>

-      

-      

-      <programlisting><![CDATA[<featureDescription>

-  <name>[Name]</name>

-  <description>[String]</description>

-  <rangeTypeName>[Name]</rangeTypeName>

-  <elementType>[Name]</elementType>

-  <multipleReferencesAllowed>true|false</multipleReferencesAllowed>

-</featureDescription>]]></programlisting>

-      

-      <para>A feature&apos;s name follows the same rules as a type short name &ndash; a letter

-        followed by any number of letters, digits, or underscores. Feature names are case

-        sensitive.</para>

-      

-      <para>The feature&apos;s <literal>rangeTypeName</literal> specifies the type of

-        value that the feature can take. This may be the name of any type defined in your type

-        system, or one of the predefined types. All of the predefined types have names that are

-        prefixed with <literal>uima.cas</literal> or <literal>uima.tcas</literal>,

-        for example:

-        

-        

-        <programlisting>uima.cas.TOP 

-uima.cas.String

-uima.cas.Long 

-uima.cas.FSArray

-uima.cas.StringList

-uima.tcas.Annotation</programlisting>

-        For a complete list of predefined types, see the CAS API documentation.</para>

-      

-      <para>The <literal>elementType</literal> of a feature is optional, and applies only

-        when the <literal>rangeTypeName</literal> is

-        <literal>uima.cas.FSArray</literal> or <literal>uima.cas.FSList</literal>.

-        The <literal>elementType</literal> specifies what type of value can be assigned as

-        an element of the array or list. This must be the name of a non-primitive type. If

-        omitted, it defaults to <literal>uima.cas.TOP</literal>, meaning that any

-        FeatureStructure can be assigned as an element of the array or list. Note: depending on

-        the CAS Interface that you use in your code, this constraint may or may not be

-        enforced.

-        Note: At run time, the elementType is available from a runtime Feature object 

-            (using the <literal>a_feature_object.getRange().getComponentType()</literal> method) 

-            only when specified for the <literal>uima.cas.FSArray</literal> ranges; it isn't

-            available for <literal>uima.cas.FSList</literal> ranges.

-        </para>

-        

-      

-      <para>The <literal>multipleReferencesAllowed</literal> element is optional, and

-        applies only when the <literal>rangeTypeName</literal> is an array or list type (it

-        applies to arrays and lists of primitive as well as non-primitive types). Setting

-        this to false (the default) indicates that this feature has exclusive ownership of

-        the array or list, so changes to the array or list are localized. Setting this to true

-        indicates that the array or list may be shared, so changes to it may affect other

-        objects in the CAS. Note: there is currently no guarantee that the framework will

-        enforce this restriction. However, this setting may affect how the CAS is

-        serialized.</para>

-      

-    </section>

-    

-    <section id="&tp;type_system.string_subtypes">

-      <title>String Subtypes</title>

-      

-      <para>There is one other special type that you can declare &ndash; a subset of the String

-        type that specifies a restricted set of allowed values. This is useful for features

-        that can have only certain String values, such as parts of speech. Here is an example of

-        how to declare such a type:</para>

-      

-      

-      <programlisting><![CDATA[<typeDescription>

-  <name>PartOfSpeech</name>

-  <description>A part of speech.</description>

-  <supertypeName>uima.cas.String</supertypeName>

-  <allowedValues>

-    <value>

-      <string>NN</string>

-      <description>Noun, singular or mass.</description>

-    </value>

-    <value>

-      <string>NNS</string>

-      <description>Noun, plural.</description>

-    </value>

-    <value>

-      <string>VB</string>

-      <description>Verb, base form.</description>

-    </value>

-    ...

-  </allowedValues>

-</typeDescription>]]></programlisting>

-      

-    </section>

-  </section>

-  

-  <section id="&tp;aes">

-    <title>Analysis Engine Descriptors</title>

-    

-    <para>Analysis Engine (AE) descriptors completely describe Analysis Engines. There

-      are two basic types of Analysis Engines &ndash; <emphasis>Primitive</emphasis> and

-      <emphasis>Aggregate</emphasis>. A <emphasis>Primitive</emphasis> Analysis

-      Engine is a container for a single <emphasis>annotator</emphasis>, whereas an

-      <emphasis>Aggregate</emphasis> Analysis Engine is composed of a collection of other

-      Analysis Engines. (For more information on this and other terminology, see <olink

-        targetdoc="&uima_docs_overview;"/> <olink

-        targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.conceptual"/>).</para>

-    

-    <para>Both Primitive and Aggregate Analysis Engines have descriptors, and the two types

-      of descriptors have some similarities and some differences. <xref linkend="&tp;aes.primitive"/>

-      discusses Primitive Analysis Engine descriptors.  <xref linkend="&tp;aes.aggregate"/> then 

-      describes how Aggregate Analysis Engine descriptors are different.</para>

-    

-    <section id="&tp;aes.primitive">

-      <title>Primitive Analysis Engine Descriptors</title>

-      

-      <section id="&tp;aes.primitive.basic">

-        <title>Basic Structure</title>

-        

-        

-        <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>

-<analysisEngineDescription 

-        xmlns="http://uima.apache.org/resourceSpecifier">

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 

-

-  <primitive>true</primitive>

-  <annotatorImplementationName> [String] </annotatorImplementationName>

-

-  <analysisEngineMetaData>

-    ...

-  </analysisEngineMetaData>

-

-  <externalResourceDependencies>

-    ...

-  </externalResourceDependencies>

-

-  <resourceManagerConfiguration>

-    ...

-  </resourceManagerConfiguration>

-

-</analysisEngineDescription>]]></programlisting>

-        

-        <para>The document begins with a standard XML header. The recommended root tag is

-          <literal>&lt;analysisEngineDescription&gt;</literal>, although

-          <literal>&lt;taeDescription&gt;</literal> is also allowed for backwards

-          compatibility.</para>

-        

-        <para>Within the root element we declare that we are using the XML namespace

-          <literal>http://uima.apache.org/resourceSpecifier</literal>. It is

-          required that this namespace be used; otherwise, the descriptor will not be able to

-          be validated for errors.</para>

-        

-        <para> The first subelement,

-          <literal>&lt;frameworkImplementation&gt;</literal>, currently must have

-          the value <literal>org.apache.uima.java</literal>, or

-          <literal>org.apache.uima.cpp</literal>. In future versions, there may be

-          other framework implementations, or perhaps implementations produced by other

-          vendors.</para>

-        

-        <para>The second subelement, <literal>&lt;primitive&gt;</literal>, contains

-          the Boolean value <literal>true</literal>, indicating that this XML document

-          describes a <emphasis>Primitive</emphasis> Analysis Engine.</para>

-        

-        <para>The next subelement,

-          <literal>&lt;annotatorImplementationName&gt;</literal>, is how the UIMA framework

-          determines which annotator class to use. This should contain a fully-qualified

-          Java class name for Java implementations, or the name of a .dll or .so file for C++

-          implementations.</para>

-        

-        <para>The <literal>&lt;analysisEngineMetaData&gt;</literal> object contains

-          descriptive information about the analysis engine and what it does. It is

-          described in <xref linkend="&tp;aes.metadata"/>.</para>

-        

-        <para>The <literal>&lt;externalResourceDependencies&gt;</literal> and

-          <literal>&lt;resourceManagerConfiguration&gt;</literal> elements declare

-          the external resource files that the analysis engine relies

-          upon. They are optional and are described in <xref

-            linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref

-            linkend="&tp;aes.primitive.resource_manager_configuration"/>.</para>
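
-        

-        <para>As an illustration only, a filled-in primitive descriptor might look like the
-          following minimal sketch. The annotator class <literal>org.myorg.MyAnnotator</literal>
-          and the name <literal>My Annotator</literal> are hypothetical placeholders, and the
-          optional resource elements are omitted:</para>
-        <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
-<analysisEngineDescription
-        xmlns="http://uima.apache.org/resourceSpecifier">
-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
-  <primitive>true</primitive>
-  <!-- hypothetical annotator class; replace with your own -->
-  <annotatorImplementationName>org.myorg.MyAnnotator</annotatorImplementationName>
-
-  <analysisEngineMetaData>
-    <!-- name is the only required string field -->
-    <name>My Annotator</name>
-    <!-- capabilities is required; inputs and outputs may be empty -->
-    <capabilities>
-      <capability>
-        <inputs/>
-        <outputs/>
-      </capability>
-    </capabilities>
-  </analysisEngineMetaData>
-</analysisEngineDescription>]]></programlisting>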

-        

-        </section>

-      

-        <section id="&tp;aes.metadata">

-          <title>Analysis Engine MetaData</title>

-          

-          

-          <programlisting><![CDATA[<analysisEngineMetaData>

-  <name> [String] </name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor>

-

-  <configurationParameters> ...  </configurationParameters>

-

-  <configurationParameterSettings>

-    ...

-  </configurationParameterSettings> 

-

-  <typeSystemDescription> ... </typeSystemDescription> 

-

-  <typePriorities> ... </typePriorities> 

-

-  <fsIndexCollection> ... </fsIndexCollection>

-

-  <capabilities> ... </capabilities>

-

-  <operationalProperties> ... </operationalProperties>

-

-</analysisEngineMetaData>]]></programlisting>

-          

-          <para>The <literal>analysisEngineMetaData</literal> element contains four

-            simple string fields &ndash; <literal>name</literal>,

-            <literal>description</literal>, <literal>version</literal>, and

-            <literal>vendor</literal>. Only the <literal>name</literal> field is

-            required, but providing values for the other fields is recommended. The

-            <literal>name</literal> field is just a descriptive name meant to be read by

-            users; it does not need to be unique across all Analysis Engines.</para>

-

-          <para>Configuration parameters are described in 

-            <xref linkend="&tp;aes.configuration_parameters"/>.</para>

-          

-          <para>The other sub-elements &ndash;

-            <literal>typeSystemDescription</literal>,

-            <literal>typePriorities</literal>, <literal>fsIndexCollection</literal>,

-            <literal>capabilities</literal>, and

-            <literal>operationalProperties</literal> &ndash; are described in the following

-            sections. The only one of these that is required is

-            <literal>capabilities</literal>; the others are optional.</para>

-          

-        </section>

-        

-          <section id="&tp;aes.type_system">

-            <title>Type System Definition</title>

-            

-            

-            <programlisting><![CDATA[<typeSystemDescription>

-

-  <name> [String] </name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor> 

-

-  <imports>

-    <import ...>

-    ...

-  </imports> 

-

-  <types>

-    <typeDescription>

-      ...

-    </typeDescription>

-

-    ...

-

-  </types>

-

-</typeSystemDescription>]]></programlisting>

-            

-            <para>A <literal>typeSystemDescription</literal> element defines a type

-              system for an Analysis Engine. The syntax for the element is described in <xref

-                linkend="&tp;type_system"/>.</para>

-            

-            <para>The recommended usage is to <literal>import</literal> an external type

-              system, using the import syntax described in <xref linkend="&tp;imports"/>

-              of this chapter. For example:

-              

-              

-              <programlisting>&lt;typeSystemDescription&gt;

-  &lt;imports&gt;

-    &lt;import location="MySharedTypeSystem.xml"/&gt;

-  &lt;/imports&gt;

-&lt;/typeSystemDescription&gt;</programlisting></para>

-            

-            <para>This allows several AEs to share a single type system definition. The file

-              <literal>MySharedTypeSystem.xml</literal> would then contain the full

-              type system information, including the <literal>name</literal>,

-              <literal>description</literal>, <literal>vendor</literal>,

-              <literal>version</literal>, and <literal>types</literal>.</para>

-            

-          </section>

-          <section id="&tp;aes.type_priority">

-            <title>Type Priority Definition</title>

-            

-            

-            <programlisting><![CDATA[<typePriorities>

-  <name> [String] </name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor>

-

-  <imports>

-    <import ...>

-    ...

-  </imports> 

-

-  <priorityLists>

-    <priorityList>

-      <type>[TypeName]</type>

-      <type>[TypeName]</type>

-        ...

-    </priorityList>

-

-    ...

-

-  </priorityLists>

-</typePriorities>]]></programlisting>

-            

-            <para>The <literal>&lt;typePriorities&gt;</literal> element contains

-              zero or more <literal>&lt;priorityList&gt;</literal> elements; each

-              <literal>&lt;priorityList&gt;</literal> contains zero or more types.

-              Like a type system, a type priorities definition may also declare a name,

-              description, version, and vendor, and may import other type priorities. See

-                <xref linkend="&tp;imports"/> for the import syntax.</para>

-            

-            <para>Type priority is used when iterating over feature structures in the CAS.

-              For example, if the CAS contains a <literal>Sentence</literal> annotation

-              and a <literal>Paragraph</literal> annotation with the same span of text

-              (i.e. a one-sentence paragraph), which annotation should be returned first

-              by an iterator? Probably the Paragraph, since it is conceptually

-              <quote>bigger,</quote> but the framework does not know that and must be

-              explicitly told that the Paragraph annotation has priority over the Sentence

-              annotation, like this:

-              

-              

-              <programlisting>&lt;typePriorities&gt;

-  &lt;priorityList&gt;

-    &lt;type&gt;org.myorg.Paragraph&lt;/type&gt;

-    &lt;type&gt;org.myorg.Sentence&lt;/type&gt;

-  &lt;/priorityList&gt;

-&lt;/typePriorities&gt;</programlisting></para>

-            

-            <para>All of the <literal>&lt;priorityList&gt;</literal> elements defined

-              in the descriptor (and in all component descriptors of an aggregate analysis

-              engine descriptor) are merged to produce a single priority list.</para>

-            

-            <para>Subtypes of types specified here are also ordered, unless overridden by

-              another user-specified type ordering. For example, if you specify type A

-              comes before type B, then subtypes of A will come before subtypes of B, unless

-              there is an overriding specification which declares some subtype of B comes

-              before some subtype of A.</para>

-            

-            <para>If there are inconsistencies between the priority list (type A declared

-              before type B in one priority list, and type B declared before type A in

-              another), the framework will throw an exception.</para>

-            

-            <para>User defined indexes may declare if they wish to use the type priority or

-              not; see the next section.</para>

-          </section>

-          

-          <section id="&tp;aes.index">

-            <title>Index Definition</title>

-            

-            

-            <programlisting><![CDATA[<fsIndexCollection>

-

-  <name>[String]</name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor> 

-

-  <imports>

-    <import ...>

-    ...

-  </imports>

-

-  <fsIndexes> 

-

-    <fsIndexDescription>

-      ...

-    </fsIndexDescription>

-

-    <fsIndexDescription>

-      ...

-    </fsIndexDescription>

-

-  </fsIndexes>

-

-</fsIndexCollection>]]></programlisting>

-            

-            <para>The <literal>fsIndexCollection</literal> element declares <emphasis>Feature Structure

-              Indexes</emphasis>, each of which defines an index that holds feature structures of a given type.

-              Information in the CAS is always accessed through an index. There is a built-in default annotation

-              index declared which can be used to access instances of type

-              <literal>uima.tcas.Annotation</literal> (or its subtypes), sorted based on their

-              <literal>begin</literal> and <literal>end</literal> features, and the type priority ordering (if specified). 

-              For all other types, there is a

-              default, unsorted (bag) index. If there is a need for a specialized index it must be declared in this

-              element of the descriptor. See <olink targetdoc="&uima_docs_ref;"

-                targetptr="ugr.ref.cas.indexes_and_iterators"/> for details on FS indexes.</para>

-            

-            <para>Like type systems and type priorities, an

-              <literal>fsIndexCollection</literal> can declare a

-              <literal>name</literal>, <literal>description</literal>,

-              <literal>vendor</literal>, and <literal>version</literal>, and may

-              import other <literal>fsIndexCollection</literal>s. The import syntax is

-              described in <xref linkend="&tp;imports"/>.</para>

-            

-            <para>An <literal>fsIndexCollection</literal> may also define zero or more

-              <literal>fsIndexDescription</literal> elements, each of which defines a

-              single index. Each <literal>fsIndexDescription</literal> has the form:

-              

-              

-              <programlisting><![CDATA[<fsIndexDescription>

-

-  <label>[String]</label>

-  <typeName>[TypeName]</typeName>

-  <kind>sorted|bag|set</kind>

-

-  <keys>

-

-    <fsIndexKey>

-      <featureName>[Name]</featureName>

-      <comparator>standard|reverse</comparator>

-    </fsIndexKey>

-

-    <fsIndexKey>

-      <typePriority/>

-    </fsIndexKey>

-

-    ...

-

-  </keys>

-</fsIndexDescription>]]></programlisting></para>

-            

-            <para>The <literal>label</literal> element defines the name by which

-              applications and annotators refer to this index. The

-              <literal>typeName</literal> element contains the name of the type that will

-              be contained in this index. This must match one of the type names defined in the

-              <literal>&lt;typeSystemDescription&gt;</literal>.</para>

-            

-            <para>There are three possible values for the

-              <literal>&lt;kind&gt;</literal> of index. Sorted indexes enforce an

-              ordering of feature structures, based on defined keys.  Bag indexes do

-              not enforce ordering, and have no defined keys. Set indexes do not

-              enforce ordering, but use defined keys to specify equivalence classes; 

-              addToIndexes will not add a Feature Structure to a set index if its keys 

-              match those of an entry of the same type already in the index.

-              If the <literal>&lt;kind&gt;</literal> element is omitted, it will default to

-              sorted, which is the most common type of index.</para>

-              

-            <para>Prior to version 2.7.0, the bag and sorted indexes stored duplicate entries for the

-            same identical FS, if it was added to the indexes multiple times. As of version 2.7.0, this 

-            is changed; a second or subsequent add to index operation has no effect.  This has the

-            consequence that a remove operation now guarantees that the particular FS is removed 

-            (as opposed to only being able to say that one (of perhaps many duplicate entries) is removed).

-            Since sending to remote annotators only adds entries to indexes at most once, this 

-            behavior is consistent with that.</para>

-            

-            <para>Note that even after this change, there is still a distinct difference in meaning for bag and set indexes.

-            The set index uses equal defined key values plus the type of the Feature Structure to determine equivalence classes for Feature Structures, and

-            will not add a Feature Structure if it has equal key values and the same type to an entry already in there.</para>

-            

-            <para>It is possible, however, that users may be depending on having multiple instances of 

-            the identical FeatureStructure in the indices. Therefore, UIMA checks

-             a JVM system property,

-            <literal>uima.allow_duplicate_add_to_indexes</literal>, which (if defined when UIMA is loaded) restores the previous behavior.</para>

-            

-            <note><para>If duplicates are allowed, then the proper way to update an indexed Feature Structure is to

-              <itemizedlist>

-                <listitem><para>remove <emphasis role="bold">*all*</emphasis> instances of the FS to be

-                  updated </para></listitem>

-                <listitem><para>update the features</para></listitem>

-                <listitem><para>re-add the Feature Structure to the indexes (perhaps multiple times, depending on the

-                details of your logic).</para></listitem>

-              </itemizedlist></para></note>

-            

-            <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.  

-              As of UIMA v2.1, if you do not declare any index for a type (or any of its 

-              supertypes), a Bag index will be automatically created if an instance of that type is added to the indexes.</para></note>

-                        

-            <para>A Sorted or Set index may define zero or more <emphasis>keys</emphasis>. These keys

-              determine the sort order of the feature structures within a sorted index, and

-              partially determine equality for set indexes (the equality measure always includes testing that the types are the same). 

-              Bag indexes do not use keys, and 

-			  equality is determined by Feature Structure identity (that is, two elements

-			  are considered equal if and only if they are exactly the same feature structure,

-			  located in the same place in the CAS). Keys are

-              ordered by precedence &ndash; the first key is evaluated first, and

-              subsequent keys are evaluated only if necessary.</para>

-            

-            <para>Each key is represented by an <literal>fsIndexKey</literal> element.

-              Most <literal>fsIndexKey</literal> elements contain a

-              <literal>featureName</literal> and a <literal>comparator</literal>.

-              The <literal>featureName</literal> must match the name of one of the

-              features for the type specified in the

-              <literal>&lt;typeName&gt;</literal> element for this index. The

-              comparator defines how the features will be compared &ndash; a value of

-              <literal>standard</literal> means that features will be compared using the

-              standard comparison for their data type (e.g. for numerical types, smaller

-              values precede larger values, and for string types, Unicode string

-              comparison is performed). A value of <literal>reverse</literal> means that

-              features will be compared using the reverse of the standard comparison (e.g.

-              for numerical types, larger values precede smaller values, etc.). For Set

-              indexes, the comparator direction is ignored &ndash; the keys are only used

-              for the equality testing.</para>

-            

-            <para>Each key used in comparisons must refer to a feature whose range type is

-              Boolean, Byte, Short, Integer, Long, Float, Double, or String.

-              </para>

-            

-            <para>There is a second type of a key, one which contains only the

-              <literal>&lt;typePriority/&gt;</literal>. When this key is used, it

-              indicates that Feature Structures will be compared using the type priorities

-              declared in the <literal>&lt;typePriorities&gt;</literal> section of the

-              descriptor.</para>
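
-            

-            <para>As a sketch of how these pieces fit together, the following hypothetical
-              index orders instances of an assumed annotation type by ascending
-              <literal>begin</literal>, then descending <literal>end</literal>, then type
-              priority. The label <literal>TokensByPosition</literal> and the type name
-              <literal>org.myorg.TokenAnnotation</literal> are illustrative assumptions; the
-              type is assumed to be a subtype of <literal>uima.tcas.Annotation</literal>, which
-              supplies the <literal>begin</literal> and <literal>end</literal> features:</para>
-            <programlisting><![CDATA[<fsIndexCollection>
-  <fsIndexes>
-    <fsIndexDescription>
-      <!-- hypothetical label and type name -->
-      <label>TokensByPosition</label>
-      <typeName>org.myorg.TokenAnnotation</typeName>
-      <kind>sorted</kind>
-      <keys>
-        <!-- first key: smaller begin offsets come first -->
-        <fsIndexKey>
-          <featureName>begin</featureName>
-          <comparator>standard</comparator>
-        </fsIndexKey>
-        <!-- second key, used only on ties: larger end offsets come first -->
-        <fsIndexKey>
-          <featureName>end</featureName>
-          <comparator>reverse</comparator>
-        </fsIndexKey>
-        <!-- last key: fall back to the declared type priorities -->
-        <fsIndexKey>
-          <typePriority/>
-        </fsIndexKey>
-      </keys>
-    </fsIndexDescription>
-  </fsIndexes>
-</fsIndexCollection>]]></programlisting>
-            <para>Changing <literal>&lt;kind&gt;</literal> to <literal>set</literal> in this
-              sketch would keep the same keys but use them only for equality testing, so the
-              comparator directions would be ignored.</para>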

-            

-          </section>

-          

-          <section id="&tp;aes.capabilities">

-            <title>Capabilities</title>

-            

-            

-            <programlisting><![CDATA[<capabilities>

-  <capability>

-

-    <inputs>

-      <type allAnnotatorFeatures="true|false">[TypeName]</type>

-      ...

-      <feature>[TypeName]:[Name]</feature>

-      ...

-    </inputs>

-

-    <outputs>

-      <type allAnnotatorFeatures="true|false">[TypeName]</type>

-      ...

-      <feature>[TypeName]:[Name]</feature>

-      ...

-    </outputs>

-

-    <inputSofas>

-      <sofaName>[name]</sofaName>

-      ...

-    </inputSofas>

-

-    <outputSofas>

-      <sofaName>[name]</sofaName>

-      ...

-    </outputSofas>

-

-    <languagesSupported>

-      <language>[ISO Language ID]</language>

-        ...

-    </languagesSupported>

-  </capability>

-

-  <capability>

-    ...

-  </capability>

-

-  ...

-

-</capabilities>]]></programlisting>

-            

-            <para>The capabilities definition is used by the UIMA Framework in several

-              ways, including setting up the Results Specification for process calls,

-              routing control for aggregates based on language, and as part of the Sofa

-              mapping function.</para>

-            

-            <para>The <literal>capabilities</literal> element contains one or more

-              <literal>capability</literal> elements. In Version 2 and onwards, only one

-              capability set should be used (multiple sets will continue to work for a while,

-              but they are not supported in a logically consistent way).

-              <!-- Because you can therefore

-              declare multiple capability sets, you can use this to model component behavior

-              

-              that for a given set of inputs, produces a particular set of outputs. --></para>

-            

-            <para>Each <literal>capability</literal> contains

-              <literal>inputs</literal>, <literal>outputs</literal>,

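-              <literal>languagesSupported</literal>, <literal>inputSofas</literal>, and <literal>outputSofas</literal>.

-              The <literal>&lt;inputs&gt;</literal> and <literal>&lt;outputs&gt;</literal> elements are required (though they may be empty);

-              <literal>&lt;languagesSupported&gt;</literal>, <literal>&lt;inputSofas&gt;</literal>,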

-              and <literal>&lt;outputSofas&gt;</literal> are optional.</para>

-            

-            <para>Both inputs and outputs may contain a mixture of type and feature

-              elements.</para>

-            

-            <para><literal>&lt;type...&gt;</literal> elements contain the name of one

-              of the types defined in the type system or one of the built in types. Declaring a

-              type as an input means that this component expects instances of this type to be

-              in the CAS when it receives it to process. Declaring a type as an output means

-              that this component creates new instances of this type in the CAS.</para>

-            

-            <para>There is an optional attribute

-              <literal>allAnnotatorFeatures</literal>, which defaults to false if

-              omitted. The Component Descriptor Editor tool defaults this to true when a new

-              type is added to the list of inputs and/or outputs. When this attribute is true,

-              it specifies that all of the type&apos;s features are also declared as input or

-              output. Otherwise, the features that are required as inputs or populated as

-              outputs must be explicitly specified in feature elements.</para>

-            

-            <para><literal>&lt;feature...&gt;</literal> elements contain the

-              <quote>fully-qualified</quote> feature name, which is the type name

-              followed by a colon, followed by the feature name, e.g.

-              <literal>org.myorg.TokenAnnotation:lemma</literal>.

-              <literal>&lt;feature...&gt;</literal> elements in the

-              <literal>&lt;inputs&gt;</literal> section must also have a corresponding

-              type declared as an input. In output sections, this is not required. If the type

-              is not specified as an output, but a feature for that type is, this means that

-              existing instances of the type have the values of the specified features

-              updated. Any type mentioned in a <literal>&lt;feature&gt;</literal>

-              element must be either specified as an input or an output or both.</para>

-            

-            <para><literal>&lt;language&gt;</literal> elements contain one of the ISO language

-              identifiers, such as <literal>en</literal> for English, or

-              <literal>en-US</literal> for the United States dialect of English.</para>

-            

-            <para>The list of language codes can be found here: <ulink

-                url="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt"/>

-              and the country codes here:

-              <ulink

-                url="http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html"/>

-              </para>

-            

-            <para><literal>&lt;inputSofas&gt;</literal> and

-              <literal>&lt;outputSofas&gt;</literal> declare sofa names used by this

-              component. All Sofa names must be unique within a particular capability set. A

-              Sofa name must be an input or an output, and cannot be both. It is an error to have a

-              Sofa name declared as an input in one capability set, and also have it declared

-              as an output in another capability set.</para>

-            

-            <para>A <literal>&lt;sofaName&gt;</literal> is written as a simple

-              Java-style identifier, without any periods in the name, except that it may be

-              written to end in <quote><literal>.*</literal></quote>. If written in this

-              manner, it specifies a set of Sofa names, all of which start with the base name

-              (the part before the .*) followed by a period and then an arbitrary Java

-              identifier (without periods). This form is used to specify in the descriptor

-              that the component could generate an arbitrary number of Sofas, the exact

-              names and numbers of which are unknown before the component is run.</para>
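
-            

-            <para>The rules above can be sketched with a single capability set. All type,
-              feature, and Sofa names here (<literal>org.myorg.Sentence</literal>,
-              <literal>org.myorg.TokenAnnotation</literal>,
-              <literal>org.myorg.PersonName</literal>, <literal>sourceText</literal>,
-              <literal>translation.*</literal>) are hypothetical and chosen only for
-              illustration:</para>
-            <programlisting><![CDATA[<capabilities>
-  <capability>
-    <inputs>
-      <!-- all features of Sentence are declared as inputs -->
-      <type allAnnotatorFeatures="true">org.myorg.Sentence</type>
-      <!-- only the lemma feature of TokenAnnotation is required as input -->
-      <type>org.myorg.TokenAnnotation</type>
-      <feature>org.myorg.TokenAnnotation:lemma</feature>
-    </inputs>
-
-    <outputs>
-      <!-- new PersonName instances are created, with all features set -->
-      <type allAnnotatorFeatures="true">org.myorg.PersonName</type>
-      <!-- TokenAnnotation is not an output type, so this feature is
-           updated on existing instances -->
-      <feature>org.myorg.TokenAnnotation:partOfSpeech</feature>
-    </outputs>
-
-    <inputSofas>
-      <sofaName>sourceText</sofaName>
-    </inputSofas>
-
-    <outputSofas>
-      <!-- matches Sofa names such as translation.de or translation.fr -->
-      <sofaName>translation.*</sofaName>
-    </outputSofas>
-
-    <languagesSupported>
-      <language>en</language>
-      <language>en-US</language>
-    </languagesSupported>
-  </capability>
-</capabilities>]]></programlisting>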

-            

-          </section>

-          

-          <section id="&tp;aes.operational_properties">

-            <title>OperationalProperties</title>

-            

-            <para>Components can specify specific operational properties that can be

-              useful in deployment. The following are available:</para>

-            

-            

-            <programlisting><![CDATA[<operationalProperties>

-  <modifiesCas> true|false </modifiesCas>

-  <multipleDeploymentAllowed> true|false </multipleDeploymentAllowed>

-  <outputsNewCASes> true|false </outputsNewCASes>

-</operationalProperties>]]></programlisting>

-            

-            <para><literal>modifiesCas</literal>, if false, indicates that this

-              component does not modify the CAS. If it is not specified, the default value is

-              true except for CAS Consumer components.</para>

-                       

-            <para><literal>multipleDeploymentAllowed</literal>, if true, allows the

-              component to be deployed multiple times to increase performance through

-              scale-out techniques. If it is not specified, the default value is true,

-              except for CAS Consumer and Collection Reader components.</para>

-

-            <note><para>If you wrap one or more CAS Consumers inside an aggregate as the only

-            components, you must explicitly specify in the aggregate the 

-            <literal>multipleDeploymentAllowed</literal> property as false (assuming the CAS Consumer 

-            components take the default here); otherwise the framework will complain about inconsistent 

-            settings for these.</para></note>

-                        

-            <para><literal>outputsNewCASes</literal>, if true, allows the component to

-              create new CASes during processing, for example to break a large artifact into

-              smaller pieces. See <olink targetdoc="&uima_docs_tutorial_guides;"

-                /> <olink targetdoc="&uima_docs_tutorial_guides;"

-                targetptr="ugr.tug.cm"/> for details.</para>
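
-            

-            <para>For instance, a component that breaks a large artifact into smaller pieces
-              might declare the following. The values are an illustrative assumption about
-              such a component, not a required combination:</para>
-            <programlisting><![CDATA[<operationalProperties>
-  <!-- the component may change the CAS it receives -->
-  <modifiesCas>true</modifiesCas>
-  <!-- several instances may be deployed for scale-out -->
-  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
-  <!-- the component creates new CASes during processing -->
-  <outputsNewCASes>true</outputsNewCASes>
-</operationalProperties>]]></programlisting>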

-          </section>

-          

-          <section id="&tp;aes.primitive.external_resource_dependencies">

-            <title>External Resource Dependencies</title>

-            

-            

-            <programlisting><![CDATA[<externalResourceDependencies>

-  <externalResourceDependency>

-    <key>[String]</key>

-    <description>[String] </description>

-    <interfaceName>[String]</interfaceName>

-    <optional>true|false</optional>

-  </externalResourceDependency>

-

-  <externalResourceDependency>

-    ...

-  </externalResourceDependency>

-

-  ...

-

-</externalResourceDependencies>]]></programlisting>

-            

-            <para>A primitive annotator may declare zero or more

-              <literal>&lt;externalResourceDependency&gt;</literal> elements. Each

-              dependency has the following elements:

-              

-              <itemizedlist><listitem><para><literal>key</literal> &ndash; the

-                string by which the annotator code will attempt to access the resource. Must

-                be unique within this annotator.</para></listitem>

-                

-                <listitem><para><literal>description</literal> &ndash; a textual

-                  description of the dependency.</para></listitem>

-                

-                <listitem><para><literal>interfaceName</literal> &ndash; the

-                  fully-qualified name of the Java interface through which the annotator

-                  will access the data. This is optional. If not specified, the annotator

-                  can only get an InputStream to the data.</para></listitem>

-                

-                <listitem><para><literal>optional</literal> &ndash; whether the

-                  resource is optional. If false, an exception will be thrown if no resource

-                  is assigned to satisfy this dependency. Defaults to false. </para>

-                  </listitem></itemizedlist></para>
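
-            

-            <para>A single required dependency could be declared along the following lines.
-              The key <literal>AcronymTable</literal> and the interface
-              <literal>org.myorg.StringMapResource</literal> are hypothetical names used only
-              for illustration:</para>
-            <programlisting><![CDATA[<externalResourceDependencies>
-  <externalResourceDependency>
-    <!-- the string the annotator code uses to look up the resource -->
-    <key>AcronymTable</key>
-    <description>Table of acronyms and their expanded forms.</description>
-    <!-- hypothetical Java interface through which the data is accessed -->
-    <interfaceName>org.myorg.StringMapResource</interfaceName>
-    <!-- false: an exception is thrown if no resource is bound to this key -->
-    <optional>false</optional>
-  </externalResourceDependency>
-</externalResourceDependencies>]]></programlisting>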

-            

-          </section>

-          

-          <section id="&tp;aes.primitive.resource_manager_configuration">

-            <title>Resource Manager Configuration</title>

-            

-            

-            <programlisting><![CDATA[<resourceManagerConfiguration>

-

-  <name>[String]</name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor> 

-

-  <imports>

-    <import ...>

-    ...

-  </imports>

-

-  <externalResources>

-

-    <externalResource>

-      <name>[String]</name>

-      <description>[String]</description>

-      <fileResourceSpecifier>

-        <fileUrl>[URL]</fileUrl>

-      </fileResourceSpecifier>

-      <implementationName>[String]</implementationName>

-    </externalResource>

-    ...

-  </externalResources>

-

-  <externalResourceBindings>

-    <externalResourceBinding>

-      <key>[String]</key>

-      <resourceName>[String]</resourceName>

-    </externalResourceBinding>

-    ...

-  </externalResourceBindings>

-

-</resourceManagerConfiguration>]]></programlisting>

-            

-            <para>This element declares external resources and binds them to

-              annotators&apos; external resource dependencies.</para>

-            

-            <para>The <literal>resourceManagerConfiguration</literal> element may

-              optionally contain an <literal>import</literal>, which allows resource

-              definitions to be stored in a separate (shareable) file. See <xref

-                linkend="&tp;imports"/> for details.</para>

-            

-            <para>The <literal>externalResources</literal> element contains zero or

-              more <literal>externalResource</literal> elements, each of which

-              consists of:

-              

-              <itemizedlist><listitem><para><literal>name</literal> &ndash; the

-                name of the resource. This name is referred to in the bindings (see below).

-                Resource names need to be unique within any Aggregate Analysis Engine or

-                Collection Processing Engine, so the Java-like

-                <literal>org.myorg.mycomponent.MyResource</literal> syntax is

-                recommended.</para></listitem>

-                

-                <listitem><para><literal>description</literal> &ndash; English

-                  description of the resource.</para></listitem>

-                

-                <listitem><para>Resource Specifier &ndash;

-                  Declares the location of the resource. There are different

-                  possibilities for how this is done (see below).</para></listitem>

-                

-                <listitem><para><literal>implementationName</literal> &ndash; The

-                  fully-qualified name of the Java class that will be instantiated from the

-                  resource data. This is optional; if not specified, the resource will be

-                  accessible as an input stream to the raw data. If specified, the Java class

-                  must implement the <literal>interfaceName</literal> that is

-                  specified in the External Resource Dependency to which it is bound.

-                  </para></listitem></itemizedlist></para>

-            

-            <para>One possibility for the resource specifier is a

-              <literal>&lt;fileResourceSpecifier&gt;</literal>, as shown above. This

-              simply declares a URL to the resource data. This support is built on the Java

-              class URL and its method URL.openStream(); it supports the protocols

-              <quote>file</quote>, <quote>http</quote> and <quote>jar</quote> (for

-              referring to files in jars) by default, and you can plug in handlers for other

-              protocols. The URL has to start with file: (or some other protocol). It is

-              relative to either the classpath or the <quote>data path</quote>. The data

-              path works like the classpath but can be set programmatically via

-              <literal>ResourceManager.setDataPath()</literal>. Setting the Java

-              System property <literal>uima.datapath</literal> also works.</para>

-            

-            <para><literal>file:com/apache.d.txt</literal> is a relative path;

-              relative paths for resources are resolved using the classpath and/or the

-              datapath. For the file protocol, URLs starting with file:/ or file:/// are

-              absolute. Note that <literal>file://org/apache/d.txt</literal> is NOT an

-              absolute path starting with <quote>org</quote>. The <quote>//</quote>

-              indicates that what follows is a host name. Therefore if you try to use this URL

-              it will complain that it can&apos;t connect to the host <quote>org</quote>.

-              </para>

-            

-			<para>The URL value may contain references to external override variables using the

- 		      <literal>${variable-name}</literal> syntax, 

-			  e.g. <literal>file:com/${dictUrl}.txt</literal>.

-			  If a variable is undefined the value is left unmodified and a warning message

- 		      identifies the missing variable.

-			  </para>

-

-            <para>Another option is a

-              <literal>&lt;fileLanguageResourceSpecifier&gt;</literal>, which is

-              intended to support resources, such as dictionaries, that depend on the

-              language of the document being processed. Instead of a single URL, a prefix and

-              suffix are specified, like this:

-              

-              

-              <programlisting><![CDATA[<fileLanguageResourceSpecifier>

-  <fileUrlPrefix>file:FileLanguageResource_implTest_data_</fileUrlPrefix>

-  <fileUrlSuffix>.dat</fileUrlSuffix>

-</fileLanguageResourceSpecifier>]]></programlisting></para>

-            

-            <para>The URL of the actual resource is then formed by concatenating the prefix,

-              the language of the document (as an ISO language code, e.g.

-              <literal>en</literal> or <literal>en-US</literal>

-              &ndash; see <xref linkend="&tp;aes.capabilities"/> for more

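-              information), and the suffix. For the specifier above and a document whose
-              language is <literal>en</literal>, the resulting URL is
-              <literal>file:FileLanguageResource_implTest_data_en.dat</literal>.</para>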

-            

-		    <para>A third option is a <literal>customResourceSpecifier</literal>, which allows

-			  you to plug in an arbitrary Java class.  See <xref linkend="&tp;custom_resource_specifiers"/>

-			  for more information.</para>

-			  

-            <para>The <literal>externalResourceBindings</literal> element declares

-              which resources are bound to which dependencies. Each

-              <literal>externalResourceBinding</literal> consists of:

-              

-              <itemizedlist><listitem><para><literal>key</literal> &ndash;

-                identifies the dependency. For a binding declared in a primitive analysis

-                engine descriptor, this must match the value of the

-                <literal>key</literal> element of one of the

-                <literal>externalResourceDependency</literal> elements. Bindings

-                may also be specified in aggregate analysis engine descriptors, in which

-                case a compound key is used

-                &ndash; see <xref

-                  linkend="&tp;aes.aggregate.external_resource_bindings"/>

-                .</para></listitem>

-                

-                <listitem><para><literal>resourceName</literal> &ndash; the name of

-                  the resource satisfying the dependency. This must match the value of the

-                  <literal>name</literal> element of one of the

-                  <literal>externalResource</literal> declarations. </para>

-                  </listitem></itemizedlist></para>

-            

-            <para>A given resource dependency may only be bound to one external resource;

-              one external resource may be bound to many dependencies &ndash; to allow

-              resource sharing.</para>

-          </section>

-          

-          <section id="&tp;aes.environment_variable_references">

-            <title>Environment Variable References</title>

-            

-            <para>In several places throughout the descriptor, it is possible to reference

-              environment variables. In Java, these are actually references to Java system

-              properties. To reference system environment variables from a Java analysis

-              engine you must pass the environment variables into the Java virtual machine

-              by using the <literal>-D</literal> option on the <literal>java</literal>

-              command line.</para>

-            

-            <para>The syntax for environment variable references is

-              <literal>&lt;envVarRef&gt;[VariableName]&lt;/envVarRef&gt;</literal>

-              , where [VariableName] is any valid Java system property name. Environment

-              variable references are valid in the following places:

-              

-              <itemizedlist spacing="compact"><listitem><para>The value of a

-                configuration parameter (String-valued parameters only)</para>

-                </listitem>

-                

-                <listitem><para>The

-                  <literal>&lt;annotatorImplementationName&gt;</literal> element

-                  of a primitive AE descriptor</para></listitem>

-                

-                <listitem><para>The <literal>&lt;name&gt;</literal> element within

-                  <literal>&lt;analysisEngineMetaData&gt;</literal></para>

-                  </listitem>

-                

-                <listitem><para>Within a

-                  <literal>&lt;fileResourceSpecifier&gt;</literal> or

-                  <literal>&lt;fileLanguageResourceSpecifier&gt;</literal>

-                  </para></listitem></itemizedlist></para>

-            

-            <para>For example, if the value of a configuration parameter were specified as:

-              <literal>&lt;string&gt;&lt;envVarRef&gt;TEMP_DIR&lt;/envVarRef&gt;/temp.dat&lt;/string&gt;</literal>

-              , and the value of the <literal>TEMP_DIR</literal> Java System property were

-              <literal>c:/temp</literal>, then the configuration parameter&apos;s

-              value would evaluate to <literal>c:/temp/temp.dat</literal>.</para>

-              

-            <note><para>The Component Descriptor Editor does not support 

-              environment variable references.  If you need to, however, you 

-              can use the <code>source</code> tab view in the CDE to manually

-              add this notation.

-              </para></note>

-            

-          </section>

-        </section>

-        <section id="&tp;aes.aggregate">

-          <title>Aggregate Analysis Engine Descriptors</title>

-          

-          <para>Aggregate Analysis Engines do not contain an annotator, but instead

-            contain one or more component (also called <emphasis>delegate</emphasis>)

-            analysis engines.</para>

-          

-          <para>Aggregate Analysis Engine Descriptors maintain most of the same structure

-            as Primitive Analysis Engine Descriptors. The differences are:</para>

-          

-          <itemizedlist><listitem><para>An Aggregate Analysis Engine Descriptor

-            contains the element

-            <literal>&lt;primitive&gt;false&lt;/primitive&gt;</literal> rather

-            than <literal>&lt;primitive&gt;true&lt;/primitive&gt;</literal>.

-            </para></listitem>

-            

-            <listitem><para>An Aggregate Analysis Engine Descriptor must not include a

-              <literal>&lt;annotatorImplementationName&gt;</literal>

-              element.</para></listitem>

-            

-            <listitem><para>In place of the

-              <literal>&lt;annotatorImplementationName&gt;</literal>, an Aggregate

-              Analysis Engine Descriptor must have a

-              <literal>&lt;delegateAnalysisEngineSpecifiers&gt;</literal>

-              element. See <xref linkend="&tp;aes.aggregate.delegates"/>.</para>

-              </listitem>

-            

-            <listitem><para>An Aggregate Analysis Engine Descriptor may provide a

-              <literal>&lt;flowController&gt;</literal> element immediately

-              following the

-              <literal>&lt;delegateAnalysisEngineSpecifiers&gt;</literal>. See <xref

-                linkend="&tp;aes.aggregate.flow_controller"/>.</para></listitem>

-            

-            <listitem><para>Under the analysisEngineMetaData element, an Aggregate

-              Analysis Engine Descriptor may specify an additional element --

-              <literal>&lt;flowConstraints&gt;</literal>. See <xref

-                linkend="&tp;aes.aggregate.flow_constraints"/>. Typically only one

-              of <literal>&lt;flowController&gt;</literal> and

-              <literal>&lt;flowConstraints&gt;</literal> is specified. If both are

-              specified, the <literal>&lt;flowController&gt;</literal> takes

-              precedence, and the flow controller implementation can use the information

-              specified in the <literal>&lt;flowConstraints&gt;</literal> as part of

-              its configuration input.</para></listitem>

-            

-            <listitem><para>An Aggregate Analysis Engine Descriptor must not contain a

-              <literal>&lt;typeSystemDescription&gt;</literal> element. The Type

-              System of the Aggregate Analysis Engine is derived by merging the Type Systems

-              of the Analysis Engines that the aggregate contains.</para></listitem>

-            

-            <listitem><para>Within aggregate Analysis Engine Descriptors,

-              <literal>&lt;configurationParameter&gt;</literal> elements may define

-              <literal>&lt;overrides&gt;</literal>. See <xref

-                linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>

-              .</para></listitem>

-            

-            <listitem><para>External Resource Bindings can bind resources to

-              dependencies declared by any delegate AE within the aggregate. See <xref

-                linkend="&tp;aes.aggregate.external_resource_bindings"/>.</para>

-              </listitem>

-            

-            <listitem><para>An additional optional element,

-              <literal>&lt;sofaMappings&gt;</literal>, may be included. </para>

-              </listitem></itemizedlist>

-          

-          <section id="&tp;aes.aggregate.delegates">

-            <title>Delegate Analysis Engine Specifiers</title>

-            

-            

-            <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>

-

-  <delegateAnalysisEngine key="[String]">

-    <analysisEngineDescription>...</analysisEngineDescription> |

-    <import .../> 

-  </delegateAnalysisEngine>

-

-  <delegateAnalysisEngine key="[String]">

-    ...

-  </delegateAnalysisEngine>

-

-  ...

-

-</delegateAnalysisEngineSpecifiers>]]></programlisting>

-            

-            <para>The <literal>delegateAnalysisEngineSpecifiers</literal> element

-              contains one or more <literal>delegateAnalysisEngine</literal>

-              elements. Each of these must have a unique key, and must contain

-              either:</para>

-            

-            <itemizedlist><listitem><para>A complete

-              <literal>analysisEngineDescription</literal> element describing the

-              delegate analysis engine <emphasis role="bold">OR</emphasis></para>

-              </listitem>

-              

-              <listitem><para>An <literal>import</literal> element giving the name or

-                location of the XML descriptor for the delegate analysis engine (see <xref

-                  linkend="&tp;imports"/>).</para></listitem></itemizedlist>

-            

-            <para>The latter is the much more common usage, and is the only form supported by

-              the Component Descriptor Editor tool.</para>
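-            <para>As an illustrative sketch of the import form (the keys
-              <literal>tokenizer</literal> and <literal>tagger</literal>, the file
-              <literal>Tokenizer.xml</literal>, and the name
-              <literal>org.myorg.Tagger</literal> are hypothetical and not taken from
-              any shipped descriptor), an aggregate with two delegates might declare
-              them as follows, one imported by location and one by name:</para>
-
-            <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
-  <!-- hypothetical delegate keys and descriptor names, for illustration only -->
-  <delegateAnalysisEngine key="tokenizer">
-    <import location="Tokenizer.xml"/>
-  </delegateAnalysisEngine>
-  <delegateAnalysisEngine key="tagger">
-    <import name="org.myorg.Tagger"/>
-  </delegateAnalysisEngine>
-</delegateAnalysisEngineSpecifiers>]]></programlisting>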

-          </section>

-          <section id="&tp;aes.aggregate.flow_controller">

-            <title>FlowController</title>

-            

-            

-            <programlisting><![CDATA[<flowController key="[String]">

-    <flowControllerDescription>...</flowControllerDescription> |

-    <import .../>

-  </flowController>]]></programlisting>

-            

-            <para>The optional <literal>flowController</literal> element identifies

-              the descriptor of the FlowController component that will be used to determine

-              the order in which delegate Analysis Engines are called.</para>

-            

-            <para>The <literal>key</literal> attribute is optional, but recommended; it

-              assigns the FlowController an identifier that can be used for configuration

-              parameter overrides, Sofa mappings, or external resource bindings. The key

-              must not be the same as any of the delegate analysis engine keys.</para>

-            

-            <para>As with the <literal>delegateAnalysisEngine</literal> element, the

-              <literal>flowController</literal> element may contain either a complete

-              <literal>flowControllerDescription</literal> or an

-              <literal>import</literal>, but the import is recommended. The Component

-              Descriptor Editor tool only supports imports here.</para>

-            

-          </section>

-          <section id="&tp;aes.aggregate.flow_constraints">

-            <title>FlowConstraints</title>

-            

-            <para>If a <literal>&lt;flowController&gt;</literal> is not specified, the

-              order in which delegate Analysis Engines are called within the aggregate

-              Analysis Engine is specified using the

-              <literal>&lt;flowConstraints&gt;</literal> element, which must occur

-              immediately following the

-              <literal>configurationParameterSettings</literal> element. If a

-              <literal>&lt;flowController&gt;</literal> is specified, then the

-              <literal>&lt;flowConstraints&gt;</literal> are optional. They can be

-              used to pass an ordering of delegate keys to the

-              <literal>&lt;flowController&gt;</literal>.</para>

-            

-            <para>There are two options for flow constraints --

-              <literal>&lt;fixedFlow&gt;</literal> or

-              <literal>&lt;capabilityLanguageFlow&gt;</literal>. Each is discussed

-              in a separate section below.</para>

-            

-            <section id="&tp;aes.aggregate.flow_constraints.fixed_flow">

-              <title>Fixed Flow</title>

-              

-              

-              <programlisting><![CDATA[<flowConstraints>

-  <fixedFlow>

-    <node>[String]</node>

-    <node>[String]</node>

-    ...

-  </fixedFlow>

-</flowConstraints>]]></programlisting>

-              

-              <para>The <literal>flowConstraints</literal> element must be included

-                immediately following the

-                <literal>configurationParameterSettings</literal> element.</para>

-              

-              <para>For a fixed flow, the <literal>flowConstraints</literal> element

-                contains a single <literal>fixedFlow</literal> element, which lists the

-                delegates in the order in which they are called.</para>

-              

-              <para>The <literal>fixedFlow</literal> element contains one or more

-                <literal>node</literal> elements, each of which contains an identifier

-                which must match the key of a delegate analysis engine specified in the

-                <literal>delegateAnalysisEngineSpecifiers</literal>

-                element.</para>
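-              <para>As a minimal sketch, assuming an aggregate whose delegates were
-                declared with the hypothetical keys <literal>tokenizer</literal> and
-                <literal>tagger</literal>, a fixed flow that runs the tokenizer first
-                and the tagger second might look like this:</para>
-
-              <programlisting><![CDATA[<flowConstraints>
-  <fixedFlow>
-    <!-- hypothetical keys; each must match a delegateAnalysisEngine key -->
-    <node>tokenizer</node>
-    <node>tagger</node>
-  </fixedFlow>
-</flowConstraints>]]></programlisting>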

-              

-            </section>

-            <section

-              id="&tp;aes.aggregate.flow_constraints.capability_language_flow">

-              <title>Capability Language Flow</title>

-              

-              

-              <programlisting><![CDATA[<flowConstraints>

-  <capabilityLanguageFlow>

-    <node>[String]</node>

-    <node>[String]</node>

-    ...

-  </capabilityLanguageFlow>

-</flowConstraints>]]></programlisting>

-              

-              <para>If you use <literal>&lt;capabilityLanguageFlow&gt;</literal>,

-                the delegate Analysis Engines named by the

-                <literal>&lt;node&gt;</literal> elements are called in the given order,

-                except that a delegate Analysis Engine is skipped if any of the following are

-                true (according to that Analysis Engine&apos;s declared output

-                capabilities):</para>

-              

-              <itemizedlist><listitem><para>It cannot produce any of the aggregate

-                Analysis Engine&apos;s output capabilities for the language of the

-                current document.</para></listitem>

-                

-                <listitem><para>All of the output capabilities have already been

-                  produced by an earlier Analysis Engine in the flow. </para></listitem>

-                </itemizedlist>

-              

-              <para>For example, if two annotators produce

-                <literal>org.myorg.TokenAnnotation</literal> feature structures for

-                the same language, these feature structures will only be produced by the

-                first annotator in the list.</para>

-              

-              <note><para>The flow analysis uses the specific types that are specified in the

-              output capabilities, without any expansion for subtypes.  So, if you expect

-              a type TT and another type SubTT (which is a subtype of TT) in the output, you

-              must include both of them in the output capabilities.</para></note>

-            </section>

-          </section>

-          

-          <section id="&tp;aes.aggregate.external_resource_bindings">

-            <title>External Resource Bindings</title>

-            

-            <para>Aggregate analysis engine descriptors can declare resource bindings

-              that bind resources to dependencies declared in any of the delegate analysis

-              engines (or their subcomponents, recursively) within that aggregate. This

-              allows resource sharing. Any binding at this level overrides (supersedes)

-              any binding specified by a contained component or its subcomponents,

-              recursively.</para>

-            

-            <para>For example, consider an aggregate Analysis Engine Descriptor that

-              contains delegate Analysis Engines with keys

-              <literal>annotator1</literal> and <literal>annotator2</literal> (as

-              declared in the <literal>&lt;delegateAnalysisEngine&gt;</literal>

-              element &ndash; see <xref linkend="&tp;aes.aggregate.delegates"/>),

-              where <literal>annotator1</literal> declares a resource dependency with

-              key <literal>myResource</literal> and <literal>annotator2</literal>

-              declares a resource dependency with key <literal>someResource</literal>

-              .</para>

-            

-            <para>Within that aggregate Analysis Engine Descriptor, the following

-              <literal>resourceManagerConfiguration</literal> would bind both of

-              those dependencies to a single external resource file.</para>

-            

-            

-            <programlisting><![CDATA[<resourceManagerConfiguration>

-

-  <externalResources>

-    <externalResource>

-      <name>ExampleResource</name>

-      <fileResourceSpecifier>

-        <fileUrl>file:MyResourceFile.dat</fileUrl>

-      </fileResourceSpecifier>

-    </externalResource>

-  </externalResources>  

-

-  <externalResourceBindings>

-    <externalResourceBinding>

-      <key>annotator1/myResource</key>

-      <resourceName>ExampleResource</resourceName>

-    </externalResourceBinding>

-    <externalResourceBinding>

-      <key>annotator2/someResource</key>

-      <resourceName>ExampleResource</resourceName>

-    </externalResourceBinding>

-  </externalResourceBindings>

-

-</resourceManagerConfiguration>]]></programlisting>

-            

-            <para>The syntax for the <literal>externalResources</literal> declaration

-              is exactly the same as described previously. In the resource bindings note the

-              use of the compound keys, e.g. <literal>annotator1/myResource</literal>.

-              This identifies the resource dependency key

-              <literal>myResource</literal> within the annotator with key

-              <literal>annotator1</literal>. Compound keys can be

-              multiple levels deep to handle nested aggregate analysis engines.</para>

-          </section>

-          

-          <section id="&tp;aes.aggregate.sofa_mappings">

-            <title>Sofa Mappings</title>

-            

-            <para>Sofa mappings are specified between Sofa names declared in this

-              aggregate descriptor as part of the

-              <literal>&lt;capability&gt;</literal> section, and the Sofa names

-              declared in the delegate components. For purposes of the mapping, all the

-              declarations of Sofas in any of the capability sets contained within the

-              <literal>&lt;capabilities&gt;</literal> element are considered

-              together.</para>

-            

-            

-            <programlisting><![CDATA[<sofaMappings>

-  <sofaMapping>

-    <componentKey>[keyName]</componentKey>

-    <componentSofaName>[sofaName]</componentSofaName>

-    <aggregateSofaName>[sofaName]</aggregateSofaName>

-  </sofaMapping>

-  ...

-</sofaMappings>]]></programlisting>

-            

-            <para>The &lt;componentSofaName&gt; may be omitted in the case where the

-              component is not aware of Multiple Views or Sofas. In this case, the UIMA

-              framework will arrange for the specified &lt;aggregateSofaName&gt; to be

-              the one visible to the delegate component.</para>

-            

-            <para>The &lt;componentKey&gt; is the key name for the component as specified

-              in the list of delegate components for this aggregate.</para>

-            

-            <para>The sofaNames used must be declared as input or output sofas in some

-              capability set.</para>
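-            <para>As an illustrative sketch (the key <literal>translator</literal> and
-              the Sofa names <literal>InputText</literal> and
-              <literal>EnglishDocument</literal> are hypothetical), the following
-              mapping would make the aggregate&apos;s <literal>EnglishDocument</literal>
-              Sofa visible to the delegate under the name
-              <literal>InputText</literal>:</para>
-
-            <programlisting><![CDATA[<sofaMappings>
-  <sofaMapping>
-    <!-- hypothetical delegate key and Sofa names, for illustration only -->
-    <componentKey>translator</componentKey>
-    <componentSofaName>InputText</componentSofaName>
-    <aggregateSofaName>EnglishDocument</aggregateSofaName>
-  </sofaMapping>
-</sofaMappings>]]></programlisting>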

-          </section>

-        </section>

-

-        <section id="&tp;aes.configuration_parameters">

-          <title>Configuration Parameters</title>

-          <para>Configuration parameters may be declared and set in both Primitive and 

-          Aggregate descriptors. Parameters set in an aggregate may override parameters set in one or

-          more of its delegates.

-          </para>

-        <section id="&tp;aes.configuration_parameter_declaration">

-          <title>Configuration Parameter Declaration</title>

-          

-          <para>Configuration Parameters are made available to annotator

-            implementations and applications by the following interfaces:

-            <itemizedlist spacing="compact" mark="circle">

-            <listitem><para>

-            <literal>AnnotatorContext</literal> <footnote><para>Deprecated; use

-            UimaContext instead.</para></footnote> (passed as an argument to the

-            initialize() method of a version 1 annotator)</para>

-            </listitem>

-            <listitem><para>

-            <literal>ConfigurableResource</literal> (every Analysis Engine

-            implements this interface)</para>

-            </listitem>

-            <listitem><para>

-            <literal>UimaContext</literal> (passed

-            as an argument to the initialize() method of a version 2 annotator) (you can get

-            this from any resource, including Analysis Engines, using the method

-            <literal>getUimaContext</literal>()).</para>

-            </listitem>

-            </itemizedlist></para>

-          

-          <para>Use AnnotatorContext within version 1 annotators and UimaContext for

-            version 2 annotators and outside of annotators (for instance, in CasConsumers,

-            or the containing application) to access configuration parameters.</para>

-          

-          <para>Configuration parameters are set from the corresponding elements in the

-            XML descriptor for the application. If you need to programmatically change

-            parameter settings within an application, you can use methods in

-            ConfigurableResource; if you do this, you need to call reconfigure()

-            afterwards to have the UIMA framework notify all the contained analysis

-            components that the parameter configuration has changed (the analysis

-            engine&apos;s reinitialize() methods will be called). Note that in the current

-            implementation, only integrated deployment components have configuration

-            parameters passed to them; remote components obtain their parameters from

-            their remote startup environment. This will likely change in the

-            future.</para>

-          

-          <para>There are two ways to specify the

-            <literal>&lt;configurationParameters&gt;</literal> section &ndash; as a

-            list of configuration parameters or a list of groups. A list of parameters, which

-            are not part of any group, looks like this:

-            

-            

-            <programlisting><![CDATA[<configurationParameters>

-  <configurationParameter>

-    <name>[String]</name> 

-    <externalOverrideName>[String]</externalOverrideName> 

-    <description>[String]</description> 

-    <type>String|Integer|Long|Float|Double|Boolean</type> 

-    <multiValued>true|false</multiValued> 

-    <mandatory>true|false</mandatory>

-    <overrides>

-      <parameter>[String]</parameter>

-      <parameter>[String]</parameter>

-        ...

-    </overrides>

-  </configurationParameter>

-  <configurationParameter>

-    ...

-  </configurationParameter>

-    ...

-</configurationParameters>]]></programlisting></para>

-          

-          <para>For each configuration parameter, the following are specified:</para>

-          

-          <itemizedlist><listitem><para><emphasis role="bold">name</emphasis>

-            &ndash; the name by which the annotator code refers to the parameter. All

-            parameters declared in an analysis engine descriptor must have distinct names

-            (required). The name is composed of normal Java identifier characters.</para>

-            </listitem>

-            

-            <listitem><para><emphasis role="bold">externalOverrideName</emphasis> &ndash; the

-              name of a property in an external settings file that, if defined, overrides

-              any value set in this descriptor or in its parent. See <xref

-                linkend="&tp;aes.external_configuration_parameter_overrides"/>

-              for a discussion of external configuration parameter overrides.

-              (optional)</para></listitem>

-            

-            <listitem><para><emphasis role="bold">description</emphasis> &ndash; a

-              natural language description of the intent of the parameter

-              (optional)</para></listitem>

-            

-            <listitem><para><emphasis role="bold">type</emphasis> &ndash; the data

-              type of the parameter&apos;s value &ndash; must be one of

-              <literal>String</literal>, <literal>Integer</literal>, <literal>Long</literal>,

-              <literal>Float</literal>, <literal>Double</literal>, or <literal>Boolean</literal>

-              (required).</para></listitem>

-            

-            <listitem><para><emphasis role="bold">multiValued</emphasis> &ndash;

-              <literal>true</literal> if the parameter can take multiple values (an

-              array), <literal>false</literal> if the parameter takes only a single value

-              (optional, defaults to false).</para></listitem>

-            

-            <listitem><para><emphasis role="bold">mandatory</emphasis> &ndash;

-              <literal>true</literal> if a value must be provided for the parameter

-              (optional, defaults to false).</para></listitem>

-            

-            <listitem><para><emphasis role="bold">overrides</emphasis> &ndash; this

-              is used only in aggregate Analysis Engines, but is included here for

-              completeness. See <xref

-                linkend="&tp;aes.aggregate.configuration_parameter_overrides"/>

-              for a discussion of configuration parameter overriding in aggregate

-              Analysis Engines. (optional).</para></listitem></itemizedlist>

-          

-          <para>A list of groups looks like this:

-            

-            

-            <programlisting><![CDATA[<configurationParameters defaultGroup="[String]"

-    searchStrategy="none|default_fallback|language_fallback" >

-

-  <commonParameters>

-    [zero or more parameters]

-  </commonParameters>

-

-  <configurationGroup names="name1 name2 name3 ...">

-    [zero or more parameters]

-  </configurationGroup>

-

-  <configurationGroup names="name4 name5 ...">

-    [zero or more parameters]

-  </configurationGroup>

-

-  ...

-

-</configurationParameters>]]></programlisting></para>

-          

-          <para>Both the <literal>&lt;commonParameters&gt;</literal> and

-            <literal>&lt;configurationGroup&gt;</literal> elements contain zero or

-            more <literal>&lt;configurationParameter&gt;</literal> elements, with

-            the same syntax described above.</para>

-          

-          <para>The <literal>&lt;commonParameters&gt;</literal> element declares

-            parameters that exist in all groups. Each

-            <literal>&lt;configurationGroup&gt;</literal> element has a names

-            attribute, which contains a list of group names separated by whitespace (space

-            or tab characters). Names consist of any number of non-whitespace characters;

-            however the Component Descriptor Editor tool restricts this to be normal Java

-            identifiers, including the period (.) and the dash (-). One configuration group

-            will be created for each name, and all of the groups will contain the same set of

-            parameters.</para>

-          

-          <para>The <literal>defaultGroup</literal> attribute specifies the name of the

-            group to be used in the case where an annotator does a lookup for a configuration

-            parameter without specifying a group name. It may also be used as a fallback if the

-            annotator specifies a group that does not exist &ndash; see below.</para>

-          

-          <para>The <literal>searchStrategy</literal> attribute determines the action

-            to be taken when the context is queried for the value of a parameter belonging to a

-            particular configuration group, if that group does not exist or does not contain

-            a value for the requested parameter. There are currently three possible values:

-            

-            <itemizedlist><listitem><para><emphasis role="bold">none</emphasis>

-              &ndash; there is no fallback; return null if there is no value in the exact group

-              specified by the user.</para></listitem>

-              

-              <listitem><para><emphasis role="bold">default_fallback</emphasis>

-                &ndash; if there is no value found in the specified group, look in the default

-                group (as defined by the <literal>defaultGroup</literal> attribute)</para>

-                </listitem>

-              

-              <listitem><para><emphasis role="bold">language_fallback</emphasis>

-                &ndash; this setting allows for a specific use of configuration parameter

-                groups where the group names correspond to ISO language and country codes

-                (for an example, see below). The fallback sequence is:

-                <literal>&lt;lang&gt;_&lt;country&gt;_&lt;region&gt; &rarr;

-                &lt;lang&gt;_&lt;country&gt; &rarr; &lt;lang&gt; &rarr;

-                &lt;default&gt;.</literal> </para></listitem></itemizedlist>

-            </para>
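-            <para>The fallback order described for <literal>language_fallback</literal> can be
-            illustrated with a minimal Java sketch. This is not the framework's implementation: the class
-            <literal>LanguageFallbackSketch</literal> and its methods are hypothetical names, the groups
-            are modeled as plain maps, and treating both '_' and '-' as separators is an assumption made
-            so that a name such as <literal>en-GB</literal> also shortens to <literal>en</literal>.</para>
-            <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the UIMA API.
class LanguageFallbackSketch {

  // Group names to try, most specific first, ending with the default group.
  static List<String> lookupOrder(String group, String defaultGroup) {
    List<String> order = new ArrayList<>();
    String current = group;
    while (current != null && !current.isEmpty()) {
      order.add(current);
      // Assumption: both '_' and '-' separate language, country and region.
      int cut = Math.max(current.lastIndexOf('_'), current.lastIndexOf('-'));
      current = cut > 0 ? current.substring(0, cut) : null;
    }
    if (defaultGroup != null && !order.contains(defaultGroup)) {
      order.add(defaultGroup);
    }
    return order;
  }

  // Returns the first value found along the fallback chain, or null.
  static String lookup(Map<String, Map<String, String>> groups,
      String group, String defaultGroup, String param) {
    for (String name : lookupOrder(group, defaultGroup)) {
      Map<String, String> values = groups.get(name);
      if (values != null && values.containsKey(param)) {
        return values.get(param);
      }
    }
    return null;
  }
}
```
-]]></programlisting>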

-          

-          <section id="&tp;aes.configuration_parameter_declaration.example">

-            <title>Example</title>

-            

-            

-            <programlisting><![CDATA[<configurationParameters defaultGroup="en"

-        searchStrategy="language_fallback">

-

-  <commonParameters>

-    <configurationParameter>

-      <name>DictionaryFile</name>

-      <description>Location of dictionary for this

-           language</description>

-      <type>String</type>

-      <multiValued>false</multiValued>

-      <mandatory>false</mandatory>

-    </configurationParameter>

-  </commonParameters>

-

-  <configurationGroup names="en de en-US"/>

-

-  <configurationGroup names="zh">

-    <configurationParameter>

-      <name>DBC_Strategy</name>

-      <description>Strategy for dealing with double-byte

-          characters.</description>

-      <type>String</type>

-      <multiValued>false</multiValued>

-      <mandatory>false</mandatory>

-    </configurationParameter>

-  </configurationGroup>

-

-</configurationParameters>]]></programlisting>

-            

-            <para>In this example, we are declaring a <literal>DictionaryFile</literal>

-              parameter that can have a different value for each of the languages that our AE

-              supports

-              &ndash; English (general), German, U.S. English, and Chinese. For Chinese

-              only, we also declare a <literal>DBC_Strategy</literal>

-              parameter.</para>

-            

-            <para>We are using the <literal>language_fallback</literal> search

-              strategy, so if an annotator requests the dictionary file for the

-              <literal>en-GB</literal> (British English) group, we will fall back to the

-              more general <literal>en</literal> group.</para>

-            

-            <para>Since we have defined <literal>en</literal> as the default group, this

-              value will be returned if the context is queried for the

-              <literal>DictionaryFile</literal> parameter without specifying any

-              group name, or if a nonexistent group name is specified.</para>

-          </section>

-        </section>

-        

-        <section id="ugr.ref.aes.configuration_parameter_settings">

-          <title>Configuration Parameter Settings</title>

-          

-          <para>For configuration parameters that are not part of any group, the

-            <literal>&lt;configurationParameterSettings&gt;</literal> element

-            looks like this:

-            

-            

-            <programlisting><![CDATA[<configurationParameterSettings>

-  <nameValuePair>

-    <name>[String]</name> 

-    <value>

-      <string>[String]</string>  | 

-      <integer>[Integer]</integer> |

-      <float>[Float]</float> |

-      <boolean>true|false</boolean>  |

-      <array> ... </array>

-    </value>

-  </nameValuePair>

-

-  <nameValuePair>

-    ...

-  </nameValuePair>

-  ...

-</configurationParameterSettings>]]></programlisting></para>

-          

-          <para>There are zero or more <literal>nameValuePair</literal> elements. Each

-            <literal>nameValuePair</literal> contains a name (which refers to one of the

-            configuration parameters) and a value for that parameter.</para>

-          

-          <para>The <literal>value</literal> element contains an element that matches

-            the type of the parameter. For single-valued parameters, this is either

-            <literal>&lt;string&gt;</literal>, <literal>&lt;integer&gt;</literal>,

-            <literal>&lt;float&gt;</literal>, or

-            <literal>&lt;boolean&gt;</literal>. For multi-valued parameters, this is

-            an <literal>&lt;array&gt;</literal> element, which then contains zero or

-            more instances of the appropriate type of primitive value, e.g.:

-            

-            

-            <programlisting>&lt;array&gt;&lt;string&gt;One&lt;/string&gt;&lt;string&gt;Two&lt;/string&gt;&lt;/array&gt;</programlisting></para>

-          

-          <para>For parameters declared in configuration groups the

-            <literal>&lt;configurationParameterSettings&gt;</literal> element

-            looks like this:

-            

-            

-            <programlisting><![CDATA[<configurationParameterSettings>

-

-  <settingsForGroup name="[String]">

-    [one or more <nameValuePair> elements]

-  </settingsForGroup>

-

-  <settingsForGroup name="[String]">

-    [one or more <nameValuePair> elements]

-  </settingsForGroup>

-

-...

-

-</configurationParameterSettings>]]></programlisting>

-            where each <literal>&lt;settingsForGroup&gt;</literal> element has a name

-            that matches one of the configuration groups declared under the

-            <literal>&lt;configurationParameters&gt;</literal> element and contains

-            the parameter settings for that group.</para>

-          

-          <section id="&tp;aes.configuration_parameter_settings.example">

-            <title>Example</title>

-            

-            <para>Here are the settings that correspond to the parameter declarations in

-              the previous example:

-              

-              

-              <programlisting><![CDATA[<configurationParameterSettings>

-

-  <settingsForGroup name="en">

-    <nameValuePair>

-      <name>DictionaryFile</name>

-      <value><string>resourcesEnglishdictionary.dat</string></value>

-    </nameValuePair>

-  </settingsForGroup>     

-

-  <settingsForGroup name="en-US">

-    <nameValuePair>

-      <name>DictionaryFile</name>

-      <value><string>resourcesEnglish_USdictionary.dat</string></value>

-    </nameValuePair>

-  </settingsForGroup>

-

-  <settingsForGroup name="de">

-    <nameValuePair>

-      <name>DictionaryFile</name>

-      <value><string>resourcesDeutschdictionary.dat</string></value>

-    </nameValuePair>

-  </settingsForGroup>

-

-  <settingsForGroup name="zh">

-    <nameValuePair>

-      <name>DictionaryFile</name>

-      <value><string>resourcesChinesedictionary.dat</string></value>

-    </nameValuePair>

-

-    <nameValuePair>

-      <name>DBC_Strategy</name>

-      <value><string>default</string></value>

-    </nameValuePair>

-

-  </settingsForGroup>

-

-</configurationParameterSettings>]]></programlisting></para>

-          </section>

-          </section>

-

-          <section id="&tp;aes.aggregate.configuration_parameter_overrides">

-            <title>Configuration Parameter Overrides</title>

-            

-            <para>In an aggregate Analysis Engine Descriptor, each

-              <literal>&lt;configurationParameter&gt;</literal> element should

-              contain an <literal>&lt;overrides&gt;</literal> element, with the

-              following syntax:</para>

-            

-            

-            <programlisting><![CDATA[<overrides>

-

-  <parameter>

-    [delegateAnalysisEngineKey]/[parameterName]

-  </parameter>

-

-  <parameter>

-    [delegateAnalysisEngineKey]/[parameterName]

-  </parameter>

-  ...

-

-</overrides>]]></programlisting>

-            

-            <para>Since aggregate Analysis Engines have no code associated with them, the

-              only way in which their configuration parameters can affect their processing

-              is by overriding the parameter values of one or more delegate analysis

-              engines. The <literal>&lt;overrides&gt;</literal> element determines

-              which parameters, in which delegate Analysis Engines, are overridden by this

-              configuration parameter.</para>

-            

-            <para>For example, consider an aggregate Analysis Engine Descriptor that

-              contains delegate Analysis Engines with keys

-              <literal>annotator1</literal> and <literal>annotator2</literal> (as

-              declared in the &lt;delegateAnalysisEngine&gt; element &ndash; see <xref

-                linkend="&tp;aes.aggregate.delegates"/>) and also declares a

-              configuration parameter as follows:

-              

-              

-              <programlisting><![CDATA[<configurationParameter>

-  <name>AggregateParam</name>

-  <type>String</type>

-  <overrides>

-    <parameter>annotator1/param1</parameter>

-    <parameter>annotator2/param2</parameter>

-  </overrides>

-</configurationParameter>]]></programlisting></para>

-            

-            <para>The value of the <literal>AggregateParam</literal> parameter

-              (whether assigned in the aggregate descriptor or at runtime by an

-              application) will override the value of parameter

-              <literal>param1</literal> in <literal>annotator1</literal> and also

-              override the value of parameter <literal>param2</literal> in

-              <literal>annotator2</literal>. No other parameters will be

-              affected.  Note that <literal>AggregateParam</literal> may itself be overridden by a

-              parameter in an outer aggregate that has this aggregate as one of its delegates.

-            </para>
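-            <para>The effect of the <literal>&lt;overrides&gt;</literal> targets can be pictured with
-            a small Java sketch. It is an illustration under stated assumptions, not the framework's
-            code: <literal>OverrideSketch</literal> and <literal>applyOverride</literal> are hypothetical
-            names, and each delegate's settings are modeled as a plain map keyed by the delegate key.</para>
-            <programlisting><![CDATA[
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the UIMA API.
class OverrideSketch {

  // Copies one aggregate parameter value to every "delegateKey/parameterName" target.
  static void applyOverride(String value, List<String> targets,
      Map<String, Map<String, String>> delegateSettings) {
    for (String target : targets) {
      String trimmed = target.trim();
      int slash = trimmed.indexOf('/');
      if (slash <= 0 || slash == trimmed.length() - 1) {
        throw new IllegalArgumentException(
            "Expected delegateKey/parameterName: " + target);
      }
      String delegateKey = trimmed.substring(0, slash);
      String parameterName = trimmed.substring(slash + 1);
      // Only the named parameter of the named delegate is touched.
      delegateSettings.computeIfAbsent(delegateKey, k -> new HashMap<>())
          .put(parameterName, value);
    }
  }
}
```
-]]></programlisting>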

-            

-            <para>Prior to release 2.4.1, if an aggregate Analysis Engine descriptor

-              declared a configuration parameter with no explicit overrides, that

-              parameter would override any parameters having the same name within any

-              delegate analysis engine. Starting with release 2.4.1, support for this

-              usage has been dropped.</para>

-            

-          </section>

-          

-    

-          <section id="&tp;aes.external_configuration_parameter_overrides">

-            <title>External Configuration Parameter Overrides</title>

-

-            <para>

-            External parameter overrides are usually declared in primitive descriptors as a way to

-            easily modify the parameters in some or all of an application's annotators.  

-            By using external settings files and shared parameter names the configuration

-            information can be specified without regard for a particular descriptor hierarchy.

-            </para>

-

-            <para>

-            Configuration parameter declarations in primitive and aggregate descriptors may

-            include an <literal>&lt;externalOverrideName&gt;</literal> element, 

-            which specifies the name of a property that may be defined in an external settings file.

-            If this element is present, and if an entry can be found for its name in a settings

-            file, then the entry's value overrides the value otherwise specified for this parameter.

-            </para>  

-

-            <para>

-            The value overrides any value set in this descriptor or set by an override in a parent

-            aggregate.  In primitive descriptors the value set by an external override is always

-            applied.  In aggregate descriptors the value set by an external override applies to the

-            aggregate parameter, and is passed down to the overridden delegate parameters in the

-            usual way, i.e. only if the delegate's parameter has not been set by an external override.

-            </para>  

-

-            <para>

-            In the absence of external overrides,

-            parameter evaluation can be viewed as proceeding from the primitive descriptor up through

-            any aggregates containing overrides, taking the last setting found.  With external

-            overrides the search ends with the first external override found that has a value

-            assigned by a settings file.

-            </para>

-

-            <para>

-            The same external name may be used for multiple parameters; 

-            the effect of this is that one setting will override multiple parameters.

-            </para>

-

-            <para>

-            The settings for all descriptors in a pipeline are usually loaded from one or more files

-            whose names are obtained from the Java system property <emphasis>UimaExternalOverrides</emphasis>.

-            The value of the property must be a comma-separated list of resource names.  If the name

-            has a prefix of "file:" or no prefix, the filesystem is searched.  If the name has a

-            prefix of "path:" the rest must be a Java-style dotted name, similar to the name

-            attribute for descriptor imports.  The dots are replaced by file separators and a suffix

-            of ".settings" is appended before searching the datapath and classpath.

-            e.g. <literal>&minus;DUimaExternalOverrides=/data/file1.settings,file:relative/file2.settings,path:org.apache.uima.resources.file3</literal>.

-            </para>
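-            <para>As a rough sketch of how such a property value breaks down into resource names, the
-            following Java fragment applies the prefix rules described above. The class and method names
-            are hypothetical, the use of '/' as the file separator is an assumption, and the actual
-            search of the filesystem, datapath and classpath is left out.</para>
-            <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the UIMA API.
class OverrideLocationsSketch {

  // Splits the property value and maps each entry to the name that would be searched.
  static List<String> toSearchNames(String propertyValue) {
    List<String> names = new ArrayList<>();
    for (String entry : propertyValue.split(",")) {
      String name = entry.trim();
      if (name.isEmpty()) {
        continue;
      }
      if (name.startsWith("path:")) {
        // Dotted name: dots become separators and ".settings" is appended.
        names.add(name.substring("path:".length()).replace('.', '/') + ".settings");
      } else if (name.startsWith("file:")) {
        names.add(name.substring("file:".length()));
      } else {
        names.add(name); // no prefix: treated like a filesystem name
      }
    }
    return names;
  }
}
```
-]]></programlisting>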

-

-            <para>

-            Override settings may also be specified when creating an analysis engine by putting a 

-            <literal>Settings</literal> object in the additional parameters map for the

-            <literal>produceAnalysisEngine</literal> method.  In this case the

-            Java system property <emphasis>UimaExternalOverrides</emphasis> is ignored.

-            <programlisting>  // Construct an analysis engine that uses two settings files

-  Settings extSettings = 

-      UIMAFramework.getResourceSpecifierFactory().createSettings();

-  for (String fname : new String[] { "externalOverride.settings", 

-                                     "default.settings" }) {

-    try (FileInputStream fis = new FileInputStream(fname)) {

-      extSettings.load(fis);

-    }

-  }

-  Map&lt;String,Object&gt; aeParms = new HashMap&lt;String,Object&gt;();

-  aeParms.put(Resource.PARAM_EXTERNAL_OVERRIDE_SETTINGS, extSettings);

-  AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc, aeParms);

-            </programlisting>

-            </para>

-

-            <para>

-            These external settings consist of key-value pairs stored in a

-            file using the UTF-8 character encoding, and written in a style similar to that 

-            of Java properties files.

-            <itemizedlist spacing="compact" mark="circle">  

-            <listitem><para>

-            Leading whitespace is ignored.

-            </para></listitem>

-            <listitem><para>

-            Comment lines start with '#' or '!'.

-            </para></listitem>

-            <listitem><para>

-            The key and value are separated by whitespace, '=' or ':'.

-            </para></listitem>

-            <listitem><para>

-            Keys must contain at least one character and only letters, digits, or the characters '. / - ~ _'.

-            </para></listitem>

-            <listitem><para>

-            If a line ends with '\' it is extended with the following line (after removing any

-            leading whitespace.)

-            </para></listitem>

-            <listitem><para>

-            Whitespace is trimmed from both keys and values.

-            </para></listitem>

-            <listitem><para>

-            Duplicate assignments to a key are ignored &ndash; once a value is assigned to a key it cannot be changed.

-            </para></listitem>

-            <listitem><para>

-            Values may reference other settings using the syntax '${key}'.

-            </para></listitem>

-            <listitem><para>

-            Array values are represented as a list of strings separated by commas or line breaks,

-            and bracketed by the '[ ]' characters.  The value must start with an '[' and is

-            terminated by the first unescaped ']' which must be at the end of a line.

-            The elements of an array (and hence the array size) may be indirectly specified using

-            the '${key}' syntax but the brackets '[ ]' must be explicitly specified.

-            </para></listitem>

-            <listitem><para>

-            In values the special characters '$ { } [ , ] \' are treated as regular characters if

-            preceded by the escape character '\'.

-            </para></listitem>

-            </itemizedlist>

-      <programlisting><![CDATA[

-key1  :  value1

- key2 =  value  2

-  key3   element2, element3, element4

- # Next assignment is ignored as key3 has already been set

-key3  :   value ignored

-key4  =  [ array element1, ${key3}, element5

-           element6 ]

-key5     value with a reference ${key1} to key1

-key6  :  long value string \

-         continued from previous line (with leading whitespace stripped)

-key7  =  value without a reference \${not-a-key} 

-key8     \[ value that is not an array ]

-key9  :  [ array element1\, with embedded comma, element2 ]

-]]></programlisting>

-            </para>

-    

-            <para>

-            Multiple settings files are allowed; they are loaded in order, such that

-            early ones take precedence over later ones, following the first-assignment-wins rule. 

-            So, if you have lots of settings,

-            you can put the defaults in one file, and then in an earlier file, override just the

-            ones you need to.

-            </para>
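-            <para>The first-assignment-wins rule can be sketched in a few lines of Java. This is a
-            deliberately simplified illustration, not the framework's <literal>Settings</literal>
-            implementation: the class name is hypothetical, and continuation lines, arrays,
-            '${key}' references and escapes are not handled.</para>
-            <programlisting><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the UIMA Settings implementation.
class FirstWinsSettingsSketch {
  private final Map<String, String> values = new LinkedHashMap<>();

  // Loads the text of one settings file; keys already set keep their value.
  void load(String text) {
    for (String raw : text.split("\\R")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
        continue; // blank line or comment
      }
      int end = 0;
      while (end < line.length() && isKeyChar(line.charAt(end))) {
        end++;
      }
      if (end == 0) {
        continue; // no valid key; this sketch simply skips the line
      }
      String key = line.substring(0, end);
      String rest = line.substring(end).trim();
      if (rest.startsWith("=") || rest.startsWith(":")) {
        rest = rest.substring(1).trim();
      }
      values.putIfAbsent(key, rest); // first assignment wins
    }
  }

  // Letters, digits, and the characters . / - ~ _ are allowed in keys.
  static boolean isKeyChar(char c) {
    return Character.isLetterOrDigit(c) || "./-~_".indexOf(c) >= 0;
  }

  String get(String key) {
    return values.get(key);
  }
}
```
-]]></programlisting>
-            <para>Loading the override file first and the defaults file second then gives the
-            behavior described above: the defaults only fill in keys that the earlier file left
-            unset.</para>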

-

-            <para>

-            An external override name may be specified for a parameter declared in a group, but if

-            the parameter is in the common group or the group is declared with multiple names, the

-            external name is shared amongst all, i.e. these parameters cannot be given group-specific values.

-            </para>

-          </section>

-

-          <section id="&tp;aes.external_configuration_parameter_access">

-            <title>Direct Access to External Configuration Parameters</title>

-

-            <para>

-            Annotators and flow controllers can directly access these shared configuration

-            parameters from their UimaContext. 

-            Direct access means an access where the key to select the shared parameter is the 

-            parameter name as specified in the external configuration settings file. 

-			<programlisting>

-String value = aContext.getSharedSettingValue(paramName);

-String[] values = aContext.getSharedSettingArray(arrayParamName);

-String[] allNames = aContext.getSharedSettingNames();

-			</programlisting>

-            Java code called by an annotator or flow controller in the same thread or a child thread

-            can use the <literal>UimaContextHolder</literal> to get the annotator's UimaContext and

-            hence access the shared configuration parameters.

-			<programlisting>

-UimaContext uimaContext = UimaContextHolder.getUimaContext();

-if (uimaContext != null) {

-  value = uimaContext.getSharedSettingValue(paramName);

-}

-			</programlisting>

-			The UIMA framework puts the context in an InheritableThreadLocal variable.  The value

-			will be null if <literal>getUimaContext</literal> is not invoked by an annotator or flow

-			controller on the same thread or a child thread.

-            </para>

-            <para>

-            Since UIMA 3.2.1, the context is stored in the InheritableThreadLocal as a weak reference.

-            This ensures that any long-running threads spawned while the context is set do not 

-            prevent garbage-collection of the context when the context is destroyed. If a child

-            thread should really retain a strong reference to the context, it should obtain the

-            context and store it in a field or in another ThreadLocal variable. For backwards

-            compatibility, the old behavior of using a strong reference by default can be enabled

-            by setting the system property <literal>uima.context_holder_reference_type</literal>

-            to <literal>STRONG</literal>.

-            </para>
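-            <para>The combination of an InheritableThreadLocal and a weak reference can be sketched
-            as follows. This is not the source of <literal>UimaContextHolder</literal>; the class
-            <literal>ContextHolderSketch</literal> and its methods are hypothetical, and a plain
-            <literal>Object</literal> stands in for the context.</para>
-            <programlisting><![CDATA[
```java
import java.lang.ref.WeakReference;

// Hypothetical illustration only; not the UimaContextHolder source.
class ContextHolderSketch {
  private static final InheritableThreadLocal<WeakReference<Object>> HOLDER =
      new InheritableThreadLocal<>();

  // Stores the context weakly, so the holder alone does not keep it alive.
  static void set(Object context) {
    HOLDER.set(context == null ? null : new WeakReference<Object>(context));
  }

  // Returns the context, or null if none was set or it has been collected.
  static Object get() {
    WeakReference<Object> ref = HOLDER.get();
    return ref == null ? null : ref.get();
  }

  // Reads the holder from a newly started child thread, which inherits the value.
  static Object readInChildThread() {
    Object[] seen = new Object[1];
    Thread child = new Thread(() -> seen[0] = get());
    child.start();
    try {
      child.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen[0];
  }
}
```
-]]></programlisting>
-            <para>A child thread that needs the context for longer than its parent keeps it alive
-            would copy the result of <literal>get()</literal> into a field of its own, which holds
-            it strongly.</para>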

-          </section>

-

-          <section id="&tp;aes.other_uses_for_external_configuration_parameters">

-            <title>Other Uses for External Configuration Parameters</title>

-			<para>

-            Explicit references to shared configuration parameters can be specified as part of the

-            value of the name and location attributes of the <literal>import</literal> element

-			and in the value of the fileUrl for a <literal>fileResourceSpecifier</literal>

-			(see <xref linkend="&tp;imports"/> and <xref linkend="&tp;aes.primitive.resource_manager_configuration"/>).

-            </para>

-		  </section>

-

-        </section>

-      </section>

- 

-  

-  <section id="&tp;flow_controller">

-    <title>Flow Controller Descriptors</title>

-    

-    <para>The basic structure of a Flow Controller Descriptor is as follows:

-      

-      

-      <programlisting><![CDATA[<?xml version="1.0" ?> 

-<flowControllerDescription 

-    xmlns="http://uima.apache.org/resourceSpecifier">

-

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 

-

-  <implementationName>[ClassName]</implementationName> 

-

-  <processingResourceMetaData>

-    ...

-  </processingResourceMetaData>

-

-  <externalResourceDependencies>

-    ...

-  </externalResourceDependencies>

-

-  <resourceManagerConfiguration>

-    ...

-  </resourceManagerConfiguration>

-

-</flowControllerDescription>]]></programlisting></para>

-    

-    <para>The <literal>frameworkImplementation</literal> element must always be set to

-      the value <literal>org.apache.uima.java</literal>.</para>

-    

-    <para>The <literal>implementationName</literal> element must contain the

-      fully-qualified class name of the Flow Controller implementation. This must name a

-      class that implements the <literal>FlowController</literal> interface.</para>

-    

-    <para>The <literal>processingResourceMetaData</literal> element contains

-      essentially the same information as a Primitive Analysis Engine Descriptor&apos;s

-      <literal>analysisEngineMetaData</literal> element, described in <xref

-        linkend="&tp;aes.metadata"/>.</para>

-    

-    <para>The <literal>externalResourceDependencies</literal> and

-      <literal>resourceManagerConfiguration</literal> elements are exactly the same as

-      in Primitive Analysis Engine Descriptors (see <xref

-        linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref

-        linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

-    

-  </section>

-  

-  <section id="&tp;collection_processing_parts">

-    <title>Collection Processing Component Descriptors</title>

-    

-    <para>There are three types of Collection Processing Components &ndash; Collection

-      Readers, CAS Initializers (deprecated as of UIMA Version 2), and CAS Consumers. Each

-      type of component has a corresponding descriptor. The structure of these descriptors

-      is very similar to that of primitive Analysis Engine Descriptors.</para>

-    

-    <section id="&tp;collection_processing_parts.collection_reader">

-      <title>Collection Reader Descriptors</title>

-      

-      <para>The basic structure of a Collection Reader descriptor is as follows:

-        

-        

-        <programlisting><![CDATA[<?xml version="1.0" ?> 

-<collectionReaderDescription

-    xmlns="http://uima.apache.org/resourceSpecifier">

-

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

-  <implementationName>[ClassName]</implementationName> 

-

-  <processingResourceMetaData>

-    ...

-  </processingResourceMetaData>

-

-  <externalResourceDependencies>

-   ...

-  </externalResourceDependencies>

-

-  <resourceManagerConfiguration>

-

-   ...

-

-  </resourceManagerConfiguration>

-

-</collectionReaderDescription>]]></programlisting></para>

-      

-      <para>The <literal>frameworkImplementation</literal> element must always be set

-        to the value <literal>org.apache.uima.java</literal>.</para>

-      

-      <para>The <literal>implementationName</literal> element contains the

-        fully-qualified class name of the Collection Reader implementation. This must name

-        a class that implements the <literal>CollectionReader</literal>

-        interface.</para>

-      

-      <para>The <literal>processingResourceMetaData</literal> element contains

-        essentially the same information as a Primitive Analysis Engine

-        Descriptor&apos;s <literal>analysisEngineMetaData</literal> element:

-        

-        

-        <programlisting><![CDATA[<processingResourceMetaData>

-

-  <name> [String] </name>

-  <description>[String]</description>

-  <version>[String]</version>

-  <vendor>[String]</vendor>

-

-  <configurationParameters>

-     ...

-  </configurationParameters>

-

-  <configurationParameterSettings>

-    ...

-  </configurationParameterSettings> 

-

-  <typeSystemDescription>

-   ...

-  </typeSystemDescription> 

-

-  <typePriorities>

-   ...

-  </typePriorities> 

-

-  <fsIndexes>

-   ...

-  </fsIndexes>

-

-  <capabilities>

-   ...

-  </capabilities> 

-

-</processingResourceMetaData>]]></programlisting></para>

-      

-      <para>The contents of these elements are the same as those described in <xref

-          linkend="&tp;aes.metadata"/>, with the exception that the capabilities

-        section should not declare any inputs (because the Collection Reader is always the

-        first component to receive the CAS).</para>

-      

-      <para>The <literal>externalResourceDependencies</literal> and

-        <literal>resourceManagerConfiguration</literal> elements are exactly the same

-        as in the Primitive Analysis Engine Descriptors (see <xref

-          linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref

-          linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

-      

-    </section>

-    <section id="&tp;collection_processing_parts.cas_initializer">

-      <title>CAS Initializer Descriptors (deprecated)</title>

-      

-      <para>The basic structure of a CAS Initializer Descriptor is as follows:

-        

-        

-        <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?> 

-<casInitializerDescription

-    xmlns="http://uima.apache.org/resourceSpecifier">

-

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

-  <implementationName>[ClassName] </implementationName>

-

-  <processingResourceMetaData>

-    ...

-  </processingResourceMetaData>

-

-  <externalResourceDependencies>

-    ...

-  </externalResourceDependencies>

-

-  <resourceManagerConfiguration>

-    ...

-  </resourceManagerConfiguration>

-

-</casInitializerDescription>]]></programlisting></para>

-      

-      <para>The <literal>frameworkImplementation</literal> element must always be set

-        to the value <literal>org.apache.uima.java</literal>.</para>

-      

-      <para>The <literal>implementationName</literal> element contains the

-        fully-qualified class name of the CAS Initializer implementation. This must name a

-        class that implements the <literal>CasInitializer</literal> interface.</para>

-      

-      <para>The <literal>processingResourceMetaData</literal> element contains

-        essentially the same information as a Primitive Analysis Engine

-        Descriptor&apos;s <literal>analysisEngineMetaData</literal> element,

-        as described in <xref linkend="&tp;aes.metadata"/>, with the exception of some

-        changes to the capabilities section. A CAS Initializer&apos;s capabilities

-        element looks like this:

-        

-        

-        <programlisting><![CDATA[<capabilities>

-  <capability>

-    <outputs>

-      <type allAnnotatorFeatures="true|false">[String]</type>

-      <type>[TypeName]</type>

-      ...

-      <feature>[TypeName]:[Name]</feature>

-      ...

-    </outputs>

-

-    <outputSofas>

-      <sofaName>[name]</sofaName>

-      ...

-    </outputSofas>

-

-    <mimeTypesSupported>

-      <mimeType>[MIME Type]</mimeType>

-      ...

-    </mimeTypesSupported>

-  </capability>

-

-  <capability>

-    ...

-  </capability>

-  ...

-</capabilities>]]></programlisting></para>

-      

-      <para>The differences between a CAS Initializer&apos;s capabilities declaration

-        and an Analysis Engine&apos;s capabilities declaration are that the CAS Initializer does not

-        declare any input CAS types and features or input Sofas (because it is always the first

-        to operate on a CAS), it doesn&apos;t have a language specifier, and that the CAS

-        Initializer may declare a set of MIME types that it supports for its input documents.

-        Examples include: text/plain, text/html, and application/pdf. For a list of MIME

-        types see <ulink url="http://www.iana.org/assignments/media-types/"/>. This

-        information is currently provided only for users&apos; reference; the framework does not

-        use it for anything. This may change in future versions.</para>

-      

-      <para>The <literal>externalResourceDependencies</literal> and

-        <literal>resourceManagerConfiguration</literal> elements are exactly the same

-        as in the Primitive Analysis Engine Descriptors (see <xref

-          linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref

-          linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

-      

-    </section>

-    <section id="&tp;collection_processing_parts.cas_consumer">

-      <title>CAS Consumer Descriptors</title>

-      

-      <para>The basic structure of a CAS Consumer Descriptor is as follows:

-        

-        

-        <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?> 

-<casConsumerDescription 

-    xmlns="http://uima.apache.org/resourceSpecifier">

-

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 

-

-  <implementationName>[ClassName]</implementationName> 

-

-  <processingResourceMetaData>

-    ...

-  </processingResourceMetaData>

-

-  <externalResourceDependencies>

-    ...

-  </externalResourceDependencies>

-

-  <resourceManagerConfiguration>

-    ...

-  </resourceManagerConfiguration>

-</casConsumerDescription>]]></programlisting></para>

-

-        <para>The <literal>frameworkImplementation</literal> element currently must 

-          have the value <literal>org.apache.uima.java</literal>, or

-           <literal>org.apache.uima.cpp</literal>.</para>

-               


-      

-      <para>The <literal>implementationName</literal> element must contain the

-        fully-qualified class name of the CAS Consumer implementation, or the name 

-        of a .dll or .so file for C++ implementations.  For Java, the named class must

-        implement the <literal>CasConsumer</literal> interface.</para>

-      

-      <para>The <literal>processingResourceMetaData</literal> element contains

-        essentially the same information as a Primitive Analysis Engine Descriptor&apos;s

-        <literal>analysisEngineMetaData</literal> element, described in <xref

-          linkend="&tp;aes.metadata"/>, except that the CAS Consumer Descriptor&apos;s

-        <literal>capabilities</literal> element should not declare outputs or

-        outputSofas (since CAS Consumers do not modify the CAS).</para>

-      

-      <para>The <literal>externalResourceDependencies</literal> and

-        <literal>resourceManagerConfiguration</literal> elements are exactly the same

-        as in Primitive Analysis Engine Descriptors (see <xref

-          linkend="&tp;aes.primitive.external_resource_dependencies"/> and <xref

-          linkend="&tp;aes.primitive.resource_manager_configuration"/>).</para>

-      

-    </section>

-  </section>

-  

-  <section id="&tp;service_client">

-    <title>Service Client Descriptors</title>

-    

-    <para>Service Client Descriptors specify only a location of a remote service. They are

-      therefore much simpler in structure. In the UIMA SDK, a Service Client Descriptor that

-      refers to a valid Analysis Engine or CAS Consumer service can be used in place of the

-      actual Analysis Engine or CAS Consumer Descriptor. The UIMA SDK will handle the details

-      of calling the remote service. (For details on <emphasis>deploying</emphasis> an

-      Analysis Engine or CAS Consumer as a service, see <olink targetdoc="&uima_docs_tutorial_guides;"

-      /> <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.application.remote_services"/>.)</para>

-    

-    <para>The UIMA SDK is extensible to support different types of remote services. In future

-      versions, there may be different variations of service client descriptors that cater

-      to different types of services. For now, the only type of service client descriptor is

-      the <literal>uriSpecifier</literal>, which supports the Vinci protocol.</para>

-    

-    

-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>

-<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">

-  <resourceType>AnalysisEngine | CasConsumer </resourceType>

-  <uri>[URI]</uri> 

-  <protocol>Vinci</protocol> 

-  <timeout>[Integer]</timeout>

-  <parameters>

-    <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>

-    <parameter name="VNS_PORT" value="9000"/>

-    <parameter name="GetMetaDataTimeout" value="[Integer]"/>

-  </parameters> 

-</uriSpecifier>]]></programlisting>

-    

-    <para>The <literal>resourceType</literal> element is required for new descriptors,

-      but is currently allowed to be omitted for backward compatibility. It specifies the

-      type of component (Analysis Engine or CAS Consumer) that is implemented by the service

-      endpoint described by this descriptor.</para>

-    

-    <para>The <literal>uri</literal> element contains the URI for the web service. (Note

-      that in the case of Vinci, this will be the service name, which is looked up in the Vinci

-      Naming Service.)</para>

-    

-    <para>The <literal>protocol</literal> element may be set to Vinci; other protocols may be added

-      later. These specify the particular data transport format that will be used.</para>

-    

-    <para>The <literal>timeout</literal> element is optional. If present, it specifies

-      the number of milliseconds to wait for a request to be processed before an exception is

-      thrown. A value of zero or less will wait forever. If no timeout is specified, a default

-      value (currently 60 seconds) will be used.</para>

-    

-    <para>The parameters element is optional. If present, it can specify values for each

-      of the following:

-    </para>

-    <itemizedlist>

-      <listitem><para><literal>VNS_HOST</literal>: host name for the Vinci naming service.

-      </para></listitem>

-      <listitem><para><literal>VNS_PORT</literal>: port number for the Vinci naming service.

-      </para></listitem>

-      <listitem><para><literal>GetMetaDataTimeout</literal>: timeout period (in milliseconds) for

-          the GetMetaData call.  If not specified, the default is 60 seconds.  This may need

-          to be set higher if there are a lot of clients competing for connections to the service.

-      </para></listitem>      

-    </itemizedlist>

-   

-    <para>If the <literal>VNS_HOST</literal> and <literal>VNS_PORT</literal> are not specified

-      in the descriptor, the values used for these come from

-      parameters passed on the Java command line using the

-      <literal>&minus;DVNS_HOST=&lt;host&gt;</literal> and/or

-      <literal>&minus;DVNS_PORT=&lt;port&gt;</literal> system arguments. If not present, and

-      a system argument is also not present, the values for these default to

-      <literal>localhost</literal> for the <literal>VNS_HOST</literal> and

-      <literal>9000</literal> for the <literal>VNS_PORT</literal>.</para>
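-    <para>As an illustration, a filled-in descriptor for an Analysis Engine service might look
-      like the following sketch. The service name <literal>org.example.MyAnnotatorService</literal>
-      and the two timeout values are hypothetical placeholders; the naming service settings simply
-      restate the defaults described above.</para>
-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
-<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
-  <resourceType>AnalysisEngine</resourceType>
-  <!-- hypothetical Vinci service name, looked up in the Vinci Naming Service -->
-  <uri>org.example.MyAnnotatorService</uri>
-  <protocol>Vinci</protocol>
-  <!-- wait up to 30 seconds per request (value is in milliseconds) -->
-  <timeout>30000</timeout>
-  <parameters>
-    <parameter name="VNS_HOST" value="localhost"/>
-    <parameter name="VNS_PORT" value="9000"/>
-    <!-- allow 120 seconds for GetMetaData when many clients compete -->
-    <parameter name="GetMetaDataTimeout" value="120000"/>
-  </parameters>
-</uriSpecifier>]]></programlisting>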

-    

-    <para>For details on how to deploy and call Analysis Engine and CAS Consumer services, see

-        <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.application.remote_services"/>.</para>

-    

-  </section>

-

-  <section id="&tp;custom_resource_specifiers">

-    <title>Custom Resource Specifiers</title>

-	<para>A Custom Resource Specifier allows you to plug in your own Java class as a UIMA Resource.

-		For example you can support a new service protocol by plugging in a Java class that implements

-		the UIMA <literal>AnalysisEngine</literal> interface and communicates with the remote service.</para>

-	

-	<para>A Custom Resource Specifier has the following format:</para>

-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>

-<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">

-  <resourceClassName>[Java Class Name]</resourceClassName>

-  <parameters>

-    <parameter name="[String]" value="[String]"/>

-    <parameter name="[String]" value="[String]"/>

-  </parameters> 

-</customResourceSpecifier>]]></programlisting>	
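-    <para>For instance, a specifier that plugs in a hypothetical class named
-      <literal>org.example.MyProtocolAnalysisEngine</literal> might look like the sketch below.
-      The class name and both parameter names and values are invented for illustration only.</para>
-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
-<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
-  <!-- hypothetical class; it must be on the classpath or extension classpath -->
-  <resourceClassName>org.example.MyProtocolAnalysisEngine</resourceClassName>
-  <parameters>
-    <!-- hypothetical parameters, read by the class in its initialize method -->
-    <parameter name="endpoint" value="http://localhost:8080/analyze"/>
-    <parameter name="retries" value="3"/>
-  </parameters>
-</customResourceSpecifier>]]></programlisting>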

-	  

-	<para>The <literal>resourceClassName</literal> element must contain the fully-qualified name of a Java class

-	that can be found in the classpath (including the UIMA extension classpath, if you have specified one using

-	the <literal>ResourceManager.setExtensionClassPath</literal> method).  This class must implement the

-	UIMA <literal>Resource</literal> interface.</para>    

-	  

-	<para>When an application calls the <literal>UIMAFramework.produceResource</literal> method and passes a

-	<literal>CustomResourceSpecifier</literal>, the UIMA framework will load the named class and call its

-	<literal>initialize(ResourceSpecifier,Map)</literal> method, passing the <literal>CustomResourceSpecifier</literal>

-	as the first argument.  Your class can override the <literal>initialize</literal> method and use the

-	<literal>CustomResourceSpecifier</literal> API to get access to the <literal>parameter</literal> names and values 

-	specified in the XML.</para>  

-	  

-	<para>If you are using a custom resource specifier to plug in a class that implements a new service protocol,

-	your class must also implement the <literal>AnalysisEngine</literal> interface.  Generally it should also

-	extend <literal>AnalysisEngineImplBase</literal>.  The key methods that should be implemented are 

-	<literal>getMetaData</literal>, <literal>processAndOutputNewCASes</literal>, 

-	<literal>collectionProcessComplete</literal>, and <literal>destroy</literal>.</para>  

-  </section>	  

-</chapter>

diff --git a/uima-docbook-references/src/docbook/ref.xml.cpe_descriptor.xml b/uima-docbook-references/src/docbook/ref.xml.cpe_descriptor.xml
deleted file mode 100644
index 1794049..0000000
--- a/uima-docbook-references/src/docbook/ref.xml.cpe_descriptor.xml
+++ /dev/null
@@ -1,1375 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/references/ref.xml.cpe_descriptor/">

-<!ENTITY tp "ugr.ref.xml.cpe_descriptor.">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.ref.xml.cpe_descriptor">

-  <title>Collection Processing Engine Descriptor Reference</title>

-  <titleabbrev>CPE Descriptor Reference</titleabbrev>

-  

-  <para>A UIMA <emphasis>Collection Processing Engine</emphasis> (CPE) is a combination

-    of UIMA components assembled to analyze a collection of artifacts. A CPE is an

-    instantiation of the UIMA <emphasis>Collection Processing Architecture</emphasis>,

-    which defines the collection processing components, interfaces, and APIs. A CPE is

-    executed by a UIMA framework component called the <emphasis>Collection Processing

-    Manager</emphasis> (CPM), which provides a number of services for deploying CPEs,

-    running CPEs, and handling errors.</para>

-  

-  <para>A CPE can be assembled programmatically within a Java application, or it can be

-    assembled declaratively via a CPE configuration specification, called a CPE

-    Descriptor. This chapter describes the format of the CPE Descriptor.</para>

-  

-  <para>Details about the CPE, including its function, sub-components, APIs, and related

-    tools, can be found in <olink targetdoc="&uima_docs_tutorial_guides;"

-    /> <olink targetdoc="&uima_docs_tutorial_guides;"

-      targetptr="ugr.tug.cpe"/>. Here we briefly summarize the CPE to define terms and

-    provide context for the later sections that describe the CPE Descriptor.</para>

-  

-  <section id="&tp;overview">

-    <title>CPE Overview</title>

-    

-    <figure id="&tp;overview.fig.runtime">

-      <title>CPE Runtime Overview</title>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.8in" format="PNG"

-            fileref="&imgroot;image002.png"/>

-        </imageobject>

-        <textobject><phrase>CPE Runtime Overview diagram</phrase></textobject>

-      </mediaobject>

-    </figure>

-    

-    <para>An illustration of the CPE runtime is shown in <xref

-        linkend="&tp;overview.fig.runtime"/>. Some of the CPE components, such as the

-      <emphasis>queues</emphasis> and <emphasis>processing pipelines</emphasis>, are

-      internal to the CPE, but their behavior and deployment may be configured using the CPE

-      Descriptor. Other CPE components, such as the <emphasis>Collection

-      Reader</emphasis> and <emphasis>CAS Processors</emphasis>, are defined and

-      configured externally from the CPE and then plugged in to the CPE to create the overall

-      engine. The parts of a CPE are:

-      

-      <variablelist>

-        <varlistentry>

-          <term>Collection Reader</term>

-          <listitem><para>understands the native data collection format and iterates

-            over the collection producing subjects of analysis</para></listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>CAS Initializer<footnote><para>Deprecated</para></footnote>

-            </term>

-          <listitem><para>initializes a CAS with a subject of analysis</para>

-            </listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>Artifact Producer</term>

-          <listitem><para>asynchronously pulls CASes from the Collection Reader,

-            creates batches of CASes and puts them into the work queue</para></listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>Work Queue</term>

-          <listitem><para>shared queue containing batches of CASes queued by the Artifact

-            Producer for analysis by Analysis Engines</para>

-          </listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>B1-Bn</term>

-          <listitem><para>individual batches containing 1 or more CASes</para>

-            </listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>AE1-AEn</term>

-          <listitem><para>Analysis Engines arranged by a CPE descriptor</para>

-            </listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>Processing Pipelines</term>

-          <listitem><para>each pipeline runs in a separate thread and contains a

-            replicated set of the Analysis Engines running in the defined sequence</para>

-            </listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>Output Queue</term>

-          <listitem><para>holds batches of CASes with analysis results intended for CAS

-            Consumers</para></listitem>

-        </varlistentry>

-        

-        <varlistentry>

-          <term>CAS Consumers</term>

-          <listitem><para>perform collection level analysis over the CASes and extract

-            analysis results, e.g., creating indexes or databases</para></listitem>

-        </varlistentry>

-      </variablelist>

-      </para>

-  </section>

-  

-  <section id="&tp;notation">

-    <title>Notation</title>

-    

-    <para>CPE Descriptors are XML files. This chapter uses an informal notation to specify

-      the syntax of CPE Descriptors.</para>

-    

-    <para>The notation used in this chapter is:

-      

-      <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates

-        that the substructure of that element has been omitted (to be described in another

-        section of this chapter). An example of this would be:

-        

-        

-        <programlisting>&lt;collectionReader&gt;

-...

-&lt;/collectionReader&gt;</programlisting></para>

-        </listitem>

-        

-        <listitem><para>An ellipsis immediately after an element indicates that the

-          element type may be repeated arbitrarily many times. For example:

-          

-          

-          <programlisting>&lt;parameter&gt;[String]&lt;/parameter&gt;

-&lt;parameter&gt;[String]&lt;/parameter&gt;

-...</programlisting>

-          indicates that there may be arbitrarily many parameter elements in this

-          context.</para></listitem>

-        

-        <listitem><para>An ellipsis inside an element means details of the attributes

-          associated with that element are defined later, e.g.:

-          

-          <programlisting>&lt;casProcessor ...&gt;</programlisting></para>

-          </listitem>

-        

-        <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)

-          indicate the type of value that may be used at that location.</para></listitem>

-        

-        <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates

-          alternatives. This can be applied to literal values, bracketed type names, and

-          elements. </para></listitem></itemizedlist></para>

-    

-    <para>Which elements are optional and which are required is specified in prose, not in the

-      syntax definition.</para>

-    

-  </section>

-  

-  <section id="&tp;imports">

-    <title>Imports</title>

-    

-    <para>As of version 2.2, a CPE Descriptor can use the same <literal>import</literal> mechanism

-      as other component descriptors.  This allows referring to component

-      descriptors using either relative paths (resolved relative to the location of the CPE descriptor)

-      or the classpath/datapath.  For details see <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.xml.component_descriptor"/>.</para>
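-    <para>As a brief sketch, the two forms of <literal>import</literal> inside a
-      <literal>&lt;descriptor&gt;</literal> element might look as follows. The file path and
-      the dotted name are hypothetical.</para>
-    <programlisting><![CDATA[<!-- by location: resolved relative to the CPE descriptor -->
-<descriptor>
-    <import location="descriptors/MyCollectionReader.xml"/>
-</descriptor>
-
-<!-- by name: looked up on the classpath/datapath -->
-<descriptor>
-    <import name="org.example.MyCollectionReader"/>
-</descriptor>]]></programlisting>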

-     

-    <para>The following older syntax is still supported, but <emphasis>not recommended</emphasis>:

-      

-      <programlisting><![CDATA[<descriptor>

-    <include href="[URL or File]"/>

-</descriptor>]]></programlisting></para>

-    

-    <para>The <literal>[URL or File]</literal> attribute is a URL or a filename for the descriptor of the

-      incorporated component. The framework first attempts to resolve the argument as a URL.</para>

-    

-    <para>

-      Relative paths in an <literal>include</literal> are resolved relative to the current working directory 

-      (NOT the CPE descriptor location as is the case for <literal>import</literal>). 

-      A filename relative to another directory can be specified using the <literal>CPM_HOME</literal>

-      variable, e.g.,    

-    <programlisting>&lt;descriptor&gt;

-    &lt;include href="${CPM_HOME}/desc_dir/descriptor.xml"/&gt;

-&lt;/descriptor&gt;</programlisting>

-    

-      In this case, the value for the <literal>CPM_HOME</literal> variable must be

-      provided to the CPE by specifying it on the Java command line, e.g.,

-        

-    <programlisting>java -DCPM_HOME="C:/Program Files/apache/uima/cpm" ...</programlisting>

-    

-  </para>

-    

-  </section>

-  

-  <section id="&tp;descriptor">

-    <title>CPE Descriptor Overview</title>

-    

-    <para>A CPE Descriptor consists of information describing the following four main

-      elements.</para>

-    

-    <orderedlist><listitem><para>The <emphasis>Collection Reader</emphasis>, which

-      is responsible for gathering artifacts and initializing the Common Analysis

-      Structure (CAS) used to support processing in the UIMA collection processing

-      engine.</para></listitem>

-      

-      <listitem><para>The <emphasis>CAS Processors</emphasis>, responsible for

-        analyzing individual artifacts, analyzing across artifacts, and extracting

-        analysis results. CAS Processors include <emphasis>Analysis Engines</emphasis>

-        and <emphasis>CAS Consumers</emphasis>.</para></listitem>

-      

-      <listitem><para>Operational parameters of the <emphasis>Collection Processing

-        Manager</emphasis> (CPM), such as checkpoint frequency and deployment

-        mode.</para></listitem>

-      

-      <listitem><para>Resource Manager Configuration (optional). </para></listitem>

-      </orderedlist>

-    

-    <para>The CPE Descriptor has the following high level skeleton:

-      

-      

-      <programlisting><![CDATA[<?xml version="1.0"?>

-<cpeDescription>

-   <collectionReader>

-...

-   </collectionReader>

-   <casProcessors>

-...

-   </casProcessors>

-   <cpeConfig>

-...

-   </cpeConfig>

-   <resourceManagerConfiguration>

-...

-   </resourceManagerConfiguration>

-</cpeDescription>]]></programlisting></para>

-    

-    <para>Details of each of the four main elements are described in the sections that

-      follow.</para>

- </section>   

-    <section id="&tp;descriptor.collection_reader">

-      <title>Collection Reader</title>

-      

-      <para>The <literal>&lt;collectionReader&gt;</literal> section identifies the

-        Collection Reader and optional CAS Initializer that are to be used in the CPE. The

-        Collection Reader is responsible for retrieval of artifacts from a collection

-        outside of the CPE, and the optional CAS Initializer (deprecated as of UIMA Version 2)

-        is responsible for initializing the CAS with the artifact.</para>

-      

-      <para>A Collection Reader may initialize the CAS itself, in which case it does not

-        require a CAS Initializer. This should be clearly specified in the documentation for

-        the Collection Reader. Specifying a CAS Initializer for a Collection Reader that

-        does not make use of a CAS Initializer will not cause an error, but the specified CAS

-        Initializer will not be used.</para>

-      

-      <para>The complete structure of the <literal>&lt;collectionReader&gt;</literal>

-        section is:

-        

-        

-        <programlisting><![CDATA[<collectionReader>

-  <collectionIterator>

-    <descriptor>

-      <import ...> | <include .../>

-    </descriptor>

-    <configurationParameterSettings>...</configurationParameterSettings>

-    <sofaNameMappings>...</sofaNameMappings>

-  </collectionIterator>

-  <casInitializer>

-    <descriptor>

-      <import ...> | <include .../>

-    </descriptor>

-    <configurationParameterSettings>...</configurationParameterSettings>

-    <sofaNameMappings>...</sofaNameMappings>

-  </casInitializer>

-</collectionReader>]]></programlisting></para>

-      

-      <para>The <literal>&lt;collectionIterator&gt;</literal> identifies the

-        descriptor for the Collection Reader, and the <literal>&lt;casInitializer&gt;

-        </literal>identifies the descriptor for the CAS Initializer. The format and

-        details of the Collection Reader and CAS Initializer descriptors are described in

-          <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader"/>

-        . The <literal>&lt;configurationParameterSettings&gt; </literal>and the

-        <literal>&lt;sofaNameMappings&gt;</literal> elements are described in the next

-        section.</para>
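-      <para>A minimal sketch of this section is shown below. It assumes a Collection Reader
-        that initializes the CAS itself (so no <literal>&lt;casInitializer&gt;</literal> is
-        given) and that no parameter overrides or Sofa name mappings are needed. The descriptor
-        path is hypothetical.</para>
-      <programlisting><![CDATA[<collectionReader>
-  <collectionIterator>
-    <descriptor>
-      <!-- hypothetical path, resolved relative to the CPE descriptor -->
-      <import location="descriptors/MyCollectionReader.xml"/>
-    </descriptor>
-  </collectionIterator>
-</collectionReader>]]></programlisting>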

-      

-      <section id="&tp;descriptor.collection_reader.error_handling">

-        <title>Error handling for Collection Readers</title>

-        

-        <para>The CPM will abort if the Collection Reader throws a large number of

-          consecutive exceptions (default = 100). This default can be changed by using the

-          Java initialization parameter <literal>&minus;DMaxCRErrorThreshold

-          xxx.</literal></para>

-      </section>

-    </section>

-    

-    <section id="&tp;descriptor.cas_processors">

-      <title>CAS Processors</title>

-      

-      <para>The <literal>&lt;casProcessors&gt;</literal> section identifies the

-        components that perform the analysis on the input data, including CAS analysis

-        (Analysis Engines) and analysis results extraction (CAS Consumers). The CAS

-        Consumers may also perform collection level analysis, where the analysis is

-        performed (or aggregated) over multiple CASes. The basic structure of the CAS

-        Processors section is:

-        

-        

-        <programlisting><![CDATA[<casProcessors 

-    dropCasOnException="true|false"

-    casPoolSize="[Number]" 

-    processingUnitThreadCount="[Number]">

-

-  <casProcessor ...>

-        ...

-  </casProcessor>

-

-  <casProcessor ...>

-        ...

-  </casProcessor>

-    ...

-</casProcessors>]]></programlisting></para>

-      

-      <para>The <literal>&lt;casProcessors&gt;</literal> section has two mandatory

-        attributes and one optional attribute that configure the characteristics of the CAS

-        Processor flow in the CPE. The first mandatory attribute is <literal>casPoolSize</literal>, which

-        defines the fixed number of CAS instances that the CPM will create and use during

-        processing. All CAS instances are maintained in a CAS Pool with a check-in and

-        check-out access. Each CAS is checked-out from the CAS Pool by the Collection Reader

-        and initialized with an initial subject of analysis. The CAS is checked back into the

-        CAS Pool when it is completely processed, at the end of the processing chain. A larger

-        CAS Pool size will result in more memory being used by the CPM. CAS objects can be large

-        and care should be taken to determine the optimum size of the CAS Pool, weighing memory

-        tradeoffs with performance.</para>

-      

-      <para>The second mandatory <literal>&lt;casProcessors&gt;</literal> attribute

-        is <literal>processingUnitThreadCount</literal>, which specifies the number of

-        replicated <emphasis>Processing Pipelines</emphasis>. Each Processing

-        Pipeline runs in its own thread. The CPM takes CASes from the work queue and submits

-        each CAS to one of the Processing Pipelines for analysis. A Processing Pipeline

-        contains one or more Analysis Engines invoked in a given sequence. If more than one

-        Processing Pipeline is specified, the CPM replicates instances of each Analysis

-        Engine defined in the CPE descriptor. Each Processing Pipeline thread runs

-        independently, consuming CASes from the work queue and depositing CASes with analysis

-        results onto the output queue. On multiprocessor machines, multiple Processing

-        Pipelines can run in parallel, improving overall throughput of the CPM.</para>

-      <note><para>The number of Processing Pipelines should be equal to or greater than CAS

-      Pool size. </para></note>

-      

-      <para>Elements in the pipeline (each represented by a &lt;casProcessor&gt; element)

-        may indicate that they do not permit multiple deployment in their Analysis Engine

-        descriptor. If so, even though multiple pipelines are being used, all CASes passing

-        through the pipelines will be routed through one instance of these marked Engines.

-        </para>

-      

-      <para>The final, optional, &lt;casProcessors&gt; attribute is

-        <literal>dropCasOnException</literal>. It defines a policy that determines what

-        happens with the CAS when an exception happens during processing. If the value of this

-        attribute is set to true and an exception happens, the CPM will notify all registered

-        listeners of the exception (see <olink targetdoc="&uima_docs_tutorial_guides;"

-        /> <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.cpe.using_listeners"/>), clear the CAS and check the CAS

-        back into the CAS Pool so that it can be re-used. The presumption is that an exception

-        may leave the CAS in an inconsistent state and therefore that CAS should not be allowed

-        to move through the processing chain. When this attribute is omitted the CPM&apos;s

-        default is the same as specifying

-        <literal>dropCasOnException="false"</literal>.</para>
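-      <para>To make the three attributes concrete, the sketch below shows one possible
-        setting. The numbers are illustrative assumptions, not recommendations: two CAS
-        instances, two Processing Pipelines, and CASes dropped when an exception
-        occurs.</para>
-      <programlisting><![CDATA[<casProcessors
-    dropCasOnException="true"
-    casPoolSize="2"
-    processingUnitThreadCount="2">
-
-  <casProcessor ...>
-        ...
-  </casProcessor>
-
-  <casProcessor ...>
-        ...
-  </casProcessor>
-</casProcessors>]]></programlisting>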

-      

-      <section id="&tp;descriptor.cas_processors.individual">

-        <title>Specifying an Individual CAS Processor</title>

-        

-        <para>The CAS Processors that make up the Processing Pipeline and the CAS Consumer

-          pipeline are specified with the <literal>&lt;casProcessor&gt;</literal>

-          entity, which appears within the <literal>&lt;casProcessors&gt;</literal>

-          entity. It may appear multiple times, once for each CAS Processor specified for

-          this CPE.</para>

-        

-        <para>The order of the <literal>&lt;casProcessor&gt;</literal> entities within

-          the <literal>&lt;casProcessors&gt;</literal> section specifies the order in

-          which the CAS Processors will run. Although CAS Consumers are usually put at the end

-          of the pipeline, they need not be. Also, Aggregate Analysis Engines may include CAS

-          Consumers.</para>

-        

-        <para>The overall format of the <literal>&lt;casProcessor&gt;</literal> entity

-          is:

-          

-          

-          <programlisting><![CDATA[<casProcessor deployment="local|remote|integrated" name="[String]" >

-    <descriptor>

-      <import ...> | <include .../>

-    </descriptor>

-    <configurationParameterSettings>...</configurationParameterSettings>

-    <sofaNameMappings>...</sofaNameMappings>

-    <runInSeparateProcess>...</runInSeparateProcess>

-    <deploymentParameters>...</deploymentParameters>

-    <filter/>

-    <errorHandling>...</errorHandling>

-    <checkpoint batch="Number"/>

-</casProcessor>]]></programlisting></para>

-        

-        <para>The <literal>&lt;casProcessor&gt;</literal> element has two mandatory

-          attributes, <literal>deployment</literal> and <literal>name</literal>. The

-          mandatory <literal>name</literal> attribute specifies a unique string

-          identifying the CAS Processor.</para>

-        

-        <para>The mandatory <literal>deployment</literal> attribute specifies the CAS

-          Processor deployment mode. Currently, three deployment options are supported:

-          

-          <variablelist>

-            <varlistentry>

-              <term>integrated</term>

-              <listitem><para>indicates <emphasis>integrated</emphasis> deployment

-                of the CAS Processor. The CPM deploys and collocates the CAS Processor in the

-                same process space as the CPM. This type of deployment is recommended to

-                increase the performance of the CPE. However, it is NOT recommended to

-                deploy annotators containing JNI this way. Such CAS Processors may cause a

-                fatal exception and force the JVM to exit without cleanup (bringing down the

-                CPM). Any UIMA SDK compliant pure Java CAS Processors may be safely deployed

-                this way.</para>

-                <para>The descriptor for an integrated deployment can, in fact, be a remote

-                  service descriptor. When used this way, however, the CPM error recovery 

-                  options (see below) operate in the integrated mode, which means that many 

-                  of the retry options are not available.</para></listitem>

-            </varlistentry>

-            <varlistentry>

-              <term>remote</term>

-              <listitem><para>indicates <emphasis>non-managed</emphasis>

-                deployment of the CAS Processor. The CAS Processor descriptor referenced

-                in the <literal>&lt;descriptor&gt;</literal> element must be a Vinci

-                <emphasis>Service Client Descriptor</emphasis>, which identifies a

-                remotely deployed CAS Processor service (see <olink

-                  targetdoc="&uima_docs_tutorial_guides;"/> <olink

-                  targetdoc="&uima_docs_tutorial_guides;"

-                  targetptr="ugr.tug.application.remote_services"/>). The CPM

-                assumes that the CAS Processor is already running as a remote service and

-                will connect to it using the URI provided in the client service descriptor.

-                The lifecycle of a remotely deployed CAS Processor is not managed by the CPM,

-                so appropriate infrastructure should be in place to start/restart such CAS

-                Processors when necessary. This deployment provides fault isolation and

-                is implementation (i.e., programming language) neutral.</para>

-                </listitem>

-            </varlistentry>

-            <varlistentry>

-              <term>local</term>

-              <listitem><para>indicates <emphasis>managed</emphasis> deployment of

-                the CAS Processor. The CAS Processor descriptor referenced in the

-                <literal>&lt;descriptor&gt;</literal> element must be a Vinci

-                <emphasis>Service Deployment Descriptor</emphasis>, which configures

-                a CAS Processor for deployment as a Vinci service (see <olink

-                  targetdoc="&uima_docs_tutorial_guides;"/> <olink

-                  targetdoc="&uima_docs_tutorial_guides;"

-                  targetptr="ugr.tug.application.remote_services"/>). The CPM

-                deploys the CAS Processor in a separate process and manages the life cycle

-                (start/stop) of the CAS Processor. Communication between the CPM and the

-                CAS Processor is done with Vinci. When the CPM completes processing, the

-                process containing the CAS Processor is terminated. This deployment mode

-                insulates the CPM from the CAS Processor, creating a more robust deployment

-                at the cost of a small communication overhead. On multiprocessor machines,

-                the separate processes may run concurrently and improve overall

-                throughput.</para></listitem>

-            </varlistentry>

-          </variablelist></para>

-        

-        <para>A number of elements may appear within the

-          <literal>&lt;casProcessor&gt;</literal> element.</para>

-        

-        <section id="&tp;descriptor.cas_processors.individual.descriptor">

-          <title>&lt;descriptor&gt; Element</title>

-          

-          <para>The <literal>&lt;descriptor&gt;</literal> element is mandatory. It

-            identifies the descriptor for the referenced CAS Processor using the syntax

-            described in <olink targetdoc="&uima_docs_ref;"

-              targetptr="ugr.ref.xml.component_descriptor.aes"/>.

-            

-            <itemizedlist spacing="compact"><listitem><para>For

-              <emphasis><literal>remote</literal></emphasis> CAS Processors, the

-              referenced descriptor must be a Vinci <emphasis>Service Client

-              Descriptor</emphasis>, which identifies a remotely deployed CAS Processor

-              service.</para></listitem>

-              

-              <listitem><para>For <emphasis>local</emphasis> CAS Processors, the

-                referenced descriptor must be a Vinci <emphasis>Service Deployment

-                Descriptor</emphasis>.</para></listitem>

-              

-              <listitem><para>For <emphasis>integrated</emphasis> CAS Processors,

-                the referenced descriptor must be an Analysis Engine Descriptor

-                (primitive or aggregate). </para></listitem></itemizedlist> </para>

-          

-          <para>See <olink targetdoc="&uima_docs_tutorial_guides;"

-              /> <olink targetdoc="&uima_docs_tutorial_guides;"

-              targetptr="ugr.tug.application.remote_services"/> for more

-            information on creating these descriptors and deploying services.</para>

-          

-        </section>

-        

-        <section

-          id="&tp;descriptor.cas_processors.individual.configuration_parameter_settings">

-          <title>&lt;configurationParameterSettings&gt; Element</title>

-          

-          <para>This element provides a way to override the contained Analysis

-            Engine&apos;s parameter settings. Any entry specified here must already be

-            defined; values specified replace the corresponding values for each

-            parameter. <emphasis role="bold-italic">For Cas Processors, this mechanism

-            is only available when they are deployed in <quote>integrated</quote>

-            mode.</emphasis> For Collection Readers and Initializers, it is always

-            available.</para>

-          

-          <para>The content of this element is identical to the component descriptor for

-            specifying parameters (in the case where no parameter groups are

-            specified)<footnote><para>An earlier UIMA version required these to have a

-            suffix of <quote>_p</quote>, e.g., <quote>string_p</quote>. This is no

-            longer required, but this format is accepted, also, for backward

-            compatibility.</para></footnote>. Here is an example:

-            

-            

-            <programlisting><![CDATA[<configurationParameterSettings>

-  <nameValuePair>

-    <name>CivilianTitles</name>

-    <value>

-      <array>

-        <string>Mr.</string>

-        <string>Ms.</string>

-        <string>Mrs.</string>

-        <string>Dr.</string>

-      </array>  

-    </value>

-  </nameValuePair>

-  ...

-</configurationParameterSettings>]]></programlisting></para>

-          

-        </section>

-        

-        <section

-          id="&tp;descriptor.cas_processors.individual.sofa_name_mappings">

-          <title>&lt;sofaNameMappings&gt; Element</title>

-          

-          <para>This optional element provides a mapping from defined Sofa names in the

-            component, or the default Sofa name (if the component does not declare any Sofa

-            names). The form of this element is:

-            

-            

-            <programlisting>&lt;sofaNameMappings&gt;

-  &lt;sofaNameMapping cpeSofaName="a_CPE_name"

-                   componentSofaName="a_component_Name"/&gt;

-  ...

-&lt;/sofaNameMappings&gt;</programlisting></para>

-          

-          <para>There can be any number of

-            <literal>&lt;sofaNameMapping&gt;</literal> elements contained in the

-            <literal>&lt;sofaNameMappings&gt;</literal> element. The

-            <literal>componentSofaName</literal> attribute is optional; leave it out to

-            specify a mapping for the <literal>_InitialView</literal> - that is, for

-            Single-View components.</para>
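
-          <para>As a minimal sketch, a mapping for a single-view component that omits

-            <literal>componentSofaName</literal> might look like the following. The Sofa

-            name <literal>translatedText</literal> is a hypothetical CPE-level name chosen

-            only for illustration:

-            

-            <programlisting>&lt;sofaNameMappings&gt;

-  &lt;sofaNameMapping cpeSofaName="translatedText"/&gt;

-&lt;/sofaNameMappings&gt;</programlisting></para>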

-          

-        </section>

-        

-        <section id="&tp;descriptor.cas_processors.run_in_separate_process">

-          <title>&lt;runInSeparateProcess&gt; Element</title>

-          

-          <para>The <literal>&lt;runInSeparateProcess&gt;</literal> element is

-            mandatory for <literal>local</literal> CAS Processors, but should not appear

-            for <literal>remote</literal> or <literal>integrated</literal> CAS

-            Processors. It enables the CPM to create external processes using the provided

-            runtime environment. Applications launched this way communicate with the CPM

-            using the Vinci protocol and connectivity is enabled by a local instance of the

-            VNS that the CPM manages. Since communication is based on Vinci, the application

-            need not be implemented in Java. Any language for which Vinci provides support

-            may be used to create an application, and the CPM will seamlessly communicate

-            with it. The overall structure of this element is:

-            

-            

-            <programlisting><![CDATA[<runInSeparateProcess>

-    <exec dir="[String]" executable="[String]">

-        <env key="[String]" value ="[String]"/>

-        ...

-        <arg>[String]</arg>

-        ...

-    </exec>

-</runInSeparateProcess>]]></programlisting></para>

-          

-          <para>The <literal>&lt;exec&gt;</literal> element provides information

-            about how to execute the referenced CAS Processor. Two attributes are defined

-            for the <literal>&lt;exec&gt;</literal> element. The

-            <literal>dir</literal> attribute is currently not used &ndash; it is reserved

-            for future functionality. The <literal>executable</literal> attribute

-            specifies the actual Vinci service executable that will be run by the CPM, e.g.,

-            <literal>java</literal>, a batch script, an application (.exe), etc. The

-            executable must be specified with a fully qualified path, or be found in the

-            <literal>PATH</literal> of the CPM.</para>

-          

-          <para>The <literal>&lt;exec&gt;</literal> element has two elements within it

-            that define parameters used to construct the command line for executing the CAS

-            Processor. These elements must be listed in the order in which they should be

-            defined for the CAS Processor.</para>

-          

-          <para>The optional <literal>&lt;env&gt;</literal> element is used to set an

-            environment variable. The variable <literal>key</literal> will be set to

-            <literal>value</literal>. For example,

-            

-            

-            <programlisting>&lt;env key="CLASSPATH" value="C:\Java\lib"/&gt;</programlisting>

-            will set the environment variable <literal>CLASSPATH</literal> to the value

-            <literal>C:\Java\lib</literal>. The <literal>&lt;env&gt;</literal>

-            element may be repeated to set multiple environment variables. All of the

-            key/value pairs will be added to the environment by the CPM prior to launching the

-            executable.</para>

-          <note><para>The CPM actually adds ALL system environment variables when it

-          launches the program. It queries the Operating System for its current system

-          variables and one by one adds them to the program&apos;s process

-          configuration.</para></note>

-          

-          <para>The <literal>&lt;arg&gt;</literal> element is used to specify arbitrary

-            string arguments that will appear on the command line when the CPM runs the

-            command specified in the <literal>executable</literal> attribute.</para>

-          

-          <para>For example, the following would be used to invoke the UIMA Java

-            implementation of the Vinci service wrapper on a Java CAS Processor:

-            

-            

-            <programlisting><![CDATA[<runInSeparateProcess>

-    <exec executable="java">

-        <arg>-DVNS_HOST=localhost</arg>

-        <arg>-DVNS_PORT=9099</arg>

-        <arg>org.apache.uima.reference_impl.analysis_engine.service.

-vinci.VinciAnalysisEngineService_impl</arg> 

-        <arg>C:uimadescdeployCasProcessor.xml</arg>

-    </exec>

-</runInSeparateProcess>]]></programlisting></para>

-          

-          <para>This will cause the CPM to run the following command line when starting the

-            CAS Processor:

-            

-            

-            <programlisting>java -DVNS_HOST=localhost -DVNS_PORT=9099 

-  org.apache.uima.reference_impl.analysis_engine.service.vinci.\\

-              VinciAnalysisEngineService_impl 

-  C:uimadescdeployCasProcessor.xml</programlisting></para>

-          

-          <para>The first argument specifies that the Vinci Naming Service is running on the

-            <literal>localhost</literal>. The second argument specifies that the Vinci

-            Naming Service port number is <literal>9099</literal>. The third argument

-            (split over 2 lines in this documentation) 

-            identifies the UIMA implementation of the Vinci service wrapper. This class

-            contains the <literal>main</literal> method that will execute. That main

-            method in turn takes a single argument &ndash; the filename for the CAS Processor

-            service deployment descriptor. Thus the last argument identifies the Vinci

-            service deployment descriptor file for the CAS Processor. Since this is the same

-            descriptor file specified earlier in the

-            <literal>&lt;descriptor&gt;</literal> element, the string

-            <literal>${descriptor}</literal> can be used to refer to the descriptor,

-            e.g.:

-            

-            

-            <programlisting>&lt;arg&gt;${descriptor}&lt;/arg&gt;</programlisting></para>

-          

-          <para>The CPM will expand this out to the service deployment descriptor file

-            referenced in the <literal>&lt;descriptor&gt;</literal> element.</para>
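
-          <para>As an illustrative sketch, the earlier invocation could be rewritten with the

-            <literal>${descriptor}</literal> placeholder in place of the explicit file name.

-            The host, port, and class name are simply carried over from the example above,

-            and the class name is kept on a single line here:

-            

-            <programlisting><![CDATA[<runInSeparateProcess>

-    <exec executable="java">

-        <arg>-DVNS_HOST=localhost</arg>

-        <arg>-DVNS_PORT=9099</arg>

-        <arg>org.apache.uima.reference_impl.analysis_engine.service.vinci.VinciAnalysisEngineService_impl</arg>

-        <arg>${descriptor}</arg>

-    </exec>

-</runInSeparateProcess>]]></programlisting></para>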

-          

-        </section>

-        

-        <section

-          id="&tp;descriptor.cas_processors.individual.deployment_parameters">

-          <title>&lt;deploymentParameters&gt; Element</title>

-          

-          <para>The <literal>&lt;deploymentParameters&gt;</literal> element defines

-            a number of deployment parameters that control how the CPM will interact with the

-            CAS Processor. This element has the following overall form:

-            

-            

-            <programlisting>&lt;deploymentParameters&gt;

-    &lt;parameter name="[String]" value="..." type="string|integer" /&gt; 

-    ...

-&lt;/deploymentParameters&gt;</programlisting></para>

-          

-          <para>The <literal>name</literal> attribute identifies the parameter, the

-            <literal>value</literal> attribute specifies the value that will be assigned

-            to the parameter, and the <literal>type</literal> attribute indicates the

-            type of the parameter, either <literal>string</literal> or

-            <literal>integer</literal>. The available parameters include:

-            

-            <variablelist>

-              

-              <varlistentry>

-                <term>service-access</term>

-                <listitem><para>string parameter whose value must be

-                  <quote>exclusive</quote>, if present. This parameter is only

-                  effective for remote deployments. It modifies the Vinci service

-                  connections to be preallocated and dedicated, one service instance per

-                  pipeline. It is only relevant for non-integrated deployment modes. If

-                  there are fewer service instances available (and alive &ndash;

-                  responding to a <quote>ping</quote> request) than there are pipelines,

-                  the number of pipelines (the number of concurrent threads) is reduced to

-                  match the number of available instances. If not specified, the VNS is

-                  queried each time a service is needed, and a <quote>random</quote>

-                  instance is assigned from the pool of available instances. If a service

-                  dies during processing, the CPM will use its normal error handling

-                  procedures to attempt to reconnect. The number of attempts is specified

-                  in the CPE descriptor for each Cas Processor using the

-                  <literal>&lt;maxConsecutiveRestarts value="10"

-                  action="kill-pipeline"

-                  waitTimeBetweenRetries="50"/&gt;</literal> xml element. The

-                  <quote>value</quote> attribute is the number of reconnection tries;

-                  the <quote>action</quote> says what to do if the retries exceed the

-                  limit. The <quote>kill-pipeline</quote> action stops the pipeline

-                  that was associated with the failing service (other pipelines will

-                  continue to work). The CAS in process within a killed pipeline will be

-                  dropped. These events are communicated to the application using the

-                  normal event listener mechanism. The

-                  <literal>waitTimeBetweenRetries</literal> says how many

-                  milliseconds to wait between attempts to reconnect.</para>

-                  </listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>vnsHost</term>

-                <listitem><para>(Deprecated) string parameter specifying the VNS host,

-                  e.g., <literal>localhost</literal> for local CAS Processors, host

-                  name or IP address of VNS host for remote CAS Processors. This parameter is

-                  deprecated; use the parameter specification instead inside the Vinci

-                  <emphasis>Service Client Descriptor</emphasis>, if needed. It is

-                  ignored for integrated and local deployments. If present, for remote

-                  deployments, it specifies the VNS Host to use, unless that is specified in

-                  the Vinci <emphasis>Service Client Descriptor</emphasis>.</para>

-                  </listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>vnsPort</term>

-                <listitem><para>(Deprecated) integer parameter specifying the VNS port

-                  number. This parameter is deprecated; use the parameter specification

-                  instead inside the Vinci <emphasis>Service Client

-                  Descriptor,</emphasis> if needed. It is ignored for integrated and

-                  local deployments. If present, for remote deployments, it specifies the

-                  VNS Port number to use, unless that is specified in the Vinci

-                  <emphasis>Service Client Descriptor.</emphasis></para>

-                  </listitem>

-              </varlistentry>

-            </variablelist></para>

-          

-          <para>For example, the following parameters might be used with a CAS Processor

-            deployed in local mode:

-            

-            

-            <programlisting>&lt;deploymentParameters&gt;

-  &lt;parameter name="service-access" value="exclusive" type="string"/&gt; 

-&lt;/deploymentParameters&gt;</programlisting></para>

-          

-        </section>

-        

-        <section id="&tp;descriptor.cas_processors.individual.filter">

-          <title>&lt;filter&gt; Element</title>

-          

-          <para>The &lt;filter&gt; element is a required element but currently should be

-            left empty. This element is reserved for future use.</para>

-          

-        </section>

-        

-        <section id="&tp;descriptor.cas_processors.individual.error_handling">

-          <title>&lt;errorHandling&gt; Element</title>

-          

-          <para>The mandatory <literal>&lt;errorHandling&gt;</literal> element

-            defines error and restart policies for the CAS Processor. Each CAS Processor may

-            define different actions in the event of errors and restarts. The CPM monitors

-            and logs errant behaviors and attempts to recover the component based on the

-            policies specified in this element.</para>

-          

-          <para>There are two kinds of faults:

-            

-            <orderedlist><listitem><para>One kind only occurs with non-integrated CAS

-              Processors &ndash; this fault is either a timeout attempting to launch or

-              connect to the non-integrated component, or some other kind of connection

-              related exception (for instance, the network connection might timeout or get

-              reset).</para></listitem>

-              

-              <listitem><para>The other kind happens when the CAS Processor component (an

-                Annotator, for example) throws any kind of exception. This kind may occur

-                with any kind of deployment, integrated or not. </para></listitem>

-              </orderedlist></para>

-          

-          <para>The &lt;errorHandling&gt; has specifications for each of these kinds of

-            faults. The format of this element is:

-            

-            

-            <programlisting><![CDATA[<errorHandling>

-  <maxConsecutiveRestarts action="continue|disable|terminate"

-                           value="[Number]"/>

-  <errorRateThreshold action="continue|disable|terminate" value="[Rate]"/>

-  <timeout max="[Number]"/>

-</errorHandling>]]></programlisting></para>

-          

-          <para>The mandatory <literal>&lt;maxConsecutiveRestarts&gt;</literal>

-            element applies only to faults of the first kind, and therefore, only applies to

-            non-integrated deployments. If such a fault occurs, a retry is attempted, up to

-            <literal>value="[Number]"</literal> of times. This retry resets the

-            connection (if one was made) and attempts to reconnect and perhaps re-launch

-            (see below for details). The original CAS (not a partially updated one) is sent to

-            the CAS Processor as part of the retry, once the deployed component has been

-            successfully restarted or reconnected to.</para>

-          

-          <para>The <literal>action</literal> attribute specifies the action to take

-            when the threshold specified by the <literal>value="[Number]"</literal> is

-            exceeded. The possible actions are:

-            

-            <variablelist>

-              <varlistentry>

-                <term>continue</term>

-                <listitem><para>skip any further processing for this CAS by this CAS

-                  Processor, and pass the CAS to the next CAS Processor in the Pipeline.

-                  </para>

-                  <para>The <quote>restart</quote> action is done, because it is needed

-                    for the next CAS.</para>

-                  

-                  <para>If <literal>dropCasOnException="true"</literal> is set, the CPM

-                    will NOT pass the CAS to the next CAS Processor in the chain. Instead, the

-                    CPM will abort processing of this CAS, release the CAS back to the CAS

-                    Pool and will process the next CAS in the queue.</para>

-                  

-                  <para>The counter counting the restarts toward the threshold is only

-                    reset after a CAS is successfully processed.</para></listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>disable</term>

-                <listitem><para>the current CAS is handled just as in the

-                  <literal>continue</literal> case, but in addition, the CAS Processor

-                  is marked so that its <emphasis>process()</emphasis> method will not be

-                  called again (i.e., it will be <quote>skipped</quote> for future

-                  CASes)</para></listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>terminate</term>

-                <listitem><para>the CPM will terminate all processing and exit.</para>

-                  </listitem>

-              </varlistentry>

-            </variablelist></para>

-          

-          <para>The definition of an error for the

-            <literal>&lt;maxConsecutiveRestarts&gt;</literal> element differs

-            slightly for each of the three CAS Processor deployment modes:

-            <variablelist>

-              <varlistentry>

-                <term>local</term>

-                <listitem><para>Local CAS Processors experience two general error

-                  types:

-                  <itemizedlist>

-                    <listitem><para>launch errors &ndash; errors associated with

-                      launching a process</para></listitem>

-                    <listitem><para>processing errors &ndash; errors associated with

-                      sending Vinci commands to the process</para></listitem>

-                  </itemizedlist></para>

-                  

-                  <para>A launch error is defined by a failure of the process to

-                    successfully register with the local VNS within a default time window.

-                    The current timeout is 15 minutes. Multiple local CAS Processors are

-                    launched sequentially, with a subsequent processor launched

-                    immediately after its previous processor successfully registers

-                    with the VNS.</para>

-                  

-                  <para>A processing error is detected if a connection to the CAS Processor

-                    is lost or if the processing time exceeds a specified timeout

-                    value.</para>

-                  

-                  <para>For local CAS Processors, the

-                    &lt;maxConsecutiveRestarts&gt; element specifies the number of

-                    consecutive attempts made to launch the CAS Processor at CPM startup or

-                    after the CPM has lost a connection to the CAS Processor.</para>

-                  </listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>remote</term>

-                <listitem><para>For remote CAS Processors, the

-                  &lt;maxConsecutiveRestarts&gt; element applies to errors from

-                  sending Vinci commands. An error is detected if a connection to the CAS

-                  Processor is lost, or if the processing time exceeds the timeout value

-                  specified in the &lt;timeout&gt; element (see below).</para>

-                  </listitem>

-              </varlistentry>

-              

-              <varlistentry>

-                <term>integrated</term>

-                <listitem><para>Although mandatory, the

-                  &lt;maxConsecutiveRestarts&gt; element is NOT used for integrated CAS

-                  Processors, because Integrated CAS Processors are not

-                  re-instantiated/restarted on exceptions. This setting is ignored by

-                  the CPM for Integrated CAS Processors but it is required. A future version

-                  of the CPM will make this element mandatory for remote and local CAS

-                  Processors only.</para></listitem>

-              </varlistentry>

-              

-            </variablelist></para>

-          

-          <para>The mandatory <literal>&lt;errorRateThreshold&gt;</literal> element

-            is used for all faults &ndash; both those above, and exceptions thrown by the CAS

-            Processor itself. It specifies the number of retries for exceptions thrown by

-            the CAS Processor itself, a maximum error rate, and the corresponding action to

-            take when this rate is exceeded. The <literal>value</literal> attribute

-            specifies the error rate in terms of errors per sample size in the form

-            <quote><literal>N/M</literal></quote>, where <literal>N</literal> is the

-            number of errors and <literal>M</literal> is the sample size, defined in terms

-            of the number of documents.</para>

-          

-          <para>The first number is used also to indicate the maximum number of retries. If

-            this number is less than the <literal>&lt;maxConsecutiveRestarts

-            value="[Number]"&gt;</literal>, it will override, reducing the number of

-            <quote>restarts</quote> attempted. A retry is done only if

-            <literal>dropCasOnException</literal> is false. If it is set to true, no retry

-            occurs, but the error is counted.</para>

-          

-          <para>When the number of counted errors exceeds the allowed number within a sample, an action

-            specified by the <literal>action</literal> attribute is taken. The possible

-            actions and their meaning are the same as described above for the

-            <literal>&lt;maxConsecutiveRestarts&gt;</literal> element:

-            <itemizedlist spacing="compact">

-              <listitem><para><literal>continue</literal></para></listitem>

-              <listitem><para><literal>disable</literal></para></listitem>

-              <listitem><para><literal>terminate</literal></para></listitem>

-            </itemizedlist></para>

-         

-          <para>The <literal>dropCasOnException="true"</literal> attribute of the

-            <literal>&lt;casProcessors&gt;</literal> element modifies the action

-            taken for continue and disable, in the same manner as above. For example:

-            

-            

-            <programlisting>&lt;errorRateThreshold value="3/1000" action="disable"/&gt;</programlisting>

-            specifies that each error thrown by the CAS Processor itself will be retried up to

-            3 times (if <literal>dropCasOnException</literal> is false) and the CAS

-            Processor will be disabled if the error rate exceeds 3 errors in 1000

-            documents.</para>

-          

-          <para>If a document causes an error and the error rate threshold for the CAS

-            Processor is not exceeded, the CPM increments the CAS Processor&apos;s error

-            count and retries processing that document (if

-            <literal>dropCasOnException</literal> is false). The retry means that the

-            CPM calls the CAS Processor&apos;s process() method again, passing in as an

-            argument the same CAS that previously caused an exception.</para>

-          <note><para>The CPM does not attempt to rollback any partial changes that may have

-          been applied to the CAS in the previous process() call. </para></note>

-          

-          <para>Errors are accumulated across documents. For example, assume the error

-            rate threshold is <literal>3/1000</literal>. The same document may fail three

-            times before finally succeeding on the fourth try, but the error count is now 3. If

-            one more error occurs within the current sample of 1000 documents, the error rate

-            threshold will be exceeded and the specified action will be taken. If no more

-            errors occur within the current sample, the error counter is reset to 0 for the

-            next sample of 1000 documents.</para>

-          

-          <para>The <literal>&lt;timeout&gt;</literal> element is a mandatory element.

-            Although mandatory for all CAS Processors, this element is only relevant for

-            local and remote CAS Processors. For integrated CAS Processors, this element is

-            ignored. In the current CPM implementation the integrated CAS Processor

-            process() method is not subject to timeouts.</para>

-          

-          <para>The <literal>max</literal> attribute specifies the maximum amount of

-            time in milliseconds the CPM will wait for a process() method to complete. When

-            exceeded, the CPM will generate an exception and will treat this as an error

-            subject to the threshold defined in the

-            <literal>&lt;errorRateThreshold&gt;</literal> element above, including

-            doing retries.</para>
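
-          <para>As an illustrative sketch, a complete <literal>&lt;errorHandling&gt;</literal>

-            element for a local or remote CAS Processor might look like the following. The

-            numeric values are assumptions chosen only for illustration, not recommended

-            defaults. With these settings, the CPM would terminate after more than 3

-            consecutive failed restarts, disable the CAS Processor if more than 3 errors

-            occur within a sample of 1000 documents, and treat any process() call that

-            takes longer than 10000 milliseconds as an error:

-            

-            <programlisting><![CDATA[<errorHandling>

-  <maxConsecutiveRestarts action="terminate" value="3"/>

-  <errorRateThreshold action="disable" value="3/1000"/>

-  <timeout max="10000"/>

-</errorHandling>]]></programlisting></para>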

-          

-          <section

-            id="&tp;descriptor.cas_processors.individual.error_handling.timeout_retry_action">

-            <title>Retry action taken on a timeout</title>

-            

-            <para>The action taken depends on whether the CAS Processor is local (managed)

-              or remote (unmanaged). Local CAS Processors (which are services) are killed

-              and restarted, and a new connection to them is established. For remote CAS

-              Processors, the connection to them is dropped, and a new connection is

-              reestablished (which may actually connect to a different instance of the

-              remote services, if it has multiple instances).</para>

-          </section>

-        </section>

-        

-        <section id="&tp;descriptor.cas_processors.individual.checkpoint">

-          <title>&lt;checkpoint&gt; Element</title>

-          

-          <para>The <literal>&lt;checkpoint&gt;</literal> element is an optional

-            element used to improve the performance of CAS Consumers. It has a single

-            attribute, <literal>batch</literal>, which specifies the number of CASes in a

-            batch, e.g.:

-            

-            

-            <programlisting>&lt;checkpoint batch="1000"/&gt;</programlisting></para>

-          

-          <para>sets the batch size to 1000 CASes. The batch size is the interval used to mark a

-            point in processing requiring special handling. The CAS Processor&apos;s

-            <literal>batchProcessComplete()</literal> method will be called by the CPM

-            when this mark is reached so that the processor can take appropriate action. This

-            mark could be used as a mechanism to buffer up results in CAS Consumers and perform

-            time-consuming operations, such as check-pointing, that should not be done on a

-            per-document basis.</para>

-          

-        </section>

-      </section>

-    </section>

-    

-    <section id="&tp;descriptor.operational_parameters">

-      <title>CPE Operational Parameters</title>

-      

-      <para>The parameters for configuring the overall CPE and CPM are specified in the

-        <literal>&lt;cpeConfig&gt;</literal> section. The overall format of this

-        section is:

-        

-        

-        <programlisting><![CDATA[<cpeConfig>

-  <startAt>[NumberOrID]</startAt>

-

-  <numToProcess>[Number]</numToProcess>

-

-  <outputQueue dequeueTimeout="[Number]" queueClass="[ClassName]" />

-

-  <checkpoint file="[File]" time="[Number]" batch="[Number]"/>

-

-  <timerImpl>[ClassName]</timerImpl>

-

-  <deployAs>vinciService|interactive|immediate|single-threaded

-  </deployAs>

-

-</cpeConfig>]]></programlisting></para>

-      

-      <para>This section of the CPE descriptor allows for defining the starting entity, the

-        number of entities to process, a checkpoint file and frequency, a pluggable timer, an

-        optional output queue implementation, and finally a mode of operation. The mode of

-        operation determines how the CPM interacts with users and other systems.</para>

-      

-      <para>The <literal>&lt;startAt&gt;</literal> element is an optional argument. It

-        defines the starting entity in the collection at which the CPM should start

-        processing.</para>

-      

-      <para>The implementation in the CPM passes this argument to the Collection Reader

-        as the value of the parameter <quote><literal>startNumber</literal></quote>.

-        The CPM does not do anything else with this parameter; in particular, the CPM has no

-        ability to skip to a specific document - that function, if available, is only provided

-        by a particular Collection Reader implementation.</para>

-      

-      <para>If the <literal>&lt;startAt&gt;</literal> element is used, the Collection

-        Reader descriptor must define a single-valued configuration parameter with the

-        name <literal>startNumber</literal>. It can declare this value to be of any type;

-        the value passed in this XML element must be convertible to that type.</para>

-      

-      <para>A typical use is to declare this to be an integer type, and to pass the sequential

-        document number where processing should start. An alternative implementation

-        might take a specific document ID; the collection reader could search through its

-        collection until it reaches this ID and then start there.</para>

-      

-      <para>This parameter will only make sense if the particular collection reader is

-        implemented to use the <literal>startNumber</literal> configuration

-        parameter.</para>

-      

-      <para>The <literal>&lt;numToProcess&gt;</literal> element is an optional

-        element. It specifies the total number of entities to process. Use -1 to indicate ALL.

-        If not defined, the number of entities to process will be taken from the Collection

-        Reader configuration. If present, this value overrides the Collection Reader

-        configuration.</para>
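-      
-      <para>As a minimal sketch of how these two elements combine, the following fragment
-        asks the CPM to pass 101 to the Collection Reader as <literal>startNumber</literal>
-        and to process at most 500 entities. The values are illustrative only, and the
-        fragment assumes a Collection Reader that declares an Integer
-        <literal>startNumber</literal> parameter and interprets it as a sequential
-        document number:
-        
-        <programlisting><![CDATA[<cpeConfig>
-  <!-- illustrative value; handed to the Collection Reader as startNumber -->
-  <startAt>101</startAt>
-  <!-- illustrative value; -1 would mean "process all entities" -->
-  <numToProcess>500</numToProcess>
-  <deployAs>immediate</deployAs>
-</cpeConfig>]]></programlisting></para>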

-      

-      <para>The <literal>&lt;outputQueue&gt;</literal> element is an optional element.

-        It enables plugging in a custom implementation for the Output Queue. When omitted,

-        the CPM will use a default output queue that is based on a First-in First-out (FIFO)

-        model.</para>

-      

-      <para>The UIMA SDK provides a second implementation for the Output Queue that can be

-        plugged in to the CPM, named <quote>

-        <literal>org.apache.uima.collection.impl.cpm.engine.SequencedQueue</literal>

-        </quote>.</para>

-      

-      <para>This implementation supports handling very large documents that are split into

-        <quote>chunks</quote>; it provides a delivery mechanism that ensures the

-        sequential order of the chunks using information carried in the CAS metadata. This

-        metadata, which is required for this implementation to work correctly, must be added

-        as an instance of a Feature Structure of type

-        <literal>org.apache.es.tt.DocumentMetaData</literal> and referred to by an

-        additional feature named <literal>esDocumentMetaData</literal> in the special

-        instance of <literal>uima.tcas.DocumentAnnotation</literal> that is

-        associated with the CAS. This is usually done by the Collection Reader; the instance

-        contains the following features:

-        

-        <variablelist>

-          <varlistentry>

-            <term>sequenceNumber</term>

-            <listitem><para>[Number] the sequential number of a chunk, starting at 1. If

-              not a chunk (i.e., a complete document), the value should be 0.</para>

-              </listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>documentId</term>

-            <listitem><para>[Number] current document id. Chunks belonging to the same

-              document have the same document id.</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>isCompleted</term>

-            <listitem><para>[Number] 1 if the chunk is the last in a sequence, 0

-              otherwise.</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>url</term>

-            <listitem><para>[String] document url.</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>throttleID</term>

-            <listitem><para>[String] special attribute currently used by

-              OmniFind.</para></listitem>

-          </varlistentry>

-        </variablelist></para>
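-      
-      <para>For illustration, a Collection Reader that splits one document into three
-        chunks might set the features as sketched below. The document id 42 is a
-        hypothetical value chosen for this example; only the relationship between the
-        features follows from the descriptions above:
-        
-        <programlisting>sequenceNumber=1  documentId=42  isCompleted=0  (first chunk)
-sequenceNumber=2  documentId=42  isCompleted=0
-sequenceNumber=3  documentId=42  isCompleted=1  (last chunk)</programlisting></para>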

-      

-      <para>This implementation of a sequenced queue supports proper sequencing of CASes in

-        CPM deployments that use document chunking. Chunking is a technique of splitting

-        large documents into pieces to reduce overall memory consumption. Chunking does not

-        depend on the number of CASes in the CAS Pool. It works equally well with one or more

-        CASes in the CAS Pool. Each chunk is packaged in a separate CAS and placed in the Work

-        Queue. If the CAS Pool is depleted, the CollectionReader thread is suspended until a

-        CAS is released back to the pool by the processing threads. A document may be split into

-        1, 2, 3 or more chunks that are analyzed independently. In order to reconstruct the

-        document correctly, the CAS Consumer can depend on receiving the chunks in the same

-        sequential order that the chunks were <quote>produced</quote>, when this

-        sequenced queue implementation is used. To plug this sequenced queue into the CPM, use

-        the following specification:

-        

-        

-        <programlisting>&lt;outputQueue dequeueTimeout="100000" queueClass=

-"org.apache.uima.collection.impl.cpm.engine.SequencedQueue"/&gt;</programlisting>

-        

-        where the mandatory <literal>queueClass</literal> attribute defines the name of

-        the class and the second mandatory attribute, <literal>dequeueTimeout</literal>,

-        specifies the maximum number of milliseconds to wait for the expected chunk.</para>

-      

-      <note><para>The value for this timeout must be carefully determined to avoid

-      excessive occurrences of timeouts. Typically, the size of a chunk and the type of

-      analysis being done are the most important factors when deciding on the value for the

-      timeout. The larger the chunk and the more complicated the analysis, the more time it takes

-      for the chunk to go from source to sink. You may specify 0, in which case, the timeout is 

-      disabled - i.e., it is equivalent to an infinitely long timeout.</para></note>

-      

-      <para>If the chunk doesn&apos;t arrive in the configured time window, the entire

-        document is presumed to be invalid and the CAS is dropped from further processing.

-        This action occurs regardless of any other error action specification. The

-        SequencedQueue invalidates the document, adding the offending document&apos;s

-        metadata to a local cache of invalid documents. </para>

-      

-      <para>If a timeout occurs, the CPM notifies all registered listeners (see <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.cpe.using_listeners"/>) by calling

-        entityProcessComplete(). As part of this call, the SequencedQueue will pass null

-        instead of a CAS as the first argument, and a special exception &ndash;

-        CPMChunkTimeoutException. Null is passed as the first argument because the

-        timeout means that the expected chunk was not received within the

-        configured timeout window, so there is no CAS available when the timeout event

-        occurs.</para>

-      

-      <para>The CPMChunkTimeoutException object includes an API that allows the listener

-        to retrieve the offending document id as well as the other metadata attributes as

-        defined above. These attributes are part of each chunk&apos;s metadata and are added

-        by the Collection Reader.</para>

-      

-      <para>Each chunk that SequencedQueue works on is subjected to a test to determine if the

-        chunk belongs to an invalid document. This test checks the chunk&apos;s metadata

-        against the data in the local cache. If there is a match, the chunk is dropped. This

-        check is performed only for chunks; complete documents are not subject to this

-        check.</para>

-      

-      <para>If there is an exception during the processing of a chunk, the CPM sends a

-        notification to all registered listeners. The notification includes the CAS and an

-        exception. When the listener notification is completed, the CPM also sends separate

-        notifications, containing the CAS, to the Artifact Producer and the

-        SequencedQueue. The intent is to stop adding new chunks to the Work Queue that belong

-        to an <quote>invalid</quote> document and also to deal with chunks that are

-        en-route, being processed by the processing threads.</para>

-      

-      <para>In response to the notification, the Artifact Producer will drop and release

-        back to the CAS Pool all CASes that belong to an <quote>invalid</quote> document.

-        Currently, there is no support in the CollectionReader&apos;s API to tell it to stop

-        generating chunks. The CollectionReader keeps producing the chunks but the

-        Artifact Producer immediately drops/releases them to the CAS Pool. Before the CAS is

-        released back to the CAS Pool, the Artifact Producer sends a notification to all

-        registered listeners. This notification includes the CAS and an exception &ndash;

-        SkipCasException.</para>

-      

-      <para>In response to the notification of an exception involving a chunk, the

-        SequencedQueue retrieves from the CAS the metadata and adds it to its local cache of

-        <quote>invalid</quote> documents. All chunks de-queued from the OutputQueue and

-        belonging to <quote>invalid</quote> documents will be dropped and released back to

-        the CAS Pool. Before dropping the CAS, the CPM sends a notification to all registered

-        listeners. The notification includes the CAS and a SkipCasException.</para>

-      

-      <para>The <literal>&lt;checkpoint&gt;</literal> element is an optional element.

-        It specifies a CPE checkpoint file, checkpoint frequency, and strategy for

-        checkpoints (time or count based). At checkpoint time, the CPM saves status

-        information and statistics to the checkpoint file. The checkpoint file is specified

-        in the <literal>file</literal> attribute, which has the same form as the

-        <literal>href</literal> attribute of the <literal>&lt;include&gt;</literal>

-        element described in <xref linkend="&tp;imports"/>. The

-        <literal>time</literal> attribute indicates that a checkpoint should be taken

-        every <literal>[Number]</literal> seconds, and the <literal>batch</literal>

-        attribute indicates that a checkpoint should be taken every

-        <literal>[Number]</literal> batches.</para>
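-      
-      <para>As a sketch, a time-based checkpoint taken every 300 seconds could be
-        configured as shown here. The file name is a hypothetical placeholder and the
-        interval is illustrative only:
-        
-        <programlisting>&lt;checkpoint file="cpe_checkpoint.dat" time="300"/&gt;</programlisting></para>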

-      

-      <para>The <literal>&lt;timerImpl&gt;</literal> element is optional. It is used to

-        identify a custom timer plug-in class to generate time stamps during the CPM

-        execution. The value of the element is a Java class name.</para>

-      

-      <para>The <literal>&lt;deployAs&gt;</literal> element indicates the type of CPM

-        deployment. Valid contents for this element include:

-        

-        <variablelist>

-          <varlistentry>

-            <term>vinciService</term>

-            <listitem><para>Vinci service exposing APIs for stop, pause, resume, and

-              getStats</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>interactive</term>

-            <listitem><para>provide command line menus (start, stop, pause,

-              resume)</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>immediate</term>

-            <listitem><para>run the CPM without menus or a service API</para></listitem>

-          </varlistentry>

-          <varlistentry>

-            <term>single-threaded</term>

-            <listitem><para>run the CPM in a single threaded mode. In this mode, the

-              Collection Reader, the Processing Pipeline, and the CAS Consumer Pipeline

-              are all running in one thread without the work queue and the output

-              queue.</para></listitem>

-          </varlistentry>

-        </variablelist></para>

-      

-    </section>

-    

-    <section id="&tp;descriptor.resource_manager_configuration">

-      <title>Resource Manager Configuration</title>

-      

-      <para>External resource bindings for the CPE may optionally be specified in an

-        element:

-        

-        

-        <programlisting>&lt;resourceManagerConfiguration href="..."/&gt;</programlisting></para>

-      

-      <para>For an introduction to external resources, refer to <olink

-          targetdoc="&uima_docs_tutorial_guides;"/> <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aae.accessing_external_resource_files"/>.</para>

-      

-      <para>In the <literal>resourceManagerConfiguration</literal> element, the value

-        of the href attribute refers to another file that contains definitions and bindings

-        for the external resources used by the CPE. The format of this file is the same as the XML

-        snippet <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings"/>

-        . For example, in a CPE containing an aggregate analysis engine with two annotators,

-        and a CAS Consumer, the following resource manager configuration file would bind

-        external resource dependencies in all three components to the same physical

-        resource:

-        

-        

-        <programlisting><![CDATA[<resourceManagerConfiguration>

-

-  <!-- Declare Resource -->

-

-  <externalResources>

-    <externalResource>

-      <name>ExampleResource</name>

-      <fileResourceSpecifier>

-        <fileUrl>file:MyResourceFile.dat</fileUrl>

-      </fileResourceSpecifier>

-    </externalResource>

-  </externalResources>

-

-  <!-- Bind component resource dependencies to ExampleResource -->

-

-  <externalResourceBindings>

-    <externalResourceBinding>

-      <key>MyAE/annotator1/myResourceKey</key>

-      <resourceName>ExampleResource</resourceName>

-    </externalResourceBinding>

-

-    <externalResourceBinding>

-      <key>MyAE/annotator2/someResourceKey</key>

-      <resourceName>ExampleResource</resourceName>

-    </externalResourceBinding>

-

-    <externalResourceBinding>

-      <key>MyCasConsumer/otherResourceKey</key>

-      <resourceName>ExampleResource</resourceName>

-    </externalResourceBinding>

-

-  </externalResourceBindings>

-

-</resourceManagerConfiguration>]]></programlisting></para>

-      

-      <para>In this example, <literal>MyAE</literal> and

-        <literal>MyCasConsumer</literal> are the names of the Analysis Engine and CAS

-        Consumer, as specified by the name attributes of the CPE&apos;s

-        <literal>&lt;casProcessor&gt;</literal> elements.

-        <literal>annotator1</literal> and <literal>annotator2</literal> are the

-        annotator keys specified within the Aggregate AE Descriptor, and

-        <literal>myResourceKey</literal>, <literal>someResourceKey</literal>, and

-        <literal>otherResourceKey</literal> are the keys of the resource dependencies

-        declared in the individual annotator and CAS Consumer descriptors.</para>

-      

-    </section>

-    

-    <section id="&tp;descriptor.example">

-      <title>Example CPE Descriptor</title>

-      

-      

-      <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>

-<cpeDescription>

-  <collectionReader>

-    <collectionIterator>

-      <descriptor>

-        <import location=

-           "../collection_reader/FileSystemCollectionReader.xml"/>

-      </descriptor>

-    </collectionIterator>

-  </collectionReader>

-  <casProcessors dropCasOnException="true" casPoolSize="1" 

-      processingUnitThreadCount="1">

-    <casProcessor deployment="integrated" 

-      name="Aggregate TAE - Name Recognizer and Person Title Annotator">

-      <descriptor>

-        <import location=

-           "../analysis_engine/NamesAndPersonTitles_TAE.xml"/>

-      </descriptor>

-      <deploymentParameters/>

-      <filter/>

-      <errorHandling>

-        <errorRateThreshold action="terminate" value="100/1000"/>

-        <maxConsecutiveRestarts action="terminate" value="30"/>

-        <timeout max="100000"/>

-      </errorHandling>

-      <checkpoint batch="1"/>

-    </casProcessor>

-    <casProcessor deployment="integrated" name="Annotation Printer">

-      <descriptor>

-        <import location="../cas_consumer/AnnotationPrinter.xml"/>

-      </descriptor>

-      <deploymentParameters/>

-      <filter/>

-      <errorHandling>

-        <errorRateThreshold action="terminate" value="100/1000"/>

-        <maxConsecutiveRestarts action="terminate" value="30"/>

-        <timeout max="100000"/>

-      </errorHandling>

-      <checkpoint batch="1"/>

-    </casProcessor>

-  </casProcessors>

-  <cpeConfig>

-    <numToProcess>1</numToProcess>

-    <deployAs>immediate</deployAs>

-    <checkpoint file="" time="3000"/>

-    <timerImpl/>

-  </cpeConfig>

-</cpeDescription>]]></programlisting>

-    </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-references/src/docbook/references.xml b/uima-docbook-references/src/docbook/references.xml
deleted file mode 100644
index 8f0268a..0000000
--- a/uima-docbook-references/src/docbook/references.xml
+++ /dev/null
@@ -1,39 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<book lang="en">

-  <title>UIMA References</title>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>

-

-  <toc/>

-  

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.javadocs.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.xml.component_descriptor.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.xml.cpe_descriptor.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.cas.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.jcas.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.pear.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.xmi.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.compress.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.json.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.config.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.resources.xml"/>

-</book>

diff --git a/uima-docbook-tools/pom.xml b/uima-docbook-tools/pom.xml
deleted file mode 100644
index 611f39c..0000000
--- a/uima-docbook-tools/pom.xml
+++ /dev/null
@@ -1,50 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements.  See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership.  The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License.  You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied.  See the License for the
-   specific language governing permissions and limitations
-   under the License.    
--->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-  <modelVersion>4.0.0</modelVersion>
-
-  <parent>
-    <groupId>org.apache.uima</groupId>
-    <artifactId>uimaj-parent</artifactId>
-    <version>3.5.0-SNAPSHOT</version>
-    <relativePath>../uimaj-parent/pom.xml</relativePath>
-  </parent>
-
-  <artifactId>uima-docbook-tools</artifactId>
-  <packaging>pom</packaging>
-  <name>Apache UIMA SDK Documentation - tools</name>
-  <url>${uimaWebsiteUrl}</url>
-
-  <properties>
-    <!-- next property is the name of the top file under src/docbook without trailing .xml -->
-    <bookNameRoot>tools</bookNameRoot>
-  </properties>
-
-  <repositories>
-    <repository>
-      <id>apache.snapshots</id>
-      <name>Apache Snapshot Repository</name>
-      <url>https://repository.apache.org/snapshots</url>
-      <releases>
-        <enabled>false</enabled>
-      </releases>
-    </repository>
-  </repositories>
-</project>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.annotation_viewer.xml b/uima-docbook-tools/src/docbook/tools.annotation_viewer.xml
deleted file mode 100644
index 1f00cc1..0000000
--- a/uima-docbook-tools/src/docbook/tools.annotation_viewer.xml
+++ /dev/null
@@ -1,79 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.annotation_viewer/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.annotation_viewer">

-  <title>Annotation Viewer</title>

-  

-  <para>The <emphasis>Annotation Viewer</emphasis> is a tool for viewing analysis results

-    that have been saved to your disk as <emphasis>external XML representations of the

-    CAS</emphasis>. These are saved in a particular format called XMI. In the UIMA SDK, XML

-    versions of CASes can be generated by:</para>

-  

-  <itemizedlist><listitem><para>Running the Document Analyzer (see <olink

-      targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>), which

-    saves XML representations of the CAS to the specified output directory.</para>

-    </listitem>

-    

-    <listitem><para>Running a Collection Processing Engine that includes the

-      <emphasis>XMI Writer </emphasis>CAS Consumer

-      (<literal>examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal>).

-      </para></listitem>

-    

-    <listitem><para>Explicitly creating XML representations of the CAS from your own

-      application using the org.apache.uima.cas.impl.XmiCasSerializer class. The best way

-      to learn how to do this is to look at the example code for the XMI Writer CAS Consumer,

-      located in

-      <literal>examples/src/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java</literal>.

-      <footnote><para>An older, different XML format for the CAS is also provided,

-      mainly for backwards compatibility. This form is called XCAS, and you can see examples

-      of its use in

-      <literal>examples/src/org/apache/uima/examples/cpe/XCasWriterCasConsumer.java</literal>.

-      </para></footnote> </para></listitem></itemizedlist>

-  <note><para>The Annotation Viewer only shows CAS views where the Sofa data type is a String.

-  </para></note>

-  

-  <para>You can run the Annotation Viewer by executing the

-    <literal>annotationViewer</literal> shell script located in the bin directory of the

-    UIMA SDK or the "UIMA Annotation Viewer" Eclipse run configuration in the

-    <literal>uimaj-examples</literal> project. This will open the following window:   

-    

-    <screenshot>

-  <mediaobject>

-    <imageobject>

-      <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-    </imageobject>

-    <textobject><phrase>Screenshot of annotationViewer</phrase></textobject>

-  </mediaobject>

-</screenshot></para>

-  

-  <para>Select an input directory (which must contain XMI files), and the descriptor for the

-    AE that produced the Analysis (which is needed to get the type system for the analysis).

-    Then press the <quote>View</quote> button.</para>

-  

-  <para>This will bring up a dialog where you can select a viewing format and double-click on a

-    document to view it. This dialog is the same as the one that is described in <olink

-      targetdoc="&uima_docs_tools;"

-      targetptr="ugr.tools.doc_analyzer.viewing_results"/>.</para>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.caseditor.xml b/uima-docbook-tools/src/docbook/tools.caseditor.xml
deleted file mode 100644
index 2dcf063..0000000
--- a/uima-docbook-tools/src/docbook/tools.caseditor.xml
+++ /dev/null
@@ -1,639 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
-<!ENTITY imgroot "images/tools/tools.caseditor/" >
-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-<chapter id="ugr.tools.ce">
-
-	<title>Cas Editor User&apos;s Guide</title>
-	<titleabbrev>Cas Editor User&apos;s Guide</titleabbrev>
-
-	<section id="sandbox.caseditor.Introduction">
-		<title>Introduction</title>
-
-		<para>
-			The CAS Editor is an Eclipse-based annotation tool which supports manual and automatic
-			annotation (via running UIMA annotators) of CASes stored in files.
-			Currently, only text-based CASes are supported.
-		    The CAS Editor can visualize and edit all feature structures. 
-		    Feature Structures which are annotations can additionally be viewed and edited
-			directly on text.
-		</para>
-		
-		<!-- Note: In the HTML version the screen shot is too small and in the
-		PDF version it's almost too big; how can that be fixed?
-    
-    I looked at this, in PDF and Firefox, and the images are almost the same size
-    relative to the font size.  So I think it's working OK.
-    
-    There is code to scale things for pdf vs html in uima-build-resources in 
-    src/main/resources/docbook-shared/common/html-pdf.xsl -->
-    
-		<screenshot>
-			<mediaobject>
-				<imageobject>
-					<imagedata width="5.7in" format="PNG"
-						fileref="&imgroot;CasEditor.png" />
-				</imageobject>
-			</mediaobject>
-		</screenshot>
-	</section>
-
-	<section id="sandbox.caseditor.Launching">
-		<title>Launching the Cas Editor</title>
-		<para>
-			To open a CAS, the Cas Editor needs a compatible type system
-			and styling information that specifies how to display the types.
-			The styling information is created automatically by the Cas Editor,
-			but the type system file must be provided by the user.
-    </para>
-    
-    <para>A CAS in the XMI or XCAS format can be opened simply by clicking
-      on it, just as a text file is opened with the Eclipse text editor.</para>
-    
-    <section id="sandbox.caseditor.typeSystemSpec">
-      <title>Specifying a type system</title>
-      <para>    
-      The Cas Editor expects a type system file at the root of the project
-      named TypeSystem.xml.  If a type system cannot be found, this message is shown:
-    <screenshot>
-      <mediaobject>
-        <imageobject>
-          <imagedata scale="100" format="PNG"
-            fileref="&imgroot;ProvideTypeSystem.png" />
-        </imageobject>
-        <textobject>
-          <phrase>No type system available for the opened CAS.</phrase>
-        </textobject>
-      </mediaobject>
-    </screenshot>
-    
-    </para>
-    <para>     
-      If the type system file does not exist in this
-      location, you can point the Cas Editor to a specific
-      type system file.  
-      
-      You can also change the default type system location in the
-      properties page of the Eclipse project. To do that, right-click the project,
-      select Properties, go to the UIMA Type System tab, and specify the default
-      location for the type system file.
-    </para>   
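-    <para>
-      For orientation, a minimal type system file might look like the sketch below.
-      The descriptor name, the type name <literal>example.Person</literal> and the
-      descriptions are illustrative assumptions rather than anything the Cas Editor
-      requires; replace them with the types your CASes actually use.
-    </para>
-    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
-<!-- Hypothetical TypeSystem.xml placed at the root of the Eclipse project -->
-<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
-  <name>TypeSystem</name>
-  <description>Example type system for annotating persons</description>
-  <version>1.0</version>
-  <types>
-    <typeDescription>
-      <!-- example.Person is an assumed type name used only for illustration -->
-      <name>example.Person</name>
-      <description>Marks a span of text that names a person</description>
-      <supertypeName>uima.tcas.Annotation</supertypeName>
-    </typeDescription>
-  </types>
-</typeSystemDescription>]]></programlisting>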
-    </section>
-    
-		<para>
-			After the Cas Editor is opened, switch to the Cas Editor
-			Perspective to see all the Cas Editor related views.
-		</para>
-	</section>
-
-	<section id="sandbox.caseditor.annotation_editor">
-
-		<title>Annotation editor</title>
-		<para>
-			The annotation editor shows the text with annotations and
-			provides different views to show aspects of the CAS.
-			
-		</para>
-
-		<section id="ugr.tools.cas_editor.annotation_editor.editor">
-			<title>Editor</title>
-			<para>
-			    After the editor is open it shows the default sofa of the CAS.  
-          (Displaying another sofa is right now not possible.)
-				The editor has an associated, changeable CAS Type. This type is called the editor "mode".
-        		By default the editor only shows annotations of this type. Actions and views are
-				sensitive to this mode.  The next screen shows the display, where the mode is set to "Person":
-				
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata width="5.7in" format="PNG"
-								fileref="&imgroot;EditorOneType.png" />
-						</imageobject>
-					</mediaobject>
-				</screenshot>
-				
-				To change the mode for the editor, use the "Mode" menu in the editor context menu.
-				To open the context menu, right-click anywhere on the text.
-				
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata width="5.7in" format="PNG"
-								fileref="&imgroot;ModeMenu.png" />
-						</imageobject>
-					</mediaobject>
-				</screenshot>
-				        	
-				The current mode is displayed in the status line at the bottom and in the Style View.
-			</para>
-			
-      		<para>
-				It's possible to work with more than one annotation type at a time; the mode just selects the default annotation type
-				which can be marked with the fewest keystrokes. To show annotations of other types, use the "Show" menu in
-				the context menu.
-					
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata scale="100" format="PNG"
-								fileref="&imgroot;ShowAnnotationsMenu.png" />
-						</imageobject>
-					</mediaobject>
-				</screenshot>
-				
-				Alternatively, you may select the annotation types to be shown in the Style View.
-				
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata scale="100" format="PNG"
-								fileref="&imgroot;StyleView2.png" />
-						</imageobject>
-					</mediaobject>
-				</screenshot>
-        
-				The editor will show the additional selected types.
-				
-				<screenshot>
-					<mediaobject>
-						<imageobject>
-							<imagedata width="5.7in" format="PNG"
-								fileref="&imgroot;EditorAllTypes.png" />
-						</imageobject>
-					</mediaobject>
-				</screenshot>				
-				The annotation renderer and rendering layer can be changed in the Properties dialog. After the
-				change all editors which share the same type system will be updated.
-			</para>
-			<para>
-				The editor automatically selects annotations of the editor mode type that are near the
-				cursor. This selection is then synchronized or displayed in other views.
-			</para>
-			<para>
-				To create an annotation manually using the editor, mark a piece of text and then
-				press the enter key. This creates an annotation of the 
-        		type of the editor mode, having bounds corresponding to the selection.
-        		You can also use the "Quick Annotate" action from the context menu.
-			</para>
-			<para>
-				It is also possible to choose the annotation type; press
-				shift + enter (smart insert) or click on "Annotate" in the context menu for this.
-				A dialog will ask for the annotation type to create; either select the desired type or use
-				the associated key shortcut. In the screen shot below, pressing the "p" key
-				will create a Person annotation for "Obama".
-			</para>
-			
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="100" format="PNG"
-							fileref="&imgroot;ShiftEnter.png" />
-					</imageobject>
-				</mediaobject>
-			</screenshot>
-			
-			<para>
-				To delete an annotation, select it and press the delete
-				key. Only annotations of the editor mode can be deleted with this method.
-				To delete non-editor mode annotations use the Outline View.
-			</para>
-			
-			<para>
-			For annotation projects you can change the font size in the editor.
-			The default font size is 13. To change this open the Eclipse preference dialog, 
-			go to "UIMA Annotation Editor".  
-			</para>
-		</section>
-
-		<section id="sandbox.caseditor.annotation_editor.styling">
-			<title>Configure annotation styling</title>
-			<para>
-				The Cas Editor can visualize the annotations in multiple
-				highlighting colors and with different annotation drawing styles.
-				The annotation styling is defined per type system. When it is changed,
-				the appearance changes in all opened editors sharing a type system.
-			</para>
-		
-			<para>
-				The styling is initialized with a unique color for every
-				annotation type, and every annotation is drawn with
-				the Squiggles annotation style. You may adjust
-				the annotation styles and coloring depending on the project
-				needs.
-			</para>
-		
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="100" format="PNG"
-						fileref="&imgroot;StyleView.png" />
-				</imageobject>
-				</mediaobject>
-			</screenshot>
-			
-			<para>
-				The Cas Editor offers a property page to edit the
-				styling. To open this property page click on the "Properties"
-				button in the Styles view.
-			</para>
-			
-			<para>
-				The property page can be seen below. By clicking on one of the
-				annotation types, the color, drawing style and drawing layer can be edited on the right
-				side.
-			</para>
-			
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata width="5.7in" format="PNG"
-						fileref="&imgroot;StyleProperties.png" />
-				</imageobject>
-				</mediaobject>
-			</screenshot>
-			
-			<para>
-				The annotations can be visualized with one of the following 
-				annotation styles:
-			
-			
-				<table frame='all'><title>Style Table</title>
-					<tgroup cols='3' align='left' colsep='1' rowsep='1'>
-					<thead>
-						<row>
-						  <entry>Style</entry>
-						  <entry>Sample</entry>
-						  <entry>Description</entry>
-						</row>
-					</thead>
-					<tbody>
-					<row>
-						<entry>BACKGROUND</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Background.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>The background is drawn in the annotation color.</para>
-						</entry>
-					</row>
-					
-					<row>
-						<entry>TEXT_COLOR</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-TextColor.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>The text is drawn in the annotation color.</para>
-						</entry>
-					</row>
-
-					<row>
-						<entry>TOKEN</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Token.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>
-								The token style assumes that token annotations are always separated
-								by whitespace. A vertical line is drawn between two token annotations
-								only if they are not separated by whitespace.
-								The image on the left actually contains three annotations, one for "Mr", "."
-								and "Obama".
-							</para>
-						</entry>
-					</row>
-					
-					<row>
-						<entry>SQUIGGLES</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Squiggles.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>Squiggles are drawn under the annotation in the annotation color.</para>
-						</entry>
-					</row>
-					
-					<row>
-						<entry>BOX</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Box.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>A box in the annotation color is drawn around
-							the annotation.</para>
-						</entry>
-					</row>
-					
-					<row>
-						<entry>UNDERLINE</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Underline.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>A line in the annotation color is drawn below
-							the annotation.</para>
-						</entry>
-					</row>
-
-					<row>
-						<entry>BRACKET</entry>
-						<entry>							
-							<screenshot>
-								<mediaobject>
-									<imageobject>
-										<imagedata align="left" scale="100" format="PNG"
-											fileref="&imgroot;Style-Bracket.png" />
-										</imageobject>
-								</mediaobject>
-							</screenshot>
-						</entry>
-						<entry>
-							<para>An opening bracket is drawn around the first
-							character of the annotation and a closing bracket
-							is drawn around the last character of the annotation.</para>
-						</entry>
-					</row>
-					
-					</tbody>
-					</tgroup>
-				</table>
-			</para>
-			
-			<para>
-				The Cas Editor can draw the annotations in different
-				layers. If the spans of two annotations overlap, the annotation
-				in the higher layer is drawn over annotations in a lower 
-				layer. Depending on the drawing style it is possible to see
-				both annotations. The drawing order is defined by the layer
-				number: layer 0 is drawn first, then layer 1 and so on.
-				If annotations in the same layer overlap, it is not defined which
-				annotation type is drawn first.
-			</para>
-		
-		<!-- Add image to explain the layers -->
-		</section>
-		
-		<section id="ugr.tools.cas_editor.annotation_editor.cas_views">
-			<title>CAS view support</title>
-			<para>
-			The Annotation Editor can only display text Sofa CAS views. Displaying
-			CAS views with Sofas of different types is not possible and will show
-			an editor page to switch back to another CAS view. The Edit and Feature Structure Browser views
-			are still available and might be used to edit Feature Structures which belong to the CAS view.
-			</para>
-			<para>
-			To switch to another CAS view, right click in the editor to open
-			the context menu and choose "CAS Views" and the view the editor
-			should switch to. 
-			</para>
-		</section>
-		
-		<section id="ugr.tools.cas_editor.annotation_editor.outline">
-			<title>Outline view</title>
-			
-			<para>
-				The outline view gives an overview of the annotations which are
-				shown in the editor.  The annotations are grouped by type. There are
-				actions to increase or decrease the bounds of the selected annotation. There is
-				also an action to merge selected annotations. The outline has a second view mode where only
-				annotations of the current editor mode are shown. 
-
-			<!-- TODO: Replace image with a newer one -->
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="100" format="PNG"
-							fileref="&imgroot;Outline.png" />
-					</imageobject>
-				</mediaobject>
-			</screenshot>
-			
-			The style can be switched in the view menu, to a style where it only shows the annotations which 
-			belong to the current editor mode.
-			
-			<!-- TODO: Add image which visualizes this -->
-			</para>
-		</section>
-
-		<section
-			id="ugr.tools.cas_editor.annotation_editor.properties_view">
-			<title>Edit Views</title>
-			<para>
-				The Edit Views show details about the currently
-				selected annotations or feature structures. It is
-				possible to change primitive values in this view.
-				Referenced feature structures can be created and deleted,
-				including arrays. To link a feature structure with
-				other feature structures, it can be pinned to the edit
-				view. This means that it does not change if the
-				selection changes.
-			</para>
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="100" format="PNG"
-							fileref="&imgroot;EditView.png" />
-					</imageobject>
-				</mediaobject>
-			</screenshot>
-		</section>
-
-		<section id="ugr.tools.cas_editor.annotation_editor.fs_view">
-			<title>FeatureStructure View</title>
-			<para>
-				The FeatureStructure View lists all feature structures of
-				a specified type. The type is selected in the type
-				combobox.
-			</para>
-			
-			<para>
-				It's possible to create and delete feature structures of
-				every type.
-			</para>
-
-			<screenshot>
-				<mediaobject>
-					<imageobject>
-						<imagedata scale="100" format="PNG"
-							fileref="&imgroot;FSView.png" />
-					</imageobject>
-				</mediaobject>
-			</screenshot>
-		</section>
-	</section>
-	<section id="ugr.tools.cas_editor.custom_view">
-	<title>Implementing a custom Cas Editor View</title>
-	<para>
-	Custom Cas Editor views can be added
-	to rapidly create, access and/or change Feature Structures in the CAS.
-	While the Annotation Editor and its views offer support for general viewing and editing,
-	accessing and editing things in the CAS can be streamlined using a custom Cas Editor view.
-  A custom Cas Editor view can be
-	programmed to use a particular type system and optimized to quickly change or show something.
-	</para>
-	<para>
-	Annotation projects often need to track the annotation status of a CAS where a user
-	needs to mark which parts have been annotated or corrected. To do this with the Cas Editor
-	a user would need to use the Feature Structure Browser view to select the Feature Structure
-	and then edit it inside the Edit view.
-	A custom Cas Editor view could directly select and show the Feature Structure and offer 
-	a tailored user interface to change the annotation status.
-	Some features such as the name of the annotator could even be automatically filled in.
-	</para>
-	<para>
-	The creation of Feature Structures which are linked to existing annotations or Feature Structures
-	is usually difficult with the standard views. A custom view which can make assumptions about the
-	type system is usually needed to do this efficiently.
-	</para>
-	<section id="ugr.tools.cas_editor.custom_view.sample">
-	<title>Annotation Status View Sample</title>
-	<para>
-	The Cas Editor provides the CasEditorView class as a base class for views which need to access
-	the CAS which is opened in the current editor. It shows a "view not available" message when the current
-	editor does not show a CAS, no editor is opened at all or the current CAS view is incompatible with
-	the view. 
-	</para>
-	<para>
-	The following snippet shows how it is usually implemented:
-	<programlisting>
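-import org.apache.uima.caseditor.editor.CasEditorView;
-import org.apache.uima.caseditor.editor.ICasDocument;
-import org.apache.uima.caseditor.editor.ICasEditor;
-import org.eclipse.ui.part.IPageBookViewPage;
-
-public class AnnotationStatusView extends CasEditorView {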
-	
-  public AnnotationStatusView() {
-    super("The Annotation Status View is currently not available.");
-  }
-
-  @Override
-  protected IPageBookViewPage doCreatePage(ICasEditor editor) {
-    ICasDocument document = editor.getDocument();
-
-    if (document != null) {
-      return new AnnotationStatusViewPage(editor);
-    }
-
-    return null;
-  }
-}
-	</programlisting>
-	The doCreatePage method is called to create the actual view page. If the document
-	is null, the editor failed to load a document and is showing an error message.
-	If the document is not null but the CAS view is incompatible, the method
-	should return null to indicate that it has nothing to show. In this case the
-  "not available" message is displayed.
-	</para>
-	<para>
-	The next step is to implement the AnnotationStatusViewPage. This is the page which
-	gets the CAS as input and needs to provide the user with a UI to change the Annotation
-	Status Feature Structure.
-	<programlisting>
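-import org.apache.uima.cas.CAS;
-import org.apache.uima.cas.FeatureStructure;
-import org.apache.uima.caseditor.editor.ICasDocument;
-import org.apache.uima.caseditor.editor.ICasEditor;
-import org.eclipse.swt.widgets.Composite;
-import org.eclipse.ui.part.Page;
-
-public class AnnotationStatusViewPage extends Page {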
-  
-  private ICasEditor editor;
-  
-  AnnotationStatusViewPage(ICasEditor editor) {
-    this.editor = editor;
-  }
-  
-  ...
-  
-  public void createControl(Composite parent) {
-  
-    // create ui elements here
-    
-    ...
-    
-    ICasDocument document = editor.getDocument();
-    CAS cas = document.getCAS();
-    
-    // Retrieve Annotation Status FS from CAS
-    // and initialize the ui elements with it
-    
-    FeatureStructure statusFS;
-    
-    ...
-    
-    // Add event listeners to the ui element
-    // to save an update to the CAS
-    // and to advertise a change
-    
-    ...
-    
-    // Send update event
-    document.update(statusFS);
-    
-  }
-}
-	</programlisting>
-	The above code sketches out how a typical view page is implemented. The CAS can be directly used
-	to access any Feature Structures or annotations stored in it.
-	When something is modified (added, removed, or changed), that must be advertised via the ICasDocument object.
-	It has multiple notification methods which send an event so that other views can be updated.
-	The view itself can also register a listener to receive CAS change events. 
-	</para>
-	</section>
-	</section>
-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.cde.xml b/uima-docbook-tools/src/docbook/tools.cde.xml
deleted file mode 100644
index 617f3c7..0000000
--- a/uima-docbook-tools/src/docbook/tools.cde.xml
+++ /dev/null
@@ -1,1450 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.cde/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-<!ENTITY uima_docs_overview_title "UIMA Overview &amp; SDK Setup" >

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.cde">

-  <title>Component Descriptor Editor User&apos;s Guide</title>

-  <titleabbrev>CDE User&apos;s Guide</titleabbrev>

-  

-  <para>The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based

-    interface for creating and editing UIMA XML descriptors. It supports most of the

-    descriptor formats, except the Collection Processing Engine descriptor, the PEAR

-    package descriptor and some remote deployment descriptors.</para>

-  

-  <section id="ugr.tools.cde.launching">

-    <title>Launching the Component Descriptor Editor</title>

-    

-    <para>Here&apos;s how to launch this tool on a descriptor contained in the examples. This

-      presumes you have installed the examples as described in the SDK Installation and Setup

-      chapter.</para>

-    

-    <itemizedlist spacing="compact"><listitem><para>Expand the uimaj-examples

-      project in the Eclipse Navigator or Package Explorer view</para></listitem>

-      

-      <listitem><para>Within this project, browse to the file

-        descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</para></listitem>

-      

-      <listitem><para>Right-click on this file and select Open With &rarr; Component

-        Descriptor Editor. (If this option is not present, check to make sure you installed

-        the plug-ins as described in <olink targetdoc="&uima_docs_overview;"

-          targetptr="ugr.ovv.eclipse_setup.installation"/> of the <olink targetdoc="&uima_docs_overview;"/> book. 

-        The EMF plugin is also

-        required.)</para></listitem>

-      

-      <listitem><para>This should open a graphical editor and display the contents of the

-        RoomNumberAnnotator descriptor. </para></listitem></itemizedlist>

-    

-  </section>

-  

-  <section id="ugr.tools.cde.creating_new_ae_descriptor">

-    <title>Creating a New AE Descriptor</title>

-    

-    <para>A new AE descriptor file may be created by selecting the File &rarr; New &rarr;

-      Other... menu. This brings up the following dialog:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the

-      Next &gt; button, the following dialog is displayed. We will cover creating other kinds

-      of components later in the documentation.

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.2in" format="JPG" fileref="&imgroot;image004.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse

-        after pushing Next</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>After entering the appropriate parent folder and file name, and clicking Finish,

-      an initial AE descriptor file is created with the given name, and the descriptor is

-      opened up within the Component Descriptor Editor.</para>

-    

-    <para>At this point, the display inside the Component Descriptor Editor is the same

-      whether one started by creating a new AE descriptor, as in the preceding paragraph, or

-      one merely opened a previously created AE descriptor from, say, the Package Explorer

-      view. We show a previously created AE in the figure below:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of CDE showing overview page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>To see all the information shown in the main editor pane with less scrolling, double

-      click the title tab to toggle between the <quote>full screen</quote> and normal

-      views.</para>

-    

-    <para>It is possible to set the Component Descriptor Editor as the default editor for all

-      .xml files by going to Window &rarr; Preferences, and then selecting File Associations

-      on the left, and *.xml on the right, and finally by clicking on Component Descriptor

-      Editor, the Default button and then OK. If AE and Type System descriptors are not the

-      primary .xml files you work with within the Eclipse environment, we recommend not

-      setting the Component Descriptor Editor as your default editor for all .xml files. To

-      open an .xml file using the Component Descriptor Editor, if the Component Descriptor

-      Editor is not set as your default editor, right click on the file in the Package Explorer,

-      or other navigational view, and select Open With &rarr; Component Descriptor Editor.

-      This choice is remembered by Eclipse for subsequent open operations.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.cde.pages_within_the_editor">

-    <title>Pages within the Editor</title>

-    

-    <para>The Component Descriptor Editor follows a standard Eclipse paradigm for these

-      kinds of editors. There are several pages in the editor; each one can be selected, one at a

-      time, by clicking on the bottom tabs. The last page contains the actual XML source file

-      being edited, and is displayed as plain text.</para>

-    

-    <para>The same set of tabs appears at the bottom of each page in the Component Descriptor

-      Editor. The Component Descriptor Editor uses this <quote>multi-page editor</quote>

-      paradigm to give the user a view of conceptually distinct portions of the Descriptor

-      metadata in separate pages. At any point in time the user may click on the Source tab to

-      view the actual XML source. The Component Descriptor Editor is, in a way, just a fancy GUI

-      for editing the XML. The tabs provide quick access to the following pages: Overview,

-      Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes,

-      Resources, and Source. We discuss each of these pages in turn.</para>

-    

-    <section id="ugr.tools.cde.adjusting_display_of_pages">

-      <title>Adjusting the display of pages</title>

-      

-      <para>Most pages in the editor have a <quote>sash</quote> bar. This is a light gray bar

-        which separates sub-sections of the page. This bar can be dragged with the mouse to

-        adjust how the display area is split between the two sash panes. You can also change the

-        orientation of the Sash so it splits vertically, instead of horizontally, by

-        clicking on the small icons at the top right of the page that look like this:

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width=".7in" format="JPG" fileref="&imgroot;image008.jpg"/>

-        </imageobject>

-        <textobject><phrase>Changing orientation of two window split</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>All of the sections on a page have subtitles, with an indicator to the left which

-        you can click to collapse or expand that particular section. Collapsing sections can

-        sometimes be useful to free up screen area for other sections.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.overview_page">

-    <title>Overview Page</title>

-    

-    <para>Normally, the first page displayed in the Component Descriptor Editor is the

-      Overview page (the name of the page is shown in the GUI panel at the top left). If there is an

-      error reading and parsing the source, the Source page is shown instead, giving you the

-      opportunity to correct the problem. For many components, the Overview page contains

-      three sections: Implementation Details, Runtime Information and overall

-      Identification Information.</para>

-    

-    <section id="ugr.tools.cde.overview_page.implementation_details">

-      <title>Implementation Details</title>

-      

-      <para>In the Implementation Details section you specify the Implementation Language

-        and Engine Type. There are two kinds of Engines: Aggregate, and non-Aggregate (also

-        called Primitive). An Aggregate engine is one which is composed of additional

-        component engines and contains no code itself. Several of the pages in the Component

-        Descriptor Editor have different formats, depending on the engine type.</para>

-      

-    </section>

-    <section id="ugr.tools.cde.overview_page.runtime_info">

-      <title>Runtime Information</title>

-      

-      <para>Runtime information is only applicable for primitive engines and is disabled

-        for aggregates and other kinds of descriptors. This is where you specify the class name of the annotator

-        implementation, if you are doing a Java implementation, or the C++ shared object or dll name,

-        if you are doing a C++ implementation.  Most Analysis Engines will specify that

-        they update the CAS, and that they may be replicated (for performance reasons) when deployed. If

-        a particular Analysis Engine must see every CAS (for instance, if it is counting the

-        number of CASes), then uncheck the <quote>multiple deployment allowed</quote>

-        box. If the Analysis Engine doesn&apos;t update the CAS, uncheck the <quote>updates

-        the CAS</quote> box. (Most CAS Consumers do not update the CAS, and this parameter

-        defaults to unchecked for new CAS Consumer descriptors).</para>

-      

-      <para>Analysis engines that are written using the CAS Multiplier APIs 

-        (see <olink targetdoc="&uima_docs_tutorial_guides;"/> 

-             <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>) 

-        can create additional CASes for analysis. To specify that they

-        do this, check the <quote>returns new artifacts</quote> box.</para>
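-      <para>As a rough sketch of how these settings might appear on the Source page of a
-        primitive Java engine, consider the fragment below. The class name
-        <literal>org.example.MyAnnotator</literal> and the component name are hypothetical,
-        and the mapping of the three check boxes to the elements inside
-        <literal>operationalProperties</literal> is our reading of the descriptor format,
-        so verify it against a descriptor generated by the editor.</para>
-      <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
-  <primitive>true</primitive>
-  <!-- hypothetical annotator class; use your own implementation class here -->
-  <annotatorImplementationName>org.example.MyAnnotator</annotatorImplementationName>
-  <analysisEngineMetaData>
-    <name>My Annotator</name>
-    <operationalProperties>
-      <!-- "updates the CAS" -->
-      <modifiesCas>true</modifiesCas>
-      <!-- "multiple deployment allowed" -->
-      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
-      <!-- "returns new artifacts" (set to true for CAS Multipliers) -->
-      <outputsNewCASes>false</outputsNewCASes>
-    </operationalProperties>
-  </analysisEngineMetaData>
-</analysisEngineDescription>]]></programlisting>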

-      

-    </section>

-    

-    <section id="ugr.tools.cde.overview_page.overall_id_info">

-      <title>Overall Identification Information</title>

-      

-      <para>The Name should be a human-readable name that describes this component. The

-        Version, Vendor, and Description fields are optional, and are arbitrary

-        strings.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.aggregate_page">

-    <title>Aggregate Page</title>

-    

-    <para>For primitive Analysis Engines, Flow Controllers or Collection Processing

-      components, the Aggregate page is not used. For aggregate engines, the page looks like

-      this:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image010.jpg"/>

-        </imageobject>

-        <textobject><phrase>CDE Aggregate page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>On the left we see a list of component engines, and on the right information about the

-      flow. If you hover the mouse over an item in the list of component engines, that

-      engine&apos;s description meta data will be shown. If you right-click on one of these

-      items, you get an option to open that delegate descriptor in another editor instance.

-      Any changes you make, however, won&apos;t be seen until you close and reopen the editor

-      on the importing file.</para>

-    

-    <para>Engines can be added to the list on the left by clicking the Add button at the bottom of

-      the Component Engine section. This brings up one of the following two dialogs:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.875in" format="JPG" fileref="&imgroot;import-by-location.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding an Analysis Engine to an Aggregate, by location</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>This dialog lets you select

-      a descriptor from your workspace, or browse the file system to select a descriptor. 

-      </para>

-    

-    <para>Or, if you have selected to import by name, this dialog is shown:

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.296875in" format="JPG" fileref="&imgroot;import-by-name.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding an Analysis Engine to an Aggregate, by name</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-        

-    <para>You can specify that the import should be by Name (the name is looked up using both the

-      Project&apos;s class path, and DataPath), or by location. If it is by name, 

-      the dialog shows the available xml files on the class path, to pick from.  If the

-      one you want isn't showing, this means it isn't on the enclosing Eclipse Java Project's

-      classpath, nor on the datapath, and one of those needs to be updated to include the 

-      path to the resource.  If the name picked is

-      <literal>com/company/prod/xyz.xml</literal>, the name in

-      the descriptor will be <quote><literal>com.company.prod.xyz</literal></quote>.

-      The "Browse the file system..." button is disabled when import by name is checked, because

-      the file system is not the source of the imports; rather, it is the resources on the 

-      classpath or datapath that are.</para>

-    

-    <para>

-      If it is by location, the file reference is converted to a relative reference if

-      possible, in the descriptor.</para>
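-    <para>As a rough illustration (not taken from a real project), the two kinds of import
-      might appear in the aggregate descriptor source as sketched here. The keys
-      <literal>XyzByName</literal> and <literal>XyzByLocation</literal> and the relative path
-      are hypothetical; only the name <literal>com.company.prod.xyz</literal> comes from the
-      example above.</para>
-    <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers>
-  <!-- import by name: resolved through the classpath and datapath -->
-  <delegateAnalysisEngine key="XyzByName">
-    <import name="com.company.prod.xyz"/>
-  </delegateAnalysisEngine>
-  <!-- import by location: a reference relative to the importing descriptor -->
-  <delegateAnalysisEngine key="XyzByLocation">
-    <import location="../descriptors/xyz.xml"/>
-  </delegateAnalysisEngine>
-</delegateAnalysisEngineSpecifiers>]]></programlisting>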

-    

-    <para>The final selection at the bottom tells whether or not the selected engine(s)

-      should automatically be added to the end of the flow section (the right section on the

-      Aggregate page). The OK button does not become activated until a descriptor

-      file is selected.</para>

-    

-    <para>To remove an analysis engine from the component engine list simply select an engine

-      and click the Remove button, or press the delete key. If the engine is already in the flow

-      list you will be warned that deletion will also delete the specified engine from this

-      list.</para>

-    

-    <section id="ugr.tools.cde.aggregate_page.adding_components_more_than_once">

-      <title>Adding components more than once</title>

-      

-      <para>Components may be added to the left panel more than once. Each of these components

-        will be given a key which is unique. A typical reason this might be done is to use a

-        component in a flow several times, but have each use be associated with different

-        configuration parameters (different configuration parameters can be associated

-        with each instance).</para>

-    </section>

-    

-    <section

-      id="ugr.tools.cde.aggregate_page.adding_removing_components_from_flow">

-      <title>Adding or Removing components in a flow</title>

-      

-      <para>The button between the Component Engines list and the Flow List, labeled

-        <literal>&gt;&gt;</literal>, adds a chosen engine to the flow list and the button

-        labeled <literal>&lt;&lt;</literal> removes an engine from the flow list. To add an

-        engine to the flow list you must first select an engine from the left hand list, and then

-        press the <literal>&gt;&gt;</literal> button. Engines may appear any number of

-        times in the flow list. To remove an engine from the flow list, select an engine from the

-        right hand list and press the <literal>&lt;&lt;</literal> button.</para>

-    </section>

-    

-    <section id="ugr.tools.cde.aggregate_page.adding_remote_aes">

-      <title>Adding remote Analysis Engines</title>

-      

-      <para>There are two ways to add remote engines: add an existing descriptor, which

-        specifies a remote engine (just as if you were adding a non-remote engine) or use the

-        Add Remote button which will create a remote descriptor, save it, and then import it,

-        all in one operation. The Add Remote button enables you to easily specify the

-        information needed to create a remote service descriptor for a remote AE - one that

-        runs on a different computer connected over the network. There are 3 kinds of 

-        these: two are variants of the Service Client

-        descriptor, described in <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.service_client"/>;

-          the other is the UIMA-AS JMS Service descriptor, described in

-          <olink targetdoc="&uima_docs_as;"/> <olink targetdoc="&uima_docs_as;"  

-          targetptr="ugr.async.ov.concepts.jms_descriptor"/>. The Add

-        Remote button creates an instance of one of these descriptors, 

-        saves it as a file in the workspace, and

-        imports it into the aggregate.</para>

-      

-      <para>Of course, if you already have a remote service descriptor, you can add it to the

-        set of delegates using the <code>Add</code> button, just like adding other kinds of analysis engines.</para>

-      

-      <para>After clicking on <code>Add Remote</code>, the following dialog is displayed:

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.9in" format="JPG" fileref="&imgroot;image014v2.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding a remote client to an aggregate</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>To define a remote service you specify the Service Kind, Protocol Service Type,

-        URI and Key. You can also specify a Timeout in milliseconds, used by the JMS services,

-        and a VNS Host and Port used by the Vinci Service. 

-        The JMS service has additional timeouts and other parameters you may specify. 

-        Just like when one adds an engine from

-        the file system, you have the option of adding the engine to the end of the flow. The

-        Component Descriptor Editor currently only supports Vinci services using

-        this dialog.</para>

-      

-      <para>Remote engines are added to the descriptor using the

-        &lt;import ... &gt; syntax. The information you specify here is saved in the Eclipse

-        project as a file, using a generated name, &lt;key-name&gt;.xml, where

-        &lt;key-name&gt; is the name you listed as the Key. Because of this, the key-name must

-        be a valid file name. If you want a different name, you can change the path information

-        in the dialog box.</para>
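-      <para>For orientation, a Vinci service client descriptor saved this way might look
-        roughly like the following sketch. The service name, host, port and timeout are
-        placeholder values, and the exact content the editor generates may differ; the
-        Service Client descriptor reference remains the authoritative source for the
-        format.</para>
-      <programlisting><![CDATA[<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
-  <resourceType>AnalysisEngine</resourceType>
-  <!-- for Vinci, the URI is the name under which the service is registered -->
-  <uri>my.hypothetical.ServiceName</uri>
-  <protocol>Vinci</protocol>
-  <!-- timeout in milliseconds (placeholder value) -->
-  <timeout>60000</timeout>
-  <parameters>
-    <!-- location of the Vinci Name Server (placeholder values) -->
-    <parameter name="VNS_HOST" value="vns.example.org"/>
-    <parameter name="VNS_PORT" value="9000"/>
-  </parameters>
-</uriSpecifier>]]></programlisting>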

-    </section>

-    

-    <section id="ugr.tools.cde.aggregate_page.connecting_to_remote_services">

-      <title>Connecting to Remote Services</title>

-      

-      <para>If you are using the Vinci protocol, it requires that you specify the location of

-        the Vinci Name Server (an IP address and a Port number). You can specify these in the

-        service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu

-        item: Window &rarr; Preferences... &rarr; UIMA Preferences.

-      </para>

-      

-      <para>If the remote service

-        is available (up and running), additional operations become possible. For

-        instance, hovering the mouse over the remote descriptor will show the description

-        metadata from the remote service.</para>

-    </section>

-    

-    <section id="ugr.tools.cde.aggregate_page.finding_aes_by_searching">

-      <title>Finding Analysis Engines by searching</title>

-      

-      <para>The next button that appears between the component engine list and the flow list

-        is the Find AE button. When this button is pressed the following dialog is displayed,

-        which allows one to search for AEs by name, by input or output types, or by a combination

-        of these criteria. This function searches the existing Eclipse workspace for

-        matching *.xml descriptor source files; it does not look inside Jar files.

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.3in" format="JPG" fileref="&imgroot;image016.jpg"/>

-        </imageobject>

-        <textobject><phrase>Searching for an AE to add to an aggregate</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>The search automatically adds a <quote>match any characters</quote> - style

-        (*) wildcard at the beginning and end of anything entered. Thus, if person is

-        specified for an output type, a <quote>*person*</quote> search is performed. Such a

-        search would match such things as <quote>my.namespace.person</quote> and

-        <quote>person.governmentOfficial.</quote> One can search in all projects or one

-        particular project. The search does an implicit <emphasis>and</emphasis> on all

-        fields which are left non-blank.</para>

-    </section>

-    

-    <section id="ugr.tools.cde.aggregate_page.component_engine_flow">

-      <title>Component Engine Flow</title>

-      

-      <para>The UIMA SDK currently supports three kinds of sequencing flows: Fixed,

-        CapabilityLanguageFlow, and user-defined

-        (see <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints"/>).

-        The first two require specification of a linear flow sequence;

-        this linear flow sequence can also be read by a user-defined flow controller (what use

-        is made of it is up to the user-defined flow controller). The Component Engine Flow

-        section allows specification of these items.</para>

-      

-      <para>The pull-down labeled Flow Kind picks between the three flow models. When the

-        user-defined flow is selected, the Browse and Search buttons become enabled to let

-        you pick the flow controller XML descriptor to import.

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.8in" format="JPG" fileref="&imgroot;image018.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying flow control</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>The key name value is set automatically from the XML descriptor being imported,

-        and enables parameters to be overridden for that descriptor (see following

-        sections).</para>

-      

-      <para>The Up and Down buttons to the right in the Flow section are activated when an

-        engine in the flow is selected. The Up button moves the selected engine up one place in

-        the execution order, and the Down button moves the selected engine down one place in the

-        execution order. Remember that engines can appear multiple times in the flow (or not

-        at all).</para>
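-      <para>In the descriptor source, a Fixed flow could be sketched as follows. The keys are
-        hypothetical; in this sketch <literal>Tokenizer</literal> appears in the flow twice,
-        and any delegate whose key is not listed would not be part of the fixed
-        sequence.</para>
-      <programlisting><![CDATA[<flowConstraints>
-  <fixedFlow>
-    <!-- each node names a delegate by its key, in execution order -->
-    <node>Tokenizer</node>
-    <node>NameRecognizer</node>
-    <node>Tokenizer</node>
-  </fixedFlow>
-</flowConstraints>]]></programlisting>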

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.parm_definition">

-    <title>Parameters Definition Page</title>

-    

-    <para>There are two pages for parameters: the first one is where parameters are defined,

-      and the second one is where the parameter settings are configured. The first page is the

-      Parameter Definition page and has two alternatives, depending on whether the

-      descriptor is an Aggregate. We start with a description of parameter definitions

-      for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow

-      Controllers. Here is an example:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.2in" format="JPG" fileref="&imgroot;image020.jpg"/>

-        </imageobject>

-        <textobject><phrase>Parameter Definitions - not Aggregate</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>The first checkbox at the top simplifies things if you are not using Parameter

-      Groups (see the following section for a discussion of groups). In this case, leave the

-      check box unchecked. The main area shows a list of parameter definitions. Each

-      parameter has a name, which must be unique for this Analysis Engine. The first three

-      attributes specify whether the parameter can have a single or multiple values (an array

-      of values), whether it is Optional or Mandatory, and what value type it can hold

-      (String, Integer, Float, and Boolean).  If an external override name has been specified 

-      an attribute of "XO" is included. See <olink targetdoc="&uima_docs_ref;"/> 

-      <olink targetdoc="&uima_docs_ref;" 

-      targetptr="ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides"/>

-      for a discussion of external configuration parameter overrides.</para>
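-    <para>A single-valued, optional String parameter of the kind described above would
-      typically be stored in the descriptor source along the following lines. The parameter
-      name and description are made up for illustration.</para>
-    <programlisting><![CDATA[<configurationParameters>
-  <configurationParameter>
-    <!-- the name must be unique within this Analysis Engine -->
-    <name>DictionaryFile</name>
-    <description>Hypothetical example parameter</description>
-    <type>String</type>
-    <multiValued>false</multiValued>
-    <mandatory>false</mandatory>
-  </configurationParameter>
-</configurationParameters>]]></programlisting>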

-    

-    <para>In addition to using the buttons on the right to edit this information, you can

-      double-click a parameter to edit it, or remove (delete) a selected parameter by

-      pressing the delete key. Use the Add button to add a new parameter to the list.</para>

-    

-    <para>Parameters have an additional description field, which you can specify when you

-      add or edit a parameter. To see the value of the description, hover the mouse over the

-      item, as shown in the picture below. If the parameter has an external override name its value

-      is included in the hover.

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.5in" format="JPG" fileref="&imgroot;image022.jpg"/>

-        </imageobject>

-        <textobject><phrase>Parameter description shown in a hover message</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <section id="ugr.tools.cde.parm_definition.using_groups">

-      <title>Using groups</title>

-      

-      <para>The group concept for parameters arose from the observation that sets of

-        parameters were sometimes associated with different configuration needs. As an

-        example, you might have an Analysis Engine which needed different configuration

-        based on the language of a document.</para>

-      

-      <para>To use groups, you check the <quote>Use Parameter Groups</quote> box. When you

-        do this, you get the ability to add groups, and to define parameters within these

-        groups. You also get a capability to define <quote>Common</quote> parameters,

-        which are parameters which are defined for all groups. Here is a screen shot showing

-        some parameter groups in use:

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.2in" format="JPG" fileref="&imgroot;image024.jpg"/>

-        </imageobject>

-        <textobject><phrase>Using parameter groups</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>You can see the <quote>&lt;Common&gt;</quote> parameters as well as two

-        different sets of groups.</para>

-      

-      <para>The Default Group is an optional specification of what Group to use if the

-        parameter is not available for the group requested.</para>

-      

-      <para>The Search strategy specifies what to do when a parameter is not available for the

-        group requested. It can have the values of None, language_fallback, or

-        default_fallback. These are more fully described in the section 

-        <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration"/>

-        .</para>

-      

-      <para>Groups are added using the Add Group button. Once added, they can be edited or

-        removed, using the buttons to the right, or the standard gestures for editing

-        (double-clicking the item) and removing (pressing the delete key after an item is

-        selected). Removing a group removes all the parameter definitions in the group. If

-        you try to remove the <quote>&lt;Common&gt;</quote> group, it just removes the

-        parameters in the group.</para>

-      

-      <para>Each entry for a group in the table specifies one or more group names. For example,

-        the highlighted entry above specifies two groups: <quote>myNewGroup2</quote>

-        and <quote>mg3</quote>. The parameter definition underneath is considered to be in

-        both groups.</para>
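-      <para>As a sketch of how such a declaration might look in the descriptor source, the
-        fragment below declares one common parameter and one parameter that belongs to both
-        <literal>myNewGroup2</literal> and <literal>mg3</literal>. The parameter names, the
-        default group and the search strategy chosen here are assumptions made for
-        illustration.</para>
-      <programlisting><![CDATA[<configurationParameters defaultGroup="myNewGroup2"
-                         searchStrategy="default_fallback">
-  <!-- defined for every group -->
-  <commonParameters>
-    <configurationParameter>
-      <name>CommonParam</name>
-      <type>String</type>
-      <multiValued>false</multiValued>
-      <mandatory>false</mandatory>
-    </configurationParameter>
-  </commonParameters>
-  <!-- one entry naming two groups; the parameter is in both -->
-  <configurationGroup names="myNewGroup2 mg3">
-    <configurationParameter>
-      <name>GroupParam</name>
-      <type>Integer</type>
-      <multiValued>false</multiValued>
-      <mandatory>false</mandatory>
-    </configurationParameter>
-  </configurationGroup>
-</configurationParameters>]]></programlisting>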

-      

-    </section>

-    

-    <section id="ugr.tools.cde.parm_definition.adding">

-      <title>Adding or Editing a Parameter</title>

-

-      <para>When creating or modifying a parameter, both a unique name and a valid type must be

-       specified. The Description and External Override fields are optional.  The defaults for the two

-       checkboxes indicate a single-valued optional parameter in the example below:

-        <screenshot>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="4.7in" format="JPG" fileref="&imgroot;image025.jpg"/>

-            </imageobject>

-            <textobject><phrase>Adding or editing a parameter</phrase>

-            </textobject>

-          </mediaobject>

-        </screenshot></para>

-

-    </section>

-

-    <section id="ugr.tools.cde.parm_definition.aggregates">

-      <title>Parameter declarations for Aggregates</title>

-      

-      <para>Aggregates declare parameters which must always override a parameter setting

-        for a component making up the aggregate. They do this using the version of this page

-        which is shown when the descriptor is an Aggregate; here&apos;s an example:

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image026.jpg"/>

-        </imageobject>

-        <textobject><phrase>Aggregate parameters</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>There is an additional panel shown (on the right) which lists all of the

-        components by their key names, and shows for each of them their defined parameters. To

-        add a new override for one or more of these parameters to the aggregate, select the

-        component parameter you wish to override and push the Create Override button (or, you

-        can just double-click the component parameter). This will automatically add a

-        parameter of the same name (by default &ndash; you can change the name if you like) to

-        the aggregate, putting it into the same group(s) (if groups are being used in the

-        component &ndash; this is required), and setting the properties of the parameter to

-        match those of the component (this is required).</para>

-      <note><para>If the name of the parameter being added already is in use in the aggregate,

-      and the parameters are not compatible, a new parameter name is generated by suffixing

-      the name with a number. If the parameters are compatible, the selected component

-      parameter is added to the existing aggregate parameter, as an additional override. If

-      you don&apos;t want this behavior, but want to have a new name generated in this case,

-      push the Create non-shared Override button instead, or hold down the

-      <quote>shift</quote> key when double clicking the component parameter.</para>

-      

-      <para>The required / optional setting in the aggregate parameter is set to match that of

-        the parameter being overridden. You may want to make an optional delegate parameter

-        required. You can do this by changing that value manually in the source editor view.

-        </para></note>

-      

-      <para>In the above example, the user has just double-clicked the

-        <quote>TypeNames</quote> parameter in the <quote>NameRecognizer</quote>

-        component. This added that parameter to this aggregate under the <quote>&lt;Not in

-        any group&gt;</quote> section &ndash; since it wasn&apos;t part of a group.</para>
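-      <para>In the aggregate&apos;s descriptor source, an override like this one is recorded
-        as a parameter declaration with an <literal>overrides</literal> element naming the
-        delegate key and parameter. The following is a minimal sketch; the type and
-        multi-valued settings shown are assumptions, since in practice they are copied from
-        the component&apos;s own declaration.</para>
-      <programlisting><![CDATA[<configurationParameter>
-  <name>TypeNames</name>
-  <!-- type and multiValued are assumed here; they must match the delegate -->
-  <type>String</type>
-  <multiValued>true</multiValued>
-  <mandatory>false</mandatory>
-  <overrides>
-    <!-- delegateKey/parameterName -->
-    <parameter>NameRecognizer/TypeNames</parameter>
-  </overrides>
-</configurationParameter>]]></programlisting>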

-      

-      <para>Once you have added a parameter definition to the aggregate, you can use the

-        buttons on the right side of the left panel to add additional overrides or remove

-        parameters or their overrides. <phrase

-          id="ugr.tools.cde.parm_definition.removing_groups"> You can also remove

-        groups; removing a group is like removing all the parameter definitions in the

-        group.</phrase></para>

-      

-      <para>In addition to adding one parameter at a time from a component, you can also add all

-        the parameters for a group within a component, or all the parameters in the component,

-        by selecting those items.</para>

-      

-      <para>If you double-click (or push Create Override) the

-        <quote>&lt;Common&gt;</quote> group or a parameter in the &lt;Common&gt; group in

-        a component, a special group is created in the Aggregate consisting of all of the

-        groups in that component, and the overriding parameter (or parameters) are added to

-        that. This is done because each component can have different groups belonging to the

-        Common group notion; the Common group for a component is just shorthand for all the

-        groups in that component.</para>

-      

-      <para>The Aggregate&apos;s specification of the default group and search strategy

-        override any specifications contained in the components.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.parameter_settings">

-    <title>Parameter Settings Page</title>

-    

-    <para>The Parameter Settings page is rather straightforward; it is where the user

-      defines parameter settings for their engines. An example of such a page is given below:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image028.jpg"/>

-        </imageobject>

-        <textobject><phrase>Parameter settings page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>For single-valued parameters, the user simply types the default value into the

-      Value box on the right hand side. For multi-valued parameters the user should use the

-      Add, Edit and Remove buttons to manage the list of multiple parameter values.</para>

-    

-    <para>Values within groups are shown with each group separately displayed, to allow

-      configuring different values for each group.</para>

-    

-    <para>Values are checked for validity. For Boolean values in a list, use the words

-      <literal>true</literal> or <literal>false</literal>.</para>

-    <note><para>If you specify a value in a single-valued parameter, and then delete all the

-    characters in the value, the CDE will treat this as if you wanted to not specify any setting

-    for this parameter. In order to specify a 0 length string setting for a String-valued

-    parameter, you will have to manually edit the XML using the <quote>Source</quote> tab.

-    </para>

-    <para> For array valued parameters, if you remove all of the entries for a particular array

-      parameter setting, the XML will reflect a 0-length array. To change this to an

-      unspecified parameter setting, you will have to manually edit the XML using the

-      <quote>Source</quote> tab. </para></note>
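-    <para>The two manual edits mentioned in the note might look roughly like this on the
-      <quote>Source</quote> tab. The parameter names are hypothetical, and the exact
-      serialization the editor writes may differ from this sketch.</para>
-    <programlisting><![CDATA[<configurationParameterSettings>
-  <!-- a String parameter explicitly set to a 0-length string -->
-  <nameValuePair>
-    <name>EmptyStringParam</name>
-    <value>
-      <string></string>
-    </value>
-  </nameValuePair>
-  <!-- an array parameter set to a 0-length array; delete the whole
-       nameValuePair element to leave the parameter unspecified instead -->
-  <nameValuePair>
-    <name>EmptyArrayParam</name>
-    <value>
-      <array/>
-    </value>
-  </nameValuePair>
-</configurationParameterSettings>]]></programlisting>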

-    

-  </section>

-  

-  <section id="ugr.tools.cde.type_system">

-    <title>Type System Page</title>

-    

-    <para>This page declares the type system used by the annotator. For aggregates it is

-      derived by merging the type systems of all constituent AEs. The types used by the AE

-      constitute the language in which the inputs and outputs are described in the

-      Capabilities page and also affect the choice of indexes on the Indexes page. The Type

-      System page looks like the following:

-            

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="6.4in" format="JPG" fileref="&imgroot;limitJCasGenType.jpg"/>

-        </imageobject>

-        <textobject><phrase>Type System declaration page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Before discussing this page in detail, it is important to note that there are 3

-      settings that affect the operation of this page. These are accessed by selecting

-      UIMA &rarr; Settings (or by going to the Eclipse Window &rarr; Preferences &rarr; UIMA

-      Preferences) and checking or unchecking one of the following: <quote>Auto generate

-      .java files when defining types</quote>, 

-      <quote>Generate JCasGen classes only for types defined within the local project scope</quote> 

-      and <quote>Display fully qualified type

-      names.</quote></para>

-    

-    <para id="ugr.tools.cde.auto_jcasgen">When the Auto generate option is checked and the development language for the AE is

-      Java, any time a change is made to a type and the change is saved, the corresponding .java

-      files are generated using the JCasGen tool. The results are stored in the primary source

-      directory defined for the project. The primary source directory is that listed first

-      when you right click on your project and select Properties &rarr; Java Build Path, click

-      on the Source tab and look in the list box under the text that reads: <quote>Source folder

-      on build path.</quote> If no source folders are defined, you will get a warning that you

-      have no source folders defined and JCasGen will not be run. (For information on JCasGen

-      see <olink targetdoc="&uima_docs_tools;"/> 

-      <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>).

-      When JCasGen is run, you can monitor the progress of the generation by observing the

-      status on the Eclipse status line (normally at the bottom of the Eclipse window).

-      JCasGen runs on the fully-merged type system, consisting of the type specification

-      plus any imported type system, plus (for aggregates) the merged type systems of all the

-      components in an aggregate.</para>

-    

-  <warning><para>If the components of the aggregate have different definitions for the same 

-    type name, the CDE will show a warning.  It is possible to continue past this warning, 

-    in which case the CDE will produce the correct 

-    Java source files representing the merged types (that is, the

-    type definition that contains all of the features defined on that type by all of your

-    components).  However, it is not recommended to use this feature 

-    (of having different definitions for the same type name) since it can make it 

-    difficult to combine/package your annotator with others. See <olink targetdoc="&uima_docs_ref;"/> 

-    <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information.

-  </para></warning>    

-    

-    <note><para>In addition to running automatically, you can manually run JCasGen on the

-    fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from

-    the UIMA pulldown menu: </para></note>

-    

-    

-    <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.2in" format="JPG" fileref="&imgroot;image032.jpg"/>

-        </imageobject>

-        <textobject><phrase>Setting JCasGen options</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot>

-    

-    <para>When <quote>Generate JCasGen classes only for types defined within the local project scope</quote>

-    is checked, then JCasGen skips generating classes for types that are imported from sources outside this project.

-    This might be done, for instance, if you have an aggregate which is importing type systems from its delegates,

-    some of which are defined in other projects, and have JCasGen'd files already present in those other projects.

-    </para>

-    

-    <para>The UIMA settings and preferences for controlling this are used to initialize a particular instance of the

-    editor, when it is started.  Following that, you can override this setting, just for that editor, by checking or

-    unchecking the box shown on the type system page:</para>

-    

-    <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width=".73in" format="JPG" fileref="&imgroot;limitJCasGen.jpg"/>

-        </imageobject>

-        <textobject><phrase>Setting JCasGen options</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot>

-    

-    <note><para>If this is checked, and one of the types that would be excluded has merged type features, an error message

-    is issued - because JCasGen will need to be run for the combined (merged) type in order to get a class definition

-    that will work for this configuration (have access to all the features).  If this happens, you have to run without

-    limiting JCasGen, and manually delete any duplicated/unwanted source results.</para></note>

-    

-    

-    <para>When <quote>Display fully qualified type names</quote> is left unchecked, the

-      namespace of types is not displayed, i.e. if a fully qualified type name is

-      my.namespace.person, only the abbreviated type name person will be displayed. In the

-      Type page diagram shown above, <quote>Display fully qualified type names</quote> is

-      in fact unchecked.</para>

-    

-    <para>To add, edit, or remove types, use the buttons in the top left section. When

-      adding or editing types, fully qualified type names should of course be used,

-      regardless of whether the <quote>Display fully qualified type names</quote> is

-      unchecked. Removing or editing a type will have a cascading effect in that the type

-      removal/edit will affect inputs, outputs, indexes and type priorities in the natural

-      way.</para>

-    

-    <para>When a type is added, this dialog is shown:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.2in" format="JPG" fileref="&imgroot;image034.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding a type</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Type names should be specified using a namespace. The namespace is like a Java

-      package name, and serves to ensure type names are unique. It also serves as the package

-      name for the generated JCas classes. The namespace name is the set of names up to the last

-      period in the string.</para>

-    

-    <para>The supertype must be picked from an existing type. The entry field for the

-      supertype supports Eclipse-style content assist. To use it, put the cursor in the

-      supertype field, and type a letter or two of the supertype name (lower case is fine),

-      either starting with the name space, or just with the type name (without the name space),

-      and hold down the Control key and then press the spacebar. When you do this, you can see a

-      list of suitable matching types. You can then type more letters to narrow down your

-      choices, or pick the right entry with the mouse.</para>

-    

-    <para>To see the available types and pick one, press the Browse button. This will show the

-      available types, and as you type letters for the type name (in lower case &ndash;

-      capitalization is ignored), the available types that match are narrowed. When

-      you&apos;ve typed enough to specify the type you want, press Enter. Or you can use the

-      list of matching type names and pick the one you want with the mouse.</para>

-    

-    <para>Once you&apos;ve added the type, you can add features to it by highlighting the

-      type, and pressing the Add button.</para>

-    

-    <para>If the type being defined is a subtype of uima.cas.String, the Add button allows you

-      to add allowed values for the string, instead of adding features.</para>

-    

-    <para>To edit a type or feature, you can double click the entry, or highlight the entry and

-      press the Edit button. To delete a type or feature, you highlight the entry to be deleted,

-      and click the delete button or push the delete key.</para>

-    

-    <para>If the range of a feature is an array or one of the built-in list types, an additional

-      specification allows you to specify if multiple references to the object referenced by

-      this feature are allowed. If they are not allowed then the XMI serialization of

-      instances of this type uses a more efficient format.</para>

-    

-    <para>If the range of a feature is an array of Feature Structures, then it is possible to

-      specify an element type for the array. This information is used in the XMI serialization

-      and also by the JCas generation routines to generate more efficient code.

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.2in" format="JPG" fileref="&imgroot;image036.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying a Feature Structure</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>
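-    
-    <para>As a rough sketch of how these settings might appear on the Source page, the
-      fragment below declares an array-valued feature with an element type and with
-      multiple references disallowed. The type and feature names
-      (<literal>my.namespace.Person</literal>, <literal>my.namespace.Alias</literal>,
-      <literal>aliases</literal>) are hypothetical and chosen only for illustration.</para>
-    
-    <programlisting><![CDATA[<typeDescription>
-  <!-- hypothetical type; the namespace is everything up to the last period -->
-  <name>my.namespace.Person</name>
-  <description/>
-  <supertypeName>uima.tcas.Annotation</supertypeName>
-  <features>
-    <featureDescription>
-      <!-- hypothetical feature whose range is an array of Feature Structures -->
-      <name>aliases</name>
-      <description/>
-      <rangeTypeName>uima.cas.FSArray</rangeTypeName>
-      <!-- element type of the array (assumed to be defined elsewhere) -->
-      <elementType>my.namespace.Alias</elementType>
-      <!-- false: the array is referenced only from this feature -->
-      <multipleReferencesAllowed>false</multipleReferencesAllowed>
-    </featureDescription>
-  </features>
-</typeDescription>]]></programlisting>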

-    

-    <para>It is also possible to import type systems for inclusion in your descriptor. To do

-      this, use the Type Import panel&apos;s <literal>Add...</literal> button. This

-      allows you to import a type system descriptor.</para>

-    

-    <para>When importing by name, the name is resolved using the class path for the Eclipse

-      project containing the descriptor file being edited, or by looking up this name in the

-      UIMA DataPath. The DataPath can be set by pushing the Set DataPath button. It will be

-      remembered for this Eclipse project, as a project Property, so you only have to set it

-      once (per project). The value of the DataPath setting is written just like a class path,

-      and can include directories or JAR files, just as is true for class paths.</para>
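-    
-    <para>For orientation, the two import styles might look like this in the descriptor
-      source. This is only a sketch: the name <literal>org.example.types.PersonTypes</literal>
-      and the relative path are made-up placeholders.</para>
-    
-    <programlisting><![CDATA[]]></programlisting>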

-    

-    <para>The following dialog allows you to pick one or more files from the Eclipse

-      workspace, or one file (at a time) from the file system:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.5in" format="JPG" fileref="&imgroot;import-chooser.jpg"/>

-        </imageobject>

-        <textobject><phrase>Picking files for importing</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>This is essentially the same dialog as was used to add component engines to an

-      aggregate. To import from a type system descriptor that is not part of your Eclipse

-      workspace, click the <quote>Browse the file system...</quote> button.</para>

-    

-    <para>Imported types are validated, and if OK, they are added to the list in the Imported

-      Type Systems section of the Type System page. Any types they define are merged with the

-      existing type system.</para>

-    

-    <para>Imported types and features which are only defined in imports are shown in the Type

-      System section, but in a grayed-out font; these types cannot be edited here. To change

-      them, open up the imported type system descriptor, and change them there.</para>

-    

-    <para>If you hover the mouse over an import specification, it will show more information

-      about the import. If you right-click, it will bring up a context menu that allows opening

-      the imported file in the Editor, if the imported file is part of the Eclipse workspace.

-      Changes you make, however, won&apos;t be seen until you close and reopen the editor on

-      the importing file.</para>

-    

-    <para>It is not possible to define types for an aggregate analysis engine. In this case the

-      type system is computed from the component AEs. The Type System information is shown in a

-      grayed-out font.</para>

-    

-    <section id="ugr.tools.cde.type_system.exporting">

-      <title>Exporting</title>

-      

-      <para>In addition to importing type specifications, you can export as well. When you

-        push the Export... button, the editor will create a new importable XML descriptor for

-        the types in this type system, and change the existing descriptor to import that newly

-        created one.

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.75in" format="JPG" fileref="&imgroot;image040.jpg"/>

-        </imageobject>

-        <textobject><phrase>Exporting a type system</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>The base file name you type is inserted into the path in the line below

-        automatically. You can change the path where the generated part descriptor is stored

-        by overtyping the lower text box. When you click OK, the new part descriptor will be

-        generated, and the current descriptor will be changed to import that part.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.capabilities">

-    <title>Capabilities Page</title>

-    

-    <para>Capabilities come in <quote>sets</quote>. You can have multiple sets of

-      capabilities; each one specifies languages supported, plus inputs and outputs of the

-      Analysis Engine. The idea behind having multiple sets is the concept that different

-      inputs can result in different outputs. Many Analysis Engines, though, will probably

-      define just one set of capabilities. A sample Capabilities page is given below:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.2in" format="JPG" fileref="&imgroot;image042.jpg"/>

-        </imageobject>

-        <textobject><phrase>Capabilities page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>When defining the capabilities of a primitive analysis engine, input and output

-      types can be any type defined in the type system. When defining the capabilities of an

-      aggregate the inputs must be a subset of the union of the inputs in the constituent

-      analysis engines and the outputs must be a subset of the union of the outputs of the

-      constituent analysis engines.</para>

-    

-    <para>To add a type, first select something in the set you wish to add the type to, and press

-      Add Type. The following dialog appears presenting the user with a list of types which are

-      candidates for additional inputs:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.4in" format="JPG" fileref="&imgroot;image044.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding a type to the capabilities page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Follow the instructions to mark the types as input and / or output (a type can be

-      both). By default, the &lt;all features&gt; flag is set to true. If you want to specify a

-      subset of features of a type, read on.</para>

-    

-    <para>When types have features, you can specify what features are input and / or output. A

-      type doesn&apos;t have to be an output to have an output feature. For example, an

-      Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to

-      the existing Token types. If no new Token instances were created, it would not be an

-      output Type, but it would have features which are output.</para>

-    

-    <para>To specify features as input and / or output (they can be both), select a type, and

-      press Add. The following dialog box appears:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4in" format="JPG" fileref="&imgroot;image046.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying features as input or output</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>To mark a feature as being input and / or output, click the mouse in the input and / or

-      output column for the feature. If you select &lt;all features&gt;, it unmarks any

-      individual feature you selected, since &lt;all features&gt; subsumes all the

-      features.</para>

-    

-    <para>The Languages part of the capability is where you specify what languages are

-      supported by the Analysis Engine. Supported languages should be listed using either a

-      two letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter

-      ISO-3166 country code. Add a language by selecting Languages and pressing the Add

-      button. The dialog for adding languages is given below.

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4in" format="JPG" fileref="&imgroot;image048.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying a language</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>The Sofa part of the capability is optional; it allows defining Sofa names that this

-      component uses, and whether they are input (meaning they are created outside of this

-      component, and passed into it), or output (meaning that they are created by this

-      component). Note that a Sofa can be either input or output, but can&apos;t be

-      both.</para>
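-    
-    <para>A single capability set of the kind described above could look roughly like
-      the following on the Source page. The type, feature, and Sofa names are
-      hypothetical; the fragment only illustrates the Token case discussed earlier, where
-      a type is an input while one of its features is an output.</para>
-    
-    <programlisting><![CDATA[<capabilities>
-  <capability>
-    <inputs>
-      <!-- hypothetical input type, with the <all features> flag set -->
-      <type allAnnotatorFeatures="true">org.example.Token</type>
-    </inputs>
-    <outputs>
-      <!-- an output feature on a type that is not itself an output -->
-      <feature>org.example.Token:partOfSpeech</feature>
-    </outputs>
-    <inputSofas>
-      <!-- hypothetical Sofa created outside this component -->
-      <sofaName>MyInputSofa</sofaName>
-    </inputSofas>
-    <languagesSupported>
-      <!-- ISO-639 code, optionally followed by a hyphen and an ISO-3166 country code -->
-      <language>en</language>
-      <language>en-US</language>
-    </languagesSupported>
-  </capability>
-</capabilities>]]></programlisting>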

-    

-    <para>To add a Sofa name (which is synonymous with the view name), press the Add Sofa

-      button, and this dialog appears:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.2in" format="JPG" fileref="&imgroot;image050.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying a Sofa name</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <section id="ugr.tools.cde.capabilities.sofa_name_mapping">

-      <title>Sofa (and view) name mappings</title>

-      

-      <para>Sofa names, once created, are used in Sofa Mappings. These are optional

-        mappings, done in an aggregate, that specify which Sofas are the same ones but with

-        different names. The Sofa Mappings section is minimized unless you are editing an

-        Aggregate descriptor, and have one or more Sofa names defined for the aggregate. In

-        that case, the Sofa Mappings section will look like this:

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.4in" format="JPG" fileref="&imgroot;image052.jpg"/>

-        </imageobject>

-        <textobject><phrase>Sofa mappings</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>Here the aggregate has defined two input Sofas, named

-        <quote>MyInputSofa</quote>, and <quote>AnotherSofa</quote>. Any named sofas in

-        the aggregate&apos;s capabilities will appear in the Sofa Mapping section, listed

-        either under Inputs or Outputs. Each name in the Mappings has 0 or more delegate

-        (component) sofa names mapped to it. A delegate may have multiple Sofas, as in this

-        example, where the GovernmentOfficialRecognizer delegate has Sofas named

-        <quote>so1</quote> and <quote>so2</quote>.</para>

-      

-      <para>Delegate components may be written as Single-View components. In this case,

-        they have one implicit, default Sofa (<quote>_InitialView</quote>), and to map to

-        it you use the form shown for the <quote>NameRecognizer</quote> &ndash; you map to

-        the delegate&apos;s key name in the aggregate, without specifying a Sofa name. You

-        can also specify the sofa name explicitly, e.g.,

-        NameRecognizer/_InitialView.</para>

-      

-      <para>To add a new mapping, select the Aggregate Sofa name you wish to add the mapping

-        for, and press the Add button. This brings up a window like this, showing all available

-        delegates and their Sofas; select one or more (use the normal multi-select methods)

-        of these and press OK to add them.

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image054.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding a Sofa mapping</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>To edit an existing mapping, select the mapping and press Edit. This will show the

-        existing mapping with all mapped items <quote>selected</quote>, and other

-        available items unselected. Change the items selected to match what you want,

-        deselecting some, and perhaps selecting others, and press OK.</para>
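-      
-      <para>In the descriptor source, each mapping is stored as a
-        <literal>sofaMapping</literal> element. The sketch below reuses the names from the
-        example above; which delegate Sofa maps to which aggregate Sofa is assumed here
-        for illustration and may not match the screenshot.</para>
-      
-      <programlisting><![CDATA[<sofaMappings>
-  <!-- single-view delegate: no componentSofaName, so its default Sofa is mapped -->
-  <sofaMapping>
-    <componentKey>NameRecognizer</componentKey>
-    <aggregateSofaName>MyInputSofa</aggregateSofaName>
-  </sofaMapping>
-  <!-- multi-view delegate: each of its Sofas is mapped explicitly (assumed pairing) -->
-  <sofaMapping>
-    <componentKey>GovernmentOfficialRecognizer</componentKey>
-    <componentSofaName>so1</componentSofaName>
-    <aggregateSofaName>MyInputSofa</aggregateSofaName>
-  </sofaMapping>
-  <sofaMapping>
-    <componentKey>GovernmentOfficialRecognizer</componentKey>
-    <componentSofaName>so2</componentSofaName>
-    <aggregateSofaName>AnotherSofa</aggregateSofaName>
-  </sofaMapping>
-</sofaMappings>]]></programlisting>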

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.indexes">

-    <title>Indexes Page</title>

-    

-    <para>The Indexes page is where the user declares what indexes and type priority lists are

-      used by the analysis engine. Indexes are used to determine which Feature

-      Structures of a particular type are fetched, using an iterator in the UIMA API.  An

-      unpopulated Indexes page is displayed below:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.5in" format="JPG" fileref="&imgroot;image056.jpg"/>

-        </imageobject>

-        <textobject><phrase>Index page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Both indexes and type priority lists can have imports. These imports work just like

-      the type system imports, described above. Both indexes and type priority lists can be

-      exported to new component descriptors, using the Export... button, just like the type

-      system export operation described above.</para>

-    

-    <para>The built-in Annotation Index is always present. It is based on the built-in type

-      <literal>uima.tcas.Annotation</literal> and has keys begin (Ascending), end

-      (Descending) and TYPE_PRIORITY. There are no built-in type priorities, so this last

-      sort item does not play a role in the index unless type priorities are specified.</para>

-    

-    <para>Type priority may be combined with other keys. Type priorities are defined in the

-      Priority Lists section, using one or more priority lists. A given priority list gives an

-      ordering among a group of types. Types that appear higher in the priority list are given

-      higher priority, in other words, they sort first when TYPE_PRIORITY is specified as the

-      index key. Subtypes of these types are also ordered in a consistent manner, unless

-      overridden by another specific type priority specification. To get the ordering used

-      among all the types, all of the type priority lists are merged. This gives a partial

-      ordering among the types. Ties are resolved in an unspecified fashion. The Component

-      Descriptor Editor checks for incompatible orderings, and informs the user if they

-      exist, so they can be corrected.</para>

-    

-    <para>To create a new index, use the Add Index button in the top left section. This brings up

-      this dialog:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4in" format="JPG" fileref="&imgroot;image058.jpg"/>

-        </imageobject>

-        <textobject><phrase>Adding a new index</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Each index needs a globally unique index name. Every index indexes one CAS type (including

-      its subtypes). If you're using Eclipse 3.2 or later, the entry field for this 

-      has content assist (start typing the type name

-      and press Control &ndash; Spacebar to get help, or press the Browse button to pick a

-      type).</para>

-    

-    <para>Indexes can be sorted, in which case you need to specify one or more keys to sort on.

-      Sort keys are selected from features whose range type is Integer, Float, or String. Some

-      elements will be disabled if they are not relevant. For instance, if the index kind is

-      <quote>bag</quote>, you cannot provide sort keys. The order of sort keys can be

-      adjusted using the up and down buttons, if necessary.</para>

-    

-    

-       <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.  

-              As of UIMA v2.1, if you do not declare any index for a type (or any of its 

-              supertypes), a Bag index will be automatically created.  This index is 

-       accessed using the <literal>getAllIndexedFS(...)</literal> method defined on the index repository.</para></note>

-    

-    

-    <para>A set index will contain no duplicates of the same type, where a duplicate is defined

-      by the indexing comparator. That is, if you commit two feature structures of the same

-      type that are equal with respect to the indexing comparator, only the first one will be

-      entered into the index. Note that you can still have duplicates with respect to the

-      indexing order, if they are of a different type. A set index is not guaranteed to be

-      sorted. If no keys are specified for a set index, then all instances are considered by

-      default to be equal, so only the first instance (for a particular type or subtype of the

-      type being indexed) is indexed. On the other hand, <quote>bag</quote> indicates that

-      all annotation instances are indexed, including duplicates.</para>

-    

-    <para>The Priority Lists section of the Indexes page is used to specify Priority Lists of

-      types. Priority Lists are unnamed ordered sets of type names. Add a new priority list by

-      clicking the Add Set button. Add a type to an existing priority list by first selecting

-      the set, and then clicking Add. You can use the up and down buttons to adjust the order as

-      necessary; these buttons move the selected item up or down.</para>
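-    
-    <para>To make the pieces concrete, a sorted index and a priority list might be
-      written to the descriptor source along these lines. The index label and the
-      <literal>org.example</literal> type names are hypothetical; the keys mirror those of
-      the built-in Annotation Index described above.</para>
-    
-    <programlisting><![CDATA[<fsIndexCollection>
-  <fsIndexes>
-    <fsIndexDescription>
-      <!-- hypothetical, globally unique index name -->
-      <label>TokenIndex</label>
-      <!-- the indexed type; its subtypes are indexed as well -->
-      <typeName>org.example.Token</typeName>
-      <kind>sorted</kind>
-      <keys>
-        <fsIndexKey>
-          <featureName>begin</featureName>
-          <comparator>standard</comparator>  <!-- ascending -->
-        </fsIndexKey>
-        <fsIndexKey>
-          <featureName>end</featureName>
-          <comparator>reverse</comparator>   <!-- descending -->
-        </fsIndexKey>
-        <fsIndexKey>
-          <typePriority/>                    <!-- the TYPE_PRIORITY key -->
-        </fsIndexKey>
-      </keys>
-    </fsIndexDescription>
-  </fsIndexes>
-</fsIndexCollection>
-
-<typePriorities>
-  <!-- types listed earlier sort first when TYPE_PRIORITY is used as a key -->
-  <priorityList>
-    <type>org.example.Sentence</type>
-    <type>org.example.Token</type>
-  </priorityList>
-</typePriorities>]]></programlisting>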

-    

-    <para>Although it is possible to import self-contained index and type priority files,

-      the creation of such files is not yet supported by the Component Descriptor Editor. If

-      you create these files using another editor, they can be imported using the

-      corresponding Import panels, shown on the right. Imports are specified in the same

-      manner as they are for Type System imports.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.cde.resources">

-    <title>Resources Page</title>

-    

-    <para>The resources page describes resource dependencies (for primitive Analysis

-      Engines) and external Resource specification and their bindings to the resource

-      dependencies.</para>

-    

-    <para>Only primitive Analysis Engines define resource dependencies. Primitive and

-      Aggregate Analysis Engines can define external resources and connect them (bind them)

-      to resource dependencies.</para>

-    

-    <para>When an Aggregate is providing an external resource to be bound to a dependency, the

-      binding is specified using a possibly multi-level path, starting at the Aggregate, and

-      specifying which component (by its key name), and then if that component is, in turn, an

-      Aggregate, which component (again by its key name), and so on until you reach a

-      primitive. The sequence of key names is made into the binding specification by joining

-      the parts with a <quote>/</quote> character. All of this is done for you by the Component

-      Descriptor Editor.</para>

-    

-    <para>Any external resource provided by an Aggregate will override any binding provided

-      by any lower level component for the same resource dependency.</para>

-    

-    <para>There are two views of the Resources page, depending on whether the Analysis Engine

-      is an Aggregate or Primitive. Here&apos;s the view for a Primitive:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5in" format="JPG" fileref="&imgroot;image060.jpg"/>

-        </imageobject>

-        <textobject><phrase>Resources page for a primitive</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>To declare a resource dependency, click the Add button in the right hand panel. This

-      puts up the dialog:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4in" format="JPG" fileref="&imgroot;image062.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying a resource dependency</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>The Key must be unique within the descriptor declaring it. The Interface, if

-      present, is the name of a Java interface the Analysis Engine uses to access the

-      resource.</para>

-    

-    <para>Declare the actual external resources on the left side of the page. Clicking

-      <quote>Add</quote> brings up this dialog:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="4.2in" format="JPG" fileref="&imgroot;image064.jpg"/>

-        </imageobject>

-        <textobject><phrase>Specifying an External Resource</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>The Name must be unique within this Analysis Engine. The URL identifies a file

-      resource. If both the URL and URL suffix are used, the file resource is formed by

-      combining the first URL part with the language-identifier, followed by the URL suffix;

-      see <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration"/>

-      . URLs may be written as <quote>relative</quote> URLs; in this case they are resolved by

-      looking them up relative to the classpath and/or datapath. A relative URL has the path

-      part starting without an initial <quote>/</quote>; for example:

-      file:my/directory/file. An absolute URL starts with file:/ or file:/// or

-      file://some.network.address/. For more information about URLs, please read the

-      Javadoc for the Java class <quote>java.net.URL</quote>.</para>

-    

-    <para>The Implementation is optional, and if given, must be a Java class that implements

-      the interface specified in any Resource Dependencies this resource is bound

-      to.</para>

-    

-    <section id="ugr.tools.cde.resources.binding">

-      <title>Binding</title>

-      

-      <para>Once you have an external resource definition, and a Resource Dependency, you

-        can bind them together. To do this, you select the two things (an external resource

-        definition, and a Resource Dependency) that you want to bind together, and click

-        Bind.</para>
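-      
-      <para>After binding, the descriptor source would contain something like the
-        following three pieces: the dependency, the external resource, and the binding
-        that connects them. All keys, names, class names, and the file URL are
-        hypothetical placeholders.</para>
-      
-      <programlisting><![CDATA[<externalResourceDependencies>
-  <externalResourceDependency>
-    <!-- key must be unique within the descriptor declaring it -->
-    <key>AcronymTable</key>
-    <description/>
-    <!-- hypothetical Java interface the Analysis Engine uses to access the resource -->
-    <interfaceName>org.example.StringMapResource</interfaceName>
-  </externalResourceDependency>
-</externalResourceDependencies>
-
-<resourceManagerConfiguration>
-  <externalResources>
-    <externalResource>
-      <name>AcronymTableFile</name>
-      <description/>
-      <fileResourceSpecifier>
-        <!-- relative URL: resolved against the classpath and/or datapath -->
-        <fileUrl>file:org/example/acronyms.dat</fileUrl>
-      </fileResourceSpecifier>
-      <!-- optional; hypothetical class implementing the interface above -->
-      <implementationName>org.example.StringMapResource_impl</implementationName>
-    </externalResource>
-  </externalResources>
-  <externalResourceBindings>
-    <externalResourceBinding>
-      <!-- in an aggregate the key would be a path such as delegateKey/AcronymTable -->
-      <key>AcronymTable</key>
-      <resourceName>AcronymTableFile</resourceName>
-    </externalResourceBinding>
-  </externalResourceBindings>
-</resourceManagerConfiguration>]]></programlisting>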

-      

-    </section>

-    

-    <section id="ugr.tools.cde.resources.aggregates">

-      

-      <title>Resources with Aggregates</title>

-      

-      <para>When editing an Aggregate Descriptor, the Resource definitions panel will show

-        all the resources at the primitive level, with paths down through the components

-        (multiple levels, if needed) to get to the primitives. The Aggregate can define

-        external resources, and bind them to one or more uses by the primitives.</para>

-      

-    </section>

-    

-    <section id="ugr.tools.cde.resources.imports_exports">

-      <title>Imports and Exports</title>

-      

-      <para>Resource definitions and their bindings can be imported, just like other

-        imports. Existing Resource definitions and their bindings can be exported to a new

-        importable part, and replaced with an import for that importable part, using the

-        <quote>Export...</quote> button, just like the similar function on the Type System

-        page.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.source">

-    <title>Source Page</title>

-    

-    <para>The Source page is a text view of the XML content of the Analysis Engine or Type System

-      being configured. An example of this page is displayed below:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image066.jpg"/>

-        </imageobject>

-        <textobject><phrase>Source page</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Changes made in the GUI are immediately reflected in the XML source, and changes

-      made in the XML source are immediately reflected back in the GUI. The thought here is that

-      the GUI view and the Source view are just two ways of looking at the same data. When the data

-      is in an unsaved state the file name is prefaced with an asterisk in the currently

-      selected file tab in the editor pane inside Eclipse (as in the example above).</para>

-    

-    <para>You may accidentally create invalid descriptors or XML by editing directly in the

-      Source view. If you do this, when you try to save or when you switch to a different view,

-      the error will be detected and reported. In the case of saving, the file will be saved,

-      even if it is in an error state.</para>

-    

-    <section id="ugr.tools.cde.source.formatting">

-      <title>Source formatting &ndash; indentation</title>

-      

-      <para>The XML is indented using an indentation amount saved as a global UIMA

-        preference. To change this preference, use the Eclipse menu item: Window &rarr;

-        Preferences &rarr; UIMA Preferences.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tools.cde.creating_self_contained_type_system">

-    <title>Creating a Self-Contained Type System</title>

-    

-    <para>It is also possible to use the Component Descriptor Editor to create or edit

-      self-contained type systems. To create a self-contained type system, select the menu

-      item File &rarr; New &rarr; Other and then select Type System Descriptor File. From the

-      next page of the selection wizard specify a Parent Folder and File name and click Finish.

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.5in" format="JPG" fileref="&imgroot;image068.jpg"/>

-        </imageobject>

-        <textobject><phrase>Working with a self-contained type system</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot>

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.5in" format="JPG" fileref="&imgroot;image070.jpg"/>

-        </imageobject>

-        <textobject><phrase></phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>This will take you to a version of the Component Descriptor Editor for editing a type

-      system file which contains just three pages: an overview page, a type system page, and a

-      source page. The overview page is a bit more spartan than in the case of an AE. It looks like

-      the following:

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="3.7in" format="JPG" fileref="&imgroot;image072.jpg"/>

-        </imageobject>

-        <textobject><phrase>Editing a type system object</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>Just like an AE has an associated name, version, vendor and description, the same is

-      true of a self-contained type system. The Type System page is identical to that in an AE

-      descriptor file, as is the Source page. Note that a self-contained type system can

-      import type systems just like the type system associated with an AE.</para>

-    

-    <para>A type system component can also be created from an existing descriptor which

-      contains a type system definition section, by clicking on the Export... button on the

-      Type System page.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.cde.creating_other_descriptor_components">

-    <title>Creating Other Descriptor Components</title>

-    

-    <para>The new wizard can create several other kinds of components: Collection

-      Processing Management (CPM) components, flow controllers, and importable parts

-      (besides Type Systems, described above, Indexes, Type Priorities, and Resource

-      Manager Configuration imports).</para>

-    

-    <para>The CPM components supported by this editor include the Collection Reader, CAS

-      Initializer, and CAS Consumer descriptors. Each of these is basically treated just

-      like a primitive AE descriptor, with small changes to accommodate the different

-      semantics. For instance, a CAS Consumer can&apos;t declare in its capabilities

-      section that it outputs types or features.</para>

-    

-    <para>Flow controllers are components that control the flow of CASes within an

-      aggregate, and are edited in a similar fashion to a primitive Analysis Engine.</para>

-    

-    <para>The importable part support requires context information to enable the editor to

-      work, because much of the power of this editor comes from extensive checking that

-      requires additional information, other than what is available in just the importable

-      part. For instance, when you create or edit an Indexes import, the facility for adding

-      new indexes needs the type information, which is not present in this part when it is

-      edited alone. </para>

-      

-    <para>To overcome this, when you edit these descriptors, you will be asked to

-      specify a context descriptor, usually a descriptor which would import the part being

-      edited, which would have the additional information needed. </para>

-    

-    <para>Various methods are used

-      to guess what the context descriptor should be - and if the guess is correct, you can just

-      press the Enter key to confirm. The last successful context file is remembered and will

-      be suggested as the context file to use at the next edit session.</para>

-  </section>

-</chapter>

diff --git a/uima-docbook-tools/src/docbook/tools.cpe.xml b/uima-docbook-tools/src/docbook/tools.cpe.xml
deleted file mode 100644
index 298bf83..0000000
--- a/uima-docbook-tools/src/docbook/tools.cpe.xml
+++ /dev/null
@@ -1,216 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.cpe/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.cpe">

-  <title>Collection Processing Engine Configurator User&apos;s Guide</title>

-  <titleabbrev>CPE Configurator User&apos;s Guide</titleabbrev>

-  

-  <para>A <emphasis>Collection Processing Engine (CPE)</emphasis> processes

-    collections of artifacts (documents) through the combination of the following

-    components: a Collection Reader, Analysis Engines, and CAS Consumers.

-    <footnote><para>Earlier versions of UIMA supported another component, the CAS

-    Initializer, but this component is now deprecated in UIMA Version 2.</para></footnote>

-    </para>

-  

-  <para>The <emphasis>Collection Processing Engine Configurator (CPE

-    Configurator)</emphasis> is a graphical tool that allows you to assemble and run

-    CPEs.</para>

-  

-  <para>For an introduction to Collection Processing Engine concepts, including

-    developing the components that make up a CPE, read <olink targetdoc="&uima_docs_tutorial_guides;"/>

-    <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>. This

-    chapter is a user&apos;s guide for using the CPE Configurator tool, and does not describe

-    UIMA&apos;s Collection Processing Architecture itself.</para>

-  

-  <section id="ugr.tools.cpe.limitations">

-    <title>Limitations of the CPE Configurator</title>

-    

-    <para>The CPE Configurator only supports basic CPE configurations.</para>

-    

-    <para>It only supports <quote>Integrated</quote> deployments (although it will

-      connect to remotes if particular CAS Processors are specified with remote service

-      descriptors). It doesn&apos;t support configuration of the error handling. It

-      doesn&apos;t support Sofa Mappings; it assumes all Single-View components are

-      operating with the _InitialView Sofa. Multi-View components will not have their names

-      mapped. It sets up a fixed-sized CAS Pool.</para>

-    

-    <para>To set these additional options, you must edit the CPE Descriptor XML file

-      directly.  See <olink targetdoc="&uima_docs_ref;"/>

-      <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for the syntax.

-      You may then open the CPE Descriptor in the CPE Configurator and run it.  The changes

-      you applied to the CPE Descriptor <emphasis>will</emphasis> be respected, although you

-      will not be able to see them or edit them from the GUI.

-    </para>   

-  </section>

-  

-  <section id="ugr.tools.cpe.starting">

-    <title>Starting the CPE Configurator</title>

-    

-    <para>The CPE Configurator tool can be run using the <literal>cpeGui</literal> shell

-      script, which is located in the <literal>bin</literal> directory of the UIMA SDK. If

-      you&apos;ve installed the example Eclipse project (see <olink targetdoc="&uima_docs_overview;"/>

-      <olink targetdoc="&uima_docs_overview;"

-        targetptr="ugr.ovv.eclipse_setup.example_code"/>), you can also run it using the

-      <quote>UIMA CPE GUI</quote> run configuration provided in that project.</para>

-    <note><para>If you are planning to build a CPE using components other than the examples

-    included in the UIMA SDK, you will first need to update your CLASSPATH environment

-    variable to include the classes needed by these components.</para></note>

-    

-    <para>When you first start the CPE Configurator, you will see the main window shown here:

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-      </imageobject>

-      <textobject><phrase>CPE Configurator main GUI window</phrase> 

-      </textobject>

-    </mediaobject>

-  </screenshot></para>

-  </section>

-  

-  <section id="ugr.tools.cpe.selecting_component_descriptors">

-    <title>Selecting Component Descriptors</title>

-    

-    <para>The CPE Configurator&apos;s main window is divided into three sections, one each for the Collection

-      Reader, Analysis Engines, and CAS Consumers.<footnote>

-      <para>There is also a fourth pane, for the CAS Initializer, but it is hidden by default. To enable it, click the

-        <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote></para>

-    

-    <para>In each section of the CPE Configurator, you can select the component(s) you want to use by browsing to (or

-      typing the location of) their XML descriptors. You must select a Collection Reader, and at least one Analysis

-      Engine or CAS Consumer.</para>

-    

-    <para>When you select a descriptor, the configuration parameters that are defined in that descriptor will then

-      be displayed in the GUI; these can be modified to override the values present in the descriptor.</para>

-    

-    <para>For example, the screen shot below shows the CPE Configurator after the following components have been

-      chosen:

-      

-      

-      <programlisting>examples/descriptors/collectionReader/FileSystemCollectionReader.xml

-examples/descriptors/analysis_engine/NamesAndPersonTitles_TAE.xml

-examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</programlisting></para>

-    

-    

-    <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image004.jpg"/>

-      </imageobject>

-      <textobject><phrase>CPE Configurator after components chosen</phrase> 

-      </textobject>

-    </mediaobject>

-  </screenshot>

-    

-  </section>

-  

-  <section id="ugr.tools.cpe.running">

-    <title>Running a Collection Processing Engine</title>

-    

-    <para>After selecting each of the components and providing configuration settings,

-      click the play (forward arrow) button at the bottom of the screen to begin processing. A

-      progress bar should be displayed in the lower left corner. (Note that the progress bar

-      will not begin to move until all components have completed their initialization, which

-      may take several seconds.) Once processing has begun, the pause and stop buttons become

-      enabled.</para>

-    

-    <para>If an error occurs, you will be informed by an error dialog. If processing completes

-      successfully, you will be presented with a performance report.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.cpe.file_menu">

-    <title>The File Menu</title>

-    

-    <para>The CPE Configurator&apos;s File Menu has the following options:</para>

-    

-    <itemizedlist><listitem><para>Open CPE Descriptor</para></listitem>

-      

-      <listitem><para>Save CPE Descriptor</para></listitem>

-

-      <listitem><para>Save Options (submenu)</para></listitem>

-      

-      <listitem><para>Refresh Descriptors from File System</para></listitem>

-      

-      <listitem><para>Clear All</para></listitem>

-      

-      <listitem><para>Exit </para></listitem></itemizedlist>

-    

-    <para><emphasis role="bold">Open CPE Descriptor</emphasis> will allow you to select a

-      CPE Descriptor file from disk, and will read in that CPE Descriptor and configure the GUI

-      appropriately.</para>

-    

-    <para><emphasis role="bold">Save CPE Descriptor</emphasis> will create a CPE

-      Descriptor file that defines the CPE you have constructed. This CPE Descriptor will

-      identify the components that constitute the CPE, as well as the configuration settings

-      you have specified for each of these components. Later, you can use <quote>Open CPE

-      Descriptor</quote> to restore the CPE Configurator to that state. Also, CPE

-      Descriptors can be used to easily run a CPE from a Java program &ndash; see 

-      <olink targetdoc="&uima_docs_tutorial_guides;"/> <olink

-        targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.application.running_a_cpe_from_a_descriptor"/>

-      .</para>

-    

-    <para>CPE Descriptors also allow specifying operational parameters, such as error

-      handling options that are not currently available for configuration through the CPE

-      Configurator. For more information on manually creating a CPE Descriptor, see 

-      <olink targetdoc="&uima_docs_ref;"/>

-      <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.

-    </para>

-    

-    <para>The <emphasis role="bold">Save Options</emphasis> submenu has one item,

-      "Use &lt;import>".  If this item is checked (the default), saved CPE descriptors

-      will use the <literal>&lt;import></literal> syntax to refer to their component 

-      descriptors.  If unchecked, the older <literal>&lt;include></literal> syntax will

-      be used for new components that you add to your CPE using the GUI.  (However, if you  

-      open a CPE descriptor that used &lt;import>, these imports will not be replaced.)    

-    </para>

-    

-    <para><emphasis role="bold">Refresh Descriptors from File System</emphasis> will

-      reload all descriptors from disk. This is useful if you have made a change to the

-      descriptor outside of the CPE Configurator, and want to refresh the display.</para>

-    

-    <para><emphasis role="bold">Clear All</emphasis> will reset the CPE Configurator to

-      its initial state, with no components selected.</para>

-    

-    <para><emphasis role="bold">Exit</emphasis> will close the CPE Configurator. If you

-      have unsaved changes, you will be prompted as to whether you would like to save them to a

-      CPE Descriptor file. If you do not save them, they will be lost.</para>

-    

-    <para>When you restart the CPE Configurator, it will automatically reload the last CPE

-      descriptor file that you were working with.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.cpe.help_menu">

-    <title>The Help Menu</title>

-    

-    <para>The CPE Configurator&apos;s Help menu provides <quote>About</quote>

-      information and some very simple instructions on how to use the tool.</para>

-    

-  </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.cvd.xml b/uima-docbook-tools/src/docbook/tools.cvd.xml
deleted file mode 100644
index a96732c..0000000
--- a/uima-docbook-tools/src/docbook/tools.cvd.xml
+++ /dev/null
@@ -1,941 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.cvd/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

- Licensed to the Apache Software Foundation (ASF) under one

- or more contributor license agreements.  See the NOTICE file

- distributed with this work for additional information

- regarding copyright ownership.  The ASF licenses this file

- to you under the Apache License, Version 2.0 (the

- "License"); you may not use this file except in compliance

- with the License.  You may obtain a copy of the License at

- 

- http://www.apache.org/licenses/LICENSE-2.0

- 

- Unless required by applicable law or agreed to in writing,

- software distributed under the License is distributed on an

- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

- KIND, either express or implied.  See the License for the

- specific language governing permissions and limitations

- under the License.

--->

-<chapter id="ugr.tools.cvd">

- <title>CAS Visual Debugger</title>

- <section id="ugr.tools.cvd.introduction">

- <title>Introduction</title>

- <para>

-  The CAS Visual Debugger is a tool to run text analysis engines in UIMA

-  and view the results. The tool is implemented as a stand-alone GUI 

-  tool using Java's Swing library.

- </para>

- <para>

-  This is a developer's tool.  It is intended to support you in writing

-  text analysis annotators for UIMA (Unstructured Information Management

-  Architecture).  As a development tool, the emphasis is not so much on

-  pretty pictures, but rather on navigability.  It is intended to show

-  you all the information you need, and show it to you quickly (at least

-  on a fast machine ;-).

- </para>

- <para>

-  The main purpose of this application is to let you browse all the data

-  that was created when you ran an analysis engine over some text.  The

-  display mimics the access methods you have in the CAS API in terms of

-  indexes, types, feature structures and feature values.

- </para>

- <para>

-  As in the CAS, there is special support for annotations.  Clicking on

-  an annotation will select the corresponding text, and conversely, you

-  can display all annotations that cover a given position in the text.

-  This will be explained in more detail in the section on the main

-  display area.

- </para>

- <para>

-  As usual, the graphics in this manual are for illustrative purposes

-  and may not look 100% like the actual version of CVD you are running.

-  This depends on your operating system, your version of Java, and a

-  variety of other factors.

- </para>

- <section id="ugr.cvd.introduction.running">

- <title>Running CVD</title>

- <para>

-  You will usually want to start CVD from the command line, or from Eclipse.  To start CVD from the

-  command line, you minimally need the uima-core and uima-tools jars.  Below is a sample command

-  line for sh and its offspring.

-  <programlisting>java -cp ${UIMA_HOME}/lib/uima-core.jar:${UIMA_HOME}/lib/uima-tools.jar \

-    org.apache.uima.tools.cvd.CVD</programlisting>

-  However, there is no need to type this.  The ${UIMA_HOME}/bin directory contains a cvd.sh and

-  cvd.bat file for Unix/Linux/MacOS and Windows, respectively.

- </para>

- <para>

-   In Eclipse, you have a ready-to-use launch configuration available when you have installed the

-   UIMA sample project (see <olink targetdoc="&uima_docs_overview;"/> <olink targetdoc="&uima_docs_overview;" 

-   targetptr="ugr.ovv.eclipse_setup.example_code"/>).  Below is a screenshot of the Eclipse Run 

-   dialog with the CVD

-   run configuration selected.

-   <screenshot>

-    <mediaobject>

-     <imageobject>

-      <imagedata scale="85" format="JPG" fileref="&imgroot;eclipse-cvd-launch.jpg"/>

-     </imageobject>

-     <textobject>

-      <phrase>Eclipse run dialog with CVD selected</phrase>

-     </textobject>

-    </mediaobject>

-   </screenshot>

- </para>

- </section>

- 

- <section id="cvd.introduction.commandline">

- <title>Command line parameters</title>

- <para>

- You can provide some command line parameters to influence the startup behavior of CVD.  For

- example, if you want to run a certain analysis engine on a certain text over and over again

- (for debugging, say), you can make CVD load the annotator and text at startup and execute

- the annotator.  Here's a list of the supported command line options.

- </para>

- 

-    <table frame="none" id="cvd.table.commandline">

-    <title>Command line options</title>

-    <tgroup cols="2">

-     <thead>

-      <row>

-       <entry>Option</entry>

-       <entry>Description</entry>

-      </row>

-     </thead>

-     <tbody>

-      <row>

-       <entry>

-        <computeroutput>-text &lt;textFile></computeroutput>

-       </entry>

-       <entry>Loads the text file <computeroutput>&lt;textFile></computeroutput></entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>-desc &lt;descriptorFile></computeroutput>

-       </entry>

-       <entry>Loads the descriptor <computeroutput>&lt;descriptorFile></computeroutput></entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>-exec</computeroutput>

-       </entry>

-       <entry>Runs the pre-loaded annotator; only allowed in conjunction with <computeroutput>-desc</computeroutput> </entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>-datapath &lt;datapath></computeroutput>

-       </entry>

-       <entry>Sets the data path to <computeroutput>&lt;datapath></computeroutput></entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>-ini &lt;iniFile></computeroutput>

-       </entry>

-       <entry>Makes CVD use the alternative ini file <computeroutput>&lt;iniFile></computeroutput> (default is ~/annotViewer.pref)</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>-lookandfeel &lt;lnfClass></computeroutput>

-       </entry>

-       <entry>Uses alternative look-and-feel <computeroutput>&lt;lnfClass></computeroutput></entry>

-      </row>

-      </tbody>

-      </tgroup>

-      </table>

- 

- </section>

- 

- </section>

- <section id="cvd.errorHandling">

-  <title>Error Handling</title>

-  <para>

-   On encountering

-   an error, CVD will pop up an error dialog with a short,

-   usually incomprehensible message.  Often, the error message will

-   claim that there is more information available in the log file, and

-   sometimes, this is actually true; so do go and check the log.  You

-   can view the log file by selecting the appropriate item in the

-   &quot;Tools&quot; menu.

-

-   <screenshot>

-    <mediaobject>

-     <imageobject>

-      <imagedata scale="100" format="JPG" fileref="&imgroot;ErrorExample.jpg"/>

-     </imageobject>

-     <textobject>

-      <phrase>Sample error dialog</phrase>

-     </textobject>

-    </mediaobject>

-   </screenshot>

-

-  </para>

- </section>

-

- <section id="cvd.preferencesFile">

-  <title>Preferences File</title>

-  <para>

-   The program will attempt to read on startup and save on exit a file

-   called annotViewer.pref in your home directory.  This file contains

-   information about choices you made while running the program:

-   directories (such as where your data files are) and window sizes. 

-   These settings will be used the next time you use the program. There

-   is no user control over this process, but the file format is

-   reasonably transparent, in case you feel like changing it.  Note,

-   however, that the file will be overwritten every time you exit the

-   program.

-  </para>

-  

-  <para>

-  If you use CVD for several projects, it may be convenient to use a different

-  ini file for each project.  You can specify the ini file CVD should use

-  with the <programlisting>-ini &lt;iniFile></programlisting> parameter on the

-  command line.

-  </para>

- </section>

-

- <section id="cvd.theMenus">

-  <title>The Menus</title>

-  <para>

-   We give a brief description of the various menus. All menu items come

-   with mnemonics (e.g., Alt-F X will exit the program). In addition,

-   some menu items have their own keyboard accelerators that you can use

-   anywhere in the program. For example, Ctrl-S will save the text

-   you've been editing.

-  </para>

-  <section id="cvd.fileMenu">

-   <title>The File Menu</title>

-   <para>

-    The File menu lets you load, create and save text, load and save

-    color settings, and import and export the XCAS format. Here's a

-    screenshot.

-

-   <screenshot> 

-    <mediaobject>

-      <imageobject>

-       <imagedata scale="100" format="JPG" fileref="&imgroot;FileMenu.jpg"/>

-      </imageobject>

-      <textobject>

-       <phrase>The File menu</phrase>

-      </textobject>

-     </mediaobject>

-    </screenshot>

-   </para>

-

-   <itemizedlist>

-    <para>

-     Below is a list of the menu items, together with an explanation.

-    </para>

-

-    <listitem>

-     <formalpara>

-      <title>New Text...</title>

-      <para>

-       Clears the text area. Text you type is written to an anonymous

-       buffer. You can use &quot;Save Text As...&quot; to save the text

-       you typed to a file. Note: whenever you modify the text, be it

-       through typing, loading a file or using the &quot;New

-       Text...&quot; menu item, previous analysis results will be lost.

-       Since the previous analysis is specific to the text, modifying

-       the text invalidates the analysis.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Open Text File</title>

-      <para>

-       Loads a new text file into the viewer.  The next time you run an

-       analysis engine, it will run on the text you loaded last.  Depending

-       on the annotator you're using, the program may run slowly with very

-       large text files, so you may want to experiment.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Save Text File</title>

-      <para>

-       Saves the currently open text file. If no file is currently

-       loaded (either because you haven't loaded a file, or you've used

-       the &quot;New Text...&quot; menu item), this menu item is

-       disabled (and Ctrl-S will do nothing).

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Save Text As...</title>

-      <para>

-       Save the text to a file of your choosing. This can be an existing

-       file, which is then overwritten, or it can be a new file that

-       you're creating.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Change Code Page</title>

-      <para>

-       Allows you to change the code page that is used to load and save

-       text files. If you're sure the text you're loading is in ASCII or

-       one of the 8-bit extensions such as ISO-8859-1 (ISO Latin1),

-       there is probably nothing you need to do. Just load the text and

-       look at the display. If you see no funny characters or square

-       boxes, chances are your selected code page is compatible with

-       your text file.

-       

-       Note that the code page setting is also in effect when you save

-       files. You can observe the effects with a hex editor or by just

-       looking at the file size. For example, if you save the default

-       text

-       <computeroutput>This is where the text goes.</computeroutput>

-       to a file on Windows using the default code page, the size of the

-       file will be 28 bytes. If you now change the code page to UTF-16

-       and save the file again, the file size will be 58 bytes: two

-       bytes for each character, plus two bytes for the byte-order mark.

-       Now switch the code page back to the default Windows code page

-       and reload the UTF-16 file to see the difference in the editor.

-       

-       CVD will display all code pages that are available in the JVM

-       you're running it on.  The first code page in the list is the

-       default code page of your system.  This is also CVD's default if

-       you don't make a specific choice.

-       

-       Your code page selection will be remembered in CVD's ini file.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Load Color Settings</title>

-      <para>

-       Load previously saved color settings from a file (see

-       Tools/Customize Annotation Display).  It is highly recommended

-       that you only load automatically generated files.  Strange things

-       may happen if you try to load the wrong file format. On startup,

-       the program attempts to load the last color settings file that

-       you loaded or saved during a previous session. If you intend to

-       use the same color settings as the last time you ran the program,

-       there is therefore no need to manually load a color settings

-       file.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Save Color Settings</title>

-      <para>

-       Save your customized color settings (see Tools/Customize

-       Annotation Display).  The file is a Java properties file, and as

-       such, reasonably transparent.  What is not transparent is the

-       encoding of the colors (integer encoding of 24-bit RGB values),

-       so changing the file by hand is not really recommended.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Read Type System File</title>

-      <para>

-       Load a type system file. This allows you to load an XCAS file

-       without having to have access to the corresponding annotator.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Write Type System File</title>

-      <para>

-       Create a type system file from the currently loaded type

-       definitions. In addition, you can save the current CAS as an XCAS

-       file (see below). This allows you to later load the type system

-       and XCAS to view the CAS without having to rerun the annotator.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Read XMI CAS File</title>

-      <para>

-       Read an XMI CAS file. Important: XMI CAS is a serialization format that

-       serializes a CAS without type system and index information. It is

-       therefore impossible to read in a stand-alone XMI CAS file. XMI CAS

-       files can only be interpreted in the context of an existing type

-       system. Consequently, you need to first load the Analysis Engine that was used to

-       create the XMI file, to be able to load that XMI file.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Write XMI CAS File</title>

-      <para>

-       Writes the current analysis out as an XMI CAS file.

-      </para>

-     </formalpara>

-    </listitem>

-

-

-    <listitem>

-     <formalpara>

-      <title>Read XCAS File</title>

-      <para>

-       Read an XCAS file. Important: XCAS is a serialization format that

-       serializes a CAS without type system and index information. It is

-       therefore impossible to read in a stand-alone XCAS file. XCAS

-       files can only be interpreted in the context of an existing type

-       system. Consequently, you need to load the Analysis Engine that was used to

-       create the XCAS file to be able to load it. Loading an XCAS file

-       without loading the Analysis Engine may produce strange errors. You may get

-       syntax errors on loading the XCAS file, or worse, everything may

-       appear to go smoothly but in reality your CAS may be corrupted.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Write XCAS File</title>

-      <para>

-       Writes the current analysis out as an XCAS file.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Exit</title>

-      <para>Exits the program. Your preferences will be saved.</para>

-     </formalpara>

-    </listitem>

-

-   </itemizedlist>

-

-  </section>

-

-  <section id="cvd.editMenu">

-   <title>The Edit Menu</title>

-   <para>

-

-   <screenshot>

-     <mediaobject>

-      <imageobject>  <!-- was 2.15in -->

-       <imagedata scale="100" format="JPG" fileref="&imgroot;EditMenu.jpg" />

-      </imageobject>

-      <textobject>

-       <phrase>The Edit menu</phrase>

-      </textobject>

-     </mediaobject>

-    </screenshot>

-

-    The &quot;Edit&quot; menu provides a standard text editing menu with

-    Cut, Copy and Paste, as well as unlimited Undo.

-   </para>

-   <para>

-    Note that standard keyboard accelerators Ctrl-X, Ctrl-C, Ctrl-V and

-    Ctrl-Z can be used for Cut, Copy, Paste and Undo, respectively. The

-    text area supports other standard keyboard operations such as

-    navigation HOME, Ctrl-HOME etc., as well as marking text with Shift-

-    &lt;ArrowKey&gt;.

-   </para>

-  </section>

-

-  <section id="cvd.runMenu">

-   <title>The Run Menu</title>

-   <para>

-

-    <screenshot>

-     <mediaobject>

-      <imageobject> <!-- was width="2.225in" -->

-       <imagedata scale="100" format="JPG" fileref="&imgroot;RunMenu.jpg" />

-      </imageobject>

-      <textobject>

-       <phrase>The Run menu</phrase>

-      </textobject>

-     </mediaobject>

-     </screenshot>

-

-     In the Run menu, you can load and run text analysis engines.

-   </para>

-

-   <itemizedlist>

-

-    <listitem>

-     <formalpara>

-      <title>Load AE</title>

-      <para>

-       Loads and initializes a text analysis engine. Choosing this menu

-       item will display a file open dialog where you should choose an

-       XML descriptor of a Text Analysis Engine to process the current

-       text.  Even if the analysis engine runs fast, this will take a

-       while, since there is a lot of setup work to do when a new TAE is

-       created.  So be patient.

-

-       When you develop a new annotator, you will often need to

-       recompile your code. CVD will not reload your annotator code.

-       When you recompile your code, you need to terminate the GUI and

-       restart it. If you only make changes to the XML descriptor, you

-       don't need to restart the GUI. Simply reload the XML file.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Run AE</title>

-      <para>

-       Before you have (successfully) loaded a TAE, this menu item will

-       be disabled. After you have loaded a TAE, it will be enabled, and

-       the name changes according to the name of the TAE you have

-       loaded. For example, if you've loaded &quot;The World's Fastest

-       Parser&quot;, you will have a menu item called &quot;Run The

-       World's Fastest Parser&quot;. When you choose the item, the TAE

-       is run on whatever text you have currently loaded.

-

-       After a TAE has run successfully, the index window in the upper

-       left-hand corner of the screen should be updated and show the

-       indexes that were created by this run.  We will have more to say

-       about indexes and what to do with them later.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Run AE on CAS</title>

-      <para>

-       This allows you to run an analysis engine on the current CAS.

-       This is useful if you have loaded a CAS from an XCAS file, and

-       would like to run further analysis on it.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Run collectionProcessComplete</title>

-      <para>

-       When you select this item, the analysis engine's 

-       collectionProcessComplete() method is called.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Performance Report</title>

-      <para>

-       After you've run your analysis, you can view a performance report.  It will show

-       you where the time went: which component used how much of the processing time.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Recently used</title>

-      <para>

-       Collects a list of recently used analysis engines as a short-cut

-       for loading.

-      </para>

-     </formalpara>

-    </listitem>

-

-    <listitem>

-     <formalpara>

-      <title>Language</title>

-      <para>

-       Some annotators do language specific processing. For example, if

-       you run lexical analysis, the results may be quite different

-       depending on what the analysis engine thinks the language of the

-       document is. With this menu item, you can manually set the

-       document language. Alternatively, you can use an automatic

-       language identification annotator. If the analysis engines you're

-       working with are language agnostic, there is no need to set the

-       language.

-      </para>

-     </formalpara>

-    </listitem>

-

-   </itemizedlist>

-  </section>

-

-  <section id="cvd.toolsMenu">

-   <title>The Tools Menu</title>

-   <para>

-    The tools menu contains some assorted utilities, such as the log

-    file viewer. Here you can also set the log level for UIMA.  

-    A more detailed description of some of the menu items

-    follows below.

-   </para>

-   <section id="cvd.viewTypeSystem">

-    <title>View Type System</title>

-    <para>

-

-     <screenshot>

-       <mediaobject>

-        <imageobject>  

-         <imagedata scale="100" format="JPG" fileref="&imgroot;TypeSystemViewer.jpg" />

-        </imageobject>

-       </mediaobject>

-      </screenshot>

-

-     Brings up a new window that displays the type system. This menu

-     item is disabled until the first time you have run an analysis

-     engine, since there is no type system to display until then. An

-     example is shown above.

-    </para>

-    <para>

-     You can view the inheritance tree on the left by expanding and

-     collapsing nodes.  When you select a type, the features defined on

-     that type are displayed in the table on the right.  The feature

-     table has three columns.  The first gives the name of the feature,

-     the second one the type of the feature (i.e., what values it

-     takes), and the third column displays the highest type this feature

-     is defined on.  In this example, the features &quot;begin&quot; and

-     &quot;end&quot; are inherited from the built-in annotation type.

-    </para>

-    <para>

-     In the options menu, you can configure whether you want to see inherited

-     features or not (not yet implemented).

-    </para>

-   </section>

-

-   <section id="cvd.showSelectedAnnotations">

-    <title>Show Selected Annotations</title>

-    <para>

-     <figure id="AnnotationViewerFigure">

-      <title>

-       Annotations produced by a statistical named entity tagger

-      </title>

-      <mediaobject>

-       <imageobject> <!-- was width="5.82in" -->

-        <imagedata scale="100" format="JPG" fileref="&imgroot;AnnotationViewer.jpg" />

-       </imageobject>

-      </mediaobject>

-     </figure>

-    </para>

-

-    <para>

-     To enable this menu, you must have run an analysis engine and

-     selected the &quot;AnnotationIndex&quot; or one of its subnodes in the

-     upper left-hand corner of the screen.  It will bring up a new text

-     window with all selected annotations marked up in the text. 

-    </para>

-    <para>

-     <xref linkend="AnnotationViewerFigure" />

-     shows the results of applying a statistical named entity tagger to

-     a newspaper article.  Some annotation colors have been customized:

-     countries are in reverse video, organizations have a turquoise

-     background, person names are green, and occupations have a maroon

-     background.  The default background color is yellow.  This color is

-     also used if there is more than one annotation spanning a certain

-     text.  Clearly, this display is only useful if you don't have any

-     overlapping annotations, or at least not too many.

-    </para>

-    <para>

-     This menu item is also available as a context menu in the Index

-     Tree area of the main window. To use it, select the annotation

-     index or one of its subnodes, right-click to bring up a popup menu,

-     and select the only item in the popup menu. The popup menu is

-     actually a better way to invoke the annotation display, since it

-     changes according to the selection in the Index Tree area, and will

-     tell you if what you've selected can be displayed or not.

-    </para>

-

-

-   </section>

-

-  </section>

-

- </section>

-

- <section id="cvd.mainDisplayArea">

-  <title>The Main Display Area</title>

-  <para>

-   The main display area has three sub-areas.  In the upper left-hand

-   corner is the

-   <emphasis role="bold">index display</emphasis>, which shows the indexes that were defined in the 

-   AE, as well as

-   the types of the indexes and their subtypes.  In the lower left-hand

-   corner, the content of indexes and sub-indexes is displayed 

-   (<emphasis role="bold">FS display</emphasis>).  Clicking on any node in the index display will 

-   show the

-   corresponding feature structures in the FS display.  You can explore

-   those structures by expanding the tree nodes.  When you click on a

-   node that represents an annotation, the corresponding text span is

-   marked in the

-   <emphasis role="bold">text display</emphasis>.

-  </para>

-  <para>

-   <figure id="Main1Figure">

-    <title>State of GUI after running an analysis engine</title>

-    <mediaobject>

-     <imageobject>

-      <imagedata scale="100" format="JPG" fileref="&imgroot;Main1.jpg" />

-     </imageobject>

-    </mediaobject>

-   </figure>

-  </para>

-  <para>

-   <xref linkend="Main1Figure"></xref>

-   shows the state after running the UIMA_Analysis_Example.xml aggregate from the

-   uimaj-examples project.  There are two indexes in the index display, and the

-   annotation index has been selected.  Note that the number of

-   structures in an index is displayed in square brackets after the

-   index name.

-  </para>

-  <para>

-   Since displaying thousands of sister nodes is both confusing and

-   slow, nodes are grouped in powers of 10.  As soon as there are no

-   more than 100 sister nodes, they are displayed next to each other.

-  </para>

-  <para>

-   In our example, a name annotation has been selected, and the

-   corresponding token text is highlighted in the text area.  We have

-   also expanded the token node to display its structure (not much to see in this simple example).

-  </para>

-  <para>

-   In <xref linkend="Main1Figure"/>, we selected an annotation in the FS display to find the

-   corresponding text.  We can also do the reverse and find out what

-   annotations cover a certain point in the text.  Let's go back to the

-   name recognizer for an example.

-  </para>

-  <para>

-   <figure id="Main2Figure">

-    <title>

-     Finding annotations for a specific location in the text

-    </title>

-    <mediaobject>

-     <imageobject>  <!-- next width was 6.39in -->

-      <imagedata scale="100" format="JPG" fileref="&imgroot;Main2.jpg" />

-     </imageobject>

-    </mediaobject>

-   </figure>

-  </para>

-  <para>

-   We would like to know if Michael Baessler has been

-   recognized as a name.  So we position the cursor somewhere in the

-   corresponding text span, then right-click to bring up the context menu

-   telling us which annotations exist at this point. An example is shown

-   in

-   <xref linkend="Main2Figure" />.

-  </para>

-  <para>

-   <figure id="Main3Figure">

-    <title>

-     Selecting an annotation from the context menu will highlight that

-     annotation in the FS display

-    </title>

-    <mediaobject>

-     <imageobject> <!-- width was 6.39in -->

-      <imagedata scale="100" format="JPG" fileref="&imgroot;Main3.jpg" />

-     </imageobject>

-    </mediaobject>

-   </figure>

-  </para>

-

-  <para>

-   At this point (<xref linkend="Main2Figure" />), 

-   we only know that somewhere around the text cursor position (not

-   visible in the picture), we discovered a name.  When we select the corresponding entry in the

-   context menu, the name annotation is selected in the FS display, and its covered text is

-   highlighted.

-   <xref linkend="Main3Figure" /> shows the display after 

-   the name node has been selected in

-   the popup menu.

-  </para>

-  <para>

-   We're glad to see that, indeed, Michael Baessler is

-   considered to be a name.  Note that in the FS display, the

-   corresponding annotation node has been selected, and the tree has

-   been expanded to make the node visible.

-  </para>

-  <para>

-   Note that the annotations displayed in the popup menu come from the

-   annotations currently displayed in the FS display.  If you didn't

-   select the annotation index or one of its sub-nodes, no annotations

-   can be displayed and the popup menu will be empty.

-  </para>

-

-  <section id="cvd.statusBar">

-   <title>The Status Bar</title>

-   <para>

-    At the bottom of the screen, some useful information is displayed in

-    the

-    <emphasis role="bold">status bar</emphasis>. The left-most area shows the most recent major event, with the

-    time when the event terminated in square brackets. The next area

-    shows the file name of the currently loaded XML descriptor. This

-    area supports a tool tip that will show the full path to the file.

-    The right-most area shows the current cursor position, or the extent

-    of the selection, if a portion of the text has been selected. The

-    numbers correspond to the character offsets that are used for

-    annotations.

-   </para>
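The character offsets shown in the status bar can be related to annotation spans with a small sketch. It uses only plain Java strings, not the UIMA API; the class name `OffsetDemo` and the method `coveredText` are hypothetical, and the sketch assumes the begin offset is inclusive and the end offset exclusive, as with `String.substring`.

```java
final class OffsetDemo {

    // Returns the text between two character offsets.
    // Assumption: begin is inclusive and end is exclusive.
    static String coveredText(String documentText, int begin, int end) {
        return documentText.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "Michael Baessler works here.";
        // A selection reported as offsets 0 to 16 covers the first two words.
        System.out.println(coveredText(text, 0, 16));
    }
}
```

Selecting a span in the text area and reading the offsets from the status bar should give the same two numbers you would pass to such a method.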

-  </section>

-

-  <section id="cvd.keyboardNavigation">

-   <title>Keyboard Navigation and Shortcuts</title>

-   <para>

-    The GUI can be completely navigated and operated through the

-    keyboard. All menus and menu items support keyboard mnemonics, and

-    some common operations are accessible through keyboard accelerators.

-   </para>

-   <para>

-    You can move the focus between the three main areas using

-    <computeroutput>Tab</computeroutput>

-    (clockwise) and

-    <computeroutput>Shift-Tab</computeroutput>

-    (counterclockwise). When the focus is on the text area, the

-    <computeroutput>Tab</computeroutput>

-    key will insert the corresponding character into the text, so you

-    will need to use

-    <computeroutput>Ctrl-Tab</computeroutput>

-    and

-    <computeroutput>Ctrl-Shift-Tab</computeroutput>

-    instead. Alternatively, you can use the following key bindings to

-    jump directly to one of the areas:

-    <computeroutput>Ctrl-T</computeroutput>

-    to focus the text area,

-    <computeroutput>Ctrl-I</computeroutput>

-    for the index repository frame and

-    <computeroutput>Ctrl-F</computeroutput>

-    for the feature structure area.

-   </para>

-   <para>

-    Some additional keyboard shortcuts are available only in the text

-    area, such as

-    <computeroutput>Ctrl-X</computeroutput>

-    for Cut,

-    <computeroutput>Ctrl-C</computeroutput>

-    for Copy,

-    <computeroutput>Ctrl-V</computeroutput>

-    for Paste and

-    <computeroutput>Ctrl-Z</computeroutput>

-    for Undo. The context menu in the text area can be invoked through the

-    <computeroutput>Alt-Enter</computeroutput>

-    shortcut. Text can be selected using the arrow keys while holding

-    the

-    <computeroutput>Shift</computeroutput>

-    key.

-   </para>

-   <para>

-    The following table shows the supported keyboard shortcuts.

-   </para>

-   <table frame="none" id="cvd.table.keyboardShortcuts">

-    <title>Keyboard shortcuts</title>

-    <tgroup cols="3">

-     <thead>

-      <row>

-       <entry>Shortcut</entry>

-       <entry>Action</entry>

-       <entry>Scope</entry>

-      </row>

-     </thead>

-     <tbody>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-O</computeroutput>

-       </entry>

-       <entry>Open text file</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-S</computeroutput>

-       </entry>

-       <entry>Save text file</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-L</computeroutput>

-       </entry>

-       <entry>Load AE descriptor</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-R</computeroutput>

-       </entry>

-       <entry>Run current AE</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-I</computeroutput>

-       </entry>

-       <entry>Switch focus to index repository</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-T</computeroutput>

-       </entry>

-       <entry>Switch focus to text area</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-F</computeroutput>

-       </entry>

-       <entry>Switch focus to FS area</entry>

-       <entry>Global</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-X</computeroutput>

-       </entry>

-       <entry>Cut selection</entry>

-       <entry>Text</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-C</computeroutput>

-       </entry>

-       <entry>Copy selection</entry>

-       <entry>Text</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-V</computeroutput>

-       </entry>

-       <entry>Paste clipboard contents</entry>

-       <entry>Text</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Ctrl-Z</computeroutput>

-       </entry>

-       <entry>Undo</entry>

-       <entry>Text</entry>

-      </row>

-      <row>

-       <entry>

-        <computeroutput>Alt-Enter</computeroutput>

-       </entry>

-       <entry>Show context menu</entry>

-       <entry>Text</entry>

-      </row>

-     </tbody>

-    </tgroup>

-   </table>

-  </section>

-

- </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.doc_analyzer.xml b/uima-docbook-tools/src/docbook/tools.doc_analyzer.xml
deleted file mode 100644
index 688fcef..0000000
--- a/uima-docbook-tools/src/docbook/tools.doc_analyzer.xml
+++ /dev/null
@@ -1,338 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.doc_analyzer/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.doc_analyzer">

-  <title>Document Analyzer User&apos;s Guide</title>

- 

-

-<para>The <emphasis>Document Analyzer</emphasis> is a tool provided by the

-UIMA SDK for testing annotators and AEs. It reads text files from your disk, processes them using an AE, and

-allows you to view the results.  The

-Document Analyzer is designed to work with text files and cannot be used with

-Analysis Engines that process other types of data.</para>

-

-<para>For an introduction to developing annotators and Analysis

-Engines, read <olink targetdoc="&uima_docs_tutorial_guides;"/>

- <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/>.  

-  This chapter is a user&apos;s guide for using the Document Analyzer tool, and

-does not describe the process of developing annotators and Analysis Engines.</para>

-

-<section id="ugr.tools.doc_analyzer.starting">

-  <title>Starting the Document Analyzer</title>

-  

-<para>To run the Document Analyzer, execute the <literal>documentAnalyzer</literal> script that is in the <literal>bin</literal> directory of your UIMA SDK installation, or, if you

-are using the example Eclipse project, execute the <quote>UIMA Document Analyzer</quote>

-run configuration supplied with that project.</para>

-

-<para>Note that if you&apos;re planning to run an Analysis Engine

-other than one of the examples included in the UIMA SDK, you&apos;ll first need to

-update your CLASSPATH environment variable to include the classes needed by

-that Analysis Engine.</para>

-

-<para>When you first run the Document Analyzer, you should see a

-screen that looks like this:

-  

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="PNG" fileref="&imgroot;DocAnalyzerScr1.png"/>

-      </imageobject>

-      <textobject><phrase>Document Analyzer GUI</phrase>

-      </textobject>

-    </mediaobject>

-  </screenshot></para>

-

-

-  </section>

-  

-  <section id="ugr.tools.doc_analyzer.running_an_ae">

-    <title>Running an AE</title>

-

-

-

-<para>To run an AE, you must first configure the fields on

-the main screen of the Document Analyzer.</para>

-

-<para><emphasis role="bold">Input Directory:</emphasis>  

-  Browse to or type the path of a directory containing text files that you

-want to analyze.  Some sample documents

-are provided in the UIMA SDK under the <literal>examples/data</literal>

-directory.</para>

-

-<para><emphasis role="bold">Input File Format:</emphasis> Set this to "text".  It can, alternatively, 

-be set to one of the two serialized forms for CASes, if you have previously generated and saved these.

-For the CAS formats only, you can also specify "Lenient deserialization"; if checked, then extra

-types and features in the CAS being deserialized and loaded (that are not defined by the Annotator-to-be-run's

-type system) will not cause a deserialization error, but will instead be ignored.</para>

-

-<para><emphasis role="bold">Character Encoding:</emphasis>  

-  The character encoding of the input files.  The default, UTF-8, also works fine for ASCII

-text files.  If you have a different

-encoding, select it here.  For more information on character sets and their names, see the Javadocs for 

-  <literal>java.nio.charset.Charset</literal>.</para>
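As a rough illustration of the encoding note above, the following sketch checks whether a charset name is usable on the current JVM and shows that bytes written as ASCII decode unchanged as UTF-8. It relies only on `java.nio.charset`; the class name `EncodingCheck` and its methods are hypothetical and are not part of the Document Analyzer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {

    // Returns true if the JVM knows a charset with this name.
    // Syntactically illegal names are reported as unusable instead of throwing.
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Decodes raw bytes using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] ascii = "plain ASCII text".getBytes(StandardCharsets.US_ASCII);
        // ASCII is a subset of UTF-8, so the text survives the round trip.
        System.out.println(decode(ascii, "UTF-8"));
        System.out.println(isUsable("UTF-8"));
        System.out.println(isUsable("no such charset"));
    }
}
```

A check like this can be handy before typing an unusual encoding name into the field, since a name the JVM does not recognize cannot be used to read the input files.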

-

-<para><emphasis role="bold">Output Directory:</emphasis> Browse to or type the path of a directory where you want

-  output to be written. (As we&apos;ll see later, you won&apos;t normally need to look directly at these files, but the

-  Document Analyzer needs to know where to write them.) The files written to this directory will be an XML

-  representation of the analyzed documents. If this directory doesn&apos;t exist, it will be created. If the

-  directory exists, any files in it will be deleted (but the tool will ask you to confirm this before doing so). If you

-  leave this field blank, your AE will be run but no output will be generated.</para>

-

-<para><emphasis role="bold">Location of AE XML Descriptor:</emphasis>  

-  Browse to or type the path of the descriptor

-for the AE that you want to run.  There

-are some example descriptors provided in the UIMA SDK under the <literal>examples/descriptors/analysis_engine</literal> and <literal>examples/descriptors/tutorial</literal> directories.</para>

-

-<para><emphasis role="bold">XML Tag containing Text:</emphasis>  

-  This is an optional feature.  If you enter a value here, it specifies the

-name of an XML tag, expected to be found within the input documents, that

-contains the text to be analyzed.  For

-example, the value <literal>TEXT</literal> would cause the AE to only

-analyze the portion of the document enclosed within &lt;TEXT&gt;...&lt;/TEXT&gt;

-tags.  Also, any XML tags occurring within that text will be removed prior to analysis.</para>
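The effect of the XML tag option can be approximated with a short sketch. This is not the Document Analyzer's actual implementation, only a regular-expression approximation of the behavior described above; the class `TagTextExtractor` is hypothetical, it ignores tag attributes and entity references, and returning an empty string when the tag is absent is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagTextExtractor {

    // Returns the content of the first <tag>...</tag> element in the document,
    // with any nested XML tags removed. Returns "" if the tag is not found.
    static String extract(String document, String tag) {
        String quoted = Pattern.quote(tag);
        Pattern element = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        Matcher m = element.matcher(document);
        if (!m.find()) {
            return "";
        }
        // Strip remaining markup inside the selected element.
        return m.group(1).replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String doc = "<DOC><TITLE>Memo</TITLE><TEXT>Hello <b>world</b>!</TEXT></DOC>";
        System.out.println(extract(doc, "TEXT"));
    }
}
```

With the value `TEXT` in the field, only the content of the `TEXT` element would reach the AE, which is what the sketch imitates.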

-

-<para><emphasis role="bold">Language:</emphasis>

-  Specify

-the language in which the documents are written.  Some Analysis Engines, but not all, require

-that this be set correctly in order to do their analysis.  You can select a value from the drop-down

-list or type your own.  The value entered

-here must be an ISO language identifier, the list of which can be found here: 

-  <ulink url="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt"/>.

-</para>
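If you type your own language value, a quick sanity check against the two-letter ISO 639 codes known to the JVM might look like the sketch below. It uses only `java.util.Locale`; the class `LanguageCodeCheck` is hypothetical, and the check covers plain two-letter codes only. Whether the tool also accepts longer identifiers (for example, with a country suffix) is not verified here.

```java
import java.util.Arrays;
import java.util.Locale;

final class LanguageCodeCheck {

    // Returns true if the code is one of the two-letter ISO 639 language
    // codes that this JVM lists.
    static boolean isIso639Code(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    public static void main(String[] args) {
        System.out.println(isIso639Code("en"));
        System.out.println(isIso639Code("english"));
    }
}
```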

-

-

-<para>Once you&apos;ve filled in the appropriate values, press the

-<quote>Run</quote> button.</para>

-

-<para>If an error occurs, a dialog will appear with the error

-message.  (A stack trace will also be

-printed to the console, which may help you if the error was generated by your

-own annotator code.)  Otherwise, an

-<quote>Analysis Results</quote> window will appear.</para>

-

-

-

-</section>

-  

-  <section id="ugr.tools.doc_analyzer.viewing_results">

-    <title>Viewing the Analysis Results</title>

-

-<para>After a successful analysis, the <quote>Analysis

-Results</quote> window will appear.

-  

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="4.2in" format="JPG" fileref="&imgroot;image004.jpg"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-

-<para>The <quote>Results Display Format</quote> options at the

-bottom of this window show the different ways you can view your analysis &ndash; the

-Java Viewer, Java Viewer (JV) with User Colors, HTML, and XML.  

-  The default, Java Viewer, is recommended.</para>

-

-<para>Once you have selected your desired Results Display

-Format, you can double-click on one of the files in the list to view the

-analysis done on that file.</para>

-

-<para>For the Java viewer, two different view modes are supported, each represented by one of two 

-radio buttons titled "Annotations" and "Features":</para>

-

-<para>In the "Annotations" view, each annotation which is declared to be an output of the pipeline 

-(in the topmost Annotator Descriptor) is given a checkbox and a color, in the bottom panel. You can control which

-annotations are shown by using the checkboxes in the bottom panel, the Select All button,

-or the Deselect All button. The results display looks like this (for the AE descriptor

-<literal>examples/descriptors/tutorial/ex4/MeetingDetectorTAE.xml</literal>):

-

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="PNG" fileref="&imgroot;image006v2.png"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window showing results from tutorial example 4 in Annotations view mode</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-<para>You can click the mouse on one of the highlighted

-annotations to see a list of all its features in the frame on the right.</para>

-

-<!--

-<para>In the "Entities" view, annotations are grouped by the type of entities they resolve to, 

-through a user specified entity resolver. You can control which groups of annotations are 

-selected by using the checkboxes in the legend, each representing a specific type of entity. 

-The results display looks like this 

-

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image007.jpg"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window showing results from tutorial example 4 in Entities view mode</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

--->

-

-<para>In the "Features" view, you can specify a combination of a single type, a single feature of that type, and some feature values for that feature.

-The annotations whose feature values match will be highlighted.  Step by step, you first select a specific type of annotations by using 

-a radio button in the first tab of the legend.

-

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="PNG" fileref="&imgroot;image007-1v2.png"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the DateAnnotation type.</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-<para>Selecting this automatically transitions to the second tab, where you then select a specific feature 

-of the annotation type.

-

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="PNG" fileref="&imgroot;image007-2v2.png"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the shortDateString feature.</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-<para>Selecting this again automatically transitions you to the third tab of the legend, where you select some specific

-feature values.

-

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="PNG" fileref="&imgroot;image007-3v2.png"/>

-      </imageobject>

-      <textobject><phrase>Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting individual shortDateString feature values.</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-<para>In each of the above two view modes, you can click the mouse on one of the highlighted 

-annotations to see a list of all its features in the frame on the right.</para>

-

-<para>If you are viewing a CAS that contains multiple subjects

-of analysis, then a selector will appear at the bottom right of the Annotation

-Viewer window.  This will allow you to

-choose the Sofa that you wish to view.  Note that only text Sofas containing a non-null document are available

-for viewing.</para>

-

-</section>

-  

-  <section id="ugr.tools.doc_analyzer.configuring">

-    <title>Configuring the Annotation Viewer</title>

-

-<para>The <quote>JV User Colors</quote> and the HTML viewer allow

-you to specify exactly which colors are used to display each of your annotation

-types.  For the Java Viewer, you can also

-specify which types should be initially selected, and you can hide types

-entirely.</para>

-

-<para>To configure the viewer, click the <quote>Edit Style

-Map</quote> button on the <quote>Analysis Results</quote> dialog.  

-  You should see a dialog that looks like this:

-

-  

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image008.jpg"/>

-      </imageobject>

-      <textobject><phrase>Configuring the Analysis Results Viewer</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-

-<para>To change the color assigned to a type, simply click on

-the colored cell in the <quote>Background</quote> column for the type you wish to

-edit.  This will display a dialog that

-allows you to choose the color.  For the

-HTML viewer only, you can also change the foreground color.</para>

-

-<para>If you would like the type to be initially checked

-(selected) in the legend when the viewer is first launched, check the box in

-the <quote>Checked</quote> column.  If you

-would like the type to never be shown in the viewer, click the box in the

-<quote>Hidden</quote> column.  These

-settings only affect the Java Viewer, not the HTML view.</para>

-

-<para>When you are done editing, click the <quote>Save</quote>

-button.  This will save your choices to a

-file in the same directory as your AE descriptor.  From now on, when you view analysis results

-produced by this AE using the <quote>JV User Colors</quote> or <quote>HTML</quote>

-options, the viewer will be configured as you have specified.</para>

-

-</section>

-

-<section id="ugr.tools.doc_analyzer.interactive_mode">

-  <title>Interactive Mode</title>

-  

-

-<para>Interactive Mode allows you to analyze text that you type

-or cut-and-paste into the tool, rather than requiring that the documents be

-stored as files.</para>

-

-<para>In the main Document Analyzer window, you can invoke

-Interactive Mode by clicking the <quote>Interactive</quote> button instead of the

-<quote>Run</quote> button.  This will

-display a dialog that looks like this:

-  

-   

-  <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image010.jpg"/>

-      </imageobject>

-      <textobject><phrase>Invoking Interactive Mode</phrase></textobject>

-    </mediaobject>

-  </screenshot></para> 

-

-<para>You can type or cut-and-paste your text into this window,

-then choose your Results Display Format and click the <quote>Analyze</quote>

-button.  Your AE will be run on the text

-that you supplied and the results will be displayed as usual.</para>

-

-

-</section>

-  

-  <section id="ugr.tools.doc_analyzer.view_mode">

-    <title>View Mode</title>

-    

-<para>If you have previously run an AE and saved its analysis

-results, you can use the Document Analyzer&apos;s View mode to view those results,

-without re-running your analysis.  To do

-this, on the main Document Analyzer window simply select the location of your

-analyzed documents in the <quote>Output Directory</quote> dialog and click the

-<quote>View</quote> button.  You can then

-view your analysis results as described in Section 

- <xref linkend="ugr.tools.doc_analyzer.viewing_results"/>.</para>

-

-</section>

-  </chapter>

-

diff --git a/uima-docbook-tools/src/docbook/tools.eclipse_launcher.xml b/uima-docbook-tools/src/docbook/tools.eclipse_launcher.xml
deleted file mode 100644
index 96123d0..0000000
--- a/uima-docbook-tools/src/docbook/tools.eclipse_launcher.xml
+++ /dev/null
@@ -1,96 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
-<!ENTITY imgroot "images/tools/tools.eclipse_launcher/" >
-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-<chapter id="ugr.tools.eclipse_launcher">
-
-	<title>Eclipse Analysis Engine Launcher&apos;s Guide</title>
-	<titleabbrev>Eclipse Analysis Engine Launcher&apos;s Guide</titleabbrev>
-	
-	<para>
-	The Analysis Engine Launcher is an Eclipse plug-in that provides debug and run support 
-	for Analysis Engines directly within Eclipse, in the same way that a Java program can be run or debugged.
-	It supports most of the descriptor formats except CPE, UIMA AS and
-	some remote deployment descriptors.
-	</para>
-	
-	<screenshot>
-		<mediaobject>
-			<imageobject>
-				<imagedata width="5in" format="PNG"
-					fileref="&imgroot;image01.png" />
-			</imageobject>
-		</mediaobject>
-	</screenshot>
-		
-	<section id="ugr.tools.eclipse_launcher.create_configuration">
-	    <title>Creating an Analysis Engine launch configuration</title>
-	    <para>
-	    To debug or run an Analysis Engine, a launch configuration must be created. To do this,
-	    select "Run -> Run Configurations" or "Run -> Debug Configurations" from the menu bar. A dialog
-	    will open where the launch configuration can be created. Select UIMA Analysis Engine and create
-	    a new configuration by pressing the New button at the top or by choosing New from the context menu.
-	    The newly created configuration will be automatically selected and the Main tab will be displayed.
-	    </para>
-	    
-	    <para>
-	    The Main tab defines the Analysis Engine which will be launched. First select the project which
-	    contains the descriptor, then choose a descriptor and select the input. The input can be either
-	    a folder which contains input files or a single input file. If the "recursively" check box
-	    is marked, the input folder will be scanned recursively for input files.
-	    </para>
-	    
-	    <para>
-	    The input format defines the format of the input files. If it is set to CASes, the input resource
-	    must be in either the XMI or the XCAS format; if it is set to plain text, plain text input files in
-	    the specified encoding are expected. When the input is a folder, the input logic filters out all files
-	    that do not have an appropriate file ending: depending on the chosen format, the file ending must be
-	    one of .xcas, .xmi or .txt, and all other files are ignored. If a single file is selected, it is processed
-	    regardless of its file ending.
-	    </para>
-	    
-	    <para>
-	    The output directory is optional. If it is set, all processed input files will be written to the specified
-	    directory in the XMI CAS format. If the clear check box is marked, all files inside the output folder will be
-	    deleted; usually this option is not needed because existing files will be overwritten without notice.
-	    </para>
-	    
-	    <para>
-	    The other tabs in the launch configuration are documented in the Eclipse documentation;
-	    see the "Java development user guide -> Tasks -> Running and Debugging". 
-	    </para>
-    </section>
- 	
- 	<section id="ugr.tools.eclipse_launcher.launching">
-	    <title>Launching an Analysis Engine</title>
-	    <para>
-		To launch an Analysis Engine, go to the previously created launch configuration and
-		click on "Debug" or "Run", depending on the desired run mode. The Analysis Engine will
-		now be launched. The output will be shown in the Console View. To debug an Analysis Engine,
-		place breakpoints inside the implementation class. If a breakpoint is hit, the execution will pause
-		as in a Java program.
-	    </para>
- 	</section>
- </chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.jcasgen.xml b/uima-docbook-tools/src/docbook/tools.jcasgen.xml
deleted file mode 100644
index e10a570..0000000
--- a/uima-docbook-tools/src/docbook/tools.jcasgen.xml
+++ /dev/null
@@ -1,251 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.jcasgen/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.jcasgen">

-  <title>JCasGen User&apos;s Guide</title>

-  

-  <para>JCasGen reads a descriptor for an application (either an Analysis Engine Descriptor, 

-    or a Type System Descriptor), creates the merged type system

-    specification by merging all the type system information from all the components

-    referred to in the descriptor, and then uses this merged type system to create Java source

-    files for classes that enable JCas access to the CAS. Java classes are not produced for the

-    built-in types, since these classes are already provided by the UIMA SDK.  (An exception is

-    the built-in type <literal>uima.tcas.DocumentAnnotation</literal>, see the warning below.) </para>

-  

-  <warning><para>If the components comprising the input to the type merging process 

-    have different definitions for the same type name,

-    JCasGen will show a warning, and in some environments may offer to abort the operation.

-    If you continue past this warning, 

-    JCasGen will produce correct Java source files representing the merged types 

-   (that is, the

-    type definition containing all of the features defined on that type by all of the

-    components).  It is recommended that you do not use this capability (of having 

-    two different definitions for the same type name, with different feature sets) since it can make it 

-    difficult to combine/package your annotator with others. See <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information.

-  </para>

-  

-  <para>Also note that if your type system declares a custom version of the 

-    <literal>uima.tcas.DocumentAnnotation</literal> 

-    built-in type, then JCasGen will generate a Java source file for it.  If you do this, you need to be

-    aware of the issues discussed in <olink targetdoc="&uima_docs_ref;"/>

-    <olink

-       targetdoc="&uima_docs_ref;"

-       targetptr="ugr.ref.jcas.documentannotation_issues"/>.</para></warning>

-  

-  <para>JCasGen can be run in many ways.  For Eclipse users using the Component Descriptor Editor, there's a button

-  on the Type System Description page to run it on that type system.  There's also a jcasgen-maven-plugin to use 

-  in Maven build scripts.  There's a menu-driven GUI tool for it.    

-  Finally, there are command line scripts you can use to invoke it.</para>

-  

-  <para>There are several versions of JCasGen. The basic version reads an XML descriptor

-    which contains a type system descriptor, and generates the corresponding Java Class

-    Models for those types. Variants exist for the Eclipse environment that allow merging the

-    newly generated Java source code with previously augmented versions; see <olink

-    targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-      targetptr="ugr.ref.jcas.augmenting_generated_code"/> for a discussion of how the

-    Java Class Models can be augmented by adding additional methods and fields.</para>

-  

-  <para>Input to JCasGen needs to be mostly self-contained. In particular, any types that are

-    defined to depend on user-defined supertypes must have that supertype defined, if the

-    supertype is <literal>uima.tcas.Annotation</literal> or a subtype of it. Any features

-    referencing ranges which are subtypes of uima.cas.String must have those subtypes

-    included. If this is not followed, a warning message is given stating that the resulting

-    generation may be inaccurate.</para>

-  

-  <para>JCasGen is typically invoked automatically when using the Component Descriptor

-    Editor (see <olink targetdoc="&uima_docs_tools;"

-      targetptr="ugr.tools.cde.auto_jcasgen"/>), but can also be run using a shell

-    script. These scripts can take 0, 1, or 2 arguments. The first argument is the location of

-    the file containing the input XML descriptor. The second argument specifies where the

-    generated Java source code should go. If it isn&apos;t given, JCasGen generates its

-    output into a subfolder called JCas (or sometimes JCasNew &ndash; see below), of the first

-    argument&apos;s path.</para>

-    

-  <para>The first argument, the input file, can be written as

-    <literal>jar:&lt;url>!{entry}</literal>, for example:

-    <literal>jar:http://www.foo.com/bar/baz.jar!/COM/foo/quux.class</literal></para>

-  

-  <para>If no arguments are given to JCasGen, then it launches a GUI to interact with the user

-    and ask for the same input. The GUI will remember the arguments you previously used.

-    Here&apos;s what it looks like:

-    

-    

-    <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-        </imageobject>

-        <textobject><phrase>JCasGen tool showing fields for input arguments</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-  

-  <para>When running with automatic merging of the generated Java source with previously

-    augmented versions, the output location is where the merge function obtains the source

-    for the merge operation.</para>

-  

-  <para>As is customary for Java, the generated class source files are placed in the

-    appropriate subdirectory structure according to Java conventions that correspond to

-    the package (name space) name.</para>

-  

-  <para>The Java classes must be compiled and the resulting class files included in the class

-    path of your application; you make these classes available for other annotator writers

-    using your types, perhaps packaged as an xxx.jar file. If the xxx.jar file is made to

-    contain only the Java Class Models for the CAS types, it can be reused by any users of these

-    types.</para>

-  

-  <section id="ugr.tools.jcasgen.running_without_eclipse">

-    <title>Running stand-alone without Eclipse</title>

-    

-    <para>Automatic merging of the generated Java source with previous versions is

-      available only when running with Eclipse. If JCasGen is run without Eclipse, no

-      merging with any previous versions is done. In this case,

-      the output is put in a folder called <quote>JCasNew</quote> unless overridden by

-      specifying a second argument.</para>

-    

-    <para>The distribution includes a shell script/bat file to run the stand-alone version,

-      called jcasgen.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.jcasgen.running_standalone_with_eclipse">

-    <title>Running stand-alone with Eclipse</title>

-    

-    <para>If you have Eclipse (version 3 or later) and EMF (the Eclipse Modeling Framework)

-      installed (both are available from <ulink url="http://www.eclipse.org"/>),

-      JCasGen can merge the Java code it generates with previous versions, picking up

-      changes you might have inserted by hand. The output (and source of the merge input) is in a

-      folder <quote>JCas</quote> under the same path as the input XML file, unless

-      overridden by specifying a second argument.</para>

-    

-    <para>You must install the UIMA plug-ins into Eclipse to enable this function.</para>

-    

-    <para>The distribution includes a shell script/bat file to run the stand-alone with

-      Eclipse version, called jcasgen_merge. This works by starting Eclipse in

-      <quote>headless</quote> mode (no GUI) and invoking JCasGen within Eclipse. You will

-      need to set the ECLIPSE_HOME environment variable or modify the jcasgen_merge shell

-      script to specify where to find Eclipse. The version of Eclipse needed is 3 or higher,

-      with the EMF plug-in and the UIMA runtime plug-in installed. A temporary workspace is

-      used; the name/location of this is customizable in the shell script.</para>

-    

-    <para>Log and error messages are written to the UIMA log. This file is called uima.log, and

-      is located in the default working directory, which if not overridden, is the startup

-      directory of Eclipse.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.jcasgen.running_within_eclipse">

-    <title>Running within Eclipse</title>

-    

-    <para>There are two ways to run JCasGen within Eclipse. The first way is to configure an

-      Eclipse external tools launcher, and use it to run the stand-alone shell scripts, with

-      the arguments filled in. Here&apos;s a picture of a typical launcher configuration

-      screen (you get here by navigating from the top menu: Run &ndash;&gt; External Tools

-      &ndash;&gt; External tools...).

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.8in" format="JPG" fileref="&imgroot;image004.jpg"/>

-        </imageobject>

-        <textobject><phrase>Running JCasGen within Eclipse using the external tool launcher</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-    

-    <para>The second way (which is the normal way it's done) to run within Eclipse is to use the

-      Component Descriptor Editor (CDE) (see <olink targetdoc="&uima_docs_tools;"

-        targetptr="ugr.tools.cde"/>). This tool can be configured to automatically

-      launch JCasGen whenever the type system descriptor is modified. In this release, this

-      operation completely regenerates the files, even if just a small thing changed. For

-      very large type systems, you probably don&apos;t want to enable this all the time. The

-      configurator tool has an option to enable/disable this function.</para>

-  </section>

-  

-  <section id="ugr.tools.jcasgen.maven_plugin">

-    <title>Using the jcasgen-maven-plugin</title>

-    

-    <para>For Maven builds, you can use the jcasgen-maven-plugin to take one or more

-    top level descriptors (Type System or Analysis Engine descriptors), merge them

-    together in the standard way UIMA merges type definitions, and produce the corresponding

-    JCas source classes.  These, by default, are generated to the standard spot for Maven

-    builds for generated files.</para>

-    

-    <para>You can use ant-like include / exclude patterns to specify the top level descriptor

-    files.  If you set &lt;limitToProject> to true, then after a complete UIMA type system

-    merge is done with all of the types, including those that are imported, only those

-    types which are defined within this Maven project (that is, in some subdirectory of the project)

-    will be generated.</para>

-    

-    <para>To use the jcasgen-maven-plugin, specify it in the POM as follows:</para>

-    <programlisting><![CDATA[<plugin>

-  <groupId>org.apache.uima</groupId>

-  <artifactId>jcasgen-maven-plugin</artifactId>

-  <version>2.4.1</version>  <!-- change this to the latest version -->

-  <executions>

-    <execution>

-      <goals><goal>generate</goal></goals>  <!-- this is the only goal -->

-      <!-- runs in phase process-resources by default -->

-      <configuration>

-

-        <!-- REQUIRED -->

-        <typeSystemIncludes>

-          <!-- one or more ant-like file patterns 

-               identifying top level descriptors --> 

-          <typeSystemInclude>src/main/resources/MyTs.xml

-          </typeSystemInclude>

-        </typeSystemIncludes>

-

-        <!-- OPTIONAL -->

-        <!-- a sequence of ant-like file patterns 

-             to exclude from the above include list -->

-        <typeSystemExcludes>

-        </typeSystemExcludes>

-

-        <!-- OPTIONAL -->

-        <!-- where the generated files go -->

-        <!-- default value: 

-             ${project.build.directory}/generated-sources/jcasgen -->

-        <outputDirectory> 

-        </outputDirectory>

-

-        <!-- true or false, default = false -->

-        <!-- if true, then although the complete merged type system 

-             will be created internally, only those types whose

-             definition is contained within this maven project will be

-             generated.  The others will be presumed to be 

-             available via other projects. -->

-        <!-- OPTIONAL -->

-        <limitToProject>false</limitToProject>

-      </configuration>     

-    </execution>

-  </executions>

-</plugin>]]></programlisting>

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.pear.installer.xml b/uima-docbook-tools/src/docbook/tools.pear.installer.xml
deleted file mode 100644
index aaa674f..0000000
--- a/uima-docbook-tools/src/docbook/tools.pear.installer.xml
+++ /dev/null
@@ -1,119 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.pear.installer/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.pear.installer">

-  <title>PEAR Installer User&apos;s Guide</title>

-  

-  <para>PEAR (Processing Engine ARchive) is a new standard for packaging UIMA compliant

-    components. This standard defines several service elements that should be included in

-    the archive package to enable automated installation of the encapsulated UIMA

-    component. The major PEAR service element is an XML Installation Descriptor that

-    specifies installation platform, component attributes, custom installation

-    procedures and environment variables. </para>

-  

-  <para>The installation of a UIMA compliant component includes 2 steps: (1) installation of

-    the component code and resources in a local file system, and (2) verification of the

-    serviceability of the installed component. Installation of the component code and

-    resources involves extracting component files from the archive (PEAR) package in a

-    designated directory and localizing file references in component descriptors and other

-    configuration files. Verification of the component serviceability is accomplished

-    with the help of standard UIMA mechanisms for instantiating analysis engines.

-    

-    

-    <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-      </imageobject>

-      <textobject><phrase>PEAR Installer GUI</phrase>

-      </textobject>

-    </mediaobject>

-  </screenshot></para>

-  

-  <para>There are two versions of the PEAR Installer.  One is an interactive, GUI-based

-  application which puts up a panel asking for the parameters of the installation; the 

-  other is a command line interface version where you pass the parameters needed on the command

-  line itself.  To launch the GUI version of the PEAR Installer, use the script in the UIMA bin directory: 

-  <code>runPearInstaller.bat</code> or <code>runPearInstaller.sh</code>.

-  The command line version is launched using <code>runPearInstallerCli.cmd</code> or 

-  <code>runPearInstallerCli.sh</code>.</para>

-  

-  <para>The PEAR Installer installs UIMA

-    compliant components (analysis engines) from PEAR packages in a local file system. To

-    install a desired UIMA component the user needs to select the appropriate PEAR file in a

-    local file system and specify the installation directory (optional). If no installation

-    directory is specified, the PEAR file is installed to the current working directory. 

-	By default the PEAR packages are not installed directly to the specified installation directory. 

-	For each PEAR, a subdirectory named after the PEAR's ID is created, and the PEAR package is 

-	installed into it. If the PEAR installation directory already exists, the old content is automatically 

-	deleted before the new content is installed. During the

-    component installation the user can read messages printed by the installation program in

-    the message area of the application window. If the installation fails, an appropriate error

-    message is printed to help identify and fix the problem.</para>

-  

-  <para>After the desired UIMA component is successfully installed, the PEAR Installer

-    allows testing this component in the CAS Visual Debugger (CVD) application, which is

-    provided with the UIMA package. The CVD application will load your UIMA component using

-    its XML descriptor file. If the component is loaded successfully, you&apos;ll be able to

-    run it either with sample documents provided in the

-    <literal>&lt;UIMA_HOME&gt;/examples/data</literal> directory, or with any other

-    sample documents. See <olink targetdoc="&uima_docs_tools;"

-      targetptr="ugr.tools.cvd"/> for more information about the CVD application.

-    Running your component in the CVD application helps to make sure the component will run in

-    other UIMA applications. If the CVD application fails to load or run your component, or

-    throws an exception, you can find more information about the problem in the uima.log file

-    in the current working directory. The log file can be viewed with the CVD.</para>

-  

-  <para>PEAR Installer creates a file named <literal>setenv.txt</literal> in the

-    <literal>&lt;component_root&gt;/metadata</literal> directory. This file contains

-    environment variables required to run your component in any UIMA application. 

-    It also creates a PEAR descriptor (see also <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.pear.specifier"/>)

-    file named <literal>&lt;componentID&gt;_pear.xml</literal> 

-    in the <literal>&lt;component_root&gt;</literal> directory that can be used to directly run

-    the installed pear file in your application.

-  </para>

-

-  <para>

-    The metadata/setenv.txt is not read by the UIMA framework anywhere.  

-    It's there for use by non-UIMA application code if that code wants to set environment variables.

-    The metadata/setenv.txt is just a "convenience" file duplicating what's in the xml.  

-  </para>

-  

-  <para>

-    The setenv.txt file has 2 special variables: the CLASSPATH and the PATH. 

-    The CLASSPATH is computed from any supplied CLASSPATH environment variable, 

-    plus the jars that are configured in the PEAR structure, including subcomponents. 

-    The PATH is similarly computed, using any supplied PATH environment variable plus 

-    the "bin" subdirectory of the PEAR structure, if it exists.

-  </para>

-  

-  <para>The command line version of the PEAR installer has one required argument:

-  the path to the PEAR file being installed.  A second argument can specify the

-  installation directory (default is the current working directory).

-  An optional argument, one of "-c" or "-check" or "-verify", causes verification to be done

-  after installation, as described above.</para>  

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.pear.merger.xml b/uima-docbook-tools/src/docbook/tools.pear.merger.xml
deleted file mode 100644
index c836121..0000000
--- a/uima-docbook-tools/src/docbook/tools.pear.merger.xml
+++ /dev/null
@@ -1,164 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.pear.merger">

-  <title>PEAR Merger User&apos;s Guide</title>

-  

-  <para>The PEAR Merger utility takes two or more PEAR files and merges their contents,

-    creating a new PEAR which has, in turn, a new Aggregate analysis engine whose delegates are

-    the components from the original files being merged. It does this by (1) copying the

-    contents of the input components into the output component, placing each component into a

-    separate subdirectory, (2) generating a UIMA descriptor for the output Aggregate 

-    analysis engine and (3) creating an output PEAR file that encapsulates the output

-    Aggregate.</para>

-  

-  <para>The merge logic is quite simple, and is intended to work for simple cases. More complex

-    merging needs to be done by hand. Please see the Restrictions and Limitations section,

-    below.</para>

-  

-  <para>To run the PearMerger command line utility you can use the runPearMerger scripts (.bat for Windows, and .sh for

-    Unix). The usage of the tooling is shown below:</para>

-  

-  <para><programlisting>runPearMerger 1st_input_pear_file ... nth_input_pear_file 

-  -n output_analysis_engine_name [-f output_pear_file ]</programlisting></para>

-  

-  <para>The first group of parameters are the input PEAR files. No duplicates are allowed

-    here. The <literal>-n</literal> parameter is the name of the generated Aggregate

-    Analysis Engine. The optional <literal>-f</literal> parameter specifies the name of

-    the output file. If it is omitted, the output is written to

-    <literal>output_analysis_engine_name.pear</literal> in the current working directory.</para>

-  

-  <para>During the running of this tool, work files are written to a temporary directory

-    created in the user&apos;s home directory.</para>

-  

-  <section id="ugr.tools.pear.merger.merge_details">

-    <title>Details of the merging process</title>

-    

-    <para>The PEARs are merged using the following steps:</para>

-    

-    <orderedlist><listitem><para>A temporary working directory is created for the

-      output aggregate component.</para></listitem>

-      

-      <listitem><para>Each input PEAR file is extracted into a separate

-        &apos;input_component_name&apos; folder under the working directory.</para>

-        </listitem>

-      

-      <listitem><para>The extracted files are processed to adjust the

-        &apos;$main_root&apos; macros. This operation differs from the PEAR installation

-        operation, because it does not replace the macros with absolute paths.</para>

-        </listitem>

-      

-      <listitem><para>The output PEAR directory structure, &apos;metadata&apos; and

-        &apos;desc&apos; folders under the working directory, are created.</para>

-        </listitem>

-      

-      <listitem><para>The UIMA AE descriptor for the output aggregate component is built

-        in the &apos;desc&apos; folder. This aggregate descriptor refers to the input

-        delegate components, specifying &apos;fixed flow&apos; based on the original

-        order of the input components in the command line. The aggregate descriptor&apos;s

-        &apos;capabilities&apos; and

-        &apos;operational properties&apos; sections are built based on the input

-        components&apos; specifications.</para></listitem>

-      

-      <listitem><para>A new PEAR installation descriptor is created in the

-        &apos;metadata&apos; folder, referencing the new output aggregate descriptor

-        built in the previous step. </para></listitem>

-      

-      <listitem><para>The content of the temporary output working directory is zipped to

-        create the output PEAR, and then the temporary working directory is deleted.

-        </para></listitem></orderedlist>

-    

-    <para>The PEAR merger utility logs all the operations both to standard console output and

-      to a log file, pm.log, which is created in the current working directory.</para>

-    

-  </section>

-  

-  <section id="ugr.tools.pear.merger.testing_modifying_resulting_pear">

-    <title>Testing and Modifying the resulting PEAR</title>

-    

-    <para>The output PEAR file can be installed and tested using the PEAR Installer. The

-      output aggregate component can also be tested by using the CVD or DocAnalyzer

-      tools.</para>

-    

-    <para>The PEAR Installer creates Eclipse project files (.classpath and .project) in the

-      root directory of the installed PEAR, so the installed component can be imported into

-      the Eclipse IDE as an external project. Once the component is in the Eclipse IDE,

-      developers may use the Component Descriptor Editor and the PEAR Packager to modify the

-      output aggregate descriptor and re-package the component.</para>

-    

-  </section>

-  <section id="ugr.tools.pear.merger.restrictions_limitations">

-    <title>Restrictions and Limitations</title>

-    

-    <para>The PEAR Merger utility only does basic merging operations, and is limited as

-      follows. You can overcome these by editing the resulting PEAR file or the resulting

-      Aggregate Descriptor.</para>

-    

-    <orderedlist><listitem><para>The Merge operation specifies Fixed Flow sequencing

-      for the Aggregate.</para></listitem>

-      

-      <listitem><para>The merged aggregate does not define any parameters, so the delegate

-        parameters cannot be overridden.</para></listitem>

-      

-      <listitem><para>No External Resource definitions are generated for the

-        aggregate.</para></listitem>

-      

-      <listitem><para>No Sofa Mappings are generated for the aggregate.</para>

-        </listitem>

-      

-      <listitem><para>Name collisions are not checked for. Possible name collisions could

-        occur in the fully-qualified class names of the implementing Java classes, the names

-        of JAR files, the names of descriptor files, and the names of resource bindings or

-        resource file paths.</para></listitem>

-      

-      <listitem><para>The input and output capabilities are generated based on merging the

-        capabilities from the components (removing duplicates). Capability sets are

-        ignored - only the first of the set is used in this process, and only one set is created

-        for the generated Aggregate. There is no support for merging Sofa

-        specifications.</para></listitem>

-      

-      <listitem><para>No Indexes or Type Priorities are created for the generated

-        Aggregate. No checking is done to see if the Indexes or Type Priorities of the

-        components conflict or are inconsistent.</para></listitem>

-      

-      <listitem><para>You can only merge Analysis Engines and CAS Consumers. </para>

-        </listitem>

-      

-      <listitem><para>Although PEAR file installation descriptors that are being merged

-        can have specific XML elements describing Collection Reader and CAS Consumer

-        descriptors, these elements are ignored during the merge, in the sense that the

-        installation descriptor that is created by the merge does not set these elements. The

-        merge process does not use these elements; the output PEAR&apos;s new aggregate only

-        references the merged components&apos; main PEAR descriptor element, as

-        identified by the PEAR element:

-        

-        <programlisting><![CDATA[<SUBMITTED_COMPONENT>

-  <DESC>the_component.xml</DESC>... 

-</SUBMITTED_COMPONENT>

-]]></programlisting></para>

-        </listitem></orderedlist>

-    

-  </section>

-  

-</chapter>

diff --git a/uima-docbook-tools/src/docbook/tools.pear.packager.maven.xml b/uima-docbook-tools/src/docbook/tools.pear.packager.maven.xml
deleted file mode 100644
index 5152130..0000000
--- a/uima-docbook-tools/src/docbook/tools.pear.packager.maven.xml
+++ /dev/null
@@ -1,387 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [

-<!ENTITY imgroot "images/tools/tools.pear.packager.maven/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-  Licensed to the Apache Software Foundation (ASF) under one

-  or more contributor license agreements.  See the NOTICE file

-  distributed with this work for additional information

-  regarding copyright ownership.  The ASF licenses this file

-  to you under the Apache License, Version 2.0 (the

-  "License"); you may not use this file except in compliance

-  with the License.  You may obtain a copy of the License at

-  

-  http://www.apache.org/licenses/LICENSE-2.0

-  

-  Unless required by applicable law or agreed to in writing,

-  software distributed under the License is distributed on an

-  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-  KIND, either express or implied.  See the License for the

-  specific language governing permissions and limitations

-  under the License.

--->

-

-  <chapter id="ugr.tools.pear.packager.maven.plugin.usage">

-    <title>The PEAR Packaging Maven Plugin</title>

-    <para>

-      UIMA includes a Maven plugin that supports creating PEAR packages using Maven. 

-      When configured for a project, it assumes that the project has the PEAR layout, 

-      and will copy the standard directories that are part of a PEAR structure under the

-      project root into the PEAR, excluding files that start with a period (".").  

-      It also will put the Jar that is built for the project

-      into the lib/ directory and include it first on the generated classpath. 

-    </para>

-    

-    <para>

-      The classpath that is generated for this includes the artifact's Jar first, any user specified

-      entries second (in the order they are specified), and finally, entries for all Jars 

-      found in the lib/ directory (in some arbitrary order).

-    </para>

-    

-    <section id="ugr.tools.pear.packager.maven.plugin.usage.configure">

-      <title>Specifying the PEAR Packaging Maven Plugin</title>

-      

-    <para>

-      To use the PEAR Packaging Plugin within a Maven build, 

-      the plugin must be added to the plugins section of the 

-      Maven POM as shown below:

-    </para>

-    <para>

-    <programlisting><![CDATA[<build>

- <plugins>

-  ...

-  <plugin>

-    <groupId>org.apache.uima</groupId>

-    <artifactId>PearPackagingMavenPlugin</artifactId>

-    

-    <!-- if version is omitted, then --> 

-    <!-- version is inherited from parent's pluginManagement section -->

-    <!-- otherwise, include a version element here --> 

-    

-    <!-- says to load Maven extensions 

-         (such as packaging and type handlers) from this plugin -->

-    <extensions>true</extensions>  

-    <executions>

-      <execution>

-        <phase>package</phase>

-        <!-- where you specify details of the thing being packaged -->

-        <configuration>  

-          

-          <classpath>

-            <!-- PEAR file component classpath settings -->

-            $main_root/lib/sample.jar

-          </classpath>

-          

-          <mainComponentDesc>

-            <!-- PEAR file main component descriptor -->

-            desc/${artifactId}.xml

-          </mainComponentDesc>

-          

-          <componentId>

-            <!-- PEAR file component ID -->

-            ${artifactId}

-          </componentId>

-          

-          <datapath>

-            <!-- PEAR file UIMA datapath settings -->

-            $main_root/resources

-          </datapath>

-          

-        </configuration>

-        <goals>

-          <goal>package</goal>

-        </goals>

-      </execution>

-    </executions>

-  </plugin>

-  ...

- </plugins>

-</build>

-]]></programlisting>

-    </para>

-    

-    <para>

-      To configure the plugin with the specific settings of a PEAR package, the 

-      <code>&lt;configuration></code> element is used. This section contains all parameters 

-      that are used by the PEAR Packaging Plugin to package the right content and set the specific PEAR package settings.

-      The details about each parameter and how it is used are shown below:

-    </para>

-    <para>

-      <itemizedlist>

-        <listitem>

-          <para>

-            <code>&lt;classpath></code>

-              - This element specifies the classpath settings for the 

-              PEAR component. The Jar artifact that is built during the current Maven build is 

-              automatically added to the PEAR classpath settings and does not have to be added manually.

-              In addition, all Jars in the lib directory and its subdirectories will be added to the

-              generated classpath when the PEAR is installed.  

-          </para>

-          <note>

-            <para>Use $main_root variables to refer to libraries inside 

-              the PEAR package. For more details about PEAR packaging please refer to the 

-              Apache UIMA PEAR documentation.</para>

-          </note>

-        </listitem>

-        <listitem>

-          <para>

-            <code>&lt;mainComponentDesc></code>

-              - This element specifies the relative path to the main component descriptor 

-              that should be used to run the PEAR content. The path must be relative to the 

-              project root. A good default to use is <code>desc/${artifactId}.xml</code>.

-          </para>

-        </listitem>

-        <listitem>

-          <para>

-            <code>&lt;componentId></code>

-              - This element specifies the PEAR package component ID. A good default

-              to use is <code>${artifactId}</code>.

-          </para>

-        </listitem>

-        <listitem>

-          <para>

-            <code>&lt;datapath></code>

-              - This element specifies the PEAR package UIMA datapath settings.

-              If no datapath settings are necessary, this element can be omitted. 

-          </para>

-          <note>

-            <para>Use $main_root variables to refer to libraries inside 

-              the PEAR package. For more details about PEAR packaging please refer to the 

-              Apache UIMA PEAR documentation.</para>

-          </note>

-        </listitem>

-      </itemizedlist>

-    </para>

-    <para>

-      For most Maven projects it is sufficient to specify the parameters described above. In some cases, for 

-      more complex projects, it may be necessary to specify some additional configuration 

-      parameters. These parameters are listed below with the default values that are used if they are not 

-      added to the configuration section shown above.

-    </para>

-    <para>

-      <itemizedlist>

-        <listitem>

-          <para>

-            <code>&lt;mainComponentDir></code>

-              - This element specifies the main component directory where the UIMA

-              nature is applied. By default this parameter points to the project root 

-              directory - ${basedir}.  

-          </para>

-        </listitem>

-        <listitem>

-          <para>

-            <code>&lt;targetDir></code>

-              - This element specifies the target directory where the results of the plugin 

-              are written. By default this parameter points to the default Maven output 

-              directory - ${basedir}/target

-          </para>

-        </listitem>

-      </itemizedlist>

-    </para>

-    </section>

-    

-    <section id="ugr.tools.pear.packager.maven.plugin.usage.dependencies">

-      <title>Automatically including dependencies</title>

-

-      <para>

-        A key concept in PEARs is that they allow specifying other Jars in the classpath.

-        You can optionally include these Jars within the PEAR package.

-      </para>

-      <para>

-          The PEAR Packaging Plugin does not take care of automatically

-          adding these Jars (that the PEAR might depend on) to the PEAR archive. 

-          However, this

-          behavior can be manually added to your Maven POM. 

-          The following two build plugins

-          hook into the build cycle and ensure that all runtime

-          dependencies are included in the PEAR file.

-      </para>

-

-      

-        <para>

-        The dependencies will be automatically included in the 

-        PEAR file using this procedure; the PEAR install process also automatically

-        adds all files in the lib directory (and sub directories) to the 

-        classpath.

-        </para>

-      

-

-      <para>

-        The <code>maven-dependency-plugin</code>

-        copies the runtime dependencies of the PEAR into the

-        <code>lib</code> folder, which is where the PEAR packaging

-        plugin expects them.  

-      </para>

-

-        <programlisting><![CDATA[<build>

- <plugins>

-  ...

-  <plugin>

-   <groupId>org.apache.maven.plugins</groupId>

-   <artifactId>maven-dependency-plugin</artifactId>

-   <executions>

-    <!-- Copy the dependencies to the lib folder for the PEAR to copy -->

-    <execution>

-     <id>copy-dependencies</id>

-     <phase>package</phase>

-     <goals>

-      <goal>copy-dependencies</goal>

-     </goals>

-     <configuration>

-      <outputDirectory>${basedir}/lib</outputDirectory>

-      <overWriteSnapshots>true</overWriteSnapshots>

-      <includeScope>runtime</includeScope>

-     </configuration>

-    </execution>

-   </executions>

-  </plugin>

-  ...

- </plugins>

-</build>

-]]></programlisting>

-

-      <para>

-        The second Maven plug-in hooks into the <code>clean</code>

-        phase of the build life-cycle, and deletes the

-        <code>lib</code> folder.

-      </para>

-

-      <note>

-        <para>

-          With this approach, the <code>lib</code> folder is 

-          automatically created, populated, and removed

-          during the build process. Therefore it should not go into

-          the source control system and neither should you

-          manually place any jars in there.

-        </para>

-      </note>

-      

-        <programlisting><![CDATA[<build>

- <plugins>

-  ...

-  <plugin>

-   <artifactId>maven-antrun-plugin</artifactId>

-   <executions>

-    <!-- Delete the copied library Jars during the clean phase -->

-    <execution>

-     <id>CleanLib</id>

-     <phase>clean</phase>

-     <configuration>

-      <tasks>

-       <delete quiet="true" 

-               failOnError="false">

-        <fileset dir="lib" includes="**/*.jar"/>

-       </delete>

-      </tasks>

-     </configuration>

-     <goals>

-      <goal>run</goal>

-     </goals>

-    </execution>                      

-   </executions>

-  </plugin>

-  ...

- </plugins>

-</build>

-]]></programlisting>

-

-    </section>

-<!-- 

-

- <section id="ugr.tools.pear.packager.maven.plugin.install">

-    <title>Installing The PEAR Packaging Plugin</title>

-

-    <para>If you specify the Apache Incubating Repository as one of the repositories 

-      for your maven configuration, then the <code>uima-pear-maven-plugin.jar</code> 

-      will be automatically fetched when needed.  

-      This is typically specified in the POM, the Maven .settings file or in 

-      a parent POM, using this format:

-    </para>

-    <programlisting><![CDATA[<repositories>

-  <repository>

-    <id>apache-incubating-repository</id>

-    <url>http://people.apache.org/repo/m2-incubating-repository</url>

-    <releases>

-      

-      <updatePolicy>never</updatePolicy> 

-    </releases>

-  </repository>

-</repositories>]]></programlisting>

-

-

-    <para>

-      Otherwise, the 

-      <code>uima-pear-maven-plugin.jar</code> file must be manually installed into your local

-      repository.  See <ulink url="http://maven.apache.org/general.html#importing-jars"/>.

-      The information you need to do this is:

-      <itemizedlist spacing="compact">

-        <listitem><para><code>-DgroupId=org.apache.uima</code></para></listitem>

-        <listitem><para><code>-DartifactId=PearPackagingMavenPlugin</code></para></listitem>

-        <listitem><para><code>-Dversion=2.3.0-incubating</code>  (change this to the version you want)</para></listitem>

-        <listitem><para><code>-Dpackaging=jar</code></para></listitem>

-        <listitem><para><code>-DgeneratePom=true</code></para></listitem>

-      </itemizedlist>

-    </para>

-  </section>

--->

-  <section id="ugr.tools.pear.packager.maven.plugin.commandline">

-    <title>Running from the command line</title>

-    <para>

-      The pear packager can be run as a maven command.  To enable this, you have to add the following to your

-      maven settings file:

-      <programlisting><![CDATA[<settings>

-  ...

-  <pluginGroups>

-    <pluginGroup>org.apache.uima</pluginGroup>

-  </pluginGroups>]]></programlisting>

-      To invoke the pear packager using maven, use the command:

-      <programlisting><![CDATA[mvn uima-pear:package <parameters...>]]></programlisting>

-      The settings are the same ones used in the configuration above, specified as -D variables 

-      where the variable name is pear.parameterName.

-      For example:

-      <programlisting><![CDATA[mvn uima-pear:package -Dpear.mainComponentDesc=desc/mydescriptor.xml

-                      -Dpear.componentId=foo]]></programlisting> 

-    </para>

-  </section>

-  

-  <section id="ugr.tools.pear.packager.maven.plugin.install.src">

-    <title>Building the PEAR Packaging Plugin From Source</title>

-    <para>

-      The plugin code is available in the Apache

-      subversion repository at:

-      <ulink url="http://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin"/>.

-      Use the following command line to build it (you will need the Maven build tool, available from Apache):

-    </para>

-    <para>

-    <programlisting><![CDATA[#PearPackagingMavenPlugin> mvn install]]></programlisting>

-    </para>

-    <para>

-      This maven command will build the tool and install it in your local maven repository, 

-      making it available for use by other maven POMs.  The plugin version number

-      is displayed at the end of the Maven build as shown in the example below. For this example, the plugin 

-      version number is: <code>2.3.0-incubating</code> 

-    </para>

-    <para>

-    <programlisting><![CDATA[[INFO] Installing 

-/code/apache/PearPackagingMavenPlugin/target/

-PearPackagingMavenPlugin-2.3.0-incubating.jar 

-to 

-/maven-repository/repository/org/apache/uima/PearPackagingMavenPlugin/

-2.3.0-incubating/

-PearPackagingMavenPlugin-2.3.0-incubating.jar

-[INFO] [plugin:updateRegistry]

-[INFO] --------------------------------------------------------------

-[INFO] BUILD SUCCESSFUL

-[INFO] --------------------------------------------------------------

-[INFO] Total time: 6 seconds

-[INFO] Finished at: Tue Nov 13 15:07:11 CET 2007

-[INFO] Final Memory: 10M/24M

-[INFO] --------------------------------------------------------------]]></programlisting>

-    </para>

-  </section>

-</chapter>

-

-

diff --git a/uima-docbook-tools/src/docbook/tools.pear.packager.xml b/uima-docbook-tools/src/docbook/tools.pear.packager.xml
deleted file mode 100644
index a3caca0..0000000
--- a/uima-docbook-tools/src/docbook/tools.pear.packager.xml
+++ /dev/null
@@ -1,389 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tools/tools.pear.packager/" >

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tools.pear.packager">

-  <title>PEAR Packager User&apos;s Guide</title>

-  

-  <para>A PEAR (Processing Engine ARchive) file is a standard package for UIMA (Unstructured

-    Information Management Architecture) components. The PEAR package can be used for

-    distribution and reuse by other components or applications. It also allows applications

-    and tools to manage UIMA components automatically for verification, deployment,

-    invocation, testing, etc. Please refer to <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.pear"/>

-    for more information about the internal structure of a PEAR file.</para>

-  

-  <para>This chapter describes how to use the PEAR Eclipse plugin or the PEAR command line packager 

-  to create PEAR files for standard UIMA components.</para>

-  

-  <section id="ugr.tools.pear.packager.using_eclipse_plugin">

-    <title>Using the PEAR Eclipse Plugin</title>

-    

-    <para>The PEAR Eclipse plugin is automatically installed if you followed the directions in

-      <olink targetdoc="&uima_docs_overview;"/>

-      <olink targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.eclipse_setup"/>. The use of the 

-      plugin involves the following two steps:</para>

-    

-    <itemizedlist spacing="compact"><listitem><para>Add the UIMA nature to your project

-      </para></listitem>

-      

-      <listitem><para>Create a PEAR file using the PEAR generation wizard </para>

-        </listitem></itemizedlist>

-    

-    <section id="ugr.tools.pear.packager.add_uima_nature">

-      <title>Add UIMA Nature to your project</title>

-      

-      <para>First, create a project for your UIMA component:</para>

-      

-      <itemizedlist spacing="compact"><listitem><para>Create a Java project, which

-        would contain all the files and folders needed for your UIMA component.</para>

-        </listitem>

-        

-        <listitem><para>Create a source folder called <quote>src</quote> in your

-          project, and make it the only source folder, by clicking on

-          <quote>Properties</quote> in your project&apos;s context menu (right-click),

-          then select <quote>Java Build Path</quote>, then add the <quote>src</quote>

-          folder to the source folders list, and remove any other folder from the

-          list.</para></listitem>

-        

-        <listitem><para>Specify an output folder for your project called bin, by clicking

-          on <quote>Properties</quote> in your project&apos;s context menu

-          (right-click), then select <quote>Java Build Path</quote>, and specify

-          <quote><emphasis>your_project_name</emphasis>/bin</quote> as the default

-          output folder. </para></listitem></itemizedlist>

-      

-      <para>Then, add the UIMA nature to your project by clicking on <quote>Add UIMA

-        Nature</quote> in the context menu (right-click) of your project. Click

-        <quote>Yes</quote> on the <quote>Adding UIMA custom Nature</quote> dialog box.

-        Click <quote>OK</quote> on the confirmation dialog box.

-        

-        

-        <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of Adding the UIMA Nature to your project</phrase>

-        </textobject>

-      </mediaobject>

-    </screenshot></para>

-      

-      <para>Adding the UIMA nature to your project creates the PEAR structure in your

-        project. The PEAR structure is a structured tree of folders and files, including the

-        following elements:

-        

-        <itemizedlist><listitem><para><emphasis role="bold">Required

-          Elements:</emphasis>

-          

-          <itemizedlist><listitem><para>The <emphasis role="bold">

-            metadata</emphasis> folder which contains the PEAR installation descriptor

-            and properties files.</para></listitem>

-            

-            <listitem><para>The installation descriptor (<emphasis role="bold">

-              metadata/install.xml</emphasis>)

-              </para></listitem></itemizedlist></para></listitem>

-          

-          <listitem><para><emphasis role="bold">Optional Elements:</emphasis>

-            

-            <itemizedlist><listitem><para>The <emphasis role="bold">

-              desc</emphasis> folder to contain descriptor files of analysis engines,

-              component analysis engines (all levels), and other components (Collection

-              Readers, CAS Consumers, etc.).</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">src </emphasis>folder to

-                contain the source code</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">bin</emphasis> folder to

-                contain executables, scripts, class files, dlls, shared libraries,

-                etc.</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">lib</emphasis> folder to

-                contain jar files. </para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">doc </emphasis>folder

-                containing documentation materials, preferably accessible through an

-                index.html.</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">data</emphasis> folder to

-                contain data files (e.g. for testing).</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">conf</emphasis> folder to

-                contain configuration files.</para></listitem>

-              

-              <listitem><para>The <emphasis role="bold">resources</emphasis> folder

-                to contain other resources and dependencies.</para></listitem>

-              

-              <listitem><para>Other user-defined folders or files are allowed, but

-                <emphasis>should be avoided</emphasis>. </para></listitem>

-              </itemizedlist> </para></listitem></itemizedlist></para>

-      

-      <para>For more information about the PEAR structure, please refer to the

-        <quote>Processing Engine Archive</quote> section.

-        

-        <figure id="ugr.tools.pear.packager.fig.pear_structure">

-          <title>The Pear Structure</title>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="3in" format="JPG"

-                fileref="&imgroot;image004.jpg"/>

-            </imageobject>

-            <textobject><phrase>Pear structure</phrase>

-            </textobject>

-          </mediaobject>

-        </figure></para>

-      

-    </section>

-    <section id="ugr.tools.pear.packager.using_pear_generation_wizard">

-      <title>Using the PEAR Generation Wizard</title>

-      

-      <para>Before using the PEAR Generation Wizard, add all the files needed to

-        run your component including descriptors, jars, external libraries, resources,

-        and component analysis engines (in the case of an aggregate analysis engine), etc.

-        <emphasis>Do not</emphasis> add Jars for the UIMA framework, however.  Doing so will

-        cause class loading problems at run time.</para>

-      <para>

-        If you're using a Java IDE like Eclipse, instead of using the output folder (usually 

-        <literal>bin</literal>) as the source of your classes, it&apos;s recommended that 

-        you generate a Jar file containing these classes.</para>

-      

-      <para>Then, click on <quote>Generate PEAR file</quote> from the context menu

-        (right-click) of your project, to open the PEAR Generation wizard, and follow the

-        instructions on the wizard to generate the PEAR file.</para>

-      

-      <section id="ugr.tools.pear.packager.wizard.component_information">

-        <title>The Component Information page</title>

-        

-        <para>The first page of the PEAR generation wizard is the component information

-          page. Specify in this page a component ID for your PEAR and select the main Analysis

-          Engine descriptor. The descriptor must be specified using a pathname relative to

-          the project&apos;s root (e.g. <quote>desc/MyAE.xml</quote>). The component id

-          is a string that uniquely identifies the component. It should use the JAVA naming

-          convention (e.g. org.apache.uima.mycomponent).</para>

-        

-        <para>Optionally, you can include specific Collection Iterator, CAS Initializer (deprecated

-          as of Version 2.1),

-          or CAS Consumers. In this case, specify the corresponding descriptors in this

-          page.

-          

-          <figure id="ugr.tools.pear.packager.fig.wizard.component_information">

-            <title>The Component Information Page</title>

-            <mediaobject>

-              <imageobject>

-                <imagedata width="5.8in" format="JPG"

-                  fileref="&imgroot;image006.jpg"/>

-              </imageobject>

-              <textobject><phrase>Pear Wizard - component information page</phrase>

-              </textobject>

-            </mediaobject>

-          </figure></para>

-        

-      </section>

-      

-      <section id="ugr.tools.pear.packager.wizard.install_environment">

-        <title>The Installation Environment page</title>

-        

-        <para>The installation environment page is used to specify the following:

-          

-          <itemizedlist spacing="compact"><listitem><para>Preferred operating

-            system</para></listitem>

-            

-            <listitem><para>Required JDK version, if applicable.</para></listitem>

-            

-            <listitem><para>Required Environment variable settings.  This is where

-              you specify special CLASSPATH paths.  You do not need to specify this for

-              any Jar that is listed in your Eclipse project classpath settings; those are automatically

-              put into the generated CLASSPATH.  Nor should you include paths to the

-              UIMA Framework itself, here.  Doing so may cause class loading problems.

-            </para>

-              

-            <para>CLASSPATH segments are written here using a semicolon ";" as the separator;

-              during PEAR installation, these will be adjusted to be the correct character for the

-              target Operating System.</para>

-            

-            <para>In order to specify the UIMA datapath for your component you have to create an environment

-            variable with the property name <literal>uima.datapath</literal>. The value of this property 

-            must contain the UIMA datapath settings.</para>

-              

-              </listitem></itemizedlist></para>

-        

-        <para>Path names should be specified using macros (see below), instead of

-          hard-coded absolute paths that might work locally, but probably won&apos;t if the

-          PEAR is deployed on a different machine or in a different environment.</para>

-        

-        <para>Macros are variables such as $main_root, used to represent a string such as the

-          full path of a certain directory.</para>

-        

-        <para>These macros should be defined in the PEAR.properties file using the local

-          values. The tools and applications that use and deploy PEAR files should replace

-          these macros (in the files included in the conf and desc folders) with the

-          corresponding values in the local environment as part of the deployment

-          process.</para>

-        

-        <para>Currently, there are two types of macros:</para>

-        

-        <itemizedlist><listitem><para>$main_root, which represents the local absolute

-          path of the main component root directory after deployment.</para></listitem>

-          

-          <listitem><para><emphasis>$component_id$root</emphasis>, which

-            represents the local absolute path to the root directory of the component which

-            has <emphasis>component_id</emphasis> as component ID. This component could

-            be, for instance, a delegate component. </para></listitem></itemizedlist>

-        

-        <figure id="ugr.tools.pear.packager.fig.wizard.install_environment">

-          <title>The Installation Environment Page</title>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="5.8in" format="JPG"

-                fileref="&imgroot;image008.jpg"/>

-            </imageobject>

-            <textobject><phrase>Pear Wizard - install environment page</phrase>

-            </textobject>

-          </mediaobject>

-        </figure>

-        

-      </section>

-      

-      <section id="ugr.tools.pear.packager.wizard.file_content">

-        <title>The PEAR file content page</title>

-        

-        <para>The last page of the wizard is the <quote>PEAR file Export</quote> page, which

-          allows the user to select the files to include in the PEAR file. The metadata folder

-          and all its content are mandatory. Make sure you include all the files needed to run

-          your component including descriptors, jars, external libraries, resources, and

-          component analysis engines (in the case of an aggregate analysis engine), etc.

-          It&apos;s recommended to generate a jar file from your code as an alternative to

-          building the project and making sure the output folder (bin) contains the required

-          class files.</para>

-        

-        <para>Eclipse compiles your class files into some output directory, often named

-        "bin" when you take the usual defaults in Eclipse.  The recommended practice is to

-        take all these files and put them into a Jar file, perhaps using the Eclipse Export 

-        wizard.  You would place that Jar file into the PEAR <literal>lib</literal> directory.</para>

-        

-        <note><para>If you are relying on the class files generated in the output folder

-          (usually called bin) to run your code, then make sure the project is built properly,

-          and all the required class files are generated without errors, and then put the

-          output folder (e.g. $main_root/bin) in the classpath using the option to set

-          environment variables, by setting the CLASSPATH variable to include this folder (see the

-          <quote>Installation Environment</quote> page).

-          Beware that using a Java output folder named "bin" in this case is a poor practice, 

-            because the PEAR installation

-          tools will presume this folder contains binary executable files, and will add this folder to 

-          the PATH environment variable.

-          </para>  </note>        

-          <figure id="ugr.tools.pear.packager.fig.wizard.export">

-            <title>The PEAR File Export Page</title>

-            <mediaobject>

-              <imageobject>

-                <imagedata width="5.7in" format="JPG"

-                  fileref="&imgroot;image010.jpg"/>

-              </imageobject>

-              <textobject><phrase>Pear Wizard - File Export Page</phrase>

-              </textobject>

-            </mediaobject>

-          </figure>

-          

-          

-        

-      </section>

-    </section>

-  </section>

-  

-  <section id="ugr.tools.pear.packager.using_command_line">

-    <title>Using the PEAR command line packager</title>

-     <para>The PEAR command line packager takes some PEAR package parameter settings on the command line to create a 

-     UIMA PEAR file.</para>

-     

-     <para>To run the PEAR command line packager you can use the provided runPearPackager (.bat for Windows, and .sh for Unix) 

-     scripts. The packager can be used in three different modes.</para>

-     <para><itemizedlist>

-     <listitem> 

-     	<para>Mode 1: creates a complete PEAR package with the provided information (default mode)</para>

-     	<para><programlisting>runPearPackager -compID &lt;componentID> 

-  -mainCompDesc &lt;mainComponentDesc> [-classpath &lt;classpath>] 

-  [-datapath &lt;datapath>] -mainCompDir &lt;mainComponentDir> 

-  -targetDir &lt;targetDir> [-envVars &lt;propertiesFilePath>]</programlisting></para>   

-     	<para> The created PEAR file has the file name &lt;componentID>.pear and is located in the &lt;targetDir>.</para>

-     </listitem>

-     

-     <listitem> 

-     	<para>Mode 2: creates a PEAR installation descriptor without packaging the PEAR file</para>

-     	<para><programlisting>runPearPackager -create -compID &lt;componentID> 

-  -mainCompDesc &lt;mainComponentDesc> [-classpath &lt;classpath>]

-  [-datapath &lt;datapath>] -mainCompDir &lt;mainComponentDir> 

-  [-envVars &lt;propertiesFilePath>]</programlisting></para>

-     	<para> The PEAR installation descriptor is created in the &lt;mainComponentDir>/metadata directory.</para>

-     </listitem>

-

-     <listitem>

-     	<para>Mode 3: creates a PEAR package with an existing PEAR installation descriptor</para>

-     	<para><programlisting>runPearPackager -package -compID &lt;componentID> 

-  -mainCompDir &lt;mainComponentDir> -targetDir &lt;targetDir></programlisting></para>

-      	<para> The created PEAR file has the file name &lt;componentID>.pear and is located in the &lt;targetDir>.</para>

-     </listitem>

-     	 

-     </itemizedlist>

-     </para>          

-     <para>Modes 2 and 3 should be used when you want to manipulate the PEAR installation descriptor before packaging

-     the PEAR file. </para>

-     

-     <para>More details about the PearPackager parameters are provided in the list below:</para>

-     <para><itemizedlist>

-     <listitem> 

-     	<simpara><literal>&lt;componentID></literal>: PEAR package component ID.</simpara>

-     </listitem>

-     

-     <listitem> 

-     	<simpara><literal>&lt;mainComponentDesc></literal>: Main component descriptor of the PEAR package.</simpara>

-     </listitem>

-

-     <listitem>

-     	<simpara><literal>&lt;classpath></literal>: PEAR classpath settings. Use $main_root macros to specify

-     	path entries. Use <literal>;</literal> to separate the entries.</simpara>

-     </listitem>

-

-     <listitem>

-     	<simpara><literal>&lt;datapath></literal>: PEAR datapath settings. Use $main_root macros to specify

-     	path entries. Use <literal>;</literal> to separate the path entries.</simpara>

-     </listitem>

-

-     <listitem>

-     	<simpara><literal>&lt;mainComponentDir></literal>: Main component directory that contains the PEAR package content.</simpara>

-     </listitem>

-

-     <listitem>

-     	<simpara><literal>&lt;targetDir></literal>: Target directory where the created PEAR file is written to.</simpara>

-     </listitem>

-

-     <listitem>

-     	<simpara><literal>&lt;propertiesFilePath></literal>: Path name to a properties file that contains environment variables that must be

-     	set to run the PEAR content.</simpara>

-     </listitem>

-          	 

-     </itemizedlist>

-     

-     </para>

-   </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/tools.xml b/uima-docbook-tools/src/docbook/tools.xml
deleted file mode 100644
index e6d9b00..0000000
--- a/uima-docbook-tools/src/docbook/tools.xml
+++ /dev/null
@@ -1,41 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<book lang="en">

-  <title>UIMA Tools Guide and Reference</title>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info.xml"/>

-  <toc/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.cde.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.cpe.xml"/>

-  <!--

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.eclipse_launcher.xml"/>

-  -->

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.doc_analyzer.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.annotation_viewer.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.cvd.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.eclipse_launcher.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.caseditor.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.jcasgen.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.pear.packager.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.pear.packager.maven.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.pear.installer.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tools.pear.merger.xml"/>

-</book>

diff --git a/uima-docbook-tutorials-and-users-guides/pom.xml b/uima-docbook-tutorials-and-users-guides/pom.xml
deleted file mode 100644
index 50952bb..0000000
--- a/uima-docbook-tutorials-and-users-guides/pom.xml
+++ /dev/null
@@ -1,50 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements.  See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership.  The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License.  You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied.  See the License for the
-   specific language governing permissions and limitations
-   under the License.    
--->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-  <modelVersion>4.0.0</modelVersion>
-
-  <parent>
-    <groupId>org.apache.uima</groupId>
-    <artifactId>uimaj-parent</artifactId>
-    <version>3.5.0-SNAPSHOT</version>
-    <relativePath>../uimaj-parent/pom.xml</relativePath>
-  </parent>
-
-  <artifactId>uima-docbook-tutorials-and-users-guides</artifactId>
-  <packaging>pom</packaging>
-  <name>Apache UIMA SDK Documentation - tutorials and user's guides</name>
-  <url>${uimaWebsiteUrl}</url>
-
-  <properties>
-    <!-- next property is the name of the top file under src/docbook without trailing .xml -->
-    <bookNameRoot>tutorials_and_users_guides</bookNameRoot>
-  </properties>
-
-  <repositories>
-    <repository>
-      <id>apache.snapshots</id>
-      <name>Apache Snapshot Repository</name>
-      <url>https://repository.apache.org/snapshots</url>
-      <releases>
-        <enabled>false</enabled>
-      </releases>
-    </repository>
-  </repositories>
-</project>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml
deleted file mode 100644
index 9944a3f..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml
+++ /dev/null
@@ -1,2797 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.aae/">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.aae">

-  <title>Annotator and Analysis Engine Developer&apos;s Guide</title>

-  <titleabbrev>Annotator &amp; AE Developer&apos;s Guide</titleabbrev>

-  

-  <para>This chapter describes how to develop UIMA <emphasis>type systems</emphasis>,

-    <emphasis>Annotators</emphasis> and <emphasis>Analysis Engines</emphasis> using

-    the UIMA SDK. It is helpful to read the UIMA Conceptual Overview chapter for a review of

-    these concepts.</para>

-  

-  <para>An <emphasis>Analysis Engine (AE)</emphasis> is a program that analyzes artifacts

-    (e.g. documents) and infers information from them.</para>

-  

-  <para>Analysis Engines are constructed from building blocks called

-    <emphasis>Annotators</emphasis>. An annotator is a component that contains analysis

-    logic. Annotators analyze an artifact (for example, a text document) and create

-    additional data (metadata) about that artifact. It is a goal of UIMA that annotators need

-    not be concerned with anything other than their analysis logic &ndash; for example the

-    details of their deployment or their interaction with other annotators.</para>

-  

-  <para>An Analysis Engine (AE) may contain a single annotator (this is referred to as a

-    <emphasis>Primitive AE</emphasis>), or it may be a composition of others and therefore

-    contain multiple annotators (this is referred to as an <emphasis>Aggregate

-    AE</emphasis>). Primitive and aggregate AEs implement the same interface and can be used

-    interchangeably by applications.</para>

-  

-  <para>Annotators produce their analysis results in the form of typed <emphasis>Feature

-    Structures</emphasis>, which are simply data structures that have a type and a set of

-    (attribute, value) pairs. An <emphasis>annotation</emphasis> is a particular type of

-    Feature Structure that is attached to a region of the artifact being analyzed (a span of

-    text in a document, for example).</para>

-  

-  <para>For example, an annotator may produce an Annotation over the span of text

-    <literal>President Bush</literal>, where the type of the Annotation is

-    <literal>Person</literal> and the attribute <literal>fullName</literal> has the

-    value <literal>George W. Bush</literal>, and its position in the artifact is character

-    position 12 through character position 26.</para>

-  

-  <para>It is also possible for annotators to record information associated with the entire

-    document rather than a particular span (these are considered Feature Structures but not

-    Annotations).</para>

-  

-  <para>All feature structures, including annotations, are represented in the UIMA

-    <emphasis>Common Analysis Structure (CAS)</emphasis>. The CAS is the central data

-    structure through which all UIMA components communicate. Included with the UIMA SDK is an

-    easy-to-use, native Java interface to the CAS called the <emphasis>JCas</emphasis>.

-    The JCas represents each feature structure as a Java object; the example feature

-    structure from the previous paragraph would be an instance of a Java class Person with

-    getFullName() and setFullName() methods. 

-  </para>

-  

-  <para>The CAS interface for accessing feature structures uses UIMA Type and Feature object instances,

-    which are computed at run time, depending on the type system being used.  This interface supports

-    writing general annotators which can work for all type systems.  It is used, for example, internally,

-    in the CasCopier implementation, to copy the content of one CAS to another.

-  </para>

-  

-  <para>The JCas interface can take advantage of knowing ahead of time the particular Types and Features

-    a pipeline is using.  Each JCas class corresponds to a particular UIMA type and includes 

-    special setters and getters whose names match the features.

-  </para>

-    

-  <para>The remainder of this chapter will refer to the analysis of text documents and the

-    creation of annotations that are attached to spans of text in those documents. Keep in mind

-    that the CAS can represent arbitrary types of feature structures, and feature structures

-    can refer to other feature structures. For example, you can use the CAS to represent a parse

-    tree for a document. Also, the artifact that you are analyzing need not be a text

-    document.</para>

-  

-  <para>This guide is organized as follows:</para>

-  

-  <itemizedlist>

-    <listitem>

-      <para><emphasis role="bold-italic"><xref linkend="ugr.tug.aae.getting_started"/></emphasis> is a

-        tutorial with step-by-step instructions for how to develop and test a simple UIMA annotator.</para>

-    </listitem>

-    <listitem>

-      <para><emphasis role="bold-italic"><xref linkend="ugr.tug.aae.configuration_logging"/>

-        </emphasis> discusses how to make your UIMA annotator configurable, and how it can write messages to the UIMA

-        log file.</para>

-    </listitem>

-    <listitem>

-      <para> <emphasis role="bold-italic"><xref linkend="ugr.tug.aae.building_aggregates"/></emphasis>

-        describes how annotators can be combined into aggregate analysis engines. It also describes how one

-        annotator can make use of the analysis results produced by an annotator that has run previously.</para>

-    </listitem>

-    <listitem>

-      <para><emphasis role="bold-italic"><xref linkend="ugr.tug.aae.other_examples"/></emphasis>

-        describes several other examples you may find interesting, including</para>

-      

-      <itemizedlist spacing="compact">

-        <listitem>

-          <para>SimpleTokenAndSentenceAnnotator

-            &ndash; a simple tokenizer and sentence annotator.</para>

-        </listitem>

-        

-        <listitem>

-          <para>PersonTitleDBWriterCasConsumer &ndash; a sample CAS Consumer which populates a relational

-            database with some annotations. It uses JDBC and, in this example, connects to the open source Apache

-            Derby database. </para>

-        </listitem>

-      </itemizedlist>

-    </listitem>

-    <listitem>

-      <para><emphasis role="bold-italic"><xref linkend="ugr.tug.aae.additional_topics"/></emphasis>

-        describes additional features of the UIMA SDK that may help you in building your own annotators and analysis

-        engines.</para>

-    </listitem>

-    <listitem>

-      <para><emphasis role="bold-italic"><xref linkend="ugr.tug.aae.common_pitfalls"/> </emphasis>

-        contains some useful guidelines to help you ensure that your annotators will work correctly in any UIMA

-        application.</para>

-    </listitem>

-  </itemizedlist>

-  

-  <para>This guide does not discuss how to build UIMA Applications, which are programs that

-    use Analysis Engines, along with other components, e.g. a search engine, document store,

-    and user interface, to deliver a complete package of functionality to an end-user. For

-    information on application development, see <olink

-      targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.application"

-       xrefstyle="select: label quotedtitle"/>

-    .</para>

-  

-  <section id="ugr.tug.aae.getting_started">

-    <title>Getting Started</title>

-    

-    <para>This section is a step-by-step tutorial that will get you started developing UIMA

-      annotators. All of the files referred to by the examples in this chapter are in the

-      <literal>examples</literal> directory of the UIMA SDK. This directory is designed to

-      be imported into your Eclipse workspace; see <olink targetdoc="&uima_docs_overview;"/>

-      <olink targetdoc="&uima_docs_overview;"

-        targetptr="ugr.ovv.eclipse_setup.example_code"/> for instructions on how to do

-      this. 

-      See <olink targetdoc="&uima_docs_overview;"/> <olink  targetdoc="&uima_docs_overview;"

-        targetptr="ugr.ovv.eclipse_setup.linking_uima_javadocs"/> for how to attach the UIMA 

-        Javadocs to the jar files.

-      Also you may wish to refer to the UIMA SDK Javadocs located at <ulink

-        url="api/index.html">docs/api/index.html</ulink>.</para>

-    

-        <note><para>Once the Javadocs are attached, hovering over a UIMA class or method in the Eclipse

-    editor displays its Javadoc after a short delay.</para></note>

-    <note><para>If you downloaded the source distribution for UIMA, you can attach that as

-    well to the library Jar files; for information on how to do this, see

-    <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs"/>.</para></note>

-

-    <para>The example annotator that we are going to walk through will detect room numbers for

-      rooms where the room numbering scheme follows some simple conventions. In our example,

-      there are two kinds of patterns we want to find; here are some examples, together with

-      their corresponding regular expression patterns:

-      <variablelist>

-        <varlistentry>

-          <term>Yorktown patterns:</term>

-          <listitem><para>20-001, 31-206, 04-123 (Regular Expression Pattern:

-            ##-[0-2]##)</para></listitem>

-        </varlistentry>

-        <varlistentry>

-          <term>Hawthorne patterns:</term>

-          <listitem><para>GN-K35, 1S-L07, 4N-B21 (Regular Expression Pattern:

-            [G1-4][NS]-[A-Z]##)</para></listitem>

-        </varlistentry>

-      </variablelist> </para>

-    

-    <para>There are several steps to develop and test a simple UIMA annotator.</para>

-    

-    <orderedlist spacing="compact"><listitem><para>Define the CAS types that the

-      annotator will use.</para></listitem>

-      

-      <listitem><para>Generate the Java classes for these types.</para></listitem>

-      

-      <listitem><para>Write the actual annotator Java code.</para></listitem>

-      

-      <listitem><para>Create the Analysis Engine descriptor.</para></listitem>

-      

-      <listitem><para>Test the annotator. </para></listitem></orderedlist>

-    

-    <para>These steps are discussed in the next sections.</para>

-    

-    <section id="ugr.tug.aae.defining_types">

-      <title>Defining Types</title>

-      

-      <para>The first step in developing an annotator is to define the CAS Feature Structure

-        types that it creates. This is done in an XML file called a <emphasis>Type System

-        Descriptor</emphasis>. UIMA defines basic primitive types such as

-        Boolean, Byte, Short, Integer, Long, Float, and Double, as well as Arrays of these primitive

-        types.  UIMA also defines the built-in types <literal>TOP</literal>, which is the root 

-        of the type system, analogous to Object in Java; <literal>FSArray</literal>, which is 

-        an array of Feature Structures (i.e. an array of instances of TOP); and

-        <literal>Annotation</literal>, which we will discuss in more detail in this section.</para>

-      

-      <para>UIMA includes an Eclipse plug-in that will help you edit Type System

-        Descriptors, so if you are using Eclipse you will not need to worry about the details of

-        the XML syntax. See <olink targetdoc="&uima_docs_overview;"/> <olink targetdoc="&uima_docs_overview;"

-          targetptr="ugr.ovv.eclipse_setup"/> for instructions on setting up Eclipse and

-        installing the plug-in.</para>

-      

-      <para>The Type System Descriptor for our annotator is located in the file

-        <literal>descriptors/tutorial/ex1/TutorialTypeSystem.xml</literal>. (This

-        and all other examples are located in the <literal>examples</literal> directory of

-        the installation of the UIMA SDK, which can be imported into an Eclipse project for

-        your convenience, as described in <olink targetdoc="&uima_docs_overview;"/>

-        <olink targetdoc="&uima_docs_overview;"

-          targetptr="ugr.ovv.eclipse_setup.example_code"/>.)</para>

-      

-      <para>In Eclipse, expand the <literal>uimaj-examples</literal> project in the

-        Package Explorer view, and browse to the file

-        <literal>descriptors/tutorial/ex1/TutorialTypeSystem.xml</literal>.

-        Right-click on the file in the navigator and select Open With &rarr; Component

-        Descriptor Editor. Once the editor opens, click on the <quote>Type System</quote>

-        tab at the bottom of the editor window. You should see a view such as the

-        following:</para>

-      

-      

-      <screenshot>

- <mediaobject>

-        <imageobject>

-          <imagedata scale="100" format="JPG" fileref="&imgroot;image002.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of editor for Type System Definitions</phrase></textobject>

-      </mediaobject>

-  </screenshot>

-      

-      <para>Our annotator will need only one type &ndash;

-        <literal>org.apache.uima.tutorial.RoomNumber</literal>. (We use the same

-        namespace conventions as are used for Java classes.) Just as in Java, types have

-        supertypes. The supertype is listed in the second column of the left table. In this

-        case our RoomNumber annotation extends from the built-in type

-        <literal>uima.tcas.Annotation</literal>.</para>

-      

-      <para>Descriptions can be included with types and features. In this example, there is a

-        description associated with the <literal>building</literal> feature. To see it,

-        hover the mouse over the feature.</para>

-      

-      <para>The bottom tab labeled <quote>Source</quote> will show you the XML source file

-        associated with this descriptor.</para>

-      

-      <para>The built-in Annotation type declares three fields (called

-        <emphasis>Features</emphasis> in CAS terminology).  The features <literal>begin</literal>

-        and <literal>end</literal> store the character offsets of the span of text to which the 

-        annotation refers.  The feature <literal>sofa</literal> (Subject of Analysis) indicates

-        which document the begin and end offsets point into.  The <literal>sofa</literal> feature

-        can be ignored for now since we assume in this tutorial that the CAS contains only one

-        subject of analysis (document).</para>

-      <para>Our RoomNumber type will inherit these three features from

-        <literal>uima.tcas.Annotation</literal>, its supertype; they are not visible in

-        this view because inherited features are not shown. One additional feature,

-        <literal>building</literal>, is declared. It takes a String as its value. Instead

-        of String, we could have declared the range-type of our feature to be any other CAS type

-        (defined or built-in).</para>

-      

-      <para>If you are not using Eclipse and need to edit the type system, you can do so directly in any XML

-        or text editor. The following is the actual XML representation of the Type

-        System displayed above in the editor:</para>

-      

-      

-      <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>

-  <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">

-    <name>TutorialTypeSystem</name>

-    <description>Type System Definition for the tutorial examples - 

-        as of Exercise 1</description>

-    <vendor>Apache Software Foundation</vendor>

-    <version>1.0</version>

-    <types>

-      <typeDescription>

-        <name>org.apache.uima.tutorial.RoomNumber</name>

-        <description></description>

-        <supertypeName>uima.tcas.Annotation</supertypeName>

-        <features>

-          <featureDescription>

-            <name>building</name>

-            <description>Building containing this room</description>

-            <rangeTypeName>uima.cas.String</rangeTypeName>

-          </featureDescription>

-        </features>

-      </typeDescription>

-    </types>

-  </typeSystemDescription>]]></programlisting>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.generating_jcas_sources">

-      <title>Generating Java Source Files for CAS Types</title>

-      

-      <para>When you save a descriptor that you have modified, the Component Descriptor

-        Editor will automatically generate Java classes corresponding to the types that are

-        defined in that descriptor (unless this has been disabled), using a utility called

-        JCasGen. These Java classes will have the same name (including package) as the CAS

-        types, and will have get and set methods for each of the features that you have

-        defined.</para>

-      

-      <para>This feature is enabled/disabled using the UIMA menu pulldown (or the Eclipse

-        Preferences &rarr; UIMA). If automatic running of JCasGen is not happening, please

-        make sure the option is checked:</para>

-      

-      

-      <screenshot>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of enabling automatic running of JCasGen</phrase></textobject>

-      </mediaobject>

-  </screenshot>

-      

-      <para>The Java class for the example org.apache.uima.tutorial.RoomNumber type can

-        be found in <literal>src/org/apache/uima/tutorial/RoomNumber.java</literal>.

-        You will see how to use these generated classes in the next section.</para>

-      

-      <para>If you are not using the Component Descriptor Editor, you will need to generate

-        these Java classes by using the <emphasis>JCasGen</emphasis> tool. JCasGen reads a

-        Type System Descriptor XML file and generates the corresponding Java classes that

-        you can then use in your annotator code. To launch JCasGen, run the jcasgen shell

-        script located in the <literal>/bin</literal> directory of the UIMA SDK

-        installation. This should launch a GUI that looks something like this:</para>

-      

-      

-      <screenshot>

-        <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>

-        </imageobject>

-        <textobject><phrase>Screenshot of JCasGen</phrase></textobject>

-      </mediaobject>

-</screenshot>

-      

-      <para>Use the <quote>Browse</quote> buttons to select your input file

-        (TutorialTypeSystem.xml) and output directory (the root of the source tree into

-        which you want the generated files placed). Then click the <quote>Go</quote>

-        button. If the Type System Descriptor has no errors, new Java source files will be

-        generated under the specified output directory.</para>

-      

-      <para>There are some additional options to choose from when running JCasGen; please

-        refer to the <olink targetdoc="&uima_docs_tools;"/> <olink targetdoc="&uima_docs_tools;"

-          targetptr="ugr.tools.jcasgen"/> for details.</para>

-    </section>

-    

-    <section id="ugr.tug.aae.developing_annotator_code">

-      <title>Developing Your Annotator Code</title>

-      

-      <para>Annotator implementations all implement a standard interface (AnalysisComponent), having several

-        methods, the most important of which are:

-        

-        <itemizedlist spacing="compact">

-          <listitem>

-            <para><literal>initialize</literal>, </para>

-          </listitem>

-          

-          <listitem>

-            <para><literal>process</literal>, and </para>

-          </listitem>

-          

-          <listitem>

-            <para><literal>destroy</literal>. </para>

-          </listitem>

-        </itemizedlist></para>

-      

-      <para><literal>initialize</literal> is called by the framework once when it first creates an instance of the

-        annotator class. <literal>process</literal> is called once per item being processed.

-        <literal>destroy</literal> may be called by the application when it is done using your annotator. There is a 

-        default implementation of this interface for annotators using the JCas, called JCasAnnotator_ImplBase, which 

-        has implementations of all required methods except for the process method.</para>

-      

-      <para>Our annotator class extends the JCasAnnotator_ImplBase; most annotators that use the JCas will extend

-        from this class, so they only have to implement the process method. This class is not restricted to handling

-        just text; see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>.</para>

-      

-      <para>Annotators are not required to extend from the JCasAnnotator_ImplBase class; they may instead

-        directly implement the AnalysisComponent interface, and provide all method implementations themselves.

-        <footnote>

-        <para>Note that AnalysisComponent is not specific to JCas. There is a method getRequiredCasInterface()

-          which the user would have to implement to return <literal>JCas.class</literal>. Then in the

-          <literal>process(AbstractCas cas)</literal> method, they would need to typecast

-          <literal>cas</literal> to type <literal>JCas</literal>.</para></footnote> This allows you to have

-        your annotator inherit from some other superclass if necessary. If you would like to do this, see the Javadocs

-        for JCasAnnotator for descriptions of the methods you must implement.</para>

-      

-      <para>Annotator classes need to be public, cannot be declared abstract, and must have public, 0-argument 

-        constructors, so that they can be instantiated by the framework. <footnote>

-        <para> Although Java classes in which you do not define any constructor will, by default, have a 0-argument

-          constructor that doesn&apos;t do anything, a class in which you have defined at least one constructor does

-          not get a default 0-argument constructor.</para> </footnote></para>
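
-      <para>The constructor rule above can be checked with plain Java reflection. The following is a

-        minimal sketch, not the framework&apos;s actual instantiation code: the class names

-        ConstructorRuleDemo, ImplicitCtorAnnotator and ArgOnlyAnnotator are hypothetical, and

-        reflection is only a stand-in for whatever mechanism the framework uses.</para>

```java
import java.lang.reflect.Constructor;

// Hypothetical demo class; not part of UIMA.
class ConstructorRuleDemo {

  /** Declares no constructor, so the compiler supplies a public 0-argument one. */
  public static class ImplicitCtorAnnotator { }

  /** Declares only a 1-argument constructor, so no default constructor is generated. */
  public static class ArgOnlyAnnotator {
    public ArgOnlyAnnotator(String pattern) { }
  }

  /** Returns true if the class can be created through a public 0-argument constructor. */
  static boolean canInstantiate(Class<?> clazz) {
    try {
      Constructor<?> ctor = clazz.getConstructor();
      ctor.newInstance();
      return true;
    } catch (ReflectiveOperationException e) {
      // Covers a missing constructor as well as failures during instantiation.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(canInstantiate(ImplicitCtorAnnotator.class)); // true
    System.out.println(canInstantiate(ArgOnlyAnnotator.class));      // false
  }
}
```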

-      

-      <para>The class definition for our RoomNumberAnnotator implements the process method, and is shown here. You

-        can find the source for this in the

-        <literal>uimaj-examples/src/org/apache/uima/tutorial/ex1/RoomNumberAnnotator.java</literal>.

-        <note>

-        <para>In Eclipse, in the <quote>Package Explorer</quote> view, this will appear by default in the project

-          <literal>uimaj-examples</literal>, in the folder <literal>src</literal>, in the package

-          <literal>org.apache.uima.tutorial.ex1</literal>.</para></note> In Eclipse, open the

-        RoomNumberAnnotator.java in the uimaj-examples project, under the src directory.</para>

-      

-      

-      <programlisting>package org.apache.uima.tutorial.ex1;

-

-import java.util.regex.Matcher;

-import java.util.regex.Pattern;

-

-import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;

-import org.apache.uima.jcas.JCas;

-import org.apache.uima.tutorial.RoomNumber;

-

-/**

- * Example annotator that detects room numbers using 

- * Java 1.4 regular expressions.

- */

-public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {

-  private Pattern mYorktownPattern = 

-        Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");

-

-  private Pattern mHawthornePattern = 

-        Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");

-

-  public void process(JCas aJCas) {

-    // Discussed Later

-  }

-}</programlisting>

-      

-      <para>The two Java class fields, mYorktownPattern and mHawthornePattern, hold regular expressions that

-        will be used in the process method. Note that these two fields are part of the Java implementation of the

-        annotator code, and not a part of the CAS type system. We are using the regular expression facility that is

-        built into Java 1.4. It is not critical that you know the details of how this works, but if you are curious the

-        details can be found in the Java API docs for the java.util.regex package.</para>
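
-      <para>To get a feel for what the two expressions accept, they can be tried outside of UIMA.

-        The sketch below uses only java.util.regex; the class RoomPatternDemo, the helper findAll

-        and the sample sentence are assumptions made for illustration and are not part of the

-        tutorial code.</para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regular expressions come from the annotator.
class RoomPatternDemo {
  static final Pattern YORKTOWN = Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
  static final Pattern HAWTHORNE = Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");

  /** Collects the text of every match of the pattern, in document order. */
  static List<String> findAll(Pattern pattern, String text) {
    List<String> hits = new ArrayList<>();
    Matcher m = pattern.matcher(text);
    while (m.find()) {
      hits.add(m.group());
    }
    return hits;
  }

  public static void main(String[] args) {
    // Made-up sample sentence.
    String text = "Meet in 20-001, then move to GN-K35.";
    System.out.println(findAll(YORKTOWN, text));  // [20-001]
    System.out.println(findAll(HAWTHORNE, text)); // [GN-K35]
  }
}
```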

-      

-      <para>The only method that we are required to implement is <literal>process</literal>. This method is typically 

-        called once for each document that is being analyzed. This method takes one argument, which is a JCas instance; 

-        this holds the document to be analyzed and all of the analysis results. <footnote>

-        <para>Version 1 of UIMA specified an additional parameter, the ResultSpecification. This provides a

-          specification of which types and features are desired to be computed and "output" from this annotator. Its

-          use is optional; many annotators ignore it.</para>

-        <para> This parameter has been replaced by specific set/getResultSpecification() methods, which allow

-          the annotator to receive a signal (a method call) when the result specification changes.</para>

-        </footnote></para>

-      

-      

-      <programlisting>public void process(JCas aJCas) {

-  // get document text

-  String docText = aJCas.getDocumentText();

-  // search for Yorktown room numbers

-  Matcher m = mYorktownPattern.matcher(docText);

-  int pos = 0;

-  while (m.find(pos)) {

-    // found one - create annotation, with the begin/end positions

-    RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());

-    annotation.setBuilding("Yorktown");

-    annotation.addToIndexes();

-    pos = m.end();

-  }

-  

-  // search for Hawthorne room numbers

-  m = mHawthornePattern.matcher(docText);

-  pos = 0;

-  while (m.find(pos)) {

-    // found one - create annotation, with the begin/end positions

-    RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());

-    annotation.setBuilding("Hawthorne");

-    annotation.addToIndexes();

-    pos = m.end();

-  }

-}</programlisting>

-      

-      <para>The Matcher class is part of the java.util.regex package and is used to find the room numbers in the

-        document text. When we find one, recording the annotation is as simple as creating a new Java object and

-        calling some set methods:</para>

-      

-      

-      <programlisting>RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());

-annotation.setBuilding("Yorktown");</programlisting>

-      

-      <para>The <literal>RoomNumber</literal> class was generated from the type system description by the

-        Component Descriptor Editor or the JCasGen tool, as discussed in the previous section.</para>

-      

-      <para>Finally, we call <literal>annotation.addToIndexes()</literal> to add the new annotation to the

-        indexes maintained in the CAS. By default, the CAS implementation used for analysis of text documents keeps

-        an index of all annotations in their order from beginning to end of the document. Subsequent annotators or

-        applications use the indexes to iterate over the annotations. </para>

-      

-      <note>

-      <para> If you don&apos;t add the instance to the indexes, it cannot be retrieved by down-stream annotators,

-        using the indexes. </para></note>

-      

-      <note>

-      <para>You can also call <literal>addToIndexes()</literal> on Feature Structures that are not subtypes of

-        <literal>uima.tcas.Annotation</literal>, but these will not be sorted in any particular way. If you want

-        to specify a sort order, you can define your own custom indexes in the CAS: see 

-        <olink targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/> and <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.index"/> for details.</para></note>

-      

-      <para>We&apos;re almost ready to test the RoomNumberAnnotator. There is just one more step

-        remaining.</para>

-    </section>

-    <section id="ugr.tug.aae.creating_xml_descriptor">

-      <title>Creating the XML Descriptor</title>

-      

-      <para>The UIMA architecture requires that descriptive information about an

-        annotator be represented in an XML file and provided along with the annotator class

-        file(s) to the UIMA framework at run time. This XML file is called an

-        <emphasis>Analysis Engine Descriptor</emphasis>. The descriptor includes:

-        

-        <itemizedlist><listitem><para>Name, description, version, and vendor</para>

-          </listitem>

-          

-          <listitem><para>The annotator&apos;s inputs and outputs, defined in terms of

-            the types in a Type System Descriptor</para></listitem>

-          

-          <listitem><para>Declaration of the configuration parameters that the

-            annotator accepts </para></listitem></itemizedlist> </para>

-      

-      <para>The <emphasis>Component Descriptor Editor</emphasis> plugin, which we

-        previously used to edit the Type System descriptor, can also be used to edit Analysis

-        Engine Descriptors.</para>

-      

-      <para>A descriptor for our RoomNumberAnnotator is provided with the UIMA

-        distribution under the name

-        <literal>descriptors/tutorial/ex1/RoomNumberAnnotator.xml</literal>. To

-        edit it in Eclipse, right-click on that file in the navigator and select Open With

-        &rarr; Component Descriptor Editor.</para> <tip><para>In Eclipse, you can double

-      click on the tab at the top of the Component Descriptor Editor&apos;s window

-      identifying the currently selected editor, and the window will

-      <quote>Maximize</quote>. Double click it again to restore the original size.</para>

-      </tip>

-      

-      <para>If you are not using Eclipse, you will need to edit Analysis Engine descriptors

-        manually. See <xref linkend="ugr.tug.aae.xml_intro_ae_descriptor"/> for an

-        introduction to the Analysis Engine descriptor XML syntax. The remainder of this

-        section assumes you are using the Component Descriptor Editor plug-in to edit the

-        Analysis Engine descriptor.</para>

-      

-      <para>The Component Descriptor Editor consists of several tabbed pages; we will only

-        need to use a few of them here. For more information on using this editor, see <olink

-          targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</para>

-      

-      <para>The initial page of the Component Descriptor Editor is the Overview page, which

-        appears as follows:</para>

-      

-      

-      <screenshot>

-  <mediaobject>

-    <imageobject>

-      <imagedata width="5.7in" format="JPG" fileref="&imgroot;image008.jpg"/>

-    </imageobject>

-    <textobject><phrase>Screenshot of Component Descriptor Editor overview page</phrase>      

-    </textobject>

-  </mediaobject>

-</screenshot>

-      

-      <para>This presents an overview of the RoomNumberAnnotator Analysis Engine (AE). The

-        left side of the page shows that this descriptor is for a

-        <emphasis>Primitive</emphasis> AE (meaning it consists of a single annotator),

-        and that the annotator code is developed in Java. Also, it specifies the Java class

-        that implements our logic (the code which was discussed in the previous section).

-        Finally, on the right side of the page are listed some descriptive attributes of our

-        annotator.</para>

-      

-      <para>The other two pages that need to be filled out are the Type System page and the

-        Capabilities page. You can switch to these pages using the tabs at the bottom of the

-        Component Descriptor Editor. In the tutorial, these are already filled out for

-        you.</para>

-      

-      <para>The RoomNumberAnnotator will be using the TutorialTypeSystem we looked at in

-        Section <xref linkend="ugr.tug.aae.defining_types"/>. To specify this, we add

-        this type system to the Analysis Engine&apos;s list of Imported Type Systems, using

-        the Type System page&apos;s right side panel, as shown here:</para>

-      

-      

-      <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.7in" format="JPG" fileref="&imgroot;image010.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of CDE Type System page</phrase></textobject>

-   </mediaobject>

- </screenshot>

-      

-      <para>On the Capabilities page, we define our annotator&apos;s inputs and outputs, in

-        terms of the types in the type system. The Capabilities page is shown below:</para>

-      

-      

-      <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.3in" format="JPG" fileref="&imgroot;image012.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of CDE Capabilities page</phrase></textobject>

-   </mediaobject>

- </screenshot>

-      

-      <para>Although capabilities come in sets, having multiple sets is deprecated; here

-        we&apos;re just using one set. The RoomNumberAnnotator is very simple. It requires

-        no input types, as it operates directly on the document text -- which is supplied as a

-        part of the CAS initialization (and which is always assumed to be present). It

-        produces only one output type (RoomNumber), and it sets the value of the

-        <literal>building</literal> feature on that type. This is all represented on the

-        Capabilities page.</para>

-      

-      <para>The Capabilities page has two other parts for specifying languages and Sofas.

-        The languages section allows you to specify which languages your Analysis Engine

-        supports. The RoomNumberAnnotator happens to be language-independent, so we can

-        leave this blank. The Sofas section allows you to specify the names of additional

-        subjects of analysis. This capability and the Sofa Mappings at the bottom are

-        advanced topics, described in <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aas"/>. </para>

-      

-      <para>This is all of the information we need to provide for a simple annotator. If you

-        want to peek at the XML that this tool saves you from having to write, click on the

-        <quote>Source</quote> tab at the bottom to view the generated XML.</para>

-    </section>

-    

-    <section id="ugr.tug.aae.testing_your_annotator">

-      <title>Testing Your Annotator</title>

-      

-      <para>Having developed an annotator, we need a way to try it out on some example

-        documents. The UIMA SDK includes a tool called the Document Analyzer that will allow

-        us to do this. To run the Document Analyzer, execute the documentAnalyzer shell

-        script that is in the <literal>bin</literal> directory of your UIMA SDK

-        installation, or, if you are using the example Eclipse project, execute the

-        <quote>UIMA Document Analyzer</quote> run configuration supplied with that

-        project. (To do this, click on the menu bar Run &rarr; Run ..., and under Java

-        Applications in the left box, click on UIMA Document Analyzer.)</para>

-      

-      <para>You should see a screen that looks like this:</para>

-      

-      

-      <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.7in" format="JPG" fileref="&imgroot;image014.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of UIMA Document Analyzer GUI</phrase></textobject>

-   </mediaobject>       

-      </screenshot>

-      

-      <para>There are six options on this screen:</para>

-      

-      <orderedlist><listitem><para>Directory containing documents to analyze</para>

-        </listitem>

-        

-        <listitem><para>Directory where analysis results will be written</para>

-        </listitem>

-        

-        <listitem><para>The XML descriptor for the Analysis Engine (AE) you want to

-          run</para></listitem>

-        

-        <listitem><para>(Optional) an XML tag, within the input documents, that contains

-          the text to be analyzed. For example, the value TEXT would cause the AE to only

-          analyze the portion of the document enclosed within

-          &lt;TEXT&gt;...&lt;/TEXT&gt; tags.</para></listitem>

-        

-        <listitem><para>Language of the document </para></listitem>

-        

-        <listitem><para>Character encoding </para></listitem></orderedlist>

-      

-      <para>Use the Browse button next to the third item to set the <quote>Location of AE XML

-        Descriptor</quote> field to the descriptor we&apos;ve just been discussing

-        &mdash;

-        <literal>&lt;where-you-installed-uima-e.g.UIMA_HOME&gt; 

-          /examples/descriptors/tutorial/ex1/RoomNumberAnnotator.xml</literal>.

-        Set the other fields to the values shown in the screen shot above (which should be the

-        default values if this is the first time you&apos;ve run the Document Analyzer). Then

-        click the <quote>Run</quote> button to start processing.</para>

-      

-      <para>When processing completes, an <quote>Analysis Results</quote> window should

-        appear.</para>

-      

-      

-      <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="3.5in" format="JPG" fileref="&imgroot;image016.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of UIMA Document Analyzer Results GUI</phrase></textobject>

-   </mediaobject>       

-      </screenshot>

-      

-      <para>Make sure <quote>Java Viewer</quote> is selected as the Results Display

-        Format, and <emphasis role="bold">double-click</emphasis> on the document

-        UIMASummerSchool2003.txt to view the annotations that were discovered. The view

-        should look something like this:</para>

-      

-      

-      <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.7in" format="JPG" fileref="&imgroot;image018.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of UIMA CAS Annotation Viewer GUI</phrase></textobject>

-   </mediaobject>       

-      </screenshot>

-      

-      <para>You can click the mouse on one of the highlighted annotations to see a list of all

-        its features in the frame on the right.</para> <note><para>The legend will only show

-      those types which have at least one instance in the CAS, and are declared as outputs in the

-      capabilities section of the descriptor (see <xref

-        linkend="ugr.tug.aae.creating_xml_descriptor"/>). </para></note>

-      

-      <para>You can use the DocumentAnalyzer to test any UIMA annotator

-        &mdash; just make sure that the annotator&apos;s classes are in the class

-        path.</para>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aae.configuration_logging">

-    <title>Configuration and Logging</title>

-    

-    <section id="ugr.tug.aae.configuration_parameters">

-      <title>Configuration Parameters</title>

-      

-      <para>The example RoomNumberAnnotator from the previous section used hardcoded

-        regular expressions and location names, which is not very flexible. Supporting a

-        new room-numbering pattern would mean editing and recompiling the annotator&apos;s

-        Java code. A better solution is to supply the patterns and location names through

-        configuration parameters.</para>

-      

-      <para>UIMA allows annotators to declare configuration parameters in their

-        descriptors. The descriptor also specifies default values for the parameters,

-        though these can be overridden at runtime.</para>

-      

-      <section id="ugr.tug.aae.declaring_parameters_in_the_descriptor">

-        <title>Declaring Parameters in the Descriptor</title>

-        

-        <para>The example descriptor

-          <literal>descriptors/tutorial/ex2/RoomNumberAnnotator.xml</literal> is

-          the same as the descriptor from the previous section except that information has

-          been filled in for the Parameters and Parameter Settings pages of the Component

-          Descriptor Editor.</para>

-        

-        <para>First, in Eclipse, open example two&apos;s RoomNumberAnnotator in the

-          Component Descriptor Editor, and then go to the Parameters page (click on the

-          parameters tab at the bottom of the window), which is shown below:</para>

-        

-        

-        <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.7in" format="JPG" fileref="&imgroot;image020.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of UIMA Component Descriptor Editor (CDE) Parameters page</phrase></textobject>

-   </mediaobject>       

-      </screenshot>

-        

-        <para>Two parameters &ndash; Patterns and Locations &ndash; have been declared. In this

-          screen shot, the mouse (not shown) is hovering over Patterns to show its

-          description in the small popup window. Every parameter has the following

-          information associated with it:</para>

-        

-        <itemizedlist><listitem><para>name &ndash; the name by which the annotator code

-          refers to the parameter</para></listitem>

-          

-          <listitem><para>description &ndash; a natural language description of the

-            intent of the parameter</para></listitem>

-          

-          <listitem><para>type &ndash; the data type of the parameter&apos;s value

-            &ndash; must be one of String, Integer, Float, or Boolean.</para></listitem>

-          

-          <listitem><para>multiValued &ndash; true if the parameter can take

-            multiple values (an array), false if the parameter takes only a single value.

-            Shown above as <literal>Multi</literal>.</para></listitem>

-          

-          <listitem><para>mandatory &ndash; true if a value must be provided for the

-            parameter. Shown above as <literal>Req</literal> (for required). </para>

-          </listitem></itemizedlist>

-        

-        <para>Both of our parameters are mandatory and accept an array of Strings as their

-          value.</para>

-        

-        <para>Next, default values are assigned to the parameters on the Parameter Settings

-          page:</para>

-        

-        

-        <screenshot>

-   <mediaobject>

-     <imageobject>

-       <imagedata width="5.7in" format="JPG" fileref="&imgroot;image022.jpg"/>

-     </imageobject>

-     <textobject><phrase>Screenshot of UIMA Component Descriptor Editor (CDE) Parameter Settings page</phrase></textobject>

-   </mediaobject>       

-      </screenshot>

-        

-        <para>Here the <quote>Patterns</quote> parameter is selected, and the right pane

-          shows the list of values for this parameter, in this case the regular expressions

-          that match particular room numbering conventions. Notice the third pattern is

-          new, for matching the style of room numbers in the third building, which has room

-          numbers such as <literal>J2-A11</literal>.</para>

-      </section>

-      <section id="ugr.tug.aae.accessing_parameter_values_from_annotator">

-        <title>Accessing Parameter Values from the Annotator Code</title>

-        

-        <para>The class

-          <literal>org.apache.uima.tutorial.ex2.RoomNumberAnnotator</literal> has

-          overridden the initialize method. The initialize method is called by the UIMA

-          framework when the annotator is instantiated, so it is a good place to read

-          configuration parameter values. The default initialize method does nothing with

-          configuration parameters, so you have to override it. To see the code in Eclipse,

-          switch to the src folder, and open

-          <literal>org.apache.uima.tutorial.ex2</literal>. Here is the method

-          body:</para>

-        

-        

-        <programlisting>/**

-* @see AnalysisComponent#initialize(UimaContext)

-*/

-public void initialize(UimaContext aContext) 

-        throws ResourceInitializationException {

-  super.initialize(aContext);

-  

-  // Get config. parameter values  

-  String[] patternStrings = 

-        (String[]) aContext.getConfigParameterValue("Patterns");

-  mLocations = 

-        (String[]) aContext.getConfigParameterValue("Locations");

-

-  // compile regular expressions

-  mPatterns = new Pattern[patternStrings.length];

-  for (int i = 0; i &lt; patternStrings.length; i++) {

-    mPatterns[i] = Pattern.compile(patternStrings[i]);

-  }

-}</programlisting>

-        

-        <para>Configuration parameter values are accessed through the UimaContext. As you

-          will see in subsequent sections of this chapter, the UimaContext is the

-          annotator&apos;s access point for all of the facilities provided by the UIMA

-          framework &ndash; for example logging and external resource access.</para>

-        

-        <para>The UimaContext&apos;s <literal>getConfigParameterValue</literal>

-          method takes the name of the parameter as an argument; this must match one of the

-          parameters declared in the descriptor. The return value of this method is a Java

-          Object, whose type corresponds to the declared type of the parameter. It is up to the

-          annotator to cast it to the appropriate type, String[] in this case.</para>

-        

-        <para>If there is a problem retrieving the parameter values, the framework throws an

-          exception. Generally annotators don&apos;t handle these, and just let them

-          propagate up.</para>
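
-        <para>The cast-then-compile step can be tried without the framework. The sketch below is an

-          illustration under stated assumptions: a java.util.Map stands in for the UimaContext, and

-          the class ParameterCastDemo and its static getConfigParameterValue helper are hypothetical.

-          Only the parameter name Patterns and the two regular expressions come from the tutorial.</para>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo class; a Map replaces the UimaContext so the code runs stand-alone.
class ParameterCastDemo {

  /** Stand-in lookup that, like the method described above, returns a plain Object. */
  static Object getConfigParameterValue(Map<String, Object> settings, String name) {
    return settings.get(name);
  }

  /** Casts the looked-up value to String[] and compiles each entry. */
  static Pattern[] compilePatterns(Map<String, Object> settings) {
    String[] patternStrings = (String[]) getConfigParameterValue(settings, "Patterns");
    Pattern[] patterns = new Pattern[patternStrings.length];
    for (int i = 0; i < patternStrings.length; i++) {
      patterns[i] = Pattern.compile(patternStrings[i]);
    }
    return patterns;
  }

  public static void main(String[] args) {
    Map<String, Object> settings = new HashMap<>();
    settings.put("Patterns", new String[] {
        "\\b[0-4]\\d-[0-2]\\d\\d\\b",
        "\\b[G1-4][NS]-[A-Z]\\d\\d\\b" });
    Pattern[] patterns = compilePatterns(settings);
    System.out.println(patterns.length);                         // 2
    System.out.println(patterns[0].matcher("20-001").find());    // true
  }
}
```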

-        

-        <para>To see the configuration parameters working, run the Document Analyzer

-          application and select the descriptor

-          <literal>examples/descriptors/tutorial/ex2/RoomNumberAnnotator.xml</literal>.

-          In the example document <literal>WatsonConferenceRooms.txt</literal>, you

-          should see some examples of Hawthorne II room numbers that would not have been

-          detected by the ex1 version of RoomNumberAnnotator.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.supporting_reconfiguration">

-        <title>Supporting Reconfiguration</title>

-        

-        <para>If you take a look at the Javadocs (located in the <ulink

-            url="api/index.html">docs/api</ulink> directory) for

-          <literal>org.apache.uima.analysis_component.AnalysisComponent</literal>

-          (which our annotator implements indirectly through JCasAnnotator_ImplBase),

-          you will see that there is a reconfigure() method, which is called by the containing

-          application through the UIMA framework, if the configuration parameter values

-          are changed.</para>

-        

-        <para>The AnalysisComponent_ImplBase class provides a default implementation

-          that just calls the annotator&apos;s destroy method followed by its initialize

-          method. This works fine for our annotator. The only situation in which you might

-          want to override the default reconfigure() is if your annotator has very expensive

-          initialization logic, and you don&apos;t want to reinitialize everything if just

-          one configuration parameter has changed. In that case, you can provide a more

-          intelligent implementation of reconfigure() for your annotator.</para>
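
-        <para>The difference between the default behavior and a custom override can be sketched as

-          follows. This is not UIMA code: SketchComponent is a hypothetical base class that mirrors

-          only the destroy-then-initialize behavior described above, and the call log exists purely

-          so the order of calls can be observed.</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these types are part of UIMA.
class ReconfigureDemo {

  /** Minimal base class whose default reconfigure() calls destroy, then initialize. */
  static abstract class SketchComponent {
    final List<String> calls = new ArrayList<>();

    void initialize() { calls.add("initialize"); }

    void destroy() { calls.add("destroy"); }

    void reconfigure() {
      destroy();
      initialize();
    }
  }

  /** Cheap to initialize, so the inherited default is good enough. */
  static class SimpleAnnotator extends SketchComponent { }

  /** Expensive to initialize, so it refreshes only what changed. */
  static class ExpensiveAnnotator extends SketchComponent {
    @Override
    void reconfigure() {
      calls.add("reloadChangedParameter");
    }
  }

  public static void main(String[] args) {
    SketchComponent simple = new SimpleAnnotator();
    simple.reconfigure();
    System.out.println(simple.calls); // [destroy, initialize]

    SketchComponent expensive = new ExpensiveAnnotator();
    expensive.reconfigure();
    System.out.println(expensive.calls); // [reloadChangedParameter]
  }
}
```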

-        

-      </section>

-      

-      <section id="ugr.tug.aae.configuration_parameter_groups">

-        <title>Configuration Parameter Groups</title>

-        

-        <para>For annotators with many sets of configuration parameters, UIMA supports

-          organizing them into groups. It is possible to define a parameter with the same name

-          in multiple groups; one common use for this is for annotators that can process

-          documents in several languages and which want to have different parameter

-          settings for the different languages.</para>

-        

-        <para>The syntax for defining parameter groups in your descriptor is fairly

-          straightforward &ndash; see <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;"

-            targetptr="ugr.ref.xml.component_descriptor"/> for details. Values of

-          parameters defined within groups are accessed through the two-argument version

-          of <literal>UimaContext.getConfigParameterValue</literal>, which takes

-          both the group name and the parameter name as its arguments.</para>

-      </section>

-

-      <section id="ugr.tug.aae.configuration_parameter_overrides">

-        <title>Overriding Configuration Parameter Settings</title>

-

-        <para>There are two ways that the value assigned to a configuration parameter can be

-        overridden. An aggregate may declare a parameter that overrides one or more of the

-        parameters in one or more of its delegates.  The aggregate must also define a value for the

-        parameter, unless the parameter is itself overridden by a setting in the parent

-        aggregate.</para>

-

-        <para>An alternative method that avoids these strict hierarchical override constraints is to

-        associate an external global name with a parameter and to assign values to these external

-        names in an external properties file.  With this approach a particular parameter setting can

-        be easily shared by multiple descriptors, even across different applications.  For applications

-        with many levels of descriptor nesting it avoids the need to edit aggregate override

-        definitions when the location of an annotator in the hierarchy is changed.

-

-        For details see

-          <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;" 

-          targetptr="ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides"/>

-        </para> 

-      </section>

-    </section>

-    

-    <section id="ugr.tug.aae.logging">

-      <title>Logging</title>

-      

-      <para>The UIMA SDK provides a logging facility, which is very similar to the

-        java.util.logging.Logger class that was introduced in Java 1.4.

-        In addition, it includes the SLF4j framework <ulink url="https://www.slf4j.org/"/>

-        and all the methods in that framework's <code>Logger</code> API, plus

-        the Java 8 specific API extensions that take <code>Supplier</code> parameters.</para>

-      

-      <para>Each logger instance is associated with a name. By

-        convention, this name is usually a hierarchy of simple names connected with periods, 

-        often the fully qualified class name of the component

-        issuing the logging call. The name (or any of its parents - starting prefixes up to a period) 

-        can be referenced in a configuration file which can then configure for each logger

-        various things such as the logging level and where messages should go.</para>

-      

-      <para>The UIMA framework supports this convention using the

-        <literal>UimaContext</literal> object. If you access a logger instance using

-        <literal>getContext().getLogger()</literal> or the shorter, but equivalent

-        <literal>getLogger()</literal>

-        within an Annotator, the logger

-        name will be the fully qualified name of the Annotator implementation class.</para>

-              

-      <para>Here is an example from the process method of

-        <literal>org.apache.uima.tutorial.ex2.RoomNumberAnnotator</literal>:

-        

-        <programlisting>getLogger().trace("Found: {}", () -> annotation.toString());</programlisting>

-      </para>

-

-      <para>The <code>trace</code> call 

-        indicates that this is a tracing message. This is useful for tracing program flow, but it is a low level which

-        is not usually enabled. 

-      </para>

-        

-      <para>

-        The first parameter is the message, with substitutable parts.  The convention for where those parts go is

-        written as either {} or {n}, where "n" is an integer, specifying the argument number.  

-        The modern logging APIs use the {} style, with API calls such as 

-        <code>logger.**level**( msg-using-{}-convention, substitutable-arguments)</code>, while the older

-        java.util.logging framework uses <code>logger.log(**level**, msg-using-{n}-convention, substitutable-arguments)</code>.

-      </para>

-        

-      <para>

-        UIMA supports both styles.  

-        For new code, the first style is recommended, together with a Java 8 lambda for the arguments, which 

-        ensures that the work of turning the <code>annotation</code>

-        argument into a printable string happens only if tracing is enabled.

-      </para>
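The deferred evaluation described above can be seen with plain java.util.logging, which also accepts a Supplier. This is a minimal sketch rather than UIMA code: the class name, logger names, and message text are made up for illustration, and it uses the JDK logger directly instead of the UIMA Logger.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingSketch {

  /**
   * Logs one FINER message through a Supplier and reports how often the
   * message text was actually built for a logger set to the given level.
   */
  static int countMessageBuilds(Level loggerLevel) {
    // Hypothetical logger name; one logger per level keeps the runs independent.
    Logger logger = Logger.getLogger("sketch.lazy." + loggerLevel.getName());
    logger.setUseParentHandlers(false); // no handlers attached, so nothing is printed
    logger.setLevel(loggerLevel);

    AtomicInteger builds = new AtomicInteger();
    logger.log(Level.FINER, () -> {
      builds.incrementAndGet(); // stands in for an expensive toString()
      return "Found: some annotation";
    });
    return builds.get();
  }

  public static void main(String[] args) {
    System.out.println("builds at INFO: " + countMessageBuilds(Level.INFO));
    System.out.println("builds at ALL:  " + countMessageBuilds(Level.ALL));
  }
}
```

The supplier body runs only when the logger would accept a FINER record, which is the property the lambda form of the UIMA call relies on.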

-         

-      <para>Log statements are "filtered" according to the logging configuration, by Level, and sometimes by

-        additional indicators, such as Markers.  Levels work in a hierarchy.  A given level of 

-        filtering passes that level and all higher levels.  Some levels have two names, due to the 

-        way the different logger back ends name things.  Most levels are also used as method names on 

-        the logger, to indicate logging for that level.  For example, you could say <code>aLogger.log(Level.INFO, message)</code>

-        but you can also say <code>aLogger.info(message)</code>. The level ordering, highest to lowest, 

-        and the associated method names are as follows:

-        <itemizedlist spacing="compact">

-          <listitem><para>SEVERE or ERROR; error(...)</para></listitem>

-          <listitem><para>WARN or WARNING; warn(...)</para></listitem>

-          <listitem><para>INFO; info(...)</para></listitem>

-          <listitem><para>CONFIG; info(UIMA_MARKER_CONFIG, ...)</para></listitem>

-          <listitem><para>FINE or DEBUG; debug(...)</para></listitem>

-          <listitem><para>FINER or TRACE; trace(...)</para></listitem>

-          <listitem><para>FINEST; trace(UIMA_MARKER_FINEST, ...)</para></listitem>

-        </itemizedlist>

-        </para>

-        

-        <para>The CONFIG and FINEST levels are merged with other levels, but are distinguished by having 

-        <code>Markers</code>.  If the filtering is configured to pass the CONFIG level, then it will also pass the

-        INFO/WARN/ERROR (or their alternative names WARNING/SEVERE) levels.

-        </para>
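The rule that a filtering level passes that level and all higher ones can be checked against the built-in Java logger. This is a small sketch under stated assumptions: it uses only the java.util.logging level names, it does not model the UIMA markers for CONFIG and FINEST, and the class and logger names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelFilterSketch {

  /** Returns true if a logger configured at {@code configured} would accept {@code message}. */
  static boolean passes(Level configured, Level message) {
    Logger logger = Logger.getLogger("sketch.filter"); // hypothetical logger name
    logger.setLevel(configured);
    return logger.isLoggable(message);
  }

  public static void main(String[] args) {
    Level[] all = {Level.SEVERE, Level.WARNING, Level.INFO, Level.CONFIG,
                   Level.FINE, Level.FINER, Level.FINEST};
    // Print which message levels get through when filtering is set to CONFIG.
    for (Level level : all) {
      System.out.println(level + " passes CONFIG filter: " + passes(Level.CONFIG, level));
    }
  }
}
```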

-       

-      

-      <para>Each logging backend has its own documentation for how 

-        to configure loggers at run time, via configuration files or APIs in some cases.

-        Some backends even allow dynamic reconfiguration

-        while running, just by updating the configuration file (it is re-loaded every so often, if changed).

-      </para>

-      

-      <para>For the built-in-to-Java logging back end, if no logging configuration file is provided (see next section), 

-        the Java Virtual Machine defaults are used, which typically log messages at level INFO and

-        higher, and direct output to the console.</para>

-                

-      <para>The UIMA logger is by default implemented using an SLF4J implementation; this (in turn) connects to

-        a logging back end, determined via a search of the classpath for a connector.  If none can be found,

-        then a message to that effect will be printed to System.err, and no logging will be done.

-        The binary distribution for UIMA includes, in its <code>lib</code> directory, the 

-        Jar which connects SLF4j to the Java-built-in logger to use as

-        its back end, so if you use the standard launchers, you will get this logging back end. 

-        </para>

-        

-      <para>Assuming you are using the Java-built-in-logger as the back-end, 

-        if you specify the configuration using the standard UIMA SDK <literal>Logger.properties</literal>

-        (found in <code>UIMA_HOME/config/</code>),

-        the output will be directed to a file named uima.log, in the current working directory

-        (often the <quote>project</quote> directory when running from Eclipse, for

-        instance).</para> 

-        

-        <note><para>When using Eclipse, the uima.log file, if written

-      into the Eclipse workspace in the project uimaj-examples, for example, may not appear

-      in the Eclipse package explorer view until you right-click the uimaj-examples project

-      with the mouse, and select <quote>Refresh</quote>. This operation refreshes the

-      Eclipse display to conform to what may have changed on the file system. Also, you can set

-      the Eclipse preferences for the workspace to automatically refresh (Window &rarr;

-      Preferences &rarr; General &rarr; Workspace, then click the <quote>refresh

-      automatically</quote> checkbox).</para></note>

-      

-      <para>The next several sections mainly describe how to configure the built-in

-        Java logger.  See the documentation for other logging back ends for 

-        details on how to configure those.</para>

-         

-      <section id="ugr.tug.aae.logging.configuring">

-        <title>Specifying the Logging Configuration when using Java's built-in logger</title>

-        

-        <para>The

-          standard Java built-in logging initialization mechanisms will look for a Java System

-          Property named <literal>java.util.logging.config.file</literal> and if

-          found, will use the value of this property as the name of a standard

-          <quote>properties</quote> file, for setting the logging level. Please refer to

-          the Java 1.4 documentation for more information on the format and use of this

-          file.</para>

-        

-        <para>Two sample logging specification property files can be found in the UIMA_HOME

-          directory where the UIMA SDK is installed:

-          <literal>config/Logger.properties</literal>, and

-          <literal>config/FileConsoleLogger.properties</literal>. These specify the same

-          logging, except the first logs just to a file, while the second logs both to a file and

-          to the console. You can edit these files, or create additional ones, as described

-          below, to change the logging behavior.</para>

-        

-        <para>When running your own Java application, you can specify the location of this

-          logging configuration file on your Java command line by setting the Java system

-          property <literal>java.util.logging.config.file</literal> to be the logging

-          configuration filename. This file specification can be either absolute or

-          relative to the working directory. For example:

-          

-          

-          <programlisting><?db-font-size 65% ?>java "-Djava.util.logging.config.file=C:/Program Files/apache-uima/config/Logger.properties"</programlisting>

-          <note><para>In a shell script, you can use environment variables such as

-          UIMA_HOME if convenient.</para></note> </para>

-               

-        <para>If you are using Eclipse to launch your application, you can set this property

-          in the VM arguments section of the Arguments tab of the run configuration screen. If

-          you&apos;ve set an environment variable UIMA_HOME, you could for example, use the

-          string:

-          <literal>"-Djava.util.logging.config.file=${env_var:UIMA_HOME}/config/Logger.properties"</literal>.

-          </para>

-        

-        <para>If you are running the .bat or .sh files in the UIMA SDK's <literal>bin</literal> directory, you can specify the location of your

-           logger configuration file by setting the <literal>UIMA_LOGGER_CONFIG_FILE</literal> environment variable prior to running the script,

-           for example (on Windows): 

-

-           <programlisting><?db-font-size 70% ?>set UIMA_LOGGER_CONFIG_FILE=C:/myapp/MyLogger.properties</programlisting>        

-        </para>        

-      </section>

-      

-      <section id="ugr.tug.aae.logging.setting_logging_levels">

-        <title>Setting Logging Levels when using Java's built-in logger</title>

-        

-        <para>Within the logging control file, the default global logging level specifies

-          which kinds of events are logged across all loggers. For any given facility this

-          global level can be overridden by a facility specific level. Multiple handlers are

-          supported. This allows messages to be directed to a log file, as well as to a

-          <quote>console</quote>. Note that the ConsoleHandler also has a separate level

-          setting to limit messages printed to the console. For example: <literal>.level=

-          INFO</literal> </para>

-        

-        <para>The properties file can change where the log is written, as well.</para>

-        

-        <para>Facility specific properties allow different logging for each class, as

-          well. For example, to set the com.xyz.foo logger to only log SEVERE messages:

-          <literal>com.xyz.foo.level = SEVERE</literal></para>

-        

-        <para>If you have a sample annotator in the package

-          <literal>org.apache.uima.SampleAnnotator</literal> you can set the log level

-          by specifying: <literal>org.apache.uima.SampleAnnotator.level =

-          ALL</literal></para>
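The global and facility-specific settings above can be tried without a file on disk by feeding the same property lines to the JDK LogManager. This is a sketch using only java.util.logging; the class name is hypothetical, the two property lines mirror the examples in this section, and in a real application the configuration would normally come from the file named by java.util.logging.config.file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

final class LoggingConfigSketch {

  /** Loads a two-line configuration and returns {root level, com.xyz.foo level}. */
  static Level[] loadAndInspect() {
    String config = String.join("\n",
        ".level = INFO",                  // default global level
        "com.xyz.foo.level = SEVERE");    // facility-specific override
    try {
      // Note: this replaces the logging configuration of the whole JVM.
      LogManager.getLogManager().readConfiguration(
          new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Logger root = Logger.getLogger("");
    Logger foo = Logger.getLogger("com.xyz.foo");
    return new Level[] {root.getLevel(), foo.getLevel()};
  }

  public static void main(String[] args) {
    Level[] levels = loadAndInspect();
    System.out.println("root level:        " + levels[0]);
    System.out.println("com.xyz.foo level: " + levels[1]);
  }
}
```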

-        

-        <para>There are other logging controls; for a full discussion, please read the

-          contents of the <literal>Logger.properties</literal> file and the Java

-          specification for logging in Java 1.4.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.logging.output_format">

-        <title>Configuring the format of logging output when using Java's built-in logger</title>

-        

-        <para>The logging output is formatted by handlers specified in the properties file

-          for configuring logging, described above. The default formatter that comes with

-          the UIMA SDK formats logging output as follows:</para>

-        

-        <para><literal>Timestamp - threadID: sourceInfo: Message level:

-          message</literal></para>

-        

-        <para> Here&apos;s an example:</para>

-        

-        <para><literal>7/12/04 2:15:35 PM - 10:

-          org.apache.uima.util.TestClass.main(62): INFO: You are not logged

-          in!</literal></para>

-      </section>

-      

-      <section id="ugr.tug.aae.logging.meaning_of_severity_levels">

-        <title>Meaning of the logging severity levels used by the UIMA logger</title>

-        

-        <para>These levels are defined by the Java logging framework, which was

-          incorporated into Java as of the 1.4 release level. The levels are defined in the

-          Javadocs for java.util.logging.Level, and include both logging and tracing

-          levels:

-          <itemizedlist spacing="compact">

-            <listitem><para>OFF is a special level that can be used to turn off

-              logging.</para></listitem>

-            

-            <listitem><para>ALL indicates that all messages should be logged. </para>

-            </listitem>

-            

-            <listitem><para>CONFIG is a message level for configuration messages. These

-              would typically occur once (during configuration) in methods like

-              <literal>initialize()</literal>. </para></listitem>

-            

-            <listitem><para>INFO is a message level for informational messages, for

-              example, connected to server IP: 192.168.120.12 </para></listitem>

-            

-            <listitem><para>WARNING is a message level indicating a potential

-              problem.</para></listitem>

-            

-            <listitem><para>SEVERE is a message level indicating a serious

-              failure.</para></listitem>

-          </itemizedlist></para>

-        

-        <para> Tracing levels, typically used for debugging:

-          <itemizedlist>

-            

-            <listitem><para>FINE is a message level providing tracing information,

-              typically at a collection level (messages occurring once per collection).

-              </para></listitem>

-            

-            <listitem><para>FINER indicates a fairly detailed tracing message,

-              typically at a document level (once per document).</para></listitem>

-            

-            <listitem><para>FINEST indicates a highly detailed tracing message. </para>

-            </listitem></itemizedlist></para>

-      </section>

-      

-      <section id="ugr.tug.aae.logging.using_outside_of_an_annotator">

-        <title>Using loggers outside of an annotator</title>

-        

-        <para>An application using UIMA may want to log its messages using the same logging

-          framework. This can be done by getting a reference to the UIMA logger, as follows:                  

-          <programlisting>Logger logger = UIMAFramework.getLogger(TestClass.class);</programlisting>

-        </para>

-          

-        <para>You can also simply get a direct reference to an Slf4j logger using the standard approach:

-          <programlisting>org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestClass.class);</programlisting>

-        </para>

-        

-        <para>The class argument specifies the name of the logger, using the fully qualified class name. 

-          For UIMA loggers, if not specified, the name of the returned logger instance is

-          <quote>org.apache.uima</quote>.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.logging.change_logger_implementation">

-        <title>Changing the underlying UIMA logging implementation</title>

-        

-        <para>By default the UIMA framework uses, under the hood of the UIMA Logger interface, the 

-        SLF4J logging framework to do logging. This allows UIMA, when running embedded inside other frameworks,

-        to defer the choice of back-end logging frameworks to those applications.

-        </para>

-        

-        <para>For backwards compatibility with Version 2, the older method (prior to SLF4J) for switching the

-        logger implementation remains.

-        You do this by specifying the system property  

-          <programlisting>-Dorg.apache.uima.logger.class=&lt;loggerClass></programlisting>

-        when the UIMA framework is started.  

-        </para>

-        <para>

-          The specified logger class must be available in the classpath and has to subclass the 

-          <code>org.apache.uima.util.Logger_common_impl</code> class. 

-        </para>

-

-        <para>For backwards compatibility, V3 continues to provide the class

-           <code>org.apache.uima.util.impl.Log4jLogger_impl</code> as an alternative

-           which can be specified this way by this JVM argument:

-           <programlisting><?db-font-size 80% ?>-Dorg.apache.uima.logger.class=org.apache.uima.util.impl.Log4jLogger_impl</programlisting>

-           to switch to the log4j back end.  This has been updated in V3 to <code>log4j 2</code>

-           (see <ulink url="https://logging.apache.org/log4j"/>).

-           If you use this, you must provide the required <code>Log4j 2</code> jars in the classpath.

-         </para>        

-                 

-      </section>

-      

-      <section id="uv3.logging.suppress_annotator_logging">

-        <title>Throttling excessive logging from Annotators</title>

-        

-        <para>Sometimes, in production, you may find annotators are logging excessively, and you wish to throttle 

-          this. But you may not have access to logging settings to control this,

-          perhaps because UIMA is running as a library component within another framework. 

-          For this special case,

-          you can limit logging done by Annotators by passing an additional parameter to the UIMA Framework's 

-          produceAnalysisEngine API, using the key name 

-          <code>AnalysisEngine.PARAM_THROTTLE_EXCESSIVE_ANNOTATOR_LOGGING</code>

-          and setting the value to an Integer object equal to the limit.  Using 0 will suppress all logging.

-          Any positive number allows that many log records to be logged, per level.  A limit of 10 would allow 

-          10 Errors, 10 Warnings, etc.  The limit is enforced separately, per logger instance.</para>

-          

-          <note><para>This only works if the logger used by Annotators is obtained from the 

-          Annotator base implementation class via the <code>getLogger()</code> method.</para></note>
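The per-level counting rule described above can be illustrated with a plain java.util.logging Filter. This is only a sketch of the counting behavior, not UIMA's implementation: the class name is hypothetical, and in UIMA the limit is supplied through the additional parameter passed to produceAnalysisEngine rather than through a filter you install yourself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

final class PerLevelThrottleSketch implements Filter {

  private final int limit;
  private final Map<Level, Integer> seen = new HashMap<>();

  PerLevelThrottleSketch(int limit) {
    this.limit = limit;
  }

  /** Accepts at most {@code limit} records per level; a limit of 0 rejects everything. */
  @Override
  public synchronized boolean isLoggable(LogRecord record) {
    int count = seen.merge(record.getLevel(), 1, Integer::sum);
    return count <= limit;
  }

  public static void main(String[] args) {
    PerLevelThrottleSketch throttle = new PerLevelThrottleSketch(2);
    for (int i = 1; i <= 3; i++) {
      boolean accepted = throttle.isLoggable(new LogRecord(Level.WARNING, "warning " + i));
      System.out.println("warning " + i + " accepted: " + accepted);
    }
    // A different level has its own counter.
    System.out.println("first error accepted: "
        + throttle.isLoggable(new LogRecord(Level.SEVERE, "error 1")));
  }
}
```

One filter instance per logger gives the "enforced separately, per logger instance" behavior, since each instance keeps its own counters.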

-            

-      </section>  

-      

-    </section>

-  </section>  

-  <section id="ugr.tug.aae.building_aggregates">

-    <title>Building Aggregate Analysis Engines</title>

-    

-    <section id="ugr.tug.aae.combining_annotators">

-      <title>Combining Annotators</title>

-      

-      <para>The UIMA SDK makes it very easy to combine any sequence of Analysis Engines to

-        form an <emphasis>Aggregate Analysis Engine</emphasis>. This is done through an

-        XML descriptor; no Java code is required!</para>

-      

-      <para>If you go to the <literal>examples/descriptors/tutorial/ex3</literal>

-        folder (in Eclipse, it&apos;s in your uimaj-examples project, under the

-        <literal>descriptors/tutorial/ex3</literal> folder), you will find a

-        descriptor for a TutorialDateTime annotator. This annotator detects dates and

-        times. To see what this annotator can do, try it out

-        using the Document Analyzer. If you are curious as to how this annotator works, the

-        source code is included, but it is not necessary to understand the code at this

-        time.</para>

-      

-      <para>We are going to combine the TutorialDateTime annotator with the

-        RoomNumberAnnotator to create an aggregate Analysis Engine. This is illustrated

-        in the following figure:

-        

-        <figure id="ugr.tug.aae.fig.combining_annotators">

-          <title>Combining Annotators to form an Aggregate Analysis Engine</title>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="5.7in" format="PNG"

-                fileref="&imgroot;image024.png"/>

-            </imageobject>

-            <textobject> <phrase>Combining Annotators to form an Aggregate Analysis

-              Engine</phrase>

-            </textobject>

-          </mediaobject>

-        </figure> </para>

-      

-      <para>The descriptor that does this is named

-        <literal>RoomNumberAndDateTime.xml</literal>, which you can open in the

-        Component Descriptor Editor plug-in. This is in the uimaj-examples project in the

-        folder <literal>descriptors/tutorial/ex3</literal>. </para>

-      

-      <para>The <quote>Aggregate</quote> page of the Component Descriptor Editor is

-        used to define which components make up the aggregate. A screen shot is shown below.

-        (If you are not using Eclipse, see <xref

-          linkend="ugr.tug.aae.xml_intro_ae_descriptor"/> for the actual XML syntax

-        for Aggregate Analysis Engine Descriptors.)</para>

-      

-      

-        <screenshot>

-  <mediaobject>

-    <imageobject>

-      <imagedata width="5.7in" format="JPG" fileref="&imgroot;image026.jpg"/>

-    </imageobject>

-    <textobject>

-      <phrase>Aggregate page of the Component Descriptor Editor (CDE)</phrase>

-    </textobject>

-  </mediaobject>

-</screenshot>

-        

-      <para>On the left side of the screen is the list of component engines that make up the

-        aggregate &ndash; in this case, the TutorialDateTime annotator and the

-        RoomNumberAnnotator. To add a component, you can click the <quote>Add</quote>

-        button and browse to its descriptor. You can also click the <quote>Find AE</quote>

-        button and search for an Analysis Engine in your Eclipse workspace.

-        <note><para>The <quote>AddRemote</quote> button is used for adding components

-        which run remotely (for example, on another machine using a remote networking

-        connection). This capability is described in section <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.application.how_to_call_a_uima_service"/>.</para>

-        </note> </para>

-      

-      <para>The order of the components in the left pane does not imply an order of

-        execution. The order of execution, or <quote>flow</quote> is determined in the

-        <quote>Component Engine Flow</quote> section on the right. UIMA supports

-        different types of algorithms (including user-definable) for determining the

-        flow. Here we pick the simplest: <literal>FixedFlow</literal>. We have chosen to

-        have the RoomNumberAnnotator execute first, although in this case it

-        doesn&apos;t really matter, since the RoomNumber and DateTime annotators do not

-        have any dependencies on one another.</para>

-      

-      <para>If you look at the <quote>Type System</quote> page of the Component

-        Descriptor Editor, you will see that it displays the type system but is not

-        editable. The Type System of an Aggregate Analysis Engine is automatically

-        computed by merging the Type Systems of all of its components.</para>

-      

-      <warning><para>If the components have different definitions for the same type name,

-        the Component Descriptor Editor will show a warning.  It is possible to continue past

-        this warning, in which case your aggregate's type system will have the correct

-        <quote>merged</quote>

-        type definition that contains all of the features defined on that type by all of your

-        components.  However, it is not recommended to use this feature in conjunction with JCAS,

-        since the JCAS Java Class definitions cannot be so easily merged.  See

-        <olink targetdoc="&uima_docs_ref;"/>

-        <olink

-          targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information.

-      </para></warning>

-      

-      <para>The Capabilities page is where you explicitly declare the aggregate Analysis

-        Engine&apos;s inputs and outputs. Sofas and Languages are described later.

-        

-          

-          <screenshot>

-     <mediaobject>

-       <imageobject>

-         <imagedata width="5.7in" format="JPG" fileref="&imgroot;image028.jpg"/>

-       </imageobject>

-       <textobject><phrase>Screen shot of the Capabilities page of the Component Descriptor Editor

-       </phrase></textobject>

-     </mediaobject>

-   </screenshot>

-          </para>

-        <para>Note that it is not automatically assumed that all outputs of each component

-          Analysis Engine (AE) are passed through as outputs of the aggregate AE. If, for example,

-          the TutorialDateTime annotator also produced Word and Sentence annotations, 

-          but those were not of interest as output in this case, we can exclude them from the 

-          list of outputs.</para>

-        

-        <para>You can run this AE using the Document Analyzer in the same way that you run any

-          other AE. Just select the <literal>examples/descriptors/tutorial/ex3/

-          RoomNumberAndDateTime.xml</literal> descriptor and click the Run button. You

-          should see that RoomNumbers, Dates, and Times are all shown:</para>

-        

-        <screenshot>

-     <mediaobject>

-       <imageobject>

-         <imagedata width="5.7in" format="JPG" fileref="&imgroot;image030.jpg"/>

-       </imageobject>

-       <textobject><phrase>Screen shot results of running the Document Analyzer

-       </phrase></textobject>

-     </mediaobject>

-   </screenshot>

-        

-    </section>

-    

-    <section id="ugr.tug.aae.aaes_can_contain_cas_consumers">

-      <title>AAEs can also contain CAS Consumers</title>

-      

-      <para>In addition to aggregating Analysis Engines, Aggregates can also contain CAS

-        Consumers (see <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.cpe"/>), or even a mixture of these components with regular

-        Analysis Engines. The UIMA Examples has an example of an Aggregate which contains

-        both an analysis engine and a CAS consumer, in

-        <literal>examples/descriptors/MixedAggregate.xml</literal>.</para>

-      

-      <para>Analysis Engines support the <literal>collectionProcessComplete</literal>

-        method, which is particularly important for many CAS Consumers.  If

-        an application (or a Collection Processing Engine) calls 

-        <literal>collectionProcessComplete</literal> on an aggregate, the framework

-        will deliver that call to all of the components of the aggregate.  If you use

-        one of the built-in flow types (fixedFlow or capabilityLanguageFlow), then the

-        order specified in that flow will be the same order in which the

-        <literal>collectionProcessComplete</literal> calls are made to the components.

-        If a custom flow is used, then the calls will be made in arbitrary order.

-      </para>

-    </section>

-    

-    <section id="ugr.tug.aae.reading_results_previous_annotators">

-      <title>Reading the Results of Previous Annotators</title>

-      

-      <para>So far, we have been looking at annotators that look directly at the document text. However, annotators

-        can also use the results of other annotators. One useful thing we can do at this point is look for the

-        co-occurrence of a Date, a RoomNumber, and two Times &ndash; and annotate that as a Meeting.</para>

-        

-      <para>The <code>select</code> API, available on the CAS, JCas, and individual UIMA indexes, 

-        is the preferred way to get 

-        feature structures from the CAS and work with them.</para>  

-      

-      <para>The CAS maintains <emphasis>indexes</emphasis> of annotations, and from an index you can obtain an

-        iterator that allows you to step through all annotations of a particular type in that index.

-        Indexes are optional; they allow you to specify a sorting order or can specify set-inclusion

-        criteria.  One built-in index is the Annotation index; this contains sorted instances of type Annotation 

-        or its subtypes.

-      </para> 

-      

-      <para>

-        Here&apos;s some example code

-        that would iterate over all of the TimeAnnot annotations in the JCas, in some unspecified order:

-        

-        <programlisting>for (TimeAnnot timeAnnot : aJCas.select(TimeAnnot.class)) {

-  //do something

-}</programlisting></para>

-      

-      <para>

-        The same code, but using the Annotation index to specify an ordering (assuming that

-        TimeAnnot is a subtype of Annotation):

-        

-        <programlisting>for (TimeAnnot timeAnnot : aJCas.getAnnotationIndex().select(TimeAnnot.class)) {

-  //do something

-}

-  // or

-for (TimeAnnot timeAnnot : aJCas.getAnnotationIndex(TimeAnnot.class).select()) {

-  //do something

-}

-</programlisting></para>

-        

-      <para>Also, if you've defined your own custom index as described in <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor.aes.index"/>, you can get an iterator over that

-        specific index by calling <literal>aJCas.getIndex(label, clazz)</literal>.

-        The <literal>getIndex(...)</literal> method's second argument 

-      specializes the index to a subtype of the type the index was declared to index.  For instance,

-      if you defined an index called "allEvents" over the type <literal>Event</literal>, and wanted 

-      to get an index over just a particular subtype of event, say, <literal>TimeEvent</literal>,

-      you can ask for that index using 

-        <literal>aJCas.getIndex("allEvents", TimeEvent.class)</literal>.</para>

-      

-      

-      <para>Wherever the type is specified by TimeEvent.class, the APIs also allow the non-JCas 

-        specification of the type by passing an instance of a UIMA Type class. This alternative enables

-        writing code that can be used for any type, discovered at run time.</para>

-      

-      <para>Now that we&apos;ve explained the basics, let&apos;s take a look at the process method for

-        <literal>org.apache.uima.tutorial.ex4.MeetingAnnotator</literal>. Since we&apos;re looking for a

-        combination of a RoomNumber, a Date, and two Times, there are four nested iterators. (There&apos;s surely a

-        better algorithm for doing this, but to keep things simple we&apos;re just going to look at every combination

-        of the four items.)</para>

-      

-      <para>For each combination of the four annotations, we compute the span of text that includes all of them, and

-        then we check to see if that span is smaller than a <quote>window</quote> size, a configuration parameter.

-        There are also some checks to make sure that we don&apos;t annotate the same span of text multiple times. If all

-        the checks pass, we create a Meeting annotation over the whole span. There&apos;s really nothing to

-        it!</para>
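The combination search described above can be sketched in plain Java. This is a minimal illustration, not the tutorial's MeetingAnnotator source: the `Span` class, the `findMeetings` method, and the rule that the first time must begin before the second are assumptions made for the example, and the annotations are reduced to begin/end offsets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MeetingWindowSketch {
  /** Hypothetical stand-in for an annotation: only its begin/end offsets. */
  public static final class Span {
    public final int begin;
    public final int end;
    public Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  /** Tries every (room, date, time, time) combination and keeps spans narrower than the window. */
  public static List<Span> findMeetings(List<Span> rooms, List<Span> dates,
      List<Span> times, int window) {
    List<Span> meetings = new ArrayList<>();
    Set<String> seen = new HashSet<>(); // spans already reported
    for (Span room : rooms) {
      for (Span date : dates) {
        for (Span start : times) {
          for (Span finish : times) {
            // Assumed rule: skip identical or reversed time pairs.
            if (start.begin >= finish.begin) continue;
            // Span of text that covers all four items.
            int begin = Math.min(Math.min(room.begin, date.begin),
                                 Math.min(start.begin, finish.begin));
            int end = Math.max(Math.max(room.end, date.end),
                               Math.max(start.end, finish.end));
            if (end - begin >= window) continue; // too wide for the window
            // Report each distinct span only once.
            if (seen.add(begin + ":" + end)) meetings.add(new Span(begin, end));
          }
        }
      }
    }
    return meetings;
  }
}
```

In a real annotator the four lists would come from the CAS indexes and each surviving span would become a Meeting annotation; here the result is just a list of offsets.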

-      

-      <para>The XML descriptor, located in

-        <literal>examples/descriptors/tutorial/ex4/MeetingAnnotator.xml</literal>, is also very

-        straightforward. An important difference from previous descriptors is that this is the first annotator

-        we&apos;ve discussed that has input requirements. This can be seen on the <quote>Capabilities</quote>

-        page of the Component Descriptor Editor:</para>

-      

-      

-      <screenshot>

-     <mediaobject>

-       <imageobject>

-         <imagedata width="5.7in" format="JPG" fileref="&imgroot;image032.jpg"/>

-       </imageobject>

-       <textobject><phrase>Screen shot of Capabilities page of the Component Descriptor Editor

-       </phrase></textobject>

-     </mediaobject>

-   </screenshot>

-      

-      <para>If we were to run the MeetingAnnotator on its own, it wouldn&apos;t detect anything because it

-        wouldn&apos;t have any input annotations to work with. The required input annotations can be produced by the

-        RoomNumber and DateTime annotators. So, we create an aggregate Analysis Engine containing these two

-        annotators, followed by the Meeting annotator. This aggregate is illustrated in <xref

-          linkend="ugr.tug.aae.fig.aggregate_for_meeting_annotator"/>. The descriptor for this is in

-        <literal>examples/descriptors/tutorial/ex4/MeetingDetectorAE.xml</literal>. Give it a try in the

-        Document Analyzer.

-        

-        <figure id="ugr.tug.aae.fig.aggregate_for_meeting_annotator">

-          <title>An Aggregate Analysis Engine where an internal component uses output from previous

-            engines</title>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="5.7in" format="PNG" fileref="&imgroot;image034.png"/>

-            </imageobject>

-            <textobject><phrase>An Aggregate Analysis Engine where an internal component uses output from

-              previous engines. </phrase>

-            </textobject>

-          </mediaobject>

-        </figure> </para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aae.other_examples">

-    <title>Other examples</title>

-    

-    <para>The UIMA SDK includes several other examples you may find interesting,

-      including</para>

-    

-    <itemizedlist spacing="compact">

-      <listitem><para>SimpleTokenAndSentenceAnnotator &ndash; a simple tokenizer and

-        sentence annotator.</para></listitem>

-      

-      <listitem><para>XmlDetagger &ndash; A multi-sofa annotator that does XML

-        detagging. Multiple Sofas (Subjects of Analysis) are described in a later chapter &ndash;

-        see <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.mvs"/>.  Reads XML data from the input Sofa

-        (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can

-        be a URI to a remote file. The XML is parsed using the JVM's default parser, and the

-        plain-text content is written to a new sofa called "plainTextDocument".</para>

-      </listitem>

-      

-      <listitem><para>PersonTitleDBWriterCasConsumer &ndash; a sample CAS Consumer

-        which populates a relational database with some annotations. It uses JDBC and in this

-        example, hooks up with the Open Source Apache Derby database. </para></listitem>

-    </itemizedlist>

-  </section>

-  

-  <section id="ugr.tug.aae.additional_topics">

-    <title>Additional Topics</title>

-    

-    <section id="ugr.tug.aae.contract_for_annotator_methods">

-      <title>Contract: Annotator Methods Called by the Framework</title>

-      <titleabbrev>Annotator Methods</titleabbrev>

-      

-      <para>The UIMA framework ensures that an Annotator instance is called by only one

-        thread at a time.  An instance never has to worry about running some method on one 

-        thread, and then asynchronously being called using another thread. This approach 

-        simplifies the design of annotators &ndash; they do not have to be designed to support

-        multi-threading. When multi-threading is wanted for performance, multiple

-        instances of the Annotator are created, each one running on just one thread.</para>

-      

-      <para>The following table defines the methods called by the framework, when they are

-        called, and the requirements annotator implementations must follow.</para>

-      

-      <informaltable frame="all">

-        <tgroup cols="3" colsep="1" rowsep="1">

-          <colspec colname="c1" colwidth="1*"/>

-          <colspec colname="c2" colwidth="2*"/>

-          <colspec colname="c3" colwidth="2*"/>

-          <thead>

-            <row>

-              <entry align="center">Method</entry>

-              <entry align="center">When Called by Framework</entry>

-              <entry align="center">Requirements</entry>

-            </row>

-          </thead>

-          <tbody>

-            <row>

-              <entry>initialize</entry>

-              <entry>Typically only called once, when instance is created. Can be called

-                again if application does a reinitialize call and the default behavior

-                isn't overridden (the default behavior for reinitialize is to call

-                <literal>destroy</literal> followed by

-                <literal>initialize</literal>).</entry>

-              <entry>Normally does one-time initialization, including reading of

-                configuration parameters. If the application changes the parameters, it

-                can call initialize to have the annotator re-do its

-                initialization.</entry>

-            </row>

-            <row>

-              <entry>typeSystemInit</entry>

-              <entry>Called before <literal>process</literal> whenever the type system

-                in the CAS being passed in differs from what was previously passed in a

-                <literal>process</literal> call (and called for the first CAS passed in,

-                too). The Type System being passed to an annotator only changes in the case of

-                remote annotators that are active as servers, receiving possibly

-                different type systems to operate on.</entry>

-              <entry>Typically, users of JCas do not implement any method for this. An

-                annotator can use this call to read the CAS type system and set up any instance

-                variables that make accessing the types and features convenient.</entry>

-            </row>

-            <row>

-              <entry>process</entry>

-              <entry>Called once for each CAS. Called by the application if not using

-                Collection Processing Manager (CPM); the application calls the process

-                method on the analysis engine, which is then delegated by the framework to

-                all the annotators in the engine. For Collection Processing applications,

-                the CPM calls the process method. If the application creates and manages

-                its own Collection Processing Engine via API calls (see Javadocs), the

-                application calls this on the Collection Processing Engine, and it is

-                delegated by the framework to the components.</entry>

-              <entry>Process the CAS, adding and/or modifying elements in it</entry>

-            </row>

-            <row>

-              <entry>destroy</entry>

-              <entry>This method can be called by applications, and is also called by the

-                Collection Processing Manager framework when the collection processing

-                completes. It is also called on Aggregate delegate components, if those 

-                components successfully complete their <literal>initialize</literal> call, if 

-                a subsequent delegate (or flow controller) in the aggregate fails to initialize.

-                This allows components which need to clean up things done during initialization 

-                to do so.  It is up to the component writer to use a try/finally construct during initialization

-                to clean up after errors that occur during initialization within one component.

-                The <literal>destroy</literal> call on an aggregate is

-                propagated to all contained analysis engines.</entry>

-              <entry>An annotator should release all resources, close files, close

-                database connections, etc., and return to a state where another initialize

-                call could be received to restart. Typically, after a destroy call, no

-                further calls will be made to an annotator instance.</entry>

-            </row>

-            <row>

-              <entry>reconfigure</entry>

-              <entry><para>This method is never called by the framework, unless an

-                application calls it on the Engine object &ndash; in which case the

-                framework propagates it to all annotators contained in the Engine.</para>

-                <para>Its purpose is to signal that the configuration parameters have

-                  changed.</para></entry>

-              <entry>A default implementation of this calls destroy, followed by

-                initialize. This is the only case where initialize would be called more than

-                once. Users should implement whatever logic is needed to return the

-                annotator to an initialized state, including re-reading the

-                configuration parameter data.</entry>

-            </row>

-          </tbody>

-        </tgroup>

-      </informaltable>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.reporting_errors_from_annotators">

-      <title>Reporting errors from Annotators</title>

-      

-      <para>There are two broad classes of errors that can occur: recoverable and

-        unrecoverable. Because Annotators are often expected to process very large numbers

-        of artifacts (for example, text documents), they should be written to recover where

-        possible.</para>

-      

-      <para>For example, if an upstream annotator created invalid input for an annotator,

-        the annotator may want to log this event, ignore the bad input, and

-        continue. It may include a notification of this event in the CAS, for further

-        downstream annotators to consider. Or, it may throw an exception (see next section)

-        &ndash; but in this case, it cannot do any further processing on that

-        document.</para> <note><para>The choice of what to do can be made configurable,

-      using the configuration parameters. </para></note>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.throwing_exceptions_from_annotators">

-      <title>Throwing Exceptions from Annotators</title>

-      

-      <para>Let&apos;s say an invalid regular expression was passed as a parameter to the

-        RoomNumberAnnotator. Because this is an error related to the overall

-        configuration, and not something we could expect to ignore, we should throw an

-        appropriate exception, and most Java programmers would expect to do so like

-        this:</para>

-      

-      

-      <programlisting>throw new ResourceInitializationException(

-    "The regular expression " + x + " is not valid.");</programlisting>

-      

-      <para>UIMA, however, does not do it this way. All UIMA exceptions are

-        <emphasis>internationalized</emphasis>, meaning that they support translation

-        into other languages. This is accomplished by eliminating hardcoded message

-        strings and instead using external message digests. Message digests are files

-        containing (key, value) pairs. The key is used in the Java code instead of the actual

-        message string. This allows the message string to be easily translated later by

-        modifying the message digest file, not the Java code. Also, message strings in the

-        digest can contain parameters that are filled in when the exception is thrown. The

-        format of the message digest file is described in the Javadocs for the Java class

-        <literal>java.util.PropertyResourceBundle</literal> and in the load method of

-        <literal>java.util.Properties</literal>.</para>

-      

-      <para>The first thing an annotator developer must choose is what Exception class to

-        use. There are three to choose from:

-        

-        <orderedlist><listitem><para>ResourceConfigurationException should be

-          thrown from the annotator&apos;s reconfigure() method if invalid configuration

-          parameter values have been specified. 

-          </para></listitem>

-          

-          <listitem><para>ResourceInitializationException should be thrown from the

-            annotator&apos;s initialize() method if initialization fails for any 

-            reason (including invalid configuration parameters).</para></listitem>

-          

-          <listitem><para>AnalysisEngineProcessException should be thrown from the

-            annotator&apos;s process() method if the processing of a particular document

-            fails for any reason. </para></listitem></orderedlist></para>

-      

-      <para>Generally you will not need to define your own custom exception classes, but if

-        you do they must extend one of these three classes, which are the only types of

-        Exceptions that the annotator interface permits annotators to throw.</para>

-      

-      <para>All of the UIMA Exception classes share common constructor varieties. There are

-        four possible arguments:</para>

-      

-      <orderedlist>

-        <listitem><para>The name of the message digest to use (optional &ndash; if not specified the

-          default UIMA message digest is used).</para></listitem>

-        <listitem><para>The key string used to select the message in the message digest.</para></listitem>

-        <listitem><para>An object array containing the parameters to include in the message. Messages

-          can have substitutable parts. When the message is given, the string representations

-          of the objects passed are substituted into the message. The object array is often

-          created using the syntax <literal>new Object[]{x, y}</literal>.</para></listitem>

-        <listitem><para>Another exception which is the <quote>cause</quote> of the exception you are

-          throwing (optional). This feature is commonly used when you catch another exception and rethrow

-          it.</para></listitem>

-      </orderedlist>

-      

-      <para>If you look at source file (folder: src in Eclipse)

-        <literal>org.apache.uima.tutorial.ex5.RoomNumberAnnotator</literal>, you

-        will see the following code:

-        

-        

-        <programlisting>try {

-  mPatterns[i] = Pattern.compile(patternStrings[i]);

-} 

-catch (PatternSyntaxException e) {

-  throw new ResourceInitializationException(

-     MESSAGE_DIGEST, "regex_syntax_error",

-     new Object[]{patternStrings[i]}, e);

-}</programlisting>

-        where the MESSAGE_DIGEST constant has the value

-        <literal>"org.apache.uima.tutorial.ex5.RoomNumberAnnotator_Messages"</literal>.

-        </para>

-      

-      <para>Message digests are specified using a dotted name, just like Java classes. This

-        file, with the .properties extension, must be present in the class path. In Eclipse,

-        you find this file under the src folder, in the package

-        org.apache.uima.tutorial.ex5, with the name

-        RoomNumberAnnotator_Messages.properties. Outside of Eclipse, you can find this

-        in the <literal>uimaj-examples.jar</literal> with the name

-        <literal>org/apache/uima/tutorial/ex5/RoomNumberAnnotator_Messages.properties</literal>.

-        If you look in this file you will see the line:

-        

-        

-        <programlisting>regex_syntax_error = {0} is not a valid regular expression.</programlisting>

-        which is the error message for the example exception we showed above. The placeholder

-        {0} will be filled by the toString() value of the argument passed to the exception

-        constructor &ndash; in this case, the regular expression pattern that didn&apos;t

-        compile. If there were additional arguments, their locations in the message would be

-        indicated as {1}, {2}, and so on.</para>
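The placeholder substitution can be tried out with the JDK alone. The sketch below is a standard-library analogue, not UIMA's own code: the class and method names are hypothetical, and the digest is read from an in-memory string rather than from a .properties file on the class path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MessageDigestSketch {
  // Same line as in RoomNumberAnnotator_Messages.properties, inlined for the example.
  private static final String DIGEST =
      "regex_syntax_error = {0} is not a valid regular expression.\n";

  /** Looks up the message for a key and fills {0}, {1}, ... from the arguments. */
  public static String localize(String key, Object... args) {
    try {
      ResourceBundle bundle = new PropertyResourceBundle(new StringReader(DIGEST));
      return MessageFormat.format(bundle.getString(key), args);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(localize("regex_syntax_error", "[a-z"));
  }
}
```

Translating the message then only means supplying a different properties file; the Java code and the key stay the same.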

-      

-      <para>If a message digest is not specified in the call to the exception constructor, the

-        default is <literal>UIMAException.STANDARD_MESSAGE_CATALOG</literal> (whose

-        value is <quote><literal>org.apache.uima.UIMAException_Messages</literal>

-        </quote> in the current release but may change). This message digest is located in the

-        <literal>uima-core.jar</literal> file at

-        <literal>org/apache/uima/UIMAException_Messages.properties</literal>

-        &ndash; take a look to see whether any of these exception messages suit your

-        needs.</para>

-      

-      <para>To try out the regex_syntax_error exception, just use the Document Analyzer to

-        run

-        <literal>examples/descriptors/tutorial/ex5/RoomNumberAnnotator.xml</literal>,

-        which happens to have an invalid regular expression in its configuration parameter

-        settings.</para>

-      

-      <para>To summarize, here are the steps to take if you want to define your own exception

-        message:</para>

-      

-      <para>Create a file with the .properties extension, where you declare message keys and

-        their associated messages, using the same syntax as shown above for the

-        regex_syntax_error exception. The properties file syntax is more completely

-        described in the Javadocs for the <ulink 

-          url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html#load(java.io.InputStream)">

-        load</ulink> method of the java.util.Properties class.</para>

-      

-      <para>Put your properties file somewhere in your class path (it can be in your

-        annotator&apos;s .jar file).</para>

-      

-      <para>Define a String constant (called MESSAGE_DIGEST for example) in your annotator

-        code whose value is the dotted name of this properties file. For example, if your

-        properties file is inside your jar file at the location

-        <literal>org/myorg/myannotator/Messages.properties</literal>, then this

-        String constant should have the value

-        <literal>org.myorg.myannotator.Messages</literal>. Do not include the

-        .properties extension. In Java Internationalization terminology, this is called

-        the Resource Bundle name. For more information see the Javadocs for the <ulink

-          url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/PropertyResourceBundle.html">

-        PropertyResourceBundle</ulink> class.</para>

-      

-      <para>In your annotator code, throw an exception like this:

-        

-        <programlisting>throw new ResourceInitializationException(

-    MESSAGE_DIGEST, "your_message_name",

-    new Object[]{param1,param2,...});</programlisting></para>

-      

-      <para>You may also wish to look at the Javadocs for the UIMAException class.</para>

-      

-      <para>For more information on Java&apos;s internationalization features, see the 

-       <ulink url="http://java.sun.com/j2se/1.5.0/docs/guide/intl/index.html">

-        Java Internationalization Guide</ulink>.</para>

-    </section>

-    

-    <section id="ugr.tug.aae.accessing_external_resource_files">

-      <title>Accessing External Resources</title>

-      

-      <para>External Resources are Java objects that have a life cycle where they

-      are (optionally) initialized at startup time by reading external data from 

-      a file or via a URL (which can access information over the http protocol, for instance).

-      It is not <emphasis>required</emphasis> that External Resource objects 

-      do any external data reading to initialize themselves.  However, this is such a 

-      common use case, that we will presume this mode of operation in the description below.</para>

-      

-      <para>Sometimes you may want an annotator to read from an external resource, 

-        such as a URL or a file &ndash; for

-        example, a long list of keys and values that you are going to build into a HashMap. You

-        could, of course, just introduce a configuration parameter that holds the absolute

-        path or URL to this resource, and build the HashMap in your annotator&apos;s

-        initialize method. However, this is not the best solution for three reasons:</para>

-      

-      <orderedlist><listitem><para>Including an absolute path in your descriptor to

-        specify the initialization data makes

-        your annotator difficult for others to use. Each user will need to edit this

-        descriptor and set the absolute path to a value appropriate for his or her

-        installation.</para></listitem>

-        

-        <listitem><para>You cannot share the created Java object(s), e.g., a HashMap, 

-          between multiple annotators. Also,

-          in some deployment scenarios there may be more than one instance of your annotator,

-          and you would like to have the option for them to share the same Java Object(s).</para></listitem>

-        

-        <listitem><para>Your annotator would become dependent on a particular 

-          implementation of the Java Object(s).  It would be better if there was 

-          a decoupling between the actual implementation, and the API used to

-          access it. </para></listitem></orderedlist>

-      

-      <para>A better way to create these sharable Java objects and initialize them 

-        via external disk or URL sources is through the ResourceManager

-        component. In this section we are going to show an example of how to use the Resource

-        Manager.</para>

-      

-      <para>This example annotator will annotate UIMA acronyms (e.g. UIMA, AE, CAS, JCas)

-        and store the acronym&apos;s expanded form as a feature of the annotation. The

-        acronyms and their expanded forms are stored in an external file.</para>

-      

-      <para>First, look at the

-        <literal>examples/descriptors/tutorial/ex6/UimaAcronymAnnotator.xml</literal>

-        descriptor.

-        

-        

-        <screenshot>

-       <mediaobject>

-       <imageobject>

-         <imagedata width="5.7in" format="JPG" fileref="&imgroot;image036.jpg"/>

-       </imageobject>

-       <textobject><phrase>Screen shot of Component Descriptor Editor page for configuring External Resources

-       </phrase></textobject>

-     </mediaobject>

-

-</screenshot></para>

-      

-      <para>The values of the rows in the two tables are longer than can be easily shown. You can

-        click the small button at the top right to shift the layout from two side-by-side

-        tables, to a vertically stacked layout. You can also click the small twisty on the

-        <quote>Imports for External Resources and Bindings</quote> to collapse this

-        section, because it&apos;s not used here. Then the same screen will appear like this:

-        

-        

-        <screenshot>

-       <mediaobject>

-       <imageobject>

-         <imagedata width="5.7in" format="JPG" fileref="&imgroot;image038.jpg"/>

-       </imageobject>

-       <textobject><phrase>Screen shot of Component Descriptor Editor page for configuring External Resources after

-         adjusting the layout

-       </phrase></textobject>

-     </mediaobject>

-</screenshot>

-        </para>

-      

-      <para>The top window has a scroll bar allowing you to see the rest of the line.</para>

-      

-      <section id="ugr.tug.aae.resources.declaring_dependencies">

-        <title>Declaring Resource Dependencies</title>

-        

-        <para>The bottom window is where an annotator declares an external resource

-          dependency. The XML for this is as follows:</para>

-        

-        

-        <programlisting><![CDATA[<externalResourceDependency>

-  <key>AcronymTable</key> 

-  <description>Table of acronyms and their expanded forms.</description> 

-  <interfaceName>

-    org.apache.uima.tutorial.ex6.StringMapResource

-  </interfaceName> 

-</externalResourceDependency>

-]]></programlisting>

-        

-        <para>The &lt;key&gt; value (AcronymTable) is the name by which the annotator

-          identifies this resource. The key must be unique for all resources that this

-          annotator accesses, but the same key could be used by different annotators to mean

-          different things. The interface name

-          (<literal>org.apache.uima.tutorial.ex6.StringMapResource</literal>) is

-          the Java interface through which the annotator accesses the data. Specifying an

-          interface name is optional.  If you do not specify an interface name, annotators

-          will instead get an interface which can provide direct access to the 

-          data resource (file or URL) that is 

-          associated with this external resource.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.resources.accessing_from_uimacontext">

-        <title>Accessing the Resource from the UimaContext</title>

-        

-        <para> If you look at the

-          <literal>org.apache.uima.tutorial.ex6.UimaAcronymAnnotator</literal>

-          source, you will see that the annotator accesses this resource from the

-          UimaContext by calling:

-          

-          

-          <programlisting>StringMapResource mMap = 

-  (StringMapResource)getContext().getResourceObject("AcronymTable");</programlisting>

-          </para>

-        

-        <para>The object returned from the <literal>getResourceObject</literal> method

-          will implement the interface declared in the

-          <literal>&lt;interfaceName&gt;</literal> section of the descriptor,

-          <literal>StringMapResource</literal> in this case. The annotator code does not

-          need to know the location of external data that may be used to initialize this

-          object, nor the Java class that might be used to read the

-          data and implement the <literal>StringMapResource</literal>

-          interface.</para>

-        

-        <para>Note that if we did not specify a Java interface in our descriptor, our

-          annotator could directly access the resource data as follows:

-          

-          

-          <programlisting>InputStream stream = getContext().getResourceAsStream("AcronymTable");</programlisting></para>

-        

-        <para>If necessary, the annotator could also determine the location of the resource

-          file, by calling:

-          

-          

-          <programlisting>URI uri = getContext().getResourceURI("AcronymTable");</programlisting></para>

-        

-        <para>These last two options are only available in the case where the descriptor does

-          not declare a Java interface.</para>

-        

-        <note><para>The methods for getting access to resources include <literal>getResourceURL</literal>.  That 

-        method returns a URL, which may contain spaces encoded as %20; <literal>url.getPath()</literal>

-        returns the path without decoding these %20 into spaces.  <literal>getResourceURI</literal>,

-        on the other hand, returns a URI, and <literal>uri.getPath()</literal> <emphasis>does</emphasis>

-        convert %20 into spaces.  See also <literal>getResourceFilePath</literal>,

-          which does a getResourceURI followed by uri.getPath().</para></note>
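The difference between the two `getPath()` methods is standard JDK behavior and can be checked without UIMA. In this small sketch the class name, method names, and the file location are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class PathDecodingSketch {
  /** URI.getPath() decodes percent-escapes such as %20. */
  public static String pathFromUri(String location) {
    return URI.create(location).getPath();
  }

  /** URL.getPath() returns the path exactly as written, escapes included. */
  public static String pathFromUrl(String location) {
    try {
      return URI.create(location).toURL().getPath();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String location = "file:/my%20data/acronyms.txt"; // hypothetical location
    System.out.println(pathFromUri(location));
    System.out.println(pathFromUrl(location));
  }
}
```

This is why a path meant to be handed to `java.io.File` is better taken from the URI form.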

-        

-      </section>

-      

-      <section id="ugr.tug.aae.resources.declaring_and_bindings">

-        <title>Declaring Resources and Bindings</title>

-        

-        <para>Refer back to the top window in the Resources page of the Component Descriptor

-          Editor. This is where we specify the location of the resource data, and the Java

-          class used to read the data. For the example, this corresponds to the following

-          section of the descriptor:

-          

-          

-          <programlisting><![CDATA[<resourceManagerConfiguration>

-  <externalResources>

-    <externalResource>

-      <name>UimaAcronymTableFile</name> 

-      <description>

-         A table containing UIMA acronyms and their expanded forms.

-      </description> 

-      <fileResourceSpecifier>

-        <fileUrl>file:org/apache/uima/tutorial/ex6/uimaAcronyms.txt

-        </fileUrl> 

-      </fileResourceSpecifier>

-      <implementationName>

-         org.apache.uima.tutorial.ex6.StringMapResource_impl

-      </implementationName> 

-    </externalResource>

-  </externalResources>

-

-  <externalResourceBindings>

-    <externalResourceBinding>

-      <key>AcronymTable</key>    

-      <resourceName>UimaAcronymTableFile</resourceName> 

-    </externalResourceBinding>

-  </externalResourceBindings>

-</resourceManagerConfiguration>

-]]></programlisting></para>

-        

-        <para>The first section of this XML declares an externalResource, the

-          <literal>UimaAcronymTableFile</literal>. With this, the fileUrl element

-          specifies the path to the data file.  Despite its name, the fileUrl element can hold any URL:

-          it can point to a file on the file system or to a remote resource accessed via, e.g., the http protocol.

-          This can be an absolute URL (e.g. one that starts

-          with file:/ or file:///, or file://my.host.org/), but that is not recommended

-          because it makes installation of your component more difficult, as noted earlier.

-          Better is a relative URL, which will be looked up within the classpath (and/or

-          datapath), as used in this example. In this case, the file

-          <literal>org/apache/uima/tutorial/ex6/uimaAcronyms.txt</literal> is

-          located in <literal>uimaj-examples.jar</literal>, which is in the classpath.

-          If you look in this file you will see the definitions of several UIMA

-          acronyms.</para>

-        

-        <para>The second section of the XML declares an externalResourceBinding, which

-          connects the key <literal>AcronymTable</literal>, declared in the

-          annotator&apos;s external resource dependency, to the actual resource name

-          <literal>UimaAcronymTableFile</literal>. This is rather trivial in this case;

-          for more on bindings see the example

-          <literal>UimaMeetingDetectorAE.xml</literal> below. There is no global

-          repository for external resources; it is up to the user to define each resource

-          needed by a particular set of annotators.</para>

-        

-        <para>In the Component Descriptor Editor, bindings are indicated below the

-          external resource. To create a new binding, you select an external resource (which

-          must have previously been defined), and an external resource dependency, and then

-          click the <literal>Bind</literal> button, which only enables if you have

-          selected two things to bind together.</para>

-        

-        <para>When the Analysis Engine is initialized, it creates a single instance of

-          <literal>StringMapResource_impl</literal> and loads it with the contents of

-          the data file.  This means that the framework calls the instance's <literal>load</literal>

-          method, passing it an instance of DataResource, from which you can obtain 

-          a stream or URI/URL of the resource that was declared in the externalResource element;

-          for resources where

-          loading does not make sense, you can implement a <literal>load</literal> method

-          which ignores its argument and just returns, or performs whatever

-          initialization is appropriate at startup time.  See the Javadocs for 

-          SharedResourceObject for details on this.</para>

-          

-          <para> 

-          The UimaAcronymAnnotator then accesses the data through the

-          <literal>StringMapResource</literal> interface. This single instance could

-          be shared among multiple annotators, as will be explained later.</para>

-          

-          <warning><para>

-          Because the implementation of the resource is shared, 

-          you should ensure your implementation is thread-safe, as it 

-          could be called multiple times on multiple threads, simultaneously.</para></warning>

-        

-        <para>Note that all resource implementation classes (e.g.

-          StringMapResource_impl in the provided example) must be declared public,

-          must not be declared abstract, and must have public, 0-argument constructors, so 

-          that they can be instantiated by the framework. (Although Java classes in which 

-          you do not define any constructor will, by default, have a 0-argument constructor

-          that doesn&apos;t do anything, a class in which you have defined at least one

-          constructor does not get a default 0-argument constructor.)</para>

-          

-        <para>All resource implementation classes that provide access to resource data

-          must also implement the interface org.apache.uima.resource.SharedResourceObject. 

-          The UIMA Framework

-          will invoke this interface's only method, <code>load</code>,  

-          after this object has been instantiated. The implementation of this method 

-          can then read data from the specified <code>DataResource</code> 

-          and use that data to initialize this object.  It can also do whatever

-          resource initialization might be appropriate to do at startup time.</para>
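
-        <para>As a rough illustration of what such a <code>load</code> implementation might do,
-          the following sketch parses tab-separated key/value lines from a stream into an
-          unmodifiable map. The class <code>AcronymTableSketch</code>, its <code>parse</code>
-          helper, and the tab-separated file layout are assumptions made for this example; a real
-          implementation would implement <code>SharedResourceObject</code> and receive a
-          <code>DataResource</code> rather than a raw stream.</para>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a string-map resource. It does not implement the
// real SharedResourceObject interface, so it compiles without UIMA on the classpath.
class AcronymTableSketch {
  // Replaced as a whole by load() and never mutated afterwards.
  private volatile Map<String, String> entries = Collections.emptyMap();

  // Stand-in for load(DataResource): reads "key<TAB>value" lines (assumed layout).
  void load(InputStream in) throws IOException {
    Map<String, String> parsed = new HashMap<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        int tab = line.indexOf('\t');
        if (tab > 0) { // ignore blank lines and lines without a key
          parsed.put(line.substring(0, tab), line.substring(tab + 1).trim());
        }
      }
    }
    entries = Collections.unmodifiableMap(parsed);
  }

  // Returns the expanded form, or null if the key is unknown.
  String get(String key) {
    return entries.get(key);
  }

  // Convenience for the example: load from an in-memory string.
  static AcronymTableSketch parse(String text) {
    AcronymTableSketch table = new AcronymTableSketch();
    try {
      table.load(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return table;
  }

  public static void main(String[] args) {
    AcronymTableSketch table = parse("CAS\tCommon Analysis Structure\n");
    System.out.println(table.get("CAS"));
  }
}
```

-        <para>Building the map first and publishing it in a single assignment means that readers
-          never observe a half-loaded table, which is one simple way to keep a shared instance
-          safe for concurrent lookups.</para>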

-        

-        <para>This annotator is illustrated in <xref

-            linkend="ugr.tug.aae.fig.external_resource_binding"/>. To see it in

-          action, just run it using the Document Analyzer. When it finishes, open up the

-          UIMA_Seminars document in the processed results window, (double-click it), and

-          then left-click on one of the highlighted terms, to see the expandedForm

-          feature&apos;s value.

-          <figure id="ugr.tug.aae.fig.external_resource_binding">

-            <title>External Resource Binding</title>

-            <mediaobject>

-              <imageobject>

-                <imagedata width="3.7in" format="PNG"

-                  fileref="&imgroot;image040.png"/>

-              </imageobject>

-              <textobject><phrase>External Resource Binding</phrase></textobject>

-            </mediaobject>

-          </figure> </para>

-        

-        <para>By designing our annotator in this way, we have gained some flexibility. We can

-          freely replace the StringMapResource_impl class with any other implementation

-          that implements the simple StringMapResource interface. (For example, for very

-          large resources we might not be able to have the entire map in memory.) We have also

-          made our external resource dependencies explicit in the descriptor, which will

-          help others to deploy our annotator.</para>

-      </section>

-      <section id="ugr.tug.aae.resources.sharing_among_annotators">

-        <title>Sharing Resources among Annotators</title>

-        

-        <para>Another advantage of the Resource Manager is that it allows our data to be

-          shared between annotators. To demonstrate this we have developed another

-          annotator that will use the same acronym table. The UimaMeetingAnnotator will

-          iterate over Meeting annotations discovered by the Meeting Detector we

-          previously developed and attempt to determine whether the topic of the meeting is

-          related to UIMA. It will do this by looking for occurrences of UIMA acronyms in close

-          proximity to the meeting annotation. We could implement this by using the

-          UimaAcronymAnnotator, of course, but for the sake of this example we will have the

-          UimaMeetingAnnotator access the acronym map directly.</para>

-        

-        <para>The Java code for the UimaMeetingAnnotator in example 6 creates a new type,

-          UimaMeeting, if it finds a meeting within 50 characters of the UIMA

-          acronym.</para>

-        

-        <para>We combine three analysis engines, the UimaAcronymAnnotator to annotate

-          UIMA acronyms, the MeetingDetector from example 4 to find meetings and finally

-          the UimaMeetingAnnotator to annotate just meetings about UIMA. Together these

-          are assembled to form the new aggregate analysis engine, UimaMeetingDetector.

-          This aggregate and the sharing of a common resource are illustrated in <xref

-            linkend="ugr.tug.aae.fig.sharing_common_resource"/>.

-          <figure id="ugr.tug.aae.fig.sharing_common_resource">

-            <title>Component engines of an aggregate share a common resource</title>

-            <mediaobject>

-              <imageobject>

-                <imagedata width="5.7in" format="PNG"

-                  fileref="&imgroot;image042.png"/>

-              </imageobject>

-              <textobject><phrase>Picture of Component engines of an aggregate sharing a

-                common resource</phrase></textobject>

-            </mediaobject>

-          </figure> The important thing to notice is in the

-          <literal>UimaMeetingDetectorAE.xml</literal> aggregate descriptor. It

-          includes both the UimaMeetingAnnotator and the UimaAcronymAnnotator, and

-          contains a single declaration of the UimaAcronymTableFile resource. (The actual

-          example has the order of the first two annotators reversed versus the above

-          picture, which is OK since they do not depend on one another).</para>

-        

-        <para>It also binds the resources as follows:

-          

-          

-          <screenshot>

-     <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image044.jpg"/>

-      </imageobject>

-      <textobject><phrase>UimaMeetingDetectorAE.xml binding a common resource</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-          

-          

-          <programlisting><![CDATA[<externalResourceBindings>

-  <externalResourceBinding>

-    <key>UimaAcronymAnnotator/AcronymTable</key> 

-    <resourceName>UimaAcronymTableFile</resourceName> 

-  </externalResourceBinding>

-

-  <externalResourceBinding>

-    <key>UimaMeetingAnnotator/UimaTermTable</key> 

-    <resourceName>UimaAcronymTableFile</resourceName> 

-  </externalResourceBinding>

-</externalResourceBindings>

-]]></programlisting>

-          </para>

-        

-        <para>This binds the resource dependencies of both the UimaAcronymAnnotator

-          (which uses the name AcronymTable) and UimaMeetingAnnotator (which uses

-          UimaTermTable) to the single declared resource named UimaAcronymTableFile.

-          Therefore they will share the same instance. Resource bindings in the aggregate

-          descriptor <emphasis role="bold-italic">override</emphasis> any resource

-          declarations in individual annotator descriptors.</para>

-        

-        <para>If we wanted to have the annotators use different acronym tables, we could

-          easily do that. We would simply have to change the resourceName elements in the

-          bindings so that they referred to two different resources. The Resource Manager

-          gives us the flexibility to make this decision at deployment time, without

-          changing any Java code.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.aae.resources.threading">

-        <title>Threading and Shared Resources</title>

-        <para>Sharing can also occur when multiple instances of an annotator are 

-        created by the framework in response to run-time deployment specifications.

-        If an implementation class is specified in the external resource, 

-        only one instance of that implementation class  

-          is created for a given binding, and is shared among all

-        annotators.  Because of this, the implementation of that shared instance must be written to be

-        thread-safe - that is, to operate correctly when called at arbitrary times

-        by multiple threads.  Writing thread-safe code in Java is addressed in several

-        books, such as Brian Goetz's <emphasis>Java Concurrency in Practice</emphasis>.</para>
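
-        <para>A minimal sketch of one thread-safe design follows: the lookup table is made
-          immutable at construction time, and the only mutable state is a counter that is updated
-          atomically. <code>SharedLookupSketch</code> and its <code>hammer</code> method are
-          hypothetical names used only for this illustration; they are not part of UIMA.</para>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shared resource (not a UIMA class): one instance, many calling threads.
class SharedLookupSketch {
  private final Map<String, String> table;             // never modified after construction
  private final AtomicLong lookups = new AtomicLong(); // mutable state, updated atomically

  SharedLookupSketch(Map<String, String> entries) {
    // Defensive copy, so later changes to the caller's map cannot leak in.
    this.table = Collections.unmodifiableMap(new HashMap<>(entries));
  }

  String get(String key) {
    lookups.incrementAndGet();
    return table.get(key);
  }

  long lookupCount() {
    return lookups.get();
  }

  // Simulates several annotator instances calling the one shared resource at once.
  static long hammer(SharedLookupSketch shared, int threads, int callsPerThread) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < callsPerThread; i++) {
          shared.get("UIMA");
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return shared.lookupCount();
  }

  public static void main(String[] args) {
    Map<String, String> entries = new HashMap<>();
    entries.put("UIMA", "Unstructured Information Management Architecture");
    System.out.println(hammer(new SharedLookupSketch(entries), 4, 1000));
  }
}
```

-        <para>Had the counter been a plain <code>long</code> field incremented without
-          synchronization, concurrent calls could lose updates; keeping shared state either
-          immutable or atomic avoids that class of error.</para>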

-        

-        <para>

-          If no implementation class is specified, then the getResource method returns a

-          DataResource object, from which each annotator instance can obtain its

-          own (non-shared) input stream; so threading is not an issue in this case.

-        </para>

-        

-      </section>

-    </section>

-    <section id="ugr.tug.aae.result_specification_setting">

-      <title>Result Specifications</title>

-      

-      <para>Annotators often are written to do a lot of computation and produce a lot of different outputs.

-      For example, a tokenizer can, in addition to identifying tokens, look them up in dictionaries, create 

-      lemma forms (dropping suffixes and prefixes), etc.  Result Specifications provide a way to dynamically

-      specify what results are desired for a particular CAS being processed.</para>

-      

-      <para>It is up to the annotator writer to take advantage of the result specification; using it is optional.

-      If it is used, the annotator writer checks if a particular output is wanted, by asking the result specification

-      if it contains a specific Type and/or Feature.  If it does, then the annotator produces that type/feature; if not,

-      it skips the computations for producing that type/feature.</para>

-      

-      <para>The Result Specification querying may 

-      include the language.  A typical use case:  The CAS contains a document written in some language, and some

-      upstream Annotator has discovered what this language is.  

-      The Annotator extracts the previously discovered language specification from the CAS and 

-      then includes it when querying the Result Specification.  The exact method of encoding 

-      language specifications in the CAS is left up to annotator developers; however,

-      the framework provides a commonly used type for this - the org.apache.uima.tcas.DocumentAnnotation

-      type.</para>

-      

-      <para>The Result Specification is passed to the annotator instance by calling its

-        setResultSpecification method (this call is typically done by the framework, based on Capability specifications). 

-        When called, the default implementation saves the

-        result specification in an instance variable of the Annotator instance, which can be

-        accessed by the annotator using the protected

-        <literal>getResultSpecification()</literal> method.</para>

-      

-      <para>A Result Specification is a list of output types and / or type:feature

-        names, categorized by language(s), which are expected to be output from (produced by) the

-        annotator. Annotators may use this to optimize their operations, when possible, for

-        those cases where only particular outputs are wanted. The interface to the Result

-        Specification object (see the Javadocs) allows querying both types and particular

-        features of types.</para>

-      

-      <para>The language specifications used by Result Specifications are the same as those

-      specifiable in Capability Specifications; examples include "en" for English, "en-uk" for

-      British English, etc.  There is also a language type, "x-unspecified", which is presumed

-      if no language specification(s) are given.</para>

-           

-      <para>If a query of the Result Specification doesn't include a language, it is treated as if the 

-      language "x-unspecified" was specified.  Language matching is hierarchically defaulted,

-      in one direction: if a query includes the language "en-uk", meaning that the document

-      being processed is in that language, it will match

-        Result Specifications whose languages are "en-uk", "en", or "x-unspecified".  In other words, if the 

-        Result Specifications say to produce output if the actual document's language

-        is en-uk, or en, or x-unspecified, then having the actual document's language be

-        en-uk would "match" any of these Result Specifications. However the reverse is not true:

-        If the query asks about producing output if the actual document's language is "x-unspecified", 

-        then it would not match if the Result Specification said to produce output only if the 

-        actual document is en-uk or en;  the Result Specification would need to say to 

-        produce output for "x-unspecified").

-        </para>

-      

-      <para>If the Result Specification indicates it wants output

-      produced for "en-uk", but the annotator is given a language which is unknown, 

-        or one that is known, but isn't "en-uk", then the query (using the language 

-        of the document) will return false.   This is true even if the language is "en".  

-        However, if the Result Specification indicates it wants output for "en", 

-      and the query is for a document whose language is "en-uk" then the query will return true.

-    </para>      
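
-      <para>The one-directional matching rule described above can be sketched as a small helper
-        that walks from the query language toward its base language. This is only an
-        illustration of the rule as stated here, not the framework's implementation;
-        <code>LanguageMatchSketch</code> and <code>matches</code> are hypothetical names, and
-        treating a missing language as "x-unspecified" on either side is an assumption of the
-        sketch.</para>

```java
import java.util.Locale;

// Hypothetical helper illustrating one-directional language defaulting;
// it is not the framework's ResultSpecification implementation.
class LanguageMatchSketch {
  static final String UNSPECIFIED = "x-unspecified";

  // specLanguage: language attached to an entry in the Result Specification.
  // queryLanguage: language of the document being processed.
  static boolean matches(String specLanguage, String queryLanguage) {
    String spec = normalize(specLanguage);
    String query = normalize(queryLanguage);
    if (spec.equals(UNSPECIFIED)) {
      return true; // an unspecified entry applies to every document language
    }
    if (query.equals(UNSPECIFIED)) {
      return false; // an unspecified query never matches a language-specific entry
    }
    // Walk from the most specific form ("en-uk") toward the base language ("en").
    while (true) {
      if (spec.equals(query)) {
        return true;
      }
      int dash = query.lastIndexOf('-');
      if (dash < 0) {
        return false;
      }
      query = query.substring(0, dash);
    }
  }

  private static String normalize(String language) {
    if (language == null || language.isEmpty()) {
      return UNSPECIFIED;
    }
    return language.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(matches("en", "en-uk"));
    System.out.println(matches("en-uk", "en"));
  }
}
```

-      <para>With this helper, a document in "en-uk" matches entries for "en-uk", "en", and
-        "x-unspecified", whereas a document in "en" does not match an entry for "en-uk".</para>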

-      

-      <para>Sometimes you can specify the Result Specification; at other times, you cannot

-        (for instance, inside a Collection Processing Engine, you cannot). When you cannot

-        specify it, or choose not to specify it (for example, using the form of the

-        process(...) call on an Analysis Engine that doesn&apos;t include the Result

-        Specification), a <quote>Default</quote> Result Specification is used.</para>

-          

-      <section id="ugr.tug.aae.result_spec.default">

-        <title>Default ResultSpecification</title>

-        

-        <para>The default Result Specification is taken from the Engine&apos;s output

-          Capability Specification. Remember that a Capability Specification has both

-          inputs and outputs, can specify types and / or features, and there can be more than one

-          Capability Set. If there is more than one set, the logical union by language of these sets is used.

-          Each set can have a different "language(s)" specified; the default Result Specification 

-          will have the outputs by language(s), so that the annotator can query which outputs 

-          should be provided for particular languages.  The methods to query the Result Specification

-          take a type and (optionally) a feature, and optionally, a language.  If the queried type is

-          a subtype of some otherwise matching type in the Result Specification, it will match the query.  

-          See the Javadocs for more details on this.

-          </para>

-        

-      </section>

-      

-      <section id="ugr.tug.aae.result_spec.passing_to_annotators">

-        <title>Passing Result Specifications to Annotators</title>

-        

-        <para>If you are not using a Collection Processing Engine, you can specify a Result

-          Specification for your AnalysisEngine(s) by calling the

-          <literal>AnalysisEngine.setResultSpecification(ResultSpecification)</literal>

-          method.</para>

-        <para>It is also possible to pass a Result Specification on each call to

-          <literal>AnalysisEngine.process(CAS, ResultSpecification)</literal>. However,

-          this is not recommended if your Result Specification will stay constant across

-          multiple calls to

-          <literal>process</literal>. In that case it will be more efficient to call

-          <literal>AnalysisEngine.setResultSpecification(ResultSpecification)</literal>

-          only when the Result Specification changes.</para>

-        <para> For primitive Analysis Engines, whatever Result Specification you pass in is

-          passed along to the annotator's

-          <literal>setResultSpecification(ResultSpecification)</literal> method. For

-          aggregate Analysis Engines, see below.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.result_spec.aggregates">

-        <title>Aggregates</title>

-        

-        <para>For aggregate engines, the Result Specification passed to the

-          <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>

-          method is intended to specify the set of output types/features that the aggregate

-          should produce. This is not necessarily equivalent to the set of output

-          types/features that each annotator should produce. For example, an annotator may

-          need to produce an intermediate type that is then consumed by a downstream annotator,

-          even though that intermediate type is not part of the Result Specification.</para>

-        <para>To handle this situation, when

-          <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>

-          is called on an aggregate, the framework computes the union of the passed Result

-          Specification with the set of

-          <emphasis>all</emphasis> input types and features of

-          <emphasis>all</emphasis> component AnalysisEngines within that aggregate. This forms the

-          complete set of types and features that any component of the aggregate might need to

-          produce. This derived Result Specification is then intersected with the 

-          delegate's output capabilities, and the result is passed to the

-          <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>

-          of each component AnalysisEngine. In the case of nested aggregates, this procedure

-          is applied recursively.</para>
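
-        <para>The union-then-intersection computation described above can be sketched with plain
-          sets. This is a simplified model that represents types as strings and ignores features
-          and languages; <code>AggregateResultSpecSketch</code>, <code>forDelegate</code>, and the
-          type names in <code>main</code> are hypothetical and chosen only for
-          illustration.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not framework code: types are plain strings,
// and features and languages are ignored.
class AggregateResultSpecSketch {
  static Set<String> setOf(String... names) {
    return new TreeSet<>(Arrays.asList(names));
  }

  // Derives the Result Specification handed to one delegate of an aggregate.
  static Set<String> forDelegate(Set<String> requested,
      List<Set<String>> inputsOfAllComponents, Set<String> delegateOutputs) {
    Set<String> derived = new TreeSet<>(requested);
    for (Set<String> inputs : inputsOfAllComponents) {
      derived.addAll(inputs); // union with every component's input types
    }
    derived.retainAll(delegateOutputs); // keep only what this delegate can produce
    return derived;
  }

  public static void main(String[] args) {
    List<Set<String>> inputs = new ArrayList<>();
    inputs.add(setOf());                        // a component with no inputs
    inputs.add(setOf("Meeting", "UimaAcronym")); // a component consuming two types
    Set<String> requested = setOf("UimaMeeting");
    System.out.println(forDelegate(requested, inputs, setOf("Meeting", "Date")));
  }
}
```

-        <para>In this model a delegate is still asked for an intermediate type such as
-          <code>Meeting</code>, even though the caller requested only <code>UimaMeeting</code>,
-          because a downstream component lists it as an input.</para>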

-      </section>  

-      <section id="ugr.tug.aae.result_spec.aggregates.cpes">

-        <title>Collection Processing Engines</title>

-          

-        <para>The Default Result Specification is always used for all components of a

-          Collection Processing Engine.</para>        

-      </section>

-    </section>

-    

-    <section id="ugr.tug.aae.classpath_when_using_jcas">

-      <title>Class path setup when using JCas</title>

-      

-      <para>JCas provides Java classes that correspond to each CAS type in an application.

-        These classes are generated by the JCasGen utility (which can be automatically

-        invoked from the Component Descriptor Editor).</para>

-      

-      <para>The Java source classes generated by the JCasGen utility are typically compiled

-        and packaged into a JAR file. This JAR file must be present in the classpath of the UIMA

-        application.</para>

-      

-      <para>For more details on issues around setting up this class path, including

-        deployment issues where class loaders are being used to isolate multiple UIMA

-        applications inside a single running Java Virtual Machine, please see 

-        <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas.class_loaders"/>

-        .</para>

-      

-    </section>

-    <section id="ugr.tug.aae.using_shell_scripts">

-      <title>Using the Shell Scripts</title>

-      

-      <para>The SDK includes a <literal>/bin</literal> subdirectory containing shell

-        scripts, for Windows (.bat files) and Unix (.sh files). Many of these scripts invoke

-        sample Java programs which require a class path; they call a common shell script,

-        <literal>setUimaClassPath</literal> to set up the UIMA required files and

-        directories on the class path.</para>

-      

-      <para>If you need to include files on the class path, the scripts will add anything you

-        specify in the environment variables CLASSPATH or UIMA_CLASSPATH to the classpath. So, for

-        example, if you are running the document analyzer, and wanted it to find a Java class

-        file named (on Windows) c:\a\b\c\myProject\myJarFile.jar, you could first issue a

-        <literal>set</literal> command to set the UIMA_CLASSPATH to this file, followed by

-        the documentAnalyzer script:

-        

-        

-        <programlisting>set UIMA_CLASSPATH=c:\a\b\c\myProject\myJarFile.jar

-documentAnalyzer</programlisting>

-      </para>

-      

-      <para>Other environment variables are used by the shell scripts, as follows:

-        

-        <table frame="all" id="ugr.aae.tbl.env_vars_used_by_shell_scripts">

-          <title>Environment variables used by the shell scripts</title>

-          <tgroup cols="2" rowsep="1" colsep="1">

-            <colspec colname="c1"/>

-            <colspec colname="c2"/>

-            <thead>

-              <row>

-                <entry align="center">Environment Variable</entry>

-                <entry align="center">Description</entry>

-              </row>

-            </thead>

-            <tbody>

-              <row>

-                <entry>UIMA_HOME</entry>

-                <entry>Path where the UIMA SDK was installed.</entry>

-              </row>

-              <row>

-                <entry>JAVA_HOME</entry>

-                <entry>(Optional) Path to a Java Runtime Environment. If not set, the Java

-                  JRE that is in your system PATH is used.</entry>

-              </row>

-              <row>

-                <entry>UIMA_CLASSPATH</entry>

-                <entry>(Optional) if specified, a path specification to use as the default

-                  ClassPath.  You can also set the CLASSPATH variable.  If you set both, they

-                  will be concatenated.</entry>

-              </row>

-              <row>

-                <entry>UIMA_DATAPATH</entry>

-                <entry>(Optional) if specified, a path specification to use as the default

-                  DataPath (see <olink targetdoc="&uima_docs_ref;"/>

-                  <olink targetdoc="&uima_docs_ref;"

-                    targetptr="ugr.ref.xml.component_descriptor.datapath"/>)</entry>

-              </row>

-              <row>

-                <entry>UIMA_LOGGER_CONFIG_FILE</entry>

-                <entry>(Optional) if specified, a path to a Java Logger properties file

-                  (see <xref linkend="ugr.tug.aae.configuration_logging"/>)</entry>

-              </row>

-              <row>

-                <entry>UIMA_JVM_OPTS</entry>

-                <entry>(Optional) if specified, the JVM arguments to be used when the Java

-                  process is started.  This can be used for example to set the maximum Java

-                  heap size or to define system properties.</entry>

-              </row>

-              <row>

-                <entry>VNS_PORT</entry>

-                <entry>(Optional) if specified, the network IP port number of the Vinci

-                  Name Server (VNS) (see <olink

-                    targetdoc="&uima_docs_tutorial_guides;"

-                    targetptr="ugr.tug.application.vns"/>)</entry>

-              </row>

-              <row>

-                <entry>ECLIPSE_HOME</entry>

-                <entry>(Optional) Needs to be set to the root of your Eclipse installation

-                  when using shell scripts that invoke Eclipse (e.g.

-                  jcasgen_merge)</entry>

-              </row>

-            </tbody>

-          </tgroup>

-          

-        </table> </para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aae.common_pitfalls">

-    <title>Common Pitfalls</title>

-    

-    <para>Here are some things to avoid doing in your annotator code:</para>

-    

-    <para><emphasis role="bold">Do not retain references to JCas objects between calls to

-      process() for different CASes</emphasis></para>

-    

-    <para>The JCas will be cleared between calls to your annotator&apos;s process() method

-      for each new CAS.

-      All of the analysis results related to the previous document will be deleted to make way

-      for analysis of a new document. Therefore, you should never save a reference to a JCas

-      Feature Structure object (i.e. an instance of a class created using JCasGen) and

-      attempt to reuse it in a future invocation of the process() method. If you do so, the

-      results will be undefined.</para>

-    

-    <para><emphasis role="bold">Careless use of static data</emphasis></para>

-    

-    <para>Always keep in mind that an application that uses your annotator may create

-      multiple instances of your annotator class. A multithreaded application may attempt

-      to use two instances of your annotator to process two different documents

-      simultaneously. This will generally not cause any problems as long as your annotator

-      instances do not share static data.</para>

-    

-    <para>In general, you should not use static variables other than static final constants

-      of primitive or immutable types (int, float, String, etc). Other types of static variables may

-      allow one annotator instance to set a value that affects another annotator instance,

-      which can lead to unexpected effects. Also, static references to classes that

-      aren&apos;t thread-safe are likely to cause errors in multithreaded

-      applications.</para>
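
-    <para>The difference between static and per-instance state can be seen in a small sketch.
-      <code>StaticStateSketch</code> is a hypothetical annotator-like class, not a UIMA
-      annotator, and its field names are invented for the example.</para>

```java
// Hypothetical annotator-like class contrasting static and instance state.
class StaticStateSketch {
  static final int MAX_DISTANCE = 50; // safe: an immutable constant
  static int sharedCount = 0;         // risky: one variable shared by every instance
  private int ownCount = 0;           // safe: each instance has its own copy

  void process(String document) {
    sharedCount++; // also visible to, and changed by, all other instances
    ownCount++;    // affects only this instance
  }

  int ownCount() {
    return ownCount;
  }

  public static void main(String[] args) {
    StaticStateSketch first = new StaticStateSketch();
    StaticStateSketch second = new StaticStateSketch();
    first.process("doc A");
    first.process("doc B");
    second.process("doc C");
    System.out.println("first=" + first.ownCount()
        + " second=" + second.ownCount()
        + " shared=" + sharedCount);
  }
}
```

-    <para>Even in this single-threaded run, the second instance sees a static count that includes
-      the first instance's documents. With several threads the unsynchronized static increment
-      could additionally lose updates.</para>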

-    

-  </section>

-  <section id="ugr.tug.aae.viewing_UIMA_objects_in_eclipse_debugger">

-    <title>Viewing UIMA objects in the Eclipse debugger</title>

-    <titleabbrev>UIMA Objects in Eclipse Debugger</titleabbrev>

-    

-    <para>Eclipse has a feature for viewing Java Logical

-      Structures. When enabled, it will permit you to see a view of UIMA objects (such as

-      feature structure instances, CAS or JCas instances, etc.) which displays the logical

-      subparts. For example, here is a view of a feature structure for the RoomNumber

-      annotation, from the tutorial example 1:

-      

-      

-      <screenshot>

-     <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image046.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Eclipse debugger showing non-logical-structure display of 

-      a feature structure</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>The <quote>annotation</quote> object in Java shows the internals of the JCas object, which is not very

-      convenient for seeing the features or the part of the input that is being annotated. But

-      if you turn on the Java Logical Structure mode by pushing this button:

-      

-      

-      <screenshot>

-     <mediaobject>

-      <imageobject>

-        <imagedata width="5.6in" format="JPG" fileref="&imgroot;image048.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Eclipse debugger showing button to push to 

-        enable viewing logical structures</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-      the features of the FeatureStructure instance will be shown:

-      

-      

-      <screenshot>

-     <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image050.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Eclipse debugger showing logical structure display of 

-      an annotation</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-  </section>

-  

-  <section id="ugr.tug.aae.xml_intro_ae_descriptor">

-    <title>Introduction to Analysis Engine Descriptor XML Syntax</title>

-    <titleabbrev>Analysis Engine XML Descriptor</titleabbrev>

-    

-    <para>This section is an introduction to the syntax used for Analysis Engine

-      Descriptors. Most users do not need to understand these details; they can use the

-      Component Descriptor Editor Eclipse plugin to edit Analysis Engine Descriptors

-      rather than editing the XML directly.</para>

-    

-    <para>This section walks through the actual XML descriptor for the RoomNumberAnnotator

-      example introduced in section <xref linkend="ugr.tug.aae.getting_started"/>. The

-      discussion is divided into several logical sections of the descriptor.</para>

-    

-    <para>The full specification for Analysis Engine Descriptors is defined in 

-    <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor"/>

-      .</para>

-    

-    <section id="ugr.tug.aae.header_annotator_class_identification">

-      <title>Header and Annotator Class Identification</title>

-      

-      

-      <programlisting><?db-font-size 80% ?><![CDATA[<?xml version="1.0" encoding="UTF-8" ?> 

-<!--  Descriptor for the example RoomNumberAnnotator. --> 

-<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 

-  <primitive>true</primitive> 

-  <annotatorImplementationName>

-    org.apache.uima.tutorial.ex1.RoomNumberAnnotator

-  </annotatorImplementationName>

-]]></programlisting>

-      

-      <para>The document begins with a standard XML header and a comment. The root element of

-        the document is named <literal>&lt;analysisEngineDescription&gt;,</literal>

-        and must specify the XML namespace

-        <literal>http://uima.apache.org/resourceSpecifier</literal>.</para>

-      

-      <para>The first subelement,

-        <literal>&lt;frameworkImplementation&gt;</literal>, must contain the value

-        <literal>org.apache.uima.java</literal>. The second subelement,

-        <literal>&lt;primitive&gt;</literal>, contains the Boolean value true,

-        indicating that this XML document describes a <emphasis>Primitive</emphasis>

-        Analysis Engine. A Primitive Analysis Engine consists of a single annotator. It

-        is also possible to construct XML descriptors for non-primitive or

-        <emphasis>Aggregate</emphasis> Analysis Engines; this is covered later.</para>

-      

-      <para>The next element,

-        <literal>&lt;annotatorImplementationName&gt;</literal>, contains the

-        fully-qualified class name of our annotator class. This is how the UIMA framework

-        determines which annotator class to instantiate.</para>

-    </section>

-    

-    <section id="ugr.tug.aae.xml_intro_simple_metadata_attributes">

-      <title>Simple Metadata Attributes</title>

-      

-      

-      <programlisting><![CDATA[<analysisEngineMetaData>

-  <name>Room Number Annotator</name> 

-  <description>An example annotator that searches for room numbers in

-     the IBM Watson research buildings.</description> 

-  <version>1.0</version> 

-  <vendor>The Apache Software Foundation</vendor>

-]]></programlisting>

-      

-      <para>Shown here are four simple metadata fields &ndash; name, description, version,

-        and vendor. Providing values for these fields is optional, but recommended.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.xml_intro_type_system_definition">

-      <title>Type System Definition</title>

-      

-      

-      <programlisting><![CDATA[<typeSystemDescription>

-  <imports>

-    <import location="TutorialTypeSystem.xml"/>

-  </imports>

-</typeSystemDescription>

-]]></programlisting>

-      

-      <para>This section of the XML descriptor defines which types the annotator works with.

-        The recommended way to do this is to <emphasis>import</emphasis> the type system

-        definition from a separate file, as shown here. The location specified here should be

-        a relative path, and it will be resolved relative to the location of this

-        descriptor. It is also possible to define types directly in the Analysis Engine

-        descriptor, but these types will not be easily shareable by others.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.xml_intro_capabilities">

-      <title>Capabilities</title>

-      

-      

-      <programlisting><![CDATA[<capabilities>

-  <capability>

-    <inputs /> 

-    <outputs>

-      <type>org.apache.uima.tutorial.RoomNumber</type> 

-      <feature>org.apache.uima.tutorial.RoomNumber:building</feature> 

-    </outputs>

-  </capability>

-</capabilities>

-]]></programlisting>

-      

-      <para>The last section of the descriptor describes the

-        <emphasis>Capabilities</emphasis> of the annotator &ndash; the Types/Features

-        it consumes (input) and the Types/Features that it produces (output). These must be

-        the names of types and features that exist in the Analysis Engine descriptor&apos;s

-        type system definition.</para>

-      

-      <para>Our annotator outputs only one Type, RoomNumber, and one feature,

-        RoomNumber:building. The fully-qualified names (including namespace) are

-        needed.</para>

-      

-      <para>The building feature is listed separately here, but clearly specifying every

-        feature for a complex type would be cumbersome. Therefore, a shortcut syntax exists.

-        The &lt;outputs&gt; section above could be replaced with the equivalent section:

-        

-        

-        <programlisting><![CDATA[<outputs>

-  <type allAnnotatorFeatures ="true">

-     org.apache.uima.tutorial.RoomNumber

-  </type> 

-</outputs>]]></programlisting></para>

-      

-    </section>

-    

-    <section id="ugr.tug.aae.xml_intro.configuration_parameters">

-      <title>Configuration Parameters (Optional)</title>

-      

-      <section id="ugr.tug.aae.xml_intro.configuration_parameters_declarations">

-        <title>Configuration Parameter Declarations</title>

-        

-        

-        <programlisting><![CDATA[<configurationParameters>

-  <configurationParameter>

-    <name>Patterns</name> 

-    <description>List of room number regular expression patterns.

-    </description> 

-    <type>String</type> 

-    <multiValued>true</multiValued> 

-    <mandatory>true</mandatory> 

-  </configurationParameter>

-  <configurationParameter>

-    <name>Locations</name> 

-    <description>List of locations corresponding to the room number

-       expressions specified by the Patterns parameter.

-    </description> 

-    <type>String</type> 

-    <multiValued>true</multiValued> 

-    <mandatory>true</mandatory> 

-  </configurationParameter>

-</configurationParameters>]]></programlisting>

-        

-        <para>The <literal>&lt;configurationParameters&gt;</literal> element

-          contains the definitions of the configuration parameters that our annotator

-          accepts. We have declared two parameters. For each configuration parameter, the

-          following are specified:

-          

-          <itemizedlist><listitem><para><emphasis role="bold">name</emphasis>

-            &ndash; the name that the annotator code uses to refer to the parameter</para>

-            </listitem>

-            

-            <listitem><para><emphasis role="bold">description</emphasis>

-              &ndash; a natural language description of the intent of the parameter</para>

-            </listitem>

-            

-            <listitem><para><emphasis role="bold">type</emphasis> &ndash; the data

-              type of the parameter&apos;s value &ndash; must be one of String, Integer,

-              Float, or Boolean.</para></listitem>

-            

-            <listitem><para><emphasis role="bold">multiValued</emphasis>

-              &ndash; true if the parameter can take multiple values (an array), false if

-              the parameter takes only a single value. </para></listitem>

-            

-            <listitem><para><emphasis role="bold">mandatory</emphasis> &ndash; true

-              if a value must be provided for the parameter </para></listitem>

-          </itemizedlist></para>

-        

-        <para>Both of our parameters are mandatory and accept an array of Strings as their

-          value.</para>

-      </section>

-      

-      <section id="ugr.tug.aae.xml_intro_configuration_parameter_settings">

-        <title>Configuration Parameter Settings</title>

-        

-        

-        <programlisting><![CDATA[<configurationParameterSettings>

-  <nameValuePair>

-    <name>Patterns</name> 

-    <value>

-      <array>

-        <string>\b[0-4]\d-[0-2]\d\d\b</string> 

-        <string>\b[G1-4][NS]-[A-Z]\d\d\b</string> 

-        <string>\bJ[12]-[A-Z]\d\d\b</string> 

-      </array>

-    </value>

-  </nameValuePair>

-  <nameValuePair>

-    <name>Locations</name> 

-    <value>

-      <array>

-        <string>Watson - Yorktown</string> 

-        <string>Watson - Hawthorne I</string> 

-        <string>Watson - Hawthorne II</string> 

-      </array>

-    </value>

-  </nameValuePair>

-</configurationParameterSettings>]]></programlisting>
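The two settings above are parallel arrays: the pattern at index i is paired with the location at index i. The following is a minimal sketch of that pairing using only `java.util.regex`, assuming the patterns are regular expressions written with their backslash escapes (`\b`, `\d`). The class `RoomLocationSketch` and its method `findRooms` are hypothetical names introduced for illustration; this is not the tutorial's RoomNumberAnnotator, and it does not read the values from a UIMA context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of UIMA or of the tutorial's annotator. */
final class RoomLocationSketch {

  // Assumed values of the "Patterns" parameter, with regex escapes in place.
  static final String[] PATTERNS = {
      "\\b[0-4]\\d-[0-2]\\d\\d\\b",
      "\\b[G1-4][NS]-[A-Z]\\d\\d\\b",
      "\\bJ[12]-[A-Z]\\d\\d\\b"
  };

  // Values of the "Locations" parameter; index i belongs to PATTERNS[i].
  static final String[] LOCATIONS = {
      "Watson - Yorktown", "Watson - Hawthorne I", "Watson - Hawthorne II"
  };

  /** Returns one "room @ location" entry per match, grouped by pattern index. */
  static List<String> findRooms(String text) {
    List<String> found = new ArrayList<>();
    for (int i = 0; i < PATTERNS.length; i++) {
      Matcher m = Pattern.compile(PATTERNS[i]).matcher(text);
      while (m.find()) {
        found.add(m.group() + " @ " + LOCATIONS[i]);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(findRooms("Meet in 03-123, then move to J2-A11."));
  }
}
```

Because the arrays are matched by position only, adding a pattern without adding a location at the same index would break the pairing; a real annotator would probably want to check that both arrays have the same length.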

-        

-      </section>

-      

-      <section id="ugr.tug.aae.xml_intro.aggregate">

-        <title>Aggregate Analysis Engine Descriptor</title>

-        

-        

-        <programlisting><?db-font-size 80% ?><![CDATA[<?xml version="1.0" encoding="UTF-8" ?> 

-<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 

-  <primitive>false</primitive> 

-

-  <delegateAnalysisEngineSpecifiers>

-    <delegateAnalysisEngine key="RoomNumber">

-      <import location="../ex2/RoomNumberAnnotator.xml"/> 

-    </delegateAnalysisEngine>

-    <delegateAnalysisEngine key="DateTime">

-      <import location="TutorialDateTime.xml" /> 

-    </delegateAnalysisEngine>

-  </delegateAnalysisEngineSpecifiers>]]></programlisting>

-        

-        <para>The first difference between this descriptor and an individual

-          annotator&apos;s descriptor is that the

-          <literal>&lt;primitive&gt;</literal> element contains the value

-          <literal>false</literal>. This indicates that this Analysis Engine (AE) is an

-          aggregate AE rather than a primitive AE.</para>

-        

-        <para>Then, instead of a single annotator class name, we have a list of

-          <literal>delegateAnalysisEngineSpecifiers</literal>. Each specifies one of

-          the components that constitute our Aggregate. We refer to each component by the

-          relative path from this XML descriptor to the component AE&apos;s XML

-          descriptor.</para>

-        

-        <para>This list of component AEs does not imply an ordering of them in the execution

-          pipeline. Ordering is done by another section of the descriptor:

-          

-          

-          <programlisting><![CDATA[<analysisEngineMetaData>

-  <name>Aggregate AE - Room Number and DateTime Annotators</name> 

-  <description>Detects Room Numbers, Dates, and Times</description> 

-  <flowConstraints>

-    <fixedFlow>

-      <node>RoomNumber</node> 

-      <node>DateTime</node> 

-    </fixedFlow>

-  </flowConstraints>]]></programlisting></para>

-        

-        <para>Here, a fixedFlow is adequate, and we specify the exact ordering in which the

-          AEs will be executed. In this case, it doesn&apos;t really matter, since the

-          RoomNumber and DateTime annotators do not have any dependencies on one

-          another.</para>

-        

-        <para>Finally, the descriptor has a capabilities section, which has exactly the

-          same syntax as a primitive AE&apos;s capabilities section:

-          

-          

-          <programlisting><![CDATA[<capabilities>

-  <capability>

-    <inputs /> 

-    <outputs>

-      <type allAnnotatorFeatures="true">

-        org.apache.uima.tutorial.RoomNumber

-      </type> 

-      <type allAnnotatorFeatures="true">

-        org.apache.uima.tutorial.DateAnnot

-      </type> 

-      <type allAnnotatorFeatures="true">

-        org.apache.uima.tutorial.TimeAnnot

-      </type> 

-    </outputs>

-    <languagesSupported>

-      <language>en</language> 

-    </languagesSupported>

-  </capability>

-</capabilities>]]></programlisting>

-          </para>

-        

-      </section>

-      

-    </section>

-  </section>

-</chapter>

diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.aas.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.aas.xml
deleted file mode 100644
index 65b05f4..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.aas.xml
+++ /dev/null
@@ -1,283 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.aas">

-  <title>Annotations, Artifacts, and Sofas</title>

-  <titleabbrev>Annotations, Artifacts &amp; Sofas</titleabbrev>

-  

-  <para>Up to this point, the documentation has focused on analyzing strings of Unicode text,

-    producing subtypes of Annotations which reference offsets in those strings. This

-    chapter generalizes this concept and shows how other kinds of artifacts can be handled,

-    including non-text things like audio and images, and how you can define your own kinds of

-    <quote>annotations</quote> for these.</para>

-  

-  <section id="ugr.tug.aas.terminology">

-    <title>Terminology</title>

-    

-    <section id="ugr.tug.aas.artifact">

-      <title>Artifact</title>

-      

-      <para>The Artifact is the unstructured thing being analyzed by an annotator. It could

-        be an HTML web page, an image, a video stream, a recorded audio conversation, an MPEG-4

-        stream, etc. Artifacts are often restructured in the course of processing to

-        facilitate particular kinds of analysis. For instance, an HTML page may be converted

-        into a <quote>de-tagged</quote> version. Annotators at different places in the

-        pipeline may be analyzing different versions of the artifact.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.aas.sofa">

-      <title>Subject of Analysis &mdash; Sofa</title>

-      

-      <para>Each representation of an Artifact is called a Subject of Analysis, abbreviated

-        using the acronym <quote>Sofa</quote> which stands for <emphasis

-          role="underline">S</emphasis>ubject <emphasis role="underline">

-        OF</emphasis> <emphasis role="underline">A</emphasis>nalysis. Annotation

-        metadata, which have explicit designations of sub-regions of the artifact to which

-        they apply, are always associated with a particular Sofa. For instance, an

-        annotation over text specifies two features, the begin and end, which represent the

-        character offsets into the text string Sofa being analyzed.</para>

-      

-      <para>Other examples of representations of Artifacts that could be Sofas include

-        an HTML web page, a detagged web page, the translated text of that document, an audio or

-        video stream, closed-caption text from a video stream, etc.</para>

-      

-      <para>Often, there is one Sofa being analyzed in a CAS. The next chapter will show how

-        UIMA facilitates working with multiple representations of an artifact at the same

-        time, in the same CAS.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aas.sofa_data_formats">

-    <title>Formats of Sofa Data</title>

-    

-    <para>Sofa data can be Java Unicode Strings, Feature Structure arrays of primitive

-      types, or a URI which references remote data available via a network

-      connection.</para>

-    

-    <para>The arrays of primitive types can be things like byte arrays or float arrays, and are

-      intended to be used for artifacts like audio data, image data, etc.</para>

-    

-    <para>The URI form holds a URI specification String.</para>

-    

-    <note><para>Sofa data can be "serialized" using an XML format; when it is, the String data 

-      being serialized must not include invalid XML characters.  See

-      <xref linkend="ugr.tug.xmi_emf.xml_character_issues"/>.

-      </para></note>

-    

-  </section>

-  

-  <section id="ugr.tug.aas.setting_accessing_sofa_data">

-    <title>Setting and Accessing Sofa Data</title>

-    

-    <section id="ugr.tug.aas.setting_sofa_data">

-      <title>Setting Sofa Data</title>

-      

-      <para>When a CAS is created, you can set its Sofa Data, just one time; this property

-        ensures that metadata describing regions of the Sofa remain valid. As a consequence,

-        the following methods that set data for a given Sofa can only be called once for a given

-        Sofa.</para>

-      

-      <para>The following methods on the CAS set the Sofa Data to one of the 3 formats. Assume

-        that the variable <quote>aCas</quote> holds a reference to a CAS:</para>

-      

-      

-      <programlisting><?db-font-size 80% ?>aCas.<emphasis role="bold">setSofaDataString</emphasis>(document_text_string, mime_type_string);

-aCas.<emphasis role="bold">setSofaDataArray</emphasis>(feature_structure_primitive_array, mime_type_string);

-aCas.<emphasis role="bold">setSofaDataURI</emphasis>(uri_string, mime_type_string);</programlisting>

-      

-      <para>In addition, the method

-        <literal>aCas.setDocumentText(document_text_string)</literal> may still be

-        used, and is equivalent to <literal>setSofaDataString(string,

-        "text")</literal>. The mime type is currently not used by the UIMA framework, but may

-        be set and retrieved by user code.</para>

-      

-      <para>Feature Structure primitive arrays are all the UIMA Array types except arrays of

-        Feature Structures, Strings, and Booleans. Typically, these are arrays of bytes,

-        but can be other types, such as floats, longs, etc.</para>

-      

-      <para>The URI string should conform to the standard URI format.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.aas.accessing_sofa_data">

-      <title>Accessing Sofa Data</title>

-      

-      <para>The analysis algorithms typically work with the Sofa data. The following

-        methods on the CAS access the Sofa Data:</para>

-      

-      

-      <programlisting>String           aCas.getDocumentText();

-String           aCas.getSofaDataString();

-FeatureStructure aCas.getSofaDataArray();

-String           aCas.getSofaDataURI();

-String           aCas.getSofaMimeType();</programlisting>

-      

-      <para>The <literal>getDocumentText</literal> and

-        <literal>getSofaDataString</literal> return the same text string. The

-        <literal>getSofaDataURI</literal> returns the URI itself, not the data the URI is

-        pointing to. You can use standard Java I/O capabilities to get the data associated

-        with the URI, or use the UIMA Framework Streaming method described next.</para>
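As a rough illustration of the "standard Java I/O" route, the sketch below reads the bytes behind a URI string using only JDK classes. The class `SofaUriReadSketch` and the method `readAll` are hypothetical names; in the `main` method a temporary file's URI stands in for the value that `getSofaDataURI()` would return, so no UIMA classes are needed to run it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; uses only the JDK, not the UIMA framework. */
final class SofaUriReadSketch {

  /** Opens the URI with the JDK's URL machinery and returns all of its bytes. */
  static byte[] readAll(String uriString) throws IOException {
    try (InputStream in = URI.create(uriString).toURL().openStream()) {
      return in.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    // A temporary file plays the role of the remote data.
    Path tmp = Files.createTempFile("sofa", ".txt");
    Files.write(tmp, "remote sofa data".getBytes(StandardCharsets.UTF_8));

    // Stand-in for the string an application would get from getSofaDataURI().
    String uriString = tmp.toUri().toString();
    System.out.println(new String(readAll(uriString), StandardCharsets.UTF_8));

    Files.delete(tmp);
  }
}
```

The sketch assumes the URI uses a scheme the JDK can open directly, such as `file:` or `http:`, and that the caller knows how to interpret the returned bytes.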

-      

-    </section>

-    

-    <section id="ugr.tug.aas.accessing_sofa_data_using_java_stream">

-      <title>Accessing Sofa Data using a Java Stream</title>

-      

-      <para>The framework provides a consistent method for accessing the Sofa data,

-        independent of it being stored locally, or accessed remotely using the URI. Get a Java

-        InputStream instance from the Sofa data using:</para>

-      

-      

-      <programlisting>InputStream inputStream = aCas.getSofaDataStream();</programlisting>

-      

-      <itemizedlist spacing="compact"><listitem><para>If the data is local, this method

-        returns a ByteArrayInputStream. This stream provides bytes.

-        

-        <itemizedlist><listitem><para>If the Sofa data was set using setDocumentText or

-          setSofaDataString, the String is converted to bytes by using the UTF-8

-          encoding.</para></listitem>

-          

-          <listitem><para>If the Sofa data was set as a DataArray, the bytes in the data array

-            are serialized, high-byte first. </para></listitem></itemizedlist>

-        </para></listitem>

-        

-        <listitem><para>If the Sofa data was specified as a URI, this method returns the

-          handle from url.openStream(). Java offers built-in support for several URI

-          schemes including <quote>FILE:</quote>, <quote>HTTP:</quote>,

-          <quote>FTP:</quote> and has an extensible mechanism,

-          <literal>URLStreamHandlerFactory</literal>, for customizing access to an

-          arbitrary URI. See more details at <ulink

-            url="http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLStreamHandlerFactory.html"/>

-          . </para></listitem></itemizedlist>
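To make the two local cases concrete, the sketch below reproduces the described byte layouts with plain JDK classes: a String encoded as UTF-8, and array elements written high byte first. It is an illustration of the layouts only, not the framework's implementation; the class `SofaStreamBytesSketch`, its methods, and the choice of an `int` array as the example element type are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that mimics the byte layouts described above. */
final class SofaStreamBytesSketch {

  /** String Sofa data: the characters encoded as UTF-8 bytes. */
  static byte[] utf8Bytes(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  /** Array Sofa data (ints as an example): each element written high byte first. */
  static byte[] bigEndianBytes(int[] values) {
    // A freshly allocated ByteBuffer is big-endian, i.e. high byte first.
    ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    // "caf\u00e9" has 4 characters but 5 UTF-8 bytes, since the last one needs two.
    ByteArrayInputStream in = new ByteArrayInputStream(utf8Bytes("caf\u00e9"));
    System.out.println(in.available());

    // 0x01020304 is written as the bytes 01 02 03 04.
    for (byte b : bigEndianBytes(new int[] {0x01020304})) {
      System.out.print(b + " ");
    }
    System.out.println();
  }
}
```

A consumer of the stream therefore has to know which kind of Sofa data it is reading: text must be decoded as UTF-8, while array data must be reassembled element by element in big-endian order.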

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aas.sofa_fs">

-    <title>The Sofa Feature Structure</title>

-    

-    <para>Information about a Sofa is contained in a special built-in Feature Structure of

-      type <literal>uima.cas.Sofa</literal>. This feature structure is created and

-      managed by the UIMA Framework; users must not create it directly. Although these Sofa

-      type instances are implemented as standard feature structures, <emphasis>generic

-      CAS APIs cannot be used to create Sofas or set their features</emphasis>. Instead,

-      Sofas are created implicitly by the creation of new CAS views. Similarly, Sofa features

-      are set by CAS methods such as <literal>cas.setDocumentText()</literal>.</para>

-    

-    <para>Features of the Sofa type include</para>

-    

-    <itemizedlist><listitem><para>SofaID: Every Sofa in a CAS has a unique SofaID. SofaIDs

-      are the primary handle for access. This ID is often the same as the name string given to the

-      Sofa by the developer, but it can be mapped to a different name (see <olink

-        targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.mvs.sofa_name_mapping"/>).</para></listitem>

-      

-      <listitem><para>Mime type: This string feature can be used to describe the type of the

-        data represented by a Sofa. It is not used by the framework; the framework provides

-        APIs to set and get its value.</para></listitem>

-      

-      <listitem><para>Sofa Data: The Sofa data itself. This data can be resident in the CAS or

-        it can be a reference to data outside the CAS. </para></listitem></itemizedlist>

-    

-  </section>

-  

-  <section id="ugr.tug.aas.annotations">

-    <title>Annotations</title>

-    

-    <para>Annotators add metadata about a Sofa to the CAS. It is often useful to have this

-      metadata denote a region of the Sofa to which it applies. For instance, assuming the Sofa

-      is a String, the metadata might describe a particular substring as the name of a person.

-      The built-in UIMA type, uima.tcas.Annotation, has two extra features that enable this

-      - the begin and end features - which denote a character position offset into the text

-      string being analyzed.</para>

-    

-    <para>The concept of <quote>annotations</quote> can be generalized for non-string

-      kinds of Sofas. For instance, an audio stream might have an audio annotation which

-      describes sound regions in terms of floating point time offsets in the Sofa. An image

-      annotation might use two pairs of x,y coordinates to define the region the annotation

-      applies to.</para>

-    

-    <section id="ugr.tug.aas.built_in_annotation_types">

-      <title>Built-in Annotation types</title>

-      

-      <para>The built-in CAS type, <literal>uima.tcas.Annotation</literal>, is just one

-        kind of definition of an Annotation. It was designed for annotating text strings, and

-        has begin and end features which describe the substring of the Sofa being

-        annotated.</para>

-      

-      <para>For applications which have other kinds of Sofas, the UIMA developer will design

-        their own kinds of Annotation types, as needed to describe an annotation, by

-        declaring new types which are subtypes of

-        <literal>uima.cas.AnnotationBase</literal>. For instance, for images, you

-        might have the concept of a rectangular region to which the annotation applies. In

-        this case, you might describe the region with 2 pairs of x, y coordinates.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.aas.annotations_associated_sofa">

-      <title>Annotations have an associated Sofa</title>

-      

-      <para>Annotations are always associated with a particular Sofa. In subsequent

-        chapters, you will learn how there can be multiple Sofas associated with an artifact;

-        which Sofa an annotation refers to is described by the Annotation feature structure

-        itself.</para>

-      

-      <para>All annotation types extend from the built-in type uima.cas.AnnotationBase.

-        This type has one feature, a reference to the Sofa associated with the annotation.

-        This value is currently used by the Framework to support the getCoveredText() method

-        on the annotation instance - this returns the portion of a text Sofa that the

-        annotation spans. It is also used to ensure that the Annotation is indexed only in the

-        CAS View associated with this Sofa.</para>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.aas.annotationbase">

-    <title>AnnotationBase</title>

-    

-    <para>A built-in type, <literal>uima.cas.AnnotationBase</literal>, is provided by

-      UIMA to allow users to extend the Annotation capabilities to different kinds of

-      Annotations. The <literal>AnnotationBase</literal> type has one feature, named

-      <literal>sofa</literal>, which holds a reference to the

-      <literal>Sofa</literal> feature structure with which this annotation is associated. 

-      The <literal>sofa</literal> feature is automatically set when creating an annotation 

-      (meaning &mdash; any type derived from the built-in 

-      <literal>uima.cas.AnnotationBase</literal> type); it should not be set by the user.</para>

-    

-    <para>There is one method, <literal>getView</literal>(), provided by

-      <literal>AnnotationBase</literal> that returns the CAS View for the Sofa the

-      annotation is pointing at. Note that this method always returns a CAS, even when applied

-      to JCas annotation instances.</para>

-    

-    <para>The built-in type <literal>uima.tcas.Annotation</literal> extends

-      <literal>uima.cas.AnnotationBase</literal> and adds two features, a begin and an

-      end feature, which are suitable for identifying a span in a text string that the

-      annotation applies to. Users may define other extensions to

-      <literal>AnnotationBase</literal> with alternative specifications that can

-      denote a particular region within the subject of analysis, as appropriate to their

-      application.</para>

-    

-  </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
deleted file mode 100644
index 763ee93..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
+++ /dev/null
@@ -1,1859 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.application/">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.application">

-  <title>Application Developer&apos;s Guide</title>

-  

-  <para>This chapter describes how to develop an application using the Unstructured Information Management

-    Architecture (UIMA). The term <emphasis>application</emphasis> describes a program that provides end-user

-    functionality. A UIMA application incorporates one or more UIMA components such as Analysis Engines,

-    Collection Processing Engines, a Search Engine, and/or a Document Store and adds application-specific logic

-    and user interfaces.</para>

-  

-  <section id="ugr.tug.appication.uimaframework_class">

-    <title>The UIMAFramework Class</title>

-    

-    <para>An application developer's starting point for accessing UIMA framework functionality is the

-      <literal>org.apache.uima.UIMAFramework</literal> class. The following is a short introduction to some

-      important methods on this class. Several of these methods are used in examples in the rest of this chapter. For

-      more details, see the Javadocs (in the docs/api directory of the UIMA SDK).

-      

-      <itemizedlist>

-        <listitem>

-          <para>UIMAFramework.getXMLParser(): Returns an instance of the UIMA XML Parser class, which then can be

-            used to parse the various types of UIMA component descriptors. Examples of this can be found in the

-            remainder of this chapter.</para>

-        </listitem>

-        

-        <listitem>

-          <para>UIMAFramework.produceXXX(ResourceSpecifier): There are various produce methods that are used

-            to create different types of UIMA components from their descriptors. The argument type,

-            ResourceSpecifier, is the base interface that subsumes all types of component descriptors in UIMA. You

-            can get a ResourceSpecifier from the XMLParser. Examples of produce methods are:

-            

-            <itemizedlist>

-              <listitem>

-                <para>produceAnalysisEngine</para>

-              </listitem>

-              <listitem>

-                <para>produceCasConsumer</para>

-              </listitem>

-              <listitem>

-                <para>produceCasInitializer</para>

-              </listitem>

-              <listitem>

-                <para>produceCollectionProcessingEngine</para>

-              </listitem>

-              <listitem>

-                <para>produceCollectionReader</para>

-              </listitem>

-            </itemizedlist>

-            There are other variations of each of these methods that take additional, optional arguments. See the

-            Javadocs for details. </para>

-        </listitem>

-        

-        <listitem>

-          <para>UIMAFramework.getLogger(&lt;optional-logger-name&gt;): Gets a reference to the UIMA Logger,

-            to which you can write log messages. If no logger name is passed, the name of the returned logger instance

-            is <quote>org.apache.uima</quote>.</para>

-        </listitem>

-        

-        <listitem>

-          <para>UIMAFramework.getVersionString(): Gets the number of the UIMA version you are using.</para>

-        </listitem>

-        

-        <listitem>

-          <para>UIMAFramework.newDefaultResourceManager(): Gets an instance of the UIMA ResourceManager. The

-            key method on ResourceManager is setDataPath, which allows you to specify the location where UIMA

-            components will go to look for their external resources. Once you've obtained and initialized a

-            ResourceManager, you can pass it to any of the produceXXX methods. </para>

-        </listitem>

-      </itemizedlist></para>

-    

-  </section>

-  

-  <section id="ugr.tug.application.using_aes">

-    <title>Using Analysis Engines</title>

-    

-    <para>This section describes how to add analysis capability to your application by using Analysis Engines

-      developed using the UIMA SDK. An <emphasis>Analysis Engine (AE)</emphasis> is a component that analyzes

-      artifacts (e.g. documents) and infers information about them.</para>

-    

-    <para>An Analysis Engine consists of two parts - Java classes (typically packaged as one or more JAR files) and

-      <emphasis>AE descriptors</emphasis> (one or more XML files). You must put the Java classes in your

-      application&apos;s class path, but thereafter you will not need to directly interact with them. The UIMA

-      framework insulates you from this by providing the standard AnalysisEngine interface.</para>

-    

-    <para>The term <emphasis>Text Analysis Engine (TAE)</emphasis> is sometimes used to describe an Analysis

-      Engine that analyzes a text document. In the UIMA SDK v1.x, there was a TextAnalysisEngine interface that was

-      commonly used. However, as of the UIMA SDK v2.0, this interface has been deprecated and all applications should

-      switch to using the standard AnalysisEngine interface.</para>

-    

-    <para>The AE descriptor XML files contain the configuration settings for the Analysis Engine as well as a

-      description of the AE&apos;s input and output requirements. You may need to edit these files in order to

-      configure the AE appropriately for your application - the supplier of the AE may have provided documentation

-      (or comments in the XML descriptor itself) about how to do this.</para>

-    

-    <section id="ugr.tug.application.instantiating_an_ae">

-      <title>Instantiating an Analysis Engine</title>

-      

-      <para>The following code shows how to instantiate an AE from its XML descriptor:

-        

-        

-        <programlisting>  //get Resource Specifier from XML file

-XMLInputSource in = new XMLInputSource("MyDescriptor.xml");

-ResourceSpecifier specifier = 

-    UIMAFramework.getXMLParser().parseResourceSpecifier(in);

-

-  //create AE here

-AnalysisEngine ae = 

-    UIMAFramework.produceAnalysisEngine(specifier);</programlisting></para>

-      

-      <para>The first two lines parse the XML descriptor (for AEs with multiple descriptor files, one of them is the

-        <quote>main</quote> descriptor - the AE documentation should indicate which it is). The result of the parse

-        is a <literal>ResourceSpecifier</literal> object. The third line of code invokes a static factory method

-        <literal>UIMAFramework.produceAnalysisEngine</literal>, which takes the specifier and instantiates

-        an <literal>AnalysisEngine</literal> object.</para>

-      

-      <para>There is one caveat to using this approach - the Analysis Engine instance that you create will not support

-        multiple threads running through it concurrently. If you need to support this, see <xref

-          linkend="ugr.tug.applications.multi_threaded"/>.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.application.analyzing_text_documents">

-      <title>Analyzing Text Documents</title>

-      

-      <para>There are two ways to use the AE interface to analyze documents. You can either use the

-        <emphasis>JCas</emphasis> interface, which is described in detail in <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/> or you can directly use the

-        <emphasis>CAS</emphasis> interface, which is described in detail in <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>. Besides text documents, other kinds of

-        artifacts can also be analyzed; see <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aas"/> for more information.</para>

-      

-      <para>The basic structure of your application will look similar in both cases:</para>

-      

-      <para>Using the JCas

-        

-        

-        <programlisting>  //create a JCas, given an Analysis Engine (ae)

-JCas jcas = ae.newJCas();

-  

-  //analyze a document

-jcas.setDocumentText(doc1text);

-ae.process(jcas);

-doSomethingWithResults(jcas);

-jcas.reset();

-  

-  //analyze another document

-jcas.setDocumentText(doc2text);

-ae.process(jcas);

-doSomethingWithResults(jcas);

-jcas.reset();

-...

-  //done

-ae.destroy();</programlisting></para>

-      

-      <para>Using the CAS

-        

-        

-        <programlisting>//create a CAS

-CAS aCasView = ae.newCAS();

-

-//analyze a document

-aCasView.setDocumentText(doc1text);

-ae.process(aCasView);

-doSomethingWithResults(aCasView);

-aCasView.reset();

-

-//analyze another document

-aCasView.setDocumentText(doc2text);

-ae.process(aCasView);

-doSomethingWithResults(aCasView);

-aCasView.reset();

-...

-//done

-ae.destroy();</programlisting></para>

-      

-      <para>First, you create the CAS or JCas that you will use. Then, you repeat the following four steps for each

-        document:</para>

-      

-      <orderedlist spacing="compact">

-        <listitem>

-          <para>Put the document text into the CAS or JCas.</para>

-        </listitem>

-        

-        <listitem>

-          <para>Call the AE's process method, passing the CAS or JCas as an argument</para>

-        </listitem>

-        

-        <listitem>

-          <para>Do something with the results that the AE has added to the CAS or JCas</para>

-        </listitem>

-        

-        <listitem>

-          <para>Call the CAS's or JCas's reset() method to prepare for another analysis </para>

-        </listitem>

-      </orderedlist>

-      

-    </section>

-    

-    <section id="ugr.tug.applications.analyzing_non_text_artifacts">

-      <title>Analyzing Non-Text Artifacts</title>

-      

-      <para>Analyzing non-text artifacts is similar to analyzing text documents. The main difference is that

-        instead of using the <literal>setDocumentText</literal> method, you need to use the Sofa APIs to set the

-        artifact into the CAS. See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>

-        for details.</para>

-      

-    </section>

-    <section id="ugr.tug.applications.accessing_analysis_results">

-      <title>Accessing Analysis Results</title>

-      <para>Annotators (and applications) access the results of analysis via the CAS, using the CAS or JCas

-        interfaces. These results are accessed using the CAS Indexes. There is one built-in index for instances of

-        the built-in type <literal>uima.tcas.Annotation</literal> that can be used to retrieve instances of

-        <literal>Annotation</literal> or any subtype of Annotation. You can also define additional indexes over

-        other types. </para>

-      <para>Indexes provide a method to obtain an iterator over their contents; the iterator returns the matching

-        elements one at a time from the CAS.</para>

-      

-      <section id="ugr.tug.applications.accessing_results_using_jcas">

-        <title>Accessing Analysis Results using the JCas</title>

-        

-        <para>See:</para>

-        

-        <itemizedlist>

-          <listitem>

-            <para> <olink targetdoc="&uima_docs_tutorial_guides;"

-                targetptr="ugr.tug.aae.reading_results_previous_annotators"/> </para>

-          </listitem>

-          

-          <listitem>

-            <para> <olink targetdoc="&uima_docs_ref;"/> 

-                   <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/></para>

-          </listitem>

-          

-          <listitem>

-            <para>The Javadocs for <literal>org.apache.uima.jcas.JCas</literal>. </para>

-          </listitem>

-        </itemizedlist>

-        

-      </section>

-      

-      <section id="ugr.tug.application.accessing_results_using_cas">

-        <title>Accessing Analysis Results using the CAS</title>

-        

-        <para>See:</para>

-        

-        <itemizedlist>

-          <listitem>

-            <para> <olink targetdoc="&uima_docs_ref;"/>

-                   <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/></para>

-          </listitem>

-          

-          <listitem>

-            <para> The source code for <literal>org.apache.uima.examples.PrintAnnotations</literal>, which

-              is in <literal>examples\src.</literal></para>

-          </listitem>

-          

-          <listitem>

-            <para>The Javadocs for the <literal>org.apache.uima.cas</literal> and

-              <literal>org.apache.uima.cas.text</literal> packages. </para>

-          </listitem>

-        </itemizedlist>

-      </section>

-    </section>

-    

-    <section id="ugr.tug.applications.multi_threaded">

-      <title>Multi-threaded Applications</title>

-      

-      <para>You may be running on a multi-core system, and want to run multiple CASes at once through your pipeline.  To support this, UIMA provides multiple approaches.

-      The most flexible and recommended way to do this is to use the features of UIMA-AS, which not only allows scale-up (multiple threads in one CPU), but also

-      supports scale-out (exploiting a cluster of machines).</para>

-      

-      <para>This section describes the simplest way to use an AE in a multi-threaded environment. 

-      First, note that most Analysis Engines are written with the assumption that only one thread will be accessing 

-      them at any one time; that is, Analysis Engines are not written to be thread safe.  The writers of these

-      assume that multiple instances of the Analysis Engine class will be instantiated as needed to support multiple

-      threads.

-      </para>

-      <para>If your application has multiple threads that might invoke an Analysis Engine,

-      you can use the Java synchronized keyword to ensure that only one thread at a time

-      uses the CAS and runs the AE. For example:

-        

-        <programlisting>public class MyApplication {

-  private AnalysisEngine mAnalysisEngine;

-  private CAS mCAS;

-

-  public MyApplication() {

-    //get Resource Specifier from XML file

-    XMLInputSource in = new XMLInputSource("MyDescriptor.xml");

-    ResourceSpecifier specifier = 

-        UIMAFramework.getXMLParser().parseResourceSpecifier(in);

- 

-    //create Analysis Engine here

-    mAnalysisEngine = UIMAFramework.produceAnalysisEngine(specifier);

-    mCAS = mAnalysisEngine.newCAS();

-  }

-

-  // Assume some other part of your multi-threaded application could

-  // call <quote>analyzeDocument</quote> on different threads, asynchronously

-

-  public synchronized void analyzeDocument(String aDoc) {

-    //analyze a document

-    mCAS.setDocumentText(aDoc);

-    mAnalysisEngine.process(mCAS);

-    doSomethingWithResults(mCAS);

-    mCAS.reset();

-  }

-  ...

-}</programlisting></para>

-      

-      <para>Without the synchronized keyword, this application would not be thread-safe. If multiple threads

-        called the analyzeDocument method simultaneously, they would both use the same CAS and clobber each other's

-        results. The synchronized keyword ensures that no more than one thread is executing this method at any given

-        time. For more information on thread synchronization in Java, see <ulink

-          url="http://docs.oracle.com/javase/tutorial/essential/concurrency/"/>

-        .</para>

-      

-      <para>The synchronized keyword ensures thread-safety, but does not allow you to process more than one

-        document at a time. If you need to process multiple documents simultaneously (for example, to make use of a

-        multiprocessor machine), you&apos;ll need to use more than one CAS instance.</para>

-      

-      <para>Because CAS instances use memory and can take some time to construct, you don't want to create a new CAS

-        instance for each request. Instead, you should use a feature of the UIMA SDK called the <emphasis>CAS

-        Pool</emphasis>, implemented by the type <literal>CasPool</literal>.</para>

-      

-      <para>A CAS Pool contains some number of CAS instances (you specify how many when you create the pool). When a

-        thread wants to use a CAS, it <emphasis>checks out</emphasis> an instance from the pool. When the thread is

-        done using the CAS, it must <emphasis>release</emphasis> the CAS instance back into the pool. If all

-        instances are checked out, additional threads will block and wait for an instance to become available. Here

-        is some example code:

-        

-        

-        <programlisting>public class MyApplication {

-  private CasPool mCasPool;

-  

-  private AnalysisEngine mAnalysisEngine;

-  

-  public MyApplication()

-  {

-    //get Resource Specifier from XML file

-    XMLInputSource in = new XMLInputSource("MyDescriptor.xml");

-    ResourceSpecifier specifier = 

-      UIMAFramework.getXMLParser().parseResourceSpecifier(in);

- 

-    //Create multithreadable AE that will 

-    //Accept 3 simultaneous requests

-    //The 3rd parameter specifies a timeout.

-    //When the number of simultaneous requests exceeds 3,

-    // additional requests will wait for other requests to finish. 

-    // This parameter determines the maximum number of milliseconds 

-    // that a new request should wait before throwing an exception

-    // - a value of 0 will cause them to wait forever.

-    mAnalysisEngine = UIMAFramework.produceAnalysisEngine(specifier,3,0);

-

-    //create CAS pool with 3 CAS instances

-    mCasPool = new CasPool(3, mAnalysisEngine);

-  }

-

-  // Notice this is no longer "synchronized"

-  public void analyzeDocument(String aDoc) {

-    //check out a CAS instance (argument 0 means no timeout)

-    CAS cas = mCasPool.getCas(0);  

-    try {

-      //analyze a document 

-      cas.setDocumentText(aDoc);   

-      mAnalysisEngine.process(cas);  

-      doSomethingWithResults(cas);

-    } finally {

-      //MAKE SURE we release the CAS instance

-      mCasPool.releaseCas(cas);  

-    }

-  }

-  ...

-}</programlisting></para>

-      

-      <para>There is not much more code required here than in the previous example. First, there is one additional

-        parameter to the AnalysisEngine producer, specifying the number of annotator instances to

-        create<footnote>

-        <para> Both the UIMA Collection Processing Manager framework and the remote deployment services framework

-          have implementations which use CAS pools in this manner, and thereby relieve the annotator developer of

-          the necessity to make their annotators thread-safe.</para> </footnote>. Then, instead of creating a

-        single CAS in the constructor, we now create a CasPool containing 3 instances. In the analyze method, we check

-        out a CAS, use it, and then release it.</para> <note>

-      <para>Frequently, the two numbers (number of CASes, and the number of AEs) will be the same. It would not make

-        sense to have the number of CASes less than the number of AEs

-        &ndash; the extra AE instances would always block waiting for a CAS from the pool. It could make sense to have

-        additional CASes, though &ndash; if you had other multi-threaded processes that were using the CASes, other

-        than the AEs. </para> </note>

-      

-      <para>The getCas() method returns a CAS which is not specialized to any particular subject of analysis. To

-        process things other than this, please refer to <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aas"/> .</para>

-      

-      <para>Note the use of the try...finally block. This is very important, as it ensures that the CAS we have checked

-        out will be released back into the pool, even if the analysis code throws an exception. You should always use

-        try...finally when using the CAS pool; if you do not, you risk exhausting the pool and causing

-        deadlock.</para>

-      

-      <para>The parameter 0 passed to the CasPool.getCas() method is a timeout value. If this is set to a positive

-        integer, it is the maximum number of milliseconds that the thread will wait for an instance to become

-        available in the pool. If this time elapses, the getCas method will return null, and the application can do

-        something intelligent, like ask the user to try again later. A value of 0 will cause the thread to wait for an

-        available CAS, potentially forever.</para>
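
The timeout rule described above (a positive value waits at most that many milliseconds and then yields null, while 0 waits indefinitely) can be illustrated with a small stand-in written against java.util.concurrent only. This is a sketch under assumptions: `TimedPool`, `checkOut`, and `release` are hypothetical names chosen for illustration, and the class is not the UIMA `CasPool` implementation.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a pool of expensive objects; NOT the UIMA CasPool class. */
final class TimedPool<T> {
  private final BlockingQueue<T> free;

  TimedPool(Collection<T> instances) {
    // The queue starts full: every instance is available for check-out.
    free = new ArrayBlockingQueue<>(instances.size(), false, instances);
  }

  /** Waits up to timeoutMillis for a free instance (0 = wait indefinitely); null on timeout. */
  T checkOut(long timeoutMillis) {
    try {
      return timeoutMillis <= 0 ? free.take() : free.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return null;
    }
  }

  /** Returns an instance to the pool so that a waiting thread can proceed. */
  void release(T instance) {
    free.offer(instance);
  }

  public static void main(String[] args) {
    TimedPool<StringBuilder> pool = new TimedPool<>(List.of(new StringBuilder()));
    StringBuilder first = pool.checkOut(0);
    try {
      // The only instance is checked out, so a bounded wait gives up and returns null.
      if (pool.checkOut(50) == null) {
        System.out.println("pool busy, try again later");
      }
    } finally {
      pool.release(first); // always release, even if the work above throws
    }
  }
}
```

A caller that passes a positive timeout has to handle the null result explicitly; a caller that passes 0 never sees null in this sketch unless the waiting thread is interrupted.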

-        

-      <para>All of this can better be done using UIMA-AS.  Besides taking care of setting up the CAS pools, etc.,

-      UIMA-AS allows a pipeline having several delegates to be scaled up optimally for each delegate;

-      one delegate might have 5 instances, while another might have 3.  It also does

-      a different kind of initialization, in that it creates a thread pool itself, and ensures that each

-      annotator instance gets its process() method called using the same thread that was used for that annotator 

-      instance's initialization call; some annotators could be written assuming that this is the case.</para>

-    </section>

-    

-    <section id="ugr.tug.application.using_multiple_aes">

-      <title>Using Multiple Analysis Engines and Creating Shared CASes</title>

-      <titleabbrev>Multiple AEs &amp; Creating Shared CASes</titleabbrev>

-      

-      <para>In most cases, the easiest way to use multiple Analysis Engines from within an application is to combine

-        them into an aggregate AE. For instructions, see <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.aae.building_aggregates"/>. Be sure that you understand this method before

-        deciding to use the more advanced feature described in this section.</para>

-      

-      <para>If you decide that your application does need to instantiate multiple AEs and have those AEs share a

-        single CAS, then you will no longer be able to use the various methods on the

-        <literal>AnalysisEngine</literal> class that create CASes (or JCases) to create your CAS. This is because

-        these methods create a CAS with a data model specific to a single AE and which therefore cannot be shared by

-        other AEs. Instead, you create a CAS as follows:</para>

-      

-      <para>Suppose you have two analysis engines, and one CAS Consumer, and you want to create one type system from

-        the merge of all of their type specifications. Then you can do the following:</para>

-      

-      

-      <programlisting>AnalysisEngineDescription aeDesc1 =

-  UIMAFramework.getXMLParser().parseAnalysisEngineDescription(...);

-  

-  AnalysisEngineDescription aeDesc2 =

-  UIMAFramework.getXMLParser().parseAnalysisEngineDescription(...);

-

-  CasConsumerDescription ccDesc =

-  UIMAFramework.getXMLParser().parseCasConsumerDescription(...);

-

-  List list = new ArrayList();

-

-  list.add(aeDesc1);

-  list.add(aeDesc2);

-  list.add(ccDesc);

-

-  CAS cas = CasCreationUtils.createCas(list);

-

-  // (optional, if using the JCas interface) 

-  JCas jcas = cas.getJCas();</programlisting>

-      

-      <para>The CasCreationUtils class takes care of the work of merging the AEs&apos; type systems and producing a

-        CAS for the combined type system. If the type systems are not compatible, an exception will be thrown.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.application.saving_cases_to_file_systems">

-      <title>Saving CASes to file systems or general Streams</title>

-      

-      <para>The UIMA framework provides multiple APIs to save and restore the contents of a CAS to streams. 

-      Two common uses of this are to save CASes to the file system, and to send CASes to other processes, running

-      on remote systems.</para>

-      

-      <para>

-        The CASes can be serialized in multiple formats:

-        <itemizedlist>

-          <listitem>

-            <para>Binary formats:

-              <itemizedlist>

-                <listitem>

-                  <para>plain binary: This is used to communicate with remote services, and also for interfacing with

-                  annotators written in C/C++ or related languages from Java via the Java Native Interface (JNI).</para>

-                </listitem>

-                <listitem>

-                  <para>Compressed binary: There are two forms of compressed binary.  The recommended one is form 6, which also allows

-                  type filtering. See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.compress.overview"/>.</para>

-                </listitem>

-              </itemizedlist>

-            </para>

-          </listitem>

-          <listitem>

-            <para>XML formats: There are two forms of this format. The preferred form is the XMI form (see 

-             <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xmi"/>). An older format is also available,

-               called XCAS.</para>

-          </listitem>

-          <listitem>

-            <para>JSON formats (as of version 2.7.0): 

-            This is intended for exposing results in the CAS as JSON objects for use by 

-            web applications.  See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.json.overview"/>.

-            For JSON, only serialization is supported.</para>

-          </listitem>

-          <listitem>

-            <para>Java Object serialization: There are APIs to convert a CAS to a Java object that can be serialized

-            and deserialized

-            using standard Java object read and write Object methods.  There is also a way to include the CAS's type system and 

-            index definition.</para>

-          </listitem>

-        </itemizedlist>

-      </para>

-      

-      <para>Each of these serializations has different capabilities, summarized in the table below.

-       <table frame="all" id="ugr.tug.tbl.serialization_capabilities">

-          <title>Serialization Capabilities</title>

-          <tgroup cols="8" rowsep="1" colsep="1">

-            <colspec colname="c1" colwidth="6*"/>

-            <colspec colname="c2" colwidth="5*"/>

-            <colspec colname="c3" colwidth="5*"/>

-            <colspec colname="c4" colwidth="5*"/>

-            <colspec colname="c5" colwidth="5*"/>

-            <colspec colname="c6" colwidth="5*"/>

-            <colspec colname="c7" colwidth="5*"/>

-            <colspec colname="c8" colwidth="5*"/>

-            <thead>

-              <row>

-                <entry align="center"></entry>

-                <entry align="center">XCAS</entry>

-                <entry align="center">XMI</entry>

-                <entry align="center">JSON</entry>

-                <entry align="center">Binary</entry>

-                <entry align="center">Cmpr 4</entry>

-                <entry align="center">Cmpr 6</entry>

-                <entry align="center">JavaObj</entry>

-              </row>

-            </thead>

-            <tbody>

-              <row>

-                <entry>Output</entry>

-                <entry>Output Stream</entry>

-                <entry>Output Stream</entry>

-                <entry>Output Stream, File, Writer</entry>

-                <entry>Output Stream</entry>

-                <entry>Output Stream, Data Output Stream, File</entry>

-                <entry>Output Stream, Data Output Stream, File</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>Lists/Arrays inline formatting?</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>Formatted?</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>Type Filtering?</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>Delta Cas?</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>OOTS?</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>Only send indexed + reachable FSs?</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>send all</entry>

-                <entry>send all</entry>

-                <entry>Yes</entry>

-                <entry>send all</entry>

-              </row>

-              <row>

-                <entry>Name Space / Schemas?</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-              </row>

-             <row>

-                <entry>lenient available?</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>Yes</entry>

-                <entry>-</entry>

-              </row>

-              <row>

-                <entry>optionally include embedded Type System and Indexes definition?</entry>

-                <entry>-</entry>

-                <entry>-</entry>

-                <entry>Just type system</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-                <entry>Yes</entry>

-              </row>

-              

-            </tbody>

-          </tgroup>

-          

-        </table>

-      </para>

-      

-      <para>In the above table, Cmpr 4 and Cmpr 6 refer to Compressed forms of the serialization, 

-      and JavaObj refers to Java Object serialization.</para>

-      

-      <para>For the XMI and JSON formats, lists and arrays can sometimes be formatted "inline".

-      In this representation, the elements are formatted directly as the value of a particular

-      feature.  This is only done if the arrays and lists are not multiply-referenced.</para>

-      

-      <para>Type Filtering support enables only a subset of the types and/or features to be

-      serialized. An additional type system object is used to specify the types to be included

-      in the serialization.  This can be useful, for instance, when sending a CAS to a remote service,

-      where the remote service only uses a small number of the types and features, to reduce the size

-      of the serialized CAS.</para>

-      

-      <para>Delta Cas support makes use of a "mark" set in the CAS, and only serializes changes in the CAS,

-      both new and modified Feature Structures, that were added or changed after the mark was set.

-      This is useful for remote services, supporting the use-case where a large CAS is sent to the service,

-      which sets the mark in the received CAS, and then adds a small amount of information; 

-      the Delta CAS then serializes only that small amount as the "reply" sent back to the sender.</para>
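
The mark-and-delta idea can be sketched with a toy store in plain Java. This is an illustration under assumptions, not the UIMA API: `DeltaSketch` and its methods are hypothetical, feature structures are reduced to strings addressed by position, and the "serialized" delta is just a list of `index=value` entries covering items modified before the mark plus items added after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of "serialize only what changed after the mark"; NOT the UIMA CAS. */
final class DeltaSketch {
  private final List<String> items = new ArrayList<>();
  private final TreeSet<Integer> modifiedBeforeMark = new TreeSet<>();
  private int mark = -1; // -1 means no mark has been set

  /** Adds an item and returns its position. */
  int add(String value) {
    items.add(value);
    return items.size() - 1;
  }

  /** Overwrites an item; remembers the position if it predates the mark. */
  void set(int index, String value) {
    items.set(index, value);
    if (mark >= 0 && index < mark) {
      modifiedBeforeMark.add(index);
    }
  }

  /** Records the current size; everything below it counts as already known to the sender. */
  void setMark() {
    mark = items.size();
    modifiedBeforeMark.clear();
  }

  /** Returns entries for items changed or added since the mark, in position order. */
  List<String> delta() {
    if (mark < 0) {
      throw new IllegalStateException("no mark set");
    }
    List<String> out = new ArrayList<>();
    for (int i : modifiedBeforeMark) {
      out.add(i + "=" + items.get(i));
    }
    for (int i = mark; i < items.size(); i++) {
      out.add(i + "=" + items.get(i));
    }
    return out;
  }

  public static void main(String[] args) {
    DeltaSketch store = new DeltaSketch();
    store.add("a");
    store.add("b");
    store.setMark();   // the service sets the mark on the received data
    store.set(0, "A"); // modifies an existing item
    store.add("c");    // adds a new item
    System.out.println(store.delta()); // prints [0=A, 2=c]
  }
}
```

The unchanged item at position 1 never appears in the reply, which is the size saving the paragraph above describes.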

-      

-      <para>OOTS means "Out of Type System" support, intended to support the use-case where a CAS is being sent

-      to a remote application.  This supports deserializing an incoming CAS where

-      some of the types and/or features may not be present in the receiving CAS's type system.  A "lenient" 

-      option on the deserialization permits the deserialization to proceed, with the out-of-type-system

-      information preserved so that when the CAS is subsequently reserialized (in the use-case, to be 

-      returned back to the sender), the out-of-type-system information is re-merged back into the output stream.

-      </para>

-      

-      <para>The Binary, Java Object, and Compressed Form 4 serializations send all the Feature Structures in the CAS,

-      in the order they were created in the CAS.  The other methods only 

-      send Feature Structures that are reachable, either by 

-      their being in some CAS index, or being referenced 

-      as a feature of another Feature Structure which is reachable.</para>
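
The reachability rule above amounts to a graph traversal that starts from the indexed Feature Structures and follows feature references. The following is a minimal sketch under assumptions: Feature Structures are modeled as integer ids, `indexed` and `refs` are hypothetical inputs, and the code is not taken from any UIMA serializer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy reachability computation over integer ids; NOT a UIMA serializer. */
final class ReachableSketch {
  /**
   * @param indexed ids of items that are in some index (the traversal roots)
   * @param refs    for each id, the ids it references through its features
   * @return every id reachable from the roots, in discovery order
   */
  static Set<Integer> reachable(List<Integer> indexed, Map<Integer, List<Integer>> refs) {
    Set<Integer> seen = new LinkedHashSet<>();
    Deque<Integer> work = new ArrayDeque<>(indexed);
    while (!work.isEmpty()) {
      Integer id = work.pop();
      if (seen.add(id)) { // first visit: follow its references
        work.addAll(refs.getOrDefault(id, List.of()));
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    // 1 -> 2 -> 3 is a chain; 4 references 1 but nothing indexes or references 4.
    Map<Integer, List<Integer>> refs = Map.of(1, List.of(2), 2, List.of(3), 4, List.of(1));
    System.out.println(reachable(List.of(1), refs)); // prints [1, 2, 3]
  }
}
```

Item 4 is left out because it is neither indexed nor referenced by a reachable item; a "send all" format would include it.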

-      

-      <para>The NameSpace/Schema support allows specifying a set of schemas, each one corresponding to a particular

-      namespace, used in XMI serialization.</para>

-      

-      <para>Lenient allows the receiving Type System to be missing types and/or features that are being deserialized.

-      Normally this causes an exception, but with the lenient flag turned on, these extra types and/or features are 

-      skipped over and ignored, with no error indicated.</para>

-      

-      <para>Some formats optionally allow embedded type system and indexes definition to be saved; 

-      loaders for these can use that information to

-      replace the CAS's type system and indexes definition, or (for compressed form 6) use the type system part

-      to decode the serialized data.  This is described in detail in the Javadocs for CasIOUtils.

-      JSON serialization has several alternatives for optionally including portions of the type system, described in

-      the reference document chapter on JSON.</para>

-      

-      <para>To save an XMI representation of a CAS, use the <code>save</code> method in <code>CasIOUtils</code> or the 

-        <literal>serialize</literal> method of the class

-        <literal>org.apache.uima.util.XmlCasSerializer</literal>. To save an XCAS representation of a CAS,

-        use the <code>save</code> method in <code>CasIOUtils</code> class or use the 

-        <literal>org.apache.uima.cas.impl.XCASSerializer</literal> instead; see the Javadocs

-        for details.</para>

-      

-      <para>All the external serialized forms (except JSON and the inline CAS approximate serialization) 

-        can be read back in using the <code>CasIOUtils load</code> methods.

-        The <code>CasIOUtils load</code> methods also have API forms that support 

-        loading type system and index definition information

-        at the same time (from additional input sources); there is also a form for loading compressed form 6 where

-        you can pass the type system to use for decoding, when it is different from that of the receiving CAS. 

-        The XCAS and XMI external forms can also be read back in using the <literal>deserialize</literal> method of

-        the class <literal>org.apache.uima.util.XmlCasDeserializer</literal>. All of these methods deserialize

-        into a pre-existing CAS, which you must create ahead of time.  See the

-        Javadocs for details.</para>

-        

-      <para>        

-      The <code>Serialization</code> class has various static methods for serializing and deserializing Java Object forms and 

-      compressed forms, with finer control over available options.   

-      See the Javadocs for that class for details.</para>

-      

-      <para>Several of the APIs use or return instances of <code>SerialFormat</code>, which is an enum specifying the various

-      forms of serialization.</para>

-      

-      <para>Serialization often makes use of temporary extra data structures, anchored from the CAS being serialized.

-        These are read/write, and because of this, most serializations are synchronized to prevent multiple

-        serializations of the same CAS from happening in parallel.</para>  

-    </section>

-  </section>

-  

-  <section id="ugr.tug.application.using_cpes">

-    <title>Using Collection Processing Engines</title>

-    

-    <para>A <emphasis>Collection Processing Engine (CPE)</emphasis> processes collections of artifacts

-      (documents) through the combination of the following components: a Collection Reader, an optional CAS

-      Initializer, Analysis Engines, and CAS Consumers. Collection Processing Engines and their components are

-      described in <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>.</para>

-    

-    <para>Like Analysis Engines, CPEs consist of a set of Java classes and a set of descriptors. You need to make sure

-      the Java classes are in your classpath, but otherwise you only deal with descriptors.</para>

-    

-    <section id="ugr.tug.application.running_a_cpe_from_a_descriptor">

-      <title>Running a Collection Processing Engine from a Descriptor</title>

-      <titleabbrev>Running a CPE from a Descriptor</titleabbrev>

-      

-      <para><olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.cpe.running_cpe_from_application"/> describes how to use the APIs to read a CPE

-        descriptor and run it from an application.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.application.configuring_a_cpe_descriptor_programmatically">

-      <title>Configuring a Collection Processing Engine Descriptor Programmatically</title>

-      <titleabbrev>Configuring a CPE Descriptor Programmatically</titleabbrev>

-      

-      <para>For the finest level of control over the CPE descriptor settings, the CPE offers programmatic access to

-        the descriptor via an API. With this API, a developer can create a complete descriptor and then save the result

-        to a file. This also can be used to read in a descriptor (using XMLParser.parseCpeDescription as shown in the

-        previous section), modify it, and write it back out again. The CPE Descriptor API allows a developer to

-        redefine default behavior related to error handling for each component, turn on checkpointing, change

-        performance characteristics of the CPE, and plug in a custom timer.</para>

-      

-      <para>Below is some example code that illustrates how this works. See the Javadocs for package

-        org.apache.uima.collection.metadata for more details.</para>

-      

-      

-      <programlisting>//Creates descriptor with default settings

-CpeDescription cpe = CpeDescriptorFactory.produceDescriptor();

-

-//Add CollectionReader 

-cpe.addCollectionReader([descriptor]);

-

-//Add CasInitializer (deprecated)

-cpe.addCasInitializer(&lt;cas initializer descriptor&gt;);

-

-// Provide the number of CASes the CPE will use

-cpe.setCasPoolSize(2);

-

-//  Define and add Analysis Engine 

-CpeIntegratedCasProcessor personTitleProcessor = 

-   CpeDescriptorFactory.produceCasProcessor (<quote>Person</quote>);

-

-// Provide descriptor for the Analysis Engine

-personTitleProcessor.setDescriptor([descriptor]);

-

-//Continue, despite errors and skip bad Cas

-personTitleProcessor.setActionOnMaxError(<quote>continue</quote>);

-

-//Increase amount of time in ms the CPE waits for response

-//from this Analysis Engine

-personTitleProcessor.setTimeout(100000);

-

-//Add Analysis Engine to the descriptor

-cpe.addCasProcessor(personTitleProcessor);

-                                

-//  Define and add CAS Consumer

-CpeIntegratedCasProcessor consumerProcessor = 

-   CpeDescriptorFactory.produceCasProcessor(<quote>Printer</quote>);

-consumerProcessor.setDescriptor([descriptor]);

-

-//Define batch size

-consumerProcessor.setBatchSize(100);

-

-//Terminate CPE on max errors

-consumerProcessor.setActionOnMaxError(<quote>terminate</quote>);

-

-//Add CAS Consumer to the descriptor

-cpe.addCasProcessor(consumerProcessor);

-

-//  Add Checkpoint file and define checkpoint frequency (ms)

-cpe.setCheckpoint(<quote>[path]/checkpoint.dat</quote>, 3000);

-

-//  Plug in custom timer class used for timing events

-cpe.setTimer(<quote>org.apache.uima.internal.util.JavaTimer</quote>);

-

-//  Define number of documents to process

-cpe.setNumToProcess(1000);

-

-//  Dump the descriptor to the System.out

-((CpeDescriptionImpl)cpe).toXML(System.out);</programlisting>

-      

-      <para>The CPE descriptor for the above configuration looks like this:

-        

-        

-        <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>

-<cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">

-  <collectionReader>

-    <collectionIterator>

-      <descriptor>

-        <include href="[descriptor]"/>

-      </descriptor>

-      <configurationParameterSettings>...

-      </configurationParameterSettings>

-    </collectionIterator>

-

-    <casInitializer>

-      <descriptor>

-        <include href="[descriptor]"/>

-      </descriptor>

-      <configurationParameterSettings>...

-      </configurationParameterSettings>

-    </casInitializer>

-  </collectionReader>

-

-  <casProcessors casPoolSize="2" processingUnitThreadCount="1">

-    <casProcessor deployment="integrated" name="Person">

-      <descriptor>

-        <include href="[descriptor]"/>

-      </descriptor>

-      <deploymentParameters/>

-      <errorHandling>

-        <errorRateThreshold action="terminate" value="100/1000"/>

-        <maxConsecutiveRestarts action="terminate" value="30"/>

-        <timeout max="100000"/>

-      </errorHandling>

-      <checkpoint batch="100" time="1000ms"/>

-    </casProcessor>

-

-    <casProcessor deployment="integrated" name="Printer">

-      <descriptor>

-        <include href="[descriptor]"/>

-      </descriptor>

-      <deploymentParameters/>

-      <errorHandling>

-        <errorRateThreshold action="terminate"

-          value="100/1000"/>

-        <maxConsecutiveRestarts action="terminate"

-          value="30"/>

-        <timeout max="100000" default="-1"/>

-      </errorHandling>

-      <checkpoint batch="100" time="1000ms"/>

-    </casProcessor>

-  </casProcessors>

-

-  <cpeConfig>

-    <numToProcess>1000</numToProcess>

-    <deployAs>immediate</deployAs>

-    <checkpoint file="[path]/checkpoint.dat" time="3000ms"/>

-    <timerImpl>

-      org.apache.uima.internal.util.JavaTimer

-    </timerImpl>

-  </cpeConfig>

-</cpeDescription>]]></programlisting></para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.application.setting_configuration_parameters">

-    <title>Setting Configuration Parameters</title>

-    

-    <para>Configuration parameters can be set using APIs as well as configured using the XML descriptor metadata

-      specification (see <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.aae.configuration_parameters"/>).</para>

-    

-    <para>There are two different places you can set the parameters via the APIs.</para>

-    

-    <itemizedlist spacing="compact">

-      <listitem>

-        <para>After reading the XML descriptor for a component, but before you produce the component itself,

-          and</para>

-      </listitem>

-      

-      <listitem>

-        <para>After the component has been produced. </para>

-      </listitem>

-    </itemizedlist>

-    

-    <para>Setting the parameters before you produce the component is done using the

-      ConfigurationParameterSettings object. You get an instance of this for a particular component by accessing

-      that component description&apos;s metadata. For instance, if you produced a component description by using

-      <literal>UIMAFramework.getXMLParser().parse...</literal> method, you can use that component

-      description&apos;s getMetaData() method to get the metadata, and then the metadata&apos;s

-      getConfigurationParameterSettings method to get the ConfigurationParameterSettings object. Using that

-      object, you can set individual parameters using the setParameterValue method. Here&apos;s an example, for a

-      CAS Consumer component:

-      

-      

-      <programlisting>// Create a description object by reading the XML for the descriptor

-

-CasConsumerDescription casConsumerDesc =  

-   UIMAFramework.getXMLParser().parseCasConsumerDescription(new

-     XMLInputSource("descriptors/cas_consumer/InlineXmlCasConsumer.xml"));

-

-// get the settings from the metadata

-ConfigurationParameterSettings consumerParamSettings =

-    casConsumerDesc.getMetaData().getConfigurationParameterSettings();

-

-// Set a parameter value

-consumerParamSettings.setParameterValue(

-  InlineXmlCasConsumer.PARAM_OUTPUTDIR,

-  outputDir.getAbsolutePath());</programlisting></para>

-    

-    <para>Then you might produce this component using:

-      

-      

-      <programlisting>CasConsumer component =

-  UIMAFramework.produceCasConsumer(casConsumerDesc);</programlisting></para>

-    

-    <para>A side effect of producing a component is calling the component's <quote>initialize</quote> method,

-      allowing it to read its configuration parameters. If you want to change parameters after this, use

-      

-      

-      <programlisting>component.setConfigParameterValue(

-    <quote>&lt;parameter-name&gt;</quote>,

-    <quote>&lt;parameter-value&gt;</quote>);</programlisting>

-      and then signal the component to re-read its configuration by calling the component's reconfigure method:

-      

-      <programlisting>component.reconfigure();</programlisting></para>

-    

-    <para>Although these examples are for a CAS Consumer component, the parameter APIs also work for other kinds of

-      components.</para>

-  </section>

-  

-  <section id="ugr.tug.application.integrating_text_analysis_and_search">

-    <title>Integrating Text Analysis and Search</title>

-    

-    <para>A combination of AEs with a search engine capable of indexing both words and annotations over spans

-      of text enables what UIMA refers to as <emphasis>semantic search</emphasis>.</para>

-    

-    <para>Semantic search is a search where the semantic intent of the query is specified using one or more entity or

-      relation specifiers. For example, one could specify that they are looking for a person (named)

-      <quote>Bush.</quote> Such a query would then not return results about the kind of bushes that grow in your

-      garden.</para>

-    

-    <section id="ugr.tug.application.building_an_index">

-      <title>Building an Index</title>

-      

-      <para>To build a semantic search index using the UIMA SDK, you run a Collection Processing Engine that includes

-        your AE along with a CAS Consumer which takes the tokens and annotations, together with sentence

-        boundaries, and feeds them to a semantic searcher's index term input. Your AE must include an annotator that produces

-        Tokens and Sentence annotations, along with any <quote>semantic</quote> annotations, because the

-        Indexer requires this.</para>

-      

-      <section id="ugr.tug.application.search.configuring_indexer">

-        <title>Configuring the Semantic Search CAS Indexer</title>

-        

-        <para>Since there are several ways you might want to build a search index from the information in the CAS

-          produced by your AE, you need to supply the Semantic Search CAS Consumer &ndash; Indexer with

-          configuration information in the form of an <emphasis>Index Build Specification</emphasis> file.

-          Apache UIMA includes code for parsing Index Build Specification files (see the Javadocs for details). An

-          example of an Indexing specification tailored to the AE from the tutorial in the <olink

-            targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> is located in

-          <literal>examples/descriptors/tutorial/search/MeetingIndexBuildSpec.xml</literal> . It looks

-          like this:

-          

-          

-          <programlisting><![CDATA[<indexBuildSpecification>

-  <indexBuildItem>

-    <name>org.apache.uima.examples.tokenizer.Token</name>

-    <indexRule>

-      <style name="Term"/>

-    </indexRule>    

-  </indexBuildItem>

-  <indexBuildItem>

-    <name>org.apache.uima.examples.tokenizer.Sentence</name>

-    <indexRule>

-      <style name="Breaking"/>

-    </indexRule>    

-  </indexBuildItem>

-  <indexBuildItem>

-    <name>org.apache.uima.tutorial.Meeting</name>

-    <indexRule>

-      <style name="Annotation"/>

-    </indexRule>    

-  </indexBuildItem>

-  <indexBuildItem>

-    <name>org.apache.uima.tutorial.RoomNumber</name>

-    <indexRule>

-      <style name="Annotation">

-        <attributeMappings>

-          <mapping>

-            <feature>building</feature>

-            <indexName>building</indexName>

-          </mapping>

-        </attributeMappings>

-      </style>

-    </indexRule>    

-  </indexBuildItem>

-  <indexBuildItem>

-    <name>org.apache.uima.tutorial.DateAnnot</name>

-    <indexRule>

-      <style name="Annotation"/>

-    </indexRule>    

-  </indexBuildItem>

-  <indexBuildItem>

-    <name>org.apache.uima.tutorial.TimeAnnot</name>

-    <indexRule>

-      <style name="Annotation"/>

-    </indexRule>    

-  </indexBuildItem>

-</indexBuildSpecification>]]></programlisting></para>

-        

-        <para>The index build specification is a series of index build items, each of which identifies a CAS

-          annotation type (a subtype of <literal>uima.tcas.Annotation</literal> &ndash; see <olink

-            targetdoc="&uima_docs_ref;"/> <olink

-            targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>) and a style.</para>

-        

-        <para>The first item in this example specifies that the annotation type

-          <literal>org.apache.uima.examples.tokenizer.Token</literal> should be indexed with the

-          <quote>Term</quote> style. This means that each span of text annotated by a Token will be considered a

-          single token for standard text search purposes.</para>

-        

-        <para>The second item in this example specifies that the annotation type

-          <literal>org.apache.uima.examples.tokenizer.Sentence</literal> should be indexed with the

-          <quote>Breaking</quote> style. This means that each span of text annotated by a Sentence will be

-          considered a single sentence, which can affect the search engine's algorithm for matching queries. 

-        </para>

-          

-        <para>The remaining items all use the <quote>Annotation</quote> style. This indicates that each

-          annotation of the specified types will be stored in the index as a searchable span, with a name equal to the

-          annotation name (without the namespace).</para>

-        

-        <para>Also, features of annotations can be indexed using the

-          <literal>&lt;attributeMappings&gt;</literal> subelement. In the example index build

-          specification, we declare that the <literal>building</literal> feature of the type

-          <literal>org.apache.uima.tutorial.RoomNumber</literal> should be indexed. The

-          <literal>&lt;indexName&gt;</literal> element can be used to map the feature name to a different name in

-          the index, but in this example we have opted to use the same name, <literal>building</literal>. </para>

-        

-        <para> At the end of the batch or collection, the Semantic Search CAS Indexer builds the index. This index can

-          be queried with simple tokens or with XML tags.</para>

-        

-        <para>Examples:

-          

-          <itemizedlist spacing="compact">

-            <listitem>

-              <para>A query on the word <quote>UIMA</quote> will retrieve all documents that contain

-                that word. But a query of the type <literal>&lt;Meeting&gt;UIMA&lt;/Meeting&gt;</literal>

-                will retrieve only those documents that contain a Meeting annotation (produced by our

-                MeetingDetector TAE, for example), where that Meeting annotation contains the word

-                <quote>UIMA</quote>.</para>

-            </listitem>

-            

-            <listitem>

-              <para>A query for <literal>&lt;RoomNumber building="Yorktown"/&gt;</literal> will return

-                documents that have a RoomNumber annotation whose <literal>building</literal> feature

-                contains the term <quote>Yorktown</quote>. </para>

-            </listitem>

-          </itemizedlist></para>

-        

-        <para>For more information on the Index Build

-          Specification format, see the UIMA Javadocs for class

-          <literal>org.apache.uima.search.IndexBuildSpecification</literal>. Accessing the Javadocs is

-          described in <olink targetdoc="&uima_docs_ref;"/> 

-          <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs"/>.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.application.search.cpe_with_semantic_search_cas_consumer">

-        <title>Building and Running a CPE including the Semantic Search CAS Indexer</title>

-        <titleabbrev>Using Semantic Search CAS Indexer</titleabbrev>

-        

-        <para>The following steps illustrate how to build and run a CPE that uses the UIMA Meeting Detector TAE and the

-          Simple Token and Sentence Annotator, discussed in the <olink

-            targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> along with a CAS Consumer

-          called the Semantic Search CAS Indexer, to build an index that allows you to query for documents based not

-          only on textual content but also on whether they contain mentions of Meetings detected by the TAE.</para>

-        

-        <para>Run the CPE Configurator tool by executing the <literal>cpeGui</literal> shell script in the

-          <literal>bin</literal> directory of the UIMA SDK. (For instructions on using this tool, see the <olink

-            targetdoc="&uima_docs_tools;"/> <olink

-            targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.)</para>

-        

-        <para>In the CPE Configurator tool, select the following components by browsing to their

-          descriptors:</para>

-        

-        <itemizedlist spacing="compact">

-          <listitem>

-            <para>Collection Reader: <literal>%UIMA_HOME%/examples/descriptors/collectionReader/

-              FileSystemCollectionReader.xml</literal></para>

-          </listitem>

-          

-          <listitem>

-            <para>Analysis Engine: include both of these; one produces tokens/sentences, required by the indexer

-              in all cases and the other produces the meeting annotations of interest.

-              <itemizedlist spacing="compact">

-                <listitem><para><literal><?db-font-size 70% ?>%UIMA_HOME%/examples/descriptors/analysis_engine/SimpleTokenAndSentenceAnnotator.xml</literal></para></listitem>

-                <listitem><para><literal><?db-font-size 70% ?>%UIMA_HOME%/examples/descriptors/tutorial/ex6/UIMAMeetingDetectorTAE.xml</literal></para></listitem>

-              </itemizedlist>

-            </para>

-          </listitem>


-          

-          <listitem>

-            <para>Two CAS Consumers:

-              <itemizedlist spacing="compact">

-                <listitem><para><literal><?db-font-size 70% ?>%UIMA_HOME%/examples/descriptors/cas_consumer/SemanticSearchCasIndexer.xml</literal></para></listitem>

-                <listitem><para><literal><?db-font-size 70% ?>%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal></para></listitem>

-              </itemizedlist>  


-            </para>

-          </listitem>

-        </itemizedlist>

-        

-        <para>Set up parameters:</para>

-        

-        <itemizedlist spacing="compact">

-          <listitem>

-            <para> Set the File System Collection Reader's <quote>Input Directory</quote> parameter to point to

-              the <literal>%UIMA_HOME%/examples/data</literal> directory.</para>

-          </listitem>

-          

-          <listitem>

-            <para>Set the Semantic Search CAS Indexer's <quote>Indexing Specification Descriptor</quote>

-              parameter to point to <literal>%UIMA_HOME%/examples/descriptors/tutorial/search/

-              MeetingIndexBuildSpec.xml</literal></para>

-          </listitem>

-          

-          <listitem>

-            <para>Set the Semantic Search CAS Indexer's <quote>Index Dir</quote> parameter to the

-              directory into which you want the indexer to write its index files. <warning>

-              <para>The Indexer <emphasis>erases</emphasis> old versions of the files it creates in this

-                directory. </para></warning> </para>

-          </listitem>

-          

-          <listitem>

-            <para>Set the XMI Writer CAS Consumer's <quote>Output Directory</quote> parameter to the

-              directory in which you want to store the XMI files containing the results of your analysis for each

-              document. </para>

-          </listitem>

-        </itemizedlist>

-        

-        <para>Click on the Run Button. Once the run completes, a statistics dialog should appear, in which you can see

-          how much time was spent in each of the components involved in the run.</para>

-        

-      </section>

-    </section>

-  </section>

-

-   

-  <section id="ugr.tug.application.remote_services">

-    <title>Working with Remote Services</title>

-    

-    <note><para>This chapter describes older methods of working with Remote Services.  These approaches do not support

-    some of the newer CAS features, such as multiple views and CAS Multipliers.  These methods have been supplanted by

-    UIMA-AS, which has full support for the new CAS features.</para></note>

-    

-    <para>The UIMA SDK allows you to easily take any Analysis Engine or CAS Consumer and deploy it as a service. That

-      Analysis Engine or CAS Consumer can then be called from a remote machine using various network

-      protocols.</para>

-    

-    <para>The UIMA SDK provides support for the following communications protocols:

-      

-      <itemizedlist spacing="compact">

-        <listitem>

-          <para>Vinci, a lightweight protocol, included as a part of Apache UIMA.</para>

-        </listitem>

-      </itemizedlist></para>

-    

-    <para>The UIMA framework can make use of these services in two different ways:

-      

-      <orderedlist>

-        <listitem>

-          <para>An Analysis Engine can create a proxy to a remote service; this proxy acts like a local component, but

-            connects to the remote. The proxy has limited error handling and retry capabilities. The Vinci protocol is supported.</para>

-        </listitem>

-        

-        <listitem>

-          <para>A Collection Processing Engine can specify non-Integrated mode (see <olink

-              targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe.deploying_a_cpe"/>). The

-            CPE provides more extensive error recovery capabilities. This mode only supports the Vinci

-            communications protocol. </para>

-        </listitem>

-      </orderedlist></para>

-    

-    <section id="ugr.tug.application.how_to_deploy_a_vinci_service">

-      <title>Deploying a UIMA Component as a Vinci Service</title>

-      <titleabbrev>Deploying as a Vinci Service</titleabbrev>

-      

-      <para>There are no software prerequisites for deploying a Vinci service. The necessary libraries are part of

-        the UIMA SDK. However, before you can use Vinci services you need to deploy the Vinci Naming Service (VNS), as

-        described in section <xref linkend="ugr.tug.application.vns"/>.</para>

-      

-      <para>To deploy a service, you have to ensure any components you want to include can be found on the class path.

-        One way to do this is to set the environment variable UIMA_CLASSPATH to the set of class paths you need for any

-        included components. Then run the <literal>startVinciService</literal> shell script, which is located

-        in the <literal>bin</literal> directory, and pass it the path to a Vinci deployment descriptor, for

-        example: <literal>C:UIMA&gt;bin/startVinciService

-        ../examples/deploy/vinci/Deploy_PersonTitleAnnotator.xml</literal>.

-      If you are running Eclipse, and have the <literal>uimaj-examples</literal> project

-      in your workspace, you can use the Eclipse Menu &rarr; Run &rarr; Run... and then

-      pick <quote>UIMA Start Vinci Service</quote>.</para>

-      

-      <para>This example deployment descriptor looks like:

-        

-        <programlisting>&lt;deployment name=<emphasis role="bold-italic">"Vinci Person Title Annotator Service"</emphasis>&gt;

-

-  &lt;service name=<emphasis role="bold-italic">"uima.annotator.PersonTitleAnnotator"</emphasis> provider="vinci"&gt;

-

-    &lt;parameter name="resourceSpecifierPath" 

-      value=<emphasis role="bold-italic">"C:/Program Files/apache/uima/examples/descriptors/

-          analysis_engine/PersonTitleAnnotator.xml"</emphasis>/&gt;

-

-    &lt;parameter name="numInstances" value="1"/&gt;

-

-    &lt;parameter name="serverSocketTimeout" value="120000"/&gt;

-

-  &lt;/service&gt;

-

-&lt;/deployment&gt;</programlisting></para>

-      

-      <para>To modify this deployment descriptor to deploy your own Analysis Engine or CAS Consumer, just replace

-        the areas indicated in bold italics (deployment name, service name, and resource specifier path) with

-        values appropriate for your component.</para>

-      

-      <para>The <literal>numInstances</literal> parameter specifies how many instances of your Analysis Engine

-        or CAS Consumer will be created. This allows your service to support multiple clients concurrently. When a

-        new request comes in, if all of the instances are busy, the new request will wait until an instance becomes

-        available.</para>

-      

-      <para>The <literal>serverSocketTimeout</literal> parameter specifies the number of milliseconds

-        (default = 5 minutes) that the service will wait between requests to process something. After this amount of

-        time, the server will presume the client may have gone away, and it <quote>cleans up</quote>, releasing any

-        resources it is holding. The next call to process on the service will result in a cycle which will cause the

-        client to re-establish its connection with the service (some additional overhead).</para>

-

-      <para>There are two additional parameters that you can add to your deployment descriptor:

-        </para>

-      <itemizedlist>

-        <listitem><para><literal>&lt;parameter name="threadPoolMinSize" value="[Integer]"/></literal>:

-          Specifies the number of threads that the Vinci service creates on startup in order to

-          serve clients' requests.</para></listitem>

-        <listitem><para><literal>&lt;parameter name="threadPoolMaxSize" value="[Integer]"/></literal>:

-          Specifies the maximum number of threads that the Vinci service will create.  When the number of

-          concurrent requests exceeds the <literal>threadPoolMinSize</literal>, additional threads will be

-          created to serve requests, until the <literal>threadPoolMaxSize</literal> is reached.</para></listitem>

-      </itemizedlist>
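-
-      <para>For illustration only, a deployment descriptor that sets both thread pool parameters might look like
-        the following. The values 2 and 8 are arbitrary assumptions, not recommendations, and
-        <literal>[descriptor]</literal> stands for the path to your own component descriptor.
-        <programlisting><![CDATA[<deployment name="Vinci Person Title Annotator Service">
-  <service name="uima.annotator.PersonTitleAnnotator" provider="vinci">
-    <parameter name="resourceSpecifierPath" value="[descriptor]"/>
-    <parameter name="numInstances" value="1"/>
-    <parameter name="serverSocketTimeout" value="120000"/>
-    <!-- Illustrative values: start with 2 threads, grow to at most 8 -->
-    <parameter name="threadPoolMinSize" value="2"/>
-    <parameter name="threadPoolMaxSize" value="8"/>
-  </service>
-</deployment>]]></programlisting></para>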

-      

-      <para>The <literal>startVinciService</literal> script takes two additional optional parameters. The

-        first one overrides the value of the VNS_HOST environment variable, allowing you to specify the name server

-        to use. The second parameter, if specified, needs to be a unique (on this server) non-negative number,

-        specifying the instance of this service. When used, this number allows multiple instances of the same named

-        service to be started on one server; they will all register with the Vinci name service and be made available to

-        client requests.</para>
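-
-      <para>As a sketch, assuming a name server running on a host called
-        <literal>vns.example.org</literal> (a hypothetical name) and an instance number of 1, such a call
-        might look like this, where <literal>[deployment descriptor]</literal> stands for the path to your
-        deployment descriptor:
-        <programlisting>bin/startVinciService [deployment descriptor] vns.example.org 1</programlisting></para>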

-      

-      <para>Once you have deployed your component as a Vinci service, you may call it from a remote machine. See <xref

-          linkend="ugr.tug.application.how_to_call_a_uima_service"/> for instructions.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.application.how_to_call_a_uima_service">

-      <title>How to Call a UIMA Service</title>

-      <titleabbrev>Calling a UIMA Service</titleabbrev>

-      

-      <para>Once an Analysis Engine or CAS Consumer has been deployed as a service, it can be used from any UIMA

-        application, in the exact same way that a local Analysis Engine or CAS Consumer is used. For example, you can

-        call an Analysis Engine service from the Document Analyzer or use the CPE Configurator to build a CPE that

-        includes Analysis Engine and CAS Consumer services.</para>

-      

-      <para>To do this, you use a <emphasis>service client descriptor</emphasis> in place of the usual Analysis

-        Engine or CAS Consumer Descriptor. A service client descriptor is a simple XML file that indicates the

-        location of the remote service and a few parameters. Example service client descriptors are provided in the

-        UIMA SDK under the directory <literal>examples/descriptors/vinciService</literal>. The 

-        contents of these descriptors are explained below.</para>

-      

-      <section id="ugr.tug.application.vinci_service_client_descriptor">

-        <title>Vinci Service Client Descriptor</title>

-        

-        <para>To call a Vinci service, a descriptor like the following is used:

-          

-          

-          <programlisting><![CDATA[<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">

-   <resourceType>AnalysisEngine</resourceType>

-   <uri>uima.annotator.PersonTitleAnnotator</uri>

-   <protocol>Vinci</protocol>

-   <timeout>60000</timeout> 

-   <parameters>

-     <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>

-     <parameter name="VNS_PORT" value="9000"/>

-   </parameters>

-</uriSpecifier>]]></programlisting></para>

-        

-        <para>Note that Vinci uses a centralized naming server, so the host where the service is deployed does not

-          need to be specified. Only a name (<literal>uima.annotator.PersonTitleAnnotator</literal>) is given,

-          which must match the name specified in the deployment descriptor used to deploy the service.</para>

-        

-        <para>The host and/or port where your Vinci Naming Service (VNS) server is running can be specified by the

-          optional &lt;parameter&gt; elements. If not specified, the value is taken from the specification given

-          on your Java command line (if present) using <literal>-DVNS_HOST=&lt;host&gt;</literal> and

-          <literal>-DVNS_PORT=&lt;port&gt;</literal> system arguments. If not specified on the Java command

-          line, defaults are used: localhost for the <literal>VNS_HOST</literal>, and <literal>9000</literal>

-          for the <literal>VNS_PORT</literal>. See the next section for details on setting up a VNS server.</para>
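The lookup order described above (descriptor parameter first, then the `-D` system property, then the built-in default) can be sketched as follows. This is a minimal illustration only: `VnsSettings`, `resolveHost`, and `resolvePort` are hypothetical names and are not part of the UIMA or Vinci API.

```java
import java.util.Map;

// Hypothetical helper, NOT part of the UIMA API: it only mirrors the
// documented precedence for VNS_HOST and VNS_PORT.
final class VnsSettings {
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 9000;

    // 1. <parameter name="VNS_HOST"> from the descriptor, 2. -DVNS_HOST, 3. localhost
    static String resolveHost(Map<String, String> descriptorParams) {
        String fromDescriptor = descriptorParams.get("VNS_HOST");
        if (fromDescriptor != null) {
            return fromDescriptor;
        }
        return System.getProperty("VNS_HOST", DEFAULT_HOST);
    }

    // 1. <parameter name="VNS_PORT"> from the descriptor, 2. -DVNS_PORT, 3. 9000
    static int resolvePort(Map<String, String> descriptorParams) {
        String fromDescriptor = descriptorParams.get("VNS_PORT");
        if (fromDescriptor != null) {
            return Integer.parseInt(fromDescriptor);
        }
        return Integer.parseInt(
            System.getProperty("VNS_PORT", String.valueOf(DEFAULT_PORT)));
    }

    public static void main(String[] args) {
        Map<String, String> noParams = Map.of();
        System.out.println(resolveHost(noParams) + ":" + resolvePort(noParams));
    }
}
```

Running the sketch with `java -DVNS_PORT=9001 VnsSettings` exercises the middle step of the lookup, since the empty parameter map stands in for a descriptor without `<parameter>` elements.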

-        

-      </section>

-    </section>

-    <section id="ugr.tug.application.restrictions_on_remotely_deployed_services">

-      <title>Restrictions on remotely deployed services</title>

-      

-      <para>Remotely deployed services are started on remote machines, using UIMA component descriptors on those

-        remote machines. These descriptors supply any configuration and resource parameters for the service

-        (configuration parameters are not transmitted from the calling instance to the remote one). Likewise, the

-        remote descriptors supply the type system specification for the remote annotators that will be run (the type

-        system of the calling instance is not transmitted to the remote one).</para>

-      

-      <para>The remote service wrapper, when it receives a CAS from the caller, instantiates it for the remote

-        service, making instances of all types which the remote service specifies. Other instances in the incoming

-        CAS for types which the remote service has no type specification for are kept aside, and when the remote

-        service returns the CAS back to the caller, these type instances are re-merged back into the CAS being

-        transmitted back to the caller. Because of this design, a remote service which doesn't declare a type system

-        won't receive any type instances.</para> <note>

-      <para>This behavior may change in future releases, to one where configuration parameters and / or type systems

-        are transmitted to remote services. </para></note>

-      

-    </section>

-    

-    <section id="ugr.tug.application.vns">

-      <title>The Vinci Naming Services (VNS)</title>

-      

-      <para>Vinci consists of components for building network-accessible services, clients for accessing those

-        services, and an infrastructure for locating and managing services. The primary infrastructure component

-        is the Vinci directory, known as VNS (for Vinci Naming Service).</para>

-      

-      <para>On startup, Vinci services locate the VNS and provide it with information that is used by VNS during

-        service discovery. Each Vinci service provides the name of the host machine on which it runs, and the name of the

-        service. The VNS internally creates a binding for the service name and returns the port number on which the

-        Vinci service will wait for client requests. The VNS stores its bindings on the filesystem in a file called

-        vns.services.</para>

-      

-      <para>In Vinci, services are identified by their service name. If there is more than one physical service with

-        the same service name, then Vinci assumes they are equivalent and will route queries to them randomly,

-        provided that they are all running on different hosts. You should therefore use a unique service name if you

-        don't want to conflict with other services listed in whatever VNS you have configured jVinci to use.</para>

-      

-      <section id="ugr.tug.application.vns.starting">

-        <title>Starting VNS</title>

-        

-        <para>To run the VNS use the <literal>startVNS</literal> script found in the

-          <literal>bin</literal> directory of the UIMA installation, 

-        or launch it from Eclipse.  If you've installed the <literal>uimaj-examples</literal> project,

-        it will supply a pre-configured launch script you can access in Eclipse by selecting

-        Menu &rarr; Run &rarr; Run... and picking <quote>UIMA Start VNS</quote>.</para>

-        <note><para>VNS runs on port 9000 by default so please make sure this port is

-        available. If you see the following exception:

-        

-        <programlisting>java.net.BindException: Address already in use:

-

-JVM_Bind</programlisting>

-          it indicates that another process is running on port 9000. In this case, add the parameter <literal>-p

-          &lt;port&gt;</literal> to the <literal>startVNS</literal> command, using

-          <literal>&lt;port&gt;</literal> to specify an alternative port to use. </para></note>

-        

-        <para>When started, the VNS produces output similar to the following:

-          

-          

-          <programlisting><?db-font-size 80% ?>[10/6/04 3:44 PM | main] WARNING: Config file doesn't exist, 

-            creating a new empty config file!

-[10/6/04 3:44 PM | main] Loading config file : .vns.services

-[10/6/04 3:44 PM | main] Loading workspaces file : .vns.workspaces

-[10/6/04 3:44 PM | main] ====================================

-(WARNING) Unexpected exception:

-java.io.FileNotFoundException: .vns.workspaces (The system cannot find

-the file specified)

-  at java.io.FileInputStream.open(Native Method)

-  at java.io.FileInputStream.&lt;init&gt;(Unknown Source)

-  at java.io.FileInputStream.&lt;init&gt;(Unknown Source)

-  at java.io.FileReader.&lt;init&gt;(Unknown Source)

-  at org.apache.vinci.transport.vns.service.VNS.loadWorkspaces(VNS.java:339)

-  at org.apache.vinci.transport.vns.service.VNS.startServing(VNS.java:237)

-  at org.apache.vinci.transport.vns.service.VNS.main(VNS.java:179)

-[10/6/04 3:44 PM | main] WARNING: failed to load workspace.

-[10/6/04 3:44 PM | main] VNS Workspace : null

-[10/6/04 3:44 PM | main] Loading counter file : .vns.counter

-[10/6/04 3:44 PM | main] Could not load the counter file : .vns.counter

-[10/6/04 3:44 PM | main] Starting backup thread,

-            using files .vns.services.bak

-and .vns.services

-[10/6/04 3:44 PM | main] Serving on port : 9000

-[10/6/04 3:44 PM | Thread-0] Backup thread started

-[10/6/04 3:44 PM | Thread-0] Saving to config file : .vns.services.bak

-&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; VNS is up and running! &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;

-&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Type 'quit' and hit ENTER to terminate VNS &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;

-[10/6/04 3:44 PM | Thread-0] Config save required 10 millis.

-[10/6/04 3:44 PM | Thread-0] Saving to config file : .vns.services

-[10/6/04 3:44 PM | Thread-0] Config save required 10 millis.

-[10/6/04 3:44 PM | Thread-0] Saving counter file : .vns.counter</programlisting></para>

-        <note>

-        <para>Disregard the <emphasis>java.io.FileNotFoundException: .\vns.workspaces (The system cannot

-          find the file specified).</emphasis> It is just a complaint, not a serious problem. VNS Workspace is a

-          feature of the VNS that is not critical. The important information to note is <literal>[10/6/04 3:44 PM |

-          main] Serving on port : 9000</literal> which states the actual port where VNS will listen for incoming

-          requests. All Vinci services and all clients connecting to services must provide the VNS port on the

-          command line if the port is not the default. Again, the default port is 9000. Please see <xref

-            linkend="ugr.tug.application.launching_vinci_services"/> below for details about the command

-          line and parameters.</para> </note>

-        

-      </section>

-      

-      <section id="ugr.tug.application.vns_files">

-        <title>VNS Files</title>

-        

-        <para>The VNS maintains two external files:

-          

-          <itemizedlist spacing="compact">

-            <listitem>

-              <para><literal>vns.services</literal></para>

-            </listitem>

-            <listitem>

-              <para><literal>vns.counter</literal></para>

-            </listitem>

-          </itemizedlist></para>

-        

-        <para>These files are generated by the VNS in the same directory where the VNS is launched from. Since these

-          files may contain old information it is best to remove them before starting the VNS. This step ensures that

-          the VNS always has the newest information and will not attempt to connect to a service that has been

-          shut down.</para>

-      </section>

-      

-      <section id="ugr.tug.application.launching_vinci_services">

-        <title>Launching Vinci Services</title>

-        

-        <para>When launching a Vinci service, you must indicate which VNS the service will

-          connect to. A Vinci service is typically started using the script

-          <literal>startVinciService</literal>, found in the <literal>bin</literal>

-          directory of the UIMA installation. (If you're using Eclipse and have the 

-          <literal>uimaj-examples</literal> project in the workspace, you will also find

-          an Eclipse launcher named <quote>UIMA Start Vinci Service</quote> you can use.)  

-          For the script, the environmental variable VNS_HOST should

-          be set to the name or IP address of the machine hosting the Vinci Naming Service. The

-          default is localhost, the machine the service is deployed on. This name can also be

-          passed as the second argument to the startVinciService script. The default port

-          for VNS is 9000 but can be overridden with the VNS_PORT environmental

-          variable.</para>

-

-        

-        <para>If you write your own startup script, to define Vinci&apos;s default VNS you must provide the

-          following JVM parameters:

-          

-          <programlisting>java -DVNS_HOST=localhost -DVNS_PORT=9000 ...</programlisting></para>

-        

-        <para>The above setting is for the VNS running on the same machine as the service. Of course one can deploy the

-          VNS on a different machine and the JVM parameter will need to be changed to this:

-          

-          <programlisting>java -DVNS_HOST=&lt;host&gt; -DVNS_PORT=9000 ...</programlisting></para>

-        

-        <para>where <quote>&lt;host&gt;</quote> is a machine name or its IP where the VNS is running.</para>

-        <note>

-        <para>VNS runs on port 9000 by default. If you see the following exception:

-          

-          

-          <programlisting>(WARNING) Unexpected exception:

-org.apache.vinci.transport.ServiceDownException: 

-          VNS inaccessible: java.net.Connect

-Exception: Connection refused: connect</programlisting>

-          then, perhaps the VNS is not running OR the VNS is running but it is using a different port. To correct the

-          latter, set the environmental variable VNS_PORT to the correct port before starting the service.</para>

-        </note>

-        

-        <para>To get the right port check the VNS output for something similar to the following:

-          

-          <programlisting>[10/6/04 3:44 PM | main] Serving on port : 9000</programlisting></para>

-        

-        <para>It is printed by the VNS on startup.</para>

-        

-      </section>

-    </section>

-    

-    <section id="ugr.tug.configuring_timeout_settings">

-      <title>Configuring Timeout Settings</title>

-      

-      <para>UIMA has several timeout specifications, summarized here.  The timeouts associated with remote 

-      services are discussed below.  In addition there are timeouts that can be specified for:

-      <itemizedlist>

-        

-        <listitem><para><emphasis role="bold">Acquiring an empty CAS from a CAS Pool:</emphasis>

-      See <xref linkend="ugr.tug.applications.multi_threaded"/>.</para></listitem>

-        

-        <listitem><para><emphasis role="bold">Reassembling chunks of a large document:</emphasis>

-        See <olink targetdoc="&uima_docs_ref;"/>

-            <olink targetdoc="&uima_docs_ref;" 

-                   targetptr="ugr.ref.xml.cpe_descriptor.descriptor.operational_parameters"/></para>

-        </listitem>

-      

-      </itemizedlist></para>

-      

-      <para>If your application uses remote UIMA services it is important to consider how to set the

-        <emphasis>timeout</emphasis> values appropriately. This is particularly important if your service can

-        take a long time to process each request.</para>

-      

-      <para>There are two types of timeout settings in UIMA, the <emphasis>client timeout</emphasis> and the

-        <emphasis>server socket timeout</emphasis>. The client timeout is usually the most important; it

-        specifies how long that client is willing to wait for the service to process each CAS. The client timeout can be

-        specified for Vinci. The server socket timeout (Vinci only) specifies how long the service

-        holds the connection open between calls from the client. After this amount of time, the server will presume

-        the client may have gone away - and it <quote>cleans up</quote>, releasing any resources it is holding. The

-        next call to process on the service will cause the client to re-establish its connection with the service

-        (some additional overhead).</para>

-      <section id="ugr.tug.setting_client_timeout">

-        <title>Setting the Client Timeout</title>

-        <para>The way to set the client timeout is different depending on what deployment mode you use in your CPE (if

-          any).</para>

-        

-        <para>If you are using the default <quote>integrated</quote> deployment mode in your CPE, or if you are not

-          using a CPE at all, then the client timeout is specified in your Service Client Descriptor (see <xref

-            linkend="ugr.tug.application.how_to_call_a_uima_service"/>). For example:</para>

-        

-        

-        <programlisting>&lt;uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">

-   &lt;resourceType>AnalysisEngine&lt;/resourceType>

-   &lt;uri>uima.annot.PersonTitleAnnotator&lt;/uri>

-   &lt;protocol>Vinci&lt;/protocol>

-   <emphasis role="bold-italic">&lt;timeout>60000&lt;/timeout></emphasis> 

-   &lt;parameters>

-     &lt;parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>

-     &lt;parameter name="VNS_PORT" value="9000"/>

-   &lt;/parameters>

-&lt;/uriSpecifier></programlisting>

-        

-        <para>The client timeout in this example is <literal>60000</literal>. This value specifies the number of

-          milliseconds that the client will wait for the service to respond to each request. In this example, the

-          client will wait for one minute.</para>

-        <para>If the service does not respond within this amount of time, processing of the current CAS will abort. If

-          you called the <literal>AnalysisEngine.process</literal> method directly from your application, an

-          Exception will be thrown. If you are running a CPE, what happens next is dependent on the error handling

-          settings in your CPE descriptor (see <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;"

-            targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling"/>

-          ). The default action is for the CPE to terminate, but you can override this. </para>

-        

-        <para>If you are using the <quote>managed</quote> or <quote>non-managed</quote> deployment mode in your

-          CPE, then the client timeout is specified in your CPE descriptor's <literal>errorHandling</literal>

-          element. For example:</para>

-        

-        

-        <programlisting><![CDATA[<errorHandling>

-  <maxConsecutiveRestarts .../>

-  <errorRateThreshold .../>

-  <timeout max="60000"/>

-</errorHandling>]]></programlisting>

-        

-        <para>As in the previous example, the client timeout is set to <literal>60000</literal>, and this

-          specifies the number of milliseconds that the client will wait for the service to respond to each

-          request.</para>

-        <para>If the service does not respond within the specified amount of time, the action is determined by the

-          settings for <literal>maxConsecutiveRestarts</literal> and

-          <literal>errorRateThreshold</literal>. These settings support such things as restarting the process

-          (for <quote>managed</quote> deployment mode), dropping and reestablishing the connection (for

-          <quote>non-managed</quote> deployment mode), and removing the offending service from the pipeline. See

-            <olink targetdoc="&uima_docs_ref;"/>

-            <olink targetdoc="&uima_docs_ref;"

-            targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling"/>

-          for details. </para>

-        

-        <para>Note that the client timeout does not apply to the <literal>GetMetaData</literal>

-          request that is made when the client first connects to the service.  This call is typically

-          very fast and does not need a large timeout (the default is 60 seconds).  However, if many

-          clients are competing for a small number of services, it may be necessary to increase this

-          value.  See <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"

-            targetptr="ugr.ref.xml.component_descriptor.service_client"/>.</para>

-      </section>

-      

-      <section id="ugr.tug.setting_server_socket_timeout">

-        <title>Setting the Server Socket Timeout</title>

-        <para>The Server Socket Timeout applies only to Vinci services, and is specified in the Vinci deployment

-          descriptor as discussed in section <xref

-            linkend="ugr.tug.application.how_to_deploy_a_vinci_service"/>. For example:

-          

-          <programlisting>&lt;deployment name="Vinci Person Title Annotator Service"&gt;

-

-  &lt;service name="uima.annotator.PersonTitleAnnotator" provider="vinci"&gt;

-

-    &lt;parameter name="resourceSpecifierPath" 

-      value="C:/Program Files/apache/uima/examples/descriptors/

-          analysis_engine/PersonTitleAnnotator.xml"/&gt;

-

-    &lt;parameter name="numInstances" value="1"/&gt;

-

-    &lt;parameter name="serverSocketTimeout" value=<emphasis role="bold-italic">"120000"</emphasis>/&gt;

-

-  &lt;/service&gt;

-

-&lt;/deployment&gt;</programlisting>

-         </para>

-        

-        <para>The server socket timeout here is set to <literal>120000</literal> milliseconds, or two minutes.

-          This parameter specifies how long the service will wait between requests to process something. After this

-          amount of time, the server will presume the client may have gone away - and it <quote>cleans up</quote>,

-          releasing any resources it is holding. The next call to process on the service will cause the client to

-          re-establish its connection with the service (some additional overhead). The service may print a

-          <quote>Read Timed Out</quote> message to the console when the server socket timeout elapses.</para>

-        

-        <para>In most cases, it is not a problem if the server socket timeout elapses. The client will simply

-          reconnect. However, if you notice <quote>Read Timed Out</quote> messages on your server console,

-          followed by other connection problems, it is possible that the client is having trouble reconnecting for

-          some reason. In this situation it may help increase the stability of your application if you increase the

-          server socket timeout so that it does not elapse during actual processing.</para>

-      </section>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.application.increasing_performance_using_parallelism">

-    <title>Increasing performance using parallelism</title>

-    

-    <para>There are several ways to exploit parallelism to increase performance in the UIMA Framework. These range

-      from running with additional threads within one Java virtual machine on one host (which might be a

-      multi-processor or hyper-threaded host) to deploying analysis engines on a set of remote machines.</para>

-    

-    <para>The Collection Processing facility in UIMA provides the ability to scale the pipe-line of analysis

-      engines. This scale-out runs multiple threads within the Java virtual machine running the CPM, one for each

-      pipe in the pipe-line. To activate it, in the <literal>&lt;casProcessors&gt;</literal> descriptor

-      element, set the attribute <literal>processingUnitThreadCount</literal>, which specifies the number of

-      replicated processing pipelines, to a value greater than 1, and ensure that the size of the CAS pool is equal to or

-      greater than this number (the attribute of <literal>&lt;casProcessors&gt;</literal> to set is

-      <literal>casPoolSize</literal>). For more details on these settings, see <olink

-        targetdoc="&uima_docs_ref;"/> <olink

-        targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors"/> .</para>

-    

-    <para>For deployments that incorporate remote analysis engines in the Collection Manager pipe-line, running

-      on multiple remote hosts, scale-out is supported through the Vinci Naming Service. If multiple instances of

-      a service with the same name, but running on different hosts, are registered with the Vinci Name Server, it will

-      assign these instances to incoming requests.</para>

-    

-    <para>There are two modes supported: a <quote>random</quote> assignment, and an <quote>exclusive</quote>

-      one. The <quote>random</quote> mode distributes load using an algorithm that selects a service instance at

-      random. The UIMA framework supports this only for the case where all of the instances are running on unique

-      hosts; the framework does not support starting 2 or more instances on the same host.</para>

-    

-    <para>The exclusive mode dedicates a particular remote instance to each Collection Manager pipe-line instance.

-      This mode is enabled by adding a configuration parameter in the

-      &lt;casProcessor&gt; section of the CPE descriptor:</para>

-    

-    

-    <literallayout>&lt;deploymentParameters&gt;

-  &lt;parameter name="service-access" value="exclusive" /&gt;

-&lt;/deploymentParameters&gt;</literallayout>

-    

-    <para>If this is not specified, the <quote>random</quote> mode is used.</para>
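The difference between the two modes can be modelled with a toy sketch. It is an illustration of the behaviour described above under simplifying assumptions, not how the CPM or Vinci implement it: `ServiceAccessSketch` and its methods are hypothetical names, and pinning pipeline number i to the i-th host is just one possible way to dedicate instances.

```java
import java.util.List;
import java.util.Random;

// Illustrative model only; NOT the CPM or Vinci implementation.
final class ServiceAccessSketch {

    // "random" mode: every request may be served by any registered instance.
    static String pickRandom(List<String> hosts, Random rng) {
        return hosts.get(rng.nextInt(hosts.size()));
    }

    // "exclusive" mode: each pipeline is pinned to one dedicated instance.
    // Assumption for this sketch: pipeline i always uses the i-th host.
    static String pickExclusive(List<String> hosts, int pipelineIndex) {
        if (pipelineIndex < 0 || pipelineIndex >= hosts.size()) {
            throw new IllegalStateException(
                "no dedicated instance available for pipeline " + pipelineIndex);
        }
        return hosts.get(pipelineIndex);
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("hostA", "hostB", "hostC");
        System.out.println("random:    " + pickRandom(hosts, new Random()));
        System.out.println("exclusive: " + pickExclusive(hosts, 1));
    }
}
```

One consequence the model makes visible: in exclusive mode there must be at least as many service instances as pipelines, whereas random mode works with any non-zero number of instances.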

-    

-    <para>In addition, remote UIMA engine services can be started with a parameter that specifies the number of

-      instances the service should support (see the <literal>&lt;parameter name="numInstances"&gt;</literal>

-      XML element in the remote deployment descriptor, <xref linkend="ugr.tug.application.remote_services"/>).

-      Specifying more than one causes the service wrapper for the analysis engine to use multi-threading (within the

-      single Java Virtual Machine &ndash; which can take advantage of multi-processor and hyper-threaded

-      architectures).</para> <note>

-    <para>When using Vinci in <quote>exclusive</quote> mode (see service access under <olink

-        targetdoc="&uima_docs_ref;"/> <olink

-        targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.deployment_parameters"/>

-      ), only one thread is used. To achieve multi-processing on a server in this case, use multiple instances of the

-      service, instead of multiple threads (see <xref

-        linkend="ugr.tug.application.how_to_deploy_a_vinci_service"/>).</para> </note>

-  </section>

-  

-  <section id="ugr.tug.application.jmx">

-    <title>Monitoring AE Performance using JMX</title>

-    

-    <para>As of version 2, UIMA supports remote monitoring of Analysis Engine performance via the Java Management

-      Extensions (JMX) API. JMX is a standard part of the Java Runtime Environment v5.0; there is also a reference

-      implementation available from Sun for Java 1.4. An introduction to JMX is available from Sun here: <ulink

-        url="http://java.sun.com/developer/technicalArticles/J2SE/jmx.html"/>. When you run a UIMA application with a

-      JVM that supports JMX, the UIMA framework will automatically detect the presence of JMX and will register

-      <emphasis>MBeans</emphasis> that provide access to the performance statistics.</para>

-    

-    <para>Note: The Sun JVM supports local monitoring; for others you can configure your

-      application for remote monitoring (even when on the same host) by specifying a unique port number, e.g.

-      <literal>

-      -Dcom.sun.management.jmxremote.port=1098

-      -Dcom.sun.management.jmxremote.authenticate=false

-      -Dcom.sun.management.jmxremote.ssl=false</literal></para>

-    

-    <para>Now, you can use any JMX client to view the statistics. JDK 5.0 or later provides a standard client that you can use.

-      Simply open a command prompt, make sure the JDK <literal>bin</literal> directory is in your path, and

-      execute the <literal>jconsole</literal> command. This should bring up a window allowing you to

-      select one of the local JMX-enabled applications currently running, or to enter a remote (or local) host and

-      port, e.g. localhost:1098.  The next screen will show a summary of

-      information about the Java process that you connected to. Click on the <quote>MBeans</quote> tab, then expand

-      <quote>org.apache.uima</quote> in the tree at the left. You should see a view like this:

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of JMX console monitoring UIMA components</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>Each of the nodes under <quote><literal>org.apache.uima</literal></quote> in the tree represents one

-      of the UIMA Analysis Engines in the application that you connected to. You can select one of the analysis engines

-      to view its performance statistics in the view at the right.</para>

-    

-    <para>Probably the most useful statistic is <quote>CASes Per Second</quote>, which is the number of CASes that

-      this AE has processed divided by the amount of time spent in the AE's process method, in seconds. Note that this is

-      the total elapsed time, not CPU time. Even so, it can be useful to compare the <quote>CASes Per Second</quote>

-      numbers of all of your Analysis Engines to discover where the bottlenecks occur in your application.</para>

-    

-    <para>The <literal>AnalysisTime</literal>, <literal>BatchProcessCompleteTime</literal>, and

-      <literal>CollectionProcessCompleteTime</literal> properties show the total elapsed time, in

-      milliseconds, that has been spent in the AnalysisEngine's <literal>process(), batchProcessComplete(),

-      </literal>and <literal>collectionProcessComplete()</literal> methods, respectively. (Note that for

-      CAS Multipliers, time spent in the <literal>hasNext()</literal> and <literal>next()</literal> methods is

-      also counted towards the AnalysisTime.)</para>
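As a worked example of the statistics above, the following sketch derives a "CASes Per Second" figure from a CAS count and an elapsed time in milliseconds. `CasThroughput` is a hypothetical helper written for this illustration; it is not the code UIMA uses to compute the MBean attribute.

```java
// Hypothetical helper, NOT part of UIMA: reproduces the arithmetic behind
// the "CASes Per Second" statistic from two raw numbers.
final class CasThroughput {

    // casesProcessed: number of CASes the AE has processed
    // analysisTimeMillis: total elapsed (wall-clock) time in process(), in ms
    static double casesPerSecond(long casesProcessed, long analysisTimeMillis) {
        if (analysisTimeMillis <= 0) {
            return 0.0; // nothing measured yet; avoid dividing by zero
        }
        double seconds = analysisTimeMillis / 1000.0;
        return casesProcessed / seconds;
    }

    public static void main(String[] args) {
        // 500 CASes in 20,000 ms of elapsed process() time
        System.out.println(casesPerSecond(500, 20_000));
    }
}
```

Because the input is elapsed time rather than CPU time, an engine that mostly waits on I/O or a remote service will show a low figure even if it uses little CPU.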

-    

-    <para>Note that once your UIMA application terminates, you can no longer view the statistics through the JMX

-      console. If you want to use JMX to view processes that have completed, you will need to write your application so

-      that the JVM remains running after processing completes, waiting for some user signal before

-      terminating.</para>

-    

-    <para>It is possible to override the default JMX MBean names UIMA uses, for

-      example to better organize the UIMA MBeans with respect to MBeans exposed by

-      other parts of your application.  This is done using the

-      <literal>AnalysisEngine.PARAM_MBEAN_NAME_PREFIX</literal> additional parameter 

-      when creating your AnalysisEngine:

-        

-        <programlisting>  //set up Map with custom JMX MBean name prefix

-  Map paramMap = new HashMap();

-  paramMap.put(AnalysisEngine.PARAM_MBEAN_NAME_PREFIX,

-               "org.myorg:category=MyApp");

-        

-  // create Analysis Engine

-  AnalysisEngine ae = 

-      UIMAFramework.produceAnalysisEngine(specifier, paramMap);

-</programlisting>    

-    </para>

-    <para>Similarly, you can use the <literal>AnalysisEngine.PARAM_MBEAN_SERVER</literal>

-      parameter to specify a particular instance of a JMX MBean Server with which UIMA

-      should register the MBeans.  If not specified, then the default is to register with

-      the platform MBeanServer (Java 5+ only).</para>
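To check from inside the application which MBeans ended up on the platform MBeanServer, the standard `javax.management` API can be queried directly. The sketch below assumes the default MBean names live in the `org.apache.uima` domain, as the jconsole tree described earlier suggests; `UimaMBeanLister` is a hypothetical class name, and the pattern needs to be adjusted if `PARAM_MBEAN_NAME_PREFIX` is used.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: lists MBeans in the org.apache.uima domain
// (assumed default domain) using only standard JMX calls.
final class UimaMBeanLister {

    static Set<ObjectName> uimaMBeans(MBeanServer server) {
        try {
            // "domain:*" matches every MBean registered in that domain
            return server.queryNames(new ObjectName("org.apache.uima:*"), null);
        } catch (MalformedObjectNameException e) {
            // Cannot happen for the constant pattern above
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        MBeanServer platform = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : uimaMBeans(platform)) {
            System.out.println(name);
        }
    }
}
```

In a JVM where no Analysis Engine has been created, the returned set is simply empty.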

-        

-    <para>More information on JMX can be found in the <ulink

-        url="http://java.sun.com/j2se/1.5.0/docs/api/javax/management/package-summary.html#package_description">

-      Java 5 documentation</ulink>.</para>    

-  </section>

-  

-  <section id="tug.application.pto">

-	  <title>Performance Tuning Options</title>

-	

-	  <para>

-	  	There are a small number of performance tuning options available to

-	  	influence the runtime behavior of UIMA applications. Performance

-	  	tuning options need to be set programmatically when an analysis

-	  	engine is created. You simply create a Java Properties object with

-	  	the relevant options and pass it to the UIMA framework on the call

-	  	to create an analysis engine. Below is an example.

-	  	

-	  	<programlisting>

-	  	  XMLParser parser = UIMAFramework.getXMLParser();

-	      ResourceSpecifier spec = parser.parseResourceSpecifier(

-	            new XMLInputSource(descriptorFile));

-	      // Create a new properties object to hold the settings.

-	      Properties performanceTuningSettings = new Properties();

-	      // Set the initial CAS heap size.

-	      performanceTuningSettings.setProperty(

-	            UIMAFramework.CAS_INITIAL_HEAP_SIZE, 

-	            "1000000");

-	      // Create a wrapper properties object that can

-	      // be passed to the framework.

-	      Properties additionalParams = new Properties();

-	      // Set the performance tuning properties as value to

-	      // the appropriate parameter.

-	      additionalParams.put(

-	            Resource.PARAM_PERFORMANCE_TUNING_SETTINGS, 

-	            performanceTuningSettings);

-	      // Create the analysis engine with the parameters.

-	      // The second, unused argument here is a custom 

-	      // resource manager.

-	      this.ae = UIMAFramework.produceAnalysisEngine(

-	          spec, null, additionalParams);

-	  	

-	  	</programlisting>

-	  </para>

-  

-	  <para>

-		  The following options are supported:

-		  <itemizedlist>

-        <!-- not used in v3

-		    <listitem>

-				  <para><literal>UIMAFramework.CAS_INITIAL_HEAP_SIZE</literal>:  

-            This is only used to help initialize the sizes of some internal tables.

-            You can leave it unspecified; but if you know the approximate number of feature structures

-            a pipeline will have, you can specify it.

-				  </para>

-		   </listitem>

-        -->

-		    <listitem>

-				  <para><literal>UIMAFramework.PROCESS_TRACE_ENABLED</literal>: enable the process trace mechanism

-				  (true/false).  When enabled, UIMA tracks the time spent in individual components of an aggregate 

-				  AE or CPE.  For more information, see the API documentation of 

-				  <literal>org.apache.uima.util.ProcessTrace</literal>.

-				  </para>

-		   </listitem>

-		   <listitem>

-			   <para><literal>UIMAFramework.SOCKET_KEEPALIVE_ENABLED</literal>: enable socket KeepAlive

-			   (true/false).  This setting is currently only supported by Vinci clients.  Defaults to 

-			   <literal>true</literal>.

-		   </para>

-		  </listitem>

-		 </itemizedlist>

-		</para>

-  

-  </section>

-    

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cas_multiplier.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cas_multiplier.xml
deleted file mode 100644
index e788838..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cas_multiplier.xml
+++ /dev/null
@@ -1,847 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.cas_multiplier/">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.cm">

-  <title>CAS Multiplier Developer&apos;s Guide</title>

-  <titleabbrev>CAS Multiplier</titleabbrev>

-  

-  <para>The UIMA analysis components (Annotators and CAS Consumers) described previously in this manual all take a

-    single CAS as input, optionally make modifications to it, and output that same CAS. This chapter describes an

-    advanced feature that became available in the UIMA SDK v2.0: a new type of analysis component called a

-    <emphasis>CAS Multiplier</emphasis>, which can create new CASes during processing.</para>

-  

-  <para>CAS Multipliers are often used to split a large artifact into manageable pieces. This is a common requirement

-    of audio and video analysis applications, but can also occur in text analysis on very large documents. A CAS

-    Multiplier would take as input a single CAS representing the large artifact (perhaps by a remote reference to the

-    actual data &mdash; see <olink targetdoc="&uima_docs_tutorial_guides;"

-      targetptr="ugr.tug.aas.sofa_data_formats"/>) and produce as output a series of new CASes each of which

-    contains only a small portion of the original artifact.</para>

-  

-  <para>CAS Multipliers are not limited to dividing an artifact into smaller pieces, however. A CAS Multiplier can

-    also be used to combine smaller segments together to form larger segments. In general, a CAS Multiplier is used to

-    <emphasis>change</emphasis> the segmentation of a series of CASes; that is, to change how a stream of data is

-    divided among discrete CAS objects.</para>

-  

-  <section id="ugr.tug.cm.developing_multiplier_code">

-    <title>Developing the CAS Multiplier Code</title>

-    

-    <section id="ugr.tug.cm.cm_interface_overview">

-      <title>CAS Multiplier Interface Overview</title>

-      

-      <para>CAS Multiplier implementations should extend from the

-        <literal>JCasMultiplier_ImplBase</literal> or <literal>CasMultiplier_ImplBase</literal>

-        classes, depending on which CAS interface they prefer to use. As with other types of analysis components, the

-        CAS Multiplier ImplBase classes define optional <literal>initialize</literal>,

-        <literal>destroy</literal>, and <literal>reconfigure</literal> methods. There are then three

-        required methods: <literal>process</literal>, <literal>hasNext</literal>, and

-        <literal>next</literal>. The framework interacts with these methods as follows:</para>

-      

-      <orderedlist>

-        <listitem>

-          <para>The framework calls the CAS Multiplier&apos;s <literal>process</literal> method, passing it an

-            input CAS. The process method returns, but may hold on to a reference to the input CAS.</para>

-        </listitem>

-        

-        <listitem>

-          <para>The framework then calls the CAS Multiplier&apos;s <literal>hasNext</literal> method. The CAS

-            Multiplier should return <literal>true</literal> from this method if it intends to output one or more

-            new CASes (for instance, segments of this CAS), and <literal>false</literal> if not.</para>

-        </listitem>

-        

-        <listitem>

-          <para>If <literal>hasNext</literal> returned true, the framework will call the CAS Multiplier&apos;s

-            <literal>next</literal> method. The CAS Multiplier creates a new CAS (we will see how in a moment),

-            populates it, and returns it from the <literal>next</literal> method.</para>

-        </listitem>

-        

-        <listitem>

-          <para>Steps 2 and 3 continue until <literal>hasNext</literal> returns false. If the

-          framework detects a situation where it needs to cancel this CAS Multiplier, it will stop

-          calling the <literal>hasNext</literal> and <literal>next</literal> methods, and when

-          another top-level CAS comes along it will call the annotator's <literal>process</literal> method again.

-          The user's annotator code should interpret this as a signal to clean up

-          processing related to the previous CAS and then start processing with the new CAS.</para>

-        </listitem>

-      </orderedlist>

-      

-      <para>From the time when <literal>process</literal> is called until the <literal>hasNext</literal>

-        method returns false (or <literal>process</literal> is called again), 

-        the CAS Multiplier <quote>owns</quote> the CAS that was passed to its

-        <literal>process</literal> method. The CAS Multiplier can store a reference to this CAS in a local field and

-        can read from it or write to it during this time. Once the ending condition occurs, the CAS

-        Multiplier gives up ownership of the input CAS and should no longer retain a reference to it.</para>

-    </section>

-    

-    <section id="ugr.tug.cm.how_to_get_empty_cas_instance">

-      <title>How to Get an Empty CAS Instance</title>

-      <titleabbrev>Getting an empty CAS Instance</titleabbrev>

-      

-      <para>The CAS Multiplier&apos;s <literal>next</literal> method must return a CAS instance that represents

-        a new representation of the input artifact. Since CAS instances are managed by the framework, the CAS

-        Multiplier cannot actually create a new CAS; instead it should request an empty CAS by calling the method:

-        

-        <programlisting>CAS getEmptyCAS()

-

-or

-

-JCas getEmptyJCas()</programlisting> which are

-        defined on the <literal>CasMultiplier_ImplBase</literal> and

-        <literal>JCasMultiplier_ImplBase</literal> classes, respectively.</para>

-      

-      <para>Note that if it is more convenient you can request an empty CAS during the <literal>process</literal> or

-        <literal>hasNext</literal> methods, not just during the <literal>next</literal> method.</para>

-      

-      <para>By default, a CAS Multiplier is only allowed to hold one output CAS instance at a time. You must return the

-        CAS from the <literal>next</literal> method before you can request a second CAS. If you try to call

-        getEmptyCAS a second time you will get an Exception. You can change this default behavior by overriding the

-        method <literal>getCasInstancesRequired</literal> to return the number of CAS instances that you need.

-        Be aware that CAS instances consume a significant amount of memory, so setting this to a large value will cause

-        your application to use a lot of RAM. So, for example, it is not a good practice to attempt to generate a large

-        number of new CASes in the CAS Multiplier&apos;s <literal>process</literal> method. Instead, you should

-        spread your processing out across the calls to the <literal>hasNext</literal> or

-        <literal>next</literal> methods.</para>

-      

-      <note><para>You can only call <literal>getEmptyCAS()</literal> or <literal>getEmptyJCas()</literal>

-        from your CAS Multiplier's <literal>process</literal>, <literal>hasNext</literal>, or

-        <literal>next</literal> methods.  You cannot call it from other methods such as 

-        <literal>initialize</literal>.  This is because the Aggregate AE's Type System is not available

-        until all of the components of the aggregate have finished their initialization.

-      </para></note>

-      

-      <para>The Type System of the empty CAS will contain all of the type definitions for all 

-        components of the outermost Aggregate Analysis Engine or Collection Processing Engine

-        that contains your CAS Multiplier.  Therefore downstream components that receive 

-        these CASes can add new instances of any type that they define.</para>

-                

-      <warning><para>Be careful to keep the Feature Structures that belong to each CAS separate.  You 

-        cannot create references from a Feature Structure in one CAS to a Feature Structure in another CAS.

-        You also cannot add a Feature Structure created in one CAS to the indexes of a different CAS.  

-        If you attempt to do this, the results are undefined.      

-      </para>        

-      </warning>

-    </section>

-    

-    <section id="ugr.tug.cm.example_code">

-      <title>Example Code</title>

-      

-      <para>This section walks through the source code of an example CAS Multiplier that breaks text documents into

-        smaller pieces. The Java class for the example is

-        <literal>org.apache.uima.examples.casMultiplier.SimpleTextSegmenter</literal> and the source

-        code is included in the UIMA SDK under the <literal>examples/src</literal> directory.</para>

-      

-      <section id="ugr.tug.cm.example_code.overall_structure">

-        <title>Overall Structure</title>

-        

-        

-        <programlisting>public class SimpleTextSegmenter extends JCasMultiplier_ImplBase {

-  private String mDoc;

-  private int mPos;

-  private int mSegmentSize;

-  private String mDocUri;  

-  

-  public void initialize(UimaContext aContext) 

-          throws ResourceInitializationException

-  { ... }

-

-  public void process(JCas aJCas) throws AnalysisEngineProcessException

-  { ... }

-

-  public boolean hasNext() throws AnalysisEngineProcessException

-  { ... }

-

-  public AbstractCas next() throws AnalysisEngineProcessException

-  { ... }

-}</programlisting>

-        

-        <para>The <literal>SimpleTextSegmenter</literal> class extends

-          <literal>JCasMultiplier_ImplBase</literal> and implements the optional

-          <literal>initialize</literal> method as well as the required <literal>process</literal>,

-          <literal>hasNext</literal>, and <literal>next</literal> methods. Each method is described

-          below.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.cm.example_code.initialize">

-        <title>Initialize Method</title>

-        

-        

-        <programlisting>public void initialize(UimaContext aContext) throws

-                    ResourceInitializationException {

-  super.initialize(aContext);

-  mSegmentSize = ((Integer)aContext.getConfigParameterValue(

-                            "segmentSize")).intValue();

-}</programlisting>

-        

-        <para>Like an Annotator, a CAS Multiplier can override the initialize method and read configuration

-          parameter values from the UimaContext. The SimpleTextSegmenter defines one parameter,

-          <literal>segmentSize</literal>, which determines the approximate size (in characters) of each segment that it will

-          produce.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.cm.example_code.process">

-        <title>Process Method</title>

-        

-        

-        <programlisting>public void process(JCas aJCas) 

-       throws AnalysisEngineProcessException {

-  mDoc = aJCas.getDocumentText();

-  mPos = 0;

-  // retrieve the filename of the input file from the CAS so that it can 

-  // be added to each segment

-  FSIterator it = aJCas.

-          getAnnotationIndex(SourceDocumentInformation.type).iterator();

-  if (it.hasNext()) {

-    SourceDocumentInformation fileLoc = 

-          (SourceDocumentInformation)it.next();

-    mDocUri = fileLoc.getUri();

-  }

-  else {

-    mDocUri = null;

-  }

- }</programlisting>

-        

-        <para>The process method receives a new JCas to be processed (segmented) by this CAS Multiplier. The

-          SimpleTextSegmenter extracts some information from this JCas and stores it in fields (the document text

-          is stored in the field mDoc and the source URI in the field mDocUri). Recall that the CAS Multiplier is

-          considered to <quote>own</quote> the JCas from the time when process is called until the time when hasNext

-          returns false. Therefore it is acceptable to retain references to objects from the JCas in a CAS

-          Multiplier, whereas this should never be done in an Annotator. The CAS Multiplier could have chosen to

-          store a reference to the JCas itself, but that was not necessary for this example.</para>

-        

-        <para>The CAS Multiplier also initializes the mPos variable to 0. This variable is a position into the

-          document text and will be incremented as each new segment is produced.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.cm.example_code.hasnext">

-        <title>HasNext Method</title>

-        

-        

-        <programlisting>public boolean hasNext() throws AnalysisEngineProcessException {

-  return mPos &lt; mDoc.length();

-}</programlisting>

-        

-        <para>The job of the hasNext method is to report whether there are any additional output CASes to produce. For

-          this example, the CAS Multiplier will break the entire input document into segments, so we know there will

-          always be a next segment until the very end of the document has been reached.</para>

-        

-      </section>

-      

-      <section id="ugr.tug.cm.example_code.next">

-        <title>Next Method</title>

-        

-        

-        <programlisting>public AbstractCas next() throws AnalysisEngineProcessException {

-  int breakAt = mPos + mSegmentSize;

-  if (breakAt > mDoc.length())

-    breakAt = mDoc.length();

-          

-  // search for the next newline character. 

-  // Note: this example segmenter implementation

-  // assumes that the document contains many newlines. 

-  // In the worst case, if this segmenter

-  // is run on a document with no newlines, 

-  // it will produce only one segment containing the

-  // entire document text. 

-  // A better implementation might specify a maximum segment size as

-  // well as a minimum.

-          

-  while (breakAt &lt; mDoc.length() &amp;&amp; 

-         mDoc.charAt(breakAt - 1) != '\n')

-    breakAt++;

-

-  JCas jcas = getEmptyJCas();

-  try {

-    jcas.setDocumentText(mDoc.substring(mPos, breakAt));

-    // if original CAS had SourceDocumentInformation, 

-    // also add SourceDocumentInformation

-    // to each segment

-    if (mDocUri != null) {

-      SourceDocumentInformation sdi = 

-          new SourceDocumentInformation(jcas);

-      sdi.setUri(mDocUri);

-      sdi.setOffsetInSource(mPos);

-      sdi.setDocumentSize(breakAt - mPos);

-      sdi.addToIndexes();

-

-      if (breakAt == mDoc.length()) {

-        sdi.setLastSegment(true);

-      }

-    }

-

-    mPos = breakAt;

-    return jcas;

-  } catch (Exception e) {

-    jcas.release();

-    throw new AnalysisEngineProcessException(e);

-  }

-}</programlisting>

-        

-        <para>The <literal>next</literal> method actually produces the next segment and returns it. The

-          framework guarantees that it will not call <literal>next</literal> unless

-          <literal>hasNext</literal> has returned true since the last call to <literal>process</literal> or

-          <literal>next</literal>.</para>

-        

-        <para>Note that in order to produce a segment, the CAS Multiplier must get an empty JCas to populate. This is

-          done by the line:</para>

-        

-        <programlisting>JCas jcas = getEmptyJCas();</programlisting>

-        

-        <para>This requests an empty JCas from the framework, which maintains a pool of JCas instances to draw

-          from.</para>

-        

-        <para>Also, note the use of the <literal>try...catch</literal> block to ensure that a JCas is released back

-          to the pool if an exception occurs. This is very important to allow a CAS Multiplier to recover from

-          errors.</para>

-        

-      </section>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cm.creating_cm_descriptor">

-    <title>Creating the CAS Multiplier Descriptor</title>

-    <titleabbrev>CAS Multiplier Descriptor</titleabbrev>

-    

-    <para>There is not a separate type of descriptor for a CAS Multiplier. CAS Multipliers are considered a type of

-      Analysis Engine, and so their descriptors use the same syntax as any other Analysis Engine Descriptor.</para>

-    

-    <para>The descriptor for the <literal>SimpleTextSegmenter</literal> is located at

-      <literal>examples/descriptors/cas_multiplier/SimpleTextSegmenter.xml</literal> in the

-      UIMA SDK.</para>

-    

-    <para>The Analysis Engine Description, in its <quote>Operational Properties</quote> section, now contains a

-      new <quote>outputsNewCASes</quote> property which takes a Boolean value. If the Analysis Engine is a CAS

-      Multiplier, this property should be set to true.</para>

-    

-    <para>If you use the CDE, be sure to check the <quote>Outputs new CASes</quote> box in the Runtime Information

-      section on the Overview page, as shown here:

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.2in" align="center" format="JPG" fileref="&imgroot;image002.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screen shot of Component Descriptor Editor on Overview 

-        showing checking of "Outputs new CASes" box</phrase>       

-      </textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>If you edit the Analysis Engine Descriptor by hand, you need to add a

-      <literal>&lt;outputsNewCASes&gt;</literal> element to your descriptor as shown here:</para>

-    

-    

-    <programlisting>&lt;operationalProperties&gt;

-    &lt;modifiesCas&gt;false&lt;/modifiesCas&gt;

-    &lt;multipleDeploymentAllowed&gt;true&lt;/multipleDeploymentAllowed&gt;

-    <emphasis role="bold">&lt;outputsNewCASes&gt;true&lt;/outputsNewCASes&gt;</emphasis>

-  &lt;/operationalProperties&gt;</programlisting>

-    <note>

-    <para>The <quote>modifiesCas</quote> operational property refers to the input CAS, not the new output CASes

-      produced. So our example SimpleTextSegmenter has modifiesCas set to false since it doesn&apos;t modify the

-      input CAS. </para></note>

-    

-  </section>

-  

-  <section id="ugr.tug.cm.using_cm_in_aae">

-    <title>Using a CAS Multiplier in an Aggregate Analysis Engine</title>

-    <titleabbrev>Using CAS Multipliers in Aggregates</titleabbrev>

-    

-    <para>You can include a CAS Multiplier as a component in an Aggregate Analysis Engine. For example, this allows

-      you to construct an Aggregate Analysis Engine that takes each input CAS, breaks it up into segments, and runs a

-      series of Annotators on each segment.</para>

-    

-    <section id="ugr.tug.cm.adding_cm_to_aggregate">

-      <title>Adding the CAS Multiplier to the Aggregate</title>

-      <titleabbrev>Aggregate: Adding the CAS Multiplier</titleabbrev>

-      

-      <para>Since CAS Multipliers are considered a type of Analysis Engine, adding them to an aggregate works the same

-        way as for other Analysis Engines. Using the CDE, you just click the <quote>Add...</quote> button in the

-        Component Engines view and browse to the Analysis Engine Descriptor of your CAS Multiplier. If editing the

-        aggregate descriptor directly, just <literal>import</literal> the Analysis Engine Descriptor of your

-        CAS Multiplier as usual.</para>

-      

-      <para>An example descriptor for an Aggregate Analysis Engine containing a CAS Multiplier is provided in

-        <literal>examples/descriptors/cas_multiplier/SegmenterAndTokenizerAE.xml</literal>. This

-        Aggregate runs the <literal>SimpleTextSegmenter</literal> example to break a large document into

-        segments, and then runs each segment through the <literal>SimpleTokenAndSentenceAnnotator</literal>.

-        Try running it in the Document Analyzer tool with a large text file as input, to see that it outputs multiple

-        output CASes, one for each segment produced by the <literal>SimpleTextSegmenter</literal>.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.cm.cm_and_fc">

-      <title>CAS Multipliers and Flow Control</title>

-      

-      <para>CAS Multipliers are only supported in the context of Fixed Flow or custom Flow Control. If you use the

-        built-in <quote>Fixed Flow</quote> for your Aggregate Analysis Engine, you can position the CAS

-        Multiplier anywhere in that flow. Processing then works as follows: When a CAS is input to the Aggregate AE,

-        that CAS is routed to the components in the order specified by the Fixed Flow, until that CAS reaches a CAS

-        Multiplier.</para>

-      

-      <para>Upon reaching a CAS Multiplier, if that CAS Multiplier produces new output CASes, then each output CAS

-        from that CAS Multiplier will continue through the flow, starting at the node immediately after the CAS

-        Multiplier in the Fixed Flow. No further processing will be done on the original input CAS after it has reached

-        a CAS Multiplier &ndash; it will <emphasis>not</emphasis> continue in the flow.</para>

-      

-      <para>If the CAS Multiplier does <emphasis>not</emphasis> produce any output CASes for a given input CAS,

-        then that input CAS <emphasis>will</emphasis> continue in the flow. This behavior is appropriate, for

-        example, for a CAS Multiplier that may segment an input CAS into pieces but only does so if the input CAS is

-        larger than a certain size.</para>

-      

-      <para>It is possible to put more than one CAS Multiplier in your flow. In this case, when a new CAS output from the

-        first CAS Multiplier reaches the second CAS Multiplier, and the second CAS Multiplier produces output

-        CASes from it, no further processing will occur on that CAS, and any new output CASes produced by the second

-        CAS Multiplier will continue in the flow, starting at the node after the second CAS Multiplier.</para>

-      

-      <para>This default behavior can be customized. The <literal>FixedFlowController</literal> component

-        that implements UIMA&apos;s default flow defines a configuration parameter

-        <literal>ActionAfterCasMultiplier</literal> that can take the following values:</para>

-      <itemizedlist>

-        <listitem>

-          <para><literal>continue</literal> &ndash; the CAS continues on to the next element in the flow</para>

-        </listitem>

-        <listitem>

-          <para><literal>stop</literal> &ndash; the CAS will no longer continue in the flow, and will be returned

-            from the aggregate if possible.</para>

-        </listitem>

-        <listitem>

-          <para><literal>drop</literal> &ndash; the CAS will no longer continue in the flow, and will be dropped

-            (not returned from the aggregate) if possible.</para>

-        </listitem>

-        <listitem>

-          <para><literal>dropIfNewCasProduced</literal> (the default) &ndash; if the CAS multiplier produced

-            a new CAS as a result of processing this CAS, then this CAS will be dropped. If not, then this CAS will

-            continue.</para>

-        </listitem>

-      </itemizedlist>

-      

-      <para>You can override this parameter in your Aggregate Analysis Engine the same way you would override a

-        parameter in a delegate Analysis Engine. But to do so you must first explicitly identify that you are using the

-        <literal>FixedFlowController</literal> implementation by importing its descriptor into your

-        aggregate as follows:</para>

-      

-      

-      <programlisting>&lt;flowController key="FixedFlowController">

-          &lt;import name="org.apache.uima.flow.FixedFlowController"/>

-        &lt;/flowController>      </programlisting>

-      

-      <para>The parameter could then be overridden as, for example:</para>

-      

-      

-      <programlisting>&lt;configurationParameters>

-          &lt;configurationParameter>

-            &lt;name>ActionForIntermediateSegments&lt;/name>

-            &lt;type>String&lt;/type>

-            &lt;multiValued>false&lt;/multiValued>

-            &lt;mandatory>false&lt;/mandatory>

-            &lt;overrides>

-              &lt;parameter>

-                FixedFlowController/ActionAfterCasMultiplier

-              &lt;/parameter>

-            &lt;/overrides>

-          &lt;/configurationParameter>   

-        &lt;/configurationParameters>

-  

-       &lt;configurationParameterSettings>

-         &lt;nameValuePair>

-           &lt;name>ActionForIntermediateSegments&lt;/name>

-           &lt;value>

-             &lt;string>drop&lt;/string>

-           &lt;/value>

-         &lt;/nameValuePair>

-       &lt;/configurationParameterSettings></programlisting>

-      

-      <para>This overriding can also be done using the Component Descriptor Editor tool. An example of an Analysis

-        Engine that overrides this parameter can be found in

-        <literal>examples/descriptors/cas_multiplier/Segment_Annotate_Merge_AE.xml</literal>. For more

-        information about how to specify a flow controller as part of your Aggregate Analysis Engine descriptor, see

-          <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc.adding_fc_to_aggregate"/>.</para>

-      

-      <para>If you would like to further customize the flow, you will need to implement a custom FlowController as

-        described in <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc"/>. For example,

-        you could implement a flow where a CAS that is input to a CAS Multiplier will be processed further by

-        <emphasis>some</emphasis> downstream components, but not others.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.cm.aggregate_cms">

-      <title>Aggregate CAS Multipliers</title>

-      

-      <para>An important consideration when you put a CAS Multiplier inside an Aggregate Analysis Engine is whether

-        you want the Aggregate to also function as a CAS Multiplier

-        &ndash; that is, whether you want the new output CASes produced within the Aggregate to be output from the

-        Aggregate. This is controlled by the <literal>&lt;outputsNewCASes&gt;</literal> element in the

-        Operational Properties of your Aggregate Analysis Engine descriptor. The syntax is the same as what was

-        described in <xref linkend="ugr.tug.cm.creating_cm_descriptor"/>.</para>

-      

-      <para>If you set this property to <literal>true</literal>, then any new output CASes produced by a CAS

-        Multiplier inside this Aggregate will be output from the Aggregate. Thus the Aggregate will function as a CAS

-        Multiplier and can be used in any of the ways in which a primitive CAS Multiplier can be used.</para>

-      

-      <para>If you set the &lt;outputsNewCASes&gt; property to <literal>false</literal>, then any new output

-        CASes produced by a CAS Multiplier inside the Aggregate will be dropped (i.e. the CASes will be released back

-        to the pool) once they have finished being processed. Such an Aggregate Analysis Engine functions just like a

-        <quote>normal</quote> non-CAS-Multiplier Analysis Engine; the fact that CAS Multiplication is

-        occurring inside it is hidden from users of that Analysis Engine.</para> <note>

-      <para>If you want to output some new Output CASes and not others, you need to implement a custom Flow Controller

-        that makes this decision &mdash; see <olink targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.fc.using_fc_with_cas_multipliers"/>. </para> </note>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cm.using_cm_in_cpe">

-    <title>Using a CAS Multiplier in a Collection Processing Engine</title>

-    <titleabbrev>CAS Multipliers in CPEs</titleabbrev>

-    

-    <para>Currently, a CAS Multiplier cannot be deployed directly in a Collection Processing

-      Engine. The only way that you can use a CAS Multiplier in a CPE is to first wrap it in an Aggregate Analysis Engine

-      whose <literal>outputsNewCASes</literal> property is set to <literal>false</literal>, which in effect

-      hides the existence of the CAS Multiplier from the CPE.</para>

-    

-    <para>Note that you can build an Aggregate Analysis Engine that consists of CAS Multipliers and Annotators,

-      followed by CAS Consumers. This can simulate what a CPE would do, but without the deployment and error handling

-      options that the CPE provides.</para>

-    

-  </section>

-  

-  <section id="ugr.tug.cm.calling_cm_from_app">

-    <title>Calling a CAS Multiplier from an Application</title>

-    <titleabbrev>Applications: Calling CAS Multipliers</titleabbrev>    

-    

-    <section id="ugr.tug.cm.retrieving_output_cases">

-      <title>Retrieving Output CASes from the CAS Multiplier</title>

-      <titleabbrev>Output CASes</titleabbrev>

-      <para>The <literal>AnalysisEngine</literal> interface has the following methods that allow you to

-        interact with a CAS Multiplier:

-        <itemizedlist>

-          <listitem>

-            <para><literal>CasIterator processAndOutputNewCASes(CAS)</literal></para>

-          </listitem>

-          <listitem>

-            <para><literal>JCasIterator processAndOutputNewCASes(JCas)</literal></para>

-          </listitem>

-        </itemizedlist></para>

-      

-      <para>From your application, you call <literal>processAndOutputNewCASes</literal> and pass it the input

-        CAS. An iterator is returned that allows you to step through each of the new output CASes that are produced by

-        the Analysis Engine.</para>

-      

-      <para>It is very important to realize that CASes are pooled objects and so your application must release each

-        CAS (by calling the <literal>CAS.release()</literal> method) that it obtains from the CasIterator

-        <emphasis>before</emphasis> it calls the <literal>CasIterator.next</literal> method again.

-        Otherwise, the CAS pool will be exhausted and a deadlock will occur.</para>

-      

-      <para>The example code in the class

-        <literal>org.apache.uima.examples.casMultiplier.CasMultiplierExampleApplication</literal> illustrates this. Here is the main processing loop:</para>

-      

-      

-      <programlisting>CasIterator casIterator = ae.processAndOutputNewCASes(initialCas);

-while (casIterator.hasNext()) {

-  CAS outCas = casIterator.next();

-

-  //dump the document text and annotations for this segment

-  System.out.println("********* NEW SEGMENT *********");

-  System.out.println(outCas.getDocumentText());

-  PrintAnnotations.printAnnotations(outCas, System.out); 

-

-  //release the CAS (important)

-  outCas.release();

-}</programlisting>

-      

-      <para>Note that as defined by the CAS Multiplier contract in <xref

-          linkend="ugr.tug.cm.cm_interface_overview"/>, the CAS Multiplier owns the input CAS

-        (<literal>initialCas</literal> in the example) until the last new output CAS has been produced. This means

-        that the application should not try to make changes to <literal>initialCas</literal> until after the

-        <literal>CasIterator.hasNext</literal> method has returned false, indicating that the segmenter has

-        finished.</para>

-      

-      <para>Note that the processing time of the Analysis Engine is spread out over the calls to the

-        <literal>CasIterator&apos;s hasNext</literal> and <literal>next</literal> methods. That is, the next

-        output CAS may not actually be produced and annotated until the application asks for it. So the application

-        should not expect calls to the <literal>CasIterator</literal> to necessarily complete quickly.</para>

-      

-      <para>Also, calls to the <literal>CasIterator</literal> may throw Exceptions indicating an error has

-        occurred during processing. If an Exception is thrown, all processing of the input CAS will stop, and no more

-        output CASes will be produced. There is currently no error recovery mechanism that will allow processing to

-        continue after an exception.</para>

-                

-    </section>

-    <section id="ugr.tug.cm.using_cm_with_other_aes">

-      <title>Using a CAS Multiplier with other Analysis Engines</title> 

-      <titleabbrev>CAS Multipliers with other AEs</titleabbrev>     

-      <para>In your application you can take the output CASes from a CAS Multiplier and pass them to

-        the <literal>process</literal> method of other Analysis Engines.  However there are some

-        special considerations regarding the Type System of these CASes.</para>

-      <para>By default, the output CASes of a CAS Multiplier will have a Type System that contains all

-        of the types and features declared by any component in the outermost Aggregate Analysis Engine or

-        Collection Processing Engine that contains the CAS Multiplier.  If in your application you

-        create a CAS Multiplier and another Analysis Engine, where these are not enclosed in an aggregate,

-        then the output CASes from the CAS Multiplier will not support any types or features that are 

-        declared in the latter Analysis Engine but not in the CAS Multiplier.

-      </para>

-      <para>This can be remedied by forcing the CAS Multiplier and Analysis Engine to share a single

-        <literal>UimaContext</literal> when they are created, as follows:

-      <programlisting>//create a "root" UIMA context for your whole application

-

-UimaContextAdmin rootContext =

-   UIMAFramework.newUimaContext(UIMAFramework.getLogger(),

-      UIMAFramework.newDefaultResourceManager(),

-      UIMAFramework.newConfigurationManager());

-

-XMLInputSource input = new XMLInputSource("MyCasMultiplier.xml");

-AnalysisEngineDescription desc = UIMAFramework.getXMLParser().

-        parseAnalysisEngineDescription(input);

- 

-//create a UIMA Context for the new AE we are about to create

-

-//first argument is unique key among all AEs used in the application

-UimaContextAdmin childContext = rootContext.createChild(

-        "myCasMultiplier", Collections.EMPTY_MAP);

-

-//instantiate CAS Multiplier AE, passing the UIMA Context through the 

-//additional parameters map

-

-Map additionalParams = new HashMap();

-additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);

-

-AnalysisEngine casMultiplierAE = UIMAFramework.produceAnalysisEngine(

-        desc,additionalParams);

-

-//repeat for another AE      

-XMLInputSource input2 = new XMLInputSource("MyAE.xml");

-AnalysisEngineDescription desc2 = UIMAFramework.getXMLParser().

-        parseAnalysisEngineDescription(input2);

- 

-UimaContextAdmin childContext2 = rootContext.createChild(

-        "myAE", Collections.EMPTY_MAP);

-

-Map additionalParams2 = new HashMap();

-additionalParams2.put(Resource.PARAM_UIMA_CONTEXT, childContext2);

-

-AnalysisEngine myAE = UIMAFramework.produceAnalysisEngine(

-        desc2, additionalParams2);</programlisting>

-        

-      </para>

-    </section>

-    

-  </section>

-  

-  <section id="ugr.tug.cm.using_cm_to_merge_cases">

-    <title>Using a CAS Multiplier to Merge CASes</title>

-    <titleabbrev>Merging with CAS Multipliers</titleabbrev>    

-    

-    <para>A CAS Multiplier can also be used to combine smaller CASes together to form larger CASes. In this section we

-      describe how this works and walk through an example.</para>

-    

-    <section id="ugr.tug.cm.overview_of_how_to_merge_cases">

-      <title>Overview of How to Merge CASes</title>

-      <titleabbrev>CAS Merging Overview</titleabbrev>      

-      

-      <orderedlist>

-        <listitem>

-          <para>When the framework first calls the CAS Multiplier&apos;s <literal>process</literal> method,

-            the CAS Multiplier requests an empty CAS (which we'll call the "merged CAS") and copies relevant data

-            from the input CAS into the merged CAS. The class

-            <literal>org.apache.uima.util.CasCopier</literal> provides utilities for copying Feature

-            Structures between CASes.</para>

-        </listitem>

-        

-        <listitem>

-          <para>When the framework then calls the CAS Multiplier&apos;s <literal>hasNext</literal> method, the

-            CAS Multiplier returns <literal>false</literal> to indicate that it has no output at this

-            time.</para>

-        </listitem>

-        

-        <listitem>

-          <para>When the framework calls <literal>process</literal> again with a new input CAS, the CAS

-            Multiplier copies data from that input CAS into the merged CAS, combining it with the data that was

-            previously copied.</para>

-        </listitem>

-        

-        <listitem>

-          <para>Eventually, when the CAS Multiplier decides that it wants to output the merged CAS, it returns

-            <literal>true</literal> from the <literal>hasNext</literal> method, and then when the framework

-            subsequently calls the <literal>next</literal> method, the CAS Multiplier returns the merged

-            CAS.</para>

-        </listitem>

-      </orderedlist> <note>

-      <para>There is no explicit call to flush out any pending CASes from a CAS Multiplier when collection processing

-        completes. It is up to the application to provide some mechanism to let a CAS Multiplier recognize the last CAS

-        in a collection so that it can ensure that its final output CASes are complete.</para></note>

-    </section>

-    <section id="ugr.tug.cm.example_cas_merger">

-      <title>Example CAS Merger</title>

-      <para>An example CAS Multiplier that merges CASes is provided in the UIMA SDK. The Java class for

-        this example is <literal>org.apache.uima.examples.casMultiplier.SimpleTextMerger</literal> and

-        the source code is located under the <literal>examples/src</literal> directory.</para>

-      <section id="ugr.tug.cm.example_cas_merger.process">

-        <title>Process Method</title>

-        <para>Almost all of the code for this example is in the <literal>process</literal> method. The first part of

-          the <literal>process</literal> method shows how to copy Feature Structures from the input CAS to the

-          "merged CAS":</para>

-        

-        

-        <programlisting>public void process(JCas aJCas) throws AnalysisEngineProcessException {

-    // procure a new CAS if we don't have one already

-    if (mMergedCas == null) {

-      mMergedCas = getEmptyJCas();

-    }

-

-    // append document text

-    String docText = aJCas.getDocumentText();

-    int prevDocLen = mDocBuf.length();

-    mDocBuf.append(docText);

-

-    // copy specified annotation types

-    // CasCopier takes two args: the CAS to copy from.

-    //                           the CAS to copy into.

-    CasCopier copier = new CasCopier(aJCas.getCas(), mMergedCas.getCas());

-    

-    // needed in case one annotation is in two indexes (could    

-    // happen if specified annotation types overlap)

-    Set copiedIndexedFs = new HashSet(); 

-    for (int i = 0; i &lt; mAnnotationTypesToCopy.length; i++) {

-      Type type = mMergedCas.getTypeSystem()

-          .getType(mAnnotationTypesToCopy[i]);

-      FSIndex index = aJCas.getCas().getAnnotationIndex(type);

-      Iterator iter = index.iterator();

-      while (iter.hasNext()) {

-        FeatureStructure fs = (FeatureStructure) iter.next();

-        if (!copiedIndexedFs.contains(fs)) {

-          Annotation copyOfFs = (Annotation) copier.copyFs(fs);

-          // update begin and end

-          copyOfFs.setBegin(copyOfFs.getBegin() + prevDocLen);

-          copyOfFs.setEnd(copyOfFs.getEnd() + prevDocLen);

-          mMergedCas.addFsToIndexes(copyOfFs);

-          copiedIndexedFs.add(fs);

-        }

-      }

-    }</programlisting>

-        

-        <para>The <literal>CasCopier</literal> class is used to copy Feature Structures of certain types

-          (specified by a configuration parameter) to the merged CAS. The <literal>CasCopier</literal> does deep

-          copies, meaning that if the copied FeatureStructure references another FeatureStructure, the

-          referenced FeatureStructure will also be copied.</para>

-        

-        <para>This example also merges the document text using a separate <literal>StringBuffer</literal>. Note

-          that we cannot append document text to the Sofa data of the merged CAS because Sofa data cannot be modified

-          once it is set.</para>

-        

-        <para>The remainder of the <literal>process</literal> method determines whether it is time to output a new

-          CAS. For this example, we are attempting to merge all CASes that are segments of one original artifact. This

-          is done by checking the

-          <code>SourceDocumentInformation</code> Feature Structure in the CAS to see if its

-          <code>lastSegment</code> feature is set to <literal>true</literal>. That feature (which is set by the

-          example

-          <code>SimpleTextSegmenter</code> discussed previously) marks the CAS as being the last segment of an

-          artifact, so when the CAS Multiplier sees this segment it knows it is time to produce an output CAS.</para>

-        

-        

-        <programlisting>// get the SourceDocumentInformation FS, 

-// which indicates the sourceURI of the document

-// and whether the incoming CAS is the last segment

-FSIterator it = aJCas

-        .getAnnotationIndex(SourceDocumentInformation.type).iterator();

-if (!it.hasNext()) {

-  throw new RuntimeException("Missing SourceDocumentInformation");

-}

-SourceDocumentInformation sourceDocInfo = 

-      (SourceDocumentInformation) it.next();

-if (sourceDocInfo.getLastSegment()) {

-  // time to produce an output CAS

-  // set the document text

-  mMergedCas.setDocumentText(mDocBuf.toString());

-

-  // add source document info to destination CAS

-  SourceDocumentInformation destSDI = 

-      new SourceDocumentInformation(mMergedCas);

-  destSDI.setUri(sourceDocInfo.getUri());

-  destSDI.setOffsetInSource(0);

-  destSDI.setLastSegment(true);

-  destSDI.addToIndexes();

-

-  mDocBuf = new StringBuffer();

-  mReadyToOutput = true;

-}</programlisting>

-        

-        <para>When it is time to produce an output CAS, the CAS Multiplier makes final updates to the merged CAS

-          (setting the document text and adding a <literal>SourceDocumentInformation</literal>

-          FeatureStructure), and then sets the <literal>mReadyToOutput</literal> field to true. This field is

-          then used in the <literal>hasNext</literal> and <literal>next</literal> methods.</para>

-      </section>

-      <section id="ugr.tug.cm.example_cas_merger.hasnext_and_next">

-        <title>HasNext and Next Methods</title>

-        <para>These methods are relatively simple:</para>

-        

-        

-        <programlisting>public boolean hasNext() throws AnalysisEngineProcessException {

-    return mReadyToOutput;

-  }

-

-  public AbstractCas next() throws AnalysisEngineProcessException {

-    if (!mReadyToOutput) {

-      throw new RuntimeException("No next CAS");

-    }

-    JCas casToReturn = mMergedCas;

-    mMergedCas = null;

-    mReadyToOutput = false;

-    return casToReturn;

-  }</programlisting>

-        <para>When the merged CAS is ready to be output, <literal>hasNext</literal> will return true, and

-          <literal>next</literal> will return the merged CAS, taking care to set the

-          <literal>mMergedCas</literal> field to

-          <code>null</code> so that the next call to

-          <code>process</code> will start with a fresh CAS.</para>

-      </section>

-    </section>

-    <section id="ugr.tug.cm.using_the_simple_text_merger_in_an_aggregate_ae">

-      <title>Using the SimpleTextMerger in an Aggregate Analysis Engine</title>

-      <titleabbrev>SimpleTextMerger in an Aggregate</titleabbrev>

-      

-      <para>An example descriptor for an Aggregate Analysis Engine that uses the

-        <literal>SimpleTextMerger</literal> is provided in

-        <literal>examples/descriptors/cas_multiplier/Segment_Annotate_Merge_AE.xml</literal>. This

-        Aggregate first runs the <literal>SimpleTextSegmenter</literal> example to break a large document into

-        segments. It then runs each segment through the example tokenizer and name recognizer annotators. Finally

-        it runs the <literal>SimpleTextMerger</literal> to reassemble the segments back into one CAS. The

-        <literal>Name</literal> annotations are copied to the final merged CAS but the <literal>Token</literal>

-        annotations are not.</para>

-      <para>This example illustrates how you can break large artifacts into pieces for more efficient processing

-        and then reassemble a single output CAS containing only the results most useful to the application.

-        Intermediate results such as tokens, which may consume a lot of space, need not be retained over the entire

-        input artifact.</para>

-      

-      <para>The intermediate segments are dropped and are never output from the Aggregate Analysis Engine.  This

-        is done by configuring the Fixed Flow Controller as described in 

-        <xref linkend="ugr.tug.cm.cm_and_fc"/>, above.</para>

-      

-      <para>Try running this Analysis Engine in the Document Analyzer tool with a large text file as input, to see that 

-        it outputs just one CAS per input file, and that the final CAS contains only the <literal>Name</literal> annotations. </para>

-    </section>

-  </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.configuration.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.configuration.xml
deleted file mode 100644
index 73c1198..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.configuration.xml
+++ /dev/null
@@ -1,119 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.conf/">
-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tug.conf">
-  <title>Top-level Pipeline Run Configuration Guide</title>
-  <titleabbrev>Top-level Configuration</titleabbrev>
-  
-  <para>This chapter describes an alternative, overriding method for specifying
-    many configuration settings for a particular UIMA pipeline <emphasis>run</emphasis>.
-    Its use is optional. 
-  </para>
-  
-  <section id="ugr.tug.conf.what-is-conf">
-    <title>What is Configuration?</title>
-    <para>Configuration is externally specified information that exists per 
-      UIMA pipeline run.  A common example is 
-      <link linkend="ugr.tug.aae.configuration_logging">configuration parameters</link>.
-      Other examples exist; for instance, you can specify where logging is to go.</para>
-      
-    <para>
-      Configuration consists of key-value pairs, typically stored in a 
-      separate file, and written in the style of Java properties files.  
-      The same syntax is followed, so you can write:
-      <preformat>  
-   my-key-name    :           my value string
-   my-key-name = my value string
-   my-key-name   my value string
-       or even
-   my-key-name
-      </preformat>
-    </para>
-    
-    <para>For the full set of rules, see the Javadocs for the Java <code>Properties</code>.
-    </para>
-    
-    <para>Multiple properties files are allowed; they are loaded in order, so that
-      later ones take precedence over earlier ones.  So, if you have many properties,
-      you can put the defaults in one file and then, in another file, override only the
-      ones you need to change.</para>
-      
-    <para>Individual key-value pairs can also be specified on the Java command line,
-      using the Java <code>-D</code> parameter; 
-      these override same-named keys that come from other sources.
-    </para>
-  </section>
-  
-  <section id="ugr.tug.conf.where-conf-comes-from">
-    <title>Where Configuration Information comes from</title>
-    <para>
-      The common use case is to put the key-value pairs into one (or more) properties files.
-      Properties can also be entered, individually, on the Java command line, as <code>-D</code>
-      parameters; properties entered in this way override other specifications.
-    </para>
-    
-    <para>
-      Properties may also be specified in a new XML element, &lt;topLevelSettings>, in the 
-      &lt;operationalProperties> element of a top level descriptor.   &lt;topLevelSettings> 
-      is ignored in any descriptor other than the top level descriptor for a particular UIMA run.  
-    </para>
-    
-    <para>
-    &lt;topLevelSettings> can contain 0 or more standard UIMA &lt;import> elements, specifying
-    multiple properties files to import.  It can also have 0 or more &lt;settings>
-    elements, the inner text of which is taken as the contents of a properties file, inline.
-    </para> 
-  </section>
-  
-  <section id="ugr.tug.conf.how-it-works">
-    <title>How it works</title>
-    <para>
-      At startup time, the property specifications are read and stored in a set of 
-      normal Java Properties class instances.  These are arranged according to the
-      defaulting order when there is more than one source.
-    </para>
-    
-    <para>
-      Individual configuration parameter specifications in the descriptors will
-      ignore this, unless a particular parameter declaration has an additional
-      &lt;topLevelName> element, which specifies the key name to be used at the 
-      top level to set this parameter's value.  If this name is present, and if
-      a value can be found for this key from among the top level property specifications
-      (including system properties set with the Java <code>-D</code> parameter), then
-      this value overrides the value otherwise specified for this parameter.
-    </para>
-    
-    <para>It is expected that the <emphasis>deployer</emphasis> will create 
-      the key names and add them to the configuration descriptors, where needed.
-    </para>
-    
-    <para>You may use the same top level name for multiple parameters; 
-    the effect of this is that one setting at the top level will override 
-    multiple parameters.</para>
-  </section>
-  
-</chapter>
-  
-  
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml
deleted file mode 100644
index 993fac7..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml
+++ /dev/null
@@ -1,1344 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.cpe/">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.cpe">

-  <title>Collection Processing Engine Developer&apos;s Guide</title>

-  <titleabbrev>CPE Developer&apos;s Guide</titleabbrev>

-  

-  <note><para>The CPE (Collection Processing Engine) was an early

-  approach to supporting some scale-out use cases.  It is an older

-  approach that doesn't support some of the newer features of CASes 

-  such as multiple views and CAS Multipliers.  It has been

-  supplanted by UIMA-AS, which has full support for the new features.</para></note>

-  

-  <para>The UIMA Analysis Engine interface provides support for developing and integrating

-    algorithms that analyze unstructured data. Analysis Engines are designed to operate on a

-    per-document basis. Their interface handles one CAS at a time. UIMA provides additional

-    support for applying analysis engines to collections of unstructured data with its

-    <emphasis>Collection Processing Architecture</emphasis>. The Collection

-    Processing Architecture defines additional components for reading raw data formats

-    from data collections, preparing the data for processing by Analysis Engines, executing

-    the analysis, extracting analysis results, and deploying the overall flow in a variety of

-    local and distributed configurations.</para>

-  

-  <para>The functionality defined in the Collection Processing Architecture is

-    implemented by a <emphasis>Collection Processing Engine</emphasis> (CPE). A CPE

-    includes an Analysis Engine and adds a <emphasis>Collection Reader</emphasis>, a

-    <emphasis>CAS Initializer</emphasis> (deprecated as of version 2), and <emphasis>CAS

-    Consumers</emphasis>. The part of the UIMA Framework that supports the execution of

-    CPEs is called the Collection Processing Manager, or CPM.</para>

-  

-  <para>A Collection Reader provides the interface to the raw input data and knows how to

-    iterate over the data collection. Collection Readers are discussed in <xref

-      linkend="ugr.tug.cpe.collection_reader.developing"/>. The CAS Initializer

-    <footnote><para>CAS Initializers are deprecated in favor of a more general mechanism,

-    multiple subjects of analysis.</para></footnote> prepares an individual data item for

-    analysis and loads it into the CAS. CAS Initializers are discussed in <xref

-      linkend="ugr.tug.cpe.cas_initializer.developing"/>. A CAS Consumer extracts

-    analysis results from the CAS and may also perform <emphasis>collection level

-    processing</emphasis>, or analysis over a collection of CASes. CAS Consumers are

-    discussed in <xref linkend="ugr.tug.cpe.cas_consumer.developing"/>.</para>

-  

-  <para>Analysis Engines and CAS Consumers are both instances of <emphasis>CAS

-    Processors</emphasis>. A Collection Processing Engine (CPE) may contain multiple CAS

-    Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate

-    (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While

-    Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS

-    Processor may be deployed in a variety of local and distributed modes, providing a number

-    of options for scalability and robustness. The different deployment options are covered

-    in detail in <xref linkend="ugr.tug.cpe.deployment_alternatives"/>.</para>

-  

-  <para>Each of the components in a CPE has an interface specified by the UIMA Collection

-    Processing Architecture and is described by a declarative XML descriptor file.

-    Similarly, the CPE itself has a well defined component interface and is described by a

-    declarative XML descriptor file.</para>

-  

-  <para>A user creates a CPE by assembling the components mentioned above. The UIMA SDK

-    provides a graphical tool, called the CPE Configurator, for assisting in the assembly of

-    CPEs. Use of this tool is summarized in <xref

-      linkend="ugr.tug.cpe.cpe_configurator"/>, and more details can be found in 

-      <olink targetdoc="&uima_docs_tools;"/>

-      <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.

-    Alternatively, a CPE can be assembled by writing an XML CPE descriptor. Details on the CPE

-    descriptor, including its syntax and content, can be found in the 

-    <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. The individual

-    components have associated XML descriptors, each of which can be created and / or edited

-    using the <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde">

-    Component Description Editor</olink>.</para>

-  

-  <para>A CPE is executed by a UIMA infrastructure component called the

-    <emphasis>Collection Processing Manager</emphasis> (CPM). The CPM provides a number

-    of services and deployment options that cover instantiation and execution of CPEs, error

-    recovery, and local and distributed deployment of the CPE components.</para>

-  

-  <section id="ugr.tug.cpe.concepts">

-    <title>CPE Concepts</title>

-    

-    <para> <xref linkend="ugr.tug.cpe.fig.cpe_components"/> illustrates the data flow

-      that occurs between the different types of components that make up a CPE.</para>

-    

-    <figure id="ugr.tug.cpe.fig.cpe_components">

-      <title>CPE Components</title>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="PNG"

-            fileref="&imgroot;image002.png"/>

-        </imageobject>

-        <textobject><phrase>CPE Components and flow between them</phrase>

-        </textobject>

-      </mediaobject>

-    </figure>

-    

-    <para>The components of a CPE are:</para>

-    

-    <itemizedlist><listitem><para><emphasis>Collection Reader &ndash;</emphasis>

-      interfaces to a collection of data items (e.g., documents) to be analyzed. Collection

-      Readers return CASes that contain the documents to analyze, possibly along with

-      additional metadata.</para></listitem>

-      

-      <listitem><para><emphasis>Analysis Engine &ndash;</emphasis> takes a CAS,

-        analyzes its contents, and produces an enriched CAS. Analysis Engines can be

-        recursively composed of other Analysis Engines (called an

-        <emphasis>Aggregate</emphasis> Analysis Engine). Aggregates may also contain

-        CAS Consumers.</para></listitem>

-      

-      <listitem><para><emphasis>CAS Consumer &ndash;</emphasis> consumes the enriched

-        CAS that was produced by the sequence of Analysis Engines before it, and produces an

-        application-specific data structure, such as a search engine index or database.

-        </para></listitem></itemizedlist>

-    

-    <para>A fourth type of component, the <emphasis>CAS Initializer,</emphasis> may be

-      used by a Collection Reader to populate a CAS from a document. However, as of UIMA

-      version 2 CAS Initializers are now deprecated in favor of a more general mechanism,

-      multiple Subjects of Analysis.</para>

-    

-    <para>The Collection Processing Manager orchestrates the data flow

-      within a CPE, monitors status, optionally manages the life-cycle of internal

-      components and collects statistics.</para>

-    

-    <para>CASes are not saved in a persistent way by the framework. If you want to save CASes,

-      then you have to save each CAS as it comes through (for example) using a CAS Consumer you

-      write to do this, in whatever format you like. The UIMA SDK supplies an example CAS

-      Consumer to save CASes to XML files, either in the standard XMI format or in an older

-      format called XCAS.  It also supplies an example CAS Consumer to extract information from CASes and

-      store the results into a relational Database, using Java&apos;s JDBC APIs.</para>

-    

-  </section>

-  

-  <section id="ugr.tug.cpe.configurator_and_viewer">

-    <title>CPE Configurator and CAS viewer</title>

-    

-    <section id="ugr.tug.cpe.cpe_configurator">

-      <title>Using the CPE Configurator</title>

-      

-      <para>A CPE can be assembled by writing an XML CPE descriptor. Details on the CPE

-        descriptor, including its syntax and content, can be found in 

-        <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. Rather than

-        edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool. The CPE

-        Configurator tool is described briefly in this section, and in more detail in 

-        <olink targetdoc="&uima_docs_tools;"/>

-        <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</para>

-      

-      <para>The CPE Configurator tool can be run from Eclipse (see <xref

-          linkend="ugr.tug.cpe.running_cpe_configurator_from_eclipse"/>), or using

-        the <literal>cpeGui</literal> shell script (<literal>cpeGui.bat</literal> on

-        Windows, <literal>cpeGui.sh</literal> on Unix), which is located in the

-        <literal>bin</literal> directory of the UIMA SDK installation. Executing this

-        script will display the window shown here:

-        

-        

-        <screenshot>

-          <mediaobject>

-            <imageobject>

-              <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>

-            </imageobject>

-            <textobject><phrase>Screenshot of CPE GUI</phrase></textobject>

-          </mediaobject>

-        </screenshot>

-        </para>

-      

-      <para>The window is divided into three sections, one each for the Collection Reader, 

-        Analysis Engines, and CAS Consumers.<footnote><para>There is also a fourth pane,

-        for the CAS Initializer, but it is hidden by default.  To enable it click the

-        <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote> 

-        In each section, you select the component(s) you want to include in the CPE by 

-        browsing to their XML descriptors. The configuration parameters present in the XML 

-        descriptors will then be displayed in the GUI; these can be modified to override

-        the values present in the descriptor. For example, the screen shot below shows the 

-        CPE Configurator after the following components have been chosen:

-        

-        

-        <programlisting>Collection Reader: 

-   %UIMA_HOME%/examples/descriptors/collection_reader/

-          FileSystemCollectionReader.xml

-

-Analysis Engine: 

-   %UIMA_HOME%/examples/descriptors/analysis_engine/

-          NamesAndPersonTitles_TAE.xml

-

-CAS Consumer: 

-    %UIMA_HOME%/examples/descriptors/cas_consumer/

-          XmiWriterCasConsumer.xml</programlisting></para>

-      

-      

-      <screenshot>

-     <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of CPE GUI after fields filled in</phrase></textobject>

-    </mediaobject>

-    </screenshot>

-      

-      <para>For the File System Collection Reader, ensure that the Input Directory is set to

-        <literal>%UIMA_HOME%\examples\data</literal><footnote><para>Replace

-        <literal>%UIMA_HOME%</literal> with the path to where you installed UIMA.</para>

-        </footnote>. The other parameters may be left blank. For the XMI Writer CAS

-        Consumer, ensure that the Output Directory is set to

-        <literal>%UIMA_HOME%\examples\data\processed</literal>.</para>

-      

-      <para>After selecting each of the components and providing configuration settings,

-        click the play (forward arrow) button at the bottom of the screen to begin processing.

-        A progress bar should be displayed in the lower left corner. (Note that the progress

-        bar will not begin to move until all components have completed their initialization,

-        which may take several seconds.) Once processing has begun, the pause and stop

-        buttons become enabled.</para>

-      

-      <para>If an error occurs, you will be informed by an error dialog. If processing

-        completes successfully, you will be presented with a performance report.</para>

-      

-      <para>Using the File menu, you can select <literal>Save CPE Descriptor</literal> to

-        create an .xml descriptor file that defines the CPE you have constructed. Later, you

-        can use <literal>Open CPE Descriptor</literal> to restore the CPE Configurator to

-        the saved state. Also, CPE descriptors can be used to run a CPE from a Java program

-        &ndash; see section <xref

-          linkend="ugr.tug.cpe.running_cpe_from_application"/>. CPE Descriptors

-        allow specifying operational parameters, such as error handling options, that are

-        not currently available for configuration through the CPE Configurator. For more

-        information on manually creating a CPE Descriptor, see the 

-        <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>

-            

-      <para>The CPE configured above runs a simple name and title annotator on the sample data

-        provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer. To

-        view the results, start the External CAS Annotation Viewer by running the

-        <literal>annotationViewer</literal> batch file

-        (<literal>annotationViewer.bat</literal> on Windows,

-        <literal>annotationViewer.sh</literal> on Unix), which is located in the

-        <literal>bin</literal> directory of the UIMA SDK installation. Executing this

-        batch file will display the window shown here:

-        

-        

-        <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image008.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Annotation Viewer results</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-        </para>

-      

-      <para>Ensure that the Input Directory is the same as the Output Directory specified for

-        the XMI Writer CAS Consumer in the CPE configured above (e.g.,

-        <literal>%UIMA_HOME%\examples\data\processed</literal>) and that the TAE

-        Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g.,

-        <literal>examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml</literal>).</para>

-      

-      <para>Click the View button to display the Analyzed Documents window:

-        

-        

-        <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="3.5in" format="JPG" fileref="&imgroot;image010.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of CPE Configurator Analyzed Documents</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-        </para>

-      

-      <para>Double-click any document in the list to view the analyzed document.

-        Double-clicking the first document, IBM_LifeSciences.txt, will bring up the following

-        window:

-        

-        

-        <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image012.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Document and Annotation Viewer</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-        </para>

-      

-      <para>This window shows the analysis results for the document. Clicking on any

-        highlighted annotation causes the details for that annotation to be displayed in the

-        right-hand pane. Here the annotation spanning <quote>John M. Thompson</quote> has

-        been clicked.</para>

-      

-      <para>Congratulations! You have successfully configured a CPE, saved its

-        descriptor, run the CPE, and viewed the analysis results.</para>

-    </section>

-    

-    <section id="ugr.tug.cpe.running_cpe_configurator_from_eclipse">

-      <title>Running the CPE Configurator from Eclipse</title>

-      

-      <para>If you have followed the instructions in <olink targetdoc="&uima_docs_overview;"/>

-        <olink targetdoc="&uima_docs_overview;"

-          targetptr="ugr.ovv.eclipse_setup"/> and imported the example Eclipse

-        project, then you should already have a Run configuration for the CPE Configurator

-        tool (called <literal>UIMA CPE GUI</literal>) configured to run in the example

-        project. Simply run that configuration to start the CPE Configurator.</para>

-      

-      <para>If you haven&apos;t followed the Eclipse setup instructions and wish to run the

-        CPE Configurator tool from Eclipse, you will need to do the following. As installed,

-        this Eclipse launch configuration is associated with the

-        <quote>uimaj-examples</quote> project. If you&apos;ve not already done so, you

-        may wish to import that project into your Eclipse workspace. It&apos;s located in

-        %UIMA_HOME%/docs/examples. Doing this will supply the Eclipse launcher with all

-        the class files it needs to run the CPE configurator. If you don&apos;t do this, please

-        manually add the JAR files for UIMA to the launch configuration.</para>

-      <para>Also, you need to add any projects or JAR files for any UIMA components you will be

-        running to the launch class path.</para> <note><para>A simpler alternative may be

-      to change the CPE launch configuration to be based on your project. If you do that, it will

-      pick up all the files in your project&apos;s class path, which you should set up to

-      include all the UIMA framework files. An easy way to do this is to add the

-      uimaj-examples project to your project&apos;s build path (in the project&apos;s

-      properties), because the uimaj-examples project is set up to include all the UIMA

-      framework classes in its classpath already. </para></note>

-      

-      <para>Next, in the Eclipse menu select <literal>Run &rarr;

-        Run...</literal>, which brings up the Run configuration screen.</para>

-      

-      <para>In the Main tab, set the main class to

-        <literal>org.apache.uima.tools.cpm.CpmFrame</literal>.</para>

-      

-      <para>In the Arguments tab, add the following to the VM arguments:

-        

-        

-        <programlisting>-Xms128M -Xmx256M 

--Duima.home="C:\Program Files\Apache\uima"</programlisting>

-        (or wherever you installed the UIMA SDK)</para>

-      

-      <para>Click the Run button to launch the CPE Configurator, and use it as previously

-        described in this section.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cpe.running_cpe_from_application">

-    <title>Running a CPE from Your Own Java Application</title>

-    

-    <para>The simplest way to run a CPE from a Java application is to first create a CPE

-      descriptor as described in the previous section. Then the CPE can be instantiated and

-      run using the following code:

-      

-      

-      <programlisting>// parse the CPE descriptor in the file specified on the command line

-CpeDescription cpeDesc = UIMAFramework.getXMLParser().

-        parseCpeDescription(new XMLInputSource(args[0]));

-

-// instantiate the CPE

-mCPE = UIMAFramework.produceCollectionProcessingEngine(cpeDesc);

-

-// create and register a Status Callback Listener

-mCPE.addStatusCallbackListener(new StatusCallbackListenerImpl());

-

-// start processing

-mCPE.process();</programlisting></para>

-    

-    <para>This will start the CPE running in a separate thread.</para>

-    

-    <note><para>The <literal>process()</literal> method for a CPE can only be called once.  If you 

-    need to call it again, you have to instantiate a new CPE, and call that new CPE's process

-    method.</para></note>

-    

-    <section id="ugr.tug.cpe.using_listeners">

-      <title>Using Listeners</title>

-      

-      <para>Updates of the CPM&apos;s progress, including any errors that occur, are sent to

-        the callback handler that is registered by the call to

-        <literal>addStatusCallbackListener</literal>, above. The callback handler is a

-        class that implements the CPM&apos;s

-        <literal>StatusCallbackListener</literal> interface. The example handler responds to events by

-        printing messages to the console. Its source code is fairly straightforward and is

-        not included in this chapter &ndash; see the

-        <literal>org.apache.uima.examples.cpe.SimpleRunCPE.java</literal> in the

-        <literal>%UIMA_HOME%\examples\src</literal> directory for the complete

-        code.</para>
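-      <para>The callback pattern described above can be sketched without the UIMA API. In the
-        following minimal example, <literal>StatusListener</literal>,
-        <literal>AsyncEngine</literal> and <literal>runAndWait</literal> are hypothetical
-        stand-ins (they are not UIMA classes); the sketch only illustrates how a caller might
-        block until an engine that processes in a separate thread reports completion through
-        a registered listener.</para>

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ListenerSketch {

  /** Hypothetical callback interface; NOT the UIMA StatusCallbackListener. */
  public interface StatusListener {
    void entityProcessComplete(String documentName);
    void collectionProcessComplete();
  }

  /** Hypothetical engine whose process() returns immediately, like a CPE. */
  public static class AsyncEngine {
    private final List<String> documents;
    private final List<StatusListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean started = new AtomicBoolean(false);

    public AsyncEngine(List<String> documents) {
      this.documents = documents;
    }

    public void addStatusListener(StatusListener listener) {
      listeners.add(listener);
    }

    public void process() {
      // Mirror the rule that process() may be called only once per instance.
      if (!started.compareAndSet(false, true)) {
        throw new IllegalStateException("process() was already called");
      }
      Thread worker = new Thread(() -> {
        for (String doc : documents) {
          for (StatusListener l : listeners) l.entityProcessComplete(doc);
        }
        for (StatusListener l : listeners) l.collectionProcessComplete();
      });
      worker.start();
    }
  }

  /** Runs the engine and blocks until the listener reports completion. */
  public static int runAndWait(List<String> documents) {
    AtomicInteger processed = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(1);
    AsyncEngine engine = new AsyncEngine(documents);
    engine.addStatusListener(new StatusListener() {
      @Override public void entityProcessComplete(String documentName) {
        processed.incrementAndGet();
      }
      @Override public void collectionProcessComplete() {
        done.countDown();
      }
    });
    engine.process();
    try {
      if (!done.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("timed out waiting for completion");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return processed.get();
  }

  public static void main(String[] args) {
    int count = runAndWait(Arrays.asList("a.txt", "b.txt"));
    System.out.println(count + " documents processed");
  }
}
```

-      <para>The latch is one possible design choice; an application with a user interface
-        might instead update its display from the callbacks and never block.</para>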

-      

-      <para>If you need more control over the information in the CPE descriptor, you can

-        manually configure it via its API. See the Javadocs for package

-        <literal>org.apache.uima.collection</literal> for more details.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cpe.developing_collection_processing_components">

-    <title>Developing Collection Processing Components</title>

-    

-    <para>This section is an introduction to the process of developing Collection Readers,

-      CAS Initializers, and CAS Consumers. The code snippets refer to the classes that can be

-      found in the <literal>%UIMA_HOME%\examples\src</literal> directory of the example project.</para>

-    

-    <para>In the following sections, classes you write to represent components need to be

-      public and have public, 0-argument constructors, so that they can be instantiated by

-      the framework. (Although Java classes in which you do not define any constructor will,

-      by default, have a 0-argument constructor that doesn&apos;t do anything, a class in

-      which you have defined at least one constructor does not get a default 0-argument

-      constructor.)</para>
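-    <para>The constructor rule above can be checked with plain Java reflection. The following
-      sketch is an illustration only: it assumes that instantiation happens through the public
-      0-argument constructor, and the class names
-      (<literal>ImplicitDefault</literal>, <literal>OnlyWithArgs</literal>,
-      <literal>ExplicitNoArg</literal>) and the <literal>canInstantiate</literal> helper are
-      hypothetical, not part of the UIMA SDK.</para>

```java
public class NoArgConstructorSketch {

  /** No constructor declared: the compiler supplies a public 0-argument one. */
  public static class ImplicitDefault { }

  /** Declares only a 1-argument constructor, so no 0-argument constructor exists. */
  public static class OnlyWithArgs {
    public OnlyWithArgs(String inputDirectory) { }
  }

  /** Declares both constructors, so reflective instantiation works again. */
  public static class ExplicitNoArg {
    public ExplicitNoArg() { }
    public ExplicitNoArg(String inputDirectory) { }
  }

  /** Returns true if the class can be created via a public 0-argument constructor. */
  public static boolean canInstantiate(Class<?> type) {
    try {
      type.getConstructor().newInstance();
      return true;
    } catch (ReflectiveOperationException e) {
      // Covers NoSuchMethodException, InstantiationException, and related failures.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(canInstantiate(ImplicitDefault.class)); // true
    System.out.println(canInstantiate(OnlyWithArgs.class));    // false
    System.out.println(canInstantiate(ExplicitNoArg.class));   // true
  }
}
```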

-    

-    <section id="ugr.tug.cpe.collection_reader.developing">

-      <title>Developing Collection Readers</title>

-      

-      <para>A Collection Reader is responsible for obtaining documents from the collection

-        and returning each document as a CAS. Like all UIMA components, a Collection Reader

-        consists of two parts &mdash; the code and an XML descriptor.</para>

-      

-      <para>A simple example of a Collection Reader is the <quote>File System Collection

-        Reader,</quote> which simply reads documents from files in a specified directory.

-        The Java code is in the class

-        <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>

-        and the XML descriptor is

-        <literal>%UIMA_HOME%/examples/src/main/descriptors/collection_reader/

-          FileSystemCollectionReader.xml</literal>.</para>

-      

-      <section id="ugr.tug.cpe.collection_reader.java_class">

-        <title>Java Class for the Collection Reader</title>

-        

-        <para>The Java class for a Collection Reader must implement the

-          <literal>org.apache.uima.collection.CollectionReader</literal>

-          interface. You may build your Collection Reader from scratch and implement this

-          interface, or you may extend the convenience base class

-          <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>.</para>

-        

-        <para>The convenience base class provides default implementations for many of the

-          methods defined in the <literal>CollectionReader</literal> interface, and

-          provides abstract definitions for those methods that you are required to

-          implement in your new Collection Reader. Note that if you extend this base class,

-          you do not need to declare that your new Collection Reader implements the

-          <literal>CollectionReader</literal> interface.</para> <tip><para>Eclipse

-        tip &ndash; if you are using Eclipse, you can quickly create the boilerplate code and

-        stubs for all of the required methods by clicking <literal>File</literal>

-        &rarr; <literal>New</literal> &rarr; <literal>Class</literal> to bring up the <quote>New Java Class</quote>

-        dialog, specifying

-        <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>

-        as the Superclass, and checking <quote>Inherited abstract methods</quote> in the

-        section <quote>Which method stubs would you like to create?</quote>, as in the 

-        screenshot below:</para></tip>     

-        

-        <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="4.4in" format="JPG" fileref="&imgroot;image014.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot showing Eclipse new class wizard</phrase></textobject>

-    </mediaobject>

-  </screenshot>

-        

-        <para>For the rest of this section we will assume that your new Collection Reader

-          extends the <literal>CollectionReader_ImplBase</literal> class, and we will

-          show examples from the

-          <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>.

-          If you must inherit from a different superclass, you must ensure that your

-          Collection Reader implements the <literal>CollectionReader</literal>

-          interface &ndash; see the Javadocs for <literal>CollectionReader</literal>

-          for more details.</para>

-      </section>

-      

-      <section id="ugr.tug.cpe.collection_reader.required_methods">

-        <title>Required Methods in the Collection Reader class</title>

-        

-        

-        <para>The following abstract methods must be implemented:</para>

-        

-        <section id="ugr.tug.cpe.collection_reader.required_methods.initialize">

-          <title>initialize()</title>

-          

-          <para>The <literal>initialize()</literal> method is called by the framework

-            when the Collection Reader is first created.

-            <literal>CollectionReader_ImplBase</literal> actually provides a default

-            implementation of this method (i.e., it is not abstract), so you are not strictly

-            required to implement this method. However, a typical Collection Reader will

-            implement this method to obtain parameter values and perform various

-            initialization steps.</para>

-          

-          <para>In this method, the Collection Reader class can access the values of its

-            configuration parameters and perform other initialization logic. The example

-            File System Collection Reader reads its configuration parameters and then

-            builds a list of files in the specified input directory, as follows:</para>

-          

-          

-          <programlisting>public void initialize() throws ResourceInitializationException {

-  File directory = new File(

-            (String)getConfigParameterValue(PARAM_INPUTDIR));

-  mEncoding = (String)getConfigParameterValue(PARAM_ENCODING);

-  mDocumentTextXmlTagName = (String)getConfigParameterValue(PARAM_XMLTAG);

-  mLanguage = (String)getConfigParameterValue(PARAM_LANGUAGE);

-  mCurrentIndex = 0; 

-  

-  //get list of files (not subdirectories) in the specified directory

-  mFiles = new ArrayList();

-  File[] files = directory.listFiles();

-  for (int i = 0; i &lt; files.length; i++) {

-    if (!files[i].isDirectory()) {

-      mFiles.add(files[i]);  

-    }

-  }

-}</programlisting>

-          <note><para>This is the zero-argument version of the initialize method. There is

-          also a method on the Collection Reader interface called

-          <literal>initialize(ResourceSpecifier, Map)</literal> but it is not

-          recommended that you override this method in your code. That method performs

-          internal initialization steps and then calls the zero-argument

-          <literal>initialize()</literal>. </para></note>

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.hasnext">

-          <title>hasNext()</title>

-          

-          <para>The <literal>hasNext()</literal> method returns whether or not there are

-            any documents remaining to be read from the collection. The File System

-            Collection Reader&apos;s <literal>hasNext()</literal> method is very

-            simple. It just checks if there are any more files left to be read:

-            

-            

-            <programlisting>public boolean hasNext() {

-  return mCurrentIndex &lt; mFiles.size();

-}</programlisting>

-            </para>

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.required_methods.getnext">

-          <title>getNext(CAS)</title>

-          

-          <para>The <literal>getNext()</literal> method reads the next document from the

-            collection and populates a CAS. In the simple case, this amounts to reading the

-            file and calling the CAS&apos;s <literal>setDocumentText</literal> method.

-            The example File System Collection Reader is slightly more complex. It first

-            checks for a CAS Initializer. If the CPE includes a CAS Initializer, the CAS

-            Initializer is used to read the document and

-            initialize the CAS. If the CPE does not include a CAS

-            Initializer, the File System Collection Reader reads the document and sets the

-            document text in the CAS.</para>

-          

-          <para>The File System Collection Reader also stores additional metadata about

-            the document in the CAS. In particular, it sets the document&apos;s language in

-            the special built-in feature structure

-            <literal>uima.tcas.DocumentAnnotation</literal> (see 

-            <olink targetdoc="&uima_docs_ref;"/>

-            <olink targetdoc="&uima_docs_ref;"

-              targetptr="ugr.ref.cas.document_annotation"/> for details about this

-            built-in type) and creates an instance of

-            <literal>org.apache.uima.examples.SourceDocumentInformation</literal>,

-            which stores information about the document&apos;s source location. This

-            information may be useful to downstream components such as CAS Consumers. Note

-            that the type system descriptor for this type can be found in

-            <literal>org.apache.uima.examples.SourceDocumentInformation.xml</literal>,

-            which is located in the <literal>examples/src</literal> directory.</para>

-          

-          <para>The getNext() method for the File System Collection Reader looks like

-            this:</para>

-          

-          

-          <programlisting>  public void getNext(CAS aCAS) throws IOException, CollectionException {

-    JCas jcas;

-    try {

-      jcas = aCAS.getJCas();

-    } catch (CASException e) {

-      throw new CollectionException(e);

-    }

-

-    // open input stream to file

-    File file = (File) mFiles.get(mCurrentIndex++);

-    BufferedInputStream fis = 

-            new BufferedInputStream(new FileInputStream(file));

-    try {

-      byte[] contents = new byte[(int) file.length()];

-      fis.read(contents);

-      String text;

-      if (mEncoding != null) {

-        text = new String(contents, mEncoding);

-      } else {

-        text = new String(contents);

-      }

-      // put document in CAS

-      jcas.setDocumentText(text);

-    } finally {

-      if (fis != null)

-        fis.close();

-    }

-

-    // set language if it was explicitly specified 

-    //as a configuration parameter

-    if (mLanguage != null) {

-      ((DocumentAnnotation) jcas.getDocumentAnnotationFs()).

-            setLanguage(mLanguage);

-    }

-

-    // Also store location of source document in CAS. 

-    // This information is critical if CAS Consumers will 

-    // need to know where the original document contents 

-    // are located.

-    // For example, the Semantic Search CAS Indexer 

-    // writes this information into the search index that 

-    // it creates, which allows applications that use the 

-    // search index to locate the documents that satisfy 

-    // their semantic queries.

-    SourceDocumentInformation srcDocInfo = 

-            new SourceDocumentInformation(jcas);

-    srcDocInfo.setUri(

-            file.getAbsoluteFile().toURL().toString());

-    srcDocInfo.setOffsetInSource(0);

-    srcDocInfo.setDocumentSize((int) file.length());

-    srcDocInfo.setLastSegment(

-            mCurrentIndex == mFiles.size());

-    srcDocInfo.addToIndexes();

-  }</programlisting>

-          

-          <para>The Collection Reader can create additional annotations in the CAS at this

-            point, in the same way that annotators create annotations.</para>
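-          <para>One caveat about the listing above: a single call to
-            <literal>InputStream.read(byte[])</literal> is not guaranteed to fill the whole
-            array. The following sketch shows one possible alternative for reading and decoding a
-            complete file with <literal>java.nio.file.Files</literal>. It is not the code shipped
-            with the example; <literal>readDocument</literal> and <literal>roundTrip</literal>
-            are hypothetical helper names introduced for illustration.</para>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFullySketch {

  /** Reads the whole file and decodes it; encodingName may be null. */
  public static String readDocument(Path file, String encodingName) {
    try {
      // readAllBytes keeps reading until end of file is reached.
      byte[] contents = Files.readAllBytes(file);
      Charset charset = (encodingName != null)
          ? Charset.forName(encodingName)
          : Charset.defaultCharset(); // same fallback idea as new String(bytes)
      return new String(contents, charset);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Hypothetical demo helper: writes text to a temp file and reads it back. */
  public static String roundTrip(String text, String encodingName) {
    try {
      Path tmp = Files.createTempFile("doc", ".txt");
      try {
        Files.write(tmp, text.getBytes(Charset.forName(encodingName)));
        return readDocument(tmp, encodingName);
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("caf\u00e9", "UTF-8").equals("caf\u00e9"));
  }
}
```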

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.required_methods.getprogress">

-          <title>getProgress()</title>

-          <para>The Collection Reader is responsible for returning progress information;

-            that is, how much of the collection has been read thus far and how much remains to be

-            read. The framework defines progress very generally; the Collection Reader

-            simply returns an array of <literal>Progress</literal> objects, where each

-            object contains three fields &mdash; the amount already completed, the total

-            amount (if known), and a unit (e.g. entities (documents), bytes, or files). The

-            method returns an array so that the Collection Reader can report progress in

-            multiple different units, if that information is available. The File System

-            Collection Reader&apos;s <literal>getProgress()</literal> method looks

-            like this:

-            

-            

-            <programlisting>public Progress[] getProgress() {

-  return new Progress[]{

-     new ProgressImpl(mCurrentIndex,mFiles.size(),Progress.ENTITIES)};

-}</programlisting></para>

-          

-          <para>In this particular example, the total number of files in the collection is

-            known, but the total size of the collection is not known. As such, a

-            <literal>ProgressImpl</literal> object for

-            <literal>Progress.ENTITIES</literal> is returned, but a

-            <literal>ProgressImpl</literal> object for

-            <literal>Progress.BYTES</literal> is not.</para>
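-          <para>If a reader did know the total size of its collection, it could report both units
-            in the same array. The following self-contained sketch illustrates that idea with a
-            hypothetical <literal>ProgressEntry</literal> class standing in for the
-            framework&apos;s progress objects; the class, the <literal>getProgress</literal>
-            signature and the unit strings are assumptions made for this illustration, not the
-            UIMA API.</para>

```java
public class ProgressSketch {

  /** Hypothetical stand-in for a progress object: completed, total, unit. */
  public static final class ProgressEntry {
    private final long completed;
    private final long total;
    private final String unit;

    public ProgressEntry(long completed, long total, String unit) {
      this.completed = completed;
      this.total = total;
      this.unit = unit;
    }

    public long getCompleted() { return completed; }
    public long getTotal() { return total; }
    public String getUnit() { return unit; }
  }

  /** Reports progress in two units: documents read and bytes read. */
  public static ProgressEntry[] getProgress(
      int filesRead, int totalFiles, long bytesRead, long totalBytes) {
    return new ProgressEntry[] {
      new ProgressEntry(filesRead, totalFiles, "entities"),
      new ProgressEntry(bytesRead, totalBytes, "bytes")
    };
  }

  public static void main(String[] args) {
    for (ProgressEntry p : getProgress(3, 10, 4096L, 20480L)) {
      System.out.println(p.getCompleted() + " of " + p.getTotal() + " " + p.getUnit());
    }
  }
}
```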

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.required_methods.close">

-          <title>close()</title>

-          

-          <para>The <literal>close()</literal> method is called when the Collection Reader is no longer needed.

-            The Collection Reader should then release any resources it may be holding. The

-            FileSystemCollectionReader does not hold resources and so has an empty

-            implementation of this method:</para>

-          

-          

-          <programlisting>public void close() throws IOException { }</programlisting>

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.optional_methods">

-          <title>Optional Methods</title>

-          

-          <para>The following methods may be implemented:</para>

-          

-          <section id="ugr.tug.cpe.collection_reader.optional_methods.reconfigure">

-            <title>reconfigure()</title>

-            <para>This method is called if the Collection Reader&apos;s configuration

-              parameters change.</para>

-          </section>

-          

-          <section id="ugr.tug.cpe.collection_reader.optional_methods.typesysteminit">

-            <title>typeSystemInit()</title>

-            

-            <para>If you are only setting the document text in the CAS, or if you are using the

-              JCas (recommended, as in the current example), you do not have to implement this

-              method. If you are directly using the CAS API, this method is used in the same way

-              as it is used for an annotator &ndash; see <olink

-                targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae.contract_for_annotator_methods"/>

-              for more information.</para>

-          </section>

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.threading">

-          <title>Threading considerations</title>

-          

-          <para>Collection readers do not have to be thread safe; they are run with a single

-            thread per instance, and only one Collection Reader instance is created for each

-            Collection Processing Manager (CPM) instance.</para>

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.collection_reader.descriptor">

-          <title>XML Descriptor for a Collection Reader</title>

-          

-          <para>You can use the Component Description Editor to create and/or edit the File

-            System Collection Reader&apos;s descriptor. Here is its descriptor

-            (abbreviated somewhat), which is very similar to an Analysis

-            Engine descriptor:</para>

-          

-          

-          <programlisting><?db-font-size 80% ?><![CDATA[<collectionReaderDescription 

-          xmlns="http://uima.apache.org/resourceSpecifier">

-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>

-  <implementationName>

-    org.apache.uima.examples.cpe.FileSystemCollectionReader

-  </implementationName>

-  <processingResourceMetaData>

-    <name>File System Collection Reader</name>

-    <description>Reads files from the filesystem.</description>

-    <version>1.0</version>

-    <vendor>The Apache Software Foundation</vendor>

-    <configurationParameters>

-      <configurationParameter>

-        <name>InputDirectory</name>

-        <description>Directory containing input files</description>

-        <type>String</type>

-        <multiValued>false</multiValued>

-        <mandatory>true</mandatory>

-      </configurationParameter>

-      <configurationParameter>

-        <name>Encoding</name>

-        <description>Character encoding for the documents.</description>

-        <type>String</type>

-        <multiValued>false</multiValued>

-        <mandatory>false</mandatory>

-      </configurationParameter>

-      <configurationParameter>

-        <name>Language</name>

-        <description>ISO language code for the documents</description>

-        <type>String</type>

-        <multiValued>false</multiValued>

-        <mandatory>false</mandatory>

-      </configurationParameter>

-    </configurationParameters>

-    <configurationParameterSettings>

-      <nameValuePair>

-        <name>InputDirectory</name>

-        <value>

-          <string>C:/Program Files/apache/uima/examples/data</string>

-        </value>

-      </nameValuePair>

-    </configurationParameterSettings>

-    

-    <!-- Type System of CASes returned by this Collection Reader -->

-    

-    <typeSystemDescription>

-      <imports>

-        <import name="org.apache.uima.examples.SourceDocumentInformation"/>

-      </imports>

-    </typeSystemDescription>

-    

-    <capabilities>

-      <capability>

-        <inputs/>

-        <outputs>

-          <type allAnnotatorFeatures="true">

-            org.apache.uima.examples.SourceDocumentInformation

-          </type>

-        </outputs>

-      </capability>

-    </capabilities>

-    <operationalProperties>

-      <modifiesCas>true</modifiesCas>

-      <multipleDeploymentAllowed>false</multipleDeploymentAllowed>

-      <outputsNewCASes>true</outputsNewCASes>

-    </operationalProperties>

-  </processingResourceMetaData>

-</collectionReaderDescription>]]></programlisting>

-          

-        </section>

-      </section>

-    </section>

-    

-    <section id="ugr.tug.cpe.cas_initializer.developing"><title>Developing CAS

-      Initializers</title> <note><para>CAS Initializers are now deprecated (as of

-      version 2.1). For complex initialization, please use instead the capabilities of

-      creating additional Subjects of Analysis (see <olink

-        targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>).</para></note>

-      

-      <para>In UIMA 1.x, the CAS Initializer component was intended to be used as a plug-in

-        to the Collection Reader for when the task of populating the CAS from a raw document is

-        complex and might be reusable with other data collections.</para>

-          

-      <para>A CAS Initializer Java class must implement the interface

-        <literal>org.apache.uima.collection.CasInitializer</literal>, and will also

-        generally extend from the convenience base class

-        <literal>org.apache.uima.collection.CasInitializer_ImplBase</literal>. A

-        CAS Initializer also must have an XML descriptor, which has the exact same form as a

-        Collection Reader Descriptor except that the outer tag is

-        <literal>&lt;casInitializerDescription&gt;</literal>.</para>

-      

-      <para>CAS Initializers have optional <literal>initialize()</literal>,

-        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>

-        methods, which perform the same functions as they do for Collection Readers. The only

-        required method for a CAS Initializer is <literal>initializeCas(Object,

-        CAS)</literal>. This method takes the raw document (for example, an

-        <literal>InputStream</literal> object from which the document can be read) and a

-        CAS, and populates the CAS from the document.</para>      

-    </section>

-    

-    <section id="ugr.tug.cpe.cas_consumer.developing"><title>Developing CAS

-      Consumers</title> 

-      

-      <note><para>In version 2, there is no difference in capability

-      between CAS Consumers and ordinary Analysis Engines, except for the default setting of

-      the XML parameters for <literal>multipleDeploymentAllowed</literal> and

-      <literal>modifiesCas</literal>. We recommend for future work that users implement

-      and use Analysis Engine components instead of CAS Consumers.</para>

-      <para>The rest of this section is written using the version 1 style of CAS Consumer;

-      the methods described are also available for Analysis Engines.  Note that the 

-      CAS Consumer <literal>processCas</literal> method is equivalent to the Analysis Engine

-      <literal>process</literal> method.</para></note>

-      

-      <para>A CAS Consumer receives each CAS after it has been analyzed by the Analysis

-        Engine. CAS Consumers typically do not update the CAS; instead, they extract data

-        from the CAS and persist selected information to aggregate data structures such as

-        search engine indexes or databases.</para>

-      

-      <para>A CAS Consumer Java class must implement the interface

-        <literal>org.apache.uima.collection.CasConsumer</literal>, and will also

-        generally extend from the convenience base class

-        <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>. A CAS

-        Consumer also must have an XML descriptor, which has the exact same form as a

-        Collection Reader Descriptor except that the outer tag is

-        <literal>&lt;casConsumerDescription&gt;</literal>.</para>

-      

-      <para>CAS Consumers have optional <literal>initialize()</literal>,

-        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>

-        methods, which perform the same functions as they do for Collection Readers and CAS

-        Initializers. The only required method for a CAS Consumer is

-        <literal>processCas(CAS)</literal>, which is where the CAS Consumer does the bulk

-        of its work (i.e., consume the CAS).</para>

-      

-      <para>The <literal>CasConsumer</literal> interface (as well as the version 2

-        Analysis Engine interface) additionally defines batch

-        and collection level processing methods. The CAS Consumer or Analysis Engine

-        can implement the

-        <literal>batchProcessComplete()</literal> method to perform processing that

-        should occur at the end of each batch of CASes. Similarly, the CAS Consumer 

-        or Analysis Engine can

-        implement the <literal>collectionProcessComplete()</literal> method to

-        perform any collection level processing at the end of the collection.</para>

-      

-      <para>A very simple example of a CAS Consumer, which writes an XML representation of the

-        CAS to a file, is the XMI Writer CAS Consumer. The Java code is in the class

-        <literal>org.apache.uima.examples.cpe.XmiWriterCasConsumer</literal> and

-        the descriptor is in

-        <literal>%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal>

-        .</para>

-      

-      <section id="ugr.tug.cpe.cas_consumer.required_methods">

-        <title>Required Methods for a CAS Consumer</title>

-        

-        <para>When extending the convenience class

-          <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>, the

-          following methods are normally implemented (only <literal>processCas()</literal> is abstract):</para>

-        

-        <section id="ugr.tug.cpe.cas_consumer.required_methods.initialize">

-          <title>initialize()</title>

-          <para>The <literal>initialize()</literal> method is called by the framework

-            when the CAS Consumer is first created.

-            <literal>CasConsumer_ImplBase</literal> actually provides a default

-            implementation of this method (i.e., it is not abstract), so you are not strictly

-            required to implement this method. However, a typical CAS Consumer will

-            implement this method to obtain parameter values and perform various

-            initialization steps.</para>

-          

-          <para>In this method, the CAS Consumer can access the values of its configuration

-            parameters and perform other initialization logic. The example XMI Writer CAS

-            Consumer reads its configuration parameters and sets up the output directory:

-            

-            

-            <programlisting><?db-font-size 80% ?>public void initialize() throws ResourceInitializationException {

-  mDocNum = 0;

-  mOutputDir = new File((String) getConfigParameterValue(PARAM_OUTPUTDIR));

-  if (!mOutputDir.exists()) {

-    mOutputDir.mkdirs();

-  }

-}</programlisting></para>

-        </section>

-        

-        <section id="ugr.tug.cpe.cas_consumer.required_methods.processcas">

-          <title>processCas()</title>

-          

-          <para>The <literal>processCas()</literal> method is where the CAS Consumer

-            does most of its work. In our example, the XMI Writer CAS Consumer obtains an

-            iterator over the document metadata in the CAS (in the

-            SourceDocumentInformation feature structure, which is created by the File

-            System Collection Reader) and extracts the URI for the current document. From

-            this the output filename is constructed in the output directory and a subroutine

-            (<literal>writeXmi</literal>) is called to generate the output file. The

-            <literal>writeXmi</literal> subroutine uses the

-            <literal>XmiCasSerializer</literal> class provided with the UIMA SDK to

-            serialize the CAS to the output file (see the example source code for

-            details).</para>

-          

-          

-          <programlisting>public void processCas(CAS aCAS) throws ResourceProcessException {

-  String modelFileName = null;

-

-  JCas jcas;

-  try {

-    jcas = aCAS.getJCas();

-  } catch (CASException e) {

-    throw new ResourceProcessException(e);

-  }

- 

-  // retrieve the filename of the input file from the CAS

-  FSIterator it = jcas

-            .getAnnotationIndex(SourceDocumentInformation.type)

-                  .iterator();

-  File outFile = null;

-  if (it.hasNext()) {

-    SourceDocumentInformation fileLoc = 

-            (SourceDocumentInformation) it.next();

-    File inFile;

-    try {

-      inFile = new File(new URL(fileLoc.getUri()).getPath());

-      String outFileName = inFile.getName();

-      if (fileLoc.getOffsetInSource() > 0) {

-        outFileName += ("_" + fileLoc.getOffsetInSource());

-      }

-      outFileName += ".xmi";

-      outFile = new File(mOutputDir, outFileName);

-      modelFileName = mOutputDir.getAbsolutePath() + 

-            "/" + inFile.getName() + ".ecore";

-    } catch (MalformedURLException e1) {

-      // invalid URL, use default processing below

-    }

-  }

-  if (outFile == null) {

-    outFile = new File(mOutputDir, "doc" + mDocNum++);

-  }

-  // serialize the CAS as XMI and write to output file

-  try {

-    writeXmi(jcas.getCas(), outFile, modelFileName);

-  } catch (IOException e) {

-    throw new ResourceProcessException(e);

-  } catch (SAXException e) {

-    throw new ResourceProcessException(e);

-  }

-}</programlisting>

-          

-        </section>

-        

-        <section id="ugr.tug.cpe.cas_consumer.optional_methods">

-          <title>Optional Methods</title>

-          <para>The following methods are optional in a CAS Consumer, though they are often

-            used.</para>

-          <section id="ugr.tug.cpe.cas_consumer.optional_methods.batchprocesscomplete">

-            <title>batchProcessComplete()</title>

-            

-            <para>The framework calls the <literal>batchProcessComplete()</literal> method at the end of each

-              batch of CASes. This gives the CAS Consumer or Analysis Engine 

-              an opportunity to perform any batch

-              level processing. Our simple XMI Writer CAS Consumer does not perform any

-              batch level processing, so this method is empty. Batch size is set in the

-              Collection Processing Engine descriptor.</para>

-          </section>

-          

-          <section id="ugr.tug.cpe.cas_consumer.optional_methods.collectionprocesscomplete">

-            <title>collectionProcessComplete()</title>

-            

-            <para>The framework calls the <literal>collectionProcessComplete()</literal> method at the end

-              of the collection (i.e., when all objects in the collection have been

-              processed). No CAS is passed to this method. The call gives

-              the CAS Consumer or Analysis Engine an opportunity to perform collection-level processing over the

-              entire set of objects in the collection. Our simple XMI Writer CAS Consumer

-              does not perform any collection level processing, so this method is

-              empty.</para>

-          </section>

-          

-        </section>

-        

-      </section>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cpe.deploying_a_cpe">

-    <title>Deploying a CPE</title>

-    

-    <para>The CPM provides a number of service and deployment options that cover

-      instantiation and execution of CPEs, error recovery, and local and distributed

-      deployment of the CPE components. The behavior of the CPM (and correspondingly, the

-      CPE) is controlled by various options and parameters set in the CPE descriptor. The

-      current version of the CPE Configurator tool, however, supports only default error

-      handling and deployment options. To change these options, you must manually edit the

-      CPE descriptor.</para>

-    

-    <para>Eventually the CPE Configurator tool will support configuring these options and a

-      detailed tutorial for these settings will be provided. In the meantime, we provide only

-      a high-level, conceptual overview of these advanced features in the rest of this

-      chapter, and refer the advanced user to <olink targetdoc="&uima_docs_ref;"/>

-      <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.cpe_descriptor"/> for details on setting these options in the CPE

-      Descriptor.</para>

-    

-    <para> <xref linkend="ugr.tug.cpe.fig.cpe_instantiation"/> shows a logical view of

-      how an application uses the UIMA framework to instantiate a CPE from a CPE descriptor.

-      The CPE descriptor identifies the CPE components (referencing their corresponding

-      descriptors) and specifies the various options for configuring the CPM and deploying

-      the CPE components.</para>

-    

-    <figure id="ugr.tug.cpe.fig.cpe_instantiation">

-      <title>CPE Instantiation</title>

-      <mediaobject>

-        <imageobject>

-          <imagedata width="5.7in" format="PNG"

-            fileref="&imgroot;image018.png"/>

-        </imageobject>

-        <textobject><phrase>Picture of deployment of a CPE</phrase></textobject>

-      </mediaobject>

-    </figure>

-    

-    <para id="ugr.tug.cpe.deployment_alternatives">There are three deployment modes

-      for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:</para>

-    

-    <orderedlist><listitem><para><emphasis role="bold">Integrated</emphasis> (runs

-      in the same Java instance as the CPM)</para></listitem>

-      

-      <listitem><para><emphasis role="bold">Managed</emphasis> (runs in a separate

-        process on the same machine), and</para></listitem>

-      

-      <listitem><para><emphasis role="bold">Non-managed</emphasis> (runs in a

-        separate process, perhaps on a different machine). </para></listitem>

-    </orderedlist>

-    

-    <para>An integrated CAS Processor runs in the same JVM as the CPE. A managed CAS Processor

-      runs in a separate process from the CPE, but still on the same computer. The CPE controls

-      startup, shutdown, and recovery of a managed CAS Processor. A non-managed CAS

-      Processor runs as a service and may be on the same computer as the CPE or on a remote

-      computer. A non-managed CAS Processor <emphasis role="bold-italic">

-      service</emphasis> is started and managed independently from the CPE.</para>

-    

-    <para>For both managed and non-managed CAS Processors, the CAS must be transmitted

-      between separate processes and possibly between separate computers. This is

-      accomplished using <emphasis>Vinci</emphasis>, a communication protocol that is used by

-      the CPM and provided as part of Apache UIMA. Vinci handles service naming and

-      location and data transport (see <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.application.how_to_deploy_a_vinci_service"/>&nbsp; for more

-      information). Service naming and location are provided by a <emphasis>Vinci Naming

-      Service</emphasis>, or <emphasis>VNS</emphasis>. For managed CAS Processors, the

-      CPE uses its own internal VNS. For non-managed CAS Processors, a separate VNS must be

-      running.</para>

-    

-    <para>The CPE Configurator tool currently only supports constructing CPEs that deploy

-      CAS Processors in integrated mode. To deploy CAS Processors in any other mode, the CPE

-      descriptor must be edited by hand (better tooling may be provided later). Details on the

-      CPE descriptor and the required settings for various CAS Processor deployment modes

-      can be found in <olink targetdoc="&uima_docs_ref;"/>

-      <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>

-      . In the following sections we merely summarize the various CAS Processor deployment

-      options.</para>

-    

-    <section id="ugr.tug.cpe.managed_deployment">

-      <title>Deploying Managed CAS Processors</title>

-      

-      <para>Managed CAS Processor deployment is shown in <xref

-          linkend="ugr.tug.cpe.fig.managed_deployment"/>. A managed CAS Processor is

-        deployed by the CPE as a Vinci service. The CPE manages the lifecycle of the CAS

-        Processor including service launch, restart on failures, and service shutdown. A

-        managed CAS Processor runs on the same machine as the CPE, but in a separate process.

-        This provides the necessary fault isolation for the CPE to protect it from non-robust

-        CAS Processors. A fatal failure of a managed CAS Processor does not threaten the

-        stability of the CPE.</para>

-      

-      <figure id="ugr.tug.cpe.fig.managed_deployment">

-        <title>CPE with Managed CAS Processors</title>

-        <mediaobject>

-          <imageobject>

-            <imagedata width="3.6in" format="PNG"

-              fileref="&imgroot;image020.png"/>

-          </imageobject>

-          <textobject><phrase>Managed deployment showing separate JVMs and CASes

-            flowing between them</phrase></textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>The CPE communicates with managed CAS Processors using the Vinci communication

-        protocol. A CAS Processor is launched as a Vinci service and its

-        <literal>process()</literal> method is invoked remotely via a Vinci command. The

-        CPE uses its own internal VNS to support managed CAS Processors. The VNS, by default,

-        listens on port 9005. If this port is not available, the VNS will increment its listen

-        port until it finds one that is available. All managed CAS Processors are internally

-        configured to <quote>talk</quote> to the CPE-managed VNS. This internal VNS is

-        transparent to the end user launching the CPE.</para>

-      

-      <para>To deploy a managed CAS Processor, the CPE deployer must change the CPE

-        descriptor. The following is a section from the CPE descriptor that shows an example

-        configuration specifying a managed CAS Processor.</para>

-      

-      

-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="local"</emphasis> name="Meeting Detector TAE"&gt;

-  &lt;descriptor&gt;

-    &lt;include href="deploy/vinci/Deploy_MeetingDetectorTAE.xml"/&gt;

-  &lt;/descriptor&gt;

-  &lt;runInSeparateProcess&gt;

-    &lt;exec dir="." executable="java"&gt;

-      &lt;env key="CLASSPATH" 

-         value="src;

-                C:/Program Files/apache/uima/lib/uima-core.jar;

-                C:/Program Files/apache/uima/lib/uima-cpe.jar;

-                C:/Program Files/apache/uima/lib/uima-examples.jar;

-                C:/Program Files/apache/uima/lib/uima-adapter-vinci.jar;

-                C:/Program Files/apache/uima/lib/jVinci.jar"/&gt;

-      &lt;arg&gt;-DLOG=C:/Temp/service.log&lt;/arg&gt;

-      &lt;arg&gt;org.apache.uima.adapter.vinci.

-         VinciAnalysisEngineService_impl&lt;/arg&gt;

-      &lt;arg&gt;${descriptor}&lt;/arg&gt;

-    &lt;/exec&gt;

-  &lt;/runInSeparateProcess&gt;

-  &lt;deploymentParameters/&gt;

-  &lt;filter/&gt;

-  &lt;errorHandling&gt;

-    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;

-    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;

-    &lt;timeout max="100000"/&gt;

-  &lt;/errorHandling&gt;

-  &lt;checkpoint batch="10000"/&gt;

-&lt;/casProcessor&gt;</programlisting>

-      

-      <para>See <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for

-        details and required settings.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.cpe.deploying_nonmanaged_cas_processors">

-      <title>Deploying Non-managed CAS Processors</title>

-      

-      <para>Non-managed CAS Processor deployment is shown in <xref

-          linkend="ugr.tug.cpe.fig.nonmanaged_cpe"/>. In non-managed mode, the CPE

-        supports connectivity to CAS Processors running on local or remote computers using

-        Vinci. Non-managed processors are different from managed processors in two

-        aspects:

-        

-        <orderedlist><listitem><para>Non-managed processors are neither started nor

-          stopped by the CPE.</para></listitem>

-          

-          <listitem><para>Non-managed processors use an independent VNS, also neither

-            started nor stopped by the CPE. </para></listitem></orderedlist></para>

-      

-      <figure id="ugr.tug.cpe.fig.nonmanaged_cpe">

-        <title>CPE with non-managed CAS Processors</title>

-        <mediaobject>

-          <imageobject>

-            <imagedata width="4.8in" format="PNG"

-              fileref="&imgroot;image023.png"/>

-          </imageobject>

-          <textobject><phrase>Non-managed CPE deployment</phrase></textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>While non-managed CAS Processors provide the same level of fault isolation and

-        robustness as managed CAS Processors, error recovery support for non-managed CAS

-        Processors is much more limited. In particular, the CPE cannot restart a non-managed

-        CAS Processor after an error.</para>

-      

-      <para>Non-managed CAS Processors also require a separate Vinci Naming Service

-        running on the network. This VNS must be manually started and monitored by the end user

-        or application. Instructions for running a VNS can be found in <olink

-          targetdoc="&uima_docs_tutorial_guides;"

-          targetptr="ugr.tug.application.vns.starting"/>.</para>

-      

-      <para>To deploy a non-managed CAS Processor, the CPE deployer must change the CPE

-        descriptor. The following is a section from the CPE descriptor that shows an example

-        configuration for the non-managed CAS Processor.</para>

-      

-      

-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="remote"</emphasis> name="Meeting Detector TAE"&gt;

-  &lt;descriptor&gt;

-    &lt;include href=

-        "descriptors/vinciService/MeetingDetectorVinciService.xml"/&gt;

-  &lt;/descriptor&gt;

-  &lt;deploymentParameters/&gt;

-  &lt;filter/&gt;

-  &lt;errorHandling&gt;

-    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;

-    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;

-    &lt;timeout max="100000"/&gt;

-  &lt;/errorHandling&gt;

-  &lt;checkpoint batch="10000"/&gt;

-&lt;/casProcessor&gt;</programlisting>

-      

-      <para>See <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for

-        details and required settings.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.cpe.integrated_deployment">

-      <title>Deploying Integrated CAS Processors</title>

-      

-      <para>Integrated CAS Processors are shown in <xref

-          linkend="ugr.tug.cpe.fig.integrated_deployment"/>. Here the CAS Processors

-        run in the same JVM as the CPE, just like the Collection Reader and CAS Initializer.

-        This deployment method results in minimal CAS communication and transport overhead

-        as the CAS is shared in the same process space of the JVM. However, a CPE running with all

-        integrated CAS Processors is limited in scalability by the capability of the single

-        computer on which the CPE is running. There is also a stability risk associated with

-        integrated processors because a poorly written CAS Processor can cause the JVM, and

-        hence the entire CPE, to abort.</para>

-      

-      <figure id="ugr.tug.cpe.fig.integrated_deployment">

-        <title>CPE with integrated CAS Processor</title>

-        <mediaobject>

-          <imageobject>

-            <imagedata width="3.2in" format="PNG"

-              fileref="&imgroot;image026.png"/>

-          </imageobject>

-          <textobject><phrase>CPE with integrated CAS Processor</phrase>

-          </textobject>

-        </mediaobject>

-      </figure>

-      

-      <para>The following is a section from a CPE descriptor that shows an example

-        configuration for the integrated CAS Processor.</para>

-      

-      

-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="integrated"</emphasis> name="Meeting Detector TAE"&gt;

-  &lt;descriptor&gt;

-    &lt;include href="descriptors/tutorial/ex4/MeetingDetectorTAE.xml"/&gt;

-  &lt;/descriptor&gt;

-  &lt;deploymentParameters/&gt;

-  &lt;filter/&gt;

-  &lt;errorHandling&gt;

-    &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;

-    &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;

-    &lt;timeout max="100000"/&gt;

-  &lt;/errorHandling&gt;

-  &lt;checkpoint batch="10000"/&gt;

-&lt;/casProcessor&gt;</programlisting>

-      

-      <para>See <olink targetdoc="&uima_docs_ref;"/>

-        <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for

-        details and required settings.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.cpe.collection_processing_examples">

-    <title>Collection Processing Examples</title>

-    

-    <para>The UIMA SDK includes a set of examples illustrating the three modes of deployment:

-      integrated, managed, and non-managed. These are in the

-      <literal>/examples/descriptors/collection_processing_engine</literal>

-      directory. There are three CPE descriptors that run an example annotator (the Meeting

-      Finder) in these modes.</para>

-    

-    <para>To run either the integrated or managed examples, use the

-      <literal>runCPE</literal> script in the <literal>/bin</literal> directory of the UIMA installation,

-      passing the appropriate CPE descriptor as an argument. Alternatively,

-      if you're using Eclipse and have the <literal>uimaj-examples</literal> project in your

-    workspace, you can use the Eclipse menu Run &rarr; Run... and then pick the 

-    launch configuration <quote>UIMA Run CPE</quote>.</para> 

-    

-    <note><para>The <literal>runCPE</literal> script <emphasis role="bold-italic"> must</emphasis> 

-    be run from the <literal>%UIMA_HOME%\examples</literal> directory, because the example

-    CPE descriptors use relative path names that are resolved relative to this working directory. 

-    For instance,

-   

-    <literallayout>runCPE

-descriptors\collection_processing_engine\MeetingFinderCPE_Integrated.xml</literallayout></para>

-    </note>

-    

-    <!--

-    <para>If you installed the examples into Eclipse, you can run directly from Eclipse by

-      creating a run configuration. To do this, highlight the SimpleRunCPE.java source file

-      in the examples src/org/apache/uima/examples/cpe directory, and then</para>

-    

-    <orderedlist><listitem><para>pick the menu Run &rarr; Run...</para></listitem>

-      

-      <listitem><para>click <quote>Java Application</quote> and press

-        <quote>New</quote></para></listitem>

-      

-      <listitem><para>click on the Arguments panel, and insert a path to the appropriate CPE

-        descriptor in the <quote>Program Arguments</quote> box by typing, for instance:

-        <literal>descriptors/collection_processing_engine/

-          MeetingFinderCPE_Integrated.xml</literal>

-        </para></listitem>

-      

-      <listitem><para>Then press <quote>Run</quote> </para></listitem>

-    </orderedlist>

-    -->

-    

-    <para>To run the non-managed example, there are some additional steps.

-      

-      <orderedlist><listitem><para>Start a VNS service by running the

-        <literal>startVNS</literal> script in the <literal>/bin</literal>

-        directory, or using the Eclipse launcher <quote>UIMA Start VNS</quote>.</para></listitem>

-        

-        <listitem><para>Deploy the Meeting Detector Analysis Engine as a Vinci service by

-          running the <literal>startVinciService</literal> script in the

-          <literal>/bin</literal> directory and passing it the location of the

-          descriptor to deploy, in this case

-          <literal>%UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml</literal>.

-          Alternatively,

-      if you're using Eclipse and have the <literal>uimaj-examples</literal> project in your

-    workspace, you can use the Eclipse menu Run &rarr; Run... and then pick the 

-    launch configuration <quote>UIMA Start Vinci Service</quote>.

-          </para></listitem>

-        

-        <listitem><para>Now, run the <literal>runCPE</literal> script (or if in Eclipse, run the 

-          launch configuration <quote>UIMA Run CPE</quote>), passing it the CPE descriptor for the non-managed

-          version

-          (<literal>%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml</literal>).

-          </para>

-          </listitem></orderedlist></para>

-    

-    <para>This assumes that the Vinci Naming Service, the runCPE application, and the

-      <literal>MeetingDetectorTAE</literal> service are all running on the same machine.

-      Most of the scripts that need information about VNS will look for values to use in

-      environment variables VNS_HOST and VNS_PORT; these default to

-      <quote>localhost</quote> and <quote>9000</quote>. You may set these to appropriate

-      values before running the scripts, as needed; you can also pass the name of the VNS host as

-      the second argument to the startVinciService script.</para>

-    

-    <para>Alternatively, you can edit the scripts and/or the XML files to specify

-      alternatives for the VNS_HOST and VNS_PORT. For instance, if the

-      <literal>runCPE</literal> application is running on a different machine from the

-      Vinci Naming Service, you can edit the

-      <literal>MeetingFinderCPE_NonManaged.xml</literal> and change the vnsHost

-      parameter:

-      <literal>&lt;parameter name="vnsHost"  value="localhost" type="string"/&gt;</literal>

-      to specify the VNS host instead of <quote>localhost</quote>.</para>

-  </section>

-  

-</chapter>

-

diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml
deleted file mode 100644
index 4ad2474..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml
+++ /dev/null
@@ -1,396 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY imgroot "images/tutorials_and_users_guides/tug.fc/">

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.fc">

-  <title>Flow Controller Developer&apos;s Guide</title>

-  

-  <para>A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the

-    Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that

-    CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.</para>

-  

-  <para>Flow Controllers may decide the flow dynamically, based on the contents of the CAS. So, as just one example,

-    you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then,

-    based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized

-    for that particular language.</para>

-  

-  <section id="ugr.tug.fc.developing_fc_code">

-    <title>Developing the Flow Controller Code</title>

-    

-    <section id="ugr.tug.fc.fc_interface_overview">

-      <title>Flow Controller Interface Overview</title>

-      

-      <para>Flow Controller implementations should extend from the

-        <literal>JCasFlowController_ImplBase</literal> or

-        <literal>CasFlowController_ImplBase</literal> classes, depending on which CAS interface they prefer

-        to use. As with other types of components, the Flow Controller ImplBase classes define optional

-        <literal>initialize</literal>, <literal>destroy</literal>, and <literal>reconfigure</literal>

-        methods. They also define the required method <literal>computeFlow</literal>.</para>

-      

-      <para>The <literal>computeFlow</literal> method is called by the framework whenever a new CAS enters the

-        Aggregate Analysis Engine. It is given the CAS as an argument and must return an object which implements the

-        <literal>Flow</literal> interface (the Flow object). The Flow Controller developer must define this

-        object. It is the object that is responsible for routing this particular CAS through the components of the

-        Aggregate Analysis Engine. For convenience, the framework provides basic implementations of Flow objects

-        in the classes CasFlow_ImplBase and JCasFlow_ImplBase; use the JCas one if you are using the JCas interface

-        to the CAS.</para>

-      

-      <para>The framework then uses the Flow object and calls its <literal>next()</literal> method, which returns

-        a <literal>Step</literal> object (implemented by the UIMA Framework) that indicates what to do next with

-        this CAS. There are three types of steps currently supported:</para>

-      

-      <itemizedlist>

-        <listitem>

-          <para><literal>SimpleStep</literal>, which specifies a single Analysis Engine that should receive

-            the CAS next.</para>

-        </listitem>

-        

-        <listitem>

-          <para><literal>ParallelStep</literal>, which specifies that multiple Analysis Engines should

-            receive the CAS next, and that the relative order in which these Analysis Engines execute does not

-            matter. Logically, they can run in parallel. The runtime is not obligated to actually execute them in

-            parallel, however, and the current implementation will execute them serially in an arbitrary

-            order.</para>

-        </listitem>

-        

-        <listitem>

-          <para><literal>FinalStep</literal>, which indicates that the flow is completed. </para>

-        </listitem>

-      </itemizedlist>

-      

-      <para>After executing the step, the framework will call the Flow object&apos;s <literal>next()</literal>

-        method again to determine the next destination, and this will be repeated until the Flow Object indicates

-        that processing is complete by returning a <literal>FinalStep</literal>.</para>

-      

-      <para>The Flow Controller has access to a <literal>FlowControllerContext</literal>, which is a subtype of

-        <literal>UimaContext</literal>. In addition to the configuration parameter and resource access

-        provided by a <literal>UimaContext</literal>, the <literal>FlowControllerContext</literal> also

-        gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most

-        Flow Controllers will need to use this information to make routing decisions. You can get a handle to the

-        <literal>FlowControllerContext</literal> by calling the <literal>getContext()</literal> method

-        defined in <literal>JCasFlowController_ImplBase</literal> and

-        <literal>CasFlowController_ImplBase</literal>. Then, the

-        <literal>FlowControllerContext.getAnalysisEngineMetaDataMap</literal> method can be called to get a

-        map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as

-        the delegate analysis engine keys specified in the aggregate descriptor, and the values are the

-        corresponding <literal>AnalysisEngineMetaData</literal> objects.</para>

-      

-      <para>Finally, the Flow Controller has optional methods <literal>addAnalysisEngines</literal> and

-        <literal>removeAnalysisEngines</literal>. These methods are intended to notify the Flow Controller if

-        new Analysis Engines are available to route CASes to, or if previously available Analysis Engines are no

-        longer available. However, the current version of the Apache UIMA framework does not support dynamically

-        adding or removing Analysis Engines to/from an aggregate, so these methods are not currently called. Future

-        versions may support this feature. </para>

-    </section>

-    

-    <section id="ugr.tug.fc.example_code">

-      <title>Example Code</title>

-      

-      <para>This section walks through the source code of an example Flow Controller that simulates a simple version

-        of the <quote>Whiteboard</quote> flow model. At each step of the flow, the Flow Controller looks at all of the

-        available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are

-        satisfied.</para>

-      

-      <para>The Java class for the example is

-        <literal>org.apache.uima.examples.flow.WhiteboardFlowController</literal> and the source code is

-        included in the UIMA SDK under the <literal>examples/src</literal> directory.</para>

-      

-      <section id="ugr.tug.fc.whiteboard">

-        <title>The WhiteboardFlowController Class</title>

-        

-        

-        <programlisting>public class WhiteboardFlowController 

-          extends CasFlowController_ImplBase {

-  public Flow computeFlow(CAS aCAS) 

-          throws AnalysisEngineProcessException {

-    WhiteboardFlow flow = new WhiteboardFlow();

-    // As of release 2.3.0, the following is not needed,

-    //   because the framework does this automatically

-    // flow.setCas(aCAS); 

-                        

-    return flow;

-  }

-

-  class WhiteboardFlow extends CasFlow_ImplBase {

-     // Discussed Later

-  }

-}</programlisting>

-        

-        <para>The <literal>WhiteboardFlowController</literal> extends from

-          <literal>CasFlowController_ImplBase</literal> and implements the

-          <literal>computeFlow</literal> method. The implementation of the <literal>computeFlow</literal>

-          method is very simple; it just constructs a new <literal>WhiteboardFlow</literal> object that will be

-          responsible for routing this CAS. The framework gives the Flow object a handle to that CAS,

-          which the Flow object later uses to make its routing decisions.</para>

-        

-        <para>Note that we will have one instance of <literal>WhiteboardFlow</literal> per CAS, so if there are

-          multiple CASes being simultaneously processed there will not be any confusion.</para>

-        

-      </section>

-      <section id="ugr.tug.fc.whiteboardflow">

-        <title>The WhiteboardFlow Class</title>

-        

-        

-        <programlisting>class WhiteboardFlow extends CasFlow_ImplBase {

-  private Set mAlreadyCalled = new HashSet();

-

-  public Step next() throws AnalysisEngineProcessException {

-    // Get the CAS that this Flow object is responsible for routing.

-    // Each Flow instance is responsible for a single CAS.

-    CAS cas = getCas();

-

-    // iterate over available AEs

-    Iterator aeIter = getContext().getAnalysisEngineMetaDataMap().

-        entrySet().iterator();

-    while (aeIter.hasNext()) {

-      Map.Entry entry = (Map.Entry) aeIter.next();

-      // skip AEs that were already called on this CAS

-      String aeKey = (String) entry.getKey();

-      if (!mAlreadyCalled.contains(aeKey)) {

-        // check for satisfied input capabilities 

-        //(i.e. the CAS contains at least one instance

-        // of each required input)

-        AnalysisEngineMetaData md = 

-            (AnalysisEngineMetaData) entry.getValue();

-        Capability[] caps = md.getCapabilities();

-        boolean satisfied = true;

-        for (int i = 0; i &lt; caps.length; i++) {

-          satisfied = inputsSatisfied(caps[i].getInputs(), cas);

-          if (satisfied)

-            break;

-        }

-        if (satisfied) {

-          mAlreadyCalled.add(aeKey);

-          if (getContext().getLogger().isLoggable(Level.FINEST)) {

-            getContext().getLogger().log(Level.FINEST, 

-                "Next AE is: " + aeKey);

-          }

-          return new SimpleStep(aeKey);

-        }

-      }

-    }

-    // no appropriate AEs to call - end of flow

-    getContext().getLogger().log(Level.FINEST, "Flow Complete.");

-    return new FinalStep();

-  }

-

-  private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {

-      //implementation detail; see the actual source code

-  }

-}</programlisting>

-        

-        <para>Each instance of <literal>WhiteboardFlow</literal> is responsible for routing a

-          single CAS. A handle to the CAS instance is available by calling the <literal>getCas()</literal> method,

-          which is a standard method defined on the <literal>CasFlow_ImplBase</literal> superclass.</para>

-        

-        <para>Each time the <literal>next</literal> method is called, the Flow object iterates over the metadata

-          of all of the available Analysis Engines (obtained via the call to

-          <literal>getContext().getAnalysisEngineMetaDataMap()</literal>) and checks whether the input types declared in an

-          AnalysisEngineMetaData object are satisfied by the CAS (that is, the CAS contains at least one instance of

-          each declared input type). The exact details of checking for instances of types in the CAS are not discussed

-          here &ndash; see the WhiteboardFlowController.java file for the complete source.</para>

-        

-        <para>When the Flow object decides which AnalysisEngine should be called next, it indicates this by

-          creating a SimpleStep object with the key for that AnalysisEngine and returning it:</para>

-        

-        <programlisting>return new SimpleStep(aeKey);</programlisting>

-        

-        <para>The Flow object keeps a set of the Analysis Engines it has invoked in the

-          <literal>mAlreadyCalled</literal> field, and never invokes the same Analysis Engine twice. Note this

-          is not a hard requirement. It is acceptable to design a FlowController that invokes the same Analysis

-          Engine more than once. However, if you do this you must make sure that the flow will eventually

-          terminate.</para>

-        

-        <para>If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals

-          the end of the flow by returning a FinalStep object:</para>

-        

-        <programlisting>return new FinalStep();</programlisting>

-        

-        <para>Also, note the use of the logger to write tracing messages indicating the decisions made by the Flow

-          Controller. This is a good practice that helps with debugging if the Flow Controller is behaving in an

-          unexpected way.</para>

-      </section>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.fc.creating_fc_descriptor">

-    <title>Creating the Flow Controller Descriptor</title>

-    

-    <para>To create a Flow Controller Descriptor in the CDE, use File &rarr; New &rarr; Other

-      &rarr; UIMA &rarr; Flow Controller Descriptor File:

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image002.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Eclipse new object wizard showing Flow Controller</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>This will bring up the Overview page for the Flow Controller Descriptor:

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image004.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Component Descriptor Editor Overview page for new Flow Controller</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>Type in the Java class name that implements the Flow Controller, or use the <quote>Browse</quote> button

-      to select it. You must select a Java class that implements the <literal>FlowController</literal>

-      interface.</para>

-    

-    <para>Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors &ndash; for

-      example you can specify configuration parameters and external resources if you wish.</para>

-    

-    <para>If you wish to edit a Flow Controller Descriptor by hand, see <olink targetdoc="&uima_docs_ref;"/>

-    <olink targetdoc="&uima_docs_ref;"

-        targetptr="ugr.ref.xml.component_descriptor.flow_controller"/> for the syntax.</para>
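-    <para>As a rough orientation only, a hand-written Flow Controller Descriptor for the example class
-      discussed in this chapter might look like the following sketch. The <literal>name</literal> value is an
-      illustrative assumption and optional elements are omitted; the reference section cited above remains the
-      authoritative description of the syntax.</para>
-    <programlisting>&lt;flowControllerDescription
-    xmlns="http://uima.apache.org/resourceSpecifier"&gt;
-  &lt;frameworkImplementation&gt;org.apache.uima.java&lt;/frameworkImplementation&gt;
-  &lt;implementationName&gt;org.apache.uima.examples.flow.WhiteboardFlowController&lt;/implementationName&gt;
-  &lt;processingResourceMetaData&gt;
-    &lt;!-- hypothetical name, chosen for illustration --&gt;
-    &lt;name&gt;Whiteboard Flow Controller&lt;/name&gt;
-  &lt;/processingResourceMetaData&gt;
-&lt;/flowControllerDescription&gt;</programlisting>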

-  </section>

-  

-  <section id="ugr.tug.fc.adding_fc_to_aggregate">

-    <title>Adding a Flow Controller to an Aggregate Analysis Engine</title>

-    <titleabbrev>Adding Flow Controller to an Aggregate</titleabbrev>

-    

-    <para>To use a Flow Controller you must add it to an Aggregate Analysis Engine. You can only have one Flow

-      Controller per Aggregate Analysis Engine. In the Component Descriptor Editor, the Flow Controller is

-      specified on the Aggregate page, as a choice in the flow control kind - pick <quote>User-defined Flow</quote>.

-      When you do, the Browse and Search buttons underneath become active, and allow you to specify an existing Flow

-      Controller Descriptor, which when you select it, will be imported into the aggregate descriptor.

-      

-      

-      <screenshot>

-    <mediaobject>

-      <imageobject>

-        <imagedata width="4.5in" format="JPG" fileref="&imgroot;image006.jpg"/>

-      </imageobject>

-      <textobject><phrase>Screenshot of Component Descriptor Editor Aggregate page showing selecting user-defined flow</phrase></textobject>

-    </mediaobject>

-  </screenshot></para>

-    

-    <para>The key name is created automatically from the name element in the Flow Controller Descriptor being

-      imported. If you need to change this name, you can do so by switching to the <quote>Source</quote> view using the

-      bottom tabs, and editing the name in the XML source.</para>

-    

-    <para>If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:

-      

-      

-      <programlisting>  &lt;delegateAnalysisEngineSpecifiers&gt;

-    ...

-  &lt;/delegateAnalysisEngineSpecifiers&gt;  

-  <emphasis role="bold">&lt;flowController key=<quote>[String]</quote>&gt;

-    &lt;import .../&gt; 

-  &lt;/flowController&gt;</emphasis></programlisting></para>

-    

-    <para>As usual, you can use either an import by location or an import by name &ndash; see <olink

-        targetdoc="&uima_docs_ref;"/> <olink

-        targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor.imports"/>.</para>
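-    <para>For illustration, the two import forms could be written as in the sketch below. Only one
-      <literal>flowController</literal> element would appear in a real aggregate descriptor; the key, the
-      relative path, and the dotted name used here are hypothetical placeholders.</para>
-    <programlisting>  &lt;!-- import by location: path relative to the aggregate descriptor --&gt;
-  &lt;flowController key="WhiteboardFlowController"&gt;
-    &lt;import location="WhiteboardFlowController.xml"/&gt;
-  &lt;/flowController&gt;
-
-  &lt;!-- import by name: looked up on the classpath or datapath --&gt;
-  &lt;flowController key="WhiteboardFlowController"&gt;
-    &lt;import name="org.apache.uima.examples.flow.WhiteboardFlowController"/&gt;
-  &lt;/flowController&gt;</programlisting>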

-    

-    <para>The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine

-      Descriptor &ndash; in parameter overrides, resource bindings, and Sofa mappings.</para>

-  </section>

-  

-  <section id="ugr.tug.fc.adding_fc_to_cpe">

-    <title>Adding a Flow Controller to a Collection Processing Engine</title>

-    <titleabbrev>Adding Flow Controller to CPE</titleabbrev>

-    

-    <para>Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a

-      CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis

-      Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE&apos;s deployment and error handling

-      options can then only be configured for the entire Aggregate Analysis Engine as a unit.</para>

-    

-  </section>

-  

-  <section id="ugr.tug.fc.using_fc_with_cas_multipliers">

-    <title>Using Flow Controllers with CAS Multipliers</title>

-    

-    <para>If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier

-      (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>), there are additional

-      things you must consider.</para>

-    

-    <para>When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that

-      then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call

-      the <literal>newCasProduced</literal> method on the Flow object that was managing the flow of the parent CAS 

-      (the one that was input to the CAS Multiplier). The <literal>newCasProduced</literal> method must create a new Flow 

-      object that will be responsible for routing the new output CAS.</para>

-    

-    <para>In the <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal> classes, the

-      <literal>newCasProduced</literal> method is defined to throw an exception indicating that the Flow

-      Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS

-      Multipliers you must override this method.</para>

-        

-    <para>If your Flow class extends <literal>CasFlow_ImplBase</literal>, the method signature to override is:           

-      <programlisting>protected Flow newCasProduced(CAS newOutputCas, String producedBy)</programlisting>

-    </para>

-    

-    <para>If your Flow class extends <literal>JCasFlow_ImplBase</literal>, the method signature to override is:

-      <programlisting>protected Flow newCasProduced(JCas newOutputCas, String producedBy)</programlisting>

-    </para>  

-    

-    <para>Also, there is a variant of <literal>FinalStep</literal> which can only be specified for output CASes

-      produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This

-      version of <literal>FinalStep</literal> is produced by calling the constructor with a

-      <literal>true</literal> argument, and it causes the CAS to be immediately released back to the pool. No

-      further processing will be done on it and it will not be output from the aggregate. This is the way that you can

-      build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new

-      CASes to be output from the Aggregate Analysis Engine, you don&apos;t need to use this; instead just declare

-      <literal>&lt;outputsNewCASes&gt;false&lt;/outputsNewCASes&gt;</literal> in your Aggregate Analysis

-      Engine Descriptor as described in <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.cm.aggregate_cms"/>.</para>
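-    <para>As a sketch, that declaration sits in the <literal>operationalProperties</literal> element of the
-      aggregate&apos;s metadata. The values shown for the other two properties are illustrative assumptions, not
-      recommendations.</para>
-    <programlisting>&lt;analysisEngineMetaData&gt;
-  ...
-  &lt;operationalProperties&gt;
-    &lt;modifiesCas&gt;true&lt;/modifiesCas&gt;
-    &lt;multipleDeploymentAllowed&gt;true&lt;/multipleDeploymentAllowed&gt;
-    &lt;outputsNewCASes&gt;false&lt;/outputsNewCASes&gt;
-  &lt;/operationalProperties&gt;
-&lt;/analysisEngineMetaData&gt;</programlisting>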

-    

-    <para>For more information on how CAS Multipliers interact with Flow Controllers, see 

-      <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm.cm_and_fc"/>.

-    </para>

-  </section>

-  

-  <section id="ugr.tug.fc.continuing_when_exceptions_occur">

-    <title>Continuing the Flow When Exceptions Occur</title>

-    <para> If an exception occurs when processing a CAS, the framework may call the method     

-      <programlisting>boolean continueOnFailure(String failedAeKey, Exception failure)</programlisting>

-      on the Flow object that was managing the flow of that CAS. If this method returns <literal>true</literal>, then

-      the framework may continue to call the <literal>next()</literal> method to continue routing the CAS. If this

-      method returns <literal>false</literal> (the default), the framework will not make any more calls to the

-      <literal>next()</literal> method. </para>

-    <para>In the case where the last Step was a ParallelStep, if at least one of the destinations resulted in a failure,

-      then <literal>continueOnFailure</literal> will be called to report one of the failures. If this method

-      returns true, but one of the other destinations in the ParallelStep resulted in a failure, then the

-      <literal>continueOnFailure</literal> method will be called again to report the next failure. This

-      continues until either this method returns false or there are no more failures. </para>

-    <para>Note that it is possible for processing of a CAS to be aborted without this method being called. This method

-      is only called when an attempt is being made to continue processing of the CAS following an exception, which may

-      be an application configuration decision.</para>

-    <para>In any case, if processing is aborted by the framework for any reason, including because

-      <literal>continueOnFailure</literal> returned false, the framework will call the

-      <literal>Flow.aborted()</literal> method to allow the Flow object to clean up any resources.</para>   

-    <para>For an example of how to continue after an exception, see the example

-      code <literal>org.apache.uima.examples.flow.AdvancedFixedFlowController</literal>, in

-      the <literal>examples/src</literal> directory of the UIMA SDK. This example also demonstrates the use of

-      <literal>ParallelStep</literal>.</para>

-  </section>

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml
deleted file mode 100644
index ccae4cb..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml
+++ /dev/null
@@ -1,660 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.mvs">

-  <title>Multiple CAS Views of an Artifact</title>

-  <titleabbrev>Multiple CAS Views</titleabbrev>

-  

-  <para>UIMA provides an extension to the basic model of the CAS which supports analysis of

-    multiple views of the same artifact, all contained within the CAS. This chapter describes

-    the concepts, terminology, and the API and XML extensions that enable this.</para>

-  

-  <para>Multiple CAS Views can simplify things when different versions of the artifact are

-    needed at different stages of the analysis. They are also key to enabling multimodal

-    analysis where the initial artifact is transformed from one modality to another, or where

-    the artifact itself is multimodal, such as the audio, video and closed-captioned text

-    associated with an MPEG object. Each representation of the artifact can be analyzed

-    independently with the standard UIMA programming model; in addition, multi-view

-    components and applications can be constructed.</para>

-  

-  <para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects,

-    one for each view, where these objects share most of the same underlying CAS, except for two

-    things: each view has its own set of indexed Feature Structures, and each view has its own

-    subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature

-    Structure instances themselves are in the shared part of the CAS; only the entries in the

-    indexes are unique for each CAS view.</para>

-  

-  <para>All of these CAS view objects are kept together with the CAS, and passed as a unit

-    between components in a UIMA application. APIs exist which allow components and

-    applications to switch among the various view objects, as needed.</para>

-  

-  <para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS

-    Views facilitate adding or removing Feature Structures to or from their index

-    repositories:</para>

-  

-  

-  <programlisting>aView.addFsToIndexes(aFeatureStructure) 

-aView.removeFsFromIndexes(aFeatureStructure)</programlisting>

-  

-  <para>The view on which these methods are called is the view in which the Feature Structure is added to or removed from the

-    indexes.</para>

-  

-  <section id="ugr.tug.mvs.cas_views_and_sofas">

-    <title>CAS Views and Sofas</title>

-    

-    <para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation,

-      every CAS view has one associated Sofa, and every Sofa has one associated CAS

-      View.</para>

-    

-    <section id="ugr.tug.mvs.naming_views_sofas">

-      <title>Naming CAS Views and Sofas</title>

-      

-      <para>The developer assigns a name to the View / Sofa, which is a simple string

-        (following the rules for Java identifiers, usually without periods, but see special

-        exception below). These names are declared in the component XML metadata, and are

-        used during assembly and by the runtime to enable switching among multiple Views of

-        the CAS at the same time.</para>

-      <note><para>The name is called the Sofa name, for historical reasons, but it applies

-      equally to the View. In the rest of this chapter, we&apos;ll refer to it as the Sofa

-      name.</para></note>

-      

-      <para>Some applications contain components that expect a variable number of Sofas as

-        input or output. An example of a component that takes a variable number of input Sofas

-        could be one that takes several translations of a document and merges them, where each

-        translation was in a separate Sofa. </para>

-      

-      <para> You can specify a variable number of input or output sofa names, where each name

-        has the same base part, by writing the base part of the name (with no periods), followed

-        by a period character and an asterisk character (.*). These denote sofas that have

-        names matching the base part up to the period; for example, names such as

-        <literal>base_name_part.TTX_3d</literal> would match a specification of

-        <literal>base_name_part.*</literal>.</para>
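-      <para>In a component descriptor, such a specification might appear as in the following sketch, which
-        assumes that the variable-length set of Sofas is an input to the component:</para>
-      <programlisting>&lt;capability&gt;
-  &lt;inputs/&gt;
-  &lt;outputs/&gt;
-  &lt;inputSofas&gt;
-    &lt;sofaName&gt;base_name_part.*&lt;/sofaName&gt;
-  &lt;/inputSofas&gt;
-&lt;/capability&gt;</programlisting>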

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.multi_view_and_single_view">

-      <title>Multi-View, Single-View components &amp; applications</title>

-      <titleabbrev>Multi/Single View parts in Applications</titleabbrev>

-      

-      <para>Components and applications can be written to be Multi-View or Single-View.

-        Most components used as primitive building blocks are expected to be Single-View.

-        UIMA provides capabilities to combine these kinds of components with Multi-View

-        components when assembling analysis aggregates or applications.</para>

-      

-      <para>Single-View components and applications use only one subject of analysis, and

-        one CAS View. The code and descriptors for these components do not use the facilities

-        described in this chapter.</para>

-      

-      <para>Conversely, Multi-View components and applications are aware of the

-        possibility of multiple Views and Sofas, and have code and XML descriptors that

-        create and manipulate them.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.mvs.multi_view_components">

-    <title>Multi-View Components</title>

-    <section id="ugr.tug.mvs.deciding_multi_view">

-      <title>How UIMA decides if a component is Multi-View</title>

-      <titleabbrev>Deciding: Multi-View</titleabbrev>

-      

-      <para>Every UIMA component has an associated XML Component Descriptor. Multi-View

-        components are identified simply as those whose descriptors declare one or more Sofa

-        names in their Capability sections, as inputs or outputs. If a Component Descriptor

-        does not mention any input or output Sofa names, the framework treats that component

-        as a Single-View component.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.additional_capabilities">

-      <title>Multi-View: additional capabilities</title>

-      

-      <para>Additional capabilities provided for components and applications aware of the

-        possibilities of multiple Views and Sofas include:</para>

-      

-      <itemizedlist spacing="compact"><listitem><para>Creating new Views, and for

-        each, setting up the associated Sofa data</para></listitem>

-        

-        <listitem><para>Getting a reference to an existing View and its associated Sofa, by

-          name </para></listitem>

-        

-        <listitem><para>Specifying a view in which to index a particular Feature Structure

-          instance </para></listitem></itemizedlist>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.component_xml_metadata">

-      <title>Component XML metadata</title>

-      

-      <para>Each Multi-View component that creates a Sofa or wants to switch to a specific

-        previously created Sofa must declare the name for the Sofa in the capabilities

-        section. For example, a component expecting as input a web document in html format and

-        creating a plain text document for further processing might declare:</para>

-      

-      

-      <programlisting>&lt;capabilities&gt;

-  &lt;capability&gt;

-    &lt;inputs/&gt;

-    &lt;outputs/&gt;

-    &lt;inputSofas&gt;

-<emphasis role="bold">      &lt;sofaName&gt;rawContent&lt;/sofaName&gt;</emphasis>

-    &lt;/inputSofas&gt;

-    &lt;outputSofas&gt;

-<emphasis role="bold">      &lt;sofaName&gt;detagContent&lt;/sofaName&gt;</emphasis>

-    &lt;/outputSofas&gt;

-  &lt;/capability&gt;

-&lt;/capabilities&gt;</programlisting>

-      

-      <para>Details on this specification are found in <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor

-        Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;"

-          targetptr="ugr.tools.cde.capabilities">Capabilities Page</olink>.</para>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">

-    <title>Sofa Capabilities and APIs for Applications</title>

-    <titleabbrev>Sofa Capabilities &amp; APIs for Apps</titleabbrev>

-    

-    <para>In addition to components, applications can make use of these capabilities. When

-      an application creates a new CAS, it also creates the initial view of that CAS - and this

-      view is the object that is returned from the create call. Additional views beyond this

-      first one can be dynamically created at any time. The application can use the Sofa APIs

-      described in <olink targetdoc="&uima_docs_tutorial_guides;"

-        targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para>

-    

-    <para>If an Application creates a new CAS, the initial CAS that is created will be a view

-      named <quote>_InitialView</quote>. This name can be used in the application and in

-      Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para>

-    

-  </section>

-  

-  <section id="ugr.tug.mvs.sofa_name_mapping">

-    <title>Sofa Name Mapping</title>

-    

-    <para>Sofa Name mapping is the mechanism which enables UIMA component developers to

-      choose locally meaningful Sofa names in their source code and let aggregate,

-      collection processing engine, and application developers connect output

-      Sofas created in one component to input Sofas required in another.</para>

-    

-    <para>At a given aggregation level, the assembler or application developer defines

-      names for all the Sofas, and then specifies how these names map to the contained

-      components, using the Sofa Map.</para>

-    

-    <para>Consider annotator code to create a new CAS view:</para>

-    

-    

-    <programlisting>CAS viewX = cas.createView("X");</programlisting>

-    

-    <para>Or code to get an existing CAS view:</para>

-    

-    <programlisting>CAS viewX = cas.getView("X");</programlisting>

-    

-    <para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>.

-      However, if a name mapping for <quote>X</quote> has been specified by the aggregate or

-      CPE calling this annotator, the actual SofaID in the CAS can be different.</para>

-    

-    <para>All Sofas in a CAS must have unique names. This is accomplished by mapping all

-      declared Sofas as described in the following sections. An attempt to create a Sofa with a

-      SofaID already in use will throw an exception.</para>

-    

-    <para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa

-      mapping maps names up to the <quote>.</quote> and appends the period and the following

-      characters to the mapped name.</para>
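
-    <para>The period rule above can be sketched in plain Java. This is an illustration only, not the
-      UIMA implementation: the class <literal>SofaNameMappingSketch</literal>, the method
-      <literal>resolve</literal>, and the suffix <quote>part1</quote> are hypothetical, and the sketch
-      assumes that the name is split at the first period.</para>

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; this is not the UIMA implementation.
class SofaNameMappingSketch {

  // Resolves a component-local Sofa name against a (hypothetical) Sofa map.
  // Only the part before the first '.' is looked up; the '.' and everything
  // after it are appended unchanged to the mapped name.
  static String resolve(Map<String, String> sofaMap, String localName) {
    int dot = localName.indexOf('.');
    String base = (dot < 0) ? localName : localName.substring(0, dot);
    String suffix = (dot < 0) ? "" : localName.substring(dot);
    // A name without a mapping resolves to itself.
    String mappedBase = sofaMap.getOrDefault(base, base);
    return mappedBase + suffix;
  }

  public static void main(String[] args) {
    Map<String, String> sofaMap = new HashMap<>();
    sofaMap.put("EnglishDocument", "EnglishTranscript");

    System.out.println(resolve(sofaMap, "EnglishDocument"));
    System.out.println(resolve(sofaMap, "EnglishDocument.part1"));
    System.out.println(resolve(sofaMap, "X"));
  }
}
```

-    <para>In this sketch a mapped name keeps its suffix, and a name that has no entry in the map, such
-      as <quote>X</quote>, is returned unchanged.</para>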

-    

-    <para>To get a Java Iterator for all the views in a CAS:</para>

-    

-    <programlisting>Iterator allViews = cas.getViewIterator();</programlisting>

-    

-    <para>To get a Java Iterator for selected views in a CAS, for example, views whose name 

-      is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix 

-      can be any String:</para>

-    

-    <programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting>

-

-      <note><para>Sofa name mapping is applied to namePrefix.</para></note>
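
-    <para>The selection rule for view names can be sketched as follows. The sketch works on plain
-      strings rather than on CAS views, and the names <literal>ViewPrefixSketch</literal>,
-      <literal>matches</literal> and <literal>select</literal> are hypothetical; it only illustrates
-      the <quote>equal to namePrefix, or namePrefix.suffix</quote> test described above.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; it operates on view names, not on real CAS views.
class ViewPrefixSketch {

  // True if the view name equals the prefix, or has the form prefix.suffix.
  static boolean matches(String viewName, String namePrefix) {
    return viewName.equals(namePrefix) || viewName.startsWith(namePrefix + ".");
  }

  // Returns the names that the prefix selects, in their original order.
  static List<String> select(List<String> viewNames, String namePrefix) {
    List<String> selected = new ArrayList<>();
    for (String name : viewNames) {
      if (matches(name, namePrefix)) {
        selected.add(name);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("Doc", "Doc.en", "Document", "Other");
    System.out.println(select(names, "Doc"));
  }
}
```

-    <para>Note that a name such as <quote>Document</quote> is not selected by the prefix
-      <quote>Doc</quote>, because the match requires either equality or a period directly after the
-      prefix.</para>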

-    

-    <para>Sofa name mappings are not currently supported for remote Analysis Engines.

-      See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para>

-               

-    <section id="ugr.tug.mvs.name_mapping_aggregate">

-      <title>Name Mapping in an Aggregate Descriptor</title>

-      

-      <para>For each component of an Aggregate, name mapping specifies the conversion

-        between component Sofa names and names at the aggregate level.</para>

-      

-      <para>Here&apos;s an example. Consider two Multi-View annotators to be assembled

-        into an aggregate which takes an audio segment consisting of spoken English and

-        produces a German text translation.</para>

-      

-      <para>The first annotator takes an audio segment as input Sofa and produces a text

-        transcript as output Sofa. The annotator designer might choose these Sofa names to be

-        <quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para>

-      

-      <para>The second annotator is designed to translate text from English to German. This

-        developer might choose the input and output Sofa names to be

-        <quote>EnglishDocument</quote> and <quote>GermanDocument</quote>,

-        respectively.</para>

-      

-      <para>In order to hook these two annotators together, the following section would be

-        added to the top level of the aggregate descriptor:</para>

-      

-      

-      <programlisting><![CDATA[<sofaMappings>

-  <sofaMapping>

-    <componentKey>SpeechToText</componentKey>

-    <componentSofaName>AudioInput</componentSofaName>

-    <aggregateSofaName>SegementedAudio</aggregateSofaName>

-  </sofaMapping>

-  <sofaMapping>

-    <componentKey>SpeechToText</componentKey>

-    <componentSofaName>TranscribedText</componentSofaName>

-    <aggregateSofaName>EnglishTranscript</aggregateSofaName>

-  </sofaMapping>

-  <sofaMapping>

-    <componentKey>EnglishToGermanTranslator</componentKey>

-    <componentSofaName>EnglishDocument</componentSofaName>

-    <aggregateSofaName>EnglishTranscript</aggregateSofaName>

-  </sofaMapping>

-  <sofaMapping>

-    <componentKey>EnglishToGermanTranslator</componentKey>

-    <componentSofaName>GermanDocument</componentSofaName>

-    <aggregateSofaName>GermanTranslation</aggregateSofaName>

-  </sofaMapping>

-</sofaMappings>]]></programlisting>

-      

-      <para>The Component Descriptor Editor supports Sofa name mapping in aggregates and

-        simplifies the task. See <olink targetdoc="&uima_docs_tools;"/>

-        <olink targetdoc="&uima_docs_tools;"

-          targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para> 

-    </section>

-    

-    <section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE

-      Descriptor</title>

-      

-      <para>The CPE descriptor aggregates together a Collection Reader and CAS Processors

-        (Annotators and CAS Consumers). Sofa mappings can be added to the following elements

-        of CPE descriptors: <literal>&lt;collectionIterator&gt;</literal>,

-        <literal>&lt;casInitializer&gt;</literal> and the

-        <literal>&lt;casProcessor&gt;</literal>. To be consistent with the

-        organization of CPE descriptors, the maps for the CPE descriptor are distributed

-        among the XML markup for each of the parts (collectionIterator, casInitializer,

-        casProcessor). Because of this, the

-        <literal>&lt;componentKey&gt;</literal> element is not needed. Finally, rather than

-        sub-elements for the parts, the XML markup for these uses attributes. See <olink

-          targetdoc="&uima_docs_ref;"/> <olink

-          targetdoc="&uima_docs_ref;"

-          targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para>

-      

-      <para>Here&apos;s an example. Let&apos;s use the aggregate from the previous section

-        in a collection processing engine. Here we will add a Collection Reader that outputs

-        audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to

-        declare an output Sofa nextSegment in the collection reader description.

-        We&apos;ll add a CAS Consumer in the next section.</para>

-      

-      

-      <programlisting>&lt;collectionReader&gt;

-  &lt;collectionIterator&gt;

-    &lt;descriptor&gt;

-    . . .

-    &lt;/descriptor&gt;

-    &lt;configurationParameterSettings&gt;...&lt;/configurationParameterSettings&gt;

-<emphasis role="bold">    &lt;sofaNameMappings&gt;

-      &lt;sofaNameMapping componentSofaName="nextSegment"

-                       cpeSofaName="SegementedAudio"/&gt;

-    &lt;/sofaNameMappings&gt;

-</emphasis>  &lt;/collectionIterator&gt;

-  &lt;casInitializer/&gt;

-&lt;/collectionReader&gt;</programlisting>

-      

-      <para>At this point the CAS Processor section for the aggregate does not need any Sofa

-        mapping because the aggregate input Sofa has the same name,

-        <quote>SegementedAudio</quote>, as is being produced by the Collection

-        Reader.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.specifying_cas_view_for_process">

-      <title>Specifying the CAS View delivered to a Component&apos;s Process Method</title>

-      <titleabbrev>CAS View received by Process</titleabbrev>

-      

-      <para>All components receive a Sofa named <quote>_InitialView</quote>, or

-        a Sofa that is mapped to this name.</para>

-      

-      <para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View

-        component that expects the analysis results associated with the input CAS, and that

-        we want it to use the results from the translated German text Sofa. The following

-        mapping added to the CAS Processor section for the CPE will instruct the CPE to get the

-        CAS view for the German text Sofa and pass it to the CAS Consumer:</para>

-      

-      

-      <programlisting>&lt;casProcessor&gt;

-  . . .

-  <emphasis role="bold">&lt;sofaNameMappings&gt;

-    &lt;sofaNameMapping componentSofaName="_InitialView"

-                           cpeSofaName="GermanTranslation"/&gt;

-  &lt;/sofaNameMappings&gt;

-</emphasis>&lt;/casProcessor&gt;</programlisting>

-      

-      <para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for

-        this kind of mapping is to simply leave out the component sofa name in this

-        case.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.name_mapping_application">

-      <title>Name Mapping in a UIMA Application</title>

-      

-      <para>Applications which instantiate UIMA components directly using the

-        UIMAFramework methods can also create a top level Sofa mapping using the

-        <quote>additional parameters</quote> capability.</para>

-      

-      

-      <programlisting>//create a "root" UIMA context for your whole application

-

-UimaContextAdmin rootContext =

-   UIMAFramework.newUimaContext(UIMAFramework.getLogger(),

-      UIMAFramework.newDefaultResourceManager(),

-      UIMAFramework.newConfigurationManager());

-

-XMLInputSource input = new XMLInputSource("test.xml");

-AnalysisEngineDescription desc =
-    UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);

-

-//setup sofa name mappings using the api

-

-HashMap sofamappings = new HashMap();

-sofamappings.put("localName1", "globalName1");

-sofamappings.put("localName2", "globalName2");

-  

-//create a UIMA Context for the new AE we are about to create

-

-//first argument is unique key among all AEs used in the application

-UimaContextAdmin childContext = rootContext.createChild("myAE", sofamappings);

-

-//instantiate AE, passing the UIMA Context through the additional

-//parameters map

-

-Map additionalParams = new HashMap();

-additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);

-

-AnalysisEngine ae = 

-        UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting>

-      

-      <para>Sofa mappings are applied from the inside out, i.e., local to global. First, any

-        aggregate mappings are applied, then any CPE mappings, and finally, any specified

-        using this <quote>additional parameters</quote> capability.</para>
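
-      <para>The inside-out order can be sketched as a chain of lookups, applied from the innermost
-        map outwards. This is a plain-Java illustration, not the UIMA implementation: the class
-        <literal>LayeredSofaMappingSketch</literal> and the names <quote>CpeTranscript</quote> and
-        <quote>AppTranscript</quote> are hypothetical.</para>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this is not the UIMA implementation.
class LayeredSofaMappingSketch {

  // Applies each mapping layer in turn, starting with the innermost one.
  // A layer without an entry for the current name leaves it unchanged.
  static String resolve(String localName, List<Map<String, String>> layersInsideOut) {
    String name = localName;
    for (Map<String, String> layer : layersInsideOut) {
      name = layer.getOrDefault(name, name);
    }
    return name;
  }

  public static void main(String[] args) {
    Map<String, String> aggregate = new HashMap<>();
    aggregate.put("EnglishDocument", "EnglishTranscript");

    Map<String, String> cpe = new HashMap<>();
    cpe.put("EnglishTranscript", "CpeTranscript"); // hypothetical CPE-level name

    Map<String, String> application = new HashMap<>();
    application.put("CpeTranscript", "AppTranscript"); // hypothetical application-level name

    // Order matters: aggregate first, then CPE, then application.
    System.out.println(
        resolve("EnglishDocument", Arrays.asList(aggregate, cpe, application)));
  }
}
```

-      <para>Each outer layer sees only the name produced by the layer inside it, which is why an outer
-        mapping is written in terms of the aggregate-level name and not the component-level
-        name.</para>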

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.name_mapping_remote_services">

-      <title>Name Mapping for Remote Services</title>

-      

-      <para>Currently, no client-side Sofa mapping information is passed from a UIMA client

-        to a remote service. This can cause complications for UIMA services in a Multi-View

-        application.</para>

-      

-      <para>Remote Multi-View services will work only if the service is Single-View, or if the 

-        Sofa names expected by the service exactly match the Sofa names produced by the client.</para>

-      

-      <para>If your application requires Sofa mappings for a remote Analysis Engine, you

-        can wrap your remotely deployed AE in an aggregate (on the remote side), and specify

-        the necessary Sofa mappings in the descriptor for that aggregate.</para>

-    </section>

-  </section>

-  

-  <section id="ugr.tug.mvs.jcas_extensions_for_multi_views">

-    <title>JCas extensions for Multiple Views</title>

-    

-    <para>The JCas interface to the CAS can be used with any or all views. 

-      You can always get a JCas object from an existing CAS

-      object by using the method getJCas(); this call will create the JCas if it doesn&apos;t

-      already exist. If it does exist, it just returns the existing JCas that corresponds to

-      the CAS.</para>

-    

-    <para>JCas implements the getView(...) method, enabling switching to other named

-      views, just like the corresponding method on the CAS. The JCas version, however,

-      returns JCas objects, instead of CAS objects, corresponding to the view.</para>

-  </section>

-  

-  <section id="ugr.tug.mvs.sample_application">

-    <title>Sample Multi-View Application</title>

-    

-    <para>The UIMA SDK contains a simple Sofa example application which demonstrates many

-      Sofa-specific concepts and methods. The source code for the application driver is in

-      <literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal>

-      and the Multi-View annotator is given in

-      <literal>SofaExampleAnnotator.java</literal> in the same directory.</para>

-    

-    <para>This sample application demonstrates a language translator annotator which

-      expects an input text Sofa with an English document and creates an output text Sofa

-      containing a German translation. Some of the key Sofa concepts illustrated here

-      include:</para>

-    

-    <itemizedlist spacing="compact"><listitem><para>Sofa creation.</para>

-      </listitem>

-      

-      <listitem><para>Access of multiple CAS views.</para></listitem>

-      

-      <listitem><para>Unique feature structure index space for each view.</para>

-        </listitem>

-      

-      <listitem><para>Feature structures containing cross references between

-        annotations in different CAS views.</para></listitem>

-      

-      <listitem><para>The strong affinity of annotations with a specific Sofa. </para>

-        </listitem></itemizedlist>

-    

-    <section id="ugr.tug.mvs.sample_application.descriptor">

-      <title>Annotator Descriptor</title>

-      

-      <para>The annotator descriptor in

-        <literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal>

-        declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa

-        named <quote>GermanDocument</quote>. A custom type

-        <quote>CrossAnnotation</quote> is also defined:</para>

-      

-      

-      <programlisting><![CDATA[<typeDescription>

-  <name>sofa.test.CrossAnnotation</name>

-  <description/>

-  <supertypeName>uima.tcas.Annotation</supertypeName>

-  <features>

-    <featureDescription>

-      <name>otherAnnotation</name>

-      <description/>

-      <rangeTypeName>uima.tcas.Annotation</rangeTypeName>

-    </featureDescription>

-  </features>

-</typeDescription>]]></programlisting>

-      

-      <para>The <literal>CrossAnnotation</literal> type is derived from

-        <literal>uima.tcas.Annotation</literal> and includes one new feature: a

-        reference to another annotation.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.sample_application.setup">

-      <title>Application Setup</title>

-      

-      <para>The application driver instantiates an analysis engine,

-        <literal>seAnnotator</literal>, from the annotator descriptor, obtains a new

-        CAS using that engine&apos;s CAS definition, and creates the expected input

-        Sofa using:</para>

-      

-      

-      <programlisting>CAS cas = seAnnotator.newCAS();

-CAS aView = cas.createView("EnglishDocument");</programlisting>

-      

-      <para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa

-        mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>.

-        Local Sofa data is set using:</para>

-      

-      

-      <programlisting>aView.setDocumentText("this beer is good");</programlisting>

-      

-      <para>At this point the CAS contains all necessary inputs for the translation

-        annotator and its process method is called.</para>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.sample_application.annotator_processing">

-      <title>Annotator Processing</title>

-      

-      <para>Annotator processing consists of parsing the English document into individual

-        words, doing word-by-word translation and concatenating the translations into a

-        German translation. Analysis metadata on the English Sofa will be an annotation for

-        each English word. Analysis metadata on the German Sofa will be a

-        <literal>CrossAnnotation</literal> for each German word, where the

-        <literal>otherAnnotation</literal> feature will be a reference to the associated

-        English annotation.</para>

-      

-      <para>Code of interest includes two CAS views:</para>

-      

-      

-      <programlisting>// get View of the English text Sofa

-englishView = aCas.getView("EnglishDocument");

-

-// Create the output German text Sofa

-germanView = aCas.createView("GermanDocument");</programlisting>

-      

-      <para>the indexing of annotations with the appropriate view:</para>

-      

-      

-      <programlisting>englishView.addFsToIndexes(engAnnot);

-. . .

-germanView.addFsToIndexes(germAnnot);</programlisting>

-      

-      <para>and the combining of metadata belonging to different Sofas in the same feature

-        structure:</para>

-      

-      

-      <programlisting>// add link to English text

-germAnnot.setFeatureValue(other, engAnnot);</programlisting>

-      

-    </section>

-    

-    <section id="ugr.tug.mvs.sample_application.accessing_results">

-      <title>Accessing the results of analysis</title>

-      

-      <para>The application needs to get the results of analysis, which may be in different

-        views. Analysis results for each Sofa are dumped independently by iterating over all

-        annotations for each associated CAS view. For the English Sofa:</para>

-      

-      

-      <programlisting>for (Annotation annot : aView.getAnnotationIndex()) {

-  System.out.println(" " + annot.getType().getName()

-                         + ": " + annot.getCoveredText());

-}</programlisting>      

-      

-      <para>Iterating over all German annotations looks the same, except for the

-        following:</para>

-      

-      

-      <programlisting>if (annot.getType() == cross) {

-  AnnotationFS crossAnnot =

-          (AnnotationFS) annot.getFeatureValue(other);

-  System.out.println("   other annotation feature: "

-          + crossAnnot.getCoveredText());

-}</programlisting>

-      

-      <para>Of particular interest here is the built-in Annotation type method

-        <literal>getCoveredText()</literal>. This method uses the

-        <quote>begin</quote> and <quote>end</quote> features of the annotation to create

-        a substring from the CAS document. The SofaRef feature of the annotation is used to

-        identify the correct Sofa&apos;s data from which to create the substring.</para>

-      

-      <para>The example program output is:</para>

-      

-      

-      <programlisting>---Printing all annotations for English Sofa---

-uima.tcas.DocumentAnnotation: this beer is good

-uima.tcas.Annotation: this

-uima.tcas.Annotation: beer

-uima.tcas.Annotation: is

-uima.tcas.Annotation: good

-      

----Printing all annotations for German Sofa---

-uima.tcas.DocumentAnnotation: das bier ist gut

-sofa.test.CrossAnnotation: das

- other annotation feature: this

-sofa.test.CrossAnnotation: bier

- other annotation feature: beer

-sofa.test.CrossAnnotation: ist

- other annotation feature: is

-sofa.test.CrossAnnotation: gut

- other annotation feature: good</programlisting>

-      

-    </section>

-  </section>

-  

-  <section id="ugr.tug.mvs.views_api_summary">

-    <title>Views API Summary</title>

-    

-    <para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use Sofa mapping in

-      the CPE and/or aggregate descriptors.</para>

-    

-    <para>For <emphasis role="bold-italic">Multi-View</emphasis> components or

-      applications, the following methods are used to create or get a reference to a CAS view

-      for a particular Sofa:</para>

-    

-    <para>Creating a new View:</para>

-    

-    

-    <programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);

-CAS  newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting>

-    

-    <para>Getting a View from a CAS or JCas:</para>

-    

-    

-    <programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);

-CAS  myView = aCAS .getView(String localNameOfTheViewBeforeMapping);

-Iterator allViews = aCasOrJCas.getViewIterator();

-Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting>

-    

-    <para>The following methods are useful for all annotators and applications:</para>

-    

-    <para>Setting Sofa data for a CAS or JCas:</para>

-    

-    

-    <programlisting>aCasOrJCas.setDocumentText(String docText);

-aCasOrJCas.setSofaDataString(String docText, String mimeType);

-aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);

-aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting>

-    

-    <para>Getting Sofa data for a particular CAS or JCas:</para>

-    

-    

-    <programlisting>String doc = aCasOrJCas.getDocumentText();

-String doc = aCasOrJCas.getSofaDataString();

-FeatureStructure array = aCasOrJCas.getSofaDataArray();

-String uri = aCasOrJCas.getSofaDataURI();

-InputStream is = aCasOrJCas.getSofaDataStream();</programlisting>

-    

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
deleted file mode 100644
index a7590bd..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
+++ /dev/null
@@ -1,140 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tug.type_mapping">
-  <title>Managing different Type Systems</title>
-  <titleabbrev>Managing different TypeSystems</titleabbrev>
-  
-  <section id="ugr.tug.type_mapping.type_merging">
-    <title>Annotators, Type Merging, and Remotes</title>
-    
-	  <para>UIMA supports combining Annotators that have different type systems.
-	  This is normally done by "merging" the two type systems when the Annotators
-	  are first loaded and instantiated. The merge process produces a logical
-	  Union of the two; types having the same name have their feature sets combined.
-	  The combining rules say that the range of same-named feature slots must be the same.
-	  This combined type system is then used for the CAS that will be passed to
-	  all of the annotators.   Details of type merging are described in
-    <olink targetdoc="&uima_docs_ref;"/>
-	  <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas.typemerging"/>.
-	  </para>
-	  
-	  <para>This approach (of merging the type systems together) works well for
-	  annotators that are run together in one UIMA pipeline instantiation in one
-	  machine.  Extensions are needed when UIMA is scaled out where the pipeline
-	  includes remote annotators, acting as servers, serving
-	  potentially multiple clients, each of which might have a different type system.
-	  Clients, when initializing, query all their remote server parts to get their
-	  type system definitions, and merge them together with their own
-	  to make the type system for the CAS that will be sent among all of those
-	  annotators. The Client's TypeSystem is the union of the type systems of
-	  all of its annotators, even when some of them are remote.
-	  </para>
-  </section>
-  
-  <section id="ugr.tug.type_mapping.remote_support">
-    <title>Supporting Remote Annotators</title>
-  
-	  <para>Servers, in providing service to multiple clients, may receive CASes from
-	  different Clients having different type systems.  UIMA has implemented several
-	  different approaches to support this.</para> 
-	  
-	  <note><para>
-    Base UIMA includes support for the VINCI
-    protocol (but this is older, and does not support newer features of the CAS like
-    CAS Multipliers and multiple Views).   
-    </para></note>
-	  
-	  
-	  <para>For Vinci and UIMA-AS using XMI, only the "reachable" Feature Structures are sent.  A reachable
-    Feature Structure is one that is indexed, or is reachable via a
-    reference from another reachable Feature Structure.  The receiving service's
-    type system is guaranteed to be a subset of the sender's.  Special code in the
-    deserializer sets aside any types and features not present in the server's type
-    system and merges these values back in when returning the CAS to the client.
-    </para>
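-	  <para>The definition of "reachable" above amounts to a graph traversal that starts from the
-    indexed Feature Structures. The following plain-Java sketch illustrates the idea only; it is
-    not the UIMA serializer, Feature Structures are stood in for by string ids, and the names
-    <literal>ReachabilitySketch</literal> and <literal>reachable</literal> are hypothetical.
-    </para>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; Feature Structures are represented by string ids.
class ReachabilitySketch {

  // indexed: ids of the Feature Structures that are in an index.
  // refs: for each id, the ids it references through its features.
  static Set<String> reachable(List<String> indexed, Map<String, List<String>> refs) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> work = new ArrayDeque<>(indexed);
    while (!work.isEmpty()) {
      String fs = work.pop();
      if (!seen.add(fs)) {
        continue; // already visited; this also guards against reference cycles
      }
      for (String target : refs.getOrDefault(fs, Collections.emptyList())) {
        work.push(target);
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    Map<String, List<String>> refs = new HashMap<>();
    refs.put("a", Arrays.asList("b"));
    refs.put("b", Arrays.asList("c", "a"));
    refs.put("d", Arrays.asList("a")); // "d" is neither indexed nor referenced

    System.out.println(reachable(Arrays.asList("a"), refs));
  }
}
```

-	  <para>In this sketch the structure "d" is not sent: it refers to a reachable structure,
-    but nothing reachable refers to it and it is not indexed.
-    </para>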
-	  
-	  <para>
-	  In addition, UIMA-AS supports binary CAS serialization protocols.
-	  The binary support is typically compressed.  This compression can greatly
-	  reduce the size of data, compared with plain binary serialization.
-	  The compressed form also supports having a target type system which is 
-	  different from the source's, as long as it is compatible. 
-	  </para>
-	  
-	  <para>Delta CAS support is available for XMI, binary and compressed binary 
-	  protocols, used by UIMA-AS.  The Delta CAS refers to the CAS returned from the service back to the client -
-	  only the new Feature Structures added by the service, plus any modifications to existing
-	  feature structures and/or indexes, are returned.  This can greatly reduce the size of the
-	  returned data.  Delta CAS support is automatically used with more recent versions of UIMA-AS.
-	  </para>
-  </section>
-  
-  <section id="ugr.tug.type_mapping.allowed_differences">
-    <title>Type filtering support in Binary Compressed Serialization/Deserialization</title>
-    
-    <para>The built-in support for Binary Compressed Serialization/Deserialization
-    supports filtering between non-identical type systems.  The filtering is designed
-    so that things (types and/or features) that are defined in one type system
-    but not in another are not sent (when serializing) nor received 
-    (when deserializing).  When deserializing, non-received features receive 0 
-    as their value.  For built-in types, like integer, float, etc., this is the 
-    number 0; for other kinds of things, this is usually a "null" value. </para>
-    
-    <para>Some kinds of type mappings cannot be supported, and will signal errors.
-    The two types being mapped between must be "mergeable" according to the normal
-    type merger rules (see above); otherwise, errors are signaled.</para>
-  </section>
-  
-  <section id="ugr.tug.type_mapping.compressed">
-    <title>Remote Services support with Compressed Binary Serialization</title>
-    
-    <para>Uncompressed Binary Serialization protocols for communicating to 
-    remote UIMA-AS services require that the Client and Server's type systems
-    be identical.  Compressed Binary Serialization protocols support
-    Server type systems which are a subset of the Client's.  Types and/or features 
-    not in the Server's type system are not sent to the Server.  
-    </para>    
-  </section>
-  
-  <section id="ugr.tug.type_filtering.compressed_file">
-    <title>Compressed Binary serialization to/from files</title>
-    
-    <para>Compressed binary serialization to a file can specify
-    a target type system which is a subset of the original type system.  The
-    serialization will then exclude types and features not in the target.
-    You can use this to filter the CAS and serialize out just the parts
-    you want.
-    </para>
-    
-    <para>Compressed binary deserialization from a file must specify as the target type system
-    the one that went with the target when it was serialized.  The source
-    type system can be different; if it is missing types/features, these will be 
-    filtered during deserialization.  If it has additional features, these will be 
-    set to 0 (the default value) in the CAS heap.  For numeric features, this means
-    the value will be 0 (including floating point 0); for feature structure references
-    and strings, the value will be null.
-    </para>
-  </section>
-</chapter>
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml
deleted file mode 100644
index 6b5da05..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml
+++ /dev/null
@@ -1,186 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[

-<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  

-%uimaents;

-]>

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<chapter id="ugr.tug.xmi_emf">

-  <title>XMI and EMF Interoperability</title>

-  <titleabbrev>XMI &amp; EMF</titleabbrev>

-  

-  <section id="ugr.tug.xmi_emf.overview">

-    <title>Overview</title>

-    

-    <para>In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.

-      There are established standards in this area

-      &ndash; specifically, <trademark class="registered">UML</trademark> is an <trademark class="trade">

-      OMG</trademark> standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML

-      representation of object graphs.</para>

-    

-    <para>Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based

-      application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called

-      Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java

-      classes from your model, and supports persistence of those classes in the XMI format.</para>

-    

-    <para>The UIMA SDK provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA

-      Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a

-      number of advantages, including:</para>

-    

-    <blockquote>

-      <para>You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically

-        convert it to a UIMA Type System.</para>

-      

-      <para>You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it

-        produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.</para>

-    </blockquote>

-    

-    <para>More generally, we are adopting the well-documented, open standard XMI as the standard way to represent

-      UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard

-      enables other applications to more easily produce or consume these UIMA analysis results.</para>

-    

-    <para>For more information on XMI, see Grose et al. <emphasis>Mastering XMI. Java Programming with XMI, XML, and

-      UML.</emphasis> John Wiley &amp; Sons, Inc. 2002.</para>

-    

-    <para>For more information on EMF, see Budinsky et al. <emphasis>Eclipse Modeling Framework 2.0.</emphasis>

-      Addison-Wesley. 2006.</para>

-    

-    <para>For details of how the UIMA CAS is represented in XMI format, see <olink targetdoc="&uima_docs_ref;"/>

-          <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xmi"/> .</para>

-    

-  </section>

-  

-  <section id="ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">

-    <title>Converting an Ecore Model to or from a UIMA Type System</title>

-    

-    <para>The UIMA SDK provides the following two classes:</para>

-    

-    <para><emphasis role="bold"><literal>Ecore2UimaTypeSystem:</literal>

-      </emphasis> converts from an .ecore model developed using EMF to a UIMA-compliant

-      TypeSystem descriptor. This is a Java class that can be run as a standalone program or

-      invoked from another Java application. To run as a standalone program,

-      execute:</para>

-    

-    <para><command>java org.apache.uima.ecore.Ecore2UimaTypeSystem &lt;ecore

-      file&gt; &lt;output file&gt;</command></para>

-    

-    <para>The input .ecore file will be converted to a UIMA TypeSystem descriptor and written

-      to the specified output file. You can then use the resulting TypeSystem descriptor in

-      your UIMA application.</para>

-    

-    <para><emphasis role="bold"><literal>UimaTypeSystem2Ecore:</literal>

-      </emphasis> converts from a UIMA TypeSystem descriptor to an .ecore model. This is a

-      Java class that can be run as a standalone program or invoked from another Java

-      application. To run as a standalone program, execute:</para>

-    

-    <para><command>java org.apache.uima.ecore.UimaTypeSystem2Ecore

-      &lt;TypeSystem descriptor&gt; &lt;output file&gt;</command></para>

-    

-    <para>The input UIMA TypeSystem descriptor will be converted to an Ecore model file and

-      written to the specified output file. You can then use the resulting Ecore model in EMF

-      applications. The converted type system will include any

-      <literal>&lt;import...&gt;</literal>ed TypeSystems; the fact that they were

-      imported is currently not preserved.</para>

-    

-    <para>To run either of these converters, your classpath will need to include the UIMA jar

-      files as well as the following jar files from the EMF distribution: common.jar,

-      ecore.jar, and ecore.xmi.jar.</para>

-    

-    <para>Also, note that the uima-core.jar file contains the Ecore model file uima.ecore,

-      which defines the built-in UIMA types. You may need to use this file from your EMF

-      applications.</para>

-    

-  </section>

-  

-  <section id="ugr.tug.xmi_emf.using_xmi_cas_serialization">

-    <title>Using XMI CAS Serialization</title>

-    

-    <para>The UIMA SDK provides XMI support through the following two classes:</para>

-    

-    <para><emphasis role="bold"><literal>XmiCasSerializer:</literal></emphasis>

-      can be run from within a UIMA application to write out a CAS to the standard XMI format. The

-      XMI that is generated will be compliant with the Ecore model generated by

-      <literal>UimaTypeSystem2Ecore</literal>. An EMF application could use this Ecore

-      model to ingest and process the XMI produced by the XmiCasSerializer.</para>

-    

-    <para><emphasis role="bold"><literal>XmiCasDeserializer:</literal></emphasis>

-      can be run from within a UIMA application to read in an XMI document and populate a CAS. The

-      XMI must conform to the Ecore model generated by

-      <literal>UimaTypeSystem2Ecore</literal>.</para>

-    

-    <para>Also, the uimaj-examples Eclipse project contains some example code that shows

-      how to use the serializer and deserializer:

-

-    <blockquote>

-    <para><literal>org.apache.uima.examples.xmi.XmiWriterCasConsumer:</literal>

-      This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous

-      to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it

-      uses the XMI serialization format.</para>

-    

-    <para><literal>org.apache.uima.examples.xmi.XmiCollectionReader:</literal>

-      This is a Collection Reader that reads a directory of XMI files and deserializes each of

-      them into a CAS. For example, this would allow you to build a Collection Processing

-      Engine that reads XMI files, which could contain some previous analysis results, and

-      then do further analysis.</para>

-    </blockquote></para>

-    

-    <para>Finally, in under the folder <literal>uimaj-examples/ecore_src</literal> is

-      the class

-      <literal>org.apache.uima.examples.xmi.XmiEcoreCasConsumer</literal>, which

-      writes each CAS to XMI format and also saves the Type System as an Ecore file. Since this

-      uses the <literal>UimaTypeSystem2Ecore</literal> converter, to compile it you must

-      add to your classpath the EMF jars common.jar, ecore.jar, and ecore.xmi.jar &ndash;

-      see ecore_src/readme.txt for instructions.</para>

-

-    <section id="ugr.tug.xmi_emf.xml_character_issues">

-    <title>Character Encoding Issues with XML Serialization</title>

-    

-    <para>Note that not all valid Unicode characters are valid XML characters, at least not in XML

-      1.0.  Moreover, it is possible to create characters in Java that are not even valid Unicode

-      characters, let alone XML characters.  As UIMA character data is translated directly into XML

-      character data on serialization, this may lead to issues.  UIMA will therefore check that the

-      character data that is being serialized is valid for the version of XML being used.  If 

-      non-serializable character data is encountered during serialization, an exception is thrown

-      and serialization fails (to avoid creating invalid XML data).  UIMA does not simply replace

-      the offending characters with some valid replacement character; the assumption being that

-      most applications would not like to have their data modified automatically.

-    </para>

-    

-    <para>If you know you are going to use XML serialization, and you would like to avoid such issues

-      on serialization, you should check any character data you create in UIMA ahead of time.  Issues

-      most often arise with the document text, as documents may originate at various sources, and

-      may be of varying quality.  So it's a particularly good idea to check the document text for

-      characters that will cause issues for serialization.

-    </para>

-    

-    <para>UIMA provides a handful of functions to assist you in checking Java character data.  Those

-      methods are located in 

-      <literal>org.apache.uima.internal.util.XMLUtils.checkForNonXmlCharacters()</literal>, with

-      several overloads.  Please check the javadocs for further information.

-    </para>

-    

-    <para>Please note that these issues are not specific to XMI serialization, they apply to the

-      older XCAS format in the same way.

-    </para>

-  

-    </section>

-  </section>

-  

-</chapter>
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml b/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
deleted file mode 100644
index 402f727..0000000
--- a/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
+++ /dev/null
@@ -1,37 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>

-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"

-"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

-<!--

-Licensed to the Apache Software Foundation (ASF) under one

-or more contributor license agreements.  See the NOTICE file

-distributed with this work for additional information

-regarding copyright ownership.  The ASF licenses this file

-to you under the Apache License, Version 2.0 (the

-"License"); you may not use this file except in compliance

-with the License.  You may obtain a copy of the License at

-

-   http://www.apache.org/licenses/LICENSE-2.0

-

-Unless required by applicable law or agreed to in writing,

-software distributed under the License is distributed on an

-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

-KIND, either express or implied.  See the License for the

-specific language governing permissions and limitations

-under the License.

--->

-<book lang="en">

-  <title>UIMA Tutorial and Developers&apos; Guides</title>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>

-

-  <toc/>

-  

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="annotator_analysis_engine_guide.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cpe.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.application.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.fc.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.aas.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.multi_views.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cas_multiplier.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.xmi_emf.xml"/>

-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.type_mapping.xml"/>

-</book>

diff --git a/uimaj-documentation/pom.xml b/uimaj-documentation/pom.xml
new file mode 100644
index 0000000..05bee6d
--- /dev/null
+++ b/uimaj-documentation/pom.xml
@@ -0,0 +1,247 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache.uima</groupId>
+    <artifactId>uimaj-parent</artifactId>
+    <version>3.5.0-SNAPSHOT</version>
+    <relativePath>../uimaj-parent/pom.xml</relativePath>
+  </parent>
+
+  <artifactId>uimaj-documentation</artifactId>
+  <packaging>pom</packaging>
+  <name>Apache UIMA Java SDK Documentation</name>
+  <url>https://uima.apache.org</url>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.asciidoctor</groupId>
+        <artifactId>asciidoctor-maven-plugin</artifactId>
+        <version>${asciidoctor.plugin.version}</version>
+        <configuration>
+          <outputDirectory>${project.build.directory}/site/d</outputDirectory>
+          <attributes>
+            <experimental>true</experimental>
+            <doctype>book</doctype>
+            <toclevels>8</toclevels>
+            <sectanchors>true</sectanchors>
+            <sectnums>true</sectnums>
+            <docinfo1>true</docinfo1>
+            <project-version>${project.version}</project-version>
+            <revnumber>${project.version}</revnumber>
+            <product-website-url>https://uima.apache.org</product-website-url>
+            <icons>font</icons>
+            <relativeBaseDir>true</relativeBaseDir>
+            <xrefstyle>short</xrefstyle>
+          </attributes>
+          <requires>
+            <require>asciidoctor-pdf</require>
+          </requires>
+        </configuration>
+        <executions>
+          <execution>
+            <id>tug-html</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>tug.adoc</sourceDocumentName>
+              <backend>html5</backend>
+              <attributes>
+                <toc>left</toc>
+                <imagesDir>./tug</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>tug-pdf</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>tug.adoc</sourceDocumentName>
+              <backend>pdf</backend>
+              <attributes>
+                <toc>preamble</toc>
+                <imagesDir>./tug</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>oas-html</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>oas.adoc</sourceDocumentName>
+              <backend>html5</backend>
+              <attributes>
+                <toc>left</toc>
+                <imagesDir>./oas</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>oas-pdf</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>oas.adoc</sourceDocumentName>
+              <backend>pdf</backend>
+              <attributes>
+                <toc>preamble</toc>
+                <imagesDir>./oas</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>ref-html</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>ref.adoc</sourceDocumentName>
+              <backend>html5</backend>
+              <attributes>
+                <toc>left</toc>
+                <imagesDir>./ref</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>ref-pdf</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>ref.adoc</sourceDocumentName>
+              <backend>pdf</backend>
+              <attributes>
+                <toc>preamble</toc>
+                <imagesDir>./ref</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>tools-html</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>tools.adoc</sourceDocumentName>
+              <backend>html5</backend>
+              <attributes>
+                <toc>left</toc>
+                <imagesDir>./tools</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+          <execution>
+            <id>tools-pdf</id>
+            <phase>generate-resources</phase>
+            <goals>
+              <goal>process-asciidoc</goal>
+            </goals>
+            <configuration>
+              <sourceDocumentName>tools.adoc</sourceDocumentName>
+              <backend>pdf</backend>
+              <attributes>
+                <toc>preamble</toc>
+                <imagesDir>./tools</imagesDir>
+              </attributes>
+            </configuration>
+          </execution>
+        </executions>
+        <dependencies>
+          <dependency>
+            <groupId>org.jruby</groupId>
+            <artifactId>jruby</artifactId>
+            <version>9.4.3.0</version>
+          </dependency>
+          <dependency>
+            <groupId>org.asciidoctor</groupId>
+            <artifactId>asciidoctorj</artifactId>
+            <version>${asciidoctor.version}</version>
+          </dependency>
+          <dependency>
+            <groupId>org.asciidoctor</groupId>
+            <artifactId>asciidoctorj-pdf</artifactId>
+            <version>${asciidoctor.pdf.version}</version>
+          </dependency>
+        </dependencies>
+      </plugin>
+    </plugins>
+  </build>
+  <profiles>
+    <profile>
+      <id>m2e</id>
+      <activation>
+        <property>
+          <name>m2e.version</name>
+        </property>
+      </activation>
+      <build>
+        <pluginManagement>
+          <plugins>
+            <!--
+              - This plugin's configuration is used to store Eclipse m2e settings only.
+              - It has no influence on the Maven build itself.
+            -->
+            <plugin>
+              <groupId>org.eclipse.m2e</groupId>
+              <artifactId>lifecycle-mapping</artifactId>
+              <version>1.0.0</version>
+              <configuration>
+                <lifecycleMappingMetadata>
+                  <pluginExecutions>
+                    <pluginExecution>
+                      <pluginExecutionFilter>
+                        <groupId>org.asciidoctor</groupId>
+                        <artifactId>asciidoctor-maven-plugin</artifactId>
+                        <versionRange>[1.0,)</versionRange>
+                        <goals>
+                          <goal>process-asciidoc</goal>
+                        </goals>
+                      </pluginExecutionFilter>
+                      <action>
+                        <execute />
+                      </action>
+                    </pluginExecution>
+                  </pluginExecutions>
+                </lifecycleMappingMetadata>
+              </configuration>
+            </plugin>
+          </plugins>
+        </pluginManagement>
+      </build>
+    </profile>
+  </profiles>
+</project>
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas.adoc b/uimaj-documentation/src/docs/asciidoc/oas.adoc
new file mode 100644
index 0000000..aa2c145
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas.adoc
@@ -0,0 +1,34 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+= Apache UIMA™ - UIMA Overview and SDK Setup
+:Author: Apache UIMA™ Development Community
+:toc-title: UIMA Overview and SDK Setup
+
+include::oas/common_book_info.adoc[leveloffset=+1]
+
+include::oas/project_overview.adoc[leveloffset=+1]
+
+include::oas/conceptual_overview.adoc[leveloffset=+1]
+
+include::oas/eclipse_setup.adoc[leveloffset=+1]
+
+include::oas/faqs.adoc[leveloffset=+1]
+
+include::oas/known_issues.adoc[leveloffset=+1]
+
+include::oas/glossary.adoc[leveloffset=+1]
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/common_book_info.adoc b/uimaj-documentation/src/docs/asciidoc/oas/common_book_info.adoc
new file mode 100644
index 0000000..537f3e6
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/common_book_info.adoc
@@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Copyright © 2006, 2021 The Apache Software Foundation
+
+Copyright © 2004, 2006 International Business Machines Corporation
+
+[discrete]
+=== License and Disclaimer
+
+The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); 
+you may not use this documentation except in compliance with the License.  You may obtain a copy of
+the License at
+
+[.text-center]
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, this documentation and its contents are
+distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+either express or implied.  See the License for the specific language governing permissions and
+limitations under the License.
+
+[discrete]
+=== Trademarks
+
+All terms mentioned in the text that are known to be trademarks or service marks have been 
+appropriately capitalized.  Use of such terms in this book should not be regarded as affecting the
+validity of the trademark or service mark.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/conceptual_overview.adoc b/uimaj-documentation/src/docs/asciidoc/oas/conceptual_overview.adoc
new file mode 100644
index 0000000..d0ac355
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/conceptual_overview.adoc
@@ -0,0 +1,579 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ovv.conceptual]]
+= UIMA Conceptual Overview
+
+UIMA is an open, industrial-strength, scaleable and extensible platform for creating, integrating and deploying unstructured information management solutions from powerful text or multi-modal analysis and search components. 
+
+The Apache UIMA project is an implementation of the Java UIMA framework available under the Apache License, providing a common foundation for industry and academia to collaborate and accelerate the world-wide development of technologies critical for discovering vital knowledge present in the fastest growing sources of information today.
+
+This chapter presents an introduction to many essential UIMA concepts.
+It is meant to provide a broad overview to give the reader a quick sense of UIMA's basic architectural philosophy and the UIMA SDK's capabilities. 
+
+This chapter provides a general orientation to UIMA and makes liberal reference to the other chapters in the UIMA SDK documentation set, where the reader may find detailed treatments of key concepts and development practices.
+It may be useful to refer to <<ugr.glossary>>, to become familiar with the terminology in this overview.
+
+[[ugr.ovv.conceptual.uima_introduction]]
+== UIMA Introduction
+
+.UIMA helps you build the bridge between the unstructured and structured worlds
+image::images/overview-and-setup/conceptual_overview_files/image002.png[Picture of a bridge between unstructured information artifacts and structured metadata about those artifacts]
+
+Unstructured information represents the largest, most current and fastest growing source of information available to businesses and governments.
+The web is just the tip of the iceberg.
+Consider the mounds of information hosted in the enterprise and around the world and across different media including text, voice and video.
+The high-value content in these vast collections of unstructured information is, unfortunately, buried in lots of noise.
+Searching for what you need or doing sophisticated data mining over unstructured information sources presents new challenges. 
+
+An unstructured information management (UIM) application may be generally characterized as a software system that analyzes large volumes of unstructured information (text, audio, video, images, etc.) to discover, organize and deliver relevant knowledge to the client or application end-user.
+An example is an application that processes millions of medical abstracts to discover critical drug interactions.
+Another example is an application that processes tens of millions of documents to discover key evidence indicating probable competitive threats. 
+
+First and foremost, the unstructured data must be analyzed to interpret, detect and locate concepts of interest, for example, named entities like persons, organizations, locations, facilities, products etc., that are not explicitly tagged or annotated in the original artifact.
+More challenging analytics may detect things like opinions, complaints, threats or facts.
+And then there are relations, for example, located in, finances, supports, purchases, repairs etc.
+The list of concepts important for applications to discover in unstructured content is large, varied and often domain-specific.
+Many different component analytics may solve different parts of the overall analysis task.
+These component analytics must interoperate and must be easily combined to facilitate the development of UIM applications.
+
+The results of analysis are used to populate structured forms so that conventional data processing and search technologies, like search engines, database engines or OLAP (On-Line Analytical Processing, or Data Mining) engines, can efficiently deliver the newly discovered content in response to client requests or queries.
+
+In analyzing unstructured content, UIM applications make use of a variety of analysis technologies including:
+
+* Statistical and rule-based Natural Language Processing (NLP)
+* Information Retrieval (IR)
+* Machine learning
+* Ontologies
+* Automated reasoning and
+* Knowledge Sources (e.g., CYC, WordNet, FrameNet, etc.)
+
+Specific analysis capabilities using these technologies are developed independently using different techniques, interfaces and platforms.
+
+The bridge from the unstructured world to the structured world is built through the composition and deployment of these analysis capabilities.
+This integration is often a costly challenge. 
+
+The Unstructured Information Management Architecture (UIMA) is an architecture and software framework that helps you build that bridge.
+It supports creating, discovering, composing and deploying a broad range of analysis capabilities and linking them to structured information services.
+
+UIMA allows development teams to match the right skills with the right parts of a solution and helps enable rapid integration across technologies and platforms using a variety of different deployment options.
+These range from tightly-coupled deployments for high-performance, single-machine, embedded solutions to parallel and fully distributed deployments for highly flexible and scaleable solutions.
+
+[[ugr.ovv.conceptual.architecture_framework_sdk]]
+== The Architecture, the Framework and the SDK
+
+UIMA is a software architecture which specifies component interfaces, data representations, design patterns and development roles for creating, describing, discovering, composing and deploying multi-modal analysis capabilities.
+
+The *UIMA framework* provides a run-time environment in which developers can plug in their UIMA component implementations and with which they can build and deploy UIM applications.
+The framework is not specific to any IDE or platform.
+Apache hosts a Java and (soon) a C++ implementation of the UIMA Framework.
+
+The *UIMA Software Development Kit (SDK)* includes the UIMA framework, plus tools and utilities for using UIMA.
+Some of the tooling supports an Eclipse-based (http://www.eclipse.org/) development environment.
+
+[[ugr.ovv.conceptual.analysis_basics]]
+== Analysis Basics
+
+[NOTE]
+====
+Analysis Engine, Document, Annotator, Annotator Developer, Type, Type System, Feature, Annotation, CAS, Sofa, JCas, UIMA Context.
+====
+
+[[ugr.ovv.conceptual.aes_annotators_and_analysis_results]]
+=== Analysis Engines, Annotators & Results
+
+[[ugr.ovv.conceptual.metadata_in_cas]]
+.Objects represented in the Common Analysis Structure (CAS)
+image::images/overview-and-setup/conceptual_overview_files/image004.png["Picture of some text, with a hierarchy of discovered metadata about words in the text, including some image of a person as metadata about that name."]
+
+UIMA is an architecture in which basic building blocks called Analysis Engines (AEs) are composed to analyze a document and infer and record descriptive attributes about the document as a whole, and/or about regions therein.
+This descriptive information, produced by AEs, is referred to generally as **analysis results**.
+Analysis results typically represent meta-data about the document content.
+One way to think about AEs is as software agents that automatically discover and record meta-data about original content.
+
+UIMA supports the analysis of different modalities including text, audio and video.
+The majority of examples we provide are for text.
+We use the term *document*, therefore, to generally refer to any unit of content that an AE may process, whether it is a text document or a segment of audio, for example.
+See xref:tug.adoc#ugr.tug.mvs[Multiple CAS Views of an Artifact] for more information on multimodal processing in UIMA.
+
+Analysis results include different statements about the content of a document.
+For example, the following is an assertion about the topic of a document:
+
+[source]
+----
+(1) The Topic of document D102 is "CEOs and Golf".
+----
+
+Analysis results may include statements describing regions more granular than the entire document.
+We use the term *span* to refer to a sequence of characters in a text document.
+Consider that a document with the identifier D102 contains a span, "`Fred Center`", starting at character position 101.
+An AE that can detect persons in text may represent the following statement as an analysis result:
+
+[source]
+----
+(2) The span from position 101 to 112 in document D102 denotes a Person
+----
+
+In both statements 1 and 2 above there is a special pre-defined term or what we call in UIMA a **Type**.
+They are _Topic_ and _Person_ respectively.
+UIMA types characterize the kinds of results that an AE may create -- more on types later.
+
+Other analysis results may relate two statements.
+For example, an AE might record in its results that two spans are both referring to the same person:
+
+[source]
+----
+(3) The Person denoted by span 101 to 112 and 
+  the Person denoted by span 141 to 143 in document D102 
+  refer to the same Entity.
+----
+
+The above statements are some examples of the kinds of results that AEs may record to describe the content of the documents they analyze.
+These are not meant to indicate the form or syntax with which these results are captured in UIMA -- more on that later in this overview.
+
+The UIMA framework treats Analysis Engines as pluggable, composable, discoverable, managed objects.
+At the heart of AEs are the analysis algorithms that do all the work to analyze documents and record analysis results. 
+
+UIMA provides a basic component type intended to house the core analysis algorithms running inside AEs.
+Instances of this component are called **Annotators**.
+The analysis algorithm developer's primary concern therefore is the development of annotators.
+The UIMA framework provides the necessary methods for taking annotators and creating analysis engines.
+
+In UIMA the person who codes analysis algorithms takes on the role of the **Annotator Developer**.
+The xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer’s Guide] will take the reader through the details involved in creating UIMA annotators and analysis engines.
+
+At the most primitive level, an AE wraps an annotator, adding the necessary APIs and infrastructure for the composition and deployment of annotators within the UIMA framework.
+The simplest AE contains exactly one annotator at its core.
+Complex AEs may contain a collection of other AEs each potentially containing within them other AEs. 
+
+[[ugr.ovv.conceptual.representing_results_in_cas]]
+=== Representing Analysis Results in the CAS
+
+How annotators represent and share their results is an important part of the UIMA architecture.
+UIMA defines a *Common Analysis Structure (CAS)* precisely for these purposes.
+
+The CAS is an object-based data structure that allows the representation of objects, properties and values.
+Object types may be related to each other in a single-inheritance hierarchy.
+The CAS logically (if not physically) contains the document being analyzed.
+Analysis developers share and record their analysis results in terms of an object model within the CAS.footnote:[We have plans to extend the representational capabilities of the CAS and align its semantics with the semantics of the OMG's Essential Meta-Object Facility (EMOF) and with the Eclipse Modeling Framework's Ecore semantics and XMI-based representation.]
+
+The UIMA framework includes an implementation and interfaces to the CAS.
+For a more detailed description of the CAS and its interfaces see xref:ref.adoc#ugr.ref.cas[CAS Reference].
+
+A CAS that logically contains statement 2 (repeated here for your convenience)
+
+[source]
+----
+(2) The span from position 101 to 112 in document D102 denotes a Person
+----
+
+would include objects of the Person type.
+For each person found in the body of a document, the AE would create a Person object in the CAS and link it to the span of text where the person was mentioned in the document.
+
+While the CAS is a general purpose data structure, UIMA defines a few basic types and affords the developer the ability to extend these to define an arbitrarily rich **Type System**.
+You can think of a type system as an object schema for the CAS.
+
+A type system defines the various types of objects that may be discovered in documents by AEs that subscribe to that type system.
+
+As suggested above, Person may be defined as a type.
+Types have properties or **features**.
+So for example, _Age_ and _Occupation_ may be defined as features of the Person type.
+
+Other types might be _Organization, Company, Bank, Facility, Money, Size, Price, Phone Number, Phone Call, Relation, Network Packet, Product, Noun, Phrase, Verb, Color, Parse Node, Feature Weight Array_ etc.
+
+There are no limits to the different types that may be defined in a type system.
+A type system is domain and application specific.
+
+Types in a UIMA type system may be organized into a taxonomy.
+For example, _Company_ may be defined as a subtype of _Organization_, and _NounPhrase_ may be a subtype of _ParseNode_.
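+
+As a rough sketch of how such a taxonomy might be written down, the following type system descriptor fragment declares an Organization type and a Company subtype.
+The `org.example` namespace, the descriptor name and the `headquarters` feature are assumptions made for illustration; only `uima.cas.TOP` and `uima.cas.String` are names built into UIMA.
+
+[source,xml]
+----
+<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <name>ExampleTypeSystem</name>
+  <types>
+    <!-- Hypothetical root of the organization taxonomy -->
+    <typeDescription>
+      <name>org.example.Organization</name>
+      <description>Any kind of organization.</description>
+      <supertypeName>uima.cas.TOP</supertypeName>
+      <features>
+        <!-- Illustrative feature; subtypes inherit it -->
+        <featureDescription>
+          <name>headquarters</name>
+          <description>Where the organization is based.</description>
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+      </features>
+    </typeDescription>
+    <!-- Company has exactly one supertype (single inheritance) -->
+    <typeDescription>
+      <name>org.example.Company</name>
+      <description>An organization that is a company.</description>
+      <supertypeName>org.example.Organization</supertypeName>
+    </typeDescription>
+  </types>
+</typeSystemDescription>
+----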
+
+[[ugr.ovv.conceptual.annotation_type]]
+==== The Annotation Type
+
+A general and common type used in artifact analysis and from which additional types are often derived is the *annotation* type. 
+
+The annotation type is used to annotate or label regions of an artifact.
+Common artifacts are text documents, but they can be other things, such as audio streams.
+The annotation type for text includes two features, namely `begin` and `end`.
+Values of these features represent integer offsets in the artifact and delimit a span.
+Any particular annotation object identifies the span it annotates with the `begin` and `end` features.
+
+The key idea here is that the annotation type is used to identify and label or __annotate__ a specific region of an artifact.
+
+Consider that the Person type is defined as a subtype of annotation.
+An annotator, for example, can create a Person annotation to record the discovery of a mention of a person between position 141 and 143 in document D102.
+The annotator can create another person annotation to record the detection of a mention of a person in the span between positions 101 and 112. 
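+
+A Person type of this kind could be declared roughly as follows.
+This is a sketch under assumptions: the `org.example.Person` name and the range types chosen for `age` and `occupation` are illustrative, while `uima.tcas.Annotation` is the built-in annotation type that contributes `begin` and `end`.
+
+[source,xml]
+----
+<typeDescription>
+  <name>org.example.Person</name>
+  <description>A mention of a person in a text document.</description>
+  <!-- begin and end are inherited from the built-in annotation type -->
+  <supertypeName>uima.tcas.Annotation</supertypeName>
+  <features>
+    <featureDescription>
+      <name>age</name>
+      <description>Age of the person, if known.</description>
+      <rangeTypeName>uima.cas.Integer</rangeTypeName>
+    </featureDescription>
+    <featureDescription>
+      <name>occupation</name>
+      <description>Occupation of the person, if known.</description>
+      <rangeTypeName>uima.cas.String</rangeTypeName>
+    </featureDescription>
+  </features>
+</typeDescription>
+----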
+
+[[ugr.ovv.conceptual.not_just_annotations]]
+==== Not Just Annotations
+
+While the annotation type is a useful type for annotating regions of a document, annotations are not the only kind of types in a CAS.
+A CAS is a general representation scheme and may store arbitrary data structures to represent the analysis of documents.
+
+As an example, consider statement 3 above (repeated here for your convenience).
+
+[source]
+----
+(3) The Person denoted by span 101 to 112 and 
+  the Person denoted by span 141 to 143 in document D102 
+  refer to the same Entity.
+----
+
+This statement mentions two person annotations in the CAS: the first, call it P1, delimiting the span from 101 to 112, and the other, call it P2, delimiting the span from 141 to 143.
+Statement 3 asserts explicitly that these two spans refer to the same entity.
+This means that while there are two expressions in the text represented by the annotations P1 and P2, each refers to one and the same person. 
+
+The Entity type may be introduced into a type system to capture this kind of information.
+The Entity type is not an annotation.
+It is intended to represent an object in the domain which may be referred to by different expressions (or mentions) occurring multiple times within a document (or across documents within a collection of documents).
+The Entity type has a feature named _occurrences_.
+This feature is used to point to all the annotations believed to label mentions of the same entity.
+
+Consider that the spans annotated by P1 and P2 were "`Fred Center`" and "`He`" respectively.
+The annotator might create a new Entity object called ``FredCenter``.
+To represent the relationship in statement 3 above, the annotator may link FredCenter to both P1 and P2 by making them values of its _occurrences_ feature.
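+
+One way such an Entity type might be declared is sketched below.
+The `org.example.Entity` name is an assumption; the sketch derives the type from `uima.cas.TOP` rather than from the annotation type, and models _occurrences_ as an array of feature structures whose elements are annotations.
+
+[source,xml]
+----
+<typeDescription>
+  <name>org.example.Entity</name>
+  <description>A domain object that several mentions may refer to.</description>
+  <!-- Not an annotation: it has no begin/end of its own -->
+  <supertypeName>uima.cas.TOP</supertypeName>
+  <features>
+    <featureDescription>
+      <name>occurrences</name>
+      <description>Annotations believed to mention this entity, e.g. P1 and P2.</description>
+      <rangeTypeName>uima.cas.FSArray</rangeTypeName>
+      <elementType>uima.tcas.Annotation</elementType>
+    </featureDescription>
+  </features>
+</typeDescription>
+----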
+
+<<ugr.ovv.conceptual.metadata_in_cas>> also illustrates that an entity may be linked to annotations referring to regions of image documents as well.
+To do this the annotation type would have to be extended with the appropriate features to point to regions of an image.
+
+[[ugr.ovv.conceptual.multiple_views_within_a_cas]]
+==== Multiple Views within a CAS
+
+UIMA supports the simultaneous analysis of multiple views of a document.
+This support comes in handy for processing multiple forms of the artifact, for example, the audio and the closed-captioned views of a single speech stream, or the tagged and detagged views of an HTML document.
+
+AEs analyze one or more views of a document.
+Each view contains a specific *subject of analysis (Sofa)*, plus a set of indexes holding metadata indexed by that view.
+The CAS, overall, holds one or more CAS Views, plus the descriptive objects that represent the analysis results for each. 
+
+Another common example of using CAS Views is for different translations of a document.
+Each translation may be represented with a different CAS View.
+Each translation may be described by a different set of analysis results.
+For more details on CAS Views and Sofas, see xref:tug.adoc#ugr.tug.mvs[Multiple CAS Views of an Artifact] and xref:tug.adoc#ugr.tug.aas[Annotations, Artifacts, and Sofas].
+
+[[ugr.ovv.conceptual.interacting_with_cas_and_external_resources]]
+=== Using CASes and External Resources
+
+The two main interfaces that a UIMA component developer interacts with are the CAS and the UIMA Context.
+
+UIMA provides an efficient implementation of the CAS with multiple programming interfaces.
+Through these interfaces, the annotator developer interacts with the document and reads and writes analysis results.
+The CAS interfaces provide a suite of access methods that allow the developer to obtain indexed iterators to the different objects in the CAS.
+See xref:ref.adoc#ugr.ref.cas[CAS Reference].
+While many objects may exist in a CAS, the annotator developer can obtain a specialized iterator to all Person objects associated with a particular view, for example.
+
+For Java annotator developers, UIMA provides the JCas.
+This interface provides the Java developer with a natural interface to CAS objects.
+Each type declared in the type system appears as a Java Class; the UIMA framework renders the Person type as a Person class in Java.
+As the analysis algorithm detects mentions of persons in the documents, it can create Person objects in the CAS.
+For more details on how to interact with the CAS using this interface, refer to the xref:ref.adoc#ugr.ref.jcas[JCas Reference].
+
+The component developer, in addition to interacting with the CAS, can access external resources through the framework's resource manager interface called the **UIMA Context**.
+This interface, among other things, can ensure that different annotators working together in an aggregate flow may share the same instance of an external file or remote resource accessed via its URL, for example.
+For details on using the UIMA Context see xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer's Guide].
+
+[[ugr.ovv.conceptual.component_descriptors]]
+=== Component Descriptors
+
+UIMA defines interfaces for a small set of core components that users of the framework provide implementations for.
+Annotators and Analysis Engines are two of the basic building blocks specified by the architecture.
+Developers implement them to build and compose analysis capabilities and ultimately applications.
+
+There are other components in addition to these, which we will learn about later, but for every component specified in UIMA there are two parts required for its implementation:
+
+. the declarative part and
+. the code part.
+
+The declarative part contains metadata describing the component, its identity, structure and behavior and is called the **Component Descriptor**.
+Component descriptors are represented in XML.
+The code part implements the algorithm.
+The code part may be a program in Java.
+
+To implement a UIMA component with the UIMA SDK, you always provide two things: the code part and the Component Descriptor.
+Note that when you are composing an engine, the code may be already provided in reusable subcomponents.
+In these cases you may not be developing new code but rather composing an aggregate engine by pointing to other components where the code has been included.
+
+Component descriptors aid in component discovery, reuse, composition and development tooling.
+The UIMA SDK provides a tool for easily creating and maintaining component descriptors that relieves the developer from editing XML directly.
+This tool is described briefly in xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer's Guide], and more thoroughly in xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor User’s Guide].
+
+Component descriptors contain standard metadata including the component's name, author, version, and a reference to the class that implements the component.
+
+In addition to these standard fields, a component descriptor identifies the type system the component uses and the types it requires in an input CAS and the types it plans to produce in an output CAS.
+
+For example, an AE that detects person types may require as input a CAS that includes a tokenization and deep parse of the document.
+The descriptor refers to a type system to make the component's input requirements and output types explicit.
+In effect, the descriptor includes a declarative description of the component's behavior and can be used to aid in component discovery and composition based on desired results.
+UIMA analysis engines provide an interface for accessing the component metadata represented in their descriptors.
+For more details on the structure of UIMA component descriptors refer to xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference].
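+
+To make the two parts concrete, a descriptor for a primitive person-detecting AE might look roughly like the following sketch.
+The class name, type names, vendor and imported file name are hypothetical; the Component Descriptor Reference remains the authority on the full format.
+
+[source,xml]
+----
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <!-- Link to the code part: the class that implements the annotator -->
+  <primitive>true</primitive>
+  <annotatorImplementationName>org.example.PersonAnnotator</annotatorImplementationName>
+  <analysisEngineMetaData>
+    <!-- Standard metadata -->
+    <name>Person Detector</name>
+    <description>Detects mentions of persons.</description>
+    <version>1.0</version>
+    <vendor>Example Vendor</vendor>
+    <!-- Type system the component uses -->
+    <typeSystemDescription>
+      
+    </typeSystemDescription>
+    <!-- Types required in the input CAS and produced in the output CAS -->
+    <capabilities>
+      <capability>
+        <inputs>
+          <type>org.example.Token</type>
+        </inputs>
+        <outputs>
+          <type allAnnotatorFeatures="true">org.example.Person</type>
+        </outputs>
+        <languagesSupported>
+          <language>en</language>
+        </languagesSupported>
+      </capability>
+    </capabilities>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+----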
+
+[[ugr.ovv.conceptual.aggregate_analysis_engines]]
+== Aggregate Analysis Engines
+
+[NOTE]
+====
+Aggregate Analysis Engine, Delegate Analysis Engine, Tightly and Loosely Coupled, Flow Specification, Analysis Engine Assembler
+====
+
+[[ugr.ovv.conceptual.sample_aggregate]]
+.Sample Aggregate Analysis Engine
+image::images/overview-and-setup/conceptual_overview_files/image006.png["Picture of multiple parts (a language identifier, tokenizer, part of speech annotator, shallow parser, and named entity detector) strung together into a flow, and all of them wrapped as a single aggregate object, which produces as annotations the union of all the results of the individual annotator components ( tokens, parts of speech, names, organizations, places, persons, etc.)"]
+
+A simple or primitive UIMA Analysis Engine (AE) contains a single annotator.
+AEs, however, may be defined to contain other AEs organized in a workflow.
+These more complex analysis engines are called *Aggregate Analysis Engines*.
+
+Annotators tend to perform fairly granular functions, for example language detection, tokenization or part of speech detection.
+These functions typically address just part of an overall analysis task.
+A workflow of component engines may be orchestrated to perform more complex tasks.
+
+An AE that performs named entity detection, for example, may include a pipeline of annotators starting with language detection feeding tokenization, then part-of-speech detection, then deep grammatical parsing and then finally named-entity detection.
+Each step in the pipeline is required by the subsequent analysis.
+For example, the final named-entity annotator can only do its analysis if the previous deep grammatical parse was recorded in the CAS.
+
+Aggregate AEs are built to encapsulate potentially complex internal structure and insulate it from users of the AE.
+In our example, the aggregate analysis engine developer acquires the internal components, defines the necessary flow between them and publishes the resulting AE.
+Consider the simple example illustrated in <<ugr.ovv.conceptual.sample_aggregate>> where "`MyNamed-EntityDetector`" is composed of a linear flow of more primitive analysis engines.
+
+Users of this AE need not know how it is constructed internally but only need its name and its published input requirements and output types.
+These must be declared in the aggregate AE's descriptor.
+Aggregate AE descriptors declare the components they contain and a **flow specification**.
+The flow specification defines the order in which the internal component AEs should be run.
+The internal AEs specified in an aggregate are also called the *delegate analysis engines*.
+The term "delegate" is used because aggregate AEs are thought to "delegate" functions to their internal AEs.
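+
+A minimal sketch of such an aggregate descriptor, assuming three delegate descriptors with the hypothetical file names shown, could look like this.
+Each delegate is given a key, and the fixed flow lists those keys in the order in which the delegates should run.
+
+[source,xml]
+----
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <!-- An aggregate has no annotator class of its own -->
+  <primitive>false</primitive>
+  <delegateAnalysisEngineSpecifiers>
+    <delegateAnalysisEngine key="LanguageIdentifier">
+      <import location="LanguageIdentifier.xml"/>
+    </delegateAnalysisEngine>
+    <delegateAnalysisEngine key="Tokenizer">
+      <import location="Tokenizer.xml"/>
+    </delegateAnalysisEngine>
+    <delegateAnalysisEngine key="NamedEntityDetector">
+      <import location="NamedEntityDetector.xml"/>
+    </delegateAnalysisEngine>
+  </delegateAnalysisEngineSpecifiers>
+  <analysisEngineMetaData>
+    <name>MyNamed-EntityDetector</name>
+    <!-- Flow specification: run the delegates in this order -->
+    <flowConstraints>
+      <fixedFlow>
+        <node>LanguageIdentifier</node>
+        <node>Tokenizer</node>
+        <node>NamedEntityDetector</node>
+      </fixedFlow>
+    </flowConstraints>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+----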
+
+The developer can implement a "Flow Controller" and include it as part of an aggregate AE by referring to it in the aggregate AE's descriptor.
+The flow controller is responsible for computing the "flow", that is, for determining the order in which the delegate AEs will process the CAS.
+The Flow Controller has access to the CAS and any external resources it may require for determining the flow.
+It can do this dynamically at run-time, make multi-step decisions, and consider any sort of flow specification included in the aggregate AE's descriptor.
+See xref:tug.adoc#ugr.tug.fc[Flow Controller Developer's Guide] for details on the UIMA Flow Controller interface.
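+
+As a hedged illustration, a reference to a custom flow controller inside an aggregate descriptor might look like the fragment below.
+The key and the imported descriptor name are hypothetical, and the exact placement of the element should be checked against the Flow Controller Developer's Guide.
+
+[source,xml]
+----
+<!-- Inside the aggregate's analysisEngineDescription, next to the delegate specifiers -->
+<flowController key="MyFlowController">
+  <!-- Points to the descriptor of the flow controller component -->
+  <import location="MyFlowController.xml"/>
+</flowController>
+----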
+
+We refer to the development role associated with building an aggregate from delegate AEs as the *Analysis Engine Assembler*.
+
+The UIMA framework, given an aggregate analysis engine descriptor, will run all delegate AEs, ensuring that each one gets access to the CAS in the sequence produced by the flow controller.
+The UIMA framework is equipped to handle different deployments where the delegate engines, for example, are *tightly-coupled* (running in the same process) or *loosely-coupled* (running in separate processes or even on different machines).
+The framework supports a number of remote protocols for loosely-coupled deployments of aggregate analysis engines.
+
+The UIMA framework facilitates the deployment of AEs as remote services by using an adapter layer that automatically creates the necessary infrastructure in response to a declaration in the component's descriptor.
+For more details on creating aggregate analysis engines refer to xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference].
+The component descriptor editor tool assists in the specification of aggregate AEs from a repository of available engines.
+For more details on this tool refer to xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor User’s Guide].
+
+The UIMA framework implementation has two built-in flow implementations: one that supports a linear flow between components, and one with conditional branching based on the language of the document.
+It also supports user-provided flow controllers, as described in xref:tug.adoc#ugr.tug.fc[Flow Controller Developer's Guide].
+Furthermore, the application developer is free to create multiple AEs and provide their own logic to combine the AEs in arbitrarily complex flows.
+For more details on this the reader may refer to xref:tug.adoc#ugr.tug.application.using_aes[Using Analysis Engines].
+
+[[ugr.ovv.conceptual.applicaiton_building_and_collection_processing]]
+== Application Building and Collection Processing
+
+[NOTE]
+====
+Process Method, Collection Processing Architecture, Collection Reader, CAS Consumer, CAS Initializer, Collection Processing Engine, Collection Processing Manager.
+====
+
+[[ugr.ovv.conceptual.using_framework_from_an_application]]
+=== Using the framework from an Application
+
+[[ugr.ovv.conceptual.application_factory_ae]]
+.Using UIMA Framework to create and interact with an Analysis Engine
+image::images/overview-and-setup/conceptual_overview_files/image008.png["Picture of application interacting with UIMA's factory to produce an analysis engine, which acts as a container for annotators, and interfaces with the application via the process and getMetaData methods among others."]
+
+As mentioned above, the basic AE interface may be thought of as simply CAS in/CAS out.
+
+The application is responsible for interacting with the UIMA framework to instantiate an AE, create or acquire an input CAS, initialize the input CAS with a document and then pass it to the AE through the **process method**.
+This interaction with the framework is illustrated in <<ugr.ovv.conceptual.application_factory_ae>>. 
+
+The UIMA AE Factory takes the declarative information from the Component Descriptor and the class files implementing the annotator, and instantiates the AE instance, setting up the CAS and the UIMA Context.
+
+The AE, possibly calling many delegate AEs internally, performs the overall analysis and its process method returns the CAS containing new analysis results. 
+
+The application then decides what to do with the returned CAS.
+There are many possibilities.
+For instance, the application could display the results, store the CAS to disk for post-processing, or extract and index analysis results as part of a search or database application.
+
+The UIMA framework provides methods to support the application developer in creating and managing CASes and instantiating, running and managing AEs.
+Details may be found in xref:tug.adoc#ugr.tug.application[Application Developer’s Guide].
+
+[[ugr.ovv.conceptual.graduating_to_collection_processing]]
+=== Graduating to Collection Processing
+
+.High-Level UIMA Component Architecture from Source to Sink
+image::images/overview-and-setup/conceptual_overview_files/image010.png[]
+
+Many UIM applications analyze entire collections of documents.
+They connect to different document sources and do different things with the results.
+But in the typical case, the application must generally follow these logical steps: 
+
+. Connect to a physical source
+. Acquire a document from the source
+. Initialize a CAS with the document to be analyzed
+. Send the CAS to a selected analysis engine
+. Process the resulting CAS
+. Go back to step 2 until the collection is processed
+. Do any final processing required after all the documents in the collection have been analyzed
+
+UIMA supports UIM application development for this general type of processing through its **Collection Processing Architecture**.
+
+As part of the collection processing architecture UIMA introduces two primary components in addition to the annotator and analysis engine.
+These are the *Collection Reader* and the **CAS Consumer**.
+The complete flow from source, through document analysis, and to CAS Consumers supported by UIMA is illustrated in <<ugr.ovv.conceptual.fig.cpe>>.
+
+The Collection Reader's job is to connect to and iterate through a source collection, acquiring documents and initializing CASes for analysis. 
+
+CAS Consumers, as the name suggests, function at the end of the flow.
+Their job is to do the final CAS processing.
+A CAS Consumer may be implemented, for example, to index CAS contents in a search engine, extract elements of interest and populate a relational database or serialize and store analysis results to disk for subsequent and further analysis. 
+
+A UIMA *Collection Processing Engine* (CPE) is an aggregate component that specifies a "`source to sink`" flow from a Collection Reader through a set of analysis engines and then to a set of CAS Consumers.
+
+CPEs are specified by XML files called CPE Descriptors.
+These are declarative specifications that point to their contained components (Collection Readers, analysis engines and CAS Consumers) and indicate a flow among them.
+The flow specification allows for filtering capabilities to, for example, skip over AEs based on CAS contents.
+Details about the format of CPE Descriptors may be found in xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
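+
+The overall shape of a CPE descriptor can be sketched as follows.
+This is an illustration only: the component names and file locations are hypothetical, the attribute values are arbitrary, and the element and attribute names should be verified against the Collection Processing Engine Descriptor Reference before use.
+
+[source,xml]
+----
+<cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <!-- Source: the Collection Reader -->
+  <collectionReader>
+    <collectionIterator>
+      <descriptor>
+        <import location="FileSystemCollectionReader.xml"/>
+      </descriptor>
+    </collectionIterator>
+  </collectionReader>
+  <!-- Analysis engines and CAS Consumers, in flow order -->
+  <casProcessors casPoolSize="3" processingUnitThreadCount="1">
+    <casProcessor deployment="integrated" name="Named Entity Detector">
+      <descriptor>
+        <import location="MyNamedEntityDetector.xml"/>
+      </descriptor>
+      <errorHandling>
+        <!-- Keep processing the collection when single documents fail -->
+        <errorRateThreshold action="continue" value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate" value="30"/>
+        <timeout max="100000"/>
+      </errorHandling>
+      <checkpoint batch="10000"/>
+    </casProcessor>
+    <!-- Sink: a CAS Consumer -->
+    <casProcessor deployment="integrated" name="Search Index Consumer">
+      <descriptor>
+        <import location="SearchIndexConsumer.xml"/>
+      </descriptor>
+      <errorHandling>
+        <errorRateThreshold action="terminate" value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate" value="30"/>
+        <timeout max="100000"/>
+      </errorHandling>
+      <checkpoint batch="10000"/>
+    </casProcessor>
+  </casProcessors>
+  <cpeConfig>
+    <!-- -1 is used here to mean "process the whole collection" -->
+    <numToProcess>-1</numToProcess>
+    <deployAs>immediate</deployAs>
+  </cpeConfig>
+</cpeDescription>
+----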
+
+[[ugr.ovv.conceptual.fig.cpe]]
+.Collection Processing Manager in UIMA Framework
+image::images/overview-and-setup/conceptual_overview_files/image012.png["box and arrows picture of application using CPE factory to instantiate a Collection Processing Engine, and that engine interacting with the application."]
+
+The UIMA framework includes a *Collection Processing Manager* (CPM).
+The CPM is capable of reading a CPE descriptor, and deploying and running the specified CPE.
+<<ugr.ovv.conceptual.fig.cpe>> illustrates the role of the CPM in the UIMA Framework.
+
+Key features of the CPM are failure recovery, CAS management and scale-out. 
+
+Collections may be large and take considerable time to analyze.
+A configurable behavior of the CPM is to log faults on single document failures while continuing to process the collection.
+This behavior is commonly used because analysis components often tend to be the weakest link -- in practice they may choke on strangely formatted content. 
+
+This deployment option requires that the CPM run in a separate process or a machine distinct from the CPE components.
+A CPE may be configured to run with a variety of deployment options that control the features provided by the CPM.
+For details see xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+
+The UIMA SDK also provides a tool called the CPE Configurator.
+This tool provides the developer with a user interface that simplifies the process of connecting up all the components in a CPE and running the result.
+For details on using the CPE Configurator see xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User’s Guide].
+This tool currently does not provide access to the full set of CPE deployment options supported by the CPM; however, you can configure other parts of the CPE descriptor by editing it directly.
+For details on how to create and run CPEs refer to xref:tug.adoc#ugr.tools.cpe[Collection Processing Engine Developer's Guide].
+
+[[ugr.ovv.conceptual.exploiting_analysis_results]]
+== Exploiting Analysis Results
+
+[NOTE]
+====
+Semantic Search, XML Fragment Queries.
+====
+
+[[ugr.ovv.conceptual.semantic_search]]
+=== Semantic Search
+
+In a simple UIMA Collection Processing Engine (CPE), a Collection Reader reads documents from the file system and initializes CASes with their content.
+These are then fed to an AE that annotates tokens and sentences. The CASes, now enriched with token and sentence information, are passed to a CAS Consumer that populates a search engine index.
+
+The search engine query processor can then use the token index to provide basic key-word search.
+For example, given a query "`center`" the search engine would return all the documents that contained the word "`center`".
+
+*Semantic Search* is a search paradigm that can exploit the additional metadata generated by analytics like a UIMA CPE.
+
+Consider that we plugged a named-entity recognizer into the CPE described above.
+Assume this analysis engine is capable of detecting in documents and annotating in the CAS mentions of persons and organizations.
+
+Complementing the named-entity recognizer, we add a CAS Consumer that extracts, in addition to token and sentence annotations, the person and organization annotations added to the CASes by the named-entity recognizer.
+It then feeds these into the semantic search engine's index.
+
+A semantic search engine can exploit this additional information from the CAS to support more powerful queries.
+For example, imagine a user is looking for documents that mention an organization with "`center`" in its name but is not sure of the full or precise name of the organization.
+A key-word search on "`center`" would likely produce far too many documents because "`center`" is a common and ambiguous term.
+A semantic search engine might support a query language called **XML Fragments**.
+This query language is designed to exploit the CAS annotations entered in its index.
+The XML Fragment query, for example, 
+
+[source]
+----
+<organization> center </organization>
+----
+
+will produce only documents that contain "`center`" where it appears as part of a mention annotated as an organization by the named-entity recognizer.
+This will likely be a much shorter list of documents more precisely matching the user's interest.
+
+Consider taking this one step further.
+We add a relationship recognizer that annotates mentions of the CEO-of relationship.
+We configure the CAS Consumer so that it sends these new relationship annotations to the semantic search index as well.
+With these additional analysis results in the index we can submit queries like 
+
+[source]
+----
+<ceo_of>
+    <person> center </person>
+    <organization> center </organization>
+</ceo_of>
+----
+
+This query will precisely target documents that contain a mention of an organization with "`center`" as part of its name where that organization is mentioned as part of a `CEO-of` relationship annotated by the relationship recognizer.
+
+For more details about using UIMA and Semantic Search see the section on integrating text analysis and search in xref:tug.adoc#ugr.tug.application[Application Developer’s Guide].
+
+[[ugr.ovv.conceptual.databases]]
+=== Databases
+
+Search engine indices are not the only place to deposit analysis results for use by applications.
+Another classic example is populating databases.
+While many approaches are possible with varying degrees of flexibility and performance, all are highly dependent on application specifics.
+We included a simple sample CAS Consumer that provides the basics for getting your analysis results into a relational database.
+It extracts annotations from a CAS and writes them to a relational database, using the open source Apache Derby database.
+
+[[ugr.ovv.conceptual.multimodal_processing]]
+== Multimodal Processing in UIMA
+
+In previous sections we've seen how the CAS is initialized with an initial artifact that will be subsequently analyzed by Analysis Engines and CAS Consumers.
+The first Analysis Engine may make some assertions about the artifact, for example, in the form of annotations.
+Subsequent Analysis Engines will make further assertions about both the artifact and previous analysis results, and finally one or more CAS Consumers will extract information from these CASes for structured information storage.
+
+[[ugr.ovv.conceptual.fig.multiple_sofas]]
+.Multiple Sofas in support of multi-modal analysis of an audio stream. Some engines work on the audio "`view`", some on the text "`view`" and some on both.
+image::images/overview-and-setup/conceptual_overview_files/image014.png["Picture showing audio on the left broken into segments by a segmentation component, then sent to multiple analysis pipelines in parallel, some processing the raw audio, others processing the recognized speech as text."]
+
+Consider a processing pipeline, illustrated in <<ugr.ovv.conceptual.fig.multiple_sofas>>, that starts with an audio recording of a conversation, transcribes the audio into text, and then extracts information from the text transcript.
+Analysis Engines at the start of the pipeline are analyzing an audio subject of analysis, and later analysis engines are analyzing a text subject of analysis.
+The CAS Consumer will likely want to build a search index from concepts found in the text to the original audio segment covered by the concept.
+
+What becomes clear from this relatively simple scenario is that the CAS must be capable of simultaneously holding multiple subjects of analysis.
+Some analysis engines will analyze only one subject of analysis, some will analyze one and create another, and some will need to access multiple subjects of analysis at the same time.
+
+The support in UIMA for multiple subjects of analysis is called *Sofa* support.
+Sofa is an acronym derived from __S__ubject _of_ __A__nalysis; a Sofa is a physical representation of an artifact (e.g., the detagged text of a web page, the HTML text of the same web page, the audio segment of a video, or the closed-caption text of the same audio segment).
+A Sofa may be associated with CAS Views.
+A particular CAS will have one or more views, each view corresponding to a particular subject of analysis, together with a set of the defined indexes that index the metadata (that is, Feature Structures) created in that view.
+
+Analysis results can be indexed in, or "`belong`" to, a specific view.
+UIMA components may be written in "`Multi-View`" mode, able to create and access multiple Sofas at the same time, or in "`Single-View`" mode, simply receiving a particular view of the CAS corresponding to a particular single Sofa.
+For single-view mode components, it is up to the person assembling the component to supply the needed information to ensure a particular view is passed to the component at run time.
+This is done using XML descriptors for Sofa mapping (see xref:tug.adoc#ugr.tug.mvs.sofa_name_mapping[Sofa Name Mapping]).
+
+Multi-View capability brings benefits to text-only processing as well.
+An input document can be transformed from one format to another.
+Examples of this include transforming text from HTML to plain text or from one natural language to another. 
+
+[[ugr.ovv.conceptual.next_steps]]
+== Next Steps
+
+This chapter presented a high-level overview of UIMA concepts.
+Along the way, it pointed to other documents in the UIMA SDK documentation set where the reader can find details on how to apply the related concepts in building applications with the UIMA SDK.
+
+At this point the reader may return to the xref:oas.adoc#ugr.project_overview_doc_use[documentation guide] to learn how they might proceed in getting started using UIMA.
+
+For a more detailed overview of the UIMA architecture, framework and development roles we refer the reader to the following paper:
+
+* D. Ferrucci and A. Lally, __"Building an example application using the Unstructured Information Management Architecture",__ __IBM Systems Journal__ **43**, No. 3, 455-475 (2004). 
+
+This paper can be found online at http://www.research.ibm.com/journal/sj43-3.html
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/eclipse_setup.adoc b/uimaj-documentation/src/docs/asciidoc/oas/eclipse_setup.adoc
new file mode 100644
index 0000000..d1df425
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/eclipse_setup.adoc
@@ -0,0 +1,199 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ovv.eclipse_setup]]
+= Setting up the Eclipse IDE to work with UIMA
+// <titleabbrev>Eclipse IDE setup for UIMA</titleabbrev>
+
+This chapter describes how to set up the UIMA SDK to work with Eclipse.
+Eclipse (https://www.eclipse.org) is a popular open-source Integrated Development Environment for many things, including Java.
+The UIMA SDK does not require that you use Eclipse.
+However, we recommend that you do use Eclipse because some useful UIMA SDK tools run as plug-ins to the Eclipse platform and because the UIMA SDK examples are provided in a form that's easy to import into your Eclipse environment.
+
+If you are not planning on using the UIMA SDK with Eclipse, you may skip this chapter and read the xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer's Guide] next.
+
+This chapter provides instructions for 
+
+* installing Eclipse, 
+* installing the UIMA SDK's Eclipse plugins into your Eclipse environment, and 
+* importing the example UIMA code into an Eclipse project. 
+
+The UIMA Eclipse plugins are designed to be used with Eclipse version 4.10 (2018-12) or later. 
+
+[NOTE]
+====
+You need to run Eclipse with Java 1.8 or later in order to use the UIMA Eclipse plugins.
+====
+
+[[ugr.ovv.eclipse_setup.installation]]
+== Installation
+
+[[ugr.ovv.eclipse_setup.install_eclipse]]
+=== Install Eclipse
+
+* Go to https://www.eclipse.org and follow the instructions there to download Eclipse.
+* We recommend using the latest release level. Navigate to the Eclipse Release version you want and download the archive for your platform.
+* Unzip the archive to install Eclipse somewhere, e.g., c:\
+* Eclipse has a bit of a learning curve. If you plan to make significant use of Eclipse, check out the tutorial under the help menu. It is well worth the effort. There are also books you can get that describe Eclipse and its use.
+
+The first time Eclipse starts up it will take a bit longer as it completes its installation.
+A "`welcome`" page will come up.
+After you are through reading the welcome information, click on the arrow to exit the welcome page and get to the main Eclipse screens.
+
+[[ugr.ovv.eclipse_setup.install_uima_eclipse_plugins]]
+=== Installing the UIMA Eclipse Plugins
+
+The best way to do this is to use the Eclipse Install New Software mechanism, because that will ensure that all needed prerequisites are also installed.
+See below for an alternative, manual approach.
+
+[NOTE]
+====
+If your computer is on an internet connection which uses a proxy server, you can configure Eclipse to know about that.
+Enter your proxy settings in the Eclipse preferences via the menus Window → Preferences... → Install/Update, and enable the HTTP proxy connection under Proxy Settings with the information about your proxy.
+====
+
+To use the Eclipse Install New Software mechanism, start Eclipse, and then pick the menu ``Help → Install new software...``.
+On the next page, enter one of the following URLs in the "Work with" box and press enter:
+
+* ``https://www.apache.org/dist/uima/eclipse-update-site/``
+
+* ``https://www.apache.org/dist/uima/eclipse-update-site-v3/``
+
+
+Choose the second URL if you are working with the core UIMA Java SDK at version 3 or later.
+Otherwise, choose the first.
+
+Now select the plugin tools you wish to install, click Next, and follow the remaining panels to install the UIMA plugins.
+
+[[ugr.ovv.eclipse_setup.install_uima_sdk]]
+=== Install the UIMA SDK
+
+If you haven't already done so, please download and install the UIMA SDK from the Apache UIMA download page (https://uima.apache.org/downloads.cgi).
+Be sure to set the environment variable UIMA_HOME to point to the root of the installed UIMA SDK and run the `adjustExamplePaths.bat` or `adjustExamplePaths.sh` script, as explained in the README.
+
+The environment variable UIMA_HOME is used by the command-line scripts in the %UIMA_HOME%/bin directory as well as by Eclipse run configurations in the uimaj-examples sample project.
+
+[[ugr.ovv.eclipse_setup.install_uima_eclipse_plugins_manually]]
+=== Installing the UIMA Eclipse Plugins, manually
+
+If you installed the UIMA plugins using the update mechanism above, please skip this section.
+
+If you are unable to use the Eclipse update mechanism to install the UIMA plugins, you can do this manually.
+In the directory %UIMA_HOME%/eclipsePlugins (the environment variable %UIMA_HOME% points to where you installed the UIMA SDK), you will see a set of folders.
+Copy these to your %ECLIPSE_HOME%/dropins directory (%ECLIPSE_HOME% is where you installed Eclipse).
+
+[[ugr.ovv.eclipse_setup.start_eclipse]]
+=== Start Eclipse
+
+If you have Eclipse running, restart it (shut it down, and start it again) using the `-clean` option; you can do this by running the command `eclipse -clean` (see explanation in the next section) in the directory where you installed Eclipse.
+You may want to set up a desktop shortcut at this point for Eclipse.
+
+[[ugr.ovv.eclipse_setup.special_startup_parameter_clean]]
+==== Special startup parameter for Eclipse: -clean
+
+If you have modified the plugin structure (by copying files directly in the file system) after you started Eclipse for the first time, please include the "`-clean`" parameter in the startup arguments to Eclipse, _one time_ (after any plugin modifications were done). This is needed because otherwise Eclipse may not notice the changes you made.
+This parameter forces Eclipse to reexamine all of its plugins at startup and recompute any cached information about them.
+
+[[ugr.ovv.eclipse_setup.example_code]]
+== Setting up Eclipse to view Example Code
+
+Later chapters refer to example code.
+Here's how to create a special project in Eclipse to hold the examples.
+
+* In Eclipse, if the Java perspective is not already open, switch to it by going to Window → Open Perspective → Java.
+* Set up a class path variable named UIMA_HOME, whose value is the directory where you installed the UIMA SDK. This is done as follows:
++
+** Go to Window → Preferences → Java → Build Path → Classpath Variables.
+** Click "`New`"
+** Enter UIMA_HOME (all capitals, exactly as written) in the "`Name`" field.
+** Enter your installation directory (e.g. ``C:/Program Files/apache-uima``) in the "`Path`" field
+** Click "`OK`" in the "`New Variable Entry`" dialog
+** Click "`OK`" in the "`Preferences`" dialog
+** If it asks you if you want to do a full build, click "`Yes`"
+* Select the File → Import menu option.
+* Select "`General/Existing Project into Workspace`" and click the "`Next`" button.
+* Click "`Browse`" and browse to the %UIMA_HOME%/examples directory.
+* Click "`Finish`". This will create a new project called "`uimaj-examples`" in your Eclipse workspace. There should be no compilation errors.
+
+To verify that you have set up the project correctly, check that there are no error messages in the "`Problems`" view.
+
+[[ugr.ovv.eclipse_setup.adding_source]]
+== Adding the UIMA source code to the jar files
+
+[NOTE]
+====
+If you are running a current version of Eclipse and have the m2e (Maven extensions for Eclipse) plugin installed, Eclipse should be able to download the source for the jars automatically, so you may not need to do anything special (it does take a few seconds, and you need an internet connection).
+====
+
+Otherwise, if you would like to be able to jump to the UIMA source code in Eclipse or to step through it with the debugger, you can add the UIMA source code directly to the jar files.
+This is done via a shell script that comes with the source distribution.
+To add the source code to the jars, you need to: 
+
+* Download and unpack the UIMA source distribution. 
+* Download and install the UIMA binary distribution (the UIMA_HOME environment variable needs to be set to point to where you installed the UIMA binary distribution). 
+* `cd` to the root directory of the source distribution.
+* Execute the `src\main\readme_src\addSourceToJars` script in the root directory of the source distribution.
+
+This adds the source code to the jar files, and it will then be automatically available from Eclipse.
+There is no further Eclipse setup required. 
+
+[[ugr.ovv.eclipse_setup.linking_uima_javadocs]]
+== Attaching UIMA Javadocs
+
+The binary distribution also includes the UIMA Javadocs.
+They are attached to the UIMA library jar files in the uimaj-examples project described above.
+You can attach the Javadocs to your own project as well. 
+
+[NOTE]
+====
+If you attached the source as described in the previous section, you don't need to attach the Javadocs because the source includes the Javadoc comments.
+====
+
+Attaching the Javadocs enables Javadoc help for UIMA APIs.
+After they are attached, if you hover your mouse over a UIMA API element, the corresponding Javadoc will appear.
+You can then press "`F2`" to make the hover "stick", or "`Shift-F2`" to open the default web browser on your system to let you browse the entire Javadoc information for that element.
+
+If this pop-up behavior is something you don't want, you can turn it off in the Eclipse preferences, in the menu __Window → Preferences → Java → Editor → Hovers__.
+
+Eclipse also has a Javadoc "view", which you can show using __Window → Show View → Javadoc__.
+
+See xref:ref.adoc#ugr.ref.javadocs.libraries[Using named Eclipse User Libraries] for information on how to set up a UIMA "library" with the Javadocs attached, which can be reused for other projects in your Eclipse workspace.
+
+You can attach the Javadocs to each UIMA library jar you think you might be interested in.
+It makes most sense for uima-core.jar, since you'll probably use the core APIs most of all.
+
+Here's a screenshot of what you should see when you hover your mouse pointer over the class name "`CAS`" in the source code.
+
+.Screenshot of mouse-over for UIMA APIs
+image::images/overview-and-setup/eclipse_setup_files/image004.jpg[Screenshot of mouse-over for UIMA APIs]
+
+[[ugr.ovv.eclipse_setup.running_external_tools_from_eclipse]]
+== Running external tools from Eclipse
+
+You can run many tools without using Eclipse at all, by using the shell scripts in the UIMA SDK's bin directory.
+In addition, many tools can be run from inside Eclipse; examples are the Document Analyzer, CPE Configurator, CAS Visual Debugger,  and JCasGen.
+The uimaj-examples project provides Eclipse launch configurations that make this easy to do.
+
+To run these tools from Eclipse:
+
+* If the Java perspective is not already open, switch to it by going to Window → Open Perspective → Java.
+* Go to __Run → Run...__
+* In the window that appears, select "`UIMA CPE GUI`", "`UIMA CAS Visual Debugger`", "`UIMA JCasGen`", or "`UIMA Document Analyzer`" from the list of run configurations on the left. (If you don't see these, please select the uimaj-examples project and do a __Menu → File → Refresh__.)
+* Press the "`Run`" button. The tools should start. Close the tools by clicking the "`X`" in the upper right corner on the GUI. 
+
+For instructions on using the Document Analyzer and CPE Configurator, see xref:tools.adoc#ugr.tools.doc_analyzer[Document Analyzer] and xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User's Guide].
+For instructions on using the CAS Visual Debugger and JCasGen, see xref:tools.adoc#ugr.tools.cvd[CAS Visual Debugger] and xref:tools.adoc#ugr.tools.jcasgen[JCasGen].
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/faqs.adoc b/uimaj-documentation/src/docs/asciidoc/oas/faqs.adoc
new file mode 100644
index 0000000..80cc7d0
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/faqs.adoc
@@ -0,0 +1,205 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.faqs]]
+= UIMA Frequently Asked Questions (FAQs)
+// <titleabbrev>UIMA FAQ's</titleabbrev>
+
+
+*What is UIMA?*::
+UIMA stands for Unstructured Information Management Architecture.
+It is a component software architecture for the development, discovery, composition and deployment of multi-modal analytics for the analysis of unstructured information.
++
+UIMA processing occurs through a series of modules called <<ugr.faqs.annotator_versus_ae,analysis engines>>.
+The result of analysis is an assignment of semantics to the elements of unstructured data, for example, the indication that the phrase "`Washington`" refers to a person's name or that it refers to a place.
++
+The output of analysis engines can be saved in conventional structures, for example, relational databases or search engine indices, where the content of the original unstructured information may be efficiently accessed according to its inferred semantics.
++
+UIMA supports developers in creating, integrating, and deploying components across platforms and among dispersed teams working to develop unstructured information management applications.
+
+*How do you pronounce UIMA?*::
+You-eee-muh.
+
+*What's the difference between UIMA and Apache UIMA?*::
+UIMA is an architecture which specifies component interfaces, design patterns, data representations and development roles.
++
+Apache UIMA is an open source, Apache-licensed software project.
+It includes run-time frameworks in Java and C++, APIs and tools for implementing, composing, packaging and deploying UIMA components.
++
+The UIMA run-time framework allows developers to plug in their components and applications and run them on different platforms and according to different deployment options that range from tightly-coupled (running in the same process space) to loosely-coupled (distributed across different processes or machines for greater scale, flexibility and recoverability).
++
+The UIMA project has several significant subprojects, including UIMA-AS (for flexibly scaling out UIMA pipelines over clusters of machines), uimaFIT (a Java-centric set of friendlier interfaces for using UIMA without XML descriptors, which also provides many convenience methods), UIMA-DUCC (for managing clusters of machines running scaled-out UIMA "jobs" in a "fair" way), RUTA (Eclipse-based tooling and a runtime framework for development of rule-based Annotators), and Addons (where you can find many extensions).
+
+[[ugr.faqs.what_is_an_annotation]]
+*What is an Annotation?*::
+An annotation is metadata that is associated with a region of a document.
+It is often a label, typically represented as a string of characters.
+The region may be the whole document. 
++
+An example is the label "`Person`" associated with the span of text "`George Washington`".
+We say that "`Person`" annotates "`George Washington`" in the sentence "`George
+Washington was the first president of the United States`".
+The association of the label "`Person`" with a particular span of text is an annotation.
+Another example may have an annotation represent a topic, like "`American
+Presidents`" and be used to label an entire document.
++
+Annotations are not limited to regions of texts.
+An annotation may annotate a region of an image or a segment of audio.
+The same concepts apply.
+
+[[ugr.faqs.what_is_the_cas]]
+*What is the CAS?*::
+The CAS stands for Common Analysis Structure.
+It provides cooperating UIMA components with a common representation and mechanism for shared access to the artifact being analyzed (e.g., a document, audio file, video stream etc.) and the current analysis results.
+
+*What does the CAS contain?*::
+The CAS is a data structure for which UIMA provides multiple interfaces.
+It contains and provides the analysis algorithm or application developer with access to
+
+* the subject of analysis (the artifact being analyzed, like the document),
+* the analysis results or metadata (e.g., annotations, parse trees, relations, entities etc.),
+* indices to the analysis results, and
+* the type system (a schema for the analysis results).
+
++
+A CAS can hold multiple versions of the artifact being analyzed (for instance, a raw HTML document and a detagged version, or an English version and a corresponding German version, or an audio sample and the text that corresponds to it). For each version there is a separate instance of the results indices.
+
+*Does the CAS only contain Annotations?*::
+No.
+The CAS contains the artifact being analyzed plus the analysis results.
+Analysis results are those metadata recorded by <<ugr.faqs.annotator_versus_ae,analysis engines>> in the CAS.
+The most common form of analysis result is the addition of an annotation.
+But an analysis engine may write any structure that conforms to the CAS's type system into the CAS.
+These may not be annotations but may be other things, for example links between annotations and properties of objects associated with annotations.
++
+The CAS may have multiple representations of the artifact being analyzed,
+each one represented in the CAS as a particular Subject of Analysis, or <<ugr.faqs.what_is_a_sofa,Sofa>>.
+
+*Is the CAS just XML?*::
+No, in fact there are many possible representations of the CAS.
+If all of the <<ugr.faqs.annotator_versus_ae,analysis engines>> are running in the same process, an efficient, in-memory data object is used.
+If a CAS must be sent to an analysis engine on a remote machine, it can be done via an XML or a binary serialization of the CAS. 
++
+The UIMA framework provides multiple serialization and de-serialization methods in various formats, including XML.
+See the Javadocs for the CasIOUtils class. 
+
+*What is a Type System?*::
+Think of a type system as a schema or class model for the <<ugr.faqs.what_is_the_cas,CAS>>.
+It defines the types of objects and their properties (or features) that may be instantiated in a CAS.
+A specific CAS conforms to a particular type system.
+UIMA components declare their input and output with respect to a type system. 
++
+Type Systems include the definitions of types, their properties, range types (these can restrict the value of properties to other types) and a single-inheritance hierarchy of types.
+
+[[ugr.faqs.what_is_a_sofa]]
+*What is a Sofa?*::
+Sofa stands for *Subject of Analysis*. A <<ugr.faqs.what_is_the_cas,CAS>> is associated with a single artifact being analyzed by a collection of UIMA analysis engines.
+But a single artifact may have multiple independent views, each of which may be analyzed separately by a different set of <<ugr.faqs.annotator_versus_ae,analysis engines>>.
+For example, a document may have different translations, each of which is associated with the original document but potentially analyzed by different engines.
+A CAS may have multiple Views, each containing a different Subject of Analysis corresponding to some version of the original artifact.
+This feature is ideal for multi-modal analysis, where, for example, one view of a video stream may be the video frames and the other the closed captions.
+
+[[ugr.faqs.annotator_versus_ae]]
+*What's the difference between an Annotator and an Analysis Engine?*::
+In the terminology of UIMA, an annotator is simply some code that analyzes documents and outputs <<ugr.faqs.what_is_an_annotation,annotations>> on the content of the documents.
+The UIMA framework takes the annotator, together with metadata describing such things as the input requirements and output types of the annotator, and produces an analysis engine.
++
+Analysis Engines contain the framework-provided infrastructure that allows them to be easily combined with other analysis engines in different flows and according to different deployment options (collocated or as web services, for example). 
++
+Analysis Engines are the framework-generated objects that an Application interacts with.
+An Annotator is a user-written class that implements one of the supported Annotator interfaces.
+
+*Are UIMA analysis engines web services?*::
+They can be deployed as such.
+Deploying an analysis engine as a web service is one of the deployment options supported by the UIMA framework.
+
+*Do Analysis Engines have to be "stateless"?*::
+This is a user-specifiable option.
+The XML metadata for the component includes an `operationalProperties` element which can specify if multiple deployment is allowed.
+If true, then a particular instance of an Engine might not see all the CASes being processed.
+If false, then that component will see all of the CASes being processed.
+In this case, it can accumulate state information among all the CASes.
+Typically, Analysis Engines in the main analysis pipeline are marked multipleDeploymentAllowed = true.
+The CAS Consumer component, on the other hand, defaults to having this property set to false, and is typically associated with some resource like a database or search engine that aggregates analysis results across an entire collection.
++
+Analysis Engine developers are encouraged not to maintain state between documents that would prevent their engine from working as advertised if operated in a parallelized environment.
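++
+As a minimal sketch, the `operationalProperties` element of an analysis engine descriptor might look like the following fragment.
+The values shown are illustrative assumptions rather than settings taken from a particular component; `multipleDeploymentAllowed` is the property discussed above, and the other two elements are the remaining operational properties a descriptor can declare.
++
+[source,xml]
+----
+<!-- illustrative values only; adjust to the needs of your component -->
+<operationalProperties>
+  <modifiesCas>true</modifiesCas>
+  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+  <outputsNewCASes>false</outputsNewCASes>
+</operationalProperties>
+----
++
+Setting `multipleDeploymentAllowed` to `false` instead means that the component will see all of the CASes being processed, which is what a component that accumulates state across CASes needs.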
+
+*Is engine meta-data compatible with web services and UDDI?*::
+All UIMA component implementations are associated with Component Descriptors, which represent metadata describing various properties of the component to support discovery, reuse, validation, automatic composition and development tooling.
+In principle, UIMA component descriptors are compatible with web services and UDDI.
+However, the UIMA framework currently uses its own XML representation for component metadata.
+It would not be difficult to convert between UIMA's XML representation and other standard representations.
+
+*How do you scale a UIMA application?*::
+The UIMA framework allows components such as <<ugr.faqs.annotator_versus_ae,analysis engines>> and CAS Consumers to be easily deployed as services or in other containers and managed by systems middleware designed to scale.
+UIMA applications tend to naturally scale-out across documents allowing many documents to be analyzed in parallel.
++
+The UIMA-AS project has extensive capabilities to flexibly scale a UIMA pipeline across multiple machines.
+The UIMA-DUCC project supports unified management of large clusters of machines running multiple "jobs", each consisting of a pipeline with data sources and sinks.
++
+Within the core UIMA framework, there is a component called the CPM (Collection Processing Manager) which has features and configuration settings for scaling an application to increase its throughput and recoverability; the CPM was the earlier version of scaleout technology, and has been superseded by the UIMA-AS effort (although it is still supported).
+
+*What does it mean to embed UIMA in systems middleware?*::
+An example of an embedding would be the deployment of a UIMA analysis engine as an Enterprise Java Bean inside an application server such as IBM WebSphere.
+Such an embedding allows the deployer to take advantage of the features and tools provided by WebSphere for achieving scalability, service management, recoverability etc.
+UIMA is independent of any particular systems middleware, so <<ugr.faqs.annotator_versus_ae,analysis engines>> could be deployed on other application servers as well.
+
+*How is the CPM different from a CPE?*::
+These name complementary aspects of collection processing.
+The CPM (Collection Processing *Manager*) is the part of the UIMA framework that manages the execution of a workflow of UIMA components orchestrated to analyze a large collection of documents.
+The UIMA developer does not implement or describe a CPM.
+It is a piece of infrastructure code that handles CAS transport, instance management, batching, check-pointing, statistics collection and failure recovery in the execution of a collection processing workflow.
++
+A Collection Processing Engine (CPE) is a component created by the framework from a specific CPE descriptor.
+A CPE descriptor refers to a series of UIMA components including a Collection Reader, CAS Initializer, Analysis Engine(s) and CAS Consumers.
+These components are organized in a workflow and define a collection analysis job or CPE.
+A CPE acquires documents from a source collection, initializes CASs with document content, performs document analysis and then produces collection level results (e.g., search engine index, database etc). The CPM is the execution engine for a CPE.
+
+*Does UIMA support modalities other than text?*::
+The UIMA architecture supports the development, discovery, composition and deployment of multi-modal analytics including text, audio and video.
+Applications that process text, speech and video have been developed using UIMA.
+This release of the SDK, however, does not include examples of these multi-modal applications. 
++
+It does however include documentation and programming examples for using the key feature required for building multi-modal applications.
+UIMA supports multiple subjects of analysis or <<ugr.faqs.what_is_a_sofa,Sofas>>.
+These allow multiple views of a single artifact to be associated with a <<ugr.faqs.what_is_the_cas,CAS>>.
+For example, if an artifact is a video stream, one Sofa could be associated with the video frames and another with the closed-captions text.
+UIMA's multiple Sofa feature is included and described in this release of the SDK.
+
+*How does UIMA compare to other similar work?*::
+A number of different frameworks for NLP have preceded UIMA.
+Two of them were developed at IBM Research and represent UIMA's early roots.
+For details please refer to the UIMA article that appears in the
+IBM Systems Journal, Vol. 43, No. 3
+(http://www.research.ibm.com/journal/sj/433/ferrucci.html).
++
+UIMA has advanced the state of the art along a number of dimensions, including: support for distributed deployments in different middleware environments, easy framework embedding in different software product platforms (key for commercial applications), broader architectural coverage with its collection processing architecture, support for multiple modalities, support for efficient integration across programming languages, support for a modern software engineering discipline calling out different roles in the use of UIMA to develop applications, and the extensive use of descriptive component metadata to support development tooling, component discovery and composition.
+(Please note that not all of these features are available in this release of the SDK.)
+
+*Is UIMA Open Source?*::
+Yes.
+As of version 2, UIMA development has moved to Apache and is being developed within the Apache open source processes.
+It is licensed under the Apache version 2 license. 
+
+*What Java level and OS are required for the UIMA SDK?*::
+As of release 3.0.0, the UIMA SDK requires Java 1.8.
+It has been tested mainly on Windows and Linux platforms, with some testing on Mac OS X.
+Other platforms and JDK implementations will likely work, but have not been as extensively tested.
+
+*Can I build my UIM application on top of UIMA?*::
+Yes.
+Apache UIMA is licensed under the Apache version 2 license, enabling you to build and distribute applications which include the framework. 
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/glossary.adoc b/uimaj-documentation/src/docs/asciidoc/oas/glossary.adoc
new file mode 100644
index 0000000..76f2bb4
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/glossary.adoc
@@ -0,0 +1,205 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+:sectnums!:
+
+[glossary]
+[[ugr.glossary]]
+= Glossary: Key Terms & Concepts
+
+[[ugr.glossary.aggregate]]
+Aggregate Analysis Engine::
+  An Analysis Engine made up of multiple subcomponents arranged in a flow.
+  The flow can be one of the two built-in flows, or a custom flow provided by the user.
+
+Analysis Engine::
+  A program that analyzes artifacts (e.g. documents) and infers information about them, and which implements the UIMA interface Specification.
+  It does not matter how the program is built, with what framework or whether or not it contains (sub)components.
+
+Annotation::
+  The association of metadata, such as a label, with a region of text (or other type of artifact).
+  For example, the label Person associated with a region of text John Doe constitutes an annotation. 
+  We say Person annotates the span of text from X to Y containing exactly John Doe. 
+  An annotation is represented as a special type in a UIMA type system.
+  It is the type used to record the labeling of regions of a Sofa.
+  Annotations are Feature Structures whose Type is Annotation or a subtype of that.
+
+Annotator::
+  A software component that implements the UIMA annotator interface. 
+  Annotators are implemented to produce and record annotations over regions of an artifact (e.g., text document, audio, and video).
+
+Application::
+  An application is the outer containing code that invokes the UIMA framework functions to instantiate an Analysis Engine or a Collection Processing Engine from a particular descriptor, and run it.
+
+Apache UIMA Java Framework::
+  A Java-based implementation of the UIMA architecture.
+  It provides a run-time environment in which developers can plug in and run their UIMA component implementations and with which they can build and deploy UIM applications.
+  The framework is the core part of the Apache UIMA SDK.
+
+Apache UIMA Software Development Kit (SDK)::
+  The SDK for which you are now reading the documentation.
+  The SDK includes the framework plus additional components such as tooling and examples.
+  Some of the tooling is Eclipse-based.
+
+CAS::
+  The UIMA Common Analysis Structure is the primary data structure which UIMA analysis components use to represent and share analysis results.  It contains:
++
+--
+* The artifact. This is the object being analyzed such as a text document or audio or video stream. The CAS projects one or more views of the artifact. Each view is referred to as a __Sofa__.
+* A type system description -- indicating the types, subtypes, and their features.
+* Analysis metadata -- __standoff__ annotations describing the artifact or a region of the artifact
+* An index repository to support efficient access to and iteration over the results of analysis.
+--
++
+UIMA's primary interface to this structure is provided by a class called the Common Analysis System. We use __CAS__ to refer to both the structure and system. Where the common analysis structure is used through a different interface, the particular implementation of the structure is indicated. For example, the JCas is a native Java object representation of the contents of the common analysis structure.
+A CAS can have multiple views; each view has a unique representation of the artifact, and has its own index repository, representing results of analysis for that representation of the artifact.
+
+CAS Consumer::
+  A component that receives each CAS in the collection, usually after it has been processed by an Analysis Engine.
+  It is responsible for taking the results from the CAS and using them for some purpose, perhaps storing selected results into a database, for instance.
+  The CAS Consumer may also perform collection-level analysis, saving these results in an application-specific, aggregate data structure.
+
+CAS Multiplier::
+  A component, implemented by a UIMA developer, that takes a CAS as input and produces 0 or more new CASes as output.
+  Common use cases for a CAS Multiplier include creating alternative versions of an input Sofa (see CAS Initializer), and breaking a large input CAS into smaller pieces, each of which is emitted as a separate output CAS.
+  There are other uses, however, such as aggregating input CASes into a single output CAS.
+
+CAS Processor::
+  A component of a Collection Processing Engine (CPE) that takes a CAS as input and returns a CAS as output.
+  There are two types of CAS Processors: Analysis Engines and CAS Consumers.
+
+CAS View::
+  A CAS Object which shares the base CAS and type system definition and index specifications, but has a unique index repository and a particular Sofa.
+  Views are named, and applications and annotators can dynamically create additional views whenever they are needed.
+  Annotations are made with respect to one view.
+  Feature structures can have references to feature structures indexed in other views, as needed.
+
+CDE::
+  The Component Descriptor Editor.
+  This is the Eclipse tool that lets you conveniently edit the UIMA descriptors; see xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor User's Guide].
+
+Collection Processing Engine (CPE)::
+  Performs Collection Processing through the combination of a Collection Reader, zero or more Analysis Engines, and zero or more CAS Consumers.
+  The Collection Processing Manager (CPM) manages the execution of the engine.
+
+Collection Processing Manager (CPM)::
+  The part of the framework that manages the execution of collection processing, routing CASs from the Collection Reader to zero or more Analysis Engines and then to the zero or more CAS Consumers.
+  The CPM provides feedback such as performance statistics and error reporting and supports other features such as parallelization and error handling.
+
+Collection Reader::
+  A component that reads documents from some source, for example a file system or database.
+  The collection reader initializes a CAS with this document.  
+  Each document is returned as a CAS that may then be processed by an Analysis Engine.
+  If the task of populating a CAS from the document is complex, you may use an arbitrarily complex chain of Analysis Engines and have the last one create and initialize a new Sofa.
+
+Feature Structure::
+  An instance of a Type.
+  Feature Structures are kept in the CAS, and may (optionally) be added to the defined indexes.
+  Feature Structures may contain references to other Feature Structures.
+  Feature Structures whose type is Annotation or a subtype of that, are referred to as annotations.
+
+Feature::
+  A data member or attribute of a type.
+  Each feature itself has an associated range type, the type of the value that it can hold.
+  In the database analogy where types are tables, features are columns.
+  In the world of structured data types, each feature is a field, or data member.
+
+Flow Controller::
+  A component which implements the interfaces needed to specify a custom flow within an xref:#ugr.glossary.aggregate[Aggregate Analysis Engine].
+
+Hybrid Analysis Engine::
+  An xref:#ugr.glossary.aggregate[Aggregate Analysis Engine] where more than one of its components are deployed in the same address space and one or more are deployed remotely (part tightly-coupled and part loosely-coupled).
+
+Index::
+  Data in the CAS can only be retrieved using Indexes.  
+  Indexes are analogous to the indexes that are specified on tables of a database.
+  Indexes belong to Index Repositories; there is one Repository for each view of the CAS.
+  Indexes are specified to retrieve instances of some CAS Type (including its subtypes), and can be optionally sorted in a user-definable way. 
+  For example, all types derived from the UIMA built-in type `uima.tcas.Annotation` contain `begin` and `end` features, which mark the begin and end offsets in the text where this annotation occurs.
+  There is a built-in index of `Annotation`s that specifies that annotations are retrieved sequentially by sorting first on the value of the `begin` feature (ascending) and then by the value of the `end` feature (descending).
+  In this case, iterating over the annotations, one first obtains annotations that come sequentially first in the text, while favoring longer annotations, in the case where two annotations start at the same offset.
+  Users can define their own indexes as well.
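++
+As a rough illustration of this ordering (a minimal sketch in plain Java, not the UIMA index API; the `Span` class and the `ANNOTATION_ORDER` comparator are hypothetical names introduced only for this example), the built-in annotation sort can be modeled as a comparator on `begin` ascending, then `end` descending:
++
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AnnotationOrderSketch {

    // Hypothetical stand-in for an annotation: only the label and the
    // begin/end offsets matter for this illustration.
    public static final class Span {
        public final String label;
        public final int begin;
        public final int end;

        public Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Sort by begin ascending, then by end descending, so that of two spans
    // starting at the same offset the longer one comes first.
    public static final Comparator<Span> ANNOTATION_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    };

    // Returns a sorted copy and leaves the input list untouched.
    public static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(ANNOTATION_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("Token", 5, 8),
            new Span("Token", 0, 4),
            new Span("Sentence", 0, 20),
            new Span("Person", 0, 8));
        for (Span s : sorted(spans)) {
            System.out.println(s.label + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```
++
+With the sample spans in `main`, the output lists `Sentence [0, 20)` first, then `Person [0, 8)`, `Token [0, 4)` and `Token [5, 8)`: earlier offsets come first, and longer spans win ties on `begin`.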
+
+JCas::
+  A Java object interface to the contents of the CAS.  
+  This interface uses additional generated Java classes, where each type in the CAS is represented as a Java class with the same name, each feature is represented with a getter and setter method, and each instance of a type is represented as a Java object of the corresponding Java class.
+
+Loosely-Coupled Analysis Engine::
+  An xref:#ugr.glossary.aggregate[Aggregate Analysis Engine] where no two of its subcomponents run in the same address space but where each is remote with respect to the others that make up the aggregate.
+  Loosely coupled engines are ideal for using remote services that are not locally available, or for quickly assembling and testing functionality in cross-language, cross-platform distributed environments.
+  They also better enable distributed scalable implementations where quick recoverability may have a greater impact on overall throughput than analysis speed.
+
+PEAR::
+  An archive file that packages up a UIMA component with its code, descriptor files and other resources required to install and run it in another environment.
+  You can generate PEAR files using utilities that come with the UIMA SDK.
+
+Primitive Analysis Engine::
+  An Analysis Engine that is composed of a single Annotator; one that has no subcomponent inside of it; contrast with xref:#ugr.glossary.aggregate[Aggregate Analysis Engine].
+
+[[ugr.glossary.structuredinformation]]
+Structured Information::
+  Items stored in structured resources such as search engine indices, databases or knowledge bases.
+  The canonical example of structured information is the database table.
+  Each element of information in the database is associated with a precisely defined schema where each table column heading indicates its precise semantics, defining exactly how the information should be interpreted by a computer program or end-user.
+
+Subject of Analysis (Sofa)::
+  A piece of data (e.g., text document, image, audio segment, or video segment), which is intended for analysis by UIMA analysis components.
+  It belongs to a CAS View which has the same name; there is a one-to-one correspondence between these.
+  There can be multiple Sofas contained within one CAS, each one representing a different view of the original artifact.
+  For example, an audio file could be the original artifact and also be one Sofa, while another Sofa could be the output of a voice-recognition component, that is, the corresponding text document.
+  Sofas may be analyzed independently or simultaneously; they all co-exist within the CAS.
+
+Tightly-Coupled Analysis Engine::
+  An xref:#ugr.glossary.aggregate[Aggregate Analysis Engine] where all of its components run in the same address space.
+
+Type::
+  A specification of an object in the CAS used to store the results of analysis.
+  Types are defined using inheritance, so some types may be defined purely for the sake of defining other types, and are in this sense abstract types.
+  Types usually contain Features, which are attributes, or properties of the type.
+  A type is roughly equivalent to a class in an object oriented programming language, or a table in a database.
+  Instances of types in the CAS may be indexed for retrieval.
+
+Type System::
+  A collection of related types.
+  All components that can access the CAS, including Applications, Analysis Engines, Collection Readers, Flow Controllers, or CAS Consumers declare the type system that they use. Type systems are shared across Analysis Engines, allowing the outputs of one Analysis Engine to be read as input by another.
+  A type system is roughly analogous to a set of related classes in object oriented programming, or a set of related tables in a database.
+  The type system / type / feature terminology comes from computational linguistics.
+
+Unstructured Information::
+  The canonical example of unstructured information is the natural language text document. 
+  The intended meaning of a document's content is only implicit and its precise interpretation by a computer program requires some degree of analysis to explicate the document's semantics.
+  Other examples include audio, video and images. Contrast with xref:#ugr.glossary.structuredinformation[Structured Information].
+        
+
+UIMA::
+  UIMA is an acronym that stands for Unstructured Information Management Architecture; it is a software architecture which specifies component interfaces, design patterns and development roles for creating, describing, discovering, composing and deploying multi-modal analysis capabilities.
+  The UIMA specification is being developed by a technical committee at OASIS.
+
+UIMA Java Framework::
+  See Apache UIMA Java Framework.
+
+UIMA SDK::
+  See Apache UIMA SDK.
+
+XCAS::
+  An XML representation of the CAS. The XCAS can be used for saving and restoring CASs to and from streams.
+  The UIMA SDK provides XCAS serialization and de-serialization methods for CASes.
+  This is an older serialization format and new UIMA code should use the standard XMI format instead.
+
+XML Metadata Interchange (XMI)::
+  An OMG standard for representing object graphs in XML, which UIMA uses to serialize analysis results from the CAS to an XML representation. The UIMA SDK provides XMI serialization and de-serialization methods for CASes.
+
+:sectnums:
\ No newline at end of file
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image002.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image002.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image002.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image004.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image004.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image004.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image006.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image006.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image006.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image008.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image008.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image008.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image010.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image010.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image010.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image012.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image012.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image012.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image014.png
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/conceptual_overview_files/image014.png
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/conceptual_overview_files/image014.png
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/eclipse_setup_files/image002.jpg
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/eclipse_setup_files/image002.jpg
Binary files differ
diff --git a/uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg b/uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/eclipse_setup_files/image004.jpg
similarity index 100%
rename from uima-docbook-overview-and-setup/src/docbook/images/overview-and-setup/eclipse_setup_files/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/oas/images/overview-and-setup/eclipse_setup_files/image004.jpg
Binary files differ
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/known_issues.adoc b/uimaj-documentation/src/docs/asciidoc/oas/known_issues.adoc
new file mode 100644
index 0000000..62e5ac5
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/known_issues.adoc
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.issues]]
+= Known Issues
+
+*JCasGen merge facility only supports Java levels 1.4 or earlier*::
+JCasGen has a facility to merge in user (hand-coded) changes with the code generated by JCasGen.
+This merging supports Java 1.4 constructs only.
+JCasGen generates Java 1.4-compliant code, so as long as any code you change here also uses only Java 1.4 constructs, the merge will work, even if you're using Java 5 or later.
+If you use syntactic structures particular to Java 5 or later, the merge operation will likely fail to merge properly.
+
+*Descriptor editor in Eclipse tooling does not work with libgcj 4.1.2*::
+The descriptor editor in the Eclipse tooling does not work with libgcj 4.1.2, and possibly other versions of libgcj.
+This is apparently due to a bug in the implementation of their XML library, which results in a class cast error.
+libgcj is used as the default JVM for Eclipse in Ubuntu (and possibly other Linux distributions).
+The workaround is to use a different JVM to start Eclipse.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/oas/project_overview.adoc b/uimaj-documentation/src/docs/asciidoc/oas/project_overview.adoc
new file mode 100644
index 0000000..0fa12a2
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/oas/project_overview.adoc
@@ -0,0 +1,415 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.project_overview]]
+= UIMA Overview
+// <titleabbrev>Overview</titleabbrev>
+
+The Unstructured Information Management Architecture (UIMA) is an architecture and software framework for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and integrating them with search technologies.
+The architecture is undergoing a standardization effort, referred to as the _UIMA specification_, by a technical committee within http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=uima[OASIS].
+
+The _Apache UIMA_ framework is an Apache licensed, open source implementation of the UIMA Architecture, and provides a run-time environment in which developers can plug in and run their UIMA component implementations and with which they can build and deploy UIM applications.
+The framework itself is not specific to any IDE or platform.
+
+It includes an all-Java implementation of the UIMA framework for the development, description, composition and deployment of UIMA components and applications.
+It also provides the developer with an Eclipse-based (http://www.eclipse.org/ ) development environment that includes a set of tools and utilities for using UIMA.
+It also includes  a C++ version of the framework, and enablements for Annotators built in Perl, Python, and TCL.
+
+This chapter is the intended starting point for readers that are new to the Apache UIMA Project.
+It includes this introduction and the following sections:
+
+* <<ugr.project_overview_doc_overview>> provides a list of the books and topics included in the Apache UIMA documentation with a brief summary of each. 
+* <<ugr.project_overview_doc_use>> describes a recommended path through the documentation to help get the reader up and running with UIMA 
+
+The main website for Apache UIMA is http://uima.apache.org.
+Here you  can find out many things, including: 
+
+* how to download (both the binary and source distributions)
+* how to participate in the development
+* mailing lists - including the user list used like a forum for questions and answers
+* a Wiki where you can find and contribute all kinds of information, including tips and best practices
+* a sandbox - a subproject for potential new additions to Apache UIMA or to subprojects of it. Things here are works in progress, and may (or may not) be included in releases.
+* links to conferences
+
+
+[[ugr.project_overview_doc_overview]]
+== Apache UIMA Project Documentation Overview
+
+The user documentation for UIMA is organized into several parts. 
+
+* Overviews - this documentation 
+* Eclipse Tooling Installation and Setup - also in this document 
+* Tutorials and Developer's Guides 
+* Tools Users' Guides 
+* References 
+* Version 3 users-guide
+
+The first 2 parts make up this book; the last 4 have individual  books.
+The books are provided both as (somewhat large) html files, viewable in browsers, and also as PDF files.
+The documentation is fully hyperlinked, with tables of contents.
+The PDF versions are set up to  print nicely - they have page numbers included on the cross references within a book. 
+
+If you view the PDF files inside a browser that supports embedded viewing of PDF, the hyperlinks between different PDF books may work (not all browsers have been tested...).
+
+The following set of tables gives a more detailed overview of the various parts of the documentation. 
+
+[[ugr.project_overview_overview]]
+=== Overviews
+
+[cols="1,1", frame="all"]
+|===
+
+|__xref:#ugr.project_overview_doc_overview[Overview of the Documentation]__
+| What you are currently reading.
+Lists the documents provided in the Apache  UIMA documentation set and provides a recommended path through the documentation for getting started using UIMA.
+It includes release notes and provides a brief high-level description of  the different software modules included in the Apache UIMA Project.
+
+|__xref:#ugr.ovv.conceptual[Conceptual Overview]__
+|Provides a broad conceptual overview of the UIMA component architecture; includes references to the other documents in the documentation set that provide more detail.
+
+|__xref:#ugr.faqs[UIMA FAQs]__
+|Frequently Asked Questions about general UIMA concepts. (Not a programming resource.)
+
+|__xref:#ugr.issues[Known Issues]__
+|Known issues and problems with the UIMA SDK.
+
+|__xref:#ugr.glossary[Glossary]__
+|UIMA terms and concepts and their basic definitions.
+|===
+
+[[ugr.project_overview_setup]]
+=== Eclipse Tooling Installation and Setup
+
+Provides step-by-step instructions for installing Apache UIMA in the Eclipse Interactive Development Environment.
+See <<ugr.ovv.eclipse_setup>>.
+
+[[ugr.project_overview_tutorials_dev_guides]]
+=== Tutorials and Developer's Guides
+
+[cols="1,1"]
+|===
+
+|__xref:tug.adoc#ugr.tug.aae[Annotators and Analysis Engines]__
+|Tutorial-style guide for building UIMA annotators and analysis engines. This chapter
+                introduces the developer to creating type systems and using UIMA's common data structure,
+                the CAS or Common Analysis Structure. It demonstrates how to use built in tools to specify and create
+                basic UIMA analysis components.
+
+|__xref:tug.adoc#ugr.tug.cpe[Building UIMA Collection Processing Engines]__
+|Tutorial-style guide for building UIMA collection processing engines. These
+               manage the analysis of collections of documents from source to sink.
+
+|__xref:tug.adoc#ugr.tug.application[Developing Complete Applications]__
+|Tutorial-style guide on using the UIMA APIs to create, run and manage UIMA components from
+                your application. Also describes APIs for saving and restoring the contents of a CAS using an XML
+                format called XMI(TM).
+
+|__xref:tug.adoc#ugr.tug.fc[Flow Controller]__
+|When multiple components are combined in an Aggregate, each CAS flows among the various
+                components. UIMA provides two built-in flows, and also allows custom flows to be
+                implemented.
+
+|__xref:tug.adoc#ugr.tug.aas[Developing Applications using Multiple Subjects of Analysis]__
+|A single CAS may be associated with multiple subjects of analysis (Sofas). These are useful
+                for representing and analyzing different formats or translations of the same document. For
+                multi-modal analysis, Sofas are good for different modal representations of the same stream
+                (e.g., audio and closed captions). This chapter provides the developer details on how to use
+                multiple Sofas in an application.
+
+|__xref:tug.adoc#ugr.tug.mvs[Multiple CAS Views of an Artifact]__
+|UIMA provides an extension to the basic model of the CAS which supports
+              analysis of multiple views of the same artifact, all contained within the CAS. This
+              chapter describes the concepts, terminology, and the API and XML extensions that
+              enable this.
+
+|__xref:tug.adoc#ugr.tug.cm[CAS Multiplier]__
+|A component may add additional CASes into the workflow. This may be useful to break up a large
+                artifact into smaller units, or to create a new CAS that collects information from multiple other
+                CASes.
+
+|__xref:tug.adoc#ugr.tug.xmi_emf[XMI and EMF Interoperability]__
+|The UIMA Type system and the contents of the CAS itself can be externalized using the XMI
+                standard for XML MetaData. Eclipse Modeling Framework (EMF) tooling can be used to develop
+                applications that use this information.
+|===
+
+[[ugr.project_overview_tool_guides]]
+=== Tools Users' Guides
+
+[cols="1,1"]
+|===
+
+|__xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor]__
+|Describes the features of the Component Descriptor Editor Tool. This tool provides a GUI for
+                specifying the details of UIMA component descriptors, including those for Analysis Engines
+                (primitive and aggregate), Collection Readers, CAS Consumers and Type Systems.
+
+|__xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator]__
+|Describes the User Interfaces and features of the CPE Configurator tool. This tool allows the
+                user to select and configure the components of a Collection Processing Engine and then to run the
+                engine.
+
+|__xref:tools.adoc#ugr.tools.pear.packager[PEAR Packager]__
+|Describes how to use the PEAR Packager utility. This utility enables developers to produce an
+                archive file for an analysis engine that includes all required resources for installing that
+                analysis engine in another UIMA environment.
+
+|__xref:tools.adoc#ugr.tools.pear.installer[PEAR Installer]__
+|Describes how to use the PEAR Installer utility. This utility installs and verifies an
+                analysis engine from an archive file (PEAR) with all its resources in the right place so it is ready to
+                run.
+
+|__xref:tools.adoc#ugr.tools.pear.merger[PEAR Merger]__
+|Describes how to use the PEAR Merger utility, which does a simple merge of multiple PEAR
+                packages into one.
+
+|__xref:tools.adoc#ugr.tools.doc_analyzer[Document Analyzer]__
+|Describes the features of a tool for applying a UIMA analysis engine to a set of documents and
+                viewing the results.
+
+|__xref:tools.adoc#ugr.tools.cvd[CAS Visual Debugger]__
+|Describes the features of a tool for viewing the detailed structure and contents of a CAS. Good
+                for debugging.
+
+|__xref:tools.adoc#ugr.tools.jcasgen[JCasGen]__
+|Describes how to run the JCasGen utility, which automatically builds Java classes that
+                correspond to a particular CAS Type System.
+
+|__xref:tools.adoc#ugr.tools.annotation_viewer[XML CAS Viewer]__
+|Describes how to run the supplied viewer to view externalized XML forms of CASes. This viewer
+                is used in the examples.
+|===
+
+[[ugr.project_overview_reference]]
+=== References
+
+[cols="1,1"]
+|===
+
+|__xref:ref.adoc#ugr.ref.javadocs[Introduction to the UIMA API Javadocs]__
+|Javadocs detailing the UIMA programming interfaces.
+
+|__xref:ref.adoc#ugr.ref.xml.component_descriptor[XML: Component Descriptor]__
+|Provides detailed XML format for all the UIMA component descriptors, except the CPE (see next).
+
+|__xref:ref.adoc#ugr.ref.xml.cpe_descriptor[XML: Collection Processing Engine Descriptor]__
+|Provides detailed XML format for the Collection Processing Engine descriptor.
+
+|__xref:ref.adoc#ugr.ref.cas[CAS]__
+|Provides detailed description of the principal CAS interface.
+
+|__xref:ref.adoc#ugr.ref.jcas[JCas]__
+|Provides details on the JCas, a native Java interface to the CAS.
+
+|__xref:ref.adoc#ugr.ref.pear[PEAR Reference]__
+|Provides detailed description of the deployable archive format for UIMA components.
+
+|__xref:ref.adoc#ugr.ref.xmi[XMI CAS Serialization Reference]__
+|Provides detailed description of the XMI format used to serialize the CAS.
+
+|===
+
+[[ugr.project_overview_v3]]
+=== Version 3 User's guide
+
+This book describes Version 3's features, capabilities, and differences from Version 2.
+
+[[ugr.project_overview_doc_use]]
+== How to use the Documentation
+
+. Explore this chapter to get an overview of the different documents that are included with Apache UIMA.
+. Read xref:#ugr.ovv.conceptual[xrefstyle=full] to get a broad view of the basic UIMA concepts and philosophy with reference to the other documents included in the documentation set which provide greater detail. 
+. For more general information on the UIMA architecture and how it has been used, refer to the IBM Systems Journal special issue on Unstructured Information Management, on-line at http://www.research.ibm.com/journal/sj43-3.html, or to the section of the UIMA project website on the Apache website where other publications are listed. 
+. Set up Apache UIMA in your Eclipse environment. To do this, follow the instructions in xref:#ugr.ovv.eclipse_setup[xrefstyle=full]. 
+. Develop sample UIMA annotators, run them and explore the results. Read the xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer's Guide] and follow it like a tutorial to learn how to develop your first UIMA annotator and set up and run your first UIMA analysis engines. 
+** As part of this you will use a few tools including 
+*** The UIMA Component Descriptor Editor, described in more detail in the xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor User's Guide] and 
+*** The Document Analyzer, described in more detail in xref:tools.adoc#ugr.tools.doc_analyzer[Document Analyzer User's Guide].
+** While following along in xref:tug.adoc#ugr.tug.aae[Tutorials and User's Guides], reference documents that may help are: 
+*** xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference] for understanding the analysis engine descriptors 
+*** xref:ref.adoc#ugr.ref.jcas[JCas Reference] for understanding the JCas. 
+. Learn how to create, run and manage a UIMA analysis engine as part of an application. Connect your analysis engine to the provided semantic search engine to learn how a complete analysis and search application may be built with Apache UIMA. The xref:tug.adoc#ugr.tug.application[Application Developer's Guide] will guide you through this process. 
+** As part of this you will use the document analyzer (described in more detail in xref:tools.adoc#ugr.tools.doc_analyzer[Document Analyzer User's Guide]) and semantic search GUI tools.
+. Pat yourself on the back. Congratulations! If you reached this step successfully, then you have an appreciation for the UIMA analysis engine architecture. You will have built a few sample annotators, deployed UIMA analysis engines to analyze a few documents, searched over the results using the built-in semantic search engine and viewed the results through a built-in viewer -- all as part of a simple but complete application. 
+. Develop and run a Collection Processing Engine (CPE) to analyze and gather the results of an entire collection of documents. xref:tug.adoc#ugr.tug.cpe[Collection Processing Engine Developer's Guide] will guide you through this process. 
+** As part of this you will use the CPE Configurator tool. For details see xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User's Guide]
+** You will also learn about CPE Descriptors. The detailed format for these may be found in the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+. Learn how to package up an analysis engine for easy installation into another UIMA environment. xref:tools.adoc#ugr.tools.pear.packager[PEAR Packager User's Guide] and xref:tools.adoc#ugr.tools.pear.installer[PEAR Installer User's Guide] will teach you how to create UIMA analysis engine archives so that you can easily share your components with a broader community. 
+
+[[ugr.project_overview_changes_from_previous]]
+== Changes from UIMA Version 2
+
+See the separate document Version 3 User's Guide.
+
+[[ugr.project_overview_migrating_from_v2_to_v3]]
+== Migrating existing UIMA pipelines from Version 2 to Version 3
+
+The format of JCas classes changed when going from version 2 to version 3.
+If you had JCas classes for user types, these need to be regenerated using the  version 3 JCasGen tooling or Maven plugin.
+Alternatively, these can be  migrated without regenerating; the migration preserves any customization  users may have added to the JCas classes.
+
+The Version 3 User's Guide has a chapter detailing the migration, including a description of the migration tool to aid in this process.
+
+[[ugr.project_overview_summary]]
+== Apache UIMA Summary
+
+[[ugr.ovv.summary.general]]
+=== General
+
+UIMA supports the development, discovery, composition and deployment of multi-modal analytics for the analysis of unstructured information and its integration with search technologies.
+
+Apache UIMA includes APIs and tools for creating analysis components.
+Examples of analysis components include tokenizers, summarizers, categorizers, parsers, named-entity detectors etc.
+Tutorial examples are provided with Apache UIMA; additional components are available from the community. 
+
+[[ugr.ovv.summary.programming_language_support]]
+=== Programming Language Support
+
+UIMA supports the development and integration of analysis algorithms developed in different programming languages. 
+
+The Apache UIMA project is both a Java framework and a matching C++ enablement layer, which allows annotators to be written in C++ and have access to a C++ version of the CAS.
+The C++ enablement layer also enables annotators to be written in Perl, Python, and TCL, and to interoperate with those written in other languages. 
+
+[[ugr.ovv.general.summary.multi_modal_support]]
+=== Multi-Modal Support
+
+The UIMA architecture supports the development, discovery, composition and deployment of multi-modal analytics, including text, audio and video. xref:tug.adoc#ugr.tug.aas[Annotations, Artifacts, and Sofas] discusses this in more detail.
+
+[[ugr.project_overview_summary_sdk_capabilities]]
+== Summary of Apache UIMA Capabilities
+
+[cols="1,1", frame="all"]
+|===
+
+|Module
+|Description
+
+|UIMA Framework Core
+|
+
+A framework integrating core functions for creating, deploying, running and managing UIMA components, including analysis engines and Collection Processing Engines in collocated and/or distributed configurations. 
+
+The framework includes an implementation of core components for transport layer adaptation, CAS management, workflow management based on declarative specifications, resource management, configuration management, logging, and other functions.
+
+|C++ and other programming language Interoperability
+|
+
+Includes C++ CAS and supports the creation of UIMA compliant C++ components that can be deployed in the UIMA run-time through a built-in JNI adapter.
+This includes high-speed binary serialization.
+
+Includes support for creating service-based UIMA engines.
+This is ideal for wrapping existing code written in different languages.
+
+|Framework Services and APIs
+|Note that interfaces of these components are available to the developer
+              but different implementations are possible in different implementations of the UIMA
+              framework.
+
+|CAS
+|These classes provide the developer with typed access to the Common Analysis Structure (CAS),
+              including type system schema, elements, subjects of analysis and indices. Multiple subjects of
+              analysis (Sofas) mechanism supports the independent or simultaneous analysis of multiple views of
+              the same artifacts (e.g. documents), supporting multi-lingual and multi-modal analysis.
+
+|JCas
+|An alternative interface to the CAS, providing Java-based UIMA Analysis components with
+              native Java object access to CAS types and their attributes or features, using the
+              JavaBeans conventions of getters and setters.
+
+|Collection Processing Management (CPM)
+|Core functions for running UIMA collection processing engines in collocated and/or
+              distributed configurations. The CPM provides scalability across parallel processing pipelines,
+              check-pointing, performance monitoring and recoverability.
+
+|Resource Manager
+|Provides UIMA components with run-time access to external resources handling capabilities
+              such as resource naming, sharing, and caching. 
+
+|Configuration Manager
+|Provides UIMA components with run-time access to their configuration parameter settings. 
+
+|Logger
+|Provides access to a common logging facility.
+
+| Tools and Utilities 
+
+|JCasGen
+|Utility for generating a Java object model for CAS types from a UIMA XML type system
+              definition.
+
+|Saving and Restoring CAS contents
+|APIs in the core framework support saving and restoring the contents of a CAS to streams 
+              in multiple formats, including XMI, binary, and compressed forms.  
+              These APIs are collected into the CasIOUtils class.
+
+|PEAR Packager for Eclipse
+|Tool for building a UIMA component archive to facilitate porting, registering, installing and
+              testing components.
+
+|PEAR Installer
+|Tool for installing and verifying a UIMA component archive in a UIMA installation.
+
+|PEAR Merger
+|Utility that combines multiple PEARs into one.
+
+|Component Descriptor Editor
+|Eclipse Plug-in for specifying and configuring component descriptors for UIMA analysis
+              engines as well as other UIMA component types including Collection Readers and CAS
+              Consumers.
+
+|CPE Configurator
+|Graphical tool for configuring Collection Processing Engines and applying them to
+              collections of documents.
+
+|Java Annotation Viewer
+|Viewer for exploring annotations and related CAS data.
+
+|CAS Visual Debugger
+|GUI Java application that provides developers with detailed visual view of the contents of a
+              CAS.
+
+|Document Analyzer
+|GUI Java application that applies analysis engines to sets of documents and shows results in a
+              viewer.
+
+|CAS Editor
+|Eclipse plug-in that lets you edit the contents of a CAS
+
+|UIMA Pipeline Eclipse Launcher
+|Eclipse plug-in that lets you configure Eclipse launchers for UIMA pipelines
+
+| Example Analysis Components 
+
+|Database Writer
+|CAS Consumer that writes the content of selected CAS types into a relational database, using
+              JDBC. This code is in cpe/PersonTitleDBWriterCasConsumer. 
+
+|Annotators
+| Set of simple annotators meant for pedagogical purposes. Includes: Date/time, Room-number,
+              Regular expression, Tokenizer, and Meeting-finder annotator. There are sample CAS Multipliers
+              as well. 
+
+|Flow Controllers
+| There is a sample flow-controller based on the whiteboard concept of sending the CAS to whatever
+              annotator hasn't yet processed it, when that annotator's inputs are available in the CAS. 
+
+|XMI Collection Reader, CAS Consumer
+|Reads and writes the CAS in the XMI format
+
+|File System Collection Reader
+| Simple Collection Reader for pulling documents from the file system and initializing CASes. 
+|===
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref.adoc b/uimaj-documentation/src/docs/asciidoc/ref.adoc
new file mode 100644
index 0000000..5262d17
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref.adoc
@@ -0,0 +1,44 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+= Apache UIMA™ - References
+:Author: Apache UIMA™ Development Community
+:toc-title: UIMA References
+
+include::ref/common_book_info.adoc[leveloffset=+1]
+
+include::ref/ref.javadocs.adoc[leveloffset=+1]
+
+include::ref/ref.xml.component_descriptor.adoc[leveloffset=+1]
+
+include::ref/ref.xml.cpe_descriptor.adoc[leveloffset=+1]
+
+include::ref/ref.cas.adoc[leveloffset=+1]
+
+include::ref/ref.jcas.adoc[leveloffset=+1]
+
+include::ref/ref.pear.adoc[leveloffset=+1]
+
+include::ref/ref.xmi.adoc[leveloffset=+1]
+
+include::ref/ref.compress.adoc[leveloffset=+1]
+
+include::ref/ref.json.adoc[leveloffset=+1]
+
+include::ref/ref.config.adoc[leveloffset=+1]
+
+include::ref/ref.resources.adoc[leveloffset=+1]
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/common_book_info.adoc b/uimaj-documentation/src/docs/asciidoc/ref/common_book_info.adoc
new file mode 100644
index 0000000..537f3e6
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/common_book_info.adoc
@@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Copyright © 2006, 2021 The Apache Software Foundation
+
+Copyright © 2004, 2006 International Business Machines Corporation
+
+[discrete]
+=== License and Disclaimer
+
+The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); 
+you may not use this documentation except in compliance with the License.  You may obtain a copy of
+the License at
+
+[.text-center]
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, this documentation and its contents are
+distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+either express or implied.  See the License for the specific language governing permissions and
+limitations under the License.
+
+[discrete]
+=== Trademarks
+
+All terms mentioned in the text that are known to be trademarks or service marks have been 
+appropriately capitalized.  Use of such terms in this book should not be regarded as affecting the
+validity of the trademark or service mark.
\ No newline at end of file
diff --git a/uima-docbook-references/src/image-source/diagrams.pptx b/uimaj-documentation/src/docs/asciidoc/ref/image-source/diagrams.pptx
similarity index 100%
rename from uima-docbook-references/src/image-source/diagrams.pptx
rename to uimaj-documentation/src/docs/asciidoc/ref/image-source/diagrams.pptx
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.cas/image001.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.cas/image001.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.cas/image001.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.cas/image001.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.javadocs/image002.jpg
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.javadocs/image002.jpg
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/FScollections.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/FScollections.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/FScollections.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/FScollections.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/big_picture2.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/big_picture2.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/big_picture2.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/big_picture2.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/image_source.odg b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/image_source.odg
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/image_source.odg
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/image_source.odg
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/multi.view.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/multi.view.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/multi.view.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/multi.view.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/two.fs.collections.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/two.fs.collections.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/two.fs.collections.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/two.fs.collections.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.json/types_and_refs.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/types_and_refs.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.json/types_and_refs.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.json/types_and_refs.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.pear/image002.jpg
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.pear/image002.jpg
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.resources/res_resource_kinds.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.resources/res_resource_kinds.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.resources/res_resource_kinds.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.resources/res_resource_kinds.png
Binary files differ
diff --git a/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png b/uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.xml.cpe_descriptor/image002.png
similarity index 100%
rename from uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png
rename to uimaj-documentation/src/docs/asciidoc/ref/images/references/ref.xml.cpe_descriptor/image002.png
Binary files differ
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.cas.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.cas.adoc
new file mode 100644
index 0000000..28daa6d
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.cas.adoc
@@ -0,0 +1,860 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.cas]]
+= CAS Reference
+
+The CAS (Common Analysis Structure) is the part of the Unstructured Information Management Architecture (UIMA) that is concerned with creating and handling the data that annotators manipulate.
+
+Java users typically use the JCas (Java interface to the CAS) when manipulating objects in the CAS.
+This chapter describes an alternative interface to the CAS which allows discovery and specification of types and features at run time.
+It is recommended for use when the using code cannot know ahead of time the type system it will be dealing with.
+
+Use of the CAS as described here is also recommended (or necessary) when components add to the definitions of types of other components.
+This UIMA feature allows users to add features to a type that was already defined elsewhere.
+When this feature is used in conjunction with the JCas, it can lead to problems with class loading.
+This is because different JCas representations of a single type are generated by the different components, and only one of them is loaded (unless you are using PEAR descriptors). Note: we do not recommend that you add features to pre-existing types.
+A type should be defined in one place only, and then there is no problem with using the JCas.
+However, if you do use this feature, do not use the JCas.
+Similarly, if you distribute your components for inclusion in somebody else's UIMA application, and you're not sure that they won't add features to your types, do not use the JCas for the same reasons. 
+
+[[ugr.ref.cas.javadocs]]
+== Javadocs
+
+The subdirectory `docs/api` contains the documentation details of all the classes, methods, and constants for the APIs discussed here.
+Please refer to this for details on the methods, classes and constants, specifically in the packages ``org.apache.uima.cas.*``.
+
+[[ugr.ref.cas.overview]]
+== CAS Overview
+
+There are threefootnote:[A fourth part, the Subject of Analysis,
+      is discussed in  .] main parts to the CAS: the type system, data creation and manipulation, and indexing.
+We will start with a brief description of these components.
+
+[[ugr.ref.cas.type_system]]
+=== The Type System
+
+The type system specifies what kind of data you will be able to manipulate in your annotators.
+The type system defines two kinds of entities, types and features.
+Types are arranged in a single inheritance tree and define the kinds of entities (objects) you can manipulate in the CAS.
+Features optionally specify slots or fields within a type.
+The correspondence to Java is to equate a CAS Type to a Java Class, and the CAS Features to fields within the type.
+A critical difference is that CAS types have no methods; they are just data structures with named slots (features). These features can have as values primitive things like integers, floating point numbers, and strings, and they also can hold references to other instances of objects in the CAS.
+We call instances of the data structures declared by the type system "`feature
+        structures`" (not to be confused with "`features`"). Feature structures are similar to the many variants of record structures found in computer science.footnote:[The name feature structure comes from
+        terminology used in linguistics.]
+
+Each CAS Type defines a supertype; it is a subtype of that supertype.
+This means that any features that the supertype defines are features of the subtype; in other words, it inherits its supertype's features.
+Only single inheritance is supported; a type's feature set is the union of all of the features in its supertype hierarchy.
+There is a built-in type called uima.cas.TOP; this is the top, root node of the inheritance tree.
+It defines no features.
+
+The values that can be stored in features are either built-in primitive values or references to other feature structures.
+The primitive values are ``boolean``, ``byte``, `short` (16 bit integers), `integer` (32 bit), `long` (64 bit), `float` (32 bit), `double` (64 bit floats) and strings; the official names of these are ``uima.cas.Boolean``, ``uima.cas.Byte``, ``uima.cas.Short``, ``uima.cas.Integer``, ``uima.cas.Long``, ``uima.cas.Float``, ``uima.cas.Double`` and ``uima.cas.String``. The strings are Java strings, and characters are Java characters.
+Technically, this means that characters are UTF-16 code units, which is not quite the same as a Unicode character.
+This distinction should make no difference for almost all applications.
+The CAS also defines other basic built-in types for arrays of these, plus arrays of references to other objects, called ``uima.cas.IntegerArray``, ``uima.cas.FloatArray``, ``uima.cas.StringArray``, ``uima.cas.FSArray``, etc.
+
+The CAS also defines a built-in type called `uima.tcas.Annotation` which inherits from `uima.cas.AnnotationBase` which in turn inherits from ``uima.cas.TOP``.
+There are two features defined by this type, called `begin` and ``end``, both of which are integer valued.
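+
+As a rough illustration of how a type's feature set accumulates along its supertype chain, the following sketch models types with plain Java objects.
+`ToyType`, `FeatureInheritanceDemo` and the `example.Token` type with its `partOfSpeech` feature are hypothetical names used only for this example; they are not part of the UIMA API.
+The `sofa` feature on `uima.cas.AnnotationBase` is not described in the text above and is included here as an assumption about the built-in types.
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a CAS type: a name, an optional supertype, and the features it declares itself.
// ToyType is a hypothetical class for illustration; it is not part of the UIMA API.
final class ToyType {
    final String name;
    final ToyType supertype;          // null only for the root type
    final List<String> ownFeatures;

    ToyType(String name, ToyType supertype, String... ownFeatures) {
        this.name = name;
        this.supertype = supertype;
        this.ownFeatures = Arrays.asList(ownFeatures);
    }

    // All features of this type: inherited ones first, then the ones declared here.
    List<String> allFeatures() {
        List<String> result = new ArrayList<>();
        if (supertype != null) {
            result.addAll(supertype.allFeatures());
        }
        result.addAll(ownFeatures);
        return result;
    }
}

final class FeatureInheritanceDemo {
    // Builds the chain TOP -> AnnotationBase -> Annotation plus a hypothetical Token subtype.
    static ToyType buildToken() {
        ToyType top = new ToyType("uima.cas.TOP", null);
        // Assumption: AnnotationBase contributes a "sofa" feature.
        ToyType annotationBase = new ToyType("uima.cas.AnnotationBase", top, "sofa");
        ToyType annotation = new ToyType("uima.tcas.Annotation", annotationBase, "begin", "end");
        return new ToyType("example.Token", annotation, "partOfSpeech");
    }

    public static void main(String[] args) {
        // Prints the union of features collected along the supertype chain.
        System.out.println(buildToken().allFeatures());
    }
}
```

+
+Because only single inheritance is modeled, the feature set can be collected with one walk up the chain; no conflict resolution between parents is needed.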
+
+[[ugr.ref.cas.creating_accessing_manipulating_data]]
+=== Creating, accessing and manipulating data
+// <titleabbrev>Creating/Accessing/Changing data</titleabbrev>
+
+Creating and accessing data in the CAS requires knowledge about the types and features  defined in the type system.
+The idea is similar to other data access APIs, such as the XML DOM or SAX APIs, or database access APIs such as JDBC.
+Contrary to those APIs, however, the CAS does not use the names of type system entities directly in the APIs.
+Rather, you use the type system to access type and feature entities by name, then use these entities in the data manipulation APIs.
+This can be compared to the Java reflection APIs: the type system is comparable to the Java class loader, and the type and feature objects to the `java.lang.Class` and `java.lang.reflect.Field` classes. 
+
+Why does it have to be this complicated?  You wouldn't normally use reflection to create a Java object, either.
+As mentioned earlier, the JCas provides the more straightforward method to manipulate CAS data.
+The CAS access methods described here need only be used for generic types of applications that need to be able to handle any kind of data (e.g., generic tooling) or when the JCas may not be used for other reasons.
+The generic kinds of applications are exactly the ones where you would use the reflection API in Java as well. 
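+
+The reflection analogy can be made concrete with JDK classes alone.
+The sketch below first resolves a field handle by name and then uses that handle to read data, which is the same two-step pattern the CAS APIs follow with type and feature objects.
+`Span` and `ReflectionAnalogyDemo` are hypothetical names for this example, and no UIMA classes are involved.
+

```java
import java.lang.reflect.Field;

// A plain Java class standing in for a type with two integer "features".
// Span is a hypothetical example class, not a UIMA type.
final class Span {
    public int begin;
    public int end;
}

final class ReflectionAnalogyDemo {
    // Step 1: resolve the field handle by name. Step 2: read the value through the handle.
    static int readIntField(Object target, String fieldName) {
        try {
            Class<?> type = target.getClass();       // comparable to looking up a type entity
            Field field = type.getField(fieldName);  // comparable to looking up a feature entity
            return field.getInt(target);             // data access goes through the handle
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No accessible int field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Span span = new Span();
        span.begin = 3;
        span.end = 9;
        System.out.println(readIntField(span, "begin") + ".." + readIntField(span, "end"));
    }
}
```

+
+As with reflection, the name lookup is the part that can fail at run time, so generic code typically resolves its handles once and reuses them.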
+
+[[ugr.ref.cas.creating_using_indexes]]
+=== Creating and using indexes
+
+Each view of a CAS provides a set of indexes for that view.
+Instances of Types (that is, Feature Structures) can be added to a view's indexes.
+These indexes provide a way for annotators to locate existing data in the CAS, using a specific index (or the method `getAllIndexedFS` of the object ``FSIndexRepository``) to retrieve the Feature Structures that were previously created.
+Newly created Feature Structures are not automatically added to the indexes; you choose which Feature Structures to add and use one of several APIs to add them. 
+
+Indexes are named and are associated with a CAS Type; they are used to index instances of that CAS type (including instances of that type's subtypes). If you are using xref:tug.adoc#ugr.tug.mvs[multiple views], each view contains a separate instantiation of all of the indexes.
+To access an index, you minimally need to know its name.
+A CAS view provides an index repository which you can query for indexes for that view.
+Once you have a handle to an index, you can get information about the feature structures in the index, the size of the index, as well as an iterator over the feature structures.
+
+There are three kinds of indexes: 
+
+* bag - no ordering
+* set - uses a user-specified set of keys to define equality; holds one instance of the set of equal items.
+* sorted - uses a user-specified set of keys to define ordering.
+
+For set indexes, the comparator keys are augmented with an implicit additional field - the type of the feature structure.
+This means that an index over Annotations, having subtype Token, and a key of the "begin" value, will behave as follows: 
+
+* If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes, only one of them will be in the index.
+* If you make 1 Token and 1 Annotation, both having a begin value of 17, and add both of them to the indexes, both of them will be in the index (because the types are different). 
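+
+The two cases above can be sketched with a toy set index built on `java.util.TreeSet`.
+This is only a model of the described behavior, assuming that equality is decided by the `begin` key plus the type; `ToyFs`, `ToySetIndex` and `SetIndexDemo` are hypothetical names and do not reflect how UIMA implements its indexes.
+

```java
import java.util.Comparator;
import java.util.TreeSet;

// Toy stand-in for a feature structure: a type name plus a "begin" value.
// ToyFs and ToySetIndex are hypothetical classes, not part of the UIMA API.
final class ToyFs {
    final String typeName;
    final int begin;

    ToyFs(String typeName, int begin) {
        this.typeName = typeName;
        this.begin = begin;
    }
}

final class ToySetIndex {
    // Equality is decided by the declared key (begin) plus the implicit key (the type).
    private final TreeSet<ToyFs> items = new TreeSet<>(
            Comparator.<ToyFs>comparingInt(fs -> fs.begin)
                      .thenComparing(fs -> fs.typeName));

    // Returns false when an "equal" item is already present; the index then stays unchanged.
    boolean add(ToyFs fs) {
        return items.add(fs);
    }

    int size() {
        return items.size();
    }
}

final class SetIndexDemo {
    public static void main(String[] args) {
        // Two Tokens with the same begin value: only one is kept.
        ToySetIndex sameType = new ToySetIndex();
        sameType.add(new ToyFs("Token", 17));
        sameType.add(new ToyFs("Token", 17));
        System.out.println(sameType.size());

        // One Token and one Annotation with the same begin value: both are kept.
        ToySetIndex mixedTypes = new ToySetIndex();
        mixedTypes.add(new ToyFs("Token", 17));
        mixedTypes.add(new ToyFs("Annotation", 17));
        System.out.println(mixedTypes.size());
    }
}
```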
+
+Indexes are defined in the XML descriptor metadata for the application.
+Each CAS View has its own, separate instantiation of indexes based on these definitions,  kept in the view's index repository.
+When you obtain an index, it is always from a particular CAS view's index repository.
+When you index an item, it is always added to all indexes where it belongs, within just the view's repository.
+You can specify different repositories (associated with different CAS views) to use; a given Feature Structure instance  may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).
+
+Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.
+
+You can also get iterators from indexes;  iterators allow you to enumerate the feature structures in an index.
+There are two kinds of iterators supported: the regular Java iterator API, and a specific FS iterator API where the usual Java iterator APIs (``hasNext()`` and ``next()``) are augmented by ``isValid()``, ``moveToNext()`` / ``moveToPrevious()`` (which do not return an element) and ``get()``.
+Finally, there is a `moveTo(FeatureStructure)` API, which, for sorted indexes, moves the iteration point to the left-most (among otherwise "equal") item in the index which compares "equal" to the given FeatureStructure, using the index's defined comparator. 
+
+Which API style you use is up to you, but we do not recommend mixing the styles as the results are sometimes unexpected.
+If you just want to iterate over an index from start to finish, either style is equally appropriate.
+If you also use `moveTo(FeatureStructure fs)` and ``moveToPrevious()``, it is better to use the special FS iterator style. 
+
+[NOTE]
+====
+The reason to not mix these styles is that you might be thinking that next() followed by moveToPrevious() would always work.
+This is not true, because next() returns the "current" element, and advances to the next position, which might be beyond the last element.
+At that point, the iterator becomes "invalid", and `moveToNext()` and `moveToPrevious()` no longer move the iterator.
+You can, however, call `moveToFirst()`, `moveToLast()`, or `moveTo(FS)` on the iterator to reset it.
+====
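+
+The pitfall described in the note can be illustrated with a toy cursor over a list. This is a simplified model written for this explanation, assuming only the behavior stated above; `CursorSketch` and its fields are hypothetical and do not reflect how the UIMA iterators are implemented.
+
```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Toy cursor that mimics the behavior described in the note; not UIMA code.
class CursorSketch {
  private final List<String> items;
  private int pos = 0; // the cursor is valid while 0 <= pos < items.size()

  CursorSketch(List<String> items) {
    this.items = items;
  }

  boolean isValid() {
    return pos >= 0 && pos < items.size();
  }

  String get() {
    if (!isValid()) {
      throw new NoSuchElementException();
    }
    return items.get(pos);
  }

  // FS-iterator style: move without returning an element; no effect once invalid.
  void moveToNext() {
    if (isValid()) {
      pos++;
    }
  }

  void moveToPrevious() {
    if (isValid()) {
      pos--;
    }
  }

  // Resets the cursor to the last element, making it valid again if the list is non-empty.
  void moveToLast() {
    pos = items.size() - 1;
  }

  // Java-iterator style: return the current element, then advance.
  String next() {
    String current = get();
    pos++;
    return current;
  }

  public static void main(String[] args) {
    CursorSketch c = new CursorSketch(Arrays.asList("a", "b"));
    c.next();
    c.next();                        // returned "b" and moved past the last element
    c.moveToPrevious();              // no effect: the cursor is already invalid
    System.out.println(c.isValid()); // prints false
    c.moveToLast();                  // reset
    System.out.println(c.get());     // prints b
  }
}
```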
+
+Indexes are created by specifying them in the annotator's or aggregate's resource descriptor.
+An index specification includes its name, the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.
+The keys are used for set and sorted indexes, and specify what values are used for  ordering, or (for sets) what values are used to determine set equality.
+When a CAS pipeline is created, all index specifications are combined; duplicate definitions (having the same name) are allowed only if their definitions are the same. 
+
+Feature structure instances need to be explicitly added to the index repository by a method call.
+Feature structures that are not indexed will not be visible to other annotators (unless they can be reached through a reference from a feature of another feature structure which is indexed, or through a chain of such references).
+
+The framework defines an unnamed bag index which indexes all types.
+The only access provided for this index is the getAllIndexedFS(type) method on the index repository, which returns an iterator over all indexed instances of the specified type (including its subtypes) for that CAS View. 
+
+The framework defines one standard, built-in annotation index, called AnnotationIndex, which indexes the `uima.tcas.Annotation` type: all feature structures of type `uima.tcas.Annotation` or its subtypes are automatically indexed with this built-in index.
+
+The ordering relation used by this index is to first order by the value of the "`begin`" feature (in ascending order) and then by the value of the "`end`" feature (in descending order), and then, finally, by the Type Priority.
+This ordering ensures that longer annotations starting at the same spot come before shorter ones.
+For Subjects of Analysis other than Text, this may not be an appropriate index.
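+
+The ordering relation can be written as a plain Java comparator. The sketch below is a model only: `Span`, `PRIORITY`, and `sorted` are hypothetical names, the priority list is invented for the example, and it assumes that a type listed earlier in the priority list sorts first.
+
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the annotation ordering; not UIMA code.
class AnnotationOrderSketch {
  // Hypothetical stand-in for an annotation.
  static final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
      this.type = type;
      this.begin = begin;
      this.end = end;
    }

    @Override
    public String toString() {
      return type + "[" + begin + "," + end + "]";
    }
  }

  // Hypothetical type priorities: earlier entries sort first.
  static final List<String> PRIORITY = Arrays.asList("Sentence", "Token");

  static final Comparator<Span> ORDER = (a, b) -> {
    if (a.begin != b.begin) {
      return Integer.compare(a.begin, b.begin); // begin ascending
    }
    if (a.end != b.end) {
      return Integer.compare(b.end, a.end);     // end descending: longer spans first
    }
    return Integer.compare(PRIORITY.indexOf(a.type), PRIORITY.indexOf(b.type));
  };

  // Returns a sorted copy of the given spans.
  static List<Span> sorted(Span... spans) {
    List<Span> result = new ArrayList<>(Arrays.asList(spans));
    result.sort(ORDER);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(sorted(
        new Span("Token", 0, 3),
        new Span("Sentence", 0, 13),
        new Span("Token", 4, 13),
        new Span("Sentence", 0, 3)));
    // prints [Sentence[0,13], Sentence[0,3], Token[0,3], Token[4,13]]
  }
}
```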
+
+In addition to normal iterators, there is a `select` API, documented in the Version 3 Users guide, which provides additional capabilities for accessing Feature Structures via the indexes.
+
+[[ugr.ref.cas.builtin_types]]
+== Built-in CAS Types
+
+The CAS has two kinds of built-in types: primitive and non-primitive.
+The primitive types are: 
+
+* uima.cas.Boolean
+* uima.cas.Byte
+* uima.cas.Short
+* uima.cas.Integer
+* uima.cas.Long
+* uima.cas.Float
+* uima.cas.Double
+* uima.cas.String
+
+The `Byte`, `Short`, `Integer`, and `Long` types are all signed integer types, of length 8, 16, 32, and 64 bits, respectively.
+The `Float` type is 32 bit floating point, and the `Double` type is 64 bit floating point.
+The `String` type can be subtyped to create sets of allowed values; see xref:ref.adoc#ugr.ref.xml.component_descriptor.type_system.string_subtypes[String Subtypes].
+These types can be used to specify the range of a String-valued feature.
+They act like Strings, but have additional checking to ensure that any value set into them is one of the allowed values, or null (which is the value if it is not set). Note that the other primitive types cannot be used as a supertype for another type definition; only `uima.cas.String` can be sub-typed.
+
+The non-primitive types exist in a type hierarchy; the top of the hierarchy is the type ``uima.cas.TOP``.
+All other non-primitive types inherit from some supertype.
+
+There are 9 built-in array types.
+These arrays have a size specified when they are created; the size is fixed at creation time.
+They are named: 
+
+* uima.cas.BooleanArray
+* uima.cas.ByteArray
+* uima.cas.ShortArray
+* uima.cas.IntegerArray
+* uima.cas.LongArray
+* uima.cas.FloatArray
+* uima.cas.DoubleArray
+* uima.cas.StringArray
+* uima.cas.FSArray
+
+The `uima.cas.FSArray` type is an array whose elements are arbitrary other feature structures (instances of non-primitive types).
+
+The JCas cover classes for the array types support the Iterable API, so you may write extended for loops over instances of these.
+For example: 
+[source]
+----
+FSArray<MyType> myArray = ...
+for (MyType fs : myArray) {
+  some_method(fs);
+}
+----
+
+There are 3 built-in types associated with the artifact being analyzed: 
+
+* uima.cas.AnnotationBase
+* uima.tcas.Annotation
+* uima.tcas.DocumentAnnotation
+
+The `AnnotationBase` type defines one system-used feature which specifies for an annotation the subject of analysis (Sofa) to which it refers.
+The Annotation type extends from this and defines 2 features, taking `uima.cas.Integer` values, called `begin` and ``end``.
+The `begin` feature typically identifies the start of a span of text the annotation covers; the `end` feature identifies the end.
+The values refer to character offsets; the starting index is 0.
+An annotation of the word "`CAS`" in a text "`CAS Reference`" would have a start index of 0, and an end index of 3; the difference between end and start is the length of the span the annotation refers to.
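+
+The offset arithmetic matches Java's `String.substring(begin, end)`, where `begin` is inclusive and `end` is exclusive. The following small sketch uses plain strings only; `OffsetSketch` and `coveredText` are hypothetical names for this illustration, not UIMA API.
+
```java
// Illustrates begin/end as zero-based character offsets; not UIMA code.
class OffsetSketch {
  // Returns the text covered by the span [begin, end) of the document.
  static String coveredText(String document, int begin, int end) {
    return document.substring(begin, end);
  }

  public static void main(String[] args) {
    String document = "CAS Reference";
    int begin = 0;
    int end = 3;
    System.out.println(coveredText(document, begin, end)); // prints CAS
    System.out.println(end - begin);                       // prints 3, the length of the span
  }
}
```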
+
+Annotations are always with respect to some Sofa (Subject of Analysis; see xref:tug.adoc#ugr.tug.aas[Annotations, Artifacts, and Sofas]).
+
+[NOTE]
+====
+Artifacts which are not text strings may have a different interpretation of the meaning of begin and end, or may define their own kind of annotation, extending from ``AnnotationBase``. 
+====
+
+The `DocumentAnnotation` type has one special instance.
+It is a subtype of the Annotation type, and the built-in definition defines one feature, ``language``, which is a string indicating the language of the document in the CAS.
+The value of this language feature is used by the system to control flow among annotators when the "`CapabilityLanguageFlow`" mode is used, allowing the flow to skip over annotators that don't process particular languages.
+Users may extend this type by adding additional features to it, using the XML Descriptor element for defining a type.
+
+[NOTE]
+====
+We do _not_ recommend extending the `DocumentAnnotation` type.
+If you do, you must _not_ use the JCas, for the reasons stated earlier. 
+====
+
+Each CAS view has a different associated instance of the `DocumentAnnotation` type.
+On the CAS, use `getDocumentAnnotation()` to access the ``DocumentAnnotation``.
+
+There are also built-in types supporting linked lists, similar to the ones available in Java and other programming languages.
+Their use is constrained by the usual properties of linked lists: not very space efficient, no (efficient) random access, but an easy choice if you don't know how long your list will be ahead of time.
+The implementation is type specific; there are different list building objects for each of the primitive types, plus one for general feature structures.
+Here are the type names: 
+
+* uima.cas.FloatList
+* uima.cas.IntegerList
+* uima.cas.StringList
+* uima.cas.FSList
++
+* uima.cas.EmptyFloatList
+* uima.cas.EmptyIntegerList
+* uima.cas.EmptyStringList
+* uima.cas.EmptyFSList
++
+* uima.cas.NonEmptyFloatList
+* uima.cas.NonEmptyIntegerList
+* uima.cas.NonEmptyStringList
+* uima.cas.NonEmptyFSList
+
+For each of `Float`, `Integer`, `String`, and `FeatureStructure`, there is a base list type, for instance, ``uima.cas.FloatList``.
+For each of these, there are two subtypes, corresponding to a non-empty element, and a marker that serves to indicate the end of the list, or an empty list.
+The non-empty types define two features: `head` and `tail`.
+The head feature holds the particular value for that part of the list.
+The tail refers to the next list object (either a non-empty one or the empty version to indicate the end of the list).
+
+For JCas users, the new operator for the NonEmptyXyzList classes includes a 3 argument version where you may specify the head and tail values as part of the constructor.
+The JCas  cover classes for these implement a `push(item)` method which creates a new non-empty node, sets the `head` value to ``item``, and the tail to the node it is called on, and returns the new node.
+These classes also implement Iterable, so you can use the enhanced Java `for` operator.
+The iterator stops when it gets to the end of the list, determined by either the tail being null or the element being one of the EmptyXyzList elements.
+Here's a StringList example: 
+[source]
+----
+StringList sl = jcas.emptyStringList();
+sl = sl.push("2");
+sl = sl.push("1");
+
+for (String s : sl) {
+  someMethod(s);  // some sample use
+}
+----
+
+There are no other built-in types.
+Users are free to define their own type systems, building upon these types.
+
+[[ugr.ref.cas.accessing_the_type_system]]
+== Accessing the type system
+
+During annotator processing, or outside an annotator, access the type system by calling ``CAS.getTypeSystem()``. 
+
+However, CAS annotators implement an additional method, ``typeSystemInit()``, which is called by the UIMA framework before the annotator's process method.
+This method, implemented by the annotator writer, is passed a reference to the CAS's type system metadata.
+The method typically uses the type system APIs to obtain type and feature objects corresponding to all the types and features the annotator will be using in its process method.
+This initialization step should not be done during an annotator's initialize method, since the type system can change after the initialize method is called. It should not be done during the process method either, since this is presumably work that is identical for each incoming document, and so should be performed only when the type system changes (which will be a rare event). The UIMA framework guarantees it will call the `typeSystemInit()` method of an annotator whenever the type system changes, before calling the annotator's `process()` method.
+
+The initialization done by `typeSystemInit()` is done by the UIMA framework when you use the JCas APIs; you only need to provide a `typeSystemInit()` method, as described here, when you are not using the JCas approach.
+
+[[ugr.ref.cas.type_system.printer_example]]
+=== TypeSystemPrinter example
+
+Here is a code fragment that, given a CAS Type System, will print a list of all types.
+
+[source]
+----
+// Get all type names from the type system
+// and print them to stdout.
+private void listTypes1(TypeSystem ts) {
+  System.out.println("Types in the type system:");
+  for (Type t : ts) {
+    // print its name.
+    System.out.println(t.getName());
+  }
+}
+----
+
+This method is passed the type system as a parameter.
+The type system is iterable, so we can loop over all the types directly.
+If you run this against a CAS created with no additional user-defined types, you should see something like this on the console:
+
+[source]
+----
+Types in the type system: 
+uima.cas.Boolean 
+uima.cas.Byte
+uima.cas.Short 
+uima.cas.Integer 
+uima.cas.Long 
+uima.cas.ArrayBase 
+...
+----
+
+If the type system had user-defined types these would show up too.
+Note that some of these types are not directly creatable; they are types used by the framework in the type hierarchy (e.g., `uima.cas.ArrayBase`).
+
+CAS type names include a name-space prefix.
+The components of a type name are separated by the dot (.). A type name component must start with a Unicode letter, followed by an arbitrary sequence of letters, digits and the underscore (_). By convention, the last component of a type name starts with an uppercase letter; the rest start with a lowercase letter.
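+
+The naming rule can be expressed as a small check. The sketch below encodes only the rule as stated above; `TypeNameSketch` and `isValidTypeName` are hypothetical names, and the framework's own validation may differ in details.
+
```java
// Checks a name against the stated rule for CAS type names; not UIMA code.
class TypeNameSketch {
  // A component starts with a letter, followed by letters, digits, or underscores.
  static boolean isValidComponent(String component) {
    if (component.isEmpty()) {
      return false;
    }
    int[] codePoints = component.codePoints().toArray();
    if (!Character.isLetter(codePoints[0])) {
      return false;
    }
    for (int i = 1; i < codePoints.length; i++) {
      if (!Character.isLetterOrDigit(codePoints[i]) && codePoints[i] != '_') {
        return false;
      }
    }
    return true;
  }

  // A type name is a dot-separated sequence of valid components.
  static boolean isValidTypeName(String name) {
    // The limit of -1 keeps trailing empty components, so "a." is rejected.
    for (String component : name.split("\\.", -1)) {
      if (!isValidComponent(component)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isValidTypeName("com.xyz.proj.Entity")); // prints true
    System.out.println(isValidTypeName("com..Entity"));         // prints false
    System.out.println(isValidTypeName("com.1xyz.Entity"));     // prints false
  }
}
```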
+
+Listing the type names is mildly useful, but it would be even better if we could see the inheritance relation between the types.
+The following code prints the inheritance tree in indented format.
+
+[source]
+----
+private static final int INDENT = 2;
+private void listTypes2(TypeSystem ts) {
+  // Get the root of the inheritance tree.
+  Type top = ts.getTopType();
+  // Recursively print the tree.
+  printInheritanceTree(ts, top, 0);
+}
+
+private void printInheritanceTree(TypeSystem ts, Type type, int level) {
+  indent(level); // Print indentation.
+  System.out.println(type.getName());
+  // Get the immediate subtypes (requires java.util.List).
+  List<Type> subTypes = ts.getDirectlySubsumedTypes(type);
+  ++level; // Increase the indentation level.
+  for (Type subType : subTypes) {
+    // Print the subtypes.
+    printInheritanceTree(ts, subType, level);
+  }
+}
+  
+// A simple, inefficient indenter
+private void indent(int level) {
+  int spaces = level * INDENT;
+  for (int i = 0; i < spaces; i++) {
+    System.out.print(" ");
+  }
+}
+----
+
+This example shows that you can traverse the type hierarchy by starting at the top with TypeSystem.getTopType and by retrieving subtypes with ``TypeSystem.getDirectlySubsumedTypes()``.
+
+The type system also has APIs (described in the Javadocs) that allow you to access the features, as well as what the allowed value type is for each feature.
+Here is sample code which prints out all the features of all the types, together with the allowed value types (the feature "`range`"). Each feature has a "`domain`" which is the type where it is defined, as well as a "`range`". 
+[source]
+----
+private void listFeatures2(TypeSystem ts) {
+  Iterator<Feature> featureIterator = ts.getFeatures();
+  System.out.println("Features in the type system:");
+  while (featureIterator.hasNext()) {
+    Feature f = featureIterator.next();
+    System.out.println(
+      f.getShortName() + ": " +
+      f.getDomain() + " -> " + f.getRange());
+  }
+  System.out.println();
+}
+----
+
+We can ask a feature object for its domain (the type it is defined on) and its range (the type of the value of the feature). The terminology derives from the fact that features can be viewed as functions on subspaces of the object space.
+
+[[ugr.ref.cas.cas_apis_create_modify_feature_structures]]
+=== Using the CAS APIs to create and modify feature structures
+
+Assume a type system declaration that defines two types: Entity and Person.
+Entity has no features defined within it but inherits from uima.tcas.Annotation -- so it has the begin and end features.
+Person is, in turn, a subtype of Entity, and adds firstName and lastName features.
+CAS type systems are declaratively specified using XML; the format of this XML is described in the xref:ref.adoc#ugr.ref.xml.component_descriptor.type_system[Type System Reference].
+
+[source]
+----
+<!-- Type System Definition -->
+<typeSystemDescription>
+  <types>
+    <typeDescription>
+      <name>com.xyz.proj.Entity</name>
+      <description />
+      <supertypeName>uima.tcas.Annotation</supertypeName>
+    </typeDescription>
+    <typeDescription>
+      <name>com.xyz.proj.Person</name>
+      <description />
+      <supertypeName>com.xyz.proj.Entity</supertypeName>
+      <features>
+        <featureDescription>
+          <name>firstName</name>
+          <description />
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+        <featureDescription>
+          <name>lastName</name>
+          <description />
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+      </features>
+    </typeDescription>
+  </types>
+</typeSystemDescription>
+----
+
+To be able to access types and features, we need to know their names.
+The CAS interface defines constants that hold the names of built-in types and features, such as ``CAS.TYPE_NAME_INTEGER``.
+It is good programming practice to create such constants for the types and features you define, for your own use as well as for others who will be using your annotators. 
+
+[source]
+----
+/** Entity type name constant. */
+public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";
+  
+/** Person type name constant. */
+public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";
+
+/** First name feature name constant. */
+public static final String FIRST_NAME_FEAT_NAME = "firstName";
+
+/** Last name feature name constant. */
+public static final String LAST_NAME_FEAT_NAME = "lastName";
+----
+
+Next we define type and feature member variables; these will hold the values of the type and feature objects needed by the CAS APIs, to be assigned during ``typeSystemInit()``.
+
+[source]
+----
+// Type system object variables
+private TypeSystem typeSystem;
+private Type entityType;
+private Type personType;
+private Feature firstNameFeature;
+private Feature lastNameFeature;
+private Type stringType;
+----
+
+The type system does not throw an exception if we ask for something that is not known; it simply returns null. The code therefore checks for this and throws a proper exception.
+We require all these types and features to be defined for the annotator to work.
+One might imagine situations where certain computations are predicated on some type or feature being defined in the type system, but that is not the case here.
+
+[source]
+----
+// Get a type object corresponding to a name.
+// If it doesn't exist, throw an exception.
+private Type initType(String typeName)
+  throws AnnotatorInitializationException {
+  Type type = typeSystem.getType(typeName);
+  if (type == null) {
+    throw new AnnotatorInitializationException(
+      AnnotatorInitializationException.TYPE_NOT_FOUND,
+      new Object[] { this.getClass().getName(), typeName });
+  }
+  return type;
+}
+
+// We add similar code for retrieving feature objects.
+// Get a feature object from a name and a type object.
+// If it doesn't exist, throw an exception.
+private Feature initFeature(String featName, Type type)
+  throws AnnotatorInitializationException {
+  Feature feat = type.getFeatureByBaseName(featName);
+  if (feat == null) {
+    throw new AnnotatorInitializationException(
+      AnnotatorInitializationException.FEATURE_NOT_FOUND,
+      new Object[] { this.getClass().getName(), featName });
+  }
+  return feat;
+}
+----
+
+Using these two functions, code for initializing the type system described above would be: 
+[source]
+----
+public void typeSystemInit(TypeSystem aTypeSystem)
+    throws AnalysisEngineProcessException {
+  this.typeSystem = aTypeSystem;
+  try {
+    // Set type system member variables.
+    this.entityType = initType(ENTITY_TYPE_NAME);
+    this.personType = initType(PERSON_TYPE_NAME);
+    this.firstNameFeature =
+      initFeature(FIRST_NAME_FEAT_NAME, personType);
+    this.lastNameFeature =
+      initFeature(LAST_NAME_FEAT_NAME, personType);
+    this.stringType = initType(CAS.TYPE_NAME_STRING);
+  } catch (AnnotatorInitializationException e) {
+    // The helpers throw a different checked exception; wrap it.
+    throw new AnalysisEngineProcessException(e);
+  }
+}
+----
+
+Note that we initialize the string type by using a type name constant from the CAS.
+
+[[ugr.ref.cas.creating_feature_structures]]
+== Creating feature structures
+
+To create feature structures in JCas, we use the Java "`new`" operator.
+In the CAS, we use one of several different API methods on the CAS object, depending on which of the 10 basic kinds of feature structures we are creating (a plain feature structure, or an instance of the built-in primitive type arrays or FSArray). There is also a method to create an instance of a ``uima.tcas.Annotation``, setting the begin and end values.
+
+Once a feature structure is created, it needs to be added to the CAS indexes (unless it will be accessed via some reference from another accessible feature structure). Assuming `aCAS` holds a reference to a CAS, and `token` holds a reference to a newly created feature structure, here's the code to add that feature structure to all the relevant CAS indexes:
+
+[source]
+----
+    // Add the token to the index repository.
+    aCAS.addFsToIndexes(token);
+----
+
+There is also a corresponding `removeFsFromIndexes(token)` method on CAS objects.
+
+As of version 2.4.1, there are two methods you can use on an index repository  to efficiently bulk-remove all instances of particular types of feature structures from a particular view.
+One of these, `aCas.getIndexRepository().removeAllIncludingSubtypes(aType)` removes all instances of a particular type, including instances which are subtypes of the specified type.
+The other, `aCas.getIndexRepository().removeAllExcludingSubtypes(aType)` removes only the instances of exactly the specified type, leaving instances of its subtypes in place.
+In both cases, the removal is done from the particular view of the CAS referenced by aCas.
+
+[[ugr.ref.cas.updating_indexed_feature_structures]]
+=== Updating indexed feature structures
+
+Version 2.7.0 added protection for indexes when feature structure key value features are updated.
+By default this protection is automatic, but  at some performance cost.
+Users may optimize this further.
+
+Protection is needed because some of the indexes (the Sorted and Set types) use comparators defined to use values of the particular features; if these values  need to be changed after the feature structure is added to the indexes,  the correct way to do this is to: 
+
+. completely remove the item from all indexes where it is indexed, in all views where it is indexed,
+. update the value of the features being used as keys,
+. add the item back to the indexes, in all views.
+
+
+[NOTE]
+====
+It's OK to change feature values which are not used in determining sort ordering (or set membership), without removing and re-adding back to the index. 
+====
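+
+The reason for the remove / update / add-back sequence can be seen in any sorted collection that relies on a key comparator. The sketch below uses a plain Java `TreeSet` as a stand-in for a sorted index; `KeyUpdateSketch`, `Item`, and the method names are hypothetical, and this is an analogy rather than the UIMA index implementation.
+
```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// TreeSet stand-in for a sorted index keyed on "begin"; not UIMA code.
class KeyUpdateSketch {
  // Hypothetical item with a mutable key.
  static final class Item {
    int begin;

    Item(int begin) {
      this.begin = begin;
    }
  }

  static TreeSet<Item> newIndex() {
    return new TreeSet<>((a, b) -> Integer.compare(a.begin, b.begin));
  }

  // Lists the keys in the order the index iterates over its items.
  static List<Integer> begins(TreeSet<Item> index) {
    List<Integer> result = new ArrayList<>();
    for (Item item : index) {
      result.add(item.begin);
    }
    return result;
  }

  // Wrong: the key changes while the item is still in the index.
  static List<Integer> updateInPlace() {
    TreeSet<Item> index = newIndex();
    Item a = new Item(10);
    index.add(a);
    index.add(new Item(20));
    index.add(new Item(30));
    a.begin = 40; // the index still holds a at its old position
    return begins(index);
  }

  // Right: remove, update the key, then add back.
  static List<Integer> removeUpdateAdd() {
    TreeSet<Item> index = newIndex();
    Item a = new Item(10);
    index.add(a);
    index.add(new Item(20));
    index.add(new Item(30));
    index.remove(a);
    a.begin = 40;
    index.add(a);
    return begins(index);
  }

  public static void main(String[] args) {
    System.out.println(updateInPlace());   // prints [40, 20, 30]: no longer sorted
    System.out.println(removeUpdateAdd()); // prints [20, 30, 40]
  }
}
```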
+
+The automatic protection checks for updates of features being used as keys, and if it finds an update like this for a feature structure that is in the indexes, it removes the feature structure from the indexes, does the update, and adds it back.
+It will do this for every feature update.
+This is obviously not efficient when multiple features are being updated; in that case it would be better to remove the feature structure, do all the updates to all the features needing updates, and then do a single add-back operation.
+
+This is supported in user code by the method `protectIndexes`, available in both the CAS and JCas interfaces.
+Here are two ways of using it, one with a try / finally (or the equivalent try-with-resources form) and the other with a Runnable: 
+[source]
+----
+// an approach using try / finally
+AutoCloseable ac = my_cas.protectIndexes();  // my_cas is a CAS or a JCas
+try {
+  // ... arbitrary user code which updates features
+  //     which may be "keys" in one or more indexes
+} finally {
+  ac.close();
+}
+
+// This can more compactly be written using the auto-close feature of try:
+
+try (AutoCloseable ac = my_cas.protectIndexes()) {
+  // ... arbitrary user code which updates features
+  //     which may be "keys" in one or more indexes
+}
+
+// an approach using a Runnable, written in Java 8 lambda syntax
+my_cas.protectIndexes(() -> {
+  // ... arbitrary user code updating "key" features,
+  //     but no checked exceptions are permitted
+});
+----
+
+The `protectIndexes` implementation only removes feature structures that have features being updated which are used as keys in some index(es). At the end of the scope of the protectIndexes, it adds all of these back.
+It also skips removing feature structures from bag indexes, since these have no keys.
+
+Within a `protectIndexes` block, do not do any operations which depend on the  indexes being valid, such as creating and using an iterator.
+This is because the removed FSs  are only added back at the end of the protectIndexes block.
+
+The JVM property `-Duima.report_fs_update_corrupts_index` will generate a log entry every time the framework finds (and automatically surrounds with a remove / add-back) an update to a feature which could corrupt the index.
+The log entries can be identified by scanning for messages starting with `While FS was in the index, the feature` - the message goes on to identify the feature in question.
+Users can use these reports to find the places in their code where  they can either change the design to avoid updating these values after the item is indexed, or surround the updates with their own `protectIndexes` blocks.
+
+Initially, the out-of-the-box defaults for the UIMA framework will run with an automatic (but somewhat inefficient) protection.
+To improve upon this, users would: 
+
+* Turn on reporting using a global JVM flag ``-Duima.report_fs_update_corrupts_index``. This will cause a message to be logged each time the automatic protection is being invoked, and allows the user to find the spots to improve.
+* Improve each spot, perhaps by surrounding the update code with a protectIndexes block, or by rearranging code to reduce updating feature values used as index keys.
+* Once the code is no longer generating any reports, you can turn off the automatic protection for production runs using the JVM global property ``-Duima.disable_auto_protect_indexes``, and rely on the protectIndexes blocks. If protection is disabled, then the corruption detection is skipped, making the production  runs perhaps a bit faster, although this is not significant in most cases.
+* For automated build systems, there's a JVM parameter, ``-Duima.exception_when_fs_update_corrupts_index``, which will throw an exception if any automatic recovery situation is encountered. You can use this in build/test scenarios to ensure (after adding all needed protectIndexes blocks) that the code remains safe for turning off the checking in production runs.
+
+
+[[ugr.ref.cas.accessing_modifying_feature_structures]]
+== Accessing or modifying feature structures
+
+Values of individual features for a feature structure can be set or referenced, using a set of methods that depend on the type of value that feature is declared to have.
+There are methods on FeatureStructure for this: getBooleanValue, getByteValue, getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue, getStringValue, and getFeatureValue (which means to get a value which in turn is a reference to a feature structure). There are corresponding "`setter`" methods, as well.
+These methods on the feature structure object take as arguments the feature object retrieved earlier in the typeSystemInit method.
+
+Using the previous example, with the type system initialized with type personType and feature lastNameFeature, here's a sample code fragment that gets and sets that feature:
+
+[source]
+----
+// Assume aPerson is a variable holding an object of type Person
+// get the lastNameFeature value from the feature structure
+String lastName = aPerson.getStringValue(lastNameFeature);
+// set the lastNameFeature value
+aPerson.setStringValue(lastNameFeature, newStringValueForLastName);
+----
+
+The getters and setters for each of the primitive types are defined in the Javadocs as methods of the FeatureStructure interface.
+
+[[ugr.ref.cas.indexes_and_iterators]]
+== Indexes and Iterators
+
+Each CAS can have many indexes associated with it; each CAS View contains  a complete set of instantiations of the indexes.
+Each index is represented by an instance of the type org.apache.uima.cas.FSIndex.
+You use the object org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to retrieve instances of indexes.
+There are methods that let you select the index by name, by type, or by both name and type.
+Since each index is already associated with a type,  passing both a name and a type is valid only if the type passed in is the same type or a subtype of the one declared in the index specification for the named index.
+If you pass in a subtype, the returned FSIndex object refers to an index that will return only items belonging to that subtype (or subtypes of that subtype).
+
+The returned FSIndex objects are used, in turn, to create iterators.
+There is also a method on the Index Repository, ``getAllIndexedFS``,  which will return an iterator over all indexed Feature Structures (for that CAS View), in no particular order.
+The iterators created can be used like common Java iterators, to sequentially retrieve items indexed.
+If the index represents a sorted index, the items are returned in a sorted order, where the sort order is specified in the XML index definition.
+This XML is part of the Component Descriptor, see xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.index[Index Definition].
+
+In UIMA V3, Feature structures may be added to or removed from indexes while iterating over them.
+If this happens, any iterators already created will continue to operate over the before-modification version of the index, unless or until the iterator is re-synchronized with the current value of the index via one of the following specific 3 iterator API calls:  moveToFirst, moveToLast, or moveTo(FeatureStructure). ConcurrentModificationException is no longer thrown in UIMA v3. 
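+
+A loosely similar behavior exists in the Java standard library: iterators of `CopyOnWriteArrayList` operate on a snapshot and never throw `ConcurrentModificationException`. The sketch below uses that class purely as an analogy for the snapshot behavior described above; it makes no claim about how UIMA implements it, and `SnapshotIterationSketch` is a hypothetical name.
+
```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Analogy for snapshot iteration using a standard Java collection; not UIMA code.
class SnapshotIterationSketch {
  // Adds one item per visited element while iterating.
  // Returns {number of elements visited, final size of the collection}.
  static int[] iterateWhileAdding() {
    List<String> index = new CopyOnWriteArrayList<>(Arrays.asList("fs1", "fs2"));
    int visited = 0;
    Iterator<String> it = index.iterator(); // snapshot of the two original items
    while (it.hasNext()) {
      it.next();
      index.add("added" + visited); // does not disturb the running iterator
      visited++;
    }
    return new int[] {visited, index.size()};
  }

  public static void main(String[] args) {
    int[] result = iterateWhileAdding();
    System.out.println(result[0]); // prints 2: only the original items were visited
    System.out.println(result[1]); // prints 4: the additions are in the collection
  }
}
```
+
+A new iterator obtained after the loop would see all four items, which corresponds to re-synchronizing an existing UIMA iterator with one of the three calls listed above.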
+
+Feature structures being iterated over may have updates made to features which are used as the "keys" of an index.
+If this is done, UIMA will protect the indexes (to prevent index corruption) by automatically removing the Feature Structure from the indexes, updating the feature, and adding the FS back to the index (possibly in a new position). This automatic remove / add-back operation no longer makes the iterator throw a ConcurrentModificationException (as it did in UIMA Version 2) if the iterator is incremented or decremented; existing iterators will continue to operate as if no index modification occurred. 
+
+[[ugr.ref.cas.index.built_in_indexes]]
+=== Built-in Indexes
+
+An unnamed built-in bag index exists which holds all feature structures which are indexed.
+The only access to this index is the method `getAllIndexedFS(Type)`, which returns an iterator over all indexed Feature Structures.
+
+The CAS also contains a built-in index for the type ``uima.tcas.Annotation``, which sorts annotations in the order in which they appear in the document.
+Annotations are sorted first by increasing `begin` position.
+Ties are then broken by _decreasing_ `end` position (so that longer annotations come first). Annotations that match in both their `begin` and `end` features are sorted using the xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.type_priority[Type Priority], if any are defined.
+
+[[ugr.ref.cas.index.adding_to_indexes]]
+=== Adding Feature Structures to the Indexes
+
+Feature Structures are added to the indexes by various APIs.
+These add the Feature Structure to _all_ indexes that are defined for the type of that `FeatureStructure` (or any of its supertypes), in a particular view.
+Note that you should not add a Feature Structure to the indexes until you have set values for all of the features that may be used as sort keys in an index.
+
+There are multiple APIs for adding FSs to the index. 
+
+* (preferred) `myFeatureStructure.addToIndexes()`. This adds the feature structure instance to the view in which it was originally created.
+* (preferred) `myFeatureStructure.addToIndexes(JCas or CAS)`. This adds the feature structure instance to the view represented by the argument.
+* (older form) `casView.addFsToIndexes(myFeatureStructure)` or `jcasView.addFsToIndexes(myFeatureStructure)`.  This adds the feature structure instance to the view represented by the cas (or jcas).
+* (older form) `fsIndexRepositoryView.addFsToIndexes(myFeatureStructure)`.  This adds the feature structure instance to the view represented by the `fsIndexRepository` instance.
+
+
+[[ugr.ref.cas.index.iterators]]
+=== Iterators over UIMA Indexes
+
+Iterators are objects implementing the interface `org.apache.uima.cas.FSIterator`.
+This interface extends `java.util.Iterator`, so it provides the normal Java iterator methods, plus additional ones that allow moving both forwards and backwards.
+
+UIMA Indexes implement `Iterable`, so you can use the index directly in a Java extended for loop.
+
+[[ugr.ref.cas.index.annotation_index]]
+=== Special iterators for Annotation types
+
+Note: we recommend using the UIMA V3 select framework, instead of the following.
+It implements all of the following capabilities, and more, in a uniform manner.
+
+The built-in index over the `uima.tcas.Annotation` type named "``AnnotationIndex``" has additional capabilities.
+To use them, you first get a reference to this built-in index using either the `getAnnotationIndex` method on a CAS View object, or by asking the `FSIndexRepository` object for an index having the particular name "`AnnotationIndex`", for example: 
+
+[source]
+----
+AnnotationIndex idx = aCAS.getAnnotationIndex();
+// or you can get an index over a specific subtype of Annotation:
+AnnotationIndex subtypeIdx = aCAS.getAnnotationIndex(aType);
+----
+
+This object can be used to produce several additional kinds of iterators.
+It can produce unambiguous iterators; these skip over elements until they find an annotation whose start position is equal to or greater than the end position of the previously returned annotation.
+
+It can also produce several kinds of subiterators; these are iterators whose annotations fall within the span of another annotation.
+This kind of iterator can also have the unambiguous property, if desired.
+It also can be "`strict`" or not; strict means that the returned annotation lies completely within the span of the controlling annotation.
+Non-strict only implies that the beginning of the returned annotation falls within the span of the controlling annotation.
+
+There is also a method which produces an `AnnotationTree` object, which contains nodes representing the results of doing a strict, unambiguous subiterator over the span of some controlling annotation.
+For more details, please refer to the Javadocs for the `org.apache.uima.cas.text` package.
+
+[[ugr.ref.cas.index.constraints_and_filtered_iterators]]
+=== Constraints and Filtered iterators
+
+Note: for new code, consider using the select framework plus Streams, instead of the following.
+
+There is a set of API calls that build constraint objects.
+These objects can be used directly to test if a particular feature structure matches (satisfies) the constraint, or they can be passed to the createFilteredIterator method to create an iterator that skips over instances which fail to satisfy the constraint.
+
+It is possible to specify a feature value located by following a chain of references starting from the feature structure being tested.
+Here's a scenario to explore this concept.
+Let's suppose you have the following type system (namespaces are omitted for clarity): 
+
+____
+**Token**, having a feature PartOfSpeech which holds a reference to another type (POS)
+
+*POS* (a type with many subtypes, each representing a different part of speech)
+
+*Noun* (a subtype of POS)
+
+*ProperName* (a subtype of Noun), having a feature Class which holds an integer value encoding some information about the proper noun.
+____
+
+If you want to filter Token instances, such that only those tokens get through which are proper names of class 3 (for example), you would need a test that started with a Token instance, followed its PartOfSpeech reference to another instance (the ProperName instance) and then tested the Class feature of that instance for a value equal to 3.
+
+To support this, the filtering approach has components that specify tests, and components that specify "`paths`".
+The tests that can be done include testing references to type instances to see if they are instances of some type or its subtypes; this is done with a FSTypeConstraint constraint.
+Other tests check for equality or, for numeric values, ranges.
+
+Each test may be combined with a path -- to get to the value to test.
+Tests that start from a feature structure instance can be combined with `and` and `or` connectors.
+The Javadocs for these are in the package `org.apache.uima.cas`, in the classes whose names end in `Constraint`, plus the classes `ConstraintFactory`, `FeaturePath` and `CAS`.
+Here's an example; assume the variable cas holds a reference to a CAS instance. 
+
+[source]
+----
+// Start by getting the constraint factory from the CAS.
+ConstraintFactory cf = cas.getConstraintFactory();
+
+// To specify a path to an item to test, you start by
+// creating an empty path.
+FeaturePath path = cas.createFeaturePath();
+
+// Add POS feature to path, creating one-element path.
+path.addFeature(posFeat);
+
+// You can extend the chain arbitrarily by adding additional
+// features.
+
+// Create a new type constraint.  
+
+// Type constraints will check that structures
+// they match against have a type at least as specific
+// as the type specified in the constraint.
+FSTypeConstraint nounConstraint = cf.createTypeConstraint();
+
+// Set the type (by default it is TOP).
+// This succeeds if the type being tested by this constraint
+// is nounType or a subtype of nounType.
+nounConstraint.add(nounType);
+
+// Embed the noun constraint under the pos path.
+// This means, associate the test with the path, so it tests the
+// proper value.
+
+// The result is a test which will
+// match a feature structure that has a posFeat defined
+// which has a value which is an instance of a nounType or
+// one of its subtypes.
+FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);
+
+// Create a type constraint for token (or a subtype of it)
+FSTypeConstraint tokenConstraint = cf.createTypeConstraint();
+
+// Set the type.
+tokenConstraint.add(tokenType);
+
+// Create the final constraint by conjoining the embedded noun
+// constraint (tested via the pos path) with the token constraint.
+FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);
+
+// Create a filtered iterator from some annotation iterator.
+FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);
+----
+
+[[ugr.ref.cas.guide_to_javadocs]]
+== The CAS APIs -- a guide to the Javadocs
+// <titleabbrev>CAS API's Javadocs</titleabbrev>
+
+The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text.
+Most of the APIs described here are in the cas package.
+The cas.impl package contains classes used in serializing and deserializing (reading and writing external representations) the CAS in various formats, for transporting the CAS among local and remote annotators, or for storing the CAS in permanent storage.
+The cas.text package contains the APIs that extend the CAS to support artifact (including "`text`") analysis.
+
+[[ugr.ref.cas.javadocs.cas_package]]
+=== APIs in the CAS package
+
+The main objects implementing the APIs discussed here are shown in the diagram below.
+The hierarchy represents that there is a way to get from an upper object to an instance of the lower object, usually by using a method on the upper object; this is not an inheritance hierarchy. 
+
+.CAS Object hierarchy
+image::images/references/ref.cas/image001.png[CAS object hierarchy]
+
+The main Interface is the CAS interface.
+This has most of the functionality of the CAS, except for the type system metadata access, and the indexing access.
+JCas and CAS are alternative representations and API approaches to the CAS; each has a method to get the other.
+You can mix JCas and CAS APIs in your application as needed.
+To use the JCas APIs, you have to create the Java classes that correspond to the CAS types, and include them in the Java class path of the application.
+If you have a CAS object, you can get a JCas object by using the `getJCas()` method call on the CAS object; likewise, you can get the CAS object from a JCas by using the `getCAS()` method call on the JCas object.
+There is also a low level CAS interface that is not part of the official API, and is intended for internal use only -- it is not documented here.
+
+The type system metadata APIs are found in the TypeSystem interface.
+The objects defining each type and feature are defined by the interfaces Type and Feature.
+The Type interface has methods to see what types subsume other types, to iterate over the types available, and to extract information about a type, including what features it has.
+The Feature interface has methods that get what type it belongs to, its name, and its range (the kind of values it can hold).
+
+The FSIndexRepository gives you access to methods to get instances of indexes, and also provides access to the iterator over all indexed feature structures: ``getAllIndexedFS(aType)``.
+The FSIndex and AnnotationIndex objects give you methods to create instances of iterators.
+
+Iterators and the CAS methods that create new feature structures return FeatureStructure objects.
+These objects can be used to set and get the values of defined features within them.
+
+[[ugr.ref.cas.typemerging]]
+== Type Merging
+
+When annotators are combined in an aggregate, their defined type systems are merged.
+This is designed to support independent development of annotator components.
+The merge results in a single defined type system for CASes that flow through a particular set of annotators.
+
+The basic operation of a type system merge is to iterate through all the defined types, and if two annotators define the same fully qualified type name,  to take the features defined for those types and form a logical union of those features.
+This operation requires that same-named features have the same range type names.
+The resulting type system has features comprising the union of all features over all the various definitions for this type in different annotators. 
+
+Feature merging checks that for all features having the same name in a type, that the range type is identical; otherwise an error is signaled.
+
+Types are combined for merging when their fully qualified names are the same.
+Two different definitions can be merged even if their supertype definitions do not match, if one supertype subsumes the other supertype; otherwise an error is signaled.
+Likewise, two types with the same name can be merged only if their features can be merged. 
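+
+The feature-union rule can be sketched in plain Java.
+This is a simplified illustration, not the UIMA merge code: each definition of a type is modeled as a hypothetical map from feature name to range type name, and supertype checks are left out.
+
+[source,java]
+----
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+public class TypeMergeSketch {
+
+  // Merges two definitions of the same fully qualified type.
+  // Each map goes from feature name to range type name.
+  // Same-named features must have identical range type names.
+  public static Map<String, String> mergeFeatures(
+      Map<String, String> first, Map<String, String> second) {
+    Map<String, String> merged = new LinkedHashMap<>(first);
+    for (Map.Entry<String, String> e : second.entrySet()) {
+      String existing = merged.putIfAbsent(e.getKey(), e.getValue());
+      if (existing != null && !existing.equals(e.getValue())) {
+        throw new IllegalStateException("Feature '" + e.getKey()
+            + "' has conflicting range types: " + existing + " vs " + e.getValue());
+      }
+    }
+    return merged;
+  }
+
+  public static void main(String[] args) {
+    Map<String, String> a = new LinkedHashMap<>();
+    a.put("partOfSpeech", "uima.cas.String");
+    Map<String, String> b = new LinkedHashMap<>();
+    b.put("partOfSpeech", "uima.cas.String");
+    b.put("lemma", "uima.cas.String");
+    // The merged definition holds the union of both feature sets.
+    System.out.println(mergeFeatures(a, b).keySet());
+  }
+}
+----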
+
+[[ugr.ref.cas.limitedmultipleaccess]]
+== Limited multi-thread access to read-only CASs
+
+Some applications may find it useful to scale up pipelines and run these in parallel.
+
+Generally, a CAS is not thread-safe, and only one thread at a time may operate on it.
+In many scenarios, a CAS may be initialized and then filled with Feature Structures, and after some point, no more updates to that particular CAS will be done.
+
+If a CAS is no longer going to be changed, it is possible to  access it on multiple threads in a read-only mode, simultaneously, with some limitations.
+Limitations  arise because some UIMA Framework activities may update internal CAS data structures.
+
+Operational data is updated while running a pipeline when a PEAR is entered or exited, because PEARs establish new class loaders and can potentially switch the JCas classes being used (this happens because the class loaders might define different JCas cover classes implementing the same UIMA type).
+Because of this, you cannot have multiple pipelines accessing a CAS in read-only mode if one or more of those pipelines contains a PEAR.
+There are other edge cases where this may happen as well; for example, if you are  running a pipeline with an Extension Class Loader,  and have a callback routine loaded under a different class loader, UIMA will switch the JCas classes when calling the callback. 
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.compress.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.compress.adoc
new file mode 100644
index 0000000..f294f14
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.compress.adoc
@@ -0,0 +1,149 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.compress]]
+= Compressed Binary CASes
+
+[[ugr.ref.compress.overview]]
+== Binary CAS Compression overview
+
+UIMA has a proprietary binary serialization format, used internally for several things, including communicating with embedded C++ annotators using UIMA-CPP.
+This binary format is also selectable for use with UIMA-AS.
+Its use requires that the source and target systems implement the identical type system (because the type system is not sent, and internal coding is used within the format that is keyed to the particular type system).
+
+Starting with version 2.4.1, two additional forms of binary serialization are added.
+Both compress the data being serialized; typical size ratios can approach 50 : 1, depending on the exact contents of the CAS, when compared with normal binary serialization. 
+
+The two forms are called 4 and 6, for historical/internal reasons.
+The serialized form of each is fixed, but not currently standardized, and the form being used is encoded in the header so that the appropriate deserializer can be chosen.
+Both forms include support for Delta CAS being returned from a service.
+
+Form 6 builds on form 4, and adds: serializing only those feature structures which are reachable (that is, in some index, or referenced by other reachable feature structures), and type filtering.
+
+Type filtering takes a source type system and a target type system, and for serializing  (source to target), sends the binary representation of reachable feature structures in the target's type system.
+For deserializing (reading a target into a source), the filtering takes the specification being read as being encoded using the target's type system, and translates that into the source's type system.
+In this process, types which exist in the source but not the target are skipped (when serializing);  types which exist in the target, but not the source are skipped when deserializing. Features that exist in some source type but not in the version of the same type in the target are skipped (when serializing) or set to default values (i.e., 0 or null) when being deserialized.
+
+There are two main use cases for using compressed forms.
+The first one is for communicating with  UIMA-AS remote services (not yet implemented). 
+
+The second use case is for saving compressed representations of CASes to other media, such as disk files, where they can be deserialized later for use in other UIMA applications.
+
+[[ugr.ref.compress.usage]]
+== Using Compressed Binary CASes
+
+The main user interface for serializing a CAS using compression is to use one of the  static methods named serializeWithCompression in Serialization.
+If you pass a Type System argument representing a target type system, then form 6 compression is used; otherwise form 4 is used.
+To get the benefit of only serializing reachable Feature Structure instances, without type mapping  (which is only in form 6), pass a type system argument which is null. 
+
+To deserialize into a CAS without type mapping, use one of the deserialize methods in Serialization.
+There are multiple forms of this method, depending on the arguments.
+The forms which take extra arguments, including a ReuseInfo, may only be used with serialized forms created with form 6 compression.
+The plain form of deserialize works with all forms of binary serialization, compressed and non-compressed, by examining a common header which identifies the form of binary serialization used; however, form 6 requires additional arguments, so the plain form will fail on form 6 data and you need to use the other deserialize form.
+
+Form 6 has an additional object, ReuseInfo, which holds information which  is required for subsequent Delta CAS format serializations / deserializations.
+It can speed up subsequent serializations of the same  CAS (before it is further updated), for instance, if an application is sending the CAS to multiple services in parallel.
+The serializeWithCompression method returns this object when form 6 is being used. 
+
+In addition, the CasIOUtils class offers static load and save methods, which can be used with the SerialFormat enum to serialize and deserialize to URLs or streams; see the Javadocs for details.
+
+[[ugr.ref.compress.simple_deltas]]
+== Simple Delta CAS serialization
+
+Use form 4 for this.
+Form 6 also supports delta CAS, but it requires extra bookkeeping: when the receiver deserializes a CAS that will later be delta serialized back to the sender, it must save an instance of the ReuseInfo and use that same instance for the delta serialization; furthermore, the sender must also save an instance of the ReuseInfo from the original serialization and use it when deserializing the delta CAS.
+
+Form 4 may not be as efficient as form 6, in that it filters the CASes neither by type system nor by sending only reachable Feature Structure instances.
+But it doesn't require a ReuseInfo object when doing delta serialization or deserialization, so it may be more convenient to use when saving delta CASes to files (as opposed to the other use case of a remote service returning delta CASes to a remote client).
+
+[[ugr.ref.compress.use_cases]]
+== Use Case cookbook
+
+Here are some use cases, together with a suggested approach and example of how to use the APIs. 
+
+*Save a CAS to an output stream, using form 4 (no type system filtering):*
+
+[source]
+----
+// set up an output stream.  In this example, an internal byte array.
+ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
+Serialization.serializeWithCompression(casSrc, baos);
+// or
+CasIOUtils.save(casSrc, baos, SerialFormat.COMPRESSED);
+----
+
+*Deserialize from a stream into an existing CAS:*
+
+[source]
+----
+// assume the stream is a byte array input stream
+// For example, one could be created 
+//   from the above ByteArrayOutputStream as follows:
+ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
+// Deserialize into a cas having the identical type system
+Serialization.deserializeCAS(cas, bais);
+// or
+CasIOUtils.load(bais, cas);
+----
+
+Note that the `deserializeCAS(cas, inputStream)` method is a general way to deserialize into a CAS from an inputStream for all forms of binary serialized data (with exceptions as noted above). The method reads a common header, and based on what it finds, selects the appropriate deserialization routine.
+
+[NOTE]
+====
+The `deserializeCAS` method with just 2 arguments doesn't support type filtering or delta CAS deserialization for form 6.
+To do those, see the example below.
+====
+
+*Serialize to an output stream, filtering out some types and/or features:*
+
+To do this, an additional input specifying the Type System of the target must be supplied; this Type System should be a subset of the source CAS's.
+The `out` parameter may be an OutputStream, a DataOutputStream, or a File. 
+
+[source]
+----
+// set up an output stream.  In this example, an internal byte array.
+ByteArrayOutputStream baos = new ByteArrayOutputStream(OUT_BFR_INIT_SZ);
+// pass the stream as the `out` argument, plus the target type system
+Serialization.serializeWithCompression(cas, baos, tgtTypeSystem);
+----
+
+*Deserialize with type filtering:*
+
+There are 2 type systems involved here: one is the receiving CAS, and the other is the type system used to decode the serialized form.
+This may optionally be stored with the serialized form:
+
+[source]
+----
+CasIOUtils.save(cas, out, SerialFormat.COMPRESSED_FILTERED_TS);
+----
+
+and/or it can be supplied at load time.
+Here are two examples of supplying this at load time:
+
+[source]
+----
+CasIOUtils.load(input, cas, typeSystem); 
+CasIOUtils.load(input, type_system_serialized_form_input, cas);
+----
+
+When calling `deserializeCAS` directly, the reuseInfo must be null unless a delta CAS is being deserialized, in which case it must be the reuse info captured when the original CAS was serialized out.
+If the target type system is identical to the one in the CAS, you may pass null for it.
+
+[source]
+----
+ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
+Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
+----
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.config.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.config.adoc
new file mode 100644
index 0000000..0ef6edb
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.config.adoc
@@ -0,0 +1,179 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.config]]
+= UIMA Setup and Configuration
+// <titleabbrev>Setup and Configuration</titleabbrev>
+
+
+[[ugr.ref.config.properties]]
+== UIMA JVM Configuration Properties
+
+Some updates change UIMA's behavior between released versions.
+For example, sometimes an error check is enhanced, and this can cause something that was previously incorrect but not checked to now signal an error.
+Often, users will want these kinds of things to be ignored, at least for a while, to give them time to analyze and correct the issues.
+
+To enable users to gradually address these issues, there are some global JVM properties for UIMA that can restore earlier behaviors, in some cases.
+These are detailed in the table below.
+Additionally, there are other JVM properties that can be used in checking and optimizing some performance trade-offs, such as the automatic index protection.
+For the most part, you don't need to assign any values to these properties, just define them.
+For example, to disable the enhanced check that ensures you don't add a subtype of AnnotationBase to the wrong View, add the JVM argument ``-Duima.disable_enhanced_check_wrong_add_to_index``.
+This would remove the enhanced checking for this, added in version 2.7.0 (the previously existing partial checking is still there, though). 
+
+[[ugr.ref.config.protect_index]]
+== Configuring index protection
+
+A new feature in version 2.7.0 optionally can include checking for invalid feature updates  which could corrupt indexes.
+Because this checking can slightly slow down performance, there are  global JVM properties to control it.
+The suggested way to operate with these is as follows.
+
+* At the beginning, run with automatic protection enabled (the default), but turn on explicit reporting (``-Duima.report_fs_update_corrupts_index``)
+* For all reported instances, examine your code to see if you can restructure to do the updates before adding the FS to the indexes. Where you cannot, surround the code doing  these updates with a try / finally or block form of ``protectIndexes()``,  which is described in <<ugr.ref.cas.updating_indexed_feature_structures>> (and also is similarly available with JCas). 
+* After no further reports, for maximum performance, leave in the protections  you may have installed in the above step, and then disable the reporting and runtime checking,  using the JVM argument ``-Duima.disable_auto_protect_indexes``, and removing (if present) ``-Duima.report_fs_update_corrupts_index``.
+
+One additional JVM property, ``-Duima.exception_when_fs_update_corrupts_index``, is intended to be used in automated build / testing configurations.
+It causes the framework to throw a UIMARuntimeException if an update that could corrupt the indexes occurs outside of a `protectIndexes` block, rather than automatically recovering from it.
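+
+As a rough sketch, the first and last stages of the workflow above might look like this on the command line.
+The class name `com.example.MyUimaApp` and the `$CLASSPATH` value are hypothetical placeholders; only the `-D` property names come from this chapter.
+
+[source,shell]
+----
+# Early runs: automatic protection stays on (the default),
+# and each offending update is reported in the log.
+java -Duima.report_fs_update_corrupts_index \
+     -cp "$CLASSPATH" com.example.MyUimaApp
+
+# After no further reports: drop the reporting flag and
+# turn off the runtime checking for maximum performance.
+java -Duima.disable_auto_protect_indexes \
+     -cp "$CLASSPATH" com.example.MyUimaApp
+----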
+
+[[ugr.ref.config.property_table]]
+== Properties Table
+
+This table describes the various JVM defined properties; specify these on the Java command line using -Dxxxxxx, where the xxxxxx is one of the properties starting with `uima.` from the table below.
+
+[cols="1,1,1", frame="all"]
+|===
+
+|**Title**
+|**Property Name & Description**
+|**Since Version**
+
+|
+
+Use built-in Java Logger as default back-end
+|
+
+`uima.use_jul_as_default_uima_logger`
+
+See https://issues.apache.org/jira/browse/UIMA-5381[UIMA-5381].
+The standard UIMA logger uses an slf4j implementation, which, in turn hooks up to  a back end implementation based on what can be found in the class path (see slf4j documentation). If no backend implementation is found, the slf4j default is to use a NOP logger back end  which discards all logging.
+
+When this flag is specified, the behavior of the UIMA logger  is altered to use the built-in-to-Java logging implementation  as the back end for the UIMA logger. 
+|
+
+3.0.0
+
+|
+
+XML: enable doctype declarations
+|
+
+`uima.xml.enable.doctype_decl` (default is false)
+
+See https://issues.apache.org/jira/browse/UIMA-6064[UIMA-6064].
+Normally, this is turned off to avoid exposure to malicious XML; see https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing[XML External Entity processing vulnerability].
+|
+
+2.10.4, 3.1.0
+
+|**Index protection properties**
+
+|
+
+Report Illegal Index-key Feature Updates
+|
+
+`uima.report_fs_update_corrupts_index` (default is not to report)
+
+See https://issues.apache.org/jira/browse/UIMA-4135[UIMA-4135].
+Updating Features which are used in Set and Sorted indexes as "keys" may corrupt the indexes, if the Feature Structure (FS) has been added to the indexes.
+To update these, you must first completely remove the FS from the indexes in all views, then do the updates, and then add it back.
+UIMA now checks for this (unless specifically disabled, see below), and if this property is defined, logs a WARN message for each occurrence that is not inside an explicit `protectIndexes` block (see the CAS Javadocs for the CAS / JCas `protectIndexes` methods).
+
+To scan the logs for these reports, search for instances of lines having the string `While FS was in the index, the feature`
+
+Specifying this property overrides ``uima.disable_auto_protect_indexes``.
+
+Users would run with this property defined and then, for high performance, use the report to manually change their code to avoid the problem or to wrap the updates with a `protectIndexes` kind of protection (see the reference manual, in the CAS or JCas chapters, for examples of user code doing this), and then run with the protection turned off (see below).
+|
+
+2.7.0
+
+|
+
+Throw exception on illegal Index-key Feature Updates
+|
+
+`uima.exception_when_fs_update_corrupts_index` (default is false)
+
+See https://issues.apache.org/jira/browse/UIMA-4150[UIMA-4150].
+Throws a UIMARuntimeException if an indexed FS feature used as a key in one or more indexes is updated outside of an explicit `protectIndexes` block.
+This is intended for use in automated build and test environments, to provide a strong signal if this kind of mistake gets into the build.
+If it is not set, then the other properties specify if corruption should be checked for, recovered automatically, and / or reported.
+
+Specifying this property also forces `uima.report_fs_update_corrupts_index` to true even if it was set to false.
+|
+
+2.7.0
+
+|
+
+Disable the index corruption checking
+|
+
+`uima.disable_auto_protect_indexes`
+
+See https://issues.apache.org/jira/browse/UIMA-4135[UIMA-4135].
+After you have fixed all reported issues identified with the above report, you may set this property to omit this check, which may slightly improve performance.
+
+Note that this property is ignored if `-Duima.exception_when_fs_update_corrupts_index` or `-Duima.report_fs_update_corrupts_index` is specified.
+|
+
+2.7.0
+
+|**Measurement / Tracing properties**
+
+|
+
+Trace Feature Structure Creation/Updating
+|
+
+`uima.trace_fs_creation_and_updating`
+
+This causes a trace file to be produced in the current working directory.
+The file has one line for each Feature Structure that is created, and includes information on the cas/cas-view and the features that are set for the Feature Structure.
+There is, additionally, one line for each Feature Structure update.
+Updates that occur next to trace information for the same Feature Structure are combined.
+
+This can generate a lot of output, and definitely slows down execution.
+|
+
+2.10.1
+
+|
+
+Measure index flattening optimization
+|
+
+`uima.measure.flatten_index`
+
+See https://issues.apache.org/jira/browse/UIMA-4357[UIMA-4357].
+This creates a short report to System.out when Java shuts down.
+The report has some statistics about the automatic management of  flattened index creation and use.
+|
+
+2.8.0
+|===
+
+Some additional global flags intended for helping v3 migration are documented in the V3 user's guide.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.javadocs.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.javadocs.adoc
new file mode 100644
index 0000000..dabbb63
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.javadocs.adoc
@@ -0,0 +1,61 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.javadocs]]
+= Javadocs
+
+The details of all the public APIs for UIMA are contained in the API Javadocs.
+These are located in the docs/api directory; the top level to open in your browser is called link:api/index.html.
+
+Eclipse supports the ability to attach the Javadocs to your project.
+The Javadoc should already be attached to the `uimaj-examples` project, if you followed the setup instructions in the xref:oas.adoc#ugr.ovv.eclipse_setup.example_code[setup guide].
+To attach Javadocs to your own Eclipse project, use the following instructions.
+
+[NOTE]
+====
+As an alternative, you can add the UIMA source to the UIMA binary distribution; if you do this, you will not only have the Javadocs automatically available (so you can skip the following setup), but you will also be able to step through the UIMA framework code while debugging.
+To add the source, follow the instructions as described in the xref:oas.adoc#ugr.ovv.eclipse_setup.adding_source[setup guide].
+====
+
+To add the Javadocs, open a project which is referring to the UIMA APIs in its class path, and open the project properties.
+Then pick Java Build Path.
+Pick the "Libraries" tab and select one of the UIMA library entries (if you don't have, for instance, `uima-core.jar` in this list, it's unlikely your code will compile). Each library entry has a small ">" sign on its left; click it to expand the view and see the Javadoc location.
+If you highlight that and press Edit, you can add a reference to the Javadocs in the following dialog:
+
+
+image::images/references/ref.javadocs/image002.jpg[Screenshot of attaching Javadoc to source in Eclipse]
+
+Once you do this, Eclipse can show you Javadocs for UIMA APIs as you work.
+To see the Javadoc for a UIMA API, you can hover over the API class or method, or select it and press shift-F2, or use the menu __Navigate → Open External Javadoc__, or open the Javadoc view (__Window → Show View → Other → Java → Javadoc__).
+
+In a similar manner, you can attach the source for the UIMA framework, if you download the source distribution.
+The source corresponding to particular releases is available from the Apache UIMA web site (http://uima.apache.org) on the downloads page.
+
+[[ugr.ref.javadocs.libraries]]
+== Using named Eclipse User Libraries
+
+You can also create a named "user library" in Eclipse containing the UIMA Jars, and attach the Javadocs (or optionally, the sources); this named library is saved in the Eclipse workspace.
+Once created, it can be added to the classpath of newly created Eclipse projects.
+
+Use the menu option __Project → Properties → Java Build Path__, and then pick the __Libraries__ tab, and click the __Add Library__ button.
+Then select __User Libraries__, click __Next__, and pick the library you created for the UIMA Jars.
+
+To create this library in the workspace, use the same menu picks as above, but after you select the User Libraries and click "Next", you can click the "New Library..." button to define your new library.
+You use the "Add Jars" button and multi-select all the Jars in the lib directory of the UIMA binary distribution.
+Then you add the Javadoc attachment for each Jar.
+The path to use is `file:/` -- insert the path to your install of UIMA -- `/docs/api`.
+After you do this for the first Jar, you can copy this string to the clipboard and paste it into the rest of the Jars.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.jcas.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.jcas.adoc
new file mode 100644
index 0000000..d44a2f7
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.jcas.adoc
@@ -0,0 +1,463 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.jcas]]
+= JCas Reference
+
+The CAS is a system for sharing data among annotators, consisting of data structures (definable at run time), sets of indexes over these data, metadata describing these, subjects of analysis, and a high performance serialization/deserialization mechanism.
+JCas provides a Java approach to accessing CAS data, and is based on using generated, specific Java classes for each CAS type.
+
+Annotators process one CAS per call to their process method.
+During processing, annotators can retrieve feature structures from the passed in CAS, add new ones, modify existing ones, and use and update CAS indexes.
+Of course, an annotator can also use plain Java Objects in addition; but the data in the CAS is what is shared among annotators within an application.
+
+All the facilities present in the APIs for the CAS are available when using the JCas APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a JCas (and vice-versa). The JCas APIs often have helper methods that make using this interface more convenient for Java developers.
+
+The data in the CAS are typed objects having fields.
+JCas uses a set of generated Java classes (each corresponding to a particular CAS type) with "`getter`" and "`setter`" methods for the features, plus a constructor so new instances can be made.
+The Java classes store the data in the class instance.
+
+Users can modify the JCas generated Java classes by adding fields to them; this allows arbitrary non-CAS data to be represented within the JCas objects as well. However, the non-CAS data stored in the JCas object instances cannot be shared with annotators using the plain CAS unless special provision is made; see the chapter in the v3 user's guide on storing arbitrary Java objects in the CAS.
+
+The JCas class Java source files are generated from XML type system descriptions.
+The JCasGen utility does the work of generating the corresponding Java Class Model for the CAS types.
+There are a variety of ways JCasGen can be run; these are described later.
+You include the generated classes with your UIMA component, and you can publish these classes for others who might want to use your type system.
+
+JCas classes are not required for all UIMA types.
+Those types which don't have corresponding JCas classes use the nearest JCas class corresponding to a type in their superchain.
+
+The specification of the type system in XML can be written using a conventional text editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA descriptors.
+
+Changes to the type system are done by changing the XML and regenerating the corresponding Java Class Models.
+Of course, once you've published your type system for others to use, you should be careful that any changes you make don't adversely impact the users.
+Additional features can be added to existing types without breaking other code.
+
+A separate Java class is generated for each type; this class implements the CAS FeatureStructure interface, as well as having the special getters and setters for the included features.
+The generated Java classes have methods (getters and setters) for the fields as defined in the XML type specification.
+Descriptor comments are reflected in the generated Java code as Java-doc style comments.
+
+[[ugr.ref.jcas.name_spaces]]
+== Name Spaces
+
+Full Type names consist of a "`namespace`" prefix dotted with a simple name.
+Namespaces are used like packages to avoid collisions between types that are defined by different people at different times.
+The namespace is used as the Java package name for generated Java files.
+
+Type names used in the CAS correspond to the generated Java classes directly.
+If the CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the package com.myCompany.myProject, and the class is ExampleClass.
+
+An exception to this rule is the built-in types starting with `uima.cas` and `uima.tcas`; these names are mapped to Java packages named `org.apache.uima.jcas.cas` and `org.apache.uima.jcas.tcas`.
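+
+The naming rule above can be sketched in plain Java.
+The `CasTypeNames` class and its `toJavaClassName` method are hypothetical helpers written only for this illustration; they are not part of the UIMA API.
+The sketch covers just the two built-in prefixes named in this section and ignores the primitive built-in types, which map to Java primitives rather than to classes.
+
+[source,java]
+----
+/** Illustration only: a hypothetical helper, not part of the UIMA API. */
+final class CasTypeNames {
+
+  private CasTypeNames() {
+    // static helper, no instances
+  }
+
+  /** Maps a fully qualified CAS type name to the name of its JCas class. */
+  static String toJavaClassName(String casTypeName) {
+    // The built-in namespaces are the exception to the one-to-one rule.
+    if (casTypeName.startsWith("uima.cas.")) {
+      return "org.apache.uima.jcas.cas."
+          + casTypeName.substring("uima.cas.".length());
+    }
+    if (casTypeName.startsWith("uima.tcas.")) {
+      return "org.apache.uima.jcas.tcas."
+          + casTypeName.substring("uima.tcas.".length());
+    }
+    // All other types: namespace is the package, simple name is the class.
+    return casTypeName;
+  }
+
+  public static void main(String[] args) {
+    // prints "com.myCompany.myProject.ExampleClass"
+    System.out.println(toJavaClassName("com.myCompany.myProject.ExampleClass"));
+    // prints "org.apache.uima.jcas.tcas.Annotation"
+    System.out.println(toJavaClassName("uima.tcas.Annotation"));
+  }
+}
+----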
+
+[[ugr.ref.jcas.use_of_description]]
+== XML description element
+// <titleabbrev>Use of XML Description</titleabbrev>
+
+Each XML type specification can have `<description>` tags.
+The description for a type will be copied into the generated Java code, as a Javadoc style comment for the class.
+When writing these descriptions in the XML type specification file, you might want to use HTML tags, as allowed in Javadocs.
+
+If you use the Component Descriptor Editor, you can write the HTML tags normally, for instance, `<h1>My Title</h1>`.
+The Component Descriptor Editor will take care of converting the actual descriptor source so that it has the leading `<` character written as `+&lt;+`, to avoid confusing the XML type specification.
+For example, `<p>` would be written in the source of the descriptor as `+&lt;p>+`.
+Any characters used in the Javadoc comment must of course be from the character set allowed by the XML type specification.
+These specifications often start with the line `<?xml version="1.0" encoding="UTF-8" ?>`, which means you can use any of the UTF-8 characters.
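+
+If you edit the descriptor source by hand instead, the same escaping has to be applied manually.
+As a rough sketch, the hypothetical helper below (not part of UIMA) escapes the leading `<` of each tag as described above.
+It also escapes `&`, which is an assumption added here because a literal ampersand is not allowed in XML text either.
+
+[source,java]
+----
+/** Illustration only: a hypothetical helper, not part of the UIMA API. */
+final class DescriptionEscaper {
+
+  private DescriptionEscaper() {
+    // static helper, no instances
+  }
+
+  /** Escapes HTML markup so it can be placed inside a description element. */
+  static String escapeForDescriptor(String html) {
+    // Escape '&' first so the '&' of the generated "&lt;" is not escaped again.
+    return html.replace("&", "&amp;").replace("<", "&lt;");
+  }
+
+  public static void main(String[] args) {
+    // prints "&lt;h1>My Title&lt;/h1>"
+    System.out.println(escapeForDescriptor("<h1>My Title</h1>"));
+  }
+}
+----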
+
+[[ugr.ref.jcas.mapping_built_ins]]
+== Mapping built-in CAS types to Java types
+
+The built-in primitive CAS types map to Java types as follows:
+
+[source]
+----
+uima.cas.Boolean  boolean
+uima.cas.Byte     byte
+uima.cas.Short    short
+uima.cas.Integer  int
+uima.cas.Long     long
+uima.cas.Float    float
+uima.cas.Double   double
+uima.cas.String   String
+----
+
+[[ugr.ref.jcas.augmenting_generated_code]]
+== Augmenting the generated Java Code
+
+The Java Class Models generated for each type can be augmented by the user.
+Typical augmentations include adding additional (non-CAS) fields and methods, and import statements that might be needed to support these.
+Commonly added methods include additional constructors (having different parameter signatures), and implementations of toString().
+
+To augment the code, just edit the generated Java source code for the class named the same as the CAS type.
+Here's an example of an additional method you might add; the various getter methods are retrieving values from the instance:
+
+[source]
+----
+@Override
+public String toString() { // for debugging
+  return "XsgParse "
+    + getSlotName() + ": "
+    + getHeadWord().getCoveredText()
+    + " seqNo: " + getSeqNo()
+    + ", cAddr: " + id
+    + ", size left mods: " + getLMods().size()
+    + ", size right mods: " + getRMods().size();
+}
+----
+
+[[ugr.ref.jcas.keeping_augmentations_when_regenerating]]
+=== Keeping hand-coded augmentations when regenerating
+
+If the type system specification changes, you have to re-run the JCasGen generator.
+This will produce updated Java for the Class Models that capture the changed specification.
+If you have previously augmented the source for these Java Class Models, your changes must be merged with the newly (re)generated Java source code for the Class Models.
+This can be done by hand, or you can run the version of JCasGen that is integrated with Eclipse, and use automatic merging that is done using Eclipse's EMF plug-in.
+You can obtain Eclipse and the needed EMF plug-in from http://www.eclipse.org/.
+
+If you run the generator version that works without using Eclipse, it will not merge Java source changes you may have previously made; if you want them retained, you'll have to do the merging by hand.
+
+The Java source merging will keep additional constructors, additional fields, and any changes you may have made to the readObject method (see below). Merging will _not_ delete classes in the target that correspond to deleted CAS types which are no longer in the source; you should delete these by hand.
+
+[WARNING]
+====
+The merging supports Java 1.4 syntactic constructs only.
+JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to only Java 1.4 constructs, the merge will work.
+If you use Java 5 or later specific syntax or constructs, the merge operation will likely fail to merge properly.
+====
+
+[[ugr.ref.jcas.additional_constructors]]
+=== Additional Constructors
+
+Any additional constructors that you add must include the JCas argument.
+The first line of your constructor is required to be
+
+[source]
+----
+this(jcas);        // run the standard constructor
+----
+
+where jcas is the passed in JCas reference.
+If the type you're defining extends `uima.tcas.Annotation`, JCasGen will automatically add a constructor which takes 2 additional parameters (the begin and end Java int values) and sets the `uima.tcas.Annotation` `begin` and `end` fields.
+
+Here's an example: If you're defining a type MyType which has a feature parent, you might make an additional constructor which has an additional argument of parent:
+
+[source]
+----
+MyType(JCas jcas, MyType parent) {
+  this(jcas);        // run the standard constructor
+  setParent(parent); // set the parent field from the parameter
+}
+----
+
+[[ugr.ref.jcas.using_readobject]]
+==== Using readObject
+
+Fields defined by augmenting the Java Class Model to include additional fields represent data that exist for this class in Java, in a local JVM (Java Virtual Machine), but do not exist in the CAS when it is passed to other environments (for example, passing to a remote annotator).
+
+A problem can arise when new instances are created, perhaps by the underlying system when it iterates over an index: how to ensure that any additional non-CAS fields are properly initialized.
+To allow for arbitrary initialization at instance creation time, an initialization method in the Java Class Model, called `readObject`, is used.
+The generated default for this method is to do nothing, but it is one of the methods that you can modify to do whatever initialization might be needed.
+It is called with 0 parameters, during the constructor for the object, after the basic object fields have been set up.
+It can refer to fields in the CAS using the getters and setters, and other fields in the Java object instance being initialized.
+
+A pre-existing CAS feature structure could exist if a CAS was being passed to this annotator; in this case the JCas system calls the readObject method when creating the corresponding Java instance for the first time for the CAS feature structure.
+This can happen at two points: when a new object is being returned from an iterator over a CAS index, or a getter method is getting a field for the first time whose value is a feature structure.
+
+[[ugr.ref.jcas.modifying_generated_items]]
+=== Modifying generated items
+
+The following modifications, if made in generated items, will be preserved when regenerating.
+
+The `public`/`private` etc. flags associated with methods (getters and setters): you can change the default ("`public`") if needed.
+
+"`final`" or "`abstract`" can be added to the type itself, with the usual semantics.
+
+[[ugr.ref.jcas.merging_types_from_other_specs]]
+== Merging types
+// <titleabbrev>Merging Types</titleabbrev>
+
+Type definitions are merged by the framework from all the components being run together.
+
+[[ugr.ref.jcas.merging_types.aggregates_and_cpes]]
+=== Aggregate AEs and CPEs as sources of types
+
+When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the UIMA framework will build a merged type system (Note: this "`merge`" is merging types, not to be confused with merging Java source code, discussed above). This merged type system has all the types of every component used in the application.
+In addition, application code can use UIMA Framework APIs to read and merge type descriptions, manually.
+
+In most cases, each type system can have its own Java Class Models generated individually, perhaps at an earlier time, and the resulting class files (or .jar files containing these class files) can be put in the class path to enable JCas.
+
+However, it is possible that there may be multiple definitions of the same CAS type, each of which might have different features defined.
+In this case, the UIMA framework will create a merged type by accumulating all the defined features for a particular type into that type's type definition.
+However, the JCas classes for these types are not automatically merged, which can create some issues for JCas users, as discussed in the next section.
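+
+As a purely conceptual model of this accumulation, the sketch below forms the union of the feature names of two definitions of the same type.
+The `FeatureMergeSketch` class, the `mergeFeatures` method, and the example feature names are hypothetical; the framework's real merge works on full type system descriptions, and this model ignores range types and everything else a type description carries.
+
+[source,java]
+----
+import java.util.Arrays;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Set;
+
+/** Illustration only: a simplified model, not the framework's implementation. */
+final class FeatureMergeSketch {
+
+  private FeatureMergeSketch() {
+    // static helper, no instances
+  }
+
+  /** Returns the union of the feature names of two definitions of one type. */
+  static Set<String> mergeFeatures(List<String> first, List<String> second) {
+    // LinkedHashSet drops duplicates and keeps first-seen order.
+    Set<String> merged = new LinkedHashSet<>(first);
+    merged.addAll(second);
+    return merged;
+  }
+
+  public static void main(String[] args) {
+    Set<String> merged = mergeFeatures(
+        Arrays.asList("location", "time"),    // features from one component
+        Arrays.asList("time", "organizer"));  // features from another component
+    System.out.println(merged); // prints "[location, time, organizer]"
+  }
+}
+----
+
+A JCas class generated from only one of the two definitions would have accessors for just that definition's features, which is why the generated classes need attention after a merge.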
+
+[[ugr.ref.jcas.merging_types.jcasgen_support]]
+=== JCasGen support for type merging
+
+When there are multiple definitions of the same CAS type with different features defined, then xref:tools.adoc#ugr.tools.jcasgen[JCasGen] can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types, which can then be shared by all the components.
+This is typically done by the person who is assembling the Aggregate Analysis Engine or Collection Processing Engine.
+The resulting merged Java Class Model will then contain get and set methods for the complete set of features.
+These Java classes must then be made available in the class path, __replacing__ the pre-merge versions of the classes.
+
+If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the merged versions, as described in section <<ugr.ref.jcas.keeping_augmentations_when_regenerating>>, above.
+If just one of the pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the file system where the generated output will go, and the -merge option for JCasGen will automatically merge the hand-modifications with the generated code.
+If _both_ pre-merged versions had hand-modifications, then these modifications must be manually merged.
+
+An alternative to this is packaging the components as individual PEAR files, each with their own version of the JCas generated Classes.
+The Framework can run PEAR files using the PEAR file descriptor, and supply each component with its particular version of the JCas generated class.
+
+[[ugr.ref.jcas.impact_of_type_merging_on_composability]]
+=== Type Merging impacts on Composability
+
+The recommended approach in UIMA is to build and maintain type systems as separate components, which are imported by Annotators.
+Using this approach, Type Merging does not occur because the Type System and its JCas classes are centrally managed and shared by the annotators.
+
+If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the reusability of your annotator, unless your component is used as a PEAR file.
+
+If you are not using the PEAR packaging isolation capability, whenever anyone wants to combine your annotator with another annotator that uses a different version of the same Type, they will need to be aware of all of the issues described in the previous section.
+They will need to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java classes and to not include the pre-merge classes.
+(To enable this, you should package these classes separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you have done hand-modifications to your JCas classes, the person assembling your annotator will need to properly merge those changes.
+These issues significantly complicate the task of combining annotators, and will cause your annotator not to be as easily reusable as other UIMA annotators.
+
+[[ugr.ref.jcas.documentannotation_issues]]
+=== Adding Features to DocumentAnnotation
+
+There is one built-in type, `uima.tcas.DocumentAnnotation`, to which applications can add additional features.
+(All other built-in types are "feature-final" and you cannot add additional features to them.) Frequently, additional features are added to `uima.tcas.DocumentAnnotation` to provide a place to store document-level metadata.
+
+For the same reasons mentioned in the previous section, adding features to DocumentAnnotation is not recommended if you are using JCas.
+Instead, it is recommended that you define your own type for storing your document-level metadata.
+You can create an instance of this type and add it to the indexes in the usual way.
+You can then retrieve this instance using the iterator returned from the method `getAllIndexedFS(type)` on an instance of a JFSIndexRepository object.
+(As of UIMA v2.1, you do not have to declare a custom index in your descriptor to get this to work).
+
+If you do choose to add features to DocumentAnnotation, there are additional issues to be aware of.
+The UIMA SDK provides the JCas cover class for the built-in definition of DocumentAnnotation, in the separate jar file ``uima-document-annotation.jar``.
+If you add additional features to DocumentAnnotation, you must remove this jar file from your classpath, because you will not want to use the default JCas cover class.
+You will need to re-run JCasGen as described in <<ugr.ref.jcas.merging_types.jcasgen_support>>.
+JCasGen will generate a new cover class for DocumentAnnotation, which you must place in your classpath in lieu of the version in ``uima-document-annotation.jar``.
+
+Also, this is the reason why the method `JCas.getDocumentAnnotationFs()` returns type ``TOP``, rather than type ``DocumentAnnotation``.
+Because the `DocumentAnnotation` class can be replaced by users, it is not part of `uima-core.jar` and so the core UIMA framework cannot have any references to it.
+In your code, you may "`cast`" the result of `JCas.getDocumentAnnotationFs()` to type `DocumentAnnotation`, which must be available on the classpath either via `uima-document-annotation.jar` or by including a custom version that you have generated using JCasGen.
+
+[[ugr.ref.jcas.using_within_an_annotator]]
+== Using JCas within an Annotator
+
+To use JCas within an annotator, you must include the generated Java classes output from JCasGen in the class path.
+
+An annotator written using JCas is built by defining a class for the annotator that extends JCasAnnotator_ImplBase.
+The process method for this annotator is written as follows:
+
+[source]
+----
+public void process(JCas jcas)
+     throws AnalysisEngineProcessException {
+  ... // body of annotator goes here
+}
+----
+
+The process method is passed the JCas instance to use as a parameter.
+
+The JCas reference is used throughout the annotator to refer to the particular JCas instance being worked on.
+In pooled or multi-threaded implementations, there will be a separate JCas for each thread being (simultaneously) worked on.
+
+You can do several kinds of operations using the JCas APIs: create new feature structures (instances of CAS types) (using the new operator), access existing feature structures passed to your annotator in the JCas (for example, by using the next method of an iterator over the feature structures), get and set the fields of a particular instance of a feature structure, and add and remove feature structure instances from the CAS indexes.
+To support iteration, there are also functions to get and use indexes and iterators over the instances in a JCas.
+
+[[ugr.ref.jcas.new_instances]]
+=== Creating new instances using the Java "`new`" operator
+// <titleabbrev>Creating new instances</titleabbrev>
+
+The new operator creates new instances of JCas types.
+It takes at least one parameter, the JCas instance in which the type is to be created.
+For example, if there was a type Meeting defined, you can create a new instance of it using:
+[source]
+----
+Meeting m = new Meeting(jcas);
+----
+
+Other variations of constructors can be added in custom code; the single parameter version is the one automatically generated by JCasGen.
+For types that are subtypes of Annotation, JCasGen also generates an additional constructor with additional "`begin`" and "`end`" arguments.
+
+[[ugr.ref.jcas.getters_and_setters]]
+=== Getters and Setters
+
+If the CAS type Meeting had fields location and time, you could get or set these by using getter or setter methods.
+These methods have names formed by splicing together the word "`get`" or "`set`" followed by the field name, with the first letter of the field name capitalized.
+For instance:
+[source]
+----
+getLocation()
+----
+
+The getter forms take no parameters and return the value of the field; the setter forms take one parameter, the value to set into the field, and return void.
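+
+The naming rule can be expressed as a small sketch.
+The `AccessorNames` class and its methods are hypothetical helpers written for this illustration, not part of UIMA or of JCasGen; they assume a non-empty feature name.
+
+[source,java]
+----
+/** Illustration only: a hypothetical helper, not part of the UIMA API. */
+final class AccessorNames {
+
+  private AccessorNames() {
+    // static helper, no instances
+  }
+
+  /** Name of the getter for the given feature, e.g. "getLocation". */
+  static String getterName(String feature) {
+    return "get" + capitalize(feature);
+  }
+
+  /** Name of the setter for the given feature, e.g. "setLocation". */
+  static String setterName(String feature) {
+    return "set" + capitalize(feature);
+  }
+
+  /** Upper-cases the first character and leaves the rest unchanged. */
+  private static String capitalize(String name) {
+    return Character.toUpperCase(name.charAt(0)) + name.substring(1);
+  }
+
+  public static void main(String[] args) {
+    System.out.println(getterName("location")); // prints "getLocation"
+    System.out.println(setterName("time"));     // prints "setTime"
+  }
+}
+----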
+
+There are built-in CAS types for arrays of integers, strings, floats, and feature structures.
+For fields whose values are these types of arrays, there is an alternate form of getters and setters that take an additional parameter, written as the first parameter, which is the index in the array of an item to get or set.
+
+[[ugr.ref.jcas.obtaining_refs_to_indexes]]
+=== Obtaining references to Indexes
+
+The only way to access instances (not otherwise referenced from other instances) passed in to your annotator in its JCas is to use an iterator over some index.
+Indexes in the CAS are specified in the annotator descriptor.
+Indexes have a name; text annotators have a built-in, standard index over all annotations.
+
+To get an index, first get the JFSIndexRepository from the JCas using the method jcas.getJFSIndexRepository(). Here are the calls to get indexes:
+
+[source]
+----
+JFSIndexRepository ir = jcas.getJFSIndexRepository();
+
+ir.getIndex(name-of-index) // get the index by its name, a string
+ir.getIndex(name-of-index, Foo.type) // filtered by specific type
+
+ir.getAnnotationIndex()      // get AnnotationIndex
+jcas.getAnnotationIndex()    // get directly from jcas
+ir.getAnnotationIndex(Foo.type)      // filtered by specific type
+----
+
+For convenience, the getAnnotationIndex method is available directly on the JCas object instance; the implementation merely forwards to the associated index repository.
+
+A filtering type has to be a subtype of the type specified for this index in its index specification.
+They can be written as either Foo.type or if you have an instance of Foo, you can write
+
+[source]
+----
+fooInstance.getClass()
+----
+
+Foo is (of course) an example of the name of the type.
+
+[[ugr.ref.jcas.adding_removing_instances_to_indexes]]
+=== Adding (and removing) instances to (from) indexes
+// <titleabbrev>Updating Indexes</titleabbrev>
+
+CAS indexes are maintained automatically by the CAS.
+However, you must add any feature structure instances that you want an index to find to the indexes, using the call:
+
+[source]
+----
+myInstance.addToIndexes();
+----
+
+Do this after setting all features in the instance __which could be used in indexing__, for example, in determining the sorting order.
+See <<ugr.ref.cas.updating_indexed_feature_structures>> for details on updating indexed feature structures.
+
+When writing a Multi-View component, you may need to index instances in multiple CAS views.
+The methods above use the indexes associated with the current JCas object.
+There is a variation of the `addToIndexes` / `removeFromIndexes` methods which takes one argument: a reference to a JCas object holding the view in which you want to index this instance:
+[source]
+----
+myInstance.addToIndexes(anotherJCas)
+myInstance.removeFromIndexes(anotherJCas)
+----
+
+You can also explicitly add instances to other views using the addFsToIndexes method on other JCas (or CAS) objects.
+For instance, if you had 2 other CAS views (myView1 and myView2), in which you wanted to index myInstance, you could write:
+
+[source]
+----
+myInstance.addToIndexes(); //addToIndexes used with the new operator
+myView1.addFsToIndexes(myInstance); // index myInstance in myView1
+myView2.addFsToIndexes(myInstance); // index myInstance in myView2
+----
+
+The rules for determining which index to use with a particular JCas object are designed to behave the way most would think they should; if you need specific behavior, you can always explicitly designate which view the index adding and removing operations should work on.
+
+The rules are applied in this order:
+
+. If the instance is a subtype of AnnotationBase, then the view is the view associated with the annotation as specified in the feature holding the view reference in AnnotationBase.
+. Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the instance's constructor.
+. Otherwise, if the instance was created by getting a feature value from some other instance, whose range type is a feature structure, then the view is the same as the referring instance.
+. Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index, then it is the view associated with the index.
+
+As of release 2.4.1, there are two efficient bulk-remove methods to remove all instances of a given type, or all instances of a given type and its subtypes.
+These are invoked on an instance of an IndexRepository, for a particular view.
+For example, to remove all instances of Token from a particular JCas instance:
+
+[source]
+----
+jcas.removeAllIncludingSubtypes(Token.type)
+// or
+jcas.removeAllIncludingSubtypes(aTokenInstance.getTypeIndexID())
+// or
+jcas.getFsIndexRepository().
+       removeAllIncludingSubtypes(jcas.getCasType(Token.type))
+----
+
+[[ugr.ref.jcas.using_iterators]]
+=== Using Iterators
+
+This section describes obtaining and using iterators.
+However, it is recommended that you instead use the select framework, described in a chapter in the version 3 user's guide.
+
+Once you have an index obtained from the JCas, you can get an iterator from the index; here is an example:
+
+[source]
+----
+// Unfiltered index, obtained via the FSIndexRepository
+FSIndexRepository ir = jcas.getFSIndexRepository();
+FSIndex myIndex = ir.getIndex("myIndexName");
+FSIterator myIterator = myIndex.iterator();
+
+// Alternatively: index filtered by type, obtained via the JFSIndexRepository
+JFSIndexRepository jir = jcas.getJFSIndexRepository();
+FSIndex filteredIndex = jir.getIndex("myIndexName", Foo.type); // filtered
+FSIterator filteredIterator = filteredIndex.iterator();
+----
+
+xref:ref.adoc#ugr.ref.cas.indexes_and_iterators[Iterators] work like normal Java iterators, but are augmented to support additional capabilities.
+
+[[ugr.ref.jcas.class_loaders]]
+=== Class Loaders in UIMA
+
+The basic concept of a UIMA application includes assembling engines into a flow.
+The application made up of these Engines is run within the UIMA Framework, either by the Collection Processing Manager, or by using more basic UIMA Framework APIs.
+
+The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the capability to load multiple applications, in a way where each one is isolated from the others, by using a separate class loader for each application.
+For instance, one set of UIMA Framework Classes could be shared by multiple sets of application-specific classes, even if these application-specific classes had the same names but were different versions.
+
+[[ugr.ref.jcas.class_loaders.optional]]
+==== Use of Class Loaders is optional
+
+The UIMA framework will use a specific ClassLoader, based on how ResourceManager instances are used.
+Specific ClassLoaders are only created if you specify an ExtensionClassPath as part of the ResourceManager.
+If you do not need to support multiple applications within one UIMA framework within a JVM, don't specify an ExtensionClassPath; in this case, the classloader used will be the one used to load the UIMA framework - usually the overall application class loader.
+
+Of course, you should not run multiple UIMA applications together, in this way, if they have different class definitions for the same class name.
+This includes the JCas "`cover`" classes.
+This case might arise, for instance, if both applications extended `uima.tcas.DocumentAnnotation` in differing, incompatible ways.
+Each application would need its own definition of this class, but only one could be loaded (unless you specify ExtensionClassPath in the ResourceManager which will cause the UIMA application to load its private versions of its classes, from its classpath).
+
+[[ugr.ref.jcas.accessing_jcas_objects_outside_uima_components]]
+=== Issues accessing JCas objects outside of UIMA Engine Components
+
+If you are using the ExtensionClassPaths, the JCas cover classes are loaded under a class loader created by the ResourceManager part of the UIMA Framework.
+If you reference the same JCas classes outside of any UIMA component, for instance, in top level application code, the JCas classes used by that top level application code also must be in the class path for the application code.
+
+Alternatively, you could do all the JCas processing inside a UIMA component (and do no processing using JCas outside of the UIMA pipeline).
+
+[[ugr.ref.jcas.setting_up_classpath]]
+== Setting up Classpath for JCas
+
+The JCas Java classes generated by JCasGen are typically compiled and put into a JAR file, which, in turn, is put into the application's class path.
+
+This JAR file must be generated from the application's merged type system.
+This is most conveniently done by opening the top level descriptor used by the application in the Component Descriptor Editor tool, and pressing the Run-JCasGen button on the Type System Definition page.
+
+[[ugr.ref.jcas.pear_support]]
+== PEAR isolation
+
+As of version 2.2, the framework supports component descriptors which are PEAR descriptors.
+These descriptors define components and include information on the class path needed to run them.
+The framework uses the class path information to set up a localized class path, just for code running within the PEAR context.
+This allows PEAR files requiring different versions of common code to work well together, even if classes in the different versions have the same names.
+
+The mechanism used to switch the class loaders when entering a PEAR-packaged annotator in a flow depends on the framework knowing if JCas is being used within that annotator code.
+The framework will know this if the particular view being passed has had a previous call to `getJCas()`, or if the particular annotator is marked as a JCas-using one (by having it extend the class `JCasAnnotator_ImplBase`).
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.json.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.json.adoc
new file mode 100644
index 0000000..9fb80cb
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.json.adoc
@@ -0,0 +1,436 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.json]]
+= JSON Serialization of CASs and UIMA Description objects
+// <titleabbrev>JSON support</titleabbrev>
+
+
+[[ugr.ref.json.overview]]
+== JSON serialization support overview
+
+Applications are moving to the "cloud", and new applications are being rapidly developed that are hooking things up using various mashup techniques.
+New standards and conventions are emerging to support this kind of application development, such as REST services.
+JSON is now a popular way for services to communicate;  its popularity is rising (in 2014) while XML is falling.
+
+Starting with version 2.7.0, JSON-style serialization (but not yet deserialization) of CASs and UIMA descriptions is supported.
+The exact format of the serialization is configurable in several aspects.
+The implementation is built on top of the Jackson JSON generation library. 
+
+The next section discusses serialization for CASs, while a later section describes serialization of description objects, such as type system descriptions.
+
+[[_ug.ref.json.cas]]
+== JSON CAS Serialization
+
+CASs primarily consist of collections of Feature Structures (FSs). Similar to XMI serialization, JSON serialization skips unreachable FSs, outputting only those FSs that are found in the indexes (these are called __roots__), plus all of the FSs that are reachable from the roots via some chain of references.
+
+To support the kinds of things users do with FSs,  the serialized form may be augmented to include additional information beyond the FSs.
+
+For traditional UIMA implementations, the serialized formats mostly assumed that the receivers had access to a type system description, which specified details of the types of each feature value.
+For JSON serialization, some of this information can be included directly in the serialization.
+
+This abbreviated type system information is one kind of additional information that can be included;  here's a summary list of the various kinds of additional information you can add to the serialization:
+
+* A way to identify which fields in a FS should be treated as references to other FSs, or as representing serialized binary data from UIMA byte arrays.
+* Something like XML namespaces, to allow the use of short type names in the serialization while handling name collisions.
+* Enough of the UIMA type hierarchy to allow the common operation of iterating over a type together with all of its subtypes.
+* A way to identify which FSs were "added-to-the-indexes" (separately, per CAS View) and therefore serve as roots when iterating over types.
+* An identification of the associated type system definition.
+
+Simple JSON serialization does not have a convention for supporting these, but many extensions do.
+We borrow some of the concepts in the JSON-LD (linked data) standard in providing this  additional information.
+
+[[_ug.ref.json.cas.bigpic]]
+=== The Big Picture
+
+CAS JSON serialization consists of several parts: an optional _context, the set of Feature Structures, and (if doing a delta serialization) information about changes to what was indexed.
+
+.The major sections of JSON serialization
+image::images/references/ref.json/big_picture2.png["The big picture showing the parts of serialization, with the _context optional."]
+
+The serializer can be configured to omit the _context or parts of the _context for cases where that information isn't needed.
+The index changes information is only included if Delta CAS serialization is specified.
+Note that Delta CAS support is incomplete; so this information is just for planning purposes.
+
+[[_ug.ref.json.cas.context]]
+=== The _context section
+
+The _context section has entries for each used type as well as some special additional entries.
+Each entry for a type has multiple sub-entries, identified by a key-name.
+Each sub-entry can be selectively omitted if not needed. 
+
+* *\_type_system* - a URI of the type system information
+* *\_types* - information about each used type 
++
+** *\_id* - the type's fully qualified UIMA type name
+** *\_feature_types* - a map from features of this type to  information about the type of the value of the feature
+** *\_subtypes* - an array of used subtype short-names
+
+Here's an example:
+
+====
+[source]
+----
+"_context" : {
+  "_type_system" : "URI to the type system information",
+  "_types" : {
+    "A_Typical_User_or_built_in_Type" : {
+      "_id" : "org.apache.uima.test.A_Typical_User_or_built_in_Type", 
+      "_feature_types" : {
+           "sofa"         : "_ref", 
+           "aFS"          : "_ref", 
+           "an_array"     : "_array",
+           "a_byte_array" : "_byte_array"},
+      "_subtypes" : [ "subtype1", "subtype2", ... ] }, 
+    "Sofa" : {
+      "_id" : "uima.cas.Sofa", 
+      "_feature_types" : {"sofaArray" : "_ref"} }
+  }
+}
+----
+====
+
+The *\_type_system* is an optional URI that references a UIMA type system description that defines the types for the CAS being serialized.
+
+In the *\_types* section, the key (e.g., "Sofa" or "A_Typical_User_or_built_in_Type") is the "short" name for the type used in the serialization.
+Normally it is just the last segment of the full type name (e.g., for the type x.y.z.TypeName, it is TypeName).
+If using only the last segment would collide with another type name (for example, some.package.cname.Foo and some.other.package.cname.Foo), the key is instead made up of a namespace name, a colon (:), and the last segment.
+The namespace name is the next-to-last segment, with an incrementing integer suffix added if that name itself collides.
+
+[quote]
+In this example, since the next to last segment of both names is "cname", one namespace name would be "cname", and the other would be "cname1".  The keys in this case would be cname:Foo and cname1:Foo.
+
+The value of the _id is the fully qualified name of the type.
+
+The *\_feature_types* values of _ref, _array, and _byte_array indicate that the corresponding values of the named features need special handling when deserialized.
+
+* *\_ref* - used when features are deserialized as numbers, but they are to be interpreted as references to other FSs whose `id` is the number. UIMA lists and arrays of  FSs are marked with _ref; if the value is a JSON array, the elements of the array will be either numbers (to be interpreted as references), or embedded serializations of FSs.
+* *\_array* - used when features are serialized as JSON  arrays containing embedded values,  unless the corresponding UIMA object has multiple references, in which case it is serialized as a FS reference which looks like a single number. If a feature is marked with _array, then a non-array, single number should be interpreted as the `id` of the feature structure that is the array or the first element of the list of items. This designation is used for both UIMA arrays and lists.
++
+This designation is for arrays and lists of primitive values, except for byte arrays.
+In the case of FS arrays and lists, the _ref designation is used instead of this to indicate that the  resulting values in a JSON array that look like numbers should be interpreted as references.
+* *\_byte_array* - _byte_array features are serialized as numbers (if they are a reference to a separate object) or as strings (if embedded). The strings are to be decoded into binary byte arrays using the Base64 encoding (the standard one used by Jackson to serialize binary data).
+
+Note that single element arrays are _not_ unwrapped, as in some other JSON serializations, to enable distinguishing references to arrays from embedded arrays. 
+
+*\_subtypes* is a list of the type's used subtypes.
+A type is _used_ if it is the type of a Feature Structure being serialized, or if it is in the supertype chain of some Feature Structure which is serialized.
+If a type has no used subtypes, this element is omitted.
+The names are represented as the "short" name.
+Users typically use this information to construct support for iterators over a type which includes all of its subtypes.
+
+[[_ug.ref.json.cas.context.omit]]
+==== Omitting parts of the _context section
+
+It is possible to selectively omit some of the  _context sections (or the entire _context), via configuration.
+Here's an example:
+
+====
+[source]
+----
+// make a new instance to hold the serialization configuration           
+JsonCasSerializer jcs = new JsonCasSerializer();  
+// Omit the expanded type names information
+jcs.setJsonContext(JsonContextFormat.omitExpandedTypeNames);
+----
+====
+
+See the Javadocs for `JsonContextFormat` for how to specify the parts.
+
+[[_ug.ref.json.cas.featurestructures]]
+=== Serializing Feature Structures
+
+Feature Structures themselves are represented as JSON objects consisting of field-value pairs, where the fields correspond to UIMA Features, and the values are the values of the features.
+
+The various kinds of values for a UIMA feature are represented by their natural JSON counterpart.
+UIMA primitive boolean values are represented by JSON true and false literals.
+UIMA Strings are  represented as JSON strings.
+Numbers are represented by JSON numbers.
+Byte Arrays are represented by the Jackson standard binary encoding (base64 encoding), written as JSON strings.
+References to other Feature Structures are also represented as JSON integer numbers, the values of which are  interpreted as ids of the referred-to FSs.
+These ids are treated in the same manner as the xmi:ids of XMI Serialization.
+Arrays and Lists when embedded (see following section) are represented as JSON arrays using the [] notation.
+
+Besides the feature values defined for a Feature Structure, an additional special feature may be serialized:  _type.
+The _type is the type name, written using the short format.
+This is automatically included when the type cannot  easily be inferred from other contextual information. 
+
+Here's an example, with some comments which, since JSON doesn't support comments, are just here for explanation:
+
+====
+[source]
+----
+{ "_type" : "Annotation", // _type may be omitted
+  "feat1" : true,   // boolean value represented as true or false
+  "feat2" : 123,    // could be a number or a reference to FS with id 123
+  "feat3" : "b3axgh" // could be a string or a base64 encoded byte array
+}
+----
+====
+
+[[_ug.ref.json.cas.featurestructures.embedding]]
+==== Embedding normally referenced values
+
+Consider a FS which has a feature that refers to another FS.
+This can be serialized in one of two ways:
+
+* The value of the feature can be coded as an `id` (a number), where the number is the `id` of the referred-to FS.
+* The value of the feature can be coded as the serialization of the referred-to FS.
+
+This second way of encoding is often done by JSON style serializations, and is called "embedding".  Referred-to  FSs may be embedded if there are no other references to the embedded FS.
+Multiple references may arise due to having a FS referenced as a "root" in some CAS View, or being used as a value in a FS feature.
+
+Following the XMI conventions, UIMA arrays and lists which are  identified as singly referenced by either the static or dynamic method (see below) are embedded directly as the value of a feature.
+In this case, the JSON serialization writes out the value of the feature as a JSON array.
+Otherwise, the value is written out as a FS reference, and a separate serialization occurs of  the list elements or the array.
+
+In addition to arrays and lists, FSs which are identified as singly referenced from another FS are serialized as the embedded value of the referring feature.
+This is also done (when using the dynamic method) for singly referenced rooted instances. 
+
+If a FS is multiply referenced, the serialization in these cases is just the numeric value of the `id` of the FS.
+
+[[_ug.ref.json.cas.featurestructures.dynamicstatic]]
+==== Dynamic vs Static multiple-references and embedding
+
+There are two methods of determining if a particular FS or list or array can be embedded. 
+
+* *dynamic* - calculates at serialization time whether or not there are multiple references to a given FS.
+* *static* - looks in the type system definition to see if  the feature is marked with <multipleReferencesAllowed>. 
++
+** `multipleReferencesAllowed` false → use the embedded style
+** `multipleReferencesAllowed` true → use separate objects
+
+Note that since this flag is not available for  references to FSs from View indexes, any FS that is indexed in any view is considered (if using static mode) to be multipleReferencesAllowed. 
+
+Delta serialization only supports the static method; this mode is forced on if delta serialization is specified.
+
+Dynamic embedding is enabled by default for JSON, but may be disabled via configuration.
+
+[[_ug.ref.json.cas.featurestructures.embeddedarrayslists]]
+==== Embedded Arrays and Lists
+
+When static embedding is being used, a case can arise where some feature is marked to have only  singly referenced FS values, but that value may actually be multiply referenced.
+This is detected during serialization, and a message is issued if an error handler has been specified to the serializer.
+The serialization continues, however.
+In the case of an Array, the value of the array is embedded in the serialization and the fact that these were referring to the same object is lost.
+In the case of a list, if any element in the list has multiple references (for example,  if the list has back-references, loops, etc.),  the serialization of the list is truncated at the point where the multiple reference occurs.
+
+[quote]
+Note that you can correctly serialize arbitrarily linked complex list structures created  using the built-in list types only if you use dynamic embedding, or  specify `multipleReferencesAllowed` = true.
+
+Embedded list or array values are both serialized using the JSON array notation; as a result, these alternative representations are not distinguished in the JSON serialization.
+
+[[_ug.ref.json.cas.featurestructures.null]]
+==== Omitting null values
+
+Following the conventions established in XMI serialization, features with `null` values have their key-value pairs omitted from the FS serialization when the type of the feature value is: 
+
+* a Feature Structure Reference
+* a String (whose value is ``null``, not "" (a 0-length String))
+* an embedded Array or List (where the entire array and/or list is ``null``)
+
+
+[NOTE]
+====
+Inside arrays or lists of FSs, references which are being serialized as references have a `null` reference coded as the number 0; references which are embedded are serialized as ``null``.
+====
+
+Configuring the serializer with `setOmit0Values(true)` causes primitive numeric features (byte/short/int/long/float/double) to be omitted as well when their values are 0 or 0.0.
+
+[[_ug.ref.json.cas.featurestructures.organization]]
+== Organizing the Feature Structures
+
+The set of all FSs being serialized is divided into two parts.
+The first part represents all FSs that are root FSs, in that they were in one or more indexes at the time of serialization.
+The second part represents feature structures that are multiply referenced, or are referenced via a chain of references from the root FSs.
+The same feature structure can appear in both lists.
+The elements in the second part are actual  serialized FSs, whereas, the elements in the first part are either references to the corresponding FSs in the second part, if they exist, or the actual embedded serialized FSs.
+Actual embedded serialized FSs only exist once in the two parts.
+
+====
+[source]
+----
+"_views" : {
+  "_InitialView" : {
+     "theFirstType" : [  { ... fs1 ...}, 123, 456, { ... fsn ...} ],
+     "anotherType"  : [  { ... fs1 ...}, ... { ... fsn ...} ]
+      ...     // more types which have roots in view "12"
+         },
+  "AnotherView" : {
+     "theFirstType" : [  { ... fsv1 ...}, 123, { ... fsvn ...} ],
+     "anotherType"  : [  { ... fsv1 ...}, ... { ... fsvn ...} ]
+      ...     // more types which have roots in view "25"
+         },
+   ...        // more views         
+}, 
+
+"_referenced_fss" : {
+  "12" : {"_type" : "Sofa",  "sofaNum" : 1,  "sofaID" : "_InitialView" },
+  "25" : {"_type" : "Sofa",  "sofaNum" : 2,  "sofaID" : "AnotherView" },
+  
+  "123" : { ... fs-123 ... },
+  "456" : { ... fs-456 ... },
+  ...
+}
+----
+====
+
+The first part map is made up of multiple maps, one for each separate CAS View.
+The outer map is keyed by the `id` of the corresponding SofaFS (or 0, if there is no corresponding SofaFS). For each view, the value is a map whose key is a used Type, and the values are an array of instances of FSs of that type which were found in some index; these are the "root" FSs.
+Only root instances of a particular type are included in this array. 
+
+The second part map has keys which are the `id` value of the FSs, and values which are  a map of key-value pairs corresponding to the feature-values of that FS.
+In this case, the _type extra feature is added to record the type.
+
+The _views map, keyed by view and type name, has all the FSs (as a JSON array) for that type that were in one or more indexes in any View.
+If a FS in this array is not multiply referenced (using dynamic mode),  then it is embedded here.
+Otherwise, only the reference (a simple number representing the `id` of that FS) is serialized for that FS.
+
+[[_ug.ref.json.cas.features]]
+== Additional JSON CAS Serialization features
+
+JSON serialization also supports several additional features, including:
+
+* Type and feature filtering: only types and features that exist in a specified type system description  are serialized.
+* An ErrorHandler; this will be called in various error situations, including when  serializing in static mode an array or list value for a feature marked `multipleReferencesAllowed = false` is found to have multiple references.
+* A switch to control omitting of numeric features that have 0 values (default is to include these). See the `setOmit0Values(true_or_false)` method in JsonCasSerializer.
+* A pretty-printing flag (default is not to do pretty-printing).
+
+See the Javadocs for JsonCasSerializer for details.
+
+[[ugr.ref.json.delta]]
+=== Delta CAS
+
+[NOTE]
+====
+Delta CAS support is incomplete, and is not supported as of release 2.7.0, but may be supported in later releases.
+The information here is just for planning purposes.
+====
+
+*\_delta_cas* is present only when a delta CAS serialization is being performed.
+This serializes just the  changes in the CAS since a Mark was set; so for cases where a large CAS is deserialized into a service, which then does a relatively small amount of additions and modifications, only those changes are serialized.
+The values of the keys are arrays of the ids of FSs that were added to the indexes,  removed from the indexes, or reindexed.
+
+This mode requires the static embeddability mode.
+When specified, a `\_delta_cas` key-value  is added to the serialization at the end,  which lists the FSs (by ``id``) that were added, removed, or reindexed, since the mark was set.
+Additional extra information, created when the CAS was previously deserialized and the mark set,  must be passed to the serializer, in the form of an instance of ``XmiSerializationSharedData``, or JsonSerializationSharedData (not yet defined as of release 2.7.0).
+
+Here's what the last part of the serialization looks like, when Delta CAS is specified: 
+
+====
+[source]
+----
+"_delta_cas" : {
+  "added_members" : [  123, ... ],
+  "deleted_members" : [  456, ... ],
+  "reindexed_members" : [] }
+----
+====
+
+[[ugr.ref.json.usage]]
+== Using JSON CAS serialization
+
+The support is built on top of the Jackson JSON serialization package.
+We follow Jackson conventions for configuring.
+
+The serialization APIs are in the JsonCasSerializer class.
+
+Although there are some static short-cut methods for common use cases, the basic operations needed to serialize a CAS as JSON are:
+
+* Make an instance of the `JsonCasSerializer` class. This will serve to collect configuration information.
+* Do any additional configuration needed. See the Javadocs for details. The following objects can be configured:
+** The `JsonCasSerializer` object: here you can specify the kind of JSON formatting, what to serialize, whether or not delta serialization is wanted, prettyprinting, and more.
+** The underlying `JsonFactory` object from Jackson. Normally, you won't need to configure this. If you do, you can create your own instance of this object and configure it and use it in the serialization.
+** The underlying `JsonGenerator` from Jackson. Normally, you won't need to configure this. If you do, you can get the instance the serializer will be using and configure that.
+* Once all the configuration is done, the serialize(...) call is done in this class,  which will create a one-time-use inner class where the actual serialization is done. The serialize(...) method is thread-safe, in that the same  JsonCasSerializer instance (after it has been configured) can kick off multiple  (identically configured) serializations  on different threads at the same time.
++
+The serialize call follows the Jackson conventions, taking one of 3 specifications of where to serialize to: a Writer, an OutputStream, or a File.
+
+Here's an example:
+
+====
+[source]
+----
+JsonCasSerializer jcs = new JsonCasSerializer();
+jcs.setPrettyPrint(true); // do some configuration
+StringWriter sw = new StringWriter();                          
+jcs.serialize(cas, sw); // serialize into sw
+----
+====
+
+The JsonCasSerializer class also has some static convenience methods for JSON serialization, for the most common configuration cases; please see the Javadocs for details.
+These are named jsonSerialize, to  distinguish them from the non-static serialize methods.
+
+Many of the common configuration methods generally return the instance, so they can be chained together.
+For example, if `jcs` is an instance of the JsonCasSerializer, you can write `jcs.setPrettyPrint(true).setOmit0Values(true);` to configure both of these.
+
+[[ugr.ref.json.descriptionserialization]]
+== JSON serialization for UIMA descriptors
+
+UIMA descriptors are things like analysis engine descriptors, type system descriptors, etc.
+UIMA has an internal form for these, typically named UIMA __description__s;  these can be serialized out as XML using a `toXML` method.
+JSON support adds the ability to serialize these as JSON objects, as well.
+It may be of use, for example, to have the full type system description for a UIMA pipeline available in JSON notation. 
+
+The class JsonMetaDataSerializer defines a set of static `toJSON` methods that serialize UIMA description objects; each takes as arguments the description object to be serialized and one of the standard serialization targets that Jackson supports (File, Writer, or OutputStream). There is also an optional prettyprint flag (default is no prettyprinting).
+
+The resulting JSON serialization is just a straight-forward serialization of the description object, having the same fields as the XML serialization of it.
+
+Here's what a small TypeSystem description looks like, serialized:
+
+====
+[source]
+----
+{"typeSystemDescription" : 
+  {"name" : "casTestCaseTypesystem",  
+   "description" : "Type system description for CAS test cases.",  
+   "version" : "1.0",  
+   "vendor" : "Apache Software Foundation",  
+   "types" : [
+     {"typeDescription" : 
+       {"name" : "Token",  
+        "description" : "",  
+         "supertypeName" : "uima.tcas.Annotation",  
+         "features" : [
+           {"featureDescription" : 
+             {"name" : "type",  
+              "description" : "",  
+              "rangeTypeName" : 
+              "TokenType" } }, 
+           {"featureDescription" : 
+             {"name" : "tokenFloatFeat",  
+              "description" : "",  
+              "rangeTypeName" : "uima.cas.Float" } } ] } }, 
+     {"typeDescription" : 
+       {"name" : "TokenType",  
+        "description" : "",  
+        "supertypeName" : "uima.cas.TOP" } } ] } }
+----
+====
+
+Here's a sample of code to serialize a UIMA description object held in the variable ``tsd``, with  and without pretty printing:
+
+====
+[source]
+----
+StringWriter sw = new StringWriter();                               
+JsonMetaDataSerializer.toJSON(tsd, sw); // no prettyprinting
+
+sw = new StringWriter();             
+JsonMetaDataSerializer.toJSON(tsd, sw, true); // prettyprinting
+----
+====
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.pear.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.pear.adoc
new file mode 100644
index 0000000..901d91b
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.pear.adoc
@@ -0,0 +1,645 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.pear]]
+= PEAR Reference
+
+A PEAR (Processing Engine ARchive) file is a standard package for UIMA components.
+This chapter describes the PEAR 1.0 structure and specification. 
+
+The PEAR package can be used for distribution and reuse by other components or applications.
+It also allows applications and tools to manage UIMA components automatically for verification, deployment, invocation, testing, etc. 
+
+Currently, there is an Eclipse plugin and a command line tool available to create PEAR packages for standard UIMA components.
+Please refer to the xref:tools.adoc#ugr.tools.pear.packager[PEAR Packager] documentation for more information about these tools. 
+
+PEARs distributed to new targets can be installed at those targets.
+UIMA includes a tool for installing PEARs; see the xref:tools.adoc#ugr.tools.pear.installer[PEAR Installer User's Guide] for more information about installing PEARs.
+
+An installed PEAR can be used as a component within a UIMA pipeline, by specifying the pear descriptor that is created when installing the pear.
+See xref:ref.adoc#ugr.ref.pear.specifier[PEAR Specifier Reference].
+
+[[ugr.ref.pear.packaging_a_component]]
+== Packaging a UIMA component
+
+For the purpose of describing the process of creating a PEAR file and its internal structure, this section describes the steps used to package a UIMA component as a valid PEAR file.
+The PEAR packaging process consists of the following steps: 
+
+* <<ugr.ref.pear.creating_pear_structure>>
+* <<ugr.ref.pear.populating_pear_structure>>
+* <<ugr.ref.pear.creating_installation_descriptor>>
+* <<ugr.ref.pear.packaging_into_1_file>>
+
+
+[[ugr.ref.pear.creating_pear_structure]]
+=== Creating the PEAR structure
+
+The first step in the PEAR creation process is to create a PEAR structure.
+The PEAR structure is a structured tree of folders and files, including the following elements: 
+
+* Required Elements: 
++
+** The *metadata* folder, which contains the PEAR installation descriptor and properties files. 
+** The installation descriptor (*metadata/install.xml*) 
+** A UIMA analysis engine descriptor and its required code, delegates (if any), and resources 
+* Optional Elements: 
++
+** The desc folder to contain descriptor files of analysis engines, delegate analysis engines (all levels), and other components (Collection Readers, CAS Consumers, etc). 
+** The src folder to contain the source code 
+** The bin folder to contain executables, scripts, class files, dlls, shared libraries, etc. 
+** The lib folder to contain jar files. 
+** The doc folder containing documentation materials, preferably accessible through an index.html. 
+** The data folder to contain data files (e.g. for testing). 
+** The conf folder to contain configuration files. 
+** The resources folder to contain other resources and dependencies. 
+** Other user-defined folders or files are allowed, but should be avoided. 
+
+
+.The PEAR Structure
+image::images/references/ref.pear/image002.jpg[diagram of the PEAR structure]
+
+
+[[ugr.ref.pear.populating_pear_structure]]
+=== Populating the PEAR structure
+
+After creating the PEAR structure, the component's descriptor files, code files, resources files, and any other files and folders are copied into the corresponding folders of the PEAR structure.
+The developer should make sure that the code would work with this layout of files and folders, and that there are no broken links.
+Although it is strongly discouraged, the optional elements of the PEAR structure can be replaced by other user-defined files and folders, if required for the component to work properly.
+
+[NOTE]
+====
+The PEAR structure must be self-contained.
+For example, this means that the component must run properly regardless of the PEAR root folder location.
+If the developer needs to use an absolute path in configuration or descriptor files, these files should be placed in the "`conf`" or "`desc`" folder, with the path of the PEAR root folder replaced by the string "`$main_root`".
+The tools that deploy and use PEAR files should localize the files in the "`conf`" and "`desc`" folders by replacing the string "`$main_root`" with the local absolute path of the PEAR root folder.
+The "`$main_root`" macro can also be used in the installation descriptor (install.xml).
+====
+
+Currently there are three types of component packages depending on their deployment: 
+
+[[ugr.ref.pear.package_type.standard]]
+==== Standard Type
+
+A component package with the *standard* type must be a valid Analysis Engine, and all the required files to deploy it locally must be included in the PEAR package.
+
+[[ugr.ref.pear.package_type.service]]
+==== Service Type
+
+A component package with the *service* type must be deployable locally as a supported UIMA service (e.g. Vinci).
+In this case, all the required files to deploy it locally must be included in the PEAR package.
+
+[[ugr.ref.pear.package_type.network]]
+==== Network Type
+
+A component package with the network type is not deployed locally but rather in the "`remote`" environment.
+It's accessed as a xref:tug.adoc#ugr.tug.application.remote_services[Network Analysis Engine] (e.g. Vinci Service).
+The component owner has the responsibility to start the service and make sure it's up and running before it's used by others (like a webmaster who makes sure the web site is up and running).
+In this case, the PEAR package does not have to contain files required for deployment, but must contain the xref:tug.adoc#ugr.tug.aae.creating_xml_descriptor[network AE descriptor], and the `<DESC>` tag in the installation descriptor must point to the network AE descriptor.
+
+[[ugr.ref.pear.creating_installation_descriptor]]
+=== Creating the installation descriptor
+
+The installation descriptor is an XML file called `install.xml`, located in the metadata folder of the PEAR structure.
+It's also called InsD.
+The InsD XML file should be created in the UTF-8 file encoding.
+The InsD should contain the following sections: 
+
+* `<OS>`: This section is used to specify supported operating systems 
+* `<TOOLKITS>`: This section is used to specify toolkits, such as JDK, needed by the component. 
+* `<SUBMITTED_COMPONENT>`: This is the most important section in the Installation Descriptor. It's used to specify required information about the component. See <<ugr.ref.pear.installation_descriptor>> for detailed information about this section. 
+* `<INSTALLATION>`: This section is explained in section <<ugr.ref.pear.installing>>. 
+
+
+[[ugr.ref.pear.installation_descriptor]]
+=== Documented template for the installation descriptor:
+// <titleabbrev>Installation Descriptor: template</titleabbrev>
+
+The following is a sample "`documented template`" which describes content of the installation descriptor `install.xml`: 
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Installation Descriptor Template -->
+<COMPONENT_INSTALLATION_DESCRIPTOR>
+  <!-- Specifications of OS names, including version, etc. -->
+  <OS>
+    <NAME>OS_Name_1</NAME>
+    <NAME>OS_Name_2</NAME>
+  </OS>
+  <!-- Specifications of required standard toolkits -->
+  <TOOLKITS>
+    <JDK_VERSION>JDK_Version</JDK_VERSION>
+  </TOOLKITS>
+
+  <!-- There are 2 types of variables that are used in the InsD:
+       a) $main_root , which will be substituted with the real path to the
+                 main component root directory after installing the
+                 main (submitted) component
+       b) $component_id$root, which will be substituted with the real path
+          to the root directory of a given delegate component after
+          installing the given delegate component -->
+
+  <!-- Specification of submitted component (AE)             -->
+  <!-- Note: submitted_component_id is assigned by developer; -->
+  <!--       XML descriptor file name is set by developer.    -->
+  <!-- Important: ID element should be the first in the       -->
+  <!--            SUBMITTED_COMPONENT section.                -->
+  <!-- Submitted component may include optional specification -->
+  <!-- of Collection Reader that can be used for testing the  -->
+  <!-- submitted component.                                   -->
+  <!-- Submitted component may include optional specification -->
+  <!-- of CAS Consumer that can be used for testing the       -->
+  <!-- submitted component.                                   -->
+
+  <SUBMITTED_COMPONENT>
+    <ID>submitted_component_id</ID>
+    <NAME>Submitted component name</NAME>
+    <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>
+
+    <!-- deployment options:                                   -->
+    <!-- a) "standard" is deploying AE locally                 -->
+    <!-- b) "service"  is deploying AE locally as a service,   -->
+    <!--    using specified command (script)                   -->
+    <!-- c) "network"  is deploying a pure network AE, which   -->
+    <!--    is running somewhere on the network                -->
+
+    <DEPLOYMENT>standard | service | network</DEPLOYMENT>
+
+    <!-- Specifications for "service" deployment option only   -->
+    <SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
+    <SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
+    <SERVICE_COMMAND_ARGS>
+
+      <ARGUMENT>
+        <VALUE>1st_parameter_value</VALUE>
+        <COMMENTS>1st parameter description</COMMENTS>
+      </ARGUMENT>
+
+      <ARGUMENT>
+        <VALUE>2nd_parameter_value</VALUE>
+        <COMMENTS>2nd parameter description</COMMENTS>
+      </ARGUMENT>
+
+    </SERVICE_COMMAND_ARGS>
+
+    <!-- Specifications for "network" deployment option only   -->
+
+    <NETWORK_PARAMETERS>
+      <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
+    </NETWORK_PARAMETERS>
+
+    <!-- General specifications                                -->
+
+    <COMMENTS>Main component description</COMMENTS>
+
+    <COLLECTION_READER>
+      <COLLECTION_ITERATOR_DESC>
+        $main_root/desc/CollIterDescriptor.xml
+      </COLLECTION_ITERATOR_DESC>
+
+      <CAS_INITIALIZER_DESC>
+        $main_root/desc/CASInitializerDescriptor.xml
+      </CAS_INITIALIZER_DESC>
+    </COLLECTION_READER>
+
+    <CAS_CONSUMER>
+      <DESC>$main_root/desc/CASConsumerDescriptor.xml</DESC>
+    </CAS_CONSUMER>
+
+  </SUBMITTED_COMPONENT>
+  <!-- Specifications of the component installation process -->
+  <INSTALLATION>
+    <!-- List of delegate components that should be installed together -->
+    <!-- with the main submitted component (for aggregate components)  -->
+    <!-- Important: ID element should be the first in each             -->
+    <!--            DELEGATE_COMPONENT section.                        -->
+    <DELEGATE_COMPONENT>
+      <ID>first_delegate_component_id</ID>
+      <NAME>Name of first required separate component</NAME>
+    </DELEGATE_COMPONENT>
+
+    <DELEGATE_COMPONENT>
+      <ID>second_delegate_component_id</ID>
+      <NAME>Name of second required separate component</NAME>
+    </DELEGATE_COMPONENT>
+
+    <!-- Specifications of local path names that should be replaced -->
+    <!-- with real path names after the main component as well as   -->
+    <!-- all required delegate (library) components are installed.  -->
+    <!-- <FILE> and <REPLACE_WITH> values may use the $main_root or -->
+    <!-- one of the $component_id$root variables.                   -->
+    <!-- Important: ACTION element should be the first in each      -->
+    <!--            PROCESS section.                                -->
+
+    <PROCESS>
+      <ACTION>find_and_replace_path</ACTION>
+      <PARAMETERS>
+        <FILE>$main_root/desc/ComponentDescriptor.xml</FILE>
+        <FIND_STRING>../resources/dict/</FIND_STRING>
+        <REPLACE_WITH>$main_root/resources/dict/</REPLACE_WITH>
+        <COMMENTS>Specify actual dictionary location in XML component
+          descriptor
+        </COMMENTS>
+      </PARAMETERS>
+    </PROCESS>
+
+    <PROCESS>
+      <ACTION>find_and_replace_path</ACTION>
+      <PARAMETERS>
+        <FILE>$main_root/desc/DelegateComponentDescriptor.xml</FILE>
+        <FIND_STRING>
+local_root_directory_for_1st_delegate_component/resources/dict/
+        </FIND_STRING>
+        <REPLACE_WITH>
+          $first_delegate_component_id$root/resources/dict/
+        </REPLACE_WITH>
+        <COMMENTS>
+          Specify actual dictionary location in the descriptor of the 1st
+          delegate component
+        </COMMENTS>
+      </PARAMETERS>
+    </PROCESS>
+
+    <!-- Specifications of environment variables that should be set prior
+         to running the main component and all other reused components.
+         <VAR_VALUE> values may use the $main_root or one of the
+         $component_id$root variables. -->
+
+    <PROCESS>
+      <ACTION>set_env_variable</ACTION>
+      <PARAMETERS>
+        <VAR_NAME>env_variable_name</VAR_NAME>
+        <VAR_VALUE>env_variable_value</VAR_VALUE>
+        <COMMENTS>Set environment variable value</COMMENTS>
+      </PARAMETERS>
+    </PROCESS>
+
+  </INSTALLATION>
+</COMPONENT_INSTALLATION_DESCRIPTOR>
+----
+
+[[ugr.ref.pear.installation_descriptor.submitted_component]]
+==== The SUBMITTED_COMPONENT section
+
+The SUBMITTED_COMPONENT section of the installation descriptor (install.xml) is used to specify required information about the UIMA component.
+Before explaining the details, let's clarify the concept of component ID and "`macros`" used in the installation descriptor.
+The component ID element should be the **first element** in the SUBMITTED_COMPONENT section.
+
+The component id is a string that uniquely identifies the component.
+It should use the Java naming convention (e.g. com.company_name.project_name.etc.mycomponent).
+
+Macros are variables such as $main_root, used to represent a string such as the full path of a certain directory.
+
+The values of these macros are defined by the PEAR installation process, when the PEAR is installed, and represent the values local to that particular installation.
+The values are stored in the `metadata/PEAR.properties` file that is  generated during PEAR installation.
+The tools and applications that use and deploy PEAR files replace these macros with the corresponding values in the local environment as part of the deployment process in the files included in the conf and desc folders.
+
+Currently, there are two types of macros:
+
+* $main_root, which represents the local absolute path of the main component root directory after deployment. 
+* $__component_id__$root, which represents the local absolute path to the root directory of the component which has _component_id_ as component ID. This component could be, for instance, a delegate component.
+
+For example, if some part of a descriptor needs to have a path to the data subdirectory of the PEAR, you write ``$main_root/data``.
+If your PEAR refers to a delegate component having the ID "``my.comp.Dictionary``", and you need to specify a path to one of this component's subdirectories, e.g. ``resources/dict``, you write ``$my.comp.Dictionary$root/resources/dict``.
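+
+The effect of this substitution can be pictured with a minimal sketch.
+This is not the PEAR installer's implementation: the class `PearMacroSketch`, its `expand` method, and the example paths are hypothetical names chosen for illustration, and the sketch only assumes that macros are replaced by plain string substitution.
+

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the UIMA API.
class PearMacroSketch {

  // Replaces $<component_id>$root and $main_root with the given absolute paths.
  static String expand(String text, String mainRoot, Map<String, String> delegateRoots) {
    String result = text;
    for (Map.Entry<String, String> e : delegateRoots.entrySet()) {
      // Literal replacement of the delegate macro, e.g. $my.comp.Dictionary$root
      result = result.replace("$" + e.getKey() + "$root", e.getValue());
    }
    // Literal replacement of the main component macro
    return result.replace("$main_root", mainRoot);
  }

  public static void main(String[] args) {
    // Assumed installation directories, made up for this example
    Map<String, String> delegates = new LinkedHashMap<>();
    delegates.put("my.comp.Dictionary", "/opt/pears/my.comp.Dictionary");
    String mainRoot = "/opt/pears/my.comp.Main";

    System.out.println(expand("$main_root/data", mainRoot, delegates));
    System.out.println(expand("$my.comp.Dictionary$root/resources/dict", mainRoot, delegates));
  }
}
```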
+
+[[ugr.ref.pear.installation_descriptor.id_name_desc]]
+==== The ID, NAME, and DESC tags
+
+These tags specify the component ID, name, and descriptor path as follows:
+[source]
+----
+<SUBMITTED_COMPONENT>
+  <ID>submitted_component_id</ID>
+  <NAME>Submitted component name</NAME>
+  <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>
+----
+
+[[ugr.ref.pear.installation_descriptor.deployment_type]]
+==== Tags related to deployment types
+
+As mentioned before, there are currently three types of PEAR packages, one for each of the following deployment types:
+
+[[ugr.ref.pear.installation_descriptor.deployment_type.standard]]
+===== Standard Type
+
+A component package with the *standard* type must be a valid UIMA Analysis Engine, and all the required files to deploy it must be included in the PEAR package.
+This deployment type should be specified as follows: 
+[source]
+----
+<DEPLOYMENT>standard</DEPLOYMENT>
+----
+
+[[ugr.ref.pear.installation_descriptor.deployment_type.service]]
+===== Service Type
+
+A component package with the *service* type must be deployable locally as a supported UIMA service (e.g. Vinci).
+The installation descriptor must include the path for the executable or script to start the service including its arguments, and the working directory from where to launch it, following this template:
+
+[source]
+----
+<DEPLOYMENT>service</DEPLOYMENT>
+<SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
+<SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
+<SERVICE_COMMAND_ARGS>
+  <ARGUMENT>
+    <VALUE>1st_parameter_value</VALUE>
+    <COMMENTS>1st parameter description</COMMENTS>
+  </ARGUMENT>
+  <ARGUMENT>
+    <VALUE>2nd_parameter_value</VALUE>
+    <COMMENTS>2nd parameter description</COMMENTS>
+  </ARGUMENT>
+</SERVICE_COMMAND_ARGS>
+----
+
+[[ugr.ref.pear.installation_descriptor.deployment_type.network]]
+===== Network Type
+
+A component package with the network type is not deployed locally, but rather in a "`remote`" environment.
+It's accessed as a network AE (e.g. Vinci Service).
+In this case, the PEAR package does not have to contain files required for deployment, but must contain the network AE descriptor.
+The `<DESC>` tag in the installation descriptor must point to the network AE descriptor.
+Here is a template in the case of Vinci services: 
+
+[source]
+----
+<DEPLOYMENT>network</DEPLOYMENT>
+<NETWORK_PARAMETERS>
+  <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
+</NETWORK_PARAMETERS>
+----
+
+[[ugr.ref.pear.installation_descriptor.collection_reader_cas_consumer]]
+==== The Collection Reader and CAS Consumer tags
+
+These sections of the installation descriptor specify an optional Collection Reader or CAS Consumer to be used with the packaged analysis engine, for example when testing it.
+
+[[ugr.ref.pear.installation_descriptor.installation]]
+==== The INSTALLATION section
+
+The `<INSTALLATION>` section specifies the external dependencies of the component and the operations that should be performed during the PEAR package installation.
+
+The component dependencies are specified in the `<DELEGATE_COMPONENT>` sub-sections, as shown in the installation descriptor template above.
+
+Important: The ID element should be the first element in each `<DELEGATE_COMPONENT>` sub-section.
+
+The `<INSTALLATION>` section may specify the following operations: 
+
+* Setting environment variables that are required to run the installed component.
++
+This is also how you specify additional classpaths for a Java component: by setting an environment variable named `CLASSPATH`.
+The `buildComponentClassPath` method of the `PackageBrowser` class builds a classpath string from what it finds in the `CLASSPATH` specification here, and adds a classpath entry for every Jar in the `lib` directory.
+Because of this, there is no need to specify classpath entries for Jars in the lib directory when using the Eclipse plugin PEAR packager or the Maven PEAR packager.
++
+[quote]
+When specifying the value of the CLASSPATH environment variable, use the semicolon ";" as the separator character, regardless of the target operating system conventions.
+This delimiter will be replaced with the right one for the operating system during PEAR installation.
++
+If your component needs to set the UIMA datapath, you must specify the necessary datapath setting using an environment variable with the key ``uima.datapath``.
+When such a key is specified, the `getComponentDataPath` method of the `PackageBrowser` class will return the specified datapath settings for your component.
++
+[WARNING]
+====
+Do not put UIMA Framework Jars into the lib directory of your PEAR; doing so will cause system failures due to class loading issues.
+====
++
+Note that you can use "`macros`", like $main_root or $component_id$root, in the VAR_VALUE element of the <PARAMETERS> sub-section.
+* Finding and replacing string expressions in files.
++
+Note that you can use the "`macros`" in the FILE and REPLACE_WITH elements of the <PARAMETERS> sub-section.
+
+Important: the ACTION element always should be the 1st element in each <PROCESS> sub-section.
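+
+The semicolon rule for `CLASSPATH` values described above can be illustrated with a small sketch.
+It is not the installer's code: `ClasspathSeparatorSketch` and `localize` are hypothetical names, and the sketch simply assumes that each semicolon-separated entry has its `$main_root` macro expanded and is then joined with the separator of the target platform.
+

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of the UIMA API.
class ClasspathSeparatorSketch {

  // Converts a PEAR-style, semicolon-separated classpath into a local one.
  static String localize(String pearClasspath, String mainRoot, String separator) {
    StringBuilder sb = new StringBuilder();
    for (String entry : pearClasspath.split(";")) {
      if (entry.isEmpty()) {
        continue; // skip empty entries, e.g. from a leading ';'
      }
      if (sb.length() > 0) {
        sb.append(separator);
      }
      // Expand the $main_root macro to the installation directory
      sb.append(entry.replace("$main_root", mainRoot));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The installation directory below is an assumed example value
    String local = localize("$main_root/bin;$main_root/resources;",
        "/opt/pears/demo", File.pathSeparator);
    System.out.println(local);
  }
}
```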
+
+By default, the PEAR Installer will try to process every file in the desc and conf directories of the PEAR package in order to find the "`macros`" and replace them with actual path expressions.
+In addition to this, the installer will process the files specified in the <INSTALLATION> section.
+
+Important: all XML files which are going to be processed should be created using UTF-8 or UTF-16 file encoding.
+All other text files which are going to be processed should be created using the ASCII file encoding.
+
+[[ugr.ref.pear.packaging_into_1_file]]
+=== Packaging the PEAR structure into one file
+
+The last step of the PEAR process is to simply *zip* the content of the PEAR root folder (**not including the root folder itself**) to a PEAR file with the extension "`$$.$$pear`".
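+
+For illustration, the zipping step can be sketched with the standard `java.util.zip` classes.
+The class `PearZipSketch` and its `zipPearRoot` method are hypothetical and are not the PEAR packaging API; the point of the sketch is only that entry names are made relative to the PEAR root folder, so the root folder itself does not appear in the archive.
+It assumes the target file lies outside the PEAR root folder.
+

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not part of the UIMA API.
class PearZipSketch {

  // Zips the content of pearRoot into pearFile and returns the entry names written.
  static List<String> zipPearRoot(Path pearRoot, Path pearFile) throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(pearRoot)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    List<String> names = new ArrayList<>();
    try (OutputStream out = Files.newOutputStream(pearFile);
         ZipOutputStream zip = new ZipOutputStream(out)) {
      for (Path file : files) {
        // Entry names are relative to the root, so the root folder is not included
        String name = pearRoot.relativize(file).toString().replace('\\', '/');
        zip.putNextEntry(new ZipEntry(name));
        Files.copy(file, zip);
        zip.closeEntry();
        names.add(name);
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny throwaway PEAR structure in a temporary directory
    Path root = Files.createTempDirectory("pearRoot");
    Files.createDirectories(root.resolve("metadata"));
    Files.write(root.resolve("metadata").resolve("install.xml"),
        "<COMPONENT_INSTALLATION_DESCRIPTOR/>".getBytes("UTF-8"));
    Path pear = Files.createTempFile("demo", ".pear");
    System.out.println(zipPearRoot(root, pear));
  }
}
```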
+
+To do this you can either use the xref:tools.adoc#ugr.tools.pear.packager[PEAR packaging tools] or you can use the PEAR packaging API that is shown below.
+
+To use the PEAR packaging API you first have to create the necessary information for the PEAR package: 
+
+[source]
+----
+// define PEAR data
+String componentID = "AnnotComponentID";
+String mainComponentDesc = "desc/mainComponentDescriptor.xml";
+String classpath = "$main_root/bin;";
+String datapath = "$main_root/resources;";
+String mainComponentRoot = "/home/user/develop/myAnnot";
+String targetDir = "/home/user/develop";
+Properties annotatorProperties = new Properties();
+annotatorProperties.setProperty("sysProperty1", "value1");
+----
+
+To create a complete PEAR package in one step call: 
+
+[source]
+----
+PackageCreator.generatePearPackage(
+   componentID, mainComponentDesc, classpath, datapath, 
+   mainComponentRoot, targetDir, annotatorProperties);
+----
+
+The created PEAR package has the file name `<componentID>.pear` and is located in the `<targetDir>`. 
+
+To create just the PEAR installation descriptor in the main component root directory call: 
+
+[source]
+----
+PackageCreator.createInstallDescriptor(componentID, mainComponentDesc,
+   classpath, datapath, mainComponentRoot, annotatorProperties);
+----
+
+To package a PEAR file with an existing installation descriptor call: 
+
+[source]
+----
+PackageCreator.createPearPackage(componentID, mainComponentRoot,
+   targetDir);
+----
+
+The created PEAR package has the file name `<componentID>.pear` and is located in the `<targetDir>`. 
+
+[[ugr.ref.pear.installing]]
+== Installing a PEAR package
+
+A PEAR package can be installed using the PEAR installer tool (see the xref:tools.adoc#ugr.tools.pear.installer[PEAR Installer User's Guide]), or directly by an application using the PEAR APIs.
+
+During the PEAR installation, the PEAR file is extracted to the installation directory and the PEAR macros in the descriptors are updated with the corresponding path.
+At the end of the installation, the PEAR verification is called to check whether the installed PEAR package can be started successfully.
+The PEAR verification uses the classpath, datapath and the system property settings of the PEAR package to verify the PEAR content.
+Necessary Java library path settings for native libraries, PATH variable settings or system environment variables cannot be recognized automatically, and the user must take care of them manually.
+
+[NOTE]
+====
+By default the PEAR packages are not installed directly to the specified installation directory.
+Instead, a subdirectory named after the PEAR's ID is created for each PEAR, and the PEAR package is installed there.
+If the PEAR installation directory already exists, the old content is automatically deleted before the new content is installed.
+====
+
+[[ugr.ref.pear.installing_pear_using_api]]
+=== Installing a PEAR file using the PEAR APIs
+
+The example below shows how to use the PEAR APIs to install a PEAR package and access the installed PEAR package data.
+For more details about the PackageBrowser API, please refer to the Javadocs for the `org.apache.uima.pear.tools` package. 
+
+[source]
+----
+File installDir = new File("/home/user/uimaApp/installedPears");
+File pearFile = new File("/home/user/uimaApp/testpear.pear");
+boolean doVerification = true;
+
+try {
+  // install PEAR package
+  PackageBrowser instPear = PackageInstaller.installPackage(
+      installDir, pearFile, doVerification);
+
+  // retrieve installed PEAR data
+  // PEAR package classpath
+  String classpath = instPear.buildComponentClassPath();
+  // PEAR package datapath
+  String datapath = instPear.getComponentDataPath();
+  // PEAR package main component descriptor
+  String mainComponentDescriptor = instPear
+      .getInstallationDescriptor().getMainComponentDesc();
+  // PEAR package component ID
+  String mainComponentID = instPear
+      .getInstallationDescriptor().getMainComponentId();
+  // PEAR package pear descriptor
+  String pearDescPath = instPear.getComponentPearDescPath();
+
+  // print out settings
+  System.out.println("PEAR package class path: " + classpath);
+  System.out.println("PEAR package datapath: " + datapath);
+  System.out.println("PEAR package mainComponentDescriptor: "
+      + mainComponentDescriptor);
+  System.out.println("PEAR package mainComponentID: "
+      + mainComponentID);
+  System.out.println("PEAR package specifier path: " + pearDescPath);
+
+} catch (PackageInstallerException ex) {
+  // PEAR installation failed
+  ex.printStackTrace();
+  System.out.println("PEAR installation failed");
+} catch (IOException ex) {
+  ex.printStackTrace();
+  System.out.println("Error retrieving installed PEAR settings");
+}
+----
+
+The example below shows how to run a PEAR package after it has been installed using the PEAR API.
+It uses the PEAR specifier that was automatically generated during the PEAR installation.
+For more details about the APIs, please refer to the Javadocs.
+
+[source]
+----
+File installDir = new File("/home/user/uimaApp/installedPears");
+File pearFile = new File("/home/user/uimaApp/testpear.pear");
+boolean doVerification = true;
+
+try {
+
+  // Install PEAR package
+  PackageBrowser instPear = PackageInstaller.installPackage(
+      installDir, pearFile, doVerification);
+
+  // Create a default resource manager
+  ResourceManager rsrcMgr = UIMAFramework.newDefaultResourceManager();
+
+  // Create analysis engine from the installed PEAR package using
+  // the created PEAR specifier
+  XMLInputSource in = 
+        new XMLInputSource(instPear.getComponentPearDescPath());
+  ResourceSpecifier specifier =
+        UIMAFramework.getXMLParser().parseResourceSpecifier(in);
+  AnalysisEngine ae = 
+        UIMAFramework.produceAnalysisEngine(specifier, rsrcMgr, null);
+
+  // Create a CAS with a sample document text
+  CAS cas = ae.newCAS();
+  cas.setDocumentText("Sample text to process");
+  cas.setDocumentLanguage("en");
+
+  // Process the sample document
+  ae.process(cas);
+} catch (Exception ex) {
+  ex.printStackTrace();
+}
+----
+
+[[ugr.ref.pear.specifier]]
+== PEAR package descriptor
+
+To run an installed PEAR package directly in the UIMA framework, the `pearSpecifier` XML descriptor can be used.
+Typically, such a specifier is automatically generated during the PEAR installation and contains all the necessary information to run the installed PEAR package.
+Settings for system environment variables, system PATH settings or Java library path settings cannot be recognized automatically and must be set manually when the JVM is started.
+
+[NOTE]
+====
+The PEAR may contain specifications for "environment variables" and their settings.
+When such a PEAR is run directly in the UIMA framework, those settings (except for Classpath and Data Path) are converted to Java System properties, and set to the specified values.
+Java cannot set true environment variables; if such a setting is needed, the application has to arrange for this before invoking Java.
+
+The Classpath and Data Path settings are used by UIMA to configure a special Resource Manager that is used when code from this PEAR is being run.
+====
+
+The generated PEAR descriptor is located in the component root directory of the installed PEAR package and has a filename like `<componentID>_pear.xml`.
+
+The PEAR package descriptor looks like: 
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<pearSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+   <pearPath>/home/user/uimaApp/installedPears/testpear</pearPath>
+   <pearParameters>     <!-- optional -->
+      <nameValuePair>   <!-- any number, repeated -->
+         <name>param1</name>
+         <value><string>stringVal1</string></value>
+      </nameValuePair>
+   </pearParameters>
+   <parameters>         <!-- optional legacy string-valued parameters -->
+      <parameter>       <!-- any number, repeated -->
+        <name>name-of-the-parameter</name>
+        <value>string-value</value>
+      </parameter>
+   </parameters>
+</pearSpecifier>
+----
+
+The `pearPath` setting in the descriptor must point to the component root directory  of the installed PEAR package. 
+
+[NOTE]
+====
+It is not possible to share resources between PEAR Analysis Engines that are instantiated using the PEAR descriptor.
+The PEAR runtime created for each PEAR descriptor has its own specific `ResourceManager` (unless exactly the same Classpath and Data Path are being used). 
+====
+
+The optional `pearParameters` section, if used, specifies parameter values, which are used to customize / override parameter values in the PEAR descriptor.
+The format for parameter values used here is the same as in xref:ref.adoc#ugr.ref.aes.configuration_parameter_settings[component parameters].
+External Settings overrides continue to work for PEAR descriptors, and have precedence, if specified. 
+
+Additionally, there can be a `parameters` section.
+This section supports only string-valued parameters.
+This way of specifying parameters is deprecated and should no longer be used.
+Support for it will eventually be removed in a future version of Apache UIMA.
+Parameters set in the `pearParameters` section have precedence over parameters defined in the `parameters` section.
+For the time being, both sections can be present simultaneously in a PEAR specifier.
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.resources.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.resources.adoc
new file mode 100644
index 0000000..e7a30db
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.resources.adoc
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.resources]]
+= UIMA Resources
+// <titleabbrev>UIMA Resources</titleabbrev>
+
+
+[[ugr.ref.resources.overview]]
+== What is a UIMA Resource?
+
+UIMA uses the term `Resource` to describe all UIMA components that can be acquired by an application or by other resources.
+
+.Resource Kinds
+image::images/references/ref.resources/res_resource_kinds.png["Resource Kinds, a partial list"]
+
+There are many kinds of resources; here's a list of the main kinds: 
+
+*Annotator*::
+a user written component, receives a CAS, does some processing, and returns the possibly updated CAS.
+Variants include CollectionReaders, CAS Consumers, CAS Multipliers.
+
+*Flow Controller*::
+a user written component controlling the flow of CASes within an aggregate.
+
+*External Resource*::
+a user written component.
+Variants include: 
++
+
+* Data - includes special lifecycle call to load data
+* Parameterized - allows multiple instantiations with simple string parameter variants; example: a dictionary, that has variants in content for different languages
+* Configurable - supports configuration from the XML specifier
+
+
+[[ugr.ref.resources.resource_inner_implementations]]
+=== Resource Inner Implementations
+
+Many of the resource kinds include in their specification a (possibly optional) element which is the name of a Java class that implements the resource.
+We will call this class the "inner implementation".
+
+The UIMA framework creates instances of Resource from resource specifiers by calling the framework's `produceResource(specifier, additional_parameters)` method.
+This call produces an instance of Resource.
+
+____
+For example, calling produceResource on an AnalysisEngineDescription produces an instance of AnalysisEngine.
+This, in turn, will have a reference to the user-written inner implementation class specified by the ``annotatorImplementationName``.
+
+External resource descriptors may include an `implementationName` element.
+Calling produceResource on an ExternalResourceDescription produces an instance of Resource; the resource obtained by subsequent calls to `getResource(...)` is dependent on the particular descriptor, and may be an instance of the inner implementation class.
+____
+
+For external resources, each resource specifier kind handles the case where  the inner implementation is omitted.
+If it is supplied, the named class must implement the interface specified in the bindings for this resource.
+In addition, the particular specifier kind may  further restrict the kinds of classes the user supplies as the implementationName. 
+
+Some examples of this further restriction: 
+
+*customResource*::
+the class must also implement the Resource interface
+
+*dataResource*::
+the class must also implement the SharedResourceObject interface
+
+[[ugr.ref.resources.sharing_across_pipelines]]
+== Sharing Resources, even across pipelines
+// <titleabbrev>Sharing Resources</titleabbrev>
+
+UIMA applications run one or more UIMA Pipelines.
+Each pipeline has a top-level Analysis Engine, which may be an aggregation of many other Analysis Engine components.
+The UIMA framework instantiates Annotator  resources as specified to configure the pipelines.
+
+Sometimes, many identical pipelines are created (for example, in order to exploit multi-core hardware by processing multiple CASes in parallel).
+In this case, the framework produces multiple instances of those Annotator resources; these are implemented as multiple instances of the same Java class.
+
+Sets of External Resources plus a CAS Pool and UIMA Extension ClassLoader are set up and kept,  per instance of a ResourceManager;  this instance serves to allow sharing of these items across one or more pipelines. 
+
+* The UIMA Extension ClassLoader (if specified) is used to find the resources to be loaded by the framework
+* The `External Resources` are specified by a pipeline's resource configuration.
+* The CAS Pool is a pool of CASs all with identical type systems and index definitions, associated  with a pipeline.
+
+When setting up a pipeline, the UIMA Framework's `produceResource` or one of its specialized variants is called, and a new ResourceManager is created and used for that pipeline.
+However, in many cases, it may be advantageous to share the same Resources across multiple pipelines; this is easily doable by passing a common instance of the ResourceManager to the pipeline creation methods (using the additional parameters of the produceResource method).
+
+To handle additional use cases, the ResourceManager has a `copy()` method which creates a copy of the Resource Manager instance.
+The new instance is created with a null CAS Manager; if you want to share the CAS Pool, you have to copy the CAS Manager: ``newRM.setCasManager(originalRM.getCasManager())``.
+You also may set the Extension Class Loader in the new instance (PEAR wrappers use this to allow PEARs to have their own classpath).  See the Javadocs for details. 
+
+[[ugr.ref.resources.external_resource_multiple_parameterized_instances]]
+== External Resources support for multiple Parameterized Instances
+
+A typical external resource gets a single instantiation, shared with all users of a particular ResourceManager.
+Sometimes, multiple instantiations of the same resource may be useful.  The framework supports this for ParameterizedDataResources.
+There's one kind supplied with UIMA - the fileLanguageResourceSpecifier.
+This works by having each call to getResource(name, extra_keys[]) use the extra keys to select a particular instance.
+On the first call for a particular instance, the named resource uses the extra keys to  initialize a new instance by calling its `load` method with a data resource derived from the  extra keys by the named resource. 
+
+For example, the fileLanguageResourceSpecifier uses the language code and goes through  a process with lots of defaulting and fall back to find a resource to load, based on the language code. 
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.xmi.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.xmi.adoc
new file mode 100644
index 0000000..0374123
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.xmi.adoc
@@ -0,0 +1,354 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.xmi]]
+= XMI CAS Serialization Reference
+
+This is the specification for the mapping of the UIMA CAS into the XMI (XML Metadata Interchangefootnote:[For details on XMI see Grose et al. Mastering
+    XMI. Java Programming with XMI, XML, and UML. John Wiley & Sons, Inc.
+    2002.]) format.
+XMI is an OMG standard for expressing object graphs in XML.
+The UIMA SDK provides support for XMI through the classes `org.apache.uima.cas.impl.XmiCasSerializer` and ``org.apache.uima.cas.impl.XmiCasDeserializer``.
+
+[[ugr.ref.xmi.xmi_tag]]
+== XMI Tag
+
+The outermost tag is <XMI> and must include a version number and XML namespace attribute: 
+[source]
+----
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI">
+  <!-- CAS Contents here -->
+</xmi:XMI>
+----
+
+XML namespacesfootnote:[http://www.w3.org/TR/xml-names11/] are used throughout.
+The "`xmi`" namespace prefix is used to identify elements and attributes that are defined by the XMI specification.
+The XMI document will also define one namespace prefix for each CAS namespace, as described in the next section.
+
+[[ugr.ref.xmi.feature_structures]]
+== Feature Structures
+
+UIMA Feature Structures are mapped to XML elements.
+The name of the element is formed from the CAS type name, making use of XML namespaces as follows.
+
+The CAS type namespace is converted to an XML namespace URI by the following rule: replace all dots with slashes, prepend `http:///`, and append `.ecore`.
+
+This mapping was chosen because it is the default mapping used by the Eclipse Modeling Framework (EMF)footnote:[For details on EMF and Ecore see Budinsky et al. Eclipse Modeling Framework 2.0. Addison-Wesley. 2006.] to create namespace URIs from Java package names.
+The use of the http scheme is a common convention, and does not imply any HTTP communication.
+The `.ecore` suffix is due to the fact that the recommended type system definition for a namespace is an xref:tug.adoc#ugr.tug.xmi_emf[ECore model].
+
+Consider the CAS type name `org.myproj.Foo`.
+The CAS namespace (`org.myproj`) is converted to the XML namespace URI `http:///org/myproj.ecore`.
+
+The XML element name is then formed by concatenating the XML namespace prefix (which is an arbitrary token, but typically we use the last component of the CAS namespace) with the type name (excluding the namespace).
+
+So the example `org.myproj.Foo` Feature Structure is written to XMI as: 
+[source]
+----
+<xmi:XMI 
+    xmi:version="2.0" 
+    xmlns:xmi="http://www.omg.org/XMI" 
+    xmlns:myproj="http:///org/myproj.ecore">
+  ...
+  <myproj:Foo xmi:id="1"/>
+  ...
+</xmi:XMI>
+----
+
+The `xmi:id` attribute is only required if this object will be referred to from elsewhere in the XMI document.
+If provided, the `xmi:id` must be unique among the objects in the XMI document.
+
+All namespace prefixes (e.g. `myproj`) in this example must be bound to URIs using the `xmlns:...` attribute, as defined by the XML namespaces specification.
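+
+As an illustrative sketch of the mapping rule, a document that mixes built-in and user-defined types could declare one prefix per CAS namespace as shown here.
+The `cas` and `tcas` prefixes follow the "last component of the namespace" convention described above; the element contents are placeholders, not a complete CAS.
+
+[source,xml]
+----
+<xmi:XMI xmi:version="2.0"
+    xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:cas="http:///uima/cas.ecore"
+    xmlns:tcas="http:///uima/tcas.ecore"
+    xmlns:myproj="http:///org/myproj.ecore">
+  <!-- uima.cas.*   -> cas:...    -->
+  <!-- uima.tcas.*  -> tcas:...   -->
+  <!-- org.myproj.* -> myproj:... -->
+  <myproj:Foo xmi:id="1"/>
+</xmi:XMI>
+----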
+
+[[ugr.ref.xmi.primitive_features]]
+== Primitive Features
+
+CAS features of primitive types (String, Boolean, Byte, Short, Integer, Long, Float, or Double) can be mapped either to XML attributes or XML elements.
+For example, a CAS FeatureStructure of type org.myproj.Foo, with features: 
+[source]
+----
+begin     = 14
+end       = 19
+myFeature = "bar"
+----
+could be mapped to: 
+
+[source]
+----
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:myproj="http:///org/myproj.ecore">
+  ...
+  <myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/>
+  ...
+</xmi:XMI>
+----
+
+or equivalently: 
+
+[source]
+----
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:myproj="http:///org/myproj.ecore">
+  ...
+  <myproj:Foo xmi:id="1">
+    <begin>14</begin>
+    <end>19</end>
+    <myFeature>bar</myFeature>
+  </myproj:Foo>
+  ...
+</xmi:XMI>
+----
+
+The attribute serialization is preferred for compactness, but either representation is allowable.
+Mixing the two styles is allowed; some features can be represented as attributes and others as elements.
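+
+A minimal sketch of a mixed serialization of the same `org.myproj.Foo` example, with `begin` and `end` written as attributes and `myFeature` written as a child element:
+
+[source,xml]
+----
+<myproj:Foo xmi:id="1" begin="14" end="19">
+  <myFeature>bar</myFeature>
+</myproj:Foo>
+----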
+
+[[ugr.ref.xmi.reference_features]]
+== Reference Features
+
+CAS features that are references to other feature structures (excluding arrays and lists, which are handled separately) are serialized as ID references.
+
+If we add to the previous CAS example a feature structure of type org.myproj.Baz, with feature "`myFoo`" that is a reference to the Foo object, the serialization would be: 
+
+[source]
+----
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:myproj="http:///org/myproj.ecore">
+  ...
+  <myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/>
+  <myproj:Baz xmi:id="2" myFoo="1"/>
+  ...
+</xmi:XMI>
+----
+
+As with primitive-valued features, it is permitted to use an element rather than an attribute.
+However, the syntax is slightly different:
+
+[source]
+----
+<myproj:Baz xmi:id="2">
+   <myFoo href="#1"/>
+</myproj:Baz>
+----
+
+Note that in the attribute representation, a reference feature is indistinguishable from an integer-valued feature, so the meaning cannot be determined without prior knowledge of the type system.
+The element representation is unambiguous.
+
+[[ugr.ref.xmi.array_and_list_features]]
+== Array and List Features
+
+For a CAS feature whose range type is one of the CAS array or list types, the XMI serialization depends on the setting of the "`multipleReferencesAllowed`" attribute for that feature in the xref:ref.adoc#ugr.ref.xml.component_descriptor.type_system.features[UIMA Type System Description].
+
+An array or list with `multipleReferencesAllowed = false` (the default) is serialized as a __multi-valued__ property in XMI.
+An array or list with `multipleReferencesAllowed = true` is serialized as a first-class object.
+Details are described below.
+
+[[ugr.ref.xmi.array_and_list_features.as_multi_valued_properties]]
+=== Arrays and Lists as Multi-Valued Properties
+
+A multi-valued property is the most natural XMI representation for most cases.
+Consider the example where the FeatureStructure of type `org.myproj.Baz` has a feature `myIntArray` whose value is the integer array `{2,4,6}`.
+This can be mapped to: 
+
+[source]
+----
+<myproj:Baz xmi:id="3" myIntArray="2 4 6"/>
+---- 
+
+or equivalently: 
+
+[source]
+----
+<myproj:Baz xmi:id="3">
+  <myIntArray>2</myIntArray>
+  <myIntArray>4</myIntArray>
+  <myIntArray>6</myIntArray>
+</myproj:Baz>
+----
+
+Note that String arrays whose elements contain embedded spaces MUST use the latter mapping.
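+
+To illustrate why, consider a hypothetical String array feature `myStringArray` (not part of the examples above) holding the two values `New York` and `San Francisco`.
+Written as a space-separated attribute, it would be read back as four values, so the element form is the only unambiguous choice:
+
+[source,xml]
+----
+<myproj:Baz xmi:id="3">
+  <!-- each child element carries exactly one array entry, spaces included -->
+  <myStringArray>New York</myStringArray>
+  <myStringArray>San Francisco</myStringArray>
+</myproj:Baz>
+----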
+
+`FSArray` or `FSList` features are serialized in a similar way.
+For example an `FSArray` feature that contains references to the elements with `xmi:id`'s `13` and `42` could be serialized as: 
+
+[source]
+----
+<myproj:Baz xmi:id="3" myFsArray="13 42"/>
+---- 
+
+or: 
+
+[source]
+----
+<myproj:Baz xmi:id="3">
+  <myFsArray href="#13"/>
+  <myFsArray href="#42"/>
+</myproj:Baz>
+----
+
+[[ugr.ref.xmi.array_and_list_features.as_1st_class_objects]]
+=== Arrays and Lists as First-Class Objects
+
+The multi-valued-property representation described in the previous section does not allow multiple references to an array or list object.
+Therefore, it cannot be used for features that are defined to allow multiple references (i.e. features for which multipleReferencesAllowed = true in the Type System Description).
+
+When `multipleReferencesAllowed` is set to true, array and list features are serialized as references, and the array or list objects are serialized as separate objects in the XMI.
+Consider again the example where the Feature Structure of type `org.myproj.Baz` has a feature `myIntArray` whose value is the integer array `{2,4,6}`. If `myIntArray` is defined with multipleReferencesAllowed=true, the serialization will be as follows: 
+
+[source]
+----
+<myproj:Baz xmi:id="3" myIntArray="4"/>
+----
+
+or: 
+
+[source]
+----
+<myproj:Baz xmi:id="3">
+  <myIntArray href="#4"/>
+</myproj:Baz>
+----
+
+with the array object serialized as 
+
+[source]
+----
+<cas:IntegerArray xmi:id="4" elements="2 4 6"/>
+----
+
+or: 
+
+[source]
+----
+<cas:IntegerArray xmi:id="4">
+  <elements>2</elements>
+  <elements>4</elements>
+  <elements>6</elements>
+</cas:IntegerArray>
+----
+
+Note that in this case, the XML element name is formed from the CAS type name (e.g. `uima.cas.IntegerArray`) in the same way as for other Feature Structures.
+The elements of the array are serialized either as a space-separated attribute named `elements` or as a series of child elements named `elements`.
+
+List nodes are just standard FeatureStructures with `head` and `tail` features, and are serialized using the normal Feature Structure serialization.
+For example, an `IntegerList` with the values `2`, `4`, and `6` would be serialized as the four objects: 
+[source]
+----
+<cas:NonEmptyIntegerList xmi:id="10" head="2" tail="11"/>
+<cas:NonEmptyIntegerList xmi:id="11" head="4" tail="12"/>
+<cas:NonEmptyIntegerList xmi:id="12" head="6" tail="13"/>
+<cas:EmptyIntegerList xmi:id="13"/>
+----
+
+This representation of arrays allows multiple references to an array or list.
+It also allows a feature with range type TOP to refer to an array or list.
+However, it is a very unnatural representation in XMI and does not support interoperability with other XMI-based systems, so we instead recommend using the multi-valued-property representation described in the previous section whenever it is possible.
+
+When a feature is specified in the descriptor without a multipleReferencesAllowed attribute, or with the attribute specified as `false`, but the framework discovers multiple references during serialization, it will issue a message to the log saying that it discovered this (look for the phrase __serialized in duplicate__).
+The serialization will continue, but the multiply-referenced items will  be serialized in duplicate.
+
+[[ugr.ref.xmi.null_array_list_elements]]
+=== Null Array/List Elements
+
+In UIMA, an element of an FSArray or FSList may be null.
+In XMI, multi-valued properties do not permit null values.
+As a workaround for this, we use a dummy instance of the special type `cas:NULL`, which has `xmi:id="0"`.
+For example, in the following serialization the "`myFsArray`" feature refers to an FSArray whose second element is null: 
+
+[source]
+----
+<cas:NULL xmi:id="0"/>
+<myproj:Baz xmi:id="3">
+  <myFsArray href="#13"/>
+  <myFsArray href="#0"/>
+  <myFsArray href="#42"/>
+</myproj:Baz>
+----
+
+[[ugr.ref.xmi.sofas_views]]
+== Subjects of Analysis (Sofas) and Views
+
+A UIMA CAS contains one or more subjects of analysis (Sofas). These are serialized no differently from any other feature structure.
+For example: 
+
+[source]
+----
+<?xml version="1.0"?>
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:cas="http:///uima/cas.ecore">
+  <cas:Sofa xmi:id="1" sofaNum="1"
+      text="the quick brown fox jumps over the lazy dog."/>
+</xmi:XMI>
+----
+
+Each Sofa defines a separate View.
+Feature Structures in the CAS can be members of one or more views.
+(A Feature Structure that is a member of a view is indexed in its IndexRepository, but that is an implementation detail.)
+
+In the XMI serialization, views will be represented as first-class objects.
+Each View has an (optional) "`sofa`" feature, which references a sofa, and a multi-valued reference to the members of the View.
+For example:
+
+[source]
+----
+<cas:View sofa="1" members="3 7 21 39 61"/>
+----
+
+Here the integers 3, 7, 21, 39, and 61 refer to the xmi:id fields of the objects that are members of this view.
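+
+Putting the pieces together, a simplified sketch of a complete document with one Sofa, two member feature structures, and the corresponding View might look as follows.
+The `Foo` instances and their offsets are illustrative only, and a real serialization would typically carry additional features.
+
+[source,xml]
+----
+<?xml version="1.0"?>
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:cas="http:///uima/cas.ecore"
+    xmlns:myproj="http:///org/myproj.ecore">
+  <cas:Sofa xmi:id="1" sofaNum="1"
+      text="the quick brown fox jumps over the lazy dog."/>
+  <myproj:Foo xmi:id="3" begin="4" end="9" myFeature="bar"/>
+  <myproj:Foo xmi:id="7" begin="10" end="15" myFeature="bar"/>
+  <!-- the View references its Sofa and lists its members by xmi:id -->
+  <cas:View sofa="1" members="3 7"/>
+</xmi:XMI>
+----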
+
+[[ugr.ref.xmi.linking_to_ecore_type_system]]
+== Linking an XMI Document to its Ecore Type System
+// <titleabbrev>Linking XMI docs to Ecore Type System</titleabbrev>
+
+If the CAS Type System has been saved to an xref:tug.adoc#ugr.tug.xmi_emf[Ecore file], it is possible to store a link from an XMI document to that Ecore type system.
+This is done using an `xsi:schemaLocation` attribute on the root XMI element.
+
+The `xsi:schemaLocation` attribute is a space-separated list that represents a mapping from namespace URI (e.g.
+`http:///org/myproj.ecore`) to the physical URI of the `.ecore` file containing the type system for that namespace.
+For example: 
+
+[source]
+----
+xsi:schemaLocation=
+  "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore"
+----
+
+would indicate that the definition for the org.myproj CAS types is contained in the file `c:/typesystems/myproj.ecore`.
+You can specify a different mapping for each of your CAS namespaces, using a space separated list.
+For details see Budinsky et al. __Eclipse Modeling Framework__.
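+
+As a sketch of how several mappings can be combined on the root element, the following assumes a second, hypothetical CAS namespace `org.otherproj` and hypothetical file locations.
+Note that the `xsi` prefix itself must be bound to the XML Schema instance namespace.
+
+[source,xml]
+----
+<xmi:XMI xmi:version="2.0"
+    xmlns:xmi="http://www.omg.org/XMI"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xmlns:myproj="http:///org/myproj.ecore"
+    xmlns:otherproj="http:///org/otherproj.ecore"
+    xsi:schemaLocation=
+      "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore
+       http:///org/otherproj.ecore file:/c:/typesystems/otherproj.ecore">
+  ...
+</xmi:XMI>
+----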
+
+[[ugr.ref.xmi.delta]]
+== Delta CAS XMI Format
+
+The Delta CAS XMI serialization format is designed primarily to reduce the serialization overhead when calling annotators configured as services.
+Only Feature Structures and Views that are new or modified by the service   are serialized and returned by the service. 
+
+The classes `org.apache.uima.cas.impl.XmiCasSerializer` and `org.apache.uima.cas.impl.XmiCasDeserializer` support serialization of only the modifications to the CAS.
+A caller is expected to set a marker to indicate the point from which changes to the CAS are to be tracked. 
+
+A Delta CAS XMI document contains only the Feature Structures and Views that have been added or modified.
+The new and modified Feature Structures are represented in exactly the same format as in a complete CAS serialization.
+The `cas:View` element has been extended with three additional attributes to represent modifications to View membership.
+These new attributes are ``added_members``, `deleted_members` and ``reindexed_members``.
+For example: 
+
+[source]
+----
+<cas:View sofa="1" added_members="63 77" 
+          deleted_members="7 61" reindexed_members="39" />
+----
+
+Here the integers 63 and 77 are the xmi:id fields of the objects that have been newly added as members of this View, 7 and 61 are the xmi:id fields of the objects that have been removed from this View, and 39 is the xmi:id of an object to be reindexed in this View. 
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.component_descriptor.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.component_descriptor.adoc
new file mode 100644
index 0000000..bc846d3
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.component_descriptor.adoc
@@ -0,0 +1,1772 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.xml.component_descriptor]]
+= Component Descriptor Reference
+
+This chapter is the reference guide for the UIMA SDK's Component Descriptor XML schema.
+A _Component Descriptor_ (also sometimes called a _Resource Specifier_ in the code) is an XML file that either (a) completely describes a component, including all information needed to construct the component and interact with it, or (b) specifies how to connect to and interact with an existing component that has been published as a remote service. _Component_ (also called __Resource__) is a general term for modules produced by UIMA developers and used by UIMA applications.
+The types of Components are: Analysis Engines, Collection Readers, CAS Initializers
+footnote:[This component is deprecated and should not be used in new development.], CAS Consumers, and Collection Processing Engines.
+However, Collection Processing Engine Descriptors are significantly different in format and are covered in a xref:ref.adoc#ugr.ref.xml.cpe_descriptor[separate chapter].
+
+<<ugr.ref.xml.component_descriptor.notation>> describes the notation used in this chapter.
+
+<<ugr.ref.xml.component_descriptor.imports>> describes the UIMA SDK's _import_ syntax, used to allow XML descriptors to import information from other XML files, to allow sharing of information between several XML descriptors.
+
+<<ugr.ref.xml.component_descriptor.aes>> describes the XML format for __Analysis Engine Descriptors__.
+These are descriptors that completely describe Analysis Engines, including all information needed to construct and interact with them.
+
+<<ugr.ref.xml.component_descriptor.collection_processing_parts>> describes the XML format for __Collection Processing Component Descriptors__.
+This includes Collection Iterator, CAS Initializer, and CAS Consumer Descriptors.
+
+<<ugr.ref.xml.component_descriptor.service_client>> describes the XML format for __Service Client Descriptors__, which specify how to connect to and interact with resources deployed as remote services.
+
+<<ugr.ref.xml.component_descriptor.custom_resource_specifiers>> describes the XML format for __Custom Resource Specifiers__, which allow you to plug in your own Java class as a UIMA Resource.
+
+[[ugr.ref.xml.component_descriptor.notation]]
+== Notation
+
+This chapter uses an informal notation to specify the syntax of Component Descriptors.
+The formal syntax is defined by an XML schema definition, which is contained in the file ``resourceSpecifierSchema.xsd``,   located in the `uima-core.jar` file.
+
+The notation used in this chapter is:
+
+* An ellipsis (...) inside an element body indicates that the substructure of that element has been omitted (to be described in another section of this chapter). An example of this would be: 
++
+[source]
+----
+<analysisEngineMetaData>
+...
+</analysisEngineMetaData>
+----
++
+An ellipsis immediately after an element indicates that the element type may be repeated arbitrarily many times.
+For example: 
++
+[source]
+----
+<parameter>[String]</parameter>
+<parameter>[String]</parameter>
+...
+----
++
+indicates that there may be arbitrarily many parameter elements in this context.
+* Bracketed expressions (e.g. ``[String]``) indicate the type of value that may be used at that location.
+* A vertical bar, as in ``true|false``, indicates alternatives. This can be applied to literal values, bracketed type names, and elements.
+* Which elements are optional and which are required is specified in prose, not in the syntax definition. 
+
+
+[[ugr.ref.xml.component_descriptor.imports]]
+== Imports
+
+The UIMA SDK defines a particular syntax for XML descriptors to import information from other XML files.
+When one of the following appears in an XML descriptor: 
+[source]
+----
+<import location="[URL]" /> or
+<import name="[Name]" />
+----
+it indicates that information from a separate XML file is being imported.
+Note that imports are allowed only in certain places in the descriptor.
+In the remainder of this chapter, it will be indicated at which points imports are allowed.
+
+If an import specifies a `location` attribute, the value of that attribute specifies the URL at which the XML file to import will be found.
+This can be a relative URL, which will be resolved relative to the descriptor containing the `import` element, or an absolute URL.
+Relative URLs can be written without a protocol/scheme (e.g., "`file:`"), and without a host machine name.
+In this case the relative URL might look something like `org/apache/myproj/MyTypeSystem.xml`.
+
+An absolute URL is written with one of the following prefixes, followed by a path such as ``org/apache/myproj/MyTypeSystem.xml``: 
+
+* `file:/` ← has no network address
+* `\file:///` ← has an empty network address
+* `\file://some.network.address/`
+
+For more information about URLs, please read the javadoc information for the Java class "`URL`".
+
+If an import specifies a `name` attribute, the value of that attribute should take the form of a Java-style dotted name (e.g. ``org.apache.myproj.MyTypeSystem``). An .xml file with this name will be searched for in the classpath or datapath (described below). As in Java, the dots in the name will be converted to file path separators.
+So an import specifying the example name in this paragraph will result in a search for `org/apache/myproj/MyTypeSystem.xml` in the classpath or datapath.
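+
+As an illustrative sketch, a type system descriptor that imports another type system by name could contain the following.
+It reuses the example name from this paragraph and assumes that the corresponding file is actually present on the classpath or datapath.
+
+[source,xml]
+----
+<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  
+</typeSystemDescription>
+----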
+
+The datapath works similarly to the classpath but can be set programmatically through the resource manager API.
+Application developers can specify a datapath during initialization, using the following code: 
+
+[source]
+----
+ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
+resMgr.setDataPath(yourPathString);
+AnalysisEngine ae = 
+  UIMAFramework.produceAnalysisEngine(desc, resMgr, null);
+----
+
+The default datapath for the entire JVM can be set via the `uima.datapath` Java system property, but this feature should only be used for standalone applications that don't need to run in the same JVM as other code that may need a different datapath.
+
+The value of a name or location attribute may be parameterized with references to external override variables using the `${variable-name}` syntax. 
+
+[source]
+----
+<import location="Annotator${with}ExternalOverrides.xml" />
+----
+
+If a variable is undefined the value is left unmodified and a warning message identifies the missing variable.
+
+Previous versions of UIMA also supported XInclude.
+That support didn't work in many situations, and it is no longer supported.
+To include other files, please use <import>.
+
+[[ugr.ref.xml.component_descriptor.type_system]]
+== Type System Descriptors
+
+A Type System Descriptor is used to define the types and features that can be represented in the CAS.
+A Type System Descriptor can be imported into an Analysis Engine or Collection Processing Component Descriptor.
+
+The basic structure of a Type System Descriptor is as follows: 
+[source]
+----
+<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <name> [String] </name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor> 
+
+  <imports>
+    <import ...>
+    ...
+  </imports> 
+
+  <types>
+    <typeDescription>
+      ...
+    </typeDescription>
+
+    ...
+
+  </types>
+
+</typeSystemDescription>
+----
+
+All of the subelements are optional.
+
+[[ugr.ref.xml.component_descriptor.type_system.imports]]
+=== Imports
+
+The `imports` section allows this descriptor to import types from other type system descriptors.
+The import syntax is described in <<ugr.ref.xml.component_descriptor.imports>>.
+A type system may import any number of other type systems and then define additional types which refer to imported types.
+Circular imports are allowed.
+
+[[ugr.ref.xml.component_descriptor.type_system.types]]
+=== Types
+
+The `types` element contains zero or more `typeDescription` elements.
+Each `typeDescription` has the form: 
+[source]
+----
+<typeDescription>
+  <name>[TypeName]</name>
+  <description>[String]</description>
+  <supertypeName>[TypeName]</supertypeName>
+  <features>
+    ...
+  </features>
+</typeDescription>
+----
+
+The name element contains the name of the type.
+A `[TypeName]` is a dot-separated list of names, where each name consists of a letter followed by any number of letters, digits, or underscores. `TypeNames` are case sensitive.
+Letter and digit are as defined by Java; therefore, any Unicode letter or digit may be used (subject to the character encoding defined by the descriptor file's XML header). The name following the final dot is considered to be the "`short name`" of the type; the preceding portion is the namespace (analogous to the package.class syntax used in Java). Namespaces beginning with uima are reserved and should not be used.
+Examples of valid type names are:
+
+* test.TokenAnnotation
+* org.myorg.TokenAnnotation
+* com.my_company.proj123.TokenAnnotation 
+
+These would all be considered distinct types since they have different namespaces.
+Best practice here is to follow the normal Java naming conventions of having namespaces be all lowercase, with the short type names having an initial capital, but this is not mandated, so `ABC.mYtyPE` is an allowed type name.
+Type names without namespaces (e.g. `TokenAnnotation` alone) are allowed but discouraged, because naming conflicts can then result when combining annotators that use different type systems.
+
+The `description` element contains a textual description of the type.
+The `supertypeName` element contains the name of the type from which it inherits (this can be set to the name of another user-defined type, or it may be set to any built-in type which may be subclassed, such as `uima.tcas.Annotation` for a new annotation type or `uima.cas.TOP` for a new type that is not an annotation). All three of these elements are required.
+
+[[ugr.ref.xml.component_descriptor.type_system.features]]
+=== Features
+
+The `features` element of a `typeDescription` is required only if the type we are specifying introduces new features.
+If the `features` element is present, it contains zero or more `featureDescription` elements, each of which has the form:
+
+[source]
+----
+<featureDescription>
+  <name>[Name]</name>
+  <description>[String]</description>
+  <rangeTypeName>[Name]</rangeTypeName>
+  <elementType>[Name]</elementType>
+  <multipleReferencesAllowed>true|false</multipleReferencesAllowed>
+</featureDescription>
+----
+
+A feature's name follows the same rules as a type short name: a letter followed by any number of letters, digits, or underscores.
+Feature names are case sensitive.
+
+The feature's `rangeTypeName` specifies the type of value that the feature can take.
+This may be the name of any type defined in your type system, or one of the predefined types.
+All of the predefined types have names that are prefixed with `uima.cas` or ``uima.tcas``, for example: 
+
+[source]
+----
+uima.cas.TOP 
+uima.cas.String
+uima.cas.Long 
+uima.cas.FSArray
+uima.cas.StringList
+uima.tcas.Annotation
+----
+For a complete list of predefined types, see the CAS API documentation.
+
+The `elementType` of a feature is optional, and applies only when the `rangeTypeName` is `uima.cas.FSArray` or `uima.cas.FSList`. The `elementType` specifies what type of value can be assigned as an element of the array or list.
+This must be the name of a non-primitive type.
+If omitted, it defaults to ``uima.cas.TOP``, meaning that any FeatureStructure can be assigned as an element of the array or list.
+Note: depending on the CAS Interface that you use in your code, this constraint may or may not be enforced.
+Note: At run time, the elementType is available from a runtime Feature object  (using the `a_feature_object.getRange().getComponentType()` method)  only when specified for the `uima.cas.FSArray` ranges; it isn't available for `uima.cas.FSList` ranges. 
+
+The `multipleReferencesAllowed` feature is optional, and applies only when the `rangeTypeName` is an array or list type (it applies to arrays and lists of primitive as well as non-primitive types). Setting this to false (the default) indicates that this feature has exclusive ownership of the array or list, so changes to the array or list are localized.
+Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS.
+Note: there is currently no guarantee that the framework will enforce this restriction.
+However, this setting may affect how the CAS is serialized.
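+
+As a sketch of how these elements fit together, the following declares a hypothetical feature named `tokens` whose value is an `FSArray` restricted to `org.myorg.TokenAnnotation` elements and owned exclusively by the containing feature structure.
+The feature name and description are assumptions made for illustration.
+
+[source,xml]
+----
+<featureDescription>
+  <name>tokens</name>
+  <description>The tokens covered by this annotation.</description>
+  <rangeTypeName>uima.cas.FSArray</rangeTypeName>
+  <!-- applies only because the range type is FSArray -->
+  <elementType>org.myorg.TokenAnnotation</elementType>
+  <!-- false is the default; written out here for clarity -->
+  <multipleReferencesAllowed>false</multipleReferencesAllowed>
+</featureDescription>
+----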
+
+[[ugr.ref.xml.component_descriptor.type_system.string_subtypes]]
+=== String Subtypes
+
+There is one other special type that you can declare -- a subset of the String type that specifies a restricted set of allowed values.
+This is useful for features that can have only certain String values, such as parts of speech.
+Here is an example of how to declare such a type:
+
+[source]
+----
+<typeDescription>
+  <name>PartOfSpeech</name>
+  <description>A part of speech.</description>
+  <supertypeName>uima.cas.String</supertypeName>
+  <allowedValues>
+    <value>
+      <string>NN</string>
+      <description>Noun, singular or mass.</description>
+    </value>
+    <value>
+      <string>NNS</string>
+      <description>Noun, plural.</description>
+    </value>
+    <value>
+      <string>VB</string>
+      <description>Verb, base form.</description>
+    </value>
+    ...
+  </allowedValues>
+</typeDescription>
+----
+
+[[ugr.ref.xml.component_descriptor.aes]]
+== Analysis Engine Descriptors
+
+Analysis Engine (AE) descriptors completely describe Analysis Engines.
+There are two basic types of Analysis Engines -- __Primitive__ and __Aggregate__.
+A _Primitive_ Analysis Engine is a container for a single __annotator__, whereas an _Aggregate_ Analysis Engine is composed of a collection of other Analysis Engines.
+(For more information on this and other terminology, see the xref:oas.adoc#ugr.ovv.conceptual[Conceptual Overview].)
+
+Both Primitive and Aggregate Analysis Engines have descriptors, and the two types of descriptors have some similarities and some differences. <<ugr.ref.xml.component_descriptor.aes.primitive>> discusses Primitive Analysis Engine descriptors. <<ugr.ref.xml.component_descriptor.aes.aggregate>> then  describes how Aggregate Analysis Engine descriptors are different.
+
+[[ugr.ref.xml.component_descriptor.aes.primitive]]
+=== Primitive Analysis Engine Descriptors
+
+[[ugr.ref.xml.component_descriptor.aes.primitive.basic]]
+==== Basic Structure
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<analysisEngineDescription 
+        xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 
+
+  <primitive>true</primitive>
+  <annotatorImplementationName> [String] </annotatorImplementationName>
+
+  <analysisEngineMetaData>
+    ...
+  </analysisEngineMetaData>
+
+  <externalResourceDependencies>
+    ...
+  </externalResourceDependencies>
+
+  <resourceManagerConfiguration>
+    ...
+  </resourceManagerConfiguration>
+
+</analysisEngineDescription>
+----
+
+The document begins with a standard XML header.
+The recommended root tag is ``<analysisEngineDescription>``, although `<taeDescription>` is also allowed for backwards compatibility.
+
+Within the root element we declare that we are using the XML namespace `http://uima.apache.org/resourceSpecifier`.
+This namespace is required; without it, the descriptor cannot be validated.
+
+The first subelement, `<frameworkImplementation>`, currently must have the value `org.apache.uima.java` or `org.apache.uima.cpp`.
+In future versions, there may be other framework implementations, or perhaps implementations produced by other vendors.
+
+The second subelement, `<primitive>`, contains the Boolean value ``true``, indicating that this XML document describes a _Primitive_ Analysis Engine.
+
+The next subelement, `<annotatorImplementationName>`, is how the UIMA framework determines which annotator class to use.
+This should contain a fully-qualified Java class name for Java implementations, or the name of a .dll or .so file for C++ implementations.
+
+The `<analysisEngineMetaData>` object contains descriptive information about the analysis engine and what it does.
+It is described in <<ugr.ref.xml.component_descriptor.aes.metadata>>.
+
+The `<externalResourceDependencies>` and `<resourceManagerConfiguration>` elements declare the external resource files that the analysis engine relies upon.
+They are optional and are described in <<ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>.
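+
+Putting these pieces together, a minimal primitive descriptor might look like the following sketch.
+The class name `org.myorg.MyAnnotator` and the engine name are hypothetical; the empty `inputs` and `outputs` are included because a capability requires both elements.
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <primitive>true</primitive>
+  <!-- hypothetical annotator class; replace with your own implementation -->
+  <annotatorImplementationName>org.myorg.MyAnnotator</annotatorImplementationName>
+  <analysisEngineMetaData>
+    <name>My Annotator</name>
+    <capabilities>
+      <capability>
+        <inputs/>
+        <outputs/>
+      </capability>
+    </capabilities>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+----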
+
+[[ugr.ref.xml.component_descriptor.aes.metadata]]
+==== Analysis Engine MetaData
+
+[source]
+----
+<analysisEngineMetaData>
+  <name> [String] </name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor>
+
+  <configurationParameters> ...  </configurationParameters>
+
+  <configurationParameterSettings>
+    ...
+  </configurationParameterSettings> 
+
+  <typeSystemDescription> ... </typeSystemDescription> 
+
+  <typePriorities> ... </typePriorities> 
+
+  <fsIndexCollection> ... </fsIndexCollection>
+
+  <capabilities> ... </capabilities>
+
+  <operationalProperties> ... </operationalProperties>
+
+</analysisEngineMetaData>
+----
+
+The `analysisEngineMetaData` element contains four simple string fields -- ``name``, ``description``, ``version``, and ``vendor``.
+Only the `name` field is required, but providing values for the other fields is recommended.
+The `name` field is just a descriptive name meant to be read by users; it does not need to be unique across all Analysis Engines.
+
+Configuration parameters are described in <<ugr.ref.xml.component_descriptor.aes.configuration_parameters>>.
+
+The other sub-elements -- ``typeSystemDescription``, ``typePriorities``, ``fsIndexCollection``, `capabilities` and `operationalProperties` -- are described in the following sections.
+The only one of these that is required is ``capabilities``; the others are optional.
+
+[[ugr.ref.xml.component_descriptor.aes.type_system]]
+==== Type System Definition
+
+[source]
+----
+<typeSystemDescription>
+
+  <name> [String] </name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor> 
+
+  <imports>
+    <import .../>
+    ...
+  </imports> 
+
+  <types>
+    <typeDescription>
+      ...
+    </typeDescription>
+
+    ...
+
+  </types>
+
+</typeSystemDescription>
+----
+
+A `typeSystemDescription` element defines a type system for an Analysis Engine.
+The syntax for the element is described in <<ugr.ref.xml.component_descriptor.type_system>>.
+
+The recommended usage is to `import` an external type system, using the import syntax described in <<ugr.ref.xml.component_descriptor.imports>> of this chapter.
+For example: 
+[source]
+----
+<typeSystemDescription>
+  <imports>
+    <import location="MySharedTypeSystem.xml"/>
+  </imports>
+</typeSystemDescription>
+----
+
+This allows several AEs to share a single type system definition.
+The file `MySharedTypeSystem.xml` would then contain the full type system information, including the ``name``, ``description``, ``vendor``, ``version``, and ``types``.
+
+[[ugr.ref.xml.component_descriptor.aes.type_priority]]
+==== Type Priority Definition
+
+[source]
+----
+<typePriorities>
+  <name> [String] </name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor>
+
+  <imports>
+    <import .../>
+    ...
+  </imports> 
+
+  <priorityLists>
+    <priorityList>
+      <type>[TypeName]</type>
+      <type>[TypeName]</type>
+        ...
+    </priorityList>
+
+    ...
+
+  </priorityLists>
+</typePriorities>
+----
+
+The `<typePriorities>` element contains zero or more `<priorityList>` elements; each `<priorityList>` contains zero or more types.
+Like a type system, a type priorities definition may also declare a name, description, version, and vendor, and may import other type priorities.
+See <<ugr.ref.xml.component_descriptor.imports>> for the import syntax.
+
+Type priority is used when iterating over feature structures in the CAS.
+For example, if the CAS contains a `Sentence` annotation and a `Paragraph` annotation with the same span of text (i.e. a one-sentence paragraph), which annotation should be returned first by an iterator?
+Probably the Paragraph, since it is conceptually "`bigger,`" but the framework does not know that and must be explicitly told that the Paragraph annotation has priority over the Sentence annotation, like this:
+[source]
+----
+<typePriorities>
+  <priorityList>
+    <type>org.myorg.Paragraph</type>
+    <type>org.myorg.Sentence</type>
+  </priorityList>
+</typePriorities>
+----
+
+All of the `<priorityList>` elements defined in the descriptor (and in all component descriptors of an aggregate analysis engine descriptor) are merged to produce a single priority list.
+
+Subtypes of types specified here are also ordered, unless overridden by another user-specified type ordering.
+For example, if you specify type A comes before type B, then subtypes of A will come before subtypes of B, unless there is an overriding specification which declares some subtype of B comes before some subtype of A.
+
+If the priority lists are inconsistent (type A declared before type B in one priority list, and type B declared before type A in another), the framework will throw an exception.
+
+User-defined indexes may declare whether or not they use the type priority; see the next section.
+
+[[ugr.ref.xml.component_descriptor.aes.index]]
+==== Index Definition
+
+[source]
+----
+<fsIndexCollection>
+
+  <name>[String]</name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor> 
+
+  <imports>
+    <import .../>
+    ...
+  </imports>
+
+  <fsIndexes> 
+
+    <fsIndexDescription>
+      ...
+    </fsIndexDescription>
+
+    <fsIndexDescription>
+      ...
+    </fsIndexDescription>
+
+  </fsIndexes>
+
+</fsIndexCollection>
+----
+
+The `fsIndexCollection` element declares __xref:ref.adoc#ugr.ref.cas.indexes_and_iterators[Feature Structure Indexes]__, each of which defines an index that holds feature structures of a given type.
+Information in the CAS is always accessed through an index.
+There is a built-in default annotation index which can be used to access instances of type `uima.tcas.Annotation` (or its subtypes), sorted based on their `begin` and `end` features and the type priority ordering (if specified).
+For all other types, there is a default, unsorted (bag) index.
+If there is a need for a specialized index it must be declared in this element of the descriptor.
+
+Like type systems and type priorities, an `fsIndexCollection` can declare a ``name``, ``description``, ``vendor``, and ``version``, and may import other ``fsIndexCollection``s.
+The import syntax is described in <<ugr.ref.xml.component_descriptor.imports>>.
+
+An `fsIndexCollection` may also define zero or more `fsIndexDescription` elements, each of which defines a single index.
+Each `fsIndexDescription` has the form: 
+
+[source]
+----
+<fsIndexDescription>
+
+  <label>[String]</label>
+  <typeName>[TypeName]</typeName>
+  <kind>sorted|bag|set</kind>
+
+  <keys>
+
+    <fsIndexKey>
+      <featureName>[Name]</featureName>
+      <comparator>standard|reverse</comparator>
+    </fsIndexKey>
+
+    <fsIndexKey>
+      <typePriority/>
+    </fsIndexKey>
+
+    ...
+
+  </keys>
+</fsIndexDescription>
+----
+
+The `label` element defines the name by which applications and annotators refer to this index.
+The `typeName` element contains the name of the type that will be contained in this index.
+This must match one of the type names defined in the ``<typeSystemDescription>``.
+
+There are three possible values for the `<kind>` of index.
+Sorted indexes enforce an ordering of feature structures, based on defined keys.
+Bag indexes do not enforce ordering, and have no defined keys.
+Set indexes do not enforce ordering, but use defined keys to specify equivalence classes; `addToIndexes` will not add a Feature Structure to a set index if its keys match those of an entry of the same type already in the index.
+If the `<kind>` element is omitted, it will default to sorted, which is the most common type of index.
+
+Prior to version 2.7.0, the bag and sorted indexes stored duplicate entries for the same identical FS if it was added to the indexes multiple times.
+As of version 2.7.0, this has changed; a second or subsequent add-to-index operation has no effect.
+As a consequence, a remove operation now guarantees that the particular FS is removed (previously, it could only be said that one of perhaps many duplicate entries was removed).
+Since sending to remote annotators adds entries to indexes at most once, this behavior is consistent with that.
+
+Note that even after this change, there is still a distinct difference in meaning for bag and set indexes.
+The set index uses equal defined key values plus the type of the Feature Structure to determine equivalence classes for Feature Structures, and will not add a Feature Structure if it has the same key values and the same type as an entry already in the index.
+
+It is possible, however, that users may be depending on having multiple instances of the identical FeatureStructure in the indexes.
+Therefore, UIMA uses a JVM-defined property, `uima.allow_duplicate_add_to_indexes`, which (if defined when UIMA is loaded) will restore the previous behavior.
+
+[NOTE]
+====
+If duplicates are allowed, then the proper way to update an indexed Feature Structure is to 
+
+* remove **all** instances of the FS to be updated 
+* update the features
+* re-add the Feature Structure to the indexes (perhaps multiple times, depending on the details of your logic).
+====
+
+[NOTE]
+====
+There is usually no need to explicitly declare a Bag index in your descriptor.
+As of UIMA v2.1, if you do not declare any index for a type (or any of its  supertypes), a Bag index will be automatically created if an instance of that type is added to the indexes.
+====
+
+A Sorted or Set index may define zero or more __keys__.
+These keys determine the sort order of the feature structures within a sorted index, and partially determine equality for set indexes (the equality measure always includes testing that the types are the same).
+Bag indexes do not use keys, and equality is determined by Feature Structure identity (that is, two elements are considered equal if and only if they are exactly the same feature structure, located in the same place in the CAS).
+Keys are ordered by precedence -- the first key is evaluated first, and subsequent keys are evaluated only if necessary.
+
+Each key is represented by an `fsIndexKey` element.
+Most `fsIndexKey` elements contain a `featureName` and a ``comparator``.
+The `featureName` must match the name of one of the features for the type specified in the `<typeName>` element for this index.
+The comparator defines how the features will be compared -- a value of `standard` means that features will be compared using the standard comparison for their data type (e.g. for numerical types, smaller values precede larger values, and for string types, Unicode string comparison is performed).
+A value of `reverse` means that features will be compared using the reverse of the standard comparison (e.g. for numerical types, larger values precede smaller values, etc.).
+For Set indexes, the comparator direction is ignored -- the keys are only used for the equality testing.
+
+Each key used in comparisons must refer to a feature whose range type is Boolean, Byte, Short, Integer, Long, Float, Double, or String. 
+
+There is a second type of key, one which contains only the `<typePriority/>` element.
+When this key is used, it indicates that Feature Structures will be compared using the type priorities declared in the `<typePriorities>` section of the descriptor.
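+
+The following sketch shows how these pieces could be combined into one sorted index.
+The label `TokensBySpan` and the type `org.myorg.Token` are hypothetical; the type is assumed to be a subtype of `uima.tcas.Annotation`, so that it has `begin` and `end` features.
+
+[source]
+----
+<fsIndexDescription>
+  <label>TokensBySpan</label>
+  <typeName>org.myorg.Token</typeName>
+  <kind>sorted</kind>
+  <keys>
+    <!-- first key: smaller begin offsets come first -->
+    <fsIndexKey>
+      <featureName>begin</featureName>
+      <comparator>standard</comparator>
+    </fsIndexKey>
+    <!-- second key, used only when begin is equal: larger end offsets come first -->
+    <fsIndexKey>
+      <featureName>end</featureName>
+      <comparator>reverse</comparator>
+    </fsIndexKey>
+    <!-- last key: fall back to the declared type priorities -->
+    <fsIndexKey>
+      <typePriority/>
+    </fsIndexKey>
+  </keys>
+</fsIndexDescription>
+----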
+
+[[ugr.ref.xml.component_descriptor.aes.capabilities]]
+==== Capabilities
+
+[source]
+----
+<capabilities>
+  <capability>
+
+    <inputs>
+      <type allAnnotatorFeatures="true|false">[TypeName]</type>
+      ...
+      <feature>[TypeName]:[Name]</feature>
+      ...
+    </inputs>
+
+    <outputs>
+      <type allAnnotatorFeatures="true|false">[TypeName]</type>
+      ...
+      <feature>[TypeName]:[Name]</feature>
+      ...
+    </outputs>
+
+    <inputSofas>
+      <sofaName>[name]</sofaName>
+      ...
+    </inputSofas>
+
+    <outputSofas>
+      <sofaName>[name]</sofaName>
+      ...
+    </outputSofas>
+
+    <languagesSupported>
+      <language>[ISO Language ID]</language>
+        ...
+    </languagesSupported>
+  </capability>
+
+  <capability>
+    ...
+  </capability>
+
+  ...
+
+</capabilities>
+----
+
+The capabilities definition is used by the UIMA Framework in several ways, including setting up the Results Specification for process calls, routing control for aggregates based on language, and as part of the Sofa mapping function.
+
+The `capabilities` element contains one or more `capability` elements.
+In Version 2 and onwards, only one capability set should be used (multiple sets will continue to work for a while, but they're not logically consistently supported). 
+
+Each `capability` contains ``inputs``, ``outputs``, ``languagesSupported``, ``inputSofas``, and ``outputSofas``.
+The `inputs` and `outputs` elements are required (though they may be empty); ``<languagesSupported>``, ``<inputSofas>``, and `<outputSofas>` are optional.
+
+Both inputs and outputs may contain a mixture of type and feature elements.
+
+`<type...>` elements contain the name of one of the types defined in the type system or one of the built in types.
+Declaring a type as an input means that this component expects instances of this type to be in the CAS when it receives it to process.
+Declaring a type as an output means that this component creates new instances of this type in the CAS.
+
+There is an optional attribute ``allAnnotatorFeatures``, which defaults to false if omitted.
+The Component Descriptor Editor tool defaults this to true when a new type is added to the list of inputs and/or outputs.
+When this attribute is true, it specifies that all of the type's features are also declared as input or output.
+Otherwise, the features that are required as inputs or populated as outputs must be explicitly specified in feature elements.
+
+`<feature...>` elements contain the "`fully-qualified`" feature name, which is the type name followed by a colon, followed by the feature name, e.g. ``org.myorg.TokenAnnotation:lemma``. `<feature...>` elements in the `<inputs>` section must also have a corresponding type declared as an input.
+In output sections, this is not required.
+If the type is not specified as an output, but a feature for that type is, this means that existing instances of the type have the values of the specified features updated.
+Any type mentioned in a `<feature>` element must be either specified as an input or an output or both.
+
+`<language>` elements contain one of the ISO language identifiers, such as `en` for English, or `en-US` for the United States dialect of English.
+
+The list of language codes can be found here: http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt and the country codes here: http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
+
+`<inputSofas>` and `<outputSofas>` declare sofa names used by this component.
+All Sofa names must be unique within a particular capability set.
+A Sofa name must be an input or an output, and cannot be both.
+It is an error to have a Sofa name declared as an input in one capability set, and also have it declared as an output in another capability set.
+
+A `<sofaName>` is written as a simple Java-style identifier, without any periods in the name, except that it may be written to end in "``$$.$$*``".
+If written in this manner, it specifies a set of Sofa names, all of which start with the base name (the part before the .*) followed by a period and then an arbitrary Java identifier (without periods). This form is used to specify in the descriptor that the component could generate an arbitrary number of Sofas, the exact names and numbers of which are unknown before the component is run.
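+
+A filled-in capability set might look like the following sketch.
+The type names reuse the `org.myorg` examples from this chapter, and the Sofa names `sourceText` and `translation` are hypothetical.
+
+[source]
+----
+<capabilities>
+  <capability>
+    <inputs>
+      <!-- all features of Sentence are declared as inputs as well -->
+      <type allAnnotatorFeatures="true">org.myorg.Sentence</type>
+    </inputs>
+    <outputs>
+      <!-- only the lemma feature is declared as populated -->
+      <type>org.myorg.TokenAnnotation</type>
+      <feature>org.myorg.TokenAnnotation:lemma</feature>
+    </outputs>
+    <inputSofas>
+      <sofaName>sourceText</sofaName>
+    </inputSofas>
+    <outputSofas>
+      <!-- any number of Sofas named translation.<identifier> -->
+      <sofaName>translation.*</sofaName>
+    </outputSofas>
+    <languagesSupported>
+      <language>en</language>
+      <language>en-US</language>
+    </languagesSupported>
+  </capability>
+</capabilities>
+----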
+
+[[ugr.ref.xml.component_descriptor.aes.operational_properties]]
+==== OperationalProperties
+
+Components can declare operational properties that are useful in deployment.
+The following are available:
+
+[source]
+----
+<operationalProperties>
+  <modifiesCas> true|false </modifiesCas>
+  <multipleDeploymentAllowed> true|false </multipleDeploymentAllowed>
+  <outputsNewCASes> true|false </outputsNewCASes>
+</operationalProperties>
+----
+
+``modifiesCas``, if false, indicates that this component does not modify the CAS.
+If it is not specified, the default value is true except for CAS Consumer components.
+
+``multipleDeploymentAllowed``, if true, allows the component to be deployed multiple times to increase performance through scale-out techniques.
+If it is not specified, the default value is true, except for CAS Consumer and Collection Reader components.
+
+[NOTE]
+====
+If you wrap one or more CAS Consumers inside an aggregate as the only components, you must explicitly specify in the aggregate the `multipleDeploymentAllowed` property as false (assuming the CAS Consumer components take the default here); otherwise the framework will complain about inconsistent settings for these.
+====
+
+`xref:tug.adoc#ugr.tug.cm[outputsNewCASes]`, if true, allows the component to create new CASes during processing, for example to break a large artifact into smaller pieces.
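+
+For example, a component that splits a large artifact into smaller CASes might declare the following.
+This is only a sketch of one plausible combination of values, not a required setting.
+
+[source]
+----
+<operationalProperties>
+  <!-- the component writes to the CAS it receives -->
+  <modifiesCas>true</modifiesCas>
+  <!-- several instances may be deployed for scale-out -->
+  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+  <!-- the component creates new CASes during processing -->
+  <outputsNewCASes>true</outputsNewCASes>
+</operationalProperties>
+----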
+
+[[ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies]]
+==== External Resource Dependencies
+
+[source]
+----
+<externalResourceDependencies>
+  <externalResourceDependency>
+    <key>[String]</key>
+    <description>[String] </description>
+    <interfaceName>[String]</interfaceName>
+    <optional>true|false</optional>
+  </externalResourceDependency>
+
+  <externalResourceDependency>
+    ...
+  </externalResourceDependency>
+
+  ...
+
+</externalResourceDependencies>
+----
+
+A primitive annotator may declare zero or more `<externalResourceDependency>` elements.
+Each dependency has the following elements: 
+
+* `key` -- the string by which the annotator code will attempt to access the resource. Must be unique within this annotator.
+* `description` -- a textual description of the dependency.
+* `interfaceName` -- the fully-qualified name of the Java interface through which the annotator will access the data. This is optional. If not specified, the annotator can only get an InputStream to the data.
+* `optional` -- whether the resource is optional. If false, an exception will be thrown if no resource is assigned to satisfy this dependency. Defaults to false.
+
+
+[[ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration]]
+==== Resource Manager Configuration
+
+[source]
+----
+<resourceManagerConfiguration>
+
+  <name>[String]</name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor> 
+
+  <imports>
+    <import .../>
+    ...
+  </imports>
+
+  <externalResources>
+
+    <externalResource>
+      <name>[String]</name>
+      <description>[String]</description>
+      <fileResourceSpecifier>
+        <fileUrl>[URL]</fileUrl>
+      </fileResourceSpecifier>
+      <implementationName>[String]</implementationName>
+    </externalResource>
+    ...
+  </externalResources>
+
+  <externalResourceBindings>
+    <externalResourceBinding>
+      <key>[String]</key>
+      <resourceName>[String]</resourceName>
+    </externalResourceBinding>
+    ...
+  </externalResourceBindings>
+
+</resourceManagerConfiguration>
+----
+
+This element declares external resources and binds them to annotators' external resource dependencies.
+
+The `resourceManagerConfiguration` element may optionally contain an ``import``, which allows resource definitions to be stored in a separate (shareable) file.
+See <<ugr.ref.xml.component_descriptor.imports>> for details.
+
+The `externalResources` element contains zero or more `externalResource` elements, each of which consists of: 
+
+* `name` -- the name of the resource. This name is referred to in the bindings (see below). Resource names need to be unique within any Aggregate Analysis Engine or Collection Processing Engine, so the Java-like `org.myorg.mycomponent.MyResource` syntax is recommended.
+* `description` -- English description of the resource.
+* Resource Specifier -- Declares the location of the resource. There are different possibilities for how this is done (see below).
+* `implementationName` -- the fully-qualified name of the Java class that will be instantiated from the resource data. This is optional; if not specified, the resource will be accessible as an input stream to the raw data. If specified, the Java class must implement the `interfaceName` that is specified in the External Resource Dependency to which it is bound.
+
+One possibility for the resource specifier is a `<fileResourceSpecifier>`, as shown above.
+This simply declares a URL to the resource data.
+This support is built on the Java class URL and its method `URL.openStream()`; it supports the protocols `file`, `http` and `jar` (for referring to files in jars) by default, and you can plug in handlers for other protocols.
+The URL has to start with file: (or some other protocol). It is relative to either the classpath or the `data path`.
+The data path works like the classpath but can be set programmatically via ``ResourceManager.setDataPath()``.
+Setting the Java System property `uima.datapath` also works.
+
+`file:com/apache.d.txt` is a relative path; relative paths for resources are resolved using the classpath and/or the datapath.
+For the file protocol, URLs starting with `file:/` or `\file:///` are absolute.
+Note that `\file://org/apache/d.txt` is NOT an absolute path starting with `org`.
+The "`//`" indicates that what follows is a host name.
+Therefore if you try to use this URL it will complain that it can't connect to the host `org`.
+
+The URL value may contain references to external override variables using the `${variable-name}` syntax,  e.g. ``file:com/${dictUrl}.txt``.
+If a variable is undefined the value is left unmodified and a warning message identifies the missing variable. 
+
+Another option is a ``<fileLanguageResourceSpecifier>``, which is intended to support resources, such as dictionaries, that depend on the language of the document being processed.
+Instead of a single URL, a prefix and suffix are specified, like this: 
+
+[source]
+----
+<fileLanguageResourceSpecifier>
+  <fileUrlPrefix>file:FileLanguageResource_implTest_data_</fileUrlPrefix>
+  <fileUrlSuffix>.dat</fileUrlSuffix>
+</fileLanguageResourceSpecifier>
+----
+
+The URL of the actual resource is then formed by concatenating the prefix, the language of the document (as an ISO language code, e.g. `en` or `en-US` -- see <<ugr.ref.xml.component_descriptor.aes.capabilities>> for more information), and the suffix.
+
+A third option is a ``customResourceSpecifier``, which allows you to plug in an arbitrary Java class.
+See <<ugr.ref.xml.component_descriptor.custom_resource_specifiers>> for more information.
+
+The `externalResourceBindings` element declares which resources are bound to which dependencies.
+Each `externalResourceBinding` consists of: 
+
+* `key` -- identifies the dependency. For a binding declared in a primitive analysis engine descriptor, this must match the value of the `key` element of one of the `externalResourceDependency` elements. Bindings may also be specified in aggregate analysis engine descriptors, in which case a compound key is used -- see <<ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings>>.
+* `resourceName` -- the name of the resource satisfying the dependency. This must match the value of the `name` element of one of the `externalResource` declarations. 
+
+A given resource dependency may only be bound to one external resource; one external resource may be bound to many dependencies -- to allow resource sharing.
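+
+The following sketch shows a dependency, a resource, and the binding that connects them in a primitive descriptor.
+The key `StopWords`, the resource name, and the file location are hypothetical.
+Because no `interfaceName` or `implementationName` is given, the annotator would access the data as an input stream.
+
+[source]
+----
+<externalResourceDependencies>
+  <externalResourceDependency>
+    <key>StopWords</key>
+    <description>List of stop words, one per line.</description>
+    <optional>false</optional>
+  </externalResourceDependency>
+</externalResourceDependencies>
+
+<resourceManagerConfiguration>
+  <externalResources>
+    <externalResource>
+      <name>org.myorg.mycomponent.StopWordFile</name>
+      <description>Stop word list.</description>
+      <fileResourceSpecifier>
+        <!-- relative URL, resolved using the classpath and/or the data path -->
+        <fileUrl>file:org/myorg/stopwords.txt</fileUrl>
+      </fileResourceSpecifier>
+    </externalResource>
+  </externalResources>
+  <externalResourceBindings>
+    <externalResourceBinding>
+      <!-- key matches the dependency; resourceName matches the resource -->
+      <key>StopWords</key>
+      <resourceName>org.myorg.mycomponent.StopWordFile</resourceName>
+    </externalResourceBinding>
+  </externalResourceBindings>
+</resourceManagerConfiguration>
+----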
+
+[[ugr.ref.xml.component_descriptor.aes.environment_variable_references]]
+==== Environment Variable References
+
+In several places throughout the descriptor, it is possible to reference environment variables.
+In Java, these are actually references to Java system properties.
+To reference system environment variables from a Java analysis engine you must pass the environment variables into the Java virtual machine by using the `-D` option on the `java` command line.
+
+The syntax for environment variable references is `<envVarRef>[VariableName]</envVarRef>`, where `[VariableName]` is any valid Java system property name.
+Environment variable references are valid in the following places: 
+
+* The value of a configuration parameter (String-valued parameters only)
+* The `<annotatorImplementationName>` element of a primitive AE descriptor
+* The `<name>` element within `<analysisEngineMetaData>`
+* Within a `<fileResourceSpecifier>` or `<fileLanguageResourceSpecifier>`
+
+For example, if the value of a configuration parameter were specified as `<string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string>`, and the value of the `TEMP_DIR` Java System property were `c:/temp`, then the configuration parameter's value would evaluate to `c:/temp/temp.dat`.
+
+[NOTE]
+====
+The Component Descriptor Editor does not support environment variable references.
+If you need them, however, you can use the `source` tab view in the CDE to add this notation manually.
+====
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate]]
+=== Aggregate Analysis Engine Descriptors
+
+Aggregate Analysis Engines do not contain an annotator, but instead contain one or more component (also called __delegate__) analysis engines.
+
+Aggregate Analysis Engine Descriptors maintain most of the same structure as Primitive Analysis Engine Descriptors.
+The differences are:
+
+* An Aggregate Analysis Engine Descriptor contains the element `<primitive>false</primitive>` rather than ``<primitive>true</primitive>``. 
+* An Aggregate Analysis Engine Descriptor must not include an `<annotatorImplementationName>` element.
+* In place of the ``<annotatorImplementationName>``, an Aggregate Analysis Engine Descriptor must have a `<delegateAnalysisEngineSpecifiers>` element. See <<ugr.ref.xml.component_descriptor.aes.aggregate.delegates>>.
+* An Aggregate Analysis Engine Descriptor may provide a `<flowController>` element immediately following the ``<delegateAnalysisEngineSpecifiers>``. See <<ugr.ref.xml.component_descriptor.aes.aggregate.flow_controller>>.
+* Under the `analysisEngineMetaData` element, an Aggregate Analysis Engine Descriptor may specify an additional element -- ``<flowConstraints>``. See <<ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints>>. Typically only one of `<flowController>` and `<flowConstraints>` is specified. If both are specified, the `<flowController>` takes precedence, and the flow controller implementation can use the information specified in the `<flowConstraints>` as part of its configuration input.
+* An Aggregate Analysis Engine Descriptor must not contain a `<typeSystemDescription>` element. The Type System of the Aggregate Analysis Engine is derived by merging the Type Systems of the Analysis Engines that the aggregate contains.
+* Within Aggregate Analysis Engine Descriptors, `<configurationParameter>` elements may define ``<overrides>``. See <<ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides>>.
+* External Resource Bindings can bind resources to dependencies declared by any delegate AE within the aggregate. See <<ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings>>.
+* An additional optional element, ``<sofaMappings>``, may be included. 
+
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.delegates]]
+==== Delegate Analysis Engine Specifiers
+
+[source]
+----
+<delegateAnalysisEngineSpecifiers>
+
+  <delegateAnalysisEngine key="[String]">
+    <analysisEngineDescription>...</analysisEngineDescription> |
+    <import .../> 
+  </delegateAnalysisEngine>
+
+  <delegateAnalysisEngine key="[String]">
+    ...
+  </delegateAnalysisEngine>
+
+  ...
+
+</delegateAnalysisEngineSpecifiers>
+----
+
+The `delegateAnalysisEngineSpecifiers` element contains one or more `delegateAnalysisEngine` elements.
+Each of these must have a unique key, and must contain either:
+
+* A complete `analysisEngineDescription` element describing the delegate analysis engine *OR*
+* An `import` element giving the name or location of the XML descriptor for the delegate analysis engine (see <<ugr.ref.xml.component_descriptor.imports>>).
+
+The latter is the much more common usage, and is the only form supported by the Component Descriptor Editor tool.
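+
+As a minimal sketch, an aggregate with two delegates that are both referenced by import might look as follows.
+The keys `tokenizer` and `tagger`, and the descriptor name and location, are hypothetical and chosen only for illustration; the `name` and `location` attributes of `import` are described in <<ugr.ref.xml.component_descriptor.imports>>.
+
+[source,xml]
+----
+<delegateAnalysisEngineSpecifiers>
+
+  <!-- Hypothetical delegate, imported by name -->
+  <delegateAnalysisEngine key="tokenizer">
+    <import name="org.myorg.TokenizerDescriptor"/>
+  </delegateAnalysisEngine>
+
+  <!-- Hypothetical delegate, imported by location -->
+  <delegateAnalysisEngine key="tagger">
+    <import location="TaggerDescriptor.xml"/>
+  </delegateAnalysisEngine>
+
+</delegateAnalysisEngineSpecifiers>
+----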
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.flow_controller]]
+==== FlowController
+
+[source]
+----
+<flowController key="[String]">
+  <flowControllerDescription>...</flowControllerDescription> |
+  <import .../>
+</flowController>
+----
+
+The optional `flowController` element identifies the descriptor of the FlowController component that will be used to determine the order in which delegate Analysis Engines are called.
+
+The `key` attribute is optional, but recommended; it assigns the FlowController an identifier that can be used for configuration parameter overrides, Sofa mappings, or external resource bindings.
+The key must not be the same as any of the delegate analysis engine keys.
+
+As with the `delegateAnalysisEngine` element, the `flowController` element may contain either a complete `flowControllerDescription` or an ``import``, but the import is recommended.
+The Component Descriptor Editor tool only supports imports here.
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints]]
+==== FlowConstraints
+
+If a `<flowController>` is not specified, the order in which delegate Analysis Engines are called within the aggregate Analysis Engine is specified using the `<flowConstraints>` element, which must occur immediately following the `configurationParameterSettings` element.
+If a `<flowController>` is specified, then the `<flowConstraints>` are optional.
+They can be used to pass an ordering of delegate keys to the ``<flowController>``.
+
+There are two options for flow constraints -- `<fixedFlow>` or ``<capabilityLanguageFlow>``.
+Each is discussed in a separate section below.
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.fixed_flow]]
+===== Fixed Flow
+
+[source]
+----
+<flowConstraints>
+  <fixedFlow>
+    <node>[String]</node>
+    <node>[String]</node>
+    ...
+  </fixedFlow>
+</flowConstraints>
+----
+
+The `flowConstraints` element must be included immediately following the `configurationParameterSettings` element.
+
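+For a fixed flow, the `flowConstraints` element contains a single `fixedFlow` element.
+The alternative, `capabilityLanguageFlow`, is described in <<ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.capability_language_flow>>.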
+
+The `fixedFlow` element contains one or more `node` elements, each of which contains an identifier which must match the key of a delegate analysis engine specified in the `delegateAnalysisEngineSpecifiers` element.
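+
+For illustration, assuming an aggregate whose delegates were declared with the hypothetical keys `tokenizer` and `tagger`, a fixed flow that calls them in that order could be sketched as:
+
+[source,xml]
+----
+<flowConstraints>
+  <fixedFlow>
+    <!-- Each node repeats the key of a declared delegate (hypothetical keys) -->
+    <node>tokenizer</node>
+    <node>tagger</node>
+  </fixedFlow>
+</flowConstraints>
+----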
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.capability_language_flow]]
+===== Capability Language Flow
+
+[source]
+----
+<flowConstraints>
+  <capabilityLanguageFlow>
+    <node>[String]</node>
+    <node>[String]</node>
+    ...
+  </capabilityLanguageFlow>
+</flowConstraints>
+----
+
+If you use `<capabilityLanguageFlow>`, the delegate Analysis Engines named by the `<node>` elements are called in the given order, except that a delegate Analysis Engine is skipped if either of the following is true (according to that Analysis Engine's declared output capabilities):
+
+* It cannot produce any of the aggregate Analysis Engine's output capabilities for the language of the current document.
+* All of the output capabilities have already been produced by an earlier Analysis Engine in the flow. 
+
+For example, if two annotators produce `org.myorg.TokenAnnotation` feature structures for the same language, these feature structures will only be produced by the first annotator in the list.
+
+[NOTE]
+====
+The flow analysis uses the specific types that are specified in the output capabilities, without any expansion for subtypes.
+So, if you expect a type TT and another type SubTT (which is a subtype of TT) in the output, you must include both of them in the output capabilities.
+====
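+
+As a sketch of the point made in the note, a capability declaration that lists both a type and its subtype explicitly might look like this.
+The type names `org.myorg.TT` and `org.myorg.SubTT` and the language `en` are hypothetical placeholders.
+
+[source,xml]
+----
+<capabilities>
+  <capability>
+    <inputs/>
+    <outputs>
+      <!-- Subtypes are not expanded automatically, so both are listed -->
+      <type>org.myorg.TT</type>
+      <type>org.myorg.SubTT</type>
+    </outputs>
+    <languagesSupported>
+      <language>en</language>
+    </languagesSupported>
+  </capability>
+</capabilities>
+----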
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings]]
+==== External Resource Bindings
+
+Aggregate analysis engine descriptors can declare resource bindings that bind resources to dependencies declared in any of the delegate analysis engines (or their subcomponents, recursively) within that aggregate.
+This allows resource sharing.
+Any binding at this level overrides (supersedes) any binding specified by a contained component or their subcomponents, recursively.
+
+For example, consider an aggregate Analysis Engine Descriptor that contains delegate Analysis Engines with keys `annotator1` and `annotator2` (as declared in the `<delegateAnalysisEngine>` element; see <<ugr.ref.xml.component_descriptor.aes.aggregate.delegates>>), where `annotator1` declares a resource dependency with key `myResource` and `annotator2` declares a resource dependency with key `someResource`.
+
+Within that aggregate Analysis Engine Descriptor, the following `resourceManagerConfiguration` would bind both of those dependencies to a single external resource file.
+
+[source]
+----
+<resourceManagerConfiguration>
+
+  <externalResources>
+    <externalResource>
+      <name>ExampleResource</name>
+      <fileResourceSpecifier>
+        <fileUrl>file:MyResourceFile.dat</fileUrl>
+      </fileResourceSpecifier>
+    </externalResource>
+  </externalResources>  
+
+  <externalResourceBindings>
+    <externalResourceBinding>
+      <key>annotator1/myResource</key>
+      <resourceName>ExampleResource</resourceName>
+    </externalResourceBinding>
+    <externalResourceBinding>
+      <key>annotator2/someResource</key>
+      <resourceName>ExampleResource</resourceName>
+    </externalResourceBinding>
+  </externalResourceBindings>
+
+</resourceManagerConfiguration>
+----
+
+The syntax for the `externalResources` declaration is exactly the same as described previously.
+In the resource bindings note the use of the compound keys, e.g. ``annotator1/myResource``.
+This identifies the resource dependency key `myResource` within the annotator with key ``annotator1``.
+Compound resource dependencies can be multiple levels deep to handle nested aggregate analysis engines.
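+
+As a sketch of a deeper compound key, assume that the aggregate contains a nested aggregate delegate with the hypothetical key `innerAggregate`, which in turn contains `annotator1`.
+Assuming that each nesting level contributes one slash-separated key segment, the binding could then be written as:
+
+[source,xml]
+----
+<externalResourceBindings>
+  <externalResourceBinding>
+    <!-- innerAggregate is a hypothetical key of a nested aggregate delegate -->
+    <key>innerAggregate/annotator1/myResource</key>
+    <resourceName>ExampleResource</resourceName>
+  </externalResourceBinding>
+</externalResourceBindings>
+----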
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.sofa_mappings]]
+==== Sofa Mappings
+
+Sofa mappings are specified between Sofa names declared in this aggregate descriptor as part of the `<capability>` section, and the Sofa names declared in the delegate components.
+For purposes of the mapping, all the declarations of Sofas in any of the capability sets contained within the `<capabilities>` element are considered together.
+
+[source]
+----
+<sofaMappings>
+  <sofaMapping>
+    <componentKey>[keyName]</componentKey>
+    <componentSofaName>[sofaName]</componentSofaName>
+    <aggregateSofaName>[sofaName]</aggregateSofaName>
+  </sofaMapping>
+  ...
+</sofaMappings>
+----
+
+The `<componentSofaName>` may be omitted in the case where the component is not aware of Multiple Views or Sofas.
+In this case, the UIMA framework will arrange for the specified `<aggregateSofaName>` to be the one visible to the delegate component.
+
+The `<componentKey>` is the key name for the component as specified in the list of delegate components for this aggregate.
+
+The Sofa names used must be declared as input or output Sofas in some capability set.
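+
+A minimal sketch with two mappings is shown below.
+The component keys and Sofa names are hypothetical; the second mapping omits `<componentSofaName>`, as would be done for a delegate that is not aware of Multiple Views or Sofas.
+
+[source,xml]
+----
+<sofaMappings>
+  <!-- Hypothetical Sofa-aware delegate: its "inputText" Sofa is mapped
+       to the aggregate's "translatedText" Sofa -->
+  <sofaMapping>
+    <componentKey>annotator1</componentKey>
+    <componentSofaName>inputText</componentSofaName>
+    <aggregateSofaName>translatedText</aggregateSofaName>
+  </sofaMapping>
+  <!-- Hypothetical single-view delegate: no componentSofaName given -->
+  <sofaMapping>
+    <componentKey>annotator2</componentKey>
+    <aggregateSofaName>translatedText</aggregateSofaName>
+  </sofaMapping>
+</sofaMappings>
+----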
+
+[[ugr.ref.xml.component_descriptor.aes.configuration_parameters]]
+=== Configuration Parameters
+
+Configuration parameters may be declared and set in both Primitive and Aggregate descriptors.
+Parameters set in an aggregate may override parameters set in one or more of its delegates.
+
+[[ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration]]
+==== Configuration Parameter Declaration
+
+Configuration Parameters are made available to annotator implementations and applications by the following interfaces: 
+
+* `AnnotatorContext`footnote:[Deprecated; use UimaContext instead.] (passed as an argument to the `initialize()` method of a version 1 annotator)
+* `ConfigurableResource` (every Analysis Engine implements this interface)
+* `UimaContext` (passed as an argument to the `initialize()` method of a version 2 annotator; it can also be obtained from any resource, including Analysis Engines, by calling `getUimaContext()`)
+
+Use `AnnotatorContext` within version 1 annotators.
+Use `UimaContext` within version 2 annotators and outside of annotators (for instance, in CasConsumers or the containing application) to access configuration parameters.
+
+Configuration parameters are set from the corresponding elements in the XML descriptor for the application.
+If you need to programmatically change parameter settings within an application, you can use the methods in `ConfigurableResource`.
+After doing so, call `reconfigure()` so that the UIMA framework notifies all contained analysis components that the parameter configuration has changed (the analysis engine's `reinitialize()` methods will be called).
+Note that in the current implementation, only integrated deployment components have configuration parameters passed to them; remote components obtain their parameters from their remote startup environment.
+This will likely change in the future.
+
+There are two ways to specify the `<configurationParameters>` section: as a list of configuration parameters or as a list of groups.
+A list of parameters, which are not part of any group, looks like this:
+
+[source]
+----
+<configurationParameters>
+  <configurationParameter>
+    <name>[String]</name> 
+    <externalOverrideName>[String]</externalOverrideName> 
+    <description>[String]</description> 
+    <type>String|Integer|Long|Float|Double|Boolean</type> 
+    <multiValued>true|false</multiValued> 
+    <mandatory>true|false</mandatory>
+    <overrides>
+      <parameter>[String]</parameter>
+      <parameter>[String]</parameter>
+        ...
+    </overrides>
+  </configurationParameter>
+  <configurationParameter>
+    ...
+  </configurationParameter>
+    ...
+</configurationParameters>
+----
+
+For each configuration parameter, the following are specified:
+
+* *name* – the name by which the annotator code refers to the parameter (required). All parameters declared in an analysis engine descriptor must have distinct names. The name is composed of normal Java identifier characters.
+* *externalOverrideName* – the name of a property in an external settings file that, if defined, overrides any value set in this descriptor or in its parent (optional). See <<ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides>> for a discussion of external configuration parameter overrides.
+* *description* – a natural language description of the intent of the parameter (optional).
+* *type* – the data type of the parameter's value; must be one of `String`, `Integer`, `Long`, `Float`, `Double`, or `Boolean` (required).
+* *multiValued* – `true` if the parameter can take multiple values (an array), `false` if the parameter takes only a single value (optional, defaults to false).
+* *mandatory* – `true` if a value must be provided for the parameter (optional, defaults to false).
+* *overrides* – this is used only in aggregate Analysis Engines, but is included here for completeness (optional). See <<ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides>> for a discussion of configuration parameter overriding in aggregate Analysis Engines.
+
+A list of groups looks like this: 
+[source]
+----
+<configurationParameters defaultGroup="[String]"
+    searchStrategy="none|default_fallback|language_fallback" >
+
+  <commonParameters>
+    [zero or more parameters]
+  </commonParameters>
+
+  <configurationGroup names="name1 name2 name3 ...">
+    [zero or more parameters]
+  </configurationGroup>
+
+  <configurationGroup names="name4 name5 ...">
+    [zero or more parameters]
+  </configurationGroup>
+
+  ...
+
+</configurationParameters>
+----
+
+Both the `<commonParameters>` and `<configurationGroup>` elements contain zero or more `<configurationParameter>` elements, with the same syntax described above.
+
+The `<commonParameters>` element declares parameters that exist in all groups.
+Each `<configurationGroup>` element has a `names` attribute, which contains a list of group names separated by whitespace (space or tab characters).
+Names consist of any number of non-whitespace characters; however, the Component Descriptor Editor tool restricts them to normal Java identifiers, including the period (.) and the dash (-).
+One configuration group will be created for each name, and all of the groups will contain the same set of parameters.
+
+The `defaultGroup` attribute specifies the name of the group to be used in the case where an annotator does a lookup for a configuration parameter without specifying a group name.
+It may also be used as a fallback if the annotator specifies a group that does not exist – see below.
+
+The `searchStrategy` attribute determines the action to be taken when the context is queried for the value of a parameter belonging to a particular configuration group, if that group does not exist or does not contain a value for the requested parameter.
+There are currently three possible values: 
+
+* *none* – there is no fallback; return null if there is no value in the exact group specified by the user.
+* *default_fallback* – if there is no value found in the specified group, look in the default group (as defined by the `defaultGroup` attribute).
+* *language_fallback* – this setting allows for a specific use of configuration parameter groups where the group names correspond to ISO language and country codes (for an example, see below). The fallback sequence is: `<lang>_<country>_<region> → <lang>_<country> → <lang> → <default>`.
+
+
+[[ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration.example]]
+===== Example
+
+[source]
+----
+<configurationParameters defaultGroup="en"
+        searchStrategy="language_fallback">
+
+  <commonParameters>
+    <configurationParameter>
+      <name>DictionaryFile</name>
+      <description>Location of dictionary for this
+           language</description>
+      <type>String</type>
+      <multiValued>false</multiValued>
+      <mandatory>false</mandatory>
+    </configurationParameter>
+  </commonParameters>
+
+  <configurationGroup names="en de en-US"/>
+
+  <configurationGroup names="zh">
+    <configurationParameter>
+      <name>DBC_Strategy</name>
+      <description>Strategy for dealing with double-byte
+          characters.</description>
+      <type>String</type>
+      <multiValued>false</multiValued>
+      <mandatory>false</mandatory>
+    </configurationParameter>
+  </configurationGroup>
+
+</configurationParameters>
+----
+
+In this example, we are declaring a `DictionaryFile` parameter that can have a different value for each of the languages that our AE supports: English (general), German, U.S. English, and Chinese.
+For Chinese only, we also declare a `DBC_Strategy` parameter.
+
+We are using the `language_fallback` search strategy, so if an annotator requests the dictionary file for the `en-GB` (British English) group, we will fall back to the more general `en` group.
+
+Since we have defined `en` as the default group, this value will be returned if the context is queried for the `DictionaryFile` parameter without specifying any group name, or if a nonexistent group name is specified.
+
+[[ugr.ref.aes.configuration_parameter_settings]]
+==== Configuration Parameter Settings
+
+For configuration parameters that are not part of any group, the `<configurationParameterSettings>` element looks like this: 
+[source]
+----
+<configurationParameterSettings>
+  <nameValuePair>
+    <name>[String]</name> 
+    <value>
+      <string>[String]</string>  | 
+      <integer>[Integer]</integer> |
+      <float>[Float]</float> |
+      <boolean>true|false</boolean>  |
+      <array> ... </array>
+    </value>
+  </nameValuePair>
+
+  <nameValuePair>
+    ...
+  </nameValuePair>
+  ...
+</configurationParameterSettings>
+----
+
+There are zero or more `nameValuePair` elements.
+Each `nameValuePair` contains a name (which refers to one of the configuration parameters) and a value for that parameter.
+
+The `value` element contains an element that matches the type of the parameter.
+For single-valued parameters, this is either `<string>`, `<integer>`, `<float>`, or `<boolean>`.
+For multi-valued parameters, this is an `<array>` element, which then contains zero or more instances of the appropriate type of primitive value, for example:
+
+[source]
+----
+<array><string>One</string><string>Two</string></array>
+----
+
+For parameters declared in configuration groups the `<configurationParameterSettings>` element looks like this: 
+[source]
+----
+<configurationParameterSettings>
+
+  <settingsForGroup name="[String]">
+    [one or more <nameValuePair> elements]
+  </settingsForGroup>
+
+  <settingsForGroup name="[String]">
+    [one or more <nameValuePair> elements]
+  </settingsForGroup>
+
+...
+
+</configurationParameterSettings>
+----
+where each `<settingsForGroup>` element has a name that matches one of the configuration groups declared under the `<configurationParameters>` element and contains the parameter settings for that group.
+
+[[ugr.ref.xml.component_descriptor.aes.configuration_parameter_settings.example]]
+===== Example
+
+Here are the settings that correspond to the parameter declarations in the previous example: 
+[source]
+----
+<configurationParameterSettings>
+
+  <settingsForGroup name="en">
+    <nameValuePair>
+      <name>DictionaryFile</name>
+      <value><string>resourcesEnglishdictionary.dat</string></value>
+    </nameValuePair>
+  </settingsForGroup>     
+
+  <settingsForGroup name="en-US">
+    <nameValuePair>
+      <name>DictionaryFile</name>
+      <value><string>resourcesEnglish_USdictionary.dat</string></value>
+    </nameValuePair>
+  </settingsForGroup>
+
+  <settingsForGroup name="de">
+    <nameValuePair>
+      <name>DictionaryFile</name>
+      <value><string>resourcesDeutschdictionary.dat</string></value>
+    </nameValuePair>
+  </settingsForGroup>
+
+  <settingsForGroup name="zh">
+    <nameValuePair>
+      <name>DictionaryFile</name>
+      <value><string>resourcesChinesedictionary.dat</string></value>
+    </nameValuePair>
+
+    <nameValuePair>
+      <name>DBC_Strategy</name>
+      <value><string>default</string></value>
+    </nameValuePair>
+
+  </settingsForGroup>
+
+</configurationParameterSettings>
+----
+
+[[ugr.ref.xml.component_descriptor.aes.aggregate.configuration_parameter_overrides]]
+==== Configuration Parameter Overrides
+
+In an aggregate Analysis Engine Descriptor, each `<configurationParameter>` element should contain an `<overrides>` element, with the following syntax:
+
+[source]
+----
+<overrides>
+
+  <parameter>
+    [delegateAnalysisEngineKey]/[parameterName]
+  </parameter>
+
+  <parameter>
+    [delegateAnalysisEngineKey]/[parameterName]
+  </parameter>
+  ...
+
+</overrides>
+----
+
+Since aggregate Analysis Engines have no code associated with them, the only way in which their configuration parameters can affect their processing is by overriding the parameter values of one or more delegate analysis engines.
+The `<overrides>` element determines which parameters, in which delegate Analysis Engines, are overridden by this configuration parameter.
+
+For example, consider an aggregate Analysis Engine Descriptor that contains delegate Analysis Engines with keys `annotator1` and `annotator2` (as declared in the `<delegateAnalysisEngine>` element; see <<ugr.ref.xml.component_descriptor.aes.aggregate.delegates>>) and also declares a configuration parameter as follows:
+
+[source]
+----
+<configurationParameter>
+  <name>AggregateParam</name>
+  <type>String</type>
+  <overrides>
+    <parameter>annotator1/param1</parameter>
+    <parameter>annotator2/param2</parameter>
+  </overrides>
+</configurationParameter>
+----
+
+The value of the `AggregateParam` parameter (whether assigned in the aggregate descriptor or at runtime by an application) will override the value of parameter `param1` in `annotator1` and also override the value of parameter `param2` in ``annotator2``.
+No other parameters will be affected.
+Note that `AggregateParam` may itself be overridden by a parameter in an outer aggregate that has this aggregate as one of its delegates. 
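+
+As a sketch of such chaining, suppose an outer aggregate includes this aggregate as a delegate under the hypothetical key `innerAggregate`.
+The outer descriptor could then declare a parameter (here given the hypothetical name `OuterParam`) that overrides `AggregateParam`:
+
+[source,xml]
+----
+<configurationParameter>
+  <!-- OuterParam and innerAggregate are hypothetical names -->
+  <name>OuterParam</name>
+  <type>String</type>
+  <overrides>
+    <parameter>innerAggregate/AggregateParam</parameter>
+  </overrides>
+</configurationParameter>
+----
+
+A value set for `OuterParam` would then reach `param1` in `annotator1` and `param2` in `annotator2` through `AggregateParam`.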
+
+Prior to release 2.4.1, if an aggregate Analysis Engine descriptor declared a configuration parameter with no explicit overrides, that parameter would override any parameters having the same name within any delegate analysis engine.
+Starting with release 2.4.1, support for this usage has been dropped.
+
+[[ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides]]
+==== External Configuration Parameter Overrides
+
+External parameter overrides are usually declared in primitive descriptors as a way to easily modify the parameters in some or all of an application's annotators.
+By using external settings files and shared parameter names, the configuration information can be specified without regard for a particular descriptor hierarchy.
+
+Configuration parameter declarations in primitive and aggregate descriptors may include an `<externalOverrideName>` element,  which specifies the name of a property that may be defined in an external settings file.
+If this element is present, and if an entry for its name can be found in a settings file, then this value overrides the value otherwise specified for this parameter.
+
+The value overrides any value set in this descriptor or set by an override in a parent aggregate.
+In primitive descriptors the value set by an external override is always applied.
+In aggregate descriptors the value set by an external override applies to the aggregate parameter, and is passed down to the overridden delegate parameters in the usual way, i.e. only if the delegate's parameter has not been set by an external override.
+
+In the absence of external overrides, parameter evaluation can be viewed as proceeding from the primitive descriptor up through any aggregates containing overrides, taking the last setting found.
+With external overrides, the search ends with the first external override found that has a value assigned by a settings file.
+
+The same external name may be used for multiple parameters;  the effect of this is that one setting will override multiple parameters. 
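+
+As an illustrative sketch, two parameter declarations in two different descriptors could share one external name.
+The parameter names and the external name `shared.dictionary` are hypothetical; a single entry for `shared.dictionary` in a settings file would then override both parameters.
+
+[source,xml]
+----
+<!-- In the descriptor of a first (hypothetical) annotator -->
+<configurationParameter>
+  <name>DictionaryFile</name>
+  <externalOverrideName>shared.dictionary</externalOverrideName>
+  <type>String</type>
+</configurationParameter>
+
+<!-- In the descriptor of a second (hypothetical) annotator -->
+<configurationParameter>
+  <name>LexiconPath</name>
+  <externalOverrideName>shared.dictionary</externalOverrideName>
+  <type>String</type>
+</configurationParameter>
+----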
+
+The settings for all descriptors in a pipeline are usually loaded from one or more files whose names are obtained from the Java system property _UimaExternalOverrides_.
+The value of the property must be a comma-separated list of resource names.
+If the name has a prefix of "file:" or no prefix, the filesystem is searched.
+If the name has a prefix of "path:", the rest must be a Java-style dotted name, similar to the name attribute for descriptor imports.
+The dots are replaced by file separators and a suffix of ".settings" is appended before searching the datapath and classpath.
+For example: `-DUimaExternalOverrides=/data/file1.settings,file:relative/file2.settings,path:org.apache.uima.resources.file3`.
+
+Override settings may also be specified when creating an analysis engine by putting a `Settings` object in the additional parameters map for the `produceAnalysisEngine` method.
+In this case the Java system property _UimaExternalOverrides_ is ignored. 
+
+[source]
+----
+  // Construct an analysis engine that uses two settings files.
+  // Files loaded earlier take precedence (first assignment wins).
+  Settings extSettings =
+      UIMAFramework.getResourceSpecifierFactory().createSettings();
+  for (String fname : new String[] { "externalOverride.settings",
+                                     "default.settings" }) {
+    // try-with-resources closes the stream even if load() throws
+    try (FileInputStream fis = new FileInputStream(fname)) {
+      extSettings.load(fis);
+    }
+  }
+  Map<String, Object> aeParms = new HashMap<>();
+  aeParms.put(Resource.PARAM_EXTERNAL_OVERRIDE_SETTINGS, extSettings);
+  AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc, aeParms);
+----
+
+These external settings consist of key-value pairs stored in a file using the UTF-8 character encoding, and written in a style similar to that of Java properties files.
+
+* Leading whitespace is ignored. 
+* Comment lines start with '#' or '!'. 
+* The key and value are separated by whitespace, '=' or ':'. 
+* Keys must contain at least one character and only letters, digits, or the characters '. / - ~ _'. 
+* If a line ends with '\' it is extended with the following line (after removing any leading whitespace).
+* Whitespace is trimmed from both keys and values. 
+* Duplicate keys are ignored: once a value is assigned to a key it cannot be changed.
+* Values may reference other settings using the syntax '${key}'. 
+* Array values are represented as a list of strings separated by commas or line breaks, and bracketed by the '[ ]' characters. The value must start with an '[' and is terminated by the first unescaped ']' which must be at the end of a line. The elements of an array (and hence the array size) may be indirectly specified using the '${key}' syntax but the brackets '[ ]' must be explicitly specified. 
+* In values the special characters '$ { } [ , ] \' are treated as regular characters if preceded by the escape character '\'.
+
+[source]
+----
+key1  :  value1
+ key2 =  value  2
+  key3   element2, element3, element4
+ # Next assignment is ignored as key3 has already been set
+key3  :   value ignored
+key4  =  [ array element1, ${key3}, element5
+           element6 ]
+key5     value with a reference ${key1} to key1
+key6  :  long value string \
+         continued from previous line (with leading whitespace stripped)
+key7  =  value without a reference \${not-a-key} 
+key8     \[ value that is not an array ]
+key9  :  [ array element1\, with embedded comma, element2 ]
+----
+
+Multiple settings files are allowed; they are loaded in order, such that early ones take precedence over later ones, following the first-assignment-wins rule.
+So, if you have lots of settings, you can put the defaults in one file, and then in an earlier file, override just the ones you need to.
+
+An external override name may be specified for a parameter declared in a group, but if the parameter is in the common group or the group is declared with multiple names, the external name is shared amongst all, i.e. these parameters cannot be given group-specific values.
+
+[[ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_access]]
+==== Direct Access to External Configuration Parameters
+
+Annotators and flow controllers can directly access these shared configuration parameters from their UimaContext.
+Direct access means an access where the key to select the shared parameter is the parameter name as specified in the external configuration settings file.
+
+[source]
+----
+String value = aContext.getSharedSettingValue(paramName);
+String[] values = aContext.getSharedSettingArray(arrayParamName);
+String[] allNames = aContext.getSharedSettingNames();
+----
+
+Java code called by an annotator or flow controller in the same thread or a child thread can use the `UimaContextHolder` to get the annotator's UimaContext and hence access the shared configuration parameters. 
+
+[source]
+----
+UimaContext uimaContext = UimaContextHolder.getUimaContext();
+if (uimaContext != null) {
+  value = uimaContext.getSharedSettingValue(paramName);
+}
+----
+
+The UIMA framework puts the context in an InheritableThreadLocal variable.
+The value will be null if `getUimaContext` is not invoked by an annotator or flow controller on the same thread or a child thread. 
+
+Since UIMA 3.2.1, the context is stored in the InheritableThreadLocal as a weak reference.
+This ensures that any long-running threads spawned while the context is set do not prevent garbage-collection of the context when the context is destroyed.
+If a child thread should really retain a strong reference to the context, it should obtain the context and store it in a field or in another ThreadLocal variable.
+For backwards compatibility, the old behavior of using a strong reference by default can be enabled by setting the system property `uima.context_holder_reference_type` to ``STRONG``. 
+
+[[ugr.ref.xml.component_descriptor.aes.other_uses_for_external_configuration_parameters]]
+==== Other Uses for External Configuration Parameters
+
+Explicit references to shared configuration parameters can be specified as part of the value of the `name` and `location` attributes of the `import` element and in the value of the `fileUrl` for a `fileResourceSpecifier` (see <<ugr.ref.xml.component_descriptor.imports>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>).
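+
+As a hedged sketch, assume that such references use the same `${key}` syntax as values in a settings file, and that the settings file defines the hypothetical keys `descriptor.dir` and `resource.dir`.
+An import and a file resource could then be written as:
+
+[source,xml]
+----
+<!-- descriptor.dir is a hypothetical key defined in a settings file -->
+<import location="${descriptor.dir}/TaggerDescriptor.xml"/>
+
+<!-- resource.dir is a hypothetical key defined in a settings file -->
+<fileResourceSpecifier>
+  <fileUrl>file:${resource.dir}/MyResourceFile.dat</fileUrl>
+</fileResourceSpecifier>
+----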
+
+[[ugr.ref.xml.component_descriptor.flow_controller]]
+== Flow Controller Descriptors
+
+The basic structure of a Flow Controller Descriptor is as follows: 
+
+[source]
+----
+<?xml version="1.0" ?> 
+<flowControllerDescription 
+    xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 
+
+  <implementationName>[ClassName]</implementationName> 
+
+  <processingResourceMetaData>
+    ...
+  </processingResourceMetaData>
+
+  <externalResourceDependencies>
+    ...
+  </externalResourceDependencies>
+
+  <resourceManagerConfiguration>
+    ...
+  </resourceManagerConfiguration>
+
+</flowControllerDescription>
+----
+
+The `frameworkImplementation` element must always be set to the value ``org.apache.uima.java``.
+
+The `implementationName` element must contain the fully-qualified class name of the Flow Controller implementation.
+This must name a class that implements the `FlowController` interface.
+
+The `processingResourceMetaData` element contains essentially the same information as a Primitive Analysis Engine Descriptor's `analysisEngineMetaData` element, described in <<ugr.ref.xml.component_descriptor.aes.metadata>>.
+
+The `externalResourceDependencies` and `resourceManagerConfiguration` elements are exactly the same as in Primitive Analysis Engine Descriptors (see <<ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>).
+
+[[ugr.ref.xml.component_descriptor.collection_processing_parts]]
+== Collection Processing Component Descriptors
+
+There are three types of Collection Processing Components: Collection Readers, CAS Initializers (deprecated as of UIMA Version 2), and CAS Consumers.
+Each type of component has a corresponding descriptor.
+The structure of these descriptors is very similar to that of primitive Analysis Engine Descriptors.
+
+[[ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader]]
+=== Collection Reader Descriptors
+
+The basic structure of a Collection Reader descriptor is as follows: 
+[source]
+----
+<?xml version="1.0" ?> 
+<collectionReaderDescription
+    xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <implementationName>[ClassName]</implementationName> 
+
+  <processingResourceMetaData>
+    ...
+  </processingResourceMetaData>
+
+  <externalResourceDependencies>
+   ...
+  </externalResourceDependencies>
+
+  <resourceManagerConfiguration>
+
+   ...
+
+  </resourceManagerConfiguration>
+
+</collectionReaderDescription>
+----
+
+The `frameworkImplementation` element must always be set to the value ``org.apache.uima.java``.
+
+The `implementationName` element contains the fully-qualified class name of the Collection Reader implementation.
+This must name a class that implements the `CollectionReader` interface.
+
+The `processingResourceMetaData` element contains essentially the same information as a Primitive Analysis Engine Descriptor's `analysisEngineMetaData` element:
+
+[source]
+----
+<processingResourceMetaData>
+
+  <name>[String]</name>
+  <description>[String]</description>
+  <version>[String]</version>
+  <vendor>[String]</vendor>
+
+  <configurationParameters>
+     ...
+  </configurationParameters>
+
+  <configurationParameterSettings>
+    ...
+  </configurationParameterSettings> 
+
+  <typeSystemDescription>
+   ...
+  </typeSystemDescription> 
+
+  <typePriorities>
+   ...
+  </typePriorities> 
+
+  <fsIndexes>
+   ...
+  </fsIndexes>
+
+  <capabilities>
+   ...
+  </capabilities> 
+
+</processingResourceMetaData>
+----
+
+The contents of these elements are the same as those described in <<ugr.ref.xml.component_descriptor.aes.metadata>>, with the exception that the capabilities section should not declare any inputs (because the Collection Reader is always the first component to receive the CAS).
+
+The `externalResourceDependencies` and `resourceManagerConfiguration` elements are exactly the same as in the Primitive Analysis Engine Descriptors (see <<ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>).
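+
+As a sketch of how these pieces fit together, the following fragment fills in the skeleton for a hypothetical Collection Reader.
+The class name `com.example.SimpleFileReader`, the `InputDirectory` parameter, and the type name `com.example.SourceDocument` are illustrative assumptions and are not part of the UIMA SDK.
+Note that the capability declares outputs but no inputs.
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<collectionReaderDescription
+    xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <!-- hypothetical class implementing the CollectionReader interface -->
+  <implementationName>com.example.SimpleFileReader</implementationName>
+
+  <processingResourceMetaData>
+    <name>Simple File Reader</name>
+    <description>Reads documents from a directory.</description>
+    <version>1.0</version>
+    <vendor>Example Vendor</vendor>
+
+    <configurationParameters>
+      <configurationParameter>
+        <name>InputDirectory</name>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>true</mandatory>
+      </configurationParameter>
+    </configurationParameters>
+
+    <configurationParameterSettings>
+      <nameValuePair>
+        <name>InputDirectory</name>
+        <value><string>/data/input</string></value>
+      </nameValuePair>
+    </configurationParameterSettings>
+
+    <capabilities>
+      <capability>
+        <!-- no inputs: the reader is the first component to receive the CAS -->
+        <inputs/>
+        <outputs>
+          <type allAnnotatorFeatures="true">com.example.SourceDocument</type>
+        </outputs>
+      </capability>
+    </capabilities>
+  </processingResourceMetaData>
+
+</collectionReaderDescription>
+----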
+
+[[ugr.ref.xml.component_descriptor.collection_processing_parts.cas_initializer]]
+=== CAS Initializer Descriptors (deprecated)
+
+The basic structure of a CAS Initializer Descriptor is as follows: 
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?> 
+<casInitializerDescription
+    xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <implementationName>[ClassName] </implementationName>
+
+  <processingResourceMetaData>
+    ...
+  </processingResourceMetaData>
+
+  <externalResourceDependencies>
+    ...
+  </externalResourceDependencies>
+
+  <resourceManagerConfiguration>
+    ...
+  </resourceManagerConfiguration>
+
+</casInitializerDescription>
+----
+
+The `frameworkImplementation` element must always be set to the value ``org.apache.uima.java``.
+
+The `implementationName` element contains the fully-qualified class name of the CAS Initializer implementation.
+This must name a class that implements the `CasInitializer` interface.
+
+The `processingResourceMetaData` element contains essentially the same information as a Primitive Analysis Engine Descriptor's `analysisEngineMetaData` element, as described in <<ugr.ref.xml.component_descriptor.aes.metadata>>, with the exception of some changes to the capabilities section.
+A CAS Initializer's capabilities element looks like this: 
+[source]
+----
+<capabilities>
+  <capability>
+    <outputs>
+      <type allAnnotatorFeatures="true|false">[String]</type>
+      <type>[TypeName]</type>
+      ...
+      <feature>[TypeName]:[Name]</feature>
+      ...
+    </outputs>
+
+    <outputSofas>
+      <sofaName>[name]</sofaName>
+      ...
+    </outputSofas>
+
+    <mimeTypesSupported>
+      <mimeType>[MIME Type]</mimeType>
+      ...
+    </mimeTypesSupported>
+  </capability>
+
+  <capability>
+    ...
+  </capability>
+  ...
+</capabilities>
+----
+
+A CAS Initializer's capabilities declaration differs from an Analysis Engine's capabilities declaration in three ways: the CAS Initializer does not declare any input CAS types and features or input Sofas (because it is always the first to operate on a CAS), it does not have a language specifier, and it may declare a set of MIME types that it supports for its input documents.
+Examples include `text/plain`, `text/html`, and `application/pdf`.
+For a list of MIME types, see http://www.iana.org/assignments/media-types/.
+The MIME type declaration is currently only for users' information; the framework does not use it for anything.
+This may change in future versions.
+
+The `externalResourceDependencies` and `resourceManagerConfiguration` elements are exactly the same as in the Primitive Analysis Engine Descriptors (see <<ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>).
+
+[[ugr.ref.xml.component_descriptor.collection_processing_parts.cas_consumer]]
+=== CAS Consumer Descriptors
+
+The basic structure of a CAS Consumer Descriptor is as follows: 
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?> 
+<casConsumerDescription 
+    xmlns="http://uima.apache.org/resourceSpecifier">
+
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 
+
+  <implementationName>[ClassName]</implementationName> 
+
+  <processingResourceMetaData>
+    ...
+  </processingResourceMetaData>
+
+  <externalResourceDependencies>
+    ...
+  </externalResourceDependencies>
+
+  <resourceManagerConfiguration>
+    ...
+  </resourceManagerConfiguration>
+</casConsumerDescription>
+----
+
+The `frameworkImplementation` element currently must have the value ``org.apache.uima.java`` or ``org.apache.uima.cpp``.
+
+The `implementationName` element is how the UIMA framework determines which CAS Consumer implementation to use.
+It must contain the fully-qualified class name of the CAS Consumer implementation for Java implementations, or the name of a .dll or .so file for C++ implementations.
+For Java, the named class must implement the `CasConsumer` interface.
+
+The `processingResourceMetaData` element contains essentially the same information as a Primitive Analysis Engine Descriptor's `analysisEngineMetaData` element, described in <<ugr.ref.xml.component_descriptor.aes.metadata>>, except that the CAS Consumer Descriptor's `capabilities` element should not declare outputs or outputSofas (since CAS Consumers do not modify the CAS).
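+
+For illustration, a CAS Consumer's capabilities section might look like the following sketch, which declares inputs only.
+The type name `com.example.PersonName` is a hypothetical placeholder.
+
+[source]
+----
+<capabilities>
+  <capability>
+    <inputs>
+      <!-- hypothetical type that this consumer reads from the CAS -->
+      <type allAnnotatorFeatures="true">com.example.PersonName</type>
+    </inputs>
+    <!-- left empty: a CAS Consumer does not modify the CAS -->
+    <outputs/>
+  </capability>
+</capabilities>
+----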
+
+The `externalResourceDependencies` and `resourceManagerConfiguration` elements are exactly the same as in Primitive Analysis Engine Descriptors (see <<ugr.ref.xml.component_descriptor.aes.primitive.external_resource_dependencies>> and <<ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration>>).
+
+[[ugr.ref.xml.component_descriptor.service_client]]
+== Service Client Descriptors
+
+Service Client Descriptors specify only a location of a remote service.
+They are therefore much simpler in structure.
+In the UIMA SDK, a Service Client Descriptor that refers to a valid Analysis Engine or CAS Consumer service can be used in place of the actual Analysis Engine or CAS Consumer Descriptor.
+The UIMA SDK will handle the details of calling the remote service.
+(For details on _deploying_ an Analysis Engine or CAS Consumer as a service, see xref:tug.adoc#ugr.tug.application.remote_services[Working with Remote Services].)
+
+The UIMA SDK is extensible to support different types of remote services.
+In future versions, there may be different variations of service client descriptors that cater to different types of services.
+For now, the only type of service client descriptor is the ``uriSpecifier``, which supports the Vinci protocol.
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <resourceType>AnalysisEngine | CasConsumer </resourceType>
+  <uri>[URI]</uri> 
+  <protocol>Vinci</protocol> 
+  <timeout>[Integer]</timeout>
+  <parameters>
+    <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
+    <parameter name="VNS_PORT" value="9000"/>
+    <parameter name="GetMetaDataTimeout" value="[Integer]"/>
+  </parameters> 
+</uriSpecifier>
+----
+
+The `resourceType` element is required for new descriptors, but is currently allowed to be omitted for backward compatibility.
+It specifies the type of component (Analysis Engine or CAS Consumer) that is implemented by the service endpoint described by this descriptor.
+
+The `uri` element contains the URI for the web service.
+(Note that in the case of Vinci, this will be the service name, which is looked up in the Vinci Naming Service.)
+
+The `protocol` element may be set to Vinci; other protocols may be added later.
+It specifies the particular data transport format that will be used.
+
+The `timeout` element is optional.
+If present, it specifies the number of milliseconds to wait for a request to be processed before an exception is thrown.
+A value of zero or less will wait forever.
+If no timeout is specified, a default value (currently 60 seconds) will be used.
+
+The `parameters` element is optional.
+If present, it can specify values for each of the following: 
+
+* ``VNS_HOST``: host name for the Vinci naming service. 
+* ``VNS_PORT``: port number for the Vinci naming service. 
+* ``GetMetaDataTimeout``: timeout period (in milliseconds) for the GetMetaData call. If not specified, the default is 60 seconds. This may need to be set higher if there are a lot of clients competing for connections to the service. 
+
+If `VNS_HOST` and `VNS_PORT` are not specified in the descriptor, their values are taken from the Java command line, where they can be passed using the `-DVNS_HOST=<host>` and/or `-DVNS_PORT=<port>` system arguments.
+If a value is specified neither in the descriptor nor as a system argument, it defaults to `localhost` for `VNS_HOST` and `9000` for `VNS_PORT`.
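+
+A filled-in descriptor might look like the following sketch.
+The service name `com.example.PersonTitleAnnotator`, the host `vns.example.org`, and the timeout values are assumptions chosen for illustration: the request timeout is set to 30 seconds and the GetMetaData timeout is raised to 120 seconds.
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <resourceType>AnalysisEngine</resourceType>
+  <!-- for Vinci, the service name registered with the Vinci Naming Service -->
+  <uri>com.example.PersonTitleAnnotator</uri>
+  <protocol>Vinci</protocol>
+  <!-- milliseconds to wait for a request before an exception is thrown -->
+  <timeout>30000</timeout>
+  <parameters>
+    <parameter name="VNS_HOST" value="vns.example.org"/>
+    <parameter name="VNS_PORT" value="9000"/>
+    <parameter name="GetMetaDataTimeout" value="120000"/>
+  </parameters>
+</uriSpecifier>
+----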
+
+For details on how to deploy and call Analysis Engine and CAS Consumer services, see xref:tug.adoc#ugr.tug.application.remote_services[Working with Remote Services].
+
+[[ugr.ref.xml.component_descriptor.custom_resource_specifiers]]
+== Custom Resource Specifiers
+
+A Custom Resource Specifier allows you to plug in your own Java class as a UIMA Resource.
+For example, you can support a new service protocol by plugging in a Java class that implements the UIMA `AnalysisEngine` interface and communicates with the remote service.
+
+A Custom Resource Specifier has the following format:
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <resourceClassName>[Java Class Name]</resourceClassName>
+  <parameters>
+    <parameter name="[String]" value="[String]"/>
+    <parameter name="[String]" value="[String]"/>
+  </parameters> 
+</customResourceSpecifier>
+----
+
+The `resourceClassName` element must contain the fully-qualified name of a Java class that can be found in the classpath (including the UIMA extension classpath, if you have specified one using the `ResourceManager.setExtensionClassPath` method).  This class must implement the UIMA `Resource` interface.
+
+When an application calls the `UIMAFramework.produceResource` method and passes a ``CustomResourceSpecifier``, the UIMA framework will load the named class and call its `initialize(ResourceSpecifier,Map)` method, passing the `CustomResourceSpecifier` as the first argument.
+Your class can override the `initialize` method and use the `CustomResourceSpecifier` API to get access to the `parameter` names and values  specified in the XML.
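+
+As an illustration, a specifier for a hypothetical protocol adapter could look like this.
+The class name `com.example.MyProtocolAnalysisEngine` and both parameter names are made up for this sketch; the named class would read them in its `initialize` method.
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <!-- hypothetical class; it must implement the UIMA Resource interface -->
+  <resourceClassName>com.example.MyProtocolAnalysisEngine</resourceClassName>
+  <parameters>
+    <!-- hypothetical parameters, passed to the class as name/value strings -->
+    <parameter name="serviceUrl" value="http://localhost:8080/analyze"/>
+    <parameter name="timeoutMillis" value="30000"/>
+  </parameters>
+</customResourceSpecifier>
+----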
+
+If you are using a custom resource specifier to plug in a class that implements a new service protocol, your class must also implement the `AnalysisEngine` interface.
+Generally it should also extend ``AnalysisEngineImplBase``.
+The key methods that should be implemented are ``getMetaData``, ``processAndOutputNewCASes``, ``collectionProcessComplete``, and ``destroy``.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.cpe_descriptor.adoc b/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.cpe_descriptor.adoc
new file mode 100644
index 0000000..1284e7b
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/ref/ref.xml.cpe_descriptor.adoc
@@ -0,0 +1,922 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.ref.xml.cpe_descriptor]]
+= Collection Processing Engine Descriptor Reference
+// <titleabbrev>CPE Descriptor Reference</titleabbrev>
+
+A UIMA _Collection Processing Engine_ (CPE) is a combination of UIMA components assembled to analyze a collection of artifacts.
+A CPE is an instantiation of the UIMA __Collection Processing Architecture__, which defines the collection processing components, interfaces, and APIs.
+A CPE is executed by a UIMA framework component called the _Collection Processing Manager_ (CPM), which provides a number of services for deploying CPEs, running CPEs, and handling errors.
+
+A CPE can be assembled programmatically within a Java application, or it can be assembled declaratively via a CPE configuration specification, called a CPE Descriptor.
+This chapter describes the format of the CPE Descriptor.
+
+Details about the CPE, including its function, sub-components, APIs, and related tools, can be found in xref:tug.adoc#ugr.tools.cpe[Collection Processing Engine Developer's Guide].
+Here we briefly summarize the CPE to define terms and provide context for the later sections that describe the CPE Descriptor.
+
+[[ugr.ref.xml.cpe_descriptor.overview]]
+== CPE Overview
+
+[[ugr.ref.xml.cpe_descriptor.overview.fig.runtime]]
+.CPE Runtime Overview
+image::images/references/ref.xml.cpe_descriptor/image002.png[CPE Runtime Overview diagram]
+
+An illustration of the CPE runtime is shown in <<ugr.ref.xml.cpe_descriptor.overview.fig.runtime>>.
+Some of the CPE components, such as the _queues_ and __processing pipelines__, are internal to the CPE, but their behavior and deployment may be configured using the CPE Descriptor.
+Other CPE components, such as the _Collection Reader_ and __CAS Processors__, are defined and configured externally from the CPE and then plugged in to the CPE to create the overall engine.
+The parts of a CPE are: 
+
+Collection Reader::
+understands the native data collection format and iterates over the collection producing subjects of analysis
+
+CAS Initializerfootnote:[Deprecated]::
+initializes a CAS with a subject of analysis
+
+Artifact Producer::
+asynchronously pulls CASes from the Collection Reader, creates batches of CASes and puts them into the work queue
+
+Work Queue::
+shared queue containing batches of CASes queued by the Artifact Producer for analysis by Analysis Engines
+
+B1-Bn::
+individual batches containing 1 or more CASes
+
+AE1-AEn::
+Analysis Engines arranged by a CPE descriptor
+
+Processing Pipelines::
+each pipeline runs in a separate thread and contains a replicated set of the Analysis Engines running in the defined sequence
+
+Output Queue::
+holds batches of CASes with analysis results intended for CAS Consumers
+
+CAS Consumers::
+perform collection level analysis over the CASes and extract analysis results, e.g., creating indexes or databases
+
+[[ugr.ref.xml.cpe_descriptor.notation]]
+== Notation
+
+CPE Descriptors are XML files.
+This chapter uses an informal notation to specify the syntax of CPE Descriptors.
+
+The notation used in this chapter is: 
+
+* An ellipsis (...) inside an element body indicates that the substructure of that element has been omitted (to be described in another section of this chapter). An example of this would be: 
++
+[source]
+----
+<collectionReader>
+...
+</collectionReader>
+----
+* An ellipsis immediately after an element indicates that the element type may be repeated arbitrarily many times. For example: 
++
+[source]
+----
+<parameter>[String]</parameter>
+<parameter>[String]</parameter>
+...
+----
++
+indicates that there may be arbitrarily many parameter elements in this context.
+* An ellipsis inside an element means details of the attributes associated with that element are defined later, e.g.: 
++
+[source]
+----
+<casProcessor ...>
+----
+* Bracketed expressions (e.g. ``[String]``) indicate the type of value that may be used at that location.
+* A vertical bar, as in ``true|false``, indicates alternatives. This can be applied to literal values, bracketed type names, and elements. 
+
+Which elements are optional and which are required is specified in prose, not in the syntax definition.
+
+[[ugr.ref.xml.cpe_descriptor.imports]]
+== Imports
+
+As of version 2.2, a CPE Descriptor can use the same `import` mechanism as other component descriptors.
+This allows referring to xref:ref.adoc#ugr.ref.xml.component_descriptor[component descriptors] using either relative paths (resolved relative to the location of the CPE descriptor) or the classpath/datapath.
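+
+A minimal sketch of the two `import` forms follows, assuming the same `location` and `name` attributes that other component descriptors use.
+The path and the dotted name are hypothetical.
+
+[source]
+----
+<!-- import by location: a relative path is resolved against the CPE descriptor -->
+<descriptor>
+    <import location="descriptors/MyCollectionReader.xml"/>
+</descriptor>
+
+<!-- import by name: looked up in the classpath/datapath -->
+<descriptor>
+    <import name="com.example.descriptors.MyCollectionReader"/>
+</descriptor>
+----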
+
+The following older syntax is still supported, but __not recommended__: 
+[source]
+----
+<descriptor>
+    <include href="[URL or File]"/>
+</descriptor>
+----
+
+The value of the `href` attribute, shown above as `[URL or File]`, is a URL or a filename for the descriptor of the incorporated component.
+The value is first tried as a URL.
+
+Relative paths in an `include` are resolved relative to the current working directory  (NOT the CPE descriptor location as is the case for ``import``).  A filename relative to another directory can be specified using the `CPM_HOME` variable, e.g., 
+[source]
+----
+<descriptor>
+    <include href="${CPM_HOME}/desc_dir/descriptor.xml"/>
+</descriptor>
+----
+
+In this case, the value for the `CPM_HOME` variable must be provided to the CPE by specifying it on the Java command line, e.g., 
+[source]
+----
+java -DCPM_HOME="C:/Program Files/apache/uima/cpm" ...
+----
+
+[[ugr.ref.xml.cpe_descriptor.descriptor]]
+== CPE Descriptor Overview
+
+A CPE Descriptor consists of information describing the following four main elements.
+
+. The __Collection Reader__, which is responsible for gathering artifacts and initializing the Common Analysis Structure (CAS) used to support processing in the UIMA collection processing engine.
+. The __CAS Processors__, responsible for analyzing individual artifacts, analyzing across artifacts, and extracting analysis results. CAS Processors include _Analysis Engines_ and __CAS Consumers__.
+. Operational parameters of the _Collection Processing Manager_ (CPM), such as checkpoint frequency and deployment mode.
+. Resource Manager Configuration (optional). 
+
+The CPE Descriptor has the following high level skeleton: 
+[source]
+----
+<?xml version="1.0"?>
+<cpeDescription>
+   <collectionReader>
+...
+   </collectionReader>
+   <casProcessors>
+...
+   </casProcessors>
+   <cpeConfig>
+...
+   </cpeConfig>
+   <resourceManagerConfiguration>
+...
+   </resourceManagerConfiguration>
+</cpeDescription>
+----
+
+Details of each of the four main elements are described in the sections that follow.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.collection_reader]]
+== Collection Reader
+
+The `<collectionReader>` section identifies the Collection Reader and optional CAS Initializer that are to be used in the CPE.
+The Collection Reader is responsible for retrieval of artifacts from a collection outside of the CPE, and the optional CAS Initializer (deprecated as of UIMA Version 2) is responsible for initializing the CAS with the artifact.
+
+A Collection Reader may initialize the CAS itself, in which case it does not require a CAS Initializer.
+This should be clearly specified in the documentation for the Collection Reader.
+Specifying a CAS Initializer for a Collection Reader that does not make use of a CAS Initializer will not cause an error, but the specified CAS Initializer will not be used.
+
+The complete structure of the `<collectionReader>` section is: 
+[source]
+----
+<collectionReader>
+  <collectionIterator>
+    <descriptor>
+      <import ...> | <include .../>
+    </descriptor>
+    <configurationParameterSettings>...</configurationParameterSettings>
+    <sofaNameMappings>...</sofaNameMappings>
+  </collectionIterator>
+</collectionReader>
+----
+
+The `<collectionIterator>` identifies the descriptor for the xref:ref.adoc#ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader[Collection Reader].
+The `<configurationParameterSettings>` and the `<sofaNameMappings>` elements are described in the next section.
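+
+For example, a `<collectionReader>` section that imports a Collection Reader descriptor and overrides one of its parameters could be sketched as follows.
+The descriptor path and the `InputDirectory` parameter are hypothetical; the parameter must already be defined in the imported descriptor.
+
+[source]
+----
+<collectionReader>
+  <collectionIterator>
+    <descriptor>
+      <!-- hypothetical path, resolved relative to the CPE descriptor -->
+      <import location="descriptors/SimpleFileReader.xml"/>
+    </descriptor>
+    <configurationParameterSettings>
+      <nameValuePair>
+        <name>InputDirectory</name>
+        <value>
+          <string>/data/input</string>
+        </value>
+      </nameValuePair>
+    </configurationParameterSettings>
+  </collectionIterator>
+</collectionReader>
+----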
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.collection_reader.error_handling]]
+=== Error handling for Collection Readers
+
+The CPM will abort if the Collection Reader throws a large number of consecutive exceptions (default = 100). This default can be changed by using the Java initialization parameter `-DMaxCRErrorThreshold=xxx`.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors]]
+== CAS Processors
+
+The `<casProcessors>` section identifies the components that perform the analysis on the input data, including CAS analysis (Analysis Engines) and analysis results extraction (CAS Consumers). The CAS Consumers may also perform collection level analysis, where the analysis is performed (or aggregated) over multiple CASes.
+The basic structure of the CAS Processors section is: 
+
+[source]
+----
+<casProcessors 
+    dropCasOnException="true|false"
+    casPoolSize="[Number]" 
+    processingUnitThreadCount="[Number]">
+
+  <casProcessor ...>
+        ...
+  </casProcessor>
+
+  <casProcessor ...>
+        ...
+  </casProcessor>
+    ...
+</casProcessors>
+----
+
+The `<casProcessors>` section has two mandatory attributes and one optional attribute that configure the characteristics of the CAS Processor flow in the CPE.
+The first mandatory attribute is ``casPoolSize``, which defines the fixed number of CAS instances that the CPM will create and use during processing.
+All CAS instances are maintained in a CAS Pool with a check-in and check-out access.
+Each CAS is checked-out from the CAS Pool by the Collection Reader and initialized with an initial subject of analysis.
+The CAS is checked-in into the CAS Pool when it is completely processed, at the end of the processing chain.
+A larger CAS Pool size will result in more memory being used by the CPM.
+CAS objects can be large and care should be taken to determine the optimum size of the CAS Pool, weighing memory tradeoffs with performance.
+
+The second mandatory `<casProcessors>` attribute is ``processingUnitThreadCount``, which specifies the number of replicated __Processing Pipelines__.
+Each Processing Pipeline runs in its own thread.
+The CPM takes CASes from the work queue and submits each CAS to one of the Processing Pipelines for analysis.
+A Processing Pipeline contains one or more Analysis Engines invoked in a given sequence.
+If more than one Processing Pipeline is specified, the CPM replicates instances of each Analysis Engine defined in the CPE descriptor.
+Each Processing Pipeline thread runs independently, consuming CASes from the work queue and depositing CASes with analysis results onto the output queue.
+On multiprocessor machines, multiple Processing Pipelines can run in parallel, improving overall throughput of the CPM.
+
+[NOTE]
+====
+The number of Processing Pipelines should be equal to or greater than CAS Pool size. 
+====
+
+Elements in the pipeline (each represented by a `<casProcessor>` element) may indicate that they do not permit multiple deployment in their Analysis Engine descriptor.
+If so, even though multiple pipelines are being used, all CASes passing through the pipelines will be routed through one instance of these marked Engines. 
+
+The final, optional, `<casProcessors>` attribute is ``dropCasOnException``.
+It defines a policy that determines what happens with the CAS when an exception happens during processing.
+If the value of this attribute is set to true and an exception happens, the CPM will notify all xref:tug.adoc#ugr.tug.cpe.using_listeners[registered listeners] of the exception, clear the CAS and check the CAS back into the CAS Pool so that it can be re-used.
+The presumption is that an exception may leave the CAS in an inconsistent state and therefore that CAS should not be allowed to move through the processing chain.
+When this attribute is omitted the CPM's default is the same as specifying `dropCasOnException="false"`.
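+
+A concrete opening tag might look like the following sketch.
+The numbers are arbitrary example values, not recommendations: two replicated Processing Pipelines share a pool of two CASes, and a CAS is dropped and returned to the pool if an exception occurs.
+
+[source]
+----
+<casProcessors
+    dropCasOnException="true"
+    casPoolSize="2"
+    processingUnitThreadCount="2">
+
+  <casProcessor ...>
+        ...
+  </casProcessor>
+</casProcessors>
+----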
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual]]
+=== Specifying an Individual CAS Processor
+
+The CAS Processors that make up the Processing Pipeline and the CAS Consumer pipeline are specified with the `<casProcessor>` element, which appears within the `<casProcessors>` element.
+It may appear multiple times, once for each CAS Processor specified for this CPE.
+
+The order of the `<casProcessor>` elements within the `<casProcessors>` section specifies the order in which the CAS Processors will run.
+Although CAS Consumers are usually put at the end of the pipeline, they need not be.
+Also, Aggregate Analysis Engines may include CAS Consumers.
+
+The overall format of the `<casProcessor>` element is: 
+
+[source]
+----
+<casProcessor deployment="local|remote|integrated" name="[String]" >
+    <descriptor>
+      <import ...> | <include .../>
+    </descriptor>
+    <configurationParameterSettings>...</configurationParameterSettings>
+    <sofaNameMappings>...</sofaNameMappings>
+    <runInSeparateProcess>...</runInSeparateProcess>
+    <deploymentParameters>...</deploymentParameters>
+    <filter/>
+    <errorHandling>...</errorHandling>
+    <checkpoint batch="[Number]"/>
+</casProcessor>
+----
+
+The `<casProcessor>` element has two mandatory attributes, `deployment` and `name`.
+The mandatory `name` attribute specifies a unique string identifying the CAS Processor.
+
+The mandatory `deployment` attribute specifies the CAS Processor deployment mode.
+Currently, three deployment options are supported: 
+
+integrated::
+indicates _integrated_ deployment of the CAS Processor.
+The CPM deploys and collocates the CAS Processor in the same process space as the CPM.
+This type of deployment is recommended to increase the performance of the CPE.
+However, it is NOT recommended to deploy annotators containing JNI this way.
+Such CAS Processors may cause a fatal exception and force the JVM to exit without cleanup (bringing down the CPM). Any UIMA SDK compliant pure Java CAS Processors may be safely deployed this way.
++
+The descriptor for an integrated deployment can, in fact, be a remote service descriptor.
+When used this way, however, the CPM error recovery  options (see below) operate in the integrated mode, which means that many  of the retry options are not available.
+
+remote::
+indicates _non-managed_ deployment of the CAS Processor.
+The CAS Processor descriptor referenced in the `<descriptor>` element must be a Vinci __Service Client Descriptor__, which identifies a xref:tug.adoc#ugr.tug.application.remote_services[remotely deployed CAS Processor service]. The CPM assumes that the CAS Processor is already running as a remote service and will connect to it using the URI provided in the client service descriptor.
+The lifecycle of a remotely deployed CAS Processor is not managed by the CPM, so appropriate infrastructure should be in place to start/restart such CAS Processors when necessary.
+This deployment provides fault isolation and is implementation (i.e., programming language) neutral.
+
+local::
+indicates _managed_ deployment of the CAS Processor.
+The CAS Processor descriptor referenced in the `<descriptor>` element must be a Vinci __Service Deployment Descriptor__, which configures a CAS Processor for deployment as a xref:tug.adoc#ugr.tug.application.remote_services[Vinci service].
+The CPM deploys the CAS Processor in a separate process and manages the life cycle (start/stop) of the CAS Processor.
+Communication between the CPM and the CAS Processor is done with Vinci.
+When the CPM completes processing, the process containing the CAS Processor is terminated.
+This deployment mode insulates the CPM from the CAS Processor, creating a more robust deployment at the cost of a small communication overhead.
+On multiprocessor machines, the separate processes may run concurrently and improve overall throughput.
+
+A number of elements may appear within the `<casProcessor>` element.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.descriptor]]
+==== <descriptor> Element
+
+The `<descriptor>` element is mandatory.
+It identifies the descriptor for the referenced xref:ref.adoc#ugr.ref.xml.component_descriptor.aes[CAS Processor].
+
+* For _remote_ CAS Processors, the referenced descriptor must be a Vinci __Service Client Descriptor__, which identifies a remotely deployed CAS Processor service.
+* For _local_ CAS Processors, the referenced descriptor must be a Vinci __Service Deployment Descriptor__.
+* For _integrated_ CAS Processors, the referenced descriptor must be an Analysis Engine Descriptor (primitive or aggregate). 
+
+See the xref:tug.adoc#ugr.tug.application.remote_services[Remote Services Guide] for more information on creating these descriptors and deploying services.
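+
+To make the pairing of deployment mode and descriptor type concrete, the following sketch shows an integrated and a remote CAS Processor side by side.
+The processor names and descriptor paths are hypothetical, and the remaining child elements are omitted.
+
+[source]
+----
+<!-- integrated: the descriptor is an Analysis Engine Descriptor -->
+<casProcessor deployment="integrated" name="Person Title Annotator">
+    <descriptor>
+      <import location="descriptors/PersonTitleAnnotator.xml"/>
+    </descriptor>
+    ...
+</casProcessor>
+
+<!-- remote: the descriptor is a Vinci Service Client Descriptor -->
+<casProcessor deployment="remote" name="Remote Person Title Annotator">
+    <descriptor>
+      <import location="descriptors/PersonTitleAnnotatorClient.xml"/>
+    </descriptor>
+    ...
+</casProcessor>
+----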
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.configuration_parameter_settings]]
+==== <configurationParameterSettings> Element
+
+This element provides a way to override the contained Analysis Engine's parameter settings.
+Any entry specified here must already be defined; values specified replace the corresponding values for each parameter.
+_For CAS Processors, this mechanism is only available when they are deployed in "`integrated`" mode._
+For Collection Readers and Initializers, it is always available.
+
+The content of this element is identical to the component descriptor for specifying parameters (in the case where no parameter groups are specified)footnote:[An earlier UIMA version required these to have a suffix of _p, e.g., string_p. This is no longer required, but this format is accepted, also, for backward compatibility.].
+
+Here is an example: 
+
+[source]
+----
+<configurationParameterSettings>
+  <nameValuePair>
+    <name>CivilianTitles</name>
+    <value>
+      <array>
+        <string>Mr.</string>
+        <string>Ms.</string>
+        <string>Mrs.</string>
+        <string>Dr.</string>
+      </array>  
+    </value>
+  </nameValuePair>
+  ...
+</configurationParameterSettings>
+----
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings]]
+==== <sofaNameMappings> Element
+
+This optional element maps Sofa names used in the CPE to the Sofa names defined in the component, or to the default Sofa name (if the component does not declare any Sofa names).
+The form of this element is: 
+[source]
+----
+<sofaNameMappings>
+  <sofaNameMapping cpeSofaName="a_CPE_name"
+                   componentSofaName="a_component_Name"/>
+  ...
+</sofaNameMappings>
+----
+
+There can be any number of `<sofaNameMapping>` elements contained in the `<sofaNameMappings>` element.
+The `componentSofaName` attribute is optional; leave it out to specify a mapping for the `\_InitialView` - that is, for Single-View components.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.run_in_separate_process]]
+==== <runInSeparateProcess> Element
+
+The `<runInSeparateProcess>` element is mandatory for `local` CAS Processors, but should not appear for `remote` or `integrated` CAS Processors.
+It enables the CPM to create external processes using the provided runtime environment.
+Applications launched this way communicate with the CPM using the Vinci protocol and connectivity is enabled by a local instance of the VNS that the CPM manages.
+Since communication is based on Vinci, the application need not be implemented in Java.
+Any language for which Vinci provides support may be used to create an application, and the CPM will seamlessly communicate with it.
+The overall structure of this element is:
+
+[source]
+----
+<runInSeparateProcess>
+    <exec dir="[String]" executable="[String]">
+        <env key="[String]" value ="[String]"/>
+        ...
+        <arg>[String]</arg>
+        ...
+    </exec>
+</runInSeparateProcess>
+----
+
+The `<exec>` element provides information about how to execute the referenced CAS Processor.
+Two attributes are defined for the `<exec>` element.
+The `dir` attribute is currently not used -- it is reserved for future functionality.
+The `executable` attribute specifies the actual Vinci service executable that will be run by the CPM, e.g., `java`, a batch script, an application (`.exe`), etc.
+The executable must be specified with a fully qualified path, or be found in the `PATH` of the CPM.
+
+The `<exec>` element has two elements within it that define parameters used to construct the command line for executing the CAS Processor.
+These elements must be listed in the order in which they should be defined for the CAS Processor.
+
+The optional `<env>` element is used to set an environment variable.
+The variable `key` will be set to ``value``.
+For example, 
+
+[source]
+----
+<env key="CLASSPATH" value="C:\Javalib"/>
+----
+
+will set the environment variable `CLASSPATH` to the value `C:\Javalib`.
+The `<env>` element may be repeated to set multiple environment variables.
+All of the key/value pairs will be added to the environment by the CPM prior to launching the executable.
+
+[NOTE]
+====
+The CPM actually adds ALL system environment variables when it launches the program.
+It queries the Operating System for its current system variables and one by one adds them to the program's process configuration.
+====
+
+The `<arg>` element is used to specify arbitrary string arguments that will appear on the command line when the CPM runs the command specified in the `executable` attribute.
+
+For example, the following would be used to invoke the UIMA Java implementation of the Vinci service wrapper on a Java CAS Processor: 
+[source]
+----
+<runInSeparateProcess>
+    <exec executable="java">
+        <arg>-DVNS_HOST=localhost</arg> 
+        <arg>-DVNS_PORT=9099</arg>
+        <arg>org.apache.uima.reference_impl.analysis_engine.service.
+vinci.VinciAnalysisEngineService_impl</arg> 
+        <arg>C:\uima\desc\deployCasProcessor.xml</arg>
+    </exec>
+</runInSeparateProcess>
+----
+
+This will cause the CPM to run the following command line when starting the CAS Processor: 
+
+[source]
+----
+java -DVNS_HOST=localhost -DVNS_PORT=9099 
+  org.apache.uima.reference_impl.analysis_engine.service.vinci.\\
+              VinciAnalysisEngineService_impl 
+  C:\uima\desc\deployCasProcessor.xml
+----
+
+The first argument specifies that the Vinci Naming Service is running on the ``localhost``.
+The second argument specifies that the Vinci Naming Service port number is ``9099``.
+The third argument (split over 2 lines in this documentation) identifies the UIMA implementation of the Vinci service wrapper.
+This class contains the `main` method that will execute.
+That main method in turn takes a single argument -- the filename for the CAS Processor service deployment descriptor.
+Thus the last argument identifies the Vinci service deployment descriptor file for the CAS Processor.
+Since this is the same descriptor file specified earlier in the `<descriptor>` element, the string `${descriptor}` can be used to refer to the descriptor, e.g.: 
+
+[source]
+----
+<arg>${descriptor}</arg>
+----
+
+The CPM will expand this out to the service deployment descriptor file referenced in the `<descriptor>` element.
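+
+Putting this together, the earlier example could be written with the `${descriptor}` placeholder so that the descriptor path appears only once in the CPE descriptor. This is a sketch that reuses the host, port, and class name from the example above; it is not a verified deployment.
+
+[source]
+----
+<runInSeparateProcess>
+    <exec executable="java">
+        <!-- Location of the Vinci Naming Service, as in the example above -->
+        <arg>-DVNS_HOST=localhost</arg>
+        <arg>-DVNS_PORT=9099</arg>
+        <!-- Service wrapper class containing the main method -->
+        <arg>org.apache.uima.reference_impl.analysis_engine.service.vinci.VinciAnalysisEngineService_impl</arg>
+        <!-- Expanded by the CPM to the file referenced in <descriptor> -->
+        <arg>${descriptor}</arg>
+    </exec>
+</runInSeparateProcess>
+----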
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.deployment_parameters]]
+==== <deploymentParameters> Element
+
+The `<deploymentParameters>` element defines a number of deployment parameters that control how the CPM will interact with the CAS Processor.
+This element has the following overall form: 
+[source]
+----
+<deploymentParameters>
+    <parameter name="[String]" value="..." type="string|integer" /> 
+    ...
+</deploymentParameters>
+----
+
+The `name` attribute identifies the parameter, the `value` attribute specifies the value that will be assigned to the parameter, and the `type` attribute indicates the type of the parameter, either `string` or ``integer``.
+The available parameters include: 
+
+service-access::
+string parameter whose value must be "`exclusive`", if present.
+This parameter is only effective for remote deployments.
+It modifies the Vinci service connections to be preallocated and dedicated, one service instance per pipeline.
+It is only relevant for non-integrated deployment modes.
+If fewer service instances are available (and alive -- responding to a `ping` request) than there are pipelines, the number of pipelines (the number of concurrent threads) is reduced to match the number of available instances.
+If not specified, the VNS is queried each time a service is needed, and a "`random`" instance is assigned from the pool of available instances.
+If a service dies during processing, the CPM will use its normal error handling procedures to attempt to reconnect.
+The number of attempts is specified in the CPE descriptor for each CAS Processor using the `<maxConsecutiveRestarts value="10" action="kill-pipeline" waitTimeBetweenRetries="50"/>` XML element.
+The "`value`" attribute is the number of reconnection tries; the "`action`" attribute says what to do if the retries exceed the limit.
+The "`kill-pipeline`" action stops the pipeline that was associated with the failing service (other pipelines will continue to work). The CAS in process within a killed pipeline will be dropped.
+These events are communicated to the application using the normal event listener mechanism.
+The `waitTimeBetweenRetries` attribute says how many milliseconds to wait between attempts to reconnect.
+
+vnsHost::
+(Deprecated) string parameter specifying the VNS host, e.g., `localhost` for local CAS Processors, host name or IP address of VNS host for remote CAS Processors.
+This parameter is deprecated; use the parameter specification instead inside the Vinci __Service Client Descriptor__, if needed.
+It is ignored for integrated and local deployments.
+If present, for remote deployments, it specifies the VNS Host to use, unless that is specified in the Vinci __Service Client Descriptor__.
+
+vnsPort::
+(Deprecated) integer parameter specifying the VNS port number.
+This parameter is deprecated; use the parameter specification instead inside the Vinci _Service Client Descriptor_, if needed.
+It is ignored for integrated and local deployments.
+If present, for remote deployments, it specifies the VNS Port number to use, unless that is specified in the Vinci _Service Client Descriptor_.
+
+For example, the following parameters might be used with a CAS Processor deployed in local mode: 
+[source]
+----
+<deploymentParameters>
+  <parameter name="service-access" value="exclusive" type="string"/> 
+</deploymentParameters>
+----
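+
+For a remote deployment, `service-access` is typically combined with a reconnection policy. The following sketch pairs the parameter with the `<maxConsecutiveRestarts>` settings quoted in the `service-access` description. The processor name `MyRemoteService` is a hypothetical placeholder, the descriptor is elided, and the `<errorRateThreshold>` and `<timeout>` values are simply borrowed from the example CPE descriptor at the end of this chapter.
+
+[source]
+----
+<!-- MyRemoteService is a hypothetical name used for illustration -->
+<casProcessor deployment="remote" name="MyRemoteService">
+  <descriptor>
+    ...
+  </descriptor>
+  <deploymentParameters>
+    <!-- One dedicated, preallocated service instance per pipeline -->
+    <parameter name="service-access" value="exclusive" type="string"/>
+  </deploymentParameters>
+  <filter/>
+  <errorHandling>
+    <errorRateThreshold action="terminate" value="100/1000"/>
+    <!-- Up to 10 reconnection tries, 50 ms apart; then stop only this pipeline -->
+    <maxConsecutiveRestarts value="10" action="kill-pipeline"
+                            waitTimeBetweenRetries="50"/>
+    <timeout max="100000"/>
+  </errorHandling>
+</casProcessor>
+----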
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.filter]]
+==== <filter> Element
+
+The `<filter>` element is required but should currently be left empty.
+This element is reserved for future use.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling]]
+==== <errorHandling> Element
+
+The mandatory `<errorHandling>` element defines error and restart policies for the CAS Processor.
+Each CAS Processor may define different actions in the event of errors and restarts.
+The CPM monitors and logs errant behaviors and attempts to recover the component based on the policies specified in this element.
+
+There are two kinds of faults: 
+
+. One kind occurs only with non-integrated CAS Processors: either a timeout while attempting to launch or connect to the non-integrated component, or some other kind of connection-related exception (for instance, the network connection might time out or get reset).
+. The other kind happens when the CAS Processor component (an Annotator, for example) throws any kind of exception. This kind may occur with any kind of deployment, integrated or not. 
+
+The `<errorHandling>` element has specifications for each of these kinds of faults.
+The format of this element is: 
+[source]
+----
+<errorHandling>
+  <maxConsecutiveRestarts action="continue|disable|terminate"
+                           value="[Number]"/>
+  <errorRateThreshold action="continue|disable|terminate" value="[Rate]"/>
+  <timeout max="[Number]"/>
+</errorHandling>
+----
+
+The mandatory `<maxConsecutiveRestarts>` element applies only to faults of the first kind, and therefore, only applies to non-integrated deployments.
+If such a fault occurs, a retry is attempted, up to `value="[Number]"` times.
+This retry resets the connection (if one was made) and attempts to reconnect and perhaps re-launch (see below for details). The original CAS (not a partially updated one) is sent to the CAS Processor as part of the retry, once the deployed component has been successfully restarted or reconnected to.
+
+The `action` attribute specifies the action to take when the threshold specified by the `value="[Number]"` is exceeded.
+The possible actions are: 
+
+continue::
+skip any further processing for this CAS by this CAS Processor, and pass the CAS to the next CAS Processor in the Pipeline. 
++
+The "`restart`" action is still carried out, because it is needed for the next CAS.
++
+If `dropCasOnException="true"` is set, the CPM will NOT pass the CAS to the next CAS Processor in the chain.
+Instead, the CPM will abort processing of this CAS, release the CAS back to the CAS Pool and will process the next CAS in the queue.
++
+The counter counting the restarts toward the threshold is only reset after a CAS is successfully processed.
+
+disable::
+the current CAS is handled just as in the `continue` case, but in addition, the CAS Processor is marked so that its _process()_ method will not be called again (i.e., it will be "`skipped`" for future CASes).
+
+terminate::
+the CPM will terminate all processing and exit.
+
+The definition of an error for the `<maxConsecutiveRestarts>` element differs slightly for each of the three CAS Processor deployment modes: 
+
+local::
+Local CAS Processors experience two general error types: 
++
+
+* launch errors: errors associated with launching a process
+* processing errors: errors associated with sending Vinci commands to the process
+
++
+A launch error is defined by a failure of the process to successfully register with the local VNS within a default time window.
+The current timeout is 15 minutes.
+Multiple local CAS Processors are launched sequentially, with a subsequent processor launched immediately after its previous processor successfully registers with the VNS.
++
+A processing error is detected if a connection to the CAS Processor is lost or if the processing time exceeds a specified timeout value.
++
+For local CAS Processors, the `<maxConsecutiveRestarts>` element specifies the number of consecutive attempts made to launch the CAS Processor at CPM startup or after the CPM has lost a connection to the CAS Processor.
+
+remote::
+For remote CAS Processors, the `<maxConsecutiveRestarts>` element applies to errors from sending Vinci commands.
+An error is detected if a connection to the CAS Processor is lost, or if the processing time exceeds the timeout value specified in the `<timeout>` element (see below).
+
+integrated::
+Although mandatory, the `<maxConsecutiveRestarts>` element is NOT used for integrated CAS Processors, because integrated CAS Processors are not re-instantiated or restarted on exceptions.
+The CPM ignores this setting for integrated CAS Processors, but it must still be present.
+A future version of the CPM will make this element mandatory for remote and local CAS Processors only.
+
+The mandatory `<errorRateThreshold>` element is used for all faults – both those above, and exceptions thrown by the CAS Processor itself.
+It specifies the number of retries for exceptions thrown by the CAS Processor itself, a maximum error rate, and the corresponding action to take when this rate is exceeded.
+The `value` attribute specifies the error rate in terms of errors per sample size in the form "``N/M``", where `N` is the number of errors and `M` is the sample size, defined in terms of the number of documents.
+
+The first number is used also to indicate the maximum number of retries.
+If this number is less than the `<maxConsecutiveRestarts value="[Number]">` value, it will override, reducing the number of "`restarts`" attempted.
+A retry is done only if `dropCasOnException` is false.
+If it is set to true, no retry occurs, but the error is counted.
+
+When the number of counted errors within the current sample of `M` documents exceeds `N`, the action specified by the `action` attribute is taken.
+The possible actions and their meaning are the same as described above for the `<maxConsecutiveRestarts>` element: 
+
+* `continue`
+* `disable`
+* `terminate`
+
+The `dropCasOnException="true"` attribute of the `<casProcessors>` element modifies the action taken for continue and disable, in the same manner as above.
+For example: 
+[source]
+----
+<errorRateThreshold value="3/1000" action="disable"/>
+----
+specifies that each error thrown by the CAS Processor itself will be retried up to 3 times (if `dropCasOnException` is false) and the CAS Processor will be disabled if the error rate exceeds 3 errors in 1000 documents.
+
+If a document causes an error and the error rate threshold for the CAS Processor is not exceeded, the CPM increments the CAS Processor's error count and retries processing that document (if `dropCasOnException` is false). The retry means that the CPM calls the CAS Processor's process() method again, passing in as an argument the same CAS that previously caused an exception.
+
+[NOTE]
+====
+The CPM does not attempt to rollback any partial changes that may have been applied to the CAS in the previous process() call. 
+====
+
+Errors are accumulated across documents.
+For example, assume the error rate threshold is ``3/1000``.
+The same document may fail three times before finally succeeding on the fourth try, but the error count is now 3.
+If one more error occurs within the current sample of 1000 documents, the error rate threshold will be exceeded and the specified action will be taken.
+If no more errors occur within the current sample, the error counter is reset to 0 for the next sample of 1000 documents.
+
+The `<timeout>` element is a mandatory element.
+Although mandatory for all CAS Processors, this element is only relevant for local and remote CAS Processors.
+For integrated CAS Processors, this element is ignored.
+In the current CPM implementation the integrated CAS Processor process() method is not subject to timeouts.
+
+The `max` attribute specifies the maximum amount of time in milliseconds the CPM will wait for a process() method to complete. When this time is exceeded, the CPM will generate an exception and will treat this as an error subject to the threshold defined in the `<errorRateThreshold>` element above, including doing retries.
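+
+The following sketch shows how the three elements might be combined for a non-integrated CAS Processor. The specific values are illustrative assumptions, not recommendations.
+
+[source]
+----
+<errorHandling>
+  <!-- Connection or launch faults: up to 3 consecutive restarts,
+       then terminate the CPM (values are assumptions) -->
+  <maxConsecutiveRestarts action="terminate" value="3"/>
+  <!-- Exceptions from the CAS Processor: a failing CAS is retried up to
+       3 times if dropCasOnException is false; the processor is disabled
+       when more than 3 errors occur within a sample of 1000 documents -->
+  <errorRateThreshold action="disable" value="3/1000"/>
+  <!-- A process() call that takes longer than 60000 ms counts as an error -->
+  <timeout max="60000"/>
+</errorHandling>
+----
+
+Because the first number of the error rate also caps the number of retries, a `<maxConsecutiveRestarts>` value larger than 3 would have no additional effect in this sketch.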
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling.timeout_retry_action]]
+===== Retry action taken on a timeout
+
+The action taken depends on whether the CAS Processor is local (managed) or remote (unmanaged). Local CAS Processors (which are services) are killed and restarted, and a new connection to them is established.
+For remote CAS Processors, the connection to them is dropped, and a new connection is established (which may actually connect to a different instance of the remote service, if it has multiple instances).
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.checkpoint]]
+==== <checkpoint> Element
+
+The `<checkpoint>` element is an optional element used to improve the performance of CAS Consumers.
+It has a single attribute, ``batch``, which specifies the number of CASes in a batch, e.g.: 
+[source]
+----
+<checkpoint batch="1000"/>
+----
+
+sets the batch size to 1000 CASes.
+The batch size is the interval used to mark a point in processing requiring special handling.
+The CAS Processor's `batchProcessComplete()` method will be called by the CPM when this mark is reached so that the processor can take appropriate action.
+This mark could be used as a mechanism to buffer up results in CAS Consumers and perform time-consuming operations, such as check-pointing, that should not be done on a per-document basis.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.operational_parameters]]
+== CPE Operational Parameters
+
+The parameters for configuring the overall CPE and CPM are specified in the `<cpeConfig>` section.
+The overall format of this section is: 
+[source]
+----
+<cpeConfig>
+  <startAt>[NumberOrID]</startAt>
+
+  <numToProcess>[Number]</numToProcess>
+
+  <outputQueue dequeueTimeout="[Number]" queueClass="[ClassName]" />
+
+  <checkpoint file="[File]" time="[Number]" batch="[Number]"/>
+
+  <timerImpl>[ClassName]</timerImpl>
+
+  <deployAs>vinciService|interactive|immediate|single-threaded
+  </deployAs>
+
+</cpeConfig>
+----
+
+This section of the CPE descriptor allows for defining the starting entity, the number of entities to process, a checkpoint file and frequency, a pluggable timer, an optional output queue implementation, and finally a mode of operation.
+The mode of operation determines how the CPM interacts with users and other systems.
+
+The `<startAt>` element is an optional argument.
+It defines the starting entity in the collection at which the CPM should start processing.
+
+The implementation in the CPM passes this argument to the Collection Reader as the value of the parameter "``startNumber``".
+The CPM does not do anything else with this parameter; in particular, the CPM has no ability to skip to a specific document - that function, if available, is only provided by a particular Collection Reader implementation.
+
+If the `<startAt>` element is used, the Collection Reader descriptor must define a single-valued configuration parameter with the name ``startNumber``.
+It can declare this value to be of any type; the value passed in this XML element must be convertible to that type.
+
+A typical use is to declare this to be an integer type, and to pass the sequential document number where processing should start.
+An alternative implementation might take a specific document ID; the collection reader could search through its collection until it reaches this ID and then start there.
+
+This parameter will only make sense if the particular collection reader is implemented to use the `startNumber` configuration parameter.
+
+The `<numToProcess>` element is an optional element.
+It specifies the total number of entities to process.
+Use -1 to indicate ALL.
+If not defined, the number of entities to process will be taken from the Collection Reader configuration.
+If present, this value overrides the Collection Reader configuration.
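+
+As an illustration, the fragment below asks the Collection Reader to start at document 100 and to process all remaining entities. It assumes a Collection Reader that declares and honors an integer `startNumber` parameter; the value `100` is arbitrary.
+
+[source]
+----
+<cpeConfig>
+  <!-- Passed to the Collection Reader as the value of startNumber -->
+  <startAt>100</startAt>
+  <!-- -1 means: process ALL entities -->
+  <numToProcess>-1</numToProcess>
+  ...
+</cpeConfig>
+----
+
+A matching single-valued declaration in the Collection Reader descriptor could look like this, assuming the reader chooses the Integer type:
+
+[source]
+----
+<configurationParameter>
+  <name>startNumber</name>
+  <type>Integer</type>
+  <multiValued>false</multiValued>
+  <mandatory>false</mandatory>
+</configurationParameter>
+----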
+
+The `<outputQueue>` element is an optional element.
+It enables plugging in a custom implementation for the Output Queue.
+When omitted, the CPM will use a default output queue that is based on a First-In First-Out (FIFO) model.
+
+The UIMA SDK provides a second implementation for the Output Queue that can be plugged in to the CPM, named "``org.apache.uima.collection.impl.cpm.engine.SequencedQueue``".
+
+This implementation supports handling very large documents that are split into "`chunks`"; it provides a delivery mechanism that ensures the sequential order of the chunks using information carried in the CAS metadata.
+This metadata, which is required for this implementation to work correctly, must be added as an instance of a Feature Structure of type `org.apache.es.tt.DocumentMetaData` and referred to by an additional feature named `esDocumentMetaData` in the special instance of `uima.tcas.DocumentAnnotation` that is associated with the CAS.
+This is usually done by the Collection Reader; the instance contains the following features: 
+
+sequenceNumber::
+[Number] the sequential number of a chunk, starting at 1.
+If not a chunk (i.e., a complete document), the value should be 0.
+
+documentId::
+[Number] current document id.
+Chunks belonging to the same document have identical document id.
+
+isCompleted::
+[Number] 1 if the chunk is the last in a sequence, 0 otherwise.
+
+url::
+[String] document url.
+
+throttleID::
+[String] special attribute currently used by OmniFind.
+
+This implementation of a sequenced queue supports proper sequencing of CASes in CPM deployments that use document chunking.
+Chunking is a technique of splitting large documents into pieces to reduce overall memory consumption.
+Chunking does not depend on the number of CASes in the CAS Pool.
+It works equally well with one or more CASes in the CAS Pool.
+Each chunk is packaged in a separate CAS and placed in the Work Queue.
+If the CAS Pool is depleted, the CollectionReader thread is suspended until a CAS is released back to the pool by the processing threads.
+A document may be split into 1, 2, 3 or more chunks that are analyzed independently.
+In order to reconstruct the document correctly, the CAS Consumer can depend on receiving the chunks in the same sequential order that the chunks were "`produced`", when this sequenced queue implementation is used.
+To plug in this sequenced queue to the CPM use the following specification: 
+[source]
+----
+<outputQueue dequeueTimeout="100000" queueClass=
+"org.apache.uima.collection.impl.cpm.engine.SequencedQueue"/>
+----
+
+where the mandatory `queueClass` attribute defines the name of the class and the second mandatory attribute, `dequeueTimeout`, specifies the maximum number of milliseconds to wait for the expected chunk.
+
+[NOTE]
+====
+The value for this timeout must be carefully determined to avoid excessive occurrences of timeouts.
+Typically, the size of a chunk and the type of analysis being done are the most important factors when deciding on the value for the timeout.
+The larger the chunk and the more complicated the analysis, the more time it takes for the chunk to go from source to sink.
+You may specify 0, in which case the timeout is disabled, i.e., it is equivalent to an infinitely long timeout.
+====
+
+If the chunk doesn't arrive in the configured time window, the entire document is presumed to be invalid and the CAS is dropped from further processing.
+This action occurs regardless of any other error action specification.
+The SequencedQueue invalidates the document, adding the offending document's metadata to a local cache of invalid documents. 
+
+If the timeout occurs, the CPM notifies all xref:tug.adoc#ugr.tug.cpe.using_listeners[registered listeners] by calling `entityProcessComplete()`. As part of this call, the SequencedQueue passes null instead of a CAS as the first argument, together with a special exception, `CPMChunkTimeoutException`.
+Null is passed because the timeout means that the expected chunk was not received within the configured timeout window, so no CAS is available when the timeout event occurs.
+
+The `CPMChunkTimeoutException` object includes an API that allows the listener to retrieve the offending document id as well as the other metadata attributes as defined above.
+These attributes are part of each chunk's metadata and are added by the Collection Reader.
+
+Each chunk that `SequencedQueue` works on is subjected to a test to determine if the chunk belongs to an invalid document.
+This test checks the chunk's metadata against the data in the local cache.
+If there is a match, the chunk is dropped.
+This check is performed only for chunks; complete documents are not subject to it.
+
+If there is an exception during the processing of a chunk, the CPM sends a notification to all registered listeners.
+The notification includes the CAS and an exception.
+When the listener notification is completed, the CPM also sends separate notifications, containing the CAS, to the Artifact Producer and the SequencedQueue.
+The intent is to stop adding new chunks to the Work Queue that belong to an `invalid` document and also to deal with chunks that are en-route, being processed by the processing threads.
+
+In response to the notification, the Artifact Producer will drop and release back to the CAS Pool all CASes that belong to an "`invalid`" document.
+Currently, there is no support in the CollectionReader's API to tell it to stop generating chunks.
+The CollectionReader keeps producing the chunks but the Artifact Producer immediately drops/releases them to the CAS Pool.
+Before the CAS is released back to the CAS Pool, the Artifact Producer sends notification to all registered listeners.
+This notification includes the CAS and an exception -- `SkipCasException`.
+
+In response to the notification of an exception involving a chunk, the SequencedQueue retrieves from the CAS the metadata and adds it to its local cache of `invalid` documents.
+All chunks de-queued from the OutputQueue and belonging to `invalid` documents will be dropped and released back to the CAS Pool.
+Before dropping the CAS, the CPM sends notification to all registered listeners.
+The notification includes the CAS and SkipCasException.
+
+The `<checkpoint>` element is an optional element.
+It specifies a CPE checkpoint file, checkpoint frequency, and strategy for checkpoints (time or count based). At checkpoint time, the CPM saves status information and statistics to the checkpoint file.
+The checkpoint file is specified in the `file` attribute, which has the same form as the `href` attribute of the `<include>` element described in <<ugr.ref.xml.cpe_descriptor.imports>>.
+The `time` attribute indicates that a checkpoint should be taken every `[Number]` seconds, and the `batch` attribute indicates that a checkpoint should be taken every `[Number]` batches.
+
+The `<timerImpl>` element is optional.
+It is used to identify a custom timer plug-in class to generate time stamps during the CPM execution.
+The value of the element is a Java class name.
+
+The `<deployAs>` element indicates the type of CPM deployment.
+Valid contents for this element include: 
+
+vinciService::
+Vinci service exposing APIs for stop, pause, resume, and getStats
+
+interactive::
+provide command line menus (start, stop, pause, resume)
+
+immediate::
+run the CPM without menus or a service API
+
+single-threaded::
+run the CPM in a single threaded mode.
+In this mode, the Collection Reader, the Processing Pipeline, and the CAS Consumer Pipeline are all running in one thread without the work queue and the output queue.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.resource_manager_configuration]]
+== Resource Manager Configuration
+
+xref:tug.adoc#ugr.tug.aae.accessing_external_resource_files[External resource bindings] for the CPE may optionally be specified in an element: 
+[source]
+----
+<resourceManagerConfiguration href="..."/>
+----
+
+In the `resourceManagerConfiguration` element, the value of the href attribute refers to another file that contains definitions and bindings for the external resources used by the CPE.
+The format of this file is the same as for xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings[Aggregate Analysis Engines].
+For example, in a CPE containing an aggregate analysis engine with two annotators, and a CAS Consumer, the following resource manager configuration file would bind external resource dependencies in all three components to the same physical resource: 
+
+[source]
+----
+<resourceManagerConfiguration>
+
+  <!-- Declare Resource -->
+
+  <externalResources>
+    <externalResource>
+      <name>ExampleResource</name>
+      <fileResourceSpecifier>
+        <fileUrl>file:MyResourceFile.dat</fileUrl>
+      </fileResourceSpecifier>
+    </externalResource>
+  </externalResources>
+
+  <!-- Bind component resource dependencies to ExampleResource -->
+
+  <externalResourceBindings>
+    <externalResourceBinding>
+      <key>MyAE/annotator1/myResourceKey</key>
+      <resourceName>ExampleResource</resourceName>
+    </externalResourceBinding>
+
+    <externalResourceBinding>
+      <key>MyAE/annotator2/someResourceKey</key>
+      <resourceName>ExampleResource</resourceName>
+    </externalResourceBinding>
+
+    <externalResourceBinding>
+      <key>MyCasConsumer/otherResourceKey</key>
+      <resourceName>ExampleResource</resourceName>
+    </externalResourceBinding>
+
+  </externalResourceBindings>
+
+</resourceManagerConfiguration>
+----
+
+In this example, `MyAE` and `MyCasConsumer` are the names of the Analysis Engine and CAS Consumer, as specified by the name attributes of the CPE's `<casProcessor>` elements. `annotator1` and `annotator2` are the annotator keys specified within the Aggregate AE Descriptor, and ``myResourceKey``, ``someResourceKey``, and `otherResourceKey` are the keys of the resource dependencies declared in the individual annotator and CAS Consumer descriptors.
+
+[[ugr.ref.xml.cpe_descriptor.descriptor.example]]
+== Example CPE Descriptor
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<cpeDescription>
+  <collectionReader>
+    <collectionIterator>
+      <descriptor>
+        <import location=
+           "../collection_reader/FileSystemCollectionReader.xml"/>
+      </descriptor>
+    </collectionIterator>
+  </collectionReader>
+  <casProcessors dropCasOnException="true" casPoolSize="1" 
+      processingUnitThreadCount="1">
+    <casProcessor deployment="integrated" 
+      name="Aggregate TAE - Name Recognizer and Person Title Annotator">
+      <descriptor>
+        <import location=
+           "../analysis_engine/NamesAndPersonTitles_TAE.xml"/>
+      </descriptor>
+      <deploymentParameters/>
+      <filter/>
+      <errorHandling>
+        <errorRateThreshold action="terminate" value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate" value="30"/>
+        <timeout max="100000"/>
+      </errorHandling>
+      <checkpoint batch="1"/>
+    </casProcessor>
+    <casProcessor deployment="integrated" name="Annotation Printer">
+      <descriptor>
+        <import location="../cas_consumer/AnnotationPrinter.xml"/>
+      </descriptor>
+      <deploymentParameters/>
+      <filter/>
+      <errorHandling>
+        <errorRateThreshold action="terminate" value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate" value="30"/>
+        <timeout max="100000"/>
+      </errorHandling>
+      <checkpoint batch="1"/>
+    </casProcessor>
+  </casProcessors>
+  <cpeConfig>
+    <numToProcess>1</numToProcess>
+    <deployAs>immediate</deployAs>
+    <checkpoint file="" time="3000"/>
+    <timerImpl/>
+  </cpeConfig>
+</cpeDescription>
+----
diff --git a/uimaj-documentation/src/docs/asciidoc/tools.adoc b/uimaj-documentation/src/docs/asciidoc/tools.adoc
new file mode 100644
index 0000000..4f2639e
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools.adoc
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+= Apache UIMA™ - Tools
+:Author: Apache UIMA™ Development Community
+:toc-title: UIMA Tools
+
+include::tools/common_book_info.adoc[leveloffset=+1]
+
+include::tools/tools.cde.adoc[leveloffset=+1]
+
+include::tools/tools.cpe.adoc[leveloffset=+1]
+
+include::tools/tools.doc_analyzer.adoc[leveloffset=+1]
+
+include::tools/tools.annotation_viewer.adoc[leveloffset=+1]
+
+include::tools/tools.cvd.adoc[leveloffset=+1]
+
+include::tools/tools.eclipse_launcher.adoc[leveloffset=+1]
+
+include::tools/tools.caseditor.adoc[leveloffset=+1]
+
+include::tools/tools.jcasgen.adoc[leveloffset=+1]
+
+include::tools/tools.pear.packager.adoc[leveloffset=+1]
+
+include::tools/tools.pear.packager.maven.adoc[leveloffset=+1]
+
+include::tools/tools.pear.installer.adoc[leveloffset=+1]
+
+include::tools/tools.pear.merger.adoc[leveloffset=+1]
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/common_book_info.adoc b/uimaj-documentation/src/docs/asciidoc/tools/common_book_info.adoc
new file mode 100644
index 0000000..537f3e6
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/common_book_info.adoc
@@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Copyright © 2006, 2021 The Apache Software Foundation
+
+Copyright © 2004, 2006 International Business Machines Corporation
+
+[discrete]
+=== License and Disclaimer
+
+The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); 
+you may not use this documentation except in compliance with the License.  You may obtain a copy of
+the License at
+
+[.text-center]
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, this documentation and its contents are
+distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+either express or implied.  See the License for the specific language governing permissions and
+limitations under the License.
+
+[discrete]
+=== Trademarks
+
+All terms mentioned in the text that are known to be trademarks or service marks have been 
+appropriately capitalized.  Use of such terms in this book should not be regarded as affecting the
+validity of the trademark or service mark.
\ No newline at end of file
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.annotation_viewer/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.annotation_viewer/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.annotation_viewer/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.annotation_viewer/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/CasEditor.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/CasEditor.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/CasEditor.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/CasEditor.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditView.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditView.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditView.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditView.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Editor.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Editor.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Editor.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Editor.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditorAllTypes.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditorAllTypes.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditorAllTypes.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditorAllTypes.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditorOneType.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditorOneType.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/EditorOneType.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/EditorOneType.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/FSView.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/FSView.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/FSView.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/FSView.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ModeMenu.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ModeMenu.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ModeMenu.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ModeMenu.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Outline.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Outline.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Outline.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Outline.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ProvideTypeSystem.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ProvideTypeSystem.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ProvideTypeSystem.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ProvideTypeSystem.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ShiftEnter.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ShiftEnter.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ShiftEnter.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ShiftEnter.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ShowAnnotationsMenu.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ShowAnnotationsMenu.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/ShowAnnotationsMenu.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/ShowAnnotationsMenu.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Background.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Background.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Background.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Background.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Box.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Box.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Box.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Box.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Bracket.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Bracket.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Bracket.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Bracket.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Squiggles.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Squiggles.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Squiggles.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Squiggles.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-TextColor.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-TextColor.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-TextColor.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-TextColor.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Token.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Token.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Token.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Token.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Underline.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Underline.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/Style-Underline.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/Style-Underline.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleProperties.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleProperties.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleProperties.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleProperties.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleView.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleView.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleView.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleView.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleView2.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleView2.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.caseditor/StyleView2.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.caseditor/StyleView2.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/delegate-chooser.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/delegate-chooser.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/delegate-chooser.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/delegate-chooser.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image004.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image004.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image006.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image006.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image008.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image008.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image008.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image008.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image010.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image010.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image010.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image010.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image012.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image012.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image012.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image012.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image014.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image014.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image014.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image014.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image014v2.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image014v2.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image014v2.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image014v2.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image016.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image016.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image016.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image016.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image018.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image018.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image018.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image018.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image020.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image020.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image020.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image020.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image022.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image022.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image022.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image022.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image024.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image024.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image024.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image024.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image025.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image025.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image025.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image025.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image026.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image026.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image026.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image026.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image028.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image028.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image028.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image028.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image030.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image030.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image030.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image030.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image032.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image032.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image032.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image032.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image034.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image034.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image034.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image034.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image036.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image036.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image036.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image036.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image038.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image038.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image038.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image038.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image040.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image040.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image040.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image040.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image042.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image042.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image042.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image042.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image044.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image044.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image044.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image044.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image046.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image046.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image046.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image046.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image048.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image048.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image048.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image048.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image050.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image050.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image050.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image050.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image052.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image052.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image052.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image052.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image054.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image054.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image054.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image054.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image056.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image056.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image056.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image056.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image058.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image058.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image058.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image058.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image060.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image060.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image060.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image060.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image062.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image062.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image062.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image062.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image064.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image064.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image064.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image064.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image066.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image066.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image066.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image066.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image068.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image068.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image068.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image068.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image070.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image070.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image070.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image070.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/image072.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image072.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/image072.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/image072.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/import-by-location.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-by-location.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/import-by-location.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-by-location.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/import-by-name.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-by-name.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/import-by-name.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-by-name.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/import-chooser.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-chooser.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/import-chooser.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/import-chooser.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/limitJCasGen.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/limitJCasGen.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/limitJCasGen.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/limitJCasGen.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cde/limitJCasGenType.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/limitJCasGenType.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cde/limitJCasGenType.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cde/limitJCasGenType.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cpe/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cpe/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cpe/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cpe/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cpe/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cpe/image004.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cpe/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cpe/image004.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/AnnotationViewer.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/AnnotationViewer.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/AnnotationViewer.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/AnnotationViewer.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/ChangeCodePage.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/ChangeCodePage.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/ChangeCodePage.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/ChangeCodePage.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/CustomizeColors.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/CustomizeColors.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/CustomizeColors.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/CustomizeColors.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/EditMenu.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/EditMenu.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/EditMenu.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/EditMenu.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/ErrorExample.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/ErrorExample.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/ErrorExample.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/ErrorExample.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/FileMenu.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/FileMenu.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/FileMenu.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/FileMenu.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/LogView.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/LogView.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/LogView.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/LogView.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main1.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main1.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main1.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main1.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main2.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main2.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main2.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main2.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main3.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main3.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main3.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main3.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main4.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main4.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/Main4.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/Main4.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/RunMenu.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/RunMenu.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/RunMenu.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/RunMenu.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/StatusBar.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/StatusBar.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/StatusBar.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/StatusBar.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/TypeSystemViewer.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/TypeSystemViewer.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/TypeSystemViewer.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/TypeSystemViewer.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.cvd/eclipse-cvd-launch.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/eclipse-cvd-launch.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.cvd/eclipse-cvd-launch.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.cvd/eclipse-cvd-launch.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/DocAnalyzerScr1.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/DocAnalyzerScr1.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/DocAnalyzerScr1.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/DocAnalyzerScr1.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image004.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image004.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image006.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image006.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image006v2.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image006v2.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image006v2.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image006v2.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-1v2.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-1v2.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-1v2.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-1v2.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-2v2.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-2v2.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-2v2.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-2v2.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-3v2.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-3v2.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007-3v2.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007-3v2.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image007.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image007.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image008.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image008.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image008.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image008.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image010.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image010.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.doc_analyzer/image010.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.doc_analyzer/image010.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.eclipse_launcher/image01.png b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.eclipse_launcher/image01.png
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.eclipse_launcher/image01.png
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.eclipse_launcher/image01.png
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.jcasgen/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.jcasgen/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.jcasgen/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.jcasgen/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.jcasgen/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.jcasgen/image004.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.jcasgen/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.jcasgen/image004.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.installer/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.installer/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.installer/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.installer/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image001.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image001.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image001.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image001.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image002.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image002.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image004.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image004.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image005.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image005.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image005.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image005.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image006.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image006.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image008.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image008.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image008.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image008.jpg
Binary files differ
diff --git a/uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image010.jpg b/uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image010.jpg
similarity index 100%
rename from uima-docbook-tools/src/docbook/images/tools/tools.pear.packager/image010.jpg
rename to uimaj-documentation/src/docs/asciidoc/tools/images/tools/tools.pear.packager/image010.jpg
Binary files differ
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.annotation_viewer.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.annotation_viewer.adoc
new file mode 100644
index 0000000..de62dd6
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.annotation_viewer.adoc
@@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.annotation_viewer]]
+= Annotation Viewer
+
+The _Annotation Viewer_ is a tool for viewing analysis results that have been saved to your disk as __external XML representations of the CAS__.
+These are saved in a particular format called XMI.
+In the UIMA SDK, XML versions of CASes can be generated by:
+
+* Running the xref:tools.adoc#ugr.tools.doc_analyzer[Document Analyzer], which saves XML representations of the CAS to the specified output directory.
+* Running a Collection Processing Engine that includes the __XMI Writer__ CAS Consumer (`examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml`).
+* Explicitly creating XML representations of the CAS from your own application using the `org.apache.uima.cas.impl.XmiCasSerializer` class. The best way to learn how to do this is to look at the example code for the XMI Writer CAS Consumer, located in ``examples/src/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java``.
+footnote:[An older, different XML format for the CAS, called XCAS, is also provided, mainly for backwards compatibility. You can see examples of its use in `examples/src/org/apache/uima/examples/cpe/XCasWriterCasConsumer.java`.]
+
+[NOTE]
+====
+The Annotation Viewer only shows CAS views where the Sofa data type is a String. 
+====
+
+You can run the Annotation Viewer by executing the `annotationViewer` shell script located in the bin directory of the UIMA SDK or the "UIMA Annotation Viewer" Eclipse run configuration in the `uimaj-examples` project.
+This will open the following window: 
+
+.Screenshot of the Annotation Viewer
+image::images/tools/tools.annotation_viewer/image002.jpg[Screenshot of the Annotation Viewer]
+
+Select an input directory (which must contain XMI files), and the descriptor for the AE that produced the Analysis (which is needed to get the type system for the analysis). Then press the "`View`" button.
+
+This will bring up a xref:tools.adoc#ugr.tools.doc_analyzer.viewing_results[dialog] where you can select a viewing format and double-click on a document to view it.
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.caseditor.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.caseditor.adoc
new file mode 100644
index 0000000..11b6e88
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.caseditor.adoc
@@ -0,0 +1,373 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.ce]]
+= Cas Editor User's Guide
+// <titleabbrev>Cas Editor User's Guide</titleabbrev>
+
+
+[[_sandbox.caseditor.introduction]]
+== Introduction
+
+The CAS Editor is an Eclipse-based annotation tool which supports manual and automatic annotation (via running UIMA annotators) of CASes stored in files.
+Currently only text-based CASes are supported.
+The CAS Editor can visualize and edit all feature structures.
+Feature Structures which are annotations can additionally be viewed and edited directly on text. 
+
+
+image::images/tools/tools.caseditor/CasEditor.png[]
+
+
+[[_sandbox.caseditor.launching]]
+== Launching the Cas Editor
+
+To open a CAS, the Cas Editor needs a compatible type system and styling information that specifies how to display the types.
+The styling information is created automatically by the Cas Editor, but the type system file must be provided by the user.
+
+A CAS in the xmi or xcas format can simply be opened by clicking on it, like a text file is opened with the Eclipse text editor.
+
+[[_sandbox.caseditor.typesystemspec]]
+=== Specifying a type system
+
+The Cas Editor expects a type system file at the root of the project named TypeSystem.xml.
+If a type system cannot be found, this message is shown: 
+
+
+image::images/tools/tools.caseditor/ProvideTypeSystem.png[No type system available for the opened CAS.]
+
+If the type system file does not exist in this location you can point the Cas Editor to a specific type system file.
+You can also change the default type system location in the properties page of the Eclipse project.
+To do that, right-click the project, select Properties, go to the UIMA Type System tab, and specify the default location for the type system file.
+
+After the Cas Editor is opened, switch to the Cas Editor Perspective to see all the Cas Editor related views.
+
+[[_sandbox.caseditor.annotation_editor]]
+== Annotation editor
+
+The annotation editor shows the text with annotations and provides different views to show aspects of the CAS. 
+
+[[ugr.tools.cas_editor.annotation_editor.editor]]
+=== Editor
+
+After the editor is open it shows the default sofa of the CAS.
+(Displaying another sofa is not currently possible.) The editor has an associated, changeable CAS Type.
+This type is called the editor "mode". By default the editor only shows annotations of this type.
+Actions and views are sensitive to this mode.
+The next screen shows the display, where the mode is set to "Person": 
+
+
+image::images/tools/tools.caseditor/EditorOneType.png[]
+To change the mode for the editor, use the "Mode" menu in the editor context menu.
+To open the context menu, right-click anywhere on the text.
+
+
+image::images/tools/tools.caseditor/ModeMenu.png[]
+The current mode is displayed in the status line at the bottom and in the Style View.
+
+It's possible to work with more than one annotation type at a time; the mode just selects the default annotation type which can be marked with the fewest keystrokes.
+To show annotations of other types, use the "Show" menu in the context menu. 
+
+
+image::images/tools/tools.caseditor/ShowAnnotationsMenu.png[]
+Alternatively, you may select the annotation types to be shown in the Style View.
+
+
+image::images/tools/tools.caseditor/StyleView2.png[]
+The editor will show the additional selected types.
+
+
+image::images/tools/tools.caseditor/EditorAllTypes.png[]
+The annotation renderer and rendering layer can be changed in the Properties dialog.
+After the change all editors which share the same type system will be updated. 
+
+The editor automatically selects annotations of the editor mode type that are near the cursor.
+This selection is then synchronized or displayed in other views. 
+
+To create an annotation manually using the editor, mark a piece of text and then press the enter key.
+This creates an annotation of the editor mode's type, with bounds corresponding to the selection.
+You can also use the "Quick Annotate" action from the context menu. 
+
+It is also possible to choose the annotation type; press shift + enter (smart insert) or click on "Annotate" in the context menu for this.
+A dialog will ask for the annotation type to create; either select the desired type or use the associated key shortcut.
+In the screen shot below, pressing the "p" key will create a Person annotation for "Obama". 
+
+
+image::images/tools/tools.caseditor/ShiftEnter.png[]
+
+To delete an annotation, select it and press the delete key.
+Only annotations of the editor mode can be deleted with this method.
+To delete non-editor mode annotations use the Outline View. 
+
+For annotation projects you can change the font size in the editor.
+The default font size is 13.
+To change it, open the Eclipse preferences dialog and go to "UIMA Annotation Editor".
+
+[[_sandbox.caseditor.annotation_editor.styling]]
+=== Configure annotation styling
+
+The Cas Editor can visualize the annotations in multiple highlighting colors and with different annotation drawing styles.
+The annotation styling is defined per type system.
+When it is changed, the appearance changes in all opened editors sharing a type system.
+
+The styling is initialized with a unique color for every annotation type, and every annotation is drawn with the Squiggles annotation style.
+You may adjust the annotation styles and coloring depending on the project needs. 
+
+
+image::images/tools/tools.caseditor/StyleView.png[]
+
+The Cas Editor offers a property page to edit the styling.
+To open this property page click on the "Properties" button in the Styles view. 
+
+The property page can be seen below.
+By clicking on one of the annotation types, the color, drawing style and drawing layer can be edited on the right side. 
+
+
+image::images/tools/tools.caseditor/StyleProperties.png[]
+
+The annotations can be visualized with one of the following annotation styles:
+
+.Style Table
+[cols="1,1,1", frame="all", options="header"]
+|===
+| Style
+| Sample
+| Description
+
+|BACKGROUND
+|
+
+
+image::images/tools/tools.caseditor/Style-Background.png[]
+
+|
+
+The background is drawn in the annotation color.
+
+|TEXT_COLOR
+|
+
+
+image::images/tools/tools.caseditor/Style-TextColor.png[]
+
+|
+
+The text is drawn in the annotation color.
+
+|TOKEN
+|
+
+
+image::images/tools/tools.caseditor/Style-Token.png[]
+
+|
+
+The token style assumes that token annotations are always separated by whitespace.
+A vertical line is drawn between two token annotations only if they are not separated by whitespace.
+The image on the left actually contains three annotations, one for "Mr", "." and "Obama". 
+
+|SQUIGGLES
+|
+
+
+image::images/tools/tools.caseditor/Style-Squiggles.png[]
+
+|
+
+Squiggles are drawn under the annotation in the annotation color.
+
+|BOX
+|
+
+
+image::images/tools/tools.caseditor/Style-Box.png[]
+
+|
+
+A box in the annotation color is drawn around the annotation.
+
+|UNDERLINE
+|
+
+
+image::images/tools/tools.caseditor/Style-Underline.png[]
+
+|
+
+A line in the annotation color is drawn below the annotation.
+
+|BRACKET
+|
+
+
+image::images/tools/tools.caseditor/Style-Bracket.png[]
+
+|
+
+An opening bracket is drawn around the first character of the annotation and a closing bracket is drawn around the last character of the annotation.
+|===
+
+The Cas Editor can draw the annotations in different layers.
+If the spans of two annotations overlap, the annotation in the higher layer is drawn over annotations in lower layers.
+Depending on the drawing style it is possible to see both annotations.
+The drawing order is defined by the layer number: layer 0 is drawn first, then layer 1, and so on.
+If annotations in the same layer overlap, it is not defined which annotation type is drawn first.
+
+[[ugr.tools.cas_editor.annotation_editor.cas_views]]
+=== CAS view support
+
+The Annotation Editor can only display text Sofa CAS views.
+Displaying CAS views with Sofas of different types is not possible and will show an editor page to switch back to another CAS view.
+The Edit and Feature Structure Browser views are still available and might be used to edit Feature Structures which belong to the CAS view. 
+
+To switch to another CAS view, right click in the editor to open the context menu and choose "CAS Views" and the view the editor should switch to. 
+
+[[ugr.tools.cas_editor.annotation_editor.outline]]
+=== Outline view
+
+The outline view gives an overview of the annotations which are shown in the editor.
+The annotations are grouped by type.
+There are actions to increase or decrease the bounds of the selected annotation.
+There is also an action to merge selected annotations.
+The outline has a second view mode where only annotations of the current editor mode are shown.
+
+image::images/tools/tools.caseditor/Outline.png[]
+			 
+The style can be switched in the view menu to one that shows only the annotations which belong to the current editor mode.
+
+[[ugr.tools.cas_editor.annotation_editor.properties_view]]
+=== Edit Views
+
+The Edit Views show details about the currently selected annotations or feature structures.
+It is possible to change primitive values in this view.
+Referenced feature structures can be created and deleted, including arrays.
+To link a feature structure with other feature structures, it can be pinned to the edit view.
+This means that it does not change if the selection changes. 
+
+
+image::images/tools/tools.caseditor/EditView.png[]
+
+
+[[ugr.tools.cas_editor.annotation_editor.fs_view]]
+=== FeatureStructure View
+
+The FeatureStructure View lists all feature structures of a specified type.
+The type is selected in the type combobox. 
+
+It's possible to create and delete feature structures of every type. 
+
+
+image::images/tools/tools.caseditor/FSView.png[]
+
+
+[[ugr.tools.cas_editor.custom_view]]
+== Implementing a custom Cas Editor View
+
+Custom Cas Editor views can be added to rapidly create, access and/or change Feature Structures in the CAS.
+While the Annotation Editor and its views offer support for general viewing and editing, accessing and editing things in the CAS can be streamlined using a custom Cas Editor view.
+A custom Cas Editor view can be programmed to use a particular type system and optimized to quickly change or show something. 
+
+Annotation projects often need to track the annotation status of a CAS where a user needs to mark which parts have been annotated or corrected.
+To do this with the Cas Editor a user would need to use the Feature Structure Browser view to select the Feature Structure and then edit it inside the Edit view.
+A custom Cas Editor view could directly select and show the Feature Structure and offer a tailored user interface to change the annotation status.
+Some features such as the name of the annotator could even be automatically filled in. 
+
+The creation of Feature Structures which are linked to existing annotations or Feature Structures is usually difficult with the standard views.
+A custom view which can make assumptions about the type system is usually needed to do this efficiently. 
+
+[[ugr.tools.cas_editor.custom_view.sample]]
+=== Annotation Status View Sample
+
+The Cas Editor provides the CasEditorView class as a base class for views which need to access the CAS which is opened in the current editor.
+It shows a "view not available" message when the current editor does not show a CAS, no editor is opened at all or the current CAS view is incompatible with the view. 
+
+The following snippet shows how it is usually implemented: 
+
+[source]
+----
+public class AnnotationStatusView extends CasEditorView {
+	
+  public AnnotationStatusView() {
+    super("The Annotation Status View is currently not available.");
+  }
+
+  @Override
+  protected IPageBookViewPage doCreatePage(ICasEditor editor) {
+    ICasDocument document = editor.getDocument();
+
+    if (document != null) {
+      return new AnnotationStatusViewPage(editor);
+    }
+
+    return null;
+  }
+}
+----
+
+The `doCreatePage` method is called to create the actual view page.
+If the document is null, the editor failed to load a document and is showing an error message.
+If the document is not null but the CAS view is incompatible, the method should return null to indicate that it has nothing to show.
+In this case the "not available" message is displayed. 
+
+The next step is to implement the AnnotationStatusViewPage.
+That is the page which gets the CAS as input and needs to provide the user with a UI to change the Annotation Status Feature Structure.
+
+[source]
+----
+public class AnnotationStatusViewPage extends Page {
+  
+  private ICasEditor editor;
+  
+  AnnotationStatusViewPage(ICasEditor editor) {
+    this.editor = editor;
+  }
+  
+  ...
+  
+  public void createControl(Composite parent) {
+  
+    // create ui elements here
+    
+    ...
+    
+    ICasDocument document = editor.getDocument();
+    CAS cas = document.getCAS();
+    
+    // Retrieve Annotation Status FS from CAS
+    // and initialize the ui elements with it
+    
+    FeatureStructure statusFS;
+    
+    ...
+    
+    // Add event listeners to the ui element
+    // to save an update to the CAS
+    // and to advertise a change
+    
+    ...
+    
+    // Send update event
+    document.update(statusFS);
+    
+  }
+}
+----
+
+The above code sketches out how a typical view page is implemented.
+The CAS can be directly used to access any Feature Structures or annotations stored in it.
+When something is modified (added, removed, or changed), that must be advertised via the ICasDocument object.
+It has multiple notification methods which send an event so that other views can be updated.
+The view itself can also register a listener to receive CAS change events. 
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.cde.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.cde.adoc
new file mode 100644
index 0000000..e3ca773
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.cde.adoc
@@ -0,0 +1,797 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.cde]]
+= Component Descriptor Editor User's Guide
+// <titleabbrev>CDE User's Guide</titleabbrev>
+
+The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based interface for creating and editing UIMA XML descriptors.
+It supports most of the descriptor formats, except the Collection Processing Engine descriptor, the PEAR package descriptor and some remote deployment descriptors.
+
+[[ugr.tools.cde.launching]]
+== Launching the Component Descriptor Editor
+
+Here's how to launch this tool on a descriptor contained in the examples.
+This presumes you have installed the examples as described in the SDK Installation and Setup chapter.
+
+* Expand the uimaj-examples project in the Eclipse Navigator or Package Explorer view
+* Within this project, browse to the file descriptors/tutorial/ex1/RoomNumberAnnotator.xml.
+* Right-click on this file and select Open With →Component Descriptor Editor. (If this option is not present, check to make sure you xref:oas.adoc#ugr.ovv.eclipse_setup.installation[installed the plug-ins]. The EMF plugin is also required.)
+* This should open a graphical editor and display the contents of the RoomNumberAnnotator descriptor. 
+
+
+[[ugr.tools.cde.creating_new_ae_descriptor]]
+== Creating a New AE Descriptor
+
+A new AE descriptor file may be created by selecting the File →New →Other... menu.
+This brings up the following dialog: 
+
+.Screenshot of selecting new UIMA component in Eclipse
+image::images/tools/tools.cde/image002.jpg[Screenshot of selecting new UIMA component in Eclipse]
+
+If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the _Next_ button, the following dialog is displayed.
+We will cover creating other kinds of components later in the documentation. 
+
+.Screenshot of selecting new UIMA component in Eclipse after pushing Next
+image::images/tools/tools.cde/image004.jpg[Screenshot of selecting new UIMA component in Eclipse after pushing Next]
+
+After entering the appropriate parent folder and file name, and clicking Finish, an initial AE descriptor file is created with the given name, and the descriptor is opened up within the Component Descriptor Editor.
+
+At this point, the display inside the Component Descriptor Editor is the same whether one started by creating a new AE descriptor, as in the preceding paragraph, or one merely opened a previously created AE descriptor from, say, the Package Explorer view.
+We show a previously created AE in the figure below: 
+
+.Screenshot of CDE showing overview page
+image::images/tools/tools.cde/image006.jpg[Screenshot of CDE showing overview page]
+
+To see all the information shown in the main editor pane with less scrolling, double click the title tab to toggle between the "`full screen`" and normal views.
+
+To set the Component Descriptor Editor as the default editor for all .xml files, go to Window →Preferences, select File Associations on the left and *.xml on the right, click Component Descriptor Editor, then the Default button, and then OK.
+If AE and Type System descriptors are not the primary .xml files you work with within the Eclipse environment, we recommend not setting the Component Descriptor Editor as your default editor for all .xml files.
+To open an .xml file using the Component Descriptor Editor, if the Component Descriptor Editor is not set as your default editor, right click on the file in the Package Explorer, or other navigational view, and select Open With →Component Descriptor Editor.
+This choice is remembered by Eclipse for subsequent open operations.
+
+[[ugr.tools.cde.pages_within_the_editor]]
+== Pages within the Editor
+
+The Component Descriptor Editor follows a standard Eclipse paradigm for these kinds of editors.
+There are several pages in the editor; each one can be selected, one at a time, by clicking on the bottom tabs.
+The last page contains the actual XML source file being edited, and is displayed as plain text.
+
+The same set of tabs appears at the bottom of each page in the Component Descriptor Editor.
+The Component Descriptor Editor uses this "`multi-page editor`" paradigm to give the user a view of conceptually distinct portions of the Descriptor metadata in separate pages.
+At any point in time the user may click on the Source tab to view the actual XML source.
+The Component Descriptor Editor is, in a way, just a fancy GUI for editing the XML.
+The tabs provide quick access to the following pages: Overview, Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes, Resources, and Source.
+We discuss each of these pages in turn.
+
+[[ugr.tools.cde.adjusting_display_of_pages]]
+=== Adjusting the display of pages
+
+Most pages in the editor have a "`sash`" bar.
+This is a light gray bar which separates sub-sections of the page.
+This bar can be dragged with the mouse to adjust how the display area is split between the two sash panes.
+You can also change the orientation of the Sash so it splits vertically, instead of horizontally, by clicking on the small icons at the top right of the page that look like this: 
+
+.Changing orientation of two window split
+image::images/tools/tools.cde/image008.jpg[Changing orientation of two window split]
+
+All of the sections on a page have subtitles, with an indicator to the left which you can click to collapse or expand that particular section.
+Collapsing sections can sometimes be useful to free up screen area for other sections.
+
+[[ugr.tools.cde.overview_page]]
+== Overview Page
+
+Normally, the first page displayed in the Component Descriptor Editor is the Overview page (the name of the page is shown in the GUI panel at the top left). If there is an error reading and parsing the source, the Source page is shown instead, giving you the opportunity to correct the problem.
+For many components, the Overview page contains three sections: Implementation Details, Runtime Information and overall Identification Information.
+
+[[ugr.tools.cde.overview_page.implementation_details]]
+=== Implementation Details
+
+In the Implementation Details section you specify the Implementation Language and Engine Type.
+There are two kinds of Engines: Aggregate, and non-Aggregate (also called Primitive). An Aggregate engine is one which is composed of additional component engines and contains no code itself.
+Several of the pages in the Component Descriptor Editor have different formats, depending on the engine type.
+
+[[ugr.tools.cde.overview_page.runtime_info]]
+=== Runtime Information
+
+Runtime information is only applicable for primitive engines and is disabled for aggregates and other kinds of descriptors.
+This is where you specify the class name of the annotator implementation, if you are doing a Java implementation, or the C\++ shared object or dll name, if you are doing a C++ implementation.
+Most Analysis Engines will specify that they update the CAS, and that they may be replicated (for performance reasons) when deployed.
+If a particular Analysis Engine must see every CAS (for instance, if it is counting the number of CASes), then uncheck the "`multiple deployment allowed`" box.
+If the Analysis Engine doesn't update the CAS, uncheck the "`updates the CAS`" box.
+(Most CAS Consumers do not update the CAS, and this parameter defaults to unchecked for new CAS Consumer descriptors.)
+
+Analysis engines written using the xref:tug.adoc#-ugr.tug.cm[CAS Multiplier APIs] can create additional CASes for analysis.
+To specify that they do this, check the "`returns new artifacts`" box.
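+
+As a rough sketch, the settings on this page correspond to descriptor XML along the following lines.
+The class name `org.example.MyAnnotator` and the component name are illustrative assumptions, not part of the examples; the Source page shows what the editor actually writes for your descriptor.
+
+[source,xml]
+----
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <primitive>true</primitive>
+  <!-- hypothetical annotator class, for illustration only -->
+  <annotatorImplementationName>org.example.MyAnnotator</annotatorImplementationName>
+  <analysisEngineMetaData>
+    <name>My Annotator</name>
+    <operationalProperties>
+      <!-- corresponds to the "updates the CAS" box -->
+      <modifiesCas>true</modifiesCas>
+      <!-- corresponds to the "multiple deployment allowed" box -->
+      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+      <!-- corresponds to the "returns new artifacts" box -->
+      <outputsNewCASes>false</outputsNewCASes>
+    </operationalProperties>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+----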
+
+[[ugr.tools.cde.overview_page.overall_id_info]]
+=== Overall Identification Information
+
+The Name should be a human-readable name that describes this component.
+The Version, Vendor, and Description fields are optional, and are arbitrary strings.
+
+[[ugr.tools.cde.aggregate_page]]
+== Aggregate Page
+
+For primitive Analysis Engines, Flow Controllers or Collection Processing components, the Aggregate page is not used.
+For aggregate engines, the page looks like this: 
+
+.CDE Aggregate page
+image::images/tools/tools.cde/image010.jpg[CDE Aggregate page]
+
+On the left we see a list of component engines, and on the right information about the flow.
+If you hover the mouse over an item in the list of component engines, that engine's description metadata will be shown.
+If you right-click on one of these items, you get an option to open that delegate descriptor in another editor instance.
+Any changes you make, however, won't be seen until you close and reopen the editor on the importing file.
+
+Engines can be added to the list on the left by clicking the Add button at the bottom of the Component Engine section.
+This brings up one of the following two dialogs: 
+
+.Adding an Analysis Engine to an Aggregate, by location
+image::images/tools/tools.cde/import-by-location.jpg["Adding an Analysis Engine to an Aggregate, by location"]
+
+This dialog lets you select a descriptor from your workspace, or browse the file system to select a descriptor. 
+
+Or, if you have selected to import by name, this dialog is shown: 
+
+.Adding an Analysis Engine to an Aggregate, by name
+image::images/tools/tools.cde/import-by-name.jpg["Adding an Analysis Engine to an Aggregate, by name"]
+
+You can specify that the import should be by name (the name is looked up using both the project's class path and the DataPath), or by location.
+If it is by name, the dialog shows the available XML files on the class path to pick from.
+If the one you want isn't showing, it is neither on the enclosing Eclipse Java Project's class path nor on the DataPath, and one of those needs to be updated to include the path to the resource.
+If the name picked is ``com/company/prod/xyz.xml``, the name in the descriptor will be "``com.company.prod.xyz``".
+The "Browse the file system..." button is disabled when import by name is checked, because the file system is not the source of the imports; rather, it's the resources on the class path or DataPath that are.
+
+If it is by location, the file reference is converted to a relative reference if possible, in the descriptor.
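+
+For illustration, the two import styles would appear in the aggregate's descriptor roughly as follows.
+The delegate keys and the relative path are assumptions made for this sketch; only the name `com.company.prod.xyz` comes from the example above.
+
+[source,xml]
+----
+<delegateAnalysisEngineSpecifiers>
+  <!-- import by name: resolved through the class path or the DataPath -->
+  <delegateAnalysisEngine key="xyz">
+    <import name="com.company.prod.xyz"/>
+  </delegateAnalysisEngine>
+  <!-- import by location: hypothetical path, relative to this descriptor -->
+  <delegateAnalysisEngine key="RoomNumber">
+    <import location="../ex1/RoomNumberAnnotator.xml"/>
+  </delegateAnalysisEngine>
+</delegateAnalysisEngineSpecifiers>
+----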
+
+The final selection at the bottom tells whether or not the selected engine(s) should automatically be added to the end of the flow section (the right section on the Aggregate page). The OK button does not become activated until a descriptor file is selected.
+
+To remove an analysis engine from the component engine list simply select an engine and click the Remove button, or press the delete key.
+If the engine is already in the flow list you will be warned that deletion will also delete the specified engine from this list.
+
+[[ugr.tools.cde.aggregate_page.adding_components_more_than_once]]
+=== Adding components more than once
+
+Components may be added to the left panel more than once.
+Each of these components will be given a key which is unique.
+A typical reason this might be done is to use a component in a flow several times, but have each use be associated with different configuration parameters (different configuration parameters can be associated with each instance).
+
+[[ugr.tools.cde.aggregate_page.adding_removing_components_from_flow]]
+=== Adding or Removing components in a flow
+
+The button between the Component Engines and the Flow List, labeled ``>>``, adds a chosen engine to the flow list and the button labeled `<<` removes an engine from the flow list.
+To add an engine to the flow list you must first select an engine from the left hand list, and then press the `>>` button.
+Engines may appear any number of times in the flow list.
+To remove an engine from the flow list, select an engine from the right hand list and press the `<<` button.
+
+[[ugr.tools.cde.aggregate_page.adding_remote_aes]]
+=== Adding remote Analysis Engines
+
+There are two ways to add remote engines: add an existing descriptor, which specifies a remote engine (just as if you were adding a non-remote engine) or use the __Add Remote__ button which will create a remote descriptor, save it, and then import it, all in one operation.
+The __Add Remote__ button enables you to easily specify the information needed to create a remote service descriptor for a remote AE - one that runs on a different computer connected over the network.
+There are three kinds of these: two are variants of the xref:ref.adoc#ugr.ref.xml.component_descriptor.service_client[Service Client descriptor]; the other is the UIMA-AS JMS Service descriptor, described in the UIMA AS documentation.
+The __Add Remote__ button creates an instance of one of these descriptors, saves it as a file in the workspace, and imports it into the aggregate.
+
+Of course, if you already have a remote service descriptor, you can add it to the set of delegates using the `Add` button, just like adding other kinds of analysis engines.
+
+After clicking on __Add Remote__, the following dialog is displayed: 
+
+.Adding a remote client to an aggregate
+image::images/tools/tools.cde/image014v2.jpg[Adding a remote client to an aggregate]
+
+To define a remote service you specify the Service Kind, Protocol Service Type, URI and Key.
+You can also specify a Timeout in milliseconds, used by the JMS services, and a VNS Host and Port used by the Vinci Service.
+The JMS service has additional timeouts and other parameters you may specify.
+Just like when one adds an engine from the file system, you have the option of adding the engine to the end of the flow.
+The Component Descriptor Editor currently only supports Vinci services using this dialog.
+
+Remote engines are added to the descriptor using the `<import ...>` syntax.
+The information you specify here is saved in the Eclipse project as a file, using a generated name, `<key-name>.xml`, where `<key-name>` is the name you listed as the Key.
+Because of this, the key-name must be a valid file name.
+If you want a different name, you can change the path information in the dialog box.
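+
+A generated Vinci service client descriptor might look roughly like the following sketch.
+The service name, host, port and timeout are placeholders chosen for illustration; the exact content written by the dialog may differ.
+
+[source,xml]
+----
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <resourceType>AnalysisEngine</resourceType>
+  <!-- hypothetical service name registered with the Vinci Name Server -->
+  <uri>org.example.MyRemoteAnnotator</uri>
+  <protocol>Vinci</protocol>
+  <!-- timeout in milliseconds (placeholder value) -->
+  <timeout>60000</timeout>
+  <parameters>
+    <!-- placeholder location of the Vinci Name Server -->
+    <parameter name="VNS_HOST" value="vns.example.org"/>
+    <parameter name="VNS_PORT" value="9000"/>
+  </parameters>
+</uriSpecifier>
+----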
+
+[[ugr.tools.cde.aggregate_page.connecting_to_remote_services]]
+=== Connecting to Remote Services
+
+If you are using the Vinci protocol, it requires that you specify the location of the Vinci Name Server (an IP address and a Port number). You can specify these in the service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu item: Window →Preferences... →UIMA Preferences. 
+
+If the remote service is available (up and running), additional operations become possible.
+For instance, hovering the mouse over the remote descriptor will show the description metadata from the remote service.
+
+[[ugr.tools.cde.aggregate_page.finding_aes_by_searching]]
+=== Finding Analysis Engines by searching
+
+The next button that appears between the component engine list and the flow list is the Find AE button.
+When this button is pressed the following dialog is displayed, which allows one to search for AEs by name, by input or output types, or by a combination of these criteria.
+This function searches the existing Eclipse workspace for matching *.xml descriptor source files; it does not look inside Jar files. 
+
+.Searching for an AE to add to an aggregate
+image::images/tools/tools.cde/image016.jpg[Searching for an AE to add to an aggregate]
+
+The search automatically adds a "`match any characters`" - style (*) wildcard at the beginning and end of anything entered.
+Thus, if person is specified for an output type, a "`*person*`" search is performed.
+Such a search would match such things as "`my.namespace.person`" and "`person.governmentOfficial`".
+One can search in all projects or one particular project.
+The search does an implicit _and_ on all fields which are left non-blank.
+
+[[ugr.tools.cde.aggregate_page.component_engine_flow]]
+=== Component Engine Flow
+
+The UIMA SDK currently supports three kinds of xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints[sequencing flows]: `Fixed`, `CapabilityLanguageFlow`, and user-defined. The first two require specification of a linear flow sequence; this linear flow sequence can also be read by a user-defined flow controller (what use is made of it is up to the user-defined flow controller). The Component Engine Flow section allows specification of these items.
+
+The pull-down labeled Flow Kind picks between the three flow models.
+When the user-defined flow is selected, the Browse and Search buttons become enabled to let you pick the flow controller XML descriptor to import. 
+
+.Specifying flow control
+image::images/tools/tools.cde/image018.jpg[Specifying flow control]
+
+The key name value is set automatically from the XML descriptor being imported, and enables parameters to be overridden for that descriptor (see following sections).
+
+The Up and Down buttons to the right in the Flow section are activated when an engine in the flow is selected.
+The Up button moves the selected engine up one place in the execution order, and the Down button moves it down one place.
+Remember that engines can appear multiple times in the flow (or not at all).
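+
+As a minimal sketch, a `Fixed` flow over two delegates would be stored in the aggregate descriptor along these lines.
+The key names `Tokenizer` and `NameRecognizer` are assumed for illustration; they must match keys from the component engine list.
+
+[source,xml]
+----
+<flowConstraints>
+  <fixedFlow>
+    <!-- each node is a delegate key; order is execution order -->
+    <node>Tokenizer</node>
+    <node>NameRecognizer</node>
+  </fixedFlow>
+</flowConstraints>
+----
+
+Moving an engine with the Up or Down button corresponds to reordering the `<node>` elements.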
+
+[[ugr.tools.cde.parm_definition]]
+== Parameters Definition Page
+
+There are two pages for parameters: the first one is where parameters are defined, and the second one is where the parameter settings are configured.
+The first page is the Parameter Definition page and has two alternatives, depending on whether or not the descriptor is an Aggregate.
+We start with a description of parameter definitions for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow Controllers.
+Here is an example: 
+
+.Parameter Definitions - not Aggregate
+image::images/tools/tools.cde/image020.jpg[Parameter Definitions - not Aggregate]
+
+The first checkbox at the top simplifies things if you are not using Parameter Groups (see the following section for a discussion of groups). In this case, leave the check box unchecked.
+The main area shows a list of parameter definitions.
+Each parameter has a name, which must be unique for this Analysis Engine.
+The first three attributes specify whether the parameter can have a single or multiple values (an array of values), whether it is Optional or Mandatory, and what value type it can hold (String, Integer, Float, or Boolean). If an external override name has been specified, an attribute of "XO" is included.
+See xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides[External Configuration Parameter Overrides] for a discussion of external configuration parameter overrides.
+
+In addition to using the buttons on the right to edit this information, you can double-click a parameter to edit it, or remove (delete) a selected parameter by pressing the delete key.
+Use the Add button to add a new parameter to the list.
+
+Parameters have an additional description field, which you can specify when you add or edit a parameter.
+To see the value of the description, hover the mouse over the item, as shown in the picture below.
+If the parameter has an external override name its value is included in the hover. 
+
+.Parameter description shown in a hover message
+image::images/tools/tools.cde/image022.jpg[Parameter description shown in a hover message]
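+
+For orientation, a single parameter definition is stored in the descriptor roughly as sketched below.
+The parameter name, description and external override name are hypothetical values chosen for this example.
+
+[source,xml]
+----
+<configurationParameters>
+  <configurationParameter>
+    <!-- hypothetical name; must be unique within this component -->
+    <name>Patterns</name>
+    <!-- optional; shown as "XO" in the list when present -->
+    <externalOverrideName>example.patterns</externalOverrideName>
+    <description>Regular expressions to match (illustrative text)</description>
+    <type>String</type>
+    <multiValued>true</multiValued>
+    <mandatory>false</mandatory>
+  </configurationParameter>
+</configurationParameters>
+----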
+
+
+[[ugr.tools.cde.parm_definition.using_groups]]
+=== Using groups
+
+The group concept for parameters arose from the observation that sets of parameters were sometimes associated with different configuration needs.
+As an example, you might have an Analysis Engine which needed different configuration based on the language of a document.
+
+To use groups, you check the "`Use Parameter Groups`" box.
+When you do this, you get the ability to add groups, and to define parameters within these groups.
+You also get the ability to define "`Common`" parameters, which are defined for all groups.
+Here is a screen shot showing some parameter groups in use: 
+
+.Using parameter groups
+image::images/tools/tools.cde/image024.jpg[Using parameter groups]
+
+You can see the `<Common>` parameters as well as two different sets of groups.
+
+The Default Group is an optional specification of what Group to use if the parameter is not available for the group requested.
+
+The xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration[Search strategy] specifies what to do when a parameter is not available for the group requested.
+It can have the values of `None`, `language_fallback`, or `default_fallback`.
+
+Groups are added using the __Add Group__ button.
+Once added, they can be edited or removed, using the buttons to the right, or the standard gestures for editing (double-clicking the item) and removing (pressing the delete key after an item is selected). Removing a group removes all the parameter definitions in the group.
+If you try to remove the `<Common>` group, it just removes the parameters in the group.
+
+Each entry for a group in the table specifies one or more group names.
+For example, the highlighted entry above specifies two groups: `myNewGroup2` and `mg3`.
+The parameter definition underneath is considered to be in both groups.
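+
+A grouped declaration could look like the following sketch in the descriptor source.
+The group names are taken from the example above; the parameter names, the default group and the search strategy shown are assumptions for illustration.
+
+[source,xml]
+----
+<configurationParameters defaultGroup="myNewGroup2" searchStrategy="default_fallback">
+  <!-- parameters defined for all groups (the <Common> entry) -->
+  <commonParameters>
+    <configurationParameter>
+      <name>CaseSensitive</name>
+      <type>Boolean</type>
+      <multiValued>false</multiValued>
+      <mandatory>false</mandatory>
+    </configurationParameter>
+  </commonParameters>
+  <!-- one table entry naming two groups; the parameter belongs to both -->
+  <configurationGroup names="myNewGroup2 mg3">
+    <configurationParameter>
+      <name>Dictionary</name>
+      <type>String</type>
+      <multiValued>false</multiValued>
+      <mandatory>false</mandatory>
+    </configurationParameter>
+  </configurationGroup>
+</configurationParameters>
+----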
+
+[[ugr.tools.cde.parm_definition.adding]]
+=== Adding or Editing a Parameter
+
+When creating or modifying a parameter both a unique name and a valid type must be specified.
+The Description and External Override fields are optional.
+The defaults for the two checkboxes indicate a single-valued optional parameter in the example below: 
+
+
+image::images/tools/tools.cde/image025.jpg[Aggregate parameters]
+
+
+[[ugr.tools.cde.parm_definition.aggregates]]
+=== Parameter declarations for Aggregates
+
+Aggregates declare parameters which must always override a parameter setting for a component making up the aggregate.
+They do this using the version of this page which is shown when the descriptor is an Aggregate; here's an example: 
+
+
+image::images/tools/tools.cde/image026.jpg[Aggregate parameters]
+
+There is an additional panel shown (on the right) which lists all of the components by their key names, and shows for each of them their defined parameters.
+To add a new override for one or more of these parameters to the aggregate, select the component parameter you wish to override and push the Create Override button (or, you can just double-click the component parameter). This will automatically add a parameter of the same name (by default; you can change the name if you like) to the aggregate, putting it into the same group(s) (if groups are being used in the component, this is required), and setting the properties of the parameter to match those of the component (this is required).
+
+[NOTE]
+====
+If the name of the parameter being added already is in use in the aggregate, and the parameters are not compatible, a new parameter name is generated by suffixing the name with a number.
+If the parameters are compatible, the selected component parameter is added to the existing aggregate parameter, as an additional override.
+If you don't want this behavior, but want to have a new name generated in this case, push the Create non-shared Override button instead, or hold down the "`shift`" key when double clicking the component parameter.
+
+The required / optional setting in the aggregate parameter is set to match that of the parameter being overridden.
+You may want to make an optional delegate parameter required.
+You can do this by changing that value manually in the source editor view. 
+====
+
+In the above example, the user has just double-clicked the `TypeNames` parameter in the `NameRecognizer` component.
+This added that parameter to this aggregate under the `<Not in any group>` section -- since it wasn't part of a group.
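+
+In the descriptor source, such an override would be recorded roughly as follows.
+The names `TypeNames` and `NameRecognizer` come from the example above; the type and multiplicity shown are assumptions, since they are copied from whatever the component declares.
+
+[source,xml]
+----
+<configurationParameters>
+  <configurationParameter>
+    <name>TypeNames</name>
+    <!-- assumed properties; they must match the component's declaration -->
+    <type>String</type>
+    <multiValued>true</multiValued>
+    <mandatory>false</mandatory>
+    <overrides>
+      <!-- delegate key, a slash, then the delegate's parameter name -->
+      <parameter>NameRecognizer/TypeNames</parameter>
+    </overrides>
+  </configurationParameter>
+</configurationParameters>
+----
+
+Adding a compatible parameter from another component as a shared override would add a further `<parameter>` element inside `<overrides>`.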
+
+Once you have added a parameter definition to the aggregate, you can use the buttons on the right side of the left panel to add additional overrides or remove parameters or their overrides. You can also remove groups; removing a group is like removing all the parameter definitions in the group.
+
+In addition to adding one parameter at a time from a component, you can also add all the parameters for a group within a component, or all the parameters in the component, by selecting those items.
+
+If you double-click (or push __Create Override__) the `<Common>` group or a parameter in the `<Common>` group in a component, a special group is created in the Aggregate consisting of all of the groups in that component, and the overriding parameter (or parameters) are added to that.
+This is done because each component can have different groups belonging to the Common group notion; the Common group for a component is just shorthand for all the groups in that component.
+
+The Aggregate's specification of the default group and search strategy overrides any specifications contained in the components.
+
+[[ugr.tools.cde.parameter_settings]]
+== Parameter Settings Page
+
+The Parameter Settings page is rather straightforward; it is where the user defines parameter settings for their engines.
+An example of such a page is given below: 
+
+.Parameter settings page
+image::images/tools/tools.cde/image028.jpg[Parameter settings page]
+
+For single-valued parameters, the user simply types the value into the Value box on the right-hand side.
+For multi-valued parameters the user should use the Add, Edit and Remove buttons to manage the list of multiple parameter values.
+
+Values within groups are shown with each group separately displayed, to allow configuring different values for each group.
+
+Values are checked for validity.
+For Boolean values in a list, use the words `true` or `false`.
+
+[NOTE]
+====
+If you specify a value in a single-valued parameter, and then delete all the characters in the value, the CDE will treat this as if you wanted to not specify any setting for this parameter.
+In order to specify a 0 length string setting for a String-valued parameter, you will have to manually edit the XML using the "`Source`" tab. 
+
+For array valued parameters, if you remove all of the entries for a particular array parameter setting, the XML will reflect a 0-length array.
+To change this to an unspecified parameter setting, you will have to manually edit the XML using the "`Source`" tab. 
+====
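+
+The cases described in the note can be sketched in XML as follows.
+The parameter names are hypothetical, and the zero-length string form is an assumption about how one would write it by hand on the Source page.
+
+[source,xml]
+----
+<configurationParameterSettings>
+  <!-- single-valued String parameter set to a zero-length string -->
+  <nameValuePair>
+    <name>Prefix</name>
+    <value>
+      <string></string>
+    </value>
+  </nameValuePair>
+  <!-- multi-valued parameter with two entries -->
+  <nameValuePair>
+    <name>Patterns</name>
+    <value>
+      <array>
+        <string>first pattern</string>
+        <string>second pattern</string>
+      </array>
+    </value>
+  </nameValuePair>
+  <!-- multi-valued parameter left as a 0-length array -->
+  <nameValuePair>
+    <name>Locations</name>
+    <value>
+      <array/>
+    </value>
+  </nameValuePair>
+</configurationParameterSettings>
+----
+
+To leave a parameter unspecified instead, the whole `<nameValuePair>` element for it would be removed.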
+
+[[ugr.tools.cde.type_system]]
+== Type System Page
+
+This page declares the type system used by the annotator.
+For aggregates it is derived by merging the type systems of all constituent AEs.
+The types used by the AE constitute the language in which the inputs and outputs are described in the Capabilities page and also affect the choice of indexes on the Indexes page.
+The Type System page looks like the following: 
+
+
+image::images/tools/tools.cde/limitJCasGenType.jpg[Type System declaration page]
+
+Before discussing this page in detail, it is important to note that there are 3 settings that affect the operation of this page.
+These are accessed by selecting UIMA →Settings (or by going to the Eclipse Window →Preferences →UIMA Preferences) and checking or unchecking one of the following:
+"`Auto generate .java files when defining types`", "`Generate JCasGen classes only for types defined within the local project scope`",
+and "`Display fully qualified type names`".
+
+When the Auto generate option is checked and the development language for the AE is Java, any time a change is made to a type and the change is saved, the corresponding .java files are generated using the JCasGen tool.
+The results are stored in the primary source directory defined for the project.
+The primary source directory is that listed first when you right click on your project and select Properties →Java Build Path, click on the Source tab and look in the list box under the text that reads: __Source folder on build path__. 
+If no source folders are defined, you will get a warning that you have no source folders defined and xref:tools.adoc#ugr.tools.jcasgen[JCasGen] will not be run. 
+When JCasGen is run, you can monitor the progress of the generation by observing the status on the Eclipse status line (normally at the bottom of the Eclipse window). 
+JCasGen runs on the fully-merged type system, consisting of the type specification plus any imported type system, plus (for aggregates) the merged type systems of all the components in an aggregate.
+
+[WARNING]
+====
+If the components of the aggregate have different definitions for the same type name, the CDE will show a warning.
+It is possible to continue past this warning, in which case the CDE will produce the correct Java source files representing the merged types (that is, the type definition that contains all of the features defined on that type by all of your components). However, having different definitions for the same type name is not recommended, since it can make it difficult to xref:ref.adoc#ugr.ref.jcas.merging_types_from_other_specs[combine/package] your annotator with others.
+====
+
+[NOTE]
+====
+In addition to running automatically, you can manually run JCasGen on the fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from the UIMA pulldown menu: 
+====
+
+.Setting JCasGen options
+image::images/tools/tools.cde/image032.jpg[Setting JCasGen options]
+
+When __Generate JCasGen classes only for types defined within the local project scope__ is checked, then JCasGen skips generating classes for types that are imported from sources outside this project.
+This might be done, for instance, if you have an aggregate which is importing type systems from its delegates, some of which are defined in other projects, and have JCasGen'd files already present in those other projects. 
+
+The UIMA settings and preferences for controlling this are used to initialize a particular instance of the editor, when it is started.
+Following that, you can override this setting, just for that editor, by checking or unchecking the box shown on the type system page:
+
+.Limit the scope of JCasGen
+image::images/tools/tools.cde/limitJCasGen.jpg[Limit the scope of JCasGen]
+
+
+[NOTE]
+====
+If this is checked, and one of the types that would be excluded has merged type features, an error message is issued, because JCasGen needs to be run for the combined (merged) type in order to get a class definition that will work for this configuration (have access to all the features). If this happens, you have to run without limiting JCasGen, and manually delete any duplicated/unwanted source results.
+====
+
+When __Display fully qualified type names__ is left unchecked, the namespace of types is not displayed.
+For example, if a fully qualified type name is my.namespace.person, only the abbreviated type name person will be displayed.
+In the Type page diagram shown above, __Display fully qualified type names__ is in fact unchecked.
+
+To add, edit, or remove types the buttons on the top left section are used.
+When adding or editing types, fully qualified type names should of course be used, regardless of whether the __Display fully qualified type names__ is unchecked.
+Removing or editing a type will have a cascading effect in that the type removal/edit will affect inputs, outputs, indexes and type priorities in the natural way.
+
+When a type is added, this dialog is shown: 
+
+.Adding a type
+image::images/tools/tools.cde/image034.jpg[Adding a type]
+
+Type names should be specified using a namespace.
+The namespace is like a Java package name, and serves to ensure type names are unique.
+It also serves as the package name for the generated JCas classes.
+The namespace name is the set of names up to the last period in the string.
+
+The supertype must be picked from an existing type.
+The entry field for the supertype supports Eclipse-style content assist.
+To use it, put the cursor in the supertype field, and type a letter or two of the supertype name (lower case is fine), either starting with the name space, or just with the type name (without the name space), and hold down the Control key and then press the spacebar.
+When you do this, you can see a list of suitable matching types.
+You can then type more letters to narrow down your choices, or pick the right entry with the mouse.
+
+To see the available types and pick one, press the Browse button.
+This will show the available types, and as you type letters for the type name (in lower case; capitalization is ignored), the available types that match are narrowed.
+When you've typed enough to specify the type you want, press Enter.
+Or you can use the list of matching type names and pick the one you want with the mouse.
+
+Once you've added the type, you can add features to it by highlighting the type, and pressing the Add button.
+
+If the type being defined is a subtype of uima.cas.String, the Add button allows you to add allowed values for the string, instead of adding features.
+
+To edit a type or feature, you can double click the entry, or highlight the entry and press the Edit button.
+To delete a type or feature, highlight the entry to be deleted, and click the Delete button or press the Delete key.
+
+If the range of a feature is an array or one of the built-in list types, an additional specification allows you to specify if multiple references to the object referenced by this feature are allowed.
+If they are not allowed, then the XMI serialization of instances of this type uses a more efficient format.
+
+If the range of a feature is an array of Feature Structures, then it is possible to specify an element type for the array.
+This information is used in the XMI serialization and also by the JCas generation routines to generate more efficient code. 
+
+.Specifying a Feature Structure
+image::images/tools/tools.cde/image036.jpg[Specifying a Feature Structure]
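+
+As a rough illustration, the choices made in these dialogs end up in the type system part of the descriptor XML.
+The fragment below is only a sketch: the type and feature names (`my.namespace.Person`, `my.namespace.Alias`, `aliases`, `my.namespace.Gender`) are hypothetical and are not taken from the screenshots.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical names, not the editor's exact output -->
+<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <types>
+    <typeDescription>
+      <!-- namespace "my.namespace" is everything up to the last period -->
+      <name>my.namespace.Person</name>
+      <description/>
+      <supertypeName>uima.tcas.Annotation</supertypeName>
+      <features>
+        <featureDescription>
+          <name>aliases</name>
+          <description/>
+          <!-- array-valued feature with an element type -->
+          <rangeTypeName>uima.cas.FSArray</rangeTypeName>
+          <elementType>my.namespace.Alias</elementType>
+          <!-- false: the array is not shared between feature structures -->
+          <multipleReferencesAllowed>false</multipleReferencesAllowed>
+        </featureDescription>
+      </features>
+    </typeDescription>
+    <typeDescription>
+      <!-- subtype of uima.cas.String: allowed values instead of features -->
+      <name>my.namespace.Gender</name>
+      <description/>
+      <supertypeName>uima.cas.String</supertypeName>
+      <allowedValues>
+        <value>
+          <string>female</string>
+          <description/>
+        </value>
+        <value>
+          <string>male</string>
+          <description/>
+        </value>
+      </allowedValues>
+    </typeDescription>
+  </types>
+</typeSystemDescription>
+----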
+
+It is also possible to import type systems for inclusion in your descriptor.
+To do this, use the Type Import panel's __Add...__ button.
+This allows you to import a type system descriptor.
+
+When importing by name, the name is resolved using the class path for the Eclipse project containing the descriptor file being edited, or by looking up this name in the UIMA DataPath.
+The DataPath can be set by pushing the Set DataPath button.
+It will be remembered for this Eclipse project, as a project Property, so you only have to set it once (per project). The value of the DataPath setting is written just like a class path, and can include directories or JAR files, just as is true for class paths.
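+
+For orientation, an import appears in the descriptor XML roughly as sketched below.
+The descriptor names are hypothetical; the by-location form is shown alongside the by-name form only for comparison.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical descriptor names -->
+<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  
+</typeSystemDescription>
+----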
+
+The following dialog allows you to pick one or more files from the Eclipse workspace, or one file (at a time) from the file system: 
+
+.Picking files for importing
+image::images/tools/tools.cde/import-chooser.jpg[Picking files for importing]
+
+This is essentially the same dialog as was used to add component engines to an aggregate.
+To import from a type system descriptor that is not part of your Eclipse workspace, click the __Browse the file system...__ button.
+
+Imported types are validated, and if OK, they are added to the list in the Imported Type Systems section of the Type System page.
+Any types they define are merged with the existing type system.
+
+Imported types and features which are only defined in imports are shown in the Type System section, but in a grayed-out font; these types cannot be edited here.
+To change them, open up the imported type system descriptor, and change them there.
+
+If you hover the mouse over an import specification, it will show more information about the import.
+If you right-click, it will bring up a context menu that allows opening the imported file in the Editor, if the imported file is part of the Eclipse workspace.
+Changes you make, however, won't be seen until you close and reopen the editor on the importing file.
+
+It is not possible to define types for an aggregate analysis engine.
+In this case the type system is computed from the component AEs.
+The Type System information is shown in a grayed-out font.
+
+[[ugr.tools.cde.type_system.exporting]]
+=== Exporting
+
+In addition to importing type specifications, you can export as well.
+When you push the __Export...__ button, the editor will create a new importable XML descriptor for the types in this type system, and change the existing descriptor to import that newly created one. 
+
+
+image::images/tools/tools.cde/image040.jpg[Exporting a type system]
+
+The base file name you type is inserted into the path in the line below automatically.
+You can change the path where the generated part descriptor is stored by overtyping the lower text box.
+When you click OK, the new part descriptor will be generated, and the current descriptor will be changed to import that part.
+
+[[ugr.tools.cde.capabilities]]
+== Capabilities Page
+
+Capabilities come in __sets__.
+You can have multiple sets of capabilities; each one specifies languages supported, plus inputs and outputs of the Analysis Engine.
+The idea behind having multiple sets is that different inputs can result in different outputs.
+Many Analysis Engines, though, will probably define just one set of capabilities.
+A sample Capabilities page is given below: 
+
+
+image::images/tools/tools.cde/image042.jpg[Capabilities page]
+
+When defining the capabilities of a primitive analysis engine, input and output types can be any type defined in the type system.
+When defining the capabilities of an aggregate, the inputs must be a subset of the union of the inputs of the constituent analysis engines, and the outputs must be a subset of the union of the outputs of the constituent analysis engines.
+
+To add a type, first select something in the set you wish to add the type to, and press Add Type.
+The following dialog appears presenting the user with a list of types which are candidates for additional inputs: 
+
+
+image::images/tools/tools.cde/image044.jpg[Adding a type to the capabilities page]
+
+Follow the instructions to mark the types as input and / or output (a type can be both). By default, the <all features> flag is set to true.
+If you want to specify a subset of features of a type, read on.
+
+When types have features, you can specify what features are input and / or output.
+A type doesn't have to be an output to have an output feature.
+For example, an Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to the existing Token types.
+If no new Token instances were created, it would not be an output Type, but it would have features which are output.
+
+To specify features as input and / or output (they can be both), select a type, and press Add.
+The following dialog box appears: 
+
+
+image::images/tools/tools.cde/image046.jpg[Specifying features as input or output]
+
+To mark a feature as being input and / or output, click the mouse in the input and / or output column for the feature.
+If you select <all features>, it unmarks any individual feature you selected, since <all features> subsumes all the features.
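+
+The Token example above could be expressed in the descriptor XML roughly as follows.
+This is a sketch under assumed names: `my.namespace.Token` and its `partOfSpeech` feature are hypothetical.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical type and feature names -->
+<capabilities>
+  <capability>
+    <inputs>
+      <!-- Token is an input, with <all features> selected -->
+      <type allAnnotatorFeatures="true">my.namespace.Token</type>
+    </inputs>
+    <outputs>
+      <!-- no new Token instances are created, so only the feature is an output -->
+      <feature>my.namespace.Token:partOfSpeech</feature>
+    </outputs>
+    <languagesSupported/>
+  </capability>
+</capabilities>
+----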
+
+The Languages part of the capability is where you specify what languages are supported by the Analysis Engine.
+Supported languages should be listed using either a two letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter ISO-3166 country code.
+Add a language by selecting Languages and pressing the Add button.
+The dialog for adding languages is given below. 
+
+
+image::images/tools/tools.cde/image048.jpg[Specifying a language]
+
+The Sofa part of the capability is optional; it allows defining Sofa names that this component uses, and whether they are input (meaning they are created outside of this component, and passed into it), or output (meaning that they are created by this component). Note that a Sofa can be either input or output, but can't be both.
+
+To add a Sofa name (which is synonymous with the view name), press the Add Sofa button, and this dialog appears: 
+
+
+image::images/tools/tools.cde/image050.jpg[Specifying a Sofa name]
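+
+Languages and Sofa names are stored in the same capability set.
+A minimal sketch, assuming a component that reads one Sofa and creates another (both Sofa names are hypothetical):
+
+[source,xml]
+----
+<!-- Sketch with hypothetical Sofa names -->
+<capability>
+  <inputs/>
+  <outputs/>
+  <inputSofas>
+    <!-- created outside this component and passed into it -->
+    <sofaName>MyInputSofa</sofaName>
+  </inputSofas>
+  <outputSofas>
+    <!-- created by this component -->
+    <sofaName>TranslatedText</sofaName>
+  </outputSofas>
+  <languagesSupported>
+    <!-- ISO-639 language code -->
+    <language>en</language>
+    <!-- language code, hyphen, ISO-3166 country code -->
+    <language>pt-BR</language>
+  </languagesSupported>
+</capability>
+----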
+
+
+[[ugr.tools.cde.capabilities.sofa_name_mapping]]
+=== Sofa (and view) name mappings
+
+Sofa names, once created, are used in Sofa Mappings.
+These are optional mappings, done in an aggregate, that specify which Sofas are the same ones but with different names.
+The Sofa Mappings section is minimized unless you are editing an Aggregate descriptor, and have one or more Sofa names defined for the aggregate.
+In that case, the Sofa Mappings section will look like this: 
+
+
+image::images/tools/tools.cde/image052.jpg[Sofa mappings]
+
+Here the aggregate has defined two input Sofas, named "`MyInputSofa`" and "`AnotherSofa`".
+Any named sofas in the aggregate's capabilities will appear in the Sofa Mapping section, listed either under Inputs or Outputs.
+Each name in the Mappings has 0 or more delegate (component) sofa names mapped to it.
+A delegate may have multiple Sofas, as in this example, where the GovernmentOfficialRecognizer delegate has Sofas named "`so1`" and "`so2`".
+
+Delegate components may be written as Single-View components.
+In this case, they have one implicit, default Sofa ("`_InitialView`"), and to map to it you use the form shown for the "`NameRecognizer`": you map to the delegate's key name in the aggregate, without specifying a Sofa name.
+You can also specify the sofa name explicitly, e.g., NameRecognizer/_InitialView.
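+
+In the aggregate descriptor XML, such mappings take roughly the following shape.
+The delegate keys and Sofa names are the ones mentioned above, but which aggregate Sofa each delegate Sofa is mapped to is an assumption made for this sketch.
+
+[source,xml]
+----
+<!-- Sketch: the pairing of delegate and aggregate Sofas is assumed -->
+<sofaMappings>
+  <sofaMapping>
+    <!-- multi-view delegate: name the delegate's Sofa explicitly -->
+    <componentKey>GovernmentOfficialRecognizer</componentKey>
+    <componentSofaName>so1</componentSofaName>
+    <aggregateSofaName>MyInputSofa</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <!-- single-view delegate: omitting componentSofaName maps
+         the delegate's default Sofa (_InitialView) -->
+    <componentKey>NameRecognizer</componentKey>
+    <aggregateSofaName>MyInputSofa</aggregateSofaName>
+  </sofaMapping>
+</sofaMappings>
+----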
+
+To add a new mapping, select the Aggregate Sofa name you wish to add the mapping for, and press the Add button.
+This brings up a window like this, showing all available delegates and their Sofas; select one or more (use the normal multi-select methods) of these and press OK to add them. 
+
+
+image::images/tools/tools.cde/image054.jpg[Adding a Sofa mapping]
+
+To edit an existing mapping, select the mapping and press Edit.
+This will show the existing mapping with all mapped items "`selected`", and other available items unselected.
+Change the items selected to match what you want, deselecting some, and perhaps selecting others, and press OK.
+
+[[ugr.tools.cde.indexes]]
+== Indexes Page
+
+The Indexes page is where the user declares what indexes and type priority lists are used by the analysis engine.
+Indexes are used to determine which Feature Structures of a particular type are fetched, using an iterator in the UIMA API.
+An unpopulated Indexes page is displayed below: 
+
+
+image::images/tools/tools.cde/image056.jpg[Index page]
+
+Both indexes and type priority lists can have imports.
+These imports work just like the type system imports, described above.
+Both indexes and type priority lists can be exported to new component descriptors, using the Export... button, just like the type system export operation described above.
+
+The built-in Annotation Index is always present.
+It is based on the built-in type `uima.tcas.Annotation` and has keys begin (Ascending), end (Descending) and TYPE_PRIORITY.
+There are no built-in type priorities, so this last sort item does not play a role in the index unless type priorities are specified.
+
+Type priority may be combined with other keys.
+Type priorities are defined in the Priority Lists section, using one or more priority lists.
+A given priority list gives an ordering among a group of types.
+Types that appear higher in the priority list are given higher priority; in other words, they sort first when TYPE_PRIORITY is specified as the index key.
+Subtypes of these types are also ordered in a consistent manner, unless overridden by another specific type priority specification.
+To get the ordering used among all the types, all of the type priority lists are merged.
+This gives a partial ordering among the types.
+Ties are resolved in an unspecified fashion.
+The Component Descriptor Editor checks for incompatible orderings, and informs the user if they exist, so they can be corrected.
+
+To create a new index, use the Add Index button in the top left section.
+This brings up this dialog: 
+
+
+image::images/tools/tools.cde/image058.jpg[Adding a new index]
+
+Each index needs a globally unique index name.
+Every index indexes one CAS type (including its subtypes). If you're using Eclipse 3.2 or later, the entry field for this type has content assist (start typing the type name and press Control-Spacebar to get help, or press the Browse button to pick a type).
+
+Indexes can be sorted, in which case you need to specify one or more keys to sort on.
+Sort keys are selected from features whose range type is Integer, Float, or String.
+Some elements will be disabled if they are not relevant.
+For instance, if the index kind is "`bag`", you cannot provide sort keys.
+The order of sort keys can be adjusted using the up and down buttons, if necessary.
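+
+A sorted index declared through this dialog might look like the following in the descriptor XML.
+This is a sketch only: the label and type name are hypothetical, and the keys simply mirror those of the built-in Annotation Index described above.
+
+[source,xml]
+----
+<!-- Sketch with a hypothetical index label and type name -->
+<fsIndexCollection>
+  <fsIndexes>
+    <fsIndexDescription>
+      <label>my.namespace.PersonIndex</label>
+      <typeName>my.namespace.Person</typeName>
+      <!-- one of: sorted, set, bag -->
+      <kind>sorted</kind>
+      <keys>
+        <fsIndexKey>
+          <featureName>begin</featureName>
+          <!-- standard = ascending -->
+          <comparator>standard</comparator>
+        </fsIndexKey>
+        <fsIndexKey>
+          <featureName>end</featureName>
+          <!-- reverse = descending -->
+          <comparator>reverse</comparator>
+        </fsIndexKey>
+        <fsIndexKey>
+          <!-- plays no role unless type priorities are defined -->
+          <typePriority/>
+        </fsIndexKey>
+      </keys>
+    </fsIndexDescription>
+  </fsIndexes>
+</fsIndexCollection>
+----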
+
+[NOTE]
+====
+There is usually no need to explicitly declare a Bag index in your descriptor.
+As of UIMA v2.1, if you do not declare any index for a type (or any of its supertypes), a Bag index will be automatically created.
+This index is accessed using the `getAllIndexedFS(...)` method defined on the index repository.
+====
+
+A set index will contain no duplicates of the same type, where a duplicate is defined by the indexing comparator.
+That is, if you commit two feature structures of the same type that are equal with respect to the indexing comparator, only the first one will be entered into the index.
+Note that you can still have duplicates with respect to the indexing order, if they are of a different type.
+A set index is not guaranteed to be sorted.
+If no keys are specified for a set index, then all instances are considered by default to be equal, so only the first instance (for a particular type or subtype of the type being indexed) is indexed.
+On the other hand, "`bag`" indicates that all annotation instances are indexed, including duplicates.
+
+The Priority Lists section of the Indexes page is used to specify Priority Lists of types.
+Priority Lists are unnamed ordered sets of type names.
+Add a new priority list by clicking the Add Set button.
+Add a type to an existing priority list by first selecting the set, and then clicking Add.
+You can use the up and down buttons to adjust the order as necessary; these buttons move the selected item up or down.
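+
+A priority list is stored in the descriptor XML roughly as sketched here.
+The two type names are hypothetical; the type listed first is the one that sorts first.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical type names -->
+<typePriorities>
+  <priorityList>
+    <!-- higher in the list = higher priority -->
+    <type>my.namespace.Sentence</type>
+    <type>my.namespace.Token</type>
+  </priorityList>
+</typePriorities>
+----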
+
+Although it is possible to import self-contained index and type priority files, the creation of such files is not yet supported by the Component Descriptor Editor.
+If you create these files using another editor, they can be imported using the corresponding Import panels, shown on the right.
+Imports are specified in the same manner as they are for Type System imports.
+
+[[ugr.tools.cde.resources]]
+== Resources Page
+
+The Resources page describes resource dependencies (for primitive Analysis Engines), external resource specifications, and the bindings of those external resources to the resource dependencies.
+
+Only primitive Analysis Engines define resource dependencies.
+Primitive and Aggregate Analysis Engines can define external resources and connect them (bind them) to resource dependencies.
+
+When an Aggregate provides an external resource to be bound to a dependency, the binding is specified using a possibly multi-level path: starting at the Aggregate, you specify which component (by its key name), then, if that component is in turn an Aggregate, which component within it (again by its key name), and so on until you reach a primitive.
+The sequence of key names is made into the binding specification by joining the parts with a "`/`" character.
+All of this is done for you by the Component Descriptor Editor.
+
+Any external resource provided by an Aggregate will override any binding provided by any lower level component for the same resource dependency.
+
+There are two views of the Resources page, depending on whether the Analysis Engine is an Aggregate or Primitive.
+Here's the view for a Primitive: 
+
+
+image::images/tools/tools.cde/image060.jpg[Resources page for a primitive]
+
+To declare a resource dependency, click the Add button in the right hand panel.
+This puts up the dialog: 
+
+
+image::images/tools/tools.cde/image062.jpg[Specifying a resource dependency]
+
+The Key must be unique within the descriptor declaring it.
+The Interface, if present, is the name of a Java interface the Analysis Engine uses to access the resource.
+
+Declare the actual external resources on the left side of the page.
+Clicking __Add__ brings up this dialog: 
+
+.Specifying an External Resource
+image::images/tools/tools.cde/image064.jpg[Specifying an External Resource]
+
+The Name must be unique within this Analysis Engine.
+The URL identifies a file resource.
+If both the URL and URL suffix are used, the file resource is formed by combining the first URL part with the language-identifier, followed by the URL suffix; see xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration[Resource Manager Configuration].
+URLs may be written as __relative__ URLs; in this case they are resolved by looking them up relative to the classpath and/or datapath.
+A relative URL has the path part starting without an initial "`/`"; for example: `file:my/directory/file`.
+An absolute URL starts with `file:/` or `\file:///` or `\file://some.network.address/`. For more information about URLs, please read the Javadoc for the Java class `URL`.
+
+The `Implementation` is optional, and if given, must be a Java class that implements the interface specified in any Resource Dependencies this resource is bound to.
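+
+The two dialogs above correspond to two separate parts of the descriptor XML, sketched below.
+All names here (`DictionaryKey`, `MyDictionaryFile`, the `org.example` interface and class) are hypothetical, and the two fragments live in different places in a real descriptor.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical names -->
+
+<!-- Resource dependency, declared by a primitive Analysis Engine -->
+<externalResourceDependencies>
+  <externalResourceDependency>
+    <!-- must be unique within the declaring descriptor -->
+    <key>DictionaryKey</key>
+    <description/>
+    <!-- optional Java interface used to access the resource -->
+    <interfaceName>org.example.Dictionary</interfaceName>
+  </externalResourceDependency>
+</externalResourceDependencies>
+
+<!-- External resource definition -->
+<resourceManagerConfiguration>
+  <externalResources>
+    <externalResource>
+      <name>MyDictionaryFile</name>
+      <description/>
+      <fileResourceSpecifier>
+        <!-- relative URL: resolved against the classpath and/or datapath -->
+        <fileUrl>file:my/directory/file</fileUrl>
+      </fileResourceSpecifier>
+      <!-- optional; must implement the interface named in the dependency -->
+      <implementationName>org.example.DictionaryImpl</implementationName>
+    </externalResource>
+  </externalResources>
+</resourceManagerConfiguration>
+----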
+
+[[ugr.tools.cde.resources.binding]]
+=== Binding
+
+Once you have an external resource definition, and a Resource Dependency, you can bind them together.
+To do this, you select the two things (an external resource definition, and a Resource Dependency) that you want to bind together, and click Bind.
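+
+A binding is recorded in the resource manager configuration roughly as follows.
+The key and resource name are hypothetical, as is the delegate key `MyAnnotator` in the second binding, which illustrates the "`/`"-joined path used when an Aggregate provides the resource.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical keys and names -->
+<resourceManagerConfiguration>
+  <externalResourceBindings>
+    <!-- in a primitive: key is the resource dependency key -->
+    <externalResourceBinding>
+      <key>DictionaryKey</key>
+      <resourceName>MyDictionaryFile</resourceName>
+    </externalResourceBinding>
+    <!-- in an aggregate: delegate key name(s), then the dependency key -->
+    <externalResourceBinding>
+      <key>MyAnnotator/DictionaryKey</key>
+      <resourceName>MyDictionaryFile</resourceName>
+    </externalResourceBinding>
+  </externalResourceBindings>
+</resourceManagerConfiguration>
+----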
+
+[[ugr.tools.cde.resources.aggregates]]
+=== Resources with Aggregates
+
+When editing an Aggregate Descriptor, the Resource definitions panel will show all the resources at the primitive level, with paths down through the components (multiple levels, if needed) to get to the primitives.
+The Aggregate can define external resources, and bind them to one or more uses by the primitives.
+
+[[ugr.tools.cde.resources.imports_exports]]
+=== Imports and Exports
+
+Resource definitions and their bindings can be imported, just like other imports.
+Existing Resource definitions and their bindings can be exported to a new importable part, and replaced with an import for that importable part, using the "`Export...`" button, just like the similar function on the Type System page.
+
+[[ugr.tools.cde.source]]
+== Source Page
+
+The Source page is a text view of the XML content of the Analysis Engine or Type System being configured.
+An example of this page is displayed below: 
+
+
+image::images/tools/tools.cde/image066.jpg[Source page]
+
+Changes made in the GUI are immediately reflected in the XML source, and changes made in the XML source are immediately reflected back in the GUI.
+The thought here is that the GUI view and the Source view are just two ways of looking at the same data.
+When the data is in an unsaved state the file name is prefaced with an asterisk in the currently selected file tab in the editor pane inside Eclipse (as in the example above).
+
+You may accidentally create invalid descriptors or XML by editing directly in the Source view.
+If you do this, the error will be detected and reported when you try to save or when you switch to a different view.
+In the case of saving, the file will be saved, even if it is in an error state.
+
+[[ugr.tools.cde.source.formatting]]
+=== Source formatting – indentation
+
+The XML is indented using an indentation amount saved as a global UIMA preference.
+To change this preference, use the Eclipse menu item: Window → Preferences → UIMA Preferences.
+
+[[ugr.tools.cde.creating_self_contained_type_system]]
+== Creating a Self-Contained Type System
+
+It is also possible to use the Component Descriptor Editor to create or edit self-contained type systems.
+To create a self-contained type system, select the menu item File → New → Other and then select Type System Descriptor File.
+From the next page of the selection wizard specify a Parent Folder and File name and click Finish. 
+
+
+image::images/tools/tools.cde/image068.jpg[Working with a self-contained type system]
+
+
+
+image::images/tools/tools.cde/image070.jpg[]
+
+This will take you to a version of the Component Descriptor Editor for editing a type system file which contains just three pages: an overview page, a type system page, and a source page.
+The overview page is a bit more spartan than in the case of an AE.
+It looks like the following: 
+
+
+image::images/tools/tools.cde/image072.jpg[Editing a type system object]
+
+Just like an AE has an associated name, version, vendor and description, the same is true of a self-contained type system.
+The Type System page is identical to that in an AE descriptor file, as is the Source page.
+Note that a self-contained type system can import type systems just like the type system associated with an AE.
+
+A type system component can also be created from an existing descriptor which contains a type system definition section, by clicking on the Export... button on the Type System page.
+
+[[ugr.tools.cde.creating_other_descriptor_components]]
+== Creating Other Descriptor Components
+
+The new wizard can create several other kinds of components: Collection Processing Management (CPM) components, flow controllers, and importable parts (besides Type Systems, described above, these are Indexes, Type Priorities, and Resource Manager Configuration imports).
+
+The CPM components supported by this editor include the Collection Reader, CAS Initializer, and CAS Consumer descriptors.
+Each of these is basically treated just like a primitive AE descriptor, with small changes to accommodate the different semantics.
+For instance, a CAS Consumer can't declare in its capabilities section that it outputs types or features.
+
+Flow controllers are components that control the flow of CASes within an aggregate, and are edited in a fashion similar to a primitive Analysis Engine.
+
+The importable part support requires context information to enable the editor to work, because much of the power of this editor comes from extensive checking that requires additional information, other than what is available in just the importable part.
+For instance, when you create or edit an Indexes import, the facility for adding new indexes needs the type information, which is not present in this part when it is edited alone. 
+
+To overcome this, when you edit these descriptors, you will be asked to specify a context descriptor, usually a descriptor which would import the part being edited, which would have the additional information needed. 
+
+Various methods are used to guess what the context descriptor should be, and if the guess is correct, you can just press the Enter key to confirm.
+The last successful context file is remembered and will be suggested as the context file to use at the next edit session.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.cpe.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.cpe.adoc
new file mode 100644
index 0000000..217d784
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.cpe.adoc
@@ -0,0 +1,136 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.cpe]]
+= Collection Processing Engine Configurator User's Guide
+// <titleabbrev>CPE Configurator User's Guide</titleabbrev>
+
+A _Collection Processing Engine (CPE)_ processes collections of artifacts (documents) through the combination of the following components: a Collection Reader, Analysis Engines, and CAS Consumers.footnote:[Earlier versions of UIMA supported another component, the CAS
+Initializer, but this component is now deprecated in UIMA Version 2.]
+
+The _Collection Processing Engine Configurator (CPE Configurator)_ is a graphical tool
+that allows you to assemble and run CPEs.
+
+For an introduction to Collection Processing Engine concepts, including developing the components that make up a CPE, read xref:tug.adoc#ugr.tools.cpe[Collection Processing Engine Developer's Guide].
+This chapter is a user's guide for using the CPE Configurator tool, and does not describe UIMA's Collection Processing Architecture itself.
+
+[[ugr.tools.cpe.limitations]]
+== Limitations of the CPE Configurator
+
+The CPE Configurator only supports basic CPE configurations.
+
+It only supports "`Integrated`" deployments (although it will connect to remotes if particular CAS Processors are specified with remote service descriptors). It doesn't support configuration of the error handling.
+It doesn't support Sofa Mappings; it assumes all Single-View components are operating with the `_InitialView` Sofa.
+Multi-View components will not have their names mapped.
+It sets up a fixed-sized CAS Pool.
+
+To set these additional options, you must edit the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[CPE Descriptor XML] file directly.
+You may then open the CPE Descriptor in the CPE Configurator and run it.
+The changes you applied to the CPE Descriptor __will__ be respected, although you will not be able to see them or edit them from the GUI. 
+
+[[ugr.tools.cpe.starting]]
+== Starting the CPE Configurator
+
+The CPE Configurator tool can be run using the `cpeGui` shell script, which is located in the `bin` directory of the UIMA SDK.
+If you've installed the example xref:oas.adoc#ugr.ovv.eclipse_setup.example_code[Eclipse project], you can also run it using the __UIMA CPE GUI__ run configuration provided in that project.
+
+[NOTE]
+====
+If you are planning to build a CPE using components other than the examples included in the UIMA SDK, you will first need to update your CLASSPATH environment variable to include the classes needed by these components.
+====
+
+When you first start the CPE Configurator, you will see the main window shown here: 
+
+
+image::images/tools/tools.cpe/image002.jpg[CPE Configurator main GUI window]
+
+
+[[ugr.tools.cpe.selecting_component_descriptors]]
+== Selecting Component Descriptors
+
+The CPE Configurator's main window is divided into three sections, one each for the Collection Reader, Analysis Engines, and CAS Consumers.footnote:[There is also a fourth pane, for the CAS Initializer, but it is hidden by default. To enable it click the
+View → CAS Initializer Panel menu item.]
+
+In each section of the CPE Configurator, you can select the component(s) you want to use by browsing to (or typing the location of) their XML descriptors.
+You must select a Collection Reader, and at least one Analysis Engine or CAS Consumer.
+
+When you select a descriptor, the configuration parameters that are defined in that descriptor will then be displayed in the GUI; these can be modified to override the values present in the descriptor.
+
+For example, the screen shot below shows the CPE Configurator after the following components have been chosen: 
+[source]
+----
+examples/descriptors/collectionReader/FileSystemCollectionReader.xml
+examples/descriptors/analysis_engine/NamesAndPersonTitles_TAE.xml
+examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml
+----
+
+
+image::images/tools/tools.cpe/image004.jpg[CPE Configurator after components chosen]
+
+
+[[ugr.tools.cpe.running]]
+== Running a Collection Processing Engine
+
+After selecting each of the components and providing configuration settings, click the play (forward arrow) button at the bottom of the screen to begin processing.
+A progress bar should be displayed in the lower left corner.
+(Note that the progress bar will not begin to move until all components have completed their initialization, which may take several seconds.) Once processing has begun, the pause and stop buttons become enabled.
+
+If an error occurs, you will be informed by an error dialog.
+If processing completes successfully, you will be presented with a performance report.
+
+[[ugr.tools.cpe.file_menu]]
+== The File Menu
+
+The CPE Configurator's File Menu has the following options:
+
+* Open CPE Descriptor
+* Save CPE Descriptor
+* Save Options (submenu)
+* Refresh Descriptors from File System
+* Clear All
+* Exit 
+
+*Open CPE Descriptor* will allow you to select a CPE Descriptor file from disk, and will read in that CPE Descriptor and configure the GUI appropriately.
+
+*Save CPE Descriptor* will create a CPE Descriptor file that defines the CPE you have constructed.
+This CPE Descriptor will identify the components that constitute the CPE, as well as the configuration settings you have specified for each of these components.
+Later, you can use "`Open CPE Descriptor`" to restore the CPE Configurator to this state.
+Also, CPE Descriptors can be used to easily xref:tug.adoc#ugr.tug.application.running_a_cpe_from_a_descriptor[run a CPE from a Java program].
+
+CPE Descriptors also allow specifying operational parameters, such as error handling options, that are not currently available for configuration through the CPE Configurator.
+For more information on manually creating a CPE Descriptor, see xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+
+The *Save Options* submenu has one item, __Use `<import>`__. 
+If this item is checked (the default), saved CPE descriptors will use the `<import>` syntax to refer to their component descriptors.
+If unchecked, the older `<include>` syntax will be used for new components that you add to your CPE using the GUI.
+(However, if you open a CPE descriptor that used `<import>`, these imports will not be replaced.)
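+
+The difference between the two syntaxes shows up in the `<descriptor>` element of each component in the saved CPE descriptor.
+The fragment below is a sketch: the processor names and paths are hypothetical, and other child elements that a complete CPE descriptor contains are left out.
+
+[source,xml]
+----
+<!-- Sketch with hypothetical names and paths; other elements omitted -->
+<casProcessors>
+  <casProcessor deployment="integrated" name="Names and Person Titles">
+    <descriptor>
+      <!-- written when "Use <import>" is checked -->
+      <import location="../analysis_engine/NamesAndPersonTitles_TAE.xml"/>
+    </descriptor>
+  </casProcessor>
+  <casProcessor deployment="integrated" name="XMI Writer">
+    <descriptor>
+      <!-- older syntax, written when "Use <import>" is unchecked -->
+      <include href="descriptors/cas_consumer/XmiWriterCasConsumer.xml"/>
+    </descriptor>
+  </casProcessor>
+</casProcessors>
+----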
+
+*Refresh Descriptors from File System* will reload all descriptors from disk.
+This is useful if you have made a change to the descriptor outside of the CPE Configurator, and want to refresh the display.
+
+*Clear All* will reset the CPE Configurator to its initial state, with no components selected.
+
+*Exit* will close the CPE Configurator.
+If you have unsaved changes, you will be prompted as to whether you would like to save them to a CPE Descriptor file.
+If you do not save them, they will be lost.
+
+When you restart the CPE Configurator, it will automatically reload the last CPE descriptor file that you were working with.
+
+[[ugr.tools.cpe.help_menu]]
+== The Help Menu
+
+The CPE Configurator's Help menu provides __About__ information and some very simple instructions on how to use the tool.
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.cvd.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.cvd.adoc
new file mode 100644
index 0000000..cc643cf
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.cvd.adoc
@@ -0,0 +1,452 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.cvd]]
+= CAS Visual Debugger
+
+[[ugr.tools.cvd.introduction]]
+== Introduction
+
+The CAS Visual Debugger is a tool to run text analysis engines in UIMA and view the results.
+The tool is implemented as a stand-alone GUI tool using Java's Swing library.
+
+This is a developer's tool.
+It is intended to support you in writing text analysis annotators for UIMA (Unstructured Information Management Architecture).  As a development tool, the emphasis is not so much on pretty pictures, but rather on navigability.
+It is intended to show you all the information you need, and show it to you quickly (at least on a fast machine ;-). 
+
+The main purpose of this application is to let you browse all the data that was created when you ran an analysis engine over some text.
+The display mimics the access methods you have in the CAS API in terms of indexes, types, feature structures and feature values. 
+
+As in the CAS, there is special support for annotations.
+Clicking on an annotation will select the corresponding text, and conversely, you can display all annotations that cover a given position in the text.
+This will be explained in more detail in the section on the main display area. 
+
+As usual, the graphics in this manual are for illustrative purposes and may not look 100% like the actual version of CVD you are running.
+This depends on your operating system, your version of Java, and a variety of other factors. 
+
+[[ugr.cvd.introduction.running]]
+=== Running CVD
+
+You will usually want to start CVD from the command line, or from Eclipse.
+To start CVD from the command line, you minimally need the uima-core and uima-tools jars.
+Below is a sample command line for sh and its offspring. 
+
+[source]
+----
+java -cp ${UIMA_HOME}/lib/uima-core.jar:${UIMA_HOME}/lib/uima-tools.jar \
+    org.apache.uima.tools.cvd.CVD
+----
+
+However, there is no need to type this.
+The `${UIMA_HOME}/bin` directory contains a `cvd.sh` and `cvd.bat` file for Unix/Linux/MacOS and Windows, respectively. 
+
+In Eclipse, a ready-to-use launch configuration is available once you have installed the xref:oas.adoc#ugr.ovv.eclipse_setup.example_code[UIMA sample project].
+Below is a screenshot of the Eclipse Run dialog with the CVD run configuration selected.
+
+.Eclipse run dialog with CVD selected
+image::images/tools/tools.cvd/eclipse-cvd-launch.jpg[Eclipse run dialog with CVD selected]
+
+
+[[_cvd.introduction.commandline]]
+=== Command line parameters
+
+You can provide some command line parameters to influence the startup behavior of CVD.
+For example, if you want to run a certain analysis engine on a certain text over and over again (for debugging, say), you can make CVD load the annotator and text at startup and execute the annotator.
+Here's a list of the supported command line options. 
+
+.Command line options
+[cols="1,1", frame="none", options="header"]
+|===
+| Option
+| Description
+
+|``-text <textFile>``
+|Loads the text file `<textFile>`
+
+|``-desc <descriptorFile>``
+|Loads the descriptor `<descriptorFile>`
+
+|``-exec``
+|Runs the pre-loaded annotator; only allowed in conjunction with `-desc`
+
+|``-datapath <datapath>``
+|Sets the data path to `<datapath>`
+
+|``-ini <iniFile>``
+|Makes CVD use alternative ini file `<iniFile>` (default is ~/annotViewer.pref)
+
+|``-lookandfeel <lnfClass>``
+|Uses alternative look-and-feel `<lnfClass>`
+|===
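+
+As a sketch of how these options combine, the sample command line from the section on running CVD can be extended to load a descriptor and a text file and to run the annotator at startup.
+The descriptor and text file paths below are hypothetical placeholders; substitute your own.
+
+[source,shell]
+----
+# Hypothetical paths: desc/MyAnnotator.xml and data/sample.txt stand in for your own files.
+java -cp ${UIMA_HOME}/lib/uima-core.jar:${UIMA_HOME}/lib/uima-tools.jar \
+    org.apache.uima.tools.cvd.CVD \
+    -desc desc/MyAnnotator.xml \
+    -text data/sample.txt \
+    -exec
+----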
+
+[[_cvd.errorhandling]]
+== Error Handling
+
+On encountering an error, CVD will pop up an error dialog with a short, usually incomprehensible message.
+Often, the error message will claim that there is more information available in the log file, and sometimes, this is actually true; so do go and check the log.
+You can view the log file by selecting the appropriate item in the "Tools" menu. 
+
+
+image::images/tools/tools.cvd/ErrorExample.jpg[Sample error dialog]
+
+
+[[_cvd.preferencesfile]]
+== Preferences File
+
+The program will attempt to read on startup and save on exit a file called annotViewer.pref in your home directory.
+This file contains information about choices you made while running the program: directories (such as where your data files are) and window sizes.
+These settings will be used the next time you use the program.
+There is no user control over this process, but the file format is reasonably transparent, in case you feel like changing it.
+Note, however, that the file will be overwritten every time you exit the program. 
+
+If you use CVD for several projects, it may be convenient to use a different ini file for each project.
+You can specify the ini file CVD should use with the 
+
+[source]
+----
+-ini <iniFile>
+---- 
+
+parameter on the command line. 
+
+[[_cvd.themenus]]
+== The Menus
+
+We give a brief description of the various menus.
+All menu items come with mnemonics (e.g., Alt-F X will exit the program). In addition, some menu items have their own keyboard accelerators that you can use anywhere in the program.
+For example, Ctrl-S will save the text you've been editing. 
+
+[[_cvd.filemenu]]
+=== The File Menu
+
+The File menu lets you load, create and save text, load and save color settings, and import and export the XCAS format.
+Here's a screenshot. 
+
+
+image::images/tools/tools.cvd/FileMenu.jpg[The File menu]
+
+Below is a list of the menu items, together with an explanation. 
+
+.New Text...
+Clears the text area.
+Text you type is written to an anonymous buffer.
+You can use "Save Text As..." to save the text you typed to a file.
+Note: whenever you modify the text, be it through typing, loading a file or using the "New Text..." menu item, previous analysis results will be lost.
+Since the previous analysis is specific to the text, modifying the text invalidates the analysis. 
+
+.Open Text File
+Loads a new text file into the viewer.
+The next time you run an analysis engine, it will run on the text you loaded last.
+Depending on the annotator you're using, the program may run slowly with very large text files, so you may want to experiment.
+
+.Save Text File
+Saves the currently open text file.
+If no file is currently loaded (either because you haven't loaded a file, or you've used the "New Text..." menu item), this menu item is disabled (and Ctrl-S will do nothing). 
+
+.Save Text As...
+Save the text to a file of your choosing.
+This can be an existing file, which is then overwritten, or it can be a new file that you're creating. 
+
+.Change Code Page
+Allows you to change the code page that is used to load and save text files.
+If you're sure the text you're loading is in ASCII or one of the 8-bit extensions such as ISO-8859-1 (ISO Latin1), there is probably nothing you need to do.
+Just load the text and look at the display.
+If you see no funny characters or square boxes, chances are your selected code page is compatible with your text file.
+Note that the code page setting is also in effect when you save files.
+You can observe the effects with a hex editor or by just looking at the file size.
+For example, if you save the default text `This is where the text goes.` to a file on Windows using the default code page, the size of the file will be 28 bytes.
+If you now change the code page to UTF-16 and save the file again, the file size will be 58 bytes: two bytes for each character, plus two bytes for the byte-order mark.
+Now switch the code page back to the default Windows code page and reload the UTF-16 file to see the difference in the editor.
+CVD will display all code pages that are available in the JVM you're running it on.
+The first code page in the list is the default code page of your system.
+This is also CVD's default if you don't make a specific choice.
+Your code page selection will be remembered in CVD's ini file. 
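+
+The file sizes mentioned above can also be checked from a shell, independently of CVD.
+This is a minimal sketch, assuming a Unix-like system with GNU `iconv`, whose `UTF-16` target writes a byte-order mark; the variable names are illustrative only.
+
+[source,shell]
+----
+text='This is where the text goes.'
+
+# Size in a single-byte encoding: one byte per character.
+single_byte_size=$(printf '%s' "$text" | wc -c)
+
+# Size in UTF-16: two bytes per character, plus the byte-order mark that GNU iconv prepends.
+utf16_size=$(printf '%s' "$text" | iconv -f US-ASCII -t UTF-16 | wc -c)
+
+echo "single-byte: $single_byte_size bytes, UTF-16: $utf16_size bytes"
+----
+
+Under these assumptions, the two sizes should come out as 28 and 58 bytes, matching the numbers given above.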
+
+.Load Color Settings
+Load previously saved color settings from a file (see Tools/Customize Annotation Display). It is highly recommended that you only load automatically generated files.
+Strange things may happen if you try to load the wrong file format.
+On startup, the program attempts to load the last color settings file that you loaded or saved during a previous session.
+If you intend to use the same color settings as the last time you ran the program, there is therefore no need to manually load a color settings file. 
+
+.Save Color Settings
+Save your customized color settings (see Tools/Customize Annotation Display). The file is a Java properties file, and as such, reasonably transparent.
+What is not transparent is the encoding of the colors (integer encoding of 24-bit RGB values), so changing the file by hand is not really recommended. 
+
+.Read Type System File
+Load a type system file.
+This allows you to load an XCAS file without having to have access to the corresponding annotator. 
+
+.Write Type System File
+Create a type system file from the currently loaded type definitions.
+In addition, you can save the current CAS as an XCAS file (see below). This allows you to later load the type system and XCAS to view the CAS without having to rerun the annotator.
+
+.Read XMI CAS File
+Read an XMI CAS file.
+Important: XMI CAS is a serialization format that serializes a CAS without type system and index information.
+It is therefore impossible to read in a stand-alone XMI CAS file.
+XMI CAS files can only be interpreted in the context of an existing type system.
+Consequently, you need to first load the Analysis Engine that was used to create the XMI file, to be able to load that XMI file. 
+
+.Write XMI CAS File
+Writes the current analysis out as an XMI CAS file. 
+
+.Read XCAS File
+Read an XCAS file.
+Important: XCAS is a serialization format that serializes a CAS without type system and index information.
+It is therefore impossible to read in a stand-alone XCAS file.
+XCAS files can only be interpreted in the context of an existing type system.
+Consequently, you need to load the Analysis Engine that was used to create the XCAS file to be able to load it.
+Loading an XCAS file without loading the Analysis Engine may produce strange errors.
+You may get syntax errors on loading the XCAS file, or worse, everything may appear to go smoothly but in reality your CAS may be corrupted. 
+
+.Write XCAS File
+Writes the current analysis out as an XCAS file. 
+
+.Exit
+Exits the program.
+Your preferences will be saved.
+
+
+[[_cvd.editmenu]]
+=== The Edit Menu
+
+image::images/tools/tools.cvd/EditMenu.jpg[The Edit menu]
+
+The "Edit" menu provides a standard text editing menu with Cut, Copy and Paste, as well as unlimited Undo. 
+
+Note that standard keyboard accelerators Ctrl-X, Ctrl-C, Ctrl-V and Ctrl-Z can be used for Cut, Copy, Paste and Undo, respectively.
+The text area supports other standard keyboard operations, such as navigation with HOME, Ctrl-HOME, etc., as well as marking text with Shift-<ArrowKey>.
+
+[[_cvd.runmenu]]
+=== The Run Menu
+
+image::images/tools/tools.cvd/RunMenu.jpg[The Run menu]
+
+In the Run menu, you can load and run text analysis engines. 
+
+.Load AE
+Loads and initializes a text analysis engine.
+Choosing this menu item will display a file open dialog where you should choose an XML descriptor of a Text Analysis Engine to process the current text.
+Even if the analysis engine runs fast, this will take a while, since there is a lot of setup work to do when a new TAE is created.
+So be patient.
+When you develop a new annotator, you will often need to recompile your code.
+CVD will not reload your annotator code.
+When you recompile your code, you need to terminate the GUI and restart it.
+If you only make changes to the XML descriptor, you don't need to restart the GUI.
+Simply reload the XML file. 
+
+.Run AE
+Before you have (successfully) loaded a TAE, this menu item will be disabled.
+After you have loaded a TAE, it will be enabled, and the name changes according to the name of the TAE you have loaded.
+For example, if you've loaded "The World's Fastest Parser", you will have a menu item called "Run The World's Fastest Parser". When you choose the item, the TAE is run on whatever text you have currently loaded.
+After a TAE has run successfully, the index window in the upper left-hand corner of the screen should be updated and show the indexes that were created by this run.
+We will have more to say about indexes and what to do with them later. 
+
+.Run AE on CAS
+This allows you to run an analysis engine on the current CAS.
+This is useful if you have loaded a CAS from an XCAS file, and would like to run further analysis on it. 
+
+.Run collectionProcessComplete
+When you select this item, the analysis engine's  collectionProcessComplete() method is called. 
+
+.Performance Report
+After you've run your analysis, you can view a performance report.
+It will show you where the time went: which component used how much of the processing time. 
+
+.Recently used
+Collects a list of recently used analysis engines as a short-cut for loading. 
+
+.Language
+Some annotators do language specific processing.
+For example, if you run lexical analysis, the results may be quite different depending on what the analysis engine thinks the language of the document is.
+With this menu item, you can manually set the document language.
+Alternatively, you can use an automatic language identification annotator.
+If the analysis engines you're working with are language agnostic, there is no need to set the language. 
+
+
+[[_cvd.toolsmenu]]
+=== The Tools Menu
+
+The Tools menu contains some assorted utilities, such as the log file viewer.
+Here you can also set the log level for UIMA.
+A more detailed description of some of the menu items follows below. 
+
+[[_cvd.viewtypesystem]]
+==== View Type System
+
+image::images/tools/tools.cvd/TypeSystemViewer.jpg[]
+
+Brings up a new window that displays the type system.
+This menu item is disabled until the first time you have run an analysis engine, since there is no type system to display until then.
+An example is shown above. 
+
+You can view the inheritance tree on the left by expanding and collapsing nodes.
+When you select a type, the features defined on that type are displayed in the table on the right.
+The feature table has three columns.
+The first gives the name of the feature, the second one the type of the feature (i.e., what values it takes), and the third column displays the highest type this feature is defined on.
+In this example, the features "begin" and "end" are inherited from the built-in annotation type. 
+
+In the options menu, you can configure if you want to see inherited features or not (not yet implemented). 
+
+[[_cvd.showselectedannotations]]
+==== Show Selected Annotations
+
+[[_annotationviewerfigure]]
+.Annotations produced by a statistical named entity tagger 
+image::images/tools/tools.cvd/AnnotationViewer.jpg[]
+
+To enable this menu item, you must have run an analysis engine and selected the "AnnotationIndex" or one of its subnodes in the upper left-hand corner of the screen.
+It will bring up a new text window with all selected annotations marked up in the text. 
+
+<<_annotationviewerfigure>> shows the results of applying a statistical named entity tagger to a newspaper article.
+Some annotation colors have been customized: countries are in reverse video, organizations have a turquoise background, person names are green, and occupations have a maroon background.
+The default background color is yellow.
+This color is also used if there is more than one annotation spanning a certain text.
+Clearly, this display is only useful if you don't have any overlapping annotations, or at least not too many. 
+
+This menu item is also available as a context menu in the Index Tree area of the main window.
+To use it, select the annotation index or one of its subnodes, right-click to bring up a popup menu, and select the only item in the popup menu.
+The popup menu is actually a better way to invoke the annotation display, since it changes according to the selection in the Index Tree area, and will tell you if what you've selected can be displayed or not. 
+
+[[_cvd.maindisplayarea]]
+== The Main Display Area
+
+The main display area has three sub-areas.
+In the upper left-hand corner is the **index display**, which shows the indexes that were defined in the  AE, as well as the types of the indexes and their subtypes.
+In the lower left-hand corner, the content of indexes and sub-indexes is displayed  (**FS display**).  Clicking on any node in the index display will  show the corresponding feature structures in the FS display.
+You can explore those structures by expanding the tree nodes.
+When you click on a node that represents an annotation, the corresponding text span is marked in the **text display**.
+
+[[_main1figure]]
+.State of GUI after running an analysis engine
+image::images/tools/tools.cvd/Main1.jpg[]
+
+<<_main1figure>> shows the state after running the UIMA_Analysis_Example.xml aggregate from the uimaj-examples project.
+There are two indexes in the index display, and the annotation index has been selected.
+Note that the number of structures in an index is displayed in square brackets after the index name. 
+
+Since displaying thousands of sister nodes is both confusing and slow, nodes are grouped in powers of 10.
+As soon as there are no more than 100 sister nodes, they are displayed next to each other. 
+
+In our example, a name annotation has been selected, and the corresponding token text is highlighted in the text area.
+We have also expanded the token node to display its structure (not much to see in this simple example). 
+
+In <<_main1figure>>, we selected an annotation in the FS display to find the corresponding text.
+We can also do the reverse and find out what annotations cover a certain point in the text.
+Let's go back to the name recognizer for an example. 
+
+[[_main2figure]]
+.Finding annotations for a specific location in the text 
+image::images/tools/tools.cvd/Main2.jpg[]
+
+We would like to know if Michael Baessler has been recognized as a name.
+So we position the cursor somewhere in the corresponding text span, then right-click to bring up the context menu, which tells us which annotations exist at this point.
+An example is shown in <<_main2figure>>. 
+
+[[_main3figure]]
+.Selecting an annotation from the context menu will highlight that annotation in the FS display
+image::images/tools/tools.cvd/Main3.jpg[]
+
+At this point (<<_main2figure>>),  we only know that somewhere around the text cursor position (not visible in the picture), we discovered a name.
+When we select the corresponding entry in the context menu, the name annotation is selected in the FS display, and its covered text is highlighted. <<_main3figure>> shows the display after  the name node has been selected in the popup menu. 
+
+We're glad to see that, indeed, Michael Baessler is considered to be a name.
+Note that in the FS display, the corresponding annotation node has been selected, and the tree has been expanded to make the node visible. 
+
+Note that the annotations displayed in the popup menu come from the annotations currently displayed in the FS display.
+If you didn't select the annotation index or one of its sub-nodes, no annotations can be displayed and the popup menu will be empty. 
+
+[[_cvd.statusbar]]
+=== The Status Bar
+
+At the bottom of the screen, some useful information is displayed in the **status bar**.
+The left-most area shows the most recent major event, with the time when the event terminated in square brackets.
+The next area shows the file name of the currently loaded XML descriptor.
+This area supports a tool tip that will show the full path to the file.
+The right-most area shows the current cursor position, or the extent of the selection, if a portion of the text has been selected.
+The numbers correspond to the character offsets that are used for annotations. 
+
+[[_cvd.keyboardnavigation]]
+=== Keyboard Navigation and Shortcuts
+
+The GUI can be completely navigated and operated through the keyboard.
+All menus and menu items support keyboard mnemonics, and some common operations are accessible through keyboard accelerators. 
+
+You can move the focus between the three main areas using `Tab` (clockwise) and `Shift-Tab` (counterclockwise). When the focus is on the text area, the `Tab` key will insert the corresponding character into the text, so you will need to use `Ctrl-Tab` and `Ctrl-Shift-Tab` instead.
+Alternatively, you can use the following key bindings to jump directly to one of the areas: `Ctrl-T` to focus the text area, `Ctrl-I` for the index repository frame and `Ctrl-F` for the feature structure area. 
+
+Some additional keyboard shortcuts are available only in the text area, such as `Ctrl-X` for Cut, `Ctrl-C` for Copy, `Ctrl-V` for Paste and `Ctrl-Z` for Undo.
+The context menu in the text area can be invoked through the `Alt-Enter` shortcut.
+Text can be selected using the arrow keys while holding the `Shift` key. 
+
+The following table shows the supported keyboard shortcuts. 
+
+.Keyboard shortcuts
+[cols="1,1,1", frame="none", options="header"]
+|===
+| Shortcut
+| Action
+| Scope
+
+|``Ctrl-O``
+|Open text file
+|Global
+
+|``Ctrl-S``
+|Save text file
+|Global
+
+|``Ctrl-L``
+|Load AE descriptor
+|Global
+
+|``Ctrl-R``
+|Run current AE
+|Global
+
+|``Ctrl-I``
+|Switch focus to index repository
+|Global
+
+|``Ctrl-T``
+|Switch focus to text area
+|Global
+
+|``Ctrl-F``
+|Switch focus to FS area
+|Global
+
+|``Ctrl-X``
+|Cut selection
+|Text
+
+|``Ctrl-C``
+|Copy selection
+|Text
+
+|``Ctrl-V``
+|Paste selection
+|Text
+
+|``Ctrl-Z``
+|Undo
+|Text
+
+|``Alt-Enter``
+|Show context menu
+|Text
+|===
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.doc_analyzer.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.doc_analyzer.adoc
new file mode 100644
index 0000000..523f36b
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.doc_analyzer.adoc
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.doc_analyzer]]
+= Document Analyzer User's Guide
+
+The _Document Analyzer_ is a tool provided by the UIMA SDK for testing annotators and AEs.
+It reads text files from your disk, processes them using an AE, and allows you to view the results.
+The Document Analyzer is designed to work with text files and cannot be used with Analysis Engines that process other types of data.
+
+For an introduction to developing annotators and Analysis Engines, read xref:tug.adoc#ugr.tug.aae[Annotator and Analysis Engine Developer's Guide].
+This chapter is a user's guide for using the Document Analyzer tool, and does not describe the process of developing annotators and Analysis Engines.
+
+[[ugr.tools.doc_analyzer.starting]]
+== Starting the Document Analyzer
+
+To run the Document Analyzer, execute the `documentAnalyzer` script that is in the `bin` directory of your UIMA SDK installation, or, if you are using the example Eclipse project, execute the "`UIMA Document Analyzer`" run configuration supplied with that project.
+
+Note that if you're planning to run an Analysis Engine other than one of the examples included in the UIMA SDK, you'll first need to update your CLASSPATH environment variable to include the classes needed by that Analysis Engine.
+
+When you first run the Document Analyzer, you should see a screen that looks like this: 
+
+
+image::images/tools/tools.doc_analyzer/DocAnalyzerScr1.png[Document Analyzer GUI]
+
+
+[[ugr.tools.doc_analyzer.running_an_ae]]
+== Running an AE
+
+To run an AE, you must first configure the fields on the main screen of the Document Analyzer.
+
+*Input Directory:*   Browse to or type the path of a directory containing text files that you want to analyze.
+Some sample documents are provided in the UIMA SDK under the `examples/data` directory.
+
+*Input File Format:* Set this to "text".  It can, alternatively,  be set to one of the two serialized forms for CASes, if you have previously generated and saved these.
+For the CAS formats only, you can also specify "Lenient deserialization"; if checked, then extra types and features in the CAS being deserialized and loaded (that are not defined by the Annotator-to-be-run's type system) will not cause a deserialization error, but will instead be ignored.
+
+*Character Encoding:*   The character encoding of the input files.
+The default, UTF-8, also works fine for ASCII text files.
+If you have a different encoding, select it here.
+For more information on character sets and their names, see the Javadocs for ``java.nio.charset.Charset``.
+
+*Output Directory:* Browse to or type the path of a directory where you want output to be written.
+(As we'll see later, you won't normally need to look directly at these files, but the Document Analyzer needs to know where to write them.) The files written to this directory will be an XML representation of the analyzed documents.
+If this directory doesn't exist, it will be created.
+If the directory exists, any files in it will be deleted (but the tool will ask you to confirm this before doing so). If you leave this field blank, your AE will be run but no output will be generated.
+
+*Location of AE XML Descriptor:*   Browse to or type the path of the descriptor for the AE that you want to run.
+There are some example descriptors provided in the UIMA SDK under the `examples/descriptors/analysis_engine` and `examples/descriptors/tutorial` directories.
+
+*XML Tag containing Text:*   This is an optional feature.
+If you enter a value here, it specifies the name of an XML tag, expected to be found within the input documents, that contains the text to be analyzed.
+For example, the value `TEXT` would cause the AE to only analyze the portion of the document enclosed within <TEXT>...</TEXT> tags.
+Also, any XML tags occurring within that text will be removed prior to analysis.
+
+*Language:* Specify the language in which the documents are written.
+Some Analysis Engines, but not all, require that this be set correctly in order to do their analysis.
+You can select a value from the drop-down list or type your own.
+The value entered here must be an ISO language identifier, the list of which can be found here: http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt. 
+
+Once you've filled in the appropriate values, press the "`Run`" button.
+
+If an error occurs, a dialog will appear with the error message.
+(A stack trace will also be printed to the console, which may help you if the error was generated by your own annotator code.)  Otherwise, an "`Analysis Results`" window will appear.
+
+[[ugr.tools.doc_analyzer.viewing_results]]
+== Viewing the Analysis Results
+
+After a successful analysis, the "`Analysis Results`" window will appear.
+
+
+image::images/tools/tools.doc_analyzer/image004.jpg[Analysis Results Window]
+
+The "`Results Display Format`" options at the bottom of this window show the different ways you can view your analysis – the Java Viewer, Java Viewer (JV) with User Colors, HTML, and XML.
+The default, Java Viewer, is recommended.
+
+Once you have selected your desired Results Display Format, you can double-click on one of the files in the list to view the analysis done on that file.
+
+For the Java viewer, two view modes are supported, selected by the radio buttons titled "Annotations" and "Features":
+
+In the "Annotations" view, each annotation which is declared to be an output of the pipeline (in the topmost Annotator Descriptor) is given a checkbox and a color in the bottom panel.
+You can control which annotations are shown by using the checkboxes in the bottom panel, the Select All button, or the Deselect All button.
+The results display looks like this (for the AE descriptor ``examples/descriptors/tutorial/ex4/MeetingDetectorTAE.xml``): 
+
+
+image::images/tools/tools.doc_analyzer/image006v2.png[Analysis Results Window showing results from tutorial example 4 in Annotations view mode]
+
+You can click the mouse on one of the highlighted annotations to see a list of all its features in the frame on the right.
+
+In the "Features" view, you can specify a combination of a single type, a single feature of that type, and some feature values for that feature.
+The annotations whose feature values match will be highlighted.
+Step by step, you first select a specific type of annotations by using  a radio button in the first tab of the legend. 
+
+
+image::images/tools/tools.doc_analyzer/image007-1v2.png[Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the DateAnnotation type.]
+
+Selecting this automatically transitions to the second tab, where you then select a specific feature  of the annotation type. 
+
+
+image::images/tools/tools.doc_analyzer/image007-2v2.png[Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the shortDateString feature.]
+
+Selecting this again automatically transitions you to the third tab of the legend, where you select some specific feature values.
+
+
+image::images/tools/tools.doc_analyzer/image007-3v2.png[Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting individual shortDateString feature values.]
+
+In each of the above two view modes, you can click the mouse on one of the highlighted  annotations to see a list of all its features in the frame on the right.
+
+If you are viewing a CAS that contains multiple subjects of analysis, then a selector will appear at the bottom right of the Annotation Viewer window.
+This will allow you to choose the Sofa that you wish to view.
+Note that only text Sofas containing a non-null document are available for viewing.
+
+[[ugr.tools.doc_analyzer.configuring]]
+== Configuring the Annotation Viewer
+
+The "`JV User Colors`" and the HTML viewer allow you to specify exactly which colors are used to display each of your annotation types.
+For the Java Viewer, you can also specify which types should be initially selected, and you can hide types entirely.
+
+To configure the viewer, click the "`Edit Style Map`" button on the "`Analysis Results`" dialog.
+You should see a dialog that looks like this: 
+
+
+image::images/tools/tools.doc_analyzer/image008.jpg[Configuring the Analysis Results Viewer]
+
+To change the color assigned to a type, simply click on the colored cell in the "`Background`" column for the type you wish to edit.
+This will display a dialog that allows you to choose the color.
+For the HTML viewer only, you can also change the foreground color.
+
+If you would like the type to be initially checked (selected) in the legend when the viewer is first launched, check the box in the "`Checked`" column.
+If you would like the type to never be shown in the viewer, click the box in the "`Hidden`" column.
+These settings only affect the Java Viewer, not the HTML view.
+
+When you are done editing, click the "`Save`" button.
+This will save your choices to a file in the same directory as your AE descriptor.
+From now on, when you view analysis results produced by this AE using the "`JV User Colors`" or "`HTML`" options, the viewer will be configured as you have specified.
+
+[[ugr.tools.doc_analyzer.interactive_mode]]
+== Interactive Mode
+
+Interactive Mode allows you to analyze text that you type or cut-and-paste into the tool, rather than requiring that the documents be stored as files.
+
+In the main Document Analyzer window, you can invoke Interactive Mode by clicking the "`Interactive`" button instead of the "`Run`" button.
+This will display a dialog that looks like this: 
+
+
+image::images/tools/tools.doc_analyzer/image010.jpg[Invoking Interactive Mode]
+
+You can type or cut-and-paste your text into this window, then choose your Results Display Format and click the "`Analyze`" button.
+Your AE will be run on the text that you supplied and the results will be displayed as usual.
+
+[[ugr.tools.doc_analyzer.view_mode]]
+== View Mode
+
+If you have previously run an AE and saved its analysis results, you can use the Document Analyzer's View mode to view those results without re-running your analysis.
+To do this, on the main Document Analyzer window simply select the location of your analyzed documents in the "`Output Directory`" dialog and click the "`View`" button.
+You can then view your analysis results as described in Section <<ugr.tools.doc_analyzer.viewing_results>>.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.eclipse_launcher.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.eclipse_launcher.adoc
new file mode 100644
index 0000000..5ad5104
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.eclipse_launcher.adoc
@@ -0,0 +1,56 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.eclipse_launcher]]
+= Eclipse Analysis Engine Launcher's Guide
+// <titleabbrev>Eclipse Analysis Engine Launcher's Guide</titleabbrev>
+
+The Analysis Engine Launcher is an Eclipse plug-in that provides debug and run support for Analysis Engines directly within Eclipse, in the same way that a Java program can be run and debugged.
+It supports most of the descriptor formats except CPE, UIMA AS and some remote deployment descriptors. 
+
+
+image::images/tools/tools.eclipse_launcher/image01.png[]
+
+
+[[ugr.tools.eclipse_launcher.create_configuration]]
+== Creating an Analysis Engine launch configuration
+
+To debug or run an Analysis Engine, a launch configuration must be created.
+To do this, select "Run -> Run Configurations" or "Run -> Debug Configurations" from the menu bar.
+A dialog will open where the launch configuration can be created.
+Select UIMA Analysis Engine and create a new configuration by pressing the New button at the top, or by using the New entry in the context menu.
+The newly created configuration will be automatically selected and the Main tab will be displayed. 
+
+The Main tab defines the Analysis Engine which will be launched.
+First select the project which contains the descriptor, then choose a descriptor and select the input.
+The input can be either a folder which contains input files or just a single input file. If the recursively check box is marked, the input folder will be scanned recursively for input files.
+
+The input format defines the format of the input files. If it is set to CASes, the input resources must be in either the XMI or the XCAS format; if it is set to plain text, plain text input files in the specified encoding are expected.
+The input logic filters out all files which do not have an appropriate file ending. Depending on the chosen format, the file ending must be one of .xcas, .xmi or .txt; all other files are ignored when the input is a folder. If a single file is selected, it will be processed regardless of its file ending.
+
+The output directory is optional. If it is set, all processed input files will be written to the specified directory in the XMI CAS format. If the clear check box is marked, all files inside the output folder will be deleted first; usually this option is not needed because existing files will be overwritten without notice.
+
+The other tabs in the launch configuration are documented in the Eclipse documentation; see the "Java development user guide -> Tasks -> Running and Debugging".
+
+[[ugr.tools.eclipse_launcher.launching]]
+== Launching an Analysis Engine
+
+To launch an Analysis Engine, go to the previously created launch configuration and click on "Debug" or "Run", depending on the desired run mode.
+The Analysis Engine will now be launched.
+The output will be shown in the Console View.
+To debug an Analysis Engine, place breakpoints inside the implementation class.
+If a breakpoint is hit, execution will pause just as it does in a Java program.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.jcasgen.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.jcasgen.adoc
new file mode 100644
index 0000000..8b162c9
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.jcasgen.adoc
@@ -0,0 +1,168 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.jcasgen]]
+= JCasGen User's Guide
+
+JCasGen reads a descriptor for an application (either an Analysis Engine Descriptor,  or a Type System Descriptor), creates the merged type system specification by merging all the type system information from all the components referred to in the descriptor, and then uses this merged type system to create Java source files for classes that enable JCas access to the CAS.
+Java classes are not produced for the built-in types, since these classes are already provided by the UIMA SDK.
+(An exception is the built-in type ``uima.tcas.DocumentAnnotation``, see the warning below.) 
+
+[WARNING]
+====
+If the components comprising the input to the type merging process  have different definitions for the same type name, JCasGen will show a warning, and in some environments may offer to abort the operation.
+If you continue past this warning,  JCasGen will produce correct Java source files representing the merged types  (that is, the type definition containing all of the features defined on that type by all of the components).  It is recommended that you do not use this capability (of having  two different definitions for the same type name, with different feature sets) since it can make it difficult to xref:ref.adoc#ugr.ref.jcas.merging_types_from_other_specs[combine/package] your annotator with others.
+
+Also note that if your type system declares a custom version of the `uima.tcas.DocumentAnnotation`  built-in type, then JCasGen will generate a Java source file for it.
+If you do this, you need to be aware of the issues discussed in the xref:ref.adoc#ugr.ref.jcas.documentannotation_issues[JCas Reference].
+====
+
+JCasGen can be run in many ways.
+For Eclipse users using the Component Descriptor Editor, there's a button on the Type System Description page to run it on that type system.
+There's also a jcasgen-maven-plugin for use in Maven builds.
+There's a menu-driven GUI tool for it.
+And, there are command line scripts you can use to invoke it.
+
+There are several versions of JCasGen.
+The basic version reads an XML descriptor which contains a type system descriptor, and generates the corresponding Java Class Models for those types.
+Variants exist for the Eclipse environment that allow merging the newly generated Java source code with xref:ref.adoc#ugr.ref.jcas.augmenting_generated_code[previously augmented versions].
+
+Input to JCasGen needs to be mostly self-contained.
+In particular, any types that are defined to depend on user-defined supertypes must have that supertype defined, if the supertype is `uima.tcas.Annotation` or a subtype of it.
+Any features referencing ranges which are subtypes of `uima.cas.String` must have those subtypes included.
+If this is not followed, a warning message is given stating that the resulting generation may be inaccurate.
+
+JCasGen is typically invoked automatically when using the xref:tools.adoc#ugr.tools.cde.auto_jcasgen[Component Descriptor Editor], but can also be run using a shell script.
+These scripts can take 0, 1, or 2 arguments.
+The first argument is the location of the file containing the input XML descriptor.
+The second argument specifies where the generated Java source code should go.
+If it isn't given, JCasGen generates its output into a subfolder called JCas (or sometimes JCasNew -- see below), of the first argument's path.
+
+The first argument, the input file, can be written as `jar:<url>!/{entry}`, for example: `jar:http://www.foo.com/bar/baz.jar!/COM/foo/quux.class`
+
+If no arguments are given to JCasGen, then it launches a GUI to interact with the user and ask for the same input.
+The GUI will remember the arguments you previously used.
+Here's what it looks like: 
+
+.JCasGen tool showing fields for input arguments
+image::images/tools/tools.jcasgen/image002.jpg[JCasGen tool showing fields for input arguments]
+
+When running with automatic merging of the generated Java source with previously augmented versions, the output location is where the merge function obtains the source for the merge operation.
+
+As is customary for Java, the generated class source files are placed in the appropriate subdirectory structure according to Java conventions that correspond to the package (name space) name.
+
+The Java classes must be compiled and the resulting class files included in the class path of your application; you can also make these classes available to other annotator writers using your types, perhaps packaged as an xxx.jar file.
+If the xxx.jar file is made to contain only the Java Class Models for the CAS types, it can be reused by any users of these types.
+
+[[ugr.tools.jcasgen.running_without_eclipse]]
+== Running stand-alone without Eclipse
+
+Automatic merging of the generated Java source with previous versions is only available when running with Eclipse.
+When run stand-alone without Eclipse, JCasGen therefore does not merge its output with any previously generated or hand-edited versions.
+In this case, the output is put in a folder called "`JCasNew`" unless overridden by specifying a second argument.
+
+The distribution includes a shell script/bat file to run the stand-alone version, called jcasgen.
+
+[[ugr.tools.jcasgen.running_standalone_with_eclipse]]
+== Running stand-alone with Eclipse
+
+If you have Eclipse (version 3 or later) and EMF (the Eclipse Modeling Framework) installed, both of which are available from http://www.eclipse.org, JCasGen can merge the Java code it generates with previous versions, picking up changes you might have inserted by hand.
+The output (and source of the merge input) is in a folder "`JCas`" under the same path as the input XML file, unless overridden by specifying a second argument.
+
+You must install the UIMA plug-ins into Eclipse to enable this function.
+
+The distribution includes a shell script/bat file, called jcasgen_merge, to run the stand-alone version with Eclipse.
+This works by starting Eclipse in "`headless`" mode (no GUI) and invoking JCasGen within Eclipse.
+You will need to set the ECLIPSE_HOME environment variable or modify the jcasgen_merge shell script to specify where to find Eclipse.
+The version of Eclipse needed is 3 or higher, with the EMF plug-in and the UIMA runtime plug-in installed.
+A temporary workspace is used; the name/location of this is customizable in the shell script.
+
+Log and error messages are written to the UIMA log.
+This file is called uima.log, and is located in the default working directory, which, if not overridden, is the startup directory of Eclipse.
+
+[[ugr.tools.jcasgen.running_within_eclipse]]
+== Running within Eclipse
+
+There are two ways to run JCasGen within Eclipse.
+The first way is to configure an Eclipse external tools launcher, and use it to run the stand-alone shell scripts, with the arguments filled in.
+Here's a picture of a typical launcher configuration screen (you get here by navigating from the top menu: Run –> External Tools –> External tools...). 
+
+
+image::images/tools/tools.jcasgen/image004.jpg[Running JCasGen within Eclipse using the external tool launcher]
+
+The second, and more common, way to run JCasGen within Eclipse is to use the xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor (CDE)]. This tool can be configured to automatically launch JCasGen whenever the type system descriptor is modified.
+In this release, this operation completely regenerates the files, even if just a small thing changed.
+For very large type systems, you probably don't want to enable this all the time.
+The configurator tool has an option to enable/disable this function.
+
+[[ugr.tools.jcasgen.maven_plugin]]
+== Using the jcasgen-maven-plugin
+
+For Maven builds, you can use the jcasgen-maven-plugin to take one or more top level descriptors (Type System or Analysis Engine descriptors), merge them together in the standard way UIMA merges type definitions, and produce the corresponding JCas source classes.
+By default, these classes are generated into the standard Maven location for generated sources.
+
+You can use ant-like include / exclude patterns to specify the top level descriptor files.
+If you set `<limitToProject>` to `true`, then after a complete UIMA type system merge is done with all of the types, including those that are imported, only those types which are defined within this Maven project (that is, in some subdirectory of the project) will be generated.
+
+To use the `jcasgen-maven-plugin`, specify it in the POM as follows:
+
+[source,xml]
+----
+<plugin>
+  <groupId>org.apache.uima</groupId>
+  <artifactId>jcasgen-maven-plugin</artifactId>
+  <version>2.4.1</version>  <!-- change this to the latest version -->
+  <executions>
+    <execution>
+      <goals><goal>generate</goal></goals>  <!-- this is the only goal -->
+      <!-- runs in phase process-resources by default -->
+      <configuration>
+
+        <!-- REQUIRED -->
+        <typeSystemIncludes>
+          <!-- one or more ant-like file patterns 
+               identifying top level descriptors --> 
+          <typeSystemInclude>src/main/resources/MyTs.xml
+          </typeSystemInclude>
+        </typeSystemIncludes>
+
+        <!-- OPTIONAL -->
+        <!-- a sequence of ant-like file patterns 
+             to exclude from the above include list -->
+        <typeSystemExcludes>
+        </typeSystemExcludes>
+
+        <!-- OPTIONAL -->
+        <!-- where the generated files go -->
+        <!-- default value: 
+             ${project.build.directory}/generated-sources/jcasgen -->
+        <outputDirectory> 
+        </outputDirectory>
+
+        <!-- true or false, default = false -->
+        <!-- if true, then although the complete merged type system 
+             will be created internally, only those types whose
+             definition is contained within this maven project will be
+             generated.  The others will be presumed to be 
+             available via other projects. -->
+        <!-- OPTIONAL -->
+        <limitToProject>false</limitToProject>
+      </configuration>     
+    </execution>
+  </executions>
+</plugin>
+----
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.installer.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.installer.adoc
new file mode 100644
index 0000000..88e255e
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.installer.adoc
@@ -0,0 +1,64 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.pear.installer]]
+= PEAR Installer User's Guide
+
+PEAR (Processing Engine ARchive) is a new standard for packaging UIMA compliant components.
+This standard defines several service elements that should be included in the archive package to enable automated installation of the encapsulated UIMA component.
+The major PEAR service element is an XML Installation Descriptor that specifies installation platform, component attributes, custom installation procedures and environment variables. 
+
+The installation of a UIMA compliant component includes 2 steps: (1) installation of the component code and resources in a local file system, and (2) verification of the serviceability of the installed component.
+Installation of the component code and resources involves extracting component files from the archive (PEAR) package in a designated directory and localizing file references in component descriptors and other configuration files.
+Verification of the component serviceability is accomplished with the help of standard UIMA mechanisms for instantiating analysis engines. 
+
+
+image::images/tools/tools.pear.installer/image002.jpg[PEAR Installer GUI]
+
+There are two versions of the PEAR Installer.
+One is an interactive, GUI-based application which puts up a panel asking for the parameters of the installation; the  other is a command line interface version where you pass the parameters needed on the command line itself.
+To launch the GUI version of the PEAR Installer, use the script in the UIMA bin directory: `runPearInstaller.bat` or `runPearInstaller.sh`. The command line version is launched using `runPearInstallerCli.cmd` or `runPearInstallerCli.sh`.
+
+The PEAR Installer installs UIMA compliant components (analysis engines) from PEAR packages in a local file system.
+To install a desired UIMA component the user needs to select the appropriate PEAR file in a local file system and specify the installation directory (optional). If no installation directory is specified, the PEAR file is installed to the current working directory.
+By default the PEAR packages are not installed directly to the specified installation directory.
+For each PEAR, a subdirectory named after the PEAR's ID is created, and the PEAR package is installed into that subdirectory.
+If the PEAR installation directory already exists, the old content is automatically deleted before the new content is installed.
+During the component installation the user can read messages printed by the installation program in the message area of the application window.
+If the installation fails, an appropriate error message is printed to help identify and fix the problem.
+
+After the desired UIMA component is successfully installed, the PEAR Installer allows testing this component in the CAS Visual Debugger (CVD) application, which is provided with the UIMA package.
+The xref:tools.adoc#ugr.tools.cvd[CVD application] will load your UIMA component using its XML descriptor file.
+If the component is loaded successfully, you'll be able to run it either with sample documents provided in the `<UIMA_HOME>/examples/data` directory, or with any other sample documents.
+Running your component in the CVD application helps to make sure the component will run in other UIMA applications.
+If the CVD application fails to load or run your component, or throws an exception, you can find more information about the problem in the uima.log file in the current working directory.
+The log file can be viewed with the CVD.
+
+PEAR Installer creates a file named `setenv.txt` in the `<component_root>/metadata` directory.
+This file contains environment variables required to run your component in any UIMA application.
+It also creates a xref:ref.adoc#ugr.ref.pear.specifier[PEAR descriptor] file named `<componentID>_pear.xml` in the `<component_root>` directory that can be used to directly run the installed pear file in your application. 
+
+The `metadata/setenv.txt` is not read by the UIMA framework anywhere.
+It's there for use by non-UIMA application code if that code wants to set environment variables.
+The `metadata/setenv.txt` is just a "convenience" file duplicating what is in the XML. 
+
+The `setenv.txt` file has two special variables: the `CLASSPATH` and the `PATH`.
+The `CLASSPATH` is computed from any supplied `CLASSPATH` environment variable,  plus the jars that are configured in the PEAR structure, including subcomponents.
+The `PATH` is similarly computed from any supplied `PATH` environment variable, plus the `bin` subdirectory of the PEAR structure, if it exists.
+
+The command line version of the PEAR installer has one required argument: the path to the PEAR file being installed.
+A second argument can specify the installation directory (default is the current working directory). An optional argument, one of `-c` or `-check` or `-verify`, causes verification to be done after installation, as described above.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.merger.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.merger.adoc
new file mode 100644
index 0000000..198c265
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.merger.adoc
@@ -0,0 +1,90 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.pear.merger]]
+= PEAR Merger User's Guide
+
+The PEAR Merger utility takes two or more PEAR files and merges their contents, creating a new PEAR which has, in turn, a new Aggregate analysis engine whose delegates are the components from the original files being merged.
+It does this by (1) copying the contents of the input components into the output component, placing each component into a separate subdirectory, (2) generating a UIMA descriptor for the output Aggregate  analysis engine and (3) creating an output PEAR file that encapsulates the output Aggregate.
+
+The merge logic is quite simple, and is intended to work for simple cases.
+More complex merging needs to be done by hand.
+Please see the Restrictions and Limitations section, below.
+
+To run the PearMerger command line utility you can use the runPearMerger scripts (.bat for Windows, and .sh for Unix). The usage of the tooling is shown below:
+
+
+[source]
+----
+runPearMerger 1st_input_pear_file ... nth_input_pear_file 
+  -n output_analysis_engine_name [-f output_pear_file ]
+----
+
+The first group of parameters are the input PEAR files.
+No duplicates are allowed here.
+The `-n` parameter is the name of the generated Aggregate Analysis Engine.
+The optional `-f` parameter specifies the name of the output file.
+If it is omitted, the output is written to `output_analysis_engine_name.pear` in the current working directory.
+
+During the running of this tool, work files are written to a temporary directory created in the user's home directory.
+
+[[ugr.tools.pear.merger.merge_details]]
+== Details of the merging process
+
+The PEARs are merged using the following steps:
+
+. A temporary working directory is created for the output aggregate component.
+. Each input PEAR file is extracted into a separate 'input_component_name' folder under the working directory.
+. The extracted files are processed to adjust the '$main_root' macros. This operation differs from the PEAR installation operation, because it does not replace the macros with absolute paths.
+. The output PEAR directory structure, 'metadata' and 'desc' folders under the working directory, are created.
+. The UIMA AE descriptor for the output aggregate component is built in the 'desc' folder. This aggregate descriptor refers to the input delegate components, specifying 'fixed flow' based on the original order of the input components in the command line. The aggregate descriptor's 'capabilities' and 'operational properties' sections are built based on the input components' specifications.
+. A new PEAR installation descriptor is created in the 'metadata' folder, referencing the new output aggregate descriptor built in the previous step. 
+. The content of the temporary output working directory is zipped to create the output PEAR, and then the temporary working directory is deleted.
+
+The PEAR merger utility logs all the operations both to standard console output and to a log file, pm.log, which is created in the current working directory.
+
+[[ugr.tools.pear.merger.testing_modifying_resulting_pear]]
+== Testing and Modifying the resulting PEAR
+
+The output PEAR file can be installed and tested using the PEAR Installer.
+The output aggregate component can also be tested by using the CVD or DocAnalyzer tools.
+
+The PEAR Installer creates Eclipse project files (.classpath and .project) in the root directory of the installed PEAR, so the installed component can be imported into the Eclipse IDE as an external project.
+Once the component is in the Eclipse IDE, developers may use the Component Descriptor Editor and the PEAR Packager to modify the output aggregate descriptor and re-package the component.
+
+[[ugr.tools.pear.merger.restrictions_limitations]]
+== Restrictions and Limitations
+
+The PEAR Merger utility only does basic merging operations, and is limited as follows.
+You can overcome these by editing the resulting PEAR file or the resulting Aggregate Descriptor.
+
+. The Merge operation specifies Fixed Flow sequencing for the Aggregate.
+. The merged aggregate does not define any parameters, so the delegate parameters cannot be overridden.
+. No External Resource definitions are generated for the aggregate.
+. No Sofa Mappings are generated for the aggregate.
+. Name collisions are not checked for. Possible name collisions could occur in the fully-qualified class names of the implementing Java classes, the names of JAR files, the names of descriptor files, and the names of resource bindings or resource file paths.
+. The input and output capabilities are generated based on merging the capabilities from the components (removing duplicates). Capability sets are ignored - only the first of the set is used in this process, and only one set is created for the generated Aggregate. There is no support for merging Sofa specifications.
+. No Indexes or Type Priorities are created for the generated Aggregate. No checking is done to see if the Indexes or Type Priorities of the components conflict or are inconsistent.
+. You can only merge Analysis Engines and CAS Consumers. 
+. Although PEAR file installation descriptors that are being merged can have specific XML elements describing Collection Reader and CAS Consumer descriptors, these elements are ignored during the merge, in the sense that the installation descriptor that is created by the merge does not set these elements. The merge process does not use these elements; the output PEAR's new aggregate only references the merged components' main PEAR descriptor element, as identified by the PEAR element: 
++
+[source]
+----
+<SUBMITTED_COMPONENT>
+  <DESC>the_component.xml</DESC>... 
+</SUBMITTED_COMPONENT>
+----
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.adoc
new file mode 100644
index 0000000..ee7d486
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.adoc
@@ -0,0 +1,210 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.pear.packager]]
+= PEAR Packager User's Guide
+
+A xref:ref.adoc#ugr.ref.pear[PEAR (Processing Engine ARchive)] file is a standard package for UIMA (Unstructured Information Management Architecture) components.
+The PEAR package can be used for distribution and reuse by other components or applications.
+It also allows applications and tools to manage UIMA components automatically for verification, deployment, invocation, testing, etc.
+
+This chapter describes how to use the PEAR Eclipse plugin or the PEAR command line packager to create PEAR files for standard UIMA components.
+
+[[ugr.tools.pear.packager.using_eclipse_plugin]]
+== Using the PEAR Eclipse Plugin
+
+The PEAR Eclipse plugin is automatically installed if you followed the directions in the xref:oas.adoc#ugr.ovv.eclipse_setup[Setup Guide].
+Using the plugin involves the following two steps:
+
+* Add the UIMA nature to your project 
+* Create a PEAR file using the PEAR generation wizard 
+
+
+[[ugr.tools.pear.packager.add_uima_nature]]
+=== Add UIMA Nature to your project
+
+First, create a project for your UIMA component:
+
+* Create a Java project, which would contain all the files and folders needed for your UIMA component.
+* Create a source folder called `src` in your project, and make it the only source folder, by clicking on __Properties__ in your project's context menu (right-click), then select __Java Build Path__, then add the `src` folder to the source folders list, and remove any other folder from the list.
+* Specify an output folder for your project called `bin`, by clicking on __Properties__ in your project's context menu (right-click), then select __Java Build Path__, and specify "`__your_project_name__/bin`" as the default output folder.
+
+Then, add the UIMA nature to your project by clicking on __Add UIMA Nature__ in the context menu (right-click) of your project.
+Click __Yes__ on the __Adding UIMA custom Nature__ dialog box.
+Click __OK__ on the confirmation dialog box. 
+
+.Screenshot of Adding the UIMA Nature to your project
+image::images/tools/tools.pear.packager/image002.jpg[Screenshot of Adding the UIMA Nature to your project]
+
+Adding the UIMA nature to your project creates the PEAR structure in your project.
+The PEAR structure is a structured tree of folders and files, including the following elements: 
+
+* *Required Elements:*
+** The `metadata` folder, which contains the PEAR installation descriptor and properties files.
+** The installation descriptor (`metadata/install.xml`)
+* *Optional Elements:*
+** The `desc` folder to contain descriptor files of analysis engines, component analysis engines (all levels), and other components (Collection Readers, CAS Consumers, etc).
+** The `src` folder to contain the source code
+** The `bin` folder to contain executables, scripts, class files, dlls, shared libraries, etc.
+** The `lib` folder to contain jar files. 
+** The `doc` folder containing documentation materials, preferably accessible through an index.html.
+** The `data` folder to contain data files (e.g. for testing).
+** The `conf` folder to contain configuration files.
+** The `resources` folder to contain other resources and dependencies.
+** Other user-defined folders or files are allowed, but __should be avoided__. 
+
+For more information about the PEAR structure, please refer to the xref:ref.adoc#ugr.ref.pear[Processing Engine Archive] section. 
+
+.The PEAR Structure
+image::images/tools/tools.pear.packager/image004.jpg[Pear structure]
+
+
+[[ugr.tools.pear.packager.using_pear_generation_wizard]]
+=== Using the PEAR Generation Wizard
+
+Before using the PEAR Generation Wizard, add all the files needed to run your component including descriptors, jars, external libraries, resources, and component analysis engines (in the case of an aggregate analysis engine), etc. _Do not_ add JARs for the UIMA framework, however.
+Doing so will cause class loading problems at run time.
+
+If you're using a Java IDE like Eclipse, instead of using the output folder (usually `bin`) as the source of your classes, it's recommended that you generate a Jar file containing these classes.
+
+Then, click on "`Generate PEAR file`" from the context menu (right-click) of your project, to open the PEAR Generation wizard, and follow the instructions on the wizard to generate the PEAR file.
+
+[[ugr.tools.pear.packager.wizard.component_information]]
+==== The Component Information page
+
+The first page of the PEAR generation wizard is the component information page.
+On this page, specify a component ID for your PEAR and select the main Analysis Engine descriptor.
+The descriptor must be specified using a pathname relative to the project's root (e.g. "`desc/MyAE.xml`").
+The component ID is a string that uniquely identifies the component.
+It should follow the Java package naming convention (e.g. `org.apache.uima.mycomponent`).
+
+Optionally, you can include specific Collection Iterator, CAS Initializer (deprecated as of Version 2.1), or CAS Consumers.
+In this case, specify the corresponding descriptors in this page. 
+
+.The Component Information Page
+image::images/tools/tools.pear.packager/image006.jpg[Pear Wizard - component information page]
+
+
+[[ugr.tools.pear.packager.wizard.install_environment]]
+==== The Installation Environment page
+
+The installation environment page is used to specify the following: 
+
+* Preferred operating system
+* Required JDK version, if applicable.
+* Required Environment variable settings. This is where you specify special CLASSPATH paths. You do not need to specify this for any Jar that is listed in your Eclipse project classpath settings; those are automatically put into the generated CLASSPATH. Nor should you include paths to the UIMA Framework itself here; doing so may cause class loading problems.
++
+CLASSPATH segments are written here using a semicolon ";" as the separator; during PEAR installation, these will be adjusted to be the correct character for the target Operating System.
++
+In order to specify the UIMA datapath for your component you have to create an environment variable with the property name ``uima.datapath``.
+The value of this property must contain the UIMA datapath settings.
+
+Path names should be specified using macros (see below) instead of hard-coded absolute paths, which might work locally but probably won't if the PEAR is deployed on a different machine or in a different environment.
+
+Macros are variables such as $main_root, used to represent a string such as the full path of a certain directory.
+
+These macros should be defined in the PEAR.properties file using the local values.
+The tools and applications that use and deploy PEAR files should replace these macros (in the files included in the conf and desc folders) with the corresponding values in the local environment as part of the deployment process.
+
+Currently, there are two types of macros:
+
+* $main_root, which represents the local absolute path of the main component root directory after deployment.
+* __$component_id$root__, which represents the local absolute path to the root directory of the component which has _component_id_ as component ID. This component could be, for instance, a delegate component. 
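+
+As an illustration only, a CLASSPATH value entered on this page might combine both kinds of macro. The jar names and the delegate component ID `my.delegate.component` below are hypothetical and are not taken from any real project:
+
+[source]
+----
+$main_root/lib/sample.jar;$my.delegate.component$root/lib/delegate.jar
+----
+
+During installation, each macro would be replaced with the corresponding absolute path in the local environment, and the `;` separators would be adjusted to the character used by the target operating system.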
+
+
+.The Installation Environment Page
+image::images/tools/tools.pear.packager/image008.jpg[Pear Wizard - install environment page]
+
+
+[[ugr.tools.pear.packager.wizard.file_content]]
+==== The PEAR file content page
+
+The last page of the wizard is the "`PEAR file Export`" page, which allows the user to select the files to include in the PEAR file.
+The `metadata` folder and all its content are mandatory.
+Make sure you include all the files needed to run your component including descriptors, jars, external libraries, resources, and component analysis engines (in the case of an aggregate analysis engine), etc.
+It's recommended to generate a jar file from your code as an alternative to building the project and making sure the output folder (bin) contains the required class files.
+
+Eclipse compiles your class files into some output directory, often named "bin" when you take the usual defaults in Eclipse.
+The recommended practice is to take all these files and put them into a Jar file, perhaps using the Eclipse Export wizard.
+You would place that Jar file into the PEAR `lib` directory.
+
+[NOTE]
+====
+If you are relying on the class files generated in the output folder (usually called `bin`) to run your code, make sure the project is built properly and all the required class files are generated without errors.
+Then put the output folder (e.g. `$main_root/bin`) in the classpath by setting the CLASSPATH environment variable to include this folder (see the "`Installation Environment`" page).
+Beware that using a Java output folder named `bin` in this case is poor practice, because the PEAR installation tools will presume this folder contains binary executable files and will add it to the PATH environment variable.
+====
+
+.The PEAR File Export Page
+image::images/tools/tools.pear.packager/image010.jpg[Pear Wizard - File Export Page]
+
+
+[[ugr.tools.pear.packager.using_command_line]]
+== Using the PEAR command line packager
+
+The PEAR command line packager takes some PEAR package parameter settings on the command line to create a UIMA PEAR file.
+
+To run the PEAR command line packager you can use the provided `runPearPackager` scripts (`.bat` for Windows, `.sh` for Unix).
+The packager can be used in three different modes.
+
+
+
+* Mode 1: creates a complete PEAR package with the provided information (default mode)
++
+[source]
+----
+runPearPackager -compID <componentID> 
+  -mainCompDesc <mainComponentDesc> [-classpath <classpath>] 
+  [-datapath <datapath>] -mainCompDir <mainComponentDir> 
+  -targetDir <targetDir> [-envVars <propertiesFilePath>]
+----
++
+The created PEAR file has the file name <componentID>.pear and is located in the <targetDir>.
+* Mode 2: creates a PEAR installation descriptor without packaging the PEAR file
++
+[source]
+----
+runPearPackager -create -compID <componentID> 
+  -mainCompDesc <mainComponentDesc> [-classpath <classpath>]
+  [-datapath <datapath>] -mainCompDir <mainComponentDir> 
+  [-envVars <propertiesFilePath>]
+----
++
+The PEAR installation descriptor is created in the <mainComponentDir>/metadata directory.
+* Mode 3: creates a PEAR package with an existing PEAR installation descriptor
++
+[source]
+----
+runPearPackager -package -compID <componentID> 
+  -mainCompDir <mainComponentDir> -targetDir <targetDir>
+----
++
+The created PEAR file has the file name <componentID>.pear and is located in the <targetDir>.
+
+Modes 2 and 3 should be used when you want to manipulate the PEAR installation descriptor before packaging the PEAR file.
+
+More details about the PearPackager parameters are provided in the list below:
+
+
+
+* ``<componentID>``: PEAR package component ID.
+* ``<mainComponentDesc>``: Main component descriptor of the PEAR package.
+* ``<classpath>``: PEAR classpath settings. Use $main_root macros to specify path entries. Use `;` to separate the entries.
+* ``<datapath>``: PEAR datapath settings. Use $main_root macros to specify path entries. Use `;` to separate the path entries.
+* ``<mainComponentDir>``: Main component directory that contains the PEAR package content.
+* ``<targetDir>``: Target directory where the created PEAR file is written to.
+* ``<propertiesFilePath>``: Path name to a properties file that contains environment variables that must be set to run the PEAR content.
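+
+As a sketch of how these parameters fit together, a Mode 1 invocation on Unix might look like the following. The component ID, descriptor name, jar names, and directories are hypothetical placeholders. The example assumes the script is on the PATH and that the descriptor path is given relative to the main component directory. The single quotes are an assumption about the calling shell, used so that it does not expand `$main_root` or treat `;` as a command separator:
+
+[source]
+----
+runPearPackager.sh -compID org.example.mycomponent \
+  -mainCompDesc desc/MyAE.xml \
+  -classpath '$main_root/lib/mycomponent.jar;$main_root/lib/helper.jar' \
+  -datapath '$main_root/resources' \
+  -mainCompDir /home/user/workspace/mycomponent \
+  -targetDir /home/user/pears
+----
+
+With these values, the packager would be expected to write `org.example.mycomponent.pear` into `/home/user/pears`.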
diff --git a/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.maven.adoc b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.maven.adoc
new file mode 100644
index 0000000..1cbde00
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tools/tools.pear.packager.maven.adoc
@@ -0,0 +1,260 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tools.pear.packager.maven.plugin.usage]]
+= The PEAR Packaging Maven Plugin
+
+UIMA includes a Maven plugin that supports creating PEAR packages using Maven.
+When configured for a project, it assumes that the project has the PEAR layout, and will copy the standard directories that are part of a PEAR structure under the project root into the PEAR, excluding files that start with a period (".").
+It will also put the Jar that is built for the project into the `lib/` directory and include it first on the generated classpath.
+
+The generated classpath includes the artifact's Jar first, any user-specified entries second (in the order they are specified), and finally entries for all Jars found in the `lib/` directory (in some arbitrary order).
+
+[[ugr.tools.pear.packager.maven.plugin.usage.configure]]
+== Specifying the PEAR Packaging Maven Plugin
+
+To use the PEAR Packaging Plugin within a Maven build, the plugin must be added to the plugins section of the Maven POM as shown below:
+
+
+[source]
+----
+<build>
+ <plugins>
+  ...
+  <plugin>
+    <groupId>org.apache.uima</groupId>
+    <artifactId>PearPackagingMavenPlugin</artifactId>
+    
+    <!-- if version is omitted, then --> 
+    <!-- version is inherited from parent's pluginManagement section -->
+    <!-- otherwise, include a version element here --> 
+    
+    <!-- says to load Maven extensions 
+         (such as packaging and type handlers) from this plugin -->
+    <extensions>true</extensions>  
+    <executions>
+      <execution>
+        <phase>package</phase>
+        <!-- where you specify details of the thing being packaged -->
+        <configuration>  
+          
+          <classpath>
+            <!-- PEAR file component classpath settings -->
+            $main_root/lib/sample.jar
+          </classpath>
+          
+          <mainComponentDesc>
+            <!-- PEAR file main component descriptor -->
+            desc/${artifactId}.xml
+          </mainComponentDesc>
+          
+          <componentId>
+            <!-- PEAR file component ID -->
+            ${artifactId}
+          </componentId>
+          
+          <datapath>
+            <!-- PEAR file UIMA datapath settings -->
+            $main_root/resources
+          </datapath>
+          
+        </configuration>
+        <goals>
+          <goal>package</goal>
+        </goals>
+      </execution>
+    </executions>
+  </plugin>
+  ...
+ </plugins>
+</build>
+----
+
+To configure the plugin with the specific settings of a PEAR package, the `<configuration>` element section is used.
+This section contains all parameters that are used by the PEAR Packaging Plugin to package the right content and set the specific PEAR package settings.
+Each parameter and how it is used are described below:
+
+* `<classpath>` - This element specifies the classpath settings for the  PEAR component. The Jar artifact that is built during the current Maven build is  automatically added to the PEAR classpath settings and does not have to be added manually. In addition, all Jars in the lib directory and its subdirectories will be added to the generated classpath when the PEAR is installed. 
++
+[NOTE]
+====
+Use $main_root variables to refer to libraries inside  the PEAR package.
+For more details about PEAR packaging please refer to the  Apache UIMA PEAR documentation.
+====
+
+* `<mainComponentDesc>` - This element specifies the relative path to the main component descriptor  that should be used to run the PEAR content. The path must be relative to the  project root. A good default to use is ``desc/${artifactId}.xml``. 
+* `<componentId>` - This element specifies the PEAR package component ID. A good default to use is ``${artifactId}``.
+* `<datapath>` - This element specifies the PEAR package UIMA datapath settings. If no datapath settings are necessary, this element can be omitted. 
++
+[NOTE]
+====
+Use $main_root variables to refer to libraries inside the PEAR package.
+For more details about PEAR packaging please refer to the  Apache UIMA PEAR documentation.
+====
+
+For most Maven projects it is sufficient to specify the parameters described above.
+In some cases, for  more complex projects, it may be necessary to specify some additional configuration  parameters.
+These parameters are listed below with the default values that are used if they are not  added to the configuration section shown above. 
+
+* `<mainComponentDir>` - This element specifies the main component directory where the UIMA nature is applied. By default this parameter points to the project root directory, `${basedir}`.
+* `<targetDir>` - This element specifies the target directory where the results of the plugin are written. By default this parameter points to the default Maven output directory, `${basedir}/target`.
+
+
+[[ugr.tools.pear.packager.maven.plugin.usage.dependencies]]
+== Automatically including dependencies
+
+A key concept in PEARs is that they allow specifying other Jars in the classpath.
+You can optionally include these Jars within the PEAR package. 
+
+The PEAR Packaging Plugin does not take care of automatically adding these Jars (that the PEAR might depend on) to the PEAR archive.
+However, this behavior can be manually added to your Maven POM.
+The following two build plugins hook into the build cycle and ensure that all runtime dependencies are included in the PEAR file.
+
+With this procedure, the dependencies are automatically included in the PEAR file; the PEAR install process also automatically adds all files in the `lib` directory (and its subdirectories) to the classpath.
+
+The `maven-dependency-plugin` copies the runtime dependencies of the PEAR into the `lib` folder, which is where the PEAR packaging plugin expects them. 
+
+[source]
+----
+<build>
+ <plugins>
+  ...
+  <plugin>
+   <groupId>org.apache.maven.plugins</groupId>
+   <artifactId>maven-dependency-plugin</artifactId>
+   <executions>
+    <!-- Copy the dependencies to the lib folder for the PEAR to copy -->
+    <execution>
+     <id>copy-dependencies</id>
+     <phase>package</phase>
+     <goals>
+      <goal>copy-dependencies</goal>
+     </goals>
+     <configuration>
+      <outputDirectory>${basedir}/lib</outputDirectory>
+      <overWriteSnapshots>true</overWriteSnapshots>
+      <includeScope>runtime</includeScope>
+     </configuration>
+    </execution>
+   </executions>
+  </plugin>
+  ...
+ </plugins>
+</build>
+----
+
+The second Maven plug-in hooks into the `clean` phase of the build life-cycle, and deletes the `lib` folder. 
+
+[NOTE]
+====
+With this approach, the `lib` folder is  automatically created, populated, and removed during the build process.
+Therefore it should not go into the source control system and neither should you manually place any jars in there. 
+====
+
+[source]
+----
+<build>
+ <plugins>
+  ...
+  <plugin>
+   <artifactId>maven-antrun-plugin</artifactId>
+   <executions>
+    <!-- Clean the libraries after packaging -->
+    <execution>
+     <id>CleanLib</id>
+     <phase>clean</phase>
+     <configuration>
+      <tasks>
+       <delete quiet="true" 
+               failOnError="false">
+        <fileset dir="lib" includes="**/*.jar"/>
+       </delete>
+      </tasks>
+     </configuration>
+     <goals>
+      <goal>run</goal>
+     </goals>
+    </execution>                      
+   </executions>
+  </plugin>
+  ...
+ </plugins>
+</build>
+----
+
+[[ugr.tools.pear.packager.maven.plugin.commandline]]
+== Running from the command line
+
+The PEAR packager can be run as a Maven command.
+To enable this, you have to add the following to your Maven settings file:
+
+[source]
+----
+<settings>
+  ...
+  <pluginGroups>
+    <pluginGroup>org.apache.uima</pluginGroup>
+  </pluginGroups>
+  ...
+</settings>
+----
+
+To invoke the PEAR packager using Maven, use the command:
+
+[source]
+----
+mvn uima-pear:package <parameters...>
+----
+
+The settings are the same ones used in the configuration above, specified as -D variables where the variable name is `pear.parameterName`.
+For example:
+
+[source]
+----
+mvn uima-pear:package -Dpear.mainComponentDesc=desc/mydescriptor.xml \
+                      -Dpear.componentId=foo
+----
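+
+Following the same naming rule, the optional parameters could presumably be passed in the same way. This is a sketch that assumes `targetDir` maps to `pear.targetDir` as the rule above suggests; the output directory is a hypothetical placeholder:
+
+[source]
+----
+mvn uima-pear:package -Dpear.mainComponentDesc=desc/mydescriptor.xml \
+                      -Dpear.componentId=foo \
+                      -Dpear.targetDir=/tmp/pears
+----
+
+The trailing backslashes continue the command across lines in Unix shells; on Windows, put the whole command on one line.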
+
+[[ugr.tools.pear.packager.maven.plugin.install.src]]
+== Building the PEAR Packaging Plugin From Source
+
+The plugin code is available in the Apache subversion repository at: http://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin.
+Use the following command line to build it (you will need the Maven build tool, available from Apache): 
+
+
+[source]
+----
+#PearPackagingMavenPlugin> mvn install
+----
+
+This Maven command will build the tool and install it in your local Maven repository, making it available for use by other Maven POMs.
+The plugin version number is displayed at the end of the Maven build as shown in the example below.
+For this example, the plugin version number is `2.3.0-incubating`.
+
+[source]
+----
+[INFO] Installing 
+/code/apache/PearPackagingMavenPlugin/target/
+PearPackagingMavenPlugin-2.3.0-incubating.jar 
+to 
+/maven-repository/repository/org/apache/uima/PearPackagingMavenPlugin/
+2.3.0-incubating/
+PearPackagingMavenPlugin-2.3.0-incubating.jar
+[INFO] [plugin:updateRegistry]
+[INFO] --------------------------------------------------------------
+[INFO] BUILD SUCCESSFUL
+[INFO] --------------------------------------------------------------
+[INFO] Total time: 6 seconds
+[INFO] Finished at: Tue Nov 13 15:07:11 CET 2007
+[INFO] Final Memory: 10M/24M
+[INFO] --------------------------------------------------------------
+----
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug.adoc b/uimaj-documentation/src/docs/asciidoc/tug.adoc
new file mode 100644
index 0000000..ef6dc3a
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug.adoc
@@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+= Apache UIMA™ - Tutorials and User's Guides
+:Author: Apache UIMA™ Development Community
+:toc-title: UIMA Tutorials and User's Guides
+
+include::tug/common_book_info.adoc[leveloffset=+1]
+
+include::tug/annotator_analysis_engine_guide.adoc[leveloffset=+1]
+
+include::tug/tug.cpe.adoc[leveloffset=+1]
+
+include::tug/tug.application.adoc[leveloffset=+1]
+
+include::tug/tug.fc.adoc[leveloffset=+1]
+
+include::tug/tug.aas.adoc[leveloffset=+1]
+
+include::tug/tug.multi_views.adoc[leveloffset=+1]
+
+include::tug/tug.cas_multiplier.adoc[leveloffset=+1]
+
+include::tug/tug.xmi_emf.adoc[leveloffset=+1]
+
+include::tug/tug.type_mapping.adoc[leveloffset=+1]
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/annotator_analysis_engine_guide.adoc b/uimaj-documentation/src/docs/asciidoc/tug/annotator_analysis_engine_guide.adoc
new file mode 100644
index 0000000..54b6712
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/annotator_analysis_engine_guide.adoc
@@ -0,0 +1,1810 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.aae]]
+= Annotator and Analysis Engine Developer's Guide
+// <titleabbrev>Annotator &amp; AE Developer's Guide</titleabbrev>
+
+This chapter describes how to develop UIMA __type systems__, _Annotators_ and _Analysis Engines_ using the UIMA SDK.
+It is helpful to read the UIMA Conceptual Overview chapter for a review on these concepts.
+
+An _Analysis Engine (AE)_ is a program that analyzes artifacts (e.g.
+documents) and infers information from them.
+
+Analysis Engines are constructed from building blocks called __Annotators__.
+An annotator is a component that contains analysis logic.
+Annotators analyze an artifact (for example, a text document) and create additional data (metadata) about that artifact.
+It is a goal of UIMA that annotators need not be concerned with anything other than their analysis logic – for example the details of their deployment or their interaction with other annotators.
+
+An Analysis Engine (AE) may contain a single annotator (this is referred to as a __Primitive AE__), or it may be a composition of others and therefore contain multiple annotators (this is referred to as an __Aggregate AE__).
+Primitive and aggregate AEs implement the same interface and can be used interchangeably by applications.
+
+Annotators produce their analysis results in the form of typed __Feature Structures__, which are simply data structures that have a type and a set of (attribute, value) pairs.
+An _annotation_ is a particular type of Feature Structure that is attached to a region of the artifact being analyzed (a span of text in a document, for example).
+
+For example, an annotator may produce an Annotation over the span of text ``President Bush``, where the type of the Annotation is `Person` and the attribute `fullName` has the value ``George W. Bush``, and its position in the artifact is character position 12 through character position 26.
+
+It is also possible for annotators to record information associated with the entire document rather than a particular span (these are considered Feature Structures but not Annotations).
+
+All feature structures, including annotations, are represented in the UIMA __Common Analysis Structure (CAS)__.
+The CAS is the central data structure through which all UIMA components communicate.
+Included with the UIMA SDK is an easy-to-use, native Java interface to the CAS called the __JCas__.
+The JCas represents each feature structure as a Java object; the example feature structure described above would be an instance of a Java class `Person` with `getFullName()` and `setFullName()` methods.
+
+The CAS interface for accessing feature structures uses UIMA Type and Feature object instances, which are computed at run time, depending on the type system being used.
+This interface supports writing general annotators which can work for all type systems.
+It is used, for example, internally, in the CasCopier implementation, to copy the content of one CAS to another. 
+
+The JCas interface can take advantage of knowing ahead of time the particular Types and Features a pipeline is using.
+Each JCas class corresponds to a particular UIMA type and includes special setters and getters whose names match the type's features.
+
+The remainder of this chapter will refer to the analysis of text documents and the creation of annotations that are attached to spans of text in those documents.
+Keep in mind that the CAS can represent arbitrary types of feature structures, and feature structures can refer to other feature structures.
+For example, you can use the CAS to represent a parse tree for a document.
+Also, the artifact that you are analyzing need not be a text document.
+
+This guide is organized as follows:
+
+* _<<ugr.tug.aae.getting_started>>_ is a tutorial with step-by-step instructions for how to develop and test a simple UIMA annotator.
+* _<<ugr.tug.aae.configuration_logging>>_ discusses how to make your UIMA annotator configurable, and how it can write messages to the UIMA log file.
+* _<<ugr.tug.aae.building_aggregates>>_ describes how annotators can be combined into aggregate analysis engines. It also describes how one annotator can make use of the analysis results produced by an annotator that has run previously.
+* _<<ugr.tug.aae.other_examples>>_ describes several other examples you may find interesting, including
+** `SimpleTokenAndSentenceAnnotator`: a simple tokenizer and sentence annotator.
+** `PersonTitleDBWriterCasConsumer`: a sample CAS Consumer which populates a relational database with some annotations. It uses JDBC and in this example, hooks up with the Open Source Apache Derby database. 
+* _<<ugr.tug.aae.additional_topics>>_ describes additional features of the UIMA SDK that may help you in building your own annotators and analysis engines.
+* _<<ugr.tug.aae.common_pitfalls>>_ contains some useful guidelines to help you ensure that your annotators will work correctly in any UIMA application.
+
+This guide does not discuss how to build xref:tug.adoc#ugr.tug.application[UIMA Applications], which are programs that use Analysis Engines, along with other components, e.g. a search engine, document store, and user interface, to deliver a complete package of functionality to an end-user.
+
+[[ugr.tug.aae.getting_started]]
+== Getting Started
+
+This section is a step-by-step tutorial that will get you started developing UIMA annotators.
+All of the files referred to by the examples in this chapter are in the `examples` directory of the UIMA SDK.
+This directory is designed to be xref:oas.adoc#ugr.ovv.eclipse_setup.example_code[imported] into your Eclipse workspace.
+You may also wish to xref:oas.adoc#ugr.ovv.eclipse_setup.linking_uima_javadocs[attach the UIMA Javadocs to the JAR files] or refer to the UIMA SDK Javadocs located in the link:api/index.html[docs/api/index.html] directory.
+
+[NOTE]
+====
+If you hover over a UIMA class or method defined in the UIMA SDK Javadocs, the Javadocs appear after a short delay. 
+====
+
+[NOTE]
+====
+If you downloaded the source distribution for UIMA, you can xref:ref.adoc#ugr.ref.javadocs[attach that as well to the library JAR files].
+====
+
+The example annotator that we are going to walk through will detect room numbers for rooms where the room numbering scheme follows some simple conventions.
+In our example, there are two kinds of patterns we want to find; here are some examples, together with their corresponding regular expression patterns: 
+
+Yorktown patterns:::
+20-001, 31-206, 04-123 (Regular Expression Pattern: `\\##-[0-2]##`)
+
+Hawthorne patterns:::
+GN-K35, 1S-L07, 4N-B21 (Regular Expression Pattern: `[G1-4][NS]-[A-Z]##`)
+
+There are several steps to develop and test a simple UIMA annotator.
+
+. Define the CAS types that the annotator will use.
+. Generate the Java classes for these types.
+. Write the actual annotator Java code.
+. Create the Analysis Engine descriptor.
+. Test the annotator. 
+
+These steps are discussed in the next sections.
+
+[[ugr.tug.aae.defining_types]]
+=== Defining Types
+
+The first step in developing an annotator is to define the CAS Feature Structure types that it creates.
+This is done in an XML file called a __Type System Descriptor__.
+UIMA defines basic primitive types such as Boolean, Byte, Short, Integer, Long, Float, and Double, as well as Arrays of these primitive types.
+UIMA also defines the built-in types ``TOP``, which is the root  of the type system, analogous to Object in Java; ``FSArray``, which is  an array of Feature Structures (i.e.
+an array of instances of TOP); and ``Annotation``, which we will discuss in more detail in this section.
+
+UIMA includes an xref:oas.adoc#ugr.ovv.eclipse_setup[Eclipse plug-in] that will help you edit Type System Descriptors, so if you are using Eclipse you will not need to worry about the details of the XML syntax.
+
+The Type System Descriptor for our annotator is located in the file `descriptors/tutorial/ex1/TutorialTypeSystem.xml`.
+This and all other examples are located in the `examples` directory of the installation of the UIMA SDK, which can be xref:oas.adoc#ugr.ovv.eclipse_setup.example_code[imported] into an Eclipse project for your convenience.
+
+In Eclipse, expand the `uimaj-examples` project in the Package Explorer view, and browse to the file `descriptors/tutorial/ex1/TutorialTypeSystem.xml`.
+Right-click on the file in the navigator and select __Open With → Component Descriptor Editor__.
+Once the editor opens, click on the __Type System__ tab at the bottom of the editor window.
+You should see a view such as the following:
+
+.Screenshot of editor for Type System Definitions
+image::images/tutorials_and_users_guides/tug.aae/image002.jpg[]
+
+Our annotator will need only one type -- `org.apache.uima.tutorial.RoomNumber`.
+(We use the same namespace conventions as are used for Java classes.) Just as in Java, types have supertypes.
+The supertype is listed in the second column of the left table.
+In this case our RoomNumber annotation extends from the built-in type `uima.tcas.Annotation`.
+
+Descriptions can be included with types and features.
+In this example, there is a description associated with the `building` feature.
+To see it, hover the mouse over the feature.
+
+The bottom tab labeled __Source__ will show you the XML source file associated with this descriptor.
+
+The built-in Annotation type declares three fields (called __Features__ in CAS terminology).  The features `begin` and `end` store the character offsets of the span of text to which the  annotation refers.
+The feature `sofa` (Subject of Analysis) indicates which document the begin and end offsets point into.
+The `sofa` feature can be ignored for now since we assume in this tutorial that the CAS contains only one subject of analysis (document).
+
+Our RoomNumber type will inherit these three features from `uima.tcas.Annotation`, its supertype; they are not visible in this view because inherited features are not shown.
+One additional feature, ``building``, is declared.
+It takes a String as its value.
+Instead of String, we could have declared the range-type of our feature to be any other CAS type (defined or built-in).
+
+If you are not using Eclipse and need to edit the type system, you can do so directly with any XML or text editor.
+The following is the actual XML representation of the Type System displayed above in the editor:
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?>
+  <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
+    <name>TutorialTypeSystem</name>
+    <description>Type System Definition for the tutorial examples - 
+        as of Exercise 1</description>
+    <vendor>Apache Software Foundation</vendor>
+    <version>1.0</version>
+    <types>
+      <typeDescription>
+        <name>org.apache.uima.tutorial.RoomNumber</name>
+        <description></description>
+        <supertypeName>uima.tcas.Annotation</supertypeName>
+        <features>
+          <featureDescription>
+            <name>building</name>
+            <description>Building containing this room</description>
+            <rangeTypeName>uima.cas.String</rangeTypeName>
+          </featureDescription>
+        </features>
+      </typeDescription>
+    </types>
+  </typeSystemDescription>
+----
+
+[[ugr.tug.aae.generating_jcas_sources]]
+=== Generating Java Source Files for CAS Types
+
+When you save a descriptor that you have modified, the Component Descriptor Editor will automatically generate Java classes corresponding to the types that are defined in that descriptor (unless this has been disabled), using a utility called JCasGen.
+These Java classes will have the same name (including package) as the CAS types, and will have get and set methods for each of the features that you have defined.
+
+This feature is enabled or disabled using the UIMA pulldown menu (or Eclipse Preferences → UIMA).
+If JCasGen is not running automatically, please make sure the option is checked:
+
+.Screenshot of enabling automatic running of JCasGen
+image::images/tutorials_and_users_guides/tug.aae/image004.jpg[]
+
+The Java class for the example `org.apache.uima.tutorial.RoomNumber` type can be found in `src/org/apache/uima/tutorial/RoomNumber.java`.
+You will see how to use these generated classes in the next section.
+
+If you are not using the Component Descriptor Editor, you will need to generate these Java classes by using the _JCasGen_ tool.
+JCasGen reads a Type System Descriptor XML file and generates the corresponding Java classes that you can then use in your annotator code.
+To launch JCasGen, run the jcasgen shell script located in the `/bin` directory of the UIMA SDK installation.
+This should launch a GUI that looks something like this:
+
+.Screenshot of JCasGen
+image::images/tutorials_and_users_guides/tug.aae/image006.jpg[]
+
+Use the "`Browse`" buttons to select your input file (`TutorialTypeSystem.xml`) and output directory (the root of the source tree into which you want the generated files placed). Then click the __Go__ button.
+If the Type System Descriptor has no errors, new Java source files will be generated under the specified output directory.
+
+There are some xref:tools.adoc#ugr.tools.jcasgen[additional options] to choose from when running JCasGen.
+
+[[ugr.tug.aae.developing_annotator_code]]
+=== Developing Your Annotator Code
+
+Annotator implementations all implement a standard interface (AnalysisComponent), having several methods, the most important of which are: 
+
+* `initialize`, 
+* `process`, and 
+* `destroy`. 
+
+`initialize` is called by the framework once when it first creates an instance of the annotator class. `process` is called once per item being processed. `destroy` may be called by the application when it is done using your annotator.
+There is a  default implementation of this interface for annotators using the JCas, called `JCasAnnotator_ImplBase`, which  has implementations of all required methods except for the process method.
+
+Our annotator class extends the xref:tug.adoc#ugr.tug.aas[JCasAnnotator_ImplBase]; most annotators that use the JCas will extend from this class, so they only have to implement the process method. This class is not restricted to handling just text.
+
+Annotators are not required to extend from the JCasAnnotator_ImplBase class; they may instead directly implement the AnalysisComponent interface, and provide all method implementations themselves.
+footnote:[Note that AnalysisComponent is not specific to `JCas`. There is a method `getRequiredCasInterface()` which the user would have to implement to return `JCas.class`. Then in the `process(AbstractCas cas)` method, they would need to typecast `CAS` to type `JCas`.] This allows you to have your annotator inherit from some other superclass if necessary.
+If you would like to do this, see the Javadocs for `AnalysisComponent` for descriptions of the methods you must implement.
+
+Annotator classes need to be public, cannot be declared abstract, and must have public, no-args constructors, so that they can be instantiated by the framework.footnote:[Although Java classes in which you do not define any constructor will, by default, have a no-args constructor that doesn't do anything, a class in which you have defined at least one constructor does not get a default no-args constructor.]
+
+The class definition for our RoomNumberAnnotator implements the process method, and is shown here.
+You can find the source for this in `uimaj-examples/src/org/apache/uima/tutorial/ex1/RoomNumberAnnotator.java`.
+
+[NOTE]
+====
+In Eclipse, in the "`Package Explorer`" view, this will appear by default in the project ``uimaj-examples``, in the folder ``src``, in the package ``org.apache.uima.tutorial.ex1``.
+==== 
+
+In Eclipse, open the RoomNumberAnnotator.java in the uimaj-examples project, under the src directory.
+
+[source]
+----
+package org.apache.uima.tutorial.ex1;
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.tutorial.RoomNumber;
+
+/**
+ * Example annotator that detects room numbers using 
+ * Java 1.4 regular expressions.
+ */
+public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {
+  private Pattern mYorktownPattern = 
+        Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
+
+  private Pattern mHawthornePattern = 
+        Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");
+
+  public void process(JCas aJCas) {
+    // Discussed Later
+  }
+}
+----
+
+The two Java class fields, mYorktownPattern and mHawthornePattern, hold regular expressions that will be used in the process method.
+Note that these two fields are part of the Java implementation of the annotator code, and not a part of the CAS type system.
+We are using the regular expression facility that has been built into Java since version 1.4.
+It is not critical that you know the details of how this works, but if you are curious, the details can be found in the Java API docs for the `java.util.regex` package.
+
+The only method that we are required to implement is ``process``.
+This method is typically called once for each document that is being analyzed.
+This method takes one argument, which is a JCas instance; this holds the document to be analyzed and all of the analysis results.footnote:[Version 1 of UIMA specified an additional parameter, the ResultSpecification. This provides a specification of which types and features are desired to be computed and "output" from this annotator. Its use is optional; many annotators ignore it.]
+
+[source]
+----
+public void process(JCas aJCas) {
+  // get document text
+  String docText = aJCas.getDocumentText();
+  // search for Yorktown room numbers
+  Matcher m = mYorktownPattern.matcher(docText);
+  int pos = 0;
+  while (m.find(pos)) {
+    // found one - create annotation, with the begin/end positions
+    RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());
+    annotation.setBuilding("Yorktown");
+    annotation.addToIndexes();
+    pos = m.end();
+  }
+  
+  // search for Hawthorne room numbers
+  m = mHawthornePattern.matcher(docText);
+  pos = 0;
+  while (m.find(pos)) {
+    // found one - create annotation, with the begin/end positions
+    RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());
+    annotation.setBuilding("Hawthorne");
+    annotation.addToIndexes();
+    pos = m.end();
+  }
+}
+----
+
+The Matcher class is part of the java.util.regex package and is used to find the room numbers in the document text.
+When we find one, recording the annotation is as simple as creating a new Java object and calling some set methods:
+
+[source]
+----
+RoomNumber annotation = new RoomNumber(aJCas, m.start(), m.end());
+annotation.setBuilding("Yorktown");
+----
+
+The `RoomNumber` class was generated from the type system description by the Component Descriptor Editor or the JCasGen tool, as discussed in the previous section.
+
+Finally, we call `annotation.addToIndexes()` to add the new annotation to the indexes maintained in the CAS.
+By default, the CAS implementation used for analysis of text documents keeps an index of all annotations in their order from beginning to end of the document.
+Subsequent annotators or applications use the indexes to iterate over the annotations. 
+
+[NOTE]
+====
+If you don't add the instance to the indexes, it cannot be retrieved by down-stream annotators, using the indexes. 
+====
+
+[NOTE]
+====
+You can also call `addToIndexes()` on Feature Structures that are not subtypes of ``uima.tcas.Annotation``, but these will not be sorted in any particular way.
+If you want to specify a sort order, you can define your own xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.index[custom indexes] in the xref:ref.adoc#ugr.ref.cas[CAS].
+====
+
+We're almost ready to test the RoomNumberAnnotator.
+There is just one more step remaining.
+
+[[ugr.tug.aae.creating_xml_descriptor]]
+=== Creating the XML Descriptor
+
+The UIMA architecture requires that descriptive information about an annotator be represented in an XML file and provided along with the annotator class file(s) to the UIMA framework at run time.
+This XML file is called an __Analysis Engine Descriptor__.
+The descriptor includes: 
+
+* Name, description, version, and vendor
+* The annotator's inputs and outputs, defined in terms of the types in a Type System Descriptor
+* Declaration of the configuration parameters that the annotator accepts 
+
+The _Component Descriptor Editor_ plugin, which we previously used to edit the Type System descriptor, can also be used to edit Analysis Engine Descriptors.
+
+A descriptor for our RoomNumberAnnotator is provided with the UIMA distribution under the name `descriptors/tutorial/ex1/RoomNumberAnnotator.xml`.
+To edit it in Eclipse, right-click on that file in the navigator and select __Open With → Component Descriptor Editor__.
+
+[TIP]
+====
+In Eclipse, you can double click on the tab at the top of the Component Descriptor Editor's window identifying the currently selected editor, and the window will "`Maximize`".
+Double click it again to restore the original size.
+====
+
+If you are not using Eclipse, you will need to edit Analysis Engine descriptors manually.
+See <<ugr.tug.aae.xml_intro_ae_descriptor>> for an introduction to the Analysis Engine descriptor XML syntax.
+The remainder of this section assumes you are using the Component Descriptor Editor plug-in to edit the Analysis Engine descriptor.
+
+The xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor] consists of several tabbed pages; we will only need to use a few of them here.
+
+The initial page of the Component Descriptor Editor is the Overview page, which appears as follows:
+
+.Screenshot of Component Descriptor Editor overview page
+image::images/tutorials_and_users_guides/tug.aae/image008.jpg[]
+
+This presents an overview of the RoomNumberAnnotator Analysis Engine (AE). The left side of the page shows that this descriptor is for a _Primitive_ AE (meaning it consists of a single annotator), and that the annotator code is developed in Java.
+Also, it specifies the Java class that implements our logic (the code which was discussed in the previous section). Finally, on the right side of the page are listed some descriptive attributes of our annotator.
+
+The other two pages that need to be filled out are the Type System page and the Capabilities page.
+You can switch to these pages using the tabs at the bottom of the Component Descriptor Editor.
+In the tutorial, these are already filled out for you.
+
+The RoomNumberAnnotator will be using the TutorialTypeSystem we looked at in Section <<ugr.tug.aae.defining_types>>.
+To specify this, we add this type system to the Analysis Engine's list of Imported Type Systems, using the Type System page's right side panel, as shown here:
+
+.Screenshot of CDE Type System page
+image::images/tutorials_and_users_guides/tug.aae/image010.jpg[]
+
+On the Capabilities page, we define our annotator's inputs and outputs, in terms of the types in the type system.
+The Capabilities page is shown below:
+
+.Screenshot of CDE Capabilities page
+image::images/tutorials_and_users_guides/tug.aae/image012.jpg[]
+
+Although capabilities come in sets, having multiple sets is deprecated; here we're just using one set.
+The RoomNumberAnnotator is very simple.
+It requires no input types, as it operates directly on the document text -- which is supplied as a part of the CAS initialization (and which is always assumed to be present). It produces only one output type (RoomNumber), and it sets the value of the `building` feature on that type.
+This is all represented on the Capabilities page.
+
+The Capabilities page has two other parts for specifying languages and Sofas.
+The languages section allows you to specify which languages your Analysis Engine supports.
+The RoomNumberAnnotator happens to be language-independent, so we can leave this blank.
+The Sofas section allows you to specify the names of additional subjects of analysis.
+This capability and the Sofa Mappings at the bottom are xref:tug.adoc#ugr.tug.aas[advanced topics]. 
+
+This is all of the information we need to provide for a simple annotator.
+If you want to peek at the XML that this tool saves you from having to write, click on the "`Source`" tab at the bottom to view the generated XML.
+
+[[ugr.tug.aae.testing_your_annotator]]
+=== Testing Your Annotator
+
+Having developed an annotator, we need a way to try it out on some example documents.
+The UIMA SDK includes a tool called the Document Analyzer that will allow us to do this.
+To run the Document Analyzer, execute the documentAnalyzer shell script that is in the `bin` directory of your UIMA SDK installation, or, if you are using the example Eclipse project, execute the "`UIMA Document Analyzer`" run configuration supplied with that project.
+(To do this, select Run → Run... from the menu bar and, under Java Applications in the left box, click on UIMA Document Analyzer.)
+
+You should see a screen that looks like this:
+
+.Screenshot of UIMA Document Analyzer GUI
+image::images/tutorials_and_users_guides/tug.aae/image014.jpg[]
+
+There are six options on this screen:
+
+. Directory containing documents to analyze
+. Directory where analysis results will be written
+. The XML descriptor for the Analysis Engine (AE) you want to run
+. (Optional) an XML tag, within the input documents, that contains the text to be analyzed. For example, the value `TEXT` would cause the AE to analyze only the portion of the document enclosed within `<TEXT>...</TEXT>` tags.
+. Language of the document 
+. Character encoding 
+
+Use the Browse button next to the third item to set the "`Location of AE XML Descriptor`" field to the descriptor we've just been discussing: ``<where-you-installed-uima-e.g.UIMA_HOME>/examples/descriptors/tutorial/ex1/RoomNumberAnnotator.xml``.
+Set the other fields to the values shown in the screenshot above (which should be the default values if this is the first time you've run the Document Analyzer).
+Then click the "`Run`" button to start processing.
+
+When processing completes, an "`Analysis Results`" window should appear.
+
+.Screenshot of UIMA Document Analyzer Results GUI
+image::images/tutorials_and_users_guides/tug.aae/image016.jpg[]
+
+Make sure "`Java Viewer`" is selected as the Results Display Format, and *double-click* on the document UIMASummerSchool2003.txt to view the annotations that were discovered.
+The view should look something like this:
+
+.Screenshot of UIMA CAS Annotation Viewer GUI
+image::images/tutorials_and_users_guides/tug.aae/image018.jpg[]
+
+You can click the mouse on one of the highlighted annotations to see a list of all its features in the frame on the right.
+
+[NOTE]
+====
+The legend will only show those types which have at least one instance in the CAS, and are declared as outputs in the capabilities section of the descriptor (see <<ugr.tug.aae.creating_xml_descriptor>>).
+====
+
+You can use the Document Analyzer to test any UIMA annotator; just make sure that the annotator's classes are in the class path.
+
+[[ugr.tug.aae.configuration_logging]]
+== Configuration and Logging
+
+[[ugr.tug.aae.configuration_parameters]]
+=== Configuration Parameters
+
+The example RoomNumberAnnotator from the previous section used hardcoded regular expressions and location names, which is not very flexible.
+Supporting a new room-numbering pattern would mean changing the annotator's Java code.
+A better solution is to supply the patterns and locations through configuration parameters.
+
+UIMA allows annotators to declare configuration parameters in their descriptors.
+The descriptor also specifies default values for the parameters, though these can be overridden at runtime.
+
+[[ugr.tug.aae.declaring_parameters_in_the_descriptor]]
+==== Declaring Parameters in the Descriptor
+
+The example descriptor `descriptors/tutorial/ex2/RoomNumberAnnotator.xml` is the same as the descriptor from the previous section except that information has been filled in for the Parameters and Parameter Settings pages of the Component Descriptor Editor.
+
+First, in Eclipse, open example two's RoomNumberAnnotator in the Component Descriptor Editor, and then go to the Parameters page (click on the parameters tab at the bottom of the window), which is shown below:
+
+.Screenshot of UIMA Component Descriptor Editor (CDE) Parameters page
+image::images/tutorials_and_users_guides/tug.aae/image020.jpg[]
+
+Two parameters, Patterns and Locations, have been declared.
+In this screen shot, the mouse (not shown) is hovering over Patterns to show its description in the small popup window.
+Every parameter has the following information associated with it:
+
+* name: the name by which the annotator code refers to the parameter
+* description: a natural language description of the intent of the parameter
+* type: the data type of the parameter's value; must be one of String, Integer, Float, or Boolean
+* multiValued: true if the parameter can take multiple values (an array), false if the parameter takes only a single value. Shown above as ``Multi``.
+* mandatory: true if a value must be provided for the parameter. Shown above as `Req` (for required).
+
+Both of our parameters are mandatory and accept an array of Strings as their value.
+
+Next, default values are assigned to the parameters on the Parameter Settings page:
+
+.Screenshot of UIMA Component Descriptor Editor (CDE) Parameter Settings page
+image::images/tutorials_and_users_guides/tug.aae/image022.jpg[]
+
+Here the "`Patterns`" parameter is selected, and the right pane shows the list of values for this parameter, in this case the regular expressions that match particular room numbering conventions.
+Notice the third pattern is new, for matching the style of room numbers in the third building, which has room numbers such as ``J2-A11``.
+
+[[ugr.tug.aae.accessing_parameter_values_from_annotator]]
+==== Accessing Parameter Values from the Annotator Code
+
+The class `org.apache.uima.tutorial.ex2.RoomNumberAnnotator` has overridden the initialize method.
+The initialize method is called by the UIMA framework when the annotator is instantiated, so it is a good place to read configuration parameter values.
+The default initialize method does nothing with configuration parameters, so you have to override it.
+To see the code in Eclipse, switch to the src folder, and open ``org.apache.uima.tutorial.ex2``.
+Here is the method body:
+
+[source]
+----
+/**
+* @see AnalysisComponent#initialize(UimaContext)
+*/
+public void initialize(UimaContext aContext) 
+        throws ResourceInitializationException {
+  super.initialize(aContext);
+  
+  // Get config. parameter values  
+  String[] patternStrings = 
+        (String[]) aContext.getConfigParameterValue("Patterns");
+  mLocations = 
+        (String[]) aContext.getConfigParameterValue("Locations");
+
+  // compile regular expressions
+  mPatterns = new Pattern[patternStrings.length];
+  for (int i = 0; i < patternStrings.length; i++) {
+    mPatterns[i] = Pattern.compile(patternStrings[i]);
+  }
+}
+----
+
+Configuration parameter values are accessed through the UimaContext.
+As you will see in subsequent sections of this chapter, the UimaContext is the annotator's access point for all of the facilities provided by the UIMA framework, for example logging and external resource access.
+
+The UimaContext's `getConfigParameterValue` method takes the name of the parameter as an argument; this must match one of the parameters declared in the descriptor.
+The return value of this method is a Java Object, whose type corresponds to the declared type of the parameter.
+It is up to the annotator to cast it to the appropriate type, String[] in this case.
+
+If there is a problem retrieving the parameter values, the framework throws an exception.
+Generally annotators don't handle these, and just let them propagate up.
+
+To see the configuration parameters working, run the Document Analyzer application and select the descriptor `examples/descriptors/tutorial/ex2/RoomNumberAnnotator.xml`.
+In the example document ``WatsonConferenceRooms.txt``, you should see some examples of Hawthorne II room numbers that would not have been detected by the ex1 version of RoomNumberAnnotator.
+
+[[ugr.tug.aae.supporting_reconfiguration]]
+==== Supporting Reconfiguration
+
+If you take a look at the Javadocs (located in the link:api/index.html[docs/api] directory) for `org.apache.uima.analysis_component.AnalysisComponent` (which our annotator implements indirectly through JCasAnnotator_ImplBase), you will see that there is a reconfigure() method, which is called by the containing application through the UIMA framework if the configuration parameter values are changed.
+
+The AnalysisComponent_ImplBase class provides a default implementation that just calls the annotator's destroy method followed by its initialize method.
+This works fine for our annotator.
+The only situation in which you might want to override the default reconfigure() is if your annotator has very expensive initialization logic, and you don't want to reinitialize everything if just one configuration parameter has changed.
+In that case, you can provide a more intelligent implementation of reconfigure() for your annotator.
+
+[[ugr.tug.aae.configuration_parameter_groups]]
+==== Configuration Parameter Groups
+
+For annotators with many sets of configuration parameters, UIMA supports organizing them into groups.
+It is possible to define a parameter with the same name in multiple groups; one common use for this is for annotators that can process documents in several languages and which want to have different parameter settings for the different languages.
+
+The syntax for defining parameter groups in your descriptor is fairly straightforward; see xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference] for details.
+Values of parameters defined within groups are accessed through the two-argument version of ``UimaContext.getConfigParameterValue``, which takes both the group name and the parameter name as its arguments.
+
+[[ugr.tug.aae.configuration_parameter_overrides]]
+==== Overriding Configuration Parameter Settings
+
+There are two ways that the value assigned to a configuration parameter can be overridden.
+An aggregate may declare a parameter that overrides one or more of the parameters in one or more of its delegates.
+The aggregate must also define a value for the parameter, unless the parameter is itself overridden by a setting in the parent aggregate.
+
+An alternative method that avoids these strict hierarchical override constraints is to associate an external global name with a parameter and to assign values to these external names in an xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides[external properties file].
+With this approach a particular parameter setting can be easily shared by multiple descriptors, even across different applications.
+For applications with many levels of descriptor nesting it avoids the need to edit aggregate override definitions when the location of an annotator in the hierarchy is changed.
+
+
+[[ugr.tug.aae.logging]]
+=== Logging
+
+The UIMA SDK provides a logging facility, which is very similar to the `java.util.logging.Logger` class.
+In addition, it includes the link:https://www.slf4j.org/[SLF4J framework] and all the methods in that framework's `Logger` API, plus the Java 8 specific API extensions that take `Supplier` parameters.
+
+Each logger instance is associated with a name.
+By convention, this name is usually a hierarchy of simple names connected with periods,  often the fully qualified class name of the component issuing the logging call.
+The name (or any of its parents, that is, any prefix ending at a period) can be referenced in a configuration file, which can then set things such as the logging level and the message destination for each logger.
+
+The UIMA framework supports this convention using the `UimaContext` object.
+If you access a logger instance using `getContext().getLogger()` or the shorter, but equivalent `getLogger()` within an Annotator, the logger name will be the fully qualified name of the Annotator implementation class.
+
+Here is an example from the process method of ``org.apache.uima.tutorial.ex2.RoomNumberAnnotator``: 
+
+[source]
+----
+getLogger().trace("Found: {}", () -> annotation.toString());
+----
+
+The `trace` call  indicates that this is a tracing message.
+This is useful for tracing program flow, but it is a low level which is not usually enabled. 
+
+The first parameter is the message, with substitutable parts.
+The convention for where those parts go is written as either {} or {n}, where "n" is an integer, specifying the argument number.
+The modern logging APIs use the {} style, with API calls such as ``logger.**level**( msg-using-{}-convention, substitutable-arguments)``, while the older `java.util.logging` framework uses ``logger.log(**level**, msg-using-{n} convention, substitutable-arguments)``.
+
+UIMA supports both styles.
+For new code, it is recommended to use the first style, together with a Java 8 lambda for the arguments, which ensures that the work of turning the `annotation` argument into a printable string happens only if tracing is enabled.
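+
+The effect of passing a lambda can be sketched with the JDK alone.
+The example below does not use the UIMA logger; it relies on `java.util.logging.Logger.finer(Supplier<String>)`, which follows the same idea of deferring message construction, and the class name `LazyLoggingDemo` and logger name `demo.lazy` are invented for the illustration.
+
+[source,java]
+----
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+// Hypothetical stand-alone demo; java.util.logging stands in for the UIMA logger.
+public class LazyLoggingDemo {
+  // Logs one FINER message through a Supplier and reports how often the supplier ran.
+  public static int countSupplierCalls(Level loggerLevel) {
+    Logger logger = Logger.getLogger("demo.lazy");
+    logger.setLevel(loggerLevel);
+    int[] calls = {0};
+    logger.finer(() -> {
+      calls[0]++; // stands in for an expensive toString() call
+      return "Found: some annotation";
+    });
+    return calls[0];
+  }
+
+  public static void main(String[] args) {
+    // With the logger at INFO the supplier is never invoked.
+    System.out.println(countSupplierCalls(Level.INFO));
+    // With the logger at FINER the message is built exactly once.
+    System.out.println(countSupplierCalls(Level.FINER));
+  }
+}
+----
+
+With the logger at INFO, the FINER message is filtered out before the supplier is called, so the cost of building the string is never paid.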
+
+Log statements are "filtered" according to the logging configuration, by Level, and sometimes by additional indicators, such as Markers.
+Levels work in a hierarchy.
+A given level of  filtering passes that level and all higher levels.
+Some levels have two names, due to the  way the different logger back ends name things.
+Most levels are also used as method names on  the logger, to indicate logging for that level.
+For example, you could say `aLogger.log(Level.INFO, message)`, but you can also say ``aLogger.info(message)``.
+The level ordering, highest to lowest, and the associated method names are as follows:
+
+* SEVERE or ERROR; error(...)
+* WARN or WARNING; warn(...)
+* INFO; info(...)
+* CONFIG; info(UIMA_MARKER_CONFIG, ...)
+* FINE or DEBUG; debug(...)
+* FINER or TRACE; trace(...)
+* FINEST; trace(UIMA_MARKER_FINEST, ...)
+
+The CONFIG and FINEST levels are merged with other levels, but are distinguished by having ``Markers``.
+If the filtering is configured to pass the CONFIG level, then it will also pass the INFO, WARN and ERROR levels (or WARNING and SEVERE, under their alternative names).
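+
+The "this level and all higher levels" rule can be checked with the built-in Java logger, as in the sketch below.
+It only covers the `java.util.logging` level names; the Marker-based distinction for CONFIG and FINEST on other back ends is not modelled, and the class name `LevelFilterDemo` and logger name `demo.levels` are assumptions made for the illustration.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+// Hypothetical demo class using only java.util.logging.
+public class LevelFilterDemo {
+  // Returns the names of the levels that a logger set to 'threshold' lets through.
+  public static List<String> passedLevels(Level threshold) {
+    Logger logger = Logger.getLogger("demo.levels");
+    logger.setLevel(threshold);
+    Level[] all = {Level.SEVERE, Level.WARNING, Level.INFO, Level.CONFIG,
+                   Level.FINE, Level.FINER, Level.FINEST};
+    List<String> passed = new ArrayList<>();
+    for (Level level : all) {
+      if (logger.isLoggable(level)) {
+        passed.add(level.getName());
+      }
+    }
+    return passed;
+  }
+
+  public static void main(String[] args) {
+    // Prints [SEVERE, WARNING, INFO, CONFIG]
+    System.out.println(passedLevels(Level.CONFIG));
+  }
+}
+----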
+
+Each logging backend has its own documentation for how  to configure loggers at run time, via configuration files or APIs in some cases.
+Some backends even allow dynamic reconfiguration while running, just by updating the configuration file (it is re-loaded every so often, if changed). 
+
+For the built-in Java logging back end, if no logging configuration file is provided (see next section), the Java Virtual Machine defaults are used; these typically log messages at level INFO and higher, and direct output to the console.
+
+The UIMA logger is by default implemented using an SLF4J implementation; this (in turn) connects to a logging back end, determined via a search of the classpath for a connector.
+If none can be found, then a message to that effect will be printed to System.err, and no logging will be done.
+The binary distribution for UIMA includes, in its `lib` directory, the Jar which connects SLF4J to the Java built-in logger as its back end, so if you use the standard launchers, you will get this logging back end.
+
+Assuming you are using the Java-built-in-logger as the back-end,  if you specify the configuration using the standard UIMA SDK `Logger.properties` (found in ``UIMA_HOME/config/``), the output will be directed to a file named uima.log, in the current working directory (often the "`project`" directory when running from Eclipse, for instance).
+
+[NOTE]
+====
+When using Eclipse, the uima.log file, if written into the Eclipse workspace (in the project uimaj-examples, for example), may not appear in the Eclipse package explorer view until you right-click the uimaj-examples project and select "`Refresh`".
+This operation refreshes the Eclipse display to conform to what may have changed on the file system.
+Also, you can set the Eclipse preferences for the workspace to refresh automatically (Window → Preferences → General → Workspace, then click the "`refresh automatically`" checkbox).
+====
+
+The next several sections mainly describe how to configure the built-in Java logger.
+See the documentation for other logging back ends for  details on how to configure those.
+
+[[ugr.tug.aae.logging.configuring]]
+==== Specifying the Logging Configuration when using Java's built-in logger
+
+The standard Java built-in logging initialization mechanism looks for a Java System Property named `java.util.logging.config.file` and, if found, uses the value of this property as the name of a standard "`properties`" file that configures logging, including the logging level.
+Please refer to the Java 1.4 documentation for more information on the format and use of this file.
+
+Two sample logging specification property files can be found in the UIMA_HOME directory where the UIMA SDK is installed: ``config/Logger.properties``, and ``config/FileConsoleLogger.properties``.
+These specify the same logging, except the first logs just to a file, while the second logs both to a file and to the console.
+You can edit these files, or create additional ones, as described below, to change the logging behavior.
+
+When running your own Java application, you can specify the location of this logging configuration file on your Java command line by setting the Java system property `java.util.logging.config.file` to be the logging configuration filename.
+This file specification can be either absolute or relative to the working directory.
+For example: 
+
+[source]
+----
+java "-Djava.util.logging.config.file=C:/Program Files/apache-uima/config/Logger.properties"
+----
+
+[NOTE]
+====
+In a shell script, you can use environment variables such as UIMA_HOME if convenient.
+====
+
+If you are using Eclipse to launch your application, you can set this property in the VM arguments section of the Arguments tab of the run configuration screen.
+If you've set an environment variable UIMA_HOME, you could, for example, use the string: `"-Djava.util.logging.config.file=${env_var:UIMA_HOME}/config/Logger.properties"`.
+
+If you are running the .bat or .sh files in the UIMA SDK's `bin` directory, you can specify the location of your logger configuration file by setting the `UIMA_LOGGER_CONFIG_FILE` environment variable prior to running the script, for example (on Windows): 
+
+[source]
+----
+set UIMA_LOGGER_CONFIG_FILE=C:/myapp/MyLogger.properties
+----
+
+[[ugr.tug.aae.logging.setting_logging_levels]]
+==== Setting Logging Levels when using Java's built-in logger
+
+Within the logging control file, the default global logging level specifies which kinds of events are logged across all loggers.
+For any given facility, this global level can be overridden by a facility-specific level.
+Multiple handlers are supported.
+This allows messages to be directed to a log file, as well as to a "`console`".
+Note that the ConsoleHandler also has a separate level setting to limit messages printed to the console.
+For example, the global level is set with: `$$.$$level = INFO`
+
+The properties file can change where the log is written, as well.
+
+Facility-specific properties allow a different logging level for each logger, which is typically named after a class or package.
+For example, to set the com.xyz.foo logger to log only SEVERE messages: `com.xyz.foo.level = SEVERE`
+
+If you have a sample annotator in the package `org.apache.uima.SampleAnnotator` you can set the log level by specifying: `org.apache.uima.SampleAnnotator.level = ALL`
+
+There are other logging controls; for a full discussion, please read the contents of the `Logger.properties` file and the Java specification for logging in Java 1.4.
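+
+As a sketch of how these settings fit together, a minimal logging control file might look like the following.
+The handler classes and property keys are the standard `java.util.logging` ones; the logger names are just the illustrative names used above, and the output file name is an assumption.
+
+[source,properties]
+----
+# Send records to both a file and the console
+handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler
+
+# Default global level for all loggers
+.level = INFO
+
+# The file handler writes to uima.log in the working directory (assumed name)
+java.util.logging.FileHandler.pattern = uima.log
+java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
+
+# The console handler applies its own, stricter limit
+java.util.logging.ConsoleHandler.level = WARNING
+
+# Facility-specific overrides of the global level
+com.xyz.foo.level = SEVERE
+org.apache.uima.SampleAnnotator.level = ALL
+----
+
+With this file, the SampleAnnotator logger accepts every record, but the console still shows only WARNING and SEVERE messages because of the handler's own level.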
+
+[[ugr.tug.aae.logging.output_format]]
+==== Configuring the format of logging output when using Java's built-in logger
+
+The logging output is formatted by handlers specified in the properties file for configuring logging, described above.
+The default formatter that comes with the UIMA SDK formats logging output as follows:
+
+`Timestamp - threadID: sourceInfo: Message level: message`
+
+Here's an example:
+
+`7/12/04 2:15:35 PM - 10: org.apache.uima.util.TestClass.main(62): INFO: You are not logged in!`
+
+[[ugr.tug.aae.logging.meaning_of_severity_levels]]
+==== Meaning of the logging severity levels used by the UIMA logger
+
+These levels are defined by the Java logging framework, which was incorporated into Java as of the 1.4 release level.
+The levels are defined in the Javadocs for java.util.logging.Level, and include both logging and tracing levels: 
+
+* OFF is a special level that can be used to turn off logging.
+* ALL indicates that all messages should be logged. 
+* CONFIG is a message level for configuration messages. These would typically occur once (during configuration) in methods like ``initialize()``. 
+* INFO is a message level for informational messages, for example, connected to server IP: 192.168.120.12 
+* WARNING is a message level indicating a potential problem.
+* SEVERE is a message level indicating a serious failure.
+
+Tracing levels, typically used for debugging: 
+
+* FINE is a message level providing tracing information, typically at a collection level (messages occurring once per collection). 
+* FINER indicates a fairly detailed tracing message, typically at a document level (once per document).
+* FINEST indicates a highly detailed tracing message. 
+
+
+[[ugr.tug.aae.logging.using_outside_of_an_annotator]]
+==== Using loggers outside of an annotator
+
+An application using UIMA may want to log its messages using the same logging framework.
+This can be done by getting a reference to the UIMA logger, as follows: 
+
+[source]
+----
+Logger logger = UIMAFramework.getLogger(TestClass.class);
+----
+
+You can also simply get a direct reference to an SLF4J logger using the standard approach: 
+
+[source]
+----
+org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestClass.class);
+----
+
+The class argument specifies the name of the logger, using the fully qualified class name.
+For UIMA loggers, if not specified, the name of the returned logger instance is "`org.apache.uima`".
+
+[[ugr.tug.aae.logging.change_logger_implementation]]
+==== Changing the underlying UIMA logging implementation
+
+By default the UIMA framework uses, under the hood of the UIMA Logger interface, the  SLF4J logging framework to do logging.
+This allows UIMA, when running embedded inside other frameworks, to defer the choice of back-end logging frameworks to those applications. 
+
+For backwards compatibility with Version 2, the older method (prior to SLF4J) for switching the logger implementation remains.
+You do this by specifying the system property 
+
+[source]
+----
+-Dorg.apache.uima.logger.class=<loggerClass>
+----
+when the UIMA framework is started. 
+
+The specified logger class must be available in the classpath and has to subclass the `org.apache.uima.util.Logger_common_impl` class. 
+
+For backwards compatibility, V3 continues to provide the class `org.apache.uima.util.impl.Log4jLogger_impl` as an alternative which can be specified this way by this JVM argument: 
+
+[source]
+----
+-Dorg.apache.uima.logger.class=org.apache.uima.util.impl.Log4jLogger_impl
+----
+
+to switch to the log4j back end.
+This has been updated in V3 to `log4j 2` (see https://logging.apache.org/log4j). If you use this, you must provide the required `Log4j 2` jars in the classpath. 
+
+[[_uv3.logging.suppress_annotator_logging]]
+==== Throttling excessive logging from Annotators
+
+Sometimes, in production, you may find annotators are logging excessively, and you wish to throttle  this.
+But you may not have access to logging settings to control this, perhaps because UIMA is running as a library component within another framework.
+For this special case, you can limit logging done by Annotators by passing an additional parameter to the UIMA Framework's `produceAnalysisEngine` API, using the key name `AnalysisEngine.PARAM_THROTTLE_EXCESSIVE_ANNOTATOR_LOGGING` and setting the value to an Integer object equal to the limit.
+Using 0 will suppress all logging.
+Any positive number allows that many log records to be logged, per level.
+A limit of 10 would allow  10 Errors, 10 Warnings, etc.
+The limit is enforced separately, per logger instance.
+
+[NOTE]
+====
+This only works if the logger used by Annotators is obtained from the  Annotator base implementation class via the `getLogger()` method.
+====
+
+[[ugr.tug.aae.building_aggregates]]
+== Building Aggregate Analysis Engines
+
+[[ugr.tug.aae.combining_annotators]]
+=== Combining Annotators
+
+The UIMA SDK makes it very easy to combine any sequence of Analysis Engines to form an __Aggregate Analysis Engine__.
+This is done through an XML descriptor; no Java code is required!
+
+If you go to the `examples/descriptors/tutorial/ex3` folder (in Eclipse, it's in your uimaj-examples project, under the `descriptors/tutorial/ex3` folder), you will find a descriptor for a TutorialDateTime annotator.
+This annotator detects dates and times.
+To see what this annotator can do, try it out using the Document Analyzer.
+If you are curious as to how this annotator works, the source code is included, but it is not necessary to understand the code at this time.
+
+We are going to combine the TutorialDateTime annotator with the RoomNumberAnnotator to create an aggregate Analysis Engine.
+This is illustrated in the following figure: 
+
+.Combining Annotators to form an Aggregate Analysis Engine
+image::images/tutorials_and_users_guides/tug.aae/image024.png[Combining Annotators to form an Aggregate Analysis
+              Engine]
+
+The descriptor that does this is named ``RoomNumberAndDateTime.xml``, which you can open in the Component Descriptor Editor plug-in.
+This is in the uimaj-examples project in the folder ``descriptors/tutorial/ex3``. 
+
+The "`Aggregate`" page of the Component Descriptor Editor is used to define which components make up the aggregate.
+A screen shot is shown below.
+(If you are not using Eclipse, see <<ugr.tug.aae.xml_intro_ae_descriptor>> for the actual XML syntax for Aggregate Analysis Engine Descriptors.)
+
+.Aggregate page of the Component Descriptor Editor (CDE)
+image::images/tutorials_and_users_guides/tug.aae/image026.jpg[]
+
+On the left side of the screen is the list of component engines that make up the aggregate; in this case, the TutorialDateTime annotator and the RoomNumberAnnotator.
+To add a component, you can click the "`Add`" button and browse to its descriptor.
+You can also click the "`Find AE`" button and search for an Analysis Engine in your Eclipse workspace. 
+
+[NOTE]
+====
+The __Add Remote__ button is used for adding components which run xref:tug.adoc#ugr.tug.application.how_to_call_a_uima_service[remotely] (for example, on another machine using a remote networking connection).
+====
+
+The order of the components in the left pane does not imply an order of execution.
+The order of execution, or __flow__, is determined in the "`Component Engine Flow`" section on the right.
+UIMA supports different types of algorithms (including user-definable) for determining the flow.
+Here we pick the simplest: `FixedFlow`.
+We have chosen to have the RoomNumberAnnotator execute first, although in this case it doesn't really matter, since the RoomNumber and DateTime annotators do not have any dependencies on one another.
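+
+For readers not using the Component Descriptor Editor, the two panes correspond roughly to the following parts of the aggregate descriptor.
+This is a trimmed sketch rather than the shipped `RoomNumberAndDateTime.xml`: the delegate keys (`RoomNumber`, `DateTime`), the import locations, and the name are assumptions chosen for illustration.
+
+[source,xml]
+----
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <primitive>false</primitive>
+
+  <!-- Left pane: the component engines, each identified by a key (assumed names) -->
+  <delegateAnalysisEngineSpecifiers>
+    <delegateAnalysisEngine key="RoomNumber">
+      <import location="../ex2/RoomNumberAnnotator.xml"/>
+    </delegateAnalysisEngine>
+    <delegateAnalysisEngine key="DateTime">
+      <import location="TutorialDateTime.xml"/>
+    </delegateAnalysisEngine>
+  </delegateAnalysisEngineSpecifiers>
+
+  <analysisEngineMetaData>
+    <name>Room Number and DateTime aggregate (illustrative)</name>
+    <!-- Right pane: the flow lists the keys in execution order -->
+    <flowConstraints>
+      <fixedFlow>
+        <node>RoomNumber</node>
+        <node>DateTime</node>
+      </fixedFlow>
+    </flowConstraints>
+  </analysisEngineMetaData>
+</analysisEngineDescription>
+----
+
+Swapping the two `<node>` entries would change the execution order without touching the list of delegates.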
+
+If you look at the __Type System__ page of the Component Descriptor Editor, you will see that it displays the type system but is not editable.
+The Type System of an Aggregate Analysis Engine is automatically computed by merging the Type Systems of all of its components.
+
+[WARNING]
+====
+If the components have different definitions for the same type name, the Component Descriptor Editor will show a warning.
+It is possible to continue past this warning, in which case your aggregate's type system will have the correct "`merged`" type definition that contains all of the features defined on that type by all of your components.
+However, it is not recommended to use this feature in conjunction with JCAS, since the JCAS Java Class definitions cannot be so easily xref:ref.adoc#ugr.ref.jcas.merging_types_from_other_specs[merged].
+====
+
+The Capabilities page is where you explicitly declare the aggregate Analysis Engine's inputs and outputs.
+Sofas and Languages are described later. 
+
+.Screen shot of the Capabilities page of the Component Descriptor Editor
+image::images/tutorials_and_users_guides/tug.aae/image028.jpg[]
+
+Note that it is not automatically assumed that all outputs of each component Analysis Engine (AE) are passed through as outputs of the aggregate AE.
+If, for example, the TutorialDateTime annotator also produced Word and Sentence annotations,  but those were not of interest as output in this case, we can exclude them from the  list of outputs.
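+
+In the descriptor, these declarations end up in a `<capabilities>` element inside `<analysisEngineMetaData>`.
+The fragment below is a sketch of what the aggregate might declare; the fully qualified type names are assumptions based on the tutorial's type system, so check them against your own descriptor.
+
+[source,xml]
+----
+<capabilities>
+  <capability>
+    <!-- the aggregate needs no annotations as input -->
+    <inputs/>
+    <!-- only the types listed here are declared as outputs of the aggregate -->
+    <outputs>
+      <type allAnnotatorFeatures="true">org.apache.uima.tutorial.RoomNumber</type>
+      <type allAnnotatorFeatures="true">org.apache.uima.tutorial.DateAnnot</type>
+      <type allAnnotatorFeatures="true">org.apache.uima.tutorial.TimeAnnot</type>
+    </outputs>
+    <languagesSupported/>
+  </capability>
+</capabilities>
+----
+
+Word or Sentence annotations produced by a delegate would simply be left out of the `<outputs>` list.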
+
+You can run this AE using the Document Analyzer in the same way that you run any other AE.
+Just select the `examples/descriptors/tutorial/ex3/RoomNumberAndDateTime.xml` descriptor and click the Run button.
+You should see that RoomNumbers, Dates, and Times are all shown:
+
+.Screen shot results of running the Document Analyzer
+image::images/tutorials_and_users_guides/tug.aae/image030.jpg[]
+
+
+[[ugr.tug.aae.aaes_can_contain_cas_consumers]]
+=== AAEs can also contain CAS Consumers
+
+In addition to aggregating Analysis Engines, Aggregates xref:tug.adoc#ugr.tug.cpe[can also contain CAS Consumers], or even a mixture of these components with regular Analysis Engines.
+The UIMA examples include an Aggregate which contains both an analysis engine and a CAS consumer, in `examples/descriptors/MixedAggregate.xml`.
+
+Analysis Engines support the `collectionProcessComplete` method, which is particularly important for many CAS Consumers.
+If an application (or a Collection Processing Engine) calls `collectionProcessComplete` on an aggregate, the framework will deliver that call to all of the components of the aggregate.
+If you use one of the built-in flow types (fixedFlow or capabilityLanguageFlow), then the order specified in that flow will be the same order in which the `collectionProcessComplete` calls are made to the components.
+If a custom flow is used, then the calls will be made in arbitrary order. 
+
+[[ugr.tug.aae.reading_results_previous_annotators]]
+=== Reading the Results of Previous Annotators
+
+So far, we have been looking at annotators that look directly at the document text.
+However, annotators can also use the results of other annotators.
+One useful thing we can do at this point is look for the co-occurrence of a Date, a RoomNumber, and two Times, and annotate that as a Meeting.
+
+The `select` API, available on the CAS, JCas, and individual UIMA indexes,  is the preferred way to get  feature structures from the CAS and work with them.
+
+The CAS maintains _indexes_ of annotations, and from an index you can obtain an iterator that allows you to step through all annotations of a particular type in that index.
+Indexes are optional; they allow you to specify a sorting order or can specify set-inclusion criteria.
+One built-in index is the Annotation index; this contains sorted instances of type Annotation  or its subtypes. 
+
+Here's some example code that would iterate over all of the TimeAnnot annotations in the JCas, in some unspecified order: 
+[source]
+----
+for (TimeAnnot time : aJCas.select(TimeAnnot.class)) {
+  // do something with time
+}
+----
+
+The same code, but using the Annotation index to specify an ordering (assuming that TimeAnnot is a subtype of Annotation): 
+[source]
+----
+for (TimeAnnot time : aJCas.getAnnotationIndex().select(TimeAnnot.class)) {
+  // do something with time
+}
+// or
+for (TimeAnnot time : aJCas.getAnnotationIndex(TimeAnnot.class).select()) {
+  // do something with time
+}
+----
+
+Also, if you've defined your own xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.index[custom index], you can get an iterator over that specific index by calling `aJCas.getIndex(label, clazz)`.
+The second argument of the `getIndex(...)` method specializes the index to a subtype of the type the index was declared to index.
+For instance, if you defined an index called "allEvents" over the type ``Event``, and wanted  to get an index over just a particular subtype of event, say, ``TimeEvent``, you can ask for that index using ``aJCas.getIndex("allEvents", TimeEvent.class)``.
+
+Wherever the type is specified by TimeEvent.class, the APIs also allow the non-JCas specification of the type by passing an instance of a UIMA Type class.
+This alternative enables writing code that can be used for any type, discovered at run time.
+
+Now that we've explained the basics, let's take a look at the process method for ``org.apache.uima.tutorial.ex4.MeetingAnnotator``.
+Since we're looking for a combination of a RoomNumber, a Date, and two Times, there are four nested iterators.
+(There's surely a better algorithm for doing this, but to keep things simple we're just going to look at every combination of the four items.)
+
+For each combination of the four annotations, we compute the span of text that includes all of them, and then we check to see if that span is smaller than a "`window`" size, a configuration parameter.
+There are also some checks to make sure that we don't annotate the same span of text multiple times.
+If all the checks pass, we create a Meeting annotation over the whole span.
+There's really nothing to it!
+
+The XML descriptor, located in `examples/descriptors/tutorial/ex4/MeetingAnnotator.xml`, is also very straightforward.
+An important difference from previous descriptors is that this is the first annotator we've discussed that has input requirements.
+This can be seen on the "`Capabilities`" page of the Component Descriptor Editor:
+
+.Screen shot of Capabilities page of the Component Descriptor Editor
+image::images/tutorials_and_users_guides/tug.aae/image032.jpg[]
+
+If we were to run the MeetingAnnotator on its own, it wouldn't detect anything because it wouldn't have any input annotations to work with.
+The required input annotations can be produced by the RoomNumber and DateTime annotators.
+So, we create an aggregate Analysis Engine containing these two annotators, followed by the Meeting annotator.
+This aggregate is illustrated in <<ugr.tug.aae.fig.aggregate_for_meeting_annotator>>.
+The descriptor for this is in `examples/descriptors/tutorial/ex4/MeetingDetectorAE.xml`. Give it a try in the Document Analyzer. 
+
+[[ugr.tug.aae.fig.aggregate_for_meeting_annotator]]
+.An Aggregate Analysis Engine where an internal component uses output from previous engines
+image::images/tutorials_and_users_guides/tug.aae/image034.png[]
+
+
+[[ugr.tug.aae.other_examples]]
+== Other examples
+
+The UIMA SDK includes several other examples you may find interesting, including:
+
+* `SimpleTokenAndSentenceAnnotator`: a simple tokenizer and sentence annotator.
+* `XmlDetagger`: A xref:tug.adoc#ugr.tug.mvs[multi-sofa] annotator that does XML detagging.
+Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
+* `PersonTitleDBWriterCasConsumer`: a sample CAS Consumer which populates a relational database with some annotations.
+It uses JDBC and in this example, hooks up with the Open Source Apache Derby database. 
+
+
+[[ugr.tug.aae.additional_topics]]
+== Additional Topics
+
+[[ugr.tug.aae.contract_for_annotator_methods]]
+=== Annotator Methods
+
+The UIMA framework ensures that an Annotator instance is called by only one thread at a time.
+An instance never has to worry about running some method on one  thread, and then asynchronously being called using another thread.
+This approach simplifies the design of annotators: they do not have to be designed to support multi-threading.
+When multiple threading is wanted, for performance, multiple instances of the Annotator are created, each one running on just one thread.
+
+The following table defines the methods called by the framework, when they are called, and the requirements annotator implementations must follow.
+
+[cols="1,1,1", frame="all", options="header"]
+|===
+| Method
+| When Called by Framework
+| Requirements
+
+
+|`initialize`
+|Typically only called once, when instance is created. Can be called
+again if application does a reinitialize call and the default behavior
+                isn't overridden (the default behavior for reinitialize is to call `destroy` followed by `initialize`).
+|Normally does one-time initialization, including reading of
+                configuration parameters. If the application changes the parameters, it
+                can call initialize to have the annotator re-do its
+                initialization.
+
+|`typeSystemInit`
+|Called before `process` whenever the type system
+                in the CAS being passed in differs from what was previously passed in a `process` call (and called for the first CAS passed in,
+                too). The Type System being passed to an annotator only changes in the case of
+                remote annotators that are active as servers, receiving possibly
+                different type systems to operate on.
+|Typically, users of JCas do not implement any method for this. An
+                annotator can use this call to read the CAS type system and setup any instance
+                variables that make accessing the types and features convenient.
+
+|`process`
+|Called once for each CAS. Called by the application if not using
+                Collection Processing Manager (CPM); the application calls the process
+                method on the analysis engine, which is then delegated by the framework to
+                all the annotators in the engine. For Collection Processing application,
+                the CPM calls the process method. If the application creates and manages
+                its own Collection Processing Engine via API calls (see Javadocs), the
+                application calls this on the Collection Processing Engine, and it is
+                delegated by the framework to the components.
+|Process the CAS, adding and/or modifying elements in it
+
+|`destroy`
+|This method can be called by applications, and is also called by the
+                Collection Processing Manager framework when the collection processing
+                completes. It is also called on Aggregate delegate components, if those 
+                components successfully complete their `initialize` call, if 
+                a subsequent delegate (or flow controller) in the aggregate fails to initialize.
+                This allows components which need to clean up things done during initialization 
+                to do so.  It is up to the component writer to use a try/finally construct during initialization
+                to cleanup from errors that occur during initialization within one component.
+                The `destroy` call on an aggregate is
+                propagated to all contained analysis engines.
+|An annotator should release all resources, close files, close
+                database connections, etc., and return to a state where another initialize
+                call could be received to restart. Typically, after a destroy call, no
+                further calls will be made to an annotator instance.
+
+|`reconfigure`
+| This method is never called by the framework unless an application calls it on the Engine object, in which case the framework propagates it to all annotators contained in the Engine. Its purpose is to signal that the configuration parameters have changed.
+|A default implementation of this calls destroy, followed by
+                initialize. This is the only case where initialize would be called more than
+                once. Users should implement whatever logic is needed to return the
+                annotator to an initialized state, including re-reading the
+                configuration parameter data.
+|===
+
+[[ugr.tug.aae.reporting_errors_from_annotators]]
+=== Reporting errors from Annotators
+
+There are two broad classes of errors that can occur: recoverable and unrecoverable.
+Because Annotators are often expected to process very large numbers of artifacts (for example, text documents), they should be written to recover where possible.
+
+For example, if an upstream annotator created some input for an annotator which is invalid, the annotator may want to log this event, ignore the bad input and continue.
+It may include a notification of this event in the CAS, for further downstream annotators to consider.
+Or, it may throw an exception (see next section) -- but in this case, it cannot do any further processing on that document.
+
+[NOTE]
+====
+The choice of what to do can be made configurable, using the configuration parameters. 
+====
+
+[[ugr.tug.aae.throwing_exceptions_from_annotators]]
+=== Throwing Exceptions from Annotators
+
+Let's say an invalid regular expression was passed as a parameter to the RoomNumberAnnotator.
+Because this is an error related to the overall configuration, and not something we could expect to ignore, we should throw an appropriate exception, and most Java programmers would expect to do so like this:
+
+[source]
+----
+throw new ResourceInitializationException(
+    "The regular expression " + x + " is not valid.");
+----
+
+UIMA, however, does not do it this way.
+All UIMA exceptions are __internationalized__, meaning that they support translation into other languages.
+This is accomplished by eliminating hardcoded message strings and instead using external message digests.
+Message digests are files containing (key, value) pairs.
+The key is used in the Java code instead of the actual message string.
+This allows the message string to be easily translated later by modifying the message digest file, not the Java code.
+Also, message strings in the digest can contain parameters that are filled in when the exception is thrown.
+The format of the message digest file is described in the Javadocs for the Java class `java.util.PropertyResourceBundle` and in the load method of ``java.util.Properties``.
+
+The first thing an annotator developer must choose is what Exception class to use.
+There are three to choose from: 
+
+. ResourceConfigurationException should be thrown from the annotator's reconfigure() method if invalid configuration parameter values have been specified. 
+. ResourceInitializationException should be thrown from the annotator's initialize() method if initialization fails for any  reason (including invalid configuration parameters).
+. AnalysisEngineProcessException should be thrown from the annotator's process() method if the processing of a particular document fails for any reason. 
+
+Generally you will not need to define your own custom exception classes, but if you do they must extend one of these three classes, which are the only types of Exceptions that the annotator interface permits annotators to throw.
+
+All of the UIMA Exception classes share common constructor varieties.
+There are four possible arguments:
+
+* The name of the message digest to use (optional; if not specified, the default UIMA message digest is used).
+* The key string used to select the message in the message digest.
+* An object array containing the parameters to include in the message.
+Messages can have substitutable parts.
+When the message is given, the string representations of the objects passed are substituted into the message.
+The object array is often created using the syntax `new Object[]{x, y}`.
+* Another exception which is the "`cause`" of the exception you are throwing (optional).
+This feature is commonly used when you catch another exception and rethrow it.
+
+If you look at the source file (folder: src in Eclipse) ``org.apache.uima.tutorial.ex5.RoomNumberAnnotator``, you will see the following code: 
+
+[source]
+----
+try {
+  mPatterns[i] = Pattern.compile(patternStrings[i]);
+} 
+catch (PatternSyntaxException e) {
+  throw new ResourceInitializationException(
+     MESSAGE_DIGEST, "regex_syntax_error",
+     new Object[]{patternStrings[i]}, e);
+}
+----
+where the MESSAGE_DIGEST constant has the value `"org.apache.uima.tutorial.ex5.RoomNumberAnnotator_Messages"`.
+
+Message digests are specified using a dotted name, just like Java classes.
+This file, with the .properties extension, must be present in the class path.
+In Eclipse, you find this file under the src folder, in the package org.apache.uima.tutorial.ex5, with the name RoomNumberAnnotator_Messages.properties.
+Outside of Eclipse, you can find this in the `uimaj-examples.jar` with the name `org/apache/uima/tutorial/ex5/RoomNumberAnnotator_Messages.properties`. If you look in this file you will see the line: 
+
+[source]
+----
+regex_syntax_error = {0} is not a valid regular expression.
+----
+which is the error message for the example exception we showed above.
+The placeholder {0} will be filled by the toString() value of the argument passed to the exception constructor – in this case, the regular expression pattern that didn't compile.
+If there were additional arguments, their locations in the message would be indicated as {1}, {2}, and so on.
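+
+As an illustration of a message with more than one parameter, a digest entry could look like the sketch below.
+The key `window_out_of_range` and its wording are hypothetical and do not appear in the tutorial's properties file.
+
+[source,properties]
+----
+# hypothetical entry: {0} and {1} are filled from the Object[] passed to the exception
+window_out_of_range = The window size {0} is larger than the maximum of {1}.
+----
+
+An exception constructed with `new Object[]{500, 200}` for this key would then be reported as "The window size 500 is larger than the maximum of 200."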
+
+If a message digest is not specified in the call to the exception constructor, the default is `UIMAException.STANDARD_MESSAGE_CATALOG` (whose value is "``org.apache.uima.UIMAException_Messages``" in the current release but may change). This message digest is located in the `uima-core.jar` file at `org/apache/uima/UIMAException_Messages.properties`; you can take a look to see if any of these exception messages are useful to you.
+
+To try out the regex_syntax_error exception, just use the Document Analyzer to run `examples/descriptors/tutorial/ex5/RoomNumberAnnotator.xml`, which happens to have an invalid regular expression in its configuration parameter settings.
+
+To summarize, here are the steps to take if you want to define your own exception message:
+
+Create a file with the .properties extension, where you declare message keys and their associated messages, using the same syntax as shown above for the regex_syntax_error exception.
+The properties file syntax is more completely described in the Javadocs for the http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html#load(java.io.InputStream)[load] method of the java.util.Properties class.
+
+Put your properties file somewhere in your class path (it can be in your annotator's .jar file).
+
+Define a String constant (called MESSAGE_DIGEST for example) in your annotator code whose value is the dotted name of this properties file.
+For example, if your properties file is inside your jar file at the location ``org/myorg/myannotator/Messages.properties``, then this String constant should have the value ``org.myorg.myannotator.Messages``.
+Do not include the .properties extension.
+In Java Internationalization terminology, this is called the Resource Bundle name.
+For more information see the Javadocs for the http://java.sun.com/j2se/1.5.0/docs/api/java/util/PropertyResourceBundle.html[PropertyResourceBundle] class.
+
+In your annotator code, throw an exception like this: 
+[source]
+----
+throw new ResourceInitializationException(
+    MESSAGE_DIGEST, "your_message_name",
+    new Object[]{param1,param2,...});
+----
+
+You may also wish to look at the Javadocs for the UIMAException class.
+
+For more information on Java's internationalization features, see the http://java.sun.com/j2se/1.5.0/docs/guide/intl/index.html[Java Internationalization Guide].
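+
+The lookup that the framework performs on your behalf can be tried out with plain JDK classes.
+The following is a minimal sketch, not UIMA code: it assumes the messages use the `{0}`-style placeholders of `java.text.MessageFormat`, and the class name `MessageLookupSketch`, the key `bad_room_pattern`, and the message text are all hypothetical.
+To stay self-contained it reads the catalog from a string rather than from a `.properties` file on the class path.
+
+[source,java]
+----
+import java.io.IOException;
+import java.io.StringReader;
+import java.io.UncheckedIOException;
+import java.text.MessageFormat;
+import java.util.PropertyResourceBundle;
+import java.util.ResourceBundle;
+
+class MessageLookupSketch {
+  // Hypothetical catalog content; a real annotator would ship this as
+  // org/myorg/myannotator/Messages.properties on the class path.
+  static final String CATALOG =
+      "bad_room_pattern = The pattern \"{0}\" in parameter {1} is not valid.\n";
+
+  // Looks up the message for a key and fills in its arguments.
+  static String format(String key, Object... args) {
+    try {
+      ResourceBundle bundle =
+          new PropertyResourceBundle(new StringReader(CATALOG));
+      return MessageFormat.format(bundle.getString(key), args);
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+  }
+
+  public static void main(String[] args) {
+    System.out.println(format("bad_room_pattern", "[0-9", "Patterns"));
+  }
+}
+----
+
+In a real annotator, the same key and arguments would instead be passed to the exception constructor together with the `MESSAGE_DIGEST` constant.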
+
+[[ugr.tug.aae.accessing_external_resource_files]]
+=== Accessing External Resources
+
+External Resources are Java objects that have a life cycle where they are (optionally) initialized at startup time by reading external data from a file or via a URL (which can access information over the HTTP protocol, for instance). It is not _required_ that External Resource objects do any external data reading to initialize themselves.
+However, this is such a common use case that we will presume this mode of operation in the description below.
+
+Sometimes you may want an annotator to read from an external resource, such as a URL or a file; for example, a long list of keys and values that you are going to build into a HashMap.
+You could, of course, just introduce a configuration parameter that holds the absolute path or URL to this resource, and build the HashMap in your annotator's initialize method.
+However, this is not the best solution for three reasons:
+
+. Including an absolute path in your descriptor to specify the initialization data makes your annotator difficult for others to use. Each user will need to edit this descriptor and set the absolute path to a value appropriate for his or her installation.
+. You cannot share the created Java object(s), e.g., a HashMap, between multiple annotators. Also, in some deployment scenarios there may be more than one instance of your annotator, and you would like to have the option for them to share the same Java object(s).
+. Your annotator would become dependent on a particular implementation of the Java object(s). It would be better if the actual implementation were decoupled from the API used to access it.
+
+A better way to create these sharable Java objects and initialize them  via external disk or URL sources is through the ResourceManager component.
+In this section we are going to show an example of how to use the Resource Manager.
+
+This example annotator will annotate UIMA acronyms (e.g. UIMA, AE, CAS, JCas) and store the acronym's expanded form as a feature of the annotation.
+The acronyms and their expanded forms are stored in an external file.
+
+First, look at the `examples/descriptors/tutorial/ex6/UimaAcronymAnnotator.xml` descriptor. 
+
+.Screen shot of Component Descriptor Editor page for configuring External Resources
+image::images/tutorials_and_users_guides/tug.aae/image036.jpg[]
+
+The values of the rows in the two tables are longer than can be easily shown.
+You can click the small button at the top right to shift the layout from two side-by-side tables, to a vertically stacked layout.
+You can also click the small twisty on the "`Imports for External Resources and Bindings`" to collapse this section, because it's not used here.
+Then the same screen will appear like this: 
+
+.Screen shot of Component Descriptor Editor page for configuring External Resources after adjusting the layout
+image::images/tutorials_and_users_guides/tug.aae/image038.jpg[]
+
+The top window has a scroll bar allowing you to see the rest of the line.
+
+[[ugr.tug.aae.resources.declaring_dependencies]]
+==== Declaring Resource Dependencies
+
+The bottom window is where an annotator declares an external resource dependency.
+The XML for this is as follows:
+
+[source]
+----
+<externalResourceDependency>
+  <key>AcronymTable</key> 
+  <description>Table of acronyms and their expanded forms.</description> 
+  <interfaceName>
+    org.apache.uima.tutorial.ex6.StringMapResource
+  </interfaceName> 
+</externalResourceDependency>
+----
+
+The <key> value (AcronymTable) is the name by which the annotator identifies this resource.
+The key must be unique for all resources that this annotator accesses, but the same key could be used by different annotators to mean different things.
+The interface name (``org.apache.uima.tutorial.ex6.StringMapResource``) is the Java interface through which the annotator accesses the data.
+Specifying an interface name is optional.
+If you do not specify an interface name, annotators will instead get an interface which can provide direct access to the  data resource (file or URL) that is  associated with this external resource.
+
+[[ugr.tug.aae.resources.accessing_from_uimacontext]]
+==== Accessing the Resource from the UimaContext
+
+If you look at the `org.apache.uima.tutorial.ex6.UimaAcronymAnnotator` source, you will see that the annotator accesses this resource from the UimaContext by calling: 
+[source]
+----
+StringMapResource mMap = 
+  (StringMapResource)getContext().getResourceObject("AcronymTable");
+----
+
+The object returned from the `getResourceObject` method will implement the interface declared in the `<interfaceName>` section of the descriptor, `StringMapResource` in this case.
+The annotator code does not need to know the location of external data that may be used to initialize this object, nor the Java class that might be used to read the data and implement the `StringMapResource` interface.
+
+Note that if we did not specify a Java interface in our descriptor, our annotator could directly access the resource data as follows: 
+
+[source]
+----
+InputStream stream = getContext().getResourceAsStream("AcronymTable");
+----
+
+If necessary, the annotator could also determine the location of the resource file, by calling: 
+
+[source]
+----
+URI uri = getContext().getResourceURI("AcronymTable");
+----
+
+These last two options are only available in the case where the descriptor does not declare a Java interface.
+
+[NOTE]
+====
+The methods for getting access to resources include ``getResourceURL``.
+That method returns a URL, which may contain spaces encoded as %20.
+`url.getPath()` would return the path without decoding these %20 into spaces. `getResourceURI`, on the other hand, returns a URI, and `uri.getPath()` _does_ convert %20 into spaces.
+See also ``getResourceFilePath``, which does a `getResourceURI` followed by `uri.getPath()`.
+====
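+
+The difference between the two `getPath()` methods can be reproduced with the JDK alone.
+The sketch below does not call any UIMA method; the class name `UriVersusUrlPath` and the file location containing a space are hypothetical.
+
+[source,java]
+----
+import java.net.MalformedURLException;
+import java.net.URI;
+import java.net.URL;
+
+class UriVersusUrlPath {
+  // URI.getPath() decodes escaped characters such as %20.
+  static String pathFromUri(String location) {
+    return URI.create(location).getPath();
+  }
+
+  // URL.getPath() returns the path exactly as written, escapes included.
+  static String pathFromUrl(String location) {
+    try {
+      URL url = URI.create(location).toURL();
+      return url.getPath();
+    } catch (MalformedURLException e) {
+      throw new IllegalArgumentException(e);
+    }
+  }
+
+  public static void main(String[] args) {
+    String location = "file:/data/my%20resources/uimaAcronyms.txt";
+    System.out.println(pathFromUri(location)); // path with a real space
+    System.out.println(pathFromUrl(location)); // path still containing %20
+  }
+}
+----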
+
+[[ugr.tug.aae.resources.declaring_and_bindings]]
+==== Declaring Resources and Bindings
+
+Refer back to the top window in the Resources page of the Component Descriptor Editor.
+This is where we specify the location of the resource data, and the Java class used to read the data.
+For the example, this corresponds to the following section of the descriptor: 
+
+[source]
+----
+<resourceManagerConfiguration>
+  <externalResources>
+    <externalResource>
+      <name>UimaAcronymTableFile</name> 
+      <description>
+         A table containing UIMA acronyms and their expanded forms.
+      </description> 
+      <fileResourceSpecifier>
+        <fileUrl>file:org/apache/uima/tutorial/ex6/uimaAcronyms.txt
+        </fileUrl> 
+      </fileResourceSpecifier>
+      <implementationName>
+         org.apache.uima.tutorial.ex6.StringMapResource_impl
+      </implementationName> 
+    </externalResource>
+  </externalResources>
+
+  <externalResourceBindings>
+    <externalResourceBinding>
+      <key>AcronymTable</key>    
+      <resourceName>UimaAcronymTableFile</resourceName> 
+    </externalResourceBinding>
+  </externalResourceBindings>
+</resourceManagerConfiguration>
+----
+
+The first section of this XML declares an externalResource, the ``UimaAcronymTableFile``.
+Within this, the fileUrl element specifies the path to the data file.
+This can be a file on the file system, but can also be a remote resource accessed via, e.g., the HTTP protocol.
+Despite its name, the fileUrl element doesn't have to refer to a file; it can be any URL.
+This can be an absolute URL (e.g. one that starts with file:/ or file:///, or file://my.host.org/), but that is not recommended because it makes installation of your component more difficult, as noted earlier.
+Better is a relative URL, which will be looked up within the classpath (and/or datapath), as used in this example.
+In this case, the file `org/apache/uima/tutorial/ex6/uimaAcronyms.txt` is located in ``uimaj-examples.jar``, which is in the classpath.
+If you look in this file you will see the definitions of several UIMA acronyms.
+
+The second section of the XML declares an externalResourceBinding, which connects the key ``AcronymTable``, declared in the annotator's external resource dependency, to the actual resource name ``UimaAcronymTableFile``.
+This is rather trivial in this case; for more on bindings see the example `UimaMeetingDetectorAE.xml` below.
+There is no global repository for external resources; it is up to the user to define each resource needed by a particular set of annotators.
+
+In the Component Descriptor Editor, bindings are indicated below the external resource.
+To create a new binding, you select an external resource (which must have previously been defined), and an external resource dependency, and then click the `Bind` button, which only enables if you have selected two things to bind together.
+
+When the Analysis Engine is initialized, it creates a single instance of `StringMapResource_impl` and loads it with the contents of the data file.
+This means that the framework calls the instance's `load` method, passing it an instance of DataResource, from which you can obtain a stream or URI/URL for the data declared in the external resource.
+For resources where loading does not make sense, you can implement a `load` method which ignores its argument and just returns, or performs whatever initialization is appropriate at startup time.
+See the Javadocs for SharedResourceObject for details on this.
+
+The UimaAcronymAnnotator then accesses the data through the `StringMapResource` interface.
+This single instance could be shared among multiple annotators, as will be explained later.
+
+[WARNING]
+====
+Because the implementation of the resource is shared, you should ensure your implementation is thread-safe, as it could be called multiple times on multiple threads, simultaneously.
+====
+
+Note that all resource implementation classes (e.g. `StringMapResource_impl` in the provided example) must be declared public, must not be declared abstract, and must have public, no-args constructors, so that they can be instantiated by the framework.
+(Although Java classes in which  you do not define any constructor will, by default, have a no-args constructor that doesn't do anything, a class in which you have defined at least one constructor does not get a default no-args constructor.)
+
+All resource implementation classes that provide access to resource data must also implement the interface org.apache.uima.resource.SharedResourceObject.
+The UIMA Framework will invoke this interface's only method, ``load``,   after this object has been instantiated.
+The implementation of this method  can then read data from the specified `DataResource`  and use that data to initialize this object.
+It can also do whatever resource initialization might be appropriate to do at startup time.
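+
+One way to keep such a shared object thread-safe is to build its data once during loading and treat it as read-only afterwards.
+The following is a simplified sketch in plain Java, not the tutorial's `StringMapResource_impl`: the class name `AcronymTableSketch` is hypothetical, it does not implement `SharedResourceObject`, and it assumes a data file with one tab-separated key and value per line.
+A real implementation would obtain the stream from the `DataResource` passed to `load`.
+
+[source,java]
+----
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.UncheckedIOException;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+class AcronymTableSketch {
+  // Assigned once by load() and only read afterwards; volatile makes the
+  // fully built map visible to every thread that calls get().
+  private volatile Map<String, String> table = Collections.emptyMap();
+
+  // Parses "key<TAB>value" lines (assumed format) into a read-only map.
+  void load(InputStream in) {
+    Map<String, String> parsed = new HashMap<>();
+    try (BufferedReader reader = new BufferedReader(
+        new InputStreamReader(in, StandardCharsets.UTF_8))) {
+      String line;
+      while ((line = reader.readLine()) != null) {
+        int tab = line.indexOf('\t');
+        if (tab > 0) {
+          parsed.put(line.substring(0, tab), line.substring(tab + 1));
+        }
+      }
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+    table = Collections.unmodifiableMap(parsed);
+  }
+
+  // Safe to call from many threads: the map is never modified after load().
+  String get(String key) {
+    return table.get(key);
+  }
+
+  // Convenience for trying the sketch without a file.
+  static AcronymTableSketch fromText(String text) {
+    AcronymTableSketch result = new AcronymTableSketch();
+    result.load(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
+    return result;
+  }
+}
+----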
+
+This annotator is illustrated in <<ugr.tug.aae.fig.external_resource_binding>>.
+To see it in action, just run it using the Document Analyzer.
+When it finishes, open the UIMA_Seminars document in the processed results window (double-click it), and then left-click on one of the highlighted terms to see the expandedForm feature's value.
+
+[[ugr.tug.aae.fig.external_resource_binding]]
+.External Resource Binding
+image::images/tutorials_and_users_guides/tug.aae/image040.png[]
+
+By designing our annotator in this way, we have gained some flexibility.
+We can freely replace the StringMapResource_impl class with any other implementation that implements the simple StringMapResource interface.
+(For example, for very large resources we might not be able to have the entire map in memory.) We have also made our external resource dependencies explicit in the descriptor, which will help others to deploy our annotator.
+
+[[ugr.tug.aae.resources.sharing_among_annotators]]
+==== Sharing Resources among Annotators
+
+Another advantage of the Resource Manager is that it allows our data to be shared between annotators.
+To demonstrate this we have developed another annotator that will use the same acronym table.
+The UimaMeetingAnnotator will iterate over Meeting annotations discovered by the Meeting Detector we previously developed and attempt to determine whether the topic of the meeting is related to UIMA.
+It will do this by looking for occurrences of UIMA acronyms in close proximity to the meeting annotation.
+We could implement this by using the UimaAcronymAnnotator, of course, but for the sake of this example we will have the UimaMeetingAnnotator access the acronym map directly.
+
+The Java code for the UimaMeetingAnnotator in example 6 creates a new type, UimaMeeting, if it finds a meeting within 50 characters of the UIMA acronym.
+
+We combine three analysis engines: the UimaAcronymAnnotator to annotate UIMA acronyms, the MeetingDetector from example 4 to find meetings, and finally the UimaMeetingAnnotator to annotate just meetings about UIMA.
+Together these are assembled to form the new aggregate analysis engine, UimaMeetingDetector.
+This aggregate and the sharing of a common resource are illustrated in <<ugr.tug.aae.fig.sharing_common_resource>>. 
+
+[[ugr.tug.aae.fig.sharing_common_resource]]
+.Component engines of an aggregate share a common resource
+image::images/tutorials_and_users_guides/tug.aae/image042.png[]
+
+The important thing to notice is in the `UimaMeetingDetectorAE.xml` aggregate descriptor.
+It includes both the UimaMeetingAnnotator and the UimaAcronymAnnotator, and contains a single declaration of the UimaAcronymTableFile resource.
+(The actual example has the order of the first two annotators reversed versus the above picture, which is OK since they do not depend on one another).
+
+It also binds the resources as follows: 
+
+.UimaMeetingDetectorAE.xml binding a common resource
+image::images/tutorials_and_users_guides/tug.aae/image044.jpg[]
+
+[source]
+----
+<externalResourceBindings>
+  <externalResourceBinding>
+    <key>UimaAcronymAnnotator/AcronymTable</key> 
+    <resourceName>UimaAcronymTableFile</resourceName> 
+  </externalResourceBinding>
+
+  <externalResourceBinding>
+    <key>UimaMeetingAnnotator/UimaTermTable</key> 
+    <resourceName>UimaAcronymTableFile</resourceName> 
+  </externalResourceBinding>
+</externalResourceBindings>
+----
+
+This binds the resource dependencies of both the UimaAcronymAnnotator (which uses the name AcronymTable) and UimaMeetingAnnotator (which uses UimaTermTable) to the single declared resource named UimaAcronymTableFile.
+Therefore they will share the same instance.
+Resource bindings in the aggregate descriptor _override_ any resource declarations in individual annotator descriptors.
+
+If we wanted to have the annotators use different acronym tables, we could easily do that.
+We would simply have to change the resourceName elements in the bindings so that they referred to two different resources.
+The Resource Manager gives us the flexibility to make this decision at deployment time, without changing any Java code.
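+
+The effect of these bindings can be pictured as two lookup tables: one from resource names to instances, and one from dependency keys to resource names.
+The sketch below is only a conceptual model in plain Java, not the Resource Manager's actual implementation; the class `BindingSketch` and its methods are hypothetical.
+
+[source,java]
+----
+import java.util.HashMap;
+import java.util.Map;
+
+class BindingSketch {
+  private final Map<String, Object> resourcesByName = new HashMap<>();
+  private final Map<String, String> resourceNameByKey = new HashMap<>();
+
+  // Corresponds to an <externalResource> declaration.
+  void declareResource(String name, Object resource) {
+    resourcesByName.put(name, resource);
+  }
+
+  // Corresponds to an <externalResourceBinding> entry.
+  void bind(String key, String resourceName) {
+    resourceNameByKey.put(key, resourceName);
+  }
+
+  // Resolves a dependency key to the bound instance, or null if unbound.
+  Object lookup(String key) {
+    return resourcesByName.get(resourceNameByKey.get(key));
+  }
+
+  public static void main(String[] args) {
+    BindingSketch manager = new BindingSketch();
+    manager.declareResource("UimaAcronymTableFile", new Object());
+    manager.bind("UimaAcronymAnnotator/AcronymTable", "UimaAcronymTableFile");
+    manager.bind("UimaMeetingAnnotator/UimaTermTable", "UimaAcronymTableFile");
+    // Both keys resolve to the very same object.
+    System.out.println(manager.lookup("UimaAcronymAnnotator/AcronymTable")
+        == manager.lookup("UimaMeetingAnnotator/UimaTermTable"));
+  }
+}
+----
+
+Pointing the two keys at different resource names in this model gives each annotator its own instance, which mirrors the deployment-time choice described above.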
+
+[[ugr.tug.aae.resources.threading]]
+==== Threading and Shared Resources
+
+Sharing can also occur when multiple instances of an annotator are  created by the framework in response to run-time deployment specifications.
+If an implementation class is specified in the external resource,  only one instance of that implementation class   is created for a given binding, and is shared among all annotators.
+Because of this, the implementation of that shared instance must be written to be thread-safe - that is, to operate correctly when called at arbitrary times by multiple threads.
+Writing thread-safe code in Java is addressed in several books, such as Brian Goetz's __Java Concurrency in Practice__.
+
+If no implementation class is specified, then the `getResourceObject` method returns a DataResource object, from which each annotator instance can obtain its own (non-shared) input stream, so threading is not an issue in this case.
+
+[[ugr.tug.aae.result_specification_setting]]
+=== Result Specifications
+
+Annotators often are written to do a lot of computation and produce a lot of different outputs.
+For example, a tokenizer can, in addition to identifying tokens, look them up in dictionaries, create lemma forms (dropping suffixes and prefixes), etc.
+Result Specifications provide a way to dynamically specify what results are desired for a particular CAS being processed.
+
+It is up to the annotator writer to take advantage of the result specification; using it is optional.
+If it is used, the annotator writer checks if a particular output is wanted, by asking the result specification if it contains a specific Type and/or Feature.
+If it does, then the annotator produces that type/feature; if not, it skips the computations for producing that type/feature.
+
+The Result Specification querying may  include the language.
+A typical use case:  The CAS contains a document written in some language, and some upstream Annotator has discovered what this language is.
+The Annotator extracts the previously discovered language specification from the CAS and  then includes it when querying the Result Specification.
+The exact method of encoding  language specifications in the CAS is left up to annotator developers; however, the framework provides a commonly used type for this - the org.apache.uima.tcas.DocumentAnnotation type.
+
+The Result Specification is passed to the annotator instance by calling its `setResultSpecification` method (this call is typically done by the framework, based on Capability specifications). When called, the default implementation saves the result specification in an instance variable of the Annotator instance, which can be accessed by the annotator using the protected `getResultSpecification()` method.
+
+A Result Specification is a list of output types and/or type:feature names, categorized by language(s), which are expected to be output from (produced by) the annotator.
+Annotators may use this to optimize their operations, when possible, for those cases where only particular outputs are wanted.
+The interface to the Result Specification object (see the Javadocs) allows querying both types and particular features of types.
+
+The language specifications used by Result Specifications are the same as those that can be specified in Capability Specifications; examples include "en" for English, "en-uk" for British English, etc.
+There is also a language type, "x-unspecified", which is presumed if no language specification(s) are given.
+
+If a query of the Result Specification doesn't include a language, it is treated as if the  language "x-unspecified" was specified.
+Language matching is hierarchically defaulted, in one direction: if a query includes the language "en-uk", meaning that the document being processed is in that language, it will match Result Specifications whose languages include "en-uk", "en", or "x-unspecified". In other words, if the Result Specifications say to produce output if the actual document's language is en-uk, or en, or x-unspecified, then having the actual document's language be en-uk would "match" any of these Result Specifications.
+However, the reverse is not true: if the query asks about producing output if the actual document's language is "x-unspecified", then it would not match if the Result Specification said to produce output only if the actual document is en-uk or en; the Result Specification would need to say to produce output for "x-unspecified".
+
+If the Result Specification indicates it wants output produced for "en-uk", but the annotator is given a language which is unknown,  or one that is known, but isn't "en-uk", then the query (using the language  of the document) will return false.
+This is true even if the language is "en".   However, if the Result Specification indicates it wants output for "en",  and the query is for a document whose language is "en-uk" then the query will return true. 
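+
+The one-directional matching rule described above can be written down in a few lines.
+This is an illustrative sketch of the rule as stated in this section, not the framework's own code; the class `LanguageMatchSketch` and the method `covers` are hypothetical names.
+
+[source,java]
+----
+class LanguageMatchSketch {
+  static final String UNSPECIFIED = "x-unspecified";
+
+  // Returns true when a Result Specification entry declared for
+  // specLanguage applies to a query made with documentLanguage.
+  static boolean covers(String specLanguage, String documentLanguage) {
+    String spec = (specLanguage == null) ? UNSPECIFIED : specLanguage;
+    String doc = (documentLanguage == null) ? UNSPECIFIED : documentLanguage;
+    if (spec.equals(UNSPECIFIED)) {
+      return true; // an unspecified entry applies to every document
+    }
+    if (doc.equals(UNSPECIFIED)) {
+      return false; // a language-specific entry never covers such a query
+    }
+    // Walk from the most specific form ("en-uk") to the base language ("en").
+    while (true) {
+      if (doc.equals(spec)) {
+        return true;
+      }
+      int dash = doc.lastIndexOf('-');
+      if (dash < 0) {
+        return false;
+      }
+      doc = doc.substring(0, dash);
+    }
+  }
+}
+----
+
+With this sketch, an entry for "en" covers a query for "en-uk", while an entry for "en-uk" does not cover a query for "en" or for "x-unspecified".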
+
+Sometimes you can specify the Result Specification; other times, you cannot (for instance, inside a Collection Processing Engine, you cannot). When you cannot specify it, or choose not to specify it (for example, using the form of the process(...) call on an Analysis Engine that doesn't include the Result Specification), a "`Default`" Result Specification is used.
+
+[[ugr.tug.aae.result_spec.default]]
+==== Default ResultSpecification
+
+The default Result Specification is taken from the Engine's output Capability Specification.
+Remember that a Capability Specification has both inputs and outputs, can specify types and / or features, and there can be more than one Capability Set.
+If there is more than one set, the logical union by language of these sets is used.
+Each set can have a different "language(s)" specified; the default Result Specification  will have the outputs by language(s), so that the annotator can query which outputs  should be provided for particular languages.
+The methods to query the Result Specification take a type and (optionally) a feature, and optionally, a language.
+If the queried type is a subtype of some otherwise matching type in the Result Specification, it will match the query.
+See the Javadocs for more details on this. 
+
+[[ugr.tug.aae.result_spec.passing_to_annotators]]
+==== Passing Result Specifications to Annotators
+
+If you are not using a Collection Processing Engine, you can specify a Result Specification for your AnalysisEngine(s) by calling the `AnalysisEngine.setResultSpecification(ResultSpecification)` method.
+
+It is also possible to pass a Result Specification on each call to ``AnalysisEngine.process(CAS, ResultSpecification)``.
+However, this is not recommended if your Result Specification will stay constant across multiple calls to ``process``.
+In that case it will be more efficient to call `AnalysisEngine.setResultSpecification(ResultSpecification)` only when the Result Specification changes.
+
+For primitive Analysis Engines, whatever Result Specification you pass in is passed along to the annotator's `setResultSpecification(ResultSpecification)` method.
+For aggregate Analysis Engines, see below.
+
+[[ugr.tug.aae.result_spec.aggregates]]
+==== Aggregates
+
+For aggregate engines, the Result Specification passed to the `AnalysisEngine.setResultSpecification(ResultSpecification)` method is intended to specify the set of output types/features that the aggregate should produce.
+This is not necessarily equivalent to the set of output types/features that each annotator should produce.
+For example, an annotator may need to produce an intermediate type that is then consumed by a downstream annotator, even though that intermediate type is not part of the Result Specification.
+
+To handle this situation, when `AnalysisEngine.setResultSpecification(ResultSpecification)` is called on an aggregate, the framework computes the union of the passed Result Specification with the set of _all_ input types and features of _all_ component AnalysisEngines within that aggregate.
+This forms the complete set of types and features that any component of the aggregate might need to produce.
+This derived Result Specification is then intersected with the  delegate's output capabilities, and the result is passed to the `AnalysisEngine.setResultSpecification(ResultSpecification)` of each component AnalysisEngine.
+In the case of nested aggregates, this procedure is applied recursively.
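+
+The union-then-intersection step can be illustrated with ordinary sets.
+The sketch below ignores features and languages and works on type names only; the class `AggregateResultSpecSketch`, the method `forDelegate`, and the type name `UimaAcronym` are hypothetical, and this is not the framework's implementation.
+
+[source,java]
+----
+import java.util.Set;
+import java.util.TreeSet;
+
+class AggregateResultSpecSketch {
+  // Computes what one delegate is asked to produce.
+  static Set<String> forDelegate(Set<String> requestedOutputs,
+                                 Set<String> inputsOfAllDelegates,
+                                 Set<String> delegateOutputs) {
+    // Union: everything requested of the aggregate, plus everything
+    // any component needs as input.
+    Set<String> needed = new TreeSet<>(requestedOutputs);
+    needed.addAll(inputsOfAllDelegates);
+    // Intersection: keep only what this delegate is able to produce.
+    needed.retainAll(delegateOutputs);
+    return needed;
+  }
+
+  public static void main(String[] args) {
+    Set<String> requested = Set.of("UimaMeeting");
+    Set<String> allInputs = Set.of("Meeting", "UimaAcronym");
+    // A delegate that can produce Meeting and RoomNumber is asked only for
+    // Meeting, because a downstream component consumes it.
+    System.out.println(
+        forDelegate(requested, allInputs, Set.of("Meeting", "RoomNumber")));
+  }
+}
+----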
+
+[[ugr.tug.aae.result_spec.aggregates.cpes]]
+==== Collection Processing Engines
+
+The Default Result Specification is always used for all components of a Collection Processing Engine.
+
+[[ugr.tug.aae.classpath_when_using_jcas]]
+=== Class path setup when using JCas
+
+JCas provides Java classes that correspond to each CAS type in an application.
+These classes are generated by the JCasGen utility (which can be automatically invoked from the Component Descriptor Editor).
+
+The Java source classes generated by the JCasGen utility are typically compiled and packaged into a JAR file.
+This JAR file must be present in the classpath of the UIMA application.
+
+There can be xref:ref.adoc#ugr.ref.jcas.class_loaders[issues] around setting up this class path, including deployment issues where class loaders are being used to isolate multiple UIMA applications inside a single running Java Virtual Machine.
+
+[[ugr.tug.aae.using_shell_scripts]]
+=== Using the Shell Scripts
+
+The SDK includes a `/bin` subdirectory containing shell scripts for Windows (.bat files) and Unix (.sh files). Many of these scripts invoke sample Java programs which require a class path; they call a common shell script, `setUimaClassPath`, to put the files and directories required by UIMA on the class path.
+
+If you need to include files on the class path, the scripts will add anything you specify in the environment variables CLASSPATH or UIMA_CLASSPATH to the classpath.
+So, for example, if you are running the document analyzer and want it to find classes in a JAR file named (on Windows) c:\a\b\c\myProject\myJarFile.jar, you could first issue a `set` command to set UIMA_CLASSPATH to this file, followed by the documentAnalyzer script: 
+
+[source]
+----
+set UIMA_CLASSPATH=c:\a\b\c\myProject\myJarFile.jar
+documentAnalyzer
+----
+
+Other environment variables are used by the shell scripts, as follows: 
+
+.Environment variables used by the shell scripts
+[cols="1,1", frame="all", options="header"]
+|===
+| Environment Variable
+| Description
+
+|`UIMA_HOME`
+|Path where the UIMA SDK was installed.
+
+|`JAVA_HOME`
+|(Optional) Path to a Java Runtime Environment. If not set, the Java JRE that is in your system PATH is used.
+
+|`UIMA_CLASSPATH`
+|(Optional) if specified, a path specification to use as the default ClassPath.  You can also set the CLASSPATH variable.  If you set both, they will be concatenated.
+
+|`UIMA_DATAPATH`
+|(Optional) if specified, a path specification to use as the default xref:ref.adoc#ugr.ref.xml.component_descriptor.datapath[DataPath].
+
+|`UIMA_LOGGER_CONFIG_FILE`
+|(Optional) if specified, a path to a Java Logger properties file (see <<ugr.tug.aae.configuration_logging>>)
+
+|`UIMA_JVM_OPTS`
+|(Optional) if specified, the JVM arguments to be used when the Java process is started.  This can be used for example to set the maximum Java heap size or to define system properties.
+
+|`VNS_PORT`
+|(Optional) if specified, the network IP port number of the xref:tug.adoc#ugr.tug.application.vns[Vinci Name Server (VNS)].
+
+|`ECLIPSE_HOME`
+|(Optional) Needs to be set to the root of your Eclipse installation when using shell scripts that invoke Eclipse (e.g. `jcasgen_merge`)
+|===
+
+[[ugr.tug.aae.common_pitfalls]]
+== Common Pitfalls
+
+Here are some things to avoid doing in your annotator code:
+
+*Do not retain references to JCas objects between calls to process() for different CASes*
+
+The JCas will be cleared between calls to your annotator's `process()` method for each new CAS.
+All of the analysis results related to the previous document will be deleted to make way for analysis of a new document.
+Therefore, you should never save a reference to a JCas Feature Structure object (i.e. an instance of a class created using JCasGen) and attempt to reuse it in a future invocation of the `process()` method.
+If you do so, the results will be undefined.
+
+*Careless use of static data*
+
+Always keep in mind that an application that uses your annotator may create multiple instances of your annotator class.
+A multithreaded application may attempt to use two instances of your annotator to process two different documents simultaneously.
+This will generally not cause any problems as long as your annotator instances do not share static data.
+
+In general, you should not use static variables other than static final constants of primitive or immutable types (int, float, String, etc.). Other types of static variables may allow one annotator instance to set a value that affects another annotator instance, which can lead to unexpected effects.
+Also, static references to classes that aren't thread-safe are likely to cause errors in multithreaded applications.
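+
+As a small illustration of this guideline, the sketch below keeps an immutable constant in a static field and all mutable state in instance fields.
+It is plain Java rather than a UIMA annotator; the class `StaticDataSketch` and the room-number pattern are hypothetical.
+
+[source,java]
+----
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+class StaticDataSketch {
+  // Safe to share: a compiled Pattern is immutable and thread-safe.
+  private static final Pattern ROOM =
+      Pattern.compile("\\b[0-9]{2}-[0-9]{3}\\b");
+
+  // Mutable state lives in the instance, so instances cannot affect
+  // each other.
+  private int matchesSeen = 0;
+
+  // Counts matches in the text and returns this instance's running total.
+  int countMatches(String text) {
+    // A Matcher is not thread-safe, so a fresh one is created per call.
+    Matcher matcher = ROOM.matcher(text);
+    while (matcher.find()) {
+      matchesSeen++;
+    }
+    return matchesSeen;
+  }
+}
+----
+
+Had `matchesSeen` been declared static, two instances processing different documents would have updated the same counter.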
+
+[[ugr.tug.aae.viewing_uima_objects_in_eclipse_debugger]]
+== UIMA Objects in Eclipse Debugger
+
+Eclipse has a feature for viewing Java Logical Structures.
+When enabled, it will permit you to see a view of UIMA objects (such as feature structure instances, CAS or JCas instances, etc.) which displays the logical subparts.
+For example, here is a view of a feature structure for the RoomNumber annotation, from the tutorial example 1: 
+
+.Screenshot of Eclipse debugger showing non-logical-structure display of a feature structure
+image::images/tutorials_and_users_guides/tug.aae/image046.jpg[]
+
+The "`annotation`" object in Java shows the internals of the JCas object, which is not very convenient for seeing the features or the part of the input that is being annotated.
+You can turn on the Java Logical Structure mode by pushing this button: 
+
+.Screenshot of Eclipse debugger showing button to push to enable viewing logical structures
+image::images/tutorials_and_users_guides/tug.aae/image048.jpg[]
+
+The features of the FeatureStructure instance will be shown: 
+
+.Screenshot of Eclipse debugger showing logical structure display of an annotation
+image::images/tutorials_and_users_guides/tug.aae/image050.jpg[]
+
+
+[[ugr.tug.aae.xml_intro_ae_descriptor]]
+== Introduction to Analysis Engine Descriptor XML Syntax
+
+This section is an introduction to the syntax used for Analysis Engine Descriptors.
+Most users do not need to understand these details; they can use the Component Descriptor Editor Eclipse plugin to edit Analysis Engine Descriptors rather than editing the XML directly.
+
+This section walks through the actual XML descriptor for the RoomNumberAnnotator example introduced in section <<ugr.tug.aae.getting_started>>.
+The discussion is divided into several logical sections of the descriptor.
+
+The full specification for Analysis Engine Descriptors is defined in xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference].
+
+[[ugr.tug.aae.header_annotator_class_identification]]
+=== Header and Annotator Class Identification
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?> 
+<!--  Descriptor for the example RoomNumberAnnotator. --> 
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 
+  <primitive>true</primitive> 
+  <annotatorImplementationName>
+    org.apache.uima.tutorial.ex1.RoomNumberAnnotator
+  </annotatorImplementationName>
+----
+
+The document begins with a standard XML header and a comment.
+The root element of the document is named `<analysisEngineDescription>`, and must specify the XML namespace ``http://uima.apache.org/resourceSpecifier``.
+
+The first subelement, ``<frameworkImplementation>``, must contain the value ``org.apache.uima.java``.
+The second subelement, ``<primitive>``, contains the Boolean value true, indicating that this XML document describes a _Primitive_ Analysis Engine.
+A Primitive Analysis Engine consists of a single annotator.
+It is also possible to construct XML descriptors for non-primitive or _Aggregate_ Analysis Engines; this is covered later.
+
+The next element, ``<annotatorImplementationName>``, contains the fully-qualified class name of our annotator class.
+This is how the UIMA framework determines which annotator class to instantiate.
+
+[[ugr.tug.aae.xml_intro_simple_metadata_attributes]]
+=== Simple Metadata Attributes
+
+[source]
+----
+<analysisEngineMetaData>
+  <name>Room Number Annotator</name> 
+  <description>An example annotator that searches for room numbers in
+     the IBM Watson research buildings.</description> 
+  <version>1.0</version> 
+  <vendor>The Apache Software Foundation</vendor>
+----
+
+Shown here are four simple metadata fields: name, description, version, and vendor.
+Providing values for these fields is optional, but recommended.
+
+[[ugr.tug.aae.xml_intro_type_system_definition]]
+=== Type System Definition
+
+[source]
+----
+<typeSystemDescription>
+  <imports>
+    <import location="TutorialTypeSystem.xml"/>
+  </imports>
+</typeSystemDescription>
+----
+
+This section of the XML descriptor defines which types the annotator works with.
+The recommended way to do this is to _import_ the type system definition from a separate file, as shown here.
+The location specified here should be a relative path, and it will be resolved relative to the location of the descriptor that contains the import.
+It is also possible to define types directly in the Analysis Engine descriptor, but these types will not be easily shareable by others.
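+
+As a rough sketch of the inline alternative, the same section could declare the type in place instead of importing it.
+The type and feature names are the ones this descriptor lists in its capabilities; the description texts, the `uima.tcas.Annotation` supertype, and the `uima.cas.String` range type are assumptions about what `TutorialTypeSystem.xml` contains.
+
+[source]
+----
+<typeSystemDescription>
+  <types>
+    <typeDescription>
+      <name>org.apache.uima.tutorial.RoomNumber</name>
+      <!-- description text is illustrative -->
+      <description>A room number found in the text.</description>
+      <!-- assumed supertype: the standard text annotation type -->
+      <supertypeName>uima.tcas.Annotation</supertypeName>
+      <features>
+        <featureDescription>
+          <name>building</name>
+          <description>Building that contains the room.</description>
+          <!-- assumed range type -->
+          <rangeTypeName>uima.cas.String</rangeTypeName>
+        </featureDescription>
+      </features>
+    </typeDescription>
+  </types>
+</typeSystemDescription>
+----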
+
+[[ugr.tug.aae.xml_intro_capabilities]]
+=== Capabilities
+
+[source]
+----
+<capabilities>
+  <capability>
+    <inputs /> 
+    <outputs>
+      <type>org.apache.uima.tutorial.RoomNumber</type> 
+      <feature>org.apache.uima.tutorial.RoomNumber:building</feature> 
+    </outputs>
+  </capability>
+</capabilities>
+----
+
+The last section of the descriptor describes the _Capabilities_ of the annotator: the Types/Features it consumes (input) and the Types/Features it produces (output).
+These must be the names of types and features that exist in the Analysis Engine descriptor's type system definition.
+
+Our annotator outputs only one Type, `RoomNumber`, and one feature, `RoomNumber:building`.
+The fully-qualified names (including namespace) are needed.
+
+The `building` feature is listed separately here, but specifying every feature of a complex type individually would be cumbersome.
+Therefore, a shortcut syntax exists.
+The `<outputs>` section above could be replaced with the equivalent section:
+
+[source]
+----
+<outputs>
+  <type allAnnotatorFeatures="true">
+     org.apache.uima.tutorial.RoomNumber
+  </type> 
+</outputs>
+----
+
+[[ugr.tug.aae.xml_intro.configuration_parameters]]
+=== Configuration Parameters (Optional)
+
+[[ugr.tug.aae.xml_intro.configuration_parameters_declarations]]
+==== Configuration Parameter Declarations
+
+[source]
+----
+<configurationParameters>
+  <configurationParameter>
+    <name>Patterns</name> 
+    <description>List of room number regular expression patterns.
+    </description> 
+    <type>String</type> 
+    <multiValued>true</multiValued> 
+    <mandatory>true</mandatory> 
+  </configurationParameter>
+  <configurationParameter>
+    <name>Locations</name> 
+    <description>List of locations corresponding to the room number
+       expressions specified by the Patterns parameter.
+    </description> 
+    <type>String</type> 
+    <multiValued>true</multiValued> 
+    <mandatory>true</mandatory> 
+  </configurationParameter>
+</configurationParameters>
+----
+
+The `<configurationParameters>` element contains the definitions of the configuration parameters that our annotator accepts.
+We have declared two parameters.
+For each configuration parameter, the following are specified: 
+
+* *name* – the name that the annotator code uses to refer to the parameter
+* *description* – a natural language description of the intent of the parameter
+* *type* – the data type of the parameter's value; must be one of String, Integer, Float, or Boolean
+* *multiValued* – true if the parameter can take multiple values (an array), false if the parameter takes only a single value
+* *mandatory* – true if a value must be provided for the parameter
+
+Both of our parameters are mandatory and accept an array of Strings as their value.
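+
+For contrast, a declaration for an optional, single-valued parameter might look like the sketch below.
+The `CaseSensitive` parameter is hypothetical and is not part of the RoomNumberAnnotator example; it only illustrates the `false` settings of `<multiValued>` and `<mandatory>` together with a non-String type.
+
+[source]
+----
+<configurationParameter>
+  <!-- hypothetical parameter, not declared by the tutorial annotator -->
+  <name>CaseSensitive</name>
+  <description>Whether pattern matching distinguishes upper and lower case.
+  </description>
+  <type>Boolean</type>
+  <!-- takes a single value rather than an array -->
+  <multiValued>false</multiValued>
+  <!-- a value may be omitted -->
+  <mandatory>false</mandatory>
+</configurationParameter>
+----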
+
+[[ugr.tug.aae.xml_intro_configuration_parameter_settings]]
+==== Configuration Parameter Settings
+
+[source]
+----
+<configurationParameterSettings>
+  <nameValuePair>
+    <name>Patterns</name> 
+    <value>
+      <array>
+        <string>\b[0-4]\d-[0-2]\d\d\b</string> 
+        <string>\b[G1-4][NS]-[A-Z]\d\d\b</string> 
+        <string>\bJ[12]-[A-Z]\d\d\b</string> 
+      </array>
+    </value>
+  </nameValuePair>
+  <nameValuePair>
+    <name>Locations</name> 
+    <value>
+      <array>
+        <string>Watson - Yorktown</string> 
+        <string>Watson - Hawthorne I</string> 
+        <string>Watson - Hawthorne II</string> 
+      </array>
+    </value>
+  </nameValuePair>
+</configurationParameterSettings>
+----
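+
+Each `<nameValuePair>` assigns a value to one of the declared parameters, matched by its `<name>`.
+Because `Patterns` and `Locations` are multi-valued, their values are wrapped in an `<array>` element, and the entries of the two arrays correspond to each other by position.
+A single-valued parameter would carry its value without the `<array>` wrapper.
+The following sketch assumes a hypothetical Boolean parameter named `CaseSensitive`, which the tutorial descriptor does not declare:
+
+[source]
+----
+<nameValuePair>
+  <!-- hypothetical single-valued Boolean parameter -->
+  <name>CaseSensitive</name>
+  <value>
+    <boolean>true</boolean>
+  </value>
+</nameValuePair>
+----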
+
+[[ugr.tug.aae.xml_intro.aggregate]]
+=== Aggregate Analysis Engine Descriptor
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8" ?> 
+<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation> 
+  <primitive>false</primitive> 
+
+  <delegateAnalysisEngineSpecifiers>
+    <delegateAnalysisEngine key="RoomNumber">
+      <import location="../ex2/RoomNumberAnnotator.xml"/> 
+    </delegateAnalysisEngine>
+    <delegateAnalysisEngine key="DateTime">
+      <import location="TutorialDateTime.xml" /> 
+    </delegateAnalysisEngine>
+  </delegateAnalysisEngineSpecifiers>
+----
+
+The first difference between this descriptor and an individual annotator's descriptor is that the `<primitive>` element contains the value ``false``.
+This indicates that this Analysis Engine (AE) is an aggregate AE rather than a primitive AE.
+
+Then, instead of a single annotator class name, we have a list of ``delegateAnalysisEngineSpecifiers``.
+Each specifies one of the components that constitute our Aggregate.
+We refer to each component by the relative path from this XML descriptor to the component AE's XML descriptor.
+
+This list of component AEs does not imply an ordering of them in the execution pipeline.
+Ordering is done by another section of the descriptor:
+
+[source]
+----
+<analysisEngineMetaData>
+  <name>Aggregate AE - Room Number and DateTime Annotators</name> 
+  <description>Detects Room Numbers, Dates, and Times</description> 
+  <flowConstraints>
+    <fixedFlow>
+      <node>RoomNumber</node> 
+      <node>DateTime</node> 
+    </fixedFlow>
+  </flowConstraints>
+----
+
+Here, a `fixedFlow` is adequate, and we specify the exact order in which the AEs will be executed.
+In this case, the order does not really matter, since the RoomNumber and DateTime annotators do not depend on one another.
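+
+Each `<node>` value is the `key` of one of the delegates declared in `<delegateAnalysisEngineSpecifiers>`.
+As an illustration only, running the DateTime annotator first would require nothing more than swapping the two nodes; this variant is a sketch and is not part of the tutorial descriptor:
+
+[source]
+----
+<flowConstraints>
+  <fixedFlow>
+    <!-- node values must match the key attributes of the delegates -->
+    <node>DateTime</node> 
+    <node>RoomNumber</node> 
+  </fixedFlow>
+</flowConstraints>
+----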
+
+Finally, the descriptor has a capabilities section, which has exactly the same syntax as a primitive AE's capabilities section:
+
+[source]
+----
+<capabilities>
+  <capability>
+    <inputs /> 
+    <outputs>
+      <type allAnnotatorFeatures="true">
+        org.apache.uima.tutorial.RoomNumber
+      </type> 
+      <type allAnnotatorFeatures="true">
+        org.apache.uima.tutorial.DateAnnot
+      </type> 
+      <type allAnnotatorFeatures="true">
+        org.apache.uima.tutorial.TimeAnnot
+      </type> 
+    </outputs>
+    <languagesSupported>
+      <language>en</language> 
+    </languagesSupported>
+  </capability>
+</capabilities>
+----
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/common_book_info.adoc b/uimaj-documentation/src/docs/asciidoc/tug/common_book_info.adoc
new file mode 100644
index 0000000..537f3e6
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/common_book_info.adoc
@@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Copyright © 2006, 2021 The Apache Software Foundation
+
+Copyright © 2004, 2006 International Business Machines Corporation
+
+[discrete]
+=== License and Disclaimer
+
+The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); 
+you may not use this documentation except in compliance with the License.  You may obtain a copy of
+the License at
+
+[.text-center]
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, this documentation and its contents are
+distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+either express or implied.  See the License for the specific language governing permissions and
+limitations under the License.
+
+[discrete]
+=== Trademarks
+
+All terms mentioned in the text that are known to be trademarks or service marks have been 
+appropriately capitalized.  Use of such terms in this book should not be regarded as affecting the
+validity of the trademark or service mark.
\ No newline at end of file
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image002.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image002.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image004.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image004.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image006.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image006.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image008.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image008.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image008.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image008.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image010.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image010.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image010.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image010.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image012.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image012.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image012.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image012.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image014.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image014.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image014.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image014.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image016.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image016.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image016.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image016.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image018.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image018.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image018.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image018.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image020.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image020.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image020.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image020.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image022.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image022.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image022.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image022.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image024.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image024.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image024.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image024.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image026.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image026.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image026.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image026.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image028.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image028.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image028.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image028.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image030.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image030.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image030.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image030.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image032.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image032.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image032.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image032.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image034.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image034.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image034.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image034.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image036.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image036.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image036.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image036.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image038.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image038.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image038.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image038.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image040.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image040.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image040.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image040.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image042.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image042.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image042.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image042.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image044.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image044.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image044.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image044.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image046.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image046.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image046.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image046.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image048.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image048.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image048.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image048.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image050.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image050.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.aae/image050.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.aae/image050.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image002.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image002.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image004.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image004.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image006.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.application/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.application/image006.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cas_multiplier/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cas_multiplier/image002.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cas_multiplier/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cas_multiplier/image002.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image002.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image002.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image002.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image002.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image004.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image004.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image006.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image006.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image008.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image008.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image008.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image008.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image010.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image010.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image010.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image010.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image012.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image012.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image012.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image012.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image014.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image014.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image014.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image014.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image018.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image018.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image018.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image018.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image020.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image020.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image020.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image020.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image023.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image023.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image023.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image023.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image026.png b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image026.png
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.cpe/image026.png
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.cpe/image026.png
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image002.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image002.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image002.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image002.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image004.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image004.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image004.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image004.jpg
Binary files differ
diff --git a/uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image006.jpg b/uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image006.jpg
similarity index 100%
rename from uima-docbook-tutorials-and-users-guides/src/docbook/images/tutorials_and_users_guides/tug.fc/image006.jpg
rename to uimaj-documentation/src/docs/asciidoc/tug/images/tutorials_and_users_guides/tug.fc/image006.jpg
Binary files differ
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.aas.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.aas.adoc
new file mode 100644
index 0000000..e4d3fa7
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.aas.adoc
@@ -0,0 +1,190 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.aas]]
+= Annotations, Artifacts, and Sofas
+// <titleabbrev>Annotations, Artifacts &amp; Sofas</titleabbrev>
+
+Up to this point, the documentation has focused on analyzing strings of Unicode text, producing subtypes of Annotations which reference offsets in those strings.
+This chapter generalizes this concept and shows how other kinds of artifacts can be handled, including non-text things like audio and images, and how you can define your own kinds of "`annotations`" for these.
+
+[[ugr.tug.aas.terminology]]
+== Terminology
+
+[[ugr.tug.aas.artifact]]
+=== Artifact
+
+The Artifact is the unstructured thing being analyzed by an annotator.
+It could be an HTML web page, an image, a video stream, a recorded audio conversation, an MPEG-4 stream, etc.
+Artifacts are often restructured in the course of processing to facilitate particular kinds of analysis.
+For instance, an HTML page may be converted into a "`de-tagged`" version.
+Annotators at different places in the pipeline may be analyzing different versions of the artifact.
+
+[[ugr.tug.aas.sofa]]
+=== Subject of Analysis — Sofa
+
+Each representation of an Artifact is called a Subject of Analysis, abbreviated using the acronym "`Sofa`", which stands for __S__ubject __of__ __A__nalysis.
+Annotation metadata, which have explicit designations of sub-regions of the artifact to which they apply, are always associated with a particular Sofa.
+For instance, an annotation over text specifies two features, the begin and end, which represent the character offsets into the text string Sofa being analyzed.
+
+Other examples of representations of Artifacts that could be Sofas include an HTML web page, a detagged web page, the translated text of that document, an audio or video stream, and closed-caption text from a video stream.
+
+Often, there is one Sofa being analyzed in a CAS.
+The next chapter will show how UIMA facilitates working with multiple representations of an artifact at the same time, in the same CAS.
+
+[[ugr.tug.aas.sofa_data_formats]]
+== Formats of Sofa Data
+
+Sofa data can be Java Unicode Strings, Feature Structure arrays of primitive types, or a URI which references remote data available via a network connection.
+
+The arrays of primitive types can be things like byte arrays or float arrays, and are intended to be used for artifacts like audio data, image data, etc.
+
+The URI form holds a URI specification String.
+
+[NOTE]
+====
+Sofa data can be "`serialized`" using an XML format; when it is, the String data being serialized must not include invalid XML characters.
+See <<ugr.tug.xmi_emf.xml_character_issues>>. 
+====
+
+[[ugr.tug.aas.setting_accessing_sofa_data]]
+== Setting and Accessing Sofa Data
+
+[[ugr.tug.aas.setting_sofa_data]]
+=== Setting Sofa Data
+
+When a CAS is created, you can set its Sofa Data just one time; this restriction ensures that metadata describing regions of the Sofa remains valid.
+As a consequence, the following methods that set data for a given Sofa can only be called once for a given Sofa.
+
+The following methods on the CAS set the Sofa Data to one of the 3 formats.
+Assume that the variable "`aCas`" holds a reference to a CAS:
+
+[source]
+----
+aCas.setSofaDataString(document_text_string, mime_type_string);
+aCas.setSofaDataArray(feature_structure_primitive_array, mime_type_string);
+aCas.setSofaDataURI(uri_string, mime_type_string);
+----
+
+In addition, the method `aCas.setDocumentText(document_text_string)` may still be used, and is equivalent to `setSofaDataString(string, "text")`.
+The mime type is currently not used by the UIMA framework, but may be set and retrieved by user code.
+
+Feature Structure primitive arrays are all the UIMA Array types except arrays of Feature Structures, Strings, and Booleans.
+Typically, these are arrays of bytes, but can be other types, such as floats, longs, etc.
+
+The URI string should conform to the standard URI format.
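+
+As a rough sketch of the array form, the following helper stores raw bytes as the Sofa data of a CAS.
+The class name, the method name, and the mime type passed by the caller are illustrative assumptions; only the `CAS` and `ByteArrayFS` calls are taken from the UIMA API.
+
+[source,java]
+----
+import org.apache.uima.cas.ByteArrayFS;
+import org.apache.uima.cas.CAS;
+
+public class SetBinarySofaSketch {
+
+  // Copies the given bytes (for example, audio samples) into a UIMA byte
+  // array and sets that array as the Sofa data of the given CAS view.
+  // Like the other setters, this may be called only once per Sofa.
+  public static void setBinarySofa(CAS aCas, byte[] data, String mimeType) {
+    ByteArrayFS array = aCas.createByteArrayFS(data.length);
+    array.copyFromArray(data, 0, 0, data.length);
+    aCas.setSofaDataArray(array, mimeType);
+  }
+}
+----
+
+A caller might invoke it as `SetBinarySofaSketch.setBinarySofa(aCas, bytes, "application/octet-stream")`.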
+
+[[ugr.tug.aas.accessing_sofa_data]]
+=== Accessing Sofa Data
+
+The analysis algorithms typically work with the Sofa data.
+The following methods on the CAS access the Sofa Data:
+
+[source]
+----
+String           aCas.getDocumentText();
+String           aCas.getSofaDataString();
+FeatureStructure aCas.getSofaDataArray();
+String           aCas.getSofaDataURI();
+String           aCas.getSofaMimeType();
+----
+
+The `getDocumentText` and `getSofaDataString` return the same text string.
+The `getSofaDataURI` returns the URI itself, not the data the URI is pointing to.
+You can use standard Java I/O capabilities to get the data associated with the URI, or use the UIMA Framework Streaming method described next.
+
+[[ugr.tug.aas.accessing_sofa_data_using_java_stream]]
+=== Accessing Sofa Data using a Java Stream
+
+The framework provides a consistent method for accessing the Sofa data, independent of it being stored locally, or accessed remotely using the URI.
+Get a Java InputStream instance from the Sofa data using:
+
+[source]
+----
+InputStream inputStream = aCas.getSofaDataStream();
+----
+
+* If the data is local, this method returns a ByteArrayInputStream. This stream provides bytes. 
++
+** If the Sofa data was set using setDocumentText or setSofaDataString, the String is converted to bytes by using the UTF-8 encoding.
+** If the Sofa data was set as a DataArray, the bytes in the data array are serialized, high-byte first. 
+* If the Sofa data was specified as a URI, this method returns the handle from url.openStream(). Java offers built-in support for several URI schemes including "`FILE:`", "`HTTP:`", "`FTP:`" and has an extensible mechanism, ``URLStreamHandlerFactory``, for customizing access to an arbitrary URI. See more details at http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLStreamHandlerFactory.html . 
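+
+Because String Sofa data is exposed as UTF-8 bytes, code that reads text from the stream has to decode it with that encoding.
+The following sketch uses only JDK classes; the class and method names are hypothetical, and the `ByteArrayInputStream` in `main` merely stands in for the stream that `aCas.getSofaDataStream()` would return for a local String Sofa.
+
+[source,java]
+----
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
+
+public class SofaStreamSketch {
+
+  // Reads the stream to its end and decodes the collected bytes as UTF-8.
+  public static String readUtf8(InputStream in) throws IOException {
+    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
+    byte[] chunk = new byte[4096];
+    int n;
+    while ((n = in.read(chunk)) != -1) {
+      buffer.write(chunk, 0, n);
+    }
+    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
+  }
+
+  public static void main(String[] args) throws IOException {
+    // Stand-in for aCas.getSofaDataStream() (assumption for this sketch).
+    byte[] utf8 = "na\u00efve text".getBytes(StandardCharsets.UTF_8);
+    try (InputStream in = new ByteArrayInputStream(utf8)) {
+      System.out.println(readUtf8(in));
+    }
+  }
+}
+----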
+
+
+[[ugr.tug.aas.sofa_fs]]
+== The Sofa Feature Structure
+
+Information about a Sofa is contained in a special built-in Feature Structure of type ``uima.cas.Sofa``.
+This feature structure is created and managed by the UIMA Framework; users must not create it directly.
+Although these Sofa type instances are implemented as standard feature structures, __generic CAS APIs can not be used to create Sofas or set their features__.
+Instead, Sofas are created implicitly by the creation of new CAS views.
+Similarly, Sofa features are set by CAS methods such as ``cas.setDocumentText()``.
+
+Features of the Sofa type include
+
+* SofaID: Every Sofa in a CAS has a unique SofaID. SofaIDs are the primary handle for access. This ID is often the same as the name string given to the Sofa by the developer, but it can be xref:tug.adoc#ugr.tug.mvs.sofa_name_mapping[mapped to a different name].
+* Mime type: This string feature can be used to describe the type of the data represented by a Sofa. It is not used by the framework; the framework provides APIs to set and get its value.
+* Sofa Data: The Sofa data itself. This data can be resident in the CAS or it can be a reference to data outside the CAS. 
+
+
+[[ugr.tug.aas.annotations]]
+== Annotations
+
+Annotators add metadata about a Sofa to the CAS.
+It is often useful to have this metadata denote a region of the Sofa to which it applies.
+For instance, assuming the Sofa is a String, the metadata might describe a particular substring as the name of a person.
+The built-in UIMA type, `uima.tcas.Annotation`, has two extra features that enable this, the begin and end features, which denote character offsets into the text string being analyzed.
+
+The concept of "`annotations`" can be generalized for non-string kinds of Sofas.
+For instance, an audio stream might have an audio annotation which describes sound regions in terms of floating point time offsets in the Sofa.
+An image annotation might use two pairs of x,y coordinates to define the region the annotation applies to.
+
+[[ugr.tug.aas.built_in_annotation_types]]
+=== Built-in Annotation types
+
+The built-in CAS type, ``uima.tcas.Annotation``, is just one kind of definition of an Annotation.
+It was designed for annotating text strings, and has begin and end features that identify the substring of the Sofa being annotated.
+
+For applications which have other kinds of Sofas, the UIMA developer will design their own kinds of Annotation types, as needed to describe an annotation, by declaring new types which are subtypes of ``uima.cas.AnnotationBase``.
+For instance, for images, you might have the concept of a rectangular region to which the annotation applies.
+In this case, you might describe the region with 2 pairs of x, y coordinates.
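+
+One possible way to declare such a type programmatically is sketched below.
+The type name `org.example.ImageRegion` and the feature names `x1`, `y1`, `x2`, and `y2` are invented for this illustration; in practice, types are more commonly declared in an XML type system descriptor.
+
+[source,java]
+----
+import org.apache.uima.UIMAFramework;
+import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.FeatureStructure;
+import org.apache.uima.cas.Type;
+import org.apache.uima.resource.metadata.TypeDescription;
+import org.apache.uima.resource.metadata.TypeSystemDescription;
+
+public class ImageRegionSketch {
+
+  // Hypothetical type name, used only in this example.
+  public static final String TYPE_NAME = "org.example.ImageRegion";
+
+  // Describes a subtype of uima.cas.AnnotationBase with two x,y pairs.
+  public static TypeSystemDescription describeTypeSystem() {
+    TypeSystemDescription tsd =
+        UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();
+    TypeDescription region =
+        tsd.addType(TYPE_NAME, "Rectangular image region", CAS.TYPE_NAME_ANNOTATION_BASE);
+    for (String name : new String[] {"x1", "y1", "x2", "y2"}) {
+      region.addFeature(name, "Pixel coordinate", CAS.TYPE_NAME_INTEGER);
+    }
+    return tsd;
+  }
+
+  // Creates one region in the given CAS view; the view's type system
+  // must already contain the type described above.
+  public static FeatureStructure createRegion(CAS aView, int x1, int y1, int x2, int y2) {
+    Type type = aView.getTypeSystem().getType(TYPE_NAME);
+    FeatureStructure region = aView.createFS(type);
+    region.setIntValue(type.getFeatureByBaseName("x1"), x1);
+    region.setIntValue(type.getFeatureByBaseName("y1"), y1);
+    region.setIntValue(type.getFeatureByBaseName("x2"), x2);
+    region.setIntValue(type.getFeatureByBaseName("y2"), y2);
+    return region;
+  }
+}
+----
+
+A CAS for this type system could be obtained with `CasCreationUtils.createCas(ImageRegionSketch.describeTypeSystem(), null, null)`.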
+
+[[ugr.tug.aas.annotations_associated_sofa]]
+=== Annotations have an associated Sofa
+
+Annotations are always associated with a particular Sofa.
+In subsequent chapters, you will learn how there can be multiple Sofas associated with an artifact; which Sofa an annotation refers to is described by the Annotation feature structure itself.
+
+All annotation types extend from the built-in type uima.cas.AnnotationBase.
+This type has one feature, a reference to the Sofa associated with the annotation.
+This value is currently used by the Framework to support the getCoveredText() method on the annotation instance - this returns the portion of a text Sofa that the annotation spans.
+It is also used to ensure that the Annotation is indexed only in the CAS View associated with this Sofa.
+
+[[ugr.tug.aas.annotationbase]]
+== AnnotationBase
+
+A built-in type, ``uima.cas.AnnotationBase``, is provided by UIMA to allow users to extend the Annotation capabilities to different kinds of Annotations.
+The `AnnotationBase` type has one feature, named ``sofa``, which holds a reference to the `Sofa` feature structure with which this annotation is associated.
+The `sofa` feature is set automatically when an annotation is created (that is, an instance of any type derived from the built-in `uima.cas.AnnotationBase` type); it should not be set by the user.
+
+There is one method, `getView()`, provided by `AnnotationBase` that returns the CAS View for the Sofa the annotation is pointing at.
+Note that this method always returns a CAS, even when applied to JCas annotation instances.
+
+The built-in type `uima.tcas.Annotation` extends `uima.cas.AnnotationBase` and adds two features, a begin and an end feature, which are suitable for identifying a span in a text string that the annotation applies to.
+Users may define other extensions to `AnnotationBase` with alternative specifications that can denote a particular region within the subject of analysis, as appropriate to their application.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.application.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.application.adoc
new file mode 100644
index 0000000..a8c938d
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.application.adoc
@@ -0,0 +1,1380 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.application]]
+= Application Developer's Guide
+
+This chapter describes how to develop an application using the Unstructured Information Management Architecture (UIMA). The term _application_ describes a program that provides end-user functionality.
+A UIMA application incorporates one or more UIMA components such as Analysis Engines, Collection Processing Engines, a Search Engine, and/or a Document Store and adds application-specific logic and user interfaces.
+
+[[ugr.tug.appication.uimaframework_class]]
+== The UIMAFramework Class
+
+An application developer's starting point for accessing UIMA framework functionality is the `org.apache.uima.UIMAFramework` class.
+The following is a short introduction to some important methods on this class.
+Several of these methods are used in examples in the rest of this chapter.
+For more details, see the Javadocs (in the docs/api directory of the UIMA SDK). 
+
+* UIMAFramework.getXMLParser(): Returns an instance of the UIMA XML Parser class, which then can be used to parse the various types of UIMA component descriptors. Examples of this can be found in the remainder of this chapter.
+* UIMAFramework.produceXXX(ResourceSpecifier): There are various produce methods that are used to create different types of UIMA components from their descriptors. The argument type, ResourceSpecifier, is the base interface that subsumes all types of component descriptors in UIMA. You can get a ResourceSpecifier from the XMLParser. Examples of produce methods are: 
++
+** produceAnalysisEngine
+** produceCasConsumer
+** produceCasInitializer
+** produceCollectionProcessingEngine
+** produceCollectionReader
+
+There are other variations of each of these methods that take additional, optional arguments.
+See the Javadocs for details. 
+* UIMAFramework.getLogger(<optional-logger-name>): Gets a reference to the UIMA Logger, to which you can write log messages. If no logger name is passed, the name of the returned logger instance is "`org.apache.uima`".
+* UIMAFramework.getVersionString(): Gets the number of the UIMA version you are using.
+* UIMAFramework.newDefaultResourceManager(): Gets an instance of the UIMA ResourceManager. The key method on ResourceManager is setDataPath, which allows you to specify the location where UIMA components will go to look for their external resources. Once you've obtained and initialized a ResourceManager, you can pass it to any of the produceXXX methods. 
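+
+A minimal sketch that combines several of these methods is shown below.
+The class name, the method names, and the path arguments are placeholders chosen for this example, and error handling is reduced to `throws Exception`.
+
+[source,java]
+----
+import org.apache.uima.UIMAFramework;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.resource.ResourceManager;
+import org.apache.uima.resource.ResourceSpecifier;
+import org.apache.uima.util.XMLInputSource;
+
+public class DataPathSketch {
+
+  // Creates a ResourceManager that looks for external resources in dataPath.
+  public static ResourceManager newResourceManager(String dataPath) throws Exception {
+    ResourceManager resourceManager = UIMAFramework.newDefaultResourceManager();
+    resourceManager.setDataPath(dataPath);
+    return resourceManager;
+  }
+
+  // Parses a descriptor and produces an AE that uses the given ResourceManager.
+  public static AnalysisEngine produce(String descriptorPath, ResourceManager resourceManager)
+      throws Exception {
+    XMLInputSource in = new XMLInputSource(descriptorPath);
+    ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
+    // The third argument holds optional additional parameters; none are used here.
+    return UIMAFramework.produceAnalysisEngine(specifier, resourceManager, null);
+  }
+}
+----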
+
+
+[[ugr.tug.application.using_aes]]
+== Using Analysis Engines
+
+This section describes how to add analysis capability to your application by using Analysis Engines developed using the UIMA SDK.
+An _Analysis Engine (AE)_ is a component that analyzes artifacts (e.g. documents) and infers information about them.
+
+An Analysis Engine consists of two parts - Java classes (typically packaged as one or more JAR files) and _AE descriptors_ (one or more XML files). You must put the Java classes in your application's class path, but thereafter you will not need to directly interact with them.
+The UIMA framework insulates you from this by providing a standard `AnalysisEngine` interface.
+
+The term _Text Analysis Engine (TAE)_ is sometimes used to describe an Analysis Engine that analyzes a text document.
+In the UIMA SDK v1.x, there was a TextAnalysisEngine interface that was commonly used.
+However, as of the UIMA SDK v2.0, this interface has been deprecated and all applications should switch to using the standard AnalysisEngine interface.
+
+The AE descriptor XML files contain the configuration settings for the Analysis Engine as well as a description of the AE's input and output requirements.
+You may need to edit these files in order to configure the AE appropriately for your application - the supplier of the AE may have provided documentation (or comments in the XML descriptor itself) about how to do this.
+
+[[ugr.tug.application.instantiating_an_ae]]
+=== Instantiating an Analysis Engine
+
+The following code shows how to instantiate an AE from its XML descriptor: 
+[source]
+----
+  //get Resource Specifier from XML file
+XMLInputSource in = new XMLInputSource("MyDescriptor.xml");
+ResourceSpecifier specifier = 
+    UIMAFramework.getXMLParser().parseResourceSpecifier(in);
+
+  //create AE here
+AnalysisEngine ae = 
+    UIMAFramework.produceAnalysisEngine(specifier);
+----
+
+The first two statements parse the XML descriptor (for AEs with multiple descriptor files, one of them is the "`main`" descriptor - the AE documentation should indicate which it is). The result of the parse is a `ResourceSpecifier` object.
+The third statement invokes a static factory method, `UIMAFramework.produceAnalysisEngine`, which takes the specifier and instantiates an `AnalysisEngine` object.
+
+There is one caveat to using this approach - the Analysis Engine instance that you create will not support multiple threads running through it concurrently.
+If you need to support this, see <<ugr.tug.applications.multi_threaded>>.
+
+[[ugr.tug.application.analyzing_text_documents]]
+=== Analyzing Text Documents
+
+There are two ways to use the AE interface to analyze documents.
+You can either use the __xref:ref.adoc#ugr.ref.jcas[JCas]__ interface or you can directly use the __xref:ref.adoc#ugr.ref.cas[CAS]__ interface.
+Besides text documents, xref:tug.adoc#ugr.tug.aas[other kinds of artifacts] can also be analyzed.
+
+The basic structure of your application will look similar in both cases:
+
+.Using the JCas 
+[source]
+----
+  //create a JCas, given an Analysis Engine (ae)
+JCas jcas = ae.newJCas();
+  
+  //analyze a document
+jcas.setDocumentText(doc1text);
+ae.process(jcas);
+doSomethingWithResults(jcas);
+jcas.reset();
+  
+  //analyze another document
+jcas.setDocumentText(doc2text);
+ae.process(jcas);
+doSomethingWithResults(jcas);
+jcas.reset();
+...
+  //done
+ae.destroy();
+----
+
+.Using the CAS 
+[source]
+----
+//create a CAS
+CAS aCasView = ae.newCAS();
+
+//analyze a document
+aCasView.setDocumentText(doc1text);
+ae.process(aCasView);
+doSomethingWithResults(aCasView);
+aCasView.reset();
+
+//analyze another document
+aCasView.setDocumentText(doc2text);
+ae.process(aCasView);
+doSomethingWithResults(aCasView);
+aCasView.reset();
+...
+//done
+ae.destroy();
+----
+
+First, you create the CAS or JCas that you will use.
+Then, you repeat the following four steps for each document:
+
+. Put the document text into the CAS or JCas.
+. Call the AE's process method, passing the CAS or JCas as an argument
+. Do something with the results that the AE has added to the CAS or JCas
+. Call the CAS's or JCas's reset() method to prepare for another analysis 
+
+
+[[ugr.tug.applications.analyzing_non_text_artifacts]]
+=== Analyzing Non-Text Artifacts
+
+Analyzing non-text artifacts is similar to analyzing text documents.
+The main difference is that instead of using the `setDocumentText` method, you need to use the Sofa APIs to xref:tug.adoc#ugr.tug.aas[set the artifact] into the CAS.
+
+[[ugr.tug.applications.accessing_analysis_results]]
+=== Accessing Analysis Results
+
+Annotators (and applications) access the results of analysis via the CAS, using the CAS or JCas interfaces.
+These results are accessed using the CAS Indexes.
+There is one built-in index for instances of the built-in type `uima.tcas.Annotation` that can be used to retrieve instances of `Annotation` or any subtype of Annotation.
+You can also define additional indexes over other types. 
+
+Indexes provide a method to obtain an iterator over their contents; the iterator returns the matching elements one at a time from the CAS.
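+
+For instance, a simple loop over the built-in annotation index might look like the following sketch.
+The class and method names are made up for this illustration; the bundled `PrintAnnotations` example is the more complete reference.
+
+[source,java]
+----
+import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.FSIterator;
+import org.apache.uima.cas.text.AnnotationFS;
+import org.apache.uima.cas.text.AnnotationIndex;
+
+public class IterateAnnotationsSketch {
+
+  // Prints every annotation in the view's built-in annotation index
+  // and returns how many annotations were visited.
+  public static int printAnnotations(CAS aCasView) {
+    AnnotationIndex<AnnotationFS> index = aCasView.getAnnotationIndex();
+    FSIterator<AnnotationFS> iterator = index.iterator();
+    int count = 0;
+    while (iterator.hasNext()) {
+      AnnotationFS annotation = iterator.next();
+      System.out.println(annotation.getType().getName()
+          + " [" + annotation.getBegin() + ", " + annotation.getEnd() + "): "
+          + annotation.getCoveredText());
+      count++;
+    }
+    return count;
+  }
+}
+----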
+
+[[ugr.tug.applications.accessing_results_using_jcas]]
+==== Accessing Analysis Results using the JCas
+
+See:
+
+* xref:#ugr.tug.aae.reading_results_previous_annotators[xrefstyle=full];
+* xref:ref.adoc#ugr.ref.jcas[JCas Reference];
+* The Javadocs for `org.apache.uima.jcas.JCas`. 
+
+
+[[ugr.tug.application.accessing_results_using_cas]]
+==== Accessing Analysis Results using the CAS
+
+See:
+
+* xref:ref.adoc#ugr.ref.cas[CAS Reference]
+* The source code for `org.apache.uima.examples.PrintAnnotations`, which is in `examples\src`.
+* The Javadocs for the `org.apache.uima.cas` and `org.apache.uima.cas.text` packages. 
+
+
+[[ugr.tug.applications.multi_threaded]]
+=== Multi-threaded Applications
+
+You may be running on a multi-core system, and want to run multiple CASes at once through your pipeline.
+To support this, UIMA provides multiple approaches.
+The most flexible and recommended way to do this is to use the features of UIMA-AS, which not only allows scale-up (multiple threads in one CPU), but also supports scale-out (exploiting a cluster of machines).
+
+This section describes the simplest way to use an AE in a multi-threaded environment.
+First, note that most Analysis Engines are written with the assumption that only one thread will be accessing them at any one time; that is, Analysis Engines are not written to be thread-safe.
+Their authors assume that multiple instances of the Analysis Engine class will be instantiated as needed to support multiple threads.
+
+If your application has multiple threads that might invoke an Analysis Engine, you can use the Java `synchronized` keyword to ensure that only one thread at a time uses the CAS and runs the AE.
+For example: 
+
+[source]
+----
+public class MyApplication {
+  private AnalysisEngine mAnalysisEngine;
+  private CAS mCAS;
+
+  public MyApplication() {
+    //get Resource Specifier from XML file
+    XMLInputSource in = new XMLInputSource("MyDescriptor.xml");
+    ResourceSpecifier specifier = 
+        UIMAFramework.getXMLParser().parseResourceSpecifier(in);
+ 
+    //create Analysis Engine here
+    mAnalysisEngine = UIMAFramework.produceAnalysisEngine(specifier);
+    mCAS = mAnalysisEngine.newCAS();
+  }
+
+  // Assume some other part of your multi-threaded application could
+  // call analyzeDocument on different threads, asynchronously
+
+  public synchronized void analyzeDocument(String aDoc) {
+    //analyze a document
+    mCAS.setDocumentText(aDoc);
+    mAnalysisEngine.process(mCAS);
+    doSomethingWithResults(mCAS);
+    mCAS.reset();
+  }
+  ...
+}
+----
+
+Without the synchronized keyword, this application would not be thread-safe.
+If multiple threads called the `analyzeDocument` method simultaneously, they would all use the same CAS and clobber each other's results.
+The synchronized keyword ensures that no more than one thread is executing this method at any given time.
+For more information on thread synchronization in Java, see link:http://docs.oracle.com/javase/tutorial/essential/concurrency/[].
+
+The synchronized keyword ensures thread-safety, but does not allow you to process more than one document at a time.
+If you need to process multiple documents simultaneously (for example, to make use of a multiprocessor machine), you'll need to use more than one CAS instance.
+
+Because CAS instances use memory and can take some time to construct, you don't want to create a new CAS instance for each request.
+Instead, you should use a feature of the UIMA SDK called the __CAS Pool__, implemented by the type `CasPool`.
+
+A CAS Pool contains some number of CAS instances (you specify how many when you create the pool). When a thread wants to use a CAS, it _checks out_ an instance from the pool.
+When the thread is done using the CAS, it must _release_ the CAS instance back into the pool.
+If all instances are checked out, additional threads will block and wait for an instance to become available.
+Here is some example code: 
+
+[source]
+----
+public class MyApplication {
+  private CasPool mCasPool;
+  
+  private AnalysisEngine mAnalysisEngine;
+  
+  public MyApplication()
+  {
+    //get Resource Specifier from XML file
+    XMLInputSource in = new XMLInputSource("MyDescriptor.xml");
+    ResourceSpecifier specifier = 
+      UIMAFramework.getXMLParser().parseResourceSpecifier(in);
+ 
+    //Create multithreadable AE that will 
+    //Accept 3 simultaneous requests
+    //The 3rd parameter specifies a timeout.
+    //When the number of simultaneous requests exceeds 3,
+    // additional requests will wait for other requests to finish. 
+    // This parameter determines the maximum number of milliseconds 
+    // that a new request should wait before throwing an exception
+    // - a value of 0 will cause them to wait forever.
+    mAnalysisEngine = UIMAFramework.produceAnalysisEngine(specifier,3,0);
+
+    //create CAS pool with 3 CAS instances
+    mCasPool = new CasPool(3, mAnalysisEngine);
+  }
+
+  // Notice this is no longer "synchronized"
+  public void analyzeDocument(String aDoc) {
+    //check out a CAS instance (argument 0 means no timeout)
+    CAS cas = mCasPool.getCas(0);  
+    try {
+      //analyze a document 
+      cas.setDocumentText(aDoc);   
+      mAnalysisEngine.process(cas);  
+      doSomethingWithResults(cas);
+    } finally {
+      //MAKE SURE we release the CAS instance
+      mCasPool.releaseCas(cas);  
+    }
+  }
+  ...
+}
+----
+
+There is not much more code required here than in the previous example.
+First, there is one additional parameter to the AnalysisEngine producer, specifying the number of annotator instances to create.footnote:[Both the UIMA Collection Processing Manager framework and the remote deployment services framework have implementations which use CAS pools in this manner, and thereby relieve the annotator developer of the necessity to make their annotators thread-safe.]
+Then, instead of creating a single CAS in the constructor, we now create a CasPool containing 3 instances.
+In the analyze method, we check out a CAS, use it, and then release it.
+
+[NOTE]
+====
+Frequently, the two numbers (number of CASes, and the number of AEs) will be the same.
+It would not make sense to have the number of CASes less than the number of AEs -- the extra AE instances would always block waiting for a CAS from the pool.
+It could make sense to have additional CASes, though -- if you had other multi-threaded processes that were using the CASes, other than the AEs. 
+====
+
+The `getCas()` method returns a CAS which is not specialized to any particular subject of analysis.
+To work with particular subjects of analysis, please refer to xref:#ugr.tug.aas[].
+
+Note the use of the `try`...`finally` block.
+This is very important, as it ensures that the CAS we have checked out will be released back into the pool, even if the analysis code throws an exception.
+You should always use `try`...`finally` when using the CAS pool; if you do not, you risk exhausting the pool and causing deadlock.
+
+The parameter 0 passed to the `CasPool.getCas()` method is a timeout value.
+If this is set to a positive integer, it is the maximum number of milliseconds that the thread will wait for an instance to become available in the pool.
+If this time elapses, the getCas method will return null, and the application can do something intelligent, like ask the user to try again later.
+A value of 0 will cause the thread to wait for an available CAS, potentially forever.
+
+All of this can better be done using UIMA-AS.
+Besides taking care of setting up the CAS pools, etc., UIMA-AS allows a pipeline having several delegates to be scaled up optimally for each delegate; one delegate might have 5 instances, while another might have 3.
+It also does a different kind of initialization, in that it creates a thread pool itself, and ensures that each annotator instance gets its `process()` method called using the same thread that was used for that annotator instance's initialization call; some annotators could be written assuming that this is the case.
+
+[[ugr.tug.application.using_multiple_aes]]
+=== Using Multiple Analysis Engines and Creating Shared CASes
+
+In most cases, the easiest way to use multiple Analysis Engines from within an application is to combine them into an xref:tug.adoc#ugr.tug.aae.building_aggregates[aggregate AE].
+Be sure that you understand this method before deciding to use the more advanced feature described in this section.
+
+If you decide that your application does need to instantiate multiple AEs and have those AEs share a single CAS, then you will no longer be able to use the various methods on the `AnalysisEngine` class that create CASes (or JCases) to create your CAS.
+This is because these methods create a CAS with a data model specific to a single AE and which therefore cannot be shared by other AEs.
+Instead, you create a CAS as follows:
+
+Suppose you have two analysis engines, and one CAS Consumer, and you want to create one type system from the merge of all of their type specifications.
+Then you can do the following:
+
+[source]
+----
+AnalysisEngineDescription aeDesc1 =
+    UIMAFramework.getXMLParser().parseAnalysisEngineDescription(...);
+
+AnalysisEngineDescription aeDesc2 =
+    UIMAFramework.getXMLParser().parseAnalysisEngineDescription(...);
+
+CasConsumerDescription ccDesc =
+    UIMAFramework.getXMLParser().parseCasConsumerDescription(...);
+
+// collect the descriptors whose type systems should be merged
+List<MetaDataObject> list = new ArrayList<>();
+list.add(aeDesc1);
+list.add(aeDesc2);
+list.add(ccDesc);
+
+// create a CAS whose type system is the merge of all three
+CAS cas = CasCreationUtils.createCas(list);
+
+// (optional, if using the JCas interface)
+JCas jcas = cas.getJCas();
+----
+
+The CasCreationUtils class takes care of the work of merging the AEs' type systems and producing a CAS for the combined type system.
+If the type systems are not compatible, an exception will be thrown.
+
+[[ugr.tug.application.saving_cases_to_file_systems]]
+=== Saving CASes to file systems or general Streams
+
+The UIMA framework provides multiple APIs to save and restore the contents of a CAS to streams.
+Two common uses of this are to save CASes to the file system, and to send CASes to other processes, running on remote systems.
+
+The CASes can be serialized in multiple formats: 
+
+* Binary formats: 
++
+** Plain binary: This is used to communicate with remote services, and also for interfacing from Java with annotators written in C/C++ or related languages via the Java Native Interface (JNI).
+** Compressed binary: There are two forms of xref:ref.adoc#ugr.ref.compress.overview[compressed binary]. The recommended one is form 6, which also allows type filtering.
+* XML formats: There are two forms of this format. The preferred one is the xref:ref.adoc#ugr.ref.xmi[UIMA CAS XMI]. An older format is also available, called XCAS.
+* JSON formats: There is a link:https://github.com/apache/uima-uimaj-io-jsoncas#readme[UIMA CAS JSON] (de)serializer for the CAS available as a separate library. The UIMA CAS JSON format is also supported by the Python library link:https://github.com/dkpro/dkpro-cassis#readme[DKPro Cassis]. There is also an xref:ref.adoc#ugr.ref.json.overview[older JSON serializer] included in the UIMA Java SDK, but it only supports serialization.
+* Java Object serialization: There are APIs to convert a CAS to a Java object that can be serialized and deserialized using the standard Java object stream read and write methods. There is also a way to include the CAS's type system and index definition.
+
+Each of these serializations has different capabilities, summarized in the table below. 
+
+.Serialization Capabilities
+[cols="1,1,1,1,1,1,1,1", frame="all", options="header"]
+|===
+| 
+| XCAS
+| XMI
+| JSON
+| Binary
+| Cmpr 4
+| Cmpr 6
+| JavaObj
+
+|Output
+|Output Stream
+|Output Stream
+|Output Stream, File, Writer
+|Output Stream
+|Output Stream, Data Output Stream, File
+|Output Stream, Data Output Stream, File
+|-
+
+|Lists/Arrays inline formatting?
+|-
+|Yes
+|Yes
+|-
+|-
+|-
+|-
+
+|Formatted?
+|-
+|Yes
+|Yes
+|-
+|-
+|-
+|-
+
+|Type Filtering?
+|-
+|Yes
+|Yes
+|-
+|-
+|Yes
+|-
+
+|Delta Cas?
+|-
+|Yes
+|-
+|Yes
+|Yes
+|Yes
+|-
+
+|OOTS?
+|Yes
+|Yes
+|-
+|-
+|-
+|-
+|-
+
+|Only send indexed + reachable FSs?
+|Yes
+|Yes
+|Yes
+|send all
+|send all
+|Yes
+|send all
+
+|Name Space / Schemas?
+|-
+|Yes
+|-
+|-
+|-
+|-
+|-
+
+|lenient available?
+|Yes
+|Yes
+|-
+|-
+|-
+|Yes
+|-
+
+|optionally include embedded Type System and Indexes definition?
+|-
+|-
+|Just type system
+|Yes
+|Yes
+|Yes
+|Yes
+|===
+
+In the above table, Cmpr 4 and Cmpr 6 refer to Compressed forms of the serialization, and JavaObj refers to Java Object serialization.
+
+For the XMI and the old JSON format, lists and arrays can sometimes be formatted "inline". In this representation, the elements are formatted directly as the value of a particular feature.
+This is only done if the arrays and lists are not multiply-referenced.
+
+Type Filtering support enables only a subset of the types and/or features to be serialized.
+An additional type system object is used to specify the types to be included in the serialization.
+This can be useful, for instance, when sending a CAS to a remote service, where the remote service only uses a small number of the types and features, to reduce the size of the serialized CAS.
+
+Delta Cas support makes use of a "mark" set in the CAS, and only serializes changes in the CAS, both new and modified Feature Structures, that were added or changed after the mark was set.
+This is useful for remote services, supporting the use-case where a large CAS is sent to the service, which sets the mark in the received CAS, and then adds a small amount of information;  the Delta CAS then serializes only that small amount as the "reply" sent back to the sender.
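+
+The mark-and-delta idea can be illustrated with a small, self-contained model.
+This is only a sketch under stated assumptions: `ToyCas` and its methods are hypothetical names, not UIMA APIs, and the model tracks only additions, whereas a real Delta CAS also carries modifications to Feature Structures that existed before the mark.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical toy model, not the UIMA CAS: entries are kept in creation order.
+final class ToyCas {
+  private final List<String> featureStructures = new ArrayList<>();
+  private int mark = 0; // index of the first entry created after the mark
+
+  void add(String fs) {
+    featureStructures.add(fs);
+  }
+
+  // Remember the current size; everything added later is "after the mark".
+  void mark() {
+    mark = featureStructures.size();
+  }
+
+  // A full serialization would send every entry.
+  List<String> full() {
+    return new ArrayList<>(featureStructures);
+  }
+
+  // A delta serialization sends only the entries added after the mark.
+  List<String> delta() {
+    return new ArrayList<>(featureStructures.subList(mark, featureStructures.size()));
+  }
+}
+----
+
+In the remote-service use-case, the service would call `mark()` right after receiving the CAS and return only `delta()` as its reply.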
+
+OOTS means "Out of Type System" support, intended to support the use-case where a CAS is being sent to a remote application.
+This supports deserializing an incoming CAS where some of the types and/or features may not be present in the receiving CAS's type system.
+A "lenient"  option on the deserialization permits the deserialization to proceed, with the out-of-type-system information preserved so that when the CAS is subsequently reserialized (in the use-case, to be  returned back to the sender), the out-of-type-system information is re-merged back into the output stream. 
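+
+The set-aside and re-merge behavior described above can be sketched as follows.
+This is a simplified illustration, not the UIMA implementation: `OotsRoundTrip` is a hypothetical class, and Feature Structures are reduced to a map from type name to an opaque payload.
+
+[source,java]
+----
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Set;
+
+// Hypothetical model of out-of-type-system preservation across a round trip.
+final class OotsRoundTrip {
+  // Entries whose type the receiver's type system declares.
+  final Map<String, String> known = new LinkedHashMap<>();
+  // Entries of undeclared types, kept opaque so they can be sent back later.
+  final Map<String, String> setAside = new LinkedHashMap<>();
+
+  // Split incoming entries by whether the receiver knows their type.
+  void deserialize(Map<String, String> incoming, Set<String> receiverTypes) {
+    for (Map.Entry<String, String> e : incoming.entrySet()) {
+      if (receiverTypes.contains(e.getKey())) {
+        known.put(e.getKey(), e.getValue());
+      } else {
+        setAside.put(e.getKey(), e.getValue());
+      }
+    }
+  }
+
+  // Re-merge the preserved entries when producing the reply.
+  Map<String, String> reserialize() {
+    Map<String, String> out = new LinkedHashMap<>(known);
+    out.putAll(setAside);
+    return out;
+  }
+}
+----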
+
+The Binary, Java Object, and Compressed Form 4 serializations send all the Feature Structures in the CAS, in the order they were created in the CAS.
+The other methods only  send Feature Structures that are reachable, either by  their being in some CAS index, or being referenced  as a feature of another Feature Structure which is reachable.
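+
+The reachability rule amounts to a graph traversal that starts from the indexed Feature Structures and follows feature references.
+The sketch below shows that traversal on a toy graph; the class name and the use of plain string ids are assumptions made for illustration and do not reflect how UIMA stores Feature Structures.
+
+[source,java]
+----
+import java.util.ArrayDeque;
+import java.util.Collection;
+import java.util.Deque;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+// Hypothetical toy model: Feature Structures are represented by string ids.
+final class Reachability {
+  // indexed: ids present in some index; refs: id -> ids referenced by its features.
+  static Set<String> reachable(Collection<String> indexed, Map<String, List<String>> refs) {
+    Set<String> seen = new LinkedHashSet<>();
+    Deque<String> work = new ArrayDeque<>(indexed);
+    while (!work.isEmpty()) {
+      String id = work.pop();
+      if (!seen.add(id)) {
+        continue; // already visited, so cycles terminate
+      }
+      List<String> targets = refs.get(id);
+      if (targets != null) {
+        work.addAll(targets);
+      }
+    }
+    return seen;
+  }
+}
+----
+
+An entry that is neither indexed nor referenced from a reachable entry never enters the result, which is why such Feature Structures are omitted by the serializations that apply this rule.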
+
+The NameSpace/Schema support allows specifying a set of schemas, each one corresponding to a particular namespace, used in XMI serialization.
+
+Lenient allows the receiving Type System to be missing types and/or features that are being deserialized.
+Normally this causes an exception, but with the lenient flag turned on, these extra types and/or features are  skipped over and ignored, with no error indicated.
+
+Some formats optionally allow embedded type system and indexes definition to be saved;  loaders for these can use that information to replace the CAS's type system and indexes definition, or (for compressed form 6) use the type system part to decode the serialized data.
+This is described in detail in the Javadocs for CasIOUtils.
+JSON serialization has several alternatives for optionally including portions of the type system, described in the reference document chapter on JSON.
+
+To save an XMI representation of a CAS, use the `save` method in `CasIOUtils` or the `serialize` method of the class ``org.apache.uima.util.XmlCasSerializer``.
+To save an XCAS representation of a CAS, use the `save` method in the `CasIOUtils` class or use the `org.apache.uima.cas.impl.XCASSerializer` class; see the Javadocs for details.
+
+All the external serialized forms (except JSON and the inline CAS approximate serialization)  can be read back in using the `CasIOUtils load` methods.
+The `CasIOUtils load` methods also have API forms that support loading type system and index definition information at the same time (from additional input sources); there is also a form for loading compressed form 6 where you can pass the type system to use for decoding, when it is different from that of the receiving CAS.
+The XCAS and XMI external forms can also be read back in using the `deserialize` method of the class ``org.apache.uima.util.XmlCasDeserializer``.
+All of these methods deserialize into a pre-existing CAS, which you must create ahead of time.
+See the Javadocs for details.
+
+The `Serialization` class has various static methods for serializing and deserializing Java Object forms and  compressed forms, with finer control over available options.
+See the Javadocs for that class for details.
+
+Several of the APIs use or return instances of ``SerialFormat``, which is an enum specifying the various forms of serialization.
+
+Serialization often makes use of temporary extra data structures, anchored from the CAS being serialized.
+These are read/write, and because of this, most serializations are synchronized to prevent multiple serializations of the same CAS from happening in parallel.
+
+[[ugr.tug.application.using_cpes]]
+== Using Collection Processing Engines
+
+A __xref:tug.adoc#ugr.tug.cpe[Collection Processing Engine (CPE)]__ processes collections of artifacts (documents) through the combination of the following components: a Collection Reader, an optional CAS Initializer, Analysis Engines, and CAS Consumers.
+
+Like Analysis Engines, CPEs consist of a set of Java classes and a set of descriptors.
+You need to make sure the Java classes are in your classpath, but otherwise you only deal with descriptors.
+
+[[ugr.tug.application.running_a_cpe_from_a_descriptor]]
+=== Running a Collection Processing Engine from a Descriptor
+
+xref:#ugr.tug.cpe.running_cpe_from_application[xrefstyle=full] describes how to use the APIs to read a CPE descriptor and run it from an application.
+
+[[ugr.tug.application.configuring_a_cpe_descriptor_programmatically]]
+=== Configuring a Collection Processing Engine Descriptor Programmatically
+// <titleabbrev>Configuring a CPE Descriptor Programmatically</titleabbrev>
+
+For the finest level of control over the CPE descriptor settings, the CPE offers programmatic access to the descriptor via an API.
+With this API, a developer can create a complete descriptor and then save the result to a file.
+This can also be used to read in a descriptor (using `XMLParser.parseCpeDescription` as shown in the previous section), modify it, and write it back out again.
+The CPE Descriptor API allows a developer to redefine default behavior related to error handling for each component, turn-on check-pointing, change performance characteristics of the CPE, and plug-in a custom timer.
+
+Below is some example code that illustrates how this works.
+See the Javadocs for package org.apache.uima.collection.metadata for more details.
+
+[source]
+----
+// Create a descriptor with default settings
+CpeDescription cpe = CpeDescriptorFactory.produceDescriptor();
+
+// Add the Collection Reader ("[descriptor]" stands for a descriptor path)
+cpe.addCollectionReader("[descriptor]");
+
+// Add the CAS Initializer (deprecated)
+cpe.addCasInitializer("<cas initializer descriptor>");
+
+// Provide the number of CASes the CPE will use
+cpe.setCasPoolSize(2);
+
+// Define and add an Analysis Engine
+CpeIntegratedCasProcessor personTitleProcessor =
+    CpeDescriptorFactory.produceCasProcessor("Person");
+
+// Provide the descriptor for the Analysis Engine
+personTitleProcessor.setDescriptor("[descriptor]");
+
+// Continue despite errors, and skip the bad CAS
+personTitleProcessor.setActionOnMaxError("continue");
+
+// Increase the amount of time in ms the CPE waits for a response
+// from this Analysis Engine
+personTitleProcessor.setTimeout(100000);
+
+// Add the Analysis Engine to the descriptor
+cpe.addCasProcessor(personTitleProcessor);
+
+// Define and add a CAS Consumer
+CpeIntegratedCasProcessor consumerProcessor =
+    CpeDescriptorFactory.produceCasProcessor("Printer");
+consumerProcessor.setDescriptor("[descriptor]");
+
+// Define the batch size
+consumerProcessor.setBatchSize(100);
+
+// Terminate the CPE on max errors
+consumerProcessor.setActionOnMaxError("terminate");
+
+// Add the CAS Consumer to the descriptor
+cpe.addCasProcessor(consumerProcessor);
+
+// Add a checkpoint file and define the checkpoint frequency (ms)
+cpe.setCheckpoint("[path]/checkpoint.dat", 3000);
+
+// Plug in a custom timer class used for timing events
+cpe.setTimer("org.apache.uima.internal.util.JavaTimer");
+
+// Define the number of documents to process
+cpe.setNumToProcess(1000);
+
+// Dump the descriptor to System.out
+((CpeDescriptionImpl) cpe).toXML(System.out);
+----
+
+The CPE descriptor for the above configuration looks like this: 
+
+[source]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">
+  <collectionReader>
+    <collectionIterator>
+      <descriptor>
+        <include href="[descriptor]"/>
+      </descriptor>
+      <configurationParameterSettings>...
+      </configurationParameterSettings>
+    </collectionIterator>
+
+    <casInitializer>
+      <descriptor>
+        <include href="[descriptor]"/>
+      </descriptor>
+      <configurationParameterSettings>...
+      </configurationParameterSettings>
+    </casInitializer>
+  </collectionReader>
+
+  <casProcessors casPoolSize="2" processingUnitThreadCount="1">
+    <casProcessor deployment="integrated" name="Person">
+      <descriptor>
+        <include href="[descriptor]"/>
+      </descriptor>
+      <deploymentParameters/>
+      <errorHandling>
+        <errorRateThreshold action="continue" value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate" value="30"/>
+        <timeout max="100000"/>
+      </errorHandling>
+      <checkpoint batch="100" time="1000ms"/>
+    </casProcessor>
+
+    <casProcessor deployment="integrated" name="Printer">
+      <descriptor>
+        <include href="[descriptor]"/>
+      </descriptor>
+      <deploymentParameters/>
+      <errorHandling>
+        <errorRateThreshold action="terminate"
+          value="100/1000"/>
+        <maxConsecutiveRestarts action="terminate"
+          value="30"/>
+        <timeout max="100000" default="-1"/>
+      </errorHandling>
+      <checkpoint batch="100" time="1000ms"/>
+    </casProcessor>
+  </casProcessors>
+
+  <cpeConfig>
+    <numToProcess>1000</numToProcess>
+    <deployAs>immediate</deployAs>
+    <checkpoint file="[path]/checkpoint.dat" time="3000ms"/>
+    <timerImpl>
+      org.apache.uima.internal.util.JavaTimer
+    </timerImpl>
+  </cpeConfig>
+</cpeDescription>
+----
+
+[[ugr.tug.application.setting_configuration_parameters]]
+== Setting Configuration Parameters
+
+xref:tug.adoc#ugr.tug.aae.configuration_parameters[Configuration parameters] can be set using APIs as well as configured using the XML descriptor metadata specification.
+
+There are two different places you can set the parameters via the APIs.
+
+* After reading the XML descriptor for a component, but before you produce the component itself, and
+* After the component has been produced. 
+
+Setting the parameters before you produce the component is done using the `ConfigurationParameterSettings` object.
+You get an instance of this for a particular component by accessing that component description's metadata.
+For instance, if you produced a component description by using one of the `UIMAFramework.getXMLParser().parse...` methods, you can use that component description's `getMetaData()` method to get the metadata, and then the metadata's `getConfigurationParameterSettings()` method to get the `ConfigurationParameterSettings` object.
+Using that object, you can set individual parameters using the `setParameterValue` method.
+Here's an example, for a CAS Consumer component: 
+
+[source]
+----
+// Create a description object by reading the XML for the descriptor
+
+CasConsumerDescription casConsumerDesc =  
+   UIMAFramework.getXMLParser().parseCasConsumerDescription(new
+     XMLInputSource("descriptors/cas_consumer/InlineXmlCasConsumer.xml"));
+
+// get the settings from the metadata
+ConfigurationParameterSettings consumerParamSettings =
+    casConsumerDesc.getMetaData().getConfigurationParameterSettings();
+
+// Set a parameter value
+consumerParamSettings.setParameterValue(
+  InlineXmlCasConsumer.PARAM_OUTPUTDIR,
+  outputDir.getAbsolutePath());
+----
+
+Then you might produce this component using: 
+[source]
+----
+CasConsumer component =
+  UIMAFramework.produceCasConsumer(casConsumerDesc);
+----
+
+A side effect of producing a component is calling the component's "`initialize`" method, allowing it to read its configuration parameters.
+If you want to change parameters after this, use 
+[source]
+----
+component.setConfigParameterValue(
+    <parameter-name>,
+    <parameter-value>);
+----
+
+and then signal the component to re-read its configuration by calling the component's reconfigure method: 
+
+[source]
+----
+component.reconfigure();
+----
+
+Although these examples are for a CAS Consumer component, the parameter APIs also work for other kinds of components.
+
+[[ugr.tug.application.integrating_text_analysis_and_search]]
+== Integrating Text Analysis and Search
+
+A combination of AEs with a search engine capable of indexing both words and annotations over spans of text enables what UIMA refers to as __semantic search__.
+
+Semantic search is a search where the semantic intent of the query is specified using one or more entity or relation specifiers.
+For example, one could specify that they are looking for a person (named) "`Bush.`" Such a query would then not return results about the kind of bushes that grow in your garden.
+
+[[ugr.tug.application.building_an_index]]
+=== Building an Index
+
+To build a semantic search index using the UIMA SDK, you run a Collection Processing Engine that includes your AE along with a CAS Consumer which takes the tokens and annotations, together with sentence boundaries, and feeds them to a semantic searcher's index term input.
+Your AE must include an annotator that produces Tokens and Sentence annotations, along with any "`semantic`" annotations, because the Indexer requires this.
+
+[[ugr.tug.application.search.configuring_indexer]]
+==== Configuring the Semantic Search CAS Indexer
+
+Since there are several ways you might want to build a search index from the information in the CAS produced by your AE, you need to supply the Semantic Search CAS Consumer -- Indexer with configuration information in the form of an _Index Build Specification_ file.
+Apache UIMA includes code for parsing Index Build Specification files (see the Javadocs for details).
+An example of an Indexing specification tailored to the AE from the tutorial in the xref:tug.adoc#ugr.tug.aae[] is located in `examples/descriptors/tutorial/search/MeetingIndexBuildSpec.xml`. 
+It looks like this: 
+
+[source]
+----
+<indexBuildSpecification>
+  <indexBuildItem>
+    <name>org.apache.uima.examples.tokenizer.Token</name>
+    <indexRule>
+      <style name="Term"/>
+    </indexRule>    
+  </indexBuildItem>
+  <indexBuildItem>
+    <name>org.apache.uima.examples.tokenizer.Sentence</name>
+    <indexRule>
+      <style name="Breaking"/>
+    </indexRule>    
+  </indexBuildItem>
+  <indexBuildItem>
+    <name>org.apache.uima.tutorial.Meeting</name>
+    <indexRule>
+      <style name="Annotation"/>
+    </indexRule>    
+  </indexBuildItem>
+  <indexBuildItem>
+    <name>org.apache.uima.tutorial.RoomNumber</name>
+    <indexRule>
+      <style name="Annotation">
+        <attributeMappings>
+          <mapping>
+            <feature>building</feature>
+            <indexName>building</indexName>
+          </mapping>
+        </attributeMappings>
+      </style>
+    </indexRule>    
+  </indexBuildItem>
+  <indexBuildItem>
+    <name>org.apache.uima.tutorial.DateAnnot</name>
+    <indexRule>
+      <style name="Annotation"/>
+    </indexRule>    
+  </indexBuildItem>
+  <indexBuildItem>
+    <name>org.apache.uima.tutorial.TimeAnnot</name>
+    <indexRule>
+      <style name="Annotation"/>
+    </indexRule>    
+  </indexBuildItem>
+</indexBuildSpecification>
+----
+
+The index build specification is a series of index build items, each of which identifies a xref:ref.adoc#ugr.ref.cas[CAS annotation type] (a subtype of `uima.tcas.Annotation`) and a style.
+
+The first item in this example specifies that the annotation type `org.apache.uima.examples.tokenizer.Token` should be indexed with the "`Term`" style.
+This means that each span of text annotated by a Token will be considered a single token for standard text search purposes.
+
+The second item in this example specifies that the annotation type `org.apache.uima.examples.tokenizer.Sentence` should be indexed with the "`Breaking`" style.
+This means that each span of text annotated by a Sentence will be considered a single sentence, which can affect the search engine's algorithm for matching queries.
+
+The remaining items all use the "`Annotation`" style.
+This indicates that each annotation of the specified types will be stored in the index as a searchable span, with a name equal to the annotation name (without the namespace).
+
+Also, features of annotations can be indexed using the `<attributeMappings>` subelement.
+In the example index build specification, we declare that the `building` feature of the type `org.apache.uima.tutorial.RoomNumber` should be indexed.
+The `<indexName>` element can be used to map the feature name to a different name in the index, but in this example we have opted to use the same name, ``building``. 
+
+At the end of the batch or collection, the Semantic Search CAS Indexer builds the index.
+This index can be queried with simple tokens or with XML tags.
+
+Examples: 
+
+* A query on the word "`UIMA`" will retrieve all documents in which that word occurs. But a query of the form `<Meeting>UIMA</Meeting>` will retrieve only those documents that contain a Meeting annotation (produced by our MeetingDetector TAE, for example), where that Meeting annotation contains the word "`UIMA`".
+* A query for `<RoomNumber building="Yorktown"/>` will return documents that have a RoomNumber annotation whose `building` feature contains the term "`Yorktown`". 
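+
+The difference between a plain token query and a tagged query can be modeled in a few lines.
+The sketch below is a toy in-memory index, not the semantic search engine: `ToySemanticIndex`, its methods, and the sample data are hypothetical, and feature queries such as the `building` example are left out.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+// Hypothetical toy index: each entry is one indexed span of one document.
+final class ToySemanticIndex {
+  private static final class Span {
+    final String docId;
+    final String type; // annotation name without the namespace
+    final String text; // text covered by the annotation
+
+    Span(String docId, String type, String text) {
+      this.docId = docId;
+      this.type = type;
+      this.text = text;
+    }
+  }
+
+  private final List<Span> spans = new ArrayList<>();
+
+  void add(String docId, String type, String text) {
+    spans.add(new Span(docId, type, text));
+  }
+
+  // type == null models a plain token query; otherwise the word must occur
+  // inside a span of the given annotation type.
+  List<String> query(String type, String word) {
+    List<String> hits = new ArrayList<>();
+    for (Span s : spans) {
+      boolean typeMatches = type == null || s.type.equals(type);
+      boolean wordMatches = Arrays.asList(s.text.split("\\s+")).contains(word);
+      if (typeMatches && wordMatches && !hits.contains(s.docId)) {
+        hits.add(s.docId);
+      }
+    }
+    return hits;
+  }
+}
+----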
+
+For more information on the Index Build Specification format, see the xref:ref.adoc#ugr.ref.javadocs[UIMA Javadocs] for class `org.apache.uima.search.IndexBuildSpecification`.
+
+[[ugr.tug.application.search.cpe_with_semantic_search_cas_consumer]]
+==== Building and Running a CPE including the Semantic Search CAS Indexer
+// <titleabbrev>Using Semantic Search CAS Indexer</titleabbrev>
+
+The following steps illustrate how to build and run a CPE that uses the UIMA Meeting Detector TAE and the Simple Token and Sentence Annotator, discussed in <<ugr.tug.aae>> along with a CAS Consumer called the Semantic Search CAS Indexer, to build an index that allows you to query for documents based not only on textual content but also on whether they contain mentions of Meetings detected by the TAE.
+
+Run the CPE Configurator tool by executing the `cpeGui` shell script in the `bin` directory of the UIMA SDK.
+(For instructions on using this tool, see the xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User’s Guide].)
+
+In the CPE Configurator tool, select the following components by browsing to their descriptors:
+
+* Collection Reader: `%UIMA_HOME%/examples/descriptors/collectionReader/FileSystemCollectionReader.xml`
+* Analysis Engine: include both of these; one produces tokens/sentences, required by the indexer in all cases and the other produces the meeting annotations of interest. 
++
+** `%UIMA_HOME%/examples/descriptors/analysis_engine/SimpleTokenAndSentenceAnnotator.xml`
+** `%UIMA_HOME%/examples/descriptors/tutorial/ex6/UIMAMeetingDetectorTAE.xml`
+* Two CAS Consumers: 
++
+** `%UIMA_HOME%/examples/descriptors/cas_consumer/SemanticSearchCasIndexer.xml`
+** `%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml`
+
+Set up parameters:
+
+* Set the File System Collection Reader's "`Input Directory`" parameter to point to the `%UIMA_HOME%/examples/data` directory.
+* Set the Semantic Search CAS Indexer's "`Indexing Specification Descriptor`" parameter to point to `%UIMA_HOME%/examples/descriptors/tutorial/search/MeetingIndexBuildSpec.xml`.
+* Set the Semantic Search CAS Indexer's "`Index Dir`" parameter to whatever directory into which you want the indexer to write its index files. 
++
+[WARNING]
+====
+The Indexer _erases_ old versions of the files it creates in this directory. 
+====
+* Set the XMI Writer CAS Consumer's "`Output Directory`" parameter to whatever directory into which you want to store the XMI files containing the results of your analysis for each document. 
+
+Click on the Run Button.
+Once the run completes, a statistics dialog should appear, in which you can see how much time was spent in each of the components involved in the run.
+
+[[ugr.tug.application.remote_services]]
+== Working with Remote Services
+
+[NOTE]
+====
+This chapter describes older methods of working with Remote Services.
+These approaches do not support some of the newer CAS features, such as multiple views and CAS Multipliers.
+These methods have been supplanted by UIMA-AS, which has full support for the new CAS features.
+====
+
+The UIMA SDK allows you to easily take any Analysis Engine or CAS Consumer and deploy it as a service.
+That Analysis Engine or CAS Consumer can then be called from a remote machine using various network protocols.
+
+The UIMA SDK provides support for the following communications protocols: 
+
+* Vinci, a lightweight protocol, included as a part of Apache UIMA.
+
+The UIMA framework can make use of these services in two different ways: 
+
+. An Analysis Engine can create a proxy to a remote service; this proxy acts like a local component, but connects to the remote. The proxy has limited error handling and retry capabilities. The Vinci protocol is supported.
+. A Collection Processing Engine can specify non-Integrated mode (see <<ugr.tug.cpe.deploying_a_cpe>>).
+The CPE provides more extensive error recovery capabilities.
+This mode only supports the Vinci communications protocol. 
+
+
+[[ugr.tug.application.how_to_deploy_a_vinci_service]]
+=== Deploying a UIMA Component as a Vinci Service
+// <titleabbrev>Deploying as a Vinci Service</titleabbrev>
+
+There are no software prerequisites for deploying a Vinci service.
+The necessary libraries are part of the UIMA SDK.
+However, before you can use Vinci services you need to deploy the Vinci Naming Service (VNS), as described in section <<ugr.tug.application.vns>>.
+
+To deploy a service, you have to ensure that any components you want to include can be found on the class path.
+One way to do this is to set the environment variable `UIMA_CLASSPATH` to the set of class paths you need for any included components.
+Then run the `startVinciService` shell script, which is located in the `bin` directory, and pass it the path to a Vinci deployment descriptor, for example: ``bin/startVinciService ../examples/deploy/vinci/Deploy_PersonTitleAnnotator.xml``.
+If you are running Eclipse, and have the `uimaj-examples` project in your workspace, you can use the Eclipse Menu → Run → Run... and then pick "`UIMA Start Vinci Service`".
+
+This example deployment descriptor looks like: 
+[source]
+----
+<deployment name="Vinci Person Title Annotator Service">
+
+  <service name="uima.annotator.PersonTitleAnnotator" provider="vinci">
+
+    <parameter name="resourceSpecifierPath" 
+      value="C:/Program Files/apache/uima/examples/descriptors/analysis_engine/PersonTitleAnnotator.xml"/>
+
+    <parameter name="numInstances" value="1"/>
+
+    <parameter name="serverSocketTimeout" value="120000"/>
+
+  </service>
+
+</deployment>
+----
+
+To modify this deployment descriptor to deploy your own Analysis Engine or CAS Consumer, just replace the deployment name, the service name, and the resource specifier path with values appropriate for your component.
+
+The `numInstances` parameter specifies how many instances of your Analysis Engine or CAS Consumer will be created.
+This allows your service to support multiple clients concurrently.
+When a new request comes in, if all of the instances are busy, the new request will wait until an instance becomes available.
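+
+The waiting behavior described above is that of a fixed-size instance pool.
+The following sketch shows the pattern with a blocking queue; it is an illustration only, `InstancePool` is a hypothetical class, and nothing here is taken from the Vinci service implementation.
+
+[source,java]
+----
+import java.util.List;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.BlockingQueue;
+import java.util.function.Function;
+
+// Hypothetical fixed-size pool: requests wait until an instance is idle.
+final class InstancePool<T> {
+  private final BlockingQueue<T> idle;
+
+  InstancePool(List<T> instances) {
+    // Capacity equals the number of instances (must be at least one).
+    idle = new ArrayBlockingQueue<>(instances.size(), false, instances);
+  }
+
+  <R> R process(Function<T, R> request) {
+    T instance;
+    try {
+      instance = idle.take(); // waits while all instances are busy
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+      throw new IllegalStateException("Interrupted while waiting for an instance", e);
+    }
+    try {
+      return request.apply(instance);
+    } finally {
+      idle.add(instance); // hand the instance back for the next request
+    }
+  }
+
+  int idleCount() {
+    return idle.size();
+  }
+}
+----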
+
+The `serverSocketTimeout` parameter specifies the number of milliseconds (default = 5 minutes) that the service will wait between requests to process something.
+After this amount of time, the server will presume the client may have gone away - and it "`cleans up`", releasing any resources it is holding.
+The next call to process on the service will result in a cycle which will cause the client to re-establish its connection with the service (some additional overhead).
+
+There are two additional parameters that you can add to your deployment descriptor: 
+
+* ``<parameter name="threadPoolMinSize" value="[Integer]"/>``: Specifies the number of threads that the Vinci service creates on startup in order to serve clients' requests.
+* ``<parameter name="threadPoolMaxSize" value="[Integer]"/>``: Specifies the maximum number of threads that the Vinci service will create. When the number of concurrent requests exceeds the ``threadPoolMinSize``, additional threads will be created to serve requests, until the `threadPoolMaxSize` is reached.
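+
+For readers more familiar with `java.util.concurrent`, the two parameters behave much like the core and maximum sizes of a `ThreadPoolExecutor` that uses direct hand-off.
+The sketch below is an analogy under that assumption, not the Vinci thread pool itself; `ServicePool` is a hypothetical name, and the rejection of work beyond the maximum is a property of `ThreadPoolExecutor`, not something the text above states about Vinci.
+
+[source,java]
+----
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+// Hypothetical analogy for threadPoolMinSize / threadPoolMaxSize.
+final class ServicePool {
+  static ThreadPoolExecutor create(int minSize, int maxSize) {
+    // Direct hand-off: a task goes to an idle thread, or a new thread is
+    // created for it as long as the pool is below maxSize.
+    ThreadPoolExecutor pool = new ThreadPoolExecutor(
+        minSize, maxSize, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
+    pool.prestartAllCoreThreads(); // create minSize threads at startup
+    return pool;
+  }
+}
+----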
+
+The `startVinciService` script takes two additional optional parameters.
+The first one overrides the value of the VNS_HOST environment variable, allowing you to specify the name server to use.
+The second parameter if specified needs to be a unique (on this server) non-negative number, specifying the instance of this service.
+When used, this number allows multiple instances of the same named service to be started on one server; they will all register with the Vinci name service and be made available to client requests.
+
+Once you have deployed your component as a Vinci service, you may call it from a remote machine.
+See <<ugr.tug.application.how_to_call_a_uima_service>> for instructions.
+
+[[ugr.tug.application.how_to_call_a_uima_service]]
+=== Calling a UIMA Service
+
+Once an Analysis Engine or CAS Consumer has been deployed as a service, it can be used from any UIMA application, in the exact same way that a local Analysis Engine or CAS Consumer is used.
+For example, you can call an Analysis Engine service from the Document Analyzer or use the CPE Configurator to build a CPE that includes Analysis Engine and CAS Consumer services.
+
+To do this, you use a _service client descriptor_ in place of the usual Analysis Engine or CAS Consumer Descriptor.
+A service client descriptor is a simple XML file that indicates the location of the remote service and a few parameters.
+Example service client descriptors are provided in the UIMA SDK under the directory ``examples/descriptors/vinciService``.
+The contents of these descriptors are explained below.
+
+[[ugr.tug.application.vinci_service_client_descriptor]]
+==== Vinci Service Client Descriptor
+
+To call a Vinci service, use a descriptor like the following: 
+[source]
+----
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+   <resourceType>AnalysisEngine</resourceType>
+   <uri>uima.annotator.PersonTitleAnnotator</uri>
+   <protocol>Vinci</protocol>
+   <timeout>60000</timeout> 
+   <parameters>
+     <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
+     <parameter name="VNS_PORT" value="9000"/>
+   </parameters>
+</uriSpecifier>
+----
+
+Note that Vinci uses a centralized naming server, so the host where the service is deployed does not need to be specified.
+Only a name (``uima.annotator.PersonTitleAnnotator``) is given, which must match the name specified in the deployment descriptor used to deploy the service.
+
+The host and/or port where your Vinci Naming Service (VNS) server is running can be specified by the optional <parameter> elements.
+If not specified, the value is taken from the Java command line (if present), using the `-DVNS_HOST=<host>` and `-DVNS_PORT=<port>` system properties.
+If not specified on the Java command line, defaults are used: `localhost` for the ``VNS_HOST``, and `9000` for the ``VNS_PORT``.
+See <<ugr.tug.application.vns>> for details on setting up a VNS server.
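+
+The lookup order for the VNS host and port (descriptor parameter first, then Java system property, then the default) can be written down as a short sketch.
+`VnsSettings` is a hypothetical helper introduced for illustration; only the names `VNS_HOST` and `VNS_PORT` and the defaults `localhost` and `9000` come from the text above.
+
+[source,java]
+----
+import java.util.Map;
+import java.util.Properties;
+
+// Hypothetical helper showing the precedence of VNS settings.
+final class VnsSettings {
+  static String host(Map<String, String> descriptorParams, Properties systemProps) {
+    String fromDescriptor = descriptorParams.get("VNS_HOST");
+    if (fromDescriptor != null) {
+      return fromDescriptor; // <parameter> element wins
+    }
+    // Otherwise -DVNS_HOST=<host>, falling back to the default
+    return systemProps.getProperty("VNS_HOST", "localhost");
+  }
+
+  static int port(Map<String, String> descriptorParams, Properties systemProps) {
+    String fromDescriptor = descriptorParams.get("VNS_PORT");
+    String value = fromDescriptor != null
+        ? fromDescriptor
+        : systemProps.getProperty("VNS_PORT", "9000");
+    return Integer.parseInt(value);
+  }
+}
+----
+
+In a real application the second argument would typically be `System.getProperties()`.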
+
+[[ugr.tug.application.restrictions_on_remotely_deployed_services]]
+=== Restrictions on remotely deployed services
+
+Remotely deployed services are started on remote machines, using UIMA component descriptors on those remote machines.
+These descriptors supply any configuration and resource parameters for the service (configuration parameters are not transmitted from the calling instance to the remote one). Likewise, the remote descriptors supply the type system specification for the remote annotators that will be run (the type system of the calling instance is not transmitted to the remote one).
+
+The remote service wrapper, when it receives a CAS from the caller, instantiates it for the remote service, making instances of all types which the remote service specifies.
+Other instances in the incoming CAS for types which the remote service has no type specification for are kept aside, and when the remote service returns the CAS back to the caller, these type instances are re-merged back into the CAS being transmitted back to the caller.
+Because of this design, a remote service which doesn't declare a type system won't receive any type instances.
+
+[NOTE]
+====
+This behavior may change in future releases, to one where configuration parameters and / or type systems are transmitted to remote services. 
+====
+
+[[ugr.tug.application.vns]]
+=== The Vinci Naming Services (VNS)
+
+Vinci consists of components for building network-accessible services, clients for accessing those services, and an infrastructure for locating and managing services.
+The primary infrastructure component is the Vinci directory, known as VNS (for Vinci Naming Service).
+
+On startup, Vinci services locate the VNS and provide it with information that is used by VNS during service discovery.
+A Vinci service provides the name of the host machine on which it runs, and the name of the service.
+The VNS internally creates a binding for the service name and returns the port number on which the Vinci service will wait for client requests.
+The VNS stores its bindings in the file system, in a file called `vns.services`.
+
+In Vinci, services are identified by their service name.
+If there is more than one physical service with the same service name, then Vinci assumes they are equivalent and will route queries to them randomly, provided that they are all running on different hosts.
+You should therefore use a unique service name if you don't want to conflict with other services listed in whatever VNS you have configured jVinci to use.
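+
+The effect of registering several services under one name can be seen in a toy naming service.
+This sketch is not the VNS implementation: `ToyNamingService` is a hypothetical class that only models the random choice among hosts registered under the same service name.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+
+// Hypothetical model of name-based lookup with random routing.
+final class ToyNamingService {
+  private final Map<String, List<String>> hostsByService = new HashMap<>();
+  private final Random random;
+
+  ToyNamingService(Random random) {
+    this.random = random;
+  }
+
+  void register(String serviceName, String host) {
+    hostsByService.computeIfAbsent(serviceName, k -> new ArrayList<>()).add(host);
+  }
+
+  // Returns one registered host chosen at random, or null if the name is unknown.
+  String resolve(String serviceName) {
+    List<String> hosts = hostsByService.get(serviceName);
+    if (hosts == null) {
+      return null;
+    }
+    return hosts.get(random.nextInt(hosts.size()));
+  }
+}
+----
+
+If two unrelated services were registered under the same name, a client could be routed to either one, which is the conflict a unique service name avoids.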
+
+[[ugr.tug.application.vns.starting]]
+==== Starting VNS
+
+To run the VNS use the `startVNS` script found in the `bin` directory of the UIMA installation,  or launch it from Eclipse.
+If you've installed the `uimaj-examples` project, it will supply a pre-configured launch script you can access in Eclipse by selecting Menu → Run → Run... and picking "`UIMA Start VNS`".
+
+[NOTE]
+====
+VNS runs on port 9000 by default so please make sure this port is available.
+If you see the following exception: 
+[source]
+----
+java.net.BindException: Address already in use: JVM_Bind
+----
+
+it indicates that another process is running on port 9000.
+In this case, add the parameter `-p <port>` to the `startVNS` command, using `<port>` to specify an alternative port to use. 
+====
+
+When started, the VNS produces output similar to the following: 
+[source]
+----
+[10/6/04 3:44 PM | main] WARNING: Config file doesn't exist, 
+            creating a new empty config file!
+[10/6/04 3:44 PM | main] Loading config file : .vns.services
+[10/6/04 3:44 PM | main] Loading workspaces file : .vns.workspaces
+[10/6/04 3:44 PM | main] ====================================
+(WARNING) Unexpected exception:
+java.io.FileNotFoundException: .vns.workspaces (The system cannot find
+the file specified)
+  at java.io.FileInputStream.open(Native Method)
+  at java.io.FileInputStream.<init>(Unknown Source)
+  at java.io.FileInputStream.<init>(Unknown Source)
+  at java.io.FileReader.<init>(Unknown Source)
+  at org.apache.vinci.transport.vns.service.VNS.loadWorkspaces(VNS.java:339)
+  at org.apache.vinci.transport.vns.service.VNS.startServing(VNS.java:237)
+  at org.apache.vinci.transport.vns.service.VNS.main(VNS.java:179)
+[10/6/04 3:44 PM | main] WARNING: failed to load workspace.
+[10/6/04 3:44 PM | main] VNS Workspace : null
+[10/6/04 3:44 PM | main] Loading counter file : .vns.counter
+[10/6/04 3:44 PM | main] Could not load the counter file : .vns.counter
+[10/6/04 3:44 PM | main] Starting backup thread,
+            using files .vns.services.bak
+and .vns.services
+[10/6/04 3:44 PM | main] Serving on port : 9000
+[10/6/04 3:44 PM | Thread-0] Backup thread started
+[10/6/04 3:44 PM | Thread-0] Saving to config file : .vns.services.bak
+>>>>>>>>>>>>> VNS is up and running! <<<<<<<<<<<<<<<<<
+>>>>>>>>>>>>> Type 'quit' and hit ENTER to terminate VNS <<<<<<<<<<<<<
+[10/6/04 3:44 PM | Thread-0] Config save required 10 millis.
+[10/6/04 3:44 PM | Thread-0] Saving to config file : .vns.services
+[10/6/04 3:44 PM | Thread-0] Config save required 10 millis.
+[10/6/04 3:44 PM | Thread-0] Saving counter file : .vns.counter
+----
+
+[NOTE]
+====
+Disregard the `java.io.FileNotFoundException: .vns.workspaces (The system cannot find the file specified)` message.
+It is only a warning, not a serious problem.
+The VNS Workspace is a feature of the VNS that is not critical.
+The important line to note is `[10/6/04 3:44 PM | main] Serving on port : 9000`, which states the actual port on which the VNS listens for incoming requests.
+All Vinci services and all clients connecting to services must provide the VNS port on the command line if the port is not the default.
+Again, the default port is 9000.
+Please see <<ugr.tug.application.launching_vinci_services>> below for details about the command line and parameters.
+====
+
+[[ugr.tug.application.vns_files]]
+==== VNS Files
+
+The VNS maintains two external files: 
+
+* `vns.services`
+* `vns.counter`
+
+These files are generated by the VNS in the directory from which the VNS is launched.
+Since these files may contain old information, it is best to remove them before starting the VNS.
+This step ensures that the VNS always has the newest information and will not attempt to connect to a service that has been shut down.
+
+[[ugr.tug.application.launching_vinci_services]]
+==== Launching Vinci Services
+
+When launching a Vinci service, you must indicate which VNS the service will connect to.
+A Vinci service is typically started using the script `startVinciService`, found in the `bin` directory of the UIMA installation.
+(If you're using Eclipse and have the `uimaj-examples` project in the workspace, you will also find an Eclipse launcher named "`UIMA Start Vinci Service`" you can use.)
+For the script, the environment variable `VNS_HOST` should be set to the name or IP address of the machine hosting the Vinci Naming Service.
+The default is `localhost`, the machine the service is deployed on.
+The host name can also be passed as the second argument to the `startVinciService` script.
+The default port for VNS is 9000, but it can be overridden with the `VNS_PORT` environment variable.
+
+If you write your own startup script, to define Vinci's default VNS you must provide the following JVM parameters: 
+
+[source]
+----
+java -DVNS_HOST=localhost -DVNS_PORT=9000 ...
+----
+
+The above setting is for the VNS running on the same machine as the service.
+Of course one can deploy the VNS on a different machine and the JVM parameter will need to be changed to this: 
+
+[source]
+----
+java -DVNS_HOST=<host> -DVNS_PORT=9000 ...
+----
+
+where "`<host>`" is the name or IP address of the machine where the VNS is running.
+
+[NOTE]
+====
+VNS runs on port 9000 by default.
+If you see the following exception: 
+[source]
+----
+(WARNING) Unexpected exception:
+org.apache.vinci.transport.ServiceDownException: VNS inaccessible:
+          java.net.ConnectException: Connection refused: connect
+----
+then either the VNS is not running or it is running on a different port.
+To correct the latter, set the environment variable `VNS_PORT` to the correct port before starting the service.
+====
+
+To find the right port, check the VNS output for a line similar to the following: 
+[source]
+----
+[10/6/04 3:44 PM | main] Serving on port : 9000
+----
+
+It is printed by the VNS on startup.
+
+[[ugr.tug.configuring_timeout_settings]]
+=== Configuring Timeout Settings
+
+UIMA has several timeout specifications, summarized here.
+The timeouts associated with remote  services are discussed below.
+In addition there are timeouts that can be specified for: 
+
+* *Acquiring an empty CAS from a CAS Pool:* See <<ugr.tug.applications.multi_threaded>>.
+* *Reassembling chunks of a large document:* See xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.operational_parameters[Operational Parameters].
+
+If your application uses remote UIMA services it is important to consider how to set the _timeout_ values appropriately.
+This is particularly important if your service can take a long time to process each request.
+
+There are two types of timeout settings in UIMA, the _client timeout_ and the __server socket timeout__.
+The client timeout is usually the more important of the two: it specifies how long the client is willing to wait for the service to process each CAS.
+The client timeout can be specified for Vinci.
+The server socket timeout (Vinci only) specifies how long the service holds the connection open between calls from the client.
+After this amount of time, the server will presume the client may have gone away - and it "`cleans up`", releasing any resources it is holding.
+The next call to process on the service will cause the client to re-establish its connection with the service (some additional overhead).
+
+[[ugr.tug.setting_client_timeout]]
+==== Setting the Client Timeout
+
+The way to set the client timeout is different depending on what deployment mode you use in your CPE (if any).
+
+If you are using the default "`integrated`" deployment mode in your CPE, or if you are not using a CPE at all, then the client timeout is specified in your Service Client Descriptor (see <<ugr.tug.application.how_to_call_a_uima_service>>). For example:
+
+[source]
+----
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+   <resourceType>AnalysisEngine</resourceType>
+   <uri>uima.annot.PersonTitleAnnotator</uri>
+   <protocol>Vinci</protocol>
+   <timeout>60000</timeout> 
+   <parameters>
+     <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
+     <parameter name="VNS_PORT" value="9000"/>
+   </parameters>
+</uriSpecifier>
+----
+
+The client timeout in this example is ``60000``.
+This value specifies the number of milliseconds that the client will wait for the service to respond to each request.
+In this example, the client will wait for one minute.
+
+If the service does not respond within this amount of time, processing of the current CAS will abort.
+If you called the `AnalysisEngine.process` method directly from your application, an Exception will be thrown.
+If you are running a CPE, what happens next is dependent on the error handling settings in your CPE descriptor (see xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling[CAS Processor Error Handling]). 
+The default action is for the CPE to terminate, but you can override this. 
+
+If you are using the "`managed`" or "`non-managed`" deployment mode in your CPE, then the client timeout is specified in your CPE descriptor's `errorHandling` element.
+For example:
+
+[source]
+----
+<errorHandling>
+  <maxConsecutiveRestarts .../>
+  <errorRateThreshold .../>
+  <timeout max="60000"/>
+</errorHandling>
+----
+
+As in the previous example, the client timeout is set to ``60000``, and this specifies the number of milliseconds that the client will wait for the service to respond to each request.
+
+If the service does not respond within the specified amount of time, the action is determined by the settings for `maxConsecutiveRestarts` and ``errorRateThreshold``.
+These settings support such things as restarting the process (for "`managed`" deployment mode), dropping and reestablishing the connection (for "`non-managed`" deployment mode), and removing the offending service from the pipeline.
+See xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.error_handling[CAS Processor Error Handling] for details. 
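+
+As a rough illustration, a filled-in `errorHandling` element might look like the sketch below.
+The `action` and `value` attributes shown are assumptions chosen for this example, not recommended settings; check the referenced section for the attributes and actions that your UIMA version accepts.
+
+[source,xml]
+----
+<errorHandling>
+  <!-- illustrative: terminate the CPE after 3 consecutive failed restarts -->
+  <maxConsecutiveRestarts action="terminate" value="3"/>
+  <!-- illustrative: terminate once more than 10 errors occur per 1000 CASes -->
+  <errorRateThreshold action="terminate" value="10/1000"/>
+  <!-- wait up to 60000 ms (one minute) for each response -->
+  <timeout max="60000"/>
+</errorHandling>
+----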
+
+Note that the client timeout does not apply to the `GetMetaData` request that is made when the client first connects to the service.
+This call is typically very fast and does not need a large timeout (the default is 60 seconds).  However, if many clients are competing for a small number of services, it may be necessary to increase this value.
+See xref:ref.adoc#ugr.ref.xml.component_descriptor.service_client[Service Client Descriptors].
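+
+A sketch of a Service Client Descriptor that raises this limit is shown below.
+It assumes the optional Vinci parameter is named `GET_META_TIMEOUT` and takes a value in milliseconds; verify the name in the referenced section before relying on it.
+The host name and the value `120000` are placeholders.
+
+[source,xml]
+----
+<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
+  <resourceType>AnalysisEngine</resourceType>
+  <uri>uima.annot.PersonTitleAnnotator</uri>
+  <protocol>Vinci</protocol>
+  <timeout>60000</timeout>
+  <parameters>
+    <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
+    <parameter name="VNS_PORT" value="9000"/>
+    <!-- assumed parameter name: wait up to two minutes for GetMetaData -->
+    <parameter name="GET_META_TIMEOUT" value="120000"/>
+  </parameters>
+</uriSpecifier>
+----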
+
+
+[[ugr.tug.setting_server_socket_timeout]]
+==== Setting the Server Socket Timeout
+
+The Server Socket Timeout applies only to Vinci services, and is specified in the Vinci deployment descriptor as discussed in section <<ugr.tug.application.how_to_deploy_a_vinci_service>>.
+For example: 
+[source]
+----
+<deployment name="Vinci Person Title Annotator Service">
+
+  <service name="uima.annotator.PersonTitleAnnotator" provider="vinci">
+
+    <parameter name="resourceSpecifierPath" 
+      value="C:/Program Files/apache/uima/examples/descriptors/
+          analysis_engine/PersonTitleAnnotator.xml"/>
+
+    <parameter name="numInstances" value="1"/>
+
+    <parameter name="serverSocketTimeout" value="120000"/>
+
+  </service>
+
+</deployment>
+----
+
+The server socket timeout here is set to `120000` milliseconds, or two minutes.
+This parameter specifies how long the service will wait between requests to process something.
+After this amount of time, the server will presume the client may have gone away - and it "`cleans up`", releasing any resources it is holding.
+The next call to process on the service will cause the client to re-establish its connection with the service (some additional overhead). The service may print a "`Read Timed Out`" message to the console when the server socket timeout elapses.
+
+In most cases, it is not a problem if the server socket timeout elapses.
+The client will simply reconnect.
+However, if you notice "`Read Timed Out`" messages on your server console, followed by other connection problems, it is possible that the client is having trouble reconnecting for some reason.
+In this situation it may help increase the stability of your application if you increase the server socket timeout so that it does not elapse during actual processing.
+
+[[ugr.tug.application.increasing_performance_using_parallelism]]
+== Increasing performance using parallelism
+
+There are several ways to exploit parallelism to increase performance in the UIMA Framework.
+These range from running with additional threads within one Java virtual machine on one host (which might be a multi-processor or hyper-threaded host) to deploying analysis engines on a set of remote machines.
+
+The Collection Processing facility in UIMA provides the ability to scale the pipe-line of analysis engines.
+This scale-out runs multiple threads within the Java virtual machine running the CPM, one for each pipe in the pipe-line.
+To activate it, in the `<casProcessors>` descriptor element, set the attribute `processingUnitThreadCount`, which specifies the number of replicated processing pipelines, to a value greater than 1, and ensure that the size of the CAS pool is equal to or greater than this number (the attribute of `<casProcessors>` to set is `casPoolSize`). For more details on these settings, see xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors[CAS Processors].
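+
+For instance, a CPE descriptor fragment with four replicated pipelines could be sketched as follows.
+The value `4` is an arbitrary choice for illustration, and the individual CAS processor entries are elided.
+
+[source,xml]
+----
+<!-- four processing pipelines, with a CAS pool at least as large -->
+<casProcessors casPoolSize="4" processingUnitThreadCount="4">
+  <!-- one casProcessor element per component goes here -->
+</casProcessors>
+----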
+
+For deployments that incorporate remote analysis engines, running on multiple remote hosts, in the Collection Manager pipe-line, scale-out is supported through the Vinci naming service.
+If multiple instances of a service with the same name, but running on different hosts, are registered with the Vinci Name Server, it will assign these instances to incoming requests.
+
+There are two modes supported: a "`random`" assignment, and an "`exclusive`" one.
+The "`random`" mode distributes load using an algorithm that selects a service instance at random.
+The UIMA framework supports this only for the case where all of the instances are running on unique hosts; the framework does not support starting 2 or more instances on the same host.
+
+The exclusive mode dedicates a particular remote instance to each Collection Manager pipe-line instance.
+This mode is enabled by adding a configuration parameter in the `<casProcessor>` section of the CPE descriptor:
+
+[source]
+----
+<deploymentParameters>
+  <parameter name="service-access" value="exclusive" />
+</deploymentParameters>
+----
+
+If this is not specified, the "`random`" mode is used.
+
+In addition, remote UIMA engine services can be started with a parameter that specifies the number of instances the service should support (see the `<parameter name="numInstances">` XML element in the remote deployment descriptor, <<ugr.tug.application.remote_services>>).
+Specifying more than one causes the service wrapper for the analysis engine to use multi-threading (within the single Java Virtual Machine, which can take advantage of multi-processor and hyper-threaded architectures).
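+
+As a sketch, the deployment descriptor shown in <<ugr.tug.setting_server_socket_timeout>> could request four instances as follows.
+The count of `4` and the descriptor path are placeholders.
+
+[source,xml]
+----
+<deployment name="Vinci Person Title Annotator Service">
+  <service name="uima.annotator.PersonTitleAnnotator" provider="vinci">
+    <parameter name="resourceSpecifierPath"
+      value="path/to/PersonTitleAnnotator.xml"/>
+    <!-- serve requests with four instances of the analysis engine in one JVM -->
+    <parameter name="numInstances" value="4"/>
+  </service>
+</deployment>
+----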
+
+[NOTE]
+====
+When using Vinci in "`exclusive`" mode (see service access under xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.deployment_parameters[Individual Deployment Parameters]), only one thread is used.
+To achieve multi-processing on a server in this case, use multiple instances of the service, instead of multiple threads (see <<ugr.tug.application.how_to_deploy_a_vinci_service>>).
+====
+
+[[ugr.tug.application.jmx]]
+== Monitoring AE Performance using JMX
+
+UIMA supports remote monitoring of Analysis Engine performance via the Java Management Extensions (JMX) API.
+When you run a UIMA application with a JVM that supports JMX, the UIMA framework will automatically detect the presence of JMX and will register _MBeans_ that provide access to the performance statistics.
+
+Note: If local monitoring does not work out of the box, you can configure your application for remote monitoring (even when on the same host) by specifying a unique port number, e.g.
+
+[source]
+----
+-Dcom.sun.management.jmxremote.port=1098
+-Dcom.sun.management.jmxremote.authenticate=false
+-Dcom.sun.management.jmxremote.ssl=false
+----
+
+Now, you can use any JMX client to view the statistics.
+Simply open a command prompt, make sure the JDK `bin` directory is in your path, and execute the `jconsole` command.
+This should bring up a window allowing you to select one of the local JMX-enabled applications currently running, or to enter a remote (or local) host and port, e.g. `localhost:1098`.
+The next screen will show a summary of information about the Java process that you connected to.
+Click on the "`MBeans`" tab, then expand "`org.apache.uima`" in the tree at the left.
+You should see a view like this: 
+
+image::images/tutorials_and_users_guides/tug.application/image006.jpg[Screenshot of JMX console monitoring UIMA components]
+
+Each of the nodes under "`org.apache.uima`" in the tree represents one of the UIMA Analysis Engines in the application that you connected to.
+You can select one of the analysis engines to view its performance statistics in the view at the right.
+
+Probably the most useful statistic is "`CASes Per Second`", which is the number of CASes that this AE has processed divided by the amount of time spent in the AE's process method, in seconds.
+Note that this is the total elapsed time, not CPU time.
+Even so, it can be useful to compare the "`CASes Per Second`" numbers of all of your Analysis Engines to discover where the bottlenecks occur in your application.
+
+The `AnalysisTime`, `BatchProcessCompleteTime`, and `CollectionProcessCompleteTime` properties show the total elapsed time, in milliseconds, that has been spent in the AnalysisEngine's `process()`, `batchProcessComplete()`, and `collectionProcessComplete()` methods, respectively.
+(Note that for CAS Multipliers, time spent in the `hasNext()` and `next()` methods is also counted towards the AnalysisTime.)
+
+Note that once your UIMA application terminates, you can no longer view the statistics through the JMX console.
+If you want to use JMX to view processes that have completed, you will need to write your application so that the JVM remains running after processing completes, waiting for some user signal before terminating.
+
+It is possible to override the default JMX MBean names UIMA uses, for example to better organize the UIMA MBeans with respect to MBeans exposed by other parts of your application.
+This is done using the `AnalysisEngine.PARAM_MBEAN_NAME_PREFIX` additional parameter  when creating your AnalysisEngine: 
+
+[source]
+----
+  //set up Map with custom JMX MBean name prefix
+  Map<String, Object> paramMap = new HashMap<>();
+  paramMap.put(AnalysisEngine.PARAM_MBEAN_NAME_PREFIX,
+               "org.myorg:category=MyApp");
+        
+  // create Analysis Engine
+  AnalysisEngine ae = 
+      UIMAFramework.produceAnalysisEngine(specifier, paramMap);
+----
+
+Similarly, you can use the `AnalysisEngine.PARAM_MBEAN_SERVER` parameter to specify a particular instance of a JMX MBean Server with which UIMA should register the MBeans.
+If none is specified, the default is to register with the platform MBeanServer.
+
+[[_tug.application.pto]]
+== Performance Tuning Options
+
+There are a small number of performance tuning options available to influence the runtime behavior of UIMA applications.
+Performance tuning options need to be set programmatically when an analysis engine is created.
+You simply create a Java Properties object with the relevant options and pass it to the UIMA framework on the call to create an analysis engine.
+Below is an example. 
+[source]
+----
+XMLParser parser = UIMAFramework.getXMLParser();
+ResourceSpecifier spec = parser.parseResourceSpecifier(
+    new XMLInputSource(descriptorFile));
+// Create a new properties object to hold the settings.
+Properties performanceTuningSettings = new Properties();
+// Set the initial CAS heap size.
+performanceTuningSettings.setProperty(
+    UIMAFramework.CAS_INITIAL_HEAP_SIZE,
+    "1000000");
+// Create a wrapper map that can be passed to the framework.
+Map<String, Object> additionalParams = new HashMap<>();
+// Set the performance tuning properties as value to
+// the appropriate parameter.
+additionalParams.put(
+    Resource.PARAM_PERFORMANCE_TUNING_SETTINGS,
+    performanceTuningSettings);
+// Create the analysis engine with the parameters.
+// The second, unused argument here is a custom
+// resource manager.
+this.ae = UIMAFramework.produceAnalysisEngine(
+    spec, null, additionalParams);
+----
+
+The following options are supported: 
+
+* ``UIMAFramework.PROCESS_TRACE_ENABLED``: enable the process trace mechanism (true/false).  When enabled, UIMA tracks the time spent in individual components of an aggregate  AE or CPE. For more information, see the API documentation of ``org.apache.uima.util.ProcessTrace``. 
+* ``UIMAFramework.SOCKET_KEEPALIVE_ENABLED``: enable socket KeepAlive (true/false).  This setting is currently only supported by Vinci clients. Defaults to ``true``. 
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.cas_multiplier.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.cas_multiplier.adoc
new file mode 100644
index 0000000..e92945d
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.cas_multiplier.adoc
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.cm]]
+= CAS Multiplier Developer's Guide
+// <titleabbrev>CAS Multiplier</titleabbrev>
+
+The UIMA analysis components (Annotators and CAS Consumers) described previously in this manual all take a single CAS as input, optionally make modifications to it, and output that same CAS.
+This chapter describes an advanced feature that became available in the UIMA SDK v2.0: a new type of analysis component called a __CAS Multiplier__, which can create new CASes during processing.
+
+CAS Multipliers are often used to split a large artifact into manageable pieces.
+This is a common requirement of audio and video analysis applications, but can also occur in text analysis on very large documents.
+A CAS Multiplier would take as input a single CAS representing the large artifact (perhaps by a remote reference to the actual data -- see <<ugr.tug.aas.sofa_data_formats>>) and produce as output a series of new CASes each of which contains only a small portion of the original artifact.
+
+CAS Multipliers are not limited to dividing an artifact into smaller pieces, however.
+A CAS Multiplier can also be used to combine smaller segments together to form larger segments.
+In general, a CAS Multiplier is used to _change_ the segmentation of a series of CASes; that is, to change how a stream of data is divided among discrete CAS objects.
+
+[[ugr.tug.cm.developing_multiplier_code]]
+== Developing the CAS Multiplier Code
+
+[[ugr.tug.cm.cm_interface_overview]]
+=== CAS Multiplier Interface Overview
+
+CAS Multiplier implementations should extend the `JCasMultiplier_ImplBase` or `CasMultiplier_ImplBase` class, depending on which CAS interface they prefer to use.
+As with other types of analysis components, the CAS Multiplier ImplBase classes define optional ``initialize``, ``destroy``, and `reconfigure` methods.
+There are then three required methods: ``process``, ``hasNext``, and ``next``.
+The framework interacts with these methods as follows:
+
+. The framework calls the CAS Multiplier's `process` method, passing it an input CAS. The process method returns, but may hold on to a reference to the input CAS.
+. The framework then calls the CAS Multiplier's `hasNext` method. The CAS Multiplier should return `true` from this method if it intends to output one or more new CASes (for instance, segments of this CAS), and `false` if not.
+. If `hasNext` returned true, the framework will call the CAS Multiplier's `next` method. The CAS Multiplier creates a new CAS (we will see how in a moment), populates it, and returns it from the `next` method.
+. Steps 2 and 3 continue until `hasNext` returns false. If the framework detects a situation where it needs to cancel this CAS Multiplier, it will stop calling the `hasNext` and `next` methods, and when another top-level CAS comes along it will call the CAS Multiplier's `process` method again. Your CAS Multiplier code should interpret this as a signal to clean up processing related to the previous CAS and then start processing the new CAS.
+
+From the time when `process` is called until the `hasNext` method returns false (or `process` is called again),  the CAS Multiplier "`owns`" the CAS that was passed to its `process` method.
+The CAS Multiplier can store a reference to this CAS in a local field and can read from it or write to it during this time.
+Once the ending condition occurs, the CAS Multiplier gives up ownership of the input CAS and should no longer retain a reference to it.
+
+[[ugr.tug.cm.how_to_get_empty_cas_instance]]
+=== How to Get an Empty CAS Instance
+// <titleabbrev>Getting an empty CAS Instance</titleabbrev>
+
+The CAS Multiplier's `next` method must return a CAS instance that holds a new representation of the input artifact.
+Since CAS instances are managed by the framework, the CAS Multiplier cannot actually create a new CAS; instead it should request an empty CAS by calling the method: 
+
+[source]
+----
+CAS getEmptyCAS()
+----
+
+or
+
+[source]
+----
+JCas getEmptyJCas()
+---- 
+
+which are defined on the `CasMultiplier_ImplBase` and `JCasMultiplier_ImplBase` classes, respectively.
+
+Note that if it is more convenient you can request an empty CAS during the `process` or `hasNext` methods, not just during the `next` method.
+
+By default, a CAS Multiplier is only allowed to hold one output CAS instance at a time.
+You must return the CAS from the `next` method before you can request a second CAS.
+If you try to call `getEmptyCAS` a second time before doing so, you will get an exception.
+You can change this default behavior by overriding the method `getCasInstancesRequired` to return the number of CAS instances that you need.
+Be aware that CAS instances consume a significant amount of memory, so setting this to a large value will cause your application to use a lot of RAM.
+So, for example, it is not a good practice to attempt to generate a large number of new CASes in the CAS Multiplier's `process` method.
+Instead, you should spread your processing out across the calls to the `hasNext` or `next` methods.
+
+[NOTE]
+====
+You can only call `getEmptyCAS()` or `getEmptyJCas()` from your CAS Multiplier's ``process``, ``hasNext``, or `next` methods.
+You cannot call it from other methods such as ``initialize``.
+This is because the Aggregate AE's Type System is not available until all of the components of the aggregate have finished their initialization. 
+====
+
+The Type System of the empty CAS will contain all of the type definitions for all  components of the outermost Aggregate Analysis Engine or Collection Processing Engine that contains your CAS Multiplier.
+Therefore downstream components that receive  these CASes can add new instances of any type that they define.
+
+[WARNING]
+====
+Be careful to keep the Feature Structures that belong to each CAS separate.
+You  cannot create references from a Feature Structure in one CAS to a Feature Structure in another CAS.
+You also cannot add a Feature Structure created in one CAS to the indexes of a different CAS.
+If you attempt to do this, the results are undefined. 
+====
+
+[[ugr.tug.cm.example_code]]
+=== Example Code
+
+This section walks through the source code of an example CAS Multiplier that breaks text documents into smaller pieces.
+The Java class for the example is `org.apache.uima.examples.casMultiplier.SimpleTextSegmenter` and the source code is included in the UIMA SDK under the `examples/src` directory.
+
+[[ugr.tug.cm.example_code.overall_structure]]
+==== Overall Structure
+
+[source]
+----
+public class SimpleTextSegmenter extends JCasMultiplier_ImplBase {
+  private String mDoc;
+  private int mPos;
+  private int mSegmentSize;
+  private String mDocUri;  
+  
+  public void initialize(UimaContext aContext) 
+          throws ResourceInitializationException
+  { ... }
+
+  public void process(JCas aJCas) throws AnalysisEngineProcessException
+  { ... }
+
+  public boolean hasNext() throws AnalysisEngineProcessException
+  { ... }
+
+  public AbstractCas next() throws AnalysisEngineProcessException
+  { ... }
+}
+----
+
+The `SimpleTextSegmenter` class extends `JCasMultiplier_ImplBase` and implements the optional `initialize` method as well as the required ``process``, ``hasNext``, and `next` methods.
+Each method is described below.
+
+[[ugr.tug.cm.example_code.initialize]]
+==== Initialize Method
+
+[source]
+----
+public void initialize(UimaContext aContext) throws
+                    ResourceInitializationException {
+  super.initialize(aContext);
+  mSegmentSize = ((Integer)aContext.getConfigParameterValue(
+                            "segmentSize")).intValue();
+}
+----
+
+Like an Annotator, a CAS Multiplier can override the initialize method and read configuration parameter values from the UimaContext.
+The SimpleTextSegmenter defines one parameter, "`Segment Size`", which determines the approximate size (in characters) of each segment that it will produce.
+
+[[ugr.tug.cm.example_code.process]]
+==== Process Method
+
+[source]
+----
+public void process(JCas aJCas) 
+       throws AnalysisEngineProcessException {
+  mDoc = aJCas.getDocumentText();
+  mPos = 0;
+  // retrieve the filename of the input file from the CAS so that it can 
+  // be added to each segment
+  FSIterator it = aJCas.
+          getAnnotationIndex(SourceDocumentInformation.type).iterator();
+  if (it.hasNext()) {
+    SourceDocumentInformation fileLoc = 
+          (SourceDocumentInformation)it.next();
+    mDocUri = fileLoc.getUri();
+  }
+  else {
+    mDocUri = null;
+  }
+}
+----
+
+The process method receives a new JCas to be processed (segmented) by this CAS Multiplier.
+The SimpleTextSegmenter extracts some information from this JCas and stores it in fields (the document text is stored in the field `mDoc` and the source URI in the field `mDocUri`). Recall that the CAS Multiplier is considered to "`own`" the JCas from the time when `process` is called until the time when `hasNext` returns false.
+Therefore it is acceptable to retain references to objects from the JCas in a CAS Multiplier, whereas this should never be done in an Annotator.
+The CAS Multiplier could have chosen to store a reference to the JCas itself, but that was not necessary for this example.
+
+The CAS Multiplier also initializes the mPos variable to 0.
+This variable is a position into the document text and will be incremented as each new segment is produced.
+
+[[ugr.tug.cm.example_code.hasnext]]
+==== HasNext Method
+
+[source]
+----
+public boolean hasNext() throws AnalysisEngineProcessException {
+  return mPos < mDoc.length();
+}
+----
+
+The job of the hasNext method is to report whether there are any additional output CASes to produce.
+For this example, the CAS Multiplier will break the entire input document into segments, so we know there will always be a next segment until the very end of the document has been reached.
+
+[[ugr.tug.cm.example_code.next]]
+==== Next Method
+
+[source]
+----
+public AbstractCas next() throws AnalysisEngineProcessException {
+  int breakAt = mPos + mSegmentSize;
+  if (breakAt > mDoc.length())
+    breakAt = mDoc.length();
+          
+  // search for the next newline character. 
+  // Note: this example segmenter implementation
+  // assumes that the document contains many newlines. 
+  // In the worst case, if this segmenter
+  // is run on a document with no newlines, 
+  // it will produce only one segment containing the
+  // entire document text. 
+  // A better implementation might specify a maximum segment size as
+  // well as a minimum.
+          
+  while (breakAt < mDoc.length() && 
+         mDoc.charAt(breakAt - 1) != '\n')
+    breakAt++;
+
+  JCas jcas = getEmptyJCas();
+  try {
+    jcas.setDocumentText(mDoc.substring(mPos, breakAt));
+    // if original CAS had SourceDocumentInformation,
+    // also add SourceDocumentInformation to each segment
+    if (mDocUri != null) {
+      SourceDocumentInformation sdi = 
+          new SourceDocumentInformation(jcas);
+      sdi.setUri(mDocUri);
+      sdi.setOffsetInSource(mPos);
+      sdi.setDocumentSize(breakAt - mPos);
+      sdi.addToIndexes();
+
+      if (breakAt == mDoc.length()) {
+        sdi.setLastSegment(true);
+      }
+    }
+
+    mPos = breakAt;
+    return jcas;
+  } catch (Exception e) {
+    jcas.release();
+    throw new AnalysisEngineProcessException(e);
+  }
+}
+----
+
+The `next` method actually produces the next segment and returns it.
+The framework guarantees that it will not call `next` unless `hasNext` has returned true since the last call to `process` or `next`.
+
+Note that in order to produce a segment, the CAS Multiplier must get an empty JCas to populate.
+This is done by the line:
+
+[source]
+----
+JCas jcas = getEmptyJCas();
+----
+
+This requests an empty JCas from the framework, which maintains a pool of JCas instances to draw from.
+
+Also, note the use of the `try...catch` block to ensure that a JCas is released back to the pool if an exception occurs.
+This is very important to allow a CAS Multiplier to recover from errors.
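+
+The comment inside `next` notes that a better implementation might specify a maximum segment size as well as a minimum.
+A minimal sketch of that idea in plain Java, outside the UIMA API, could look as follows.
+The class `BoundedSegmenter` and its `minSize` and `maxSize` parameters are hypothetical and are not part of the `SimpleTextSegmenter` example.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical helper for illustration; not part of the UIMA SDK.
+final class BoundedSegmenter {
+
+  // Returns the exclusive end offset of the segment that starts at pos.
+  static int nextBreak(String doc, int pos, int minSize, int maxSize) {
+    if (minSize < 1 || maxSize < minSize) {
+      throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
+    }
+    // never read past the maximum segment size or the end of the document
+    int limit = Math.min(pos + maxSize, doc.length());
+    int breakAt = Math.min(pos + minSize, limit);
+    // extend the segment until it ends with a newline or reaches the limit
+    while (breakAt < limit && doc.charAt(breakAt - 1) != '\n') {
+      breakAt++;
+    }
+    return breakAt;
+  }
+
+  // Splits the whole document into consecutive segments.
+  static List<String> segments(String doc, int minSize, int maxSize) {
+    List<String> result = new ArrayList<>();
+    int pos = 0;
+    while (pos < doc.length()) {
+      int breakAt = nextBreak(doc, pos, minSize, maxSize);
+      result.add(doc.substring(pos, breakAt));
+      pos = breakAt;
+    }
+    return result;
+  }
+}
+----
+
+With a maximum in place, a document that contains no newlines is split into segments of at most `maxSize` characters instead of one segment containing the entire text.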
+
+[[ugr.tug.cm.creating_cm_descriptor]]
+== Creating the CAS Multiplier Descriptor
+// <titleabbrev>CAS Multiplier Descriptor</titleabbrev>
+
+There is not a separate type of descriptor for a CAS Multiplier.
+CAS Multipliers are considered a type of Analysis Engine, and so their descriptors use the same syntax as any other Analysis Engine Descriptor.
+
+The descriptor for the `SimpleTextSegmenter` is located at `examples/descriptors/cas_multiplier/SimpleTextSegmenter.xml` in the UIMA SDK.
+
+The Analysis Engine Description, in its "`Operational Properties`" section, now contains a new "`outputsNewCASes`" property which takes a Boolean value.
+If the Analysis Engine is a CAS Multiplier, this property should be set to true.
+
+If you use the CDE, be sure to check the "`Outputs new CASes`" box in the Runtime Information section on the Overview page, as shown here: 
+
+.Screen shot of Component Descriptor Editor on Overview showing checking of "Outputs new CASes" box
+image::images/tutorials_and_users_guides/tug.cas_multiplier/image002.jpg[]
+
+If you edit the Analysis Engine Descriptor by hand, you need to add a `<outputsNewCASes>` element to your descriptor as shown here:
+
+[source]
+----
+<operationalProperties>
+  <modifiesCas>false</modifiesCas>
+  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
+  <outputsNewCASes>true</outputsNewCASes>
+</operationalProperties>
+----
+
+[NOTE]
+====
+The "`modifiesCas`" operational property refers to the input CAS, not the new output CASes produced.
+So our example `SimpleTextSegmenter` has `modifiesCas` set to `false` since it doesn't modify the input CAS.
+====
+
+[[ugr.tug.cm.using_cm_in_aae]]
+== Using a CAS Multiplier in an Aggregate Analysis Engine
+// <titleabbrev>Using CAS Multipliers in Aggregates</titleabbrev>
+
+You can include a CAS Multiplier as a component in an Aggregate Analysis Engine.
+For example, this allows you to construct an Aggregate Analysis Engine that takes each input CAS, breaks it up into segments, and runs a series of Annotators on each segment.
+
+[[ugr.tug.cm.adding_cm_to_aggregate]]
+=== Adding the CAS Multiplier to the Aggregate
+// <titleabbrev>Aggregate: Adding the CAS Multiplier</titleabbrev>
+
+Since CAS Multipliers are considered a type of Analysis Engine, adding them to an aggregate works the same way as for other Analysis Engines.
+Using the CDE, you just click the "`Add...`" button in the Component Engines view and browse to the Analysis Engine Descriptor of your CAS Multiplier.
+If editing the aggregate descriptor directly, just `import` the Analysis Engine Descriptor of your CAS Multiplier as usual.
+
+An example descriptor for an Aggregate Analysis Engine containing a CAS Multiplier is provided in ``examples/descriptors/cas_multiplier/SegmenterAndTokenizerAE.xml``.
+This Aggregate runs the `SimpleTextSegmenter` example to break a large document into segments, and then runs each segment through the ``SimpleTokenAndSentenceAnnotator``.
+Try running it in the Document Analyzer tool with a large text file as input, to see that it outputs multiple output CASes, one for each segment produced by the ``SimpleTextSegmenter``.
+
+[[ugr.tug.cm.cm_and_fc]]
+=== CAS Multipliers and Flow Control
+
+CAS Multipliers are only supported in the context of Fixed Flow or custom Flow Control.
+If you use the built-in "`Fixed Flow`" for your Aggregate Analysis Engine, you can position the CAS Multiplier anywhere in that flow.
+Processing then works as follows: When a CAS is input to the Aggregate AE, that CAS is routed to the components in the order specified by the Fixed Flow, until that CAS reaches a CAS Multiplier.
+
+Upon reaching a CAS Multiplier, if that CAS Multiplier produces new output CASes, then each output CAS from that CAS Multiplier will continue through the flow, starting at the node immediately after the CAS Multiplier in the Fixed Flow.
+No further processing will be done on the original input CAS after it has reached a CAS Multiplier; it will _not_ continue in the flow.
+
+If the CAS Multiplier does _not_ produce any output CASes for a given input CAS, then that input CAS _will_ continue in the flow.
+This behavior is appropriate, for example, for a CAS Multiplier that may segment an input CAS into pieces but only does so if the input CAS is larger than a certain size.
+
+It is possible to put more than one CAS Multiplier in your flow.
+In this case, when a new CAS output from the first CAS Multiplier reaches the second CAS Multiplier, and the second CAS Multiplier produces output CASes, no further processing occurs on the CAS that was input to the second CAS Multiplier; the new output CASes produced by the second CAS Multiplier continue in the flow, starting at the node after the second CAS Multiplier.
+
+This default behavior can be customized.
+The `FixedFlowController` component that implements UIMA's default flow defines a configuration parameter `ActionAfterCasMultiplier` that can take the following values:
+
+* `continue` – the CAS continues on to the next element in the flow.
+* `stop` – the CAS will no longer continue in the flow, and will be returned from the aggregate if possible.
+* `drop` – the CAS will no longer continue in the flow, and will be dropped (not returned from the aggregate) if possible.
+* `dropIfNewCasProduced` (the default) – if the CAS multiplier produced a new CAS as a result of processing this CAS, then this CAS will be dropped. If not, then this CAS will continue.
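+
+The four values can be read as a small decision table for the CAS that was input to the CAS Multiplier.
+The following plain-Java sketch is only a model of the behavior described in the list above: the enum and method names are hypothetical, it is not the `FixedFlowController` implementation, and it ignores the "if possible" qualification on `stop` and `drop`.
+
+[source,java]
+----
+// Hypothetical outcome labels for the CAS that was input to the CAS Multiplier.
+enum InputCasOutcome { CONTINUES_IN_FLOW, RETURNED_FROM_AGGREGATE, DROPPED }
+
+// Hypothetical model of the documented parameter values.
+enum ActionAfterCasMultiplier {
+  CONTINUE, STOP, DROP, DROP_IF_NEW_CAS_PRODUCED;
+
+  // newCasProduced: whether the CAS Multiplier produced at least one new CAS
+  InputCasOutcome outcomeForInputCas(boolean newCasProduced) {
+    switch (this) {
+      case CONTINUE:
+        return InputCasOutcome.CONTINUES_IN_FLOW;
+      case STOP:
+        return InputCasOutcome.RETURNED_FROM_AGGREGATE;
+      case DROP:
+        return InputCasOutcome.DROPPED;
+      default: // DROP_IF_NEW_CAS_PRODUCED, the documented default
+        return newCasProduced
+            ? InputCasOutcome.DROPPED
+            : InputCasOutcome.CONTINUES_IN_FLOW;
+    }
+  }
+}
+----
+
+Only the default value depends on whether a new CAS was produced; the other three treat the input CAS the same way in both cases.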
+
+You can override this parameter in your Aggregate Analysis Engine the same way you would override a parameter in a delegate Analysis Engine.
+But to do so you must first explicitly identify that you are using the `FixedFlowController` implementation by importing its descriptor into your aggregate as follows:
+
+[source]
+----
+<flowController key="FixedFlowController">
+  <import name="org.apache.uima.flow.FixedFlowController"/>
+</flowController>
+----
+
+The parameter could then be overridden as, for example:
+
+[source]
+----
+<configurationParameters>
+  <configurationParameter>
+    <name>ActionForIntermediateSegments</name>
+    <type>String</type>
+    <multiValued>false</multiValued>
+    <mandatory>false</mandatory>
+    <overrides>
+      <parameter>
+        FixedFlowController/ActionAfterCasMultiplier
+      </parameter>
+    </overrides>
+  </configurationParameter>
+</configurationParameters>
+
+<configurationParameterSettings>
+  <nameValuePair>
+    <name>ActionForIntermediateSegments</name>
+    <value>
+      <string>drop</string>
+    </value>
+  </nameValuePair>
+</configurationParameterSettings>
+----
+
+This overriding can also be done using the Component Descriptor Editor tool.
+An example of an Analysis Engine that overrides this parameter can be found in ``examples/descriptors/cas_multiplier/Segment_Annotate_Merge_AE.xml``.
+For more information about how to specify a flow controller as part of your Aggregate Analysis Engine descriptor, see <<ugr.tug.fc.adding_fc_to_aggregate>>.
+
+If you would like to further customize the flow, you will need to implement a custom FlowController as described in <<ugr.tug.fc>>.
+For example, you could implement a flow where a CAS that is input to a CAS Multiplier will be processed further by _some_ downstream components, but not others.
+
+[[ugr.tug.cm.aggregate_cms]]
+=== Aggregate CAS Multipliers
+
+An important consideration when you put a CAS Multiplier inside an Aggregate Analysis Engine is whether you want the Aggregate to also function as a CAS Multiplier -- that is, whether you want the new output CASes produced within the Aggregate to be output from the Aggregate.
+This is controlled by the `<outputsNewCASes>` element in the Operational Properties of your Aggregate Analysis Engine descriptor.
+The syntax is the same as what was described in <<ugr.tug.cm.creating_cm_descriptor>>.
+
+If you set this property to ``true``, then any new output CASes produced by a CAS Multiplier inside this Aggregate will be output from the Aggregate.
+Thus the Aggregate will function as a CAS Multiplier and can be used in any of the ways in which a primitive CAS Multiplier can be used.
+
+If you set the `<outputsNewCASes>` property to `false`, then any new output CASes produced by a CAS Multiplier inside the Aggregate will be dropped (i.e., the CASes will be released back to the pool) once they have finished being processed.
+Such an Aggregate Analysis Engine functions just like a "`normal`" non-CAS-Multiplier Analysis Engine; the fact that CAS Multiplication is occurring inside it is hidden from users of that Analysis Engine.
+
+[NOTE]
+====
+If you want to output some new Output CASes and not others, you need to implement a custom Flow Controller that makes this decision -- see <<ugr.tug.fc.using_fc_with_cas_multipliers>>.
+====
+
+[[ugr.tug.cm.using_cm_in_cpe]]
+== Using a CAS Multiplier in a Collection Processing Engine
+// <titleabbrev>CAS Multipliers in CPE's</titleabbrev>
+
+It is currently a limitation that CAS Multipliers cannot be deployed directly in a Collection Processing Engine.
+The only way that you can use a CAS Multiplier in a CPE is to first wrap it in an Aggregate Analysis Engine whose `outputsNewCASes` property is set to `false`, which in effect hides the existence of the CAS Multiplier from the CPE.
+
+Note that you can build an Aggregate Analysis Engine that consists of CAS Multipliers and Annotators, followed by CAS Consumers.
+This can simulate what a CPE would do, but without the deployment and error handling options that the CPE provides.
+
+[[ugr.tug.cm.calling_cm_from_app]]
+== Calling a CAS Multiplier from an Application
+// <titleabbrev>Applications: Calling CAS Multipliers</titleabbrev>
+
+
+[[ugr.tug.cm.retrieving_output_cases]]
+=== Retrieving Output CASes from the CAS Multiplier
+// <titleabbrev>Output CASes</titleabbrev>
+
+The `AnalysisEngine` interface has the following methods that allow you to interact with CAS Multipliers:
+
+* `CasIterator processAndOutputNewCASes(CAS)`
+* `JCasIterator processAndOutputNewCASes(JCas)`
+
+From your application, you call `processAndOutputNewCASes` and pass it the input CAS.
+An iterator is returned that allows you to step through each of the new output CASes that are produced by the Analysis Engine.
+
+It is very important to realize that CASes are pooled objects and so your application must release each CAS (by calling the `CAS.release()` method) that it obtains from the CasIterator _before_ it calls the `CasIterator.next` method again.
+Otherwise, the CAS pool will be exhausted and a deadlock will occur.
+
+The example code in the class `org.apache.uima.examples.casMultiplier.CasMultiplierExampleApplication` illustrates this.
+Here is the main processing loop:
+
+[source]
+----
+CasIterator casIterator = ae.processAndOutputNewCASes(initialCas);
+while (casIterator.hasNext()) {
+  CAS outCas = casIterator.next();
+
+  //dump the document text and annotations for this segment
+  System.out.println("********* NEW SEGMENT *********");
+  System.out.println(outCas.getDocumentText());
+  PrintAnnotations.printAnnotations(outCas, System.out); 
+
+  //release the CAS (important)
+  outCas.release();
+}
+----
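+
+To see why the `release` call matters, consider a pool with a fixed number of slots.
+The sketch below is a hypothetical plain-Java stand-in: the class `TinyPool` is not a UIMA class, and a `StringBuilder` merely plays the role of a pooled CAS.
+Unlike the situation described above, where the caller would deadlock, this model returns `null` when the pool is empty so that the exhaustion is visible.
+
+[source,java]
+----
+import java.util.ArrayDeque;
+import java.util.Deque;
+
+// Hypothetical stand-in for a fixed-size pool of reusable objects.
+final class TinyPool {
+  private final Deque<StringBuilder> free = new ArrayDeque<>();
+
+  TinyPool(int size) {
+    for (int i = 0; i < size; i++) {
+      free.push(new StringBuilder());
+    }
+  }
+
+  // Returns a pooled object, or null if every object is currently in use.
+  StringBuilder tryAcquire() {
+    return free.poll();
+  }
+
+  // Clears the object and makes it available to the next caller.
+  void release(StringBuilder item) {
+    item.setLength(0);
+    free.push(item);
+  }
+}
+----
+
+With a pool of size one, a second `tryAcquire` fails until the first object has been released, which mirrors the requirement to release each output CAS before asking the iterator for the next one.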
+
+Note that as defined by the CAS Multiplier contract in <<ugr.tug.cm.cm_interface_overview>>, the CAS Multiplier owns the input CAS (``initialCas`` in the example) until the last new output CAS has been produced.
+This means that the application should not try to make changes to `initialCas` until after the `CasIterator.hasNext` method has returned false, indicating that the segmenter has finished.
+
+Note that the processing time of the Analysis Engine is spread out over the calls to the `hasNext` and `next` methods of the `CasIterator`.
+That is, the next output CAS may not actually be produced and annotated until the application asks for it.
+So the application should not expect calls to the `CasIterator` to necessarily complete quickly.
+
+Also, calls to the `CasIterator` may throw Exceptions indicating an error has occurred during processing.
+If an Exception is thrown, all processing of the input CAS will stop, and no more output CASes will be produced.
+There is currently no error recovery mechanism that will allow processing to continue after an exception.
+
+[[ugr.tug.cm.using_cm_with_other_aes]]
+=== Using a CAS Multiplier with other Analysis Engines
+// <titleabbrev>CAS Multipliers with other AEs</titleabbrev>
+
+In your application you can take the output CASes from a CAS Multiplier and pass them to the `process` method of other Analysis Engines.
+However there are some special considerations regarding the Type System of these CASes.
+
+By default, the output CASes of a CAS Multiplier will have a Type System that contains all of the types and features declared by any component in the outermost Aggregate Analysis Engine or Collection Processing Engine that contains the CAS Multiplier.
+If in your application you create a CAS Multiplier and another Analysis Engine, where these are not enclosed in an aggregate, then the output CASes from the CAS Multiplier will not support any types or features that are declared in the latter Analysis Engine but not in the CAS Multiplier.
+
+This can be remedied by forcing the CAS Multiplier and Analysis Engine to share a single `UimaContext` when they are created, as follows:
+
+[source]
+----
+//create a "root" UIMA context for your whole application
+
+UimaContextAdmin rootContext =
+   UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
+      UIMAFramework.newDefaultResourceManager(),
+      UIMAFramework.newConfigurationManager());
+
+XMLInputSource input = new XMLInputSource("MyCasMultiplier.xml");
+AnalysisEngineDescription desc = UIMAFramework.getXMLParser().
+        parseAnalysisEngineDescription(input);
+ 
+//create a UIMA Context for the new AE we are about to create
+
+//first argument is unique key among all AEs used in the application
+UimaContextAdmin childContext = rootContext.createChild(
+        "myCasMultiplier", Collections.EMPTY_MAP);
+
+//instantiate CAS Multiplier AE, passing the UIMA Context through the 
+//additional parameters map
+
+Map additionalParams = new HashMap();
+additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
+
+AnalysisEngine casMultiplierAE = UIMAFramework.produceAnalysisEngine(
+        desc,additionalParams);
+
+//repeat for another AE      
+XMLInputSource input2 = new XMLInputSource("MyAE.xml");
+AnalysisEngineDescription desc2 = UIMAFramework.getXMLParser().
+        parseAnalysisEngineDescription(input2);
+ 
+UimaContextAdmin childContext2 = rootContext.createChild(
+        "myAE", Collections.EMPTY_MAP);
+
+Map additionalParams2 = new HashMap();
+additionalParams2.put(Resource.PARAM_UIMA_CONTEXT, childContext2);
+
+AnalysisEngine myAE = UIMAFramework.produceAnalysisEngine(
+        desc2, additionalParams2);
+----
+
+[[ugr.tug.cm.using_cm_to_merge_cases]]
+== Using a CAS Multiplier to Merge CASes
+// <titleabbrev>Merging with CAS Multipliers</titleabbrev>
+
+A CAS Multiplier can also be used to combine smaller CASes together to form larger CASes.
+In this section we describe how this works and walk through an example.
+
+[[ugr.tug.cm.overview_of_how_to_merge_cases]]
+=== Overview of How to Merge CASes
+// <titleabbrev>CAS Merging Overview</titleabbrev>
+
+
+. When the framework first calls the CAS Multiplier's `process` method, the CAS Multiplier requests an empty CAS (which we'll call the "merged CAS") and copies relevant data from the input CAS into the merged CAS. The class `org.apache.uima.util.CasCopier` provides utilities for copying Feature Structures between CASes.
+. When the framework then calls the CAS Multiplier's `hasNext` method, the CAS Multiplier returns `false` to indicate that it has no output at this time.
+. When the framework calls `process` again with a new input CAS, the CAS Multiplier copies data from that input CAS into the merged CAS, combining it with the data that was previously copied.
+. Eventually, when the CAS Multiplier decides that it wants to output the merged CAS, it returns `true` from the `hasNext` method, and then when the framework subsequently calls the `next` method, the CAS Multiplier returns the merged CAS.
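+
+The steps above can be sketched as a small state machine.
+This is a plain-Java model under simplifying assumptions, not UIMA code: the class `MergeProtocolSketch` is hypothetical, a `StringBuilder` stands in for the merged CAS, and a boolean argument stands in for whatever signal tells the component that the last input has arrived.
+
+[source,java]
+----
+// Hypothetical model of the process / hasNext / next protocol for merging.
+final class MergeProtocolSketch {
+  private StringBuilder merged;   // stand-in for the "merged CAS"
+  private boolean readyToOutput;
+
+  // Steps 1 and 3: obtain an empty container if needed, then copy data in.
+  void process(String input, boolean lastInput) {
+    if (merged == null) {
+      merged = new StringBuilder();
+    }
+    merged.append(input);
+    if (lastInput) {
+      readyToOutput = true;       // step 4: decide to output
+    }
+  }
+
+  // Step 2: false until the merged result is complete.
+  boolean hasNext() {
+    return readyToOutput;
+  }
+
+  // Step 4: hand out the merged result and reset for the next group.
+  String next() {
+    if (!readyToOutput) {
+      throw new IllegalStateException("No next CAS");
+    }
+    String result = merged.toString();
+    merged = null;
+    readyToOutput = false;
+    return result;
+  }
+}
+----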
+
+
+[NOTE]
+====
+There is no explicit call to flush out any pending CASes from a CAS Multiplier when collection processing completes.
+It is up to the application to provide some mechanism to let a CAS Multiplier recognize the last CAS in a collection so that it can ensure that its final output CASes are complete.
+====
+
+[[ugr.tug.cm.example_cas_merger]]
+=== Example CAS Merger
+
+An example CAS Multiplier that merges CASes is provided in the UIMA SDK.
+The Java class for this example is `org.apache.uima.examples.casMultiplier.SimpleTextMerger` and the source code is located under the `examples/src` directory.
+
+[[ugr.tug.cm.example_cas_merger.process]]
+==== Process Method
+
+Almost all of the code for this example is in the `process` method.
+The first part of the `process` method shows how to copy Feature Structures from the input CAS to the "merged CAS":
+
+[source]
+----
+public void process(JCas aJCas) throws AnalysisEngineProcessException {
+    // procure a new CAS if we don't have one already
+    if (mMergedCas == null) {
+      mMergedCas = getEmptyJCas();
+    }
+
+    // append document text
+    String docText = aJCas.getDocumentText();
+    int prevDocLen = mDocBuf.length();
+    mDocBuf.append(docText);
+
+    // copy specified annotation types
+    // CasCopier takes two args: the CAS to copy from.
+    //                           the CAS to copy into.
+    CasCopier copier = new CasCopier(aJCas.getCas(), mMergedCas.getCas());
+    
+    // needed in case one annotation is in two indexes (could    
+    // happen if specified annotation types overlap)
+    Set copiedIndexedFs = new HashSet(); 
+    for (int i = 0; i < mAnnotationTypesToCopy.length; i++) {
+      Type type = mMergedCas.getTypeSystem()
+          .getType(mAnnotationTypesToCopy[i]);
+      FSIndex index = aJCas.getCas().getAnnotationIndex(type);
+      Iterator iter = index.iterator();
+      while (iter.hasNext()) {
+        FeatureStructure fs = (FeatureStructure) iter.next();
+        if (!copiedIndexedFs.contains(fs)) {
+          Annotation copyOfFs = (Annotation) copier.copyFs(fs);
+          // update begin and end
+          copyOfFs.setBegin(copyOfFs.getBegin() + prevDocLen);
+          copyOfFs.setEnd(copyOfFs.getEnd() + prevDocLen);
+          mMergedCas.addFsToIndexes(copyOfFs);
+          copiedIndexedFs.add(fs);
+        }
+      }
+    }
+----
+
+The `CasCopier` class is used to copy Feature Structures of certain types (specified by a configuration parameter) to the merged CAS.
+The `CasCopier` does deep copies, meaning that if the copied FeatureStructure references another FeatureStructure, the referenced FeatureStructure will also be copied.
+
+This example also merges the document text using a separate ``StringBuffer``.
+Note that we cannot append document text to the Sofa data of the merged CAS because Sofa data cannot be modified once it is set.
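+
+The offset adjustment in the loop above (adding `prevDocLen` to `begin` and `end`) is what keeps the copied annotations aligned with the merged text.
+The following plain-Java sketch isolates that arithmetic; the classes `Span` and `OffsetShiftingMerger` are hypothetical stand-ins for annotations and for the merger, and no UIMA classes are used.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical stand-in for an annotation: a [begin, end) span over the text.
+final class Span {
+  final int begin;
+  final int end;
+
+  Span(int begin, int end) {
+    this.begin = begin;
+    this.end = end;
+  }
+}
+
+// Hypothetical merger that appends segment text and shifts segment spans.
+final class OffsetShiftingMerger {
+  private final StringBuilder docBuf = new StringBuilder();
+  private final List<Span> shifted = new ArrayList<>();
+
+  void addSegment(String text, List<Span> spans) {
+    // length of the text merged so far = offset of this segment
+    int prevDocLen = docBuf.length();
+    docBuf.append(text);
+    for (Span span : spans) {
+      shifted.add(new Span(span.begin + prevDocLen, span.end + prevDocLen));
+    }
+  }
+
+  String mergedText() {
+    return docBuf.toString();
+  }
+
+  List<Span> mergedSpans() {
+    return shifted;
+  }
+}
+----
+
+A span that covers a name in the second segment still covers the same name in the merged text, because both of its offsets were moved by the length of everything merged before it.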
+
+The remainder of the `process` method determines whether it is time to output a new CAS.
+For this example, we are attempting to merge all CASes that are segments of one original artifact.
+This is done by checking the `SourceDocumentInformation` Feature Structure in the CAS to see if its `lastSegment` feature is set to ``true``.
+That feature (which is set by the example `SimpleTextSegmenter` discussed previously) marks the CAS as being the last segment of an artifact, so when the CAS Multiplier sees this segment it knows it is time to produce an output CAS.
+
+[source]
+----
+// get the SourceDocumentInformation FS, 
+// which indicates the sourceURI of the document
+// and whether the incoming CAS is the last segment
+FSIterator it = aJCas
+        .getAnnotationIndex(SourceDocumentInformation.type).iterator();
+if (!it.hasNext()) {
+  throw new RuntimeException("Missing SourceDocumentInformation");
+}
+SourceDocumentInformation sourceDocInfo = 
+      (SourceDocumentInformation) it.next();
+if (sourceDocInfo.getLastSegment()) {
+  // time to produce an output CAS
+  // set the document text
+  mMergedCas.setDocumentText(mDocBuf.toString());
+
+  // add source document info to destination CAS
+  SourceDocumentInformation destSDI = 
+      new SourceDocumentInformation(mMergedCas);
+  destSDI.setUri(sourceDocInfo.getUri());
+  destSDI.setOffsetInSource(0);
+  destSDI.setLastSegment(true);
+  destSDI.addToIndexes();
+
+  mDocBuf = new StringBuffer();
+  mReadyToOutput = true;
+}
+----
+
+When it is time to produce an output CAS, the CAS Multiplier makes final updates to the merged CAS (setting the document text and adding a `SourceDocumentInformation` FeatureStructure), and then sets the `mReadyToOutput` field to true.
+This field is then used in the `hasNext` and `next` methods.
+
+[[ugr.tug.cm.example_cas_merger.hasnext_and_next]]
+==== HasNext and Next Methods
+
+These methods are relatively simple:
+
+[source]
+----
+public boolean hasNext() throws AnalysisEngineProcessException {
+  return mReadyToOutput;
+}
+
+public AbstractCas next() throws AnalysisEngineProcessException {
+  if (!mReadyToOutput) {
+    throw new RuntimeException("No next CAS");
+  }
+  JCas casToReturn = mMergedCas;
+  mMergedCas = null;
+  mReadyToOutput = false;
+  return casToReturn;
+}
+----
+
+When the merged CAS is ready to be output, `hasNext` will return true, and `next` will return the merged CAS, taking care to set the `mMergedCas` field to `null` so that the next call to `process` will start with a fresh CAS.
+
+[[ugr.tug.cm.using_the_simple_text_merger_in_an_aggregate_ae]]
+=== Using the SimpleTextMerger in an Aggregate Analysis Engine
+// <titleabbrev>SimpleTextMerger in an Aggregate</titleabbrev>
+
+An example descriptor for an Aggregate Analysis Engine that uses the `SimpleTextMerger` is provided in ``examples/descriptors/cas_multiplier/Segment_Annotate_Merge_AE.xml``.
+This Aggregate first runs the `SimpleTextSegmenter` example to break a large document into segments.
+It then runs each segment through the example tokenizer and name recognizer annotators.
+Finally it runs the `SimpleTextMerger` to reassemble the segments back into one CAS.
+The `Name` annotations are copied to the final merged CAS but the `Token` annotations are not.
+
+This example illustrates how you can break large artifacts into pieces for more efficient processing and then reassemble a single output CAS containing only the results most useful to the application.
+Intermediate results such as tokens, which may consume a lot of space, need not be retained over the entire input artifact.
+
+The intermediate segments are dropped and are never output from the Aggregate Analysis Engine.
+This is done by configuring the Fixed Flow Controller as described in <<ugr.tug.cm.cm_and_fc>>, above.
+
+Try running this Analysis Engine in the Document Analyzer tool with a large text file as input, to see that it outputs just one CAS per input file, and that the final CAS contains only the `Name` annotations.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.cpe.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.cpe.adoc
new file mode 100644
index 0000000..7b44e4c
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.cpe.adoc
@@ -0,0 +1,909 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.cpe]]
+= Collection Processing Engine Developer's Guide
+// <titleabbrev>CPE Developer's Guide</titleabbrev>
+
+
+[NOTE]
+====
+The CPE (Collection Processing Engine) was an early approach to supporting some scale-out use cases.
+It is an older approach that doesn't support some of the newer features of CASes such as multiple views and CAS Multipliers.
+It has been supplanted by UIMA-AS, which has full support for the new features.
+====
+
+The UIMA Analysis Engine interface provides support for developing and integrating algorithms that analyze unstructured data.
+Analysis Engines are designed to operate on a per-document basis.
+Their interface handles one CAS at a time.
+UIMA provides additional support for applying analysis engines to collections of unstructured data with its __Collection Processing Architecture__.
+The Collection Processing Architecture defines additional components for reading raw data formats from data collections, preparing the data for processing by Analysis Engines, executing the analysis, extracting analysis results, and deploying the overall flow in a variety of local and distributed configurations.
+
+The functionality defined in the Collection Processing Architecture is implemented by a _Collection Processing Engine_ (CPE).
+A CPE includes an Analysis Engine and adds a __Collection Reader__, a _CAS Initializer_ (deprecated as of version 2), and __CAS Consumers__.
+The part of the UIMA Framework that supports the execution of CPEs is called the Collection Processing Manager, or CPM.
+
+A Collection Reader provides the interface to the raw input data and knows how to iterate over the data collection.
+Collection Readers are discussed in <<ugr.tug.cpe.collection_reader.developing>>.
+The CAS Initializer footnote:[CAS Initializers are deprecated in favor of a more general mechanism, multiple subjects of analysis.] prepares an individual data item for analysis and loads it into the CAS.
+CAS Initializers are discussed in <<ugr.tug.cpe.cas_initializer.developing>>.
+A CAS Consumer extracts analysis results from the CAS and may also perform __collection level processing__, or analysis over a collection of CASes.
+CAS Consumers are discussed in <<ugr.tug.cpe.cas_consumer.developing>>.
+
+Analysis Engines and CAS Consumers are both instances of __CAS Processors__.
+A Collection Processing Engine (CPE) may contain multiple CAS Processors.
+An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate (composed of other Analysis Engines).
+Aggregates may contain CAS Consumers.
+While Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS Processor may be deployed in a variety of local and distributed modes, providing a number of options for scalability and robustness.
+The different deployment options are covered in detail in <<ugr.tug.cpe.deployment_alternatives>>.
+
+Each of the components in a CPE has an interface specified by the UIMA Collection Processing Architecture and is described by a declarative XML descriptor file.
+Similarly, the CPE itself has a well-defined component interface and is described by a declarative XML descriptor file.
+
+A user creates a CPE by assembling the components mentioned above.
+The UIMA SDK provides a graphical tool, called the CPE Configurator, for assisting in the assembly of CPEs.
+Use of this tool is summarized in <<ugr.tug.cpe.cpe_configurator>>, and more details can be found in xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User’s Guide].
+Alternatively, a CPE can be assembled by writing an XML CPE descriptor.
+Details on the CPE descriptor, including its syntax and content, can be found in the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+The individual components have associated XML descriptors, each of which can be created and/or edited using the xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor].
+
+A CPE is executed by a UIMA infrastructure component called the _Collection Processing Manager_ (CPM). The CPM provides a number of services and deployment options that cover instantiation and execution of CPEs, error recovery, and local and distributed deployment of the CPE components.
+
+[[ugr.tug.cpe.concepts]]
+== CPE Concepts
+
+<<ugr.tug.cpe.fig.cpe_components>> illustrates the data flow that occurs between the different types of components that make up a CPE.
+
+[[ugr.tug.cpe.fig.cpe_components]]
+.CPE Components
+image::images/tutorials_and_users_guides/tug.cpe/image002.png[CPE Components and flow between them]
+
+The components of a CPE are:
+
+* _Collection Reader_ – interfaces to a collection of data items (e.g., documents) to be analyzed. Collection Readers return CASes that contain the documents to analyze, possibly along with additional metadata.
+* _Analysis Engine_ – takes a CAS, analyzes its contents, and produces an enriched CAS. Analysis Engines can be recursively composed of other Analysis Engines (called an _Aggregate_ Analysis Engine). Aggregates may also contain CAS Consumers.
+* _CAS Consumer_ – consumes the enriched CAS that was produced by the sequence of Analysis Engines before it, and produces an application-specific data structure, such as a search engine index or database.
+
+A fourth type of component, the _CAS Initializer,_ may be used by a Collection Reader to populate a CAS from a document.
+However, as of UIMA version 2, CAS Initializers are deprecated in favor of a more general mechanism, multiple Subjects of Analysis.
+
+The Collection Processing Manager orchestrates the data flow within a CPE, monitors status, optionally manages the life-cycle of internal components and collects statistics.
+
+CASes are not saved in a persistent way by the framework.
+If you want to save CASes, then you have to save each CAS as it comes through (for example) using a CAS Consumer you write to do this, in whatever format you like.
+The UIMA SDK supplies an example CAS Consumer to save CASes to XML files, either in the standard XMI format or in an older format called XCAS.
+It also supplies an example CAS Consumer to extract information from CASes and store the results into a relational Database, using Java's JDBC APIs.
+
+[[ugr.tug.cpe.configurator_and_viewer]]
+== CPE Configurator and CAS viewer
+
+[[ugr.tug.cpe.cpe_configurator]]
+=== Using the CPE Configurator
+
+A CPE can be assembled by writing an XML CPE descriptor.
+Details on the CPE descriptor, including its syntax and content, can be found in xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+Rather than edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool.
+The CPE Configurator tool is described briefly in this section, and in more detail in xref:tools.adoc#ugr.tools.cpe[Collection Processing Engine Configurator User’s Guide].
+
+The CPE Configurator tool can be run from Eclipse (see <<ugr.tug.cpe.running_cpe_configurator_from_eclipse>>) or by using the `cpeGui` shell script (`cpeGui.bat` on Windows, `cpeGui.sh` on Unix), which is located in the `bin` directory of the UIMA SDK installation.
+Executing this script will display the window shown here:
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image004.jpg[Screenshot of CPE GUI]
+
+The window is divided into three sections, one each for the Collection Reader, Analysis Engines, and CAS Consumers.footnote:[There is also a fourth pane, for the CAS Initializer, but it is hidden by default. To enable it, click the CAS Initializer Panel item in the View menu.]
+In each section, you select the component(s) you want to include in the CPE by browsing to their XML descriptors.
+The configuration parameters present in the XML descriptors will then be displayed in the GUI; these can be modified to override the values present in the descriptor.
+For example, the screen shot below shows the CPE Configurator after the following components have been chosen:
+
+[source]
+----
+Collection Reader: 
+   %UIMA_HOME%/examples/descriptors/collection_reader/
+          FileSystemCollectionReader.xml
+
+Analysis Engine: 
+   %UIMA_HOME%/examples/descriptors/analysis_engine/
+          NamesAndPersonTitles_TAE.xml
+
+CAS Consumer: 
+    %UIMA_HOME%/examples/descriptors/cas_consumer/
+          XmiWriterCasConsumer.xml
+----
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image006.jpg[Screenshot of CPE GUI after fields filled in]
+
+For the File System Collection Reader, ensure that the Input Directory is set to `%UIMA_HOME%\examples\data`footnote:[Replace %UIMA_HOME% with the path to where you installed UIMA.].
+The other parameters may be left blank.
+For the XMI Writer CAS Consumer, ensure that the Output Directory is set to ``%UIMA_HOME%\examples\data\processed``.
+
+After selecting each of the components and providing configuration settings, click the play (forward arrow) button at the bottom of the screen to begin processing.
+A progress bar should be displayed in the lower left corner.
+(Note that the progress bar will not begin to move until all components have completed their initialization, which may take several seconds.) Once processing has begun, the pause and stop buttons become enabled.
+
+If an error occurs, you will be informed by an error dialog.
+If processing completes successfully, you will be presented with a performance report.
+
+Using the File menu, you can select `Save CPE Descriptor` to create an XML descriptor file that defines the CPE you have constructed.
+Later, you can use `Open CPE Descriptor` to restore the CPE Configurator to the saved state.
+Also, CPE descriptors can be used to run a CPE from a Java program – see section <<ugr.tug.cpe.running_cpe_from_application>>.
+CPE Descriptors allow specifying operational parameters, such as error handling options, that are not currently available for configuration through the CPE Configurator.
+For more information on manually creating a CPE Descriptor, see the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+
+The CPE configured above runs a simple name and title annotator on the sample data provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer.
+To view the results, start the External CAS Annotation Viewer by running the `annotationViewer` batch file (``annotationViewer.bat`` on Windows, `annotationViewer.sh` on Unix), which is located in the `bin` directory of the UIMA SDK installation.
+Executing this batch file will display the window shown here: 
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image008.jpg[Screenshot of Annotation Viewer results]
+
+Ensure that the Input Directory is the same as the Output Directory specified for the XMI Writer CAS Consumer in the CPE configured above (e.g., ``%UIMA_HOME%\examples\data\processed``) and that the TAE Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g., `examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml` ).
+
+Click the View button to display the Analyzed Documents window: 
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image010.jpg[Screenshot of CPE Configurator Analyzed Documents]
+
+Double click on any document in the list to view the analyzed document.
+Double clicking the first document, IBM_LifeSciences.txt, will bring up the following window: 
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image012.jpg[Screenshot of Document and Annotation Viewer]
+
+This window shows the analysis results for the document.
+Clicking on any highlighted annotation causes the details for that annotation to be displayed in the right-hand pane.
+Here the annotation spanning "`John M. Thompson`" has been clicked.
+
+Congratulations! You have successfully configured a CPE, saved its descriptor, run the CPE, and viewed the analysis results.
+
+[[ugr.tug.cpe.running_cpe_configurator_from_eclipse]]
+=== Running the CPE Configurator from Eclipse
+
+If you have followed the instructions in the xref:oas.adoc#ugr.ovv.eclipse_setup[Setup Guide] and imported the example Eclipse project, then you should already have a Run configuration for the CPE Configurator tool (called __UIMA CPE GUI__) configured to run in the example project.
+Simply run that configuration to start the CPE Configurator.
+
+If you have not followed the Eclipse setup instructions and wish to run the CPE Configurator tool from Eclipse, you will need to do the following.
+As installed, this Eclipse launch configuration is associated with the `uimaj-examples` project.
+If you've not already done so, you may wish to import that project into your Eclipse workspace.
+It's located in `%UIMA_HOME%/docs/examples`.
+Doing this will supply the Eclipse launcher with all the class files it needs to run the CPE configurator.
+If you don't do this, please manually add the JAR files for UIMA to the launch configuration.
+
+Also, you need to add any projects or JAR files for any UIMA components you will be running to the launch class path.
+
+[NOTE]
+====
+A simpler alternative may be to change the CPE launch configuration to be based on your project.
+If you do that, it will pick up all the files in your project's class path, which you should set up to include all the UIMA framework files.
+An easy way to do this is to specify in your project's properties' build-path that the uimaj-examples project is on the build path, because the uimaj-examples project is set up to include all the UIMA framework classes in its classpath already. 
+====
+
+Next, in the Eclipse menu select __Run → Run...__, which brings up the Run configuration screen.
+
+In the Main tab, set the main class to `org.apache.uima.tools.cpm.CpmFrame`
+
+In the arguments tab, add the following to the VM arguments: 
+
+[source]
+----
+-Xms128M -Xmx256M 
+-Duima.home="C:\Program Files\Apache\uima"
+----
+(or wherever you installed the UIMA SDK)
+
+Click the Run button to launch the CPE Configurator, and use it as previously described in this section.
+
+[[ugr.tug.cpe.running_cpe_from_application]]
+== Running a CPE from Your Own Java Application
+
+The simplest way to run a CPE from a Java application is to first create a CPE descriptor as described in the previous section.
+Then the CPE can be instantiated and run using the following code: 
+[source]
+----
+// parse CPE descriptor in file specified on command line
+CpeDescription cpeDesc = UIMAFramework.getXMLParser().
+        parseCpeDescription(new XMLInputSource(args[0]));
+
+// instantiate CPE (mCPE is a CollectionProcessingEngine field)
+mCPE = UIMAFramework.produceCollectionProcessingEngine(cpeDesc);
+
+// create and register a Status Callback Listener
+mCPE.addStatusCallbackListener(new StatusCallbackListenerImpl());
+
+// start processing
+mCPE.process();
+----
+
+This will start the CPE running in a separate thread.
+
+[NOTE]
+====
+The `process()` method for a CPE can only be called once.
+If you  need to call it again, you have to instantiate a new CPE, and call that new CPE's process method.
+====
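+
+Because `process()` returns while the CPE is still running on another thread, an application usually needs some way to wait for the end of the run.
+The following is a minimal, plain-Java sketch of one such pattern, using a `CountDownLatch` that is released from a completion callback.
+It does not use the UIMA API: `CompletionListener` and `BackgroundEngine` are hypothetical stand-ins for a status callback listener and a CPE, the listener is passed directly to `process()` only to keep the sketch short, and the `IllegalStateException` thrown on a second call is an invented way of modelling the call-once rule.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForCompletion {

    /** Hypothetical stand-in for the completion callback of a status listener. */
    public interface CompletionListener {
        void collectionProcessComplete();
    }

    /** Hypothetical engine: process() returns at once and reports completion from another thread. */
    public static class BackgroundEngine {
        private boolean started = false;

        public synchronized void process(CompletionListener listener) {
            if (started) {
                // models "process() can only be called once"; the exception type is an assumption
                throw new IllegalStateException("process() may only be called once");
            }
            started = true;
            new Thread(listener::collectionProcessComplete).start();
        }
    }

    /** Starts the engine and blocks until completion is reported or the timeout expires. */
    public static boolean runAndWait(BackgroundEngine engine, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        engine.process(done::countDown);          // callback releases the latch
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runAndWait(new BackgroundEngine(), 5000));
    }
}
```
+
+In a real application the latch would be counted down from the listener registered with `addStatusCallbackListener`, and a second run would use a newly produced CPE.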
+
+[[ugr.tug.cpe.using_listeners]]
+=== Using Listeners
+
+Updates of the CPM's progress, including any errors that occur, are sent to the callback handler that is registered by the call to ``addStatusCallbackListener``, above.
+The callback handler is a class that implements the CPM's `StatusCallbackListener` interface.
+In the example, the handler responds to events by printing messages to the console.
+The source code is fairly straightforward and is not included in this chapter -- see the `org.apache.uima.examples.cpe.SimpleRunCPE.java` in the `%UIMA_HOME%\examples\src` directory for the complete code.
+
+If you need more control over the information in the CPE descriptor, you can manually configure it via its API.
+See the Javadocs for package `org.apache.uima.collection` for more details.
+
+[[ugr.tug.cpe.developing_collection_processing_components]]
+== Developing Collection Processing Components
+
+This section is an introduction to the process of developing Collection Readers, CAS Initializers, and CAS Consumers.
+The code snippets refer to the classes that can be found in the `%UIMA_HOME%\examples\src` example project.
+
+In the following sections, classes you write to represent components need to be public and have public, no-args constructors, so that they can be instantiated by the framework.
+(Although Java classes in which you do not define any constructor will, by default, have a no-args constructor that doesn't do anything, a class in which you have defined at least one constructor does not get a default no-args constructor.)
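+
+The constructor rule above can be checked with plain Java reflection.
+The sketch below is only an illustration of the language rule: the class names are hypothetical, and the reflective lookup is an assumption about how a framework might instantiate a component, not a description of UIMA's internals.

```java
import java.lang.reflect.Constructor;

public class NoArgsConstructorCheck {

    /** No constructor declared: the compiler supplies a public no-args one. */
    public static class ImplicitCtor { }

    /** Only a one-argument constructor is declared, so no no-args constructor exists. */
    public static class OnlyArgCtor {
        public OnlyArgCtor(String inputDir) { }
    }

    /** Declares the no-args constructor explicitly next to the other one. */
    public static class BothCtors {
        public BothCtors() { }
        public BothCtors(String inputDir) { }
    }

    /** Returns true if the class can be created through a public no-args constructor. */
    public static boolean instantiable(Class<?> type) {
        try {
            Constructor<?> ctor = type.getConstructor();   // public no-args constructor only
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;                                  // missing, inaccessible, or failing constructor
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiable(ImplicitCtor.class)); // true
        System.out.println(instantiable(OnlyArgCtor.class));  // false
        System.out.println(instantiable(BothCtors.class));    // true
    }
}
```
+
+If your component class needs a constructor with arguments for other purposes, declare the public no-args constructor explicitly as well, as `BothCtors` does.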
+
+[[ugr.tug.cpe.collection_reader.developing]]
+=== Developing Collection Readers
+
+A Collection Reader is responsible for obtaining documents from the collection and returning each document as a CAS.
+Like all UIMA components, a Collection Reader consists of two parts: the code and an XML descriptor.
+
+A simple example of a Collection Reader is the "`File System Collection Reader,`" which simply reads documents from files in a specified directory.
+The Java code is in the class `org.apache.uima.examples.cpe.FileSystemCollectionReader` and the XML descriptor is ``%UIMA_HOME%/examples/src/main/descriptors/collection_reader/FileSystemCollectionReader.xml``.
+
+[[ugr.tug.cpe.collection_reader.java_class]]
+==== Java Class for the Collection Reader
+
+The Java class for a Collection Reader must implement the `org.apache.uima.collection.CollectionReader` interface.
+You may build your Collection Reader from scratch and implement this interface, or you may extend the convenience base class `org.apache.uima.collection.CollectionReader_ImplBase` .
+
+The convenience base class provides default implementations for many of the methods defined in the `CollectionReader` interface, and provides abstract definitions for those methods that you are required to implement in your new Collection Reader.
+Note that if you extend this base class, you do not need to declare that your new Collection Reader implements the `CollectionReader` interface.
+
+[TIP]
+====
+Eclipse tip: if you are using Eclipse, you can quickly create the boilerplate code and stubs for all of the required methods by clicking `File`→``New``→``Class`` to bring up the "`New Java Class`" dialog, specifying `org.apache.uima.collection.CollectionReader_ImplBase` as the Superclass, and checking "`Inherited abstract methods`" in the section "`Which method stubs would you like to create?`", as in the screenshot below:
+====
+
+
+image::images/tutorials_and_users_guides/tug.cpe/image014.jpg[Screenshot showing Eclipse new class wizard]
+
+For the rest of this section we will assume that your new Collection Reader extends the `CollectionReader_ImplBase` class, and we will show examples from the `org.apache.uima.examples.cpe.FileSystemCollectionReader`.
+If you must inherit from a different superclass, you must ensure that your Collection Reader implements the `CollectionReader` interface – see the Javadocs for `CollectionReader` for more details.
+
+[[ugr.tug.cpe.collection_reader.required_methods]]
+==== Required Methods in the Collection Reader class
+
+The following abstract methods must be implemented:
+
+[[ugr.tug.cpe.collection_reader.required_methods.initialize]]
+===== initialize()
+
+The `initialize()` method is called by the framework when the Collection Reader is first created. `CollectionReader_ImplBase` actually provides a default implementation of this method (i.e., it is not abstract), so you are not strictly required to implement this method.
+However, a typical Collection Reader will implement this method to obtain parameter values and perform various initialization steps.
+
+In this method, the Collection Reader class can access the values of its configuration parameters and perform other initialization logic.
+The example File System Collection Reader reads its configuration parameters and then builds a list of files in the specified input directory, as follows:
+
+[source]
+----
+public void initialize() throws ResourceInitializationException {
+  File directory = new File(
+            (String)getConfigParameterValue(PARAM_INPUTDIR));
+  mEncoding = (String)getConfigParameterValue(PARAM_ENCODING);
+  mDocumentTextXmlTagName = (String)getConfigParameterValue(PARAM_XMLTAG);
+  mLanguage = (String)getConfigParameterValue(PARAM_LANGUAGE);
+  mCurrentIndex = 0; 
+  
+  //get list of files (not subdirectories) in the specified directory
+  mFiles = new ArrayList();
+  File[] files = directory.listFiles();
+  for (int i = 0; i < files.length; i++) {
+    if (!files[i].isDirectory()) {
+      mFiles.add(files[i]);  
+    }
+  }
+}
+----
+
+[NOTE]
+====
+This is the zero-argument version of the initialize method.
+There is also a method on the Collection Reader interface called `initialize(ResourceSpecifier, Map)` but it is not recommended that you override this method in your code.
+That method performs internal initialization steps and then calls the zero-argument ``initialize()``. 
+====
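+
+The directory listing in the `initialize()` example above assumes that the configured input directory exists.
+`File.listFiles()` returns `null` when the path is missing or is not a directory, so the loop would fail with a `NullPointerException` in that case.
+The following plain-Java sketch shows one way to guard the listing; the class and method names are hypothetical, and sorting the entries is an extra step not present in the example reader.
+In a real Collection Reader the failure would more naturally be reported through the `ResourceInitializationException` that `initialize()` already declares.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputDirectoryListing {

    /**
     * Returns the files (not subdirectories) directly inside dir, sorted by path.
     * Throws IllegalArgumentException if dir cannot be listed.
     */
    public static List<File> listPlainFiles(File dir) {
        File[] entries = dir.listFiles();      // null if dir is missing or not a directory
        if (entries == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Arrays.sort(entries);                  // listFiles() makes no ordering guarantee
        List<File> files = new ArrayList<>();
        for (File entry : entries) {
            if (!entry.isDirectory()) {
                files.add(entry);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listPlainFiles(dir)) {
            System.out.println(f.getName());
        }
    }
}
```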
+
+[[ugr.tug.cpe.collection_reader.hasnext]]
+===== hasNext()
+
+The `hasNext()` method returns whether or not there are any documents remaining to be read from the collection.
+The File System Collection Reader's `hasNext()` method is very simple.
+It just checks if there are any more files left to be read: 
+[source]
+----
+public boolean hasNext() {
+  return mCurrentIndex < mFiles.size();
+}
+----
+
+[[ugr.tug.cpe.collection_reader.required_methods.getnext]]
+===== getNext(CAS)
+
+The `getNext()` method reads the next document from the collection and populates a CAS.
+In the simple case, this amounts to reading the file and calling the CAS's `setDocumentText` method.
+The example File System Collection Reader is slightly more complex.
+It first checks for a CAS Initializer.
+If the CPE includes a CAS Initializer, the CAS Initializer is used to read the document and initialize the CAS.
+If the CPE does not include a CAS Initializer, the File System Collection Reader reads the document and sets the document text in the CAS.
+
+The File System Collection Reader also stores additional metadata about the document in the CAS.
+In particular, it sets the document's language in the special built-in feature structure xref:ref.adoc#ugr.ref.cas.document_annotation[`uima.tcas.DocumentAnnotation`] and creates an instance of `org.apache.uima.examples.SourceDocumentInformation`, which stores information about the document's source location.
+This information may be useful to downstream components such as CAS Consumers.
+Note that the type system descriptor for this type can be found in `org.apache.uima.examples.SourceDocumentInformation.xml`, which is located in the `examples/src` directory.
+
+The `getNext()` method for the File System Collection Reader looks like this:
+
+[source]
+----
+  public void getNext(CAS aCAS) throws IOException, CollectionException {
+    JCas jcas;
+    try {
+      jcas = aCAS.getJCas();
+    } catch (CASException e) {
+      throw new CollectionException(e);
+    }
+
+    // open input stream to file
+    File file = (File) mFiles.get(mCurrentIndex++);
+    BufferedInputStream fis = 
+            new BufferedInputStream(new FileInputStream(file));
+    try {
+      byte[] contents = new byte[(int) file.length()];
+      fis.read(contents);
+      String text;
+      if (mEncoding != null) {
+        text = new String(contents, mEncoding);
+      } else {
+        text = new String(contents);
+      }
+      // put document in CAS
+      jcas.setDocumentText(text);
+    } finally {
+      if (fis != null)
+        fis.close();
+    }
+
+    // set language if it was explicitly specified 
+    //as a configuration parameter
+    if (mLanguage != null) {
+      ((DocumentAnnotation) jcas.getDocumentAnnotationFs()).
+            setLanguage(mLanguage);
+    }
+
+    // Also store location of source document in CAS. 
+    // This information is critical if CAS Consumers will 
+    // need to know where the original document contents 
+    // are located.
+    // For example, the Semantic Search CAS Indexer 
+    // writes this information into the search index that 
+    // it creates, which allows applications that use the 
+    // search index to locate the documents that satisfy 
+    //their semantic queries.
+    SourceDocumentInformation srcDocInfo = 
+            new SourceDocumentInformation(jcas);
+    srcDocInfo.setUri(
+            file.getAbsoluteFile().toURL().toString());
+    srcDocInfo.setOffsetInSource(0);
+    srcDocInfo.setDocumentSize((int) file.length());
+    srcDocInfo.setLastSegment(
+            mCurrentIndex == mFiles.size());
+    srcDocInfo.addToIndexes();
+  }
+----
+
+The Collection Reader can create additional annotations in the CAS at this point, in the same way that annotators create annotations.
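+
+The file-reading step in `getNext()` above relies on a single `read(byte[])` call, which the `InputStream` contract does not guarantee will fill the whole buffer.
+As a hedged alternative, the sketch below reads the complete file with `java.nio.file.Files` and applies the same encoding rule (the named encoding if one is configured, otherwise the platform default).
+`DocumentTextLoader` and `readText` are hypothetical names and are not part of the example source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DocumentTextLoader {

    /**
     * Reads the whole file and decodes it with the named encoding,
     * or with the platform default when encodingName is null.
     */
    public static String readText(Path file, String encodingName) {
        Charset charset = (encodingName != null)
                ? Charset.forName(encodingName)
                : Charset.defaultCharset();
        try {
            byte[] contents = Files.readAllBytes(file);   // reads until end of file
            return new String(contents, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // usage: java DocumentTextLoader <file> [encoding]
        String encoding = args.length > 1 ? args[1] : null;
        System.out.println(readText(Paths.get(args[0]), encoding));
    }
}
```
+
+The returned string would then be passed to `setDocumentText` exactly as in the example reader.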
+
+[[ugr.tug.cpe.collection_reader.required_methods.getprogress]]
+===== getProgress()
+
+The Collection Reader is responsible for returning progress information; that is, how much of the collection has been read thus far and how much remains to be read.
+The framework defines progress very generally; the Collection Reader simply returns an array of `Progress` objects, where each object contains three fields: the amount already completed, the total amount (if known), and a unit (e.g., entities (documents), bytes, or files).
+The method returns an array so that the Collection Reader can report progress in multiple different units, if that information is available.
+The File System Collection Reader's `getProgress()` method looks like this: 
+[source]
+----
+public Progress[] getProgress() {
+  return new Progress[]{
+     new ProgressImpl(mCurrentIndex,mFiles.size(),Progress.ENTITIES)};
+}
+----
+
+In this particular example, the total number of files in the collection is known, but the total size of the collection is not known.
+As such, a `ProgressImpl` object for `Progress.ENTITIES` is returned, but a `ProgressImpl` object for `Progress.BYTES` is not.
+
+[[ugr.tug.cpe.collection_reader.required_methods.close]]
+===== close()
+
+The close method is called when the Collection Reader is no longer needed.
+The Collection Reader should then release any resources it may be holding.
+The FileSystemCollectionReader does not hold resources and so has an empty implementation of this method:
+
+[source]
+----
+public void close() throws IOException { }
+----
+
+[[ugr.tug.cpe.collection_reader.optional_methods]]
+===== Optional Methods
+
+The following methods may be implemented:
+
+[[ugr.tug.cpe.collection_reader.optional_methods.reconfigure]]
+====== reconfigure()
+
+This method is called if the Collection Reader's configuration parameters change.
+
+[[ugr.tug.cpe.collection_reader.optional_methods.typesysteminit]]
+====== typeSystemInit()
+
+If you are only setting the document text in the CAS, or if you are using the JCas (recommended, as in the current example), you do not have to implement this method.
+If you are directly using the CAS API, this method is used in the xref:tug.adoc#ugr.tug.aae.contract_for_annotator_methods[same way as it is used for an annotator].
+
+[[ugr.tug.cpe.collection_reader.threading]]
+===== Threading considerations
+
+Collection readers do not have to be thread safe; they are run with a single thread per instance, and only one instance per instance of the Collection Processing Manager (CPM) is made.
+
+[[ugr.tug.cpe.collection_reader.descriptor]]
+===== XML Descriptor for a Collection Reader
+
+You can use the Component Description Editor to create and / or edit the File System Collection Reader's descriptor.
+Here is its descriptor (abbreviated somewhat), which is very similar to an Analysis Engine descriptor:
+
+[source]
+----
+<collectionReaderDescription 
+          xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <implementationName>
+    org.apache.uima.examples.cpe.FileSystemCollectionReader
+  </implementationName>
+  <processingResourceMetaData>
+    <name>File System Collection Reader</name>
+    <description>Reads files from the filesystem.</description>
+    <version>1.0</version>
+    <vendor>The Apache Software Foundation</vendor>
+    <configurationParameters>
+      <configurationParameter>
+        <name>InputDirectory</name>
+        <description>Directory containing input files</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>true</mandatory>
+      </configurationParameter>
+      <configurationParameter>
+        <name>Encoding</name>
+        <description>Character encoding for the documents.</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>false</mandatory>
+      </configurationParameter>
+      <configurationParameter>
+        <name>Language</name>
+        <description>ISO language code for the documents</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>false</mandatory>
+      </configurationParameter>
+    </configurationParameters>
+    <configurationParameterSettings>
+      <nameValuePair>
+        <name>InputDirectory</name>
+        <value>
+          <string>C:/Program Files/apache/uima/examples/data</string>
+        </value>
+      </nameValuePair>
+    </configurationParameterSettings>
+    
+    <!-- Type System of CASes returned by this Collection Reader -->
+    
+    <typeSystemDescription>
+      <imports>
+        <import name="org.apache.uima.examples.SourceDocumentInformation"/>
+      </imports>
+    </typeSystemDescription>
+    
+    <capabilities>
+      <capability>
+        <inputs/>
+        <outputs>
+          <type allAnnotatorFeatures="true">
+            org.apache.uima.examples.SourceDocumentInformation
+          </type>
+        </outputs>
+      </capability>
+    </capabilities>
+    <operationalProperties>
+      <modifiesCas>true</modifiesCas>
+      <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
+      <outputsNewCASes>true</outputsNewCASes>
+    </operationalProperties>
+  </processingResourceMetaData>
+</collectionReaderDescription>
+----
+
+[[ugr.tug.cpe.cas_initializer.developing]]
+=== Developing CAS Initializers
+
+[NOTE]
+====
+CAS Initializers are now deprecated (as of version 2.1). For complex initialization, please instead use the capability of creating xref:tug.adoc#ugr.tug.mvs[additional Subjects of Analysis]. 
+====
+
+In UIMA 1.x, the CAS Initializer component was intended to be used as a plug-in to the Collection Reader for when the task of populating the CAS from a raw document is complex and might be reusable with other data collections.
+
+A CAS Initializer Java class must implement the interface ``org.apache.uima.collection.CasInitializer``, and will also generally extend from the convenience base class ``org.apache.uima.collection.CasInitializer_ImplBase``.
+A CAS Initializer also must have an XML descriptor, which has the exact same form as a Collection Reader Descriptor except that the outer tag is ``<casInitializerDescription>``.
+
+CAS Initializers have optional ``initialize()``, ``reconfigure()``, and `typeSystemInit()` methods, which perform the same functions as they do for Collection Readers.
+The only required method for a CAS Initializer is ``initializeCas(Object, CAS)``.
+This method takes the raw document (for example, an `InputStream` object from which the document can be read) and a CAS, and populates the CAS from the document.
+
+[[ugr.tug.cpe.cas_consumer.developing]]
+=== Developing CAS Consumers
+
+[NOTE]
+====
+In version 2, there is no difference in capability between CAS Consumers and ordinary Analysis Engines, except for the default setting of the XML parameters for `multipleDeploymentAllowed` and ``modifiesCas``.
+We recommend for future work that users implement and use Analysis Engine components instead of CAS Consumers.
+
+The rest of this section is written using the version 1 style of CAS Consumer; the methods described are also available for Analysis Engines.
+Note that the CAS Consumer `processCas` method is equivalent to the Analysis Engine `process` method.
+====
+
+A CAS Consumer receives each CAS after it has been analyzed by the Analysis Engine.
+CAS Consumers typically do not update the CAS; they typically extract data from the CAS and persist selected information to aggregate data structures such as search engine indexes or databases.
+
+A CAS Consumer Java class must implement the interface ``org.apache.uima.collection.CasConsumer``, and will also generally extend from the convenience base class ``org.apache.uima.collection.CasConsumer_ImplBase``.
+A CAS Consumer also must have an XML descriptor, which has the exact same form as a Collection Reader Descriptor except that the outer tag is ``<casConsumerDescription>``.
+
+CAS Consumers have optional ``initialize()``, ``reconfigure()``, and `typeSystemInit()` methods, which perform the same functions as they do for Collection Readers and CAS Initializers.
+The only required method for a CAS Consumer is ``processCas(CAS)``, which is where the CAS Consumer does the bulk of its work (i.e., consumes the CAS).
+
+The `CasConsumer` interface (as well as the version 2 Analysis Engine interface) additionally defines batch and collection level processing methods.
+The CAS Consumer or Analysis Engine can implement the `batchProcessComplete()` method to perform processing that should occur at the end of each batch of CASes.
+Similarly, the CAS Consumer  or Analysis Engine can implement the `collectionProcessComplete()` method to perform any collection level processing at the end of the collection.
+
+A very simple example of a CAS Consumer, which writes an XML representation of the CAS to a file, is the XMI Writer CAS Consumer.
+The Java code is in the class `org.apache.uima.examples.cpe.XmiWriterCasConsumer` and the descriptor is in `%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml` .
+
+[[ugr.tug.cpe.cas_consumer.required_methods]]
+==== Required Methods for a CAS Consumer
+
+When extending the convenience class ``org.apache.uima.collection.CasConsumer_ImplBase``, the following abstract methods must be implemented:
+
+[[ugr.tug.cpe.cas_consumer.required_methods.initialize]]
+===== initialize()
+
+The `initialize()` method is called by the framework when the CAS Consumer is first created. `CasConsumer_ImplBase` actually provides a default implementation of this method (i.e., it is not abstract), so you are not strictly required to implement this method.
+However, a typical CAS Consumer will implement this method to obtain parameter values and perform various initialization steps.
+
+In this method, the CAS Consumer can access the values of its configuration parameters and perform other initialization logic.
+The example XMI Writer CAS Consumer reads its configuration parameters and sets up the output directory: 
+[source]
+----
+public void initialize() throws ResourceInitializationException {
+  mDocNum = 0;
+  mOutputDir = new File((String) getConfigParameterValue(PARAM_OUTPUTDIR));
+  if (!mOutputDir.exists()) {
+    mOutputDir.mkdirs();
+  }
+}
+----
+
+[[ugr.tug.cpe.cas_consumer.required_methods.processcas]]
+===== processCas()
+
+The `processCas()` method is where the CAS Consumer does most of its work.
+In our example, the XMI Writer CAS Consumer obtains an iterator over the document metadata in the CAS (in the SourceDocumentInformation feature structure, which is created by the File System Collection Reader) and extracts the URI for the current document.
+From this the output filename is constructed in the output directory and a subroutine (``writeXmi``) is called to generate the output file.
+The `writeXmi` subroutine uses the `XmiCasSerializer` class provided with the UIMA SDK to serialize the CAS to the output file (see the example source code for details).
+
+[source]
+----
+public void processCas(CAS aCAS) throws ResourceProcessException {
+  String modelFileName = null;
+
+  JCas jcas;
+  try {
+    jcas = aCAS.getJCas();
+  } catch (CASException e) {
+    throw new ResourceProcessException(e);
+  }
+ 
+  // retrieve the filename of the input file from the CAS
+  FSIterator it = jcas
+            .getAnnotationIndex(SourceDocumentInformation.type)
+                  .iterator();
+  File outFile = null;
+  if (it.hasNext()) {
+    SourceDocumentInformation fileLoc = 
+            (SourceDocumentInformation) it.next();
+    File inFile;
+    try {
+      inFile = new File(new URL(fileLoc.getUri()).getPath());
+      String outFileName = inFile.getName();
+      if (fileLoc.getOffsetInSource() > 0) {
+        outFileName += ("_" + fileLoc.getOffsetInSource());
+      }
+      outFileName += ".xmi";
+      outFile = new File(mOutputDir, outFileName);
+      modelFileName = mOutputDir.getAbsolutePath() + 
+            "/" + inFile.getName() + ".ecore";
+    } catch (MalformedURLException e1) {
+      // invalid URL, use default processing below
+    }
+  }
+  if (outFile == null) {
+    outFile = new File(mOutputDir, "doc" + mDocNum++);
+  }
+  // serialize the CAS as XMI and write to output file
+  try {
+    writeXmi(jcas.getCas(), outFile, modelFileName);
+  } catch (IOException e) {
+    throw new ResourceProcessException(e);
+  } catch (SAXException e) {
+    throw new ResourceProcessException(e);
+  }
+}
+----
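+
+The file-naming rule used in `processCas()` above can be isolated into a small helper, which makes it easier to see and to test.
+The sketch below is plain Java and only mirrors the logic shown: `XmiOutputNames` and `outputName` are hypothetical names, and it parses the location with `java.net.URI` instead of the `URL` constructor used in the example.

```java
import java.io.File;
import java.net.URI;

public class XmiOutputNames {

    /**
     * Derives an output file name from a source document URI and offset.
     * Falls back to "doc" + fallbackNumber when the URI is missing,
     * cannot be parsed, or has no path component.
     */
    public static String outputName(String sourceUri, int offsetInSource, int fallbackNumber) {
        String path = null;
        if (sourceUri != null) {
            try {
                path = URI.create(sourceUri).getPath();
            } catch (IllegalArgumentException e) {
                // unparsable URI: use the fallback below
            }
        }
        if (path == null || path.isEmpty()) {
            return "doc" + fallbackNumber;
        }
        String name = new File(path).getName();    // last path segment, e.g. "report.txt"
        if (offsetInSource > 0) {
            name += "_" + offsetInSource;          // keeps segments of one source document apart
        }
        return name + ".xmi";
    }

    public static void main(String[] args) {
        System.out.println(outputName("file:/data/report.txt", 0, 7));    // report.txt.xmi
        System.out.println(outputName("file:/data/report.txt", 2048, 7)); // report.txt_2048.xmi
        System.out.println(outputName("not a uri", 0, 7));                // doc7
    }
}
```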
+
+[[ugr.tug.cpe.cas_consumer.optional_methods]]
+===== Optional Methods
+
+The following methods are optional in a CAS Consumer, though they are often used.
+
+[[ugr.tug.cpe.cas_consumer.optional_methods.batchprocesscomplete]]
+====== batchProcessComplete()
+
+The framework calls the `batchProcessComplete()` method at the end of each batch of CASes.
+This gives the CAS Consumer or Analysis Engine an opportunity to perform any batch level processing.
+Our simple XMI Writer CAS Consumer does not perform any batch level processing, so this method is empty.
+Batch size is set in the Collection Processing Engine descriptor.
+
+[[ugr.tug.cpe.cas_consumer.optional_methods.collectionprocesscomplete]]
+====== collectionProcessComplete()
+
+The framework calls the `collectionProcessComplete()` method at the end of the collection (i.e., when all objects in the collection have been processed).
+At this point in time, no CAS is passed in as a parameter.
+This gives the CAS Consumer or Analysis Engine an opportunity to perform collection processing over the entire set of objects in the collection.
+Our simple XMI Writer CAS Consumer does not perform any collection level processing, so this method is empty.
+
+[[ugr.tug.cpe.deploying_a_cpe]]
+== Deploying a CPE
+
+The CPM provides a number of service and deployment options that cover instantiation and execution of CPEs, error recovery, and local and distributed deployment of the CPE components.
+The behavior of the CPM (and correspondingly, the CPE) is controlled by various options and parameters set in the CPE descriptor.
+The current version of the CPE Configurator tool, however, supports only default error handling and deployment options.
+To change these options, you must manually edit the CPE descriptor.
+
+Eventually the CPE Configurator tool will support configuring these options and a detailed tutorial for these settings will be provided.
+In the meantime, we provide only a high-level, conceptual overview of these advanced features in the rest of this chapter, and refer the advanced user to the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[CPE Descriptor Reference] for details on setting these options in the CPE Descriptor.
+
+<<ugr.tug.cpe.fig.cpe_instantiation>> shows a logical view of how an application uses the UIMA framework to instantiate a CPE from a CPE descriptor.
+The CPE descriptor identifies the CPE components (referencing their corresponding descriptors) and specifies the various options for configuring the CPM and deploying the CPE components.
+
+[[ugr.tug.cpe.fig.cpe_instantiation]]
+.CPE Instantiation
+image::images/tutorials_and_users_guides/tug.cpe/image018.png[Picture of deployment of a CPE]
+
+[[ugr.tug.cpe.deployment_alternatives]]
+=== CPE Deployment alternatives
+There are three deployment modes for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:
+
+. *Integrated* (runs in the same Java instance as the CPM)
+. *Managed* (runs in a separate process on the same machine), and
+. *Non-managed* (runs in a separate process, perhaps on a different machine). 
+
+An integrated CAS Processor runs in the same JVM as the CPE.
+A managed CAS Processor runs in a separate process from the CPE, but still on the same computer.
+The CPE controls startup, shutdown, and recovery of a managed CAS Processor.
+A non-managed CAS Processor runs as a service and may be on the same computer as the CPE or on a remote computer.
+A non-managed CAS Processor _service_ is started and managed independently from the CPE.
+
+For both managed and non-managed CAS Processors, the CAS must be transmitted between separate processes and possibly between separate computers.
+This is accomplished using __Vinci__, a communication protocol that the CPM uses and that is provided as part of Apache UIMA.
+xref:tug.adoc#ugr.tug.application.how_to_deploy_a_vinci_service[Vinci] handles service naming and location and data transport.
+Service naming and location are provided by a __Vinci Naming Service__, or __VNS__.
+For managed CAS Processors, the CPE uses its own internal VNS.
+For non-managed CAS Processors, a separate VNS must be running.
+
+The CPE Configurator tool currently only supports constructing CPEs that deploy CAS Processors in integrated mode.
+To deploy CAS Processors in any other mode, the CPE descriptor must be edited by hand (better tooling may be provided later). Details on the CPE descriptor and the required settings for various CAS Processor deployment modes can be found in xref:ref.adoc#ugr.ref.xml.cpe_descriptor[Collection Processing Engine Descriptor Reference].
+In the following sections we merely summarize the various CAS Processor deployment options.
+
+[[ugr.tug.cpe.managed_deployment]]
+=== Deploying Managed CAS Processors
+
+Managed CAS Processor deployment is shown in <<ugr.tug.cpe.fig.managed_deployment>>.
+A managed CAS Processor is deployed by the CPE as a Vinci service.
+The CPE manages the lifecycle of the CAS Processor including service launch, restart on failures, and service shutdown.
+A managed CAS Processor runs on the same machine as the CPE, but in a separate process.
+This provides the necessary fault isolation for the CPE to protect it from non-robust CAS Processors.
+A fatal failure of a managed CAS Processor does not threaten the stability of the CPE.
+
+[[ugr.tug.cpe.fig.managed_deployment]]
+.CPE with Managed CAS Processors
+image::images/tutorials_and_users_guides/tug.cpe/image020.png[Managed deployment showing separate JVMs and CASes flowing between them]
+
+The CPE communicates with managed CAS Processors using the Vinci communication protocol.
+A CAS Processor is launched as a Vinci service and its `process()` method is invoked remotely via a Vinci command.
+The CPE uses its own internal VNS to support managed CAS processors.
+The VNS, by default, listens on port 9005.
+If this port is not available, the VNS will increment its listen port until it finds one that is available.
+All managed CAS Processors are internally configured to "`talk`" to the CPE managed VNS.
+This internal VNS is transparent to the end user launching the CPE.
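The port fallback described above can be pictured with a small, self-contained sketch. It is not the VNS implementation: the class `PortProbe`, its method names, and the attempt limit are hypothetical and only illustrate the idea of starting at a default port (9005) and moving upward until a free port is found.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

/** Illustrative only: a hypothetical helper, not part of UIMA or Vinci. */
final class PortProbe {

  /** Returns true if a server socket can currently be bound to the port. */
  static boolean canBind(int port) {
    try (ServerSocket socket = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      return false; // port is in use or binding is not permitted
    }
  }

  /**
   * Starting at startPort, returns the first port accepted by isFree,
   * trying at most maxAttempts consecutive ports. Returns -1 if none is found.
   */
  static int firstFreePort(int startPort, int maxAttempts, IntPredicate isFree) {
    for (int i = 0; i < maxAttempts; i++) {
      int candidate = startPort + i;
      if (candidate > 65535) {
        break; // ran past the valid TCP port range
      }
      if (isFree.test(candidate)) {
        return candidate;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // 9005 is the default listen port named in the text above.
    int port = firstFreePort(9005, 100, PortProbe::canBind);
    System.out.println("First free port at or above 9005: " + port);
  }
}
```

Passing the availability check as a predicate keeps the search loop independent of real sockets, which makes the increment logic easy to exercise in isolation.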
+
+To deploy a managed CAS Processor, the CPE deployer must change the CPE descriptor.
+The following is a section from the CPE descriptor that shows an example configuration specifying a managed CAS Processor.
+
+[source]
+----
+<casProcessor deployment="local" name="Meeting Detector TAE">
+  <descriptor>
+    <include href="deploy/vinci/Deploy_MeetingDetectorTAE.xml"/>
+  </descriptor>
+  <runInSeparateProcess>
+    <exec dir="." executable="java">
+      <env key="CLASSPATH" 
+         value="src;
+                C:/Program Files/apache/uima/lib/uima-core.jar;
+                C:/Program Files/apache/uima/lib/uima-cpe.jar;
+                C:/Program Files/apache/uima/lib/uima-examples.jar;
+                C:/Program Files/apache/uima/lib/uima-adapter-vinci.jar;
+                C:/Program Files/apache/uima/lib/jVinci.jar"/>
+      <arg>-DLOG=C:/Temp/service.log</arg>
+      <arg>org.apache.uima.reference_impl.collection.
+         service.vinci.VinciAnalysisEngineService_impl</arg>
+      <arg>${descriptor}</arg>
+    </exec>
+  </runInSeparateProcess>
+  <deploymentParameters/>
+  <filter/>
+  <errorHandling>
+    <errorRateThreshold action="terminate" value="1/100"/>
+    <maxConsecutiveRestarts action="terminate" value="3"/>
+    <timeout max="100000"/>
+  </errorHandling>
+  <checkpoint batch="10000"/>
+</casProcessor>
+----
+
+Refer to the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[CPE Descriptor Reference] for details and required settings.
+
+[[ugr.tug.cpe.deploying_nonmanaged_cas_processors]]
+=== Deploying Non-managed CAS Processors
+
+Non-managed CAS Processor deployment is shown in <<ugr.tug.cpe.fig.nonmanaged_cpe>>.
+In non-managed mode, the CPE supports connectivity to CAS Processors running on local or remote computers using Vinci.
+Non-managed processors are different from managed processors in two aspects: 
+
+. Non-managed processors are neither started nor stopped by the CPE.
+. Non-managed processors use an independent VNS, also neither started nor stopped by the CPE. 
+
+[[ugr.tug.cpe.fig.nonmanaged_cpe]]
+.CPE with non-managed CAS Processors
+image::images/tutorials_and_users_guides/tug.cpe/image023.png[Non-managed CPE deployment]
+
+While non-managed CAS Processors provide the same level of fault isolation and robustness as managed CAS Processors, error recovery support for non-managed CAS Processors is much more limited.
+In particular, the CPE cannot restart a non-managed CAS Processor after an error.
+
+Non-managed CAS Processors also require a separate Vinci Naming Service running on the network.
+This VNS must be xref:tug.adoc#ugr.tug.application.vns.starting[manually started] and monitored by the end user or application.
+
+To deploy a non-managed CAS Processor, the CPE deployer must change the CPE descriptor.
+The following is a section from the CPE descriptor that shows an example configuration for the non-managed CAS Processor.
+
+[source]
+----
+<casProcessor deployment="remote" name="Meeting Detector TAE">
+  <descriptor>
+    <include href=
+        "descriptors/vinciService/MeetingDetectorVinciService.xml"/>
+  </descriptor>
+  <deploymentParameters/>
+  <filter/>
+  <errorHandling>
+    <errorRateThreshold action="terminate" value="1/100"/>
+    <maxConsecutiveRestarts action="terminate" value="3"/>
+    <timeout max="100000"/>
+  </errorHandling>
+  <checkpoint batch="10000"/>
+</casProcessor>
+----
+
+Refer to the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[CPE Descriptor Reference] for details and required settings.
+
+[[ugr.tug.cpe.integrated_deployment]]
+=== Deploying Integrated CAS Processors
+
+Integrated CAS Processors are shown in <<ugr.tug.cpe.fig.integrated_deployment>>.
+Here the CAS Processors run in the same JVM as the CPE, just like the Collection Reader and CAS Initializer.
+This deployment method results in minimal CAS communication and transport overhead as the CAS is shared in the same process space of the JVM.
+However, a CPE running with all integrated CAS Processors is limited in scalability by the capability of the single computer on which the CPE is running.
+There is also a stability risk associated with integrated processors because a poorly written CAS Processor can cause the JVM, and hence the entire CPE, to abort.
+
+[[ugr.tug.cpe.fig.integrated_deployment]]
+.CPE with integrated CAS Processor
+image::images/tutorials_and_users_guides/tug.cpe/image026.png[CPE with integrated CAS Processor]
+
+The following is a section from a CPE descriptor that shows an example configuration for the integrated CAS Processor.
+
+[source]
+----
+<casProcessor deployment="integrated" name="Meeting Detector TAE">
+  <descriptor>
+    <include href="descriptors/tutorial/ex4/MeetingDetectorTAE.xml"/>
+  </descriptor>
+  <deploymentParameters/>
+  <filter/>
+  <errorHandling>
+    <errorRateThreshold action="terminate" value="100/1000"/>
+    <maxConsecutiveRestarts action="terminate" value="30"/>
+    <timeout max="100000"/>
+  </errorHandling>
+  <checkpoint batch="10000"/>
+</casProcessor>
+----
+
+Refer to the xref:ref.adoc#ugr.ref.xml.cpe_descriptor[CPE Descriptor Reference] for details and required settings.
+
+[[ugr.tug.cpe.collection_processing_examples]]
+== Collection Processing Examples
+
+The UIMA SDK includes a set of examples illustrating the three modes of deployment: integrated, managed, and non-managed.
+These are in the `/examples/descriptors/collection_processing_engine` directory.
+There are three CPE descriptors that run an example annotator (the Meeting Finder) in these modes.
+
+To run either the integrated or managed examples, use the `runCPE` script in the `/bin` directory of the UIMA installation, passing the appropriate CPE descriptor as an argument.
+Alternatively, if you are using Eclipse and have the `uimaj-examples` project in your workspace, choose Run → Run... from the Eclipse menu and pick the launch configuration "`UIMA Run CPE`".
+
+[NOTE]
+====
+The `runCPE` script _must_ be run from the `%UIMA_HOME%\examples` directory, because the example CPE descriptors use relative path names that are resolved relative to this working directory.
+For instance, 
+
+  runCPE
+  descriptors\collection_processing_engine\MeetingFinderCPE_Integrated.xml
+====
+
+To run the non-managed example, there are some additional steps. 
+
+. Start a VNS service by running the `startVNS` script in the `/bin` directory, or using the Eclipse launcher "`UIMA Start VNS`".
+. Deploy the Meeting Detector Analysis Engine as a Vinci service by running the `startVinciService` script in the `/bin` directory and passing it the location of the descriptor to deploy, in this case ``%UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml``. Alternatively, if you are using Eclipse and have the `uimaj-examples` project in your workspace, choose Run → Run... from the Eclipse menu and pick the launch configuration "`UIMA Start Vinci Service`".
+. Now run the `runCPE` script (or, in Eclipse, the launch configuration "`UIMA Run CPE`"), passing it the CPE descriptor for the non-managed version (`%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml`).
+
+This assumes that the Vinci Naming Service, the runCPE application, and the `MeetingDetectorTAE` service are all running on the same machine.
+Most of the scripts that need information about VNS will look for values to use in environment variables VNS_HOST and VNS_PORT; these default to "`localhost`" and "`9000`".
+You may set these to appropriate values before running the scripts, as needed; you can also pass the name of the VNS host as the second argument to the startVinciService script.
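An application that wants to follow the same convention as the scripts could resolve the VNS location roughly as sketched below. The class `VnsSettings` and its method are hypothetical; only the variable names `VNS_HOST` and `VNS_PORT` and the defaults `localhost` and `9000` come from the text above.

```java
import java.util.Map;

/** Illustrative only: a hypothetical holder for the VNS location. */
final class VnsSettings {
  final String host;
  final int port;

  VnsSettings(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /**
   * Reads VNS_HOST and VNS_PORT from the given environment map,
   * falling back to "localhost" and 9000 when a value is missing or blank.
   */
  static VnsSettings fromEnvironment(Map<String, String> env) {
    String host = env.get("VNS_HOST");
    if (host == null || host.trim().isEmpty()) {
      host = "localhost";
    }
    int port = 9000;
    String portText = env.get("VNS_PORT");
    if (portText != null && !portText.trim().isEmpty()) {
      // Throws NumberFormatException if the value is not a number.
      port = Integer.parseInt(portText.trim());
    }
    return new VnsSettings(host.trim(), port);
  }

  public static void main(String[] args) {
    VnsSettings settings = fromEnvironment(System.getenv());
    System.out.println("VNS at " + settings.host + ":" + settings.port);
  }
}
```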
+
+Alternatively, you can edit the scripts and/or the XML files to specify alternatives for the VNS_HOST and VNS_PORT.
+For instance, if the `runCPE` application is running on a different machine from the Vinci Naming Service, you can edit the `MeetingFinderCPE_NonManaged.xml` and change the vnsHost parameter: `<parameter name="vnsHost"  value="localhost" type="string"/>` to specify the VNS host instead of "`localhost`".
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.fc.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.fc.adoc
new file mode 100644
index 0000000..0a75f53
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.fc.adoc
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.fc]]
+= Flow Controller Developer's Guide
+
+A Flow Controller is a component that plugs into an Aggregate Analysis Engine.
+When a CAS is input to the Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that CAS.
+The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.
+
+Flow Controllers may decide the flow dynamically, based on the contents of the CAS.
+So, as just one example, you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then, based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized for that particular language.
+
+[[ugr.tug.fc.developing_fc_code]]
+== Developing the Flow Controller Code
+
+[[ugr.tug.fc.fc_interface_overview]]
+=== Flow Controller Interface Overview
+
+Flow Controller implementations should extend from the `JCasFlowController_ImplBase` or `CasFlowController_ImplBase` classes, depending on which CAS interface they prefer to use.
+As with other types of components, the Flow Controller ImplBase classes define optional ``initialize``, ``destroy``, and `reconfigure` methods.
+They also define the required method ``computeFlow``.
+
+The `computeFlow` method is called by the framework whenever a new CAS enters the Aggregate Analysis Engine.
+It is given the CAS as an argument and must return an object which implements the `Flow` interface (the Flow object). The Flow Controller developer must define this object.
+It is the object that is responsible for routing this particular CAS through the components of the Aggregate Analysis Engine.
+For convenience, the framework provides basic implementations of Flow objects in the classes `CasFlow_ImplBase` and `JCasFlow_ImplBase`; use the JCas one if you are using the JCas interface to the CAS.
+
+The framework then calls the Flow object's `next()` method, which returns a `Step` object (implemented by the UIMA Framework) that indicates what to do next with this CAS.
+There are three types of steps currently supported:
+
+* ``SimpleStep``, which specifies a single Analysis Engine that should receive the CAS next.
+* ``ParallelStep``, which specifies that multiple Analysis Engines should receive the CAS next, and that the relative order in which these Analysis Engines execute does not matter. Logically, they can run in parallel. The runtime is not obligated to actually execute them in parallel, however, and the current implementation will execute them serially in an arbitrary order.
+* ``FinalStep``, which indicates that the flow is completed. 
+
+After executing the step, the framework will call the Flow object's `next()` method again to determine the next destination, and this will be repeated until the Flow Object indicates that processing is complete by returning a ``FinalStep``.
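The calling pattern described above (ask the Flow object for a step, act on it, repeat until a final step arrives) can be modelled without the framework. The types below are simplified stand-ins declared inside a hypothetical `FlowLoopSketch` class; they borrow the names of the UIMA step types for readability but are not the UIMA classes, and the `route` method is only an assumption about what a driver loop could look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified stand-ins for illustration; these are NOT the UIMA classes. */
final class FlowLoopSketch {

  interface Step {}

  static final class SimpleStep implements Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
  }

  static final class ParallelStep implements Step {
    final List<String> aeKeys;
    ParallelStep(List<String> aeKeys) { this.aeKeys = aeKeys; }
  }

  static final class FinalStep implements Step {}

  interface Flow {
    Step next();
  }

  /** Plays the framework's role: request steps until a FinalStep is returned. */
  static List<String> route(Flow flow) {
    List<String> visited = new ArrayList<String>();
    while (true) {
      Step step = flow.next();
      if (step instanceof FinalStep) {
        return visited;
      } else if (step instanceof SimpleStep) {
        visited.add(((SimpleStep) step).aeKey);
      } else if (step instanceof ParallelStep) {
        // Relative order does not matter; this sketch runs them serially.
        visited.addAll(((ParallelStep) step).aeKeys);
      } else {
        throw new IllegalStateException("Unknown step type: " + step);
      }
    }
  }

  /** A flow that replays a fixed list of steps and then signals completion. */
  static Flow fixedFlow(List<Step> steps) {
    final Iterator<Step> remaining = steps.iterator();
    return new Flow() {
      public Step next() {
        return remaining.hasNext() ? remaining.next() : new FinalStep();
      }
    };
  }

  public static void main(String[] args) {
    Flow flow = fixedFlow(Arrays.<Step>asList(
        new SimpleStep("Tokenizer"),
        new ParallelStep(Arrays.asList("NameFinder", "DateFinder"))));
    System.out.println(route(flow)); // prints [Tokenizer, NameFinder, DateFinder]
  }
}
```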
+
+The Flow Controller has access to a ``FlowControllerContext``, which is a subtype of ``UimaContext``.
+In addition to the configuration parameter and resource access provided by a ``UimaContext``, the `FlowControllerContext` also gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to.
+Most Flow Controllers will need to use this information to make routing decisions.
+You can get a handle to the `FlowControllerContext` by calling the `getContext()` method defined in `JCasFlowController_ImplBase` and ``CasFlowController_ImplBase``.
+Then, the `FlowControllerContext.getAnalysisEngineMetaDataMap` method can be called to get a map containing an entry for each of the Analysis Engines in the Aggregate.
+The keys in this map are the same as the delegate analysis engine keys specified in the aggregate descriptor, and the values are the corresponding `AnalysisEngineMetaData` objects.
+
+Finally, the Flow Controller has optional methods `addAnalysisEngines` and ``removeAnalysisEngines``.
+These methods are intended to notify the Flow Controller if new Analysis Engines are available to route CASes to, or if previously available Analysis Engines are no longer available.
+However, the current version of the Apache UIMA framework does not support dynamically adding or removing Analysis Engines to/from an aggregate, so these methods are not currently called.
+Future versions may support this feature. 
+
+[[ugr.tug.fc.example_code]]
+=== Example Code
+
+This section walks through the source code of an example Flow Controller that simulates a simple version of the "`Whiteboard`" flow model.
+At each step of the flow, the Flow Controller looks at all of the available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are satisfied.
+
+The Java class for the example is `org.apache.uima.examples.flow.WhiteboardFlowController` and the source code is included in the UIMA SDK under the `examples/src` directory.
+
+[[ugr.tug.fc.whiteboard]]
+==== The WhiteboardFlowController Class
+
+[source]
+----
+public class WhiteboardFlowController 
+          extends CasFlowController_ImplBase {
+  public Flow computeFlow(CAS aCAS) 
+          throws AnalysisEngineProcessException {
+    WhiteboardFlow flow = new WhiteboardFlow();
+    // As of release 2.3.0, the following is not needed,
+    //   because the framework does this automatically
+    // flow.setCas(aCAS); 
+                        
+    return flow;
+  }
+
+  class WhiteboardFlow extends CasFlow_ImplBase {
+     // Discussed Later
+  }
+}
+----
+
+The `WhiteboardFlowController` extends from `CasFlowController_ImplBase` and implements the `computeFlow` method.
+The implementation of the `computeFlow` method is very simple; it just constructs a new `WhiteboardFlow` object that will be responsible for routing this CAS.
+The framework automatically gives the new Flow object a handle to that CAS, which the Flow object will later use to make its routing decisions.
+
+Note that we will have one instance of `WhiteboardFlow` per CAS, so if there are multiple CASes being simultaneously processed there will not be any confusion.
+
+[[ugr.tug.fc.whiteboardflow]]
+==== The WhiteboardFlow Class
+
+[source]
+----
+class WhiteboardFlow extends CasFlow_ImplBase {
+  private Set mAlreadyCalled = new HashSet();
+
+  public Step next() throws AnalysisEngineProcessException {
+    // Get the CAS that this Flow object is responsible for routing.
+    // Each Flow instance is responsible for a single CAS.
+    CAS cas = getCas();
+
+    // iterate over available AEs
+    Iterator aeIter = getContext().getAnalysisEngineMetaDataMap().
+        entrySet().iterator();
+    while (aeIter.hasNext()) {
+      Map.Entry entry = (Map.Entry) aeIter.next();
+      // skip AEs that were already called on this CAS
+      String aeKey = (String) entry.getKey();
+      if (!mAlreadyCalled.contains(aeKey)) {
+        // check for satisfied input capabilities
+        // (i.e., the CAS contains at least one instance
+        // of each required input)
+        AnalysisEngineMetaData md = 
+            (AnalysisEngineMetaData) entry.getValue();
+        Capability[] caps = md.getCapabilities();
+        boolean satisfied = true;
+        for (int i = 0; i < caps.length; i++) {
+          satisfied = inputsSatisfied(caps[i].getInputs(), cas);
+          if (satisfied)
+            break;
+        }
+        if (satisfied) {
+          mAlreadyCalled.add(aeKey);
+          if (mLogger.isLoggable(Level.FINEST)) {
+            getContext().getLogger().log(Level.FINEST, 
+                "Next AE is: " + aeKey);
+          }
+          return new SimpleStep(aeKey);
+        }
+      }
+    }
+    // no appropriate AEs to call - end of flow
+    getContext().getLogger().log(Level.FINEST, "Flow Complete.");
+    return new FinalStep();
+  }
+
+  private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
+      //implementation detail; see the actual source code
+  }
+}
+----
+
+Each instance of `WhiteboardFlow` is responsible for routing a single CAS.
+A handle to the CAS instance is available by calling the `getCas()` method, which is a standard method defined on the `CasFlow_ImplBase` superclass.
+
+Each time the `next` method is called, the Flow object iterates over the metadata of all of the available Analysis Engines (obtained via the call to `getContext().getAnalysisEngineMetaDataMap()`) and sees if the input types declared in an `AnalysisEngineMetaData` object are satisfied by the CAS (that is, the CAS contains at least one instance of each declared input type).
+The exact details of checking for instances of types in the CAS are not discussed here; see the `WhiteboardFlowController.java` file for the complete source.
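The selection rule can be tried out in isolation with a simplified model. In the sketch below, a CAS is reduced to the set of type names it contains, and each Analysis Engine to a list of capability sets, each being a set of required input type names. The class `WhiteboardSelectionSketch` and these representations are assumptions made for illustration; the real check in `WhiteboardFlowController.java` works on the CAS and on `TypeOrFeature` objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of the Whiteboard selection rule; not UIMA code. */
final class WhiteboardSelectionSketch {

  /** True if every required input type has at least one instance in the CAS. */
  static boolean inputsSatisfied(Set<String> requiredTypes, Set<String> typesInCas) {
    return typesInCas.containsAll(requiredTypes);
  }

  /**
   * Returns the key of the first AE that has not run yet and has at least one
   * satisfied capability set, recording it in alreadyCalled. Returns null when
   * no AE qualifies, which corresponds to the end of the flow.
   */
  static String nextAe(Map<String, List<Set<String>>> aeInputs,
      Set<String> typesInCas, Set<String> alreadyCalled) {
    for (Map.Entry<String, List<Set<String>>> entry : aeInputs.entrySet()) {
      String aeKey = entry.getKey();
      if (alreadyCalled.contains(aeKey)) {
        continue; // never invoke the same AE twice
      }
      boolean satisfied = true; // an AE with no capability sets is eligible
      for (Set<String> required : entry.getValue()) {
        satisfied = inputsSatisfied(required, typesInCas);
        if (satisfied) {
          break;
        }
      }
      if (satisfied) {
        alreadyCalled.add(aeKey);
        return aeKey;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, List<Set<String>>> aeInputs =
        new LinkedHashMap<String, List<Set<String>>>();
    aeInputs.put("NameFinder", Collections.<Set<String>>singletonList(
        new HashSet<String>(Arrays.asList("Token"))));
    aeInputs.put("Tokenizer", Collections.<Set<String>>singletonList(
        new HashSet<String>()));

    Set<String> typesInCas = new HashSet<String>();
    Set<String> alreadyCalled = new HashSet<String>();

    System.out.println(nextAe(aeInputs, typesInCas, alreadyCalled)); // Tokenizer
    typesInCas.add("Token"); // pretend the Tokenizer added Token annotations
    System.out.println(nextAe(aeInputs, typesInCas, alreadyCalled)); // NameFinder
    System.out.println(nextAe(aeInputs, typesInCas, alreadyCalled)); // null
  }
}
```

Note that the hypothetical `NameFinder` is listed first but is skipped until its required `Token` type appears, which is the essence of the whiteboard model.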
+
+When the Flow object decides which AnalysisEngine should be called next, it indicates this by creating a SimpleStep object with the key for that AnalysisEngine and returning it:
+
+[source]
+----
+return new SimpleStep(aeKey);
+----
+
+The Flow object keeps a list of which Analysis Engines it has invoked in the `mAlreadyCalled` field, and never invokes the same Analysis Engine twice.
+Note this is not a hard requirement.
+It is acceptable to design a FlowController that invokes the same Analysis Engine more than once.
+However, if you do this you must make sure that the flow will eventually terminate.
+
+If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals the end of the flow by returning a FinalStep object:
+
+[source]
+----
+return new FinalStep();
+----
+
+Also, note the use of the logger to write tracing messages indicating the decisions made by the Flow Controller.
+This is a good practice that helps with debugging if the Flow Controller is behaving in an unexpected way.
+
+[[ugr.tug.fc.creating_fc_descriptor]]
+== Creating the Flow Controller Descriptor
+
+To create a Flow Controller Descriptor in the CDE, use File → New → Other → UIMA → Flow Controller Descriptor File:
+
+
+image::images/tutorials_and_users_guides/tug.fc/image002.jpg[Screenshot of Eclipse new object wizard showing Flow Controller]
+
+This will bring up the Overview page for the Flow Controller Descriptor: 
+
+
+image::images/tutorials_and_users_guides/tug.fc/image004.jpg[Screenshot of Component Descriptor Editor Overview page for new Flow Controller]
+
+Type in the Java class name that implements the Flow Controller, or use the "`Browse`" button to select it.
+You must select a Java class that implements the `FlowController` interface.
+
+Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors; for example, you can specify configuration parameters and external resources if you wish.
+
+If you wish to edit a Flow Controller Descriptor by hand, see xref:ref.adoc#ugr.ref.xml.component_descriptor.flow_controller[Flow Controller Descriptor Reference] for the syntax.
+
+[[ugr.tug.fc.adding_fc_to_aggregate]]
+== Adding a Flow Controller to an Aggregate Analysis Engine
+// <titleabbrev>Adding Flow Controller to an Aggregate</titleabbrev>
+
+To use a Flow Controller you must add it to an Aggregate Analysis Engine.
+You can only have one Flow Controller per Aggregate Analysis Engine.
+In the Component Descriptor Editor, the Flow Controller is specified on the Aggregate page: choose "`User-defined Flow`" as the flow control kind.
+The Browse and Search buttons underneath then become active and let you select an existing Flow Controller Descriptor, which is imported into the aggregate descriptor.
+
+
+image::images/tutorials_and_users_guides/tug.fc/image006.jpg[Screenshot of Component Descriptor Editor Aggregate page showing selecting user-defined flow]
+
+The key name is created automatically from the name element in the Flow Controller Descriptor being imported.
+If you need to change this name, you can do so by switching to the "`Source`" view using the bottom tabs, and editing the name in the XML source.
+
+If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is: 
+[source]
+----
+  <delegateAnalysisEngineSpecifiers>
+    ...
+  </delegateAnalysisEngineSpecifiers>  
+  <flowController key=[String]>
+    <import .../> 
+  </flowController>
+----
+
+As usual, you can xref:ref.adoc#ugr.ref.xml.component_descriptor.imports[import] either by location or  by name.
+
+The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine Descriptor -- in parameter overrides, resource bindings, and Sofa mappings.
+
+[[ugr.tug.fc.adding_fc_to_cpe]]
+== Adding a Flow Controller to a Collection Processing Engine
+// <titleabbrev>Adding Flow Controller to CPE</titleabbrev>
+
+Flow Controllers cannot be added directly to Collection Processing Engines.
+To use a Flow Controller in a CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis Engine, and then add the Aggregate Analysis Engine to your CPE.
+The CPE's deployment and error handling options can then only be configured for the entire Aggregate Analysis Engine as a unit.
+
+[[ugr.tug.fc.using_fc_with_cas_multipliers]]
+== Using Flow Controllers with CAS Multipliers
+
+If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a xref:tug.adoc#ugr.tug.cm[CAS Multiplier], there are additional things you must consider.
+
+When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that then will also need to be routed by the Flow Controller.
+When a new output CAS is produced, the framework will call the `newCasProduced` method on the Flow object that was managing the flow of the parent CAS  (the one that was input to the CAS Multiplier). The `newCasProduced` method must create a new Flow  object that will be responsible for routing the new output CAS.
+
+In the `CasFlow_ImplBase` and `JCasFlow_ImplBase` classes, the `newCasProduced` method is defined to throw an exception indicating that the Flow Controller does not handle CAS Multipliers.
+If you want your Flow Controller to properly deal with CAS Multipliers you must override this method.
+
+If your Flow class extends ``CasFlow_ImplBase``, the method signature to override is: 
+[source]
+----
+protected Flow newCasProduced(CAS newOutputCas, String producedBy)
+----
+
+If your Flow class extends ``JCasFlow_ImplBase``, the method signature to override is: 
+[source]
+----
+protected Flow newCasProduced(JCas newOutputCas, String producedBy)
+----
+
+Also, there is a variant of `FinalStep` which can only be specified for output CASes produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller.
+This version of `FinalStep` is produced by calling the constructor with a `true` argument, and it causes the CAS to be immediately released back to the pool.
+No further processing will be done on it and it will not be output from the aggregate.
+This is the way that you can build an Aggregate Analysis Engine that outputs some new CASes but not others.
+Note that if you never want any new CASes to be output from the Aggregate Analysis Engine, you don't need to use this; instead just declare `<outputsNewCASes>false</outputsNewCASes>` in your xref:tug.adoc#ugr.tug.cm.aggregate_cms[Aggregate Analysis Engine Descriptor].
+
+For more information on how CAS Multipliers interact with Flow Controllers, see <<ugr.tug.cm.cm_and_fc>>.
+
+[[ugr.tug.fc.continuing_when_exceptions_occur]]
+== Continuing the Flow When Exceptions Occur
+
+If an exception occurs when processing a CAS, the framework may call the method 
+[source]
+----
+boolean continueOnFailure(String failedAeKey, Exception failure)
+----
+on the Flow object that was managing the flow of that CAS.
+If this method returns ``true``, then the framework may continue to call the `next()` method to continue routing the CAS.
+If this method returns `false` (the default), the framework will not make any more calls to the `next()` method. 
+
+In the case where the last Step was a ParallelStep, if at least one of the destinations resulted in a failure, then `continueOnFailure` will be called to report one of the failures.
+If this method returns true, but one of the other destinations in the ParallelStep resulted in a failure, then the `continueOnFailure` method will be called again to report the next failure.
+This continues until either this method returns false or there are no more failures. 
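The reporting protocol for a failed parallel step can be summarised in a few lines. The sketch below is a model of the behaviour described above, not framework code: `ParallelFailureSketch`, `Failure`, and `reportFailures` are hypothetical names, and the predicate stands in for the Flow object's `continueOnFailure(String, Exception)` method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Models how failures of a parallel step are reported; not UIMA code. */
final class ParallelFailureSketch {

  /** One failed destination of a parallel step. */
  static final class Failure {
    final String aeKey;
    final Exception cause;

    Failure(String aeKey, Exception cause) {
      this.aeKey = aeKey;
      this.cause = cause;
    }
  }

  /**
   * Reports failures one at a time. Stops and returns false as soon as the
   * callback declines to continue; returns true if every failure was
   * tolerated or there were no failures at all.
   */
  static boolean reportFailures(List<Failure> failures,
      BiPredicate<String, Exception> continueOnFailure) {
    for (Failure failure : failures) {
      if (!continueOnFailure.test(failure.aeKey, failure.cause)) {
        return false; // the flow gives up; no further failures are reported
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Failure> failures = Arrays.asList(
        new Failure("NameFinder", new RuntimeException("model missing")),
        new Failure("DateFinder", new RuntimeException("bad locale")));
    // A policy that tolerates failures of the NameFinder only.
    boolean keepGoing = reportFailures(failures,
        (aeKey, cause) -> aeKey.equals("NameFinder"));
    System.out.println("Continue routing: " + keepGoing); // Continue routing: false
  }
}
```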
+
+Note that it is possible for processing of a CAS to be aborted without this method being called.
+This method is only called when an attempt is being made to continue processing of the CAS following an exception, which may be an application configuration decision.
+
+In any case, if processing is aborted by the framework for any reason, including because `continueOnFailure` returned false, the framework will call the `Flow.aborted()` method to allow the Flow object to clean up any resources.
+
+For an example of how to continue after an exception, see the example code ``org.apache.uima.examples.flow.AdvancedFixedFlowController``, in the `examples/src` directory of the UIMA SDK.
+This example also demonstrates the use of ``ParallelStep``.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.multi_views.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.multi_views.adoc
new file mode 100644
index 0000000..758c497
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.multi_views.adoc
@@ -0,0 +1,548 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.mvs]]
+= Multiple CAS Views of an Artifact
+// <titleabbrev>Multiple CAS Views</titleabbrev>
+
+UIMA provides an extension to the basic model of the CAS which supports analysis of multiple views of the same artifact, all contained within the CAS.
+This chapter describes the concepts, terminology, and the API and XML extensions that enable this.
+
+Multiple CAS Views can simplify things when different versions of the artifact are needed at different stages of the analysis.
+They are also key to enabling multimodal analysis where the initial artifact is transformed from one modality to another, or where the artifact itself is multimodal, such as the audio, video and closed-captioned text associated with an MPEG object.
+Each representation of the artifact can be analyzed independently with the standard UIMA programming model; in addition, multi-view components and applications can be constructed.
+
+UIMA supports this by augmenting the CAS with additional light-weight CAS objects, one for each view, where these objects share most of the same underlying CAS, except for two things: each view has its own set of indexed Feature Structures, and each view has its own subject of analysis (Sofa) - its own version of the artifact being analyzed.
+The Feature Structure instances themselves are in the shared part of the CAS; only the entries in the indexes are unique for each CAS view.
+
+All of these CAS view objects are kept together with the CAS, and passed as a unit between components in a UIMA application.
+APIs exist which allow components and applications to switch among the various view objects, as needed.
+
+Feature Structures may be indexed in multiple views, if necessary.
+New methods on CAS Views facilitate adding or removing Feature Structures to or from their index repositories:
+
+[source]
+----
+aView.addFsToIndexes(aFeatureStructure) 
+aView.removeFsFromIndexes(aFeatureStructure)
+----
+
+The view on which the method is called determines the indexes to which the Feature Structure is added, or from which it is removed.
+
+[[ugr.tug.mvs.cas_views_and_sofas]]
+== CAS Views and Sofas
+
+xref:tug.adoc#ugr.tug.aas.sofa[Sofas] and CAS Views are linked.
+In this implementation, every CAS view has one associated Sofa, and every Sofa has one associated CAS View.
+
+[[ugr.tug.mvs.naming_views_sofas]]
+=== Naming CAS Views and Sofas
+
+The developer assigns a name to the View / Sofa, which is a simple string (following the rules for Java identifiers, usually without periods, but see special exception below). These names are declared in the component XML metadata, and are used during assembly and by the runtime to enable switching among multiple Views of the CAS at the same time.
+
+[NOTE]
+====
+The name is called the Sofa name, for historical reasons, but it applies equally to the View.
+In the rest of this chapter, we'll refer to it as the Sofa name.
+====
+
+Some applications contain components that expect a variable number of Sofas as input or output.
+An example of a component that takes a variable number of input Sofas could be one that takes several translations of a document and merges them, where each translation was in a separate Sofa. 
+
+You can specify a variable number of input or output sofa names, where each name has the same base part, by writing the base part of the name (with no periods), followed by a period character and an asterisk character (.*). These denote sofas that have names matching the base part up to the period; for example, names such as `base_name_part.TTX_3d` would match a specification of ``base_name_part.*``.
+
+[[ugr.tug.mvs.multi_view_and_single_view]]
+=== Multi-View, Single-View components & applications
+// <titleabbrev>Multi/Single View parts in Applications</titleabbrev>
+
+Components and applications can be written to be Multi-View or Single-View.
+Most components used as primitive building blocks are expected to be Single-View.
+UIMA provides capabilities to combine these kinds of components with Multi-View components when assembling analysis aggregates or applications.
+
+Single-View components and applications use only one subject of analysis, and one CAS View.
+The code and descriptors for these components do not use the facilities described in this chapter.
+
+Conversely, Multi-View components and applications are aware of the possibility of multiple Views and Sofas, and have code and XML descriptors that create and manipulate them.
+
+[[ugr.tug.mvs.multi_view_components]]
+== Multi-View Components
+
+[[ugr.tug.mvs.deciding_multi_view]]
+=== How UIMA decides if a component is Multi-View
+// <titleabbrev>Deciding: Multi-View</titleabbrev>
+
+Every UIMA component has an associated XML Component Descriptor.
+Multi-View components are identified simply as those whose descriptors declare one or more Sofa names in their Capability sections, as inputs or outputs.
+If a Component Descriptor does not mention any input or output Sofa names, the framework treats that component as a Single-View component.
+
+[[ugr.tug.mvs.additional_capabilities]]
+=== Multi-View: additional capabilities
+
+Additional capabilities provided for components and applications aware of the possibilities of multiple Views and Sofas include:
+
+* Creating new Views, and for each, setting up the associated Sofa data
+* Getting a reference to an existing View and its associated Sofa, by name 
+* Specifying a view in which to index a particular Feature Structure instance 
+
+
+[[ugr.tug.mvs.component_xml_metadata]]
+=== Component XML metadata
+
+Each Multi-View component that creates a Sofa or wants to switch to a specific previously created Sofa must declare the name for the Sofa in the capabilities section.
+For example, a component expecting as input a web document in HTML format and creating a plain text document for further processing might declare:
+
+[source]
+----
+<capabilities>
+  <capability>
+    <inputs/>
+    <outputs/>
+    <inputSofas>
+      <sofaName>rawContent</sofaName>
+    </inputSofas>
+    <outputSofas>
+      <sofaName>detagContent</sofaName>
+    </outputSofas>
+  </capability>
+</capabilities>
+----
+
+Details on this specification are found in xref:ref.adoc#ugr.ref.xml.component_descriptor[Component Descriptor Reference].
+The Component Descriptor Editor supports Sofa declarations on the xref:tools.adoc#ugr.tools.cde.capabilitie[Capabilities Page].
+
+[[ugr.tug.mvs.sofa_capabilities_and_apis_for_apps]]
+== Sofa Capabilities and APIs for Applications
+// <titleabbrev>Sofa Capabilities &amp; APIs for Apps</titleabbrev>
+
+In addition to components, applications can make use of these capabilities.
+When an application creates a new CAS, it also creates the initial view of that CAS - and this view is the object that is returned from the create call.
+Additional views beyond this first one can be dynamically created at any time.
+The application can use the Sofa APIs described in <<ugr.tug.aas>> to specify the data to be analyzed.
+
+If an Application creates a new CAS, the initial CAS that is created will be a view named "`_InitialView`".
+This name can be used in the application and in Sofa Mapping (see the next section) to refer to this otherwise unnamed view.
+
+[[ugr.tug.mvs.sofa_name_mapping]]
+== Sofa Name Mapping
+
+Sofa name mapping is the mechanism which enables UIMA component developers to choose locally meaningful Sofa names in their source code, and lets aggregate developers, collection processing engine developers, and application developers connect output Sofas created in one component to input Sofas required in another.
+
+At a given aggregation level, the assembler or application developer defines names for all the Sofas, and then specifies how these names map to the contained components, using the Sofa Map.
+
+Consider annotator code to create a new CAS view:
+
+[source]
+----
+CAS viewX = cas.createView("X");
+----
+
+Or code to get an existing CAS view:
+
+[source]
+----
+CAS viewX = cas.getView("X");
+----
+
+Without Sofa name mapping the SofaID for the new Sofa will be "`X`".
+However, if a name mapping for "`X`" has been specified by the aggregate or CPE calling this annotator, the actual SofaID in the CAS can be different.
+
+All Sofas in a CAS must have unique names.
+This is accomplished by mapping all declared Sofas as described in the following sections.
+An attempt to create a Sofa with a SofaID already in use will throw an exception.
+
+Sofa name mapping must not use the "`$$.$$`" (period) character.
+Runtime Sofa mapping maps names up to the "`$$.$$`" and appends the period and the following characters to the mapped name.
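+
+The rule above can be sketched in plain Java.
+This is only an illustrative model and not the framework's own code: the class and method names (`SofaNameMappingSketch`, `mapSofaName`) are hypothetical, and the sketch assumes that "up to the period" means up to the first period in the name.
+
+[source,java]
+----
+import java.util.Map;
+
+// Hypothetical helper that models how a local Sofa name is mapped.
+final class SofaNameMappingSketch {
+
+  // Maps the part of the name before the first period and
+  // appends the period and everything after it unchanged.
+  static String mapSofaName(String localName, Map<String, String> mappings) {
+    int dot = localName.indexOf('.');
+    String base = dot < 0 ? localName : localName.substring(0, dot);
+    String rest = dot < 0 ? "" : localName.substring(dot);
+    // Names without a mapping are passed through unchanged.
+    return mappings.getOrDefault(base, base) + rest;
+  }
+
+  public static void main(String[] args) {
+    Map<String, String> mappings = Map.of("X", "EnglishTranscript");
+    System.out.println(mapSofaName("X", mappings));
+    System.out.println(mapSofaName("X.part1", mappings));
+    System.out.println(mapSofaName("Y", mappings));
+  }
+}
+----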
+
+To get a Java Iterator for all the views in a CAS:
+
+[source]
+----
+Iterator allViews = cas.getViewIterator();
+----
+
+To get a Java Iterator for selected views in a CAS, for example, views whose name is either exactly equal to `namePrefix` or is of the form `namePrefix.suffix`, where `suffix` can be any String:
+
+[source]
+----
+Iterator someViews = cas.getViewIterator(String namePrefix);
+----
+
+[NOTE]
+====
+Sofa name mapping is applied to namePrefix.
+====
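+
+The selection rule for the prefix form of `getViewIterator` can be modeled in a few lines of plain Java.
+This is a sketch of the rule as stated above, not the framework implementation; the names `ViewPrefixSketch`, `matchesPrefix` and `select` are hypothetical, and Sofa name mapping is assumed to have been applied to the prefix already.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical model of how view names are selected by prefix.
+final class ViewPrefixSketch {
+
+  // A view matches if its name equals the prefix exactly,
+  // or has the form prefix + "." + suffix.
+  static boolean matchesPrefix(String viewName, String namePrefix) {
+    return viewName.equals(namePrefix) || viewName.startsWith(namePrefix + ".");
+  }
+
+  // Returns the matching view names in their original order.
+  static List<String> select(List<String> viewNames, String namePrefix) {
+    List<String> selected = new ArrayList<>();
+    for (String viewName : viewNames) {
+      if (matchesPrefix(viewName, namePrefix)) {
+        selected.add(viewName);
+      }
+    }
+    return selected;
+  }
+
+  public static void main(String[] args) {
+    List<String> views = List.of("Translation", "Translation.de", "TranslationNotes");
+    System.out.println(select(views, "Translation"));
+  }
+}
+----
+
+Note that a name such as `TranslationNotes` is not selected, because the prefix must be followed by a period.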
+
+Sofa name mappings are not currently supported for remote Analysis Engines.
+See <<ugr.tug.mvs.name_mapping_remote_services>>.
+
+[[ugr.tug.mvs.name_mapping_aggregate]]
+=== Name Mapping in an Aggregate Descriptor
+
+For each component of an Aggregate, name mapping specifies the conversion between component Sofa names and names at the aggregate level.
+
+Here's an example.
+Consider two Multi-View annotators to be assembled into an aggregate which takes an audio segment consisting of spoken English and produces a German text translation.
+
+The first annotator takes an audio segment as input Sofa and produces a text transcript as output Sofa.
+The annotator designer might choose these Sofa names to be "`AudioInput`" and "`TranscribedText`".
+
+The second annotator is designed to translate text from English to German.
+This developer might choose the input and output Sofa names to be "`EnglishDocument`" and "`GermanDocument`", respectively.
+
+In order to hook these two annotators together, the following section would be added to the top level of the aggregate descriptor:
+
+[source]
+----
+<sofaMappings>
+  <sofaMapping>
+    <componentKey>SpeechToText</componentKey>
+    <componentSofaName>AudioInput</componentSofaName>
+    <aggregateSofaName>SegementedAudio</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>SpeechToText</componentKey>
+    <componentSofaName>TranscribedText</componentSofaName>
+    <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>EnglishToGermanTranslator</componentKey>
+    <componentSofaName>EnglishDocument</componentSofaName>
+    <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>EnglishToGermanTranslator</componentKey>
+    <componentSofaName>GermanDocument</componentSofaName>
+    <aggregateSofaName>GermanTranslation</aggregateSofaName>
+  </sofaMapping>
+</sofaMappings>
+----
+
+The Component Descriptor Editor supports xref:tools.adoc#ugr.tools.cde.capabilities.sofa_name_mapping[Sofa name mapping] in aggregates and simplifies the task.
+
+[[ugr.tug.mvs.name_mapping_cpe]]
+=== Name Mapping in a CPE Descriptor
+
+The CPE descriptor aggregates together a Collection Reader and CAS Processors (Annotators and CAS Consumers). 
+xref:ref.adoc#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings[Sofa mappings] can be added to the following elements of CPE descriptors: ``<collectionIterator>``, `<casInitializer>` and the ``<casProcessor>``.
+To be consistent with the organization of CPE descriptors, the maps for the CPE descriptor are distributed among the XML markup for each of the parts (collectionIterator, casInitializer, casProcessor). Because of this the `<componentKey>` element is not needed.
+Finally, rather than sub-elements for the parts, the XML markup for these uses attributes.
+
+Here's an example.
+Let's use the aggregate from the previous section in a collection processing engine.
+Here we will add a Collection Reader that outputs audio segments in an output Sofa named "`nextSegment`".
+Remember to declare an output Sofa nextSegment in the collection reader description.
+We'll add a CAS Consumer in the next section.
+
+[source]
+----
+<collectionReader>
+  <collectionIterator>
+    <descriptor>
+    . . .
+    </descriptor>
+    <configurationParameterSettings>...</configurationParameterSettings>
+    <sofaNameMappings>
+      <sofaNameMapping componentSofaName="nextSegment"
+                       cpeSofaName="SegementedAudio"/>
+    </sofaNameMappings>
+  </collectionIterator>
+  <casInitializer/>
+</collectionReader>
+----
+
+At this point the CAS Processor section for the aggregate does not need any Sofa mapping because the aggregate input Sofa has the same name, "`SegementedAudio`", as is being produced by the Collection Reader.
+
+[[ugr.tug.mvs.specifying_cas_view_for_process]]
+=== Specifying the CAS View delivered to a Component's Process Method
+// <titleabbrev>CAS View received by Process</titleabbrev>
+
+All components receive a Sofa named "`_InitialView`", or a Sofa that is mapped to this name.
+
+For example, assume that the CAS Consumer to be used in our CPE is a Single-View component that expects the analysis results associated with the input CAS, and that we want it to use the results from the translated German text Sofa.
+The following mapping added to the CAS Processor section for the CPE will instruct the CPE to get the CAS view for the German text Sofa and pass it to the CAS Consumer:
+
+[source]
+----
+<casProcessor>
+  . . .
+  <sofaNameMappings>
+    <sofaNameMapping componentSofaName="_InitialView"
+                           cpeSofaName="GermanTranslation"/>
+  </sofaNameMappings>
+</casProcessor>
+----
+
+An alternative syntax for this kind of mapping is to simply leave out the component sofa name in this case.
+
+[[ugr.tug.mvs.name_mapping_application]]
+=== Name Mapping in a UIMA Application
+
+Applications which instantiate UIMA components directly using the UIMAFramework methods can also create a top level Sofa mapping using the "`additional parameters`" capability.
+
+[source]
+----
+//create a "root" UIMA context for your whole application
+
+UimaContextAdmin rootContext =
+   UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
+      UIMAFramework.newDefaultResourceManager(),
+      UIMAFramework.newConfigurationManager());
+
+XMLInputSource input = new XMLInputSource("test.xml");
+AnalysisEngineDescription desc =
+   UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);
+
+//setup sofa name mappings using the api
+
+Map<String, String> sofaMappings = new HashMap<>();
+sofaMappings.put("localName1", "globalName1");
+sofaMappings.put("localName2", "globalName2");
+
+//create a UIMA Context for the new AE we are about to create
+
+//first argument is unique key among all AEs used in the application
+UimaContextAdmin childContext = rootContext.createChild("myAE", sofaMappings);
+
+//instantiate AE, passing the UIMA Context through the additional
+//parameters map
+
+Map<String, Object> additionalParams = new HashMap<>();
+additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
+
+AnalysisEngine ae = 
+        UIMAFramework.produceAnalysisEngine(desc,additionalParams);
+----
+
+Sofa mappings are applied from the inside out, i.e., local to global.
+First, any aggregate mappings are applied, then any CPE mappings, and finally, any specified using this "`additional parameters`" capability.
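+
+The inside-out order can be illustrated with a small plain-Java model.
+This is a simplified sketch, not the framework's resolution code: it ignores component keys and treats each level as a simple name-to-name map, and the class name `MappingChainSketch`, the method `resolve`, and the application-level name `Transcript_en` are hypothetical.
+
+[source,java]
+----
+import java.util.List;
+import java.util.Map;
+
+// Hypothetical model of applying Sofa mappings from local to global.
+final class MappingChainSketch {
+
+  // Applies each level's mapping in order, innermost level first.
+  // A level that has no entry for the current name leaves it unchanged.
+  static String resolve(String componentSofaName, List<Map<String, String>> levels) {
+    String name = componentSofaName;
+    for (Map<String, String> level : levels) {
+      name = level.getOrDefault(name, name);
+    }
+    return name;
+  }
+
+  public static void main(String[] args) {
+    Map<String, String> aggregate = Map.of("EnglishDocument", "EnglishTranscript");
+    Map<String, String> cpe = Map.of();
+    Map<String, String> application = Map.of("EnglishTranscript", "Transcript_en");
+    System.out.println(resolve("EnglishDocument", List.of(aggregate, cpe, application)));
+  }
+}
+----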
+
+[[ugr.tug.mvs.name_mapping_remote_services]]
+=== Name Mapping for Remote Services
+
+Currently, no client-side Sofa mapping information is passed from a UIMA client to a remote service.
+This can cause complications for UIMA services in a Multi-View application.
+
+A remote service in a Multi-View application will work only if the service is Single-View, or if the Sofa names expected by the service exactly match the Sofa names produced by the client.
+
+If your application requires Sofa mappings for a remote Analysis Engine, you can wrap your remotely deployed AE in an aggregate (on the remote side), and specify the necessary Sofa mappings in the descriptor for that aggregate.
+
+[[ugr.tug.mvs.jcas_extensions_for_multi_views]]
+== JCas extensions for Multiple Views
+
+The JCas interface to the CAS can be used with any or all views.
+You can always get a JCas object from an existing CAS object by using the method `getJCas()`; this call will create the JCas if it doesn't already exist.
+If it does exist, it just returns the existing JCas that corresponds to the CAS.
+
+JCas implements the getView(...) method, enabling switching to other named views, just like the corresponding method on the CAS.
+The JCas version, however, returns JCas objects, instead of CAS objects, corresponding to the view.
+
+[[ugr.tug.mvs.sample_application]]
+== Sample Multi-View Application
+
+The UIMA SDK contains a simple Sofa example application which demonstrates many Sofa specific concepts and methods.
+The source code for the application driver is in `examples/src/org/apache/uima/examples/SofaExampleApplication.java` and the Multi-View annotator is given in `SofaExampleAnnotator.java` in the same directory.
+
+This sample application demonstrates a language translator annotator which expects an input text Sofa with an English document and creates an output text Sofa containing a German translation.
+Some of the key Sofa concepts illustrated here include:
+
+* Sofa creation.
+* Access of multiple CAS views.
+* Unique feature structure index space for each view.
+* Feature structures containing cross references between annotations in different CAS views.
+* The strong affinity of annotations with a specific Sofa. 
+
+
+[[ugr.tug.mvs.sample_application.descriptor]]
+=== Annotator Descriptor
+
+The annotator descriptor in `examples/descriptors/analysis_engine/SofaExampleAnnotator.xml` declares an input Sofa named "`EnglishDocument`" and an output Sofa named "`GermanDocument`".
+A custom type "`CrossAnnotation`" is also defined:
+
+[source]
+----
+<typeDescription>
+  <name>sofa.test.CrossAnnotation</name>
+  <description/>
+  <supertypeName>uima.tcas.Annotation</supertypeName>
+  <features>
+    <featureDescription>
+      <name>otherAnnotation</name>
+      <description/>
+      <rangeTypeName>uima.tcas.Annotation</rangeTypeName>
+    </featureDescription>
+  </features>
+</typeDescription>
+----
+
+The `CrossAnnotation` type is derived from ``uima.tcas.Annotation`` and includes one new feature: a reference to another annotation.
+
+[[ugr.tug.mvs.sample_application.setup]]
+=== Application Setup
+
+The application driver instantiates an analysis engine, ``seAnnotator``, from the annotator descriptor, obtains a new CAS using that engine's CAS definition, and creates the expected input Sofa using:
+
+[source]
+----
+CAS cas = seAnnotator.newCAS();
+CAS aView = cas.createView("EnglishDocument");
+----
+
+Since `seAnnotator` is a primitive component, and no Sofa mapping has been defined, the SofaID will be "`EnglishDocument`".
+Local Sofa data is set using:
+
+[source]
+----
+aView.setDocumentText("this beer is good");
+----
+
+At this point the CAS contains all necessary inputs for the translation annotator and its process method is called.
+
+[[ugr.tug.mvs.sample_application.annotator_processing]]
+=== Annotator Processing
+
+Annotator processing consists of parsing the English document into individual words, doing word-by-word translation and concatenating the translations into a German translation.
+Analysis metadata on the English Sofa will be an annotation for each English word.
+Analysis metadata on the German Sofa will be a `CrossAnnotation` for each German word, where the `otherAnnotation` feature will be a reference to the associated English annotation.
+
+Code of interest includes two CAS views:
+
+[source]
+----
+// get View of the English text Sofa
+englishView = aCas.getView("EnglishDocument");
+
+// Create the output German text Sofa
+germanView = aCas.createView("GermanDocument");
+----
+
+the indexing of annotations with the appropriate view:
+
+[source]
+----
+englishView.addFsToIndexes(engAnnot);
+. . .
+germanView.addFsToIndexes(germAnnot);
+----
+
+and the combining of metadata belonging to different Sofas in the same feature structure:
+
+[source]
+----
+// add link to English text
+germAnnot.setFeatureValue(other, engAnnot);
+----
+
+[[ugr.tug.mvs.sample_application.accessing_results]]
+=== Accessing the results of analysis
+
+The application needs to get the results of analysis, which may be in different views.
+Analysis results for each Sofa are dumped independently by iterating over all annotations for each associated CAS view.
+For the English Sofa:
+
+[source]
+----
+for (Annotation annot : aView.getAnnotationIndex()) {
+  System.out.println(" " + annot.getType().getName()
+                         + ": " + annot.getCoveredText());
+}
+----
+
+Iterating over all German annotations looks the same, except for the following:
+
+[source]
+----
+if (annot.getType() == cross) {
+  AnnotationFS crossAnnot =
+          (AnnotationFS) annot.getFeatureValue(other);
+  System.out.println("   other annotation feature: "
+          + crossAnnot.getCoveredText());
+}
+----
+
+Of particular interest here is the built-in Annotation type method ``getCoveredText()``.
+This method uses the "`begin`" and "`end`" features of the annotation to create a substring from the CAS document.
+The SofaRef feature of the annotation is used to identify the correct Sofa's data from which to create the substring.
+
+The example program output is:
+
+[source]
+----
+---Printing all annotations for English Sofa---
+uima.tcas.DocumentAnnotation: this beer is good
+uima.tcas.Annotation: this
+uima.tcas.Annotation: beer
+uima.tcas.Annotation: is
+uima.tcas.Annotation: good
+      
+---Printing all annotations for German Sofa---
+uima.tcas.DocumentAnnotation: das bier ist gut
+sofa.test.CrossAnnotation: das
+ other annotation feature: this
+sofa.test.CrossAnnotation: bier
+ other annotation feature: beer
+sofa.test.CrossAnnotation: ist
+ other annotation feature: is
+sofa.test.CrossAnnotation: gut
+ other annotation feature: good
+----
+
+[[ugr.tug.mvs.views_api_summary]]
+== Views API Summary
+
+The recommended way to deliver a particular CAS view to a _Single-View_ component is to use Sofa mapping in the CPE and/or aggregate descriptors.
+
+For _Multi-View_ components or applications, the following methods are used to create or get a reference to a CAS view for a particular Sofa:
+
+Creating a new View:
+
+[source]
+----
+JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);
+CAS  newView = aCAS .createView(String localNameOfTheViewBeforeMapping);
+----
+
+Getting a View from a CAS or JCas:
+
+[source]
+----
+JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);
+CAS  myView = aCAS .getView(String localNameOfTheViewBeforeMapping);
+Iterator allViews = aCasOrJCas.getViewIterator();
+Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);
+----
+
+The following methods are useful for all annotators and applications:
+
+Setting Sofa data for a CAS or JCas:
+
+[source]
+----
+aCasOrJCas.setDocumentText(String docText);
+aCasOrJCas.setSofaDataString(String docText, String mimeType);
+aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);
+aCasOrJCas.setSofaDataURI(String uri, String mimeType);
+----
+
+Getting Sofa data for a particular CAS or JCas:
+
+[source]
+----
+String doc = aCasOrJCas.getDocumentText();
+String doc = aCasOrJCas.getSofaDataString();
+FeatureStructure array = aCasOrJCas.getSofaDataArray();
+String uri = aCasOrJCas.getSofaDataURI();
+InputStream is = aCasOrJCas.getSofaDataStream();
+----
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.type_mapping.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.type_mapping.adoc
new file mode 100644
index 0000000..eb57618
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.type_mapping.adoc
@@ -0,0 +1,90 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.type_mapping]]
+= Managing different Type Systems
+// <titleabbrev>Managing different TypeSystems</titleabbrev>
+
+
+[[ugr.tug.type_mapping.type_merging]]
+== Annotators, Type Merging, and Remotes
+
+UIMA supports combining Annotators that have different type systems.
+This is normally done by __xref:ref.adoc#ugr.ref.cas.typemerging[merging]__ the two type systems when the Annotators are first loaded and instantiated.
+The merge process produces a logical Union of the two; types having the same name have their feature sets combined.
+The combining rules say that the range of same-named feature slots must be the same.
+This combined type system is then used for the CAS that will be passed to all of the annotators.
+
+This approach (of merging the type systems together) works well for annotators that are run together in one UIMA pipeline instantiation in one machine.
+Extensions are needed when UIMA is scaled out where the pipeline includes remote annotators, acting as servers, serving potentially multiple clients, each of which might have a different type system.
+Clients, when initializing, query all their remote server parts to get their type system definitions, and merge them together with their own to make the type system for the CAS that will be sent among all of those annotators.
+The Client's type system is the union of the type systems of all of its annotators, even when some of them are remote.
+
+[[ugr.tug.type_mapping.remote_support]]
+== Supporting Remote Annotators
+
+Servers, in providing service to multiple clients, may receive CASes from different Clients having different type systems.
+UIMA has implemented several different approaches to support this.
+
+[NOTE]
+====
+Base UIMA includes support for the Vinci protocol (but this is older, and does not support newer features of the CAS like CAS Multipliers and multiple Views).
+====
+
+For Vinci and UIMA-AS using XMI, only the "reachable" Feature Structures are sent.
+A reachable Feature Structure is one that is indexed, or is reachable via a reference from another reachable Feature Structure.
+The receiving service's type system is guaranteed to be a subset of the sender's.
+Special code in the deserializer sets aside any types and features not present in the server's type system and merges these values back in when returning the CAS to the client.
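+
+The definition of "reachable" is a transitive closure over references, starting from the indexed Feature Structures.
+The following plain-Java sketch models it with string identifiers standing in for Feature Structures; the class `ReachabilitySketch`, the method `reachable`, and the shape of the `references` map are assumptions made for illustration and are not part of the UIMA API.
+
+[source,java]
+----
+import java.util.ArrayDeque;
+import java.util.Deque;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+// Hypothetical model: Feature Structures are represented by their ids.
+final class ReachabilitySketch {
+
+  // Starts from the indexed ids and follows references until no
+  // new ids are found. Everything visited is "reachable".
+  static Set<String> reachable(Set<String> indexed, Map<String, List<String>> references) {
+    Set<String> seen = new LinkedHashSet<>(indexed);
+    Deque<String> work = new ArrayDeque<>(indexed);
+    while (!work.isEmpty()) {
+      String current = work.pop();
+      for (String target : references.getOrDefault(current, List.of())) {
+        if (seen.add(target)) {
+          work.push(target);
+        }
+      }
+    }
+    return seen;
+  }
+
+  public static void main(String[] args) {
+    Map<String, List<String>> references = Map.of(
+        "a", List.of("b"),
+        "b", List.of("c"),
+        "d", List.of("a"));
+    // "d" refers to "a" but is neither indexed nor referenced, so it is not sent.
+    System.out.println(reachable(Set.of("a"), references));
+  }
+}
+----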
+
+UIMA-AS additionally supports binary CAS serialization protocols.
+The binary form is typically compressed.
+This compression can greatly reduce the size of data, compared with plain binary serialization.
+The compressed form also supports having a target type system which is different from the source's, as long as it is compatible.
+
+Delta CAS support is available for the XMI, binary and compressed binary protocols used by UIMA-AS.
+The Delta CAS refers to the CAS returned from the service back to the client - only the new Feature Structures added by the service, plus any modifications to existing feature structures and/or indexes, are returned.
+This can greatly reduce the size of the returned data.
+Delta CAS support is automatically used with more recent versions of UIMA-AS. 
+
+[[ugr.tug.type_mapping.allowed_differences]]
+== Type filtering support in Binary Compressed Serialization/Deserialization
+
+The built-in support for Binary Compressed Serialization/Deserialization supports filtering between non-identical type systems.
+The filtering is designed so that things (types and/or features) that are defined in one type system but not in another are not sent (when serializing) nor received (when deserializing). When deserializing, non-received features receive 0 as their value.
+For built-in types, like integer, float, etc., this is the number 0; for other kinds of things, this is usually a "null" value.
+
+Some kinds of type mappings cannot be supported, and will signal errors.
+The two types being mapped between must be "mergeable" according to the normal type merger rules (see above); otherwise, errors are signaled.
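+
+The effect of filtering on a single Feature Structure during deserialization can be sketched as follows.
+This is a conceptual model in plain Java and says nothing about the actual binary format: the class `TypeFilterSketch`, the `Kind` enum, and the use of maps for feature values are assumptions made for illustration.
+
+[source,java]
+----
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+// Hypothetical model of filtering between two non-identical type systems.
+final class TypeFilterSketch {
+
+  enum Kind { NUMERIC, REFERENCE }
+
+  // Builds the feature values as seen in the target type system:
+  // shared features keep the received value, features that were not
+  // received get 0 (numeric) or null (other kinds), and received
+  // features unknown to the target are dropped.
+  static Map<String, Object> deserialize(Map<String, Object> received,
+                                         Map<String, Kind> targetFeatures) {
+    Map<String, Object> result = new LinkedHashMap<>();
+    for (Map.Entry<String, Kind> feature : targetFeatures.entrySet()) {
+      Object defaultValue = feature.getValue() == Kind.NUMERIC ? Integer.valueOf(0) : null;
+      Object value = received.containsKey(feature.getKey())
+          ? received.get(feature.getKey())
+          : defaultValue;
+      result.put(feature.getKey(), value);
+    }
+    return result;
+  }
+
+  public static void main(String[] args) {
+    Map<String, Kind> target = new LinkedHashMap<>();
+    target.put("begin", Kind.NUMERIC);
+    target.put("end", Kind.NUMERIC);
+    target.put("otherAnnotation", Kind.REFERENCE);
+    // "confidence" is not defined in the target type system.
+    Map<String, Object> received = Map.of("begin", 5, "confidence", 0.9);
+    System.out.println(deserialize(received, target));
+  }
+}
+----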
+
+[[ugr.tug.type_mapping.compressed]]
+== Remote Services support with Compressed Binary Serialization
+
+Uncompressed Binary Serialization protocols for communicating with remote UIMA-AS services require that the Client's and Server's type systems be identical.
+Compressed Binary Serialization protocols support Server type systems which are a subset of the Client's.
+Types and/or features not in the Server's type system are not sent to the Server.
+
+[[ugr.tug.type_filtering.compressed_file]]
+== Compressed Binary serialization to/from files
+
+Compressed binary serialization to a file can specify a target type system which is a subset of the original type system.
+Serialization will then exclude types and features not in the target.
+You can use this to filter the CAS so that only the parts you want are serialized.
+
+Compressed binary deserialization from a file must specify as the target type system the one that went with the target when it was serialized.
+The source type system can be different; if it is missing types/features, these will be filtered during deserialization.
+If it has additional features, these will be set to 0 (the default value) in the CAS heap.
+For numeric features, this means the value will be 0 (including floating point 0); for feature structure references and strings, the value will be null.
\ No newline at end of file
diff --git a/uimaj-documentation/src/docs/asciidoc/tug/tug.xmi_emf.adoc b/uimaj-documentation/src/docs/asciidoc/tug/tug.xmi_emf.adoc
new file mode 100644
index 0000000..308a7ad
--- /dev/null
+++ b/uimaj-documentation/src/docs/asciidoc/tug/tug.xmi_emf.adoc
@@ -0,0 +1,130 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[ugr.tug.xmi_emf]]
+= XMI and EMF Interoperability
+// <titleabbrev>XMI &amp; EMF</titleabbrev>
+
+
+[[ugr.tug.xmi_emf.overview]]
+== Overview
+
+In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.
+There are established standards in this area. Specifically, UML(TM) is an OMG(TM) standard for class models, and XMI (XML Metadata Interchange) is an OMG standard for the XML representation of object graphs.
+
+Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based application development, and it is based on UML and XMI.
+In EMF, you define class models using a metamodel called Ecore, which is similar to UML.
+EMF provides tools for converting a UML model to Ecore.
+EMF can then generate Java classes from your model, and supports persistence of those classes in the XMI format.
+
+The UIMA SDK provides tools for interoperability with XMI and EMF.
+These tools allow conversions of UIMA Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format.
+This provides a number of advantages, including:
+
+____
+You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically convert it to a UIMA Type System.
+
+You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it produces to XMI.
+This data is now in a form where it can easily be ingested by an EMF-based application.
+____
+
+More generally, we are adopting the well-documented, open standard XMI as the standard way to represent UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard enables other applications to more easily produce or consume these UIMA analysis results.
+
+For more information on XMI, see Grose et al. _Mastering XMI: Java Programming with XMI, XML, and UML._ John Wiley & Sons, Inc., 2002.
+
+For more information on EMF, see Budinsky et al. _Eclipse Modeling Framework 2.0._ Addison-Wesley, 2006.
+
+For details of how the UIMA CAS is represented in XMI, see xref:ref.adoc#ugr.ref.xmi[XMI format].
+
+[[ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system]]
+== Converting an Ecore Model to or from a UIMA Type System
+
+The UIMA SDK provides the following two classes:
+
+*`Ecore2UimaTypeSystem`:* converts from an .ecore model developed using EMF to a UIMA-compliant TypeSystem descriptor.
+This is a Java class that can be run as a standalone program or invoked from another Java application.
+To run as a standalone program, execute:
+
+`java org.apache.uima.ecore.Ecore2UimaTypeSystem <ecore file> <output file>`
+
+The input .ecore file will be converted to a UIMA TypeSystem descriptor and written to the specified output file.
+You can then use the resulting TypeSystem descriptor in your UIMA application.
+
+*`UimaTypeSystem2Ecore`:* converts from a UIMA TypeSystem descriptor to an .ecore model.
+This is a Java class that can be run as a standalone program or invoked from another Java application.
+To run as a standalone program, execute:
+
+`java org.apache.uima.ecore.UimaTypeSystem2Ecore <TypeSystem descriptor> <output file>`
+
+The input UIMA TypeSystem descriptor will be converted to an Ecore model file and written to the specified output file.
+You can then use the resulting Ecore model in EMF applications.
+The converted type system will include any TypeSystems pulled in through `<import...>` elements; the fact that they were imported is currently not preserved.
+
+To run either of these converters, your classpath will need to include the UIMA jar files as well as the following jar files from the EMF distribution: common.jar, ecore.jar, and ecore.xmi.jar.
+
+Also, note that the uima-core.jar file contains the Ecore model file uima.ecore, which defines the built-in UIMA types.
+You may need to use this file from your EMF applications.
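+
+The classpath requirement above can be made concrete with a small launcher.
+The following is only a sketch under stated assumptions: the class `ConverterCommand` and its `build` method are hypothetical helpers (not part of UIMA), and the jar paths and file names are placeholders that must be adapted to your installation.
+It assembles the same `java` command line shown earlier, which could then be handed to `java.lang.ProcessBuilder`.
+
+```java
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical helper: builds the command line for running one of the
+// converters as a standalone program.
+final class ConverterCommand {
+
+  // Returns: java -cp <jars joined by the platform path separator> <mainClass> <input> <output>
+  public static List<String> build(List<String> classpathJars, String mainClass,
+      String input, String output) {
+    List<String> cmd = new ArrayList<>();
+    cmd.add("java");
+    cmd.add("-cp");
+    cmd.add(String.join(File.pathSeparator, classpathJars));
+    cmd.add(mainClass);
+    cmd.add(input);
+    cmd.add(output);
+    return cmd;
+  }
+
+  public static void main(String[] args) {
+    // Placeholder locations (assumptions): point these at your UIMA and EMF jars.
+    List<String> jars = List.of(
+        "uima/lib/uima-core.jar",
+        "uima/lib/uima-tools.jar",
+        "emf/common.jar",
+        "emf/ecore.jar",
+        "emf/ecore.xmi.jar");
+
+    List<String> cmd = build(jars,
+        "org.apache.uima.ecore.Ecore2UimaTypeSystem",
+        "model.ecore",        // placeholder input file
+        "TypeSystem.xml");    // placeholder output file
+
+    // Print the command instead of running it, so the sketch has no side effects.
+    System.out.println(String.join(" ", cmd));
+
+    // To actually launch the converter, one option would be:
+    // new ProcessBuilder(cmd).inheritIO().start().waitFor();
+  }
+}
+```
+
+Swapping the main class for `org.apache.uima.ecore.UimaTypeSystem2Ecore` and passing a TypeSystem descriptor as the input gives the command for the reverse conversion.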
+
+[[ugr.tug.xmi_emf.using_xmi_cas_serialization]]
+== Using XMI CAS Serialization
+
+The UIMA SDK provides XMI support through the following two classes:
+
+*`XmiCasSerializer`:* can be run from within a UIMA application to write out a CAS to the standard XMI format.
+The XMI that is generated will be compliant with the Ecore model generated by ``UimaTypeSystem2Ecore``.
+An EMF application could use this Ecore model to ingest and process the XMI produced by the XmiCasSerializer.
+
+*`XmiCasDeserializer`:* can be run from within a UIMA application to read in an XMI document and populate a CAS.
+The XMI must conform to the Ecore model generated by ``UimaTypeSystem2Ecore``.
+
+Also, the uimaj-examples Eclipse project contains some example code that shows how to use the serializer and deserializer: 
+
+____
+`org.apache.uima.examples.xmi.XmiWriterCasConsumer:` This is a CAS Consumer that writes each CAS to an output file in XMI format.
+It is analogous to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it uses the XMI serialization format.
+
+`org.apache.uima.examples.xmi.XmiCollectionReader:` This is a Collection Reader that reads a directory of XMI files and deserializes each of them into a CAS.
+For example, this would allow you to build a Collection Processing Engine that reads XMI files, which could contain some previous analysis results, and then do further analysis.
+____
+
+Finally, under the folder `uimaj-examples/ecore_src` is the class ``org.apache.uima.examples.xmi.XmiEcoreCasConsumer``, which writes each CAS to XMI format and also saves the Type System as an Ecore file.
+Since this uses the `UimaTypeSystem2Ecore` converter, to compile it you must add the EMF jars common.jar, ecore.jar, and ecore.xmi.jar to your classpath; see ecore_src/readme.txt for instructions.
+
+[[ugr.tug.xmi_emf.xml_character_issues]]
+=== Character Encoding Issues with XML Serialization
+
+Note that not all valid Unicode characters are valid XML characters, at least not in XML 1.0.
+Moreover, it is possible to create characters in Java that are not even valid Unicode characters, let alone XML characters.
+As UIMA character data is translated directly into XML character data on serialization, this may lead to issues.
+UIMA will therefore check that the character data that is being serialized is valid for the version of XML being used.
+If non-serializable character data is encountered during serialization, an exception is thrown and serialization fails (to avoid creating invalid XML data).
+UIMA does not simply replace the offending characters with some valid replacement character, on the assumption that most applications would not want their data modified automatically.
+
+If you know you are going to use XML serialization, and you would like to avoid such issues on serialization, you should check any character data you create in UIMA ahead of time.
+Issues most often arise with the document text, as documents may originate at various sources, and may be of varying quality.
+So it's a particularly good idea to check the document text for characters that will cause issues for serialization. 
+
+UIMA provides a handful of functions to assist you in checking Java character data.
+These are the `checkForNonXmlCharacters()` methods of ``org.apache.uima.internal.util.XMLUtils``, which come in several overloads.
+Please check the javadocs for further information. 
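+
+To illustrate what such a check involves, here is a minimal sketch that uses only the JDK.
+It is not the UIMA `XMLUtils` implementation: the class `XmlCharCheck` and its methods are hypothetical names, and the sketch covers XML 1.0 only.
+It walks the string by code point and tests each one against the `Char` production of the XML 1.0 specification, so unpaired surrogates (which are not valid Unicode characters) are reported as well.
+
+```java
+// Hypothetical helper, not part of UIMA.
+final class XmlCharCheck {
+
+  // True if the code point matches the XML 1.0 "Char" production:
+  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+  public static boolean isXml10Char(int cp) {
+    return cp == 0x9 || cp == 0xA || cp == 0xD
+        || (cp >= 0x20 && cp <= 0xD7FF)
+        || (cp >= 0xE000 && cp <= 0xFFFD)
+        || (cp >= 0x10000 && cp <= 0x10FFFF);
+  }
+
+  // Returns the char index of the first code point that XML 1.0 cannot
+  // represent, or -1 if the whole string can be serialized.
+  public static int firstInvalidXml10Char(String text) {
+    int i = 0;
+    while (i < text.length()) {
+      // For an unpaired surrogate, codePointAt returns the surrogate value
+      // itself, which falls outside the valid ranges above.
+      int cp = text.codePointAt(i);
+      if (!isXml10Char(cp)) {
+        return i;
+      }
+      i += Character.charCount(cp);
+    }
+    return -1;
+  }
+
+  public static void main(String[] args) {
+    System.out.println(firstInvalidXml10Char("plain text"));    // prints -1
+    System.out.println(firstInvalidXml10Char("bad\u0000char")); // prints 3
+  }
+}
+```
+
+A check like this could be run on the document text before it is set on the CAS, so that the application decides how to handle offending characters instead of hitting an exception at serialization time.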
+
+Please note that these issues are not specific to XMI serialization; they apply to the older XCAS format in the same way.
\ No newline at end of file