
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <title>UIMA Overview &amp; SDK Setup</title><link rel="stylesheet" type="text/css" href="css/stylesheet-html.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="UIMA Overview &amp; SDK Setup" id="d5e1"><div xmlns:d="http://docbook.org/ns/docbook" class="titlepage"><div><div><h1 class="title">UIMA Overview &amp; SDK Setup</h1></div><div><div class="authorgroup">
       <h3 class="corpauthor">Written and maintained by the Apache UIMA&#8482; Development Community</h3>
     </div></div><div><p class="releaseinfo">Version 3.0.2</p></div><div><p class="copyright">Copyright &copy; 2006, 2019 The Apache Software Foundation</p></div><div><p class="copyright">Copyright &copy; 2004, 2006 International Business Machines Corporation</p></div><div><div class="legalnotice" title="Legal Notice"><a name="d5e8"></a>
       <p> </p>
       <p title="License and Disclaimer">
         <b>License and Disclaimer.&nbsp;</b>

         The ASF licenses this documentation
            to you under the Apache License, Version 2.0 (the
            "License"); you may not use this documentation except in compliance
            with the License.  You may obtain a copy of the License at

          </p><div class="blockquote"><blockquote class="blockquote">
            <a class="ulink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
          </blockquote></div><p title="License and Disclaimer">

            Unless required by applicable law or agreed to in writing,
            this documentation and its contents are distributed under the License
            on an
            "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
            KIND, either express or implied.  See the License for the
            specific language governing permissions and limitations
            under the License.

       </p>
       <p> </p>
       <p> </p>
       <p title="Trademarks">
         <b>Trademarks.&nbsp;</b>
         All terms mentioned in the text that are known to be trademarks or
         service marks have been appropriately capitalized.  Use of such terms
         in this book should not be regarded as affecting the validity of
         the trademark or service mark.

       </p>
     </div></div><div><p class="pubdate">April, 2019</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.project_overview">1. Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.project_overview_doc_overview">1.1. Apache UIMA Project Documentation Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.project_overview_overview">1.1.1. Overviews</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_setup">1.1.2. Eclipse Tooling Installation and Setup</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_tutorials_dev_guides">1.1.3. Tutorials and Developer's Guides</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_tool_guides">1.1.4. Tools Users' Guides</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_reference">1.1.5. References</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_v3">1.1.6. Version 3 User's guide</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.project_overview_doc_use">1.2. How to use the Documentation</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_changes_from_previous">1.3. Changes from UIMA Version 2</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_migrating_from_v2_to_v3">1.4. Migrating existing UIMA pipelines from Version 2 to Version 3</a></span></dt><dt><span class="section"><a href="#ugr.project_overview_summary">1.5. Apache UIMA Summary</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.summary.general">1.5.1. General</a></span></dt><dt><span class="section"><a href="#ugr.ovv.summary.programming_language_support">1.5.2. Programming Language Support</a></span></dt><dt><span class="section"><a href="#ugr.ovv.general.summary.multi_modal_support">1.5.3. 
Multi-Modal Support</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.project_overview_summary_sdk_capabilities">1.6. Summary of Apache UIMA Capabilities</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ovv.conceptual">2. UIMA Conceptual Overview</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.conceptual.uima_introduction">2.1. UIMA Introduction</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.architecture_framework_sdk">2.2. The Architecture, the Framework and the SDK</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.analysis_basics">2.3. Analysis Basics</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.conceptual.aes_annotators_and_analysis_results">2.3.1. Analysis Engines, Annotators &amp; Results</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.representing_results_in_cas">2.3.2. Representing Analysis Results in the CAS</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.interacting_with_cas_and_external_resources">2.3.3. Using CASes and External Resources</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.component_descriptors">2.3.4. Component Descriptors</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ovv.conceptual.aggregate_analysis_engines">2.4. Aggregate Analysis Engines</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.applicaiton_building_and_collection_processing">2.5. Application Building and Collection Processing</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.conceptual.using_framework_from_an_application">2.5.1. Using the framework from an Application</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.graduating_to_collection_processing">2.5.2. Graduating to Collection Processing</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ovv.conceptual.exploiting_analysis_results">2.6. 
Exploiting Analysis Results</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.conceptual.semantic_search">2.6.1. Semantic Search</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.databases">2.6.2. Databases</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ovv.conceptual.multimodal_processing">2.7. Multimodal Processing in UIMA</a></span></dt><dt><span class="section"><a href="#ugr.ovv.conceptual.next_steps">2.8. Next Steps</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.ovv.eclipse_setup">3. Eclipse IDE setup for UIMA</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.installation">3.1. Installation</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.install_eclipse">3.1.1. Install Eclipse</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.install_uima_eclipse_plugins">3.1.2. Installing the UIMA Eclipse Plugins</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.install_uima_sdk">3.1.3. Install the UIMA SDK</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.install_uima_eclipse_plugins_manually">3.1.4. Installing the UIMA Eclipse Plugins, manually</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.start_eclipse">3.1.5. Start Eclipse</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.example_code">3.2. Setting up Eclipse to view Example Code</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.adding_source">3.3. Adding the UIMA source code to the jar files</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.linking_uima_javadocs">3.4. Attaching UIMA Javadocs</a></span></dt><dt><span class="section"><a href="#ugr.ovv.eclipse_setup.running_external_tools_from_eclipse">3.5. Running external tools from Eclipse</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.faqs">4. 
UIMA FAQ's</a></span></dt><dt><span class="chapter"><a href="#ugr.issues">5. Known Issues</a></span></dt><dt><span class="glossary"><a href="#ugr.glossary">Glossary</a></span></dt></dl></div>


   <div class="chapter" title="Chapter&nbsp;1.&nbsp;UIMA Overview" id="ugr.project_overview"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;1.&nbsp;UIMA Overview</h2></div></div></div>


   <p>The Unstructured Information Management Architecture (UIMA) is an architecture and software framework
     for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and
     integrating them with search technologies.  The architecture is undergoing a standardization effort,
     referred to as the <span class="emphasis"><em>UIMA specification</em></span> by a technical committee within
     <a class="ulink" href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=uima" target="_top">OASIS</a>.
     </p>

   <p>The <span class="emphasis"><em>Apache UIMA</em></span> framework is an Apache licensed, open source implementation of the
     UIMA Architecture, and provides a run-time environment in which developers can plug in
     and run their UIMA component implementations and with which they can build and deploy UIM applications. The
     framework itself is not specific to any IDE or platform.</p>

   <p>It includes an all-Java implementation of the
     UIMA framework for the development, description, composition and deployment of UIMA components and
     applications. It also provides the developer with an Eclipse-based (<a class="ulink" href="http://www.eclipse.org/" target="_top">http://www.eclipse.org/</a>
     ) development environment that includes a set of tools and utilities for using UIMA. It also includes
     a C++ version of the framework, and
     enablements for Annotators built in Perl, Python, and TCL.</p>

   <p>This chapter is the intended starting point for readers who are new to the Apache UIMA Project. It includes
     this introduction and the following sections:</p>
   <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
       <p> <a class="xref" href="#ugr.project_overview_doc_overview" title="1.1.&nbsp;Apache UIMA Project Documentation Overview">Section&nbsp;1.1, &#8220;Apache UIMA Project Documentation Overview&#8221;</a> provides a list of the books and topics included in
         the Apache UIMA documentation with a brief summary of each. </p>
     </li><li class="listitem">
       <p> <a class="xref" href="#ugr.project_overview_doc_use" title="1.2.&nbsp;How to use the Documentation">Section&nbsp;1.2, &#8220;How to use the Documentation&#8221;</a> describes a recommended path through the
         documentation to help get the reader up and running with UIMA. </p>
     </li></ul></div>

     <p>The main website for Apache UIMA is <a class="ulink" href="http://uima.apache.org" target="_top">http://uima.apache.org</a>.  Here you
     can find out many things, including:
      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>how to download (both the binary and source distributions)</p></li><li class="listitem"><p>how to participate in the development</p></li><li class="listitem"><p>mailing lists - including the user list used like a forum for questions and answers</p></li><li class="listitem"><p>a Wiki where you can find and contribute all kinds of information, including tips and best practices</p></li><li class="listitem"><p>a sandbox - a subproject for potential new additions to Apache UIMA or to subprojects of it.  Things here
        are works in progress, and may (or may not) be included in releases.</p></li><li class="listitem"><p>links to conferences</p></li></ul></div><p>
       </p>

   <div class="section" title="1.1.&nbsp;Apache UIMA Project Documentation Overview"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_doc_overview">1.1.&nbsp;Apache UIMA Project Documentation Overview</h2></div></div></div>

     <p> The user documentation for UIMA is organized into several parts.
       </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem">
           <p> Overviews - this documentation </p>
         </li><li class="listitem">
           <p> Eclipse Tooling Installation and Setup - also in this document </p>
         </li><li class="listitem">
           <p> Tutorials and Developer's Guides </p>
         </li><li class="listitem">
           <p> Tools Users' Guides </p>
         </li><li class="listitem">
           <p> References </p>
         </li><li class="listitem">
           <p> Version 3 User's Guide </p>
         </li></ul></div><p> </p>

     <p>
     The first 2 parts make up this book; the last 4 have individual
     books.  The books are provided both as
     (somewhat large) html files, viewable in browsers, and also as PDF files.
     The documentation is fully hyperlinked, with tables of contents.  The PDF versions are set up to
     print nicely - they have page numbers included on the cross references within a book. </p>

     <p>If you view the PDF files inside
     a browser that supports embedded viewing of PDF, the hyperlinks between different PDF books may work (not
     all browsers have been tested).</p>

     <p>The following set of tables gives a more detailed overview of the various parts of the
     documentation.
     </p>

     <div class="section" title="1.1.1.&nbsp;Overviews"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_overview">1.1.1.&nbsp;Overviews</h3></div></div></div>


       <div class="informaltable">
         <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="col1"><col class="col2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="emphasis"><em>Overview of the Documentation</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">
                 <p>What you are currently reading.  Lists the documents provided in the Apache
                 UIMA documentation set and provides
                  a recommended path through the documentation for getting started using
                   UIMA.  It includes release notes and provides a brief high-level description of
                   the different software modules included in the
                   Apache UIMA Project.  See <a class="xref" href="#ugr.project_overview_doc_overview" title="1.1.&nbsp;Apache UIMA Project Documentation Overview">Section&nbsp;1.1, &#8220;Apache UIMA Project Documentation Overview&#8221;</a>.</p>
               </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="emphasis"><em>Conceptual Overview</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides a broad conceptual overview of the UIMA component architecture; includes
                 references to the other documents in the documentation set that provide more detail.
                 See <a class="xref" href="#ugr.ovv.conceptual" title="Chapter&nbsp;2.&nbsp;UIMA Conceptual Overview">Chapter&nbsp;2, <i>UIMA Conceptual Overview</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="emphasis"><em>UIMA FAQs</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Frequently Asked Questions about general UIMA concepts. (Not a programming
                 resource.)  See <a class="xref" href="#ugr.faqs" title="Chapter&nbsp;4.&nbsp;UIMA Frequently Asked Questions (FAQ's)">Chapter&nbsp;4, <i>UIMA Frequently Asked Questions (FAQ's)</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><span class="emphasis"><em>Known Issues</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Known issues and problems with the UIMA SDK.  See <a class="xref" href="#ugr.issues" title="Chapter&nbsp;5.&nbsp;Known Issues">Chapter&nbsp;5, <i>Known Issues</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; "><span class="emphasis"><em>Glossary</em></span>
               </td><td style="">UIMA terms and concepts and their basic definitions.  See <a class="xref" href="#ugr.glossary" title="Glossary: Key Terms &amp; Concepts">Glossary</a>.</td></tr></tbody></table>
       </div>
     </div>

     <div class="section" title="1.1.2.&nbsp;Eclipse Tooling Installation and Setup"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_setup">1.1.2.&nbsp;Eclipse Tooling Installation and Setup</h3></div></div></div>

       <p>Provides step-by-step instructions for installing Apache UIMA in the Eclipse Integrated
         Development Environment.  See <a class="xref" href="#ugr.ovv.eclipse_setup" title="Chapter&nbsp;3.&nbsp;Setting up the Eclipse IDE to work with UIMA">Chapter&nbsp;3, <i>Setting up the Eclipse IDE to work with UIMA</i></a>.</p>
     </div>

     <div class="section" title="1.1.3.&nbsp;Tutorials and Developer's Guides"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_tutorials_dev_guides">1.1.3.&nbsp;Tutorials and Developer's Guides</h3></div></div></div>

       <div class="informaltable">
         <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="col1"><col class="col2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tutorial_annotator"></a><span class="emphasis"><em>Annotators and Analysis Engines</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Tutorial-style guide for building UIMA annotators and analysis engines. This chapter
                 introduces the developer to creating type systems and using UIMA's common data structure,
                 the CAS or Common Analysis Structure. It demonstrates how to use built-in tools to specify and create
                 basic UIMA analysis components.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tutorial_cpe"></a><span class="emphasis"><em>Building UIMA Collection Processing Engines</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Tutorial-style guide for building UIMA collection processing engines. These
                manage the
                 analysis of collections of documents from source to sink.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Developer's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tutorial_application_development"></a><span class="emphasis"><em>Developing Complete Applications</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Tutorial-style guide on using the UIMA APIs to create, run and manage UIMA components from
                 your application. Also describes APIs for saving and restoring the contents of a CAS using an XML
                 format called <span class="trademark"> XMI</span>&reg;.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter&nbsp;3, <i>Application Developer's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_guide_flow_controller"></a><span class="emphasis"><em>Flow Controller</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">When multiple components are combined in an Aggregate, each CAS flows among the various
                 components. UIMA provides two built-in flows, and also allows custom flows to be
                 implemented.  See <a href="tutorials_and_users_guides.html#ugr.tug.fc" class="olink">Chapter&nbsp;4, <i>Flow Controller Developer's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_guide_multiple_sofas"></a><span class="emphasis"><em>Developing Applications using Multiple Subjects of Analysis</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">A single CAS may be associated with multiple subjects of analysis (Sofas). These are useful
                 for representing and analyzing different formats or translations of the same document. For
                 multi-modal analysis, Sofas are good for different modal representations of the same stream
                 (e.g., audio and closed captions). This chapter gives the developer details on how to use
                 multiple Sofas in an application.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, <i>Annotations, Artifacts, and Sofas</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_guide_multiple_views"></a><span class="emphasis"><em>Multiple CAS Views of an Artifact</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">UIMA provides an extension to the basic model of the CAS which supports
               analysis of multiple views of the same artifact, all contained within the CAS. This
               chapter describes the concepts, terminology, and the API and XML extensions that
               enable this.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.mvs" class="olink">Chapter&nbsp;6, <i>Multiple CAS Views of an Artifact</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_guide_cas_multiplier"></a><span class="emphasis"><em>CAS Multiplier</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">A component may add additional CASes into the workflow. This may be useful to break up a large
                 artifact into smaller units, or to create a new CAS that collects information from multiple other
                 CASes.  See <a href="tutorials_and_users_guides.html#ugr.tug.cm" class="olink">Chapter&nbsp;7, <i>CAS Multiplier Developer's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; "><a name="ugr.project_overview_xmi_emf"></a><span class="emphasis"><em>XMI and EMF Interoperability</em></span>
               </td><td style="">The UIMA Type system and the contents of the CAS itself can be externalized using the XMI
                 standard for XML MetaData. Eclipse Modeling Framework (EMF) tooling can be used to develop
                 applications that use this information.  See
                 <a href="tutorials_and_users_guides.html#ugr.tug.xmi_emf" class="olink">Chapter&nbsp;8, <i>XMI and EMF Interoperability</i></a>.</td></tr></tbody></table>
       </div>
     </div>

     <div class="section" title="1.1.4.&nbsp;Tools Users' Guides"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_tool_guides">1.1.4.&nbsp;Tools Users' Guides</h3></div></div></div>


       <div class="informaltable">
         <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="col1"><col class="col2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_component_descriptor_editor"></a><span class="emphasis"><em>Component Descriptor Editor</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes the features of the Component Descriptor Editor Tool. This tool provides a GUI for
                 specifying the details of UIMA component descriptors, including those for Analysis Engines
                 (primitive and aggregate), Collection Readers, CAS Consumers and Type Systems.  See
                 <a href="tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, <i>Component Descriptor Editor User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_cpe_configurator"></a><span class="emphasis"><em>Collection Processing Engine Configurator</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes the User Interfaces and features of the CPE Configurator tool. This tool allows the
                 user to select and configure the components of a Collection Processing Engine and then to run the
                 engine.  See
                 <a href="tools.html#ugr.tools.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Configurator User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_pear_packager"></a><span class="emphasis"><em>Pear Packager</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes how to use the PEAR Packager utility. This utility enables developers to produce an
                 archive file for an analysis engine that includes all required resources for installing that
                 analysis engine in another UIMA environment.  See
                 <a href="tools.html#ugr.tools.pear.packager" class="olink">Chapter&nbsp;9, <i>PEAR Packager User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_pear_installer"></a><span class="emphasis"><em>Pear Installer</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes how to use the PEAR Installer utility. This utility installs and verifies an
                 analysis engine from an archive file (PEAR) with all its resources in the right place so it is ready to
                 run.  See
                 <a href="tools.html#ugr.tools.pear.installer" class="olink">Chapter&nbsp;11, <i>PEAR Installer User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_pear_merger"></a><span class="emphasis"><em>Pear Merger</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes how to use the PEAR Merger utility, which does a simple merge of multiple PEAR
                 packages into one.  See
                 <a href="tools.html#ugr.tools.pear.merger" class="olink">Chapter&nbsp;12, <i>PEAR Merger User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_document_analyzer"></a><span class="emphasis"><em>Document Analyzer</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes the features of a tool for applying a UIMA analysis engine to a set of documents and
                 viewing the results.  See
                 <a href="tools.html#ugr.tools.doc_analyzer" class="olink">Chapter&nbsp;3, <i>Document Analyzer User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_cas_visual_debugger"></a><span class="emphasis"><em>CAS Visual Debugger</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes the features of a tool for viewing the detailed structure and contents of a CAS. Good
                 for debugging.  See
                 <a href="tools.html#ugr.tools.cvd" class="olink">Chapter&nbsp;5, <i>CAS Visual Debugger</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_tools_jcasgen"></a><span class="emphasis"><em>JCasGen</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Describes how to run the JCasGen utility, which automatically builds Java classes that
                 correspond to a particular CAS Type System.  See
                 <a href="tools.html#ugr.tools.jcasgen" class="olink">Chapter&nbsp;8, <i>JCasGen User's Guide</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; "><a name="ugr.project_overview_tools_xml_cas_viewer"></a><span class="emphasis"><em>XML CAS Viewer</em></span>
               </td><td style="">Describes how to run the supplied viewer to view externalized XML forms of CASes. This viewer
                 is used in the examples.  See
                 <a href="tools.html#ugr.tools.annotation_viewer" class="olink">Chapter&nbsp;4, <i>Annotation Viewer</i></a>.</td></tr></tbody></table>
       </div>
     </div>

     <div class="section" title="1.1.5.&nbsp;References"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_reference">1.1.5.&nbsp;References</h3></div></div></div>

       <div class="informaltable">
         <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="col1"><col class="col2"></colgroup><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_javadocs"></a><span class="emphasis"><em>Introduction to the UIMA API Javadocs</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Javadocs detailing the UIMA programming interfaces.  See
                 <a href="references.html#ugr.ref.javadocs" class="olink">Chapter&nbsp;1, <i>Javadocs</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_xml_ref_component_descriptor"></a><span class="emphasis"><em>XML: Component Descriptor</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides detailed XML format for all the UIMA component descriptors, except the CPE (see
                 next).  See
                 <a href="references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, <i>Component Descriptor Reference</i></a>.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_xml_ref_collection_processing_engine_descriptor"></a><span class="emphasis"><em>XML: Collection Processing Engine Descriptor</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides detailed XML format for the Collection Processing Engine descriptor.  See
                 <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter&nbsp;3, <i>Collection Processing Engine Descriptor Reference</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_cas"></a><span class="emphasis"><em>CAS</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides detailed description of the principal CAS interface.  See
                 <a href="references.html#ugr.ref.cas" class="olink">Chapter&nbsp;4, <i>CAS Reference</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_jcas"></a><span class="emphasis"><em>JCas</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides details on the JCas, a native Java interface to the CAS.  See
                 <a href="references.html#ugr.ref.jcas" class="olink">Chapter&nbsp;5, <i>JCas Reference</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "><a name="ugr.project_overview_ref_pear"></a><span class="emphasis"><em>PEAR Reference</em></span>
               </td><td style="border-bottom: 0.5pt solid black; ">Provides detailed description of the deployable archive format for UIMA
                 components.  See
                 <a href="references.html#ugr.ref.pear" class="olink">Chapter&nbsp;6, <i>PEAR Reference</i></a></td></tr><tr><td style="border-right: 0.5pt solid black; "><a name="ugr.project_overview_xmi_cas_serialization"></a><span class="emphasis"><em>XMI CAS Serialization Reference</em></span>
               </td><td style="">Provides detailed description of the XMI format used to serialize
                 the CAS.  See
                 <a href="references.html#ugr.ref.xmi" class="olink">Chapter&nbsp;7, <i>XMI CAS Serialization Reference</i></a></td></tr></tbody></table>
       </div>
     </div>

     <div class="section" title="1.1.6.&nbsp;Version 3 User's guide"><div class="titlepage"><div><div><h3 class="title" id="ugr.project_overview_v3">1.1.6.&nbsp;Version 3 User's guide</h3></div></div></div>

       <p>This book describes Version 3's features, capabilities, and differences from Version 2.
         </p>
     </div>

   </div>

   <div class="section" title="1.2.&nbsp;How to use the Documentation"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_doc_use">1.2.&nbsp;How to use the Documentation</h2></div></div></div>


     <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">
         <p>Explore this chapter to get an overview of the different documents that are included with Apache UIMA.</p>
       </li><li class="listitem">
         <p> Read <a href="overview_and_setup.html#ugr.ovv.conceptual" class="olink">Chapter&nbsp;2, <i>UIMA Conceptual Overview</i></a> to get a broad
           view of the basic UIMA concepts and philosophy with reference to the other documents included in the
           documentation set which provide greater detail. </p>
       </li><li class="listitem">
         <p> For more general information on the UIMA architecture and how it has been used, refer to the IBM Systems
           Journal special issue on Unstructured Information Management, on-line at <a class="ulink" href="http://www.research.ibm.com/journal/sj43-3.html" target="_top">http://www.research.ibm.com/journal/sj43-3.html</a>, or to the section of the Apache UIMA project
           website where other publications are listed. </p>
       </li><li class="listitem">
         <p> Set up Apache UIMA in your Eclipse environment. To do this, follow the instructions in <a class="xref" href="#ugr.ovv.eclipse_setup" title="Chapter&nbsp;3.&nbsp;Setting up the Eclipse IDE to work with UIMA">Chapter&nbsp;3, <i>Setting up the Eclipse IDE to work with UIMA</i></a>. </p>
       </li><li class="listitem">
         <p> Develop sample UIMA annotators, run them and explore the results. Read <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a> and follow it like a tutorial
           to learn how to develop your first UIMA annotator and set up and run your first UIMA analysis engines.
           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
               <p> As part of this you will use a few tools including
                 </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
                     <p> The UIMA Component Descriptor Editor, described in more detail in <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, <i>Component Descriptor Editor User's Guide</i></a> and </p>
                   </li><li class="listitem">
                     <p> The Document Analyzer, described in more detail in <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.doc_analyzer" class="olink">Chapter&nbsp;3, <i>Document Analyzer User's Guide</i></a>. </p>
                   </li></ul></div><p> </p>

             </li><li class="listitem">
               <p>While following along in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
                 <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>, reference documents that may help are:
                 </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
                     <p> <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, <i>Component Descriptor Reference</i></a> for understanding the analysis
                       engine descriptors </p>
                   </li><li class="listitem">
                     <p> <a href="references.html#d5e1" class="olink">UIMA References</a>
                       <a href="references.html#ugr.ref.jcas" class="olink">Chapter&nbsp;5, <i>JCas Reference</i></a> for
                       understanding the JCas </p>
                   </li></ul></div><p> </p>
             </li></ul></div><p> </p>
       </li><li class="listitem">
         <p> Learn how to create, run and manage a UIMA analysis engine as part of an application.
           Connect your analysis engine to the provided semantic search engine to learn how a
           complete analysis and search application may be built with Apache UIMA. <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter&nbsp;3, <i>Application Developer's Guide</i></a> will guide you
           through this process.
           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
               <p> As part of this you will use the document analyzer (described in more detail in <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.doc_analyzer" class="olink">Chapter&nbsp;3, <i>Document Analyzer User's Guide</i></a>) and semantic search
                 GUI tools (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
                 <span class="olink">????</span>). </p>
             </li></ul></div><p> </p>
       </li><li class="listitem">
         <p> Pat yourself on the back. Congratulations! If you reached this step successfully, then you have an
           appreciation for the UIMA analysis engine architecture. You will have built a few sample annotators,
           deployed UIMA analysis engines to analyze a few documents, searched over the results using the built-in
           semantic search engine and viewed the results through a built-in viewer
           &#8211; all as part of a simple but complete application. </p>
       </li><li class="listitem">
         <p> Develop and run a Collection Processing Engine (CPE) to analyze and gather the results of an entire
           collection of documents. <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
           <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Developer's Guide</i></a> will guide you through this process.
           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
               <p> As part of this you will use the CPE Configurator tool. For details see <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Configurator User's Guide</i></a>. </p>
             </li><li class="listitem">
               <p> You will also learn about CPE Descriptors. The detailed format for these may be found in <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter&nbsp;3, <i>Collection Processing Engine Descriptor Reference</i></a>. </p>
             </li></ul></div><p> </p>
       </li><li class="listitem">
         <p> Learn how to package up an analysis engine for easy installation into another UIMA environment.
             <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a>
             <a href="tools.html#ugr.tools.pear.packager" class="olink">Chapter&nbsp;9, <i>PEAR Packager User's Guide</i></a> and <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.pear.installer" class="olink">Chapter&nbsp;11, <i>PEAR Installer User's Guide</i></a> will teach you how to
           create UIMA analysis engine archives so that you can easily share your components with a broader
           community. </p>
       </li></ol></div>
   </div>

   <div class="section" title="1.3.&nbsp;Changes from UIMA Version 2"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_changes_from_previous">1.3.&nbsp;Changes from UIMA Version 2</h2></div></div></div>

       <p>See the separate document Version 3 User's Guide.</p>
   </div>

   <div class="section" title="1.4.&nbsp;Migrating existing UIMA pipelines from Version 2 to Version 3"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_migrating_from_v2_to_v3">1.4.&nbsp;Migrating existing UIMA pipelines from Version 2 to Version 3</h2></div></div></div>

     <p>The format of JCas classes changed when going from version 2 to version 3.
           If you had JCas classes for user types, these need to be regenerated using the
           version 3 JCasGen tooling or Maven plugin.  Alternatively, these can be
           migrated without regenerating; the migration preserves any customization
           users may have added to the JCas classes.</p>

     <p>The Version 3 User's Guide has a chapter detailing the migration, including
       a description of the migration tool to aid in this process.</p>
   </div>

   <div class="section" title="1.5.&nbsp;Apache UIMA Summary"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_summary">1.5.&nbsp;Apache UIMA Summary</h2></div></div></div>

     <div class="section" title="1.5.1.&nbsp;General"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.summary.general">1.5.1.&nbsp;General</h3></div></div></div>

       <p>UIMA supports the development, discovery, composition and deployment of multi-modal
         analytics for the analysis of unstructured information and its integration with search
         technologies.</p>

       <p>Apache UIMA includes APIs and tools for creating analysis components. Examples of analysis components include
         tokenizers, summarizers, categorizers, parsers, named-entity detectors etc. Tutorial examples are
         provided with Apache UIMA; additional components are available from the community. </p>
     </div>
     <div class="section" title="1.5.2.&nbsp;Programming Language Support"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.summary.programming_language_support">1.5.2.&nbsp;Programming Language Support</h3></div></div></div>

       <p>UIMA supports the development and integration of analysis algorithms developed in different
         programming languages. </p>

       <p>The Apache UIMA project provides both a Java framework and a matching C++
         enablement layer, which allows annotators to be written in C++ and have access to a C++ version of the CAS. The
         C++ enablement layer also enables annotators to be written in Perl, Python, and TCL, and to interoperate with
         those written in other languages.
         </p>

     </div>
     <div class="section" title="1.5.3.&nbsp;Multi-Modal Support"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.general.summary.multi_modal_support">1.5.3.&nbsp;Multi-Modal Support</h3></div></div></div>

       <p>The UIMA architecture supports the development, discovery, composition and deployment of
         multi-modal analytics, including text, audio and video. <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, <i>Annotations, Artifacts, and Sofas</i></a> discusses this in more
         detail.</p>
     </div>
   </div>

   <div class="section" title="1.6.&nbsp;Summary of Apache UIMA Capabilities"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.project_overview_summary_sdk_capabilities">1.6.&nbsp;Summary of Apache UIMA Capabilities</h2></div></div></div>

     <div class="informaltable">
       <table style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col class="col1"><col class="col2"></colgroup><tbody><tr><td class="tableSubhead" style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Module</td><td class="tableSubhead" style="border-bottom: 0.5pt solid black; ">Description</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">UIMA Framework Core</td><td style="border-bottom: 0.5pt solid black; ">
               <p>A framework integrating core functions for creating, deploying, running and managing UIMA
                 components, including analysis engines and Collection Processing Engines in collocated and/or
                 distributed configurations. </p>

               <p>The framework includes an implementation of core components for transport layer adaptation,
                 CAS management, workflow management based on declarative specifications, resource management,
                 configuration management, logging, and other functions.</p>
             </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">C++ and other programming language Interoperability</td><td style="border-bottom: 0.5pt solid black; ">
               <p>Includes C++ CAS and supports the creation of UIMA compliant C++ components that can be
                 deployed in the UIMA run-time through a built-in JNI adapter. This includes high-speed binary
                 serialization.</p>

               <p>Includes support for creating service-based UIMA engines. This is ideal for
                 wrapping existing code written in different languages.</p>
             </td></tr><tr><td class="tableSubhead" style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Framework Services and APIs</td><td class="tableSubhead" style="border-bottom: 0.5pt solid black; ">Note that interfaces of these components are available to the developer
               but different implementations are possible in different implementations of the UIMA
               framework.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">CAS</td><td style="border-bottom: 0.5pt solid black; ">These classes provide the developer with typed access to the Common Analysis Structure (CAS),
               including type system schema, elements, subjects of analysis and indices. The multiple subjects of
               analysis (Sofas) mechanism supports the independent or simultaneous analysis of multiple views of
               the same artifacts (e.g. documents), supporting multi-lingual and multi-modal analysis.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">JCas</td><td style="border-bottom: 0.5pt solid black; ">An alternative interface to the CAS, providing Java-based UIMA Analysis components with
               native Java object access to CAS types and their attributes or features, using the
               JavaBeans conventions of getters and setters.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Collection Processing Management (CPM)</td><td style="border-bottom: 0.5pt solid black; ">Core functions for running UIMA collection processing engines in collocated and/or
               distributed configurations. The CPM provides scalability across parallel processing pipelines,
               check-pointing, performance monitoring and recoverability.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Resource Manager</td><td style="border-bottom: 0.5pt solid black; ">Provides UIMA components with run-time access to external resources handling capabilities
               such as resource naming, sharing, and caching. </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Configuration Manager</td><td style="border-bottom: 0.5pt solid black; ">Provides UIMA components with run-time access to their configuration parameter settings.
               </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Logger</td><td style="border-bottom: 0.5pt solid black; ">Provides access to a common logging facility.</td></tr><tr><td class="tableSubhead" style="border-bottom: 0.5pt solid black; " colspan="2" align="center"> Tools and Utilities
               </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">JCasGen</td><td style="border-bottom: 0.5pt solid black; ">Utility for generating a Java object model for CAS types from a UIMA XML type system
               definition.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Saving and Restoring CAS contents</td><td style="border-bottom: 0.5pt solid black; ">APIs in the core framework support saving and restoring the contents of a CAS to streams
               in multiple formats, including XMI, binary, and compressed forms.
               These APIs are collected into the CasIOUtils class.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">PEAR Packager for Eclipse</td><td style="border-bottom: 0.5pt solid black; ">Tool for building a UIMA component archive to facilitate porting, registering, installing and
               testing components.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">PEAR Installer</td><td style="border-bottom: 0.5pt solid black; ">Tool for installing and verifying a UIMA component archive in a UIMA installation.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">PEAR Merger</td><td style="border-bottom: 0.5pt solid black; ">Utility that combines multiple PEARs into one.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Component Descriptor Editor</td><td style="border-bottom: 0.5pt solid black; ">Eclipse Plug-in for specifying and configuring component descriptors for UIMA analysis
               engines as well as other UIMA component types including Collection Readers and CAS
               Consumers.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">CPE Configurator</td><td style="border-bottom: 0.5pt solid black; ">Graphical tool for configuring Collection Processing Engines and applying them to
               collections of documents.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Java Annotation Viewer</td><td style="border-bottom: 0.5pt solid black; ">Viewer for exploring annotations and related CAS data.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">CAS Visual Debugger</td><td style="border-bottom: 0.5pt solid black; ">GUI Java application that provides developers with detailed visual view of the contents of a
               CAS.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Document Analyzer</td><td style="border-bottom: 0.5pt solid black; ">GUI Java application that applies analysis engines to sets of documents and shows results in a
               viewer.</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">CAS Editor</td><td style="border-bottom: 0.5pt solid black; ">Eclipse plug-in that lets you edit the contents of a CAS</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">UIMA Pipeline Eclipse Launcher</td><td style="border-bottom: 0.5pt solid black; ">Eclipse plug-in that lets you configure Eclipse launchers for UIMA pipelines</td></tr><tr><td class="tableSubhead" style="border-bottom: 0.5pt solid black; " colspan="2" align="center"> Example Analysis
               Components </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Database Writer</td><td style="border-bottom: 0.5pt solid black; ">CAS Consumer that writes the content of selected CAS types into a relational database, using
               JDBC. This code is in cpe/PersonTitleDBWriterCasConsumer. </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Annotators</td><td style="border-bottom: 0.5pt solid black; "> Set of simple annotators meant for pedagogical purposes. Includes: Date/time, Room-number,
               Regular expression, Tokenizer, and Meeting-finder annotator. There are sample CAS Multipliers
               as well. </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Flow Controllers</td><td style="border-bottom: 0.5pt solid black; "> There is a sample flow-controller based on the whiteboard concept of sending the CAS to whatever
               annotator hasn't yet processed it, when that annotator's inputs are available in the CAS. </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">XMI Collection Reader, CAS Consumer</td><td style="border-bottom: 0.5pt solid black; ">Reads and writes the CAS in the XMI format</td></tr><tr><td style="border-right: 0.5pt solid black; ">File System Collection Reader</td><td style=""> Simple Collection Reader for pulling documents from the file system and initializing CASes.
               </td></tr></tbody></table>
     </div>
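     <p>The whiteboard-style routing mentioned in the Flow Controllers row above can be illustrated
       with a small plain-Java sketch. This is a toy model only: it is not the sample flow controller
       shipped with Apache UIMA and does not use the UIMA flow controller API. The
       <code class="code">Step</code> class, the <code class="code">route</code> method, and the input and
       output type names assigned to each annotator are hypothetical and chosen for illustration.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of whiteboard-style routing; NOT the UIMA flow controller API. */
final class WhiteboardFlowSketch {

    /** Hypothetical stand-in for an annotator's declared inputs and outputs. */
    static final class Step {
        final String name;
        final Set<String> inputs;
        final Set<String> outputs;

        Step(String name, Set<String> inputs, Set<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /**
     * Repeatedly picks a step that has not run yet and whose inputs are all
     * available, runs it (here: just records its outputs), and returns the
     * resulting order. Steps whose inputs never become available are skipped.
     */
    static List<String> route(List<Step> steps, Set<String> initialTypes) {
        Set<String> available = new HashSet<>(initialTypes);
        List<Step> pending = new ArrayList<>(steps);
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed && !pending.isEmpty()) {
            progressed = false;
            for (Step step : pending) {
                if (available.containsAll(step.inputs)) {
                    order.add(step.name);
                    available.addAll(step.outputs);
                    pending.remove(step);
                    progressed = true;
                    break; // restart the scan because 'pending' was modified
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Assumed dependencies between the sample annotators, for illustration only.
        List<Step> steps = List.of(
            new Step("MeetingFinder", Set.of("Date", "RoomNumber"), Set.of("Meeting")),
            new Step("DateAnnotator", Set.of("Token"), Set.of("Date")),
            new Step("Tokenizer", Set.<String>of(), Set.of("Token")),
            new Step("RoomNumberAnnotator", Set.<String>of(), Set.of("RoomNumber")));
        System.out.println(route(steps, Set.<String>of()));
    }
}
```

     <p>In this sketch the order in which steps are declared does not matter; a step simply waits
       until the types it needs have appeared. A real flow controller would make this decision per CAS
       from the actual CAS contents and component metadata rather than from hard-coded sets.</p>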
   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;2.&nbsp;UIMA Conceptual Overview" id="ugr.ovv.conceptual"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;2.&nbsp;UIMA Conceptual Overview</h2></div></div></div>


   <p>UIMA is an open, industrial-strength, scalable and extensible platform for
     creating, integrating and deploying unstructured information management solutions
     from powerful text or multi-modal analysis and search components. </p>

   <p>The Apache UIMA project is an implementation of the Java UIMA framework available
     under the Apache License, providing a common foundation for industry and academia to
     collaborate and accelerate the world-wide development of technologies critical for
     discovering vital knowledge present in the fastest growing sources of information
     today.</p>

   <p>This chapter presents an introduction to many essential UIMA concepts. It is meant to
     provide a broad overview to give the reader a quick sense of UIMA's basic
     architectural philosophy and the UIMA SDK's capabilities. </p>

   <p>This chapter provides a general orientation to UIMA and makes liberal reference to
     the other chapters in the UIMA SDK documentation set, where the reader may find detailed
     treatments of key concepts and development practices. It may be useful to refer to <a href="overview_and_setup.html#ugr.glossary" class="olink">Glossary</a>, to become familiar
     with the terminology in this overview.</p>

   <div class="section" title="2.1.&nbsp;UIMA Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.uima_introduction">2.1.&nbsp;UIMA Introduction</h2></div></div></div>

     <div class="figure"><a name="ugr.ovv.conceptual.fig.bridge"></a><div class="figure-contents">

       <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="545"><tr><td><img src="images/overview-and-setup/conceptual_overview_files/image002.png" width="545" alt="Picture of a bridge between unstructured information artifacts and structured metadata about those artifacts"></td></tr></table></div>
     </div><p class="title"><b>Figure&nbsp;2.1.&nbsp;UIMA helps you build the bridge between the unstructured and structured
         worlds</b></p></div><br class="figure-break">

     <p> Unstructured information represents the largest, most current and fastest
       growing source of information available to businesses and governments. The web is just
       the tip of the iceberg. Consider the mounds of information hosted in the enterprise and
       around the world and across different media including text, voice and video. The
       high-value content in these vast collections of unstructured information is,
       unfortunately, buried in lots of noise. Searching for what you need or doing
       sophisticated data mining over unstructured information sources presents new
       challenges. </p>

     <p>An unstructured information management (UIM) application may be generally
       characterized as a software system that analyzes large volumes of unstructured
       information (text, audio, video, images, etc.) to discover, organize and deliver
       relevant knowledge to the client or application end-user. An example is an application
       that processes millions of medical abstracts to discover critical drug interactions.
       Another example is an application that processes tens of millions of documents to
       discover key evidence indicating probable competitive threats. </p>

     <p>First and foremost, the unstructured data must be analyzed to interpret, detect
       and locate concepts of interest, for example, named entities like persons,
       organizations, locations, facilities, products etc., that are not explicitly tagged
       or annotated in the original artifact. More challenging analytics may detect things
       like opinions, complaints, threats or facts. And then there are relations, for
       example, located in, finances, supports, purchases, repairs etc. The list of concepts
       important for applications to discover in unstructured content is large, varied and
       often domain specific.
       Many different component analytics may solve different parts of the overall analysis task.
       These component analytics must interoperate and must be easily combined to facilitate
       the development of UIM applications.</p>

     <p>The results of analysis are used to populate structured forms so that conventional
       data processing and search technologies
       like search engines, database engines or OLAP
       (On-Line Analytical Processing, or Data Mining) engines
       can efficiently deliver the newly discovered content in response to the client requests
       or queries.</p>

     <p>In analyzing unstructured content, UIM applications make use of a variety of
       analysis technologies including:</p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Statistical and rule-based Natural Language Processing
         (NLP)</p>
       </li><li class="listitem"><p>Information Retrieval (IR)</p>
       </li><li class="listitem"><p>Machine learning</p>
       </li><li class="listitem"><p>Ontologies</p>
       </li><li class="listitem"><p>Automated reasoning</p>
       </li><li class="listitem"><p>Knowledge Sources (e.g., CYC, WordNet, FrameNet, etc.)</p>
       </li></ul></div>

     <p>Specific analysis capabilities using these technologies are developed
       independently using different techniques, interfaces and platforms.
       </p>

     <p>The bridge from the unstructured world to the structured world is built through the
       composition and deployment of these analysis capabilities. This integration is often
       a costly challenge. </p>

     <p>The Unstructured Information Management Architecture (UIMA) is an architecture
       and software framework that helps you build that bridge. It supports creating,
       discovering, composing and deploying a broad range of analysis capabilities and
       linking them to structured information services.</p>

     <p>UIMA allows development teams to match the right skills with the right parts of a
       solution and helps enable rapid integration across technologies and platforms using a
       variety of different deployment options. These range from tightly-coupled
       deployments for high-performance, single-machine, embedded solutions to parallel
       and fully distributed deployments for highly flexible and scalable
       solutions.</p>

   </div>

   <div class="section" title="2.2.&nbsp;The Architecture, the Framework and the SDK"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.architecture_framework_sdk">2.2.&nbsp;The Architecture, the Framework and the SDK</h2></div></div></div>

     <p>UIMA is a software architecture which specifies component interfaces, data
       representations, design patterns and development roles for creating, describing,
       discovering, composing and deploying multi-modal analysis capabilities.</p>

     <p>The <span class="bold"><strong>UIMA framework</strong></span> provides a run-time
       environment in which developers can plug in their UIMA component implementations and
       with which they can build and deploy UIM applications. The framework is not specific to
       any IDE or platform. Apache hosts a Java and (soon) a C++ implementation of the UIMA
       Framework.</p>

     <p>The <span class="bold"><strong>UIMA Software Development Kit (SDK)</strong></span>
       includes the UIMA framework, plus tools and utilities for using UIMA. Some of the
       tooling supports an Eclipse-based ( <a class="ulink" href="http://www.eclipse.org/" target="_top">http://www.eclipse.org/</a>)
       development environment. </p>

   </div>

   <div class="section" title="2.3.&nbsp;Analysis Basics"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.analysis_basics">2.3.&nbsp;Analysis Basics</h2></div></div></div>

     <div class="note" title="Key UIMA Concepts Introduced in this Section:" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Key UIMA Concepts Introduced in this Section:</h3><p>Analysis Engine, Document, Annotator, Annotator
       Developer, Type, Type System, Feature, Annotation, CAS, Sofa, JCas, UIMA
       Context.</p>
     </div>

     <div class="section" title="2.3.1.&nbsp;Analysis Engines, Annotators &amp; Results"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.aes_annotators_and_analysis_results">2.3.1.&nbsp;Analysis Engines, Annotators &amp; Results</h3></div></div></div>

       <div class="figure"><a name="ugr.ovv.conceptual.metadata_in_cas"></a><div class="figure-contents">

         <div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="594"><tr><td align="center"><img src="images/overview-and-setup/conceptual_overview_files/image004.png" align="middle" width="594" alt="Picture of some text, with a hierarchy of discovered metadata about words in the text, including some image of a person as metadata about that name."></td></tr></table></div>
       </div><p class="title"><b>Figure&nbsp;2.2.&nbsp;Objects represented in the Common Analysis Structure (CAS)</b></p></div><br class="figure-break">

       <p>UIMA is an architecture in which basic building blocks called Analysis Engines
         (AEs) are composed to analyze a document and infer and record descriptive attributes
         about the document as a whole, and/or about regions therein. This descriptive
         information, produced by AEs, is referred to generally as <span class="bold"><strong>
         analysis results</strong></span>. Analysis results typically represent meta-data
         about the document content. One way to think about AEs is as software agents that
         automatically discover and record meta-data about original content.</p>

       <p>UIMA supports the analysis of different modalities including text, audio and
         video. The majority of examples we provide are for text. We use the term <span class="bold"><strong>document</strong></span>, therefore, to refer generally to any unit of
         content that an AE may process, whether it is a text document or a segment of audio, for
         example. See the <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
         <a href="tutorials_and_users_guides.html#ugr.tug.mvs" class="olink">Chapter&nbsp;6, <i>Multiple CAS Views of an Artifact</i></a> for more information on multimodal processing
         in UIMA.</p>

       <p>Analysis results include different statements about the content of a document.
         For example, the following is an assertion about the topic of a document:</p>


       <pre class="programlisting">(1) The Topic of document D102 is "CEOs and Golf".</pre>

       <p>Analysis results may include statements describing regions more granular than
         the entire document. We use the term <span class="bold"><strong>span</strong></span> to
         refer to a sequence of characters in a text document. Consider that a document with the
         identifier D102 contains a span, <span class="quote">&#8220;<span class="quote">Fred Center</span>&#8221;</span> starting at
         character position 101. An AE that can detect persons in text may represent the
         following statement as an analysis result:</p>


       <pre class="programlisting">(2) The span from position 101 to 112 in document D102 denotes a Person</pre>

       <p>In both statements 1 and 2 above there is a special pre-defined term or what we call
         in UIMA a <span class="bold"><strong>Type</strong></span>. They are
         <span class="emphasis"><em>Topic</em></span> and <span class="emphasis"><em>Person</em></span> respectively.
         UIMA types characterize the kinds of results that an AE may create &#8211; more on
         types later.</p>

       <p>Other analysis results may relate two statements. For example, an AE might
         record in its results that two spans are both referring to the same person:</p>


       <pre class="programlisting">(3) The Person denoted by span 101 to 112 and
   the Person denoted by span 141 to 143 in document D102
   refer to the same Entity.</pre>

       <p>The above statements are some examples of the kinds of results that AEs may record
         to describe the content of the documents they analyze. These are not meant to indicate
         the form or syntax with which these results are captured in UIMA &#8211; more on that
         later in this overview.</p>

       <p>The UIMA framework treats Analysis Engines as pluggable, composable,
         discoverable, managed objects. At the heart of AEs are the analysis algorithms that
         do all the work to analyze documents and record analysis results. </p>

       <p>UIMA provides a basic component type intended to house the core analysis
         algorithms running inside AEs. Instances of this component are called <span class="bold"><strong>Annotators</strong></span>. The analysis algorithm developer's
         primary concern therefore is the development of annotators. The UIMA framework
         provides the necessary methods for taking annotators and creating analysis
         engines.</p>

       <p>In UIMA the person who codes analysis algorithms takes on the role of the
           <span class="bold"><strong>Annotator Developer</strong></span>. <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>
           in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> will take the reader
         through the details involved in creating UIMA annotators and analysis
         engines.</p>

       <p>At the most primitive level an AE wraps an annotator adding the necessary APIs and
         infrastructure for the composition and deployment of annotators within the UIMA
         framework. The simplest AE contains exactly one annotator at its core. Complex AEs
         may contain a collection of other AEs each potentially containing within them other
         AEs. </p>
     </div>

     <div class="section" title="2.3.2.&nbsp;Representing Analysis Results in the CAS"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.representing_results_in_cas">2.3.2.&nbsp;Representing Analysis Results in the CAS</h3></div></div></div>


       <p>How annotators represent and share their results is an important part of the UIMA
         architecture. UIMA defines a <span class="bold"><strong>Common Analysis Structure
         (CAS)</strong></span> precisely for these purposes.</p>

       <p>The CAS is an object-based data structure that allows the representation of
         objects, properties and values. Object types may be related to each other in a
         single-inheritance hierarchy. The CAS logically (if not physically) contains the
         document being analyzed. Analysis developers share and record their analysis
         results in terms of an object model within the CAS. <sup>[<a name="d5e551" href="#ftn.d5e551" class="footnote">1</a>]</sup> </p>

       <p>The UIMA framework includes an implementation and interfaces to the CAS. For a
         more detailed description of the CAS and its interfaces see <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.cas" class="olink">Chapter&nbsp;4, <i>CAS Reference</i></a>.</p>

       <p>A CAS that logically contains statement 2 (repeated here for your
         convenience)</p>


       <pre class="programlisting">(2) The span from position 101 to 112 in document D102 denotes a Person</pre>

       <p>would include objects of the Person type. For each person found in the body of a
         document, the AE would create a Person object in the CAS and link it to the span of text
         where the person was mentioned in the document.</p>

       <p>While the CAS is a general purpose data structure, UIMA defines a
         few basic types and affords the developer the ability to extend these to define an
         arbitrarily rich <span class="bold"><strong>Type System</strong></span>. You can think of a
         type system as an object schema for the CAS.</p>

       <p>A type system defines the various types of objects that may be discovered in
         documents by AEs that subscribe to that type system.</p>

       <p>As suggested above, Person may be defined as a type. Types have properties or
           <span class="bold"><strong>features</strong></span>. So for example,
         <span class="emphasis"><em>Age</em></span> and <span class="emphasis"><em>Occupation</em></span> may be defined as
         features of the Person type.</p>

       <p>Other types might be <span class="emphasis"><em>Organization, Company, Bank, Facility, Money,
         Size, Price, Phone Number, Phone Call, Relation, Network Packet, Product, Noun
         Phrase, Verb, Color, Parse Node, Feature Weight Array</em></span> etc.</p>

       <p>There are no limits to the different types that may be defined in a type system. A
         type system is domain and application specific.</p>

       <p>Types in a UIMA type system may be organized into a taxonomy. For example,
         <span class="emphasis"><em>Company</em></span> may be defined as a subtype of
         <span class="emphasis"><em>Organization</em></span>. <span class="emphasis"><em>NounPhrase</em></span> may be a
         subtype of a <span class="emphasis"><em>ParseNode</em></span>.</p>
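
       <p>The taxonomy idea can be sketched in plain Java. The <code class="code">ToyTypeSystem</code>
         class below is a hypothetical stand-in written for this overview, not the UIMA type system
         API. It only illustrates single inheritance and a subtype check, using the example type
         names from this section.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a single-inheritance type taxonomy (hypothetical, not the UIMA API).
public class ToyTypeSystem {
    // Maps each type name to its single parent type; root types map to null.
    private final Map<String, String> parentOf = new HashMap<>();

    // Registers a type; the parent (if any) must already be registered.
    public void addType(String name, String parent) {
        if (parent != null && !parentOf.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown parent type: " + parent);
        }
        parentOf.put(name, parent);
    }

    // Returns true if 'type' is 'ancestor' itself or inherits from it.
    public boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyTypeSystem ts = new ToyTypeSystem();
        ts.addType("Organization", null);
        ts.addType("Company", "Organization");
        ts.addType("ParseNode", null);
        ts.addType("NounPhrase", "ParseNode");
        System.out.println(ts.subsumes("Organization", "Company")); // true
        System.out.println(ts.subsumes("Company", "Organization")); // false
    }
}
```

       <p>Because each type has at most one parent, the subtype check is a simple walk up the
         parent chain.</p>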

       <div class="section" title="2.3.2.1.&nbsp;The Annotation Type"><div class="titlepage"><div><div><h4 class="title" id="ugr.ovv.conceptual.annotation_type">2.3.2.1.&nbsp;The Annotation Type</h4></div></div></div>


         <p>A general and common type used in artifact analysis and from which additional
           types are often derived is the <span class="bold"><strong>annotation</strong></span>
           type. </p>

         <p>The annotation type is used to annotate or label regions of an artifact. Common
           artifacts are text documents, but they can be other things, such as audio streams.
           The annotation type for text includes two features, namely
           <span class="emphasis"><em>begin</em></span> and <span class="emphasis"><em>end</em></span>. Values of these
           features represent integer offsets in the artifact and delimit a span. Any
           particular annotation object identifies the span it annotates with the
           <span class="emphasis"><em>begin</em></span> and <span class="emphasis"><em>end</em></span> features.</p>

         <p>The key idea here is that the annotation type is used to identify and label or
           <span class="quote">&#8220;<span class="quote">annotate</span>&#8221;</span> a specific region of an artifact.</p>

         <p>Consider that the Person type is defined as a subtype of annotation. An
           annotator, for example, can create a Person annotation to record the discovery of a
           mention of a person between position 141 and 143 in document D102. The annotator can
           create another person annotation to record the detection of a mention of a person in
           the span between positions 101 and 112. </p>
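
         <p>As a rough illustration, the sketch below models an annotation as a pair of integer
           offsets and a Person as a subtype of it. The class names and the filler document are
           invented for this example and are not the UIMA classes. The sketch also assumes that
           <span class="emphasis"><em>end</em></span> is exclusive, as in Java's
           <code class="code">String.substring</code>.</p>

```java
public class ToyAnnotationDemo {

    // Toy stand-in for the annotation type: a region delimited by begin/end offsets.
    // Assumption: 'end' is exclusive, matching String.substring.
    public static class Annotation {
        public final int begin;
        public final int end;

        public Annotation(int begin, int end) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("Invalid span: " + begin + ".." + end);
            }
            this.begin = begin;
            this.end = end;
        }

        // Returns the text this annotation labels within the given document.
        public String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    // Person is modeled as a subtype of Annotation.
    public static class Person extends Annotation {
        public Person(int begin, int end) {
            super(begin, end);
        }
    }

    // Builds a filler document with two mentions placed at offsets 101 and 141.
    public static String sampleDocument() {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 101) {
            sb.append('.');
        }
        sb.append("Fred Center");
        while (sb.length() < 141) {
            sb.append('.');
        }
        sb.append("He");
        return sb.toString();
    }

    public static void main(String[] args) {
        String d102 = sampleDocument();
        Person p1 = new Person(101, 112);
        Person p2 = new Person(141, 143);
        System.out.println(p1.coveredText(d102)); // Fred Center
        System.out.println(p2.coveredText(d102)); // He
    }
}
```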
       </div>
       <div class="section" title="2.3.2.2.&nbsp;Not Just Annotations"><div class="titlepage"><div><div><h4 class="title" id="ugr.ovv.conceptual.not_just_annotations">2.3.2.2.&nbsp;Not Just Annotations</h4></div></div></div>


         <p>While the annotation type is a useful type for annotating regions of a
           document, annotations are not the only kind of types in a CAS. A CAS is a general
           representation scheme and may store arbitrary data structures to represent the
           analysis of documents.</p>

         <p>As an example, consider statement 3 above (repeated here for your
           convenience).</p>


         <pre class="programlisting">(3) The Person denoted by span 101 to 112 and
   the Person denoted by span 141 to 143 in document D102
   refer to the same Entity.</pre>

         <p>This statement mentions two person annotations in the CAS: the first, call it
           P1, delimiting the span from 101 to 112, and the other, call it P2, delimiting the span
           from 141 to 143. Statement 3 asserts explicitly that these two spans refer to the
           same entity. This means that while there are two expressions in the text
           represented by the annotations P1 and P2, each refers to one and the same person.
           </p>

         <p>The Entity type may be introduced into a type system to capture this kind of
           information. The Entity type is not an annotation. It is intended to represent an
           object in the domain which may be referred to by different expressions (or
           mentions) occurring multiple times within a document (or across documents within
           a collection of documents). The Entity type has a feature named
           <span class="emphasis"><em>occurrences</em></span>. This feature is used to point to all the
           annotations believed to label mentions of the same entity.</p>

         <p>Consider that the spans annotated by P1 and P2 were <span class="quote">&#8220;<span class="quote">Fred
           Center</span>&#8221;</span> and <span class="quote">&#8220;<span class="quote">He</span>&#8221;</span> respectively. The annotator might create
           a new Entity object called
           <code class="code">FredCenter</code>. To represent the relationship in statement 3 above,
           the annotator may link FredCenter to both P1 and P2 by making them values of its
           <span class="emphasis"><em>occurrences</em></span> feature.</p>
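
         <p>The relationship between an entity and its mentions might be sketched as follows.
           This is a toy object model with hypothetical class names, not how the CAS stores
           feature structures. It only shows that the Entity carries no offsets of its own and
           refers to its mentions through an <span class="emphasis"><em>occurrences</em></span>
           list.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ToyEntityDemo {

    // Toy Person annotation: labels one mention by its begin/end offsets.
    public static class PersonMention {
        public final int begin;
        public final int end;

        public PersonMention(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Toy Entity: not an annotation, so it has no offsets, only links to its mentions.
    public static class Entity {
        public final String name;
        public final List<PersonMention> occurrences = new ArrayList<>();

        public Entity(String name) {
            this.name = name;
        }
    }

    // Links the two mentions P1 and P2 from statement 3 to a single entity.
    public static Entity buildFredCenter() {
        PersonMention p1 = new PersonMention(101, 112);
        PersonMention p2 = new PersonMention(141, 143);
        Entity fredCenter = new Entity("FredCenter");
        fredCenter.occurrences.add(p1);
        fredCenter.occurrences.add(p2);
        return fredCenter;
    }

    public static void main(String[] args) {
        Entity e = buildFredCenter();
        for (PersonMention m : e.occurrences) {
            System.out.println(e.name + " <- [" + m.begin + ", " + m.end + ")");
        }
    }
}
```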

         <p> <a class="xref" href="#ugr.ovv.conceptual.metadata_in_cas" title="Figure&nbsp;2.2.&nbsp;Objects represented in the Common Analysis Structure (CAS)">Figure&nbsp;2.2, &#8220;Objects represented in the Common Analysis Structure (CAS)&#8221;</a> also
           illustrates that an entity may be linked to annotations referring to regions of
           image documents as well. To do this the annotation type would have to be extended
           with the appropriate features to point to regions of an image.</p>
       </div>

       <div class="section" title="2.3.2.3.&nbsp;Multiple Views within a CAS"><div class="titlepage"><div><div><h4 class="title" id="ugr.ovv.conceptual.multiple_views_within_a_cas">2.3.2.3.&nbsp;Multiple Views within a CAS</h4></div></div></div>


         <p>UIMA supports the simultaneous analysis of multiple views of a document. This
           support comes in handy for processing multiple forms of the artifact, for example, the audio
           and the closed captioned views of a single speech stream, or the tagged and detagged
           views of an HTML document.</p>

         <p>AEs analyze one or more views of a document. Each view contains a specific
             <span class="bold"><strong>subject of analysis (Sofa)</strong></span>, plus a set of
           indexes holding metadata indexed by that view. The CAS, overall, holds one or more
           CAS Views, plus the descriptive objects that represent the analysis results for
           each. </p>

         <p>Another common example of using CAS Views is for different translations of a
           document. Each translation may be represented with a different CAS View. Each
           translation may be described by a different set of analysis results. For more
           details on CAS Views and Sofas see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.mvs" class="olink">Chapter&nbsp;6, <i>Multiple CAS Views of an Artifact</i></a> and <a href="tutorials_and_users_guides.html#ugr.tug.aas" class="olink">Chapter&nbsp;5, <i>Annotations, Artifacts, and Sofas</i></a>. </p>
       </div>
     </div>

     <div class="section" title="2.3.3.&nbsp;Interacting with the CAS and External Resources"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.interacting_with_cas_and_external_resources">2.3.3.&nbsp;Interacting with the CAS and External Resources</h3></div></div></div>


       <p>The two main interfaces that a UIMA component developer interacts with are the
         CAS and the UIMA Context.</p>

       <p>UIMA provides an efficient implementation of the CAS with multiple programming
         interfaces. Through these interfaces, the annotator developer interacts with the
         document and reads and writes analysis results. The CAS interfaces provide a suite of
         access methods that allow the developer to obtain indexed iterators to the different
         objects in the CAS. See <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.cas" class="olink">Chapter&nbsp;4, <i>CAS Reference</i></a>. While many objects may exist in a CAS, the annotator
         developer can obtain a specialized iterator to all Person objects associated with a
         particular view, for example.</p>

       <p>For Java annotator developers, UIMA provides the JCas. This interface provides
         the Java developer with a natural interface to CAS objects. Each type declared in the
         type system appears as a Java Class; the UIMA framework renders the Person type as a
         Person class in Java. As the analysis algorithm detects mentions of persons in the
         documents, it can create Person objects in the CAS. For more details on how to interact
         with the CAS using this interface, refer to <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.jcas" class="olink">Chapter&nbsp;5, <i>JCas Reference</i></a>.</p>

       <p>The component developer, in addition to interacting with the CAS, can access
         external resources through the framework's resource manager interface
         called the <span class="bold"><strong>UIMA Context</strong></span>. This interface, among
         other things, can ensure that different annotators working together in an aggregate
         flow may share the same instance of an external file or remote resource accessed
         via its URL, for example. For details on using
         the UIMA Context see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>.</p>

     </div>
     <div class="section" title="2.3.4.&nbsp;Component Descriptors"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.component_descriptors">2.3.4.&nbsp;Component Descriptors</h3></div></div></div>

       <p>UIMA defines interfaces for a small set of core components that users of the
         framework provide implementations for. Annotators and Analysis Engines are two of
         the basic building blocks specified by the architecture. Developers implement them
         to build and compose analysis capabilities and ultimately applications.</p>

       <p>There are other components in addition to these, which we will learn about
         later, but for every component specified in UIMA there are two parts required for its
         implementation:</p>

       <div class="orderedlist"><ol class="orderedlist" type="1" compact><li class="listitem"><p>the declarative part and</p></li><li class="listitem"><p>the code part.</p></li></ol></div>

       <p>The declarative part contains metadata describing the component, its
         identity, structure and behavior and is called the <span class="bold"><strong>
         Component Descriptor</strong></span>. Component descriptors are represented in XML.
         The code part implements the algorithm. The code part may be a program in Java.</p>

       <p>As a developer using the UIMA SDK, to implement a UIMA component it is always the
         case that you will provide two things: the code part and the Component Descriptor.
         Note that when you are composing an engine, the code may be already provided in
         reusable subcomponents. In these cases you may not be developing new code but rather
         composing an aggregate engine by pointing to other components where the code has been
         included.</p>

       <p>Component descriptors are represented in XML and aid in component discovery,
         reuse, composition and development tooling. The UIMA SDK provides tools for easily
         creating and maintaining the component descriptors that relieve the developer from
         editing XML directly. This tool is described briefly in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>, and more
         thoroughly in <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a>
         <a href="tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, <i>Component Descriptor Editor User's Guide</i></a>.</p>

       <p>Component descriptors contain standard metadata including the
         component's name, author, version, and a reference to the class that
         implements the component.</p>

       <p>In addition to these standard fields, a component descriptor identifies the
         type system the component uses and the types it requires in an input CAS and the types it
         plans to produce in an output CAS.</p>

       <p>For example, an AE that detects person types may require as input a CAS that
         includes a tokenization and deep parse of the document. The descriptor refers to a
         type system to make the component's input requirements and output types
         explicit. In effect, the descriptor includes a declarative description of the
         component's behavior and can be used to aid in component discovery and
         composition based on desired results. UIMA analysis engines provide an interface
         for accessing the component metadata represented in their descriptors. For more
         details on the structure of UIMA component descriptors refer to <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, <i>Component Descriptor Reference</i></a>.</p>
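
       <p>To see why declared input and output types help with composition, consider the
         following sketch. It is a simplified model invented for this overview: the
         <code class="code">Component</code> class, its fields and the type names are
         hypothetical and do not reflect the descriptor format or any UIMA API. It checks
         whether each component in a proposed sequence finds its declared input types among
         the outputs of the components before it.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ToyCapabilityCheck {

    // Toy view of the capability part of a descriptor: a name plus declared input/output types.
    public static class Component {
        public final String name;
        public final Set<String> inputTypes;
        public final Set<String> outputTypes;

        public Component(String name, Set<String> inputTypes, Set<String> outputTypes) {
            this.name = name;
            this.inputTypes = inputTypes;
            this.outputTypes = outputTypes;
        }
    }

    // Small helper to build a set of type names.
    public static Set<String> types(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Returns the name of the first component whose inputs are not produced
    // by an earlier component, or empty if the whole sequence is satisfied.
    public static Optional<String> firstUnsatisfied(List<Component> pipeline) {
        Set<String> available = new HashSet<>();
        for (Component c : pipeline) {
            if (!available.containsAll(c.inputTypes)) {
                return Optional.of(c.name);
            }
            available.addAll(c.outputTypes);
        }
        return Optional.empty();
    }

    // A person detector that needs a tokenization and a parse, as in the example above.
    public static List<Component> samplePipeline() {
        return Arrays.asList(
            new Component("Tokenizer", types(), types("Token")),
            new Component("Parser", types("Token"), types("Parse")),
            new Component("PersonDetector", types("Token", "Parse"), types("Person")));
    }

    public static void main(String[] args) {
        System.out.println(firstUnsatisfied(samplePipeline()).isPresent()); // false
    }
}
```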

     </div>
   </div>
   <div class="section" title="2.4.&nbsp;Aggregate Analysis Engines"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.aggregate_analysis_engines">2.4.&nbsp;Aggregate Analysis Engines</h2></div></div></div>


     <div class="note" title="Key UIMA Concepts Introduced in this Section:" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Key UIMA Concepts Introduced in this Section:</h3><p>Aggregate Analysis Engine, Delegate Analysis Engine,
       Tightly and Loosely Coupled, Flow Specification, Analysis Engine Assembler</p>
     </div>

     <div class="figure"><a name="ugr.ovv.conceptual.sample_aggregate"></a><div class="figure-contents">

       <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="588"><tr><td><img src="images/overview-and-setup/conceptual_overview_files/image006.png" width="588" alt="Picture of multiple parts (a language identifier, tokenizer, part of speech annotator, shallow parser, and named entity detector) strung together into a flow, and all of them wrapped as a single aggregate object, which produces as annotations the union of all the results of the individual annotator components ( tokens, parts of speech, names, organizations, places, persons, etc.)"></td></tr></table></div>
     </div><p class="title"><b>Figure&nbsp;2.3.&nbsp;Sample Aggregate Analysis Engine</b></p></div><br class="figure-break">

     <p>A simple or primitive UIMA Analysis Engine (AE) contains a single annotator. AEs,
       however, may be defined to contain other AEs organized in a workflow. These more complex
       analysis engines are called <span class="bold"><strong>Aggregate Analysis
       Engines.</strong></span> </p>

     <p>Annotators tend to perform fairly granular functions, for example language
       detection, tokenization or part of speech detection.
     These functions typically address just part of an overall analysis task. A workflow
       of component engines may be orchestrated to perform more complex tasks.</p>

     <p>An AE that performs named entity detection, for example, may
       include a pipeline of annotators starting with language detection feeding
       tokenization, then part-of-speech detection, then deep grammatical parsing and then
       finally named-entity detection. Each step in the pipeline is required by the
       subsequent analysis. For example, the final named-entity annotator can only do its
       analysis if the previous deep grammatical parse was recorded in the CAS.</p>

     <p>Aggregate AEs are built to encapsulate potentially complex internal structure
       and insulate it from users of the AE. In our example, the aggregate analysis engine
       developer acquires the internal components, defines the necessary flow
       between them and publishes the resulting AE. Consider the simple example illustrated
       in <a class="xref" href="#ugr.ovv.conceptual.sample_aggregate" title="Figure&nbsp;2.3.&nbsp;Sample Aggregate Analysis Engine">Figure&nbsp;2.3, &#8220;Sample Aggregate Analysis Engine&#8221;</a> where
       <span class="quote">&#8220;<span class="quote">MyNamed-EntityDetector</span>&#8221;</span> is composed of a linear flow of more
       primitive analysis engines.</p>

     <p>Users of this AE need not know how it is constructed internally but only need its name
       and its published input requirements and output types. These must be declared in the
       aggregate AE's descriptor. An aggregate AE's descriptor declares the components
       it contains and a <span class="bold"><strong>flow specification</strong></span>. The flow
       specification defines the order in which the internal component AEs should be run. The
       internal AEs specified in an aggregate are also called the <span class="bold"><strong>
       delegate analysis engines.</strong></span> The term "delegate" is used because aggregate AEs
       are thought to "delegate" functions to their internal AEs.</p>
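
     <p>The delegation pattern can be sketched in a few lines of plain Java. Everything here
       is a toy model under stated assumptions: <code class="code">ToyCas</code>,
       <code class="code">ToyEngine</code> and <code class="code">ToyAggregate</code> are
       hypothetical names, not UIMA interfaces, and the flow is a fixed list of delegate keys.
       The point is only that an aggregate is itself an engine, and that it passes the same
       CAS to each delegate in flow order.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyAggregateDemo {

    // Toy CAS: the document text plus a list of recorded results.
    public static class ToyCas {
        public final String text;
        public final List<String> results = new ArrayList<>();

        public ToyCas(String text) {
            this.text = text;
        }
    }

    // Toy engine interface: CAS in, same CAS out with results added.
    public interface ToyEngine {
        void process(ToyCas cas);
    }

    // Toy aggregate: an engine that delegates to named engines in a fixed linear flow.
    public static class ToyAggregate implements ToyEngine {
        private final Map<String, ToyEngine> delegates;
        private final List<String> flow;

        public ToyAggregate(Map<String, ToyEngine> delegates, List<String> flow) {
            for (String key : flow) {
                if (!delegates.containsKey(key)) {
                    throw new IllegalArgumentException("Flow names unknown delegate: " + key);
                }
            }
            this.delegates = delegates;
            this.flow = flow;
        }

        @Override
        public void process(ToyCas cas) {
            for (String key : flow) {
                delegates.get(key).process(cas);
            }
        }
    }

    // Builds a two-step aggregate and runs it on a short sample text.
    public static ToyCas runSample() {
        Map<String, ToyEngine> delegates = new LinkedHashMap<>();
        delegates.put("Tokenizer",
            cas -> cas.results.add("tokens:" + cas.text.trim().split("\\s+").length));
        delegates.put("PersonDetector", cas -> {
            // Depends on an earlier result, so the flow order matters.
            if (cas.results.isEmpty()) {
                throw new IllegalStateException("Tokenizer must run first");
            }
            if (cas.text.contains("Fred")) {
                cas.results.add("person:Fred");
            }
        });
        ToyEngine aggregate =
            new ToyAggregate(delegates, Arrays.asList("Tokenizer", "PersonDetector"));
        ToyCas cas = new ToyCas("Fred plays golf");
        aggregate.process(cas);
        return cas;
    }

    public static void main(String[] args) {
        System.out.println(runSample().results); // [tokens:3, person:Fred]
    }
}
```

     <p>A caller of the aggregate sees a single <code class="code">process</code> call and
       does not need to know which delegates ran or in what order.</p>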

     <p>
       In UIMA 2.0, the developer can implement a "Flow Controller" and include it as part
       of an aggregate AE by referring to it in the aggregate AE's descriptor.
       The flow controller is responsible for computing the "flow", that is,
       for determining the order in which the delegate AEs will process the CAS.
       The Flow Controller has access to the CAS and any external resources it may require
       for determining the flow. It can do this dynamically at run-time, make
       multi-step decisions, and consider any sort of flow specification
       included in the aggregate AE's descriptor. See
       <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
       <a href="tutorials_and_users_guides.html#ugr.tug.fc" class="olink">Chapter&nbsp;4, <i>Flow Controller Developer's Guide</i></a>
       for details on the UIMA Flow Controller interface.
     </p>

     <p>We refer to the development role associated with building an aggregate from
       delegate AEs as the <span class="bold"><strong>Analysis Engine Assembler</strong></span>.</p>

     <p>The UIMA framework, given an aggregate analysis engine descriptor, will run all
       delegate AEs, ensuring that each one gets access to the CAS in the sequence produced by
       the flow controller. The UIMA framework is equipped to handle different
       deployments where the delegate engines, for example, are <span class="bold"><strong>
       tightly-coupled</strong></span> (running in the same process) or <span class="bold"><strong>
       loosely-coupled</strong></span> (running in separate processes or even on different
       machines). The framework supports a number of remote protocols for loose coupling
       deployments of aggregate analysis engines, including SOAP (which stands for Simple
       Object Access Protocol, a standard Web Services communications protocol).</p>

     <p>The UIMA framework facilitates the deployment of AEs as remote services by using an
       adapter layer that automatically creates the necessary infrastructure in response to
       a declaration in the component's descriptor. For more details on creating
       aggregate analysis engines refer to <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor" class="olink">Chapter&nbsp;2, <i>Component Descriptor Reference</i></a>. The component descriptor editor tool
       assists in the specification of aggregate AEs from a repository of available engines.
       For more details on this tool refer to <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, <i>Component Descriptor Editor User's Guide</i></a>.</p>

     <p>The UIMA framework implementation has two built-in flow implementations: one
       that supports a linear flow between components, and one with conditional branching
       based on the language of the document. It also supports user-provided flow
       controllers, as described in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.fc" class="olink">Chapter&nbsp;4, <i>Flow Controller Developer's Guide</i></a>. Furthermore, the application developer is
       free to create multiple AEs and provide their own logic to combine the AEs in arbitrarily
       complex flows. For more details on this the reader may refer to <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.using_aes" class="olink">Section&nbsp;3.2, &#8220;Using Analysis Engines&#8221;</a>.</p>

   </div>

   <div class="section" title="2.5.&nbsp;Application Building and Collection Processing"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.applicaiton_building_and_collection_processing">2.5.&nbsp;Application Building and Collection Processing</h2></div></div></div>


     <div class="note" title="Key UIMA Concepts Introduced in this Section:" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Key UIMA Concepts Introduced in this Section:</h3><p>Process Method, Collection Processing Architecture,
       Collection Reader, CAS Consumer, CAS Initializer, Collection Processing Engine,
       Collection Processing Manager.</p></div>

     <div class="section" title="2.5.1.&nbsp;Using the framework from an Application"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.using_framework_from_an_application">2.5.1.&nbsp;Using the framework from an Application</h3></div></div></div>


       <div class="figure"><a name="ugr.ovv.conceptual.application_factory_ae"></a><div class="figure-contents">

         <div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="618"><tr><td align="center"><img src="images/overview-and-setup/conceptual_overview_files/image008.png" align="middle" width="618" alt="Picture of application interacting with UIMA's factory to produce an analysis engine, which acts as a container for annotators, and interfaces with the application via the process and getMetaData methods among others."></td></tr></table></div>
       </div><p class="title"><b>Figure&nbsp;2.4.&nbsp;Using UIMA Framework to create and interact with an Analysis Engine</b></p></div><br class="figure-break">

       <p>As mentioned above, the basic AE interface may be thought of as simply CAS in/CAS
         out.</p>

       <p>The application is responsible for interacting with the UIMA framework to
         instantiate an AE, create or acquire an input CAS, initialize the input CAS with a
         document and then pass it to the AE through the <span class="bold"><strong>process
         method</strong></span>. This interaction with the framework is illustrated in <a class="xref" href="#ugr.ovv.conceptual.application_factory_ae" title="Figure&nbsp;2.4.&nbsp;Using UIMA Framework to create and interact with an Analysis Engine">Figure&nbsp;2.4, &#8220;Using UIMA Framework to create and interact with an Analysis Engine&#8221;</a>. </p>

       <p>The UIMA AE Factory takes the declarative information from the Component
         Descriptor and the class files implementing the annotator, and instantiates the AE
         instance, setting up the CAS and the UIMA Context.</p>

       <p>The AE, possibly calling many delegate AEs internally, performs the overall
         analysis and its process method returns the CAS containing new analysis results.
         </p>

       <p>The application then decides what to do with the returned CAS. There are many
         possibilities. For instance, the application could display the results, store the
         CAS to disk for post-processing, or extract and index analysis results as part of a
         search or database application.</p>

       <p>The UIMA framework provides methods to support the application developer in
         creating and managing CASes and instantiating, running and managing AEs. Details
         may be found in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter&nbsp;3, <i>Application Developer's Guide</i></a>.</p>
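
       <p>The interaction described in this section can be sketched with a small, self-contained
         model. The types below (<code class="code">SketchCas</code>, <code class="code">SketchAnnotator</code>,
         <code class="code">SketchEngineFactory</code> and the others) are hypothetical stand-ins invented for
         illustration; they are not the UIMA API, and the class name passed to the factory only
         plays the role of the implementation named in a Component Descriptor.</p>

       <pre class="programlisting">// Illustrative model only: none of these types belong to the UIMA API.

/** Hypothetical stand-in for a CAS: a document plus the results added to it. */
final class SketchCas {
    String documentText = "";
    final StringBuilder results = new StringBuilder();
}

/** Hypothetical annotator contract: CAS in, CAS out. */
interface SketchAnnotator {
    void process(SketchCas cas);
}

/** Hypothetical annotator that records the number of whitespace-separated tokens. */
final class SketchTokenCounter implements SketchAnnotator {
    public void process(SketchCas cas) {
        String text = cas.documentText.trim();
        int tokens = text.isEmpty() ? 0 : text.split("\\s+").length;
        cas.results.append("tokens=").append(tokens);
    }
}

/** Hypothetical factory: turns a declared class name into a live engine. */
final class SketchEngineFactory {
    static SketchAnnotator produce(String annotatorClassName) {
        try {
            // The class name stands in for the implementation named in a descriptor.
            return (SketchAnnotator) Class.forName(annotatorClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + annotatorClassName, e);
        }
    }
}

final class SketchApplication {
    /** Instantiates the engine, fills a CAS, runs process, and returns the results. */
    static String analyze(String annotatorClassName, String document) {
        SketchAnnotator engine = SketchEngineFactory.produce(annotatorClassName);
        SketchCas cas = new SketchCas();   // create the input CAS
        cas.documentText = document;       // initialize it with a document
        engine.process(cas);               // pass it to the engine
        return cas.results.toString();     // the application decides what to do next
    }

    public static void main(String[] args) {
        String name = SketchTokenCounter.class.getName();
        System.out.println(analyze(name, "UIMA analyzes unstructured information"));
    }
}</pre>

       <p>In this sketch the application never constructs the annotator directly; it only names
         it and lets the factory do the instantiation, which mirrors the division of labor
         shown in the figure above.</p>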
     </div>

     <div class="section" title="2.5.2.&nbsp;Graduating to Collection Processing"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.graduating_to_collection_processing">2.5.2.&nbsp;Graduating to Collection Processing</h3></div></div></div>

       <div class="figure"><a name="ugr.ovv.conceptual.fig.cpe"></a><div class="figure-contents">

         <div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="578"><tr><td align="center"><img src="images/overview-and-setup/conceptual_overview_files/image010.png" align="middle" width="578" alt="High-Level UIMA Component Architecture from Source to Sink"></td></tr></table></div>
       </div><p class="title"><b>Figure&nbsp;2.5.&nbsp;High-Level UIMA Component Architecture from Source to Sink</b></p></div><br class="figure-break">

       <p>Many UIM applications analyze entire collections of documents. They connect to
         different document sources and do different things with the results. But in the
         typical case, the application must generally follow these logical steps:

         </p><div class="orderedlist"><ol class="orderedlist" type="1" compact><li class="listitem"><p>Connect to a physical source</p></li><li class="listitem"><p>Acquire a document from the source</p></li><li class="listitem"><p>Initialize a CAS with the document to be analyzed</p>
             </li><li class="listitem"><p>Send the CAS to a selected analysis engine</p></li><li class="listitem"><p>Process the resulting CAS</p></li><li class="listitem"><p>Go back to 2 until the collection is processed</p>
             </li><li class="listitem"><p>Do any final processing required after all the documents in the
             collection have been analyzed</p></li></ol></div><p> </p>
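
       <p>The steps above can be sketched as a plain loop. This is a minimal model under stated
         assumptions: <code class="code">LoopSource</code>, <code class="code">LoopAnalysis</code> and
         <code class="code">LoopSink</code> are hypothetical interfaces made up for this example and
         documents are plain strings; none of this is the UIMA API.</p>

       <pre class="programlisting">// Illustrative model only: these types are hypothetical, not the UIMA API.

/** Hypothetical document source (steps 1 and 2). */
interface LoopSource {
    boolean hasNext();
    String next();
}

/** Hypothetical analysis stage (steps 3 and 4). */
interface LoopAnalysis {
    String analyze(String documentText);
}

/** Hypothetical sink (steps 5 and 7). */
interface LoopSink {
    void consume(String analysisResult);
    void collectionComplete();
}

/** Source backed by an in-memory array of documents. */
final class ArrayLoopSource implements LoopSource {
    private final String[] documents;
    private int position = 0;

    ArrayLoopSource(String... documents) {
        this.documents = documents;
    }

    public boolean hasNext() {
        return position != documents.length;
    }

    public String next() {
        return documents[position++];
    }
}

/** Analysis that reports the length of each document. */
final class LengthLoopAnalysis implements LoopAnalysis {
    public String analyze(String documentText) {
        return documentText.length() + " chars";
    }
}

/** Sink that remembers what it received and whether the run finished. */
final class RecordingLoopSink implements LoopSink {
    final StringBuilder received = new StringBuilder();
    boolean finished = false;

    public void consume(String analysisResult) {
        received.append(analysisResult).append(';');
    }

    public void collectionComplete() {
        finished = true;
    }
}

final class CollectionLoop {
    /** Drives every document from the source through the analysis into the sink. */
    static int run(LoopSource source, LoopAnalysis analysis, LoopSink sink) {
        int processed = 0;
        while (source.hasNext()) {                       // step 6: repeat until done
            String document = source.next();             // step 2: acquire a document
            String result = analysis.analyze(document);  // steps 3 and 4: analyze it
            sink.consume(result);                        // step 5: process the result
            processed++;
        }
        sink.collectionComplete();                       // step 7: final processing
        return processed;
    }
}</pre>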

       <p>UIMA supports UIM application development for this general type of processing
         through its <span class="bold"><strong>Collection Processing
         Architecture</strong></span>.</p>

       <p>As part of the collection processing architecture UIMA introduces two primary
         components in addition to the annotator and analysis engine. These are the <span class="bold"><strong>Collection Reader</strong></span> and the <span class="bold"><strong>CAS
         Consumer</strong></span>. The complete flow from source, through document analysis,
         and to CAS Consumers supported by UIMA is illustrated in <a class="xref" href="#ugr.ovv.conceptual.fig.cpe" title="Figure&nbsp;2.5.&nbsp;High-Level UIMA Component Architecture from Source to Sink">Figure&nbsp;2.5, &#8220;High-Level UIMA Component Architecture from Source to Sink&#8221;</a>.</p>

       <p>The Collection Reader's job is to connect to and iterate through a source
         collection, acquiring documents and initializing CASes for analysis. </p>


       <p>CAS Consumers, as the name suggests, function at the end of the flow. Their job is
         to do the final CAS processing. A CAS Consumer may be implemented, for example, to
         index CAS contents in a search engine, extract elements of interest and populate a
         relational database or serialize and store analysis results to disk for subsequent
         and further analysis. </p>

       <p>A UIMA <span class="bold"><strong>Collection Processing Engine</strong></span> (CPE)
         is an aggregate component that specifies a <span class="quote">&#8220;<span class="quote">source to sink</span>&#8221;</span> flow from a
         Collection Reader through a set of analysis engines and then to a set of CAS Consumers.
         </p>

       <p>CPEs are specified by XML files called CPE Descriptors. These are declarative
         specifications that point to their contained components (Collection Readers,
         analysis engines and CAS Consumers) and indicate a flow among them. The flow
         specification allows for filtering capabilities to, for example, skip over AEs
         based on CAS contents. Details about the format of CPE Descriptors may be found in
         <a href="references.html#d5e1" class="olink">UIMA References</a>
           <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter&nbsp;3, <i>Collection Processing Engine Descriptor Reference</i></a>.
         </p>

       <div class="figure"><a name="ugr.ovv.conceptual.fig.cpm"></a><div class="figure-contents">

         <div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="576"><tr><td align="center"><img src="images/overview-and-setup/conceptual_overview_files/image012.png" align="middle" width="576" alt="box and arrows picture of application using CPE factory to instantiate a Collection Processing Engine, and that engine interacting with the application."></td></tr></table></div>
       </div><p class="title"><b>Figure&nbsp;2.6.&nbsp;Collection Processing Manager in UIMA Framework</b></p></div><br class="figure-break">

       <p>The UIMA framework includes a <span class="bold"><strong>Collection Processing
         Manager</strong></span> (CPM). The CPM is capable of reading a CPE descriptor, and
         deploying and running the specified CPE. <a class="xref" href="#ugr.ovv.conceptual.fig.cpm" title="Figure&nbsp;2.6.&nbsp;Collection Processing Manager in UIMA Framework">Figure&nbsp;2.6, &#8220;Collection Processing Manager in UIMA Framework&#8221;</a> illustrates the role of the CPM
         in the UIMA Framework.</p>

       <p>Key features of the CPM are failure recovery, CAS management and scale-out.
         </p>

       <p>Collections may be large and take considerable time to analyze. A configurable
         behavior of the CPM is to log faults on single document failures while continuing to
         process the collection. This behavior is commonly used because analysis components
         often tend to be the weakest link -- in practice they may choke on strangely formatted
         content. </p>
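
       <p>The idea of logging a fault for a single document while continuing with the rest of the
         collection can be illustrated as follows. This is only a sketch of the behavior, not of
         how the CPM implements it; <code class="code">FragileAnalysis</code>,
         <code class="code">TolerantRunner</code> and <code class="code">RejectEmptyAnalysis</code> are
         hypothetical names, and the fault log is a simple in-memory buffer.</p>

       <pre class="programlisting">// Illustrative model only: hypothetical types, not the CPM implementation.

/** Hypothetical analysis component that may reject a document. */
interface FragileAnalysis {
    void process(String documentText);
}

/** Hypothetical component that chokes on empty content. */
final class RejectEmptyAnalysis implements FragileAnalysis {
    public void process(String documentText) {
        if (documentText.trim().isEmpty()) {
            throw new IllegalArgumentException("empty document");
        }
    }
}

final class TolerantRunner {
    /** Fault log with one line per failed document. */
    final StringBuilder faultLog = new StringBuilder();

    /** Processes every document; a failure is logged and the run continues. */
    int processAll(String[] documents, FragileAnalysis analysis) {
        int succeeded = 0;
        for (String document : documents) {
            try {
                analysis.process(document);
                succeeded++;
            } catch (RuntimeException failure) {
                // Record the fault and move on to the next document.
                faultLog.append("failed: ").append(failure.getMessage()).append('\n');
            }
        }
        return succeeded;
    }
}</pre>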

       <p>This deployment option requires that the CPM run in a separate process or a
         machine distinct from the CPE components. A CPE may be configured to run with a variety
         of deployment options that control the features provided by the CPM. For details see
         <a href="references.html#d5e1" class="olink">UIMA References</a>
           <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter&nbsp;3, <i>Collection Processing Engine Descriptor Reference</i></a>.</p>

       <p>The UIMA SDK also provides a tool called the CPE Configurator. This tool provides
         the developer with a user interface that simplifies the process of connecting up all
         the components in a CPE and running the result. For details on using the CPE
         Configurator see <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> <a href="tools.html#ugr.tools.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Configurator User's Guide</i></a>. This tool currently does not provide
         access to the full set of CPE deployment options supported by the CPM; however, you can
         configure other parts of the CPE descriptor by editing it directly. For details on how
         to create and run CPEs refer to <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Developer's Guide</i></a>.</p>

     </div>

   </div>

   <div class="section" title="2.6.&nbsp;Exploiting Analysis Results"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.exploiting_analysis_results">2.6.&nbsp;Exploiting Analysis Results</h2></div></div></div>


     <div class="note" title="Key UIMA Concepts Introduced in this Section:" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Key UIMA Concepts Introduced in this Section:</h3><p>Semantic Search, XML Fragment Queries.</p>
     </div>

     <div class="section" title="2.6.1.&nbsp;Semantic Search"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.semantic_search">2.6.1.&nbsp;Semantic Search</h3></div></div></div>


       <p>In a simple UIMA Collection Processing Engine (CPE), a Collection Reader reads
         documents from the file system and initializes CASes with their content. These are
         then fed to an AE that annotates tokens and sentences. The CASes, now enriched with
         token and sentence information, are passed to a CAS Consumer that populates a search
         engine index. </p>

       <p>The search engine query processor can then use the token index to provide basic
         keyword search. For example, given a query <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> the search
         engine would return all the documents that contain the word
         <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span>.</p>

       <p><span class="bold"><strong>Semantic Search</strong></span> is a search paradigm that
         can exploit the additional metadata generated by analytics like a UIMA CPE.</p>

       <p>Suppose we plug a named-entity recognizer into the CPE described
         above. Assume this analysis engine can detect mentions of persons and
         organizations in documents and annotate them in the CAS.</p>

       <p>To complement the named-entity recognizer, we add a CAS Consumer that extracts,
         in addition to the token and sentence annotations, the person and organization
         annotations added to the CASes by the named-entity recognizer. It then feeds these
         into the semantic search engine's index.</p>

       <p>A semantic search engine can exploit
         this additional information from the CAS to support more powerful queries. For
         example, imagine a user is looking for documents that mention an organization with
         <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> in its name but is not sure of the full or precise name of the
         organization. A keyword search on <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> would likely return far
         too many documents because <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> is a common and ambiguous term.
         A semantic search engine might support a query language called
         <span class="bold"><strong>XML Fragments</strong></span>. This query language is
         designed to exploit the CAS annotations entered in its index. The XML Fragment query,
         for example,


         </p><pre class="programlisting">&lt;organization&gt; center &lt;/organization&gt;</pre><p>
         will first produce only documents that contain <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> where it
         appears as part of a mention annotated as an organization by the named-entity
         recognizer. This will likely be a much shorter list of documents more precisely
         matching the user's interest.</p>
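
       <p>The difference between the keyword query and the organization-scoped query can be
         illustrated with a toy in-memory index. This sketch assumes a deliberately simplified
         model in which each document carries a list of typed mentions;
         <code class="code">IndexedMention</code>, <code class="code">IndexedDocument</code> and
         <code class="code">ToySearch</code> are hypothetical names and do not correspond to any real
         semantic search engine or to the XML Fragments implementation.</p>

       <pre class="programlisting">// Illustrative model only: a toy index, not a real semantic search engine.
import java.util.Locale;

/** Hypothetical annotation: a type name and the text it covers. */
final class IndexedMention {
    final String type;
    final String coveredText;

    IndexedMention(String type, String coveredText) {
        this.type = type;
        this.coveredText = coveredText;
    }
}

/** Hypothetical indexed document: its text and the mentions found in it. */
final class IndexedDocument {
    final String text;
    final IndexedMention[] mentions;

    IndexedDocument(String text, IndexedMention... mentions) {
        this.text = text;
        this.mentions = mentions;
    }
}

final class ToySearch {
    /** Keyword search: the term may appear anywhere in the document text. */
    static int keywordHits(IndexedDocument[] documents, String term) {
        int hits = 0;
        for (IndexedDocument document : documents) {
            if (containsIgnoreCase(document.text, term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Scoped search: the term must lie inside a mention of the given type. */
    static int mentionHits(IndexedDocument[] documents, String type, String term) {
        int hits = 0;
        for (IndexedDocument document : documents) {
            if (hasMention(document, type, term)) {
                hits++;
            }
        }
        return hits;
    }

    private static boolean hasMention(IndexedDocument document, String type, String term) {
        for (IndexedMention mention : document.mentions) {
            if (mention.type.equals(type)) {
                if (containsIgnoreCase(mention.coveredText, term)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }
}</pre>

       <p>With two documents that both contain the word but only one of which has it inside an
         organization mention, <code class="code">keywordHits</code> counts both while
         <code class="code">mentionHits</code> counts only the one with the annotated mention.</p>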

       <p>Consider taking this one step further. We add a relationship recognizer that
         annotates mentions of the CEO-of relationship. We configure the CAS Consumer so that
         it sends these new relationship annotations to the semantic search index as well.
         With these additional analysis results in the index we can submit queries like


         </p><pre class="programlisting">&lt;ceo_of&gt;
     &lt;person&gt; center &lt;/person&gt;
     &lt;organization&gt; center &lt;/organization&gt;
&lt;/ceo_of&gt;</pre><p>
         This query will precisely target documents that contain a mention of an organization
         with <span class="quote">&#8220;<span class="quote">center</span>&#8221;</span> as part of its name where that organization is mentioned
         as part of a
         <code class="code">CEO-of</code> relationship annotated by the relationship
         recognizer.</p>

       <p>For more details about using UIMA and Semantic Search see the section on
         integrating text analysis and search in <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application" class="olink">Chapter&nbsp;3, <i>Application Developer's Guide</i></a>.</p>
     </div>

     <div class="section" title="2.6.2.&nbsp;Databases"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.conceptual.databases">2.6.2.&nbsp;Databases</h3></div></div></div>


       <p>Search engine indices are not the only place to deposit analysis results for use
         by applications. Another classic example is populating databases. While many
         approaches are possible with varying degrees of flexibility and performance, all are
         highly dependent on application specifics. We include a simple sample CAS Consumer
         that provides the basics for getting your analysis results into a relational
         database. It extracts annotations from a CAS and writes them to a relational
         database, using the open source Apache Derby database.</p>
     </div>
   </div>

   <div class="section" title="2.7.&nbsp;Multimodal Processing in UIMA"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.multimodal_processing">2.7.&nbsp;Multimodal Processing in UIMA</h2></div></div></div>

     <p>In previous sections we've seen how the CAS is initialized with an initial
       artifact that will be subsequently analyzed by Analysis engines and CAS Consumers. The
       first Analysis engine may make some assertions about the artifact, for example, in the
       form of annotations. Subsequent Analysis engines will make further assertions about
       both the artifact and previous analysis results, and finally one or more CAS Consumers
       will extract information from these CASs for structured information storage.</p>
     <div class="figure"><a name="ugr.ovv.conceptual.fig.multiple_sofas"></a><div class="figure-contents">

       <div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="576"><tr><td align="center"><img src="images/overview-and-setup/conceptual_overview_files/image014.png" align="middle" width="576" alt="Picture showing audio on the left broken into segments by a segmentation component, then sent to multiple analysis pipelines in parallel, some processing the raw audio, others processing the recognized speech as text."></td></tr></table></div>
     </div><p class="title"><b>Figure&nbsp;2.7.&nbsp;Multiple Sofas in support of multi-modal analysis of an audio Stream. Some
         engines work on the audio <span class="quote">&#8220;<span class="quote">view</span>&#8221;</span>, some on the text
         <span class="quote">&#8220;<span class="quote">view</span>&#8221;</span> and some on both.</b></p></div><br class="figure-break">
     <p>Consider a processing pipeline, illustrated in <a class="xref" href="#ugr.ovv.conceptual.fig.multiple_sofas" title="Figure&nbsp;2.7.&nbsp;Multiple Sofas in support of multi-modal analysis of an audio Stream. Some engines work on the audio &#8220;view&#8221;, some on the text &#8220;view&#8221; and some on both.">Figure&nbsp;2.7, &#8220;Multiple Sofas in support of multi-modal analysis of an audio Stream. Some
         engines work on the audio <span class="quote">&#8220;<span class="quote">view</span>&#8221;</span>, some on the text
         <span class="quote">&#8220;<span class="quote">view</span>&#8221;</span> and some on both.&#8221;</a>, that starts with an
       audio recording of a conversation, transcribes the audio into text, and then extracts
       information from the text transcript. Analysis Engines at the start of the pipeline are
       analyzing an audio subject of analysis, and later analysis engines are analyzing a text
       subject of analysis. The CAS Consumer will likely want to build a search index from
       concepts found in the text to the original audio segment covered by the concept.</p>

     <p>What becomes clear from this relatively simple scenario is that the CAS must be
       capable of simultaneously holding multiple subjects of analysis. Some analysis
       engines will analyze only one subject of analysis, some will analyze one and create
       another, and some will need to access multiple subjects of analysis at the same time.
       </p>

     <p>The support in UIMA for multiple subjects of analysis is called <span class="bold"><strong>Sofa</strong></span> support. Sofa is an acronym derived from
         <span class="underline">S</span>ubject <span class="underline">of</span> <span class="underline">A</span>nalysis. A Sofa is a physical
       representation of an artifact (e.g., the detagged text of a web page, the HTML
       text of the same web page, the audio segment of a video, or the closed-caption text
       of the same audio segment). A Sofa may
       be associated with CAS Views. A particular CAS will have one or more views, each view
       corresponding to a particular subject of analysis, together with a set of the defined
       indexes that index the metadata (that is, Feature Structures) created in that view.</p>

     <p>Analysis results can be indexed in, or <span class="quote">&#8220;<span class="quote">belong</span>&#8221;</span> to, a specific view.
       UIMA components may be written in <span class="quote">&#8220;<span class="quote">Multi-View</span>&#8221;</span> mode, able to create and
       access multiple Sofas at the same time, or in <span class="quote">&#8220;<span class="quote">Single-View</span>&#8221;</span> mode, simply
       receiving a particular view of the CAS corresponding to a particular single Sofa. For
       single-view mode components, it is up to the person assembling the component to supply
       the needed information to ensure a particular view is passed to the component at run
       time. This is done using XML descriptors for Sofa mapping (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.mvs.sofa_name_mapping" class="olink">Section&nbsp;6.4, &#8220;Sofa Name Mapping&#8221;</a>).</p>

     <p>Multi-View capability brings benefits to text-only processing as well. An input
       document can be transformed from one format to another. Examples of this include
       transforming text from HTML to plain text or from one natural language to another.
       </p>
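
     <p>The relationship between a CAS, its views and multi-view versus single-view components
       can be sketched with a small model. Everything here is an assumption made for
       illustration: <code class="code">NamedView</code>, <code class="code">MultiViewCas</code>,
       <code class="code">FakeTranscriber</code> and <code class="code">ViewWordCounter</code> are hypothetical
       types rather than the UIMA CAS API, the view names are arbitrary, and the
       <span class="quote">&#8220;<span class="quote">speech recognition</span>&#8221;</span> step merely decodes bytes as text.</p>

     <pre class="programlisting">// Illustrative model only: hypothetical types, not the UIMA CAS API.
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical view: a named subject of analysis (Sofa). */
final class NamedView {
    final String name;
    final Object sofaData;

    NamedView(String name, Object sofaData) {
        this.name = name;
        this.sofaData = sofaData;
    }
}

/** Hypothetical CAS that holds several views at once. */
final class MultiViewCas {
    private NamedView[] views = new NamedView[0];

    NamedView createView(String name, Object sofaData) {
        if (findView(name) != null) {
            throw new IllegalArgumentException("view already exists: " + name);
        }
        NamedView view = new NamedView(name, sofaData);
        views = Arrays.copyOf(views, views.length + 1);
        views[views.length - 1] = view;
        return view;
    }

    NamedView getView(String name) {
        NamedView view = findView(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    private NamedView findView(String name) {
        for (NamedView view : views) {
            if (view.name.equals(name)) {
                return view;
            }
        }
        return null;
    }
}

/** Multi-view component: reads the audio Sofa and creates a text Sofa. */
final class FakeTranscriber {
    static void process(MultiViewCas cas) {
        byte[] audio = (byte[]) cas.getView("audio").sofaData;
        // Stand-in for speech recognition: the bytes are simply decoded as text.
        String transcript = new String(audio, StandardCharsets.UTF_8);
        cas.createView("transcript", transcript);
    }
}

/** Single-view component: it sees only the one view it is handed. */
final class ViewWordCounter {
    static int process(NamedView textView) {
        String text = ((String) textView.sofaData).trim();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }
}</pre>

     <p>In this sketch whoever assembles the pipeline decides which view the single-view
       component receives, for example by passing it the result of
       <code class="code">getView("transcript")</code>.</p>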
   </div>

   <div class="section" title="2.8.&nbsp;Next Steps"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.conceptual.next_steps">2.8.&nbsp;Next Steps</h2></div></div></div>


     <p>This chapter presented a high-level overview of UIMA concepts. Along the way, it
       pointed to other documents in the UIMA SDK documentation set where the reader can find
       details on how to apply the related concepts in building applications with the UIMA
       SDK.</p>

     <p>At this point the reader may return to the documentation guide in <a href="overview_and_setup.html#ugr.project_overview_doc_use" class="olink">Section&nbsp;1.2, &#8220;How to use the Documentation&#8221;</a>
       to learn how they might proceed in getting started using UIMA.</p>

     <p>For a more detailed overview of the UIMA architecture, framework and development
       roles we refer the reader to the following paper:</p>

     <p>D. Ferrucci and A. Lally, <span class="quote">&#8220;<span class="quote">Building an example application using the
       Unstructured Information Management Architecture,</span>&#8221;</span> <span class="emphasis"><em>IBM Systems
       Journal</em></span> <span class="bold"><strong>43</strong></span>, No. 3, 455-475 (2004).
       </p>

     <p>This paper can be found online at <a class="ulink" href="http://www.research.ibm.com/journal/sj43-3.html" target="_top">http://www.research.ibm.com/journal/sj43-3.html</a></p>
   </div>

 <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e551" href="#d5e551" class="para">1</a>] </sup> We have plans to
         extend the representational capabilities of the CAS and align its semantics with the
         semantics of the OMG's Essential Meta-Object Facility (EMOF) and with the
         semantics of the Eclipse Modeling Framework's ( <a class="ulink" href="http://www.eclipse.org/emf/" target="_top">http://www.eclipse.org/emf/</a>) Ecore semantics and XMI-based
         representation.</p> </div></div></div>
   <div class="chapter" title="Chapter&nbsp;3.&nbsp;Setting up the Eclipse IDE to work with UIMA" id="ugr.ovv.eclipse_setup"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;3.&nbsp;Setting up the Eclipse IDE to work with UIMA</h2></div></div></div>


   <p>This chapter describes how to set up the UIMA SDK to work with Eclipse. Eclipse (<a class="ulink" href="http://www.eclipse.org" target="_top">http://www.eclipse.org</a>) is a popular open-source Integrated Development
     Environment for many things, including Java. The UIMA SDK does not require that you use
     Eclipse. However, we recommend that you do use Eclipse because some useful UIMA SDK tools
     run as plug-ins to the Eclipse platform and because the UIMA SDK examples are provided in a
     form that's easy to import into your Eclipse environment.</p>

   <p>If you are not planning on using the UIMA SDK with Eclipse, you may skip this chapter and
     read <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a>
     <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter&nbsp;1, <i>Annotator and Analysis Engine Developer's Guide</i></a>
     next.</p>

   <p>This chapter provides instructions for

     </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>installing Eclipse, </p>
       </li><li class="listitem"><p>installing the UIMA SDK's Eclipse plugins into your Eclipse
         environment, and </p></li><li class="listitem"><p>importing the example UIMA code into an Eclipse project. </p>
         </li></ul></div>

   <p>The UIMA Eclipse plugins are designed to be used with Eclipse version 3.1 or
     later.
   </p>

   <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>You will need to run Eclipse with Java at the 1.8 level in order
   to use the UIMA Eclipse plugins.</p></div>

   <div class="section" title="3.1.&nbsp;Installation"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.eclipse_setup.installation">3.1.&nbsp;Installation</h2></div></div></div>

     <div class="section" title="3.1.1.&nbsp;Install Eclipse"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.eclipse_setup.install_eclipse">3.1.1.&nbsp;Install Eclipse</h3></div></div></div>


       <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Go to <a class="ulink" href="http://www.eclipse.org" target="_top">http://www.eclipse.org</a> and follow the instructions there to download Eclipse.
         </p></li><li class="listitem"><p>We recommend using the latest release level.
           Navigate to the Eclipse Release version you
           want and download the archive for your platform.</p></li><li class="listitem"><p>Unzip the archive to install Eclipse somewhere, e.g., c:\</p>
           </li><li class="listitem"><p>Eclipse has a bit of a learning curve. If you plan to make
           significant use of Eclipse, check out the tutorial under the help menu. It is well
           worth the effort. There are also books you can get that describe Eclipse and its
         use.</p></li></ul></div>

       <p>The first time Eclipse starts up it will take a bit longer as it completes its
         installation. A <span class="quote">&#8220;<span class="quote">welcome</span>&#8221;</span> page will come up. After you are through
         reading the welcome information, click on the arrow to exit the welcome page and get to
         the main Eclipse screens.</p>
     </div>

     <div class="section" title="3.1.2.&nbsp;Installing the UIMA Eclipse Plugins"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.eclipse_setup.install_uima_eclipse_plugins">3.1.2.&nbsp;Installing the UIMA Eclipse Plugins</h3></div></div></div>


       <p>The best way to do this is to use the Eclipse Install New Software mechanism, because that will
         ensure that all needed prerequisites are also installed.  See below for an alternative,
         manual approach.</p>

         <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If your computer is on an internet connection which uses a proxy server, you can
         configure Eclipse to know about that. Put your proxy settings into Eclipse using the
         Eclipse preferences by accessing the menus: Window <span class="symbol">&#8594;</span> Preferences... <span class="symbol">&#8594;</span>
         Install/Update, and Enable HTTP proxy connection under the Proxy Settings with the
         information about your proxy. </p></div>


       <p>To use the Eclipse Install New Software mechanism, start Eclipse, and then pick the menu
         <span class="command"><strong>Help <span class="symbol">&#8594;</span> Install new software...</strong></span>.  In the next page, enter
         the following URL in the "Work with" box and press enter:
         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p></p><code class="code">https://www.apache.org/dist/uima/eclipse-update-site/</code> or</li><li class="listitem"><p></p><code class="code">https://www.apache.org/dist/uima/eclipse-update-site-uv3/</code>.</li></ul></div><p>
         Choose the second if you are working with the core UIMA Java SDK at version 3 or later.</p>

       <p>Now select the plugin tools you wish to install, and click Next, and follow the
         remaining panels to install the UIMA plugins.  </p>
     </div>


     <div class="section" title="3.1.3.&nbsp;Install the UIMA SDK"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.eclipse_setup.install_uima_sdk">3.1.3.&nbsp;Install the UIMA SDK</h3></div></div></div>

       <p>If you haven't already done so, please download and install the UIMA SDK from
           <a class="ulink" href="http://incubator.apache.org/uima" target="_top">http://incubator.apache.org/uima</a>.  Be sure to set the environment variable
           UIMA_HOME to point to the root of the installed UIMA SDK and run the
           <code class="literal">adjustExamplePaths.bat</code> or <code class="literal">adjustExamplePaths.sh</code>
           script, as explained in the README.</p>

       <p>The environment variable UIMA_HOME is used by the command-line scripts in the
           %UIMA_HOME%/bin directory as well as by Eclipse run configurations in the uimaj-examples
           sample project.</p>

     </div>

     <div class="section" title="3.1.4.&nbsp;Installing the UIMA Eclipse Plugins, manually"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.eclipse_setup.install_uima_eclipse_plugins_manually">3.1.4.&nbsp;Installing the UIMA Eclipse Plugins, manually</h3></div></div></div>


       <p>If you installed the UIMA plugins using the update mechanism above, please skip this section.</p>

       <p>If you are unable to use the Eclipse Update mechanism to install the UIMA plugins, you
         can do this manually.  In the directory %UIMA_HOME%/eclipsePlugins (The environment variable
         %UIMA_HOME% is where you installed the UIMA SDK), you will see a set of folders. Copy these
         to your %ECLIPSE_HOME%/dropins directory (%ECLIPSE_HOME% is where you
         installed Eclipse).</p>

     </div>

     <div class="section" title="3.1.5.&nbsp;Start Eclipse"><div class="titlepage"><div><div><h3 class="title" id="ugr.ovv.eclipse_setup.start_eclipse">3.1.5.&nbsp;Start Eclipse</h3></div></div></div>

       <p>If you have Eclipse running, restart it (shut it down, and start it again) using
         the
         <code class="code">-clean</code> option; you can do this by running the command
         <span class="command"><strong>eclipse -clean</strong></span> (see explanation in the next section) in the
         directory where you installed Eclipse. You may want to set up a desktop shortcut at
         this point for Eclipse.</p>

       <div class="section" title="3.1.5.1.&nbsp;Special startup parameter for Eclipse: -clean"><div class="titlepage"><div><div><h4 class="title" id="ugr.ovv.eclipse_setup.special_startup_parameter_clean">3.1.5.1.&nbsp;Special startup parameter for Eclipse: -clean</h4></div></div></div>

         <p>If you have modified the plugin structure (by copying or deleting files directly in the
           file system) after you started Eclipse for the first time, please include
           the <span class="quote">&#8220;<span class="quote">-clean</span>&#8221;</span> parameter in the startup arguments to Eclipse,
           <span class="emphasis"><em>one time</em></span> (after any plugin modifications were done). This
           is needed because Eclipse may otherwise not notice the changes you made. This
           parameter forces Eclipse to reexamine all of its plugins at startup and recompute
           any cached information about them.</p>
       </div>

     </div>
   </div>
   <div class="section" title="3.2.&nbsp;Setting up Eclipse to view Example Code"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.eclipse_setup.example_code">3.2.&nbsp;Setting up Eclipse to view Example Code</h2></div></div></div>

     <p>Later chapters refer to example code. Here's how to create a special project in Eclipse to
       hold the examples.</p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>In Eclipse, if the Java
       perspective is not already open, switch to it by going to Window <span class="symbol">&#8594;</span> Open Perspective
       <span class="symbol">&#8594;</span> Java.</p></li><li class="listitem"><p>Set up a class path variable named UIMA_HOME, whose value is the
         directory where you installed the UIMA SDK. This is done as follows:

         </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>Go to Window <span class="symbol">&#8594;</span> Preferences <span class="symbol">&#8594;</span> Java
           <span class="symbol">&#8594;</span> Build Path <span class="symbol">&#8594;</span> Classpath Variables.</p></li><li class="listitem"><p>Click <span class="quote">&#8220;<span class="quote">New</span>&#8221;</span></p></li><li class="listitem"><p>Enter UIMA_HOME (all capitals, exactly as written) in the
             <span class="quote">&#8220;<span class="quote">Name</span>&#8221;</span> field.</p></li><li class="listitem"><p>Enter your installation directory (e.g. <code class="literal">C:/Program
             Files/apache-uima</code>) in the <span class="quote">&#8220;<span class="quote">Path</span>&#8221;</span> field</p>
             </li><li class="listitem"><p>Click <span class="quote">&#8220;<span class="quote">OK</span>&#8221;</span> in the <span class="quote">&#8220;<span class="quote">New Variable
             Entry</span>&#8221;</span> dialog</p></li><li class="listitem"><p>Click <span class="quote">&#8220;<span class="quote">OK</span>&#8221;</span> in the <span class="quote">&#8220;<span class="quote">Preferences</span>&#8221;</span>
             dialog</p></li><li class="listitem"><p>If it asks you if you want to do a full build, click
             <span class="quote">&#8220;<span class="quote">Yes</span>&#8221;</span> </p></li></ul></div>
         </li><li class="listitem"><p>Select the File <span class="symbol">&#8594;</span> Import menu option</p></li><li class="listitem"><p>Select <span class="quote">&#8220;<span class="quote">General/Existing Project into Workspace</span>&#8221;</span> and click
         the <span class="quote">&#8220;<span class="quote">Next</span>&#8221;</span> button.</p></li><li class="listitem"><p>Click <span class="quote">&#8220;<span class="quote">Browse</span>&#8221;</span> and browse to the
         %UIMA_HOME%/examples directory</p></li><li class="listitem"><p>Click <span class="quote">&#8220;<span class="quote">Finish.</span>&#8221;</span> This will create a new project called
         <span class="quote">&#8220;<span class="quote">uimaj-examples</span>&#8221;</span> in your Eclipse workspace. There should be no
         compilation errors. </p></li></ul></div>

     <p>To verify that you have set up the project correctly, check that there are no error
       messages in the <span class="quote">&#8220;<span class="quote">Problems</span>&#8221;</span> view.</p>

   </div>

   <div class="section" title="3.3.&nbsp;Adding the UIMA source code to the jar files"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.eclipse_setup.adding_source">3.3.&nbsp;Adding the UIMA source code to the jar files</h2></div></div></div>


     <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you are running a current version of Eclipse, and have the m2e (Maven extensions for Eclipse)
     plugin installed, Eclipse should be able to automatically download the source for the jars, so you may not need
     to do anything special (it does take a few seconds, and you need an internet connection).</p></div>
     <p>Otherwise, if you would like to be able to jump to the UIMA source code in Eclipse or to step
     through it with the debugger, you can add the UIMA source code directly to the jar files.  This is
     done via a shell script that comes with the source distribution.  To add the source code
     to the jars, you need to:
     </p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
     <p>
     Download and unpack the UIMA source distribution.
     </p>
     </li><li class="listitem">
     <p>
     Download and install the UIMA binary distribution (the UIMA_HOME environment variable needs
     to be set to point to where you installed the UIMA binary distribution).
     </p>
     </li><li class="listitem">
       <p>"cd" to the root directory of the source distribution</p>
     </li><li class="listitem">
     <p>
     Execute the <span class="command"><strong>src\main\readme_src\addSourceToJars</strong></span> script in the root directory of the
     source distribution.
     </p>
     </li></ul></div>

     <p>
     This adds the source code to the jar files, and it will then be automatically available
     from Eclipse.  There is no further Eclipse setup required.
     </p>

   </div>


   <div class="section" title="3.4.&nbsp;Attaching UIMA Javadocs"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.eclipse_setup.linking_uima_javadocs">3.4.&nbsp;Attaching UIMA Javadocs</h2></div></div></div>


      <p>The binary distribution also includes the UIMA Javadocs.  They are
        attached to the UIMA library Jar files in the uimaj-examples project described
        above.  You can attach the Javadocs to your own project as well.
      </p>

      <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you attached the source as described in the previous section, you
      don't need to attach the Javadocs because the source includes the Javadoc comments.</p></div>

      <p>Attaching the Javadocs enables Javadoc help for UIMA APIs.  After they are
        attached, if you hover your mouse
      over a UIMA API element, the corresponding Javadoc will appear.
        You can then press <span class="quote">&#8220;<span class="quote">F2</span>&#8221;</span> to make the hover "stick", or
        <span class="quote">&#8220;<span class="quote">Shift-F2</span>&#8221;</span> to open the default
        web-browser on your system to let you browse the entire Javadoc information
        for that element.
      </p>
      <p>If this pop-up behavior is something you don't want, you can turn it off
      in the Eclipse preferences, in the menu Window <span class="symbol">&#8594;</span> Preferences <span class="symbol">&#8594;</span>
        Java <span class="symbol">&#8594;</span> Editors <span class="symbol">&#8594;</span> hovers.
      </p>

      <p>Eclipse also has a Javadoc "view", which you can show using Window <span class="symbol">&#8594;</span>
      Show View <span class="symbol">&#8594;</span> Javadoc.</p>

      <p>See <a href="references.html#d5e1" class="olink">UIMA References</a>
      <a href="references.html#ugr.ref.javadocs.libraries" class="olink">Section&nbsp;1.1, &#8220;Using named Eclipse User Libraries&#8221;</a>
      for information on how to set up a UIMA "library" with the Javadocs attached, which
      can be reused for other projects in your Eclipse workspace.</p>

      <p>You can attach the Javadocs to each UIMA library jar you think you might be
        interested in.  It makes most sense
        for uima-core.jar, since you'll probably use the core APIs most of all.
      </p>

      <p>Here's a screenshot of what you should see when you hover your mouse pointer over the
      class name <span class="quote">&#8220;<span class="quote">CAS</span>&#8221;</span> in the source code.
      </p>

        <div class="informalfigure">
          <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/overview-and-setup/eclipse_setup_files/image004.jpg" width="564" alt="Screenshot of mouse-over for UIMA APIs"></td></tr></table></div>
        </div>

    </div>

   <div class="section" title="3.5.&nbsp;Running external tools from Eclipse"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.ovv.eclipse_setup.running_external_tools_from_eclipse">3.5.&nbsp;Running external tools from Eclipse</h2></div></div></div>


     <p>You can run many tools without using Eclipse at all, by using the shell scripts in the
       UIMA SDK's bin directory. In addition, many tools can be run from inside Eclipse;
       examples are the Document Analyzer, CPE Configurator, CAS Visual Debugger,
       and JCasGen. The uimaj-examples project provides Eclipse launch
       configurations that make this easy to do.</p>

     <p>To run these tools from Eclipse:</p>

     <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>If the Java perspective is not
       already open, switch to it by going to Window <span class="symbol">&#8594;</span> Open Perspective <span class="symbol">&#8594;</span>
       Java.</p></li><li class="listitem"><p>Go to Run <span class="symbol">&#8594;</span> Run... </p></li><li class="listitem"><p>In the window that appears, select <span class="quote">&#8220;<span class="quote">UIMA CPE GUI</span>&#8221;</span>,
         <span class="quote">&#8220;<span class="quote">UIMA CAS Visual Debugger</span>&#8221;</span>, <span class="quote">&#8220;<span class="quote">UIMA JCasGen</span>&#8221;</span>, or
         <span class="quote">&#8220;<span class="quote">UIMA Document Analyzer</span>&#8221;</span>
         from the list of run configurations on the left. (If you don't see these, please
         select the uimaj-examples project and do a Menu <span class="symbol">&#8594;</span> File
         <span class="symbol">&#8594;</span> Refresh).</p></li><li class="listitem"><p>Press the <span class="quote">&#8220;<span class="quote">Run</span>&#8221;</span> button. The tools should start. Close
         the tools by clicking the <span class="quote">&#8220;<span class="quote">X</span>&#8221;</span> in the upper right corner on the GUI.
         </p></li></ul></div>

     <p>For instructions on using the Document Analyzer and CPE Configurator,
       in the <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> book see <a href="tools.html#ugr.tools.doc_analyzer" class="olink">Chapter&nbsp;3, <i>Document Analyzer User's Guide</i></a>, and
         <a href="tools.html#ugr.tools.cpe" class="olink">Chapter&nbsp;2, <i>Collection Processing Engine Configurator User's Guide</i></a>. For
       instructions on using the CAS Visual Debugger and JCasGen, see <a href="tools.html#ugr.tools.cvd" class="olink">Chapter&nbsp;5, <i>CAS Visual Debugger</i></a> and
         <a href="tools.html#ugr.tools.jcasgen" class="olink">Chapter&nbsp;8, <i>JCasGen User's Guide</i></a>.</p>

   </div>

 </div>
   <div class="chapter" title="Chapter&nbsp;4.&nbsp;UIMA Frequently Asked Questions (FAQ's)" id="ugr.faqs"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;4.&nbsp;UIMA Frequently Asked Questions (FAQ's)</h2></div></div></div>


   <div class="variablelist"><dl><dt><a name="ugr.faqs.what_is_uima"></a><span class="term"><span class="bold"><strong>What is UIMA?</strong></span></span></dt><dd><p>UIMA stands for Unstructured Information Management
           Architecture. It is a component software architecture for the development,
           discovery, composition and deployment of multi-modal analytics for the analysis
           of unstructured information.</p>
           <p>UIMA processing occurs through a series of modules called
             <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a>. The result of analysis is an assignment of semantics to the elements of
             unstructured data, for example, the indication that the phrase
             <span class="quote">&#8220;<span class="quote">Washington</span>&#8221;</span> refers to a person's name or that it refers to a
             place.</p>

           <p>An analysis engine's output can be saved in conventional structures,
             for example, relational databases or search engine indices, where the content
             of the original unstructured information may be efficiently accessed
             according to its inferred semantics. </p>

           <p>UIMA supports developers in creating,
             integrating, and deploying components across platforms and among dispersed
             teams working to develop unstructured information management
             applications.</p>
         </dd><dt><a name="ugr.faqs.pronounce"></a><span class="term"><span class="bold"><strong>How do you pronounce UIMA?</strong></span></span></dt><dd><p>You &#8211; eee &#8211; muh.
         </p></dd><dt><a name="ugr.faqs.difference_apache_uima"></a><span class="term"><span class="bold"><strong>What's the difference between UIMA and Apache UIMA?</strong></span></span></dt><dd><p>UIMA is an architecture which specifies component interfaces,
           design patterns, data representations and development roles.</p>

           <p>Apache UIMA is an open source, Apache-licensed software project.  It includes run-time
             frameworks in Java and C++, APIs and tools for implementing, composing, packaging
             and deploying UIMA components.</p>

           <p>The UIMA run-time framework allows developers to plug in their components
             and applications and run them on different platforms and according to different
             deployment options that range from tightly-coupled (running in the same
             process space) to loosely-coupled (distributed across different processes or
             machines for greater scale, flexibility and recoverability).</p>

           <p>The UIMA project has several significant subprojects, including UIMA-AS (for flexibly
           scaling out UIMA pipelines over clusters of machines), uimaFIT (for a Java-centric way of using UIMA
           without XML descriptors; it also provides many convenience methods), UIMA-DUCC (for managing clusters of
           machines running scaled-out UIMA "jobs" in a "fair" way), RUTA (Eclipse-based tooling and
           a runtime framework for development of rule-based
           Annotators), and Addons (where you can find many extensions).</p>
         </dd><dt><a name="ugr.faqs.what_is_an_annotation"></a><span class="term"><span class="bold"><strong>What is an Annotation?</strong></span></span></dt><dd><p>An annotation is metadata that is associated with a region of a
           document. It is often a label, typically represented as a string of characters. The
           region may be the whole document. </p>

           <p>An example is the label <span class="quote">&#8220;<span class="quote">Person</span>&#8221;</span> associated with the span of
             text <span class="quote">&#8220;<span class="quote">George Washington</span>&#8221;</span>. We say that <span class="quote">&#8220;<span class="quote">Person</span>&#8221;</span>
             annotates <span class="quote">&#8220;<span class="quote">George Washington</span>&#8221;</span> in the sentence <span class="quote">&#8220;<span class="quote">George
             Washington was the first president of the United States</span>&#8221;</span>. The
             association of the label
             <span class="quote">&#8220;<span class="quote">Person</span>&#8221;</span> with a particular span of text is an annotation. Another
             example may have an annotation represent a topic, like <span class="quote">&#8220;<span class="quote">American
             Presidents</span>&#8221;</span> and be used to label an entire document.</p>

           <p>Annotations are not limited to regions of texts. An annotation may annotate
             a region of an image or a segment of audio. The same concepts apply.</p>
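           <p>The idea of a label attached to a span of text can be illustrated with a minimal Java
             sketch. This is not UIMA's actual API: the <code class="code">Span</code> class and its fields are
             hypothetical stand-ins for an annotation, and the sketch assumes the region is given as
             character offsets into the document text.</p>

```java
public class AnnotationSketch {

    // Hypothetical stand-in for an annotation (not a UIMA class):
    // a label attached to a region given by begin/end character offsets.
    public static final class Span {
        public final String label;
        public final int begin; // offset of the first character in the region
        public final int end;   // offset just past the last character

        public Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        // Returns the text of the region this span annotates.
        public String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    public static void main(String[] args) {
        String document = "George Washington was the first president of the United States";
        String name = "George Washington";
        int begin = document.indexOf(name);
        Span person = new Span("Person", begin, begin + name.length());
        // Prints: Person annotates "George Washington"
        System.out.println(person.label + " annotates \"" + person.coveredText(document) + "\"");
    }
}
```

           <p>In the UIMA framework itself, annotations are stored in the CAS and their types are
             defined by the type system, so real code would use the framework's interfaces rather than
             a hand-written class like this one.</p>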
         </dd><dt><a name="ugr.faqs.what_is_the_cas"></a><span class="term"><span class="bold"><strong>What is the CAS?</strong></span></span></dt><dd><p>The CAS stands for Common Analysis Structure. It provides
           cooperating UIMA components with a common representation and mechanism for
           shared access to the artifact being analyzed (e.g., a document, audio file, video
           stream etc.) and the current analysis results.</p></dd><dt><a name="ugr.faqs.what_does_the_cas_contain"></a><span class="term"><span class="bold"><strong>What does the CAS contain?</strong></span></span></dt><dd><p>The CAS is a data structure for which UIMA provides multiple
           interfaces. It contains and provides the analysis algorithm or application
           developer with access to</p>

           <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>the subject of analysis (the artifact being analyzed, like
               the document),</p></li><li class="listitem"><p>the analysis results or metadata (e.g., annotations, parse
               trees, relations, entities etc.),</p></li><li class="listitem"><p>indices to the analysis results, and</p></li><li class="listitem"><p>the type system (a schema for the analysis results).</p>
             </li></ul></div>

           <p>A CAS can hold multiple versions of the artifact being analyzed (for
             instance, a raw html document, and a detagged version, or an English version and a
             corresponding German version, or an audio sample, and the text that
             corresponds, etc.). For each version there is a separate instance of the results
             indices.</p></dd><dt><a name="ugr.faqs.only_annotations"></a><span class="term"><span class="bold"><strong>Does the CAS only contain Annotations?</strong></span></span></dt><dd><p>No. The CAS contains the artifact being analyzed plus the analysis
           results. Analysis results are those metadata recorded by <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a> in the
           CAS. The most common form of analysis result is the addition of an annotation. But an
           analysis engine may write any structure that conforms to the CAS's type
           system into the CAS. These may not be annotations but may be other things, for
           example links between annotations and properties of objects associated with
           annotations.</p>
           <p>The CAS may have multiple representations of the artifact being analyzed, each one
             represented in the CAS as a particular Subject of Analysis, or <a class="link" href="#ugr.faqs.what_is_a_sofa">Sofa</a>.</p></dd><dt><a name="ugr.faqs.just_xml"></a><span class="term"><span class="bold"><strong>Is the CAS just XML?</strong></span></span></dt><dd><p>No, in fact there are many possible representations of the CAS. If all
           of the <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a> are running in the same process, an efficient, in-memory
           data object is used. If a CAS must be sent to an analysis engine on a remote machine, it
           can be done via an XML or a binary serialization of the CAS. </p>

           <p>The UIMA framework provides multiple serialization and de-serialization methods
             in various formats, including XML.  See the Javadocs for the CasIOUtils class.
             </p></dd><dt><a name="ugr.faqs.what_is_a_type_system"></a><span class="term"><span class="bold"><strong>What is a Type System?</strong></span></span></dt><dd><p>Think of a type system as a schema or class model for the <a class="link" href="#ugr.faqs.what_is_the_cas">CAS</a>. It defines
           the types of objects and their properties (or features) that may be instantiated in
           a CAS. A specific CAS conforms to a particular type system. UIMA components declare
           their input and output with respect to a type system. </p>

           <p>Type Systems include the definitions of types, their properties, range
             types (these can restrict the value of properties to other types) and
             a single-inheritance hierarchy of types.</p></dd><dt><a name="ugr.faqs.what_is_a_sofa"></a><span class="term"><span class="bold"><strong>What is a Sofa?</strong></span></span></dt><dd><p>Sofa stands for &#8220;Subject of Analysis&#8221;. A <a class="link" href="#ugr.faqs.what_is_the_cas">CAS</a> is
           associated with a single artifact being analyzed by a collection of UIMA analysis
           engines. But a single artifact may have multiple independent views, each of which
           may be analyzed separately by a different set of <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a>. For example,
           a document may have different translations, each of which is associated
           with the original document but potentially analyzed by different engines. A
           CAS may have multiple Views, each containing a different Subject of Analysis
           corresponding to some version of the original artifact. This feature is ideal for
           multi-modal analysis, where, for example, one view of a video stream may be the video
           frames and the other the closed captions.</p></dd><dt><a name="ugr.faqs.annotator_versus_ae"></a><span class="term"><span class="bold"><strong>What's the difference between an Annotator and an Analysis
           Engine?</strong></span></span></dt><dd><p>In the terminology of UIMA, an annotator is simply some code that
           analyzes documents and outputs <a class="link" href="#ugr.faqs.what_is_an_annotation">annotations</a> on the content of the documents. The
           UIMA framework takes the annotator, together with metadata describing such
           things as the input requirements and output types of the annotator, and produces
           an analysis engine. </p>

           <p>Analysis Engines contain the framework-provided infrastructure that
             allows them to be easily combined with other analysis engines in different flows
             and according to different deployment options (collocated or as web services,
             for example). </p>

           <p>Analysis Engines are the framework-generated objects that an Application
             interacts with. An Annotator is a user-written class that implements one of
             the supported Annotator interfaces.</p></dd><dt><a name="ugr.faqs.web_services"></a><span class="term"><span class="bold"><strong>Are UIMA analysis engines web services?</strong></span></span></dt><dd><p>They can be deployed as such. Deploying an analysis engine as a web
           service is one of the deployment options supported by the UIMA framework.</p>
         </dd><dt><a name="ugr.faqs.stateless_aes"></a><span class="term"><span class="bold"><strong>Do Analysis Engines have to be
           "stateless"?</strong></span></span></dt><dd><p>This is a user-specifiable option. The XML metadata for the
           component includes an
           <code class="code">operationalProperties</code> element which can specify if multiple
           deployment is allowed. If true, then a particular instance of an Engine might not
           see all the CASes being processed. If false, then that component will see all of the
           CASes being processed. In this case, it can accumulate state information among all
           the CASes. Typically, Analysis Engines in the main analysis pipeline are marked
           multipleDeploymentAllowed = true. The CAS Consumer component, on the other hand,
           defaults to having this property set to false, and is typically associated with
           some resource like a database or search engine that aggregates analysis results
           across an entire collection.</p>

           <p>Analysis Engine developers are encouraged not to maintain state between
             documents that would prevent their engine from working as advertised if
             operated in a parallelized environment.</p></dd><dt><a name="ugr.faqs.uddi"></a><span class="term"><span class="bold"><strong>Is engine meta-data compatible with web services and
           UDDI?</strong></span></span></dt><dd><p>All UIMA component implementations are associated with Component
           Descriptors, which represent metadata describing various properties of the
           component to support discovery, reuse, validation, automatic composition and
           development tooling. In principle, UIMA component descriptors are compatible
           with web services and UDDI. However, the UIMA framework currently uses its own XML
           representation for component metadata. It would not be difficult to convert
           between UIMA's XML representation and other standard representations.</p>
         </dd><dt><a name="ugr.faqs.scaling"></a><span class="term"><span class="bold"><strong>How do you scale a UIMA application?</strong></span></span></dt><dd><p>The UIMA framework allows components such as
           <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a> and
           CAS Consumers to be easily deployed as services or in other containers and managed
           by systems middleware designed to scale. UIMA applications tend to naturally
           scale-out across documents allowing many documents to be analyzed in
           parallel.</p>
           <p>The UIMA-AS project has extensive capabilities to flexibly scale a UIMA
             pipeline across multiple machines.  The UIMA-DUCC project supports a
             unified management of large clusters of machines running multiple "jobs"
             each consisting of a pipeline with data sources and sinks.</p>
           <p>Within the core UIMA framework, there is a component called the CPM (Collection Processing
             Manager) which has features and configuration settings for scaling an
             application to increase its throughput and recoverability;
             the CPM was the earlier version of scaleout technology, and has been
             superseded by the UIMA-AS effort (although it is still supported).</p></dd><dt><a name="ugr.faqs.embedding"></a><span class="term"><span class="bold"><strong>What does it mean to embed UIMA in systems middleware?</strong></span></span></dt><dd><p>An example of an embedding would be the deployment of a UIMA analysis
           engine as an Enterprise Java Bean inside an application server such as IBM
           WebSphere. Such an embedding allows the deployer to take advantage of the features
           and tools provided by WebSphere for achieving scalability, service management,
           recoverability etc. UIMA is independent of any particular systems middleware, so
           <a class="link" href="#ugr.faqs.annotator_versus_ae">analysis engines</a> could be deployed on other application servers as well.</p>
         </dd><dt><a name="ugr.faqs.cpm_versus_cpe"></a><span class="term"><span class="bold"><strong>How is the CPM different from a CPE?</strong></span></span></dt><dd><p>These name complementary aspects of collection processing. The CPM
           (Collection Processing <span class="bold"><strong>Manager</strong></span>) is the part of
           the UIMA framework that manages the execution of a workflow of UIMA
           components orchestrated to analyze a large collection of documents. The UIMA
           developer does not implement or describe a CPM. It is a piece of infrastructure code
           that handles CAS transport, instance management, batching, check-pointing,
           statistics collection and failure recovery in the execution of a collection
           processing workflow.</p>

           <p>A Collection Processing Engine (CPE) is a component created by the framework
             from a specific CPE descriptor. A CPE descriptor refers to a series of UIMA
             components including a Collection Reader, CAS Initializer, Analysis
             Engine(s) and CAS Consumers. These components are organized in a work flow and
             define a collection analysis job or CPE. A CPE acquires documents from a source
             collection, initializes CASs with document content, performs document
             analysis and then produces collection level results (e.g., search engine
             index, database etc). The CPM is the execution engine for a CPE.</p>
         </dd><dt><a name="ugr.faqs.modalities_other_than_text"></a><span class="term"><span class="bold"><strong>Does UIMA support modalities other than text?</strong></span></span></dt><dd><p>The UIMA architecture supports the development, discovery,
           composition and deployment of multi-modal analytics including text, audio and
           video. Applications that process text, speech and video have been developed using
           UIMA. This release of the SDK, however, does not include examples of these
           multi-modal applications. </p>

           <p>It does however include documentation and programming examples for using
             the key feature required for building multi-modal applications. UIMA supports
             multiple subjects of analysis or <a class="link" href="#ugr.faqs.what_is_a_sofa">Sofas</a>. These allow multiple views of a single
             artifact to be associated with a <a class="link" href="#ugr.faqs.what_is_the_cas">CAS</a>. For example, if an artifact is a video
             stream, one Sofa could be associated with the video frames and another with the
             closed-captions text. UIMA's multiple Sofa feature is included and
             described in this release of the SDK.</p></dd><dt><a name="ugr.faqs.compare"></a><span class="term"><span class="bold"><strong>How does UIMA compare to other similar work?</strong></span></span></dt><dd><p>A number of different frameworks for NLP have preceded UIMA. Two of
           them were developed at IBM Research and represent UIMA's early roots. For
           details please refer to the UIMA article that appears in the IBM Systems Journal
           Vol. 43, No. 3 (<a class="ulink" href="http://www.research.ibm.com/journal/sj/433/ferrucci.html" target="_top">http://www.research.ibm.com/journal/sj/433/ferrucci.html</a>
           ).</p>

           <p>UIMA has advanced the state of the art along a number of dimensions
             including: support for distributed deployments in different middleware
             environments, easy framework embedding in different software product
             platforms (key for commercial applications), broader architectural coverage
             with its collection processing architecture, support for
             multiple-modalities, support for efficient integration across programming
             languages, support for a modern software engineering discipline calling out
             different roles in the use of UIMA to develop applications, the extensive use of
             descriptive component metadata to support development tooling, component
             discovery and composition. (Please note that not all of these features are
             available in this release of the SDK.)</p></dd><dt><a name="ugr.faqs.open_source"></a><span class="term"><span class="bold"><strong>Is UIMA Open Source?</strong></span></span></dt><dd><p>Yes. As of version 2, UIMA development has moved to Apache and is being
           developed within the Apache open source processes. It is licensed under the Apache
           version 2 license.
             </p>
         </dd><dt><a name="ugr.faqs.levels_required"></a><span class="term"><span class="bold"><strong>What Java level and OS are required for the UIMA SDK?</strong></span></span></dt><dd><p>As of release 3.0.0, the UIMA SDK requires Java 1.8.
           It has been tested mainly on Windows and Linux platforms, with some
           testing on Mac OS X. Other
           platforms and JDK implementations will likely work, but have
           not been as significantly tested.</p></dd><dt><a name="ugr.faqs.building_apps_on_top_of_uima"></a><span class="term"><span class="bold"><strong>Can I build my UIM application on top of UIMA?</strong></span></span></dt><dd><p>Yes. Apache UIMA is licensed under the Apache version 2 license,
           enabling you to build and distribute applications which include the framework.
           </p></dd></dl></div>
 </div>
   <div class="chapter" title="Chapter&nbsp;5.&nbsp;Known Issues" id="ugr.issues"><div class="titlepage"><div><div><h2 class="title">Chapter&nbsp;5.&nbsp;Known Issues</h2></div></div></div>


   <div class="variablelist"><dl><dt><a name="ugr.issues.cr_to_xml"></a><span class="term"><span class="bold"><strong>Sun Java 1.4.2_12 doesn't serialize CR characters to XML</strong></span></span></dt><dd>
         <p>(Note: Apache UIMA now requires Java 1.8, so this issue is moot.) The XML serialization support in Sun Java 1.4.2_12 doesn't serialize CR characters to
         XML. As a result, if the document text contains CR characters, XCAS or XMI serialization
         will cause them to be lost, resulting in incorrect annotation offsets. This is exposed in
         the DocumentAnalyzer, with the highlighting being incorrect if the input document contains
         CR characters. </p>
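         <p>To see why a dropped CR shifts annotation offsets, consider the following minimal sketch.
         It is plain Java and does not use any UIMA API; the class name, the sample text, and the
         offsets are made up for illustration, and the CR-dropping step only simulates the
         serializer behavior described above.</p>

```java
// Illustrative only: plain Java, no UIMA classes involved.
public class CrOffsetDemo {

    // Simulates a serializer that silently drops CR characters.
    public static String dropCarriageReturns(String text) {
        return text.replace("\r", "");
    }

    // Returns the text that an annotation with these offsets would cover.
    public static String coveredText(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String original = "Hi\r\nJohn Doe here";
        int begin = 4;   // offset of 'J' in the original text
        int end = 12;    // offset just past "Doe" in the original text

        String roundTripped = dropCarriageReturns(original);

        // Same offsets, different text: the second result is shifted by one character.
        System.out.println("[" + coveredText(original, begin, end) + "]");
        System.out.println("[" + coveredText(roundTripped, begin, end) + "]");
    }
}
```

         <p>With the original text the offsets cover <code class="literal">John Doe</code>; after the CR is
         lost, the same offsets cover <code class="literal">ohn Doe </code> (shifted by one character), which
         is the kind of misplaced highlighting described above.</p>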
         </dd><dt><a name="ugr.issues.jcasgen_java_1.4"></a><span class="term"><span class="bold"><strong>JCasGen merge facility only supports Java levels 1.4 or earlier</strong></span></span></dt><dd>
         <p>JCasGen has a facility to merge in user (hand-coded) changes with the code generated
           by JCasGen.  This merging supports Java 1.4 constructs only.  JCasGen generates Java 1.4
           compliant code, so as long as any code you change here also only uses Java 1.4 constructs, the
       merge will work, even if you're using Java 5 or later.
           If you use syntactic structures particular to Java 5 or later, the merge
         operation will likely fail to merge properly.</p>
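         <p>As a rough illustration of the distinction, the sketch below shows a hand-written helper
         that sticks to Java 1.4 constructs: a raw collection, an explicit iterator, and an explicit cast,
         with no generics, enhanced for loops, annotations, or autoboxing. The class and method names
         are hypothetical and are not something JCasGen generates.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical hand-written helper; not part of any JCasGen output.
public class TokenHelper {

    // Java 1.4 style: raw List, explicit Iterator, explicit cast.
    public static int totalLength(List tokens) {
        int total = 0;
        for (Iterator it = tokens.iterator(); it.hasNext();) {
            String token = (String) it.next();
            total += token.length();
        }
        return total;
    }

    // A Java 5 version would declare a generic parameter type and use an
    // enhanced for loop, for example:
    //     for (String token : tokens) { total += token.length(); }
    // That syntax is the kind the merge facility is likely to mishandle.

    public static void main(String[] args) {
        List tokens = new ArrayList();
        tokens.add("John");
        tokens.add("Doe");
        System.out.println(totalLength(tokens));
    }
}
```

         <p>A modern compiler will report unchecked warnings for the raw types, but the code compiles and
         behaves the same as its Java 5 equivalent.</p>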
       </dd><dt><a name="ugr.issues.libgcj.4.1.2"></a><span class="term"><span class="bold"><strong>Descriptor editor in Eclipse tooling does not work with libgcj 4.1.2</strong></span></span></dt><dd>
         <p>The descriptor editor in the Eclipse tooling does not work with libgcj 4.1.2, and
         possibly other versions of libgcj.  This is apparently due to a bug in the implementation of
         their XML library, which results in a class cast error.  libgcj is used as the default
         JVM for Eclipse in Ubuntu (and possibly other Linux distributions).  The workaround is to use a
         different JVM to start Eclipse.</p>
       </dd></dl></div>
 </div>
   <div class="glossary" title="Glossary: Key Terms &amp; Concepts" id="ugr.glossary"><div class="titlepage"><div><div><h2 class="title">Glossary: Key Terms &amp; Concepts</h2></div></div></div><dl><dt><a name="ugr.glossary.aggregate"></a>Aggregate Analysis Engine</dt><dd><p>An <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>
  made up of multiple subcomponent
 Analysis Engines arranged in a flow.  The
 flow can be one of the two built-in flows, or a custom flow provided by the user.</p></dd><dt><a name="ugr.glossary.analysis_engine"></a>Analysis Engine</dt><dd><p>A program that analyzes artifacts (e.g. documents) and infers information about
 them, and which implements the UIMA Analysis Engine interface Specification. It
 does not matter how the program is built, with what framework or whether or not
 it contains component (<span class="quote">&#8220;<span class="quote">sub</span>&#8221;</span>) Analysis Engines.</p></dd><dt><a name="ugr.glossary.annotation"></a>Annotation</dt><dd><p>The association of metadata, such as a label, with a region of text (or other
 type of artifact). For example, the label <span class="quote">&#8220;<span class="quote">Person</span>&#8221;</span> associated with a
 region of text <span class="quote">&#8220;<span class="quote">John Doe</span>&#8221;</span> constitutes an annotation. We say
 <span class="quote">&#8220;<span class="quote">Person</span>&#8221;</span> annotates the span of text from X to Y containing exactly
 <span class="quote">&#8220;<span class="quote">John Doe</span>&#8221;</span>. An annotation is represented as a special
           <a class="glossterm" href="#ugr.glossary.type"><em class="glossterm">type</em></a>

 in a UIMA <a class="glossterm" href="#ugr.glossary.type_system"><em class="glossterm">type system</em></a>.
            It is the type used to record
 the labeling of regions of a <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>.
           Annotations are <a class="glossterm" href="#ugr.glossary.feature_structure"><em class="glossterm">Feature Structures</em></a>
           whose <a class="glossterm" href="#ugr.glossary.type"><em class="glossterm">Type</em></a> is Annotation or a subtype
           of that.</p></dd><dt><a name="ugr.glossary.annotator"></a>Annotator</dt><dd><p>A software
 component that implements the UIMA annotator interface. Annotators are
 implemented to produce and record annotations over regions of an artifact
 (e.g., text document, audio, and video).</p></dd><dt><a name="ugr.glossary.application"></a>Application</dt><dd><p>An application is the outer containing code that invokes
         the UIMA framework functions to instantiate an
         <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a> or a
         <a class="glossterm" href="#ugr.glossary.cpe"><em class="glossterm">Collection Processing Engine</em></a> from a particular
         descriptor, and run it.</p></dd><dt><a name="ugr.glossary.apache_uima_java_framework"></a>Apache UIMA Java Framework</dt><dd><p>A Java-based implementation of the <a class="glossterm" href="#ugr.glossary.uima"><em class="glossterm">UIMA</em></a>
          architecture.  It provides a run-time environment in which developers can plug in and run their UIMA component
          implementations and with which they can build and deploy UIM applications.  The framework is the
          core part of the <a class="glossterm" href="#ugr.glossary.apache_uima_sdk"><em class="glossterm">Apache UIMA SDK</em></a>.</p></dd><dt><a name="ugr.glossary.apache_uima_sdk"></a>Apache UIMA Software Development Kit (SDK)</dt><dd><p>The SDK for which you are now reading the documentation.  The SDK includes the framework
           plus additional components such as tooling and examples.  Some of the tooling is Eclipse-based
           (<a class="ulink" href="http://www.eclipse.org/" target="_top">http://www.eclipse.org/</a>).</p></dd><dt><a name="ugr.glossary.cas"></a>CAS</dt><dd><p>The UIMA Common Analysis Structure is
 the primary data structure which UIMA analysis components use to represent and
 share analysis results.  It contains:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>The artifact. This is the object
 being analyzed such as a text document or audio or video stream. The CAS
 projects one or more views of the artifact. Each view is referred to as a
   <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>.</p></li><li class="listitem"><p>A type system description &#8211;
 indicating the types, subtypes, and their features. </p></li><li class="listitem"><p>Analysis metadata &#8211; <span class="quote">&#8220;<span class="quote">standoff</span>&#8221;</span>
 annotations describing the artifact or a region of the artifact </p></li><li class="listitem"><p>An index repository to support
 efficient access to and iteration over the results of analysis.
 </p></li></ul></div><p>UIMA's primary interface to this structure is provided by
 a class called the Common Analysis System. We use <span class="quote">&#8220;<span class="quote">CAS</span>&#8221;</span> to refer to
 both the structure and system. Where the common analysis structure is used
 through a different interface, the particular implementation of the structure
 is indicated. For example, the <a class="glossterm" href="#ugr.glossary.jcas"><em class="glossterm">JCas</em></a> is a native Java object
 representation of the contents of the common analysis structure.</p><p>A CAS can have multiple views; each view has a unique
 representation of the artifact, and has its own index repository, representing
 results of analysis for that representation of the artifact.</p></dd><dt><a name="ugr.glossary.cas_consumer"></a>CAS Consumer</dt><dd><p>A component that
 receives each CAS in the collection, usually after it has been processed by an
           <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>. It is responsible for taking the results from
 the CAS and using them for some purpose, perhaps storing selected results into
 a database, for instance.  The CAS
 Consumer may also perform collection-level analysis, saving these results in an
 application-specific, aggregate data structure.</p></dd><dt><a name="ugr.glossary.cas_initializer"></a>CAS Initializer (deprecated)</dt><dd><p>Prior to version 2, this was the component that took an
           undefined input form and produced a particular <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>.
           For version 2, this has been replaced with using any <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>
           which takes a particular <a class="glossterm" href="#ugr.glossary.cas_view"><em class="glossterm">CAS View</em></a> and creates a
           new output Sofa.  For example, if the document is HTML, an Analysis Engine might
           create a Sofa which is a detagged version of an input CAS View, perhaps also
 creating annotations derived from the tags. For example &lt;p&gt; tags
 might be translated into Paragraph annotations in the CAS.</p></dd><dt><a name="ugr.glossary.cas_multiplier"></a>CAS Multiplier</dt><dd><p>A component, implemented by a UIMA developer,
 that takes a CAS as input and produces 0 or more new CASes as output.  Common use cases for a CAS Multiplier
           include creating alternative versions of an input <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>
           (see <a class="glossterm" href="#ugr.glossary.cas_initializer"><em class="glossterm">CAS Initializer</em></a>), and breaking
           a large input CAS into smaller pieces, each of which is emitted as a
 separate output CAS.  There are other
 uses, however, such as aggregating input CASes into a single output CAS.</p></dd><dt><a name="ugr.glossary.cas_processor"></a>CAS Processor</dt><dd><p>A component of a Collection Processing Engine (CPE) that
 takes a CAS as input and returns a CAS as output. There are two types of CAS
 Processors: <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s and
           <a class="glossterm" href="#ugr.glossary.cas_consumer"><em class="glossterm">CAS Consumer</em></a>s.</p></dd><dt><a name="ugr.glossary.cas_view"></a>CAS View</dt><dd><p>A CAS Object which shares the base CAS and type system
 definition and index specifications, but has a unique index repository and a
 particular <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>.   Views are named, and applications and
 annotators can dynamically create additional views whenever they are needed.
 Annotations are made with respect to one view.  Feature structures can have references to feature structures
           indexed in other views, as needed.</p></dd><dt><a name="ugr.glossary.cde"></a>CDE</dt><dd><p>The Component Descriptor Editor. This
 is the Eclipse tool that lets you conveniently edit the UIMA descriptors;
           see <a href="tools.html#ugr.tools.cde" class="olink">Chapter&nbsp;1, <i>Component Descriptor Editor User's Guide</i></a>.</p></dd><dt><a name="ugr.glossary.cpe"></a>Collection Processing Engine (CPE)</dt><dd><p>Performs Collection Processing
 through the combination of a
           <a class="glossterm" href="#ugr.glossary.collection_reader"><em class="glossterm">Collection Reader</em></a>,
           0 or more <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s,
  and zero or more <a class="glossterm" href="#ugr.glossary.cas_consumer"><em class="glossterm">CAS Consumer</em></a>s.
 The Collection Processing Manager (CPM) manages the execution of the engine.</p><p>The CPE also refers to the XML specification of the Collection Processing
         engine.  The CPM reads a CPE specification and instantiates a CPE instance from it,
         and runs it.</p></dd><dt><a name="ugr.glossary.cpm"></a>Collection Processing Manager (CPM)</dt><dd><p>The part of the framework that
 manages the execution of collection processing, routing CASes from the
           <a class="glossterm" href="#ugr.glossary.collection_reader"><em class="glossterm">Collection Reader</em></a>

 to 0 or more <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s
 and then to the 0 or more <a class="glossterm" href="#ugr.glossary.cas_consumer"><em class="glossterm">CAS Consumer</em></a>s. The CPM
 provides feedback such as performance statistics and error reporting and supports
 other features such as parallelization and error handling.</p></dd><dt><a name="ugr.glossary.collection_reader"></a>Collection Reader</dt><dd><p>A component
 that reads documents from some source, for example a file system or database.
 The collection reader initializes a CAS with this document.
           Each document is returned as a CAS that may then be processed by
           <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s. If the task of populating a CAS
 from the document is complex, you may use an arbitrarily complex chain of
           <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s and have the last one
           create and initialize a new <a class="glossterm" href="#ugr.glossary.sofa"><em class="glossterm">Sofa</em></a>.</p></dd><dt><a name="ugr.glossary.feature_structure"></a>Feature Structure</dt><dd><p>An instance of a <a class="glossterm" href="#ugr.glossary.type"><em class="glossterm">Type</em></a>.
         Feature Structures are kept in the <a class="glossterm" href="#ugr.glossary.cas"><em class="glossterm">CAS</em></a>, and may
         (optionally) be added to the defined <a class="glossterm" href="#ugr.glossary.index"><em class="glossterm">indexes</em></a>.
         Feature Structures may contain references to other Feature Structures.
         Feature Structures whose type is Annotation or a subtype of that, are referred to as
         <a class="glossterm" href="#ugr.glossary.annotation"><em class="glossterm">annotations</em></a>.</p></dd><dt><a name="ugr.glossary.feature"></a>Feature</dt><dd><p>A data member or attribute of a type.  Each feature itself has an
 associated range type, the type of the value that it can hold.  In the
 database analogy where types are tables, features are columns.
         In the world of structured data types, each feature is a <span class="quote">&#8220;<span class="quote">field</span>&#8221;</span>,
         or data member.</p></dd><dt><a name="ugr.glossary.flow_controller"></a>Flow Controller</dt><dd><p>A component which implements the interfaces needed
 to specify a custom flow within an <a class="glossterm" href="#ugr.glossary.aggregate"><em class="glossterm">Aggregate Analysis Engine</em></a>.</p></dd><dt><a name="ugr.glossary.hybrid_analysis_engine"></a>Hybrid Analysis Engine</dt><dd><p>An <a class="glossterm" href="#ugr.glossary.aggregate"><em class="glossterm">Aggregate Analysis Engine</em></a>
           where more than one of its component Analysis Engines are deployed
 in the same address space and one or more are deployed remotely (part tightly and
 part loosely-coupled).</p></dd><dt><a name="ugr.glossary.index"></a>Index</dt><dd><p>Data in the CAS can only be retrieved using Indexes.
           Indexes are analogous to the indexes that are
 specified on tables of a database.  Indexes belong to Index Repositories;
 there is one Repository for each
 view of the CAS.  Indexes are specified
 to retrieve instances of some CAS Type (including its subtypes), and can be
 optionally sorted in a user-definable way.
           For example, all types derived from the UIMA
 built-in type <code class="literal">uima.tcas.Annotation</code> contain begin
 and end features, which mark the begin and end offsets in the text where this
 annotation occurs.  There is a built-in index of Annotations that specifies that
 annotations are retrieved sequentially by sorting first on the value of the begin
 feature (ascending) and then by the value of the end feature (descending).
 In this case, iterating over the annotations, one first obtains annotations that
 come sequentially first in the text, while favoring longer annotations, in the case
 where two annotations start at the same offset.  Users can define their own indexes
 as well.</p></dd><dt><a name="ugr.glossary.jcas"></a>JCas</dt><dd><p>A Java object interface to the contents of the CAS.
           This interface uses additional generated Java classes, where each type in the CAS
 is represented as a Java class with the same name, each feature is represented with
 a getter and setter method, and each instance of a type is represented as a
 Java object of the corresponding Java class.</p></dd><dt><a name="ugr.glossary.loosely_coupled_analysis_engine"></a>Loosely-Coupled Analysis Engine</dt><dd><p>An <a class="glossterm" href="#ugr.glossary.aggregate"><em class="glossterm">Aggregate Analysis Engine</em></a>
          where no two of its component Analysis Engines run in the
 same address space but where each is remote with respect to the others that
 make up the aggregate. Loosely coupled engines are ideal for using
           remote Analysis Engine services that are
 not locally available, or for quickly assembling and testing functionality in
 cross-language, cross-platform distributed environments. They also better enable
 distributed scalable implementations where quick recoverability may have a
 greater impact on overall throughput than analysis speed.</p></dd><dt><a name="ugr.glossary.ontology"></a>Ontology</dt><dd><p>The part of a knowledge base that defines the semantics of the data
 axiomatically.</p></dd><dt><a name="ugr.glossary.pear"></a>PEAR</dt><dd><p>An archive file that packages up a UIMA component with its code,
 descriptor files and other resources required to install and run it in another
 environment. You can generate PEAR files using utilities that come with the
 UIMA SDK.</p></dd><dt><a name="ugr.glossary.primitive_analysis_engine"></a>Primitive Analysis Engine</dt><dd><p>An <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>
           that is composed of a single
           <a class="glossterm" href="#ugr.glossary.annotator"><em class="glossterm">Annotator</em></a>; one that has
 no component (or <span class="quote">&#8220;<span class="quote">sub</span>&#8221;</span>) Analysis Engines inside of it;
 contrast with
           <a class="glossterm" href="#ugr.glossary.aggregate"><em class="glossterm">Aggregate Analysis Engine</em></a>.</p></dd><dt><a name="ugr.glossary.structured_information"></a>Structured Information</dt><dd><p>Items stored in structured resources such as
 search engine indices, databases or knowledge bases. The canonical example of
 structured information is the database table. Each element of information in
 the database is associated with a precisely defined schema where each table
 column heading indicates its precise semantics, defining exactly how the
 information should be interpreted by a computer program or end-user.</p></dd><dt><a name="ugr.glossary.sofa"></a>Subject of Analysis (Sofa)</dt><dd><p>A piece of
 data (e.g., text document, image, audio segment, or video segment), which is intended
 for analysis by UIMA analysis components.  It belongs to a
           <a class="glossterm" href="#ugr.glossary.cas_view"><em class="glossterm">CAS View</em></a> which has the same name; there
           is a one-to-one correspondence between these.  There can be multiple Sofas contained within
 one CAS, each one representing a different view of the original artifact &#8211; for example,
 an audio file could be the original artifact, and also be one Sofa, and another
 could be the output of a voice-recognition component, where the Sofa would be
 the corresponding text document. Sofas may be analyzed independently or
 simultaneously; they all co-exist within the CAS.  </p></dd><dt><a name="ugr.glossary.tightly_coupled_analysis_engine"></a>Tightly-Coupled Analysis Engine</dt><dd><p>An <a class="glossterm" href="#ugr.glossary.aggregate"><em class="glossterm">Aggregate Analysis Engine</em></a>
  where all of its component Analysis Engines run in the same address space.</p></dd><dt><a name="ugr.glossary.type"></a>Type</dt><dd><p>A specification of an object in the
           <a class="glossterm" href="#ugr.glossary.cas"><em class="glossterm">CAS</em></a> used to store the results of
 analysis.  Types are defined using inheritance, so some types may be
 defined purely for the sake of defining other types, and are in this sense <span class="quote">&#8220;<span class="quote">abstract
 types.</span>&#8221;</span>  Types usually contain
           <a class="glossterm" href="#ugr.glossary.feature"><em class="glossterm">Feature</em></a>s, which are attributes, or
 properties of the type.  A type is roughly equivalent to a class in an
 object oriented programming language, or a table in a database.  Instances of types in the CAS
           may be indexed for retrieval.</p></dd><dt><a name="ugr.glossary.type_system"></a>Type System</dt><dd><p>A collection of related <a class="glossterm" href="#ugr.glossary.type"><em class="glossterm">types</em></a>.
           All components that can access the CAS,
 including <a class="glossterm" href="#ugr.glossary.application"><em class="glossterm">Applications</em></a>,
           <a class="glossterm" href="#ugr.glossary.analysis_engine"><em class="glossterm">Analysis Engine</em></a>s,
           <a class="glossterm" href="#ugr.glossary.collection_reader"><em class="glossterm">Collection Readers</em></a>,
           <a class="glossterm" href="#ugr.glossary.flow_controller"><em class="glossterm">Flow Controllers</em></a>, or
           <a class="glossterm" href="#ugr.glossary.cas_consumer"><em class="glossterm">CAS Consumers</em></a>
 declare the type system that they use. Type systems are shared across Analysis Engines, allowing the outputs
           of one Analysis Engine to be read as input by another Analysis Engine.
 A type system is roughly analogous to a set of related classes in object
 oriented programming, or a set of related tables in a database.  The type
 system / type / feature terminology comes from computational linguistics.</p></dd><dt><a name="ugr.glossary.unstructured_information"></a>Unstructured Information</dt><dd><p>The canonical example of unstructured
 information is the natural language text document. The intended meaning of a
 document's content is only implicit and its precise interpretation by a
 computer program requires some degree of analysis to explicate the document's
 semantics. Other examples include audio, video and images. Contrast with
 <a class="glossterm" href="#ugr.glossary.structured_information"><em class="glossterm">Structured Information</em></a>.
         </p></dd><dt><a name="ugr.glossary.uima"></a>UIMA</dt><dd><p>UIMA is an acronym that stands for Unstructured Information Management Architecture;
           it is a software architecture which specifies component interfaces, design patterns
 and development roles for creating, describing, discovering, composing and
 deploying multi-modal analysis capabilities.  The UIMA specification is being developed by a
         technical committee at <a class="ulink" href="http://www.oasis-open.org/committees/uima" target="_top">OASIS</a>.</p></dd><dt><a name="ugr.glossary.uima_java_framework"></a>UIMA Java Framework</dt><dd><p>See <a class="glossterm" href="#ugr.glossary.apache_uima_java_framework"><em class="glossterm">Apache UIMA Java Framework</em></a>.</p><p></p></dd><dt><a name="ugr.glossary.uima_sdk"></a>UIMA SDK</dt><dd><p>See <a class="glossterm" href="#ugr.glossary.apache_uima_sdk"><em class="glossterm">Apache UIMA SDK</em></a>.</p><p></p></dd><dt><a name="ugr.glossary.xcas"></a>XCAS</dt><dd><p>An XML representation of the CAS. The XCAS can be used for saving
 and restoring CASes to and from streams. The UIMA SDK provides XCAS serialization and
 de-serialization methods for CASes.  This is an older serialization format and
 new UIMA code should use the standard <a class="glossterm" href="#ugr.glossary.xmi"><em class="glossterm">XMI</em></a>
 format instead.</p></dd><dt><a name="ugr.glossary.xmi"></a>XML Metadata Interchange (XMI)</dt><dd><p>An OMG standard for representing
 object graphs in XML, which UIMA uses to serialize analysis results from the
 CAS to an XML representation.  The UIMA SDK provides XMI serialization and
 de-serialization methods for CASes.</p></dd></dl></div>
 </div></body></html>