<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" > | |
%uimaents; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<chapter id="ugr.project_overview"> | |
<title>UIMA Overview</title> | |
<titleabbrev>Overview</titleabbrev> | |
<para>The Unstructured Information Management Architecture (UIMA) is an architecture and software framework
for creating, discovering, composing, and deploying a broad range of multi-modal analysis capabilities and
integrating them with search technologies. The architecture is undergoing standardization by a technical
committee within <ulink url="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=uima">OASIS</ulink>;
the resulting standard is referred to as the <emphasis>UIMA specification</emphasis>.
</para>
<para>The <emphasis>Apache UIMA</emphasis> framework is an Apache-licensed, open source implementation of the
UIMA Architecture. It provides a run-time environment in which developers can plug in
and run their UIMA component implementations, and with which they can build and deploy UIM applications. The
framework itself is not specific to any IDE or platform.</para>
<para>Apache UIMA includes an all-Java implementation of the
UIMA framework for the development, description, composition and deployment of UIMA components and
applications. It also provides the developer with an Eclipse-based (<ulink url="http://www.eclipse.org/"/>)
development environment that includes a set of tools and utilities for using UIMA. Finally, it includes
a C++ version of the framework, and
enablements for Annotators built in Perl, Python, and TCL.</para>
<para>This chapter is the intended starting point for readers who are new to the Apache UIMA Project. It includes
this introduction and the following sections:</para> | |
<itemizedlist> | |
<listitem> | |
<para> <xref linkend="ugr.project_overview_doc_overview"/> provides a list of the books and topics included in | |
the Apache UIMA documentation with a brief summary of each. </para> | |
</listitem> | |
<listitem> | |
<para> <xref linkend="ugr.project_overview_doc_use"/> describes a recommended path through the
documentation to help get the reader up and running with UIMA. </para>
</listitem> | |
<listitem> | |
<para> <xref linkend="ugr.project_overview_migrating_from_ibm_uima"/> is intended for users of IBM | |
UIMA, and describes the steps needed to upgrade to Apache UIMA. </para> | |
</listitem> | |
<listitem> | |
<para> <xref linkend="ugr.project_overview_changes_from_v1"/> lists the changes that occurred between UIMA | |
v1.x and UIMA v2.x (independent of the transition to Apache).</para> | |
</listitem> | |
</itemizedlist> | |
<para>The main website for Apache UIMA is <ulink url="http://uima.apache.org"/>. Here you
can find many things, including:
<itemizedlist spacing="compact">
<listitem><para>how to download (both the binary and source distributions)</para></listitem>
<listitem><para>how to participate in the development</para></listitem>
<listitem><para>mailing lists, including the user list, which serves as a forum for questions and answers</para></listitem>
<listitem><para>a Wiki where you can find and contribute all kinds of information, including tips and best practices</para></listitem>
<listitem><para>a sandbox, which is a subproject for potential new additions to Apache UIMA or to its subprojects. Things here
are works in progress, and may (or may not) be included in releases.</para></listitem>
<listitem><para>links to conferences</para></listitem>
</itemizedlist>
</para>
<section id="ugr.project_overview_doc_overview"> | |
<title>Apache UIMA Project Documentation Overview</title> | |
<para> The user documentation for UIMA is organized into several parts. | |
<itemizedlist spacing="compact"> | |
<listitem> | |
<para> Overviews - this documentation </para> | |
</listitem> | |
<listitem> | |
<para> Eclipse Tooling Installation and Setup - also in this document </para> | |
</listitem> | |
<listitem> | |
<para> Tutorials and Developer's Guides </para> | |
</listitem> | |
<listitem> | |
<para> Tools Users' Guides </para> | |
</listitem> | |
<listitem> | |
<para> References </para> | |
</listitem> | |
</itemizedlist> </para> | |
<para>
The first two parts make up this book; each of the last three parts has its own
book. The books are provided both as
(somewhat large) HTML files, viewable in browsers, and as PDF files.
The documentation is fully hyperlinked, with tables of contents. The PDF versions are set up to
print nicely: the cross references within a book include page numbers. </para>
<para>If you view the PDF files inside
a browser that supports embedded viewing of PDF, the hyperlinks between different PDF books may work (not
all browsers have been tested).</para>
<para>The following set of tables gives a more detailed overview of the various parts of the | |
documentation. | |
</para> | |
<section id="ugr.project_overview_overview"> | |
<title>Overviews</title> | |
<informaltable frame="all" rowsep="1" colsep="1"> | |
<tgroup cols="2"> | |
<colspec colnum="1" colname="col1" colwidth="1*"/> | |
<colspec colnum="2" colname="col2" colwidth="2.5*"/> | |
<tbody> | |
<row> | |
<entry><emphasis>Overview of the Documentation</emphasis> | |
</entry> | |
<entry> | |
<para>What you are currently reading. Lists the documents provided in the Apache | |
UIMA documentation set and provides | |
a recommended path through the documentation for getting started using | |
UIMA. It includes release notes and provides a brief high-level description of | |
the different software modules included in the | |
Apache UIMA Project. See <xref linkend="ugr.project_overview_doc_overview"/>.</para> | |
</entry> | |
</row> | |
<row> | |
<entry><emphasis>Conceptual Overview</emphasis> | |
</entry> | |
<entry>Provides a broad conceptual overview of the UIMA component architecture; includes | |
references to the other documents in the documentation set that provide more detail. | |
See <xref linkend="ugr.ovv.conceptual"/>.</entry>
</row> | |
<row> | |
<entry><emphasis>UIMA FAQs</emphasis> | |
</entry> | |
<entry>Frequently Asked Questions about general UIMA concepts. (Not a programming | |
resource.) See <xref linkend="ugr.faqs"/>.</entry> | |
</row> | |
<row> | |
<entry><emphasis>Known Issues</emphasis> | |
</entry> | |
<entry>Known issues and problems with the UIMA SDK. See <xref linkend="ugr.issues"/>.</entry> | |
</row> | |
<row> | |
<entry><emphasis>Glossary</emphasis> | |
</entry> | |
<entry>UIMA terms and concepts and their basic definitions. See <xref linkend="ugr.glossary"/>.</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.project_overview_setup"> | |
<title>Eclipse Tooling Installation and Setup</title> | |
<para>Provides step-by-step instructions for installing Apache UIMA in the Eclipse Integrated
Development Environment. See <xref linkend="ugr.ovv.eclipse_setup"/>.</para>
</section> | |
<section id="ugr.project_overview_tutorials_dev_guides"> | |
<title>Tutorials and Developer's Guides</title> | |
<informaltable> | |
<tgroup cols="2"> | |
<colspec colnum="1" colname="col1" colwidth="1*"/> | |
<colspec colnum="2" colname="col2" colwidth="2.5*"/> | |
<tbody> | |
<row id="ugr.project_overview_tutorial_annotator"> | |
<entry><emphasis>Annotators and Analysis Engines</emphasis> | |
</entry> | |
<entry>Tutorial-style guide for building UIMA annotators and analysis engines. This chapter | |
introduces the developer to creating type systems and using UIMA's common data structure, | |
the CAS, or Common Analysis Structure. It demonstrates how to use built-in tools to specify and create
basic UIMA analysis components. See | |
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tutorial_cpe"> | |
<entry><emphasis>Building UIMA Collection Processing Engines</emphasis> | |
</entry> | |
<entry>Tutorial-style guide for building UIMA collection processing engines. These | |
manage the | |
analysis of collections of documents from source to sink. See | |
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cpe"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tutorial_application_development"> | |
<entry><emphasis>Developing Complete Applications</emphasis> | |
</entry> | |
<entry>Tutorial-style guide on using the UIMA APIs to create, run and manage UIMA components from | |
your application. Also describes APIs for saving and restoring the contents of a CAS using an XML | |
format called <trademark class="registered">XMI</trademark>. See
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.application"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_guide_flow_controller"> | |
<entry><emphasis>Flow Controller</emphasis> | |
</entry> | |
<entry>When multiple components are combined in an Aggregate, each CAS flows among the various
components in an order determined by a flow. UIMA provides two built-in flows, and also allows custom flows to be
implemented. See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc"/>.</entry>
</row> | |
<row id="ugr.project_overview_guide_multiple_sofas"> | |
<entry><emphasis>Developing Applications using Multiple Subjects of Analysis</emphasis> | |
</entry> | |
<entry>A single CAS may be associated with multiple subjects of analysis (Sofas). These are useful
for representing and analyzing different formats or translations of the same document. For
multi-modal analysis, Sofas are good for different modal representations of the same stream
(e.g., audio and closed captions). This chapter provides the developer with details on how to use
multiple Sofas in an application. See
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>.</entry>
</row> | |
<row id="ugr.project_overview_guide_multiple_views"> | |
<entry><emphasis>Multiple CAS Views of an Artifact</emphasis> | |
</entry> | |
<entry>UIMA provides an extension to the basic model of the CAS which supports | |
analysis of multiple views of the same artifact, all contained within the CAS. This
chapter describes the concepts, terminology, and the API and XML extensions that | |
enable this. See | |
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_guide_cas_multiplier"> | |
<entry><emphasis>CAS Multiplier</emphasis> | |
</entry> | |
<entry>A component may add additional CASes into the workflow. This may be useful to break up a large | |
artifact into smaller units, or to create a new CAS that collects information from multiple other | |
CASes. See <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_xmi_emf"> | |
<entry><emphasis>XMI and EMF Interoperability</emphasis> | |
</entry> | |
<entry>The UIMA Type system and the contents of the CAS itself can be externalized using the XML
Metadata Interchange (XMI) standard. Eclipse Modeling Framework (EMF) tooling can be used to develop
applications that use this information. See | |
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.xmi_emf"/>.</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.project_overview_tool_guides"> | |
<title>Tools Users' Guides</title> | |
<informaltable> | |
<tgroup cols="2"> | |
<colspec colnum="1" colname="col1" colwidth="1*"/> | |
<colspec colnum="2" colname="col2" colwidth="2.5*"/> | |
<tbody> | |
<row id="ugr.project_overview_tools_component_descriptor_editor"> | |
<entry><emphasis>Component Descriptor Editor</emphasis> | |
</entry> | |
<entry>Describes the features of the Component Descriptor Editor Tool. This tool provides a GUI for | |
specifying the details of UIMA component descriptors, including those for Analysis Engines | |
(primitive and aggregate), Collection Readers, CAS Consumers and Type Systems. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_cpe_configurator"> | |
<entry><emphasis>Collection Processing Engine Configurator</emphasis> | |
</entry> | |
<entry>Describes the User Interfaces and features of the CPE Configurator tool. This tool allows the | |
user to select and configure the components of a Collection Processing Engine and then to run the | |
engine. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_pear_packager"> | |
<entry><emphasis>Pear Packager</emphasis> | |
</entry> | |
<entry>Describes how to use the PEAR Packager utility. This utility enables developers to produce an | |
archive file for an analysis engine that includes all required resources for installing that | |
analysis engine in another UIMA environment. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.packager"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_pear_installer"> | |
<entry><emphasis>Pear Installer</emphasis> | |
</entry> | |
<entry>Describes how to use the PEAR Installer utility. This utility installs and verifies an | |
analysis engine from an archive file (PEAR) with all its resources in the right place so it is ready to | |
run. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.installer"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_pear_merger"> | |
<entry><emphasis>Pear Merger</emphasis> | |
</entry> | |
<entry>Describes how to use the Pear Merger utility, which does a simple merge of multiple PEAR | |
packages into one. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.merger"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_document_analyzer"> | |
<entry><emphasis>Document Analyzer</emphasis> | |
</entry> | |
<entry>Describes the features of a tool for applying a UIMA analysis engine to a set of documents and | |
viewing the results. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_cas_visual_debugger"> | |
<entry><emphasis>CAS Visual Debugger</emphasis> | |
</entry> | |
<entry>Describes the features of a tool for viewing the detailed structure and contents of a CAS. Good | |
for debugging. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cvd"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_jcasgen"> | |
<entry><emphasis>JCasGen</emphasis> | |
</entry> | |
<entry>Describes how to run the JCasGen utility, which automatically builds Java classes that | |
correspond to a particular CAS Type System. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_tools_xml_cas_viewer"> | |
<entry><emphasis>XML CAS Viewer</emphasis> | |
</entry> | |
<entry>Describes how to run the supplied viewer to view externalized XML forms of CASes. This viewer | |
is used in the examples. See | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.annotation_viewer"/>.</entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
<section id="ugr.project_overview_reference"> | |
<title>References</title> | |
<informaltable> | |
<tgroup cols="2"> | |
<colspec colnum="1" colname="col1" colwidth="1*"/> | |
<colspec colnum="2" colname="col2" colwidth="2.5*"/> | |
<tbody> | |
<row id="ugr.project_overview_javadocs"> | |
<entry><emphasis>Introduction to the UIMA API Javadocs</emphasis> | |
</entry> | |
<entry>Javadocs detailing the UIMA programming interfaces. See
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.javadocs"/>.</entry>
</row> | |
<row id="ugr.project_overview_xml_ref_component_descriptor"> | |
<entry><emphasis>XML: Component Descriptor</emphasis> | |
</entry> | |
<entry>Provides the detailed XML format for all the UIMA component descriptors, except the CPE (see
next). See | |
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor"/>.</entry> | |
</row> | |
<row id="ugr.project_overview_xml_ref_collection_processing_engine_descriptor"> | |
<entry><emphasis>XML: Collection Processing Engine Descriptor</emphasis> | |
</entry> | |
<entry>Provides the detailed XML format for the Collection Processing Engine descriptor. See
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</entry>
</row> | |
<row id="ugr.project_overview_cas"> | |
<entry><emphasis>CAS</emphasis> | |
</entry> | |
<entry>Provides a detailed description of the principal CAS interface. See
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.cas"/>.</entry>
</row> | |
<row id="ugr.project_overview_jcas"> | |
<entry><emphasis>JCas</emphasis> | |
</entry> | |
<entry>Provides details on the JCas, a native Java interface to the CAS. See | |
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/>.</entry>
</row> | |
<row id="ugr.project_overview_ref_pear"> | |
<entry><emphasis>PEAR Reference</emphasis> | |
</entry> | |
<entry>Provides a detailed description of the deployable archive format for UIMA
components. See
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.pear"/>.</entry>
</row> | |
<row id="ugr.project_overview_xmi_cas_serialization"> | |
<entry><emphasis>XMI CAS Serialization Reference</emphasis> | |
</entry> | |
<entry>Provides a detailed description of the XMI format used to serialize the contents of a
CAS. See
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xmi"/>.</entry>
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
</section> | |
<section id="ugr.project_overview_doc_use"> | |
<title>How to use the Documentation</title> | |
<orderedlist> | |
<listitem> | |
<para>Explore this chapter to get an overview of the different documents that are included with Apache UIMA.</para> | |
</listitem> | |
<listitem> | |
<para> Read <olink targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.conceptual"/> to get a broad | |
view of the basic UIMA concepts and philosophy with reference to the other documents included in the | |
documentation set which provide greater detail. </para> | |
</listitem> | |
<listitem> | |
<para> For more general information on the UIMA architecture and how it has been used, refer to the IBM Systems
Journal special issue on Unstructured Information Management, online at <ulink
url="http://www.research.ibm.com/journal/sj43-3.html"/>, or to the section of the Apache UIMA
website where other publications are listed. </para>
</listitem> | |
<listitem> | |
<para> Set up Apache UIMA in your Eclipse environment. To do this, follow the instructions in <xref | |
linkend="ugr.ovv.eclipse_setup"/>. </para> | |
</listitem> | |
<listitem> | |
<para> Develop sample UIMA annotators, run them and explore the results. Read <olink | |
targetdoc="&uima_docs_tutorial_guides;"/> <olink | |
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/> and follow it like a tutorial | |
to learn how to develop your first UIMA annotator and set up and run your first UIMA analysis engines. | |
<itemizedlist> | |
<listitem> | |
<para> As part of this, you will use a few tools, including:
<itemizedlist>
<listitem>
<para> The UIMA Component Descriptor Editor, described in more detail in <olink
targetdoc="&uima_docs_tools;"/> <olink
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde"/>. </para>
</listitem>
<listitem>
<para> The Document Analyzer, described in more detail in <olink
targetdoc="&uima_docs_tools;"/> <olink
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>. </para>
</listitem>
</itemizedlist> </para>
</listitem> | |
<listitem> | |
<para>While following along in <olink targetdoc="&uima_docs_tutorial_guides;"/> | |
<olink targetdoc="&uima_docs_tutorial_guides;" | |
targetptr="ugr.tug.aae"/>, reference documents that may help are: | |
<itemizedlist> | |
<listitem> | |
<para> <olink targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;" | |
targetptr="ugr.ref.xml.component_descriptor"/> for understanding the analysis | |
engine descriptors </para> | |
</listitem> | |
<listitem> | |
<para> <olink targetdoc="&uima_docs_ref;"/> | |
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas"/> for | |
understanding the JCas </para> | |
</listitem> | |
</itemizedlist> </para> | |
</listitem> | |
</itemizedlist> </para> | |
</listitem> | |
<listitem> | |
<para> Learn how to create, run and manage a UIMA analysis engine as part of an application. | |
Connect your analysis engine to the provided semantic search engine to learn how a | |
complete analysis and search application may be built with Apache UIMA. <olink | |
targetdoc="&uima_docs_tutorial_guides;"/> <olink | |
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.application"/> will guide you | |
through this process. | |
<itemizedlist> | |
<listitem> | |
<para> As part of this, you will use the Document Analyzer (described in more detail in <olink
targetdoc="&uima_docs_tools;"/> <olink
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.doc_analyzer"/>) and the semantic search
GUI tools (see <olink targetdoc="&uima_docs_tutorial_guides;"/>
<olink targetdoc="&uima_docs_tutorial_guides;"
targetptr="ugr.tug.application.search.query_tool"/>). </para>
</listitem> | |
</itemizedlist> </para> | |
</listitem> | |
<listitem> | |
<para> Pat yourself on the back. Congratulations! If you reached this step successfully, you now have an
appreciation for the UIMA analysis engine architecture. You have built a few sample annotators,
deployed UIMA analysis engines to analyze a few documents, searched over the results using the built-in
semantic search engine, and viewed the results through a built-in viewer,
all as part of a simple but complete application. </para>
</listitem> | |
<listitem> | |
<para> Develop and run a Collection Processing Engine (CPE) to analyze and gather the results of an entire | |
collection of documents. <olink targetdoc="&uima_docs_tutorial_guides;"/> | |
<olink targetdoc="&uima_docs_tutorial_guides;" | |
targetptr="ugr.tug.cpe"/> will guide you through this process. | |
<itemizedlist> | |
<listitem> | |
<para> As part of this you will use the CPE Configurator tool. For details see <olink | |
targetdoc="&uima_docs_tools;"/> <olink | |
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>. </para> | |
</listitem> | |
<listitem> | |
<para> You will also learn about CPE Descriptors. The detailed format for these may be found in <olink | |
targetdoc="&uima_docs_ref;"/> <olink | |
targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. </para> | |
</listitem> | |
</itemizedlist> </para> | |
</listitem> | |
<listitem> | |
<para> Learn how to package up an analysis engine for easy installation into another UIMA environment. | |
<olink targetdoc="&uima_docs_tools;"/> | |
<olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.packager"/> and <olink | |
targetdoc="&uima_docs_tools;"/> <olink | |
targetdoc="&uima_docs_tools;" targetptr="ugr.tools.pear.installer"/> will teach you how to | |
create UIMA analysis engine archives so that you can easily share your components with a broader | |
community. </para> | |
</listitem> | |
</orderedlist> | |
</section> | |
<section id="ugr.project_overview_changes_from_previous"> | |
<title>Changes from Previous Major Versions</title> | |
<para> There are two previous versions of UIMA, available from IBM's alphaWorks: version 1.4.x and version 2.0
(the 2.0 version was a "beta" only release). This section describes the changes relative to both of these | |
releases. A migration utility is provided which updates your Java code and descriptors as needed for this | |
release. See <xref linkend="ugr.project_overview_migrating_from_ibm_uima"/> for instructions on how to | |
run the migration utility. </para> | |
<note><para>Each Apache UIMA release includes RELEASE_NOTES and RELEASE_NOTES.html files that | |
describe the changes that have occurred in each release. | |
Please refer to those files for specific changes for each Apache UIMA release.</para></note> | |
<section id="ugr.project_overview_changes_from_2_0"> | |
<title>Changes from IBM UIMA 2.0 to Apache UIMA 2.1</title> | |
<para>This section describes what has changed between version 2.0 and version 2.1 of UIMA; | |
the following section describes the differences between version 1.4 and version 2.1. | |
</para> | |
<section id="ugr.project_overview.migration_utility.java_package_name_changes"> | |
<title>Java Package Name Changes</title> | |
<para>All of the UIMA Java package names have changed in Apache UIMA. They now start with | |
<literal>org.apache</literal> rather than <literal>com.ibm</literal>. There have been other | |
changes as well. The package name segment <literal>reference_impl</literal> has been shortened to | |
<literal>impl</literal>, and some segments have been reordered. For example | |
<literal>com.ibm.uima.reference_impl.analysis_engine</literal> has become | |
<literal>org.apache.uima.analysis_engine.impl</literal>. Tools are now consolidated under | |
<literal>org.apache.uima.tools</literal> and service adapters under | |
<literal>org.apache.uima.adapter</literal>. </para> | |
<para>The migration utility will replace all occurrences of IBM UIMA package names with their Apache UIMA | |
equivalents. It will not replace <emphasis>prefixes</emphasis> of package names, so if your code uses | |
a package called <literal>com.ibm.uima.myproject</literal> (although that is not recommended), it | |
will not be replaced.</para> | |
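<para>The following Java sketch illustrates the whole-name replacement rule described above. It is not the
migration utility's code: the three-entry mapping table is a hypothetical subset (only the
<literal>reference_impl</literal> entry comes from the example above), and treating a name that is followed by a
further lowercase segment as a mere prefix is an assumption made for illustration.</para>
<programlisting><![CDATA[import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of whole-name package migration; NOT the real UIMA migration utility. */
class PackageRenameSketch {

    // Hypothetical subset of an old-to-new package table, for illustration only.
    static final Map<String, String> PACKAGE_MAP = new LinkedHashMap<>();
    static {
        PACKAGE_MAP.put("com.ibm.uima.reference_impl.analysis_engine",
                        "org.apache.uima.analysis_engine.impl");
        PACKAGE_MAP.put("com.ibm.uima.analysis_engine", "org.apache.uima.analysis_engine");
        PACKAGE_MAP.put("com.ibm.uima", "org.apache.uima");
    }

    /** Replaces complete package names only; a name followed by a lowercase segment is left alone. */
    static String migrate(String source) {
        String result = source;
        for (Map.Entry<String, String> entry : PACKAGE_MAP.entrySet()) {
            // Not preceded by an identifier character or '.', and followed by a class name
            // (upper-case letter), a '*' wildcard, a non-identifier character, or end of input.
            Pattern pattern = Pattern.compile(
                "(?<![\\w.])" + Pattern.quote(entry.getKey()) + "(?=\\.[A-Z*]|[^\\w.]|$)");
            result = pattern.matcher(result).replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: import org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase;
        System.out.println(migrate("import com.ibm.uima.reference_impl.analysis_engine.AnalysisEngineImplBase;"));
        // prints: import org.apache.uima.UIMAFramework;
        System.out.println(migrate("import com.ibm.uima.UIMAFramework;"));
        // prints the input unchanged: com.ibm.uima.myproject is not a known package name
        System.out.println(migrate("import com.ibm.uima.myproject.MyAnnotator;"));
    }
}]]></programlisting>
<para>Because each lookahead requires a class name, wildcard, or delimiter after the matched name, the order of the
table entries does not matter for these inputs; a real tool would also need to skip comments and string
literals, which this sketch does not attempt.</para>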
</section> | |
<section id="ugr.project_overview.migration_utility.xml_descriptor_changes"> | |
<title>XML Descriptor Changes</title> | |
<para>The XML namespace in UIMA component descriptors has changed from | |
<literal>http://uima.watson.ibm.com/resourceSpecifier</literal> to | |
<literal>http://uima.apache.org/resourceSpecifier</literal>. The value of the | |
<literal>&lt;frameworkImplementation&gt;</literal> element must now be
<literal>org.apache.uima.java</literal> or <literal>org.apache.uima.cpp</literal>. The | |
migration script will apply these replacements. </para> | |
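<para>To check whether a descriptor has already been migrated, a small program can verify both changes. The sketch
below uses only the JDK's DOM parser; the class and method names are hypothetical, and it checks only the first
<literal>&lt;frameworkImplementation&gt;</literal> element it finds.</para>
<programlisting><![CDATA[import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: has a descriptor already been migrated to the Apache UIMA conventions? */
class DescriptorCheckSketch {

    static final String APACHE_NS = "http://uima.apache.org/resourceSpecifier";
    static final Set<String> ALLOWED_IMPLS = Set.of("org.apache.uima.java", "org.apache.uima.cpp");

    /** True if the root element uses the Apache namespace and the first frameworkImplementation is allowed. */
    static boolean isApacheDescriptor(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getNamespaceURI() is populated
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            if (!APACHE_NS.equals(doc.getDocumentElement().getNamespaceURI())) {
                return false; // e.g. still http://uima.watson.ibm.com/resourceSpecifier
            }
            NodeList impls = doc.getElementsByTagNameNS(APACHE_NS, "frameworkImplementation");
            return impls.getLength() > 0
                    && ALLOWED_IMPLS.contains(impls.item(0).getTextContent().trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }

    public static void main(String[] args) {
        String migrated = "<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
                + "<frameworkImplementation>org.apache.uima.java</frameworkImplementation>"
                + "</analysisEngineDescription>";
        String notMigrated = migrated.replace("http://uima.apache.org/resourceSpecifier",
                                              "http://uima.watson.ibm.com/resourceSpecifier");
        System.out.println(isApacheDescriptor(migrated));    // true
        System.out.println(isApacheDescriptor(notMigrated)); // false
    }
}]]></programlisting>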
</section> | |
<section id="ugr.project_overview.migration_utility.tcas_replaced_by_cas"> | |
<title>TCAS replaced by CAS</title> | |
<para>In Apache UIMA the <literal>TCAS</literal> interface has been removed. All uses of it must now be | |
replaced by the <literal>CAS</literal> interface. (All methods that used to be defined on | |
<literal>TCAS</literal> were moved to <literal>CAS</literal> in v2.0.) The method | |
<literal>CAS.getTCAS()</literal> is replaced with <literal>CAS.getCurrentView()</literal> and | |
<literal>CAS.getTCAS(String)</literal> is replaced with <literal>CAS.getView(String)</literal>.
The following have also been removed and replaced with the equivalent "CAS" variants:
<literal>TCASException</literal>, <literal>TCASRuntimeException</literal>, | |
<literal>TCasPool</literal>, and <literal>CasCreationUtils.createTCas(...)</literal>. </para> | |
<para>The migration script will apply the necessary replacements.</para> | |
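<para>Most of these replacements are simple renames, but <literal>getTCAS</literal> maps to two different methods
depending on whether it takes an argument. The sketch below shows that distinction, assuming regular-expression
rewriting of Java source text and assuming each "CAS" variant is the original name with the leading
<literal>T</literal> removed. The rule table is illustrative, not the migration script's own.</para>
<programlisting><![CDATA[import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the TCAS-to-CAS source rewrites; NOT the actual migration script. */
class TcasRewriteSketch {

    // Ordered rules (regex -> replacement). More specific patterns come before the bare TCAS rule.
    static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("\\.getTCAS\\(\\s*\\)"), ".getCurrentView()"); // no-argument form
        RULES.put(Pattern.compile("\\.getTCAS\\("), ".getView(");               // getTCAS(String)
        RULES.put(Pattern.compile("\\bTCASRuntimeException\\b"), "CASRuntimeException");
        RULES.put(Pattern.compile("\\bTCASException\\b"), "CASException");
        RULES.put(Pattern.compile("\\bTCasPool\\b"), "CasPool");
        RULES.put(Pattern.compile("\\bcreateTCas\\("), "createCas(");
        RULES.put(Pattern.compile("\\bTCAS\\b"), "CAS");
    }

    /** Applies every rule, in order, to a piece of Java source text. */
    static String rewrite(String source) {
        String result = source;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result)
                         .replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("TCAS tcas = cas.getTCAS();"));            // CAS tcas = cas.getCurrentView();
        System.out.println(rewrite("TCAS view = cas.getTCAS(\"Spanish\");"));  // CAS view = cas.getView("Spanish");
        System.out.println(rewrite("} catch (TCASException e) {"));           // } catch (CASException e) {
    }
}]]></programlisting>
<para>Like any text-based rewrite, these rules also change matches inside comments and string literals; the
variable name <literal>tcas</literal> is untouched only because the patterns are case-sensitive.</para>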
</section> | |
<section id="ugr.project_overview.migration_utility.jcas_interface"> | |
<title>JCas Is Now an Interface</title> | |
<para>In previous versions, user code accessed the JCas <emphasis>class</emphasis> directly. In Apache | |
UIMA there is now an interface, <literal>org.apache.uima.jcas.JCas</literal>, which all JCas-based | |
user code must now use. Static methods that were previously on the JCas class (and called from JCas cover | |
classes generated by JCasGen) have been moved to the new | |
<literal>org.apache.uima.jcas.JCasRegistry</literal> class. The migration script will apply the | |
necessary replacements to your code, including any JCas cover classes that are part of your codebase. | |
</para> | |
</section> | |
<section id="ugr.project_overview.migration_utility.jar_files"> | |
<title>JAR File Names Have Changed</title>
<para>The UIMA JAR file names have changed slightly. Underscores have been replaced with hyphens to | |
be consistent with Apache naming conventions. For example <literal>uima_core.jar</literal> is now | |
<literal>uima-core.jar</literal>. Also <literal>uima_jcas_builtin_types.jar</literal> has been | |
renamed to <literal>uima-document-annotation.jar</literal>. Finally, the <literal>jVinci.jar</literal> | |
file is now in the <literal>lib</literal> directory rather than the <literal>lib/vinci</literal> | |
directory as was previously the case. The migration script will apply the necessary replacements, | |
for example to script files or Eclipse launch configurations. (See <xref | |
linkend="ugr.project_overview_running_the_migration_utility"/> for a list of file extensions that | |
the migration utility will process by default.) | |
</para> | |
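<para>As an illustration of these renames, the sketch below rewrites a classpath string. It assumes forward-slash
paths and generalizes the underscore-to-hyphen rule to every <literal>uima_*.jar</literal> name; both are
assumptions, and the class name is hypothetical.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of the JAR renames applied to a classpath; NOT the migration utility. */
class JarRenameSketch {

    /** Rewrites one classpath entry. Assumes '/' as the directory separator. */
    static String renameEntry(String entry) {
        // jVinci.jar moved from lib/vinci up into lib.
        String oldVinci = "lib/vinci/jVinci.jar";
        if (entry.endsWith(oldVinci)) {
            return entry.substring(0, entry.length() - oldVinci.length()) + "lib/jVinci.jar";
        }
        int slash = entry.lastIndexOf('/');
        String dir = entry.substring(0, slash + 1);
        String name = entry.substring(slash + 1);
        if (name.equals("uima_jcas_builtin_types.jar")) {
            name = "uima-document-annotation.jar"; // a real rename, not just punctuation
        } else if (name.startsWith("uima_") && name.endsWith(".jar")) {
            name = name.replace('_', '-');         // assumption: applies to every uima_*.jar
        }
        return dir + name;
    }

    /** Applies renameEntry to every entry of a classpath joined with the given separator. */
    static String renameClasspath(String classpath, String separator) {
        List<String> renamed = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            renamed.add(renameEntry(entry));
        }
        return String.join(separator, renamed);
    }

    public static void main(String[] args) {
        String before = "lib/uima_core.jar:lib/uima_jcas_builtin_types.jar:lib/vinci/jVinci.jar";
        // prints: lib/uima-core.jar:lib/uima-document-annotation.jar:lib/jVinci.jar
        System.out.println(renameClasspath(before, ":"));
    }
}]]></programlisting>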
</section> | |
<section id="ugr.ovv.search_engine_repackaged"> | |
<title>Semantic Search Engine Repackaged</title> | |
<para>The versions of the UIMA SDK prior to the move into Apache came with a semantic search engine. The Apache
version does not include this search engine. The search engine has been repackaged and is separately
available from <ulink url="http://www.alphaworks.ibm.com/tech/uima"/>. The intent is to integrate, over
time, with other open source search engines, such as Apache Lucene.</para>
</section> | |
</section> | |
<section id="ugr.project_overview_changes_from_v1"> | |
<title>Changes from UIMA Version 1.x</title> | |
<para>Version 2.x of UIMA provides new capabilities and refines several areas of the UIMA | |
architecture, as compared with version 1.</para> | |
<section id="ugr.project_overview_new_capabilities"> | |
<title>New Capabilities</title> | |
<formalpara id="ugr.project_overview_new_data_types"> | |
<title>New Primitive data types</title> | |
<para>UIMA now supports Boolean (bit), Byte, Short (16 bit integers), Long (64 bit | |
integers), and Double (64 bit floating point) primitive types, and arrays of | |
these. These types can be used like all the other primitive types.</para> | |
</formalpara> | |
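<para>Each of the new numeric primitive types has the same size as the Java primitive of the same name, so values
must fit that Java type's range. The sketch below, which assumes the built-in type names
<literal>uima.cas.Boolean</literal>, <literal>uima.cas.Byte</literal>, and so on, records the widths and checks
whether a value fits a Short feature; the class and method names are hypothetical.</para>
<programlisting><![CDATA[import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch relating the new UIMA primitive types to the Java primitives of the same size. */
class PrimitiveTypesSketch {

    // UIMA type name -> size in bits, taken from the corresponding Java wrapper class.
    // Boolean carries one bit of information; Java defines no storage size for boolean.
    static final Map<String, Integer> WIDTHS = new LinkedHashMap<>();
    static {
        WIDTHS.put("uima.cas.Boolean", 1);
        WIDTHS.put("uima.cas.Byte", Byte.SIZE);     // 8
        WIDTHS.put("uima.cas.Short", Short.SIZE);   // 16
        WIDTHS.put("uima.cas.Long", Long.SIZE);     // 64
        WIDTHS.put("uima.cas.Double", Double.SIZE); // 64
    }

    /** True if the value can be stored in a Short (16-bit signed) feature without overflow. */
    static boolean fitsInShort(long value) {
        return value >= Short.MIN_VALUE && value <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        WIDTHS.forEach((type, bits) -> System.out.println(type + " : " + bits + " bits"));
        System.out.println(fitsInShort(30000)); // true
        System.out.println(fitsInShort(40000)); // false
    }
}]]></programlisting>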
<formalpara id="ugr.ovv.simpler_aes_and_cases"> | |
<title>Simpler Analysis Engines and CASes</title> | |
<para>Version 1.x made a distinction between Analysis Engines and Text Analysis | |
Engines. This distinction has been eliminated in Version 2 - new code should just | |
refer to Analysis Engines. Analysis Engines can operate on multiple kinds of | |
artifacts, including text.</para> | |
</formalpara> | |
<formalpara id="ugr.ovv.sofas_and_cas_views_simplified"> | |
<title>Sofas and CAS Views simplified</title> | |
<para>The APIs for manipulating multiple subjects of analysis (Sofas) and their | |
corresponding CAS Views have been simplified.</para> | |
</formalpara> | |
<formalpara id="ugr.ovv.ae_support_multiple_new_cases"> | |
<title>Analysis Component generalized to support multiple new CAS | |
outputs</title> | |
<para>Analysis Components, in general, can make use of new capabilities to return | |
multiple new CASes, in addition to returning the original CAS that is passed in. | |
This allows components to have Collection Reader-like capabilities, but be | |
placed anywhere in the flow. See <olink
targetdoc="&uima_docs_tutorial_guides;"/> <olink
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>.</para>
</formalpara> | |
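<para>The following is a minimal sketch of an Analysis Component that returns new CASes, written against
<literal>JCasMultiplier_ImplBase</literal>. The class name <literal>ParagraphSplitter</literal> and the rule of
splitting on blank lines are hypothetical, chosen only for illustration. The framework calls
<literal>process</literal> once per input CAS and then calls <literal>next</literal> for as long as
<literal>hasNext</literal> returns true. As an assumption to check against the CAS Multiplier chapter referenced
above, the component's descriptor must also declare that it outputs new CASes (the
<literal>outputsNewCASes</literal> operational property).</para>
<programlisting><![CDATA[import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.AbstractCas;
import org.apache.uima.jcas.JCas;

// Hypothetical example: emits one new CAS per blank-line-separated paragraph.
public class ParagraphSplitter extends JCasMultiplier_ImplBase {
  private String[] paragraphs = new String[0];
  private int nextIndex = 0;

  @Override
  public void process(JCas aJCas) throws AnalysisEngineProcessException {
    // Called once per input CAS: remember the pieces that next() will emit.
    String text = aJCas.getDocumentText();
    paragraphs = (text == null) ? new String[0] : text.split("\n\\s*\n");
    nextIndex = 0;
  }

  @Override
  public boolean hasNext() throws AnalysisEngineProcessException {
    // The framework keeps calling next() while this returns true.
    return nextIndex < paragraphs.length;
  }

  @Override
  public AbstractCas next() throws AnalysisEngineProcessException {
    // getEmptyJCas() obtains a new, empty CAS from the framework.
    JCas newJCas = getEmptyJCas();
    newJCas.setDocumentText(paragraphs[nextIndex++]);
    return newJCas;
  }
}]]></programlisting>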
<formalpara id="ugr.ovv.user_customized_fc"> | |
<title>User-customized Flow Controllers</title> | |
<para>A new component, the Flow Controller, can be supplied by the user to implement | |
arbitrary flow control for CASes within an Aggregate. This is in addition to the two | |
built-in flow control choices of linear and language-capability flow. See <olink
targetdoc="&uima_docs_tutorial_guides;"/> <olink
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.fc"/>.</para>
</formalpara> | |
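<para>As a rough sketch of a user-supplied Flow Controller, the following hypothetical class routes each CAS
through every delegate of the Aggregate exactly once and then ends the flow. It assumes the
<literal>JCasFlowController_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal> base classes and the
delegate map returned by <literal>FlowControllerContext.getAnalysisEngineMetaDataMap()</literal>. The visiting
order here is simply the iteration order of that map; a real Flow Controller would normally control the order
explicitly.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.flow.FinalStep;
import org.apache.uima.flow.Flow;
import org.apache.uima.flow.FlowControllerContext;
import org.apache.uima.flow.JCasFlowController_ImplBase;
import org.apache.uima.flow.JCasFlow_ImplBase;
import org.apache.uima.flow.SimpleStep;
import org.apache.uima.flow.Step;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

// Hypothetical example: visit each delegate once, in map iteration order.
public class VisitEachDelegateFlowController extends JCasFlowController_ImplBase {
  private final List<String> delegateKeys = new ArrayList<String>();

  @Override
  public void initialize(FlowControllerContext aContext) throws ResourceInitializationException {
    super.initialize(aContext);
    // Keys of the delegate analysis engines declared in the aggregate descriptor.
    delegateKeys.addAll(aContext.getAnalysisEngineMetaDataMap().keySet());
  }

  @Override
  public Flow computeFlow(JCas aJCas) throws AnalysisEngineProcessException {
    // One Flow object per CAS; it tracks where that CAS goes next.
    return new VisitEachFlow(delegateKeys);
  }

  static class VisitEachFlow extends JCasFlow_ImplBase {
    private final List<String> keys;
    private int position = 0;

    VisitEachFlow(List<String> keys) {
      this.keys = keys;
    }

    @Override
    public Step next() throws AnalysisEngineProcessException {
      if (position < keys.size()) {
        return new SimpleStep(keys.get(position++)); // send the CAS to this delegate
      }
      return new FinalStep(); // no delegates left: this CAS is finished
    }
  }
}]]></programlisting>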
</section> | |
<section id="ugr.ovv.other_changes"> | |
<title>Other Changes</title> | |
<formalpara> | |
<title>New Annotator ImplBase classes</title>
<para> | |
As of version 2.1, UIMA has a new set of Annotator interfaces. Annotators should now | |
extend CasAnnotator_ImplBase or JCasAnnotator_ImplBase instead of the v1.x | |
TextAnnotator_ImplBase and JTextAnnotator_ImplBase. The v1.x annotator | |
interfaces are unchanged and are still supported for backwards compatibility. | |
</para> | |
</formalpara> | |
<para>
The new Annotator interfaces support the changed approach to ResultSpecifications
and the renamed exceptions (see below), and have all the methods that CAS Consumers
have, including <literal>collectionProcessComplete</literal> and <literal>batchProcessComplete</literal>.</para>
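<para>A minimal sketch of an annotator written against the new base classes might look like the following.
The class name <literal>CountingAnnotator</literal> and its document counter are hypothetical; the point is that
<literal>initialize</literal>, <literal>process</literal>, and <literal>collectionProcessComplete</literal> are
all available on <literal>JCasAnnotator_ImplBase</literal>, and that <literal>getContext()</literal> returns the
<literal>UimaContext</literal>.</para>
<programlisting><![CDATA[import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.util.Level;

// Hypothetical example annotator using the v2.1+ JCasAnnotator_ImplBase.
public class CountingAnnotator extends JCasAnnotator_ImplBase {
  private int documentsSeen;

  @Override
  public void initialize(UimaContext aContext) throws ResourceInitializationException {
    // Calling super keeps the context available through getContext().
    super.initialize(aContext);
    documentsSeen = 0;
  }

  @Override
  public void process(JCas aJCas) throws AnalysisEngineProcessException {
    // Called once per CAS; replaces the v1.x TextAnnotator/JTextAnnotator process method.
    documentsSeen++;
  }

  @Override
  public void collectionProcessComplete() throws AnalysisEngineProcessException {
    // In v1.x this callback was available only to CAS Consumers.
    getContext().getLogger().log(Level.INFO, "Documents processed: " + documentsSeen);
  }

  int getDocumentsSeen() {
    return documentsSeen;
  }
}]]></programlisting>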
<formalpara id="ugr.ovv.exceptions_rationalized"> | |
<title>UIMA Exceptions rationalized</title> | |
<para>In version 1 there were different exceptions for the methods of an | |
AnalysisEngine and for the corresponding methods of an Annotator; these were merged | |
in version 2. | |
<itemizedlist spacing="compact"> | |
<listitem><para>AnnotatorProcessException (v1) → | |
AnalysisEngineProcessException (v2)</para></listitem> | |
<listitem><para>AnnotatorInitializationException (v1) → | |
ResourceInitializationException (v2)</para></listitem> | |
<listitem><para>AnnotatorConfigurationException (v1) → | |
ResourceConfigurationException (v2)</para></listitem> | |
<listitem><para>AnnotatorContextException (v1) → | |
ResourceAccessException (v2)</para></listitem> | |
</itemizedlist> The previous exceptions are still available, but new code should | |
use the new exceptions.</para> | |
</formalpara> | |
<note><para>The <quote>throws</quote> clause of <literal>typeSystemInit</literal> now declares AnalysisEngineProcessException.
For Annotators that extend the v1.x base classes, the previous definition of typeSystemInit continues to
work for backwards compatibility.
</para></note> | |
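<para>A sketch of a <literal>typeSystemInit</literal> override with the version 2 signature follows. It assumes a
<literal>CasAnnotator_ImplBase</literal> subclass and a hypothetical annotation type named
<literal>example.Token</literal> that is declared in the component's type system; the class name is also
hypothetical.</para>
<programlisting><![CDATA[import org.apache.uima.analysis_component.CasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.TypeSystem;
import org.apache.uima.util.Level;

// Hypothetical example: resolve a type once per type system rather than once per CAS.
public class TokenTypeResolvingAnnotator extends CasAnnotator_ImplBase {
  // Assumed annotation type name, for illustration only.
  static final String TOKEN_TYPE_NAME = "example.Token";

  private Type tokenType;

  @Override
  public void typeSystemInit(TypeSystem aTypeSystem) throws AnalysisEngineProcessException {
    super.typeSystemInit(aTypeSystem);
    tokenType = aTypeSystem.getType(TOKEN_TYPE_NAME); // null if the type is not declared
    if (tokenType == null) {
      // Version 2 signature: problems here are reported as AnalysisEngineProcessException.
      throw new AnalysisEngineProcessException(
          new IllegalStateException("Type not found in type system: " + TOKEN_TYPE_NAME));
    }
  }

  @Override
  public void process(CAS aCAS) throws AnalysisEngineProcessException {
    // The base class calls typeSystemInit before process when the type system changes,
    // so tokenType has been resolved by the time this runs.
    int tokenCount = aCAS.getAnnotationIndex(tokenType).size();
    getContext().getLogger().log(Level.FINE, "Tokens in this CAS: " + tokenCount);
  }
}]]></programlisting>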
<formalpara id="ugr.ovv.result_specification"> | |
<title>Changes in Result Specifications</title> | |
<para>In version 1, the <literal>process(...)</literal> method took a second
argument, a ResultSpecification. In version 2, the ResultSpecification is instead
passed to the annotator only when it changes, and it is up to the annotator to store
it in a local field and make it available when needed.
This approach gives the annotator a specific signal (a method call) when
the Result Specification changes; previously, it had to check on every call to
see whether it had changed. The default impl base classes provide
<literal>set/getResultSpecification(...)</literal> methods for this.</para>
</formalpara> | |
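<para>The following sketch shows an annotator reacting to the Result Specification signal instead of checking it
on every <literal>process</literal> call. The class name and the type name <literal>example.Person</literal> are
hypothetical, and treating a missing (null) Result Specification as <quote>produce everything</quote> is an
assumption of this sketch, not documented framework behavior.</para>
<programlisting><![CDATA[import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.analysis_engine.ResultSpecification;
import org.apache.uima.jcas.JCas;

// Hypothetical example: recompute a flag only when the Result Specification changes.
public class ResultSpecAwareAnnotator extends JCasAnnotator_ImplBase {
  // Assumed output type name, for illustration only.
  static final String PERSON_TYPE_NAME = "example.Person";

  private boolean personsRequested = true;

  @Override
  public void setResultSpecification(ResultSpecification aResultSpec) {
    // Let the base class keep its own copy as well.
    super.setResultSpecification(aResultSpec);
    // Assumption of this sketch: no Result Specification means "produce everything".
    personsRequested = aResultSpec == null || aResultSpec.containsType(PERSON_TYPE_NAME);
  }

  @Override
  public void process(JCas aJCas) throws AnalysisEngineProcessException {
    if (!personsRequested) {
      return; // nothing downstream asked for example.Person annotations
    }
    // ... create example.Person annotations here ...
  }

  boolean isPersonsRequested() {
    return personsRequested;
  }
}]]></programlisting>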
<formalpara id="ugr.ovv.one_capability_set"> | |
<title>Only one Capability Set</title> | |
<para>In version 1, you could define
multiple capability sets. These were not well supported, and version 2
simplifies this: you should use only one capability set.
(For backwards compatibility, using more than one will not cause a problem
for now.)</para>
</formalpara> | |
<formalpara> | |
<title>TextAnalysisEngine deprecated; use AnalysisEngine instead</title> | |
<para>TextAnalysisEngine has been deprecated; it is now no different from
AnalysisEngine. Existing code that uses it should continue to work,
however.</para></formalpara>
<formalpara> | |
<title>Annotator Context deprecated; use UimaContext instead</title> | |
<para>The context for the Annotator is the same as the overall UIMA context.
The impl base classes provide a getContext() method, which now returns the
UimaContext object.</para>
</formalpara> | |
<formalpara> | |
<title>DocumentAnalyzer tool uses XMI formats</title> | |
<para>The DocumentAnalyzer tool saves outputs in the new XMI serialization format. | |
The AnnotationViewer and SemanticSearchGUI tools can read both the new XMI format | |
and the previous XCAS format.</para></formalpara> | |
<formalpara> | |
<title>CAS Initializer deprecated</title> | |
<para>The example code that used CAS Initializers has been rewritten so that it no longer uses them.</para>
</formalpara> | |
</section> | |
<section id="ugr.project_overview_backwards_compatibility"> | |
<title>Backwards Compatibility</title> | |
<para>Other than the changes from IBM UIMA to Apache UIMA described above, most UIMA 1.x | |
applications should not require additional changes to upgrade to UIMA 2.x. However, | |
there are a few exceptions that UIMA 1.x users may need to be aware of: | |
<itemizedlist> | |
<listitem> | |
<para> There have been some changes to ResultSpecifications. We do not | |
guarantee 100% backwards compatibility for applications that made use of | |
them, although most cases should work. </para> | |
</listitem> | |
<listitem> | |
<para> For applications that deal with multiple subjects of analysis (Sofas), | |
the rules that determine whether a component is Multi-View or Single-View | |
have been made more consistent. A component is considered Multi-View if and | |
only if it declares at least one inputSofa or outputSofa in its descriptor. | |
This leads to the following incompatibilities in unusual cases: | |
<itemizedlist> | |
<listitem> | |
<para> It is an error if an annotator that implements the TextAnnotator or | |
JTextAnnotator interface also declares inputSofas or outputSofas in | |
its descriptor. Such annotators must be Single-View. </para> | |
</listitem> | |
<listitem> | |
<para> Annotators that implement GenericAnnotator but do not declare | |
any inputSofas or outputSofas will now be passed the view of the default
Sofa instead of the Base CAS. </para>
</listitem> | |
<listitem> | |
<para> As of version 2.7.0, all annotators will be passed the view of | |
the default Sofa. </para> | |
</listitem> | |
</itemizedlist> </para> | |
</listitem> | |
</itemizedlist> </para> | |
</section> | |
</section> | |
</section> | |
<section id="ugr.project_overview_migrating_from_ibm_uima"> | |
<title>Migrating from IBM UIMA to Apache UIMA</title> | |
<para>In Apache UIMA, several things have changed that require changes to user code and descriptors. | |
A migration utility is provided which will make the required updates to your files. The most | |
significant change is that the Java package names for all of the UIMA classes and interfaces have changed | |
from what they were in IBM UIMA; all of the package names now start with the prefix <literal>org.apache</literal>.</para> | |
<section id="ugr.project_overview_running_the_migration_utility"> | |
<title>Running the Migration Utility</title> | |
<note> | |
<para>Before running the migration utility, be sure to back up your files, just in case you encounter any | |
problems, because the migration tool updates the files in place in the directories where it finds them.</para> | |
</note> | |
<para> The migration utility is run by executing the script file | |
<literal>apache-uima/bin/ibmUimaToApacheUima.bat</literal> (Windows) or | |
<literal>apache-uima/bin/ibmUimaToApacheUima.sh</literal> (UNIX). You must pass one argument: the | |
directory containing the files that you want to be migrated. Subdirectories will be processed | |
recursively.</para> | |
<para>The script scans your files and applies the necessary updates, for example replacing the com.ibm | |
package names with the new org.apache package names. For more details on what has changed in the UIMA APIs and | |
what changes are performed by the migration script, see <xref linkend="ugr.project_overview_changes_from_2_0"/>.</para> | |
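<para>Conceptually, the package-name part of the migration is a prefix rewrite. The following Java sketch is a
hypothetical simplification, not the script's implementation: the class name is invented, and it handles only
the <literal>com.ibm.uima</literal> to <literal>org.apache.uima</literal> prefix, whereas the actual script
applies other updates as well.</para>
<programlisting><![CDATA[import java.util.regex.Pattern;

// Hypothetical sketch of a package-prefix rewrite; not the migration script's code.
final class PackageRenameSketch {
  // Match "com.ibm.uima" only as a whole name, not as part of e.g. "com.ibm.uimax".
  private static final Pattern OLD_PREFIX = Pattern.compile("\\bcom\\.ibm\\.uima\\b");
  private static final String NEW_PREFIX = "org.apache.uima";

  private PackageRenameSketch() {
  }

  static String migrateLine(String line) {
    return OLD_PREFIX.matcher(line).replaceAll(NEW_PREFIX);
  }
}]]></programlisting>
<para>A blind prefix rewrite like this also changes imports of user classes that live in a package named like a
UIMA package, which is the situation described in <xref
linkend="ugr.project_overview.manual_migration_needed.uima_package_names_in_user_code"/>.</para>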
<para>The script will only attempt to modify files with the extensions: java, xml, xmi, wsdd, properties, | |
launch, bat, cmd, sh, ksh, or csh; and files with no extension. Also, files with size greater than 1,000,000 | |
bytes will be skipped. (If you want the script to modify files with other extensions, you can edit the script | |
file and change the <literal>-ext</literal> argument appropriately.) </para> | |
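<para>The selection rule above can be restated as a small predicate. This is a hypothetical sketch for checking
which of your files the script would consider, not code taken from the script; the class name is invented, and
the exact, case-sensitive extension matching is an assumption.</para>
<programlisting><![CDATA[import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical restatement of the documented file-selection rule.
final class MigrationFileFilter {
  private static final Set<String> EXTENSIONS = new HashSet<String>(Arrays.asList(
      "java", "xml", "xmi", "wsdd", "properties", "launch",
      "bat", "cmd", "sh", "ksh", "csh"));
  private static final long MAX_SIZE_BYTES = 1000000L;

  private MigrationFileFilter() {
  }

  static boolean wouldProcess(String fileName, long sizeInBytes) {
    if (sizeInBytes > MAX_SIZE_BYTES) {
      return false; // files larger than 1,000,000 bytes are skipped
    }
    int dot = fileName.lastIndexOf('.');
    if (dot < 0) {
      return true; // files with no extension are processed
    }
    return EXTENSIONS.contains(fileName.substring(dot + 1));
  }
}]]></programlisting>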
<para>If the migration tool reports warnings, there may be a few additional steps to take. The following two | |
sections explain some simple manual changes that you might need to make to your code.</para> | |
<section id="ugr.project_overview_running_the_migration_utility.jcas_for_document_annotation"> | |
<title>JCas Cover Classes for DocumentAnnotation</title> | |
<para> If you have run JCasGen, it is likely that you have the classes
<literal>com.ibm.uima.jcas.tcas.DocumentAnnotation</literal> and
<literal>com.ibm.uima.jcas.tcas.DocumentAnnotation_Type</literal> as part of your code. This
package name is no longer valid, and because the migration utility does not move your files between
directories, it is unable to fix this. </para>
<para> If you have not made manual modifications to these classes, the best solution is usually to just delete
these two classes (and their containing package). There is a default version in the
<literal>uima-document-annotation.jar</literal> file that is included in Apache UIMA. If you
<emphasis>have</emphasis> made custom changes, then you should not delete the files but instead move them to
the correct package <literal>org.apache.uima.jcas.tcas</literal>. For more information about JCas
and DocumentAnnotation, please see <olink targetdoc="&uima_docs_ref;"/>
<olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.jcas.documentannotation_issues"/>.</para>
</section> | |
<section id="ugr.project_overview_running_the_migration_utility.manual_migration_needed.getdocumentannotation"> | |
<title>JCas.getDocumentAnnotation</title> | |
<para>The deprecated method <literal>JCas.getDocumentAnnotation</literal> has been removed. Its use | |
must be replaced with <literal>JCas.getDocumentAnnotationFs</literal>. The method | |
<literal>JCas.getDocumentAnnotationFs()</literal> returns type <literal>TOP</literal>, so your | |
code must cast this to type <literal>DocumentAnnotation</literal>. The reasons for this are described | |
in <olink targetdoc="&uima_docs_ref;"/> | |
<olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.jcas.documentannotation_issues"/>. | |
</para> | |
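<para>A minimal sketch of the replacement, written as a hypothetical helper (the class and method names are
invented for illustration) that reads the document language through the cast
<literal>DocumentAnnotation</literal>:</para>
<programlisting><![CDATA[import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.DocumentAnnotation;

// Hypothetical helper illustrating the replacement for JCas.getDocumentAnnotation().
public final class DocumentAnnotationAccess {
  private DocumentAnnotationAccess() {
    // static helpers only
  }

  public static String documentLanguage(JCas aJCas) {
    // getDocumentAnnotationFs() is declared to return TOP, so an explicit cast is required.
    DocumentAnnotation docAnnotation = (DocumentAnnotation) aJCas.getDocumentAnnotationFs();
    return docAnnotation.getLanguage();
  }
}]]></programlisting>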
</section> | |
</section> | |
<section id="ugr.project_overview_rare_migration"> | |
<title>Manual Migration</title> | |
<para>The following are rare cases where you may need to take additional steps to migrate your code. You need only | |
read this section if the migration tool reported a warning or if you are having trouble getting your code to | |
compile or run after running the migration. For most users, attention to these things will not | |
be required.</para> | |
<section id="ugr.project_overview.manual_migration_needed.xiinclude"> | |
<title>xi:include</title> | |
<para>The use of <xi:include> in UIMA component descriptors has been discouraged for some time, and in | |
Apache UIMA support for it has been removed. If you have descriptors that use it, you must change them to
use UIMA's <import> syntax instead. The proper syntax is described in <olink | |
targetdoc="&uima_docs_ref;"/> <olink | |
targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor.imports"/>. | |
</para> | |
</section> | |
<section id="ugr.project_overview.manual_migration_needed.duplicate_methods_cas_tcas"> | |
<title>Duplicate Methods Taking CAS and TCAS as Arguments</title> | |
<para>Because <literal>TCAS</literal> has been replaced by <literal>CAS</literal>, if you had two | |
methods distinguished only by whether an argument type was <literal>TCAS</literal> or | |
<literal>CAS</literal>, the migration tool will cause these to have identical signatures, which will be | |
a compile error. If this happens, consider why the two variants were needed in the first place. Often, it may | |
work to simply delete one of the methods.</para> | |
</section> | |
<section id="ugr.project_overview.manual_migration_needed.undocumented_methods"> | |
<title>Use of Undocumented Methods from the com.ibm.uima.util package</title> | |
<titleabbrev>Undocumented Methods</titleabbrev> | |
<para>Previous UIMA versions had some methods in the <literal>com.ibm.uima.util</literal> package that
were for internal use and were not documented in the Javadoc. (There are also many methods in that package | |
which are documented, and there is no issue with using these.) It is not recommended that you use any of the | |
undocumented methods. If you do, the migration script will not handle them correctly. These have now been | |
moved to <literal>org.apache.uima.internal.util</literal>, and you will have to manually update your | |
imports to point to this location.</para> | |
</section> | |
<section id="ugr.project_overview.manual_migration_needed.uima_package_names_in_user_code"> | |
<title>Use of UIMA Package Names for User Code</title> | |
<titleabbrev>Package Names</titleabbrev> | |
<para>If you have placed your own classes in a package that has exactly the same name as one of the UIMA packages | |
(not recommended), this will cause problems when you run the migration script. Since the script replaces
UIMA package names, all of your imports that refer to your class will get replaced and your code will no | |
longer compile. If this happens, you can fix it by manually moving your code to the new Apache UIMA package | |
name (i.e., whatever name your imports got replaced with). However, we recommend instead that you do not | |
use Apache UIMA package names for your own code.</para> | |
<para>An even more rare case would be if you had a package name that started with a capital letter (poor Java | |
style) AND was prefixed by one of the UIMA package names, for example a package named | |
<literal>com.ibm.uima.MyPackage</literal>. This would be treated as a class name and replaced with | |
<literal>org.apache.uima.MyPackage</literal> wherever it occurs.</para> | |
</section> | |
<section id="ugr.project_overview.manual_migration_needed.exceptions_extend_uima_exceptions"> | |
<title>CASException and CASRuntimeException now extend UIMA(Runtime)Exception</title> | |
<titleabbrev>Changes to CAS Exceptions</titleabbrev> | |
<para> | |
This change may affect user code to a small extent, as some of the APIs on | |
<literal>CASException</literal> and <literal>CASRuntimeException</literal> no longer exist. | |
On the up side, all UIMA exceptions are now derived from the same base classes and behave | |
the same way. The most significant change is that you can no longer check for the specific | |
type of exception the way you used to. For example, if you had code like this: | |
<programlisting>catch (CASRuntimeException e) {
  // Old (IBM UIMA) style: compare the error code returned by getError()
  if (e.getError() == CASRuntimeException.ILLEGAL_ARRAY_SIZE) {
    // Do something in case this particular error is caught
  }
}</programlisting>
you will need to replace it with the following:
<programlisting>catch (CASRuntimeException e) {
  // Message keys are Strings; comparing constant-first is null-safe
  if (CASRuntimeException.ILLEGAL_ARRAY_SIZE.equals(e.getMessageKey())) {
    // Do something in case this particular error is caught
  }
}</programlisting>
as the message keys are now strings. This change is not handled by the migration script.
</para> | |
</section> | |
</section> | |
</section> | |
<section id="ugr.project_overview_summary"> | |
<title>Apache UIMA Summary</title> | |
<section id="ugr.ovv.summary.general"> | |
<title>General</title> | |
<para>UIMA supports the development, discovery, composition and deployment of multi-modal | |
analytics for the analysis of unstructured information and its integration with search | |
technologies.</para> | |
<para>Apache UIMA includes APIs and tools for creating analysis components. Examples of analysis components include | |
tokenizers, summarizers, categorizers, parsers, named-entity detectors, etc. Tutorial examples are
provided with Apache UIMA; additional components are available from the community. </para> | |
<para>Apache UIMA does not itself include a semantic search engine; instructions are included for
incorporating the semantic search SDK from IBM's <ulink url="http://alphaworks.ibm.com/tech/uima">alphaWorks</ulink>,
which can index the results of
analysis, and for using this semantic index to perform more advanced searches. </para>
</section> | |
<section id="ugr.ovv.summary.programming_language_support"> | |
<title>Programming Language Support</title> | |
<para>UIMA supports the development and integration of analysis algorithms developed in different | |
programming languages. </para> | |
<para>The Apache UIMA project is both a Java framework and a matching C++ | |
enablement layer, which allows annotators to be written in C++ and have access to a C++ version of the CAS. The | |
C++ enablement layer also enables annotators to be written in Perl, Python, and TCL, and to interoperate with | |
those written in other languages. <!--Documentation for this is provided here (link to be filled in).--> | |
</para> | |
</section> | |
<section id="ugr.ovv.general.summary.multi_modal_support"> | |
<title>Multi-Modal Support</title> | |
<para>The UIMA architecture supports the development, discovery, composition and deployment of | |
multi-modal analytics, including text, audio and video. See <olink
targetdoc="&uima_docs_tutorial_guides;"/> <olink
targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/> for more
detail.</para>
</section> | |
<section id="ugr.ovv.summary.general.semantic_search_components"> | |
<title>Semantic Search Components</title> | |
<para> The Lucene search engine as of this writing (November, 2006) does not support searching with | |
annotations. The site <ulink url="http://www.alphaworks.ibm.com/tech/uima"/> provides a download of a | |
semantic search engine, a simple demo query tool, some documentation on the semantic search engine, and a | |
component that connects the results of UIMA analysis to the indexer so that the annotations as well as | |
key-words can be indexed. </para> | |
<para>Previous versions of the UIMA SDK (prior to the Apache versions) are available from <ulink
url="http://www.alphaworks.ibm.com/tech/uima">IBM's alphaWorks</ulink>. The source code for
previous versions of the main UIMA framework is available on <ulink
url="http://uima-framework.sourceforge.net/">SourceForge</ulink>.</para>
</section> | |
</section> | |
<section id="ugr.project_overview_summary_sdk_capabilities"> | |
<title>Summary of Apache UIMA Capabilities</title> | |
<informaltable frame="all" rowsep="1" colsep="1"> | |
<tgroup cols="2"> | |
<colspec colnum="1" colname="col1" colwidth=".75*"/> | |
<colspec colnum="2" colname="col2" colwidth="*"/> | |
<tbody> | |
<row> | |
<entry role="tableSubhead">Module</entry> | |
<entry role="tableSubhead">Description</entry> | |
</row> | |
<row> | |
<entry>UIMA Framework Core</entry> | |
<entry> | |
<para>A framework integrating core functions for creating, deploying, running and managing UIMA | |
components, including analysis engines and Collection Processing Engines in collocated and/or | |
distributed configurations. </para> | |
<para>The framework includes an implementation of core components for transport layer adaptation, | |
CAS management, workflow management based on declarative specifications, resource management, | |
configuration management, logging, and other functions.</para> | |
</entry> | |
</row> | |
<row> | |
<entry>C++ and other programming language Interoperability</entry> | |
<entry> | |
<para>Includes C++ CAS and supports the creation of UIMA compliant C++ components that can be | |
deployed in the UIMA run-time through a built-in JNI adapter. This includes high-speed binary | |
serialization.</para> | |
<para>Includes support for creating service-based UIMA engines. This is ideal for | |
wrapping existing code written in different languages.</para> | |
</entry> | |
</row> | |
<row> | |
<entry role="tableSubhead">Framework Services and APIs</entry> | |
<entry role="tableSubhead">Note that interfaces of these components are available to the developer | |
but different implementations are possible in different implementations of the UIMA | |
framework.</entry> | |
</row> | |
<row> | |
<entry>CAS</entry> | |
<entry>These classes provide the developer with typed access to the Common Analysis Structure (CAS),
including type system schema, elements, subjects of analysis and indices. The multiple subjects of
analysis (Sofa) mechanism supports the independent or simultaneous analysis of multiple views of
the same artifact (e.g., a document), supporting multi-lingual and multi-modal analysis.</entry>
</row> | |
<row> | |
<entry>JCas</entry> | |
<entry>An alternative interface to the CAS, providing Java-based UIMA Analysis components with | |
native Java object access to CAS types and their attributes or features, using the | |
JavaBeans conventions of getters and setters.</entry> | |
</row> | |
<row> | |
<entry>Collection Processing Management (CPM)</entry> | |
<entry>Core functions for running UIMA collection processing engines in collocated and/or | |
distributed configurations. The CPM provides scalability across parallel processing pipelines, | |
check-pointing, performance monitoring and recoverability.</entry> | |
</row> | |
<row> | |
<entry>Resource Manager</entry> | |
<entry>Provides UIMA components with run-time access to external resources, handling capabilities
such as resource naming, sharing, and caching. </entry>
</row> | |
<row> | |
<entry>Configuration Manager</entry> | |
<entry>Provides UIMA components with run-time access to their configuration parameter settings. | |
</entry> | |
</row> | |
<row> | |
<entry>Logger</entry> | |
<entry>Provides access to a common logging facility.</entry> | |
</row> | |
<row> | |
<entry namest="col1" nameend="col2" align="center" role="tableSubhead"> Tools and Utilities | |
</entry> | |
</row> | |
<row> | |
<entry>JCasGen</entry> | |
<entry>Utility for generating a Java object model for CAS types from a UIMA XML type system | |
definition.</entry> | |
</row> | |
<row> | |
<entry>Saving and Restoring CAS contents</entry> | |
<entry>APIs in the core framework support saving and restoring the contents of a CAS to streams using an | |
XMI format. </entry> | |
</row> | |
<row> | |
<entry>PEAR Packager for Eclipse</entry> | |
<entry>Tool for building a UIMA component archive to facilitate porting, registering, installing and | |
testing components.</entry> | |
</row> | |
<row> | |
<entry>PEAR Installer</entry> | |
<entry>Tool for installing and verifying a UIMA component archive in a UIMA installation.</entry> | |
</row> | |
<row> | |
<entry>PEAR Merger</entry> | |
<entry>Utility that combines multiple PEARs into one.</entry> | |
</row> | |
<row> | |
<entry>Component Descriptor Editor</entry> | |
<entry>Eclipse Plug-in for specifying and configuring component descriptors for UIMA analysis | |
engines as well as other UIMA component types including Collection Readers and CAS | |
Consumers.</entry> | |
</row> | |
<row> | |
<entry>CPE Configurator</entry> | |
<entry>Graphical tool for configuring Collection Processing Engines and applying them to | |
collections of documents.</entry> | |
</row> | |
<row> | |
<entry>Java Annotation Viewer</entry> | |
<entry>Viewer for exploring annotations and related CAS data.</entry> | |
</row> | |
<row> | |
<entry>CAS Visual Debugger</entry> | |
<entry>GUI Java application that provides developers with a detailed visual view of the contents of a
CAS.</entry>
</row> | |
<row> | |
<entry>Document Analyzer</entry> | |
<entry>GUI Java application that applies analysis engines to sets of documents and shows results in a | |
viewer.</entry> | |
</row> | |
<row> | |
<entry namest="col1" nameend="col2" align="center" role="tableSubhead"> Example Analysis | |
Components </entry> | |
</row> | |
<row> | |
<entry>Database Writer</entry> | |
<entry>CAS Consumer that writes the content of selected CAS types into a relational database, using | |
JDBC. This code is in cpe/PersonTitleDBWriterCasConsumer. </entry> | |
</row> | |
<row> | |
<entry>Annotators</entry> | |
<entry> Set of simple annotators meant for pedagogical purposes. Includes: Date/time, Room-number, | |
Regular expression, Tokenizer, and Meeting-finder annotator. There are sample CAS Multipliers | |
as well. </entry> | |
</row> | |
<row> | |
<entry>Flow Controllers</entry> | |
<entry> There is a sample flow controller based on the whiteboard concept of sending the CAS to whatever
annotator hasn't yet processed it, when that annotator's inputs are available in the CAS. </entry>
</row> | |
<row> | |
<entry>XMI Collection Reader, CAS Consumer</entry> | |
<entry>Reads and writes the CAS in the XMI format</entry> | |
</row> | |
<row> | |
<entry>File System Collection Reader</entry> | |
<entry> Simple Collection Reader for pulling documents from the file system and initializing CASes. | |
</entry> | |
</row> | |
<row> | |
<entry namest="col1" nameend="col2" align="center" role="tableSubhead"> Components available | |
from <ulink url="http://www.alphaworks.ibm.com/tech/uima"></ulink> </entry> | |
</row> | |
<row> | |
<entry>Semantic Search CAS Indexer</entry> | |
<entry>A CAS Consumer that uses the semantic search engine indexer to build an index from a stream of | |
CASes. Requires the semantic search engine (available from the same place). </entry> | |
</row> | |
</tbody> | |
</tgroup> | |
</informaltable> | |
</section> | |
</chapter> |