 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
 <!ENTITY imgroot "../images/tutorials_and_users_guides/tug.cpe/">
 <!ENTITY % uimaents SYSTEM "../entities.ent">
 %uimaents;
 ]>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <chapter id="ugr.tug.cpe">
   <title>Collection Processing Engine Developer&apos;s Guide</title>
   <titleabbrev>CPE Developer&apos;s Guide</titleabbrev>

   <para>The UIMA Analysis Engine interface provides support for developing and integrating
     algorithms that analyze unstructured data. Analysis Engines are designed to operate on a
     per-document basis. Their interface handles one CAS at a time. UIMA provides additional
     support for applying analysis engines to collections of unstructured data with its
     <emphasis>Collection Processing Architecture</emphasis>. The Collection
     Processing Architecture defines additional components for reading raw data formats
     from data collections, preparing the data for processing by Analysis Engines, executing
     the analysis, extracting analysis results, and deploying the overall flow in a variety of
     local and distributed configurations.</para>

   <para>The functionality defined in the Collection Processing Architecture is
     implemented by a <emphasis>Collection Processing Engine</emphasis> (CPE). A CPE
     includes an Analysis Engine and adds a <emphasis>Collection Reader</emphasis>, a
     <emphasis>CAS Initializer</emphasis> (deprecated as of version 2), and <emphasis>CAS
     Consumers</emphasis>. The part of the UIMA Framework that supports the execution of
     CPEs is called the Collection Processing Manager, or CPM.</para>

   <para>A Collection Reader provides the interface to the raw input data and knows how to
     iterate over the data collection. Collection Readers are discussed in <xref
       linkend="ugr.tug.cpe.collection_reader.developing"/>. The CAS Initializer
     <footnote><para>CAS Initializers are deprecated in favor of a more general mechanism,
     multiple subjects of analysis.</para></footnote> prepares an individual data item for
     analysis and loads it into the CAS. CAS Initializers are discussed in <xref
       linkend="ugr.tug.cpe.cas_initializer.developing"/>. A CAS Consumer extracts
     analysis results from the CAS and may also perform <emphasis>collection level
     processing</emphasis>, or analysis over a collection of CASes. CAS Consumers are
     discussed in <xref linkend="ugr.tug.cpe.cas_consumer.developing"/>.</para>

   <para>Analysis Engines and CAS Consumers are both instances of <emphasis>CAS
     Processors</emphasis>. A Collection Processing Engine (CPE) may contain multiple CAS
     Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate
     (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While
     Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS
     Processor may be deployed in a variety of local and distributed modes, providing a number
     of options for scalability and robustness. The different deployment options are covered
     in detail in <xref linkend="ugr.tug.cpe.deployment_alternatives"/>.</para>

   <para>Each of the components in a CPE has an interface specified by the UIMA Collection
     Processing Architecture and is described by a declarative XML descriptor file.
     Similarly, the CPE itself has a well defined component interface and is described by a
     declarative XML descriptor file.</para>

   <para>A user creates a CPE by assembling the components mentioned above. The UIMA SDK
     provides a graphical tool, called the CPE Configurator, for assisting in the assembly of
     CPEs. Use of this tool is summarized in <xref
       linkend="ugr.tug.cpe.cpe_configurator"/>, and more details can be found in <olink
       targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.
     Alternatively, a CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
     descriptor, including its syntax and content, can be found in the <olink
       targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. The individual
     components have associated XML descriptors, each of which can be created and/or edited
     using the <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde">
     Component Description Editor</olink>.</para>

   <para>A CPE is executed by a UIMA infrastructure component called the
     <emphasis>Collection Processing Manager</emphasis> (CPM). The CPM provides a number
     of services and deployment options that cover instantiation and execution of CPEs, error
     recovery, and local and distributed deployment of the CPE components.</para>

   <section id="ugr.tug.cpe.concepts">
     <title>CPE Concepts</title>

     <para> <xref linkend="ugr.tug.cpe.fig.cpe_components"/> illustrates the data flow
       that occurs between the different types of components that make up a CPE.</para>

     <figure id="ugr.tug.cpe.fig.cpe_components">
       <title>CPE Components</title>
       <mediaobject>
         <imageobject>
           <imagedata width="5.84in" format="PNG"
             fileref="&imgroot;image002.png"/>
         </imageobject>
         <textobject><phrase>CPE Components and flow between them</phrase>
         </textobject>
       </mediaobject>
     </figure>

     <para>The components of a CPE are:</para>

     <itemizedlist><listitem><para><emphasis>Collection Reader &ndash;</emphasis>
       interfaces to a collection of data items (e.g., documents) to be analyzed. Collection
       Readers return CASes that contain the documents to analyze, possibly along with
       additional metadata.</para></listitem>

       <listitem><para><emphasis>Analysis Engine &ndash;</emphasis> takes a CAS,
         analyzes its contents, and produces an enriched CAS. Analysis Engines can be
         recursively composed of other Analysis Engines (called an
         <emphasis>Aggregate</emphasis> Analysis Engine). Aggregates may also contain
         CAS Consumers.</para></listitem>

       <listitem><para><emphasis>CAS Consumer &ndash;</emphasis> consumes the enriched
         CAS that was produced by the sequence of Analysis Engines before it, and produces an
         application-specific data structure, such as a search engine index or database.
         </para></listitem></itemizedlist>

     <para>A fourth type of component, the <emphasis>CAS Initializer</emphasis>, may be
       used by a Collection Reader to populate a CAS from a document. However, as of UIMA
       version 2, CAS Initializers are deprecated in favor of a more general mechanism,
       multiple Subjects of Analysis.</para>

     <para>The Collection Processing Manager orchestrates the data flow
       within a CPE, monitors status, optionally manages the life-cycle of internal
       components and collects statistics.</para>
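
     <para>The data flow described above can be pictured with a small, self-contained Java
       sketch. It does not use the UIMA API: <literal>ToyCas</literal> and the functional
       stages are hypothetical stand-ins for the CAS, Collection Reader, Analysis Engine, and
       CAS Consumer roles, and the loop in <literal>run</literal> only approximates what the
       CPM does (no threading, error handling, or statistics).</para>

     <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Toy model of the CPE data flow; none of these types are UIMA classes.
final class ToyCpeFlow {

  // Hypothetical stand-in for a CAS: document text plus analysis results.
  static final class ToyCas {
    final String text;
    final List<String> annotations = new ArrayList<>();

    ToyCas(String text) {
      this.text = text;
    }
  }

  // Plays the CPM's role: pull each item from the reader, pass it through
  // the analysis stages in order, then hand it to every consumer.
  static int run(Iterator<String> reader,
                 List<UnaryOperator<ToyCas>> analysisEngines,
                 List<Consumer<ToyCas>> casConsumers) {
    int processed = 0;
    while (reader.hasNext()) {
      ToyCas cas = new ToyCas(reader.next());
      for (UnaryOperator<ToyCas> engine : analysisEngines) {
        cas = engine.apply(cas);
      }
      for (Consumer<ToyCas> consumer : casConsumers) {
        consumer.accept(cas);
      }
      processed++;
    }
    return processed;
  }

  // Runs two documents through one "annotator" and one "consumer".
  static List<String> demo() {
    List<String> index = new ArrayList<>();
    UnaryOperator<ToyCas> titleAnnotator = cas -> {
      if (cas.text.contains("Dr.")) {
        cas.annotations.add("PersonTitle");
      }
      return cas;
    };
    Consumer<ToyCas> indexer =
        cas -> index.add(cas.text + " -> " + cas.annotations);
    run(Arrays.asList("Dr. Smith arrived.", "No titles here.").iterator(),
        Arrays.asList(titleAnnotator),
        Arrays.asList(indexer));
    return index;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```
]]></programlisting>

     <para>In this sketch the consumer builds an in-memory list; in a real CPE the
       corresponding component would typically write to a search index, database, or
       file.</para>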

     <para>The framework does not persist CASes. To keep them, you must save each CAS as it
       comes through, for example by writing a CAS Consumer that stores it in whatever format
       you like. The UIMA SDK supplies an example CAS Consumer that saves CASes to XML files,
       either in the standard XMI format or in an older format called XCAS. It also supplies
       an example CAS Consumer that extracts information from CASes and stores the results in
       a relational database, using Java&apos;s JDBC APIs.</para>

   </section>

   <section id="ugr.tug.cpe.configurator_and_viewer">
     <title>CPE Configurator and CAS viewer</title>

     <section id="ugr.tug.cpe.cpe_configurator">
       <title>Using the CPE Configurator</title>

       <para>A CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
         descriptor, including its syntax and content, can be found in <olink
           targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. Rather than
         edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool. The CPE
         Configurator tool is described briefly in this section, and in more detail in <olink
           targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</para>

       <para>The CPE Configurator tool can be run from Eclipse (see <xref
           linkend="ugr.tug.cpe.running_cpe_configurator_from_eclipse"/>), or using
         the <literal>cpeGui</literal> shell script (<literal>cpeGui.bat</literal> on
         Windows, <literal>cpeGui.sh</literal> on Unix), which is located in the
         <literal>bin</literal> directory of the UIMA SDK installation. Executing this
         script will display the window shown here:


         <screenshot>
           <mediaobject>
             <imageobject>
               <imagedata width="5.84in" format="JPG" fileref="&imgroot;image004.jpg"/>
             </imageobject>
             <textobject><phrase>Screenshot of CPE GUI</phrase></textobject>
           </mediaobject>
         </screenshot>
         </para>

       <para>The window is divided into three sections, one each for the Collection Reader,
         Analysis Engines, and CAS Consumers.<footnote><para>There is also a fourth pane,
         for the CAS Initializer, but it is hidden by default.  To enable it click the
         <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote>
         In each section, you select the component(s) you want to include in the CPE by
         browsing to their XML descriptors. The configuration parameters present in the XML
         descriptors will then be displayed in the GUI; these can be modified to override
         the values present in the descriptor. For example, the screen shot below shows the
         CPE Configurator after the following components have been chosen:


         <programlisting>Collection Reader:
    %UIMA_HOME%/examples/descriptors/collection_reader/
           FileSystemCollectionReader.xml

 Analysis Engine:
    %UIMA_HOME%/examples/descriptors/analysis_engine/
           NamesAndPersonTitles_TAE.xml

 CAS Consumer:
     %UIMA_HOME%/examples/descriptors/cas_consumer/
           XmiWriterCasConsumer.xml</programlisting></para>


       <screenshot>
      <mediaobject>
       <imageobject>
         <imagedata width="5.84in" format="JPG" fileref="&imgroot;image006.jpg"/>
       </imageobject>
       <textobject><phrase>Screenshot of CPE GUI after fields filled in</phrase></textobject>
     </mediaobject>
     </screenshot>

       <para>For the File System Collection Reader, ensure that the Input Directory is set to
         <literal>%UIMA_HOME%\examples\data</literal><footnote><para>Replace
         <literal>%UIMA_HOME%</literal> with the path to where you installed UIMA.</para>
         </footnote>. The other parameters may be left blank. For the XMI Writer CAS
         Consumer, ensure that the Output Directory is set to
         <literal>%UIMA_HOME%\examples\data\processed</literal>.</para>

       <para>After selecting each of the components and providing configuration settings,
         click the play (forward arrow) button at the bottom of the screen to begin processing.
         A progress bar should be displayed in the lower left corner. (Note that the progress
         bar will not begin to move until all components have completed their initialization,
         which may take several seconds.) Once processing has begun, the pause and stop
         buttons become enabled.</para>

       <para>If an error occurs, you will be informed by an error dialog. If processing
         completes successfully, you will be presented with a performance report.</para>

       <para>Using the File menu, you can select <literal>Save CPE Descriptor</literal> to
         create an .xml descriptor file that defines the CPE you have constructed. Later, you
         can use <literal>Open CPE Descriptor</literal> to restore the CPE Configurator to
         the saved state. Also, CPE descriptors can be used to run a CPE from a Java program
         &ndash; see section <xref
           linkend="ugr.tug.cpe.running_cpe_from_application"/>. CPE Descriptors
         allow specifying operational parameters, such as error handling options, that are
         not currently available for configuration through the CPE Configurator. For more
         information on manually creating a CPE Descriptor, see the <olink
           targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>

       <para>The CPE configured above runs a simple name and title annotator on the sample data
         provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer. To
         view the results, start the External CAS Annotation Viewer by running the
         <literal>annotationViewer</literal> batch file
         (<literal>annotationViewer.bat</literal> on Windows,
         <literal>annotationViewer.sh</literal> on Unix), which is located in the
         <literal>bin</literal> directory of the UIMA SDK installation. Executing this
         batch file will display the window shown here:


         <screenshot>
     <mediaobject>
       <imageobject>
         <imagedata width="5.5in" format="JPG" fileref="&imgroot;image008.jpg"/>
       </imageobject>
       <textobject><phrase>Screenshot of Annotation Viewer results</phrase></textobject>
     </mediaobject>
   </screenshot>
         </para>

       <para>Ensure that the Input Directory is the same as the Output Directory specified for
         the XMI Writer CAS Consumer in the CPE configured above (e.g.,
         <literal>%UIMA_HOME%\examples\data\processed</literal>) and that the TAE
         Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g.,
         <literal>examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml</literal>).</para>

       <para>Click the View button to display the Analyzed Documents window:


         <screenshot>
     <mediaobject>
       <imageobject>
         <imagedata width="3.5in" format="JPG" fileref="&imgroot;image010.jpg"/>
       </imageobject>
       <textobject><phrase>Screenshot of CPE Configurator Analyzed Documents</phrase></textobject>
     </mediaobject>
   </screenshot>
         </para>

       <para>Double click on any document in the list to view the analyzed document. Double
         clicking the first document, IBM_LifeSciences.txt, will bring up the following
         window:


         <screenshot>
     <mediaobject>
       <imageobject>
         <imagedata width="5.84in" format="JPG" fileref="&imgroot;image012.jpg"/>
       </imageobject>
       <textobject><phrase>Screenshot of Document and Annotation Viewer</phrase></textobject>
     </mediaobject>
   </screenshot>
         </para>

       <para>This window shows the analysis results for the document. Clicking on any
         highlighted annotation causes the details for that annotation to be displayed in the
         right-hand pane. Here the annotation spanning <quote>John M. Thompson</quote> has
         been clicked.</para>

       <para>Congratulations! You have successfully configured a CPE, saved its
         descriptor, run the CPE, and viewed the analysis results.</para>
     </section>

     <section id="ugr.tug.cpe.running_cpe_configurator_from_eclipse">
       <title>Running the CPE Configurator from Eclipse</title>

       <para>If you have followed the instructions in <olink
           targetdoc="&uima_docs_overview;"
           targetptr="ugr.ovv.eclipse_setup"/> and imported the example Eclipse
         project, then you should already have a Run configuration for the CPE Configurator
         tool (called <literal>UIMA CPE GUI</literal>) configured to run in the example
         project. Simply run that configuration to start the CPE Configurator.</para>

       <para>If you haven&apos;t followed the Eclipse setup instructions and wish to run the
         CPE Configurator tool from Eclipse, you will need to do the following. As installed,
         this Eclipse launch configuration is associated with the
         <quote>uimaj-examples</quote> project. If you&apos;ve not already done so, you
         may wish to import that project into your Eclipse workspace. It&apos;s located in
         %UIMA_HOME%/docs/examples. Doing this will supply the Eclipse launcher with all
         the class files it needs to run the CPE configurator. If you don&apos;t do this, please
         manually add the JAR files for UIMA to the launch configuration.</para>
       <para>Also, you need to add any projects or JAR files for any UIMA components you will be
         running to the launch class path.</para> <note><para>A simpler alternative may be
       to change the CPE launch configuration to be based on your project. If you do that, it will
       pick up all the files in your project&apos;s class path, which you should set up to
       include all the UIMA framework files. An easy way to do this is to specify in your
       project&apos;s properties&apos; build-path that the uimaj-examples project is on
       the build path, because the uimaj-examples project is set up to include all the UIMA
       framework classes in its classpath already. </para></note>

       <para>Next, in the Eclipse menu select <literal>Run &rarr;
         Run...</literal>, which brings up the Run configuration screen.</para>

       <para>In the Main tab, set the main class to
         <literal>org.apache.uima.tools.cpm.CpmFrame</literal>.</para>

       <para>In the arguments tab, add the following to the VM arguments:


         <programlisting>-Xms128M -Xmx256M
 -Duima.home="C:\Program Files\Apache\uima"</programlisting>
         (or wherever you installed the UIMA SDK)</para>

       <para>Click the Run button to launch the CPE Configurator, and use it as previously
         described in this section.</para>

     </section>
   </section>

   <section id="ugr.tug.cpe.running_cpe_from_application">
     <title>Running a CPE from Your Own Java Application</title>

     <para>The simplest way to run a CPE from a Java application is to first create a CPE
       descriptor as described in the previous section. Then the CPE can be instantiated and
       run using the following code:


       <programlisting>// mCPE is a field of type
 // org.apache.uima.collection.CollectionProcessingEngine

 // Parse the CPE descriptor in the file specified on the command line
 CpeDescription cpeDesc = UIMAFramework.getXMLParser().
         parseCpeDescription(new XMLInputSource(args[0]));

 // Instantiate the CPE
 mCPE = UIMAFramework.produceCollectionProcessingEngine(cpeDesc);

 // Create and register a Status Callback Listener
 mCPE.addStatusCallbackListener(new StatusCallbackListenerImpl());

 // Start processing
 mCPE.process();</programlisting></para>

     <para>This will start the CPE running in a separate thread.</para>

     <note><para>The <literal>process()</literal> method for a CPE can only be called once.  If you
     need to call it again, you have to instantiate a new CPE, and call that new CPE's process
     method.</para></note>
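
     <para>Because processing continues in another thread after <literal>process()</literal>
       returns, an application that must not exit early has to wait for a completion
       callback. The following sketch shows one way to do that with a
       <literal>CountDownLatch</literal>. It is self-contained and does not use the UIMA API:
       <literal>ToyEngine</literal> and <literal>ToyListener</literal> are hypothetical
       stand-ins, and the callback name <literal>collectionProcessComplete</literal> is
       chosen only to suggest the corresponding listener callback; consult the Javadocs for
       the actual interface.</para>

     <programlisting><![CDATA[
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of an engine whose process() returns immediately and reports
// completion through a callback. These are not UIMA classes.
final class AsyncRunSketch {

  // Hypothetical stand-in for a status callback listener.
  interface ToyListener {
    void collectionProcessComplete();
  }

  // Hypothetical stand-in for a CPE: one-shot, runs in its own thread.
  static final class ToyEngine {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final ToyListener listener;

    ToyEngine(ToyListener listener) {
      this.listener = listener;
    }

    void process() {
      // Model the "process() can only be called once" rule.
      if (!started.compareAndSet(false, true)) {
        throw new IllegalStateException("process() was already called");
      }
      Thread worker = new Thread(() -> {
        // ... the real work would happen here ...
        listener.collectionProcessComplete();
      });
      worker.start();
    }
  }

  // Starts the engine and blocks until the completion callback fires or the
  // timeout elapses. Returns true if processing completed in time.
  static boolean runAndWait(long timeoutMillis) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    ToyEngine engine = new ToyEngine(done::countDown);
    engine.process();
    return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  // Returns true if a second process() call on the same engine is rejected.
  static boolean secondCallRejected() {
    ToyEngine engine = new ToyEngine(() -> { });
    engine.process();
    try {
      engine.process();
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("completed: " + runAndWait(5000));
    System.out.println("second call rejected: " + secondCallRejected());
  }
}
```
]]></programlisting>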

     <section id="ugr.tug.cpe.using_listeners">
       <title>Using Listeners</title>

       <para>Updates of the CPM&apos;s progress, including any errors that occur, are sent to
         the callback handler that is registered by the call to
         <literal>addStatusCallbackListener</literal>, above. The callback handler is a
         class that implements the CPM&apos;s
         <literal>StatusCallbackListener</literal> interface. In the example, it responds to
         events by printing messages to the console. The source code is fairly straightforward
         and is not included in this chapter &ndash; see the class
         <literal>org.apache.uima.examples.cpe.SimpleRunCPE</literal> in the
         <literal>%UIMA_HOME%\examples\src</literal> directory for the complete
         code.</para>

       <para>If you need more control over the information in the CPE descriptor, you can
         manually configure it via its API. See the Javadocs for package
         <literal>org.apache.uima.collection</literal> for more details.</para>

     </section>
   </section>

   <section id="ugr.tug.cpe.developing_collection_processing_components">
     <title>Developing Collection Processing Components</title>

     <para>This section is an introduction to the process of developing Collection Readers,
       CAS Initializers, and CAS Consumers. The code snippets refer to classes that can be
       found in the <literal>%UIMA_HOME%\examples\src</literal> directory of the example project.</para>

     <para>In the following sections, classes you write to represent components need to be
       public and have public, 0-argument constructors, so that they can be instantiated by
       the framework. (Although Java classes in which you do not define any constructor will,
       by default, have a 0-argument constructor that doesn&apos;t do anything, a class in
       which you have defined at least one constructor does not get a default 0-argument
       constructor.)</para>
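
     <para>The constructor rule can be checked with plain Java reflection. The sketch below
       assumes that a framework creates a component from its class by calling the public
       0-argument constructor reflectively; the class names are hypothetical, and the exact
       mechanism UIMA uses is not shown here.</para>

     <programlisting><![CDATA[
```java
// Demonstrates which classes can be created through a public
// 0-argument constructor. All class names here are hypothetical.
final class ConstructorDemo {

  // No constructor declared: the compiler supplies a public
  // 0-argument constructor.
  public static class WithDefaultConstructor {
  }

  // Declares only a 1-argument constructor, so no 0-argument
  // constructor exists.
  public static class WithArgumentOnly {
    public WithArgumentOnly(String inputDirectory) {
    }
  }

  // Declares both, which keeps the class instantiable by a framework.
  public static class WithBoth {
    public WithBoth() {
    }

    public WithBoth(String inputDirectory) {
    }
  }

  // Returns true if the class can be instantiated reflectively through
  // a public 0-argument constructor.
  static boolean canInstantiate(Class<?> componentClass) {
    try {
      componentClass.getConstructor().newInstance();
      return true;
    } catch (ReflectiveOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(canInstantiate(WithDefaultConstructor.class));
    System.out.println(canInstantiate(WithArgumentOnly.class));
    System.out.println(canInstantiate(WithBoth.class));
  }
}
```
]]></programlisting>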

     <section id="ugr.tug.cpe.collection_reader.developing">
       <title>Developing Collection Readers</title>

       <para>A Collection Reader is responsible for obtaining documents from the collection
         and returning each document as a CAS. Like all UIMA components, a Collection Reader
         consists of two parts &mdash; the code and an XML descriptor.</para>

       <para>A simple example of a Collection Reader is the <quote>File System Collection
         Reader,</quote> which simply reads documents from files in a specified directory.
         The Java code is in the class
         <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>
         and the XML descriptor is
         <literal>%UIMA_HOME%/examples/src/main/descriptors/collection_reader/
           FileSystemCollectionReader.xml</literal>.</para>

       <section id="ugr.tug.cpe.collection_reader.java_class">
         <title>Java Class for the Collection Reader</title>

         <para>The Java class for a Collection Reader must implement the
           <literal>org.apache.uima.collection.CollectionReader</literal>
           interface. You may build your Collection Reader from scratch and implement this
           interface, or you may extend the convenience base class
           <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>.</para>

         <para>The convenience base class provides default implementations for many of the
           methods defined in the <literal>CollectionReader</literal> interface, and
           provides abstract definitions for those methods that you are required to
           implement in your new Collection Reader. Note that if you extend this base class,
           you do not need to declare that your new Collection Reader implements the
           <literal>CollectionReader</literal> interface.</para> <tip><para>Eclipse
         tip &ndash; if you are using Eclipse, you can quickly create the boilerplate code and
         stubs for all of the required methods by clicking <literal>File</literal>
         &rarr; <literal>New</literal> &rarr; <literal>Class</literal> to bring up the <quote>New Java Class</quote>
         dialog, specifying
         <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>
         as the Superclass, and checking <quote>Inherited abstract methods</quote> in the
         section <quote>Which method stubs would you like to create?</quote>, as in the
         screenshot below:</para></tip>

         <screenshot>
     <mediaobject>
       <imageobject>
         <imagedata width="4.4in" format="JPG" fileref="&imgroot;image014.jpg"/>
       </imageobject>
       <textobject><phrase>Screenshot showing Eclipse new class wizard</phrase></textobject>
     </mediaobject>
   </screenshot>

         <para>For the rest of this section we will assume that your new Collection Reader
           extends the <literal>CollectionReader_ImplBase</literal> class, and we will
           show examples from the
           <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>.
           If you must inherit from a different superclass, you must ensure that your
           Collection Reader implements the <literal>CollectionReader</literal>
           interface &ndash; see the Javadocs for <literal>CollectionReader</literal>
           for more details.</para>
       </section>

       <section id="ugr.tug.cpe.collection_reader.required_methods">
         <title>Required Methods in the Collection Reader class</title>


         <para>The following methods must be implemented; <literal>initialize()</literal>
           is the one exception, as explained below:</para>

         <section id="ugr.tug.cpe.collection_reader.required_methods.initialize">
           <title>initialize()</title>

           <para>The <literal>initialize()</literal> method is called by the framework
             when the Collection Reader is first created.
             <literal>CollectionReader_ImplBase</literal> actually provides a default
             implementation of this method (i.e., it is not abstract), so you are not strictly
             required to implement this method. However, a typical Collection Reader will
             implement this method to obtain parameter values and perform various
             initialization steps.</para>

           <para>In this method, the Collection Reader class can access the values of its
             configuration parameters and perform other initialization logic. The example
             File System Collection Reader reads its configuration parameters and then
             builds a list of files in the specified input directory, as follows:</para>


           <programlisting>public void initialize() throws ResourceInitializationException {
   File directory = new File(
             (String)getConfigParameterValue(PARAM_INPUTDIR));
   mEncoding = (String)getConfigParameterValue(PARAM_ENCODING);
   mDocumentTextXmlTagName = (String)getConfigParameterValue(PARAM_XMLTAG);
   mLanguage = (String)getConfigParameterValue(PARAM_LANGUAGE);
   mCurrentIndex = 0;

   //get list of files (not subdirectories) in the specified directory
   mFiles = new ArrayList();
   File[] files = directory.listFiles();
   for (int i = 0; i &lt; files.length; i++) {
     if (!files[i].isDirectory()) {
       mFiles.add(files[i]);
     }
   }
 }</programlisting>
           <note><para>This is the zero-argument version of the initialize method. There is
           also a method on the Collection Reader interface called
           <literal>initialize(ResourceSpecifier, Map)</literal> but it is not
           recommended that you override this method in your code. That method performs
           internal initialization steps and then calls the zero-argument
           <literal>initialize()</literal>. </para></note>
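
           <para>Note that <literal>File.listFiles()</literal> returns
             <literal>null</literal> when the path is not a readable directory, so a reader
             written along these lines may want to validate the input directory before
             iterating over it. The following stand-alone sketch shows one way to do that.
             <literal>listPlainFiles</literal> is a hypothetical helper, not part of the
             example reader; inside a real <literal>initialize()</literal> the failure
             would more naturally be reported as a
             <literal>ResourceInitializationException</literal>.</para>

           <programlisting><![CDATA[
```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of a defensive directory listing; not UIMA code.
final class InputDirectorySketch {

  // Returns the plain files (not subdirectories) directly inside
  // 'directory', sorted by path so the processing order is repeatable.
  static List<File> listPlainFiles(File directory) {
    File[] entries = directory.listFiles();
    if (entries == null) {
      // listFiles() returns null if the path is not a directory
      // or if an I/O error occurs.
      throw new IllegalArgumentException(
          "Not a readable directory: " + directory);
    }
    Arrays.sort(entries);
    List<File> files = new ArrayList<>();
    for (File entry : entries) {
      if (!entry.isDirectory()) {
        files.add(entry);
      }
    }
    return files;
  }

  public static void main(String[] args) {
    File directory = new File(args.length > 0 ? args[0] : ".");
    for (File file : listPlainFiles(directory)) {
      System.out.println(file.getName());
    }
  }
}
```
]]></programlisting>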

         </section>

         <section id="ugr.tug.cpe.collection_reader.hasnext">
           <title>hasNext()</title>

           <para>The <literal>hasNext()</literal> method returns whether or not there are
             any documents remaining to be read from the collection. The File System
             Collection Reader&apos;s <literal>hasNext()</literal> method is very
             simple. It just checks if there are any more files left to be read:


             <programlisting>public boolean hasNext() {
   return mCurrentIndex &lt; mFiles.size();
 }</programlisting>
             </para>

         </section>

         <section id="ugr.tug.cpe.collection_reader.required_methods.getnext">
           <title>getNext(CAS)</title>

           <para>The <literal>getNext()</literal> method reads the next document from the
             collection and populates a CAS. In the simple case, this amounts to reading the
             file and calling the CAS&apos;s <literal>setDocumentText</literal> method.
             The example File System Collection Reader is slightly more complex. It first
             checks for a CAS Initializer. If the CPE includes a CAS Initializer, it is
             used to read the document and initialize the CAS. If the CPE does not include a CAS
             Initializer, the File System Collection Reader reads the document and sets the
             document text in the CAS.</para>

           <para>The File System Collection Reader also stores additional metadata about
             the document in the CAS. In particular, it sets the document&apos;s language in
             the special built-in feature structure
             <literal>uima.tcas.DocumentAnnotation</literal> (see <olink
               targetdoc="&uima_docs_ref;"
               targetptr="ugr.ref.cas.document_annotation"/> for details about this
             built-in type) and creates an instance of
             <literal>org.apache.uima.examples.SourceDocumentInformation</literal>,
             which stores information about the document&apos;s source location. This
             information may be useful to downstream components such as CAS Consumers. Note
             that the type system descriptor for this type can be found in
             <literal>org.apache.uima.examples.SourceDocumentInformation.xml</literal>,
             which is located in the <literal>examples/src</literal> directory.</para>

           <para>The getNext() method for the File System Collection Reader looks like
             this:</para>


           <programlisting>  public void getNext(CAS aCAS) throws IOException, CollectionException {
     JCas jcas;
     try {
       jcas = aCAS.getJCas();
     } catch (CASException e) {
       throw new CollectionException(e);
     }

     // open input stream to file
     File file = (File) mFiles.get(mCurrentIndex++);
     BufferedInputStream fis =
             new BufferedInputStream(new FileInputStream(file));
     try {
       byte[] contents = new byte[(int) file.length()];
       fis.read(contents);
       String text;
       if (mEncoding != null) {
         text = new String(contents, mEncoding);
       } else {
         text = new String(contents);
       }
       // put document in CAS
       jcas.setDocumentText(text);
     } finally {
       if (fis != null)
         fis.close();
     }

     // set language if it was explicitly specified
     // as a configuration parameter
     if (mLanguage != null) {
       ((DocumentAnnotation) jcas.getDocumentAnnotationFs()).
             setLanguage(mLanguage);
     }

     // Also store location of source document in CAS.
     // This information is critical if CAS Consumers will
     // need to know where the original document contents
     // are located.
     // For example, the Semantic Search CAS Indexer
     // writes this information into the search index that
     // it creates, which allows applications that use the
     // search index to locate the documents that satisfy
     // their semantic queries.
     SourceDocumentInformation srcDocInfo =
             new SourceDocumentInformation(jcas);
     srcDocInfo.setUri(
             file.getAbsoluteFile().toURL().toString());
     srcDocInfo.setOffsetInSource(0);
     srcDocInfo.setDocumentSize((int) file.length());
     srcDocInfo.setLastSegment(
             mCurrentIndex == mFiles.size());
     srcDocInfo.addToIndexes();
   }</programlisting>

           <para>The Collection Reader can create additional annotations in the CAS at this
             point, in the same way that annotators create annotations.</para>
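
           <para>A single <literal>InputStream.read(byte[])</literal> call is allowed to
             return fewer bytes than the array holds, so a reader that must cope with large
             files may prefer to loop until the buffer is full. The following is a minimal
             sketch of that idea in plain Java, not code from the UIMA SDK; the class name
             <literal>FileTextReader</literal> and the method <literal>readText</literal>
             are hypothetical.</para>

           <programlisting><![CDATA[import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of the UIMA SDK.
class FileTextReader {

  /**
   * Reads the whole file and decodes it with the given encoding,
   * or with the platform default encoding if encoding is null.
   */
  static String readText(File file, String encoding) throws IOException {
    byte[] contents = new byte[(int) file.length()];
    InputStream in = new BufferedInputStream(new FileInputStream(file));
    try {
      int offset = 0;
      // read() may deliver fewer bytes than requested, so keep
      // reading until the buffer is full
      while (offset < contents.length) {
        int n = in.read(contents, offset, contents.length - offset);
        if (n < 0) {
          throw new IOException("Unexpected end of file: " + file);
        }
        offset += n;
      }
    } finally {
      in.close();
    }
    return (encoding != null)
            ? new String(contents, encoding)
            : new String(contents);
  }
}]]></programlisting>

           <para>A Collection Reader could pass the returned string to
             <literal>jcas.setDocumentText()</literal> exactly as the listing above
             does.</para>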
         </section>

         <section id="ugr.tug.cpe.collection_reader.required_methods.getprogress">
           <title>getProgress()</title>
           <para>The Collection Reader is responsible for returning progress information;
             that is, how much of the collection has been read thus far and how much remains to be
             read. The framework defines progress very generally; the Collection Reader
             simply returns an array of <literal>Progress</literal> objects, where each
             object contains three fields &mdash; the amount already completed, the total
             amount (if known), and a unit (e.g. entities (documents), bytes, or files). The
             method returns an array so that the Collection Reader can report progress in
             multiple different units, if that information is available. The File System
             Collection Reader&apos;s <literal>getProgress()</literal> method looks
             like this:


             <programlisting>public Progress[] getProgress() {
   return new Progress[]{
      new ProgressImpl(mCurrentIndex,mFiles.size(),Progress.ENTITIES)};
 }</programlisting></para>

           <para>In this particular example, the total number of files in the collection is
             known, but the total size of the collection is not known. As such, a
             <literal>ProgressImpl</literal> object for
             <literal>Progress.ENTITIES</literal> is returned, but a
             <literal>ProgressImpl</literal> object for
             <literal>Progress.BYTES</literal> is not.</para>
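
           <para>If a reader did want to report bytes as well, it would need the total size
             up front, for example by summing the file lengths once at initialization. The
             sketch below illustrates that bookkeeping in plain Java. It is an assumption-laden
             illustration, not the File System Collection Reader&apos;s code:
             <literal>UnitProgress</literal> and <literal>ProgressTracker</literal> are
             hypothetical stand-ins, and a real Collection Reader would return
             <literal>Progress</literal> objects instead.</para>

           <programlisting><![CDATA[import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one progress entry:
// amount completed, total amount, and unit.
class UnitProgress {
  final long completed;
  final long total;
  final String unit;

  UnitProgress(long completed, long total, String unit) {
    this.completed = completed;
    this.total = total;
    this.unit = unit;
  }
}

// Hypothetical tracker that reports progress in two units.
class ProgressTracker {
  private final List<File> files;
  private final long totalBytes;
  private int currentIndex = 0;
  private long bytesRead = 0;

  ProgressTracker(List<File> files) {
    this.files = new ArrayList<File>(files);
    long sum = 0;
    // the total size is computed once, up front
    for (File f : this.files) {
      sum += f.length();
    }
    this.totalBytes = sum;
  }

  /** Records that the next file in the list has been read. */
  void advance() {
    bytesRead += files.get(currentIndex++).length();
  }

  /** Returns one entry per unit: files first, then bytes. */
  UnitProgress[] getProgress() {
    return new UnitProgress[] {
        new UnitProgress(currentIndex, files.size(), "entities"),
        new UnitProgress(bytesRead, totalBytes, "bytes") };
  }
}]]></programlisting>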

         </section>

         <section id="ugr.tug.cpe.collection_reader.required_methods.close">
           <title>close()</title>

           <para>The close method is called when the Collection Reader is no longer needed.
             The Collection Reader should then release any resources it may be holding. The
             FileSystemCollectionReader does not hold resources and so has an empty
             implementation of this method:</para>


           <programlisting>public void close() throws IOException { }</programlisting>

         </section>

         <section id="ugr.tug.cpe.collection_reader.optional_methods">
           <title>Optional Methods</title>

           <para>The following methods may be implemented:</para>

           <section id="ugr.tug.cpe.collection_reader.optional_methods.reconfigure">
             <title>reconfigure()</title>
             <para>This method is called if the Collection Reader&apos;s configuration
               parameters change.</para>
           </section>

           <section id="ugr.tug.cpe.collection_reader.optional_methods.typesysteminit">
             <title>typeSystemInit()</title>

             <para>If you are only setting the document text in the CAS, or if you are using the
               JCas (recommended, as in the current example), you do not have to implement this
               method. If you are directly using the CAS API, this method is used in the same way
               as it is used for an annotator &ndash; see <olink
                 targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae.contract_for_annotator_methods"/>
               for more information.</para>
           </section>
         </section>

         <section id="ugr.tug.cpe.collection_reader.threading">
           <title>Threading considerations</title>

           <para>Collection readers do not have to be thread safe; they are run with a single
             thread per instance, and only one instance per instance of the Collection
             Processing Manager (CPM) is made.</para>

         </section>

         <section id="ugr.tug.cpe.collection_reader.descriptor">
           <title>XML Descriptor for a Collection Reader</title>

           <para>You can use the Component Description Editor to create or edit the File
             System Collection Reader&apos;s descriptor. Here is its descriptor
             (abbreviated somewhat), which is very similar to an Analysis
             Engine descriptor:</para>


           <programlisting><![CDATA[<collectionReaderDescription
           xmlns="http://uima.apache.org/resourceSpecifier">
   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
   <implementationName>
     org.apache.uima.examples.cpe.FileSystemCollectionReader
   </implementationName>
   <processingResourceMetaData>
     <name>File System Collection Reader</name>
     <description>Reads files from the filesystem.</description>
     <version>1.0</version>
     <vendor>The Apache Software Foundation</vendor>
     <configurationParameters>
       <configurationParameter>
         <name>InputDirectory</name>
         <description>Directory containing input files</description>
         <type>String</type>
         <multiValued>false</multiValued>
         <mandatory>true</mandatory>
       </configurationParameter>
       <configurationParameter>
         <name>Encoding</name>
         <description>Character encoding for the documents.</description>
         <type>String</type>
         <multiValued>false</multiValued>
         <mandatory>false</mandatory>
       </configurationParameter>
       <configurationParameter>
         <name>Language</name>
         <description>ISO language code for the documents</description>
         <type>String</type>
         <multiValued>false</multiValued>
         <mandatory>false</mandatory>
       </configurationParameter>
     </configurationParameters>
     <configurationParameterSettings>
       <nameValuePair>
         <name>InputDirectory</name>
         <value>
           <string>C:/Program Files/apache/uima/examples/data</string>
         </value>
       </nameValuePair>
     </configurationParameterSettings>

     <!-- Type System of CASes returned by this Collection Reader -->

     <typeSystemDescription>
       <imports>
         <import name="org.apache.uima.examples.SourceDocumentInformation"/>
       </imports>
     </typeSystemDescription>

     <capabilities>
       <capability>
         <inputs/>
         <outputs>
           <type allAnnotatorFeatures="true">
             org.apache.uima.examples.SourceDocumentInformation
           </type>
         </outputs>
       </capability>
     </capabilities>
     <operationalProperties>
       <modifiesCas>true</modifiesCas>
       <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
       <outputsNewCASes>true</outputsNewCASes>
     </operationalProperties>
   </processingResourceMetaData>
 </collectionReaderDescription>]]></programlisting>

         </section>
       </section>
     </section>

     <section id="ugr.tug.cpe.cas_initializer.developing"><title>Developing CAS
       Initializers</title> <note><para>CAS Initializers are deprecated as of
       version 2.1. For complex initialization, create additional Subjects of Analysis
       instead (see <olink
         targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>).</para></note>

       <para>In UIMA 1.x, the CAS Initializer component was intended as a plug-in
         to the Collection Reader, for cases where populating the CAS from a raw document is
         complex and the logic might be reusable with other data collections.</para>

       <para>A CAS Initializer Java class must implement the interface
         <literal>org.apache.uima.collection.CasInitializer</literal>, and will also
         generally extend from the convenience base class
         <literal>org.apache.uima.collection.CasInitializer_ImplBase</literal>. A
         CAS Initializer also must have an XML descriptor, which has the exact same form as a
         Collection Reader Descriptor except that the outer tag is
         <literal>&lt;casInitializerDescription&gt;</literal>.</para>

       <para>CAS Initializers have optional <literal>initialize()</literal>,
         <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
         methods, which perform the same functions as they do for Collection Readers. The only
         required method for a CAS Initializer is <literal>initializeCas(Object,
         CAS)</literal>. This method takes the raw document (for example, an
         <literal>InputStream</literal> object from which the document can be read) and a
         CAS, and populates the CAS from the document.</para>
     </section>

     <section id="ugr.tug.cpe.cas_consumer.developing"><title>Developing CAS
       Consumers</title>

       <note><para>In version 2, there is no difference in capability
       between CAS Consumers and ordinary Analysis Engines, except for the default setting of
       the XML parameters for <literal>multipleDeploymentAllowed</literal> and
       <literal>modifiesCas</literal>. We recommend for future work that users implement
       and use Analysis Engine components instead of CAS Consumers.</para>
       <para>The rest of this section is written using the version 1 style of CAS Consumer;
       the methods described are also available for Analysis Engines.  Note that the
       CAS Consumer <literal>processCas</literal> method is equivalent to the Analysis Engine
       <literal>process</literal> method.</para></note>

       <para>A CAS Consumer receives each CAS after it has been analyzed by the Analysis
         Engine. CAS Consumers typically do not update the CAS; they typically extract data
         from the CAS and persist selected information to aggregate data structures such as
         search engine indexes or databases.</para>

       <para>A CAS Consumer Java class must implement the interface
         <literal>org.apache.uima.collection.CasConsumer</literal>, and will also
         generally extend from the convenience base class
         <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>. A CAS
         Consumer also must have an XML descriptor, which has the exact same form as a
         Collection Reader Descriptor except that the outer tag is
         <literal>&lt;casConsumerDescription&gt;</literal>.</para>

       <para>CAS Consumers have optional <literal>initialize()</literal>,
         <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
         methods, which perform the same functions as they do for Collection Readers and CAS
         Initializers. The only required method for a CAS Consumer is
         <literal>processCas(CAS)</literal>, which is where the CAS Consumer does the bulk
         of its work (i.e., consume the CAS).</para>

       <para>The <literal>CasConsumer</literal> interface (as well as the version 2
         Analysis Engine interface) additionally defines batch
         and collection level processing methods. The CAS Consumer or Analysis Engine
         can implement the
         <literal>batchProcessComplete()</literal> method to perform processing that
         should occur at the end of each batch of CASes. Similarly, the CAS Consumer
         or Analysis Engine can
         implement the <literal>collectionProcessComplete()</literal> method to
         perform any collection level processing at the end of the collection.</para>

       <para>A very simple example of a CAS Consumer, which writes an XML representation of the
         CAS to a file, is the XMI Writer CAS Consumer. The Java code is in the class
         <literal>org.apache.uima.examples.cpe.XmiWriterCasConsumer</literal> and
         the descriptor is in
         <literal>%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal>
         .</para>

       <section id="ugr.tug.cpe.cas_consumer.required_methods">
         <title>Required Methods for a CAS Consumer</title>

         <para>When extending the convenience class
           <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>, the
           following abstract methods must be implemented:</para>

         <section id="ugr.tug.cpe.cas_consumer.required_methods.initialize">
           <title>initialize()</title>
           <para>The <literal>initialize()</literal> method is called by the framework
             when the CAS Consumer is first created.
             <literal>CasConsumer_ImplBase</literal> actually provides a default
             implementation of this method (i.e., it is not abstract), so you are not strictly
             required to implement this method. However, a typical CAS Consumer will
             implement this method to obtain parameter values and perform various
             initialization steps.</para>

           <para>In this method, the CAS Consumer can access the values of its configuration
             parameters and perform other initialization logic. The example XMI Writer CAS
             Consumer reads its configuration parameters and sets up the output directory:


             <programlisting>public void initialize() throws ResourceInitializationException {
   mDocNum = 0;
   mOutputDir = new File((String) getConfigParameterValue(PARAM_OUTPUTDIR));
   if (!mOutputDir.exists()) {
     mOutputDir.mkdirs();
   }
 }</programlisting></para>
         </section>

         <section id="ugr.tug.cpe.cas_consumer.required_methods.processcas">
           <title>processCas()</title>

           <para>The <literal>processCas()</literal> method is where the CAS Consumer
             does most of its work. In our example, the XMI Writer CAS Consumer obtains an
             iterator over the document metadata in the CAS (in the
             SourceDocumentInformation feature structure, which is created by the File
             System Collection Reader) and extracts the URI for the current document. From
             this the output filename is constructed in the output directory and a subroutine
             (<literal>writeXmi</literal>) is called to generate the output file. The
             <literal>writeXmi</literal> subroutine uses the
             <literal>XmiCasSerializer</literal> class provided with the UIMA SDK to
             serialize the CAS to the output file (see the example source code for
             details).</para>


           <programlisting>public void processCas(CAS aCAS) throws ResourceProcessException {
   String modelFileName = null;

   JCas jcas;
   try {
     jcas = aCAS.getJCas();
   } catch (CASException e) {
     throw new ResourceProcessException(e);
   }

   // retrieve the filename of the input file from the CAS
   FSIterator it = jcas
             .getAnnotationIndex(SourceDocumentInformation.type)
                   .iterator();
   File outFile = null;
   if (it.hasNext()) {
     SourceDocumentInformation fileLoc =
             (SourceDocumentInformation) it.next();
     File inFile;
     try {
       inFile = new File(new URL(fileLoc.getUri()).getPath());
       String outFileName = inFile.getName();
       if (fileLoc.getOffsetInSource() > 0) {
         outFileName += ("_" + fileLoc.getOffsetInSource());
       }
       outFileName += ".xmi";
       outFile = new File(mOutputDir, outFileName);
       modelFileName = mOutputDir.getAbsolutePath() +
             "/" + inFile.getName() + ".ecore";
     } catch (MalformedURLException e1) {
       // invalid URL, use default processing below
     }
   }
   if (outFile == null) {
     outFile = new File(mOutputDir, "doc" + mDocNum++);
   }
   // serialize the CAS as XMI and write to output file
   try {
     writeXmi(jcas.getCas(), outFile, modelFileName);
   } catch (IOException e) {
     throw new ResourceProcessException(e);
   } catch (SAXException e) {
     throw new ResourceProcessException(e);
   }
 }</programlisting>
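
           <para>The file-naming rule used in the listing above can be isolated and tried
             without a CAS. The following plain Java sketch is hypothetical (the class
             <literal>OutputNames</literal> and the method <literal>xmiFileName</literal>
             are not part of the UIMA SDK), and it parses the location with
             <literal>java.net.URI</literal> rather than <literal>java.net.URL</literal>,
             which is a substitution made for this illustration.</para>

           <programlisting><![CDATA[import java.io.File;
import java.net.URI;

// Hypothetical helper mirroring the naming rule of processCas().
class OutputNames {

  /**
   * Returns the output file name for a source URI and offset,
   * or null if the URI cannot be parsed or has no path.
   */
  static String xmiFileName(String sourceUri, int offsetInSource) {
    String path;
    try {
      path = URI.create(sourceUri).getPath();
    } catch (IllegalArgumentException e) {
      return null; // invalid URI: caller falls back to a default name
    }
    if (path == null) {
      return null; // opaque URI without a path component
    }
    String name = new File(path).getName();
    if (offsetInSource > 0) {
      // segments after the first carry their offset in the name
      name += "_" + offsetInSource;
    }
    return name + ".xmi";
  }
}]]></programlisting>

           <para>With this sketch, <literal>file:/data/report.txt</literal> at offset 0 maps to
             <literal>report.txt.xmi</literal>, and the same URI at offset 2048 maps to
             <literal>report.txt_2048.xmi</literal>.</para>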

         </section>

         <section id="ugr.tug.cpe.cas_consumer.optional_methods">
           <title>Optional Methods</title>
           <para>The following methods are optional in a CAS Consumer, though they are often
             used.</para>
           <section id="ugr.tug.cpe.cas_consumer.optional_methods.batchprocesscomplete">
             <title>batchProcessComplete()</title>

             <para>The framework calls the batchProcessComplete() method at the end of each
               batch of CASes. This gives the CAS Consumer or Analysis Engine
               an opportunity to perform any batch
               level processing. Our simple XMI Writer CAS Consumer does not perform any
               batch level processing, so this method is empty. Batch size is set in the
               Collection Processing Engine descriptor.</para>
           </section>

           <section id="ugr.tug.cpe.cas_consumer.optional_methods.collectionprocesscomplete">
             <title>collectionProcessComplete()</title>

             <para>The framework calls the collectionProcessComplete() method at the end
               of the collection (i.e., when all objects in the collection have been
               processed). At this point in time, no CAS is passed in as a parameter. This gives
               the CAS Consumer or Analysis Engine an opportunity to perform collection processing over the
               entire set of objects in the collection. Our simple XMI Writer CAS Consumer
               does not perform any collection level processing, so this method is
               empty.</para>
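
             <para>To make the division of work between the three callbacks concrete, here is a
               minimal plain Java sketch of a consumer that buffers items per batch and commits
               them at batch and collection boundaries. It is only an illustration under stated
               assumptions: <literal>BufferingConsumer</literal> is hypothetical, plain strings
               stand in for CASes, and the final flush in
               <literal>collectionProcessComplete()</literal> is a defensive choice of this
               sketch rather than documented framework behavior.</para>

             <programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;

// Hypothetical class; method names mirror the CAS Consumer callbacks,
// but plain strings stand in for CASes.
class BufferingConsumer {
  private final List<String> pending = new ArrayList<String>();
  private final List<String> committed = new ArrayList<String>();
  private int batches = 0;

  /** Per-item work: extract what is needed and buffer it. */
  void processCas(String document) {
    pending.add(document);
  }

  /** Batch-level work: commit everything buffered so far. */
  void batchProcessComplete() {
    committed.addAll(pending);
    pending.clear();
    batches++;
  }

  /** Collection-level work: commit leftovers and return the total. */
  int collectionProcessComplete() {
    committed.addAll(pending);
    pending.clear();
    return committed.size();
  }

  int getBatchCount() {
    return batches;
  }
}]]></programlisting>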
           </section>

         </section>

       </section>
     </section>
   </section>

   <section id="ugr.tug.cpe.deploying_a_cpe">
     <title>Deploying a CPE</title>

     <para>The CPM provides a number of service and deployment options that cover
       instantiation and execution of CPEs, error recovery, and local and distributed
       deployment of the CPE components. The behavior of the CPM (and correspondingly, the
       CPE) is controlled by various options and parameters set in the CPE descriptor. The
       current version of the CPE Configurator tool, however, supports only default error
       handling and deployment options. To change these options, you must manually edit the
       CPE descriptor.</para>

     <para>Eventually the CPE Configurator tool will support configuring these options and a
       detailed tutorial for these settings will be provided. In the meantime, we provide only
       a high-level, conceptual overview of these advanced features in the rest of this
       chapter, and refer the advanced user to <olink targetdoc="&uima_docs_ref;"
         targetptr="ugr.ref.xml.cpe_descriptor"/> for details on setting these options in the CPE
       Descriptor.</para>

     <para> <xref linkend="ugr.tug.cpe.fig.cpe_instantiation"/> shows a logical view of
       how an application uses the UIMA framework to instantiate a CPE from a CPE descriptor.
       The CPE descriptor identifies the CPE components (referencing their corresponding
       descriptors) and specifies the various options for configuring the CPM and deploying
       the CPE components.</para>

     <figure id="ugr.tug.cpe.fig.cpe_instantiation">
       <title>CPE Instantiation</title>
       <mediaobject>
         <imageobject>
           <imagedata width="5.84in" format="PNG"
             fileref="&imgroot;image018.png"/>
         </imageobject>
         <textobject><phrase>Picture of deployment of a CPE</phrase></textobject>
       </mediaobject>
     </figure>

     <para id="ugr.tug.cpe.deployment_alternatives">There are three deployment modes
       for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:</para>

     <orderedlist><listitem><para><emphasis role="bold">Integrated</emphasis> (runs
       in the same Java instance as the CPM)</para></listitem>

       <listitem><para><emphasis role="bold">Managed</emphasis> (runs in a separate
         process on the same machine), and</para></listitem>

       <listitem><para><emphasis role="bold">Non-managed</emphasis> (runs in a
         separate process, perhaps on a different machine). </para></listitem>
     </orderedlist>

     <para>An integrated CAS Processor runs in the same JVM as the CPE. A managed CAS Processor
       runs in a separate process from the CPE, but still on the same computer. The CPE controls
       startup, shutdown, and recovery of a managed CAS Processor. A non-managed CAS
       Processor runs as a service and may be on the same computer as the CPE or on a remote
       computer. A non-managed CAS Processor <emphasis role="bold-italic">
       service</emphasis> is started and managed independently from the CPE.</para>

     <para>For both managed and non-managed CAS Processors, the CAS must be transmitted
       between separate processes and possibly between separate computers. This is
       accomplished using <emphasis>Vinci</emphasis>, a communication protocol that the CPM
       uses and that is provided as part of Apache UIMA. Vinci handles service naming and
       location and data transport (see <olink targetdoc="&uima_docs_tutorial_guides;"
         targetptr="ugr.tug.application.how_to_deploy_a_vinci_service"/>&nbsp; for more
       information). Service naming and location are provided by a <emphasis>Vinci Naming
       Service</emphasis>, or <emphasis>VNS</emphasis>. For managed CAS Processors, the
       CPE uses its own internal VNS. For non-managed CAS Processors, a separate VNS must be
       running.</para> <note><para>The UIMA SDK also supports using unmanaged remote
     services via the web-standard SOAP communications protocol (see <olink
       targetdoc="&uima_docs_tutorial_guides;"
       targetptr="ugr.tug.application.how_to_deploy_as_soap"/>). This approach is
     based on a proxy implementation, where the proxy is essentially running in an integrated
     mode. To use this approach with the CPM, use the Integrated mode, with the component being
     an Aggregate which, in turn, connects to a remote service. </para></note>

     <para>The CPE Configurator tool currently only supports constructing CPEs that deploy
       CAS Processors in integrated mode. To deploy CAS Processors in any other mode, the CPE
       descriptor must be edited by hand (better tooling may be provided later). Details on the
       CPE descriptor and the required settings for various CAS Processor deployment modes
       can be found in <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>
       . In the following sections we merely summarize the various CAS Processor deployment
       options.</para>

     <section id="ugr.tug.cpe.managed_deployment">
       <title>Deploying Managed CAS Processors</title>

       <para>Managed CAS Processor deployment is shown in <xref
           linkend="ugr.tug.cpe.fig.managed_deployment"/>. A managed CAS Processor is
         deployed by the CPE as a Vinci service. The CPE manages the lifecycle of the CAS
         Processor including service launch, restart on failures, and service shutdown. A
         managed CAS Processor runs on the same machine as the CPE, but in a separate process.
         This provides the necessary fault isolation for the CPE to protect it from non-robust
         CAS Processors. A fatal failure of a managed CAS Processor does not threaten the
         stability of the CPE.</para>

       <figure id="ugr.tug.cpe.fig.managed_deployment">
         <title>CPE with Managed CAS Processors</title>
         <mediaobject>
           <imageobject>
             <imagedata width="3.6in" format="PNG"
               fileref="&imgroot;image020.png"/>
           </imageobject>
           <textobject><phrase>Managed deployment showing separate JVMs and CASes
             flowing between them</phrase></textobject>
         </mediaobject>
       </figure>

       <para>The CPE communicates with managed CAS Processors using the Vinci communication
         protocol. A CAS Processor is launched as a Vinci service and its
         <literal>process()</literal> method is invoked remotely via a Vinci command. The
         CPE uses its own internal VNS to support managed CAS processors. The VNS, by default,
         listens on port 9005. If this port is not available, the VNS will increment its listen
         port until it finds one that is available. All managed CAS Processors are internally
         configured to <quote>talk</quote> to the CPE managed VNS. This internal VNS is
         transparent to the end user launching the CPE.</para>
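
       <para>The port-selection behavior described above (start at a default port and
         increment until one is free) can be sketched in plain Java as follows. This is an
         illustration of the general technique only, not the VNS implementation; the class
         <literal>PortProbe</literal> and its method are hypothetical.</para>

       <programlisting><![CDATA[import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper; not part of Vinci or the UIMA SDK.
class PortProbe {

  /**
   * Returns the first port at or above startPort that can be bound,
   * trying at most maxAttempts ports, or -1 if none is available.
   */
  static int firstAvailablePort(int startPort, int maxAttempts) {
    for (int port = startPort;
         port < startPort + maxAttempts && port <= 65535;
         port++) {
      try {
        ServerSocket socket = new ServerSocket(port);
        socket.close();
        return port;
      } catch (IOException e) {
        // port is in use or cannot be bound; try the next one
      }
    }
    return -1;
  }
}]]></programlisting>

       <para>For example, <literal>PortProbe.firstAvailablePort(9005, 100)</literal> probes
         upward from 9005. Because the probe socket is closed before the port is returned,
         another process could claim the port in the meantime; a real service would keep the
         bound socket open instead.</para>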

       <para>To deploy a managed CAS Processor, the CPE deployer must change the CPE
         descriptor. The following is a section from the CPE descriptor that shows an example
         configuration specifying a managed CAS Processor.</para>


       <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="local"</emphasis> name="Meeting Detector TAE"&gt;
   &lt;descriptor&gt;
     &lt;include href="deploy/vinci/Deploy_MeetingDetectorTAE.xml"/&gt;
   &lt;/descriptor&gt;
   &lt;runInSeparateProcess&gt;
     &lt;exec dir="." executable="java"&gt;
       &lt;env key="CLASSPATH"
          value="src;
                 C:/Program Files/apache/uima/lib/uima-core.jar;
                 C:/Program Files/apache/uima/lib/uima-cpe.jar;
                 C:/Program Files/apache/uima/lib/uima-examples.jar;
                 C:/Program Files/apache/uima/lib/uima-adapter-vinci.jar;
                 C:/Program Files/apache/uima/lib/jVinci.jar"/>
       &lt;arg&gt;-DLOG=C:/Temp/service.log&lt;/arg&gt;
       &lt;arg&gt;org.apache.uima.reference_impl.collection.
          service.vinci.VinciAnalysisEngineService_impl&lt;/arg&gt;
       &lt;arg&gt;${descriptor}&lt;/arg&gt;
     &lt;/exec&gt;
   &lt;/runInSeparateProcess&gt;
   &lt;deploymentParameters/&gt;
   &lt;filter/&gt;
   &lt;errorHandling&gt;
     &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
     &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
     &lt;timeout max="100000"/&gt;
   &lt;/errorHandling&gt;
   &lt;checkpoint batch="10000"/&gt;
 &lt;/casProcessor&gt;</programlisting>

       <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
         details and required settings.</para>

     </section>

     <section id="ugr.tug.cpe.deploying_nonmanaged_cas_processors">
       <title>Deploying Non-managed CAS Processors</title>

       <para>Non-managed CAS Processor deployment is shown in <xref
           linkend="ugr.tug.cpe.fig.nonmanaged_cpe"/>. In non-managed mode, the CPE
         supports connectivity to CAS Processors running on local or remote computers using
         Vinci. Non-managed processors differ from managed processors in two
         respects:

         <orderedlist><listitem><para>Non-managed processors are neither started nor
           stopped by the CPE.</para></listitem>

           <listitem><para>Non-managed processors use an independent VNS, also neither
             started nor stopped by the CPE. </para></listitem></orderedlist></para>

       <figure id="ugr.tug.cpe.fig.nonmanaged_cpe">
         <title>CPE with non-managed CAS Processors</title>
         <mediaobject>
           <imageobject>
             <imagedata width="4.8in" format="PNG"
               fileref="&imgroot;image023.png"/>
           </imageobject>
           <textobject><phrase>Non-managed CPE deployment</phrase></textobject>
         </mediaobject>
       </figure>

       <para>While non-managed CAS Processors provide the same level of fault isolation and
         robustness as managed CAS Processors, error recovery support for non-managed CAS
         Processors is much more limited. In particular, the CPE cannot restart a non-managed
         CAS Processor after an error.</para>

       <para>Non-managed CAS Processors also require a separate Vinci Naming Service
         running on the network. This VNS must be manually started and monitored by the end user
         or application. Instructions for running a VNS can be found in <olink
           targetdoc="&uima_docs_tutorial_guides;"
           targetptr="ugr.tug.application.vns.starting"/>.</para>

       <para>To deploy a non-managed CAS Processor, the CPE deployer must change the CPE
         descriptor. The following is a section from the CPE descriptor that shows an example
         configuration for the non-managed CAS Processor.</para>


       <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="remote"</emphasis> name="Meeting Detector TAE"&gt;
   &lt;descriptor&gt;
     &lt;include href=
         "descriptors/vinciService/MeetingDetectorVinciService.xml"/&gt;
   &lt;/descriptor&gt;
   &lt;deploymentParameters/&gt;
   &lt;filter/&gt;
   &lt;errorHandling&gt;
     &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
     &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
     &lt;timeout max="100000"/&gt;
   &lt;/errorHandling&gt;
   &lt;checkpoint batch="10000"/&gt;
 &lt;/casProcessor&gt;</programlisting>

       <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
         details and required settings.</para>

     </section>

     <section id="ugr.tug.cpe.integrated_deployment">
       <title>Deploying Integrated CAS Processors</title>

       <para>Integrated CAS Processors are shown in <xref
           linkend="ugr.tug.cpe.fig.integrated_deployment"/>. Here the CAS Processors
         run in the same JVM as the CPE, just like the Collection Reader and CAS Initializer.
         This deployment method results in minimal CAS communication and transport overhead
         as the CAS is shared in the same process space of the JVM. However, a CPE running with all
         integrated CAS Processors is limited in scalability by the capability of the single
         computer on which the CPE is running. There is also a stability risk associated with
         integrated processors because a poorly written CAS Processor can cause the JVM, and
         hence the entire CPE, to abort.</para>

       <figure id="ugr.tug.cpe.fig.integrated_deployment">
         <title>CPE with integrated CAS Processor</title>
         <mediaobject>
           <imageobject>
             <imagedata width="3.2in" format="PNG"
               fileref="&imgroot;image026.png"/>
           </imageobject>
           <textobject><phrase>CPE with integrated CAS Processor</phrase>
           </textobject>
         </mediaobject>
       </figure>

       <para>The following is a section from a CPE descriptor that shows an example
         configuration for the integrated CAS Processor.</para>


       <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment=<quote>integrated</quote></emphasis> name=<quote>Meeting Detector TAE</quote>&gt;
   &lt;descriptor&gt;
     &lt;include href="descriptors/tutorial/ex4/MeetingDetectorTAE.xml"/&gt;
   &lt;/descriptor&gt;
   &lt;deploymentParameters/&gt;
   &lt;filter/&gt;
   &lt;errorHandling&gt;
     &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
     &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
     &lt;timeout max="100000"/&gt;
   &lt;/errorHandling&gt;
   &lt;checkpoint batch="10000"/&gt;
 &lt;/casProcessor&gt;</programlisting>

       <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
         details and required settings.</para>

     </section>
   </section>

   <section id="ugr.tug.cpe.collection_processing_examples">
     <title>Collection Processing Examples</title>

     <para>The UIMA SDK includes a set of examples illustrating the three modes of deployment:
       integrated, managed, and non-managed. These are in the
       <literal>/examples/descriptors/collection_processing_engine</literal>
       directory. There are three CPE descriptors that run an example annotator (the Meeting
       Finder) in these modes.</para>

     <para>To run either the integrated or the managed example, use the
       <literal>runCPE</literal> script in the <literal>/bin</literal> directory of the UIMA
       installation, passing the appropriate CPE descriptor as an argument. Alternatively,
       if you are using Eclipse and have the <literal>uimaj-examples</literal> project in your
       workspace, select Run &rarr; Run... from the Eclipse menu and pick the
       launch configuration <quote>UIMA Run CPE</quote>.</para>

     <note><para>The <literal>runCPE</literal> script <emphasis role="bold-italic">must</emphasis>
     be run from the <literal>%UIMA_HOME%\examples</literal> directory, because the example
     CPE descriptors use relative path names that are resolved against this working directory.
     For instance:

     <literallayout>runCPE
 descriptors\collection_processing_engine\MeetingFinderCPE_Integrated.xml</literallayout></para>
     </note>

     <!--
     <para>If you installed the examples into Eclipse, you can run directly from Eclipse by
       creating a run configuration. To do this, highlight the SimpleRunCPE.java source file
       in the examples src/org/apache/uima/examples/cpe directory, and then</para>

     <orderedlist><listitem><para>pick the menu Run &rarr; Run...</para></listitem>

       <listitem><para>click <quote>Java Application</quote> and press
         <quote>New</quote></para></listitem>

       <listitem><para>click on the Arguments panel, and insert a path to the appropriate CPE
         descriptor in the <quote>Program Arguments</quote> box by typing, for instance:
         <literal>descriptors/collection_processing_engine/
           MeetingFinderCPE_Integrated.xml</literal>
         </para></listitem>

       <listitem><para>Then press <quote>Run</quote> </para></listitem>
     </orderedlist>
     -->

     <para>Running the non-managed example requires some additional steps:

       <orderedlist><listitem><para>Start a VNS service by running the
         <literal>startVNS</literal> script in the <literal>/bin</literal>
         directory, or by using the Eclipse launcher <quote>UIMA Start VNS</quote>.</para></listitem>

         <listitem><para>Deploy the Meeting Detector Analysis Engine as a Vinci service by
           running the <literal>startVinciService</literal> script in the
           <literal>/bin</literal> directory and passing it the location of the
           descriptor to deploy, in this case
           <literal>%UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml</literal>.
           Alternatively, if you are using Eclipse and have the <literal>uimaj-examples</literal>
           project in your workspace, select Run &rarr; Run... from the Eclipse menu and pick the
           launch configuration <quote>UIMA Start Vinci Service</quote>.
           </para></listitem>

         <listitem><para>Now run the <literal>runCPE</literal> script (or, in Eclipse, the
           launch configuration <quote>UIMA Run CPE</quote>), passing it the CPE descriptor for
           the non-managed version
           (<literal>%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml</literal>).
           </para></listitem></orderedlist></para>

     <para>These steps assume that the Vinci Naming Service, the <literal>runCPE</literal>
       application, and the <literal>MeetingDetectorTAE</literal> service are all running on
       the same machine. Most of the scripts that need information about VNS look for values
       in the environment variables <literal>VNS_HOST</literal> and
       <literal>VNS_PORT</literal>; these default to <quote>localhost</quote> and
       <quote>9000</quote>. You may set these to appropriate values before running the
       scripts, as needed; you can also pass the name of the VNS host as the second argument
       to the <literal>startVinciService</literal> script.</para>

     <para>Alternatively, you can edit the scripts and/or the XML files to specify
       different values for the VNS host and port. For instance, if the
       <literal>runCPE</literal> application is running on a different machine from the
       Vinci Naming Service, you can edit
       <literal>MeetingFinderCPE_NonManaged.xml</literal> and change the value of the
       vnsHost parameter,
       <literal>&lt;parameter name="vnsHost"  value="localhost" type="string"/&gt;</literal>,
       from <quote>localhost</quote> to the name of the VNS host.</para>
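
     <para>As a minimal sketch of this edit, the changed parameter might look like the
       following. The host name <literal>vns.example.com</literal> is a hypothetical
       placeholder, not a value from the UIMA examples, and the enclosing
       <literal>deploymentParameters</literal> element is an assumption based on the CPE
       descriptor format; check your copy of
       <literal>MeetingFinderCPE_NonManaged.xml</literal> for the exact surrounding
       elements.</para>

     <programlisting>&lt;deploymentParameters&gt;
   &lt;!-- vns.example.com is a hypothetical host name; use the machine running your VNS --&gt;
   &lt;parameter name="vnsHost" value="vns.example.com" type="string"/&gt;
 &lt;/deploymentParameters&gt;</programlisting>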
   </section>

 </chapter>