| <html><head> |
| <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| <title>UIMA Tools Guide and Reference</title><link rel="stylesheet" type="text/css" href="css/stylesheet-html.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="UIMA Tools Guide and Reference" id="d5e1"><div xmlns:d="http://docbook.org/ns/docbook" class="titlepage"><div><div><h1 class="title">UIMA Tools Guide and Reference</h1></div><div><div class="authorgroup"> |
| <h3 class="corpauthor">Written and maintained by the Apache UIMA™ Development Community</h3> |
| </div></div><div><p class="releaseinfo">Version 3.1.1</p></div><div><p class="copyright">Copyright © 2006, 2019 The Apache Software Foundation</p></div><div><div class="legalnotice" title="Legal Notice"><a name="d5e8"></a> |
| <p> </p> |
| <p title="License and Disclaimer"> |
| <b>License and Disclaimer. </b> |
| |
| The ASF licenses this documentation |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this documentation except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| </p><div class="blockquote"><blockquote class="blockquote"> |
| <a class="ulink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a> |
| </blockquote></div><p title="License and Disclaimer"> |
| |
| Unless required by applicable law or agreed to in writing, |
| this documentation and its contents are distributed under the License |
| on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| </p> |
| <p> </p> |
| <p> </p> |
| <p title="Trademarks"> |
| <b>Trademarks. </b> |
| All terms mentioned in the text that are known to be trademarks or |
| service marks have been appropriately capitalized. Use of such terms |
| in this book should not be regarded as affecting the validity of the |
| the trademark or service mark. |
| |
| </p> |
| </div></div><div><p class="pubdate">November, 2019</p></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#ugr.tools.cde">1. CDE User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.launching">1.1. Launching the Component Descriptor Editor</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.creating_new_ae_descriptor">1.2. Creating a New AE Descriptor</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.pages_within_the_editor">1.3. Pages within the Editor</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.adjusting_display_of_pages">1.3.1. Adjusting the display of pages</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.overview_page">1.4. Overview Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.overview_page.implementation_details">1.4.1. Implementation Details</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.overview_page.runtime_info">1.4.2. Runtime Information</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.overview_page.overall_id_info">1.4.3. Overall Identification Information</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page">1.5. Aggregate Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.adding_components_more_than_once">1.5.1. Adding components more than once</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.adding_removing_components_from_flow">1.5.2. Adding or Removing components in a flow</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.adding_remote_aes">1.5.3. Adding remote Analysis Engines</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.connecting_to_remote_services">1.5.4. Connecting to Remote Services</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.finding_aes_by_searching">1.5.5. Finding Analysis Engines by searching</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.aggregate_page.component_engine_flow">1.5.6. Component Engine Flow</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.parm_definition">1.6. Parameters Definition Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.parm_definition.using_groups">1.6.1. Using groups</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.parm_definition.adding">1.6.2. Adding or Editing a Parameter</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.parm_definition.aggregates">1.6.3. Parameter declarations for Aggregates</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.parameter_settings">1.7. Parameter Settings Page</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.type_system">1.8. Type System Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.type_system.exporting">1.8.1. Exporting</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.capabilities">1.9. Capabilities Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.capabilities.sofa_name_mapping">1.9.1. Sofa (and view) name mappings</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.indexes">1.10. Indexes Page</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.resources">1.11. Resources Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.resources.binding">1.11.1. Binding</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.resources.aggregates">1.11.2. Resources with Aggregates</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.resources.imports_exports">1.11.3. Imports and Exports</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.source">1.12. Source Page</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cde.source.formatting">1.12.1. Source formatting – indentation</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cde.creating_self_contained_type_system">1.13. Creating a Self-Contained Type System</a></span></dt><dt><span class="section"><a href="#ugr.tools.cde.creating_other_descriptor_components">1.14. Creating Other Descriptor Components</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.cpe">2. CPE Configurator User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cpe.limitations">2.1. Limitations of the CPE Configurator</a></span></dt><dt><span class="section"><a href="#ugr.tools.cpe.starting">2.2. Starting the CPE Configurator</a></span></dt><dt><span class="section"><a href="#ugr.tools.cpe.selecting_component_descriptors">2.3. Selecting Component Descriptors</a></span></dt><dt><span class="section"><a href="#ugr.tools.cpe.running">2.4. Running a Collection Processing Engine</a></span></dt><dt><span class="section"><a href="#ugr.tools.cpe.file_menu">2.5. The File Menu</a></span></dt><dt><span class="section"><a href="#ugr.tools.cpe.help_menu">2.6. The Help Menu</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.doc_analyzer">3. Document Analyzer User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.doc_analyzer.starting">3.1. Starting the Document Analyzer</a></span></dt><dt><span class="section"><a href="#ugr.tools.doc_analyzer.running_an_ae">3.2. Running an AE</a></span></dt><dt><span class="section"><a href="#ugr.tools.doc_analyzer.viewing_results">3.3. Viewing the Analysis Results</a></span></dt><dt><span class="section"><a href="#ugr.tools.doc_analyzer.configuring">3.4. Configuring the Annotation Viewer</a></span></dt><dt><span class="section"><a href="#ugr.tools.doc_analyzer.interactive_mode">3.5. Interactive Mode</a></span></dt><dt><span class="section"><a href="#ugr.tools.doc_analyzer.view_mode">3.6. View Mode</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.annotation_viewer">4. Annotation Viewer</a></span></dt><dt><span class="chapter"><a href="#ugr.tools.cvd">5. CAS Visual Debugger</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cvd.introduction">5.1. Introduction</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.cvd.introduction.running">5.1.1. Running CVD</a></span></dt><dt><span class="section"><a href="#cvd.introduction.commandline">5.1.2. Command line parameters</a></span></dt></dl></dd><dt><span class="section"><a href="#cvd.errorHandling">5.2. Error Handling</a></span></dt><dt><span class="section"><a href="#cvd.preferencesFile">5.3. Preferences File</a></span></dt><dt><span class="section"><a href="#cvd.theMenus">5.4. The Menus</a></span></dt><dd><dl><dt><span class="section"><a href="#cvd.fileMenu">5.4.1. The File Menu</a></span></dt><dt><span class="section"><a href="#cvd.editMenu">5.4.2. The Edit Menu</a></span></dt><dt><span class="section"><a href="#cvd.runMenu">5.4.3. The Run Menu</a></span></dt><dt><span class="section"><a href="#cvd.toolsMenu">5.4.4. The tools menu</a></span></dt></dl></dd><dt><span class="section"><a href="#cvd.mainDisplayArea">5.5. The Main Display Area</a></span></dt><dd><dl><dt><span class="section"><a href="#cvd.statusBar">5.5.1. The Status Bar</a></span></dt><dt><span class="section"><a href="#cvd.keyboardNavigation">5.5.2. Keyboard Navigation and Shortcuts</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tools.eclipse_launcher">6. Eclipse Analysis Engine Launcher's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.eclipse_launcher.create_configuration">6.1. Creating an Analysis Engine launch configuration</a></span></dt><dt><span class="section"><a href="#ugr.tools.eclipse_launcher.launching">6.2. Launching an Analysis Engine</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.ce">7. Cas Editor User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.caseditor.Introduction">7.1. Introduction</a></span></dt><dt><span class="section"><a href="#sandbox.caseditor.Launching">7.2. Launching the Cas Editor</a></span></dt><dd><dl><dt><span class="section"><a href="#sandbox.caseditor.typeSystemSpec">7.2.1. Specifying a type system</a></span></dt></dl></dd><dt><span class="section"><a href="#sandbox.caseditor.annotation_editor">7.3. Annotation editor</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cas_editor.annotation_editor.editor">7.3.1. Editor</a></span></dt><dt><span class="section"><a href="#sandbox.caseditor.annotation_editor.styling">7.3.2. Configure annotation styling</a></span></dt><dt><span class="section"><a href="#ugr.tools.cas_editor.annotation_editor.cas_views">7.3.3. CAS view support</a></span></dt><dt><span class="section"><a href="#ugr.tools.cas_editor.annotation_editor.outline">7.3.4. Outline view</a></span></dt><dt><span class="section"><a href="#ugr.tools.cas_editor.annotation_editor.properties_view">7.3.5. Edit Views</a></span></dt><dt><span class="section"><a href="#ugr.tools.cas_editor.annotation_editor.fs_view">7.3.6. FeatureStructure View</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.cas_editor.custom_view">7.4. Implementing a custom Cas Editor View</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.cas_editor.custom_view.sample">7.4.1. Annotation Status View Sample</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#ugr.tools.jcasgen">8. JCasGen User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.jcasgen.running_without_eclipse">8.1. Running stand-alone without Eclipse</a></span></dt><dt><span class="section"><a href="#ugr.tools.jcasgen.running_standalone_with_eclipse">8.2. Running stand-alone with Eclipse</a></span></dt><dt><span class="section"><a href="#ugr.tools.jcasgen.running_within_eclipse">8.3. Running within Eclipse</a></span></dt><dt><span class="section"><a href="#ugr.tools.jcasgen.maven_plugin">8.4. Using the jcasgen-maven-plugin</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.pear.packager">9. PEAR Packager User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.pear.packager.using_eclipse_plugin">9.1. Using the PEAR Eclipse Plugin</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.pear.packager.add_uima_nature">9.1.1. Add UIMA Nature to your project</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.packager.using_pear_generation_wizard">9.1.2. Using the PEAR Generation Wizard</a></span></dt></dl></dd><dt><span class="section"><a href="#ugr.tools.pear.packager.using_command_line">9.2. Using the PEAR command line packager</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.pear.packager.maven.plugin.usage">10. The PEAR Packaging Maven Plugin</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.pear.packager.maven.plugin.usage.configure">10.1. Specifying the PEAR Packaging Maven Plugin</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.packager.maven.plugin.usage.dependencies">10.2. Automatically including dependencies</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.packager.maven.plugin.commandline">10.3. Running from the command line</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.packager.maven.plugin.install.src">10.4. Building the PEAR Packaging Plugin From Source</a></span></dt></dl></dd><dt><span class="chapter"><a href="#ugr.tools.pear.installer">11. PEAR Installer User's Guide</a></span></dt><dt><span class="chapter"><a href="#ugr.tools.pear.merger">12. PEAR Merger User's Guide</a></span></dt><dd><dl><dt><span class="section"><a href="#ugr.tools.pear.merger.merge_details">12.1. Details of the merging process</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.merger.testing_modifying_resulting_pear">12.2. Testing and Modifying the resulting PEAR</a></span></dt><dt><span class="section"><a href="#ugr.tools.pear.merger.restrictions_limitations">12.3. Restrictions and Limitations</a></span></dt></dl></dd></dl></div> |
| |
| |
| |
| <div class="chapter" title="Chapter 1. Component Descriptor Editor User's Guide" id="ugr.tools.cde"><div class="titlepage"><div><div><h2 class="title">Chapter 1. Component Descriptor Editor User's Guide</h2></div></div></div> |
| |
| |
| |
| <p>The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based |
| interface for creating and editing UIMA XML descriptors. It supports most of the |
| descriptor formats, except the Collection Processing Engine descriptor, the PEAR |
| package descriptor and some remote deployment descriptors.</p> |
| |
| <div class="section" title="1.1. Launching the Component Descriptor Editor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.launching">1.1. Launching the Component Descriptor Editor</h2></div></div></div> |
| |
| |
| <p>Here's how to launch this tool on a descriptor contained in the examples. This |
| presumes you have installed the examples as described in the SDK Installation and Setup |
| chapter.</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Expand the uimaj-examples |
| project in the Eclipse Navigator or Package Explorer view</p></li><li class="listitem"><p>Within this project, browse to the file |
| descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</p></li><li class="listitem"><p>Right-click on this file and select Open With <span class="symbol">→</span> Component |
| Descriptor Editor. (If this option is not present, check to make sure you installed |
| the plug-ins as described in <a href="overview_and_setup.html#ugr.ovv.eclipse_setup.installation" class="olink">Section 3.1, “Installation”</a> of the <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview & SDK Setup</a> book. |
| The EMF plugin is also |
| required.)</p></li><li class="listitem"><p>This should open a graphical editor and display the contents of the |
| RoomNumberAnnotator descriptor. </p></li></ul></div> |
| |
| </div> |
| |
| <div class="section" title="1.2. Creating a New AE Descriptor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.creating_new_ae_descriptor">1.2. Creating a New AE Descriptor</h2></div></div></div> |
| |
| |
| <p>A new AE descriptor file may be created by selecting the File <span class="symbol">→</span> New <span class="symbol">→</span> |
| Other... menu. This brings up the following dialog: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.cde/image002.jpg" width="574" alt="Screenshot of selecting new UIMA component in Eclipse"></td></tr></table></div> |
| </div> |
| |
| <p>If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the |
| Next > button, the following dialog is displayed. We will cover creating other kinds |
| of components later in the documentation. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="317"><tr><td><img src="images/tools/tools.cde/image004.jpg" width="317" alt="Screenshot of selecting new UIMA component in Eclipse after pushing Next"></td></tr></table></div> |
| </div> |
| |
| <p>After entering the appropriate parent folder and file name, and clicking Finish, |
| an initial AE descriptor file is created with the given name, and the descriptor is |
| opened up within the Component Descriptor Editor.</p> |
| |
| <p>At this point, the display inside the Component Descriptor Editor is the same |
| whether one started by creating a new AE descriptor, as in the preceding paragraph, or |
| one merely opened a previously created AE descriptor from, say, the Package Explorer |
| view. We show a previously created AE in the figure below: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image006.jpg" width="564" alt="Screenshot of CDE showing overview page"></td></tr></table></div> |
| </div> |
| |
| <p>To see all the information shown in the main editor pane with less scrolling, double |
| click the title tab to toggle between the <span class="quote">“<span class="quote">full screen</span>”</span> and normal |
| views.</p> |
| |
| <p>It is possible to set the Component Descriptor Editor as the default editor for all |
| .xml files by going to Window <span class="symbol">→</span> Preferences, and then selecting File Associations |
| on the left, and *.xml on the right, and finally by clicking on Component Descriptor |
| Editor, the Default button and then OK. If AE and Type System descriptors are not the |
| primary .xml files you work with within the Eclipse environment, we recommend not |
| setting the Component Descriptor Editor as your default editor for all .xml files. To |
| open an .xml file using the Component Descriptor Editor, if the Component Descriptor |
| Editor is not set as your default editor, right click on the file in the Package Explorer, |
| or other navigational view, and select Open With <span class="symbol">→</span> Component Descriptor Editor. |
| This choice is remembered by Eclipse for subsequent open operations.</p> |
| |
| </div> |
| |
| <div class="section" title="1.3. Pages within the Editor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.pages_within_the_editor">1.3. Pages within the Editor</h2></div></div></div> |
| |
| |
| <p>The Component Descriptor Editor follows a standard Eclipse paradigm for these |
| kinds of editors. There are several pages in the editor; each one can be selected, one at a |
| time, by clicking on the bottom tabs. The last page contains the actual XML source file |
| being edited, and is displayed as plain text.</p> |
| |
| <p>The same set of tabs appear at the bottom of each page in the Component Descriptor |
| Editor. The Component Descriptor Editor uses this <span class="quote">“<span class="quote">multi-page editor</span>”</span> |
| paradigm to give the user a view of conceptually distinct portions of the Descriptor |
| metadata in separate pages. At any point in time the user may click on the Source tab to |
| view the actual XML source. The Component Descriptor Editor is, in a way, just a fancy GUI |
| for editing the XML. The tabs provide quick access to the following pages: Overview, |
| Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes, |
| Resources, and Source. We discuss each of these pages in turn.</p> |
| |
| <div class="section" title="1.3.1. Adjusting the display of pages"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.adjusting_display_of_pages">1.3.1. Adjusting the display of pages</h3></div></div></div> |
| |
| |
| <p>Most pages in the editor have a <span class="quote">“<span class="quote">sash</span>”</span> bar. This is a light gray bar |
| which separates sub-sections of the page. This bar can be dragged with the mouse to |
| adjust how the display area is split between the two sash panes. You can also change the |
| orientation of the Sash so it splits vertically, instead of horizontally, by |
| clicking on the small icons at the top right of the page that look like this: |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="69"><tr><td><img src="images/tools/tools.cde/image008.jpg" width="69" alt="Changing orientation of two window split"></td></tr></table></div> |
| </div> |
| |
| <p>All of the sections on a page have subtitles, with an indicator to the left which |
| you can click to collapse or expand that particular section. Collapsing sections can |
| sometimes be useful to free up screen area for other sections.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.4. Overview Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.overview_page">1.4. Overview Page</h2></div></div></div> |
| |
| |
| <p>Normally, the first page displayed in the Component Descriptor Editor is the |
| Overview page (the name of the page is shown in the GUI panel at the top left). If there is an |
| error reading and parsing the source, the Source page is shown instead, giving you the |
| opportunity to correct the problem. For many components, the Overview page contains |
| three sections: Implementation Details, Runtime Information and overall |
| Identification Information.</p> |
| |
| <div class="section" title="1.4.1. Implementation Details"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.overview_page.implementation_details">1.4.1. Implementation Details</h3></div></div></div> |
| |
| |
| <p>In the Implementation Details section you specify the Implementation Language |
| and Engine Type. There are two kinds of Engines: Aggregate, and non-Aggregate (also |
| called Primitive). An Aggregate engine is one which is composed of additional |
| component engines and contains no code, itself. Several of the pages in the Component |
| Descriptor Editor have different formats, depending on the engine type.</p> |
| |
| </div> |
| <div class="section" title="1.4.2. Runtime Information"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.overview_page.runtime_info">1.4.2. Runtime Information</h3></div></div></div> |
| |
| |
| <p>Runtime information is only applicable for primitive engines and is disabled |
| for aggregates and other kinds of descriptors. This is where you specify the class name of the annotator |
| implementation, if you are doing a Java implementation, or the C++ shared object or dll name, |
| if you are doing a C++ implementation. Most Analysis Engines will specify that |
| they update the CAS, and that they may be replicated (for performance reasons) when deployed. If |
| a particular Analysis Engine must see every CAS (for instance, if it is counting the |
| number of CASes), then uncheck the <span class="quote">“<span class="quote">multiple deployment allowed</span>”</span> |
| box. If the Analysis Engine doesn't update the CAS, uncheck the <span class="quote">“<span class="quote">updates |
| the CAS</span>”</span> box. (Most CAS Consumers do not update the CAS, and this parameter |
| defaults to unchecked for new CAS Consumer descriptors).</p> |
| |
| <p>Analysis engines are written using the CAS Multiplier APIs |
| (see <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> |
| <a href="tutorials_and_users_guides.html#ugr.tug.cm" class="olink">Chapter 7, <i>CAS Multiplier Developer's Guide</i></a>) |
| can create additional CASes for analysis. To specify that they |
| do this, check the <span class="quote">“<span class="quote">returns new artifacts</span>”</span>.</p> |
| |
| </div> |
| |
| <div class="section" title="1.4.3. Overall Identification Information"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.overview_page.overall_id_info">1.4.3. Overall Identification Information</h3></div></div></div> |
| |
| |
| <p>The Name should be a human-readable name that describes this component. The |
| Version, Vendor, and Description fields are optional, and are arbitrary |
| strings.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.5. Aggregate Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.aggregate_page">1.5. Aggregate Page</h2></div></div></div> |
| |
| |
| <p>For primitive Analysis Engines, Flow Controllers or Collection Processing |
| components, the Aggregate page is not used. For aggregate engines, the page looks like |
| this: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image010.jpg" width="564" alt="CDE Aggregate page"></td></tr></table></div> |
| </div> |
| |
| <p>On the left we see a list of component engines, and on the right information about the |
| flow. If you hover the mouse over an item in the list of component engines, that |
| engine's description meta data will be shown. If you right-click on one of these |
| items, you get an option to open that delegate descriptor in another editor instance. |
| Any changes you make, however, won't be seen until you close and reopen the editor |
| on the importing file.</p> |
| |
| <p>Engines can be added to the list on the left by clicking the Add button at the bottom of |
| the Component Engine section. This brings up one of the following two dialogs: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="384"><tr><td><img src="images/tools/tools.cde/import-by-location.jpg" width="384" alt="Adding an Analysis Engine to an Aggregate, by location"></td></tr></table></div> |
| </div> |
| |
| <p>This dialog lets you select |
| a descriptor from your workspace, or browse the file system to select a descriptor. |
| </p> |
| |
| <p>Or, if you have selected to import by name, this dialog is shown: |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="524"><tr><td><img src="images/tools/tools.cde/import-by-name.jpg" width="524" alt="Adding an Analysis Engine to an Aggregate, by name"></td></tr></table></div> |
| </div> |
| |
| <p>You can specify that the import should be by Name (the name is looked up using both the |
| Project's class path, and DataPath), or by location. If it is by name, |
| the dialog shows the available xml files on the class path, to pick from. If the |
| one you want isn't showing, this means it isn't on the enclosing Eclipse Java Project's |
| classpath, nor on the datapath, and one of those needs to be updated to include the |
| path to the resource. If the name picked is |
| <code class="literal">com/company/prod/xyz.xml</code>, the name in |
| the descriptor will be <span class="quote">“<span class="quote"><code class="literal">com.company.prod.xyz</code></span>”</span>. |
| The "Browse the file system..." button is disabled when import by name is checked, because |
| the file system is not the source of the imports - rather, its the resources on the |
| classpath or datapath that are.</p> |
| |
| <p> |
| If it is by location, the file reference is converted to a relative reference if |
| possible, in the descriptor.</p> |
| |
| <p>The final selection at the bottom tells whether or not the selected engine(s) |
| should automatically be added to the end of the flow section (the right section on the |
| Aggregate page). The OK button does not become activated until a descriptor |
| file is selected.</p> |
| |
| <p>To remove an analysis engine from the component engine list simply select an engine |
| and click the Remove button, or press the delete key. If the engine is already in the flow |
| list you will be warned that deletion will also delete the specified engine from this |
| list.</p> |
| |
| <div class="section" title="1.5.1. Adding components more than once"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.adding_components_more_than_once">1.5.1. Adding components more than once</h3></div></div></div> |
| |
| |
| <p>Components may be added to the left panel more than once. Each of these components |
| will be given a key which is unique. A typical reason this might be done is to use a |
| component in a flow several times, but have each use be associated with different |
| configuration parameters (different configuration parameters can be associated |
| with each instance).</p> |
| </div> |
| |
| <div class="section" title="1.5.2. Adding or Removing components in a flow"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.adding_removing_components_from_flow">1.5.2. Adding or Removing components in a flow</h3></div></div></div> |
| |
| |
| <p>The button in-between the Component Engines and the Flow List, labeled |
| <code class="literal">>></code>, adds a chosen engine to the flow list and the button |
| labeled <code class="literal"><<</code> removes an engine from the flow list. To add an |
| engine to the flow list you must first select an engine from the left hand list, and then |
| press the <code class="literal">>></code> button. Engines may appear any number of |
| times in the flow list. To remove an engine from the flow list, select an engine from the |
| right hand list and press the <code class="literal"><<</code> button.</p> |
| </div> |
| |
| <div class="section" title="1.5.3. Adding remote Analysis Engines"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.adding_remote_aes">1.5.3. Adding remote Analysis Engines</h3></div></div></div> |
| |
| |
| <p>There are two ways to add remote engines: add an existing descriptor, which |
| specifies a remote engine (just as if you were adding a non-remote engine) or use the |
| Add Remote button which will create a remote descriptor, save it, and then import it, |
| all in one operation. The Add Remote button enables you to easily specify the |
| information needed to create a remote service descriptor for a remote AE - one that |
| runs on a different computer connected over the network. There are 3 kinds of |
| these: two are variants of the Service Client |
| descriptor, described in <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor.service_client" class="olink">Section 2.7, “Service Client Descriptors”</a>; |
| the other is the UIMA-AS JMS Service descriptor, described in |
| <a href="uima_async_scaleout.pdf" class="olink">UIMA References</a> <span class="olink">????</span>. The Add |
| Remote button creates an instance of one of these descriptors, |
| saves it as a file in the workspace, and |
| imports it into the aggregate.</p> |
| |
| <p>Of course, if you already have a remote service descriptor, you can add it to the |
| set of delegates using the <code class="code">Add</code> button, just like adding other kinds of analysis engines.</p> |
| |
| <p>After clicking on <code class="code">Add Remote</code>, the following dialog is displayed: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="485"><tr><td><img src="images/tools/tools.cde/image014v2.jpg" width="485" alt="Adding a remote client to an aggregate"></td></tr></table></div> |
| </div> |
| |
| <p>To define a remote service you specify the Service Kind, Protocol Service Type, |
| URI and Key. You can also specify a Timeout in milliseconds, used by the SOAP and |
| JMS services, |
| and a VNS Host and Port used by the Vinci Service. |
| The JMS service has additional timeouts and other parameters you may specify. |
| Just like when one adds an engine from |
| the file system, you have the option of adding the engine to the end of the flow. The |
| Component Descriptor Editor currently only supports Vinci and SOAP services using |
| this dialog.</p> |
| |
| <p>Remote engines are added to the descriptor using the |
| <import ... > syntax. The information you specify here is saved in the Eclipse |
| project as a file, using a generated name, <key-name>.xml, where |
| <key-name> is the name you listed as the Key. Because of this, the key-name must |
| be a valid file name. If you want a different name, you can change the path information |
| in the dialog box.</p> |
| </div> |
| |
| <div class="section" title="1.5.4. Connecting to Remote Services"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.connecting_to_remote_services">1.5.4. Connecting to Remote Services</h3></div></div></div> |
| |
| |
| <p>If you are using the Vinci protocol, it requires that you specify the location of |
| the Vinci Name Server (an IP address and a Port number). You can specify these in the |
| service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu |
| item: Window <span class="symbol">→</span> Preferences... <span class="symbol">→</span> UIMA Preferences. |
| </p> |
| |
| <p>If the remote service |
| is available (up and running), additional operations become possible. For |
| instance, hovering the mouse over the remote descriptor will show the description |
| metadata from the remote service.</p> |
| </div> |
| |
| <div class="section" title="1.5.5. Finding Analysis Engines by searching"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.finding_aes_by_searching">1.5.5. Finding Analysis Engines by searching</h3></div></div></div> |
| |
| |
| <p>The next button that appears between the component engine list and the flow list |
| is the Find AE button. When this button is pressed the following dialog is displayed, |
| which allows one to search for AEs by name, by input or output types, or by a combination |
| of these criteria. This function searches the existing Eclipse workspace for |
| matching *.xml descriptor source files; it does not look inside Jar files. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="525"><tr><td><img src="images/tools/tools.cde/image016.jpg" width="525" alt="Searching for an AE to add to an aggregate"></td></tr></table></div> |
| </div> |
| |
| <p>The search automatically adds a <span class="quote">“<span class="quote">match any characters</span>”</span> - style |
| (*) wildcard at the beginning and end of anything entered. Thus, if person is |
| specified for an output type, a <span class="quote">“<span class="quote">*person*</span>”</span> search is performed. Such a |
| search would match such things as <span class="quote">“<span class="quote">my.namespace.person</span>”</span> and |
| <span class="quote">“<span class="quote">person.governmentOfficial.</span>”</span> One can search in all projects or one |
| particular project. The search does an implicit <span class="emphasis"><em>and</em></span> on all |
| fields which are left non-blank.</p> |
| </div> |
| |
| <div class="section" title="1.5.6. Component Engine Flow"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.aggregate_page.component_engine_flow">1.5.6. Component Engine Flow</h3></div></div></div> |
| |
| |
| <p>The UIMA SDK currently supports three kinds of sequencing flows: Fixed, |
| CapabilityLanguageFlow, and user-defined |
| (see <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints" class="olink">Section 2.4.2.3, “FlowConstraints”</a>). |
| The first two require specification of a linear flow sequence; |
| this linear flow sequence can also be read by a user-defined flow controller (what use |
| is made of it is up to the user-defined flow controller). The Component Engine Flow |
| section allows specification of these items.</p> |
| |
| <p>The pull-down labeled Flow Kind picks between the three flow models. When the |
| user-defined flow is selected, the Browse and Search buttons become enabled to let |
| you pick the flow controller XML descriptor to import. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="376"><tr><td><img src="images/tools/tools.cde/image018.jpg" width="376" alt="Specifying flow control"></td></tr></table></div> |
| </div> |
| |
| <p>The key name value is set automatically from the XML descriptor being imported, |
| and enables parameters to be overridden for that descriptor (see following |
| sections).</p> |
| |
| <p>The Up and Down buttons to the right in the Flow section are activated when an |
| engine in the flow is selected. The Up button moves the selected engine up one place in |
| the execution order, and down moves the selected engine down one place in the |
| execution order. Remember that engines can appear multiple times in the flow (or not |
| at all).</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.6. Parameters Definition Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.parm_definition">1.6. Parameters Definition Page</h2></div></div></div> |
| |
| |
| <p>There are two pages for parameters: the first one is where parameters are defined, |
| and the second one is where the parameter settings are configured. The first page is the |
| Parameter Definition page and has two alternatives, depending on whether or not the |
| descriptor is an Aggregate or not. We start with a description of parameter definitions |
| for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow |
| Controllers. Here is an example: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="515"><tr><td><img src="images/tools/tools.cde/image020.jpg" width="515" alt="Parameter Definitions - not Aggregate"></td></tr></table></div> |
| </div> |
| |
| <p>The first checkbox at the top simplifies things if you are not using Parameter |
| Groups (see the following section for a discussion of groups). In this case, leave the |
| check box unchecked. The main area shows a list of parameter definitions. Each |
| parameter has a name, which must be unique for this Analysis Engine. The first three |
| attributes specify whether the parameter can have a single or multiple values (an array |
| of values), whether it is Optional or Mandatory, and what the value type it can hold |
| (String, Integer, Float, and Boolean). If an external override name has been specified |
| an attribute of "XO" is included. See <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides" class="olink">Section 2.4.3.4, “External Configuration Parameter Overrides”</a> |
| for a discussion of external configuration parameter overrides.</p> |
| |
| <p>In addition to using the buttons on the right to edit this information, you can |
| double-click a parameter to edit it, or remove (delete) a selected parameter by |
| pressing the delete key. Use the Add button to add a new parameter to the list.</p> |
| |
| <p>Parameters have an additional description field, which you can specify when you |
| add or edit a parameter. To see the value of the description, hover the mouse over the |
| item, as shown in the picture below. If the parameter has an external override name its value |
| is included in the hover. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="545"><tr><td><img src="images/tools/tools.cde/image022.jpg" width="545" alt="Parameter description shown in a hover message"></td></tr></table></div> |
| </div> |
| |
| <div class="section" title="1.6.1. Using groups"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.parm_definition.using_groups">1.6.1. Using groups</h3></div></div></div> |
| |
| |
| <p>The group concept for parameters arose from the observation that sets of |
| parameters were sometimes associated with different configuration needs. As an |
| example, you might have an Analysis Engine which needed different configuration |
| based on the language of a document.</p> |
| |
| <p>To use groups, you check the <span class="quote">“<span class="quote">Use Parameter Groups</span>”</span> box. When you |
| do this, you get the ability to add groups, and to define parameters within these |
| groups. You also get a capability to define <span class="quote">“<span class="quote">Common</span>”</span> parameters, |
| which are parameters which are defined for all groups. Here is a screen shot showing |
| some parameter groups in use: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="515"><tr><td><img src="images/tools/tools.cde/image024.jpg" width="515" alt="Using parameter groups"></td></tr></table></div> |
| </div> |
| |
| <p>You can see the <span class="quote">“<span class="quote"><Common></span>”</span> parameters as well as two |
| different sets of groups.</p> |
| |
| <p>The Default Group is an optional specification of what Group to use if the |
| parameter is not available for the group requested.</p> |
| |
| <p>The Search strategy specifies what to do when a parameter is not available for the |
| group requested. It can have the values of None, language_fallback, or |
| default_fallback. These are more fully described in the section |
| <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration" class="olink">Section 2.4.3.1, “Configuration Parameter Declaration”</a> |
| .</p> |
| |
| <p>Groups are added using the Add Group button. Once added, they can be edited or |
| removed, using the buttons to the right, or the standard gestures for editing |
| (double-clicking the item) and removing (pressing the delete key after an item is |
| selected). Removing a group removes all the parameter definitions in the group. If |
| you try and remove the <span class="quote">“<span class="quote"><Common></span>”</span> group, it just removes the |
| parameters in the group.</p> |
| |
| <p>Each entry for a group in the table specifies one or more group names. For example, |
| the highlighted entry above, specifies two groups: <span class="quote">“<span class="quote">myNewGroup2</span>”</span> |
| and <span class="quote">“<span class="quote">mg3</span>”</span>. The parameter definition underneath is considered to be in |
| both groups.</p> |
| |
| </div> |
| |
| <div class="section" title="1.6.2. Adding or Editing a Parameter"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.parm_definition.adding">1.6.2. Adding or Editing a Parameter</h3></div></div></div> |
| |
| |
| <p>When creating or modifying a parameter both a unique name and a valid type must be |
| specified. The Description and External Override fields are optional. The defaults for the two |
| checkboxs indicate a single-valued optional parameter in the example below: |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="465"><tr><td><img src="images/tools/tools.cde/image025.jpg" width="465" alt="Aggregate parameters"></td></tr></table></div> |
| </div> |
| |
| </div> |
| |
| <div class="section" title="1.6.3. Parameter declarations for Aggregates"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.parm_definition.aggregates">1.6.3. Parameter declarations for Aggregates</h3></div></div></div> |
| |
| |
| <p>Aggregates declare parameters which always must override a parameter setting |
| for a component making up the aggregate. They do this using the version of this page |
| which is shown when the descriptor is an Aggregate; here's an example: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image026.jpg" width="564" alt="Aggregate parameters"></td></tr></table></div> |
| </div> |
| |
| <p>There is an additional panel shown (on the right) which lists all of the |
| components by their key names, and shows for each of them their defined parameters. To |
| add a new override for one or more of these parameters to the aggregate, select the |
| component parameter you wish to override and push the Create Override button (or, you |
| can just double-click the component parameter). This will automatically add a |
| parameter of the same name (by default – you can change the name if you like) to |
| the aggregate, putting it into the same group(s) (if groups are being used in the |
| component – this is required), and setting the properties of the parameter to |
| match those of the component (this is required).</p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If the name of the parameter being added already is in use in the aggregate, |
| and the parameters are not compatible, a new parameter name is generated by suffixing |
| the name with a number. If the parameters are compatible, the selected component |
| parameter is added to the existing aggregate parameter, as an additional override. If |
| you don't want this behavior, but want to have a new name generated in this case, |
| push the Create non-shared Override button instead, or hold down the |
| <span class="quote">“<span class="quote">shift</span>”</span> key when double clicking the component parameter.</p> |
| |
| <p>The required / optional setting in the aggregate parameter is set to match that of |
| the parameter being overridden. You may want to make an optional delegate parameter |
| required. You can do this by changing that value manually in the source editor view. |
| </p></div> |
| |
| <p>In the above example, the user has just double-clicked the |
| <span class="quote">“<span class="quote">TypeNames</span>”</span> parameter in the <span class="quote">“<span class="quote">NameRecognizer</span>”</span> |
| component. This added that parameter to this aggregate under the <span class="quote">“<span class="quote"><Not in |
| any group></span>”</span> section – since it wasn't part of a group.</p> |
| |
| <p>Once you have added a parameter definition to the aggregate, you can use the |
| buttons on the right side of the left panel to add additional overrides or remove |
| parameters or their overrides. <span><a name="ugr.tools.cde.parm_definition.removing_groups"></a> You can also remove |
| groups; removing a group is like removing all the parameter definitions in the |
| group.</span></p> |
| |
| <p>In addition to adding one parameter at a time from a component, you can also add all |
| the parameters for a group within a component, or all the parameters in the component, |
| by selecting those items.</p> |
| |
| <p>If you double-click (or push Create Override) the |
| <span class="quote">“<span class="quote"><Common></span>”</span> group or a parameter in the <Common> group in |
| a component, a special group is created in the Aggregate consisting of all of the |
| groups in that component, and the overriding parameter (or parameters) are added to |
| that. This is done because each component can have different groups belonging to the |
| Common group notion; the Common group for a component is just shorthand for all the |
| groups in that component.</p> |
| |
| <p>The Aggregate's specification of the default group and search strategy |
| override any specifications contained in the components.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.7. Parameter Settings Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.parameter_settings">1.7. Parameter Settings Page</h2></div></div></div> |
| |
| |
| <p>The Parameter Settings page is rather straightforward; it is where the user |
| defines parameter settings for their engines. An example of such a page is given below: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image028.jpg" width="564" alt="Parameter settings page"></td></tr></table></div> |
| </div> |
| |
| <p>For single valued attributes, the user simply types the default value into the |
| Value box on the right hand side. For multi-valued parameters the user should use the |
| Add, Edit and Remove buttons to manage the list of multiple parameter values.</p> |
| |
| <p>Values within groups are shown with each group separately displayed, to allow |
| configuring different values for each group.</p> |
| |
| <p>Values are checked for validity. For Boolean values in a list, use the words |
| <code class="literal">true</code> or <code class="literal">false</code>.</p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you specify a value in a single-valued parameter, and then delete all the |
| characters in the value, the CDE will treat this as if you wanted to not specify any setting |
| for this parameter. In order to specify a 0 length string setting for a String-valued |
| parameter, you will have to manually edit the XML using the <span class="quote">“<span class="quote">Source</span>”</span> tab. |
| </p> |
| <p> For array valued parameters, if you remove all of the entries for a particular array |
| parameter setting, the XML will reflect a 0-length array. To change this to an |
| unspecified parameter setting, you will have to manually edit the XML using the |
| <span class="quote">“<span class="quote">Source</span>”</span> tab. </p></div> |
| |
| </div> |
| |
| <div class="section" title="1.8. Type System Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.type_system">1.8. Type System Page</h2></div></div></div> |
| |
| |
| <p>This page declares the type system used by the annotator. For aggregates it is |
| derived by merging the type systems of all constituent AEs. The types used by the AE |
| constitute the language in which the inputs and outputs are described in the |
| Capabilities page and also affect the choice of indexes on the Indexes page. The Type |
| System page looks like the following: |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="634"><tr><td><img src="images/tools/tools.cde/limitJCasGenType.jpg" width="634" alt="Type System declaration page"></td></tr></table></div> |
| </div> |
| |
| <p>Before discussing this page in detail, it is important to note that there are 3 |
| settings that affect the operation of this page. These are accessed by selecting the |
| UIMA <span class="symbol">→</span> Settings (or by going to the Eclipse Window <span class="symbol">→</span> Preferences <span class="symbol">→</span> UIMA |
| Preferences) and checking or unchecking one of the following: <span class="quote">“<span class="quote">Auto generate |
| .java files when defining types</span>”</span>, |
| <span class="quote">“<span class="quote">Generate JCasGen classes only for types defined within the local project scope</span>”</span> |
| and <span class="quote">“<span class="quote">Display fully qualified type |
| names.</span>”</span></p> |
| |
| <p><a name="ugr.tools.cde.auto_jcasgen"></a>When the Auto generate option is checked and the development language for the AE is |
| Java, any time a change is made to a type and the change is saved, the corresponding .java |
| files are generated using the JCasGen tool. The results are stored in the primary source |
| directory defined for the project. The primary source directory is that listed first |
| when you right click on your project and select Properties <span class="symbol">→</span> Java Build Path, click |
| on the Source tab and look in the list box under the text that reads: <span class="quote">“<span class="quote">Source folder |
| on build path.</span>”</span> If no source folders are defined, you will get a warning that you |
| have no source folders defined and JCasGen will not be run. (For information on JCasGen |
| see <a href="tools.html#d5e1" class="olink">UIMA Tools Guide and Reference</a> |
| <a href="tools.html#ugr.tools.jcasgen" class="olink">Chapter 8, <i>JCasGen User's Guide</i></a>). |
| When JCasGen is run, you can monitor the progress of the generation by observing the |
| status on the Eclipse status line (normally at the bottom of the Eclipse window). |
| JCasGen runs on the fully-merged type system, consisting of the type specification |
| plus any imported type system, plus (for aggregates) the merged type systems of all the |
| components in an aggregate.</p> |
| |
| <div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>If the components of the aggregate have different definitions for the same |
| type name, the CDE will show a warning. It is possible to continue past this warning, |
| in which case the CDE will produce the correct |
| Java source files representing the merged types (that is, the |
| type definition that contains all of the features defined on that type by all of your |
| components). However, it is not recommended to use this feature |
| (of having different definitions for the same type name) since it can make it |
| difficult to combine/package your annotator with others. See <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.jcas.merging_types_from_other_specs" class="olink">Section 5.5, “Merging Types”</a> for more information. |
| </p></div> |
| |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>In addition to running automatically, you can manually run JCasGen on the |
| fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from |
| the UIMA pulldown menu: </p></div> |
| |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="515"><tr><td><img src="images/tools/tools.cde/image032.jpg" width="515" alt="Setting JCasGen options"></td></tr></table></div> |
| </div> |
| |
| <p>When <span class="quote">“<span class="quote">Generate JCasGen classes only for types defined within the local project scope</span>”</span> |
| is checked, then JCasGen skips generating classes for types that are imported from sources outside this project. |
| This might be done, for instance, if you have an aggregate which is importing type systems from its delegates, |
| some of which are defined in other projects, and have JCasGen'd files already present in those other projects. |
| </p> |
| |
| <p>The UIMA settings and preferences for controlling this are used to initialize a particular instance of the |
| editor, when it is started. Following that, you can override this setting, just for that editor, by checking or |
| unchecking the box shown on the type system page:</p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="72"><tr><td><img src="images/tools/tools.cde/limitJCasGen.jpg" width="72" alt="Setting JCasGen options"></td></tr></table></div> |
| </div> |
| |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If this is checked, and one of the types that would be excluded has merged type features, an error message |
| is issued - because JCasGen will need to be run for the combined (merged) type in order to get a class definition |
| that will work for this configuration (have access to all the features). If this happens, you have to run without |
| limiting JCasGen, and manually delete any duplicated/unwanted source results.</p></div> |
| |
| |
| <p>When <span class="quote">“<span class="quote">Display fully qualified type names</span>”</span> is left unchecked, the |
| namespace of types is not displayed, i.e. if a fully qualified type name is |
| my.namespace.person, only the abbreviated type name person will be displayed. In the |
| Type page diagram shown above, <span class="quote">“<span class="quote">Display fully qualified type names</span>”</span> is |
| in fact unchecked.</p> |
| |
| <p>To add, edit, or remove types the buttons on the top left section are used. When |
| adding or editing types, fully qualified type names should of course be used, |
| regardless of whether the <span class="quote">“<span class="quote">Display fully qualified type names</span>”</span> is |
| unchecked. Removing or editing a type will have a cascading effect in that the type |
| removal/edit will effect inputs, outputs, indexes and type priorities in the natural |
| way.</p> |
| |
| <p>When a type is added, this dialog is shown: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="416"><tr><td><img src="images/tools/tools.cde/image034.jpg" width="416" alt="Adding a type"></td></tr></table></div> |
| </div> |
| |
| <p>Type names should be specified using a namespace. The namespace is like a Java |
| package name, and serves to insure type names are unique. It also serves as the package |
| name for the generated JCas classes. The namespace name is the set of names up to the last |
| period in the string.</p> |
| |
| <p>The supertype must be picked from an existing type. The entry field for the |
| supertype supports Eclipse-style content assist. To use it, put the cursor in the |
| supertype field, and type a letter or two of the supertype name (lower case is fine), |
| either starting with the name space, or just with the type name (without the name space), |
| and hold down the Control key and then press the spacebar. When you do this, you can see a |
| list of suitable matching types. You can then type more letters to narrow down your |
| choices, or pick the right entry with the mouse.</p> |
| |
| <p>To see the available types and pick one, press the Browse button. This will show the |
| available types, and as you type letters for the type name (in lower case – |
| capitalization is ignored), the available types that match are narrowed. When |
| you've typed enough to specify the type you want, press Enter. Or you can use the |
| list of matching type names and pick the one you want with the mouse.</p> |
| |
| <p>Once you've added the type, you can add features to it by highlighting the |
| type, and pressing the Add button.</p> |
| |
| <p>If the type being defined is a subtype of uima.cas.String, the Add button allows you |
| to add allowed values for the string, instead of adding features.</p> |
| |
| <p>To edit a type or feature, you can double click the entry, or highlight the entry and |
| press the Edit button. To delete a type or feature, you highlight the entry to be deleted, |
| and click the delete button or push the delete key.</p> |
| |
| <p>If the range of a feature is an array or one of the built-in list types, an additional |
| specification allows you to specify if multiple references to the object referenced by |
| this feature are allowed. If they are not allowed then the XMI serialization of |
| instances of this type use a more efficient format.</p> |
| |
| <p>If the range of a feature is an array of Feature Structures, then it is possible to |
| specify an element type for the array. This information is used in the XMI serialization |
| and also by the JCas generation routines to generate more efficient code. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="416"><tr><td><img src="images/tools/tools.cde/image036.jpg" width="416" alt="Specifying a Feature Structure"></td></tr></table></div> |
| </div> |
| |
| <p>It is also possible to import type systems for inclusion in your descriptor. To do |
| this, use the Type Import panel's<code class="literal"> Add...</code> button. This |
| allows you to import a type system descriptor.</p> |
| |
| <p>When importing by name, the name is resolved using the class path for the Eclipse |
| project containing the descriptor file being edited, or by looking up this name in the |
| UIMA DataPath. The DataPath can be set by pushing the Set DataPath button. It will be |
| remembered for this Eclipse project, as a project Property, so you only have to set it |
| once (per project). The value of the DataPath setting is written just like a class path, |
| and can include directories or JAR files, just as is true for class paths.</p> |
| |
| <p>The following dialog allows you to pick one or more files from the Eclipse |
| workspace, or one file (at a time) from the file system: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="347"><tr><td><img src="images/tools/tools.cde/import-chooser.jpg" width="347" alt="Picking files for importing"></td></tr></table></div> |
| </div> |
| |
| <p>This is essentially the same dialog as was used to add component engines to an |
| aggregate. To import from a type system descriptor that is not part of your Eclipse |
| workspace, click the Browse the file system.... button.</p> |
| |
| <p>Imported types are validated, and if OK, they are added to the list in the Imported |
| Type Systems section of the Type System page. Any types they define are merged with the |
| existing type system.</p> |
| |
| <p>Imported types and features which are only defined in imports are shown in the Type |
| System section, but in a grayed-out font; these type cannot be edited here. To change |
| them, open up the imported type system descriptor, and change them there.</p> |
| |
| <p>If you hover the mouse over an import specification, it will show more information |
| about the import. If you right-click, it will bring up a context menu that allows opening |
| the imported file in the Editor, if the imported file is part of the Eclipse workspace. |
| Changes you make, however, won't be seen until you close and reopen the editor on |
| the importing file.</p> |
| |
| <p>It is not possible to define types for an aggregate analysis engine. In this case the |
| type system is computed from the component AEs. The Type System information is shown in a |
| grayed-out font.</p> |
| |
| <div class="section" title="1.8.1. Exporting"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.type_system.exporting">1.8.1. Exporting</h3></div></div></div> |
| |
| |
| <p>In addition to importing type specifications, you can export as well. When you |
| push the Export... button, the editor will create a new importable XML descriptor for |
| the types in this type system, and change the existing descriptor to import that newly |
| created one. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="371"><tr><td><img src="images/tools/tools.cde/image040.jpg" width="371" alt="Exporting a type system"></td></tr></table></div> |
| </div> |
| |
| <p>The base file name you type is inserted into the path in the line below |
| automatically. You can change the path where the generated part descriptor is stored |
| by overtyping the lower text box. When you click OK, the new part descriptor will be |
| generated, and the current descriptor will be changed to import that part.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.9. Capabilities Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.capabilities">1.9. Capabilities Page</h2></div></div></div> |
| |
| |
| <p>Capabilities come in <span class="quote">“<span class="quote">sets</span>”</span>. You can have multiple sets of |
| capabilities; each one specifies languages supported, plus inputs and outputs of the |
| Analysis Engine. The idea behind having multiple sets is the concept that different |
| inputs can result in different outputs. Many Analysis Engines, though, will probably |
| define just one set of capabilities. A sample Capabilities page is given below: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="515"><tr><td><img src="images/tools/tools.cde/image042.jpg" width="515" alt="Capabilities page"></td></tr></table></div> |
| </div> |
| |
| <p>When defining the capabilities of a primitive analysis engine, input and output |
| types can be any type defined in the type system. When defining the capabilities of an |
| aggregate the inputs must be a subset of the union of the inputs in the constituent |
| analysis engines and the outputs must be a subset of the union of the outputs of the |
| constituent analysis engines.</p> |
| |
| <p>To add a type, first select something in the set you wish to add the type to, and press |
| Add Type. The following dialog appears presenting the user with a list of types which are |
| candidates for additional inputs: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="436"><tr><td><img src="images/tools/tools.cde/image044.jpg" width="436" alt="Adding a type to the capabilities page"></td></tr></table></div> |
| </div> |
| |
| <p>Follow the instructions to mark the types as input and / or output (a type can be |
| both). By default, the <all features> flag is set to true. If you want to specify a |
| subset of features of a type, read on.</p> |
| |
| <p>When types have features, you can specify what features are input and / or output. A |
| type doesn't have to be an output to have an output feature. For example, an |
| Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to |
| the existing Token types. If no new Token instances were created, it would not be an |
| output Type, but it would have features which are output.</p> |
| |
| <p>To specify features as input and / or output (they can be both), select a type, and |
| press Add. The following dialog box appears: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="396"><tr><td><img src="images/tools/tools.cde/image046.jpg" width="396" alt="Specifying features as input or output"></td></tr></table></div> |
| </div> |
| |
| <p>To mark a feature as being input and / or output, click the mouse in the input and / or |
| output column for the feature. If you select <all features>, it unmarks any |
| individual feature you selected, since <all features> subsumes all the |
| features.</p> |
| |
| <p>The Languages part of the capability is where you specify what languages are |
| supported by the Analysis Engine. Supported languages should be listed using either a |
| two letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter |
| ISO-3166 country code. Add a language by selecting Languages and pressing the Add |
| button. The dialog for adding languages is given below. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="396"><tr><td><img src="images/tools/tools.cde/image048.jpg" width="396" alt="Specifying a language"></td></tr></table></div> |
| </div> |
| |
| <p>The Sofa part of the capability is optional; it allows defining Sofa names that this |
| component uses, and whether they are input (meaning they are created outside of this |
| component, and passed into it), or output (meaning that they are created by this |
| component). Note that a Sofa can be either input or output, but can't be |
| both.</p> |
| |
| <p>To add a Sofa name (which is synonymous with the view name), press the Add Sofa |
| button, and this dialog appears: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="416"><tr><td><img src="images/tools/tools.cde/image050.jpg" width="416" alt="Specifying a Sofa name"></td></tr></table></div> |
| </div> |
| |
| <div class="section" title="1.9.1. Sofa (and view) name mappings"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.capabilities.sofa_name_mapping">1.9.1. Sofa (and view) name mappings</h3></div></div></div> |
| |
| |
| <p>Sofa names, once created, are used in Sofa Mappings. These are optional |
| mappings, done in an aggregate, that specify which Sofas are the same ones but with |
| different names. The Sofa Mappings section is minimized unless you are editing an |
| Aggregate descriptor, and have one or more Sofa names defined for the aggregate. In |
| that case, the Sofa Mappings section will look like this: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="535"><tr><td><img src="images/tools/tools.cde/image052.jpg" width="535" alt="Sofa mappings"></td></tr></table></div> |
| </div> |
| |
| <p>Here the aggregate has defined two input Sofas, named |
| <span class="quote">“<span class="quote">MyInputSofa</span>”</span>, and <span class="quote">“<span class="quote">AnotherSofa</span>”</span>. Any named sofas in |
| the aggregate's capabilities will appear in the Sofa Mapping section, listed |
| either under Inputs or Outputs. Each name in the Mappings has 0 or more delegate |
| (component) sofa names mapped to it. A delegate may have multiple Sofas, as in this |
| example, where the GovernmentOfficialRecognizer delegate has Sofas named |
| <span class="quote">“<span class="quote">so1</span>”</span> and <span class="quote">“<span class="quote">so2</span>”</span>.</p> |
| |
| <p>Delegate components may be written as Single-View components. In this case, |
| they have one implicit, default Sofa (<span class="quote">“<span class="quote">_InitialView</span>”</span>), and to map to |
| it you use the form shown for the <span class="quote">“<span class="quote">NameRecognizer</span>”</span> – you map to |
| the delegate's key name in the aggregate, without specifying a Sofa name. You |
| can also specify the sofa name explicitly, e.g., |
| NameRecognizer/_InitialView.</p> |
| |
| <p>To add a new mapping, select the Aggregate Sofa name you wish to add the mapping |
| for, and press the Add button. This brings up a window like this, showing all available |
| delegates and their Sofas; select one or more (use the normal multi-select methods) |
| of these and press OK to add them. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image054.jpg" width="564" alt="Adding a Sofa mapping"></td></tr></table></div> |
| </div> |
| |
| <p>To edit an existing mapping, select the mapping and press Edit. This will show the |
| existing mapping with all mapped items <span class="quote">“<span class="quote">selected</span>”</span>, and other |
| available items unselected. Change the items selected to match what you want, |
| deselecting some, and perhaps selecting others, and press OK.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.10. Indexes Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.indexes">1.10. Indexes Page</h2></div></div></div> |
| |
| |
| <p>The Indexes page is where the user declares what indexes and type priority lists are |
| used by the analysis engine. Indexes are used to determine which Feature |
| Structures of a particular type are fetched, using an iterator in the UIMA API. An |
| unpopulated Indexes page is displayed below: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="545"><tr><td><img src="images/tools/tools.cde/image056.jpg" width="545" alt="Index page"></td></tr></table></div> |
| </div> |
| |
| <p>Both indexes and type priority lists can have imports. These imports work just like |
| the type system imports, described above. Both indexes and type priority lists can be |
| exported to new component descriptors, using the Export... button, just like the type |
| system export operation described above.</p> |
| |
| <p>The built-in Annotation Index is always present. It is based on the built-in type |
| <code class="literal">uima.tcas.Annotation </code>and has keys begin (Ascending), end |
| (Descending) and TYPE_PRIORITY. There are no built-in type priorities, so this last |
| sort item does not play a role in the index unless type priorities are specified.</p> |
| |
| <p>Type priority may be combined with other keys. Type priorities are defined in the |
| Priority Lists section, using one or more priority list. A given priority list gives an |
| ordering among a group of types. Types that appear higher in the priority list are given |
| higher priority, in other words, they sort first when TYPE_PRIORITY is specified as the |
| index key. Subtypes of these types are also ordered in a consistent manner, unless |
| overridden by another specific type priority specification. To get the ordering used |
| among all the types, all of the type priority lists are merged. This gives a partial |
| ordering among the types. Ties are resolved in an unspecified fashion. The Component |
| Descriptor Editor checks for incompatible orderings, and informs the user if they |
| exist, so they can be corrected.</p> |
| |
| <p>To create a new index, use the Add Index button in the top left section. This brings up |
| this dialog: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="396"><tr><td><img src="images/tools/tools.cde/image058.jpg" width="396" alt="Adding a new index"></td></tr></table></div> |
| </div> |
| |
| <p>Each index needs a globally unique index name. Every index indexes one CAS type (including |
| its subtypes). If you're using Eclipse 3.2 or later, the entry field for this |
| has content assist (start typing the type name |
| and press Control – Spacebar to get help, or press the Browse button to pick a |
| type).</p> |
| |
| <p>Indexes can be sorted, in which case you need to specify one or more keys to sort on. |
| Sort keys are selected from features whose range type is Integer, Float, or String. Some |
| elements will be disabled if they are not relevant. For instance, if the index kind is |
| <span class="quote">“<span class="quote">bag</span>”</span>, you cannot provide sort keys. The order of sort keys can be |
| adjusted using the up and down buttons, if necessary.</p> |
| |
| |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>There is usually no need to explicitly declare a Bag index in your descriptor. |
| As of UIMA v2.1, if you do not declare any index for a type (or any of its |
| supertypes), a Bag index will be automatically created. This index is |
| accessed using the <code class="literal">getAllIndexedFS(...)</code> method defined on the index repository.</p></div> |
| |
| |
| <p>A set index will contain no duplicates of the same type, where a duplicate is defined |
| by the indexing comparator. That is, if you commit two feature structures of the same |
| type that are equal with respect to the indexing comparator, only the first one will be |
| entered into the index. Note that you can still have duplicates with respect to the |
| indexing order, if they are of a different type. A set index is not guaranteed to be |
| sorted. If no keys are specified for a set index, then all instances are considered by |
| default to be equal, so only the first instance (for a particular type or subtype of the |
| type being indexed) is indexed. On the other hand, <span class="quote">“<span class="quote">bag</span>”</span> indicates that |
| all annotation instances are indexed, including duplicates.</p> |
| |
| <p>The Priority Lists section of the Indexes page is used to specify Priority Lists of |
| types. Priority Lists are unnamed ordered sets of type names. Add a new priority list by |
| clicking the Add Set button. Add a type to an existing priority list by first selecting |
| the set, and then clicking Add. You can use the up and down buttons to adjust the order as |
| necessary; these buttons move the selected item up or down.</p> |
| |
| <p>Although it is possible to import self-contained index and type priority files, |
| the creation of such files is not yet supported by the Component Descriptor Editor. If |
| you create these files using another editor, they can be imported using the |
| corresponding Import panels, shown on the right. Imports are specified in the same |
| manner as they are for Type System imports.</p> |
| |
| </div> |
| |
| <div class="section" title="1.11. Resources Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.resources">1.11. Resources Page</h2></div></div></div> |
| |
| |
| <p>The resources page describes resource dependencies (for primitive Analysis |
| Engines) and external Resource specification and their bindings to the resource |
| dependencies.</p> |
| |
| <p>Only primitive Analysis Engines define resource dependencies. Primitive and |
| Aggregate Analysis Engines can define external resources and connect them (bind them) |
| to resource dependencies.</p> |
| |
| <p>When an Aggregate is providing an external resource to be bound to a dependency, the |
| binding is specified using a possibly multi-level path, starting at the Aggregate, and |
| specify which component (by its key name), and then if that component is, in turn, an |
| Aggregate, which component (again by its key name), and so on until you reach a |
| primitive. The sequence of key names is made into the binding specification by joining |
| the parts with a <span class="quote">“<span class="quote">/</span>”</span> character. All of this is done for you by the Component |
| Descriptor Editor.</p> |
| |
| <p>Any external resource provided by an Aggregate will override any binding provided |
| by any lower level component for the same resource dependency.</p> |
| |
| <p>There are two views of the Resources page, depending on whether the Analysis Engine |
| is an Aggregate or Primitive. Here's the view for a Primitive: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="495"><tr><td><img src="images/tools/tools.cde/image060.jpg" width="495" alt="Resources page for a primitive"></td></tr></table></div> |
| </div> |
| |
| <p>To declare a resource dependency, click the Add button in the right hand panel. This |
| puts up the dialog: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="396"><tr><td><img src="images/tools/tools.cde/image062.jpg" width="396" alt="Specifying a resource dependency"></td></tr></table></div> |
| </div> |
| |
| <p>The Key must be unique within the descriptor declaring it. The Interface, if |
| present, is the name of a Java interface the Analysis Engine uses to access the |
| resource.</p> |
| |
| <p>Declare actual External resource on the left side of the page. Clicking |
| <span class="quote">“<span class="quote">Add</span>”</span> brings up this dialog: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="416"><tr><td><img src="images/tools/tools.cde/image064.jpg" width="416" alt="Specifying an External Resource"></td></tr></table></div> |
| </div> |
| |
| <p>The Name must be unique within this Analysis Engine. The URL identifies a file |
| resource. If both the URL and URL suffix are used, the file resource is formed by |
| combining the first URL part with the language-identifier, followed by the URL suffix; |
| see <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration" class="olink">Section 2.4.1.9, “Resource Manager Configuration”</a> |
| . URLs may be written as <span class="quote">“<span class="quote">relative</span>”</span> URLs; in this case they are resolved by |
| looking them up relative to the classpath and/or datapath. A relative URL has the path |
| part starting without an intial <span class="quote">“<span class="quote">/</span>”</span>; for example: |
| file:my/directory/file. An absolute URL starts with file:/ or file:/// or |
| file://some.network.address/. For more information about URLs, please read the |
| javaDoc information for the Java class <span class="quote">“<span class="quote">URL</span>”</span>.</p> |
| |
| <p>The Implementation is optional, and if given, must be a Java class that implements |
| the interface specified in any Resource Dependencies this resource is bound |
| to.</p> |
| |
| <div class="section" title="1.11.1. Binding"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.resources.binding">1.11.1. Binding</h3></div></div></div> |
| |
| |
| <p>Once you have an external resource definition, and a Resource Dependency, you |
| can bind them together. To do this, you select the two things (an external resource |
| definition, and a Resource Dependency) that you want to bind together, and click |
| Bind.</p> |
| |
| </div> |
| |
| <div class="section" title="1.11.2. Resources with Aggregates"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.resources.aggregates">1.11.2. Resources with Aggregates</h3></div></div></div> |
| |
| |
| |
| <p>When editing an Aggregate Descriptor, the Resource definitions panel will show |
| all the resources at the primitive level, with paths down through the components |
| (multiple levels, if needed) to get to the primitives. The Aggregate can define |
| external resources, and bind them to one or more uses by the primitives.</p> |
| |
| </div> |
| |
| <div class="section" title="1.11.3. Imports and Exports"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.resources.imports_exports">1.11.3. Imports and Exports</h3></div></div></div> |
| |
| |
| <p>Resource definitions and their bindings can be imported, just like other |
| imports. Existing Resource definitions and their bindings can be exported to a new |
| importable part, and replaced with an import for that importable part, using the |
| <span class="quote">“<span class="quote">Export...</span>”</span> button, just like the similar function on the Type System |
| page.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.12. Source Page"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.source">1.12. Source Page</h2></div></div></div> |
| |
| |
| <p>The Source page is a text view of the xml content of the Analysis Engine or Type System |
| being configured. An example of this page is displayed below: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.cde/image066.jpg" width="564" alt="Source page"></td></tr></table></div> |
| </div> |
| |
| <p>Changes made in the GUI are immediately reflected in the xml source, and changes |
| made in the xml source are immediately reflected back in the GUI. The thought here is that |
| the GUI view and the Source view are just two ways of looking at the same data. When the data |
| is in an unsaved state the file name is prefaced with an asterisk in the currently |
| selected file tab in the editor pane inside Eclipse (as in the example above).</p> |
| |
| <p>You may accidentally create invalid descriptors or XML by editing directly in the |
| Source view. If you do this, when you try and save or when you switch to a different view, |
| the error will be detected and reported. In the case of saving, the file will be saved, |
| even if it is in an error state.</p> |
| |
| <div class="section" title="1.12.1. Source formatting – indentation"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cde.source.formatting">1.12.1. Source formatting – indentation</h3></div></div></div> |
| |
| |
| <p>The XML is indented using an indentation amount saved as a global UIMA |
| preference. To change this preference, use the Eclipse menu item: Windows <span class="symbol">→</span> |
| Preferences <span class="symbol">→</span> UIMA Preferences.</p> |
| |
| </div> |
| </div> |
| |
| <div class="section" title="1.13. Creating a Self-Contained Type System"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.creating_self_contained_type_system">1.13. Creating a Self-Contained Type System</h2></div></div></div> |
| |
| |
| <p>It is also possible to use the Component Descriptor Editor to create or edit |
| self-contained type systems. To create a self-contained type system, select the menu |
| item File <span class="symbol">→</span> New <span class="symbol">→</span> Other and then select Type System Descriptor File. From the |
| next page of the selection wizard specify a Parent Folder and File name and click Finish. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="347"><tr><td><img src="images/tools/tools.cde/image068.jpg" width="347" alt="Working with a self-contained type system"></td></tr></table></div> |
| </div><p> |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="347"><tr><td><img src="images/tools/tools.cde/image070.jpg" width="347"></td></tr></table></div> |
| </div> |
| |
| <p>This will take you to a version of the Component Descriptor Editor for editing a type |
| system file which contains just three pages: an overview page, a type system page, and a |
| source page. The overview page is a bit more spartan than in the case of an AE. It looks like |
| the following: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="366"><tr><td><img src="images/tools/tools.cde/image072.jpg" width="366" alt="Editing a type system object"></td></tr></table></div> |
| </div> |
| |
| <p>Just like an AE has an associated name, version, vendor and description, the same is |
| true of a self-contained type system. The Type System page is identical to that in an AE |
| descriptor file, as is the Source page. Note that a self-contained type system can |
| import type systems just like the type system associated with an AE.</p> |
| |
| <p>A type system component can also be created from an existing descriptor which |
| contains a type system definition section, by clicking on the Export... button on the |
| Type System page.</p> |
| |
| </div> |
| |
| <div class="section" title="1.14. Creating Other Descriptor Components"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cde.creating_other_descriptor_components">1.14. Creating Other Descriptor Components</h2></div></div></div> |
| |
| |
| <p>The new wizard can create several other kinds of components: Collection |
| Processing Management (CPM) components, flow controllers, and importable parts |
| (besides Type Systems, described above, Indexes, Type Priorities, and Resource |
| Manager Configuration imports).</p> |
| |
| <p>The CPM components supported by this editor include the Collection Reader, CAS |
| Initializer, and CAS Consumer descriptors. Each of these is basically treated just |
| like a primitive AE descriptor, with small changes to accommodate the different |
| semantics. For instance, a CAS Consumer can't declare in its capabilities |
| section that it outputs types or features.</p> |
| |
| <p>Flow controllers are components that control the flow of CASes within an |
| aggregate, an are edited in a similar fashion as a primitive Analysis Engine.</p> |
| |
| <p>The importable part support requires context information to enable the editor to |
| work, because much of the power of this editor comes from extensive checking that |
| requires additional information, other than what is available in just the importable |
| part. For instance, when you create or edit an Indexes import, the facility for adding |
| new indexes needs the type information, which is not present in this part when it is |
| edited alone. </p> |
| |
| <p>To overcome this, when you edit these descriptors, you will be asked to |
| specify a context descriptor, usually a descriptor which would import the part being |
| edited, which would have the additional information needed. </p> |
| |
| <p>Various methods are used |
| to guess what the context descriptor should be - and if the guess is correct, you can just |
| press the Enter key to confirm. The last successful context file is remembered and will |
| be suggested as the context file to use at the next edit session</p> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 2. Collection Processing Engine Configurator User's Guide" id="ugr.tools.cpe"><div class="titlepage"><div><div><h2 class="title">Chapter 2. Collection Processing Engine Configurator User's Guide</h2></div></div></div> |
| |
| |
| |
| <p>A <span class="emphasis"><em>Collection Processing Engine (CPE)</em></span> processes |
| collections of artifacts (documents) through the combination of the following |
| components: a Collection Reader, Analysis Engines, and CAS Consumers. |
| <sup>[<a name="d5e597" href="#ftn.d5e597" class="footnote">1</a>]</sup> |
| </p> |
| |
| <p>The <span class="emphasis"><em>Collection Processing Engine Configurator(CPE |
| Configurator)</em></span> is a graphical tool that allows you to assemble and run |
| CPEs.</p> |
| |
| <p>For an introduction to Collection Processing Engine concepts, including |
| developing the components that make up a CPE, read <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> |
| <a href="tutorials_and_users_guides.html#ugr.tug.cpe" class="olink">Chapter 2, <i>Collection Processing Engine Developer's Guide</i></a>. This |
| chapter is a user's guide for using the CPE Configurator tool, and does not describe |
| UIMA's Collection Processing Architecture itself.</p> |
| |
| <div class="section" title="2.1. Limitations of the CPE Configurator"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.limitations">2.1. Limitations of the CPE Configurator</h2></div></div></div> |
| |
| |
| <p>The CPE Configurator only supports basic CPE configurations.</p> |
| |
| <p>It only supports <span class="quote">“<span class="quote">Integrated</span>”</span> deployments (although it will |
| connect to remotes if particular CAS Processors are specified with remote service |
| descriptors). It doesn't support configuration of the error handling. It |
| doesn't support Sofa Mappings; it assumes all Single-View components are |
| operating with the _InitialView Sofa. Multi-View components will not have their names |
| mapped. It sets up a fixed-sized CAS Pool.</p> |
| |
| <p>To set these additional options, you must edit the CPE Descriptor XML file |
| directly. See <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter 3, <i>Collection Processing Engine Descriptor Reference</i></a> for the syntax. |
| You may then open the CPE Descriptor in the CPE Configurator and run it. The changes |
| you applied to the CPE Descriptor <span class="emphasis"><em>will</em></span> be respected, although you |
| will not be able to see them or edit them from the GUI. |
| </p> |
| </div> |
| |
| <div class="section" title="2.2. Starting the CPE Configurator"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.starting">2.2. Starting the CPE Configurator</h2></div></div></div> |
| |
| |
| <p>The CPE Configurator tool can be run using the <code class="literal">cpeGui</code> shell |
| script, which is located in the <code class="literal">bin</code> directory of the UIMA SDK. If |
| you've installed the example Eclipse project (see <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview & SDK Setup</a> |
| <a href="overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section 3.2, “Setting up Eclipse to view Example Code”</a>, you can also run it using the |
| <span class="quote">“<span class="quote">UIMA CPE GUI</span>”</span> run configuration provided in that project.</p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you are planning to build a CPE using components other than the examples |
| included in the UIMA SDK, you will first need to update your CLASSPATH environment |
| variable to include the classes needed by these components.</p></div> |
| |
| <p>When you first start the CPE Configurator, you will see the main window shown here: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.cpe/image002.jpg" width="574" alt="CPE Configurator main GUI window"></td></tr></table></div> |
| </div> |
| </div> |
| |
| <div class="section" title="2.3. Selecting Component Descriptors"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.selecting_component_descriptors">2.3. Selecting Component Descriptors</h2></div></div></div> |
| |
| |
| <p>The CPE Configurator's main window is divided into three sections, one each for the Collection |
| Reader, Analysis Engines, and CAS Consumers.<sup>[<a name="d5e633" href="#ftn.d5e633" class="footnote">2</a>]</sup></p> |
| |
| <p>In each section of the CPE Configurator, you can select the component(s) you want to use by browsing to (or |
| typing the location of) their XML descriptors. You must select a Collection Reader, and at least one Analysis |
| Engine or CAS Consumer.</p> |
| |
| <p>When you select a descriptor, the configuration parameters that are defined in that descriptor will then |
| be displayed in the GUI; these can be modified to override the values present in the descriptor.</p> |
| |
| <p>For example, the screen shot below shows the CPE Configurator after the following components have been |
| chosen: |
| |
| |
| </p><pre class="programlisting">examples/descriptors/collectionReader/FileSystemCollectionReader.xml |
| examples/descriptors/analysis_engine/NamesAndPersonTitles_TAE.xml |
| examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</pre> |
| |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.cpe/image004.jpg" width="574" alt="CPE Configurator after components chosen"></td></tr></table></div> |
| </div> |
| |
| </div> |
| |
| <div class="section" title="2.4. Running a Collection Processing Engine"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.running">2.4. Running a Collection Processing Engine</h2></div></div></div> |
| |
| |
| <p>After selecting each of the components and providing configuration settings, |
| click the play (forward arrow) button at the bottom of the screen to begin processing. A |
| progress bar should be displayed in the lower left corner. (Note that the progress bar |
| will not begin to move until all components have completed their initialization, which |
| may take several seconds.) Once processing has begun, the pause and stop buttons become |
| enabled.</p> |
| |
| <p>If an error occurs, you will be informed by an error dialog. If processing completes |
| successfully, you will be presented with a performance report.</p> |
| |
| </div> |
| |
| <div class="section" title="2.5. The File Menu"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.file_menu">2.5. The File Menu</h2></div></div></div> |
| |
| |
| <p>The CPE Configurator's File Menu has the following options:</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Open CPE Descriptor</p></li><li class="listitem"><p>Save CPE Descriptor</p></li><li class="listitem"><p>Save Options (submenu)</p></li><li class="listitem"><p>Refresh Descriptors from File System</p></li><li class="listitem"><p>Clear All</p></li><li class="listitem"><p>Exit </p></li></ul></div> |
| |
| <p><span class="bold"><strong>Open CPE Descriptor</strong></span> will allow you to select a |
| CPE Descriptor file from disk, and will read in that CPE Descriptor and configure the GUI |
| appropriately.</p> |
| |
| <p><span class="bold"><strong>Save CPE Descriptor</strong></span> will create a CPE |
| Descriptor file that defines the CPE you have constructed. This CPE Descriptor will |
| identify the components that constitute the CPE, as well as the configuration settings |
| you have specified for each of these components. Later, you can use <span class="quote">“<span class="quote">Open CPE |
| Descriptor</span>”</span> to restore the CPE Configurator to the state. Also, CPE |
| Descriptors can be used to easily run a CPE from a Java program – see |
| <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> <a href="tutorials_and_users_guides.html#ugr.tug.application.running_a_cpe_from_a_descriptor" class="olink">Section 3.3.1, “Running a CPE from a Descriptor”</a> |
| .</p> |
| |
| <p>CPE Descriptors also allow specifying operational parameters, such as error |
| handling options that are not currently available for configuration through the CPE |
| Configurator. For more information on manually creating a CPE Descriptor, see |
| <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.xml.cpe_descriptor" class="olink">Chapter 3, <i>Collection Processing Engine Descriptor Reference</i></a>. |
| </p> |
| |
| <p>The <span class="bold"><strong>Save Options</strong></span> submenu has one item, |
| "Use <import>". If this item is checked (the default), saved CPE descriptors |
| will use the <code class="literal"><import></code> syntax to refer to their component |
| descriptors. If unchecked, the older <code class="literal"><include></code> syntax will |
| be used for new components that you add to your CPE using the GUI. (However, if you |
| open a CPE descriptor that used <import>, these imports will not be replaced.) |
| </p> |
| |
| <p><span class="bold"><strong>Refresh Descriptors from File System</strong></span> will |
| reload all descriptors from disk. This is useful if you have made a change to the |
| descriptor outside of the CPE Configurator, and want to refresh the display.</p> |
| |
| <p><span class="bold"><strong>Clear All</strong></span> will reset the CPE Configurator to |
| its initial state, with no components selected.</p> |
| |
| <p><span class="bold"><strong>Exit</strong></span> will close the CPE Configurator. If you |
| have unsaved changes, you will be prompted as to whether you would like to save them to a |
| CPE Descriptor file. If you do not save them, they will be lost.</p> |
| |
| <p>When you restart the CPE Configurator, it will automatically reload the last CPE |
| descriptor file that you were working with.</p> |
| |
| </div> |
| |
| <div class="section" title="2.6. The Help Menu"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cpe.help_menu">2.6. The Help Menu</h2></div></div></div> |
| |
| |
| <p>The CPE Configurator's Help menu provides <span class="quote">“<span class="quote">About</span>”</span> |
| information and some very simple instructions on how to use the tool.</p> |
| |
| </div> |
| <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e597" href="#d5e597" class="para">1</a>] </sup>Earlier versions of UIMA supported another component, the CAS |
| Initializer, but this component is now deprecated in UIMA Version 2.</p></div><div class="footnote"> |
| <p><sup>[<a id="ftn.d5e633" href="#d5e633" class="para">2</a>] </sup>There is also a fourth pane, for the CAS Initializer, but it is hidden by default. To enable it click the |
| <code class="literal">View <span class="symbol">→</span> CAS Initializer Panel</code> menu item.</p></div></div></div> |
| |
| <div class="chapter" title="Chapter 3. Document Analyzer User's Guide" id="ugr.tools.doc_analyzer"><div class="titlepage"><div><div><h2 class="title">Chapter 3. Document Analyzer User's Guide</h2></div></div></div> |
| |
| |
| |
| <p>The <span class="emphasis"><em>Document Analyzer</em></span> is a tool provided by the |
| UIMA SDK for testing annotators and AEs. It reads text files from your disk, processes them using an AE, and |
| allows you to view the results. The |
| Document Analyzer is designed to work with text files and cannot be used with |
| Analysis Engines that process other types of data.</p> |
| |
| <p>For an introduction to developing annotators and Analysis |
| Engines, read <a href="tutorials_and_users_guides.html#d5e1" class="olink">UIMA Tutorial and Developers' Guides</a> |
| <a href="tutorials_and_users_guides.html#ugr.tug.aae" class="olink">Chapter 1, <i>Annotator and Analysis Engine Developer's Guide</i></a>. |
| This chapter is a user's guide for using the Document Analyzer tool, and |
| does not describe the process of developing annotators and Analysis Engines.</p> |
| |
| <div class="section" title="3.1. Starting the Document Analyzer"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.starting">3.1. Starting the Document Analyzer</h2></div></div></div> |
| |
| |
| <p>To run the Document Analyzer, execute the <code class="literal">documentAnalyzer</code> script that is in the <code class="literal">bin</code> directory of your UIMA SDK installation, or, if you |
| are using the example Eclipse project, execute the <span class="quote">“<span class="quote">UIMA Document Analyzer</span>”</span> |
| run configuration supplied with that project.</p> |
| |
| <p>Note that if you're planning to run an Analysis Engine |
| other than one of the examples included in the UIMA SDK, you'll first need to |
| update your CLASSPATH environment variable to include the classes needed by |
| that Analysis Engine.</p> |
| |
| <p>When you first run the Document Analyzer, you should see a |
| screen that looks like this: |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/DocAnalyzerScr1.png" width="574" alt="Document Analyzer GUI"></td></tr></table></div> |
| </div> |
| |
| |
| </div> |
| |
| <div class="section" title="3.2. Running an AE"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.running_an_ae">3.2. Running an AE</h2></div></div></div> |
| |
| |
| |
| |
| <p>To run a AE, you must first configure the six fields on |
| the main screen of the Document Analyzer.</p> |
| |
| <p><span class="bold"><strong>Input Directory:</strong></span> |
| Browse to or type the path of a directory containing text files that you |
| want to analyze. Some sample documents |
| are provided in the UIMA SDK under the <code class="literal">examples/data</code> |
| directory.</p> |
| |
| <p><span class="bold"><strong>Input File Format:</strong></span> Set this to "text". It can, alternatively, |
| be set to one of the two serialized forms for CASes, if you have previously generated and saved these. |
| For the CAS formats only, you can also specify "Lenient deserialization"; if checked, then extra |
| types and features in the CAS being deserialized and loaded (that are not defined by the Annotator-to-be-run's |
| type system) will not cause a deserialization error, but will instead be ignored.</p> |
| |
| <p><span class="bold"><strong>Character Encoding:</strong></span> |
| The character encoding of the input files. The default, UTF-8, also works fine for ASCII |
| text files. If you have a different |
| encoding, select it here. For more information on character sets and their names, see the Javadocs for |
| <code class="literal">java.nio.charset.Charset</code>.</p> |
| |
| <p><span class="bold"><strong>Output Directory:</strong></span> Browse to or type the path of a directory where you want |
| output to be written. (As we'll see later, you won't normally need to look directly at these files, but the |
| Document Analyzer needs to know where to write them.) The files written to this directory will be an XML |
| representation of the analyzed documents. If this directory doesn't exist, it will be created. If the |
| directory exists, any files in it will be deleted (but the tool will ask you to confirm this before doing so). If you |
| leave this field blank, your AE will be run but no output will be generated.</p> |
| |
| <p><span class="bold"><strong>Location of AE XML Descriptor:</strong></span> |
| Browse to or type the path of the descriptor |
| for the AE that you want to run. There |
| are some example descriptors provided in the UIMA SDK under the <code class="literal">examples/descriptors/analysis_engine</code> and <code class="literal">examples/descriptors/tutorial</code> directories.</p> |
| |
| <p><span class="bold"><strong>XML Tag containing Text:</strong></span> |
| This is an optional feature. If you enter a value here, it specifies the |
| name of an XML tag, expected to be found within the input documents, that |
| contains the text to be analyzed. For |
| example, the value <code class="literal">TEXT</code> would cause the AE to only |
| analyze the portion of the document enclosed within <TEXT>...</TEXT> |
| tags. Also, any XML tags occuring within that text will be removed prior to analysis.</p> |
| |
| <p><span class="bold"><strong>Language:</strong></span> |
| Specify |
| the language in which the documents are written. Some Analysis Engines, but not all, require |
| that this be set correctly in order to do their analysis. You can select a value from the drop-down |
| list or type your own. The value entered |
| here must be an ISO language identifier, the list of which can be found here: |
| <a class="ulink" href="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt" target="_top">http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt</a>. |
| </p> |
| |
| |
| <p>Once you've filled in the appropriate values, press the |
| <span class="quote">“<span class="quote">Run</span>”</span> button.</p> |
| |
| <p>If an error occurs, a dialog will appear with the error |
| message. (A stack trace will also be |
| printed to the console, which may help you if the error was generated by your |
| own annotator code.) Otherwise, an |
| <span class="quote">“<span class="quote">Analysis Results</span>”</span> window will appear.</p> |
| |
| |
| |
| </div> |
| |
| <div class="section" title="3.3. Viewing the Analysis Results"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.viewing_results">3.3. Viewing the Analysis Results</h2></div></div></div> |
| |
| |
| <p>After a successful analysis, the <span class="quote">“<span class="quote">Analysis |
| Results</span>”</span> window will appear. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="416"><tr><td><img src="images/tools/tools.doc_analyzer/image004.jpg" width="416" alt="Analysis Results Window"></td></tr></table></div> |
| </div> |
| |
| |
| <p>The <span class="quote">“<span class="quote">Results Display Format</span>”</span> options at the |
| bottom of this window show the different ways you can view your analysis – the |
| Java Viewer, Java Viewer (JV) with User Colors, HTML, and XML. |
| The default, Java Viewer, is recommended.</p> |
| |
| <p>Once you have selected your desired Results Display |
| Format, you can double-click on one of the files in the list to view the |
| analysis done on that file.</p> |
| |
| <p>For the Java viewer, two different view modes are supported, each represented by one of two |
| radio buttons titled "Annnotations", and "Features":</p> |
| |
| <p>In the "Annotations" view, each annotation which is declared to be an output of the pipeline |
| (in the top most Annotator Descriptor) is given a checkbox and a color, in the bottom panel. You can control which |
| annotations are shown by using the checkboxes in the bottom panel, the Select All button, |
| or the Deselet All button. The results display looks like this (for the AE descriptor |
| <code class="literal">examples/descriptors/tutorial/ex4/MeetingDetectorTAE.xml</code>): |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/image006v2.png" width="574" alt="Analysis Results Window showing results from tutorial example 4 in Annotations view mode"></td></tr></table></div> |
| </div> |
| |
| <p>You can click the mouse on one of the highlighted |
| annotations to see a list of all its features in the frame on the right.</p> |
| |
| |
| |
| <p>In the "Features" view, you can specify a combination of a single type, a single feature of that type, and some feature values for that feature. |
| The annotations whose feature values match will be highlighted. Step by step, you first select a specific type of annotations by using |
| a radio button in the first tab of the legend. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/image007-1v2.png" width="574" alt="Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the DateAnnotation type."></td></tr></table></div> |
| </div> |
| |
| <p>Selecting this automatically transitions to the second tab, where you then select a specific feature |
| of the annotation type. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/image007-2v2.png" width="574" alt="Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting the shortDateString feature."></td></tr></table></div> |
| </div> |
| |
| <p>Selecting this again automatically transitions you to the thrid tab, where you select some specific feature |
| values in the third tab of the legend. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/image007-3v2.png" width="574" alt="Analysis Results Window showing results from tutorial example 4 in Features view mode by selecting individual shortDateString feature values."></td></tr></table></div> |
| </div> |
| |
| <p>In each of the above two view modes, you can click the mouse on one of the highlighted |
| annotations to see a list of all its features in the frame on the right.</p> |
| |
| <p>If you are viewing a CAS that contains multiple subjects |
| of analysis, then a selector will appear at the bottom right of the Annotation |
| Viewer window. This will allow you to |
| choose the Sofa that you wish to view. Note that only text Sofas containing a non-null document are available |
| for viewing.</p> |
| |
| </div> |
| |
| <div class="section" title="3.4. Configuring the Annotation Viewer"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.configuring">3.4. Configuring the Annotation Viewer</h2></div></div></div> |
| |
| |
| <p>The <span class="quote">“<span class="quote">JV User Colors</span>”</span> and the HTML viewer allow |
| you to specify exactly which colors are used to display each of your annotation |
| types. For the Java Viewer, you can also |
| specify which types should be initially selected, and you can hide types |
| entirely.</p> |
| |
| <p>To configure the viewer, click the <span class="quote">“<span class="quote">Edit Style |
| Map</span>”</span> button on the <span class="quote">“<span class="quote">Analysis Results</span>”</span> dialog. |
| You should see a dialog that looks like this: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.doc_analyzer/image008.jpg" width="574" alt="Configuring the Analysis Results Viewer"></td></tr></table></div> |
| </div> |
| |
| <p>To change the color assigned to a type, simply click on |
| the colored cell in the <span class="quote">“<span class="quote">Background</span>”</span> column for the type you wish to |
| edit. This will display a dialog that |
| allows you to choose the color. For the |
| HTML viewer only, you can also change the foreground color.</p> |
| |
| <p>If you would like the type to be initially checked |
| (selected) in the legend when the viewer is first launched, check the box in |
| the <span class="quote">“<span class="quote">Checked</span>”</span> column. If you |
| would like the type to never be shown in the viewer, click the box in the |
| <span class="quote">“<span class="quote">Hidden</span>”</span> column. These |
| settings only affect the Java Viewer, not the HTML view.</p> |
| |
| <p>When you are done editing, click the <span class="quote">“<span class="quote">Save</span>”</span> |
| button. This will save your choices to a |
| file in the same directory as your AE descriptor. From now on, when you view analysis results |
| produced by this AE using the <span class="quote">“<span class="quote">JV User Colors</span>”</span> or <span class="quote">“<span class="quote">HTML</span>”</span> |
| options, the viewer will be configured as you have specified.</p> |
| |
| </div> |
| |
| <div class="section" title="3.5. Interactive Mode"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.interactive_mode">3.5. Interactive Mode</h2></div></div></div> |
| |
| |
| |
| <p>Interactive Mode allows you to analyze text that you type |
| or cut-and-paste into the tool, rather than requiring that the documents be |
| stored as files.</p> |
| |
| <p>In the main Document Analyzer window, you can invoke |
| Interactive Mode by clicking the <span class="quote">“<span class="quote">Interactive</span>”</span> button instead of the |
| <span class="quote">“<span class="quote">Run</span>”</span> button. This will |
| display a dialog that looks like this: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="545"><tr><td><img src="images/tools/tools.doc_analyzer/image010.jpg" width="545" alt="Invoking Interactive Mode"></td></tr></table></div> |
| </div> |
| |
| <p>You can type or cut-and-paste your text into this window, |
| then choose your Results Display Format and click the <span class="quote">“<span class="quote">Analyze</span>”</span> |
| button. Your AE will be run on the text |
| that you supplied and the results will be displayed as usual.</p> |
| |
| |
| </div> |
| |
| <div class="section" title="3.6. View Mode"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.doc_analyzer.view_mode">3.6. View Mode</h2></div></div></div> |
| |
| |
| <p>If you have previously run a AE and saved its analysis |
| results, you can use the Document Analyzer's View mode to view those results, |
| without re-running your analysis. To do |
| this, on the main Document Analyzer window simply select the location of your |
| analyzed documents in the <span class="quote">“<span class="quote">Output Directory</span>”</span> dialog and click the |
| <span class="quote">“<span class="quote">View</span>”</span> button. You can then |
| view your analysis results as described in Section |
| <a class="xref" href="#ugr.tools.doc_analyzer.viewing_results" title="3.3. Viewing the Analysis Results">Section 3.3, “Viewing the Analysis Results”</a>.</p> |
| |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 4. Annotation Viewer" id="ugr.tools.annotation_viewer"><div class="titlepage"><div><div><h2 class="title">Chapter 4. Annotation Viewer</h2></div></div></div> |
| |
| |
| <p>The <span class="emphasis"><em>Annotation Viewer</em></span> is a tool for viewing analysis results |
| that have been saved to your disk as <span class="emphasis"><em>external XML representations of the |
| CAS</em></span>. These are saved in a particular format called XMI. In the UIMA SDK, XML |
| versions of CASes can be generated by:</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Running the Document Analyzer (see <a href="tools.html#ugr.tools.doc_analyzer" class="olink">Chapter 3, <i>Document Analyzer User's Guide</i></a>, which |
| saves an XML representations of the CAS to the specified output directory.</p> |
| </li><li class="listitem"><p>Running a Collection Processing Engine that includes the |
| <span class="emphasis"><em>XMI Writer </em></span>CAS Consumer |
| (<code class="literal">examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml)</code>. |
| </p></li><li class="listitem"><p>Explicitly creating XML representations of the CAS from your own |
| application using the org.apache.uima.cas.impl.XMISerializer class. The best way |
| to learn how to do this is to look at the example code for the XMI Writer CAS Consumer, |
| located in |
| <code class="literal">examples/src/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java</code>. |
| <sup>[<a name="d5e844" href="#ftn.d5e844" class="footnote">3</a>]</sup> </p></li></ul></div> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The Annotation Viewer only shows CAS views where the Sofa data type is a String. |
| </p></div> |
| |
| <p>You can run the Annotation Viewer by executing the |
| <code class="literal">annotationViewer</code> shell script located in the bin directory of the |
| UIMA SDK or the "UIMA Annotation Viewer" Eclipse run configuration in the |
| <code class="literal">uimaj-examples</code> project. This will open the following window: |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.annotation_viewer/image002.jpg" width="574" alt="Screenshot of annotationViewer"></td></tr></table></div> |
| </div> |
| |
| <p>Select an input directory (which must contain XMI files), and the descriptor for the |
| AE that produced the Analysis (which is needed to get the type system for the analysis). |
| Then press the <span class="quote">“<span class="quote">View</span>”</span> button.</p> |
| |
| <p>This will bring up a dialog where you can select a viewing format and double-click on a |
| document to view it. This dialog is the same as the one that is described in <a href="tools.html#ugr.tools.doc_analyzer.viewing_results" class="olink">Section 3.3, “Viewing the Analysis Results”</a>.</p> |
| <div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d5e844" href="#d5e844" class="para">3</a>] </sup>An older form of a different XML format for the CAS is also provided |
| mainly for backwards compatibility. This form is called XCAS, and you can see examples |
| of its use in |
| <code class="literal">examples/src/org/apache/uima/examples/cpe/XCasWriterCasConsumer.java</code>. |
| </p></div></div></div> |
| <div class="chapter" title="Chapter 5. CAS Visual Debugger" id="ugr.tools.cvd"><div class="titlepage"><div><div><h2 class="title">Chapter 5. CAS Visual Debugger</h2></div></div></div> |
| |
| <div class="section" title="5.1. Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cvd.introduction">5.1. Introduction</h2></div></div></div> |
| |
| <p> |
| The CAS Visual Debugger is a tool to run text analysis engines in UIMA |
| and view the results. The tool is implemented as a stand-alone GUI |
| tool using Java's Swing library. |
| </p> |
| <p> |
| This is a developer's tool. It is intended to support you in writing |
| text analysis annotators for UIMA (Unstructured Information Management |
| Architecture). As a development tool, the emphasis is not so much on |
| pretty pictures, but rather on navigability. It is intended to show |
| you all the information you need, and show it to you quickly (at least |
| on a fast machine ;-). |
| </p> |
| <p> |
| The main purpose of this application is to let you browse all the data |
| that was created when you ran an analysis engine over some text. The |
| display mimics the access methods you have in the CAS API in terms of |
| indexes, types, feature structures and feature values. |
| </p> |
| <p> |
| As in the CAS, there is special support for annotations. Clicking on |
| an annotation will select the corresponding text, and conversely, you |
| can display all annotations that cover a given position in the text. |
| This will be explained in more detail in the section on the main |
| display area. |
| </p> |
| <p> |
| As usual, the graphics in this manual are for illustrative purposes |
| and may not look 100% like the actual version of CVD you are running. |
| This depends on your operating system, your version of Java, and a |
| variety of other factors. |
| </p> |
| <div class="section" title="5.1.1. Running CVD"><div class="titlepage"><div><div><h3 class="title" id="ugr.cvd.introduction.running">5.1.1. Running CVD</h3></div></div></div> |
| |
| <p> |
| You will usually want to start CVD from the command line, or from Eclipse. To start CVD from the |
| command line, you minimally need the uima-core and uima-tools jars. Below is a sample command |
| line for sh and its offspring. |
| </p><pre class="programlisting">java -cp ${UIMA_HOME}/lib/uima-core.jar:${UIMA_HOME}/lib/uima-tools.jar |
| org.apache.uima.tools.cvd.CVD</pre><p> |
| However, there is no need to type this. The ${UIMA_HOME}/bin directory contains a cvd.sh and |
| cvd.bat file for Unix/Linux/MacOS and Windows, respectively. |
| </p> |
| <p> |
| In Eclipse, you have a ready to use launch configuration available when you have installed the |
| UIMA sample project (see <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview & SDK Setup</a> <a href="overview_and_setup.html#ugr.ovv.eclipse_setup.example_code" class="olink">Section 3.2, “Setting up Eclipse to view Example Code”</a>). Below is a screenshot of the the Eclipse Run |
| dialog with the CVD |
| run configuration selected. |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/eclipse-cvd-launch.jpg" width="459" alt="Eclipse run dialog with CVD selected"></div> |
| </div><p> |
| </p> |
| </div> |
| |
| <div class="section" title="5.1.2. Command line parameters"><div class="titlepage"><div><div><h3 class="title" id="cvd.introduction.commandline">5.1.2. Command line parameters</h3></div></div></div> |
| |
| <p> |
| You can provide some command line parameters to influence the startup behavior of CVD. For |
| example, if you want to run a certain analysis engine on a certain text over and over again |
| (for debugging, say), you can make CVD load the annotator and text at startup and execute |
| the annotator. Here's a list of the supported command line options. |
| </p> |
| |
| <div class="table"><a name="cvd.table.commandline"></a><p class="title"><b>Table 5.1. Command line options</b></p><div class="table-contents"> |
| |
| <table summary="Command line options" style="border: none;"><colgroup><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Option</th><th style="border-bottom: 0.5pt solid black; ">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">-text <textFile></code> |
| </td><td style="border-bottom: 0.5pt solid black; ">Loads the text file <code class="computeroutput"><textFile></code></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">-desc <descriptorFile></code> |
| </td><td style="border-bottom: 0.5pt solid black; ">Loads the descriptor <code class="computeroutput"><descriptorFile></code></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">-exec</code> |
| </td><td style="border-bottom: 0.5pt solid black; ">Runs the pre-loaded annotator; only allowed in conjunction with <code class="computeroutput">-desc</code> </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">-datapath <datapath></code> |
| </td><td style="border-bottom: 0.5pt solid black; ">Sets the data path to <code class="computeroutput"><datapath></code></td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">-ini <iniFile></code> |
| </td><td style="border-bottom: 0.5pt solid black; ">Makes CVD use alternative ini file <code class="computeroutput"><textFile></code> (default is ~/annotViewer.pref)</td></tr><tr><td style="border-right: 0.5pt solid black; "> |
| <code class="computeroutput">-lookandfeel <lnfClass></code> |
| </td><td style="">Uses alternative look-and-feel <code class="computeroutput"><lnfClass></code></td></tr></tbody></table> |
| </div></div><br class="table-break"> |
| |
| </div> |
| |
| </div> |
| <div class="section" title="5.2. Error Handling"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="cvd.errorHandling">5.2. Error Handling</h2></div></div></div> |
| |
| <p> |
| On encountering |
| an error, CVD will pop up an error dialog with a short, |
| usually incomprehensible message. Often, the error message will |
| claim that there is more information available in the log file, and |
| sometimes, this is actually true; so do go and check the log. You |
| can view the log file by selecting the appropriate item in the |
| "Tools" menu. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/ErrorExample.jpg" alt="Sample error dialog"></div> |
| </div><p> |
| |
| </p> |
| </div> |
| |
| <div class="section" title="5.3. Preferences File"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="cvd.preferencesFile">5.3. Preferences File</h2></div></div></div> |
| |
| <p> |
| The program will attempt to read on startup and save on exit a file |
| called annotViewer.pref in your home directory. This file contains |
| information about choices you made while running the program: |
| directories (such as where your data files are) and window sizes. |
| These settings will be used the next time you use the program. There |
| is no user control over this process, but the file format is |
| reasonably transparent, in case you feel like changing it. Note, |
| however, that the file will be overwritten every time you exit the |
| program. |
| </p> |
| |
| <p> |
| If you use CVD for several projects, it may be convenient to use a different |
| ini files for each project. You can specify the ini file CVD should use |
| with the </p><pre class="programlisting">-ini <iniFile></pre><p> parameter on the |
| command line. |
| </p> |
| </div> |
| |
| <div class="section" title="5.4. The Menus"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="cvd.theMenus">5.4. The Menus</h2></div></div></div> |
| |
| <p> |
| We give a brief description of the various menus. All menu items come |
| with mnemonics (e.g., Alt-F X will exit the program). In addition, |
| some menu items have their own keyboard accelerators that you can use |
| anywhere in the program. For example, Ctrl-S will save the text |
| you've been editing. |
| </p> |
| <div class="section" title="5.4.1. The File Menu"><div class="titlepage"><div><div><h3 class="title" id="cvd.fileMenu">5.4.1. The File Menu</h3></div></div></div> |
| |
| <p> |
| The File menu lets you load, create and save text, load and save |
| color settings, and import and export the XCAS format. Here's a |
| screenshot. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/FileMenu.jpg" alt="The File menu"></div> |
| </div><p> |
| </p> |
| |
| <div class="itemizedlist"><p> |
| Below is a list of the menu items, together with an explanation. |
| </p><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <p title="New Text..."> |
| <b>New Text... </b> |
| |
| Clears the text area. Text you type is written to an anonymous |
| buffer. You can use "Save Text As..." to save the text |
| you typed to a file. Note: whenever you modify the text, be it |
| through typing, loading a file or using the "New |
| Text..." menu item, previous analysis results will be lost. |
| Since the previous analysis is specific to the text, modifying |
| the text invalidates the analysis. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Open Text File"> |
| <b>Open Text File. </b> |
| |
| Loads a new text file into the viewer. The next time you run an |
| analysis engine, it will run the text you loaded last. Depending |
| on the annotator you're using, the program may run slow with very |
| large text files, so you may want to experiment. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Save Text File"> |
| <b>Save Text File. </b> |
| |
| Saves the currently open text file. If no file is currently |
| loaded (either because you haven't loaded a file, or you've used |
| the "New Text..." menu item), this menu item is |
| disabled (and Ctrl-S will do nothing). |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Save Text As..."> |
| <b>Save Text As... </b> |
| |
| Save the text to a file of your choosing. This can be an existing |
| file, which is then overwritten, or it can be a new file that |
| you're creating. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Change Code Page"> |
| <b>Change Code Page. </b> |
| |
| Allows you to change the code page that is used to load and save |
| text files. If you're sure the text you're loading is in ASCII or |
| one of the 8-bit extensions such as ISO-8859-1 (ISO Latin1), |
| there is probably nothing you need to do. Just load the text and |
| look at the display. If you see no funny characters or square |
| boxes, chances are your selected code page is compatible with |
| your text file. |
| |
| Note that the code page setting is also in effect when you save |
| files. You can observe the effects with a hex editor or by just |
| looking at the file size. For example, if you save the default |
| text |
| <code class="computeroutput">This is where the text goes.</code> |
| to a file on Windows using the default code page, the size of the |
| file will be 28 bytes. If you now change the code page to UTF-16 |
| and save the file again, the file size will be 58 bytes: two |
| bytes for each character, plus two bytes for the byte-order mark. |
| Now switch the code page back to the default Windows code page |
| and reload the UTF-16 file to see the difference in the editor. |
| |
| CVD will display all code pages that are available in the JVM |
| you're running it on. The first code page in the list is the |
| default code page of your system. This is also CVD's default if |
| you don't make a specific choice. |
| |
| Your code page selection will be remembered in CVD's ini file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Load Color Settings"> |
| <b>Load Color Settings. </b> |
| |
| Load previously saved color settings from a file (see |
| Tools/Customize Annotation Display). It is highly recommended |
| that you only load automatically generated files. Strange things |
| may happen if you try to load the wrong file format. On startup, |
| the program attempts to load the last color settings file that |
| you loaded or saved during a previous session. If you intend to |
| use the same color settings as the last time you ran the program, |
| there is therefore no need to manually load a color settings |
| file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Save Color Settings"> |
| <b>Save Color Settings. </b> |
| |
| Save your customized color settings (see Tools/Customize |
| Annotation Display). The file is a Java properties file, and as |
| such, reasonably transparent. What is not transparent is the |
| encoding of the colors (integer encoding of 24-bit RGB values), |
| so changing the file by hand is not really recommended. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Read Type System File"> |
| <b>Read Type System File. </b> |
| |
| Load a type system file. This allows you to load an XCAS file |
| without having to have access to the corresponding annotator. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Write Type System File"> |
| <b>Write Type System File. </b> |
| |
| Create a type system file from the currently loaded type |
| definitions. In addition, you can save the current CAS as a XCAS |
| file (see below). This allows you to later load the type system |
| and XCAS to view the CAS without having to rerun the annotator. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Read XMI CAS File"> |
| <b>Read XMI CAS File. </b> |
| |
| Read an XMI CAS file. Important: XMI CAS is a serialization format that |
| serializes a CAS without type system and index information. It is |
| therefore impossible to read in a stand-alone XMI CAS file. XMI CAS |
| files can only be interpreted in the context of an existing type |
| system. Consequently, you need to first load the Analysis Engine that was used to |
| create the XMI file, to be able to load that XMI file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Write XMI CAS File"> |
| <b>Write XMI CAS File. </b> |
| |
| Writes the current analysis out as an XMI CAS file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Read XCAS File"> |
| <b>Read XCAS File. </b> |
| |
| Read an XCAS file. Important: XCAS is a serialization format that |
| serializes a CAS without type system and index information. It is |
| therefore impossible to read in a stand-alone XCAS file. XCAS |
| files can only be interpreted in the context of an existing type |
| system. Consequently, you need to load the Analysis Engine that was used to |
| create the XCAS file to be able to load it. Loading a XCAS file |
| without loading the Analysis Engine may produce strange errors. You may get |
| syntax errors on loading the XCAS file, or worse, everything may |
| appear to go smoothly but in reality your CAS may be corrupted. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Write XCAS File"> |
| <b>Write XCAS File. </b> |
| |
| Writes the current analysis out as an XCAS file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Exit"> |
| <b>Exit. </b> |
| Exits the program. Your preferences will be saved. |
| </p> |
| </li></ul></div> |
| |
| </div> |
| |
| <div class="section" title="5.4.2. The Edit Menu"><div class="titlepage"><div><div><h3 class="title" id="cvd.editMenu">5.4.2. The Edit Menu</h3></div></div></div> |
| |
| <p> |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/EditMenu.jpg" alt="The Edit menu"></div> |
| </div><p> |
| |
| The "Edit" menu provides a standard text editing menu with |
| Cut, Copy and Paste, as well as unlimited Undo. |
| </p> |
| <p> |
| Note that standard keyboard accelerators Ctrl-X, Ctrl-C, Ctrl-V and |
| Ctrl-Z can be used for Cut, Copy, Paste and Undo, respectively. The |
| text area supports other standard keyboard operations such as |
| navigation HOME, Ctrl-HOME etc., as well as marking text with Shift- |
| <ArrowKey>. |
| </p> |
| </div> |
| |
| <div class="section" title="5.4.3. The Run Menu"><div class="titlepage"><div><div><h3 class="title" id="cvd.runMenu">5.4.3. The Run Menu</h3></div></div></div> |
| |
| <p> |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/RunMenu.jpg" alt="The Run menu"></div> |
| </div><p> |
| |
| In the Run menu, you can load and run text analysis engines. |
| </p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <p title="Load AE"> |
| <b>Load AE. </b> |
| |
| Loads and initializes a text analysis engine. Choosing this menu |
| item will display a file open dialog where you should choose an |
| XML descriptor of a Text Analysis Engine to process the current |
| text. Even if the analysis engine runs fast, this will take a |
| while, since there is a lot of setup work to do when a new TAE is |
| created. So be patient. |
| |
| When you develop a new annotator, you will often need to |
| recompile your code. Gladis will not reload your annotator code. |
| When you recompile your code, you need to terminate the GUI and |
| restart it. If you only make changes to the XML descriptor, you |
| don't need to restart the GUI. Simply reload the XML file. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Run AE"> |
| <b>Run AE. </b> |
| |
| Before you have (successfully) loaded a TAE, this menu item will |
| be disabled. After you have loaded a TAE, it will be enabled, and |
| the name changes according to the name of the TAE you have |
| loaded. For example, if you've loaded "The World's Fastest |
| Parser", you will have a menu item called "Run The |
| World's Fastest Parser". When you choose the item, the TAE |
| is run on whatever text you have currently loaded. |
| |
| After a TAE has run successfully, the index window in the upper |
| left-hand corner of the screen should be updated and show the |
| indexes that were created by this run. We will have more to say |
| about indexes and what to do with them later. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Run AE on CAS"> |
| <b>Run AE on CAS. </b> |
| |
| This allows you to run an analysis engine on the current CAS. |
| This is useful if you have loaded a CAS from an XCAS file, and |
| would like to run further analysis on it. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Run collectionProcessComplete"> |
| <b>Run collectionProcessComplete. </b> |
| |
| When you select this item, the analysis engine's |
| collectionProcessComplete() method is called. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Performance Report"> |
| <b>Performance Report. </b> |
| |
| After you've run your analysis, you can view a performance report. It will show |
| you where the time went: which component used how much of the processing time. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Recently used"> |
| <b>Recently used. </b> |
| |
| Collects a list of recently used analysis engines as a short-cut |
| for loading. |
| |
| </p> |
| </li><li class="listitem"> |
| <p title="Language"> |
| <b>Language. </b> |
| |
| Some annotators do language specific processing. For example, if |
| you run lexical analysis, the results may be quite different |
| depending on what the analysis engine thinks the language of the |
| document is. With this menu item, you can manually set the |
| document language. Alternatively, you can use an automatic |
| language identification annotator. If the analysis engines you're |
| working with are language agnostic, there is no need to set the |
| language. |
| |
| </p> |
| </li></ul></div> |
| </div> |
| |
| <div class="section" title="5.4.4. The tools menu"><div class="titlepage"><div><div><h3 class="title" id="cvd.toolsMenu">5.4.4. The tools menu</h3></div></div></div> |
| |
| <p> |
| The tools menu contains some assorted utilities, such as the log |
| file viewer. Here you can also set the log level for UIMA. |
| A more detailed description of some of the menu items |
| follows below. |
| </p> |
| <div class="section" title="5.4.4.1. View Type System"><div class="titlepage"><div><div><h4 class="title" id="cvd.viewTypeSystem">5.4.4.1. View Type System</h4></div></div></div> |
| |
| <p> |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.cvd/TypeSystemViewer.jpg"></div> |
| </div><p> |
| |
| Brings up a new window that displays the type system. This menu |
| item is disabled until the first time you have run an analysis |
| engine, since there is no type system to display until then. An |
| example is shown above. |
| </p> |
| <p> |
| You can view the inheritance tree on the left by expanding and |
| collapsing nodes. When you select a type, the features defined on |
| that type are displayed in the table on the right. The feature |
| table has three columns. The first gives the name of the feature, |
| the second one the type of the feature (i.e., what values it |
| takes), and the third column displays the highest type this feature |
| is defined on. In this example, the features "begin" and |
| "end" are inherited from the built-in annotation type. |
| </p> |
| <p> |
| In the options menu, you can configure if you want to see inherited |
| features or not (not yet implemented). |
| </p> |
| </div> |
| |
| <div class="section" title="5.4.4.2. Show Selected Annotations"><div class="titlepage"><div><div><h4 class="title" id="cvd.showSelectedAnnotations">5.4.4.2. Show Selected Annotations</h4></div></div></div> |
| |
| <p> |
| </p><div class="figure"><a name="AnnotationViewerFigure"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><img src="images/tools/tools.cvd/AnnotationViewer.jpg"></div> |
| </div><p class="title"><b>Figure 5.1. |
| Annotations produced by a statistical named entity tagger |
| </b></p></div><p><br class="figure-break"> |
| </p> |
| |
| <p> |
| To enable this menu, you must have run an analysis engine and |
| selected the ``AnnotationIndex'' or one of its subnodes in the |
| upper left hand corncer of the screen. It will bring up a new text |
| window with all selected annotations marked up in the text. |
| </p> |
| <p> |
| <a class="xref" href="#AnnotationViewerFigure" title="Figure 5.1. Annotations produced by a statistical named entity tagger">Figure 5.1, “ |
| Annotations produced by a statistical named entity tagger |
| ”</a> |
| shows the results of applying a statistical named entity tagger to |
| a newspaper article. Some annotation colors have been customized: |
| countries are in reverse video, organizations have a turquois |
| background, person names are green, and occupations have a maroon |
| background. The default background color is yellow. This color is |
| also used if there is more than one annotation spanning a certain |
| text. Clearly, this display is only useful if you don't have any |
| overlapping annotations, or at least not too many. |
| </p> |
| <p> |
| This menu item is also available as a context menu in the Index |
| Tree area of the main window. To use it, select the annotation |
| index or one of its subnodes, right-click to bring up a popup menu, |
| and select the only item in the popup menu. The popup menu is |
| actually a better way to invoke the annotation display, since it |
| changes according to the selection in the Index Tree area, and will |
| tell you if what you've selected can be displayed or not. |
| </p> |
| |
| |
| </div> |
| |
| </div> |
| |
| </div> |
| |
| <div class="section" title="5.5. The Main Display Area"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="cvd.mainDisplayArea">5.5. The Main Display Area</h2></div></div></div> |
| |
| <p> |
| The main display area has three sub-areas. In the upper left-hand |
| corner is the |
| <span class="bold"><strong>index display</strong></span>, which shows the indexes that were defined in the |
| AE, as well as |
| the types of the indexes and their subtypes. In the lower left-hand |
| corner, the content of indexes and sub-indexes is displayed |
| (<span class="bold"><strong>FS display</strong></span>). Clicking on any node in the index display will |
| show the |
| corresponding feature structures in the FS display. You can explore |
| those structures by expanding the tree nodes. When you click on a |
| node that represents an annotation, clicking on it will cause the |
| corresponding text span to marked in the |
| <span class="bold"><strong>text display</strong></span>. |
| </p> |
| <p> |
| </p><div class="figure"><a name="Main1Figure"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><img src="images/tools/tools.cvd/Main1.jpg"></div> |
| </div><p class="title"><b>Figure 5.2. State of GUI after running an analysis engine</b></p></div><p><br class="figure-break"> |
| </p> |
| <p> |
| <a class="xref" href="#Main1Figure" title="Figure 5.2. State of GUI after running an analysis engine">Figure 5.2, “State of GUI after running an analysis engine”</a> |
| shows the state after running the UIMA_Analysis_Example.xml aggregate from the |
| uimaj-examples project. There are two indexes in the index display, and the |
| annotation index has been selected. Note that the number of |
| structures in an index is displayed in square brackets after the |
| index name. |
| </p> |
| <p> |
| Since displaying thousands of sister nodes is both confusing and |
| slow, nodes are grouped in powers of 10. As soon as there are no |
| more than 100 sister nodes, they are displayed next to each other. |
| </p> |
| <p> |
| In our example, a name annotation has been selected, and the |
| corresponding token text is highlighted in the text area. We have |
| also expanded the token node to display its structure (not much to see in this simple example). |
| </p> |
| <p> |
| In <a class="xref" href="#Main1Figure" title="Figure 5.2. State of GUI after running an analysis engine">Figure 5.2, “State of GUI after running an analysis engine”</a>, we selected an annotation in the FS display to find the |
| corresponding text. We can also do the reverse and find out what |
| annotations cover a certain point in the text. Let's go back to the |
| name recognizer for an example. |
| </p> |
| <p> |
| </p><div class="figure"><a name="Main2Figure"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><img src="images/tools/tools.cvd/Main2.jpg"></div> |
| </div><p class="title"><b>Figure 5.3. |
| Finding annotations for a specific location in the text |
| </b></p></div><p><br class="figure-break"> |
| </p> |
| <p> |
| We would like to know if the Michael Baessler has been |
| recognized as a name. So we position the cursor in the corresponding |
| text span somewhere, then right-click to bring up the context menu |
| telling us which annotations exist at this point. An example is shown |
| in |
| <a class="xref" href="#Main2Figure" title="Figure 5.3. Finding annotations for a specific location in the text">Figure 5.3, “ |
| Finding annotations for a specific location in the text |
| ”</a>. |
| </p> |
| <p> |
| </p><div class="figure"><a name="Main3Figure"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><img src="images/tools/tools.cvd/Main3.jpg"></div> |
| </div><p class="title"><b>Figure 5.4. |
| Selecting an annotation from the context menu will highlight that |
| annotation in the FS display |
| </b></p></div><p><br class="figure-break"> |
| </p> |
| |
| <p> |
| At this point (<a class="xref" href="#Main2Figure" title="Figure 5.3. Finding annotations for a specific location in the text">Figure 5.3, “ |
| Finding annotations for a specific location in the text |
| ”</a>), |
| we only know that somewhere around the text cursor position (not |
| visible in the picture), we discovered a name. When we select the corresponding entry in the |
| context menu, the name annotation is selected in the FS display, and its covered text is |
| highlighted. |
| <a class="xref" href="#Main3Figure" title="Figure 5.4. Selecting an annotation from the context menu will highlight that annotation in the FS display">Figure 5.4, “ |
| Selecting an annotation from the context menu will highlight that |
| annotation in the FS display |
| ”</a> shows the display after |
| the name node has been selected in |
| the popup menu. |
| </p> |
| <p> |
| We're glad to see that, indeed, Michael Baessler is |
| considered to be a name. Note that in the FS display, the |
| corresponding annotation node has been selected, and the tree has |
| been expanded to make the node visible. |
| </p> |
| <p> |
| NB that the annotations displayed in the popup menu come from the |
| annotations currently displayed in the FS display. If you didn't |
| select the annotation index or one of its sub-nodes, no annotations |
| can be displayed and the popup menu will be empty. |
| </p> |
| |
| <div class="section" title="5.5.1. The Status Bar"><div class="titlepage"><div><div><h3 class="title" id="cvd.statusBar">5.5.1. The Status Bar</h3></div></div></div> |
| |
| <p> |
| At the bottom of the screen, some useful information is displayed in |
| the |
| <span class="bold"><strong>status bar</strong></span>. The left-most area shows the most recent major event, with the |
| time when the event terminated in square brackets. The next area |
| shows the file name of the currently loaded XML descriptor. This |
| area supports a tool tip that will show the full path to the file. |
| The right-most area shows the current cursor position, or the extent |
| of the selection, if a portion of the text has been selected. The |
| numbers correspond to the character offsets that are used for |
| annotations. |
| </p> |
| </div> |
| |
| <div class="section" title="5.5.2. Keyboard Navigation and Shortcuts"><div class="titlepage"><div><div><h3 class="title" id="cvd.keyboardNavigation">5.5.2. Keyboard Navigation and Shortcuts</h3></div></div></div> |
| |
| <p> |
| The GUI can be completely navigated and operated through the |
| keyboard. All menus and menu items support keyboard mnemonics, and |
| some common operations are accessible through keyboard accelerators. |
| </p> |
| <p> |
| You can move the focus between the three main areas using |
| <code class="computeroutput">Tab</code> |
| (clockwise) and |
| <code class="computeroutput">Shift-Tab</code> |
| (counterclockwise). When the focus is on the text area, the |
| <code class="computeroutput">Tab</code> |
| key will insert the corresponding character into the text, so you |
| will need to use |
| <code class="computeroutput">Ctrl-Tab</code> |
| and |
| <code class="computeroutput">Ctrl-Shift-Tab</code> |
| instead. Alternatively, you can use the following key bindings to |
| jump directly to one of the areas: |
| <code class="computeroutput">Ctrl-T</code> |
| to focus the text area, |
| <code class="computeroutput">Ctrl-I</code> |
| for the index repository frame and |
| <code class="computeroutput">Ctrl-F</code> |
| for the feature structure area. |
| </p> |
| <p> |
| Some additional keyboard shortcuts are available only in the text |
| area, such as |
| <code class="computeroutput">Ctrl-X</code> |
| for Cut, |
| <code class="computeroutput">Ctrl-C</code> |
| for Copy, |
| <code class="computeroutput">Ctrl-V</code> |
| for Paste and |
| <code class="computeroutput">Ctrl-Z</code> |
| for Undo. The context menu in the text area can be evoke through the |
| <code class="computeroutput">Alt-Enter</code> |
| shortcut. Text can be selected using the arrow keys while holding |
| the |
| <code class="computeroutput">Shift</code> |
| key. |
| </p> |
| <p> |
| The following table shows the supported keyboard shortcuts. |
| </p> |
| <div class="table"><a name="cvd.table.keyboardShortcuts"></a><p class="title"><b>Table 5.2. Keyboard shortcuts</b></p><div class="table-contents"> |
| |
| <table summary="Keyboard shortcuts" style="border: none;"><colgroup><col><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Shortcut</th><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Action</th><th style="border-bottom: 0.5pt solid black; ">Scope</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-O</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Open text file</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-S</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Save text file</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-L</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Load AE descriptor</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-R</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Run current AE</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-I</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Switch focus to index repository</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-T</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Switch focus to text area</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-F</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Switch focus to FS area</td><td style="border-bottom: 0.5pt solid black; ">Global</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-X</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Cut selection</td><td style="border-bottom: 0.5pt solid black; ">Text</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-C</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Copy selection</td><td style="border-bottom: 0.5pt solid black; ">Text</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-V</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Paste selection</td><td style="border-bottom: 0.5pt solid black; ">Text</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; "> |
| <code class="computeroutput">Ctrl-Z</code> |
| </td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; ">Undo</td><td style="border-bottom: 0.5pt solid black; ">Text</td></tr><tr><td style="border-right: 0.5pt solid black; "> |
| <code class="computeroutput">Alt-Enter</code> |
| </td><td style="border-right: 0.5pt solid black; ">Show context menu</td><td style="">Text</td></tr></tbody></table> |
| </div></div><br class="table-break"> |
| </div> |
| |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 6. Eclipse Analysis Engine Launcher's Guide" id="ugr.tools.eclipse_launcher"><div class="titlepage"><div><div><h2 class="title">Chapter 6. Eclipse Analysis Engine Launcher's Guide</h2></div></div></div> |
| |
| |
| |
| |
| <p> |
| The Analysis Engine Launcher is an Eclipse plug-in that provides debug and run support |
| for Analysis Engines directly within eclipse, like a Java program can be debugged. |
| It supports most of the descriptor formats except CPE, UIMA AS and |
| some remote deployment descriptors. |
| </p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="495"><tr><td><img src="images/tools/tools.eclipse_launcher/image01.png" width="495"></td></tr></table></div> |
| </div> |
| |
| <div class="section" title="6.1. Creating an Analysis Engine launch configuration"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.eclipse_launcher.create_configuration">6.1. Creating an Analysis Engine launch configuration</h2></div></div></div> |
| |
| <p> |
| To debug or run an Analysis Engine a launch configuration must be created. To do this |
| select "Run -> Run Configurations" or "Run -> Run Configurations" from the menu bar. A dialog |
| will open where the launch configuration can be created. Select UIMA Analysis Engine and create |
| a new configuration via pressing the New button at the top, or via the New button in the context menu. |
| The newly created configuration will be automatically selected and the Main tab will be displayed. |
| </p> |
| |
| <p> |
| The Main tab defines the Analysis Engine which will be launched. First select the project which |
| contains the descriptor, then choose a descriptor and select the input. The input can either be |
| a folder which contains input files or just a single input file, if the recursively check box |
| is marked the input folder will be scanned recursively for input files. |
| </p> |
| |
| <p> |
| The input format defines the format of the input files, if it is set to CASes the input resource |
| must be either in the XMI or XCAS format and if it is set to plain text, plain text input files in |
| the specified encoding are expected. The input logic filters out all files which do not have an appropriate |
| file ending, depending on the chosen format the file ending must be one of .xcas, .xmi or .txt, all |
| other files are ignored when the input is a folder, if a single file is selected it will be processed |
| independent of the file ending. |
| </p> |
| |
| <p> |
| The output directory is optional, if set all processed input files will be written to the specified |
| directory in the XMI CAS format, if the clear check box is marked all files inside the output folder will be deleted, usually |
| this option is not needed because existing files will be overwritten without notice. |
| </p> |
| |
| <p> |
| The other tabs in the launch configuration are documented in the eclipse documentation, |
| see the "Java development user guide -> Tasks -> Running and Debugging". |
| </p> |
| </div> |
| |
| <div class="section" title="6.2. Launching an Analysis Engine"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.eclipse_launcher.launching">6.2. Launching an Analysis Engine</h2></div></div></div> |
| |
| <p> |
| To launch an Analysis Engine go to the previously created launch configuration and |
| click on "Debug" or "Run" depending on the desired run mode. The Analysis Engine will |
| now be launched. The output will be shown in the Console View. To debug an Analysis Engine |
| place breakpoints inside the implementation class. If a breakpoint is hit the execution will pause |
| like in a Java program. |
| </p> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 7. Cas Editor User's Guide" id="ugr.tools.ce"><div class="titlepage"><div><div><h2 class="title">Chapter 7. Cas Editor User's Guide</h2></div></div></div> |
| |
| |
| |
| |
| <div class="section" title="7.1. Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="sandbox.caseditor.Introduction">7.1. Introduction</h2></div></div></div> |
| |
| |
| <p> |
| The CAS Editor is an Eclipse based annotation tool which supports manual and automatic |
| annotation (via running UIMA annotators) of CASes stored in files. |
| Currently only text-based CAS are supported. |
| The CAS Editor can visualize and edit all feature structures. |
| Feature Structures which are annotations can additionally be viewed and edited |
| directly on text. |
| </p> |
| |
| |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.caseditor/CasEditor.png" width="564"></td></tr></table></div> |
| </div> |
| </div> |
| |
| <div class="section" title="7.2. Launching the Cas Editor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="sandbox.caseditor.Launching">7.2. Launching the Cas Editor</h2></div></div></div> |
| |
| <p> |
| To open a CAS in the Cas Editor it needs a compatible type system |
| and styling information which specify how to display the types. |
| The styling information is created automatically by the Cas Editor; |
| but the type system file must be provided by the user. |
| </p> |
| |
| <p>A CAS in the xmi or xcas format can simply be opened by clicking |
| on it, like a text file is opened with the Eclipse text editor.</p> |
| |
| <div class="section" title="7.2.1. Specifying a type system"><div class="titlepage"><div><div><h3 class="title" id="sandbox.caseditor.typeSystemSpec">7.2.1. Specifying a type system</h3></div></div></div> |
| |
| <p> |
| The Cas Editor expects a type system file at the root of the project |
| named TypeSystem.xml. If a type system cannot be found, this message is shown: |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/ProvideTypeSystem.png" alt="No type system available for the opened CAS."></div> |
| </div><p> |
| |
| </p> |
| <p> |
| If the type system file does not exist in this |
| location you can point the Cas Editor to a specific |
| type system file. |
| |
| You can also change the default type system location in the |
| properties page of the Eclipse project. To do that right click the project, |
| select Properties and go to the UIMA Type System tab, and specify the default |
| location for the type system file. |
| </p> |
| </div> |
| |
| <p> |
| After the Cas Editor is opened switch to the Cas Editor |
| Perspective to see all the Cas Editor related views. |
| </p> |
| </div> |
| |
| <div class="section" title="7.3. Annotation editor"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="sandbox.caseditor.annotation_editor">7.3. Annotation editor</h2></div></div></div> |
| |
| |
| <p> |
| The annotation editor shows the text with annotations and |
| provides different views to show aspects of the CAS. |
| |
| </p> |
| |
| <div class="section" title="7.3.1. Editor"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.annotation_editor.editor">7.3.1. Editor</h3></div></div></div> |
| |
| <p> |
| After the editor is open it shows the default sofa of the CAS. |
| (Displaying another sofa is right now not possible.) |
| The editor has an associated, changeable CAS Type. This type is called the editor "mode". |
| By default the editor only shows annotation of this type. Actions and views are |
| sensitive to this mode. The next screen shows the display, where the mode is set to "Person": |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.caseditor/EditorOneType.png" width="564"></td></tr></table></div> |
| </div><p> |
| |
| To change the mode for the editor, use the "Mode" menu in the editor context menu. |
| To open the context menu right click somewhere on the text. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.caseditor/ModeMenu.png" width="564"></td></tr></table></div> |
| </div><p> |
| |
| The current mode is displayed in the status line at the bottom and in the Style View. |
| </p> |
| |
| <p> |
| It's possible to work with more than one annotation type at a time; the mode just selects the default annotation type |
| which can be marked with the fewest keystrokes. To show annotations of other types, use the "Show" menu in |
| the context menu. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/ShowAnnotationsMenu.png"></div> |
| </div><p> |
| |
| Alternatively, you may select the annotation types to be shown in the Style View. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/StyleView2.png"></div> |
| </div><p> |
| |
| The editor will show the additional selected types. |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.caseditor/EditorAllTypes.png" width="564"></td></tr></table></div> |
| </div><p> |
| The annotation renderer and rendering layer can be changed in the Properties dialog. After the |
| change all editors which share the same type system will be updated. |
| </p> |
| <p> |
| The editor automatically selects annotations of the editor mode type that are near the |
| cursor. This selection is then synchronized or displayed in other views. |
| </p> |
| <p> |
| To create an annotation manually using the editor, mark a piece of text and then |
| press the enter key. This creates an annotation of the |
| type of the editor mode, having bounds corresponding to the selection. |
| You can also use the "Quick Annotate" action from the context menu. |
| </p> |
| <p> |
| It is also possible to choose the annotation type; press |
| shift + enter (smart insert) or click on "Annotate" in the context menu for this. |
| A dialog will ask for the annotation type to create; either select the desired type or use |
| the associated key shortcut. In the screen shot below, pressing the "p" key |
| will create a Person annotation for "Obama". |
| </p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/ShiftEnter.png"></div> |
| </div> |
| |
| <p> |
| To delete an annotation, select it and press the delete |
| key. Only annotations of the editor mode can be deleted with this method. |
| To delete non-editor mode annotations use the Outline View. |
| </p> |
| |
| <p> |
| For annotation projects you can change the font size in the editor. |
| The default font size is 13. To change this open the Eclipse preference dialog, |
| go to "UIMA Annotation Editor". |
| </p> |
| </div> |
| |
| <div class="section" title="7.3.2. Configure annotation styling"><div class="titlepage"><div><div><h3 class="title" id="sandbox.caseditor.annotation_editor.styling">7.3.2. Configure annotation styling</h3></div></div></div> |
| |
| <p> |
| The Cas Editor can visualize the annotations in multiple |
| highlighting colors and with different annotation drawing styles. |
| The annotation styling is defined per type system. When its changed, |
| the appearance changes in all opened editors sharing a type system. |
| </p> |
| |
| <p> |
| The styling is initialized with a unique color for every |
| annotation type and every annotation is drawn with |
| Squiggles annotation style. You may adjust |
| the annotation styles and coloring depending on the project |
| needs. |
| </p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/StyleView.png"></div> |
| </div> |
| |
| <p> |
| The Cas Editor offers a property page to edit the |
| styling. To open this property page click on the "Properties" |
| button in the Styles view. |
| </p> |
| |
| <p> |
| The property page can be seen below. By clicking on one of the |
| annotation types, the color, drawing style and drawing layer can be edited on the right |
| side. |
| </p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.caseditor/StyleProperties.png" width="564"></td></tr></table></div> |
| </div> |
| |
| <p> |
| The annotations can be visualized with one the following |
| annotation stlyes: |
| |
| |
| </p><div class="table"><a name="d5e1306"></a><p class="title"><b>Table 7.1. Style Table</b></p><div class="table-contents"> |
| <table summary="Style Table" style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Style</th><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Sample</th><th style="border-bottom: 0.5pt solid black; " align="left">Description</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">BACKGROUND</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Background.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p>The background is drawn in the annotation color.</p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">TEXT_COLOR</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-TextColor.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p>The text is drawn in the annotation color.</p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">TOKEN</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Token.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p> |
| The token type assumes that token annotation are always separated |
| by a whitespace. Only if they are not separated by a whitespace |
| a vertical line is drawn to display the two token annotations. |
| The image on the left actually contains three annotations, one for "Mr", "." |
| and "Obama". |
| </p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">SQUIGGLES</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Squiggles.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p>Squiggles are drawen under the annotation in the annotation color.</p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">BOX</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Box.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p>A box in the annotation color is drawn around |
| the annotation.</p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">UNDERLINE</td><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Underline.png"></div> |
| </div> |
| </td><td style="border-bottom: 0.5pt solid black; " align="left"> |
| <p>A line in the annotation color is drawen below |
| the annotation.</p> |
| </td></tr><tr><td style="border-right: 0.5pt solid black; " align="left">BRACKET</td><td style="border-right: 0.5pt solid black; " align="left"> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Style-Bracket.png"></div> |
| </div> |
| </td><td style="" align="left"> |
| <p>An opening bracket is drawn around the first |
| character of the annotation and a closing bracket |
| is drawn around the last character of the annotation.</p> |
| </td></tr></tbody></table> |
| </div></div><p><br class="table-break"> |
| </p> |
| |
| <p> |
| The Cas Editor can draw the annotations in different |
| layers. If the spans of two annotations overlap the annotation |
| which is in a higher layer is drawn over annotations in a lower |
| layer. Depending on the drawing style it is possible to see |
| both annotations. The drawing order is defined by the layer |
| number, layer 0 is drawn first, then layer 1 and so on. |
| If annotations in the same layer overlap its not defined which |
| annotation type is drawn first. |
| </p> |
| |
| |
| </div> |
| |
| <div class="section" title="7.3.3. CAS view support"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.annotation_editor.cas_views">7.3.3. CAS view support</h3></div></div></div> |
| |
| <p> |
| The Annotation Editor can only display text Sofa CAS views. Displaying |
| CAS views with Sofas of different types is not possible and will show |
| an editor page to switch back to another CAS view. The Edit and Feature Structure Browser views |
| are still available and might be used to edit Feature Structures which belong to the CAS view. |
| </p> |
| <p> |
| To switch to another CAS view, right click in the editor to open |
| the context menu and choose "CAS Views" and the view the editor |
| should switch to. |
| </p> |
| </div> |
| |
| <div class="section" title="7.3.4. Outline view"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.annotation_editor.outline">7.3.4. Outline view</h3></div></div></div> |
| |
| |
| <p> |
| The outline view gives an overview of the annoations which are |
| shown in the editor. The annotation are grouped by type. There are |
| actions to increase or decrease the bounds of the selected annotation. There is |
| also an action to merge selected annotations. The outline has second view mode where only |
| annotations of the current editor mode are shown. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/Outline.png"></div> |
| </div><p> |
| |
| The style can be switched in the view menu, to a style where it only shows the annotations which |
| belong to the current editor mode. |
| |
| |
| </p> |
| </div> |
| |
| <div class="section" title="7.3.5. Edit Views"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.annotation_editor.properties_view">7.3.5. Edit Views</h3></div></div></div> |
| |
| <p> |
| The Edit Views show details about the currently |
| selected annotations or feature structures. It is |
| possible to change primitive values in this view. |
| Referenced feature structures can be created and deleted, |
| including arrays. To link a feature structure with |
| other feature structures, it can be pinned to the edit |
| view. This means that it does not change if the |
| selection changes. |
| </p> |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/EditView.png"></div> |
| </div> |
| </div> |
| |
| <div class="section" title="7.3.6. FeatureStructure View"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.annotation_editor.fs_view">7.3.6. FeatureStructure View</h3></div></div></div> |
| |
| <p> |
| The FeatureStructure View lists all feature structures of |
| a specified type. The type is selected in the type |
| combobox. |
| </p> |
| |
| <p> |
| It's possible to create and delete feature structures of |
| every type. |
| </p> |
| |
| <div class="screenshot"> |
| <div class="mediaobject"><img src="images/tools/tools.caseditor/FSView.png"></div> |
| </div> |
| </div> |
| </div> |
| <div class="section" title="7.4. Implementing a custom Cas Editor View"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.cas_editor.custom_view">7.4. Implementing a custom Cas Editor View</h2></div></div></div> |
| |
| <p> |
| Custom Cas Editor views can be added, |
| to rapidly create, access and/or change Feature Structures in the CAS. |
| While the Annotation Editor and its views offer support for general viewing and editing, |
| accessing and editing things in the CAS can be streamlined using a custom Cas Editor. |
| A custom Cas Editor view can be |
| programmed to use a particular type system and optimized to quickly change or show something. |
| </p> |
| <p> |
| Annotation projects often need to track the annotation status of a CAS where a user |
| needs to mark which parts have been annotated or corrected. To do this with the Cas Editor |
| a user would need to use the Feature Structure Browser view to select the Feature Structure |
| and then edit it inside the Edit view. |
| A custom Cas Editor view could directly select and show the Feature Structure and offer |
| a tailored user interface to change the annotation status. |
| Some features such as the name of the annotator could even be automatically filled in. |
| </p> |
| <p> |
| The creation of Feature Structures which are linked to existing annotations or Feature Structures |
| is usually difficult with the standard views. A custom view which can make assumptions about the |
| type system is usually needed to do this efficiently. |
| </p> |
| <div class="section" title="7.4.1. Annotation Status View Sample"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.cas_editor.custom_view.sample">7.4.1. Annotation Status View Sample</h3></div></div></div> |
| |
| <p> |
| The Cas Editor provides the CasEditorView class as a base class for views which need to access |
| the CAS which is opened in the current editor. It shows a "view not available" message when the current |
| editor does not show a CAS, no editor is opened at all or the current CAS view is incompatible with |
| the view. |
| </p> |
| <p> |
| The following snippet shows how it is usually implemented: |
| </p><pre class="programlisting"> |
| public class AnnotationStatusView extends CasEditorView { |
| |
| public AnnotationStatusView() { |
| super("The Annotation Status View is currently not available."); |
| } |
| |
| @Override |
| protected IPageBookViewPage doCreatePage(ICasEditor editor) { |
| ICasDocument document = editor.getDocument(); |
| |
| if (document != null) { |
| return new AnnotationStatusViewPage(editor); |
| } |
| |
| return null; |
| } |
| } |
| </pre><p> |
| The doCreatePage method is called to create the actual view page. If the document |
| is null the editor failed to load a document and is showing an error message. |
| In the case the document is not null but the CAS view is incompatible the method |
| should return null to indicate that it has nothing to show. In this case the |
| "not available" message is displayed. |
| </p> |
| <p> |
| The next step is to implement the AnnotationStatusViewPage. That is the page which |
| gets the CAS as input and need to provide the user with a ui to change the Annotation |
| Status Feature Structure. |
| </p><pre class="programlisting"> |
| public class AnnotationStatusViewPage extends Page { |
| |
| private ICasEditor editor; |
| |
| AnnotationStatusViewPage(ICasEditor editor) { |
| this.editor = editor; |
| } |
| |
| ... |
| |
| public void createControl(Composite parent) { |
| |
| // create ui elements here |
| |
| ... |
| |
| ICasDocument document = editor.getDocument(); |
| CAS cas = document.getCAS(); |
| |
| // Retrieve Annotation Status FS from CAS |
| // and initalize the ui elements with it |
| |
| FeatureStructre statusFS; |
| |
| ... |
| |
| // Add event listeners to the ui element |
| // to save an update to the CAS |
| // and to advertise a change |
| |
| ... |
| |
| // Send update event |
| document.update(statusFS); |
| |
| } |
| } |
| </pre><p> |
| The above code sketches out how a typical view page is implemented. The CAS can be directly used |
| to access any Feature Structures or annotations stored in it. |
| When something is modified added/removed/changed that must be advertised via the ICasDocument object. |
| It has multiple notification methods which send an event so that other views can be updated. |
| The view itself can also register a listener to receive CAS change events. |
| </p> |
| </div> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 8. JCasGen User's Guide" id="ugr.tools.jcasgen"><div class="titlepage"><div><div><h2 class="title">Chapter 8. JCasGen User's Guide</h2></div></div></div> |
| |
| |
| <p>JCasGen reads a descriptor for an application (either an Analysis Engine Descriptor, |
| or a Type System Descriptor), creates the merged type system |
| specification by merging all the type system information from all the components |
| referred to in the descriptor, and then uses this merged type system to create Java source |
| files for classes that enable JCas access to the CAS. Java classes are not produced for the |
| built-in types, since these classes are already provided by the UIMA SDK. (An exception is |
| the built-in type <code class="literal">uima.tcas.DocumentAnnotation</code>, see the warning below.) </p> |
| |
| <div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>If the components comprising the input to the type merging process |
| have different definitions for the same type name, |
| JCasGen will show a warning, and in some environments may offer to abort the operation. |
| If you continue past this warning, |
| JCasGen will produce correct Java source files representing the merged types |
| (that is, the |
| type definition containing all of the features defined on that type by all of the |
| components). It is recommended that you do not use this capability (of having |
| two different definitions for the same type name, with different feature sets) since it can make it |
| difficult to combine/package your annotator with others. See <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.jcas.merging_types_from_other_specs" class="olink">Section 5.5, “Merging Types”</a> for more information. |
| </p> |
| |
| <p>Also note that if your type system declares a custom version of the |
| <code class="literal">uima.tcas.DocumentAnnotation</code> |
| built-in type, then JCasGen will generate a Java source file for it. If you do this, you need to be |
| aware of the issues discussed in <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.jcas.documentannotation_issues" class="olink">Section 5.5.4, “Adding Features to DocumentAnnotation”</a>.</p></div> |
| |
| <p>JCasGen can be run in many ways. For Eclipse users using the Component Descriptor Editor, there's a button |
| on the Type System Description page to run it on that type system. There's also a jcasgen-maven-plugin to use |
| in maven build scripts. There's a menu-driven GUI tool for it. |
| And, there are command line scripts you can use to invoke it.</p> |
| |
| <p>There are several versions of JCasGen. The basic version reads an XML descriptor |
| which contains a type system descriptor, and generates the corresponding Java Class |
| Models for those types. Variants exist for the Eclipse environment that allow merging the |
| newly generated Java source code with previously augmented versions; see <a href="references.html#d5e1" class="olink">UIMA References</a> <a href="references.html#ugr.ref.jcas.augmenting_generated_code" class="olink">Section 5.4, “Augmenting the generated Java Code”</a> for a discussion of how the |
| Java Class Models can be augmented by adding additional methods and fields.</p> |
| |
| <p>Input to JCasGen needs to be mostly self-contained. In particular, any types that are |
| defined to depend on user-defined supertypes must have that supertype defined, if the |
| supertype is <code class="literal">uima.tcas.Annotation </code>or a subtype of it. Any features |
| referencing ranges which are subtypes of uima.cas.String must have those subtypes |
| included. If this is not followed, a warning message is given stating that the resulting |
| generation may be inaccurate.</p> |
| |
| <p>JCasGen is typically invoked automatically when using the Component Descriptor |
| Editor (see <a href="tools.html#ugr.tools.cde.auto_jcasgen" class="olink">Section 1.8, “Type System Page”</a>), but can also be run using a shell |
| script. These scripts can take 0, 1, or 2 arguments. The first argument is the location of |
| the file containing the input XML descriptor. The second argument specifies where the |
| generated Java source code should go. If it isn't given, JCasGen generates its |
| output into a subfolder called JCas (or sometimes JCasNew – see below), of the first |
| argument's path.</p> |
| |
| <p>The first argument, the input file, can be written as |
| <code class="literal">jar:<url>!{entry}</code>, for example: |
| <code class="literal">jar:http://www.foo.com/bar/baz.jar!/COM/foo/quux.class</code></p> |
| |
| <p>If no arguments are given to JCasGen, then it launches a GUI to interact with the user |
| and ask for the same input. The GUI will remember the arguments you previously used. |
| Here's what it looks like: |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.jcasgen/image002.jpg" width="574" alt="JCasGen tool showing fields for input arguments"></td></tr></table></div> |
| </div> |
| |
| <p>When running with automatic merging of the generated Java source with previously |
| augmented versions, the output location is where the merge function obtains the source |
| for the merge operation.</p> |
| |
| <p>As is customary for Java, the generated class source files are placed in the |
| appropriate subdirectory structure according to Java conventions that correspond to |
| the package (name space) name.</p> |
| |
| <p>The Java classes must be compiled and the resulting class files included in the class |
| path of your application; you make these classes available for other annotator writers |
| using your types, perhaps packaged as an xxx.jar file. If the xxx.jar file is made to |
| contain only the Java Class Models for the CAS types, it can be reused by any users of these |
| types.</p> |
| |
| <div class="section" title="8.1. Running stand-alone without Eclipse"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.jcasgen.running_without_eclipse">8.1. Running stand-alone without Eclipse</h2></div></div></div> |
| |
| |
| <p>There is no capability to automatically merge the generated Java source with |
| previous versions, unless running with Eclipse. If run without Eclipse, no automatic |
| merging of the generated Java source is done with any previous versions. In this case, |
| the output is put in a folder called <span class="quote">“<span class="quote">JCasNew</span>”</span> unless overridden by |
| specifying a second argument.</p> |
| |
| <p>The distribution includes a shell script/bat file to run the stand-alone version, |
| called jcasgen.</p> |
| |
| </div> |
| |
| <div class="section" title="8.2. Running stand-alone with Eclipse"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.jcasgen.running_standalone_with_eclipse">8.2. Running stand-alone with Eclipse</h2></div></div></div> |
| |
| |
| <p>If you have Eclipse and EMF (EMF = Eclipse Modeling Framework; both of these are |
| available from <a class="ulink" href="http://www.eclipse.org" target="_top">http://www.eclipse.org</a>) installed (version 3 or |
| later) JCasGen can merge the Java code it generates with previous versions, picking up |
| changes you might have inserted by hand. The output (and source of the merge input) is in a |
| folder <span class="quote">“<span class="quote">JCas</span>”</span> under the same path as the input XML file, unless |
| overridden by specifying a second argument.</p> |
| |
| <p>You must install the UIMA plug-ins into Eclipse to enable this function.</p> |
| |
| <p>The distribution includes a shell script/bat file to run the stand-alone with |
| Eclipse version, called jcasgen_merge. This works by starting Eclipse in |
| <span class="quote">“<span class="quote">headless</span>”</span> mode (no GUI) and invoking JCasGen within Eclipse. You will |
| need to set the ECLIPSE_HOME environment variable or modify the jcasgen_merge shell |
| script to specify where to find Eclipse. The version of Eclipse needed is 3 or higher, |
| with the EMF plug-in and the UIMA runtime plug-in installed. A temporary workspace is |
| used; the name/location of this is customizable in the shell script.</p> |
| |
| <p>Log and error messages are written to the UIMA log. This file is called uima.log, and |
| is located in the default working directory, which if not overridden, is the startup |
| directory of Eclipse.</p> |
| |
| </div> |
| |
| <div class="section" title="8.3. Running within Eclipse"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.jcasgen.running_within_eclipse">8.3. Running within Eclipse</h2></div></div></div> |
| |
| |
| <p>There are two ways to run JCasGen within Eclipse. The first way is to configure an |
| Eclipse external tools launcher, and use it to run the stand-alone shell scripts, with |
| the arguments filled in. Here's a picture of a typical launcher configuration |
| screen (you get here by navigating from the top menu: Run –> External Tools |
| –> External tools...). |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.jcasgen/image004.jpg" width="574" alt="Running JCasGen within Eclipse using the external tool launcher"></td></tr></table></div> |
| </div> |
| |
| <p>The second way (which is the normal way it's done) to run within Eclipse is to use the |
| Component Descriptor Editor (CDE) (see <a href="tools.html#ugr.tools.cde" class="olink">Chapter 1, <i>Component Descriptor Editor User's Guide</i></a>). This tool can be configured to automatically |
| launch JCasGen whenever the type system descriptor is modified. In this release, this |
| operation completely regenerates the files, even if just a small thing changed. For |
| very large type systems, you probably don't want to enable this all the time. The |
| configurator tool has an option to enable/disable this function.</p> |
| </div> |
| |
| <div class="section" title="8.4. Using the jcasgen-maven-plugin"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.jcasgen.maven_plugin">8.4. Using the jcasgen-maven-plugin</h2></div></div></div> |
| |
| |
| <p>For Maven builds, you can use the jcasgen-maven-plugin to take one or more |
| top level descriptors (Type System or Analysis Engine descriptors), merge them |
| together in the standard way UIMA merges type definitions, and produce the corresponding |
| JCas source classes. These, by default, are generated to the standard spot for Maven |
| builds for generated files.</p> |
| |
| <p>You can use ant-like include / exclude patterns to specify the top level descriptor |
| files. If you set <limitToProject> to true, then after a complete UIMA type system |
| merge is done with all of the types, including those that are imported, only those |
| types which are defined within this Maven project (that is, in some subdirectory of the project) |
| will be generated.</p> |
| |
| <p>To use the jcasgen-maven-plugin, specify it in the POM as follows:</p> |
| <pre class="programlisting"><plugin> |
| <groupId>org.apache.uima</groupId> |
| <artifactId>jcasgen-maven-plugin</artifactId> |
| <version>2.4.1</version> <!-- change this to the latest version --> |
| <executions> |
| <execution> |
| <goals><goal>generate</goal></goals> <!-- this is the only goal --> |
| <!-- runs in phase process-resources by default --> |
| <configuration> |
| |
| <!-- REQUIRED --> |
| <typeSystemIncludes> |
| <!-- one or more ant-like file patterns |
| identifying top level descriptors --> |
| <typeSystemInclude>src/main/resources/MyTs.xml |
| </typeSystemInclude> |
| </typeSystemIncludes> |
| |
| <!-- OPTIONAL --> |
| <!-- a sequence of ant-like file patterns |
| to exclude from the above include list --> |
| <typeSystemExcludes> |
| </typeSystemExcludes> |
| |
| <!-- OPTIONAL --> |
| <!-- where the generated files go --> |
| <!-- default value: |
| ${project.build.directory}/generated-sources/jcasgen" --> |
| <outputDirectory> |
| </outputDirectory> |
| |
| <!-- true or false, default = false --> |
| <!-- if true, then although the complete merged type system |
| will be created internally, only those types whose |
| definition is contained within this maven project will be |
| generated. The others will be presumed to be |
| available via other projects. --> |
| <!-- OPTIONAL --> |
| <limitToProject>false</limitToProject> |
| </configuration> |
| </execution> |
| </executions> |
| </plugin></pre> |
| </div> |
| |
| </div> |
| <div class="chapter" title="Chapter 9. PEAR Packager User's Guide" id="ugr.tools.pear.packager"><div class="titlepage"><div><div><h2 class="title">Chapter 9. PEAR Packager User's Guide</h2></div></div></div> |
| |
| |
| <p>A PEAR (Processing Engine ARchive) file is a standard package for UIMA (Unstructured |
| Information Management Architecture) components. The PEAR package can be used for |
| distribution and reuse by other components or applications. It also allows applications |
| and tools to manage UIMA components automatically for verification, deployment, |
| invocation, testing, etc. Please refer to <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.pear" class="olink">Chapter 6, <i>PEAR Reference</i></a> |
| for more information about the internal structure of a PEAR file.</p> |
| |
| <p>This chapter describes how to use the PEAR Eclipse plugin or the PEAR command line packager |
| to create PEAR files for standard UIMA components.</p> |
| |
| <div class="section" title="9.1. Using the PEAR Eclipse Plugin"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.using_eclipse_plugin">9.1. Using the PEAR Eclipse Plugin</h2></div></div></div> |
| |
| |
| <p>The PEAR Eclipse plugin is automatically installed if you followed the directions in |
| <a href="overview_and_setup.html#d4e1" class="olink">UIMA Overview & SDK Setup</a> |
| <a href="overview_and_setup.html#ugr.ovv.eclipse_setup" class="olink">Chapter 3, <i>Setting up the Eclipse IDE to work with UIMA</i></a>. The use of the |
| plugin involves the following two steps:</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Add the UIMA nature to your project |
| </p></li><li class="listitem"><p>Create a PEAR file using the PEAR generation wizard </p> |
| </li></ul></div> |
| |
| <div class="section" title="9.1.1. Add UIMA Nature to your project"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.pear.packager.add_uima_nature">9.1.1. Add UIMA Nature to your project</h3></div></div></div> |
| |
| |
| <p>First, create a project for your UIMA component:</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Create a Java project, which |
| would contain all the files and folders needed for your UIMA component.</p> |
| </li><li class="listitem"><p>Create a source folder called <span class="quote">“<span class="quote">src</span>”</span> in your |
| project, and make it the only source folder, by clicking on |
| <span class="quote">“<span class="quote">Properties</span>”</span> in your project's context menu (right-click), |
| then select <span class="quote">“<span class="quote">Java Build Path</span>”</span>, then add the <span class="quote">“<span class="quote">src</span>”</span> |
| folder to the source folders list, and remove any other folder from the |
| list.</p></li><li class="listitem"><p>Specify an output folder for your project called bin, by clicking |
| on <span class="quote">“<span class="quote">Properties</span>”</span> in your project's context menu |
| (right-click), then select <span class="quote">“<span class="quote">Java Build Path</span>”</span>, and specify |
| <span class="quote">“<span class="quote"><span class="emphasis"><em>your_project_name</em></span>/bin</span>”</span> as the default |
| output folder. </p></li></ul></div> |
| |
| <p>Then, add the UIMA nature to your project by clicking on <span class="quote">“<span class="quote">Add UIMA |
| Nature</span>”</span> in the context menu (right-click) of your project. Click |
| <span class="quote">“<span class="quote">Yes</span>”</span> on the <span class="quote">“<span class="quote">Adding UIMA custom Nature</span>”</span> dialog box. |
| Click <span class="quote">“<span class="quote">OK</span>”</span> on the confirmation dialog box. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.pear.packager/image002.jpg" width="574" alt="Screenshot of Adding the UIMA Nature to your project"></td></tr></table></div> |
| </div> |
| |
| <p>Adding the UIMA nature to your project creates the PEAR structure in your |
| project. The PEAR structure is a structured tree of folders and files, including the |
| following elements: |
| |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>Required |
| Elements:</strong></span> |
| |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>The <span class="bold"><strong> |
| metadata</strong></span> folder which contains the PEAR installation descriptor |
| and properties files.</p></li><li class="listitem"><p>The installation descriptor (<span class="bold"><strong> |
| metadata/install.xml</strong></span>) |
| </p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Optional Elements:</strong></span> |
| |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>The <span class="bold"><strong> |
| desc</strong></span> folder to contain descriptor files of analysis engines, |
| component analysis engines (all levels), and other component (Collection |
| Readers, CAS Consumers, etc).</p></li><li class="listitem"><p>The <span class="bold"><strong>src </strong></span>folder to |
| contain the source code</p></li><li class="listitem"><p>The <span class="bold"><strong>bin</strong></span> folder to |
| contain executables, scripts, class files, dlls, shared libraries, |
| etc.</p></li><li class="listitem"><p>The <span class="bold"><strong>lib</strong></span> folder to |
| contain jar files. </p></li><li class="listitem"><p>The <span class="bold"><strong>doc </strong></span>folder |
| containing documentation materials, preferably accessible through an |
| index.html.</p></li><li class="listitem"><p>The <span class="bold"><strong>data</strong></span> folder to |
| contain data files (e.g. for testing).</p></li><li class="listitem"><p>The <span class="bold"><strong>conf</strong></span> folder to |
| contain configuration files.</p></li><li class="listitem"><p>The <span class="bold"><strong>resources</strong></span> folder |
| to contain other resources and dependencies.</p></li><li class="listitem"><p>Other user-defined folders or files are allowed, but |
| <span class="emphasis"><em>should be avoided</em></span>. </p></li></ul></div><p> </p></li></ul></div> |
| |
| <p>For more information about the PEAR structure, please refer to the |
| <span class="quote">“<span class="quote">Processing Engine Archive</span>”</span> section. |
| |
| </p><div class="figure"><a name="ugr.tools.pear.packager.fig.pear_structure"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="297"><tr><td><img src="images/tools/tools.pear.packager/image004.jpg" width="297" alt="Pear structure"></td></tr></table></div> |
| </div><p class="title"><b>Figure 9.1. The Pear Structure</b></p></div><p><br class="figure-break"></p> |
| |
| </div> |
| <div class="section" title="9.1.2. Using the PEAR Generation Wizard"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.pear.packager.using_pear_generation_wizard">9.1.2. Using the PEAR Generation Wizard</h3></div></div></div> |
| |
| |
| <p>Before using the PEAR Generation Wizard, add all the files needed to |
| run your component including descriptors, jars, external libraries, resources, |
| and component analysis engines (in the case of an aggregate analysis engine), etc. |
| <span class="emphasis"><em>Do not</em></span> add Jars for the UIMA framework, however. Doing so will |
| cause class loading problems at run time.</p> |
| <p> |
| If you're using a Java IDE like Eclipse, instead of using the output folder (usually |
| <code class="literal">bin</code> as the source of your classes, it's recommended that |
| you generate a Jar file containing these classes.</p> |
| |
| <p>Then, click on <span class="quote">“<span class="quote">Generate PEAR file</span>”</span> from the context menu |
| (right-click) of your project, to open the PEAR Generation wizard, and follow the |
| instructions on the wizard to generate the PEAR file.</p> |
| |
| <div class="section" title="9.1.2.1. The Component Information page"><div class="titlepage"><div><div><h4 class="title" id="ugr.tools.pear.packager.wizard.component_information">9.1.2.1. The Component Information page</h4></div></div></div> |
| |
| |
| <p>The first page of the PEAR generation wizard is the component information |
| page. Specify in this page a component ID for your PEAR and select the main Analysis |
| Engine descriptor. The descriptor must be specified using a pathname relative to |
| the project's root (e.g. <span class="quote">“<span class="quote">desc/MyAE.xml</span>”</span>). The component id |
| is a string that uniquely identifies the component. It should use the JAVA naming |
| convention (e.g. org.apache.uima.mycomponent).</p> |
| |
| <p>Optionally, you can include specific Collection Iterator, CAS Initializer (deprecated |
| as of Version 2.1), |
| or CAS Consumers. In this case, specify the corresponding descriptors in this |
| page. |
| |
| </p><div class="figure"><a name="ugr.tools.pear.packager.fig.wizard.component_information"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.pear.packager/image006.jpg" width="574" alt="Pear Wizard - component information page"></td></tr></table></div> |
| </div><p class="title"><b>Figure 9.2. The Component Information Page</b></p></div><p><br class="figure-break"></p> |
| |
| </div> |
| |
| <div class="section" title="9.1.2.2. The Installation Environment page"><div class="titlepage"><div><div><h4 class="title" id="ugr.tools.pear.packager.wizard.install_environment">9.1.2.2. The Installation Environment page</h4></div></div></div> |
| |
| |
| <p>The installation environment page is used to specify the following: |
| |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="disc" compact><li class="listitem"><p>Preferred operating |
| system</p></li><li class="listitem"><p>Required JDK version, if applicable.</p></li><li class="listitem"><p>Required Environment variable settings. This is where |
| you specify special CLASSPATH paths. You do not need to specify this for |
| any Jar that is listed in the your eclipse project classpath settings; those are automatically |
| put into the generated CLASSPATH. Nor should you include paths to the |
| UIMA Framework itself, here. Doing so may cause class loading problems. |
| </p> |
| |
| <p>CLASSPATH segments are written here using a semicolon ";" as the separator; |
| during PEAR installation, these will be adjusted to be the correct character for the |
| target Operating System.</p> |
| |
| <p>In order to specify the UIMA datapath for your component you have to create an environment |
| variable with the property name <code class="literal">uima.datapath</code>. The value of this property |
| must contain the UIMA datapath settings.</p> |
| |
| </li></ul></div> |
| |
| <p>Path names should be specified using macros (see below), instead of |
| hard-coded absolute paths that might work locally, but probably won't if the |
| PEAR is deployed in a different machine and environment.</p> |
| |
| <p>Macros are variables such as $main_root, used to represent a string such as the |
| full path of a certain directory.</p> |
| |
| <p>These macros should be defined in the PEAR.properties file using the local |
| values. The tools and applications that use and deploy PEAR files should replace |
| these macros (in the files included in the conf and desc folders) with the |
| corresponding values in the local environment as part of the deployment |
| process.</p> |
| |
| <p>Currently, there are two types of macros:</p> |
| |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>$main_root, which represents the local absolute |
| path of the main component root directory after deployment.</p></li><li class="listitem"><p><span class="emphasis"><em>$component_id$root</em></span>, which |
| represents the local absolute path to the root directory of the component which |
| has <span class="emphasis"><em>component_id</em></span> as component ID. This component could |
| be, for instance, a delegate component. </p></li></ul></div> |
| |
| <div class="figure"><a name="ugr.tools.pear.packager.fig.wizard.install_environment"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.pear.packager/image008.jpg" width="574" alt="Pear Wizard - install environment page"></td></tr></table></div> |
| </div><p class="title"><b>Figure 9.3. The Installation Environment Page</b></p></div><br class="figure-break"> |
| |
| </div> |
| |
| <div class="section" title="9.1.2.3. The PEAR file content page"><div class="titlepage"><div><div><h4 class="title" id="ugr.tools.pear.packager.wizard.file_content">9.1.2.3. The PEAR file content page</h4></div></div></div> |
| |
| |
| <p>The last page of the wizard is the <span class="quote">“<span class="quote">PEAR file Export</span>”</span> page, which |
| allows the user to select the files to include in the PEAR file. The metadata folder |
| and all its content is mandatory. Make sure you include all the files needed to run |
| your component including descriptors, jars, external libraries, resources, and |
| component analysis engines (in the case of an aggregate analysis engine), etc. |
| It's recommended to generate a jar file from your code as an alternative to |
| building the project and making sure the output folder (bin) contains the required |
| class files.</p> |
| |
| <p>Eclipse compiles your class files into some output directory, often named |
| "bin" when you take the usual defaults in Eclipse. The recommended practice is to |
| take all these files and put them into a Jar file, perhaps using the Eclipse Export |
| wizard. You would place that Jar file into the PEAR <code class="literal">lib</code> directory.</p> |
| |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you are relying on the class files generated in the output folder |
| (usually called bin) to run your code, then make sure the project is built properly, |
| and all the required class files are generated without errors, and then put the |
| output folder (e.g. $main_root/bin) in the classpath using the option to set |
| environment variables, by setting the CLASSPATH variable to include this folder (see the |
| <span class="quote">“<span class="quote">Installation Environment</span>”</span> page. |
| Beware that using a Java output folder named "bin" in this case is a poor practice, |
| because the PEAR installation |
| tools will presume this folder contains binary executable files, and will adds this folder to |
| the PATH environment variable. |
| </p> </div> |
| <div class="figure"><a name="ugr.tools.pear.packager.fig.wizard.export"></a><div class="figure-contents"> |
| |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="564"><tr><td><img src="images/tools/tools.pear.packager/image010.jpg" width="564" alt="Pear Wizard - File Export Page"></td></tr></table></div> |
| </div><p class="title"><b>Figure 9.4. The PEAR File Export Page</b></p></div><br class="figure-break"> |
| |
| |
| |
| </div> |
| </div> |
| </div> |
| |
| <div class="section" title="9.2. Using the PEAR command line packager"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.using_command_line">9.2. Using the PEAR command line packager</h2></div></div></div> |
| |
| <p>The PEAR command line packager takes some PEAR package parameter settings on the command line to create an |
| UIMA PEAR file.</p> |
| |
| <p>To run the PEAR command line packager you can use the provided runPearPackager (.bat for Windows, and .sh for Unix) |
| scripts. The packager can be used in three different modes.</p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <p>Mode 1: creates a complete PEAR package with the provided information (default mode)</p> |
| <pre class="programlisting">runPearPackager -compID <componentID> |
| -mainCompDesc <mainComponentDesc> [-classpath <classpath>] |
| [-datapath <datapath>] -mainCompDir <mainComponentDir> |
| -targetDir <targetDir> [-envVars <propertiesFilePath>]</pre> |
| <p> The created PEAR file has the file name <componentID>.pear and is located in the <targetDir>.</p> |
| </li><li class="listitem"> |
| <p>Mode 2: creates a PEAR installation descriptor without packaging the PEAR file</p> |
| <pre class="programlisting">runPearPackager -create -compID <componentID> |
| -mainCompDesc <mainComponentDesc> [-classpath <classpath>] |
| [-datapath <datapath>] -mainCompDir <mainComponentDir> |
| [-envVars <propertiesFilePath>]</pre> |
| <p> The PEAR installation descriptor is created in the <mainComponentDir>/metadata directory.</p> |
| </li><li class="listitem"> |
| <p>Mode 3: creates a PEAR package with an existing PEAR installation descriptor</p> |
| <pre class="programlisting">runPearPackager -package -compID <componentID> |
| -mainCompDir <mainComponentDir> -targetDir <targetDir></pre> |
| <p> The created PEAR file has the file name <componentID>.pear and is located in the <targetDir>.</p> |
| </li></ul></div><p> |
| </p> |
| <p>The modes 2 and 3 should be used when you want to manipulate the PEAR installation descriptor before packaging |
| the PEAR file. </p> |
| |
| <p>Some more details about the PearPackager parameters is provided in the list below:</p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <code class="literal"><componentID></code>: PEAR package component ID. |
| </li><li class="listitem"> |
| <code class="literal"><mainComponentDesc></code>: Main component descriptor of the PEAR package. |
| </li><li class="listitem"> |
| <code class="literal"><classpath></code>: PEAR classpath settings. Use $main_root macros to specify |
| path entries. Use <code class="literal">;</code> to separate the entries. |
| </li><li class="listitem"> |
| <code class="literal"><datapath></code>: PEAR datapath settings. Use $main_root macros to specify |
| path entries. Use <code class="literal">;</code> to separate the path entries. |
| </li><li class="listitem"> |
| <code class="literal"><mainComponentDir></code>: Main component directory that contains the PEAR package content. |
| </li><li class="listitem"> |
| <code class="literal"><targetDir></code>: Target directory where the created PEAR file is written to. |
| </li><li class="listitem"> |
| <code class="literal"><propertiesFilePath></code>: Path name to a properties file that contains environment variables that must be |
| set to run the PEAR content. |
| </li></ul></div><p> |
| |
| </p> |
| </div> |
| |
| </div> |
| <div class="chapter" title="Chapter 10. The PEAR Packaging Maven Plugin" id="ugr.tools.pear.packager.maven.plugin.usage"><div class="titlepage"><div><div><h2 class="title">Chapter 10. The PEAR Packaging Maven Plugin</h2></div></div></div> |
| |
| <p> |
| UIMA includes a Maven plugin that supports creating PEAR packages using Maven. |
| When configured for a project, it assumes that the project has the PEAR layout, |
| and will copy the standard directories that are part of a PEAR structure under the |
| project root into the PEAR, excluding files that start with a period ("."). |
| It also will put the Jar that is built for the project |
| into the lib/ directory and include it first on the generated classpath. |
| </p> |
| |
| <p> |
| The classpath that is generated for this includes the artifact's Jar first, any user specified |
| entries second (in the order they are specified), and finally, entries for all Jars |
| found in the lib/ directory (in some arbitrary order). |
| </p> |
| |
| <div class="section" title="10.1. Specifying the PEAR Packaging Maven Plugin"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.maven.plugin.usage.configure">10.1. Specifying the PEAR Packaging Maven Plugin</h2></div></div></div> |
| |
| |
| <p> |
| To use the PEAR Packaging Plugin within a Maven build, |
| the plugin must be added to the plugins section of the |
| Maven POM as shown below: |
| </p> |
| <p> |
| </p><pre class="programlisting"><build> |
| <plugins> |
| ... |
| <plugin> |
| <groupId>org.apache.uima</groupId> |
| <artifactId>PearPackagingMavenPlugin</artifactId> |
| |
| <!-- if version is omitted, then --> |
| <!-- version is inherited from parent's pluginManagement section --> |
| <!-- otherwise, include a version element here --> |
| |
| <!-- says to load Maven extensions |
| (such as packaging and type handlers) from this plugin --> |
| <extensions>true</extensions> |
| <executions> |
| <execution> |
| <phase>package</phase> |
| <!-- where you specify details of the thing being packaged --> |
| <configuration> |
| |
| <classpath> |
| <!-- PEAR file component classpath settings --> |
| $main_root/lib/sample.jar |
| </classpath> |
| |
| <mainComponentDesc> |
| <!-- PEAR file main component descriptor --> |
| desc/${artifactId}.xml |
| </mainComponentDesc> |
| |
| <componentId> |
| <!-- PEAR file component ID --> |
| ${artifactId} |
| </componentId> |
| |
| <datapath> |
| <!-- PEAR file UIMA datapath settings --> |
| $main_root/resources |
| </datapath> |
| |
| </configuration> |
| <goals> |
| <goal>package</goal> |
| </goals> |
| </execution> |
| </executions> |
| </plugin> |
| ... |
| </plugins> |
| </build> |
| </pre><p> |
| </p> |
| |
| <p> |
| To configure the plugin with the specific settings of a PEAR package, the |
| <code class="code"><configuration></code> element section is used. This sections contains all parameters |
| that are used by the PEAR Packaging Plugin to package the right content and set the specific PEAR package settings. |
| The details about each parameter and how it is used is shown below: |
| </p> |
| <p> |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <p> |
| <code class="code"><classpath></code> |
| - This element specifies the classpath settings for the |
| PEAR component. The Jar artifact that is built during the current Maven build is |
| automatically added to the PEAR classpath settings and does not have to be added manually. |
| In addition, all Jars in the lib directory and its subdirectories will be added to the |
| generated classpath when the PEAR is installed. |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> |
| <p>Use $main_root variables to refer to libraries inside |
| the PEAR package. For more details about PEAR packaging please refer to the |
| Apache UIMA PEAR documentation.</p> |
| </div> |
| </li><li class="listitem"> |
| <p> |
| <code class="code"><mainComponentDesc></code> |
| - This element specifies the relative path to the main component descriptor |
| that should be used to run the PEAR content. The path must be relative to the |
| project root. A good default to use is <code class="code">desc/${artifactId}.xml</code>. |
| </p> |
| </li><li class="listitem"> |
| <p> |
| <code class="code"><componentID></code> |
| - This element specifies the PEAR package component ID. A good default |
| to use is <code class="code">${artifactId}</code>. |
| </p> |
| </li><li class="listitem"> |
| <p> |
| <code class="code"><datapath></code> |
| - This element specifies the PEAR package UIMA datapath settings. |
| If no datapath settings are necessary, this element can be omitted. |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> |
| <p>Use $main_root variables to refer libraries inside |
| the PEAR package. For more details about PEAR packaging please refer to the |
| Apache UIMA PEAR documentation.</p> |
| </div> |
| </li></ul></div><p> |
| </p> |
| <p> |
| For most Maven projects it is sufficient to specify the parameters described above. In some cases, for |
| more complex projects, it may be necessary to specify some additional configuration |
| parameters. These parameters are listed below with the default values that are used if they are not |
| added to the configuration section shown above. |
| </p> |
| <p> |
| </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> |
| <p> |
| <code class="code"><mainComponentDir></code> |
| - This element specifies the main component directory where the UIMA |
| nature is applied. By default this parameter points to the project root |
| directory - ${basedir}. |
| </p> |
| </li><li class="listitem"> |
| <p> |
| <code class="code"><targetDir></code> |
| - This element specifies the target directory where the result of the plugin |
| are written to. By default this parameters points to the default Maven output |
| directory - ${basedir}/target |
| </p> |
| </li></ul></div><p> |
| </p> |
| </div> |
| |
| <div class="section" title="10.2. Automatically including dependencies"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.maven.plugin.usage.dependencies">10.2. Automatically including dependencies</h2></div></div></div> |
| |
| |
| <p> |
| A key concept in PEARs is that they allow specifying other Jars in the classpath. |
| You can optionally include these Jars within the PEAR package. |
| </p> |
| <p> |
| The PEAR Packaging Plugin does not take care of automatically |
| adding these Jars (that the PEAR might depend on) to the PEAR archive. |
| However, this |
| behavior can be manually added to your Maven POM. |
| The following two build plugins |
| hook into the build cycle and insure that all runtime |
| dependencies are included in the PEAR file. |
| </p> |
| |
| |
| <p> |
| The dependencies will be automatically included in the |
| PEAR file using this procedure; the pear install process also will automatically |
| adds all files in the lib directory (and sub directories) to the |
| classpath. |
| </p> |
| |
| |
| <p> |
| The <code class="code">maven-dependency-plugin</code> |
| copies the runtime dependencies of the PEAR into the |
| <code class="code">lib</code> folder, which is where the PEAR packaging |
| plugin expects them. |
| </p> |
| |
| <pre class="programlisting"><build> |
| <plugins> |
| ... |
| <plugin> |
| <groupId>org.apache.maven.plugins</groupId> |
| <artifactId>maven-dependency-plugin</artifactId> |
| <executions> |
| <!-- Copy the dependencies to the lib folder for the PEAR to copy --> |
| <execution> |
| <id>copy-dependencies</id> |
| <phase>package</phase> |
| <goals> |
| <goal>copy-dependencies</goal> |
| </goals> |
| <configuration> |
| <outputDirectory>${basedir}/lib</outputDirectory> |
| <overWriteSnapshots>true</overWriteSnapshots> |
| <includeScope>runtime</includeScope> |
| </configuration> |
| </execution> |
| </executions> |
| </plugin> |
| ... |
| </plugins> |
| </build> |
| </pre> |
| |
| <p> |
| The second Maven plug-in hooks into the <code class="code">clean</code> |
| phase of the build life-cycle, and deletes the |
| <code class="code">lib</code> folder. |
| </p> |
| |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> |
| <p> |
| With this approach, the <code class="code">lib</code> folder is |
| automatically created, populated, and removed |
| during the build process. Therefore it should not go into |
| the source control system and neither should you |
| manually place any jars in there. |
| </p> |
| </div> |
| |
| <pre class="programlisting"><build> |
| <plugins> |
| ... |
| <plugin> |
| <artifactId>maven-antrun-plugin</artifactId> |
| <executions> |
| <!-- Clean the libraries after packaging --> |
| <execution> |
| <id>CleanLib</id> |
| <phase>clean</phase> |
| <configuration> |
| <tasks> |
| <delete quiet="true" |
| failOnError="false"> |
| <fileset dir="lib" includes="**/*.jar"/> |
| </delete> |
| </tasks> |
| </configuration> |
| <goals> |
| <goal>run</goal> |
| </goals> |
| </execution> |
| </executions> |
| </plugin> |
| ... |
| </plugins> |
| </build> |
| </pre> |
| |
| </div> |
| |
| <div class="section" title="10.3. Running from the command line"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.maven.plugin.commandline">10.3. Running from the command line</h2></div></div></div> |
| |
| <p> |
| The pear packager can be run as a maven command. To enable this, you have to add the following to your |
| maven settings file: |
| </p><pre class="programlisting"><settings> |
| ... |
| <pluginGroups> |
| <pluginGroup>org.apache.uima</pluginGroup> |
| </pluginGroups></pre><p> |
| To invoke the pear packager using maven, use the command: |
| </p><pre class="programlisting">mvn uima-pear:package <parameters...></pre><p> |
| The settings are the same ones used in the configuration above, specified as -D variables |
| where the variable name is pear.parameterName. |
| For example: |
| </p><pre class="programlisting">mvn uima-pear:package -Dpear.mainComponentDesc=desc/mydescriptor.xml |
| -Dpear.componentId=foo</pre><p> |
| </p> |
| </div> |
| |
| <div class="section" title="10.4. Building the PEAR Packaging Plugin From Source"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.packager.maven.plugin.install.src">10.4. Building the PEAR Packaging Plugin From Source</h2></div></div></div> |
| |
| <p> |
| The plugin code is available in the Apache |
| subversion repository at: |
| <a class="ulink" href="http://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin" target="_top">http://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin</a>. |
| Use the following command line to build it (you will need the Maven build tool, available from Apache): |
| </p> |
| <p> |
| </p><pre class="programlisting">#PearPackagingMavenPlugin> mvn install</pre><p> |
| </p> |
| <p> |
| This maven command will build the tool and install it in your local maven repository, |
| making it available for use by other maven POMs. The plugin version number |
| is displayed at the end of the Maven build as shown in the example below. For this example, the plugin |
| version number is: <code class="code">2.3.0-incubating</code> |
| </p> |
| <p> |
| </p><pre class="programlisting">[INFO] Installing |
| /code/apache/PearPackagingMavenPlugin/target/ |
| PearPackagingMavenPlugin-2.3.0-incubating.jar |
| to |
| /maven-repository/repository/org/apache/uima/PearPackagingMavenPlugin/ |
| 2.3.0-incubating/ |
| PearPackagingMavenPlugin-2.3.0-incubating.jar |
| [INFO] [plugin:updateRegistry] |
| [INFO] -------------------------------------------------------------- |
| [INFO] BUILD SUCCESSFUL |
| [INFO] -------------------------------------------------------------- |
| [INFO] Total time: 6 seconds |
| [INFO] Finished at: Tue Nov 13 15:07:11 CET 2007 |
| [INFO] Final Memory: 10M/24M |
| [INFO] --------------------------------------------------------------</pre><p> |
| </p> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 11. PEAR Installer User's Guide" id="ugr.tools.pear.installer"><div class="titlepage"><div><div><h2 class="title">Chapter 11. PEAR Installer User's Guide</h2></div></div></div> |
| |
| |
| <p>PEAR (Processing Engine ARchive) is a new standard for packaging UIMA compliant |
| components. This standard defines several service elements that should be included in |
| the archive package to enable automated installation of the encapsulated UIMA |
| component. The major PEAR service element is an XML Installation Descriptor that |
| specifies installation platform, component attributes, custom installation |
| procedures and environment variables. </p> |
| |
| <p>The installation of a UIMA compliant component includes 2 steps: (1) installation of |
| the component code and resources in a local file system, and (2) verification of the |
| serviceability of the installed component. Installation of the component code and |
| resources involves extracting component files from the archive (PEAR) package in a |
| designated directory and localizing file references in component descriptors and other |
| configuration files. Verification of the component serviceability is accomplished |
| with the help of standard UIMA mechanisms for instantiating analysis engines. |
| |
| |
| </p><div class="screenshot"> |
| <div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="574"><tr><td><img src="images/tools/tools.pear.installer/image002.jpg" width="574" alt="PEAR Installer GUI"></td></tr></table></div> |
| </div> |
| |
| <p>There are two versions of the PEAR Installer. One is an interactive, GUI-based |
| application which puts up a panel asking for the parameters of the installation; the |
| other is a command line interface version where you pass the parameters needed on the command |
| line itself. To launch the GUI version of the PEAR Installer, use the script in the UIMA bin directory: |
| <code class="code">runPearInstaller.bat</code> or <code class="code">runPearInstaller.sh.</code> |
| The command line is launched using <code class="code">runPearInstallerCli.cmd</code> or |
| <code class="code">runPearInstallerCli.sh.</code></p> |
| |
| <p>The PEAR Installer installs UIMA |
| compliant components (analysis engines) from PEAR packages in a local file system. To |
| install a desired UIMA component the user needs to select the appropriate PEAR file in a |
| local file system and specify the installation directory (optional). If no installation |
| directory is specified, the PEAR file is installed to the current working directory. |
| By default the PEAR packages are not installed directly to the specified installation directory. |
| For each PEAR a subdirectory with the name of the PEAR's ID is created where the PEAR package is |
| installed to. If the PEAR installation directory already exists, the old content is automatically |
| deleted before the new content is installed. During the |
| component installation the user can read messages printed by the installation program in |
| the message area of the application window. If the installation fails, appropriate error |
| message is printed to help identifying and fixing the problem.</p> |
| |
| <p>After the desired UIMA component is successfully installed, the PEAR Installer |
| allows testing this component in the CAS Visual Debugger (CVD) application, which is |
| provided with the UIMA package. The CVD application will load your UIMA component using |
| its XML descriptor file. If the component is loaded successfully, you'll be able to |
| run it either with sample documents provided in the |
| <code class="literal"><UIMA_HOME>/examples/data</code> directory, or with any other |
| sample documents. See <a href="tools.html#ugr.tools.cvd" class="olink">Chapter 5, <i>CAS Visual Debugger</i></a> for more information about the CVD application. |
| Running your component in the CVD application helps to make sure the component will run in |
| other UIMA applications. If the CVD application fails to load or run your component, or |
| throws an exception, you can find more information about the problem in the uima.log file |
| in the current working directory. The log file can be viewed with the CVD.</p> |
| |
| <p>PEAR Installer creates a file named <code class="literal">setenv.txt</code> in the |
| <code class="literal"><component_root>/metadata</code> directory. This file contains |
| environment variables required to run your component in any UIMA application. |
| It also creates a PEAR descriptor (see also <a href="references.html#d5e1" class="olink">UIMA References</a> |
| <a href="references.html#ugr.ref.pear.specifier" class="olink">Section 6.3, “PEAR package descriptor”</a>) |
| file named <code class="literal"><componentID>_pear.xml</code> |
| in the <code class="literal"><component_root></code> directory that can be used to directly run |
| the installed pear file in your application. |
| </p> |
| |
| <p> |
| The metadata/setenv.txt is not read by the UIMA framework anywhere. |
| It's there for use by non-UIMA application code if that code wants to set environment variables. |
| The metadata/setenv.txt is just a "convenience" file duplicating what's in the xml. |
| </p> |
| |
| <p> |
| The setenv.txt file has 2 special variables: the CLASSPATH and the PATH. |
| The CLASSPATH is computed from any supplied CLASSPATH environment variable, |
| plus the jars that are configured in the PEAR structure, including subcomponents. |
| The PATH is similarly computed, using any supplied PATH environment variable plus |
| it includes the "bin" subdirectory of the PEAR structure, if it exists. |
| </p> |
| |
| <p>The command line version of the PEAR installer has one required argument: |
| the path to the PEAR file being installed. A second argument can specify the |
| installation directory (default is the current working directory). |
| An optional argument, one of "-c" or "-check" or "-verify", causes verification to be done |
| after installation, as described above.</p> |
| |
| </div> |
| <div class="chapter" title="Chapter 12. PEAR Merger User's Guide" id="ugr.tools.pear.merger"><div class="titlepage"><div><div><h2 class="title">Chapter 12. PEAR Merger User's Guide</h2></div></div></div> |
| |
| |
| <p>The PEAR Merger utility takes two or more PEAR files and merges their contents, |
| creating a new PEAR which has, in turn, a new Aggregate analysis engine whose delegates are |
| the components from the original files being merged. It does this by (1) copying the |
| contents of the input components into the output component, placing each component into a |
| separate subdirectory, (2) generating a UIMA descriptor for the output Aggregate |
| analysis engine and (3) creating an output PEAR file that encapsulates the output |
| Aggregate.</p> |
| |
| <p>The merge logic is quite simple, and is intended to work for simple cases. More complex |
| merging needs to be done by hand. Please see the Restrictions and Limitations section, |
| below.</p> |
| |
| <p>To run the PearMerger command line utility you can use the runPearMerger scripts (.bat for Windows, and .sh for |
| Unix). The usage of the tooling is shown below:</p> |
| |
| <pre class="programlisting">runPearMerger 1st_input_pear_file ... nth_input_pear_file |
| -n output_analysis_engine_name [-f output_pear_file ]</pre> |
| |
| <p>The first group of parameters are the input PEAR files. No duplicates are allowed |
| here. The <code class="literal">-n</code> parameter is the name of the generated Aggregate |
| Analysis Engine. The optional <code class="literal">-f</code> parameter specifies the name of |
| the output file. If it is omitted, the output is written to |
| <code class="literal">output_analysis_engine_name.pear</code> in the current working directory.</p> |
| |
| <p>During the running of this tool, work files are written to a temporary directory |
| created in the user's home directory.</p> |
| |
| <div class="section" title="12.1. Details of the merging process"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.merger.merge_details">12.1. Details of the merging process</h2></div></div></div> |
| |
| |
| <p>The PEARs are merged using the following steps:</p> |
| |
| <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>A temporary working directory, is created for the |
| output aggregate component.</p></li><li class="listitem"><p>Each input PEAR file is extracted into a separate |
| 'input_component_name' folder under the working directory.</p> |
| </li><li class="listitem"><p>The extracted files are processed to adjust the |
| '$main_root' macros. This operation differs from the PEAR installation |
| operation, because it does not replace the macros with absolute paths.</p> |
| </li><li class="listitem"><p>The output PEAR directory structure, 'metadata' and |
| 'desc' folders under the working directory, are created.</p> |
| </li><li class="listitem"><p>The UIMA AE descriptor for the output aggregate component is built |
| in the 'desc' folder. This aggregate descriptor refers to the input |
| delegate components, specifying 'fixed flow' based on the original |
| order of the input components in the command line. The aggregate descriptor's |
| 'capabilities' and |
| 'operational properties' sections are built based on the input |
| components' specifications.</p></li><li class="listitem"><p>A new PEAR installation descriptor is created in the |
| 'metadata' folder, referencing the new output aggregate descriptor |
| built in the previous step. </p></li><li class="listitem"><p>The content of the temporary output working directory is zipped to |
| created the output PEAR, and then the temporary working directory is deleted. |
| </p></li></ol></div> |
| |
| <p>The PEAR merger utility logs all the operations both to standard console output and |
| to a log file, pm.log, which is created in the current working directory.</p> |
| |
| </div> |
| |
| <div class="section" title="12.2. Testing and Modifying the resulting PEAR"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.merger.testing_modifying_resulting_pear">12.2. Testing and Modifying the resulting PEAR</h2></div></div></div> |
| |
| |
| <p>The output PEAR file can be installed and tested using the PEAR Installer. The |
| output aggregate component can also be tested by using the CVD or DocAnalyzer |
| tools.</p> |
| |
| <p>The PEAR Installer creates Eclipse project files (.classpath and .project) in the |
| root directory of the installer PEAR, so the installed component can be imported into |
| the Eclipse IDE as an external project. Once the component is in the Eclipse IDE, |
| developers may use the Component Descriptor Editor and the PEAR Packager to modify the |
| output aggregate descriptor and re-package the component.</p> |
| |
| </div> |
| <div class="section" title="12.3. Restrictions and Limitations"><div class="titlepage"><div><div><h2 class="title" style="clear: both" id="ugr.tools.pear.merger.restrictions_limitations">12.3. Restrictions and Limitations</h2></div></div></div> |
| |
| |
| <p>The PEAR Merger utility only does basic merging operations, and is limited as |
| follows. You can overcome these by editing the resulting PEAR file or the resulting |
| Aggregate Descriptor.</p> |
| |
| <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>The Merge operation specifies Fixed Flow sequencing |
| for the Aggregate.</p></li><li class="listitem"><p>The merged aggregate does not define any parameters, so the delegate |
| parameters cannot be overridden.</p></li><li class="listitem"><p>No External Resource definitions are generated for the |
| aggregate.</p></li><li class="listitem"><p>No Sofa Mappings are generated for the aggregate.</p> |
| </li><li class="listitem"><p>Name collisions are not checked for. Possible name collisions could |
| occur in the fully-qualified class names of the implementing Java classes, the names |
| of JAR files, the names of descriptor files, and the names of resource bindings or |
| resource file paths.</p></li><li class="listitem"><p>The input and output capabilities are generated based on merging the |
| capabilities from the components (removing duplicates). Capability sets are |
| ignored - only the first of the set is used in this process, and only one set is created |
| for the generated Aggregate. There is no support for merging Sofa |
| specifications.</p></li><li class="listitem"><p>No Indexes or Type Priorities are created for the generated |
| Aggregate. No checking is done to see if the Indexes or Type Priorities of the |
| components conflict or are inconsistent.</p></li><li class="listitem"><p>You can only merge Analysis Engines and CAS Consumers. </p> |
| </li><li class="listitem"><p>Although PEAR file installation descriptors that are being merged |
| can have specific XML elements describing Collection Reader and CAS Consumer |
| descriptors, these elements are ignored during the merge, in the sense that the |
| installation descriptor that is created by the merge does not set these elements. The |
| merge process does not use these elements; the output PEAR's new aggregate only |
| references the merged components' main PEAR descriptor element, as |
| identified by the PEAR element: |
| |
| </p><pre class="programlisting"><SUBMITTED_COMPONENT> |
| <DESC>the_component.xml</DESC>... |
| </SUBMITTED_COMPONENT> |
| </pre> |
| </li></ol></div> |
| |
| </div> |
| |
| </div> |
| </div></body></html> |