| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" |
| "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[ |
| <!ENTITY imgroot "../images/tools/tools.cde/" > |
| <!ENTITY % uimaents SYSTEM "../entities.ent" > |
| %uimaents; |
| ]> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <chapter id="ugr.tools.cde"> |
| <title>Component Descriptor Editor User's Guide</title> |
| <titleabbrev>CDE User's Guide</titleabbrev> |
| |
| <para>The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based |
| interface for creating and editing UIMA XML descriptors. It supports most of the |
| descriptor formats, except the Collection Processing Engine descriptor, the PEAR |
| package descriptor and some remote deployment descriptors.</para> |
| |
| <section id="ugr.tools.cde.launching"> |
| <title>Launching the Component Descriptor Editor</title> |
| |
| <para>Here's how to launch this tool on a descriptor contained in the examples. This |
| presumes you have installed the examples as described in the SDK Installation and Setup |
| chapter.</para> |
| |
| <itemizedlist spacing="compact"><listitem><para>Expand the uimaj-examples |
| project in the Eclipse Navigator or Package Explorer view.</para></listitem> |
| |
| <listitem><para>Within this project, browse to the file |
| descriptors/tutorial/ex1/RoomNumberAnnotator.xml.</para></listitem> |
| |
| <listitem><para>Right-click on this file and select Open With → Component |
| Descriptor Editor. (If this option is not present, check that you installed |
| the plug-ins as described in <olink targetdoc="&uima_docs_overview;" |
| targetptr="ugr.ovv.eclipse_setup.installation"/>. The EMF plug-in is also |
| required.)</para></listitem> |
| |
| <listitem><para>This should open a graphical editor and display the contents of the |
| RoomNumberAnnotator descriptor. </para></listitem></itemizedlist> |
| |
| </section> |
| |
| <section id="ugr.tools.cde.creating_new_ae_descriptor"> |
| <title>Creating a New AE Descriptor</title> |
| |
| <para>A new AE descriptor file may be created by selecting the File → New → |
| Other... menu. This brings up the following dialog: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/> |
| </imageobject> |
| <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the |
| Next > button, the following dialog is displayed. We will cover creating other kinds |
| of components later in the documentation. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.2in" format="JPG" fileref="&imgroot;image004.jpg"/> |
| </imageobject> |
| <textobject><phrase>Screenshot of selecting new UIMA component in Eclipse |
| after pushing Next</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>After entering the appropriate parent folder and file name, and clicking Finish, |
| an initial AE descriptor file is created with the given name, and the descriptor is |
| opened up within the Component Descriptor Editor.</para> |
| |
| <para>At this point, the display inside the Component Descriptor Editor is the same |
| whether one started by creating a new AE descriptor, as in the preceding paragraph, or |
| one merely opened a previously created AE descriptor from, say, the Package Explorer |
| view. We show a previously created AE in the figure below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/> |
| </imageobject> |
| <textobject><phrase>Screenshot of CDE showing overview page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To see all the information shown in the main editor pane with less scrolling, |
| double-click the title tab to toggle between the <quote>full screen</quote> and normal |
| views.</para> |
| |
| <para>It is possible to set the Component Descriptor Editor as the default editor for all |
| .xml files by going to Window → Preferences, and then selecting File Associations |
| on the left, and *.xml on the right, and finally by clicking on Component Descriptor |
| Editor, the Default button and then OK. If AE and Type System descriptors are not the |
| primary .xml files you work with within the Eclipse environment, we recommend not |
| setting the Component Descriptor Editor as your default editor for all .xml files. To |
| open an .xml file using the Component Descriptor Editor, if the Component Descriptor |
| Editor is not set as your default editor, right click on the file in the Package Explorer, |
| or other navigational view, and select Open With → Component Descriptor Editor. |
| This choice is remembered by Eclipse for subsequent open operations.</para> |
| |
| </section> |
| |
| <section id="ugr.tools.cde.pages_within_the_editor"> |
| <title>Pages within the Editor</title> |
| |
| <para>The Component Descriptor Editor follows a standard Eclipse paradigm for these |
| kinds of editors. There are several pages in the editor; each one can be selected, one at a |
| time, by clicking on the bottom tabs. The last page contains the actual XML source file |
| being edited, and is displayed as plain text.</para> |
| |
| <para>The same set of tabs appears at the bottom of each page in the Component Descriptor |
| Editor. The Component Descriptor Editor uses this <quote>multi-page editor</quote> |
| paradigm to give the user a view of conceptually distinct portions of the Descriptor |
| metadata in separate pages. At any point in time the user may click on the Source tab to |
| view the actual XML source. The Component Descriptor Editor is, in a way, just a fancy GUI |
| for editing the XML. The tabs provide quick access to the following pages: Overview, |
| Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes, |
| Resources, and Source. We discuss each of these pages in turn.</para> |
| |
| <section id="ugr.tools.cde.adjusting_display_of_pages"> |
| <title>Adjusting the display of pages</title> |
| |
| <para>Most pages in the editor have a <quote>sash</quote> bar. This is a light gray bar |
| which separates sub-sections of the page. This bar can be dragged with the mouse to |
| adjust how the display area is split between the two sash panes. You can also change the |
| orientation of the Sash so it splits vertically, instead of horizontally, by |
| clicking on the small icons at the top right of the page that look like this: |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width=".7in" format="JPG" fileref="&imgroot;image008.jpg"/> |
| </imageobject> |
| <textobject><phrase>Changing orientation of two window split</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>All of the sections on a page have subtitles, with an indicator to the left which |
| you can click to collapse or expand that particular section. Collapsing sections can |
| sometimes be useful to free up screen area for other sections.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.overview_page"> |
| <title>Overview Page</title> |
| |
| <para>Normally, the first page displayed in the Component Descriptor Editor is the |
| Overview page (the name of the page is shown in the GUI panel at the top left). If there is an |
| error reading and parsing the source, the Source page is shown instead, giving you the |
| opportunity to correct the problem. For many components, the Overview page contains |
| three sections: Implementation Details, Runtime Information and overall |
| Identification Information.</para> |
| |
| <section id="ugr.tools.cde.overview_page.implementation_details"> |
| <title>Implementation Details</title> |
| |
| <para>In the Implementation Details section you specify the Implementation Language |
| and Engine Type. There are two kinds of Engines: Aggregate, and non-Aggregate (also |
| called Primitive). An Aggregate engine is one which is composed of additional |
| component engines and contains no code itself. Several of the pages in the Component |
| Descriptor Editor have different formats, depending on the engine type.</para> |
| |
| </section> |
| <section id="ugr.tools.cde.overview_page.runtime_info"> |
| <title>Runtime Information</title> |
| |
| <para>Runtime information is only applicable for primitive engines and is disabled |
| for aggregates and other kinds of descriptors. This is where you specify the class name of the annotator |
| implementation, if you are doing a Java implementation, or the C++ shared object or DLL name, |
| if you are doing a C++ implementation. Most Analysis Engines will specify that |
| they update the CAS, and that they may be replicated (for performance reasons) when deployed. If |
| a particular Analysis Engine must see every CAS (for instance, if it is counting the |
| number of CASes), then uncheck the <quote>multiple deployment allowed</quote> |
| box. If the Analysis Engine doesn't update the CAS, uncheck the <quote>updates |
| the CAS</quote> box. (Most CAS Consumers do not update the CAS, and this parameter |
| defaults to unchecked for new CAS Consumer descriptors).</para> |
| |
| <para>Analysis engines that are written using the CAS Multiplier APIs |
| (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>) |
| can create additional CASes for analysis. To specify that they |
| do this, check the <quote>returns new artifacts</quote> box.</para> |
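| |
| <para>As a rough sketch, the Overview page settings correspond to elements like the |
| following on the Source page of a primitive Java Analysis Engine. The class name and the |
| identification values are made-up placeholders, and the comments only indicate which |
| check box each element is assumed to reflect:</para> |
| |
| <programlisting><![CDATA[<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier"> |
|   <!-- Implementation Details: language and engine type --> |
|   <frameworkImplementation>org.apache.uima.java</frameworkImplementation> |
|   <primitive>true</primitive> |
|   <!-- Runtime Information: hypothetical annotator class name --> |
|   <annotatorImplementationName>com.example.MyAnnotator</annotatorImplementationName> |
|   <analysisEngineMetaData> |
|     <!-- Overall Identification Information (illustrative values) --> |
|     <name>My Annotator</name> |
|     <description>Illustrative description.</description> |
|     <version>1.0</version> |
|     <vendor>Example Vendor</vendor> |
|     <operationalProperties> |
|       <!-- "updates the CAS" --> |
|       <modifiesCas>true</modifiesCas> |
|       <!-- "multiple deployment allowed" --> |
|       <multipleDeploymentAllowed>true</multipleDeploymentAllowed> |
|       <!-- "returns new artifacts" --> |
|       <outputsNewCASes>false</outputsNewCASes> |
|     </operationalProperties> |
|   </analysisEngineMetaData> |
| </analysisEngineDescription>]]></programlisting> |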
| |
| </section> |
| |
| <section id="ugr.tools.cde.overview_page.overall_id_info"> |
| <title>Overall Identification Information</title> |
| |
| <para>The Name should be a human-readable name that describes this component. The |
| Version, Vendor, and Description fields are optional, and are arbitrary |
| strings.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.aggregate_page"> |
| <title>Aggregate Page</title> |
| |
| <para>For primitive Analysis Engines, Flow Controllers or Collection Processing |
| components, the Aggregate page is not used. For aggregate engines, the page looks like |
| this: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image010.jpg"/> |
| </imageobject> |
| <textobject><phrase>CDE Aggregate page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>On the left we see a list of component engines, and on the right information about the |
| flow. If you hover the mouse over an item in the list of component engines, that |
| engine's description metadata will be shown. If you right-click on one of these |
| items, you get an option to open that delegate descriptor in another editor instance. |
| Any changes you make, however, won't be seen until you close and reopen the editor |
| on the importing file.</para> |
| |
| <para>Engines can be added to the list on the left by clicking the Add button at the bottom of |
| the Component Engine section. This brings up one of the following two dialogs: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.875in" format="JPG" fileref="&imgroot;import-by-location.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding an Analysis Engine to an Aggregate, by location</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>This dialog lets you select |
| a descriptor from your workspace, or browse the file system to select a descriptor. |
| </para> |
| |
| <para>Or, if you have selected to import by name, this dialog is shown: |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.296875in" format="JPG" fileref="&imgroot;import-by-name.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding an Analysis Engine to an Aggregate, by name</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>You can specify that the import should be by name (the name is looked up using both the |
| project's class path and the datapath), or by location. If it is by name, |
| the dialog shows the available XML files on the class path to pick from. If the |
| one you want isn't showing, it isn't on the enclosing Eclipse Java project's |
| class path, nor on the datapath, and one of those needs to be updated to include the |
| path to the resource. If the name picked is |
| <literal>com/company/prod/xyz.xml</literal>, the name in |
| the descriptor will be <quote><literal>com.company.prod.xyz</literal></quote>. |
| The "Browse the file system..." button is disabled when import by name is checked, because |
| the file system is not the source of the imports; rather, the resources on the |
| class path or datapath are.</para> |
| |
| <para> |
| If it is by location, the file reference is converted to a relative reference if |
| possible, in the descriptor.</para> |
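| |
| <para>For orientation, the two import styles would look roughly like this on the Source |
| page. This is a sketch only: the key names and the relative path are assumptions chosen |
| for illustration, while the by-name value follows the |
| <literal>com.company.prod.xyz</literal> example above:</para> |
| |
| <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers> |
|   <!-- Import by name: resolved using the class path and the datapath --> |
|   <delegateAnalysisEngine key="xyz"> |
|     <import name="com.company.prod.xyz"/> |
|   </delegateAnalysisEngine> |
|   <!-- Import by location: stored as a relative reference when possible --> |
|   <delegateAnalysisEngine key="RoomNumberAnnotator"> |
|     <import location="../ex1/RoomNumberAnnotator.xml"/> |
|   </delegateAnalysisEngine> |
| </delegateAnalysisEngineSpecifiers>]]></programlisting> |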
| |
| <para>The final selection at the bottom tells whether or not the selected engine(s) |
| should automatically be added to the end of the flow section (the right section on the |
| Aggregate page). The OK button does not become activated until a descriptor |
| file is selected.</para> |
| |
| <para>To remove an analysis engine from the component engine list, select an engine |
| and click the Remove button, or press the delete key. If the engine is already in the flow |
| list you will be warned that deletion will also delete the specified engine from this |
| list.</para> |
| |
| <section id="ugr.tools.cde.aggregate_page.adding_components_more_than_once"> |
| <title>Adding components more than once</title> |
| |
| <para>Components may be added to the left panel more than once. Each of these components |
| will be given a key which is unique. A typical reason this might be done is to use a |
| component in a flow several times, but have each use be associated with different |
| configuration parameters (different configuration parameters can be associated |
| with each instance).</para> |
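| |
| <para>A minimal sketch of what this might look like in the XML source follows. The |
| descriptor file name and both keys are hypothetical; the keys the editor actually |
| generates may differ, but each one must be unique within the aggregate:</para> |
| |
| <programlisting><![CDATA[<delegateAnalysisEngineSpecifiers> |
|   <!-- The same descriptor imported twice, under two distinct keys --> |
|   <delegateAnalysisEngine key="Tagger"> |
|     <import location="Tagger.xml"/> |
|   </delegateAnalysisEngine> |
|   <delegateAnalysisEngine key="Tagger2"> |
|     <import location="Tagger.xml"/> |
|   </delegateAnalysisEngine> |
| </delegateAnalysisEngineSpecifiers>]]></programlisting> |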
| </section> |
| |
| <section |
| id="ugr.tools.cde.aggregate_page.adding_removing_components_from_flow"> |
| <title>Adding or Removing components in a flow</title> |
| |
| <para>The button between the Component Engines and the Flow List, labeled |
| <literal>&gt;&gt;</literal>, adds a chosen engine to the flow list and the button |
| labeled <literal>&lt;&lt;</literal> removes an engine from the flow list. To add an |
| engine to the flow list you must first select an engine from the left hand list, and then |
| press the <literal>&gt;&gt;</literal> button. Engines may appear any number of |
| times in the flow list. To remove an engine from the flow list, select an engine from the |
| right hand list and press the <literal>&lt;&lt;</literal> button.</para> |
| </section> |
| |
| <section id="ugr.tools.cde.aggregate_page.adding_remote_aes"> |
| <title>Adding remote Analysis Engines</title> |
| |
| <para>There are two ways to add remote engines: add an existing descriptor, which |
| specifies a remote engine (just as if you were adding a non-remote engine) or use the |
| Add Remote button which will create a remote descriptor, save it, and then import it, |
| all in one operation. The Add Remote button enables you to easily specify the |
| information needed to create a Service Client descriptor for a remote AE - one that |
| runs on a different computer connected over the network. The Service Client |
| descriptor is described in <olink targetdoc="&uima_docs_ref;" |
| targetptr="ugr.ref.xml.component_descriptor.service_client"/>. The Add |
| Remote button creates this descriptor, saves it as a file in the workspace, and |
| imports it into the aggregate.</para> |
| |
| <para>Of course, if you already have a Service Client descriptor, you can add it to the |
| set of delegates, just like adding other kinds of analysis engines.</para> |
| |
| <para>After clicking on Add Remote, the following dialog is displayed: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image014.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding a remote client to an aggregate</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To define a remote service you specify the Service Kind, Protocol Service Type, |
| URI and Key. You can also specify a Timeout in milliseconds, used by the SOAP service, |
| and a VNS Host and Port used by the Vinci Service. Just like when one adds an engine from |
| the file system, you have the option of adding the engine to the end of the flow. The |
| Component Descriptor Editor currently only supports Vinci and SOAP services using |
| this dialog.</para> |
| |
| <para>Remote engines are added to the descriptor using the |
| &lt;import ... &gt; syntax. The information you specify here is saved in the Eclipse |
| project as a file, using a generated name, &lt;key-name&gt;.xml, where |
| &lt;key-name&gt; is the name you listed as the Key. Because of this, the key-name must |
| be a valid file name. If you want a different name, you can change the path information |
| in the dialog box.</para> |
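| |
| <para>The generated Service Client descriptor would be expected to resemble the sketch |
| below, shown here for a Vinci service. The service name, host and port are placeholder |
| values, not defaults; consult the Service Client descriptor reference for the |
| authoritative element list:</para> |
| |
| <programlisting><![CDATA[<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier"> |
|   <resourceType>AnalysisEngine</resourceType> |
|   <!-- Hypothetical Vinci service name; for SOAP this would be a URL --> |
|   <uri>com.example.MyRemoteService</uri> |
|   <protocol>Vinci</protocol> |
|   <!-- For a SOAP service, a timeout element (in milliseconds) may be given instead --> |
|   <parameters> |
|     <!-- Location of the Vinci Name Server (placeholder values) --> |
|     <parameter name="VNS_HOST" value="localhost"/> |
|     <parameter name="VNS_PORT" value="9000"/> |
|   </parameters> |
| </uriSpecifier>]]></programlisting> |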
| </section> |
| |
| <section id="ugr.tools.cde.aggregate_page.connecting_to_remote_services"> |
| <title>Connecting to Remote Services</title> |
| |
| <para>If you are using the Vinci protocol, it requires that you specify the location of |
| the Vinci Name Server (an IP address and a Port number). You can specify these in the |
| service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu |
| item: Window → Preferences... → UIMA Preferences. If the remote service |
| is available (up and running), additional operations become possible. For |
| instance, hovering the mouse over the remote descriptor will show the description |
| metadata from the remote service.</para> |
| </section> |
| |
| <section id="ugr.tools.cde.aggregate_page.finding_aes_by_searching"> |
| <title>Finding Analysis Engines by searching</title> |
| |
| <para>The next button that appears between the component engine list and the flow list |
| is the Find AE button. When this button is pressed the following dialog is displayed, |
| which allows one to search for AEs by name, by input or output types, or by a combination |
| of these criteria. This function searches the existing Eclipse workspace for |
| matching *.xml descriptor source files; it does not look inside Jar files. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.3in" format="JPG" fileref="&imgroot;image016.jpg"/> |
| </imageobject> |
| <textobject><phrase>Searching for an AE to add to an aggregate</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The search automatically adds a <quote>match any characters</quote> - style |
| (*) wildcard at the beginning and end of anything entered. Thus, if person is |
| specified for an output type, a <quote>*person*</quote> search is performed. Such a |
| search would match such things as <quote>my.namespace.person</quote> and |
| <quote>person.governmentOfficial.</quote> One can search in all projects or one |
| particular project. The search does an implicit <emphasis>and</emphasis> on all |
| fields which are left non-blank.</para> |
| </section> |
| |
| <section id="ugr.tools.cde.aggregate_page.component_engine_flow"> |
| <title>Component Engine Flow</title> |
| |
| <para>The UIMA SDK currently supports three kinds of sequencing flows: Fixed, |
| CapabilityLanguageFlow (see <olink targetdoc="&uima_docs_ref;" |
| targetptr="ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints.capability_language_flow"/> |
| ), and user-defined. The first two require specification of a linear flow sequence; |
| this linear flow sequence can also be read by a user-defined flow controller (what use |
| is made of it is up to the user-defined flow controller). The Component Engine Flow |
| section allows specification of these items.</para> |
| |
| <para>The pull-down labeled Flow Kind picks between the three flow models. When the |
| user-defined flow is selected, the Browse and Search buttons become enabled to let |
| you pick the flow controller XML descriptor to import. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.8in" format="JPG" fileref="&imgroot;image018.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying flow control</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The key name value is set automatically from the XML descriptor being imported, |
| and enables parameters to be overridden for that descriptor (see following |
| sections).</para> |
| |
| <para>The Up and Down buttons to the right in the Flow section are activated when an |
| engine in the flow is selected. The Up button moves the selected engine up one place in |
| the execution order, and the Down button moves it down one place. Remember that engines |
| can appear multiple times in the flow (or not at all).</para> |
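| |
| <para>As an illustration, a Fixed flow over two delegates, and alternatively a |
| user-defined flow controller import, might appear in the XML source roughly as follows. |
| The delegate keys, the flow controller key and its file name are assumptions made for |
| this sketch:</para> |
| |
| <programlisting><![CDATA[<!-- Inside analysisEngineMetaData: a linear flow listing delegate keys in order --> |
| <flowConstraints> |
|   <fixedFlow> |
|     <node>Tokenizer</node> |
|     <node>NameRecognizer</node> |
|   </fixedFlow> |
| </flowConstraints> |
|  |
| <!-- Inside analysisEngineDescription: a user-defined flow controller --> |
| <flowController key="MyFlowController"> |
|   <import location="MyFlowController.xml"/> |
| </flowController>]]></programlisting> |
| |
| <para>Moving an engine with the Up or Down button corresponds to reordering the |
| <literal>node</literal> elements.</para> |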
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.parm_definition"> |
| <title>Parameters Definition Page</title> |
| |
| <para>There are two pages for parameters: the first one is where parameters are defined, |
| and the second one is where the parameter settings are configured. The first page is the |
| Parameter Definition page and has two alternatives, depending on whether the |
| descriptor is an Aggregate. We start with a description of parameter definitions |
| for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow |
| Controllers. Here is an example: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.2in" format="JPG" fileref="&imgroot;image020.jpg"/> |
| </imageobject> |
| <textobject><phrase>Parameter Definitions - not Aggregate</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The first checkbox at the top simplifies things if you are not using Parameter |
| Groups (see the following section for a discussion of groups). In this case, leave the |
| check box unchecked. The main area shows a list of parameter definitions. Each |
| parameter has a name, which must be unique for this Analysis Engine. The other three |
| attributes specify whether the parameter can have a single or multiple values (an array |
| of values), whether it is Optional or Mandatory, and what value type it can hold |
| (String, Integer, Float, or Boolean).</para> |
| |
| <para>In addition to using the buttons on the right to edit this information, you can |
| double-click a parameter to edit it, or remove (delete) a selected parameter by |
| pressing the delete key. Use the Add button to add a new parameter to the list.</para> |
| |
| <para>Parameters have an additional description field, which you can specify when you |
| add or edit a parameter. To see the value of the description, hover the mouse over the |
| item, as shown in the picture below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.5in" format="JPG" fileref="&imgroot;image022.jpg"/> |
| </imageobject> |
| <textobject><phrase>Parameter description shown in a hover message</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
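| |
| <para>A parameter defined on this page, without groups, would be stored in the descriptor |
| along the lines of the following sketch. The parameter name and description are |
| hypothetical; the remaining elements mirror the attributes listed above:</para> |
| |
| <programlisting><![CDATA[<configurationParameters> |
|   <configurationParameter> |
|     <!-- Must be unique within this Analysis Engine --> |
|     <name>Patterns</name> |
|     <!-- Shown when hovering over the parameter --> |
|     <description>Illustrative description of the parameter.</description> |
|     <!-- String, Integer, Float or Boolean --> |
|     <type>String</type> |
|     <!-- true means the parameter holds an array of values --> |
|     <multiValued>true</multiValued> |
|     <!-- false means the parameter is Optional --> |
|     <mandatory>false</mandatory> |
|   </configurationParameter> |
| </configurationParameters>]]></programlisting> |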
| |
| <section id="ugr.tools.cde.parm_definition.using_groups"> |
| <title>Using groups</title> |
| |
| <para>The group concept for parameters arose from the observation that sets of |
| parameters were sometimes associated with different configuration needs. As an |
| example, you might have an Analysis Engine which needed different configuration |
| based on the language of a document.</para> |
| |
| <para>To use groups, you check the <quote>Use Parameter Groups</quote> box. When you |
| do this, you get the ability to add groups, and to define parameters within these |
| groups. You also get a capability to define <quote>Common</quote> parameters, |
| which are parameters which are defined for all groups. Here is a screen shot showing |
| some parameter groups in use: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.2in" format="JPG" fileref="&imgroot;image024.jpg"/> |
| </imageobject> |
| <textobject><phrase>Using parameter groups</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>You can see the <quote>&lt;Common&gt;</quote> parameters as well as two |
| different sets of groups.</para> |
| |
| <para>The Default Group is an optional specification of what Group to use if the |
| parameter is not available for the group requested.</para> |
| |
| <para>The Search strategy specifies what to do when a parameter is not available for the |
| group requested. It can have the values of None, language_fallback, or |
| default_fallback. These are more fully described in the section <olink |
| targetdoc="&uima_docs_ref;" |
| targetptr="ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration"/> |
| .</para> |
| |
| <para>Groups are added using the Add Group button. Once added, they can be edited or |
| removed, using the buttons to the right, or the standard gestures for editing |
| (double-clicking the item) and removing (pressing the delete key after an item is |
| selected). Removing a group removes all the parameter definitions in the group. If |
| you try to remove the <quote>&lt;Common&gt;</quote> group, it just removes the |
| parameters in the group.</para> |
| |
| <para>Each entry for a group in the table specifies one or more group names. For example, |
| the highlighted entry above specifies two groups: <quote>myNewGroup2</quote> |
| and <quote>mg3</quote>. The parameter definition underneath is considered to be in |
| both groups.</para> |
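| |
| <para>To make the group structure concrete, here is a hedged sketch of how such a |
| declaration might look in the XML source. The group names come from the example above; |
| the parameter names, the chosen default group and the search strategy are assumptions |
| for illustration only:</para> |
| |
| <programlisting><![CDATA[<configurationParameters defaultGroup="myNewGroup2" |
|                          searchStrategy="default_fallback"> |
|   <!-- Common parameters are defined for all groups --> |
|   <commonParameters> |
|     <configurationParameter> |
|       <name>SharedParam</name> |
|       <type>String</type> |
|       <multiValued>false</multiValued> |
|       <mandatory>false</mandatory> |
|     </configurationParameter> |
|   </commonParameters> |
|   <!-- One entry naming two groups; its parameters belong to both --> |
|   <configurationGroup names="myNewGroup2 mg3"> |
|     <configurationParameter> |
|       <name>GroupParam</name> |
|       <type>Integer</type> |
|       <multiValued>false</multiValued> |
|       <mandatory>false</mandatory> |
|     </configurationParameter> |
|   </configurationGroup> |
| </configurationParameters>]]></programlisting> |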
| |
| </section> |
| |
| <section id="ugr.tools.cde.parm_definition.aggregates"> |
| <title>Parameter declarations for Aggregates</title> |
| |
| <para>Aggregates declare parameters, each of which must always override a parameter setting |
| for a component making up the aggregate. They do this using the version of this page |
| which is shown when the descriptor is an Aggregate; here's an example: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image026.jpg"/> |
| </imageobject> |
| <textobject><phrase>Aggregate parameters</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>There is an additional panel shown (on the right) which lists all of the |
| components by their key names, and shows for each of them their defined parameters. To |
| add a new override for one or more of these parameters to the aggregate, select the |
| component parameter you wish to override and push the Create Override button (or, you |
| can just double-click the component parameter). This will automatically add a |
| parameter of the same name (by default – you can change the name if you like) to |
| the aggregate, putting it into the same group(s) (if groups are being used in the |
| component – this is required), and setting the properties of the parameter to |
| match those of the component (this is required).</para> |
| <note><para>If the name of the parameter being added already is in use in the aggregate, |
| and the parameters are not compatible, a new parameter name is generated by suffixing |
| the name with a number. If the parameters are compatible, the selected component |
| parameter is added to the existing aggregate parameter, as an additional override. If |
| you don't want this behavior, but want to have a new name generated in this case, |
| push the Create non-shared Override button instead, or hold down the |
| <quote>shift</quote> key when double clicking the component parameter.</para> |
| |
| <para>The required / optional setting in the aggregate parameter is set to match that of |
| the parameter being overridden. You may want to make an optional delegate parameter |
| required. You can do this by changing that value manually in the source editor view. |
| </para></note> |
| |
| <para>In the above example, the user has just double-clicked the |
| <quote>TypeNames</quote> parameter in the <quote>NameRecognizer</quote> |
| component. This added that parameter to this aggregate under the <quote>&lt;Not in |
| any group&gt;</quote> section – since it wasn't part of a group.</para> |
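| |
| <para>In the XML source, an override of this kind would be expected to look something |
| like the sketch below. The parameter and component names are those of the example; the |
| type, multi-valued and mandatory values are assumed here, since in practice they are |
| copied from the component's own definition:</para> |
| |
| <programlisting><![CDATA[<configurationParameters> |
|   <configurationParameter> |
|     <name>TypeNames</name> |
|     <!-- Assumed properties; they must match the overridden parameter --> |
|     <type>String</type> |
|     <multiValued>true</multiValued> |
|     <mandatory>false</mandatory> |
|     <overrides> |
|       <!-- delegate key, a slash, then the parameter name in that delegate --> |
|       <parameter>NameRecognizer/TypeNames</parameter> |
|     </overrides> |
|   </configurationParameter> |
| </configurationParameters>]]></programlisting> |
| |
| <para>A shared override simply lists more than one <literal>parameter</literal> element |
| inside <literal>overrides</literal>.</para> |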
| |
| <para>Once you have added a parameter definition to the aggregate, you can use the |
| buttons on the right side of the left panel to add additional overrides or remove |
| parameters or their overrides. <phrase |
| id="ugr.tools.cde.parm_definition.removing_groups"> You can also remove |
| groups; removing a group is like removing all the parameter definitions in the |
| group.</phrase></para> |
| |
| <para>In addition to adding one parameter at a time from a component, you can also add all |
| the parameters for a group within a component, or all the parameters in the component, |
| by selecting those items.</para> |
| |
| <para>If you double-click (or push Create Override) the |
| <quote>&lt;Common&gt;</quote> group or a parameter in the &lt;Common&gt; group in |
| a component, a special group is created in the Aggregate consisting of all of the |
| groups in that component, and the overriding parameter (or parameters) are added to |
| that. This is done because each component can have different groups belonging to the |
| Common group notion; the Common group for a component is just shorthand for all the |
| groups in that component.</para> |
| |
| <para>The Aggregate's specification of the default group and search strategy |
| overrides any specifications contained in the components.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.parameter_settings"> |
| <title>Parameter Settings Page</title> |
| |
| <para>The Parameter Settings page is rather straightforward; it is where the user |
| defines parameter settings for their engines. An example of such a page is given below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image028.jpg"/> |
| </imageobject> |
| <textobject><phrase>Parameter settings page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>For single-valued parameters, the user simply types the default value into the |
| Value box on the right hand side. For multi-valued parameters the user should use the |
| Add, Edit and Remove buttons to manage the list of multiple parameter values.</para> |
| |
| <para>Values within groups are shown with each group separately displayed, to allow |
| configuring different values for each group.</para> |
| |
| <para>Values are checked for validity. For Boolean values in a list, use the words |
| <literal>true</literal> or <literal>false</literal>.</para> |
| <note><para>If you specify a value in a single-valued parameter, and then delete all the |
| characters in the value, the CDE will treat this as if you wanted to not specify any setting |
| for this parameter. In order to specify a 0-length string setting for a String-valued
| parameter, you will have to manually edit the XML using the <quote>Source</quote> tab. |
| </para> |
| <para> For array valued parameters, if you remove all of the entries for a particular array |
| parameter setting, the XML will reflect a 0-length array. To change this to an |
| unspecified parameter setting, you will have to manually edit the XML using the |
| <quote>Source</quote> tab. </para></note> |
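| 
| <para>As a rough illustration of what such a manual edit in the
| <quote>Source</quote> tab might look like, the sketch below shows a 0-length
| String setting and a 0-length array setting. The parameter names
| <literal>Prefix</literal> and <literal>Patterns</literal> are hypothetical.
| Removing a whole <literal>nameValuePair</literal> element leaves that parameter
| unspecified.</para>
| 
| <programlisting><![CDATA[<configurationParameterSettings>
|   <!-- hypothetical String parameter set to a 0-length string -->
|   <nameValuePair>
|     <name>Prefix</name>
|     <value>
|       <string></string>
|     </value>
|   </nameValuePair>
|   <!-- hypothetical array parameter set to a 0-length array -->
|   <nameValuePair>
|     <name>Patterns</name>
|     <value>
|       <array/>
|     </value>
|   </nameValuePair>
| </configurationParameterSettings>]]></programlisting>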
| |
| </section> |
| |
| <section id="ugr.tools.cde.type_system"> |
| <title>Type System Page</title> |
| |
| <para>This page declares the type system used by the annotator. For aggregates it is |
| derived by merging the type systems of all constituent AEs. The types used by the AE |
| constitute the language in which the inputs and outputs are described in the |
| Capabilities page and also affect the choice of indexes on the Indexes page. The Type |
| System page looks like the following: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image030.jpg"/> |
| </imageobject> |
| <textobject><phrase>Type System declaration page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Before discussing this page in detail, it is important to note that there are two |
| settings that affect the operation of this page. These are accessed by selecting
| UIMA → Settings (or by going to the Eclipse Window → Preferences → UIMA |
| Preferences) and checking or unchecking one of the following: <quote>Auto generate |
| .java files when defining types</quote> and <quote>Display fully qualified type |
| names.</quote></para> |
| |
| <para id="ugr.tools.cde.auto_jcasgen">When the Auto generate option is checked and the development language for the AE is |
| Java, any time a change is made to a type and the change is saved, the corresponding .java |
| files are generated using the JCasGen tool. The results are stored in the primary source |
| directory defined for the project. The primary source directory is that listed first |
| when you right-click on your project and select Properties → Java Build Path, click
| on the Source tab and look in the list box under the text that reads: <quote>Source folder |
| on build path.</quote> If no source folders are defined, you will get a warning that you |
| have no source folders defined and JCasGen will not be run. (For information on JCasGen |
| see <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>). |
| When JCasGen is run, you can monitor the progress of the generation by observing the |
| status on the Eclipse status line (normally at the bottom of the Eclipse window). |
| JCasGen runs on the fully-merged type system, consisting of the type specification |
| plus any imported type system, plus (for aggregates) the merged type systems of all the |
| components in an aggregate.</para> |
| |
| <warning><para>If the components of the aggregate have different definitions for the same |
| type name, the CDE will show a warning. It is possible to continue past this warning, |
| in which case the CDE will produce the correct |
| Java source files representing the merged types (that is, the |
| type definition that contains all of the features defined on that type by all of your |
| components). However, it is not recommended to use this feature |
| (of having different definitions for the same type name) since it can make it |
| difficult to combine/package your annotator with others. See <olink |
| targetdoc="&uima_docs_ref;" |
| targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information. |
| </para></warning> |
| |
| <note><para>In addition to running automatically, you can manually run JCasGen on the |
| fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from |
| the UIMA pulldown menu: </para></note> |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.2in" format="JPG" fileref="&imgroot;image032.jpg"/> |
| </imageobject> |
| <textobject><phrase>Setting JCasGen options</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot> |
| |
| <para>When <quote>Display fully qualified type names</quote> is left unchecked, the |
| namespace of types is not displayed, i.e. if a fully qualified type name is |
| my.namespace.person, only the abbreviated type name person will be displayed. In the |
| Type page diagram shown above, <quote>Display fully qualified type names</quote> is |
| in fact unchecked.</para> |
| |
| <para>To add, edit, or remove types, use the buttons in the top left section. When
| adding or editing types, fully qualified type names should be used,
| regardless of whether <quote>Display fully qualified type names</quote> is
| checked. Removing or editing a type has a cascading effect: the type
| removal or edit will affect inputs, outputs, indexes and type priorities in the natural
| way.</para>
| |
| <para>When a type is added, this dialog is shown: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.2in" format="JPG" fileref="&imgroot;image034.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding a type</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Type names should be specified using a namespace. The namespace is like a Java |
| package name, and serves to ensure type names are unique. It also serves as the package
| name for the generated JCas classes. The namespace name is the set of names up to the last |
| period in the string.</para> |
| |
| <para>The supertype must be picked from an existing type. The entry field for the |
| supertype supports Eclipse-style content assist. To use it, put the cursor in the |
| supertype field, and type a letter or two of the supertype name (lower case is fine), |
| either starting with the name space, or just with the type name (without the name space), |
| and hold down the Control key and then press the spacebar. When you do this, you can see a |
| list of suitable matching types. You can then type more letters to narrow down your |
| choices, or pick the right entry with the mouse.</para> |
| |
| <para>To see the available types and pick one, press the Browse button. This will show the |
| available types, and as you type letters for the type name (in lower case – |
| capitalization is ignored), the available types that match are narrowed. When |
| you've typed enough to specify the type you want, press Enter. Or you can use the |
| list of matching type names and pick the one you want with the mouse.</para> |
| |
| <para>Once you've added the type, you can add features to it by highlighting the |
| type, and pressing the Add button.</para> |
| |
| <para>If the type being defined is a subtype of uima.cas.String, the Add button allows you |
| to add allowed values for the string, instead of adding features.</para> |
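| 
| <para>In the descriptor source, a string subtype with allowed values might look
| roughly like the following sketch. The type name
| <literal>my.namespace.Color</literal> and its values are hypothetical and are used
| only for illustration.</para>
| 
| <programlisting><![CDATA[<typeDescription>
|   <name>my.namespace.Color</name>
|   <description/>
|   <supertypeName>uima.cas.String</supertypeName>
|   <!-- a subtype of uima.cas.String lists allowed values instead of features -->
|   <allowedValues>
|     <value>
|       <string>red</string>
|       <description/>
|     </value>
|     <value>
|       <string>green</string>
|       <description/>
|     </value>
|   </allowedValues>
| </typeDescription>]]></programlisting>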
| |
| <para>To edit a type or feature, you can double click the entry, or highlight the entry and |
| press the Edit button. To delete a type or feature, you highlight the entry to be deleted, |
| and click the delete button or push the delete key.</para> |
| |
| <para>If the range of a feature is an array or one of the built-in list types, an additional |
| specification allows you to specify if multiple references to the object referenced by |
| this feature are allowed. If they are not allowed, then the XMI serialization of
| instances of this type uses a more efficient format.</para>
| |
| <para>If the range of a feature is an array of Feature Structures, then it is possible to |
| specify an element type for the array. This information is used in the XMI serialization |
| and also by the JCas generation routines to generate more efficient code. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.2in" format="JPG" fileref="&imgroot;image036.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying a Feature Structure</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
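| 
| <para>The following sketch shows how a type with an array-valued feature might
| appear on the Source page, including the element type and the
| multiple-references setting discussed above. The names
| <literal>my.namespace.Person</literal>, <literal>my.namespace.Alias</literal>
| and <literal>aliases</literal> are hypothetical.</para>
| 
| <programlisting><![CDATA[<typeDescription>
|   <!-- the namespace is everything up to the last period: my.namespace -->
|   <name>my.namespace.Person</name>
|   <description/>
|   <supertypeName>uima.tcas.Annotation</supertypeName>
|   <features>
|     <featureDescription>
|       <name>aliases</name>
|       <description/>
|       <rangeTypeName>uima.cas.FSArray</rangeTypeName>
|       <!-- element type for an array of Feature Structures -->
|       <elementType>my.namespace.Alias</elementType>
|       <!-- false allows the more efficient XMI serialization format -->
|       <multipleReferencesAllowed>false</multipleReferencesAllowed>
|     </featureDescription>
|   </features>
| </typeDescription>]]></programlisting>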
| |
| <para>It is also possible to import type systems for inclusion in your descriptor. To do |
| this, use the Type Import panel's <literal>Add...</literal> button. This
| allows you to import a type system descriptor.</para> |
| |
| <para>When importing by name, the name is resolved using the class path for the Eclipse |
| project containing the descriptor file being edited, or by looking up this name in the |
| UIMA DataPath. The DataPath can be set by pushing the Set DataPath button. It will be |
| remembered for this Eclipse project, as a project Property, so you only have to set it |
| once (per project). The value of the DataPath setting is written just like a class path, |
| and can include directories or JAR files, just as is true for class paths.</para> |
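| 
| <para>As a sketch of the two import styles, the fragment below shows one import
| by name and one import by location. Both the name
| <literal>org.example.types.MyTypeSystem</literal> and the relative path are
| hypothetical. For an import by name, the dotted name is looked up on the class
| path or DataPath as a path ending in <literal>.xml</literal>.</para>
| 
| <programlisting><![CDATA[<typeSystemDescription>
|   
| </typeSystemDescription>]]></programlisting>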
| |
| <para>The following dialog allows you to pick one or more files from the Eclipse |
| workspace, or one file (at a time) from the file system: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.5in" format="JPG" fileref="&imgroot;import-chooser.jpg"/> |
| </imageobject> |
| <textobject><phrase>Picking files for importing</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>This is essentially the same dialog as was used to add component engines to an |
| aggregate. To import from a type system descriptor that is not part of your Eclipse |
| workspace, click the Browse the file system... button.</para>
| |
| <para>Imported types are validated, and if OK, they are added to the list in the Imported |
| Type Systems section of the Type System page. Any types they define are merged with the |
| existing type system.</para> |
| |
| <para>Imported types and features which are only defined in imports are shown in the Type |
| System section, but in a grayed-out font; these types cannot be edited here. To change
| them, open up the imported type system descriptor, and change them there.</para> |
| |
| <para>If you hover the mouse over an import specification, it will show more information |
| about the import. If you right-click, it will bring up a context menu that allows opening |
| the imported file in the Editor, if the imported file is part of the Eclipse workspace. |
| Changes you make, however, won't be seen until you close and reopen the editor on |
| the importing file.</para> |
| |
| <para>It is not possible to define types for an aggregate analysis engine. In this case the |
| type system is computed from the component AEs. The Type System information is shown in a |
| grayed-out font.</para> |
| |
| <section id="ugr.tools.cde.type_system.exporting"> |
| <title>Exporting</title> |
| |
| <para>In addition to importing type specifications, you can export as well. When you |
| push the Export... button, the editor will create a new importable XML descriptor for |
| the types in this type system, and change the existing descriptor to import that newly |
| created one. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.75in" format="JPG" fileref="&imgroot;image040.jpg"/> |
| </imageobject> |
| <textobject><phrase>Exporting a type system</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The base file name you type is inserted into the path in the line below |
| automatically. You can change the path where the generated part descriptor is stored |
| by overtyping the lower text box. When you click OK, the new part descriptor will be |
| generated, and the current descriptor will be changed to import that part.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.capabilities"> |
| <title>Capabilities Page</title> |
| |
| <para>Capabilities come in <quote>sets</quote>. You can have multiple sets of |
| capabilities; each one specifies languages supported, plus inputs and outputs of the |
| Analysis Engine. The idea behind having multiple sets is the concept that different |
| inputs can result in different outputs. Many Analysis Engines, though, will probably |
| define just one set of capabilities. A sample Capabilities page is given below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.2in" format="JPG" fileref="&imgroot;image042.jpg"/> |
| </imageobject> |
| <textobject><phrase>Capabilities page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>When defining the capabilities of a primitive analysis engine, input and output |
| types can be any type defined in the type system. When defining the capabilities of an |
| aggregate the inputs must be a subset of the union of the inputs in the constituent |
| analysis engines and the outputs must be a subset of the union of the outputs of the |
| constituent analysis engines.</para> |
| |
| <para>To add a type, first select something in the set you wish to add the type to, and press |
| Add Type. The following dialog appears presenting the user with a list of types which are |
| candidates for additional inputs: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.4in" format="JPG" fileref="&imgroot;image044.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding a type to the capabilities page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Follow the instructions to mark the types as input and / or output (a type can be |
| both). By default, the &lt;all features&gt; flag is set to true. If you want to specify a
| subset of features of a type, read on.</para> |
| |
| <para>When types have features, you can specify what features are input and / or output. A |
| type doesn't have to be an output to have an output feature. For example, an |
| Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to |
| the existing Token types. If no new Token instances were created, it would not be an |
| output Type, but it would have features which are output.</para> |
| |
| <para>To specify features as input and / or output (they can be both), select a type, and |
| press Add. The following dialog box appears: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4in" format="JPG" fileref="&imgroot;image046.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying features as input or output</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To mark a feature as being input and / or output, click the mouse in the input and / or |
| output column for the feature. If you select &lt;all features&gt;, it unmarks any
| individual feature you selected, since &lt;all features&gt; subsumes all the
| features.</para> |
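| 
| <para>The Token example above might be expressed in the descriptor source
| roughly as follows. This is a sketch only; the type
| <literal>my.namespace.Token</literal> and the feature
| <literal>partOfSpeech</literal> are hypothetical names.</para>
| 
| <programlisting><![CDATA[<capabilities>
|   <capability>
|     <inputs>
|       <!-- Token is an input, with all of its features -->
|       <type allAnnotatorFeatures="true">my.namespace.Token</type>
|     </inputs>
|     <outputs>
|       <!-- only a feature is output; no new Token instances are created -->
|       <feature>my.namespace.Token:partOfSpeech</feature>
|     </outputs>
|     <languagesSupported/>
|   </capability>
| </capabilities>]]></programlisting>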
| |
| <para>The Languages part of the capability is where you specify what languages are |
| supported by the Analysis Engine. Supported languages should be listed using either a |
| two letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter |
| ISO-3166 country code. Add a language by selecting Languages and pressing the Add |
| button. The dialog for adding languages is given below. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4in" format="JPG" fileref="&imgroot;image048.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying a language</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The Sofa part of the capability is optional; it allows defining Sofa names that this |
| component uses, and whether they are input (meaning they are created outside of this |
| component, and passed into it), or output (meaning that they are created by this |
| component). Note that a Sofa can be either input or output, but can't be |
| both.</para> |
| |
| <para>To add a Sofa name (which is synonymous with the view name), press the Add Sofa |
| button, and this dialog appears: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.2in" format="JPG" fileref="&imgroot;image050.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying a Sofa name</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
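| 
| <para>For orientation, a capability set that declares Sofas and languages might
| look like the sketch below in the descriptor source. The Sofa names are
| hypothetical, and the language entries simply illustrate the two code formats
| described above.</para>
| 
| <programlisting><![CDATA[<capability>
|   <inputs/>
|   <outputs/>
|   <!-- created outside this component and passed in -->
|   <inputSofas>
|     <sofaName>MyInputSofa</sofaName>
|   </inputSofas>
|   <!-- created by this component -->
|   <outputSofas>
|     <sofaName>MyOutputSofa</sofaName>
|   </outputSofas>
|   <languagesSupported>
|     <language>en</language>
|     <language>en-US</language>
|   </languagesSupported>
| </capability>]]></programlisting>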
| |
| <section id="ugr.tools.cde.capabilities.sofa_name_mapping"> |
| <title>Sofa (and view) name mappings</title> |
| |
| <para>Sofa names, once created, are used in Sofa Mappings. These are optional |
| mappings, done in an aggregate, that specify which Sofas are the same ones but with |
| different names. The Sofa Mappings section is minimized unless you are editing an |
| Aggregate descriptor, and have one or more Sofa names defined for the aggregate. In |
| that case, the Sofa Mappings section will look like this: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.4in" format="JPG" fileref="&imgroot;image052.jpg"/> |
| </imageobject> |
| <textobject><phrase>Sofa mappings</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Here the aggregate has defined two input Sofas, named |
| <quote>MyInputSofa</quote>, and <quote>AnotherSofa</quote>. Any named sofas in |
| the aggregate's capabilities will appear in the Sofa Mapping section, listed |
| either under Inputs or Outputs. Each name in the Mappings has 0 or more delegate |
| (component) sofa names mapped to it. A delegate may have multiple Sofas, as in this |
| example, where the GovernmentOfficialRecognizer delegate has Sofas named |
| <quote>so1</quote> and <quote>so2</quote>.</para> |
| |
| <para>Delegate components may be written as Single-View components. In this case, |
| they have one implicit, default Sofa (<quote>_InitialView</quote>), and to map to |
| it you use the form shown for the <quote>NameRecognizer</quote> – you map to |
| the delegate's key name in the aggregate, without specifying a Sofa name. You |
| can also specify the sofa name explicitly, e.g., |
| NameRecognizer/_InitialView.</para> |
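| 
| <para>In the aggregate's descriptor source, mappings of this kind might be
| written roughly as in the following sketch. The pairing of delegate Sofas to
| aggregate Sofa names shown here is an assumption made for illustration and is
| not taken from the screenshot.</para>
| 
| <programlisting><![CDATA[<sofaMappings>
|   <!-- Single-View delegate: no componentSofaName, so its default Sofa is mapped -->
|   <sofaMapping>
|     <componentKey>NameRecognizer</componentKey>
|     <aggregateSofaName>MyInputSofa</aggregateSofaName>
|   </sofaMapping>
|   <!-- Multi-View delegate: each delegate Sofa is mapped by name -->
|   <sofaMapping>
|     <componentKey>GovernmentOfficialRecognizer</componentKey>
|     <componentSofaName>so1</componentSofaName>
|     <aggregateSofaName>MyInputSofa</aggregateSofaName>
|   </sofaMapping>
|   <sofaMapping>
|     <componentKey>GovernmentOfficialRecognizer</componentKey>
|     <componentSofaName>so2</componentSofaName>
|     <aggregateSofaName>AnotherSofa</aggregateSofaName>
|   </sofaMapping>
| </sofaMappings>]]></programlisting>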
| |
| <para>To add a new mapping, select the Aggregate Sofa name you wish to add the mapping |
| for, and press the Add button. This brings up a window like this, showing all available |
| delegates and their Sofas; select one or more (use the normal multi-select methods) |
| of these and press OK to add them. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image054.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding a Sofa mapping</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To edit an existing mapping, select the mapping and press Edit. This will show the |
| existing mapping with all mapped items <quote>selected</quote>, and other |
| available items unselected. Change the items selected to match what you want, |
| deselecting some, and perhaps selecting others, and press OK.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.indexes"> |
| <title>Indexes Page</title> |
| |
| <para>The Indexes page is where the user declares what indexes and type priority lists are |
| used by the analysis engine. Indexes are used to determine which Feature |
| Structures of a particular type are fetched, using an iterator in the UIMA API. An |
| unpopulated Indexes page is displayed below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.5in" format="JPG" fileref="&imgroot;image056.jpg"/> |
| </imageobject> |
| <textobject><phrase>Index page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Both indexes and type priority lists can have imports. These imports work just like |
| the type system imports, described above. Both indexes and type priority lists can be |
| exported to new component descriptors, using the Export... button, just like the type |
| system export operation described above.</para> |
| |
| <para>The built-in Annotation Index is always present. It is based on the built-in type |
| <literal>uima.tcas.Annotation</literal> and has keys begin (Ascending), end
| (Descending) and TYPE_PRIORITY. There are no built-in type priorities, so this last |
| sort item does not play a role in the index unless type priorities are specified.</para> |
| |
| <para>Type priority may be combined with other keys. Type priorities are defined in the |
| Priority Lists section, using one or more priority lists. A given priority list gives an
| ordering among a group of types. Types that appear higher in the priority list are given
| higher priority; in other words, they sort first when TYPE_PRIORITY is specified as the
| index key. Subtypes of these types are also ordered in a consistent manner, unless |
| overridden by another specific type priority specification. To get the ordering used |
| among all the types, all of the type priority lists are merged. This gives a partial |
| ordering among the types. Ties are resolved in an unspecified fashion. The Component |
| Descriptor Editor checks for incompatible orderings, and informs the user if they |
| exist, so they can be corrected.</para> |
| |
| <para>To create a new index, use the Add Index button in the top left section. This brings up |
| this dialog: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4in" format="JPG" fileref="&imgroot;image058.jpg"/> |
| </imageobject> |
| <textobject><phrase>Adding a new index</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Each index needs a globally unique index name. Every index indexes one CAS type (including |
| its subtypes). If you're using Eclipse 3.2 or later, the entry field for this |
| has content assist (start typing the type name |
| and press Control – Spacebar to get help, or press the Browse button to pick a |
| type).</para> |
| |
| <para>Indexes can be sorted, in which case you need to specify one or more keys to sort on. |
| Sort keys are selected from features whose range type is Integer, Float, or String. Some |
| elements will be disabled if they are not relevant. For instance, if the index kind is |
| <quote>bag</quote>, you cannot provide sort keys. The order of sort keys can be |
| adjusted using the up and down buttons, if necessary.</para> |
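| 
| <para>A sorted index defined through this dialog might appear in the descriptor
| source roughly as sketched below. The index label
| <literal>PersonIndex</literal> and the type
| <literal>my.namespace.Person</literal> are hypothetical; the keys mirror those
| of the built-in Annotation Index described above.</para>
| 
| <programlisting><![CDATA[<fsIndexCollection>
|   <fsIndexes>
|     <fsIndexDescription>
|       <label>PersonIndex</label>
|       <typeName>my.namespace.Person</typeName>
|       <kind>sorted</kind>
|       <keys>
|         <!-- ascending on begin -->
|         <fsIndexKey>
|           <featureName>begin</featureName>
|           <comparator>standard</comparator>
|         </fsIndexKey>
|         <!-- descending on end -->
|         <fsIndexKey>
|           <featureName>end</featureName>
|           <comparator>reverse</comparator>
|         </fsIndexKey>
|         <!-- type priority as the last sort key -->
|         <fsIndexKey>
|           <typePriority/>
|         </fsIndexKey>
|       </keys>
|     </fsIndexDescription>
|   </fsIndexes>
| </fsIndexCollection>]]></programlisting>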
| |
| |
| <note><para>There is usually no need to explicitly declare a Bag index in your descriptor. |
| As of UIMA v2.1, if you do not declare any index for a type (or any of its |
| supertypes), a Bag index will be automatically created. This index is |
| accessed using the <literal>getAllIndexedFS(...)</literal> method defined on the index repository.</para></note> |
| |
| |
| <para>A set index will contain no duplicates of the same type, where a duplicate is defined |
| by the indexing comparator. That is, if you commit two feature structures of the same |
| type that are equal with respect to the indexing comparator, only the first one will be |
| entered into the index. Note that you can still have duplicates with respect to the |
| indexing order, if they are of a different type. A set index is not guaranteed to be |
| sorted. If no keys are specified for a set index, then all instances are considered by |
| default to be equal, so only the first instance (for a particular type or subtype of the |
| type being indexed) is indexed. On the other hand, <quote>bag</quote> indicates that |
| all annotation instances are indexed, including duplicates.</para> |
| |
| <para>The Priority Lists section of the Indexes page is used to specify Priority Lists of |
| types. Priority Lists are unnamed ordered sets of type names. Add a new priority list by |
| clicking the Add Set button. Add a type to an existing priority list by first selecting |
| the set, and then clicking Add. You can use the up and down buttons to adjust the order as |
| necessary; these buttons move the selected item up or down.</para> |
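| 
| <para>As a sketch, two priority lists might be written in the descriptor source
| as follows. All type names here are hypothetical. Within each list, types that
| appear earlier have higher priority.</para>
| 
| <programlisting><![CDATA[<typePriorities>
|   <priorityList>
|     <type>my.namespace.Sentence</type>
|     <type>my.namespace.Token</type>
|   </priorityList>
|   <priorityList>
|     <type>my.namespace.Person</type>
|     <type>my.namespace.Title</type>
|   </priorityList>
| </typePriorities>]]></programlisting>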
| |
| <para>Although it is possible to import self-contained index and type priority files, |
| the creation of such files is not yet supported by the Component Descriptor Editor. If |
| you create these files using another editor, they can be imported using the |
| corresponding Import panels, shown on the right. Imports are specified in the same |
| manner as they are for Type System imports.</para> |
| |
| </section> |
| |
| <section id="ugr.tools.cde.resources"> |
| <title>Resources Page</title> |
| |
| <para>The Resources page describes resource dependencies (for primitive Analysis
| Engines) and external resource specifications and their bindings to the resource
| dependencies.</para>
| |
| <para>Only primitive Analysis Engines define resource dependencies. Primitive and |
| Aggregate Analysis Engines can define external resources and connect them (bind them) |
| to resource dependencies.</para> |
| |
| <para>When an Aggregate is providing an external resource to be bound to a dependency, the |
| binding is specified using a possibly multi-level path, starting at the Aggregate, and |
| specifying which component (by its key name), and then if that component is, in turn, an
| Aggregate, which component (again by its key name), and so on until you reach a |
| primitive. The sequence of key names is made into the binding specification by joining |
| the parts with a <quote>/</quote> character. All of this is done for you by the Component |
| Descriptor Editor.</para> |
| |
| <para>Any external resource provided by an Aggregate will override any binding provided |
| by any lower level component for the same resource dependency.</para> |
| |
| <para>There are two views of the Resources page, depending on whether the Analysis Engine |
| is an Aggregate or Primitive. Here's the view for a Primitive: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5in" format="JPG" fileref="&imgroot;image060.jpg"/> |
| </imageobject> |
| <textobject><phrase>Resources page for a primitive</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To declare a resource dependency, click the Add button in the right hand panel. This |
| puts up the dialog: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4in" format="JPG" fileref="&imgroot;image062.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying a resource dependency</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The Key must be unique within the descriptor declaring it. The Interface, if |
| present, is the name of a Java interface the Analysis Engine uses to access the |
| resource.</para> |
| |
| <para>Declare actual external resources on the left side of the page. Clicking
| <quote>Add</quote> brings up this dialog: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.2in" format="JPG" fileref="&imgroot;image064.jpg"/> |
| </imageobject> |
| <textobject><phrase>Specifying an External Resource</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>The Name must be unique within this Analysis Engine. The URL identifies a file |
| resource. If both the URL and URL suffix are used, the file resource is formed by |
| combining the first URL part with the language-identifier, followed by the URL suffix; |
| see <olink targetdoc="&uima_docs_ref;" |
| targetptr="ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration"/>.
| URLs may be written as <quote>relative</quote> URLs; in this case they are resolved by
| looking them up relative to the classpath and/or datapath. A relative URL has the path
| part starting without an initial <quote>/</quote>; for example:
| file:my/directory/file. An absolute URL starts with file:/ or file:/// or
| file://some.network.address/. For more information about URLs, please read the
| Javadoc information for the Java class <quote>URL</quote>.</para>
| |
| <para>The Implementation is optional, and if given, must be a Java class that implements |
| the interface specified in any Resource Dependencies this resource is bound |
| to.</para> |
| |
| <section id="ugr.tools.cde.resources.binding"> |
| <title>Binding</title> |
| |
| <para>Once you have an external resource definition, and a Resource Dependency, you |
| can bind them together. To do this, you select the two things (an external resource |
| definition, and a Resource Dependency) that you want to bind together, and click |
| Bind.</para> |
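| 
| <para>Put together, a dependency, an external resource definition and the
| binding between them might look roughly like the sketch below in a primitive's
| descriptor source. The key, resource name, interface and implementation class
| are all hypothetical; the file URL reuses the relative URL example given
| above.</para>
| 
| <programlisting><![CDATA[<externalResourceDependencies>
|   <externalResourceDependency>
|     <key>Dictionary</key>
|     <description/>
|     <interfaceName>org.example.DictionaryResource</interfaceName>
|     <optional>false</optional>
|   </externalResourceDependency>
| </externalResourceDependencies>
| 
| <resourceManagerConfiguration>
|   <externalResources>
|     <externalResource>
|       <name>MyDictionaryFile</name>
|       <description/>
|       <fileResourceSpecifier>
|         <fileUrl>file:my/directory/file</fileUrl>
|       </fileResourceSpecifier>
|       <!-- optional; must implement the interface of any bound dependency -->
|       <implementationName>org.example.DictionaryResource_impl</implementationName>
|     </externalResource>
|   </externalResources>
|   <externalResourceBindings>
|     <!-- binds the dependency key to the resource name -->
|     <externalResourceBinding>
|       <key>Dictionary</key>
|       <resourceName>MyDictionaryFile</resourceName>
|     </externalResourceBinding>
|   </externalResourceBindings>
| </resourceManagerConfiguration>]]></programlisting>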
| |
| </section> |
| |
| <section id="ugr.tools.cde.resources.aggregates"> |
| |
| <title>Resources with Aggregates</title> |
| |
| <para>When editing an Aggregate Descriptor, the Resource definitions panel will show |
| all the resources at the primitive level, with paths down through the components |
| (multiple levels, if needed) to get to the primitives. The Aggregate can define |
| external resources, and bind them to one or more uses by the primitives.</para> |
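| 
| <para>A binding made at the Aggregate level might then look like the following
| sketch, where the key is the path of delegate key names joined with
| <quote>/</quote>. The delegate key <literal>NameRecognizer</literal>, the
| dependency key <literal>Dictionary</literal> and the resource name are
| assumptions for illustration.</para>
| 
| <programlisting><![CDATA[<externalResourceBindings>
|   <externalResourceBinding>
|     <!-- delegate key name, then the dependency key declared by the primitive -->
|     <key>NameRecognizer/Dictionary</key>
|     <resourceName>SharedDictionaryFile</resourceName>
|   </externalResourceBinding>
| </externalResourceBindings>]]></programlisting>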
| |
| </section> |
| |
| <section id="ugr.tools.cde.resources.imports_exports"> |
| <title>Imports and Exports</title> |
| |
| <para>Resource definitions and their bindings can be imported, just like other |
| imports. Existing Resource definitions and their bindings can be exported to a new |
| importable part, and replaced with an import for that importable part, using the |
| <quote>Export...</quote> button, just like the similar function on the Type System |
| page.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.source"> |
| <title>Source Page</title> |
| |
| <para>The Source page is a text view of the XML content of the Analysis Engine or Type System
| being configured. An example of this page is displayed below: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.7in" format="JPG" fileref="&imgroot;image066.jpg"/> |
| </imageobject> |
| <textobject><phrase>Source page</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Changes made in the GUI are immediately reflected in the XML source, and changes
| made in the XML source are immediately reflected back in the GUI. The thought here is that
| the GUI view and the Source view are just two ways of looking at the same data. When the data |
| is in an unsaved state the file name is prefaced with an asterisk in the currently |
| selected file tab in the editor pane inside Eclipse (as in the example above).</para> |
| |
| <para>You may accidentally create invalid descriptors or XML by editing directly in the |
| Source view. If you do this, when you try to save or when you switch to a different view,
| the error will be detected and reported. In the case of saving, the file will be saved, |
| even if it is in an error state.</para> |
| |
| <section id="ugr.tools.cde.source.formatting"> |
| <title>Source formatting – indentation</title> |
| |
| <para>The XML is indented using an indentation amount saved as a global UIMA |
| preference. To change this preference, use the Eclipse menu item: Window →
| Preferences → UIMA Preferences.</para> |
| |
| </section> |
| </section> |
| |
| <section id="ugr.tools.cde.creating_self_contained_type_system"> |
| <title>Creating a Self-Contained Type System</title> |
| |
| <para>It is also possible to use the Component Descriptor Editor to create or edit |
| self-contained type systems. To create a self-contained type system, select the menu |
| item File → New → Other and then select Type System Descriptor File. From the |
| next page of the selection wizard specify a Parent Folder and File name and click Finish. |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.5in" format="JPG" fileref="&imgroot;image068.jpg"/> |
| </imageobject> |
| <textobject><phrase>Working with a self-contained type system</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot> |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.5in" format="JPG" fileref="&imgroot;image070.jpg"/> |
| </imageobject> |
| <textobject><phrase></phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>This opens a version of the Component Descriptor Editor for editing a type |
| system file. It contains just three pages: an overview page, a type system page, and a |
| source page. The overview page is sparser than the one for an AE. It looks like |
| the following: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="3.7in" format="JPG" fileref="&imgroot;image072.jpg"/> |
| </imageobject> |
| <textobject><phrase>Editing a type system object</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>Just as an AE has an associated name, version, vendor and description, so does |
| a self-contained type system. The Type System page is identical to that in an AE |
| descriptor file, as is the Source page. Note that a self-contained type system can |
| import other type systems, just like the type system associated with an AE.</para> |
| |
| <para>A type system component can also be created from an existing descriptor which |
| contains a type system definition section, by clicking on the Export... button on the |
| Type System page.</para> |
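| |
| <para>As a rough illustration, the following is a minimal sketch of what a |
| self-contained type system descriptor might look like on the Source page. The |
| descriptor name, vendor, imported file name, and the type |
| <literal>org.example.Token</literal> with its <literal>partOfSpeech</literal> |
| feature are hypothetical placeholders and are not part of the UIMA examples.</para> |
| |
| <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?> |
| <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier"> |
|   <!-- Name, description, version and vendor are edited on the Overview page --> |
|   <name>ExampleTypeSystem</name> |
|   <description>Hypothetical type system used only for illustration.</description> |
|   <version>1.0</version> |
|   <vendor>Example Vendor</vendor> |
|   <!-- A self-contained type system may import other type systems --> |
|    |
|   <!-- Types and features are edited on the Type System page --> |
|   <types> |
|     <typeDescription> |
|       <name>org.example.Token</name> |
|       <description>A hypothetical token annotation.</description> |
|       <supertypeName>uima.tcas.Annotation</supertypeName> |
|       <features> |
|         <featureDescription> |
|           <name>partOfSpeech</name> |
|           <description>Hypothetical part-of-speech tag.</description> |
|           <rangeTypeName>uima.cas.String</rangeTypeName> |
|         </featureDescription> |
|       </features> |
|     </typeDescription> |
|   </types> |
| </typeSystemDescription>]]></programlisting> |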
| |
| </section> |
| |
| <section id="ugr.tools.cde.creating_other_descriptor_components"> |
| <title>Creating Other Descriptor Components</title> |
| |
| <para>The New wizard can create several other kinds of components: Collection |
| Processing Management (CPM) components, flow controllers, and importable parts. |
| Besides Type Systems, described above, the importable parts are Indexes, Type |
| Priorities, and Resource Manager Configuration imports.</para> |
| |
| <para>The CPM components supported by this editor include the Collection Reader, CAS |
| Initializer, and CAS Consumer descriptors. Each of these is treated much like a |
| primitive AE descriptor, with small changes to accommodate the different |
| semantics. For instance, a CAS Consumer can't declare in its capabilities |
| section that it outputs types or features.</para> |
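| |
| <para>As a sketch of that restriction, the capabilities section of a CAS Consumer |
| descriptor might look like the fragment below when viewed on the Source page: inputs |
| are declared, while the outputs element stays empty. The type name |
| <literal>org.example.Token</literal> is a hypothetical placeholder.</para> |
| |
| <programlisting><![CDATA[<capabilities> |
|   <capability> |
|     <!-- A CAS Consumer declares only what it reads from the CAS --> |
|     <inputs> |
|       <type allAnnotatorFeatures="true">org.example.Token</type> |
|     </inputs> |
|     <!-- No output types or features can be declared for a CAS Consumer --> |
|     <outputs/> |
|     <languagesSupported/> |
|   </capability> |
| </capabilities>]]></programlisting> |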
| |
| <para>Flow controllers are components that control the flow of CASes within an |
| aggregate, and are edited in much the same way as a primitive Analysis Engine.</para> |
| |
| <para>Editing an importable part requires context information, because much of the |
| power of this editor comes from extensive checking that needs more information than |
| the importable part alone contains. For instance, when you create or edit an Indexes |
| import, the facility for adding new indexes needs type information, which is not |
| present in the part when it is edited alone.</para> |
| |
| <para>To overcome this, when you edit one of these descriptors you are asked to |
| specify a context descriptor. This is usually a descriptor that imports the part being |
| edited and therefore has the additional information needed.</para> |
| |
| <para>Various methods are used to guess what the context descriptor should be. If the |
| guess is correct, you can just press the Enter key to confirm. The last successful |
| context file is remembered and will be suggested as the context file to use at the |
| next edit session.</para> |
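| |
| <para>To illustrate why a context descriptor is needed, consider the following sketch |
| of an Indexes importable part. The index refers to a type by name, but the part itself |
| contains no type definitions, so the editor can only check the reference against the |
| type system supplied by the context descriptor. The collection name, index label, and |
| the type <literal>org.example.Token</literal> are hypothetical placeholders.</para> |
| |
| <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?> |
| <fsIndexCollection xmlns="http://uima.apache.org/resourceSpecifier"> |
|   <name>ExampleIndexes</name> |
|   <fsIndexes> |
|     <fsIndexDescription> |
|       <label>TokenIndex</label> |
|       <!-- This type is defined elsewhere, for example in the context descriptor --> |
|       <typeName>org.example.Token</typeName> |
|       <kind>sorted</kind> |
|       <keys> |
|         <fsIndexKey> |
|           <featureName>begin</featureName> |
|           <comparator>standard</comparator> |
|         </fsIndexKey> |
|       </keys> |
|     </fsIndexDescription> |
|   </fsIndexes> |
| </fsIndexCollection>]]></programlisting> |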
| </section> |
| </chapter> |