// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[ugr.tools.cde]]
= Component Descriptor Editor User's Guide
// <titleabbrev>CDE User's Guide</titleabbrev>
The Component Descriptor Editor is an Eclipse plug-in that provides a forms-based interface for creating and editing UIMA XML descriptors.
It supports most of the descriptor formats, except the Collection Processing Engine descriptor, the PEAR package descriptor and some remote deployment descriptors.
[[ugr.tools.cde.launching]]
== Launching the Component Descriptor Editor
Here's how to launch this tool on a descriptor contained in the examples.
This presumes you have installed the examples as described in the SDK Installation and Setup chapter.
* Expand the uimaj-examples project in the Eclipse Navigator or Package Explorer view
* Within this project, browse to the file descriptors/tutorial/ex1/RoomNumberAnnotator.xml.
* Right-click on this file and select Open With →Component Descriptor Editor. (If this option is not present, check to make sure you xref:oas.adoc#ugr.ovv.eclipse_setup.installation[installed the plug-ins]. The EMF plugin is also required.)
* This should open a graphical editor and display the contents of the RoomNumberAnnotator descriptor.
[[ugr.tools.cde.creating_new_ae_descriptor]]
== Creating a New AE Descriptor
A new AE descriptor file may be created by selecting the File →New →Other... menu.
This brings up the following dialog:
.Screenshot of selecting new UIMA component in Eclipse
image::images/tools/tools.cde/image002.jpg[Screenshot of selecting new UIMA component in Eclipse]
If the user then selects UIMA and Analysis Engine Descriptor File, and clicks the _Next_ button, the following dialog is displayed.
We will cover creating other kinds of components later in the documentation.
.Screenshot of selecting new UIMA component in Eclipse after pushing Next
image::images/tools/tools.cde/image004.jpg[Screenshot of selecting new UIMA component in Eclipse after pushing Next]
After entering the appropriate parent folder and file name, and clicking Finish, an initial AE descriptor file is created with the given name, and the descriptor is opened up within the Component Descriptor Editor.
At this point, the display inside the Component Descriptor Editor is the same whether one started by creating a new AE descriptor, as in the preceding paragraph, or one merely opened a previously created AE descriptor from, say, the Package Explorer view.
We show a previously created AE in the figure below:
.Screenshot of CDE showing overview page
image::images/tools/tools.cde/image006.jpg[Screenshot of CDE showing overview page]
To see all the information shown in the main editor pane with less scrolling, double click the title tab to toggle between the "`full screen`" and normal views.
To make the Component Descriptor Editor the default editor for all .xml files, go to Window →Preferences, select File Associations on the left and *.xml on the right, then click Component Descriptor Editor, click the Default button, and click OK.
If AE and Type System descriptors are not the primary .xml files you work with within the Eclipse environment, we recommend not setting the Component Descriptor Editor as your default editor for all .xml files.
To open an .xml file using the Component Descriptor Editor, if the Component Descriptor Editor is not set as your default editor, right click on the file in the Package Explorer, or other navigational view, and select Open With →Component Descriptor Editor.
This choice is remembered by Eclipse for subsequent open operations.
[[ugr.tools.cde.pages_within_the_editor]]
== Pages within the Editor
The Component Descriptor Editor follows a standard Eclipse paradigm for these kinds of editors.
There are several pages in the editor; each one can be selected, one at a time, by clicking on the bottom tabs.
The last page contains the actual XML source file being edited, and is displayed as plain text.
The same set of tabs appears at the bottom of each page in the Component Descriptor Editor.
The Component Descriptor Editor uses this "`multi-page editor`" paradigm to give the user a view of conceptually distinct portions of the Descriptor metadata in separate pages.
At any point in time the user may click on the Source tab to view the actual XML source.
The Component Descriptor Editor is, in a way, just a fancy GUI for editing the XML.
The tabs provide quick access to the following pages: Overview, Aggregate, Parameters, Parameter Settings, Type System, Capabilities, Indexes, Resources, and Source.
We discuss each of these pages in turn.
[[ugr.tools.cde.adjusting_display_of_pages]]
=== Adjusting the display of pages
Most pages in the editor have a "`sash`" bar.
This is a light gray bar which separates sub-sections of the page.
This bar can be dragged with the mouse to adjust how the display area is split between the two sash panes.
You can also change the orientation of the Sash so it splits vertically, instead of horizontally, by clicking on the small icons at the top right of the page that look like this:
.Changing orientation of two window split
image::images/tools/tools.cde/image008.jpg[Changing orientation of two window split]
All of the sections on a page have subtitles, with an indicator to the left which you can click to collapse or expand that particular section.
Collapsing sections can sometimes be useful to free up screen area for other sections.
[[ugr.tools.cde.overview_page]]
== Overview Page
Normally, the first page displayed in the Component Descriptor Editor is the Overview page (the name of the page is shown in the GUI panel at the top left). If there is an error reading and parsing the source, the Source page is shown instead, giving you the opportunity to correct the problem.
For many components, the Overview page contains three sections: Implementation Details, Runtime Information and overall Identification Information.
[[ugr.tools.cde.overview_page.implementation_details]]
=== Implementation Details
In the Implementation Details section you specify the Implementation Language and Engine Type.
There are two kinds of Engines: Aggregate, and non-Aggregate (also called Primitive). An Aggregate engine is one which is composed of additional component engines and contains no code itself.
Several of the pages in the Component Descriptor Editor have different formats, depending on the engine type.
[[ugr.tools.cde.overview_page.runtime_info]]
=== Runtime Information
Runtime information is only applicable for primitive engines and is disabled for aggregates and other kinds of descriptors.
This is where you specify the class name of the annotator implementation, if you are doing a Java implementation, or the C\++ shared object or dll name, if you are doing a C++ implementation.
Most Analysis Engines will specify that they update the CAS, and that they may be replicated (for performance reasons) when deployed.
If a particular Analysis Engine must see every CAS (for instance, if it is counting the number of CASes), then uncheck the "`multiple deployment allowed`" box.
If the Analysis Engine doesn't update the CAS, uncheck the "`updates the CAS`" box.
(Most CAS Consumers do not update the CAS, and this parameter defaults to unchecked for new CAS Consumer descriptors).
Analysis engines written using the xref:tug.adoc#-ugr.tug.cm[CAS Multiplier APIs] can create additional CASes for analysis.
To specify that they do this, check the "`returns new artifacts`" box.
[[ugr.tools.cde.overview_page.overall_id_info]]
=== Overall Identification Information
The Name should be a human-readable name that describes this component.
The Version, Vendor, and Description fields are optional, and are arbitrary strings.
[[ugr.tools.cde.aggregate_page]]
== Aggregate Page
For primitive Analysis Engines, Flow Controllers or Collection Processing components, the Aggregate page is not used.
For aggregate engines, the page looks like this:
.CDE Aggregate page
image::images/tools/tools.cde/image010.jpg[CDE Aggregate page]
On the left we see a list of component engines, and on the right information about the flow.
If you hover the mouse over an item in the list of component engines, that engine's description metadata will be shown.
If you right-click on one of these items, you get an option to open that delegate descriptor in another editor instance.
Any changes you make, however, won't be seen until you close and reopen the editor on the importing file.
Engines can be added to the list on the left by clicking the Add button at the bottom of the Component Engine section.
This brings up one of the following two dialogs:
.Adding an Analysis Engine to an Aggregate, by location
image::images/tools/tools.cde/import-by-location.jpg["Adding an Analysis Engine to an Aggregate, by location"]
This dialog lets you select a descriptor from your workspace, or browse the file system to select a descriptor.
Or, if you have selected to import by name, this dialog is shown:
.Adding an Analysis Engine to an Aggregate, by name
image::images/tools/tools.cde/import-by-name.jpg["Adding an Analysis Engine to an Aggregate, by name"]
You can specify that the import should be by Name (the name is looked up using both the Project's class path, and DataPath), or by location.
If it is by name, the dialog shows the available xml files on the class path, to pick from.
If the one you want isn't showing, this means it isn't on the enclosing Eclipse Java Project's classpath, nor on the datapath, and one of those needs to be updated to include the path to the resource.
If the name picked is ``com/company/prod/xyz.xml``, the name in the descriptor will be "``com.company.prod.xyz``".
The "Browse the file system..." button is disabled when import by name is checked, because the imports then come from the resources on the classpath or datapath rather than from the file system.
If it is by location, the file reference is converted to a relative reference if possible, in the descriptor.
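
The two kinds of reference described above can be sketched in plain Java. This is only an illustration of the conversions, not the editor's actual code: the class `ImportReferences` and its method names are hypothetical, and the sketch assumes the descriptor file name ends in `.xml` and that the aggregate descriptor has a parent folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only; not part of the CDE or UIMA API.
final class ImportReferences {

    // Turns a classpath-relative resource path into a dotted import name,
    // for example "com/company/prod/xyz.xml" into "com.company.prod.xyz".
    static String toImportName(String resourcePath) {
        String withoutExtension = resourcePath.endsWith(".xml")
                ? resourcePath.substring(0, resourcePath.length() - ".xml".length())
                : resourcePath;
        return withoutExtension.replace('/', '.');
    }

    // Expresses the delegate descriptor's location relative to the folder
    // that holds the importing (aggregate) descriptor.
    static String toRelativeLocation(Path aggregateDescriptor, Path delegateDescriptor) {
        Path baseFolder = aggregateDescriptor.getParent();
        return baseFolder.relativize(delegateDescriptor).toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(toImportName("com/company/prod/xyz.xml"));
        System.out.println(toRelativeLocation(
                Paths.get("desc/agg/Aggregate.xml"),
                Paths.get("desc/prim/Room.xml")));
    }
}
```
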
The final selection at the bottom tells whether or not the selected engine(s) should automatically be added to the end of the flow section (the right section on the Aggregate page). The OK button does not become activated until a descriptor file is selected.
To remove an analysis engine from the component engine list simply select an engine and click the Remove button, or press the delete key.
If the engine is already in the flow list you will be warned that deletion will also delete the specified engine from this list.
[[ugr.tools.cde.aggregate_page.adding_components_more_than_once]]
=== Adding components more than once
Components may be added to the left panel more than once.
Each of these components will be given a key which is unique.
A typical reason this might be done is to use a component in a flow several times, but have each use be associated with different configuration parameters (different configuration parameters can be associated with each instance).
[[ugr.tools.cde.aggregate_page.adding_removing_components_from_flow]]
=== Adding or Removing components in a flow
The button in-between the Component Engines and the Flow List, labeled ``>>``, adds a chosen engine to the flow list and the button labeled `<<` removes an engine from the flow list.
To add an engine to the flow list you must first select an engine from the left hand list, and then press the `>>` button.
Engines may appear any number of times in the flow list.
To remove an engine from the flow list, select an engine from the right hand list and press the `<<` button.
[[ugr.tools.cde.aggregate_page.adding_remote_aes]]
=== Adding remote Analysis Engines
There are two ways to add remote engines: add an existing descriptor, which specifies a remote engine (just as if you were adding a non-remote engine) or use the __Add Remote__ button which will create a remote descriptor, save it, and then import it, all in one operation.
The __Add Remote__ button enables you to easily specify the information needed to create a remote service descriptor for a remote AE - one that runs on a different computer connected over the network.
There are 3 kinds of these: two are variants of the xref:ref.adoc#ugr.ref.xml.component_descriptor.service_client[Service Client descriptor]; the other is the UIMA-AS JMS Service descriptor, described in the UIMA AS documentation.
The __Add Remote__ button creates an instance of one of these descriptors, saves it as a file in the workspace, and imports it into the aggregate.
Of course, if you already have a remote service descriptor, you can add it to the set of delegates using the `Add` button, just like adding other kinds of analysis engines.
After clicking on __Add Remote__, the following dialog is displayed:
.Adding a remote client to an aggregate
image::images/tools/tools.cde/image014v2.jpg[Adding a remote client to an aggregate]
To define a remote service you specify the Service Kind, Protocol Service Type, URI and Key.
You can also specify a Timeout in milliseconds, used by the JMS services, and a VNS Host and Port used by the Vinci Service.
The JMS service has additional timeouts and other parameters you may specify.
Just like when one adds an engine from the file system, you have the option of adding the engine to the end of the flow.
The Component Descriptor Editor currently only supports Vinci services using this dialog.
Remote engines are added to the descriptor using the <import ... > syntax.
The information you specify here is saved in the Eclipse project as a file, using a generated name, `<key-name>.xml`, where `<key-name>` is the name you listed as the Key.
Because of this, the key-name must be a valid file name.
If you want a different name, you can change the path information in the dialog box.
[[ugr.tools.cde.aggregate_page.connecting_to_remote_services]]
=== Connecting to Remote Services
If you are using the Vinci protocol, it requires that you specify the location of the Vinci Name Server (an IP address and a Port number). You can specify these in the service descriptor, or globally, for your Eclipse workspace, using the Eclipse menu item: Window →Preferences... →UIMA Preferences.
If the remote service is available (up and running), additional operations become possible.
For instance, hovering the mouse over the remote descriptor will show the description metadata from the remote service.
[[ugr.tools.cde.aggregate_page.finding_aes_by_searching]]
=== Finding Analysis Engines by searching
The next button that appears between the component engine list and the flow list is the Find AE button.
When this button is pressed the following dialog is displayed, which allows one to search for AEs by name, by input or output types, or by a combination of these criteria.
This function searches the existing Eclipse workspace for matching *.xml descriptor source files; it does not look inside Jar files.
.Searching for an AE to add to an aggregate
image::images/tools/tools.cde/image016.jpg[Searching for an AE to add to an aggregate]
The search automatically adds a "`match any characters`" - style (*) wildcard at the beginning and end of anything entered.
Thus, if person is specified for an output type, a "`*person*`" search is performed.
Such a search would match such things as "`my.namespace.person`" and "`person.governmentOfficial`".
One can search in all projects or one particular project.
The search does an implicit _and_ on all fields which are left non-blank.
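
The matching rules just described (an implied wildcard on both ends, and an implicit _and_ across the non-blank fields) can be sketched as follows. The class `AeSearchSketch` is hypothetical, the sketch covers only a name field and an output-type field, and it assumes a case-sensitive comparison, which the text above does not specify.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.regex.Pattern;

// Hypothetical illustration of the search rules; not the editor's code.
final class AeSearchSketch {

    // Wraps the entered text in '*' wildcards and converts it to a regex,
    // treating '*' as "match any characters" and everything else literally.
    static Pattern toPattern(String enteredText) {
        String[] pieces = ("*" + enteredText + "*").split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // A blank field places no constraint; otherwise one candidate must match.
    static boolean fieldMatches(String enteredText, Collection<String> candidates) {
        if (enteredText == null || enteredText.trim().isEmpty()) {
            return true;
        }
        Pattern pattern = toPattern(enteredText.trim());
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).matches()) {
                return true;
            }
        }
        return false;
    }

    // Implicit "and": every non-blank field has to match.
    static boolean matches(String nameText, String outputTypeText,
                           String aeName, Collection<String> outputTypes) {
        return fieldMatches(nameText, Collections.singletonList(aeName))
                && fieldMatches(outputTypeText, outputTypes);
    }
}
```
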
[[ugr.tools.cde.aggregate_page.component_engine_flow]]
=== Component Engine Flow
The UIMA SDK currently supports three kinds of xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.aggregate.flow_constraints[sequencing flows]: `Fixed`, `CapabilityLanguageFlow`, and user-defined. The first two require specification of a linear flow sequence; this linear flow sequence can also be read by a user-defined flow controller (what use is made of it is up to the user-defined flow controller). The Component Engine Flow section allows specification of these items.
The pull-down labeled Flow Kind picks between the three flow models.
When the user-defined flow is selected, the Browse and Search buttons become enabled to let you pick the flow controller XML descriptor to import.
.Specifying flow control
image::images/tools/tools.cde/image018.jpg[Specifying flow control]
The key name value is set automatically from the XML descriptor being imported, and enables parameters to be overridden for that descriptor (see following sections).
The Up and Down buttons to the right in the Flow section are activated when an engine in the flow is selected.
The Up button moves the selected engine up one place in the execution order, and the Down button moves the selected engine down one place in the execution order.
Remember that engines can appear multiple times in the flow (or not at all).
[[ugr.tools.cde.parm_definition]]
== Parameters Definition Page
There are two pages for parameters: the first one is where parameters are defined, and the second one is where the parameter settings are configured.
The first page is the Parameter Definition page and has two alternatives, depending on whether the descriptor is an Aggregate.
We start with a description of parameter definitions for Primitive engines, CAS Consumers, Collection Readers, CAS Initializers, and Flow Controllers.
Here is an example:
.Parameter Definitions - not Aggregate
image::images/tools/tools.cde/image020.jpg[Parameter Definitions - not Aggregate]
The first checkbox at the top simplifies things if you are not using Parameter Groups (see the following section for a discussion of groups). In this case, leave the check box unchecked.
The main area shows a list of parameter definitions.
Each parameter has a name, which must be unique for this Analysis Engine.
The first three attributes specify whether the parameter can have a single or multiple values (an array of values), whether it is Optional or Mandatory, and what value type it can hold (String, Integer, Float, or Boolean). If an external override name has been specified, an attribute of "XO" is included.
See xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides[External Configuration Parameter Overrides] for a discussion of external configuration parameter overrides.
In addition to using the buttons on the right to edit this information, you can double-click a parameter to edit it, or remove (delete) a selected parameter by pressing the delete key.
Use the Add button to add a new parameter to the list.
Parameters have an additional description field, which you can specify when you add or edit a parameter.
To see the value of the description, hover the mouse over the item, as shown in the picture below.
If the parameter has an external override name its value is included in the hover.
.Parameter description shown in a hover message
image::images/tools/tools.cde/image022.jpg[Parameter description shown in a hover message]
[[ugr.tools.cde.parm_definition.using_groups]]
=== Using groups
The group concept for parameters arose from the observation that sets of parameters were sometimes associated with different configuration needs.
As an example, you might have an Analysis Engine which needed different configuration based on the language of a document.
To use groups, you check the "`Use Parameter Groups`" box.
When you do this, you get the ability to add groups, and to define parameters within these groups.
You also get a capability to define "`Common`" parameters, which are parameters which are defined for all groups.
Here is a screen shot showing some parameter groups in use:
.Using parameter groups
image::images/tools/tools.cde/image024.jpg[Using parameter groups]
You can see the `<Common>` parameters as well as two different sets of groups.
The Default Group is an optional specification of what Group to use if the parameter is not available for the group requested.
The xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.configuration_parameter_declaration[Search strategy] specifies what to do when a parameter is not available for the group requested.
It can have the values of `None`, `language_fallback`, or `default_fallback`.
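
As a rough mental model of how the Default Group and the search strategy interact, the sketch below computes the order in which groups might be consulted for a requested group. It is an assumption-laden illustration, not the framework's implementation: the class `GroupSearchSketch` is hypothetical, and the sketch assumes that `language_fallback` shortens hyphen-separated names such as `en-US` one segment at a time before trying the default group. The linked reference section is the authority on the exact semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the real lookup is done by the UIMA framework.
final class GroupSearchSketch {

    enum Strategy { NONE, DEFAULT_FALLBACK, LANGUAGE_FALLBACK }

    // Returns the groups to consult, in order, when a parameter is requested
    // for the given group.
    static List<String> lookupOrder(String requested, String defaultGroup, Strategy strategy) {
        List<String> order = new ArrayList<>();
        order.add(requested);
        if (strategy == Strategy.LANGUAGE_FALLBACK) {
            // Assumption: drop the last "-" segment each time ("en-US" then "en").
            String group = requested;
            int cut;
            while ((cut = group.lastIndexOf('-')) > 0) {
                group = group.substring(0, cut);
                order.add(group);
            }
        }
        // Both fallback strategies end with the default group, if one is set.
        if (strategy != Strategy.NONE && defaultGroup != null && !order.contains(defaultGroup)) {
            order.add(defaultGroup);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(lookupOrder("en-US", "de", Strategy.LANGUAGE_FALLBACK));
    }
}
```
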
Groups are added using the __Add Group__ button.
Once added, they can be edited or removed, using the buttons to the right, or the standard gestures for editing (double-clicking the item) and removing (pressing the delete key after an item is selected). Removing a group removes all the parameter definitions in the group.
If you try to remove the `<Common>` group, it just removes the parameters in the group.
Each entry for a group in the table specifies one or more group names.
For example, the highlighted entry above specifies two groups: `myNewGroup2` and `mg3`.
The parameter definition underneath is considered to be in both groups.
[[ugr.tools.cde.parm_definition.adding]]
=== Adding or Editing a Parameter
When creating or modifying a parameter both a unique name and a valid type must be specified.
The Description and External Override fields are optional.
The defaults for the two checkboxes indicate a single-valued optional parameter in the example below:
image::images/tools/tools.cde/image025.jpg[Aggregate parameters]
[[ugr.tools.cde.parm_definition.aggregates]]
=== Parameter declarations for Aggregates
Aggregates declare parameters which must always override a parameter setting for a component making up the aggregate.
They do this using the version of this page which is shown when the descriptor is an Aggregate; here's an example:
image::images/tools/tools.cde/image026.jpg[Aggregate parameters]
There is an additional panel shown (on the right) which lists all of the components by their key names, and shows for each of them their defined parameters.
To add a new override for one or more of these parameters to the aggregate, select the component parameter you wish to override and push the Create Override button (or just double-click the component parameter). This automatically adds a parameter of the same name to the aggregate (the name is only a default, and you can change it if you like), puts it into the same group or groups (required if groups are being used in the component), and sets the properties of the parameter to match those of the component (also required).
[NOTE]
====
If the name of the parameter being added is already in use in the aggregate, and the parameters are not compatible, a new parameter name is generated by suffixing the name with a number.
If the parameters are compatible, the selected component parameter is added to the existing aggregate parameter, as an additional override.
If you don't want this behavior, but want to have a new name generated in this case, push the Create non-shared Override button instead, or hold down the "`shift`" key when double clicking the component parameter.
The required / optional setting in the aggregate parameter is set to match that of the parameter being overridden.
You may want to make an optional delegate parameter required.
You can do this by changing that value manually in the source editor view.
====
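
The naming rules in the note above can be modeled with a small sketch. Everything in it is hypothetical and simplified: the class `OverrideSketch` is not part of the editor, "compatible" is assumed to mean the same value type and the same single or multi-valued setting, the numeric suffix is assumed to start at 2, and overrides are recorded as `key/parameterName` strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of creating overrides in an aggregate.
final class OverrideSketch {

    static final class Param {
        final String type;
        final boolean multiValued;
        final List<String> overrides = new ArrayList<>();

        Param(String type, boolean multiValued) {
            this.type = type;
            this.multiValued = multiValued;
        }

        // Assumption: compatibility means same type and same multiplicity.
        boolean compatibleWith(String otherType, boolean otherMultiValued) {
            return type.equals(otherType) && multiValued == otherMultiValued;
        }
    }

    final Map<String, Param> params = new LinkedHashMap<>();

    // Returns the name of the aggregate parameter that now carries the override.
    String createOverride(String componentKey, String name, String type,
                          boolean multiValued, boolean nonShared) {
        Param existing = params.get(name);
        if (existing != null && !nonShared && existing.compatibleWith(type, multiValued)) {
            // Compatible: share the existing aggregate parameter.
            existing.overrides.add(componentKey + "/" + name);
            return name;
        }
        // Otherwise pick the first free name, suffixing a number if needed.
        String chosen = name;
        for (int n = 2; params.containsKey(chosen); n++) {
            chosen = name + n;
        }
        Param created = new Param(type, multiValued);
        created.overrides.add(componentKey + "/" + name);
        params.put(chosen, created);
        return chosen;
    }
}
```

In this model, passing `nonShared = true` corresponds to the Create non-shared Override button: a new name is generated even when a compatible parameter already exists.
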
In the above example, the user has just double-clicked the `TypeNames` parameter in the `NameRecognizer` component.
This added that parameter to this aggregate under the `<Not in any group>` section -- since it wasn't part of a group.
Once you have added a parameter definition to the aggregate, you can use the buttons on the right side of the left panel to add additional overrides or remove parameters or their overrides. You can also remove groups; removing a group is like removing all the parameter definitions in the group.
In addition to adding one parameter at a time from a component, you can also add all the parameters for a group within a component, or all the parameters in the component, by selecting those items.
If you double-click (or push __Create Override__) the `<Common>` group or a parameter in the `<Common>` group in a component, a special group is created in the Aggregate consisting of all of the groups in that component, and the overriding parameter (or parameters) are added to that.
This is done because each component can have different groups belonging to the Common group notion; the Common group for a component is just shorthand for all the groups in that component.
The Aggregate's specification of the default group and search strategy override any specifications contained in the components.
[[ugr.tools.cde.parameter_settings]]
== Parameter Settings Page
The Parameter Settings page is rather straightforward; it is where the user defines parameter settings for their engines.
An example of such a page is given below:
.Parameter settings page
image::images/tools/tools.cde/image028.jpg[Parameter settings page]
For single-valued parameters, the user simply types the default value into the Value box on the right hand side.
For multi-valued parameters the user should use the Add, Edit and Remove buttons to manage the list of multiple parameter values.
Values within groups are shown with each group separately displayed, to allow configuring different values for each group.
Values are checked for validity.
For Boolean values in a list, use the words `true` or `false`.
[NOTE]
====
If you specify a value in a single-valued parameter, and then delete all the characters in the value, the CDE will treat this as if you wanted to not specify any setting for this parameter.
In order to specify a 0 length string setting for a String-valued parameter, you will have to manually edit the XML using the "`Source`" tab.
For array valued parameters, if you remove all of the entries for a particular array parameter setting, the XML will reflect a 0-length array.
To change this to an unspecified parameter setting, you will have to manually edit the XML using the "`Source`" tab.
====
[[ugr.tools.cde.type_system]]
== Type System Page
This page declares the type system used by the annotator.
For aggregates it is derived by merging the type systems of all constituent AEs.
The types used by the AE constitute the language in which the inputs and outputs are described in the Capabilities page and also affect the choice of indexes on the Indexes page.
The Type System page looks like the following:
image::images/tools/tools.cde/limitJCasGenType.jpg[Type System declaration page]
Before discussing this page in detail, it is important to note that there are 3 settings that affect the operation of this page.
These are accessed by selecting UIMA →Settings (or by going to the Eclipse Window →Preferences →UIMA Preferences) and checking or unchecking one of the following: "`Auto generate .java files when defining types`", "`Generate JCasGen classes only for types defined within the local project scope`" and "`Display fully qualified type names`".
When the Auto generate option is checked and the development language for the AE is Java, any time a change is made to a type and the change is saved, the corresponding .java files are generated using the JCasGen tool.
The results are stored in the primary source directory defined for the project.
The primary source directory is that listed first when you right click on your project and select Properties →Java Build Path, click on the Source tab and look in the list box under the text that reads: __Source folder on build path__.
If no source folders are defined, you will get a warning that you have no source folders defined and xref:tools.adoc#ugr.tools.jcasgen[JCasGen] will not be run.
When JCasGen is run, you can monitor the progress of the generation by observing the status on the Eclipse status line (normally at the bottom of the Eclipse window).
JCasGen runs on the fully-merged type system, consisting of the type specification plus any imported type system, plus (for aggregates) the merged type systems of all the components in an aggregate.
[WARNING]
====
If the components of the aggregate have different definitions for the same type name, the CDE will show a warning.
It is possible to continue past this warning, in which case the CDE will produce the correct Java source files representing the merged types (that is, the type definition that contains all of the features defined on that type by all of your components). However, it is not recommended to use this feature (of having different definitions for the same type name) since it can make it difficult to xref:ref.adoc#ugr.ref.jcas.merging_types_from_other_specs[combine/package] your annotator with others.
====
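
To make the idea of a merged type concrete, the sketch below treats each component's type system as a map from type name to feature names and forms the union per type. This is a deliberately reduced model under stated assumptions: the class `TypeMergeSketch` is hypothetical, and real merging also has to consider supertypes and feature range types, which are ignored here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, reduced model of type system merging (feature names only).
final class TypeMergeSketch {

    // A type declared by several components gets the union of their features.
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> typeSystems) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> typeSystem : typeSystems) {
            for (Map.Entry<String, Set<String>> entry : typeSystem.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                      .addAll(entry.getValue());
            }
        }
        return merged;
    }

    // Type names whose feature sets differ between components, which is
    // the situation the warning above is about.
    static Set<String> differing(List<Map<String, Set<String>>> typeSystems) {
        Map<String, Set<String>> firstSeen = new HashMap<>();
        Set<String> result = new TreeSet<>();
        for (Map<String, Set<String>> typeSystem : typeSystems) {
            for (Map.Entry<String, Set<String>> entry : typeSystem.entrySet()) {
                Set<String> previous = firstSeen.putIfAbsent(entry.getKey(), entry.getValue());
                if (previous != null && !previous.equals(entry.getValue())) {
                    result.add(entry.getKey());
                }
            }
        }
        return result;
    }
}
```
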
[NOTE]
====
In addition to running automatically, you can manually run JCasGen on the fully merged type system by clicking the JCasGen button, or by selecting Run JCasGen from the UIMA pulldown menu:
====
.Setting JCasGen options
image::images/tools/tools.cde/image032.jpg[Setting JCasGen options]
When __Generate JCasGen classes only for types defined within the local project scope__ is checked, then JCasGen skips generating classes for types that are imported from sources outside this project.
This might be done, for instance, if you have an aggregate which is importing type systems from its delegates, some of which are defined in other projects, and have JCasGen'd files already present in those other projects.
The UIMA settings and preferences for controlling this are used to initialize a particular instance of the editor, when it is started.
Following that, you can override this setting, just for that editor, by checking or unchecking the box shown on the type system page:
.Limit the scope of JCasGen
image::images/tools/tools.cde/limitJCasGen.jpg[Limit the scope of JCasGen]
[NOTE]
====
If this is checked, and one of the types that would be excluded has merged type features, an error message is issued - because JCasGen will need to be run for the combined (merged) type in order to get a class definition that will work for this configuration (have access to all the features). If this happens, you have to run without limiting JCasGen, and manually delete any duplicated/unwanted source results.
====
When __Display fully qualified type names__ is left unchecked, the namespace of types is not displayed; for example, if a fully qualified type name is my.namespace.person, only the abbreviated type name person will be displayed.
In the Type page diagram shown above, __Display fully qualified type names__ is in fact unchecked.
To add, edit, or remove types the buttons on the top left section are used.
When adding or editing types, fully qualified type names should of course be used, regardless of whether __Display fully qualified type names__ is checked.
Removing or editing a type will have a cascading effect in that the type removal/edit will affect inputs, outputs, indexes and type priorities in the natural way.
When a type is added, this dialog is shown:
.Adding a type
image::images/tools/tools.cde/image034.jpg[Adding a type]
Type names should be specified using a namespace.
The namespace is like a Java package name, and serves to ensure type names are unique.
It also serves as the package name for the generated JCas classes.
The namespace name is the set of names up to the last period in the string.
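
The rule for splitting a type name at its last period can be written in a few lines. The class `TypeNames` below is a hypothetical helper for illustration, not an editor or UIMA API; it assumes a name without any period has an empty namespace.

```java
// Hypothetical helper illustrating the "up to the last period" rule.
final class TypeNames {

    // Everything before the last period, or "" if there is no period.
    static String namespaceOf(String typeName) {
        int lastDot = typeName.lastIndexOf('.');
        return lastDot < 0 ? "" : typeName.substring(0, lastDot);
    }

    // Everything after the last period: the abbreviated type name.
    static String shortNameOf(String typeName) {
        return typeName.substring(typeName.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("my.namespace.person"));
        System.out.println(shortNameOf("my.namespace.person"));
    }
}
```
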
The supertype must be picked from an existing type.
The entry field for the supertype supports Eclipse-style content assist.
To use it, put the cursor in the supertype field and type a letter or two of the supertype name (lower case is fine), starting either with the namespace or with just the type name (without the namespace); then hold down the Control key and press the spacebar.
A list of suitable matching types then appears.
You can then type more letters to narrow down your choices, or pick the right entry with the mouse.
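
The narrowing behavior described above can be approximated as follows.
This sketch assumes a simple case-insensitive prefix match against either the fully qualified name or the abbreviated name; the editor's actual matching logic may differ, and `SupertypeAssist` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; the editor's real matching rules may differ.
final class SupertypeAssist {
    /** Keeps the types whose full name or short name starts with the typed text. */
    static List<String> candidates(List<String> typeNames, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String full : typeNames) {
            String fullLower = full.toLowerCase(Locale.ROOT);
            String shortLower = fullLower.substring(fullLower.lastIndexOf('.') + 1);
            if (fullLower.startsWith(prefix) || shortLower.startsWith(prefix)) {
                matches.add(full);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("uima.tcas.Annotation", "uima.cas.TOP", "my.namespace.Person");
        System.out.println(candidates(types, "an"));     // [uima.tcas.Annotation]
        System.out.println(candidates(types, "uima.c")); // [uima.cas.TOP]
    }
}
```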
To see the available types and pick one, press the Browse button.
This will show the available types, and as you type letters for the type name (lower case is fine; capitalization is ignored), the available types that match are narrowed.
When you've typed enough to specify the type you want, press Enter.
Or you can use the list of matching type names and pick the one you want with the mouse.
Once you've added the type, you can add features to it by highlighting the type, and pressing the Add button.
If the type being defined is a subtype of uima.cas.String, the Add button allows you to add allowed values for the string, instead of adding features.
To edit a type or feature, you can double click the entry, or highlight the entry and press the Edit button.
To delete a type or feature, you highlight the entry to be deleted, and click the delete button or push the delete key.
If the range of a feature is an array or one of the built-in list types, an additional specification allows you to specify if multiple references to the object referenced by this feature are allowed.
If they are not allowed, then the XMI serialization of instances of this type uses a more efficient format.
If the range of a feature is an array of Feature Structures, then it is possible to specify an element type for the array.
This information is used in the XMI serialization and also by the JCas generation routines to generate more efficient code.
.Specifying a Feature Structure
image::images/tools/tools.cde/image036.jpg[Specifying a Feature Structure]
It is also possible to import type systems for inclusion in your descriptor.
To do this, use the Type Import panel's __Add...__ button.
This allows you to import a type system descriptor.
When importing by name, the name is resolved using the class path for the Eclipse project containing the descriptor file being edited, or by looking up this name in the UIMA DataPath.
The DataPath can be set by pushing the Set DataPath button.
It will be remembered for this Eclipse project, as a project Property, so you only have to set it once (per project). The value of the DataPath setting is written just like a class path, and can include directories or JAR files, just as is true for class paths.
The following dialog allows you to pick one or more files from the Eclipse workspace, or one file (at a time) from the file system:
.Picking files for importing
image::images/tools/tools.cde/import-chooser.jpg[Picking files for importing]
This is essentially the same dialog as was used to add component engines to an aggregate.
To import from a type system descriptor that is not part of your Eclipse workspace, click the __Browse the file system...__ button.
Imported types are validated, and if OK, they are added to the list in the Imported Type Systems section of the Type System page.
Any types they define are merged with the existing type system.
Imported types and features which are only defined in imports are shown in the Type System section, but in a grayed-out font; these types cannot be edited here.
To change them, open up the imported type system descriptor, and change them there.
If you hover the mouse over an import specification, it will show more information about the import.
If you right-click, it will bring up a context menu that allows opening the imported file in the Editor, if the imported file is part of the Eclipse workspace.
Changes you make, however, won't be seen until you close and reopen the editor on the importing file.
It is not possible to define types for an aggregate analysis engine.
In this case the type system is computed from the component AEs.
The Type System information is shown in a grayed-out font.
[[ugr.tools.cde.type_system.exporting]]
=== Exporting
In addition to importing type specifications, you can export as well.
When you push the __Export...__ button, the editor will create a new importable XML descriptor for the types in this type system, and change the existing descriptor to import that newly created one.
image::images/tools/tools.cde/image040.jpg[Exporting a type system]
The base file name you type is inserted into the path in the line below automatically.
You can change the path where the generated part descriptor is stored by overtyping the lower text box.
When you click OK, the new part descriptor will be generated, and the current descriptor will be changed to import that part.
[[ugr.tools.cde.capabilities]]
== Capabilities Page
Capabilities come in __sets__.
You can have multiple sets of capabilities; each one specifies languages supported, plus inputs and outputs of the Analysis Engine.
The idea behind having multiple sets is the concept that different inputs can result in different outputs.
Many Analysis Engines, though, will probably define just one set of capabilities.
A sample Capabilities page is given below:
image::images/tools/tools.cde/image042.jpg[Capabilities page]
When defining the capabilities of a primitive analysis engine, input and output types can be any type defined in the type system.
When defining the capabilities of an aggregate, the inputs must be a subset of the union of the inputs of the constituent analysis engines, and the outputs must be a subset of the union of the outputs of the constituent analysis engines.
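
The subset-of-union rule for aggregates can be expressed with plain Java sets.
The following is a minimal sketch of the rule itself, not the editor's implementation; the class name and the type names in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of the UIMA API.
final class AggregateCapabilityCheck {
    /** True if every declared type appears in at least one delegate's set. */
    static boolean isSubsetOfUnion(Set<String> declared, List<Set<String>> delegateSets) {
        Set<String> union = new HashSet<>();
        for (Set<String> delegate : delegateSets) {
            union.addAll(delegate);
        }
        return union.containsAll(declared);
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<String>(Arrays.asList("example.Token"));
        Set<String> second = new HashSet<String>(Arrays.asList("example.Sentence"));
        List<Set<String>> delegates = Arrays.<Set<String>>asList(first, second);
        Set<String> declared = new HashSet<String>(Arrays.asList("example.Token", "example.Sentence"));
        System.out.println(isSubsetOfUnion(declared, delegates)); // true
    }
}
```

The same check applies separately to inputs and to outputs.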
To add a type, first select something in the set you wish to add the type to, and press Add Type.
The following dialog appears presenting the user with a list of types which are candidates for additional inputs:
image::images/tools/tools.cde/image044.jpg[Adding a type to the capabilities page]
Follow the instructions to mark the types as input and / or output (a type can be both). By default, the <all features> flag is set to true.
If you want to specify a subset of features of a type, read on.
When types have features, you can specify what features are input and / or output.
A type doesn't have to be an output to have an output feature.
For example, an Analysis Engine might be passed as input a type Token, and it adds (outputs) a feature to the existing Token types.
If no new Token instances were created, it would not be an output Type, but it would have features which are output.
To specify features as input and / or output (they can be both), select a type, and press Add.
The following dialog box appears:
image::images/tools/tools.cde/image046.jpg[Specifying features as input or output]
To mark a feature as being input and / or output, click the mouse in the input and / or output column for the feature.
If you select <all features>, it unmarks any individual feature you selected, since <all features> subsumes all the features.
The Languages part of the capability is where you specify what languages are supported by the Analysis Engine.
Supported languages should be listed using either a two letter ISO-639 language code, or an ISO-639 language code followed by a hyphen and then a two-letter ISO-3166 country code.
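
As a rough illustration of the two accepted forms, the sketch below checks a code against the ISO lists that ship with the JDK (`Locale.getISOLanguages()` and `Locale.getISOCountries()`).
It assumes a case-sensitive comparison and only two-letter language codes, which may be stricter than what UIMA itself accepts; `LanguageCodeCheck` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; UIMA's own handling of language codes may be more lenient.
final class LanguageCodeCheck {
    private static final Set<String> LANGUAGES =
        new HashSet<>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> COUNTRIES =
        new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    /** Accepts "ll" or "ll-CC": language code, optionally followed by a country code. */
    static boolean isSupportedForm(String code) {
        String[] parts = code.split("-", -1);
        if (parts.length == 1) {
            return LANGUAGES.contains(parts[0]);
        }
        if (parts.length == 2) {
            return LANGUAGES.contains(parts[0]) && COUNTRIES.contains(parts[1]);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSupportedForm("en"));      // true
        System.out.println(isSupportedForm("en-US"));   // true
        System.out.println(isSupportedForm("english")); // false
    }
}
```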
Add a language by selecting Languages and pressing the Add button.
The dialog for adding languages is given below.
image::images/tools/tools.cde/image048.jpg[Specifying a language]
The Sofa part of the capability is optional; it allows defining Sofa names that this component uses, and whether they are input (meaning they are created outside of this component, and passed into it), or output (meaning that they are created by this component). Note that a Sofa can be either input or output, but can't be both.
To add a Sofa name (which is synonymous with the view name), press the Add Sofa button, and this dialog appears:
image::images/tools/tools.cde/image050.jpg[Specifying a Sofa name]
[[ugr.tools.cde.capabilities.sofa_name_mapping]]
=== Sofa (and view) name mappings
Sofa names, once created, are used in Sofa Mappings.
These are optional mappings, done in an aggregate, that specify which Sofas are the same ones but with different names.
The Sofa Mappings section is minimized unless you are editing an Aggregate descriptor, and have one or more Sofa names defined for the aggregate.
In that case, the Sofa Mappings section will look like this:
image::images/tools/tools.cde/image052.jpg[Sofa mappings]
Here the aggregate has defined two input Sofas, named "`MyInputSofa`" and "`AnotherSofa`".
Any named sofas in the aggregate's capabilities will appear in the Sofa Mapping section, listed either under Inputs or Outputs.
Each name in the Mappings has 0 or more delegate (component) sofa names mapped to it.
A delegate may have multiple Sofas, as in this example, where the GovernmentOfficialRecognizer delegate has Sofas named "`so1`" and "`so2`".
Delegate components may be written as Single-View components.
In this case, they have one implicit, default Sofa ("`_InitialView`"), and to map to it you use the form shown for the "`NameRecognizer`": you map to the delegate's key name in the aggregate, without specifying a Sofa name.
You can also specify the sofa name explicitly, e.g., NameRecognizer/_InitialView.
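
The two ways of writing a delegate Sofa reference (key only, or key plus Sofa name) can be illustrated with a small parser.
This is a sketch of the naming convention described above, not code from the editor; `DelegateSofaRef` is a hypothetical name.

```java
// Illustrative sketch only; not part of the UIMA API.
final class DelegateSofaRef {
    static final String DEFAULT_SOFA = "_InitialView";

    /** Splits "key" or "key/sofaName" into {delegate key, Sofa name}. */
    static String[] parse(String ref) {
        int slash = ref.indexOf('/');
        if (slash < 0) {
            // No Sofa name given: the delegate's implicit default Sofa is meant.
            return new String[] {ref, DEFAULT_SOFA};
        }
        return new String[] {ref.substring(0, slash), ref.substring(slash + 1)};
    }

    public static void main(String[] args) {
        String[] implicit = parse("NameRecognizer");
        String[] explicit = parse("GovernmentOfficialRecognizer/so1");
        System.out.println(implicit[0] + " -> " + implicit[1]); // NameRecognizer -> _InitialView
        System.out.println(explicit[0] + " -> " + explicit[1]); // GovernmentOfficialRecognizer -> so1
    }
}
```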
To add a new mapping, select the Aggregate Sofa name you wish to add the mapping for, and press the Add button.
This brings up a window like this, showing all available delegates and their Sofas; select one or more (use the normal multi-select methods) of these and press OK to add them.
image::images/tools/tools.cde/image054.jpg[Adding a Sofa mapping]
To edit an existing mapping, select the mapping and press Edit.
This will show the existing mapping with all mapped items "`selected`", and other available items unselected.
Change the items selected to match what you want, deselecting some, and perhaps selecting others, and press OK.
[[ugr.tools.cde.indexes]]
== Indexes Page
The Indexes page is where the user declares what indexes and type priority lists are used by the analysis engine.
Indexes are used to determine which Feature Structures of a particular type are fetched, using an iterator in the UIMA API.
An unpopulated Indexes page is displayed below:
image::images/tools/tools.cde/image056.jpg[Index page]
Both indexes and type priority lists can have imports.
These imports work just like the type system imports, described above.
Both indexes and type priority lists can be exported to new component descriptors, using the Export... button, just like the type system export operation described above.
The built-in Annotation Index is always present.
It is based on the built-in type `uima.tcas.Annotation` and has keys begin (Ascending), end (Descending) and TYPE_PRIORITY.
There are no built-in type priorities, so this last sort item does not play a role in the index unless type priorities are specified.
Type priority may be combined with other keys.
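
To see what the first two keys of the Annotation Index imply, the sketch below sorts plain `{begin, end}` pairs with begin ascending and end descending.
It deliberately leaves out the TYPE_PRIORITY key and does not use the UIMA API; `AnnotationOrderSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; spans are {begin, end} pairs, not UIMA annotations.
final class AnnotationOrderSketch {
    static final Comparator<int[]> BEGIN_ASC_END_DESC = (a, b) -> {
        if (a[0] != b[0]) {
            return Integer.compare(a[0], b[0]); // begin: ascending
        }
        return Integer.compare(b[1], a[1]);     // end: descending
    };

    static List<int[]> sorted(List<int[]> spans) {
        List<int[]> copy = new ArrayList<>(spans);
        copy.sort(BEGIN_ASC_END_DESC);
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(new int[] {0, 5}, new int[] {3, 4}, new int[] {0, 10});
        for (int[] span : sorted(spans)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

With this ordering, a longer span that starts at the same position as a shorter one comes first, so enclosing annotations precede the annotations they contain.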
Type priorities are defined in the Priority Lists section, using one or more priority lists.
A given priority list gives an ordering among a group of types.
Types that appear higher in the priority list are given higher priority; in other words, they sort first when TYPE_PRIORITY is specified as the index key.
Subtypes of these types are also ordered in a consistent manner, unless overridden by another specific type priority specification.
To get the ordering used among all the types, all of the type priority lists are merged.
This gives a partial ordering among the types.
Ties are resolved in an unspecified fashion.
The Component Descriptor Editor checks for incompatible orderings, and informs the user if they exist, so they can be corrected.
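
One way to picture the check for incompatible orderings is to treat every adjacent pair in a priority list as a "sorts before" edge and look for a cycle.
The sketch below does this with Kahn's topological sort; it is an assumption about how such a check could work, not a description of the editor's code, and `PriorityListMerge` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the editor's actual algorithm.
final class PriorityListMerge {
    /** True if the lists can be merged into one consistent (partial) ordering. */
    static boolean isConsistent(List<List<String>> priorityLists) {
        Map<String, Set<String>> after = new HashMap<>(); // type -> types that sort after it
        Map<String, Integer> inDegree = new HashMap<>();
        for (List<String> list : priorityLists) {
            for (int i = 0; i < list.size(); i++) {
                after.computeIfAbsent(list.get(i), k -> new HashSet<>());
                inDegree.putIfAbsent(list.get(i), 0);
                if (i > 0 && after.get(list.get(i - 1)).add(list.get(i))) {
                    inDegree.merge(list.get(i), 1, Integer::sum);
                }
            }
        }
        // Kahn's algorithm: repeatedly remove types that have no remaining predecessor.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((type, degree) -> { if (degree == 0) ready.add(type); });
        int removed = 0;
        while (!ready.isEmpty()) {
            String type = ready.remove();
            removed++;
            for (String next : after.get(type)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        return removed == inDegree.size(); // leftover types mean a cycle, that is, a conflict
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(Arrays.asList(
            Arrays.asList("A", "B"), Arrays.asList("B", "C")))); // true
        System.out.println(isConsistent(Arrays.asList(
            Arrays.asList("A", "B"), Arrays.asList("B", "A")))); // false
    }
}
```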
To create a new index, use the Add Index button in the top left section.
This brings up this dialog:
image::images/tools/tools.cde/image058.jpg[Adding a new index]
Each index needs a globally unique index name.
Every index indexes one CAS type (including its subtypes). If you're using Eclipse 3.2 or later, the entry field for this has content assist (start typing the type name and press Control-Spacebar to get help, or press the Browse button to pick a type).
Indexes can be sorted, in which case you need to specify one or more keys to sort on.
Sort keys are selected from features whose range type is Integer, Float, or String.
Some elements will be disabled if they are not relevant.
For instance, if the index kind is "`bag`", you cannot provide sort keys.
The order of sort keys can be adjusted using the up and down buttons, if necessary.
[NOTE]
====
There is usually no need to explicitly declare a Bag index in your descriptor.
As of UIMA v2.1, if you do not declare any index for a type (or any of its supertypes), a Bag index will be automatically created.
This index is accessed using the `getAllIndexedFS(...)` method defined on the index repository.
====
A set index will contain no duplicates of the same type, where a duplicate is defined by the indexing comparator.
That is, if you commit two feature structures of the same type that are equal with respect to the indexing comparator, only the first one will be entered into the index.
Note that you can still have duplicates with respect to the indexing order, if they are of a different type.
A set index is not guaranteed to be sorted.
If no keys are specified for a set index, then all instances are considered by default to be equal, so only the first instance (for a particular type or subtype of the type being indexed) is indexed.
On the other hand, "`bag`" indicates that all annotation instances are indexed, including duplicates.
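
The difference between set and bag semantics can be modeled without the UIMA API.
In the sketch below, `Fs` is a made-up stand-in for a feature structure with a type name and one integer key, and the comparator plays the role of the indexing comparator; it is a simplified model and ignores subtypes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative model only; not how UIMA implements its indexes.
final class SetVersusBagSketch {
    /** A stand-in for a feature structure: a type name plus one integer key. */
    static final class Fs {
        final String type;
        final int key;
        Fs(String type, int key) { this.type = type; this.key = key; }
    }

    static final Comparator<Fs> BY_KEY = Comparator.comparingInt(fs -> fs.key);

    /** Bag: every added instance is kept, including duplicates. */
    static List<Fs> bag(List<Fs> added) {
        return new ArrayList<>(added);
    }

    /** Set: an instance is skipped if an equal one of the same type is already present. */
    static List<Fs> set(List<Fs> added) {
        List<Fs> index = new ArrayList<>();
        for (Fs candidate : added) {
            boolean duplicate = false;
            for (Fs existing : index) {
                if (existing.type.equals(candidate.type) && BY_KEY.compare(existing, candidate) == 0) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                index.add(candidate);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Fs> added = Arrays.asList(new Fs("Token", 1), new Fs("Token", 1), new Fs("Sentence", 1));
        System.out.println(bag(added).size()); // 3
        System.out.println(set(added).size()); // 2
    }
}
```

The second `Token` is dropped from the set because it compares equal to the first, while the `Sentence` with the same key is kept because it is of a different type.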
The Priority Lists section of the Indexes page is used to specify Priority Lists of types.
Priority Lists are unnamed ordered sets of type names.
Add a new priority list by clicking the Add Set button.
Add a type to an existing priority list by first selecting the set, and then clicking Add.
You can use the up and down buttons to adjust the order as necessary; these buttons move the selected item up or down.
Although it is possible to import self-contained index and type priority files, the creation of such files is not yet supported by the Component Descriptor Editor.
If you create these files using another editor, they can be imported using the corresponding Import panels, shown on the right.
Imports are specified in the same manner as they are for Type System imports.
[[ugr.tools.cde.resources]]
== Resources Page
The Resources page describes resource dependencies (for primitive Analysis Engines), external resource specifications, and the bindings of those resources to the resource dependencies.
Only primitive Analysis Engines define resource dependencies.
Primitive and Aggregate Analysis Engines can define external resources and connect them (bind them) to resource dependencies.
When an Aggregate is providing an external resource to be bound to a dependency, the binding is specified using a possibly multi-level path: starting at the Aggregate, you specify which component (by its key name), then, if that component is in turn an Aggregate, which of its components (again by key name), and so on until you reach a primitive.
The sequence of key names is made into the binding specification by joining the parts with a "`/`" character.
All of this is done for you by the Component Descriptor Editor.
Any external resource provided by an Aggregate will override any binding provided by any lower level component for the same resource dependency.
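
Although the editor builds the binding specification for you, the joining rule is easy to sketch.
The example assumes that the path ends with the dependency's own key; the delegate keys and the dependency key shown are hypothetical, as is the class `BindingPath`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the editor generates these paths for you.
final class BindingPath {
    /** Joins the delegate key names and the dependency key with "/". */
    static String of(List<String> delegateKeys, String dependencyKey) {
        List<String> parts = new ArrayList<>(delegateKeys);
        parts.add(dependencyKey); // assumption: the dependency key is the last segment
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Hypothetical keys: an inner aggregate, a primitive inside it, and its dependency.
        System.out.println(of(Arrays.asList("innerAggregate", "annotator1"), "myDictionary"));
        // innerAggregate/annotator1/myDictionary
    }
}
```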
There are two views of the Resources page, depending on whether the Analysis Engine is an Aggregate or Primitive.
Here's the view for a Primitive:
image::images/tools/tools.cde/image060.jpg[Resources page for a primitive]
To declare a resource dependency, click the Add button in the right hand panel.
This puts up the dialog:
image::images/tools/tools.cde/image062.jpg[Specifying a resource dependency]
The Key must be unique within the descriptor declaring it.
The Interface, if present, is the name of a Java interface the Analysis Engine uses to access the resource.
Declare the actual external resources on the left side of the page.
Clicking __Add__ brings up this dialog:
.Specifying an External Resource
image::images/tools/tools.cde/image064.jpg[Specifying an External Resource]
The Name must be unique within this Analysis Engine.
The URL identifies a file resource.
If both the URL and URL suffix are used, the file resource is formed by combining the first URL part with the language-identifier, followed by the URL suffix; see xref:ref.adoc#ugr.ref.xml.component_descriptor.aes.primitive.resource_manager_configuration[Resource Manager Configuration].
URLs may be written as __relative__ URLs; in this case they are resolved by looking them up relative to the classpath and/or datapath.
A relative URL has the path part starting without an initial "`/`"; for example: file:my/directory/file.
An absolute URL starts with `file:/` or `\file:///` or `\file://some.network.address/`. For more information about URLs, please read the Javadoc information for the Java class `URL`.
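
The two URL rules above (relative versus absolute, and prefix plus language plus suffix) can be illustrated with plain string handling.
This is a sketch of the rules as stated here, not of UIMA's resolution code; the class name and the sample values `file:dict_` and `.txt` are made up.

```java
// Illustrative sketch only; not UIMA's actual URL resolution.
final class FileResourceUrls {
    /** A file URL is relative when its path does not start with "/". */
    static boolean isRelative(String url) {
        if (!url.startsWith("file:")) {
            throw new IllegalArgumentException("Not a file: URL: " + url);
        }
        return !url.startsWith("file:/");
    }

    /** Combines the first URL part, the language identifier and the URL suffix. */
    static String forLanguage(String urlPrefix, String language, String urlSuffix) {
        return urlPrefix + language + urlSuffix;
    }

    public static void main(String[] args) {
        System.out.println(isRelative("file:my/directory/file"));    // true
        System.out.println(isRelative("file:/my/directory/file"));   // false
        System.out.println(forLanguage("file:dict_", "en", ".txt")); // file:dict_en.txt
    }
}
```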
The `Implementation` is optional, and if given, must be a Java class that implements the interface specified in any Resource Dependencies this resource is bound to.
[[ugr.tools.cde.resources.binding]]
=== Binding
Once you have an external resource definition, and a Resource Dependency, you can bind them together.
To do this, you select the two things (an external resource definition, and a Resource Dependency) that you want to bind together, and click Bind.
[[ugr.tools.cde.resources.aggregates]]
=== Resources with Aggregates
When editing an Aggregate Descriptor, the Resource definitions panel will show all the resources at the primitive level, with paths down through the components (multiple levels, if needed) to get to the primitives.
The Aggregate can define external resources, and bind them to one or more uses by the primitives.
[[ugr.tools.cde.resources.imports_exports]]
=== Imports and Exports
Resource definitions and their bindings can be imported, just like other imports.
Existing Resource definitions and their bindings can be exported to a new importable part, and replaced with an import for that importable part, using the "`Export...`" button, just like the similar function on the Type System page.
[[ugr.tools.cde.source]]
== Source Page
The Source page is a text view of the XML content of the Analysis Engine or Type System being configured.
An example of this page is displayed below:
image::images/tools/tools.cde/image066.jpg[Source page]
Changes made in the GUI are immediately reflected in the XML source, and changes made in the XML source are immediately reflected back in the GUI.
The thought here is that the GUI view and the Source view are just two ways of looking at the same data.
When the data is in an unsaved state the file name is prefaced with an asterisk in the currently selected file tab in the editor pane inside Eclipse (as in the example above).
You may accidentally create invalid descriptors or XML by editing directly in the Source view.
If you do this, the error will be detected and reported when you try to save or when you switch to a different view.
In the case of saving, the file will be saved, even if it is in an error state.
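
If you edit descriptors outside the editor, a basic well-formedness check can catch the most common XML mistakes before you reopen the file.
The sketch below uses only the JDK's built-in XML parser; it checks XML syntax and says nothing about whether the content is a valid UIMA descriptor. `WellFormedCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative sketch only; checks XML syntax, not descriptor validity.
final class WellFormedCheck {
    /** True if the text parses as XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // for example, an unclosed or mismatched tag
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // true
        System.out.println(isWellFormed("<a><b></a>"));  // false
    }
}
```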
[[ugr.tools.cde.source.formatting]]
=== Source formatting – indentation
The XML is indented using an indentation amount saved as a global UIMA preference.
To change this preference, use the Eclipse menu item: Window → Preferences → UIMA Preferences.
[[ugr.tools.cde.creating_self_contained_type_system]]
== Creating a Self-Contained Type System
It is also possible to use the Component Descriptor Editor to create or edit self-contained type systems.
To create a self-contained type system, select the menu item File → New → Other and then select Type System Descriptor File.
From the next page of the selection wizard specify a Parent Folder and File name and click Finish.
image::images/tools/tools.cde/image068.jpg[Working with a self-contained type system]
image::images/tools/tools.cde/image070.jpg[]
This will take you to a version of the Component Descriptor Editor for editing a type system file which contains just three pages: an overview page, a type system page, and a source page.
The overview page is a bit more spartan than in the case of an AE.
It looks like the following:
image::images/tools/tools.cde/image072.jpg[Editing a type system object]
Just like an AE has an associated name, version, vendor and description, the same is true of a self-contained type system.
The Type System page is identical to that in an AE descriptor file, as is the Source page.
Note that a self-contained type system can import type systems just like the type system associated with an AE.
A type system component can also be created from an existing descriptor which contains a type system definition section, by clicking on the Export... button on the Type System page.
[[ugr.tools.cde.creating_other_descriptor_components]]
== Creating Other Descriptor Components
The new wizard can create several other kinds of components: Collection Processing Management (CPM) components, flow controllers, and importable parts (Indexes, Type Priorities, and Resource Manager Configuration imports, in addition to the Type Systems described above).
The CPM components supported by this editor include the Collection Reader, CAS Initializer, and CAS Consumer descriptors.
Each of these is basically treated just like a primitive AE descriptor, with small changes to accommodate the different semantics.
For instance, a CAS Consumer can't declare in its capabilities section that it outputs types or features.
Flow controllers are components that control the flow of CASes within an aggregate, and are edited in much the same way as a primitive Analysis Engine.
The importable part support requires context information to enable the editor to work, because much of the power of this editor comes from extensive checking that requires additional information, other than what is available in just the importable part.
For instance, when you create or edit an Indexes import, the facility for adding new indexes needs the type information, which is not present in this part when it is edited alone.
To overcome this, when you edit these descriptors, you will be asked to specify a context descriptor, usually a descriptor which would import the part being edited, which would have the additional information needed.
Various methods are used to guess what the context descriptor should be; if the guess is correct, you can just press the Enter key to confirm.
The last successful context file is remembered and will be suggested as the context file to use at the next edit session.