<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>CFE User Guide</title><link rel="stylesheet" href="css/stylesheet-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.72.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>CFE User Guide</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><span class="productname">Apache UIMA Sandbox<br></span></div><div><p class="releaseinfo">Version 2.3.0</p></div><div><p class="copyright">Copyright © 2008, 2009 The Apache Software Foundation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer. </b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF). | |
Incubation is required of all newly accepted projects until a further review indicates that | |
the infrastructure, communications, and decision making process have stabilized in a manner | |
consistent with other successful ASF projects. While incubation status is not necessarily | |
a reflection of the completeness or stability of the code, | |
it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer. </b>The ASF licenses this documentation | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this documentation except in compliance | |
with the License. You may obtain a copy of the License at | |
</p><div class="blockquote"><blockquote class="blockquote"><p> | |
<a xmlns:xlink="http://www.w3.org/1999/xlink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a> | |
</p></blockquote></div><p> | |
Unless required by applicable law or agreed to in writing, | |
this documentation and its contents are distributed under the License | |
on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
</p><p> </p><p> </p><p><b>Trademarks. </b>All terms mentioned in the text that are known to be trademarks or | |
service marks have been appropriately capitalized. Use of such terms | |
in this book should not be regarded as affecting the validity of the
trademark or service mark.
</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#_Overview">1. | |
Overview | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Motivation">1.1. | |
Motivation | |
</a></span></dt><dt><span class="section"><a href="#_Approaches_to_feature_extraction">1.2. | |
Approaches to feature extraction | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Custom_CAS_Consumers">1.2.1. | |
Custom CAS Consumers | |
</a></span></dt><dt><span class="section"><a href="#_CFE_approach">1.2.2. | |
CFE approach | |
</a></span></dt></dl></dd><dt><span class="section"><a href="#_CFE_Basics">1.3. | |
CFE Basics | |
</a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Components">2. | |
Components | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_FESL_XSD">2.1. | |
FESL XSD | |
</a></span></dt><dt><span class="section"><a href="#_Source_Code">2.2. | |
Source Code | |
</a></span></dt><dt><span class="section"><a href="#_Descriptors">2.3. | |
Descriptors | |
</a></span></dt><dt><span class="section"><a href="#_Type_Dependencies">2.4. | |
Type Dependencies | |
</a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Configuration_Files">3. | |
Configuration Files | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Common_notations_and_tags">3.1. | |
Common notations and tags | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Feature_path">3.1.1. | |
Feature path | |
</a></span></dt><dt><span class="section"><a href="#_Full_path_and_partial_path">3.1.2. | |
Full path and partial path | |
</a></span></dt><dt><span class="section"><a href="#_TAM_and_FAM">3.1.3. | |
TAM and FAM | |
</a></span></dt><dt><span class="section"><a href="#_Arrays">3.1.4. | |
Arrays | |
</a></span></dt><dt><span class="section"><a href="#_Parent_tag">3.1.5. | |
Parent tag | |
</a></span></dt><dt><span class="section"><a href="#_Null_values">3.1.6. | |
Null values | |
</a></span></dt><dt><span class="section"><a href="#_Implicit_TA_exclusion">3.1.7. | |
Implicit TA exclusion | |
</a></span></dt></dl></dd><dt><span class="section"><a href="#_FESL_Elements">3.2. | |
FESL Elements | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_BitsetFeatureValuesXML">3.2.1. | |
BitsetFeatureValuesXML | |
</a></span></dt><dt><span class="section"><a href="#_EnumFeatureValuesXML">3.2.2. | |
EnumFeatureValuesXML | |
</a></span></dt><dt><span class="section"><a href="#_ObjectPathFeatureValue">3.2.3. | |
ObjectPathFeatureValuesXML | |
</a></span></dt><dt><span class="section"><a href="#_PatternFeatureValuesXM">3.2.4. | |
PatternFeatureValuesXML | |
</a></span></dt><dt><span class="section"><a href="#_RangeFeatureValuesXML">3.2.5. | |
RangeFeatureValuesXML | |
</a></span></dt><dt><span class="section"><a href="#_SingleFeatureMatcherXML">3.2.6. | |
SingleFeatureMatcherXML | |
</a></span></dt><dt><span class="section"><a href="#_GroupFeatureMatcherXML">3.2.7. | |
GroupFeatureMatcherXML | |
</a></span></dt><dt><span class="section"><a href="#_PartialObjectMatcherXML">3.2.8. | |
PartialObjectMatcherXML | |
</a></span></dt><dt><span class="section"><a href="#_FeatureObjectMatcherXML">3.2.9. | |
FeatureObjectMatcherXML | |
</a></span></dt><dt><span class="section"><a href="#_TargetAnnotationXML">3.2.10. | |
TargetAnnotationXML
</a></span></dt></dl></dd><dt><span class="section"><a href="#_Configuration_file_sample">3.3. | |
Configuration file sample | |
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Task_definition">3.3.1. | |
Task definition | |
</a></span></dt><dt><span class="section"><a href="#_Implementation">3.3.2. | |
Implementation | |
</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#_Using_CFE_for_evaluation">4. | |
Using CFE for evaluation | |
</a></span></dt></dl></div><div class="chapter" lang="en" id="_Overview"><div class="titlepage"><div><div><h2 class="title"><a name="_Overview"></a>Chapter 1. | |
Overview | |
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Motivation"></a>1.1. | |
Motivation | |
</h2></div></div></div><p class="Normal">Feature extraction, the extraction of
information from data sources, is a task common
to many different types of applications, such as
machine learning, performance evaluation, and statistical analysis.
This guide describes a tool that can be used to facilitate this | |
extraction process, in conjunction with the Unstructured Information | |
Management Architecture (UIMA), particularly focusing on text | |
processing applications. UIMA provides a mechanism for executing | |
modules called Analysis Engines that analyze artifacts (text | |
documents in our case) and store the results of the analysis in a | |
data structure called the Common Analysis Structure (CAS). These | |
results are stored as Feature Structures, which are simply data | |
structures that have an associated type and a set of properties in | |
the form of attribute/value pairs. Feature Structures that are | |
attached to a particular span of a text document are called | |
Annotations. They usually represent a concept that the analysis | |
engine computes based on the text. The attributes are called | |
<code class="code">Features</code> in UIMA terminology. This sense of feature will always be | |
referred to as <code class="code">UIMA feature</code> in this document, so as not to be | |
confused with the general sense of <code class="code">feature</code> when discussing | |
<code class="code">feature extraction</code>, referring to the process of extracting values | |
from data sources (in our case, the CAS). Values that are extracted | |
are not required to be values of attributes (i.e., UIMA Features) of | |
Annotations, but can be computed by other methods, as will be shown | |
later. The terms <code class="code">features</code> and <code class="code">feature values</code> | |
in this document refer to any value extracted from the CAS, regardless of the particular | |
source. | |
</p><p class="Normal"></p><p class="Normal">As an example, Figure 1 depicts annotation objects | |
of the type Token that are associated with individual words, each | |
having attributes <code class="code">Index</code> and <code class="code">POS</code> (part of speech). A feature | |
extraction task could be "extract token indexes for the words that | |
are nouns". Such a task is translated to the following execution | |
steps: | |
</p><div class="orderedlist"><ol type="1"><li><p class="Normal">find an annotation of a type <code class="code">Token</code></p></li><li><p class="Normal">examine the value of <code class="code">POS</code> attribute</p></li><li><p class="Normal">extract the value of <code class="code">Index</code> attribute only if | |
the value of <code class="code">POS</code> attribute is <code class="code">NN</code> | |
</p></li></ol></div><p class="Normal">The expression "word that is a noun" defines a
concept, and the annotations that implement it have to be found in the
CAS. <code class="code">Token index</code> is the information (i.e., the <code class="code">feature</code>) to be
extracted. The task yields the values 3 and 9,
which are the values of the <code class="code">Index</code> attribute for the words <code class="code">car</code> and
<code class="code">finish</code>.
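</p><p class="Normal">
The steps above can be sketched in plain Java. This is only an illustration of the hand-written logic involved, not CFE or UIMA code: the <code class="code">Token</code> class is a hypothetical stand-in for the annotation type, and the token list is an assumed excerpt chosen to agree with the index values quoted in the text.
</p>

```java
// Plain-Java sketch; Token is a hypothetical stand-in, not a UIMA class.
final class NounIndexSketch {
    static final class Token {
        final String text;
        final int index;
        final String pos;

        Token(String text, int index, String pos) {
            this.text = text;
            this.index = index;
            this.pos = pos;
        }
    }

    // Returns the Index values of all tokens whose POS is "NN", comma separated.
    static String nounIndexes(Token[] tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            if ("NN".equals(t.pos)) {          // step 2: examine POS
                if (out.length() > 0) {
                    out.append(',');
                }
                out.append(t.index);           // step 3: extract Index
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed excerpt of the annotated text (hypothetical values).
        Token[] tokens = {
            new Token("car", 3, "NN"),
            new Token("at", 7, "IN"),
            new Token("the", 8, "DT"),
            new Token("finish", 9, "NN")
        };
        System.out.println(nounIndexes(tokens)); // prints 3,9
    }
}
```

<p class="Normal">
Even this small task needs an explicit loop and a hard-coded condition in custom code.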
</p><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-1.jpg"></span> | |
</p><p class="LREC Caption"> | |
Figure 1: Annotated text sample | |
</p><p class="Normal">While Figure 1 shows a fairly simple example of
annotation types associated with some text, real world applications
could have quite sophisticated annotation types, storing various
kinds of computed information. Consider an annotation type <code class="code">Car</code> that
has, for illustration purposes, just two attributes: <code class="code">Color</code> and
<code class="code">Engine</code>. While the attribute <code class="code">Color</code> is of type string, the <code class="code">Engine</code>
attribute is a complex annotation type with attributes <code class="code">Cylinders</code> and
<code class="code">Size</code>. This is represented by a UML diagram in Figure 2, illustrating
a class hierarchy on the left and a sample instance of this class
structure on the right.
</p><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-3.jpg"></span> | |
</p><p class="LREC Caption"> | |
Figure 2: Composite object sample | |
</p><p class="Normal"> | |
If a requirement is to extract the number of cylinders of the car's | |
engine, then the application needs to find any object(s) that represent | |
the concept of a car (<code class="code">CarAnnotation</code> in this case) and traverse the | |
object's structure to access the <code class="code">Cylinders</code> attribute of <code class="code">EngineAnnotation</code>. | |
Once the attribute's value is accessed, the application outputs it to the | |
desired destination, such as a text file or a database. | |
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Approaches_to_feature_extraction"></a>1.2. | |
Approaches to feature extraction | |
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Custom_CAS_Consumers"></a>1.2.1. | |
Custom CAS Consumers | |
</h3></div></div></div><p class="Normal"> | |
When working with UIMA, feature extraction is usually implemented by | |
writing a special UIMA component called a CAS Consumer that contains | |
custom code for accessing the annotations and their attributes, | |
outputting them to a file, memory or database as required. The CAS | |
consumer contains explicit logic for traversing the object's structure | |
and examining values of specific attributes. Also, the CAS consumer would | |
likely have code for outputting the accessed values to a particular | |
destination, as required by the application. Writing CAS consumers can be | |
labor intensive and requires Java programming. While this approach allows | |
powerful control and customization to an application's needs, supporting | |
the code can become problematic, especially as application requirements | |
change. This can have a negative effect on many different aspects of code | |
support, such as maintenance, evolution, bug fixing, reusability etc. | |
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_CFE_approach"></a>1.2.2. | |
CFE approach | |
</h3></div></div></div><p class="Normal"></p><p class="Normal"> | |
CFE is a multipurpose tool that enables feature extraction from a UIMA | |
CAS in a very generalized and application independent way. The extraction | |
process is performed according to rules expressed using the Feature | |
Extraction Specification Language (FESL) that are stored in configuration | |
files. Using CFE eliminates the need for creating customized CAS | |
consumers and writing Java code for every application. Instead, by using | |
FESL rules in XML format, users can customize the information extraction | |
process to suit their application. FESL's rule semantics allow the | |
precise identification of the information that is required to be | |
extracted by specifying precise multi-parameter criteria. The FESL syntax | |
and semantics are defined further in this guide.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_CFE_Basics"></a>1.3. | |
CFE Basics | |
</h2></div></div></div><p class="Normal">The feature extraction process involves three | |
major steps:</p><div class="orderedlist"><ol type="1"><li><p class="Normal"> | |
locating a concept of interest that is represented by a UIMA annotation | |
object; examples of such concepts could be "word that is a noun" or "a | |
car that has a six cylinder engine" etc. The annotation object that | |
represents such a concept is referred to as the Target Annotation (TA) | |
</p></li><li><p class="Normal"> | |
locating concepts, relative to the TAs, that specify the information to
extract. These are also represented by UIMA annotations that are within
some context of the TAs. Some examples of context could be "to the left
of the TA" or "within the TA" etc. The annotation object that corresponds
to such a concept is referred to as the Feature Annotation (FA).
In relation to Figure 1, an example FA could be the expression "two words
to the left of the word finish that is a noun", assuming that "the word finish
that is a noun" describes the TA. The result of such a specification
will be the tokens <code class="code">at</code> and <code class="code">the</code>.
</p></li><li><p class="Normal">extraction of the specified information | |
from FAs | |
</p></li></ol></div><p class="Normal"> | |
<a name="FA"></a> | |
To illustrate the process, suppose the requirement is "to
extract indexes of two words to the left of the word finish that is
a noun". In such a scenario, in the first step, CFE locates a TA
that is represented by an annotation object that corresponds to the word
<code class="code">finish</code> and has its <code class="code">POS</code> attribute equal to <code class="code">NN</code>. In the
second step, the FAs that correspond to the two words to the left of the TA
are located. In the third step, the values of the <code class="code">Index</code> attribute of
each FA that was found are extracted. It is possible, however,
that the requirement is to extract the value of the <code class="code">Index</code> attribute
from the annotation for the word <code class="code">finish</code> itself. In such a case,
the TA and FA are represented by the same UIMA annotation object.
This is usually the case when extracting features for evaluation or
testing. A TA or FA can be specified by
complex multi-parameter conditions that are also expressed using
FESL, as will be shown later.
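</p><p class="Normal">
The three steps can be traced with a small plain-Java sketch. It does not use CFE or FESL: <code class="code">Token</code>, <code class="code">findTarget</code> and <code class="code">leftIndexes</code> are hypothetical names, and the token values are an assumed excerpt that agrees with the tokens and indexes mentioned in the text.
</p>

```java
// Plain-Java sketch of the three steps; Token is a hypothetical stand-in, not a UIMA class.
final class TargetFeatureSketch {
    static final class Token {
        final String text;
        final int index;
        final String pos;

        Token(String text, int index, String pos) {
            this.text = text;
            this.index = index;
            this.pos = pos;
        }
    }

    // Step 1: locate the TA, here the first token with the given text and POS.
    // Returns its position in the array, or -1 if there is none.
    static int findTarget(Token[] tokens, String text, String pos) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].text.equals(text) && tokens[i].pos.equals(pos)) {
                return i;
            }
        }
        return -1;
    }

    // Steps 2 and 3: the FAs are up to `count` tokens immediately to the left
    // of the TA; the Index value of each FA is extracted.
    static int[] leftIndexes(Token[] tokens, int target, int count) {
        if (target < 0) {
            return new int[0];
        }
        int start = Math.max(0, target - count);
        int[] result = new int[target - start];
        for (int i = start; i < target; i++) {
            result[i - start] = tokens[i].index;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed excerpt of the annotated text (hypothetical values).
        Token[] tokens = {
            new Token("car", 3, "NN"),
            new Token("at", 7, "IN"),
            new Token("the", 8, "DT"),
            new Token("finish", 9, "NN")
        };
        int ta = findTarget(tokens, "finish", "NN");
        System.out.println(java.util.Arrays.toString(leftIndexes(tokens, ta, 2))); // prints [7, 8]
    }
}
```

<p class="Normal">
When the TA and the FA are the same object, step 2 is trivial and the value is read directly from the token found in step 1.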
</p></div></div><div class="chapter" lang="en" id="_Components"><div class="titlepage"><div><div><h2 class="title"><a name="_Components"></a>Chapter 2. | |
Components | |
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_XSD"></a>2.1. | |
FESL XSD | |
</h2></div></div></div><p class="Normal"> | |
The specification for FESL is written in XSD format and stored in the
file &lt;CFE_HOME&gt;/src/main/xsdForEmf/CFEConfigModel.xsd (to be used
by the EMF-based parser generator) and in &lt;CFE_HOME&gt;/src/main/xsdForXMLBeans
(for the XMLBeans parser generator). Using this XSD in conjunction with an
XML editor that provides syntax validation can
make editing FESL configuration files more efficient.
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Source_Code"></a>2.2. | |
Source Code | |
</h2></div></div></div><p class="Normal">CFE is implemented in Java 5.0 for Apache UIMA, and | |
resides in the org.apache.uima.tools.cfe package. CFE is dependent on | |
Eclipse EMF, Apache UIMA, and the Apache XMLBeans and JXPath | |
libraries. The source code contains the complete implementation of
CFE, including auxiliary utility classes that wrap some UIMA
functionality (located in the org.apache.uima.tools.cfe.support package).
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Descriptors"></a>2.3. | |
Descriptors | |
</h2></div></div></div><p class="Normal"> | |
A sample descriptor file that defines a type system for machine learning
processing is located in
&lt;CFE_HOME&gt;/src/main/resources/descriptors/type_system/AppliedSenseAnnotation.xml
</p><p class="Normal">
A sample descriptor that uses CFE in a CAS Consumer is located in
&lt;CFE_HOME&gt;/src/main/resources/descriptors/cas_consumers/UIMAFeatureConsumer.xml
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Type_Dependencies"></a>2.4. | |
Type Dependencies | |
</h2></div></div></div><p class="Normal"> | |
CFE code uses the UIMA example annotation type
<code class="code">org.apache.uima.examples.SourceDocumentInformation</code>
to retrieve the name of the document that is being processed.
Typically, annotations of this type are produced by the file collection reader
provided with the UIMA examples. If a UIMA application uses a different type
of reader, an annotation of this type should be created and initialized
for each document prior to execution of the TAE. Please see
&lt;CFE_HOME&gt;/src/test/java/org/apache/uima/tools/cfe/test/CFEtest.java
for an example.
</p></div></div><div class="chapter" lang="en" id="_Configuration_Files"><div class="titlepage"><div><div><h2 class="title"><a name="_Configuration_Files"></a>Chapter 3. | |
Configuration Files | |
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Common_notations_and_tags"></a>3.1. | |
Common notations and tags | |
</h2></div></div></div><p class="Normal"> | |
CFE configuration files are written using FESL semantic rules, as defined | |
in CFEConfig.xsd. These rules describe the information extraction process | |
and are independent of the application from which the information is to | |
be extracted. There are several common notations and tags that are used
in different elements of FESL.
</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Feature_path"></a>3.1.1. | |
Feature path | |
</h3></div></div></div><p class="Normal"> | |
A "feature path" is a mechanism used by FESL to identify a particular | |
feature (not necessarily a UIMA feature) of an annotation. The value | |
associated with the feature, indicated by the feature path, can be either
evaluated to match certain criteria or extracted to the final output or
both. The syntax of a feature path is an indexed sequence of
attribute/method names separated by the colon character. Such a sequence
mimics the sequence of Java method calls required to extract the feature
value. For example, a value of the <code class="code">EngineAnnotation</code> attribute <code class="code">Cylinders</code>
from Figure 2 can be written as <code class="code">CarAnnotation:Engine:Cylinders</code>, where
<code class="code">Engine</code> is an attribute of <code class="code">CarAnnotation</code>. The intermediate results of each
step of the call sequence can be referenced from different FESL structural
elements by their zero-based index. For instance, the Parent Tag notation
(see below) uses the index to access intermediate values. The feature | |
path can be used to identify feature values that are either primitives or | |
complex object types. | |
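</p><p class="Normal">
How a feature path mimics a sequence of Java method calls can be sketched with reflection. The code below is not the CFE implementation: the <code class="code">CarAnnotation</code> and <code class="code">EngineAnnotation</code> classes are hypothetical plain-Java stand-ins with made-up values, and mapping an attribute name <code class="code">X</code> to a method <code class="code">getX()</code> is an assumption made for this illustration.
</p>

```java
// Sketch only: resolves a colon-separated path by calling get-style methods.
final class FeaturePathSketch {
    // Hypothetical stand-ins for the types of Figure 2 (values are made up).
    public static final class EngineAnnotation {
        public int getCylinders() { return 6; }
        public double getSize() { return 3.0; }
    }

    public static final class CarAnnotation {
        private final EngineAnnotation engine = new EngineAnnotation();
        public String getColor() { return "red"; }
        public EngineAnnotation getEngine() { return engine; }
    }

    // Walks the path starting from `start` and returns every intermediate
    // result: element 0 is the start object, the last element is the value.
    static Object[] resolve(Object start, String path) {
        String[] names = path.split(":");
        Object[] values = new Object[names.length + 1];
        values[0] = start;
        try {
            for (int i = 0; i < names.length; i++) {
                // Assumption: attribute "X" is read through a method "getX".
                java.lang.reflect.Method getter =
                        values[i].getClass().getMethod("get" + names[i]);
                values[i + 1] = getter.invoke(values[i]);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot resolve path: " + path, e);
        }
        return values;
    }

    public static void main(String[] args) {
        Object[] values = resolve(new CarAnnotation(), "Engine:Cylinders");
        System.out.println(values[2]); // prints 6
    }
}
```

<p class="Normal">
Keeping all intermediate results in an array mirrors the zero-based indexing described above: index 0 is the car object, index 1 its engine, and index 2 the extracted value.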
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Full_path_and_partial_path"></a>3.1.2. | |
Full path and partial path | |
</h3></div></div></div><p class="Normal"> | |
There are two different ways of using feature path notation to identify | |
an object: full path and partial path. The object can be one of the | |
following: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">an annotation</p></li><li style="list-style-type: disc"><p class="Normal">value of an annotation's attribute</p></li><li style="list-style-type: disc"><p class="Normal"> | |
value of a result of an annotation's method; only get-style methods | |
(methods that return a value and take no parameters) are supported. | |
</p></li></ul></div><p class="Normal"> | |
A full path specifies a path to an object starting from its type. For | |
instance, if <code class="code">EngineAnnotation</code> is specified as a full path, it would refer | |
to all instances of annotations of that type. If <code class="code">CarAnnotation:Engine</code> is | |
specified, it would refer only to instances of the <code class="code">EngineAnnotation</code> type that are | |
attributes of instances of the <code class="code">CarAnnotation</code> type. Full path notation is usually | |
used for TA or FA identification. | |
</p><p class="Normal"> | |
A partial path specifies a path to an object starting from a previously | |
located annotation object (whether TA or FA). For example, if an instance | |
of <code class="code">CarAnnotation</code> is located as a TA, then the size of its engine can be | |
specified as Engine:Size. Partial path notation is usually used for | |
specification of feature values that are being examined or extracted. | |
The distinction between "full path" and "partial path" is very similar to | |
the concepts of "absolute path" and "relative path" when discussing a | |
computer's file system. | |
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TAM_and_FAM"></a>3.1.3. | |
TAM and FAM | |
</h3></div></div></div><p class="Normal"> | |
Each FESL rule is represented by an XML element with the tag
<code class="code">targetAnnotation</code>, as specified in the XSD by the
<a href="#_TargetAnnotationXML" title="3.2.10. TargetAnnotationXML">
<span class="Hyperlink2">TargetAnnotationXML</span> | |
</a> | |
type. Each element of this type is a composition of: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
a single target annotation matcher ( | |
<code class="code">TAM</code> | |
) that is denoted by an XML element with the tag | |
<code class="code">targetAnnotationMatcher</code> | |
, of the type | |
<a href="#_PartialObjectMatcherXML" title="3.2.8. PartialObjectMatcherXML"> | |
<code class="code">PartialObjectMatcherXML</code> | |
</a> | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
optional feature annotation matchers ( | |
<code class="code">FAM</code> | |
) denoted by XML elements with the tag <code class="code">featureAnnotationMatchers</code>, | |
of the type | |
<a href="#_FeatureObjectMatcherXML" title="3.2.9. FeatureObjectMatcherXML"> | |
<code class="code">FeatureObjectMatcherXML</code> | |
</a> | |
</p></li></ul></div><p class="Normal"> | |
The | |
<code class="code">TAM</code> | |
specifies search criteria for locating Target Annotations ( | |
<code class="code">TA</code> | |
s), while | |
<code class="code">FAM</code> | |
s contain criteria for locating Feature Annotations ( | |
<code class="code">FA</code> | |
s) and the specification of features for extraction from the | |
<code class="code">FA</code> | |
s. The criteria for the search and the features to be extracted are | |
specified using the | |
<a href="#_Feature_path" title="3.1.1. Feature path"> | |
<span class="Hyperlink1">feature path</span> | |
</a> | |
notation, as explained earlier. The XML tags representing the | |
matchers are detailed below. | |
<span class="system1"> </span> | |
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Arrays"></a>3.1.4. | |
Arrays | |
</h3></div></div></div><p class="Normal"> | |
Since UIMA annotations may have arrays as attributes, FESL provides the | |
ability to perform feature extraction from array objects. In particular, | |
going back to Figure 2, if the implementation for the <code class="code">Wheels</code> attribute is | |
a UIMA <code class="code">FSArray</code> type, then using feature path notation: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
the feature value for the | |
<code class="code">Wheels</code> | |
attribute of | |
<code class="code">FSArray</code> | |
type can be specified as <code class="code">CarAnnotation:Wheels</code>. | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
the feature value for the number of elements in the | |
<code class="code">FSArray</code> | |
can be specified as <code class="code">CarAnnotation:Wheels:size</code>, where size is a | |
method of | |
<code class="code">FSArray</code> | |
; such value corresponds to a concept of how many wheels the car | |
has. | |
</p></li><li style="list-style-type: disc"><p class="Normal">the feature values for individual elements of | |
<code class="code">Wheels</code> attribute of type <code class="code">WheelAnnotation</code> can be accessed as | |
<code class="code">CarAnnotation:Wheels:toArray</code>. It should be noted that <code class="code">toArray</code> is a | |
name of a method of the <code class="code">FSArray</code> type rather than a name of an | |
attribute.</p></li><li style="list-style-type: disc"><p class="Normal">the feature values for <code class="code">Diameter</code> attribute of each | |
<code class="code">WheelAnnotation</code> can be specified as | |
<code class="code">CarAnnotation:Wheels:toArray:Diameter</code> | |
</p></li></ul></div><p class="Normal"> | |
The result of using toArray as an accessor is an array of values. FESL | |
also provides syntax for accessing individual elements of arrays by index. | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
the feature for the diameter of the first wheel can be specified as | |
<code class="code">CarAnnotation:Wheels:toArray[0]:Diameter</code> | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
the feature for the diameter of the first and second wheels can be | |
specified as <code class="code">CarAnnotation:Wheels:toArray[0][1]:Diameter</code> | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
the feature for the diameter of first three wheels can be specified | |
as <code class="code">CarAnnotation:Wheels:toArray[0-2]:Diameter</code> | |
</p></li></ul></div><p class="Normal"> | |
The specification of individual elements can be mixed. For example, for a car with four wheels,
<code class="code">CarAnnotation:Wheels:toArray[0][2-3]:Diameter</code> refers to all elements of the
<code class="code">Wheels</code> attribute except the second. If a specified index falls outside
the range of the matched data, a null value will be assigned.
</p><p class="Normal"> | |
If required, FESL allows sorting extracted features by an offset in the | |
text of the annotations that these features are extracted from. For | |
instance <code class="code">CarAnnotation:Wheels:toArray[sort]:Diameter</code> would ensure such | |
an order. | |
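</p><p class="Normal">
The index notation can be illustrated with a short plain-Java sketch. It is not CFE's parser: <code class="code">expand</code> and <code class="code">select</code> are hypothetical helpers, the diameter values are made up, only well-formed specifications such as <code class="code">[0]</code>, <code class="code">[0][1]</code> and <code class="code">[0-2]</code> are handled, and the <code class="code">[sort]</code> form is left out.
</p>

```java
// Sketch only: expands an index specification and applies it to an array.
final class IndexSpecSketch {
    // Expands e.g. "[0][2-3]" into {0, 2, 3}. Assumes a well-formed spec.
    static int[] expand(String spec) {
        String[] groups = spec.substring(1, spec.length() - 1).split("\\]\\[");
        int[] indexes = new int[0];
        for (String group : groups) {
            int dash = group.indexOf('-');
            int from = Integer.parseInt(dash < 0 ? group : group.substring(0, dash));
            int to = dash < 0 ? from : Integer.parseInt(group.substring(dash + 1));
            int oldLength = indexes.length;
            indexes = java.util.Arrays.copyOf(indexes, oldLength + (to - from + 1));
            for (int i = from; i <= to; i++) {
                indexes[oldLength + i - from] = i;
            }
        }
        return indexes;
    }

    // Picks the requested elements; an out-of-range index yields null.
    static Object[] select(Object[] values, int[] indexes) {
        Object[] out = new Object[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int k = indexes[i];
            out[i] = (k >= 0 && k < values.length) ? values[k] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] diameters = {16, 16, 17, 17}; // hypothetical wheel diameters
        Object[] picked = select(diameters, expand("[0][2-3]"));
        System.out.println(java.util.Arrays.toString(picked)); // prints [16, 17, 17]
    }
}
```

<p class="Normal">
Returning null for an index beyond the end of the array follows the rule stated above for indexes that fall outside the range of the matched data.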
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Parent_tag"></a>3.1.5. | |
Parent tag | |
</h3></div></div></div><p class="Normal"> | |
The parent tag is used to access a specific element of a feature path of | |
a TA or FA by index. If a parent tag is used within a TAM specification, | |
it is applied to the full path of the corresponding TA. Likewise, parent
tags contained in FAMs are applied to the full path of the
corresponding FA. The tag consists of the <code class="code">__p</code> prefix followed by the index
of an element that is being accessed. For instance, <code class="code">__p0</code> addresses the | |
first element of a feature path. The tag can be a part of a feature path. | |
For example, if a TA is specified as <code class="code">CarAnnotation:Wheels:toArray</code>, | |
corresponding to a concept of "wheels of a car" then the value of the | |
<code class="code">Color</code> attribute of a <code class="code">CarAnnotation</code> object can be accessed by specifying | |
<code class="code">__p0:Color</code>. Such a specification can be used when it is required to | |
examine/extract features of a containing annotation along with features | |
of contained annotations. Samples of using parent tags are provided in | |
the sections that detail FESL syntax, below. | |
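</p><p class="Normal">
As a rough illustration of the notation only, the following plain-Java sketch splits a path such as <code class="code">__p0:Color</code> into the parent index and the remaining path. The helper names <code class="code">parentIndex</code> and <code class="code">remainder</code> are hypothetical and do not come from the CFE source.
</p>

```java
// Sketch only: splits a path that starts with a parent tag such as "__p0".
final class ParentTagSketch {
    static final String PREFIX = "__p";

    // Returns the index encoded in a leading parent tag, or -1 if the
    // path does not start with one.
    static int parentIndex(String path) {
        String first = path.split(":")[0];
        if (!first.startsWith(PREFIX)) {
            return -1;
        }
        return Integer.parseInt(first.substring(PREFIX.length()));
    }

    // Returns the part of the path after the first element ("" if none).
    static String remainder(String path) {
        int colon = path.indexOf(':');
        return colon < 0 ? "" : path.substring(colon + 1);
    }

    public static void main(String[] args) {
        // Elements of the TA's full path from the example in the text.
        String[] fullPath = {"CarAnnotation", "Wheels", "toArray"};
        String tagPath = "__p0:Color";
        System.out.println(fullPath[parentIndex(tagPath)] + " -> " + remainder(tagPath));
        // prints CarAnnotation -> Color
    }
}
```

<p class="Normal">
In this reading, the index selects the element of the full path whose value becomes the starting point, and the remainder is then resolved from that value like any partial path.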
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Null_values"></a>3.1.6. | |
Null values | |
</h3></div></div></div><p class="Normal"> | |
CFE allows comparing feature values for equality to null. The root XML | |
element CFEConfig has a string attribute <code class="code">nullValueImage</code> that sets a | |
literal representation of a null value. If an extracted feature value is
null, it will be converted to the string that is assigned to the
<code class="code">nullValueImage</code> attribute. The example below illustrates the usage of this
attribute. | |
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implicit_TA_exclusion"></a>3.1.7. | |
Implicit TA exclusion | |
</h3></div></div></div><p class="Normal"> | |
While all FAM specifications for a single TAM are independent from | |
each other, there is an implicit dependency between TAMs. In | |
particular, they are dependent on the order in which they are | |
specified in a configuration file. Annotations corresponding to | |
certain concepts that were identified by a TAM that appear earlier in | |
the configuration file will be excluded from further processing by | |
FESL. This rule only applies to TAMs that use the | |
<code class="code">fullPath</code> | |
attribute in their specification (see | |
<a href="#_PartialObjectMatcherXML" title="3.2.8. PartialObjectMatcherXML"> | |
<span class="Hyperlink1"> | |
<code class="code">PartialObjectMatcherXML</code> | |
</span> | |
</a> | |
). Having the implicit exclusion helps to separate the processing of | |
same type annotations in the case when these annotations have | |
different semantic meaning. For instance, the set of features that is | |
required to be extracted from annotations of type | |
<code class="code">EngineAnnotation</code> | |
that are attributes of | |
<code class="code">CarAnnotation</code> | |
objects can be different than a set of features that is required to | |
be extracted from annotations of the same | |
<code class="code">EngineAnnotation</code> | |
type that are attributes of some other type or are not attached to | |
any annotations of other types. To implement such a behavior in FESL, | |
the first
<code class="code">TAM</code> | |
would contain criteria for locating | |
<code class="code">EngineAnnotation</code> | |
objects that are attached to objects of the | |
<code class="code">CarAnnotation</code> | |
type, while the second | |
<code class="code">TAM</code> | |
would not specify any restriction on containment of objects of the | |
<code class="code">EngineAnnotation</code> | |
type. If such a specification is given, all | |
<code class="code">EngineAnnotation</code> | |
objects located according to the rule in the first | |
<code class="code">TAM</code> | |
will be excluded from further processing and, hence, will not be | |
available for processing by rules given in the second | |
<code class="code">TAM</code>.
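</p><p class="Normal">
The ordering effect can be sketched in Python as shown below. This reflects one reading of the rule, namely that a TAM using <code class="code">fullPath</code> claims the annotations it matches so that later TAMs no longer see them. The function name, the tuple layout of a TAM and the string stand-ins for annotations are assumptions made for illustration.
</p>

```python
def assign_to_tams(annotations, tams):
    """Distribute annotations over TAMs in configuration-file order.

    tams: list of (name, predicate, uses_full_path) tuples (hypothetical layout).
    Annotations matched by a TAM that uses fullPath are excluded from all
    TAMs that follow it.
    """
    claimed = set()
    assignments = []
    for name, predicate, uses_full_path in tams:
        matched = [a for a in annotations if a not in claimed and predicate(a)]
        if uses_full_path:
            claimed.update(matched)
        assignments.append((name, matched))
    return assignments


engines = ["car_engine", "loose_engine"]
tams = [
    ("car engines", lambda a: a.startswith("car_"), True),
    ("all engines", lambda a: a.endswith("engine"), False),
]
result = assign_to_tams(engines, tams)
```

<p class="Normal">
In the sketch the second TAM places no restriction on containment, yet it only receives the engine that the first TAM did not claim.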
</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_Elements"></a>3.2. | |
FESL Elements | |
</h2></div></div></div><p class="Normal"> | |
FESL's XSD defines several elements for specifying feature extraction
rules. These elements may contain attributes and other elements in
their definitions.
</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_BitsetFeatureValuesXML"></a>3.2.1. | |
BitsetFeatureValuesXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: bitmask[1]: Integer</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exact_match[0..1]: boolean: default false</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-7.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
The specification enables comparing a feature value to an integer | |
bitmask. The feature value is considered to be matched if it is of an | |
Integer type and: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
if the <code class="code">exact_match</code> attribute is set to true and all "1" bits specified in
the bitmask are also set in the feature value
</p></li><li style="list-style-type: disc"><p class="Normal">
if the <code class="code">exact_match</code> attribute is set to false and any of the "1" bits
specified in the bitmask is also set in the feature value
</p></li></ul></div><p class="Normal">Example:</p><p class="Normal"><bitsetFeatureValues bitmask="3" exact_match="false" /></p><p class="Normal"><bitsetFeatureValues bitmask="3" exact_match="true" /></p><p class="Normal"> | |
The first line of the example tests whether either of the two least
significant bits of a feature value is set. The test specified by the
second line succeeds only if both of the two least significant bits are set.
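</p><p class="Normal">
The two matching modes can be expressed as a small Python sketch. The function name <code class="code">bitset_matches</code> is hypothetical, and the sketch only mirrors the rules stated above rather than the CFE implementation.
</p>

```python
def bitset_matches(value, bitmask, exact_match=False):
    """Compare an integer feature value with a bitmask.

    exact_match=True  : every 1 bit of the bitmask must be set in the value.
    exact_match=False : at least one 1 bit of the bitmask must be set.
    Values that are not integers never match.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    common = value & bitmask
    return common == bitmask if exact_match else common != 0


any_bit = bitset_matches(1, 3, exact_match=False)
all_bits = bitset_matches(1, 3, exact_match=True)
```

<p class="Normal">
For the bitmask 3 used in the example, a feature value of 1 satisfies the first test but not the second, because only one of the two bits is set.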
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_EnumFeatureValuesXML"></a>3.2.2. | |
EnumFeatureValuesXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: caseSensitive[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: values[0..*]: String</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-8.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
The EnumFeatureValuesXML element tests whether a feature value belongs to
a finite set of values. According to the EnumFeatureValuesXML specification,
the feature is considered to be successfully evaluated if its value is
equal to any one of the elements of values. The <code class="code">caseSensitive</code>
attribute indicates whether the comparison between the feature value and
members of the values element is case sensitive. The FESL fragment below
shows how to specify such a comparison:
</p><p class="Normal"><enumFeatureValues caseSensitive="true"></p><p class="Normal"><values>red</values></p><p class="Normal"><values>green</values></p><p class="Normal"><values>blue</values></p><p class="Normal"></enumFeatureValues></p><p class="Normal"> | |
This fragment specifies a case sensitive comparison of a feature value to | |
a set of strings: <code class="code">red</code>, <code class="code">green</code> and <code class="code">blue</code>. | |
</p><p class="Normal"> | |
Special processing occurs when there is only a single
<code class="code">values</code>
element and its text starts with <code class="code">file://</code>, which enables the use of external
dictionaries for comparison. In this case, the text within the element is
treated as a URI. The contents of the file referenced by the
URI are loaded and used as the set of values against which the feature
value is tested. The file should contain one dictionary entry
per line. Lines starting with the <code class="code">#</code> character are treated as
comments and are not loaded. The dictionary handling is
implemented in org.apache.uima.tools.cfe.EnumeratedEntryDictionary. The default
implementation supports single token (whitespace separated) dictionary
entries. If a more sophisticated dictionary format is desired, then
either the constructor's parameters can be changed or methods for
initializing and loading the dictionary from a file can be overridden.
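</p><p class="Normal">
The set membership test and the dictionary file format described above can be sketched in Python as follows. This is not the <code class="code">EnumeratedEntryDictionary</code> implementation. The function names are hypothetical, the dictionary is read from an iterable of lines instead of a URI, and skipping blank lines is an assumption.
</p>

```python
def load_dictionary(lines):
    """Collect dictionary entries, one per line.

    Lines starting with '#' are comments. Blank lines are skipped as well,
    which is an assumption of this sketch.
    """
    entries = set()
    for line in lines:
        if line.startswith("#"):
            continue
        entry = line.strip()
        if entry:
            entries.add(entry)
    return entries


def enum_matches(value, values, case_sensitive=False):
    """Return True if value equals any one of the given values."""
    if case_sensitive:
        return value in values
    return value.lower() in {v.lower() for v in values}


colors = load_dictionary(["# known colors\n", "red\n", "green\n", "\n", "blue\n"])
loose = enum_matches("Red", colors)
strict = enum_matches("Red", colors, case_sensitive=True)
```

<p class="Normal">
In this sketch the comment line is ignored, and the value <code class="code">Red</code> only matches when the comparison is not case sensitive.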
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_ObjectPathFeatureValue"></a>3.2.3. | |
ObjectPathFeatureValuesXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: objectPath[1]: String</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-9.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
According to the ObjectPathFeatureValuesXML specification, the
<a href="#_CFE_Basics" title="1.3. CFE Basics">TA</a> | |
or | |
<a href="#_CFE_Basics" title="1.3. CFE Basics"> | |
<span class="Hyperlink1">FA</span> | |
</a> | |
itself (depending on whether this element is in | |
<a href="#_TAM_and_FAM" title="3.1.3. TAM and FAM"> | |
<span class="Hyperlink1">TAM</span> | |
</a> | |
or in | |
<a href="#_TAM_and_FAM" title="3.1.3. TAM and FAM"> | |
<span class="Hyperlink1">FAM</span> | |
</a>) | |
is tested to determine whether it is at the location defined by the objectPath
attribute. This ability to evaluate whether a feature belongs to some CAS object is
useful specifically in cases where a particular feature value is the
property of several different objects. For instance, this element can be
used when features from annotations should be extracted only if they are
attributes of other annotations. The FESL fragment below specifies a test
that checks if an object's full path is
<code class="code">org.apache.uima.tools.cfe.sample.CarAnnotation:Wheels:toArray</code>. Such a test, for
instance, can be used to check if an instance of <code class="code">WheelAnnotation</code>
belongs to an instance of <code class="code">CarAnnotation</code>:
</p><p class="Normal"> | |
<objectPathFeatureValues objectPath="org.apache.uima.tools.cfe.sample.CarAnnotation:Wheels:toArray" />
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_PatternFeatureValuesXM"></a>3.2.4. | |
PatternFeatureValuesXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: pattern[1]: String</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-10.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
The PatternFeatureValuesXML element enables comparing a feature value
against a regular expression that is specified by the <code class="code">pattern</code> attribute
using Java Regular Expression syntax. The feature is considered to be
successfully evaluated if the value matches the pattern.
</p><p class="Normal"> | |
The FESL fragment below defines a test that checks if a feature value | |
conforms to the hex number format: | |
</p><p class="Normal"><patternFeatureValues pattern="(0[Xx][0-9A-Fa-f]+)" /></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_RangeFeatureValuesXML"></a>3.2.5. | |
RangeFeatureValuesXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundary[0..1]: Comparable: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundaryInclusive[0..1]: boolean default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundary[0..1]: Comparable default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundaryInclusive[0..1]: boolean default false</p></li></ul></div><div class="mediaobject"><span></span></div><p class="Normal"> | |
According to the RangeFeatureValuesXML specification, the feature value is
tested to determine whether it is of a Comparable type and belongs to the interval
specified by the attributes <code class="code">lowerBoundary</code> and <code class="code">upperBoundary</code>. The
attributes <code class="code">lowerBoundaryInclusive</code> and <code class="code">upperBoundaryInclusive</code> indicate
whether the corresponding boundaries should be included in the range for
comparison. The FESL fragment below specifies a test that checks if a feature
value is in the numeric range between 1.8 and 3.0, excluding 1.8 and
including 3.0:
</p><p class="Normal"> | |
<rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0" /></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_SingleFeatureMatcherXML"></a>3.2.6. | |
SingleFeatureMatcherXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: featurePath[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: featureTypeName[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: quiet[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureValues one of: </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">bitsetFeatureValues: BitsetFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">enumFeatureValues: EnumFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">objectPathFeatureValues: ObjectPathFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">patternFeatureValues: PatternFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">rangeFeatureValues: RangeFeatureValuesXML</p></li></ul></div></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-12.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
The <code class="code">SingleFeatureMatcherXML</code> defines rules for matching a feature value
against the featureValues element. The featureValues element can be any one of the
elements in the bullet list above. The previous sections detail the rules
for matching a feature value against each of these elements. Matching a
single feature value proceeds as follows. First, the value of the
feature denoted by the required <code class="code">featurePath</code> attribute is located. For
features that have arrays in their featurePath, multiple values can be
found. If one or more values are found and the optional <code class="code">featureTypeName</code> attribute
specifies a type name of the feature value, every found feature value is
tested to be of that type. If the test is successful, the feature values
are evaluated according to the specification given in featureValues. After
the evaluation is performed, a single feature is considered to be
successfully evaluated if:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
the exclude attribute value is set to false and at least one | |
feature value is matched to <code class="code">featureValues</code> specification. | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
the exclude attribute value is set to true and none of the | |
feature values is matched to <code class="code">featureValues</code> specification. | |
</p></li></ul></div><p class="Normal"> | |
For <code class="code">SingleFeatureMatcherXML</code> elements that are part of a TAM element, only
the evaluation of feature values is performed. If a <code class="code">SingleFeatureMatcherXML</code>
element is part of a FAM, the feature value is output only if the
<code class="code">quiet</code> attribute is set to false. If the value of the <code class="code">quiet</code> attribute is
set to true, then, even if the feature is matched, only an evaluation is
performed and no value is written into the final output. The <code class="code">featurePath</code>
attribute uses the feature path notation explained earlier.
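</p><p class="Normal">
The evaluation rules for a single feature, combined with a range test, can be sketched in Python as follows. The function names are hypothetical. Treating an omitted boundary as unbounded is an assumption of this sketch (the attribute list gives 0 as the default), and so is the behavior for an empty list of values.
</p>

```python
def in_range(value, lower=None, upper=None,
             lower_inclusive=False, upper_inclusive=False):
    """Range test; a boundary left as None is treated as unbounded (assumption)."""
    if lower is not None:
        if value < lower or (value == lower and not lower_inclusive):
            return False
    if upper is not None:
        if value > upper or (value == upper and not upper_inclusive):
            return False
    return True


def single_feature_matches(values, predicate, exclude=False):
    """Apply the exclude rule to all values located for one feature path.

    exclude=False : at least one value must satisfy the predicate.
    exclude=True  : none of the values may satisfy the predicate.
    """
    any_match = any(predicate(v) for v in values)
    return not any_match if exclude else any_match


def size_test(v):
    return in_range(v, lower=1.8, upper=3.0, upper_inclusive=True)


included = single_feature_matches([1.0, 2.5], size_test)
excluded = single_feature_matches([1.0, 2.5], size_test, exclude=True)
```

<p class="Normal">
In this sketch the value 2.5 lies in the interval, so the feature is matched when <code class="code">exclude</code> is false and rejected when it is true.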
</p><p class="Normal"> | |
The FESL fragment below defines a test that checks if a value of the <code class="code">Size</code>
attribute is in a range defined by the <code class="code">rangeFeatureValues</code> element:
</p><p class="Normal"><featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"><rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"></featureMatchers></p><p class="Normal"> | |
In addition it is allowed to use the parent tag (see | |
<a href="#_Parent_tag" title="3.1.5. Parent tag"> | |
<span class="Hyperlink1">Parent tag</span> | |
</a>) | |
in the <code class="code">featurePath</code> attribute. A sample in the <code class="code">PartialObjectMatcherXML</code>
section details how to use the parent tag notation.
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_GroupFeatureMatcherXML"></a>3.2.7. | |
GroupFeatureMatcherXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureMatchers[1..*]: SingleFeatureMatcherXML</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-13.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
This is a specification for matching a group of features. It can be applied | |
to both types of annotations, TAs and FAs. Each element in featureMatchers is | |
evaluated against either a TA or a FA annotation. The group is considered to | |
be matched if: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
the <code class="code">exclude</code> attribute value is set to false and all elements in
<code class="code">featureMatchers</code> have been successfully evaluated.
</p></li><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exclude</code> attribute value is set to true and the evaluation of any
of the elements in <code class="code">featureMatchers</code> is unsuccessful.
</p></li></ul></div><p class="Normal"> | |
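The two rules above can be sketched in Python as follows. The function name <code class="code">group_matches</code> and the representation of the individual matcher outcomes as a list of booleans are assumptions made for illustration.
</p>

```python
def group_matches(results, exclude=False):
    """Combine the outcomes of the featureMatchers elements of one group.

    results: one boolean per featureMatchers element.
    exclude=False : the group matches if every element was successful.
    exclude=True  : the group matches if at least one element failed.
    """
    all_matched = all(results)
    return not all_matched if exclude else all_matched


both_ok = group_matches([True, True])
one_failed = group_matches([True, False])
one_failed_excluded = group_matches([True, False], exclude=True)
```

<p class="Normal">
In this sketch a single failed element makes the group fail when <code class="code">exclude</code> is false and succeed when it is true.
</p><p class="Normal">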
The FESL fragment below defines a group with the two features <code class="code">Color</code> and
<code class="code">Wheels:Size</code> to be matched. The entire group is successfully evaluated
if both features are matched. The first feature is successfully evaluated if
its value is one of the values listed by its <code class="code">enumFeatureValues</code> element, and
the second feature is matched if its value is not in the set contained in its
<code class="code">enumFeatureValues</code> element, as specified by its <code class="code">exclude</code> attribute. It should
be noted that if the optional attribute <code class="code">featureTypeName</code> is omitted, a
feature value is assumed to be of a string type. Otherwise the feature value's type
is checked to be the same as, or derived from, the type specified by the
<code class="code">featureTypeName</code> attribute. Assuming the <code class="code">groupFeatureMatcher</code> is specified for
the <code class="code">CarAnnotation</code> type, the test defined by the FESL fragment below is
successful if a car is either red, green or blue and it does not have 1 or 3
wheels:
</p><p class="Normal"><groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Color" featureTypeName="java.lang.String"> </p><p class="Normal"> <enumFeatureValues caseSensitive="true"> </p><p class="Normal"> <values>red</values> </p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="Wheels:Size" exclude="true"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>1</values></p><p class="Normal"> <values>3</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"></groupFeatureMatchers></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_PartialObjectMatcherXML"></a>3.2.8.
PartialObjectMatcherXML | |
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: annotationTypeName[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: fullPath[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal"> | |
Element: groupFeatureMatchers[0..*]: GroupFeatureMatcherXML | |
</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-14.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
This is a base specification for an annotation matcher that searches for
annotations of a type specified by <code class="code">annotationTypeName</code> located on a path
specified by <code class="code">fullPath</code>. If <code class="code">fullPath</code> is omitted or just contains the type
name of an annotation (same as the <code class="code">annotationTypeName</code> attribute), then all
instances of that type are considered for further feature value
evaluation. If <code class="code">fullPath</code> contains a path to an object from an attribute of
a different object, then only instances of <code class="code">annotationTypeName</code> that
are located on that path are considered for further evaluation. Once an
annotation is successfully evaluated to match a type/path, its features
are evaluated according to the specification given in all elements of
<code class="code">groupFeatureMatchers</code>. If the evaluation of any <code class="code">groupFeatureMatchers</code> is
successful or if no <code class="code">groupFeatureMatchers</code> is given, then the annotation is
considered to be successfully evaluated. The <code class="code">fullPath</code> attribute should be
specified using the syntax described in the
<a href="#_Feature_path" title="3.1.1. Feature path"> | |
<span class="Hyperlink2">feature path</span> | |
</a> | |
section above, with the exception that it cannot contain any parent tags.
For instance, a specification where the value of the <code class="code">fullPath</code> attribute is
<code class="code">CarAnnotation:Engine</code> and the value of the <code class="code">annotationTypeName</code> is
<code class="code">EngineAnnotation</code> would address only engines that are car engines.
<code class="code">PartialObjectMatcherXML</code> is used to specify search rules in TAM
specifications. To illustrate the use of the parent tag notation, let's
consider an example where it is required to identify engines of red, green
or blue cars that have a size of more than 1.8 l but not greater than 3.0 l.
According to the class diagram in Figure 2, the FESL fragment below defines
rules for the task. It should be noted that the second feature matcher
uses the
<a href="#_Parent_tag" title="3.1.5. Parent tag"> | |
<span class="Hyperlink2">parent tag</span> | |
</a> notation to access a value of the <code class="code">CarAnnotation</code>'s attribute <code class="code">Color</code>: | |
</p><p class="Normal"><targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"> <rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>red</values></p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> </groupFeatureMatchers></p><p class="Normal"></targetAnnotationMatcher></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_FeatureObjectMatcherXML"></a>3.2.9.
FeatureObjectMatcherXML | |
</h3></div></div></div><p class="Normal">extends <code class="code">PartialObjectMatcherXML</code></p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeLeft[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeInside[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeRight[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeEnclosed[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowFlags[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: orientation[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: distance[0..1]: boolean: default false</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-15.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
The <code class="code">FeatureObjectMatcherXML</code> element contains rules that specify how | |
<code class="code">FeatureAnnotations</code> (FA) should be located and which features should be | |
extracted from them. It inherits its properties from | |
<code class="code">PartialObjectMatcherXML</code>. In addition it has semantics for specifying: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a size of a search window</p></li><li style="list-style-type: disc"><p class="Normal"> | |
a direction for the search relative to a corresponding Target Annotation (TA). | |
</p></li></ul></div><p class="Normal"> | |
This is done by using the integer attributes <code class="code">windowsizeLeft</code>, <code class="code">windowsizeInside</code>,
<code class="code">windowsizeRight</code>, <code class="code">windowsizeEnclosed</code> and the bitmask <code class="code">windowFlags</code> attribute,
which define the FA search rules:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">windowsizeLeft - a size of the search window to the left from TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeRight - a size of the search window to the right from TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeInside - a size of the search window within TA boundaries; if the value of this attribute is 1, then the TA is considered to be an FA at the same time</p></li><li style="list-style-type: disc"><p class="Normal">windowFlags - more precise criteria for search window; the value of this attribute is a bitmask with a combination of the following values:</p><div class="orderedlist"><ol type="a"><li><p class="Normal">1 - FA starts to the left from the TA and ends to the left from the TA</p></li><li><p class="Normal">2 - FA starts to the left from the TA and ends inside of TA boundaries</p></li><li><p class="Normal">4 - FA starts to the left from the TA and ends to the right from the TA</p></li><li><p class="Normal">8 - FA starts inside of the TA and ends inside of the TA boundaries</p></li><li><p class="Normal">16 - FA starts inside of the TA boundaries and ends to the right from the TA</p></li><li><p class="Normal">32 - FA starts to the right from the TA and ends to the right from the TA</p></li></ol></div></li></ul></div><p class="Normal">
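The six <code class="code">windowFlags</code> values listed above can be pictured with the Python sketch below. It is an illustration only: the function names are hypothetical, annotations are reduced to begin and end offsets, and the treatment of offsets that coincide with a TA boundary is an assumption.
</p>

```python
# Flag values taken from the list above, keyed by (start region, end region).
FLAGS = {
    ("left", "left"): 1,
    ("left", "inside"): 2,
    ("left", "right"): 4,
    ("inside", "inside"): 8,
    ("inside", "right"): 16,
    ("right", "right"): 32,
}


def position_flag(ta_begin, ta_end, fa_begin, fa_end):
    """Classify where an FA starts and ends relative to a TA.

    Boundary handling is an assumption: an FA that begins exactly at the TA
    begin counts as starting inside, and one that ends exactly at the TA end
    counts as ending inside. Returns 0 for combinations not in the list.
    """
    if fa_begin < ta_begin:
        start = "left"
    elif fa_begin >= ta_end:
        start = "right"
    else:
        start = "inside"
    if fa_end <= ta_begin:
        end = "left"
    elif fa_end > ta_end:
        end = "right"
    else:
        end = "inside"
    return FLAGS.get((start, end), 0)


def window_allows(window_flags, ta_begin, ta_end, fa_begin, fa_end):
    """True if the FA position is enabled in the windowFlags bitmask."""
    return (window_flags & position_flag(ta_begin, ta_end, fa_begin, fa_end)) != 0


allowed = window_allows(1 | 32, 10, 20, 0, 5)
rejected = window_allows(8, 10, 20, 0, 5)
```

<p class="Normal">
In this sketch a <code class="code">windowFlags</code> value of 33 (1 combined with 32) accepts an FA that lies entirely to the left or entirely to the right of the TA, while a value of 8 rejects it.
</p><p class="Normal">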
The location of a FA is included in the generated output according to the
optional orientation and distance attributes. For example, if the values of
both of these attributes are set to true and the FA is the first annotation
of the required type to the left of the TA, then the generated feature value
will start with the prefix <code class="code">L1</code>. If the values are set to false, then the
feature value's prefix will be <code class="code">X0</code>. This allows generating unique
feature names for model building and evaluation for machine learning.
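</p><p class="Normal">
The prefix construction can be sketched in Python as follows. Only the two prefixes named above (<code class="code">L1</code> and <code class="code">X0</code>) come from this guide. The function name, the parameter names and the handling of the case where only one of the two attributes is true are assumptions.
</p>

```python
def location_prefix(side, distance, orientation=False, show_distance=False):
    """Build the location prefix of a generated feature value.

    side     : orientation letter supplied by the caller ('L' for left).
    distance : ordinal of the FA counted from the TA (1 = first).
    When orientation is False the letter is replaced by 'X'; when
    show_distance is False the number is replaced by 0.
    """
    letter = side if orientation else "X"
    number = distance if show_distance else 0
    return f"{letter}{number}"


with_location = location_prefix("L", 1, orientation=True, show_distance=True)
without_location = location_prefix("L", 1)
```

<p class="Normal">
The first call reproduces the prefix for the first FA to the left of the TA, and the second call reproduces the prefix used when both attributes are false.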
</p><p class="Normal"> | |
<code class="code">FeatureObjectMatcherXML</code> is used to specify search rules in FAM | |
specifications. | |
</p><p class="Normal"> | |
The FESL fragment below adds rules to the previous sample to extract the
number of cylinders from engines of cars whose wheel diameter is at
least 20.0":
</p><p class="Normal"><targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="Size" featureTypeName="java.lang.Float"></p><p class="Normal"> <rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"></p><p class="Normal"> <enumFeatureValues caseSensitive="true"></p><p class="Normal"> <values>red</values></p><p class="Normal"> <values>green</values></p><p class="Normal"> <values>blue</values></p><p class="Normal"> </enumFeatureValues></p><p class="Normal"> </featureMatchers></p><p class="Normal"> </groupFeatureMatchers></p><p class="Normal"></targetAnnotationMatcher></p><p class="Normal"><featureAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" windowsizeInside="1" ></p><p class="Normal"> <groupFeatureMatchers></p><p class="Normal"> <featureMatchers featurePath="__p0:Wheels:toArray:Diameter" featureTypeName="java.lang.Float" quiet="true" ></p><p class="Normal"> <rangeFeatureValues lowerBoundary="20.0" lowerBoundaryInclusive="true"/></p><p class="Normal"> </featureMatchers></p><p class="Normal"> <featureMatchers featurePath="Cylinders" featureTypeName="java.lang.Float" /></p><p class="Normal"> </groupFeatureMatchers></p><p class="Normal"></featureAnnotationMatcher></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TargetAnnotationXML"></a>3.2.10.
TargetAnnotationXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: className[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: enclosingAnnotation[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Element targetAnnotationMatcher[1..1]: PartialObjectMatcherXML</p></li><li style="list-style-type: disc"><p class="Normal"> | |
Element featureAnnotationMatchers[0..*]: FeatureObjectMatcherXML | |
</p></li></ul></div><p> | |
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-16.jpg" align="middle"></span> | |
</p><p class="Normal"> | |
This is a root specification for a class (group) of annotations of all | |
extracted instances, which are assigned the same label (className) in the | |
final output. The label can be a literal string or a feature path in | |
curly brackets or a combination of the two (i.e. | |
<code class="code">SomeText_{__p0:SomeProperty}</code>). If using a feature path in a class name | |
label it is required to use the parent tag notation. In such a case the | |
parent tag refers to the TA specified by the <code class="code">targetAnnotationMatcher</code> | |
element. Annotations that belong to the group are searched within a span | |
of <code class="code">enclosingAnnotation</code> according to the specification given in the | |
<code class="code">targetAnnotationMatcher</code> (TAM) and features from matched annotations are | |
extracted according to specification given in <code class="code">featureAnnotationMatchers</code> | |
(FAM). In general, the annotations that features are extracted from could
be different from the annotations that are matched during the search. This is
useful when extracting features for machine learning model building and
evaluation, where features are selected from annotations that are
located in a specific position relative to the annotations that satisfy
the search criteria, for instance, the POS tags of 5 words to the left and
right of a specific word. Further feature extraction is allowed, and the
rules specified by the corresponding FAMs are executed, only if an
annotation is successfully evaluated (matched) by a TAM.
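</p><p class="Normal">
The expansion of a <code class="code">className</code> label that mixes literal text with feature paths in curly brackets can be sketched in Python as follows. The function name <code class="code">expand_class_label</code>, the <code class="code">resolve</code> callback and the sample value are hypothetical. In CFE the path would be resolved against the matched TA.
</p>

```python
import re


def expand_class_label(label, resolve):
    """Replace every {feature path} in a label by its resolved value.

    resolve: callable that maps a feature path such as '__p0:SomeProperty'
             to a value (hypothetical stand-in for the lookup on the TA).
    """
    return re.sub(r"\{([^{}]+)\}", lambda m: str(resolve(m.group(1))), label)


values = {"__p0:SomeProperty": "blue"}
label = expand_class_label("SomeText_{__p0:SomeProperty}", values.__getitem__)
literal = expand_class_label("Other_Token", values.__getitem__)
```

<p class="Normal">
In this sketch a label without curly brackets is returned unchanged, and a label with a feature path receives the resolved value in place of the bracketed expression.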
</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Configuration_file_sample"></a>3.3. | |
Configuration file sample | |
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Task_definition"></a>3.3.1. | |
Task definition | |
</h3></div></div></div><p class="Normal"> | |
The sample configuration file below has been created for extracting | |
features in order to build models for a machine learning application. The | |
type system for this sample defines several UIMA annotation types: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.Sentence - type that marks a sentence</p></li><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.Token - type that marks a token with features:</p></li></ul></div><p class="Normal">pennTag: String - POS tag of a token</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.NamedEntity - named entity type with features:</p></li></ul></div><p class="Normal">Code: String - specific code assigned to a named entity</p><p class="Normal">SemanticClass: String - semantic class of a named entity</p><p class="Normal">Tokens: FSArray - array of org.apache.uima.tools.cfe.sample.Token annotations, ordered by their offset, that are included in the named entity</p><p class="Normal">The classification task is defined as follows:</p><div class="orderedlist"><ol type="a"><li><p class="Normal"> | |
classify the first token of each named entity that has the semantic
class <code class="code">Car Maker</code> with a class label that is a composite of
the string <code class="code">CMBegin</code> and the value of the <code class="code">Code</code> attribute of that
named entity
</p></li><li><p class="Normal"> | |
classify all other tokens of named entities of the semantic class
<code class="code">Car Maker</code> with a class label that is a composite of the string
<code class="code">CMInside</code> and the value of the <code class="code">Code</code> attribute of that named entity
</p></li><li><p class="Normal">classify all other tokens with the class label <code class="code">Other_token</code></p></li></ol></div><p class="Normal">
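The three rules above can be sketched as a small labelling function. This is only an illustration of the intended result, not of how CFE computes it: the dictionary layout, the sample codes <code class="code">GM</code> and <code class="code">DET</code>, and the function <code class="code">class_labels</code> are assumptions made for the example, and the underscore separator follows the sample configuration.</p>

```python
from typing import Dict, List


def class_labels(token_count: int, entities: List[Dict]) -> List[str]:
    """Label every token index of a sentence following rules (a) to (c)."""
    labels = ["Other_token"] * token_count  # rule (c): default label
    for entity in entities:
        if entity["SemanticClass"] != "Car Maker":
            continue  # other named entities keep the default label
        for position, token_index in enumerate(entity["Tokens"]):
            # rule (a) for the first token, rule (b) for the remaining ones
            prefix = "CMBegin_" if position == 0 else "CMInside_"
            labels[token_index] = prefix + entity["Code"]
    return labels


# Hypothetical sentence of 6 tokens with two named entities (token indexes are 0-based).
entities = [
    {"SemanticClass": "Car Maker", "Code": "GM", "Tokens": [1, 2]},
    {"SemanticClass": "City", "Code": "DET", "Tokens": [4]},
]
labels = class_labels(6, entities)
print(labels)
```

<p class="Normal">In this sketch the tokens at indexes 1 and 2 receive <code class="code">CMBegin_GM</code> and <code class="code">CMInside_GM</code>, and every other token, including the token of the entity with a different semantic class, keeps <code class="code">Other_token</code>.</p><p class="Normal">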
Building a machine learning model requires extracting features from the
surrounding tokens for all classes listed above.
In particular, the following features must be extracted:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a string literal of the token to which the class label is assigned (<code class="code">class token</code>)</p></li><li style="list-style-type: disc"><p class="Normal"> | |
a string literal of each token that is located within a window of 5
tokens from the <code class="code">class token</code>, with the exception of prepositions (POS tag
is IN), conjunctions (CC), determiners (DT), punctuation (POS tag is not
defined - null) and numbers (CD)
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
all extracted features have to be made unique by including their position
relative to the location of the <code class="code">class token</code>.
</p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implementation"></a>3.3.2. | |
Implementation | |
</h3></div></div></div><p class="Normal">Line 1 - a standard XML declaration that defines the XML version of the document and its encoding</p><p class="Normal">Line 2, 87 - FESL root element that references the schema and defines global variables, such as nullValueImage (see | |
<a href="#_Null_values" title="3.1.6. Null values"> | |
<span class="Hyperlink1">Null values</span> | |
</a>) | |
</p><p class="Normal">Line 3-32 - rules for extracting features for first tokens of named entities.</p><p class="Normal">Line 3 - extracted features for those tokens are assigned a composite label that includes the prefix <code class="code">CMBegin_</code> plus the value of the <code class="code">Code</code> attribute of the first element of the TA's path. The search for FAs is performed within the boundaries of the enclosing org.apache.uima.tools.cfe.sample.Sentence annotation</p><p class="Normal">Line 4-12 - TAM that defines rules for identifying the first TA</p><p class="Normal">Line 4 - defines TA's type (org.apache.uima.tools.cfe.sample.Token) and a full path to it (org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray[0]). According to this path notation, the CFE will:</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">search for annotations of type org.apache.uima.tools.cfe.sample.NamedEntity</p></li><li style="list-style-type: disc"><p class="Normal">
for each annotation found, access the value of its attribute
Tokens and, if the value is not null, call the method toArray to
convert the value to an array
</p></li><li style="list-style-type: disc"><p class="Normal">if the resulting array is not empty, consider its first element to be a TA</p></li></ul></div><p class="Normal">Line 5-11 - defines rules for matching a group of features for TA</p><p class="Normal">Line 6-10 - defines rules for matching a feature for this group</p><p class="Normal">Line 6 - defines that the feature value is of the type
java.lang.String and has the feature path __p0:SemanticClass, which
translates to the value of the attribute SemanticClass of the first element of
the TA's path (org.apache.uima.tools.cfe.sample.NamedEntity)
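</p><p class="Normal">The path notation and the parent tag notation used in lines 3, 4 and 6 can be mimicked with plain dictionaries. The sketch below is an assumption-laden illustration, not the CFE implementation: <code class="code">resolve_full_path</code>, <code class="code">class_label</code>, the sample entity and its code <code class="code">GM</code> are all hypothetical, and a Python list stands in for the FSArray.</p>

```python
import re
from typing import Any, Dict, List

# Hypothetical stand-in for a NamedEntity annotation; Tokens plays the role of the FSArray.
entity = {
    "Code": "GM",
    "SemanticClass": "Car Maker",
    "Tokens": [{"coveredText": "General"}, {"coveredText": "Motors"}],
}


def resolve_full_path(root: Dict[str, Any], steps: List[str]) -> List[Any]:
    """Follow attribute steps from root; 'toArray' or 'toArray[i]' expands a list."""
    current = [root]
    for step in steps:
        match = re.fullmatch(r"toArray(?:\[(\d+)\])?", step)
        following = []
        for value in current:
            if match is None:
                item = value.get(step)
                if item is not None:  # a null attribute value ends this branch
                    following.append(item)
            elif match.group(1) is None:
                following.extend(value)  # every array element is a candidate
            elif int(match.group(1)) < len(value):
                following.append(value[int(match.group(1))])  # one indexed element
        current = following
    return current


def class_label(template: str, parents: List[Dict[str, Any]]) -> str:
    """Replace {__pN:Attr} with attribute Attr of the N-th element of the path."""
    return re.sub(
        r"\{__p(\d+):(\w+)\}",
        lambda m: str(parents[int(m.group(1))][m.group(2)]),
        template,
    )


first = resolve_full_path(entity, ["Tokens", "toArray[0]"])
every = resolve_full_path(entity, ["Tokens", "toArray"])
label = class_label("CMBegin_{__p0:Code}", [entity])
print(first, len(every), label)
```

<p class="Normal">With this sample entity, the indexed path yields only the first token, the unindexed path yields both tokens, and the label template expands to <code class="code">CMBegin_GM</code> because <code class="code">__p0</code> refers to the named entity at the start of the path.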
</p><p class="Normal">Line 7-9 - defines an explicit list of values that the feature value should be in</p><p class="Normal">Line 8 - defines the value <code class="code">Car Maker</code> as the only possible value for the feature </p><p class="Normal">Line 13-17 - FAM that defines rules for identifying first FA and its feature extraction</p><p class="Normal">Line 13 - defines FA's type to be org.apache.uima.tools.cfe.sample.Token; | |
the attribute windowsizeInside with the value 1 tells CFE to extract features from TA | |
itself (TA=FA) and setting orientation and distance attributes to true tells CFE to | |
include position information into the generated feature value | |
</p><p class="Normal">Line 14-16 - defines rules for matching a group of features for the first FA.</p><p class="Normal">Line 15 - defines rules for matching the only feature for | |
this group of the type java.lang.String and with feature path coveredText that | |
eventually will be translated by CFE to a method call of a org.apache.uima.tools.cfe.sample.Token | |
annotation object; according to this specification the feature value will be | |
unconditionally extracted | |
</p><p class="Normal">Line 18-31 - FAM that defines rules for identifying second type of FA and its feature extraction</p><p class="Normal">Line 18 - defines FA's type to be org.apache.uima.tools.cfe.sample.Token; | |
the attributes windowsizeLeft and windowsizeRight with the values 5 tell CFE | |
to extract features from 5 nearest annotations of this type to the left and | |
to the right from TA and having orientation and distance attributes set to | |
true tells CFE to include position information into the generated feature | |
value. | |
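</p><p class="Normal">The effect of a window combined with position information can be sketched as follows. The tuple layout (orientation, distance, text) is purely an assumption for illustration and does not reproduce the format of the feature values that CFE generates; <code class="code">positioned_window</code> is a hypothetical helper.</p>

```python
from typing import List, Tuple


def positioned_window(
    texts: List[str], target: int, left: int, right: int
) -> List[Tuple[str, int, str]]:
    """Return (orientation, distance, text) for the neighbours of texts[target]."""
    features = []
    for i in range(max(0, target - left), min(len(texts), target + right + 1)):
        if i == target:
            continue  # the target itself is handled by a separate matcher
        orientation = "L" if i < target else "R"
        features.append((orientation, abs(i - target), texts[i]))
    return features


# The same words occur on both sides of the target token "and" (index 2).
texts = ["the", "car", "and", "the", "car"]
features = positioned_window(texts, 2, 5, 5)
print(features)
```

<p class="Normal">Although the words repeat, each of the four resulting features is distinct because it carries its side and distance relative to the target, which is the purpose of enabling orientation and distance.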
</p><p class="Normal">Line 19-30 - defines rules for matching a group of features for the second FA.</p><p class="Normal">Line 20 - defines rules for matching the first feature of | |
the group to be of the type java.lang.String and with the feature path | |
coveredText that eventually will be translated by CFE to a method call of a | |
org.apache.uima.tools.cfe.sample.Token annotation object; according to this | |
specification the feature value will be unconditionally extracted | |
</p><p class="Normal">Line 21-29 - define rules for matching the second feature of the group</p><p class="Normal">Line 21 - defines rules for matching the second feature | |
of the group to be of the type java.lang.String and with the feature path | |
pennTag that eventually will be translated by CFE to <code class="code">getPennTag</code> method call | |
of a org.apache.uima.tools.cfe.sample.Token annotation object; according to this | |
specification the feature will be evaluated against | |
<span class="Hyperlink1">enumFeatureValues</span> | |
and, as the exclude attribute is set to true: | |
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal"> | |
if the evaluation is successful, the feature matcher will cause the | |
parent group to be unmatched and since it is the only group in the | |
FAM, no output for this FA will be produced | |
</p></li><li style="list-style-type: disc"><p class="Normal"> | |
if the evaluation is unsuccessful, this feature matcher will not affect | |
matching status of the group, so the output for FA will be generated as | |
the first matcher of the group unconditionally produces output | |
</p></li></ul></div><p class="Normal">As the | |
<span class="Hyperlink1">quiet</span> | |
attribute is set to true, the feature value extracted by the second
matcher will not be added to the output generated for this FA</p><p class="Normal">Line 22-28 - defines an explicit list of values that the
value of the second feature should be in | |
</p><p class="Normal">Line 23-27 - defines values <code class="code">IN</code>, <code class="code">CC</code>, <code class="code">DT</code>, <code class="code">CD</code>, <code class="code">null</code> | |
as possible values for the second feature; if the feature value is equal | |
to one of these values, evaluation of the enclosing feature matcher is | |
successful; if the feature value is null it will be converted to the | |
string defined by | |
<a href="#_Null_values" title="3.1.6. Null values"> | |
<span class="Hyperlink1">nullValueImage</span> | |
</a> | |
(<code class="code">null</code> as set in line 2 of this sample) and as <code class="code">null</code> is one of the | |
list's elements, it will be successfully evaluated. | |
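</p><p class="Normal">The combined effect of the two feature matchers in lines 20-29 (unconditional extraction, exclusion, quiet mode and null conversion) can be summarised in a short sketch. The function <code class="code">group_output</code> and the constants are hypothetical and only mirror the behaviour described above; they are not part of CFE.</p>

```python
from typing import List, Optional

NULL_VALUE_IMAGE = "null"  # mirrors nullValueImage="null" in line 2 of the sample
EXCLUDED_TAGS = {"IN", "CC", "DT", "CD", "null"}  # values listed in lines 23-27


def group_output(covered_text: str, penn_tag: Optional[str]) -> Optional[List[str]]:
    """Return the group's output values, or None when the group is unmatched."""
    # A null feature value is first converted to the configured string image.
    tag = NULL_VALUE_IMAGE if penn_tag is None else penn_tag
    output = [covered_text]  # first matcher: unconditional, always contributes
    if tag in EXCLUDED_TAGS:  # case-sensitive comparison
        return None  # exclude=true: a successful evaluation unmatches the group
    # quiet=true: the tag was only tested, so it is not appended to the output
    return output


print(group_output("Ford", "NNP"))  # ['Ford']
print(group_output("of", "IN"))     # None
print(group_output(",", None))      # None
```

<p class="Normal">A token tagged <code class="code">NNP</code> contributes only its text, while a preposition or a token without a POS tag produces no output at all.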
</p><p class="Normal">Line 34-63 - rules for extracting features for all tokens | |
of named entities except the first. These rules are the same as the rules | |
defined for first tokens of named entities (lines 3-32) with the following | |
exceptions: | |
</p><p class="Normal">Line 34 - defines that TAs matched by these rules will
be assigned a composite label that includes the prefix <code class="code">CMInside_</code> plus the
value of the <code class="code">Code</code> attribute of the first element of the TA's path
</p><p class="Normal">Line 35 - sets the fullPath attribute to | |
org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray that can be | |
translated as <code class="code">any token of a named entity</code>, but because of | |
<a href="#_Implicit_TA_exclusion" title="3.1.7. Implicit TA exclusion"> | |
<span class="Hyperlink1">implicit TA exclusion</span> | |
</a> | |
, the TAs that were matched for first tokens of named entities by the | |
rules for previous TAM are not included into the set of TAs that will be | |
evaluated by rules for this TAM | |
</p><p class="Normal">Line 65-86 - rules for extracting features for all tokens | |
other than tokens of named entities. These rules are the same as the rules | |
defined for previous categories with the following exceptions: | |
</p><p class="Normal">Line 65 - defines that TAs matched by the enclosed | |
rules will be assigned the string label <code class="code">Other_token</code> | |
</p><p class="Normal">Line 66 - only defines a type of TAs that should be | |
processed by the corresponding TAM without fullPath attribute. Such a | |
notation can be translated as <code class="code">all tokens</code>, but because of the | |
<a href="#_Implicit_TA_exclusion" title="3.1.7. Implicit TA exclusion"> | |
<span class="Hyperlink1">implicit TA exclusion</span> | |
</a> | |
, the TAs, which were matched for tokens of named entities by rules | |
defined by the previous TAMs, are not included into the set of TAs that | |
will be evaluated by rules for this TAM. So, the actual translation will | |
be <code class="code">all tokens other than tokens of named entities.</code> | |
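</p><p class="Normal">The way implicit TA exclusion turns three overlapping matchers into three disjoint categories can be sketched as a first-match-wins assignment. The function <code class="code">assign_with_exclusion</code>, the token ids and the shortened labels are assumptions made for this illustration and do not describe CFE internals.</p>

```python
from typing import Callable, Dict, List, Tuple

Matcher = Tuple[str, Callable[[int], bool]]  # (label, predicate over token ids)


def assign_with_exclusion(token_ids: List[int], matchers: List[Matcher]) -> Dict[int, str]:
    """Give each token to the first matcher, in declaration order, that accepts it."""
    assigned: Dict[int, str] = {}
    for label, accepts in matchers:
        for token_id in token_ids:
            if token_id in assigned:
                continue  # matched by an earlier matcher: implicitly excluded
            if accepts(token_id):
                assigned[token_id] = label
    return assigned


entity_tokens = [3, 4, 5]  # hypothetical token ids of one Car Maker named entity
matchers = [
    ("CMBegin", lambda t: t == entity_tokens[0]),  # like toArray[0]: first token only
    ("CMInside", lambda t: t in entity_tokens),    # like toArray: any entity token
    ("Other_token", lambda t: True),               # no fullPath: all tokens
]
result = assign_with_exclusion([1, 2, 3, 4, 5, 6], matchers)
print(result)
```

<p class="Normal">Token 3 satisfies all three predicates but is labelled only by the first one, tokens 4 and 5 fall to the second, and the remaining tokens fall to the last, which is why the order of the <code class="code">targetAnnotations</code> elements in the sample matters.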
</p><div class="orderedlist"><ol type="1" compact><li><?xml version="1.0" encoding="UTF-8"?></li><li><tns:CFEConfig nullValueImage="null" | |
xmlns:tns="http://www.apache.org/uima/cfe/config" | |
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | |
xsi:schemaLocation="http://www.apache.org/uima/cfe/config CFEConfig.xsd "> | |
</li><li> <tns:targetAnnotations className="CMBegin_{__p0:Code}" | |
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"> | |
</li><li> <tns:targetAnnotationMatcher | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" | |
fullPath="org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray[0]"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="__p0:SemanticClass" | |
featureTypeName="java.lang.String"></li><li> <tns:enumFeatureValues></li><li> <tns:values>Car Maker</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:targetAnnotationMatcher></li><li> <tns:featureAnnotationMatchers annotationTypeName= | |
"org.apache.uima.tools.cfe.sample.Token" windowsizeInside="1" | |
orientation="true" distance="true"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText" | |
featureTypeName="java.lang.String"/></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers annotationTypeName= | |
"org.apache.uima.tools.cfe.sample.Token" windowsizeLeft="5" | |
windowsizeRight="5" orientation="true" distance="true"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers | |
featurePath="coveredText" featureTypeName="java.lang.String"/> | |
</li><li> <tns:featureMatchers featurePath="pennTag" | |
featureTypeName="java.lang.String" exclude="true" quiet="true"> | |
</li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></li><li> <tns:targetAnnotations className="CMInside_{__p0:Code}"
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"> | |
</li><li> <tns:targetAnnotationMatcher | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" | |
fullPath="org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="__p0:SemanticClass" | |
featureTypeName="java.lang.String"> | |
</li><li> <tns:enumFeatureValues></li><li> <tns:values>Car Maker</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:targetAnnotationMatcher></li><li> <tns:featureAnnotationMatchers | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" | |
windowsizeInside="1" orientation="true" distance="true"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers | |
featurePath="coveredText" featureTypeName="java.lang.String"/> | |
</li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" windowsizeLeft="5" | |
windowsizeRight="5" orientation="true" distance="true"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers | |
featurePath="coveredText" featureTypeName="java.lang.String"/> | |
</li><li> <tns:featureMatchers | |
featurePath="pennTag" featureTypeName="java.lang.String" exclude="true" quiet="true"> | |
</li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></li><li> <tns:targetAnnotations className="Other_token" | |
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"> | |
</li><li> <tns:targetAnnotationMatcher | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"/> | |
</li><li> <tns:featureAnnotationMatchers | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" | |
windowsizeInside="1" orientation="true" distance="true"> | |
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText" | |
featureTypeName="java.lang.String"/></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> <tns:featureAnnotationMatchers | |
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" | |
 windowsizeLeft="5" windowsizeRight="5" orientation="true" distance="true">
</li><li> <tns:groupFeatureMatchers></li><li> <tns:featureMatchers featurePath="coveredText" | |
featureTypeName="java.lang.String"/> | |
</li><li> <tns:featureMatchers featurePath="pennTag" | |
featureTypeName="java.lang.String" exclude="true" quiet="true"> | |
</li><li> <tns:enumFeatureValues caseSensitive="true"></li><li> <tns:values>IN</tns:values></li><li> <tns:values>CC</tns:values></li><li> <tns:values>DT</tns:values></li><li> <tns:values>CD</tns:values></li><li> <tns:values>null</tns:values></li><li> </tns:enumFeatureValues></li><li> </tns:featureMatchers></li><li> </tns:groupFeatureMatchers></li><li> </tns:featureAnnotationMatchers></li><li> </tns:targetAnnotations></li><li></tns:CFEConfig></li></ol></div></div></div></div><div class="chapter" lang="en" id="_Using_CFE_for_evaluation"><div class="titlepage"><div><div><h2 class="title"><a name="_Using_CFE_for_evaluation"></a>Chapter 4. | |
Using CFE for evaluation | |
</h2></div></div></div><p class="Normal"> | |
Comparison of results produced by a pipeline of UIMA annotators to a | |
<code class="code">gold standard</code> or results of two different NLP systems is a frequent | |
task. With CFE this task can be automated. | |
</p><p class="Normal"> | |
The paper "CFE a system for testing, evaluation and machine learning of | |
UIMA based applications" by Sominsky, Coden and Tanenblatt describes details of the | |
evaluation process. | |
</p></div></div></body></html> |