<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>CFE User Guide</title><link rel="stylesheet" href="css/stylesheet-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.72.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en" id="d0e2"><div class="titlepage"><div><div><h1 class="title"><a name="d0e2"></a>CFE User Guide</h1></div><div><div class="authorgroup"><h3 class="corpauthor">Authors: The Apache UIMA Development Community</h3></div></div><div><span class="productname">Apache UIMA Sandbox<br></span></div><div><p class="releaseinfo">Version 2.3.0</p></div><div><p class="copyright">Copyright &copy; 2008, 2009 The Apache Software Foundation</p></div><div><div class="legalnotice"><a name="d0e15"></a><p> </p><p><b>Incubation Notice and Disclaimer.&nbsp;</b>Apache UIMA is an effort undergoing incubation at the Apache Software Foundation (ASF).
Incubation is required of all newly accepted projects until a further review indicates that
the infrastructure, communications, and decision making process have stabilized in a manner
consistent with other successful ASF projects. While incubation status is not necessarily
a reflection of the completeness or stability of the code,
it does indicate that the project has yet to be fully endorsed by the ASF.</p><p> </p><p> </p><p><b>License and Disclaimer.&nbsp;</b>The ASF licenses this documentation
to you under the Apache License, Version 2.0 (the
"License"); you may not use this documentation except in compliance
with the License. You may obtain a copy of the License at
</p><div class="blockquote"><blockquote class="blockquote"><p>
<a xmlns:xlink="http://www.w3.org/1999/xlink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
</p></blockquote></div><p>
Unless required by applicable law or agreed to in writing,
this documentation and its contents are distributed under the License
on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
</p><p> </p><p> </p><p><b>Trademarks.&nbsp;</b>All terms mentioned in the text that are known to be trademarks or
service marks have been appropriately capitalized. Use of such terms
in this book should not be regarded as affecting the validity of the
trademark or service mark.
</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#_Overview">1.
Overview
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Motivation">1.1.
Motivation
</a></span></dt><dt><span class="section"><a href="#_Approaches_to_feature_extraction">1.2.
Approaches to feature extraction
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Custom_CAS_Consumers">1.2.1.
Custom CAS Consumers
</a></span></dt><dt><span class="section"><a href="#_CFE_approach">1.2.2.
CFE approach
</a></span></dt></dl></dd><dt><span class="section"><a href="#_CFE_Basics">1.3.
CFE Basics
</a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Components">2.
Components
</a></span></dt><dd><dl><dt><span class="section"><a href="#_FESL_XSD">2.1.
FESL XSD
</a></span></dt><dt><span class="section"><a href="#_Source_Code">2.2.
Source Code
</a></span></dt><dt><span class="section"><a href="#_Descriptors">2.3.
Descriptors
</a></span></dt><dt><span class="section"><a href="#_Type_Dependencies">2.4.
Type Dependencies
</a></span></dt></dl></dd><dt><span class="chapter"><a href="#_Configuration_Files">3.
Configuration Files
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Common_notations_and_tags">3.1.
Common notations and tags
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Feature_path">3.1.1.
Feature path
</a></span></dt><dt><span class="section"><a href="#_Full_path_and_partial_path">3.1.2.
Full path and partial path
</a></span></dt><dt><span class="section"><a href="#_TAM_and_FAM">3.1.3.
TAM and FAM
</a></span></dt><dt><span class="section"><a href="#_Arrays">3.1.4.
Arrays
</a></span></dt><dt><span class="section"><a href="#_Parent_tag">3.1.5.
Parent tag
</a></span></dt><dt><span class="section"><a href="#_Null_values">3.1.6.
Null values
</a></span></dt><dt><span class="section"><a href="#_Implicit_TA_exclusion">3.1.7.
Implicit TA exclusion
</a></span></dt></dl></dd><dt><span class="section"><a href="#_FESL_Elements">3.2.
FESL Elements
</a></span></dt><dd><dl><dt><span class="section"><a href="#_BitsetFeatureValuesXML">3.2.1.
BitsetFeatureValuesXML
</a></span></dt><dt><span class="section"><a href="#_EnumFeatureValuesXML">3.2.2.
EnumFeatureValuesXML
</a></span></dt><dt><span class="section"><a href="#_ObjectPathFeatureValue">3.2.3.
ObjectPathFeatureValuesXML
</a></span></dt><dt><span class="section"><a href="#_PatternFeatureValuesXM">3.2.4.
PatternFeatureValuesXML
</a></span></dt><dt><span class="section"><a href="#_RangeFeatureValuesXML">3.2.5.
RangeFeatureValuesXML
</a></span></dt><dt><span class="section"><a href="#_SingleFeatureMatcherXML">3.2.6.
SingleFeatureMatcherXML
</a></span></dt><dt><span class="section"><a href="#_GroupFeatureMatcherXML">3.2.7.
GroupFeatureMatcherXML
</a></span></dt><dt><span class="section"><a href="#_PartialObjectMatcherXML">3.2.8.
PartialObjectMatcherXML
</a></span></dt><dt><span class="section"><a href="#_FeatureObjectMatcherXML">3.2.9.
FeatureObjectMatcherXML
</a></span></dt><dt><span class="section"><a href="#_TargetAnnotationXML">3.2.10.
TargetAnnotationXML
</a></span></dt></dl></dd><dt><span class="section"><a href="#_Configuration_file_sample">3.3.
Configuration file sample
</a></span></dt><dd><dl><dt><span class="section"><a href="#_Task_definition">3.3.1.
Task definition
</a></span></dt><dt><span class="section"><a href="#_Implementation">3.3.2.
Implementation
</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#_Using_CFE_for_evaluation">4.
Using CFE for evaluation
</a></span></dt></dl></div><div class="chapter" lang="en" id="_Overview"><div class="titlepage"><div><div><h2 class="title"><a name="_Overview"></a>Chapter&nbsp;1.&nbsp;
Overview
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Motivation"></a>1.1.&nbsp;
Motivation
</h2></div></div></div><p class="Normal">Feature extraction, the extraction of
information from data sources, is a common task required by many
kinds of applications, such as machine learning, performance
evaluation, and statistical analysis.
This guide describes a tool that can be used to facilitate this
extraction process, in conjunction with the Unstructured Information
Management Architecture (UIMA), particularly focusing on text
processing applications. UIMA provides a mechanism for executing
modules called Analysis Engines that analyze artifacts (text
documents in our case) and store the results of the analysis in a
data structure called the Common Analysis Structure (CAS). These
results are stored as Feature Structures, which are simply data
structures that have an associated type and a set of properties in
the form of attribute/value pairs. Feature Structures that are
attached to a particular span of a text document are called
Annotations. They usually represent a concept that the analysis
engine computes based on the text. The attributes are called
<code class="code">Features</code> in UIMA terminology. In this document, this sense of feature is
always referred to as <code class="code">UIMA feature</code>, to avoid confusion
with the general sense of <code class="code">feature</code> when discussing
<code class="code">feature extraction</code>, referring to the process of extracting values
from data sources (in our case, the CAS). Values that are extracted
are not required to be values of attributes (i.e., UIMA Features) of
Annotations, but can be computed by other methods, as will be shown
later. The terms <code class="code">features</code> and <code class="code">feature values</code>
in this document refer to any value extracted from the CAS, regardless of the particular
source.
</p><p class="Normal">As an example, Figure 1 depicts annotation objects
of the type Token that are associated with individual words, each
having attributes <code class="code">Index</code> and <code class="code">POS</code> (part of speech). A feature
extraction task could be "extract token indexes for the words that
are nouns". Such a task translates into the following execution
steps:
</p><div class="orderedlist"><ol type="1"><li><p class="Normal">find an annotation of type <code class="code">Token</code></p></li><li><p class="Normal">examine the value of its <code class="code">POS</code> attribute</p></li><li><p class="Normal">extract the value of its <code class="code">Index</code> attribute only if
the value of the <code class="code">POS</code> attribute is <code class="code">NN</code>
</p></li></ol></div><p class="Normal">The expression "word that is a noun" defines a
concept, and implementing it means locating the matching annotations in the
CAS. <code class="code">Token index</code> is the information (i.e., the <code class="code">feature</code>) to be
extracted. The resulting values for the task are 3 and 9,
which are the values of the <code class="code">Index</code> attribute for the words <code class="code">car</code> and
<code class="code">finish</code>.
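</p><p class="Normal">For comparison with the FESL approach described later, the three steps above can be sketched in plain Java. This illustrates the logic only and is not CFE code: <code class="code">Token</code> is a hypothetical stand-in class rather than the UIMA annotation type, and the sample tokens are made up rather than taken from Figure 1.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NounIndexExtraction {

    /** Hypothetical stand-in for a Token annotation; not the UIMA type. */
    static final class Token {
        final String word;
        final String pos;   // part-of-speech tag, e.g. "NN"
        final int index;    // value of the Index attribute

        Token(String word, String pos, int index) {
            this.word = word;
            this.pos = pos;
            this.index = index;
        }
    }

    /** Returns the Index of every Token whose POS is NN. */
    static List<Integer> nounIndexes(List<Token> tokens) {
        List<Integer> result = new ArrayList<Integer>();
        for (Token token : tokens) {             // 1. visit each Token annotation
            if ("NN".equals(token.pos)) {        // 2. examine its POS attribute
                result.add(token.index);         // 3. extract its Index attribute
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(      // made-up sample sentence
                new Token("the", "DT", 1),
                new Token("dog", "NN", 2),
                new Token("runs", "VBZ", 3),
                new Token("home", "NN", 4));
        System.out.println(nounIndexes(tokens)); // prints [2, 4]
    }
}
```

<p class="Normal">Because both the condition on <code class="code">POS</code> and the extracted attribute are hard-coded, any change to the task requires changing and recompiling this code.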
</p><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-1.jpg"></span>
</p><p class="LREC Caption">
Figure 1: Annotated text sample
</p><p class="Normal">While Figure 1 shows a fairly simple example of
annotation types associated with some text, real-world applications
can have quite sophisticated annotation types that store various
kinds of computed information. Consider an annotation type Car that
has, for illustration purposes, just two attributes: <code class="code">Color</code> and
<code class="code">Engine</code>. While the <code class="code">Color</code> attribute is of type string, the <code class="code">Engine</code>
attribute is of a complex annotation type with attributes <code class="code">Cylinders</code> and
<code class="code">Size</code>. This is represented by the UML diagram in Figure 2, which shows
the class hierarchy on the left and a sample instance of this class
structure on the right.
</p><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-3.jpg"></span>
</p><p class="LREC Caption">
Figure 2: Composite object sample
</p><p class="Normal">
If a requirement is to extract the number of cylinders of the car's
engine, then the application needs to find any object(s) that represent
the concept of a car (<code class="code">CarAnnotation</code> in this case) and traverse the
object's structure to access the <code class="code">Cylinders</code> attribute of <code class="code">EngineAnnotation</code>.
Once the attribute's value is accessed, the application outputs it to the
desired destination, such as a text file or a database.
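</p><p class="Normal">A hand-written version of this traversal might look like the sketch below. The classes are hypothetical plain Java objects modeled on Figure 2 (real UIMA JCas classes are generated from a type system descriptor), and the output destination is simplified to a list of strings.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CylinderExtraction {

    /** Hypothetical stand-in for EngineAnnotation from Figure 2. */
    static final class EngineAnnotation {
        private final int cylinders;
        private final double size;

        EngineAnnotation(int cylinders, double size) {
            this.cylinders = cylinders;
            this.size = size;
        }

        int getCylinders() { return cylinders; }
        double getSize() { return size; }
    }

    /** Hypothetical stand-in for CarAnnotation from Figure 2. */
    static final class CarAnnotation {
        private final String color;
        private final EngineAnnotation engine;

        CarAnnotation(String color, EngineAnnotation engine) {
            this.color = color;
            this.engine = engine;
        }

        String getColor() { return color; }
        EngineAnnotation getEngine() { return engine; }
    }

    /** Finds every car and walks Car -> Engine -> Cylinders, emitting one line per car. */
    static List<String> extractCylinders(List<Object> annotations) {
        List<String> output = new ArrayList<String>();
        for (Object annotation : annotations) {
            if (!(annotation instanceof CarAnnotation)) {
                continue;                                  // only cars represent the concept
            }
            EngineAnnotation engine = ((CarAnnotation) annotation).getEngine();
            if (engine != null) {                          // traverse to the engine
                output.add("cylinders=" + engine.getCylinders());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Object> annotations = Arrays.<Object>asList(
                new CarAnnotation("red", new EngineAnnotation(6, 3.0)),
                new EngineAnnotation(4, 1.6),              // not attached to a car: ignored
                new CarAnnotation("blue", null));          // no engine: nothing extracted
        System.out.println(extractCylinders(annotations)); // prints [cylinders=6]
    }
}
```

<p class="Normal">Supporting a new requirement, such as also extracting the engine <code class="code">Size</code> or only considering red cars, means editing <code class="code">extractCylinders</code>; this is the maintenance burden discussed in the following section.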
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Approaches_to_feature_extraction"></a>1.2.&nbsp;
Approaches to feature extraction
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Custom_CAS_Consumers"></a>1.2.1.&nbsp;
Custom CAS Consumers
</h3></div></div></div><p class="Normal">
When working with UIMA, feature extraction is usually implemented by
writing a special UIMA component called a CAS Consumer that contains
custom code for accessing the annotations and their attributes,
outputting them to a file, memory or database as required. The CAS
consumer contains explicit logic for traversing the object's structure
and examining values of specific attributes. Also, the CAS consumer would
likely have code for outputting the accessed values to a particular
destination, as required by the application. Writing CAS consumers can be
labor-intensive and requires Java programming. While this approach offers
powerful control and customization for an application's needs, supporting
the code can become problematic, especially as application requirements
change. This affects many aspects of code support, such as maintenance,
evolution, bug fixing, and reusability.
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_CFE_approach"></a>1.2.2.&nbsp;
CFE approach
</h3></div></div></div><p class="Normal">
CFE is a multipurpose tool that enables feature extraction from a UIMA
CAS in a generalized, application-independent way. The extraction
process follows rules that are expressed in the Feature
Extraction Specification Language (FESL) and stored in configuration
files. Using CFE eliminates the need to create a customized CAS
consumer and write Java code for every application. Instead, by writing
FESL rules in XML format, users can customize the extraction
process to suit their application. FESL's rule semantics allow
precise identification of the information to be
extracted through multi-parameter criteria. The FESL syntax
and semantics are defined later in this guide.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_CFE_Basics"></a>1.3.&nbsp;
CFE Basics
</h2></div></div></div><p class="Normal">The feature extraction process involves three
major steps:</p><div class="orderedlist"><ol type="1"><li><p class="Normal">
locating a concept of interest that is represented by a UIMA annotation
object; examples of such concepts could be "word that is a noun" or "a
car that has a six-cylinder engine". The annotation object that
represents such a concept is referred to as the Target Annotation (TA).
</p></li><li><p class="Normal">
locating concepts, relative to the TAs, that specify the information to
extract. These are also represented by UIMA annotations that lie within
some context of the TAs, such as "to the left
of the TA" or "within the TA". The annotation object that corresponds
to such a concept is referred to as the Feature Annotation (FA).
In relation to Figure 1, an example FA could be described by the expression "two words
to the left of the word finish that is a noun", assuming that "word finish
that is a noun" describes the TA. The result of such a specification
is the tokens <code class="code">at</code> and <code class="code">the</code>.
</p></li><li><p class="Normal">extracting the specified information
from the FAs.
</p></li></ol></div><p class="Normal">
<a name="FA"></a>
To illustrate the process, suppose the requirement is "to
extract indexes of the two words to the left of the word finish that is
a noun". In the first step, CFE locates a TA: an annotation object
that corresponds to the word
<code class="code">finish</code> and has its <code class="code">POS</code> attribute equal to <code class="code">NN</code>. In the
second step, the FAs that correspond to the two words to the left of the TA
are located. In the third step, the values of the <code class="code">Index</code> attribute of
each FA found are extracted. Alternatively,
the requirement could be to extract the value of the <code class="code">Index</code> attribute
from the annotation for the word <code class="code">finish</code> itself. In that case,
the TA and FA are the same UIMA annotation object.
This is usually the case when extracting features for evaluation or
testing. Both TAs and FAs can be specified by
complex multi-parameter conditions that are also expressed in
FESL, as shown later.
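</p><p class="Normal">The following sketch mirrors the three steps for the requirement above in plain Java. It is not how CFE is implemented: <code class="code">Token</code> and <code class="code">leftContextIndexes</code> are hypothetical names, and the TA condition and the FA window are hard-coded parameters where CFE would read them from FESL rules.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TargetFeatureExtraction {

    /** Hypothetical stand-in for a Token annotation. */
    static final class Token {
        final String word;
        final String pos;
        final int index;

        Token(String word, String pos, int index) {
            this.word = word;
            this.pos = pos;
            this.index = index;
        }
    }

    /**
     * Step 1: every token with the given word and POS is a TA.
     * Step 2: the (up to) window tokens immediately left of a TA are its FAs.
     * Step 3: the Index value of each FA is extracted.
     */
    static List<Integer> leftContextIndexes(List<Token> tokens, String word, String pos, int window) {
        List<Integer> extracted = new ArrayList<Integer>();
        for (int i = 0; i < tokens.size(); i++) {
            Token ta = tokens.get(i);
            if (!ta.word.equals(word) || !ta.pos.equals(pos)) {
                continue;                                        // not a TA
            }
            for (int j = Math.max(0, i - window); j < i; j++) {  // FAs left of the TA
                extracted.add(tokens.get(j).index);              // extract Index
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(                      // made-up sentence
                new Token("they", "PRP", 1),
                new Token("stopped", "VBD", 2),
                new Token("at", "IN", 3),
                new Token("the", "DT", 4),
                new Token("finish", "NN", 5));
        System.out.println(leftContextIndexes(tokens, "finish", "NN", 2)); // prints [3, 4]
    }
}
```

<p class="Normal">When the TA and FA are the same annotation, as in the evaluation scenario described above, the inner loop would simply be replaced by extracting <code class="code">ta.index</code>.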
</p></div></div><div class="chapter" lang="en" id="_Components"><div class="titlepage"><div><div><h2 class="title"><a name="_Components"></a>Chapter&nbsp;2.&nbsp;
Components
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_XSD"></a>2.1.&nbsp;
FESL XSD
</h2></div></div></div><p class="Normal">
The specification for FESL is written in XSD format. It is stored in the
file &lt;CFE_HOME&gt;/src/main/xsdForEmf/CFEConfigModel.xsd for use
by the EMF-based parser generator, and in &lt;CFE_HOME&gt;/src/main/xsdForXMLBeans
for the XMLBeans parser generator. Using this XSD with an
XML editor that provides syntax validation
makes editing FESL configuration files more efficient.
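</p><p class="Normal">Besides using an editor, a configuration file can be checked programmatically. The sketch below uses the JDK's standard <code class="code">javax.xml.validation</code> API (Java 8 or later). The tiny inline schema is a hypothetical stand-in for CFEConfigModel.xsd; the real schema is much larger and may declare a target namespace, so in practice it would be loaded from the file instead.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class FeslSchemaCheck {

    /** Hypothetical toy schema: a CFEConfig root with an optional nullValueImage attribute. */
    static final String TOY_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='CFEConfig'><xs:complexType>"
          + "<xs:attribute name='nullValueImage' type='xs:string'/>"
          + "</xs:complexType></xs:element>"
          + "</xs:schema>";

    /** Returns null if the XML conforms to the schema, otherwise the first validation error. */
    static String validate(String xsd, String xml) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            throw new IllegalArgumentException("the schema itself is invalid", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;                         // document is valid
        } catch (SAXException e) {
            return e.getMessage();               // default handler stops at the first error
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for in-memory input
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(TOY_XSD, "<CFEConfig nullValueImage='null'/>")); // prints null
        System.out.println(validate(TOY_XSD, "<CFEConfig unknown='x'/>") != null);   // prints true
    }
}
```

<p class="Normal">To check a real configuration file, the two <code class="code">StreamSource</code> objects would be constructed from the files rather than from strings; <code class="code">StreamSource</code> has a constructor that accepts a <code class="code">java.io.File</code>.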
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Source_Code"></a>2.2.&nbsp;
Source Code
</h2></div></div></div><p class="Normal">CFE is implemented in Java 5.0 for Apache UIMA, and
resides in the org.apache.uima.tools.cfe package. CFE depends on
Eclipse EMF, Apache UIMA, and the Apache XMLBeans and JXPath
libraries. The source code contains the complete implementation of
CFE, including auxiliary utility classes that wrap some UIMA
functionality (located in the org.apache.uima.tools.cfe.support package).
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Descriptors"></a>2.3.&nbsp;
Descriptors
</h2></div></div></div><p class="Normal">
A sample descriptor file that defines a type system for machine learning
processing is located in
&lt;CFE_HOME&gt;/src/main/resources/descriptors/type_system/AppliedSenseAnnotation.xml
</p><p class="Normal">
A sample descriptor that uses CFE in a CAS Consumer is located in
&lt;CFE_HOME&gt;/src/main/resources/descriptors/cas_consumers/UIMAFeatureConsumer.xml
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Type_Dependencies"></a>2.4.&nbsp;
Type Dependencies
</h2></div></div></div><p class="Normal">
CFE code uses the UIMA example annotation type
<code class="code">org.apache.uima.examples.SourceDocumentInformation</code>
to retrieve the name of the document being processed.
Typically, annotations of this type are produced by the file collection reader
provided with the UIMA examples. If a UIMA application uses a different type
of reader, an annotation of this type should be created and initialized
for each document prior to execution of the TAE. Please see
&lt;CFE_HOME&gt;/src/test/java/org/apache/uima/tools/cfe/test/CFEtest.java
for an example.
</p></div></div><div class="chapter" lang="en" id="_Configuration_Files"><div class="titlepage"><div><div><h2 class="title"><a name="_Configuration_Files"></a>Chapter&nbsp;3.&nbsp;
Configuration Files
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Common_notations_and_tags"></a>3.1.&nbsp;
Common notations and tags
</h2></div></div></div><p class="Normal">
CFE configuration files are written using FESL semantic rules, as defined
in CFEConfig.xsd. These rules describe the information extraction process
and are independent of the application from which the information is to
be extracted. Several common notations and tags are used
in different elements of FESL.
</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Feature_path"></a>3.1.1.&nbsp;
Feature path
</h3></div></div></div><p class="Normal">
A "feature path" is a mechanism used by FESL to identify a particular
feature (not necessarily a UIMA feature) of an annotation. The value
associated with the feature indicated by the feature path can be
tested against certain criteria, extracted to the final output, or
both. The syntax of a feature path is an indexed sequence of
attribute/method names separated by the colon character. Such a sequence
mimics the sequence of Java method calls required to extract the feature
value. For example, the value of the <code class="code">EngineAnnotation</code> attribute <code class="code">Cylinders</code>
from Figure 2 can be written as <code class="code">CarAnnotation:Engine:Cylinders</code>, where
<code class="code">Engine</code> is an attribute of <code class="code">CarAnnotation</code>. The intermediate results of each
step of the call sequence can be referred to from different FESL structural
elements by their zero-based index. For instance, the Parent Tag notation
(see below) uses the index to access intermediate values. The feature
path can be used to identify feature values that are either primitives or
complex object types.
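</p><p class="Normal">The correspondence between a feature path and a chain of Java calls can be sketched with reflection. In this sketch, each path element after the first is assumed to name a public no-argument method, tried first with a <code class="code">get</code> prefix and then under the element's own name (for methods such as <code class="code">size</code>). This naming rule, the POJO classes, and the method names are illustrative assumptions, not CFE's actual resolution logic.</p>

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class FeaturePathDemo {

    /** Hypothetical POJO modeled on EngineAnnotation from Figure 2. */
    public static final class EngineAnnotation {
        private final int cylinders;
        public EngineAnnotation(int cylinders) { this.cylinders = cylinders; }
        public int getCylinders() { return cylinders; }
    }

    /** Hypothetical POJO modeled on CarAnnotation from Figure 2. */
    public static final class CarAnnotation {
        private final EngineAnnotation engine;
        public CarAnnotation(EngineAnnotation engine) { this.engine = engine; }
        public EngineAnnotation getEngine() { return engine; }
    }

    /**
     * Resolves a path such as "CarAnnotation:Engine:Cylinders" against root.
     * Element 0 must be the simple class name of root; each later element is a call.
     * The returned list holds one value per path element, indexed from 0.
     */
    static List<Object> resolve(Object root, String path) {
        String[] elements = path.split(":");
        if (!root.getClass().getSimpleName().equals(elements[0])) {
            throw new IllegalArgumentException("root is not a " + elements[0]);
        }
        List<Object> values = new ArrayList<Object>();
        Object current = root;
        values.add(current);                                   // index 0: the root itself
        for (int i = 1; i < elements.length; i++) {
            current = (current == null) ? null : call(current, elements[i]);
            values.add(current);                               // index i: value after step i
        }
        return values;
    }

    /** Invokes "get" + name if it exists, otherwise name; both must take no arguments. */
    static Object call(Object target, String name) {
        for (String candidate : new String[] {"get" + name, name}) {
            try {
                Method method = target.getClass().getMethod(candidate);
                return method.invoke(target);
            } catch (NoSuchMethodException e) {
                // fall through to the next naming convention
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot call " + candidate, e);
            }
        }
        throw new IllegalArgumentException("no method for '" + name + "'");
    }

    public static void main(String[] args) {
        CarAnnotation car = new CarAnnotation(new EngineAnnotation(6));
        List<Object> values = resolve(car, "CarAnnotation:Engine:Cylinders");
        System.out.println(values.get(2));          // prints 6 (the feature value)
        System.out.println(values.get(0) == car);   // prints true (element 0 is the car)
    }
}
```

<p class="Normal">Keeping the intermediate values, rather than only the last one, is what makes index-based references such as the Parent Tag possible.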
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Full_path_and_partial_path"></a>3.1.2.&nbsp;
Full path and partial path
</h3></div></div></div><p class="Normal">
There are two ways of using feature path notation to identify
an object: full path and partial path. The object can be one of the
following:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">an annotation</p></li><li style="list-style-type: disc"><p class="Normal">the value of an annotation's attribute</p></li><li style="list-style-type: disc"><p class="Normal">
the value returned by an annotation's method; only get-style methods
(methods that return a value and take no parameters) are supported.
</p></li></ul></div><p class="Normal">
A full path specifies a path to an object starting from its type. For
instance, if <code class="code">EngineAnnotation</code> is specified as a full path, it would refer
to all instances of annotations of that type. If <code class="code">CarAnnotation:Engine</code> is
specified, it would refer only to instances of the <code class="code">EngineAnnotation</code> type that are
attributes of instances of the <code class="code">CarAnnotation</code> type. Full path notation is usually
used for TA or FA identification.
</p><p class="Normal">
A partial path specifies a path to an object starting from a previously
located annotation object (whether TA or FA). For example, if an instance
of <code class="code">CarAnnotation</code> is located as a TA, then the size of its engine can be
specified as <code class="code">Engine:Size</code>. Partial path notation is usually used for
specifying the feature values that are examined or extracted.
The distinction between "full path" and "partial path" is very similar to
the distinction between "absolute path" and "relative path" in a
computer's file system.
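</p><p class="Normal">Continuing with hypothetical POJOs, the sketch below contrasts the two notations: a full path selects objects from the whole pool of annotations by type (and optionally walks further), while a partial path is walked from one already located annotation. Only getter-style names are handled, and the flat list standing in for the CAS index is an assumption made for illustration.</p>

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FullPartialPathDemo {

    /** Hypothetical POJO modeled on EngineAnnotation. */
    public static final class EngineAnnotation {
        private final double size;
        public EngineAnnotation(double size) { this.size = size; }
        public double getSize() { return size; }
    }

    /** Hypothetical POJO modeled on CarAnnotation. */
    public static final class CarAnnotation {
        private final EngineAnnotation engine;
        public CarAnnotation(EngineAnnotation engine) { this.engine = engine; }
        public EngineAnnotation getEngine() { return engine; }
    }

    /** Partial path: walk e.g. "Engine:Size" from an already located annotation. */
    static Object partial(Object start, String path) {
        Object current = start;
        for (String element : path.split(":")) {
            if (current == null) {
                return null;
            }
            current = getter(current, element);
        }
        return current;
    }

    /** Full path: element 0 is a type name; any remaining elements form a partial path. */
    static List<Object> full(List<Object> annotations, String path) {
        int colon = path.indexOf(':');
        String typeName = (colon < 0) ? path : path.substring(0, colon);
        List<Object> found = new ArrayList<Object>();
        for (Object annotation : annotations) {
            if (!annotation.getClass().getSimpleName().equals(typeName)) {
                continue;                                      // wrong type
            }
            Object value = (colon < 0) ? annotation : partial(annotation, path.substring(colon + 1));
            if (value != null) {
                found.add(value);
            }
        }
        return found;
    }

    /** Calls the public no-argument method "get" + name on target. */
    static Object getter(Object target, String name) {
        try {
            Method method = target.getClass().getMethod("get" + name);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        EngineAnnotation carEngine = new EngineAnnotation(3.0);
        EngineAnnotation looseEngine = new EngineAnnotation(1.6);
        CarAnnotation car = new CarAnnotation(carEngine);
        List<Object> pool = Arrays.<Object>asList(car, carEngine, looseEngine);

        System.out.println(full(pool, "EngineAnnotation").size());     // prints 2 (every engine)
        System.out.println(full(pool, "CarAnnotation:Engine").size()); // prints 1 (the car's engine)
        System.out.println(partial(car, "Engine:Size"));               // prints 3.0
    }
}
```

<p class="Normal">Note how <code class="code">EngineAnnotation</code> as a full path matches both engines, whereas <code class="code">CarAnnotation:Engine</code> matches only the engine that is an attribute of a car.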
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TAM_and_FAM"></a>3.1.3.&nbsp;
TAM and FAM
</h3></div></div></div><p class="Normal">
Each FESL rule is represented by an XML element with the tag
<code class="code">targetAnnotation</code>, as specified in the XSD by the
<a href="#_TargetAnnotationXML" title="3.2.10.&nbsp; TargetAnnotationXML"><span class="Hyperlink2">TargetAnnotationXML</span></a>
type. Each element of this type is composed of:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
a single target annotation matcher (<code class="code">TAM</code>), denoted by an XML element with the tag
<code class="code">targetAnnotationMatcher</code>, of the type
<a href="#_PartialObjectMatcherXML" title="3.2.8.&nbsp; PartialObjectMatcherXML"><code class="code">PartialObjectMatcherXML</code></a>
</p></li><li style="list-style-type: disc"><p class="Normal">
optional feature annotation matchers (<code class="code">FAM</code>s), denoted by XML elements with the tag <code class="code">featureAnnotationMatchers</code>,
of the type
<a href="#_FeatureObjectMatcherXML" title="3.2.9.&nbsp; FeatureObjectMatcherXML"><code class="code">FeatureObjectMatcherXML</code></a>
</p></li></ul></div><p class="Normal">
The <code class="code">TAM</code> specifies search criteria for locating Target Annotations
(<code class="code">TA</code>s), while <code class="code">FAM</code>s contain criteria for locating Feature Annotations
(<code class="code">FA</code>s) and the specification of features to extract from the
<code class="code">FA</code>s. The criteria for the search and the features to be extracted are
specified using the
<a href="#_Feature_path" title="3.1.1.&nbsp; Feature path"><span class="Hyperlink1">feature path</span></a>
notation, as explained earlier. The XML tags representing the
matchers are detailed below.
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Arrays"></a>3.1.4.&nbsp;
Arrays
</h3></div></div></div><p class="Normal">
Since UIMA annotations may have arrays as attributes, FESL provides the
ability to perform feature extraction from array objects. In particular,
going back to Figure 2, if the <code class="code">Wheels</code> attribute is implemented as
a UIMA <code class="code">FSArray</code>, then, using feature path notation:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
the feature value for the <code class="code">Wheels</code> attribute of
<code class="code">FSArray</code> type can be specified as <code class="code">CarAnnotation:Wheels</code>.
</p></li><li style="list-style-type: disc"><p class="Normal">
the feature value for the number of elements in the <code class="code">FSArray</code>
can be specified as <code class="code">CarAnnotation:Wheels:size</code>, where <code class="code">size</code> is a
method of <code class="code">FSArray</code>; this value corresponds to the concept of how many wheels the car
has.
</p></li><li style="list-style-type: disc"><p class="Normal">the feature values for individual elements of the
<code class="code">Wheels</code> attribute, of type <code class="code">WheelAnnotation</code>, can be accessed as
<code class="code">CarAnnotation:Wheels:toArray</code>. Note that <code class="code">toArray</code> is the
name of a method of the <code class="code">FSArray</code> type rather than the name of an
attribute.</p></li><li style="list-style-type: disc"><p class="Normal">the feature values for the <code class="code">Diameter</code> attribute of each
<code class="code">WheelAnnotation</code> can be specified as
<code class="code">CarAnnotation:Wheels:toArray:Diameter</code>.
</p></li></ul></div><p class="Normal">
The result of using <code class="code">toArray</code> as an accessor is an array of values. FESL
also provides syntax for accessing individual elements of arrays by index.
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
the feature for the diameter of the first wheel can be specified as
<code class="code">CarAnnotation:Wheels:toArray[0]:Diameter</code>
</p></li><li style="list-style-type: disc"><p class="Normal">
the feature for the diameter of the first and second wheels can be
specified as <code class="code">CarAnnotation:Wheels:toArray[0][1]:Diameter</code>
</p></li><li style="list-style-type: disc"><p class="Normal">
the feature for the diameters of the first three wheels can be specified
as <code class="code">CarAnnotation:Wheels:toArray[0-2]:Diameter</code>
</p></li></ul></div><p class="Normal">
The specification of individual elements can be mixed; for example,
<code class="code">CarAnnotation:Wheels:toArray[0][2-3]:Diameter</code> refers, for a four-wheeled car, to every element of the
<code class="code">Wheels</code> attribute except the second. If a specified index falls outside
the range of the matched data, a null value is assigned.
</p><p class="Normal">
If required, FESL allows extracted features to be sorted by the text offset
of the annotations they are extracted from. For
instance, <code class="code">CarAnnotation:Wheels:toArray[sort]:Diameter</code> ensures such
an order.
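</p><p class="Normal">The index selectors can be emulated on an ordinary Java list, as in the sketch below. It handles <code class="code">[n]</code>, <code class="code">[from-to]</code>, combinations of these, and <code class="code">[sort]</code>. The parsing rules, the choice to apply <code class="code">[sort]</code> before any indices, and the hypothetical <code class="code">Wheel</code> class with its <code class="code">begin</code> offset are assumptions for illustration rather than CFE's implementation.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArraySelectorDemo {

    private static final Pattern GROUP = Pattern.compile("\\[([^\\]]+)\\]");

    /** Hypothetical wheel annotation; begin is its start offset in the text. */
    static final class Wheel {
        final int begin;
        Wheel(int begin) { this.begin = begin; }
        @Override public String toString() { return "Wheel@" + begin; }
    }

    static final Comparator<Wheel> BY_BEGIN = new Comparator<Wheel>() {
        @Override public int compare(Wheel a, Wheel b) { return Integer.compare(a.begin, b.begin); }
    };

    /**
     * Applies a selector such as "[0]", "[0][1]", "[0-2]", "[0][2-3]" or "[sort]".
     * Indices past the end yield null; a selector without indices keeps every element.
     */
    static <T> List<T> select(List<T> items, String selector, Comparator<? super T> byOffset) {
        List<T> source = new ArrayList<T>(items);
        if (selector.contains("[sort]")) {
            Collections.sort(source, byOffset);             // order by text offset first
        }
        List<T> picked = new ArrayList<T>();
        boolean hasIndex = false;
        Matcher m = GROUP.matcher(selector);
        while (m.find()) {
            String group = m.group(1).trim();
            if (group.equals("sort")) {
                continue;
            }
            hasIndex = true;
            int dash = group.indexOf('-');
            int from = Integer.parseInt(dash < 0 ? group : group.substring(0, dash));
            int to = (dash < 0) ? from : Integer.parseInt(group.substring(dash + 1));
            for (int i = from; i <= to; i++) {
                picked.add(i < source.size() ? source.get(i) : null);   // out of range -> null
            }
        }
        return hasIndex ? picked : source;
    }

    public static void main(String[] args) {
        List<Wheel> wheels = Arrays.asList(new Wheel(30), new Wheel(10), new Wheel(20), new Wheel(40));
        System.out.println(select(wheels, "[0][2-3]", BY_BEGIN)); // prints [Wheel@30, Wheel@20, Wheel@40]
        System.out.println(select(wheels, "[3-4]", BY_BEGIN));    // prints [Wheel@40, null]
        System.out.println(select(wheels, "[sort]", BY_BEGIN));   // prints [Wheel@10, Wheel@20, Wheel@30, Wheel@40]
    }
}
```

<p class="Normal">Returning <code class="code">null</code> for indices past the end keeps the output aligned with the selector, so <code class="code">[0-2]</code> always yields three values.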
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Parent_tag"></a>3.1.5.&nbsp;
Parent tag
</h3></div></div></div><p class="Normal">
The parent tag is used to access a specific element of the feature path of
a TA or FA by index. If a parent tag is used within a TAM specification,
it is applied to the full path of the corresponding TA. Likewise, parent
tags contained in FAMs are applied to the full path of the
corresponding FA. The tag consists of the <code class="code">__p</code> prefix followed by the index
of the element being accessed. For instance, <code class="code">__p0</code> addresses the
first element of a feature path. The tag can be part of a feature path.
For example, if a TA is specified as <code class="code">CarAnnotation:Wheels:toArray</code>,
corresponding to the concept "wheels of a car", then the value of the
<code class="code">Color</code> attribute of the containing <code class="code">CarAnnotation</code> object can be accessed by specifying
<code class="code">__p0:Color</code>. Such a specification can be used when it is required to
examine or extract features of a containing annotation along with features
of contained annotations. Samples of parent tag usage are provided in
the sections below that detail FESL syntax.
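</p><p class="Normal">A sketch of how a parent tag could be evaluated, assuming each located TA carries the value of every element of its full path (a "trace"). The POJO classes, the <code class="code">Located</code> holder, and the use of a <code class="code">java.util.List</code> in place of an <code class="code">FSArray</code> are hypothetical; for a TA located by <code class="code">CarAnnotation:Wheels:toArray</code>, the sketch assumes element 0 is the car, element 1 the wheel list, and element 2 the individual wheel.</p>

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParentTagDemo {

    /** Hypothetical POJO for a wheel. */
    public static final class WheelAnnotation {
        private final double diameter;
        public WheelAnnotation(double diameter) { this.diameter = diameter; }
        public double getDiameter() { return diameter; }
    }

    /** Hypothetical POJO for a car with a color and wheels. */
    public static final class CarAnnotation {
        private final String color;
        private final List<WheelAnnotation> wheels;
        public CarAnnotation(String color, List<WheelAnnotation> wheels) {
            this.color = color;
            this.wheels = wheels;
        }
        public String getColor() { return color; }
        public List<WheelAnnotation> getWheels() { return wheels; }
    }

    /** A located TA together with the value of every element of its full path. */
    static final class Located {
        final List<Object> trace;                 // trace.get(i) = value of path element i
        Located(List<Object> trace) { this.trace = trace; }
        Object annotation() { return trace.get(trace.size() - 1); }
    }

    /** Locates TAs for "CarAnnotation:Wheels:toArray": one TA per wheel of each car. */
    static List<Located> wheelsOfCars(List<CarAnnotation> cars) {
        List<Located> tas = new ArrayList<Located>();
        for (CarAnnotation car : cars) {
            for (WheelAnnotation wheel : car.getWheels()) {
                tas.add(new Located(Arrays.<Object>asList(car, car.getWheels(), wheel)));
            }
        }
        return tas;
    }

    /** Evaluates a partial path on a TA; a leading "__pN" starts from path element N instead. */
    static Object evaluate(Located ta, String path) {
        String[] elements = path.split(":");
        Object current = ta.annotation();
        int first = 0;
        if (elements[0].startsWith("__p")) {
            current = ta.trace.get(Integer.parseInt(elements[0].substring(3)));
            first = 1;
        }
        for (int i = first; i < elements.length && current != null; i++) {
            current = getter(current, elements[i]);
        }
        return current;
    }

    /** Calls the public no-argument method "get" + name on target. */
    static Object getter(Object target, String name) {
        try {
            Method method = target.getClass().getMethod("get" + name);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        CarAnnotation car = new CarAnnotation("red",
                Arrays.asList(new WheelAnnotation(17), new WheelAnnotation(18)));
        for (Located ta : wheelsOfCars(Arrays.asList(car))) {
            // the wheel's own feature plus a feature of its containing car
            System.out.println(evaluate(ta, "Diameter") + " " + evaluate(ta, "__p0:Color"));
        }
        // prints "17.0 red" and then "18.0 red"
    }
}
```

<p class="Normal">Because each TA carries the values of its whole path, <code class="code">__p0:Color</code> reaches the car even though the TA itself is a wheel.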
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Null_values"></a>3.1.6.&nbsp;
Null values
</h3></div></div></div><p class="Normal">
CFE allows comparing feature values for equality to null. The root XML
element CFEConfig has a string attribute <code class="code">nullValueImage</code> that sets a
literal representation of a null value. If an extracted feature value is
null, it is converted to the string assigned to the
<code class="code">nullValueImage</code> attribute. The example below illustrates the usage of this
attribute.
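</p><p class="Normal">A minimal sketch of the described conversion, assuming that a comparison against null is performed on the converted text. The method names <code class="code">render</code> and <code class="code">matches</code> and the <code class="code">NULL</code> image value are hypothetical.</p>

```java
final class NullImageDemo {

    /** Converts an extracted value to text, replacing null with the configured image. */
    static String render(Object value, String nullValueImage) {
        return (value == null) ? nullValueImage : String.valueOf(value);
    }

    /** Compares an extracted value with a literal taken from a configuration file. */
    static boolean matches(Object value, String literal, String nullValueImage) {
        return render(value, nullValueImage).equals(literal);
    }

    public static void main(String[] args) {
        String image = "NULL";                               // hypothetical nullValueImage setting
        System.out.println(render(null, image));             // prints NULL
        System.out.println(render(6, image));                // prints 6
        System.out.println(matches(null, "NULL", image));    // prints true
        System.out.println(matches("red", "NULL", image));   // prints false
    }
}
```

<p class="Normal">One consequence of this design is that a genuine string value equal to the image is indistinguishable from null, so the image should be a string that never occurs in the data.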
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implicit_TA_exclusion"></a>3.1.7.&nbsp;
Implicit TA exclusion
</h3></div></div></div><p class="Normal">
While the FAM specifications of a single TAM are independent of each
other, there is an implicit dependency between TAMs: they depend on the
order in which they are specified in the configuration file. Annotations
matched by a TAM that appears earlier in the configuration file are
excluded from further processing by FESL. This rule applies only to TAMs
that use the
<code class="code">fullPath</code>
attribute in their specification (see
<a href="#_PartialObjectMatcherXML" title="3.2.8.&nbsp; PartialObjectMatcherXML">
<span class="Hyperlink1">
<code class="code">PartialObjectMatcherXML</code>
</span>
</a>
). Implicit exclusion helps to separate the processing of annotations of
the same type when these annotations have different semantic meanings. For
instance, the set of features to be extracted from annotations of type
<code class="code">EngineAnnotation</code>
that are attributes of
<code class="code">CarAnnotation</code>
objects can differ from the set of features to be extracted from
annotations of the same
<code class="code">EngineAnnotation</code>
type that are attributes of some other type or are not attached to any
annotation of another type. To implement such behavior in FESL, the first
<code class="code">TAM</code>
would contain criteria for locating
<code class="code">EngineAnnotation</code>
objects that are attached to objects of the
<code class="code">CarAnnotation</code>
type, while the second
<code class="code">TAM</code>
would not place any restriction on the containment of
<code class="code">EngineAnnotation</code>
objects. Given such a specification, all
<code class="code">EngineAnnotation</code>
objects located by the rule in the first
<code class="code">TAM</code>
are excluded from further processing and, hence, are not available to the
rules given in the second
<code class="code">TAM</code>.
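</p><p class="Normal">
The ordering rule can be illustrated with a short Java sketch. It is a
simplified model, not the CFE implementation: the class
<code class="code">ImplicitExclusion</code>, its nested <code class="code">Tam</code> class and the
annotation paths used with it are hypothetical, and each annotation is reduced
to a type name and a path string. TAMs are applied in configuration order, and
an annotation matched by a TAM that has a <code class="code">fullPath</code> is not offered to later TAMs.
</p><pre class="programlisting">
```java
import java.util.Arrays;

/**
 * Hypothetical model of implicit TA exclusion (not the CFE implementation).
 * TAMs are applied in configuration order; an annotation matched by a TAM
 * that has a fullPath is excluded from all later TAMs.
 */
final class ImplicitExclusion {

    /** Stand-in for a TAM: a label, an annotationTypeName and an optional fullPath (null if absent). */
    static final class Tam {
        final String label;
        final String annotationTypeName;
        final String fullPath;

        Tam(String label, String annotationTypeName, String fullPath) {
            this.label = label;
            this.annotationTypeName = annotationTypeName;
            this.fullPath = fullPath;
        }

        /** The type must match; if a fullPath is given, the annotation's path must equal it. */
        boolean matches(String type, String path) {
            return annotationTypeName.equals(type) && (fullPath == null || fullPath.equals(path));
        }
    }

    /**
     * Annotations are given as parallel arrays of type names and paths.
     * Returns, per annotation, the labels of the TAMs that processed it, joined with ','.
     */
    static String[] assign(String[] types, String[] paths, Tam[] tams) {
        String[] labels = new String[types.length];
        Arrays.fill(labels, "");
        boolean[] excluded = new boolean[types.length];
        for (Tam tam : tams) {                              // configuration order
            for (int i = 0; i < types.length; i++) {
                if (excluded[i] || !tam.matches(types[i], paths[i])) {
                    continue;
                }
                labels[i] = labels[i].isEmpty() ? tam.label : labels[i] + "," + tam.label;
                if (tam.fullPath != null) {
                    excluded[i] = true;                     // implicit exclusion
                }
            }
        }
        return labels;
    }
}
```
</pre><p class="Normal">
With the car-engine TAM listed first, an engine attached to a car is processed only by that TAM, while a free-standing engine falls through to the second TAM. Listing the unrestricted TAM first instead lets both TAMs process the attached engine.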
</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_FESL_Elements"></a>3.2.&nbsp;
FESL Elements
</h2></div></div></div><p class="Normal">
The FESL XSD defines several elements for specifying feature extraction
rules. These elements may contain attributes and other elements in their
definitions.
</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_BitsetFeatureValuesXML"></a>3.2.1.&nbsp;
BitsetFeatureValuesXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: bitmask[1]: Integer</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exact_match[0..1]: boolean: default false</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-7.jpg" align="middle"></span>
</p><p class="Normal">
The BitsetFeatureValuesXML element compares a feature value to an integer
bitmask. The feature value is considered to be matched if it is of Integer
type and one of the following holds:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exact_match</code> attribute is set to true and every bit that is set in
<code class="code">bitmask</code> is also set in the feature value
</p></li><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exact_match</code> attribute is set to false and at least one of the bits
set in <code class="code">bitmask</code> is also set in the feature value
</p></li></ul></div><p class="Normal">Example:</p><p class="Normal">&lt;bitsetFeatureValues bitmask="3" exact_match="false" /&gt;</p><p class="Normal">&lt;bitsetFeatureValues bitmask="3" exact_match="true" /&gt;</p><p class="Normal">
The first line of the example tests whether either of the two least
significant bits of a feature value is set. The test specified by the
second line succeeds only if both of these bits are set.
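</p><p class="Normal">
The two cases can be expressed compactly in Java. The sketch below is
illustrative only: <code class="code">BitsetTest</code> and its <code class="code">matches</code> method are
hypothetical names, not part of the CFE API, and the feature value is passed in
as a plain <code class="code">Object</code>.
</p><pre class="programlisting">
```java
/**
 * Sketch of the bitsetFeatureValues test (illustrative names, not the CFE API).
 */
final class BitsetTest {

    static boolean matches(Object featureValue, int bitmask, boolean exactMatch) {
        if (!(featureValue instanceof Integer)) {
            return false;                                   // only Integer values can match
        }
        int common = ((Integer) featureValue) & bitmask;    // bits set in both value and mask
        // exact_match="true": every bit of the mask must be set in the value;
        // exact_match="false": at least one bit of the mask must be set.
        return exactMatch ? common == bitmask : common != 0;
    }
}
```
</pre><p class="Normal">
With <code class="code">bitmask="3"</code>, a value of 1 passes the first test of the example but fails the second, while a value of 7 passes both.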
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_EnumFeatureValuesXML"></a>3.2.2.&nbsp;
EnumFeatureValuesXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: caseSensitive[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: values[0..*]: String</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-8.jpg" align="middle"></span>
</p><p class="Normal">
The EnumFeatureValuesXML element tests whether a feature value belongs to
a finite set of values. A feature is considered to be successfully
evaluated if its value is equal to one of the elements of
<code class="code">values</code>. The <code class="code">caseSensitive</code>
attribute indicates whether the comparison between the feature value and
the members of the <code class="code">values</code> element is case sensitive. The FESL fragment below
shows how to specify such a comparison:
</p><p class="Normal">&lt;enumFeatureValues caseSensitive="true"&gt;</p><p class="Normal">&lt;values&gt;red&lt;/values&gt;</p><p class="Normal">&lt;values&gt;green&lt;/values&gt;</p><p class="Normal">&lt;values&gt;blue&lt;/values&gt;</p><p class="Normal">&lt;/enumFeatureValues&gt;</p><p class="Normal">
This fragment specifies a case sensitive comparison of a feature value to
a set of strings: <code class="code">red</code>, <code class="code">green</code> and <code class="code">blue</code>.
</p><p class="Normal">
Special processing occurs when the <code class="code">values</code> array has only a single element
and that element starts with <code class="code">file://</code>; this enables the use of external
dictionaries for comparison. In this case, the text within the
<code class="code">values</code>
element is treated as a URI. The contents of the file referenced by the
URI are loaded and used as the set of values against which the feature
value is tested. The file should contain one dictionary entry per line;
lines starting with the <code class="code">#</code> character are treated as comments and are
not loaded. The dictionary handling is implemented in
<code class="code">org.apache.uima.tools.cfe.EnumeratedEntryDictionary</code>. The default
implementation supports single-token (whitespace-separated) dictionary
entries. If a more sophisticated dictionary format is desired, then
either the constructor's parameters can be changed or the methods for
initializing and loading the dictionary from a file can be overridden.
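</p><p class="Normal">
The comparison and the dictionary file format described above can be sketched
in Java as follows. This is not the code of
<code class="code">EnumeratedEntryDictionary</code>: <code class="code">EnumTest</code> and its methods are
hypothetical, UTF-8 file encoding is assumed, and each line is trimmed and blank
lines are skipped, which may differ from the actual single-token handling.
</p><pre class="programlisting">
```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical sketch of enumFeatureValues with a file:// dictionary. */
final class EnumTest {

    /** Reads a file:// URI and parses it as a dictionary (UTF-8 assumed). */
    static String[] loadDictionary(String fileUri) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(URI.create(fileUri)));
        return parseDictionary(new String(bytes, StandardCharsets.UTF_8));
    }

    /** One entry per line; blank lines and lines starting with '#' are skipped. */
    static String[] parseDictionary(String text) {
        return Arrays.stream(text.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .toArray(String[]::new);
    }

    /** True if value equals one of the members, honoring caseSensitive. */
    static boolean matches(String value, String[] members, boolean caseSensitive) {
        if (value == null) {
            return false;
        }
        for (String member : members) {
            if (caseSensitive ? member.equals(value) : member.equalsIgnoreCase(value)) {
                return true;
            }
        }
        return false;
    }
}
```
</pre><p class="Normal">
For example, with the case-sensitive set <code class="code">red</code>, <code class="code">green</code> and <code class="code">blue</code>, the value <code class="code">Red</code> is rejected; with <code class="code">caseSensitive="false"</code> it is accepted.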
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_ObjectPathFeatureValue"></a>3.2.3.&nbsp;
ObjectPathFeatureValuesXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: objectPath[1]: String</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-9.jpg" align="middle"></span>
</p><p class="Normal">
The ObjectPathFeatureValuesXML specification tests whether the
<a href="#_CFE_Basics" title="1.3.&nbsp; CFE Basics">TA</a>
or
<a href="#_CFE_Basics" title="1.3.&nbsp; CFE Basics">
<span class="Hyperlink1">FA</span>
</a>
itself (depending on whether this element is in a
<a href="#_TAM_and_FAM" title="3.1.3.&nbsp; TAM and FAM">
<span class="Hyperlink1">TAM</span>
</a>
or in a
<a href="#_TAM_and_FAM" title="3.1.3.&nbsp; TAM and FAM">
<span class="Hyperlink1">FAM</span>
</a>)
is located at the path defined by <code class="code">objectPath</code>. The
ability to evaluate whether a feature belongs to a particular CAS object is
useful when a particular feature value can be the property of several
different objects. For instance, this element can be used when features
should be extracted from annotations only if these annotations are
attributes of other annotations. The FESL fragment below specifies a test
that checks whether an object's full path is
<code class="code">org.apache.uima.tools.cfe.sample.CarAnnotation:Wheels:toArray</code>. Such a test
can be used, for instance, to check whether an instance of
<code class="code">WheelAnnotation</code> belongs to an instance of <code class="code">CarAnnotation</code>:
</p><p class="Normal">
&lt;objectPathFeatureValues objectPath="org.apache.uima.tools.cfe.sample.CarAnnotation:Wheels:toArray" /&gt;
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_PatternFeatureValuesXM"></a>3.2.4.&nbsp;
PatternFeatureValuesXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: pattern[1]: String</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-10.jpg" align="middle"></span>
</p><p class="Normal">
The PatternFeatureValuesXML element compares a feature value against a
regular expression, given in Java regular expression syntax by the
<code class="code">pattern</code> attribute. The feature value is considered to be successfully
evaluated if it matches the pattern.
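</p><p class="Normal">
A minimal Java sketch of this test, using <code class="code">java.util.regex</code>, follows. It
assumes that the whole feature value must match the pattern
(<code class="code">Matcher.matches()</code>); this section does not state whether CFE uses
whole-value or partial (<code class="code">Matcher.find()</code>) matching. <code class="code">PatternTest</code> is a
hypothetical name.
</p><pre class="programlisting">
```java
import java.util.regex.Pattern;

/**
 * Sketch of the patternFeatureValues test. Assumes whole-value matching
 * (Matcher.matches()); CFE's actual semantics may differ.
 */
final class PatternTest {

    static boolean matches(Object featureValue, String regex) {
        if (featureValue == null) {
            return false;
        }
        // The feature value is compared through its string form.
        return Pattern.compile(regex).matcher(featureValue.toString()).matches();
    }
}
```
</pre><p class="Normal">
With the hexadecimal pattern used in the fragment below, <code class="code">0x1F</code> matches, while <code class="code">size 0x1F</code> does not under whole-value matching. Compiling the pattern on every call keeps the sketch short; a real matcher would compile it once when the configuration is loaded.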
</p><p class="Normal">
The FESL fragment below defines a test that checks whether a feature value
conforms to the hexadecimal number format:
</p><p class="Normal">&lt;patternFeatureValues pattern="(0[Xx][0-9A-Fa-f]+)" /&gt;</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_RangeFeatureValuesXML"></a>3.2.5.&nbsp;
RangeFeatureValuesXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundary[0..1]: Comparable: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: lowerBoundaryInclusive[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundary[0..1]: Comparable: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: upperBoundaryInclusive[0..1]: boolean: default false</p></li></ul></div><div class="mediaobject"><span></span></div><p class="Normal">
The RangeFeatureValuesXML specification tests whether the feature value
is of a Comparable type and lies within the interval specified by the
attributes <code class="code">lowerBoundary</code> and <code class="code">upperBoundary</code>. The
attributes <code class="code">lowerBoundaryInclusive</code> and <code class="code">upperBoundaryInclusive</code> indicate
whether the corresponding boundaries are included in the range. The FESL
fragment below specifies a test that checks whether a feature value lies in
the numeric range between 1.8 and 3.0, excluding 1.8 (<code class="code">lowerBoundaryInclusive</code>
defaults to false) and including 3.0:
</p><p class="Normal">
&lt;rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0" /&gt;</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_SingleFeatureMatcherXML"></a>3.2.6.&nbsp;
SingleFeatureMatcherXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: featurePath[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: featureTypeName[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: quiet[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureValues one of: </p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">bitsetFeatureValues: BitsetFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">enumFeatureValues: EnumFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">objectPathFeatureValues: ObjectPathFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">patternFeatureValues: PatternFeatureValuesXML</p></li><li style="list-style-type: disc"><p class="Normal">rangeFeatureValues: RangeFeatureValuesXML</p></li></ul></div></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-12.jpg" align="middle"></span>
</p><p class="Normal">
The <code class="code">SingleFeatureMatcherXML</code> element defines rules for matching a feature value
against the featureValues element, which can be any of the elements in the
list above; the previous sections detail the rules for matching a feature
value against each of them. First, the value of the feature denoted by the
required <code class="code">featurePath</code> attribute is located. For
features that have arrays in their featurePath, multiple values can be
found. If one or more values are found and the optional <code class="code">featureTypeName</code> attribute
specifies a type name, every value found is tested to be of that type. If
the test is successful, the feature values are evaluated according to the
specification given in featureValues. After the evaluation, a single
feature is considered to be successfully evaluated if:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exclude</code> attribute is set to false and at least one
feature value matches the <code class="code">featureValues</code> specification.
</p></li><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exclude</code> attribute is set to true and none of the
feature values matches the <code class="code">featureValues</code> specification.
</p></li></ul></div><p class="Normal">
For <code class="code">SingleFeatureMatcherXML</code> elements that are part of a TAM element, only
the evaluation of feature values is performed. If a <code class="code">SingleFeatureMatcherXML</code>
element is part of a FAM, then the feature value is output only if the
<code class="code">quiet</code> attribute is set to false. If the <code class="code">quiet</code> attribute is
set to true, then, even if the feature is matched, only the evaluation is
performed and no value is written to the final output. The <code class="code">featurePath</code>
attribute uses the feature path notation explained earlier.
</p><p class="Normal">
The FESL fragment below defines a test that checks whether the value of the <code class="code">Size</code>
attribute is in the range defined by the <code class="code">rangeFeatureValues</code> element:
</p><p class="Normal">&lt;featureMatchers featurePath="Size" featureTypeName="java.lang.Float"&gt;</p><p class="Normal">&lt;rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/&gt;</p><p class="Normal">&lt;/featureMatchers&gt;</p><p class="Normal">
The parent tag (see
<a href="#_Parent_tag" title="3.1.5.&nbsp; Parent tag">
<span class="Hyperlink1">Parent tag</span>
</a>)
can also be used in the <code class="code">featurePath</code> attribute. The sample in the
<code class="code">PartialObjectMatcherXML</code> section shows how to use the parent tag notation.
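</p><p class="Normal">
The evaluation order described in this section (type check, value test,
<code class="code">exclude</code> flag) can be sketched in Java as follows, using the range test
from the <code class="code">Size</code> example. This is a simplified model under stated
assumptions: <code class="code">SingleMatcher</code>, <code class="code">ValueTest</code>, <code class="code">range</code> and
<code class="code">evaluate</code> are hypothetical names; feature path resolution (including parent
tags) is assumed to have already produced the candidate values; the range test is
restricted to <code class="code">Number</code> values instead of arbitrary Comparable types; values
that fail the <code class="code">featureTypeName</code> check are treated as unmatched; and when
<code class="code">featureTypeName</code> is omitted the sketch simply skips the type check.
</p><pre class="programlisting">
```java
/**
 * Hypothetical sketch of SingleFeatureMatcherXML evaluation (not the CFE API).
 * The candidate values are assumed to come from an already resolved
 * featurePath; a path that crosses an array can yield several of them.
 */
final class SingleMatcher {

    /** Stand-in for any featureValues element (enum, pattern, range, ...). */
    interface ValueTest {
        boolean test(Object value);
    }

    /** rangeFeatureValues-style test, simplified to numeric values. */
    static ValueTest range(double lower, boolean lowerInclusive,
                           double upper, boolean upperInclusive) {
        return value -> {
            if (!(value instanceof Number)) {
                return false;
            }
            double v = ((Number) value).doubleValue();
            int lo = Double.compare(v, lower);
            int hi = Double.compare(v, upper);
            return (lowerInclusive ? lo >= 0 : lo > 0)
                    && (upperInclusive ? hi <= 0 : hi < 0);
        };
    }

    /** exclude="false": at least one value must match; exclude="true": no value may match. */
    static boolean evaluate(Object[] values, String featureTypeName,
                            ValueTest featureValues, boolean exclude) {
        boolean anyMatched = false;
        for (Object value : values) {
            if (value != null
                    && (featureTypeName == null || isOfType(value, featureTypeName))
                    && featureValues.test(value)) {
                anyMatched = true;
                break;
            }
        }
        return exclude ? !anyMatched : anyMatched;
    }

    /** True if the value's class is the named type or derived from it. */
    private static boolean isOfType(Object value, String typeName) {
        try {
            return Class.forName(typeName).isInstance(value);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```
</pre><p class="Normal">
With the bounds from the example above, a <code class="code">Float</code> value of 2.0 or 3.0 matches (the upper boundary is inclusive), 1.5 does not, and a <code class="code">Double</code> value of 2.0 fails the <code class="code">java.lang.Float</code> type check.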
</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_GroupFeatureMatcherXML"></a>3.2.7.&nbsp;
GroupFeatureMatcherXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: exclude[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Element: featureMatchers[1..*]: SingleFeatureMatcherXML</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-13.jpg" align="middle"></span>
</p><p class="Normal">
This is a specification for matching a group of features. It can be applied
to both types of annotations, TAs and FAs. Each element in <code class="code">featureMatchers</code> is
evaluated against the TA or FA annotation being matched. The group is considered to
be matched if:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exclude</code> attribute is set to false and all elements in
<code class="code">featureMatchers</code> have been successfully evaluated
</p></li><li style="list-style-type: disc"><p class="Normal">
the <code class="code">exclude</code> attribute is set to true and the evaluation of at least one
of the elements in <code class="code">featureMatchers</code> is unsuccessful
</p></li></ul></div><p class="Normal">
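These two rules, together with the rule from the
<code class="code">PartialObjectMatcherXML</code> section that an annotation matches when any of its
groups matches (or when it has no groups), can be sketched in Java as follows.
The class <code class="code">GroupLogic</code> is hypothetical and takes already-evaluated matcher
results as boolean arrays rather than real annotations.
</p><pre class="programlisting">
```java
/**
 * Hypothetical sketch: how GroupFeatureMatcherXML combines its featureMatchers,
 * and how PartialObjectMatcherXML (section 3.2.8) combines its groups.
 * Boolean arrays stand in for already evaluated matchers.
 */
final class GroupLogic {

    /** exclude="false": every matcher succeeded; exclude="true": at least one failed. */
    static boolean groupMatches(boolean[] matcherResults, boolean exclude) {
        boolean allSucceeded = true;
        for (boolean result : matcherResults) {
            allSucceeded &= result;
        }
        return exclude ? !allSucceeded : allSucceeded;
    }

    /** An annotation matches if it has no groups or if any of its groups matches. */
    static boolean annotationMatches(boolean[] groupResults) {
        if (groupResults.length == 0) {
            return true;
        }
        for (boolean group : groupResults) {
            if (group) {
                return true;
            }
        }
        return false;
    }
}
```
</pre><p class="Normal">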
The FESL fragment below defines a group in which the two features <code class="code">Color</code> and
<code class="code">Wheels:Size</code> are to be matched. The entire group is successfully evaluated
if both features are matched. The first feature is successfully evaluated if
its value is one of the values listed in its <code class="code">enumFeatureValues</code> element, and
the second feature is matched if its value is not in the set contained in its
<code class="code">enumFeatureValues</code> element, as specified by its <code class="code">exclude</code> attribute. Note
that if the optional attribute <code class="code">featureTypeName</code> is omitted, a
feature value is assumed to be of a string type. Otherwise, the feature value's type
is checked to be the same as, or derived from, the type specified by the
<code class="code">featureTypeName</code> attribute. Assuming the <code class="code">groupFeatureMatchers</code> element is specified for
the <code class="code">CarAnnotation</code> type, the test defined by the FESL fragment below
succeeds if a car is either red, green or blue and does not have 1 or 3
wheels:
</p><p class="Normal">&lt;groupFeatureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="Color" featureTypeName="java.lang.Stting"&gt; </p><p class="Normal"> &lt;enumFeatureValues caseSensitive="true"&gt; </p><p class="Normal"> &lt;values&gt;red&lt;/values&gt; </p><p class="Normal"> &lt;values&gt;green&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;blue&lt;/values&gt;</p><p class="Normal"> &lt;/enumFeatureValues&gt;</p><p class="Normal"> &lt;/featureMatcher&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="Wheels:Size" exclude="true"&gt;</p><p class="Normal"> &lt;enumFeatureValues caseSensitive="true"&gt;</p><p class="Normal"> &lt;values&gt;1&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;3&lt;/values&gt;</p><p class="Normal"> &lt;/enumFeatureValues&gt;</p><p class="Normal"> &lt;/featureMatchers&gt;</p><p class="Normal">&lt;groupFeatureMatchers&gt;</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_PartialObjectMatcherXML"></a>3.2.8.&nbsp;
PartialObjectMatcherXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: annotationTypeName[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: fullPath[0..1]: String: no default value</p></li><li style="list-style-type: disc"><p class="Normal">
Element: groupFeatureMatchers[0..*]: GroupFeatureMatcherXML
</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-14.jpg" align="middle"></span>
</p><p class="Normal">
This is the base specification for an annotation matcher that searches for
annotations of the type specified by <code class="code">annotationTypeName</code> located on the path
specified by <code class="code">fullPath</code>. If <code class="code">fullPath</code> is omitted or contains just the type
name of an annotation (the same as the <code class="code">annotationTypeName</code> attribute), then all
instances of that type are considered for further feature value
evaluation. If <code class="code">fullPath</code> contains a path to an object from an attribute of
a different object, then only instances of <code class="code">annotationTypeName</code> that are
located on that path are considered for further evaluation. Once an
annotation is successfully evaluated to match a type/path, its features
are evaluated according to the specification given in all elements of
<code class="code">groupFeatureMatchers</code>. If the evaluation of any <code class="code">groupFeatureMatchers</code> element is
successful, or if no <code class="code">groupFeatureMatchers</code> are given, then the annotation is
considered to be successfully evaluated. The <code class="code">fullPath</code> attribute should be
specified using the syntax described in the
<a href="#_Feature_path" title="3.1.1.&nbsp; Feature path">
<span class="Hyperlink2">feature path</span>
</a>
section above, with the exception that it cannot contain any parent tags.
For instance, a specification where the value of the <code class="code">fullPath</code> attribute is
<code class="code">CarAnnotation:Engine</code> and the value of the <code class="code">annotationTypeName</code> attribute is
<code class="code">EngineAnnotation</code> would address only engines that are car engines.
<code class="code">PartialObjectMatcherXML</code> is used to specify search rules in TAM
specifications. To illustrate the use of the parent tag notation, consider
an example where it is required to identify engines with a size greater
than 1.8 l but not greater than 3.0 l that belong to red, green or blue cars.
According to the class diagram in Figure 2, the FESL fragment below defines
rules for the task. Note that the second feature matcher
uses the
<a href="#_Parent_tag" title="3.1.5.&nbsp; Parent tag">
<span class="Hyperlink2">parent tag</span>
</a> notation to access the value of the <code class="code">CarAnnotation</code>'s attribute <code class="code">Color</code>:
</p><p class="Normal">&lt;targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" &gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="Size" featureTypeName="java.lang.Float"&gt;</p><p class="Normal"> &lt;rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/&gt;</p><p class="Normal"> &lt;/featureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"</p><p class="Normal"> &lt;enumFeatureValues caseSensitive="true"&gt;</p><p class="Normal"> &lt;values&gt;red&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;green&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;blue&lt;/values&gt;</p><p class="Normal"> &lt;/enumFeatureValues&gt;</p><p class="Normal"> &lt;/featureMatcher&gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal">&lt;/targetAnnotationMatcher&gt;</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_FeatureObjectMatcherXML"></a>3.2.9.&nbsp;
FeatureObjectMatcherXML
</h3></div></div></div><p class="Normal">extends <code class="code">PartialObjectMatcherXML</code></p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeLeft[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeInside[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeRight[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowsizeEnclosed[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: windowFlags[0..1]: Integer: default 0</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: orientation[0..1]: boolean: default false</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: distance[0..1]: boolean: default false</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-15.jpg" align="middle"></span>
</p><p class="Normal">
The <code class="code">FeatureObjectMatcherXML</code> element contains rules that specify how
<code class="code">FeatureAnnotations</code> (FA) should be located and which features should be
extracted from them. It inherits its properties from
<code class="code">PartialObjectMatcherXML</code>. In addition it has semantics for specifying:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a size of a search window</p></li><li style="list-style-type: disc"><p class="Normal">
a direction for the search relative to a corresponding Target Annotation (TA).
</p></li></ul></div><p class="Normal">
This is done with the integer attributes <code class="code">windowsizeLeft</code>, <code class="code">windowsizeInside</code>,
<code class="code">windowsizeRight</code> and <code class="code">windowsizeEnclosed</code>, and the bitmask attribute <code class="code">windowFlags</code>,
which define the FA search rules:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">windowsizeLeft - the size of the search window to the left of the TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeRight - the size of the search window to the right of the TA</p></li><li style="list-style-type: disc"><p class="Normal">windowsizeInside - the size of the search window within the TA boundaries; if the value of this attribute is 1, then the TA is considered to be an FA at the same time</p></li><li style="list-style-type: disc"><p class="Normal">windowFlags - more precise criteria for the search window; the value of this attribute is a bitmask with a combination of the following values:</p><div class="orderedlist"><ol type="a"><li><p class="Normal">1 - FA starts to the left of the TA and ends to the left of the TA</p></li><li><p class="Normal">2 - FA starts to the left of the TA and ends inside the TA boundaries</p></li><li><p class="Normal">4 - FA starts to the left of the TA and ends to the right of the TA</p></li><li><p class="Normal">8 - FA starts inside the TA boundaries and ends inside the TA boundaries</p></li><li><p class="Normal">16 - FA starts inside the TA boundaries and ends to the right of the TA</p></li><li><p class="Normal">32 - FA starts to the right of the TA and ends to the right of the TA</p></li></ol></div></li></ul></div><p class="Normal">
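The mapping from an FA's position to these bit values can be sketched in Java
as follows. The sketch assumes UIMA-style character offsets with an exclusive
end offset, so an FA that ends exactly where the TA begins counts as ending to
the left; CFE's actual boundary handling may differ. The class
<code class="code">WindowFlags</code> and its constant names are hypothetical.
</p><pre class="programlisting">
```java
/**
 * Hypothetical sketch: classify an FA's position relative to a TA into one
 * of the windowFlags values. Assumes offsets with an exclusive end.
 */
final class WindowFlags {
    static final int LEFT_LEFT = 1;        // starts left, ends left
    static final int LEFT_INSIDE = 2;      // starts left, ends inside
    static final int LEFT_RIGHT = 4;       // starts left, ends right
    static final int INSIDE_INSIDE = 8;    // starts inside, ends inside
    static final int INSIDE_RIGHT = 16;    // starts inside, ends right
    static final int RIGHT_RIGHT = 32;     // starts right, ends right

    static int position(int faBegin, int faEnd, int taBegin, int taEnd) {
        if (faBegin < taBegin) {                         // starts to the left
            if (faEnd <= taBegin) {
                return LEFT_LEFT;
            }
            return faEnd <= taEnd ? LEFT_INSIDE : LEFT_RIGHT;
        }
        if (faBegin >= taEnd) {                          // starts to the right
            return RIGHT_RIGHT;
        }
        return faEnd <= taEnd ? INSIDE_INSIDE : INSIDE_RIGHT;
    }

    /** True if the FA's position is one of the positions enabled in windowFlags. */
    static boolean accepted(int windowFlags, int faBegin, int faEnd, int taBegin, int taEnd) {
        return (windowFlags & position(faBegin, faEnd, taBegin, taEnd)) != 0;
    }
}
```
</pre><p class="Normal">
For a TA spanning offsets 10 to 20, an FA spanning 12 to 18 yields 8, so a windowFlags value of 24 (8 + 16) accepts it, while an FA spanning 5 to 15 (value 2) is rejected.
</p><p class="Normal">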
The location of an FA is included in the generated output according to
the optional <code class="code">orientation</code> and <code class="code">distance</code> attributes. For example, if
both of these attributes are set to true and the FA is the first annotation
of the required type to the left of the TA, then the generated feature value
starts with the prefix <code class="code">L1</code>. If both attributes are set to false, then the
feature value's prefix is <code class="code">X0</code>. This makes it possible to generate unique
feature names for machine learning model building and evaluation.
</p><p class="Normal">
<code class="code">FeatureObjectMatcherXML</code> is used to specify search rules in FAM
specifications.
</p><p class="Normal">
The FESL fragment below adds rules to the previous sample to extract the
number of cylinders from the engines of cars whose wheel diameter is at
least 20.0":
</p><p class="Normal">&lt;targetAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" &gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="Size" featureTypeName="java.lang.Float"&gt;</p><p class="Normal"> &lt;rangeFeatureValues lowerBoundary="1.8" upperBoundaryInclusive="true" upperBoundary="3.0"/&gt;</p><p class="Normal"> &lt;/featureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="__p0:Color" featureTypeName="java.lang.String"&gt;</p><p class="Normal"> &lt;enumFeatureValues caseSensitive="true"&gt;</p><p class="Normal"> &lt;values&gt;red&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;green&lt;/values&gt;</p><p class="Normal"> &lt;values&gt;blue&lt;/values&gt;</p><p class="Normal"> &lt;/enumFeatureValues&gt;</p><p class="Normal"> &lt;/featureMatcher&gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal">&lt;/targetAnnotationMatcher&gt;</p><p class="Normal">&lt;featureAnnotationMatcher annotationTypeName="EngineAnnotation" fullPath="CarAnnotation:EngineAnnotation" windowsizeInside=1 &gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="__p0:Wheels:toArray:Diameter" featureTypeName="java.lang.Float" quiet="true" &gt;</p><p class="Normal"> &lt;rangeFeatureValues lowerBoundary="20.0" lowerBoundaryInclusive="true"/&gt;</p><p class="Normal"> &lt;/featureMatcher&gt;</p><p class="Normal"> &lt;featureMatchers featurePath="Cylinders" featureTypeName="java.lang.Float" /&gt;</p><p class="Normal"> &lt;groupFeatureMatchers&gt;</p><p class="Normal">&lt;/featureAnnotationMatcher&gt;</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_TargetAnnotationXML"></a>3.2.10.&nbsp;
TargetAnnotationXML
</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">Attribute: className[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Attribute: enclosingAnnotation[1]: String</p></li><li style="list-style-type: disc"><p class="Normal">Element: targetAnnotationMatcher[1..1]: PartialObjectMatcherXML</p></li><li style="list-style-type: disc"><p class="Normal">
Element: featureAnnotationMatchers[0..*]: FeatureObjectMatcherXML
</p></li></ul></div><p>
<span class="inlinemediaobject"><img src="../images/CFE_UG/CFE_UG-16.jpg" align="middle"></span>
</p><p class="Normal">
This is the root specification for a class (group) of extracted instances
that are all assigned the same label (className) in the final output. The
label can be a literal string, a feature path in curly brackets, or a
combination of the two (e.g.
<code class="code">SomeText_{__p0:SomeProperty}</code>). If a feature path is used in a class name
label, the parent tag notation is required; in this case the
parent tag refers to the TA specified by the <code class="code">targetAnnotationMatcher</code>
element. Annotations that belong to the group are searched for within the span
of <code class="code">enclosingAnnotation</code> according to the specification given in the
<code class="code">targetAnnotationMatcher</code> (TAM), and features are
extracted according to the specification given in <code class="code">featureAnnotationMatchers</code>
(FAM). In general, the annotations that features are extracted from can
differ from the annotations that are matched during the search. This is
useful when extracting features for building and evaluating machine learning
models, where features are selected from annotations located at a specific
position relative to the annotation that satisfies the search criteria, for
instance, the POS tags of the 5 words to the left and to the right of a
specific word. Further feature extraction takes place only if an annotation
is successfully evaluated (matched) by a TAM; only then are the rules
specified by the corresponding FAMs executed.
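</p><p class="Normal">
The substitution of feature paths in curly brackets within a className label
can be sketched in Java as follows. <code class="code">LabelTemplate</code> is hypothetical, and a
<code class="code">java.util.Properties</code> object stands in for resolving each feature path
against the matched TA, which is not shown; the literal string <code class="code">null</code> used
for unresolved paths is only a placeholder for the configured nullValueImage.
</p><pre class="programlisting">
```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of expanding a className label such as "CMBegin_{__p0:Code}".
 * Each {featurePath} is replaced by the value looked up for that path.
 */
final class LabelTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^}]+)\\}");

    static String expand(String className, Properties resolvedPaths) {
        Matcher m = PLACEHOLDER.matcher(className);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // "null" is a stand-in for the configured image of a missing value.
            String value = resolvedPaths.getProperty(m.group(1), "null");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);                 // copy the text after the last placeholder
        return out.toString();
    }
}
```
</pre><p class="Normal">
For example, with <code class="code">__p0:Code</code> resolved to <code class="code">BMW</code>, the label <code class="code">CMBegin_{__p0:Code}</code> expands to <code class="code">CMBegin_BMW</code>, and a label without curly brackets, such as <code class="code">Other_Token</code>, is returned unchanged.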
</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_Configuration_file_sample"></a>3.3.&nbsp;
Configuration file sample
</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Task_definition"></a>3.3.1.&nbsp;
Task definition
</h3></div></div></div><p class="Normal">
The sample configuration file below has been created for extracting
features in order to build models for a machine learning application. The
type system for this sample defines several UIMA annotation types:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.Sentence - type that marks a sentence</p></li><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.Token - type that marks a token, with the feature:</p><div class="itemizedlist"><ul type="circle"><li style="list-style-type: circle"><p class="Normal">pennTag: String - POS tag of the token</p></li></ul></div></li><li style="list-style-type: disc"><p class="Normal">org.apache.uima.tools.cfe.sample.NamedEntity - named entity type with the features:</p><div class="itemizedlist"><ul type="circle"><li style="list-style-type: circle"><p class="Normal">Code: String - specific code assigned to a named entity</p></li><li style="list-style-type: circle"><p class="Normal">SemanticClass: String - semantic class of a named entity</p></li><li style="list-style-type: circle"><p class="Normal">Tokens: FSArray - array of org.apache.uima.tools.cfe.sample.Token annotations, ordered by their offset, that are included in the named entity</p></li></ul></div></li></ul></div><p class="Normal">The classification task is defined as follows:</p><div class="orderedlist"><ol type="a"><li><p class="Normal">
classify the first token of each named entity that has the semantic
class <code class="code">Car Maker</code> with a class label composed of
the string <code class="code">CMBegin</code> and the value of the <code class="code">Code</code> attribute of that
named entity
</p></li><li><p class="Normal">
classify all other tokens of named entities of the semantic class
<code class="code">Car Maker</code> with a class label composed of the string
<code class="code">CMInside</code> and the value of the <code class="code">Code</code> attribute of that named entity
</p></li><li><p class="Normal">classify all other tokens with a class label <code class="code">Other_Token</code></p></li></ol></div><p class="Normal">
To build a model for machine learning it is required to extract
features from surrounding tokens for all classes listed above.
In particular the following features are required to be extracted:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">a string literal of the token to which the class label is assigned (<code class="code">class token</code>)</p></li><li style="list-style-type: disc"><p class="Normal">
a string literal of each token located within a window of 5
tokens to the left and to the right of the <code class="code">class token</code>, with the exception of prepositions (POS tag
IN), conjunctions (CC), determiners (DT), punctuation (POS tag is not
defined, i.e. null) and numbers (CD)
</p></li><li style="list-style-type: disc"><p class="Normal">
each extracted feature has to be made unique by including its position
relative to the location of the <code class="code">class token</code>.
</p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="_Implementation"></a>3.3.2.&nbsp;
Implementation
</h3></div></div></div><p class="Normal">Line 1 - a standard XML declaration that defines the XML version of the document and its encoding</p><p class="Normal">Line 2, 87 - the FESL root element, which references the schema and defines global variables such as nullValueImage (see
<a href="#_Null_values" title="3.1.6.&nbsp; Null values">
<span class="Hyperlink1">Null values</span>
</a>)
</p><p class="Normal">Line 3-32 - rules for extracting features for the first tokens of named entities.</p><p class="Normal">Line 3 - features extracted for those tokens are assigned a composite label that consists of the prefix <code class="code">CMBegin_</code> plus the value of the <code class="code">Code</code> attribute of the first element of the TA's path. The search for FAs is performed within the boundaries of the enclosing org.apache.uima.tools.cfe.sample.Sentence annotation.</p><p class="Normal">Line 4-12 - TAM that defines the rules for identifying the first type of TA</p><p class="Normal">Line 4 - defines the TA's type (org.apache.uima.tools.cfe.sample.Token) and the full path to it (org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray[0]). According to this path notation, CFE will:</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">search for annotations of type org.apache.uima.tools.cfe.sample.NamedEntity</p></li><li style="list-style-type: disc"><p class="Normal">
for each annotation found, access the value of its Tokens attribute
and, if the value is not null, call the method toArray to
convert the value to an array
</p></li><li style="list-style-type: disc"><p class="Normal">if the resulting array is not empty, treat its first element as a TA</p></li></ul></div><p class="Normal">Line 5-11 - defines the rules for matching a group of features for the TA</p><p class="Normal">Line 6-10 - defines the rules for matching a feature of this group</p><p class="Normal">Line 6 - defines that the feature value is of the type
java.lang.String and that its feature path is __p0:SemanticClass, which
translates to the value of the attribute SemanticClass of the first element of
the TA's path (org.apache.uima.tools.cfe.sample.NamedEntity)
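</p><p class="Normal">The path resolution performed for lines 4-12 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not CFE's implementation: the classes <code class="code">Token</code>, <code class="code">TokenArray</code> and <code class="code">NamedEntity</code> are hypothetical stand-ins for the sample types, and the rule that a segment such as <code class="code">Tokens</code> maps to a getter such as <code class="code">getTokens</code>, falling back to a direct method call such as <code class="code">toArray</code>, is assumed by analogy with the <code class="code">getPennTag</code> translation described for line 21.</p>

```python
import re


# Hypothetical stand-ins for the sample types used in this section.
class Token:
    def __init__(self, text, penn_tag=None):
        self._text, self._penn_tag = text, penn_tag

    def getCoveredText(self):
        return self._text

    def getPennTag(self):
        return self._penn_tag


class TokenArray:
    """Stand-in for an FSArray; only toArray() is modelled."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def toArray(self):
        return list(self._tokens)


class NamedEntity:
    def __init__(self, semantic_class, code, tokens):
        self._semantic_class, self._code = semantic_class, code
        self._tokens = None if tokens is None else TokenArray(tokens)

    def getSemanticClass(self):
        return self._semantic_class

    def getCode(self):
        return self._code

    def getTokens(self):
        return self._tokens


SEGMENT = re.compile(r"(\w+)(?:\[(\d+)\])?")


def call_segment(obj, name):
    # Assumption: "Tokens" maps to getTokens(); a segment without a matching
    # getter is called as a method directly, e.g. toArray().
    getter = getattr(obj, "get" + name[0].upper() + name[1:], None)
    return (getter or getattr(obj, name))()


def resolve_full_path(roots, path_spec):
    """Yield (target, path) pairs; path[0] is the root annotation (__p0)."""
    segments = path_spec.split(":")
    for root in roots:
        value, path = root, [root]
        for seg in segments:
            name, index = SEGMENT.fullmatch(seg).groups()
            value = call_segment(value, name)
            if value is not None and index is not None:
                picked = value[int(index):int(index) + 1]
                value = picked[0] if picked else None  # empty array: no TA
            if value is None:
                break  # a null value ends the path: no TA for this root
            path.append(value)
        else:
            # A path ending in an array without an index yields every element.
            for target in (value if isinstance(value, list) else [value]):
                yield target, path


def resolve_feature(target, path, feature_path):
    """Resolve "coveredText" on the target, or "__pN:..." on path[N]."""
    segments = feature_path.split(":")
    value = target
    if segments[0].startswith("__p"):
        value = path[int(segments[0][3:])]
        segments = segments[1:]
    for seg in segments:
        value = call_segment(value, seg)
    return value


def match_tam(roots, path_spec, feature_path, allowed):
    """Keep the TAs whose feature value is one of the allowed values."""
    return [(t, p) for t, p in resolve_full_path(roots, path_spec)
            if resolve_feature(t, p, feature_path) in allowed]
```

<p class="Normal">Here the roots passed to <code class="code">match_tam</code> stand for the annotations found for the first path element (org.apache.uima.tools.cfe.sample.NamedEntity), so only the rest of the path is resolved, for example <code class="code">match_tam(entities, "Tokens:toArray[0]", "__p0:SemanticClass", {"Car Maker"})</code>. In this sketch, entities whose Tokens value is null or empty yield no TA.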
</p><p class="Normal">Line 7-9 - defines an explicit list of values to which the feature value must belong</p><p class="Normal">Line 8 - defines the value <code class="code">Car Maker</code> as the only allowed value for the feature</p><p class="Normal">Line 13-17 - FAM that defines the rules for identifying the first type of FA and extracting its features</p><p class="Normal">Line 13 - defines the FA's type to be org.apache.uima.tools.cfe.sample.Token;
the attribute windowsizeInside with the value 1 tells CFE to extract features from the TA
itself (TA=FA), and setting the orientation and distance attributes to true tells CFE to
include position information in the generated feature value
</p><p class="Normal">Line 14-16 - defines rules for matching a group of features for the first FA.</p><p class="Normal">Line 15 - defines rules for matching the only feature for
this group: a feature of the type java.lang.String with the feature path
coveredText, which CFE translates into a <code class="code">getCoveredText</code> method call on the
org.apache.uima.tools.cfe.sample.Token annotation object; according to this
specification, the feature value is extracted unconditionally
</p><p class="Normal">Line 18-31 - FAM that defines the rules for identifying the second type of FA and extracting its features</p><p class="Normal">Line 18 - defines the FA's type to be org.apache.uima.tools.cfe.sample.Token;
the attributes windowsizeLeft and windowsizeRight with the value 5 tell CFE
to extract features from the 5 nearest annotations of this type to the left and
to the right of the TA, and setting the orientation and distance attributes to
true tells CFE to include position information in the generated feature
value.
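</p><p class="Normal">The windows defined on lines 13 and 18 can be sketched in Python as follows. The function operates on the covered texts of the Tokens in the enclosing Sentence, in offset order. The position tag format (<code class="code">I0_</code>, <code class="code">L1_</code>, <code class="code">R2_</code> and so on) is a hypothetical encoding chosen for illustration rather than the format CFE writes, and only windowsizeInside values of 0 and 1 are modelled.</p>

```python
def position_tag(side, dist, orientation, distance):
    # Hypothetical encoding of position information: a side letter
    # (L = left, R = right, I = the TA itself) followed by the distance.
    return (side if orientation else "") + (str(dist) if distance else "")


def window_features(tokens, ta, inside=0, left=0, right=0,
                    orientation=True, distance=True):
    """Collect position-tagged feature values for the FAs of one TA.

    tokens: covered texts of the Tokens of the enclosing Sentence, in offset
    order; ta: index of the TA in that list. inside=1 makes the TA its own FA.
    """
    feats = []

    def emit(side, dist, text):
        tag = position_tag(side, dist, orientation, distance)
        feats.append(tag + "_" + text if tag else text)

    if inside:
        emit("I", 0, tokens[ta])
    start = max(0, ta - left)
    # Walk outwards from the TA so that the nearest neighbour gets distance 1;
    # slicing keeps the window inside the Sentence boundaries.
    for dist, text in enumerate(reversed(tokens[start:ta]), start=1):
        emit("L", dist, text)
    for dist, text in enumerate(tokens[ta + 1:ta + 1 + right], start=1):
        emit("R", dist, text)
    return feats
```

<p class="Normal">Since the exclusion on lines 21-29, as described below, suppresses output for an FA that the window has already selected, in this reading an excluded token such as a preposition still occupies one of the 5 positions on its side of the TA.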
</p><p class="Normal">Line 19-30 - defines rules for matching a group of features for the second FA.</p><p class="Normal">Line 20 - defines rules for matching the first feature of
the group: a feature of the type java.lang.String with the feature path
coveredText, which CFE translates into a <code class="code">getCoveredText</code> method call on the
org.apache.uima.tools.cfe.sample.Token annotation object; according to this
specification, the feature value is extracted unconditionally
</p><p class="Normal">Line 21-29 - defines rules for matching the second feature of the group</p><p class="Normal">Line 21 - defines rules for matching the second feature
of the group: a feature of the type java.lang.String with the feature path
pennTag, which CFE translates into a <code class="code">getPennTag</code> method call
on the org.apache.uima.tools.cfe.sample.Token annotation object; according to this
specification, the feature value is evaluated against
<span class="Hyperlink1">enumFeatureValues</span>
and, as the exclude attribute is set to true:
</p><div class="itemizedlist"><ul type="disc"><li style="list-style-type: disc"><p class="Normal">
if the evaluation succeeds, the feature matcher causes the
parent group not to match, and since it is the only group in the
FAM, no output is produced for this FA
</p></li><li style="list-style-type: disc"><p class="Normal">
if the evaluation fails, this feature matcher does not affect the
matching status of the group, so output is generated for the FA, because
the first matcher of the group produces output unconditionally
</p></li></ul></div><p class="Normal">As the
<span class="Hyperlink1">quiet</span>
attribute is set to true, the feature value extracted by the second
matcher is not added to the output generated for this FA.</p><p class="Normal">Line 22-28 - defines an explicit list of values against which the
value of the second feature is evaluated
</p><p class="Normal">Line 23-27 - defines values <code class="code">IN</code>, <code class="code">CC</code>, <code class="code">DT</code>, <code class="code">CD</code>, <code class="code">null</code>
as the allowed values for the second feature; if the feature value is equal
to one of these values, the evaluation of the enclosing feature matcher
succeeds; if the feature value is null, it is converted to the
string defined by
<a href="#_Null_values" title="3.1.6.&nbsp; Null values">
<span class="Hyperlink1">nullValueImage</span>
</a>
(<code class="code">null</code>, as set in line 2 of this sample), and since <code class="code">null</code> is one of the
list's elements, the evaluation succeeds.
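</p><p class="Normal">The interplay of exclude, quiet and nullValueImage in the group on lines 19-30 can be sketched as follows. An FA is modelled as a plain Python dictionary from feature path to value; the <code class="code">FeatureMatcher</code> class, its field names and its case-sensitivity default are assumptions of this sketch, not CFE classes.</p>

```python
from dataclasses import dataclass
from typing import Optional

NULL_VALUE_IMAGE = "null"  # value of nullValueImage on line 2 of the sample


@dataclass
class FeatureMatcher:
    feature_path: str
    enum_values: Optional[list] = None  # None: no enumFeatureValues element
    exclude: bool = False
    quiet: bool = False
    case_sensitive: bool = True  # default chosen for this sketch only


def image(value):
    # A null feature value is replaced by the nullValueImage string.
    return NULL_VALUE_IMAGE if value is None else str(value)


def matcher_passes(m, value):
    if m.enum_values is None:
        return True  # no value list: the matcher matches unconditionally
    fold = (lambda s: s) if m.case_sensitive else str.lower
    hit = fold(image(value)) in {fold(v) for v in m.enum_values}
    # Without exclude a hit is required; with exclude a hit rejects the group
    # and a miss leaves the group's status unchanged.
    return hit != m.exclude


def match_group(matchers, fa):
    """Return the output of a matched group, or None if the group fails."""
    if not all(matcher_passes(m, fa.get(m.feature_path)) for m in matchers):
        return None
    # Quiet matchers take part in matching but add nothing to the output.
    return [image(fa.get(m.feature_path)) for m in matchers if not m.quiet]
```

<p class="Normal">With the group of lines 20-29, a token tagged NNP produces its covered text as the only output value, while tokens tagged IN or with a null tag produce none. Because caseSensitive is set to true on line 22, a lower-case tag such as <code class="code">dt</code> would not be excluded in this sketch.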
</p><p class="Normal">Line 34-63 - rules for extracting features for all tokens
of named entities except the first one. These rules are the same as the rules
defined for the first tokens of named entities (lines 3-32), with the following
exceptions:
</p><p class="Normal">Line 34 - defines that TAs matched by these rules will
be assigned a composite label that consists of the prefix <code class="code">CMInside_</code> plus the
value of the <code class="code">Code</code> attribute of the first element of the TA's path
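</p><p class="Normal">Expanding a className template such as <code class="code">CMInside_{__p0:Code}</code> into a label can be sketched as a placeholder substitution. The sketch assumes that every <code class="code">{featurePath}</code> placeholder holds a feature path of the form <code class="code">__pN:Attribute</code> and models the elements of the TA's path as Python dictionaries; <code class="code">make_resolver</code> and <code class="code">expand_class_name</code> are hypothetical helpers.</p>

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def make_resolver(path):
    """Hypothetical resolver: "__pN:Attr" reads attribute Attr of path[N]."""
    def resolve(feature_path):
        head, attr = feature_path.split(":", 1)
        return path[int(head[len("__p"):])][attr]
    return resolve


def expand_class_name(template, resolve):
    """Replace every {featurePath} in a className template with its value."""
    return PLACEHOLDER.sub(lambda m: str(resolve(m.group(1))), template)
```

<p class="Normal">A template without placeholders, such as <code class="code">Other_token</code> on line 65, is returned unchanged by this sketch.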
</p><p class="Normal">Line 35 - sets the fullPath attribute to
org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray, which can be
read as <code class="code">any token of a named entity</code>; however, because of
<a href="#_Implicit_TA_exclusion" title="3.1.7.&nbsp; Implicit TA exclusion">
<span class="Hyperlink1">implicit TA exclusion</span>
</a>, the TAs that were matched as first tokens of named entities by the
rules of the previous TAM are not included in the set of TAs that are
evaluated by the rules of this TAM
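</p><p class="Normal">The effect of implicit TA exclusion on the three rule sets of this sample can be sketched as an ordered first-match assignment: the targetAnnotations elements are tried in the order in which they appear, and a token already matched by an earlier one is skipped. The Python function <code class="code">label_tokens</code> is hypothetical; it works on token positions instead of annotations and assumes that the named entities passed in have already passed the <code class="code">Car Maker</code> check.</p>

```python
def label_tokens(n_tokens, car_makers):
    """Label tokens 0..n_tokens-1 of one sentence.

    car_makers: (code, token_indices) pairs, one per Car Maker named entity,
    with token_indices ordered by offset.
    """
    rules = [
        # Lines 3-32: the first token of each named entity (toArray[0]).
        (lambda code: "CMBegin_" + code,
         [(ix[0], code) for code, ix in car_makers if ix]),
        # Lines 34-63: any token of a named entity (toArray).
        (lambda code: "CMInside_" + code,
         [(i, code) for code, ix in car_makers for i in ix]),
        # Lines 65-86: every token (no fullPath).
        (lambda code: "Other_token",
         [(i, None) for i in range(n_tokens)]),
    ]
    labels = {}
    for make_label, candidates in rules:
        for index, code in candidates:
            # Implicit TA exclusion: a TA matched by an earlier TAM is skipped.
            if index not in labels:
                labels[index] = make_label(code)
    return [labels[i] for i in range(n_tokens)]
```

<p class="Normal">In this sketch, swapping the first two rule sets would label every token of a named entity with <code class="code">CMInside_</code>, which illustrates why the order of the targetAnnotations elements matters.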
</p><p class="Normal">Line 65-86 - rules for extracting features for all tokens
other than tokens of named entities. These rules are the same as the rules
defined for the previous categories, with the following exceptions:
</p><p class="Normal">Line 65 - defines that TAs matched by the enclosed
rules will be assigned the string label <code class="code">Other_token</code>
</p><p class="Normal">Line 66 - defines only the type of the TAs to be
processed by the corresponding TAM, without a fullPath attribute. This
notation can be read as <code class="code">all tokens</code>, but because of
<a href="#_Implicit_TA_exclusion" title="3.1.7.&nbsp; Implicit TA exclusion">
<span class="Hyperlink1">implicit TA exclusion</span>
</a>, the TAs that were matched as tokens of named entities by the rules
defined by the previous TAMs are not included in the set of TAs that are
evaluated by the rules of this TAM. So the actual meaning is
<code class="code">all tokens other than tokens of named entities</code>.
</p><div class="orderedlist"><ol type="1" compact><li>&lt;?xml version="1.0" encoding="UTF-8"?&gt;</li><li>&lt;tns:CFEConfig nullValueImage="null"
xmlns:tns="http://www.apache.org/uima/cfe/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.apache.org/uima/cfe/config CFEConfig.xsd "&gt;
</li><li> &lt;tns:targetAnnotations className="CMBegin_{__p0:Code}"
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"&gt;
</li><li> &lt;tns:targetAnnotationMatcher
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"
fullPath="org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray[0]"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers featurePath="__p0:SemanticClass"
featureTypeName="java.lang.String"&gt;</li><li> &lt;tns:enumFeatureValues&gt;</li><li> &lt;tns:values&gt;Car Maker&lt;/tns:values&gt;</li><li> &lt;/tns:enumFeatureValues&gt;</li><li> &lt;/tns:featureMatchers&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:targetAnnotationMatcher&gt;</li><li> &lt;tns:featureAnnotationMatchers annotationTypeName=
"org.apache.uima.tools.cfe.sample.Token" windowsizeInside="1"
orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers featurePath="coveredText"
featureTypeName="java.lang.String"/&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:featureAnnotationMatchers&gt;</li><li> &lt;tns:featureAnnotationMatchers annotationTypeName=
"org.apache.uima.tools.cfe.sample.Token" windowsizeLeft="5"
windowsizeRight="5" orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers
featurePath="coveredText" featureTypeName="java.lang.String"/&gt;
</li><li> &lt;tns:featureMatchers featurePath="pennTag"
featureTypeName="java.lang.String" exclude="true" quiet="true"&gt;
</li><li> &lt;tns:enumFeatureValues caseSensitive="true"&gt;</li><li> &lt;tns:values&gt;IN&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CC&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;DT&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CD&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;null&lt;/tns:values&gt;</li><li> &lt;/tns:enumFeatureValues&gt;</li><li> &lt;/tns:featureMatchers&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt; tns:featureAnnotationMatchers&gt;</li><li> &lt;/tns:targetAnnotations&gt;</li><li></li><li> &lt;tns:targetAnnotations className="CMInside_{__p0:Code}"
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"&gt;
</li><li> &lt;tns:targetAnnotationMatcher
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"
fullPath="org.apache.uima.tools.cfe.sample.NamedEntity:Tokens:toArray"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers featurePath="__p0:SemanticClass"
featureTypeName="java.lang.String"&gt;
</li><li> &lt;tns:enumFeatureValues&gt;</li><li> &lt;tns:values&gt;Car Maker&lt;/tns:values&gt;</li><li> &lt;/tns:enumFeatureValues&gt;</li><li> &lt;/tns:featureMatchers&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:targetAnnotationMatcher&gt;</li><li> &lt;tns:featureAnnotationMatchers
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"
windowsizeInside="1" orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers
featurePath="coveredText" featureTypeName="java.lang.String"/&gt;
</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:featureAnnotationMatchers&gt;</li><li> &lt;tns:featureAnnotationMatchers
annotationTypeName="org.apache.uima.tools.cfe.sample.Token" windowsizeLeft="5"
windowsizeRight="5" orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers
featurePath="coveredText" featureTypeName="java.lang.String"/&gt;
</li><li> &lt;tns:featureMatchers
featurePath="pennTag" featureTypeName="java.lang.String" exclude="true" quiet="true"&gt;
</li><li> &lt;tns:enumFeatureValues caseSensitive="true"&gt;</li><li> &lt;tns:values&gt;IN&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CC&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;DT&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CD&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;null&lt;/tns:values&gt;</li><li> &lt;/tns:enumFeatureValues&gt;</li><li> &lt;/tns:featureMatchers&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:featureAnnotationMatchers&gt;</li><li> &lt;/tns:targetAnnotations&gt;</li><li></li><li> &lt;tns:targetAnnotations className="Other_token"
enclosingAnnotation="org.apache.uima.tools.cfe.sample.Sentence"&gt;
</li><li> &lt;tns:targetAnnotationMatcher
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"/&gt;
</li><li> &lt;tns:featureAnnotationMatchers
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"
windowsizeInside="1" orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers featurePath="coveredText"
featureTypeName="java.lang.String"/&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:featureAnnotationMatchers&gt;</li><li> &lt;tns:featureAnnotationMatchers
annotationTypeName="org.apache.uima.tools.cfe.sample.Token"
windowsizeLeft="5" windowsizeRight="5" orientation="true" distance="true"&gt;
</li><li> &lt;tns:groupFeatureMatchers&gt;</li><li> &lt;tns:featureMatchers featurePath="coveredText"
featureTypeName="java.lang.String"/&gt;
</li><li> &lt;tns:featureMatchers featurePath="pennTag"
featureTypeName="java.lang.String" exclude="true" quiet="true"&gt;
</li><li> &lt;tns:enumFeatureValues caseSensitive="true"&gt;</li><li> &lt;tns:values&gt;IN&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CC&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;DT&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;CD&lt;/tns:values&gt;</li><li> &lt;tns:values&gt;null&lt;/tns:values&gt;</li><li> &lt;/tns:enumFeatureValues&gt;</li><li> &lt;/tns:featureMatchers&gt;</li><li> &lt;/tns:groupFeatureMatchers&gt;</li><li> &lt;/tns:featureAnnotationMatchers&gt;</li><li> &lt;/tns:targetAnnotations&gt;</li><li>&lt;/tns:CFEConfig&gt;</li></ol></div></div></div></div><div class="chapter" lang="en" id="_Using_CFE_for_evaluation"><div class="titlepage"><div><div><h2 class="title"><a name="_Using_CFE_for_evaluation"></a>Chapter&nbsp;4.&nbsp;
Using CFE for evaluation
</h2></div></div></div><p class="Normal">
Comparing the results produced by a pipeline of UIMA annotators to a
<code class="code">gold standard</code>, or comparing the results of two different NLP systems, is a frequent
task. With CFE, this task can be automated.
</p><p class="Normal">
The paper "CFE - a system for testing, evaluation and machine learning of
UIMA based applications" by Sominsky, Coden and Tanenblatt describes the
evaluation process in detail.
</p></div></div></body></html>