<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/tools/ruta/workbench/" >
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<section id="section.ugr.tools.ruta.workbench.testing">
<title>Testing</title>
<para> The UIMA Ruta Workbench comes bundled with its own testing environment that allows you to
test and evaluate UIMA Ruta scripts. It provides full back-end testing capabilities and allows
you to examine test results in detail.
</para>
<para> To test the quality of a written UIMA Ruta script, the testing procedure compares a
previously annotated gold standard file with the resulting xmiCAS file created by the selected
UIMA Ruta script. As a product of the testing operation a new xmiCAS file will be created,
containing detailed information about the test results. The evaluators compare the offsets of
annotations and, depending on the selected evaluator, add true positive, false positive or false
negative annotations for each tested annotation to the resulting xmiCAS file. Afterwards
precision, recall and f1-score are calculated for each test file and each type in the test file.
The f1-score is also calculated for the whole test set. The testing environment consists of four
views: Annotation Test, True Positive, False Positive and False Negative. The Annotation Test
view is by default associated with the UIMA Ruta perspective.
</para>
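  <para>
    The following minimal Java sketch (not part of the Workbench itself) illustrates, with
    hypothetical counts, how precision, recall and f1-score follow from the true positive,
    false positive and false negative counts described above.
  </para>
  <programlisting><![CDATA[
public class EvaluationScores {
  public static void main(String[] args) {
    // hypothetical counts for a single type in one test file
    int tp = 8, fp = 2, fn = 4;
    double precision = (double) tp / (tp + fp);                 // 0.8
    double recall = (double) tp / (tp + fn);                    // ~0.667
    double f1 = 2 * precision * recall / (precision + recall);  // ~0.727
    System.out.printf("P=%.3f R=%.3f F1=%.3f%n", precision, recall, f1);
  }
}
]]></programlisting>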
<note><para>
There are two options for choosing the types that should be evaluated, which is specified by the preference <quote>Use all types</quote>.
If this preference is activated (the default), then the user has to select the types using the toolbar of the view,
which provides buttons for selecting the included and excluded types. If this preference is deactivated,
then only the types present in the current test document are evaluated. This can result in missing false positives if
an annotation of a specific type was created by the rules, but no annotation of this type is present in the test document.
</para></note>
<para>
<xref linkend='figure.ugr.tools.ruta.workbench.testing.script_explorer' />
shows the Script Explorer. Every UIMA Ruta project contains a folder called
<quote>test</quote>. This folder is the default location for the test files. Within it, each script file has its
own subfolder with a relative path equal to the script's package path in the
<quote>script</quote> folder. This subfolder contains the test files. In every script's test folder, you will also find a
result folder where the results of the tests are saved. If you want to use test files from
another location in the file system, the results will be saved in the
<quote>temp</quote> subfolder of the project's test folder. All files in the temp folder will be deleted once
Eclipse is closed.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.script_explorer">
<title>Test folder structure. </title>
<mediaobject>
<imageobject role="html">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/script_explorer.png" />
</imageobject>
<imageobject role="fo">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/script_explorer.png" />
</imageobject>
<textobject>
<phrase> The test folder structure. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
<section id="section.ugr.tools.ruta.workbench.testing.usage">
<title>Usage</title>
<para> This section describes the general procedure for using the testing environment. </para>
<para>
Currently, the testing environment has no perspective of its own. It is recommended
to start within the UIMA Ruta perspective, where the Annotation Test view is open by
default. The True Positive, False Positive and False Negative views have to be opened
manually:
<quote>Window -> Show View -> True Positive/False Positive/False Negative </quote>.
</para>
<para> To explain the usage of the UIMA Ruta testing environment, the UIMA Ruta example project
is used again. Open this project.
First, one has to select a script for testing: UIMA Ruta always tests the script that
is currently open and active in the script editor. So, open the
<quote>Main.ruta</quote>
script file of the UIMA Ruta example project.
The next <link linkend='figure.ugr.tools.ruta.workbench.testing.annotation_test_initial_view'>figure</link>
shows the Annotation Test view after doing this.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.annotation_test_initial_view">
<title>
The Annotation Test view. Buttons from left to right:
Start Test; Select excluded type; Select included type; Select evaluator/preferences; Export to CSV; Extend Classpath
</title>
<mediaobject>
<imageobject role="html">
<imagedata width="576px" format="PNG" align="center"
fileref="&imgroot;testing/annotation_test_initial_view_2_2_0.png" />
</imageobject>
<imageobject role="fo">
<imagedata width="5.5in" format="PNG" align="center"
fileref="&imgroot;testing/annotation_test_initial_view_2_2_0.png" />
</imageobject>
<textobject>
<phrase> The Annotation Test view. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
<para> All control elements needed for interacting with the testing environment
are located here. At the top right, there is the button bar. At the top left
of the view, the name of the script that is going to be tested is shown. It is
always equal to the script active in the editor. Below this, the test list is
located. This list contains the different files for testing. Right next to the name of the script
file, you can select the desired view. To the right of this, you get statistics
over all tests that have been run: the number of all true positives (TP), false positives (FP) and false
negatives (FN). In the field below, you will find a table with statistical
information for a single selected test file. To change this view, select a file in the test
list field. The table shows the total TP, FP and FN counts, as well as precision, recall
and f1-score, both for every type and for the whole file.
</para>
<para>
There is also an experimental feature to extend the classpath during testing, which allows
evaluating scripts that call analysis engines in the same workspace.
To use it, toggle the corresponding button in the toolbar of the view.
</para>
<para>
Next, you have to add test files to your project. A test file is a previously annotated xmiCAS
file that can be used as a gold standard for the test. You can use any xmiCAS file. The
UIMA Ruta example project already contains such test files. These files are listed
in the Annotation Test view. Try to delete these files by selecting them and pressing
<literal>Del</literal>. Add these files again by simply dragging them from the Script Explorer into the test file
list. A different way to add test files is to use the
<quote>Load all test files from selected folder</quote>
button (green plus). It can be used to add all xmiCAS files from a selected folder.
</para>
<para>
Sometimes it is necessary to create some annotations manually. To do so,
use the
<quote>Cas Editor</quote>
perspective delivered with the UIMA workbench.
</para>
<para>
The testing environment supports different evaluators that allow a
sophisticated analysis of the behavior of a UIMA Ruta script. The evaluator can be chosen in
the testing environment's preference page. The preference page can be opened either through
the menu or by clicking on the
<quote>Select evaluator</quote>
button (blue gear wheels) in the testing view's toolbar. Clicking the button will open a
filtered version of the UIMA Ruta preference page. The default evaluator is the "Exact CAS
Evaluator", which compares the offsets of the annotations between the test file and the file
annotated by the tested script. To get an overview of all available evaluators, see
<xref linkend='section.ugr.tools.ruta.workbench.testing.evaluators' />.
</para>
<para>
This preference page (see <xref linkend='figure.ugr.tools.ruta.workbench.testing.preference' />)
offers a few options that modify the plug-in's general behavior. For example, the
preloading of previously collected result data can be turned off. An important option on the
preference page is the selected evaluator. By default, the "Exact CAS Evaluator" is selected,
which compares the offsets of the annotations contained in the file produced by the
selected script with the annotations in the test file. Other evaluators compare
annotations in a different way.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.preference">
<title>The testing preference page view </title>
<mediaobject>
<imageobject role="html">
<imagedata width="476px" format="PNG" align="center" fileref="&imgroot;testing/preference_2_2_0.png" />
</imageobject>
<imageobject role="fo">
<imagedata width="4in" format="PNG" align="center" fileref="&imgroot;testing/preference_2_2_0.png" />
</imageobject>
<textobject>
<phrase> The testing preference page view. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
<para>
During a test-run it might be convenient to disable testing for specific
types like punctuation or tags. The
<quote>Select excluded types</quote>
button (white exclamation mark in a red disk) will open a dialog (see <xref linkend='figure.ugr.tools.ruta.workbench.testing.excluded_types' />)
in which all types that should not be considered in the test can be selected.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.excluded_types">
<title>Excluded types window </title>
<mediaobject>
<imageobject role="html">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/excluded_types.png" />
</imageobject>
<imageobject role="fo">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/excluded_types.png" />
</imageobject>
<textobject>
<phrase> Excluded types window. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
<para>
A test-run can be started by clicking on the start button. Do this for the
UIMA Ruta example project.
<xref linkend='figure.ugr.tools.ruta.workbench.testing.annotation_test_test_run' />
shows the results.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.annotation_test_test_run">
<title>The Annotation Test view. </title>
<mediaobject>
<imageobject role="html">
<imagedata width="576px" format="PNG" align="center"
fileref="&imgroot;testing/annotation_test_test_run_2_2_0.png" />
</imageobject>
<imageobject role="fo">
<imagedata width="5.5in" format="PNG" align="center"
fileref="&imgroot;testing/annotation_test_test_run_2_2_0.png" />
</imageobject>
<textobject>
<phrase> The Annotation Test view. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
<para>The testing main view displays some information on how well the script
did after every test run. It displays the overall numbers of true positive, false positive
and false negative annotations of all result files, as well as an overall f1-score.
Furthermore, a table is displayed that contains the overall statistics of the selected
test file as well as statistics for every single type in the test file. The displayed
information comprises true positives, false positives, false negatives, precision, recall and
f1-score. </para>
<para>
The testing environment also supports the export of the overall data in the form of a
comma-separated table. Clicking the
<quote>export data</quote>
button will open a dialog window that contains this table. The text in this table can be
copied and easily imported into other applications.
</para>
<para>
When running a test, the evaluator will create a new result xmiCAS file and will
add new true positive, false positive and false negative annotations. By clicking on a file in
the test-file list, you can open the corresponding result xmiCAS file in the CAS
Editor. While displaying the result xmiCAS file in the CAS Editor, the True Positive, False
Positive and False Negative views allow easy navigation through the new tp, fp and fn
annotations. The corresponding annotations are displayed in a hierarchical tree structure. This
allows easy tracing of the results within the test document. Clicking on one of the
annotations in those views will highlight the annotation in the CAS Editor. Opening
<quote>test1.result.xmi</quote>
in the UIMA Ruta example project changes the True Positive view as shown in
<xref linkend='figure.ugr.tools.ruta.workbench.testing.true_positive' />.
Note that the type system, which is used by the CAS Editor to open the evaluated file,
can only be resolved for the tested script if the test files are located in the associated
folder structure, that is, the folder with the name of the script. If the files are located
in the temp folder, for example by adding the files to the list of test cases by drag and drop,
other strategies to find the correct type system will be applied. For UIMA Ruta projects,
for example, this will be the type system of the last launched script in this project.
</para>
<para>
<figure id="figure.ugr.tools.ruta.workbench.testing.true_positive">
<title>The True Positive view. </title>
<mediaobject>
<imageobject role="html">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/true_positive.png" />
</imageobject>
<imageobject role="fo">
<imagedata format="PNG" align="center"
fileref="&imgroot;testing/true_positive.png" />
</imageobject>
<textobject>
<phrase> The True Positive view. </phrase>
</textobject>
</mediaobject>
</figure>
</para>
</section>
<section id="section.ugr.tools.ruta.workbench.testing.evaluators">
<title>Evaluators</title>
<para> When testing a CAS file, the system compares the offsets of the annotations of a
previously annotated gold standard file with the offsets of the annotations of the result file
the script produced. Evaluators are responsible for comparing the annotations in the two CAS
files. These evaluators implement different methods and strategies for comparing the
annotations. Also, an extension point is provided that allows easy implementation
of new evaluators. </para>
<para> Exact Match Evaluator: The Exact Match Evaluator compares the offsets of the annotations
in the result and the gold standard file. Any difference will be marked with false
positive or false negative annotations. </para>
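<para>
  The following Java sketch is not the actual evaluator implementation; it merely illustrates,
  using the standard UIMA CAS API, what an exact offset comparison for a single type could look
  like. The class and method names are hypothetical.
</para>
<programlisting><![CDATA[
import java.util.HashSet;
import java.util.Set;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;

public class ExactOffsetComparison {

  /** Counts gold annotations of the given type whose offsets match exactly in the result CAS. */
  public static int countTruePositives(CAS gold, CAS result, Type type) {
    // collect the begin/end pairs of the annotations created by the script
    Set<String> resultOffsets = new HashSet<String>();
    FSIterator<AnnotationFS> resultIt = result.getAnnotationIndex(type).iterator();
    while (resultIt.hasNext()) {
      AnnotationFS a = resultIt.next();
      resultOffsets.add(a.getBegin() + "-" + a.getEnd());
    }
    int tp = 0;
    FSIterator<AnnotationFS> goldIt = gold.getAnnotationIndex(type).iterator();
    while (goldIt.hasNext()) {
      AnnotationFS g = goldIt.next();
      // identical offsets count as a true positive; a gold annotation without an exact
      // counterpart would become a false negative, and an unmatched result annotation
      // a false positive
      if (resultOffsets.contains(g.getBegin() + "-" + g.getEnd())) {
        tp++;
      }
    }
    return tp;
  }
}
]]></programlisting>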
<para> Partial Match Evaluator: The Partial Match Evaluator compares the offsets of the
annotations in the result and gold standard file. It allows differences at the beginning
or the end of an annotation. For example, "corresponding" and "corresponding " will not be
marked as an error. </para>
<para> Core Match Evaluator: The Core Match Evaluator accepts annotations that share a core
expression. In this context, a core expression is at least four characters long and starts with a
capital letter. For example, the two annotations "L404-123-421" and "L404-321-412" would be
considered a true positive match, because "L404" is a core expression that is
contained in both annotations. </para>
<para> Word Accuracy Evaluator: The Word Accuracy Evaluator compares the labels of all words/numbers in an annotation,
where the label equals the type of the annotation. This has the consequence, for example,
that each word or number that is not part of the annotation is counted as a single false
negative. Consider, for example, the sentence "Christmas is on the 24.12 every year." The script
labels "Christmas is on the 12" as a single sentence, while the test file labels the sentence
correctly with a single sentence annotation. While the Exact CAS Evaluator, for example,
assigns only a single false negative annotation, the Word Accuracy Evaluator will mark every word
or number as a single false negative. </para>
<para> Template Only Evaluator: This evaluator compares the offsets of the annotations and the
features that have been created by the script. For example, the text "Alan Mathison Turing" is
marked with an author annotation, and "author" contains two features: "FirstName" and
"LastName". If the script now creates an author annotation with only one feature, the
annotation will be marked as a false positive. </para>
<para> Template on Word Level Evaluator: The Template On Word Level Evaluator compares the offsets of
the annotations. In addition, it also compares the features and feature structures and the
values stored in the features. For example, the annotation "author" might have features like
"FirstName" and "LastName". The author's name is "Alan Mathison Turing" and the script correctly
assigns the author annotation. The features assigned by the script are "FirstName: Alan" and
"LastName: Mathison", while the correct feature values are "FirstName: Alan" and "LastName:
Turing". In this case, the Template on Word Level Evaluator will mark the annotation as a false positive,
since the feature values differ. </para>
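<para>
  As a rough illustration of the kind of feature value comparison described above, the following
  sketch (a hypothetical helper, not the evaluator's actual code) compares a single string
  feature of two annotations using the standard UIMA API.
</para>
<programlisting><![CDATA[
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.text.AnnotationFS;

public class FeatureValueComparison {

  /** Returns true if both annotations carry the same value for the named string feature. */
  public static boolean sameFeatureValue(AnnotationFS gold, AnnotationFS result, String featureName) {
    // look up the feature by its short name, e.g. "LastName"
    Feature goldFeature = gold.getType().getFeatureByBaseName(featureName);
    Feature resultFeature = result.getType().getFeatureByBaseName(featureName);
    if (goldFeature == null || resultFeature == null) {
      return false; // feature not defined for this type
    }
    String goldValue = gold.getStringValue(goldFeature);
    String resultValue = result.getStringValue(resultFeature);
    // differing values, such as "Turing" vs. "Mathison" for "LastName",
    // would lead to a false positive for the tested annotation
    return goldValue == null ? resultValue == null : goldValue.equals(resultValue);
  }
}
]]></programlisting>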
</section>
</section>