<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY imgroot "images/tools/ruta/workbench/" > | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" > | |
%uimaents; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<section id="section.ugr.tools.ruta.workbench.testing"> | |
<title>Testing</title> | |
<para> The UIMA Ruta Workbench comes bundled with its own testing environment that allows you to | |
test and evaluate UIMA Ruta scripts. It provides full back-end testing capabilities and allows | |
you to examine test results in detail. | |
</para> | |
<para> To test the quality of a written UIMA Ruta script, the testing procedure compares a | |
previously annotated gold standard file with the resulting xmiCAS file created by the selected | |
UIMA Ruta script. As a product of the testing operation, a new xmiCAS file is created,
    containing detailed information about the test results. The evaluators compare the offsets of
    annotations and, depending on the selected evaluator, add true positive, false positive or false
    negative annotations for each tested annotation to the resulting xmiCAS file. Afterwards,
    precision, recall and f1-score are calculated for each test file and each type in the test file.
The f1-score is also calculated for the whole test set. The testing environment consists of four | |
views: Annotation Test, True Positive, False Positive and False Negative. The Annotation Test | |
view is by default associated with the UIMA Ruta perspective. | |
</para> | |
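  <para>
    The reported measures are derived from the TP, FP and FN counts in the usual way. The following
    minimal Java sketch (not taken from the Workbench sources; the class and method names are purely
    illustrative) shows how the values relate to each other:
  </para>
  <programlisting>// Illustrative only: not part of the UIMA Ruta Workbench.
public class EvaluationMeasures {

  // precision: fraction of the annotations created by the script that are correct
  static double precision(int tp, int fp) {
    return tp + fp == 0 ? 0.0 : tp / (double) (tp + fp);
  }

  // recall: fraction of the gold standard annotations that were found by the script
  static double recall(int tp, int fn) {
    return tp + fn == 0 ? 0.0 : tp / (double) (tp + fn);
  }

  // f1-score: harmonic mean of precision and recall
  static double f1(int tp, int fp, int fn) {
    double p = precision(tp, fp);
    double r = recall(tp, fn);
    return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
  }
}</programlisting>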
<note><para> | |
  There are two options for choosing the types that should be evaluated, specified by the preference <quote>Use all types</quote>.
  If this preference is activated (the default), then the user has to select the types using the toolbar of the view.
  There are buttons for selecting the included and excluded types. If this preference is deactivated,
  then only the types present in the current test document are evaluated. This can result in missing false positives if
  an annotation of a specific type was created by the rules but no annotation of this type is present in the test document.
</para></note> | |
<para> | |
<xref linkend='figure.ugr.tools.ruta.workbench.testing.script_explorer' /> | |
    shows the Script Explorer. Every UIMA Ruta project contains a folder called
    <quote>test</quote>. This folder is the default location for the test files. Within this folder, each script file has its
    own subfolder whose relative path equals the script's package path in the
    <quote>script</quote> folder. This subfolder contains the test files. In every script's test folder, you will also find a
    result folder where the results of the tests are saved. If you want to use test files from
    another location in the file system, the results will be saved in the
    <quote>temp</quote> subfolder of the project's test folder. All files in the temp folder will be deleted once
    Eclipse is closed.
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.script_explorer"> | |
<title>Test folder structure. </title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/script_explorer.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/script_explorer.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> The test folder structure. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
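  <para>
    As a purely illustrative sketch of the layout described above (the project, package and file
    names are hypothetical and the folder names follow the description rather than a concrete
    installation), the structure might look like this:
  </para>
  <programlisting>MyProject/
  script/
    my/package/Main.ruta        the script under test
  test/
    my/package/Main/            test files for Main.ruta
      test1.xmi
      results/                  result files of previous test runs
    temp/                       test files loaded from outside the project</programlisting>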
<section id="section.ugr.tools.ruta.workbench.testing.usage"> | |
<title>Usage</title> | |
<para> This section describes the general proceeding when using the testing environment. </para> | |
<para> | |
      Currently, the testing environment has no perspective of its own. It is recommended
      to start within the UIMA Ruta perspective, where the Annotation Test view is open by
      default. The True Positive, False Positive and False Negative views have to be opened
      manually:
      <quote>Window -> Show View -> True Positive/False Positive/False Negative </quote>.
</para> | |
<para> To explain the usage of the UIMA Ruta testing environment, the UIMA Ruta example project | |
is used again. Open this project. | |
      First, one has to select a script for testing: UIMA Ruta will always test the script that
      is currently open and active in the script editor. So, open the
      <quote>Main.ruta</quote>
      script file of the UIMA Ruta example project.
      The next <link linkend='figure.ugr.tools.ruta.workbench.testing.annotation_test_initial_view'>figure</link>
      shows the Annotation Test view after doing this.
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.annotation_test_initial_view"> | |
<title> | |
        The Annotation Test view. Buttons from left to right:
Start Test; Select excluded type; Select included type; Select evaluator/preferences; Export to CSV; Extend Classpath | |
</title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata width="576px" format="PNG" align="center" | |
fileref="&imgroot;testing/annotation_test_initial_view_2_2_0.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata width="5.5in" format="PNG" align="center" | |
fileref="&imgroot;testing/annotation_test_initial_view_2_2_0.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> The Annotation Test view. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
    <para> All control elements that are needed for the interaction with the testing environment
      are located here. At the top right, there is the button bar. At the top left
      of the view, the name of the script that is going to be tested is shown. It is
      always equal to the script active in the editor. Below this, the test list is
      located. This list contains the different files for testing. Right next to the name of the script
      file, you can select the desired view. To the right of this, you get statistics
      over all executed tests: the number of all true positives (TP), false positives (FP) and false
      negatives (FN). In the field below, you will find a table with statistical
      information for a single selected test file. To change this view, select a file in the test
      list field. The table shows total TP, FP and FN counts, as well as precision, recall
      and f1-score, for every type as well as for the whole file.
</para> | |
<para> | |
      There is also an experimental feature to extend the classpath during testing, which makes it
      possible to evaluate scripts that call analysis engines located in the same workspace.
      To activate it, toggle the corresponding button in the toolbar of the view.
</para> | |
<para> | |
      Next, you have to add test files to your project. A test file is a previously annotated xmiCAS
      file that can be used as a gold standard for the test. You can use any xmiCAS file. The
      UIMA Ruta example project already contains such test files. These files are listed
      in the Annotation Test view. Try to delete these files by selecting them and pressing
      <literal>Del</literal>. Add these files again by simply dragging them from the Script Explorer into the test file
      list. A different way to add test files is to use the
      <quote>Load all test files from selected folder</quote>
      button (green plus). It can be used to add all xmiCAS files from a selected folder.
</para> | |
<para> | |
      Sometimes it is necessary to create gold standard annotations manually. To do so,
      use the
      <quote>Cas Editor</quote>
      perspective delivered with the UIMA Workbench.
</para> | |
<para> | |
The testing environment supports different evaluators that allow a | |
sophisticated analysis of the behavior of a UIMA Ruta script. The evaluator can be chosen in | |
the testing environment's preference page. The preference page can be opened either through | |
the menu or by clicking on the | |
<quote>Select evaluator</quote> | |
button (blue gear wheels) in the testing view's toolbar. Clicking the button will open a | |
      filtered version of the UIMA Ruta preference page. The default evaluator is the <quote>Exact CAS
      Evaluator</quote>, which compares the offsets of the annotations between the test file and the file
      annotated by the tested script. To get an overview of all available evaluators, see
      <xref linkend='section.ugr.tools.ruta.workbench.testing.evaluators' />.
</para> | |
<para> | |
      This preference page (see <xref linkend='figure.ugr.tools.ruta.workbench.testing.preference' />)
      offers a few options that modify the plug-in's general behavior. For example, the
      preloading of previously collected result data can be turned off. An important option on the
      preference page is the selected evaluator. By default, the <quote>Exact CAS Evaluator</quote> is selected,
      which compares the offsets of the annotations contained in the file produced by the
      selected script with the annotations in the test file. Other evaluators compare
      annotations in a different way.
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.preference"> | |
<title>The testing preference page view </title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata width="476px" format="PNG" align="center" fileref="&imgroot;testing/preference_2_2_0.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata width="4in" format="PNG" align="center" fileref="&imgroot;testing/preference_2_2_0.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> The testing preference page view. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
<para> | |
      During a test run, it might be convenient to disable testing for specific
      types like punctuation or tags. The
      <quote>Select excluded types</quote>
      button (white exclamation mark in a red disk) will open a dialog (see <xref linkend='figure.ugr.tools.ruta.workbench.testing.excluded_types' />)
      in which all types that should not be considered in the test can be selected.
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.excluded_types"> | |
<title>Excluded types window </title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/excluded_types.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/excluded_types.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> Excluded types window. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
<para> | |
      A test run can be started by clicking on the start button. Do this for the
UIMA Ruta example project. | |
<xref linkend='figure.ugr.tools.ruta.workbench.testing.annotation_test_test_run' /> | |
shows the results. | |
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.annotation_test_test_run"> | |
<title>The Annotation Test view. </title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata width="576px" format="PNG" align="center" | |
fileref="&imgroot;testing/annotation_test_test_run_2_2_0.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata width="5.5in" format="PNG" align="center" | |
fileref="&imgroot;testing/annotation_test_test_run_2_2_0.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> The Annotation Test view. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
    <para>After every test run, the main testing view displays information on how well the script
      performed. It shows the overall numbers of true positive, false positive
      and false negative annotations of all result files as well as an overall f1-score.
      Furthermore, a table is displayed that contains the overall statistics of the selected
      test file as well as statistics for every single type in the test file. The displayed
      information comprises true positives, false positives, false negatives, precision, recall and
      f1-score. </para>
<para> | |
      The testing environment also supports the export of the overall data in the form of a
      comma-separated table. Clicking the
      <quote>export data</quote>
      button will open a dialog window that contains this table. The text in this table can be
      copied and easily imported into other applications.
</para> | |
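  <para>
    Purely as an illustration of such a comma-separated table (the column layout and all values
    shown here are hypothetical and do not reproduce the exact export format of the Workbench),
    the exported text could look similar to this:
  </para>
  <programlisting>Type, TP, FP, FN, Precision, Recall, F1
uima.ruta.example.Author, 8, 1, 2, 0.89, 0.80, 0.84
uima.ruta.example.Year, 10, 0, 1, 1.00, 0.91, 0.95</programlisting>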
<para> | |
      When running a test, the evaluator will create a new result xmiCAS file and will
      add new true positive, false positive and false negative annotations. By clicking on a file in
      the test file list, you can open the corresponding result xmiCAS file in the CAS
      Editor. While the result xmiCAS file is displayed in the CAS Editor, the True Positive, False
      Positive and False Negative views allow easy navigation through the new TP, FP and FN
      annotations. The corresponding annotations are displayed in a hierarchical tree structure. This
      allows easy tracing of the results within the testing document. Clicking on one of the
      annotations in those views will highlight the annotation in the CAS Editor. Opening
      <quote>test1.result.xmi</quote>
      in the UIMA Ruta example project changes the True Positive view as shown in
      <xref linkend='figure.ugr.tools.ruta.workbench.testing.true_positive' />.
      Notice that the type system used by the CAS Editor to open the evaluated file
      can only be resolved for the tested script if the test files are located in the associated
      folder structure, that is, the folder with the name of the script. If the files are located
      in the temp folder, for example because they were added to the list of test cases by drag and drop,
      other strategies to find the correct type system will be applied. For UIMA Ruta projects,
      for example, this will be the type system of the last launched script in this project.
</para> | |
<para> | |
<figure id="figure.ugr.tools.ruta.workbench.testing.true_positive"> | |
<title>The True Positive view. </title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/true_positive.png" /> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata format="PNG" align="center" | |
fileref="&imgroot;testing/true_positive.png" /> | |
</imageobject> | |
<textobject> | |
<phrase> The True Positive view. </phrase> | |
</textobject> | |
</mediaobject> | |
</figure> | |
</para> | |
</section> | |
<section id="section.ugr.tools.ruta.workbench.testing.evaluators"> | |
<title>Evaluators</title> | |
    <para> When testing a CAS file, the system compares the offsets of the annotations of a
    previously annotated gold standard file with the offsets of the annotations of the result file
    the script produced. Evaluators are responsible for comparing the annotations in the two CAS
    files. They implement different methods and strategies for comparing the
    annotations. In addition, an extension point is provided that allows easy implementation
    of new evaluators. </para>
    <para> Exact Match Evaluator: The Exact Match Evaluator compares the offsets of the annotations
    in the result and the gold standard file. Any difference will be marked with either a false
    positive or a false negative annotation. </para>
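    <para>
      The following conceptual Java sketch (it is not the actual evaluator implementation or the
      extension point interface of the Workbench; all type and method names are made up) illustrates
      the basic idea of such an exact-offset comparison producing TP, FP and FN counts:
    </para>
    <programlisting><![CDATA[import java.util.Set;

// Conceptual sketch only: not part of the UIMA Ruta Workbench.
public class ExactOffsetComparison {

  // simplified stand-in for an annotation: type name plus begin/end offsets
  record Span(String type, int begin, int end) {}

  // returns {tp, fp, fn} for a system result compared against a gold standard
  static int[] evaluate(Set<Span> gold, Set<Span> system) {
    int tp = 0, fp = 0, fn = 0;
    for (Span s : system) {
      if (gold.contains(s)) tp++;    // exact match of type and offsets
      else fp++;                     // created by the script, but not in the gold standard
    }
    for (Span g : gold) {
      if (!system.contains(g)) fn++; // in the gold standard, but missed by the script
    }
    return new int[] { tp, fp, fn };
  }
}]]></programlisting>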
    <para> Partial Match Evaluator: The Partial Match Evaluator compares the offsets of the
    annotations in the result and the gold standard file. It allows differences at the beginning
    or the end of an annotation. For example, "corresponding" and "corresponding " will not be
    counted as an error. </para>
    <para> Core Match Evaluator: The Core Match Evaluator accepts annotations that share a core
    expression. In this context, a core expression is at least four characters long and starts with a
    capital letter. For example, the two annotations "L404-123-421" and "L404-321-412" would be
    considered a true positive match, because "L404" is a core expression that is
    contained in both annotations. </para>
    <para> Word Accuracy Evaluator: The Word Accuracy Evaluator compares the labels of all
    words/numbers in an annotation, where the label equals the type of the annotation. As a
    consequence, each word or number that is not part of the annotation is counted as a single false
    negative. Consider, for example, the sentence "Christmas is on the 24.12 every year." Assume the
    script labels only a part of this sentence as a single sentence annotation, while the test file
    correctly labels the whole sentence with a single sentence annotation. Whereas the Exact CAS
    Evaluator, for example, assigns only a single false negative annotation, the Word Accuracy
    Evaluator marks every missed word or number as a single false negative. </para>
    <para> Template Only Evaluator: This evaluator compares the offsets of the annotations and the
    features that have been created by the script. For example, the text "Alan Mathison Turing" is
    marked with the author annotation, and "author" contains two features: "FirstName" and
    "LastName". If the script now creates an author annotation with only one feature, the
    annotation will be marked as a false positive. </para>
    <para> Template on Word Level Evaluator: The Template on Word Level Evaluator compares the offsets of
    the annotations. In addition, it also compares the features and feature structures and the
    values stored in the features. For example, the annotation "author" might have features like
    "FirstName" and "LastName". The author's name is "Alan Mathison Turing" and the script correctly
    assigns the author annotation. The feature values assigned by the script are "FirstName: Alan" and
    "LastName: Mathison", while the correct feature values are "FirstName: Alan" and "LastName:
    Turing". In this case, the annotation will be marked as a false positive,
    since the feature values differ. </para>
</section> | |
</section> |