<div class="section" title="2.8.1. ADD"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.add">2.8.1. ADD</h3></div></div></div> | |
<p> | |
The ADD action adds all the elements of the passed
RutaExpressions to a given list. For example, these expressions
could be a string, an integer variable or a list. For a
complete overview of UIMA Ruta expressions, see
<a class="xref" href="#ugr.tools.ruta.language.expressions" title="2.6. Expressions">Section 2.6, “Expressions”</a>.
</p> | |
<div class="section" title="2.8.1.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1551">2.8.1.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ADD(ListVariable,(RutaExpression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.1.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1556">2.8.1.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->ADD(list, var)};</pre><p> | |
</p> | |
<p> | |
In this example, the variable 'var' is added to the list | |
'list'. | |
</p> | |
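<p>
As a minimal sketch of a fuller script, the list and the added variable would be declared before the rule, and several expressions can be passed at once. The names 'names' and 'first' and their values are illustrative assumptions, not part of the original example.
</p>
<pre class="programlisting">STRINGLIST names;
STRING first = "Peter";
// adds the value of 'first' and the literal "Mary" to the list 'names'
Document{->ADD(names, first, "Mary")};</pre>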
</div> | |
</div> | |
<div class="section" title="2.8.2. ADDFILTERTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.addfiltertype">2.8.2. ADDFILTERTYPE</h3></div></div></div> | |
<p> | |
The ADDFILTERTYPE action adds its arguments to the list of filtered types, | |
which restrict the visibility of the rules. | |
</p> | |
<div class="section" title="2.8.2.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1565">2.8.2.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ADDFILTERTYPE(TypeExpression(,TypeExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.2.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1570">2.8.2.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->ADDFILTERTYPE(CW)};</pre><p> | |
</p> | |
<p> | |
After applying this rule, capitalized words are invisible in addition to the previously filtered types.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.3. ADDRETAINTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.addretaintype">2.8.3. ADDRETAINTYPE</h3></div></div></div> | |
<p> | |
The ADDRETAINTYPE action adds its arguments to the list of retained types,
which extend the visibility of the rules.
</p> | |
<div class="section" title="2.8.3.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1579">2.8.3.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ADDRETAINTYPE(TypeExpression(,TypeExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.3.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1584">2.8.3.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->ADDRETAINTYPE(MARKUP)};</pre><p> | |
</p> | |
<p> | |
After applying this rule, markup is visible in addition to the previously retained types.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.4. ASSIGN"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.assign">2.8.4. ASSIGN</h3></div></div></div> | |
<p> | |
The ASSIGN action assigns the value of the passed expression to | |
a variable of the same type. | |
</p> | |
<div class="section" title="2.8.4.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1593">2.8.4.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ASSIGN(BooleanVariable,BooleanExpression)</pre><p> | |
</p> | |
<p> | |
</p><pre class="programlisting">ASSIGN(NumberVariable,NumberExpression)</pre><p> | |
</p> | |
<p> | |
</p><pre class="programlisting">ASSIGN(StringVariable,StringExpression)</pre><p> | |
</p> | |
<p> | |
</p><pre class="programlisting">ASSIGN(TypeVariable,TypeExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.4.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1604">2.8.4.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->ASSIGN(amount, (amount/2))};</pre><p> | |
</p> | |
<p> | |
In this example, the value of the variable 'amount' is divided in half. | |
</p> | |
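<p>
The following sketch shows the declarations such a rule relies on and exercises three of the ASSIGN variants listed in the definition. The variable names and initial values are assumptions chosen for illustration.
</p>
<pre class="programlisting">INT amount = 10;
BOOLEAN done;
STRING label;
// each ASSIGN pairs a variable with an expression of the same type
Document{->ASSIGN(amount, amount / 2), ASSIGN(done, true),
    ASSIGN(label, "halved")};</pre>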
</div> | |
</div> | |
<div class="section" title="2.8.5. CALL"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.call">2.8.5. CALL</h3></div></div></div> | |
<p> | |
The CALL action initiates the execution of a different script | |
file or script block. Currently, only complete script files are | |
supported. | |
</p> | |
<div class="section" title="2.8.5.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1613">2.8.5.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CALL(DifferentFile)</pre><p> | |
</p> | |
<p> | |
</p><pre class="programlisting">CALL(Block)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.5.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1620">2.8.5.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->CALL(NamedEntities)};</pre><p> | |
</p> | |
<p> | |
Here, a script 'NamedEntities' for named entity recognition is | |
executed. | |
</p> | |
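<p>
A called script usually has to be imported first. The sketch below assumes a hypothetical package name 'my.package' for the script 'NamedEntities'; the package is not given in the original example.
</p>
<pre class="programlisting">// import of the script that is called below (package name is hypothetical)
SCRIPT my.package.NamedEntities;
Document{->CALL(NamedEntities)};</pre>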
</div> | |
</div> | |
<div class="section" title="2.8.6. CLEAR"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.clear">2.8.6. CLEAR</h3></div></div></div> | |
<p> | |
The CLEAR action removes all elements of the given list. If the list was initialized when it was declared,
it is reset to its initial value.
</p> | |
<div class="section" title="2.8.6.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1629">2.8.6.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CLEAR(ListVariable)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.6.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1634">2.8.6.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->CLEAR(SomeList)};</pre><p> | |
</p> | |
<p> | |
This rule clears the list 'SomeList'. | |
</p> | |
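<p>
As a small sketch of the typical life cycle, a list that is declared without an initial value can be filled and then emptied again. The list name and its elements are illustrative assumptions.
</p>
<pre class="programlisting">STRINGLIST someList;
// fill the list with two string values
Document{->ADD(someList, "a", "b")};
// remove all elements again; the list was declared empty
Document{->CLEAR(someList)};</pre>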
</div> | |
</div> | |
<div class="section" title="2.8.7. COLOR"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.color">2.8.7. COLOR</h3></div></div></div> | |
<p> | |
The COLOR action sets the color of an annotation type in the
modified view, if the rule has fired. The background color is passed as
the second parameter. The font color can be changed by passing a
further color as a third parameter. The optional fourth parameter specifies
whether the annotations of the type are selected when opening the modified view.
The supported colors are: black, silver, gray,
white, maroon, red, purple, fuchsia, green, lime, olive, yellow,
navy, blue, aqua, lightblue, lightgreen, orange, pink, salmon, cyan,
violet, tan, brown and mediumpurple.
</p> | |
<div class="section" title="2.8.7.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1643">2.8.7.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">COLOR(TypeExpression,StringExpression(, StringExpression | |
(, BooleanExpression)?)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.7.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1648">2.8.7.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->COLOR(Headline, "red", "green", true)};</pre><p> | |
</p> | |
<p> | |
This rule colors all Headline annotations in the modified view. | |
Thereby, the background color is set to red, font color is set to green | |
and all 'Headline' annotations are selected when opening the | |
modified view. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.8. CONFIGURE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.configure">2.8.8. CONFIGURE</h3></div></div></div> | |
<p> | |
The CONFIGURE action can be used to configure the analysis | |
engine of the given namespace (first parameter). The parameters that | |
should be configured with corresponding values are passed as | |
name-value | |
pairs. | |
</p> | |
<div class="section" title="2.8.8.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1657">2.8.8.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CONFIGURE(AnalysisEngine(,StringExpression = Expression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.8.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1662">2.8.8.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ENGINE utils.HtmlAnnotator; | |
Document{->CONFIGURE(HtmlAnnotator, "onlyContent" = false)};</pre><p> | |
</p> | |
<p> | |
This rule changes the value of the configuration parameter <span class="quote">“<span class="quote">onlyContent</span>”</span>
to false and reconfigures the analysis engine.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.9. CREATE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.create">2.8.9. CREATE</h3></div></div></div> | |
<p> | |
The CREATE action is similar to the MARK action. It also | |
annotates the matched text fragments with a type annotation, but | |
additionally assigns values to a chosen subset of the type's feature | |
elements. | |
</p> | |
<div class="section" title="2.8.9.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1672">2.8.9.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CREATE(TypeExpression(,NumberExpression)* | |
(,StringExpression = Expression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.9.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1677">2.8.9.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{COUNT(ANY,0,10000,cnt)->CREATE(Headline,"size" = cnt)};</pre><p> | |
</p> | |
<p> | |
This rule counts the number of tokens of type ANY in a | |
Paragraph annotation and assigns the counted value to the int | |
variable 'cnt'. If the counted number is between 0 and 10000, a | |
Headline annotation is created for this Paragraph. Moreover, the | |
feature named 'size' of Headline is set to the value of 'cnt'. | |
</p> | |
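<p>
For the rule above to work, the type needs a matching feature and the counter variable has to exist. A minimal sketch of these declarations could look as follows, assuming that the type Paragraph is provided elsewhere (for example by an imported type system):
</p>
<pre class="programlisting">// Headline with an integer feature 'size'
DECLARE Annotation Headline(INT size);
// variable that receives the result of COUNT
INT cnt;
Paragraph{COUNT(ANY,0,10000,cnt)->CREATE(Headline,"size" = cnt)};</pre>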
</div> | |
</div> | |
<div class="section" title="2.8.10. DEL"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.del">2.8.10. DEL</h3></div></div></div> | |
<p> | |
The DEL action deletes the matched text fragments in the | |
modified | |
view. | |
</p> | |
<div class="section" title="2.8.10.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1686">2.8.10.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">DEL</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.10.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1691">2.8.10.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Name{->DEL};</pre><p> | |
</p> | |
<p> | |
This rule deletes all text fragments that are annotated with a | |
Name annotation. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.11. DYNAMICANCHORING"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.dynamicanchoring">2.8.11. DYNAMICANCHORING</h3></div></div></div> | |
<p> | |
The DYNAMICANCHORING action turns dynamic anchoring on or off | |
(first parameter) and assigns the anchoring parameters penalty | |
(second parameter) and factor (third parameter). | |
</p> | |
<div class="section" title="2.8.11.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1700">2.8.11.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">DYNAMICANCHORING(BooleanExpression | |
(,NumberExpression(,NumberExpression)?)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.11.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1705">2.8.11.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->DYNAMICANCHORING(true)};</pre><p> | |
</p> | |
<p> | |
This example activates dynamic anchoring.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.12. EXEC"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.exec">2.8.12. EXEC</h3></div></div></div> | |
<p> | |
The EXEC action initiates the execution of a different script | |
file or analysis engine on the complete input document, independent from | |
the matched text and the current filtering settings. If the imported component (DifferentFile) | |
refers to another script file, it is applied on a new representation of the document: | |
the complete text of the original CAS with the default filtering | |
settings of the UIMA Ruta analysis engine. If it refers to an | |
external analysis engine, then it is applied on the complete document. | |
The optional first argument is a string expression that specifies the view the component should be applied to.
The optional last argument is a list of types that should be reindexed by Ruta (not UIMA itself).
</p> | |
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> | |
<p> | |
By default, annotations created by the external analysis engine are not accessible to UIMA Ruta rules in the same script.
The types of these annotations need to be provided in the optional type list argument in order to be visible to the Ruta rules.
</p> | |
</div> | |
<div class="section" title="2.8.12.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1716">2.8.12.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">EXEC((StringExpression,)? DifferentFile(, TypeListExpression)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.12.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1721">2.8.12.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ENGINE NamedEntities; | |
Document{->EXEC(NamedEntities, {Person, Location})};</pre><p> | |
</p> | |
<p> | |
Here, an analysis engine for named entity recognition is | |
executed once on the complete document and the annotations of the types Person and Location (and all subtypes) | |
are reindexed in UIMA Ruta. Without this list of types, the annotations are added to the CAS, but cannot be accessed by Ruta rules. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.13. FILL"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.fill">2.8.13. FILL</h3></div></div></div> | |
<p> | |
The FILL action fills a chosen subset of the given type's | |
feature elements. | |
</p> | |
<div class="section" title="2.8.13.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1730">2.8.13.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">FILL(TypeExpression(,StringExpression = Expression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.13.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1735">2.8.13.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Headline{COUNT(ANY,0,10000,tokenCount) | |
->FILL(Headline,"size" = tokenCount)};</pre><p> | |
</p> | |
<p> | |
Here, the number of tokens within a Headline annotation is
counted and stored in the variable 'tokenCount'. If the number of tokens
is within the interval [0;10000], the FILL action fills the
Headline's feature 'size' with the value of 'tokenCount'.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.14. FILTERTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.filtertype">2.8.14. FILTERTYPE</h3></div></div></div> | |
<p> | |
This action filters the given types of annotations. They are now | |
ignored by rules. Expressions are not yet supported. | |
This action is related to RETAINTYPE (see <a class="xref" href="#ugr.tools.ruta.language.actions.retaintype" title="2.8.34. RETAINTYPE">Section 2.8.34, “RETAINTYPE”</a>). | |
</p> | |
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> | |
<p> | |
The visibility of types is calculated using three lists: | |
A list <span class="quote">“<span class="quote">default</span>”</span> for the initially filtered types, | |
which is specified in the configuration parameters of the analysis engine, the list <span class="quote">“<span class="quote">filtered</span>”</span>, which is | |
specified by the FILTERTYPE action, and the list <span class="quote">“<span class="quote">retained</span>”</span>, which is specified by the RETAINTYPE action. | |
For determining the actual visibility of types, list <span class="quote">“<span class="quote">filtered</span>”</span> is added to list <span class="quote">“<span class="quote">default</span>”</span> | |
and then all elements of list <span class="quote">“<span class="quote">retained</span>”</span> are removed. The annotations of the types in the resulting list are not visible. | |
Please note that the actions FILTERTYPE and RETAINTYPE replace all elements of the respective lists and that RETAINTYPE | |
overrides FILTERTYPE. | |
</p> | |
</div> | |
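<p>
The calculation described in the note can be traced with a short sequence of rules. This is only a sketch: it assumes that the <span class="quote">“<span class="quote">default</span>”</span> list of the analysis engine contains SPACE, BREAK and MARKUP, which depends on the actual configuration.
</p>
<pre class="programlisting">// assumed default = {SPACE, BREAK, MARKUP}
Document{->FILTERTYPE(SW)};
// filtered = {SW}; invisible: SPACE, BREAK, MARKUP, SW
Document{->RETAINTYPE(SPACE)};
// retained = {SPACE}; invisible: BREAK, MARKUP, SW
Document{->FILTERTYPE(CW)};
// filtered is replaced by {CW}; invisible: BREAK, MARKUP, CW</pre>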
<div class="section" title="2.8.14.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1753">2.8.14.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">FILTERTYPE((TypeExpression(,TypeExpression)*))?</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.14.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1758">2.8.14.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->FILTERTYPE(SW)};</pre><p> | |
</p> | |
<p> | |
This rule filters all lowercase words (SW) in the input
document. They are subsequently ignored by every rule.
</p> | |
<p> | |
</p><pre class="programlisting">Document{->FILTERTYPE};</pre><p> | |
</p> | |
<p> | |
Here, the action (without parentheses) specifies that no additional types should be filtered.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.15. GATHER"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.gather">2.8.15. GATHER</h3></div></div></div> | |
<p> | |
This action creates a complex structure: an annotation with | |
features. The optionally passed indexes (NumberExpressions after the | |
TypeExpression) can be used to create an annotation that spans the | |
matched information of several rule elements. The features are | |
collected using the indexes of the rule elements of the complete | |
rule. | |
</p> | |
<div class="section" title="2.8.15.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1770">2.8.15.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">GATHER(TypeExpression(,NumberExpression)* | |
(,StringExpression = NumberExpression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.15.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1775">2.8.15.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">DECLARE Annotation A; | |
DECLARE Annotation B; | |
DECLARE Annotation C(Annotation a, Annotation b); | |
W{REGEXP("A")->MARK(A)}; | |
W{REGEXP("B")->MARK(B)}; | |
A B{-> GATHER(C, 1, 2, "a" = 1, "b" = 2)};</pre><p> | |
</p> | |
<p> | |
Two annotations A and B are declared and annotated. The last | |
rule creates an annotation C spanning the elements A (index 1 since | |
it is the first rule element) and B (index 2) with its features 'a' | |
set to annotation A (again index 1) and 'b' set to annotation B | |
(again index 2). | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.16. GET"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.get">2.8.16. GET</h3></div></div></div> | |
<p> | |
The GET action retrieves an element of the given list dependent on a | |
given strategy. | |
</p><div class="table"><a name="d5e1784"></a><p class="title"><b>Table 2.3. Currently supported strategies</b></p><div class="table-contents"> | |
<table summary="Currently supported strategies" style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Strategy</th><th style="border-bottom: 0.5pt solid black; " align="left">Functionality</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; " align="left">dominant</td><td style="" align="left">finds the most frequently occurring element</td></tr></tbody></table>
</div></div><p><br class="table-break"> | |
</p> | |
<div class="section" title="2.8.16.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1795">2.8.16.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">GET(ListExpression, Variable, StringExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.16.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1800">2.8.16.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->GET(list, var, "dominant")};</pre><p> | |
</p> | |
<p> | |
In this example, the element of the list 'list' that occurs
most often is stored in the variable 'var'.
</p> | |
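<p>
A self-contained sketch with declarations, using illustrative values that are not part of the original example, could look like this. According to the description of the strategy, 'var' would then be expected to hold the value "a", since it occurs twice in the list.
</p>
<pre class="programlisting">STRINGLIST list = {"a", "b", "a"};
STRING var;
// stores the most frequently occurring element of 'list' in 'var'
Document{->GET(list, var, "dominant")};</pre>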
</div> | |
</div> | |
<div class="section" title="2.8.17. GETFEATURE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.getfeature">2.8.17. GETFEATURE</h3></div></div></div> | |
<p> | |
The GETFEATURE action stores the value of the matched
annotation's feature (first parameter) in the given variable (second
parameter).
</p> | |
<div class="section" title="2.8.17.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1809">2.8.17.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">GETFEATURE(StringExpression, Variable)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.17.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1814">2.8.17.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->GETFEATURE("language", stringVar)};</pre><p> | |
</p> | |
<p> | |
In this example, variable 'stringVar' will contain the value of | |
the feature 'language'. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.18. GETLIST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.getlist">2.8.18. GETLIST</h3></div></div></div> | |
<p> | |
This action retrieves a list of types dependent on a given strategy. | |
</p><div class="table"><a name="d5e1823"></a><p class="title"><b>Table 2.4. Currently supported strategies</b></p><div class="table-contents"> | |
<table summary="Currently supported strategies" style="border-collapse: collapse;border-top: 0.5pt solid black; border-bottom: 0.5pt solid black; border-left: 0.5pt solid black; border-right: 0.5pt solid black; "><colgroup><col><col></colgroup><thead><tr><th style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Strategy</th><th style="border-bottom: 0.5pt solid black; " align="left">Functionality</th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Types</td><td style="border-bottom: 0.5pt solid black; " align="left">get all types within the matched annotation</td></tr><tr><td style="border-right: 0.5pt solid black; border-bottom: 0.5pt solid black; " align="left">Types:End</td><td style="border-bottom: 0.5pt solid black; " align="left">get all types that end at the same offset as the matched | |
annotation | |
</td></tr><tr><td style="border-right: 0.5pt solid black; " align="left">Types:Begin</td><td style="" align="left">get all types that start at the same offset as the | |
matched | |
annotation | |
</td></tr></tbody></table> | |
</div></div><p><br class="table-break"> | |
</p> | |
<div class="section" title="2.8.18.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1840">2.8.18.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">GETLIST(ListVariable, StringExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.18.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1845">2.8.18.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->GETLIST(list, "Types")};</pre><p> | |
</p> | |
<p> | |
Here, a list of all types within the document is created and | |
assigned to list variable 'list'. | |
</p> | |
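<p>
The other strategies of the table are used in the same way. The following sketch assumes that a type Paragraph is available and declares the list variable explicitly; it collects the types of the annotations that start at the same offset as each matched Paragraph.
</p>
<pre class="programlisting">TYPELIST list;
// types that begin at the same offset as the matched Paragraph
Paragraph{->GETLIST(list, "Types:Begin")};</pre>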
</div> | |
</div> | |
<div class="section" title="2.8.19. LOG"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.log">2.8.19. LOG</h3></div></div></div> | |
<p> | |
The LOG action writes a log message. | |
</p> | |
<div class="section" title="2.8.19.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1854">2.8.19.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">LOG(StringExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.19.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1859">2.8.19.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->LOG("processed")};</pre><p> | |
</p> | |
<p> | |
This rule writes a log message with the string "processed". | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.20. MARK"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.mark">2.8.20. MARK</h3></div></div></div> | |
<p> | |
The MARK action is the most important action in the UIMA Ruta
system. It creates a new annotation of the given type. The optionally
passed indexes (NumberExpressions after the TypeExpression) can be
used to create an annotation that spans the matched information of
several rule elements.
</p> | |
<div class="section" title="2.8.20.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1868">2.8.20.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARK(TypeExpression(,NumberExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.20.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1873">2.8.20.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Freeline Paragraph{->MARK(ParagraphAfterFreeline,1,2)};</pre><p> | |
</p> | |
<p> | |
This rule matches on a free line followed by a Paragraph | |
annotation and annotates both in a single ParagraphAfterFreeline | |
annotation. The two numerical expressions at the end of the mark | |
action state that the matched text of the first and the second rule | |
elements are joined to create the boundaries of the new annotation. | |
</p> | |
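<p>
For comparison, a sketch of the same rule without indexes is given below. It assumes that the types Freeline and Paragraph are provided elsewhere; in this form, the new annotation would be expected to cover only the match of the rule element that carries the action (here, the Paragraph).
</p>
<pre class="programlisting">DECLARE ParagraphAfterFreeline;
// no indexes: only the Paragraph after the free line is annotated
Freeline Paragraph{->MARK(ParagraphAfterFreeline)};</pre>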
</div> | |
</div> | |
<div class="section" title="2.8.21. MARKFAST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.markfast">2.8.21. MARKFAST</h3></div></div></div> | |
<p> | |
The MARKFAST action creates annotations of the given type (first | |
parameter), if an element of the passed list (second parameter) occurs | |
within the window of the matched annotation. Thereby, the created | |
annotation does not cover the whole matched annotation. Instead, it | |
only covers the text of the found occurrence. The third parameter is
optional. It defines whether the MARKFAST action should ignore the case,
whereby its default value is false. The optional fourth parameter
specifies a character threshold for ignoring the case. It is
only relevant if the ignore-case value is set to true. The last
parameter is set to true by default and specifies whether whitespaces | |
in the entries of the dictionary should be ignored. For more | |
information on lists see | |
<a class="xref" href="#ugr.tools.ruta.language.declarations.ressource" title="2.5.3. Resources">Section 2.5.3, “Resources”</a>. | |
In addition to external word lists, string list variables can be
used.
</p> | |
<div class="section" title="2.8.21.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1883">2.8.21.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKFAST(TypeExpression,ListExpression(,BooleanExpression
    (,NumberExpression(,BooleanExpression)?)?)?)</pre><p>
</p><pre class="programlisting">MARKFAST(TypeExpression,StringListExpression(,BooleanExpression
    (,NumberExpression(,BooleanExpression)?)?)?)</pre><p>
</p> | |
</div> | |
<div class="section" title="2.8.21.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1889">2.8.21.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">WORDLIST FirstNameList = 'FirstNames.txt'; | |
DECLARE FirstName; | |
Document{-> MARKFAST(FirstName, FirstNameList, true, 2)};</pre><p> | |
</p> | |
<p> | |
This rule annotates all first names listed in the list
'FirstNameList' within the document and ignores the case if the
length of the word is greater than 2.
</p> | |
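<p>
As noted above, a string list variable can take the place of the
external word list. The following is a minimal sketch under assumptions:
the variable name 'FirstNames' and its entries are hypothetical, and the
list is assumed to be initialized at its declaration.
</p>
<pre class="programlisting">// hypothetical string list used instead of an external word list
STRINGLIST FirstNames = {"Peter", "Marshall"};
DECLARE FirstName;
Document{-> MARKFAST(FirstName, FirstNames, true, 2)};</pre>
<p>
As in the word list example, the third and fourth arguments request
case-insensitive matching for words longer than 2 characters.
</p>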
</div> | |
</div> | |
<div class="section" title="2.8.22. MARKFIRST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.markfirst">2.8.22. MARKFIRST</h3></div></div></div> | |
<p> | |
The MARKFIRST action annotates the first token (basic annotation) of the matched | |
annotation with the given type. | |
</p> | |
<div class="section" title="2.8.22.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1898">2.8.22.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKFIRST(TypeExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.22.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1903">2.8.22.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->MARKFIRST(First)};</pre><p> | |
</p> | |
<p> | |
This rule annotates the first token of the document with the | |
annotation First. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.23. MARKLAST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.marklast">2.8.23. MARKLAST</h3></div></div></div> | |
<p> | |
The MARKLAST action annotates the last token of the matched | |
annotation with the given type. | |
</p> | |
<div class="section" title="2.8.23.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1912">2.8.23.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKLAST(TypeExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.23.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1917">2.8.23.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->MARKLAST(Last)};</pre><p> | |
</p> | |
<p> | |
This rule annotates the last token of the document with the | |
annotation Last. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.24. MARKONCE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.markonce">2.8.24. MARKONCE</h3></div></div></div> | |
<p> | |
The MARKONCE action has the same functionality as the MARK
action, but creates a new annotation only if no part of the
matched annotation is yet part of an annotation of the given type.
</p> | |
<div class="section" title="2.8.24.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1926">2.8.24.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKONCE((NumberExpression,)?TypeExpression(,NumberExpression)*)</pre><p>
</p> | |
</div> | |
<div class="section" title="2.8.24.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1931">2.8.24.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Freeline Paragraph{->MARKONCE(ParagraphAfterFreeline,1,2)};</pre><p> | |
</p> | |
<p> | |
This rule matches on a free line followed by a Paragraph and
annotates both in a single ParagraphAfterFreeline annotation, if no part
is already annotated with a ParagraphAfterFreeline annotation. The
two numerical expressions at the end of the MARKONCE action state
that the matched text of the first and the second rule elements is
joined to create the boundaries of the new annotation.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.25. MARKSCORE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.markscore">2.8.25. MARKSCORE</h3></div></div></div> | |
<p> | |
The MARKSCORE action is similar to the MARK action. It also creates a | |
new annotation of the given type, but only if it is not yet existing. | |
The optionally passed indexes (parameters after the TypeExpression) | |
can be used to create an annotation that spans the matched
information of several rule elements. Additionally, a score value
(first parameter) is added to the heuristic score value of the
annotation. For more information on heuristic scores see
<a class="xref" href="#ugr.tools.ruta.language.score" title="2.12. Heuristic extraction using scoring rules">Section 2.12, “Heuristic extraction using scoring rules”</a>.
</p> | |
<div class="section" title="2.8.25.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1941">2.8.25.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKSCORE(NumberExpression,TypeExpression(,NumberExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.25.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1946">2.8.25.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Freeline Paragraph{->MARKSCORE(10,ParagraphAfterFreeline,1,2)};</pre><p> | |
</p> | |
<p> | |
This rule matches on a free line followed by a paragraph and
annotates both in a single ParagraphAfterFreeline annotation. The
two number expressions at the end of the MARKSCORE action indicate that
the matched text of the first and the second rule elements is
joined to create the boundaries of the new annotation. Additionally,
the score '10' is added to the heuristic score value of this
annotation.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.26. MARKTABLE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.marktable">2.8.26. MARKTABLE</h3></div></div></div> | |
<p> | |
The MARKTABLE action creates annotations of the given type (first | |
parameter), if an element of the given column (second parameter) of a | |
passed table (third parameter) occurs within the window of the
matched annotation. Thereby, the created annotation does not cover the
whole matched annotation. Instead, it only covers the text of the
found occurrence. Optionally, the MARKTABLE action is able to assign
entries of the given table to features of the created annotation.
For more information on tables see
<a class="xref" href="#ugr.tools.ruta.language.declarations.ressource" title="2.5.3. Resources">Section 2.5.3, “Resources”</a>. Additionally, several configuration parameters are possible (see the example).
</p> | |
<div class="section" title="2.8.26.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1956">2.8.26.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MARKTABLE(TypeExpression, NumberExpression, TableExpression | |
(,BooleanExpression, NumberExpression, | |
StringExpression, NumberExpression)? | |
(,StringExpression = NumberExpression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.26.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1961">2.8.26.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">WORDTABLE TestTable = 'TestTable.csv'; | |
DECLARE Annotation Struct(STRING first); | |
Document{-> MARKTABLE(Struct, 1, TestTable, | |
true, 4, ".,-", 2, "first" = 2)};</pre><p> | |
</p> | |
<p> | |
In this example, the whole document is searched for all
occurrences of the entries of the first column of the given table
'TestTable'. For each occurrence, an annotation of the type Struct is
created and its feature 'first' is filled with the entry of the
second column. Moreover, the case of the word is ignored if the
length of the word exceeds 4. Additionally, the characters '.', ',' and
'-' are ignored, but at most two of them.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.27. MATCHEDTEXT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.matchedtext">2.8.27. MATCHEDTEXT</h3></div></div></div> | |
<p> | |
The MATCHEDTEXT action saves the text of the matched annotation | |
in a passed String variable. The optionally passed indexes can be | |
used to match the text of several rule elements. | |
</p> | |
<div class="section" title="2.8.27.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1970">2.8.27.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MATCHEDTEXT(StringVariable(,NumberExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.27.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1975">2.8.27.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Headline Paragraph{->MATCHEDTEXT(stringVariable,1,2)};</pre><p> | |
</p> | |
<p> | |
The text covered by the Headline (rule element 1) and the | |
Paragraph (rule element 2) annotation is saved in variable | |
'stringVariable'. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.28. MERGE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.merge">2.8.28. MERGE</h3></div></div></div> | |
<p> | |
The MERGE action merges a number of given lists. The first
parameter defines whether the merge is done as intersection (false) or as
union (true). The second parameter is the list variable that will
contain the result.
</p> | |
<div class="section" title="2.8.28.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1984">2.8.28.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MERGE(BooleanExpression, ListVariable, ListExpression, (ListExpression)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.28.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1989">2.8.28.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->MERGE(false, listVar, list1, list2, list3)};</pre><p> | |
</p> | |
<p> | |
The elements that occur in all three lists will be placed in | |
the list 'listVar'. | |
</p> | |
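<p>
For comparison, the following sketch shows both modes side by side.
The list names and their contents are hypothetical, and the lists are
assumed to be initialized at their declaration.
</p>
<pre class="programlisting">// hypothetical input lists
INTLIST list1 = {1, 2, 3};
INTLIST list2 = {2, 3, 4};
INTLIST listVar;
// false: intersection, only elements that occur in both lists
Document{->MERGE(false, listVar, list1, list2)};
// true: union, the elements of both lists
Document{->MERGE(true, listVar, list1, list2)};</pre>
<p>
Each MERGE writes its result to 'listVar', so the second rule replaces
the result of the first one.
</p>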
</div> | |
</div> | |
<div class="section" title="2.8.29. REMOVE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.remove">2.8.29. REMOVE</h3></div></div></div> | |
<p> | |
The REMOVE action removes lists or single values from a given | |
list. | |
</p> | |
<div class="section" title="2.8.29.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1998">2.8.29.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REMOVE(ListVariable,(Argument)+)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.29.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2003">2.8.29.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->REMOVE(list, var)};</pre><p> | |
</p> | |
<p> | |
In this example, the variable 'var' is removed from the list | |
'list'. | |
</p> | |
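<p>
Since REMOVE accepts lists as well as single values, both kinds of
arguments can presumably be combined in one action. The following is a
sketch only; the list names and their contents are hypothetical.
</p>
<pre class="programlisting">// hypothetical lists
STRINGLIST list = {"a", "b", "c", "d"};
STRINGLIST toRemove = {"b", "c"};
// removes the elements of 'toRemove' and the single value "d" from 'list'
Document{->REMOVE(list, toRemove, "d")};</pre>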
</div> | |
</div> | |
<div class="section" title="2.8.30. REMOVEDUPLICATE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.removeduplicate">2.8.30. REMOVEDUPLICATE</h3></div></div></div> | |
<p> | |
This action removes all duplicates within a given list. | |
</p> | |
<div class="section" title="2.8.30.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2012">2.8.30.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REMOVEDUPLICATE(ListVariable)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.30.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2017">2.8.30.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->REMOVEDUPLICATE(list)};</pre><p> | |
</p> | |
<p> | |
Here, all duplicates within the list 'list' are removed. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.31. REMOVEFILTERTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.removefiltertype">2.8.31. REMOVEFILTERTYPE</h3></div></div></div> | |
<p> | |
The REMOVEFILTERTYPE action removes its arguments from the list of filtered types, | |
which restrict the visibility of the rules. | |
</p> | |
<div class="section" title="2.8.31.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2026">2.8.31.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REMOVEFILTERTYPE(TypeExpression(,TypeExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.31.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2031">2.8.31.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->REMOVEFILTERTYPE(W)};</pre><p> | |
</p> | |
<p> | |
After applying this rule, words are possibly visible again depending on the current filtering settings. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.32. REMOVERETAINTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.removeretaintype">2.8.32. REMOVERETAINTYPE</h3></div></div></div> | |
<p> | |
The REMOVERETAINTYPE action removes its arguments from the list of retained types,
which extend the visibility of the rules. | |
</p> | |
<div class="section" title="2.8.32.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2040">2.8.32.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REMOVERETAINTYPE(TypeExpression(,TypeExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.32.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2045">2.8.32.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->REMOVERETAINTYPE(W)};</pre><p> | |
</p> | |
<p> | |
After applying this rule, words are possibly not visible anymore depending on the current filtering settings. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.33. REPLACE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.replace">2.8.33. REPLACE</h3></div></div></div> | |
<p> | |
The REPLACE action replaces the text of all matched annotations with | |
the given StringExpression. It remembers the modification for the | |
matched annotations and shows them in the modified view (see | |
<a class="xref" href="#ugr.tools.ruta.language.modification" title="2.13. Modification">Section 2.13, “Modification”</a>). | |
</p> | |
<div class="section" title="2.8.33.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2055">2.8.33.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REPLACE(StringExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.33.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2060">2.8.33.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">FirstName{->REPLACE("first name")};</pre><p> | |
</p> | |
<p> | |
This rule replaces all first names with the string 'first | |
name'. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.34. RETAINTYPE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.retaintype">2.8.34. RETAINTYPE</h3></div></div></div> | |
<p> | |
The RETAINTYPE action retains the given types. This means that they
are no longer ignored by rules. This action is related to
FILTERTYPE (see <a class="xref" href="#ugr.tools.ruta.language.actions.filtertype" title="2.8.14. FILTERTYPE">Section 2.8.14, “FILTERTYPE”</a>). | |
</p> | |
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> | |
<p> | |
The visibility of types is calculated using three lists: | |
A list <span class="quote">“<span class="quote">default</span>”</span> for the initially filtered types, | |
which is specified in the configuration parameters of the analysis engine, the list <span class="quote">“<span class="quote">filtered</span>”</span>, which is | |
specified by the FILTERTYPE action, and the list <span class="quote">“<span class="quote">retained</span>”</span>, which is specified by the RETAINTYPE action. | |
For determining the actual visibility of types, list <span class="quote">“<span class="quote">filtered</span>”</span> is added to list <span class="quote">“<span class="quote">default</span>”</span> | |
and then all elements of list <span class="quote">“<span class="quote">retained</span>”</span> are removed. The annotations of the types in the resulting list are not visible. | |
Please note that the actions FILTERTYPE and RETAINTYPE replace all elements of the respective lists and that RETAINTYPE | |
overrides FILTERTYPE. | |
</p> | |
</div> | |
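<p>
The calculation described in the note can be traced with a small
sketch. The content of the <span class="quote">“<span class="quote">default</span>”</span> list is an assumption made for
illustration only, since it depends on the configuration of the analysis engine.
</p>
<pre class="programlisting">// assumed "default" list (engine configuration): SPACE, BREAK, MARKUP
Document{->FILTERTYPE(CW)};     // "filtered" is now {CW}
Document{->RETAINTYPE(SPACE)};  // "retained" is now {SPACE}
// not visible: {SPACE, BREAK, MARKUP} + {CW} - {SPACE} = {BREAK, MARKUP, CW}</pre>
<p>
Under this assumption, spaces can be matched by the following rules, whereas
capitalized words, line breaks and markup remain invisible.
</p>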
<div class="section" title="2.8.34.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2078">2.8.34.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">RETAINTYPE((TypeExpression(,TypeExpression)*))?</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.34.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2083">2.8.34.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->RETAINTYPE(SPACE)};</pre><p> | |
</p> | |
<p> | |
Here, all spaces are retained and can be matched by rules. | |
</p> | |
<p> | |
</p><pre class="programlisting">Document{->RETAINTYPE};</pre><p> | |
</p> | |
<p> | |
Here, the action (without parentheses) specifies that no types should be retained.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.35. SETFEATURE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.setfeature">2.8.35. SETFEATURE</h3></div></div></div> | |
<p> | |
The SETFEATURE action sets the value of a feature of the | |
matched | |
complex structure. | |
</p> | |
<div class="section" title="2.8.35.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2095">2.8.35.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">SETFEATURE(StringExpression,Expression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.35.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2100">2.8.35.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->SETFEATURE("language","en")};</pre><p> | |
</p> | |
<p> | |
Here, the feature 'language' of the input document is set to | |
English. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.36. SHIFT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.shift">2.8.36. SHIFT</h3></div></div></div> | |
<p> | |
The SHIFT action can be used to change the offsets of an annotation. The optional number expressions, | |
which point to the rule elements of the rule, specify the new offsets of the annotation. The annotations that
will be modified have to start or end at the match of the rule element of the action. This means that the action | |
has to be placed at a matching condition, which will be used to specify the annotations to be changed. | |
</p> | |
<div class="section" title="2.8.36.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2109">2.8.36.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">SHIFT(TypeExpression(,NumberExpression)*)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.36.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2114">2.8.36.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Author{-> SHIFT(Author,1,2)} PM;</pre><p> | |
</p> | |
<p> | |
In this example, an annotation of the type <span class="quote">“<span class="quote">Author</span>”</span> is expanded
in order to cover the following punctuation mark.
</p> | |
<p> | |
</p><pre class="programlisting">W{STARTSWITH(FS) -> SHIFT(FS, 1, 2)} W+ MARKUP;</pre><p> | |
</p> | |
<p> | |
In this example, an annotation of the type <span class="quote">“<span class="quote">FS</span>”</span> that consists mostly of words
is shrunk by removing the last MARKUP annotation.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.37. TRANSFER"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.transfer">2.8.37. TRANSFER</h3></div></div></div> | |
<p> | |
The TRANSFER action creates a new feature structure and adds all | |
compatible features of the matched annotation. | |
</p> | |
<div class="section" title="2.8.37.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2128">2.8.37.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">TRANSFER(TypeExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.37.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2133">2.8.37.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->TRANSFER(LanguageStorage)};</pre><p> | |
</p> | |
<p> | |
Here, a new feature structure <span class="quote">“<span class="quote">LanguageStorage</span>”</span> is created and | |
the compatible features of the Document annotation are copied. E.g., | |
if LanguageStorage defines a feature named 'language', then the
feature value of the Document annotation is copied. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.38. TRIE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.trie">2.8.38. TRIE</h3></div></div></div> | |
<p> | |
The TRIE action uses an external multi tree word list to | |
annotate the matched annotation and provides several configuration | |
parameters. | |
</p> | |
<div class="section" title="2.8.38.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2143">2.8.38.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">TRIE((String = Type)+,ListExpression,BooleanExpression,NumberExpression, | |
BooleanExpression,NumberExpression,StringExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.38.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2148">2.8.38.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{->TRIE("FirstNames.txt" = FirstName, "Companies.txt" = Company, | |
'Dictionary.mtwl', true, 4, false, 0, ".,-/")};</pre><p> | |
</p> | |
<p> | |
Here, the dictionary 'Dictionary.mtwl' that contains word lists | |
for first names and companies is used to annotate the document. The | |
words previously contained in the file 'FirstNames.txt' are | |
annotated with the type FirstName and the words in the file | |
'Companies.txt' with the type Company. The case of the word is
ignored if the length of the word exceeds 4. The edit distance is
deactivated. The cost of an edit operation can currently not be
configured by an argument. The last argument additionally defines
several characters that will be ignored.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.39. TRIM"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.trim">2.8.39. TRIM</h3></div></div></div> | |
<p> | |
The TRIM action changes the offsets on the matched annotations by removing annotations, whose | |
types are specified by the given parameters. | |
</p> | |
<div class="section" title="2.8.39.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2157">2.8.39.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">TRIM(TypeExpression ( , TypeExpression)*)</pre><p>
</p><pre class="programlisting">TRIM(TypeListExpression)</pre><p>
</p> | |
</div> | |
<div class="section" title="2.8.39.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2163">2.8.39.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Keyword{-> TRIM(SPACE)};</pre><p> | |
</p> | |
<p> | |
This rule removes all spaces at the beginning and at the end of Keyword annotations and | |
thus changes the offsets of the matched annotations. | |
</p> | |
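<p>
Several types can be trimmed at once, either as separate arguments or
as a type list. The following sketch assumes that line breaks (BREAK)
should be removed in addition to spaces.
</p>
<pre class="programlisting">// several type arguments
Keyword{-> TRIM(SPACE, BREAK)};
// the same types passed as a type list expression
Keyword{-> TRIM({SPACE, BREAK})};</pre>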
</div> | |
</div> | |
<div class="section" title="2.8.40. UNMARK"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.unmark">2.8.40. UNMARK</h3></div></div></div> | |
<p> | |
The UNMARK action removes the annotation of the given type | |
overlapping the matched annotation. There are two additional configurations: If additional
indexes are given, then the span of the specified rule elements is applied, similar to the MARK action.
If instead a boolean is given as an additional argument, then all annotations of the given type are removed | |
that start at the matched position. | |
</p> | |
<div class="section" title="2.8.40.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2172">2.8.40.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">UNMARK(TypeExpression)</pre><p> | |
</p><pre class="programlisting">UNMARK(TypeExpression (,NumberExpression)*)</pre><p> | |
</p><pre class="programlisting">UNMARK(TypeExpression, BooleanExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.40.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2179">2.8.40.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Headline{->UNMARK(Headline)};</pre><p> | |
</p> | |
<p> | |
Here, the Headline annotation is removed. | |
</p> | |
<p> | |
</p><pre class="programlisting">CW ANY+? QUESTION{->UNMARK(Headline,1,3)};</pre><p> | |
</p> | |
<p> | |
Here, all Headline annotations are removed that start with a capitalized word and end with a question mark. | |
</p> | |
<p> | |
</p><pre class="programlisting">CW{->UNMARK(Headline,true)};</pre><p> | |
</p> | |
<p> | |
Here, all Headline annotations are removed that start with a capitalized word. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.8.41. UNMARKALL"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.actions.unmarkall">2.8.41. UNMARKALL</h3></div></div></div> | |
<p> | |
The UNMARKALL action removes all the annotations of the given
type and all of its descendants overlapping the matched annotation,
unless the annotation is of at least one type in the passed list.
</p> | |
<div class="section" title="2.8.41.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e2194">2.8.41.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">UNMARKALL(TypeExpression, TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.8.41.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e2199">2.8.41.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Annotation{->UNMARKALL(Annotation, {Headline})};</pre><p> | |
</p> | |
<p> | |
Here, all annotations except for headlines are removed.
</p> | |
</div> | |
</div> |