<div class="section" title="2.7.1. AFTER"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.after">2.7.1. AFTER</h3></div></div></div> | |
<p> | |
The AFTER condition evaluates true if the matched annotation
starts after the beginning of any annotation of the passed
type. If a list of types is passed, this must hold for at least
one of them.
</p> | |
<div class="section" title="2.7.1.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1150">2.7.1.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">AFTER(Type|TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.1.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1155">2.7.1.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CW{AFTER(SW)};</pre><p> | |
</p> | |
<p> | |
Here, the rule matches on a capitalized word if there is any
lowercase word (SW) before it.
</p> | |
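<p>
The offset comparison behind AFTER can be illustrated with a small Python model. This is a simplified sketch under stated assumptions, not Ruta's implementation: annotations are reduced to hypothetical 'Ann' triples of type name, begin offset and end offset, the function name 'after' is made up for illustration, and "starts after the beginning" is read as a strictly greater begin offset.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def after(matched, annotations, types):
    """Model of AFTER: true if some annotation of one of `types` begins
    strictly before the matched annotation begins."""
    if isinstance(types, str):
        types = [types]  # a single type acts like a one-element type list
    return any(a.type in types and matched.begin > a.begin for a in annotations)


# Text "the Dog": SW "the" covers 0-3, CW "Dog" covers 4-7.
doc = [Ann("SW", 0, 3), Ann("CW", 4, 7)]
print(after(doc[1], doc, "SW"))  # True: "the" begins before "Dog"
print(after(doc[0], doc, "CW"))  # False: no CW begins before "the"
</pre>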
</div> | |
</div> | |
<div class="section" title="2.7.2. AND"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.and">2.7.2. AND</h3></div></div></div> | |
<p> | |
The AND condition is a composed condition and evaluates true if
all contained conditions evaluate true.
</p> | |
<div class="section" title="2.7.2.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1164">2.7.2.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">AND(Condition1,...,ConditionN)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.2.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1169">2.7.2.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{AND(PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(ImportantHeadline)};</pre><p> | |
</p> | |
<p> | |
In this example, a Paragraph is annotated with an
ImportantHeadline annotation if it is part of a Headline and
contains a Keyword annotation.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.3. BEFORE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.before">2.7.3. BEFORE</h3></div></div></div> | |
<p> | |
The BEFORE condition evaluates true if the matched annotation
starts before the beginning of any annotation of the passed
type. If a list of types is passed, this must hold for at least
one of them.
</p> | |
<div class="section" title="2.7.3.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1178">2.7.3.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">BEFORE(Type|TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.3.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1183">2.7.3.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CW{BEFORE(SW)};</pre><p> | |
</p> | |
<p> | |
Here, the rule matches on a capitalized word if there is any
lowercase word (SW) after it.
</p> | |
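<p>
BEFORE mirrors AFTER with the comparison reversed. The Python sketch below uses hypothetical 'Ann' triples and a made-up 'before' function, and reads "starts before the beginning" as a strictly smaller begin offset; it models the description above rather than Ruta's actual code.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def before(matched, annotations, types):
    """Model of BEFORE: true if the matched annotation begins strictly
    before some annotation of one of `types` begins."""
    if isinstance(types, str):
        types = [types]  # a single type acts like a one-element type list
    return any(a.type in types and a.begin > matched.begin for a in annotations)


# Text "Dogs bark": CW "Dogs" covers 0-4, SW "bark" covers 5-9.
doc = [Ann("CW", 0, 4), Ann("SW", 5, 9)]
print(before(doc[0], doc, "SW"))  # True: "bark" begins after "Dogs"
print(before(doc[1], doc, "SW"))  # False: no SW begins after "bark"
</pre>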
</div> | |
</div> | |
<div class="section" title="2.7.4. CONTAINS"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.contains">2.7.4. CONTAINS</h3></div></div></div> | |
<p> | |
The CONTAINS condition evaluates true on a matched annotation
if the frequency of the passed type lies within an optionally passed
interval. By default, the limits of the interval are interpreted as
absolute values. If a further boolean parameter set to true is passed,
the limits are interpreted as percentages. If no interval is passed at
all, the condition checks whether the matched annotation contains at
least one occurrence of the passed type.
</p> | |
<div class="section" title="2.7.4.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1192">2.7.4.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CONTAINS(Type(,NumberExpression,NumberExpression(,BooleanExpression)?)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.4.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1197">2.7.4.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{CONTAINS(Keyword)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
A Paragraph is annotated with a KeywordParagraph annotation, if | |
it contains a Keyword annotation. | |
</p> | |
<p> | |
</p><pre class="programlisting">Paragraph{CONTAINS(Keyword,2,4)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
A Paragraph is annotated with a KeywordParagraph annotation, if | |
it contains between two and four Keyword annotations. | |
</p> | |
<p> | |
</p><pre class="programlisting">Paragraph{CONTAINS(Keyword,50,100,true)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
A Paragraph is annotated with a KeywordParagraph annotation if
between 50% and 100% of it is covered by Keyword annotations. The
percentage is calculated based on the basic annotations (tokens) of
the Paragraph: if the Paragraph contains six basic annotations (see
<a class="xref" href="#ugr.tools.ruta.language.seeding" title="2.3. Basic annotations and tokens">Section 2.3, “Basic annotations and tokens”</a>), two of which are part of one Keyword annotation, and a
third basic annotation is also annotated with a Keyword annotation,
then the percentage of the contained Keywords is 50% (three of six).
</p> | |
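<p>
The following Python sketch models both modes of CONTAINS and reproduces the worked percentage example above. It is a simplified model, not Ruta's implementation: the 'Ann' triples and the 'covers' and 'contains' helpers are hypothetical, intervals are assumed to be inclusive, and the percentage is assumed to be the share of basic annotations in the window that lie inside some annotation of the passed type, as the worked example suggests.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def covers(outer, inner):
    """True if `inner` lies within the offsets of `outer`."""
    return inner.begin >= outer.begin and outer.end >= inner.end


def contains(matched, annotations, basics, type_, lo=None, hi=None, percent=False):
    """Simplified model of CONTAINS(Type(,min,max(,percent)?)?)."""
    inside = [a for a in annotations if a.type == type_ and covers(matched, a)]
    if lo is None:
        return len(inside) >= 1  # no interval: at least one occurrence
    if percent:
        # Share of basic annotations in the window covered by some type_ annotation.
        window = [b for b in basics if covers(matched, b)]
        hit = [b for b in window if any(covers(a, b) for a in inside)]
        value = 100.0 * len(hit) / len(window) if window else 0.0
    else:
        value = len(inside)
    return hi >= value >= lo


# Worked example: six basic annotations, one Keyword covering two of them
# and a second Keyword covering one more.
basics = [Ann("RutaBasic", 2 * i, 2 * i + 1) for i in range(6)]
paragraph = Ann("Paragraph", 0, 11)
keywords = [Ann("Keyword", 0, 3), Ann("Keyword", 6, 7)]
print(contains(paragraph, keywords, basics, "Keyword", 50, 100, True))  # True: 3 of 6 = 50%
print(contains(paragraph, keywords, basics, "Keyword", 2, 4))           # True: 2 Keywords
print(contains(paragraph, keywords, basics, "Keyword", 3, 4))           # False
</pre>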
</div> | |
</div> | |
<div class="section" title="2.7.5. CONTEXTCOUNT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.contextcount">2.7.5. CONTEXTCOUNT</h3></div></div></div> | |
<p> | |
The CONTEXTCOUNT condition numbers all occurrences of the
matched type within the enclosing annotation of the passed type
consecutively, thus assigning an index to each occurrence.
Additionally, it stores the index of the matched annotation in a
numerical variable, if one is passed. The condition evaluates true if
the index of the matched annotation is within a passed interval. If
no interval is passed, the condition always evaluates true.
</p> | |
<div class="section" title="2.7.5.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1213">2.7.5.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CONTEXTCOUNT(Type(,NumberExpression,NumberExpression)?(,Variable)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.5.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1218">2.7.5.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Keyword{CONTEXTCOUNT(Paragraph,2,3,var) | |
->MARK(SecondOrThirdKeywordInParagraph)};</pre><p> | |
</p> | |
<p> | |
Here, the position of the matched Keyword annotation within a | |
Paragraph annotation is calculated and stored in the variable 'var'. | |
If the counted value lies within the interval [2,3], then the matched | |
Keyword is annotated with the SecondOrThirdKeywordInParagraph | |
annotation. | |
</p> | |
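<p>
A simplified Python model of CONTEXTCOUNT is sketched below. The helper names are hypothetical, the index is assumed to be 1-based (consistent with the interval [2,3] in the example), and the case where no enclosing annotation exists is assumed to evaluate false; Ruta's actual handling may differ.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def covers(outer, inner):
    """True if `inner` lies within the offsets of `outer`."""
    return inner.begin >= outer.begin and outer.end >= inner.end


def context_count(matched, annotations, context_type, lo=None, hi=None):
    """Simplified model of CONTEXTCOUNT: returns (result, index), where index is
    the 1-based position of `matched` among the annotations of its own type
    inside an enclosing annotation of `context_type`."""
    for ctx in annotations:
        if ctx.type != context_type or not covers(ctx, matched):
            continue
        same_type = sorted(
            (a for a in annotations if a.type == matched.type and covers(ctx, a)),
            key=lambda a: a.begin,
        )
        index = same_type.index(matched) + 1
        return (True if lo is None else hi >= index >= lo), index
    return False, None  # assumption: no enclosing context annotation


anns = [Ann("Paragraph", 0, 40)] + [Ann("Keyword", b, b + 5) for b in (0, 10, 20, 30)]
keywords = anns[1:]
print(context_count(keywords[1], anns, "Paragraph", 2, 3))  # (True, 2)
print(context_count(keywords[3], anns, "Paragraph", 2, 3))  # (False, 4)
</pre>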
</div> | |
</div> | |
<div class="section" title="2.7.6. COUNT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.count">2.7.6. COUNT</h3></div></div></div> | |
<p> | |
The COUNT condition can be used in two different ways. In the
first case (see first definition), it counts the number of
annotations of the passed type within the window of the matched
annotation and stores this number in a numerical variable, if such a
variable is passed. In the second case (see second definition), it
counts the number of occurrences of the passed VariableExpression
(second parameter) within the passed list (first parameter) and
stores this number in a numerical variable, if such a variable is
passed. In both cases, the condition evaluates true if the count is
within the specified interval. If no interval is passed, the
condition always evaluates true.
</p> | |
<div class="section" title="2.7.6.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1227">2.7.6.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">COUNT(Type(,NumberExpression,NumberExpression)?(,NumberVariable)?)</pre><p> | |
</p> | |
<p> | |
</p><pre class="programlisting">COUNT(ListExpression,VariableExpression | |
(,NumberExpression,NumberExpression)?(,NumberVariable)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.6.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1234">2.7.6.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{COUNT(Keyword,1,10,var)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
Here, the number of Keyword annotations within a Paragraph is
counted and stored in the variable 'var'. If one to ten Keywords
were counted, the Paragraph is marked with a KeywordParagraph
annotation.
</p> | |
<p> | |
</p><pre class="programlisting">Paragraph{COUNT(list,"author",5,7,var)};</pre><p> | |
</p> | |
<p> | |
Here, the number of occurrences of STRING "author" within the | |
STRINGLIST 'list' is counted and stored in the variable 'var'. If | |
"author" occurs five to seven times within 'list', the condition | |
evaluates true. | |
</p> | |
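<p>
For the list variant of COUNT, a minimal Python model is shown below. The function name 'count_in_list' is hypothetical; the returned count corresponds to the value that would be stored in the optional number variable, and the interval is assumed to be inclusive.
</p>
<pre class="programlisting">def count_in_list(values, target, lo=None, hi=None):
    """Simplified model of COUNT(ListExpression,VariableExpression(,min,max)?):
    returns (result, count); the count is what COUNT would store in its
    optional number variable."""
    n = values.count(target)
    return (True if lo is None else hi >= n >= lo), n


names = ["author", "editor", "author", "author", "author", "author"]
print(count_in_list(names, "author", 5, 7))  # (True, 5)
print(count_in_list(names, "editor", 5, 7))  # (False, 1)
print(count_in_list(names, "editor"))        # (True, 1): no interval given
</pre>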
</div> | |
</div> | |
<div class="section" title="2.7.7. CURRENTCOUNT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.currentcount">2.7.7. CURRENTCOUNT</h3></div></div></div> | |
<p> | |
The CURRENTCOUNT condition numbers all occurrences of the matched | |
type within the whole document consecutively, thus assigning an index | |
to each occurrence. Additionally, it stores the index of the matched | |
annotation in a numerical variable, if one is passed. The condition | |
evaluates true if the index of the matched annotation is within a | |
specified interval. If no interval is passed, the condition always | |
evaluates true. | |
</p> | |
<div class="section" title="2.7.7.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1246">2.7.7.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">CURRENTCOUNT(Type(,NumberExpression,NumberExpression)?(,Variable)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.7.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1251">2.7.7.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{CURRENTCOUNT(Keyword,3,3,var)->MARK(ParagraphWithThirdKeyword)};</pre><p> | |
</p> | |
<p> | |
Here, the Paragraph that contains the third Keyword of the
whole document is annotated with the ParagraphWithThirdKeyword
annotation. The index is stored in the variable 'var'.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.8. ENDSWITH"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.endswith">2.7.8. ENDSWITH</h3></div></div></div> | |
<p> | |
The ENDSWITH condition evaluates true if an annotation of the
given type ends at exactly the same position as the matched
annotation. If a list of types is passed, this must hold for at
least one of them.
</p> | |
<div class="section" title="2.7.8.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1260">2.7.8.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">ENDSWITH(Type|TypeListExpression) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.8.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1265">2.7.8.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{ENDSWITH(SW)};</pre><p> | |
</p> | |
<p> | |
Here, the rule matches on a Paragraph annotation if it ends
with a lowercase word (SW).
</p> | |
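<p>
ENDSWITH only compares end offsets, which the following Python sketch makes explicit. The 'Ann' triples and the 'ends_with' function are hypothetical stand-ins used for illustration, not part of Ruta.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def ends_with(matched, annotations, types):
    """Model of ENDSWITH: true if an annotation of one of `types` ends
    at exactly the same offset as the matched annotation."""
    if isinstance(types, str):
        types = [types]  # a single type acts like a one-element type list
    return any(a.type in types and a.end == matched.end for a in annotations)


# Paragraph "See the dog": CW "See" 0-3, SW "the" 4-7, SW "dog" 8-11.
anns = [Ann("Paragraph", 0, 11), Ann("CW", 0, 3), Ann("SW", 4, 7), Ann("SW", 8, 11)]
print(ends_with(anns[0], anns, "SW"))  # True: "dog" ends at offset 11
print(ends_with(anns[0], anns, "CW"))  # False
</pre>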
</div> | |
</div> | |
<div class="section" title="2.7.9. FEATURE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.feature">2.7.9. FEATURE</h3></div></div></div> | |
<p> | |
The FEATURE condition compares the value of a feature of the matched
annotation, whose name is given by the first argument, with the
second argument.
</p> | |
<div class="section" title="2.7.9.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1274">2.7.9.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">FEATURE(StringExpression,Expression) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.9.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1279">2.7.9.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{FEATURE("language",targetLanguage)};</pre><p>
</p> | |
<p> | |
This rule matches, if the feature named 'language' of the | |
document annotation equals the value of the variable | |
'targetLanguage'. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.10. IF"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.if">2.7.10. IF</h3></div></div></div> | |
<p> | |
The IF condition evaluates true, if the contained boolean | |
expression evaluates true. | |
</p> | |
<div class="section" title="2.7.10.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1288">2.7.10.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">IF(BooleanExpression) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.10.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1293">2.7.10.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{IF(keywordAmount > 5)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
A Paragraph annotation is annotated with a KeywordParagraph | |
annotation, if the value of the variable 'keywordAmount' is greater | |
than five. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.11. INLIST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.inlist">2.7.11. INLIST</h3></div></div></div> | |
<p> | |
The INLIST condition is fulfilled if the matched annotation is listed
in a given word or string list. The optional parameters for the
(relative) edit distance are currently disabled.
</p> | |
<div class="section" title="2.7.11.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1302">2.7.11.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">INLIST(WordList(,NumberExpression(,BooleanExpression)?)?)</pre><p>
</p>
<p>
</p><pre class="programlisting">INLIST(StringList(,NumberExpression(,BooleanExpression)?)?)</pre><p>
</p> | |
</div> | |
<div class="section" title="2.7.11.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1309">2.7.11.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Keyword{INLIST(specialKeywords.txt)->MARK(SpecialKeyword)};</pre><p> | |
</p> | |
<p> | |
A Keyword is annotated with the type SpecialKeyword, if the text | |
of the Keyword annotation is listed in the word list | |
'specialKeywords.txt'. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.12. IS"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.is">2.7.12. IS</h3></div></div></div> | |
<p> | |
The IS condition evaluates true if there is an annotation of the
given type with the same begin and end offsets as the matched
annotation. If a list of types is given, the condition evaluates true
if this holds for at least one of them.
</p> | |
<div class="section" title="2.7.12.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1318">2.7.12.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">IS(Type|TypeListExpression) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.12.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1323">2.7.12.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Author{IS(Englishman)->MARK(EnglishAuthor)};</pre><p> | |
</p> | |
<p> | |
If an Author annotation is also annotated with an Englishman | |
annotation, it is annotated with an EnglishAuthor annotation. | |
</p> | |
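<p>
IS requires both offsets to be identical. The Python sketch below models this with hypothetical 'Ann' triples and a made-up 'is_condition' function; it illustrates the described semantics and is not Ruta's implementation.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def is_condition(matched, annotations, types):
    """Model of IS: true if an annotation of one of `types` has exactly the
    same begin and end offsets as the matched annotation."""
    if isinstance(types, str):
        types = [types]  # a single type acts like a one-element type list
    return any(
        a.type in types and (a.begin, a.end) == (matched.begin, matched.end)
        for a in annotations
    )


anns = [Ann("Author", 0, 15), Ann("Englishman", 0, 15),
        Ann("Author", 20, 32), Ann("Englishman", 20, 27)]
print(is_condition(anns[0], anns, "Englishman"))  # True: identical offsets
print(is_condition(anns[2], anns, "Englishman"))  # False: Englishman ends earlier
</pre>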
</div> | |
</div> | |
<div class="section" title="2.7.13. LAST"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.last">2.7.13. LAST</h3></div></div></div> | |
<p> | |
The LAST condition evaluates true if the last token within the
window of the matched annotation is of the given type.
</p> | |
<div class="section" title="2.7.13.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1332">2.7.13.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">LAST(TypeExpression) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.13.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1337">2.7.13.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{LAST(CW)};</pre><p> | |
</p> | |
<p> | |
This rule fires, if the last token of the document is a | |
capitalized word. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.14. MOFN"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.mofn">2.7.14. MOFN</h3></div></div></div> | |
<p> | |
The MOFN condition is a composed condition. It evaluates true if
the number of contained conditions that evaluate true is within the
given interval.
</p> | |
<div class="section" title="2.7.14.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1346">2.7.14.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MOFN(NumberExpression,NumberExpression,Condition1,...,ConditionN) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.14.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1351">2.7.14.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{MOFN(1,1,PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(HeadlineXORKeywords)};</pre><p> | |
</p> | |
<p> | |
A Paragraph is marked as a HeadlineXORKeywords if the matched
text is either part of a Headline annotation or contains Keyword
annotations, but not both.
</p> | |
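<p>
Since MOFN only counts how many of its conditions evaluate true, its truth value can be modeled in a few lines of Python. In this hypothetical sketch, the contained conditions are represented by already evaluated booleans, which ignores how Ruta actually evaluates them on the matched annotation.
</p>
<pre class="programlisting">def mofn(lo, hi, *results):
    """Model of MOFN: true if the number of true condition results lies in [lo, hi]."""
    n = sum(1 for r in results if r)
    return hi >= n >= lo


# MOFN(1,1,A,B) behaves like an exclusive or of A and B.
for a in (False, True):
    for b in (False, True):
        print(a, b, mofn(1, 1, a, b))
# False False False
# False True True
# True False True
# True True False
</pre>
<p>
Within this model, MOFN(N,N,...) over N conditions has the same truth value as AND, and MOFN(1,N,...) has the same truth value as OR.
</p>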
</div> | |
</div> | |
<div class="section" title="2.7.15. NEAR"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.near">2.7.15. NEAR</h3></div></div></div> | |
<p> | |
The NEAR condition is fulfilled if the distance of the matched
annotation to an annotation of the given type is within a given
interval. The search direction is defined by a boolean parameter
whose default value is true, meaning that the condition searches
forward. By default, this condition works on an unfiltered index. An
optional fifth boolean parameter can be set to true to evaluate the
condition on a filtered index.
</p> | |
<div class="section" title="2.7.15.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1360">2.7.15.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">NEAR(TypeExpression,NumberExpression,NumberExpression | |
(,BooleanExpression(,BooleanExpression)?)?) </pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.15.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1365">2.7.15.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{NEAR(Headline,0,10,false)->MARK(NoHeadline)};</pre><p> | |
</p> | |
<p> | |
A Paragraph that starts at most ten tokens after a Headline | |
annotation is annotated with the NoHeadline annotation. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.16. NOT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.not">2.7.16. NOT</h3></div></div></div> | |
<p> | |
The NOT condition negates the result of its contained | |
condition. | |
</p> | |
<div class="section" title="2.7.16.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1374">2.7.16.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">"-"Condition</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.16.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1379">2.7.16.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{-PARTOF(Headline)->MARK(Headline)};</pre><p> | |
</p> | |
<p> | |
A Paragraph that is not part of a Headline annotation so far is | |
annotated with a Headline annotation. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.17. OR"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.or">2.7.17. OR</h3></div></div></div> | |
<p> | |
The OR condition is a composed condition and evaluates true if
at least one contained condition evaluates true.
</p> | |
<div class="section" title="2.7.17.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1388">2.7.17.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">OR(Condition1,...,ConditionN)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.17.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1393">2.7.17.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{OR(PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(ImportantParagraph)};</pre><p> | |
</p> | |
<p> | |
In this example, a Paragraph is annotated with the
ImportantParagraph annotation if it is part of a Headline or contains
Keyword annotations.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.18. PARSE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.parse">2.7.18. PARSE</h3></div></div></div> | |
<p> | |
The PARSE condition is fulfilled, if the text covered by the | |
matched annotation can be transformed into a value of the given | |
variable's type. If this is possible, the parsed value is | |
additionally assigned to the passed variable. | |
</p> | |
<div class="section" title="2.7.18.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1402">2.7.18.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">PARSE(variable)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.18.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1407">2.7.18.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">NUM{PARSE(var)};</pre><p> | |
</p> | |
<p> | |
If the variable 'var' is of an appropriate numeric type, the
text covered by NUM is parsed and the resulting value is stored in 'var'.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.19. PARTOF"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.partof">2.7.19. PARTOF</h3></div></div></div> | |
<p> | |
The PARTOF condition is fulfilled if the matched annotation is
part of an annotation of the given type. The matched annotation does
not need to be smaller than the annotation of the given type; if this
is required, use the (much slower) PARTOFNEQ condition instead. If a
type list is given, the condition evaluates true if this holds for at
least one of the types in the list.
</p> | |
<div class="section" title="2.7.19.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1416">2.7.19.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">PARTOF(Type|TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.19.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1421">2.7.19.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{PARTOF(Headline) -> MARK(ImportantParagraph)};</pre><p> | |
</p> | |
<p> | |
A Paragraph is an ImportantParagraph, if the matched text is | |
part of a Headline annotation. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.20. PARTOFNEQ"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.partofneq">2.7.20. PARTOFNEQ</h3></div></div></div> | |
<p> | |
The PARTOFNEQ condition is fulfilled if the matched annotation
is part of (smaller than and inside of) an annotation of the given
type. If annotations of the same size should also be accepted, use
the PARTOF condition. If a type list is given, the condition
evaluates true if this holds for at least one of the types in the
list.
</p> | |
<div class="section" title="2.7.20.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1430">2.7.20.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">PARTOFNEQ(Type|TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.20.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1435">2.7.20.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">W{PARTOFNEQ(Headline) -> MARK(ImportantWord)};</pre><p> | |
</p> | |
<p> | |
A word is an <span class="quote">“<span class="quote">ImportantWord</span>”</span> if it is part of, and smaller than, a Headline annotation.
</p> | |
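<p>
The difference between PARTOF and PARTOFNEQ comes down to whether an enclosing annotation with identical offsets is accepted. The following Python sketch models both with one hypothetical 'part_of' function and a 'strict' flag; it does not reflect how Ruta implements either condition or why PARTOFNEQ is slower.
</p>
<pre class="programlisting">from collections import namedtuple

# Hypothetical stand-in for an annotation: type name and character offsets.
Ann = namedtuple("Ann", "type begin end")


def part_of(matched, annotations, types, strict=False):
    """Model of PARTOF (strict=False) and PARTOFNEQ (strict=True)."""
    if isinstance(types, str):
        types = [types]  # a single type acts like a one-element type list
    for a in annotations:
        inside = matched.begin >= a.begin and a.end >= matched.end
        same_span = (a.begin, a.end) == (matched.begin, matched.end)
        if a.type in types and inside and not (strict and same_span):
            return True
    return False


headline = Ann("Headline", 0, 20)
paragraph = Ann("Paragraph", 0, 20)  # same span as the Headline
word = Ann("W", 0, 5)                # strictly inside the Headline
anns = [headline, paragraph, word]
print(part_of(paragraph, anns, "Headline"))               # True
print(part_of(paragraph, anns, "Headline", strict=True))  # False: same size
print(part_of(word, anns, "Headline", strict=True))       # True
</pre>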
</div> | |
</div> | |
<div class="section" title="2.7.21. POSITION"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.position">2.7.21. POSITION</h3></div></div></div> | |
<p> | |
The POSITION condition is fulfilled if the matched annotation is the
k-th occurrence of its type within the window of an annotation of the
passed type, where k is defined by the value of the passed
NumberExpression. If the optional boolean parameter is set to false,
then k counts the occurrences of the minimal (RutaBasic) annotations instead.
</p> | |
<div class="section" title="2.7.21.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1445">2.7.21.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">POSITION(Type,NumberExpression(,BooleanExpression)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.21.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1450">2.7.21.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Keyword{POSITION(Paragraph,2)->MARK(SecondKeyword)};</pre><p> | |
</p> | |
<p> | |
The second Keyword in a Paragraph is annotated with the type | |
SecondKeyword. | |
</p> | |
<p> | |
</p><pre class="programlisting">Keyword{POSITION(Paragraph,2,false)->MARK(SecondKeyword)};</pre><p> | |
</p> | |
<p> | |
A Keyword in a Paragraph is annotated with the type | |
SecondKeyword, if it starts at the same offset as the second | |
(visible) RutaBasic annotation, which normally corresponds to | |
the tokens. | |
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.22. REGEXP"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.regexp">2.7.22. REGEXP</h3></div></div></div> | |
<p> | |
The REGEXP condition is fulfilled if the given pattern matches the
text covered by the matched annotation. However, if a string variable
is given as the first argument, then the pattern is evaluated on the
value of the variable. For more details on the syntax of regular
expressions, take a look at the
<a class="ulink" href="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html" target="_top">Java API</a>.
By default, the REGEXP condition is case-sensitive. To make it
case-insensitive, pass the optional boolean parameter set to true. The
regular expression is compiled with the flags DOTALL and MULTILINE,
and, if the optional parameter is set to true, additionally with the
flags CASE_INSENSITIVE and UNICODE_CASE.
</p> | |
<div class="section" title="2.7.22.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1463">2.7.22.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">REGEXP((StringVariable,)? StringExpression(,BooleanExpression)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.22.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1468">2.7.22.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Keyword{REGEXP("..")->MARK(SmallKeyword)};</pre><p> | |
</p> | |
<p> | |
A Keyword that consists of exactly two characters is annotated with a
SmallKeyword annotation.
</p> | |
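<p>
A rough Python analogue of REGEXP is sketched below. It assumes, as the SmallKeyword example suggests, that the pattern has to match the whole covered text, and it maps the Java flags to Python's re.DOTALL, re.MULTILINE and re.IGNORECASE. Java and Python regular expression syntax differ in details, so this is an approximation only, and the function name 'regexp' is hypothetical.
</p>
<pre class="programlisting">import re


def regexp(text, pattern, ignore_case=False):
    """Rough Python analogue of REGEXP on the covered text of an annotation."""
    flags = re.DOTALL | re.MULTILINE  # mirrors Java's DOTALL and MULTILINE
    if ignore_case:
        # Python's IGNORECASE on str patterns is Unicode-aware, which
        # approximates Java's CASE_INSENSITIVE plus UNICODE_CASE.
        flags |= re.IGNORECASE
    # Assumption: the pattern has to match the whole covered text.
    return re.fullmatch(pattern, text, flags) is not None


print(regexp("ab", ".."))        # True
print(regexp("abc", ".."))       # False: three characters
print(regexp("ÄB", "äb"))        # False: case-sensitive by default
print(regexp("ÄB", "äb", True))  # True
</pre>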
</div> | |
</div> | |
<div class="section" title="2.7.23. SCORE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.score">2.7.23. SCORE</h3></div></div></div> | |
<p> | |
The SCORE condition evaluates the heuristic score of the matched
annotation. This score is set or changed by the MARK action. The
condition is fulfilled if the score of the matched annotation lies
within the given interval. Optionally, the score can be stored in a
variable.
</p> | |
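<p>
The interval check of SCORE can be modeled as follows. In this hypothetical Java sketch,
the annotation's score is passed in as a plain number and the variable environment is a
Map; both are stand-ins for Ruta's internals, and the inclusive bounds are an assumption.
</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the SCORE condition (not Ruta source code).
final class ScoreConditionSketch {

    // Returns true if min <= annotationScore <= max (bounds assumed inclusive).
    // If varName is not null, the score is also stored in 'vars' under that name,
    // mimicking the optional Variable argument.
    static boolean score(double annotationScore, double min, double max,
                         String varName, Map<String, Object> vars) {
        if (varName != null) {
            vars.put(varName, annotationScore);
        }
        return annotationScore >= min && annotationScore <= max;
    }

    // Variant without the optional variable.
    static boolean score(double annotationScore, double min, double max) {
        return score(annotationScore, min, max, null, new HashMap<>());
    }
}
```

<p>
In terms of the MaybeHeadline example of this section, score(55, 40, 100) is true,
while score(30, 40, 100) is false.
</p>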
<div class="section" title="2.7.23.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1477">2.7.23.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">SCORE(NumberExpression,NumberExpression(,Variable)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.23.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1482">2.7.23.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">MaybeHeadline{SCORE(40,100)->MARK(Headline)};</pre><p> | |
</p> | |
<p> | |
An annotation of the type MaybeHeadline is annotated with
Headline if its score is between 40 and 100.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.24. SIZE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.size">2.7.24. SIZE</h3></div></div></div> | |
<p> | |
The SIZE condition counts the number of elements in the given
list. If no interval is passed, the condition always evaluates true.
If an interval is passed, it evaluates true if the number of list
elements lies within the interval. The counted number can optionally
be stored in a numerical variable.
</p> | |
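<p>
A minimal Java sketch of this behavior follows. The optional interval is modeled with
nullable Integer bounds (null meaning "no interval passed") and the variable environment
with a Map; these are illustrative assumptions, not Ruta's API, and inclusive bounds are
assumed.
</p>

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the SIZE condition (not Ruta source code).
final class SizeConditionSketch {

    // Counts the elements of 'list'. If min or max is null, no interval was passed
    // and the result is always true. If varName is not null, the count is stored
    // in 'vars' under that name.
    static boolean size(List<?> list, Integer min, Integer max,
                        String varName, Map<String, Object> vars) {
        int count = list.size();
        if (varName != null) {
            vars.put(varName, count);
        }
        if (min == null || max == null) {
            return true;
        }
        // Assumption: interval bounds are inclusive.
        return count >= min && count <= max;
    }
}
```

<p>
The call size(list, 4, 10, "var", vars) mirrors the rule Document{SIZE(list,4,10,var)};
the count is stored even when the interval check fails.
</p>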
<div class="section" title="2.7.24.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1491">2.7.24.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">SIZE(ListExpression(,NumberExpression,NumberExpression)?(,Variable)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.24.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1496">2.7.24.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Document{SIZE(list,4,10,var)};</pre><p> | |
</p> | |
<p> | |
This rule fires if the list contains between 4 and 10
elements. Additionally, the exact number of elements is stored in the variable
<span class="quote">“<span class="quote">var</span>”</span>.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.25. STARTSWITH"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.startswith">2.7.25. STARTSWITH</h3></div></div></div> | |
<p> | |
The STARTSWITH condition evaluates true if an annotation of the
given type starts at exactly the same position as the matched
annotation. If a type list is given, the condition evaluates true if
this holds for at least one of the types in the list.
</p> | |
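<p>
The position check can be sketched in Java as follows. Annotations are reduced to a
hypothetical Ann class holding a type name and character offsets, and the document's
annotations are given as a plain list; these simplifications are assumptions made for
illustration only.
</p>

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the STARTSWITH condition (not Ruta source code).
final class StartsWithConditionSketch {

    // Simplified stand-in for an annotation: a type name and character offsets.
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // True if some annotation in 'index' has one of the given types and
    // begins at exactly the same offset as 'matched'. A single type is
    // simply a one-element set.
    static boolean startsWith(Ann matched, Set<String> types, List<Ann> index) {
        for (Ann a : index) {
            if (types.contains(a.type) && a.begin == matched.begin) {
                return true;
            }
        }
        return false;
    }
}
```

<p>
Only the begin offsets are compared; where the other annotation ends is irrelevant.
</p>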
<div class="section" title="2.7.25.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1506">2.7.25.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">STARTSWITH(Type|TypeListExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.25.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1511">2.7.25.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{STARTSWITH(SW)};</pre><p> | |
</p> | |
<p> | |
Here, the rule matches on a Paragraph annotation if it starts
with a small written word (SW).
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.26. TOTALCOUNT"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.totalcount">2.7.26. TOTALCOUNT</h3></div></div></div> | |
<p> | |
The TOTALCOUNT condition counts the annotations of the passed
type within the whole document and stores the number in an optionally
passed numerical variable. The condition evaluates true if the number
lies within the passed interval. If no interval is passed, the
condition always evaluates true.
</p> | |
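<p>
Unlike SIZE or VOTE, TOTALCOUNT does not depend on the matched annotation at all. The
hypothetical Java sketch below makes this explicit by representing the whole document
as a list of annotation type names; this representation, the nullable interval bounds,
and the inclusive bounds are assumptions for illustration.
</p>

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the TOTALCOUNT condition (not Ruta source code).
final class TotalCountConditionSketch {

    // Counts all annotations of 'type' in the whole document, given here as the
    // type names of every annotation. min/max model the optional interval
    // (null = no interval, always true); varName models the optional variable.
    static boolean totalCount(List<String> documentTypes, String type,
                              Integer min, Integer max,
                              String varName, Map<String, Object> vars) {
        int count = 0;
        for (String t : documentTypes) {
            if (t.equals(type)) {
                count++;
            }
        }
        if (varName != null) {
            vars.put(varName, count);
        }
        if (min == null || max == null) {
            return true;
        }
        // Assumption: bounds are inclusive.
        return count >= min && count <= max;
    }
}
```

<p>
Because the count covers the whole document, every Paragraph in a rule such as the
example of this section receives the same result.
</p>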
<div class="section" title="2.7.26.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1520">2.7.26.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">TOTALCOUNT(Type(,NumberExpression,NumberExpression(,Variable)?)?)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.26.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1525">2.7.26.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{TOTALCOUNT(Keyword,1,10,var)->MARK(KeywordParagraph)};</pre><p> | |
</p> | |
<p> | |
Here, the number of Keyword annotations within the whole
document is counted and stored in the variable
<span class="quote">“<span class="quote">var</span>”</span>. If one to
ten Keywords are counted, the Paragraph is marked with a
KeywordParagraph annotation.
</p> | |
</div> | |
</div> | |
<div class="section" title="2.7.27. VOTE"><div class="titlepage"><div><div><h3 class="title" id="ugr.tools.ruta.language.conditions.vote">2.7.27. VOTE</h3></div></div></div> | |
<p> | |
The VOTE condition counts the annotations of the two given types
within the window of the matched annotation and evaluates true if it
finds more annotations of the first type than of the second type.
</p> | |
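<p>
The comparison can be sketched in Java as follows. Annotations are reduced to a
hypothetical Ann class with a type name and offsets, and "within the window" is assumed
to mean fully contained in the matched annotation; neither detail is taken from Ruta's
implementation.
</p>

```java
import java.util.List;

// Hypothetical sketch of the VOTE condition (not Ruta source code).
final class VoteConditionSketch {

    // Simplified stand-in for an annotation: a type name and character offsets.
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Counts annotations of type1 and type2 inside 'window' and returns true
    // only if type1 occurs strictly more often than type2.
    static boolean vote(Ann window, String type1, String type2, List<Ann> index) {
        int count1 = 0;
        int count2 = 0;
        for (Ann a : index) {
            // Assumption: "within the window" means fully contained in the matched annotation.
            boolean inside = a.begin >= window.begin && a.end <= window.end;
            if (!inside) {
                continue;
            }
            // Two independent checks, so both counters see every annotation.
            if (a.type.equals(type1)) {
                count1++;
            }
            if (a.type.equals(type2)) {
                count2++;
            }
        }
        return count1 > count2;
    }
}
```

<p>
Since the comparison is strict, a tie evaluates false; this also makes a call with the
same type twice evaluate false.
</p>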
<div class="section" title="2.7.27.1. Definition:"><div class="titlepage"><div><div><h4 class="title" id="d5e1534">2.7.27.1. | |
<span class="bold"><strong>Definition:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">VOTE(TypeExpression,TypeExpression)</pre><p> | |
</p> | |
</div> | |
<div class="section" title="2.7.27.2. Example:"><div class="titlepage"><div><div><h4 class="title" id="d5e1539">2.7.27.2. | |
<span class="bold"><strong>Example:</strong></span> | |
</h4></div></div></div> | |
<p> | |
</p><pre class="programlisting">Paragraph{VOTE(FirstName,LastName)};</pre><p> | |
</p> | |
<p> | |
Here, the rule fires if a Paragraph contains more FirstName
annotations than LastName annotations.
</p> | |
</div> | |
</div> |