<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY imgroot "images/tools/tools.ruta/" > | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" > | |
%uimaents; | |
]> | |
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor | |
license agreements. See the NOTICE file distributed with this work for additional | |
information regarding copyright ownership. The ASF licenses this file to | |
you under the Apache License, Version 2.0 (the "License"); you may not use | |
this file except in compliance with the License. You may obtain a copy of | |
the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required | |
by applicable law or agreed to in writing, software distributed under the | |
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS | |
OF ANY KIND, either express or implied. See the License for the specific | |
language governing permissions and limitations under the License. --> | |
<section id="ugr.tools.ruta.language.conditions"> | |
<title>Conditions</title> | |
<section id="ugr.tools.ruta.language.conditions.after"> | |
<title>AFTER</title> | |
<para> | |
The AFTER condition evaluates true, if the matched annotation | |
starts after the beginning of an arbitrary annotation of the passed | |
type. If a list of types is passed, this has to be true for at least | |
one of them. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[AFTER(Type|TypeListExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[CW{AFTER(SW)};]]></programlisting> | |
</para> | |
<para> | |
Here, the rule matches on a capitalized word, if there is any | |
small written word previously. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.and"> | |
<title>AND</title> | |
<para> | |
The AND condition is a composed condition and evaluates true, if | |
all contained conditions evaluate true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[AND(Condition1,...,ConditionN)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{AND(PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(ImportantHeadline)};]]></programlisting> | |
</para> | |
<para> | |
In this example, a paragraph is annotated with an | |
ImportantHeadline annotation, if it is part of a Headline and | |
contains a Keyword annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.before"> | |
<title>BEFORE</title> | |
<para> | |
The BEFORE condition evaluates true, if the matched annotation | |
starts before the beginning of an arbitrary annotation of the passed | |
type. If a list of types is passed, this has to be true for at least | |
one of them. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[BEFORE(Type|TypeListExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[CW{BEFORE(SW)};]]></programlisting> | |
</para> | |
<para> | |
Here, the rule matches on a capitalized word, if there is any | |
small written word afterwards. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.contains"> | |
<title>CONTAINS</title> | |
<para> | |
The CONTAINS condition evaluates true on a matched annotation, | |
if | |
the frequency of the passed type lies within an optionally passed | |
interval. The limits of the passed interval are per default | |
interpreted as absolute numeral values. By passing a further boolean | |
parameter set to true the limits are interpreted as percental | |
values. | |
If no interval parameters are passed at all, then the condition | |
checks | |
whether the matched annotation contains at least one | |
occurrence of the | |
passed type. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[CONTAINS(Type(,NumberExpression,NumberExpression(,BooleanExpression)?)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph is annotated with a KeywordParagraph annotation, if | |
it contains a Keyword annotation. | |
</para> | |
<para> | |
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,2,4)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph is annotated with a KeywordParagraph annotation, if | |
it contains between two and four Keyword annotations. | |
</para> | |
<para> | |
<programlisting><![CDATA[Paragraph{CONTAINS(Keyword,50,100,true)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph is annotated with a KeywordParagraph annotation, if it | |
contains between 50% and 100% Keyword annotations. This is | |
calculated based on the tokens of the Paragraph. If the Paragraph | |
contains six basic annotations (see | |
<xref linkend='ugr.tools.ruta.language.seeding' />), two of them are part of one Keyword annotation, and if one basic | |
annotation is also annotated with a Keyword annotation, then the | |
percentage of the contained Keywords is 50%. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.contextcount"> | |
<title>CONTEXTCOUNT</title> | |
<para> | |
The CONTEXTCOUNT condition numbers all occurrences of the | |
matched type within the context of a passed type's annotation | |
consecutively, thus assigning an index to each occurrence. | |
Additionally it stores the index of the matched annotation in a | |
numerical variable if one is passed. The condition evaluates true if | |
the index of the matched annotation is within a passed interval. If | |
no interval is passed, the condition always evaluates true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[CONTEXTCOUNT(Type(,NumberExpression,NumberExpression)?(,Variable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Keyword{CONTEXTCOUNT(Paragraph,2,3,var) | |
->MARK(SecondOrThirdKeywordInParagraph)};]]></programlisting> | |
</para> | |
<para> | |
Here, the position of the matched Keyword annotation within a | |
Paragraph annotation is calculated and stored in the variable 'var'. | |
If the counted value lies within the interval [2,3], then the matched | |
Keyword is annotated with the SecondOrThirdKeywordInParagraph | |
annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.count"> | |
<title>COUNT</title> | |
<para> | |
The COUNT condition can be used in two different ways. In the | |
first case (see first definition), it counts the number of | |
annotations of the passed type within the window of the matched | |
annotation and stores the amount in a numerical variable, if such a | |
variable is passed. The condition evaluates true if the counted | |
amount is within a specified interval. If no interval is passed, the | |
condition always evaluates true. In the second case (see second | |
definition), it counts the number of occurrences of the passed | |
VariableExpression (second parameter) within the passed list (first | |
parameter) and stores the amount in a numerical variable, if such a | |
variable is passed. Again, the condition evaluates true if the counted | |
amount is within a specified interval. If no interval is passed, the | |
condition always evaluates true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[COUNT(Type(,NumberExpression,NumberExpression)?(,NumberVariable)?)]]></programlisting> | |
</para> | |
<para> | |
<programlisting><![CDATA[COUNT(ListExpression,VariableExpression | |
(,NumberExpression,NumberExpression)?(,NumberVariable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{COUNT(Keyword,1,10,var)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
Here, the amount of Keyword annotations within a Paragraph is | |
calculated and stored in the variable 'var'. If one to ten Keywords | |
were counted, the paragraph is marked with a KeywordParagraph | |
annotation. | |
</para> | |
<para> | |
<programlisting><![CDATA[Paragraph{COUNT(list,"author",5,7,var)};]]></programlisting> | |
</para> | |
<para> | |
Here, the number of occurrences of STRING "author" within the | |
STRINGLIST 'list' is counted and stored in the variable 'var'. If | |
"author" occurs five to seven times within 'list', the condition | |
evaluates true. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.currentcount"> | |
<title>CURRENTCOUNT</title> | |
<para> | |
The CURRENTCOUNT condition numbers all occurrences of the matched | |
type within the whole document consecutively, thus assigning an index | |
to each occurrence. Additionally, it stores the index of the matched | |
annotation in a numerical variable, if one is passed. The condition | |
evaluates true if the index of the matched annotation is within a | |
specified interval. If no interval is passed, the condition always | |
evaluates true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[CURRENTCOUNT(Type(,NumberExpression,NumberExpression)?(,Variable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{CURRENTCOUNT(Keyword,3,3,var)->MARK(ParagraphWithThirdKeyword)};]]></programlisting> | |
</para> | |
<para> | |
Here, the Paragraph, which contains the third Keyword of the | |
whole document, is annotated with the ParagraphWithThirdKeyword | |
annotation. The index is stored in the variable 'var'. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.endswith"> | |
<title>ENDSWITH</title> | |
<para> | |
The ENDSWITH condition evaluates true, if an annotation of the | |
given type ends exactly at the same position as the matched | |
annotation. If a list of types is passed, this has to be true for at | |
least one of them. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[ENDSWITH(Type|TypeListExpression) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{ENDSWITH(SW)};]]></programlisting> | |
</para> | |
<para> | |
Here, the rule matches on a Paragraph annotation, if it ends | |
with a small written word. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.feature"> | |
<title>FEATURE</title> | |
<para> | |
The FEATURE condition compares a feature of the matched | |
annotation with the second argument. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[FEATURE(StringExpression,Expression) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Document{FEATURE("language",targetLanguage)}]]></programlisting> | |
</para> | |
<para> | |
This rule matches, if the feature named 'language' of the | |
document annotation equals the value of the variable | |
'targetLanguage'. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.if"> | |
<title>IF</title> | |
<para> | |
The IF condition evaluates true, if the contained boolean | |
expression evaluates true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[IF(BooleanExpression) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{IF(keywordAmount > 5)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph annotation is annotated with a KeywordParagraph | |
annotation, if the value of the variable 'keywordAmount' is greater | |
than five. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.inlist"> | |
<title>INLIST</title> | |
<para> | |
The INLIST condition is fulfilled, if the matched annotation is listed | |
in a given word or string list. If an optional agrument is given, then | |
the value of the argument is used instead of the covered text of the matched annotation | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[INLIST(WordList(,StringExpression)?) ]]></programlisting> | |
</para> | |
<para> | |
<programlisting><![CDATA[INLIST(StringList(,StringExpression)?) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Keyword{INLIST(SpecialKeywordList)->MARK(SpecialKeyword)};]]></programlisting> | |
</para> | |
<para> | |
A Keyword is annotated with the type SpecialKeyword, if the text | |
of the Keyword annotation is listed in the word list or string list | |
SpecialKeywordList. | |
</para> | |
<para> | |
<programlisting><![CDATA[Token{INLIST(MyLemmaList, Token.lemma)->MARK(SpecialLemma)};]]></programlisting> | |
</para> | |
<para> | |
This rule creates an annotation of the type SpecialLemma for each token that provides a feature value | |
of the feature "lemma" that is present in the string list or word list MyLemmaList. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.is"> | |
<title>IS</title> | |
<para> | |
The IS condition evaluates true, if there is an annotation of the | |
given type with the same beginning and ending offsets as the | |
matched | |
annotation. If a list of types is given, the condition | |
evaluates true, | |
if at least one of them fulfills the former condition. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[IS(Type|TypeListExpression) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Author{IS(Englishman)->MARK(EnglishAuthor)};]]></programlisting> | |
</para> | |
<para> | |
If an Author annotation is also annotated with an Englishman | |
annotation, it is annotated with an EnglishAuthor annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.last"> | |
<title>LAST</title> | |
<para> | |
The LAST condition evaluates true, if the type of the last token | |
within the window of the matched annotation is of the given type. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[LAST(TypeExpression) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Document{LAST(CW)};]]></programlisting> | |
</para> | |
<para> | |
This rule fires, if the last token of the document is a | |
capitalized word. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.mofn"> | |
<title>MOFN</title> | |
<para> | |
The MOFN condition is a composed condition. It evaluates true if | |
the number of containing conditions evaluating true is within a given | |
interval. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[MOFN(NumberExpression,NumberExpression,Condition1,...,ConditionN) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{MOFN(1,1,PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(HeadlineXORKeywords)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph is marked as a HeadlineXORKeywords, if the matched | |
text is either part of a Headline annotation or contains Keyword | |
annotations. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.near"> | |
<title>NEAR</title> | |
<para> | |
The NEAR condition is fulfilled, if the distance of the matched | |
annotation to an annotation of the given type is within a given | |
interval. The direction is defined by a boolean parameter, whose | |
default value is set to true, therefore searching forward. By default this | |
condition works on an unfiltered index. An optional fifth boolean | |
parameter can be set to true to get the condition being evaluated on | |
a filtered index. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[NEAR(TypeExpression,NumberExpression,NumberExpression | |
(,BooleanExpression(,BooleanExpression)?)?) ]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{NEAR(Headline,0,10,false)->MARK(NoHeadline)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph that starts at most ten tokens after a Headline | |
annotation is annotated with the NoHeadline annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.not"> | |
<title>NOT</title> | |
<para> | |
The NOT condition negates the result of its contained | |
condition. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA["-"Condition]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{-PARTOF(Headline)->MARK(Headline)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph that is not part of a Headline annotation so far is | |
annotated with a Headline annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.or"> | |
<title>OR</title> | |
<para> | |
The OR Condition is a composed condition and evaluates true, if | |
at least one contained condition is evaluated true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[OR(Condition1,...,ConditionN)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{OR(PARTOF(Headline),CONTAINS(Keyword)) | |
->MARK(ImportantParagraph)};]]></programlisting> | |
</para> | |
<para> | |
In this example a Paragraph is annotated with the | |
ImportantParagraph annotation, if it is a Headline or contains | |
Keyword annotations. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.parse"> | |
<title>PARSE</title> | |
<para> | |
The PARSE condition is fulfilled, if the text covered by the | |
matched annotation or the text defined by a optional first argument can be transformed into a value of the given | |
variable's type. If this is possible, the parsed value is | |
additionally assigned to the passed variable. For numeric values, | |
this conditions delegates to the NumberFormat of the locale given by the optional | |
last argument. Therefore, this condition parses the string <quote>2,3</quote> for the locale | |
<quote>en</quote> to the value 23. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[PARSE((stringExpression,)? variable(, stringExpression)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[NUM{PARSE(var,"de")}; | |
n:NUM{PARSE(n.ct,var,"de")};]]></programlisting> | |
</para> | |
<para> | |
If the variable 'var' is of an appropriate numeric type for the locale "de", the | |
value of NUM is parsed and subsequently stored in 'var'. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.partof"> | |
<title>PARTOF</title> | |
<para> | |
The PARTOF condition is fulfilled, if the matched annotation is | |
part of an annotation of the given type. However, it is not necessary | |
that the matched annotation is smaller than the annotation of the | |
given type. Use the (much slower) PARTOFNEQ condition instead, if this | |
is needed. If a type list is given, the condition evaluates true, if | |
the former described condition for a single type is fulfilled for at | |
least one of the types in the list. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[PARTOF(Type|TypeListExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{PARTOF(Headline) -> MARK(ImportantParagraph)};]]></programlisting> | |
</para> | |
<para> | |
A Paragraph is an ImportantParagraph, if the matched text is | |
part of a Headline annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.partofneq"> | |
<title>PARTOFNEQ</title> | |
<para> | |
The PARTOFNEQ condition is fulfilled if the matched annotation | |
is part of (smaller than and inside of) an annotation of the given | |
type. If also annotations of the same size should be acceptable, use | |
the PARTOF condition. If a type list is given, the condition | |
evaluates true if the former described condition is fulfilled for at | |
least one of the types in the list. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[PARTOFNEQ(Type|TypeListExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[W{PARTOFNEQ(Headline) -> MARK(ImportantWord)};]]></programlisting> | |
</para> | |
<para> | |
A word is an <quote>ImportantWord</quote>, if it is part of a headline. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.position"> | |
<title>POSITION</title> | |
<para> | |
The POSITION condition is fulfilled, if the matched type is the | |
k-th occurrence of this type within the window of an annotation of the | |
passed type, whereby k is defined by the value of the passed | |
NumberExpression. If the additional boolean paramter is set to false, | |
then k counts the occurrences of of the minimal annotations. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[POSITION(Type,NumberExpression(,BooleanExpression)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Keyword{POSITION(Paragraph,2)->MARK(SecondKeyword)};]]></programlisting> | |
</para> | |
<para> | |
The second Keyword in a Paragraph is annotated with the type | |
SecondKeyword. | |
</para> | |
<para> | |
<programlisting><![CDATA[Keyword{POSITION(Paragraph,2,false)->MARK(SecondKeyword)};]]></programlisting> | |
</para> | |
<para> | |
A Keyword in a Paragraph is annotated with the type | |
SecondKeyword, if it starts at the same offset as the second | |
(visible) RutaBasic annotation, which normally corresponds to | |
the tokens. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.regexp"> | |
<title>REGEXP</title> | |
<para> | |
The REGEXP condition is fulfilled, if the given pattern matches on the | |
matched annotation. However, if a string variable is given as the | |
first | |
argument, then the pattern is evaluated on the value of the | |
variable. | |
For more details on the syntax of regular | |
expressions, take a | |
look at | |
the | |
<ulink | |
url="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html">Java API</ulink> | |
. By default the REGEXP condition is case-sensitive. To change this, | |
add an optional boolean parameter, which is set to true. The regular expression is | |
initialized with the flags DOTALL and MULTILINE, and if the optional parameter is set to true, | |
then additionally with the flags CASE_INSENSITIVE and UNICODE_CASE. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[REGEXP((StringVariable,)? StringExpression(,BooleanExpression)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Keyword{REGEXP("..")->MARK(SmallKeyword)};]]></programlisting> | |
</para> | |
<para> | |
A Keyword that only consists of two chars is annotated with a | |
SmallKeyword annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.score"> | |
<title>SCORE</title> | |
<para> | |
The SCORE condition evaluates the heuristic score of the matched | |
annotation. This score is set or changed by the MARK action. | |
The | |
condition is fulfilled, if the score of the matched annotation is | |
in a | |
given interval. Optionally, the score can be stored in a | |
variable. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[SCORE(NumberExpression,NumberExpression(,Variable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[MaybeHeadline{SCORE(40,100)->MARK(Headline)};]]></programlisting> | |
</para> | |
<para> | |
An annotation of the type MaybeHeadline is annotated with | |
Headline, if its score is between 40 and 100. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.size"> | |
<title>SIZE</title> | |
<para> | |
The SIZE contition counts the number of elements in the given | |
list. By default, this condition always evaluates true. When an interval | |
is passed, it evaluates true, if the counted number of list elements | |
is within the interval. The counted number can be stored in an | |
optionally passed numeral variable. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[SIZE(ListExpression(,NumberExpression,NumberExpression)?(,Variable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Document{SIZE(list,4,10,var)};]]></programlisting> | |
</para> | |
<para> | |
This rule fires, if the given list contains between 4 and 10 | |
elements. Additionally, the exact amount is stored in the variable | |
<quote>var</quote>. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.startswith"> | |
<title>STARTSWITH</title> | |
<para> | |
The STARTSWITH condition evaluates true, if an annotation of the | |
given type starts exactly at the same position as the matched | |
annotation. If a type list is given, the condition evaluates true, if | |
the former is true for at least one of the given types in the list. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[STARTSWITH(Type|TypeListExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{STARTSWITH(SW)};]]></programlisting> | |
</para> | |
<para> | |
Here, the rule matches on a Paragraph annotation, if it starts | |
with small written word. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.totalcount"> | |
<title>TOTALCOUNT</title> | |
<para> | |
The TOTALCOUNT condition counts the annotations of the passed | |
type within the whole document and stores the amount in an optionally | |
passed numerical variable. The condition evaluates true, if the | |
amount | |
is within the passed interval. If no interval is passed, the | |
condition always evaluates true. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[TOTALCOUNT(Type(,NumberExpression,NumberExpression)?(,NumberVariable)?)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{TOTALCOUNT(Keyword,1,10,var)->MARK(KeywordParagraph)};]]></programlisting> | |
</para> | |
<para> | |
Here, the amount of Keyword annotations within the whole | |
document is calculated and stored in the variable 'var'. If one to | |
ten Keywords were counted, the Paragraph is marked with a | |
KeywordParagraph annotation. | |
</para> | |
</section> | |
</section> | |
<section id="ugr.tools.ruta.language.conditions.vote"> | |
<title>VOTE</title> | |
<para> | |
The VOTE condition counts the annotations of the given two types | |
within the window of the matched annotation and evaluates true, | |
if it | |
finds more annotations of the first type. | |
</para> | |
<section> | |
<title> | |
<emphasis role="bold">Definition:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[VOTE(TypeExpression,TypeExpression)]]></programlisting> | |
</para> | |
</section> | |
<section> | |
<title> | |
<emphasis role="bold">Example:</emphasis> | |
</title> | |
<para> | |
<programlisting><![CDATA[Paragraph{VOTE(FirstName,LastName)};]]></programlisting> | |
</para> | |
<para> | |
Here, this rule fires, if a paragraph contains more firstnames | |
than lastnames. | |
</para> | |
</section> | |
</section> | |
</section> |