blob: a6b8db1a4d95faf12b10b0a20544407b50a737e2 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/tools/tools.textmarker/" >
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
license agreements. See the NOTICE file distributed with this work for additional
information regarding copyright ownership. The ASF licenses this file to
you under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of
the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
OF ANY KIND, either express or implied. See the License for the specific
language governing permissions and limitations under the License. -->
<section id="ugr.tools.tm.language.syntax">
<title>Syntax</title>
<para>
TextMarker defines its own language for writing rules and rule
scripts. This section gives a formal overview of its syntax.
</para>
<para>
Structure: The overall structure of a TextMarker
script is defined by
the following syntax.
<programlisting><![CDATA[Script -> PackageDeclaration GlobalStatements Statements
PackageDeclaration -> "PACKAGE" DottedIdentifier ";"
GlobalStatments -> GlobalStatement*
GlobalStatment -> ("TYPESYSTEM" | "SCRIPT" | "ENGINE")
DottedIdentifier2 ";"
Statements -> Statement*
Statement -> Declaration | VariableDeclaration
| BlockDeclaration | SimpleStatement ";"]]></programlisting>
Comments are excluded from the syntax definition. Comments start with
"//" and always go to the end of the line.
</para>
<para>
Example beginning of a TextMarker file:
<programlisting><![CDATA[PACKAGE uima.textmarker.example;
// import the types of this type system
// (located in the descriptor folder -> types folder)
TYPESYSTEM types.BibtexTypeSystem;
SCRIPT uima.textmarker.example.Author;
SCRIPT uima.textmarker.example.Title;
SCRIPT uima.textmarker.example.Year;
]]></programlisting>
Syntax of declarations:
<programlisting><![CDATA[Declaration -> "DECLARE" (AnnotationType)? Identifier ("," Identifier )*
| "DECLARE" AnnotationType Identifier ( "("
FeatureDeclaration ")" )?
FeatureDeclaration -> ( (AnnotationType | "STRING" | "INT" | "FLOAT"
"DOUBLE" | "BOOLEAN") Identifier) )+
VariableDeclaration -> (("TYPE" Identifier ("," Identifier)*
("=" AnnotationType)?)
| ("STRING" Identifier ("," Identifier)*
("=" StringExpression)?)
| (("INT" | "DOUBLE" | "FLOAT") Identifier
("," Identifier)* ("=" NumberExpression)?)
| ("BOOLEAN" Identifier ("," Identifier)*
("=" BooleanExpression)?)
| ("WORDLIST" Identifier ("=" WordListExpression)?)
| ("WORDTABLE" Identifier ("=" WordTableExpression)?)
| ("TYPELIST" Identifier ("=" TypeListExpression)?)
| ("STRINGLIST" Identifier
("=" StringListExpression)?)
| (("INTLIST" | "DOUBLELIST" | "FLOATLIST")
Identifier ("=" NumberListExpression)?)
| ("BOOLEANLIST" Identifier
("=" BooleanListExpression)?))
AnnotationType -> BasicAnnotationType | declaredAnnotationType
BasicAnnotationType -> ('COLON'| 'SW' | 'MARKUP' | 'PERIOD' | 'CW'| 'NUM'
| 'QUESTION' | 'SPECIAL' | 'CAP' | 'COMMA'
| 'EXCLAMATION' | 'SEMICOLON' | 'NBSP'| 'AMP' | '_'
| 'SENTENCEEND' | 'W' | 'PM' | 'ANY' | 'ALL'
| 'SPACE' | 'BREAK')
BlockDeclaration -> "BLOCK" "(" Identifier ")" RuleElementWithCA
"{" Statements "}"]]></programlisting>
Syntax of statements and rule elements
<programlisting><![CDATA[SimpleStatement -> RuleElements ";"
RuleElements -> RuleElement+
RuleElement -> RuleElementType | RuleElementLiteral
| RuleElementComposed | RuleElementDisjunctive
RuleElementType -> TypeExpression QuantifierPart?
("{" Conditions? Actions? "}")?
RuleElementWithCA -> TypeExpression QuantifierPart?
"{" Conditions? Actions? "}"
RuleElementLiteral -> SimpleStringExpression QuantifierPart?
"{" Conditions? Actions? "}"
RuleElementComposed -> "(" RuleElements ")" QuantifierPart?
"{" Conditions? Actions? "}"
RuleElementDisjunctive -> "(" (TypeExpression | SimpleStringExpression)
("|" (TypeExpression | SimpleStringExpression) )+
")" QuantifierPart? "{" Conditions? Actions? }"
QuantifierPart -> "*" | "*?" | "+" | "+?" | "?" | "??"
| "[" NumberExpression "," NumberExpression "]"
| "[" NumberExpression "," NumberExpression "]?"
Conditions -> Condition ( "," Condition )*
Actions -> "->" Action ( "," Action)*
]]></programlisting>
Since each condition and each action has its own syntax, conditions
and actions are described in their own section. (For conditions see
<xref linkend='ugr.tools.tm.language.conditions' />
, for actions see
<xref linkend='ugr.tools.tm.language.actions' />
The syntax of expressions is explained in
<xref linkend='ugr.tools.tm.language.expressions' />
</para>
<para>
Identifier
<programlisting><![CDATA[DottedIdentifier -> Identifier ("." Identifier)*
DottedIdentifier2 -> Identifier (("."|"-") Identifier)*
Identifier -> letter (letter|digit)*
]]></programlisting>
</para>
</section>