<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE api-answers PUBLIC "-//NetBeans//DTD Arch Answers//EN" "../nbbuild/antsrc/org/netbeans/nbbuild/Arch.dtd" [
<!ENTITY api-questions SYSTEM "../nbbuild/antsrc/org/netbeans/nbbuild/Arch-api-questions.xml">
]>
<api-answers
question-version="1.29"
author="mmetelka@netbeans.org"
>
&api-questions;
<!--
<question id="arch-what" when="init">
What is this project good for?
<hint>
Please provide here a few lines describing the project,
what problem it should solve, provide links to documentation,
specifications, etc.
</hint>
</question>
-->
<answer id="arch-what">
The Lexer module provides token lists for various
text inputs. Token lists can either be flat or form
tree-like token hierarchies if any language embedding is present.
</answer>
<!--
<question id="arch-overall" when="init">
Describe the overall architecture.
<hint>
What will be API for
<a href="http://openide.netbeans.org/tutorial/api-design.html#design.apiandspi">
clients and what support API</a>?
What parts will be pluggable?
How will plug-ins be registered? Please use <code>&lt;api type="export"/&gt;</code>
to describe your general APIs.
If possible please provide
simple diagrams.
</hint>
</question>
-->
<answer id="arch-overall">
The lexer module defines
<api name="LexerAPI" group="java" type="export" category="official"/>
providing access to sequences of tokens for various input sources.
<br/>
An <b>API entry point</b> is
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
class, whose static methods return an instance for the given input source.
<h3>Input Sources</h3>
<p>
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
can be created for immutable input sources (
<a href="@JDK@/java/lang/CharSequence.html">CharSequence</a>
or
<a href="@JDK@/java/io/Reader.html">java.io.Reader</a>
) or for mutable input sources (typically
<a href="@JDK@/javax/swing/text/Document.html">javax.swing.text.Document</a>
).
<br/>
For a mutable input source the lexer framework automatically updates the tokens in the token hierarchy
after each change to the underlying text input,
so the tokens of the hierarchy always reflect the text of the input at the given time.
</p>
<h3>TokenSequence and Token</h3>
<p>
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html#tokenSequence--">TokenHierarchy.tokenSequence()</a>
allows iterating over a list of
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html">Token</a>
instances.
<br/>
The token carries a token identification
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html">TokenId</a>
(returned by
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#id--">Token.id()</a>
) and a text (aka token body) represented as
<a href="@JDK@/java/lang/CharSequence.html">CharSequence</a>
(returned by
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#text--">Token.text()</a>
).
<br/>
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html">TokenUtilities</a>
contains many useful methods related to operations with the token's text such as
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#equals-java.lang.CharSequence-java.lang.Object-">TokenUtilities.equals(CharSequence text, Object o)</a>,
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#startsWith-java.lang.CharSequence-java.lang.CharSequence-">TokenUtilities.startsWith(CharSequence text, CharSequence prefix)</a>,
etc.
<br/>
It is also possible to obtain a debug-friendly form of the token's text (special chars replaced by escapes) by using
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#debugText-java.lang.CharSequence-">TokenUtilities.debugText(CharSequence text)</a>.
<br/>
A typical token also carries the offset of its occurrence in the input text.
</p>
<h3>Flyweight Tokens</h3>
<p>
Many token occurrences share the same text
(e.g. Java keywords, operators or a single-space whitespace), so the memory consumption
can be decreased considerably by allowing the creation of <b>flyweight token</b> instances,
i.e. just one token instance is used for all the token's occurrences
in all the inputs.
<br/>
Flyweight tokens can be determined by
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#isFlyweight--">Token.isFlyweight()</a>.
<br/>
The flyweight tokens do not carry a valid offset (their internal offset is -1).
<br/>
Therefore
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html">TokenSequence</a>
is used for iteration through the tokens (instead of a regular iterator) and it provides
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offset--">TokenSequence.offset()</a>
which returns the proper offset even when positioned over a flyweight token.
<br/>
When holding a reference to the token's instance its offset can also be determined by
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#offset-org.netbeans.api.lexer.TokenHierarchy-">Token.offset(TokenHierarchy tokenHierarchy)</a>.
The <code>tokenHierarchy</code> parameter should always be <code>null</code>; it is reserved
for the token hierarchy snapshot support in future releases.
<br/>
For flyweight tokens the
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#offset-org.netbeans.api.lexer.TokenHierarchy-">Token.offset(TokenHierarchy tokenHierarchy)</a>
returns -1 and for regular tokens it gives the same value as
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offset--">TokenSequence.offset()</a>.
</p>
<p>
There may be applications where the use of flyweight tokens could be problematic.
For example, if a parser wants to use token instances
in parse tree nodes to determine the nodes' boundaries, then the flyweight tokens
would always return offset -1, so the positions of the parse tree nodes
could not generally be determined from the tokens alone.
<br/>
Therefore there is a possibility to de-flyweight a token by using
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offsetToken--">TokenSequence.offsetToken()</a>
which checks the current token
and if it's flyweight then it replaces it with a non-flyweight token instance
with a valid offset and with the same properties as the original flyweight token.
</p>
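<p>
The relationship between flyweight tokens and offsets described above can be illustrated
with a small self-contained model. This is only a sketch under stated assumptions:
<code>SketchToken</code> and <code>SketchTokenSequence</code> are hypothetical classes
that do not use the Lexer API, and the module's real implementation may differ.
The markers around the code only delimit the listing.
</p>
<pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of flyweight tokens; NOT the lexer module's implementation.
final class SketchToken {
    final String id;
    final String text;
    final int offset; // -1 marks a flyweight token shared by all occurrences

    SketchToken(String id, String text, int offset) {
        this.id = id;
        this.text = text;
        this.offset = offset;
    }

    boolean isFlyweight() {
        return offset == -1;
    }
}

final class SketchTokenSequence {
    // One shared instance per (id, text) pair, reused across all sequences.
    private static final Map<String, SketchToken> FLYWEIGHTS = new HashMap<>();

    private final List<SketchToken> tokens = new ArrayList<>();
    private int textLength = 0; // total length of the text added so far
    private int index = -1;     // index of the current token, -1 before the first
    private int offset = 0;     // start offset of the current token

    void addFlyweight(String id, String text) {
        tokens.add(FLYWEIGHTS.computeIfAbsent(id + ":" + text,
                key -> new SketchToken(id, text, -1)));
        textLength += text.length();
    }

    void addRegular(String id, String text) {
        tokens.add(new SketchToken(id, text, textLength));
        textLength += text.length();
    }

    boolean moveNext() {
        if (index + 1 >= tokens.size()) {
            return false;
        }
        if (index >= 0) {
            offset += tokens.get(index).text.length();
        }
        index++;
        return true;
    }

    SketchToken token() {
        return tokens.get(index);
    }

    // Valid even for flyweight tokens because the sequence tracks the position.
    int offset() {
        return offset;
    }

    // Replaces a flyweight current token with a regular one carrying its offset.
    SketchToken offsetToken() {
        SketchToken current = token();
        if (current.isFlyweight()) {
            current = new SketchToken(current.id, current.text, offset);
            tokens.set(index, current);
        }
        return current;
    }
}
```
]]></pre>
<p>
In this model the shared token itself never knows where it occurs; only the sequence does,
which is why a token stored outside of the sequence has to be de-flyweighted first.
</p>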
<h3>TokenId and Language</h3>
<p>
A token is identified by its id, represented by the
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html">TokenId</a>
interface. Token ids for a language are typically implemented as Java enums (extensions of
<a href="@JDK@/java/lang/Enum.html">Enum</a>
) but it's not mandatory.
<br/>
All token ids for the given language are described by
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>.
<br/>
Each token id may belong
to one or more token categories that make it easier to handle
tokens of the same type (e.g. keywords or operators).
<br/>
Each token id may define its primary category
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html#primaryCategory--">TokenId.primaryCategory()</a>
and
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createTokenCategories--">LanguageHierarchy.createTokenCategories()</a>
may provide additional categories for the token ids for the given language.
<br/>
Each language description has a mandatory mime-type specification
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html#mimeType--">Language.mimeType()</a>.
<br/>
Although the mime-type is not directly related to lexing, it brings many benefits
because through the mime-type the language can be accompanied
by an arbitrary sort of settings (e.g. syntax coloring information etc.).
</p>
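<p>
How a primary category and additional categories might combine can be sketched as follows.
The enum <code>SketchId</code> and the helper <code>CategorySketch.categories()</code> are
hypothetical names used for illustration only; the sketch does not use the Lexer API
and merely assumes that additional categories are supplied as a map keyed by category name.
The markers around the code only delimit the listing.
</p>
<pre><![CDATA[
```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical token ids; the names and categories are illustrative only.
enum SketchId {
    PLUS("operator"),
    MINUS("operator"),
    IF("keyword"),
    BAD_STRING("string");

    private final String primaryCategory;

    SketchId(String primaryCategory) {
        this.primaryCategory = primaryCategory;
    }

    String primaryCategory() {
        return primaryCategory;
    }
}

final class CategorySketch {
    // Merges each id's primary category with additional categories supplied separately.
    static Map<String, Set<SketchId>> categories(Map<String, Set<SketchId>> additional) {
        Map<String, Set<SketchId>> result = new HashMap<>();
        for (SketchId id : SketchId.values()) {
            result.computeIfAbsent(id.primaryCategory(),
                    key -> EnumSet.noneOf(SketchId.class)).add(id);
        }
        for (Map.Entry<String, Set<SketchId>> entry : additional.entrySet()) {
            result.computeIfAbsent(entry.getKey(),
                    key -> EnumSet.noneOf(SketchId.class)).addAll(entry.getValue());
        }
        return result;
    }
}
```
]]></pre>
<p>
With such a map a client can ask for all ids of a category, e.g. "operator",
and a single id such as <code>BAD_STRING</code> can appear both under its primary category
and under an additional one.
</p>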
<h3>LanguageHierarchy, Lexer, LexerInput and TokenFactory</h3>
<p>
SPI providers wishing to provide a
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>
first need to define its SPI counterpart
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html">LanguageHierarchy</a>.
It mainly needs to define token ids in
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createTokenIds--">LanguageHierarchy.createTokenIds()</a>
and lexer in
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createLexer-org.netbeans.spi.lexer.LexerRestartInfo-">
LanguageHierarchy.createLexer(LexerRestartInfo info)</a>.
<br/>
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/Lexer.html">Lexer</a>
reads characters from
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LexerInput.html">LexerInput</a>
and breaks the text into tokens.
<br/>
Tokens are produced by using methods of
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/TokenFactory.html">TokenFactory</a>.
<br/>
As per-token memory consumption is critical,
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html">Token</a>
does not have any counterpart in the SPI. However, the framework prevents instantiation
of any token classes other than those contained in the lexer module's implementation.
</p>
<h3>Language Embedding</h3>
<p>
With language embedding the flat list of tokens becomes in fact a tree-like hierarchy
represented by the
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
class. Each token can potentially be broken into a sequence of embedded tokens.
<br/>The
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#embedded--">TokenSequence.embedded()</a>
method can be called to obtain the embedded tokens (when positioned on the branch token).
<br/>
There are two ways of specifying what language is embedded in a token. The language
can either be specified explicitly (hardcoded) in the
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#embedding-org.netbeans.api.lexer.Token-org.netbeans.api.lexer.LanguagePath-org.netbeans.api.lexer.InputAttributes-">LanguageHierarchy.embedding()</a>
method or there can be a
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageProvider.html">LanguageProvider</a>
registered in the default Lookup, which will create a
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>
for the embedded language.
<br/>
There is no limit on the depth of a language hierarchy and there can be as many embedded languages
as needed.
<br/>
In SPI the language embedding is represented by
<a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageEmbedding.html">LanguageEmbedding</a>.
</p>
</answer>
<!--
<question id="arch-usecases" when="init">
Describe the main <a href="http://openide.netbeans.org/tutorial/api-design.html#usecase">
use cases</a> of the new API. Who will use it under
what circumstances? What kind of code would typically need to be written
to use the module?
</question>
-->
<answer id="arch-usecases">
<!-- API Usecases - API Usecases - API Usecases - API Usecases - API Usecases -->
<h1>
API Usecases
</h1>
<h3>
Obtaining a token hierarchy for various inputs.
</h3>
The
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
is an entry point into Lexer API
and it represents the given input in terms of tokens.
<pre>
String text = "public void m() { }";
TokenHierarchy hi = TokenHierarchy.create(text, JavaLanguage.description());
</pre>
<br/>
A token hierarchy for a Swing document must be accessed under the document's read or write lock.
<pre>
document.readLock();
try {
TokenHierarchy hi = TokenHierarchy.get(document);
... // explore tokens etc.
} finally {
document.readUnlock();
}
</pre>
<h3>
Obtaining and iterating a token sequence over a particular Swing document from a given offset.
</h3>
The tokens cover the whole document and it's possible to iterate either forward or backward.
<br/>
Each token can contain language embedding that can also be explored by the token sequence.
The language embedding covers the whole text of the token (a few characters may be
skipped at the beginning and end of the branch token).
<pre>
document.readLock();
try {
TokenHierarchy hi = TokenHierarchy.get(document);
TokenSequence ts = hi.tokenSequence();
// If necessary move ts to the requested offset
ts.move(offset);
while (ts.moveNext()) {
Token t = ts.token();
if (t.id() == ...) { ... }
if (TokenUtilities.equals(t.text(), "mytext")) { ... }
if (ts.offset() == ...) { ... }
// Possibly retrieve embedded token sequence
TokenSequence embedded = ts.embedded();
if (embedded != null) { // Token has a valid language embedding
...
}
}
} finally {
document.readUnlock();
}
</pre>
<br/>
Typical clients:
<ul>
<li>Editor's painting code doing syntax coloring
<code>org.netbeans.modules.lexer.editorbridge.LexerLayer</code> in <i>lexer/editorbridge</i> module.
</li>
<li>Brace matching code searching for matching brace in forward/backward direction.</li>
<li>Code completion's quick check whether caret is located inside comment token.</li>
<li>Parser constructing a parse tree iterating through the tokens in forward direction.</li>
</ul>
<h3>
Using language path of the token sequence
</h3>
For the given token sequence the client may check whether it's a top-level
token sequence in the token hierarchy or whether it's embedded and, if so, at which level
it's embedded and what the parent languages are.
<br/>
Each token can contain language embedding that can also be explored by the token sequence.
The language embedding covers the whole text of the token (a few characters may be
skipped at the beginning and end of the branch token).
<pre>
TokenSequence ts = ...
LanguagePath lp = ts.languagePath();
if (lp.size() > 1) { ... } // This is embedded token sequence
if (lp.topLanguage() == JavaLanguage.description()) { ... } // top-level language of the token hierarchy
String mimePath = lp.mimePath();
Object settingValue = someSettings.getSetting(mimePath, settingName); // someSettings and settingName are placeholders
</pre>
<h3>
Extra information about the input
</h3>
The
<a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/InputAttributes.html">InputAttributes</a>
class may carry extra information about the text input on which the token hierarchy
is being created. For example there can be information about the version of the language
that the input represents and the lexer may be written to recognize multiple versions
of the language. It should suffice to do the versioning through a simple integer:
<pre>
public class MyLexer implements Lexer&lt;MyTokenId&gt; {
private final int version;
...
public MyLexer(LexerInput input, TokenFactory&lt;MyTokenId&gt; tokenFactory, Object state,
LanguagePath languagePath, InputAttributes inputAttributes) {
...
Integer ver = (inputAttributes != null)
? (Integer)inputAttributes.getValue(languagePath, "version")
: null;
this.version = (ver != null) ? ver.intValue() : 1; // Use version 1 if not specified explicitly
}
public Token&lt;MyTokenId&gt; nextToken() {
...
if (recognized-assert-keyword) {
return (version &gt;= 4) // "assert" recognized as keyword since version 4
? keyword(MyTokenId.ASSERT)
: identifier();
}
...
}
...
}
</pre>
The client will then use the following code:
<pre>
InputAttributes attrs = new InputAttributes();
// The "true" means global value i.e. for any occurrence of the MyLanguage including embeddings
attrs.setValue(MyLanguage.description(), "version", Integer.valueOf(3), true);
TokenHierarchy hi = TokenHierarchy.create(text, false, MyLanguage.description(), null, attrs);
...
</pre>
<h3>
Filtering out unnecessary tokens
</h3>
Filtering is only possible for immutable inputs (e.g. String or Reader).
<pre>
Set&lt;MyTokenId&gt; skipIds = EnumSet.of(MyTokenId.COMMENT, MyTokenId.WHITESPACE);
TokenHierarchy tokenHierarchy = TokenHierarchy.create(inputText, false,
MyLanguage.description(), skipIds, null);
...
</pre>
<br/>
Typical clients:
<ul>
<li>Parser constructing a parse tree. It is not interested
in the comment and whitespace tokens so these tokens do not need
to be constructed at all.
</li>
</ul>
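<p>
The effect of a skip set can be modeled with a toy lexer. The sketch below is an assumption-laden
illustration, not the framework's code: <code>FilterId</code>, <code>FilterToken</code> and
<code>FilterSketch.lex()</code> are hypothetical, and an empty set stands in for "no filtering".
It shows that skipped tokens are never instantiated while the remaining tokens keep their original offsets.
The markers around the code only delimit the listing.
</p>
<pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical token ids of a toy language made of words and whitespace.
enum FilterId { WORD, WHITESPACE }

final class FilterToken {
    final FilterId id;
    final String text;
    final int offset;

    FilterToken(FilterId id, String text, int offset) {
        this.id = id;
        this.text = text;
        this.offset = offset;
    }
}

final class FilterSketch {
    // Lexes the input into maximal whitespace / non-whitespace runs and
    // creates token instances only for ids that are not in skipIds.
    static List<FilterToken> lex(String input, Set<FilterId> skipIds) {
        List<FilterToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            boolean whitespace = Character.isWhitespace(input.charAt(i));
            while (i < input.length()
                    && Character.isWhitespace(input.charAt(i)) == whitespace) {
                i++;
            }
            FilterId id = whitespace ? FilterId.WHITESPACE : FilterId.WORD;
            if (!skipIds.contains(id)) {
                tokens.add(new FilterToken(id, input.substring(start, i), start));
            }
        }
        return tokens;
    }
}
```
]]></pre>
<p>
Because the filtered list has gaps, a client of this model must rely on the stored offsets
rather than on summing token lengths.
</p>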
<!-- SPI Usecases - SPI Usecases - SPI Usecases - SPI Usecases - SPI Usecases -->
<h1>
SPI Usecases
</h1>
<h3>
Providing language description and lexer.
</h3>
Token ids should be defined as enums. For example,
<code>org.netbeans.lib.lexer.test.simple.SimpleTokenId</code> can be copied,
or the following example from
<code>org.netbeans.modules.lexer.editorbridge.calc.lang.CalcTokenId</code> can be used.
<br/>
The static <code>language()</code> method returns the language describing the token ids.
<pre>
public enum CalcTokenId implements TokenId {
WHITESPACE(null, "whitespace"),
SL_COMMENT(null, "comment"),
ML_COMMENT(null, "comment"),
E("e", "keyword"),
PI("pi", "keyword"),
IDENTIFIER(null, null),
INT_LITERAL(null, "number"),
FLOAT_LITERAL(null, "number"),
PLUS("+", "operator"),
MINUS("-", "operator"),
STAR("*", "operator"),
SLASH("/", "operator"),
LPAREN("(", "separator"),
RPAREN(")", "separator"),
ERROR(null, "error"),
ML_COMMENT_INCOMPLETE(null, "comment");
private final String fixedText;
private final String primaryCategory;
private CalcTokenId(String fixedText, String primaryCategory) {
this.fixedText = fixedText;
this.primaryCategory = primaryCategory;
}
public String fixedText() {
return fixedText;
}
public String primaryCategory() {
return primaryCategory;
}
private static final Language&lt;CalcTokenId&gt; language = new LanguageHierarchy&lt;CalcTokenId&gt;() {
<code>@Override</code>
protected Collection&lt;CalcTokenId&gt; createTokenIds() {
return EnumSet.allOf(CalcTokenId.class);
}
<code>@Override</code>
protected Map&lt;String,Collection&lt;CalcTokenId&gt;&gt; createTokenCategories() {
Map&lt;String,Collection&lt;CalcTokenId&gt;&gt; cats = new HashMap&lt;String,Collection&lt;CalcTokenId&gt;&gt;();
// Incomplete literals
cats.put("incomplete", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));
// Additional literals being a lexical error
cats.put("error", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));
return cats;
}
<code>@Override</code>
protected Lexer&lt;CalcTokenId&gt; createLexer(LexerRestartInfo&lt;CalcTokenId&gt; info) {
return new CalcLexer(info);
}
<code>@Override</code>
protected String mimeType() {
return "text/x-calc";
}
}.language();
public static final Language&lt;CalcTokenId&gt; language() {
return language;
}
}
</pre>
Note that the underlying <code>LanguageHierarchy</code> extension does not need to be published.
<br/>
Lexer example:
<pre>
public final class CalcLexer implements Lexer&lt;CalcTokenId&gt; {
private static final int EOF = LexerInput.EOF;
private static final Map&lt;String,CalcTokenId&gt; keywords = new HashMap&lt;String,CalcTokenId&gt;();
static {
keywords.put(CalcTokenId.E.fixedText(), CalcTokenId.E);
keywords.put(CalcTokenId.PI.fixedText(), CalcTokenId.PI);
}
private LexerInput input;
private TokenFactory&lt;CalcTokenId&gt; tokenFactory;
CalcLexer(LexerRestartInfo&lt;CalcTokenId&gt; info) {
this.input = info.input();
this.tokenFactory = info.tokenFactory();
assert (info.state() == null); // passed argument always null
}
public Token&lt;CalcTokenId&gt; nextToken() {
while (true) {
int ch = input.read();
switch (ch) {
case '+':
return token(CalcTokenId.PLUS);
case '-':
return token(CalcTokenId.MINUS);
case '*':
return token(CalcTokenId.STAR);
case '/':
switch (input.read()) {
case '/': // in single-line comment
while (true)
switch (input.read()) {
case '\r': input.consumeNewline();
case '\n':
case EOF:
return token(CalcTokenId.SL_COMMENT);
}
case '*': // in multi-line comment
while (true) {
ch = input.read();
while (ch == '*') {
ch = input.read();
if (ch == '/')
return token(CalcTokenId.ML_COMMENT);
else if (ch == EOF)
return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
}
if (ch == EOF)
return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
}
}
input.backup(1);
return token(CalcTokenId.SLASH);
case '(':
return token(CalcTokenId.LPAREN);
case ')':
return token(CalcTokenId.RPAREN);
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
case '.':
return finishIntOrFloatLiteral(ch);
case EOF:
return null;
default:
if (Character.isWhitespace((char)ch)) {
ch = input.read();
while (ch != EOF &amp;&amp; Character.isWhitespace((char)ch)) {
ch = input.read();
}
input.backup(1);
return token(CalcTokenId.WHITESPACE);
}
if (Character.isLetter((char)ch)) { // identifier or keyword
while (true) {
if (ch == EOF || !Character.isLetter((char)ch)) {
input.backup(1); // backup the extra char (or EOF)
// Check for keywords
CalcTokenId id = keywords.get(input.readText());
if (id == null) {
id = CalcTokenId.IDENTIFIER;
}
return token(id);
}
ch = input.read(); // read next char
}
}
return token(CalcTokenId.ERROR);
}
}
}
public Object state() {
return null;
}
private Token&lt;CalcTokenId&gt; finishIntOrFloatLiteral(int ch) {
boolean floatLiteral = false;
boolean inExponent = false;
while (true) {
switch (ch) {
case '.':
if (floatLiteral) {
return token(CalcTokenId.FLOAT_LITERAL);
} else {
floatLiteral = true;
}
break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
break;
case 'e': case 'E': // exponent part
if (inExponent) {
return token(CalcTokenId.FLOAT_LITERAL);
} else {
floatLiteral = true;
inExponent = true;
}
break;
default:
input.backup(1);
return token(floatLiteral ? CalcTokenId.FLOAT_LITERAL
: CalcTokenId.INT_LITERAL);
}
ch = input.read();
}
}
private Token&lt;CalcTokenId&gt; token(CalcTokenId id) {
return (id.fixedText() != null)
? tokenFactory.getFlyweightToken(id, id.fixedText())
: tokenFactory.createToken(id);
}
}
</pre>
<p>
The classes containing token ids and the language description should be
part of an API. The lexer should only be part of the implementation.
</p>
<h3>
Providing language embedding.
</h3>
The embedding may be provided statically
in the <code>LanguageHierarchy.embedding()</code> method;
see e.g. <code>org.netbeans.lib.lexer.test.simple.SimpleLanguage</code>.
<p>
Alternatively, it may be provided dynamically through the XML layer
by using a file in the "Editors/language-mime-type/languagesEmbeddingMap" folder.
The file is named after the token id and contains the target mime-type followed by the initial and ending skip lengths:
</p>
<pre>
&lt;folder name="Editors"&gt;
&lt;folder name="text"&gt;
&lt;folder name="x-outer-language"&gt;
&lt;folder name="languagesEmbeddingMap"&gt;
&lt;file name="WORD"&gt;&lt;![CDATA[text/x-inner-language,1,2]]&gt;
&lt;/file&gt;
&lt;/folder&gt;
&lt;/folder&gt;
&lt;/folder&gt;
&lt;/folder&gt;
</pre>
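<p>
The content of such a file can be read as a mime-type plus two skip lengths.
The following sketch shows one way to interpret it; <code>EmbeddingSpecSketch</code> is a
hypothetical helper that does not use the Lexer API, and it assumes that all three
comma-separated parts are always present, which the real implementation may not require.
The markers around the code only delimit the listing.
</p>
<pre><![CDATA[
```java
// Hypothetical holder for one embedding declaration; not part of the Lexer API.
final class EmbeddingSpecSketch {
    final String mimeType;
    final int startSkipLength;
    final int endSkipLength;

    EmbeddingSpecSketch(String mimeType, int startSkipLength, int endSkipLength) {
        this.mimeType = mimeType;
        this.startSkipLength = startSkipLength;
        this.endSkipLength = endSkipLength;
    }

    // Parses content of the form "mime-type,startSkipLength,endSkipLength".
    static EmbeddingSpecSketch parse(String fileContent) {
        String[] parts = fileContent.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected mime-type,startSkip,endSkip but got: " + fileContent);
        }
        return new EmbeddingSpecSketch(parts[0].trim(),
                Integer.parseInt(parts[1].trim()),
                Integer.parseInt(parts[2].trim()));
    }

    // Returns the part of the branch token's text covered by the embedded language.
    String embeddedText(String branchTokenText) {
        int end = branchTokenText.length() - endSkipLength;
        return (end > startSkipLength)
                ? branchTokenText.substring(startSkipLength, end)
                : "";
    }
}
```
]]></pre>
<p>
For the <code>WORD</code> entry above, this model would skip one character at the start and
two characters at the end of each <code>WORD</code> token before handing the rest to the inner language.
</p>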
</answer>
<!--
<question id="arch-quality" when="init">
How will the <a href="http://www.netbeans.org/community/guidelines/q-evangelism.html">quality</a>
of your code be tested and
how are future regressions going to be prevented?
<hint>
What kind of testing do
you want to use? How much functionality, in which areas,
should be covered by the tests?
</hint>
</question>
-->
<answer id="arch-quality">
The lexer module is completely unit-testable.
<br/>
Besides tests of its own correctness, it also contains support
for testing the correctness of lexers from SPI providers
by using the <code>org.netbeans.lib.lexer.test.TestRandomModify</code> class.
<br/>
The main testing method for lexer correctness is a token-by-token comparison
of the updated token sequence with a batch-lexed token sequence for the same input.
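<p>
The idea of comparing an incrementally updated token list with a batch-lexed one can be modeled
in a few lines. The sketch below is a hypothetical toy, not <code>TestRandomModify</code>
and not the framework's update algorithm: <code>RelexSketch</code> lexes whitespace and non-whitespace runs,
keeps the tokens that end strictly before the modification offset, relexes the rest, and checks
the result against a full relex after each random single-character edit.
The markers around the code only delimit the listing.
</p>
<pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical toy lexer; it does NOT use TestRandomModify or the Lexer API.
final class RelexSketch {

    // Batch lexing: splits the text into maximal whitespace / non-whitespace runs.
    static List<String> batchLex(String text) {
        return lexFrom(text, 0, new ArrayList<>());
    }

    private static List<String> lexFrom(String text, int start, List<String> out) {
        int i = start;
        while (i < text.length()) {
            int tokenStart = i;
            boolean whitespace = Character.isWhitespace(text.charAt(i));
            while (i < text.length()
                    && Character.isWhitespace(text.charAt(i)) == whitespace) {
                i++;
            }
            out.add(text.substring(tokenStart, i));
        }
        return out;
    }

    // Incremental update: tokens ending strictly before modOffset cannot be
    // affected by the change, so they are reused; everything else is relexed.
    static List<String> incrementalLex(List<String> oldTokens, String newText, int modOffset) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        for (String token : oldTokens) {
            if (pos + token.length() >= modOffset) {
                break; // this token touches the modified area
            }
            result.add(token);
            pos += token.length();
        }
        return lexFrom(newText, pos, result);
    }

    // Applies random one-character insertions and removals and compares the
    // incrementally updated tokens with batch-lexed tokens after each step.
    static boolean randomModifyCheck(long seed, int rounds) {
        Random random = new Random(seed);
        String alphabet = "ab \n";
        String text = "";
        List<String> tokens = batchLex(text);
        for (int round = 0; round < rounds; round++) {
            int offset = random.nextInt(text.length() + 1);
            if (text.length() > 0 && random.nextBoolean()) {
                if (offset == text.length()) {
                    offset--;
                }
                text = text.substring(0, offset) + text.substring(offset + 1);
            } else {
                char inserted = alphabet.charAt(random.nextInt(alphabet.length()));
                text = text.substring(0, offset) + inserted + text.substring(offset);
            }
            tokens = incrementalLex(tokens, text, offset);
            if (!tokens.equals(batchLex(text))) {
                return false;
            }
        }
        return true;
    }
}
```
]]></pre>
<p>
A fixed seed keeps such a test reproducible, so a failing sequence of modifications can be replayed while debugging.
</p>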
</answer>
<!--
<question id="arch-time" when="init">
What are the time estimates of the work?
<hint>
Please express your estimates of how long the design, implementation,
stabilization are likely to last. How many people will be needed to
implement this and what is the expected milestone by which the work should be
ready?
</hint>
</question>
-->
<answer id="arch-time">
The present implementation is stable but there are a few missing implementations
and other things to be considered:
<ul>
<li>Dynamic language embedding binding through xml layer.</li>
<li>CharPreprocessor servicing and tests.</li>
<li>Token hierarchy for Reader.</li>
<li>TokenFactory.createBranchToken() impl.</li>
<li>Providing JavaCC and Antlr support.</li>
<li>Support for token positions (may add API).</li>
</ul>
</answer>
<!--
<question id="compat-i18n" when="impl">
Is your module correctly internationalized?
<hint>
Correct internationalization means that it obeys instructions
at <a href="http://www.netbeans.org/download/dev/javadoc/org-openide-modules/org/openide/modules/doc-files/i18n-branding.html">
NetBeans I18N pages</a>.
</hint>
</question>
-->
<answer id="compat-i18n">
Yes.
</answer>
<!--
<question id="compat-standards" when="init">
Does the module implement or define any standards? Is the
implementation exact or does it deviate somehow?
</question>
-->
<answer id="compat-standards">
Compatible with standards.
</answer>
<!--
<question id="compat-version" when="impl">
Can your module coexist with earlier and future
versions of itself? Can you correctly read all old settings? Will future
versions be able to read your current settings? Can you read
or politely ignore settings stored by a future version?
<hint>
Very helpful for reading settings is to store version number
there, so future versions can decide whether how to read/convert
the settings and older versions can ignore the new ones.
</hint>
</question>
-->
<answer id="compat-version">
Yes.
</answer>
<!--
<question id="dep-jre" when="final">
Which version of JRE do you need (1.2, 1.3, 1.4, etc.)?
<hint>
It is expected that if your module runs on 1.x that it will run
on 1.x+1 if no, state that please. Also describe here cases where
you run different code on different versions of JRE and why.
</hint>
</question>
-->
<answer id="dep-jre">
JDK1.4 and higher can be used.
</answer>
<!--
<question id="dep-jrejdk" when="final">
Do you require the JDK or is the JRE enough?
</question>
-->
<answer id="dep-jrejdk">
JRE is sufficient.
</answer>
<!--
<question id="dep-nb" when="init">
What other NetBeans projects and modules does this one depend on?
<hint>
If you want, describe such projects as imported APIs using
the <code>&lt;api name="identification" type="import or export" category="stable" url="where is the description" /&gt;</code>
</hint>
</question>
-->
<answer id="dep-nb">
<defaultanswer generate='here'/>
</answer>
<!--
<question id="dep-non-nb" when="init">
What other projects outside NetBeans does this one depend on?
<hint>
Some non-NetBeans projects are packaged as NetBeans modules
(see <a href="http://libs.netbeans.org/">libraries</a>) and
it is preferred to use this approach when more modules may
depend on such third-party library.
</hint>
</question>
-->
<answer id="dep-non-nb">
No other projects.
</answer>
<!--
<question id="dep-platform" when="init">
On which platforms does your module run? Does it run in the same
way on each?
<hint>
If your module is using JNI or deals with special differences of
OSes like filesystems, etc. please describe here what they are.
</hint>
</question>
-->
<answer id="dep-platform">
All platforms.
</answer>
<!--
<question id="deploy-dependencies" when="final">
What do other modules need to do to declare a dependency on this one?
<hint>
Provide a sample of the actual lines you would add to a module manifest
to declare a dependency, for example using OpenIDE-Module-Module-Dependencies
or OpenIDE-Module-Requires. You may use the magic token @SPECIFICATION-VERSION@
to represent the current specification version of the module.
</hint>
</question>
-->
<answer id="deploy-dependencies">
<pre>
OpenIDE-Module-Module-Dependencies: org.netbeans.modules.lexer/2 &gt; @SPECIFICATION-VERSION@
</pre>
</answer>
<!--
<question id="deploy-jar" when="impl">
Do you deploy just module JAR file(s) or other files as well?
<hint>
If your module consists of just one module JAR file, just confirm that.
If it uses more than one JAR, describe where they are located, how
they refer to each other.
If it consist of module JAR(s) and other files, please describe
what is their purpose, why other files are necessary. Please
make sure that installation/uninstallation leaves the system
in state as it was before installation.
</hint>
</question>
-->
<answer id="deploy-jar">
No additional files.
</answer>
<!--
<question id="deploy-nbm" when="impl">
Can you deploy an NBM via the Update Center?
<hint>
If not why?
</hint>
</question>
-->
<answer id="deploy-nbm">
Yes.
</answer>
<!--
<question id="deploy-packages" when="init">
Are packages of your module made inaccessible by not declaring them
public?
<hint>
NetBeans module system allows restriction of access rights to
public classes of your module from other modules. This prevents
unwanted dependencies of others on your code and should be used
whenever possible (<a href="http://www.netbeans.org/download/javadoc/OpenAPIs/org/openide/doc-files/upgrade.html#3.4-public-packages">
public packages
</a>). If you do not restrict access to your classes you are
making it too easy for other people to misuse your implementation
details, that is why you should have good reason for not
restricting package access.
</hint>
</question>
-->
<answer id="deploy-packages">
Yes, where appropriate.
</answer>
<!--
<question id="deploy-shared" when="final">
Do you need to be installed in the shared location only, or in the user directory only,
or can your module be installed anywhere?
<hint>
Installation location shall not matter, if it does explain why.
Consider also whether <code>InstalledFileLocator</code> can help.
</hint>
</question>
-->
<answer id="deploy-shared">
Anywhere.
</answer>
<!--
<question id="exec-classloader" when="impl">
Does your code create its own class loader(s)?
<hint>
A bit unusual. Please explain why and what for.
</hint>
</question>
-->
<answer id="exec-classloader">
No.
</answer>
<!--
<question id="exec-component" when="impl">
Is execution of your code influenced by any (string) property
of any of your components?
<hint>
Often <code>JComponent.getClientProperty</code>, <code>Action.getValue</code>
or <code>PropertyDescriptor.getValue</code>, etc. are used to influence
a behavior of some code. This of course forms an interface that should
be documented. Also if one depends on some interface that an object
implements (<code>component instanceof Runnable</code>) that forms an
API as well.
</hint>
</question>
-->
<answer id="exec-component">
No.
</answer>
<!--
<question id="exec-introspection" when="impl">
Does your module use any kind of runtime type information (<code>instanceof</code>,
work with <code>java.lang.Class</code>, etc.)?
<hint>
Check for cases when you have an object of type A and you also
expect it to (possibly) be of type B and do some special action. That
should be documented. The same applies on operations in meta-level
(Class.isInstance(...), Class.isAssignableFrom(...), etc.).
</hint>
</question>
-->
<answer id="exec-introspection">
No.
</answer>
<!--
<question id="exec-privateaccess" when="final">
Are you aware of any other parts of the system calling some of
your methods by reflection?
<hint>
If so, describe the "contract" as an API. Likely private or friend one, but
still API and consider rewrite of it.
</hint>
</question>
-->
<answer id="exec-privateaccess">
No.
</answer>
<!--
<question id="exec-process" when="impl">
Do you execute an external process from your module? How do you ensure
that the result is the same on different platforms? Do you parse output?
Do you depend on result code?
<hint>
If you feed an input, parse the output please declare that as an API.
</hint>
</question>
-->
<answer id="exec-process">
No.
</answer>
<!--
<question id="exec-property" when="impl">
Is execution of your code influenced by any environment or
Java system (<code>System.getProperty</code>) property?
<hint>
If there is a property that can change the behavior of your
code, somebody will likely use it. You should describe what it does
and the <a href="http://openide.netbeans.org/tutorial/api-design.html#life">stability category</a>
of this API. You may use
<pre>
&lt;api type="export" group="property" name="id" category="private" url="http://..."&gt;
description of the property, where it is used, what it influence, etc.
&lt;/api&gt;
</pre>
</hint>
</question>
-->
<answer id="exec-property">
<api type="export" group="logger" name="org.netbeans.lib.lexer.TokenHierarchyOperation" category="friend">
<code>FINE</code> level lists lexer changes made in tokens both at the root level
and embedded levels of the token hierarchy after each document modification.
<br/>
<code>FINER</code> level additionally checks the whole token hierarchy
for internal consistency after each modification.
</api>
<api type="export" group="logger" name="org.netbeans.lib.lexer.TokenList" category="friend">
<code>FINE</code> level forces the lexer to perform more thorough and strict checks
in certain situations, so it is useful mainly for tests.
Lookahead and state information is generated even for batch-lexed inputs, which makes it
easier to check the correctness of the incremental algorithm (fixing of the token list after a modification).
Additional checks are also performed
to verify the correctness of the framework and of the SPI implementation
classes being used (for example, when flyweight tokens are created, the text
passed to the token factory is compared to the text in the lexer input).
</api>
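<p>
A minimal sketch of raising these logger levels from test code. It assumes the
loggers listed above are plain <code>java.util.logging</code> loggers; the class
and method names are hypothetical and are not part of the lexer module.
</p>
<pre>
```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LexerLoggingSketch {
    // Logger names are taken from the api entries above.
    static final String HIERARCHY_LOGGER = "org.netbeans.lib.lexer.TokenHierarchyOperation";
    static final String TOKEN_LIST_LOGGER = "org.netbeans.lib.lexer.TokenList";

    // Strong references keep the configured loggers from being garbage collected,
    // because the JDK LogManager only holds loggers weakly.
    static final Logger hierarchyLogger = Logger.getLogger(HIERARCHY_LOGGER);
    static final Logger tokenListLogger = Logger.getLogger(TOKEN_LIST_LOGGER);

    // Assumption: the module consults these loggers through java.util.logging.
    static void enableLexerChecks() {
        hierarchyLogger.setLevel(Level.FINER); // change listing plus consistency checks
        tokenListLogger.setLevel(Level.FINE);  // stricter checks, useful mainly for tests
    }

    public static void main(String[] args) {
        enableLexerChecks();
        System.out.println(hierarchyLogger.isLoggable(Level.FINER));
        System.out.println(tokenListLogger.isLoggable(Level.FINE));
    }
}
```
</pre>
<p>
In a running IDE the same levels are commonly set with a system property of the
form <code>-J-Dorg.netbeans.lib.lexer.TokenList.level=FINE</code>; treat that
switch as an assumption to verify against the platform logging documentation.
</p>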
</answer>
<!--
<question id="exec-reflection" when="impl">
Does your code use Java Reflection to execute other code?
<hint>
This usually indicates a missing or insufficient API in the other
part of the system. If the other side is not aware of your dependency
this contract can be easily broken.
</hint>
</question>
-->
<answer id="exec-reflection">
No.
</answer>
<!--
<question id="exec-threading" when="impl">
What threading models, if any, does your module adhere to?
<hint>
If your module calls foreign APIs which have a specific threading model,
indicate how you comply with the requirements for multithreaded access
(synchronization, mutexes, etc.) applicable to those APIs.
If your module defines any APIs, or has complex internal structures
that might be used from multiple threads, declare how you protect
data against concurrent access, race conditions, deadlocks, etc.,
and whether such rules are enforced by runtime warnings, errors, assertions, etc.
Examples: a class might be non-thread-safe (like Java Collections); might
be fully thread-safe (internal locking); might require access through a mutex
(and may or may not automatically acquire that mutex on behalf of a client method);
might be able to run only in the event queue; etc.
Also describe when any events are fired: synchronously, asynchronously, etc.
Ideas: <a href="http://core.netbeans.org/proposals/threading/index.html#recommendations">Threading Recommendations</a> (in progress)
</hint>
</question>
-->
<answer id="exec-threading">
Use of token hierarchies for mutable input sources
must adhere to the locking mechanisms for the input sources themselves.
<br/>
For example, accessing the token hierarchy of a Swing document
requires read- or write-locking the document before the token hierarchy is accessed.
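<p>
A minimal sketch of the read-locking pattern for a Swing document, using the
standard <code>Document.render(Runnable)</code> method. The class and method
names are hypothetical; the place where the token hierarchy would be obtained
is only marked by a comment, because the sketch avoids depending on the lexer API.
</p>
<pre>
```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

public class DocumentReadLockSketch {
    // Runs the reading code under the document's read lock.
    static String readUnderLock(Document doc) {
        String[] result = new String[1];
        doc.render(() -> {
            try {
                // The document cannot be modified while this runnable executes.
                // Assumption: token hierarchy access (for example obtaining it
                // with TokenHierarchy.get(doc)) would be placed here.
                result[0] = doc.getText(0, doc.getLength());
            } catch (BadLocationException e) {
                throw new IllegalStateException(e);
            }
        });
        return result[0];
    }

    // Small self-contained demonstration on an in-memory document.
    static String demo() {
        try {
            Document doc = new PlainDocument();
            doc.insertString(0, "int x;", null);
            return readUnderLock(doc);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
</pre>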
</answer>
<!--
<question id="format-clipboard" when="impl">
Which data flavors (if any) does your code read from or insert to
the clipboard (by access to clipboard one means calling methods on <code>java.awt.datatransfer.Transferable</code>)?
<hint>
Often Nodes deal with clipboard by usage of <code>Node.clipboardCopy, Node.clipboardCut and Node.pasteTypes</code>.
Check your code for overriding these methods.
</hint>
</question>
-->
<answer id="format-clipboard">
No clipboard support.
</answer>
<!--
<question id="format-dnd" when="impl">
Which protocols (if any) does your code understand during Drag &amp; Drop?
<hint>
Often Node's deal with clipboard by usage of <code>Node.drag, Node.getDropType</code>.
Check your code for overriding these methods. Btw. if they are not overridden, they
by default delegate to <code>Node.clipboardCopy, Node.clipboardCut and Node.pasteTypes</code>.
</hint>
</question>
-->
<answer id="format-dnd">
No D&amp;D.
</answer>
<!--
<question id="format-types" when="impl">
Which protocols and file formats (if any) does your module read or write on disk,
or transmit or receive over the network?
</question>
-->
<answer id="format-types">
No files are read from or written to disk.
</answer>
<!--
<question id="lookup-lookup" when="init">
Does your module use <code>org.openide.util.Lookup</code>
or any similar technology to find any components to communicate with? Which ones?
<hint>
Please describe the interfaces you are searching for, where
are defined, whether you are searching for just one or more of them,
if the order is important, etc. Also classify the stability of such
API contract.
</hint>
</question>
-->
<answer id="lookup-lookup">
No.
</answer>
<!--
<question id="lookup-register" when="final">
Do you register anything into lookup for other code to find?
<hint>
Do you register using layer file or using <code>META-INF/services</code>?
Who is supposed to find your component?
</hint>
</question>
-->
<answer id="lookup-register">
No.
</answer>
<!--
<question id="lookup-remove" when="final">
Do you remove entries of other modules from lookup?
<hint>
Why? Of course, that is possible, but it can be dangerous. Is the module
you are masking resources from aware of what you are doing?
</hint>
</question>
-->
<answer id="lookup-remove">
No.
</answer>
<!--
<question id="perf-exit" when="final">
Does your module run any code on exit?
</question>
-->
<answer id="perf-exit">
No.
</answer>
<!--
<question id="perf-huge_dialogs" when="final">
Does your module contain any dialogs or wizards with a large number of
GUI controls such as combo boxes, lists, trees, or text areas?
</question>
-->
<answer id="perf-huge_dialogs">
No.
</answer>
<!--
<question id="perf-limit" when="init">
Are there any hard-coded or practical limits in the number or size of
elements your code can handle?
</question>
-->
<answer id="perf-limit">
No practical limits.
</answer>
<!--
<question id="perf-mem" when="final">
How much memory does your component consume? Estimate
with a relation to the number of windows, etc.
</question>
-->
<answer id="perf-mem">
Memory consumption is critical for created tokens because there can be thousands
of tokens per typical document. Thus there are several basic token types:
<ul>
<li>DefaultToken: 24 bytes </li>
<li>StringToken: 32 bytes (but only used for flyweight tokens)</li>
<li>PrepToken: 32 bytes plus text storage size (but only used
for tokens where character preprocessing was necessary)
</li>
</ul>
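<p>
A rough, hedged estimate of what the per-token size means in practice. The sketch
assumes every token is a <code>DefaultToken</code> of 24 bytes and ignores the
overhead of the token list itself; the token count is the 370,000-token input
mentioned in the perf-scale answer. The class and method names are hypothetical.
</p>
<pre>
```java
public class TokenMemoryEstimate {
    // Per-instance size quoted above for DefaultToken.
    static final long DEFAULT_TOKEN_BYTES = 24;

    // Token objects only; list and array overhead is deliberately ignored.
    static long estimateDefaultTokenBytes(long tokenCount) {
        return Math.multiplyExact(tokenCount, DEFAULT_TOKEN_BYTES);
    }

    public static void main(String[] args) {
        long bytes = estimateDefaultTokenBytes(370_000);
        System.out.println(bytes + " bytes"); // 370,000 * 24 = 8,880,000 bytes
    }
}
```
</pre>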
</answer>
<!--
<question id="perf-menus" when="final">
Does your module use dynamically updated context menus, or
context-sensitive actions with complicated enablement logic?
</question>
-->
<answer id="perf-menus">
No.
</answer>
<!--
<question id="perf-progress" when="final">
Does your module execute any long-running tasks?
<hint>Long running tasks should never block
AWT thread as it badly hurts the UI
<a href="http://performance.netbeans.org/responsiveness/issues.html">
responsiveness</a>.
Tasks like connecting over
network, computing huge amount of data, compilation
should be done asynchronously (for example
using <code>RequestProcessor</code>); they definitely should
not block AWT thread.
</hint>
</question>
-->
<answer id="perf-progress">
All tasks should be fine-grained.
Both batch and incremental lexing are done lazily as clients ask for tokens.
<br/>
The only potential long-running task is relexing of a very long portion of a document,
e.g. if someone types '/*' at the beginning of a Java document
without any comments, the whole document turns into an unclosed comment.
<br/>
This typically isn't a problem as long as the very long token does not need to be lexed
several times (the original support without permanent tokens had to lex the token
upon each request).
<br/>
The lexer framework further improves the situation by introducing
token validation, which checks
whether the typed character can really affect the token
or whether it is enough to fix the original token's length.
</answer>
<!--
<question id="perf-scale" when="init">
Which external criteria influence the performance of your
program (size of file in editor, number of files in menu,
in source directory, etc.) and how well your code scales?
<hint>
Please include some estimates, there are other more detailed
questions to answer in later phases of implementation.
</hint>
</question>
-->
<answer id="perf-scale">
On a typical machine the framework is able to produce about 370,000 tokens
from a text input of 1 million characters in less than 0.5 seconds.
</answer>
<!--
<question id="perf-spi" when="init">
How the performance of the plugged in code will be enforced?
<hint>
If you allow foreign code to be plugged into your own module, how
do you enforce that it will behave correctly and quickly and will not
negatively influence the performance of your own module?
</hint>
</question>
-->
<answer id="perf-spi">
Token change listener implementations should be written to execute quickly.
For complex tasks they should reschedule their work to another thread.
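<p>
A minimal sketch of a listener that returns immediately and hands the expensive
work to a background thread. The <code>ChangeListener</code> interface below is a
hypothetical stand-in and is not the lexer's listener API; a plain
<code>java.util.concurrent</code> executor is used here in place of any
platform-specific request processor.
</p>
<pre>
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DeferredListenerSketch {
    // Hypothetical stand-in for a token change listener; not the lexer API.
    interface ChangeListener {
        void changed(String description);
    }

    // A single daemon worker keeps deferred work ordered and does not block JVM exit.
    static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "listener-worker");
        t.setDaemon(true);
        return t;
    });

    // Wraps slow work so that the notifying thread only pays for the hand-off.
    static ChangeListener deferring(ChangeListener slowWork) {
        return description -> WORKER.execute(() -> slowWork.changed(description));
    }

    // Fires one notification and waits until the background work has run.
    static boolean demo() {
        CountDownLatch done = new CountDownLatch(1);
        ChangeListener listener = deferring(description -> done.countDown());
        listener.changed("token list modified");
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
</pre>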
</answer>
<!--
<question id="perf-startup" when="final">
Does your module run any code on startup?
</question>
-->
<answer id="perf-startup">
No.
</answer>
<!--
<question id="perf-wakeup" when="final">
Does any piece of your code wake up periodically and do something
even when the system is otherwise idle (no user interaction)?
</question>
-->
<answer id="perf-wakeup">
No.
</answer>
<!--
<question id="resources-file" when="final">
Does your module use <code>java.io.File</code> directly?
<hint>
NetBeans provide a logical wrapper over plain files called
<code>org.openide.filesystems.FileObject</code> that
provides uniform access to such resources and is the preferred
way that should be used. But of course there can be situations when
this is not suitable.
</hint>
</question>
-->
<answer id="resources-file">
No.
</answer>
<!--
<question id="resources-layer" when="final">
Does your module provide own layer? Does it create any files or
folders in it? What it is trying to communicate by that and with which
components?
<hint>
NetBeans allows automatic and declarative installation of resources
by module layers. Module register files into appropriate places
and other components use that information to perform their task
(build menu, toolbar, window layout, list of templates, set of
options, etc.).
</hint>
</question>
-->
<answer id="resources-layer">
No.
</answer>
<!--
<question id="resources-mask" when="final">
Does your module mask/hide/override any resources provided by other modules in
their layers?
<hint>
If you mask a file provided by another module, you probably depend
on that and do not want the other module to (for example) change
the file's name. That module shall thus make that file available as an API
of some stability category.
</hint>
</question>
-->
<answer id="resources-mask">
No.
</answer>
<!--
<question id="resources-read" when="final">
Does your module read any resources from layers? For what purpose?
<hint>
As this is some kind of intermodule dependency, it is a kind of API.
Please describe it and classify according to
<a href="http://openide.netbeans.org/tutorial/api-design.html#categories">
common stability categories</a>.
</hint>
</question>
-->
<answer id="resources-read">
No.
</answer>
<!--
<question id="security-grant" when="final">
Does your code grant additional rights to some code?
<hint>Avoid using a classloader that adds some extra
permissions to loaded code unless really necessary.
Also note that your API implementation
can also expose unneeded permissions to enemy code by
AccessController.doPrivileged() calls.</hint>
</question>
-->
<answer id="security-grant">
No.
</answer>
<!--
<question id="security-policy" when="final">
Does your functionality require standard policy file modification?
<hint>Your code may pass control to third party code not
coming from trusted domain. It covers code downloaded over
network or code coming from libraries that are not bundled
with NetBeans. Which permissions it needs to grant to which domain?</hint>
</question>
-->
<answer id="security-policy">
No.
</answer>
<!--
<question id="exec-ant-tasks" when="impl">
Do you define or register any ant tasks that other can use?
<hint>
If you provide an ant task that users can use, you need to be very
careful about its syntax and behaviour, as it most likely forms an
API for end users and as there is a lot of end users, their reaction
when such API gets broken can be pretty strong.
</hint>
</question>
-->
<answer id="exec-ant-tasks">
No.
</answer>
<!--
<question id="arch-where" when="init">
Where one can find sources for your module?
<hint>
Please provide link to the CVS web client at
http://www.netbeans.org/download/source_browse.html
or just use tag defaultanswer generate='here'
</hint>
</question>
-->
<answer id="arch-where">
<defaultanswer generate='here' />
</answer>
<!--
<question id="compat-deprecation" when="init">
How the introduction of your project influences functionality
provided by previous version of the product?
<hint>
If you are planning to deprecate/remove/change any existing APIs,
list them here accompanied with the reason explaining why you
are doing so.
</hint>
</question>
-->
<answer id="compat-deprecation">
<p>
The current API completely replaces the original one; therefore
the major version of the module was increased from 1 to 2.
<br/>
There are no plans to deprecate any part of the present API,
and it should be evolved in a compatible way.
</p>
</answer>
<!--
<question id="resources-preferences" when="final">
Does your module use preferences via Preferences API? Does your module use NbPreferences
or regular JDK Preferences? Does it read, write or both?
Does it share preferences with other modules? If so, then why?
<hint>
You may use
&lt;api type="export" group="preferences"
name="preference node name" category="private"&gt;
description of individual keys, where it is used, what it
influences, whether the module reads/write it, etc.
&lt;/api&gt;
Due to XML ID restrictions, rather than /org/netbeans/modules/foo give the "name" as org.netbeans.modules.foo.
Note that if you use NbPreferences this name will then be the same as the code name base of the module.
</hint>
</question>
-->
<answer id="resources-preferences">
<p>
No.
</p>
</answer>
</api-answers>