<?xml version="1.0" encoding="UTF-8"?>
 <!--

     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 -->
 <!DOCTYPE api-answers PUBLIC "-//NetBeans//DTD Arch Answers//EN" "../nbbuild/antsrc/org/netbeans/nbbuild/Arch.dtd" [
   <!ENTITY api-questions SYSTEM "../nbbuild/antsrc/org/netbeans/nbbuild/Arch-api-questions.xml">
 ]>

 <api-answers
   question-version="1.29"
   author="mmetelka@netbeans.org"
 >

   &api-questions;


 <!--
         <question id="arch-what" when="init">
             What is this project good for?
             <hint>
             Please provide here a few lines describing the project,
             what problem it should solve, provide links to documentation,
             specifications, etc.
             </hint>
         </question>
 -->
 <answer id="arch-what">
 The lexer module provides token lists for various
 text inputs. Token lists can either be flat or, if any language embedding is present,
 they form a tree-like token hierarchy.
 </answer>


 <!--
         <question id="arch-overall" when="init">
             Describe the overall architecture.
             <hint>
             What will be API for
             <a href="http://openide.netbeans.org/tutorial/api-design.html#design.apiandspi">
                 clients and what support API</a>?
             What parts will be pluggable?
             How will plug-ins be registered? Please use <code>&lt;api type="export"/&gt;</code>
             to describe your general APIs.
             If possible please provide
             simple diagrams.
             </hint>
         </question>
 -->
 <answer id="arch-overall">
 The lexer module defines
 <api name="LexerAPI" group="java" type="export" category="official"/>
 providing access to sequences of tokens for various input sources.
 <br/>
 The <b>API entry point</b> is the
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
 class, whose static methods provide its instance for a given input source.

 <h3>Input Sources</h3>
 <p>
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
     can be created for immutable input sources
 (<a href="@JDK@/java/lang/CharSequence.html">CharSequence</a>
     or
 <a href="@JDK@/java/io/Reader.html">java.io.Reader</a>)
     or for mutable input sources (typically
 <a href="@JDK@/javax/swing/text/Document.html">javax.swing.text.Document</a>).
     <br/>
     For a mutable input source the lexer framework automatically updates the tokens
     in the token hierarchy whenever the underlying text input changes,
     so the tokens of the hierarchy always reflect the current text of the input.
 </p>

 <h3>TokenSequence and Token</h3>
 <p>
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html#tokenSequence--">TokenHierarchy.tokenSequence()</a>
    allows iterating over a list of
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html">Token</a>
     instances.
     <br/>
     The token carries a token identification
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html">TokenId</a>
     (returned by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#id--">Token.id()</a>)
     and a text, also called the token body, represented as a
 <a href="@JDK@/java/lang/CharSequence.html">CharSequence</a>
     (returned by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#text--">Token.text()</a>).
     <br/>
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html">TokenUtilities</a>
     contains many useful methods for operating on the token's text, such as
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#equals-java.lang.CharSequence-java.lang.Object-">TokenUtilities.equals(CharSequence text, Object o)</a>,
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#startsWith-java.lang.CharSequence-java.lang.CharSequence-">TokenUtilities.startsWith(CharSequence text, CharSequence prefix)</a>,
    etc.
     <br/>
    The token's text can also be converted into a debugging form (special characters replaced by escapes) by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenUtilities.html#debugText-java.lang.CharSequence-">TokenUtilities.debugText(CharSequence text)</a>.
     <br/>
     A typical token also carries the offset of its occurrence in the input text.
 </p>
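 <p>
     Because a token's text is a plain <code>CharSequence</code>, content comparison cannot rely on
     <code>equals()</code> (for example <code>"public".equals(new StringBuilder("public"))</code> is false).
     The plain-Java sketch below illustrates what methods such as <code>TokenUtilities.equals()</code>,
     <code>startsWith()</code> and <code>debugText()</code> are for. The class <code>TextUtil</code> is
     hypothetical and is not the lexer module's implementation; escaping only newline, carriage return
     and tab is an assumption of the sketch.
 </p>
 <pre><![CDATA[
```java
// Hypothetical helper illustrating content-based CharSequence operations;
// not the implementation of org.netbeans.api.lexer.TokenUtilities.
final class TextUtil {

    private TextUtil() {
    }

    // True if o is a CharSequence with exactly the same characters as text.
    static boolean textEquals(CharSequence text, Object o) {
        if (!(o instanceof CharSequence)) {
            return false;
        }
        CharSequence other = (CharSequence) o;
        if (text.length() != other.length()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != other.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // True if text begins with the characters of prefix.
    static boolean startsWith(CharSequence text, CharSequence prefix) {
        if (prefix.length() > text.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (text.charAt(i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Replaces newline, carriage return and tab by escape sequences
    // so that the text fits on one line of debug output.
    static String debugText(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```
 ]]></pre>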

 <h3>Flyweight Tokens</h3>
 <p>
     Because the text of many tokens is the same for all or most of their occurrences
     (e.g. Java keywords, operators or a single-space whitespace), memory consumption
     can be reduced considerably by allowing the creation of <b>flyweight token</b> instances,
     i.e. a single token instance shared by all occurrences of the token
     in all the inputs.
     <br/>
     Flyweight tokens can be determined by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#isFlyweight--">Token.isFlyweight()</a>.
     <br/>
     The flyweight tokens do not carry a valid offset (their internal offset is -1).
     <br/>
     Therefore
  <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html">TokenSequence</a>
     is used for iteration through the tokens (instead of a regular iterator) and it provides
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offset--">TokenSequence.offset()</a>
     which returns the proper offset even when positioned over a flyweight token.
     <br/>
     When holding a reference to the token's instance, its offset can also be determined by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#offset-org.netbeans.api.lexer.TokenHierarchy-">Token.offset(TokenHierarchy tokenHierarchy)</a>.
     The <code>tokenHierarchy</code> parameter should always be <code>null</code>; it is reserved
     for the token hierarchy snapshot support in future releases.
     <br/>
     For flyweight tokens
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html#offset-org.netbeans.api.lexer.TokenHierarchy-">Token.offset(TokenHierarchy tokenHierarchy)</a>
     returns -1, and for regular tokens it returns the same value as
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offset--">TokenSequence.offset()</a>.
 </p>
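 <p>
     The flyweight idea can be sketched in plain Java. The classes <code>SimpleToken</code> and
     <code>FlyweightDemo</code> are hypothetical and do not use the lexer module; they only show that one
     shared instance serves every occurrence of a fixed-text token, that such an instance cannot store
     an offset (-1), and that a sequence can still derive the offset from the lengths of the preceding tokens.
 </p>
 <pre><![CDATA[
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token: flyweight instances carry offset -1.
final class SimpleToken {
    final String id;
    final String text;
    final int offset;

    SimpleToken(String id, String text, int offset) {
        this.id = id;
        this.text = text;
        this.offset = offset;
    }

    boolean isFlyweight() {
        return offset == -1;
    }
}

final class FlyweightDemo {
    // One shared instance per (id, text) pair; a plain HashMap, so not thread-safe.
    private static final Map<String, SimpleToken> FLYWEIGHTS = new HashMap<>();

    private FlyweightDemo() {
    }

    static SimpleToken flyweight(String id, String fixedText) {
        return FLYWEIGHTS.computeIfAbsent(id + ":" + fixedText,
                key -> new SimpleToken(id, fixedText, -1));
    }

    // The sequence knows the position, so it can compute a valid offset
    // even when the token itself is a flyweight.
    static int offsetOf(List<SimpleToken> tokens, int index) {
        int offset = 0;
        for (int i = 0; i < index; i++) {
            offset += tokens.get(i).text.length();
        }
        return offset;
    }
}
```
 ]]></pre>
 <p>
     A cache shared between threads would need a concurrent map instead of the plain <code>HashMap</code> used here.
 </p>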

 <p>
     There may be applications where using flyweight tokens could be problematic.
     For example, if a parser used token instances
     in parse tree nodes to determine the nodes' boundaries, the flyweight tokens
     would always return offset -1, so the positions of the parse tree nodes
     could not generally be determined from the tokens alone.
     <br/>
     Therefore there is a possibility to de-flyweight a token by using
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#offsetToken--">TokenSequence.offsetToken()</a>
     which checks the current token
     and if it's flyweight then it replaces it with a non-flyweight token instance
     with a valid offset and with the same properties as the original flyweight token.
 </p>
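 <p>
     A hypothetical model of de-flyweighting, in the spirit of <code>TokenSequence.offsetToken()</code>:
     the flyweight instance at a position is replaced by an equivalent non-flyweight copy carrying the
     computed offset. <code>OffsetTok</code> and <code>TokenListModel</code> are illustrative names only,
     not lexer module classes.
 </p>
 <pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token; offset -1 marks a shared flyweight instance.
final class OffsetTok {
    final String id;
    final String text;
    final int offset;

    OffsetTok(String id, String text, int offset) {
        this.id = id;
        this.text = text;
        this.offset = offset;
    }
}

final class TokenListModel {
    private final List<OffsetTok> tokens = new ArrayList<>();

    void add(OffsetTok token) {
        tokens.add(token);
    }

    OffsetTok get(int index) {
        return tokens.get(index);
    }

    // Offset of the token at index, derived from the lengths of the preceding tokens.
    private int computeOffset(int index) {
        int offset = 0;
        for (int i = 0; i < index; i++) {
            offset += tokens.get(i).text.length();
        }
        return offset;
    }

    // Replaces a flyweight at index by a copy with a valid offset and returns it.
    OffsetTok offsetToken(int index) {
        OffsetTok t = tokens.get(index);
        if (t.offset == -1) {
            t = new OffsetTok(t.id, t.text, computeOffset(index));
            tokens.set(index, t); // later lookups see the non-flyweight copy
        }
        return t;
    }
}
```
 ]]></pre>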

 <h3>TokenId and Language</h3>
 <p>
     Token is identified by its id represented by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html">TokenId</a>
     interface. Token ids for a language are typically implemented as Java enums (extending
 <a href="@JDK@/java/lang/Enum.html">Enum</a>),
     but this is not mandatory.
     <br/>
     All token ids for the given language are described by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>.
     <br/>
     Each token id may belong
     to one or more token categories, which make it easier to operate on
     tokens of the same kind (e.g. keywords or operators).
     <br/>
     Each token id may define its primary category
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenId.html#primaryCategory--">TokenId.primaryCategory()</a>
     and
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createTokenCategories--">LanguageHierarchy.createTokenCategories()</a>
     may provide additional categories for the token ids for the given language.
     <br/>
     Each language description has a mandatory mime-type specification,
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html#mimeType--">Language.mimeType()</a>.
     <br/>
     Although this information is not strictly related to lexing, it brings many benefits
     because the mime type allows the language to be accompanied
     by arbitrary kinds of settings (e.g. syntax coloring information).
 </p>
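 <p>
     The relationship between token ids and categories can be modelled with a plain Java enum.
     In this sketch <code>DemoTokenIdApi</code> is a local stand-in that only assumes the three methods
     used here (<code>name()</code>, <code>ordinal()</code>, <code>primaryCategory()</code>), and
     <code>DemoTokenId</code> is hypothetical. <code>categories()</code> groups the ids by primary category;
     additional categories such as those from <code>createTokenCategories()</code> are not modelled.
 </p>
 <pre><![CDATA[
```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Local stand-in for a token id interface (assumption: only these methods).
interface DemoTokenIdApi {
    String name();
    int ordinal();
    String primaryCategory();
}

// name() and ordinal() are inherited from java.lang.Enum.
enum DemoTokenId implements DemoTokenIdApi {
    IF("keyword"), WHILE("keyword"),
    PLUS("operator"), MINUS("operator"),
    IDENTIFIER(null), WHITESPACE("whitespace");

    private final String primaryCategory;

    DemoTokenId(String primaryCategory) {
        this.primaryCategory = primaryCategory;
    }

    @Override
    public String primaryCategory() {
        return primaryCategory;
    }

    // Groups the ids by primary category; ids without a category are left out.
    static Map<String, Set<DemoTokenId>> categories() {
        Map<String, Set<DemoTokenId>> cats = new LinkedHashMap<>();
        for (DemoTokenId id : values()) {
            if (id.primaryCategory != null) {
                cats.computeIfAbsent(id.primaryCategory,
                        k -> EnumSet.noneOf(DemoTokenId.class)).add(id);
            }
        }
        return cats;
    }
}
```
 ]]></pre>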

 <h3>LanguageHierarchy, Lexer, LexerInput and TokenFactory</h3>
 <p>
     SPI providers wishing to provide a
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>
     first need to define its SPI counterpart
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html">LanguageHierarchy</a>.
     It mainly needs to define token ids in
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createTokenIds--">LanguageHierarchy.createTokenIds()</a>
     and lexer in
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#createLexer-org.netbeans.spi.lexer.LexerRestartInfo-">
     LanguageHierarchy.createLexer(LexerRestartInfo info)</a>.
     <br/>
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/Lexer.html">Lexer</a>
     reads characters from
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LexerInput.html">LexerInput</a>
     and breaks the text into tokens.
     <br/>
     Tokens are produced by using methods of
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/TokenFactory.html">TokenFactory</a>.
     <br/>
     Because per-token memory consumption is critical,
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Token.html">Token</a>
     has no counterpart in the SPI. In addition, the framework prevents instantiation
     of any token classes other than those contained in the lexer module's implementation.
 </p>
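 <p>
     The division of work between a lexer and its character input can be sketched without the lexer module.
     <code>CharInput</code> below is a hypothetical stand-in for <code>LexerInput</code> that assumes only the
     behaviour used by the examples in this document: <code>read()</code> returns <code>EOF</code> past the end,
     reading EOF still counts as a read character, and <code>backup()</code> un-reads characters.
     <code>takeToken()</code> plays the role of the token factory by turning the consumed text into a token;
     <code>WordLexer</code> is likewise hypothetical.
 </p>
 <pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer input.
final class CharInput {
    static final int EOF = -1;
    private final CharSequence text;
    private int tokenStart; // start of the token being recognized
    private int readIndex;  // next char to read; an EOF read also advances it

    CharInput(CharSequence text) {
        this.text = text;
    }

    int read() {
        int c = (readIndex < text.length()) ? text.charAt(readIndex) : EOF;
        readIndex++;
        return c;
    }

    void backup(int count) {
        readIndex -= count;
    }

    // Returns the consumed text as the next token and starts a new token.
    String takeToken() {
        String s = text.subSequence(tokenStart, readIndex).toString();
        tokenStart = readIndex;
        return s;
    }
}

final class WordLexer {
    private final CharInput input;

    WordLexer(CharInput input) {
        this.input = input;
    }

    // Returns "WORD:...", "SPACE:..." or "OTHER:...", or null at the end of input.
    String nextToken() {
        int c = input.read();
        if (c == CharInput.EOF) {
            input.backup(1);
            return null;
        }
        if (Character.isLetter(c)) {
            do {
                c = input.read();
            } while (c != CharInput.EOF && Character.isLetter(c));
            input.backup(1); // un-read the char (or EOF) that ended the word
            return "WORD:" + input.takeToken();
        }
        if (Character.isWhitespace(c)) {
            do {
                c = input.read();
            } while (c != CharInput.EOF && Character.isWhitespace(c));
            input.backup(1);
            return "SPACE:" + input.takeToken();
        }
        return "OTHER:" + input.takeToken();
    }

    static List<String> lexAll(String text) {
        WordLexer lexer = new WordLexer(new CharInput(text));
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```
 ]]></pre>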


 <h3>Language Embedding</h3>
 <p>
     With language embedding the flat list of tokens becomes in fact a tree-like hierarchy
     represented by the
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
     class. Each token can potentially be broken into a sequence of embedded tokens.
     <br/>The
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenSequence.html#embedded--">TokenSequence.embedded()</a>
     method can be called to obtain the embedded tokens (when positioned on the branch token).
     <br/>
     There are two ways of specifying what language is embedded in a token. The language
     can either be specified explicitly (hardcoded) in the
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageHierarchy.html#embedding-org.netbeans.api.lexer.Token-org.netbeans.api.lexer.LanguagePath-org.netbeans.api.lexer.InputAttributes-">LanguageHierarchy.embedding()</a>
     method or there can be a
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageProvider.html">LanguageProvider</a>
     registered in the default Lookup, which will create a
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/Language.html">Language</a>
     for the embedded language.
     <br/>
     There is no limit on the depth of a language hierarchy and there can be as many embedded languages
     as needed.
     <br/>
     In SPI the language embedding is represented by
 <a href="@org-netbeans-modules-lexer@/org/netbeans/spi/lexer/LanguageEmbedding.html">LanguageEmbedding</a>.
 </p>
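 <p>
     A token tree with embeddings can be modelled as follows. <code>TreeToken</code> is a hypothetical class,
     and the recursive <code>walk()</code> corresponds roughly to calling <code>TokenSequence.embedded()</code>
     on each branch token; as in the lexer framework, the depth is limited only by the data.
 </p>
 <pre><![CDATA[
```java
import java.util.List;

// Hypothetical token model: a token may carry an embedded token list
// in another language (null when the token has no embedding).
final class TreeToken {
    final String language;
    final String text;
    final List<TreeToken> embedded;

    TreeToken(String language, String text, List<TreeToken> embedded) {
        this.language = language;
        this.text = text;
        this.embedded = embedded;
    }

    // Depth-first walk; each entry has the form "depth:language:text".
    static void walk(List<TreeToken> tokens, int depth, List<String> out) {
        for (TreeToken t : tokens) {
            out.add(depth + ":" + t.language + ":" + t.text);
            if (t.embedded != null) {
                walk(t.embedded, depth + 1, out);
            }
        }
    }
}
```
 ]]></pre>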

 </answer>


 <!--
         <question id="arch-usecases" when="init">
             Describe the main <a href="http://openide.netbeans.org/tutorial/api-design.html#usecase">
             use cases</a> of the new API. Who will use it under
             what circumstances? What kind of code would typically need to be written
             to use the module?
         </question>
 -->
 <answer id="arch-usecases">

 <!-- API Usecases - API Usecases - API Usecases - API Usecases - API Usecases -->

 <h1>
 API Usecases
 </h1>

 <h3>
 Obtaining a token hierarchy for various inputs.
 </h3>
 The
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/TokenHierarchy.html">TokenHierarchy</a>
 is the entry point into the Lexer API
 and it represents the given input in terms of tokens.
 <pre>
     String text = "public void m() { }";
     TokenHierarchy hi = TokenHierarchy.create(text, JavaLanguage.description());
 </pre>

 <br/>
 A token hierarchy for a Swing document must only be used while holding the document's read (or write) lock.
 <pre>
     AbstractDocument doc = (AbstractDocument) document; // readLock() is defined by AbstractDocument
     doc.readLock();
     try {
         TokenHierarchy&lt;?&gt; hi = TokenHierarchy.get(doc);
         // ... explore tokens etc.
     } finally {
         doc.readUnlock();
     }
 </pre>
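 <p>
     The <code>readLock()</code> and <code>readUnlock()</code> methods are defined by
     <code>javax.swing.text.AbstractDocument</code>, not by the <code>Document</code> interface.
     The standard <code>Document.render(Runnable)</code> method is an alternative: for
     <code>AbstractDocument</code> subclasses it runs the task while holding the read lock.
     The sketch below uses only JDK classes, so it merely reads the text where token hierarchy
     code would normally go; <code>ReadLockDemo</code> is a hypothetical name.
 </p>
 <pre><![CDATA[
```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

final class ReadLockDemo {

    private ReadLockDemo() {
    }

    static Document createDocument(String text) {
        PlainDocument doc = new PlainDocument();
        try {
            doc.insertString(0, text, null);
        } catch (BadLocationException e) {
            throw new IllegalArgumentException(e);
        }
        return doc;
    }

    // Runs the work under the document's read lock via Document.render().
    static String readUnderLock(Document doc) {
        StringBuilder result = new StringBuilder();
        doc.render(() -> {
            try {
                // Token hierarchy exploration would go here; we only read the text.
                result.append(doc.getText(0, doc.getLength()));
            } catch (BadLocationException e) {
                throw new IllegalStateException(e);
            }
        });
        return result.toString();
    }
}
```
 ]]></pre>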


 <h3>
 Obtaining a token sequence for a Swing document and iterating over it from a given offset.
 </h3>
 The tokens cover the whole document, and the sequence can be iterated either forward or backward.
 <br/>
 Each token can contain a language embedding that can also be explored through the token sequence.
 The language embedding covers the whole text of the token, except that a few characters
 may be skipped at the beginning and end of the branch token.

 <pre>
     AbstractDocument doc = (AbstractDocument) document; // readLock() is defined by AbstractDocument
     doc.readLock();
     try {
         TokenHierarchy&lt;?&gt; hi = TokenHierarchy.get(doc);
         TokenSequence&lt;?&gt; ts = hi.tokenSequence();
         // If necessary move ts to the requested offset;
         // the following moveNext() then positions it on the token containing the offset
         ts.move(offset);
         while (ts.moveNext()) {
             Token&lt;?&gt; t = ts.token();
             if (t.id() == ...) { ... }
             if (TokenUtilities.equals(t.text(), "mytext")) { ... }
             if (ts.offset() == ...) { ... }

             // Possibly retrieve embedded token sequence
             TokenSequence&lt;?&gt; embedded = ts.embedded();
             if (embedded != null) { // Token has a valid language embedding
                 ...
             }
         }
     } finally {
         doc.readUnlock();
     }
 </pre>
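 <p>
     The positioning contract of <code>move(offset)</code> followed by <code>moveNext()</code> can be
     modelled over an array of token start offsets. <code>OffsetSequence</code> is a hypothetical,
     forward-only sketch that assumes <code>move()</code> places the sequence just before the token
     containing the offset; how it treats offsets outside the input is an assumption of the sketch.
 </p>
 <pre><![CDATA[
```java
import java.util.Arrays;

final class OffsetSequence {
    private final int[] starts; // start offset of each token, ascending
    private int index = -1;     // current token; -1 means "before the first token"

    OffsetSequence(int... tokenLengths) {
        starts = new int[tokenLengths.length];
        int offset = 0;
        for (int i = 0; i < tokenLengths.length; i++) {
            starts[i] = offset;
            offset += tokenLengths[i];
        }
    }

    // Positions the sequence just before the token containing offset.
    // In this sketch, offsets past the end select the last token
    // and negative offsets select the first one.
    void move(int offset) {
        int pos = Arrays.binarySearch(starts, offset);
        int tokenIndex = (pos >= 0) ? pos : -pos - 2; // token whose start precedes offset
        index = Math.max(tokenIndex, 0) - 1;
    }

    boolean moveNext() {
        if (index + 1 >= starts.length) {
            return false;
        }
        index++;
        return true;
    }

    int index() {
        return index;
    }

    int offset() {
        return starts[index];
    }
}
```
 ]]></pre>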

 <br/>
 Typical clients:
 <ul>
     <li>Editor's painting code doing syntax coloring
         <code>org.netbeans.modules.lexer.editorbridge.LexerLayer</code> in <i>lexer/editorbridge</i> module.
     </li>
     <li>Brace matching code searching for matching brace in forward/backward direction.</li>
     <li>Code completion's quick check whether caret is located inside comment token.</li>
     <li>Parser constructing a parse tree iterating through the tokens in forward direction.</li>
 </ul>
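 <p>
     Brace matching shows why such clients work on token ids rather than raw characters: a parenthesis
     inside a comment or string literal is part of a comment or literal token and is therefore ignored
     automatically. A minimal forward search over a list of token ids might look like this; the enum
     <code>BraceTokenId</code> and the class <code>BraceMatcher</code> are hypothetical, and a backward
     search would be symmetric.
 </p>
 <pre><![CDATA[
```java
import java.util.List;

enum BraceTokenId { LPAREN, RPAREN, IDENTIFIER, WHITESPACE }

final class BraceMatcher {

    private BraceMatcher() {
    }

    // Index of the RPAREN matching the LPAREN at start, or -1 if there is none.
    static int findMatching(List<BraceTokenId> ids, int start) {
        if (ids.get(start) != BraceTokenId.LPAREN) {
            return -1;
        }
        int depth = 0;
        for (int i = start; i < ids.size(); i++) {
            BraceTokenId id = ids.get(i);
            if (id == BraceTokenId.LPAREN) {
                depth++;
            } else if (id == BraceTokenId.RPAREN) {
                depth--;
                if (depth == 0) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```
 ]]></pre>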

 <h3>
 Using the language path of a token sequence
 </h3>
 For a given token sequence the client may check whether it is the top-level
 token sequence of the token hierarchy or whether it is embedded and, if so,
 at which level it is embedded and what the parent languages are.

 <pre>
     TokenSequence ts = ...
     LanguagePath lp = ts.languagePath();
     if (lp.size() > 1) { ... } // This is an embedded token sequence
     if (lp.topLanguage() == JavaLanguage.description()) { ... } // top-level language of the token hierarchy
     String mimePath = lp.mimePath();
     Object settingValue = someSettings.getSetting(mimePath, settingName); // hypothetical settings lookup
 </pre>
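 <p>
     The information exposed by a language path can be modelled as an ordered list of mime types, from the
     top-level language down to the innermost embedded one. <code>MimeLanguagePath</code> is a hypothetical
     sketch; joining the mime types with '/' is an assumption about the format of
     <code>LanguagePath.mimePath()</code>.
 </p>
 <pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MimeLanguagePath {
    private final List<String> mimeTypes; // top-level language first

    MimeLanguagePath(List<String> mimeTypes) {
        if (mimeTypes.isEmpty()) {
            throw new IllegalArgumentException("empty language path");
        }
        this.mimeTypes = Collections.unmodifiableList(new ArrayList<>(mimeTypes));
    }

    int size() {
        return mimeTypes.size();
    }

    boolean isEmbedded() {
        return size() > 1;
    }

    String topLanguage() {
        return mimeTypes.get(0);
    }

    String innerLanguage() {
        return mimeTypes.get(mimeTypes.size() - 1);
    }

    // Mime types joined by '/', e.g. "text/x-java/text/x-javadoc" (assumed format).
    String mimePath() {
        return String.join("/", mimeTypes);
    }

    // Path for a language embedded in the innermost language of this path.
    MimeLanguagePath embedded(String innerMimeType) {
        List<String> list = new ArrayList<>(mimeTypes);
        list.add(innerMimeType);
        return new MimeLanguagePath(list);
    }
}
```
 ]]></pre>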


 <h3>
 Extra information about the input
 </h3>
 The
 <a href="@org-netbeans-modules-lexer@/org/netbeans/api/lexer/InputAttributes.html">InputAttributes</a>
 class may carry extra information about the text input on which the token hierarchy
 is being created. For example there can be information about the version of the language
 that the input represents and the lexer may be written to recognize multiple versions
 of the language. It should suffice to do the versioning through a simple integer:
 <pre>
 public class MyLexer implements Lexer&lt;MyTokenId&gt; {

     private final int version;

     ...

     public MyLexer(LexerRestartInfo&lt;MyTokenId&gt; info) {
         ...

         InputAttributes inputAttributes = info.inputAttributes();
         Integer ver = (inputAttributes != null)
                 ? (Integer)inputAttributes.getValue(info.languagePath(), "version")
                 : null;
         this.version = (ver != null) ? ver.intValue() : 1; // Use version 1 if not specified explicitly
     }

     public Token&lt;MyTokenId&gt; nextToken() {
         ...
         if (recognized-assert-keyword) {
             // "assert" recognized as keyword since version 4
             return (version &gt;= 4)
                 ? keyword(MyTokenId.ASSERT)
                 : identifier();
         }
         ...
     }
     ...
 }
 </pre>

 The client will then use the following code:
 <pre>
     InputAttributes attrs = new InputAttributes();
     // The "true" means global value i.e. for any occurrence of the MyLanguage including embeddings
     attrs.setValue(MyLanguage.description(), "version", Integer.valueOf(3), true);
     TokenHierarchy hi = TokenHierarchy.create(text, false, MyLanguage.description(), null, attrs);
     ...
 </pre>
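 <p>
     The versioning logic of the lexer above can be tried out in isolation. In this plain-Java sketch a
     <code>Map</code> stands in for <code>InputAttributes</code>, and <code>VersionedKeywords</code> with its
     <code>Kind</code> enum are hypothetical names. As in the lexer, version 1 is assumed when no value is set,
     and <code>assert</code> becomes a keyword from version 4, so the client setting above (version 3) still
     gets an identifier.
 </p>
 <pre><![CDATA[
```java
import java.util.Map;

final class VersionedKeywords {

    enum Kind { ASSERT_KEYWORD, IDENTIFIER }

    private VersionedKeywords() {
    }

    // Reads the "version" attribute, defaulting to 1 when it is absent.
    static int version(Map<String, Object> attrs) {
        Object value = (attrs != null) ? attrs.get("version") : null;
        return (value instanceof Integer) ? (Integer) value : 1;
    }

    // "assert" is recognized as a keyword since version 4.
    static Kind classify(CharSequence word, int version) {
        if ("assert".contentEquals(word) && version >= 4) {
            return Kind.ASSERT_KEYWORD;
        }
        return Kind.IDENTIFIER;
    }
}
```
 ]]></pre>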


 <h3>
 Filtering out unnecessary tokens
 </h3>
 Filtering is only possible for immutable inputs (e.g. String or Reader).
 <pre>
     Set&lt;MyTokenId&gt; skipIds = EnumSet.of(MyTokenId.COMMENT, MyTokenId.WHITESPACE);
     TokenHierarchy tokenHierarchy = TokenHierarchy.create(inputText, false,
         MyLanguage.description(), skipIds, null);
     ...
 </pre>

 <br/>
 Typical clients:
 <ul>
     <li>Parser constructing a parse tree. It is not interested
         in the comment and whitespace tokens so these tokens do not need
         to be constructed at all.
     </li>
 </ul>
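 <p>
     The effect of skipping token ids can be illustrated with a made-up mini language of words, whitespace
     and <code>#</code> line comments. In the hypothetical <code>FilteringLexer</code> the skipped text is
     still scanned, but no token is created for it, which is the kind of saving the filtering is meant to provide.
 </p>
 <pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

enum SkipTokenId { WORD, WHITESPACE, COMMENT }

final class FilteringLexer {

    private FilteringLexer() {
    }

    // Lexes words, whitespace and '#' comments (up to end of line) and
    // creates entries only for ids that are not in skipIds.
    static List<String> lex(String input, Set<SkipTokenId> skipIds) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            SkipTokenId id;
            if (c == '#') {
                while (i < input.length() && input.charAt(i) != '\n') {
                    i++;
                }
                id = SkipTokenId.COMMENT;
            } else if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                id = SkipTokenId.WHITESPACE;
            } else {
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))
                        && input.charAt(i) != '#') {
                    i++;
                }
                id = SkipTokenId.WORD;
            }
            if (!skipIds.contains(id)) {
                out.add(id + ":" + input.substring(start, i));
            }
        }
        return out;
    }
}
```
 ]]></pre>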


 <!-- SPI Usecases - SPI Usecases - SPI Usecases - SPI Usecases - SPI Usecases -->

 <h1>
 SPI Usecases
 </h1>

 <h3>
 Providing language description and lexer.
 </h3>

 Token ids should be defined as enums. For example,
 <code>org.netbeans.lib.lexer.test.simple.SimpleTokenId</code> can be copied,
 or the following example from
 <code>org.netbeans.modules.lexer.editorbridge.calc.lang.CalcTokenId</code> can be used as a template.
 <br/>
 The static <code>language()</code> method returns the language describing the token ids.
 <pre>
 public enum CalcTokenId implements TokenId {

     WHITESPACE(null, "whitespace"),
     SL_COMMENT(null, "comment"),
     ML_COMMENT(null, "comment"),
     E("e", "keyword"),
     PI("pi", "keyword"),
     IDENTIFIER(null, null),
     INT_LITERAL(null, "number"),
     FLOAT_LITERAL(null, "number"),
     PLUS("+", "operator"),
     MINUS("-", "operator"),
     STAR("*", "operator"),
     SLASH("/", "operator"),
     LPAREN("(", "separator"),
     RPAREN(")", "separator"),
     ERROR(null, "error"),
     ML_COMMENT_INCOMPLETE(null, "comment");


     private final String fixedText;

     private final String primaryCategory;

     private CalcTokenId(String fixedText, String primaryCategory) {
         this.fixedText = fixedText;
         this.primaryCategory = primaryCategory;
     }

     public String fixedText() {
         return fixedText;
     }

     public String primaryCategory() {
         return primaryCategory;
     }

     private static final Language&lt;CalcTokenId&gt; language = new LanguageHierarchy&lt;CalcTokenId&gt;() {
         <code>@Override</code>
         protected Collection&lt;CalcTokenId&gt; createTokenIds() {
             return EnumSet.allOf(CalcTokenId.class);
         }

         <code>@Override</code>
         protected Map&lt;String,Collection&lt;CalcTokenId&gt;&gt; createTokenCategories() {
             Map&lt;String,Collection&lt;CalcTokenId&gt;&gt; cats = new HashMap&lt;String,Collection&lt;CalcTokenId&gt;&gt;();

             // Incomplete literals
             cats.put("incomplete", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));
             // Additional literals being a lexical error
             cats.put("error", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));

             return cats;
         }

         <code>@Override</code>
         protected Lexer&lt;CalcTokenId&gt; createLexer(LexerRestartInfo&lt;CalcTokenId&gt; info) {
             return new CalcLexer(info);
         }

         <code>@Override</code>
         protected String mimeType() {
             return "text/x-calc";
         }

     }.language();

     public static final Language&lt;CalcTokenId&gt; language() {
         return language;
     }

 }
 </pre>

 Note that the underlying <code>LanguageHierarchy</code> extension does not need to be published.

 <br/>
 Lexer example:
 <pre>
 public final class CalcLexer implements Lexer&lt;CalcTokenId&gt; {

     private static final int EOF = LexerInput.EOF;

     private static final Map&lt;String,CalcTokenId&gt; keywords = new HashMap&lt;String,CalcTokenId&gt;();
     static {
         keywords.put(CalcTokenId.E.fixedText(), CalcTokenId.E);
         keywords.put(CalcTokenId.PI.fixedText(), CalcTokenId.PI);
     }

     private LexerInput input;

     private TokenFactory&lt;CalcTokenId&gt; tokenFactory;

     CalcLexer(LexerRestartInfo&lt;CalcTokenId&gt; info) {
         this.input = info.input();
         this.tokenFactory = info.tokenFactory();
         assert (info.state() == null); // passed argument always null
     }

     public Token&lt;CalcTokenId&gt; nextToken() {
         while (true) {
             int ch = input.read();
             switch (ch) {
                 case '+':
                     return token(CalcTokenId.PLUS);

                 case '-':
                     return token(CalcTokenId.MINUS);

                 case '*':
                     return token(CalcTokenId.STAR);

                 case '/':
                     switch (input.read()) {
                         case '/': // in single-line comment
                             while (true)
                                 switch (input.read()) {
                                     case '\r': input.consumeNewline();
                                     case '\n':
                                     case EOF:
                                         return token(CalcTokenId.SL_COMMENT);
                                 }
                         case '*': // in multi-line comment
                             while (true) {
                                 ch = input.read();
                                 while (ch == '*') {
                                     ch = input.read();
                                     if (ch == '/')
                                         return token(CalcTokenId.ML_COMMENT);
                                     else if (ch == EOF)
                                         return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
                                 }
                                 if (ch == EOF)
                                     return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
                             }
                     }
                     input.backup(1);
                     return token(CalcTokenId.SLASH);

                 case '(':
                     return token(CalcTokenId.LPAREN);

                 case ')':
                     return token(CalcTokenId.RPAREN);

                 case '0': case '1': case '2': case '3': case '4':
                 case '5': case '6': case '7': case '8': case '9':
                 case '.':
                     return finishIntOrFloatLiteral(ch);

                 case EOF:
                     return null;

                 default:
                     if (Character.isWhitespace((char)ch)) {
                         ch = input.read();
                         while (ch != EOF &amp;&amp; Character.isWhitespace((char)ch)) {
                             ch = input.read();
                         }
                         input.backup(1);
                         return token(CalcTokenId.WHITESPACE);
                     }

                     if (Character.isLetter((char)ch)) { // identifier or keyword
                         while (true) {
                             if (ch == EOF || !Character.isLetter((char)ch)) {
                                 input.backup(1); // backup the extra char (or EOF)
                                 // Check for keywords
                                 // readText() returns a CharSequence; convert it so the String-keyed map lookup works
                                 CalcTokenId id = keywords.get(input.readText().toString());
                                 if (id == null) {
                                     id = CalcTokenId.IDENTIFIER;
                                 }
                                 return token(id);
                             }
                             ch = input.read(); // read next char
                         }
                     }

                     return token(CalcTokenId.ERROR);
             }
         }
     }

     public Object state() {
         return null;
     }

     public void release() {
         // Nothing to release; this lexer holds no external resources
     }

     private Token&lt;CalcTokenId&gt; finishIntOrFloatLiteral(int ch) {
         boolean floatLiteral = false;
         boolean inExponent = false;
         while (true) {
             switch (ch) {
                 case '.':
                     if (floatLiteral) {
                         input.backup(1); // a second '.' is not part of this literal
                         return token(CalcTokenId.FLOAT_LITERAL);
                     } else {
                         floatLiteral = true;
                     }
                     break;
                 case '0': case '1': case '2': case '3': case '4':
                 case '5': case '6': case '7': case '8': case '9':
                     break;
                 case 'e': case 'E': // exponent part
                     if (inExponent) {
                         input.backup(1); // a second exponent char is not part of this literal
                         return token(CalcTokenId.FLOAT_LITERAL);
                     } else {
                         floatLiteral = true;
                         inExponent = true;
                     }
                     break;
                 default:
                     input.backup(1);
                     return token(floatLiteral ? CalcTokenId.FLOAT_LITERAL
                             : CalcTokenId.INT_LITERAL);
             }
             ch = input.read();
         }
     }

     private Token&lt;CalcTokenId&gt; token(CalcTokenId id) {
         return (id.fixedText() != null)
             ? tokenFactory.getFlyweightToken(id, id.fixedText())
             : tokenFactory.createToken(id);
     }

 }
 </pre>
 <p>
     The classes containing token ids and the language description should be
     part of an API. The lexer should only be part of the implementation.
 </p>


 <h3>
 Providing language embedding.
 </h3>

 The embedding may be provided statically
 by overriding <code>LanguageHierarchy.embedding()</code>;
 see e.g. <code>org.netbeans.lib.lexer.test.simple.SimpleLanguage</code>.

 <p>
     Alternatively, it may be provided dynamically through the XML layer
     by a file in the "Editors/language-mime-type/languagesEmbeddingMap" folder.
     The file is named after the token id and contains the target mime type followed by
     the number of characters to skip at the beginning and at the end of the token:
 </p>
 <pre>
     &lt;folder name="Editors"&gt;
         &lt;folder name="text"&gt;
             &lt;folder name="x-outer-language"&gt;
                 &lt;folder name="languagesEmbeddingMap"&gt;
                     &lt;file name="WORD"&gt;&lt;![CDATA[text/x-inner-language,1,2]]&gt;
                     &lt;/file&gt;
                 &lt;/folder&gt;
             &lt;/folder&gt;
         &lt;/folder&gt;
     &lt;/folder&gt;
 </pre>
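 <p>
     The content of such a file could be interpreted as in the sketch below. <code>EmbeddingSpec</code> is a
     hypothetical helper, not part of the lexer module; it assumes exactly the three comma-separated fields
     shown above (target mime type, characters skipped at the start of the token, characters skipped at its
     end) and returns the part of the token text that would be handed to the embedded language.
 </p>
 <pre><![CDATA[
```java
final class EmbeddingSpec {
    final String mimeType;
    final int startSkipLength;
    final int endSkipLength;

    private EmbeddingSpec(String mimeType, int startSkipLength, int endSkipLength) {
        this.mimeType = mimeType;
        this.startSkipLength = startSkipLength;
        this.endSkipLength = endSkipLength;
    }

    // Parses "mime-type,startSkip,endSkip", e.g. "text/x-inner-language,1,2".
    static EmbeddingSpec parse(String content) {
        String[] parts = content.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected mime-type,start,end: " + content);
        }
        return new EmbeddingSpec(parts[0].trim(),
                Integer.parseInt(parts[1].trim()),
                Integer.parseInt(parts[2].trim()));
    }

    // The token text without the skipped leading and trailing characters.
    String embeddedText(CharSequence tokenText) {
        if (startSkipLength + endSkipLength > tokenText.length()) {
            throw new IllegalArgumentException("token shorter than skip lengths");
        }
        int end = tokenText.length() - endSkipLength;
        return tokenText.subSequence(startSkipLength, end).toString();
    }
}
```
 ]]></pre>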

 </answer>


 <!--
         <question id="arch-quality" when="init">
             How will the <a href="http://www.netbeans.org/community/guidelines/q-evangelism.html">quality</a>
             of your code be tested and
             how are future regressions going to be prevented?
             <hint>
             What kind of testing do
             you want to use? How much functionality, in which areas,
             should be covered by the tests?
             </hint>
         </question>
 -->
 <answer id="arch-quality">
 The lexer module is completely unit-testable.
 <br/>
 Besides tests of its own correctness, it also provides support
 for testing the correctness of lexers supplied by SPI providers
 through the <code>org.netbeans.lib.lexer.test.TestRandomModify</code> class.
 <br/>
 The main method of testing lexer correctness is a token-by-token comparison
 of the updated token sequence with a token sequence batch-lexed from the same input.
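 <p>
     The batch-versus-updated comparison can be illustrated with a toy lexer that splits text into maximal
     runs of letters, digits or other characters. <code>RelexCheck</code> is hypothetical and unrelated to the
     implementation of <code>TestRandomModify</code>: it applies random edits, updates its token list by
     relexing from the token preceding the modification, and counts how often the result differs from
     lexing the whole new text from scratch. Restarting one token early matters because an insertion at the
     end of a token can extend that token.
 </p>
 <pre><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RelexCheck {

    static final class Tok {
        final int start;
        final String text;

        Tok(int start, String text) {
            this.start = start;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tok && ((Tok) o).start == start && ((Tok) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return 31 * start + text.hashCode();
        }

        @Override
        public String toString() {
            return start + ":" + text;
        }
    }

    private static int charClass(char c) {
        return Character.isLetter(c) ? 0 : Character.isDigit(c) ? 1 : 2;
    }

    // Batch lexing from offset 'from', which must be a token boundary.
    static List<Tok> lex(String text, int from) {
        List<Tok> out = new ArrayList<>();
        int i = from;
        while (i < text.length()) {
            int start = i;
            int cls = charClass(text.charAt(i));
            while (i < text.length() && charClass(text.charAt(i)) == cls) {
                i++;
            }
            out.add(new Tok(start, text.substring(start, i)));
        }
        return out;
    }

    // Keeps the tokens before the one preceding the modified token, relexes the rest.
    static List<Tok> update(List<Tok> old, String newText, int offset) {
        int k = 0; // first token ending after the modification offset
        while (k < old.size() && old.get(k).start + old.get(k).text.length() <= offset) {
            k++;
        }
        int restart = Math.max(0, k - 1); // back off one token
        List<Tok> result = new ArrayList<>(old.subList(0, restart));
        int restartOffset = (restart < old.size()) ? old.get(restart).start : 0;
        result.addAll(lex(newText, restartOffset));
        return result;
    }

    // Applies random replace edits and returns the number of mismatches with batch lexing.
    static int randomCheck(long seed, int rounds) {
        Random rnd = new Random(seed);
        String alphabet = "ab12 +";
        String text = "abc 12+x";
        List<Tok> tokens = lex(text, 0);
        int mismatches = 0;
        for (int r = 0; r < rounds; r++) {
            int offset = rnd.nextInt(text.length() + 1);
            int removed = rnd.nextInt(text.length() - offset + 1);
            char inserted = alphabet.charAt(rnd.nextInt(alphabet.length()));
            text = text.substring(0, offset) + inserted + text.substring(offset + removed);
            tokens = update(tokens, text, offset);
            if (!tokens.equals(lex(text, 0))) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```
 ]]></pre>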
 </answer>


 <!--
         <question id="arch-time" when="init">
             What are the time estimates of the work?
             <hint>
             Please express your estimates of how long the design, implementation,
             stabilization are likely to last. How many people will be needed to
             implement this and what is the expected milestone by which the work should be
             ready?
             </hint>
         </question>
 -->
 <answer id="arch-time">
 The present implementation is stable, but there are a few missing implementations
 and other things to be considered:
 <ul>
     <li>Dynamic language embedding binding through xml layer.</li>
     <li>CharPreprocessor servicing and tests.</li>
     <li>Token hierarchy for Reader.</li>
     <li>TokenFactory.createBranchToken() impl.</li>
     <li>Providing JavaCC and Antlr support.</li>
     <li>Support for token positions (may add API).</li>
 </ul>
 </answer>


 <!--
         <question id="compat-i18n" when="impl">
             Is your module correctly internationalized?
             <hint>
             Correct internationalization means that it obeys instructions
             at <a href="http://www.netbeans.org/download/dev/javadoc/org-openide-modules/org/openide/modules/doc-files/i18n-branding.html">
             NetBeans I18N pages</a>.
             </hint>
         </question>
 -->
 <answer id="compat-i18n">
 Yes.
 </answer>


 <!--
         <question id="compat-standards" when="init">
             Does the module implement or define any standards? Is the
             implementation exact or does it deviate somehow?
         </question>
 -->
 <answer id="compat-standards">
 Compatible with standards.
 </answer>


 <!--
         <question id="compat-version" when="impl">
             Can your module coexist with earlier and future
             versions of itself? Can you correctly read all old settings? Will future
             versions be able to read your current settings? Can you read
             or politely ignore settings stored by a future version?

             <hint>
             Very helpful for reading settings is to store version number
             there, so future versions can decide whether how to read/convert
             the settings and older versions can ignore the new ones.
             </hint>
         </question>
 -->
 <answer id="compat-version">
 Yes.
 </answer>


 <!--
         <question id="dep-jre" when="final">
             Which version of JRE do you need (1.2, 1.3, 1.4, etc.)?
             <hint>
             It is expected that if your module runs on 1.x it will also run
             on 1.x+1; if not, please state that. Also describe here cases where
             you run different code on different versions of JRE and why.
             </hint>
         </question>
 -->
 <answer id="dep-jre">
 JDK1.4 and higher can be used.
 </answer>


 <!--
         <question id="dep-jrejdk" when="final">
             Do you require the JDK or is the JRE enough?
         </question>
 -->
 <answer id="dep-jrejdk">
 JRE is sufficient.
 </answer>


 <!--
         <question id="dep-nb" when="init">
             What other NetBeans projects and modules does this one depend on?
             <hint>
             If you want, describe such projects as imported APIs using
             the <code>&lt;api name="identification" type="import or export" category="stable" url="where is the description" /&gt;</code>
             </hint>
         </question>
 -->
 <answer id="dep-nb">
     <defaultanswer generate='here'/>
 </answer>


 <!--
         <question id="dep-non-nb" when="init">
             What other projects outside NetBeans does this one depend on?

             <hint>
             Some non-NetBeans projects are packaged as NetBeans modules
             (see <a href="http://libs.netbeans.org/">libraries</a>) and
             it is preferred to use this approach when more modules may
             depend on such a third-party library.
             </hint>
         </question>
 -->
 <answer id="dep-non-nb">
 No other projects.
 </answer>


 <!--
         <question id="dep-platform" when="init">
             On which platforms does your module run? Does it run in the same
             way on each?
             <hint>
             If your module is using JNI or deals with special differences of
             OSes like filesystems, etc. please describe here what they are.
             </hint>
         </question>
 -->
 <answer id="dep-platform">
 All platforms.
 </answer>


 <!--
         <question id="deploy-dependencies" when="final">
             What do other modules need to do to declare a dependency on this one?
             <hint>
                 Provide a sample of the actual lines you would add to a module manifest
                 to declare a dependency, for example using OpenIDE-Module-Module-Dependencies
                 or OpenIDE-Module-Requires. You may use the magic token @SPECIFICATION-VERSION@
                 to represent the current specification version of the module.
             </hint>
         </question>
 -->
 <answer id="deploy-dependencies">
 <pre>
 OpenIDE-Module-Module-Dependencies: org.netbeans.modules.lexer/2 &gt; @SPECIFICATION-VERSION@
 </pre>
 </answer>


 <!--
         <question id="deploy-jar" when="impl">
             Do you deploy just module JAR file(s) or other files as well?
             <hint>
             If your module consists of just one module JAR file, just confirm that.
             If it uses more than one JAR, describe where they are located, how
             they refer to each other.
             If it consists of module JAR(s) and other files, please describe
             what is their purpose, why other files are necessary. Please
             make sure that installation/uninstallation leaves the system
             in state as it was before installation.
             </hint>
         </question>
 -->
 <answer id="deploy-jar">
 No additional files.
 </answer>


 <!--
         <question id="deploy-nbm" when="impl">
             Can you deploy an NBM via the Update Center?
             <hint>
             If not why?
             </hint>
         </question>
 -->
 <answer id="deploy-nbm">
 Yes.
 </answer>


 <!--
         <question id="deploy-packages" when="init">
             Are packages of your module made inaccessible by not declaring them
             public?

             <hint>
             NetBeans module system allows restriction of access rights to
             public classes of your module from other modules. This prevents
             unwanted dependencies of others on your code and should be used
             whenever possible (<a href="http://www.netbeans.org/download/javadoc/OpenAPIs/org/openide/doc-files/upgrade.html#3.4-public-packages">
             public packages
             </a>). If you do not restrict access to your classes you are
             making it too easy for other people to misuse your implementation
             details, that is why you should have good reason for not
             restricting package access.
             </hint>
         </question>
 -->
 <answer id="deploy-packages">
 Yes, where appropriate.
 </answer>


 <!--
         <question id="deploy-shared" when="final">
             Do you need to be installed in the shared location only, or in the user directory only,
             or can your module be installed anywhere?
             <hint>
             Installation location shall not matter; if it does, explain why.
             Consider also whether <code>InstalledFileLocator</code> can help.
             </hint>
         </question>
 -->
 <answer id="deploy-shared">
 Anywhere.
 </answer>


 <!--
         <question id="exec-classloader" when="impl">
             Does your code create its own class loader(s)?
             <hint>
             A bit unusual. Please explain why and what for.
             </hint>
         </question>
 -->
 <answer id="exec-classloader">
 No.
 </answer>


 <!--
         <question id="exec-component" when="impl">
             Is execution of your code influenced by any (string) property
             of any of your components?

             <hint>
             Often <code>JComponent.getClientProperty</code>, <code>Action.getValue</code>
             or <code>PropertyDescriptor.getValue</code>, etc. are used to influence
             a behavior of some code. This of course forms an interface that should
             be documented. Also if one depends on some interface that an object
             implements (<code>component instanceof Runnable</code>) that forms an
             API as well.
             </hint>
         </question>
 -->
 <answer id="exec-component">
 No.
 </answer>


 <!--
         <question id="exec-introspection" when="impl">
             Does your module use any kind of runtime type information (<code>instanceof</code>,
             work with <code>java.lang.Class</code>, etc.)?
             <hint>
             Check for cases when you have an object of type A and you also
             expect it to (possibly) be of type B and do some special action. That
             should be documented. The same applies to operations at the meta-level
             (Class.isInstance(...), Class.isAssignableFrom(...), etc.).
             </hint>
         </question>
 -->
 <answer id="exec-introspection">
 No.
 </answer>


 <!--
         <question id="exec-privateaccess" when="final">
             Are you aware of any other parts of the system calling some of
             your methods by reflection?
             <hint>
             If so, describe the "contract" as an API. Likely private or friend one, but
             still API and consider rewrite of it.
             </hint>
         </question>
 -->
 <answer id="exec-privateaccess">
 No.
 </answer>


 <!--
         <question id="exec-process" when="impl">
             Do you execute an external process from your module? How do you ensure
             that the result is the same on different platforms? Do you parse output?
             Do you depend on result code?
             <hint>
             If you feed an input, parse the output please declare that as an API.
             </hint>
         </question>
 -->
 <answer id="exec-process">
 No.
 </answer>


 <!--
         <question id="exec-property" when="impl">
             Is execution of your code influenced by any environment or
             Java system (<code>System.getProperty</code>) property?

             <hint>
             If there is a property that can change the behavior of your
             code, somebody will likely use it. You should describe what it does
             and the <a href="http://openide.netbeans.org/tutorial/api-design.html#life">stability category</a>
             of this API. You may use
             <pre>
                 &lt;api type="export" group="property" name="id" category="private" url="http://..."&gt;
                     description of the property, where it is used, what it influences, etc.
                 &lt;/api&gt;
             </pre>
             </hint>
         </question>
 -->
 <answer id="exec-property">
     <api type="export" group="logger" name="org.netbeans.lib.lexer.TokenHierarchyOperation" category="friend">
         <code>FINE</code> level logs the token changes made by the lexer both at the root level
         and at the embedded levels of the token hierarchy after each document modification.
         <br/>
         <code>FINER</code> level additionally checks the whole token hierarchy
         for internal consistency after each modification.
     </api>
     <api type="export" group="logger" name="org.netbeans.lib.lexer.TokenList" category="friend">
         <code>FINE</code> level makes the lexer perform more thorough and stricter checks
         in certain situations, so it is mainly useful for tests.
         Lookahead and state information is generated even for batch-lexed inputs, which makes
         it easier to verify the correctness of the incremental algorithm (fixing up the token list after a modification).
         Additional checks also verify the correctness of the framework and of the SPI implementation
         classes being used (for example, when flyweight tokens are created, the text
         passed to the token factory is compared to the text in the lexer input).
     </api>
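     <p>
     Outside the IDE (for example in unit tests) these levels can be raised through
     <code>java.util.logging</code>. The following is a minimal sketch; the class
     <code>LexerDebugLogging</code> is hypothetical and only the two logger names come from this answer.
     </p>

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

class LexerDebugLogging {
    // Logger names documented in this answer.
    static final String HIERARCHY_LOGGER = "org.netbeans.lib.lexer.TokenHierarchyOperation";
    static final String TOKEN_LIST_LOGGER = "org.netbeans.lib.lexer.TokenList";

    // Keep strong references: java.util.logging holds loggers weakly, so a logger
    // configured here and then garbage collected would lose its level.
    private static Logger hierarchyLog;
    private static Logger tokenListLog;

    static void enable(boolean checkConsistency) {
        hierarchyLog = Logger.getLogger(HIERARCHY_LOGGER);
        // FINE logs token changes; FINER additionally checks hierarchy consistency.
        hierarchyLog.setLevel(checkConsistency ? Level.FINER : Level.FINE);

        tokenListLog = Logger.getLogger(TOKEN_LIST_LOGGER);
        tokenListLog.setLevel(Level.FINE); // turns on the stricter checks

        // The default root ConsoleHandler publishes only INFO and above,
        // so attach a handler that lets the FINE/FINER records through.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);
        hierarchyLog.addHandler(handler);
    }
}
```

Calling <code>LexerDebugLogging.enable(true)</code> before the test creates its token hierarchies would then request both the change logging and the consistency checks.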
 </answer>


 <!--
         <question id="exec-reflection" when="impl">
             Does your code use Java Reflection to execute other code?
             <hint>
             This usually indicates a missing or insufficient API in the other
             part of the system. If the other side is not aware of your dependency
             this contract can be easily broken.
             </hint>
         </question>
 -->
 <answer id="exec-reflection">
 No.
 </answer>


 <!--
         <question id="exec-threading" when="impl">
             What threading models, if any, does your module adhere to?
             <hint>
                 If your module calls foreign APIs which have a specific threading model,
                 indicate how you comply with the requirements for multithreaded access
                 (synchronization, mutexes, etc.) applicable to those APIs.
                 If your module defines any APIs, or has complex internal structures
                 that might be used from multiple threads, declare how you protect
                 data against concurrent access, race conditions, deadlocks, etc.,
                 and whether such rules are enforced by runtime warnings, errors, assertions, etc.
                 Examples: a class might be non-thread-safe (like Java Collections); might
                 be fully thread-safe (internal locking); might require access through a mutex
                 (and may or may not automatically acquire that mutex on behalf of a client method);
                 might be able to run only in the event queue; etc.
                 Also describe when any events are fired: synchronously, asynchronously, etc.
                 Ideas: <a href="http://core.netbeans.org/proposals/threading/index.html#recommendations">Threading Recommendations</a> (in progress)
             </hint>
         </question>
 -->
 <answer id="exec-threading">
 Use of token hierarchies for mutable input sources
 must adhere to the locking mechanisms of the input sources themselves.
 <br/>
 For example, the token hierarchy of a Swing document may only be accessed
 while holding the document's read (or write) lock.
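 <p>
 A minimal sketch of the read-locking pattern using only <code>javax.swing.text</code>:
 <code>Document.render(Runnable)</code> runs the task while the document's read lock is held.
 The class <code>ReadLockedScan</code> and its word count are hypothetical stand-ins for iterating tokens;
 with the lexer API the body would typically obtain <code>TokenHierarchy.get(doc)</code> and walk its token sequence.
 </p>

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

class ReadLockedScan {
    // Runs the whole scan inside Document.render(), which holds the read lock,
    // so no document modification can interleave with it.
    // With the NetBeans lexer API the body would typically be:
    //   TokenHierarchy th = TokenHierarchy.get(doc);
    //   TokenSequence ts = th.tokenSequence();
    //   while (ts.moveNext()) { ... inspect ts.token() ... }
    static int countWords(Document doc) {
        int[] result = new int[1];
        doc.render(() -> {
            try {
                String text = doc.getText(0, doc.getLength()).trim();
                result[0] = text.isEmpty() ? 0 : text.split("\\s+").length;
            } catch (BadLocationException e) {
                // Not expected: the length is read under the same lock that guards the content.
                throw new IllegalStateException(e);
            }
        });
        return result[0];
    }
}
```

Note that everything derived from the tokens must be computed inside the locked section; keeping token references and using them after <code>render()</code> returns would bypass the lock.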
 </answer>

 <!--
         <question id="format-clipboard" when="impl">
             Which data flavors (if any) does your code read from or insert to
             the clipboard (access to the clipboard means calling methods on <code>java.awt.datatransfer.Transferable</code>)?

             <hint>
             Nodes often deal with the clipboard by means of <code>Node.clipboardCopy, Node.clipboardCut and Node.pasteTypes</code>.
             Check your code for overriding these methods.
             </hint>
         </question>
 -->
 <answer id="format-clipboard">
 No clipboard support.
 </answer>


 <!--
         <question id="format-dnd" when="impl">
             Which protocols (if any) does your code understand during Drag &amp; Drop?
             <hint>
             Nodes often deal with Drag &amp; Drop by means of <code>Node.drag, Node.getDropType</code>.
             Check your code for overriding these methods. By the way, if they are not overridden, they
             delegate by default to <code>Node.clipboardCopy, Node.clipboardCut and Node.pasteTypes</code>.
             </hint>
         </question>
 -->
 <answer id="format-dnd">
 No D&amp;D.
 </answer>


 <!--
         <question id="format-types" when="impl">
             Which protocols and file formats (if any) does your module read or write on disk,
             or transmit or receive over the network?
         </question>
 -->
 <answer id="format-types">
 No files are read from or written to disk.
 </answer>


 <!--
         <question id="lookup-lookup" when="init">
             Does your module use <code>org.openide.util.Lookup</code>
             or any similar technology to find any components to communicate with? Which ones?

             <hint>
             Please describe the interfaces you are searching for, where
             are defined, whether you are searching for just one or more of them,
             if the order is important, etc. Also classify the stability of such
             API contract.
             </hint>
         </question>
 -->
 <answer id="lookup-lookup">
 No.
 </answer>


 <!--
         <question id="lookup-register" when="final">
             Do you register anything into lookup for other code to find?
             <hint>
             Do you register using layer file or using <code>META-INF/services</code>?
             Who is supposed to find your component?
             </hint>
         </question>
 -->
 <answer id="lookup-register">
 No.
 </answer>


 <!--
         <question id="lookup-remove" when="final">
             Do you remove entries of other modules from lookup?
             <hint>
             Why? Of course, that is possible, but it can be dangerous. Is the module
             you are masking resources from aware of what you are doing?
             </hint>
         </question>
 -->
 <answer id="lookup-remove">
 No.
 </answer>


 <!--
         <question id="perf-exit" when="final">
             Does your module run any code on exit?
         </question>
 -->
 <answer id="perf-exit">
 No.
 </answer>


 <!--
         <question id="perf-huge_dialogs" when="final">
             Does your module contain any dialogs or wizards with a large number of
             GUI controls such as combo boxes, lists, trees, or text areas?
         </question>
 -->
 <answer id="perf-huge_dialogs">
 No.
 </answer>


 <!--
         <question id="perf-limit" when="init">
             Are there any hard-coded or practical limits in the number or size of
             elements your code can handle?
         </question>
 -->
 <answer id="perf-limit">
 No practical limits.
 </answer>


 <!--
         <question id="perf-mem" when="final">
             How much memory does your component consume? Estimate it
             in relation to the number of windows, etc.
         </question>
 -->
 <answer id="perf-mem">
 Memory consumption of the created tokens is critical because a typical document
 can contain thousands of tokens. Therefore there are several basic token types:
 <ul>
     <li>DefaultToken: 24 bytes</li>
     <li>StringToken: 32 bytes (used only for flyweight tokens)</li>
     <li>PrepToken: 32 bytes plus text storage size (used only
         for tokens that required character preprocessing)
     </li>
 </ul>
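 <p>
 As a worked example, 10,000 tokens backed by DefaultToken instances take about
 10,000 * 24 = 240,000 bytes. The hypothetical helper below turns the sizes above into a rough estimate;
 it assumes, for this sketch only, that each distinct flyweight text is one shared StringToken instance,
 and it ignores the references held by the token list and any PrepToken text storage.
 </p>

```java
class TokenMemoryEstimate {
    // Per-instance sizes quoted in this answer (bytes).
    static final long DEFAULT_TOKEN_BYTES = 24;
    static final long STRING_TOKEN_BYTES = 32;

    // regularTokens:      tokens backed by their own DefaultToken instance
    // distinctFlyweights: number of distinct flyweight texts, each assumed to be one shared StringToken
    // Ignores the token list's reference array and PrepToken text storage.
    static long estimateBytes(long regularTokens, long distinctFlyweights) {
        return regularTokens * DEFAULT_TOKEN_BYTES
                + distinctFlyweights * STRING_TOKEN_BYTES;
    }
}
```

For instance, <code>estimateBytes(10000, 50)</code> gives 240,000 + 1,600 = 241,600 bytes, which shows why sharing flyweight tokens matters: their cost does not grow with the number of occurrences.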
 </answer>


 <!--
         <question id="perf-menus" when="final">
             Does your module use dynamically updated context menus, or
             context-sensitive actions with complicated enablement logic?
         </question>
 -->
 <answer id="perf-menus">
 No.
 </answer>


 <!--
         <question id="perf-progress" when="final">
             Does your module execute any long-running tasks?

             <hint>Long running tasks should never block
             AWT thread as it badly hurts the UI
             <a href="http://performance.netbeans.org/responsiveness/issues.html">
             responsiveness</a>.
             Tasks like connecting over the
             network, computing huge amounts of data, or compilation should
             be done asynchronously (for example
             using <code>RequestProcessor</code>); they definitely should
             not block the AWT thread.
             </hint>
         </question>
 -->
 <answer id="perf-progress">
 All tasks should be fine-grained.
 Both batch and incremental lexing are done lazily, as clients ask for tokens.
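 <p>
 Lazy lexing can be illustrated by a minimal, hypothetical sketch (not the framework's actual token list):
 tokens are recorded only by their end offsets and are produced on demand, so asking for token
 <code>i</code> lexes just far enough to reach it. The toy lexer treats each run of letters or digits
 as one token and every other character as a single-character token.
 </p>

```java
import java.util.Arrays;

class LazyTokenList {
    private final CharSequence text;
    private int[] ends = new int[16]; // end offset of each token lexed so far
    private int count;                // number of tokens lexed so far

    LazyTokenList(CharSequence text) {
        this.text = text;
    }

    // Returns the end offset of the token at the given index, lexing further only if
    // that token has not been produced yet; returns -1 when the input has fewer tokens.
    int tokenEnd(int index) {
        while (index >= count) {
            int start = (count == 0) ? 0 : ends[count - 1];
            if (start >= text.length()) {
                return -1; // input exhausted
            }
            if (count == ends.length) {
                ends = Arrays.copyOf(ends, count * 2); // grow the offset array
            }
            ends[count++] = lexOne(start);
        }
        return ends[index];
    }

    int lexedCount() {
        return count;
    }

    // Toy lexer: a run of letters or digits is one token, any other character is a token of its own.
    private int lexOne(int start) {
        int end = start + 1;
        if (Character.isLetterOrDigit(text.charAt(start))) {
            while (text.length() > end) {
                if (!Character.isLetterOrDigit(text.charAt(end))) {
                    break;
                }
                end++;
            }
        }
        return end;
    }
}
```

For the input <code>"int x=1;"</code>, calling <code>tokenEnd(1)</code> lexes only <code>int</code> and the following space and returns 4; the rest of the input stays unlexed until a client asks for it.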
 <br/>
 The only potentially long-running task is relexing a very long portion of a document,
 e.g. when someone types '/*' at the beginning of a Java document
 that contains no comments: the whole document turns into an unclosed comment.
 <br/>
 This is typically not a problem unless the very long token has to be lexed
 repeatedly (the original support, which had no permanent tokens, had to lex the token
 upon each request).
 <br/>
 The lexer framework further improves the situation with token validation,
 which checks whether the typed character can really affect the token
 or whether it is sufficient to just fix the original token's length.
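 <p>
 For the long-comment scenario above, validation of a closed Java block comment might look like
 the following hypothetical sketch (the class and method names are illustrative, not the framework's API).
 A character typed strictly inside the comment body that is neither <code>*</code> nor <code>/</code>
 cannot end the comment early, so only the token length changes; every other case falls back to relexing.
 The sketch ignores Unicode escape sequences.
 </p>

```java
class BlockCommentValidator {
    static final int RELEX = -1;

    // tokenLength:   length of an existing, closed comment token (at least 4, "/**/")
    // offsetInToken: insertion offset relative to the token start
    // typed:         the single inserted character
    // Returns the new token length if the token stays valid, otherwise RELEX.
    static int validateInsert(int tokenLength, int offsetInToken, char typed) {
        if (2 > offsetInToken) {
            return RELEX; // before or inside the opening "/*"
        }
        if (offsetInToken > tokenLength - 2) {
            return RELEX; // inside or after the closing "*/"
        }
        if (typed == '*' || typed == '/') {
            return RELEX; // could combine with a neighbour into a premature "*/"
        }
        return tokenLength + 1; // same comment token, one character longer
    }
}
```

Typing <code>x</code> in the middle of the nine-character token <code>/* abc */</code> therefore yields a ten-character token without running the lexer at all, however long the comment is.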

 </answer>


 <!--
         <question id="perf-scale" when="init">
             Which external criteria influence the performance of your
             program (size of file in editor, number of files in menu,
             in source directory, etc.) and how well your code scales?
             <hint>
             Please include some estimates, there are other more detailed
             questions to answer in later phases of implementation.
             </hint>
         </question>
 -->
 <answer id="perf-scale">
 On a typical machine the framework produces about 370,000 tokens
 from a text input of 1 million characters in less than 0.5 seconds.
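 <p>
 These figures imply an average token length of about 1,000,000 / 370,000 = 2.7 characters and a
 throughput of at least 370,000 / 0.5 = 740,000 tokens per second, i.e. at most about 1.35 microseconds
 per token. A hypothetical single-pass harness such as the following could be used to compare a lexer
 against that budget; it is only a sketch, and its results depend on hardware and JIT warm-up.
 </p>

```java
import java.util.function.IntUnaryOperator;

class LexThroughput {
    // Lexes [0, length) once; lexOne maps a token's start offset to its end offset
    // and must always return a value greater than its argument.
    static int countTokens(int length, IntUnaryOperator lexOne) {
        int tokens = 0;
        for (int offset = 0; length > offset; offset = lexOne.applyAsInt(offset)) {
            tokens++;
        }
        return tokens;
    }

    // Single-pass tokens per second; a real measurement would warm up the JIT,
    // repeat the run and use a realistic lexer and input.
    static double tokensPerSecond(int length, IntUnaryOperator lexOne) {
        long start = System.nanoTime();
        int tokens = countTokens(length, lexOne);
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return tokens * 1e9 / elapsed;
    }
}
```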
 </answer>


 <!--
         <question id="perf-spi" when="init">
             How will the performance of the plugged-in code be enforced?
             <hint>
             If you allow foreign code to be plugged into your own module, how
             do you enforce that it will behave correctly and quickly and will not
             negatively influence the performance of your own module?
             </hint>
         </question>
 -->
 <answer id="perf-spi">
 Implementations of token change listeners should execute quickly.
 For complex tasks they should reschedule their work to another thread.
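 <p>
 A minimal sketch of that rescheduling, assuming a simplified, hypothetical listener interface
 (the lexer API has its own listener type, <code>TokenHierarchyListener</code>): the notification
 only captures plain values and hands the expensive work to a background executor. Inside the IDE
 a <code>RequestProcessor</code> would be the usual choice instead of a plain executor.
 </p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical listener shape used only for this sketch.
interface TokenChangeListener {
    void tokensChanged(int modStart, int modEnd);
}

class DeferringListener implements TokenChangeListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicInteger processed = new AtomicInteger();

    @Override
    public void tokensChanged(int modStart, int modEnd) {
        // Returns immediately: only the changed range is captured here. The worker
        // must re-acquire the document lock before it reads the token hierarchy.
        worker.execute(() -> analyze(modStart, modEnd));
    }

    private void analyze(int modStart, int modEnd) {
        // Placeholder for a complex task over the changed range.
        processed.incrementAndGet();
    }

    void shutdownAndWait() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```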
 </answer>


 <!--
         <question id="perf-startup" when="final">
             Does your module run any code on startup?
         </question>
 -->
 <answer id="perf-startup">
 No.
 </answer>


 <!--
         <question id="perf-wakeup" when="final">
             Does any piece of your code wake up periodically and do something
             even when the system is otherwise idle (no user interaction)?
         </question>
 -->
 <answer id="perf-wakeup">
 No.
 </answer>


 <!--
         <question id="resources-file" when="final">
             Does your module use <code>java.io.File</code> directly?

             <hint>
             NetBeans provides a logical wrapper over plain files called
             <code>org.openide.filesystems.FileObject</code> that
             provides uniform access to such resources and is the preferred
             way that should be used. But of course there can be situations when
             this is not suitable.
             </hint>
         </question>
 -->
 <answer id="resources-file">
 No.
 </answer>


 <!--
         <question id="resources-layer" when="final">
             Does your module provide own layer? Does it create any files or
             folders in it? What is it trying to communicate by that and with which
             components?

             <hint>
             NetBeans allows automatic and declarative installation of resources
             by module layers. Modules register files into appropriate places
             and other components use that information to perform their task
             (build menu, toolbar, window layout, list of templates, set of
             options, etc.).
             </hint>
         </question>
 -->
 <answer id="resources-layer">
 No.
 </answer>


 <!--
         <question id="resources-mask" when="final">
             Does your module mask/hide/override any resources provided by other modules in
             their layers?

             <hint>
             If you mask a file provided by another module, you probably depend
             on that and do not want the other module to (for example) change
             the file's name. That module shall thus make that file available as an API
             of some stability category.
             </hint>
         </question>
 -->
 <answer id="resources-mask">
 No.
 </answer>


 <!--
         <question id="resources-read" when="final">
             Does your module read any resources from layers? For what purpose?

             <hint>
             As this is some kind of intermodule dependency, it is a kind of API.
             Please describe it and classify according to
             <a href="http://openide.netbeans.org/tutorial/api-design.html#categories">
             common stability categories</a>.
             </hint>
         </question>
 -->
 <answer id="resources-read">
 No.
 </answer>


 <!--
         <question id="security-grant" when="final">
             Does your code grant additional rights to some code?
             <hint>Avoid using a classloader that adds some extra
             permissions to loaded code unless really necessary.
             Also note that your API implementation
             can also expose unneeded permissions to enemy code by
             AccessController.doPrivileged() calls.</hint>
         </question>
 -->
 <answer id="security-grant">
 No.
 </answer>


 <!--
         <question id="security-policy" when="final">
             Does your functionality require standard policy file modification?
             <hint>Your code may pass control to third party code not
             coming from trusted domain. It covers code downloaded over
             network or code coming from libraries that are not bundled
             with NetBeans. Which permissions does it need to grant to which domain?</hint>
         </question>
 -->
 <answer id="security-policy">
 No.
 </answer>


 <!--
         <question id="exec-ant-tasks" when="impl">
             Do you define or register any ant tasks that others can use?

             <hint>
             If you provide an ant task that users can use, you need to be very
             careful about its syntax and behaviour, as it most likely forms an
             API for end users, and as there are a lot of end users, their reaction
             when such API gets broken can be pretty strong.
             </hint>
         </question>
 -->
 <answer id="exec-ant-tasks">
 No.
 </answer>

 <!--
         <question id="arch-where" when="init">
             Where can one find sources for your module?
             <hint>
                 Please provide link to the CVS web client at
                 http://www.netbeans.org/download/source_browse.html
                 or just use tag defaultanswer generate='here'
             </hint>
         </question>
 -->
 <answer id="arch-where">
   <defaultanswer generate='here' />
 </answer>


 <!--
         <question id="compat-deprecation" when="init">
             How does the introduction of your project influence functionality
             provided by the previous version of the product?
             <hint>
             If you are planning to deprecate/remove/change any existing APIs,
             list them here accompanied with the reason explaining why you
             are doing so.
             </hint>
         </question>
 -->
  <answer id="compat-deprecation">
   <p>
    The current API completely replaces the original one; therefore
    the major version of the module was increased from 1 to 2.
    <br/>
    There are no plans to deprecate any part of the present API,
    and it should evolve in a compatible way.
   </p>
  </answer>


 <!--
         <question id="resources-preferences" when="final">
             Does your module use preferences via the Preferences API? Does your module use NbPreferences
             or regular JDK Preferences? Does it read, write or both?
             Does it share preferences with other modules ? If so, then why ?
             <hint>
                 You may use
                     &lt;api type="export" group="preferences"
                     name="preference node name" category="private"&gt;
                     description of individual keys, where it is used, what it
                     influences, whether the module reads/write it, etc.
                     &lt;/api&gt;
                 Due to XML ID restrictions, rather than /org/netbeans/modules/foo give the "name" as org.netbeans.modules.foo.
                 Note that if you use NbPreferences this name will then be the same as the code name base of the module.
             </hint>
         </question>
 -->
  <answer id="resources-preferences">
   <p>
    No.
   </p>
  </answer>

 </api-answers>