*Chapter 3*
h1. Lexical Structure
This chapter specifies the lexical structure of the Groovy programming language.
The organization of this chapter parallels the chapter on [Lexical Structure|http://example.com] in the [Java Language Specification (third edition)|http://example.com], and builds on top of that specification.
{anchor:3.1}
h2. 3.1 Unicode
Versions of the Groovy programming language up to and including 1.0 final use Unicode version 3.0, because J2SE 1.4 does.
J2SE 5.0 upgraded to a newer version of the Unicode Standard (Unicode 4.0).
Since J2SE 5.0, the range of legal code points is U+0000 to U+10FFFF, written in the hexadecimal _U+n_ notation.
The Groovy programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.
(-) Unlike Java, Groovy has no character literals; see ($3.10.4).
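As an illustration of the code point range and the UTF-16 representation described in this section, the following Java sketch counts the 16-bit code units needed for a code point. The class and method names are hypothetical. It relies only on {{java.lang.Character}} methods that are available from J2SE 5.0 onward.

```java
// Sketch only: Utf16Sketch and codeUnitCount are illustrative names, not part of Groovy.
final class Utf16Sketch {
    /** Returns the number of 16-bit code units that encode the code point in UTF-16. */
    static int codeUnitCount(int codePoint) {
        // Legal code points run from U+0000 to U+10FFFF (Character.MAX_CODE_POINT).
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                    "not in U+0000..U+10FFFF: " + Integer.toHexString(codePoint));
        }
        // 1 for code points below U+10000, 2 (a surrogate pair) for supplementary ones.
        return Character.charCount(codePoint);
    }
}
```

For example, {{Utf16Sketch.codeUnitCount(0x41)}} is 1, while {{Utf16Sketch.codeUnitCount(0x1D11E)}} is 2 because U+1D11E is a supplementary character.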
{anchor:3.2}
h2. 3.2 Lexical Translations
Unchanged. Tokenization of character streams occurs exactly as in Java.
{anchor:3.3}
h2. 3.3 Unicode Escapes
Unchanged. Unicode escapes follow the same rules and are evaluated at the same time as in Java.
*J2SE 5.0+*
Representing supplementary characters requires two consecutive Unicode escapes.
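The following Java sketch shows why two escapes are needed: a supplementary code point is encoded as a high and a low surrogate, and each surrogate gets its own escape. The class and method names are hypothetical and are used only for illustration.

```java
// Sketch only: UnicodeEscapeSketch and toUnicodeEscapes are illustrative names.
final class UnicodeEscapeSketch {
    /** Formats a code point as the Unicode escape or escapes needed to write it in source text. */
    static String toUnicodeEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields one char for a BMP code point and a
        // surrogate pair (two chars) for a supplementary code point.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }
}
```

For example, U+0041 is written with the single escape {{\u0041}}, while U+1D11E is written as the two consecutive escapes {{\uD834\uDD1E}}.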
{anchor:3.4}
h2. 3.4 Line Terminators
Unchanged. Unicode input characters are split into lines in the same way as in Java.
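As a rough illustration of the Java rule, which recognizes CR LF, CR and LF as line terminators, the following Java sketch splits text into lines. The class and method names are hypothetical, and the regular expression is one possible way to express the rule, not the approach used by any particular compiler.

```java
// Sketch only: LineSplitSketch and splitLines are illustrative names.
final class LineSplitSketch {
    /** Splits text at the line terminators CR LF, CR and LF. */
    static String[] splitLines(String text) {
        // CR LF is listed first so that the pair counts as a single terminator.
        // The limit -1 keeps trailing empty lines.
        return text.split("\r\n|\r|\n", -1);
    }
}
```

With this sketch, the text {{a}} CR LF {{b}} CR {{c}} LF {{d}} is split into the four lines {{a}}, {{b}}, {{c}} and {{d}}.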
{anchor:3.5}
h2. 3.5 Input Elements and Tokens
(+) A shell comment may appear on the first line of a Groovy source file ($3.7).
(+) The definition of _Token_ includes _StringConstructor_ ($3.10.todo), as shown below.
{code}
Input:
    ShellComment(opt) InputElements(opt) Sub(opt)

Token:
    Identifier
    Keyword
    Literal
    StringConstructor
    Separator
    Operator
{code}
(i) Line terminators ($3.4) may be classified as either separators ($3.11) or white space ($3.6), according to the rules of ($3.11).
{anchor:3.6}
h2. 3.6 White Space
(i) Some line terminators are transformed into separators instead of white space, according to the rules defined in ($3.11).
{anchor:3.7}
h2. 3.7 Comments
(+) Groovy has a third kind of _comment_, which is permitted only on the first line of a Groovy source file ($3.5).
{table}
syntax | description
{{# text}} | A _shell_ _comment_: all the text from the ASCII character {{#}} to the end of the line is ignored (as in Unix shell scripts).
{table}
This comment style is formally specified by the following amendment to the Java productions ([JLS 3.7|http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#48125]):
{code}
ShellComment:
    '#' CharactersInLine(opt)
{code}
These productions imply all of the following additional properties:
- {{#}}, {{/\*}} and {{\*/}} have no special meaning in comments that begin with {{//}}.
- {{#}} and {{//}} have no special meaning in comments that begin with {{/\*}} or {{/\*\*}}.
- {{//}}, {{/\*}} and {{\*/}} have no special meaning in comments that begin with {{#}}.
The lexical grammar implies that comments do not occur within string literals ($3.10.5) or regex literals ($3.10.todo).
(+) Note that the newline that terminates an end-of-line comment can be significant ($3.11).
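The _ShellComment_ production in this section can be sketched in Java as follows. The class and method names are hypothetical. The sketch assumes that the whole source file is available as a string, and it keeps the terminating line terminator because that terminator may be significant.

```java
// Sketch only: ShellCommentSketch and stripShellComment are illustrative names.
final class ShellCommentSketch {
    /**
     * Returns the source with a leading shell comment removed.
     * The line terminator that ends the comment is kept.
     */
    static String stripShellComment(String source) {
        if (!source.startsWith("#")) {
            return source; // no shell comment on the first line
        }
        int end = 0;
        // Skip CharactersInLine: everything up to, but not including, CR or LF.
        // Sequences such as // or /* get no special treatment inside the comment.
        while (end < source.length()
                && source.charAt(end) != '\r'
                && source.charAt(end) != '\n') {
            end++;
        }
        return source.substring(end);
    }
}
```

Only a {{#}} at the very start of the file is treated as a comment here; a {{#}} on any later line is left untouched.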
{anchor:3.8}
h2. 3.8 Identifiers
(i) Groovy identifiers consist of _Java_ _letters_ and _Java_ _digits_, except that the ASCII dollar character '$' is not a legal identifier character.
{table}
todo
jrose - "The dollar sign is sometimes used internally by Groovy to mangle non-Java identifiers which must be converted to Java names. For this reason, it would be confusing to allow unescaped dollar signs as Groovy identifier constituents."
jrayner - Is this true, blackdrag?
Also, should we implement Character.isGroovyIdentifier(), including a formal statement of the rule above?
{table}
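One possible formalization of the identifier rule in this section is sketched below in Java. The class and method names are hypothetical; no such method exists in {{java.lang.Character}} or, as far as this chapter states, in Groovy. The sketch checks spelling only and does not exclude keywords ($3.9).

```java
// Sketch only: GroovyIdentifierSketch and isGroovyIdentifier are illustrative names.
final class GroovyIdentifierSketch {
    /** True if s is spelled like a Java identifier and contains no '$'. */
    static boolean isGroovyIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so that supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == '$' || !Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

Under this sketch {{foo}} and {{_bar9}} are accepted, while {{foo$bar}}, {{$x}} and {{9lives}} are rejected.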
(+) Groovy provides a way to use any Unicode string as a member name. (See [Chapter 6|Chapter06Names.html].)
{anchor:3.9}
h2. 3.9 Keywords
(+) The following character sequences, formed from ASCII letters, are reserved for use as additional _keywords_ and cannot be used as unqualified identifiers ($3.8):
{table}
Keywords: | one of |
{{any}} | {{def}} | {{threadsafe}}
{{as}} | {{in}} | {{with}}
{table}
The keywords {{any}}, {{const}}, {{goto}}, {{threadsafe}} and {{with}} are reserved, even though they are not currently used. This may allow a Groovy compiler to produce better error messages if these keywords incorrectly appear in programs.
(-) The keywords {{do}}, {{strictfp}} and {{native}} are not used in Groovy.
(i) The following keywords do not yet appear to be implemented in Groovy, although they are probably intended to be: {{throws}}, {{enum}}, {{final}}.
(i) Reference Implementation Note: {{metaClass}} and {{\_\_timeStamp}} are used internally, and should be avoided as basic members of Groovy Objects.
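A lexer could recognize the additional keywords from the table in this section with a simple lookup, sketched below in Java. The class and method names are hypothetical, and the list covers only the six additional keywords, not the keywords inherited from Java.

```java
// Sketch only: GroovyKeywordSketch and isAdditionalKeyword are illustrative names.
final class GroovyKeywordSketch {
    /** True if word is one of the additional keywords listed in this section. */
    static boolean isAdditionalKeyword(String word) {
        switch (word) {
            case "any":
            case "as":
            case "def":
            case "in":
            case "threadsafe":
            case "with":
                return true;
            default:
                return false;
        }
    }
}
```

A word for which this check returns true cannot be used as an unqualified identifier ($3.8).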
{table}
todo
Should we allow keywords as qualified member names (when idents quoted)?
e.g. myFoo.with() or myBar."any" etc...
{table}