<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?> | |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |
<html> | |
<head> | |
<title>ASF: XSLTC Compiler Design</title> | |
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /> | |
<meta http-equiv="Content-Style-Type" content="text/css" /> | |
<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" /> | |
</head> | |
<!-- | |
* Licensed to the Apache Software Foundation (ASF) under one | |
* or more contributor license agreements. See the NOTICE file | |
* distributed with this work for additional information | |
* regarding copyright ownership. The ASF licenses this file | |
* to you under the Apache License, Version 2.0 (the "License"); | |
* you may not use this file except in compliance with the License. | |
* You may obtain a copy of the License at | |
* | |
* http://www.apache.org/licenses/LICENSE-2.0 | |
* | |
* Unless required by applicable law or agreed to in writing, software | |
* distributed under the License is distributed on an "AS IS" BASIS, | |
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
* See the License for the specific language governing permissions and | |
* limitations under the License. | |
--> | |
<body> | |
<div id="title"> | |
<table class="HdrTitle"> | |
<tbody> | |
<tr> | |
<th rowspan="2"> | |
<a href="../index.html"> | |
<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" /> | |
</a> | |
</th> | |
<th text-align="center" width="75%"> | |
<a href="index.html">XSLTC Design</a> | |
</th> | |
</tr> | |
<tr> | |
<td valign="middle">XSLTC Compiler Design</td> | |
</tr> | |
</tbody> | |
</table> | |
<table class="HdrButtons" align="center" border="1"> | |
<tbody> | |
<tr> | |
<td> | |
<a href="http://www.apache.org">Apache Foundation</a> | |
</td> | |
<td> | |
<a href="http://xalan.apache.org">Xalan Project</a> | |
</td> | |
<td> | |
<a href="http://xerces.apache.org">Xerces Project</a> | |
</td> | |
<td> | |
<a href="http://www.w3.org/TR">Web Consortium</a> | |
</td> | |
<td> | |
<a href="http://www.oasis-open.org/standards">Oasis Open</a> | |
</td> | |
</tr> | |
</tbody> | |
</table> | |
</div> | |
<div id="navLeft"> | |
<ul> | |
<li> | |
<a href="index.html">Overview</a> | |
</li></ul><hr /><ul> | |
<li>Compiler design<br /> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_whitespace_design.html">Whitespace</a> | |
</li> | |
<li> | |
<a href="xsl_sort_design.html">xsl:sort</a> | |
</li> | |
<li> | |
<a href="xsl_key_design.html">Keys</a> | |
</li> | |
<li> | |
<a href="xsl_comment_design.html">Comment design</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_lang_design.html">lang()</a> | |
</li> | |
<li> | |
<a href="xsl_unparsed_design.html">Unparsed entities</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_if_design.html">If design</a> | |
</li> | |
<li> | |
<a href="xsl_choose_design.html">Choose|When|Otherwise design</a> | |
</li> | |
<li> | |
<a href="xsl_include_design.html">Include|Import design</a> | |
</li> | |
<li> | |
<a href="xsl_variable_design.html">Variable|Param design</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_runtime.html">Runtime</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_dom.html">Internal DOM</a> | |
</li> | |
<li> | |
<a href="xsltc_namespace.html">Namespaces</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_trax.html">Translet & TrAX</a> | |
</li> | |
<li> | |
<a href="xsltc_predicates.html">XPath Predicates</a> | |
</li> | |
<li> | |
<a href="xsltc_iterators.html">Xsltc Iterators</a> | |
</li> | |
<li> | |
<a href="xsltc_native_api.html">Xsltc Native API</a> | |
</li> | |
<li> | |
<a href="xsltc_trax_api.html">Xsltc TrAX API</a> | |
</li> | |
<li> | |
<a href="xsltc_performance.html">Performance Hints</a> | |
</li> | |
</ul> | |
</div> | |
<div id="content"> | |
<h2>XSLTC Compiler Design</h2> | |
<ul> | |
<li> | |
<a href="#overview">Compiler Overview</a> | |
</li> | |
<li> | |
<a href="#ast">Building the Abstract Syntax Tree</a> | |
</li> | |
<li> | |
<a href="#typecheck">Type-check and Cast Expressions</a> | |
</li> | |
<li> | |
<a href="#compile">JVM byte-code generation</a> | |
</li> | |
</ul> | |
<a name="overview">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Compiler overview</h3> | |
<p>The main component of the XSLTC compiler is the class</p> | |
<ul> | |
<li> | |
<code>org.apache.xalan.xsltc.compiler.XSLTC</code> | |
</li> | |
</ul> | |
<p>This class uses three parsers to consume the input stylesheet(s):</p> | |
<ul> | |
<li> | |
<code>javax.xml.parsers.SAXParser</code> | |
</li> | |
</ul> | |
<p>is used to parse the stylesheet document and pass its contents to | |
the compiler as basic SAX2 events.</p> | |
<ul> | |
<li> | |
<code>com.sun.xslt.compiler.XPathParser</code> | |
</li> | |
</ul> | |
<p> is a parser used to parse XPath expressions and patterns. This parser | |
is generated using JavaCUP and JavaLEX from Princeton University.</p> | |
<ul> | |
<li> | |
<code>com.sun.xslt.compiler.Parser</code> | |
</li> | |
</ul> | |
<p>is a wrapper for the other two parsers. This parser is responsible for | |
using the other two parsers to build the compiler's abstract syntax tree | |
(which is described in more detail in the next section of this document). | |
</p> | |
<a name="ast">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Building an Abstract Syntax Tree</h3> | |
<p>An abstract syntax tree (AST) is a data-structure commonly used by | |
compilers to separate the parse-phase from the later phases of the | |
compilation. The AST has one node for each parsed token from the stylesheet | |
and can easily be parsed at the stages of type-checking and bytecode | |
generation.</p> | |
<ul> | |
<li> | |
<a href="#mapping">Mapping stylesheet elements to AST nodes</a> | |
</li> | |
<li> | |
<a href="#domxsl">Building the AST from AST nodes</a> | |
</li> | |
<li> | |
<a href="#mapping">Mapping XPath expressions and patterns to additional AST nodes</a> | |
</li> | |
</ul> | |
<p>The SAX parser passes the contents of the stylesheet to XSLTC's main | |
parser. The SAX events represent a decomposition of the XML document that | |
contains the stylesheet. The main parser needs to create one AST node from | |
each node that it receives from the SAX parser. It also needs to use the | |
XPath parser to decompose attributes that contain XPath expressions and | |
patterns. Remember that XSLT is in effect two languages: XML and XPath, | |
and one parser is needed for each of these languages. The SAX parser breaks | |
down the stylesheet document, the XPath parser breaks down XPath expressions | |
and patterns, and the main parser maps the decomposed elements into nodes | |
in the abstract syntax tree.</p> | |
<a name="mapping">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Mapping stylesheets elements to AST nodes</h4> | |
<p>Every element that is defined in the XSLT 1.0 spec is represented by a | |
a class in the <code>org.apache.xalan.xsltc.compiler</code> package. The | |
main parser class contains a <code>Hashtable</code> that that maps XSL | |
elements into Java classes that make up the nodes in the AST. These Java | |
classes all reside in the <code>org.apache.xalan.xsltc.compiler</code> | |
package and extend either the <code>TopLevelElement</code> or the | |
<code>Instruction</code> classes. (Both these classes extend the | |
<code>SyntaxTreeNode</code> class.)</p> | |
<p>The mapping from XSL element names to Java classes/AST nodes is set up | |
in the <code>initClasses()</code> method of the main parser:</p> | |
<blockquote class="source"> | |
<pre> | |
private void initStdClasses() { | |
try { | |
initStdClass("template", "Template"); | |
initStdClass("param", "Param"); | |
initStdClass("with-param", "WithParam"); | |
initStdClass("variable", "Variable"); | |
initStdClass("output", "Output"); | |
: | |
: | |
: | |
} | |
} | |
private void initClass(String elementName, String className) | |
throws ClassNotFoundException { | |
_classes.put(elementName, | |
Class.forName(COMPILER_PACKAGE + '.' + className)); | |
}</pre> | |
</blockquote> | |
<a name="domxsl">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Building the AST from AST nodes</h4> | |
<p>The parser builds an AST from the various syntax tree nodes. Each node | |
contains a reference to its parent node, a vector containing references | |
to all child nodes and a structure containing all attribute nodes:</p> | |
<blockquote class="source"> | |
<pre> | |
protected SyntaxTreeNode _parent; // Parent node | |
private Vector _contents; // Child nodes | |
protected Attributes _attributes; // Attributes of this element</pre> | |
</blockquote> | |
<p>These variables should be accessed using these methods:</p> | |
<blockquote class="source"> | |
<pre> | |
protected final SyntaxTreeNode getParent(); | |
protected final Vector getContents(); | |
protected String getAttribute(String qname); | |
protected Attributes getAttributes();</pre> | |
</blockquote> | |
<p>At this time the AST only contains nodes that represent the XSL elements | |
from the stylesheet. A SAX parse is generic and can only handle XML files | |
and will not break up and identify XPath patterns/expressions (these are | |
stored as attributes to the various nodes in the tree). Each XSL instruction | |
gets its own node in the AST, and the XPath patterns/expressions are stored | |
as attributes of these nodes. A stylesheet looking like this:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:stylesheet .......> | |
<xsl:template match="chapter"> | |
<xsl:text>Chapter</xsl:text> | |
<xsl:value-of select="."> | |
</xsl:template> | |
</xsl>stylesheet></pre> | |
</blockquote> | |
<p>will be stored in the AST as indicated in the following picture:</p> | |
<p> | |
<img src="ast_stage1.gif" alt="ast_stage1.gif" /> | |
</p> | |
<p> | |
<b> | |
<i>Figure 1: The AST in its first stage</i> | |
</b> | |
</p> | |
<p>All objects that make up the nodes in the initial AST have a | |
<code>parseContents()</code> method. This method is responsible for:</p> | |
<ul> | |
<li>parsing the values of those attributes that contain XPath expressions | |
or patterns, breaking each expression/pattern into AST nodes and inserting | |
them into the tree.</li> | |
<li>reading/checking all other required attributes</li> | |
<li>propagate the <code>parseContents()</code> call down the tree</li> | |
</ul> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Mapping XPath expressions and patterns to additional AST nodes</h4> | |
<p>The nodes that represent the XPath expressions and patterns extend | |
either the <code>Expression</code> or <code>Pattern</code> class | |
respectively. These nodes are not appended to the <code>_contents</code> | |
vectory of each node, but rather stored as individual references in each | |
AST element node. One example is the <code>ForEach</code> class that | |
represents the <code><xsl:for-each></code> element. This class has | |
a variable that contains a reference to the AST sub-tree that represents | |
its <code>select</code> attribute:</p> | |
<blockquote class="source"> | |
<pre> | |
private Expression _select;</pre> | |
</blockquote> | |
<p>There is no standard way of storing these XPath expressions and each | |
AST node that contains one or more XPath expression/pattern must handle | |
that itself. This handling basically involves passing the attribute's | |
value to the XPath parser and receiving back an AST sub-tree.</p> | |
<p>With all XPath expressions/patterns expanded, the AST will look somewhat | |
like this:</p> | |
<p> | |
<img src="ast_stage2.gif" alt="ast_stage2.gif" /> | |
</p> | |
<p> | |
<b> | |
<i>Fiugre 2: The AST in its second stage</i> | |
</b> | |
</p> | |
<a name="typecheck">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Type-check and Cast Expressions</h3> | |
<p>In many cases we will need to typecast the top node in the expression | |
sub-tree to suit the expected result-type of the expression, or to typecast | |
child nodes to suit the allowed types for the various operators in the | |
expression. This is done by calling 'typeCheck()' on the root-node in the | |
XSL tree. Each SyntaxTreeNode node is responsible for inserting type-cast | |
nodes between itself and its child nodes or XPath nodes. These type-cast | |
nodes will convert the output-types of the child/XPath nodes to the expected | |
input-type of the parent node. Let look at our AST again and the node that | |
represents the <code><xsl:value-of></code> element. This element | |
expects to receive a string from its <code>select</code> XPath expression, | |
but the <code>Step</code> expression will return either a node-set or a | |
single node. An extra node is inserted into the AST to perform the | |
necessary type conversions:</p> | |
<p> | |
<img src="ast_stage3.gif" alt="ast_stage3.gif" /> | |
</p> | |
<p> | |
<b> | |
<i>Figure 3: XPath expression type cast</i> | |
</b> | |
</p> | |
<p>The <code>typeCheck()</code> method of each SyntaxTreeNode object will | |
call <code>typeCheck()</code> on each of its XPath expressions. This method | |
will return the native type returned by the expression. The AST node will | |
insert an additional type-conversion node if the return-type does not match | |
the expected data-type. Each possible return type is represented by a class | |
in the <code>org.apache.xalan.xsltc.compiler.util</code> package. These | |
classes all contain methods that will generate bytecodes needed to perform | |
the actual type conversions (at runtime). The type-cast nodes in the AST | |
mainly consist of calls to these methods.</p> | |
<a name="compile">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>JVM byte-code generation</h3> | |
<ul> | |
<li> | |
<a href="#stylesheet">Compiling the stylesheet</a> | |
</li> | |
<li> | |
<a href="#toplevel">Compiling top-level elements</a> | |
</li> | |
<li> | |
<a href="#templates">Compiling template code</a> | |
</li> | |
<li> | |
<a href="#instructions">Compiling instructions, functions expressions and patterns</a> | |
</li> | |
</ul> | |
<p>Evey node in the AST extends the <code>SyntaxTreeNode</code> base class | |
and implements the <code>translate()</code> method. This method is | |
responsible for outputting the actual bytecodes that make up the | |
functionality required for each element, function, expression or pattern. | |
</p> | |
<a name="stylesheet">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Compiling the stylesheet</h4> | |
<p>Some nodes in the AST require more complex code than others. The best | |
example is the <code><xsl:stylesheet></code> element. The code that | |
represents this element has to tie together the code that is generated by | |
all the other elements and generate the actual class definition for the main | |
translet class. The <code>Stylesheet</code> class generates the translet's | |
constructor and methods that handle all top-level elements.</p> | |
<a name="toplevel">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Compiling top-level elements</h4> | |
<p>The bytecode that handles top-level elements must be generated before any | |
other code. The '<code>translate()</code>' method in these classes are | |
mainly called from these methods in the Stylesheet class:</p> | |
<blockquote class="source"> | |
<pre> | |
private String compileBuildKeys(ClassGenerator); | |
private String compileTopLevel(ClassGenerator, Enumeration); | |
private void compileConstructor(ClassGenerator, Output);</pre> | |
</blockquote> | |
<p>These methods handle most top-level elements, such as global variables | |
and parameters, <code><xsl:output></code> and | |
<code><xsl:decimal-format></code> instructions.</p> | |
<a name="templates">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Compiling template code</h4> | |
<p>All XPath patterns in <code><xsl:apply-template></code> | |
instructions are converted into numeric values (known as the pattern's | |
kernel 'type'). All templates with identical pattern kernel types are | |
grouped together and inserted into a table known as a test sequence. | |
(The table of test sequences is found in the Mode class in the compiler | |
package. There will be one such table for each mode that is used in the | |
stylesheet). This table is used to build a big <code>switch()</code> | |
statement in the translet's <code>applyTemplates()</code> method. This | |
method is initially called with the root node of the input document.</p> | |
<p>The <code>applyTemplates()</code> method determines the node's type and | |
passes this type to the <code>switch()</code> statement to look up the | |
matching template. The test sequence code (the <code>TestSeq</code> class) | |
is responsible for inserting bytecodes to find one matching template | |
in cases where more than one template matches the current node type.</p> | |
<p>There may be several templates that share the same pattern kernel type. | |
Here are a few examples of templates with patterns that all have the same | |
kernel type:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:template match="A/C"> | |
<xsl:template match="A/B/C"> | |
<xsl:template match="A | C"></pre> | |
</blockquote> | |
<p>All these templates will be grouped under the type for | |
<code><C></code> and will all get the same kernel type (the type for | |
<code>"C"</code>). The last template will be grouped both under | |
<code>"C"</code> and <code>"A"</code>, since it matches either element. | |
If the type identifier for <code>"C"</code> in this case is 8, all these | |
templates will be put under <code>case 8:</code> in | |
<code>applyTemplates()</code>'s big <code>switch()</code> statement. The | |
<code>TestSeq</code> class will insert some code under the | |
<code>case 8:</code> statement (similar to if's and then's) in order to | |
determine which of the three templates to trigger.</p> | |
<a name="instructions">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Compiling instructions, functions, expressions and patterns</h4> | |
<p>The template code is generated by calling <code>translate()</code> on | |
each <code>Template</code> object in the abstract syntax tree. This call | |
will be propagated down the abstract syntax tree and every element will | |
output the bytecodes necessary to complete its task.</p> | |
<p>The Java Virtual Machine is stack-based, which goes hand-in-hand with | |
the tree structure of a stylesheet and the AST. A node in the AST will | |
call <code>translate()</code> on its child nodes and any XPath nodes before | |
it generates its own bytecodes. In that way the correct sequence of JVM | |
instructions is generated. Each one of the child nodes is responsible of | |
creating code that leaves the node's output value (if any) on the stack. | |
The typical procedure for the parent node is to create JVM code that | |
consumes these values off the stack and then leave its own output on the | |
stack (for its parent).</p> | |
<p>The tree-structure of the stylesheet is in this way closely tied with | |
the stack-based JVM. The design does not offer any obvious way of extending | |
the compiler to output code for other non-stack-based VMs or processors.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
</div> | |
<div id="footer">Copyright © 1999-2012 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Sun 03/18/2012</div> | |
</div> | |
</body> | |
</html> |