<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
* The Apache Software License, Version 1.1
*
*
* Copyright (c) 2001 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Xalan" and "Apache Software Foundation" must
* not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache",
* nor may "Apache" appear in their name, without prior written
* permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation and was
* originally based on software copyright (c) 2001, Sun
* Microsystems., http://www.sun.com. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
-->
<s1 title="XSLTC Compiler Design">
<ul>
<li><link anchor="overview">Compiler Overview</link></li>
<li><link anchor="ast">Building an Abstract Syntax Tree</link></li>
<li><link anchor="typecheck">Type-check and Cast Expressions</link></li>
<li><link anchor="compile">Code generation</link></li>
</ul>
<anchor name="overview"/>
<s2 title="Compiler overview">
<p>The input stylesheet is parsed using the SAX 1-based parser from Sun's
Project X:</p>
<ul>
<li><code>com.sun.xml.parser.Parser</code></li>
</ul>
<p>This parser builds a DOM from the stylesheet document, and hands this
DOM over to the compiler. The compiler uses its own specialised parser to
parse XPath expressions and patterns:</p>
<ul>
<li><code>com.sun.xslt.compiler.XPathParser</code></li>
</ul>
<p>Both parsers are encapsulated in XSLTC's parser class:</p>
<ul>
<li><code>com.sun.xslt.compiler.Parser</code></li>
</ul>
</s2><anchor name="ast"/>
<s2 title="Building an Abstract Syntax Tree">
<ul>
<li><link anchor="mapping">Mapping stylesheet elements to Java classes</link></li>
<li><link anchor="domxsl">Building a DOM tree from the input XSL file</link></li>
</ul>
<p>The SAX parser builds a standard W3C DOM from the source stylesheet.
This DOM does not contain all the information needed to represent the
whole stylesheet. (Remember that XSL combines two languages: XML and XPath.
The DOM only covers XML.) The compiler uses the DOM to build an
abstract syntax tree (AST) that contains all the nodes from the DOM, plus
additional nodes for the XPath expressions.</p>
<anchor name="mapping"/>
<s3 title="Mapping stylesheet elements to Java classes">
<p>Each XSL element is represented by a class in the
<code>com.sun.xslt.compiler</code> package. The Parser class contains a
Hashtable that maps XSL instructions to classes that inherit from a
common parent class 'Instruction' (which in turn inherits from
'SyntaxTreeNode'). This mapping is set up in the <code>initStdClasses()</code> method:</p>
<source>    private void initStdClasses() {
        try {
            initStdClass("template", "Template");
            initStdClass("param", "Param");
            initStdClass("with-param", "WithParam");
            initStdClass("variable", "Variable");
            initStdClass("output", "Output");
            :
            :
            :
        }
        catch (ClassNotFoundException e) {
            // Error handling omitted in this excerpt
        }
    }

    // Maps an XSL element name to the compiler class that represents it
    private void initStdClass(String elementName, String className)
        throws ClassNotFoundException {
        _classes.put(elementName,
                     Class.forName(COMPILER_PACKAGE + '.' + className));
    }</source>
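<p>The lookup side of this table can be sketched as follows. This is only an
illustration of the name-to-class idea, not the compiler's actual code: the
class <code>InstructionRegistry</code>, its methods and the empty node classes
are hypothetical, and it uses a generic <code>HashMap</code> where the text
above describes a Hashtable.</p>
<source><![CDATA[
import java.util.HashMap;
import java.util.Map;

public class InstructionRegistry {
    // Hypothetical stand-ins for the compiler's node classes.
    public static class Instruction { }
    public static class Template extends Instruction { }
    public static class Variable extends Instruction { }

    // Element name -> class representing that element.
    private final Map<String, Class<? extends Instruction>> classes = new HashMap<>();

    public void register(String elementName, Class<? extends Instruction> cls) {
        classes.put(elementName, cls);
    }

    /** Returns a new node for the element, or null if the name is not registered. */
    public Instruction makeInstance(String elementName) throws ReflectiveOperationException {
        Class<? extends Instruction> cls = classes.get(elementName);
        return cls == null ? null : cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        InstructionRegistry registry = new InstructionRegistry();
        registry.register("template", Template.class);
        registry.register("variable", Variable.class);
        System.out.println(registry.makeInstance("template").getClass().getSimpleName());
    }
}
]]></source>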
</s3><anchor name="domxsl"/>
<s3 title="Building a DOM tree from the input XSL file">
<p>The parser instantiates a DOM that holds the input XSL stylesheet. The
DOM can only handle XML and will not break up or identify XPath
patterns/expressions or calls to XSL functions. Each XSL instruction gets
its own node in the DOM, and the XPath patterns/expressions are stored as
attributes of these nodes. A stylesheet looking like this:</p>
<source>
&lt;xsl:stylesheet .......&gt;
  &lt;xsl:template match="chapter"&gt;
    &lt;xsl:text&gt;Chapter&lt;/xsl:text&gt;
    &lt;xsl:value-of select="."/&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</source>
<p>will be stored in the DOM as indicated in the following picture:</p>
<p><img src="compiler_DOM.gif" alt="compiler_DOM.gif"/></p>
<p><ref>Figure 1: DOM containing XSL stylesheet</ref></p>
<p>The pattern '<code>match="chapter"</code>' and the expression
'<code>select="."</code>' are stored as attributes for the nodes
'<code>xsl:template</code>' and '<code>xsl:value-of</code>' respectively.
These attributes are accessible through the DOM interface.</p>
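<p>As a rough illustration of reading these attributes through the DOM
interface, the sketch below parses an equivalent stylesheet and prints the two
attribute values. It uses the standard JAXP <code>DocumentBuilder</code> rather
than the Project X parser named above, and the class name
<code>StylesheetDomDemo</code>, the <code>version</code> attribute and the
namespace declaration are assumptions added to make the example
self-contained.</p>
<source><![CDATA[
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class StylesheetDomDemo {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    // Same structure as the stylesheet shown above.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSLT_NS + "\">"
      + "<xsl:template match=\"chapter\">"
      + "<xsl:text>Chapter</xsl:text>"
      + "<xsl:value-of select=\".\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Parses the XML text into a namespace-aware W3C DOM. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the named attribute of the first XSL element with the given local name. */
    static String attributeOf(Document doc, String localName, String attribute) {
        Element element = (Element) doc.getElementsByTagNameNS(XSLT_NS, localName).item(0);
        return element.getAttribute(attribute);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STYLESHEET);
        // The DOM exposes the pattern and the expression only as plain strings.
        System.out.println(attributeOf(doc, "template", "match"));
        System.out.println(attributeOf(doc, "value-of", "select"));
    }
}
]]></source>
<p>The values come back as uninterpreted strings, which is why a separate
XPath parsing step is still needed.</p>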
</s3>
<s3 title="Creating the Abstract Syntax Tree from the DOM">
<p>What we have to do next is to create a tree that also holds the
XSL-specific elements: XPath expressions and patterns (with possible filters)
and calls to XSL functions. This is done by traversing the DOM and creating an
instance of a subclass of 'SyntaxTreeNode' for each node in the DOM. A node
in the DOM containing an XSL instruction (for example, "xsl:template") results in an
instance of the corresponding class, looked up in the Hashtable created by
the parser (in this case an instance of the 'Template' class).</p>
<p>Each class that inherits from SyntaxTreeNode has a vector called
'<code>_contents</code>' that holds references to all the children of the node
(if any). Each node has a method called '<code>parseContents()</code>'. It is
the responsibility of this method to parse any XPath expressions/patterns
that are expected and found in the node's attributes. The XPath expressions
and patterns are tokenised using the auto-generated class 'XPathParser'
(generated using JavaCup and JLex). Each tokenised expression/pattern
results in a small sub-tree owned by the syntax tree node.</p>
<p>XSL nodes holding expressions have a pointer called '<code>_select</code>' that
points to a sub-tree representing the expression. This can be seen, for
instance, in the 'Template' class:</p>
<p><img src="compiler_AST.gif" alt="compiler_AST.gif"/></p>
<p><ref>Figure 2: Sample Abstract Syntax Tree</ref></p>
<p>In this example <code>_select</code> only points to a single node. In more complex
expressions the pointer will point to a whole sub-tree.</p>
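<p>The node structure described above might be sketched as follows. This is a
simplified illustration, not the compiler's actual classes: the
<code>Expression</code> wrapper, the <code>_pattern</code> field and the
single-argument <code>parseContents()</code> signature are assumptions, and a
<code>List</code> stands in for the vector. Where the real compiler would hand
the attribute string to XPathParser, the sketch merely wraps it.</p>
<source><![CDATA[
import java.util.ArrayList;
import java.util.List;

public class SyntaxTreeSketch {
    /** Hypothetical stand-in for a parsed XPath expression sub-tree. */
    static final class Expression {
        final String text;
        Expression(String text) { this.text = text; }
    }

    /** Simplified node: child nodes are kept in _contents. */
    static abstract class SyntaxTreeNode {
        final List<SyntaxTreeNode> _contents = new ArrayList<>();

        /** Parses the XPath text found in this node's attribute. */
        abstract void parseContents(String attributeValue);
    }

    static final class ValueOf extends SyntaxTreeNode {
        Expression _select;

        @Override
        void parseContents(String attributeValue) {
            // A real implementation would invoke the XPath parser here.
            _select = new Expression(attributeValue);
        }
    }

    static final class Template extends SyntaxTreeNode {
        String _pattern; // hypothetical field holding the match pattern

        @Override
        void parseContents(String attributeValue) {
            _pattern = attributeValue;
        }
    }

    public static void main(String[] args) {
        Template template = new Template();
        template.parseContents("chapter");

        ValueOf valueOf = new ValueOf();
        valueOf.parseContents(".");
        template._contents.add(valueOf);

        System.out.println(template._contents.size());
        System.out.println(valueOf._select.text);
    }
}
]]></source>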
</s3>
</s2><anchor name="typecheck"/>
<s2 title="Type-check and Cast Expressions">
<p>In many cases we will need to typecast the top node in the expression
sub-tree to suit the expected result type of the expression, or to typecast
child nodes to suit the allowed types for the various operators in the
expression. This is done by calling '<code>typeCheck()</code>' on the root node in the
XSL tree. Each SyntaxTreeNode is responsible for its own type checking
(i.e., the <code>typeCheck()</code> method must be overridden). Let us say that
our expression was:</p>
<p><code>&lt;xsl:value-of select=&quot;1+2.73&quot;/&gt;</code></p>
<p><img src="typecast.gif" alt="typecast.gif"/></p>
<p><ref>Figure 3: XPath expression type conflict</ref></p>
<p>The number 1 is an integer, and the number 2.73 is a real number, so the
1 has to be promoted to a real. This is done by inserting a new node between
the [1] and the [+]. This node will convert the 1 to a real number:</p>
<p><img src="cast_expression.gif" alt="cast_expression.gif"/></p>
<p><ref>Figure 4: Type casting</ref></p>
<p>The inserted node is an object of the class CastExpr. The SymbolTable
that was instantiated in (1) is used to determine what casts are needed for
the various operators and what return types the various expressions will
have.</p>
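<p>A minimal sketch of this promotion step is given below. Only the class name
CastExpr and the method name <code>typeCheck()</code> come from the text; the
<code>Type</code> enum, the <code>Literal</code> and <code>AddExpr</code>
classes and the hard-coded promotion rule are assumptions made for
illustration. In particular, the sketch does not consult a SymbolTable.</p>
<source><![CDATA[
public class CastSketch {
    enum Type { INT, REAL }

    interface Expr {
        Type type();
    }

    /** A numeric literal with a fixed type. */
    static final class Literal implements Expr {
        final Type type;
        final String text;
        Literal(Type type, String text) { this.type = type; this.text = text; }
        public Type type() { return type; }
    }

    /** Wraps an expression and reports the target type instead of the inner one. */
    static final class CastExpr implements Expr {
        final Expr inner;
        final Type target;
        CastExpr(Expr inner, Type target) { this.inner = inner; this.target = target; }
        public Type type() { return target; }
    }

    static final class AddExpr implements Expr {
        Expr left;
        Expr right;
        AddExpr(Expr left, Expr right) { this.left = left; this.right = right; }

        /** Inserts a CastExpr above an INT operand when the other operand is REAL. */
        Type typeCheck() {
            if (left.type() == Type.INT && right.type() == Type.REAL) {
                left = new CastExpr(left, Type.REAL);
            } else if (left.type() == Type.REAL && right.type() == Type.INT) {
                right = new CastExpr(right, Type.REAL);
            }
            return left.type();
        }

        // Meaningful once typeCheck() has made both operand types agree.
        public Type type() { return left.type(); }
    }

    public static void main(String[] args) {
        // Corresponds to the expression 1+2.73 used above.
        AddExpr sum = new AddExpr(new Literal(Type.INT, "1"), new Literal(Type.REAL, "2.73"));
        System.out.println(sum.typeCheck());
        System.out.println(sum.left instanceof CastExpr);
    }
}
]]></source>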
</s2><anchor name="compile"/>
<s2 title="Code generation">
<ul>
<li><link anchor="toplevelelem">Compiling top-level elements</link></li>
<li><link anchor="templatecode">Compiling template code</link></li>
<li><link anchor="instrfunc">Compiling XSL instructions and functions</link></li>
</ul>
<p>A general rule is that all classes that represent elements in the XSL
tree/document, i.e., classes that inherit from SyntaxTreeNode, output
bytecode in the 'translate()' method.</p>
<anchor name="toplevelelem"/>
<s3 title="Compiling top-level elements">
<p>The bytecode that handles top-level elements must be generated before any
other code. The '<code>translate()</code>' methods of the classes representing
these elements are mainly called from the following methods in the Stylesheet class:</p>
<source> private String compileBuildKeys(ClassGenerator classGen);
private String compileTopLevel(ClassGenerator classGen, Enumeration elements);
private void compileConstructor(ClassGenerator classGen, Output output);</source>
<p>These methods handle most top-level elements, such as global variables
and parameters, <code>&lt;xsl:output&gt;</code> and
<code>&lt;xsl:decimal-format&gt;</code> instructions.</p>
</s3><anchor name="templatecode"/>
<s3 title="Compiling template code">
<p>All XPath patterns in <code>&lt;xsl:template&gt;</code> elements
are converted into numeric values (known as the pattern's kernel 'type').
All templates with identical pattern kernel types are grouped together and
inserted into a table under their assigned type. (This table is found in the
Mode class. There is one such table for each mode that is used in the
stylesheet.) This table is used to build a big <code>switch()</code> statement
in the translet's <code>applyTemplates()</code> method. This method is initially
called with the root node of the input document.</p>
<p>The <code>applyTemplates()</code> method determines the node's type and passes
this type to the <code>switch()</code> statement to look up the matching
template.</p>
<p>There may be several templates that share the same pattern kernel type.
Here are a few examples of templates with patterns that all have the same
kernel type:</p>
<source> &lt;xsl:template match=&quot;A/C&quot;&gt;
&lt;xsl:template match=&quot;A/B/C&quot;&gt;
&lt;xsl:template match=&quot;A | C&quot;&gt;</source>
<p>All these templates get the same kernel type (the type for <code>"C"</code>)
and are grouped under it. The last
template is grouped under both <code>"C"</code> and <code>"A"</code>. If the
type identifier for <code>"C"</code> in this case is 8, all these templates are
put under <code>case 8:</code> in the big <code>switch()</code> statement in
<code>applyTemplates()</code>. The Mode class inserts extra code to choose
which template code to invoke.</p>
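<p>The shape of the resulting dispatch can be illustrated in plain Java. This is
a hand-written sketch of the idea, not code generated by XSLTC: the
<code>Node</code> class, the identifiers for "A" and "B" and the order in which
the extra tests are tried are all assumptions, and the method returns the
matching pattern as a string instead of running template code.</p>
<source><![CDATA[
public class ApplyTemplatesSketch {
    // Hypothetical type identifiers; only the value 8 for "C" is taken from the text.
    static final int TYPE_A = 6;
    static final int TYPE_B = 7;
    static final int TYPE_C = 8;

    /** Minimal node: a type identifier plus a link to the parent. */
    static final class Node {
        final int type;
        final Node parent;
        Node(int type, Node parent) { this.type = type; this.parent = parent; }
    }

    static boolean parentHasType(Node node, int type) {
        return node.parent != null && node.parent.type == type;
    }

    /** Returns the pattern of the template that would be chosen for the node. */
    static String applyTemplates(Node node) {
        switch (node.type) {
            case TYPE_C:
                // Extra tests choose among the templates sharing kernel type "C".
                if (parentHasType(node, TYPE_B) && parentHasType(node.parent, TYPE_A)) {
                    return "A/B/C";
                }
                if (parentHasType(node, TYPE_A)) {
                    return "A/C";
                }
                return "A | C";
            case TYPE_A:
                return "A | C";
            default:
                return "(no matching template)";
        }
    }

    public static void main(String[] args) {
        Node a = new Node(TYPE_A, null);
        Node b = new Node(TYPE_B, a);
        System.out.println(applyTemplates(new Node(TYPE_C, a)));
        System.out.println(applyTemplates(new Node(TYPE_C, b)));
        System.out.println(applyTemplates(a));
    }
}
]]></source>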
</s3><anchor name="instrfunc"/>
<s3 title="Compiling XSL instructions and functions">
<p>The template code is generated by calling <code>translate()</code> on each
Template object in the abstract syntax tree. This call will be propagated
down the tree and every element will output the bytecodes necessary to
complete its task.</p>
<p>Each node will call '<code>translate()</code>' on its children, and possibly on
objects representing the node's XPath expressions, before outputting its
own bytecode. In that way the correct sequence of instructions is generated.
Each of the child nodes is responsible for creating code that leaves the
node's output value (if any) on the stack. The typical procedure for the
parent node is to create code that consumes these values off the stack and
then leaves its own output on the stack for its parent.</p>
<p>The tree structure of the stylesheet is in this way closely tied to
the stack-based JVM. The design does not offer any obvious way of extending
the compiler to output code for other VMs or processors.</p>
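<p>The children-first ordering can be sketched with a toy expression tree.
This is an illustration only: the <code>Node</code> interface, the
<code>Num</code> and <code>Add</code> classes and the textual
<code>push</code>/<code>add</code> mnemonics are assumptions, and the sketch
collects strings where the compiler emits real JVM bytecode.</p>
<source><![CDATA[
import java.util.ArrayList;
import java.util.List;

public class TranslateSketch {
    interface Node {
        /** Appends code that leaves this node's value on the stack. */
        void translate(List<String> code);
    }

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }

        public void translate(List<String> code) {
            code.add("push " + value);
        }
    }

    static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }

        public void translate(List<String> code) {
            left.translate(code);   // child code first: left operand on the stack
            right.translate(code);  // then the right operand
            code.add("add");        // own code last: consumes both, leaves the sum
        }
    }

    public static void main(String[] args) {
        List<String> code = new ArrayList<>();
        new Add(new Num(1), new Num(2.73)).translate(code);
        for (String instruction : code) {
            System.out.println(instruction);
        }
    }
}
]]></source>
<p>Because each node appends its own instruction only after its children have
emitted theirs, the instruction order matches the order in which a stack
machine needs its operands.</p>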
</s3>
</s2>
</s1>