blob: f6ce6275e1c346752efcbdeb6cb5fe5075182b4f [file] [log] [blame]
<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
<!--
* The Apache Software License, Version 1.1
*
*
* Copyright (c) 2001 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Xalan" and "Apache Software Foundation" must
* not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache",
* nor may "Apache" appear in their name, without prior written
* permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation and was
* originally based on software copyright (c) 2001, Sun
* Microsystems., http://www.sun.com. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
-->
<s1 title="XSLTC Compiler Design">
<ul>
<li><link anchor="overview">Compiler Overview</link></li>
<li><link anchor="ast">Building the Abstract Syntax Tree</link></li>
<li><link anchor="typecheck">Type-check and Cast Expressions</link></li>
<li><link anchor="compile">JVM byte-code generation</link></li>
</ul>
<!--=================== OVERVIEW SECTION ===========================-->
<anchor name="overview"/>
<s2 title="Compiler overview">
<p>The main component of the XSLTC compiler is the class</p>
<ul>
<li><code>org.apache.xalan.xsltc.compiler.XSLTC</code></li>
</ul>
<p>This class uses three parsers to consume the input stylesheet(s):</p>
<ul>
<li><code>javax.xml.parsers.SAXParser</code></li>
</ul>
<p>is used to parse the stylesheet document and pass its contents to
the compiler as basic SAX2 events.</p>
<ul>
<li><code>com.sun.xslt.compiler.XPathParser</code></li>
</ul>
<p> is a parser used to parse XPath expressions and patterns. This parser
is generated using JavaCUP and JavaLEX from Princeton University.</p>
<ul>
<li><code>com.sun.xslt.compiler.Parser</code></li>
</ul>
<p>is a wrapper for the other two parsers. This parser is responsible for
using the other two parsers to build the compiler's abstract syntax tree
(which is described in more detail in the next section of this document).
</p>
</s2>
<!--============== ABSTRACT SYNTAX TREE SECTION ======================-->
<anchor name="ast"/>
<s2 title="Building an Abstract Syntax Tree">
<p>An abstract syntax tree (AST) is a data-structure commonly used by
compilers to separate the parse-phase from the later phases of the
compilation. The AST has one node for each parsed token from the stylesheet
and can easily be parsed at the stages of type-checking and bytecode
generation.</p>
<ul>
<li>
<link anchor="mapping">Mapping stylesheet elements to AST nodes</link>
</li>
<li>
<link anchor="domxsl">Building the AST from AST nodes</link>
</li>
<li>
<link anchor="mapping">Mapping XPath expressions and patterns to additional AST nodes</link>
</li>
</ul>
<p>The SAX parser passes the contents of the stylesheet to XSLTC's main
parser. The SAX events represent a decomposition of the XML document that
contains the stylesheet. The main parser needs to create one AST node from
each node that it receives from the SAX parser. It also needs to use the
XPath parser to decompose attributes that contain XPath expressions and
patterns. Remember that XSLT is in effect two languages: XML and XPath,
and one parser is needed for each of these languages. The SAX parser breaks
down the stylesheet document, the XPath parser breaks down XPath expressions
and patterns, and the main parser maps the decomposed elements into nodes
in the abstract syntax tree.</p>
<anchor name="mapping"/>
<s3 title="Mapping stylesheets elements to AST nodes">
<p>Every element that is defined in the XSLT 1.0 spec is represented by a
a class in the <code>org.apache.xalan.xsltc.compiler</code> package. The
main parser class contains a <code>Hashtable</code> that that maps XSL
elements into Java classes that make up the nodes in the AST. These Java
classes all reside in the <code>org.apache.xalan.xsltc.compiler</code>
package and extend either the <code>TopLevelElement</code> or the
<code>Instruction</code> classes. (Both these classes extend the
<code>SyntaxTreeNode</code> class.)</p>
<p>The mapping from XSL element names to Java classes/AST nodes is set up
in the <code>initClasses()</code> method of the main parser:</p><source>
private void initStdClasses() {
try {
initStdClass("template", "Template");
initStdClass("param", "Param");
initStdClass("with-param", "WithParam");
initStdClass("variable", "Variable");
initStdClass("output", "Output");
:
:
:
}
}
private void initClass(String elementName, String className)
throws ClassNotFoundException {
_classes.put(elementName,
Class.forName(COMPILER_PACKAGE + '.' + className));
}</source>
</s3>
<anchor name="domxsl"/>
<s3 title="Building the AST from AST nodes">
<p>The parser builds an AST from the various syntax tree nodes. Each node
contains a reference to its parent node, a vector containing references
to all child nodes and a structure containing all attribute nodes:</p><source>
protected SyntaxTreeNode _parent; // Parent node
private Vector _contents; // Child nodes
protected Attributes _attributes; // Attributes of this element</source>
<p>These variables should be accessed using these methods:</p><source>
protected final SyntaxTreeNode getParent();
protected final Vector getContents();
protected String getAttribute(String qname);
protected Attributes getAttributes();</source>
<p>At this time the AST only contains nodes that represent the XSL elements
from the stylesheet. A SAX parse is generic and can only handle XML files
and will not break up and identify XPath patterns/expressions (these are
stored as attributes to the various nodes in the tree). Each XSL instruction
gets its own node in the AST, and the XPath patterns/expressions are stored
as attributes of these nodes. A stylesheet looking like this:</p><source>
&lt;xsl:stylesheet .......&gt;
&lt;xsl:template match="chapter"&gt;
&lt;xsl:text&gt;Chapter&lt;/xsl:text&gt;
&lt;xsl:value-of select="."&gt;
&lt;/xsl:template&gt;
&lt;/xsl&gt;stylesheet&gt;</source>
<p>will be stored in the AST as indicated in the following picture:</p>
<p><img src="ast_stage1.gif" alt="ast_stage1.gif"/></p>
<p><ref>Figure 1: The AST in its first stage</ref></p>
<p>All objects that make up the nodes in the initial AST have a
<code>parseContents()</code> method. This method is responsible for:</p>
<ul>
<li>parsing the values of those attributes that contain XPath expressions
or patterns, breaking each expression/pattern into AST nodes and inserting
them into the tree.</li>
<li>reading/checking all other required attributes</li>
<li>propagate the <code>parseContents()</code> call down the tree</li>
</ul>
</s3>
<s3 title="Mapping XPath expressions and patterns to additional AST nodes">
<p>The nodes that represent the XPath expressions and patterns extend
either the <code>Expression</code> or <code>Pattern</code> class
respectively. These nodes are not appended to the <code>_contents</code>
vectory of each node, but rather stored as individual references in each
AST element node. One example is the <code>ForEach</code> class that
represents the <code>&lt;xsl:for-each&gt;</code> element. This class has
a variable that contains a reference to the AST sub-tree that represents
its <code>select</code> attribute:</p><source>
private Expression _select;</source>
<p>There is no standard way of storing these XPath expressions and each
AST node that contains one or more XPath expression/pattern must handle
that itself. This handling basically involves passing the attribute's
value to the XPath parser and receiving back an AST sub-tree.</p>
<p>With all XPath expressions/patterns expanded, the AST will look somewhat
like this:</p>
<p><img src="ast_stage2.gif" alt="ast_stage2.gif"/></p>
<p><ref>Fiugre 2: The AST in its second stage</ref></p>
</s3>
</s2>
<!--================= TYPE CONVERSION SECTION ========================-->
<anchor name="typecheck"/>
<s2 title="Type-check and Cast Expressions">
<p>In many cases we will need to typecast the top node in the expression
sub-tree to suit the expected result-type of the expression, or to typecast
child nodes to suit the allowed types for the various operators in the
expression. This is done by calling 'typeCheck()' on the root-node in the
XSL tree. Each SyntaxTreeNode node is responsible for inserting type-cast
nodes between itself and its child nodes or XPath nodes. These type-cast
nodes will convert the output-types of the child/XPath nodes to the expected
input-type of the parent node. Let look at our AST again and the node that
represents the <code>&lt;xsl:value-of&gt;</code> element. This element
expects to receive a string from its <code>select</code> XPath expression,
but the <code>Step</code> expression will return either a node-set or a
single node. An extra node is inserted into the AST to perform the
necessary type conversions:</p>
<p><img src="ast_stage3.gif" alt="ast_stage3.gif"/></p>
<p><ref>Figure 3: XPath expression type cast</ref></p>
<p>The <code>typeCheck()</code> method of each SyntaxTreeNode object will
call <code>typeCheck()</code> on each of its XPath expressions. This method
will return the native type returned by the expression. The AST node will
insert an additional type-conversion node if the return-type does not match
the expected data-type. Each possible return type is represented by a class
in the <code>org.apache.xalan.xsltc.compiler.util</code> package. These
classes all contain methods that will generate bytecodes needed to perform
the actual type conversions (at runtime). The type-cast nodes in the AST
mainly consist of calls to these methods.</p>
</s2>
<!--=============== BYTE-CODE GENERATION SECTION ======================-->
<anchor name="compile"/>
<s2 title="JVM byte-code generation">
<ul>
<li><link anchor="stylesheet">Compiling the stylesheet</link></li>
<li><link anchor="toplevel">Compiling top-level elements</link></li>
<li><link anchor="templates">Compiling template code</link></li>
<li><link anchor="instructions">Compiling instructions, functions expressions and patterns</link></li>
</ul>
<p>Evey node in the AST extends the <code>SyntaxTreeNode</code> base class
and implements the <code>translate()</code> method. This method is
responsible for outputting the actual bytecodes that make up the
functionality required for each element, function, expression or pattern.
</p>
<anchor name="stylesheet"/>
<s3 title="Compiling the stylesheet">
<p>Some nodes in the AST require more complex code than others. The best
example is the <code>&lt;xsl:stylesheet&gt;</code> element. The code that
represents this element has to tie together the code that is generated by
all the other elements and generate the actual class definition for the main
translet class. The <code>Stylesheet</code> class generates the translet's
constructor and methods that handle all top-level elements.</p>
</s3>
<anchor name="toplevel"/>
<s3 title="Compiling top-level elements">
<p>The bytecode that handles top-level elements must be generated before any
other code. The '<code>translate()</code>' method in these classes are
mainly called from these methods in the Stylesheet class:</p><source>
private String compileBuildKeys(ClassGenerator);
private String compileTopLevel(ClassGenerator, Enumeration);
private void compileConstructor(ClassGenerator, Output);</source>
<p>These methods handle most top-level elements, such as global variables
and parameters, <code>&lt;xsl:output&gt;</code> and
<code>&lt;xsl:decimal-format&gt;</code> instructions.</p>
</s3>
<anchor name="templates"/>
<s3 title="Compiling template code">
<p>All XPath patterns in <code>&lt;xsl:apply-template&gt;</code>
instructions are converted into numeric values (known as the pattern's
kernel 'type'). All templates with identical pattern kernel types are
grouped together and inserted into a table known as a test sequence.
(The table of test sequences is found in the Mode class in the compiler
package. There will be one such table for each mode that is used in the
stylesheet). This table is used to build a big <code>switch()</code>
statement in the translet's <code>applyTemplates()</code> method. This
method is initially called with the root node of the input document.</p>
<p>The <code>applyTemplates()</code> method determines the node's type and
passes this type to the <code>switch()</code> statement to look up the
matching template. The test sequence code (the <code>TestSeq</code> class)
is responsible for inserting bytecodes to find one matching template
in cases where more than one template matches the current node type.</p>
<p>There may be several templates that share the same pattern kernel type.
Here are a few examples of templates with patterns that all have the same
kernel type:</p><source>
&lt;xsl:template match=&quot;A/C&quot;&gt;
&lt;xsl:template match=&quot;A/B/C&quot;&gt;
&lt;xsl:template match=&quot;A | C&quot;&gt;</source>
<p>All these templates will be grouped under the type for
<code>&lt;C&gt;</code> and will all get the same kernel type (the type for
<code>"C"</code>). The last template will be grouped both under
<code>"C"</code> and <code>"A"</code>, since it matches either element.
If the type identifier for <code>"C"</code> in this case is 8, all these
templates will be put under <code>case 8:</code> in
<code>applyTemplates()</code>'s big <code>switch()</code> statement. The
<code>TestSeq</code> class will insert some code under the
<code>case 8:</code> statement (similar to if's and then's) in order to
determine which of the three templates to trigger.</p>
</s3>
<anchor name="instructions"/>
<s3 title="Compiling instructions, functions, expressions and patterns">
<p>The template code is generated by calling <code>translate()</code> on
each <code>Template</code> object in the abstract syntax tree. This call
will be propagated down the abstract syntax tree and every element will
output the bytecodes necessary to complete its task.</p>
<p>The Java Virtual Machine is stack-based, which goes hand-in-hand with
the tree structure of a stylesheet and the AST. A node in the AST will
call <code>translate()</code> on its child nodes and any XPath nodes before
it generates its own bytecodes. In that way the correct sequence of JVM
instructions is generated. Each one of the child nodes is responsible of
creating code that leaves the node's output value (if any) on the stack.
The typical procedure for the parent node is to create JVM code that
consumes these values off the stack and then leave its own output on the
stack (for its parent).</p>
<p>The tree-structure of the stylesheet is in this way closely tied with
the stack-based JVM. The design does not offer any obvious way of extending
the compiler to output code for other non-stack-based VMs or processors.</p>
</s3>
</s2>
</s1>