| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd"> |
| <!-- |
| * The Apache Software License, Version 1.1 |
| * |
| * |
| * Copyright (c) 2001 The Apache Software Foundation. All rights |
| * reserved. |
| * |
| * Redistribution and use in source and binary forms, with or without |
| * modification, are permitted provided that the following conditions |
| * are met: |
| * |
| * 1. Redistributions of source code must retain the above copyright |
| * notice, this list of conditions and the following disclaimer. |
| * |
| * 2. Redistributions in binary form must reproduce the above copyright |
| * notice, this list of conditions and the following disclaimer in |
| * the documentation and/or other materials provided with the |
| * distribution. |
| * |
| * 3. The end-user documentation included with the redistribution, |
| * if any, must include the following acknowledgment: |
| * "This product includes software developed by the |
| * Apache Software Foundation (http://www.apache.org/)." |
| * Alternately, this acknowledgment may appear in the software itself, |
| * if and wherever such third-party acknowledgments normally appear. |
| * |
| * 4. The names "Xalan" and "Apache Software Foundation" must |
| * not be used to endorse or promote products derived from this |
| * software without prior written permission. For written |
| * permission, please contact apache@apache.org. |
| * |
| * 5. Products derived from this software may not be called "Apache", |
| * nor may "Apache" appear in their name, without prior written |
| * permission of the Apache Software Foundation. |
| * |
| * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED |
| * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES |
| * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE |
| * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR |
| * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, |
| * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT |
| * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF |
| * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND |
| * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, |
| * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT |
| * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF |
| * SUCH DAMAGE. |
| * ==================================================================== |
| * |
| * This software consists of voluntary contributions made by many |
| * individuals on behalf of the Apache Software Foundation and was |
| * originally based on software copyright (c) 2001, Sun |
| * Microsystems., http://www.sun.com. For more |
| * information on the Apache Software Foundation, please see |
| * <http://www.apache.org/>. |
| --> |
| <s1 title="XSLTC Compiler Design"> |
| <ul> |
| <li><link anchor="overview">Compiler Overview</link></li> |
| <li><link anchor="ast">Building an Abstract Syntax Tree</link></li> |
| <li><link anchor="typecheck">Type-check and Cast Expressions</link></li> |
| <li><link anchor="compile">Code generation</link></li> |
| </ul> |
| |
| <anchor name="overview"/> |
| <s2 title="Compiler overview"> |
| |
| <p>The input stylesheet is parsed using the SAX 1-based parser from Sun's |
| Project X:</p> |
| <ul> |
| <li><code>com.sun.xml.parser.Parser</code></li> |
| </ul> |
| |
| <p>This parser builds a DOM from the stylesheet document, and hands this |
| DOM over to the compiler. The compiler uses its own specialised parser to |
| parse XPath expressions and patterns:</p> |
| <ul> |
| <li><code>com.sun.xslt.compiler.XPathParser</code></li> |
| </ul> |
| <p>Both parsers are encapsulated in XSLTC's parser class:</p> |
| <ul> |
| <li><code>com.sun.xslt.compiler.Parser</code></li> |
| </ul> |
| |
| </s2><anchor name="ast"/> |
| <s2 title="Building an Abstract Syntax Tree"> |
| <ul> |
| <li><link anchor="mapping">Mapping stylesheet elements to Java classes</link></li> |
| <li><link anchor="domxsl">Building a DOM tree from the input XSL file</link></li> |
| </ul> |
| <p>The SAX parser builds a standard W3C DOM from the source stylesheet. |
| This DOM does not contain all the information needed to represent the |
| whole stylesheet. (Remember that XSL combines two languages, XML and XPath, |
| and the DOM only covers XML.) The compiler uses the DOM to build an |
| abstract syntax tree (AST) that contains all the nodes from the DOM, plus |
| additional nodes for the XPath expressions.</p> |
| <anchor name="mapping"/> |
| <s3 title="Mapping stylesheet elements to Java classes"> |
| <p>Each XSL element is represented by a class in the |
| <code>com.sun.xslt.compiler</code> package. The Parser class contains a |
| Hashtable that maps XSL instructions to classes that inherit from a |
| common parent class 'Instruction' (which in turn inherits from |
| 'SyntaxTreeNode'). This mapping is set up in the <code>initStdClasses()</code> method:</p> |
| <source> private void initStdClasses() { |
|     try { |
|         // Map each XSL element name to the compiler class that represents it |
|         initStdClass("template", "Template"); |
|         initStdClass("param", "Param"); |
|         initStdClass("with-param", "WithParam"); |
|         initStdClass("variable", "Variable"); |
|         initStdClass("output", "Output"); |
|         : |
|         : |
|         : |
|     } |
|     catch (ClassNotFoundException e) { |
|         // A compiler class could not be loaded; error handling omitted here |
|     } |
| } |
| |
| private void initStdClass(String elementName, String className) |
|     throws ClassNotFoundException { |
|     _classes.put(elementName, |
|                  Class.forName(COMPILER_PACKAGE + '.' + className)); |
| }</source> |
| </s3><anchor name="domxsl"/> |
| <s3 title="Building a DOM tree from the input XSL file"> |
| <p>The parser instantiates a DOM that holds the input XSL stylesheet. The |
| DOM only understands XML, so it does not break up or identify XPath |
| patterns and expressions or calls to XSL functions. Each XSL instruction gets |
| its own node in the DOM, and the XPath patterns and expressions are stored as |
| attributes of these nodes. A stylesheet looking like this:</p> |
| <source> |
| <xsl:stylesheet .......> |
| <xsl:template match="chapter"> |
| <xsl:text>Chapter</xsl:text> |
| <xsl:value-of select="."/> |
| </xsl:template> |
| </xsl:stylesheet> |
| </source> |
| <p>will be stored in the DOM as indicated in the following picture:</p> |
| <p><img src="compiler_DOM.gif" alt="compiler_DOM.gif"/></p> |
| <p><ref>Figure 1: DOM containing XSL stylesheet</ref></p> |
| <p>The pattern '<code>match="chapter"</code>' and the expression |
| '<code>select="."</code>' are stored as attributes for the nodes |
| '<code>xsl:template</code>' and '<code>xsl:value-of</code>' respectively. |
| These attributes are accessible through the DOM interface.</p> |
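| <p>As a minimal sketch of that attribute access, the following Java code parses the sample stylesheet and reads the two attributes back through the standard W3C DOM interface. It assumes the JAXP <code>DocumentBuilderFactory</code> rather than the Project X parser named above, and the class name <code>DomAttributeSketch</code>, the <code>version</code> attribute and the <code>xmlns:xsl</code> declaration are illustrative additions, not part of XSLTC:</p> |

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XSLTC: shows that patterns and
// expressions are plain attribute strings as far as the DOM is concerned.
class DomAttributeSketch {
    // The sample stylesheet from the text, with a namespace declaration
    // added so that it is well-formed on its own (an assumption).
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"chapter\">"
      + "<xsl:text>Chapter</xsl:text>"
      + "<xsl:value-of select=\".\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the named attribute of the first element with the given
    // qualified name, exactly as the DOM stores it (an unparsed string).
    static String attributeOf(String qName, String attribute) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(STYLESHEET)));
            Element element = (Element) doc.getElementsByTagName(qName).item(0);
            return element.getAttribute(attribute);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse stylesheet", e);
        }
    }
}
```

| <p>Calling <code>attributeOf("xsl:template", "match")</code> returns the raw pattern text; nothing in the DOM knows that this string is an XPath pattern, which is why the compiler needs a separate tree.</p> |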
| </s3> |
| <s3 title="Creating the Abstract Syntax Tree from the DOM"> |
| <p>The next step is to create a tree that also holds the XSL-specific |
| elements: XPath expressions and patterns (with possible filters) |
| and calls to XSL functions. This is done by traversing the DOM and creating an |
| instance of a subclass of 'SyntaxTreeNode' for each node in the DOM. A node |
| in the DOM containing an XSL instruction (for example, "xsl:template") results in an |
| instance of the corresponding class looked up in the Hashtable created by |
| the parser (in this case an instance of the 'Template' class).</p> |
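| <p>The lookup-and-instantiate step could be sketched roughly as below. All class names here (<code>SketchNode</code>, <code>SketchTemplate</code>, <code>SketchVariable</code>, <code>SketchNodeFactory</code>) are hypothetical stand-ins, and the table is filled with class literals instead of <code>Class.forName()</code> purely to keep the example self-contained:</p> |

```java
import java.util.Hashtable;

// Hypothetical stand-ins for SyntaxTreeNode and two of its subclasses.
abstract class SketchNode { }
class SketchTemplate extends SketchNode { }
class SketchVariable extends SketchNode { }

// Hypothetical factory illustrating the element-name-to-class lookup.
class SketchNodeFactory {
    private final Hashtable<String, Class<? extends SketchNode>> classes =
        new Hashtable<>();

    SketchNodeFactory() {
        // Assumption: only two element names are registered in this sketch.
        classes.put("template", SketchTemplate.class);
        classes.put("variable", SketchVariable.class);
    }

    // Creates a node for an XSL element name such as "template",
    // or returns null when no class is registered for that name.
    SketchNode makeInstance(String elementName) {
        Class<? extends SketchNode> cls = classes.get(elementName);
        if (cls == null) {
            return null;
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls, e);
        }
    }
}
```

| <p>Keeping the mapping in a table means that supporting a new instruction only needs a new class and one extra table entry; the tree-building loop itself stays unchanged.</p> |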
| |
| <p>Each class that inherits from SyntaxTreeNode has a vector called |
| '<code>_contents</code>' that holds references to all the children of the node |
| (if any). Each node has a method called '<code>parseContents()</code>'. It is |
| the responsibility of this method to parse any XPath expressions/patterns |
| that are expected and found in the node's attributes. The XPath patterns |
| and expressions are tokenised and parsed using the auto-generated class 'XPathParser' |
| (generated using JavaCup and JLex). Each parsed expression or pattern |
| results in a small sub-tree owned by the syntax tree node.</p> |
| |
| <p>XSL nodes holding expressions have a pointer called '<code>_select</code>' that |
| points to a sub-tree representing the expression. This can be seen for |
| instance in the 'Template' class:</p> |
| <p><img src="compiler_AST.gif" alt="compiler_AST.gif"/></p> |
| <p><ref>Figure 2: Sample Abstract Syntax Tree</ref></p> |
| <p>In this example _select only points to a single node. In more complex |
| expressions the pointer will point to a whole sub-tree.</p> |
| </s3> |
| </s2><anchor name="typecheck"/> |
| <s2 title="Type-check and Cast Expressions"> |
| <p>In many cases we will need to typecast the top node in the expression |
| sub-tree to suit the expected result-type of the expression, or to typecast |
| child nodes to suit the allowed types for the various operators in the |
| expression. This is done by calling 'typeCheck()' on the root node of the |
| XSL tree. Each SyntaxTreeNode is responsible for its own type checking |
| (i.e., the <code>typeCheck()</code> method must be overridden). Let us say that |
| our expression was:</p> |
| <p><code><xsl:value-of select="1+2.73"/></code></p> |
| <p><img src="typecast.gif" alt="typecast.gif"/></p> |
| <p><ref>Figure 3: XPath expression type conflict</ref></p> |
| <p>The number 1 is an integer, and the number 2.73 is a real number, so the |
| 1 has to be promoted to a real. This is done by inserting a new node between |
| the [1] and the [+]. This node will convert the 1 to a real number:</p> |
| <p><img src="cast_expression.gif" alt="cast_expression.gif"/></p> |
| <p><ref>Figure 4: Type casting</ref></p> |
| |
| <p>The inserted node is an object of the class CastExpr. The SymbolTable |
| that was instantiated in (1) is used to determine what casts are needed for |
| the various operators and what return types the various expressions will |
| have.</p> |
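| <p>The cast insertion for <code>1+2.73</code> might look roughly like the sketch below. The classes <code>SketchExpr</code>, <code>SketchLiteral</code>, <code>SketchCast</code> and <code>SketchAdd</code> are hypothetical, and the promotion rule is hard-coded in the addition node here, whereas the text above says the real compiler consults the SymbolTable:</p> |

```java
// Hypothetical two-type system; the real compiler has many more types.
enum SketchType { INT, REAL }

abstract class SketchExpr {
    // Checks this sub-tree, inserts casts where needed, returns its type.
    abstract SketchType typeCheck();
}

class SketchLiteral extends SketchExpr {
    final String text;
    SketchLiteral(String text) { this.text = text; }

    // Assumption: a literal with a decimal point is a real, otherwise an int.
    SketchType typeCheck() {
        return text.contains(".") ? SketchType.REAL : SketchType.INT;
    }
}

// Plays the role that CastExpr has in the text: wraps an operand
// and converts its value to the target type.
class SketchCast extends SketchExpr {
    final SketchExpr operand;
    final SketchType target;
    SketchCast(SketchExpr operand, SketchType target) {
        this.operand = operand;
        this.target = target;
    }
    SketchType typeCheck() {
        operand.typeCheck();
        return target;
    }
}

class SketchAdd extends SketchExpr {
    SketchExpr left;
    SketchExpr right;
    SketchAdd(SketchExpr left, SketchExpr right) {
        this.left = left;
        this.right = right;
    }

    // If the operand types differ, the int side is wrapped in a cast
    // node so that both operands of the addition are reals.
    SketchType typeCheck() {
        SketchType l = left.typeCheck();
        SketchType r = right.typeCheck();
        if (l == r) {
            return l;
        }
        if (l == SketchType.INT) {
            left = new SketchCast(left, SketchType.REAL);
        } else {
            right = new SketchCast(right, SketchType.REAL);
        }
        return SketchType.REAL;
    }
}
```

| <p>After <code>typeCheck()</code> has run on the addition node, its left child is a cast node wrapping the literal 1, which corresponds to the situation in Figure 4.</p> |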
| |
| </s2><anchor name="compile"/> |
| <s2 title="Code generation"> |
| <ul> |
| <li><link anchor="toplevelelem">Compiling top-level elements</link></li> |
| <li><link anchor="templatecode">Compiling template code</link></li> |
| <li><link anchor="instrfunc">Compiling XSL instructions and functions</link></li> |
| </ul> |
| <p>A general rule is that all classes that represent elements in the XSL |
| tree/document, i.e., classes that inherit from SyntaxTreeNode, output |
| bytecode in the 'translate()' method.</p> |
| <anchor name="toplevelelem"/> |
| <s3 title="Compiling top-level elements"> |
| <p>The bytecode that handles top-level elements must be generated before any |
| other code. The '<code>translate()</code>' methods in these classes are mainly |
| called from these methods in the Stylesheet class:</p> |
| |
| <source> private String compileBuildKeys(ClassGenerator classGen); |
| private String compileTopLevel(ClassGenerator classGen, Enumeration elements); |
| private void compileConstructor(ClassGenerator classGen, Output output);</source> |
| |
| <p>These methods handle most top-level elements, such as global variables |
| and parameters, <code><xsl:output></code> and |
| <code><xsl:decimal-format></code> instructions.</p> |
| </s3><anchor name="templatecode"/> |
| <s3 title="Compiling template code"> |
| <p>All XPath patterns in <code><xsl:template></code> elements |
| are converted into numeric values (known as the pattern's kernel 'type'). |
| All templates with identical pattern kernel types are grouped together and |
| inserted into a table under their assigned type. (This table is found in the |
| Mode class. There is one such table for each mode that is used in the |
| stylesheet.) This table is used to build a big <code>switch()</code> statement |
| in the translet's <code>applyTemplates()</code> method. This method is initially |
| called with the root node of the input document.</p> |
| <p>The <code>applyTemplates()</code> method determines the node's type and passes |
| this type to the <code>switch()</code> statement to look up the matching |
| template.</p> |
| |
| <p>There may be several templates that share the same pattern kernel type. |
| Here are a few examples of templates with patterns that all have the same |
| kernel type:</p> |
| |
| <source> <xsl:template match="A/C"> |
| <xsl:template match="A/B/C"> |
| <xsl:template match="A | C"></source> |
| |
| <p>All these templates have the same kernel type (the type for <code>"C"</code>) |
| and will be grouped under that type. The last |
| template will be grouped under both <code>"C"</code> and <code>"A"</code>. If the |
| type identifier for <code>"C"</code> in this case is 8, all these templates will |
| be put under <code>case 8:</code> in <code>applyTemplates()</code>'s big |
| <code>switch()</code> statement. The Mode class will insert extra code to choose |
| which template code to invoke.</p> |
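| <p>The grouping can be illustrated with the small sketch below. It is a simplification under stated assumptions: the class <code>KernelGroupingSketch</code> is hypothetical, the kernel is taken to be the last step of each path, union patterns are split on <code>|</code>, and the table is keyed by element name rather than by the numeric type used by the Mode class:</p> |

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of grouping templates by pattern kernel.
class KernelGroupingSketch {
    // Kernel of a single path pattern: the name of its last step,
    // so both "A/C" and "A/B/C" yield "C".
    static String kernel(String path) {
        String trimmed = path.trim();
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Builds the table: each pattern is filed under the kernel of every
    // alternative it contains, so "A | C" appears under both A and C.
    static Map<String, List<String>> group(List<String> patterns) {
        Map<String, List<String>> table = new TreeMap<>();
        for (String pattern : patterns) {
            for (String alternative : pattern.split("\\|")) {
                table.computeIfAbsent(kernel(alternative),
                                      k -> new ArrayList<>()).add(pattern);
            }
        }
        return table;
    }
}
```

| <p>For the three patterns above, the entry for <code>"C"</code> holds all three templates and the entry for <code>"A"</code> holds only the last one. Each entry corresponds to one <code>case</code> label in the generated <code>switch()</code>, with the extra selection code deciding among the templates within an entry.</p> |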
| </s3><anchor name="instrfunc"/> |
| <s3 title="Compiling XSL instructions and functions"> |
| |
| <p>The template code is generated by calling <code>translate()</code> on each |
| Template object in the abstract syntax tree. This call will be propagated |
| down the tree and every element will output the bytecodes necessary to |
| complete its task.</p> |
| |
| <p>Each node will call 'translate()' on its children, and possibly on |
| objects representing the node's XPath expressions, before outputting its |
| own bytecode. In that way the correct sequence of instructions is generated. |
| Each one of the child nodes is responsible for creating code that leaves the |
| node's output value (if any) on the stack. The typical procedure for the |
| parent node is to create code that consumes these values off the stack and |
| then leaves its own output on the stack for its parent.</p> |
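| <p>The children-first ordering can be sketched with a toy tree that emits textual instructions instead of real bytecode. The classes <code>CodeNode</code>, <code>NumberNode</code> and <code>AddNode</code> and the instruction names <code>push</code> and <code>add</code> are illustrative assumptions, not XSLTC's actual classes or JVM mnemonics:</p> |

```java
import java.util.List;

// Hypothetical node type: translate() appends this node's code to 'out'.
abstract class CodeNode {
    abstract void translate(List<String> out);
}

// A leaf: its code leaves one value on the stack.
class NumberNode extends CodeNode {
    final double value;
    NumberNode(double value) { this.value = value; }

    void translate(List<String> out) {
        out.add("push " + value);
    }
}

// An inner node: translates its children first, then emits code that
// consumes their two values and leaves the sum on the stack.
class AddNode extends CodeNode {
    final CodeNode left;
    final CodeNode right;
    AddNode(CodeNode left, CodeNode right) {
        this.left = left;
        this.right = right;
    }

    void translate(List<String> out) {
        left.translate(out);
        right.translate(out);
        out.add("add");
    }
}
```

| <p>Translating an <code>AddNode</code> over the numbers 1.0 and 2.73 yields the sequence <code>push 1.0</code>, <code>push 2.73</code>, <code>add</code>: a post-order walk of the tree produces the operand order that a stack machine expects.</p> |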
| |
| <p>The tree structure of the stylesheet is in this way closely tied to |
| the stack-based JVM. The design does not offer any obvious way of extending |
| the compiler to output code for other VMs or processors.</p> |
| </s3> |
| </s2> |
| </s1> |