| <?xml version="1.0" encoding="UTF-8" standalone="no"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html> |
| <head> |
| <title>ASF: XSLTC Compiler Design</title> |
| <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> |
| <meta http-equiv="Content-Style-Type" content="text/css" /> |
| <link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" /> |
| </head> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the "License"); |
| * you may not use this file except in compliance with the License. |
| * You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| --> |
| <body> |
| <div id="title"> |
| <table class="HdrTitle"> |
| <tbody> |
| <tr> |
| <th rowspan="2"> |
| <a href="../index.html"> |
| <img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" /> |
| </a> |
| </th> |
| <th text-align="center" width="75%"> |
| <a href="index.html">XSLTC Design</a> |
| </th> |
| </tr> |
| <tr> |
| <td valign="middle">XSLTC Compiler Design</td> |
| </tr> |
| </tbody> |
| </table> |
| <table class="HdrButtons" align="center" border="1"> |
| <tbody> |
| <tr> |
| <td> |
| <a href="http://www.apache.org">Apache Foundation</a> |
| </td> |
| <td> |
| <a href="http://xalan.apache.org">Xalan Project</a> |
| </td> |
| <td> |
| <a href="http://xerces.apache.org">Xerces Project</a> |
| </td> |
| <td> |
| <a href="http://www.w3.org/TR">Web Consortium</a> |
| </td> |
| <td> |
| <a href="http://www.oasis-open.org/standards">Oasis Open</a> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <div id="navLeft"> |
| <ul> |
| <li> |
| <a href="index.html">Overview</a> |
| </li></ul><hr /><ul> |
| <li>Compiler design<br /> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_whitespace_design.html">Whitespace</a> |
| </li> |
| <li> |
| <a href="xsl_sort_design.html">xsl:sort</a> |
| </li> |
| <li> |
| <a href="xsl_key_design.html">Keys</a> |
| </li> |
| <li> |
| <a href="xsl_comment_design.html">Comment design</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_lang_design.html">lang()</a> |
| </li> |
| <li> |
| <a href="xsl_unparsed_design.html">Unparsed entities</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_if_design.html">If design</a> |
| </li> |
| <li> |
| <a href="xsl_choose_design.html">Choose|When|Otherwise design</a> |
| </li> |
| <li> |
| <a href="xsl_include_design.html">Include|Import design</a> |
| </li> |
| <li> |
| <a href="xsl_variable_design.html">Variable|Param design</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_runtime.html">Runtime</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_dom.html">Internal DOM</a> |
| </li> |
| <li> |
| <a href="xsltc_namespace.html">Namespaces</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_trax.html">Translet & TrAX</a> |
| </li> |
| <li> |
| <a href="xsltc_predicates.html">XPath Predicates</a> |
| </li> |
| <li> |
| <a href="xsltc_iterators.html">Xsltc Iterators</a> |
| </li> |
| <li> |
| <a href="xsltc_native_api.html">Xsltc Native API</a> |
| </li> |
| <li> |
| <a href="xsltc_trax_api.html">Xsltc TrAX API</a> |
| </li> |
| <li> |
| <a href="xsltc_performance.html">Performance Hints</a> |
| </li> |
| </ul> |
| </div> |
| <div id="content"> |
| <h2>XSLTC Compiler Design</h2> |
| |
| <ul> |
| <li> |
| <a href="#overview">Compiler Overview</a> |
| </li> |
| <li> |
| <a href="#ast">Building the Abstract Syntax Tree</a> |
| </li> |
| <li> |
| <a href="#typecheck">Type-check and Cast Expressions</a> |
| </li> |
| <li> |
| <a href="#compile">JVM byte-code generation</a> |
| </li> |
| </ul> |
| |
| |
| |
| <a name="overview"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Compiler overview</h3> |
| |
| <p>The main component of the XSLTC compiler is the class</p> |
| <ul> |
| <li> |
| <code>org.apache.xalan.xsltc.compiler.XSLTC</code> |
| </li> |
| </ul> |
| |
| <p>This class uses three parsers to consume the input stylesheet(s):</p> |
| |
| <ul> |
| <li> |
| <code>javax.xml.parsers.SAXParser</code> |
| </li> |
| </ul> |
| |
| <p>is used to parse the stylesheet document and pass its contents to |
| the compiler as basic SAX2 events.</p> |
| |
| <ul> |
| <li> |
| <code>com.sun.xslt.compiler.XPathParser</code> |
| </li> |
| </ul> |
| |
| <p> is a parser used to parse XPath expressions and patterns. This parser |
| is generated using JavaCUP and JavaLEX from Princeton University.</p> |
| |
| <ul> |
| <li> |
| <code>com.sun.xslt.compiler.Parser</code> |
| </li> |
| </ul> |
| |
| <p>is a wrapper for the other two parsers. This parser is responsible for |
| using the other two parsers to build the compiler's abstract syntax tree |
| (which is described in more detail in the next section of this document). |
| </p> |
| |
| |
| |
| |
| <a name="ast"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Building an Abstract Syntax Tree</h3> |
| |
| <p>An abstract syntax tree (AST) is a data-structure commonly used by |
| compilers to separate the parse-phase from the later phases of the |
| compilation. The AST has one node for each parsed token from the stylesheet |
| and can easily be parsed at the stages of type-checking and bytecode |
| generation.</p> |
| |
| <ul> |
| <li> |
| <a href="#mapping">Mapping stylesheet elements to AST nodes</a> |
| </li> |
| <li> |
| <a href="#domxsl">Building the AST from AST nodes</a> |
| </li> |
| <li> |
| <a href="#mapping">Mapping XPath expressions and patterns to additional AST nodes</a> |
| </li> |
| </ul> |
| |
| <p>The SAX parser passes the contents of the stylesheet to XSLTC's main |
| parser. The SAX events represent a decomposition of the XML document that |
| contains the stylesheet. The main parser needs to create one AST node from |
| each node that it receives from the SAX parser. It also needs to use the |
| XPath parser to decompose attributes that contain XPath expressions and |
| patterns. Remember that XSLT is in effect two languages: XML and XPath, |
| and one parser is needed for each of these languages. The SAX parser breaks |
| down the stylesheet document, the XPath parser breaks down XPath expressions |
| and patterns, and the main parser maps the decomposed elements into nodes |
| in the abstract syntax tree.</p> |
| |
| <a name="mapping"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Mapping stylesheets elements to AST nodes</h4> |
| |
| <p>Every element that is defined in the XSLT 1.0 spec is represented by a |
| a class in the <code>org.apache.xalan.xsltc.compiler</code> package. The |
| main parser class contains a <code>Hashtable</code> that that maps XSL |
| elements into Java classes that make up the nodes in the AST. These Java |
| classes all reside in the <code>org.apache.xalan.xsltc.compiler</code> |
| package and extend either the <code>TopLevelElement</code> or the |
| <code>Instruction</code> classes. (Both these classes extend the |
| <code>SyntaxTreeNode</code> class.)</p> |
| |
| <p>The mapping from XSL element names to Java classes/AST nodes is set up |
| in the <code>initClasses()</code> method of the main parser:</p> |
| <blockquote class="source"> |
| <pre> |
| private void initStdClasses() { |
| try { |
| initStdClass("template", "Template"); |
| initStdClass("param", "Param"); |
| initStdClass("with-param", "WithParam"); |
| initStdClass("variable", "Variable"); |
| initStdClass("output", "Output"); |
| : |
| : |
| : |
| } |
| } |
| |
| private void initClass(String elementName, String className) |
| throws ClassNotFoundException { |
| _classes.put(elementName, |
| Class.forName(COMPILER_PACKAGE + '.' + className)); |
| }</pre> |
| </blockquote> |
| |
| |
| |
| <a name="domxsl"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Building the AST from AST nodes</h4> |
| <p>The parser builds an AST from the various syntax tree nodes. Each node |
| contains a reference to its parent node, a vector containing references |
| to all child nodes and a structure containing all attribute nodes:</p> |
| <blockquote class="source"> |
| <pre> |
| protected SyntaxTreeNode _parent; // Parent node |
| private Vector _contents; // Child nodes |
| protected Attributes _attributes; // Attributes of this element</pre> |
| </blockquote> |
| |
| |
| <p>These variables should be accessed using these methods:</p> |
| <blockquote class="source"> |
| <pre> |
| protected final SyntaxTreeNode getParent(); |
| protected final Vector getContents(); |
| protected String getAttribute(String qname); |
| protected Attributes getAttributes();</pre> |
| </blockquote> |
| |
| <p>At this time the AST only contains nodes that represent the XSL elements |
| from the stylesheet. A SAX parse is generic and can only handle XML files |
| and will not break up and identify XPath patterns/expressions (these are |
| stored as attributes to the various nodes in the tree). Each XSL instruction |
| gets its own node in the AST, and the XPath patterns/expressions are stored |
| as attributes of these nodes. A stylesheet looking like this:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:stylesheet .......> |
| <xsl:template match="chapter"> |
| <xsl:text>Chapter</xsl:text> |
| <xsl:value-of select="."> |
| </xsl:template> |
| </xsl>stylesheet></pre> |
| </blockquote> |
| |
| <p>will be stored in the AST as indicated in the following picture:</p> |
| <p> |
| <img src="ast_stage1.gif" alt="ast_stage1.gif" /> |
| </p> |
| <p> |
| <b> |
| <i>Figure 1: The AST in its first stage</i> |
| </b> |
| </p> |
| |
| <p>All objects that make up the nodes in the initial AST have a |
| <code>parseContents()</code> method. This method is responsible for:</p> |
| |
| <ul> |
| <li>parsing the values of those attributes that contain XPath expressions |
| or patterns, breaking each expression/pattern into AST nodes and inserting |
| them into the tree.</li> |
| <li>reading/checking all other required attributes</li> |
| <li>propagate the <code>parseContents()</code> call down the tree</li> |
| </ul> |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Mapping XPath expressions and patterns to additional AST nodes</h4> |
| |
| <p>The nodes that represent the XPath expressions and patterns extend |
| either the <code>Expression</code> or <code>Pattern</code> class |
| respectively. These nodes are not appended to the <code>_contents</code> |
| vectory of each node, but rather stored as individual references in each |
| AST element node. One example is the <code>ForEach</code> class that |
| represents the <code><xsl:for-each></code> element. This class has |
| a variable that contains a reference to the AST sub-tree that represents |
| its <code>select</code> attribute:</p> |
| <blockquote class="source"> |
| <pre> |
| private Expression _select;</pre> |
| </blockquote> |
| |
| <p>There is no standard way of storing these XPath expressions and each |
| AST node that contains one or more XPath expression/pattern must handle |
| that itself. This handling basically involves passing the attribute's |
| value to the XPath parser and receiving back an AST sub-tree.</p> |
| |
| <p>With all XPath expressions/patterns expanded, the AST will look somewhat |
| like this:</p> |
| |
| <p> |
| <img src="ast_stage2.gif" alt="ast_stage2.gif" /> |
| </p> |
| <p> |
| <b> |
| <i>Fiugre 2: The AST in its second stage</i> |
| </b> |
| </p> |
| |
| |
| |
| |
| |
| |
| <a name="typecheck"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Type-check and Cast Expressions</h3> |
| |
| <p>In many cases we will need to typecast the top node in the expression |
| sub-tree to suit the expected result-type of the expression, or to typecast |
| child nodes to suit the allowed types for the various operators in the |
| expression. This is done by calling 'typeCheck()' on the root-node in the |
| XSL tree. Each SyntaxTreeNode node is responsible for inserting type-cast |
| nodes between itself and its child nodes or XPath nodes. These type-cast |
| nodes will convert the output-types of the child/XPath nodes to the expected |
| input-type of the parent node. Let look at our AST again and the node that |
| represents the <code><xsl:value-of></code> element. This element |
| expects to receive a string from its <code>select</code> XPath expression, |
| but the <code>Step</code> expression will return either a node-set or a |
| single node. An extra node is inserted into the AST to perform the |
| necessary type conversions:</p> |
| |
| <p> |
| <img src="ast_stage3.gif" alt="ast_stage3.gif" /> |
| </p> |
| <p> |
| <b> |
| <i>Figure 3: XPath expression type cast</i> |
| </b> |
| </p> |
| |
| <p>The <code>typeCheck()</code> method of each SyntaxTreeNode object will |
| call <code>typeCheck()</code> on each of its XPath expressions. This method |
| will return the native type returned by the expression. The AST node will |
| insert an additional type-conversion node if the return-type does not match |
| the expected data-type. Each possible return type is represented by a class |
| in the <code>org.apache.xalan.xsltc.compiler.util</code> package. These |
| classes all contain methods that will generate bytecodes needed to perform |
| the actual type conversions (at runtime). The type-cast nodes in the AST |
| mainly consist of calls to these methods.</p> |
| |
| |
| |
| |
| <a name="compile"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>JVM byte-code generation</h3> |
| |
| <ul> |
| <li> |
| <a href="#stylesheet">Compiling the stylesheet</a> |
| </li> |
| <li> |
| <a href="#toplevel">Compiling top-level elements</a> |
| </li> |
| <li> |
| <a href="#templates">Compiling template code</a> |
| </li> |
| <li> |
| <a href="#instructions">Compiling instructions, functions expressions and patterns</a> |
| </li> |
| </ul> |
| |
| <p>Evey node in the AST extends the <code>SyntaxTreeNode</code> base class |
| and implements the <code>translate()</code> method. This method is |
| responsible for outputting the actual bytecodes that make up the |
| functionality required for each element, function, expression or pattern. |
| </p> |
| |
| <a name="stylesheet"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Compiling the stylesheet</h4> |
| <p>Some nodes in the AST require more complex code than others. The best |
| example is the <code><xsl:stylesheet></code> element. The code that |
| represents this element has to tie together the code that is generated by |
| all the other elements and generate the actual class definition for the main |
| translet class. The <code>Stylesheet</code> class generates the translet's |
| constructor and methods that handle all top-level elements.</p> |
| |
| |
| <a name="toplevel"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Compiling top-level elements</h4> |
| <p>The bytecode that handles top-level elements must be generated before any |
| other code. The '<code>translate()</code>' method in these classes are |
| mainly called from these methods in the Stylesheet class:</p> |
| <blockquote class="source"> |
| <pre> |
| private String compileBuildKeys(ClassGenerator); |
| private String compileTopLevel(ClassGenerator, Enumeration); |
| private void compileConstructor(ClassGenerator, Output);</pre> |
| </blockquote> |
| |
| <p>These methods handle most top-level elements, such as global variables |
| and parameters, <code><xsl:output></code> and |
| <code><xsl:decimal-format></code> instructions.</p> |
| |
| |
| <a name="templates"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Compiling template code</h4> |
| <p>All XPath patterns in <code><xsl:apply-template></code> |
| instructions are converted into numeric values (known as the pattern's |
| kernel 'type'). All templates with identical pattern kernel types are |
| grouped together and inserted into a table known as a test sequence. |
| (The table of test sequences is found in the Mode class in the compiler |
| package. There will be one such table for each mode that is used in the |
| stylesheet). This table is used to build a big <code>switch()</code> |
| statement in the translet's <code>applyTemplates()</code> method. This |
| method is initially called with the root node of the input document.</p> |
| |
| <p>The <code>applyTemplates()</code> method determines the node's type and |
| passes this type to the <code>switch()</code> statement to look up the |
| matching template. The test sequence code (the <code>TestSeq</code> class) |
| is responsible for inserting bytecodes to find one matching template |
| in cases where more than one template matches the current node type.</p> |
| |
| <p>There may be several templates that share the same pattern kernel type. |
| Here are a few examples of templates with patterns that all have the same |
| kernel type:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:template match="A/C"> |
| <xsl:template match="A/B/C"> |
| <xsl:template match="A | C"></pre> |
| </blockquote> |
| |
| <p>All these templates will be grouped under the type for |
| <code><C></code> and will all get the same kernel type (the type for |
| <code>"C"</code>). The last template will be grouped both under |
| <code>"C"</code> and <code>"A"</code>, since it matches either element. |
| If the type identifier for <code>"C"</code> in this case is 8, all these |
| templates will be put under <code>case 8:</code> in |
| <code>applyTemplates()</code>'s big <code>switch()</code> statement. The |
| <code>TestSeq</code> class will insert some code under the |
| <code>case 8:</code> statement (similar to if's and then's) in order to |
| determine which of the three templates to trigger.</p> |
| |
| |
| <a name="instructions"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Compiling instructions, functions, expressions and patterns</h4> |
| |
| <p>The template code is generated by calling <code>translate()</code> on |
| each <code>Template</code> object in the abstract syntax tree. This call |
| will be propagated down the abstract syntax tree and every element will |
| output the bytecodes necessary to complete its task.</p> |
| |
| <p>The Java Virtual Machine is stack-based, which goes hand-in-hand with |
| the tree structure of a stylesheet and the AST. A node in the AST will |
| call <code>translate()</code> on its child nodes and any XPath nodes before |
| it generates its own bytecodes. In that way the correct sequence of JVM |
| instructions is generated. Each one of the child nodes is responsible of |
| creating code that leaves the node's output value (if any) on the stack. |
| The typical procedure for the parent node is to create JVM code that |
| consumes these values off the stack and then leave its own output on the |
| stack (for its parent).</p> |
| |
| <p>The tree-structure of the stylesheet is in this way closely tied with |
| the stack-based JVM. The design does not offer any obvious way of extending |
| the compiler to output code for other non-stack-based VMs or processors.</p> |
| |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| </div> |
| <div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div> |
| </div> |
| </body> |
| </html> |