| <?xml version="1.0" encoding="UTF-8" standalone="no"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html> |
| <head> |
| <title>ASF: XSLTC Predicate Handling</title> |
| <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> |
| <meta http-equiv="Content-Style-Type" content="text/css" /> |
| <link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" /> |
| </head> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the "License"); |
| * you may not use this file except in compliance with the License. |
| * You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| --> |
| <body> |
| <div id="title"> |
| <table class="HdrTitle"> |
| <tbody> |
| <tr> |
| <th rowspan="2"> |
| <a href="../index.html"> |
| <img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" /> |
| </a> |
| </th> |
| <th text-align="center" width="75%"> |
| <a href="index.html">XSLTC Design</a> |
| </th> |
| </tr> |
| <tr> |
| <td valign="middle">XSLTC Predicate Handling</td> |
| </tr> |
| </tbody> |
| </table> |
| <table class="HdrButtons" align="center" border="1"> |
| <tbody> |
| <tr> |
| <td> |
| <a href="http://www.apache.org">Apache Foundation</a> |
| </td> |
| <td> |
| <a href="http://xalan.apache.org">Xalan Project</a> |
| </td> |
| <td> |
| <a href="http://xerces.apache.org">Xerces Project</a> |
| </td> |
| <td> |
| <a href="http://www.w3.org/TR">Web Consortium</a> |
| </td> |
| <td> |
| <a href="http://www.oasis-open.org/standards">Oasis Open</a> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <div id="navLeft"> |
| <ul> |
| <li> |
| <a href="index.html">Overview</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_compiler.html">Compiler design</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_whitespace_design.html">Whitespace</a> |
| </li> |
| <li> |
| <a href="xsl_sort_design.html">xsl:sort</a> |
| </li> |
| <li> |
| <a href="xsl_key_design.html">Keys</a> |
| </li> |
| <li> |
| <a href="xsl_comment_design.html">Comment design</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_lang_design.html">lang()</a> |
| </li> |
| <li> |
| <a href="xsl_unparsed_design.html">Unparsed entities</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsl_if_design.html">If design</a> |
| </li> |
| <li> |
| <a href="xsl_choose_design.html">Choose|When|Otherwise design</a> |
| </li> |
| <li> |
| <a href="xsl_include_design.html">Include|Import design</a> |
| </li> |
| <li> |
| <a href="xsl_variable_design.html">Variable|Param design</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_runtime.html">Runtime</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_dom.html">Internal DOM</a> |
| </li> |
| <li> |
| <a href="xsltc_namespace.html">Namespaces</a> |
| </li></ul><hr /><ul> |
| <li> |
| <a href="xsltc_trax.html">Translet & TrAX</a> |
| </li> |
| <li>XPath Predicates<br /> |
| </li> |
| <li> |
| <a href="xsltc_iterators.html">Xsltc Iterators</a> |
| </li> |
| <li> |
| <a href="xsltc_native_api.html">Xsltc Native API</a> |
| </li> |
| <li> |
| <a href="xsltc_trax_api.html">Xsltc TrAX API</a> |
| </li> |
| <li> |
| <a href="xsltc_performance.html">Performance Hints</a> |
| </li> |
| </ul> |
| </div> |
| <div id="content"> |
| <h2>XSLTC Predicate Handling</h2> |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Definition</h3> |
| |
| <p>According to Michael Kay's "XSLT Programmer's Reference" page |
| 736, a predicate is "An expression used to filter which nodes are |
| selected by a particular step in a path expression, or to select a subset of |
| the nodes in a node-set. A Boolean expression selects the nodes for which the |
| predicate is true; a numeric expression selects the node at the position |
| given by the value of the expression, for example '[1]' selects the first |
| node.". Note that a predicate containing a boolean expression can |
| return zero, one or more nodes, while a predicate containing a numeric |
| expression can return only zero or one node.</p> |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Examples</h3> |
| |
| <p>I'll list a few examples that I can refer back to later on in this |
| document. All examples will use this XML document:</p> |
| <blockquote class="source"> |
| <pre> |
| <?xml version="1.0"?> |
| <doc> |
| <foo location="Drumcondra"> |
| <bar name="Cat and Cage"/> |
| <bar name="Fagan's"/> |
| <bar name="Gravedigger's"/> |
| <bar name="Ivy House"/> |
| <foo> |
| <foo location="Town"> |
| <bar name="Peter's Pub"/> |
| <bar name="Grogan's"/> |
| <bar name="Hogans's"/> |
| <bar name="Brogan's"/> |
| </foo> |
| </doc></pre> |
| </blockquote> |
| |
| <p>Here are some examples of a predicate with boolean expressions:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="//bar[contains(@name,'ogan')]"> |
| <xsl:for-each select="//bar[parent::*/@location = 'Drumcondra']"> |
| <xsl:for-each select="//bar[@name = 'Cat and Cage']"></pre> |
| </blockquote> |
| |
| <p>The first two select more than one node, while the last selects only one. |
| The last expression could select more nodes if the input document was |
| different. Now, here are a few examples of predicates with numeric |
| expressions:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:value-of select="//bar[1]"> |
| <xsl:value-of select="/doc/foo[2]/bar[1]"> |
| <xsl:value-of select="/doc/foo[2]/bar"></pre> |
| </blockquote> |
| <p>The last expression will return more than one node, but the step that |
| contains the predicate returns only one (the second <code><foo></code> |
| element).</p> |
| |
| <p>The above are the basic types of predicates. These can be grouped to create |
| a predicate pipeline, where the first predicate reduces the node-set that the |
| second predicate filters, and so on. Here are some examples:</p> |
| <blockquote class="source"> |
| <pre> |
| A: <for-each select="//bar[contains(@name,'ogan')][2]"> |
| C: <for-each select="//bar[2][contains(@name,'ogan')]"> |
| B: <for-each select="//bar[position() > 3][2]"></pre> |
| </blockquote> |
| |
| <p>It is easier to figure out which nodes these expressions should return if |
| one goes through the steps and predicates one by one. In expression |
| <code>A:</code> we first get all <code><bar></code> elements from the |
| whole document. Then the first predicate selects from that node-set only |
| those elements that have a <code>@name</code> attribute that contains |
| "ogan", and we're left with these elements:</p> |
| <blockquote class="source"> |
| <pre> |
| <bar name="Grogan's"> |
| <bar name="Hogans's"> |
| <bar name="Brogan's"></pre> |
| </blockquote> |
| <p>And finally, the last predicate then selects the second of those |
| elements:</p> |
| <blockquote class="source"> |
| <pre> |
| <bar name="Hogans's"></pre> |
| </blockquote> |
| |
| <p>Expression <code>B:</code> contains the same predicates as <code>A:</code>, |
| but the resulting node set if completely different. We start off with the same |
| set of <code><bar></code> elements, but we apply the |
| <code>"[2]"</code> predicate first, and end up with this |
| element:</p> |
| <blockquote class="source"> |
| <pre> |
| <bar name="Fagan's"></pre> |
| </blockquote> |
| |
| <p>Fagan's is the bar where the Irish Taoiseach (prime minister) drinks his |
| pints, but its name does not contain the string "<code>ogan</code>", |
| so the resulting node-set is empty.</p> |
| |
| <p>The third expressions also starts off with all <code><bar></code> |
| elements, applies the predicate "<code>[position() > 3]</code>", |
| and reduces the node set to these:</p> |
| <blockquote class="source"> |
| <pre> |
| <bar name="Ivy House"> |
| <bar name="Peter's Pub"> |
| <bar name="Grogan's"> |
| <bar name="Hogans's"> |
| <bar name="Brogan's"></pre> |
| </blockquote> |
| <p>The last predicate "<code>[2]</code>" is applied to this node-set |
| and set is further reduced to:</p> |
| <blockquote class="source"> |
| <pre> |
| <bar name="Peter's Pub"></pre> |
| </blockquote> |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Categories</h3> |
| |
| <p>From the examples in the last chapter we can try to categorize predicate |
| chains/pipelines to simplify our implementation. We can speed up processing |
| significantly if we can avoid using a data-structure (iterator) to represent |
| the intermediate step between predicates. The goal of setting up these |
| categories is to pinpoint those cases where an intermediate iterator has |
| to be used and when it can be avoided.</p> |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Single predicate expressions</h4> |
| |
| <p>Expressions containing just a single predicate have no intermediate step |
| and there is no need for any extra iterator. The expression inside the |
| predicate can be applied directly to the original iterator. We call this |
| category <b>SIMPLE_CONTEXT</b>.</p> |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Expressions containing only non-position predicates</h4> |
| |
| <p>Predicate-order is significant when the predicate-chain contains one or |
| more predicate with an expression similar to |
| "<code>position() > 3</code>" or "<code>2</code>". This |
| is because the <code>position()</code> and <code>last()</code> explicitly |
| refer to the intermediate step between applying each predicate. The |
| expression:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="//bar[contains(@name,'ogan')][parent::*/@location = 'Town']"></pre> |
| </blockquote> |
| <p>has two predicates that can be applied in any order and still produce the |
| desired node-set. Such predicates can be merged to:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="//bar[contains(@name,'ogan') & (parent::*/@location = 'Town')]"></pre> |
| </blockquote> |
| <p>We call this category NO_CONTEXT.</p> |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Expressions containing position predicates</h4> |
| |
| <p>A predicate-chain, whose predicates' expressions contain any use of the |
| <code>position()</code> or <code>last()</code> functions require some way |
| of representing the intermediate step in an iterator. The first predicate |
| is applied to the original node-set, and the resulting node-set must then |
| be stored in some other iterator, from which the second predicate can get |
| the current position from the iterator's <code>getPosition()</code> and |
| <code>getLast()</code> methods. We call this category |
| GENERAL_CONTEXT</p> |
| |
| |
| |
| <a name="exception"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Expressions containing one position predicate</h4> |
| |
| <p>There is one expection from the GENERAL_CONTEXT category. If the |
| predicate-chain contains only one position-predicate, and that predicate is |
| the very first one, then that predicate can call the iterator that contains |
| the first node-set directly. Just look:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="//bar[2][parent::*/@location = 'Drumcondra']"></pre> |
| </blockquote> |
| <p>The <code>[2]</code> predicate can be applied to the original iterator |
| for the <code>//bar</code> step. And so can the |
| <code>[parent::*/@location = 'Drumcondra']</code> predicate as well. This |
| is only the case when the position predicate is first in the predicate |
| chain. These types of predicate chains belong in the |
| NO_CONTEXT category.</p> |
| |
| |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h3>Design details</h3> |
| |
| <p>Predicates are handled quite differently in step expressions and step |
| patterns. Step expressions are not implemented with the various contexts in |
| mind and use a specialised iterator to wrap the code for each predicate. |
| Step patterns are more complicated and CPU (or should I say JVM?) |
| exhaustive. Step patterns containing predicates are analysed to determine |
| context type and compiled accordingly.</p> |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Predicates and Step expressions</h4> |
| |
| <p>The basic behaviour for a predicate is to compile a <b>filter</b>. This |
| filter is an auxiliary class that implements the |
| <code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code> interface. The |
| <code>Step</code> or <code>StepPattern</code> that uses the predicate will |
| create a <code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code>. This |
| iterator contains the nodes that pass through the predicate. The compiled |
| filter is used by the iterator to determine which nodes that should be |
| included. The <code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code> |
| interface contains only a single method:</p> |
| <blockquote class="source"> |
| <pre> |
| public interface CurrentNodeListFilter { |
| public abstract boolean test(int node, int position, int last, int current, |
| AbstractTranslet translet, NodeIterator iter); |
| }</pre> |
| </blockquote> |
| |
| <p>The code that is compiled into the <code>test()</code> method is the |
| code for the predicate's expression. The <code>Predicate</code> class |
| compiles the filter class and a <code>test()</code> method skeleton, while |
| some sub-class of the <code>Expression</code> class compiles the actual |
| code that goes into this method.</p> |
| |
| <p>The iterator is initialised with a filter that implements this interface: |
| </p> |
| <blockquote class="source"> |
| <pre> |
| public CurrentNodeListIterator(NodeIterator source, |
| CurrentNodeListFilter filter, |
| int currentNode, |
| AbstractTranslet translet) { |
| this(source, !source.isReverse(), filter, currentNode, translet); |
| }</pre> |
| </blockquote> |
| |
| <p>The iterator will use its source iterator to provide it with the initial |
| node-set. Each node that is returned from this set is passed through the |
| filter before returned by the <code>next()</code> method. Note that the |
| source iterator can also be a current node-list iterator (if two or more |
| predicates are chained together).</p> |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Optimisations in Step expressions</h4> |
| |
| <h5>Node-value iterators</h5> |
| |
| <p>Some simple predicates that test for node values are handled by the |
| <code>NodeValueIterator</code> class at runtime. These are:</p> |
| <blockquote class="source"> |
| <pre> |
| A: foo[@attr = <value>] |
| B: foo[bar = <value>] |
| C: foo/bar[. = <value>]</pre> |
| </blockquote> |
| |
| <p>The first case is handled by creating an iterator that represents |
| <code>foo/@attr</code>, then passing this iterator and a test-value to |
| a <code>NodeValueIterator</code>. The <b><value></b> is an |
| expression that is compiled and passed to the iterator as a string. It |
| does <b>not</b> have to be a literal string as the string value is |
| found at runtime. The last two cases are similarly handled by creating an |
| iterator for <code>foo/bar</code> and passing that and the test-value to |
| a <code>NodeValueIterator</code>.</p> |
| |
| |
| |
| <h5>Nth descendant iterators</h5> |
| |
| <p>The <code>Step</code> class is also optimised for position-predicates |
| that are applied to descendant iterators:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="//bar[3]"></pre> |
| </blockquote> |
| |
| <p>Such step/predicate combinations are handled by the internal DOM's |
| inner class <code>NthDescendantIterator</code>.</p> |
| |
| |
| |
| <h5>Nth position iterators</h5> |
| |
| <p>Similarly, the <code>Step</code> class is optimised for |
| position-predicates that are applied to basic steps:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:for-each select="bar[3]"></pre> |
| </blockquote> |
| |
| <p>Such step/predicate combinations are handled by the internal DOM's |
| inner class <code>NthPositionIterator</code>.</p> |
| |
| |
| |
| <h5>Node test</h5> |
| |
| <p>The predicate class contains a method that tells you if it is a boolean |
| test:</p> |
| <blockquote class="source"> |
| <pre> |
| public boolean isBooleanTest();</pre> |
| </blockquote> |
| |
| <p>This can be, but it currently is not, used by the <code>Step</code> class |
| to compile in optimised code. Some work to be done here!</p> |
| |
| |
| |
| |
| |
| <a name="step-patterns"></a> |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| <h4>Predicates and StepPatterns</h4> |
| |
| <p>Using predicates in patterns is slow on any XSLT processor, and XSLTC |
| is no exception. This is why the predicate context is carefully analysed |
| by the <code>StepPattern</code> class, so that the compiled code is |
| specialised to handle the specific predicate(s) in use. First we should |
| consider the basic step pattern.</p> |
| |
| <h5>Basic pattern handling</h5> |
| |
| <p>All patterns are grouped (by the <code>Mode</code> class) according to |
| their <b> |
| <i>kernel</i> |
| </b>-node type. The kernel node-type is node-type of the |
| <b>last</b> step in a pattern:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:template match="foo/bar/baz"> ... <xsl:template></pre> |
| </blockquote> |
| |
| <p>In this case the type for elements <code><baz></code> is the |
| kernel type. This step is <b>not</b> compiled as a step pattern. The node |
| type is passed to the <code>Mode</code> class and is used to place the |
| remainder of the pattern code inside the big <code>switch()</code> statement |
| in the translet's <code>applyTemplates()</code> method. The whole pattern |
| is then <b>reduced</b> to:</p> |
| <blockquote class="source"> |
| <pre> |
| match="foo/bar"</pre> |
| </blockquote> |
| |
| <p>The <code>StepPattern</code> representing the <code><bar></code> |
| element test is compiled under the appropriate <code>case:</code> section |
| of the <code>switch()</code> statement. The code compiled for the step |
| pattern is basically just a call to the DOM's <code>getType()</code> |
| method and a test for the desired node type. There are two special cases |
| for:</p> |
| <blockquote class="source"> |
| <pre> |
| <xsl:template match="foo/*/baz"> ... <xsl:template> |
| <xsl:template match="foo/*@[2]"> ... <xsl:template></pre> |
| </blockquote> |
| |
| <p>In the first case we call <code>isElement()</code> and in the second case |
| we call <code>isAttribute()</code>, instead of <code>getType()</code>.</p> |
| |
| |
| |
| <h5>Patterns with predicates</h5> |
| |
| <p>The <code>typeCheck()</code> method of the <code>StepPattern</code> |
| invokes a method that analyses the predicates of the step pattern and |
| determines their context. (Note that this method needs to be updated to |
| handle the exception to the GENERAL_CONTEXT metioned in the section |
| <a href="#exception">Expressions containing one position predicate</a> earlier in this document.) The <code>translate()</code> method of the |
| <code>StepPattern</code> class contains a <code>switch()</code> statement |
| that calls methods that are tailored for compiling code for the various |
| predicate contexts.</p> |
| |
| |
| |
| |
| |
| |
| |
| <p align="right" size="2"> |
| <a href="#content">(top)</a> |
| </p> |
| </div> |
| <div id="footer">Copyright © 1999-2014 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Thu 2014-05-15</div> |
| </div> |
| </body> |
| </html> |