<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?> | |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |
<html> | |
<head> | |
<title>ASF: XSLTC Predicate Handling</title> | |
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /> | |
<meta http-equiv="Content-Style-Type" content="text/css" /> | |
<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" /> | |
</head> | |
<!-- | |
* Licensed to the Apache Software Foundation (ASF) under one | |
* or more contributor license agreements. See the NOTICE file | |
* distributed with this work for additional information | |
* regarding copyright ownership. The ASF licenses this file | |
* to you under the Apache License, Version 2.0 (the "License"); | |
* you may not use this file except in compliance with the License. | |
* You may obtain a copy of the License at | |
* | |
* http://www.apache.org/licenses/LICENSE-2.0 | |
* | |
* Unless required by applicable law or agreed to in writing, software | |
* distributed under the License is distributed on an "AS IS" BASIS, | |
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
* See the License for the specific language governing permissions and | |
* limitations under the License. | |
--> | |
<body> | |
<div id="title"> | |
<table class="HdrTitle"> | |
<tbody> | |
<tr> | |
<th rowspan="2"> | |
<a href="../index.html"> | |
<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" /> | |
</a> | |
</th> | |
<th text-align="center" width="75%"> | |
<a href="index.html">XSLTC Design</a> | |
</th> | |
</tr> | |
<tr> | |
<td valign="middle">XSLTC Predicate Handling</td> | |
</tr> | |
</tbody> | |
</table> | |
<table class="HdrButtons" align="center" border="1"> | |
<tbody> | |
<tr> | |
<td> | |
<a href="http://www.apache.org">Apache Foundation</a> | |
</td> | |
<td> | |
<a href="http://xalan.apache.org">Xalan Project</a> | |
</td> | |
<td> | |
<a href="http://xerces.apache.org">Xerces Project</a> | |
</td> | |
<td> | |
<a href="http://www.w3.org/TR">Web Consortium</a> | |
</td> | |
<td> | |
<a href="http://www.oasis-open.org/standards">Oasis Open</a> | |
</td> | |
</tr> | |
</tbody> | |
</table> | |
</div> | |
<div id="navLeft"> | |
<ul> | |
<li> | |
<a href="index.html">Overview</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_compiler.html">Compiler design</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_whitespace_design.html">Whitespace</a> | |
</li> | |
<li> | |
<a href="xsl_sort_design.html">xsl:sort</a> | |
</li> | |
<li> | |
<a href="xsl_key_design.html">Keys</a> | |
</li> | |
<li> | |
<a href="xsl_comment_design.html">Comment design</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_lang_design.html">lang()</a> | |
</li> | |
<li> | |
<a href="xsl_unparsed_design.html">Unparsed entities</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsl_if_design.html">If design</a> | |
</li> | |
<li> | |
<a href="xsl_choose_design.html">Choose|When|Otherwise design</a> | |
</li> | |
<li> | |
<a href="xsl_include_design.html">Include|Import design</a> | |
</li> | |
<li> | |
<a href="xsl_variable_design.html">Variable|Param design</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_runtime.html">Runtime</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_dom.html">Internal DOM</a> | |
</li> | |
<li> | |
<a href="xsltc_namespace.html">Namespaces</a> | |
</li></ul><hr /><ul> | |
<li> | |
<a href="xsltc_trax.html">Translet & TrAX</a> | |
</li> | |
<li>XPath Predicates<br /> | |
</li> | |
<li> | |
<a href="xsltc_iterators.html">Xsltc Iterators</a> | |
</li> | |
<li> | |
<a href="xsltc_native_api.html">Xsltc Native API</a> | |
</li> | |
<li> | |
<a href="xsltc_trax_api.html">Xsltc TrAX API</a> | |
</li> | |
<li> | |
<a href="xsltc_performance.html">Performance Hints</a> | |
</li> | |
</ul> | |
</div> | |
<div id="content"> | |
<h2>XSLTC Predicate Handling</h2> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Definition</h3> | |
<p>According to Michael Kay's "XSLT Programmer's Reference" page | |
736, a predicate is "An expression used to filter which nodes are | |
selected by a particular step in a path expression, or to select a subset of | |
the nodes in a node-set. A Boolean expression selects the nodes for which the | |
predicate is true; a numeric expression selects the node at the position | |
given by the value of the expression, for example '[1]' selects the first | |
node.". Note that a predicate containing a boolean expression can | |
return zero, one or more nodes, while a predicate containing a numeric | |
expression can return only zero or one node.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Examples</h3> | |
<p>I'll list a few examples that I can refer back to later on in this | |
document. All examples will use this XML document:</p> | |
<blockquote class="source"> | |
<pre> | |
<?xml version="1.0"?> | |
<doc> | |
<foo location="Drumcondra"> | |
<bar name="Cat and Cage"/> | |
<bar name="Fagan's"/> | |
<bar name="Gravedigger's"/> | |
<bar name="Ivy House"/> | |
<foo> | |
<foo location="Town"> | |
<bar name="Peter's Pub"/> | |
<bar name="Grogan's"/> | |
<bar name="Hogans's"/> | |
<bar name="Brogan's"/> | |
</foo> | |
</doc></pre> | |
</blockquote> | |
<p>Here are some examples of a predicate with boolean expressions:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="//bar[contains(@name,'ogan')]"> | |
<xsl:for-each select="//bar[parent::*/@location = 'Drumcondra']"> | |
<xsl:for-each select="//bar[@name = 'Cat and Cage']"></pre> | |
</blockquote> | |
<p>The first two select more than one node, while the last selects only one. | |
The last expression could select more nodes if the input document was | |
different. Now, here are a few examples of predicates with numeric | |
expressions:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:value-of select="//bar[1]"> | |
<xsl:value-of select="/doc/foo[2]/bar[1]"> | |
<xsl:value-of select="/doc/foo[2]/bar"></pre> | |
</blockquote> | |
<p>The last expression will return more than one node, but the step that | |
contains the predicate returns only one (the second <code><foo></code> | |
element).</p> | |
<p>The above are the basic types of predicates. These can be grouped to create | |
a predicate pipeline, where the first predicate reduces the node-set that the | |
second predicate filters, and so on. Here are some examples:</p> | |
<blockquote class="source"> | |
<pre> | |
A: <for-each select="//bar[contains(@name,'ogan')][2]"> | |
C: <for-each select="//bar[2][contains(@name,'ogan')]"> | |
B: <for-each select="//bar[position() > 3][2]"></pre> | |
</blockquote> | |
<p>It is easier to figure out which nodes these expressions should return if | |
one goes through the steps and predicates one by one. In expression | |
<code>A:</code> we first get all <code><bar></code> elements from the | |
whole document. Then the first predicate selects from that node-set only | |
those elements that have a <code>@name</code> attribute that contains | |
"ogan", and we're left with these elements:</p> | |
<blockquote class="source"> | |
<pre> | |
<bar name="Grogan's"> | |
<bar name="Hogans's"> | |
<bar name="Brogan's"></pre> | |
</blockquote> | |
<p>And finally, the last predicate then selects the second of those | |
elements:</p> | |
<blockquote class="source"> | |
<pre> | |
<bar name="Hogans's"></pre> | |
</blockquote> | |
<p>Expression <code>B:</code> contains the same predicates as <code>A:</code>, | |
but the resulting node set if completely different. We start off with the same | |
set of <code><bar></code> elements, but we apply the | |
<code>"[2]"</code> predicate first, and end up with this | |
element:</p> | |
<blockquote class="source"> | |
<pre> | |
<bar name="Fagan's"></pre> | |
</blockquote> | |
<p>Fagan's is the bar where the Irish Taoiseach (prime minister) drinks his | |
pints, but its name does not contain the string "<code>ogan</code>", | |
so the resulting node-set is empty.</p> | |
<p>The third expressions also starts off with all <code><bar></code> | |
elements, applies the predicate "<code>[position() > 3]</code>", | |
and reduces the node set to these:</p> | |
<blockquote class="source"> | |
<pre> | |
<bar name="Ivy House"> | |
<bar name="Peter's Pub"> | |
<bar name="Grogan's"> | |
<bar name="Hogans's"> | |
<bar name="Brogan's"></pre> | |
</blockquote> | |
<p>The last predicate "<code>[2]</code>" is applied to this node-set | |
and set is further reduced to:</p> | |
<blockquote class="source"> | |
<pre> | |
<bar name="Peter's Pub"></pre> | |
</blockquote> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Categories</h3> | |
<p>From the examples in the last chapter we can try to categorize predicate | |
chains/pipelines to simplify our implementation. We can speed up processing | |
significantly if we can avoid using a data-structure (iterator) to represent | |
the intermediate step between predicates. The goal of setting up these | |
categories is to pinpoint those cases where an intermediate iterator has | |
to be used and when it can be avoided.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Single predicate expressions</h4> | |
<p>Expressions containing just a single predicate have no intermediate step | |
and there is no need for any extra iterator. The expression inside the | |
predicate can be applied directly to the original iterator. We call this | |
category <b>SIMPLE_CONTEXT</b>.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Expressions containing only non-position predicates</h4> | |
<p>Predicate-order is significant when the predicate-chain contains one or | |
more predicate with an expression similar to | |
"<code>position() > 3</code>" or "<code>2</code>". This | |
is because the <code>position()</code> and <code>last()</code> explicitly | |
refer to the intermediate step between applying each predicate. The | |
expression:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="//bar[contains(@name,'ogan')][parent::*/@location = 'Town']"></pre> | |
</blockquote> | |
<p>has two predicates that can be applied in any order and still produce the | |
desired node-set. Such predicates can be merged to:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="//bar[contains(@name,'ogan') & (parent::*/@location = 'Town')]"></pre> | |
</blockquote> | |
<p>We call this category NO_CONTEXT.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Expressions containing position predicates</h4> | |
<p>A predicate-chain, whose predicates' expressions contain any use of the | |
<code>position()</code> or <code>last()</code> functions require some way | |
of representing the intermediate step in an iterator. The first predicate | |
is applied to the original node-set, and the resulting node-set must then | |
be stored in some other iterator, from which the second predicate can get | |
the current position from the iterator's <code>getPosition()</code> and | |
<code>getLast()</code> methods. We call this category | |
GENERAL_CONTEXT</p> | |
<a name="exception">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Expressions containing one position predicate</h4> | |
<p>There is one expection from the GENERAL_CONTEXT category. If the | |
predicate-chain contains only one position-predicate, and that predicate is | |
the very first one, then that predicate can call the iterator that contains | |
the first node-set directly. Just look:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="//bar[2][parent::*/@location = 'Drumcondra']"></pre> | |
</blockquote> | |
<p>The <code>[2]</code> predicate can be applied to the original iterator | |
for the <code>//bar</code> step. And so can the | |
<code>[parent::*/@location = 'Drumcondra']</code> predicate as well. This | |
is only the case when the position predicate is first in the predicate | |
chain. These types of predicate chains belong in the | |
NO_CONTEXT category.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h3>Design details</h3> | |
<p>Predicates are handled quite differently in step expressions and step | |
patterns. Step expressions are not implemented with the various contexts in | |
mind and use a specialised iterator to wrap the code for each predicate. | |
Step patterns are more complicated and CPU (or should I say JVM?) | |
exhaustive. Step patterns containing predicates are analysed to determine | |
context type and compiled accordingly.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Predicates and Step expressions</h4> | |
<p>The basic behaviour for a predicate is to compile a <b>filter</b>. This | |
filter is an auxiliary class that implements the | |
<code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code> interface. The | |
<code>Step</code> or <code>StepPattern</code> that uses the predicate will | |
create a <code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code>. This | |
iterator contains the nodes that pass through the predicate. The compiled | |
filter is used by the iterator to determine which nodes that should be | |
included. The <code>org.apache.xalan.xsltc.dom.CurrentNodeListFilter</code> | |
interface contains only a single method:</p> | |
<blockquote class="source"> | |
<pre> | |
public interface CurrentNodeListFilter { | |
public abstract boolean test(int node, int position, int last, int current, | |
AbstractTranslet translet, NodeIterator iter); | |
}</pre> | |
</blockquote> | |
<p>The code that is compiled into the <code>test()</code> method is the | |
code for the predicate's expression. The <code>Predicate</code> class | |
compiles the filter class and a <code>test()</code> method skeleton, while | |
some sub-class of the <code>Expression</code> class compiles the actual | |
code that goes into this method.</p> | |
<p>The iterator is initialised with a filter that implements this interface: | |
</p> | |
<blockquote class="source"> | |
<pre> | |
public CurrentNodeListIterator(NodeIterator source, | |
CurrentNodeListFilter filter, | |
int currentNode, | |
AbstractTranslet translet) { | |
this(source, !source.isReverse(), filter, currentNode, translet); | |
}</pre> | |
</blockquote> | |
<p>The iterator will use its source iterator to provide it with the initial | |
node-set. Each node that is returned from this set is passed through the | |
filter before returned by the <code>next()</code> method. Note that the | |
source iterator can also be a current node-list iterator (if two or more | |
predicates are chained together).</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Optimisations in Step expressions</h4> | |
<h5>Node-value iterators</h5> | |
<p>Some simple predicates that test for node values are handled by the | |
<code>NodeValueIterator</code> class at runtime. These are:</p> | |
<blockquote class="source"> | |
<pre> | |
A: foo[@attr = <value>] | |
B: foo[bar = <value>] | |
C: foo/bar[. = <value>]</pre> | |
</blockquote> | |
<p>The first case is handled by creating an iterator that represents | |
<code>foo/@attr</code>, then passing this iterator and a test-value to | |
a <code>NodeValueIterator</code>. The <b><value></b> is an | |
expression that is compiled and passed to the iterator as a string. It | |
does <b>not</b> have to be a literal string as the string value is | |
found at runtime. The last two cases are similarly handled by creating an | |
iterator for <code>foo/bar</code> and passing that and the test-value to | |
a <code>NodeValueIterator</code>.</p> | |
<h5>Nth descendant iterators</h5> | |
<p>The <code>Step</code> class is also optimised for position-predicates | |
that are applied to descendant iterators:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="//bar[3]"></pre> | |
</blockquote> | |
<p>Such step/predicate combinations are handled by the internal DOM's | |
inner class <code>NthDescendantIterator</code>.</p> | |
<h5>Nth position iterators</h5> | |
<p>Similarly, the <code>Step</code> class is optimised for | |
position-predicates that are applied to basic steps:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:for-each select="bar[3]"></pre> | |
</blockquote> | |
<p>Such step/predicate combinations are handled by the internal DOM's | |
inner class <code>NthPositionIterator</code>.</p> | |
<h5>Node test</h5> | |
<p>The predicate class contains a method that tells you if it is a boolean | |
test:</p> | |
<blockquote class="source"> | |
<pre> | |
public boolean isBooleanTest();</pre> | |
</blockquote> | |
<p>This can be, but it currently is not, used by the <code>Step</code> class | |
to compile in optimised code. Some work to be done here!</p> | |
<a name="step-patterns">‌</a> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
<h4>Predicates and StepPatterns</h4> | |
<p>Using predicates in patterns is slow on any XSLT processor, and XSLTC | |
is no exception. This is why the predicate context is carefully analysed | |
by the <code>StepPattern</code> class, so that the compiled code is | |
specialised to handle the specific predicate(s) in use. First we should | |
consider the basic step pattern.</p> | |
<h5>Basic pattern handling</h5> | |
<p>All patterns are grouped (by the <code>Mode</code> class) according to | |
their <b> | |
<i>kernel</i> | |
</b>-node type. The kernel node-type is node-type of the | |
<b>last</b> step in a pattern:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:template match="foo/bar/baz"> ... <xsl:template></pre> | |
</blockquote> | |
<p>In this case the type for elements <code><baz></code> is the | |
kernel type. This step is <b>not</b> compiled as a step pattern. The node | |
type is passed to the <code>Mode</code> class and is used to place the | |
remainder of the pattern code inside the big <code>switch()</code> statement | |
in the translet's <code>applyTemplates()</code> method. The whole pattern | |
is then <b>reduced</b> to:</p> | |
<blockquote class="source"> | |
<pre> | |
match="foo/bar"</pre> | |
</blockquote> | |
<p>The <code>StepPattern</code> representing the <code><bar></code> | |
element test is compiled under the appropriate <code>case:</code> section | |
of the <code>switch()</code> statement. The code compiled for the step | |
pattern is basically just a call to the DOM's <code>getType()</code> | |
method and a test for the desired node type. There are two special cases | |
for:</p> | |
<blockquote class="source"> | |
<pre> | |
<xsl:template match="foo/*/baz"> ... <xsl:template> | |
<xsl:template match="foo/*@[2]"> ... <xsl:template></pre> | |
</blockquote> | |
<p>In the first case we call <code>isElement()</code> and in the second case | |
we call <code>isAttribute()</code>, instead of <code>getType()</code>.</p> | |
<h5>Patterns with predicates</h5> | |
<p>The <code>typeCheck()</code> method of the <code>StepPattern</code> | |
invokes a method that analyses the predicates of the step pattern and | |
determines their context. (Note that this method needs to be updated to | |
handle the exception to the GENERAL_CONTEXT metioned in the section | |
<a href="#exception">Expressions containing one position predicate</a> earlier in this document.) The <code>translate()</code> method of the | |
<code>StepPattern</code> class contains a <code>switch()</code> statement | |
that calls methods that are tailored for compiling code for the various | |
predicate contexts.</p> | |
<p align="right" size="2"> | |
<a href="#content">(top)</a> | |
</p> | |
</div> | |
<div id="footer">Copyright © 1999-2012 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Sun 03/18/2012</div> | |
</div> | |
</body> | |
</html> |