 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd">
 <!--
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
 -->
 <!-- $Id$ -->
   <s1 title="&lt;xsl:key&gt; / key() / IdKeyPattern">

   <s2 title="Contents">
   <ul>
     <li><link anchor="functionality">Functionality</link></li>
     <li><link anchor="implementation">Implementation</link></li>
   </ul>
   </s2>

   <anchor name="functionality"/>
   <s2 title="Functionality">

   <p>The <code>&lt;xsl:key&gt;</code> element is a top-level element that can be
   used to define a named index of nodes from the source XML tree(s). The
   element has three attributes:</p>

   <ul>
     <li>
       <code>name</code> - the name of the index
     </li>
     <li>
       <code>match</code> - a pattern that defines the nodeset we want
       indexed
     </li>
     <li>
       <code>use</code> - an expression that defines the value to be used
       as the index key value.
     </li>
   </ul>

   <p>A named index can be accessed using either the <code>key()</code> function
   or a KeyPattern. Both these methods address the index using its defined name
   (the "name" attribute above) and a parameter defining one or more lookup
   values for the index. The function or pattern returns a node set containing
   all nodes in the index whose key value matches the parameter's value(s):</p>

   <source>
     &lt;xsl:key name="book-author" match="book" use="author"/&gt;
         :
         :
     &lt;xsl:for-each select="key('book-author', 'Mikhail Bulgakov')"&gt;
       &lt;xsl:value-of select="author"/&gt;
       &lt;xsl:text&gt;: &lt;/xsl:text&gt;
       &lt;!-- assumes each book element has a title child --&gt;
       &lt;xsl:value-of select="title"/&gt;
       &lt;xsl:text&gt;&amp;#xa;&lt;/xsl:text&gt;
     &lt;/xsl:for-each&gt;</source>

   <p>The KeyPattern can be used within an index definition to create unions
   and intersections of node sets:</p><source>
     &lt;xsl:key name="cultcies" match="farmer | fisherman" use="name"/&gt;</source>

   <p>This could have been done using regular <code>&lt;xsl:for-each&gt;</code>
   elements with <code>select</code> expressions. However, if your stylesheet
   accesses the same selection of nodes repeatedly, the transformation
   will be much more efficient using pre-indexed keys as shown above, since the
   index is built once and each lookup avoids traversing the tree again.</p>
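
   <p>To try the keyed lookup end to end, the following is a minimal sketch
   that runs a stylesheet through the standard JAXP API
   (<code>javax.xml.transform</code>). The class name
   <code>KeyLookupDemo</code>, the input document and the
   <code>library</code> and <code>title</code> element names are assumptions
   made for this illustration only.</p>
   <source><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyLookupDemo {
    // Hypothetical input document; element names are assumptions for this sketch.
    static final String XML =
        "<library>"
      + "<book><author>Mikhail Bulgakov</author><title>Heart of a Dog</title></book>"
      + "<book><author>Franz Kafka</author><title>The Trial</title></book>"
      + "<book><author>Mikhail Bulgakov</author><title>The White Guard</title></book>"
      + "</library>";

    // Stylesheet that declares the key and then selects books through it.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='book-author' match='book' use='author'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select=\"key('book-author', 'Mikhail Bulgakov')\">"
      + "<xsl:value-of select='title'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet, run it on the input and return the text output.
    static String run() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```
]]></source>
   <p>Only the two books whose <code>author</code> child equals the lookup
   value are visited, in document order; the third book never reaches the
   loop body.</p>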

   </s2>

   <anchor name="implementation"/>
   <s2 title="Implementation">

     <anchor name="indexing"/>
     <s3 title="Indexing">

     <p>The <code>AbstractTranslet</code> class has a global hashtable that holds
     an index for each named key in the stylesheet (hashing on the "name"
     attribute of <code>&lt;xsl:key&gt;</code>). <code>AbstractTranslet</code>
     has a couple of public methods for inserting and retrieving data from this
     hashtable:</p><source>
     public void buildKeyIndex(String keyName, int nodeID, String value);
     public KeyIndex getKeyIndex(String keyName);</source>

     <p>The <code>Key</code> class compiles code that traverses the input DOM and
     extracts the nodes that match the pattern given in the <code>"match"</code>
     attribute of the <code>&lt;xsl:key&gt;</code> element. For each matching
     node, a new entry is inserted into the named key's index. The entry stores
     the node's DOM index and the value computed from the <code>"use"</code>
     attribute of the <code>&lt;xsl:key&gt;</code> element.</p>

     <p>Something similar is done for indexing IDs. This index is generated from
     the <code>ID</code> and <code>IDREF</code> fields in the input document's
     DTD. This means that the code for generating this index cannot be generated
     at compile-time, which in turn means that the code has to be generic enough
     to handle all DTDs. The class that handles this is the
     <code>org.apache.xalan.xsltc.dom.DTDMonitor</code> class. This class
     implements the <code>org.xml.sax.XMLReader</code> and
     <code>org.xml.sax.DTDHandler</code> interfaces. A client application using
     the native API must instantiate a <code>DTDMonitor</code> and pass it to the
     translet code if, and only if, it wants IDs indexed (one can improve
     performance by omitting this step). This is described in the
     <link idref="xsltc_native_api">XSLTC Native API reference</link>. The
     <code>DTDMonitor</code> class will use the same indexing as the code
     generated by the <code>Key</code> class. The index for IDs is called
     "##id". We assume that no stylesheets will contain a key with this name.</p>

     <p>The index itself is implemented in the
     <code>org.apache.xalan.xsltc.dom.KeyIndex</code> class. The index has a
     hashtable with all the values from the matching nodes (the part of the node
     used to generate this value is the one specified in the <code>"use"</code>
     attribute). For every matching value there is a bit-array (implemented in
     the <code>org.apache.xalan.xsltc.BitArray</code> class), holding a list of
     all node indexes for which this value gives a match:</p>
     <p><img src="key_relations.gif" alt="key_relations.gif"/></p>
     <p><ref>Figure 1: Indexing tables</ref></p>
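
     <p>The two levels of tables described above can be sketched in plain Java.
     This is a simplified stand-in and not the XSLTC source: the class name
     <code>KeyTableSketch</code> is hypothetical,
     <code>java.util.BitSet</code> takes the place of <code>BitArray</code>,
     and a plain map takes the place of <code>KeyIndex</code>. Only the two
     method names are taken from the text.</p>
     <source><![CDATA[
```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the translet's key table (hypothetical class).
class KeyTableSketch {
    // key name -> (key value -> bit set of node indexes)
    private final Map<String, Map<String, BitSet>> keyIndexes = new HashMap<>();

    // Record that the node with index nodeID has the given value under the named key.
    void buildKeyIndex(String keyName, int nodeID, String value) {
        keyIndexes
            .computeIfAbsent(keyName, k -> new HashMap<>())
            .computeIfAbsent(value, v -> new BitSet())
            .set(nodeID);
    }

    // Return the index for a key name; an unknown name yields an empty index.
    Map<String, BitSet> getKeyIndex(String keyName) {
        return keyIndexes.getOrDefault(keyName, new HashMap<>());
    }
}
```
]]></source>
     <p>Every distinct value owns a bit set of its own here, which is the
     memory cost that the improvement section later in this document sets out
     to reduce.</p>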

     <p>The <code>KeyIndex</code> class implements the <code>NodeIterator</code>
     interface, so that it can be returned directly by the implementation of the
     <code>key()</code> function. This is how the index generated by
     <code>&lt;xsl:key&gt;</code> and the node-set returned by the
     <code>key()</code> function and KeyPattern are tied together. You can see how this is
     done in the <code>translate()</code> method of the <code>KeyCall</code>
     class.</p>

     <p>The <code>key()</code> function can be called in two ways:</p><source>
     key('key-name', 'value')
     key('key-name', node-set)</source>

     <p>The first parameter is always the name of the key. We use this value to
     look up our index in the <code>_keyIndexes</code> hashtable in
     <code>AbstractTranslet</code>:</p>
     <source>
     il.append(classGen.aloadThis());
     _name.translate(classGen, methodGen);
     il.append(new INVOKEVIRTUAL(getKeyIndex));</source>

     <p>This compiles into a call to
     <code>AbstractTranslet.getKeyIndex(String name)</code>, and it leaves a
     <code>KeyIndex</code> object on the stack. What we then need to do is
     initialise the <code>KeyIndex</code> to give us nodes with the requested
     value. This is done by leaving the <code>KeyIndex</code> object on the stack
     and pushing the <code>"value"</code> parameter to <code>key()</code>, before
     calling <code>lookup()</code> on the index:</p><source>
     il.append(DUP);  // duplicate the KeyIndex reference: one for lookup(), one to return
     _value.translate(classGen, methodGen);
     il.append(new INVOKEVIRTUAL(lookup));</source>

     <p>This compiles into a call to <code>KeyIndex.lookup(String value)</code>.
     This will initialise the <code>KeyIndex</code> object to return nodes that
     match the given value, so the <code>KeyIndex</code> object can be left on
     the stack when we return. This works because the <code>KeyIndex</code> object
     implements the <code>NodeIterator</code> interface.</p>
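
     <p>The idea of an index that doubles as its own iterator can be
     illustrated with a small sketch. It assumes a hypothetical class
     <code>KeyIndexSketch</code> with an <code>END</code> marker and uses
     <code>java.util.BitSet</code> in place of <code>BitArray</code>; the real
     <code>KeyIndex</code> and <code>NodeIterator</code> have richer
     interfaces than shown here.</p>
     <source><![CDATA[
```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of an index that is also its own iterator.
class KeyIndexSketch {
    static final int END = -1;              // assumed end-of-iteration marker

    private final Map<String, BitSet> index = new HashMap<>();
    private BitSet current = new BitSet();  // nodes selected by the last lookup
    private int position = 0;               // where the next scan starts

    void add(String value, int node) {
        index.computeIfAbsent(value, v -> new BitSet()).set(node);
    }

    // Select the nodes stored for this value and rewind the iteration.
    void lookup(String value) {
        current = index.getOrDefault(value, new BitSet());
        position = 0;
    }

    // Return the next selected node index in ascending order, or END.
    int next() {
        int node = current.nextSetBit(position);  // -1 when no bit is left
        if (node < 0) {
            return END;
        }
        position = node + 1;
        return node;
    }
}
```
]]></source>
     <p>Because the lookup state lives in the index object itself, a second
     <code>lookup()</code> in this sketch resets any iteration that is still
     in progress on the same object.</p>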

     <p>Matters are a bit more complex when the second parameter of
     <code>key()</code> is a node-set. In this case we need to traverse the nodes
     in the set and do a lookup for each of them. The generated code does the
     following:</p>

     <ul>
       <li>
         construct a <code>KeyIndex</code> object that will hold the
         return node-set
       </li>
       <li>
         find the named <code>KeyIndex</code> object from the hashtable in
         AbstractTranslet
       </li>
       <li>
         get an iterator for the node-set and do the following loop:
         <ul>
           <li>get the string value of the current node</li>
           <li>do a lookup in the <code>KeyIndex</code> object for the named index</li>
           <li>merge the resulting node-set into the return node-set</li>
         </ul>
       </li>
       <li>
         leave the return node-set on stack when done
       </li>
     </ul>
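
     <p>The steps above can be sketched as an ordinary loop. This is an
     illustration under simplifying assumptions and not the code that XSLTC
     generates: the named index is modelled as a map from value to
     <code>java.util.BitSet</code>, the node-set argument is reduced to the
     list of its nodes' string values, and the class and method names are
     hypothetical.</p>
     <source><![CDATA[
```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class NodeSetKeyLookup {
    // index:  key value -> node indexes (stand-in for the named KeyIndex)
    // values: string values of the nodes in the node-set argument
    static BitSet lookupAll(Map<String, BitSet> index, List<String> values) {
        BitSet result = new BitSet();          // holds the return node-set
        for (String value : values) {          // loop over the node-set
            BitSet hits = index.get(value);    // lookup in the named index
            if (hits != null) {
                result.or(hits);               // merge into the return node-set
            }
        }
        return result;                         // left for the caller when done
    }
}
```
]]></source>
     <p>Merging with a bitwise OR means that a node matched through several
     values still appears only once in the result.</p>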

     </s3>

     <anchor name="improvements"/>
     <s3 title="Possible indexing improvements">

     <p>The indexing implementation can be very memory-intensive, as there
     is one <code>BitArray</code> allocated for each value in every index. This
     is particularly bad for the index used for IDs ("##id"), where a single value
     should map to only one node. This means that a whole
     <code>BitArray</code> is used just to contain one node. The
     <code>KeyIndex</code> class should be updated to hold the first node for a
     value in an <code>Integer</code> object, and then replace that with a
     <code>BitArray</code> or a <code>Vector</code> only if a second node is
     added to the value. Here is an outline for <code>KeyIndex</code>:</p>
     <source>

     public void add(Object value, int node) {
         // The container is an Integer (one node) or a BitArray (several nodes)
         Object container = _index.get(value);
         if (container == null) {
             // First node for this value: no BitArray is needed yet
             _index.put(value, Integer.valueOf(node));
         }
         else if (container instanceof Integer) {
             // Second node for this value: move both nodes into a BitArray
             int first = ((Integer)container).intValue();
             _nodes = new BitArray(_arraySize);
             _nodes.setMask(node &amp; 0xff000000);
             _nodes.setBit(first &amp; 0x00ffffff);
             _nodes.setBit(node &amp; 0x00ffffff);
             _index.put(value, _nodes);
         }
         else {
             // Otherwise add the node to the existing bit array
             _nodes = (BitArray)container;
             _nodes.setBit(node &amp; 0x00ffffff);
         }
     }</source>

     <p>Other methods inside the <code>KeyIndex</code> should be updated to
     reflect this.</p>
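
     <p>As an example of what such an update might involve, here is a
     self-contained sketch in which the reading side also copes with both
     container shapes. It is an assumption-laden illustration: the class
     <code>CompactKeyIndex</code> and its methods are hypothetical,
     <code>java.util.BitSet</code> replaces <code>BitArray</code>, and the
     node mask handling from the outline is left out.</p>
     <source><![CDATA[
```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical compact index: one Integer per value until a second node arrives.
class CompactKeyIndex {
    private final Map<String, Object> index = new HashMap<>();

    void add(String value, int node) {
        Object container = index.get(value);
        if (container == null) {
            index.put(value, Integer.valueOf(node));  // first node: no bit set yet
        } else if (container instanceof Integer) {
            BitSet nodes = new BitSet();              // second node: upgrade
            nodes.set((Integer) container);
            nodes.set(node);
            index.put(value, nodes);
        } else {
            ((BitSet) container).set(node);           // later nodes: reuse bit set
        }
    }

    // Readers must accept either an Integer or a BitSet container.
    BitSet lookup(String value) {
        Object container = index.get(value);
        BitSet result = new BitSet();
        if (container instanceof Integer) {
            result.set((Integer) container);
        } else if (container != null) {
            result.or((BitSet) container);
        }
        return result;
    }

    // True while the value is still stored as a single Integer.
    boolean isSingleton(String value) {
        return index.get(value) instanceof Integer;
    }
}
```
]]></source>
     <p>For an ID-style index, where each value maps to one node, every entry
     in this sketch stays in the single <code>Integer</code> form.</p>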

     </s3>

     <anchor name="patterns"/>
     <s3 title="Id and Key patterns">

     <p>The challenge with the <code>id()</code> and
     <code>key()</code> patterns is that they have no specific node type.
     Patterns are normally grouped based on their "kernel" node type.
     This is done in the <code>org.apache.xalan.xsltc.compiler.Mode</code> class.
     The <code>Mode</code> class has a method, <code>flattenAlternative</code>,
     that does this grouping, and all templates with a common kernel node type
     will be inserted into a "test sequence". A test sequence is a set of templates
     with the same kernel node type. The <code>TestSeq</code> class generates
     code that will figure out which pattern, among several patterns with the
     same kernel node type, matches a certain node. This is used by the
     <code>Mode</code> class when generating the <code>applyTemplates()</code>
     method in the translet. A test sequence is also generated for all templates
     whose pattern does not have a kernel node type. This is the case for all
     Id and KeyPatterns. This test sequence, if necessary, is put before the
     big <code>switch()</code> in the <code>applyTemplates()</code> method. This
     test has to be done for every single node that is traversed, causing the
     transformation to slow down significantly. This is why we do <em>not</em>
     recommend using these types of patterns with XSLTC.</p>
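
     <p>The cost described above can be made visible with a toy dispatcher.
     This is a sketch under invented assumptions and not the code produced by
     <code>Mode</code> or <code>TestSeq</code>: the <code>Node</code> and
     <code>Rule</code> classes, the integer type codes and the template names
     are all hypothetical, and a map lookup stands in for the
     <code>switch()</code>.</p>
     <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class DispatchSketch {
    // Hypothetical node: a type code plus a flag standing in for key membership.
    static final class Node {
        final int type;
        final boolean inKey;
        Node(int type, boolean inKey) { this.type = type; this.inKey = inKey; }
    }

    // A pattern test paired with the name of the template it selects.
    static final class Rule {
        final Predicate<Node> test;
        final String template;
        Rule(Predicate<Node> test, String template) { this.test = test; this.template = template; }
    }

    private final List<Rule> noKernel = new ArrayList<>();            // id/key patterns
    private final Map<Integer, List<Rule>> byKernel = new HashMap<>(); // grouped by node type
    int noKernelTests = 0;                                             // counts the extra work

    void addNoKernel(Predicate<Node> test, String template) {
        noKernel.add(new Rule(test, template));
    }

    void addKernel(int type, Predicate<Node> test, String template) {
        byKernel.computeIfAbsent(type, t -> new ArrayList<>()).add(new Rule(test, template));
    }

    String applyTemplates(Node node) {
        // Test sequence without a kernel type: runs for every node, whatever its type.
        for (Rule rule : noKernel) {
            noKernelTests++;
            if (rule.test.test(node)) {
                return rule.template;
            }
        }
        // Stand-in for the big switch(): only rules for this node type are tried.
        List<Rule> rules = byKernel.get(node.type);
        if (rules != null) {
            for (Rule rule : rules) {
                if (rule.test.test(node)) {
                    return rule.template;
                }
            }
        }
        return "built-in";
    }
}
```
]]></source>
     <p>In this sketch the counter grows by one per traversed node for each
     pattern without a kernel type, including nodes that no such pattern
     could ever match.</p>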

     </s3>

   </s2>

 </s1>