| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE s1 SYSTEM "../../style/dtd/document.dtd"> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one or more |
| * contributor license agreements. See the NOTICE file distributed with |
| * this work for additional information regarding copyright ownership. |
| * The ASF licenses this file to You under the Apache License, Version 2.0 |
| * (the "License"); you may not use this file except in compliance with |
| * the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| --> |
| <!-- $Id$ --> |
| <s1 title="<xsl:key> / key() / IdKeyPattern"> |
| |
| <s2 title="Contents"> |
| <ul> |
| <li><link anchor="functionality">Functionality</link></li> |
| <li><link anchor="implementation">Implementation</link></li> |
| </ul> |
| </s2> |
| |
| <anchor name="functionality"/> |
| <s2 title="Functionality"> |
| |
| <p>The <code><xsl:key></code> element is a top-level element that can be |
| used to define a named index of nodes from the source XML tree(s). The |
| element has three attributes:</p> |
| |
| <ul> |
| <li> |
| <code>name</code> - the name of the index |
| </li> |
| <li> |
| <code>match</code> - a pattern that defines the nodeset we want |
| indexed |
| </li> |
| <li> |
| <code>use</code> - an expression that defines the value to be used |
| as the index key value. |
| </li> |
| </ul> |
| |
| <p>A named index can be accessed using either the <code>key()</code> function |
| or a KeyPattern. Both these methods address the index using its defined name |
| (the "name" attribute above) and a parameter defining one or more lookup |
| values for the index. The function or pattern returns a node set containing |
| all nodes in the index whose key value match the parameter's value(s):</p> |
| |
| <source> |
| <xsl:key name="book-author" match="book" use="author"/> |
| : |
| : |
| <xsl:for-each select="key('book-author', 'Mikhail Bulgakov')"> |
| <xsl:value-of select="author"/> |
| <xsl:text>: </xsl:text> |
| <xsl:value-of select="author"/> |
| <xsl:text>&#xa;</xsl:text> |
| </xsl:for-each></source> |
| |
| <p>The KeyPattern can be used within an index definition to create unions |
| and intersections of node sets:</p><source> |
| <xsl:key name="cultcies" match="farmer | fisherman" use="name"/></source> |
| |
| <p>This could have been done using regular <code><xsl:for-each></code> |
| and <code><xsl:select></code> elements. However, if your stylesheet |
| accesses the same selection of nodes over and over again, the transformation |
| will be much more efficient using pre-indexed keys as shown above.</p> |
| |
| </s2> |
| |
| <anchor name="implementation"/> |
| <s2 title="Implementation"> |
| |
| <anchor name="indexing"/> |
| <s3 title="Indexing"> |
| |
| <p>The <code>AbstractTranslet</code> class has a global hashtable that holds |
| an index for each named key in the stylesheet (hashing on the "name" |
| attribute of <code><xsl:key></code>). <code>AbstractTranslet</code> |
| has a couple of public methods for inserting and retrieving data from this |
| hashtable:</p><source> |
| public void buildKeyIndex(String keyName, int nodeID, String value); |
| public KeyIndex KeyIndex getKeyIndex(String keyName);</source> |
| |
| <p>The <code>Key</code> class compiles code that traverses the input DOM and |
| extracts nodes that match some given parameters (the <code>"match"</code> |
| attribute of the <code><xsl:key></code> element). A new element is |
| inserted into the named key's index. The nodes' DOM index and the value |
| translated from the <code>"use"</code> attribute of the |
| <code><xsl:key></code> element are stored in the new entry in the |
| index.</p> |
| |
| <p>Something similar is done for indexing IDs. This index is generated from |
| the <code>ID</code> and <code>IDREF</code> fields in the input document's |
| DTD. This means that the code for generating this index cannot be generated |
| at compile-time, which again means that the code has to be generic enough |
| to handle all DTDs. The class that handles this is the |
| <code>org.apache.xalan.xsltc.dom.DTDMonitor</code> class. This class |
| implements the <code>org.xml.sax.XMLReader</code> and |
| <code>org.xml.sax.DTDHandler</code> interfaces. A client application using |
| the native API must instanciate a <code>DTDMonitor</code> and pass it to the |
| translet code - if, and only if, it wants IDs indexed (one can improve |
| performance by omitting this step). This is descrived in the |
| <link idref="xsltc_native_api">XSLTC Native API reference</link>. The |
| <code>DTDMonitor</code> class will use the same indexing as the code |
| generated by the <code>Key</code> class. The index for ID's is called |
| "##id". We assume that no stylesheets will contain a key with this name.</p> |
| |
| <p>The index itself is implemented in the |
| <code>org.apache.xalan.xsltc.dom.KeyIndex</code> class. The index has an |
| hashtable with all the values from the matching nodes (the part of the node |
| used to generate this value is the one specified in the <code>"use"</code> |
| attribute). For every matching value there is a bit-array (implemented in |
| the <code>org.apache.xalan.xsltc.BitArray</code> class), holding a list of |
| all node indexes for which this value gives a match:</p> |
| <p><img src="key_relations.gif" alt="key_relations.gif"/></p> |
| <p><ref>Figure 1: Indexing tables</ref></p> |
| |
| <p>The <code>KeyIndex</code> class implements the <code>NodeIterator</code> |
| interface, so that it can be returned directly by the implementation of the |
| <code>key()</code> function. This is how the index generated by |
| <code><xsl:key></code> and the node-set returned by the |
| <code>key()</code> and KeyPattern are tied together. You can see how this is |
| done in the <code>translate()</code> method of the <code>KeyCall</code> |
| class.</p> |
| |
| <p>The <code>key()</code> function can be called in two ways:</p><source> |
| key('key-name','value') |
| key('key-name','node-set')</source> |
| |
| <p>The first parameter is always the name of the key. We use this value to |
| lookup our index from the _keyIndexes hashtable in AbstractTranslet:</p> |
| <source> |
| il.append(classGen.aloadThis()); |
| _name.translate(classGen, methodGen); |
| il.append(new INVOKEVIRTUAL(getKeyIndex));</source> |
| |
| <p>This compiles into a call to |
| <code>AbstractTranslet.getKeyIndex(String name)</code>, and it leaves a |
| <code>KeyIndex</code> object on the stack. What we then need to do it to |
| initialise the <code>KeyIndex</code> to give us nodes with the requested |
| value. This is done by leaving the <code>KeyIndex</code> object on the stack |
| and pushing the <code>"value"</code> parameter to <code>key()</code>, before |
| calling <code>lookup()</code> on the index:</p><source> |
| il.append(DUP); // duplicate the KeyIndex obejct before return |
| _value.translate(classGen, methodGen); |
| il.append(new INVOKEVIRTUAL(lookup));</source> |
| |
| <p>This compiles into a call to <code>KeyIndex.lookup(String value)</code>. |
| This will initialise the <code>KeyIndex</code> object to return nodes that |
| match the given value, so the <code>KeyIndex</code> object can be left on |
| the stack when we return. This because the <code>KeyIndex</code> object |
| implements the <code>NodeIterator</code> interface.</p> |
| |
| <p>This matter is a bit more complex when the second parameter of |
| <code>key()</code> is a node-set. In this case we need to traverse the nodes |
| in the set and do a lookup for each node in the set. What I do is this:</p> |
| |
| <ul> |
| <li> |
| construct a <code>KeyIndex</code> object that will hold the |
| return node-set |
| </li> |
| <li> |
| find the named <code>KeyIndex</code> object from the hashtable in |
| AbstractTranslet |
| </li> |
| <li> |
| get an iterator for the node-set and do the folowing loop:</li> |
| <ul> |
| <li>get string value for current node</li> |
| <li>do lookup in KeyIndex object for the named index</li> |
| <li>merge the resulting node-set into the return node-set</li> |
| </ul> |
| <li> |
| leave the return node-set on stack when done |
| </li> |
| </ul> |
| |
| </s3> |
| |
| <anchor name="improvements"/> |
| <s3 title="Possible indexing improvements"> |
| |
| <p>The indexing implementation can be very, very memeory exhaustive as there |
| is one <code>BitArray</code> allocated for each value for every index. This |
| is particulary bad for the index used for IDs ('##id'), where a single value |
| should only map to one, single node. This means that a whole |
| <code>BitArray</code> is used just to contain one node. The |
| <code>KeyIndex</code> class should be updated to hold the first node for a |
| value in an <code>Integer</code> object, and then replace that with a |
| <code>BitArray</code> or a <code>Vector</code> only is a second node is |
| added to the value. Here is an outline for <code>KeyIndex</code>:</p> |
| <source> |
| |
| public void add(Object value, int node) { |
| Object container; |
| if ((container = (BitArray)_index.get(value)) == null) { |
| _index.put(value, new Integer(node)); |
| } |
| else { |
| // Check if there is _one_ node for this value |
| if (container instanceof Integer) { |
| int first = ((Integer)container |
| _nodes = new BitArray(_arraySize); |
| _nodes.setMask(node & 0xff000000); |
| _nodes.setBit(first & 0x00ffffff); |
| _nodes.setBit(node & 0x00ffffff); |
| _index.put(value, _nodes); |
| } |
| // Otherwise add node to axisting bit array |
| else { |
| _nodex = (BitArray)container; |
| _nodes.setBit(node & 0x00ffffff); |
| } |
| } |
| }</source> |
| |
| <p>Other methods inside the <code>KeyIndex</code> should be updated to |
| reflect this.</p> |
| |
| </s3> |
| |
| <anchor name="patterns"/> |
| <s3 title="Id and Key patterns"> |
| |
| <p>As already mentioned, the challenge with the <code>id()</code> and |
| <code>key()</code> patterns is that they have no specific node type. |
| Patterns are normally grouped based on their pattern's "kernel" node type. |
| This is done in the <code>org.apache.xalan.xsltc.compiler.Mode</code> class. |
| The <code>Mode</code> class has a method, <code>flatternAlaternative</code>, |
| that does this grouping, and all templates with a common kernel node type |
| will be inserted into a "test sequence". A test sequence is a set templates |
| with the same kernel node type. The <code>TestSeq</code> class generates |
| code that will figure out which pattern, amongst several patterns with the |
| same kernel node type, that matches a certain node. This is used by the |
| <code>Mode</code> class when generating the <code>applyTemplates</code> |
| method in the translet. A test sequence is also generated for all templates |
| whose pattern does not have a kernel node type. This is the case for all |
| Id and KeyPatterns. This test sequence, if necessary, is put before the |
| big <code>switch()</code> in the <code>applyTemplates()</code> mehtod. This |
| test has to be done for every single node that is traversed, causing the |
| transformation to slow down siginificantly. This is why we do <em>not</em> |
| recommend using this type of patterns with XSLTC.</p> |
| |
| </s3> |
| |
| </s2> |
| |
| </s1> |