docs/xalan/xalan-j/xsltc/xsl_whitespace_design.html - xalan-site - Git at Google

 <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html>
 <head>
 <title>ASF: &lt;xsl:strip/preserve-space&gt;</title>
 <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
 <meta http-equiv="Content-Style-Type" content="text/css" />
 <link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
 </head>
 <!--
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements. See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership. The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the  "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  -->
 <body>
 <div id="title">
 <table class="HdrTitle">
 <tbody>
 <tr>
 <th rowspan="2">
 <a href="../index.html">
 <img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
 </a>
 </th>
 <th text-align="center" width="75%">
 <a href="index.html">XSLTC Design</a>
 </th>
 </tr>
 <tr>
 <td valign="middle">&lt;xsl:strip/preserve-space&gt;</td>
 </tr>
 </tbody>
 </table>
 <table class="HdrButtons" align="center" border="1">
 <tbody>
 <tr>
 <td>
 <a href="http://www.apache.org">Apache Foundation</a>
 </td>
 <td>
 <a href="http://xalan.apache.org">Xalan Project</a>
 </td>
 <td>
 <a href="http://xerces.apache.org">Xerces Project</a>
 </td>
 <td>
 <a href="http://www.w3.org/TR">Web Consortium</a>
 </td>
 <td>
 <a href="http://www.oasis-open.org/standards">Oasis Open</a>
 </td>
 </tr>
 </tbody>
 </table>
 </div>
 <div id="navLeft">
 <ul>
 <li>
 <a href="index.html">Overview</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsltc_compiler.html">Compiler design</a>
 </li></ul><hr /><ul>
 <li>Whitespace<br />
 </li>
 <li>
 <a href="xsl_sort_design.html">xsl:sort</a>
 </li>
 <li>
 <a href="xsl_key_design.html">Keys</a>
 </li>
 <li>
 <a href="xsl_comment_design.html">Comment design</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsl_lang_design.html">lang()</a>
 </li>
 <li>
 <a href="xsl_unparsed_design.html">Unparsed entities</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsl_if_design.html">If design</a>
 </li>
 <li>
 <a href="xsl_choose_design.html">Choose|When|Otherwise design</a>
 </li>
 <li>
 <a href="xsl_include_design.html">Include|Import design</a>
 </li>
 <li>
 <a href="xsl_variable_design.html">Variable|Param design</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsltc_runtime.html">Runtime</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsltc_dom.html">Internal DOM</a>
 </li>
 <li>
 <a href="xsltc_namespace.html">Namespaces</a>
 </li></ul><hr /><ul>
 <li>
 <a href="xsltc_trax.html">Translet &amp; TrAX</a>
 </li>
 <li>
 <a href="xsltc_predicates.html">XPath Predicates</a>
 </li>
 <li>
 <a href="xsltc_iterators.html">Xsltc Iterators</a>
 </li>
 <li>
 <a href="xsltc_native_api.html">Xsltc Native API</a>
 </li>
 <li>
 <a href="xsltc_trax_api.html">Xsltc TrAX API</a>
 </li>
 <li>
 <a href="xsltc_performance.html">Performance Hints</a>
 </li>
 </ul>
 </div>
 <div id="content">
 <h2>&lt;xsl:strip/preserve-space&gt;</h2>

   <ul>
     <li>
 <a href="#functionality">Functionality</a>
 </li>
     <li>
 <a href="#identify">Identifying strippable whitespace nodes</a>
 </li>
     <li>
 <a href="#which">Determining which nodes to strip</a>
 </li>
     <li>
 <a href="#strip">Stripping nodes</a>
 </li>
     <li>
 <a href="#filter">Filtering whitespace nodes</a>
 </li>
   </ul>

   <a name="functionality">&#8204;</a>
   <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 <h3>Functionality</h3>

   <p>The <code>&lt;xsl:strip-space&gt;</code> and <code>&lt;xsl:preserve-space&gt;</code>
   elements are used to control the way whitespace nodes in the source XML
   document are handled. These elements have no impact on whitespace in the XSLT
   stylesheet. Both elements can occur only as top-level elements, possible more
   than once, and the elements are always empty</p>

   <p>Both elements take one attribute "elements" which contains a
   whitespace separated list of named nodes which should be or preserved
   stripped from the source document. These names can be on one of these three
   formats (NameTest format):</p>

   <ul>
     <li>
       All whitespace nodes:
       <code>elements="*"</code>
     </li>
     <li>
       All whitespace nodes with a namespace:
       <code>elements="&lt;namespace&gt;:*"</code>
     </li>
     <li>
       Specific whitespace nodes: <code>elements="&lt;qname&gt;"</code>
     </li>
   </ul>

   <a name="identify">&#8204;</a>
   <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 <h3>Identifying strippable whitespace nodes</h3>

   <p>The DOM detects all text nodes and assigns them the type <code>TEXT</code>.
   All text nodes are scanned to detect whitespace-only nodes. A text-node is
   considered a whitespace node only if it consist entirely of characters from
   the set { 0x09, 0x0a, 0x0d, 0x20 }. The DOM implementation class has a static
   method used to detect such nodes:</p>

 <blockquote class="source">
 <pre>
     private static final boolean isWhitespaceChar(char c) {
         return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09;
     }
 </pre>
 </blockquote>

   <p>The characters are checked in probable order.</p>

   <p> The DOM has a bit-array that is used to  tag text-nodes as strippable
   whitespace nodes:</p>

   <blockquote class="source">
 <pre>private int[] _whitespace;</pre>
 </blockquote>

   <p>There are two methods in the DOM implementation class for accessing this
   bit-array: <code>markWhitespace(node)</code> and <code>isWhitespace(node)</code>.
   The array is resized together with all other arrays in the DOM by the
   <code>DOM.resizeArrays()</code> method. The bits in the array are set in the
   <code>DOM.maybeCreateTextNode()</code> method. This method must know whether
   the current node is a located under an element with an
   <code>xml:space="&lt;value&gt;"</code> attribute in the DOM, in which
   case it is not a strippable whitespace node.</p>

   <p>An auxillary class, WhitespaceHandler, is used for this purpose. The class
   works in a way as a stack, where you "push" a new strip/preserve setting
   together with the node in which this setting was determined. This means that
   for every time the DOM builder encounters an <code>xml:space</code> attribute
   it will invoke a method on an instance of the WhitespaceHandler class to
   signal that a new preserve/strip setting has been encountered. This is done
   in the <code>makeAttributeNode()</code> method. The whitespace handler stores the
   new setting and pushes the current element node on its stack. When the
   DOM builder closes up an element (in <code>endElement()</code>), it invokes
   another method of the whitespace handler to check if the strip/preserve
   setting is still valid. If the setting is now invalid (we're closing the
   element whose node id is on the top of the stack) the handler inverts the
   setting and pops the element node id off the stack. The previous
   strip/preserve setting is then valid, and the id of node where this setting
   was defined is on the top of the stack.</p>

   <a name="which">&#8204;</a>
   <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 <h3>Determining which nodes to strip</h3>

   <p>A text node is never stripped unless it contains only whitespace
   characters (Unicode characters 0x09, 0x0A, 0x0D and 0x20). Stripping a text
   node means that the node disappears from the DOM; so that it is never
   included in the output and that it is ignored by all functions such as
   <code>count()</code>. A text node is preserved if any of the following apply:</p>

   <ul>
     <li>
       the element name of the parent of the text node is in the set of
       elements listed in <code>&lt;xsl:preserve-space&gt;</code>
     </li>
     <li>
       the text node contains at least one non-whitespace character
     </li>
     <li>
       an ancenstor of the whitespace text node has an attribute of
       <code>xsl:space="preserve"</code>, and no close ancestor has and
       attribute of <code>xsl:space="default"</code>.
     </li>
   </ul>

   <p>Otherwise, the text node is stripped. Initially the set of
   whitespace-preserving element names contains all element names, so the
   default behaviour is to preserve all whitespace text nodes.</p>

   <p>This seems simple enough, but resolving conflicts between matching
   <code>&lt;xsl:strip-space&gt;</code> and <code>&lt;xsl:preserve-space&gt;</code>
   elements requires a lot of thought. Our first consideration is import
   precedence; the match with the highest import precedence is always chosen.
   Import precedence is determined by the order in which the compared elements
   are visited. (In this case those elements are the top-level
   <code>&lt;xsl:strip-space&gt;</code> and <code>&lt;xsl:preserve-space&gt;</code>
   elements.) This example is taken from the XSLT recommendation:</p>

   <ul>
     <li>stylesheet A imports stylesheets B and C in that order;</li>
     <li>stylesheet B imports stylesheet D;</li>
     <li>stylesheet C imports stylesheet E.</li>
   </ul>

   <p>Then the order of import precedence (lowest first) is D, B, E, C, A.</p>

   <p>Our next consideration is the priority of NameTests (XPath spec):</p>
   <ul>
     <li>
       <code>elements="&lt;qname&gt;"</code> has priority 0
     </li>
     <li>
       <code>elements="&lt;namespace&gt;:*"</code> has priority -0.25
     </li>
     <li>
       <code>elements="*"</code> has priority -0.5
     </li>
   </ul>

   <p>It is considered an error if the desicion is still ambiguous after this,
   and it is up to the implementors to decide what the apropriate action is.</p>

   <p>With all this complexity, the normal usage for these elements is quite
   smiple; either preserve all whitespace nodes but one type:</p>

   <blockquote class="source">
 <pre>&lt;xsl:strip-space elements="foo"/&gt;</pre>
 </blockquote>

   <p>or strip all whitespace nodes but one type:</p>

   <blockquote class="source">
 <pre>
     &lt;xsl:strip-space elements="*"/&gt;
     &lt;xsl:preserve-space elements="foo"/&gt;</pre>
 </blockquote>

   <a name="strip">&#8204;</a>
   <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 <h3>Stripping nodes</h3>

   <p>The ultimate goal of our design would be to totally screen all stripped
   nodes from the translet; to either physically remove them from the DOM or to
   make it appear as if they are not there. The first approach will cause
   problems in cases where multiple translets access the same DOM. In the future
   we wish to let translets run within servlets / JSPs with a common DOM cache.
   This DOM cache will keep copies of DOMs in memory to prevent the same XML
   file from being downloaded and parsed several times. This is a scenarios we
   might see:</p>

     <p>
 <img src="DOMInterface.gif" alt="DOMInterface.gif" />
 </p>
     <p>
 <b>
 <i>Figure 1: Multiple translets accessing a common pool of DOMs</i>
 </b>
 </p>

   <p>The three translets running on this host access a common pool of 4 DOMs.
   The DOMs are accessed through a common DOM interface. Translets accessing
   a single DOM will have a DOMAdapter and a single DOMImpl object behind this
   interface, while translets accessing several DOMs will be given a MultiDOM
   and a set of DOMImpl objects.</p>

   <p>The translet to the left may want to strip some nodes from the shared DOM
   in the cache, while the other translets may want to preserve all whitespace
   nodes. Our initial thought then is to keep the DOM as it is and somehow
   screen the left-hand translet of all the whitespace nodes it does not want to
   process. There are a few ways in which we can accomplish this:</p>

   <ul>
     <li>
       The translet can, prior to starting to traverse the DOM, send a reference
       to the tables containing information on which nodes we want stripped to
       the DOM interface. The DOM interface is then responsible for hiding all
       stripped whitespace nodes from the iterators and the translet. A problem
       with this approach is that we want to omit the DOM interface layer if
       the translet is only accessing a single DOM. The DOM interface layer will
       only be instanciated by the translet if the stylesheet contained a call
       to the <code>document()</code> function.<br />
 <br />
     </li>
     <li>
       The translet can provide its iterators with information on which nodes it
       does not want to see. The translet is still shielded from unwanted
       whitespace nodes, but it has the hassle of passing extra information over
       to most iterators it ever instanciates. Note that all iterators do not
       need be aware of whitepspace nodes in this case. If you have a look at
       the figure again you will see that only the first level iterator (that is
       the one closest to the DOM or DOM interface) will have to strip off
       whitespace nodes. But, there may be several iterators that operate
       directly on the DOM ( invoked by code handling XSL functions such as
       <code>count()</code>) and every single one of those will need to be told
       which whitespace nodes the translet does not want to see.<br />
 <br />
     </li>
     <li>
       The third approach will take advantage of the fact that not all
       translets will want to strip whitespace nodes. The most effective way of
       removing unwanted whitespace nodes is to do it once and for all, before
       the actual traversal of the DOM starts. This can be done by making a
       clone of the DOM with exlusive-access rights for this translet only. We
       still gain performance from the cache because we do not have to pay the
       cost of the delay caused by downloading and parsing the XML source file.
       The cost we have to pay is the time needed for the actual cloning and the
       extra memory we use.<br />
 <br />
       Normally one would imagine the translet (or the wrapper class that
       invokes the translet) calls the DOM cache with just an URL and receives
       a reference to an instanciated DOM. The cache will either have built
       this DOM on-demand or just passed back a reference to an existing tree.
       In this case the DOM would need an extra call that a translet would use
       to clone a DOM, passing the existing DOM reference to the cache and
       recieving a new reference to the cloned DOM. The translet can then do
       whatever it wants with this DOM (the cache need not even keep a reference
       to this tree).
     </li>
   </ul>

   <p>We are lucky enough to be able to combine the first two approaches. All
   iterators that directly access the DOM (axis iterators) are instanciated by
   calls to the DOM interface layer (the DOM class). The actual iterators are
   created in the DOM implementation layer (the DOMImpl class). So, we can pass
   references to the preserve/strip whitespace tables to the DOM, and the DOM
   will make sure that all axis iterators return node sets with respect to these
   tables.</p>
   <a name="filter">&#8204;</a>
   <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 <h3>Filtering whitespace nodes</h3>

   <p>For each axis iterator and for <code>DOM.makeStringValue()</code> and
   <code>DOM.stringValueAux()</code> we must apply a filter for eliminating all
   unwanted whitespace nodes. To achive this we must build a very efficient
   predicate for determining if the current node should be stripped or not. This
   predicate is built by <code>Whitespace.compilePredicate()</code>. This method is
   static and builds a predicate for a vector of WhitespaceRule objects. (The
   WhitespaceRule class is defined within the Whitespace class.) Each
   WhitespaceRule object contains information for a single element listed
   in an <code>&lt;xsl:strip/preserve-space&gt;</code> element, and is broken down
   into the following information:</p>

   <ul>
     <li>the namespace (can be the default namespace)</li>
     <li>the element name or "<code>*</code>"</li>
     <li>the type of rule; NS:EL, NS:<code>*</code> or <code>*</code>
 </li>
     <li>the priority of the rule (based on import precedence and type)</li>
     <li>the action; either strip or preserver</li>
   </ul>

   <p>The Vector of WhitespaceRules is arranged in order of priority and
   redundant rules are removed. A predicate method is then compiled into the
   translet as:</p>

 <blockquote class="source">
 <pre>
     public boolean stripSpace(int node);
 </pre>
 </blockquote>

   <p>Unfortunately this method cannot be declared static.</p>

   <p>When the Stylesheet objectcompiles the <code>topLevel()</code> method of the
   translet it checks for the existance of the <code>stripSpace()</code> method. If
   this method exists the <code>topLevel()</code> will be compiled to pass the
   translet to the DOM as a StripWhitespaceFilter (the translet implements this
   interface when the <code>stripSpace()</code> method is compiled).</p>

   <p>All axis iterators and the <code>DOM.makeStringValue()</code> and
   <code>DOM.stringValueAux()</code> methods check for the existance of this filter
   (it is kept in a global variable in the DOM implementation class) and takes
   the appropriate actions. The methods in the DOM for returning axis iterators
   will place a StrippingIterator on top of the axis iterator if the filter is
   present, and the two methods just mentioned will return empty strings for
   whitespace nodes that should be stripped.</p>


 <p align="right" size="2">
 <a href="#content">(top)</a>
 </p>
 </div>
 <div id="footer">Copyright © 1999-2012 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Sun 03/18/2012</div>
 </div>
 </body>
 </html>
	<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	<html>
	<head>
	<title>ASF: <xsl:strip/preserve-space></title>
	<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
	<meta http-equiv="Content-Style-Type" content="text/css" />
	<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
	</head>
	<!--
	* Licensed to the Apache Software Foundation (ASF) under one
	* or more contributor license agreements. See the NOTICE file
	* distributed with this work for additional information
	* regarding copyright ownership. The ASF licenses this file
	* to you under the Apache License, Version 2.0 (the "License");
	* you may not use this file except in compliance with the License.
	* You may obtain a copy of the License at
	*
	* http://www.apache.org/licenses/LICENSE-2.0
	*
	* Unless required by applicable law or agreed to in writing, software
	* distributed under the License is distributed on an "AS IS" BASIS,
	* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	* See the License for the specific language governing permissions and
	* limitations under the License.
	-->
	<body>
	<div id="title">
	<table class="HdrTitle">
	<tbody>
	<tr>
	<th rowspan="2">
	<a href="../index.html">
	<img alt="Trademark Logo" src="resources/XalanJ-Logo-tm.png" width="190" height="90" />
	</a>
	</th>
	<th text-align="center" width="75%">
	<a href="index.html">XSLTC Design</a>
	</th>
	</tr>
	<tr>
	<td valign="middle"><xsl:strip/preserve-space></td>
	</tr>
	</tbody>
	</table>
	<table class="HdrButtons" align="center" border="1">
	<tbody>
	<tr>
	<td>
	<a href="http://www.apache.org">Apache Foundation</a>
	</td>
	<td>
	<a href="http://xalan.apache.org">Xalan Project</a>
	</td>
	<td>
	<a href="http://xerces.apache.org">Xerces Project</a>
	</td>
	<td>
	<a href="http://www.w3.org/TR">Web Consortium</a>
	</td>
	<td>
	<a href="http://www.oasis-open.org/standards">Oasis Open</a>
	</td>
	</tr>
	</tbody>
	</table>
	</div>
	<div id="navLeft">
	<ul>
	<li>
	<a href="index.html">Overview</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsltc_compiler.html">Compiler design</a>
	</li></ul><hr /><ul>
	<li>Whitespace<br />
	</li>
	<li>
	<a href="xsl_sort_design.html">xsl:sort</a>
	</li>
	<li>
	<a href="xsl_key_design.html">Keys</a>
	</li>
	<li>
	<a href="xsl_comment_design.html">Comment design</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsl_lang_design.html">lang()</a>
	</li>
	<li>
	<a href="xsl_unparsed_design.html">Unparsed entities</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsl_if_design.html">If design</a>
	</li>
	<li>
	<a href="xsl_choose_design.html">Choose\|When\|Otherwise design</a>
	</li>
	<li>
	<a href="xsl_include_design.html">Include\|Import design</a>
	</li>
	<li>
	<a href="xsl_variable_design.html">Variable\|Param design</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsltc_runtime.html">Runtime</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsltc_dom.html">Internal DOM</a>
	</li>
	<li>
	<a href="xsltc_namespace.html">Namespaces</a>
	</li></ul><hr /><ul>
	<li>
	<a href="xsltc_trax.html">Translet & TrAX</a>
	</li>
	<li>
	<a href="xsltc_predicates.html">XPath Predicates</a>
	</li>
	<li>
	<a href="xsltc_iterators.html">Xsltc Iterators</a>
	</li>
	<li>
	<a href="xsltc_native_api.html">Xsltc Native API</a>
	</li>
	<li>
	<a href="xsltc_trax_api.html">Xsltc TrAX API</a>
	</li>
	<li>
	<a href="xsltc_performance.html">Performance Hints</a>
	</li>
	</ul>
	</div>
	<div id="content">
	<h2><xsl:strip/preserve-space></h2>

	<ul>
	<li>
	<a href="#functionality">Functionality</a>
	</li>
	<li>
	<a href="#identify">Identifying strippable whitespace nodes</a>
	</li>
	<li>
	<a href="#which">Determining which nodes to strip</a>
	</li>
	<li>
	<a href="#strip">Stripping nodes</a>
	</li>
	<li>
	<a href="#filter">Filtering whitespace nodes</a>
	</li>
	</ul>

	<a name="functionality">‌</a>
	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	<h3>Functionality</h3>

	<p>The <code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
	elements are used to control the way whitespace nodes in the source XML
	document are handled. These elements have no impact on whitespace in the XSLT
	stylesheet. Both elements can occur only as top-level elements, possible more
	than once, and the elements are always empty</p>

	<p>Both elements take one attribute "elements" which contains a
	whitespace separated list of named nodes which should be or preserved
	stripped from the source document. These names can be on one of these three
	formats (NameTest format):</p>

	<ul>
	<li>
	All whitespace nodes:
	<code>elements="*"</code>
	</li>
	<li>
	All whitespace nodes with a namespace:
	<code>elements="<namespace>:*"</code>
	</li>
	<li>
	Specific whitespace nodes: <code>elements="<qname>"</code>
	</li>
	</ul>

	<a name="identify">‌</a>
	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	<h3>Identifying strippable whitespace nodes</h3>

	<p>The DOM detects all text nodes and assigns them the type <code>TEXT</code>.
	All text nodes are scanned to detect whitespace-only nodes. A text-node is
	considered a whitespace node only if it consist entirely of characters from
	the set { 0x09, 0x0a, 0x0d, 0x20 }. The DOM implementation class has a static
	method used to detect such nodes:</p>

	<blockquote class="source">
	<pre>
	private static final boolean isWhitespaceChar(char c) {
	return c == 0x20 \|\| c == 0x0A \|\| c == 0x0D \|\| c == 0x09;
	}
	</pre>
	</blockquote>

	<p>The characters are checked in probable order.</p>

	<p> The DOM has a bit-array that is used to tag text-nodes as strippable
	whitespace nodes:</p>

	<blockquote class="source">
	<pre>private int[] _whitespace;</pre>
	</blockquote>

	<p>There are two methods in the DOM implementation class for accessing this
	bit-array: <code>markWhitespace(node)</code> and <code>isWhitespace(node)</code>.
	The array is resized together with all other arrays in the DOM by the
	<code>DOM.resizeArrays()</code> method. The bits in the array are set in the
	<code>DOM.maybeCreateTextNode()</code> method. This method must know whether
	the current node is a located under an element with an
	<code>xml:space="<value>"</code> attribute in the DOM, in which
	case it is not a strippable whitespace node.</p>

	<p>An auxillary class, WhitespaceHandler, is used for this purpose. The class
	works in a way as a stack, where you "push" a new strip/preserve setting
	together with the node in which this setting was determined. This means that
	for every time the DOM builder encounters an <code>xml:space</code> attribute
	it will invoke a method on an instance of the WhitespaceHandler class to
	signal that a new preserve/strip setting has been encountered. This is done
	in the <code>makeAttributeNode()</code> method. The whitespace handler stores the
	new setting and pushes the current element node on its stack. When the
	DOM builder closes up an element (in <code>endElement()</code>), it invokes
	another method of the whitespace handler to check if the strip/preserve
	setting is still valid. If the setting is now invalid (we're closing the
	element whose node id is on the top of the stack) the handler inverts the
	setting and pops the element node id off the stack. The previous
	strip/preserve setting is then valid, and the id of node where this setting
	was defined is on the top of the stack.</p>

	<a name="which">‌</a>
	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	<h3>Determining which nodes to strip</h3>

	<p>A text node is never stripped unless it contains only whitespace
	characters (Unicode characters 0x09, 0x0A, 0x0D and 0x20). Stripping a text
	node means that the node disappears from the DOM; so that it is never
	included in the output and that it is ignored by all functions such as
	<code>count()</code>. A text node is preserved if any of the following apply:</p>

	<ul>
	<li>
	the element name of the parent of the text node is in the set of
	elements listed in <code><xsl:preserve-space></code>
	</li>
	<li>
	the text node contains at least one non-whitespace character
	</li>
	<li>
	an ancenstor of the whitespace text node has an attribute of
	<code>xsl:space="preserve"</code>, and no close ancestor has and
	attribute of <code>xsl:space="default"</code>.
	</li>
	</ul>

	<p>Otherwise, the text node is stripped. Initially the set of
	whitespace-preserving element names contains all element names, so the
	default behaviour is to preserve all whitespace text nodes.</p>

	<p>This seems simple enough, but resolving conflicts between matching
	<code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
	elements requires a lot of thought. Our first consideration is import
	precedence; the match with the highest import precedence is always chosen.
	Import precedence is determined by the order in which the compared elements
	are visited. (In this case those elements are the top-level
	<code><xsl:strip-space></code> and <code><xsl:preserve-space></code>
	elements.) This example is taken from the XSLT recommendation:</p>

	<ul>
	<li>stylesheet A imports stylesheets B and C in that order;</li>
	<li>stylesheet B imports stylesheet D;</li>
	<li>stylesheet C imports stylesheet E.</li>
	</ul>

	<p>Then the order of import precedence (lowest first) is D, B, E, C, A.</p>

	<p>Our next consideration is the priority of NameTests (XPath spec):</p>
	<ul>
	<li>
	<code>elements="<qname>"</code> has priority 0
	</li>
	<li>
	<code>elements="<namespace>:*"</code> has priority -0.25
	</li>
	<li>
	<code>elements="*"</code> has priority -0.5
	</li>
	</ul>

	<p>It is considered an error if the desicion is still ambiguous after this,
	and it is up to the implementors to decide what the apropriate action is.</p>

	<p>With all this complexity, the normal usage for these elements is quite
	smiple; either preserve all whitespace nodes but one type:</p>

	<blockquote class="source">
	<pre><xsl:strip-space elements="foo"/></pre>
	</blockquote>

	<p>or strip all whitespace nodes but one type:</p>

	<blockquote class="source">
	<pre>
	<xsl:strip-space elements="*"/>
	<xsl:preserve-space elements="foo"/></pre>
	</blockquote>

	<a name="strip">‌</a>
	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	<h3>Stripping nodes</h3>

	<p>The ultimate goal of our design would be to totally screen all stripped
	nodes from the translet; to either physically remove them from the DOM or to
	make it appear as if they are not there. The first approach will cause
	problems in cases where multiple translets access the same DOM. In the future
	we wish to let translets run within servlets / JSPs with a common DOM cache.
	This DOM cache will keep copies of DOMs in memory to prevent the same XML
	file from being downloaded and parsed several times. This is a scenarios we
	might see:</p>

	<p>
	<img src="DOMInterface.gif" alt="DOMInterface.gif" />
	</p>
	<p>
	<b>
	<i>Figure 1: Multiple translets accessing a common pool of DOMs</i>
	</b>
	</p>

	<p>The three translets running on this host access a common pool of 4 DOMs.
	The DOMs are accessed through a common DOM interface. Translets accessing
	a single DOM will have a DOMAdapter and a single DOMImpl object behind this
	interface, while translets accessing several DOMs will be given a MultiDOM
	and a set of DOMImpl objects.</p>

	<p>The translet to the left may want to strip some nodes from the shared DOM
	in the cache, while the other translets may want to preserve all whitespace
	nodes. Our initial thought then is to keep the DOM as it is and somehow
	screen the left-hand translet of all the whitespace nodes it does not want to
	process. There are a few ways in which we can accomplish this:</p>

	<ul>
	<li>
	The translet can, prior to starting to traverse the DOM, send a reference
	to the tables containing information on which nodes we want stripped to
	the DOM interface. The DOM interface is then responsible for hiding all
	stripped whitespace nodes from the iterators and the translet. A problem
	with this approach is that we want to omit the DOM interface layer if
	the translet is only accessing a single DOM. The DOM interface layer will
	only be instanciated by the translet if the stylesheet contained a call
	to the <code>document()</code> function.<br />
	<br />
	</li>
	<li>
	The translet can provide its iterators with information on which nodes it
	does not want to see. The translet is still shielded from unwanted
	whitespace nodes, but it has the hassle of passing extra information over
	to most iterators it ever instanciates. Note that all iterators do not
	need be aware of whitepspace nodes in this case. If you have a look at
	the figure again you will see that only the first level iterator (that is
	the one closest to the DOM or DOM interface) will have to strip off
	whitespace nodes. But, there may be several iterators that operate
	directly on the DOM ( invoked by code handling XSL functions such as
	<code>count()</code>) and every single one of those will need to be told
	which whitespace nodes the translet does not want to see.<br />
	<br />
	</li>
	<li>
	The third approach will take advantage of the fact that not all
	translets will want to strip whitespace nodes. The most effective way of
	removing unwanted whitespace nodes is to do it once and for all, before
	the actual traversal of the DOM starts. This can be done by making a
	clone of the DOM with exlusive-access rights for this translet only. We
	still gain performance from the cache because we do not have to pay the
	cost of the delay caused by downloading and parsing the XML source file.
	The cost we have to pay is the time needed for the actual cloning and the
	extra memory we use.<br />
	<br />
	Normally one would imagine the translet (or the wrapper class that
	invokes the translet) calls the DOM cache with just an URL and receives
	a reference to an instanciated DOM. The cache will either have built
	this DOM on-demand or just passed back a reference to an existing tree.
	In this case the DOM would need an extra call that a translet would use
	to clone a DOM, passing the existing DOM reference to the cache and
	recieving a new reference to the cloned DOM. The translet can then do
	whatever it wants with this DOM (the cache need not even keep a reference
	to this tree).
	</li>
	</ul>

	<p>We are lucky enough to be able to combine the first two approaches. All
	iterators that directly access the DOM (axis iterators) are instanciated by
	calls to the DOM interface layer (the DOM class). The actual iterators are
	created in the DOM implementation layer (the DOMImpl class). So, we can pass
	references to the preserve/strip whitespace tables to the DOM, and the DOM
	will make sure that all axis iterators return node sets with respect to these
	tables.</p>
	<a name="filter">‌</a>
	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	<h3>Filtering whitespace nodes</h3>

	<p>For each axis iterator and for <code>DOM.makeStringValue()</code> and
	<code>DOM.stringValueAux()</code> we must apply a filter for eliminating all
	unwanted whitespace nodes. To achive this we must build a very efficient
	predicate for determining if the current node should be stripped or not. This
	predicate is built by <code>Whitespace.compilePredicate()</code>. This method is
	static and builds a predicate for a vector of WhitespaceRule objects. (The
	WhitespaceRule class is defined within the Whitespace class.) Each
	WhitespaceRule object contains information for a single element listed
	in an <code><xsl:strip/preserve-space></code> element, and is broken down
	into the following information:</p>

	<ul>
	<li>the namespace (can be the default namespace)</li>
	<li>the element name or "<code>*</code>"</li>
	<li>the type of rule; NS:EL, NS:<code></code> or <code></code>
	</li>
	<li>the priority of the rule (based on import precedence and type)</li>
	<li>the action; either strip or preserver</li>
	</ul>

	<p>The Vector of WhitespaceRules is arranged in order of priority and
	redundant rules are removed. A predicate method is then compiled into the
	translet as:</p>

	<blockquote class="source">
	<pre>
	public boolean stripSpace(int node);
	</pre>
	</blockquote>

	<p>Unfortunately this method cannot be declared static.</p>

	<p>When the Stylesheet objectcompiles the <code>topLevel()</code> method of the
	translet it checks for the existance of the <code>stripSpace()</code> method. If
	this method exists the <code>topLevel()</code> will be compiled to pass the
	translet to the DOM as a StripWhitespaceFilter (the translet implements this
	interface when the <code>stripSpace()</code> method is compiled).</p>

	<p>All axis iterators and the <code>DOM.makeStringValue()</code> and
	<code>DOM.stringValueAux()</code> methods check for the existance of this filter
	(it is kept in a global variable in the DOM implementation class) and takes
	the appropriate actions. The methods in the DOM for returning axis iterators
	will place a StrippingIterator on top of the axis iterator if the filter is
	present, and the two methods just mentioned will return empty strings for
	whitespace nodes that should be stripped.</p>


	<p align="right" size="2">
	<a href="#content">(top)</a>
	</p>
	</div>
	<div id="footer">Copyright © 1999-2012 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Sun 03/18/2012</div>
	</div>
	</body>
	</html>