| <?xml version="1.0"?> |
| |
| <spec> |
| <title>Transformation API For XML (TrAX)</title> |
| <frontmatter> |
| <pubdate>November 12, 2000</pubdate> |
| <author><firstname>Scott</firstname> |
| <surname>Boag</surname> |
| <orgname>IBM Research</orgname> |
| <address> |
| <email>Scott_Boag@us.ibm.com</email> |
| </address> |
| </author></frontmatter> |
| <introduction> |
| <title>Introduction</title> |
| <para>This overview describes the set of APIs contained in |
| <ulink url="http://xml.apache.org/xalan-j/apidocs/javax/xml/transform/package-summary.html">javax.xml.transform</ulink>, <ulink url="http://xml.apache.org/xalan-j/apidocs/javax/xml/transform/stream/package-summary.html">javax.xml.transform.stream</ulink>, <ulink url="http://xml.apache.org/xalan-j/apidocs/javax/xml/transform/dom/package-summary.html">javax.xml.transform.dom</ulink>, and <ulink url="http://xml.apache.org/xalan-j/apidocs/javax/xml/transform/sax/package-summary.html">javax.xml.transform.sax</ulink>. For the sake of brevity, these interfaces are referred to |
| as TrAX (Transformation API for XML). </para> |
| <para>There is a broad need for Java applications to be able to transform XML |
| and related tree-shaped data structures. In fact, XML is not normally very |
| useful to an application without going through some sort of transformation, |
| unless the semantic structure is used directly as data. Almost all XML-related |
| applications need to perform transformations. Transformations may be described |
| by Java code, Perl code, <ulink url="http://www.w3.org/TR/xslt">XSLT</ulink> |
| Stylesheets, other types of script, or by proprietary formats. The inputs to a |
| transformation, one or several, may be a URL, an XML stream, a DOM tree, SAX |
| events, or a proprietary format or data structure. The output types are |
| largely the same as the input types, but different inputs may need to be |
| combined with different outputs.</para> |
| <para>The great challenge of a transformation API is how to deal with all the |
| possible combinations of inputs and outputs, without becoming specialized for |
| any of the given types.</para> |
| <para>The Java community will greatly benefit from a common API that will |
| allow them to understand and apply a single model, write to consistent |
| interfaces, and apply the transformations polymorphically. TrAX attempts to |
| define a model that is clean and generic, yet fills general application |
| requirements across a wide variety of uses. </para> |
| <terminology> |
| <title>General Terminology</title> |
| <para>This section will explain some general terminology used in this |
| document. Technical terminology will be explained in the Model section. In many |
| cases, the general terminology overlaps with the technical terminology.</para> |
| <variablelist> |
| <varlistentry> |
| <term>Tree</term> |
| <listitem>This term, as used within this document, describes an |
| abstract structure that consists of nodes or events that may be produced by |
| XML. A Tree physically may be a DOM tree, a series of well-balanced parse |
| events (such as those coming from a SAX2 ContentHandler), a series of requests |
| (the result of which can describe a tree), or a stream of marked-up |
| characters.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Source Tree(s)</term> |
| <listitem>One or more trees that are the inputs to the |
| transformation.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Result Tree(s)</term> |
| <listitem>One or more trees that are the output of the |
| transformation.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Transformation</term> |
| <listitem>The process of consuming a stream or tree to produce |
| another stream or tree.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Identity (or Copy) Transformation</term> |
| <listitem>The process of transformation from a source to a result, |
| making as few structural changes as possible and no informational changes. The |
| term is somewhat loosely used, as the process is really a copy from one |
| "format" (such as a DOM tree, stream, or set of SAX events) to |
| another.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Serialization</term> |
| <listitem>The process of taking a tree and turning it into a stream. In |
| some sense, a serialization is a specialized transformation.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Parsing</term> |
| <listitem>The process of taking a stream and turning it into a tree. In |
| some sense, parsing is a specialized transformation.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Transformer</term> |
| <listitem>A Transformer is the object that executes the transformation. |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Transformation instructions</term> |
| <listitem>Describes the transformation. A form of code, script, or |
| simply a declaration or series of declarations.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Stylesheet</term> |
| <listitem>The same as "transformation instructions," except it is |
| likely to be used in conjunction with <ulink |
| url="http://www.w3.org/TR/xslt">XSLT</ulink>.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Templates</term> |
| <listitem>Another form of "transformation instructions." In the TrAX |
| interface, this term is used to describe processed or compiled transformation |
| instructions. The Source flows through a Templates object to be formed into the |
| Result.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>Processor</term> |
| <listitem>A general term for the thing that may both process the |
| transformation instructions, and perform the transformation.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>DOM</term> |
| <listitem>Document Object Model, specifically referring to the |
| <termref link-url="http://www.w3.org/TR/DOM-Level-2">Document Object Model |
| (DOM) Level 2 Specification</termref>.</listitem> |
| </varlistentry> |
| <varlistentry> |
| <term>SAX</term> |
| <listitem>Simple API for XML, specifically referring to the |
| <termref link-url="http://www.megginson.com/SAX/SAX2">SAX 2.0 |
| release</termref>.</listitem> |
| </varlistentry> |
| </variablelist> |
| </terminology></introduction> |
| <requirements> |
| <title>Requirements</title> |
| <para>The following requirements have been determined from broad experience |
| with XML projects from the various members participating on the JCP.</para> |
| <orderedlist> |
| <listitem id="requirement-simple">TrAX must provide a clean, simple |
| interface for simple uses.</listitem> |
| <listitem id="requirement-general">TrAX must be powerful enough to be |
| applied to a wide range of uses, such as e-commerce, content management, |
| server content delivery, and client applications.</listitem> |
| <listitem id="requirement-optimizeable">A processor that implements a TrAX |
| interface must be optimizable. Performance is a critical issue for most |
| transformation use cases.</listitem> |
| <listitem id="requirement-compiled-model">As a specialization of the above |
| requirement, a TrAX processor must be able to support a compiled model, so that |
| a single set of transformation instructions can be compiled, optimized, and |
| applied to a large set of input sources.</listitem> |
| <listitem id="requirement-independence">TrAX must not be dependent on any |
| given type of transformation instructions. For instance, it must remain |
| independent of <ulink url="http://www.w3.org/TR/xslt">XSLT</ulink>.</listitem> |
| <listitem id="requirement-from-dom">TrAX must be able to allow processors |
| to transform DOM trees.</listitem> |
| <listitem id="requirement-to-dom">TrAX must be able to allow processors to |
| produce DOM trees.</listitem> |
| <listitem id="requirement-from-sax">TrAX must allow processors to transform |
| SAX events.</listitem> |
| <listitem id="requirement-to-sax">TrAX must allow processors to produce SAX |
| events.</listitem> |
| <listitem id="requirement-from-stream">TrAX must allow processors to |
| transform streams of XML.</listitem> |
| <listitem id="requirement-to-stream">TrAX must allow processors to produce |
| XML, HTML, and other types of streams.</listitem> |
| <listitem id="requirement-combo-input-output">TrAX must allow processors to |
| implement the various combinations of inputs and outputs within a single |
| processor.</listitem> |
| <listitem id="requirement-limited-input-output">TrAX must allow processors |
| to implement only a limited set of inputs. For instance, it should be possible |
| to write a processor that implements the TrAX interfaces and that only |
| processes DOM trees, not streams or SAX events.</listitem> |
| <listitem id="requirement-proprietary-data-structures">TrAX should allow a |
| processor to implement transformations of proprietary data structures. For |
| instance, it should be possible to implement a processor that provides TrAX |
| interfaces and performs transformation of JDOM trees.</listitem> |
| <listitem id="requirement-serialization-props">TrAX must allow the setting |
| of serialization properties, without constraint as to what the details of those |
| properties are.</listitem> |
| <listitem id="requirement-setting-parameters">TrAX must allow the setting |
| of parameters to the transformation instructions.</listitem> |
| <listitem id="requirement-namespaced-properties">TrAX must support the |
| setting of parameters and properties as XML Namespaced items (i.e., qualified |
| names).</listitem> |
| <listitem id="requirement-relative-url-resolution">TrAX must support URL |
| resolution from within the transformation, and have it return the needed data |
| structure.</listitem> |
| <listitem id="requirement-error-reporting">TrAX must have a mechanism for |
| reporting errors and warnings to the calling application.</listitem> |
| </orderedlist> </requirements> |
| <model> |
| <title>Model</title> |
| <para>This section defines the abstract model for TrAX, apart from the details |
| of the interfaces.</para> |
| <para>A TrAX <termref |
| link-url="pattern-TransformerFactory">TransformerFactory</termref> is an object |
| that processes transformation instructions, and produces |
| <termref link-url="pattern-Templates">Templates</termref> (in the technical |
| terminology). A <termref link-url="pattern-Templates">Templates</termref> |
| object provides a <termref |
| link-url="pattern-Transformer">Transformer</termref>, which transforms one or |
| more <termref link-url="pattern-Source">Source</termref>s into one or more |
| <termref link-url="pattern-Result">Result</termref>s.</para> |
| <para>To use the TrAX interface, you create a |
| <termref link-url="pattern-TransformerFactory">TransformerFactory</termref>, |
| which may directly provide a <termref |
| link-url="pattern-Transformer">Transformer</termref>, or which can provide |
| <termref link-url="pattern-Templates">Templates</termref> from a variety of |
| <termref link-url="pattern-Source">Source</termref>s. The |
| <termref link-url="pattern-Templates">Templates</termref> object is a processed |
| or compiled representation of the transformation instructions, and provides a |
| <termref link-url="pattern-Transformer">Transformer</termref>. The |
| <termref link-url="pattern-Transformer">Transformer</termref> processes a |
| <termref link-url="pattern-Source">Source</termref> according to the |
| instructions found in the <termref |
| link-url="pattern-Templates">Templates</termref>, and produces a |
| <termref link-url="pattern-Result">Result</termref>.</para> |
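| <para>The flow described above can be sketched in Java roughly as follows. This |
| is a minimal illustration rather than a normative example: the class name |
| TraxModelSketch, the transform helper, and the inline stylesheet are |
| hypothetical, and XSLT is assumed as the form of transformation instructions. |
| Only javax.xml.transform classes are used.</para> |
| <programlisting><![CDATA[ |
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxModelSketch {
    // Hypothetical transformation instructions (an XSLT stylesheet) that
    // turn a <greeting> element into a line of plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>"
      + "Hello, <xsl:value-of select='.'/>!"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        // 1. The TransformerFactory processes the transformation instructions.
        TransformerFactory factory = TransformerFactory.newInstance();

        // 2. The Templates object is the processed (compiled) form of them.
        Templates templates =
            factory.newTemplates(new StreamSource(new StringReader(STYLESHEET)));

        // 3. The Templates object provides a Transformer.
        Transformer transformer = templates.newTransformer();

        // 4. The Transformer turns a Source into a Result.
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<greeting>world</greeting>"));
    }
}
```
| ]]></programlisting> |
| <para>In this sketch the Templates object is built on every call for brevity; an |
| application that applies the same instructions to many sources would normally |
| build it once and reuse it.</para> |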
| <para>The process of transformation from a tree, either in the form of an |
| object model, or in the form of parse events, into a stream, is known as |
| <termref>serialization</termref>. We believe this is the most suitable term for |
| this process, despite the overlap with Java object serialization.</para> |
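| <para>As an illustration of serialization in this sense, the sketch below feeds |
| a DOM tree through a Transformer obtained directly from the factory (an identity |
| transformation) and writes it to a character stream. The class name |
| SerializeSketch, its helper methods, and the sample element names are |
| hypothetical; omitting the XML declaration is simply a choice made here to keep |
| the output short.</para> |
| <programlisting><![CDATA[ |
```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializeSketch {
    // Build a small DOM tree in memory (hypothetical sample data).
    static Document sampleTree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("order");
        root.setAttribute("id", "42");
        root.appendChild(doc.createTextNode("widget"));
        doc.appendChild(root);
        return doc;
    }

    // Serialize a DOM tree to a string of marked-up characters.
    static String serialize(Document doc) throws Exception {
        // A Transformer created without instructions performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        // Serialization properties are set on the Transformer.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleTree()));
    }
}
```
| ]]></programlisting> |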
| <patterns module="TRaX"> <pattern><pattern-name |
| id="pattern-Processor">Processor</pattern-name><intent>Generic concept for the |
| set of objects that implement the TrAX interfaces.</intent> |
| <responsibilities>Create compiled transformation instructions, transform |
| sources, and manage transformation parameters and |
| properties.</responsibilities><thread-safety>Only the Templates object can be |
| used concurrently in multiple threads. The rest of the processor does not do |
| synchronized blocking, and so may not be used to perform multiple concurrent |
| operations.</thread-safety></pattern><pattern> |
| <pattern-name id="pattern-TransformerFactory">TransformerFactory</pattern-name> |
| <intent>Serve as a vendor-neutral Processor interface for |
| <ulink url="http://www.w3.org/TR/xslt">XSLT</ulink> and similar |
| processors.</intent> <responsibilities>Serve as a factory for a concrete |
| implementation of a TransformerFactory, serve as a direct factory for |
| Transformer objects, serve as a factory for Templates objects, and manage |
| processor specific features.</responsibilities> <thread-safety>A |
| TransformerFactory may not perform multiple concurrent |
| operations.</thread-safety> </pattern> <pattern> |
| <pattern-name id="pattern-Templates">Templates</pattern-name> <intent>The |
| runtime representation of the transformation instructions.</intent> |
| <responsibilities>A data bag for transformation instructions; act as a factory |
| for Transformers.</responsibilities> <thread-safety>Threadsafe for concurrent |
| usage over multiple threads once construction is complete.</thread-safety> |
| </pattern> <pattern> <pattern-name |
| id="pattern-Transformer">Transformer</pattern-name> <intent>Act as a per-thread |
| execution context for transformations; act as an interface for performing the |
| transformation.</intent><responsibilities>Perform the |
| transformation.</responsibilities> <thread-safety>Only one instance per thread |
| is safe.</thread-safety> <notes>The Transformer is bound to the Templates |
| object that created it.</notes> </pattern> <pattern> |
| <pattern-name id="pattern-Source">Source</pattern-name> <intent>Serve as a |
| single vendor-neutral object for multiple types of input.</intent> |
| <responsibilities>Act as simple data holder for System IDs, DOM nodes, streams, |
| etc.</responsibilities> <thread-safety>Threadsafe concurrently over multiple |
| threads for read-only operations; must be synchronized for edit |
| operations.</thread-safety> </pattern><pattern> |
| <pattern-name id="pattern-Result">Result</pattern-name> |
| <potential-alternate-name>ResultTarget</potential-alternate-name> <intent>Serve |
| as a single object for multiple types of output, so there can be simple process |
| method signatures.</intent> <responsibilities>Act as simple data holder for |
| output stream, DOM node, ContentHandler, etc.</responsibilities> |
| <thread-safety>Threadsafe concurrently over multiple threads for read-only |
| operations; must be synchronized for edit operations.</thread-safety> </pattern> </patterns> |
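| <para>The thread-safety notes in the patterns above suggest one Templates object |
| shared across threads, with each thread obtaining its own Transformer. The |
| following is a sketch of that arrangement under stated assumptions: the class |
| name SharedTemplatesSketch, the transformAll helper, the stylesheet, the |
| "label" parameter, and the pool size of four are all hypothetical choices made |
| for illustration.</para> |
| <programlisting><![CDATA[ |
```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SharedTemplatesSketch {
    // Hypothetical stylesheet: prints "<label>:<item text>" as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='label'/>"
      + "<xsl:template match='/item'>"
      + "<xsl:value-of select='$label'/>:<xsl:value-of select='.'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> inputs) throws Exception {
        // Compile once on the calling thread; the factory itself is not shared.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(STYLESHEET)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : inputs) {
                Callable<String> task = () -> {
                    // Each task gets its own Transformer from the shared Templates.
                    Transformer transformer = templates.newTransformer();
                    transformer.setParameter("label", "job");
                    StringWriter out = new StringWriter();
                    transformer.transform(
                        new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
                    return out.toString();
                };
                pending.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            transformAll(Arrays.asList("<item>a</item>", "<item>b</item>")));
    }
}
```
| ]]></programlisting> |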
| </model> |
| </spec> |