| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML 5.0//EN" "http://docbook.org/xml/5.0/dtd/docbook.dtd"> |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| <book> |
| <info> |
| <title>Axiom User Guide</title> |
| <releaseinfo>&version; |
| </releaseinfo> |
| |
| <legalnotice> |
| <para> |
| Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See |
| the NOTICE file distributed with this work for additional information regarding copyright ownership. The |
| ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use |
| this file except in compliance with the License. You may obtain a copy of the License at |
| </para> |
| <para> |
| <link xlink:href="http://www.apache.org/licenses/LICENSE-2.0"/> |
| </para> |
| <para> |
| Unless required by applicable law or agreed to in writing, software distributed under the License is |
| distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
| implied. See the License for the specific language governing permissions and limitations under the |
| License. |
| </para> |
| </legalnotice> |
| </info> |
| |
| <toc/> |
| |
| <chapter> |
| <title>Introduction</title> |
| <section> |
| <title>What is Axiom?</title> |
| <para> |
| Axiom stands for <firstterm>Axis Object Model</firstterm> and refers to the XML infoset model |
| that is initially developed for Apache Axis2. XML infoset refers to the information included inside the |
| XML, and for programmatic manipulation it is convenient to have a representation of this XML infoset in |
| a language specific manner. For an object oriented language the obvious choice is a model made up of |
| objects. <link xlink:href="http://www.w3.org/DOM/">DOM</link> and <link xlink:href="http://www.jdom.org/">JDOM</link> |
| are two such XML models. Axiom is conceptually similar to such an XML model by its external behavior but |
| deep down it is very much different. The objective of this tutorial is to introduce the basics of Axiom and |
| explain the best practices to be followed while using Axiom. However, before diving in to the deep end of |
| Axiom it is better to skim the surface and see what it is all about! |
| </para> |
| </section> |
| <section> |
| <title>For whom is this Tutorial?</title> |
| <para> |
| This tutorial can be used by anyone who is interested in Axiom and needs to |
| gain a deeper knowledge about the model. However, it is assumed that the |
| reader has a basic understanding of the concepts of XML (such as |
| <link xlink:href="http://www.w3.org/TR/REC-xml-names/">Namespaces</link>) and a working |
| knowledge of tools such as <link xlink:href="http://ant.apache.org/">Ant</link>. |
| Knowledge in similar object models such as DOM will be quite helpful in |
| understanding Axiom, mainly to highlight the differences and similarities |
| between the two, but such knowledge is not assumed. Several links are listed |
| in <xref linkend="links"/> that will help understand the basics of |
| XML. |
| </para> |
| </section> |
| <section> |
| <title>What is Pull Parsing?</title> |
| <para> |
| Pull parsing is a recent trend in XML processing. The previously popular XML |
| processing frameworks such as |
| <link xlink:href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX</link> and |
| <link xlink:href="http://en.wikipedia.org/wiki/Document_Object_Model">DOM</link> were |
| "push-based" which means the control of the parsing was in the hands of the |
| parser itself. This approach is fine and easy to use, but it was not |
| efficient in handling large XML documents since a complete memory model will |
| be generated in the memory. Pull parsing inverts the control and hence the |
| parser only proceeds at the users command. The user can decide to store or |
| discard events generated from the parser. Axiom is based on pull parsing. To |
| learn more about XML pull parsing see the |
| <link xlink:href="http://www.bearcave.com/software/java/xml/xmlpull.html">XML pull |
| parsing introduction</link>. |
| </para> |
| </section> |
| <section> |
| <title>A Bit of History</title> |
| <para> |
| As mentioned earlier, Axiom was initially developed as part of Axis and simply |
| called <firstterm>OM</firstterm>. |
| The original OM was proposed as a store for the pull parser events for |
| later processing, at the Axis summit held in Colombo, Sri Lanka, in September |
| 2004. However, this approach was soon improved and OM was pursued as a |
| complete <link xlink:href="http://dret.net/glossary/xmlinfoset">XML infoset</link> model |
| due to its flexibility. Several implementation techniques were attempted |
| during the initial phases. The two most promising techniques were the table |
| based technique and the link list based technique. During the intermediate |
| performance tests the link list based technique proved to be much more memory |
| efficient for smaller and mid sized XML documents. The advantage of the table |
| based OM was only visible for the large and very large XML documents, and |
| hence, the link list based technique was chosen as the most suitable. Initial |
| efforts were focused on implementing the XML infoset (XML Information Set) |
| items which are relevant to the SOAP specification (DTD support, Processing |
| Instruction support, etc were not considered). The advantage of having a |
| tight integration was evident at this stage and this resulted in having SOAP |
| specific interfaces as part of OM rather than a layer on top of it. OM was |
| deliberately made |
| <link xlink:href="http://en.wikipedia.org/wiki/Application_programming_interface">API</link> |
| centric. It allows the implementations to take place independently and |
| swapped without affecting the program later. |
| </para> |
| </section> |
| <section> |
| <title>Features of Axiom</title> |
| <para> |
| Axiom is a lightweight XML infoset representation that supports deferred building |
| That means that the object model can be |
| manipulated as flexibly as any other object model (Such as |
| <link xlink:href="http://www.jdom.org/">JDOM</link>), but underneath, the objects will be |
| created only when they are absolutely required. This leads to much less |
| memory intensive programming. Following is a short feature overview of OM. |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| <emphasis role="bold">Lightweight</emphasis>: Axiom is specifically targeted to be |
| lightweight. This is achieved by reducing the depth of the hierarchy, |
| number of methods and the attributes enclosed in the objects. This makes |
| the objects less memory intensive. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <emphasis role="bold">Deferred building</emphasis>: By far this is the most important |
| feature of Axiom. The objects are not made unless a need arises for them. |
| This passes the control of building over to the object model itself |
| rather than an external builder. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <emphasis role="bold">Pull based</emphasis>: For a deferred building mechanism a pull |
| based parser is required. |
| </para> |
| </listitem> |
| </itemizedlist> |
| <para> |
| The Following image shows how Axiom API is viewed by the user |
| </para> |
| <figure> |
| <title>Architecture overview</title> |
| <mediaobject> |
| <imageobject> |
| <imagedata fileref="architecture.jpg" format="JPG"/> |
| </imageobject> |
| </mediaobject> |
| </figure> |
| </section> |
| <section> |
| <title>Relation with StAX</title> |
| <para> |
| <link xlink:href="http://today.java.net/pub/a/today/2006/07/20/introduction-to-stax.html">StAX</link> |
| (<link xlink:href="http://www.jcp.org/en/jsr/detail?id=173">JSR 173</link>) is the standard pull parser API for Java. |
| Axiom makes use of the StAX API to allow application code to access the object model in streaming mode, which means |
| that application code can request an <classname>XMLStreamReader</classname> for any document or element information item. |
| In addition, support for deferred building relies on the usage of a pull parser. The two standard implementations |
| of the Axiom API (LLOM and DOOM) both use the StAX API for that. To work with these implementations, a StAX compliant |
| parser <emphasis>must</emphasis> be present in the classpath. |
| </para> |
| </section> |
| <section> |
| <title>A Bit About Caching</title> |
| <para> |
| Since Axiom is a deferred built object model, It incorporates the concept of |
| caching. Caching refers to the creation of the objects while parsing the pull |
| stream. The reason why this is so important is because caching can be turned |
| off in certain situations. If so the parser proceeds without building the |
| object structure. User can extract the raw pull stream from Axiom and use that |
| instead of the object model. In this case it is sometimes beneficial to switch off |
| caching. <xref linkend="advanced"/> explains |
| more on accessing the raw pull stream and switching on and off the |
| caching. |
| </para> |
| </section> |
| <section> |
| <title>Where Does SOAP Come into Play?</title> |
| <para> |
| In a nutshell <link xlink:href="http://www.w3schools.com/SOAP/soap_intro.asp">SOAP</link> is an |
| information exchange protocol based on XML. SOAP has a defined set of XML |
| elements that should be used in messages. Since Axis2 is a "SOAP Engine" and |
| Axiom is built for Axis2, a set of SOAP specific objects were also defined along |
| with Axiom. These SOAP Objects are extensions of the general object model classes. |
| </para> |
| </section> |
| </chapter> |
| |
| <chapter> |
| <title>Working with Axiom</title> |
| <section> |
| <title>Obtaining the Axiom Binary</title> |
| <para> |
| There are several methods through which the Axiom binary can be obtained: |
| </para> |
| <orderedlist> |
| <listitem> |
| <para> |
| If your project uses Maven, then it is sufficient to add Axiom as a dependency, |
| as described in <xref linkend="using-maven2"/>. Releases are available from |
| the central repository, and snapshots are available from |
| <literal>http://repository.apache.org/snapshots/</literal>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| A prebuilt binary distribution can be |
| <link xlink:href="http://ws.apache.org/axiom/download.cgi">downloaded</link> |
| from the site. Source distributions are also available. They can be built |
| using Maven 2, by executing <command>mvn install</command> in the root |
| directory of the distribution. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| It is also possible to check out the source code for the current development |
| version (trunk) or previous releases from the Subversion repository and build it |
| using Maven 2. Detailed information on getting the source code |
| from the Subversion repository is found |
| <link xlink:href="http://ws.apache.org/axiom/source-repository.html">here</link>. |
| </para> |
| </listitem> |
| </orderedlist> |
| <para> |
| Once the Axiom binary is obtained by any of the above ways, it should be |
| included in the classpath for any of the Axiom based programs to work. |
| Subsequent sections of this guide assume that this build step is complete |
| and <filename>axiom-api-&version;.jar</filename> and <filename>axiom-impl-&version;.jar</filename> are |
| present in the classpath along with the StAX API jar file and a StAX |
| implementation. |
| </para> |
| </section> |
| <section> |
| <title>Creating an object model programmatically</title> |
| <para> |
| An object model instance can be created programmatically by instantiating the objects |
| representing the individual nodes of the document and then assembling them into a |
| tree structure. Axiom defines a set of interfaces representing the different node types. |
| E.g. <classname>OMElement</classname> represents an element, while <classname>OMText</classname> |
| represents character data that appears inside an element. |
| Axiom requires that all node instances are created using a factory. |
| The reason for this is to cater for different implementations of the Axiom API, |
| as shown in <xref linkend="fig_api"/>. |
| </para> |
| <figure xml:id="fig_api"> |
| <title>The Axiom API with different implementations</title> |
| <mediaobject> |
| <imageobject> |
| <imagedata fileref="api.jpg" format="JPG"/> |
| </imageobject> |
| </mediaobject> |
| </figure> |
| <para> |
| Two implementations are currently shipped with Axiom: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| The Linked List implementation (LLOM). This is the standard implementation. As the |
| name implies, it uses linked lists to store collections of nodes. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| DOOM (DOM over OM), which adds support for the DOM API. |
| </para> |
| </listitem> |
| </itemizedlist> |
| <para> |
| For each implementation, there are actually three factories: one for plain XML, and the other |
| ones for the two SOAP versions. The factories for the default implementation can be obtained |
| by calling the appropriate static methods in <classname>OMAbstractFactory</classname>. |
| E.g. <methodname>OMAbstractFactory.getOMFactory()</methodname> will return the proper |
| factory for plain XML. <xref linkend="list2"/> shows how this factory is used to create |
| several <classname>OMElement</classname> instances. |
| </para> |
| <example xml:id="list2"> |
| <title>Creating an object model programmatically</title> |
| <programlisting>//create a factory |
| OMFactory factory = OMAbstractFactory.getOMFactory(); |
| //use the factory to create two namespace objects |
| OMNamespace ns1 = factory.createOMNamespace("bar","x"); |
| OMNamespace ns2 = factory.createOMNamespace("bar1","y"); |
| //use the factory to create three elements |
| OMElement root = factory.createOMElement("root",ns1); |
| OMElement elt11 = factory.createOMElement("foo1",ns1); |
| OMElement elt12 = factory.createOMElement("foo2",ns1);</programlisting> |
| </example> |
| <para> |
| The Axiom API defines several methods to assemble individual objects into a tree |
| structure. The most prominent ones are the following two methods available on |
| <classname>OMElement</classname> instances: |
| </para> |
| <programlisting>public void addChild(OMNode omNode); |
| public void addAttribute(OMAttribute attr);</programlisting> |
| <para> |
| <methodname>addChild</methodname> will always add the child as the last child of the parent. |
| <xref linkend="ex-addChild"/> shows how this method is used to assemble the three elements |
| created in <xref linkend="list2"/> into a tree structure. |
| </para> |
| <example xml:id="ex-addChild"> |
| <title>Usage of <methodname>addChild</methodname></title> |
| <programlisting>//set the children |
| elt11.addChild(elt21); |
| elt12.addChild(elt22); |
| root.addChild(elt11); |
| root.addChild(elt12);</programlisting> |
| </example> |
| <para> |
| A given node can be removed from the tree by calling the <methodname>detach()</methodname> |
| method. A node can also be removed from the tree by calling the remove |
| method of the returned iterator which will also call the detach method of |
| the particular node internally. |
| </para> |
| </section> |
| <section> |
| <title>Creating an object model by parsing an XML document</title> |
| <para> |
| Creating an object model from an existing document involves a second concept, namely |
| that of a <firstterm>builder</firstterm>. The responsibility of the builder is to |
| instantiate nodes corresponding to the information items in the document being parsed. |
| Note that as for programmatically created object models, this still involves the |
| factory, but it is now the builder that will call the <methodname>createXxx</methodname> |
| methods of the factory. |
| </para> |
| <para> |
| There are different types of builders, corresponding to different types of |
| input documents, namely: plain XML, SOAP, XOP and MTOM. The appropriate type of |
| builder should be created using the corresponding static method in |
| <classname>OMXMLBuilderFactory</classname>. <xref linkend="list1"/> shows the |
| correct method of creating an object model for a plain XML document from an input stream. |
| </para> |
| <note> |
| <para> |
| As explained in <xref linkend="OMXMLBuilderFactory"/>, this is the recommended way |
| of creating a builder starting with Axiom 1.2.11. In previous versions, this was done |
| by instantiating <classname>StAXOMBuilder</classname> or one of its subclasses directly. |
| This approach is still supported as well. |
| </para> |
| </note> |
| <example xml:id="list1"> |
| <title>Creating an object model from an input stream</title> |
| <programlisting>//create the input stream |
| InputStream in = new FileInputStream(file); |
| |
| //create the builder |
| OMXMLParserWrapper builder = OMXMLBuilderFactory.createOMBuilder(in); |
| |
| //get the root element |
| OMElement documentElement = builder.getDocumentElement();</programlisting> |
| </example> |
| <para> |
| Several differences exist between |
| a programmatically created <classname>OMNode</classname> and <classname>OMNode</classname> instances created by a builder. The most |
| important difference is that the former will have no builder object enclosed, |
| where as the latter always carries a reference to its builder. |
| </para> |
| <para> |
| As stated earlier, since the object model is built as and |
| when required, each and every <classname>OMNode</classname> should have a reference to its builder. |
| If this information is not available, it is due to the object being created |
| without a builder. This difference becomes evident when the user tries to get |
| a non caching pull parser from the <classname>OMElement</classname>. This will be discussed in more |
| detail in <xref linkend="advanced"/>. |
| </para> |
| <para> |
| In order to understand the requirement of the builder reference in each |
| and every <classname>OMNode</classname>, consider the following scenario. Assume that the parent |
| element is built but the children elements are not. If the parent is asked to |
| iterate through its children, this information is not readily available to |
| the parent element and it should build its children first before attempting |
| to iterate them. In order to provide a reference of the builder, each and |
| every node of the object model should carry the reference to its builder. Each |
| and every <classname>OMNode</classname> carries a flag that states its build status. Apart from this |
| restriction there are no other constraints that keep the programmer away from |
| mixing up programmatically made <classname>OMNode</classname> objects with <classname>OMNode</classname> objects built from |
| builders. |
| </para> |
| <para> |
| The SOAP object hierarchy is made in the most natural way for a |
| programmer. An inspection of the API will show that it is quite close to the |
| SAAJ API but with no bindings to DOM or any other model. The SOAP classes |
| extend basic Axiom classes (such as the <classname>OMElement</classname>) hence, one can access a SOAP |
| document either with the abstraction of SOAP or drill down to the underlying |
| XML Object model with a simple casting. |
| </para> |
| </section> |
| <section> |
| <title>Namespaces</title> |
| <para> |
| Namespaces are a tricky part of any XML object model and is the same in |
| Axiom. However, the interface to the namespace have been made very simple. |
| <classname>OMNamespace</classname> is the class that represents a namespace with intentionally |
| removed setter methods. This makes the <classname>OMNamespace</classname> immutable and allows |
| the underlying implementation to share the objects without any |
| difficulty. |
| </para> |
| <para> |
| Following are the important methods available in <classname>OMElement</classname> to handle |
| namespaces. |
| </para> |
| <programlisting>public OMNamespace declareNamespace(String uri, String prefix); |
| public OMNamespace declareNamespace(OMNamespace namespace); |
| public OMNamespace findNamespace(String uri, String prefix);</programlisting> |
| <para> |
| The <methodname>declareNamespaceXX</methodname> methods are fairly straightforward. Add a namespace |
| to namespace declarations section. Note that a namespace declaration that has |
| already being added will not be added twice. <methodname>findNamespace</methodname> is a very handy |
| method to locate a namespace object higher up the object tree. It searches |
| for a matching namespace in its own declarations section and jumps to the |
| parent if it's not found. The search progresses up the tree until a matching |
| namespace is found or the root has been reached. |
| </para> |
| <para> |
| During the serialization a directly created namespace from the factory |
| will only be added to the declarations when that prefix is encountered by the |
| serializer. More of the serialization matters will be discussed in |
| <xref linkend="serializer"/>. |
| </para> |
| <para> |
| The following simple code segment shows how the namespaces are dealt in OM |
| </para> |
| <example xml:id="list6"> |
| <title>Creating an OM document with namespaces</title> |
| <programlisting>OMFactory factory = OMAbstractFactory.getOMFactory(); |
| OMNamespace ns1 = factory.createOMNamespace("bar","x"); |
| OMElement root = factory.createOMElement("root",ns1); |
| OMNamespace ns2 = root.declareNamespace("bar1","y"); |
| OMElement elt1 = factory.createOMElement("foo",ns1); |
| OMElement elt2 = factory.createOMElement("yuck",ns2); |
| OMText txt1 = factory.createOMText(elt2,"blah"); |
| elt2.addChild(txt1); |
| elt1.addChild(elt2); |
| root.addChild(elt1);</programlisting> |
| </example> |
| <para> |
| Serialization of the root element produces the following XML: |
| </para> |
| <programlisting><?db-font-size 80%?><x:root xmlns:x="bar" xmlns:y="bar1"><x:foo><y:yuck>blah</y:yuck></x:foo></x:root></programlisting> |
| </section> |
| <section> |
| <title>Traversing</title> |
| <para> |
| Traversing the object structure can be done in the usual way by using the |
| list of children. Note however, that the child nodes are returned as an |
| iterator. The Iterator supports the 'Axiom way' of accessing elements and is |
| more convenient than a list for sequential access. The following code sample |
| shows how the children can be accessed. The children are of the type <classname>OMNode</classname> |
| that can either be <classname>OMText</classname> or <classname>OMElement</classname>. |
| </para> |
| <programlisting>Iterator children = root.getChildren(); |
| while(children.hasNext()){ |
| OMNode node = (OMNode)children.next(); |
| }</programlisting> |
| <para> |
| Apart from this, every <classname>OMNode</classname> has links to its siblings. If more thorough |
| navigation is needed the <methodname>getNextOMSibling()</methodname> |
| and <methodname>getPreviousOMSibling()</methodname> methods can be |
| used. A more selective set can be chosen by using the |
| <methodname>getChildrenWithName(QName)</methodname> methods. |
| The <methodname>getChildWithName(Qname)</methodname> method |
| returns the first child that matches the given <classname>QName</classname> and |
| <methodname>getChildrenWithName(QName)</methodname> returns a collection containing all the matching |
| children. The advantage of these iterators is that they won't build the whole |
| object structure at once, until its required. |
| </para> |
| <important> |
| <para> |
| As explained in <xref linkend="iterator-changes"/>, in Axiom 1.2.10 and earlier, |
| all iterator implementations internally stayed one |
| step ahead of their apparent location. This could have the side effect of building elements |
| that are not intended to be built at all. |
| </para> |
| </important> |
| </section> |
| <section xml:id="serializer"> |
| <title>Serializer</title> |
| <para> |
| An Axiom tree can be serialized either as the pure object model or the pull event |
| stream. The serialization uses a <classname>XMLStreamWriter</classname> object to write out the |
| output and hence, the same serialization mechanism can be used to write |
| different types of outputs (such as text, binary, etc.). |
| </para> |
| <para> |
| A caching flag is provided by Axiom to control the building of the in-memory |
| object model. The <classname>OMNode</classname> has two methods, |
| <methodname>serializeAndConsume</methodname> and <methodname>serialize</methodname>. When |
| <methodname>serializeAndConsume</methodname> is called the cache flag is reset and the serializer does |
| not cache the stream. Hence, the object model will not be built if the cache |
| flag is not set. |
| </para> |
| <para> |
| The serializer serializes namespaces in the following way: |
| </para> |
| <orderedlist> |
| <listitem> |
| <para> |
| When a namespace that is in the scope but not yet declared is |
| encountered, it will then be declared. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| When a namespace that is in scope and already declared is encountered, |
| the existing declarations prefix is used. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| When the namespaces are declared explicitly using the elements |
| <methodname>declareNamespace()</methodname> method, they will be serialized even if those |
| namespaces are not used in that scope. |
| </para> |
| </listitem> |
| </orderedlist> |
| <para> |
| Because of this behavior, if a fragment of the XML is serialized, it will |
| also be <emphasis>namespace qualified</emphasis> with the necessary namespace |
| declarations. |
| </para> |
| <para> |
| Here is an example that shows how to write the output to the console, with |
| reference to the earlier code sample- <xref linkend="list1"/> |
| that created a SOAP envelope. |
| </para> |
| <programlisting>XMLStreamWriter writer = |
| XMLOutputFactory.newInstance().createXMLStreamWriter(System.out); |
| //dump the output to console with caching |
| envelope.serialize(writer); |
| writer.flush();</programlisting> |
| <para> |
| or simply |
| </para> |
| <programlisting>System.out.println(root.toStringWithConsume());</programlisting> |
| <para> |
| The above mentioned features of the serializer forces a correct |
| serialization even if only a part of the Axiom tree is serialized. The following |
| serializations show how the serialization mechanism takes the trouble to |
| accurately figure out the namespaces. The example is from <xref linkend="list6"/> |
| which creates a small object model programmatically. |
| Serialization of the root element produces the following: |
| </para> |
| <programlisting><?db-font-size 80%?><x:root xmlns:x="bar" xmlns:y="bar1"><x:foo><y:yuck>blah</y:yuck></x:foo></x:root></programlisting> |
| <para> |
| However, serialization of only the foo element produces the following: |
| </para> |
| <programlisting><?db-font-size 80%?><x:foo xmlns:x="bar"><y:yuck xmlns:y="bar1">blah</y:yuck></x:foo></programlisting> |
| <para> |
| Note how the serializer puts the relevant namespace declarations in place. |
| </para> |
| </section> |
| <section> |
| <title>Complete Code for the Axiom based Document Building and Serialization</title> |
| <para> |
| The following code segment shows how to use Axiom for completely building |
| a document and then serializing it into text pushing the output to the |
| console. Only the important sections are shown here. The complete program |
| listing can be found in <xref linkend="appendix"/>. |
| </para> |
| <programlisting>//create the input stream |
| InputStream in = new FileInputStream(file); |
| |
| //create the builder |
| OMXMLParserWrapper builder = OMXMLBuilderFactory.createOMBuilder(in); |
| |
| //get the root element |
| OMElement documentElement = builder.getDocumentElement(); |
| |
| //dump the out put to console with caching |
| System.out.println(documentElement.toStringWithConsume());</programlisting> |
| </section> |
| <section xml:id="StAXUtils"> |
| <title>Creating stream readers and writers using <classname>StAXUtils</classname></title> |
| <para> |
| The normal way to create <classname>XMLStreamReader</classname> and |
| <classname>XMLStreamWriter</classname> instances is to first request a |
| <classname>XMLInputFactory</classname> or <classname>XMLOutputFactory</classname> |
| instance from the StAX API and then use the factory methods to create the |
| reader or writer. |
| </para> |
| <para> |
| Doing this every time a reader or writer is created is cumbersome and also |
| introduces some overhead because on every invocation the <methodname>newInstance</methodname> |
| methods in <classname>XMLInputFactory</classname> and <classname>XMLOutputFactory</classname> |
| go through the process of looking up the StAX implementation to use and creating |
| a new instance of the factory. The only case where this is really needed is when |
| it is necessary to configure the factory in a special way (by setting properties on it). |
| </para> |
| <para> |
| Axiom has a utility class called <classname>StAXUtils</classname> that provides |
| methods to easily create readers and writers configured with default settings. |
| It also keeps the created factories in a cache to improve performance. The caching |
| occurs by (context) class loader and it is therefore safe to use <classname>StAXUtils</classname> |
| in a runtime environment with a complex class loader hierarchy. |
| </para> |
| <caution> |
| <para> |
| Axiom 1.2.8 implicitly assumed that <classname>XMLInputFactory</classname> and |
| <classname>XMLOutputFactory</classname> instances are thread safe. This is the case |
| for Woodstox (which is the default StAX implementation used by Axiom), but not |
| e.g. for the StAX implementation shipped with Sun's Java 6 runtime environment. |
| Therefore, when using Axiom versions prior to 1.2.9, you should avoid using <classname>StAXUtils</classname> |
| together with a StAX implementation other than Woodstox, especially in a highly |
| concurrent environment. The issue has been fixed in Axiom 1.2.9. See |
| <link xlink:href="https://issues.apache.org/jira/browse/AXIOM-74">AXIOM-74</link> |
| for more details. |
| </para> |
| </caution> |
| <para> |
| <classname>StAXUtils</classname> also enables a property file based configuration |
| mechanism to change the default factory settings at assembly or deployment time of |
| the application using Axiom. This is described in more details in |
| <xref linkend="factory.properties"/>. |
| </para> |
| <important> |
| <para> |
| The <methodname>getInputFactory</methodname> and <methodname>getOutputFactory</methodname> |
| methods in <classname>StAXUtils</classname> give access to the cached factories. |
| In versions prior to 1.2.9, Axiom didn't restrict access to the <methodname>setProperty</methodname> method |
| of these factories. In principle this makes it possible to change the configuration |
| of these factories for the whole application. However, since this depends on the |
| implementation details of <classname>StAXUtils</classname> (e.g. how factories |
| are cached) and since there is a proper configuration mechanism for that purpose, |
| using this possibility is strongly discouraged. Starting with version 1.2.9, Axiom |
| restricts access to <methodname>setProperty</methodname> to prevent tampering with |
| the cached factories. |
| </para> |
| </important> |
| <para> |
| The methods in <classname>StAXUtils</classname> to create readers and writers |
| are rather self-explaining. For example to create an <classname>XMLStreamReader</classname> |
| from an <classname>InputStream</classname>, use the following code: |
| </para> |
| <programlisting>InputStream in = ... |
| XMLStreamReader reader = StAXUtils.createXMLStreamReader(in);</programlisting> |
| </section> |
| <section> |
| <title>Releasing the parser</title> |
| <para> |
| As we have seen previously, when creating an object model from a stream, all nodes keep a |
| reference to the builder and thus to the underlying parser. Since an XML parser instance is |
| a heavyweight object, it is important to release it as soon as it is no longer required. |
| The <methodname>close</methodname> method defined by the <classname>OMSerializable</classname> |
| interface it used for that. Note that it doesn't matter an which node this method is |
| called; it will always close and release the parser for the whole tree. The |
| <varname>build</varname> parameter of the <methodname>close</methodname> method specifies |
| if the node should be built before closing the parser. |
| </para> |
| <para> |
| To illustrate this, consider <xref linkend="list1"/>. After finishing the processing of the |
| object model and assuming that it will not access the object model afterwards, the code should |
| be completed by the following instruction: |
| </para> |
| <programlisting>documentElement.close(false);</programlisting> |
| <para> |
| Closing the parser is especially important in applications that process large numbers of |
| XML documents. In addition, some StAX implementation are able to <quote>recycle</quote> |
| parsers, i.e. to reset a parser instance and to reuse it on another input stream. However, this |
| can only work if the parser has been closed explicitly or if the instance has been marked for |
| finalization by the Java VM. Closing the parser explicitly as shown above will reduce the |
| memory footprint of the application if this type of parser is used. |
| </para> |
| </section> |
| <section> |
| <title>Exception handling</title> |
| <para> |
| The fact that Axiom uses deferred building means that a call to a method in one |
| of the object model classes may cause Axiom to read events from the underlying |
| StAX parser, unless the node has already been built or if it was created |
| programmatically. If an I/O error occurs or if the XML document being read is |
| not well formed, an exception will be reported by the parser. This exception is |
| propagated to the user code as an <classname>OMException</classname>. |
| </para> |
| <para> |
| Note that <classname>OMException</classname> is an unchecked exception. |
| Strictly speaking this is in violation of the principle that unchecked exceptions |
| should be reserved for problems resulting from programming problems. |
| There are however several compelling reasons to use unchecked exceptions in this |
| case: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| The same API is used to work with programmatically created object models |
| and with object models created from an XML document. On a programmatically |
| created object model, an <classname>OMException</classname> in general |
| indicates a programming problem. Moreover one of the design goals of Axiom |
| is to give the user code the illusion that it is interacting with a complete |
| in-memory representation of an XML document, even if behind the scenes |
| Axiom will only create the objects on demand. Using checked exceptions |
| would break that abstraction. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| In most cases, code interacting with the object model will not be able |
| to recover from an <classname>OMException</classname>. Consider for example |
| a utility method that receives an <classname>OMElement</classname> as input |
| and that is supposed to extract some data from this information item. |
| When a parsing error occurs while iterating over the children of that |
| element, there is nothing the utility method could do to recover from this |
| error. |
| </para> |
| <para> |
| The only place where it makes sense to catch this type of exception and to |
| attempt to recover from it is in the code that creates the |
| <classname>XMLStreamReader</classname> and builder. It is clear that |
| it would not be reasonable to force developers to declare a checked exception |
| on every method that interacts with an Axiom object model only to allow |
| propagation of that exception to the code that initially created the parser. |
| </para> |
| </listitem> |
| </itemizedlist> |
| <para> |
| The situation is actually quite similar to that encountered in three-tier |
| applications, where the DAO layer in general wraps checked exceptions from |
| the database in an unchecked exception because the business logic and the |
| presentation tier will not be able to recover from these errors. |
| </para> |
| <para> |
| When catching an <classname>OMException</classname> special attention should |
| be paid if the code handling the exception again tries to access the object model. |
| Indeed this will inevitably result in another exception being triggered, unless the |
| code only accesses those parts of the tree that have been built successfully. |
| E.g. the following code will give unexpected results because the call to |
| <methodname>serializeAndConsume</methodname> will almost certainly trigger another |
| exception: |
| </para> |
| <programlisting>OMElement element = ... |
| try { |
| ... |
| } catch (OMException ex) { |
| ex.printStackTrace(); |
| element.serializeAndConsume(System.out); |
| }</programlisting> |
| <caution> |
| <para> |
| In Axiom versions prior to 1.2.8, an attempt to access the object model after |
| an exception has been reported by the underlying parser may result in an |
| <classname>OutOfMemoryError</classname> or cause Axiom to lock itself up in |
| an infinite loop. The reason for this is that in some cases, after throwing an |
| exception, the Woodstox parser (which is the default StAX implementation used |
| by Axiom) is left in an inconsistent state in which it will return an infinite |
| sequence of events. Starting with Axiom 1.2.8, the object model builder |
| will never attempt to read new events from a parser that has previously reported |
| an I/O or parsing error. These versions of Axiom are therefore safe; see |
| <link xlink:href="https://issues.apache.org/jira/browse/AXIOM-34">AXIOM-34</link> |
| for more details. |
| </para> |
| </caution> |
| <note> |
| <para> |
| The discussion in this section suggests that Axiom should make a |
| clear distinction between exceptions caused by parser errors and |
| exceptions caused by programming problems or other errors, e.g. |
| by using distinct subclasses of <classname>OMException</classname>. |
| This is currently not the case. This issue may be addressed in a future |
| version of Axiom. |
| </para> |
| </note> |
| </section> |
| </chapter> |
| |
| <chapter xml:id="advanced"> |
| <title>Advanced Operations with Axiom</title> |
| <section> |
| <title>Accessing the Pull Parser</title> |
| <para> |
| Axiom is tightly integrated with StAX and the |
| <methodname>getXMLStreamReader()</methodname> and <methodname>getXMLStreamReaderWithoutCaching()</methodname> methods in the |
| <classname>OMElement</classname> provides a <classname>XMLStreamReader</classname> |
| object. This <classname>XMLStreamReader</classname> instance |
| has a special capability of switching between the underlying stream and the |
| Axiom object tree if the cache setting is off. However, this functionality is |
| completely transparent to the user. This is further explained in the |
| following paragraphs. |
| </para> |
| <para> |
| Axiom has the concept of caching, and the Axiom tree is the actual cache of the events |
| fired. However, the requester can choose to get the pull events from the |
| underlying stream rather than the Axiom tree. This can be achieved by getting |
| the pull parser with the cache off. If the pull parser was obtained without |
| switching off cache, the new events fired will be cached and the tree |
| updated. This returned pull parser will switch between the object structure |
| and the stream underneath, and the users need not worry about the differences |
| caused by the switching. The exact pull stream the original document would |
| have provided would be produced even if the Axiom tree was fully or partially |
| built. The <methodname>getXMLStreamReaderWithoutCaching()</methodname> method is very useful when the |
| events need to be handled in a pull based manner without any intermediate |
| models. This makes such operations faster and efficient. |
| </para> |
| <important> |
| <para> |
| For consistency reasons once the cache is |
| switched off it cannot be switched on again. |
| </para> |
| </important> |
| </section> |
| </chapter> |
| |
| <chapter> |
| <title>Integrating Axiom into your project</title> |
| <section xml:id="using-maven2"> |
| <title>Using Axiom in a Maven 2 project</title> |
| <para> |
| If your project uses Maven 2, it is fairly easy to add Axiom to your project. |
| Simply add the following entries to the <tag class="element">dependencies</tag> |
| section of <filename>pom.xml</filename>: |
| </para> |
| <programlisting><![CDATA[<dependency> |
| <groupId>org.apache.ws.commons.axiom</groupId> |
| <artifactId>axiom-api</artifactId> |
| <version>]]>&version;<![CDATA[</version> |
| </dependency> |
| <dependency> |
| <groupId>org.apache.ws.commons.axiom</groupId> |
| <artifactId>axiom-impl</artifactId> |
| <version>]]>&version;<![CDATA[</version> |
| <scope>runtime</scope> |
| </dependency>]]></programlisting> |
| <para> |
| All Axiom releases are deployed to the Maven central repository and there is no need |
| to add an entry to the <tag class="element">repositories</tag> section. |
| However, if you want to work with the development (snapshot) version of Axiom, it |
| is necessary to add the Apache Snapshot Repository: |
| </para> |
| <programlisting><![CDATA[<repository> |
| <id>apache.snapshots</id> |
| <name>Apache Snapshot Repository</name> |
| <url>http://repository.apache.org/snapshots/</url> |
| <releases> |
| <enabled>false</enabled> |
| </releases> |
| </repository>]]></programlisting> |
| <tip> |
| <para> |
| If you are working on another Apache project, you don't need to add the snapshot repository |
| in the POM file since it is already declared in the <literal>org.apache:apache</literal> |
| parent POM. |
| </para> |
| </tip> |
| </section> |
| <section> |
| <title>Applying application wide configuration</title> |
| <para> |
| Sometimes it is necessary to customize some particular aspects of Axiom for an entire |
| application. There are several things that can be configured through system properties |
| and/or property files. This is also important when using third party applications or |
| libraries that depend on Axiom. |
| </para> |
| <section xml:id="factory.properties"> |
| <title>Changing the default StAX factory settings</title> |
| <note> |
| <para> |
| The information in this section only applies to |
| <classname>XMLStreamReader</classname> or <classname>XMLStreamWriter</classname> |
| instances created using <classname>StAXUtils</classname> |
| (see <xref linkend="StAXUtils"/>). Readers and writers created using the |
| standard StAX APIs will keep their default settings as defined by the |
| implementation (or dictated by the StAX specifications). |
| </para> |
| </note> |
| <note> |
| <para> |
| The feature described in this section was introduced in Axiom 1.2.9. |
| </para> |
| </note> |
| <para> |
| When creating a new <classname>XMLInputFactory</classname> (resp. |
| <classname>XMLInputFactory</classname>), <classname>StAXUtils</classname> |
| looks for a property file named <filename>XMLInputFactory.properties</filename> |
| (resp. <filename>XMLOutputFactory.properties</filename>) in the classpath, |
| using the same class loader as the one from which the factory is loaded |
| (by default this is the context classloader). |
| If a corresponding resource is found, the properties in that file |
| are applied to the factory using the <methodname>XMLInputFactory#setProperty</methodname> |
| (resp. <methodname>XMLOutputFactory#setProperty</methodname>) method. |
| </para> |
| <para> |
| This feature can be used to set factory properties of type <classname>Boolean</classname>, |
| <classname>Integer</classname> and <classname>String</classname>. The following |
| sections present some sample use cases. |
| </para> |
| <section> |
| <title>Changing the serialization of the CR-LF character sequence</title> |
| <para> |
| Section 2.11 of <biblioref linkend="bib.xml"/> specifies that an <quote>XML processor |
| must behave as if it normalized all line breaks in external parsed entities (including |
| the document entity) on input, before parsing, by translating both the two-character |
| sequence #xD #xA and any #xD that is not followed by #xA to a single #xA |
| character.</quote> This implies that when a Windows style line ending, i.e. a CR-LF |
| character sequence is serialized literally into an XML document, the CR character |
| will be lost during deserialization. Depending on the use case this may or may not |
| be desirable. |
| </para> |
| <para> |
| The only way to strictly preserve CR characters is to serialize them as |
| character entities, i.e. <tag class="genentity">#xD</tag>. This is the default |
| behavior of Woodstox. This can be easily checked using the following Java snippet: |
| </para> |
| <programlisting>OMFactory factory = OMAbstractFactory.getOMFactory(); |
| OMElement element = factory.createOMElement("root", null); |
| element.setText("Test\r\nwith CRLF"); |
| element.serialize(System.out);</programlisting> |
| <para> |
| This code produces the following output: |
| </para> |
| <screen><![CDATA[<root>Test
 |
| with CRLF</root>]]></screen> |
| <note> |
| <para> |
| From Axiom's point of view this is actually a reasonable behavior. |
| The reason is that when creating an <classname>OMText</classname> node programmatically, |
| it is easy for the user code to normalize the text content to avoid the |
| appearance of the character entity. On the other hand, if the default |
| behavior was to serialize CR-LF literally (implying that the CR character |
| will be lost during deserialization), it would be difficult (if not |
| impossible) for user code that needs to strictly preserve the text data |
| to construct the object model in such a way as to force serialization of |
| the CR as character entity. |
| </para> |
| </note> |
| <para> |
| In some cases this behavior may be undesirable<footnote><para>See |
| <link xlink:href="http://jira.codehaus.org/browse/WSTX-94">WSTX-94</link> for a discussion |
| about this.</para></footnote>. Fortunately Woodstox allows to modify this behavior |
| by changing the value of the <varname>com.ctc.wstx.outputEscapeCr</varname> property |
| on the <classname>XMLOutputFactory</classname>. If Axiom is used (and in particular |
| <classname>StAXUtils</classname>) than this can be achieved by adding |
| a <filename>XMLOutputFactory.properties</filename> file with the following content |
| to the classpath (in the default package): |
| </para> |
| <programlisting>com.ctc.wstx.outputEscapeCr=false</programlisting> |
| <para> |
| Now the output of the Java snippet shown above will be: |
| </para> |
| <screen><![CDATA[<root>Test |
| with CRLF</root>]]></screen> |
| </section> |
| <section> |
| <title>Preserving CDATA sections during parsing</title> |
| <para> |
| By default, <classname>StAXUtils</classname> creates StAX parsers in coaelescing mode. |
| In this mode, the parser will never return two character data events in sequence, while |
| in non coaelescing mode, the parser is allowed to break up character data into smaller |
| chunks and to return multiple consecutive character events, which may improve throughput |
| for documents containing large text nodes. |
| It should be noted that <classname>StAXUtils</classname> overrides the default settings |
| mandated by the StAX specification, which specifies that by default, a StAX parser must |
| be in non coalescing mode. The primary reason is compatibility: older versions of |
| Woodstox had coalescing switched on by default. |
| </para> |
| <para> |
| A side effect of the default settings chosen by Axiom is that by default, CDATA sections |
| are not reported by parser created by |
| <classname>StAXUtils</classname>. The reason is that in coalescing mode, the parser will |
| not only coaelsce adjacent text nodes, but also CDATA sections. Applications that require |
| correct reporting of CDATA sections should therefore disable coalescing. This can be |
| achieved by creating a <filename>XMLInputFactory.properties</filename> file with the |
| following content: |
| </para> |
| <programlisting>javax.xml.stream.isCoalescing=false</programlisting> |
| </section> |
| </section> |
| </section> |
| <section> |
| <title>Migrating from older Axiom versions</title> |
| <para> |
| The release notes provide information about changes in Axiom that might impact existing |
| code when migrating from an older version of Axiom. Note that they are not |
| meant as a change log that lists all changes or new features. Also, before upgrading |
| to a newer Axiom version, you should always check if your code uses methods or classes |
| that have been deprecated. You should fix all deprecation warnings before changing the |
| Axiom version. In general the Javadoc of the deprecated class or method gives you |
| a hint on how to change your code. |
| </para> |
| </section> |
| </chapter> |
| |
| <chapter> |
| <title>Common mistakes, problems and anti-patterns</title> |
| <para> |
| This chapter presents some of the common mistakes and problems people face when writing code |
| using Axiom, as well as anti-patterns that should be avoided. |
| </para> |
| <section> |
| <title>Violating the <classname>javax.activation.DataSource</classname> contract</title> |
| <para> |
| When working with binary (base64) content, it is sometimes necessary to write a |
| custom <classname>DataSource</classname> implementation to wrap binary data that is |
| available in a different form (and for which Axiom or the Java Activation Framework |
| has no out-of-the-box data source implementation). Data sources are also sometimes |
| (but less frequently) used in conjunction with <classname>OMSourcedElement</classname> |
| and <classname>OMDataSource</classname>. |
| </para> |
| <para> |
| The documentation of the <classname>DataSource</classname> is very clear on the expected |
| behavior of the <methodname>getInputStream</methodname> method: |
| </para> |
| <programlisting>/** |
| * This method returns an InputStream representing |
| * the data and throws the appropriate exception if it can |
| * not do so. Note that a new InputStream object must be |
| * returned each time this method is called, and the stream must be |
| * positioned at the beginning of the data. |
| * |
| * @return an InputStream |
| */ |
| public InputStream getInputStream() throws IOException;</programlisting> |
| <para> |
| A common mistake is to implement the data source in a way that makes |
| <methodname>getInputStream</methodname> <quote>destructive</quote>. Consider |
| the implementation shown in <xref linkend="InputStreamDataSource"/><footnote><para>The example |
| shown is actually a simplified version of code that is |
| <link xlink:href="http://svn.apache.org/repos/asf/axis/axis2/java/core/tags/v1.5/modules/kernel/src/org/apache/axis2/builder/unknowncontent/InputStreamDataSource.java">part of Axis2 1.5</link>.</para></footnote>. |
| It is clear that this data source can only be read once and that any subsequent call to |
| <methodname>getInputStream</methodname> will return an already closed input stream. |
| </para> |
| <example xml:id="InputStreamDataSource"> |
| <title><classname>DataSource</classname> implementation that violates the interface contract</title> |
| <programlisting>public class InputStreamDataSource implements DataSource { |
| private final InputStream is; |
| |
| public InputStreamDataSource(InputStream is) { |
| this.is = is; |
| } |
| |
| public String getContentType() { |
| return "application/octet-stream"; |
| } |
| |
| public InputStream getInputStream() throws IOException { |
| return is; |
| } |
| |
| public String getName() { |
| return null; |
| } |
| |
| public OutputStream getOutputStream() throws IOException { |
| throw new UnsupportedOperationException(); |
| } |
| }</programlisting> |
| </example> |
| <para> |
| What makes this mistake so vicious is that very likely it will not cause |
| problems immediately. The reason is that Axiom is optimized to read the data |
| only when necessary, which in most cases means only once! However, in some cases |
| it is unavoidable to read the data several times. When that happens, the broken |
| <classname>DataSource</classname> implementation will cause problems that may |
| be extremely hard to debug. |
| </para> |
| <para> |
| Imagine for example<footnote><para>For another example, see |
| <link xlink:href="http://markmail.org/thread/omx7umk5fnpb6dnc"/>.</para></footnote> |
| that the implementation shown above is used to produce an |
| MTOM message. At first this will work without any problems because the data |
| source is read only once when serializing the message. If later on the MTOM |
| threshold feature is enabled, the broken implementation will (in the worst case) |
| cause the corresponding MIME parts to be empty or (in the best case) trigger an |
| I/O error because Axiom attempts to read from an already closed stream. |
| The reason for this is that when an MTOM threshold is set, Axiom reads the data |
| source twice: once to determine if its size exceeds the |
| threshold<footnote><para>To do this, Axiom doesn't read the entire data source, |
| but only reads up to the threshold.</para></footnote> and once during |
| serialization of the message. |
| </para> |
| </section> |
| <section> |
| <title>Issues that <quote>magically</quote> disappear</title> |
| <para> |
| Quite frequently users post messages on the Axiom related mailing lists about |
| issues that seem to disappear by <quote>magic</quote> when they try to debug |
| them. The reason why this can happen is simple. As explained earlier, Axiom uses |
| deferred building, but at the same time does its best to hide that from the user, |
| so that he doesn't need to worry about whether the object model has already been |
| built or not. On the other hand, when serializing the object model to XML or when |
| requesting a pull parser (<classname>XMLStreamReader</classname>) from a node, |
| the code paths taken may be radically different depending on whether or not |
| the corresponding part of the tree has already been built. This is especially |
| true when caching is disabled. |
| </para> |
| <para> |
| While the end result should be the same in all cases, it is also clear that |
| in some circumstances an issue that occurs with an incompletely built tree may |
| disappear if there is something that causes Axiom to build the rest of the object |
| model. What is important to understand is that the <quote>something</quote> may |
| be as trivial as a call to the <methodname>toString</methodname> method of an |
| <classname>OMNode</classname>! The fact that adding |
| <methodname>System.out.println</methodname> statements or logging instructions |
| is a common debugging technique then explains why issues sometimes seem to |
| <quote>magically</quote> disappear during debugging. |
| </para> |
| <para> |
| Finally, it should be noted that inspecting an <classname>OMNode</classname> |
| in a debugger also causes a call to the <methodname>toString</methodname> |
| method on that object. This means that by just clicking on something in the |
| <quote>Variables</quote> window of your debugger, you may completely change the |
| state of the process that is being debugged! |
| </para> |
| </section> |
| <section> |
| <title>The OM-inside-OMDataSource anti-pattern</title> |
| <section> |
| <title>Weak version</title> |
| <para> |
| <classname>OMDataSource</classname> objects are used in conjunction with |
| <classname>OMSourcedElement</classname> to build Axiom object model instances |
| that contain information items that are represented using a framework or API |
| other than Axiom. Wrapping this <quote>foreign</quote> data in an |
| <classname>OMDataSource</classname> and adding it to the Axiom object model |
| using an <classname>OMSourcedElement</classname> in most cases avoids the |
| conversion of the data to the <quote>native</quote> Axiom object |
| model<footnote><para>An exception is when code tries to access the children |
| of the <classname>OMSourcedElement</classname>. In this case, the |
| <classname>OMSourcedElement</classname> will be <firstterm>expanded</firstterm>, |
| i.e. the data will be converted to the native Axiom object model.</para></footnote>. |
| The <classname>OMDataSource</classname> contract requires the implementation |
| to support two different ways of providing the data, both relying on StAX: |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| The implementation must be able to provide a pull parser |
| (<classname>XMLStreamReader</classname>) from which the infoset can be |
| read. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| The data source must be able to serialize the infoset to an |
| <classname>XMLStreamWriter</classname> (push). |
| </para> |
| </listitem> |
| </itemizedlist> |
| <para> |
| For the consumer of an event based representation of an XML infoset, it is in |
| general easier to work in pull mode. That is the reason why StAX has gained |
| popularity over push based approaches such as SAX. On the other hand for a producer |
| such as an <classname>OMDataSource</classname> implementation, it's exactly the |
| other way round: it is far easier to serialize an infoset to an |
| <classname>XMLStreamWriter</classname> (push) than to build an |
| <classname>XMLStreamReader</classname> from which a consumer can read (pull) events. |
| </para> |
| <para> |
| Experience indeed shows that the most challenging part in creating an |
| <classname>OMDataSource</classname> implementation is to write the |
| <methodname>getReader</methodname> method. In the past, to avoid that difficulty some |
| implementations simply built an Axiom tree and returned the |
| <classname>XMLStreamReader</classname> provided by |
| <methodname>OMElement#getXMLStreamReader()</methodname>. For example, older versions of ADB |
| (Axis2 Data Binding) used the following code<footnote><para>For the complete |
| code, see <link xlink:href="http://svn.apache.org/repos/asf/axis/axis2/java/core/tags/v1.5/modules/adb/src/org/apache/axis2/databinding/ADBDataSource.java"/>.</para></footnote>: |
| </para> |
| <example xml:id="adb-getReader"> |
| <title><methodname>OMDataSource#getReader()</methodname> implementation used in older ADB versions</title> |
| <programlisting>public XMLStreamReader getReader() throws XMLStreamException { |
| MTOMAwareOMBuilder mtomAwareOMBuilder = new MTOMAwareOMBuilder(); |
| serialize(mtomAwareOMBuilder); |
| return mtomAwareOMBuilder.getOMElement().getXMLStreamReader(); |
| }</programlisting> |
| </example> |
| <para> |
| The <classname>MTOMAwareOMBuilder</classname> class referenced by this code was a special |
| implementation of <classname>XMLStreamWriter</classname> building an Axiom tree from the |
| sequence of events sent to it. The code than used this Axiom tree to get the |
| <classname>XMLStreamReader</classname> implementation. While this was a functionally correct |
| implementation of the <methodname>getReader</methodname> method, it is not a good |
| solution from a performance perspective and also contradicts some of the ideas on |
| which Axiom is based, namely that the object model should only be built when necessary. |
| </para> |
| <para> |
| Starting with Axiom 1.2.14, there is a solution to avoid this anti-pattern. |
| <classname>OMDataSource</classname> implementations that cannot provide a meaningful |
| <classname>XMLStreamReader</classname> instance should extend |
| <classname>org.apache.axiom.om.ds.AbstractPushOMDataSource</classname> and only |
| implement the <methodname>serialize</methodname> method. |
| <classname>OMSourcedElement</classname> will handle <classname>OMDataSource</classname> implementations extending this class |
| differently when it comes to expansion: instead of using <methodname>OMDataSource#getReader()</methodname> to |
| expand the element, it will use <methodname>OMDataSource#serialize(XMLStreamWriter)</methodname> (with a special |
| <classname>XMLStreamWriter</classname> that builds the descendants of the <classname>OMSourcedElement</classname>). Note that this means |
| that such an <classname>OMSourcedElement</classname> will be expanded instantly, and that deferred building of |
| the descendants is not applicable. Nevertheless, this approach is significantly more efficient |
| than using the OM-inside-OMDataSource anti-pattern. |
| </para> |
| </section> |
| <section> |
| <title>Strong version</title> |
| <para> |
| There is also a stronger version of the anti-pattern which consists in |
| implementing the <methodname>serialize</methodname> method by building an Axiom tree |
| and then serializing the tree to the <classname>XMLStreamWriter</classname>. |
| Except for very special cases, there is <emphasis role="strong">no valid reason |
| whatsoever</emphasis> to do this! To see why this is so, consider the two |
| possible cases: |
| </para> |
| <orderedlist> |
| <listitem> |
| <para> |
| The <classname>OMDataSource</classname> already implements the |
| <methodname>getReader</methodname> method in a proper way, i.e. without |
| building an intermediary Axiom tree. To properly implement |
| <methodname>serialize</methodname>, it is then sufficient |
| to pull the events from the reader returned by a call to |
| <methodname>getReader</methodname> and copy them to the |
| <classname>XMLStreamReader</classname>. The easiest and most efficient |
| way to do this is to extend <classname>org.apache.axiom.om.ds.AbstractPullOMDataSource</classname> |
| (available in Axiom 1.2.14), which implements the <methodname>serialize</methodname> |
| method in exactly that way. |
| There is thus no need to build an intermediary object model in this case. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| The <methodname>getReader</methodname> method also uses an intermediary |
| Axiom tree<footnote><para>See e.g. |
| <link xlink:href="http://svn.apache.org/repos/asf/axis/axis2/java/core/tags/v1.5/modules/kernel/src/org/apache/axis2/builder/unknowncontent/UnknownContentOMDataSource.java"/>.</para></footnote>. |
| In that case it doesn't make sense to use an <classname>OMSourcedElement</classname> |
| in the first place! At least it doesn't make sense if one assumes that |
| in general the <classname>OMSourcedElement</classname> will either be |
| serialized or its content accessed after being added to the tree. Indeed, |
| in this case the Axiom tree will be built at least once (if not multiple times), |
| so that the code might as well use a normal <classname>OMElement</classname>. |
| </para> |
| <para> |
| This only leaves the very special case where the <classname>OMSourcedElement</classname> |
| is in general neither accessed nor serialized, either because it will usually be somehow |
| discarded or because the code uses <methodname>OMDataSourceExt#getObject()</methodname> |
| to retrieve the raw data. Even in that case one can argue that in general |
| it should not be too hard to implement at least the <methodname>serialize</methodname> |
| method properly by transforming the raw or foreign data directly to StAX events written to the |
| <classname>XMLStreamWriter</classname>. |
| </para> |
| </listitem> |
| </orderedlist> |
| <para> |
| QED |
| </para> |
| </section> |
| </section> |
| </chapter> |
| |
| <chapter xml:id="appendix"> |
| <title>Appendix</title> |
| <section> |
| <title>Program Listing for Build and Serialize</title> |
| <programlisting><?db-font-size 80%?>import org.apache.axiom.om.OMElement; |
| import org.apache.axiom.om.OMXMLBuilderFactory; |
| import org.apache.axiom.om.OMXMLParserWrapper; |
| |
| import javax.xml.stream.XMLStreamException; |
| import java.io.FileInputStream; |
| import java.io.FileNotFoundException; |
| import java.io.InputStream; |
| |
| public class TestOMBuilder { |
| |
| /** |
| * Pass the file name as an argument |
| * @param args |
| */ |
| public static void main(String[] args) { |
| try { |
| //create the input stream |
| InputStream in = new FileInputStream(args[0]); |
| //create the builder |
| OMXMLParserWrapper builder = OMXMLBuilderFactory.createOMBuilder(in); |
| //get the root element |
| OMElement documentElement = builder.getDocumentElement(); |
| |
| //dump the out put to console with caching |
| System.out.println(documentElement.toStringWithConsume()); |
| |
| } catch (XMLStreamException e) { |
| e.printStackTrace(); |
| } catch (FileNotFoundException e) { |
| e.printStackTrace(); |
| } |
| } |
| }</programlisting> |
| </section> |
| <section xml:id="links"> |
| <title>Links</title> |
| <para> |
| For basics in XML |
| </para> |
| <itemizedlist> |
| <listitem> |
| <para> |
| <link xlink:href="http://www-128.ibm.com/developerworks/xml/newto/index.html">Developerworks Introduction to XML</link> |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <link xlink:href="http://www.bearcave.com/software/java/xml/xmlpull.html">Introduction to Pull parsing</link> |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <link xlink:href="http://today.java.net/pub/a/today/2006/07/20/introduction-to-stax.html">Introduction to StAX</link> |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <link xlink:href="http://www.jaxmag.com/itr/online_artikel/psecom,id,726,nodeid,147.html">Fast and Lightweight Object Model for XML</link> |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <link xlink:href="http://www-128.ibm.com/developerworks/library/x-axiom/">Get the most out of XML processing with AXIOM</link> |
| </para> |
| </listitem> |
| </itemizedlist> |
| </section> |
| </chapter> |
| |
| <bibliography> |
| <title>References</title> |
| <bibliodiv> |
| <title>Specifications</title> |
| <biblioentry xml:id="bib.xml"> |
| <abbrev>XML</abbrev> |
| <title><link xlink:href="http://www.w3.org/TR/2008/REC-xml-20081126/">Extensible Markup Language (XML) 1.0 (Fifth Edition)</link></title> |
| <publishername>W3C Recommendation</publishername> |
| <pubdate>26 November 2008</pubdate> |
| </biblioentry> |
| <biblioentry xml:id="bib.xmlns"> |
| <abbrev>XMLNS</abbrev> |
| <title><link xlink:href="http://www.w3.org/TR/2009/REC-xml-names-20091208/">Namespaces in XML 1.0 (Third Edition)</link></title> |
| <publishername>W3C Recommendation</publishername> |
| <pubdate>8 December 2009</pubdate> |
| </biblioentry> |
| <biblioentry xml:id="bib.xmlns11"> |
| <abbrev>XMLNS11</abbrev> |
| <title><link xlink:href="http://www.w3.org/TR/2006/REC-xml-names11-20060816/">Namespaces in XML 1.1 (Second Edition)</link></title> |
| <publishername>W3C Recommendation</publishername> |
| <pubdate>16 August 2006</pubdate> |
| </biblioentry> |
| </bibliodiv> |
| </bibliography> |
| </book> |