blob: 39f84f2aee7e6b47f3515c44c50afa95eab0259f [file] [log] [blame]
= Processing XML
== Parsing XML
=== XmlParser and XmlSlurper
The most commonly used approach for parsing XML with Groovy is to use
one of:
* `groovy.util.XmlParser`
* `groovy.util.XmlSlurper`
Both have the same approach to parse an xml. Both come with a bunch of
overloaded parse methods plus some special methods such as `parseText`,
parseFile and others. For the next example we will use the `parseText`
method. It parses a XML `String` and recursively converts it to a list
or map of objects.
[source,groovy]
.XmlSlurper
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testParseText,indent=0]
----
<1> Parsing the XML an returning the root node as a GPathResult
<2> Checking we're using a GPathResult
<3> Traversing the tree in a GPath style
[source,groovy]
.XmlParser
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=testParseText,indent=0]
----
<1> Parsing the XML an returning the root node as a Node
<2> Checking we're using a Node
<3> Traversing the tree in a GPath style
Let's see the *similarities* between `XMLParser` and `XMLSlurper` first:
* Both are based on `SAX` so they both are low memory footprint
* Both can update/transform the XML
But they have key *differences*:
* `XmlSlurper` evaluates the structure lazily. So if you update the xml
you'll have to evaluate the whole tree again.
* `XmlSlurper` returns `GPathResult` instances when parsing XML
* `XmlParser` returns `Node` objects when parsing XML
When to use one or the another?
NOTE: There is a discussion at
http://stackoverflow.com/questions/7558019/groovy-xmlslurper-vs-xmlparser[StackOverflow]. The
conclusions written here are based partially in this entry.
* *If you want to transform an existing document to another* then
`XmlSlurper` will be the choice
* *If you want to update and read at the same time* then `XmlParser` is
the choice.
The rationale behind this is that every time you create a node with
`XmlSlurper` it won't be available until you parse the document again
with another `XmlSlurper` instance. Need to read just a few nodes
XmlSlurper is for you ".
* *If you just have to read a few nodes* `XmlSlurper` should be your
choice, since it will not have to create a complete structure in
memory"
In general both classes perform similar way. Even the way of using
GPath expressions with them are the same (both use `breadthFirst()` and
`depthFirst()` expressions). So I guess it depends on the write/read
frequency.
=== DOMCategory
There is another way of parsing XML documents with Groovy with the
used of `groovy.xml.dom.DOMCategory` which is a category class which
adds GPath style operations to Java's DOM classes.
NOTE: Java has in-built support for DOM processing of XML using classes
representing the various parts of XML documents, e.g. `Document`,
`Element`, `NodeList`, `Attr` etc. For more information about these classes,
refer to the respective JavaDocs.
Having a XML like the following:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideDOMCategory.groovy[tags=testXML,indent=0]
----
You can parse it using 'groovy.xml.DOMBuilder` and
`groovy.xml.dom.DOMCategory`.
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideDOMCategory.groovy[tags=testExample1,indent=0]
----
<1> Parsing the XML
<2> Creating `DOMCategory` scope to be able to use helper method calls
== GPath
The most common way of querying XML in Groovy is using `GPath`:
__GPath is a path expression language integrated into Groovy which
allows parts of nested structured data to be identified. In this
sense, it has similar aims and scope as XPath does for XML. The two
main places where you use GPath expressions is when dealing with
nested POJOs or when dealing with XML__
It is similar to http://en.wikipedia.org/wiki/XPath[XPath]
expressions and you can use it not only with XML but also with POJO
classes. As an example, you can specify a path to an object or element
of interest:
* `a.b.c` -> for XML, yields all the `<c>` elements inside `<b>` inside `<a>`
* `a.b.c` -> all POJOs, yields the `<c>` properties for all the `<b>`
properties of `<a>` (sort of like a.getB().getC() in JavaBeans)
For XML, you can also specify attributes, e.g.:
* `a["@href"]` -> the href attribute of all the a elements
* `a.'@href'` -> an alternative way of expressing this
* `a.@href` -> an alternative way of expressing this when using XmlSlurper
Let's illustrate this with an example:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=books,indent=0]
----
=== Simply traversing the tree
First thing we could do is to get a value using POJO's notation. Let's
get the first book's author's name
[source,groovy]
.Getting node value
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testGettingANodeText,indent=0]
----
First we parse the document with `XmlSlurper` and the we have to
consider the returning value as the root of the XML document, so in
this case is "response".
That's why we start traversing the document from response and then
`value.books.book[0].author`. Note that in `XPath` the node arrays starts
in [1] instead of [0], but because `GPath` is Java-based it begins at
index 0.
In the end we'll have the instance of the `author` node and because we
wanted the text inside that node we should be calling the `text()`
method. The `author` node is an instance of `GPathResult` type and
`text()` a method giving us the content of that node as a String.
When using `GPath` with an xml parsed with `XmlSlurper` we'll have as a
result a `GPathResult` object. `GPathResult` has many other convenient
methods to convert the text inside a node to any other type such as:
* `toInteger()`
* `toFloat()`
* `toBigInteger()`
* ...
All these methods try to convert a `String` to the appropriate type.
If we were using a XML parsed with `XmlParser` we could be dealing with
instances of type `Node`. But still all the actions applied to
`GPathResult` in these examples could be applied to a Node as
well. Creators of both parsers took into account `GPath` compatibility.
Next step is to get the some values from a given node's attribute. In the following sample
we want to get the first book's author's id. We'll be using two different approaches. Let's see the code first:
[source,groovy]
.Getting an attribute's value
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testGettingAnAttributeText,indent=0]
----
<1> Getting the first book node
<2> Getting the book's id attribute `@id`
<3> Getting the book's id attribute with `map notation` `['@id']`
<4> Getting the value as a String
<5> Getting the value of the attribute as an `Integer`
As you can see there are to types of notations to get attributes,
the
* _direct notation_ with `@nameoftheattribute`
* _map notation_ using `['@nameoftheattribute']`
Both of them are equally valid.
=== Speed things up with breadthFirst and depthfirst
If you ever have used XPath you may have used expressions like
* `//` : Look everywhere
* `/following-sibling::othernode` : Look for a node "othernode" in the same level
More or less we have their counterparts in `Gpath` with the methods
`breadthFirst()` and `depthfirst()`.
The first example shows a simple use of `breadthFirst()`. The creators of
this methods created a shorter syntax for it using the symbol `*`.
[source,groovy]
.breadthFirst()
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testBreadFirst1,indent=0]
----
This test searches for any node at the same level of the "books" node
first, and *only if* it couldn't find the node we were looking for,
then it will look deeper in the tree, always taking into account the
given the expression inside the closure.
The expression says *_Look for any node with a tag name
equals 'book' having an id with a value of '2'_*.
But what if we would like to look for a given value
without having to know exactly where it is. Let's say that the
only thing we know is the id of the author "Lewis Carroll" . How are
we going to be able to find that book? `depthFirst()` is the solution:
[source,groovy]
.depthFirst()
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testDepthFirst1,indent=0]
----
`depthfirst()` is the same as looking something *everywhere in the
tree from this point down*. In this case we've used the method
`find(Closure cl)` to find just the first occurrence.
What if we want to collect all book's titles?
[source,groovy]
.depthFirst()
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testDepthFirst2,indent=0]
----
It is worth mentioning again that there are some useful methods
converting a node's value to an integer,float...etc. Those methods
could be convenient when doing comparisons like this:
[source,groovy]
.helpers
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testHelpers,indent=0]
----
In this case the number 2 has been hardcoded but imagine that value
could have come from any other source (Database...etc)
== Creating XML
The most commonly used approach for creating XML with Groovy is to use
a builder, i.e. one of:
* `groovy.xml.MarkupBuilder`
* `groovy.xml.StreamingMarkupBuilder`
=== MarkupBuilder
Here is an example of using Groovy's MarkupBuilder to create a new XML file:
[source,groovy]
.Creating Xml with MarkupBuilder
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=createCarsTest,indent=0]
----
<1> Create an instance of `MarkupBuilder`
<2> Start creating the XML tree
<3> Create an instance of `XmlSlurper` to traverse and test the
generated XML
Let's take a look a little bit closer:
[source,groovy]
.Creating XML elements
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testCreateSimpleXml1,indent=0]
----
<1> We're creating a reference string to compare against
<2> The `xmlWriter` instance is used by `MarkupBuilder` to convert the
xml representation to a String instance eventually
<3> The `xmlMarkup.movie(...)` call will create a XML node with a tag
called `movie` and with content `the godfather`.
[source,groovy]
.Creating XML elements with attributes
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testCreateSimpleXml2,indent=0]
----
<1> This time in order to create both attributes and node content you
can create as many map entries as you like and finally add a value
to set the node's content
NOTE: The value could be any `Object`, the value will be serialized to its
`String` representation.
[source,groovy]
.Creating XML nested elements
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testCreateSimpleXml3,indent=0]
----
<1> A closure represents the children elements of a given node. Notice
this time instead of using a String for the attribute we're using a
number.
Sometimes you may want to use a specific namespace in your xml documents:
[source,groovy]
.Namespace aware
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testNamespaceAware,indent=0]
----
<1> Creating a node with a given namespace `xmlns:x`
<2> Creating a `XmlSlurper` registering the namespace to be able to
test the XML we just created
What about having some more meaningful example. We may want to
generate more elements, to have some logic when creating our XML:
[source,groovy]
.Mix code
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testComplexUse1,indent=0]
----
<1> Generating elements from a range
<2> Using a conditional for creating a given element
Of course the instance of a builder can be passed as a parameter to
refactor/modularize your code:
[source,groovy]
.Mix code
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testComplexUse2,indent=0]
----
<1> In this case we've created a Closure to handle the creation of a list of movies
<2> Just using the `buildMovieList` function when necessary
=== StreamingMarkupBuilder
The class `groovy.xml.StreamingMarkupBuilder` is a builder class for
creating XML markup. This implementation uses a
`groovy.xml.streamingmarkupsupport.StreamingMarkupWriter` to handle
output.
[source,groovy]
.Using StreamingMarkupBuilder
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideStreamingMarkupBuilderTest.groovy[tags=testSimpleExample,indent=0]
----
<1> Note that `StreamingMarkupBuilder.bind` returns a `Writable`
instance that may be used to stream the markup to a Writer
<2> We're capturing the output in a String to parse it again an check
the structure of the generated XML with `XmlSlurper`.
=== MarkupBuilderHelper
The `groovy.xml.MarkupBuilderHelper` is, as its name reflects, a
helper for `groovy.xml.MarkupBuilder`.
This helper normally can be accessed from within an instance of class
`groovy.xml.MarkupBuilder` or an instance of
`groovy.xml.StreamingMarkupBuilder`.
This helper could be handy in situations when you may want to:
* Produce a comment in the output
* Produce an XML processing instruction in the output
* Produce an XML declaration in the output
* Print data in the body of the current tag, escaping XML entities
* Print data in the body of the current tag
In both `MarkupBuilder` and `StreamingMarkupBuilder` this helper is
accessed by the property `mkp`:
[source,groovy]
.Using MarkupBuilder's 'mkp'
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testMkp1,indent=0]
----
<1> Using `mkp` to create a comment in the XML
<2> Using `mkp` to generate an escaped value
<3> Checking both assumptions were true
Here is another example to show the use of `mkp` property accesible
from within the `bind` method scope when using
`StreamingMarkupBuilder`:
[source,groovy]
.Using StreamingMarkupBuilder's 'mkp'
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideStreamingMarkupBuilderTest.groovy[tags=testMkp,indent=0]
----
<1> If we want to generate a escaped value for the name attribute with
`mkp.yield`
<2> Checking the values later on with `XmlSlurper`
=== DOMToGroovy
Suppose we have an existing XML document and we want to automate
generation of the markup without having to type it all in? We just
need to use `org.codehaus.groovy.tools.xml.DOMToGroovy` as shown in
the following example:
[source,groovy]
.Building MarkupBuilder from DOMToGroovy
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideMarkupBuilderTest.groovy[tags=testDOMToGroovy,indent=0]
----
<1> Creating `DOMToGroovy` instance
<2> Converts the XML to `MarkupBuilder` calls which are available in the output `StringWriter`
<3> Using `output` variable to create the whole MarkupBuilder
<4> Back to XML string
== Manipulating XML
In this chapter you'll see the different ways of adding / modifying /
removing nodes using `XmlSlurper` or `XmlParser`. The xml we are going
to be handling is the following:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=responseBookXml,indent=0]
----
=== Adding nodes
The main difference between `XmlSlurper` and `XmlParser` is that when
former creates the nodes they won't be available until the document's
been evaluated again, so you should parse the transformed document
again in order to be able to see the new nodes. So keep that in mind
when choosing any of both approaches.
If you needed to see a node right after creating it then `XmlParser`
should be your choice, but if you're planning to do many changes to
the XML and send the result to another process maybe `XmlSlurper` would
be more efficient.
You can't create a new node directly using the `XmlSlurper` instance,
but you can with `XmlParser`. The way of creating a new node from
XmlParser is through its method `createNode(..)`
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=testAddingNodes1,indent=0]
----
The `createNode()` method receives the following parameters:
* parent node (could be null)
* The qualified name for the tag (In this case we only use the local
part without any namespace). We're using an instance of
`groovy.xml.QName`
* A map with the tag's attributes (None in this particular case)
Anyway you won't normally be creating a node from the parser instance
but from the parsed XML instance. That is from a `Node` or a
`GPathResult` instance.
Take a look at the next example. We are parsing the xml with `XmlParser`
and then creating a new node from the parsed document's instance
(Notice the method here is slightly different in the way it receives
the parameters):
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=testAddingNodes2,indent=0]
----
When using `XmlSlurper`, `GPathResult` instances don't have `createNode()`
method.
=== Modifying / Removing nodes
We know how to parse the document, add new nodes, now I want to change
a given node's content. Let's start using `XmlParser` and `Node`. This
example changes the first book information to actually another book.
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=testModifyingNodes1,indent=0]
----
When using `replaceNode()` the closure we pass as parameter should
follow the same rules as if we were using `groovy.xml.MarkupBuilder`:
Here's the same example using `XmlSlurper`:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testModifyingNodes1,indent=0]
----
Notice how using `XmlSlurper` we have to parse the transformed document
again in order to find the created nodes. In this particular example
could be a little bit annoying isn't it?
Finally both parsers also use the same approach for adding a new
attribute to a given attribute. This time again the difference is
whether you want the new nodes to be available right away or
not. First `XmlParser`:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlParserTest.groovy[tags=testSettingAttributes1,indent=0]
----
And `XmlSlurper`:
[source,groovy]
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlSlurperTest.groovy[tags=testSettingAttributes1,indent=0]
----
When using `XmlSlurper`, adding a new attribute does *not* require you to perform a new evaluation.
=== Printing XML
==== XmlUtil
Sometimes is useful to get not only the value of a given node but the
node itself (for instance to add this node to another XML).
For that you can use `groovy.xml.XmlUtil` class. It has several static
methods to serialize the xml fragment from several type of sources
(Node,GPathResult,String...)
[source,groovy]
.Getting a node as a string
----
include::{rootProjectDir}/subprojects/groovy-xml/src/spec/test/UserGuideXmlUtilTest.groovy[tags=testGettingANode,indent=0]
----