blob: 4689051306fe05a88c13c2fad5603c5651c5dc3f [file] [log] [blame]
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document>
<properties>
<title>XML Howto</title>
<author email="oliver.heger@t-online.de">Oliver Heger</author>
</properties>
<body>
<section name="Using XML based Configurations">
<p>
This section explains how to use hierarchical
and structured XML datasets.
</p>
</section>
<section name="Hierarchical properties">
<p>
Because of its
tree-like nature XML documents can represent data that is
structured in many ways. This section explains how to deal with
such structured documents and demonstrates the enhanced query
facilities supported by the <code>XMLConfiguration</code> class..
</p>
<subsection name="Accessing properties defined in XML documents">
<p>
We will start with a simple XML document to show some basics
about accessing properties. The following file named
<code>gui.xml</code> is used as example document:
</p>
<source><![CDATA[
<?xml version="1.0" encoding="ISO-8859-1" ?>
<gui-definition>
<colors>
<background>#808080</background>
<text>#000000</text>
<header>#008000</header>
<link normal="#000080" visited="#800080"/>
<default>${colors.header}</default>
</colors>
<rowsPerPage>15</rowsPerPage>
<buttons>
<name>OK,Cancel,Help</name>
</buttons>
<numberFormat pattern="###\,###.##"/>
</gui-definition>
]]></source>
<p>
(As becomes obvious, this tutorial does not bother with good
design of XML documents, the example file should rather
demonstrate the different ways of accessing properties.)
To access the data stored in this document it must be loaded
by <code>XMLConfiguration</code>. Like other file based
configuration classes <code>XMLConfiguration</code> supports
many ways of specifying the file to process. One way is to
pass the file name to the constructor as shown in the following
code fragment:
</p>
<source><![CDATA[
try
{
XMLConfiguration config = new XMLConfiguration("tables.xml");
// do something with config
}
catch(ConfigurationException cex)
{
// something went wrong, e.g. the file was not found
}
]]></source>
<p>
If no exception was thrown, the properties defined in the
XML document are now available in the configuration object.
The following fragment shows how the properties can be accessed:
</p>
<source><![CDATA[
String backColor = config.getString("colors.background");
String textColor = config.getString("colors.text");
String linkNormal = config.getString("colors.link[@normal]");
String defColor = config.getString("colors.default");
int rowsPerPage = config.getInt("rowsPerPage");
List buttons = config.getList("buttons.name");
]]></source>
<p>
This listing demonstrates some important points about constructing
keys for accessing properties load from XML documents and about
features of <code>XMLConfiguration</code> in general:
<ul>
<li>
Nested elements are accessed using a dot notation. In
the example document there is an element
<code>&lt;text&gt;</code> in the body of the
<code>&lt;color&gt;</code> element. The corresponding
key is <code>color.text</code>.
</li>
<li>
The root element is ignored when constructing keys. In
the example you do not write
<code>gui-definition.color.text</code>, but only
<code>color.text</code>.
</li>
<li>
Attributes of XML elements are accessed in a XPath like
notation.
</li>
<li>
Interpolation can be used as in <code>PropertiesConfiguration</code>.
Here the <code>&lt;default&gt;</code> element in the
<code>colors</code> section refers to another color.
</li>
<li>
Lists of properties can be defined in a short form using
the delimiter character (which is the comma by default).
In this example the <code>buttons.name</code> property
has the three values <em>OK</em>, <em>Cancel</em>, and
<em>Help</em>, so it is queried using the <code>getList()</code>
method. This works in attributes, too. Using the static
<code>setDelimiter()</code> method of
<code>AbstractConfiguration</code> you can globally
define a different delimiter character or -
by setting the delimiter to 0 - disabling this mechanism
completely. Placing a backslash before a delimiter
character will escape it. This is demonstrated in the
<code>pattern</code> attribute of the <code>numberFormat</code>
element.
</li>
</ul>
</p>
<p>
In the next section will show how data in a more complex XML
document can be processed.
</p>
</subsection>
<subsection name="Structured XML">
<p>
Consider the following scenario: An application operates on
database tables and wants to load a definition of the database
schema from its configuration. A XML document provides this
information. It could look as follows:
</p>
<source><![CDATA[
<?xml version="1.0" encoding="ISO-8859-1" ?>
<database>
<tables>
<table tableType="system">
<name>users</name>
<fields>
<field>
<name>uid</name>
<type>long</type>
</field>
<field>
<name>uname</name>
<type>java.lang.String</type>
</field>
<field>
<name>firstName</name>
<type>java.lang.String</type>
</field>
<field>
<name>lastName</name>
<type>java.lang.String</type>
</field>
<field>
<name>email</name>
<type>java.lang.String</type>
</field>
</fields>
</table>
<table tableType="application">
<name>documents</name>
<fields>
<field>
<name>docid</name>
<type>long</type>
</field>
<field>
<name>name</name>
<type>java.lang.String</type>
</field>
<field>
<name>creationDate</name>
<type>java.util.Date</type>
</field>
<field>
<name>authorID</name>
<type>long</type>
</field>
<field>
<name>version</name>
<type>int</type>
</field>
</fields>
</table>
</tables>
</database>
]]></source>
<p>
This XML is quite self explanatory; there is an arbitrary number
of table elements, each of it has a name and a list of fields.
A field in turn consists of a name and a data type. This
XML document (let's call it <code>tables.xml</code>) can be
loaded in exactly the same way as the simple document in the
section before.
</p>
<p>
When we now want to access some of the properties we face a
problem: the syntax for constructing configuration keys we
learned so far is not powerful enough to access all of the data
stored in the tables document.
</p>
<p>
Because the document contains a list of tables some properties
are defined more than once. E.g. the configuration key
<code>tables.table.name</code> refers to a <code>name</code>
element inside a <code>table</code> element inside a
<code>tables</code> element. This constellation happens to
occur twice in the tables document.
</p>
<p>
Multiple definitions of a property do not cause problems and are
supported by all classes of Configuration. If such a property
is queried using <code>getProperty()</code>, the method
recognizes that there are multiple values for that property and
returns a collection with all these values. So we could write
</p>
<source><![CDATA[
Object prop = config.getProperty("tables.table.name");
if(prop instanceof Collection)
{
System.out.println("Number of tables: " + ((Collection) prop).size());
}
]]></source>
<p>
An alternative to this code would be the <code>getList()</code>
method of <code>Configuration</code>. If a property is known to
have multiple values (as is the table name property in this example),
<code>getList()</code> allows to retrieve all values at once.
<b>Note:</b> it is legal to call <code>getString()</code>
or one of the other getter methods on a property with multiple
values; it returns the first element of the list.
</p>
</subsection>
<subsection name="Accessing structured properties">
<p>
Okay, we can obtain a list with the name of all defined
tables. In the same way we can retrieve a list with the names
of all table fields: just pass the key
<code>tables.table.fields.field.name</code> to the
<code>getList()</code> method. In our example this list
would contain 10 elements, the names of all fields of all tables.
This is fine, but how do we know, which field belongs to
which table?
</p>
<p>
When working with such hierarchical structures the configuration keys
used to query properties can have an extended syntax. All components
of a key can be appended by a numerical value in parentheses that
determines the index of the affected property. So if we have two
<code>table</code> elements we can exactly specify, which one we
want to address by appending the corresponding index. This is
explained best by some examples:
</p>
<p>
We will now provide some configuration keys and show the results
of a <code>getProperty()</code> call with these keys as arguments.
<dl>
<dt><code>tables.table(0).name</code></dt>
<dd>
Returns the name of the first table (all indices are 0 based),
in this example the string <em>users</em>.
</dd>
<dt><code>tables.table(0)[@tableType]</code></dt>
<dd>
Returns the value of the tableType attribute of the first
table (<em>system</em>).
</dd>
<dt><code>tables.table(1).name</code></dt>
<dd>
Analogous to the first example returns the name of the
second table (<em>documents</em>).
</dd>
<dt><code>tables.table(2).name</code></dt>
<dd>
Here the name of a third table is queried, but because there
are only two tables result is <b>null</b>. The fact that a
<b>null</b> value is returned for invalid indices can be used
to find out how many values are defined for a certain property:
just increment the index in a loop as long as valid objects
are returned.
</dd>
<dt><code>tables.table(1).fields.field.name</code></dt>
<dd>
Returns a collection with the names of all fields that
belong to the second table. With such kind of keys it is
now possible to find out, which fields belong to which table.
</dd>
<dt><code>tables.table(1).fields.field(2).name</code></dt>
<dd>
The additional index after field selects a certain field.
This expression represents the name of the third field in
the second table (<em>creationDate</em>).
</dd>
<dt><code>tables.table.fields.field(0).type</code></dt>
<dd>
This key may be a bit unusual but nevertheless completely
valid. It selects the data types of the first fields in all
tables. So here a collection would be returned with the
values [<em>long, long</em>].
</dd>
</dl>
</p>
<p>
These examples should make the usage of indices quite clear.
Because each configuration key can contain an arbitrary number
of indices it is possible to navigate through complex structures of
XML documents; each XML element can be uniquely identified.
</p>
</subsection>
<subsection name="Adding new properties">
<p>
So far we have learned how to use indices to avoid ambiguities when
querying properties. The same problem occurs when adding new
properties to a structured configuration. As an example let's
assume we want to add a new field to the second table. New properties
can be added to a configuration using the <code>addProperty()</code>
method. Of course, we have to exactly specify where in the tree like structure new
data is to be inserted. A statement like
</p>
<source><![CDATA[
// Warning: This might cause trouble!
config.addProperty("tables.table.fields.field.name", "size");
]]></source>
<p>
would not be sufficient because it does not contain all needed
information. How is such a statement processed by the
<code>addProperty()</code> method?
</p>
<p>
<code>addProperty()</code> splits the provided key into its
single parts and navigates through the properties tree along the
corresponding element names. In this example it will start at the
root element and then find the <code>tables</code> element. The
next key part to be processed is <code>table</code>, but here a
problem occurs: the configuration contains two <code>table</code>
properties below the <code>tables</code> element. To get rid off
this ambiguity an index can be specified at this position in the
key that makes clear, which of the two properties should be
followed. <code>tables.table(1).fields.field.name</code> e.g.
would select the second <code>table</code> property. If an index
is missing, <code>addProperty()</code> always follows the last
available element. In our example this would be the second
<code>table</code>, too.
</p>
<p>
The following parts of the key are processed in exactly the same
manner. Under the selected <code>table</code> property there is
exactly one <code>fields</code> property, so this step is not
problematic at all. In the next step the <code>field</code> part
has to be processed. At the actual position in the properties tree
there are multiple <code>field</code> (sub) properties. So we here
have the same situation as for the <code>table</code> part.
Because no explicit index is defined the last <code>field</code>
property is selected. The last part of the key passed to
<code>addProperty()</code> (<code>name</code> in this example)
will always be added as new property at the position that has
been reached in the former processing steps. So in our example
the last <code>field</code> property of the second table would
be given a new <code>name</code> sub property and the resulting
structure would look like the following listing:
</p>
<source><![CDATA[
...
<table tableType="application">
<name>documents</name>
<fields>
<field>
<name>docid</name>
<type>long</type>
</field>
<field>
<name>name</name>
<type>java.lang.String</type>
</field>
<field>
<name>creationDate</name>
<type>java.util.Date</type>
</field>
<field>
<name>authorID</name>
<type>long</type>
</field>
<field>
<name>version</name>
<name>size</name> <== Newly added property
<type>int</type>
</field>
</fields>
</table>
</tables>
</database>
]]></source>
<p>
This result is obviously not what was desired, but it demonstrates
how <code>addProperty()</code> works: the method follows an
existing branch in the properties tree and adds new leaves to it.
(If the passed in key does not match a branch in the existing tree,
a new branch will be added. E.g. if we pass the key
<code>tables.table.data.first.test</code>, the existing tree can be
navigated until the <code>data</code> part of the key. From here a
new branch is started with the remaining parts <code>data</code>,
<code>first</code> and <code>test</code>.)
</p>
<p>
If we want a different behavior, we must explicitely tell
<code>addProperty()</code> what to do. In our example with the
new field our intension was to create a new branch for the
<code>field</code> part in the key, so that a new <code>field</code>
property is added to the structure rather than adding sub properties
to the last existing <code>field</code> property. This can be
achieved by specifying the special index <code>(-1)</code> at the
corresponding position in the key as shown below:
</p>
<source><![CDATA[
config.addProperty("tables.table(1).fields.field(-1).name", "size");
config.addProperty("tables.table(1).fields.field.type", "int");
]]></source>
<p>
The first line in this fragment specifies that a new branch is
to be created for the <code>field</code> property (index -1).
In the second line no index is specified for the field, so the
last one is used - which happens to be the field that has just
been created. So these two statements add a fully defined field
to the second table. This is the default pattern for adding new
properties or whole hierarchies of properties: first create a new
branch in the properties tree and then populate its sub properties.
As an additional example let's add a complete new table definition
to our example configuration:
</p>
<source><![CDATA[
// Add a new table element and define the name
config.addProperty("tables.table(-1).name", "versions");
// Add a new field to the new table
// (an index for the table is not necessary because the latest is used)
config.addProperty("tables.table.fields.field(-1).name", "id");
config.addProperty("tables.table.fields.field.type", "int");
// Add another field to the new table
config.addProperty("tables.table.fields.field(-1).name", "date");
config.addProperty("tables.table.fields.field.type", "java.sql.Date");
...
]]></source>
<p>
For more information about adding properties to a hierarchical
configuration also have a look at the javadocs for
<code>HierarchicalConfiguration</code>.
</p>
</subsection>
<subsection name="Escaping dot characters in XML tags">
<p>
In XML the dot character used as delimiter by most configuration
classes is a legal character that can occur in any tag. So the
following XML document is completely valid:
</p>
<source><![CDATA[
<?xml version="1.0" encoding="ISO-8859-1" ?>
<configuration>
<test.value>42</test.value>
<test.complex>
<test.sub.element>many dots</test.sub.element>
</test.complex>
</configuration>
]]></source>
<p>
This XML document can be loaded by <code>XMLConfiguration</code>
without trouble, but when we want to access certain properties
we face a problem: The configuration claims that it does not
store any values for the properties with the keys
<code>test.value</code> or <code>test.complex.test.sub.element</code>!
</p>
<p>
Of course, it is the dot character contained in the property
names, which causes this problem. A dot is always interpreted
as a delimiter between elements. So given the property key
<code>test.value</code> the configuration would look for an
element named <code>test</code> and then for a sub element
with the name <code>value</code>. To change this behavior it is
possible to escape a dot character, thus telling the configuration
that it is really part of an element name. This is simply done
by duplicating the dot. So the following statements will return
the desired property values:
</p>
<source><![CDATA[
int testVal = config.getInt("test..value");
String complex = config.getString("test..complex.test..sub..element");
]]></source>
<p>
Note the duplicated dots whereever the dot does not act as
delimiter. This way it is possible to access properties containing
dots in arbitrary combination. However, as you can see, the
escaping can be confusing sometimes. So if you have a choice,
you should avoid dots in the tag names of your XML configuration
files.
</p>
</subsection>
</section>
<section name="Validation of XML configuration files">
<p>
XML parsers provide support for validation of XML documents to ensure that they
conform to a certain DTD. This feature can be useful for
configuration files, too. <code>XMLConfiguration</code> allows to enable
validation for the files to load.
</p>
<p>
The easiest way to turn on validation is to simply set the
<code>validating</code> property to true as shown in the
following example:
</p>
<source><![CDATA[
XMLConfiguration config = new XMLConfiguration();
config.setFileName("myconfig.xml");
config.setValidating(true);
// This will throw a ConfigurationException if the XML document does not
// conform to its DTD.
config.load();
]]></source>
<p>
Setting the <code>validating</code> flag to true will cause
<code>XMLConfiguration</code> to use a validating XML parser. At this parser
a custom <code>ErrorHandler</code> will be registered, which throws
exceptions on simple and fatal parsing errors.
</p>
<p>
While using the <code>validating</code> flag is a simple means of
enabling validation it cannot fullfil more complex requirements,
e.g. schema validation. To be able to deal with such requirements
XMLConfiguration provides a generic way of setting up the XML
parser to use: A preconfigured <code>DocumentBuilder</code> object
can be passed to the <code>setDocumentBuilder()</code> method.
</p>
<p>
So an application can create a <code>DocumentBuilder</code> object
and initialize it according to its special needs. Then this
object must be passed to the <code>XMLConfiguration</code> instance
before invocation of the <code>load()</code> method. When loading
a configuration file, the passed in <code>DocumentBuilder</code> will
be used instead of the default one.
</p>
</section>
</body>
</document>