<?xml version="1.0"?>
<title>Configuration Factory and Hierarchical Structured Data Howto</title>
<author email="">Oliver Heger</author>
<section name="Using XML based Configurations">
This section explains how to use hierarchical
and structured XML datasets.
<section name="Hierarchical properties">
The XML document we used in the section about composite configuration was quite simple. Because of their
tree-like nature, XML documents can represent data that is
structured in many ways. This section explains how to deal with
such structured documents.
<subsection name="Structured XML">
Consider the following scenario: an application operates on
database tables and wants to load a definition of the database
schema from its configuration. An XML document provides this
information. It could look as follows:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<table tableType="system">
<table tableType="application">
This XML is quite self-explanatory: there is an arbitrary number
of table elements, each of which has a name and a list of fields.
A field in turn consists of a name and a data type.
To access the data stored in this document, it must be included
in the configuration definition file:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<properties fileName=""/>
<xml fileName="gui.xml"/>
<xml fileName="tables.xml"/>
The additional <code>xml</code> element causes the document
with the table definitions to be loaded. When we now want to
read some of the properties we face a problem: the syntax for
constructing configuration keys we have learned so far is not
powerful enough to access all of the data stored in the tables
document.
Because the document contains a list of tables, some properties
are defined more than once. For example, the configuration key
<code></code> refers to a <code>name</code>
element inside a <code>table</code> element inside a
<code>tables</code> element. This constellation occurs
twice in the tables document.
Multiple definitions of a property do not cause problems and are
supported by all classes of Configuration. If such a property
is queried using <code>getProperty()</code>, the method
recognizes that there are multiple values for that property and
returns a collection with all these values. So we could write:
Object prop = config.getProperty("");
if (prop instanceof Collection) {
    System.out.println("Number of tables: " + ((Collection) prop).size());
}
An alternative to this code would be the <code>getList()</code>
method of <code>Configuration</code>. If a property is known to
have multiple values (as is the table name property in this example),
<code>getList()</code> retrieves all values at once.
<b>Note:</b> it is legal to call <code>getString()</code>
or one of the other getter methods on a property with multiple
values; it returns the first element of the list.
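The behaviour of multi-valued properties can be illustrated with a small stand-alone sketch. It does not use the Configuration classes: <code>FlatStoreSketch</code> and its methods are hypothetical names that only mimic the three access styles described above, assuming that every value added under the same key is simply appended to a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flat key/value store; an illustration, not the library's implementation. */
final class FlatStoreSketch {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Adds a value; a key that is added repeatedly keeps all of its values. */
    public void addProperty(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns null, a single String, or a List, depending on how many values exist. */
    public Object getProperty(String key) {
        List<String> list = values.get(key);
        if (list == null) {
            return null;
        }
        return list.size() == 1 ? list.get(0) : Collections.unmodifiableList(list);
    }

    /** Always returns a list, which is empty for an unknown key. */
    public List<String> getList(String key) {
        List<String> list = values.get(key);
        return list == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(list);
    }

    /** Returns the first value only, or null for an unknown key. */
    public String getString(String key) {
        List<String> list = values.get(key);
        return list == null ? null : list.get(0);
    }
}
```

If the values <em>users</em> and <em>documents</em> are added under the same key, <code>getProperty()</code> yields a collection with two elements, <code>getList()</code> yields the same two values, and <code>getString()</code> yields <em>users</em>.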
<subsection name="Accessing structured properties">
So we can obtain a list with the names of all defined
tables. In the same way we can retrieve a list with the names
of all table fields: just pass the key
<code></code> to the
<code>getList()</code> method. In our example this list
would contain 10 elements, the names of all fields of all tables.
This is fine, but how do we know which field belongs to
which table?
The answer is that with our current approach we have no chance to
obtain this knowledge. If XML documents are loaded this way,
their exact structure is lost. Though all field names are found
and stored, the information about which field belongs to which table
is not saved. Fortunately, Configuration provides a way of
dealing with structured XML documents. To enable this feature
the configuration definition file has to be slightly altered.
It becomes:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<properties fileName=""/>
<xml fileName="gui.xml"/>
<hierarchicalXml fileName="tables.xml"/>
Note that one <code>xml</code> element was replaced by a
<code>hierarchicalXml</code> element. This element tells the configuration
factory to use the class <code>HierarchicalXMLConfiguration</code>
instead of the default class for processing XML documents.
As the name implies, this class is capable of saving the
hierarchy of XML documents and thus of keeping their structure.
When working with such hierarchical properties, the configuration keys
used to query properties support an extended syntax. Each component
of a key can be followed by a numerical value in parentheses that
determines the index of the affected property. This is best explained
by some examples:
We will now provide some configuration keys and show the results
of a <code>getProperty()</code> call with these keys as arguments.
Returns the name of the first table (all indices are 0 based),
in this example the string <em>users</em>.
Returns the value of the tableType attribute of the first
table (<em>system</em>).
Analogous to the first example returns the name of the
second table (<em>documents</em>).
Here the name of a third table is queried, but because there
are only two tables the result is <b>null</b>. The fact that a
<b>null</b> value is returned for invalid indices can be used
to find out how many values are defined for a certain property:
just increment the index in a loop as long as valid objects
are returned.
Returns a collection with the names of all fields that
belong to the second table. With keys of this kind it is
now possible to find out which fields belong to which table.
The additional index after field selects a certain field.
This expression represents the name of the third field in
the second table (<em>creationDate</em>).
This key may be a bit unusual but nevertheless completely
valid. It selects the data types of the first fields in all
tables. So here a collection would be returned with the
values [<em>long, long</em>].
These examples should make the usage of indices quite clear.
Because each configuration key can contain an arbitrary number
of indices it is possible to navigate through complex structures of
XML documents; each XML element can be uniquely identified.
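The index semantics described in these examples can be sketched with the JDK's DOM API alone. The class below is not how <code>HierarchicalXMLConfiguration</code> is implemented: <code>IndexedKeySketch</code>, <code>resolve()</code> and <code>count()</code> are hypothetical names. The sketch assumes that keys are relative to the document's root element and that an index applies per parent element, and it handles element names only (no attributes).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical resolver for keys such as "tables.table(1).name". */
final class IndexedKeySketch {
    /** Parses XML text and returns its root element. */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Returns the direct child elements of parent that have the given name. */
    private static List<Element> childrenNamed(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Resolves a key relative to root; the result is empty if nothing matches. */
    public static List<String> resolve(Element root, String key) {
        List<Element> current = new ArrayList<>();
        current.add(root);
        for (String part : key.split("\\.")) {
            String name = part;
            int index = -1; // -1 means "no index given, keep every match"
            int open = part.indexOf('(');
            if (open >= 0) {
                name = part.substring(0, open);
                index = Integer.parseInt(part.substring(open + 1, part.length() - 1));
            }
            List<Element> next = new ArrayList<>();
            for (Element parent : current) {
                List<Element> matches = childrenNamed(parent, name);
                if (index < 0) {
                    next.addAll(matches);
                } else if (index < matches.size()) {
                    next.add(matches.get(index)); // index is 0 based
                }
            }
            current = next;
        }
        List<String> values = new ArrayList<>();
        for (Element e : current) {
            values.add(e.getTextContent().trim());
        }
        return values;
    }

    /** Counts values by incrementing the index until nothing is found any more. */
    public static int count(Element root, String prefix, String suffix) {
        int i = 0;
        while (!resolve(root, prefix + "(" + i + ")" + suffix).isEmpty()) {
            i++;
        }
        return i;
    }
}
```

Where the real API returns <b>null</b> for an invalid index, this sketch returns an empty list; <code>count()</code> uses that to implement the "increment the index in a loop" idea mentioned above.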
So at the end of this section we can draw the following conclusion:
For simple XML documents that define only some simple properties
and do not have a complex structure, the default XML configuration
class is suitable. If documents are more complex and their structure
is important, the hierarchy-aware class should be used, which is
enabled by the <code>hierarchicalXml</code> element as
shown in the example configuration definition file above.
<section name="Union configuration">
In an earlier section about the configuration definition file for
<code>ConfigurationFactory</code> it was stated that configuration
files included first can override properties in configuration files
included later, and an example use case for this behaviour was given.
There may be times when there are other requirements.
Let's continue the example of the application that processes
database tables and reads the definitions of the affected tables from
its configuration. Now consider that this application grows larger and
must be maintained by a team of developers. Each developer works on
a separate set of tables. In such a scenario it would be problematic
if the definitions for all tables were kept in a single file. It can be
expected that this file needs to be changed very often and thus can become
a bottleneck for team development when it is almost constantly checked
out. It would be much better if each developer had an associated file
with table definitions and all this information could be linked together
at the end.
<code>ConfigurationFactory</code> provides support for such a use case,
too. It is possible to specify in the configuration definition file that
a logical union configuration is to be constructed from a set of
configuration sources. Then all properties defined in the provided sources are
collected and can be accessed as if they had been defined in a single
source. To demonstrate this feature let us assume that a developer of
the database application has defined a specific XML file with a table
definition named <code>tasktables.xml</code>:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<table tableType="application">
This file defines the structure of an additional table, which should be
added to the existing table definitions. To achieve this, the
configuration definition file has to be changed: a new section is added
that contains the include elements of all configuration sources which
are to be combined.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Configuration definition file that demonstrates the
override and additional sections -->
<properties fileName=""/>
<xml fileName="gui.xml"/>
<hierarchicalXml fileName="tables.xml"/>
<hierarchicalXml fileName="tasktables.xml" at="tables"/>
Compared to the older versions of this file, a couple of changes have been
made. One major difference is that the elements for including configuration
sources are no longer direct children of the root element, but are now
contained in either an <code>override</code> or an <code>additional</code>
section. The names of these sections already imply their purpose.
The <code>override</code> section is not strictly necessary. Elements in
this section are treated as if they were children of the root element, i.e.
properties in the included configuration sources override properties in
sources included later. So the <code>override</code> tags could have
been omitted, but for the sake of clarity it is recommended to use them
when there is also an <code>additional</code> section.
It is the <code>additional</code> section that introduces a new behaviour.
All configuration sources listed here are combined into a union configuration.
In our example we have put two <code>hierarchicalXml</code> elements in this area
that load the available files with database table definitions. The syntax
of elements in the <code>additional</code> section is analogous to the
syntax described so far. The only difference is an additionally supported
<code>at</code> attribute that specifies the position in the logical union
configuration where the included properties are to be added. In this
example we set the <code>at</code> attribute of the second element to
<em>tables</em>. This is because the file starts with a <code>table</code>
element, but to be compatible with the other table definition file it should be
accessible under the key <code>tables.table</code>.
After these modifications have been performed the configuration obtained
from the <code>ConfigurationFactory</code> will allow access to three
database tables. A call of <code>config.getString("tables.table(2).name");</code>
will result in a value of <em>tasks</em>. In an analogous way it is possible
to retrieve the fields of the third table.
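The effect of the <code>at</code> attribute can be pictured as grafting one document's content into another at a given position. The following sketch uses only the JDK's DOM API and is not the factory's actual merging code: <code>UnionSketch</code> and its methods are hypothetical, and the sketch assumes that the children of the additional document's root element are appended below the element addressed by <code>at</code> (a dot-separated path relative to the base root).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical illustration of a union of two XML sources. */
final class UnionSketch {
    /** Parses XML text into a DOM document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Finds the direct child element with the given name, creating it if absent. */
    private static Element childOrCreate(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element created = parent.getOwnerDocument().createElement(name);
        parent.appendChild(created);
        return created;
    }

    /**
     * Appends copies of the children of extra's root element below the
     * element of base addressed by "at". Note that base is modified in place.
     */
    public static Document union(Document base, Document extra, String at) {
        Element target = base.getDocumentElement();
        if (at != null && !at.isEmpty()) {
            for (String part : at.split("\\.")) {
                target = childOrCreate(target, part);
            }
        }
        Element extraRoot = extra.getDocumentElement();
        for (Node n = extraRoot.getFirstChild(); n != null; n = n.getNextSibling()) {
            // importNode copies the node, so iterating over the source stays safe
            target.appendChild(base.importNode(n, true));
        }
        return base;
    }
}
```

With a base document that holds two <em>table</em> elements below <em>tables</em> and an additional document that holds one <em>table</em> element, calling <code>union(base, extra, "tables")</code> leaves three tables below <em>tables</em>, the added one being last.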
Note that it is also possible to override properties defined in an
<code>additional</code> section. This can be done by placing a
configuration source in the <code>override</code> section that defines
properties that are also defined in one of the sources listed in the
<code>additional</code> section. The example does not make use of that.
Note also that the order of the <code>override</code> and
<code>additional</code> sections in a configuration definition file does
not matter. Sources in an <code>override</code> section are always treated with
higher priority (otherwise they could not override the values of other
sources).
<section name="XML processing">
We have now loaded some configuration sources and accessed some of the
properties. What else can we do? One additional feature provided by
Configuration is its support for XML-like processing of <code>Configuration</code>
objects, which is implemented by the <code>ConfigurationXMLDocument</code>
class. The XML format for data exchange has become very popular, so there
are many reasons why you may want an XML-like view of your configuration.
This section shows how to make use of these features.
<subsection name="Basic XML access">
When it comes to XML processing of configuration sources, the
<code>ConfigurationXMLDocument</code> class is usually involved.
When an instance of this class is created, a <code>Configuration</code>
object is passed to the constructor. All operations executed on the
instance are then related to this configuration or a subset of it.
The most fundamental operation for treating a configuration source
as an XML document is to request a <code>ConfigurationXMLReader</code>
object from the <code>ConfigurationXMLDocument</code> instance.
The object returned by this method implements the
<code>org.xml.sax.XMLReader</code> interface and thus is a SAX 2
compliant XML parser. This parser can then be passed to components
that can deal with SAX events. The following snippet shows how such
a SAX parser can be obtained:
ConfigurationXMLDocument configDoc = new ConfigurationXMLDocument(config);
XMLReader reader = configDoc.createXMLReader();
// or:
// XMLReader reader = configDoc.createXMLReader("tables");
// Now do something with the reader
As this example shows, it is possible to obtain a reader object
either for the whole configuration or for a subset of it. The obtained object
can now be used wherever a SAX parser is supported. There is
only one thing to mention: the <code>XMLReader</code> returned
by <code>createXMLReader()</code> has been initialized with the
configuration source (or a subset of it) stored in the
<code>ConfigurationXMLDocument</code> instance. If one of the
<code>parse()</code> methods is now called, the passed-in argument,
which usually specifies the input source to parse, is ignored. This
information is unnecessary because all parsing is always done on
the associated <code>Configuration</code> object. Later in this section
this fact should become clearer.
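Why the argument of <code>parse()</code> can be ignored becomes easier to see with a tiny reader of our own. The sketch below is not <code>ConfigurationXMLReader</code>: <code>MapBackedReaderSketch</code> is a hypothetical class that extends the JDK's <code>XMLFilterImpl</code> (which implements <code>XMLReader</code>) and fires SAX events from an in-memory map, assuming that the map keys are valid XML element names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical XMLReader whose "document" is an in-memory map. */
final class MapBackedReaderSketch extends XMLFilterImpl {
    private final Map<String, String> data;
    private final String rootName;

    public MapBackedReaderSketch(Map<String, String> data, String rootName) {
        this.data = data;
        this.rootName = rootName;
    }

    /** Fires SAX events for the map; the input source is never read. */
    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler handler = getContentHandler();
        if (handler == null) {
            return; // nobody is listening
        }
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", rootName, rootName, none);
        for (Map.Entry<String, String> entry : data.entrySet()) {
            char[] text = entry.getValue().toCharArray();
            handler.startElement("", entry.getKey(), entry.getKey(), none);
            handler.characters(text, 0, text.length);
            handler.endElement("", entry.getKey(), entry.getKey());
        }
        handler.endElement("", rootName, rootName);
        handler.endDocument();
    }

    /** The system id is ignored as well. */
    @Override
    public void parse(String ignoredSystemId) throws SAXException {
        parse((InputSource) null);
    }

    /** Collects the element names reported while "parsing" with any argument. */
    public static List<String> elementNames(Map<String, String> data, String parseArgument) {
        final List<String> names = new ArrayList<>();
        MapBackedReaderSketch reader = new MapBackedReaderSketch(data, "config");
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        try {
            reader.parse(parseArgument);
        } catch (SAXException e) {
            throw new IllegalStateException("Content handler failed", e);
        }
        return names;
    }
}
```

Whatever string is passed to <code>elementNames()</code> as the parse argument, the same events are produced, because the data comes from the object the reader was constructed with.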
<subsection name="Working with documents">
SAX may be the most efficient, but it is surely not the most convenient
way of XML processing. If a document with a complex structure is to
be navigated, a DOM based approach is usually more suitable.
<code>ConfigurationXMLDocument</code> provides a couple of
overloaded <code>getDocument()</code> methods that return a dom4j
<code>Document</code> object. The arguments that can be passed to
these methods allow you to select a subset of the configuration source and
to specify the name of the resulting document's root element. The following
code fragment provides an example:
ConfigurationXMLDocument configDoc = new ConfigurationXMLDocument(config);
Document doc = configDoc.getDocument("tables", "database");
The <code>Document</code> returned here contains all data below the
<em>tables</em> section, i.e. it will have the root element <em>database</em>
with three <em>table</em> elements as children. In addition to the
<code>getDocument()</code> methods there is also a set of
<code>getW3cDocument()</code> methods. These methods act in an
analogous way, but return a <code>org.w3c.dom.Document</code> object
rather than a dom4j document.
Once a DOM document has been obtained, the whole world of DOM processing
is open. dom4j in particular allows for powerful and convenient manipulations,
e.g. the document could be transformed using a stylesheet or written to
a file. If a configuration source or parts of it should simply be saved as
an XML document, there is an even easier way: the <code>write()</code>
methods of <code>ConfigurationXMLDocument</code>. Let's assume our
example application wants to send its database table definitions to an
external tool, maybe to initialize the database schema. The following code
shows how an XML file with this information could be written:
ConfigurationXMLDocument configDoc = new ConfigurationXMLDocument(config);
Writer out = new BufferedWriter(new FileWriter("tabledef.xml"));
try {
    configDoc.write(out, "tables", "database");
} finally {
    // Always release the file handle, even if writing fails
    out.close();
}
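What "select a subset and name the root element" amounts to can be sketched with the JDK's DOM and transformation APIs. This is only an approximation of the idea, not the code behind <code>write()</code>: <code>SubsetWriterSketch</code> is hypothetical, it starts from XML text rather than from a <code>Configuration</code> object, and it assumes that the subset is the content of the first element with the given name.

```java
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper that writes a subset of a document under a new root. */
final class SubsetWriterSketch {
    public static void write(String sourceXml, Writer out, String sectionName, String rootName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(sourceXml)));

            // Select the subset: the first element named sectionName
            Node section = source.getElementsByTagName(sectionName).item(0);
            if (section == null) {
                throw new IllegalArgumentException("No such section: " + sectionName);
            }

            // Build a new document whose root element has the requested name
            Document target = builder.newDocument();
            Element root = target.createElement(rootName);
            target.appendChild(root);
            for (Node n = section.getFirstChild(); n != null; n = n.getNextSibling()) {
                root.appendChild(target.importNode(n, true));
            }

            // Serialize with an identity transformation
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(new DOMSource(target), new StreamResult(out));
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not write subset", e);
        }
    }
}
```

Called with the section name <em>tables</em> and the root name <em>database</em>, the output contains the <em>table</em> elements directly below a <em>database</em> root, and the original <em>tables</em> element no longer appears.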
<subsection name="Calling Digester">
<em>Commons Digester</em> is another Apache Jakarta project that
implements a powerful engine for processing XML documents. In this section
we will make use of Digester to transform the table definitions into a
corresponding object model. For this tutorial the interesting part is
the communication with Digester; more information
about Digester and its features can be found at the homepage of the
Digester project.
We start with the definition of a set of beans that will later store
information about the database tables and their fields. To keep this
example short, these are very simple classes with hardly more than
a couple of getter and setter methods. As you can see, there are
corresponding properties for all elements that can appear in the
table configuration.
import java.util.ArrayList;

public class TestConfigurationXMLDocument {
    /** Stores the tables.*/
    private ArrayList tables;

    /**
     * Adds a new table object. Called by Digester.
     * @param table the new table
     */
    public void addTable(Table table) {
        tables.add(table);
    }

    /**
     * A simple bean class for storing information about a table field.
     * Used for the Digester test.
     */
    public static class TableField {
        private String name;
        private String type;

        public String getName() {
            return name;
        }

        public String getType() {
            return type;
        }

        public void setName(String string) {
            name = string;
        }

        public void setType(String string) {
            type = string;
        }
    }

    /**
     * A simple bean class for storing information about a database table.
     * Used for the Digester test.
     */
    public static class Table {
        /** Stores the list of fields.*/
        private ArrayList fields;
        /** Stores the table name.*/
        private String name;
        /** Stores the table type.*/
        private String tableType;

        public Table() {
            fields = new ArrayList();
        }

        public String getName() {
            return name;
        }

        public String getTableType() {
            return tableType;
        }

        public void setName(String string) {
            name = string;
        }

        public void setTableType(String string) {
            tableType = string;
        }

        /**
         * Adds a field.
         * @param field the new field
         */
        public void addField(TableField field) {
            fields.add(field);
        }

        /**
         * Returns the field with the given index.
         * @param idx the index
         * @return the field with this index
         */
        public TableField getField(int idx) {
            return (TableField) fields.get(idx);
        }

        /**
         * Returns the number of fields.
         * @return the number of fields
         */
        public int size() {
            return fields.size();
        }
    }
}
While <code>TableField</code> is almost trivial, <code>Table</code>
provides the ability of storing a number of field objects in an internal
collection. The test class also defines a collection that will later
store the <code>Table</code> objects created from the configuration.
To pass the table configuration to Digester we have to
Ask our <code>ConfigurationXMLDocument</code> instance for
an <code>XMLReader</code> for the <em>tables</em> section.
Construct a new Digester object and pass this XML reader to
the constructor.
Initialize the Digester instance with the necessary processing rules
to create a corresponding object hierarchy for the table definitions.
Call the Digester's <code>parse()</code> method to initiate
The following listing shows the implementation of these steps:
public void testCallDigesterComplex() throws Exception
Digester digester =
new Digester(configDoc.createXMLReader("tables"));
digester.addObjectCreate("config/table", Table.class);
digester.addCallMethod("config/table/name", "setName", 0);
digester.addObjectCreate("config/table/fields/field", TableField.class);
"setName", 0);
"setType", 0);
"addField", TableField.class.getName());
digester.addSetNext("config/table", "addTable", Table.class.getName());
At the beginning of this listing <code>createXMLReader()</code>
is called on the <code>ConfigurationXMLDocument</code> instance
to obtain a SAX parser for the configuration subset with the table
definitions. When the Digester object is created this parser is passed
to its constructor.
The major part of the fragment deals with setting up the necessary
Digester rules. Details for that can be found in the Digester documentation.
In short we define rules to create <code>Table</code> and
<code>TableField</code> objects when the corresponding XML elements
are detected. The newly created objects are initialized with the
properties defined in the XML code. Then it is ensured that a new
<code>TableField</code> object is added to a <code>Table</code>
and that for each new <code>Table</code> object the
<code>addTable()</code> method of the test class is invoked.
All of this conforms to the default usage pattern of Digester; only two
points should be noted:
The match strings for all Digester rules have the prefix
<em>config/table</em>. The reason for this is that the XML document
that is generated by the SAX parser has a corresponding structure.
Remember when we called <code>createXMLReader()</code>, we
specified the string <em>tables</em> as argument. This means that
the resulting document will have all the content that can be found
in the configuration under the key <em>tables</em>, which happens
to be three <em>table</em> elements with their corresponding
children. The root element of this document is named <em>config</em>.
This is the default name of the root element if no other name is
specified. If we had called <code>createXMLReader("tables", "tabledef")</code>,
the root element would have been named <em>tabledef</em> and
we would have to use match strings of the form <em>tabledef/table</em>.
The call of the Digester's <code>parse()</code> method is a little
strange because we only pass in the string <em>config</em> as
argument. The fact is that if an <code>XMLReader</code> obtained
from <code>ConfigurationXMLDocument</code> is involved, the
parameter of the <code>parse()</code> method is completely
ignored. The reader always refers to the configuration source it
was constructed for. So we could have used an arbitrary string.
After the <code>parse()</code> method returns the <code>tables</code>
collection of the test class contains three <code>Table</code> objects
with all information specified in the configuration.
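For readers who do not know Digester, the effect of the rules described above can be approximated with a plain SAX handler from the JDK. This is a sketch of the idea and not Digester itself: <code>TableRulesSketch</code> is hypothetical, its nested <code>Table</code> and <code>TableField</code> classes are simplified stand-ins for the beans shown earlier (public fields instead of getters and setters), and the match paths assume a root element named <em>config</em>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical hand-written equivalent of the object-creation rules. */
final class TableRulesSketch {
    public static final class TableField {
        public String name;
        public String type;
    }

    public static final class Table {
        public String name;
        public final List<TableField> fields = new ArrayList<>();
    }

    public static List<Table> parse(String xml) {
        final List<Table> tables = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final List<String> path = new ArrayList<>();
            private final StringBuilder text = new StringBuilder();
            private Table table;
            private TableField field;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path.add(qName);
                text.setLength(0);
                String match = String.join("/", path);
                if (match.equals("config/table")) {
                    table = new Table();               // "object create" for a table
                } else if (match.equals("config/table/fields/field")) {
                    field = new TableField();          // "object create" for a field
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                switch (String.join("/", path)) {
                    case "config/table/name": table.name = value; break;
                    case "config/table/fields/field/name": field.name = value; break;
                    case "config/table/fields/field/type": field.type = value; break;
                    case "config/table/fields/field": table.fields.add(field); break; // "set next"
                    case "config/table": tables.add(table); break;                    // "set next"
                    default: break;
                }
                path.remove(path.size() - 1);
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse table definitions", e);
        }
        return tables;
    }
}
```

The handler would have to be rewritten if the root element had another name, which is the same dependency on the root element name that the Digester match strings have.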
<subsection name="Calling Digester for creating simple objects">
If an application's configuration defines complex objects that should
be instantiated using Digester, it will usually be necessary to provide
specific Digester rules as shown in the last section. But another
use case is to create an object whose class name is defined in the
configuration and to perform some simple initialization on it.
Imagine the example database application wants to open a connection to
a database. Because it is a very sophisticated application, it supports
lots of different databases. To achieve this there is an abstract
<code>ConnectionInfo</code> class that provides typical connection
properties (such as user name, password and the name of the database)
and an abstract <code>createConnection()</code> method, which establishes
the connection to the database:
import java.sql.Connection;
import java.sql.SQLException;

public abstract class ConnectionInfo {
    private String dsn;
    private String user;
    private String passwd;

    public String getDsn() {
        return dsn;
    }

    public String getPasswd() {
        return passwd;
    }

    public String getUser() {
        return user;
    }

    public void setDsn(String string) {
        dsn = string;
    }

    public void setPasswd(String string) {
        passwd = string;
    }

    public void setUser(String string) {
        user = string;
    }

    /**
     * Creates a connection to a database.
     * Must be defined in sub classes.
     * @return the created connection
     */
    public abstract Connection createConnection() throws SQLException;
}
There are now some subclasses of this class, one for each supported
database, each with a proper implementation of <code>createConnection()</code>.
When the example application is run, it does not know in advance
which database it has to access. Instead it determines the database
driver class from a configuration property, creates an instance of it,
and uses it to obtain a connection. For this use case the
<code>callDigester()</code> method of <code>ConfigurationXMLDocument</code>
was designed.
To demonstrate this feature we start with an additional configuration source
that defines the connection of the database to be used. We call it
<?xml version="1.0" encoding="UTF-8"?>
<class name="myapp.ConnectionInfoOracle">
<property name="dsn" value="MyData"/>
<property name="user" value="scott"/>
<property name="passwd" value="tiger"/>
This configuration file shows the typical structure of XML code
that defines an object to be created using Digester: there must be
a <code>class</code> element with a <code>name</code> attribute
defining the fully qualified name of the class to be instantiated. In the
body of the <code>class</code> element there can be an arbitrary
number of <code>property</code> elements. Each element defines
the name and value of a property to be set, so Digester will call a
corresponding setter method on the bean just created. Here we set
properties with the names <em>dsn, user</em> and <em>passwd</em>.
As you can see, our <code>ConnectionInfo</code> base class has
getter and setter methods for exactly these properties. We now have to
include this additional configuration file in our configuration definition file:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Configuration definition file that demonstrates the
override and additional sections -->
<properties fileName=""/>
<xml fileName="gui.xml"/>
<hierarchicalXml fileName="tables.xml"/>
<hierarchicalXml fileName="tasktables.xml" at="tables"/>
<hierarchicalXml fileName="connection.xml"/>
After that it is now quite easy to obtain an object with information
about a database connection. The following code fragment shows
how a connection can be opened (it assumes that the <code>Configuration</code>
object obtained from the configuration factory is stored in a variable named
ConfigurationXMLDocument configDoc = new ConfigurationXMLDocument(config);
ConnectionInfo connInfo = (ConnectionInfo) configDoc.callDigester("connection");
Connection conn = connInfo.createConnection();
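What is described here for simple objects (instantiate the class named in the configuration, then call one setter per <code>property</code> element) can be approximated with plain reflection. This is only a sketch under that reading, not the actual implementation of <code>callDigester()</code>: <code>BeanFromConfigSketch</code>, <code>create()</code> and <code>DemoConnectionInfo</code> are hypothetical, the class name and properties are passed in directly instead of being read from a configuration, and only <code>String</code> setters are handled.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical reflection-based stand-in for creating a configured bean. */
final class BeanFromConfigSketch {
    /** Stand-in for a concrete ConnectionInfo subclass; it opens no connection. */
    public static class DemoConnectionInfo {
        private String dsn;
        private String user;

        public String getDsn() {
            return dsn;
        }

        public String getUser() {
            return user;
        }

        public void setDsn(String string) {
            dsn = string;
        }

        public void setUser(String string) {
            user = string;
        }
    }

    /**
     * Creates an instance of the named class via its no-argument constructor
     * and calls setXxx(String) for every entry of the properties map.
     */
    public static Object create(String className, Map<String, String> properties) {
        try {
            Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                String name = entry.getKey();
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method method = bean.getClass().getMethod(setter, String.class);
                method.invoke(bean, entry.getValue());
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create " + className, e);
        }
    }
}
```

The caller casts the result to the expected base type, just as the fragment above casts the result of <code>callDigester()</code> to <code>ConnectionInfo</code>; an unknown class name or a property without a matching setter ends in an exception.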
<subsection name="Some caveats">
At the end of this section about <code>ConfigurationXMLDocument</code>,
here are some points a developer should be aware of:
The class should not be used on the <code>Configuration</code>
object obtained from <code>ConfigurationFactory</code> as a
whole. Because this configuration can contain multiple configuration
sources including those that override other properties the results
are probably not what you expect. You can of course pass such a
composite configuration to the constructor of
<code>ConfigurationXMLDocument</code>, but you should then,
when you call methods on this instance, always provide a
configuration key that selects certain parts of the configuration.
The XML processing abilities of the class naturally work best
when a <code>HierarchicalConfiguration</code> object is
involved. There is also support for other types of configuration
sources, but this will work well only for very simple properties.
The problem here is the loss of information concerning the
structure of the properties (as was explained in an earlier section).
So if you read a configuration source
using the <code>xml</code> element rather than the
<code>hierarchicalXml</code> element, XML documents
generated by <code>ConfigurationXMLDocument</code> may
look weird; the same is true for <code>properties</code>
If you read an XML configuration file and then save it again
using <code>ConfigurationXMLDocument.write()</code>, the
result is not guaranteed to be identical to the original file.
While the document structure is kept (i.e. the relation between
elements and their children), there may be differences in the
order in which elements are written.