| <?xml version="1.0"?> |
| <!-- |
| Copyright 2004-2005 The Apache Software Foundation |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <document> |
| |
| <properties> |
| <title>XML Howto</title> |
| <author email="oliver.heger@t-online.de">Oliver Heger</author> |
| </properties> |
| |
| <body> |
| <section name="Using XML based Configurations"> |
| <p> |
| This section explains how to use hierarchical |
| and structured XML datasets. |
| </p> |
| </section> |
| |
| <section name="Hierarchical properties"> |
| <p> |
| Because of its |
| tree-like nature XML documents can represent data that is |
| structured in many ways. This section explains how to deal with |
| such structured documents and demonstrates the enhanced query |
| facilities supported by the <code>XMLConfiguration</code> class.. |
| </p> |
| <subsection name="Accessing properties defined in XML documents"> |
| <p> |
| We will start with a simple XML document to show some basics |
| about accessing properties. The following file named |
| <code>gui.xml</code> is used as example document: |
| </p> |
| <source><![CDATA[ |
| <?xml version="1.0" encoding="ISO-8859-1" ?> |
| <gui-definition> |
| <colors> |
| <background>#808080</background> |
| <text>#000000</text> |
| <header>#008000</header> |
| <link normal="#000080" visited="#800080"/> |
| <default>${colors.header}</default> |
| </colors> |
| <rowsPerPage>15</rowsPerPage> |
| <buttons> |
| <name>OK,Cancel,Help</name> |
| </buttons> |
| <numberFormat pattern="###\,###.##"/> |
| </gui-definition> |
| ]]></source> |
| <p> |
| (As becomes obvious, this tutorial does not bother with good |
| design of XML documents, the example file should rather |
| demonstrate the different ways of accessing properties.) |
| To access the data stored in this document it must be loaded |
| by <code>XMLConfiguration</code>. Like other file based |
| configuration classes <code>XMLConfiguration</code> supports |
| many ways of specifying the file to process. One way is to |
| pass the file name to the constructor as shown in the following |
| code fragment: |
| </p> |
| <source><![CDATA[ |
| try |
| { |
| XMLConfiguration config = new XMLConfiguration("tables.xml"); |
| // do something with config |
| } |
| catch(ConfigurationException cex) |
| { |
| // something went wrong, e.g. the file was not found |
| } |
| ]]></source> |
| <p> |
| If no exception was thrown, the properties defined in the |
| XML document are now available in the configuration object. |
| The following fragment shows how the properties can be accessed: |
| </p> |
| <source><![CDATA[ |
| String backColor = config.getString("colors.background"); |
| String textColor = config.getString("colors.text"); |
| String linkNormal = config.getString("colors.link[@normal]"); |
| String defColor = config.getString("colors.default"); |
| int rowsPerPage = config.getInt("rowsPerPage"); |
| List buttons = config.getList("buttons.name"); |
| ]]></source> |
| <p> |
| This listing demonstrates some important points about constructing |
| keys for accessing properties load from XML documents and about |
| features of <code>XMLConfiguration</code> in general: |
| <ul> |
| <li> |
| Nested elements are accessed using a dot notation. In |
| the example document there is an element |
| <code><text></code> in the body of the |
| <code><color></code> element. The corresponding |
| key is <code>color.text</code>. |
| </li> |
| <li> |
| The root element is ignored when constructing keys. In |
| the example you do not write |
| <code>gui-definition.color.text</code>, but only |
| <code>color.text</code>. |
| </li> |
| <li> |
| Attributes of XML elements are accessed in a XPath like |
| notation. |
| </li> |
| <li> |
| Interpolation can be used as in <code>PropertiesConfiguration</code>. |
| Here the <code><default></code> element in the |
| <code>colors</code> section refers to another color. |
| </li> |
| <li> |
| Lists of properties can be defined in a short form using |
| the delimiter character (which is the comma by default). |
| In this example the <code>buttons.name</code> property |
| has the three values <em>OK</em>, <em>Cancel</em>, and |
| <em>Help</em>, so it is queried using the <code>getList()</code> |
| method. This works in attributes, too. Using the static |
| <code>setDelimiter()</code> method of |
| <code>AbstractConfiguration</code> you can globally |
| define a different delimiter character or - |
| by setting the delimiter to 0 - disabling this mechanism |
| completely. Placing a backslash before a delimiter |
| character will escape it. This is demonstrated in the |
| <code>pattern</code> attribute of the <code>numberFormat</code> |
| element. |
| </li> |
| </ul> |
| </p> |
| <p> |
| In the next section will show how data in a more complex XML |
| document can be processed. |
| </p> |
| </subsection> |
| <subsection name="Structured XML"> |
| <p> |
| Consider the following scenario: An application operates on |
| database tables and wants to load a definition of the database |
| schema from its configuration. A XML document provides this |
| information. It could look as follows: |
| </p> |
| <source><![CDATA[ |
| <?xml version="1.0" encoding="ISO-8859-1" ?> |
| |
| <database> |
| <tables> |
| <table tableType="system"> |
| <name>users</name> |
| <fields> |
| <field> |
| <name>uid</name> |
| <type>long</type> |
| </field> |
| <field> |
| <name>uname</name> |
| <type>java.lang.String</type> |
| </field> |
| <field> |
| <name>firstName</name> |
| <type>java.lang.String</type> |
| </field> |
| <field> |
| <name>lastName</name> |
| <type>java.lang.String</type> |
| </field> |
| <field> |
| <name>email</name> |
| <type>java.lang.String</type> |
| </field> |
| </fields> |
| </table> |
| <table tableType="application"> |
| <name>documents</name> |
| <fields> |
| <field> |
| <name>docid</name> |
| <type>long</type> |
| </field> |
| <field> |
| <name>name</name> |
| <type>java.lang.String</type> |
| </field> |
| <field> |
| <name>creationDate</name> |
| <type>java.util.Date</type> |
| </field> |
| <field> |
| <name>authorID</name> |
| <type>long</type> |
| </field> |
| <field> |
| <name>version</name> |
| <type>int</type> |
| </field> |
| </fields> |
| </table> |
| </tables> |
| </database> |
| ]]></source> |
| <p> |
| This XML is quite self explanatory; there is an arbitrary number |
| of table elements, each of it has a name and a list of fields. |
| A field in turn consists of a name and a data type. This |
| XML document (let's call it <code>tables.xml</code>) can be |
| loaded in exactly the same way as the simple document in the |
| section before. |
| </p> |
| <p> |
| When we now want to access some of the properties we face a |
| problem: the syntax for constructing configuration keys we |
| learned so far is not powerful enough to access all of the data |
| stored in the tables document. |
| </p> |
| <p> |
| Because the document contains a list of tables some properties |
| are defined more than once. E.g. the configuration key |
| <code>tables.table.name</code> refers to a <code>name</code> |
| element inside a <code>table</code> element inside a |
| <code>tables</code> element. This constellation happens to |
| occur twice in the tables document. |
| </p> |
| <p> |
| Multiple definitions of a property do not cause problems and are |
| supported by all classes of Configuration. If such a property |
| is queried using <code>getProperty()</code>, the method |
| recognizes that there are multiple values for that property and |
| returns a collection with all these values. So we could write |
| </p> |
| <source><![CDATA[ |
| Object prop = config.getProperty("tables.table.name"); |
| if(prop instanceof Collection) |
| { |
| System.out.println("Number of tables: " + ((Collection) prop).size()); |
| } |
| ]]></source> |
| <p> |
| An alternative to this code would be the <code>getList()</code> |
| method of <code>Configuration</code>. If a property is known to |
| have multiple values (as is the table name property in this example), |
| <code>getList()</code> allows to retrieve all values at once. |
| <b>Note:</b> it is legal to call <code>getString()</code> |
| or one of the other getter methods on a property with multiple |
| values; it returns the first element of the list. |
| </p> |
| </subsection> |
| <subsection name="Accessing structured properties"> |
| <p> |
| Okay, we can obtain a list with the name of all defined |
| tables. In the same way we can retrieve a list with the names |
| of all table fields: just pass the key |
| <code>tables.table.fields.field.name</code> to the |
| <code>getList()</code> method. In our example this list |
| would contain 10 elements, the names of all fields of all tables. |
| This is fine, but how do we know, which field belongs to |
| which table? |
| </p> |
| <p> |
| When working with such hierarchical structures the configuration keys |
| used to query properties can have an extended syntax. All components |
| of a key can be appended by a numerical value in parentheses that |
| determines the index of the affected property. So if we have two |
| <code>table</code> elements we can exactly specify, which one we |
| want to address by appending the corresponding index. This is |
| explained best by some examples: |
| </p> |
| <p> |
| We will now provide some configuration keys and show the results |
| of a <code>getProperty()</code> call with these keys as arguments. |
| <dl> |
| <dt><code>tables.table(0).name</code></dt> |
| <dd> |
| Returns the name of the first table (all indices are 0 based), |
| in this example the string <em>users</em>. |
| </dd> |
| <dt><code>tables.table(0)[@tableType]</code></dt> |
| <dd> |
| Returns the value of the tableType attribute of the first |
| table (<em>system</em>). |
| </dd> |
| <dt><code>tables.table(1).name</code></dt> |
| <dd> |
| Analogous to the first example returns the name of the |
| second table (<em>documents</em>). |
| </dd> |
| <dt><code>tables.table(2).name</code></dt> |
| <dd> |
| Here the name of a third table is queried, but because there |
| are only two tables result is <b>null</b>. The fact that a |
| <b>null</b> value is returned for invalid indices can be used |
| to find out how many values are defined for a certain property: |
| just increment the index in a loop as long as valid objects |
| are returned. |
| </dd> |
| <dt><code>tables.table(1).fields.field.name</code></dt> |
| <dd> |
| Returns a collection with the names of all fields that |
| belong to the second table. With such kind of keys it is |
| now possible to find out, which fields belong to which table. |
| </dd> |
| <dt><code>tables.table(1).fields.field(2).name</code></dt> |
| <dd> |
| The additional index after field selects a certain field. |
| This expression represents the name of the third field in |
| the second table (<em>creationDate</em>). |
| </dd> |
| <dt><code>tables.table.fields.field(0).type</code></dt> |
| <dd> |
| This key may be a bit unusual but nevertheless completely |
| valid. It selects the data types of the first fields in all |
| tables. So here a collection would be returned with the |
| values [<em>long, long</em>]. |
| </dd> |
| </dl> |
| </p> |
| <p> |
| These examples should make the usage of indices quite clear. |
| Because each configuration key can contain an arbitrary number |
| of indices it is possible to navigate through complex structures of |
| XML documents; each XML element can be uniquely identified. |
| </p> |
| </subsection> |
| <subsection name="Adding new properties"> |
| <p> |
| So far we have learned how to use indices to avoid ambiguities when |
| querying properties. The same problem occurs when adding new |
| properties to a structured configuration. As an example let's |
| assume we want to add a new field to the second table. New properties |
| can be added to a configuration using the <code>addProperty()</code> |
| method. Of course, we have to exactly specify where in the tree like structure new |
| data is to be inserted. A statement like |
| </p> |
| <source><![CDATA[ |
| // Warning: This might cause trouble! |
| config.addProperty("tables.table.fields.field.name", "size"); |
| ]]></source> |
| <p> |
| would not be sufficient because it does not contain all needed |
| information. How is such a statement processed by the |
| <code>addProperty()</code> method? |
| </p> |
| <p> |
| <code>addProperty()</code> splits the provided key into its |
| single parts and navigates through the properties tree along the |
| corresponding element names. In this example it will start at the |
| root element and then find the <code>tables</code> element. The |
| next key part to be processed is <code>table</code>, but here a |
| problem occurs: the configuration contains two <code>table</code> |
| properties below the <code>tables</code> element. To get rid off |
| this ambiguity an index can be specified at this position in the |
| key that makes clear, which of the two properties should be |
| followed. <code>tables.table(1).fields.field.name</code> e.g. |
| would select the second <code>table</code> property. If an index |
| is missing, <code>addProperty()</code> always follows the last |
| available element. In our example this would be the second |
| <code>table</code>, too. |
| </p> |
| <p> |
| The following parts of the key are processed in exactly the same |
| manner. Under the selected <code>table</code> property there is |
| exactly one <code>fields</code> property, so this step is not |
| problematic at all. In the next step the <code>field</code> part |
| has to be processed. At the actual position in the properties tree |
| there are multiple <code>field</code> (sub) properties. So we here |
| have the same situation as for the <code>table</code> part. |
| Because no explicit index is defined the last <code>field</code> |
| property is selected. The last part of the key passed to |
| <code>addProperty()</code> (<code>name</code> in this example) |
| will always be added as new property at the position that has |
| been reached in the former processing steps. So in our example |
| the last <code>field</code> property of the second table would |
| be given a new <code>name</code> sub property and the resulting |
| structure would look like the following listing: |
| </p> |
| <source><![CDATA[ |
| ... |
| <table tableType="application"> |
| <name>documents</name> |
| <fields> |
| <field> |
| <name>docid</name> |
| <type>long</type> |
| </field> |
| <field> |
| <name>name</name> |
| <type>java.lang.String</type> |
| </field> |
| <field> |
| <name>creationDate</name> |
| <type>java.util.Date</type> |
| </field> |
| <field> |
| <name>authorID</name> |
| <type>long</type> |
| </field> |
| <field> |
| <name>version</name> |
| <name>size</name> <== Newly added property |
| <type>int</type> |
| </field> |
| </fields> |
| </table> |
| </tables> |
| </database> |
| ]]></source> |
| <p> |
| This result is obviously not what was desired, but it demonstrates |
| how <code>addProperty()</code> works: the method follows an |
| existing branch in the properties tree and adds new leaves to it. |
| (If the passed in key does not match a branch in the existing tree, |
| a new branch will be added. E.g. if we pass the key |
| <code>tables.table.data.first.test</code>, the existing tree can be |
| navigated until the <code>data</code> part of the key. From here a |
| new branch is started with the remaining parts <code>data</code>, |
| <code>first</code> and <code>test</code>.) |
| </p> |
| <p> |
| If we want a different behavior, we must explicitely tell |
| <code>addProperty()</code> what to do. In our example with the |
| new field our intension was to create a new branch for the |
| <code>field</code> part in the key, so that a new <code>field</code> |
| property is added to the structure rather than adding sub properties |
| to the last existing <code>field</code> property. This can be |
| achieved by specifying the special index <code>(-1)</code> at the |
| corresponding position in the key as shown below: |
| </p> |
| <source><![CDATA[ |
| config.addProperty("tables.table(1).fields.field(-1).name", "size"); |
| config.addProperty("tables.table(1).fields.field.type", "int"); |
| ]]></source> |
| <p> |
| The first line in this fragment specifies that a new branch is |
| to be created for the <code>field</code> property (index -1). |
| In the second line no index is specified for the field, so the |
| last one is used - which happens to be the field that has just |
| been created. So these two statements add a fully defined field |
| to the second table. This is the default pattern for adding new |
| properties or whole hierarchies of properties: first create a new |
| branch in the properties tree and then populate its sub properties. |
| As an additional example let's add a complete new table definition |
| to our example configuration: |
| </p> |
| <source><![CDATA[ |
| // Add a new table element and define the name |
| config.addProperty("tables.table(-1).name", "versions"); |
| |
| // Add a new field to the new table |
| // (an index for the table is not necessary because the latest is used) |
| config.addProperty("tables.table.fields.field(-1).name", "id"); |
| config.addProperty("tables.table.fields.field.type", "int"); |
| |
| // Add another field to the new table |
| config.addProperty("tables.table.fields.field(-1).name", "date"); |
| config.addProperty("tables.table.fields.field.type", "java.sql.Date"); |
| ... |
| ]]></source> |
| <p> |
| For more information about adding properties to a hierarchical |
| configuration also have a look at the javadocs for |
| <code>HierarchicalConfiguration</code>. |
| </p> |
| </subsection> |
| <subsection name="Escaping dot characters in XML tags"> |
| <p> |
| In XML the dot character used as delimiter by most configuration |
| classes is a legal character that can occur in any tag. So the |
| following XML document is completely valid: |
| </p> |
| <source><![CDATA[ |
| <?xml version="1.0" encoding="ISO-8859-1" ?> |
| |
| <configuration> |
| <test.value>42</test.value> |
| <test.complex> |
| <test.sub.element>many dots</test.sub.element> |
| </test.complex> |
| </configuration> |
| ]]></source> |
| <p> |
| This XML document can be loaded by <code>XMLConfiguration</code> |
| without trouble, but when we want to access certain properties |
| we face a problem: The configuration claims that it does not |
| store any values for the properties with the keys |
| <code>test.value</code> or <code>test.complex.test.sub.element</code>! |
| </p> |
| <p> |
| Of course, it is the dot character contained in the property |
| names, which causes this problem. A dot is always interpreted |
| as a delimiter between elements. So given the property key |
| <code>test.value</code> the configuration would look for an |
| element named <code>test</code> and then for a sub element |
| with the name <code>value</code>. To change this behavior it is |
| possible to escape a dot character, thus telling the configuration |
| that it is really part of an element name. This is simply done |
| by duplicating the dot. So the following statements will return |
| the desired property values: |
| </p> |
| <source><![CDATA[ |
| int testVal = config.getInt("test..value"); |
| String complex = config.getString("test..complex.test..sub..element"); |
| ]]></source> |
| <p> |
| Note the duplicated dots whereever the dot does not act as |
| delimiter. This way it is possible to access properties containing |
| dots in arbitrary combination. However, as you can see, the |
| escaping can be confusing sometimes. So if you have a choice, |
| you should avoid dots in the tag names of your XML configuration |
| files. |
| </p> |
| </subsection> |
| </section> |
| |
| <section name="Validation of XML configuration files"> |
| <p> |
| XML parsers provide support for validation of XML documents to ensure that they |
| conform to a certain DTD. This feature can be useful for |
| configuration files, too. <code>XMLConfiguration</code> allows to enable |
| validation for the files to load. |
| </p> |
| <p> |
| The easiest way to turn on validation is to simply set the |
| <code>validating</code> property to true as shown in the |
| following example: |
| </p> |
| <source><![CDATA[ |
| XMLConfiguration config = new XMLConfiguration(); |
| config.setFileName("myconfig.xml"); |
| config.setValidating(true); |
| |
| // This will throw a ConfigurationException if the XML document does not |
| // conform to its DTD. |
| config.load(); |
| ]]></source> |
| <p> |
| Setting the <code>validating</code> flag to true will cause |
| <code>XMLConfiguration</code> to use a validating XML parser. At this parser |
| a custom <code>ErrorHandler</code> will be registered, which throws |
| exceptions on simple and fatal parsing errors. |
| </p> |
| <p> |
| While using the <code>validating</code> flag is a simple means of |
| enabling validation it cannot fullfil more complex requirements, |
| e.g. schema validation. To be able to deal with such requirements |
| XMLConfiguration provides a generic way of setting up the XML |
| parser to use: A preconfigured <code>DocumentBuilder</code> object |
| can be passed to the <code>setDocumentBuilder()</code> method. |
| </p> |
| <p> |
| So an application can create a <code>DocumentBuilder</code> object |
| and initialize it according to its special needs. Then this |
| object must be passed to the <code>XMLConfiguration</code> instance |
| before invocation of the <code>load()</code> method. When loading |
| a configuration file, the passed in <code>DocumentBuilder</code> will |
| be used instead of the default one. |
| </p> |
| </section> |
| </body> |
| |
| </document> |