| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Copyright 2002-2004 The Apache Software Foundation |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.2//EN" "document-v12.dtd" [ |
| |
| ]> |
| |
| <document> |
| <header> |
| <title>XML Validation</title> |
| <subtitle>DTDs, catalogs and whatnot</subtitle> |
| <version>$Revision: 1.19 $</version> |
| </header> |
| |
| <body> |
| <section id="xml_validation"> |
| <title>XML validation</title> |
| <p> |
| By default, Forrest will try to validate your XML before generating |
| HTML or a webapp from it, and fail if any XML files are not valid. |
| Validation can be performed manually by typing |
| '<code>forrest validate</code>' in the project root. |
| </p> |
| <p> |
| For an XML file to be valid, it <em>must</em> have a DOCTYPE |
| declaration at the top, indicating its content type. Hence by |
| default, any Forrest-processed XML file that lacks a DOCTYPE |
| declaration will cause the build to break. |
| </p> |
| <p> |
| Despite the strict default behavior, Forrest is quite flexible about |
| validation. Validation can be switched off for certain sections of a |
| project. In validated sections, it is possible for projects to specify |
| exactly what files they want (and don't want) validated. Forrest |
| validation is controlled through a set of properties in |
| <code>forrest.properties</code>: |
| </p> |
| <source> |
| ############## |
| # validation properties |
| |
| # These props determine if validation is performed at all |
| # Values are inherited unless overridden. |
| # Eg, if forrest.validate=false, then all others are false unless set to true. |
| #forrest.validate=true |
| #forrest.validate.xdocs=${forrest.validate} |
| #forrest.validate.skinconf=${forrest.validate} |
| #forrest.validate.sitemap=${forrest.validate} |
| #forrest.validate.stylesheets=${forrest.validate} |
| #forrest.validate.skins=${forrest.validate} |
| #forrest.validate.skins.stylesheets=${forrest.validate.skins} |
| |
| |
| # Key: |
| # *.failonerror=(true|false) stop when an XML file is invalid |
| # *.includes=(pattern) Comma-separated list of path patterns to validate |
| # *.excludes=(pattern) Comma-separated list of path patterns to not validate |
| |
| #forrest.validate.failonerror=true |
| #forrest.validate.includes=**/* |
| #forrest.validate.excludes= |
| # |
| #forrest.validate.xdocs.failonerror=${forrest.validate.failonerror} |
| # |
| #forrest.validate.xdocs.includes=**/*.x* |
| #forrest.validate.xdocs.excludes= |
| # |
| #forrest.validate.skinconf.includes=${skinconf-file} |
| #forrest.validate.skinconf.excludes= |
| #forrest.validate.skinconf.failonerror=${forrest.validate.failonerror} |
| # |
| #forrest.validate.sitemap.includes=${sitemap-file} |
| #forrest.validate.sitemap.excludes= |
| #forrest.validate.sitemap.failonerror=${forrest.validate.failonerror} |
| # |
| #forrest.validate.stylesheets.includes=**/*.xsl |
| #forrest.validate.stylesheets.excludes= |
| #forrest.validate.stylesheets.failonerror=${forrest.validate.failonerror} |
| # |
| #forrest.validate.skins.includes=**/* |
| #forrest.validate.skins.excludes=**/*.xsl |
| #forrest.validate.skins.failonerror=${forrest.validate.failonerror} |
| # |
| #forrest.validate.skins.stylesheets.includes=**/*.xsl |
| #forrest.validate.skins.stylesheets.excludes= |
| #forrest.validate.skins.stylesheets.failonerror=${forrest.validate.skins.failonerror} |
| </source> |
| <p> |
| For example, to avoid validating |
| <code>${project.xdocs-dir}</code>/slides.xml and everything inside the |
| <code>${project.xdocs-dir}/manual/</code> directory, add this to |
| <code>forrest.properties</code>: |
| </p> |
| <source>forrest.validate.excludes=slides.xml, manual/**</source> |
| <note> |
| The <code>failonerror</code> properties only work for files validated |
| with <xmlvalidate>, not (yet) for those validated |
| with <jing>, where <code>failonerror</code> defaults to |
| <code>true</code>. |
| </note> |
| </section> |
| |
| <section> |
| <title>Validating new XML types</title> |
| <p> |
| Forrest provides a <link href="ext:catalog_spec">SGML Catalog</link> |
| [<link href="ext:catalog_intro">tutorial</link>], |
| <code>xml-forrest/src/resources/schema/catalog.xcat</code>, as a means of |
| associating public identifiers (<code>-//APACHE//DTD Documentation |
| V1.1//EN</code> above) with DTDs. |
| If you <link href="site:new_content_type">add a new content type</link>, you |
| should add the DTD to <code>${project.schema-dir}/dtd/</code> and add |
| an entry to the <code>${project.schema-dir}/catalog.xcat</code> file. This |
| section describes the details of this process. |
| </p> |
| |
| <section> |
| <title>Creating or extending a DTD</title> |
| <p> |
| The main Forrest DTDs are designed to be modular and extensible, so |
| it is fairly easy to create a new document type that is a superset |
| of one from Forrest. This is what we'll demonstrate here, using our |
| earlier <link href="site:new_content_type">download format</link> |
| as an example. Our download format adds a group of new elements to |
| the standard 'documentv11' format. Our new elements are described |
| by the following DTD: |
| </p> |
| <source> |
| <!ELEMENT release (downloads)> |
| <!ATTLIST release |
| version CDATA #REQUIRED |
| date CDATA #REQUIRED> |
| |
| <!ELEMENT downloads (file*)> |
| |
| <!ELEMENT file EMPTY> |
| <!ATTLIST file |
| url CDATA #REQUIRED |
| name CDATA #REQUIRED |
| size CDATA #IMPLIED> |
| </source> |
| <p> |
| The documentv11 entities are defined in a reusable 'module': |
| <code>xml-forrest/src/resources/schema/dtd/document-v11.mod</code> |
| The |
| <code>xml-forrest/src/resources/schema/dtd/document-v11.dtd</code> |
| file provides a full description and basic example of how to pull in |
| modules. In our example, our DTD reuses modules |
| <code>common-charents-v10.mod</code> and |
| <code>document-v11.mod</code>. Here is the full DTD, with |
| explanation to follow. |
| </p> |
| <source> |
| <!-- =================================================================== |
| |
| Download Doc format |
| |
| PURPOSE: |
| This DTD provides simple extensions on the Apache DocumentV11 format to link |
| to a set of downloadable files. |
| |
| TYPICAL INVOCATION: |
| |
| <!DOCTYPE document PUBLIC "-//Acme//DTD Download Documentation V1.0//EN" |
| "download-v11.dtd"> |
| |
| |
| AUTHORS: |
| Jeff Turner <jefft@apache.org> |
| |
| |
| COPYRIGHT: |
| Copyright (c) 2002 The Apache Software Foundation. |
| |
| Permission to copy in any form is granted provided this notice is |
| included in all copies. Permission to redistribute is granted |
| provided this file is distributed untouched in all its parts and |
| included files. |
| |
| ==================================================================== --> |
| |
| |
| <!-- =============================================================== --> |
| <!-- Include the Common ISO Character Entity Sets --> |
| <!-- =============================================================== --> |
| |
| <!ENTITY % common-charents PUBLIC |
| "-//APACHE//ENTITIES Common Character Entity Sets V1.0//EN" |
| "common-charents-v10.mod"> |
| %common-charents; |
| |
| <!-- =============================================================== --> |
| <!-- Document --> |
| <!-- =============================================================== --> |
| |
| <!ENTITY % document PUBLIC |
| "-//APACHE//ENTITIES Documentation V1.1//EN" |
| "document-v11.mod"> |
| |
| <!-- Override this entity so that 'release' is allowed below 'section' --> |
| <!ENTITY % local.sections "|release"> |
| |
| %document; |
| |
| <!ELEMENT release (downloads)> |
| <!ATTLIST release |
| version CDATA #REQUIRED |
| date CDATA #REQUIRED> |
| |
| <!ELEMENT downloads (file*)> |
| |
| <!ELEMENT file EMPTY> |
| <!ATTLIST file |
| url CDATA #REQUIRED |
| name CDATA #REQUIRED |
| size CDATA #IMPLIED> |
| |
| <!-- =============================================================== --> |
| <!-- End of DTD --> |
| <!-- =============================================================== --> |
| </source> |
| <p>This custom DTD should be placed in <code>src/documentation/resources/schema/dtd/</code></p> |
| <p> |
| The <!ENTITY % ... > blocks are so-called <link |
| href="http://www.xml.com/axml/target.html#dt-PERef">parameter |
| entities</link>. They are like macros, whose content will be |
| inserted when a parameter-entity reference, like |
| <code>%common-charents;</code> or <code>%document;</code> is |
| inserted. |
| </p> |
| <p> |
| In our DTD, we first pull in the 'common-charents' entity, which |
| defines character symbol sets. We then define the 'document' |
| entity. However, before the <code>%document;</code> PE reference, we |
| first override the 'local.section' entity. This is a hook into |
| document-v11.mod. By setting its value to '|release', we declare |
| that our <release> element is to be allowed wherever "local |
| sections" are used. There are five or so such hooks for different |
| areas of the document; see document-v11.dtd for more details. We then |
| import the %document; contents, and declare the rest of our DTD |
| elements. |
| </p> |
| <p> |
| We now have a DTD for the 'download' document type. |
| </p> |
| <note><link |
| href="http://www.oasis-open.org/docbook/documentation/reference/html/ch05.html">Chapter |
| 5: Customizing DocBook</link> of Norman Walsh's "DocBook: The |
| Definitive Guide" gives a complete overview of the process of |
| customizing a DTD. |
| </note> |
| </section> |
| <section> |
| <title>Associating DTDs with document types</title> |
| |
| <p> |
| Recall that our DOCTYPE declaration for our download document type |
| is: |
| </p> |
| <source><!DOCTYPE document PUBLIC "-//Acme//DTD Download Documentation V1.0//EN" |
| "download-v11.dtd"> |
| </source> |
| <p> |
| We only care about the quoted section after <code>PUBLIC</code>, called |
| the "public identifier", which globally identifies our document type. |
| We cannot rely on the subsequent "system identifier" part |
| ("download-v11.dtd"), because as a relative reference it is liable to |
| break. The solution Forrest uses is to ignore the system id, and rely |
| on a mapping from the public ID to a stable DTD location, via a |
| Catalog file.</p> |
| <note> |
| See <link href="ext:catalog_intro">this article</link> for a good |
| introduction to catalogs and the Cocoon documentation |
| <link href="ext:cocoon/catalogs">Entity resolution with catalogs</link>. |
| </note> |
| <p> |
| Forrest provides a standard catalog file at |
| <code>xml-forrest/src/resources/schema/catalog.xcat</code> for the document |
| types that Forrest provides. Projects can augment this with their |
| own catalog file located in |
| <code>${project.schema-dir}/catalog.xcat</code>. |
| Use the "project.catalog" property in <code>forrest.properties</code> |
| if you need a different pathname. Remember to raise the "verbosity" |
| level if you suspect problems with your catalog. |
| </p> |
| <p> |
| Forrest uses the XML Catalog syntax by default, although the OASIS |
| format can also be used. Here is what the XML format looks like: |
| </p> |
| <source><![CDATA[<?xml version="1.0"?> |
| <!-- OASIS XML Catalog for Forrest --> |
| <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> |
| <public publicId="-//Acme//DTD Download Documentation V1.0//EN" |
| uri="dtd/download.dtd"/> |
| </catalog>]]></source> |
| <p> |
| The format is described in <link href="ext:catalog_spec">the |
| spec</link>, and is fairly simple. <code>public</code> elements map |
| a public identifier to a DTD (relative to the catalog file). |
| </p> |
| <p> |
| We now have a custom DTD and a catalog mapping which lets Forrest |
| locate the DTD. Now if we were to run '<code>forrest validate</code>' |
| our download file would validate along with all the others. If |
| something goes wrong, try running <code>forrest -v validate</code> to |
| see the error in more detail. |
| </p> |
| </section> |
| </section> |
| <section> |
| <title>Validating in an editor</title> |
| <p> |
| If you have an XML editor that understands SGML or XML catalogs, let |
| it know where the Forrest catalog file is, and you will be able to |
| validate any Forrest XML file, regardless of location, as you edit |
| your files. |
| </p> |
| <section> |
| <title>Case study: setting up xmllint</title> |
| <p> |
| On *nix systems, one of the best XML validation tools is |
| <code>xmllint</code>, that comes as part of the libxml2 package. It is |
| very fast, can validate whole directories of files at once, and can |
| configured to use Forrest's catalog file for validation. |
| </p> |
| <p> |
| To tell xmllint where the Forrest catalog is, add the path to the catalog |
| file to the <code>SGML_CATALOG_FILES</code> variable. For example: |
| </p> |
| <source>export SGML_CATALOG_FILES=$SGML_CATALOG_FILES:\ |
| /home/jeff/apache/xml/xml-forrest/src/resources/schema/catalog |
| </source> |
| <p> |
| Then Forrest XML files can be validated as follows: |
| </p> |
| <source> xmllint --valid --noout --catalogs *.xml |
| </source> |
| <p> |
| For users of the vim editor, the following .vimrc entries are useful: |
| </p> |
| <source> |
| au FileType xml set efm=%A%f:%l:\ %.%#error:\ %m,%-Z%p^,%-C%.%# |
| au FileType xml set makeprg=xmllint\ --noout\ --valid\ --catalogs\ % |
| </source> |
| </section> |
| </section> |
| |
| <anchor id="relaxng"/> |
| <section> |
| <title>Validation using RELAX NG</title> |
| <p> |
| Other validation is also conducted during build-time using RELAX NG. |
| This validates all of the important configuration files, both in |
| Forrest itself and in your project. At the moment it processes all |
| skinconf.xml files, all sitemap.xmap files, and all XSLT stylesheets. |
| </p> |
| <p> |
| The RNG grammars to do this are located in the |
| <code>src/resources/schema/relaxng</code> directory. If you want to |
| know more about this, and perhaps extend it for your own use, then |
| see <code>src/resources/schema/relaxng/README.txt</code> and the Ant |
| targets in the various build.xml files. |
| </p> |
| </section> |
| </body> |
| </document> |