| <?xml version="1.0"?> | 
 | <!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd"> | 
 | <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?> | 
 | <!-- $LastChangedRevision$ --> | 
 |  | 
 | <!-- | 
 |  Licensed to the Apache Software Foundation (ASF) under one or more | 
 |  contributor license agreements.  See the NOTICE file distributed with | 
 |  this work for additional information regarding copyright ownership. | 
 |  The ASF licenses this file to You under the Apache License, Version 2.0 | 
 |  (the "License"); you may not use this file except in compliance with | 
 |  the License.  You may obtain a copy of the License at | 
 |  | 
 |      http://www.apache.org/licenses/LICENSE-2.0 | 
 |  | 
 |  Unless required by applicable law or agreed to in writing, software | 
 |  distributed under the License is distributed on an "AS IS" BASIS, | 
 |  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | 
 |  See the License for the specific language governing permissions and | 
 |  limitations under the License. | 
 | --> | 
 |  | 
 | <modulesynopsis metafile="mod_xml2enc.xml.meta"> | 
 |  | 
 | <name>mod_xml2enc</name> | 
 | <description>Enhanced charset/internationalisation support for libxml2-based | 
 | filter modules</description> | 
 | <status>Base</status> | 
 | <sourcefile>mod_xml2enc.c</sourcefile> | 
 | <identifier>xml2enc_module</identifier> | 
 | <compatibility>Version 2.4 and later.  Available as a third-party module | 
 | for 2.2.x versions</compatibility> | 
 |  | 
 | <summary> | 
 |     <p>This module provides enhanced internationalisation support for | 
 |     markup-aware filter modules such as <module>mod_proxy_html</module>. | 
 |     It can automatically detect the encoding of input data and ensure | 
 |     they are correctly processed by the <a href="http://xmlsoft.org/" | 
 |     >libxml2</a> parser, including converting to Unicode (UTF-8) where | 
 |     necessary.  It can also convert data to an encoding of choice | 
 |     after markup processing, and will ensure the correct <var>charset</var> | 
 |     value is set in the HTTP <var>Content-Type</var> header.</p> | 
 | </summary> | 
 |  | 
 | <section id="usage"><title>Usage</title> | 
 |     <p>There are two usage scenarios: with modules programmed to work | 
 |     with mod_xml2enc, and with those that are not aware of it:</p> | 
 |     <dl> | 
 |     <dt>Filter modules enabled for mod_xml2enc</dt><dd> | 
 |     <p>Modules such as <module>mod_proxy_html</module> version 3.1 | 
 |     and up use the <code>xml2enc_charset</code> optional function to retrieve | 
 |     the charset argument to pass to the libxml2 parser, and may use the | 
 |     <code>xml2enc_filter</code> optional function to postprocess to another | 
 |     encoding.  When mod_xml2enc is used with an enabled module, no | 
 |     configuration is necessary: the other module will configure mod_xml2enc | 
 |     for you (though you may still want to customise it using the | 
 |     configuration directives below).</p> | 
 |     </dd> | 
 |     <dt>Non-enabled modules</dt><dd> | 
 |     <p>To use it with a libxml2-based module that isn't explicitly enabled for | 
 |     mod_xml2enc, you will have to configure the filter chain yourself.  So to | 
 |     use it with a filter <strong>foo</strong> provided by a module | 
 |     <strong>mod_foo</strong> to improve the latter's i18n support with HTML and | 
 |     XML, you could use</p> | 
 |     <highlight language="config"> | 
 | FilterProvider iconv    xml2enc "%{CONTENT_TYPE} =~ m!text/html!" | 
 | FilterProvider iconv    xml2enc "%{CONTENT_TYPE} =~ m!xml!" | 
 | FilterProvider markup   foo     "%{CONTENT_TYPE} =~ m!text/html!" | 
 | FilterProvider markup   foo     "%{CONTENT_TYPE} =~ m!xml!" | 
 | FilterChain     iconv markup | 
 |     </highlight> | 
 |     <p><strong>mod_foo</strong> will now support any character set supported | 
 |     by libxml2, by apr_xlate/iconv, or by both.</p> | 
 |     </dd></dl> | 
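 |     <p>As a rough sketch of the first scenario, the following assumes a | 
 |     reverse proxy that uses <module>mod_proxy_html</module>.  The module | 
 |     paths, the <code>/app/</code> prefix and the backend host name are | 
 |     illustrative assumptions, not values required by mod_xml2enc.  The only | 
 |     point is that mod_xml2enc is loaded alongside the enabled module and | 
 |     needs no directives of its own:</p> | 
 |     <highlight language="config"> | 
 | # Module paths are typical but installation-dependent (assumption) | 
 | LoadModule proxy_module      modules/mod_proxy.so | 
 | LoadModule proxy_http_module modules/mod_proxy_http.so | 
 | LoadModule proxy_html_module modules/mod_proxy_html.so | 
 | LoadModule xml2enc_module    modules/mod_xml2enc.so | 
 |  | 
 | # Hypothetical backend and URL prefix | 
 | ProxyPass        "/app/" "http://backend.example.com/" | 
 | ProxyPassReverse "/app/" "http://backend.example.com/" | 
 |  | 
 | # mod_proxy_html rewrites links; charset handling is delegated to mod_xml2enc | 
 | ProxyHTMLEnable  On | 
 | ProxyHTMLURLMap  "http://backend.example.com/" "/app/" | 
 |     </highlight> | 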
 | </section> | 
 |  | 
 | <section id="api"><title>Programming API</title> | 
 |     <p>Programmers writing libxml2-based filter modules are encouraged to | 
 |     enable them for mod_xml2enc, to provide strong i18n support for their | 
 |     users without reinventing the wheel.  The programming API is exposed in | 
 |     <var>mod_xml2enc.h</var>, and <module>mod_proxy_html</module> serves as | 
 |     a usage example.</p> | 
 | </section> | 
 |  | 
 | <section id="sniffing"><title>Detecting an Encoding</title> | 
 |     <p>Unlike <module>mod_charset_lite</module>, mod_xml2enc is designed | 
 |     to work with data whose encoding cannot be known in advance and | 
 |     therefore cannot be configured.  It uses 'sniffing' techniques to | 
 |     detect the encoding of HTTP data as follows:</p> | 
 |     <ol> | 
 |         <li>If the HTTP <var>Content-Type</var> header includes a | 
 |         <var>charset</var> parameter, that is used.</li> | 
 |         <li>If the data start with an XML Byte Order Mark (BOM) or an | 
 |         XML encoding declaration, that is used.</li> | 
 |         <li>If an encoding is declared in an HTML <code>&lt;META&gt;</code> | 
 |         element, that is used.</li> | 
 |         <li>If none of the above match, the default value set by | 
 |         <directive>xml2EncDefault</directive> is used.</li> | 
 |     </ol> | 
 |     <p>The rules are applied in order.  As soon as a match is found, | 
 |     it is used and detection is stopped.</p> | 
 | </section> | 
 |  | 
 | <section id="output"><title>Output Encoding</title> | 
 | <p><a href="http://xmlsoft.org/">libxml2</a> always uses UTF-8 (Unicode) | 
 | internally, and libxml2-based filter modules will output that by default. | 
 | mod_xml2enc can change the output encoding through the API, but there | 
 | is currently no way to configure that directly.</p> | 
 | <p>Changing the output encoding should (in theory, at least) never be | 
 | necessary, and is not recommended due to the extra processing load on | 
 | the server of an unnecessary conversion.</p> | 
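 | <p>Where an output encoding other than UTF-8 is nevertheless wanted, it is | 
 | selected through an enabled module rather than through mod_xml2enc itself. | 
 | As a minimal sketch, assuming <module>mod_proxy_html</module> is the enabled | 
 | module in use, its <code>ProxyHTMLCharsetOut</code> directive can request a | 
 | different charset; the value shown is chosen purely for illustration:</p> | 
 | <highlight language="config"> | 
 | # Illustrative only: emit ISO-8859-1 instead of libxml2's native UTF-8. | 
 | # This relies on mod_xml2enc being loaded and adds a conversion step. | 
 | ProxyHTMLCharsetOut ISO-8859-1 | 
 | </highlight> | 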
 | </section> | 
 |  | 
 | <section id="alias"><title>Unsupported Encodings</title> | 
 | <p>If you are working with encodings that are not supported by any of | 
 | the conversion methods available on your platform, you can still alias | 
 | them to a supported encoding using <directive>xml2EncAlias</directive>.</p> | 
 | </section> | 
 |  | 
 | <directivesynopsis> | 
 | <name>xml2EncDefault</name> | 
 | <description>Sets a default encoding to assume when absolutely no information | 
 | can be <a href="#sniffing">automatically detected</a></description> | 
 | <syntax>xml2EncDefault <var>name</var></syntax> | 
 | <contextlist><context>server config</context> | 
 | <context>virtual host</context><context>directory</context> | 
 | <context>.htaccess</context></contextlist> | 
 | <override>All</override> | 
 |  | 
 | <usage> | 
 |     <p>If you are processing data whose encoding you know but which carry | 
 |     no encoding information, you can set this default to help mod_xml2enc | 
 |     process the data correctly.  For example, to work with the default value | 
 |     of Latin1 (<var>iso-8859-1</var>) specified in HTTP/1.0, use:</p> | 
 |     <highlight language="config"> | 
 | xml2EncDefault iso-8859-1 | 
 |     </highlight> | 
 | </usage> | 
 | </directivesynopsis> | 
 |  | 
 | <directivesynopsis> | 
 | <name>xml2EncAlias</name> | 
 | <description>Recognise Aliases for encoding values</description> | 
 | <syntax>xml2EncAlias <var>charset alias [alias ...]</var></syntax> | 
 | <contextlist><context>server config</context></contextlist> | 
 |  | 
 | <usage> | 
 |     <p>This server-wide directive aliases one or more encodings to another | 
 |     encoding.  This enables encodings not recognised by libxml2 to be handled | 
 |     internally by libxml2's encoding support using the translation table for | 
 |     a recognised encoding.  This serves two purposes: to support character sets | 
 |     (or names) not recognised either by libxml2 or iconv, and to skip | 
 |     conversion for an encoding where it is known to be unnecessary.</p> | 
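 |     <p>As a hedged illustration of the syntax, the first argument names the | 
 |     encoding that is already supported and the remaining arguments are the | 
 |     names to be treated as equivalent to it.  The alias names below are | 
 |     invented for the example and do not refer to any real registered | 
 |     charset labels:</p> | 
 |     <highlight language="config"> | 
 | # Hypothetical labels sent by a backend, both handled as ISO-8859-1 | 
 | xml2EncAlias ISO-8859-1 x-latin1 latin-one | 
 |     </highlight> | 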
 | </usage> | 
 | </directivesynopsis> | 
 |  | 
 | <directivesynopsis> | 
 | <name>xml2StartParse</name> | 
 | <description>Advise the parser to skip leading junk.</description> | 
 | <syntax>xml2StartParse <var>element [element ...]</var></syntax> | 
 | <contextlist><context>server config</context><context>virtual host</context> | 
 | <context>directory</context><context>.htaccess</context></contextlist> | 
 | <override>All</override> | 
 |  | 
 | <usage> | 
 |     <p>Specify that the markup parser should start at the first instance | 
 |     of any of the elements specified.  This can be used as a workaround | 
 |     where a broken backend inserts leading junk that messes up the parser (<a | 
 |     href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/" | 
 |     >example here</a>).</p> | 
 |     <p>It should never be used for XML or for well-formed HTML.</p> | 
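 |     <p>A minimal sketch of the workaround, assuming that element names are | 
 |     given as bare names without angle brackets and that the broken backend | 
 |     emits its junk before the document's opening markup.  The choice of | 
 |     elements is an assumption made for illustration:</p> | 
 |     <highlight language="config"> | 
 | # Start parsing at the first html or head element, whichever comes first | 
 | xml2StartParse html head | 
 |     </highlight> | 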
 | </usage> | 
 | </directivesynopsis> | 
 |  | 
 | </modulesynopsis> |