blob: da4957028c608215da5a84188f097cc440db7514 [file] [log] [blame]
<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<modulesynopsis metafile="mod_proxy_html.xml.meta">
<name>mod_proxy_html</name>
<description>Rewrite HTML links in to ensure they are addressable
from Clients' networks in a proxy context.</description>
<status>Base</status>
<sourcefile>mod_proxy_html.c</sourcefile>
<identifier>proxy_html_module</identifier>
<compatibility>Version 2.4 and later. Available as a third-party module
for earlier 2.x versions</compatibility>
<summary>
<p>This module provides an output filter to rewrite HTML links in a
proxy situation, to ensure that links work for users outside the proxy.
It serves the same purpose as Apache's ProxyPassReverse directive does
for HTTP headers, and is an essential component of a reverse proxy.</p>
<p>For example, if a company has an application server at
<code>appserver.example.com</code> that is only visible from within
the company's internal network, and a public webserver
<code>www.example.com</code>, they may wish to provide a gateway to the
application server at <code>http://www.example.com/appserver/</code>.
When the application server links to itself, those links need to be
rewritten to work through the gateway. mod_proxy_html serves to rewrite
<code>&lt;a href="http://appserver.example.com/foo/bar.html"&gt;foobar&lt;/a&gt;</code> to
<code>&lt;a href="http://www.example.com/appserver/foo/bar.html"&gt;foobar&lt;/a&gt;</code>
making it accessible from outside.</p>
<p>mod_proxy_html was originally developed at Web&#222;ing, whose
extensive <a href="http://apache.webthing.com/mod_proxy_html/"
>documentation</a> may be useful to users.</p>
</summary>
<directivesynopsis>
<name>ProxyHTMLMeta</name>
<description>Turns on or off extra pre-parsing of metadata in HTML
<code>&lt;head&gt;</code> sections.</description>
<syntax>ProxyHTMLMeta <var>On|Off</var></syntax>
<default>ProxyHTMLMeta Off</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
module for earlier 2.x versions.</compatibility>
<usage>
<p>This turns on or off pre-parsing of metadata in HTML
<code>&lt;head&gt;</code> sections.</p>
<p>If not required, turning ProxyHTMLMeta Off will give a small
performance boost by skipping this parse step. However, it
is sometimes necessary for internationalisation to work correctly.</p>
<p>ProxyHTMLMeta has two effects. Firstly and most importantly
it enables detection of character encodings declared in the form</p>
<pre>&lt;meta http-equiv="Content-Type" content="text/html;charset=<var>foo</var>"&gt;</pre>
<p>or, in the case of an XHTML document, an XML declaration.
It is NOT required if the charset is declared in a real HTTP header
(which is always preferable) from the backend server, nor if the
document is <var>utf-8</var> (unicode) or a subset such as ASCII.
You may also be able to dispense with it where documents use a
default declared using <directive module="mod_xml2enc"
>xml2EncDefault</directive>, but that risks propagating an
incorrect declaration. A <directive>ProxyHTMLCharsetOut</directive>
can remove that risk, but is likely to be a bigger processing
overhead than enabling ProxyHTMLMeta.</p>
<p>The other effect of enabling ProxyHTMLMeta is to parse all
<code>&lt;meta http-equiv=...&gt;</code> declarations and convert
them to real HTTP headers, in keeping with the original purpose
of this form of the HTML &lt;meta&gt; element.</p>
<note type="warning"><title>Warning</title>
Because ProxyHTMLMeta promotes <strong>all</strong>
<code>http-equiv</code> elements to HTTP headers, it is important that you
only enable it in cases where you trust the HTML content as much as you
trust the upstream server. If the HTML is controlled by bad actors, it
will be possible for them to inject arbitrary, possibly malicious, HTTP
headers into your server's responses.
</note>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLEnable</name>
<description>Turns the proxy_html filter on or off.</description>
<syntax>ProxyHTMLEnable <var>On|Off</var></syntax>
<default>ProxyHTMLEnable Off</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
module for earlier 2.x versions.</compatibility>
<usage>
<p>A simple switch to enable or disable the proxy_html filter.
If <module>mod_xml2enc</module> is loaded it will also automatically
set up internationalisation support.</p>
<p>Note that the proxy_html filter will only act on HTML data
(Content-Type text/html or application/xhtml+xml) and when the
data are proxied. You can override this (at your own risk) by
setting the <var>PROXY_HTML_FORCE</var> environment variable.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLURLMap</name>
<description>Defines a rule to rewrite HTML links</description>
<syntax>ProxyHTMLURLMap <var>from-pattern to-pattern [flags] [cond]</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
module for earlier 2.x versions.</compatibility>
<usage>
<p>This is the key directive for rewriting HTML links. When parsing a document,
whenever a link target matches <var>from-pattern</var>, the matching
portion will be rewritten to <var>to-pattern</var>, as modified by any
flags supplied and by the
<directive module="mod_proxy_html">ProxyHTMLExtended</directive> directive.
Only the elements specified using
the <directive module="mod_proxy_html">ProxyHTMLLinks</directive> directive
will be considered as HTML links.</p>
<p>The optional third argument may define any of the following
<strong>Flags</strong>. Flags are case-sensitive.</p>
<dl>
<dt>h</dt>
<dd><p>Ignore HTML links (pass through unchanged)</p></dd>
<dt>e</dt>
<dd><p>Ignore scripting events (pass through unchanged)</p></dd>
<dt>c</dt>
<dd><p>Pass embedded script and style sections through untouched.</p></dd>
<dt>L</dt>
<dd><p>Last-match. If this rule matches, no more rules are applied
(note that this happens automatically for HTML links).</p></dd>
<dt>l</dt>
<dd><p>Opposite to L. Overrides the one-change-only default
behaviour with HTML links.</p></dd>
<dt>R</dt>
<dd><p>Use Regular Expression matching-and-replace. <code>from-pattern</code>
is a regexp, and <code>to-pattern</code> a replacement string that may be
based on the regexp. Regexp memory is supported: you can use brackets ()
in the <code>from-pattern</code> and retrieve the matches with $1 to $9
in the <code>to-pattern</code>.</p>
<p>If R is not set, it will use string-literal search-and-replace.
The logic is <em>starts-with</em> in HTML links, but
<em>contains</em> in scripting events and embedded script and style sections.
</p>
</dd>
<dt>x</dt>
<dd><p>Use POSIX extended Regular Expressions. Only applicable with R.</p></dd>
<dt>i</dt>
<dd><p>Case-insensitive matching. Only applicable with R.</p></dd>
<dt>n</dt>
<dd><p>Disable regexp memory (for speed). Only applicable with R.</p></dd>
<dt>s</dt>
<dd><p>Line-based regexp matching. Only applicable with R.</p></dd>
<dt>^</dt>
<dd><p>Match at start only. This applies only to string matching
(not regexps) and is irrelevant to HTML links.</p></dd>
<dt>$</dt>
<dd><p>Match at end only. This applies only to string matching
(not regexps) and is irrelevant to HTML links.</p></dd>
<dt>V</dt>
<dd><p>Interpolate environment variables in <code>to-pattern</code>.
A string of the form <code>${varname|default}</code> will be replaced by the
value of environment variable <code>varname</code>. If that is unset, it
is replaced by <code>default</code>. The <code>|default</code> is optional.</p>
<p>NOTE: interpolation will only be enabled if
<directive>ProxyHTMLInterp</directive> is <var>On</var>.</p>
</dd>
<dt>v</dt>
<dd><p>Interpolate environment variables in <code>from-pattern</code>.
Patterns supported are as above.</p>
<p>NOTE: interpolation will only be enabled if
<directive>ProxyHTMLInterp</directive> is <var>On</var>.</p>
</dd>
</dl>
<p>The optional fourth <strong>cond</strong> argument defines a condition
that will be evaluated per Request, provided
<directive>ProxyHTMLInterp</directive> is <var>On</var>.
If the condition evaluates FALSE the map will not be applied in this request.
If TRUE, or if no condition is defined, the map is applied.</p>
<p>A <strong>cond</strong> is evaluated by the <a href="../expr.html"
>Expression Parser</a>. In addition, the simpler syntax of conditions
in mod_proxy_html 3.x for HTTPD 2.0 and 2.2 is also supported.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLInterp</name>
<description>Enables per-request interpolation of
<directive>ProxyHTMLURLMap</directive> rules.</description>
<syntax>ProxyHTMLInterp <var>On|Off</var></syntax>
<default>ProxyHTMLInterp Off</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>This enables per-request interpolation in
<directive>ProxyHTMLURLMap</directive> to- and from- patterns.</p>
<p>If interpolation is not enabled, all rules are pre-compiled at startup.
With interpolation, they must be re-compiled for every request, which
implies an extra processing overhead. It should therefore be
enabled only when necessary.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLDocType</name>
<description>Sets an HTML or XHTML document type declaration.</description>
<syntax>ProxyHTMLDocType <var>HTML|XHTML [Legacy]</var><br/><strong>OR</strong>
<br/>ProxyHTMLDocType <var>fpi [SGML|XML]</var><br/><strong>OR</strong>
<br/>ProxyHTMLDocType <var>html5</var><br/><strong>OR</strong>
<br/>ProxyHTMLDocType <var>auto</var></syntax>
<default>ProxyHTMLDocType auto (2.5/trunk versions); no FPI (2.4.x)</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>In the first form, documents will be declared as HTML 4.01 or XHTML 1.0
according to the option selected. This option also determines whether
HTML or XHTML syntax is used for output. Note that the format of the
documents coming from the backend server is immaterial: the parser will
deal with it automatically. If the optional second argument is set to
"Legacy", documents will be declared "Transitional", an option that may
be necessary if you are proxying pre-1998 content or working with defective
authoring/publishing tools.</p>
<p>In the second form, it will insert your own FPI. The optional second
argument determines whether SGML/HTML or XML/XHTML syntax will be used.</p>
<p>The third form declares documents as HTML 5.</p>
<p>The fourth form is new in HTTPD trunk and not yet available in released
versions, and uses libxml2's HTML parser to detect the doctype.</p>
<p>If the first form is used, mod_proxy_html
will also clean up the HTML to the specified standard. It cannot
fix every error, but it will strip out bogus elements and attributes.
It will also optionally log other errors at <directive
module="core">LogLevel</directive> Debug.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLFixups</name>
<description>Fixes for simple HTML errors.</description>
<syntax>ProxyHTMLFixups <var>[lowercase] [dospath] [reset]</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>This directive takes one to three arguments as follows:</p>
<ul>
<li><code>lowercase</code> Urls are rewritten to lowercase</li>
<li><code>dospath</code> Backslashes in URLs are rewritten to forward slashes.</li>
<li><code>reset</code> Unset any options set at a higher level in the configuration.</li>
</ul>
<p>Take care when using these. The fixes will correct certain authoring
mistakes, but risk also erroneously fixing links that were correct to start with.
Only use them if you know you have a broken backend server.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLExtended</name>
<description>Determines whether to fix links in inline scripts, stylesheets,
and scripting events.</description>
<syntax>ProxyHTMLExtended <var>On|Off</var></syntax>
<default>ProxyHTMLExtended Off</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>Set to <code>Off</code>, HTML links are rewritten according to the
<directive>ProxyHTMLURLMap</directive> directives, but links appearing
in Javascript and CSS are ignored.</p>
<p>Set to <code>On</code>, all scripting events (as determined by
<directive>ProxyHTMLEvents</directive>) and embedded scripts or
stylesheets are also processed by the <directive>ProxyHTMLURLMap</directive>
rules, according to the flags set for each rule. Since this requires more
parsing, performance will be best if you only enable it when strictly necessary.
</p><p>
You'll also need to take care over patterns matched, since the parser has no
knowledge of what is a URL within an embedded script or stylesheet.
In particular, extended matching of <code>/</code> is likely to lead to
false matches.
</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLStripComments</name>
<description>Determines whether to strip HTML comments.</description>
<syntax>ProxyHTMLStripComments <var>On|Off</var></syntax>
<default>ProxyHTMLStripComments Off</default>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>This directive will cause mod_proxy_html to strip HTML comments.
Note that this will also kill off any scripts or styles embedded in
comments (a bogosity introduced in 1995/6 with Netscape 2 for the
benefit of then-older browsers, but still in use today).
It may also interfere with comment-based processors such as SSI or ESI:
be sure to run any of those <em>before</em> mod_proxy_html in the
filter chain if stripping comments!</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLBufSize</name>
<description>Sets the buffer size increment for buffering inline scripts and
stylesheets.</description>
<syntax>ProxyHTMLBufSize <var>bytes</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>In order to parse non-HTML content (stylesheets and scripts) embedded
in HTML documents, mod_proxy_html
has to read the entire script or stylesheet into a buffer. This buffer will
be expanded as necessary to hold the largest script or stylesheet in a page,
in increments of <var>bytes</var> as set by this directive.</p>
<p>The default is 8192, and will work well for almost all pages. However,
if you know you're proxying pages containing stylesheets and/or
scripts bigger than 8K (that is, for a single script or stylesheet,
NOT in total), it will be more efficient to set a larger buffer
size and avoid the need to resize the buffer dynamically during a request.
</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLEvents</name>
<description>Specify attributes to treat as scripting events.</description>
<syntax>ProxyHTMLEvents <var>attribute [attribute ...]</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>Specifies one or more attributes to treat as scripting events and
apply <directive>ProxyHTMLURLMap</directive>s to where enabled.
You can specify any number of attributes in one or more
<code>ProxyHTMLEvents</code> directives.</p>
<p>Normally you'll set this globally. If you set ProxyHTMLEvents in more than
one scope so that one overrides the other, you'll need to specify a complete
set in each of those scopes.</p>
<p>A default configuration is supplied in <var>proxy-html.conf</var>
and defines the events in standard HTML 4 and XHTML 1.</p>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLLinks</name>
<description>Specify HTML elements that have URL attributes to be rewritten.</description>
<syntax>ProxyHTMLLinks <var>element attribute [attribute2 ...]</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>Specifies elements that have URL attributes that should be rewritten
using standard <directive module="mod_proxy_html">ProxyHTMLURLMap</directive>s.
You will need one ProxyHTMLLinks directive per element,
but it can have any number of attributes.</p>
<p>Normally you'll set this globally. If you set ProxyHTMLLinks in more than
one scope so that one overrides the other, you'll need to specify a complete
set in each of those scopes.</p>
<p>A default configuration is supplied in <var>proxy-html.conf</var>
and defines the HTML links for standard HTML 4 and XHTML 1.</p>
<example>
<title>Examples from proxy-html.conf</title>
<highlight language="config">
ProxyHTMLLinks a href
ProxyHTMLLinks area href
ProxyHTMLLinks link href
ProxyHTMLLinks img src longdesc usemap
ProxyHTMLLinks object classid codebase data usemap
ProxyHTMLLinks q cite
ProxyHTMLLinks blockquote cite
ProxyHTMLLinks ins cite
ProxyHTMLLinks del cite
ProxyHTMLLinks form action
ProxyHTMLLinks input src usemap
ProxyHTMLLinks head profile
ProxyHTMLLinks base href
ProxyHTMLLinks script src for
</highlight>
</example>
</usage>
</directivesynopsis>
<directivesynopsis>
<name>ProxyHTMLCharsetOut</name>
<description>Specify a charset for mod_proxy_html output.</description>
<syntax>ProxyHTMLCharsetOut <var>Charset | *</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
</contextlist>
<compatibility>Version 2.4 and later; available as a third-party
for earlier 2.x versions</compatibility>
<usage>
<p>This selects an encoding for mod_proxy_html output. It should not
normally be used, as any change from the default <code>UTF-8</code>
(Unicode - as used internally by libxml2) will impose an additional
processing overhead. The special token <code>ProxyHTMLCharsetOut *</code>
will generate output using the same encoding as the input.</p>
<p>Note that this relies on <module>mod_xml2enc</module> being loaded.</p>
</usage>
</directivesynopsis>
</modulesynopsis>