| <?xml version="1.0"?> |
| <!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd"> |
| <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?> |
| <!-- $LastChangedRevision$ --> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <modulesynopsis metafile="mod_proxy_html.xml.meta"> |
| |
| <name>mod_proxy_html</name> |
| <description>Rewrite HTML links in to ensure they are addressable |
| from Clients' networks in a proxy context.</description> |
| <status>Base</status> |
| <sourcefile>mod_proxy_html.c</sourcefile> |
| <identifier>proxy_html_module</identifier> |
| <compatibility>Version 2.4 and later. Available as a third-party module |
| for earlier 2.x versions</compatibility> |
| |
| <summary> |
| <p>This module provides an output filter to rewrite HTML links in a |
| proxy situation, to ensure that links work for users outside the proxy. |
| It serves the same purpose as Apache's ProxyPassReverse directive does |
| for HTTP headers, and is an essential component of a reverse proxy.</p> |
| |
| <p>For example, if a company has an application server at |
| <code>appserver.example.com</code> that is only visible from within |
| the company's internal network, and a public webserver |
| <code>www.example.com</code>, they may wish to provide a gateway to the |
| application server at <code>http://www.example.com/appserver/</code>. |
| When the application server links to itself, those links need to be |
| rewritten to work through the gateway. mod_proxy_html serves to rewrite |
| <code><a href="http://appserver.example.com/foo/bar.html">foobar</a></code> to |
| <code><a href="http://www.example.com/appserver/foo/bar.html">foobar</a></code> |
| making it accessible from outside.</p> |
| |
| <p>mod_proxy_html was originally developed at WebÞing, whose |
| extensive <a href="http://apache.webthing.com/mod_proxy_html/" |
| >documentation</a> may be useful to users.</p> |
| </summary> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLMeta</name> |
| <description>Turns on or off extra pre-parsing of metadata in HTML |
| <code><head></code> sections.</description> |
| <syntax>ProxyHTMLMeta On|Off</syntax> |
| <default>ProxyHTMLMeta Off</default> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| module for earlier 2.x versions.</compatibility> |
| |
| <usage> |
| <p>This turns on or off pre-parsing of metadata in HTML |
| <code><head></code> sections.</p> |
| <p>If not required, turning ProxyHTMLMeta Off will give a small |
| performance boost by skipping this parse step. However, it |
| is sometimes necessary for internationalisation to work correctly.</p> |
| <p><directive>ProxyHTMLMeta</directive> has two effects. Firstly and most importantly |
| it enables detection of character encodings declared in the form</p> |
| <pre><meta http-equiv="Content-Type" content="text/html;charset=<var>foo</var>"></pre> |
| <p>or, in the case of an XHTML document, an XML declaration. |
| It is NOT required if the charset is declared in a real HTTP header |
| (which is always preferable) from the backend server, nor if the |
| document is <var>utf-8</var> (unicode) or a subset such as ASCII. |
| You may also be able to dispense with it where documents use a |
| default declared using <directive module="mod_xml2enc" |
| >xml2EncDefault</directive>, but that risks propagating an |
| incorrect declaration. A <directive module="mod_proxy_html">ProxyHTMLCharsetOut</directive> |
| can remove that risk, but is likely to be a bigger processing |
| overhead than enabling ProxyHTMLMeta.</p> |
| <p>The other effect of enabling <directive>ProxyHTMLMeta</directive> is to parse all |
| <code><meta http-equiv=...></code> declarations and convert |
| them to real HTTP headers, in keeping with the original purpose |
| of this form of the HTML <meta> element.</p> |
| |
| <note type="warning"><title>Warning</title> |
| Because ProxyHTMLMeta promotes <strong>all</strong> |
| <code>http-equiv</code> elements to HTTP headers, it is important that you |
| only enable it in cases where you trust the HTML content as much as you |
| trust the upstream server. If the HTML is controlled by bad actors, it |
| will be possible for them to inject arbitrary, possibly malicious, HTTP |
| headers into your server's responses. |
| </note> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLEnable</name> |
| <description>Turns the proxy_html filter on or off.</description> |
| <syntax>ProxyHTMLEnable On|Off</syntax> |
| <default>ProxyHTMLEnable Off</default> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| module for earlier 2.x versions.</compatibility> |
| |
| <usage> |
| <p>A simple switch to enable or disable the proxy_html filter. |
| If <module>mod_xml2enc</module> is loaded it will also automatically |
| set up internationalisation support.</p> |
| <p>Note that the proxy_html filter will only act on HTML data |
| (Content-Type text/html or application/xhtml+xml) and when the |
| data are proxied. You can override this (at your own risk) by |
| setting the <var>PROXY_HTML_FORCE</var> environment variable.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLURLMap</name> |
| <description>Defines a rule to rewrite HTML links</description> |
| <syntax>ProxyHTMLURLMap <var>from-pattern to-pattern [flags] [cond]</var></syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| module for earlier 2.x versions.</compatibility> |
| |
| <usage> |
| <p>This is the key directive for rewriting HTML links. When parsing a document, |
| whenever a link target matches <var>from-pattern</var>, the matching |
| portion will be rewritten to <var>to-pattern</var>, as modified by any |
| flags supplied and by the |
| <directive module="mod_proxy_html">ProxyHTMLExtended</directive> directive. |
| Only the elements specified using |
| the <directive module="mod_proxy_html">ProxyHTMLLinks</directive> directive |
| will be considered as HTML links.</p> |
| |
| <p>The optional third argument may define any of the following |
| <strong>Flags</strong>. Flags are case-sensitive.</p> |
| <dl> |
| <dt>h</dt> |
| <dd><p>Ignore HTML links (pass through unchanged)</p></dd> |
| <dt>e</dt> |
| <dd><p>Ignore scripting events (pass through unchanged)</p></dd> |
| <dt>c</dt> |
| <dd><p>Pass embedded script and style sections through untouched.</p></dd> |
| |
| <dt>L</dt> |
| <dd><p>Last-match. If this rule matches, no more rules are applied |
| (note that this happens automatically for HTML links).</p></dd> |
| <dt>l</dt> |
| <dd><p>Opposite to L. Overrides the one-change-only default |
| behaviour with HTML links.</p></dd> |
| <dt>R</dt> |
| <dd><p>Use Regular Expression matching-and-replace. <code>from-pattern</code> |
| is a regexp, and <code>to-pattern</code> a replacement string that may be |
| based on the regexp. Regexp memory is supported: you can use brackets () |
| in the <code>from-pattern</code> and retrieve the matches with $1 to $9 |
| in the <code>to-pattern</code>.</p> |
| |
| <p>If R is not set, it will use string-literal search-and-replace. |
| The logic is <em>starts-with</em> in HTML links, but |
| <em>contains</em> in scripting events and embedded script and style sections. |
| </p> |
| </dd> |
| <dt>x</dt> |
| <dd><p>Use POSIX extended Regular Expressions. Only applicable with R.</p></dd> |
| <dt>i</dt> |
| <dd><p>Case-insensitive matching. Only applicable with R.</p></dd> |
| |
| <dt>n</dt> |
| <dd><p>Disable regexp memory (for speed). Only applicable with R.</p></dd> |
| <dt>s</dt> |
| <dd><p>Line-based regexp matching. Only applicable with R.</p></dd> |
| <dt>^</dt> |
| <dd><p>Match at start only. This applies only to string matching |
| (not regexps) and is irrelevant to HTML links.</p></dd> |
| <dt>$</dt> |
| <dd><p>Match at end only. This applies only to string matching |
| (not regexps) and is irrelevant to HTML links.</p></dd> |
| <dt>V</dt> |
| <dd><p>Interpolate environment variables in <code>to-pattern</code>. |
| A string of the form <code>${varname|default}</code> will be replaced by the |
| value of environment variable <code>varname</code>. If that is unset, it |
| is replaced by <code>default</code>. The <code>|default</code> is optional.</p> |
| <p>NOTE: interpolation will only be enabled if |
| <directive module="mod_proxy_html">ProxyHTMLInterp</directive> is <var>On</var>.</p> |
| </dd> |
| |
| <dt>v</dt> |
| <dd><p>Interpolate environment variables in <code>from-pattern</code>. |
| Patterns supported are as above.</p> |
| <p>NOTE: interpolation will only be enabled if |
| <directive module="mod_proxy_html">ProxyHTMLInterp</directive> is <var>On</var>.</p> |
| </dd> |
| </dl> |
| |
| <p>The optional fourth <strong>cond</strong> argument defines a condition |
| that will be evaluated per Request, provided |
| <directive module="mod_proxy_html">ProxyHTMLInterp</directive> is <var>On</var>. |
| If the condition evaluates FALSE the map will not be applied in this request. |
| If TRUE, or if no condition is defined, the map is applied.</p> |
| <p>A <strong>cond</strong> is evaluated by the <a href="../expr.html" |
| >Expression Parser</a>. In addition, the simpler syntax of conditions |
| in mod_proxy_html 3.x for HTTPD 2.0 and 2.2 is also supported.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLInterp</name> |
| <description>Enables per-request interpolation of |
| <directive>ProxyHTMLURLMap</directive> rules.</description> |
| <syntax>ProxyHTMLInterp On|Off</syntax> |
| <default>ProxyHTMLInterp Off</default> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| |
| <usage> |
| <p>This enables per-request interpolation in |
| <directive module="mod_proxy_html">ProxyHTMLURLMap</directive> to- and from- patterns.</p> |
| <p>If interpolation is not enabled, all rules are pre-compiled at startup. |
| With interpolation, they must be re-compiled for every request, which |
| implies an extra processing overhead. It should therefore be |
| enabled only when necessary.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLDocType</name> |
| <description>Sets an HTML or XHTML document type declaration.</description> |
| <syntax>ProxyHTMLDocType HTML|XHTML [Legacy]<br/><strong>OR</strong> |
| <br/>ProxyHTMLDocType <var>fpi</var> [SGML|XML]</syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| |
| <usage> |
| <p>In the first form, documents will be declared as HTML 4.01 or XHTML 1.0 |
| according to the option selected. This option also determines whether |
| HTML or XHTML syntax is used for output. Note that the format of the |
| documents coming from the backend server is immaterial: the parser will |
| deal with it automatically. If the optional second argument is set to |
| "Legacy", documents will be declared "Transitional", an option that may |
| be necessary if you are proxying pre-1998 content or working with defective |
| authoring/publishing tools.</p> |
| <p>In the second form, it will insert your own FPI. The optional second |
| argument determines whether SGML/HTML or XML/XHTML syntax will be used.</p> |
| <p>The default is changed to omitting any FPI, |
| on the grounds that no FPI is better than a bogus one. If your backend |
| generates decent HTML or XHTML, set it accordingly.</p> |
| <p>If the first form is used, mod_proxy_html |
| will also clean up the HTML to the specified standard. It cannot |
| fix every error, but it will strip out bogus elements and attributes. |
| It will also optionally log other errors at <directive |
| module="core">LogLevel</directive> Debug.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLFixups</name> |
| <description>Fixes for simple HTML errors.</description> |
| <syntax>ProxyHTMLFixups [lowercase] [dospath] [reset]</syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>This directive takes one to three arguments as follows:</p> |
| <ul> |
| <li><code>lowercase</code> Urls are rewritten to lowercase</li> |
| <li><code>dospath</code> Backslashes in URLs are rewritten to forward slashes.</li> |
| <li><code>reset</code> Unset any options set at a higher level in the configuration.</li> |
| </ul> |
| <p>Take care when using these. The fixes will correct certain authoring |
| mistakes, but risk also erroneously fixing links that were correct to start with. |
| Only use them if you know you have a broken backend server.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLExtended</name> |
| <description>Determines whether to fix links in inline scripts, stylesheets, |
| and scripting events.</description> |
| <syntax>ProxyHTMLExtended On|Off</syntax> |
| <default>ProxyHTMLExtended Off</default> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>Set to <code>Off</code>, HTML links are rewritten according to the |
| <directive module="mod_proxy_html">ProxyHTMLURLMap</directive> directives, but links appearing |
| in Javascript and CSS are ignored.</p> |
| <p>Set to <code>On</code>, all scripting events (as determined by |
| <directive module="mod_proxy_html">ProxyHTMLEvents</directive>) and embedded scripts or |
| stylesheets are also processed by the <directive module="mod_proxy_html">ProxyHTMLURLMap</directive> |
| rules, according to the flags set for each rule. Since this requires more |
| parsing, performance will be best if you only enable it when strictly necessary. |
| </p><p> |
| You'll also need to take care over patterns matched, since the parser has no |
| knowledge of what is a URL within an embedded script or stylesheet. |
| In particular, extended matching of <code>/</code> is likely to lead to |
| false matches. |
| </p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLStripComments</name> |
| <description>Determines whether to strip HTML comments.</description> |
| <syntax>ProxyHTMLStripComments On|Off</syntax> |
| <default>ProxyHTMLStripComments Off</default> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>This directive will cause mod_proxy_html to strip HTML comments. |
| Note that this will also kill off any scripts or styles embedded in |
| comments (a bogosity introduced in 1995/6 with Netscape 2 for the |
| benefit of then-older browsers, but still in use today). |
| It may also interfere with comment-based processors such as SSI or ESI: |
| be sure to run any of those <em>before</em> mod_proxy_html in the |
| filter chain if stripping comments!</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLBufSize</name> |
| <description>Sets the buffer size increment for buffering inline scripts and |
| stylesheets.</description> |
| <syntax>ProxyHTMLBufSize <var>bytes</var></syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>In order to parse non-HTML content (stylesheets and scripts) embedded |
| in HTML documents, mod_proxy_html |
| has to read the entire script or stylesheet into a buffer. This buffer will |
| be expanded as necessary to hold the largest script or stylesheet in a page, |
| in increments of <var>bytes</var> as set by this directive.</p> |
| <p>The default is 8192, and will work well for almost all pages. However, |
| if you know you're proxying pages containing stylesheets and/or |
| scripts bigger than 8K (that is, for a single script or stylesheet, |
| NOT in total), it will be more efficient to set a larger buffer |
| size and avoid the need to resize the buffer dynamically during a request. |
| </p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLEvents</name> |
| <description>Specify attributes to treat as scripting events.</description> |
| <syntax>ProxyHTMLEvents <var>attribute [attribute ...]</var></syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>Specifies one or more attributes to treat as scripting events and |
| apply <directive module="mod_proxy_html">ProxyHTMLURLMap</directive>s to where enabled. |
| You can specify any number of attributes in one or more |
| <directive>ProxyHTMLEvents</directive> directives.</p> |
| <p>Normally you'll set this globally. If you set <directive>ProxyHTMLEvents</directive> in more than |
| one scope so that one overrides the other, you'll need to specify a complete |
| set in each of those scopes.</p> |
| <p>A default configuration is supplied in <var>proxy-html.conf</var> |
| and defines the events in standard HTML 4 and XHTML 1.</p> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLLinks</name> |
| <description>Specify HTML elements that have URL attributes to be rewritten.</description> |
| <syntax>ProxyHTMLLinks <var>element attribute [attribute2 ...]</var></syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>Specifies elements that have URL attributes that should be rewritten |
| using standard <directive module="mod_proxy_html">ProxyHTMLURLMap</directive>s. |
| You will need one <directive>ProxyHTMLLinks</directive> directive per element, |
| but it can have any number of attributes.</p> |
| <p>Normally you'll set this globally. If you set <directive>ProxyHTMLLinks</directive> in more than |
| one scope so that one overrides the other, you'll need to specify a complete |
| set in each of those scopes.</p> |
| <p>A default configuration is supplied in <var>proxy-html.conf</var> |
| and defines the HTML links for standard HTML 4 and XHTML 1.</p> |
| <example> |
| <title>Examples from proxy-html.conf</title> |
| <highlight language="config"> |
| ProxyHTMLLinks a href |
| ProxyHTMLLinks area href |
| ProxyHTMLLinks link href |
| ProxyHTMLLinks img src longdesc usemap |
| ProxyHTMLLinks object classid codebase data usemap |
| ProxyHTMLLinks q cite |
| ProxyHTMLLinks blockquote cite |
| ProxyHTMLLinks ins cite |
| ProxyHTMLLinks del cite |
| ProxyHTMLLinks form action |
| ProxyHTMLLinks input src usemap |
| ProxyHTMLLinks head profile |
| ProxyHTMLLinks base href |
| ProxyHTMLLinks script src for |
| </highlight> |
| </example> |
| </usage> |
| </directivesynopsis> |
| |
| <directivesynopsis> |
| <name>ProxyHTMLCharsetOut</name> |
| <description>Specify a charset for mod_proxy_html output.</description> |
| <syntax>ProxyHTMLCharsetOut <var>Charset | *</var></syntax> |
| <contextlist><context>server config</context> |
| <context>virtual host</context><context>directory</context> |
| </contextlist> |
| <compatibility>Version 2.4 and later; available as a third-party |
| for earlier 2.x versions</compatibility> |
| <usage> |
| <p>This selects an encoding for mod_proxy_html output. It should not |
| normally be used, as any change from the default <code>UTF-8</code> |
| (Unicode - as used internally by libxml2) will impose an additional |
| processing overhead. The special token <code>ProxyHTMLCharsetOut *</code> |
| will generate output using the same encoding as the input.</p> |
| <p>Note that this relies on <module>mod_xml2enc</module> being loaded.</p> |
| </usage> |
| </directivesynopsis> |
| |
| |
| |
| </modulesynopsis> |