| <?xml version="1.0" encoding="ISO-8859-1"?> |
| <!DOCTYPE document [ |
| <!ENTITY project SYSTEM "project.xml"> |
| ]> |
| <document url="proxy.html"> |
| |
| &project; |
| <copyright> |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| </copyright> |
| <properties> |
| <title>Reverse Proxy HowTo</title> |
| <author email="rjung@apache.org">Rainer Jung</author> |
| <date>$Date$</date> |
| </properties> |
| <body> |
| <section name="Introduction"> |
| <br/> |
| <p>The Apache module mod_jk and its ISAPI and NSAPI variants connect |
| a web server to a backend (typically Tomcat) using the AJP protocol. |
| The web server receives an HTTP(S) request and the module forwards |
| the request to the backend. This function is usually called a gateway |
| or a proxy, in the context of HTTP it is called a reverse proxy. |
| </p> |
| </section> |
| <section name="Typical Problems"> |
| <br/> |
| <p>A reverse proxy is not totally transparent to the application on |
| the backend. For instance the host name and port the original client |
| (e.g. browser) needs to talk to belong to the web server and not to the |
| backend, so the reverse proxy talks to a different host name and port. |
| When the application on the backend returns content including |
| self-referential URLs using its own backend address and port, the |
| client will usually not be able to use these URLs. |
| </p> |
| <p>Another example is the client IP address, which for the web server is the |
| source IP of the incoming connection, whereas for the backend the |
| connection always comes from the web server. This can be a problem, when |
| the client IP is used by the backend application e.g. for security reasons. |
| </p> |
| </section> |
| <section name="AJP as a Solution"> |
| <br/> |
| <p>Most of these problems are automatically handled by the AJP protocol |
| and the AJP connectors of the backend. The AJP protocol transports |
| this communication metadata and the backend connector presents this |
| metadata whenever the application asks for it using Servlet API methods. |
| </p> |
| <p>The following list contains the communication metadata handled by AJP |
| and the ServletRequest/HttpServletRequest API calls which can be used to retrieve them: |
| <ul> |
| <li>local name: <code>getLocalName()</code> and <code>getLocalAddr</code>. |
| This is also equal to <code>getServerName()</code>, unless a <code>Host</code> header |
| is contained in the request. In this case the server name is taken from that header. |
| </li> |
| <li>local port: <code>getLocalPort()</code> |
| This is also equal to <code>getServerPort()</code>, unless a <code>Host</code> header |
| is contained in the request. In this case the server port is taken from that header |
| if it contains an explicit port, or is equal to the default port of the scheme used. |
| </li> |
| <li>client address: <code>getRemoteAddr()</code> |
| </li> |
| <li>client port: <code>getRemotePort()</code> |
| The remote port was initially not supported. It is available when using mod_jk 1.2.32 |
| with Apache or IIS (not for the NSAPI plugin) together with Tomcat version at least |
| 5.5.28, 6.0.20 or 7.0.0. For older versions, <code>getRemotePort()</code> |
| will incorrectly return 0 or -1. As a workaround you can forward the remote port by setting |
| <code>JkEnvVar REMOTE_PORT</code> and then either using |
| <code>request.getAttribute("REMOTE_PORT")</code> instead of <code>getRemotePort()</code> |
| or wrapping the request using a filter and overriding <code>getRemotePort()</code> with |
| <code>request.getAttribute("REMOTE_PORT")</code>. |
| </li> |
| <li>client host: <code>getRemoteHost()</code> |
| </li> |
| <li>authentication type: <code>getAuthType()</code> |
| </li> |
| <li>remote user: <code>getRemoteUser()</code>, |
| if <code>tomcatAuthentication="false"</code> |
| </li> |
| <li>protocol: <code>getProtocol()</code> |
| </li> |
| <li>HTTP method: <code>getMethod()</code> |
| </li> |
| <li>URI: <code>getRequestURI()</code> |
| </li> |
| <li>HTTPS used: <code>isSecure()</code>, <code>getScheme()</code> |
| </li> |
| <li>query string: <code>getQueryString()</code> |
| </li> |
| </ul> |
| The following additional SSL-related data will be made available by Apache and forwarded by mod_jk only |
| if you set <code>SSLOptions +StdEnvVars</code>. For the certificate information you also need |
| to set <code>SSLOptions +ExportCertData</code>. |
| <ul> |
| <li>SSL cipher: <code>getAttribute(javax.servlet.request.cipher_suite)</code> |
| </li> |
| <li>SSL key size: <code>getAttribute(javax.servlet.request.key_size)</code>. |
| Can be disabled using <code>JkOptions -ForwardKeySize</code>. |
| </li> |
| <li>SSL client certificate: <code>getAttribute(javax.servlet.request.X509Certificate)</code>. |
| If you want the whole certificate chain, then you need to also set <code>JkOptions ForwardSSLCertChain</code>. |
| It is likely, that in this case you also need to adjust the maximal AJP packet size |
| using the worker attribute <a href="../reference/workers.html">max_packet_size</a>. |
| </li> |
| <li>SSL session ID: <code>getAttribute(javax.servlet.request.ssl_session)</code>. |
| This is for Tomcat, it has not yet been standardized. |
| </li> |
| </ul> |
| </p> |
| </section> |
| <section name="Fine Tuning"> |
| <br/> |
| <p>In some situations this is not enough though. Assume there is another |
| less clever reverse proxy in front of your web server, for instance an |
| HTTP load balancer or similar device which also serves as an SSL accelerator. |
| </p> |
| <p>Then you are sure that all your clients use HTTPS, but your web server doesn't |
| know about that. All it can see is requests coming from the accelerator using |
| plain HTTP. |
| </p> |
| <p>Another example would be a simple reverse proxy in front of your web server, |
| so that the client IP address that your web server sees is always the IP address |
| of this reverse proxy, and not of the original client. Often such reverse proxies |
| generate an additional HTTP header, like <code>X-Forwareded-for</code> which |
| contains the original client IP address (or a list of IP addresses, if there are |
| more cascading reverse proxies in front). It would be nice, if we could use the |
| content of such a header as the client IP address to pass to the backend. |
| </p> |
| <p>So we might need to manipulate some of the data that AJP sends to the backend. |
| When using mod_jk inside Apache httpd you can use several httpd environment |
| variables to let mod_jk know, which data it should forward. These environment variables |
| can be set by the httpd directives SetEnv or SetEnvIf, but also in a very flexible |
| way using mod_rewrite (since httpd 2.x it can not only test against environment |
| variables, but also set them). |
| </p> |
| <p>The following list contains all environment variables mod_jk checks, before |
| sending data to the backend: |
| <ul> |
| <li>JK_LOCAL_NAME: the local name |
| </li> |
| <li>JK_LOCAL_PORT: the local port |
| </li> |
| <li>JK_REMOTE_HOST: the client host |
| </li> |
| <li>JK_REMOTE_ADDR: the client address |
| </li> |
| <li>JK_AUTH_TYPE: the authentication type |
| </li> |
| <li>JK_REMOTE_USER: the remote user |
| </li> |
| <li>HTTPS: On (case-insensitive) to indicate, that HTTPS is used |
| </li> |
| <li>SSL_CIPHER: the SSL cipher |
| </li> |
| <li>SSL_CIPHER_USEKEYSIZE: the SSL key size |
| </li> |
| <li>SSL_CLIENT_CERT: the SSL client certificate |
| </li> |
| <li>SSL_CLIENT_CERT_CHAIN_: prefix of variable names, containing |
| the client cerificate chain |
| </li> |
| <li>SSL_SESSION_ID: the SSL session ID |
| </li> |
| </ul> |
| </p> |
| <p>Remember: in general you don't need to set them. The module retrieves the data automatically |
| from the web server. Only in case you want to change this data, you can overwrite it by |
| using these variables. |
| </p> |
| <p>Some of these variables might also be used by other web server modules. All |
| variables whose name does not begin with "JK" are set directly by Apache httpd. |
| If you want to change the data, but do not want to negatively influence the behaviour |
| of other modules, you can change the names of all variables mod_jk uses to private ones. |
| For the details see the <a href="../reference/apache.html">Apache reference</a> page. |
| </p> |
| <p>All variables, that are not SSL-related have only been introduced in version 1.2.27. |
| </p> |
| <p>Finally there is a shortcut to forward the local IP of the web server as the remote IP. |
| This can be useful, e.g. when using the Tomcat remote address valve for allowing connections |
| only from registered Apache web servers. This feature is activated by setting |
| <code>JkOptions ForwardLocalAddress</code>. |
| </p> |
| </section> |
| <section name="Tomcat AJP Connector Settings"> |
| <br/> |
| <p>As an alternative to using the environment variables described in the previous section |
| (which do only exist when using Apache httpd), you can also configure Tomcat to overwrite |
| some of the communications data forwarded by mod_jk. The AJP connector in Tomcat's <code>server.xml</code> |
| allows to set the <a href="http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html#Attributes">following properties</a>: |
| <ul> |
| <li>proxyName: server name as returned by <code>getServerName()</code> |
| </li> |
| <li>proxyPort: server port as returned by <code>getServerPort()</code> |
| </li> |
| <li>scheme: protocol scheme as returned by <code>getScheme()</code> |
| </li> |
| <li>secure: set to "true", if you wish <code>isSecure()</code> to return "true". |
| </li> |
| </ul> |
| Remember: in general you don't need to set those. AJP automatically handles all cases |
| where the web server running mod_jk knows the right data. |
| </p> |
| </section> |
| <section name="URL Handling"> |
| <br/> |
| <subsection name="URL Rewriting"> |
| <p>Sometimes one want to change path components of the URLs under which an application |
| is available. Especially if a web application is deployed as some context, say <code>/myapp</code>, |
| marketing prefers short URLs, so want the application to be directly available under |
| <code>http://www.mycompany.com/</code>. Although you can deploy the application as the so-called |
| ROOT context, which will be directly available at "/", admins often prefer not to use |
| the ROOT context, e.g. because only one application can be the root context (per host). |
| </p> |
| <p>The procedure to change the URLs in the reverse proxy is tedious, because often |
| an application produces self-referential URLs, which then include the path components |
| which you tried to hide to the outside world. Nevertheless, if you absolutely need to do it, |
| here are the steps. |
| </p> |
| <p>Case A: You need to make the application available at a simple URL, but it is OK, if |
| users proceed using the more complex URLs, as long as they don't have to type them in. |
| That's the easy case, and if this suffices to you, you're lucky. Use a simply RedirectMatch |
| for Apache httpd: |
| </p> |
| <source> |
| RedirectMatch ^/$ http://www.mycompany.com/myapp/ |
| </source> |
| <p>Your application will then be available under <code>http://www.mycompany.com/</code>, |
| and each visitor will be immediately redirected to the real URL |
| <code>http://www.mycompany.com/myapp/</code> |
| </p> |
| <p>Case B: You need to hide path components for all requests going to the application. |
| Here's the recipe for the case, where you want to hide the first path component |
| <code>/myapp</code>. More complex manipulations are left as an exercise to the reader. |
| First the solution for the case of Apache httpd: |
| </p> |
| <p>1. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html"><code>mod_rewrite</code></a> |
| to add <code>/myapp</code> to all requests before forwarding to the backend: |
| </p> |
| <source> |
| # Don't forget the PT flag! (pass through) |
| RewriteRule ^/(.*) http://www.mycompany.com/myapp/$1 [PT] |
| </source> |
| <p>2. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_headers.html"><code>mod_headers</code></a> |
| to rewrite any HTTP redirects your application might return. Such redirects typically contain |
| the path components you want to hide, because by the HTTP standard, redirects always need to include |
| the full URL, and your application is not aware of the fact, that your clients talk to it via |
| some shortened URL. An HTTP redirect is done with a special response header named <code>Location</code>. |
| We rewrite the Location headers of our responses: |
| </p> |
| <source> |
| # Keep protocol, server and port if present, |
| # but insert our webapp name before the rest of the URL |
| Header edit Location ^([^/]*//[^/]*)?/(.*)$ $1/myapp/$2 |
| </source> |
| <p>3. Use <code>mod_headers</code> again, to rewrite the paths contained in any cookies, |
| your application might set. Such cookie paths again might contain |
| the path components you want to hide. |
| A cookie is set with the HTTP response header named <code>Set-Cookie</code>. |
| We rewrite the Set-Cookie headers of our responses: |
| </p> |
| <source> |
| # Fix the cookie path |
| Header edit Set-Cookie "^(.*; Path=/)(.*)" $1/myapp/$2 |
| </source> |
| <p>3. Some applications might contain hard coded absolute links. |
| In this case check, whether you find a configuration item for your web framework |
| to configure the base URL. If not, your only chance is to parse all response |
| content bodies and do search and replace. This is fragile and very resource intensive. |
| If you really need to do this, you can use |
| <a href="http://apache.webthing.com/mod_proxy_html/"><code>mod_proxy_html</code></a>, |
| <a href="http://httpd.apache.org/docs/2.2/mod/mod_substitute.html"><code>mod_substitute</code></a> |
| or <a href="http://blogs.sun.com/basant/entry/using_mod_sed_to_filter"><code>mod_sed</code></a> |
| for this task. |
| </p> |
| <p>If you are using Microsoft IIS as a web server, the ISAPI plugin provides a way |
| of doing the first step with a builtin feature. You define a mapping file for simple prefix |
| changes like this: |
| </p> |
| <source> |
| # Add a context prefix to all requests ... |
| /=/myapp/ |
| # ... or change some prefix ... |
| /oldapp/=/myapp/ |
| </source> |
| <p>and then put the name of the file in the <code>rewrite_rule_file</code> entry of the registry or your |
| <code>isapi_redirect.properties</code> file. In you <code>uriworkermap.properties</code> file, you |
| still need to map the URLs as they are before rewriting! |
| </p> |
| <p>More complex rewrites can be done using the same file, but with regular expressions. A leading |
| tilde sign '<code>~</code>', indicates, that you are using a regular expression: |
| </p> |
| <source> |
| # Use a regular expression rewrite |
| ~/oldapps([0-9]*)/=/newapps$1/ |
| </source> |
| <p>There is no support for Steps 2 (rewriting redirect responses) or 3 (rewriting cookie paths). |
| </p> |
| </subsection> |
| <subsection name="URL Encoding"> |
| <p>Some types of problems are triggered by the use of encoded URLs |
| (see <a href="http://en.wikipedia.org/wiki/Percent-encoding">percent encoding</a>). |
| For the same location there exist |
| a lot of different URLs which are equivalent. The reverse proxy needs to inspect the URL in order |
| to apply its own authentication rules and to decide, to which backend it should send the request |
| (or whether it should handle it itself). Therefore the request URL first is normalized: |
| percent encoded characters are decoded, <code>/./</code> is replaced by <code>/</code>, |
| <code>/XXX/../</code> is replaced by <code>/</code> and similar manipulations of the URL are done. |
| After that, the web server might apply rewrite rules to further change the URL in less obvious ways. |
| Finally there is no more way to put the resulting URL in an encoding, which is "similar" to |
| the one which was used for the original URL. |
| </p> |
| <p> |
| For historical reasons, there have been several alternatives, how mod_jk and the ISAPI |
| plugin encoded the resulting URL before sending it to the backend. They could be chosen via |
| <code>JkOptions</code> (Apache httpd) or <code>uri_select</code> (ISAPI). None of those historical |
| encodings are recommended, because they have either negative functionality implications or |
| pose a security risk. The default encoding since version 1.2.24 is <code>ForwardURIProxy</code> |
| (Apache httpd) or <code>proxy</code> (ISAPI) and it is strongly recommended to keep the default |
| and remove all old explicit settings. |
| </p> |
| </subsection> |
| </section> |
| <section name="Request Attributes"> |
| <br/> |
| <p> |
| You can also add more attributes to any request you are forwarding when using Apache httpd. |
| For this use the <code>JkEnvVar</code> directive (for details see the |
| <a href="../reference/apache.html">Apache reference</a> page). Such request attributes can be |
| retrieved on the Tomcat side via request.getAttribute(attributeName). |
| Note that their names will not be listed in request.getAttributeNames()! |
| </p> |
| </section> |
| </body> |
| </document> |