<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<manualpage metafile="reverse_proxy.xml.meta">
<parentdocument href="./">How-To / Tutorials</parentdocument>
<title>Reverse Proxy Guide</title>
<summary>
<p>In addition to being a "basic" web server, and providing static and
dynamic content to end-users, Apache httpd (as well as most other web
servers) can also act as a reverse proxy server, also known as a
"gateway" server.</p>
<p>In such scenarios, httpd itself does not generate or host the data;
instead, the content is obtained from one or several backend servers,
which normally have no direct connection to the external network. When
httpd receives a request from a client, the request itself is <em>proxied</em>
to one of these backend servers, which handles the request, generates
the content and sends it back to httpd, which then
generates the actual HTTP response back to the client.</p>
<p>There are numerous reasons for such an implementation, but the typical
ones are security, high availability, load balancing
and centralized authentication/authorization. It is critical in these
implementations that the layout, design and architecture of the backend
infrastructure (the servers which actually handle the requests) are
insulated and protected from the outside; as far as the client is concerned,
the reverse proxy server <em>is</em> the sole source of all content.</p>
<p>A typical implementation is shown below:</p>
<p class="centered"><img src="../images/reverse-proxy-arch.png" alt="reverse-proxy-arch" /></p>
</summary>
<section id="related">
<title>Reverse Proxy</title>
<related>
<modulelist>
<module>mod_proxy</module>
<module>mod_proxy_balancer</module>
<module>mod_proxy_hcheck</module>
</modulelist>
<directivelist>
<directive module="mod_proxy">ProxyPass</directive>
<directive module="mod_proxy">BalancerMember</directive>
</directivelist>
</related>
</section>
<section id="simple">
<title>Simple reverse proxying</title>
<p>
The <directive module="mod_proxy">ProxyPass</directive>
directive specifies the mapping of incoming requests to the backend
server (or a cluster of servers known as a <code>Balancer</code>
group). The simplest example proxies all requests (<code>"/"</code>)
to a single backend:
</p>
<highlight language="config">
ProxyPass "/" "http://www.example.com/"
</highlight>
<p>
To ensure that <code>Location:</code> headers (as well as
<code>Content-Location:</code> and <code>URI:</code> headers) generated by
the backend are modified to point to the reverse proxy, instead of
back to the backend itself, the <directive module="mod_proxy">ProxyPassReverse</directive>
directive is most often required:
</p>
<highlight language="config">
ProxyPass "/" "http://www.example.com/"
ProxyPassReverse "/" "http://www.example.com/"
</highlight>
<p>You can also proxy only specific URIs, as shown in this example:</p>
<highlight language="config">
ProxyPass "/images/" "http://www.example.com/"
ProxyPassReverse "/images/" "http://www.example.com/"
</highlight>
<p>In the above, any requests which start with the <code>/images/</code>
path will be proxied to the specified backend; all other requests are
handled locally.
</p>
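<p>As a sketch of an alternative form, the same kind of mapping can also be
written inside a <directive module="core" type="section">Location</directive>
section. In that context <directive module="mod_proxy">ProxyPass</directive> and
<directive module="mod_proxy">ProxyPassReverse</directive> take only the backend
URL, and the local path is taken from the section itself:</p>
<highlight language="config">
&lt;Location "/images/"&gt;
    # The local path "/images/" comes from the enclosing Location
    ProxyPass "http://www.example.com/"
    ProxyPassReverse "http://www.example.com/"
&lt;/Location&gt;
</highlight>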
</section>
<section id="cluster">
<title>Clusters and Balancers</title>
<p>
As useful as the above is, it still has a deficiency: should the
(single) backend node go down or become heavily loaded, proxying
those requests provides no real advantage. What is needed is the ability
to define a set or group of backend servers which can handle such
requests, and for the reverse proxy to load balance and fail over among
them. This group is sometimes called a <em>cluster</em>, but Apache httpd's
term is a <em>balancer</em>. One defines a balancer by leveraging the
<directive module="mod_proxy" type="section">Proxy</directive> and
<directive module="mod_proxy">BalancerMember</directive> directives as
shown:
</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
BalancerMember http://www2.example.com:8080
BalancerMember http://www3.example.com:8080
ProxySet lbmethod=bytraffic
&lt;/Proxy&gt;
ProxyPass "/images/" "balancer://myset/"
ProxyPassReverse "/images/" "balancer://myset/"
</highlight>
<p>
The <code>balancer://</code> scheme is what tells httpd that we are creating
a balancer set, with the name <em>myset</em>. It includes two backend servers,
which httpd calls <em>BalancerMembers</em>. In this case, any requests for
<code>/images/</code> will be proxied to <em>one</em> of the two backends.
The <directive module="mod_proxy">ProxySet</directive> directive
specifies that the <em>myset</em> balancer uses the <code>bytraffic</code>
load balancing algorithm, which balances based on I/O bytes.
</p>
<note type="hint"><title>Hint</title>
<p>
<em>BalancerMembers</em> are also sometimes referred to as <em>workers</em>.
</p>
</note>
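<p>The balancer example above depends on several modules being loaded. The
following is a minimal sketch of the corresponding
<directive module="mod_so">LoadModule</directive> lines; the <code>modules/</code>
paths are an assumption based on a default source build, and many distributions
enable modules through their own tooling instead. <module>mod_proxy_http</module>
handles the <code>http://</code> backends, <module>mod_proxy_balancer</module>
needs a shared memory provider such as <module>mod_slotmem_shm</module>, and the
<code>bytraffic</code> method is provided by <module>mod_lbmethod_bytraffic</module>.</p>
<highlight language="config">
# Module paths are installation-specific; adjust them to your layout
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
LoadModule lbmethod_bytraffic_module modules/mod_lbmethod_bytraffic.so
</highlight>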
</section>
<section id="config">
<title>Balancer and BalancerMember configuration</title>
<p>
You can adjust numerous configuration details of the <em>balancers</em>
and the <em>workers</em> via the various parameters defined in
<directive module="mod_proxy">ProxyPass</directive>. For example,
if we want <code>http://www3.example.com:8080</code> to handle three
times as much traffic as the other worker, with a timeout of 1 second,
we would adjust the configuration as follows:
</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
BalancerMember http://www2.example.com:8080
BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1
ProxySet lbmethod=bytraffic
&lt;/Proxy&gt;
ProxyPass "/images/" "balancer://myset/"
ProxyPassReverse "/images/" "balancer://myset/"
</highlight>
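<p>Parameters can be combined freely on each
<directive module="mod_proxy">BalancerMember</directive> line. As an
illustration, the following sketch adds the <code>retry</code> parameter, the
number of seconds httpd waits before retrying a worker that is in an error state
(60 by default); the value of 30 seconds is an arbitrary choice for the example:</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
    # retry=30: wait 30 seconds before retrying a worker in error state
    BalancerMember http://www2.example.com:8080 retry=30
    BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1 retry=30
    ProxySet lbmethod=bytraffic
&lt;/Proxy&gt;
ProxyPass "/images/" "balancer://myset/"
ProxyPassReverse "/images/" "balancer://myset/"
</highlight>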
</section>
<section id="failover">
<title>Failover</title>
<p>
You can also fine-tune various failover scenarios, detailing which
workers and even which balancers should be accessed in such cases. For
example, the setup below implements two failover cases. In the first,
<code>http://hstandby.example.com:8080</code> is only sent traffic
if all other workers in the <em>myset</em> balancer are not available.
Only if that worker itself is also unavailable will the
<code>http://bkup1.example.com:8080</code> and <code>http://bkup2.example.com:8080</code>
workers be brought into rotation:
</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
BalancerMember http://www2.example.com:8080
BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1
BalancerMember http://hstandby.example.com:8080 status=+H
BalancerMember http://bkup1.example.com:8080 lbset=1
BalancerMember http://bkup2.example.com:8080 lbset=1
ProxySet lbmethod=byrequests
&lt;/Proxy&gt;
ProxyPass "/images/" "balancer://myset/"
ProxyPassReverse "/images/" "balancer://myset/"
</highlight>
<p>
The key to this failover setup is marking <code>http://hstandby.example.com:8080</code>
with the <code>+H</code> status flag, which puts it in <em>hot standby</em> mode,
and placing the two <code>bkup#</code> servers in load balancer set 1 (the
default set is 0). For failover, hot standbys (if they exist) are used first, once all regular
workers are unavailable; load balancer sets are always tried lowest number first.
</p>
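<p>Because load balancer sets are tried lowest number first, further tiers can be
added in the same way. In this sketch, <code>http://dr1.example.com:8080</code>
is a hypothetical last-resort worker placed in set 2, so it should only receive
traffic once the workers in sets 0 and 1 (including the hot standby) are all
unavailable:</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
    BalancerMember http://www2.example.com:8080
    BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1
    BalancerMember http://hstandby.example.com:8080 status=+H
    BalancerMember http://bkup1.example.com:8080 lbset=1
    BalancerMember http://bkup2.example.com:8080 lbset=1
    # Hypothetical third tier: only tried after sets 0 and 1 are exhausted
    BalancerMember http://dr1.example.com:8080 lbset=2
    ProxySet lbmethod=byrequests
&lt;/Proxy&gt;
ProxyPass "/images/" "balancer://myset/"
ProxyPassReverse "/images/" "balancer://myset/"
</highlight>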
</section>
<section id="manager">
<title>Balancer Manager</title>
<p>
One of the most distinctive and useful features of Apache httpd's reverse proxy is
the embedded <em>balancer-manager</em> application. Similar to
<module>mod_status</module>, <em>balancer-manager</em> displays
the current working configuration and status of the enabled
balancers and workers currently in use. However, not only does it
display these parameters, it also allows for dynamic, runtime, on-the-fly
reconfiguration of almost all of them, including adding new <em>BalancerMembers</em>
(workers) to an existing balancer. To enable this capability, the following
needs to be added to your configuration:
</p>
<highlight language="config">
&lt;Location "/balancer-manager"&gt;
SetHandler balancer-manager
Require host localhost
&lt;/Location&gt;
</highlight>
<note type="warning"><title>Warning</title>
<p>Do not enable the <em>balancer-manager</em> until you have <a
href="../mod/mod_proxy.html#access">secured your server</a>. In
particular, ensure that access to the URL is tightly
restricted.</p>
</note>
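<p>If the reverse proxy maps <code>"/"</code> to a backend, requests for
<code>/balancer-manager</code> would be proxied as well, so the manager URL has to
be excluded from proxying with <code>!</code>; the exclusion must appear before
the more general <directive module="mod_proxy">ProxyPass</directive>. The sketch
below also restricts access by IP address rather than host name;
<code>192.0.2.0/24</code> is a placeholder documentation network, and the
<em>myset</em> balancer is assumed to be defined as in the earlier examples:</p>
<highlight language="config">
# Exclusion first, so the manager is not sent to the backends
ProxyPass "/balancer-manager" "!"
ProxyPass "/" "balancer://myset/"
ProxyPassReverse "/" "balancer://myset/"

&lt;Location "/balancer-manager"&gt;
    SetHandler balancer-manager
    # Placeholder admin network; substitute your own addresses
    Require ip 192.0.2.0/24
&lt;/Location&gt;
</highlight>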
<p>
When the reverse proxy server is accessed at that URL
(e.g. <code>http://rproxy.example.com/balancer-manager/</code>), you will see a
page similar to the one below:
</p>
<p class="centered"><img src="../images/bal-man.png" alt="balancer-manager page" /></p>
<p>
This form allows the devops admin to adjust various parameters, take
workers offline, change load balancing methods and add new workers. For
example, clicking on the balancer itself displays the following page:
</p>
<p class="centered"><img src="../images/bal-man-b.png" alt="balancer-manager page" /></p>
<p>
Clicking on a worker, by contrast, displays this page:
</p>
<p class="centered"><img src="../images/bal-man-w.png" alt="balancer-manager page" /></p>
<p>
To have these changes persist across restarts of the reverse proxy, ensure that
<directive module="mod_proxy">BalancerPersist</directive> is enabled.
</p>
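<p>A minimal sketch, reusing the <em>myset</em> balancer from the earlier
examples. <directive module="mod_proxy">BalancerPersist</directive> is set at
server (or virtual host) level alongside the balancer definition:</p>
<highlight language="config">
# Keep balancer-manager changes across restarts
BalancerPersist On

&lt;Proxy balancer://myset&gt;
    BalancerMember http://www2.example.com:8080
    BalancerMember http://www3.example.com:8080
&lt;/Proxy&gt;
</highlight>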
</section>
<section id="health-check">
<title>Dynamic Health Checks</title>
<p>
Before httpd proxies a request to a worker, it can <em>"test"</em> whether that worker
is available by setting the <code>ping</code> parameter for that worker using
<directive module="mod_proxy">ProxyPass</directive>. Often it is
more useful to check the health of the workers <em>out of band</em>, in a
dynamic fashion. This is achieved in Apache httpd by the
<module>mod_proxy_hcheck</module> module.
</p>
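<p>As a sketch, health checks are configured with additional <code>hc*</code>
parameters on each worker. Here each worker is sent a <code>HEAD</code> request
for <code>/health</code> (a hypothetical URI that the backend is assumed to serve)
every 10 seconds; after 3 failed checks the worker is marked as failing its health
check, and it is re-enabled after 2 successful checks.
<module>mod_proxy_hcheck</module> relies on <module>mod_watchdog</module>, so both
modules need to be loaded:</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
    # Out-of-band HEAD checks against a hypothetical /health URI
    BalancerMember http://www2.example.com:8080 hcmethod=HEAD hcuri=/health hcinterval=10 hcfails=3 hcpasses=2
    BalancerMember http://www3.example.com:8080 hcmethod=HEAD hcuri=/health hcinterval=10 hcfails=3 hcpasses=2
    ProxySet lbmethod=byrequests
&lt;/Proxy&gt;
</highlight>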
</section>
<section id="status">
<title>BalancerMember status flags</title>
<p>
In the <em>balancer-manager</em> the current state, or <em>status</em>, of a worker
is displayed and can be set/reset. The meanings of these statuses are as follows:
</p>
<table border="1">
<tr><th>Flag</th><th>String</th><th>Description</th></tr>
<tr><td>&nbsp;</td><td><em>Ok</em></td><td>Worker is available</td></tr>
<tr><td>&nbsp;</td><td><em>Init</em></td><td>Worker has been initialized</td></tr>
<tr><td><code>D</code></td><td><em>Dis</em></td><td>Worker is disabled and will not accept any requests; will be
automatically retried.</td></tr>
<tr><td><code>S</code></td><td><em>Stop</em></td><td>Worker is administratively stopped; will not accept requests
and will not be automatically retried.</td></tr>
<tr><td><code>I</code></td><td><em>Ign</em></td><td>Worker is in ignore-errors mode and will always be considered available.</td></tr>
<tr><td><code>H</code></td><td><em>Stby</em></td><td>Worker is in hot-standby mode and will only be used if no other
viable workers are available.</td></tr>
<tr><td><code>E</code></td><td><em>Err</em></td><td>Worker is in an error state, usually due to a failed pre-request check;
requests will not be proxied to this worker, but it will be retried depending on
the <code>retry</code> setting of the worker.</td></tr>
<tr><td><code>N</code></td><td><em>Drn</em></td><td>Worker is in drain mode and will only accept existing sticky sessions
destined for itself and ignore all other requests.</td></tr>
<tr><td><code>C</code></td><td><em>HcFl</em></td><td>Worker has failed dynamic health check and will not be used until it
passes subsequent health checks.</td></tr>
</table>
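<p>Several of these flags can also be preset in the configuration through the
<code>status</code> worker parameter, where <code>+</code> sets a flag and
<code>-</code> clears it. A brief sketch, in which
<code>http://www4.example.com:8080</code> is a hypothetical worker:</p>
<highlight language="config">
&lt;Proxy balancer://myset&gt;
    BalancerMember http://www2.example.com:8080
    # Starts with the H (hot standby) flag set
    BalancerMember http://hstandby.example.com:8080 status=+H
    # Hypothetical worker that starts with the D (disabled) flag set
    BalancerMember http://www4.example.com:8080 status=+D
&lt;/Proxy&gt;
</highlight>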
</section>
</manualpage>