 <?xml version='1.0' encoding='UTF-8' ?>
 <!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
 <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
 <!-- $LastChangedRevision$ -->

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <manualpage metafile="reverse_proxy.xml.meta">
 <parentdocument href="./">How-To / Tutorials</parentdocument>

   <title>Reverse Proxy Guide</title>

   <summary>
     <p>In addition to being a "basic" web server, and providing static and
     dynamic content to end-users, Apache httpd (as well as most other web
     servers) can also act as a reverse proxy server, also known as a
     "gateway" server.</p>

     <p>In such scenarios, httpd itself does not generate or host the data;
     rather, the content is obtained from one or several backend servers,
     which normally have no direct connection to the external network. As
     httpd receives a request from a client, the request itself is <em>proxied</em>
     to one of these backend servers, which handles the request, generates
     the content and sends it back to httpd, which then
     generates the actual HTTP response back to the client.</p>

     <p>There are numerous reasons for such an implementation, but the
     typical rationales are security, high availability, load balancing
     and centralized authentication/authorization. It is critical in these
     implementations that the layout, design and architecture of the backend
     infrastructure (those servers which actually handle the requests) are
     insulated and protected from the outside; as far as the client is concerned,
     the reverse proxy server <em>is</em> the sole source of all content.</p>

     <p>A typical implementation is below:</p>
     <p class="centered"><img src="../images/reverse-proxy-arch.png" alt="reverse-proxy-arch" /></p>

   </summary>


   <section id="related">
   <title>Reverse Proxy</title>
   <related>
     <modulelist>
       <module>mod_proxy</module>
       <module>mod_proxy_balancer</module>
       <module>mod_proxy_hcheck</module>
     </modulelist>
     <directivelist>
       <directive module="mod_proxy">ProxyPass</directive>
       <directive module="mod_proxy">BalancerMember</directive>
     </directivelist>
   </related>
   </section>

   <section id="simple">
     <title>Simple reverse proxying</title>

     <p>
       The <directive module="mod_proxy">ProxyPass</directive>
       directive specifies the mapping of incoming requests to the backend
       server (or a cluster of servers known as a <code>Balancer</code>
       group). The simplest example proxies all requests (<code>"/"</code>)
       to a single backend:
     </p>

     <highlight language="config">
 ProxyPass "/"  "http://www.example.com/"
     </highlight>

     <p>
       To ensure that any <code>Location:</code> headers generated from
       the backend are modified to point to the reverse proxy, instead of
       back to itself, the <directive module="mod_proxy">ProxyPassReverse</directive>
       directive is most often required:
     </p>

     <highlight language="config">
 ProxyPass "/"  "http://www.example.com/"
 ProxyPassReverse "/"  "http://www.example.com/"
     </highlight>

     <p>Only specific URIs can be proxied, as shown in this example:</p>

     <highlight language="config">
 ProxyPass "/images/"  "http://www.example.com/"
 ProxyPassReverse "/images/"  "http://www.example.com/"
     </highlight>

     <p>In the above, any request which starts with the <code>/images</code>
       path will be proxied to the specified backend; all other requests will
       be handled locally.
     </p>
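
     <p>
       As a minimal sketch of how path-scoped proxying behaves, the
       following annotates which requests would be proxied and which would
       not. The <code>/images/private/</code> path is hypothetical and is
       used only to illustrate the <code>!</code> exclusion syntax of
       <directive module="mod_proxy">ProxyPass</directive>. Note that the
       trailing slashes of the two arguments are kept consistent, since a
       mismatch can produce backend URLs with doubled or missing slashes:
     </p>

     <highlight language="config">
 # Exclusions must be listed before the broader mapping they carve out of,
 # because ProxyPass rules are checked in configuration order.
 ProxyPass "/images/private/"  "!"
 ProxyPass "/images/"  "http://www.example.com/"
 ProxyPassReverse "/images/"  "http://www.example.com/"

 # /images/logo.png       is proxied to http://www.example.com/logo.png
 # /images/private/x.png  is excluded (hypothetical path) and handled locally
 # /index.html            does not match and is handled locally
     </highlight>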
   </section>

   <section id="cluster">
     <title>Clusters and Balancers</title>

     <p>
       As useful as the above is, it has a deficiency: should the (single)
       backend node go down, or become heavily loaded, proxying those
       requests provides no real advantage. What is needed is the ability
       to define a set or group of backend servers which can handle such
       requests and for the reverse proxy to load balance and fail over among
       them. This group is sometimes called a <em>cluster</em> but Apache httpd's
       term is a <em>balancer</em>. One defines a balancer by leveraging the
       <directive module="mod_proxy" type="section">Proxy</directive> and
       <directive module="mod_proxy">BalancerMember</directive> directives as
       shown:
     </p>

     <highlight language="config">
 &lt;Proxy balancer://myset&gt;
     BalancerMember http://www2.example.com:8080
     BalancerMember http://www3.example.com:8080
     ProxySet lbmethod=bytraffic
 &lt;/Proxy&gt;

 ProxyPass "/images/"  "balancer://myset/"
 ProxyPassReverse "/images/"  "balancer://myset/"
     </highlight>

     <p>
       The <code>balancer://</code> scheme is what tells httpd that we are creating
       a balancer set, with the name <em>myset</em>. It includes 2 backend servers,
       which httpd calls <em>BalancerMembers</em>. In this case, any requests for
       <code>/images</code> will be proxied to <em>one</em> of the 2 backends.
       The <directive module="mod_proxy">ProxySet</directive> directive
       specifies that the <em>myset</em> Balancer use a load balancing algorithm
       that balances based on I/O bytes.
     </p>

     <note type="hint"><title>Hint</title>
       <p>
       	<em>BalancerMembers</em> are also sometimes referred to as <em>workers</em>.
       </p>
    </note>
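
     <p>
       The balancer configuration above depends on several modules being
       loaded. The following is a sketch of the
       <directive module="mod_so">LoadModule</directive> lines that a typical
       dynamically loaded build would need; the <code>modules/</code> file
       paths are assumptions that vary by installation, and builds with
       these modules compiled in statically do not need the lines at all:
     </p>

     <highlight language="config">
 # Core proxy support plus the HTTP scheme handler for http:// backends
 LoadModule proxy_module modules/mod_proxy.so
 LoadModule proxy_http_module modules/mod_proxy_http.so

 # Balancer support and the shared memory provider it relies on
 LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
 LoadModule slotmem_shm_module modules/mod_slotmem_shm.so

 # Provider for the lbmethod=bytraffic algorithm used above
 LoadModule lbmethod_bytraffic_module modules/mod_lbmethod_bytraffic.so
     </highlight>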

   </section>

   <section id="config">
     <title>Balancer and BalancerMember configuration</title>

     <p>
       You can adjust numerous configuration details of the <em>balancers</em>
       and the <em>workers</em> via the various parameters defined in
       <directive module="mod_proxy">ProxyPass</directive>. For example,
       assuming we would want <code>http://www3.example.com:8080</code> to
       handle 3x the traffic with a timeout of 1 second, we would adjust the
       configuration as follows:
     </p>

     <highlight language="config">
 &lt;Proxy balancer://myset&gt;
     BalancerMember http://www2.example.com:8080
     BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1
     ProxySet lbmethod=bytraffic
 &lt;/Proxy&gt;

 ProxyPass "/images/"  "balancer://myset/"
 ProxyPassReverse "/images/"  "balancer://myset/"
     </highlight>

   </section>

   <section id="failover">
     <title>Failover</title>

     <p>
       You can also fine-tune various failover scenarios, detailing which
       workers and even which balancers should be accessed in such cases. For
       example, the setup below implements two failover cases. In the first,
       <code>http://hstandby.example.com:8080</code> is only sent traffic
       if all other workers in the <em>myset</em> balancer are not available.
       In the second, if that worker itself is not available, only then will the
       <code>http://bkup1.example.com:8080</code> and <code>http://bkup2.example.com:8080</code>
       workers be brought into rotation:
     </p>

     <highlight language="config">
 &lt;Proxy balancer://myset&gt;
     BalancerMember http://www2.example.com:8080
     BalancerMember http://www3.example.com:8080 loadfactor=3 timeout=1
     BalancerMember http://hstandby.example.com:8080 status=+H
     BalancerMember http://bkup1.example.com:8080 lbset=1
     BalancerMember http://bkup2.example.com:8080 lbset=1
     ProxySet lbmethod=byrequests
 &lt;/Proxy&gt;

 ProxyPass "/images/"  "balancer://myset/"
 ProxyPassReverse "/images/"  "balancer://myset/"
     </highlight>

     <p>
       The key to this failover setup is setting <code>http://hstandby.example.com:8080</code>
       with the <code>+H</code> status flag, which puts it in <em>hot standby</em> mode,
       and making the two <code>bkup#</code> servers part of load balancer set 1 (the
       default set is 0). For failover, hot standbys (if they exist) are used first, when all regular
       workers are unavailable; load balancer sets are always tried lowest number first.
     </p>

   </section>

   <section id="manager">
     <title>Balancer Manager</title>

     <p>
       One of the most unique and useful features of Apache httpd's reverse proxy is
 	  the embedded <em>balancer-manager</em> application. Similar to
 	  <module>mod_status</module>, <em>balancer-manager</em> displays
 	  the current working configuration and status of the enabled
 	  balancers and workers currently in use. However, not only does it
 	  display these parameters, it also allows for dynamic, runtime, on-the-fly
 	  reconfiguration of almost all of them, including adding new <em>BalancerMembers</em>
 	  (workers) to an existing balancer. To enable these capability, the following
 	  needs to be added to your configuration:
     </p>

     <highlight language="config">
 &lt;Location "/balancer-manager"&gt;
     SetHandler balancer-manager
     Require host localhost
 &lt;/Location&gt;
     </highlight>

     <note type="warning"><title>Warning</title>
       <p>Do not enable the <em>balancer-manager</em> until you have <a
       href="../mod/mod_proxy.html#access">secured your server</a>. In
       particular, ensure that access to the URL is tightly
       restricted.</p>
     </note>
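
     <p>
       As one possible way to tighten that restriction, the sketch below
       limits access to requests originating on the proxy host itself or
       from an administrative network. The <code>192.0.2.0/24</code> range
       is a placeholder assumption and should be replaced with your own
       management network, if any:
     </p>

     <highlight language="config">
 &lt;Location "/balancer-manager"&gt;
     SetHandler balancer-manager
     # Allow the local machine, plus a (hypothetical) admin network
     &lt;RequireAny&gt;
         Require local
         Require ip 192.0.2.0/24
     &lt;/RequireAny&gt;
 &lt;/Location&gt;
     </highlight>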

     <p>
       When the reverse proxy server is accessed at that URL
       (e.g. <code>http://rproxy.example.com/balancer-manager/</code>), you will see a
       page similar to the one below:
     </p>
     <p class="centered"><img src="../images/bal-man.png" alt="balancer-manager page" /></p>

     <p>
       This form allows the devops admin to adjust various parameters, take
       workers offline, change load balancing methods and add new works. For
       example, clicking on the balancer itself, you will get the following page:
     </p>
     <p class="centered"><img src="../images/bal-man-b.png" alt="balancer-manager page" /></p>

     <p>
       Clicking on a worker displays this page:
     </p>
     <p class="centered"><img src="../images/bal-man-w.png" alt="balancer-manager page" /></p>

     <p>
       To have these changes persist restarts of the reverse proxy, ensure that
       <directive module="mod_proxy">BalancerPersist</directive> is enabled.
     </p>
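
     <p>
       A minimal sketch of enabling persistence is shown here, assuming the
       directive is placed in the server or virtual host configuration that
       defines the balancers:
     </p>

     <highlight language="config">
 # Keep balancer-manager changes across restarts of the reverse proxy
 BalancerPersist On
     </highlight>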

   </section>

   <section id="health-check">
     <title>Dynamic Health Checks</title>

     <p>
       Before httpd proxies a request to a worker, it can <em>"test"</em> whether that worker
       is available by setting the <code>ping</code> parameter for that worker using
       <directive module="mod_proxy">ProxyPass</directive>. Oftentimes it is
       more useful to check the health of the workers <em>out of band</em>, in a
       dynamic fashion. This is achieved in Apache httpd by the
       <module>mod_proxy_hcheck</module> module.
     </p>
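
     <p>
       As a hedged illustration of out-of-band checking, the sketch below
       adds health check parameters from <module>mod_proxy_hcheck</module>
       to the workers of the <em>myset</em> balancer. It assumes that
       <module>mod_proxy_hcheck</module> and <module>mod_watchdog</module>
       are loaded; the <code>/health</code> URI, the expression name
       <code>ok234</code> and the interval values are illustrative
       assumptions, not recommended settings:
     </p>

     <highlight language="config">
 # Named expression: treat 2xx, 3xx and 4xx responses as healthy
 ProxyHCExpr ok234 {%{REQUEST_STATUS} =~ /^[234]/}

 &lt;Proxy balancer://myset&gt;
     # Probe a (hypothetical) /health URI with GET every 10 seconds
     BalancerMember http://www2.example.com:8080 hcmethod=GET hcuri=/health hcexpr=ok234 hcinterval=10
     # Plain TCP connect check every 5 seconds; 2 passes to re-enable,
     # 3 failures to mark the worker as failed
     BalancerMember http://www3.example.com:8080 hcmethod=TCP hcinterval=5 hcpasses=2 hcfails=3
 &lt;/Proxy&gt;
     </highlight>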

   </section>

   <section id="status">
     <title>BalancerMember status flags</title>

     <p>
       In the <em>balancer-manager</em> the current state, or <em>status</em>, of a worker
       is displayed and can be set/reset. The meanings of these statuses are as follows:
     </p>
       <table border="1">
       	<tr><th>Flag</th><th>String</th><th>Description</th></tr>
       	<tr><td>&nbsp;</td><td><em>Ok</em></td><td>Worker is available</td></tr>
       	<tr><td>&nbsp;</td><td><em>Init</em></td><td>Worker has been initialized</td></tr>
         <tr><td><code>D</code></td><td><em>Dis</em></td><td>Worker is disabled and will not accept any requests; will be
                     automatically retried.</td></tr>
         <tr><td><code>S</code></td><td><em>Stop</em></td><td>Worker is administratively stopped; will not accept requests
                     and will not be automatically retried.</td></tr>
         <tr><td><code>I</code></td><td><em>Ign</em></td><td>Worker is in ignore-errors mode and will always be considered available.</td></tr>
         <tr><td><code>H</code></td><td><em>Stby</em></td><td>Worker is in hot-standby mode and will only be used if no other
                     viable workers are available.</td></tr>
         <tr><td><code>E</code></td><td><em>Err</em></td><td>Worker is in an error state, usually due to failing pre-request check;
                     requests will not be proxied to this worker, but it will be retried depending on
                     the <code>retry</code> setting of the worker.</td></tr>
         <tr><td><code>N</code></td><td><em>Drn</em></td><td>Worker is in drain mode and will only accept existing sticky sessions
                     destined for itself and ignore all other requests.</td></tr>
         <tr><td><code>C</code></td><td><em>HcFl</em></td><td>Worker has failed dynamic health check and will not be used until it
                     passes subsequent health checks.</td></tr>
       </table>
   </section>

 </manualpage>