| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
| <HTML> |
| <HEAD> |
| <META NAME="description" |
| CONTENT="Some 'how to' tips for the Apache httpd server"> |
| <META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles"> |
| <TITLE>Apache HOWTO documentation</TITLE> |
| </HEAD> |
| |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| <BODY |
| BGCOLOR="#FFFFFF" |
| TEXT="#000000" |
| LINK="#0000FF" |
| VLINK="#000080" |
| ALINK="#FF0000" |
| > |
| <!--#include virtual="header.html" --> |
| <H1 ALIGN="CENTER">Apache HOWTO documentation</H1> |
| |
| How to: |
| <UL> |
| <LI><A HREF="#redirect">redirect an entire server or directory to a single |
| URL</A> |
| <LI><A HREF="#logreset">reset your log files</A> |
| <LI><A HREF="#stoprob">stop/restrict robots</A> |
| <LI><A HREF="#proxyssl">proxy SSL requests <EM>through</EM> your non-SSL |
| server</A> |
| </UL> |
| |
| <HR> |
| <H2><A NAME="redirect">How to redirect an entire server or directory to a |
| single URL</A></H2> |
| |
| <P>There are two chief ways to redirect all requests for an entire |
| server to a single location: one which requires the use of |
| <CODE>mod_rewrite</CODE>, and another which uses a CGI script. |
| |
| <P>First: if all you need to do is migrate a server from one name to |
| another, simply use the <CODE>Redirect</CODE> directive, as supplied |
| by <CODE>mod_alias</CODE>: |
| |
| <BLOCKQUOTE><PRE> |
| Redirect / http://www.apache.org/ |
| </PRE></BLOCKQUOTE> |
| |
| <P>Since <CODE>Redirect</CODE> will forward along the complete path, |
| however, it may not be appropriate - for example, when the directory |
| structure has changed after the move, and you simply want to direct people |
| to the home page. |
| |
| <P>The best option is to use the standard Apache module |
| <CODE>mod_rewrite</CODE>. |
| If that module is compiled in, the following lines |
| |
| <BLOCKQUOTE><PRE>RewriteEngine On |
| RewriteRule /.* http://www.apache.org/ [R] |
| </PRE></BLOCKQUOTE> |
| |
| This will send an HTTP 302 Redirect back to the client, and no matter |
| what they gave in the original URL, they'll be sent to |
| "http://www.apache.org". |
| |
| The second option is to set up a <CODE>ScriptAlias</CODE> pointing to |
| a <STRONG>CGI script</STRONG> which outputs a 301 or 302 status and the |
| location |
| of the other server.</P> |
| |
| <P>By using a <STRONG>CGI script</STRONG> you can intercept various requests |
| and |
| treat them specially, <EM>e.g.</EM>, you might want to intercept |
| <STRONG>POST</STRONG> |
| requests, so that the client isn't redirected to a script on the other |
| server which expects POST information (a redirect will lose the POST |
| information.) You might also want to use a CGI script if you don't |
| want to compile mod_rewrite into your server. |
| |
| <P>Here's how to redirect all requests to a script... In the server |
| configuration file, |
| <BLOCKQUOTE><PRE>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script/</PRE> |
| </BLOCKQUOTE> |
| |
| and here's a simple perl script to redirect requests: |
| |
| <BLOCKQUOTE><PRE> |
| #!/usr/local/bin/perl |
| |
| print "Status: 302 Moved Temporarily\r\n" . |
| "Location: http://www.some.where.else.com/\r\n" . |
| "\r\n"; |
| |
| </PRE></BLOCKQUOTE></P> |
| |
| <HR> |
| |
| <H2><A NAME="logreset">How to reset your log files</A></H2> |
| |
| <P>Sooner or later, you'll want to reset your log files (access_log and |
| error_log) because they are too big, or full of old information you don't |
| need.</P> |
| |
| <P><CODE>access.log</CODE> typically grows by 1Mb for each 10,000 requests.</P> |
| |
| <P>Most people's first attempt at replacing the logfile is to just move the |
| logfile or remove the logfile. This doesn't work.</P> |
| |
| <P>Apache will continue writing to the logfile at the same offset as before the |
| logfile moved. This results in a new logfile being created which is just |
| as big as the old one, but it now contains thousands (or millions) of null |
| characters.</P> |
| |
| <P>The correct procedure is to move the logfile, then signal Apache to tell |
| it to reopen the logfiles.</P> |
| |
| <P>Apache is signaled using the <STRONG>SIGHUP</STRONG> (-1) signal. |
| <EM>e.g.</EM> |
| <BLOCKQUOTE><CODE> |
| mv access_log access_log.old<BR> |
| kill -1 `cat httpd.pid` |
| </CODE></BLOCKQUOTE> |
| </P> |
| |
| <P>Note: <CODE>httpd.pid</CODE> is a file containing the |
| <STRONG>p</STRONG>rocess <STRONG>id</STRONG> |
| of the Apache httpd daemon, Apache saves this in the same directory as the log |
| files.</P> |
| |
| <P>Many people use this method to replace (and backup) their logfiles on a |
| nightly or weekly basis.</P> |
| <HR> |
| |
| <H2><A NAME="stoprob">How to stop or restrict robots</A></H2> |
| |
| <P>Ever wondered why so many clients are interested in a file called |
| <CODE>robots.txt</CODE> which you don't have, and never did have?</P> |
| |
| <P>These clients are called <STRONG>robots</STRONG> (also known as crawlers, |
| spiders and other cute name) - special automated clients which |
| wander around the web looking for interesting resources.</P> |
| |
| <P>Most robots are used to generate some kind of <EM>web index</EM> which |
| is then used by a <EM>search engine</EM> to help locate information.</P> |
| |
| <P><CODE>robots.txt</CODE> provides a means to request that robots limit their |
| activities at the site, or more often than not, to leave the site alone.</P> |
| |
| <P>When the first robots were developed, they had a bad reputation for |
| sending hundreds/thousands of requests to each site, often resulting |
| in the site being overloaded. Things have improved dramatically since |
| then, thanks to <A |
| HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html"> |
| Guidelines for Robot Writers</A>, but even so, some robots may exhibit |
| unfriendly behavior which the webmaster isn't willing to tolerate, and |
| will want to stop.</P> |
| |
| <P>Another reason some webmasters want to block access to robots, is to |
| stop them indexing dynamic information. Many search engines will use the |
| data collected from your pages for months to come - not much use if your |
| serving stock quotes, news, weather reports or anything else that will be |
| stale by the time people find it in a search engine.</P> |
| |
| <P>If you decide to exclude robots completely, or just limit the areas |
| in which they can roam, create a <CODE>robots.txt</CODE> file; refer |
| to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html" |
| >robot information pages</A> provided by Martijn Koster for the syntax.</P> |
| |
| <HR> |
| <H2><A NAME="proxyssl">How to proxy SSL requests <EM>through</EM> |
| your non-SSL Apache server</A> |
| <BR> |
| <SMALL>(<EM>submitted by David Sedlock</EM>)</SMALL> |
| </H2> |
| <P> |
| SSL uses port 443 for requests for secure pages. If your browser just |
| sits there for a long time when you attempt to access a secure page |
| over your Apache proxy, then the proxy may not be configured to handle |
| SSL. You need to instruct Apache to listen on port 443 in addition to |
| any of the ports on which it is already listening: |
| </P> |
| <PRE> |
| Listen 80 |
| Listen 443 |
| </PRE> |
| <P> |
| Then set the security proxy in your browser to 443. That might be it! |
| </P> |
| <P> |
| If your proxy is sending requests to another proxy, then you may have |
| to set the directive ProxyRemote differently. Here are my settings: |
| </P> |
| <PRE> |
| ProxyRemote http://nicklas:80/ http://proxy.mayn.franken.de:8080 |
| ProxyRemote http://nicklas:443/ http://proxy.mayn.franken.de:443 |
| </PRE> |
| <P> |
| Requests on port 80 of my proxy <SAMP>nicklas</SAMP> are forwarded to |
| proxy<SAMP>.mayn.franken.de:8080</SAMP>, while requests on port 443 are |
| forwarded to <SAMP>proxy.mayn.franken.de:443</SAMP>. |
| If the remote proxy is not set up to |
| handle port 443, then the last directive can be left out. SSL requests |
| will only go over the first proxy. |
| </P> |
| <P> |
| Note that your Apache does NOT have to be set up to serve secure pages |
| with SSL. Proxying SSL is a different thing from using it. |
| </P> |
| <!--#include virtual="footer.html" --> |
| </BODY> |
| </HTML> |