| <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> |
| <HTML> |
| <HEAD> |
| <META NAME="description" CONTENT="Some 'how to' tips for the Apache httpd server"> |
| <META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles"> |
| <TITLE>Apache HOWTO documentation</TITLE> |
| </HEAD> |
| |
| <BODY> |
| <!--#include virtual="header.html" --> |
| <H1>Apache HOWTO documentation</H1> |
| |
| How to: |
| <ul> |
| <li><A HREF="#redirect">redirect an entire server or directory to a single URL</A> |
| <li><A HREF="#logreset">reset your log files</A> |
| <li><A HREF="#stoprob">stop/restrict robots</A> |
| </ul> |
| |
| <HR> |
| <H2><A name="redirect">How to redirect an entire server or directory to a single URL</A></H2> |
| |
| <P>There are two chief ways to redirect all requests for an entire |
| server to a single location: one which requires the use of |
| <code>mod_rewrite</code>, and another which uses a CGI script. |
| |
| <P>Before trying either: if all you need to do is migrate a server from one name to |
| another, simply use the <code>Redirect</code> directive, as supplied |
| by <code>mod_alias</code>: |
| |
| <blockquote><pre> |
| Redirect / http://www.apache.org/ |
| </pre></blockquote> |
| |
| <P>Since <code>Redirect</code> will forward along the complete path, |
| however, it may not be appropriate - for example, when the directory |
| structure has changed after the move, and you simply want to direct people |
| to the home page. |
| |
| <P>The best option is to use the standard Apache module <code>mod_rewrite</code>. |
| If that module is compiled in, add the following lines to the server configuration file: |
| |
| <blockquote><pre>RewriteEngine On |
| RewriteRule /.* http://www.apache.org/ [R] |
| </pre></blockquote> |
| |
| <P>This will send an HTTP 302 redirect back to the client, and no matter |
| what path they gave in the original URL, they'll be sent to |
| "http://www.apache.org/".</P> |
| |
| <P>The second option is to set up a <CODE>ScriptAlias</CODE> pointing to |
| a <B>CGI script</B> which outputs a 301 or 302 status and the location |
| of the other server.</P> |
| |
| <P>By using a <B>CGI script</B> you can intercept various requests and |
| treat them specially, e.g. you might want to intercept <B>POST</B> |
| requests, so that the client isn't redirected to a script on the other |
| server which expects POST information (a redirect will lose the POST |
| information). You might also want to use a CGI script if you don't |
| want to compile <code>mod_rewrite</code> into your server. |
| |
| <P>Here's how to redirect all requests to a script. In the server |
| configuration file, add: |
| <blockquote><pre>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script</pre></blockquote> |
| |
| and here's a simple Perl script to redirect requests: |
| |
| <blockquote><pre> |
| #!/usr/local/bin/perl |
| |
| print "Status: 302 Moved Temporarily\r\n", |
|       "Location: http://www.some.where.else.com/\r\n\r\n"; |
| |
| </pre></blockquote></P> |
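<P>As a rough illustration of the POST case described above, the redirect script could branch on the request method, which a CGI script receives in the <code>REQUEST_METHOD</code> environment variable. This is only a sketch, written as a shell script rather than Perl. The <code>redirect_response</code> function name and the wording of the page returned for POST are assumptions, and the target URL is the placeholder from the Perl example.</P>

```shell
#!/bin/sh
# Hypothetical variant of the redirect script: POST requests get an
# explanatory page, everything else gets the 302 redirect.
TARGET="http://www.some.where.else.com/"   # placeholder URL from the Perl example

redirect_response() {
    # $1 is the request method, as the server passes it in REQUEST_METHOD
    if [ "$1" = "POST" ]; then
        # A redirect would lose the POST data, so answer with a page instead
        printf 'Status: 200 OK\r\n'
        printf 'Content-Type: text/html\r\n\r\n'
        printf '<P>This server has moved to <A HREF="%s">%s</A>. Please resubmit your form there.</P>\n' "$TARGET" "$TARGET"
    else
        printf 'Status: 302 Moved Temporarily\r\n'
        printf 'Location: %s\r\n\r\n' "$TARGET"
    fi
}

# Default to GET when the script is run outside a CGI environment
redirect_response "${REQUEST_METHOD:-GET}"
```

<P>To try it, save it at the path named in the <code>ScriptAlias</code> line and make it executable.</P>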
| |
| <HR> |
| |
| <H2><A name="logreset">How to reset your log files</A></H2> |
| |
| <P>Sooner or later, you'll want to reset your log files (access_log and |
| error_log) because they are too big, or full of old information you don't |
| need.</P> |
| |
| <P><CODE>access_log</CODE> typically grows by about 1 MB for every 10,000 requests.</P> |
| |
| <P>Most people's first attempt at replacing the logfile is to simply move |
| or remove it. This doesn't work.</P> |
| |
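| <P>Apache keeps the logfile open and continues writing to it at the same offset as before, |
| so a moved logfile keeps growing under its new name, and a removed one keeps using disk |
| space until Apache closes it. If the file is emptied in place instead, this can result in |
| a logfile which is just as big as the old one, but now contains thousands (or millions) |
| of null characters.</P> |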
| |
| <P>The correct procedure is to move the logfile, then signal Apache to tell it to reopen the logfiles.</P> |
| |
| <P>Apache is signaled using the <B>SIGHUP</B> (-1) signal, e.g. |
| <blockquote><code> |
| mv access_log access_log.old<BR> |
| kill -1 `cat httpd.pid` |
| </code></blockquote> |
| </P> |
| |
| <P>Note: <code>httpd.pid</code> is a file containing the <B>p</B>rocess <B>id</B> |
| of the Apache httpd daemon. Apache saves this file in the same directory as the log |
| files.</P> |
| |
| <P>Many people use this method to replace (and back up) their logfiles on a |
| nightly or weekly basis.</P> |
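<P>A nightly job along those lines might look something like the following sketch. It assumes the logs and <code>httpd.pid</code> live in one directory; the <code>LOGDIR</code> default, the function names and the date-stamped archive names are all hypothetical choices, not something Apache prescribes.</P>

```shell
#!/bin/sh
# Sketch of a log rotation job: move the logs aside, then send SIGHUP.
# LOGDIR is an assumed location; set it to wherever your logs are kept.
LOGDIR="${LOGDIR:-/usr/local/httpd/logs}"

rotate_logs() {
    # $1 = log directory, $2 = suffix for the archived copies
    for log in access_log error_log; do
        if [ -f "$1/$log" ]; then
            mv "$1/$log" "$1/$log.$2"
        fi
    done
}

signal_apache() {
    # $1 = log directory; httpd.pid is expected alongside the logs
    if [ -f "$1/httpd.pid" ]; then
        kill -HUP "$(cat "$1/httpd.pid")"   # same signal as kill -1
    fi
}

rotate_logs "$LOGDIR" "$(date +%Y%m%d)"
signal_apache "$LOGDIR"
```

<P>Run from cron once a night, this leaves files such as <code>access_log.19970101</code> behind, which can then be compressed or deleted at leisure.</P>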
| <HR> |
| |
| <H2><A name="stoprob">How to stop or restrict robots</A></H2> |
| |
| <P>Ever wondered why so many clients are interested in a file called |
| <code>robots.txt</code> which you don't have, and never did have?</P> |
| |
| <P>These clients are called <B>robots</B> (also known as crawlers, |
| spiders and other cute names) - special automated clients which |
| wander around the web looking for interesting resources.</P> |
| |
| <P>Most robots are used to generate some kind of <em>web index</em> which |
| is then used by a <em>search engine</em> to help locate information.</P> |
| |
| <P><code>robots.txt</code> provides a means to request that robots limit their |
| activities at the site, or more often than not, to leave the site alone.</P> |
| |
| <P>When the first robots were developed, they had a bad reputation for sending hundreds or thousands of requests to each site, often overloading it. Things have improved dramatically since then, thanks to the <A HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html">Guidelines for Robot Writers</A>, but even so, some robots may <A HREF="http://www.zyzzyva.com/robots/alert/">exhibit unfriendly behavior</A> which the webmaster isn't willing to tolerate and will want to stop.</P> |
| |
| <P>Another reason some webmasters want to block access to robots is to |
| stop them indexing dynamic information. Many search engines will use the |
| data collected from your pages for months to come - not much use if you're |
| serving stock quotes, news, weather reports or anything else that will be |
| stale by the time people find it in a search engine.</P> |
| |
| <P>If you decide to exclude robots completely, or just limit the areas |
| in which they can roam, create a <CODE>robots.txt</CODE> file; refer |
| to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html">robot information pages</A> provided by Martijn Koster for the syntax.</P> |
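<P>For a flavor of what such a file contains, here is a small sketch that writes one from the shell. The file must end up at the top level of the server, so that it is served as <code>/robots.txt</code>. The <code>write_robots_txt</code> function name and the <code>/cgi-bin/</code> and <code>/quotes/</code> paths are made up for illustration; substitute the areas of your own site.</P>

```shell
#!/bin/sh
# Writes a sample robots.txt that keeps all robots out of two areas.
# The excluded paths are hypothetical examples.
write_robots_txt() {
    # $1 = the directory the server maps to "/"
    cat > "$1/robots.txt" <<'EOF'
# Keep all robots out of the dynamic areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /quotes/
EOF
}

# Example (the document root path is an assumption):
#   write_robots_txt /usr/local/httpd/htdocs
```

<P>To exclude robots from the whole site, replace the two <code>Disallow</code> lines with a single <code>Disallow: /</code>.</P>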
| |
| <!--#include virtual="footer.html" --> |
| </BODY> |
| </HTML> |