docs/manual/logs.html.en - httpd - Git at Google

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
 <HTML>
 <HEAD>
 <TITLE>Log Files - Apache HTTP Server</TITLE>
 </HEAD>

 <!-- Background white, links blue (unvisited), navy (visited), red (active) -->
 <BODY
  BGCOLOR="#FFFFFF"
  TEXT="#000000"
  LINK="#0000FF"
  VLINK="#000080"
  ALINK="#FF0000"
 >
 <!--#include virtual="header.html" -->
 <h1 align="center">Log Files</h1>

 <p>In order to effectively manage a web server, it is necessary to get
 feedback about the activity and performance of the server as well as
 any problems that may be occuring.  The Apache HTTP Server provides
 very comprehensive and flexible logging capabilities.  This document
 describes how to configure its logging capabilities, and how to
 understand what the logs contain.</p>

 <ul>
 <li><a href="#security">Security Warning</a></li>
 <li><a href="#errorlog">Error Log</a></li>
 <li><a href="#accesslog">Access Log</a>
   <ul>
     <li><a href="#common">Common Log Format</a></li>
     <li><a href="#combined">Combined Log Format</a></li>
     <li><a href="#multiple">Multiple Access Logs</a></li>
     <li><a href="#conditional">Conditional Logging</a></li>
   </ul></li>
 <li><a href="#rotation">Log Rotation</a></li>
 <li><a href="#piped">Piped Logs</a></li>
 <li><a href="#virtualhosts">VirtualHosts</a>
 <li><a href="#other">Other Log Files</a>
   <ul>
     <li><a href="#pidfile">PID File</a></li>
     <li><a href="#scriptlog">Script Log</a></li>
     <li><a href="#rewritelog">Rewrite Log</a></li>
   </ul></li>
 </ul>

 <hr>

 <h2><a name="security">Security Warning</a></h2>

 <p>Anyone who can write to the directory where Apache is writing a
 log file can almost certainly gain access to the uid that the server is
 started as, which is normally root.  Do <EM>NOT</EM> give people write
 access to the directory the logs are stored in without being aware of
 the consequences; see the <A HREF="misc/security_tips.html">security tips</A>
 document for details.</p>

 <p>In addition, log files may contain information supplied directly
 by the client, without escaping.  Therefore, it is possible for
 malicious clients to insert control-characters in the log files, so
 care must be taken in dealing with raw logs.</p>

 <hr>

 <h2><a name="errorlog">Error Log</a></h2>

 <table border="1">
 <tr><td valign="top">
 <strong>Related Directives</strong><br><br>

 <a href="mod/core.html#errorlog">ErrorLog</a><br>
 <a href="mod/core.html#loglevel">LogLevel</a>
 </td></tr></table>

 <p>The server error log, whose name and location is set by the <a
 href="mod/core.html#errorlog">ErrorLog</a> directive, is the most
 important log file.  This is the place where Apache httpd will send
 diagnostic information and record any errors that it encounters in
 processing requests.  It is the first place to look when a problem
 occurs with starting the server or with the operation of the server,
 since it will often contain details of what went wrong and how to fix
 it.</p>

 <p>The error log is usually written to a file (typically
 <code>error_log</code> on unix systems and <code>error.log</code> on
 Windows and OS/2).  On unix systems it is also possible to have the
 server send errors to <code>syslog</code> or <a href="#pipe">pipe
 them to a program</a>.</p>

 <p>The format of the error log is relatively free-form and
 descriptive.  But there is certain information that is contained
 in most error log entries.  For example, here is a typical message.</p>

 <blockquote><code>
 [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
 </code></blockquote>

 <p>The first item in the log entry is the date and time of the
 message.  The second entry lists the severity of the error being
 reported. The <a href="mod/core.html#loglevel">LogLevel</a> directive
 is used to control the types of errors that are sent to the error log
 by restricting the severity level.  The third entry gives the IP
 address of the client that generated the error.  Beyond that is the
 message itself, which in this case indicates that the server has been
 configured to deny the client access.  The server reports the
 file-system path (as opposed to the web path) of the requested
 document.</p>

 <p>A very wide variety of different messages can appear in the error
 log.  Most look similar to the example above.  The error log will also
 contain debugging output from CGI scripts.  Any information written to
 <code>stderr</code> by a CGI script will be copied directly to the
 error log.</p>

 <p>It is not possible to customize the error log by adding or removing
 information.  However, error log entries dealing with particular
 requests have corresponding entries in the <a href="accesslog">access
 log</a>.  For example, the above example entry corresponds to an
 access log entry with status code 403.  Since it is possible to
 customize the access log, you can obtain more information about error
 conditions using that log file.</p>

 <p>During testing, it is often useful to continuously monitor the
 error log for any problems.  On unix systems, you can accomplish this
 using:</p>
 <blockquote><code>
 tail -f error_log
 </code></blockquote>

 <hr>

 <h2><a name="accesslog">Access Log</a></h2>

 <table border=1><tr><td valign="top">
 <strong>Related Modules</strong><br><br>

 <a href="mod/mod_log_config.html">mod_log_config</a><br>

 </td><td valign="top">
 <strong>Related Directives</strong><br><br>

 <a href="mod/mod_log_config.html#customlog">CustomLog</a><br>
 <a href="mod/mod_log_config.html#logformat">LogFormat</a><br>
 <a href="mod/mod_setenvif.html#setenvif">SetEnvIf</a>

 </td></tr></table>

 <p>The server access log records all requests processed by the server.
 The location and content of the access log are controlled
 by the <a href="mod/mod_log_config.html#customlog">CustomLog</a>
 directive.  The <a
 href="mod/mod_log_config.html#logformat">LogFormat</a> directive can
 be used to simplify the selection of the contents of the logs.
 This section describes how to configure the server to record
 information in the access log.</p>

 <p>Of course, storing the information in the access log is only the
 start of log management.  The next step is to analyze this information
 to produce useful statistics.  Log analysis in general is beyond the
 scope of this document, and not really part of the job of the
 web server itself.  For more information about this topic, and for
 applications which perform log analysis, check the <a
 href="http://dmoz.org/Computers/Software/Internet/Site_Management/Log_analysis/"
 >Open Directory</a> or <a
 href="http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/"
 >Yahoo</a>.</p>

 <p>Various versions of Apache httpd have used other modules and
 directives to control access logging, including mod_log_referer,
 mod_log_agent, and the <code>TransferLog</code> directive.  The
 <code>CustomLog</code> directive now subsumes the functionality of all
 the older directives.</p>

 <p>The format of the access log is highly configurable.  The format is
 specified using a <a href="mod/mod_log_config.html#format">format
 string</a> that looks much like a C-style printf(1) format string.
 Some examples are presented in the next sections.  For a complete list
 of the possible contents of the format string, see the <a
 href="mod/mod_log_config.html">mod_log_config documentation</a>.</p>

 <h3><a name="common">Common Log Format</a></h3>

 <p>A typical configuration for the access log might look
 as follows.</p>

 <blockquote><code>
 LogFormat "%h %l %u %t \"%r\" %>s %b" common<br>
 CustomLog logs/access_log common
 </code></blockquote>

 <p>This defines the <em>nickname</em> <code>common</code> and
 associates it with a particular log format string.  The format string
 consists of percent directives, each of which tell the server to log a
 particular piece of information.  Literal characters may also be
 placed in the format string and will be copied directly into the log
 output.  The quote character (<code>"</code>) must be escaped by
 placing a back-slash before it to prevent it from being interpreted as
 the end of the format string.  The format string may also contain the
 special control characters "<code>\n</code>" for new-line and
 "<code>\t</code>" for tab.</p>

 <p>The <code>CustomLog</code> directive sets up a new log file using
 the defined <em>nickname</em>.  The filename for the access log is
 relative to the <a href="mod/core.html#serverroot">ServerRoot</a>
 unless it begins with a slash.</p>

 <p>The above configuration will write log entries in a format known as
 the Common Log Format (CLF).  This standard format can be produced by
 many different web servers and read by many log analysis programs.
 The log file entries produced in CLF will look something like
 this:</p>

 <blockquote><code>
 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
 </code></blockquote>

 <p>Each part of this log entry is described below.</p>

 <dl>
 <dt><code>127.0.0.1</code> (<code>%h</code>)</dt> <dd>This is the IP
 address of the client (remote host) which made the request to the
 server.  If <a
 href="mod/core.html#hostnamelookups">HostNameLookups</a> is set to
 <code>On</code>, then the server will try to determine the hostname
 and log it in place of the IP address.  However, this configuration is
 not recommended since it can significantly slow the server.  Instead,
 it is best to use a log post-processor such as <a
 href="programs/logresolve.html">logresolve</a> to determine the
 hostnames.  The IP address reported here is not necessarily the
 address of the machine at which the user is sitting.  If a proxy
 server exists between the user and the server, this address will be
 the address of the proxy, rather than the originating machine.</dd>

 <dt><code>-</code> (<code>%l</code>)</dt> <dd>The "hyphen" in the
 output indicates that the requested piece of information is not
 available.  In this case, the information that is not available is the
 RFC 1413 identity of the client determined by <code>identd</code> on
 the clients machine.  This information is highly unreliable and should
 almost never be used except on tightly controlled internal networks.
 Apache httpd will not even attempt to determine this information
 unless <a href="mod/core.html#identitycheck">IdentityCheck</a> is set
 to <code>On</code>.</dd>

 <dt><code>frank</code> (<code>%u</code>)</dt> <dd>This is the userid
 of the person requesting the document as determined by HTTP
 authentication.  The same value is typically provided to CGI scripts
 in the <code>REMOTE_USER</code> environment variable.  If the status
 code for the request (see below) is 401, then this value should not be
 trusted because the user is not yet authenticated.  If the document is
 not password protected, this entry will be "<code>-</code>" just like
 the previous one.</dd>

 <dt><code>[10/Oct/2000:13:55:36 -0700]</code> (<code>%t</code>)</dt>
 <dd>The time that the server finished processing the request.  The
 format is:
 <BLOCKQUOTE><CODE> [day/month/year:hour:minute:second zone] <BR>
 day = 2*digit<BR>
 month = 3*letter<BR>
 year = 4*digit<BR>
 hour = 2*digit<BR>
 minute = 2*digit<BR>
 second = 2*digit<BR>
 zone = (`+' | `-') 4*digit</CODE></BLOCKQUOTE>
 It is possible to have the time displayed in another format
 by specifying <code>%{format}t</code> in the log format string, where
 <code>format</code> is as in <code>strftime(3)</code> from the C
 standard library.
 </dd>

 <dt><code>"GET /apache_pb.gif HTTP/1.0"</code>
 (<code>\"%r\"</code>)</dt> <dd>The request line from the client is
 given in double quotes.  The request line contains a great deal of
 useful information.  First, the method used by the client is
 <code>GET</code>.  Second, the client requested the resource
 <code>/apache_pb.gif</code>, and third, the client used the protocol
 <code>HTTP/1.0</code>. It is also possible to log one or more
 parts of the request line independently.  For example, the format
 string "<code>%m %U%q %H</code>" will log the method, path,
 query-string, and protocol, resulting in exactly the same output as
 "<code>%r</code>".</dd>

 <dt><code>200</code> (<code>%>s</code>)</dt> <dd>This is the status
 code that the server sends back to the client.  This information is
 very valuable, because it reveals whether the request resulted in a
 successful response (codes beginning in 2), a redirection (codes
 beginning in 3), an error caused by the client (codes beginning in 4),
 or an error in the server (codes beginning in 5).
 The full list of possible status codes can be
 found in the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616.txt"
 >HTTP specification</a> (RFC2616 section 10).</dd>

 <dt><code>2326</code> (<code>%b</code>)
 <dd>The last entry indicates the size of the object returned to
 the client, not including the response headers.  If no content
 was returned to the client, this value will be "<code>-</code>".
 To log "<code>0</code>" for no content, use <code>%B</code>
 instead.</dd>

 </dl>

 <h4><a name="combined">Combined Log Format</a></h4>

 <p>Another commonly used format string is called the
 Combined Log Format.  It can be used as follows.</p>

 <blockquote><code>
 LogFormat "%h %l %u %t \"%r\" %&gt;s %b \"%{Referer}i\" \"%{User-agent}i\"" combined<br>
 CustomLog log/acces_log combined
 </code></blockquote>

 <p>This format is exactly the same as the Common Log Format, with the
 addition of two more fields.  Each of the additional fields uses the
 percent-directive <code>%{<em>header</em>}i</code>, where
 <em>header</em> can be any HTTP request header.  The access log under
 this format will look like:</p>

 <blockquote><code>
 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
 </code></blockquote>

 <p>The additional fields are:</p>

 <dl>

 <dt><code>"http://www.example.com/start.html"</code>
 (<code>\"%{Referer}i\"</code>)</dt> <dd>The "Referer" (sic) HTTP
 request header.  This gives the site that the client reports having
 been referred from.  (This should be the page that links to or includes
 <code>/apache_pb.gif</code>).

 <dt><code>"Mozilla/4.08 [en] (Win98; I ;Nav)"</code>
 (<code>\"%{User-agent}i\"</code>)</dt> <dd>The User-Agent HTTP request
 header.  This is the identifying information that the client browser
 reports about itself.</dd>

 </dl>

 <h3><a name="multiple">Multiple Access Logs</a></h3>

 <p>Multiple access logs can be created simply by specifying multiple
 <code>CustomLog</code> directives in the configuration file.  For
 example, the following directives will create three access logs.  The
 first contains the basic CLF information, while the second and third
 contain referer and browser information.  The last two
 <code>CustomLog</code> lines show how to mimic the effects of the
 <code>ReferLog</code> and <code>AgentLog</code> directives.</p>

 <blockquote><code>
 LogFormat "%h %l %u %t \"%r\" %>s %b" common<br>
 CustomLog logs/access_log common<br>
 CustomLog logs/referer_log "%{Referer}i -> %U"<br>
 CustomLog logs/agent_log "%{User-agent}i"
 </code></blockquote>

 <p>This example also shows that it is not necessary to define a
 nickname with the <code>LogFormat</code> directive.  Instead, the log
 format can be specified directly in the <code>CustomLog</code>
 directive.</p>

 <h3><a name="conditional">Conditional Logging</a></h3>

 <p>There are times when it is convenient to exclude certain entries
 from the access logs based on characteristics of the client request.
 This is easily accomplished with the help of <a
 href="env.html">environment variables</a>.  First, an environment
 variable must be set to indicate that the request meets certain
 conditions.  This is usually accomplished with <a
 href="mod/mod_setenvif.html#setenvif">SetEnvIf</a>.  Then the
 <code>env=</code> clause of the <code>CustomLog</code> directive is
 used to include or exclude requests where the environment variable is
 set.  Some examples:</p>

 <blockquote><code>
 # Mark requests from the loop-back interface<br>
 SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog<br>
 # Mark requests for the robots.txt file<br>
 SetEnvIf Request_URI "^/robots\.txt$" dontlog<br>
 # Log what remains<br>
 CustomLog logs/access_log common env=!dontlog
 </code></blockquote>

 <p>As another example, consider logging requests from english-speakers
 to one log file, and non-english speakers to a different log file.</p>

 <blockquote><code>
 SetEnvIf Accept-Language "en" english<br>
 CustomLog logs/english_log common env=english<br>
 CustomLog logs/non_english_log common env=!english
 </code></blockquote>

 <p>Although we have just shown that conditional logging is very
 powerful and flexibly, it is not the only way to control the contents
 of the logs.  Log files are more useful when they contain a complete
 record of server activity.  It is often easier to simply post-process
 the log files to remove requests that you do not want to consider.</p>

 <hr>

 <h2><a name="rotation">Log Rotation</a></h2>

 <p>On even a moderately busy server, the quantity of information
 stored in the log files is very large.  The access log file typically
 grows 1 MB or more per 10,000 requests.  It will consequently be
 necessary to periodically rotate the log files by moving or deleting
 the existing logs.  This cannot be done while the server is running,
 because Apache will continue writing to the old log file as long as it
 holds the file open.  Instead, the server must be <a
 href="stopping.html">restarted</a> after the log files are moved or
 deleted so that it will open new log files.</p>

 <p>By using a <em>graceful</em> restart, the server can be instructed
 to open new log files without losing any existing or pending
 connections from clients.  However, in order to accomplish this, the
 server must continue to write to the old log files while it finishes
 serving old requests.  It is therefore necessary to wait for some time
 after the restart before doing any processing on the log files.  A
 typical scenario that simply rotates the logs and compresses the old
 logs to save space is:</p>

 <blockquote><code>
 mv access_log access_log.old<br>
 mv error_log error_log.old<br>
 apachectl graceful<br>
 sleep 600<br>
 gzip access_log.old error_log.old
 </code></blockquote>

 <p>Another way to perform log rotation is using <a href="#piped">piped
 logs</a> as discussed in the next section.</p>

 <hr>

 <h2><a name="piped">Piped Logs</a></h2>

 <p>Apache httpd is capable of writing error and access log files
 through a pipe to another process, rather than directly to a file.
 This capability dramatically increases the flexibility of logging,
 without adding code to the main server.  In order to write logs to a
 pipe, simply replace the filename with the pipe character
 "<code>|</code>", followed by the name of the executable which should
 accept log entries on its standard input.  Apache will start the
 piped-log process when the server starts, and will restart it if it
 crashes while the server is running.  (This last feature is why we can
 refer to this technique as "reliable piped logging".)</p>

 <p>Piped log processes are spawned by the parent Apache httpd process,
 and inherit the userid of that process.  This means that piped log
 programs usually run as root.  It is therefore very important to keep
 the programs simple and secure.</p>

 <p>Some simple examples using piped logs:</p>

 <blockquote><code>
 # compressed logs<br>
 CustomLog "|/usr/bin/gzip -c >> /var/log/access_log.gz" common<br>
 # almost-real-time name resolution<br>
 CustomLog "|/usr/local/apache/bin/logresolve >> /var/log/access_log" common
 </code></blockquote>

 <p>Notice that quotes are used to enclose the entire command
 that will be called for the pipe.  Although these examples are
 for the access log, the same technique can be used for the
 error log.</p>

 <p>One important use of piped logs is to allow log rotation without
 having to restart the server.  The Apache HTTP Server includes a
 simple program called <a
 href="programs/rotatelogs.html">rotatelogs</a> for this purpose.  For
 example, to rotate the logs every 24 hours, you can use:</p>

 <blockquote><code>
 CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
 </code></blockquote>

 <p>A similar, but much more flexible log rotation program
 called <a href="http://www.ford-mason.co.uk/resources/cronolog/">cronolog</a>
 is available at an external site.</p>

 <p>As with conditional logging, piped logs are a very powerful tool,
 but they should not be used where a simpler solution like
 off-line post-processing is available.</p>

 <hr>

 <h2><a name="virtualhosts">Virtual Hosts</a></h2>

 <p>When running a server with many <a href="vhosts/">virtual
 hosts</a>, there are several options for dealing with log files.
 First, it is possible to use logs exactly as in a single-host server.
 Simply by placing the logging directives outside the
 <code>&lt;VirtualHost&gt;</code> sections in the main server context,
 it is possible to log all requests in the same access log and error
 log.  This technique does not allow for easy collection of statistics
 on individual virtual hosts.</p>

 <p>If <code>CustomLog</code> or <code>ErrorLog</code> directives are
 placed inside a <code>&lt;VirtualHost&gt;</code> section, all requests
 or errors for that virtual host will be logged only to the specified
 file.  Any virtual host which does not have logging directives will
 still have its requests sent to the main server logs.  This technique
 is very useful for a small number of virtual hosts, but if the number
 of hosts is very large, it can be complicated to manage.  In addition,
 it can often create problems with <a
 href="vhosts/fd-limits.html">insufficient file descriptors</a>.</p>

 <p>For the access log, there is a very good compromise.  By adding
 information on the virtual host to the log format string,
 it is possible to log all hosts to the same log, and later
 split the log into individual files.  For example, consider the
 following directives.</p>

 <blockquote><code>
 LogFormat "%v %l %u %t \"%r\" %>s %b" comonvhost<br>
 CustomLog logs/access_log comonvhost
 </code></blockquote>

 <p>The <code>%v</code> is used to log the name of the virtual host
 that is serving the request.  Then a program like <a
 href="programs/other.html">split-logfile</a> can be used to
 post-process the access log in order to split it into one file per
 virtual host.</p>

 <p>Unfortunately, no similar technique is available for the error log,
 so you must choose between mixing all virtual hosts in the same error
 log and using one error log per virtual host.</p>

 <hr>

 <h2><a name="other">Other Log Files</a></h2>

 <table border=1><tr><td valign="top">
 <strong>Related Modules</strong><br><br>

 <a href="mod/mod_cgi.html">mod_cgi</a><br>
 <a href="mod/mod_rewrite.html">mod_rewrite</a>

 </td><td valign="top">
 <strong>Related Directives</strong><br><br>

 <a href="mod/core.html#pidfile">PidFile</a><br>
 <a href="mod/mod_rewrite.html#RewriteLog">RewriteLog</a></br>
 <a href="mod/mod_rewrite.html#RewriteLogLevel">RewriteLogLevel</a></br>
 <a href="mod/mod_cgi.html#scriptlog">ScriptLog</a><br>
 <a href="mod/mod_cgi.html#scriptloglength">ScriptLogLength</a><br>
 <a href="mod/mod_cgi.html#scriptlogbuffer">ScriptLogBuffer</a>

 </td></tr></table>

 <h3><a name="pidfile">PID File</a></h3>

 <p>On startup, Apache httpd saves the process id of the parent httpd
 process to the file <code>logs/httpd.pid</code>. This filename can be
 changed with the <A HREF="mod/core.html#pidfile">PidFile</A>
 directive. The process-id is for use by the administrator in
 restarting and terminating the daemon by sending signals
 to the parent process; on Windows, use the -k command line
 option instead.  For more information see the <A
 HREF="stopping.html">Stopping and Restarting</A> page.

 <h3><a name="scriptlog">Script Log</a></h3>

 <p>In order to aid in debugging, the
 <a href="mod/mod_cgi.html#scriptlog">ScriptLog</a>
 directive allows you to record the input to and output from
 CGI scripts.  This should only be used in testing - not for
 live servers.  More information is available in the
 <a href="mod/mod_cgi.html">mod_cgi documentation</a>.

 <h3><a name="rewritelog">Rewrite Log</a></h3>

 <p>When using the powerful and complex features of <a
 href="mod/mod_rewrite.html">mod_rewrite</a>, it is almost always
 necessary to use the <a
 href="mod/mod_rewrite.html#RewriteLog">RewriteLog</a> to help in
 debugging.  This log file produces a detailed analysis of how the
 rewriting engine transforms requests.  The level of detail is
 controlled by the <a
 href="mod/mod_rewrite.html#RewriteLogLevel">RewriteLogLevel</a>
 directive.</p>

 <!--#include virtual="footer.html" -->
 </BODY>
 </HTML>