| <?xml version="1.0" encoding="UTF-8" ?> |
| <!DOCTYPE manualpage SYSTEM "style/manualpage.dtd"> |
| <?xml-stylesheet type="text/xsl" href="style/manual.en.xsl"?> |
| <!-- $LastChangedRevision$ --> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <manualpage metafile="caching.xml.meta"> |
| |
| <title>Caching Guide</title> |
| |
| <summary> |
| <p>This document supplements the <module>mod_cache</module>, |
| <module>mod_cache_disk</module>, <module>mod_file_cache</module> and <a |
| href="programs/htcacheclean.html">htcacheclean</a> reference documentation. |
| It describes how to use the Apache HTTP Server's caching features to accelerate web and |
| proxy serving, while avoiding common problems and misconfigurations.</p> |
| </summary> |
| |
| <section id="introduction"> |
| <title>Introduction</title> |
| |
| <p>As of Apache HTTP server version 2.2 <module>mod_cache</module> |
| and <module>mod_file_cache</module> are no longer marked |
| experimental and are considered suitable for production use. These |
| caching architectures provide a powerful means to accelerate HTTP |
| handling, both as an origin webserver and as a proxy.</p> |
| |
| <p><module>mod_cache</module> and its provider modules |
| <module>mod_cache_disk</module> |
| provide intelligent, HTTP-aware caching. The content itself is stored |
| in the cache, and mod_cache aims to honor all of the various HTTP |
| headers and options that control the cachability of content. It can |
| handle both local and proxied content. <module>mod_cache</module> |
| is aimed at both simple and complex caching configurations, where |
| you are dealing with proxied content, dynamic local content or |
| have a need to speed up access to local files which change with |
| time.</p> |
| |
| <p><module>mod_file_cache</module> on the other hand presents a more |
| basic, but sometimes useful, form of caching. Rather than maintain |
| the complexity of actively ensuring the cachability of URLs, |
| <module>mod_file_cache</module> offers file-handle and memory-mapping |
| tricks to keep a cache of files as they were when httpd was last |
| started. As such, <module>mod_file_cache</module> is aimed at improving |
| the access time to local static files which do not change very |
| often.</p> |
| |
| <p>As <module>mod_file_cache</module> presents a relatively simple |
| caching implementation, apart from the specific sections on <directive |
| module="mod_file_cache">CacheFile</directive> and <directive |
| module="mod_file_cache">MMapFile</directive>, the explanations |
| in this guide cover the <module>mod_cache</module> caching |
| architecture.</p> |
| |
| <p>To get the most from this document, you should be familiar with |
| the basics of HTTP, and have read the Users' Guides to |
| <a href="urlmapping.html">Mapping URLs to the Filesystem</a> and |
| <a href="content-negotiation.html">Content negotiation</a>.</p> |
| |
| </section> |
| |
| <section id="overview"> |
| |
| <title>Caching Overview</title> |
| |
| <related> |
| <modulelist> |
| <module>mod_cache</module> |
| <module>mod_cache_disk</module> |
| <module>mod_file_cache</module> |
| </modulelist> |
| <directivelist> |
| <directive module="mod_cache">CacheEnable</directive> |
| <directive module="mod_cache">CacheDisable</directive> |
| <directive module="mod_file_cache">CacheFile</directive> |
| <directive module="mod_file_cache">MMapFile</directive> |
| <directive module="core">UseCanonicalName</directive> |
| <directive module="mod_negotiation">CacheNegotiatedDocs</directive> |
| </directivelist> |
| </related> |
| |
| <p>There are two main stages in <module>mod_cache</module> that can |
| occur in the lifetime of a request. First, <module>mod_cache</module> |
| is a URL mapping module, which means that if a URL has been cached, |
| and the cached version of that URL has not expired, the request will |
| be served directly by <module>mod_cache</module>.</p> |
| |
| <p>This means that any other stages that might ordinarily happen |
| in the process of serving a request -- for example being handled |
| by <module>mod_proxy</module>, or <module>mod_rewrite</module> -- |
| won't happen. But then this is the point of caching content in |
| the first place.</p> |
| |
| <p>If the URL is not found within the cache, <module>mod_cache</module> |
| will add a <a href="filter.html">filter</a> to the request handling. After |
| httpd has located the content by the usual means, the filter will be run |
| as the content is served. If the content is determined to be cacheable, |
| the content will be saved to the cache for future serving.</p> |
| |
| <p>If the URL is found within the cache, but also found to have expired, |
| the filter is added anyway, but <module>mod_cache</module> will create |
| a conditional request to the backend, to determine if the cached version |
| is still current. If the cached version is still current, its |
| meta-information will be updated and the request will be served from the |
| cache. If the cached version is no longer current, the cached version |
| will be deleted and the filter will save the updated content to the cache |
| as it is served.</p> |
| |
| <section> |
| <title>Improving Cache Hits</title> |
| |
| <p>When caching locally generated content, ensuring that |
| <directive module="core">UseCanonicalName</directive> is set to |
| <code>On</code> can dramatically improve the ratio of cache hits. This |
| is because the hostname of the virtual-host serving the content forms |
| a part of the cache key. With the setting set to <code>On</code> |
| virtual-hosts with multiple server names or aliases will not produce |
| differently cached entities, and instead content will be cached as |
| per the canonical hostname.</p> |
| |
| <p>Because caching is performed within the URL to filename translation |
| phase, cached documents will only be served in response to URL requests. |
| Ordinarily this is of little consequence, but there is one circumstance |
| in which it matters: If you are using <a href="howto/ssi.html">Server |
| Side Includes</a>;</p> |
| |
| <example> |
| <!-- The following include can be cached --><br /> |
| <!--#include virtual="/footer.html" --> <br /> |
| <br /> |
| <!-- The following include can not be cached --><br /> |
| <!--#include file="/path/to/footer.html" --> |
| </example> |
| |
| <p>If you are using Server Side Includes, and want the benefit of speedy |
| serves from the cache, you should use <code>virtual</code> include |
| types.</p> |
| </section> |
| |
| <section> |
| <title>Expiry Periods</title> |
| |
| <p>The default expiry period for cached entities is one hour, however |
| this can be easily over-ridden by using the <directive |
| module="mod_cache">CacheDefaultExpire</directive> directive. This |
| default is only used when the original source of the content does not |
| specify an expire time or time of last modification.</p> |
| |
| <p>If a response does not include an <code>Expires</code> header but does |
| include a <code>Last-Modified</code> header, <module>mod_cache</module> |
| can infer an expiry period based on the use of the <directive |
| module="mod_cache">CacheLastModifiedFactor</directive> directive.</p> |
| |
| <p>For local content, <module>mod_expires</module> may be used to |
| fine-tune the expiry period.</p> |
| |
| <p>The maximum expiry period may also be controlled by using the |
| <directive module="mod_cache">CacheMaxExpire</directive>.</p> |
| |
| </section> |
| |
| <section> |
| <title>A Brief Guide to Conditional Requests</title> |
| |
| <p>When content expires from the cache and is re-requested from the |
| backend or content provider, rather than pass on the original request, |
| httpd will use a conditional request instead.</p> |
| |
| <p>HTTP offers a number of headers which allow a client, or cache |
| to discern between different versions of the same content. For |
| example if a resource was served with an "Etag:" header, it is |
| possible to make a conditional request with an "If-None-Match:" |
| header. If a resource was served with a "Last-Modified:" header |
| it is possible to make a conditional request with an |
| "If-Modified-Since:" header, and so on.</p> |
| |
| <p>When such a conditional request is made, the response differs |
| depending on whether the content matches the conditions. If a request is |
| made with an "If-Modified-Since:" header, and the content has not been |
| modified since the time indicated in the request then a terse "304 Not |
| Modified" response is issued.</p> |
| |
| <p>If the content has changed, then it is served as if the request were |
| not conditional to begin with.</p> |
| |
| <p>The benefits of conditional requests in relation to caching are |
| twofold. Firstly, when making such a request to the backend, if the |
| content from the backend matches the content in the store, this can be |
| determined easily and without the overhead of transferring the entire |
| resource.</p> |
| |
| <p>Secondly, conditional requests are usually less strenuous on the |
| backend. For static files, typically all that is involved is a call |
| to <code>stat()</code> or similar system call, to see if the file has |
| changed in size or modification time. As such, even if httpd is |
| caching local content, even expired content may still be served faster |
| from the cache if it has not changed. As long as reading from the cache |
| store is faster than reading from the backend (e.g. <module |
| >mod_cache_disk</module> with memory disk |
| compared to reading from disk).</p> |
| </section> |
| |
| <section> |
| <title>What Can be Cached?</title> |
| |
| <p>As mentioned already, the two styles of caching in httpd work |
| differently, <module>mod_file_cache</module> caching maintains file |
| contents as they were when httpd was started. When a request is |
| made for a file that is cached by this module, it is intercepted |
| and the cached file is served.</p> |
| |
| <p><module>mod_cache</module> caching on the other hand is more |
| complex. When serving a request, if it has not been cached |
| previously, the caching module will determine if the content |
| is cacheable. The conditions for determining cachability of |
| a response are;</p> |
| |
| <ol> |
| <li>Caching must be enabled for this URL. See the <directive |
| module="mod_cache">CacheEnable</directive> and <directive |
| module="mod_cache">CacheDisable</directive> directives.</li> |
| |
| <li>The response must have a HTTP status code of 200, 203, 300, 301 or |
| 410.</li> |
| |
| <li>The request must be a HTTP GET request.</li> |
| |
| <li>If the request contains an "Authorization:" header, the response |
| will not be cached.</li> |
| |
| <li>If the response contains an "Authorization:" header, it must |
| also contain an "s-maxage", "must-revalidate" or "public" option |
| in the "Cache-Control:" header.</li> |
| |
| <li>If the URL included a query string (e.g. from a HTML form GET |
| method) it will not be cached unless the response specifies an |
| explicit expiration by including an "Expires:" header or the max-age |
| or s-maxage directive of the "Cache-Control:" header, as per RFC2616 |
| sections 13.9 and 13.2.1.</li> |
| |
| <li>If the response has a status of 200 (OK), the response must |
| also include at least one of the "Etag", "Last-Modified" or |
| the "Expires" headers, or the max-age or s-maxage directive of |
| the "Cache-Control:" header, unless the |
| <directive module="mod_cache">CacheIgnoreNoLastMod</directive> |
| directive has been used to require otherwise.</li> |
| |
| <li>If the response includes the "private" option in a "Cache-Control:" |
| header, it will not be stored unless the |
| <directive module="mod_cache">CacheStorePrivate</directive> has been |
| used to require otherwise.</li> |
| |
| <li>Likewise, if the response includes the "no-store" option in a |
| "Cache-Control:" header, it will not be stored unless the |
| <directive module="mod_cache">CacheStoreNoStore</directive> has been |
| used.</li> |
| |
| <li>A response will not be stored if it includes a "Vary:" header |
| containing the match-all "*".</li> |
| </ol> |
| </section> |
| |
| <section> |
| <title>What Should Not be Cached?</title> |
| |
| <p>In short, any content which is highly time-sensitive, or which varies |
| depending on the particulars of the request that are not covered by |
| HTTP negotiation, should not be cached.</p> |
| |
| <p>If you have dynamic content which changes depending on the IP address |
| of the requester, or changes every 5 minutes, it should almost certainly |
| not be cached.</p> |
| |
| <p>If on the other hand, the content served differs depending on the |
| values of various HTTP headers, it might be possible |
| to cache it intelligently through the use of a "Vary" header.</p> |
| </section> |
| |
| <section> |
| <title>Variable/Negotiated Content</title> |
| |
| <p>If a response with a "Vary" header is received by |
| <module>mod_cache</module> when requesting content by the backend it |
| will attempt to handle it intelligently. If possible, |
| <module>mod_cache</module> will detect the headers attributed in the |
| "Vary" response in future requests and serve the correct cached |
| response.</p> |
| |
| <p>If for example, a response is received with a vary header such as;</p> |
| |
| <example> |
| Vary: negotiate,accept-language,accept-charset |
| </example> |
| |
| <p><module>mod_cache</module> will only serve the cached content to |
| requesters with accept-language and accept-charset headers |
| matching those of the original request.</p> |
| </section> |
| </section> |
| |
| <section id="security"> |
| <title>Security Considerations</title> |
| |
| <section> |
| <title>Authorization and Access Control</title> |
| |
| <p>Using <module>mod_cache</module> is very much like having a built |
| in reverse-proxy. Requests will be served by the caching module unless |
| it determines that the backend should be queried. When caching local |
| resources, this drastically changes the security model of httpd.</p> |
| |
| <p>As traversing a filesystem hierarchy to examine potential |
| <code>.htaccess</code> files would be a very expensive operation, |
| partially defeating the point of caching (to speed up requests), |
| <module>mod_cache</module> makes no decision about whether a cached |
| entity is authorised for serving. In other words; if |
| <module>mod_cache</module> has cached some content, it will be served |
| from the cache as long as that content has not expired.</p> |
| |
| <p>If, for example, your configuration permits access to a resource by IP |
| address you should ensure that this content is not cached. You can do this |
| by using the <directive module="mod_cache">CacheDisable</directive> |
| directive, or <module>mod_expires</module>. Left unchecked, |
| <module>mod_cache</module> - very much like a reverse proxy - would cache |
| the content when served and then serve it to any client, on any IP |
| address.</p> |
| </section> |
| |
| <section> |
| <title>Local exploits</title> |
| |
| <p>As requests to end-users can be served from the cache, the cache |
| itself can become a target for those wishing to deface or interfere with |
| content. It is important to bear in mind that the cache must at all |
| times be writable by the user which httpd is running as. This is in |
| stark contrast to the usually recommended situation of maintaining |
| all content unwritable by the Apache user.</p> |
| |
| <p>If the Apache user is compromised, for example through a flaw in |
| a CGI process, it is possible that the cache may be targeted. When |
| using <module>mod_cache_disk</module>, it is relatively easy to |
| insert or modify a cached entity.</p> |
| |
| <p>This presents a somewhat elevated risk in comparison to the other |
| types of attack it is possible to make as the Apache user. If you are |
| using <module>mod_cache_disk</module> you should bear this in mind - |
| ensure you upgrade httpd when security upgrades are announced and |
| run CGI processes as a non-Apache user using <a |
| href="suexec.html">suEXEC</a> if possible.</p> |
| |
| </section> |
| |
| <section> |
| <title>Cache Poisoning</title> |
| |
| <p>When running httpd as a caching proxy server, there is also the |
| potential for so-called cache poisoning. Cache Poisoning is a broad |
| term for attacks in which an attacker causes the proxy server to |
| retrieve incorrect (and usually undesirable) content from the backend. |
| </p> |
| |
| <p>For example if the DNS servers used by your system running |
| httpd |
| are vulnerable to DNS cache poisoning, an attacker may be able to control |
| where httpd connects to when requesting content from the origin server. |
| Another example is so-called HTTP request-smuggling attacks.</p> |
| |
| <p>This document is not the correct place for an in-depth discussion |
| of HTTP request smuggling (instead, try your favourite search engine) |
| however it is important to be aware that it is possible to make |
| a series of requests, and to exploit a vulnerability on an origin |
| webserver such that the attacker can entirely control the content |
| retrieved by the proxy.</p> |
| </section> |
| </section> |
| |
| <section id="filehandle"> |
| <title>File-Handle Caching</title> |
| |
| <related> |
| <modulelist> |
| <module>mod_file_cache</module> |
| </modulelist> |
| <directivelist> |
| <directive module="mod_file_cache">CacheFile</directive> |
| </directivelist> |
| </related> |
| |
| <p>The act of opening a file can itself be a source of delay, particularly |
| on network filesystems. By maintaining a cache of open file descriptors |
| for commonly served files, httpd can avoid this delay. Currently |
| httpd |
| provides one implementation of File-Handle Caching.</p> |
| |
| <section> |
| <title>CacheFile</title> |
| |
| <p>The most basic form of caching present in httpd is the file-handle |
| caching provided by <module>mod_file_cache</module>. Rather than caching |
| file-contents, this cache maintains a table of open file descriptors. Files |
| to be cached in this manner are specified in the configuration file using |
| the <directive module="mod_file_cache">CacheFile</directive> |
| directive.</p> |
| |
| <p>The |
| <directive module="mod_file_cache">CacheFile</directive> directive |
| instructs httpd to open the file when it is started and to re-use |
| this file-handle for all subsequent access to this file.</p> |
| |
| <example> |
| CacheFile /usr/local/apache2/htdocs/index.html |
| </example> |
| |
| <p>If you intend to cache a large number of files in this manner, you |
| must ensure that your operating system's limit for the number of open |
| files is set appropriately.</p> |
| |
| <p>Although using <directive module="mod_file_cache">CacheFile</directive> |
| does not cause the file-contents to be cached per-se, it does mean |
| that if the file changes while httpd is running these changes will |
| not be picked up. The file will be consistently served as it was |
| when httpd was started.</p> |
| |
| <p>If the file is removed while httpd is running, it will continue |
| to maintain an open file descriptor and serve the file as it was when |
| httpd was started. This usually also means that although the file |
| will have been deleted, and not show up on the filesystem, extra free |
| space will not be recovered until httpd is stopped and the file |
| descriptor closed.</p> |
| </section> |
| |
| </section> |
| |
| <section id="inmemory"> |
| <title>In-Memory Caching</title> |
| |
| <related> |
| <modulelist> |
| <module>mod_file_cache</module> |
| </modulelist> |
| <directivelist> |
| <directive module="mod_cache">CacheEnable</directive> |
| <directive module="mod_cache">CacheDisable</directive> |
| <directive module="mod_file_cache">MMapFile</directive> |
| </directivelist> |
| </related> |
| |
| <p>Serving directly from system memory is universally the fastest method |
| of serving content. Reading files from a disk controller or, even worse, |
| from a remote network is orders of magnitude slower. Disk controllers |
| usually involve physical processes, and network access is limited by |
| your available bandwidth. Memory access on the other hand can take mere |
| nano-seconds.</p> |
| |
| <p>System memory isn't cheap though, byte for byte it's by far the most |
| expensive type of storage and it's important to ensure that it is used |
| efficiently. By caching files in memory you decrease the amount of |
| memory available on the system. As we'll see, in the case of operating |
| system caching, this is not so much of an issue, but when using |
| httpd's own in-memory caching it is important to make sure that you |
| do not allocate too much memory to a cache. Otherwise the system |
| will be forced to swap out memory, which will likely degrade |
| performance.</p> |
| |
| <section> |
| <title>Operating System Caching</title> |
| |
| <p>Almost all modern operating systems cache file-data in memory managed |
| directly by the kernel. This is a powerful feature, and for the most |
| part operating systems get it right. For example, on Linux, let's look at |
| the difference in the time it takes to read a file for the first time |
| and the second time;</p> |
| |
| <example><pre> |
| colm@coroebus:~$ time cat testfile > /dev/null |
| real 0m0.065s |
| user 0m0.000s |
| sys 0m0.001s |
| colm@coroebus:~$ time cat testfile > /dev/null |
| real 0m0.003s |
| user 0m0.003s |
| sys 0m0.000s</pre> |
| </example> |
| |
| <p>Even for this small file, there is a huge difference in the amount |
| of time it takes to read the file. This is because the kernel has cached |
| the file contents in memory.</p> |
| |
| <p>By ensuring there is "spare" memory on your system, you can ensure |
| that more and more file-contents will be stored in this cache. This |
| can be a very efficient means of in-memory caching, and involves no |
| extra configuration of httpd at all.</p> |
| |
| <p>Additionally, because the operating system knows when files are |
| deleted or modified, it can automatically remove file contents from the |
| cache when necessary. This is a big advantage over httpd's in-memory |
| caching which has no way of knowing when a file has changed.</p> |
| </section> |
| |
| <p>Despite the performance and advantages of automatic operating system |
| caching there are some circumstances in which in-memory caching may be |
| better performed by httpd.</p> |
| |
| <section> |
| <title>MMapFile Caching</title> |
| |
| <p><module>mod_file_cache</module> provides the |
| <directive module="mod_file_cache">MMapFile</directive> directive, which |
| allows you to have httpd map a static file's contents into memory at |
| start time (using the mmap system call). httpd will use the in-memory |
| contents for all subsequent accesses to this file.</p> |
| |
| <example> |
| MMapFile /usr/local/apache2/htdocs/index.html |
| </example> |
| |
| <p>As with the |
| <directive module="mod_file_cache">CacheFile</directive> directive, any |
| changes in these files will not be picked up by httpd after it has |
| started.</p> |
| |
| <p> The <directive module="mod_file_cache">MMapFile</directive> |
| directive does not keep track of how much memory it allocates, so |
| you must ensure not to over-use the directive. Each httpd child |
| process will replicate this memory, so it is critically important |
| to ensure that the files mapped are not so large as to cause the |
| system to swap memory.</p> |
| </section> |
| </section> |
| |
| <section id="disk"> |
| <title>Disk-based Caching</title> |
| |
| <related> |
| <modulelist> |
| <module>mod_cache_disk</module> |
| </modulelist> |
| <directivelist> |
| <directive module="mod_cache">CacheEnable</directive> |
| <directive module="mod_cache">CacheDisable</directive> |
| </directivelist> |
| </related> |
| |
| <p><module>mod_cache_disk</module> provides a disk-based caching mechanism |
| for <module>mod_cache</module>. This cache is intelligent and content will |
| be served from the cache only as long as it is considered valid.</p> |
| |
| <p>Typically the module will be configured as so;</p> |
| |
| <example> |
| CacheRoot /var/cache/apache/<br /> |
| CacheEnable disk /<br /> |
| CacheDirLevels 2<br /> |
| CacheDirLength 1 |
| </example> |
| |
| <p>Importantly, as the cached files are locally stored, operating system |
| in-memory caching will typically be applied to their access also. So |
| although the files are stored on disk, if they are frequently accessed |
| it is likely the operating system will ensure that they are actually |
| served from memory.</p> |
| |
| <section> |
| <title>Understanding the Cache-Store</title> |
| |
| <p>To store items in the cache, <module>mod_cache_disk</module> creates |
| a 22 character hash of the URL being requested. This hash incorporates |
| the hostname, protocol, port, path and any CGI arguments to the URL, |
| to ensure that multiple URLs do not collide.</p> |
| |
| <p>Each character may be any one of 64-different characters, which mean |
| that overall there are 64^22 possible hashes. For example, a URL might |
| be hashed to <code>xyTGxSMO2b68mBCykqkp1w</code>. This hash is used |
| as a prefix for the naming of the files specific to that URL within |
| the cache, however first it is split up into directories as per |
| the <directive module="mod_cache_disk">CacheDirLevels</directive> and |
| <directive module="mod_cache_disk">CacheDirLength</directive> |
| directives.</p> |
| |
| <p><directive module="mod_cache_disk">CacheDirLevels</directive> |
| specifies how many levels of subdirectory there should be, and |
| <directive module="mod_cache_disk">CacheDirLength</directive> |
| specifies how many characters should be in each directory. With |
| the example settings given above, the hash would be turned into |
| a filename prefix as |
| <code>/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w</code>.</p> |
| |
| <p>The overall aim of this technique is to reduce the number of |
| subdirectories or files that may be in a particular directory, |
| as most file-systems slow down as this number increases. With |
| setting of "1" for |
| <directive module="mod_cache_disk">CacheDirLength</directive> |
| there can at most be 64 subdirectories at any particular level. |
| With a setting of 2 there can be 64 * 64 subdirectories, and so on. |
| Unless you have a good reason not to, using a setting of "1" |
| for <directive module="mod_cache_disk">CacheDirLength</directive> |
| is recommended.</p> |
| |
| <p>Setting |
| <directive module="mod_cache_disk">CacheDirLevels</directive> |
| depends on how many files you anticipate to store in the cache. |
| With the setting of "2" used in the above example, a grand |
| total of 4096 subdirectories can ultimately be created. With |
| 1 million files cached, this works out at roughly 245 cached |
| URLs per directory.</p> |
| |
| <p>Each URL uses at least two files in the cache-store. Typically |
| there is a ".header" file, which includes meta-information about |
| the URL, such as when it is due to expire and a ".data" file |
| which is a verbatim copy of the content to be served.</p> |
| |
| <p>In the case of a content negotiated via the "Vary" header, a |
| ".vary" directory will be created for the URL in question. This |
| directory will have multiple ".data" files corresponding to the |
| differently negotiated content.</p> |
| </section> |
| |
| <section> |
| <title>Maintaining the Disk Cache</title> |
| |
| <p>Although <module>mod_cache_disk</module> will remove cached content |
| as it is expired, it does not maintain any information on the total |
| size of the cache or how little free space may be left.</p> |
| |
| <p>Instead, provided with httpd is the <a |
| href="programs/htcacheclean.html">htcacheclean</a> tool which, as the name |
| suggests, allows you to clean the cache periodically. Determining |
| how frequently to run <a |
| href="programs/htcacheclean.html">htcacheclean</a> and what target size to |
| use for the cache is somewhat complex and trial and error may be needed to |
| select optimal values.</p> |
| |
| <p><a href="programs/htcacheclean.html">htcacheclean</a> has two modes of |
| operation. It can be run as persistent daemon, or periodically from |
| cron. <a |
| href="programs/htcacheclean.html">htcacheclean</a> can take up to an hour |
| or more to process very large (tens of gigabytes) caches and if you are |
| running it from cron it is recommended that you determine how long a typical |
| run takes, to avoid running more than one instance at a time.</p> |
| |
| <p class="figure"> |
| <img src="images/caching_fig1.gif" alt="" width="600" |
| height="406" /><br /> |
| <a id="figure1" name="figure1"><dfn>Figure 1</dfn></a>: Typical |
| cache growth / clean sequence.</p> |
| |
| <p>Because <module>mod_cache_disk</module> does not itself pay attention |
| to how much space is used you should ensure that |
| <a href="programs/htcacheclean.html">htcacheclean</a> is configured to |
| leave enough "grow room" following a clean.</p> |
| </section> |
| |
| </section> |
| |
| </manualpage> |