| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" |
| "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"> |
| <!-- |
| ==================================================================== |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| ==================================================================== |
| --> |
| <chapter id="caching"> |
| <title>HTTP Caching</title> |
| |
| <section id="generalconcepts"> |
| <title>General Concepts</title> |
| |
| <para>HttpClient Cache provides an HTTP/1.1-compliant caching layer to be |
| used with HttpClient--the Java equivalent of a browser cache. The |
| implementation follows the Decorator design pattern, where the |
| CachingHttpClient class is a drop-in replacement for |
| a DefaultHttpClient; requests that can be satisfied entirely from the cache |
| will not result in actual origin requests. Stale cache entries are |
| automatically validated with the origin where possible, using conditional GETs |
| and the If-Modified-Since and/or If-None-Match request headers. |
| </para> |
| |
| <para> |
| HTTP/1.1 caching in general is designed to be <emphasis>semantically |
| transparent</emphasis>; that is, a cache should not change the meaning of |
| the request-response exchange between client and server. As such, it should |
| be safe to drop a CachingHttpClient into an existing compliant client-server |
| relationship. Although the caching module is part of the client from an |
| HTTP protocol point of view, the implementation aims to be compatible with |
| the requirements placed on a transparent caching proxy. |
| </para> |
| |
| <para>Finally, CachingHttpClient includes support for the Cache-Control |
| extensions specified by RFC 5861 (stale-if-error and stale-while-revalidate). |
| </para> |
| |
| <para>When CachingHttpClient executes a request, it goes through the |
| following flow:</para> |
| |
| <orderedlist> |
| <listitem> |
| <para>Check the request for basic compliance with the HTTP/1.1 |
| protocol and attempt to correct the request if it is not compliant.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Flush any cache entries which would be invalidated by this |
| request.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Determine if the current request would be servable from cache. |
| If not, directly pass through the request to the origin server and |
| return the response, after caching it if appropriate.</para> |
| </listitem> |
| |
| <listitem> |
| <para>If the request is servable from cache, attempt to read the response |
| from the cache. If it is not in the cache, call the origin server and |
| cache the response, if appropriate.</para> |
| </listitem> |
| |
| <listitem> |
| <para>If the cached response is suitable to be served as a response, |
| construct a BasicHttpResponse containing a ByteArrayEntity and return |
| it. Otherwise, attempt to revalidate the cache entry against the |
| origin server.</para> |
| </listitem> |
| |
| <listitem> |
| <para>In the case of a cached response which cannot be revalidated, |
| call the origin server and cache the response, if appropriate.</para> |
| </listitem> |
| </orderedlist> |
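| |
| <para>Steps 3 through 6 above can be sketched in plain Java. This is a simplified |
| model under stated assumptions, not the module's implementation: the |
| RequestFlowSketch, Reply, Origin, Entry and CachingOrigin names are hypothetical, |
| freshness is reduced to a fixed time-to-live, and a null reply stands in for a |
| "304 Not Modified" answer to a conditional request.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical model of the request flow; none of these types belong to HttpClient.
class RequestFlowSketch {

    enum Status { CACHE_HIT, CACHE_MISS, VALIDATED }

    // A reply carries a validator (think ETag) and a body.
    static final class Reply {
        final String etag;
        final String body;
        Reply(String etag, String body) {
            this.etag = etag;
            this.body = body;
        }
    }

    // Stand-in for the backend. A null result means "not modified": the
    // validator passed as ifNoneMatch still matches the current resource.
    interface Origin {
        Reply get(String url, String ifNoneMatch);
    }

    // A stored reply plus the time at which it was stored or last validated.
    static final class Entry {
        final Reply reply;
        final long storedAtMillis;
        Entry(Reply reply, long storedAtMillis) {
            this.reply = reply;
            this.storedAtMillis = storedAtMillis;
        }
    }

    // Decorator: exposes the same contract as the Origin it wraps. For brevity
    // it ignores any validator supplied by its own caller.
    static final class CachingOrigin implements Origin {
        private final Origin backend;
        private final long ttlMillis;
        private final LongSupplier clock;
        private final Map<String, Entry> store = new HashMap<>();
        Status lastStatus;

        CachingOrigin(Origin backend, long ttlMillis, LongSupplier clock) {
            this.backend = backend;
            this.ttlMillis = ttlMillis;
            this.clock = clock;
        }

        @Override
        public Reply get(String url, String ifNoneMatch) {
            long now = clock.getAsLong();
            Entry entry = store.get(url);
            if (entry == null) {
                // Not in the cache: call the origin and cache the response.
                Reply fetched = backend.get(url, null);
                store.put(url, new Entry(fetched, now));
                lastStatus = Status.CACHE_MISS;
                return fetched;
            }
            if (now - entry.storedAtMillis < ttlMillis) {
                // Fresh entry: serve it without any origin request.
                lastStatus = Status.CACHE_HIT;
                return entry.reply;
            }
            // Stale entry: revalidate with a conditional request.
            Reply revalidated = backend.get(url, entry.reply.etag);
            if (revalidated == null) {
                // Not modified: keep the body and restart the freshness clock.
                store.put(url, new Entry(entry.reply, now));
                lastStatus = Status.VALIDATED;
                return entry.reply;
            }
            // The resource changed: replace the entry with the new reply.
            store.put(url, new Entry(revalidated, now));
            lastStatus = Status.CACHE_MISS;
            return revalidated;
        }
    }

    public static void main(String[] args) {
        Origin origin = (url, etag) -> "v1".equals(etag) ? null : new Reply("v1", "hello");
        CachingOrigin client = new CachingOrigin(origin, 1000L, System::currentTimeMillis);
        client.get("/content", null);
        System.out.println(client.lastStatus);
        client.get("/content", null);
        System.out.println(client.lastStatus);
    }
}
```
| ]]> |
| </programlisting> |
| |
| <para>The sketch leaves out steps 1 and 2 (request compliance checks and |
| invalidation), which depend on HTTP details that a toy model cannot represent |
| faithfully.</para> |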
| |
| <para>When CachingHttpClient receives a response, it goes through the |
| following flow:</para> |
| |
| <orderedlist> |
| <listitem> |
| <para>Examine the response for protocol compliance.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Determine whether the response is cacheable.</para> |
| </listitem> |
| |
| <listitem> |
| <para>If it is cacheable, attempt to read up to the maximum size |
| allowed in the configuration and store it in the cache.</para> |
| </listitem> |
| |
| <listitem> |
| <para>If the response is too large for the cache, reconstruct the |
| partially consumed response and return it directly without caching |
| it.</para> |
| </listitem> |
| </orderedlist> |
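| |
| <para>The last two steps (read up to the configured maximum, and otherwise |
| reconstruct the partially consumed response) can be illustrated with plain |
| java.io streams. This is a minimal sketch, assuming the response body is |
| available as an InputStream; SizeLimitedReader and Result are hypothetical names, |
| not classes of the caching module.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

// Hypothetical helper illustrating the size check; not an HttpClient class.
class SizeLimitedReader {

    // Exactly one of the two fields is non-null.
    static final class Result {
        final byte[] cacheable;        // whole body, when it fit within the limit
        final InputStream passThrough; // rebuilt stream, when the body was too large
        Result(byte[] cacheable, InputStream passThrough) {
            this.cacheable = cacheable;
            this.passThrough = passThrough;
        }
    }

    static Result read(InputStream body, int maxBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[2048];
        // Read at most maxBytes + 1 bytes: one extra byte is enough to prove
        // that the body exceeds the limit.
        while (buffer.size() <= maxBytes) {
            int want = (int) Math.min(chunk.length, (long) maxBytes + 1 - buffer.size());
            int n = body.read(chunk, 0, want);
            if (n == -1) {
                // End of stream reached within the limit: safe to store.
                return new Result(buffer.toByteArray(), null);
            }
            buffer.write(chunk, 0, n);
        }
        // Too large: put the consumed prefix back in front of the unread remainder.
        InputStream rebuilt = new SequenceInputStream(
                new ByteArrayInputStream(buffer.toByteArray()), body);
        return new Result(null, rebuilt);
    }
}
```
| ]]> |
| </programlisting> |
| |
| <para>A caller would store the cacheable bytes when they are present and otherwise |
| hand the passThrough stream to the application, which then sees the complete body |
| even though part of it was already read.</para> |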
| |
| <para>It is important to note that CachingHttpClient is not a standalone HTTP |
| client: it can be used wherever an HttpClient is expected, but it decorates an |
| instance of an HttpClient implementation, which performs the actual origin |
| requests. If you do not provide an implementation, it |
| will use a DefaultHttpClient internally.</para> |
| </section> |
| |
| <section id="rfc2616compliance"> |
| <title>RFC-2616 Compliance</title> |
| |
| <para>HttpClient Cache makes an effort to be at least <emphasis>conditionally |
| compliant</emphasis> with <ulink |
| url="http://www.ietf.org/rfc/rfc2616.txt">RFC-2616</ulink>. That is, |
| wherever the specification indicates MUST or MUST NOT for HTTP caches, the |
| caching layer attempts to behave in a way that satisfies those |
| requirements. This means the caching module won't produce incorrect |
| behavior when you drop it in. At the same time, the project is continuing |
| to work on unconditional compliance, which would add compliance with all the |
| SHOULDs and SHOULD NOTs, many of which we already comply with. We just can't |
| claim fully unconditional compliance until we satisfy <emphasis>all</emphasis> |
| of them.</para> |
| </section> |
| |
| <section> |
| <title>Example Usage</title> |
| |
| <para>This is a simple example of how to set up a basic CachingHttpClient. |
| As configured, it will store a maximum of 1000 cached objects, each of |
| which may have a maximum body size of 8192 bytes. The numbers selected |
| here are for example only and not intended to be prescriptive or |
| considered as recommendations.</para> |
| |
| <programlisting><![CDATA[ |
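| import org.apache.http.HttpEntity; |
| import org.apache.http.HttpResponse; |
| import org.apache.http.client.HttpClient; |
| import org.apache.http.client.cache.CacheResponseStatus; |
| import org.apache.http.client.methods.HttpGet; |
| import org.apache.http.impl.client.DefaultHttpClient; |
| import org.apache.http.impl.client.cache.CacheConfig; |
| import org.apache.http.impl.client.cache.CachingHttpClient; |
| import org.apache.http.protocol.BasicHttpContext; |
| import org.apache.http.protocol.HttpContext; |
| import org.apache.http.util.EntityUtils; |
| |
| // Example limits only: at most 1000 entries, each body at most 8192 bytes. |
| CacheConfig cacheConfig = new CacheConfig(); |
| cacheConfig.setMaxCacheEntries(1000); |
| cacheConfig.setMaxObjectSizeBytes(8192); |
| |
| // Decorate a DefaultHttpClient with the caching layer. |
| HttpClient cachingClient = new CachingHttpClient(new DefaultHttpClient(), cacheConfig); |
| |
| HttpContext localContext = new BasicHttpContext(); |
| HttpGet httpget = new HttpGet("http://www.mydomain.com/content/"); |
| HttpResponse response = cachingClient.execute(httpget, localContext); |
| HttpEntity entity = response.getEntity(); |
| EntityUtils.consume(entity); |
| // The caching layer records in the context how the response was produced. |
| CacheResponseStatus responseStatus = (CacheResponseStatus) localContext.getAttribute( |
| CachingHttpClient.CACHE_RESPONSE_STATUS); |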
| switch (responseStatus) { |
| case CACHE_HIT: |
| System.out.println("A response was generated from the cache with no requests " + |
| "sent upstream"); |
| break; |
| case CACHE_MODULE_RESPONSE: |
| System.out.println("The response was generated directly by the caching module"); |
| break; |
| case CACHE_MISS: |
| System.out.println("The response came from an upstream server"); |
| break; |
| case VALIDATED: |
| System.out.println("The response was generated from the cache after validating " + |
| "the entry with the origin server"); |
| break; |
| } |
| ]]> |
| </programlisting> |
| </section> |
| |
| <section id="configuration"> |
| <title>Configuration</title> |
| |
| <para>As the CachingHttpClient is a decorator, much of the configuration you may |
| want to do can be done on the HttpClient it uses as its "backend" (this includes |
| setting options like timeouts and connection pool sizes). For |
| caching-specific configuration, you can provide a CacheConfig instance to |
| customize behavior across the following areas:</para> |
| |
| <para><emphasis>Cache size.</emphasis> If the backend storage supports these limits, |
| you can specify the maximum number of cache entries as well as the maximum cacheable |
| response body size.</para> |
| |
| |
| <para><emphasis>Public/private caching.</emphasis> By default, the caching module |
| considers itself to be a shared (public) cache, and will not, for example, cache |
| responses to requests with Authorization headers or responses marked with |
| "Cache-Control: private". If, however, the cache is only going to be used by one |
| logical "user" (behaving similarly to a browser cache), then you will want to turn |
| off the shared cache setting.</para> |
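| |
| <para>The two restrictions named above can be written as a small predicate. This is |
| a deliberately narrow sketch: SharedCachePolicySketch and its methods are |
| hypothetical names, it covers only the Authorization and "Cache-Control: private" |
| cases from this paragraph, and it ignores the exceptions and further directives |
| that the HTTP/1.1 RFC defines.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration of the shared-cache restrictions; not an HttpClient class.
class SharedCachePolicySketch {

    // Returns true when the shared-cache rules sketched here forbid storing the
    // response. A false result does not by itself make a response cacheable.
    static boolean sharedCacheForbidsStoring(boolean sharedCache,
                                             boolean requestHasAuthorization,
                                             String responseCacheControl) {
        if (!sharedCache) {
            return false; // a private cache is not subject to these two restrictions
        }
        return requestHasAuthorization || hasPrivateDirective(responseCacheControl);
    }

    // Looks for a "private" directive in a Cache-Control header value.
    static boolean hasPrivateDirective(String cacheControl) {
        if (cacheControl == null) {
            return false;
        }
        return Arrays.stream(cacheControl.split(","))
                .map(directive -> directive.trim().toLowerCase(Locale.ROOT))
                .anyMatch(d -> d.equals("private") || d.startsWith("private="));
    }
}
```
| ]]> |
| </programlisting> |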
| |
| <para><emphasis>Heuristic caching.</emphasis> Per RFC2616, a cache MAY cache |
| certain responses even if no explicit cache control headers are set by the |
| origin. This behavior is off by default, but you may want to turn it on if you |
| are working with an origin that doesn't set proper headers but whose responses |
| you still want to cache. To do so, enable heuristic caching, then specify a |
| default freshness lifetime and/or a fraction of the time since the resource was |
| last modified. See Sections 13.2.2 and 13.2.4 of the HTTP/1.1 |
| RFC for more details on heuristic caching.</para> |
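| |
| <para>One plausible way to combine the two settings is to apply the fraction when |
| both the Date and Last-Modified values are known, and to fall back to the default |
| lifetime otherwise. The sketch below makes that assumption explicit; |
| HeuristicFreshness and its parameters are hypothetical names, and the rule the |
| caching module actually applies may differ in detail.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating heuristic freshness; not an HttpClient class.
class HeuristicFreshness {

    // date and lastModified model the response's Date and Last-Modified headers;
    // either may be null when the origin did not send the header.
    static Duration lifetime(Instant date, Instant lastModified,
                             double coefficient, Duration defaultLifetime) {
        if (date == null || lastModified == null) {
            return defaultLifetime; // nothing to base the fraction on
        }
        Duration sinceModified = Duration.between(lastModified, date);
        if (sinceModified.isNegative()) {
            return Duration.ZERO; // Last-Modified lies after Date: treat as not fresh
        }
        return Duration.ofMillis(Math.round(sinceModified.toMillis() * coefficient));
    }
}
```
| ]]> |
| </programlisting> |
| |
| <para>With a coefficient of 0.1, for example, a resource last modified ten days |
| before the response was generated would be considered fresh for one day.</para> |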
| |
| <para><emphasis>Background validation.</emphasis> The cache module supports the |
| stale-while-revalidate directive of RFC5861, which allows certain cache entry |
| revalidations to happen in the background. You may want to tweak the settings |
| for the minimum and maximum number of background worker threads, as well as the |
| maximum time they can be idle before being reclaimed. You can also control the |
| size of the queue used for revalidations when there aren't enough workers to |
| keep up with demand.</para> |
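| |
| <para>The four settings described above map naturally onto the parameters of a |
| java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below |
| shows that mapping and what happens when the queue is full; it is an illustration |
| only, RevalidationPoolSketch is a hypothetical name, and whether the cache module |
| is built this way internally is an assumption.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a bounded background revalidation pool.
class RevalidationPoolSketch {

    // coreWorkers/maxWorkers: minimum and maximum number of worker threads.
    // idleSeconds: how long a surplus worker may idle before it is reclaimed.
    // queueSize: how many revalidations may wait when all workers are busy.
    static ThreadPoolExecutor newPool(int coreWorkers, int maxWorkers,
                                      int idleSeconds, int queueSize) {
        return new ThreadPoolExecutor(coreWorkers, maxWorkers,
                idleSeconds, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize));
    }

    // Returns false instead of throwing when workers and queue are saturated,
    // so the caller can simply skip the background revalidation.
    static boolean trySchedule(ThreadPoolExecutor pool, Runnable revalidation) {
        try {
            pool.execute(revalidation);
            return true;
        } catch (RejectedExecutionException saturated) {
            return false;
        }
    }
}
```
| ]]> |
| </programlisting> |
| |
| <para>A larger queue absorbs bursts of stale entries at the cost of memory and of |
| revalidations that run later; a smaller one sheds load sooner.</para> |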
| </section> |
| |
| <section id="storage"> |
| <title>Storage Backends</title> |
| |
| <para>The default implementation of CachingHttpClient stores cache entries and |
| cached response bodies in memory in the JVM of your application. While this |
| offers high performance, it may not be appropriate for your application due to |
| the limitation on size or because the cache entries are ephemeral and don't |
| survive an application restart. The current release includes support for storing |
| cache entries using Ehcache and memcached implementations, which allow for |
| spilling cache entries to disk or storing them in an external process.</para> |
| |
| <para>If none of those options are suitable for your application, it is |
| possible to provide your own storage backend by implementing the HttpCacheStorage |
| interface and then supplying that to CachingHttpClient at construction time. In |
| this case, the cache entries will be stored using your scheme but you will get to |
| reuse all of the logic surrounding HTTP/1.1 compliance and cache handling. |
| Generally speaking, it should be possible to create an HttpCacheStorage |
| implementation out of anything that supports a key/value store (similar to the |
| Java Map interface) with the ability to apply atomic updates.</para> |
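| |
| <para>To make the "key/value store with atomic updates" requirement concrete, here |
| is a minimal sketch built on ConcurrentHashMap. SimpleCacheStorage is a simplified, |
| hypothetical stand-in written for this example; it is not the HttpCacheStorage |
| interface, whose methods and entry types should be taken from the module's API |
| documentation.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

// Simplified, hypothetical storage contract; the real HttpCacheStorage differs.
interface SimpleCacheStorage<V> {
    void put(String key, V entry);
    V get(String key);
    void remove(String key);
    // Atomically replaces the entry for key with update.apply(existing).
    // existing is null when the key is absent; a null result removes the key.
    void update(String key, UnaryOperator<V> update);
}

// In-memory implementation: ConcurrentHashMap.compute supplies the atomic update.
class MapCacheStorage<V> implements SimpleCacheStorage<V> {
    private final ConcurrentMap<String, V> map = new ConcurrentHashMap<>();

    @Override
    public void put(String key, V entry) {
        map.put(key, entry);
    }

    @Override
    public V get(String key) {
        return map.get(key);
    }

    @Override
    public void remove(String key) {
        map.remove(key);
    }

    @Override
    public void update(String key, UnaryOperator<V> update) {
        // compute() runs the function atomically for this key, so concurrent
        // updates cannot overwrite each other's results.
        map.compute(key, (k, existing) -> update.apply(existing));
    }
}
```
| ]]> |
| </programlisting> |
| |
| <para>A backend without such a primitive would need some other means, such as |
| compare-and-swap with retries, to keep concurrent updates of the same key from |
| being lost.</para> |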
| |
| <para>Finally, because the CachingHttpClient is a decorator for HttpClient, |
| it's entirely possible to set up a multi-tier caching hierarchy; for example, |
| wrapping an in-memory CachingHttpClient around one that stores cache entries on |
| disk or remotely in memcached, following a pattern similar to virtual memory, |
| L1/L2 processor caches, etc. |
| </para> |
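| |
| <para>The layering idea can be shown without any HTTP machinery. In the sketch |
| below each tier is a function that consults its own store before delegating to the |
| next tier; TieredCacheSketch and tier are hypothetical names, and plain maps stand |
| in for the in-memory and disk or memcached stores.</para> |
| |
| <programlisting><![CDATA[ |
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of stacking cache tiers by decoration.
class TieredCacheSketch {

    // Wraps "next" with a cache tier backed by "store". The result has the same
    // shape as "next", so tiers can be nested to any depth.
    static Function<String, String> tier(Map<String, String> store,
                                         Function<String, String> next) {
        return url -> {
            String cached = store.get(url);
            if (cached != null) {
                return cached; // served by this tier
            }
            String loaded = next.apply(url); // fall through to the next tier
            store.put(url, loaded);
            return loaded;
        };
    }

    public static void main(String[] args) {
        Function<String, String> origin = url -> "body of " + url;
        Map<String, String> memory = new HashMap<>(); // small, fast tier
        Map<String, String> remote = new HashMap<>(); // stands in for disk or memcached
        Function<String, String> client = tier(memory, tier(remote, origin));
        System.out.println(client.apply("/content/"));
    }
}
```
| ]]> |
| </programlisting> |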
| </section> |
| </chapter> |