src/docbkx/caching.xml - httpcomponents-client - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE preface PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
 <!--
     ====================================================================
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
     ====================================================================
 -->
 <chapter id="caching">
   <title>HTTP Caching</title>

   <section id="generalconcepts">
     <title>General Concepts</title>

     <para>HttpClient Cache provides an HTTP/1.1-compliant caching layer to be
     used with HttpClient--the Java equivalent of a browser cache. The
     implementation follows the Chain of Responsibility design pattern, where the
     caching HttpClient implementation can serve a drop-in replacement for
     the default non-caching HttpClient implementation; requests that can be
     satisfied entirely from the cache will not result in actual origin requests.
     Stale cache entries are automatically validated with the origin where possible,
     using conditional GETs and the If-Modified-Since and/or If-None-Match request
     headers.
     </para>

     <para>
     HTTP/1.1 caching in general is designed to be <emphasis>semantically
     transparent</emphasis>; that is, a cache should not change the meaning of
     the request-response exchange between client and server. As such, it should
     be safe to drop a caching HttpClient into an existing compliant client-server
     relationship. Although the caching module is part of the client from an
     HTTP protocol point of view, the implementation aims to be compatible with
     the requirements placed on a transparent caching proxy.
     </para>

     <para>Finally, caching HttpClient includes support the Cache-Control
     extensions specified by RFC 5861 (stale-if-error and stale-while-revalidate).
     </para>

     <para>When caching HttpClient executes a request, it goes through the
     following flow:</para>

     <orderedlist>
       <listitem>
         <para>Check the request for basic compliance with the HTTP 1.1
         protocol and attempt to correct the request.</para>
       </listitem>

       <listitem>
         <para>Flush any cache entries which would be invalidated by this
         request.</para>
       </listitem>

       <listitem>
         <para>Determine if the current request would be servable from cache.
         If not, directly pass through the request to the origin server and
         return the response, after caching it if appropriate.</para>
       </listitem>

       <listitem>
         <para>If it was a a cache-servable request, it will attempt to read it
         from the cache. If it is not in the cache, call the origin server and
         cache the response, if appropriate.</para>
       </listitem>

       <listitem>
         <para>If the cached response is suitable to be served as a response,
         construct a BasicHttpResponse containing a ByteArrayEntity and return
         it. Otherwise, attempt to revalidate the cache entry against the
         origin server.</para>
       </listitem>

       <listitem>
         <para>In the case of a cached response which cannot be revalidated,
         call the origin server and cache the response, if appropriate.</para>
       </listitem>
     </orderedlist>

     <para>When caching HttpClient receives a response, it goes through the
     following flow:</para>

     <orderedlist>
       <listitem>
         <para>Examining the response for protocol compliance</para>
       </listitem>

       <listitem>
         <para>Determine whether the response is cacheable</para>
       </listitem>

       <listitem>
         <para>If it is cacheable, attempt to read up to the maximum size
         allowed in the configuration and store it in the cache.</para>
       </listitem>

       <listitem>
         <para>If the response is too large for the cache, reconstruct the
         partially consumed response and return it directly without caching
         it.</para>
       </listitem>
     </orderedlist>

     <para>It is important to note that caching HttpClient is not, itself,
     a different implementation of HttpClient, but that it works by inserting
     itself as an additonal processing component to the request execution
     pipeline.</para>
   </section>

   <section id="rfc2616compliance">
     <title>RFC-2616 Compliance</title>

     <para>We believe HttpClient Cache is <emphasis>unconditionally
     compliant</emphasis> with <ulink
     url="http://www.ietf.org/rfc/rfc2616.txt">RFC-2616</ulink>. That is,
     wherever the specification indicates MUST, MUST NOT, SHOULD, or SHOULD NOT
     for HTTP caches, the caching layer attempts to behave in a way that satisfies
     those requirements. This means the caching module won't produce incorrect
     behavior when you drop it in. </para>
   </section>

   <section>
     <title>Example Usage</title>

     <para>This is a simple example of how to set up a basic caching HttpClient.
     As configured, it will store a maximum of 1000 cached objects, each of
     which may have a maximum body size of 8192 bytes. The numbers selected
     here are for example only and not intended to be prescriptive or
     considered as recommendations.</para>

     <programlisting><![CDATA[
 CacheConfig cacheConfig = CacheConfig.custom()
         .setMaxCacheEntries(1000)
         .setMaxObjectSize(8192)
         .build();
 RequestConfig requestConfig = RequestConfig.custom()
         .setConnectTimeout(30000)
         .setSocketTimeout(30000)
         .build();
 CloseableHttpClient cachingClient = CachingHttpClients.custom()
         .setCacheConfig(cacheConfig)
         .setDefaultRequestConfig(requestConfig)
         .build();

 HttpCacheContext context = HttpCacheContext.create();
 HttpGet httpget = new HttpGet("http://www.mydomain.com/content/");
 CloseableHttpResponse response = cachingClient.execute(httpget, context);
 try {
     CacheResponseStatus responseStatus = context.getCacheResponseStatus();
     switch (responseStatus) {
         case CACHE_HIT:
             System.out.println("A response was generated from the cache with " +
                     "no requests sent upstream");
             break;
         case CACHE_MODULE_RESPONSE:
             System.out.println("The response was generated directly by the " +
                     "caching module");
             break;
         case CACHE_MISS:
             System.out.println("The response came from an upstream server");
             break;
         case VALIDATED:
             System.out.println("The response was generated from the cache " +
                     "after validating the entry with the origin server");
             break;
     }
 } finally {
     response.close();
 }
 ]]>
     </programlisting>
   </section>

   <section id="configuration">
     <title>Configuration</title>

     <para>The caching HttpClient inherits all configuration options and parameters
     of the default non-caching implementation (this includes setting options like
     timeouts and connection pool sizes). For caching-specific configuration, you can
     provide a <classname>CacheConfig</classname> instance to customize behavior
     across the following areas:</para>

     <para><emphasis>Cache size.</emphasis> If the backend storage supports these limits,
     you can specify the maximum number of cache entries as well as the maximum cacheable
     response body size.</para>


     <para><emphasis>Public/private caching.</emphasis> By default, the caching module
     considers itself to be a shared (public) cache, and will not, for example, cache
     responses to requests with Authorization headers or responses marked with
     "Cache-Control: private". If, however, the cache is only going to be used by one
     logical "user" (behaving similarly to a browser cache), then you will want to turn
     off the shared cache setting.</para>

     <para><emphasis>Heuristic caching.</emphasis>Per RFC2616, a cache MAY cache
     certain cache entries even if no explicit cache control headers are set by the
     origin. This behavior is off by default, but you may want to turn this on if you
     are working with an origin that doesn't set proper headers but where you still
     want to cache the responses. You will want to enable heuristic caching, then
     specify either a default freshness lifetime and/or a fraction of the time since
     the resource was last modified. See Sections 13.2.2 and 13.2.4 of the HTTP/1.1
     RFC for more details on heuristic caching.</para>

     <para><emphasis>Background validation.</emphasis> The cache module supports the
     stale-while-revalidate directive of RFC5861, which allows certain cache entry
     revalidations to happen in the background. You may want to tweak the settings
     for the minimum and maximum number of background worker threads, as well as the
     maximum time they can be idle before being reclaimed. You can also control the
     size of the queue used for revalidations when there aren't enough workers to
     keep up with demand.</para>
   </section>

   <section id="storage">
     <title>Storage Backends</title>

     <para>The default implementation of caching HttpClient stores cache entries and
     cached response bodies in memory in the JVM of your application. While this
     offers high performance, it may not be appropriate for your application due to
     the limitation on size or because the cache entries are ephemeral and don't
     survive an application restart. The current release includes support for storing
     cache entries using EhCache and memcached implementations, which allow for
     spilling cache entries to disk or storing them in an external process.</para>

     <para>If none of those options are suitable for your application, it is
     possible to provide your own storage backend by implementing the HttpCacheStorage
     interface and then supplying that to caching HttpClient at construction time. In
     this case, the cache entries will be stored using your scheme but you will get to
     reuse all of the logic surrounding HTTP/1.1 compliance and cache handling.
     Generally speaking, it should be possible to create an HttpCacheStorage
     implementation out of anything that supports a key/value store (similar to the
     Java Map interface) with the ability to apply atomic updates.</para>

     <para>Finally, with some extra efforts it's entirely possible to set up
     a multi-tier caching hierarchy; for example, wrapping an in-memory caching
     HttpClient around one that stores cache entries on disk or remotely in memcached,
     following a pattern similar to virtual memory, L1/L2 processor caches, etc.
     </para>
   </section>
 </chapter>
	<?xml version="1.0" encoding="UTF-8"?>
	<!DOCTYPE preface PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
	"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
	<!--
	====================================================================
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	====================================================================
	-->
	<chapter id="caching">
	<title>HTTP Caching</title>

	<section id="generalconcepts">
	<title>General Concepts</title>

	<para>HttpClient Cache provides an HTTP/1.1-compliant caching layer to be
	used with HttpClient--the Java equivalent of a browser cache. The
	implementation follows the Chain of Responsibility design pattern, where the
	caching HttpClient implementation can serve a drop-in replacement for
	the default non-caching HttpClient implementation; requests that can be
	satisfied entirely from the cache will not result in actual origin requests.
	Stale cache entries are automatically validated with the origin where possible,
	using conditional GETs and the If-Modified-Since and/or If-None-Match request
	headers.
	</para>

	<para>
	HTTP/1.1 caching in general is designed to be <emphasis>semantically
	transparent</emphasis>; that is, a cache should not change the meaning of
	the request-response exchange between client and server. As such, it should
	be safe to drop a caching HttpClient into an existing compliant client-server
	relationship. Although the caching module is part of the client from an
	HTTP protocol point of view, the implementation aims to be compatible with
	the requirements placed on a transparent caching proxy.
	</para>

	<para>Finally, caching HttpClient includes support the Cache-Control
	extensions specified by RFC 5861 (stale-if-error and stale-while-revalidate).
	</para>

	<para>When caching HttpClient executes a request, it goes through the
	following flow:</para>

	<orderedlist>
	<listitem>
	<para>Check the request for basic compliance with the HTTP 1.1
	protocol and attempt to correct the request.</para>
	</listitem>

	<listitem>
	<para>Flush any cache entries which would be invalidated by this
	request.</para>
	</listitem>

	<listitem>
	<para>Determine if the current request would be servable from cache.
	If not, directly pass through the request to the origin server and
	return the response, after caching it if appropriate.</para>
	</listitem>

	<listitem>
	<para>If it was a a cache-servable request, it will attempt to read it
	from the cache. If it is not in the cache, call the origin server and
	cache the response, if appropriate.</para>
	</listitem>

	<listitem>
	<para>If the cached response is suitable to be served as a response,
	construct a BasicHttpResponse containing a ByteArrayEntity and return
	it. Otherwise, attempt to revalidate the cache entry against the
	origin server.</para>
	</listitem>

	<listitem>
	<para>In the case of a cached response which cannot be revalidated,
	call the origin server and cache the response, if appropriate.</para>
	</listitem>
	</orderedlist>

	<para>When caching HttpClient receives a response, it goes through the
	following flow:</para>

	<orderedlist>
	<listitem>
	<para>Examining the response for protocol compliance</para>
	</listitem>

	<listitem>
	<para>Determine whether the response is cacheable</para>
	</listitem>

	<listitem>
	<para>If it is cacheable, attempt to read up to the maximum size
	allowed in the configuration and store it in the cache.</para>
	</listitem>

	<listitem>
	<para>If the response is too large for the cache, reconstruct the
	partially consumed response and return it directly without caching
	it.</para>
	</listitem>
	</orderedlist>

	<para>It is important to note that caching HttpClient is not, itself,
	a different implementation of HttpClient, but that it works by inserting
	itself as an additonal processing component to the request execution
	pipeline.</para>
	</section>

	<section id="rfc2616compliance">
	<title>RFC-2616 Compliance</title>

	<para>We believe HttpClient Cache is <emphasis>unconditionally
	compliant</emphasis> with <ulink
	url="http://www.ietf.org/rfc/rfc2616.txt">RFC-2616</ulink>. That is,
	wherever the specification indicates MUST, MUST NOT, SHOULD, or SHOULD NOT
	for HTTP caches, the caching layer attempts to behave in a way that satisfies
	those requirements. This means the caching module won't produce incorrect
	behavior when you drop it in. </para>
	</section>

	<section>
	<title>Example Usage</title>

	<para>This is a simple example of how to set up a basic caching HttpClient.
	As configured, it will store a maximum of 1000 cached objects, each of
	which may have a maximum body size of 8192 bytes. The numbers selected
	here are for example only and not intended to be prescriptive or
	considered as recommendations.</para>

	<programlisting><![CDATA[
	CacheConfig cacheConfig = CacheConfig.custom()
	.setMaxCacheEntries(1000)
	.setMaxObjectSize(8192)
	.build();
	RequestConfig requestConfig = RequestConfig.custom()
	.setConnectTimeout(30000)
	.setSocketTimeout(30000)
	.build();
	CloseableHttpClient cachingClient = CachingHttpClients.custom()
	.setCacheConfig(cacheConfig)
	.setDefaultRequestConfig(requestConfig)
	.build();

	HttpCacheContext context = HttpCacheContext.create();
	HttpGet httpget = new HttpGet("http://www.mydomain.com/content/");
	CloseableHttpResponse response = cachingClient.execute(httpget, context);
	try {
	CacheResponseStatus responseStatus = context.getCacheResponseStatus();
	switch (responseStatus) {
	case CACHE_HIT:
	System.out.println("A response was generated from the cache with " +
	"no requests sent upstream");
	break;
	case CACHE_MODULE_RESPONSE:
	System.out.println("The response was generated directly by the " +
	"caching module");
	break;
	case CACHE_MISS:
	System.out.println("The response came from an upstream server");
	break;
	case VALIDATED:
	System.out.println("The response was generated from the cache " +
	"after validating the entry with the origin server");
	break;
	}
	} finally {
	response.close();
	}
	]]>
	</programlisting>
	</section>

	<section id="configuration">
	<title>Configuration</title>

	<para>The caching HttpClient inherits all configuration options and parameters
	of the default non-caching implementation (this includes setting options like
	timeouts and connection pool sizes). For caching-specific configuration, you can
	provide a <classname>CacheConfig</classname> instance to customize behavior
	across the following areas:</para>

	<para><emphasis>Cache size.</emphasis> If the backend storage supports these limits,
	you can specify the maximum number of cache entries as well as the maximum cacheable
	response body size.</para>


	<para><emphasis>Public/private caching.</emphasis> By default, the caching module
	considers itself to be a shared (public) cache, and will not, for example, cache
	responses to requests with Authorization headers or responses marked with
	"Cache-Control: private". If, however, the cache is only going to be used by one
	logical "user" (behaving similarly to a browser cache), then you will want to turn
	off the shared cache setting.</para>

	<para><emphasis>Heuristic caching.</emphasis>Per RFC2616, a cache MAY cache
	certain cache entries even if no explicit cache control headers are set by the
	origin. This behavior is off by default, but you may want to turn this on if you
	are working with an origin that doesn't set proper headers but where you still
	want to cache the responses. You will want to enable heuristic caching, then
	specify either a default freshness lifetime and/or a fraction of the time since
	the resource was last modified. See Sections 13.2.2 and 13.2.4 of the HTTP/1.1
	RFC for more details on heuristic caching.</para>

	<para><emphasis>Background validation.</emphasis> The cache module supports the
	stale-while-revalidate directive of RFC5861, which allows certain cache entry
	revalidations to happen in the background. You may want to tweak the settings
	for the minimum and maximum number of background worker threads, as well as the
	maximum time they can be idle before being reclaimed. You can also control the
	size of the queue used for revalidations when there aren't enough workers to
	keep up with demand.</para>
	</section>

	<section id="storage">
	<title>Storage Backends</title>

	<para>The default implementation of caching HttpClient stores cache entries and
	cached response bodies in memory in the JVM of your application. While this
	offers high performance, it may not be appropriate for your application due to
	the limitation on size or because the cache entries are ephemeral and don't
	survive an application restart. The current release includes support for storing
	cache entries using EhCache and memcached implementations, which allow for
	spilling cache entries to disk or storing them in an external process.</para>

	<para>If none of those options are suitable for your application, it is
	possible to provide your own storage backend by implementing the HttpCacheStorage
	interface and then supplying that to caching HttpClient at construction time. In
	this case, the cache entries will be stored using your scheme but you will get to
	reuse all of the logic surrounding HTTP/1.1 compliance and cache handling.
	Generally speaking, it should be possible to create an HttpCacheStorage
	implementation out of anything that supports a key/value store (similar to the
	Java Map interface) with the ability to apply atomic updates.</para>

	<para>Finally, with some extra efforts it's entirely possible to set up
	a multi-tier caching hierarchy; for example, wrapping an in-memory caching
	HttpClient around one that stores cache entries on disk or remotely in memcached,
	following a pattern similar to virtual memory, L1/L2 processor caches, etc.
	</para>
	</section>
	</chapter>