| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
| <HTML> |
| <HEAD> |
| <TITLE>Apache Content Negotiation</TITLE> |
| </HEAD> |
| |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| <BODY |
| BGCOLOR="#FFFFFF" |
| TEXT="#000000" |
| LINK="#0000FF" |
| VLINK="#000080" |
| ALINK="#FF0000" |
| > |
| <!--#include virtual="header.html" --> |
| <H1 ALIGN="CENTER">Content Negotiation</H1> |
| |
| <P> |
| Apache's support for content negotiation has been updated to meet the |
| HTTP/1.1 specification. It can choose the best representation of a |
| resource based on the browser-supplied preferences for media type, |
| languages, character set and encoding. It is also implements a |
| couple of features to give more intelligent handling of requests from |
| browsers which send incomplete negotiation information. <P> |
| |
| Content negotiation is provided by the |
| <A HREF="mod/mod_negotiation.html">mod_negotiation</A> module, |
| which is compiled in by default. |
| |
| <HR> |
| |
| <H2>About Content Negotiation</H2> |
| |
| <P> |
| A resource may be available in several different representations. For |
| example, it might be available in different languages or different |
| media types, or a combination. One way of selecting the most |
| appropriate choice is to give the user an index page, and let them |
| select. However it is often possible for the server to choose |
| automatically. This works because browsers can send as part of each |
| request information about what representations they prefer. For |
| example, a browser could indicate that it would like to see |
| information in French, if possible, else English will do. Browsers |
| indicate their preferences by headers in the request. To request only |
| French representations, the browser would send |
| |
| <PRE> |
| Accept-Language: fr |
| </PRE> |
| |
| <P> |
| Note that this preference will only be applied when there is a choice |
| of representations and they vary by language. |
| <P> |
| |
| As an example of a more complex request, this browser has been |
| configured to accept French and English, but prefer French, and to |
| accept various media types, preferring HTML over plain text or other |
| text types, and preferring GIF or JPEG over other media types, but also |
| allowing any other media type as a last resort: |
| |
| <PRE> |
| Accept-Language: fr; q=1.0, en; q=0.5 |
| Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, |
| image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1 |
| </PRE> |
| |
| Apache 1.2 supports 'server driven' content negotiation, as defined in |
| the HTTP/1.1 specification. It fully supports the Accept, |
| Accept-Language, Accept-Charset and Accept-Encoding request headers. |
| Apache 1.3.4 also supports 'transparent' content negotiation, which is |
| an experimental negotiation protocol defined in RFC 2295 and RFC 2296. |
| It does not offer support for 'feature negotiation' as defined in |
| these RFCs. |
| <P> |
| |
| A <STRONG>resource</STRONG> is a conceptual entity identified by a URI |
| (RFC 2396). An HTTP server like Apache provides access to |
| <STRONG>representations</STRONG> of the resource(s) within its namespace, |
| with each representation in the form of a sequence of bytes with a |
| defined media type, character set, encoding, etc. Each resource may be |
| associated with zero, one, or more than one representation |
| at any given time. If multiple representations are available, |
| the resource is referred to as <STRONG>negotiable</STRONG> and each of its |
| representations is termed a <STRONG>variant</STRONG>. The ways in which the |
| variants for a negotiable resource vary are called the |
| <STRONG>dimensions</STRONG> of negotiation. |
| |
| <H2>Negotiation in Apache</H2> |
| |
| <P> |
| In order to negotiate a resource, the server needs to be given |
| information about each of the variants. This is done in one of two |
| ways: |
| |
| <UL> |
| <LI> Using a type map (<EM>i.e.</EM>, a <CODE>*.var</CODE> file) which |
| names the files containing the variants explicitly, or |
| <LI> Using a 'MultiViews' search, where the server does an implicit |
| filename pattern match and chooses from among the results. |
| </UL> |
| |
| <H3>Using a type-map file</H3> |
| |
| <P> |
| A type map is a document which is associated with the handler |
| named <CODE>type-map</CODE> (or, for backwards-compatibility with |
| older Apache configurations, the mime type |
| <CODE>application/x-type-map</CODE>). Note that to use this feature, |
| you must have a handler set in the configuration that defines a |
| file suffix as <CODE>type-map</CODE>; this is best done with a |
| |
| <PRE> |
| AddHandler type-map var |
| </PRE> |
| |
| in the server configuration file. See the comments in the sample config |
| file for more details. <P> |
| |
| Type map files have an entry for each available variant; these entries |
| consist of contiguous HTTP-format header lines. Entries for |
| different variants are separated by blank lines. Blank lines are |
| illegal within an entry. It is conventional to begin a map file with |
| an entry for the combined entity as a whole (although this |
| is not required, and if present will be ignored). An example |
| map file is: |
| |
| <PRE> |
| URI: foo |
| |
| URI: foo.en.html |
| Content-type: text/html |
| Content-language: en |
| |
| URI: foo.fr.de.html |
| Content-type: text/html;charset=iso-8859-2 |
| Content-language: fr, de |
| </PRE> |
| |
| If the variants have different source qualities, that may be indicated |
| by the "qs" parameter to the media type, as in this picture (available |
| as jpeg, gif, or ASCII-art): |
| |
| <PRE> |
| URI: foo |
| |
| URI: foo.jpeg |
| Content-type: image/jpeg; qs=0.8 |
| |
| URI: foo.gif |
| Content-type: image/gif; qs=0.5 |
| |
| URI: foo.txt |
| Content-type: text/plain; qs=0.01 |
| </PRE> |
| <P> |
| |
| qs values can vary in the range 0.000 to 1.000. Note that any variant with |
| a qs value of 0.000 will never be chosen. Variants with no 'qs' |
| parameter value are given a qs factor of 1.0. The qs parameter indicates |
| the relative 'quality' of this variant compared to the other available |
| variants, independent of the client's capabilities. For example, a jpeg |
| file is usually of higher source quality than an ascii file if it is |
| attempting to represent a photograph. However, if the resource being |
| represented is an original ascii art, then an ascii representation would |
| have a higher source quality than a jpeg representation. A qs value |
| is therefore specific to a given variant depending on the nature of |
| the resource it represents. |
| |
| <P> |
| The full list of headers recognized is: |
| |
| <DL> |
| <DT> <CODE>URI:</CODE> |
| <DD> uri of the file containing the variant (of the given media |
| type, encoded with the given content encoding). These are |
| interpreted as URLs relative to the map file; they must be on |
| the same server (!), and they must refer to files to which the |
| client would be granted access if they were to be requested |
| directly. |
| <DT> <CODE>Content-Type:</CODE> |
| <DD> media type --- charset, level and "qs" parameters may be given. These |
| are often referred to as MIME types; typical media types are |
| <CODE>image/gif</CODE>, <CODE>text/plain</CODE>, or |
| <CODE>text/html; level=3</CODE>. |
| <DT> <CODE>Content-Language:</CODE> |
| <DD> The languages of the variant, specified as an Internet standard |
| language tag from RFC 1766 (<EM>e.g.</EM>, <CODE>en</CODE> for English, |
| <CODE>kr</CODE> for Korean, <EM>etc.</EM>). |
| <DT> <CODE>Content-Encoding:</CODE> |
| <DD> If the file is compressed, or otherwise encoded, rather than |
| containing the actual raw data, this says how that was done. |
| Apache only recognizes encodings that are defined by an |
| <A HREF="mod/mod_mime.html#addencoding">AddEncoding</A> directive. |
| This normally includes the encodings <CODE>x-compress</CODE> |
| for compress'd files, and <CODE>x-gzip</CODE> for gzip'd files. |
| The <CODE>x-</CODE> prefix is ignored for encoding comparisons. |
| <DT> <CODE>Content-Length:</CODE> |
| <DD> The size of the file. Specifying content |
| lengths in the type-map allows the server to compare file sizes |
| without checking the actual files. |
| <DT> <CODE>Description:</CODE> |
| <DD> A human-readable textual description of the variant. If Apache cannot |
| find any appropriate variant to return, it will return an error |
| response which lists all available variants instead. Such a variant |
| list will include the human-readable variant descriptions. |
| </DL> |
| |
| <H3>Multiviews</H3> |
| |
| <P> |
| <CODE>MultiViews</CODE> is a per-directory option, meaning it can be set with |
| an <CODE>Options</CODE> directive within a <CODE><Directory></CODE>, |
| <CODE><Location></CODE> or <CODE><Files></CODE> |
| section in <CODE>access.conf</CODE>, or (if <CODE>AllowOverride</CODE> |
| is properly set) in <CODE>.htaccess</CODE> files. Note that |
| <CODE>Options All</CODE> does not set <CODE>MultiViews</CODE>; you |
| have to ask for it by name. |
| |
| <P> |
| The effect of <CODE>MultiViews</CODE> is as follows: if the server |
| receives a request for <CODE>/some/dir/foo</CODE>, if |
| <CODE>/some/dir</CODE> has <CODE>MultiViews</CODE> enabled, and |
| <CODE>/some/dir/foo</CODE> does <EM>not</EM> exist, then the server reads the |
| directory looking for files named foo.*, and effectively fakes up a |
| type map which names all those files, assigning them the same media |
| types and content-encodings it would have if the client had asked for |
| one of them by name. It then chooses the best match to the client's |
| requirements. |
| |
| <P> |
| <CODE>MultiViews</CODE> may also apply to searches for the file named by the |
| <CODE>DirectoryIndex</CODE> directive, if the server is trying to |
| index a directory. If the configuration files specify |
| |
| <PRE> |
| DirectoryIndex index |
| </PRE> |
| |
| then the server will arbitrate between <CODE>index.html</CODE> |
| and <CODE>index.html3</CODE> if both are present. If neither are |
| present, and <CODE>index.cgi</CODE> is there, the server will run it. |
| |
| <P> |
| If one of the files found when reading the directive is a CGI script, |
| it's not obvious what should happen. The code gives that case |
| special treatment --- if the request was a POST, or a GET with |
| QUERY_ARGS or PATH_INFO, the script is given an extremely high quality |
| rating, and generally invoked; otherwise it is given an extremely low |
| quality rating, which generally causes one of the other views (if any) |
| to be retrieved. |
| |
| <H2>The Negotiation Methods</H2> |
| |
| After Apache has obtained a list of the variants for a given resource, |
| either from a type-map file or from the filenames in the directory, it |
| invokes one of two methods to decide on the 'best' variant to |
| return, if any. It is not necessary to know any of the details of how |
| negotiation actually takes place in order to use Apache's content |
| negotiation features. However the rest of this document explains the |
| methods used for those interested. |
| <P> |
| |
| There are two negotiation methods: |
| |
| <OL> |
| |
| <LI><STRONG>Server driven negotiation with the Apache |
| algorithm</STRONG> is used in the normal case. The Apache algorithm is |
| explained in more detail below. When this algorithm is used, Apache |
| can sometimes 'fiddle' the quality factor of a particular dimension to |
| achieve a better result. The ways Apache can fiddle quality factors is |
| explained in more detail below. |
| |
| <LI><STRONG>Transparent content negotiation</STRONG> is used when the |
| browser specifically requests this through the mechanism defined in RFC |
| 2295. This negotiation method gives the browser full control over |
| deciding on the 'best' variant, the result is therefore dependent on |
| the specific algorithms used by the browser. As part of the |
| transparent negotiation process, the browser can ask Apache to run the |
| 'remote variant selection algorithm' defined in RFC 2296. </UL> |
| |
| |
| <H3>Dimensions of Negotiation</H3> |
| |
| <TABLE> |
| <TR valign="top"> |
| <TH>Dimension |
| <TH>Notes |
| <TR valign="top"> |
| <TD>Media Type |
| <TD>Browser indicates preferences with the Accept header field. Each item |
| can have an associated quality factor. Variant description can also |
| have a quality factor (the "qs" parameter). |
| <TR valign="top"> |
| <TD>Language |
| <TD>Browser indicates preferences with the Accept-Language header field. |
| Each item can have a quality factor. Variants can be associated with none, one |
| or more than one language. |
| <TR valign="top"> |
| <TD>Encoding |
| <TD>Browser indicates preference with the Accept-Encoding header field. |
| Each item can have a quality factor. |
| <TR valign="top"> |
| <TD>Charset |
| <TD>Browser indicates preference with the Accept-Charset header field. |
| Each item can have a quality factor. |
| Variants can indicate a charset as a parameter of the media type. |
| </TABLE> |
| |
| <H3>Apache Negotiation Algorithm</H3> |
| |
| <P> |
| Apache can use the following algorithm to select the 'best' variant |
| (if any) to return to the browser. This algorithm is not |
| further configurable. It operates as follows: |
| |
| <OL> |
| <LI>First, for each dimension of the negotiation, check the appropriate |
| <EM>Accept*</EM> header field and assign a quality to each |
| variant. If the <EM>Accept*</EM> header for any dimension implies that this |
| variant is not acceptable, eliminate it. If no variants remain, go |
| to step 4. |
| |
| <LI>Select the 'best' variant by a process of elimination. Each of the |
| following tests is applied in order. Any variants not selected at each |
| test are eliminated. After each test, if only one variant remains, |
| select it as the best match and proceed to step 3. If more than one |
| variant remains, move on to the next test. |
| |
| <OL> |
| <LI>Multiply the quality factor from the Accept header with the |
| quality-of-source factor for this variant's media type, and select |
| the variants with the highest value. |
| |
| <LI>Select the variants with the highest language quality factor. |
| |
| <LI>Select the variants with the best language match, using either the |
| order of languages in the Accept-Language header (if present), or else |
| else the order of languages in the <CODE>LanguagePriority</CODE> |
| directive (if present). |
| |
| <LI>Select the variants with the highest 'level' media parameter |
| (used to give the version of text/html media types). |
| |
| <LI>Select variants with the best charset media parameters, |
| as given on the Accept-Charset header line. Charset ISO-8859-1 |
| is acceptable unless explicitly excluded. Variants with a |
| <CODE>text/*</CODE> media type but not explicitly associated |
| with a particular charset are assumed to be in ISO-8859-1. |
| |
| <LI>Select those variants which have associated |
| charset media parameters that are <EM>not</EM> ISO-8859-1. |
| If there are no such variants, select all variants instead. |
| |
| <LI>Select the variants with the best encoding. If there are |
| variants with an encoding that is acceptable to the user-agent, |
| select only these variants. Otherwise if there is a mix of encoded |
| and non-encoded variants, select only the unencoded variants. |
| If either all variants are encoded or all variants are not encoded, |
| select all variants. |
| |
| <LI>Select the variants with the smallest content length. |
| |
| <LI>Select the first variant of those remaining. This will be either the |
| first listed in the type-map file, or when variants are read from |
| the directory, the one whose file name comes first when sorted using |
| ASCII code order. |
| |
| </OL> |
| |
| <LI>The algorithm has now selected one 'best' variant, so return |
| it as the response. The HTTP response header Vary is set to indicate the |
| dimensions of negotiation (browsers and caches can use this |
| information when caching the resource). End. |
| |
| <LI>To get here means no variant was selected (because none are acceptable |
| to the browser). Return a 406 status (meaning "No acceptable representation") |
| with a response body consisting of an HTML document listing the |
| available variants. Also set the HTTP Vary header to indicate the |
| dimensions of variance. |
| |
| </OL> |
| |
| <H2><A NAME="better">Fiddling with Quality Values</A></H2> |
| |
| <P> |
| Apache sometimes changes the quality values from what would be |
| expected by a strict interpretation of the Apache negotiation |
| algorithm above. This is to get a better result from the algorithm for |
| browsers which do not send full or accurate information. Some of the |
| most popular browsers send Accept header information which would |
| otherwise result in the selection of the wrong variant in many |
| cases. If a browser sends full and correct information these fiddles |
| will not be applied. |
| <P> |
| |
| <H3>Media Types and Wildcards</H3> |
| |
| <P> |
| The Accept: request header indicates preferences for media types. It |
| can also include 'wildcard' media types, such as "image/*" or "*/*" |
| where the * matches any string. So a request including: |
| <PRE> |
| Accept: image/*, */* |
| </PRE> |
| |
| would indicate that any type starting "image/" is acceptable, |
| as is any other type (so the first "image/*" is redundant). Some |
| browsers routinely send wildcards in addition to explicit types they |
| can handle. For example: |
| <PRE> |
| Accept: text/html, text/plain, image/gif, image/jpeg, */* |
| </PRE> |
| |
| The intention of this is to indicate that the explicitly |
| listed types are preferred, but if a different representation is |
| available, that is ok too. However under the basic algorithm, as given |
| above, the */* wildcard has exactly equal preference to all the other |
| types, so they are not being preferred. The browser should really have |
| sent a request with a lower quality (preference) value for *.*, such |
| as: |
| <PRE> |
| Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01 |
| </PRE> |
| |
| The explicit types have no quality factor, so they default to a |
| preference of 1.0 (the highest). The wildcard */* is given |
| a low preference of 0.01, so other types will only be returned if |
| no variant matches an explicitly listed type. |
| <P> |
| |
| If the Accept: header contains <EM>no</EM> q factors at all, Apache sets |
| the q value of "*/*", if present, to 0.01 to emulate the desired |
| behavior. It also sets the q value of wildcards of the format |
| "type/*" to 0.02 (so these are preferred over matches against |
| "*/*". If any media type on the Accept: header contains a q factor, |
| these special values are <EM>not</EM> applied, so requests from browsers |
| which send the correct information to start with work as expected. |
| |
| <H3>Variants with no Language</H3> |
| |
| <P> |
| If some of the variants for a particular resource have a language |
| attribute, and some do not, those variants with no language |
| are given a very low language quality factor of 0.001.<P> |
| |
| The reason for setting this language quality factor for |
| variant with no language to a very low value is to allow |
| for a default variant which can be supplied if none of the |
| other variants match the browser's language preferences. |
| |
| For example, consider the situation with three variants: |
| |
| <UL> |
| <LI>foo.en.html, language en |
| <LI>foo.fr.html, language en |
| <LI>foo.html, no language |
| </UL> |
| |
| <P> |
| The meaning of a variant with no language is that it is |
| always acceptable to the browser. If the request Accept-Language |
| header includes either en or fr (or both) one of foo.en.html |
| or foo.fr.html will be returned. If the browser does not list |
| either en or fr as acceptable, foo.html will be returned instead. |
| |
| <H2>Extensions to Transparent Content Negotiation</H2> |
| |
| Apache extends the transparent content negotiation protocol (RFC 2295) |
| as follows. A new <CODE> {encoding ..}</CODE> element is used in |
| variant lists to label variants which are available with a specific |
| content-encoding only. The implementation of the |
| RVSA/1.0 algorithm (RFC 2296) is extended to recognize encoded |
| variants in the list, and to use them as candidate variants whenever |
| their encodings are acceptable according to the Accept-Encoding |
| request header. The RVSA/1.0 implementation does not round computed |
| quality factors to 5 decimal places before choosing the best variant. |
| |
| <H2>Note on hyperlinks and naming conventions</H2> |
| |
| <P> |
| If you are using language negotiation you can choose between |
| different naming conventions, because files can have more than one |
| extension, and the order of the extensions is normally irrelevant |
| (see <A HREF="mod/mod_mime.html">mod_mime</A> documentation for details). |
| <P> |
| A typical file has a MIME-type extension (<EM>e.g.</EM>, <SAMP>html</SAMP>), |
| maybe an encoding extension (<EM>e.g.</EM>, <SAMP>gz</SAMP>), and of course a |
| language extension (<EM>e.g.</EM>, <SAMP>en</SAMP>) when we have different |
| language variants of this file. |
| |
| <P> |
| Examples: |
| <UL> |
| <LI>foo.en.html |
| <LI>foo.html.en |
| <LI>foo.en.html.gz |
| </UL> |
| |
| <P> |
| Here some more examples of filenames together with valid and invalid |
| hyperlinks: |
| </P> |
| |
| <TABLE BORDER=1 CELLPADDING=8 CELLSPACING=0> |
| <TR> |
| <TH>Filename</TH> |
| <TH>Valid hyperlink</TH> |
| <TH>Invalid hyperlink</TH> |
| </TR> |
| <TR> |
| <TD><EM>foo.html.en</EM></TD> |
| <TD>foo<BR> |
| foo.html</TD> |
| <TD>-</TD> |
| </TR> |
| <TR> |
| <TD><EM>foo.en.html</EM></TD> |
| <TD>foo</TD> |
| <TD>foo.html</TD> |
| </TR> |
| <TR> |
| <TD><EM>foo.html.en.gz</EM></TD> |
| <TD>foo<BR> |
| foo.html</TD> |
| <TD>foo.gz<BR> |
| foo.html.gz</TD> |
| </TR> |
| <TR> |
| <TD><EM>foo.en.html.gz</EM></TD> |
| <TD>foo</TD> |
| <TD>foo.html<BR> |
| foo.html.gz<BR> |
| foo.gz</TD> |
| </TR> |
| <TR> |
| <TD><EM>foo.gz.html.en</EM></TD> |
| <TD>foo<BR> |
| foo.gz<BR> |
| foo.gz.html</TD> |
| <TD>foo.html</TD> |
| </TR> |
| <TR> |
| <TD><EM>foo.html.gz.en</EM></TD> |
| <TD>foo<BR> |
| foo.html<BR> |
| foo.html.gz</TD> |
| <TD>foo.gz</TD> |
| </TR> |
| </TABLE> |
| |
| <P> |
| Looking at the table above you will notice that it is always possible to |
| use the name without any extensions in an hyperlink (<EM>e.g.</EM>, <SAMP>foo</SAMP>). |
| The advantage is that you can hide the actual type of a |
| document rsp. file and can change it later, <EM>e.g.</EM>, from <SAMP>html</SAMP> |
| to <SAMP>shtml</SAMP> or <SAMP>cgi</SAMP> without changing any |
| hyperlink references. |
| |
| <P> |
| If you want to continue to use a MIME-type in your hyperlinks (<EM>e.g.</EM> |
| <SAMP>foo.html</SAMP>) the language extension (including an encoding extension |
| if there is one) must be on the right hand side of the MIME-type extension |
| (<EM>e.g.</EM>, <SAMP>foo.html.en</SAMP>). |
| |
| |
| <H2>Note on Caching</H2> |
| |
| <P> |
| When a cache stores a representation, it associates it with the request URL. |
| The next time that URL is requested, the cache can use the stored |
| representation. But, if the resource is negotiable at the server, |
| this might result in only the first requested variant being cached and |
| subsequent cache hits might return the wrong response. To prevent this, |
| Apache normally marks all responses that are returned after content negotiation |
| as non-cacheable by HTTP/1.0 clients. Apache also supports the HTTP/1.1 |
| protocol features to allow caching of negotiated responses. <P> |
| |
| For requests which come from a HTTP/1.0 compliant client (either a |
| browser or a cache), the directive <TT>CacheNegotiatedDocs</TT> can be |
| used to allow caching of responses which were subject to negotiation. |
| This directive can be given in the server config or virtual host, and |
| takes no arguments. It has no effect on requests from HTTP/1.1 clients. |
| |
| <!--#include virtual="footer.html" --> |
| </BODY> |
| </HTML> |