.. Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed
   with this work for additional information regarding copyright
   ownership. The ASF licenses this file to you under the Apache
   License, Version 2.0 (the "License"); you may not use this file
   except in compliance with the License. You may obtain a copy of
   the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
   implied. See the License for the specific language governing
   permissions and limitations under the License.
.. _admin-plugins-metalink:
Metalink Plugin
===============
The `Metalink` plugin implements the Metalink_ download description
format to avoid downloading the same file more than once. This
improves cache efficiency and speeds up users' downloads.
It combines standard response headers with knowledge of which objects
are in the cache, and rewrites those headers where doing so lets a
client use a URL that's already cached instead of one that isn't. The
headers are
specified in RFC 6249 (Metalink/HTTP: Mirrors and Hashes) and RFC 3230
(Instance Digests in HTTP) and are sent by various download
redirectors or content distribution networks.
A lot of download sites distribute the same files from many different
mirrors and users don't know which mirrors are already cached. These
sites often present users with a simple download button, but the
button doesn't predictably access the same mirror, or a mirror that's
already cached. To users it seems like the download works sometimes
(takes seconds) and not others (takes hours), which is frustrating.
The problem is most acute when users share a limited, possibly
unreliable internet connection, as is common in parts of Africa.
How it Works
------------
When the plugin sees a response with a :mailheader:`Location: ...`
header and a :mailheader:`Digest: SHA-256=...` header, it checks if
the URL in the :mailheader:`Location` header is already cached. If it
isn't, then it tries to find a URL that is cached to use instead. It
looks in the cache for some object that matches the digest in the
:mailheader:`Digest` header and if it succeeds, then it rewrites the
:mailheader:`Location` header with that object's URL.
This way a client should get sent to a URL that's already cached and
won't download the file again.
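The decision described above can be sketched in a few lines of Python.
This is only an illustration of the logic, not the plugin's C code:
``rewrite_location``, ``is_cached``, and ``url_for_digest`` are
hypothetical names, headers are modelled as a plain dictionary with
case-sensitive keys, and the two callbacks stand in for cache lookups.

```python
from typing import Callable, Dict, Optional

DIGEST_PREFIX = "SHA-256="


def rewrite_location(
    headers: Dict[str, str],
    is_cached: Callable[[str], bool],
    url_for_digest: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return headers with Location pointing at a cached copy, if one is known.

    is_cached and url_for_digest are hypothetical stand-ins for cache lookups.
    """
    location = headers.get("Location")
    digest = headers.get("Digest", "")

    # Both headers must be present, and the digest must be SHA-256.
    if location is None or not digest.startswith(DIGEST_PREFIX):
        return headers

    # Nothing to do if the redirect target is already cached.
    if is_cached(location):
        return headers

    # Look for some cached object whose content matches the digest.
    candidate = url_for_digest(digest[len(DIGEST_PREFIX):])
    if candidate is None:
        return headers

    # Rewrite a copy so the original response headers stay untouched.
    rewritten = dict(headers)
    rewritten["Location"] = candidate
    return rewritten
```

With a set of cached URLs and a digest-to-URL dictionary, the callbacks
could be ``cached.__contains__`` and ``index.get``. A response that has
no :mailheader:`Digest` header, or whose :mailheader:`Location` is
already cached, passes through unchanged.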
Installation
------------
The `Metalink` plugin is a :term:`global plugin`. Enable it by adding
``metalink.so`` to your :file:`plugin.config` file. There are no
options.
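A minimal :file:`plugin.config` entry might look like the following.
Only the ``metalink.so`` line comes from the instructions above; the
comment line is illustrative.

```
# Load the Metalink plugin globally; it takes no options.
metalink.so
```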
Implementation Status
---------------------
The plugin implements the :c:data:`TS_HTTP_SEND_RESPONSE_HDR_HOOK`
hook to check and potentially rewrite the :mailheader:`Location` and
:mailheader:`Digest` headers after responses are cached. It doesn't
rewrite them before they're cached because the contents of the cache
can change afterward, so a rewrite stored with the response could
become stale. It uses :c:func:`TSCacheRead` to check whether the URL
in the :mailheader:`Location` header is already cached. In the future,
the plugin should also check whether the cached object is fresh.
The plugin implements the :c:data:`TS_HTTP_READ_RESPONSE_HDR_HOOK`
hook and :ref:`a null transformation <developer-plugins-http-transformations-null-transform>`
to compute the SHA-256 digest of content as it's added to the cache.
It uses ``SHA256_Init()``, ``SHA256_Update()``, and ``SHA256_Final()``
from OpenSSL to compute the digest, then uses :c:func:`TSCacheWrite`
to associate the digest with the request URL. This adds a new cache
object whose key is the digest and whose content is the request URL.
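The init/update/final pattern and the digest-to-URL entry can be
illustrated with Python's ``hashlib``. This is a sketch under stated
assumptions, not the plugin's implementation: the cache is modelled as
a dictionary, ``digest_stream`` and ``record_digest`` are hypothetical
names, and the ``digest:`` key prefix and the base64 encoding of the
key (matching how RFC 3230 encodes SHA digests in the header) are
choices made for the example.

```python
import base64
import hashlib
from typing import Dict, Iterable


def digest_stream(chunks: Iterable[bytes]) -> str:
    """Hash content chunk by chunk, as a pass-through transformation would."""
    hasher = hashlib.sha256()      # corresponds to SHA256_Init()
    for chunk in chunks:
        hasher.update(chunk)       # corresponds to SHA256_Update()
    raw = hasher.digest()          # corresponds to SHA256_Final()
    # Base64-encode so the result is comparable to a Digest header value.
    return base64.b64encode(raw).decode("ascii")


def record_digest(cache: Dict[str, str], request_url: str,
                  chunks: Iterable[bytes]) -> str:
    """Store a digest -> URL entry in a hypothetical dictionary-backed cache."""
    digest = digest_stream(chunks)
    cache["digest:" + digest] = request_url
    return digest
```

Because the hash is updated incrementally, the content never has to be
held in memory as a whole; the digest is the same however the body is
split into chunks.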
To check if the cache already contains content that matches a digest,
the plugin must call :c:func:`TSCacheRead` with the digest as the key,
read the URL stored in the resultant object, and then call
:c:func:`TSCacheRead` again with this URL as the key. This is
probably inefficient and should be improved.
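The two-step lookup can be sketched as follows. As before, the cache
is modelled as a dictionary, and ``find_cached_url`` and the
``digest:`` and ``url:`` key prefixes are hypothetical; the real plugin
performs both steps with :c:func:`TSCacheRead`.

```python
from typing import Dict, Optional


def find_cached_url(cache: Dict[str, object], digest: str) -> Optional[str]:
    """Return a cached URL whose content matches digest, or None."""
    # First lookup: the digest entry holds the URL the content was stored under.
    url = cache.get("digest:" + digest)
    if not isinstance(url, str):
        return None

    # Second lookup: the content itself may have been evicted since the
    # digest entry was written, so confirm the URL is still cached.
    if "url:" + url not in cache:
        return None

    return url
```

The second lookup is what keeps a leftover digest entry from sending
clients to a URL whose content is no longer in the cache.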
An early version of the plugin scanned
:mailheader:`Link: <...>; rel=duplicate` headers. If the URL in the
:mailheader:`Location: ...` header wasn't already cached, it scanned
those headers for a URL that was. The
:mailheader:`Digest: SHA-256=...` header is superior because it finds
cached content in every case that a
:mailheader:`Link: <...>; rel=duplicate` header would, and also in
cases where the cached URL is not listed among the
:mailheader:`Link: <...>; rel=duplicate` headers. That can happen
because the content was downloaded from a URL that doesn't participate
in the content distribution network, or because there are too many
mirrors to list.
The :mailheader:`Digest: SHA-256=...` header is also more efficient
than :mailheader:`Link: <...>; rel=duplicate` headers because it
involves a constant number of cache lookups. RFC 6249 also requires
that :mailheader:`Link: <...>; rel=duplicate` headers be ignored when
no :mailheader:`Digest: SHA-256=...` header is present:

   If Instance Digests are not provided by the Metalink servers, the
   :mailheader:`Link` header fields pertaining to this specification
   MUST be ignored.

   Metalinks contain whole file hashes as described in Section 6, and
   MUST include SHA-256, as specified in [FIPS-180-3].

.. _Metalink: http://en.wikipedia.org/wiki/Metalink