.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
=============
update.config
=============
.. configfile:: update.config
The :file:`update.config` file controls how Traffic Server performs a
scheduled update of specific local cache content. The file contains a
list of URLs specifying objects that you want to schedule for update.
A scheduled update performs a local HTTP ``GET`` on the objects at the
specified time or interval. You can control the following parameters for
each specified object:
- The URL
- URL-specific request headers, which override the default
- The update time and interval
- The recursion depth
After you modify the :file:`update.config` file,
run the :option:`traffic_line -x`
command to apply changes. When you apply changes to one node in a
cluster, Traffic Server automatically applies the changes to all other
nodes in the cluster.
Supported Tag/Attribute Pairs
=============================
Scheduled update supports the following tag/attribute pairs when
performing recursive URL updates:
- ``<a href=" ">``
- ``<img src=" ">``
- ``<img href=" ">``
- ``<body background=" ">``
- ``<frame src=" ">``
- ``<iframe src=" ">``
- ``<fig src=" ">``
- ``<overlay src=" ">``
- ``<applet code=" ">``
- ``<script src=" ">``
- ``<embed src=" ">``
- ``<bgsound src=" ">``
- ``<area href=" ">``
- ``<base href=" ">``
- ``<meta content=" ">``
Scheduled update is designed to operate on URL sets consisting of
hundreds of input URLs (expanded to thousands when recursive URLs are
included); it is *not* intended to operate on extremely large URL sets,
such as those used by Internet crawlers.
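The tag/attribute pairs listed above can be illustrated with a small Python sketch. It is not Traffic Server's implementation; it only shows which attribute values a recursive update would consider as referenced URLs. The names ``SUPPORTED_PAIRS``, ``LinkCollector``, and ``collect_links`` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Tag/attribute pairs copied from the list above.
SUPPORTED_PAIRS = {
    ("a", "href"), ("img", "src"), ("img", "href"),
    ("body", "background"), ("frame", "src"), ("iframe", "src"),
    ("fig", "src"), ("overlay", "src"), ("applet", "code"),
    ("script", "src"), ("embed", "src"), ("bgsound", "src"),
    ("area", "href"), ("base", "href"), ("meta", "content"),
}


class LinkCollector(HTMLParser):
    """Collect attribute values for the supported tag/attribute pairs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if (tag, name) in SUPPORTED_PAIRS and value:
                self.links.append(value)


def collect_links(html):
    """Return the attribute values found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For instance, ``collect_links('<a href="/a.html">x</a><link href="s.css">')`` returns only ``/a.html``, because ``<link href>`` is not among the supported pairs.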
Format
======
Each line in the :file:`update.config` file uses the following format::
URL\request_headers\offset_hour\interval\recursion_depth\
The following list describes each field.
.. _update-config-format-url:
*URL*
The HTTP URL of the object to update.
.. _update-config-format-request-headers:
*request_headers*
Optional. A list of headers, separated by semicolons, passed in each
``GET`` request. You can define any request header that conforms to
the HTTP specification; the default is no request header.
.. _update-config-format-offset-hour:
*offset_hour*
The base hour used to derive the update periods. The range is 00-23
hours.
.. _update-config-format-interval:
*interval*
The interval (in seconds) at which updates should occur, starting at
the offset hour.
.. _update-config-format-reecursion-depth:
*recursion_depth*
The depth to which referenced URLs are recursively updated, starting
at the given URL. This field applies only to HTTP.
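As a rough illustration of the field layout described above, the following Python sketch splits one line into its five fields. It assumes that no field contains a backslash and that every line ends with a trailing backslash, as in the format shown. ``UpdateEntry`` and ``parse_update_line`` are hypothetical names and are not part of Traffic Server.

```python
from typing import NamedTuple


class UpdateEntry(NamedTuple):
    url: str
    request_headers: list
    offset_hour: int
    interval: int
    recursion_depth: int


def parse_update_line(line):
    """Split one update.config line into its five backslash-delimited fields."""
    parts = line.strip().split("\\")
    # The trailing backslash leaves one empty element after the last field.
    if len(parts) != 6 or parts[5] != "":
        raise ValueError("expected five fields, each terminated by a backslash")
    url, headers, offset_hour, interval, depth = parts[:5]
    entry = UpdateEntry(
        url=url,
        # Headers are optional and separated by semicolons.
        request_headers=[h.strip() for h in headers.split(";") if h.strip()],
        offset_hour=int(offset_hour),
        interval=int(interval),
        recursion_depth=int(depth),
    )
    if not 0 <= entry.offset_hour <= 23:
        raise ValueError("offset_hour must be in the range 00-23")
    return entry
```

An empty *request_headers* field yields an empty list, which corresponds to the default of sending no extra request header.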
Examples
========
An example HTTP scheduled update is provided below:
::
http://www.company.com\User-Agent: noname user agent\13\3600\5\
The example specifies the URL and request headers, an offset hour of 13
(1 pm), an interval of one hour, and a recursion depth of 5. This would
result in updates at 13:00, 14:00, 15:00, and so on. To schedule an
update that occurs only once a day, use an interval value of 86400 (that is,
24 hours x 60 minutes x 60 seconds = 86400).
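The arithmetic in this example can be checked with a short Python sketch. It models the schedule in the simplest way consistent with the text: the first update falls on the offset hour and each later one follows after *interval* seconds. How Traffic Server aligns periods internally is not described here, so treat this as an assumption. ``update_times`` is a hypothetical helper, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def update_times(day, offset_hour, interval, count):
    """Return the first `count` update times, starting at offset_hour on `day`."""
    start = datetime(day.year, day.month, day.day, offset_hour)
    return [start + timedelta(seconds=interval * i) for i in range(count)]


# Offset hour 13 with a 3600-second interval, as in the example above.
times = update_times(datetime(2024, 1, 1), 13, 3600, 3)
print([t.strftime("%H:%M") for t in times])  # ['13:00', '14:00', '15:00']
```

With an interval of 86400, consecutive entries returned by the same helper are exactly one day apart.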
.. XXX: The following seems misplaced here, and is probably better off placed in an appendix.
Specifying URL Regular Expressions (``url_regex``)
==================================================
This section describes how to specify a ``url_regex``. Entries of type
``url_regex`` within the configuration files use regular expressions to
perform a match.
The following list provides examples to show how to create a valid
``url_regex``.
``x``
Matches the character ``x``
``.``
Matches any character
``^``
Specifies beginning of line
``$``
Specifies end of line
``[xyz]``
A **character class**. In this case, the pattern matches either
``x``, ``y``, or ``z``
``[abj-oZ]``
A **character class** with a range. This pattern matches ``a``,
``b``, any letter from ``j`` through ``o``, or ``Z``
``[^A-Z]``
A **negated character class**. This pattern matches any character
except those in the class, that is, anything other than an uppercase
letter from ``A`` through ``Z``.
``r*``
Zero or more ``r``, where ``r`` is any regular expression.
``r+``
One or more ``r``, where ``r`` is any regular expression.
``r?``
Zero or one ``r``, where ``r`` is any regular expression.
``r{2,5}``
From two to five ``r``, where ``r`` is any regular expression.
``r{2,}``
Two or more ``r``, where ``r`` is any regular expression.
``r{4}``
Exactly four ``r``, where ``r`` is any regular expression.
``"[xyz]\"images"``
The literal string ``[xyz]"images"``
``\X``
If ``X`` is ``a``, ``b``, ``f``, ``n``, ``r``, ``t``, or ``v``, then the
``ANSI-C`` interpretation of ``\X``; otherwise, a literal ``X``. This is
used to escape operators such as ``*``
``\0``
A ``NUL`` character (the character with value zero)
``\123``
The character with octal value ``123``
``\x2a``
The character with hexadecimal value ``2a``
``(r)``
Matches an ``r``, where ``r`` is any regular expression. You can use
parentheses to override precedence.
``rs``
The regular expression ``r``, followed by the regular expression
``s``
``r|s``
Either an ``r`` or an ``s``
``#<n>#``
Inserts an **end node**, which causes regular expression matching to
stop when reached. The value ``n`` is returned.
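Several of the constructs listed above behave the same way in most regular expression engines, so they can be tried out with Python's ``re`` module. This is only an approximation: ``re`` is not the engine Traffic Server uses, and it does not support the quoted-string or ``#<n>#`` forms. In the sketch, ``r`` inside a pattern is the literal letter rather than a placeholder for any regular expression, and ``CASES`` and ``check`` are hypothetical names.

```python
import re

# Each entry: (pattern, text that should match, text that should not).
CASES = [
    (r"[xyz]", "y", "a"),        # character class
    (r"[abj-oZ]", "k", "c"),     # character class with a range
    (r"[^A-Z]", "a", "Q"),       # negated character class
    (r"r{2,5}", "rrr", "r"),     # from two to five repetitions
    (r"r{4}", "rrrr", "rrr"),    # exactly four repetitions
    (r"(ab)+", "abab", "ba"),    # grouping combined with one-or-more
    (r"cat|dog", "dog", "cow"),  # alternation
    (r"\*", "*", "x"),           # escaped operator
    (r"\x2a", "*", "+"),         # character with hexadecimal value 2a
]


def check(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in CASES:
    assert check(pattern, good) and not check(pattern, bad)
```

``re.fullmatch`` is used so that each pattern must account for the entire test string, which keeps the positive and negative cases unambiguous.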
You can specify ``dest_domain=mydomain.com`` to match any host in
``mydomain.com``. Likewise, you can specify ``dest_domain=.`` to match
any request.