| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| .. include:: ../../common.defs |
| |
| .. highlight:: cpp |
| .. default-domain:: cpp |
| |
| .. _developer-doc-hostdb: |
| |
| HostDB |
| ****** |
| |
| HostDB is a cache of DNS results. It is used to increase performance by aggregating address |
| resolution across transactions. HostDB also stores state information for specific IP addresses. |
| |
| Operation |
| ========= |
| |
| The primary operation for HostDB is to resolve a fully qualified domain name ("FQDN"). Each FQDN |
| is associated with a single record, and each record has an array of items. When a resolution |
| request is made, the database is checked to see if the record is already present. If so, it is |
| served. Otherwise a DNS request is made. When the nameserver replies, a record is created, added |
| to the database, and then returned to the requestor. |
| |
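The lookup flow above can be sketched as follows. This is a minimal illustration only: ``LookupTable``, ``LookupRecord``, ``LookupInfo`` and ``Resolver`` are hypothetical names, not the HostDB API, and the DNS request is modeled as a synchronous callback although the real request is asynchronous.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the real HostDB types.
struct LookupInfo {
  std::string addr; // textual IP address
};

struct LookupRecord {
  std::string fqdn;
  std::vector<LookupInfo> infos; // one item per resolved address
};

// Stands in for the DNS request (assumption: synchronous for simplicity).
using Resolver = std::function<std::vector<std::string>(const std::string &)>;

class LookupTable {
public:
  explicit LookupTable(Resolver resolver) : resolver_(std::move(resolver)) {}

  // Serve the record if present, otherwise resolve, store, and return it.
  std::shared_ptr<const LookupRecord> resolve(const std::string &fqdn) {
    auto spot = table_.find(fqdn);
    if (spot != table_.end()) {
      return spot->second; // already present: no DNS request is made
    }
    auto record = std::make_shared<LookupRecord>();
    record->fqdn = fqdn;
    for (const auto &addr : resolver_(fqdn)) {
      record->infos.push_back(LookupInfo{addr});
    }
    table_[fqdn] = record; // add to the database before returning
    return record;
  }

private:
  Resolver resolver_;
  std::unordered_map<std::string, std::shared_ptr<const LookupRecord>> table_;
};
```

In this sketch a second call to ``resolve`` for the same FQDN returns the stored record without invoking the resolver again.
| |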
| Each item in the array, called an "info", tracks several status values for its corresponding upstream: |
| |
| * HTTP version |
| * Last failure time |
| |
| The HTTP version is tracked from responses and provides a mechanism to make intelligent guesses |
| about the protocol to use to the upstream. |
| |
| The last failure time tracks when the last connection failure to the info occurred and doubles as |
| a flag, where a value of ``TS_TIME_ZERO`` indicates a live target and any other value indicates a |
| down info. |
| |
| If an info is marked down (has a non-zero last failure time) there is a "fail window" during which |
| no connections are permitted. After this time the info is considered to be a "zombie". If all infos |
| for a record are down then a specific error message is generated (body factory tag |
| "connect#all_down"). Otherwise if the selected info is a zombie, a request is permitted but the |
| zombie is immediately marked down again, preventing any additional requests until either the fail |
| window has passed or the single connection succeeds. A successful connection clears the last |
| failure time and the info becomes alive. |
| |
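The alive / down / zombie logic described above might look like the sketch below. It rests on several assumptions: time is a plain count of seconds, ``0`` stands in for ``TS_TIME_ZERO``, and ``UpstreamState``, ``FailState``, ``select``, ``mark_down`` and ``mark_up`` are hypothetical names. The compare-and-swap is one possible way to ensure only one transaction gets the zombie probe; the source does not specify the mechanism.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

using Seconds = std::chrono::seconds;

// Assumed analog of TS_TIME_ZERO: a zero count means "never failed".
constexpr std::int64_t TIME_ZERO = 0;

enum class FailState { ALIVE, DOWN, ZOMBIE };

struct UpstreamState {
  // Last failure time in seconds since an arbitrary epoch; doubles as the down flag.
  std::atomic<std::int64_t> last_failure{TIME_ZERO};

  FailState state(Seconds now, Seconds fail_window) const {
    const std::int64_t t = last_failure.load();
    if (t == TIME_ZERO) {
      return FailState::ALIVE;
    }
    return now.count() < t + fail_window.count() ? FailState::DOWN : FailState::ZOMBIE;
  }

  // Returns true if the caller may attempt a connection to this upstream.
  bool select(Seconds now, Seconds fail_window) {
    std::int64_t t = last_failure.load();
    if (t == TIME_ZERO) {
      return true; // alive
    }
    if (now.count() < t + fail_window.count()) {
      return false; // inside the fail window
    }
    // Zombie: mark down again immediately; only the caller that wins the
    // exchange is permitted to make the single probing connection.
    return last_failure.compare_exchange_strong(t, now.count());
  }

  void mark_down(Seconds now) { last_failure.store(now.count()); }
  void mark_up() { last_failure.store(TIME_ZERO); } // successful connection
};
```

With a 30 second fail window and a failure at time 100, ``select`` refuses at time 110, permits one probe at time 130, and refuses again at time 131 until ``mark_up`` is called.
| |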
| Runtime Structure |
| ================= |
| |
| DNS results are stored in a global hash table as instances of ``HostDBRecord``. Each record stores |
| the results of a single query. These records are not updated with new DNS results - instead a new |
| record instance is created and replaces the previous instance in the table. The records are |
| reference counted so such a replacement doesn't invalidate the old record if the latter is still |
| being accessed. Some specific dynamic data is migrated from the old record to the new one, such as |
| the failure status of the upstreams in the record. |
| |
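A rough sketch of the replace-rather-than-update scheme, using ``std::shared_ptr`` as a stand-in for the record reference counting. ``SketchRecord``, ``SketchInfo``, ``Table`` and ``make_replacement`` are hypothetical names, and matching old and new items by address text is an assumption about how failure status is carried over.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; not the real HostDBRecord / HostDBInfo.
struct SketchInfo {
  std::string addr;
  long last_failure = 0; // 0 stands in for "alive"
};

struct SketchRecord {
  std::vector<SketchInfo> infos;
};

using RecordPtr = std::shared_ptr<const SketchRecord>;
using Table     = std::map<std::string, RecordPtr>;

// Build a new record from fresh DNS results. Failure status is migrated from
// the old record for addresses that appear in both (assumed matching rule).
inline RecordPtr make_replacement(const RecordPtr &old_rec, const std::vector<std::string> &addrs) {
  auto fresh = std::make_shared<SketchRecord>();
  for (const auto &addr : addrs) {
    SketchInfo info;
    info.addr = addr;
    if (old_rec) {
      for (const auto &prev : old_rec->infos) {
        if (prev.addr == addr) {
          info.last_failure = prev.last_failure;
          break;
        }
      }
    }
    fresh->infos.push_back(info);
  }
  return fresh;
}
```

Assigning the result into the table slot swaps the instance. Any transaction still holding the previous ``RecordPtr`` keeps a valid, unchanged record until it releases its reference.
| |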
| In each record is a variable length array of items, instances of ``HostDBInfo``, one for each |
| IP address in the record. This is called the "round robin" data for historical reasons. For SRV |
| records there is an additional storage area in the record that is used to store the SRV names. |
| |
| .. figure:: HostDB-Data-Layout.svg |
| |
| The round robin data is accessed by using an offset and count in the base record. For SRV records |
| each ``HostDBInfo`` instance has an offset, relative to that instance, for its own name in the |
| name storage area. |
| |
| State information for the outbound connection has been moved to a refurbished ``DNSInfo`` class |
| named ``ResolveInfo``. As much as possible relevant state information has been moved from the |
| ``HttpSM`` to this structure. This is intended for future work where the state machine deals only |
| with upstream transactions and not sessions. |
| |
| ``ResolveInfo`` may contain a reference to a HostDB record, which preserves the record even if it is |
| replaced due to DNS queries in other transactions. The record is not required as the resolution |
| information can be supplied directly without DNS or HostDB, e.g. a plugin sets the upstream address |
| explicitly. The ``resolved_p`` flag indicates whether the current information is valid and ready |
| to be used. As a result, there is no longer a specific holder for API provided addresses - |
| the interface now puts the address in the ``ResolveInfo`` and marks it as resolved. This prevents |
| further DNS / HostDB lookups and the address is used as is. |
| |
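The effect of ``resolved_p`` can be illustrated with a small sketch. Only the flag name comes from the text above; ``SketchResolveState``, ``set_from_api`` and ``ensure_resolved`` are hypothetical, and the lookup is reduced to a callback returning an address string.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical reduction of the resolve state to an address and a flag.
struct SketchResolveState {
  std::string addr;        // textual address, for illustration only
  bool resolved_p = false; // true when addr is valid and ready to be used

  // Models a plugin supplying the upstream address directly.
  void set_from_api(std::string a) {
    addr       = std::move(a);
    resolved_p = true;
  }
};

// Returns the address to use, calling `lookup` only when nothing is resolved yet.
inline const std::string &ensure_resolved(SketchResolveState &st,
                                          const std::function<std::string()> &lookup) {
  if (!st.resolved_p) {
    st.addr       = lookup(); // stands in for the DNS / HostDB lookup
    st.resolved_p = true;
  }
  return st.addr;
}
```

Once ``set_from_api`` has run, ``ensure_resolved`` returns the supplied address and never invokes the lookup.
| |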
| The upstream port is a bit tricky and should be cleaned up. Currently the value in ``srv_port`` |
| determines the port if set. If not, then the port in ``addr`` is used. |
| |
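The port rule amounts to a two-way choice, sketched below. The struct is a hypothetical reduction: ``addr_port`` stands for the port carried in ``addr``, and treating ``0`` as "not set" for ``srv_port`` is an assumption.

```cpp
#include <cstdint>

// Hypothetical reduction of the port-related fields.
struct SketchPorts {
  std::uint16_t srv_port  = 0; // from an SRV record; 0 assumed to mean "not set"
  std::uint16_t addr_port = 0; // stands for the port stored in `addr`
};

// srv_port wins when set, otherwise fall back to the port in the address.
inline std::uint16_t upstream_port(const SketchPorts &p) {
  return p.srv_port != 0 ? p.srv_port : p.addr_port;
}
```
| |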
| Resolution Style |
| ---------------- |
| |
| .. cpp:enum:: OS_Addr |
| |
| Metadata about the source of the resolved address. |
| |
| .. cpp:enumerator:: TRY_DEFAULT |
| |
| Use default resolution. This is the initial state. |
| |
| .. cpp:enumerator:: TRY_HOSTDB |
| |
| Use HostDB to resolve the target key. |
| |
| .. cpp:enumerator:: TRY_CLIENT |
| |
| Use the client supplied target address. This is used for transparent connections - the upstream |
| address is obtained from the inbound connection. May fail over to HostDB. |
| |
| .. cpp:enumerator:: USE_HOSTDB |
| |
| Use HostDB to resolve the target key. |
| |
| .. cpp:enumerator:: USE_CLIENT |
| |
| Use the client supplied target address. |
| |
| .. cpp:enumerator:: USE_API |
| |
| Use the address provided via the plugin API. |
| |
| The parallel values for using HostDB and the client target address are to control fail over on |
| connection failure. The ``TRY_`` values can fail over to another style, but the ``USE_`` values |
| cannot. This prevents cycles of style changes by having any ``TRY_`` value fail over to a |
| ``USE_`` value, at which point it can no longer change. Note there is no ``TRY_API`` - if a |
| plugin sets the upstream address that is locked in. |
| |
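The fail over rule can be sketched as a pure function on the style. The enumerator names are from the list above, but the specific transitions are partly assumed: the text only states that ``TRY_CLIENT`` may fail over to HostDB, so mapping ``TRY_HOSTDB`` and ``TRY_DEFAULT`` to ``USE_CLIENT`` is a guess made for illustration. ``can_fail_over`` and ``fail_over`` are hypothetical helpers.

```cpp
enum class OS_Addr { TRY_DEFAULT, TRY_HOSTDB, TRY_CLIENT, USE_HOSTDB, USE_CLIENT, USE_API };

// Only the TRY_ styles may change after a connection failure.
inline bool can_fail_over(OS_Addr style) {
  return style == OS_Addr::TRY_DEFAULT || style == OS_Addr::TRY_HOSTDB ||
         style == OS_Addr::TRY_CLIENT;
}

// Style to use after a connection failure. Every TRY_ value lands on a USE_
// value, so a second failure cannot change the style again.
inline OS_Addr fail_over(OS_Addr style) {
  switch (style) {
  case OS_Addr::TRY_CLIENT:
    return OS_Addr::USE_HOSTDB; // "May fail over to HostDB"
  case OS_Addr::TRY_HOSTDB:
  case OS_Addr::TRY_DEFAULT:
    return OS_Addr::USE_CLIENT; // assumed transition, not stated in the text
  default:
    return style; // USE_ values, including USE_API, are locked in
  }
}
```

Because the result of ``fail_over`` is always a ``USE_`` value, applying it repeatedly reaches a fixed point after one step, which is what rules out cycles.
| |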
| Issues |
| ====== |
| |
| Currently, if an upstream is marked down, connections are still permitted; the only change is the |
| number of retries. This has caused operational problems where down systems are flooded with requests |
| which, despite the timeouts, accumulate in ATS until ATS runs out of memory (there were instances of |
| over 800K pending transactions). This also made it hard to bring the upstreams back online. With |
| these changes, requests to upstreams marked down are strongly rate limited and other transactions are |
| immediately terminated with a 502 response, protecting both the upstream and ATS. |
| |
| Future |
| ====== |
| |
| There is still some work to be done in future PRs. |
| |
| * The fail window and the zombie window should be separate values. It is quite reasonable to want |
| to configure a very short fail window (possibly 0) with a moderately long zombie window so that |
| probing connections can immediately start going upstream at a low rate. |
| |
| * Failing an upstream should be more loosely connected to transactions. Currently there is a one |
| to one relationship where failure is defined as the failure of a specific transaction to connect. |
| There are situations where the number of connection attempts required to mark a failure should be |
| larger than the number of retries for a single transaction. For transiently busy upstreams and |
| low latency requests it can be reasonable to tune the per transaction timeout low with no retries |
| but this then risks marking down upstreams that were merely a bit slow at a given moment. |
| |
| * Parallel DNS requests should be supported. This is for both cross family requests and for split |
| DNS. |
| |
| * It would be nice to be able to do the probing connections to an upstream using synthetic requests |
| instead of burning actual user requests. What would be needed is a handoff from ATS to the probe |
| to indicate a particular upstream is considered down. Active health checks would then be done |
| until the upstream is once again alive, at which point it is handed back to ATS. |
| |
| History |
| ======= |
| |
| This version has several major architectural changes from the previous version. |
| |
| * The data is split into records and info, not handled as a variant of a single data type. This |
| provides a noticeable simplification of the code. |
| |
| * Single and multiple address results are treated identically - a singleton is simply a multiple |
| of size 1. This yields a major simplification of the implementation. |
| |
| * Connections are throttled to upstreams marked down, allowing only a single connection attempt per |
| fail window until a connection succeeds. |
| |
| * Timing information is stored in ``std::chrono`` data types instead of proprietary types. |
| |
| * State information has been promoted to atomics and updates are immediate rather than scheduled. |
| This also means the data in the state machine is a reference to a shared object, not a local copy. |
| The promotion was necessary to coordinate zombie connections to upstreams marked down across transactions. |
| |
| * The "resolve key" is now a separate data object from the HTTP request. This is a subtle but |
| major change. The effect is requests can be routed to different upstreams without changing |
| the request. Parent selection can be greatly simplified as it becomes merely a matter of setting |
| the resolve key, rather than having a completely different code path. |