Client Steering Diversity

Problem Description

Delivery services of type CLIENT_STEERING are composed of one or more delivery service targets, and each target typically has its own origin. This gives clients redundancy against origin failure: if one origin fails, the client can retry a different target that is served by a different origin. It is also desirable to have redundancy against edge cache failure, but CLIENT_STEERING does not account for this well. Depending on the delivery-service-to-server assignments and initial dispersion settings of the CLIENT_STEERING service's targets, a CLIENT_STEERING result can contain locations that all point to the same edge cache. If that edge cache fails, the client may have no other edge cache in the result to retry, so it will retry the same failed edge cache multiple times. Ideally, there should be a way to configure the CDN so that a CLIENT_STEERING result contains as many unique edge caches as possible, giving clients a diverse set of edge caches to retry in case of failure.

Proposed Change

Add a new TR_PROFILE parameter (client.steering.forced.diversity) that Traffic Ops will include in the CRConfig for Traffic Router to consume. If the parameter is true, Traffic Router will diversify CLIENT_STEERING results by including more unique edge caches. If it is false or unset, Traffic Router will keep its existing behavior, which remains the default.

Traffic Portal Impact

n/a

Traffic Ops Impact

n/a

REST API Impact

n/a

Client Impact

n/a

Data Model / Database Impact

n/a

ORT Impact

n/a

Traffic Monitor Impact

n/a

Traffic Router Impact

Traffic Router will consume a new TR_PROFILE parameter named client.steering.forced.diversity. Similar to other existing Traffic Router config params, it will end up in the top-level "config" section of the CRConfig.json when it is generated by Traffic Ops.

If the parameter value is set to "true", Traffic Router will change its behavior in order to diversify CLIENT_STEERING results. Otherwise, Traffic Router will continue to process CLIENT_STEERING requests using the existing (non-diverse) behavior, which remains the default.
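A minimal sketch of how the flag check might look, assuming the top-level "config" section of CRConfig.json has already been parsed into a map of string values. The class and method names are hypothetical; only the parameter name comes from this blueprint. The sketch treats only the exact value "true" as enabling the feature, so a missing parameter or any other value keeps the default behavior.

```java
import java.util.Map;

public class SteeringDiversityFlagSketch {

    // Parameter name from this blueprint; the surrounding class and method are hypothetical.
    static final String PARAM = "client.steering.forced.diversity";

    /**
     * Returns true only when the parameter is present with the value "true".
     * A missing parameter, or any other value, keeps the default (non-diverse) behavior.
     *
     * @param config the top-level "config" section of CRConfig.json,
     *               assumed here to be already parsed into a map of strings
     */
    static boolean isForcedDiversityEnabled(Map<String, String> config) {
        return "true".equals(config.get(PARAM));
    }

    public static void main(String[] args) {
        System.out.println(isForcedDiversityEnabled(Map.of()));               // false: parameter unset
        System.out.println(isForcedDiversityEnabled(Map.of(PARAM, "false"))); // false
        System.out.println(isForcedDiversityEnabled(Map.of(PARAM, "true")));  // true
    }
}
```

Using an exact string match mirrors the wording above ("set to "true""); a real implementation might instead choose Boolean.parseBoolean, which also accepts "TRUE" or "True".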

Today, Traffic Router routes each target of a CLIENT_STEERING delivery service independently: the route result for one target does not affect the possible route results for the other targets. With client.steering.forced.diversity set to "true", Traffic Router will track the set of edge caches already included in the CLIENT_STEERING result as it routes each target. Once an edge cache has been chosen for one target, it should not be chosen for another target in the same CLIENT_STEERING result. As a result, the final CLIENT_STEERING result will contain as many unique edge caches from the cachegroup as possible.
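The tracking described above can be sketched as follows. This is an illustration under stated assumptions, not Traffic Router's actual code: each target's eligible caches are assumed to be available as a list ordered the way the default (non-diverse) selection would prefer them, and the class, method, and cache names are hypothetical. A single set of already-chosen caches is shared across all targets of one request; each target takes its first candidate not yet in that set, and falls back to its default choice only when every candidate is already taken.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiverseSteeringSketch {

    /**
     * Hypothetical helper: picks one cache per steering target, preferring caches
     * not already chosen for an earlier target in the same CLIENT_STEERING result.
     *
     * @param candidatesByTarget target name -> caches eligible for that target, in the
     *                           order the default (non-diverse) selection would try them
     * @return target name -> chosen cache, in target order
     */
    static Map<String, String> selectDiverse(Map<String, List<String>> candidatesByTarget) {
        Set<String> alreadyChosen = new HashSet<>(); // shared across all targets of one request
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : candidatesByTarget.entrySet()) {
            List<String> candidates = entry.getValue();
            if (candidates.isEmpty()) {
                continue; // nothing routable for this target
            }
            String choice = null;
            for (String cache : candidates) {
                if (!alreadyChosen.contains(cache)) {
                    choice = cache;
                    break;
                }
            }
            if (choice == null) {
                // Every candidate is already in the result: use the default choice
                // (here, simply the first candidate) and accept a duplicate.
                choice = candidates.get(0);
            }
            alreadyChosen.add(choice);
            result.put(entry.getKey(), choice);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("target-a", List.of("edge-1", "edge-2", "edge-3"));
        candidates.put("target-b", List.of("edge-1", "edge-2", "edge-3"));
        candidates.put("target-c", List.of("edge-1", "edge-2", "edge-3"));
        // prints {target-a=edge-1, target-b=edge-2, target-c=edge-3}
        System.out.println(selectDiverse(candidates));
    }
}
```

Without the shared set, all three targets in this example would resolve to edge-1, which is exactly the non-diverse result described in the problem statement.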

For “deep” cachegroups, which might have fewer caches than there are targets to route, Traffic Router will start choosing edge caches from the “regular” cachegroup once every edge cache in the “deep” cachegroup has been chosen. If there are more targets than available edge caches in the cachegroup, Traffic Router will fall back to the old, default behavior, which may include duplicate edge caches, until an edge cache has been chosen for every target.
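The fallback order can be sketched as follows, under simplifying assumptions: the deep cachegroup's caches and the best regular cachegroup's caches are given as plain lists, and the duplicate chosen once every cache is used is simply the first regular cache, standing in for the default selection. All names here are hypothetical. The sketch relies on Set.add returning false when the element is already present, so one call both checks and records a cache.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeepFallbackSketch {

    /**
     * Hypothetical: picks a cache for one steering target. Preference order:
     * an unused deep-cachegroup cache, then an unused cache from the best regular
     * cachegroup, then (once every cache is used) a duplicate from the regular
     * cachegroup as a stand-in for the default behavior.
     * alreadyChosen is updated with any newly chosen cache.
     */
    static String chooseCache(List<String> deepCaches, List<String> regularCaches, Set<String> alreadyChosen) {
        for (String cache : deepCaches) {
            if (alreadyChosen.add(cache)) { // add() returns false if the cache was already chosen
                return cache;
            }
        }
        for (String cache : regularCaches) {
            if (alreadyChosen.add(cache)) {
                return cache;
            }
        }
        // Every cache is already in the result: stay in the regular cachegroup and allow a duplicate.
        return regularCaches.isEmpty() ? null : regularCaches.get(0);
    }

    /** Chooses one cache per target for a single CLIENT_STEERING result. */
    static List<String> chooseForTargets(int targetCount, List<String> deepCaches, List<String> regularCaches) {
        Set<String> alreadyChosen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < targetCount; i++) {
            result.add(chooseCache(deepCaches, regularCaches, alreadyChosen));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deep = List.of("deep-1");
        List<String> regular = List.of("edge-1", "edge-2");
        // 4 targets but only 3 caches: prints [deep-1, edge-1, edge-2, edge-1]
        System.out.println(chooseForTargets(4, deep, regular));
    }
}
```

When the client does not map to a deep cachegroup, passing an empty deep list reduces this to the regular-cachegroup behavior.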

Traffic Stats Impact

n/a

Traffic Vault Impact

n/a

Documentation Impact

The new client.steering.forced.diversity parameter will be documented in Traffic Router's profile parameter configuration section.

Testing Impact

This can be tested with Traffic Router's integration test framework. Given a CLIENT_STEERING delivery service whose result would normally consist entirely of duplicate edge caches, enable the new parameter and verify that the result now contains distinct edge caches as expected.
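One way such a test might express "diverse as expected" is sketched below. It assumes the cache hostnames have already been extracted from the CLIENT_STEERING response, and that every target is assigned to the same set of available caches, so the best achievable number of unique caches is the smaller of the target count and the cache count. The helper name is hypothetical and is not part of the existing test framework.

```java
import java.util.HashSet;
import java.util.List;

public class DiversityCheckSketch {

    /**
     * Hypothetical test helper: true if the result contains as many unique caches
     * as possible, assuming all targets share the same availableCacheCount caches.
     *
     * @param chosenCaches one cache hostname per target, taken from the steering response
     */
    static boolean isMaximallyDiverse(List<String> chosenCaches, int availableCacheCount) {
        int unique = new HashSet<>(chosenCaches).size();
        return unique == Math.min(chosenCaches.size(), availableCacheCount);
    }

    public static void main(String[] args) {
        List<String> withoutFlag = List.of("edge-1", "edge-1", "edge-1");
        List<String> withFlag = List.of("edge-1", "edge-2", "edge-3");
        System.out.println(isMaximallyDiverse(withoutFlag, 3)); // false
        System.out.println(isMaximallyDiverse(withFlag, 3));    // true
    }
}
```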

Performance Impact

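Enabling this feature is not expected to have a measurable impact on Traffic Router performance; the only additional work is tracking which edge caches have already been chosen while routing the targets of a single CLIENT_STEERING request.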

Because CLIENT_STEERING results become more diverse, clients may see some added “first request” latency from “cold” edge caches that had not previously been receiving the same requests.

Security Impact

n/a

Upgrade Impact

This change only impacts Traffic Router, and by default Traffic Router will continue following the existing behavior of non-diverse CLIENT_STEERING (with the feature disabled/not configured). Only once the feature is enabled via the profile parameter will TR change its behavior to make CLIENT_STEERING results more diverse.

The recommended upgrade procedure would be to enable this feature via the profile parameter only after all TRs have been upgraded, so that all TRs can switch over to the new behavior at the same time.

Operations Impact

Operators would need to know about the new TR_PROFILE parameter to enable this feature, but this feature can also be safely ignored if the default (non-diverse) CLIENT_STEERING is sufficient.

Developer Impact

This change will make the logic for finding available caches for a CLIENT_STEERING request slightly more complex, because the code will have to check the new behavior flag and process the request accordingly. If this behavior later becomes the new default, the flag check could be removed and all requests processed with diversity enabled.

Alternatives

One alternative would be to make this a per-delivery-service setting by adding a new column to the deliveryservice table, but we did not think that level of granularity was necessary and settled on a per-CDN level of granularity by allowing the feature to be enabled via a TR_PROFILE parameter.

There was also the possibility of changing the default behavior of TR altogether instead of enabling it via a TR_PROFILE parameter, but we thought it would be desirable to be able to upgrade/deploy TRs without changing the behavior at the same time.

Another design choice to note was how to handle the case where there are more CLIENT_STEERING targets than available caches to choose from. If all the caches in a deep cachegroup have already been chosen for the same request, caches from the best regular cachegroup will be selected for the request until all targets have a selected cache. If all the caches in a regular cachegroup have already been chosen for the same request, TR will continue to select caches from that same cachegroup (as opposed to the next closest cachegroup or fallback) until all targets have a selected cache.

Dependencies

n/a

References

n/a