| FEATURE INTRODUCTION | 
 | ==================== | 
 |  | 
 | Feature Name: | 
 | ------------- | 
 | 	Congestion Control | 
 |  | 
 | Synopsis of Feature:  | 
 | --------------------  | 
 |  | 
 | The main purpose of this congestion control feature is to keep track | 
 | of which hosts are congested so that TS will not forward requests to | 
 | those congested hosts; instead, TS will send back to clients a | 
 | Retry-After response to tell them to retry congested hosts at a later | 
 | time. | 
 |  | 
 | CASE 1.  Connection Failures: | 
 | ------------------------------ | 
 | (1) For each request to a live (non-congested) server, TS will try at most m | 
 |     times to connect to the server, and the timeout is n seconds for each try. | 
 |     If TS does not succeed with m tries, then one connection failure is counted | 
 |     towards the server. | 
 |  | 
 |     Note that if a client aborts a request before a timeout occurs, it does not | 
 |     count as a connection failure. | 
 |  | 
 | (2) A server is marked congested if there are more than M connection failures | 
 |     within N seconds. | 
 |  | 
 | (3) If a server is marked congested, then TS will not forward requests to it | 
 |     until the Proxy Retry After Time (PRAT), which is the current time at the | 
 |     moment of marking + t. | 
 |  | 
 | (4) For a request to a congested server before the server's PRAT time, TS sends | 
 |     a Retry-After response to tell the client to retry the request after | 
 |     Client Retry After Time (CRAT) (= PRAT - current time + T + a random  | 
 |     integer from 0 to alpha). | 
 |  | 
 | (5) For a request to a congested server after the server's PRAT time, TS will | 
 |     try at most m' times to connect to the server, and the timeout is n'  | 
 |     seconds for each try. | 
 |  | 
 | (6) A congested server will stay dead if TS cannot make a successful  | 
 |     connection; otherwise, the server becomes live again. | 
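 |  | 
 | The timing rules in (3) and (4) can be sketched as follows. This is only an | 
 | illustration of the arithmetic, not TS code: the function names are | 
 | hypothetical, and times are assumed to be integer seconds. | 

```python
import random


def proxy_retry_after_time(marked_at, t):
    """PRAT: the time until which TS will not contact the congested server."""
    return marked_at + t


def client_retry_after(prat, now, T, alpha, rng=random):
    """CRAT in seconds: PRAT - now + T + a random integer from 0 to alpha."""
    return prat - now + T + rng.randint(0, alpha)


# Example with the suggested defaults t=10, T=300, alpha=30.
prat = proxy_retry_after_time(marked_at=1000, t=10)
crat = client_retry_after(prat, now=1004, T=300, alpha=30)
print(prat, crat)
```

 | The random term spreads client retries over alpha seconds, so that clients | 
 | told to retry at about the same time do not all return at once. | 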
 |  | 
 | CASE 2: Maximum Number of Connections | 
 | ------------------------------------- | 
 | TS will temporarily mark a server as congested when the number of connections | 
 | to the server reaches "max_connection". If a new client request then comes in | 
 | and needs a new connection to the server, the client gets a 503 Retry-After | 
 | response. There is no PRAT for servers that have reached "max_connection". | 
 |  | 
 | Here a server can be identified by IP address (per_ip) or by host name  | 
 | (per_host). For example, www.inktomi.com has two IP addresses,  | 
 | 209.131.63.206 and 209.131.63.207. If per_ip is used, then each IP address | 
 | has its own number of connection failures, and each IP address will be marked | 
 | congested or not by itself. That is, if 206 is marked congested (but 207 is  | 
 | not), requests can still be forwarded to 207. On the other hand, if per_host is | 
 | used, then one connection failure to either 206 or 207 will be counted to the | 
 | number of connection failures of host www.inktomi.com. If the host  | 
 | www.inktomi.com is marked congested, then essentially both 206 and 207 are | 
 | marked congested.  | 
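 |  | 
 | The difference between the two schemes comes down to which key a connection | 
 | failure is counted under. A minimal sketch, assuming a simple in-memory | 
 | counter (the helper names are hypothetical and not part of TS): | 

```python
from collections import Counter


def failure_key(scheme, hostname, ip):
    """Pick the key under which a connection failure is counted."""
    if scheme == "per_ip":
        return ("ip", ip)
    if scheme == "per_host":
        return ("host", hostname)
    raise ValueError(f"unknown congestion_scheme: {scheme!r}")


def count_failures(scheme, failures):
    """Count (hostname, ip) connection failures under the given scheme."""
    counts = Counter()
    for hostname, ip in failures:
        counts[failure_key(scheme, hostname, ip)] += 1
    return counts


failures = [
    ("www.inktomi.com", "209.131.63.206"),
    ("www.inktomi.com", "209.131.63.206"),
    ("www.inktomi.com", "209.131.63.207"),
]
per_ip = count_failures("per_ip", failures)
per_host = count_failures("per_host", failures)
print(per_ip)
print(per_host)
```

 | With per_ip the two addresses accumulate failures separately; with per_host | 
 | all three failures are charged to www.inktomi.com. | 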
 |  | 
 | A prefix can also be used as a secondary specifier to narrow the scope of | 
 | congestion control to a sub-host (service) area. For example, | 
 |    dest_host=www.inktomi.com prefix=/cgi/search.exe | 
 | This rule can detect that the cgi program, or a database it depends on, has | 
 | stopped. Each rule has an independent counter: errors on requests to | 
 | www.inktomi.com/index.html are not added to the counter of this rule. A rule | 
 | with prefix=/cgi/ means that all requests for objects under /cgi/ share one | 
 | common counter with the specified parameters. It does not mean that each URI | 
 | under the directory has its own counter. | 
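 |  | 
 | The one-counter-per-rule behavior can be illustrated with a short sketch. It | 
 | assumes that the first matching rule wins and that a prefix is compared as a | 
 | literal string prefix of the URL path; the rule list and helper below are | 
 | hypothetical. | 

```python
from collections import Counter

# Two hypothetical rules for the same host: one scoped to /cgi/, one catch-all.
RULES = [
    {"dest_host": "www.inktomi.com", "prefix": "/cgi/"},
    {"dest_host": "www.inktomi.com"},
]


def first_matching_rule(rules, host, path):
    """Return the index of the first rule matching host and path, or None."""
    for number, rule in enumerate(rules):
        if rule["dest_host"] != host:
            continue
        prefix = rule.get("prefix")
        if prefix is None or path.startswith(prefix):
            return number
    return None


# One failure counter per rule, not per URI.
failure_counters = Counter()
failed_requests = [
    ("www.inktomi.com", "/cgi/search.exe"),
    ("www.inktomi.com", "/cgi/other.exe"),
    ("www.inktomi.com", "/index.html"),
]
for host, path in failed_requests:
    number = first_matching_rule(RULES, host, path)
    if number is not None:
        failure_counters[number] += 1
print(dict(failure_counters))
```

 | Both /cgi/ requests increment the counter of the first rule, while the | 
 | failure on /index.html is counted only by the second rule. | 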
 |  | 
 | The TS administrator can specify a customizable error_page that returns the | 
 | reason for the congestion (for example: "The site is under maintenance") with | 
 | the '503' response. The error_page can also include the URL of the congested | 
 | page and the Retry-After time. | 
 |  | 
 | ENGINEERING DESCRIPTION: | 
 | ======================== | 
 |  | 
 | Risk points of feature: | 
 | ----------------------- | 
 |  | 
 | The set of "origin server connect attempts" configuration variables in  | 
 | records.config will be affected by this feature. See the following  | 
 | "Requirement on Server Management" and the above "Synopsis of feature" | 
 | sections for more information. | 
 |  | 
 | Also, there are some known problematic/unexpected behaviors of this feature. | 
 | See the following "Problematic behavior" section. | 
 |  | 
 | Effect on SDK/API: | 
 | ------------------ | 
 | 	None. | 
 |  | 
 | Management Implications: | 
 | ----------------------- | 
 |  | 
 | <Record Changes>  | 
 | There are two new config variables: one to enable/disable/test congestion | 
 | control, and one to name the congestion control config file. | 
 | 	proxy.config.http.congestion_control.enabled INT 0|1|2 | 
 | 	proxy.config.http.congestion_control.filename STRING congestion.config | 
 |  | 
 |  | 
 | <Statistics Changes> | 
 | 1.  Number of congestions because of connection failures | 
 |     stat name: proxy.process.congestion.congested_on_conn_failures | 
 | 2.  Number of congestions because of max_connection reached  | 
 |     stat name: proxy.process.congestion.congested_on_max_connection | 
 |  | 
 |  | 
 | <Config File> | 
 | A new .config file "congestion.config" is used to specify the parameters for  | 
 | different servers. | 
 |  | 
 | Each rule will have one primary key to identify the servers, the primaries  | 
 | can be  | 
 | 	dest_host= | 
 | 	dest_domain= | 
 | 	dest_ip= | 
 | 	host_regex= | 
 |  | 
 | Each rule can also have secondary keys, secondary keys include | 
 | 	prefix=         // for different directory / service | 
 | 	port=           // for different server ports | 
 |  | 
 | The tag=value pairs are used to specify the rules: | 
 |  | 
 | 	max_connection_failures=<integer>	//  M | 
 | 	fail_window=<integer>			//  N | 
 | 	proxy_retry_interval=<integer>		//  t | 
 | 	client_wait_interval=<integer>		//  T | 
 | 	wait_interval_alpha=<integer>		//alpha | 
 | 	live_os_conn_timeout=<integer>		//  n | 
 | 	live_os_conn_retries=<integer>		//  m | 
 | 	dead_os_conn_timeout=<integer>		//  n' | 
 | 	dead_os_conn_retries=<integer>		//  m' | 
 | 	max_connection=<integer>		// -1 means unlimited | 
 | 	error_page=<page uri>           | 
 | 	congestion_scheme=per_ip|per_host | 
 |  | 
 | The suggested default values are as follows: | 
 | 	max_connection_failures=5 | 
 | 	fail_window=120 | 
 | 	proxy_retry_interval=10 | 
 | 	client_wait_interval=300 | 
 | 	wait_interval_alpha=30 | 
 | 	live_os_conn_timeout=60 | 
 | 	live_os_conn_retries=2 | 
 | 	dead_os_conn_timeout=15 | 
 | 	dead_os_conn_retries=1 | 
 | 	max_connection=-1 | 
 | 	error_page="congestion#retryAfter" | 
 | 	congestion_scheme="per_ip" | 
 |  | 
 | The above tag values will be used as default if the tag is not specified  | 
 | in the rule. | 
 |  | 
 | The default values can be overridden by setting the records.config variables | 
 | CONFIG proxy.config.http.congestion_control.default.<tag> <INT|STRING> <value> | 
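 |  | 
 | How the defaults combine with a rule can be sketched as below. The exact file | 
 | syntax is an assumption here (one rule per line, space-separated tag=value | 
 | pairs, as in the prefix example earlier); parse_rule is a hypothetical helper | 
 | and not the TS parser. The defaults argument stands in for the suggested | 
 | defaults after any records.config overrides have been applied. | 

```python
import shlex

# Suggested defaults from the list above.
DEFAULTS = {
    "max_connection_failures": 5,
    "fail_window": 120,
    "proxy_retry_interval": 10,
    "client_wait_interval": 300,
    "wait_interval_alpha": 30,
    "live_os_conn_timeout": 60,
    "live_os_conn_retries": 2,
    "dead_os_conn_timeout": 15,
    "dead_os_conn_retries": 1,
    "max_connection": -1,
    "error_page": "congestion#retryAfter",
    "congestion_scheme": "per_ip",
}


def parse_rule(line, defaults=DEFAULTS):
    """Parse one rule line; tags missing from the line keep their defaults."""
    rule = dict(defaults)
    for token in shlex.split(line):
        tag, sep, value = token.partition("=")
        if not sep or not tag:
            raise ValueError(f"expected tag=value, got {token!r}")
        # Tags whose default is an integer are parsed as integers.
        if isinstance(defaults.get(tag), int):
            value = int(value)
        rule[tag] = value
    return rule


rule = parse_rule(
    "dest_host=www.inktomi.com prefix=/cgi/search.exe "
    "max_connection_failures=3 fail_window=240"
)
print(rule["max_connection_failures"], rule["proxy_retry_interval"])
```

 | Only max_connection_failures and fail_window are set by the rule line; every | 
 | other tag falls back to its default. | 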
 |  | 
 | The following "origin server connect attempts" configuration variables may | 
 | be affected by this congestion control feature: | 
 | 	proxy.config.http.connect_attempts_max_retries | 
 | 	proxy.config.http.connect_attempts_max_retries_dead_server | 
 | 	proxy.config.http.connect_attempts_rr_retries | 
 | 	proxy.config.http.connect_attempts_timeout | 
 | 	proxy.config.http.down_server.cache_time | 
 | 	proxy.config.http.down_server.abort_threshold | 
 |  | 
 | For a request to a server that does not have an applicable rule in  | 
 | congestion.config, the values for these "origin server connect attempts"  | 
 | variables are used by TS. Otherwise, the corresponding values specified  | 
 | in congestion.config will override them. | 
 |  | 
 |  | 
 | <Alarm Changes> | 
 | Add two new alarm types to Traffic Manager: | 
 | 1) MGMT_SIGNAL_HTTP_CONGESTED_SERVER   | 
 | 	used to indicate a congested server | 
 | 2) MGMT_SIGNAL_HTTP_ALLEVIATED_SERVER  | 
 | 	used to indicate a congested server is no longer congested | 
 | These alarms are not processed like the other Traffic Manager alarms. | 
 | Whenever these alarms are signalled (even if they are repeat alarms) | 
 | *only* an SNMP trap will be sent. Note that this means users can | 
 | potentially be flooded with SNMP traps if a congested server keeps | 
 | signalling an alarm. | 
 |  | 
 | <Web UI Enhancement> | 
 | For configuration purposes, we will add a new "Congestion Control" tab to the  | 
 | "Configure -> Networking -> Connection Management" section of the web UI.   | 
 | Within this tab users can: | 
 | 1. enable/disable the congestion control feature | 
 | 2. edit the congestion.config file (which will be displayed in an HTML text box) | 
 |  | 
 |  | 
 | <Command-Line Interface Enhancement> | 
 | Use the traffic_line command-line interface to retrieve the congestion  | 
 | statistics and monitoring information. | 
 | 1. "traffic_line -r <statistic_name>"  | 
 |    Returns the value of the statistic specified  | 
 | 2. "traffic_line -q"  | 
 |    Returns a list of currently congested sites (one site per line);  | 
 |    for each congested site, displays the information in the following format:  | 
 |    '<time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#>' | 
 | 	- time : congestion detected time  | 
 | 		 in seconds since 00:00:00 UTC, January 1, 1970. | 
 | 	- rule #  | 
 | 	- hostname | 
 | 	- ip address | 
 | 	- scheme: per_ip or per_host | 
 | 	- prefix (if none, leave blank) | 
 | 	- congestion reason: M or F | 
 |           M - congestion caused by exceeding max connections,  | 
 | 	  F - congestion caused by OS response timeout/failure | 
 |         - F#    number of congested requests because of F | 
 |         - M#    number of congested requests because of M | 
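 |  | 
 | A script that consumes this output could split each line on '|'. The sketch | 
 | below assumes the nine-field format shown above with no surrounding quotes | 
 | and an integer rule number; the sample line is made up for illustration and | 
 | is not real traffic_line output. | 

```python
from typing import NamedTuple


class CongestedSite(NamedTuple):
    time: int        # congestion detected time, seconds since the epoch
    rule: int        # rule number (assumed to be an integer)
    hostname: str
    ip_address: str
    scheme: str      # per_ip or per_host
    prefix: str      # empty string if the rule has no prefix
    reason: str      # M (max connections) or F (timeout/failure)
    f_count: int     # F#
    m_count: int     # M#


def parse_short_line(line):
    """Parse one '|'-separated line of the short format."""
    fields = line.strip().split("|")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    time, rule, hostname, ip, scheme, prefix, reason, f_count, m_count = fields
    if scheme not in ("per_ip", "per_host"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    if reason not in ("M", "F"):
        raise ValueError(f"unknown congestion reason: {reason!r}")
    return CongestedSite(int(time), int(rule), hostname, ip, scheme,
                         prefix, reason, int(f_count), int(m_count))


# Hypothetical sample line.
sample = "1018000000|2|www.inktomi.com|209.131.63.206|per_ip||F|12|0"
site = parse_short_line(sample)
print(site.hostname, site.reason, site.f_count)
```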
 |  | 
 | NOTE: In order to use "traffic_line -q", raf must be enabled and have a  | 
 |       raf port specified. These are the default values used. | 
 | 	CONFIG proxy.config.raf.enabled INT 1 | 
 | 	CONFIG proxy.config.raf.port INT 9000 | 
 |       If the raf port conflicts with another port, then change it by: | 
 |            traffic_line -s "proxy.config.raf.port" <new-port>  | 
 |  | 
 |  | 
 | Engineering description of feature: | 
 | ----------------------------------- | 
 | Data structure and algorithm: | 
 |  | 
 | Congestion Control Database (in memory and disk): | 
 |  | 
 | Using Multicache Implementation | 
 | struct CongestEntry { | 
 | 	unsigned int ip; | 
 | 	int hostname_offset; | 
 | 	int prefix_offset; | 
 | 	int last_failure;                   // time of the last connection failure | 
 | 	unsigned short fail_history[17];    // 17 entries of 16 bits each (see below) | 
 | 	unsigned int congestion_scheme;     // per_ip or per_host | 
 | 	unsigned int congested;             // 0 or 1 | 
 | 	short max_connection; | 
 | 	short num_connection; | 
 | 	short max_connection_failures; | 
 | 	unsigned long num_congested;        // reserved for per server stat. | 
 | }; | 
 |  | 
 | For each server, TS uses an array of 17 entries to record the number of | 
 | connection failures. Each entry is 16 bits long and records the number of | 
 | connection failures for 1/16 of the fail_window. For example, for a | 
 | fail_window=240 seconds, the granularity of recording is 240/16=15 seconds. | 
 | That is, the first entry records the number of connection failures from time | 
 | t+0 to t+15, the second entry records the number of connection failures | 
 | from time t+15 to t+30, and so on. TS will mark the server as congested if | 
 | the sum of the 17 entries is greater than the specified max_connection_failures. | 
 | Because 17 entries span 17*15=255 seconds, of which the newest entry is only | 
 | partly filled, this algorithm counts the failures of the past 240 to 255 | 
 | seconds rather than exactly 240. For higher accuracy, we need to increase the | 
 | number of entries. | 
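 |  | 
 | The bucketed window can be sketched as a small ring buffer. This is an | 
 | illustration under stated assumptions rather than the TS implementation: the | 
 | class and method names are hypothetical, times are integer seconds, and all | 
 | 17 entries are summed. | 

```python
class FailHistory:
    """Sliding window of connection-failure counts for one server."""

    BUCKETS = 17        # entries kept per server, as in the text
    MAX_COUNT = 0xFFFF  # each entry is a 16-bit counter

    def __init__(self, fail_window, max_connection_failures):
        self.bucket_len = max(1, fail_window // 16)
        self.max_failures = max_connection_failures
        self.counts = [0] * self.BUCKETS
        self.newest = None  # absolute index of the newest bucket

    def _advance(self, now):
        """Move to the bucket for 'now', clearing buckets that expired."""
        idx = now // self.bucket_len
        if self.newest is None:
            self.newest = idx
        elif idx > self.newest:
            steps = min(idx - self.newest, self.BUCKETS)
            for step in range(1, steps + 1):
                self.counts[(self.newest + step) % self.BUCKETS] = 0
            self.newest = idx

    def record_failure(self, now):
        self._advance(now)
        slot = self.newest % self.BUCKETS
        self.counts[slot] = min(self.counts[slot] + 1, self.MAX_COUNT)

    def is_congested(self, now):
        """Congested if more than max_connection_failures are in the window."""
        self._advance(now)
        return sum(self.counts) > self.max_failures


history = FailHistory(fail_window=240, max_connection_failures=5)
for second in range(5):
    history.record_failure(now=second)
after_five = history.is_congested(now=5)   # 5 failures: not more than M
history.record_failure(now=5)
after_six = history.is_congested(now=5)    # 6 failures: more than M
print(after_five, after_six)
```

 | In this sketch the six failures are still counted 250 seconds later, which is | 
 | longer than the 240-second fail_window, and are dropped only once their | 
 | entry is reused. This is the 240 to 255 second inaccuracy noted above. | 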
 |  | 
 |  | 
 | Operation: | 
 | ---------- | 
 |  | 
 | The following is an overview of the operation of this congestion control | 
 | feature in TS. After parsing a valid request, TS calls its HostDB module | 
 | to get the HostDBInfo record for the host name. If the host name has more  | 
 | than one IP address, then TS selects one of them as usual.  | 
 | Then, TS uses the hostname, the selected IP address, and the request URL to | 
 | look up the first matching rule in congestion.config. | 
 |  | 
 | TS then looks up the CongestionDB and handles one of four cases: | 
 |  | 
 | case 1: "congested" is true and | 
 | 	(current time <= "last_failure" + "proxy_retry_interval"): | 
 | 	TS sends to the client a Retry-After response. | 
 | case 2: "congested" is true and | 
 | 	(current time > "last_failure" + "proxy_retry_interval"): | 
 | 	TS makes a connection to this congested server. | 
 | case 3: "congested" is false and | 
 | 	(current connections >= "max_connection"): | 
 | 	TS sends to the client a Retry-After response. | 
 | case 4: "congested" is false and | 
 | 	(current connections < "max_connection"): | 
 | 	TS makes a connection to this non-congested server. | 
 |  | 
 | Various timeouts and max_retry numbers are set up according to the matched  | 
 | rule in congestion.config. | 
 |  | 
 | If a connection failure is detected, TS updates the CongestionDB record. If it | 
 | is case 2, and the connection succeeds, we need to mark the server live again. | 
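 |  | 
 | The four cases can be written as one small decision function. This is a | 
 | sketch only: the dictionaries stand in for the CongestionDB record and the | 
 | matched rule, the return strings are hypothetical labels, and the retry | 
 | window is taken to be the proxy_retry_interval tag (t). A max_connection of | 
 | -1 is treated as unlimited, as in the config file description. | 

```python
def decide(entry, rule, now):
    """Return the action TS takes for a request, following cases 1 to 4."""
    if entry["congested"]:
        if now <= entry["last_failure"] + rule["proxy_retry_interval"]:
            return "retry-after"            # case 1
        return "connect-to-congested"       # case 2
    limit = rule["max_connection"]
    if limit >= 0 and entry["num_connection"] >= limit:
        return "retry-after"                # case 3
    return "connect"                        # case 4


rule = {"proxy_retry_interval": 10, "max_connection": 100}
congested = {"congested": True, "last_failure": 1000, "num_connection": 0}
live = {"congested": False, "last_failure": 0, "num_connection": 100}

print(decide(congested, rule, now=1005))  # case 1
print(decide(congested, rule, now=1011))  # case 2
print(decide(live, rule, now=1011))       # case 3
```

 | If the connection attempted in case 2 succeeds, the caller would clear the | 
 | "congested" flag so that later requests fall into case 3 or case 4. | 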
 |  | 
 |  | 
 | Problematic Behavior: | 
 | --------------------- | 
 | For a host with multiple nicknames, we might miscalculate the number | 
 | of failures. | 
 |  | 
 | For example,  | 
 | 	www.berkeley.edu is a nickname for amber.berkeley.edu | 
 | 	amber.berkeley.edu has address 128.32.25.12.  | 
 | In this simple case, if you specify only www.berkeley.edu in the rule and use | 
 | the per_host scheme, failures of requests that use amber.berkeley.edu as the | 
 | hostname will not be counted. | 
 |  | 
 | Another problem is with the granularity of connection failure recording. In | 
 | some cases, TS will mark a server congested even though it is not congested | 
 | according to the rules in congestion.config. | 
 |  | 
 | TS is not able to distinguish a busy origin server from TS itself being | 
 | busy. | 
 |  | 
 | Implementation Limits: | 
 | --------------------- | 
 | 	1. granularity of connection failure records (17 entries). | 
 | 	2. maximum number of failures that can be recorded per entry | 
 | 	   (16-bit counter: (1<<16) - 1 = 65535). | 
 | 	3. potential performance hit because of updating congestion info | 
 | 	   (need to take locks to update the info) | 
 |  | 
 | Modules need to be touched: | 
 | --------------------------- | 
 | ControlMatcher | 
 | 	one new primary field is added ---- host_regex | 
 |  | 
 | HttpSM / HttpTransact | 
 | 	for apparent reasons | 
 |  | 
 | KNOWN PROBLEM | 
 | ============= | 
 | (1) The config filename for congestion control must be congestion.config; | 
 |     this is a known bug in TS. | 
 | (2) The test mode | 
 |     proxy.config.http.congestion_control.enabled INT 2 | 
 |     is not implemented, due to limited coding time. | 
 |  | 
 |  | 
 | TEST DESCRIPTIONS | 
 | ================= | 
 |  | 
 | Test description: | 
 | ----------------- | 
 | (1) Enable congestion control and specify rules for a few servers in | 
 |     congestion.config. Then run (existing) tests to verify that the | 
 |     "origin server connect attempts" configuration variables are still | 
 |     working for servers that are not specified in congestion.config. | 
 | (2) Enable congestion control and in congestion.config, specify a rule | 
 |     for a server that can be controlled (up/down) and has only one IP | 
 |     address. Verify that TS follows the rule for the server by sending | 
 |     requests for the server through TS and controlling whether the server | 
 |     is up or down. | 
 | (3) Specify a prefix rule and a dest_host rule on the same host, and | 
 |     control the service specified by the prefix, check if prefix rule is | 
 |     in effect. | 
 | (4) Repeat test (2) with a server with more than one IP address. Both  | 
 |     congestion schemes (per_ip and per_host) should be tested. | 
 |  | 
 | (5) Test all possible combinations of rules. Check the error_page and  | 
 |     error logs. | 
 |  | 
 | (6) Kill one of the servers that TS is connected to, so that the server | 
 |     becomes "congested". Ensure the congested alarm is signaled (check the | 
 |     WebUI) and an SNMP trap is sent. Restart the dead server, so that the | 
 |     server is alive again. Ensure the alleviated alarm is signaled and an | 
 |     SNMP trap is sent. | 
 |  | 
 | Test tool: | 
 | ---------- | 
 | 	syntest could be a good candidate for functional tests. | 
 | 	For load tests, try jtest combined with syntest. | 
 |  | 
 | Test configurations: | 
 | -------------------- | 
 |  | 
 |  | 
 | Change Log: | 
 | =========== | 
 |  | 
 | Removed Feature(s): | 
 | ***  Saving congestion control information to the disk. | 
 |  | 
 | New/Modified Feature(s): | 
 |  | 
 | *** traffic_line -q output format (short and long): | 
 |  | 
 | short format | 
 |    <time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#> | 
 | long format | 
 |    <time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#>|<local/GMT time>|<key>|<last_failure>|<num_fail_events>|<internal_ref count>|<num_connections> | 
 |  | 
 | 	- time : congestion detected time | 
 | 		 in seconds since 00:00:00 UTC, January 1, 1970. | 
 | 	- rule #  | 
 | 	- hostname | 
 | 	- ip address | 
 | 	- scheme: per_ip or per_host | 
 | 	- prefix (if none, leave blank) | 
 | 	- congestion reason: M or F | 
 |           M - congestion caused by exceeding max connections,  | 
 | 	  F - congestion caused by OS response timeout/failure | 
 |         - F#    number of congested requests because of F | 
 |         - M#    number of congested requests because of M | 
 | 	- key: the internal key in congestion control table | 
 | 	- local/GMT time:  YYYY/MM/DD hh:mm:ss  | 
 | 		CONFIG proxy.config.http.congestion_control.localtime INT 1 //localtime format | 
 | 		CONFIG proxy.config.http.congestion_control.localtime INT 0 //GMT format | 
 |  | 
 |  | 
 | *** telnet localhost <Raf port> | 
 | 0 congest list   | 
 | 		-- list congested servers at the moment (short format) | 
 | 0 congest list long [0-4] | 
 | 		-- list congested servers at the moment (long format) | 
 | 0 query deadhosts | 
 | 		-- list congested servers at the moment | 
 |  | 
 | 0 congest remove key=XXXXXXXXXXXXXX {key=XXXXXXXXXXXXXX}  | 
 | 		-- remove the entries whose keys are listed | 
 | 		   (manually reactivates the congested server) | 
 |  | 
 | 0 congest remove host=<hostname>[/prefix] | 
 |  | 
 | 0 congest remove ip=<xxx.xxx.xxx.xxx>[/prefix] | 
 |  | 
 | 0 congest remove all | 
 | 		-- remove all entries in the congestion control internal table | 
 |  | 
 |  | 
 |  | 
 |  | 
 |  | 
 |  |