DESIGN - httpd-flood - Git at Google

 Flood Design Document
 ----------------------------------------

 Flood is an experimental multi-purpose HTTP load-testing tool.
 The design of flood is intended to be modular in the extreme to
 support testing of a wide variety of website applications.
 Flood is also intended to be scalable and accurate.

 ------------------------------
 Functional Requirements:
 ------------------------------

 The following is a list of functions we intend to incorporate into flood:

 1) Timing Metrics
  a) TCP connect() time
  b) Time to send request (time to fill local buffers)
  c) Time until first response chunk was received
     (tests network latency at the application layer (HTTP))
  d) Time to receive a full response
  e) (optional) Local Processing times, such as time to generate the Request
 2) Simple response error detection
 3) Load testing a set of URLs:
  a) Random
  b) Round-Robin
  c) Sequenced (with cookie propogation)
  d) Chaining of the above strategies
 4) Distributed load generation (using rsh/ssh)
 5) Complex response validation
  a) grep/regexp matching
  b) higher-level validation?
 6) Complex request generation
  a) Spider a site to generate a list of URLs
  b) Read a CERN log to generate URL paths from real users
    (could also add weights to the URLs depending on the occurance in the logs)
  c) Read a tcpdump/sniff packet trace to generate URL paths

 ------------------------------
 Functional Specifications:
 ------------------------------

 With the above functional goals in mind, the following components will have
 to be designed and implemented:

 1) Request/Response framework
  - Every hit to the site passes through this sequence of calls.
 2) Timer hooks in the framework as well as a generic timer facility
  - Hooks are placed to properly gauge request generation time, network
    I/O time, round-trip time, network latency as well as bandwidth,
    local processing time, etc...
  - Hook facilities also allow module authors to time various aspects of
    their processes, and aggregate them into the final report.
 3) Simple distributed processing environment:
  - Patterned after distributed models like cvs or rsync that use
    rsh/ssh to invoke a remote process. Statistics from this remote
    process must be regular and should ideally follow a standardized
    reporting format.
 4) Simple Reporting Format (? might be overengineering ?)
  - A simple format used to report statistics between modules, may they
    be remote processes or otherwise out-of-process.
  - I envision this being some really simple XML schema that all processes
    use to report data back to the terminal, where the originating process
    collects this data, processes, and reports in various formats (XML,
    human-readable, etc...)

 (more to come)

 ------------------------------
 Design Specifications:
 ------------------------------

 ------------------------------
 Modular Test Framework

 Flood provides the control logic that will invoke a test procedure and
 collect reported statistics. Tests are modular, and conform to this
 simple invocation interface. In order to simplify the task of a module
 developer, flood also provides an extensive support library.

 ------------------------------
 Test Invocation:

 Tests are invoked with a standard interface.

 [ proposal:

   typedef unsigned int testid_t; /* unique test id across this invocation of flood
                                     (including fork()s and remote invocations */

   apr_status_t invoke_test(config_t * config, /* to be defined */
                            testid_t testid);


 ]

 ------------------------------
 Transaction Reporting:

 All tests will collect and process operating statistics and report these
 back to the controlling process via standard calls. The following statistics
 are required from every module and for each HTTP transaction:

 - TCP Handshake time ( time spent in connect() )
 - Application-layer latency (Time to receive first HTTP bytes)
 - Idle time (total time spent waiting for data)
 - Response time (total time to receive the HTTP Response)
 ? - Total Transaction Time

 [ proposal:

   apr_status_t report_stats(const char * test_name,   /* short name/description */
                             int test_number,          /* unique to this test */
                             apr_status_t test_result, /* success, failure */
                             apr_interval_t tcp,
                             apr_interval_t first_resp,
                             apr_interval_t idle,
                             apr_interval_t total_resp,
                             apr_interval_t total);
 ]

 ------------------------------
 Module Configuration:

 [ mentioned above in "config_t", needs to be elaborated here.

   - what is the config syntax?
   - what is the definition of config_t?
   - what support functions do modules use to retrieve config values?
   - what if a needed key/value is missing?
 ]

 ------------------------------
 Flood Support Library:

 The support library provides many of the functions necessary to write
 a conforming test module.

 [ define this more, or move documentation to another file...
   I hate to suggest it, but maybe preparsed XML (apr supports it, no?)
 ]

 ------------------------------
 Local Parallel Tests:

 Given the above scenarios, one may wish to perform many tests in parallel.
 Flood provides two major way to accomplish this: Threaded and Forked.
 These two methods can actually be performed at the same time, allowing
 for fine-grain control of local resources to maximize test throughput.

 Threaded:

 The process is instructed to make a number of user-space threads,
 each of which will perform a complex event chain as described above. For
 the purpose of this document, each thread that performs an actual test
 is called a "worker".

 Forked:

 The process is instructed to make multiple duplicate copies of itself
 using the fork() system command, each of which can run one test
 or can run a threaded/parallelized group of tests. For the purpose
 of this document, each fork()ed process is called a "child".

 ------------------------------
 Distributed Tests:

 Further parallelization of the tests is possible, given access to a number
 of remote machines. Flood can invoke a remote instance of itself with
 either the "rsh" or "ssh" remote shell commands. These instances of flood
 are special, in that they do not report directly back to the user, but
 instead communicate their statistical information back to the main
 flood process, which aggregates the information and generates a human
 readable report.

 ------------------------------
 Interprocess Communication:

 In such a largely scalable and distributable system, one of the larger
 hurdles will be interprocess communication. One can imagine scenarios
 where a main flood process has spawned many remote instances, each of
 which (including the original) will fork into many children processes,
 each of which will run multiple parallelized testing event chains.
 When this massive invocation has completed, the user will be presented
 with a single unified report. To deal with this communication, a
 protocol must be designed to facilitate IPC.

 [ propose a protocol here, perhaps XML? ]

 ----------------------------------------
 $Id$
	Flood Design Document
	----------------------------------------

	Flood is an experimental multi-purpose HTTP load-testing tool.
	The design of flood is intended to be modular in the extreme to
	support testing of a wide variety of website applications.
	Flood is also intended to be scalable and accurate.

	------------------------------
	Functional Requirements:
	------------------------------

	The following is a list of functions we intend to incorporate into flood:

	1) Timing Metrics
	a) TCP connect() time
	b) Time to send request (time to fill local buffers)
	c) Time until first response chunk was received
	(tests network latency at the application layer (HTTP))
	d) Time to receive a full response
	e) (optional) Local Processing times, such as time to generate the Request
	2) Simple response error detection
	3) Load testing a set of URLs:
	a) Random
	b) Round-Robin
	c) Sequenced (with cookie propogation)
	d) Chaining of the above strategies
	4) Distributed load generation (using rsh/ssh)
	5) Complex response validation
	a) grep/regexp matching
	b) higher-level validation?
	6) Complex request generation
	a) Spider a site to generate a list of URLs
	b) Read a CERN log to generate URL paths from real users
	(could also add weights to the URLs depending on the occurance in the logs)
	c) Read a tcpdump/sniff packet trace to generate URL paths

	------------------------------
	Functional Specifications:
	------------------------------

	With the above functional goals in mind, the following components will have
	to be designed and implemented:

	1) Request/Response framework
	- Every hit to the site passes through this sequence of calls.
	2) Timer hooks in the framework as well as a generic timer facility
	- Hooks are placed to properly gauge request generation time, network
	I/O time, round-trip time, network latency as well as bandwidth,
	local processing time, etc...
	- Hook facilities also allow module authors to time various aspects of
	their processes, and aggregate them into the final report.
	3) Simple distributed processing environment:
	- Patterned after distributed models like cvs or rsync that use
	rsh/ssh to invoke a remote process. Statistics from this remote
	process must be regular and should ideally follow a standardized
	reporting format.
	4) Simple Reporting Format (? might be overengineering ?)
	- A simple format used to report statistics between modules, may they
	be remote processes or otherwise out-of-process.
	- I envision this being some really simple XML schema that all processes
	use to report data back to the terminal, where the originating process
	collects this data, processes, and reports in various formats (XML,
	human-readable, etc...)

	(more to come)

	------------------------------
	Design Specifications:
	------------------------------

	------------------------------
	Modular Test Framework

	Flood provides the control logic that will invoke a test procedure and
	collect reported statistics. Tests are modular, and conform to this
	simple invocation interface. In order to simplify the task of a module
	developer, flood also provides an extensive support library.

	------------------------------
	Test Invocation:

	Tests are invoked with a standard interface.

	[ proposal:

	typedef unsigned int testid_t; /* unique test id across this invocation of flood
	(including fork()s and remote invocations */

	apr_status_t invoke_test(config_t * config, /* to be defined */
	testid_t testid);


	]

	------------------------------
	Transaction Reporting:

	All tests will collect and process operating statistics and report these
	back to the controlling process via standard calls. The following statistics
	are required from every module and for each HTTP transaction:

	- TCP Handshake time ( time spent in connect() )
	- Application-layer latency (Time to receive first HTTP bytes)
	- Idle time (total time spent waiting for data)
	- Response time (total time to receive the HTTP Response)
	? - Total Transaction Time

	[ proposal:

	apr_status_t report_stats(const char * test_name, /* short name/description */
	int test_number, /* unique to this test */
	apr_status_t test_result, /* success, failure */
	apr_interval_t tcp,
	apr_interval_t first_resp,
	apr_interval_t idle,
	apr_interval_t total_resp,
	apr_interval_t total);
	]

	------------------------------
	Module Configuration:

	[ mentioned above in "config_t", needs to be elaborated here.

	- what is the config syntax?
	- what is the definition of config_t?
	- what support functions do modules use to retrieve config values?
	- what if a needed key/value is missing?
	]

	------------------------------
	Flood Support Library:

	The support library provides many of the functions necessary to write
	a conforming test module.

	[ define this more, or move documentation to another file...
	I hate to suggest it, but maybe preparsed XML (apr supports it, no?)
	]

	------------------------------
	Local Parallel Tests:

	Given the above scenarios, one may wish to perform many tests in parallel.
	Flood provides two major way to accomplish this: Threaded and Forked.
	These two methods can actually be performed at the same time, allowing
	for fine-grain control of local resources to maximize test throughput.

	Threaded:

	The process is instructed to make a number of user-space threads,
	each of which will perform a complex event chain as described above. For
	the purpose of this document, each thread that performs an actual test
	is called a "worker".

	Forked:

	The process is instructed to make multiple duplicate copies of itself
	using the fork() system command, each of which can run one test
	or can run a threaded/parallelized group of tests. For the purpose
	of this document, each fork()ed process is called a "child".

	------------------------------
	Distributed Tests:

	Further parallelization of the tests is possible, given access to a number
	of remote machines. Flood can invoke a remote instance of itself with
	either the "rsh" or "ssh" remote shell commands. These instances of flood
	are special, in that they do not report directly back to the user, but
	instead communicate their statistical information back to the main
	flood process, which aggregates the information and generates a human
	readable report.

	------------------------------
	Interprocess Communication:

	In such a largely scalable and distributable system, one of the larger
	hurdles will be interprocess communication. One can imagine scenarios
	where a main flood process has spawned many remote instances, each of
	which (including the original) will fork into many children processes,
	each of which will run multiple parallelized testing event chains.
	When this massive invocation has completed, the user will be presented
	with a single unified report. To deal with this communication, a
	protocol must be designed to facilitate IPC.

	[ propose a protocol here, perhaps XML? ]

	----------------------------------------
	$Id$