testdata/bin/README-BENCHMARK-TEST-GENERATION

 This is an overview of the benchmark workflow and the scripts involved. The workflow is as follows:

 1) Create base benchmark schema and load data into these tables.
 2) Create extended benchmark schema (different file formats, compression, etc.)
    and load data by copying from tables created in 1) using INSERT statements.
 3) Run the benchmarks using $IMPALA_HOME/bin/run_benchmark.py

 The *.sql scripts that create the extended benchmark schema and load its data are generated
 by the generate_benchmark_statements.rb script. This script reads files that describe which
 combinations of data set, file format, and compression algorithm to use, and outputs the
 query files.
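
 As a rough illustration of this step, the sketch below turns vector lines into INSERT
 statements that copy data from a base table. It is written in Python for brevity and is
 not the logic of generate_benchmark_statements.rb. The vector line format
 ("key: value, key: value"), the table naming scheme, the base table name, and the SQL
 template are all assumptions made for the example.

```python
from typing import Dict, List


def parse_vector_line(line: str) -> Dict[str, str]:
    """Parse 'key: value, key: value' into a dict (hypothetical vector format)."""
    pairs = (item.split(":", 1) for item in line.split(","))
    return {key.strip(): value.strip() for key, value in pairs}


def table_name(base_table: str, vector: Dict[str, str]) -> str:
    """Build a target table name; this naming scheme is an assumption."""
    return "{}_{}_{}".format(base_table, vector["file_format"], vector["compression"])


def insert_statement(base_table: str, vector: Dict[str, str]) -> str:
    """Emit one INSERT that copies every row from the base table."""
    target = table_name(base_table, vector)
    return "INSERT OVERWRITE TABLE {} SELECT * FROM {};".format(target, base_table)


def generate_statements(base_table: str, vector_lines: List[str]) -> List[str]:
    """Produce one statement per non-empty, non-comment vector line."""
    statements = []
    for line in vector_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        statements.append(insert_statement(base_table, parse_vector_line(line)))
    return statements


if __name__ == "__main__":
    example_lines = [
        "# hypothetical vector file contents",
        "file_format: text, compression: none",
        "file_format: rcfile, compression: snappy",
    ]
    for statement in generate_statements("example_base", example_lines):
        print(statement)
```

 Keeping the parsing and the statement template in separate functions means a new dimension
 only requires changes to the template, not to the file handling.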

 The input to the generate_benchmark_statements.rb script is generated by the
 generate_test_vectors.rb script. This script reads the dimension values defined in
 benchmark_dimensions.yaml (for example, file format = rcfile, sequence file, text) and
 outputs a set of test vectors: both an exhaustive set and a reduced set of combinations.
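
 The difference between the two sets can be sketched as follows. This Python example is an
 illustration only. The exhaustive set is the full cross product of the dimension values.
 The reduction shown here, where each value of each dimension appears at least once, is an
 assumed strategy, since this README does not say how generate_test_vectors.rb reduces the
 set. The compression values are also hypothetical.

```python
from itertools import product
from typing import Dict, List

# File formats come from the README; compression values are hypothetical.
DIMENSIONS = {
    "file_format": ["text", "rcfile", "sequence_file"],
    "compression": ["none", "snappy"],
}


def exhaustive_vectors(dimensions: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return every combination of dimension values (the full cross product)."""
    names = list(dimensions)
    combos = product(*(dimensions[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def reduced_vectors(dimensions: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return a small set in which each value of each dimension appears at least once.

    Shorter dimensions cycle through their values; this is an assumed strategy.
    """
    names = list(dimensions)
    if not names:
        return []
    longest = max(len(dimensions[name]) for name in names)
    return [
        {name: dimensions[name][i % len(dimensions[name])] for name in names}
        for i in range(longest)
    ]


if __name__ == "__main__":
    print(len(exhaustive_vectors(DIMENSIONS)), "exhaustive vectors")
    for vector in reduced_vectors(DIMENSIONS):
        print(vector)
```

 The exhaustive set grows multiplicatively with each new dimension, which is why a reduced
 set is useful for routine runs.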

 Currently, a pre-generated set of vectors is checked in along with the *.sql files, so these
 scripts only need to be run when a dimension is added or removed. The checked-in files are
 benchmark_*.vector and create-benchmark*-generated.sql.

 For more information about these scripts, see the comments within the scripts themselves.