tests/comparison/leopard/README - impala - Git at Google

 Summary
 -------

 This package runs the query generator continuously. Compares Impala and Postgres results
 for a randomly generated query and produces several reports per day. Reports are
 displayed on a web page which allows the user to conveniently examine the discovered
 issues. The user can also start a custom run against a private Impala branch using the
 web interface.

 Requirements
 ------------

 Docker -- A docker image with Impala and PostgresQL installed and at
     least one reference database loaded into PostgresQL. data_generator.py is a useful
     tool to migrate data from Impala into PostgresQL.

 To get started, run ./controller.py and ./front_end.py. You should be able to view the
 web page at http://localhost:5000. Results and logs are saved to /tmp/query_gen


 Basic Configuration
 -------------------

 The following are useful environment variables for running the
 controller and Docker images within it.

 DOCKER_USER - user *within* the Impala Docker container who owns the
 Impala source tree and test data.

 DOCKER_PASSWORD - password for the user *within* the Impala Docker
 container.

 TARGET_HOST - host system on which Docker Engine is running. This is the
 host that the controller will use to issue Docker commands like "docker
 run".

 TARGET_HOST_USERNAME - username for controller process to use to SSH
 into TARGET_HOST. Via Fabric, one can either type a password or use SSH
 keys.

 DOCKER_IMAGE_NAME - image to pull via "docker pull"


 External Volume Configuration
 -----------------------------

 To run Leopard against Impala with Kudu, we need to work around
 KUDU-1419. KUDU-1419 is likely to occur if your Docker Storage Engine is
 AUFS, or maybe others.  The easiest way to overcome this is to mount an
 external Docker volume that contain the necessary test data.  To try to
 handle this automatically, you can export any or all of the environment
 variables, depending on your host and container setups:

 DOCKER_IMPALA_USER_UID, DOCKER_IMPALA_USER_GID - numeric UID and GID for
 the owner of the Impala test data (testdata/cluster from an Impala
 source checkout) within your Docker container. Numeric IDs are needed,
 because there is no guarantee the symbolic owner and group on the
 container match the IDs on the target host.

 HOST_TESTDATA_EXTERNAL_VOLUME_PATH - path on TARGET_HOST where the
 external volume will reside. This is the destination for rsync to warm
 the volume and the left-hand side of "docker run -v".

 DOCKER_TESTDATA_VOLUME_PATH - path on your Docker container to the
 testdata/cluster Impala directory. This is source for rsync to warm the
 volume and the right-hand side of "docker run -v".

 HOST_TO_DOCKER_SSH_KEY - name of private key on TARGET_HOST for use with
 rsync so as to "warm" the external volume automatically.

 You are encouraged to configure your container in such a way that rsync
 with passwordless SSH is possible so as to create the external volume
 using the environment variables above.

 To do that, this is a handy guide on how to use rsync with SSH keys:

 https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/
	Summary
	-------

	This package runs the query generator continuously. Compares Impala and Postgres results
	for a randomly generated query and produces several reports per day. Reports are
	displayed on a web page which allows the user to conveniently examine the discovered
	issues. The user can also start a custom run against a private Impala branch using the
	web interface.

	Requirements
	------------

	Docker -- A docker image with Impala and PostgresQL installed and at
	least one reference database loaded into PostgresQL. data_generator.py is a useful
	tool to migrate data from Impala into PostgresQL.

	To get started, run ./controller.py and ./front_end.py. You should be able to view the
	web page at http://localhost:5000. Results and logs are saved to /tmp/query_gen


	Basic Configuration
	-------------------

	The following are useful environment variables for running the
	controller and Docker images within it.

	DOCKER_USER - user within the Impala Docker container who owns the
	Impala source tree and test data.

	DOCKER_PASSWORD - password for the user within the Impala Docker
	container.

	TARGET_HOST - host system on which Docker Engine is running. This is the
	host that the controller will use to issue Docker commands like "docker
	run".

	TARGET_HOST_USERNAME - username for controller process to use to SSH
	into TARGET_HOST. Via Fabric, one can either type a password or use SSH
	keys.

	DOCKER_IMAGE_NAME - image to pull via "docker pull"


	External Volume Configuration
	-----------------------------

	To run Leopard against Impala with Kudu, we need to work around
	KUDU-1419. KUDU-1419 is likely to occur if your Docker Storage Engine is
	AUFS, or maybe others. The easiest way to overcome this is to mount an
	external Docker volume that contain the necessary test data. To try to
	handle this automatically, you can export any or all of the environment
	variables, depending on your host and container setups:

	DOCKER_IMPALA_USER_UID, DOCKER_IMPALA_USER_GID - numeric UID and GID for
	the owner of the Impala test data (testdata/cluster from an Impala
	source checkout) within your Docker container. Numeric IDs are needed,
	because there is no guarantee the symbolic owner and group on the
	container match the IDs on the target host.

	HOST_TESTDATA_EXTERNAL_VOLUME_PATH - path on TARGET_HOST where the
	external volume will reside. This is the destination for rsync to warm
	the volume and the left-hand side of "docker run -v".

	DOCKER_TESTDATA_VOLUME_PATH - path on your Docker container to the
	testdata/cluster Impala directory. This is source for rsync to warm the
	volume and the right-hand side of "docker run -v".

	HOST_TO_DOCKER_SSH_KEY - name of private key on TARGET_HOST for use with
	rsync so as to "warm" the external volume automatically.

	You are encouraged to configure your container in such a way that rsync
	with passwordless SSH is possible so as to create the external volume
	using the environment variables above.

	To do that, this is a handy guide on how to use rsync with SSH keys:

	https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/