blob: a8144ac2ebcdbc5492229ab744bd9939f6ec36d3 [file] [log] [blame]
Summary
-------
This package runs the query generator continuously. Compares Impala and Postgres results
for a randomly generated query and produces several reports per day. Reports are
displayed on a web page which allows the user to conveniently examine the discovered
issues. The user can also start a custom run against a private Impala branch using the
web interface.
Requirements
------------
Docker -- A docker image with Impala and PostgresQL installed and at
least one reference database loaded into PostgresQL. data_generator.py is a useful
tool to migrate data from Impala into PostgresQL.
To get started, run ./controller.py and ./front_end.py. You should be able to view the
web page at http://localhost:5000. Results and logs are saved to /tmp/query_gen
Basic Configuration
-------------------
The following are useful environment variables for running the
controller and Docker images within it.
DOCKER_USER - user *within* the Impala Docker container who owns the
Impala source tree and test data.
DOCKER_PASSWORD - password for the user *within* the Impala Docker
container.
TARGET_HOST - host system on which Docker Engine is running. This is the
host that the controller will use to issue Docker commands like "docker
run".
TARGET_HOST_USERNAME - username for controller process to use to SSH
into TARGET_HOST. Via Fabric, one can either type a password or use SSH
keys.
DOCKER_IMAGE_NAME - image to pull via "docker pull"
External Volume Configuration
-----------------------------
To run Leopard against Impala with Kudu, we need to work around
KUDU-1419. KUDU-1419 is likely to occur if your Docker Storage Engine is
AUFS, or maybe others. The easiest way to overcome this is to mount an
external Docker volume that contain the necessary test data. To try to
handle this automatically, you can export any or all of the environment
variables, depending on your host and container setups:
DOCKER_IMPALA_USER_UID, DOCKER_IMPALA_USER_GID - numeric UID and GID for
the owner of the Impala test data (testdata/cluster from an Impala
source checkout) within your Docker container. Numeric IDs are needed,
because there is no guarantee the symbolic owner and group on the
container match the IDs on the target host.
HOST_TESTDATA_EXTERNAL_VOLUME_PATH - path on TARGET_HOST where the
external volume will reside. This is the destination for rsync to warm
the volume and the left-hand side of "docker run -v".
DOCKER_TESTDATA_VOLUME_PATH - path on your Docker container to the
testdata/cluster Impala directory. This is source for rsync to warm the
volume and the right-hand side of "docker run -v".
HOST_TO_DOCKER_SSH_KEY - name of private key on TARGET_HOST for use with
rsync so as to "warm" the external volume automatically.
You are encouraged to configure your container in such a way that rsync
with passwordless SSH is possible so as to create the external volume
using the environment variables above.
To do that, this is a handy guide on how to use rsync with SSH keys:
https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/