Integration tests need a Druid cluster. While some tests support using Kubernetes for the Quickstart cluster, most need a cluster with some test-specific configuration. We use Docker Compose to create that cluster, based on a test-oriented Docker image built by the `it-image` Maven module (activated by the `test-image` profile). The image contains the unpacked Druid distribution, along with the MySQL and MariaDB client libraries and the Kafka protobuf dependency. Docker Compose is used to pass configuration specific to each service.
In addition to the Druid image, we use “official” images for dependencies such as ZooKeeper, MySQL and Kafka.
The image here is distinct from the “retail” image used for getting started. The test image includes test-specific additions such as the `it-tools` extension.

Assuming `DRUID_DEV` points to your Druid build directory, to build the image (only):

```shell
cd $DRUID_DEV/docker-tests/it-image
mvn -P test-image install
```
Building of the image occurs in four steps:

1. The `pom.xml` file gathers versions and other information from the build. It also uses the normal Maven dependency mechanism to download the MySQL, MariaDB and Kafka client libraries, then copies them to the `target/docker` directory. It then invokes the `build-image.sh` script.
2. `build-image.sh` adds the Druid build tarball from `distribution/target`, copies the contents of `test-image/docker` to `target/docker`, and then invokes the `docker build` command.
3. `docker build` uses `target/docker` as the context, and thus uses the `Dockerfile` to build the image. The `Dockerfile` copies artifacts into the image, then defers to the `test-setup.sh` script.
4. The `test-setup.sh` script is copied into the image and run. This script does the work of installing Druid.

The resulting image is named `org.apache.druid/test:<version>`.
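The staging performed in steps 1 and 2 can be simulated with plain shell commands. The following runnable sketch uses placeholder files; the real `build-image.sh` works on the actual Maven build output, and its exact contents are an assumption here:

```shell
# Simulate the staging that precedes `docker build`, using placeholder files.
mkdir -p distribution/target test-image/docker target/docker
touch distribution/target/apache-druid-demo-bin.tar.gz  # stand-in for the distribution tarball
touch test-image/docker/Dockerfile                      # stand-in for the docker/ contents

# Copy the tarball and the docker/ contents into the build context.
cp distribution/target/apache-druid-*-bin.tar.gz target/docker/
cp -r test-image/docker/. target/docker/
ls target/docker
```

The real build then runs `docker build` with `target/docker` as the context, which is why everything must first be copied there.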
A normal `mvn clean` won't remove the Docker image, because that is often not what you want. Instead, do:

```shell
mvn clean -P test-image
```

You can also remove the image using Docker or Docker Desktop.
### `target/docker`
Docker requires that all build resources be within the current directory. We don't want to change the source directory: in Maven, only the target directories should contain build artifacts. So, the `pom.xml` file builds up a `target/docker` directory. The `pom.xml` file then invokes the `build-image.sh` script to complete the setup. The resulting directory structure is:

```text
target/docker
|- Dockerfile (from docker/)
|- scripts (from docker/)
|- apache-druid-<version>-bin.tar.gz (from distribution, by build-image.sh)
|- MySQL client (done by pom.xml)
|- MariaDB client (done by pom.xml)
|- Kafka protobuf client (done by pom.xml)
```
Then, we invoke `docker build` to build our test image. The `Dockerfile` copies files into the image. Actual setup is done by the `test-setup.sh` script copied into the image.

Many Dockerfiles issue Linux commands inline. In some cases, this can speed up subsequent builds because Docker can reuse layers. However, such Dockerfiles are tedious to debug. It is far easier to do the detailed setup in a script within the image. With this approach, you can debug the script by loading it into the image but not running it from the `Dockerfile`. Instead, launch the image with a `bash` shell and run the script by hand to debug. Since our build process is quick, we don't lose much by giving up layer reuse.
You can quickly rebuild the image if you've previously run a Maven image build. Assume `DRUID_DEV` points to your Druid development root. Start with a Maven build:

```shell
cd $DRUID_DEV/docker/test-image
mvn -P test-image install
```
Maven is rather slow to do its part. Let it grind away once to populate `target/docker`. Then, as you debug the `Dockerfile` or `test-setup.sh`, you can build faster:

```shell
cd $DRUID_DEV/docker/test-image
./rebuild.sh
```
This works because the Maven build creates a file `target/env.sh` that contains the Maven-defined environment. `rebuild.sh` reads that environment, then proceeds as the Maven build would. Image build time shrinks from about a minute to just a few seconds. `rebuild.sh` will fail if `target/env.sh` is missing, which reminds you to do the full Maven build that first time.
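The guard described above might look roughly like this. This is a hypothetical sketch, not the actual contents of `rebuild.sh`; the `check_env` function name and the demonstration file are made up for illustration:

```shell
# Hypothetical guard: source the Maven-written environment file, or fail
# with a reminder to run the full Maven build first.
check_env() {
  if [ ! -f "$1" ]; then
    echo "missing $1 -- run 'mvn -P test-image install' first"
    return 1
  fi
  . "$1"
}

# Demonstration with a stand-in env file:
echo 'DRUID_VERSION=demo' > env-demo.sh
check_env env-demo.sh
echo "$DRUID_VERSION"
```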
Remember to do a full Maven build if you change the actual Druid code. You'll need Maven to rebuild the affected jar file and to recreate the distribution. You can do this the slow way with a full rebuild or, if you are comfortable with Maven, you can selectively run just that one module build followed by just the distribution build.
The Druid test image adds the following to the base image:

- The Druid distribution, in `/usr/local/druid`
- The launch script, `/usr/local/launch.sh`
- Client libraries, in the Druid `lib` directory

The specific “bill of materials” follows. `DRUID_HOME` is the location of the Druid install and is set to `/usr/local/druid`.
| Variable or Item | Source | Destination |
| --- | --- | --- |
| Druid build | `distribution/target` | `$DRUID_HOME` |
| MySQL Connector | Maven repo | `$DRUID_HOME/lib` |
| Kafka Protobuf | Maven repo | `$DRUID_HOME/lib` |
| Druid launch script | `docker/launch.sh` | `/usr/local/launch.sh` |
| Env-var-to-config script | `docker/druid.sh` | `/usr/local/druid.sh` |
Several environment variables are defined. `DRUID_HOME` is useful at runtime.

| Name | Description |
| --- | --- |
| `DRUID_HOME` | Location of the Druid install |
| `DRUID_VERSION` | Druid version used to build the image |
| `JAVA_HOME` | Java location |
| `JAVA_VERSION` | Java version |
| `MYSQL_VERSION` | MySQL version (DB, connector) (not actually used) |
| `MYSQL_DRIVER_CLASSNAME` | Name of the MySQL driver (not actually used) |
| `CONFLUENT_VERSION` | Kafka Protobuf library version (not actually used) |
The image assumes a “shared” directory that passes in additional configuration information and exports logs and other items for inspection. It is mounted at `/shared` in the container and lives at `<project>/target/shared` on the host. This means that each test group has a distinct shared directory, populated as needed for that test.
Input items:

| Item | Description |
| --- | --- |
| `conf/` | `log4j.xml` config (optional) |
| `hadoop-xml/` | Hadoop configuration (optional) |
| `hadoop-dependencies/` | Hadoop dependencies (optional) |
| `lib/` | Extra Druid class path items (optional) |
Output items:

| Item | Description |
| --- | --- |
| `logs/` | Log files from each service |
| `tasklogs/` | Indexer task logs |
| `kafka/` | Kafka persistence |
| `db/` | MySQL database |
| `druid/` | Druid persistence, etc. |
Note on the `db` directory: the MySQL container creates this directory when it starts. If you start, then restart the MySQL container, you must remove the `db` directory before the restart, or MySQL will fail due to the existing files.
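For example, a cleanup along these lines before restarting MySQL. The `SHARED_DIR` value here is illustrative; the actual location is the per-test shared directory described above:

```shell
# Remove the stale MySQL data directory before restarting the container.
SHARED_DIR=target/shared
mkdir -p "$SHARED_DIR/db"   # simulate files left behind by a previous MySQL run
rm -rf "$SHARED_DIR/db"
```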
The three third-party containers are configured to log to the `/shared` directory rather than to Docker:

- `/shared/logs/kafka.log`
- `/shared/logs/zookeeper.log`
- `/shared/logs/mysql.log`
The container runs the `launch.sh` script at startup.

The “raw” Java environment variables are a bit too broad and result in copy/paste when a test wants to customize only part of the options, such as JVM arguments. To assist, the image breaks configuration down into smaller pieces, which it assembles prior to launch.
| Environment Variable | Description |
| --- | --- |
| `DRUID_SERVICE` | Name of the Druid service to run; used in the `server $DRUID_SERVICE` option |
| `DRUID_INSTANCE` | Suffix added to `DRUID_SERVICE` to create the log file name. Use when running more than one of the same service. |
| `DRUID_COMMON_JAVA_OPTS` | Java options common to all services |
| `DRUID_SERVICE_JAVA_OPTS` | Java options for this one service or instance |
| `DEBUG_OPTS` | Optional debugging Java options |
| `LOG4J_CONFIG` | Optional Log4j configuration, used in `-Dlog4j.configurationFile=$LOG4J_CONFIG` |
| `DRUID_CLASSPATH` | Optional extra Druid class path |
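As an illustration of how these pieces fit together, the following runnable sketch assembles a Java options string and a log file name from sample values. The assembly logic and the sample values are assumptions for illustration, not the actual contents of `launch.sh`:

```shell
# Sample values a test might set via Docker Compose.
DRUID_SERVICE=broker
DRUID_INSTANCE=one
DRUID_COMMON_JAVA_OPTS="-Xmx1g"
DRUID_SERVICE_JAVA_OPTS="-Xms512m"
DEBUG_OPTS=""
LOG4J_CONFIG=/shared/conf/log4j2.xml

# Assemble the full Java options from the smaller pieces.
JAVA_OPTS="$DRUID_COMMON_JAVA_OPTS $DRUID_SERVICE_JAVA_OPTS"
if [ -n "$DEBUG_OPTS" ]; then
  JAVA_OPTS="$JAVA_OPTS $DEBUG_OPTS"
fi
if [ -n "$LOG4J_CONFIG" ]; then
  JAVA_OPTS="$JAVA_OPTS -Dlog4j.configurationFile=$LOG4J_CONFIG"
fi

# Log file name: service name plus the optional instance suffix.
LOG_FILE="/shared/logs/${DRUID_SERVICE}${DRUID_INSTANCE:+-$DRUID_INSTANCE}.log"
echo "$JAVA_OPTS"
echo "$LOG_FILE"
```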
In addition, three other shared directories are added to the class path if they exist:

- `/shared/hadoop-xml` - included itself
- `/shared/lib` - included as `/shared/lib/*` to include extra jars
- `/shared/resources` - included itself to hold extra class-path resources

### `init` Process

The Middle Manager launches Peon processes which must be reaped. Add the following option to the Docker Compose configuration for this service:

```yaml
init: true
```
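In context, such a service definition might look like the following. This is a hypothetical Compose fragment; the service name, volume mapping and environment entries are illustrative, not the project's actual Compose file:

```yaml
services:
  middlemanager:
    image: org.apache.druid/test:${DRUID_VERSION}
    init: true                      # reap exited Peon child processes
    environment:
      - DRUID_SERVICE=middleManager
    volumes:
      - ./target/shared:/shared     # the per-test shared directory
```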
The following extensions are installed in the image:

```text
druid-avro-extensions
druid-aws-rds-extensions
druid-azure-extensions
druid-basic-security
druid-bloom-filter
druid-datasketches
druid-ec2-extensions
druid-google-extensions
druid-hdfs-storage
druid-histogram
druid-kafka-extraction-namespace
druid-kafka-indexing-service
druid-kerberos
druid-kinesis-indexing-service
druid-kubernetes-extensions
druid-lookups-cached-global
druid-lookups-cached-single
druid-orc-extensions
druid-pac4j
druid-parquet-extensions
druid-protobuf-extensions
druid-ranger-security
druid-s3-extensions
druid-stats
it-tools
mysql-metadata-storage
postgresql-metadata-storage
simple-client-sslcontext
```

If more are needed, they should be added during the image build.