| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| # Docker Test Image for Druid |
| |
| Integration tests need a Druid cluster. While some tests support using |
| Kubernetes for the Quickstart cluster, most need a cluster with some |
| test-specific configuration. We use Docker Compose to create that cluster, |
| based on a test-oriented Docker image built by the `it-image` Maven module |
| (activated by the `test-image` profile.) |
| The image contains the Druid distribution, |
| unpacked, along with the MySQL and MariaDB client libaries and |
| and the Kafka protobuf dependency. Docker Compose is |
| used to pass configuration specific to each service. |
| |
| In addition to the Druid image, we use "official" images for dependencies such |
| as ZooKeeper, MySQL and Kafka. |
| |
| The image here is distinct from the |
| ["retail" image](https://druid.apache.org/docs/latest/tutorials/docker.html) |
| used for getting started. The test image: |
| |
| * Uses a shared directory to hold logs and some configuration. |
| * Uses "official" images for dependencies. |
| * Assumes the wrapper Docker compose scripts. |
| * Has some additional test-specific extensions as defind in `it-tools`. |
| |
| ## Build Process |
| |
| Assuming `DRUID_DEV` points to your Druid build directory, |
| to build the image (only): |
| |
| ```bash |
| cd $DRUID_DEV/docker-tests/it-image |
| mvn -P test-image install |
| ``` |
| |
| Building of the image occurs in four steps: |
| |
| * The Maven `pom.xml` file gathers versions and other information from the build. |
| It also uses the normal Maven dependency mechanism to download the MySQL, |
| MariaDB and |
| Kafka client libraries, then copies them to the `target/docker` directory. |
| It then invokes the `build-image.sh` script. |
| * `build-image.sh` adds the Druid build tarball from `distribution/target`, |
| copies the contents of `test-image/docker` to `target/docker` and |
| then invokes the `docker build` command. |
| * `docker build` uses `target/docker` as the context, and thus |
| uses the `Dockerfile` to build the image. The `Dockerfile` copies artifacts into |
| the image, then defers to the `test-setup.sh` script. |
| * The `test-setup.sh` script is copied into the image and run. This script does |
| the work of installing Druid. |
| |
| The resulting image is named `org.apache.druid/test:<version>`. |
| |
| ### Clean |
| |
| A normal `mvn clean` won't remove the Docker image because that is often not |
| what you want. Instead, do: |
| |
| ```bash |
| mvn clean -P test-image |
| ``` |
| |
| You can also remove the image using Docker or the Docker desktop. |
| |
| ### `target/docker` |
| |
| Docker requires that all build resources be within the current directory. We don't want |
| to change the source directory: in Maven, only the target directories should contain |
| build artifacts. So, the `pom.xml` file builds up a `target/docker` directory. The |
| `pom.xml` file then invokes the `build-image.sh` script to complete the setup. The |
| resulting directory structure is: |
| |
| ```text |
| /target/docker |
| |- Dockerfile (from docker/) |
| |- scripts (from docker/) |
| |- apache-druid-<version>-bin.tar.gz (from distribution, by build-image.sh) |
| |- MySQL client (done by pom.xml) |
| |- MariaDB client (done by pom.xml) |
| |- Kafka protobuf client (done by pom.xml) |
| ``` |
| |
| Then, we invoke `docker build` to build our test image. The `Dockerfile` copies |
| files into the image. Actual setup is done by the `test-setup.sh` script copied |
| into the image. |
| |
| Many Dockerfiles issue Linux commands inline. In some cases, this can speed up |
| subsequent builds because Docker can reuse layers. However, such Dockerfiles are |
| tedious to debug. It is far easier to do the detailed setup in a script within |
| the image. With this approach, you can debug the script by loading it into |
| the image, but don't run it in the Dockerfile. Instead, launch the image with |
| a `bash` shell and run the script by hand to debug. Since our build process |
| is quick, we don't lose much by reusing layers. |
| |
| ### Manual Image Rebuilds |
| |
| You can quick rebuild the image if you've previously run a Maven image build. |
| Assume `DRUID_DEV` points to your Druid development root. Start with a |
| Maven build: |
| |
| ```bash |
| cd $DRUID_DEV/docker/test-image |
| mvn -P test-image install |
| ``` |
| |
| Maven is rather slow to do its part. Let it grind away once to populate |
| `target/docker`. Then, as you debug the `Dockerfile`, or `test-setup.sh`, |
| you can build faster: |
| |
| ```bash |
| cd $DRUID_DEV/docker/test-image |
| ./rebuild.sh |
| ``` |
| |
| This works because the Maven build creates a file `target/env.sh` that |
| contains the Maven-defined environment. `rebuild.sh` reads that |
| environment, then proceeds as would the Maven build. |
| Image build time shrinks from about a minute to just a few seconds. |
| `rebuild.sh` will fail if `target/env.sh` is missing, which reminds |
| you to do the full Maven build that first time. |
| |
| Remember to do a full Maven build if you change the actual Druid code. |
| You'll need Maven to rebuild the affected jar file and to recreate the |
| distribution image. You can do this the slow way by doing a full rebuild, |
| or, if you are comfortable with maven, you can selectively run just the |
| one module build followed by just the distribution build. |
| |
| ## Image Contents |
| |
| The Druid test image adds the following to the base image: |
| |
| * A Debian base image with the target JDK installed. |
| * Druid in `/usr/local/druid` |
| * Script to run Druid: `/usr/local/launch.sh` |
| * Extra libraries (Kafka, MySQL, MariaDB) placed in the Druid `lib` directory. |
| |
| The specific "bill of materials" follows. `DRUID_HOME` is the location of |
| the Druid install and is set to `/usr/local/druid`. |
| |
| | Variable or Item | Source | Destination | |
| | -------- | ------ | ----- | |
| | Druid build | `distribution/target` | `$DRUID_HOME` | |
| | MySQL Connector | Maven repo | `$DRUID_HOME/lib` | |
| | Kafka Protobuf | Maven repo | `$DRUID_HOME/lib` | |
| | Druid launch script | `docker/launch.sh` | `/usr/local/launch.sh` | |
| | Env-var-to-config script | `docker/druid.sh` | `/usr/local/druid.sh` | |
| |
| Several environment variables are defined. `DRUID_HOME` is useful at |
| runtime. |
| |
| | Name | Description | |
| | ---- | ----------- | |
| | `DRUID_HOME` | Location of the Druid install | |
| | `DRUID_VERSION` | Druid version used to build the image | |
| | `JAVA_HOME` | Java location | |
| | `JAVA_VERSION` | Java version | |
| | `MYSQL_VERSION` | MySQL version (DB, connector) (not actually used) | |
| | `MYSQL_DRIVER_CLASSNAME` | Name of the MySQL driver (not actually used) | |
| | `CONFLUENT_VERSION` | Kafka Protobuf library version (not actually used) | |
| |
| ## Shared Directory |
| |
| The image assumes a "shared" directory passes in additional configuration |
| information, and exports logs and other items for inspection. |
| |
| * Location in the container: `/shared` |
| * Location on the host: `<project>/target/shared` |
| |
| This means that each test group has a distinct shared directory, |
| populated as needed for that test. |
| |
| Input items: |
| |
| | Item | Description | |
| | ---- | ----------- | |
| | `conf/` | `log4j.xml` config (optional) | |
| | `hadoop-xml/` | Hadoop configuration (optional) | |
| | `hadoop-dependencies/` | Hadoop dependencies (optional) | |
| | `lib/` | Extra Druid class path items (optional) | |
| |
| Output items: |
| |
| | Item | Description | |
| | ---- | ----------- | |
| | `logs/` | Log files from each service | |
| | `tasklogs/` | Indexer task logs | |
| | `kafka/` | Kafka persistence | |
| | `db/` | MySQL database | |
| | `druid/` | Druid persistence, etc. | |
| |
| Note on the `db` directory: the MySQL container creates this directory |
| when it starts. If you start, then restart the MySQL container, you *must* |
| remove the `db` directory before restart or MySQL will fail due to existing |
| files. |
| |
| ### Per-test Extensions |
| |
| The image build includes a standard set of extensions. Contrib or custom extensions |
| may wish to add additional extensions. This is most easily done not by altering the |
| image, but by adding the extensions at cluster startup. If the shared directory has |
| an `extensions` subdirectory, then that directory is added to the extension search |
| path on container startup. To add an extension `my-extension`, your shared directory |
| should look like this: |
| |
| ```text |
| shared |
| +- ... |
| +- extensions |
| +- my-extension |
| +- my-extension-<version>.jar |
| +- ... |
| ``` |
| |
| The `extensions` directory should be created within the per-cluster `setup.sh` script |
| which is when starting your test cluster. |
| |
| Be sure to also include the extension in the load list in your `docker-compose.py` template. |
| To load the extension on all nodes: |
| |
| ```python |
| def extend_druid_service(self, service): |
| self.add_env(service, 'druid_test_loadList', 'my-extension') |
| ``` |
| |
| Note that the above requires Druid and IT features added in early March, 2023. |
| |
| ### Third-Party Logs |
| |
| The three third-party containers are configured to log to the `/shared` |
| directory rather than to Docker: |
| |
| * Kafka: `/shared/logs/kafka.log` |
| * ZooKeeper: `/shared/logs/zookeeper.log` |
| * MySQL: `/shared/logs/mysql.log` |
| |
| ## Entry Point |
| |
| The container launches the `launch.sh` script which: |
| |
| * Converts environment variables to config files. |
| * Assembles the Java command line arguments, including those |
| explained above, and the just-generated config files. |
| * Launches Java as "pid 1" so it will receive signals. |
| |
| ### Run Configuration |
| |
| The "raw" Java environment variables are a bit overly broad and result |
| in copy/paste when a test wants to customize only part of the option, such |
| as JVM arguments. To assist, the image breaks configuration down into |
| smaller pieces, which it assembles prior to launch. |
| |
| | Enviornment Viable | Description | |
| | ------------------ | ----------- | |
| | `DRUID_SERVICE` | Name of the Druid service to run in the `server $DRUID_SERVICE` option | |
| | `DRUID_INSTANCE` | Suffix added to the `DRUID_SERVICE` to create the log file name. Use when running more than one of the same service. | |
| | `DRUID_COMMON_JAVA_OPTS` | Java options common to all services | |
| | `DRUID_SERVICE_JAVA_OPTS` | Java options for this one service or instance | |
| | `DEBUG_OPTS` | Optional debugging Java options | |
| | `LOG4J_CONFIG` | Optional Log4J configuration used in `-Dlog4j.configurationFile=$LOG4J_CONFIG` | |
| | `DRUID_CLASSPATH` | Optional extra Druid class path | |
| |
| In addition, three other shared directories are added to the class path if they exist: |
| |
| * `/shared/hadoop-xml` - included itself |
| * `/shared/lib` - Included as `/shared/lib/*` to include extra jars |
| * `/shared/resources` - included itself to hold extra class-path resources |
| |
| ### `init` Process |
| |
| Middle Manager launches Peon processes which must be reaped. |
| Add [the following option](https://docs.docker.com/compose/compose-file/compose-file-v2/#init) |
| to the Docker Compose configuration for this service: |
| |
| ```text |
| init: true |
| ``` |
| |
| ## Extensions |
| |
| The following extensions are installed in the image: |
| |
| ```text |
| druid-avro-extensions |
| druid-aws-rds-extensions |
| druid-azure-extensions |
| druid-basic-security |
| druid-bloom-filter |
| druid-datasketches |
| druid-ec2-extensions |
| druid-google-extensions |
| druid-hdfs-storage |
| druid-histogram |
| druid-kafka-extraction-namespace |
| druid-kafka-indexing-service |
| druid-kerberos |
| druid-kinesis-indexing-service |
| druid-kubernetes-extensions |
| druid-lookups-cached-global |
| druid-lookups-cached-single |
| druid-orc-extensions |
| druid-pac4j |
| druid-parquet-extensions |
| druid-protobuf-extensions |
| druid-ranger-security |
| druid-s3-extensions |
| druid-stats |
| it-tools |
| mysql-metadata-storage |
| postgresql-metadata-storage |
| simple-client-sslcontext |
| ``` |
| |
| If more are needed, they should be added during the image build. |