Advanced data verification and plan verification tests for RuntimeFilter (#514)

* Advanced data verification and plan verification tests for RuntimeFilter

Runtime filter was added to DRILL as a part of DRILL-6385, DRILL-6731 and DRILL-6792.

This PR contains the following changes:

- Data verification tests for broadcast and hash joins with join on different data types.
- Plan verification tests to verify Runtime Filter is applied on probe side scan, wherever applicable.
- The queries are run against TPC-H SF 100 data.

* Minor fix, remove tsv header for nested broadcast scenario

* Incorporated review comments

- Changed .q and .q.explain extensions to .sql and .sql.explain.
- Explicitly enable / disable broadcast join and set the threshold so that changes to the default values of these parameters will not affect the tests.
- Round decimals to 3 digits to have more stable test case verification.
- Remove "alter session" to conform with standard for setting / resetting session level parameters.
50 files changed
README.md

Test Framework for Apache Drill

Test Framework for SQL on Hadoop technologies. Currently supports Apache Drill, a schema-free SQL query engine for Hadoop, NoSQL and cloud storage.

The framework is built for regression, integration, and sanity testing. It includes test coverage (with baselines) for core Drill functionality and supported features. The Apache Drill community uses it for pre-commit regression testing and as part of the release criteria.

Requirements

  1. The test framework requires a distributed file system such as HDFS or MapR-FS to be configured. Some of the tests can also be run against a local file system. By default, it's configured to run against MapR-FS. You can change the default behavior by modifying conf/core-site.xml. Refer to conf/core-site.xml.example for settings.
  2. To run all tests, Hive and HBase need to be installed and running. To exclude Hive and HBase tests, please refer to the example in the Execute Tests section.
  3. The test framework should be run on a Drill cluster node. Refer to the Drill documentation for details on how to set up Drill. It can also be run on a client node, which requires additional configuration.
  4. Cluster information is set in the conf/drillTestConfig.properties file. This is the main configuration file for the framework. It must be modified with local cluster information before you compile the framework and run tests.

Build Project

To begin using the test framework, you need to build the project and download dependent datasets (configured in pom.xml).

git clone git@github.com:mapr/drill-test-framework.git
cd drill-test-framework
bin/build_framework -Pdownload

If you've already downloaded the datasets, you can skip the download.

Execute Tests

In the root directory of your repository, execute the following command to run tests:

bin/run_tests -s <suites> -g <groups> -t <Timeout> -x <Exclude> -n <Concurrency> -d

Example:
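As a minimal sketch, the snippet below assembles a bin/run_tests invocation from shell variables, using only the flags shown in the synopsis above. Every value (suite path, group name, timeout, exclude list, concurrency) is a hypothetical placeholder rather than a name confirmed by this README, and the -d flag is left out because its meaning is not described here.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; substitute suites,
# groups and exclusions that actually exist in your checkout.
SUITES="Functional/joins"   # assumed suite path
TEST_GROUPS="functional"    # assumed group name
TIMEOUT=120                 # assumed timeout value; the unit is not stated here
EXCLUDE="hive,hbase"        # assumed exclude list for Hive and HBase tests
CONCURRENCY=4               # assumed number of concurrent tests

# Build the command line from the flags listed in the synopsis.
RUN_CMD="bin/run_tests -s $SUITES -g $TEST_GROUPS -t $TIMEOUT -x $EXCLUDE -n $CONCURRENCY"

# Print the command for review before running it.
echo "$RUN_CMD"

# Uncomment to execute from the repository root:
# $RUN_CMD
```

Printing the command first makes it easy to check the flag values before starting a long test run.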

Contributing

We encourage contributions from users! You can fix bugs, make enhancements or add new tests. Create a PR here on GitHub for your change.

Refer to CONTRIBUTING.md for details on the test framework structure and instructions on how to contribute.

License

Licensed under the Apache License 2.0. Please see LICENSE.md.