TPC-DS Benchmark

Google Dataflow Runner

To run the TPC-DS benchmark on a 1 GB dataset with Google Dataflow, run the following example command from the command line:

./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \
  --runner=DataflowRunner \
  --queries=3,26,55 \
  --tpcParallel=2 \
  --dataDirectory=/path/to/tpcds_data/ \
  --project=apache-beam-testing \
  --stagingLocation=gs://beamsql_tpcds_1/staging \
  --tempLocation=gs://beamsql_tpcds_2/temp \
  --region=us-west1"
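
When the query list or data size changes often, it can help to assemble the argument string from variables instead of editing the long command by hand. The following is a minimal sketch, not part of the benchmark itself: the variable names and the join_args helper are hypothetical, and only flags that appear in the command above are used. It prints the resulting Gradle command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: build the value of -Ptpcds.args from variables.
# All variable names and join_args are illustrative assumptions;
# the flags themselves are the ones shown in the command above.

DATA_SIZE="1G"
QUERIES="3,26,55"
DATA_DIR="/path/to/tpcds_data/"
PROJECT="apache-beam-testing"
REGION="us-west1"

# Join all arguments into one space-separated string.
join_args() {
  printf '%s\n' "$*"
}

TPCDS_ARGS="$(join_args \
  "--dataSize=${DATA_SIZE}" \
  "--runner=DataflowRunner" \
  "--queries=${QUERIES}" \
  "--tpcParallel=2" \
  "--dataDirectory=${DATA_DIR}" \
  "--project=${PROJECT}" \
  "--stagingLocation=gs://beamsql_tpcds_1/staging" \
  "--tempLocation=gs://beamsql_tpcds_2/temp" \
  "--region=${REGION}")"

# Dry run: print the command for review. Remove "echo" to execute it.
echo ./gradlew :sdks:java:testing:tpcds:run "-Ptpcds.args=${TPCDS_ARGS}"
```

Keeping the flags as separate quoted words makes a missing closing quote or a duplicated flag easier to spot than in a single multi-line string.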

To run a query with the ZetaSQL planner (currently Query96 can be run with ZetaSQL), set plannerName as shown below. If plannerName is not specified, the Calcite planner is used.

./gradlew :sdks:java:testing:tpcds:run -Ptpcds.args="--dataSize=1G \
  --runner=DataflowRunner \
  --queries=96 \
  --tpcParallel=2 \
  --dataDirectory=/path/to/tpcds_data/ \
  --plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner \
  --project=apache-beam-testing \
  --stagingLocation=gs://beamsql_tpcds_1/staging \
  --tempLocation=gs://beamsql_tpcds_2/temp \
  --region=us-west1"
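
Because Calcite is used whenever plannerName is absent, switching planners comes down to including or omitting one flag. A small sketch of that choice follows; the planner_flag helper, the PLANNER variable, and the labels "calcite" and "zetasql" are assumptions made for this example and are not recognized by the benchmark itself.

```shell
#!/usr/bin/env bash
# Sketch: choose the planner flag from a short label.
# planner_flag and the labels "calcite"/"zetasql" are hypothetical.

planner_flag() {
  case "$1" in
    zetasql)
      # Class name taken from the command above.
      printf '%s' "--plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner"
      ;;
    calcite|"")
      # Emit nothing: with no plannerName, the default (Calcite) applies.
      ;;
    *)
      echo "unknown planner: $1" >&2
      return 1
      ;;
  esac
}

PLANNER="${PLANNER:-calcite}"
EXTRA_FLAG="$(planner_flag "$PLANNER")"
echo "planner=${PLANNER} flag=${EXTRA_FLAG:-none}"
```

The returned string can be appended to the other flags inside -Ptpcds.args when it is non-empty.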

Spark Runner

To run Query3 of the TPC-DS benchmark on a 1 GB dataset with Apache Spark 2.x, run the following example command from the command line:

./gradlew :sdks:java:testing:tpcds:run -Ptpcds.runner=":runners:spark:2" -Ptpcds.args=" \
  --runner=SparkRunner \
  --queries=3 \
  --tpcParallel=1 \
  --dataDirectory=/path/to/tpcds_data/ \
  --dataSize=1G \