Introduction

This module includes a TPC-DS data generator and a benchmark tool.

How to use

Package the jar with the following command: ./build/mvn clean package -Ptpcds -pl dev/kyuubi-tpcds -am

Data Generator

Supported options:

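| key         | default         | description                       |
|-------------|-----------------|-----------------------------------|
| db          | default         | the database to write data        |
| scaleFactor | 1               | the scale factor of TPC-DS        |
| format      | parquet         | the format of table to store data |
| parallel    | scaleFactor * 2 | the parallelism of Spark job      |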

Example: the following command generates 10 GB of data into a new database, tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.DataGenerator \
  kyuubi-tpcds_*.jar \
  --db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
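The relationship between the scale factor and the other options can be sketched as a small wrapper. This is only an illustration: the function name datagen_cmd is hypothetical, the tpcds_sf<N> database naming simply follows the example above, and --parallel is set explicitly to scaleFactor * 2 to mirror the documented default. The script prints the command instead of running it, so it can be checked before use.

```shell
#!/bin/sh
# Hypothetical helper (not part of this module): prints, but does not run,
# a DataGenerator command for the given scale factor.
# Assumptions: the database is named tpcds_sf<N> as in the example above,
# and --parallel is scaleFactor * 2, mirroring the documented default.
datagen_cmd() {
  scale_factor="$1"
  parallel=$((scale_factor * 2))
  # \$SPARK_HOME and the jar glob are printed literally, not expanded here.
  printf '%s\n' "\$SPARK_HOME/bin/spark-submit --class org.apache.kyuubi.tpcds.DataGenerator kyuubi-tpcds_*.jar --db tpcds_sf${scale_factor} --scaleFactor ${scale_factor} --format parquet --parallel ${parallel}"
}

datagen_cmd 10
```

The printed line can be pasted into a shell once it looks right; passing a different scale factor changes the database name and parallelism together.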

Benchmark Tool

Supported options:

| key         | default                | description                                                   |
|-------------|------------------------|---------------------------------------------------------------|
| db          | none(required)         | the TPC-DS database                                           |
| benchmark   | tpcds-v2.4-benchmark   | the name of application                                       |
| iterations  | 3                      | the number of iterations to run                               |
| breakdown   | false                  | whether to record breakdown results of an execution           |
| filter      | a                      | filter on the name of the queries to run, e.g. q1-v2.4        |
| results-dir | /spark/sql/performance | dir to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref |

Example: the following command benchmarks TPC-DS sf10 against the existing database tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10

To run a single TPC-DS query, pass its name with --filter:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --filter q1-v2.4
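To benchmark a handful of queries one at a time, the --filter invocation above can be wrapped in a loop. The sketch below is an assumption-laden illustration: benchmark_cmd is a hypothetical helper, and the query names other than q1-v2.4 are assumed to follow the same naming pattern. It only prints the commands, one per query.

```shell
#!/bin/sh
# Hypothetical helper (not part of this module): prints, but does not run,
# a RunBenchmark command restricted to one query via --filter.
benchmark_cmd() {
  db="$1"
  query="$2"
  # \$SPARK_HOME and the jar glob are printed literally, not expanded here.
  printf '%s\n' "\$SPARK_HOME/bin/spark-submit --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark kyuubi-tpcds_*.jar --db ${db} --filter ${query}"
}

# Assumption: q2-v2.4 and q3-v2.4 follow the q1-v2.4 naming shown above.
for q in q1-v2.4 q2-v2.4 q3-v2.4; do
  benchmark_cmd tpcds_sf10 "$q"
done
```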

The benchmark result looks like this:

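| name    | minTimeMs | maxTimeMs  | avgTimeMs  | stdDev   | stdDevPercent  |
|---------|-----------|------------|------------|----------|----------------|
| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 |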
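In the sample row, stdDevPercent appears to be stdDev expressed as a percentage of avgTimeMs. That relationship is inferred from the numbers shown here, not taken from the tool's source, so treat the following check as a sketch; std_dev_percent is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: recomputes stdDevPercent from stdDev and avgTimeMs.
# Assumption: stdDevPercent = stdDev / avgTimeMs * 100 (inferred from the sample row).
std_dev_percent() {
  LC_ALL=C awk -v sd="$1" -v avg="$2" 'BEGIN { printf "%.2f\n", sd / avg * 100 }'
}

std_dev_percent 471.6482 323.398267   # prints 145.84
```

A value above 100, as here, means the run times varied by more than their mean, which is consistent with the large gap between minTimeMs and maxTimeMs in the same row.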