| --- |
| { |
| "title": "Star Schema Benchmark", |
| "language": "en" |
| } |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Star Schema Benchmark |
| |
| [Star Schema Benchmark(SSB)](https://www.cs.umb.edu/~poneil/StarSchemaB.PDF) is a lightweight performance test set in the data warehouse scenario. SSB provides a simplified star schema data based on [TPC-H](http://www.tpc.org/tpch/), which is mainly used to test the performance of multi-table JOIN query under star schema. In addition, the industry usually flattens SSB into a wide table model (Referred as: SSB flat) to test the performance of the query engine, refer to [Clickhouse](https://clickhouse.com/docs/zh/getting-started). |
| |
| This document mainly introduces the performance of Doris on the SSB 1000G test set. |
| |
| We tested 13 queries on the SSB standard test dataset based on Apache Doris version 2.0.6. |
| |
| ## 1. Hardware Environment |
| |
| | Hardware | Configuration Instructions | |
| |---------------------|-------------------------------------------| |
| | Number of mMachines | 4 Tencent Cloud Virtual Machine(1FE,3BEs) | |
| | CPU | AMD EPYC™ Milan(2.55GHz/3.5GHz) 48C | |
| | Memory | 192G | |
| | Network | 21Gbps | |
| | Disk | ESSD Cloud Hard Disk | |
| |
| ## 2. Software Environment |
| |
| - Doris Deployed 3BEs and 1FE |
| - Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051) |
| - OS version: Ubuntu 20.04 LTS (Focal Fossa) |
| - Doris software version: Apache Doris 2.0.6. |
| - JDK: openjdk version "1.8.0_131" |
| |
| ## 3. Test Data Volume |
| |
| | SSB Table Name | Rows | Annotation | |
| |:---------------|:--------------|:---------------------------------| |
| | lineorder | 5,999,989,709 | Commodity Order Details | |
| | customer | 30,000,000 | Customer Information | |
| | part | 2,000,000 | Parts Information | |
| | supplier | 2,000,000 | Supplier Information | |
| | dates | 2,556 | Date | |
| | lineorder_flat | 5,999,989,709 | Wide Table after Data Flattening | |
| |
| ## 4. SSB Flat Test Results |
| |
| Here we use Apache Doris 2.0.6 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows: |
| |
| | Query | Doris 2.0.6 (ms) | |
| |-------|------------------| |
| | q1.1 | 86 | |
| | q1.2 | 31 | |
| | q1.3 | 87 | |
| | q2.1 | 1046 | |
| | q2.2 | 569 | |
| | q2.3 | 480 | |
| | q3.1 | 1339 | |
| | q3.2 | 957 | |
| | q3.3 | 215 | |
| | q3.4 | 34 | |
| | q4.1 | 1569 | |
| | q4.2 | 174 | |
| | q4.3 | 109 | |
| | Total | 6696 | |
| |
| |
| ## 5. Standard SSB Test Results |
| |
| Here we use Apache Doris 2.0.6 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows: |
| |
| | Query | Doris 2.0.6 (ms) | |
| |-------|------------------| |
| | q1.1 | 332 | |
| | q1.2 | 86 | |
| | q1.3 | 80 | |
| | q2.1 | 985 | |
| | q2.2 | 844 | |
| | q2.3 | 768 | |
| | q3.1 | 2924 | |
| | q3.2 | 944 | |
| | q3.3 | 766 | |
| | q3.4 | 146 | |
| | q4.1 | 3451 | |
| | q4.2 | 829 | |
| | q4.3 | 325 | |
| | Total | 12480 | |
| |
| ## 6. Environment Preparation |
| |
| Please first refer to the [official documentation](. /install/install-deploy.md) to install and deploy Apache Doris first to obtain a Doris cluster which is working well(including at least 1 FE 1 BE, 1 FE 3 BEs is recommended). |
| |
| ## 7. Data Preparation |
| |
| ### 7.1 Download and Install the SSB Data Generation Tool. |
| |
| Execute the following script to download and compile the [ssb-tools](https://github.com/apache/doris/tree/master/tools/ssb-tools) tool. |
| |
| ```shell |
| sh bin/build-ssb-dbgen.sh |
| ``` |
| |
| After successful installation, the `dbgen` binary will be generated under the `ssb-dbgen/` directory. |
| |
| ### 7.2 Generate SSB Test Set |
| |
| Execute the following script to generate the SSB dataset: |
| |
| ```shell |
| sh bin/gen-ssb-data.sh -s 1000 |
| ``` |
| |
| > Note 1: Check the script help via `sh gen-ssb-data.sh -h`. |
| > |
| > Note 2: The data will be generated under the `ssb-data/` directory with the suffix `.tbl`. The total file size is about 600GB and may need a few minutes to an hour to generate. |
| > |
| > Note 3: A standard test data set of 100G is generated by default. |
| |
| ### 7.3 Create Table |
| |
| #### 7.3.1 Prepare the `doris-cluster.conf` File. |
| |
| Before import the script, you need to write the FE’s ip port and other information in the `doris-cluster.conf` file. |
| |
| The file is located under `${DORIS_HOME}/tools/ssb-tools/conf/`. |
| |
| The content of the file includes FE's ip, HTTP port, user name, password and the DB name of the data to be imported: |
| |
| ```shell |
| # Any of FE host |
| export FE_HOST='127.0.0.1' |
| # http_port in fe.conf |
| export FE_HTTP_PORT=8030 |
| # query_port in fe.conf |
| export FE_QUERY_PORT=9030 |
| # Doris username |
| export USER='root' |
| # Doris password |
| export PASSWORD='' |
| # The database where SSB tables located |
| export DB='ssb' |
| ``` |
| |
| #### 7.3.2 Execute the Following Script to Generate and Create the SSB Table: |
| |
| ```shell |
| sh bin/create-ssb-tables.sh -s 1000 |
| ``` |
| |
| Or copy the table creation statements in [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables-sf1000.sql) and [ create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-tables-sf1000.sql) and then execute them in the MySQL client. |
| |
| ### 7.4 Import data |
| |
| We use the following command to complete all data import of SSB test set and SSB FLAT wide table data synthesis and then import into the table. |
| |
| ```shell |
| sh bin/load-ssb-data.sh |
| ``` |
| |
| ### 7.5 Checking Imported data |
| |
| ```sql |
| select count(*) from part; |
| select count(*) from customer; |
| select count(*) from supplier; |
| select count(*) from dates; |
| select count(*) from lineorder; |
| select count(*) from lineorder_flat; |
| ``` |
| |
| ### 7.6 Query Test |
| |
| - SSB-Flat Query Statement: [ ssb-flat-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-flat-queries) |
| - Standard SSB Queries: [ ssb-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries) |
| |
| #### 7.6.1 SSB FLAT Test for SQL |
| |
| ```sql |
| --Q1.1 |
| SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue |
| FROM lineorder_flat |
| WHERE |
| LO_ORDERDATE >= 19930101 |
| AND LO_ORDERDATE <= 19931231 |
| AND LO_DISCOUNT BETWEEN 1 AND 3 |
| AND LO_QUANTITY < 25; |
| |
| --Q1.2 |
| SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue |
| FROM lineorder_flat |
| WHERE |
| LO_ORDERDATE >= 19940101 |
| AND LO_ORDERDATE <= 19940131 |
| AND LO_DISCOUNT BETWEEN 4 AND 6 |
| AND LO_QUANTITY BETWEEN 26 AND 35; |
| |
| --Q1.3 |
| SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue |
| FROM lineorder_flat |
| WHERE |
| weekofyear(LO_ORDERDATE) = 6 |
| AND LO_ORDERDATE >= 19940101 |
| AND LO_ORDERDATE <= 19941231 |
| AND LO_DISCOUNT BETWEEN 5 AND 7 |
| AND LO_QUANTITY BETWEEN 26 AND 35; |
| |
| --Q2.1 |
| SELECT |
| SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, |
| P_BRAND |
| FROM lineorder_flat |
| WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' |
| GROUP BY YEAR, P_BRAND |
| ORDER BY YEAR, P_BRAND; |
| |
| --Q2.2 |
| SELECT |
| SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, |
| P_BRAND |
| FROM lineorder_flat |
| WHERE |
| P_BRAND >= 'MFGR#2221' |
| AND P_BRAND <= 'MFGR#2228' |
| AND S_REGION = 'ASIA' |
| GROUP BY YEAR, P_BRAND |
| ORDER BY YEAR, P_BRAND; |
| |
| --Q2.3 |
| SELECT |
| SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, |
| P_BRAND |
| FROM lineorder_flat |
| WHERE |
| P_BRAND = 'MFGR#2239' |
| AND S_REGION = 'EUROPE' |
| GROUP BY YEAR, P_BRAND |
| ORDER BY YEAR, P_BRAND; |
| |
| --Q3.1 |
| SELECT |
| C_NATION, |
| S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR, |
| SUM(LO_REVENUE) AS revenue |
| FROM lineorder_flat |
| WHERE |
| C_REGION = 'ASIA' |
| AND S_REGION = 'ASIA' |
| AND LO_ORDERDATE >= 19920101 |
| AND LO_ORDERDATE <= 19971231 |
| GROUP BY C_NATION, S_NATION, YEAR |
| ORDER BY YEAR ASC, revenue DESC; |
| |
| --Q3.2 |
| SELECT |
| C_CITY, |
| S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, |
| SUM(LO_REVENUE) AS revenue |
| FROM lineorder_flat |
| WHERE |
| C_NATION = 'UNITED STATES' |
| AND S_NATION = 'UNITED STATES' |
| AND LO_ORDERDATE >= 19920101 |
| AND LO_ORDERDATE <= 19971231 |
| GROUP BY C_CITY, S_CITY, YEAR |
| ORDER BY YEAR ASC, revenue DESC; |
| |
| --Q3.3 |
| SELECT |
| C_CITY, |
| S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, |
| SUM(LO_REVENUE) AS revenue |
| FROM lineorder_flat |
| WHERE |
| C_CITY IN ('UNITED KI1', 'UNITED KI5') |
| AND S_CITY IN ('UNITED KI1', 'UNITED KI5') |
| AND LO_ORDERDATE >= 19920101 |
| AND LO_ORDERDATE <= 19971231 |
| GROUP BY C_CITY, S_CITY, YEAR |
| ORDER BY YEAR ASC, revenue DESC; |
| |
| --Q3.4 |
| SELECT |
| C_CITY, |
| S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, |
| SUM(LO_REVENUE) AS revenue |
| FROM lineorder_flat |
| WHERE |
| C_CITY IN ('UNITED KI1', 'UNITED KI5') |
| AND S_CITY IN ('UNITED KI1', 'UNITED KI5') |
| AND LO_ORDERDATE >= 19971201 |
| AND LO_ORDERDATE <= 19971231 |
| GROUP BY C_CITY, S_CITY, YEAR |
| ORDER BY YEAR ASC, revenue DESC; |
| |
| --Q4.1 |
| SELECT (LO_ORDERDATE DIV 10000) AS YEAR, |
| C_NATION, |
| SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit |
| FROM lineorder_flat |
| WHERE |
| C_REGION = 'AMERICA' |
| AND S_REGION = 'AMERICA' |
| AND P_MFGR IN ('MFGR#1', 'MFGR#2') |
| GROUP BY YEAR, C_NATION |
| ORDER BY YEAR ASC, C_NATION ASC; |
| |
| --Q4.2 |
| SELECT (LO_ORDERDATE DIV 10000) AS YEAR, |
| S_NATION, |
| P_CATEGORY, |
| SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit |
| FROM lineorder_flat |
| WHERE |
| C_REGION = 'AMERICA' |
| AND S_REGION = 'AMERICA' |
| AND LO_ORDERDATE >= 19970101 |
| AND LO_ORDERDATE <= 19981231 |
| AND P_MFGR IN ('MFGR#1', 'MFGR#2') |
| GROUP BY YEAR, S_NATION, P_CATEGORY |
| ORDER BY |
| YEAR ASC, |
| S_NATION ASC, |
| P_CATEGORY ASC; |
| |
| --Q4.3 |
| SELECT (LO_ORDERDATE DIV 10000) AS YEAR, |
| S_CITY, |
| P_BRAND, |
| SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit |
| FROM lineorder_flat |
| WHERE |
| S_NATION = 'UNITED STATES' |
| AND LO_ORDERDATE >= 19970101 |
| AND LO_ORDERDATE <= 19981231 |
| AND P_CATEGORY = 'MFGR#14' |
| GROUP BY YEAR, S_CITY, P_BRAND |
| ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC; |
| ``` |
| |
| #### 7.6.2 SSB Standard Test for SQL |
| |
| ```sql |
| --Q1.1 |
| SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE |
| FROM lineorder, dates |
| WHERE |
| lo_orderdate = d_datekey |
| AND d_year = 1993 |
| AND lo_discount BETWEEN 1 AND 3 |
| AND lo_quantity < 25; |
| |
| --Q1.2 |
| SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE |
| FROM lineorder, dates |
| WHERE |
| lo_orderdate = d_datekey |
| AND d_yearmonth = 'Jan1994' |
| AND lo_discount BETWEEN 4 AND 6 |
| AND lo_quantity BETWEEN 26 AND 35; |
| |
| --Q1.3 |
| SELECT |
| SUM(lo_extendedprice * lo_discount) AS REVENUE |
| FROM lineorder, dates |
| WHERE |
| lo_orderdate = d_datekey |
| AND d_weeknuminyear = 6 |
| AND d_year = 1994 |
| AND lo_discount BETWEEN 5 AND 7 |
| AND lo_quantity BETWEEN 26 AND 35; |
| |
| --Q2.1 |
| SELECT SUM(lo_revenue), d_year, p_brand |
| FROM lineorder, dates, part, supplier |
| WHERE |
| lo_orderdate = d_datekey |
| AND lo_partkey = p_partkey |
| AND lo_suppkey = s_suppkey |
| AND p_category = 'MFGR#12' |
| AND s_region = 'AMERICA' |
| GROUP BY d_year, p_brand |
| ORDER BY p_brand; |
| |
| --Q2.2 |
| SELECT SUM(lo_revenue), d_year, p_brand |
| FROM lineorder, dates, part, supplier |
| WHERE |
| lo_orderdate = d_datekey |
| AND lo_partkey = p_partkey |
| AND lo_suppkey = s_suppkey |
| AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' |
| AND s_region = 'ASIA' |
| GROUP BY d_year, p_brand |
| ORDER BY d_year, p_brand; |
| |
| --Q2.3 |
| SELECT SUM(lo_revenue), d_year, p_brand |
| FROM lineorder, dates, part, supplier |
| WHERE |
| lo_orderdate = d_datekey |
| AND lo_partkey = p_partkey |
| AND lo_suppkey = s_suppkey |
| AND p_brand = 'MFGR#2239' |
| AND s_region = 'EUROPE' |
| GROUP BY d_year, p_brand |
| ORDER BY d_year, p_brand; |
| |
| --Q3.1 |
| SELECT |
| c_nation, |
| s_nation, |
| d_year, |
| SUM(lo_revenue) AS REVENUE |
| FROM customer, lineorder, supplier, dates |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_orderdate = d_datekey |
| AND c_region = 'ASIA' |
| AND s_region = 'ASIA' |
| AND d_year >= 1992 |
| AND d_year <= 1997 |
| GROUP BY c_nation, s_nation, d_year |
| ORDER BY d_year ASC, REVENUE DESC; |
| |
| --Q3.2 |
| SELECT |
| c_city, |
| s_city, |
| d_year, |
| SUM(lo_revenue) AS REVENUE |
| FROM customer, lineorder, supplier, dates |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_orderdate = d_datekey |
| AND c_nation = 'UNITED STATES' |
| AND s_nation = 'UNITED STATES' |
| AND d_year >= 1992 |
| AND d_year <= 1997 |
| GROUP BY c_city, s_city, d_year |
| ORDER BY d_year ASC, REVENUE DESC; |
| |
| --Q3.3 |
| SELECT |
| c_city, |
| s_city, |
| d_year, |
| SUM(lo_revenue) AS REVENUE |
| FROM customer, lineorder, supplier, dates |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_orderdate = d_datekey |
| AND ( |
| c_city = 'UNITED KI1' |
| OR c_city = 'UNITED KI5' |
| ) |
| AND ( |
| s_city = 'UNITED KI1' |
| OR s_city = 'UNITED KI5' |
| ) |
| AND d_year >= 1992 |
| AND d_year <= 1997 |
| GROUP BY c_city, s_city, d_year |
| ORDER BY d_year ASC, REVENUE DESC; |
| |
| --Q3.4 |
| SELECT |
| c_city, |
| s_city, |
| d_year, |
| SUM(lo_revenue) AS REVENUE |
| FROM customer, lineorder, supplier, dates |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_orderdate = d_datekey |
| AND ( |
| c_city = 'UNITED KI1' |
| OR c_city = 'UNITED KI5' |
| ) |
| AND ( |
| s_city = 'UNITED KI1' |
| OR s_city = 'UNITED KI5' |
| ) |
| AND d_yearmonth = 'Dec1997' |
| GROUP BY c_city, s_city, d_year |
| ORDER BY d_year ASC, REVENUE DESC; |
| |
| --Q4.1 |
| SELECT |
| d_year, |
| c_nation, |
| SUM(lo_revenue - lo_supplycost) AS PROFIT |
| FROM dates, customer, supplier, part, lineorder |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_partkey = p_partkey |
| AND lo_orderdate = d_datekey |
| AND c_region = 'AMERICA' |
| AND s_region = 'AMERICA' |
| AND ( |
| p_mfgr = 'MFGR#1' |
| OR p_mfgr = 'MFGR#2' |
| ) |
| GROUP BY d_year, c_nation |
| ORDER BY d_year, c_nation; |
| |
| --Q4.2 |
| SELECT |
| d_year, |
| s_nation, |
| p_category, |
| SUM(lo_revenue - lo_supplycost) AS PROFIT |
| FROM dates, customer, supplier, part, lineorder |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_partkey = p_partkey |
| AND lo_orderdate = d_datekey |
| AND c_region = 'AMERICA' |
| AND s_region = 'AMERICA' |
| AND ( |
| d_year = 1997 |
| OR d_year = 1998 |
| ) |
| AND ( |
| p_mfgr = 'MFGR#1' |
| OR p_mfgr = 'MFGR#2' |
| ) |
| GROUP BY d_year, s_nation, p_category |
| ORDER BY d_year, s_nation, p_category; |
| |
| --Q4.3 |
| SELECT |
| d_year, |
| s_city, |
| p_brand, |
| SUM(lo_revenue - lo_supplycost) AS PROFIT |
| FROM dates, customer, supplier, part, lineorder |
| WHERE |
| lo_custkey = c_custkey |
| AND lo_suppkey = s_suppkey |
| AND lo_partkey = p_partkey |
| AND lo_orderdate = d_datekey |
| AND s_nation = 'UNITED STATES' |
| AND ( |
| d_year = 1997 |
| OR d_year = 1998 |
| ) |
| AND p_category = 'MFGR#14' |
| GROUP BY d_year, s_city, p_brand |
| ORDER BY d_year, s_city, p_brand; |
| |
| ``` |