<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# DataFusion Python Examples for TPC-H

These examples reproduce the queries defined in the Transaction Processing
Performance Council (TPC) TPC-H benchmark. They are meant to demonstrate
different aspects of DataFusion rather than to produce the most performant
queries possible. Each example includes a description of the problem it solves.
If you are familiar with SQL, you can compare the approaches in these examples
with the queries listed in the specification.

- https://www.tpc.org/tpch/

The examples provided are based on version 2.18.0 of the TPC-H specification.

## Data Setup

To run these examples, you must first generate a dataset. The `dbgen` tool
provided by the TPC can create datasets at an arbitrary scale. For testing, a
1 gigabyte dataset (TPC-H scale factor 1) is typically sufficient. For
convenience, this repository includes a script that uses Docker to create this
dataset. From the `benchmarks/tpch` directory, run the following script:

```bash
./tpch-gen.sh 1
```
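
The TPC-H specification defines most table sizes as a multiple of the scale factor (SF), while `nation` and `region` have a fixed size. Scale factor 1 corresponds to the roughly 1 gigabyte dataset mentioned above. The stdlib-only sketch below uses a hypothetical helper, `expected_row_counts`, to compute the nominal row counts for a given scale factor, which can serve as a quick sanity check on generated data. The `lineitem` figure is approximate in the specification because the number of line items per order varies.

```python
# Nominal rows per unit of scale factor, as given in the TPC-H specification.
ROWS_PER_SF = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,  # approximate: line items per order vary
}

# These two tables have a fixed size regardless of scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def expected_row_counts(scale_factor: float) -> dict:
    """Return the nominal row count of each TPC-H table at a scale factor."""
    counts = {name: round(rows * scale_factor) for name, rows in ROWS_PER_SF.items()}
    counts.update(FIXED_ROWS)
    return counts


for table, rows in expected_row_counts(1).items():
    print(f"{table:>10}: {rows:,}")
```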

The examples read the tables generated by `dbgen` from Parquet files. A Python
script is provided to convert the text files produced by `dbgen` into the
Parquet files the examples expect. From the `examples/tpch` directory, run the
following command to create them:

```bash
python convert_data_to_parquet.py
```
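
The text files written by `dbgen` (for example `nation.tbl`) are pipe-delimited, and every row ends with a trailing `|`, so a plain split yields an empty last field that a converter has to drop. The stdlib-only sketch below shows that parsing step with a hypothetical `parse_tbl_line` helper. It is not the code in `convert_data_to_parquet.py`, and the comment text in the sample rows is made up.

```python
import io


def parse_tbl_line(line: str) -> list:
    """Split one dbgen row on '|' and drop the empty field after the trailing '|'."""
    fields = line.rstrip("\r\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()
    return fields


# Two rows in the nation.tbl column layout: key|name|regionkey|comment|
# (the comment values here are made-up placeholders).
sample = io.StringIO(
    "0|ALGERIA|0|sample comment one|\n"
    "1|ARGENTINA|1|sample comment two|\n"
)
rows = [parse_tbl_line(line) for line in sample]
print(rows[1][:3])
```

From here, a converter would still need to cast each field to the column type given in the specification (integer, decimal, date) before writing Parquet with a library such as `pyarrow` or DataFusion.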

## Description of Examples

A description of the techniques demonstrated in each file is available in the
`README.md` file in the `examples` directory.