
TPC-H

We've represented a few TPC-H queries using PySpark + Hamilton.

While we have not optimized these for benchmarking, they provide a good set of examples for how to express PySpark logic and break it into Hamilton functions.
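
To give a feel for what "breaking logic into Hamilton functions" means, here is a minimal sketch of the naming convention Hamilton relies on: a function's name is the node it produces, and its parameter names are the nodes it depends on. To keep it runnable without Spark or Hamilton installed, it makes two loud assumptions: plain lists of dicts stand in for Spark DataFrames, and `resolve` is a toy stand-in for Hamilton's driver, not its real API. The sample rows and function names are hypothetical and are not taken from the query files in this directory.

```python
import inspect
from collections import defaultdict

# Hypothetical rows standing in for a Spark DataFrame of TPC-H lineitem data.
LINEITEM = [
    {"l_returnflag": "A", "l_shipdate": "1998-08-01", "l_extendedprice": 100.0},
    {"l_returnflag": "N", "l_shipdate": "1998-09-01", "l_extendedprice": 250.0},
    {"l_returnflag": "A", "l_shipdate": "1998-12-15", "l_extendedprice": 999.0},
    {"l_returnflag": "N", "l_shipdate": "1998-07-04", "l_extendedprice": 50.0},
]


def shipped_lineitem(lineitem: list, cutoff_date: str) -> list:
    """Rows shipped on or before the cutoff (ISO dates compare correctly as strings)."""
    return [row for row in lineitem if row["l_shipdate"] <= cutoff_date]


def revenue_by_returnflag(shipped_lineitem: list) -> dict:
    """Sum of l_extendedprice per l_returnflag; depends on the node above by name."""
    totals = defaultdict(float)
    for row in shipped_lineitem:
        totals[row["l_returnflag"]] += row["l_extendedprice"]
    return dict(totals)


def resolve(name: str, functions: dict, inputs: dict):
    """Toy stand-in for a Hamilton driver (an assumption, not Hamilton's API).

    Computes a node by recursively resolving each parameter name, either
    from the supplied inputs or from another function with that name.
    """
    if name in inputs:
        return inputs[name]
    fn = functions[name]
    kwargs = {
        param: resolve(param, functions, inputs)
        for param in inspect.signature(fn).parameters
    }
    return fn(**kwargs)


functions = {f.__name__: f for f in (shipped_lineitem, revenue_by_returnflag)}
result = resolve(
    "revenue_by_returnflag",
    functions,
    {"lineitem": LINEITEM, "cutoff_date": "1998-09-02"},
)
print(result)
```

The point of the split is that each step is a small, named, individually testable function, and the dependency graph falls out of the signatures rather than being wired up by hand.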

Running

Use run.py to run a few of the queries. You'll have to generate the data on your own first, which is a bit tricky.

Download dbgen from https://www.tpc.org/tpch/ and follow the instructions there. You can also reach out to us and we'll help you get set up.
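
One thing that trips people up: dbgen writes pipe-delimited `.tbl` files in which every row ends with a trailing `|`, so a naive CSV read produces an extra empty column. The sketch below shows one way to handle that using only the standard library. The `read_tbl` helper and the sample rows are hypothetical, and this is not necessarily how the loaders in this directory read the data; the column names are those of the TPC-H REGION table.

```python
import csv
import io

# Column names for the TPC-H REGION table.
REGION_COLUMNS = ["r_regionkey", "r_name", "r_comment"]


def read_tbl(lines, columns):
    """Parse pipe-delimited dbgen rows into dicts (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file object.
    All values are returned as strings; casting is left to the caller.
    """
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # The trailing '|' yields one extra empty field; drop it.
        if len(fields) == len(columns) + 1 and fields[-1] == "":
            fields = fields[:-1]
        rows.append(dict(zip(columns, fields)))
    return rows


# Made-up sample in dbgen's layout; real files would be opened with open(path, newline="").
sample = io.StringIO(
    "0|AFRICA|hypothetical comment text|\n"
    "1|AMERICA|another hypothetical comment|\n"
)
regions = read_tbl(sample, REGION_COLUMNS)
print(regions[0]["r_name"])
```

If you load the files with Spark directly, the same trailing-delimiter caveat applies: expect one extra empty column unless you supply a schema or drop it afterwards.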