We've represented a few TPC-H queries using pyspark + hamilton.
While we have not optimized these for benchmarking, they provide a good set of examples of how to express pyspark logic and break it into hamilton functions.
To run a few of the queries, use run.py. That said, you'll have to generate the data on your own, which is a bit tricky.
Download dbgen from https://www.tpc.org/tpch/ and follow the instructions there. You can also reach out to us and we'll help you get set up.
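
Once dbgen has produced its files, it can help to sanity-check one before pointing anything at it. The sketch below assumes dbgen's usual output format, which is pipe-delimited .tbl files with a trailing | at the end of every row. The helper read_tbl is hypothetical and is not part of run.py; how run.py itself expects the data to be laid out is not covered here.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

# Column names for the nation table, as listed in the TPC-H specification.
NATION_COLUMNS = ["n_nationkey", "n_name", "n_regionkey", "n_comment"]


def read_tbl(handle: TextIO, columns: List[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield one dict per row of a dbgen .tbl file."""
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_number, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        # The trailing "|" shows up as one extra empty field; drop it.
        if len(fields) == len(columns) + 1 and fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(columns):
            raise ValueError(
                f"row {line_number}: expected {len(columns)} fields, got {len(fields)}"
            )
        yield dict(zip(columns, fields))


# Two rows shaped like nation.tbl; the comment text is made up.
sample = io.StringIO(
    "0|ALGERIA|0|example comment|\n"
    "1|ARGENTINA|1|another comment|\n"
)
rows = list(read_tbl(sample, NATION_COLUMNS))
print(rows[0]["n_name"])  # ALGERIA
```

For a real file, replace the in-memory sample with something like open("nation.tbl", newline=""), adjusting the path to wherever dbgen wrote its output. All values come back as strings, so any type conversion is left to the caller.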