If you're using pyspark, Hamilton lets you manipulate pyspark dataframes naturally, and provides special constructs for managing DAGs of UDFs.
See the example in the pyspark directory to learn more.
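To give a feel for that example, here is a minimal sketch of the column-level construct. It assumes the with_columns decorator from hamilton.plugins.h_spark; the parameter names shown (columns_to_pass, select) and the function names are illustrative, so treat the pyspark example as the authoritative reference.

```python
# Minimal sketch (not the full example): column-level transforms written
# against pandas, applied to a pyspark dataframe via Hamilton's with_columns
# construct. Assumes hamilton.plugins.h_spark; decorator parameter names
# (columns_to_pass, select) and function names here are illustrative.
import pandas as pd
import pyspark.sql as ps

from hamilton.plugins import h_spark


def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """A column-level transform -- Hamilton runs this as a pandas UDF."""
    return spend / signups


@h_spark.with_columns(
    spend_per_signup,                      # the sub-DAG of UDFs to apply
    columns_to_pass=["spend", "signups"],  # dataframe columns the UDFs read
    select=["spend_per_signup"],           # computed columns to append
)
def with_metrics(raw_df: ps.DataFrame) -> ps.DataFrame:
    """Returns raw_df with the selected computed columns appended."""
    return raw_df
```

You would then build a driver over the module containing these functions and request with_metrics as an output, as the pyspark example does.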
If you're using Pandas, Hamilton scales by using Koalas on Spark. Koalas officially became part of Spark in Spark 3.2, where it was renamed the Pandas API on Spark. The example in pandas_on_spark assumes Spark 3.2 or later.
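The rough pattern (see pandas_on_spark for the authoritative version) is to keep your pandas-style functions as-is and hand the driver a Koalas/pandas-on-Spark graph adapter. In the sketch below, the adapter's module path and constructor arguments are assumptions that may vary by Hamilton version, and my_pandas_functions is a stand-in for your own transform module.

```python
# Minimal sketch, assuming SparkKoalasGraphAdapter lives in
# hamilton.plugins.h_spark (older versions expose it under
# hamilton.experimental.h_spark) and takes the arguments shown.
import pyspark.pandas as ps
from pyspark.sql import SparkSession

from hamilton import base, driver
from hamilton.plugins import h_spark

import my_pandas_functions  # stand-in for your existing pandas transform module

spark = SparkSession.builder.getOrCreate()

adapter = h_spark.SparkKoalasGraphAdapter(
    spark_session=spark,
    result_builder=base.PandasDataFrameResult(),  # how requested outputs are assembled
    spine_column="signups",  # illustrative: the column the outputs are joined against
)
dr = driver.Driver({}, my_pandas_functions, adapter=adapter)

# Inputs are pandas-on-Spark objects; the pandas-style functions run on them unchanged.
result = dr.execute(
    ["spend_per_signup"],  # illustrative output defined in my_pandas_functions
    inputs={
        "spend": ps.Series([10.0, 20.0, 30.0]),
        "signups": ps.Series([1, 2, 3]),
    },
)
print(result)
```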
If you're not using Pandas, then you can use Hamilton to manage and organize your pyspark UDFs. See the example in pyspark_udfs.
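The shape of that example, sketched below, assumes PySparkUDFGraphAdapter is available in hamilton.plugins.h_spark and that inputs are passed by mapping each column name to the dataframe; check pyspark_udfs for the exact convention, and note that my_udf_functions is a stand-in for your own module of pandas-series/scalar functions.

```python
# Minimal sketch, assuming PySparkUDFGraphAdapter and the
# "map each column name to the dataframe" input convention; both should be
# verified against the pyspark_udfs example for your Hamilton version.
import pandas as pd
from pyspark.sql import SparkSession

from hamilton import driver
from hamilton.plugins import h_spark

import my_udf_functions  # stand-in module of functions Hamilton turns into UDFs


def main() -> None:
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(pd.DataFrame({"spend": [10.0, 20.0], "signups": [1, 2]}))

    dr = driver.Driver({}, my_udf_functions, adapter=h_spark.PySparkUDFGraphAdapter())

    # Each requested output is computed as a UDF and appended as a column to df.
    result = dr.execute(["spend_per_signup"], inputs={col: df for col in df.columns})
    result.show()


if __name__ == "__main__":
    main()
```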
Note: we're looking to expand coverage and support for more Spark use cases. Please come find us, or open an issue, if you have a use case that you'd like to see supported!