# Scaling Hamilton on Spark
## PySpark
If you're using PySpark, Hamilton lets you manipulate PySpark dataframes directly,
and provides some special constructs for managing DAGs of UDFs.
See the example in `pyspark` to learn more.
## Pandas
If you're using pandas, Hamilton scales by running on Koalas on Spark.
Koalas officially became part of Spark in version 3.2, where it was renamed the pandas API on Spark (`pyspark.pandas`).
The example in `pandas_on_spark` assumes Spark 3.2 or later.
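
To make the version assumption concrete, here is a minimal sketch of how you might decide which pandas-style module to import for a given Spark version. The helper name `pandas_api_module` is hypothetical and is not part of Hamilton or Spark; it only encodes the fact that the API lives in `pyspark.pandas` from Spark 3.2 onward and in the separate `databricks.koalas` package before that.

```python
def pandas_api_module(spark_version: str) -> str:
    """Return the module name providing the pandas-style API for a Spark version.

    Hypothetical helper for illustration only: Spark 3.2+ ships the API as
    `pyspark.pandas`; earlier versions need the separate `databricks.koalas` package.
    """
    # Only the major and minor components matter, e.g. "3.2.1" -> (3, 2).
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return "pyspark.pandas" if (major, minor) >= (3, 2) else "databricks.koalas"


# Example: pick the module name for two different Spark versions.
new_style = pandas_api_module("3.2.0")
old_style = pandas_api_module("3.1.2")
```

With PySpark installed, something like `importlib.import_module(pandas_api_module(pyspark.__version__))` would then load the matching module.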
## PySpark UDFs
If you're not using pandas, you can still use Hamilton to manage and organize your PySpark UDFs.
See the example in `pyspark_udfs`.
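
As a rough illustration of what "organizing UDFs" can look like, the sketch below writes column-level logic as plain Python functions in the Hamilton style, where the function name is the output column and the parameter names are the input columns. The function names, the column names, and the `compute_row` helper are all hypothetical; `compute_row` is a plain-Python stand-in for a real executor, not Hamilton's driver or its Spark integration, which the `pyspark_udfs` example shows.

```python
import inspect


# Hypothetical column-level functions: the name is the output column,
# the parameter names are the input columns it depends on.
def spend_per_signup(spend: float, signups: float) -> float:
    """Marketing spend divided by signups, for a single row."""
    return spend / signups


def spend_is_high(spend_per_signup: float) -> bool:
    """Flag rows whose spend per signup exceeds an arbitrary threshold."""
    return spend_per_signup > 10.0


def compute_row(row: dict, functions: list) -> dict:
    """Stand-in executor: resolve each function's inputs by parameter name.

    Assumes `functions` is already listed in dependency order.
    """
    result = dict(row)
    for fn in functions:
        kwargs = {name: result[name] for name in inspect.signature(fn).parameters}
        result[fn.__name__] = fn(**kwargs)
    return result


row = compute_row({"spend": 100.0, "signups": 5.0}, [spend_per_signup, spend_is_high])
```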
Note: we're looking to expand coverage and support for more Spark use cases. Please come find us, or open an issue,
if you have a use case that you'd like to see supported!