# Scaling Hamilton on Spark
## Pyspark

If you're using pyspark, Hamilton allows for natural manipulation of pyspark dataframes,
with some special constructs for managing DAGs of UDFs.

See the example in `pyspark` to learn more.

## Pandas
If you're using Pandas, Hamilton scales by using Koalas on Spark.
Koalas officially became part of Spark in Spark 3.2, where it was renamed the pandas API on Spark (`pyspark.pandas`).
The example in `pandas_on_spark` assumes Spark 3.2 or later.
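
As a rough illustration of why this works, here is a minimal sketch of Hamilton-style functions written against the pandas API. The function and column names (`spend`, `signups`, `spend_per_signup`, `spend_zero_mean`) are made up for this sketch, and it runs on plain pandas only. The assumption is that the same operations are available on pandas-on-Spark series, so the logic would not need to change when the inputs come from `pyspark.pandas`; see `pandas_on_spark` for the actual driver setup.

```python
import pandas as pd


# Hamilton convention: the function name is the output column,
# and each parameter name is an input column (or another function's output).
def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Element-wise ratio of marketing spend to signups."""
    return spend / signups


def spend_zero_mean(spend: pd.Series) -> pd.Series:
    """Spend with its mean subtracted."""
    return spend - spend.mean()


# Local check with plain pandas (hypothetical data, no Spark involved).
spend = pd.Series([10.0, 20.0, 30.0])
signups = pd.Series([1, 2, 5])
result = spend_per_signup(spend, signups)
```

Calling the functions directly like this is only a quick local check; in a Hamilton project they would live in a module that the driver loads.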

## Pyspark UDFs
If you're not using Pandas, then you can use Hamilton to manage and organize your pyspark UDFs.
See the example in `pyspark_udfs`.
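
To give a feel for what "organizing UDFs" can look like, here is a minimal sketch of row-level functions written in the Hamilton style with plain Python types. The names (`spend`, `signups`, `spend_per_signup`, `spend_per_signup_rounded`) are hypothetical, and the sketch assumes each function would be applied to one row's values at a time. The code that registers such functions as Spark UDFs is not shown here; refer to `pyspark_udfs` for that.

```python
# Each function describes one derived column. Parameter names refer to
# input columns or to the outputs of other functions, which forms the DAG.
def spend_per_signup(spend: float, signups: float) -> float:
    """Spend divided by signups for a single row."""
    return spend / signups


def spend_per_signup_rounded(spend_per_signup: float) -> float:
    """Depends on the output of spend_per_signup above."""
    return round(spend_per_signup, 1)


# Plain-Python stand-in for row-by-row evaluation (hypothetical data, no Spark).
rows = [(10.0, 4.0), (9.0, 3.0)]
ratios = [spend_per_signup(spend, signups) for spend, signups in rows]
rounded = [spend_per_signup_rounded(r) for r in ratios]
```

Because the functions are ordinary Python, they can be unit tested without a Spark session.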

Note: we're looking to expand coverage and support for more Spark use cases. Please come find us, or open an issue,
if you have a use case that you'd like to see supported!