
Hamilton on Dask

Here we have a hello world example showing how you can take some Hamilton functions and then easily run them in a distributed setting via dask.
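
For context, Hamilton functions are plain Python functions whose parameter names refer to the outputs of other functions (or to inputs you provide at execution time); that is what makes them easy to hand off to a distributed executor. A minimal, illustrative sketch of that style — the function and column names here are hypothetical and not necessarily those defined in this example's modules:

```python
import pandas as pd

# Hypothetical Hamilton-style functions: each parameter name refers to
# another function's output (or an input provided at execution time).
def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend divided by signups, computed row-wise."""
    return spend / signups

def spend_zero_mean(spend: pd.Series) -> pd.Series:
    """Spend with its mean subtracted."""
    return spend - spend.mean()
```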

Note: please read this dask best practices post; don't scale if you don't need to.

Run pip install sf-hamilton[dask-complete] or pip install sf-hamilton dask[complete] to get the right dependencies to run this example.

File organization:

  • business_logic.py houses logic that should be invariant to how Hamilton is executed.
  • data_loaders.py houses logic to load data for the business_logic.py module. The idea is that you'd swap this module out for other ways of loading data or use @config.when to determine what to load.
  • run.py is the script that shows how you can swap in loading data from a dask dataframe and reuse your pandas-based Hamilton functions.
  • run_with_delayed.py shows how you can farm out computation of each function to dask via dask.delayed (see the sketch after this list).
  • run_with_delayed_and_dask_objects.py shows the combination of the above. It is slightly nonsensical, since we're effectively operating entirely on dask objects, but it shows the code pattern for using both.
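
To make the run_with_delayed.py pattern concrete, here is a minimal sketch of wiring the Hamilton driver to a local dask cluster. It assumes the Dask adapter lives at hamilton.plugins.h_dask (older releases exposed it as hamilton.experimental.h_dask), the adapter constructor arguments may differ slightly between Hamilton versions, and the requested output columns are hypothetical:

```python
from dask import distributed

from hamilton import base, driver
from hamilton.plugins import h_dask  # older Hamilton releases: hamilton.experimental.h_dask

import business_logic
import data_loaders

if __name__ == "__main__":
    # Spin up a local cluster for demo purposes; point at a real cluster in production.
    client = distributed.Client(distributed.LocalCluster())

    # The adapter wraps each Hamilton function in dask.delayed and assembles
    # the requested outputs into a pandas dataframe at the end.
    adapter = h_dask.DaskGraphAdapter(client, base.PandasDataFrameResult())

    dr = driver.Driver({}, data_loaders, business_logic, adapter=adapter)
    # Hypothetical output column names -- replace with outputs your modules define.
    df = dr.execute(["spend", "signups", "spend_per_signup"])
    print(df)

    client.close()
```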

Visualization of execution

Here is the graph of execution:

![hello_world_dask](hello_world_dask.png)