
# Lazy threadpool execution


This example differs from the other examples under /parallelism/ in that it demonstrates how to use an adapter that runs each function in a threadpool, enabling lazy DAG evaluation and parallel execution. This is useful when you have many functions performing I/O-bound tasks and want to speed up execution, e.g. making lots of HTTP requests, reading/writing to disk, or calling LLM APIs.
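The underlying idea can be illustrated with the standard library alone: submitting independent I/O-bound functions to a `ThreadPoolExecutor` lets their waits overlap, so total wall time approaches the slowest node rather than the sum. A minimal sketch with hypothetical functions (`time.sleep` stands in for I/O; this is not the adapter's actual implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical I/O-bound "nodes": each simulates a slow call with sleep.
def fetch_a() -> int:
    time.sleep(0.2)
    return 1

def fetch_b() -> int:
    time.sleep(0.2)
    return 2

def combine(a: int, b: int) -> int:
    return a + b

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Independent nodes are submitted eagerly and run concurrently...
    fa = pool.submit(fetch_a)
    fb = pool.submit(fetch_b)
    # ...while the dependent node only blocks when it needs the results.
    result = combine(fa.result(), fb.result())
elapsed = time.perf_counter() - start

print(result)   # 3
print(elapsed)  # roughly 0.2s, not the 0.4s a sequential run would take
```

The adapter applies the same principle to every function in the DAG, so you get this overlap without restructuring your code.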

Note: this adapter does not support DAGs with Parallelizable and Collect functions; create an issue if you need this feature.

## DAG

The above image shows the DAG that will be executed. You can see from the structure that the DAG can be parallelized, i.e. the leftmost nodes can be executed in parallel.

When you execute run.py, you will see output that shows:

  1. The DAG running in parallel -- check the image against what is printed.
  2. The DAG logging to the Apache Hamilton UI -- please adjust for your project.
  3. The DAG running without the adapter -- this is to show the difference in execution time.
  4. An async version of the DAG running in parallel -- this is to show that the performance of this approach is similar.
```bash
python run.py
```
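For item 4, the async variant achieves the same overlap with `asyncio` instead of threads. A minimal standalone sketch with hypothetical coroutines (not the example's `my_functions_async` module):

```python
import asyncio
import time

# Hypothetical async "nodes": awaitable sleeps stand in for I/O.
async def fetch_a() -> int:
    await asyncio.sleep(0.2)
    return 1

async def fetch_b() -> int:
    await asyncio.sleep(0.2)
    return 2

async def main() -> tuple[int, float]:
    start = time.perf_counter()
    # gather() runs the independent coroutines concurrently.
    a, b = await asyncio.gather(fetch_a(), fetch_b())
    return a + b, time.perf_counter() - start

result, elapsed = asyncio.run(main())
print(result)   # 3
print(elapsed)  # roughly 0.2s, comparable to the threadpool approach
```

Both approaches hide I/O latency; the threadpool adapter has the advantage of working with ordinary synchronous functions.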

To use this adapter:

```python
from hamilton import driver
from hamilton.plugins import h_threadpool

# import your Hamilton functions
import my_functions

# create the adapter
adapter = h_threadpool.FutureAdapter()

# create a driver
dr = (
    driver.Builder()
    .with_modules(my_functions)
    .with_adapters(adapter)
    .build()
)
# execute; if the DAG can be parallelized, it will be
dr.execute(["s", "x", "a"])
```