======
Driver
======
Once you have defined your dataflow in a Python module, you need to create a Hamilton Driver to execute it. This page covers the Driver basics:
1. Defining the Driver
2. Visualizing the dataflow
3. Executing the dataflow
For this page, let's pretend we defined the following module ``my_dataflow.py``:

.. code-block:: python

    # my_dataflow.py
    def A() -> int:
        """Constant value 35"""
        return 35


    def B(A: int) -> float:
        """Divide A by 3"""
        return A / 3


    def C(A: int, B: float) -> float:
        """Square A and multiply by B"""
        return A**2 * B

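Because each node is a plain Python function, you can sanity-check the expected values by wiring the functions together by hand before involving Hamilton. The sketch below repeats the three functions from ``my_dataflow.py``; the lowercase variable names ``a``, ``b``, and ``c`` are only for illustration.

```python
def A() -> int:
    """Constant value 35"""
    return 35


def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3


def C(A: int, B: float) -> float:
    """Square A and multiply by B"""
    return A**2 * B


# Manual wiring: each parameter receives the result of the function
# that shares its name. This is the bookkeeping the Driver does for you.
a = A()
b = B(a)
c = C(a, b)
print(a, b, c)
```

Here ``c`` works out to ``35**3 / 3``, which is the value to expect when node ``C`` is requested from the Driver.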
Define the Driver
-----------------
First, you need to create a ``driver.Driver`` object. This is done by passing Python modules to a ``driver.Builder()`` object, along with any other configuration, and calling ``.build()``.
The most basic Driver is built like this:

.. code-block:: python

    # run.py
    from hamilton import driver
    import my_dataflow  # <- module containing functions to define dataflow

    # variable `dr` is of type `driver.Driver`
    # it is created by a `driver.Builder` object
    dr = driver.Builder().with_modules(my_dataflow).build()

The ``.build()`` method will fail if the definition found in ``my_dataflow`` is invalid (e.g., type mismatch, missing annotations), which lets you catch issues early and iterate quickly.
The ``Driver`` is defined in the context you intend to run, separately from your dataflow module. It can be in a script, notebook, server, web app, or anywhere else Python can run. As a convention, most Hamilton code examples use a script named ``run.py``.
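To make the kinds of errors caught at build time more concrete, here is a minimal sketch of such checks using only the standard library. It is not Hamilton's validation code: ``find_definition_errors`` is a hypothetical helper, and it compares types by strict equality, which is likely simpler than what the real Driver does.

```python
import inspect
from typing import Callable, Dict, List


def find_definition_errors(functions: Dict[str, Callable]) -> List[str]:
    """Hypothetical checker for illustration; not Hamilton's validation code."""
    errors = []
    for name, fn in functions.items():
        signature = inspect.signature(fn)
        if signature.return_annotation is inspect.Signature.empty:
            errors.append(f"{name}: missing return annotation")
        for param in signature.parameters.values():
            if param.annotation is inspect.Parameter.empty:
                errors.append(f"{name}: parameter '{param.name}' is not annotated")
                continue
            upstream = functions.get(param.name)
            if upstream is None:
                continue  # no matching function: treated as an external input here
            produced = inspect.signature(upstream).return_annotation
            if produced is not inspect.Signature.empty and produced != param.annotation:
                errors.append(
                    f"{name}: parameter '{param.name}' expects {param.annotation!r} "
                    f"but node '{param.name}' returns {produced!r}"
                )
    return errors


def A() -> int:
    return 35


def B(A: str) -> float:  # deliberately wrong: A returns int, not str
    return float(A)


def D(A):  # deliberately missing annotations
    return A


valid = find_definition_errors({"A": A})
invalid = find_definition_errors({"A": A, "B": B, "D": D})
print(valid)
print(invalid)
```

The first call reports no problems, while the second flags the type mismatch in ``B`` and the missing annotations in ``D``.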
Visualize the dataflow
----------------------
Once you have successfully created your Driver, you can visualize the entire dataflow with the following:

.. code-block:: python

    # run.py
    from hamilton import driver
    import my_dataflow

    dr = driver.Builder().with_modules(my_dataflow).build()
    dr.display_all_functions("dag.png")  # outputs a file dag.png
    dr.display_all_functions()  # to view directly in a notebook

Dataflow visualizations are useful for documenting your project and quickly making sense of what a dataflow does (see :doc:`visualization`).
Execute the dataflow
--------------------
From the Driver, you can request the value of specific nodes by calling ``dr.execute(final_vars=[...])``, where ``final_vars`` is a list of node names. By default, results are returned in a dictionary with ``{node_name: result}``.
The following requests the node ``C`` and visualizes the dataflow execution:

.. code-block:: python

    # run.py
    from hamilton import driver
    import my_dataflow

    dr = driver.Builder().with_modules(my_dataflow).build()
    dr.visualize_execution(["C"], "execute_c.png")
    results = dr.execute(["C"])

    print(results["C"])  # access results dictionary

The Driver automatically determines the minimum required path to compute requested nodes. See the respective outputs for ``dr.visualize_execution(["C"])`` and ``dr.visualize_execution(["B"])``:
.. image:: ../_static/execute_c.png
    :height: 250px

.. image:: ../_static/execute_b.png
    :height: 250px

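The idea of computing only what is needed can be sketched in a few lines of standard-library Python. This is an illustration under simplifying assumptions, not Hamilton's implementation: ``required_nodes`` and ``run_nodes`` are hypothetical helpers, and every parameter is assumed to name another function in the mapping (no external inputs).

```python
import inspect
from typing import Any, Callable, Dict, List, Set


def A() -> int:
    return 35


def B(A: int) -> float:
    return A / 3


def C(A: int, B: float) -> float:
    return A**2 * B


def required_nodes(functions: Dict[str, Callable], targets: List[str]) -> Set[str]:
    """Walk upstream from the targets, following parameter names."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        # Each parameter name refers to the upstream node of the same name.
        stack.extend(inspect.signature(functions[name]).parameters)
    return needed


def run_nodes(functions: Dict[str, Callable], targets: List[str]) -> Dict[str, Any]:
    """Compute only the targets and their dependencies, each node once."""
    cache: Dict[str, Any] = {}

    def compute(name: str) -> Any:
        if name not in cache:
            fn = functions[name]
            kwargs = {p: compute(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    # Results are keyed by node name, mirroring {node_name: result}.
    return {name: compute(name) for name in targets}


functions = {"A": A, "B": B, "C": C}
print(sorted(required_nodes(functions, ["B"])))  # ['A', 'B']
print(sorted(required_nodes(functions, ["C"])))  # ['A', 'B', 'C']
results = run_nodes(functions, ["C"])
print(results["C"])
```

Requesting ``B`` never touches ``C``, which matches the smaller graph shown for ``dr.visualize_execution(["B"])``.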
Development tips
----------------
With Hamilton, development time is mostly spent writing functions for your dataflow in a Python module. Rebuilding the Driver and visualizing your dataflow as you make changes supports iterative development. Two useful development workflows are described below.
With a Python module
~~~~~~~~~~~~~~~~~~~~
One approach is to define the dataflow and the Driver in the same file (e.g., ``my_dataflow.py``). Then, you can execute it as a script with ``python my_dataflow.py`` to rebuild the Driver and visualize your dataflow. This ensures your dataflow definition remains valid as you make changes.
For example:

.. code-block:: python

    # my_dataflow.py
    def A() -> int:
        """Constant value 35"""
        return 35

    # ... more functions

    # is True when calling `python my_dataflow.py`
    if __name__ == "__main__":
        from hamilton import driver
        # __main__ refers to the file itself
        # and yes, a file can import itself as a module!
        import __main__

        dr = driver.Builder().with_modules(__main__).build()
        dr.display_all_functions("dag.png")
        dr.execute(["C"])

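The reason this works is that ``__main__`` is an ordinary module object, so its functions can be discovered like those of any other module. The sketch below shows one way such discovery could look with the standard library; ``functions_defined_in`` is a hypothetical helper and not how Hamilton is implemented, and an in-memory module stands in for ``__main__`` so the example runs anywhere.

```python
import inspect
import types
from typing import Callable, Dict


def functions_defined_in(module: types.ModuleType) -> Dict[str, Callable]:
    """Hypothetical helper: collect functions whose home module is `module`."""
    return {
        name: fn
        for name, fn in inspect.getmembers(module, inspect.isfunction)
        if fn.__module__ == module.__name__  # skip functions imported from elsewhere
    }


# Stand-in for `import __main__`: a module object built in memory for the demo.
demo_module = types.ModuleType("demo_dataflow")
source = (
    "from os.path import join\n"  # imported function, should be ignored
    "def A() -> int:\n"
    "    return 35\n"
    "def B(A: int) -> float:\n"
    "    return A / 3\n"
)
exec(source, demo_module.__dict__)

found = functions_defined_in(demo_module)
print(sorted(found))  # ['A', 'B']
```

In a real script you would pass the module returned by ``import __main__`` instead of ``demo_module``.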
With a Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~
Another approach is to define the dataflow in a module (e.g., ``my_dataflow.py``) and reload the Driver in a Jupyter notebook. This allows for a more interactive experience when you want to inspect the results of functions as you're developing.
By default, Python imports a module only once, and subsequent ``import`` statements don't reload it. To pick up changes to the dataflow, reload the imported module with ``importlib.reload(my_dataflow)`` and rebuild the Driver.

.. code-block:: python

    # notebook.ipynb
    # %%cell 1
    import importlib
    from hamilton import driver
    import my_dataflow

    # %%cell 2
    # this will reload an already imported module
    importlib.reload(my_dataflow)
    # rebuild the `Driver` with the reloaded module and execute again
    dr = driver.Builder().with_modules(my_dataflow).build()
    dr.display_all_functions("dag.png")
    results = dr.execute(["C"])

    # %%cell 3
    # do something with results
    print(results["C"])

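The difference between a repeated ``import`` and ``importlib.reload`` can be observed without Hamilton. The sketch below writes a throwaway module to a temporary directory, edits it on disk, and compares the two; the module name ``reload_demo_dataflow`` is made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, written to a temporary directory for the demo.
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir) / "reload_demo_dataflow.py"
module_path.write_text("def A() -> int:\n    return 35\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import reload_demo_dataflow

first_value = reload_demo_dataflow.A()

# Edit the module on disk, as you would in your editor.
module_path.write_text("def A() -> int:\n    return 1000\n")

# A second `import` statement reuses the module already cached in sys.modules.
import reload_demo_dataflow

stale_value = reload_demo_dataflow.A()

# `importlib.reload` re-executes the module source and picks up the edit.
importlib.reload(reload_demo_dataflow)
fresh_value = reload_demo_dataflow.A()

print(first_value, stale_value, fresh_value)  # 35 35 1000
```

Any Driver built before the reload still holds the old functions, which is why the Driver is rebuilt right after ``importlib.reload`` in the notebook cell.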
Find more Jupyter development tips on the page :doc:`../how-tos/use-in-jupyter-notebook`.
Recap
-----
- The Driver automatically assembles a dataflow from Python modules
- The Driver visualizes the dataflow created from your code
- Functions are executed by requesting nodes through the Driver's ``.execute()`` method
Next step
---------
You now know the basics of authoring and executing Hamilton dataflows! We encourage you to:
- Write some code with our `interactive tutorials <https://www.tryhamilton.dev/intro>`_
- Kickstart your project with `community dataflows <https://hub.dagworks.io/docs/>`_
The next **Concepts** pages cover ideas that help you write more expressive and powerful code. If you feel stuck or constrained by the basics, it's probably a good time to (re)visit them. They include:
- Materialization: interact with external data sources
- Function modifiers: write expressive dataflows without repeating code
- Builder: how to customize your Driver