
 ======
 Driver
 ======

 Once you have defined your dataflow in a Python module, you need to create a Hamilton Driver to execute it. This page covers the Driver basics:

 1. Defining the Driver
 2. Visualizing the dataflow
 3. Executing the dataflow

 For this page, let's pretend we defined the following module ``my_dataflow.py``:

 .. code-block:: python

     # my_dataflow.py
     def A() -> int:
         """Constant value 35"""
         return 35

     def B(A: int) -> float:
         """Divide A by 3"""
         return A / 3

     def C(A: int, B: float) -> float:
         """Square A and multiply by B"""
         return A**2 * B
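
 To make the wiring concrete, here is a minimal plain-Python sketch, without Hamilton, of the computation this module describes. Each parameter name refers to the function of the same name, so ``C`` depends on both ``A`` and ``B``. The manual calls below are for illustration only; the Driver resolves these dependencies for you.

 .. code-block:: python

     def A() -> int:
         """Constant value 35"""
         return 35

     def B(A: int) -> float:
         """Divide A by 3"""
         return A / 3

     def C(A: int, B: float) -> float:
         """Square A and multiply by B"""
         return A**2 * B

     # resolve the dependencies by hand, in the order A -> B -> C
     a = A()
     b = B(A=a)
     c = C(A=a, B=b)
     print(a, b, c)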

 Define the Driver
 -----------------

 First, you need to create a ``driver.Driver`` object. This is done by passing Python modules to the ``driver.Builder()`` object, along with any other configuration, and calling ``.build()``.

 The most basic Driver is built like this:

 .. code-block:: python

     # run.py
     from hamilton import driver
     import my_dataflow  # <- module containing functions to define dataflow

     # variable `dr` is of type `driver.Driver`
     # it is created by a `driver.Builder` object
     dr = driver.Builder().with_modules(my_dataflow).build()

 The ``.build()`` method will fail if the definition found in ``my_dataflow`` is invalid (e.g., type mismatch, missing annotations). Failing at build time, before anything is executed, lets you fix issues and iterate quickly.

 The ``Driver`` is defined in the context you intend to run, separately from your dataflow module. It can be in a script, notebook, server, web app, or anywhere else Python can run. As a convention, most Hamilton code examples use a script named ``run.py``.

 Visualize the dataflow
 ----------------------

 Once you have successfully created your Driver, you can visualize the entire dataflow with the following:

 .. code-block:: python

     # run.py
     from hamilton import driver
     import my_dataflow

     dr = driver.Builder().with_modules(my_dataflow).build()
     dr.display_all_functions("dag.png")  # outputs a file dag.png
     dr.display_all_functions()  # to view directly in a notebook

 Dataflow visualizations are useful for documenting your project and quickly making sense of what a dataflow does (see :doc:`visualization`).

 Execute the dataflow
 --------------------

 From the Driver, you can request the value of specific nodes by calling ``dr.execute(final_vars=[...])``, where ``final_vars`` is a list of node names. By default, results are returned in a dictionary with ``{node_name: result}``.

 The following requests the node ``C`` and visualizes the dataflow execution:

 .. code-block:: python

     # run.py
     from hamilton import driver
     import my_dataflow

     dr = driver.Builder().with_modules(my_dataflow).build()
     dr.visualize_execution(["C"], "execute_c.png")
     results = dr.execute(["C"])

     print(results["C"])  # access results dictionary

 The Driver automatically determines the minimum required path to compute requested nodes. See the respective outputs for ``dr.visualize_execution(["C"])`` and ``dr.visualize_execution(["B"])``:

 .. image:: ../_static/execute_c.png
     :height: 250px
 .. image:: ../_static/execute_b.png
     :height: 250px
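
 The idea of a minimum required path can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Hamilton's actual implementation: the ``NODES`` registry and the ``required_nodes`` helper are hypothetical names introduced here, and dependencies are read from parameter names with ``inspect.signature``.

 .. code-block:: python

     import inspect

     def A() -> int:
         return 35

     def B(A: int) -> float:
         return A / 3

     def C(A: int, B: float) -> float:
         return A**2 * B

     # hypothetical registry mapping node names to functions
     NODES = {"A": A, "B": B, "C": C}

     def required_nodes(final_vars: list[str]) -> set[str]:
         """Collect the requested nodes and everything upstream of them."""
         needed: set[str] = set()
         stack = list(final_vars)
         while stack:
             name = stack.pop()
             if name in needed:
                 continue
             needed.add(name)
             # parameter names are the node's direct dependencies
             stack.extend(inspect.signature(NODES[name]).parameters)
         return needed

     print(sorted(required_nodes(["B"])))  # ['A', 'B']
     print(sorted(required_nodes(["C"])))  # ['A', 'B', 'C']

 Requesting ``B`` leaves ``C`` out entirely, which matches the difference between the two images above.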


 Development tips
 ----------------

 With Hamilton, development time is mostly spent writing functions for your dataflow in a Python module. Rebuilding the Driver and visualizing your dataflow as you make changes supports iterative development. Below are two useful development workflows.

 With a Python module
 ~~~~~~~~~~~~~~~~~~~~

 One approach is to define the dataflow and the Driver in the same file (e.g., ``my_dataflow.py``). Then, you can execute it as a script with ``python my_dataflow.py`` to rebuild the Driver and visualize your dataflow. This ensures your dataflow definition remains valid as you make changes.

 For example:

 .. code-block:: python

     # my_dataflow.py
     def A() -> int:
         """Constant value 35"""
         return 35

     # ... more functions

     # is True when calling `python my_dataflow.py`
     if __name__ == "__main__":
         from hamilton import driver
         # __main__ refers to the file itself
         # and yes, a file can import itself as a module!
         import __main__

         dr = driver.Builder().with_modules(__main__).build()
         dr.display_all_functions("dag.png")
         dr.execute(["C"])

 With a Jupyter notebook
 ~~~~~~~~~~~~~~~~~~~~~~~

 Another approach is to define the dataflow in a module (e.g., ``my_dataflow.py``) and reload the Driver in a Jupyter notebook. This allows for a more interactive experience when you want to inspect the results of functions as you're developing.

 By default, Python only imports a module once and subsequent ``import`` statements don't reload the module. We reload our imported module with ``importlib.reload(my_dataflow)`` and rebuild the Driver as we make changes to our dataflow.

 .. code-block:: python

     # notebook.ipynb
     # %%cell 1
     import importlib
     from hamilton import driver
     import my_dataflow

     # %%cell 2
     # this will reload an already imported module
     importlib.reload(my_dataflow)

     # rebuild the `Driver` with the reloaded module and execute again
     dr = driver.Builder().with_modules(my_dataflow).build()
     dr.display_all_functions("dag.png")
     results = dr.execute(["C"])

     # %%cell 3
     # do something with results
     print(results["C"])
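
 The reload behavior itself can be checked with the standard library alone. The sketch below writes a throwaway module (the name ``scratch_dataflow`` is hypothetical), edits it on disk, and shows that a second ``import`` keeps the old code while ``importlib.reload`` picks up the change. Bytecode caching is disabled here only so that the quick rewrite is reliably detected.

 .. code-block:: python

     import importlib
     import pathlib
     import sys
     import tempfile

     # avoid stale bytecode caches when the file is rewritten quickly
     sys.dont_write_bytecode = True

     # create a throwaway module in a temporary directory
     tmp_dir = tempfile.mkdtemp()
     module_path = pathlib.Path(tmp_dir) / "scratch_dataflow.py"
     module_path.write_text("def A() -> int:\n    return 35\n")

     sys.path.insert(0, tmp_dir)
     importlib.invalidate_caches()
     import scratch_dataflow

     before = scratch_dataflow.A()

     # edit the file, as you would while developing
     module_path.write_text("def A() -> int:\n    return 1000\n")

     import scratch_dataflow  # no effect: the module is already imported
     after_import = scratch_dataflow.A()

     importlib.reload(scratch_dataflow)  # re-executes the edited source
     after_reload = scratch_dataflow.A()

     print(before, after_import, after_reload)  # 35 35 1000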

 Learn other Jupyter development tips on the page :doc:`../how-tos/use-in-jupyter-notebook`.

 Recap
 -----
 - The Driver automatically assembles a dataflow from Python modules
 - The Driver visualizes the dataflow created from your code
 - Functions are executed by requesting nodes through the Driver's ``.execute()`` method

 Next step
 ---------
 You now know the basics of authoring and executing Hamilton dataflows! We encourage you to:

 - Write some code with our `interactive tutorials <https://www.tryhamilton.dev/intro>`_
 - Kickstart your project with `community dataflows <https://hub.dagworks.io/docs/>`_

 The next **Concepts** pages cover ideas that help you write more expressive and powerful code. If you feel stuck or constrained with the basics, it's probably a good time to (re)visit them. They include:

 - Materialization: interact with external data sources
 - Function modifiers: write expressive dataflows without repeating code
 - Builder: how to customize your Driver