
 =====================
 Running Your Code
 =====================

 Writing functions is great, but it's meaningless if you have no way to execute them. We use "drivers" to execute the code.
 We currently have a single Hamilton Driver that is responsible for the following:

 #. Crawling your Python modules to extract functions to turn into nodes in the DAG.
 #. Running the DAG.
 #. Assembling the results.
 #. Enabling you to visualize the DAG.

 It is provided as an easy way for the user to specify the data she wants without dealing with the complexities of DAGs,
 function graphs, or nodes.
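
 To make the first responsibility more concrete, here is a toy sketch of the idea behind crawling a module. It is not
 Hamilton's actual implementation: each public function becomes a node, and its parameter names identify the nodes (or
 external inputs) it depends on. The helper ``build_toy_dag`` and the in-memory ``my_functions`` module are
 hypothetical, and the real driver also handles decorators, type checking, and much more.

 .. code-block:: python

     import inspect
     import types
     from typing import Dict, List


     def build_toy_dag(module: types.ModuleType) -> Dict[str, List[str]]:
         """Toy crawler: map each public function in ``module`` to its parameter names."""
         dag: Dict[str, List[str]] = {}
         for name, fn in inspect.getmembers(module, inspect.isfunction):
             if name.startswith("_"):
                 continue  # private helpers do not become nodes
             # Each parameter names another node or an external input.
             dag[name] = list(inspect.signature(fn).parameters)
         return dag


     def spend_mean(spend: list) -> float:
         return sum(spend) / len(spend)


     def spend_zero_mean(spend: list, spend_mean: float) -> list:
         return [s - spend_mean for s in spend]


     # Hypothetical stand-in for "import my_functions", assembled in memory.
     my_functions = types.ModuleType("my_functions")
     my_functions.spend_mean = spend_mean
     my_functions.spend_zero_mean = spend_zero_mean

     print(build_toy_dag(my_functions))
     # {'spend_mean': ['spend'], 'spend_zero_mean': ['spend', 'spend_mean']}

 Note that ``spend`` appears only as a parameter. In this toy model no function produces it, so it would have to be
 supplied from outside the DAG.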

 The basic structure of using the Hamilton Driver is:

 .. code-block:: python

     import importlib

     from hamilton import base
     from hamilton import driver

     # 1. Set up the config. See the Parameterizing the DAG section for usage.
     config = {}

     # 2. Tell Hamilton where to load function definitions from.
     module_name = 'my_functions'
     module = importlib.import_module(module_name)  # or simply "import my_functions"

     # 3. Determine the return type -- the default is a pandas.DataFrame.
     adapter = base.SimplePythonDataFrameGraphAdapter()  # See the GraphAdapter docs for more details.

     # These all feed into creating the driver, and thus the DAG.
     dr = driver.Driver(config, module, adapter=adapter)

     # Lastly, to get something, call execute().
     result = dr.execute(['desired_output1', 'desired_output2'])

     # pip install sf-hamilton[visualization] for this next line to work:
     dr.visualize_execution(['desired_output1', 'desired_output2'], './my_file.dot', {})

 Note that the stock Hamilton driver is the API you use to execute your Hamilton dataflows. Before diving further into
 how to call the driver, let's cover DAG parameterization.

 .. _parameterizing-the-dag:

 Parameterizing the DAG
 ------------------------------

 Static dataflows are only so useful. In the real world, we need to be able to configure both the shape of the DAG and
 its inputs through the Hamilton driver. The default Hamilton driver comes with three input types that you can control.
 All three take the form ``Dict[str, Any]``, i.e. a dictionary of string keys that map to values of any type.

 #. **config** The config is a dictionary of strings to values. It is passed into the constructor of the Hamilton driver, as it is required to create the DAG. It *also* gets passed into the DAG at runtime, so you have access to parameter values. See :doc:`decorators-overview`, as well as the examples below, for how the config can be used.

 #. **inputs** The *runtime inputs* to the DAG. Their keys must be disjoint from those of the config: overriding the config makes no sense here, as the DAG has already been constructed assuming fixed config values.

 #. **overrides** Values that override nodes in the DAG. During execution, nothing upstream of an overridden node is computed.
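
 The toy sketch below models only the rules stated above, not Hamilton's internals. It shows how the three
 dictionaries could combine into the values known before execution starts. The function ``starting_values`` is
 hypothetical.

 .. code-block:: python

     from typing import Any, Dict, Optional


     def starting_values(
         config: Dict[str, Any],
         inputs: Optional[Dict[str, Any]] = None,
         overrides: Optional[Dict[str, Any]] = None,
     ) -> Dict[str, Any]:
         """Toy model: merge config, runtime inputs, and overrides."""
         inputs = inputs or {}
         overrides = overrides or {}
         clash = sorted(set(config) & set(inputs))
         if clash:
             # The DAG was built assuming these config values, so inputs may not redefine them.
             raise ValueError(f"inputs must be disjoint from config, got: {clash}")
         # Overrides are applied last: those nodes are taken as given rather than computed.
         return {**config, **inputs, **overrides}


     values = starting_values(
         config={"business_line": "US"},
         inputs={"region": "US"},
         overrides={"spend": [40, 80, 100]},
     )
     print(values)
     # {'business_line': 'US', 'region': 'US', 'spend': [40, 80, 100]}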

 Let's go through some examples that show you how to write Hamilton functions whose behavior can be controlled from the
 driver.

 **You have a DAG for region and business line**, where the rolling average for marketing spend is computed differently
 (see :doc:`../getting-started/index` for the motivating example). In this case, you'll define the DAG as follows:

 .. code-block:: python

     import pandas as pd

     from hamilton.function_modifiers import config

     @config.when(business_line='CA')
     def avg_rolling_spend__CA(spend: pd.Series) -> pd.Series:
         """Rolling average of spend in the Canada region."""
         return spend.rolling(3).mean()

     @config.when(business_line='US')
     def avg_rolling_spend__US(spend: pd.Series) -> pd.Series:
         """Rolling average of spend in the US region."""
         return spend.rolling(2).mean()

 When the graph is compiled, the implementation of ``avg_rolling_spend`` varies based on the configuration value.
 You would construct the driver with ``config={'business_line': 'US'}`` to get the desired behavior.
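
 Conceptually, the candidates share the node name in front of the double underscore, and the config decides which one
 survives when the DAG is built. The toy resolver below sketches that selection under simplified assumptions. It is not
 how the ``config.when`` decorator is implemented, and ``resolve_config_when`` and its
 ``(required_config, function)`` pairs are hypothetical.

 .. code-block:: python

     from typing import Any, Callable, Dict, List, Tuple

     # Each candidate pairs the config it requires with its implementation.
     Candidate = Tuple[Dict[str, Any], Callable[..., Any]]


     def resolve_config_when(
         candidates: Dict[str, List[Candidate]], config: Dict[str, Any]
     ) -> Dict[str, Callable[..., Any]]:
         """Toy model: keep, per node name, the implementation whose requirements match config."""
         resolved: Dict[str, Callable[..., Any]] = {}
         for node_name, options in candidates.items():
             matches = [
                 fn
                 for required, fn in options
                 if all(config.get(key) == value for key, value in required.items())
             ]
             if len(matches) > 1:
                 raise ValueError(f"ambiguous implementations for {node_name!r}")
             if matches:
                 resolved[node_name] = matches[0]
         return resolved


     def avg_rolling_spend__CA(spend: list) -> list:
         return spend  # placeholder body for the sketch


     def avg_rolling_spend__US(spend: list) -> list:
         return spend  # placeholder body for the sketch


     candidates = {
         "avg_rolling_spend": [
             ({"business_line": "CA"}, avg_rolling_spend__CA),
             ({"business_line": "US"}, avg_rolling_spend__US),
         ]
     }
     chosen = resolve_config_when(candidates, {"business_line": "US"})
     print(chosen["avg_rolling_spend"].__name__)  # avg_rolling_spend__US

 With a config that matches neither candidate, this toy resolver produces no ``avg_rolling_spend`` node at all, so
 anything downstream of it could not be computed.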

 **You want to pass in the region/business line to change the behavior of a transform.** Say you have a big dataframe of
 marketing spend with a ``region`` column, and you want to filter it down to an individual region. You would define the
 transform function as follows.

 .. code-block:: python

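     import pandas as pd

     def avg_rolling_spend(spend_by_country: pd.DataFrame, region: str) -> pd.Series:
         """Rolling average of spend in the specified region."""
         # Keep only the rows for the requested region, then average the spend column.
         regional_spend = spend_by_country[spend_by_country.region == region].spend
         return regional_spend.rolling(3).mean()  # window size chosen for illustration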

 You would execute the driver with ``inputs={'region': 'US'}`` to get the desired behavior. You could *also* construct
 the DAG with ``config={'region': 'US'}``, since config values are passed into the DAG at runtime as well.

 **You want to override the value of a transform.** In this case, you can pass the new value into the driver's
 ``execute()`` function as an override. For example:

 .. code-block:: python

     import pandas as pd

     df = dr.execute(
         ['acquisition_cost'],
         overrides={'spend': pd.Series(
             [40, 80, 100, 400, 800, 1000],  # what if we increased the marketing spend?
             index=pd.date_range("2022-01-01", periods=6, freq="W"))})
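
 To see why nothing upstream of an override is computed, consider a toy recursive evaluator that counts how often each
 function runs. It assumes a simplified model in which a node's dependencies are its parameter names; ``evaluate`` and
 the three functions are hypothetical. Because ``spend`` is supplied as an override, neither ``spend`` nor
 ``raw_spend`` is ever called.

 .. code-block:: python

     import inspect
     from collections import Counter
     from typing import Any, Callable, Dict

     calls: Counter = Counter()  # how many times each function ran


     def raw_spend() -> list:
         calls["raw_spend"] += 1
         return [10, 20, 30]


     def spend(raw_spend: list) -> list:
         calls["spend"] += 1
         return [s * 2 for s in raw_spend]


     def spend_mean(spend: list) -> float:
         calls["spend_mean"] += 1
         return sum(spend) / len(spend)


     def evaluate(name: str, funcs: Dict[str, Callable[..., Any]], known: Dict[str, Any]) -> Any:
         """Toy evaluator: compute ``name``, reusing anything already in ``known``."""
         if name in known:
             return known[name]  # overridden or already computed: stop here
         fn = funcs[name]
         kwargs = {p: evaluate(p, funcs, known) for p in inspect.signature(fn).parameters}
         known[name] = fn(**kwargs)
         return known[name]


     funcs = {f.__name__: f for f in (raw_spend, spend, spend_mean)}
     result = evaluate("spend_mean", funcs, known={"spend": [40, 80, 120]})
     print(result, dict(calls))  # 80.0 {'spend_mean': 1}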


 Calling Execute()
 #################

 There are two ways to use ``execute()``:

 #. Call it once -- you only request the outputs required. E.g. ``dr.execute(['desired_output1', 'desired_output2'])``
 #. Call it in succession by providing it specific inputs, in addition to the outputs required. E.g. ``dr.execute(['desired_output1', 'desired_output2'], inputs={...})``

 We recommend option (1) where possible. Option (2) only makes sense if you want to reuse the same dataflow for
 different datasets, process large data in chunks, or iterate over objects such as images or text.
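
 Option (2) typically takes the form of a loop. The helper below is a sketch that assumes only the
 ``execute(outputs, inputs=...)`` call shape used above. The helper name, its ``input_name`` parameter, and the
 ``StubDriver`` used to exercise it are hypothetical; with a real driver, each yielded result would be whatever
 ``execute()`` returns.

 .. code-block:: python

     from typing import Any, Dict, Iterator, List, Sequence


     def execute_in_chunks(
         dr: Any,
         outputs: List[str],
         records: Sequence[Any],
         chunk_size: int,
         input_name: str,
     ) -> Iterator[Any]:
         """Reuse one driver, and thus one DAG, for successive slices of the data."""
         for start in range(0, len(records), chunk_size):
             chunk = records[start:start + chunk_size]
             # Only the runtime inputs change between calls; the DAG is built once.
             yield dr.execute(outputs, inputs={input_name: chunk})


     class StubDriver:
         """Hypothetical stand-in with the same call shape as ``dr.execute()``."""

         def execute(self, outputs: List[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
             # Ignores ``outputs`` for brevity and always computes a "total".
             return {"total": sum(inputs["data"])}


     results = list(execute_in_chunks(StubDriver(), ["total"], list(range(10)), 4, "data"))
     print(results)  # [{'total': 6}, {'total': 22}, {'total': 17}]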

 Visualizing Execution
 #####################

 Hamilton enables you to quickly and easily visualize your entire DAG, as well as the specific execution path needed to
 compute an output. Under the hood, it uses `graphviz <https://graphviz.org/>`_ for visualization by default.

 Visualize just the execution required to create outputs
 *******************************************************

 .. code-block:: python

     render_args = {}  # extra arguments for rendering; an empty dict is the minimum
     dr.visualize_execution(['desired_output1', 'desired_output2'], './my_file.dot', render_args)

 In addition to specifying the outputs you want, you need to provide a path where the created dot file and image will be
 saved, and then some arguments for rendering. At minimum, pass in an empty dictionary.

 Visualize the entire DAG constructed
 ************************************

 .. code-block:: python

     render_args = {}  # optional extra arguments for rendering
     dr.display_all_functions('./my_file.dot', render_args)

 You need to provide a path to save the created dot file and image, and then provide some optional arguments for
 rendering.
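
 The ``.dot`` files written by these calls are plain Graphviz text. As a rough illustration of what such a file
 describes, and not the exact output Hamilton produces, the hypothetical helper ``to_dot`` below turns a
 node-to-dependencies mapping into DOT source.

 .. code-block:: python

     from typing import Dict, List


     def to_dot(deps: Dict[str, List[str]]) -> str:
         """Render a {node: [dependencies]} mapping as a Graphviz digraph."""
         lines = ["digraph {"]
         for node in sorted(deps):
             lines.append(f'    "{node}"')
             for dep in deps[node]:
                 # Edges point from a dependency to the node that consumes it.
                 lines.append(f'    "{dep}" -> "{node}"')
         lines.append("}")
         return "\n".join(lines)


     dot_source = to_dot({"spend_mean": ["spend"], "spend_zero_mean": ["spend", "spend_mean"]})
     print(dot_source)

 Running ``dot -Tpng my_file.dot -o my_file.png`` with the Graphviz command-line tools would render such a file as an
 image.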

 Should I define my own Driver?
 ------------------------------

 The APIs that the Hamilton Driver is built on are considered internal. So while it is possible to define your own
 driver in place of the stock Hamilton driver, if you don't like how the current Hamilton Driver interface is designed,
 we suggest the following path instead:

 *Write a "wrapper" class that delegates to the Hamilton Driver.*

 For example:

 .. code-block:: python

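     from types import ModuleType
     from typing import Any, Dict, List, Optional

     import pandas as pd

     from hamilton import driver


     class MyCustomDriver:
         def __init__(self, config: Dict[str, Any], module: ModuleType, outputs: List[str]):
             self.config = config
             self.module = module
             self.outputs = outputs

         # Some internal functions specific to your context, e.g.:
         def _augment(self, df: pd.DataFrame) -> pd.DataFrame:
             """Hypothetical context-specific post-processing; a no-op here."""
             return df

         def my_execute_function(self, inputs: Optional[Dict[str, Any]] = None) -> pd.DataFrame:
             """What actually calls Hamilton."""
             dr = driver.Driver(self.config, self.module)
             df = dr.execute(self.outputs, inputs=inputs)
             return self._augment(df)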

 That way, you can create the right API constructs to invoke Hamilton in your context, and then delegate to the stock
 Hamilton Driver. Doing so helps ensure that your code continues to work, since we intend to keep the Hamilton Driver
 APIs backwards compatible as much as possible.