| ============= |
| Visualization |
| ============= |
| |
| After assembling the dataflow, several visualization features become available to the Driver. Hamilton dataflow visualizations are great for documentation because they are directly derived from the code. |
| |
| On this page, you'll learn: |
| |
| - the available visualization functions |
| - how to answer lineage questions |
| - how to apply a custom style to your visualization |
| |
| For this page, we'll assume we have the following dataflow and Driver: |
| |
| .. code-block:: python |
| |
| # my_dataflow.py |
| def A() -> int: |
| """Constant value 35""" |
| return 35 |
| |
| def B(A: int) -> float: |
| """Divide A by 3""" |
| return A / 3 |
| |
| def C(A: int, B: float) -> float: |
| """Square A and multiply by B""" |
| return A**2 * B |
| |
| def D(A: int) -> str: |
| """Say `hello` A times""" |
| return "hello " |
| |
| def E(D: str) -> str: |
| """Say hello*A world""" |
| return D + "world" |
| |
| # run.py |
| from hamilton import driver |
| import my_dataflow |
| |
| dr = driver.Builder().with_modules(my_dataflow).build() |
| |
| |
| Available visualizations |
| ------------------------ |
| |
| View full dataflow |
| ~~~~~~~~~~~~~~~~~~ |
| |
| During development and for documentation, it's most useful to view the full dataflow and all nodes. |
| |
| .. code-block:: python |
| |
| dr.display_all_functions(...) |
| |
| .. image:: _visualization/display_all.png |
| :height: 200px |
| |
| View executed dataflow |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Visualizing exactly which nodes were executed is more helpful than viewing the full dataflow when logging driver execution (e.g., ML experiments). |
| |
| You should produce the visualization before executing the dataflow. Otherwise, the figure won't be generated if the execution fails first. |
| |
| .. code-block:: python |
| |
| # pull variables to ensure .execute() and |
| # .visualize_execution() receive the same |
| # arguments |
| final_vars = ["A", "C", "E"] |
| inputs = dict() |
| overrides = dict(B=36.1) |
| |
| dr.visualize_execution( |
| final_vars=final_vars, |
| inputs=inputs, |
| overrides=overrides, |
| ) |
| dr.execute( |
| final_vars=final_vars, |
| inputs=inputs, |
| overrides=overrides, |
| ) |
| |
| .. image:: _visualization/execution.png |
| :height: 250px |
| |
| An equivalent method is available if you're using materialization. |
| |
| .. code-block:: python |
| |
| materializer = to.json( |
| path="./out.json", |
| dependencies=["C", "E"], |
| combine=base.DictResult(), |
| id="results_to_json", |
| ) |
| additional_vars = ["A"] |
| inputs = dict() |
| overrides = dict(B=36.1) |
| |
| dr.visualize_materialization( |
| materializer, |
| additional_vars=additional_vars, |
| inputs=inputs, |
| overrides=dict(B=36.1), |
| output_file_path="dag.png" |
| ) |
| dr.materialize( |
| materializer, |
| additional_vars=additional_vars, |
| inputs=inputs, |
| overrides=dict(B=36.1), |
| ) |
| |
| .. image:: _visualization/materialization.png |
| :height: 250px |
| |
| |
| Learn more about :doc:`materialization`. |
| |
| View node dependencies |
| ---------------------- |
| |
| Representing data pipelines, ML experiments, or LLM applications as a dataflow helps reason about the dependencies between operations. The Hamilton Driver has the following utilities to select and return a list of nodes (to learn more :doc:`../how-tos/use-hamilton-for-lineage`): |
| |
| - ``.what_is_upstream_of(*node_names: str)`` |
| - ``.what_is_downstream_of(*node_names: str)`` |
| - ``.what_is_the_path_between(upstream_node_name: str, downstream_node_name: str)`` |
| |
| These functions are wrapped into their visualization counterparts: |
| |
| Display ancestors of ``B``: |
| |
| .. code-block:: python |
| |
| dr.display_upstream(["B"]) |
| |
| .. image:: _visualization/upstream.png |
| :height: 200px |
| |
| Display descendants of ``D`` and its immediate parents (``A`` only). |
| |
| .. code-block:: python |
| |
| dr.display_downstream(["D"]) |
| |
| .. image:: _visualization/downstream.png |
| :height: 200px |
| |
| Filter nodes to the necessary path: |
| |
| .. code-block:: python |
| |
| dr.visualize_path-between("A", "C") |
| # dr.visualize_path-between("C", "D") would return |
| # ValueError: No path found between C and D. |
| |
| .. image:: _visualization/between.png |
| :height: 200px |
| |
| Configure your visualization |
| ---------------------------- |
| |
| All of the above visualization functions share parameters to customize the visualization (e.g., hide legend, hide inputs). Learn more by reviewing the API reference for `Driver.display_all_functions() <https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.display_all_functions>`_; parameters should apply to all other visualizations. |
| |
| .. _custom-visualization-style: |
| |
| Apply custom style |
| ~~~~~~~~~~~~~~~~~~ |
| |
| By default, each node is labeled with name and type, and stylized (shape, color, outline, etc.). By passing a function to the parameter ``custom_style_function``, you can customize the node style based on its attributes. This pairs nicely with the ``@tag`` function modifier (learn more :ref:`tag-decorators`) |
| |
| Your own custom style function must: |
| |
| 1. Use only keyword arguments, taking in ``node`` and ``node_class``. |
| 2. Return a tuple ``(style, node_class, legend_name)`` where: |
| - ``style``: dictionary of valid graphviz node style attributes. |
| - ``node_class``: class used to style the default visualization - we recommend returning the input ``node_class`` |
| - ``legend_name``: text to display in the legend. Return ``None`` for no legend entry. |
| 3. For the execution-focused visualizations, your custom styles are applied before the modifiers for outputs and overrides are applied. |
| |
| If you need more customization, we suggest getting the graphviz object produced, and modifying it directly. |
| |
| This `online graphviz editor <https://edotor.net/>`_ can help you get started! |
| |
| .. code-block:: python |
| |
| def custom_style( |
| *, node: graph_types.HamiltonNode, node_class: str |
| ) -> Tuple[dict, Optional[str], Optional[str]]: |
| """Custom style function for the visualization. |
| |
| :param node: node that Hamilton is styling. |
| :param node_class: class used to style the default visualization |
| :return: a triple of (style, node_class, legend_name) |
| """ |
| if node.type in [float, int]: |
| style = ({"fillcolor": "aquamarine"}, node_class, "numbers") |
| |
| else: |
| style = ({}, node_class, None) |
| |
| return style |
| |
| dr.display_all_functions(custom_style_function=custom_style) |
| |
| |
| .. image:: _visualization/custom_style.png |
| :height: 250px |
| |
| |
| See the `full code example <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/styling_visualization>`_ for more details. |