========================
Available Graph Adapters
========================
This page lists the available Graph Adapters.

Use ``from hamilton import base`` to use these Graph Adapters:

.. list-table::
   :header-rows: 1

   * - Name
     - What it does
     - When you'd use it
   * - `base.SimplePythonDataFrameGraphAdapter <https://github.com/stitchfix/hamilton/blob/main/hamilton/base.py#L134>`_
     - This executes the Hamilton dataflow locally on a single machine in a single-threaded, single-process fashion. It assumes the result is a pandas DataFrame.
     - This is the default GraphAdapter that Hamilton uses. Use this when you want to execute on a single machine, without parallelization, and you want a pandas DataFrame as output.
   * - `base.SimplePythonGraphAdapter <https://github.com/stitchfix/hamilton/blob/main/hamilton/base.py#L149>`_
     - This executes the Hamilton dataflow locally on a single machine in a single-threaded, single-process fashion. It lets you specify a ResultBuilder to control the type of object that ``execute()`` returns.
     - Use this when you want to execute on a single machine, without parallelization, and you want to control the type of object that ``execute()`` returns.

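
To make the role of the ResultBuilder more concrete, here is a toy sketch that uses only the Python standard library. It is not Hamilton's implementation, and every name in it (``execute``, ``compute``, ``dict_result``, ``tuple_result``, and the example functions) is hypothetical. It only illustrates the idea that the same computed outputs can be handed to different builders to control what the caller gets back.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical dataflow: each function is a node, and each parameter name
# refers either to another node or to a user-supplied input.
def spend_per_signup(spend: float, signups: float) -> float:
    return spend / signups


def spend_doubled(spend: float) -> float:
    return spend * 2


FUNCTIONS: Dict[str, Callable[..., Any]] = {
    fn.__name__: fn for fn in (spend_per_signup, spend_doubled)
}


def compute(name: str, inputs: Dict[str, Any], cache: Dict[str, Any]) -> Any:
    """Compute one node, resolving its dependencies by parameter name."""
    if name in cache:
        return cache[name]
    if name in inputs:
        value = inputs[name]
    else:
        fn = FUNCTIONS[name]
        kwargs = {
            param: compute(param, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        value = fn(**kwargs)
    cache[name] = value
    return value


def execute(
    final_vars: List[str],
    inputs: Dict[str, Any],
    build_result: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Compute the requested outputs, then let the builder shape the result."""
    cache: Dict[str, Any] = {}
    outputs = {name: compute(name, inputs, cache) for name in final_vars}
    return build_result(outputs)


# Two stand-in "result builders": same outputs, different return types.
def dict_result(outputs: Dict[str, Any]) -> Dict[str, Any]:
    return dict(outputs)


def tuple_result(outputs: Dict[str, Any]) -> tuple:
    return tuple(outputs.values())


wanted = ["spend_per_signup", "spend_doubled"]
user_inputs = {"spend": 100.0, "signups": 4.0}
as_dict = execute(wanted, user_inputs, dict_result)
as_tuple = execute(wanted, user_inputs, tuple_result)
```

In this sketch the computation is identical in both calls; only the builder passed in changes the type of the returned object.
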
Experimental Graph Adapters
---------------------------
The following adapters are considered experimental: their APIs may change. That said, the code is stable,
and you should feel comfortable giving it a spin. Let us know how it goes and what the rough edges are, if
you find any.
Use ``from hamilton.experimental import h_[NAME]`` to use these Graph Adapters:

.. list-table::
   :header-rows: 1

   * - Name
     - What it does
     - When you'd use it
   * - `h_dask.DaskGraphAdapter <https://github.com/stitchfix/hamilton/blob/main/hamilton/experimental/h_dask.py#L21>`_
     - | This walks the graph and translates it to run on `Dask <https://dask.org/>`_.
       | You can pass a ResultMixin object to the constructor to control the return type that is produced by running on Dask.
     - Use this if you want to utilize multiple cores on a single machine, or you want to scale to large data set sizes with a Dask cluster that you can connect to.
   * - `h_ray.RayGraphAdapter <https://github.com/stitchfix/hamilton/blob/main/hamilton/experimental/h_ray.py#L12>`_
     - | This walks the graph and translates it to run on `Ray <https://ray.io/>`_.
       | You can pass a ResultMixin object to the constructor to control the return type that is produced by running on Ray.
     - Use this if you want to utilize multiple cores on a single machine, or you want to scale to larger data set sizes with a Ray cluster that you can connect to. Note: you are still constrained by machine memory size with Ray; you can't just scale to any data set size.
   * - `h_spark.SparkKoalasGraphAdapter <https://github.com/stitchfix/hamilton/blob/main/hamilton/experimental/h_spark.py#L25>`_
     - | This walks the graph and translates it to run on `Apache Spark <https://spark.apache.org/>`_ using the `Pandas API on Spark <https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html>`_ (aka `Koalas <https://koalas.readthedocs.io/en/latest>`_).
       | You can only return either a Koalas DataFrame or a pandas DataFrame. To do that, use either the stock `base.PandasDataFrameResult <https://github.com/stitchfix/hamilton/blob/main/hamilton/base.py#L39>`_ ResultMixin or the `h_spark.KoalasDataframeResult <https://github.com/stitchfix/hamilton/blob/main/hamilton/experimental/h_spark.py#L16>`_.
     - | You'd generally use this if you have an existing Spark cluster running in your workplace, and you want to scale to very large data set sizes.
       | Note that this GraphAdapter has only been tested on Spark 3.2+, the release in which Koalas became part of the standard Spark library.
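
The phrase "walks the graph and translates it to run on" a backend can be illustrated with a toy sketch. It is not how the Dask, Ray, or Spark adapters are implemented. A standard-library ``ThreadPoolExecutor`` stands in for the backend, and all names (``submit``, ``NODES``, and the example functions) are hypothetical. The sketch only shows the general shape: each node is handed to the backend as a task whose arguments are the not-yet-finished results of its dependencies.

```python
import inspect
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


# Hypothetical dataflow: parameter names refer to other nodes or inputs.
def incremented(x: int) -> int:
    return x + 1


def doubled(x: int) -> int:
    return x * 2


def total(incremented: int, doubled: int) -> int:
    return incremented + doubled


NODES: Dict[str, Callable[..., Any]] = {
    fn.__name__: fn for fn in (incremented, doubled, total)
}


def _resolve(value: Any) -> Any:
    """Wait for a Future to finish; pass plain input values through."""
    return value.result() if isinstance(value, Future) else value


def _run(fn: Callable[..., Any], deps: Dict[str, Any]) -> Any:
    """Executed on the backend: resolve dependencies, then call the node."""
    return fn(**{name: _resolve(dep) for name, dep in deps.items()})


def submit(
    name: str,
    inputs: Dict[str, Any],
    pool: ThreadPoolExecutor,
    submitted: Dict[str, Future],
) -> Any:
    """Walk the graph from `name`, submitting each node to the pool once."""
    if name in inputs:
        return inputs[name]
    if name not in submitted:
        fn = NODES[name]
        # Dependencies are submitted before the node that needs them.
        deps = {
            param: submit(param, inputs, pool, submitted)
            for param in inspect.signature(fn).parameters
        }
        submitted[name] = pool.submit(_run, fn, deps)
    return submitted[name]


with ThreadPoolExecutor(max_workers=2) as pool:
    result = submit("total", {"x": 3}, pool, {}).result()
```

Here ``incremented`` and ``doubled`` can run concurrently, and ``total`` waits for both. A real distributed backend tracks these dependencies itself rather than blocking a worker thread as this toy does.
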