=======================
Caching logic
=======================

Caching Behavior
----------------

.. autoclass:: hamilton.caching.adapter.CachingBehavior
   :exclude-members: __init__, from_string
   :members:

   .. automethod:: from_string

``@cache`` decorator
----------------------

.. autoclass:: hamilton.function_modifiers.metadata.cache
   :special-members: __init__
   :private-members:
   :members:
   :exclude-members: BEHAVIOR_KEY, FORMAT_KEY

   .. autoattribute:: BEHAVIOR_KEY

   .. autoattribute:: FORMAT_KEY

Logging
-----------

.. autoclass:: hamilton.caching.adapter.CachingEvent
   :special-members: __init__
   :members:
   :inherited-members:

.. autoclass:: hamilton.caching.adapter.CachingEventType
   :special-members: __init__
   :members:
   :inherited-members:

Adapter
--------

.. autoclass:: hamilton.caching.adapter.HamiltonCacheAdapter
   :special-members: __init__
   :members:
   :inherited-members:

Quirks and limitations
----------------------
Caching is a large and complex feature. This section lists known and theoretical quirks and limitations to help with debugging and to guide feature development.

- The standard library includes many types that are not primitives, so Hamilton may not support all of them explicitly. Support should be simple to add, so ping us if you need it.
- The ``ResultStore`` could be architected better to support custom formats. Right now, we use a ``DataSaver`` to produce the ``.parquet`` file and we pickle the ``DataLoader`` for later retrieval. As a result, the metadata and result stores are completely unaware of the ``.parquet`` file, which makes cache eviction difficult to handle.
- When a function with default parameter values passes through lifecycle hooks, the default values are not part of the ``node_kwargs``. They need to be retrieved manually from the ``node.Node`` object.
- Supporting the Hamilton ``AsyncDriver`` would require making both the adapter and the stores async. A potential challenge is ensuring that the same cache (i.e., the same SQLite db and filesystem) can be used by both sync and async drivers.
- Since the ``@cache`` decorator allows specifying the ``format`` (e.g., ``json``, ``parquet``), we probably want ``.with_cache()`` to support the same feature.
- Hamilton allows a single ``do_node_execute()`` hook. Since the caching feature uses it, caching is currently incompatible with other adapters that rely on it: ``PDBDebugger``, ``CacheAdapter`` (deprecated), ``GracefulErrorAdapter`` (somewhat redundant with caching), ``DiskCacheAdapter`` (deprecated), and ``NarwhalsAdapter`` (could be refactored).
- The presence of MD5 hashing can be seen as a security risk and may prevent adoption. `Read more in the DVC issues <https://github.com/iterative/dvc/issues/3069>`_.
- When hitting the base case of ``fingerprinting.hash_value()``, we return the constant ``UNHASHABLE_VALUE``. If the adapter receives this value, it appends a random UUID to it to prevent collisions between unhashable types. The resulting ``data_version`` is no longer deterministic, but the value can still be retrieved or be part of another node's ``cache_key``.
- Having ``@functools.singledispatch(object)`` makes it possible to override the base case of ``hash_value()`` because it catches all types.
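The two fingerprinting notes above can be illustrated with a minimal, standard-library-only sketch. It is not Hamilton's implementation: the names ``UNHASHABLE``, ``data_version``, ``overridable_hash`` and ``Opaque`` are hypothetical, and SHA-256 is used purely for illustration. It shows a ``functools.singledispatch`` base case returning a sentinel, a caller appending a random UUID to that sentinel, and how registering an implementation for ``object`` replaces the base case.

```python
import functools
import hashlib
import uuid

# Hypothetical sentinel playing the role of ``UNHASHABLE_VALUE``.
UNHASHABLE = "<unhashable>"


@functools.singledispatch
def hash_value(obj) -> str:
    """Base case: reached for any type without a registered implementation."""
    return UNHASHABLE


@hash_value.register(str)
@hash_value.register(int)
@hash_value.register(float)
@hash_value.register(bool)
def _hash_primitive(obj) -> str:
    # Include the type name so that 1 and True do not share a hash.
    payload = f"{type(obj).__name__}:{obj!r}".encode()
    return hashlib.sha256(payload).hexdigest()


def data_version(obj) -> str:
    """Mimic the adapter: make unhashable values unique with a random UUID."""
    version = hash_value(obj)
    if version == UNHASHABLE:
        # Unique per call, hence no longer deterministic.
        version = f"{UNHASHABLE}-{uuid.uuid4()}"
    return version


class Opaque:
    """A type with no registered hashing implementation."""


# A separate dispatcher, used only to show how the base case can be replaced.
@functools.singledispatch
def overridable_hash(obj) -> str:
    return UNHASHABLE


@overridable_hash.register(object)
def _catch_all(obj) -> str:
    # Every type derives from ``object``, so this replaces the base case above.
    return "catch-all"
```

In this sketch, ``data_version(1)`` is stable across calls, two calls to ``data_version(Opaque())`` return different strings, and ``overridable_hash(Opaque())`` returns ``"catch-all"`` rather than the sentinel.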
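For the note above on default parameter values being absent from ``node_kwargs``, the following sketch shows one way to fill them back in. It is an assumption-laden illustration: it inspects a plain Python function with ``inspect.signature`` instead of reading the ``node.Node`` object, and ``with_defaults`` and ``scaled`` are hypothetical names that are not part of Hamilton.

```python
import inspect
from typing import Any, Callable, Dict


def with_defaults(fn: Callable, node_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return ``node_kwargs`` completed with the defaults declared on ``fn``."""
    defaults = {
        name: param.default
        for name, param in inspect.signature(fn).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
    # Explicitly passed values take precedence over declared defaults.
    return {**defaults, **node_kwargs}


def scaled(raw: int, factor: int = 2) -> int:
    """Example function with one default parameter value."""
    return raw * factor
```

Calling ``with_defaults(scaled, {"raw": 3})`` yields a dictionary that also contains ``factor``, so a cache key computed from it would reflect the default value as well.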