{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Licensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership. The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License. You may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied. See the License for the\nspecific language governing permissions and limitations\nunder the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Execute this cell to install dependencies\n",
"## ignore_ci\n",
"%pip install sf-hamilton[visualization]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hamilton caching tutorial [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/caching/tutorial.ipynb) [![GitHub badge](https://img.shields.io/badge/github-view_source-2b3137?logo=github)](https://github.com/apache/hamilton/blob/main/examples/caching/tutorial.ipynb)\n",
"\n",
"In Hamilton, **caching** broadly refers to \"reusing results from previous executions to skip redundant computation\". If you change code or pass new data, it will automatically determine which results can be reused and which nodes need to be re-executed. This improves execution speed and reduces resource usage (computation, API credits, etc.).\n",
"\n",
"## Table of contents\n",
"- [Basics](#basics)\n",
" - [Understanding the `cache_key`](#understanding-the-cache_key)\n",
"- [Adding a node](#adding-a-node)\n",
"- [Changing inputs](#changing-inputs)\n",
"- [Changing code](#changing-code)\n",
"- [Changing external data](#changing-external-data)\n",
" - [Idempotency](#idempotency)\n",
" - [`.with_cache()` to specify caching behavior](#with_cache-to-specify-caching-behavior)\n",
" - [`@cache` to specify caching behavior](#cache-to-specify-caching-behavior)\n",
" - [When to use `@cache` vs `.with_cache()`](#when-to-use-cache-vs-with_cache)\n",
"- [Force recompute all](#force-recompute-all)\n",
"- [Setting default behavior](#setting-default-behavior)\n",
"- [Materializers](#materializers)\n",
" - [Usage patterns](#usage-patterns)\n",
"- [Changing the cache format](#changing-the-cache-format)\n",
"- [Introspecting the cache](#introspecting-the-cache)\n",
"- [Managing storage](#managing-storage)\n",
" - [Setting the cache path](#setting-the-cache-path)\n",
" - [Instantiating the result_store and metadata_store](#instantiating-the-result_store-and-metadata_store)\n",
" - [Deleting data and recovering storage](#deleting-data-and-recovering-storage)\n",
"- [Usage patterns](#usage-patterns)\n",
"- 🚧 INTERNALS\n",
" - [Manually retrieve results](#manually-retrieve-results)\n",
" - [Decoding the cache_key](#decoding-the-cache_key)\n",
" - [Manually retrieve metadata](#manually-retrieve-metadata)\n",
"\n",
"\n",
"> NOTE. This notebook is on the longer side. We highly suggest using the navigation bar to help."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Throughout this tutorial, we'll be using the Hamilton notebook extension to define dataflows directly in the notebook ([see tutorial](https://github.com/apache/hamilton/blob/main/examples/jupyter_notebook_magic/example.ipynb)).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from hamilton import driver\n",
"\n",
"# load the notebook extension\n",
"%reload_ext hamilton.plugins.jupyter_magic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We import the `logging` module and get the logger from `hamilton.caching`. With the level set to ``INFO``, we'll see ``GET_RESULT`` and ``EXECUTE_NODE`` cache events as they happen."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"logger = logging.getLogger(\"hamilton.caching\")\n",
"logger.setLevel(logging.INFO)\n",
"logger.addHandler(logging.StreamHandler())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell deletes the cached data to ensure this notebook can be run from top to bottom without any issues."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.rmtree(\"./.hamilton_cache\", ignore_errors=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basics\n",
"\n",
"Throughout this notebook, we'll use the same simple dataflow that processes transactions in various locations and currencies.\n",
"\n",
"We use the cell magic `%%cell_to_module` from the Hamilton notebook extension. It will convert the content of the cell into a Python module that can be loaded by Hamilton. The `--display` flag allows to visualize the dataflow."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"316pt\" height=\"286pt\"\n",
" viewBox=\"0.00 0.00 316.00 285.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 281.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-281.5 312,-281.5 312,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-269.5 114.5,-269.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-254.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-238 37,-238 37,-201 96,-201 96,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-183C94.5,-183 38.5,-183 38.5,-183 32.5,-183 26.5,-177 26.5,-171 26.5,-171 26.5,-158 26.5,-158 26.5,-152 32.5,-146 38.5,-146 38.5,-146 94.5,-146 94.5,-146 100.5,-146 106.5,-152 106.5,-158 106.5,-158 106.5,-171 106.5,-171 106.5,-177 100.5,-183 94.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4bd050>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module basics_module --display\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.73\n",
" return df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we build the ``Driver`` with caching enabled and execute the dataflow."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 541.7622\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 426.7288\n"
]
}
],
"source": [
"basics_dr = driver.Builder().with_modules(basics_module).with_cache().build()\n",
"\n",
"basics_results_1 = basics_dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n",
"print()\n",
"print(basics_results_1[\"processed_data\"].head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can view what values were retrieved from the cache using `dr.cache.view_run()`. Since this was the first execution, nothing is retrieved."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"316pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4be2d0>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basics_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the second execution, `processed_data` is retrieved from cache as reported in the logs and highlighted in the visualization"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 541.7622\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 426.7288\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"316pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4bc290>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basics_results_2 = basics_dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n",
"print()\n",
"print(basics_results_2[\"processed_data\"].head())\n",
"print()\n",
"basics_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding the `cache_key`\n",
"\n",
"The Hamilton cache stores results using a `cache_key`. It is composed of the node's name (`node_name`), the code that defines it (`code_version`), and its data inputs (`data_version` of its dependencies).\n",
"\n",
"For example, the cache keys for the previous cells are:\n",
"\n",
"```json\n",
"{\n",
" \"node_name\": \"raw_data\",\n",
" \"code_version\": \"9d727859b9fd883247c3379d4d25a35af4a56df9d9fde20c75c6375dde631c68\",\n",
" \"dependencies_data_versions\": {} // it has no dependencies\n",
"}\n",
"{\n",
" \"node_name\": \"processed_data\",\n",
" \"code_version\": \"c9e3377d6c5044944bd89eeb7073c730ee8707627c39906b4156c6411f056f00\",\n",
" \"dependencies_data_versions\": {\n",
" \"cutoff_date\": \"WkGjJythLWYAIj2Qr8T_ug==\", // input value\n",
" \"raw_data\": \"t-BDcMLikFSNdn4piUKy1mBcKPoEsnsYjUNzWg==\" // raw_data's result\n",
" }\n",
"}\n",
"```\n",
"\n",
"Results could be successfully retrieved because nodes in the first execution and second execution shared the same `cache_key`.\n",
"\n",
"The `cache_key` objects are internal and you won't have to interact with them directly. However, keep that concept in mind throughout this tutorial. Towards the end, we show how to manually handle the `cache_key` for debugging."
]
},
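{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the three components concrete, the sketch below builds a toy key from a node name, a code version, and the data versions of its dependencies, then compares keys across runs. It is not Hamilton's implementation: the `toy_cache_key` helper, the JSON-plus-SHA-256 encoding, and the short version strings are assumptions made for illustration only.\n",
"\n",
"```python\n",
"import hashlib\n",
"import json\n",
"\n",
"\n",
"def toy_cache_key(node_name: str, code_version: str, dependencies_data_versions: dict) -> str:\n",
"    \"\"\"Hypothetical helper: hash the three components into a single key.\"\"\"\n",
"    payload = json.dumps(\n",
"        {\n",
"            \"node_name\": node_name,\n",
"            \"code_version\": code_version,\n",
"            \"dependencies_data_versions\": dependencies_data_versions,\n",
"        },\n",
"        sort_keys=True,  # make the key independent of dict ordering\n",
"    )\n",
"    return hashlib.sha256(payload.encode(\"utf-8\")).hexdigest()\n",
"\n",
"\n",
"# Made-up version strings standing in for real code and data hashes.\n",
"first_run = toy_cache_key(\"processed_data\", \"code-v1\", {\"cutoff_date\": \"d1\", \"raw_data\": \"r1\"})\n",
"second_run = toy_cache_key(\"processed_data\", \"code-v1\", {\"raw_data\": \"r1\", \"cutoff_date\": \"d1\"})\n",
"new_input = toy_cache_key(\"processed_data\", \"code-v1\", {\"cutoff_date\": \"d2\", \"raw_data\": \"r1\"})\n",
"new_code = toy_cache_key(\"processed_data\", \"code-v2\", {\"cutoff_date\": \"d1\", \"raw_data\": \"r1\"})\n",
"\n",
"print(first_run == second_run)  # same components: the stored result can be reused\n",
"print(first_run == new_input)  # different input value: the node must be re-executed\n",
"print(first_run == new_code)  # different code: the node must be re-executed\n",
"```\n",
"\n",
"Only the first comparison prints `True`. Identical components give an identical key, so the stored result can be reused, while a new input value or edited code produces a different key and forces re-execution."
]
},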
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding a node\n",
"\n",
"Let's say you're iteratively developing your dataflow and you add a new node. Here, we copy the previous module into a new module named `adding_node_module` and define the node `amount_per_country`.\n",
"\n",
 In practice, you would">
"> In practice, you would edit the cell directly, but copying the module makes the notebook easier to read and maintain."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"286pt\"\n",
" viewBox=\"0.00 0.00 527.00 285.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 281.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-281.5 523,-281.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-269.5 114.5,-269.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-254.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-238 37,-238 37,-201 96,-201 96,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-183C94.5,-183 38.5,-183 38.5,-183 32.5,-183 26.5,-177 26.5,-171 26.5,-171 26.5,-158 26.5,-158 26.5,-152 32.5,-146 38.5,-146 38.5,-146 94.5,-146 94.5,-146 100.5,-146 106.5,-152 106.5,-158 106.5,-158 106.5,-171 106.5,-171 106.5,-177 100.5,-183 94.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4d4e10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module adding_node_module --display\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.73\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build a new `Driver` with `adding_node_module` and execute the dataflow. You'll notice that `raw_data` and `processed_data` are retrieved and only `amount_per_country` is executed."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Canada 968.491\n",
"USA 1719.240\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4dddd0>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adding_node_dr = driver.Builder().with_modules(adding_node_module).with_cache().build()\n",
"\n",
"adding_node_results = adding_node_dr.execute(\n",
" [\"processed_data\", \"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(adding_node_results[\"amount_per_country\"].head())\n",
"print()\n",
"adding_node_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even though this is the first execution of `adding_node_dr` and the module `adding_node_module`, the cache contains results for `raw_data` and `processed_data`. We're able to retrieve values because they have the same cache keys (code version and dependencies data versions).\n",
"\n",
"This means you can reuse cached results across dataflows. This is particularly useful with training and inference machine learning pipelines."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changing inputs\n",
"\n",
"We reuse the same dataflow `adding_node_module`, but change the input `cutoff_date` from\n",
"`\"2024-09-01\"` to `\"2024-09-11\"`. \n",
"\n",
"\n",
"This new input forces `processed_data` to be re-executed. This produces a new result for `processed_data`, which cascades and also forced `amount_per_country` to be re-executed."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"USA 729.9\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4e74d0>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_inputs_dr = driver.Builder().with_modules(adding_node_module).with_cache().build()\n",
"\n",
"changing_inputs_results_1 = changing_inputs_dr.execute(\n",
" [\"processed_data\", \"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-11\"}\n",
")\n",
"print()\n",
"print(changing_inputs_results_1[\"amount_per_country\"].head())\n",
"print()\n",
"changing_inputs_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we execute with the `cutoff_date` value `\"2024-09-05\"`, which forces `processed_data` to be executed."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Canada 968.491\n",
"USA 1719.240\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4fb050>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_inputs_results_2 = changing_inputs_dr.execute(\n",
" [\"processed_data\", \"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-05\"}\n",
")\n",
"print()\n",
"print(changing_inputs_results_2[\"amount_per_country\"].head())\n",
"print()\n",
"changing_inputs_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the cache could still retrieve `amount_per_country`. This is because `processed_data` return a value that had been cached previously (in the `Adding a node` section).\n",
"\n",
"In concrete terms, filtering rows by the date `\"2024-09-05\"` or `\"2024-09-01\"` includes the same rows and produces the same dataframe."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 541.7622\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 426.7288\n",
"\n",
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 541.7622\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 426.7288\n"
]
}
],
"source": [
"print(adding_node_results[\"processed_data\"])\n",
"print()\n",
"print(changing_inputs_results_2[\"processed_data\"])"
]
},
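{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison above can also be made programmatically. The sketch below fingerprints each dataframe's contents with `pandas.util.hash_pandas_object` and compares the digests. The `fingerprint` helper is hypothetical and only illustrates the idea of a data version; Hamilton's own fingerprinting may work differently.\n",
"\n",
"```python\n",
"import hashlib\n",
"\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
"    \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n",
"    \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n",
"}\n",
"\n",
"\n",
"def fingerprint(df: pd.DataFrame) -> str:\n",
"    # Hypothetical helper: digest of per-row hashes (values and index, not column names).\n",
"    row_hashes = pd.util.hash_pandas_object(df, index=True)\n",
"    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()\n",
"\n",
"\n",
"raw = pd.DataFrame(DATA)\n",
"after_sept_01 = raw.loc[raw.date > \"2024-09-01\"]\n",
"after_sept_05 = raw.loc[raw.date > \"2024-09-05\"]\n",
"after_sept_11 = raw.loc[raw.date > \"2024-09-11\"]\n",
"\n",
"# Both early cutoffs keep every row, so the contents are identical.\n",
"same = fingerprint(after_sept_01) == fingerprint(after_sept_05)\n",
"# A later cutoff drops rows, so the contents differ.\n",
"different = fingerprint(after_sept_01) != fingerprint(after_sept_11)\n",
"print(same, different)\n",
"```\n",
"\n",
"Identical contents yield identical digests regardless of which `cutoff_date` produced them, which is why a downstream node such as `amount_per_country` can be retrieved instead of re-executed."
]
},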
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changing code\n",
"As you develop your dataflow, you will need to edit upstream nodes. Caching will automatically detect code changes and determine which node needs to be re-executed. In `processed_data()`, we'll change the conversation rate from `0.73` to `0.71`.\n",
"\n",
"> NOTE. changes to docstrings and comments `#` are ignored when versioning a node."
]
},
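{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build intuition for why docstring and comment edits do not invalidate the cache, here is a minimal sketch of a code version that ignores them. It uses only the standard library (`ast`, `hashlib`); the `code_version` helper is hypothetical and is not Hamilton's actual implementation, which may differ.\n",
"\n",
"```python\n",
"import ast\n",
"import hashlib\n",
"\n",
"\n",
"def code_version(source: str) -> str:\n",
"    # Hypothetical helper: hash source code, ignoring comments and docstrings.\n",
"    tree = ast.parse(source)  # parsing already discards `#` comments\n",
"    for node in ast.walk(tree):\n",
"        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):\n",
"            if ast.get_docstring(node) is not None:\n",
"                # Drop the docstring; keep the body non-empty so the tree stays valid.\n",
"                node.body = node.body[1:] or [ast.Pass()]\n",
"    # ast.unparse requires Python 3.9+\n",
"    return hashlib.sha256(ast.unparse(tree).encode()).hexdigest()\n",
"\n",
"\n",
"V1 = '''\n",
"def rate() -> float:\n",
"    \"\"\"Conversion rate.\"\"\"\n",
"    return 0.73\n",
"'''\n",
"\n",
"V2 = '''\n",
"def rate() -> float:\n",
"    \"\"\"CAD to USD conversion rate.\"\"\"\n",
"    # a comment was added\n",
"    return 0.73\n",
"'''\n",
"\n",
"V3 = '''\n",
"def rate() -> float:\n",
"    return 0.71\n",
"'''\n",
"\n",
"same = code_version(V1) == code_version(V2)  # only docstring and comment differ\n",
"changed = code_version(V1) != code_version(V3)  # the returned value differs\n",
"print(same, changed)\n",
"```\n",
"\n",
"Under this scheme, rewording a docstring keeps the same version, while changing `0.73` to `0.71` produces a new one and therefore a cache miss."
]
},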
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module changing_code_module\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 # <- VALUE CHANGED FROM module_2\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to execute `processed_data` because the code change created a new `cache_key` and led to a cache miss. Then, `processed_data` returns a previously unseen value, forcing `amount_per_country` to also be re-executed"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Canada 941.957\n",
"USA 1719.240\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e511a10>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_code_dr_1 = driver.Builder().with_modules(changing_code_module).with_cache().build()\n",
"\n",
"changing_code_results_1 = changing_code_dr_1.execute(\n",
" [\"processed_data\", \"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(changing_code_results_1[\"amount_per_country\"].head())\n",
"print()\n",
"changing_code_dr_1.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We make another code change to `processed_data` to accomodate currency conversion for Brazil and Mexico."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module changing_code_module_2\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18 # <- LINE ADDED\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05 # <- LINE ADDED\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, the code change forces `processed_data` to be executed."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Canada 941.957\n",
"USA 1719.240\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e51f3d0>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_code_dr_2 = driver.Builder().with_modules(changing_code_module_2).with_cache().build()\n",
"\n",
"changing_code_results_2 = changing_code_dr_2.execute(\n",
" [\"processed_data\", \"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(changing_code_results_2[\"amount_per_country\"].head())\n",
"print()\n",
"changing_code_dr_2.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, `amount_per_country` can be retrieved because `processed_data` returned a previously seen value.\n",
"\n",
"In concrete terms, adding code to process currency from Brazil and Mexico didn't change the `processed_data` result because it only includes data from the USA and Canada.\n",
"\n",
"> NOTE. This is similar to what happened at the end of the section **Changing inputs**."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n",
"\n",
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.2300\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.3400\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n",
"4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n"
]
}
],
"source": [
"print(changing_code_results_1[\"processed_data\"])\n",
"print()\n",
"print(changing_code_results_2[\"processed_data\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changing external data\n",
"\n",
"Hamilton's caching mechanism uses the node's `code_version` and its dependencies `data_version` to determine if the node needs to be executed or the result can be retrieved from cache. By default, it assumes [idempotency](https://www.astronomer.io/docs/learn/dag-best-practices#review-idempotency) of operations.\n",
"\n",
"This section covers how to handle node with external effects, such as reading or writing external data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Idempotency\n",
"\n",
"To illustrate idempotency, let's use this minimal dataflow which has a single node that returns the current date and time:\n",
"\n",
"```python\n",
"import datetime\n",
"\n",
"def current_datetime() -> datetime.datetime:\n",
" return datetime.datetime.now()\n",
"```\n",
"\n",
"The first execution will execute the node and store the resulting date and time. On the second execution, the cache will read the stored result instead of re-executing. Why? Because the `code_version` is the same and the dependencies `data_version` (it has no dependencies) haven't changed.\n",
"\n",
"A similar situation occurs when reading from external data, as shown here:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def dataset(file_path: str) -> pd.DataFrame:\n",
" return pd.read_csv(file_path)\n",
"```\n",
"\n",
"Here, the code of `dataset()` and the value for `file_path` can stay the same, but the file itself could be updated (e.g., new rows added).\n",
"\n",
"The next sections show how to always re-execute a node and ensure the latest data is used. The `DATA` constant is modified with transactions in Brazil and Mexico to simulate `raw_data` loading a new dataset."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module changing_external_module\n",
"import pandas as pd\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At execution, we see `raw_data` being retrieved along with all downstream nodes. Also, we note that the printed results don't include Brazil nor Mexico."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Canada 941.957\n",
"USA 1719.240\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 527.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 523,-336.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e53c2d0>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_external_dr = driver.Builder().with_modules(changing_external_module).with_cache().build()\n",
"\n",
"changing_external_results = changing_external_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(changing_external_results[\"amount_per_country\"].head())\n",
"print()\n",
"changing_external_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `.with_cache()` to specify caching behavior\n",
"Here, we build a new `Driver` with the same `changing_external_module`, but we specify in `.with_cache()` to always recompute `raw_data`. \n",
"\n",
"The visualization shows that `raw_data` was executed, and because of the new data, all downstream nodes also need to be executed. The results now include Brazil and Mexico."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 527.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 523,-336.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e54c4d0>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_external_with_cache_dr = (\n",
" driver.Builder()\n",
" .with_modules(changing_external_module)\n",
" .with_cache(recompute=[\"raw_data\"])\n",
" .build()\n",
")\n",
"\n",
"changing_external_with_cache_results = changing_external_with_cache_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(changing_external_with_cache_results[\"amount_per_country\"].head())\n",
"print()\n",
"changing_external_with_cache_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `@cache` to specify caching behavior\n",
"Another way to specify the `RECOMPUTE` behavior is to use the `@cache` decorator."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module changing_external_decorator_module\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import cache\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@cache(behavior=\"recompute\")\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build a new `Driver` with `changing_external_decorator_module`, which includes the `@cache` decorator. Note that we don't specify anything in `.with_cache()`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e559250>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"changing_external_decorator_dr = (\n",
" driver.Builder().with_modules(changing_external_decorator_module).with_cache().build()\n",
")\n",
"\n",
"changing_external_decorator_results = changing_external_decorator_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(changing_external_decorator_results[\"amount_per_country\"].head())\n",
"print()\n",
"changing_external_decorator_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that `raw_data` was re-executed. Then, `processed_data` and `amount_per_country` were retrieved from the cache because `changing_external_with_cache_dr` had produced them just before."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### When to use `@cache` vs. `.with_cache()`?\n",
"\n",
"Specifying the caching behavior via `.with_cache()` or `@cache` is entirely equivalent. There are benefits to either approach:\n",
"\n",
"- `@cache`: specify behavior at the dataflow-level. The behavior is tied to the node and will be picked up by all `Driver` loading the module. This can prevent errors or unexpected behaviors for users of that dataflow.\n",
"\n",
"- `.with_cache()`: specify behavior at the `Driver`-level. Gives the flexiblity to change the behavior without modifying the dataflow code and committing changes. You might be ok with `DEFAULT` during development, but want to ensure `RECOMPUTE` in production.\n",
"\n",
"Importantly, the behavior specified in `.with_cache(...)` overrides whatever is in `@cache` because it is closer to execution. For example, having `.with_cache(default=[\"raw_data\"])` `@cache(behavior=\"recompute\")` would force `DEFAULT` behavior.\n",
"\n",
"> ⛔ **Important**: Using the `@cache` decorator alone doesn't enable caching; adding `.with_cache()` to the `Builder` does. The decorator is only a mean to specify special behaviors for a node.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Force recompute all\n",
"By specifying `.with_cache(recompute=True)`, you are setting the behavior `RECOMPUTE` for all nodes. This forces recomputation, which is useful for producing a \"cache refresh\" with up-to-date values."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 527.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 523,-336.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4bc810>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recompute_all_dr = (\n",
" driver.Builder()\n",
" .with_modules(changing_external_decorator_module)\n",
" .with_cache(recompute=True)\n",
" .build()\n",
")\n",
"\n",
"recompute_all_results = recompute_all_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(recompute_all_results[\"amount_per_country\"].head())\n",
"print()\n",
"recompute_all_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that all nodes were recomputed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting default behavior\n",
"\n",
"Once you enable caching using `.with_cache()`, it is an \"opt-out\" feature by default. This means all nodes are cached unless you set the `DISABLE` behavior via `@cache` or `.with_cache(disable=[...])`. This can become difficult to manage as the number of nodes increases.\n",
"\n",
"You can make it an \"opt-in\" feature by setting `default_behavior=\"disable\"` in `.with_cache()`. This way, you're using caching, but only for nodes explicitly specified in `@cache` or `.with_cache()`.\n",
"\n",
"Here, we build a `Driver` with the `changing_external_decorator_module`, where `raw_data` was set to have behavior `RECOMPUTE`, and set the default behavior to `DISABLE`."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"341pt\"\n",
" viewBox=\"0.00 0.00 527.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 523,-336.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e55b950>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"default_behavior_dr = (\n",
" driver.Builder()\n",
" .with_modules(changing_external_decorator_module)\n",
" .with_cache(default_behavior=\"disable\")\n",
" .build()\n",
")\n",
"\n",
"default_behavior_results = default_behavior_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(default_behavior_results[\"amount_per_country\"].head())\n",
"print()\n",
"default_behavior_dr.cache.view_run()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DISABLE: 3>,\n",
" 'processed_data': <CachingBehavior.DISABLE: 3>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DISABLE: 3>}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"default_behavior_dr.cache.behaviors[default_behavior_dr.cache.last_run_id]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Materializers\n",
"\n",
"> NOTE. You can skip this section if you're not using materializers.\n",
"\n",
"`DataLoader` and `DataSaver` (collectively \"materializers\") are special Hamilton nodes that connect your dataflow to external data (files, databases, etc.). These constructs are safe to use with caching and are complementary.\n",
"\n",
"**Caching**\n",
"- writing and reading shorter-term data to be used with the dataflow\n",
"- strong connection between the code and the data\n",
"- automatically handle multiple versions of the same dataset\n",
"\n",
"**Materializers**\n",
"- robust mechanism to read/write data from many sources\n",
"- data isn't necessarily meant to be used with Hamilton (e.g., loading from a warehouse, outputting a report)\n",
"- typically outputs to a static destination; each write overwrites the previously stored dataset\n",
"\n",
"The next cell uses `@dataloader` and `@datasaver` decorators. In the visualization, we see the added `raw_data.loader` and `saved_data` nodes."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 847.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 843,-350.5 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-149.5 14.5,-338.5 135.5,-338.5 135.5,-149.5 14.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M686,-90.5C686,-90.5 528,-90.5 528,-90.5 522,-90.5 516,-84.5 516,-78.5 516,-78.5 516,-38.5 516,-38.5 516,-32.5 522,-26.5 528,-26.5 528,-26.5 686,-26.5 686,-26.5 692,-26.5 698,-32.5 698,-38.5 698,-38.5 698,-78.5 698,-78.5 698,-84.5 692,-90.5 686,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M839,-94.5C839,-98.91 813.9,-102.5 783,-102.5 752.1,-102.5 727,-98.91 727,-94.5 727,-94.5 727,-22.5 727,-22.5 727,-18.09 752.1,-14.5 783,-14.5 813.9,-14.5 839,-18.09 839,-22.5 839,-22.5 839,-94.5 839,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-94.5C839,-90.09 813.9,-86.5 783,-86.5 752.1,-86.5 727,-90.09 727,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-58.5C704.33,-58.5 710.57,-58.5 716.67,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-62 726.76,-58.5 716.76,-55 716.76,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-127.5C283,-127.5 208,-127.5 208,-127.5 202,-127.5 196,-121.5 196,-115.5 196,-115.5 196,-75.5 196,-75.5 196,-69.5 202,-63.5 208,-63.5 208,-63.5 283,-63.5 283,-63.5 289,-63.5 295,-69.5 295,-75.5 295,-75.5 295,-115.5 295,-115.5 295,-121.5 289,-127.5 283,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M475,-90.5C475,-90.5 353,-90.5 353,-90.5 347,-90.5 341,-84.5 341,-78.5 341,-78.5 341,-38.5 341,-38.5 341,-32.5 347,-26.5 353,-26.5 353,-26.5 475,-26.5 475,-26.5 481,-26.5 487,-32.5 487,-38.5 487,-38.5 487,-78.5 487,-78.5 487,-84.5 481,-90.5 475,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-84.7C306.39,-82.19 318.71,-79.45 330.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-80.13 340.82,-74.54 330.29,-73.29 331.81,-80.13\"/>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-131.5C150,-135.91 116.38,-139.5 75,-139.5 33.62,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 33.62,-51.5 75,-51.5 116.38,-51.5 150,-55.09 150,-59.5 150,-59.5 150,-131.5 150,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-131.5C150,-127.09 116.38,-123.5 75,-123.5 33.62,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-95.5C162.14,-95.5 174.15,-95.5 185.48,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-99 195.67,-95.5 185.67,-92 185.67,-99\"/>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-58.5C493.23,-58.5 499.39,-58.5 505.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-62 515.98,-58.5 505.98,-55 505.98,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-45 179,-45 179,0 312,0 312,-45\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-36.73C318.45,-38.06 324.73,-39.41 330.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-44.2 340.81,-42.89 331.78,-37.36 330.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-307 45.5,-307 45.5,-270 104.5,-270 104.5,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-252C103,-252 47,-252 47,-252 41,-252 35,-246 35,-240 35,-240 35,-227 35,-227 35,-221 41,-215 47,-215 47,-215 103,-215 103,-215 109,-215 115,-221 115,-227 115,-227 115,-240 115,-240 115,-246 109,-252 103,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-193.76C127.5,-195.76 103.97,-197.38 75,-197.38 46.03,-197.38 22.5,-195.76 22.5,-193.76 22.5,-193.76 22.5,-161.24 22.5,-161.24 22.5,-159.24 46.03,-157.62 75,-157.62 103.97,-157.62 127.5,-159.24 127.5,-161.24 127.5,-161.24 127.5,-193.76 127.5,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-193.76C127.5,-191.77 103.97,-190.15 75,-190.15 46.03,-190.15 22.5,-191.77 22.5,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e55bd10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module materializers_module -d\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import dataloader, datasaver\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@dataloader()\n",
"def raw_data() -> tuple[pd.DataFrame, dict]:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" data = pd.DataFrame(DATA)\n",
" metadata = {\"source\": \"notebook\", \"format\": \"json\"}\n",
" return data, metadata\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()\n",
"\n",
"@datasaver()\n",
"def saved_data(amount_per_country: pd.DataFrame) -> dict:\n",
" amount_per_country.to_parquet(\"./saved_data.parquet\")\n",
" metadata = {\"source\": \"notebook\", \"format\": \"parquet\"}\n",
" return metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we build a `Driver` as usual. "
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data.loader::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"465pt\"\n",
" viewBox=\"0.00 0.00 847.00 464.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 460.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-460.5 843,-460.5 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-149.5 14.5,-448.5 135.5,-448.5 135.5,-149.5 14.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-433.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-131.5C150,-135.91 116.38,-139.5 75,-139.5 33.62,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 33.62,-51.5 75,-51.5 116.38,-51.5 150,-55.09 150,-59.5 150,-59.5 150,-131.5 150,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-131.5C150,-127.09 116.38,-123.5 75,-123.5 33.62,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-127.5C283,-127.5 208,-127.5 208,-127.5 202,-127.5 196,-121.5 196,-115.5 196,-115.5 196,-75.5 196,-75.5 196,-69.5 202,-63.5 208,-63.5 208,-63.5 283,-63.5 283,-63.5 289,-63.5 295,-69.5 295,-75.5 295,-75.5 295,-115.5 295,-115.5 295,-121.5 289,-127.5 283,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-95.5C162.14,-95.5 174.15,-95.5 185.48,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-99 195.67,-95.5 185.67,-92 185.67,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M475,-90.5C475,-90.5 353,-90.5 353,-90.5 347,-90.5 341,-84.5 341,-78.5 341,-78.5 341,-38.5 341,-38.5 341,-32.5 347,-26.5 353,-26.5 353,-26.5 475,-26.5 475,-26.5 481,-26.5 487,-32.5 487,-38.5 487,-38.5 487,-78.5 487,-78.5 487,-84.5 481,-90.5 475,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M686,-90.5C686,-90.5 528,-90.5 528,-90.5 522,-90.5 516,-84.5 516,-78.5 516,-78.5 516,-38.5 516,-38.5 516,-32.5 522,-26.5 528,-26.5 528,-26.5 686,-26.5 686,-26.5 692,-26.5 698,-32.5 698,-38.5 698,-38.5 698,-78.5 698,-78.5 698,-84.5 692,-90.5 686,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-58.5C493.23,-58.5 499.39,-58.5 505.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-62 515.98,-58.5 505.98,-55 505.98,-62\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M839,-94.5C839,-98.91 813.9,-102.5 783,-102.5 752.1,-102.5 727,-98.91 727,-94.5 727,-94.5 727,-22.5 727,-22.5 727,-18.09 752.1,-14.5 783,-14.5 813.9,-14.5 839,-18.09 839,-22.5 839,-22.5 839,-94.5 839,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-94.5C839,-90.09 813.9,-86.5 783,-86.5 752.1,-86.5 727,-90.09 727,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-58.5C704.33,-58.5 710.57,-58.5 716.67,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-62 726.76,-58.5 716.76,-55 716.76,-62\"/>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-84.7C306.39,-82.19 318.71,-79.45 330.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-80.13 340.82,-74.54 330.29,-73.29 331.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-45 179,-45 179,0 312,0 312,-45\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-36.73C318.45,-38.06 324.73,-39.41 330.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-44.2 340.81,-42.89 331.78,-37.36 330.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-417 45.5,-417 45.5,-380 104.5,-380 104.5,-417\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-394.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-362C103,-362 47,-362 47,-362 41,-362 35,-356 35,-350 35,-350 35,-337 35,-337 35,-331 41,-325 47,-325 47,-325 103,-325 103,-325 109,-325 115,-331 115,-337 115,-337 115,-350 115,-350 115,-356 109,-362 103,-362\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-339.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M97,-307C97,-307 53,-307 53,-307 47,-307 41,-301 41,-295 41,-295 41,-282 41,-282 41,-276 47,-270 53,-270 53,-270 97,-270 97,-270 103,-270 109,-276 109,-282 109,-282 109,-295 109,-295 109,-301 103,-307 97,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-250.76 103.97,-252.38 75,-252.38 46.03,-252.38 22.5,-250.76 22.5,-248.76 22.5,-248.76 22.5,-216.24 22.5,-216.24 22.5,-214.24 46.03,-212.62 75,-212.62 103.97,-212.62 127.5,-214.24 127.5,-216.24 127.5,-216.24 127.5,-248.76 127.5,-248.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-246.77 103.97,-245.15 75,-245.15 46.03,-245.15 22.5,-246.77 22.5,-248.76\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-228.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M113,-195C113,-195 37,-195 37,-195 31,-195 25,-189 25,-183 25,-183 25,-170 25,-170 25,-164 31,-158 37,-158 37,-158 113,-158 113,-158 119,-158 125,-164 125,-170 125,-170 125,-183 125,-183 125,-189 119,-195 113,-195\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e57ab50>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"materializers_dr = driver.Builder().with_modules(materializers_module).with_cache().build()\n",
"\n",
"materializers_results = materializers_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(materializers_results[\"amount_per_country\"].head())\n",
"print()\n",
"materializers_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We execute the dataflow a second time to show that loaders and savers are just like any other node; they can be cached and retrieved."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data.loader::result_store::get_result::hit\n",
"raw_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::result_store::get_result::hit\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"410pt\"\n",
" viewBox=\"0.00 0.00 847.00 409.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 405.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-405.5 843,-405.5 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-149.5 14.5,-393.5 135.5,-393.5 135.5,-149.5 14.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-378.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M150,-131.5C150,-135.91 116.38,-139.5 75,-139.5 33.62,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 33.62,-51.5 75,-51.5 116.38,-51.5 150,-55.09 150,-59.5 150,-59.5 150,-131.5 150,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M150,-131.5C150,-127.09 116.38,-123.5 75,-123.5 33.62,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M283,-127.5C283,-127.5 208,-127.5 208,-127.5 202,-127.5 196,-121.5 196,-115.5 196,-115.5 196,-75.5 196,-75.5 196,-69.5 202,-63.5 208,-63.5 208,-63.5 283,-63.5 283,-63.5 289,-63.5 295,-69.5 295,-75.5 295,-75.5 295,-115.5 295,-115.5 295,-121.5 289,-127.5 283,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-95.5C162.14,-95.5 174.15,-95.5 185.48,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-99 195.67,-95.5 185.67,-92 185.67,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M475,-90.5C475,-90.5 353,-90.5 353,-90.5 347,-90.5 341,-84.5 341,-78.5 341,-78.5 341,-38.5 341,-38.5 341,-32.5 347,-26.5 353,-26.5 353,-26.5 475,-26.5 475,-26.5 481,-26.5 487,-32.5 487,-38.5 487,-38.5 487,-78.5 487,-78.5 487,-84.5 481,-90.5 475,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M686,-90.5C686,-90.5 528,-90.5 528,-90.5 522,-90.5 516,-84.5 516,-78.5 516,-78.5 516,-38.5 516,-38.5 516,-32.5 522,-26.5 528,-26.5 528,-26.5 686,-26.5 686,-26.5 692,-26.5 698,-32.5 698,-38.5 698,-38.5 698,-78.5 698,-78.5 698,-84.5 692,-90.5 686,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-58.5C493.23,-58.5 499.39,-58.5 505.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-62 515.98,-58.5 505.98,-55 505.98,-62\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M839,-94.5C839,-98.91 813.9,-102.5 783,-102.5 752.1,-102.5 727,-98.91 727,-94.5 727,-94.5 727,-22.5 727,-22.5 727,-18.09 752.1,-14.5 783,-14.5 813.9,-14.5 839,-18.09 839,-22.5 839,-22.5 839,-94.5 839,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M839,-94.5C839,-90.09 813.9,-86.5 783,-86.5 752.1,-86.5 727,-90.09 727,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-58.5C704.33,-58.5 710.57,-58.5 716.67,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-62 726.76,-58.5 716.76,-55 716.76,-62\"/>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-84.7C306.39,-82.19 318.71,-79.45 330.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-80.13 340.82,-74.54 330.29,-73.29 331.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-45 179,-45 179,0 312,0 312,-45\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-36.73C318.45,-38.06 324.73,-39.41 330.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-44.2 340.81,-42.89 331.78,-37.36 330.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-362 45.5,-362 45.5,-325 104.5,-325 104.5,-362\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-339.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M97,-307C97,-307 53,-307 53,-307 47,-307 41,-301 41,-295 41,-295 41,-282 41,-282 41,-276 47,-270 53,-270 53,-270 97,-270 97,-270 103,-270 109,-276 109,-282 109,-282 109,-295 109,-295 109,-301 103,-307 97,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-250.76 103.97,-252.38 75,-252.38 46.03,-252.38 22.5,-250.76 22.5,-248.76 22.5,-248.76 22.5,-216.24 22.5,-216.24 22.5,-214.24 46.03,-212.62 75,-212.62 103.97,-212.62 127.5,-214.24 127.5,-216.24 127.5,-216.24 127.5,-248.76 127.5,-248.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-246.77 103.97,-245.15 75,-245.15 46.03,-245.15 22.5,-246.77 22.5,-248.76\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-228.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M113,-195C113,-195 37,-195 37,-195 31,-195 25,-189 25,-183 25,-183 25,-170 25,-170 25,-164 31,-158 37,-158 37,-158 113,-158 113,-158 119,-158 125,-164 125,-170 125,-170 125,-183 125,-183 125,-189 119,-195 113,-195\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e568750>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"materializers_results = materializers_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(materializers_results[\"amount_per_country\"].head())\n",
"print()\n",
"materializers_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Usage patterns\n",
"\n",
"Here are a few common scenarios:\n",
"\n",
"**Loading data is expensive**: Your dataflow uses a `DataLoader` to get data from Snowflake. You want to load it once and cache it. When executing your dataflow, you want to use your cached copy to save query time, egress costs, etc.\n",
"- Use the `DEFAULT` caching behavior for loaders.\n",
"\n",
"**Only save new data**: You run the dataflow multiple times (maybe with different parameters or on a schedule) and only want to write to the destination when the data changes.\n",
"- Use the `DEFAULT` caching behavior for savers.\n",
"\n",
"**Always read the latest data**: You want to use caching, but also ensure the dataflow always uses the latest data. This involves executing the `DataLoader` every time, getting the data in memory, versioning it, and then determining what needs to be executed (see [Changing external data](#changing-external-data)).\n",
"- Use the `RECOMPUTE` caching behavior for loaders.\n",
"\n",
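"The **Always read the latest data** pattern can be pictured with a small standard-library sketch. It is not Hamilton's implementation: `load_rows`, `data_version`, `process`, and the sample rows are hypothetical names and data chosen for illustration. The loader runs on every call, the loaded data is hashed to produce a version, and downstream work runs only when that version has not been seen before.\n",
"\n",
"```python\n",
"import hashlib\n",
"import json\n",
"\n",
"calls = {'load': 0, 'process': 0}\n",
"cache = {}  # maps a data version (hash of the loaded data) to a downstream result\n",
"\n",
"def load_rows():\n",
"    # Hypothetical stand-in for an expensive external read.\n",
"    calls['load'] += 1\n",
"    return [{'country': 'USA', 'amount': 478.23}, {'country': 'Canada', 'amount': 742.14}]\n",
"\n",
"def data_version(obj):\n",
"    # Version the in-memory data by hashing a canonical JSON encoding.\n",
"    encoded = json.dumps(obj, sort_keys=True).encode('utf-8')\n",
"    return hashlib.sha256(encoded).hexdigest()\n",
"\n",
"def process(rows):\n",
"    calls['process'] += 1\n",
"    return sum(row['amount'] for row in rows)\n",
"\n",
"def run():\n",
"    rows = load_rows()            # always executed, like a loader set to recompute\n",
"    version = data_version(rows)  # version the freshly loaded data\n",
"    if version not in cache:      # downstream work runs only for unseen data\n",
"        cache[version] = process(rows)\n",
"    return cache[version]\n",
"\n",
"first = run()\n",
"second = run()\n",
"print(calls)\n",
"```\n",
"\n",
"In this sketch, the second call still pays for the load but skips `process`, because the loaded data hashes to the same version.\n",
"\n",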
"Use the parameters `default_loader_behavior` or `default_saver_behavior` of the `.with_cache()` clause to specify the behavior for all loaders or savers.\n",
"\n",
"> NOTE. The **Caching + materializers tutorial** notebook details how to achieve granular control over loader and saver behaviors."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data.loader::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"465pt\"\n",
" viewBox=\"0.00 0.00 847.00 464.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 460.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-460.5 843,-460.5 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-149.5 14.5,-448.5 135.5,-448.5 135.5,-149.5 14.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-433.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-131.5C150,-135.91 116.38,-139.5 75,-139.5 33.62,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 33.62,-51.5 75,-51.5 116.38,-51.5 150,-55.09 150,-59.5 150,-59.5 150,-131.5 150,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-131.5C150,-127.09 116.38,-123.5 75,-123.5 33.62,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-127.5C283,-127.5 208,-127.5 208,-127.5 202,-127.5 196,-121.5 196,-115.5 196,-115.5 196,-75.5 196,-75.5 196,-69.5 202,-63.5 208,-63.5 208,-63.5 283,-63.5 283,-63.5 289,-63.5 295,-69.5 295,-75.5 295,-75.5 295,-115.5 295,-115.5 295,-121.5 289,-127.5 283,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-95.5C162.14,-95.5 174.15,-95.5 185.48,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-99 195.67,-95.5 185.67,-92 185.67,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M475,-90.5C475,-90.5 353,-90.5 353,-90.5 347,-90.5 341,-84.5 341,-78.5 341,-78.5 341,-38.5 341,-38.5 341,-32.5 347,-26.5 353,-26.5 353,-26.5 475,-26.5 475,-26.5 481,-26.5 487,-32.5 487,-38.5 487,-38.5 487,-78.5 487,-78.5 487,-84.5 481,-90.5 475,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M686,-90.5C686,-90.5 528,-90.5 528,-90.5 522,-90.5 516,-84.5 516,-78.5 516,-78.5 516,-38.5 516,-38.5 516,-32.5 522,-26.5 528,-26.5 528,-26.5 686,-26.5 686,-26.5 692,-26.5 698,-32.5 698,-38.5 698,-38.5 698,-78.5 698,-78.5 698,-84.5 692,-90.5 686,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-58.5C493.23,-58.5 499.39,-58.5 505.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-62 515.98,-58.5 505.98,-55 505.98,-62\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M839,-94.5C839,-98.91 813.9,-102.5 783,-102.5 752.1,-102.5 727,-98.91 727,-94.5 727,-94.5 727,-22.5 727,-22.5 727,-18.09 752.1,-14.5 783,-14.5 813.9,-14.5 839,-18.09 839,-22.5 839,-22.5 839,-94.5 839,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-94.5C839,-90.09 813.9,-86.5 783,-86.5 752.1,-86.5 727,-90.09 727,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-58.5C704.33,-58.5 710.57,-58.5 716.67,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-62 726.76,-58.5 716.76,-55 716.76,-62\"/>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-84.7C306.39,-82.19 318.71,-79.45 330.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-80.13 340.82,-74.54 330.29,-73.29 331.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-45 179,-45 179,0 312,0 312,-45\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-36.73C318.45,-38.06 324.73,-39.41 330.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-44.2 340.81,-42.89 331.78,-37.36 330.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-417 45.5,-417 45.5,-380 104.5,-380 104.5,-417\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-394.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-362C103,-362 47,-362 47,-362 41,-362 35,-356 35,-350 35,-350 35,-337 35,-337 35,-331 41,-325 47,-325 47,-325 103,-325 103,-325 109,-325 115,-331 115,-337 115,-337 115,-350 115,-350 115,-356 109,-362 103,-362\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-339.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M97,-307C97,-307 53,-307 53,-307 47,-307 41,-301 41,-295 41,-295 41,-282 41,-282 41,-276 47,-270 53,-270 53,-270 97,-270 97,-270 103,-270 109,-276 109,-282 109,-282 109,-295 109,-295 109,-301 103,-307 97,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-250.76 103.97,-252.38 75,-252.38 46.03,-252.38 22.5,-250.76 22.5,-248.76 22.5,-248.76 22.5,-216.24 22.5,-216.24 22.5,-214.24 46.03,-212.62 75,-212.62 103.97,-212.62 127.5,-214.24 127.5,-216.24 127.5,-216.24 127.5,-248.76 127.5,-248.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-248.76C127.5,-246.77 103.97,-245.15 75,-245.15 46.03,-245.15 22.5,-246.77 22.5,-248.76\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-228.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M113,-195C113,-195 37,-195 37,-195 31,-195 25,-189 25,-183 25,-183 25,-170 25,-170 25,-164 31,-158 37,-158 37,-158 113,-158 113,-158 119,-158 125,-164 125,-170 125,-170 125,-183 125,-183 125,-189 119,-195 113,-195\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745cb70650>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"materializers_dr_2 = (\n",
" driver.Builder()\n",
" .with_modules(materializers_module)\n",
" .with_cache(default_loader_behavior=\"recompute\", default_saver_behavior=\"disable\")\n",
" .build()\n",
")\n",
"\n",
"materializers_results_2 = materializers_dr_2.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(materializers_results_2[\"amount_per_country\"].head())\n",
"print()\n",
"materializers_dr_2.cache.view_run()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data.loader': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'saved_data': <CachingBehavior.DISABLE: 3>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"materializers_dr_2.cache.behaviors[materializers_dr_2.cache.last_run_id]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changing the cache format\n",
"\n",
"By default, results are stored in `pickle` format. It's a convenient default but [comes with caveats](https://grantjenks.com/docs/diskcache/tutorial.html#caveats). You can use the `@cache` decorator to specify another file format for storing results.\n",
"\n",
"The formats supported by default include:\n",
"\n",
"- `json`\n",
"- `parquet`\n",
"- `csv`\n",
"- `excel`\n",
"- `file`\n",
"- `feather`\n",
"- `orc`\n",
"\n",
"This feature uses `DataLoader` and `DataSaver` under the hood and supports all of the same formats (including your custom ones, as long as they take a `path` attribute).\n",
"\n",
"> This is an area of active development. Feel free to share suggestions and feedback!\n",
"\n",
"The next cell sets `processed_data` to be cached using the `parquet` format."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module cache_format_module\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import cache\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"def raw_data() -> pd.DataFrame:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" return pd.DataFrame(DATA)\n",
"\n",
"@cache(format=\"parquet\")\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71\n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.Series:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When executing the dataflow, the logs show `raw_data` being retrieved from the cache while `processed_data` is executed again. The result of `processed_data` is saved as `.parquet` this time."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"country\n",
"Canada 941.957\n",
"USA 1719.240\n",
"Name: amound_in_usd, dtype: float64\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"406.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">Series</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745e4f5e90>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache_format_dr = driver.Builder().with_modules(cache_format_module).with_cache().build()\n",
"\n",
"cache_format_results = cache_format_dr.execute(\n",
" [\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(cache_format_results[\"amount_per_country\"].head())\n",
"print()\n",
"cache_format_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, under `./.hamilton_cache`, there will be two results of the same name, one with the `.parquet` extension and one without. The one without is actually a pickled `DataLoader` used to retrieve the `.parquet` file.\n",
"\n",
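"The two-file layout described above can be pictured with a standard-library sketch. This is an assumption-laden illustration, not Hamilton's code: it uses JSON instead of parquet, a plain pickled dictionary instead of a `DataLoader`, and a made-up version string `abc123`.\n",
"\n",
"```python\n",
"import json\n",
"import pickle\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"cache_dir = Path(tempfile.mkdtemp())\n",
"data_version = 'abc123'  # made-up version string used as the file name\n",
"\n",
"# 1. Save the result itself in the chosen format.\n",
"result_path = cache_dir / (data_version + '.json')\n",
"result_path.write_text(json.dumps({'USA': 1719.24, 'Canada': 941.957}))\n",
"\n",
"# 2. Save a small pickled pointer under the extension-less name.\n",
"pointer_path = cache_dir / data_version\n",
"pointer_path.write_bytes(pickle.dumps({'format': 'json', 'path': str(result_path)}))\n",
"\n",
"# Retrieval: unpickle the pointer, then read the formatted file it refers to.\n",
"pointer = pickle.loads(pointer_path.read_bytes())\n",
"restored = json.loads(Path(pointer['path']).read_text())\n",
"print(sorted(p.name for p in cache_dir.iterdir()))\n",
"```\n",
"\n",
"The extension-less file stays small because it only records how to find and read the formatted result.\n",
"\n",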
"You can access the path programmatically via the `result_store._path_from_data_version(...)` method."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_version = cache_format_dr.cache.data_versions[cache_format_dr.cache.last_run_id][\n",
" \"processed_data\"\n",
"]\n",
"parquet_path = cache_format_dr.cache.result_store._path_from_data_version(data_version).with_suffix(\n",
" \".parquet\"\n",
")\n",
"parquet_path.exists()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introspecting the cache\n",
"The `Driver.cache` stores information about all executions over its lifetime. Previous `run_id` values are available through `Driver.cache.run_ids` and can be used in tandem with other utility functions:\n",
"\n",
"- Resolve the node caching behavior (e.g., \"recompute\")\n",
"- Access structured logs\n",
"- Visualize the cache execution\n",
"\n",
"Also, `Driver.cache.last_run_id` is a shortcut to the most recent execution."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache_format_dr.cache.resolve_behaviors(cache_format_dr.cache.last_run_id)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"processed_data::adapter::resolve_behavior\n",
"processed_data::adapter::set_cache_key\n",
"processed_data::adapter::get_cache_key::hit\n",
"processed_data::adapter::get_data_version::miss\n",
"processed_data::metadata_store::get_data_version::miss\n",
"processed_data::adapter::execute_node\n",
"processed_data::adapter::set_data_version\n",
"processed_data::metadata_store::set_data_version\n",
"processed_data::adapter::get_cache_key::hit\n",
"processed_data::adapter::get_data_version::hit\n",
"processed_data::result_store::set_result\n",
"processed_data::adapter::get_data_version::hit\n",
"processed_data::adapter::resolve_behavior\n"
]
}
],
"source": [
"run_logs = cache_format_dr.cache.logs(cache_format_dr.cache.last_run_id, level=\"debug\")\n",
"for event in run_logs[\"processed_data\"]:\n",
" print(event)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"527pt\" height=\"396pt\"\n",
" viewBox=\"0.00 0.00 527.00 395.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 391.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-391.5 523,-391.5 523,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-379.5 124.5,-379.5 124.5,-137.5 8.5,-137.5\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-364.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"406.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">Series</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n",
"<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-348 37,-348 37,-311 96,-311 96,-348\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-325.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-293C94.5,-293 38.5,-293 38.5,-293 32.5,-293 26.5,-287 26.5,-281 26.5,-281 26.5,-268 26.5,-268 26.5,-262 32.5,-256 38.5,-256 38.5,-256 94.5,-256 94.5,-256 100.5,-256 106.5,-262 106.5,-268 106.5,-268 106.5,-281 106.5,-281 106.5,-287 100.5,-293 94.5,-293\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n",
"<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f745cb790d0>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# calling `.view_run()` with no argument is equivalent to passing the last `run_id`\n",
"cache_format_dr.cache.view_run(cache_format_dr.cache.last_run_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactively explore runs\n",
"Using `ipywidgets`, we can build a widget that iterates over `run_id` values and displays the cache information for each run. Below, we create a `Driver` and execute it a few times to generate data, then inspect it with a widget."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::result_store::get_result::hit\n",
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"raw_data::result_store::get_result::hit\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n"
]
},
{
"data": {
"text/plain": [
"{'amount_per_country': Series([], Name: amound_in_usd, dtype: float64)}"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"interactive_dr = driver.Builder().with_modules(cache_format_module).with_cache().build()\n",
"\n",
"interactive_dr.execute([\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n",
"interactive_dr.execute([\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-05\"})\n",
"interactive_dr.execute([\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-10\"})\n",
"interactive_dr.execute([\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-11\"})\n",
"interactive_dr.execute([\"amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-13\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell displays a slider. Click and drag it, or use the arrow keys, to navigate between runs."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8a9785e33191453bac0b952ce1f80ef3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(SelectionSlider(description='run_id', options=('101f1759-82c3-416b-875b-e184b765af3c', '…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import display\n",
"from ipywidgets import SelectionSlider, interact\n",
"\n",
"\n",
"@interact(run_id=SelectionSlider(options=interactive_dr.cache.run_ids))\n",
"def iterate_over_runs(run_id):\n",
" display(interactive_dr.cache.data_versions[run_id])\n",
" display(interactive_dr.cache.view_run(run_id=run_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Managing storage\n",
"### Setting the cache `path`\n",
"\n",
"By default, metadata and results are stored under `./.hamilton_cache`, relative to the current directory at execution time. You can also manually set the directory via `.with_cache(path=...)` to isolate or centralize cache storage between dataflows or projects.\n",
"\n",
"Running the next cell will create the directory `./my_other_cache`."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"manual_path_dr = (\n",
" driver.Builder().with_modules(cache_format_module).with_cache(path=\"./my_other_cache\").build()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Instantiating the `result_store` and `metadata_store`\n",
"If you need to store metadata and results in separate locations, you can instantiate the `result_store` and `metadata_store` manually, each with its own configuration. In this case, any value passed to `.with_cache(path=...)` is ignored."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"from hamilton.caching.stores.file import FileResultStore\n",
"from hamilton.caching.stores.sqlite import SQLiteMetadataStore\n",
"\n",
"result_store = FileResultStore(path=\"./results\")\n",
"metadata_store = SQLiteMetadataStore(path=\"./metadata\")\n",
"\n",
"manual_stores_dr = (\n",
" driver.Builder()\n",
" .with_modules(cache_format_module)\n",
" .with_cache(\n",
" result_store=result_store,\n",
" metadata_store=metadata_store,\n",
" )\n",
" .build()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deleting data and recovering storage\n",
"As you use caching, you may accumulate a lot of data that you no longer need. One straightforward solution is to delete the entire directory where metadata and results are stored.\n",
"\n",
"You can also programmatically call `.delete_all()` on the `result_store` and `metadata_store`, which should reclaim most storage. If you delete results, make sure to also delete metadata. The caching mechanism should handle a mismatch between the two, but it's safer to keep them in sync."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"manual_stores_dr.cache.metadata_store.delete_all()\n",
"manual_stores_dr.cache.result_store.delete_all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage patterns\n",
"\n",
"As demonstrated here, caching works great in a notebook environment.\n",
"\n",
"- In addition to iteration speed, caching allows you to restart your kernel or shut down your computer for the day without worry. When you come back, you can still retrieve results from the cache.\n",
"\n",
"- A similar benefit is the ability to resume execution between environments. For example, you might be running Hamilton in a script; when a bug occurs, you can reload the cached values in a notebook and investigate.\n",
"\n",
"- Caching works great with other adapters like the `HamiltonTracker` that powers the Hamilton UI and the `MLFlowTracker` for experiment tracking.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🚧 INTERNALS\n",
"If you're curious, the following sections provide details about the caching internals. These APIs are not public and may change without notice."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manually retrieve results\n",
"Using the `Driver.cache` you can directly retrieve results from previous executions. The cache stores \"data versions\" which are keys for the `result_store`. \n",
"\n",
"Here, we get the `run_id` for the 4th execution (index 3) and the data version for `processed_data` before retrieving its value."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.23\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.67\n"
]
}
],
"source": [
"run_id = interactive_dr.cache.run_ids[3]\n",
"data_version = interactive_dr.cache.data_versions[run_id][\"processed_data\"]\n",
"result = interactive_dr.cache.result_store.get(data_version)\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Decoding the `cache_key`\n",
"\n",
"By now, you should have a better grasp of how Hamilton's caching determines when to execute a node. Internally, it creates a `cache_key` from the `code_version` of the node and the `data_version` of each dependency. The cache keys are stored on the `Driver.cache` and can be decoded for introspection and debugging.\n",
"\n",
"Here, we get the `run_id` for the 3rd execution (index 2) and the cache key for `amount_per_country`. We then use `decode_key()` to retrieve the `node_name`, `code_version`, and `dependencies_data_versions`."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'node_name': 'amount_per_country',\n",
" 'code_version': 'c2ccafa54280fbc969870b6baa445211277d7e8cfa98a0821836c175603ffda2',\n",
" 'dependencies_data_versions': {'processed_data': 'WgV5-4SfdKTfUY66x-msj_xXsKNPNTP2guRhfw=='}}"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from hamilton.caching.cache_key import decode_key\n",
"\n",
"run_id = interactive_dr.cache.run_ids[2]\n",
"cache_key = interactive_dr.cache.cache_keys[run_id][\"amount_per_country\"]\n",
"decode_key(cache_key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeed, this matches the data version of `processed_data` for the 3rd execution."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'WgV5-4SfdKTfUY66x-msj_xXsKNPNTP2guRhfw=='"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"interactive_dr.cache.data_versions[run_id][\"processed_data\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manually retrieve metadata\n",
"\n",
"In addition to the `result_store`, there is a `metadata_store` that contains the mapping between each `cache_key` and its `data_version` (cache keys are unique, but many can point to the same data).\n",
"\n",
"Using the knowledge from the previous section, we can use the cache key for `amount_per_country` to retrieve its `data_version` and result. We can also decode its `cache_key` to get the `data_version` of each dependency and load those results, which makes the node execution reproducible."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"country\n",
"Canada 526.9194\n",
"USA 1719.2400\n",
"Name: amound_in_usd, dtype: float64\n"
]
}
],
"source": [
"run_id = interactive_dr.cache.run_ids[2]\n",
"cache_key = interactive_dr.cache.cache_keys[run_id][\"amount_per_country\"]\n",
"amount_data_version = interactive_dr.cache.metadata_store.get(cache_key)\n",
"amount_result = interactive_dr.cache.result_store.get(amount_data_version)\n",
"print(amount_result)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"processed_data\n",
" cities date amount country currency amound_in_usd\n",
"0 New York 2024-09-13 478.23 USA USD 478.23\n",
"1 Los Angeles 2024-09-12 251.67 USA USD 251.67\n",
"2 Chicago 2024-09-11 989.34 USA USD 989.34\n",
"3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n",
"\n"
]
}
],
"source": [
"for dep_name, dependency_data_version in decode_key(cache_key)[\n",
" \"dependencies_data_versions\"\n",
"].items():\n",
" dep_result = interactive_dr.cache.result_store.get(dependency_data_version)\n",
" print(dep_name)\n",
" print(dep_result)\n",
" print()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}