| { |
| "cells": [ |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Execute this cell to install dependencies\n", |
| "%pip install sf-hamilton[visualization]" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# In-memory caching tutorial [Open in Colab](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/caching/in_memory_tutorial.ipynb) [View on GitHub](https://github.com/dagworks-inc/hamilton/blob/main/examples/caching/in_memory_tutorial.ipynb)\n", |
| "\n", |
| "\n", |
| "This notebook shows how to use in-memory caching, which lets you cache results between runs without writing to disk. It relies on the `InMemoryResultStore` and `InMemoryMetadataStore` classes.\n", |
| "\n", |
| "> ⛔ In-memory caching can consume a lot of memory if you're storing large results. Selectively caching nodes is recommended.\n", |
| "\n", |
| "If you're new to caching, you should take a look at the [caching tutorial](./tutorial.ipynb) first!" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Setup\n", |
| "Throughout this tutorial, we'll be using the Hamilton notebook extension to define dataflows directly in the notebook ([see tutorial](https://github.com/DAGWorks-Inc/hamilton/blob/main/examples/jupyter_notebook_magic/example.ipynb)).\n", |
| "\n", |
| "Then, we configure the `hamilton.caching` logger to print cache events and delete any previously cached results in `./.hamilton_cache`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 1, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import logging\n", |
| "import shutil\n", |
| "\n", |
| "# avoid loading all available plugins for fast startup time\n", |
| "from hamilton import registry\n", |
| "registry.disable_autoload()\n", |
| "registry.load_extension(\"pandas\")\n", |
| "\n", |
| "from hamilton import driver\n", |
| "\n", |
| "# load the notebook extension\n", |
| "%reload_ext hamilton.plugins.jupyter_magic\n", |
| "\n", |
| "logger = logging.getLogger(\"hamilton.caching\")\n", |
| "logger.setLevel(logging.INFO)\n", |
| "logger.addHandler(logging.StreamHandler())\n", |
| "\n", |
| "shutil.rmtree(\"./.hamilton_cache\", ignore_errors=True)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Define a dataflow\n", |
| "We define a simple dataflow that loads a dataframe of transactions, filters by date, converts currency to USD, and sums the amount per country." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"527pt\" height=\"286pt\"\n", |
| " viewBox=\"0.00 0.00 527.00 285.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 281.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-281.5 523,-281.5 523,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-269.5 114.5,-269.5 114.5,-137.5 18.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-254.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- amount_per_country -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>amount_per_country</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n", |
| "<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data->amount_per_country -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>processed_data->amount_per_country</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-238 37,-238 37,-201 96,-201 96,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-183C94.5,-183 38.5,-183 38.5,-183 32.5,-183 26.5,-177 26.5,-171 26.5,-171 26.5,-158 26.5,-158 26.5,-152 32.5,-146 38.5,-146 38.5,-146 94.5,-146 94.5,-146 100.5,-146 106.5,-152 106.5,-158 106.5,-158 106.5,-171 106.5,-171 106.5,-177 100.5,-183 94.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa266fc7910>" |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| } |
| ], |
| "source": [ |
| "%%cell_to_module dataflow_module --display\n", |
| "import pandas as pd\n", |
| "\n", |
| "DATA = {\n", |
| " \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n", |
| " \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n", |
| " \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n", |
| " \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n", |
| " \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n", |
| "}\n", |
| "\n", |
| "def raw_data() -> pd.DataFrame:\n", |
| " \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n", |
| " return pd.DataFrame(DATA)\n", |
| "\n", |
| "def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n", |
| " \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n", |
| " df = raw_data.loc[raw_data.date > cutoff_date].copy()\n", |
| " df[\"amound_in_usd\"] = df[\"amount\"]\n", |
| " df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71\n", |
| " df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n", |
| " df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n", |
| " return df\n", |
| "\n", |
| "def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n", |
| " \"\"\"Sum the amount in USD per country\"\"\"\n", |
| " return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## In-memory caching\n", |
| "To use in-memory caching, pass `InMemoryResultStore` and `InMemoryMetadataStore` to `Builder().with_cache()`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 3, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from hamilton.caching.stores.memory import InMemoryMetadataStore, InMemoryResultStore\n", |
| "\n", |
| "dr = (\n", |
| " driver.Builder()\n", |
| " .with_modules(dataflow_module)\n", |
| " .with_cache(\n", |
| " result_store=InMemoryResultStore(),\n", |
| " metadata_store=InMemoryMetadataStore(),\n", |
| " )\n", |
| " .build()\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execution 1\n", |
| "For execution 1, nothing is cached yet, so the logs show that both `raw_data` and `processed_data` are executed." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 4, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::adapter::execute_node\n", |
| "processed_data::adapter::execute_node\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2668b4e90>" |
| ] |
| }, |
| "execution_count": 4, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execution 2\n", |
| "For execution 2, the code and inputs are unchanged, so the logs show that both nodes are retrieved from the cache." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 5, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::result_store::get_result::hit\n", |
| "processed_data::result_store::get_result::hit\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "<!-- from cache -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>from cache</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2653b1d50>" |
| ] |
| }, |
| "execution_count": 5, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Persisting in-memory data\n", |
| "\n", |
| "Now, we import `SQLiteMetadataStore` and `FileResultStore` to persist the data to disk. We access the in-memory stores via `dr.cache.result_store` and `dr.cache.metadata_store` and call the `.persist_to()` method on each.\n", |
| "\n", |
| "After executing the cell, you should see a new directory `./.persisted_cache` with results and metadata." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from hamilton.caching.stores.sqlite import SQLiteMetadataStore\n", |
| "from hamilton.caching.stores.file import FileResultStore\n", |
| "\n", |
| "path = \"./.persisted_cache\"\n", |
| "on_disk_results = FileResultStore(path=path)\n", |
| "on_disk_metadata = SQLiteMetadataStore(path=path)\n", |
| "\n", |
| "dr.cache.result_store.persist_to(on_disk_results)\n", |
| "dr.cache.metadata_store.persist_to(on_disk_metadata)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Loading persisted data\n", |
| "\n", |
| "Now, we create a new `Driver`. Instead of starting with empty in-memory stores, we will load the previously persisted results by calling `.load_from()` on the `InMemoryResultStore` and `InMemoryMetadataStore` classes.\n", |
| "\n", |
| "For `InMemoryResultStore.load_from()`, we must provide either a `MetadataStore` or a list of `data_version` values to load results for." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "dr = (\n", |
| " driver.Builder()\n", |
| " .with_modules(dataflow_module)\n", |
| " .with_cache(\n", |
| " result_store=InMemoryResultStore.load_from(\n", |
| " on_disk_results,\n", |
| " metadata_store=on_disk_metadata,\n", |
| " ),\n", |
| " metadata_store=InMemoryMetadataStore.load_from(on_disk_metadata),\n", |
| " )\n", |
| " .build()\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "We print the size of the metadata store to show it contains 2 entries (one for `raw_data` and another for `processed_data`). We also see that the results loaded from `FileResultStore` are successfully retrieved from the in-memory stores." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::result_store::get_result::hit\n", |
| "processed_data::result_store::get_result::hit\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "2\n", |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "<!-- from cache -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>from cache</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2653aa910>" |
| ] |
| }, |
| "execution_count": 8, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "print(dr.cache.metadata_store.size)\n", |
| "\n", |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Use cases\n", |
| "\n", |
| "In-memory caching can be useful when you're experimenting heavily in a notebook or an interactive session and don't want to persist results for future use.\n", |
| "\n", |
| "It can also speed up execution in some cases because results are no longer read from or written to disk for each node execution." |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": ".venv", |
| "language": "python", |
| "name": "python3" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 3 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython3", |
| "version": "3.11.9" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 2 |
| } |