| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Licensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership. The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License. You may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied. See the License for the\nspecific language governing permissions and limitations\nunder the License." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Execute this cell to install dependencies\n", |
| "%pip install sf-hamilton[visualization]" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# In-memory caching tutorial [Open in Colab](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/caching/in_memory_tutorial.ipynb) [View on GitHub](https://github.com/apache/hamilton/blob/main/examples/caching/in_memory_tutorial.ipynb)\n", |
| "\n", |
| "\n", |
| "This notebook shows how to use in-memory caching, which lets you cache results between runs without writing to disk. It relies on the `InMemoryResultStore` and `InMemoryMetadataStore` classes.\n", |
| "\n", |
| "> ⛔ In-memory caching can consume a lot of memory if you're storing large results. Selectively caching nodes is recommended.\n", |
| "\n", |
| "If you're new to caching, you should take a look at the [caching tutorial](./tutorial.ipynb) first!" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Setup\n", |
| "Throughout this tutorial, we'll be using the Hamilton notebook extension to define dataflows directly in the notebook ([see tutorial](https://github.com/apache/hamilton/blob/main/examples/jupyter_notebook_magic/example.ipynb)).\n", |
| "\n", |
| "Then, we get the logger for caching and clear previously cached results." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 1, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import logging\n", |
| "import shutil\n", |
| "\n", |
| "# avoid loading all available plugins for fast startup time\n", |
| "from hamilton import registry\n", |
| "\n", |
| "registry.disable_autoload()\n", |
| "registry.load_extension(\"pandas\")\n", |
| "\n", |
| "from hamilton import driver\n", |
| "\n", |
| "# load the notebook extension\n", |
| "%reload_ext hamilton.plugins.jupyter_magic\n", |
| "\n", |
| "logger = logging.getLogger(\"hamilton.caching\")\n", |
| "logger.setLevel(logging.INFO)\n", |
| "logger.addHandler(logging.StreamHandler())\n", |
| "\n", |
| "shutil.rmtree(\"./.hamilton_cache\", ignore_errors=True)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Define a dataflow\n", |
| "We define a simple dataflow that loads a dataframe of transactions, filters by date, converts currency to USD, and sums the amount per country." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"527pt\" height=\"286pt\"\n", |
| " viewBox=\"0.00 0.00 527.00 285.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 281.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-281.5 523,-281.5 523,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-269.5 114.5,-269.5 114.5,-137.5 18.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-254.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- amount_per_country -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>amount_per_country</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M507,-90.5C507,-90.5 349,-90.5 349,-90.5 343,-90.5 337,-84.5 337,-78.5 337,-78.5 337,-38.5 337,-38.5 337,-32.5 343,-26.5 349,-26.5 349,-26.5 507,-26.5 507,-26.5 513,-26.5 519,-32.5 519,-38.5 519,-38.5 519,-78.5 519,-78.5 519,-84.5 513,-90.5 507,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"348\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n", |
| "<text text-anchor=\"start\" x=\"389.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data->amount_per_country -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>processed_data->amount_per_country</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M308.21,-58.5C314.23,-58.5 320.39,-58.5 326.57,-58.5\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"326.98,-62 336.98,-58.5 326.98,-55 326.98,-62\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-238 37,-238 37,-201 96,-201 96,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-183C94.5,-183 38.5,-183 38.5,-183 32.5,-183 26.5,-177 26.5,-171 26.5,-171 26.5,-158 26.5,-158 26.5,-152 32.5,-146 38.5,-146 38.5,-146 94.5,-146 94.5,-146 100.5,-146 106.5,-152 106.5,-158 106.5,-158 106.5,-171 106.5,-171 106.5,-177 100.5,-183 94.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa266fc7910>" |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| } |
| ], |
| "source": [ |
| "%%cell_to_module dataflow_module --display\n", |
| "import pandas as pd\n", |
| "\n", |
| "DATA = {\n", |
| " \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\"],\n", |
| " \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\"],\n", |
| " \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56],\n", |
| " \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\"],\n", |
| " \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\"],\n", |
| "}\n", |
| "\n", |
| "def raw_data() -> pd.DataFrame:\n", |
| " \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n", |
| " return pd.DataFrame(DATA)\n", |
| "\n", |
| "def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n", |
| " \"\"\"Keep rows dated after the cutoff date and convert amounts to USD.\"\"\"\n", |
| " df = raw_data.loc[raw_data.date > cutoff_date].copy()\n", |
| " df[\"amound_in_usd\"] = df[\"amount\"]\n", |
| " df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 # CAD to USD\n", |
| " df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18 # BRL to USD\n", |
| " df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05 # MXN to USD\n", |
| " return df\n", |
| "\n", |
| "def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n", |
| " \"\"\"Sum the amount in USD per country\"\"\"\n", |
| " return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()" |
| ] |
| }, |
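| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "As a sanity check on the dataflow above, the conversion and aggregation can be reproduced by hand in plain Python. This is only a sketch: it copies the amounts from `DATA` and the hard-coded 0.71 rate from `processed_data`, assumes every row passes the cutoff filter, and rounds to 4 decimals to hide floating-point noise. The variable names are illustrative.\n", |
| "\n", |
| "```python\n", |
| "CAD_TO_USD = 0.71  # rate hard-coded in processed_data\n", |
| "usa_amounts = [478.23, 251.67, 989.34]  # already in USD\n", |
| "canada_amounts = [742.14, 584.56]  # in CAD\n", |
| "\n", |
| "# Convert the Canadian rows, then sum per country.\n", |
| "canada_in_usd = [round(amount * CAD_TO_USD, 4) for amount in canada_amounts]\n", |
| "totals = {\n", |
| "    'Canada': round(sum(canada_in_usd), 4),\n", |
| "    'USA': round(sum(usa_amounts), 4),\n", |
| "}\n", |
| "print(canada_in_usd)\n", |
| "print(totals)\n", |
| "```\n", |
| "\n", |
| "The converted Canadian amounts, 526.9194 and 415.0376, are the same `amound_in_usd` values printed by the executions later in this notebook." |
| ] |
| }, |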
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## In-memory caching\n", |
| "To use in-memory caching, pass `InMemoryResultStore` and `InMemoryMetadataStore` to `Builder().with_cache()`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 3, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from hamilton.caching.stores.memory import InMemoryMetadataStore, InMemoryResultStore\n", |
| "\n", |
| "dr = (\n", |
| " driver.Builder()\n", |
| " .with_modules(dataflow_module)\n", |
| " .with_cache(\n", |
| " result_store=InMemoryResultStore(),\n", |
| " metadata_store=InMemoryMetadataStore(),\n", |
| " )\n", |
| " .build()\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execution 1\n", |
| "For execution 1, we see that all nodes are executed." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 4, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::adapter::execute_node\n", |
| "processed_data::adapter::execute_node\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"18.5,-137.5 18.5,-324.5 114.5,-324.5 114.5,-137.5 18.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M94.5,-238C94.5,-238 38.5,-238 38.5,-238 32.5,-238 26.5,-232 26.5,-226 26.5,-226 26.5,-213 26.5,-213 26.5,-207 32.5,-201 38.5,-201 38.5,-201 94.5,-201 94.5,-201 100.5,-201 106.5,-207 106.5,-213 106.5,-213 106.5,-226 106.5,-226 106.5,-232 100.5,-238 94.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-183C88.5,-183 44.5,-183 44.5,-183 38.5,-183 32.5,-177 32.5,-171 32.5,-171 32.5,-158 32.5,-158 32.5,-152 38.5,-146 44.5,-146 44.5,-146 88.5,-146 88.5,-146 94.5,-146 100.5,-152 100.5,-158 100.5,-158 100.5,-171 100.5,-171 100.5,-177 94.5,-183 88.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2668b4e90>" |
| ] |
| }, |
| "execution_count": 4, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execution 2\n", |
| "For execution 2, we see that all nodes are retrieved from cache." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 5, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::result_store::get_result::hit\n", |
| "processed_data::result_store::get_result::hit\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "<!-- from cache -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>from cache</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2653b1d50>" |
| ] |
| }, |
| "execution_count": 5, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
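| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "The log lines above (`execute_node` on the first run, `get_result::hit` on the second) follow a simple lookup pattern. The sketch below is not Hamilton's implementation: `ToyResultStore`, `run_node`, and the `'v1'` key are hypothetical names that illustrate a result store living in a Python dictionary.\n", |
| "\n", |
| "```python\n", |
| "class ToyResultStore:\n", |
| "    '''Hypothetical stand-in for an in-memory result store.'''\n", |
| "\n", |
| "    def __init__(self):\n", |
| "        self._results = {}  # maps a data_version string to a result\n", |
| "\n", |
| "    def exists(self, data_version):\n", |
| "        return data_version in self._results\n", |
| "\n", |
| "    def get(self, data_version):\n", |
| "        return self._results[data_version]\n", |
| "\n", |
| "    def set(self, data_version, result):\n", |
| "        self._results[data_version] = result\n", |
| "\n", |
| "\n", |
| "def run_node(store, data_version, compute, log):\n", |
| "    '''Return the cached result if present; otherwise compute and store it.'''\n", |
| "    if store.exists(data_version):\n", |
| "        log.append('hit')\n", |
| "        return store.get(data_version)\n", |
| "    log.append('execute')\n", |
| "    result = compute()\n", |
| "    store.set(data_version, result)\n", |
| "    return result\n", |
| "\n", |
| "\n", |
| "store, log = ToyResultStore(), []\n", |
| "first = run_node(store, 'v1', lambda: sum([478.23, 251.67]), log)\n", |
| "second = run_node(store, 'v1', lambda: sum([478.23, 251.67]), log)\n", |
| "print(log)\n", |
| "```\n", |
| "\n", |
| "Because the dictionary lives in the Python process, the cached value disappears when the kernel restarts unless it is copied somewhere else first." |
| ] |
| }, |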
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Persisting in-memory data\n", |
| "\n", |
| "Now, we import `SQLiteMetadataStore` and `FileResultStore` to persist the data to disk. We access the in-memory stores via `dr.cache.result_store` and `dr.cache.metadata_store` and call the `.persist_to()` method on each.\n", |
| "\n", |
| "After executing the cell, you should see a new directory `./.persisted_cache` with results and metadata." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from hamilton.caching.stores.file import FileResultStore\n", |
| "from hamilton.caching.stores.sqlite import SQLiteMetadataStore\n", |
| "\n", |
| "path = \"./.persisted_cache\"\n", |
| "on_disk_results = FileResultStore(path=path)\n", |
| "on_disk_metadata = SQLiteMetadataStore(path=path)\n", |
| "\n", |
| "dr.cache.result_store.persist_to(on_disk_results)\n", |
| "dr.cache.metadata_store.persist_to(on_disk_metadata)" |
| ] |
| }, |
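| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Conceptually, persisting copies every in-memory entry into a store that writes to disk. The sketch below illustrates that idea with the standard library only. `persist_results`, the one-pickle-file-per-entry layout, and the temporary directory are assumptions made for illustration; they do not describe how `FileResultStore` actually organizes its files.\n", |
| "\n", |
| "```python\n", |
| "import pickle\n", |
| "import tempfile\n", |
| "from pathlib import Path\n", |
| "\n", |
| "\n", |
| "def persist_results(in_memory_results, directory):\n", |
| "    '''Write each in-memory result to its own pickle file (hypothetical layout).'''\n", |
| "    directory = Path(directory)\n", |
| "    directory.mkdir(parents=True, exist_ok=True)\n", |
| "    for data_version, result in in_memory_results.items():\n", |
| "        (directory / f'{data_version}.pkl').write_bytes(pickle.dumps(result))\n", |
| "    # Report which files now exist on disk.\n", |
| "    return sorted(path.name for path in directory.iterdir())\n", |
| "\n", |
| "\n", |
| "in_memory_results = {'version_a': [1, 2, 3], 'version_b': {'total': 6}}\n", |
| "with tempfile.TemporaryDirectory() as tmp_dir:\n", |
| "    written = persist_results(in_memory_results, tmp_dir)\n", |
| "print(written)\n", |
| "```" |
| ] |
| }, |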
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Loading persisted data\n", |
| "\n", |
| "Now, we create a new `Driver`. Instead of starting with empty in-memory stores, we will load the previously persisted results by calling `.load_from()` on the `InMemoryResultStore` and `InMemoryMetadataStore` classes.\n", |
| "\n", |
| "For `InMemoryResultStore.load_from()`, we must provide either a `MetadataStore` or a list of `data_version` values that identifies which results to load." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "dr = (\n", |
| " driver.Builder()\n", |
| " .with_modules(dataflow_module)\n", |
| " .with_cache(\n", |
| " result_store=InMemoryResultStore.load_from(\n", |
| " on_disk_results,\n", |
| " metadata_store=on_disk_metadata,\n", |
| " ),\n", |
| " metadata_store=InMemoryMetadataStore.load_from(on_disk_metadata),\n", |
| " )\n", |
| " .build()\n", |
| ")" |
| ] |
| }, |
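| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "One way to read the requirement above: a store that maps `data_version` to a result answers lookups by key, so the caller supplies the keys to copy, either directly or through a metadata store that recorded them. That reading is an interpretation, and the sketch below uses hypothetical names (`load_results`, plain dictionaries standing in for the stores) rather than Hamilton's API.\n", |
| "\n", |
| "```python\n", |
| "def load_results(on_disk_results, data_versions):\n", |
| "    '''Copy only the requested entries into a fresh in-memory dictionary.'''\n", |
| "    in_memory = {}\n", |
| "    for data_version in data_versions:\n", |
| "        if data_version in on_disk_results:  # skip versions the store never saw\n", |
| "            in_memory[data_version] = on_disk_results[data_version]\n", |
| "    return in_memory\n", |
| "\n", |
| "\n", |
| "# Plain dictionaries stand in for the on-disk result store and for the\n", |
| "# data versions that a metadata store would report.\n", |
| "on_disk_results = {'v_raw': 'raw rows', 'v_processed': 'filtered rows', 'v_old': 'stale rows'}\n", |
| "known_versions = ['v_raw', 'v_processed', 'v_missing']\n", |
| "\n", |
| "loaded = load_results(on_disk_results, known_versions)\n", |
| "print(sorted(loaded))\n", |
| "```\n", |
| "\n", |
| "Loading only the listed versions keeps stale entries such as `'v_old'` out of memory, which matters given the memory warning at the top of this notebook." |
| ] |
| }, |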
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "We print the size of the metadata store to show that it contains 2 entries (one for `raw_data` and another for `processed_data`). We also see that the results loaded from `FileResultStore` are successfully retrieved from the in-memory stores." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stderr", |
| "output_type": "stream", |
| "text": [ |
| "raw_data::result_store::get_result::hit\n", |
| "processed_data::result_store::get_result::hit\n" |
| ] |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "2\n", |
| "\n", |
| " cities date amount country currency amound_in_usd\n", |
| "0 New York 2024-09-13 478.23 USA USD 478.2300\n", |
| "1 Los Angeles 2024-09-12 251.67 USA USD 251.6700\n", |
| "2 Chicago 2024-09-11 989.34 USA USD 989.3400\n", |
| "3 Montréal 2024-09-11 742.14 Canada CAD 526.9194\n", |
| "4 Vancouver 2024-09-09 584.56 Canada CAD 415.0376\n" |
| ] |
| }, |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"316pt\" height=\"341pt\"\n", |
| " viewBox=\"0.00 0.00 316.00 340.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 336.5)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-336.5 312,-336.5 312,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8.5,-137.5 8.5,-324.5 124.5,-324.5 124.5,-137.5 8.5,-137.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-309.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- raw_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>raw_data</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104,-127.5C104,-127.5 29,-127.5 29,-127.5 23,-127.5 17,-121.5 17,-115.5 17,-115.5 17,-75.5 17,-75.5 17,-69.5 23,-63.5 29,-63.5 29,-63.5 104,-63.5 104,-63.5 110,-63.5 116,-69.5 116,-75.5 116,-75.5 116,-115.5 116,-115.5 116,-121.5 110,-127.5 104,-127.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"30\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n", |
| "<text text-anchor=\"start\" x=\"28\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- processed_data -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>processed_data</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M296,-90.5C296,-90.5 174,-90.5 174,-90.5 168,-90.5 162,-84.5 162,-78.5 162,-78.5 162,-38.5 162,-38.5 162,-32.5 168,-26.5 174,-26.5 174,-26.5 296,-26.5 296,-26.5 302,-26.5 308,-32.5 308,-38.5 308,-38.5 308,-78.5 308,-78.5 308,-84.5 302,-90.5 296,-90.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"173\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n", |
| "<text text-anchor=\"start\" x=\"196.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- raw_data->processed_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>raw_data->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.11,-84.7C127.39,-82.19 139.71,-79.45 151.99,-76.72\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"152.81,-80.13 161.82,-74.54 151.29,-73.29 152.81,-80.13\"/>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_processed_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"133,-45 0,-45 0,0 133,0 133,-45\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n", |
| "<text text-anchor=\"start\" x=\"99.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _processed_data_inputs->processed_data -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_processed_data_inputs->processed_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M133.31,-36.73C139.45,-38.06 145.73,-39.41 151.99,-40.77\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"151.3,-44.2 161.81,-42.89 152.78,-37.36 151.3,-44.2\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"96,-293 37,-293 37,-256 96,-256 96,-293\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-270.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M88.5,-238C88.5,-238 44.5,-238 44.5,-238 38.5,-238 32.5,-232 32.5,-226 32.5,-226 32.5,-213 32.5,-213 32.5,-207 38.5,-201 44.5,-201 44.5,-201 88.5,-201 88.5,-201 94.5,-201 100.5,-207 100.5,-213 100.5,-213 100.5,-226 100.5,-226 100.5,-232 94.5,-238 88.5,-238\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-215.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "<!-- from cache -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>from cache</title>\n", |
| "<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M104.5,-183C104.5,-183 28.5,-183 28.5,-183 22.5,-183 16.5,-177 16.5,-171 16.5,-171 16.5,-158 16.5,-158 16.5,-152 22.5,-146 28.5,-146 28.5,-146 104.5,-146 104.5,-146 110.5,-146 116.5,-152 116.5,-158 116.5,-158 116.5,-171 116.5,-171 116.5,-177 110.5,-183 104.5,-183\"/>\n", |
| "<text text-anchor=\"middle\" x=\"66.5\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7fa2653aa910>" |
| ] |
| }, |
| "execution_count": 8, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "print(dr.cache.metadata_store.size)\n", |
| "\n", |
| "results = dr.execute([\"processed_data\"], inputs={\"cutoff_date\": \"2024-09-01\"})\n", |
| "print()\n", |
| "print(results[\"processed_data\"].head())\n", |
| "dr.cache.view_run()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## Use cases\n", |
| "\n", |
| "In-memory caching is useful when you're experimenting heavily in a notebook or an interactive session and don't want to persist results for future use.\n", |
| "\n", |
| "It can also speed up execution in some cases because nodes no longer read from and write to disk on each execution." |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": ".venv", |
| "language": "python", |
| "name": "python3" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 3 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython3", |
| "version": "3.11.9" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 2 |
| } |