{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Licensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership. The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License. You may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied. See the License for the\nspecific language governing permissions and limitations\nunder the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Execute this cell to install dependencies\n",
"%pip install sf-hamilton[visualization]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Caching + materializers tutorial [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/caching/materializer_tutorial.ipynb) [![GitHub badge](https://img.shields.io/badge/github-view_source-2b3137?logo=github)](https://github.com/apache/hamilton/blob/main/examples/caching/materializer_tutorial.ipynb)\n",
"\n",
"\n",
"This notebook is a companion tutorial to the **Hamilton caching tutorial** notebook, which introduces caching more broadly.\n",
"\n",
"Its **Materializers** section teaches about different usage patterns for caching + materializers and introduces the `default_loader_behavior` and `default_saver_behavior` parameters. This notebook will show how to control loader and saver behaviors granularly.\n",
"\n",
"## Use cases\n",
"\n",
"As a reminder, here are some potential usage patterns\n",
"\n",
"**Loading data is expensive**: Your dataflow uses a `DataLoader` to get data from Snowflake. You want to load it once and cache it. When executing your dataflow, you want to use your cached copy to save query time, egress costs, etc.\n",
"- Use the `DEFAULT` caching behavior for loaders.\n",
"\n",
"**Only save new data**: You run the dataflow multiple times (maybe with different parameters or on a schedule) and only want to write to destination when the data changes.\n",
"- Use the `DEFAULT` caching behavior for savers.\n",
"\n",
"**Always read the latest data**: You want to use caching, but also ensure the dataflow always uses the latest data. This involves executing the `DataLoader` every time, get the data in-memory, version it, and then determine what needs to be executed (see [Changing external data](#changing-external-data)).\n",
"- Use the `RECOMPUTE` caching behavior for loaders.\n",
"\n",
"> NOTE. Caching + materializers is actively being improved so default behaviors and low-level APIs might change. This is a very powerful combo. If you have ideas, questions, or use cases, please reach out on Slack!"
]
},
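{
"cell_type": "markdown",
"metadata": {},
"source": [
"The \"always read the latest data\" pattern above can be pictured with a small standard-library sketch. This is a toy model under stated assumptions, not Hamilton's implementation: `result_store`, `fingerprint()`, `load()`, and `process()` are hypothetical names invented for illustration.\n",
"\n",
"```python\n",
"import hashlib\n",
"import json\n",
"\n",
"# Toy in-memory stand-ins; NOT Hamilton's implementation.\n",
"result_store = {}  # cache key -> cached result\n",
"calls = {'load': 0, 'process': 0}  # counts how often each step really runs\n",
"\n",
"\n",
"def fingerprint(value):\n",
"    '''Hash a JSON-serializable value to obtain a data version.'''\n",
"    payload = json.dumps(value, sort_keys=True).encode('utf-8')\n",
"    return hashlib.sha256(payload).hexdigest()\n",
"\n",
"\n",
"def load(source):\n",
"    '''RECOMPUTE-style loader: always executes, never served from cache.'''\n",
"    calls['load'] += 1\n",
"    return list(source)\n",
"\n",
"\n",
"def process(rows):\n",
"    '''DEFAULT-style step: executes only when its input version is new.'''\n",
"    key = ('process', fingerprint(rows))\n",
"    if key not in result_store:\n",
"        calls['process'] += 1\n",
"        result_store[key] = sum(rows)\n",
"    return result_store[key]\n",
"\n",
"\n",
"source = [1, 2, 3]\n",
"first = process(load(source))   # loader runs, process runs\n",
"second = process(load(source))  # loader runs again, process hits the cache\n",
"source.append(4)                # the external data changes\n",
"third = process(load(source))   # loader runs, process runs on the new version\n",
"```\n",
"\n",
"The loader runs on every call, but the downstream step only runs when the fingerprint of the loaded data changes, which is the trade-off the `RECOMPUTE` loader behavior is meant to give."
]
},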
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up\n",
"The next cell sets up the notebook by:\n",
"- loading the Hamilton notebook extension\n",
"- getting the caching logger\n",
"- removing any existing cache directory"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import shutil\n",
"\n",
"from hamilton import driver\n",
"\n",
"CACHE_DIR = \"./.materializer_and_caching_cache\"\n",
"\n",
"logger = logging.getLogger(\"hamilton.caching\")\n",
"logger.setLevel(logging.INFO)\n",
"logger.addHandler(logging.StreamHandler())\n",
"\n",
"shutil.rmtree(CACHE_DIR, ignore_errors=True)\n",
"\n",
"# load the notebook extension\n",
"%reload_ext hamilton.plugins.jupyter_magic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TL;DR\n",
"\n",
"Before diving into the details, here are the high-level ideas\n",
"\n",
"Materializers are available in several flavors:\n",
"- `@dataloader` and `@datasaver`: create a custom function to serve as `DataLoader` or `DataSaver`.\n",
"- `@load_from`: use a decorator to specify that an argument should be loaded from an external source\n",
"- `@save_to`: create a node that saves the output of the decorated node\n",
"- `from_` and `to`: equivalent to `@load_from` and `@save_to` but at the `Driver`-level\n",
"\n",
"When materializers and caching interact, it's important to realize the following:\n",
"- `@dataloader` and `@datasaver` are just like any other nodes and you can use `@cache` and `.with_cache()` as usual.\n",
"- `@load_from` and `@save_to` create nodes dynamically, so you there's no loader/saver function to apply `@cache` to directly. Instead, you add `@cache` to the function that has the `@load_from`/`@save_to` decorator. Also, you need to specify the name of internal nodes in `.with_cache()`, which can be trickier\n",
"- `from_` and `to` can't be decorated with `@cache` because they're defined at the `Driver`-level. Defining \"static\" materializers using `.with_materializers()` and `.with_cache()` is more intuitive. If you're using `Driver.materialize()` with \"dynamic\" materializers, you can still use `.with_cache()`. It can be more odd because `.with_cache()` will define behaviors for nodes that don't exist yet.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `@dataloader` and `@datasaver`\n",
"### Dataflow-level\n",
"\n",
"Let's rewrite the dataflow with the `@dataloader` and `@datasaver` decorators.\n",
"\n",
"- **DataLoader**: the function `raw_data()` now returns a `tuple` of `(result, metadata)`. The tuple type annotation needs to specify that `raw_data` returns a `pd.DataFrame` as the first element.\n",
"- **DataSaver**: the function `saved_data()` was added. It receives `amount_per_country()` and saves it to a parquet file. It must return a dictionary, which can contain metadata.\n",
"\n",
"Using the `@cache` decorator with `raw_data` or `saved_data` will apply the behavior to all associated materialization nodes.\n",
"\n",
 NOTE. the `@cache`">
"> NOTE. The `@cache` decorator can be placed above or below the `@dataloader` / `@datasaver` decorator."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 847.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 843,-350.5 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-149.5 14.5,-338.5 135.5,-338.5 135.5,-149.5 14.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M475,-90.5C475,-90.5 353,-90.5 353,-90.5 347,-90.5 341,-84.5 341,-78.5 341,-78.5 341,-38.5 341,-38.5 341,-32.5 347,-26.5 353,-26.5 353,-26.5 475,-26.5 475,-26.5 481,-26.5 487,-32.5 487,-38.5 487,-38.5 487,-78.5 487,-78.5 487,-84.5 481,-90.5 475,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M686,-90.5C686,-90.5 528,-90.5 528,-90.5 522,-90.5 516,-84.5 516,-78.5 516,-78.5 516,-38.5 516,-38.5 516,-32.5 522,-26.5 528,-26.5 528,-26.5 686,-26.5 686,-26.5 692,-26.5 698,-32.5 698,-38.5 698,-38.5 698,-78.5 698,-78.5 698,-84.5 692,-90.5 686,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-58.5C493.23,-58.5 499.39,-58.5 505.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-62 515.98,-58.5 505.98,-55 505.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-127.5C283,-127.5 208,-127.5 208,-127.5 202,-127.5 196,-121.5 196,-115.5 196,-115.5 196,-75.5 196,-75.5 196,-69.5 202,-63.5 208,-63.5 208,-63.5 283,-63.5 283,-63.5 289,-63.5 295,-69.5 295,-75.5 295,-75.5 295,-115.5 295,-115.5 295,-121.5 289,-127.5 283,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-84.7C306.39,-82.19 318.71,-79.45 330.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-80.13 340.82,-74.54 330.29,-73.29 331.81,-80.13\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M839,-94.5C839,-98.91 813.9,-102.5 783,-102.5 752.1,-102.5 727,-98.91 727,-94.5 727,-94.5 727,-22.5 727,-22.5 727,-18.09 752.1,-14.5 783,-14.5 813.9,-14.5 839,-18.09 839,-22.5 839,-22.5 839,-94.5 839,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-94.5C839,-90.09 813.9,-86.5 783,-86.5 752.1,-86.5 727,-90.09 727,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-58.5C704.33,-58.5 710.57,-58.5 716.67,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-62 726.76,-58.5 716.76,-55 716.76,-62\"/>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-131.5C150,-135.91 116.38,-139.5 75,-139.5 33.62,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 33.62,-51.5 75,-51.5 116.38,-51.5 150,-55.09 150,-59.5 150,-59.5 150,-131.5 150,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-131.5C150,-127.09 116.38,-123.5 75,-123.5 33.62,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-95.5C162.14,-95.5 174.15,-95.5 185.48,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-99 195.67,-95.5 185.67,-92 185.67,-99\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-45 179,-45 179,0 312,0 312,-45\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-36.73C318.45,-38.06 324.73,-39.41 330.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-44.2 340.81,-42.89 331.78,-37.36 330.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-307 45.5,-307 45.5,-270 104.5,-270 104.5,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-252C103,-252 47,-252 47,-252 41,-252 35,-246 35,-240 35,-240 35,-227 35,-227 35,-221 41,-215 47,-215 47,-215 103,-215 103,-215 109,-215 115,-221 115,-227 115,-227 115,-240 115,-240 115,-246 109,-252 103,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-193.76C127.5,-195.76 103.97,-197.38 75,-197.38 46.03,-197.38 22.5,-195.76 22.5,-193.76 22.5,-193.76 22.5,-161.24 22.5,-161.24 22.5,-159.24 46.03,-157.62 75,-157.62 103.97,-157.62 127.5,-159.24 127.5,-161.24 127.5,-161.24 127.5,-193.76 127.5,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-193.76C127.5,-191.77 103.97,-190.15 75,-190.15 46.03,-190.15 22.5,-191.77 22.5,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111ae73090>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module dataloader_dataflow_module -d\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import dataloader, datasaver, cache\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@cache(behavior=\"recompute\")\n",
"@dataloader()\n",
"def raw_data() -> tuple[pd.DataFrame, dict]:\n",
" \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
" data = pd.DataFrame(DATA)\n",
" metadata = {\"source\": \"notebook\", \"format\": \"json\"}\n",
" return data, metadata\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()\n",
"\n",
"@cache(behavior=\"recompute\")\n",
"@datasaver()\n",
"def saved_data(amount_per_country: pd.DataFrame) -> dict:\n",
" amount_per_country.to_parquet(\"./saved_data.parquet\")\n",
" metadata = {\"source\": \"notebook\", \"format\": \"parquet\"}\n",
" return metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visualization now displays the \"materializer\" node for the data loader. When we execute the dataflow twice and see that both `raw_data` and the associated `raw_data.loader` are recomputed."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data.loader::adapter::execute_node\n",
"raw_data.loader::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"saved_data::adapter::execute_node\n",
"saved_data::adapter::execute_node\n",
"raw_data.loader::adapter::execute_node\n",
"raw_data.loader::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"saved_data::adapter::execute_node\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"413pt\"\n",
" viewBox=\"0.00 0.00 847.00 413.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 409)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-409 843,-409 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-98 14.5,-397 135.5,-397 135.5,-98 14.5,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-381.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-76C283,-76 208,-76 208,-76 202,-76 196,-70 196,-64 196,-64 196,-24 196,-24 196,-18 202,-12 208,-12 208,-12 283,-12 283,-12 289,-12 295,-18 295,-24 295,-24 295,-64 295,-64 295,-70 289,-76 283,-76\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M475,-112C475,-112 353,-112 353,-112 347,-112 341,-106 341,-100 341,-100 341,-60 341,-60 341,-54 347,-48 353,-48 353,-48 475,-48 475,-48 481,-48 487,-54 487,-60 487,-60 487,-100 487,-100 487,-106 481,-112 475,-112\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-54.51C306.39,-56.95 318.71,-59.61 330.99,-62.27\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-65.7 340.82,-64.39 331.78,-58.86 330.3,-65.7\"/>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-80C150,-84.41 116.38,-88 75,-88 33.62,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 33.62,0 75,0 116.38,0 150,-3.59 150,-8 150,-8 150,-80 150,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-80C150,-75.59 116.38,-72 75,-72 33.62,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-44C162.14,-44 174.15,-44 185.48,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-47.5 195.67,-44 185.67,-40.5 185.67,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M686,-112C686,-112 528,-112 528,-112 522,-112 516,-106 516,-100 516,-100 516,-60 516,-60 516,-54 522,-48 528,-48 528,-48 686,-48 686,-48 692,-48 698,-54 698,-60 698,-60 698,-100 698,-100 698,-106 692,-112 686,-112\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-80C493.23,-80 499.39,-80 505.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-83.5 515.98,-80 505.98,-76.5 505.98,-83.5\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M839,-116C839,-120.41 813.9,-124 783,-124 752.1,-124 727,-120.41 727,-116 727,-116 727,-44 727,-44 727,-39.59 752.1,-36 783,-36 813.9,-36 839,-39.59 839,-44 839,-44 839,-116 839,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-116C839,-111.59 813.9,-108 783,-108 752.1,-108 727,-111.59 727,-116\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-80C704.33,-80 710.57,-80 716.67,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-83.5 726.76,-80 716.76,-76.5 716.76,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-139.5 179,-139.5 179,-94.5 312,-94.5 312,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-102.38C318.45,-101.01 324.73,-99.62 330.99,-98.22\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-101.63 340.81,-96.04 330.29,-94.79 331.81,-101.63\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-365.5 45.5,-365.5 45.5,-328.5 104.5,-328.5 104.5,-365.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-343.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-310.5C103,-310.5 47,-310.5 47,-310.5 41,-310.5 35,-304.5 35,-298.5 35,-298.5 35,-285.5 35,-285.5 35,-279.5 41,-273.5 47,-273.5 47,-273.5 103,-273.5 103,-273.5 109,-273.5 115,-279.5 115,-285.5 115,-285.5 115,-298.5 115,-298.5 115,-304.5 109,-310.5 103,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M97,-255.5C97,-255.5 53,-255.5 53,-255.5 47,-255.5 41,-249.5 41,-243.5 41,-243.5 41,-230.5 41,-230.5 41,-224.5 47,-218.5 53,-218.5 53,-218.5 97,-218.5 97,-218.5 103,-218.5 109,-224.5 109,-230.5 109,-230.5 109,-243.5 109,-243.5 109,-249.5 103,-255.5 97,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-197.26C127.5,-199.26 103.97,-200.88 75,-200.88 46.03,-200.88 22.5,-199.26 22.5,-197.26 22.5,-197.26 22.5,-164.74 22.5,-164.74 22.5,-162.74 46.03,-161.12 75,-161.12 103.97,-161.12 127.5,-162.74 127.5,-164.74 127.5,-164.74 127.5,-197.26 127.5,-197.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-197.26C127.5,-195.27 103.97,-193.65 75,-193.65 46.03,-193.65 22.5,-195.27 22.5,-197.26\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-177.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M113,-143.5C113,-143.5 37,-143.5 37,-143.5 31,-143.5 25,-137.5 25,-131.5 25,-131.5 25,-118.5 25,-118.5 25,-112.5 31,-106.5 37,-106.5 37,-106.5 113,-106.5 113,-106.5 119,-106.5 125,-112.5 125,-118.5 125,-118.5 125,-131.5 125,-131.5 125,-137.5 119,-143.5 113,-143.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-121.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111aeb4990>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataloader_dataflow_dr = (\n",
" driver.Builder().with_modules(dataloader_dataflow_module).with_cache(path=CACHE_DIR).build()\n",
")\n",
"\n",
"dataloader_dataflow_results = dataloader_dataflow_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"dataloader_dataflow_results = dataloader_dataflow_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(dataloader_dataflow_results[\"amount_per_country\"].head())\n",
"print()\n",
"dataloader_dataflow_dr.cache.view_run()"
]
},
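{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log lines printed by the previous cell follow a `<node>::<component>::<event>` shape. As a reading aid, here is a minimal sketch that summarizes them. `summarize()` is a hypothetical helper, and the format is inferred only from the output above rather than from a documented Hamilton contract.\n",
"\n",
"```python\n",
"def summarize(log_lines):\n",
"    '''Map each node name to the set of events logged for it.'''\n",
"    summary = {}\n",
"    for line in log_lines:\n",
"        line = line.strip()\n",
"        if not line:\n",
"            continue\n",
"        # Split on the first separator: node name, then the rest of the event.\n",
"        node, _, event = line.partition('::')\n",
"        # A set absorbs repeated lines, such as those from a duplicated handler.\n",
"        summary.setdefault(node, set()).add(event)\n",
"    return summary\n",
"\n",
"\n",
"# Lines copied from the second execution in the output above.\n",
"second_run = [\n",
"    'raw_data.loader::adapter::execute_node',\n",
"    'raw_data::adapter::execute_node',\n",
"    'processed_data::result_store::get_result::hit',\n",
"    'amount_per_country::result_store::get_result::hit',\n",
"    'saved_data::adapter::execute_node',\n",
"]\n",
"\n",
"summary = summarize(second_run)\n",
"executed = sorted(\n",
"    node for node, events in summary.items() if 'adapter::execute_node' in events\n",
")\n",
"from_cache = sorted(\n",
"    node for node, events in summary.items() if 'result_store::get_result::hit' in events\n",
")\n",
"```\n",
"\n",
"Here `executed` holds the loader, its `raw_data` node, and the saver, while `from_cache` holds the two intermediate nodes, matching what the run visualization highlights."
]
},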
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can manually inspect the node caching behavior via the `Driver.cache`. The `RECOMPUTE` behavior applied to `raw_data` is also applied to the internal `raw_data.loader`."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data.loader': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'saved_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataloader_dataflow_dr.cache.behaviors[dataloader_dataflow_dr.cache.last_run_id]"
]
},
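{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mapping returned above can be regrouped to list nodes per behavior. This is a minimal sketch, assuming the values are enum members exposing `.name` (as the printed repr suggests). `Behavior` is a hypothetical stand-in enum so the example runs without Hamilton, and `group_by_behavior()` is likewise an illustrative helper.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"from enum import Enum\n",
"\n",
"\n",
"class Behavior(Enum):\n",
"    '''Hypothetical stand-in mirroring the names and values printed above.'''\n",
"    DEFAULT = 1\n",
"    RECOMPUTE = 2\n",
"\n",
"\n",
"# Same content as the mapping shown in the previous output.\n",
"behaviors = {\n",
"    'amount_per_country': Behavior.DEFAULT,\n",
"    'processed_data': Behavior.DEFAULT,\n",
"    'raw_data.loader': Behavior.RECOMPUTE,\n",
"    'raw_data': Behavior.RECOMPUTE,\n",
"    'saved_data': Behavior.RECOMPUTE,\n",
"    'cutoff_date': Behavior.DEFAULT,\n",
"}\n",
"\n",
"\n",
"def group_by_behavior(mapping):\n",
"    '''Return {behavior name: sorted list of node names}.'''\n",
"    grouped = defaultdict(list)\n",
"    for node_name, behavior in mapping.items():\n",
"        grouped[behavior.name].append(node_name)\n",
"    return {name: sorted(nodes) for name, nodes in grouped.items()}\n",
"\n",
"\n",
"grouped = group_by_behavior(behaviors)\n",
"```\n",
"\n",
"Grouping this way makes it easy to confirm that `raw_data.loader` inherited `RECOMPUTE` from `raw_data`."
]
},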
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Driver-level\n",
"Now, let's specify the behavior at the `Driver`-level instead. The next cell contains the same module, but without the `@cache` decorator."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"%%cell_to_module dataloader_driver_module\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import dataloader, datasaver\n",
"\n",
"DATA = {\n",
"    \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
"    \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
"    \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
"    \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
"    \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@dataloader()\n",
"def raw_data() -> tuple[pd.DataFrame, dict]:\n",
"    \"\"\"Loading raw data. This simulates loading from a file, database, or external service.\"\"\"\n",
"    data = pd.DataFrame(DATA)\n",
"    metadata = {\"source\": \"notebook\", \"format\": \"json\"}\n",
"    return data, metadata\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
"    \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
"    df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
"    df[\"amound_in_usd\"] = df[\"amount\"]\n",
"    # Apply fixed conversion rates to USD; USD rows are left unchanged.\n",
"    df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71\n",
"    df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
"    df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
"    return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
"    \"\"\"Sum the amount in USD per country\"\"\"\n",
"    return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()\n",
"\n",
"@datasaver()\n",
"def saved_data(amount_per_country: pd.DataFrame) -> dict:\n",
"    \"\"\"Write the per-country totals to a parquet file and return metadata about the save.\"\"\"\n",
"    amount_per_country.to_parquet(\"./saved_data.parquet\")\n",
"    metadata = {\"source\": \"notebook\", \"format\": \"parquet\"}\n",
"    return metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When building the `Driver`, we pass `recompute=[\"raw_data\", \"saved_data\"]` to `.with_cache()` to set the `RECOMPUTE` behavior for those two nodes."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"raw_data.loader::adapter::execute_node\n",
"raw_data.loader::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::adapter::execute_node\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"         amound_in_usd\n",
"country               \n",
"Brazil         77.9004\n",
"Canada        941.9570\n",
"Mexico         46.2170\n",
"USA          2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"847pt\" height=\"413pt\"\n",
" viewBox=\"0.00 0.00 847.00 413.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 409)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-409 843,-409 843,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"14.5,-98 14.5,-397 135.5,-397 135.5,-98 14.5,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-381.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M283,-76C283,-76 208,-76 208,-76 202,-76 196,-70 196,-64 196,-64 196,-24 196,-24 196,-18 202,-12 208,-12 208,-12 283,-12 283,-12 289,-12 295,-18 295,-24 295,-24 295,-64 295,-64 295,-70 289,-76 283,-76\"/>\n",
"<text text-anchor=\"start\" x=\"209\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"207\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M475,-112C475,-112 353,-112 353,-112 347,-112 341,-106 341,-100 341,-100 341,-60 341,-60 341,-54 347,-48 353,-48 353,-48 475,-48 475,-48 481,-48 487,-54 487,-60 487,-60 487,-100 487,-100 487,-106 481,-112 475,-112\"/>\n",
"<text text-anchor=\"start\" x=\"352\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"375.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M295.11,-54.51C306.39,-56.95 318.71,-59.61 330.99,-62.27\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"330.3,-65.7 340.82,-64.39 331.78,-58.86 330.3,-65.7\"/>\n",
"</g>\n",
"<!-- raw_data.loader -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>raw_data.loader</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M150,-80C150,-84.41 116.38,-88 75,-88 33.62,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 33.62,0 75,0 116.38,0 150,-3.59 150,-8 150,-8 150,-80 150,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150,-80C150,-75.59 116.38,-72 75,-72 33.62,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data.loader</text>\n",
"<text text-anchor=\"start\" x=\"38\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">raw_data()</text>\n",
"</g>\n",
"<!-- raw_data.loader&#45;&gt;raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data.loader&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M150.4,-44C162.14,-44 174.15,-44 185.48,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"185.67,-47.5 195.67,-44 185.67,-40.5 185.67,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M686,-112C686,-112 528,-112 528,-112 522,-112 516,-106 516,-100 516,-100 516,-60 516,-60 516,-54 522,-48 528,-48 528,-48 686,-48 686,-48 692,-48 698,-54 698,-60 698,-60 698,-100 698,-100 698,-106 692,-112 686,-112\"/>\n",
"<text text-anchor=\"start\" x=\"527\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"568.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M487.21,-80C493.23,-80 499.39,-80 505.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"505.98,-83.5 515.98,-80 505.98,-76.5 505.98,-83.5\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M839,-116C839,-120.41 813.9,-124 783,-124 752.1,-124 727,-120.41 727,-116 727,-116 727,-44 727,-44 727,-39.59 752.1,-36 783,-36 813.9,-36 839,-39.59 839,-44 839,-44 839,-116 839,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M839,-116C839,-111.59 813.9,-108 783,-108 752.1,-108 727,-111.59 727,-116\"/>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"738\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">saved_data()</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M698.06,-80C704.33,-80 710.57,-80 716.67,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"716.76,-83.5 726.76,-80 716.76,-76.5 716.76,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"312,-139.5 179,-139.5 179,-94.5 312,-94.5 312,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"194.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"278.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M312.31,-102.38C318.45,-101.01 324.73,-99.62 330.99,-98.22\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.81,-101.63 340.81,-96.04 330.29,-94.79 331.81,-101.63\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"104.5,-365.5 45.5,-365.5 45.5,-328.5 104.5,-328.5 104.5,-365.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-343.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M103,-310.5C103,-310.5 47,-310.5 47,-310.5 41,-310.5 35,-304.5 35,-298.5 35,-298.5 35,-285.5 35,-285.5 35,-279.5 41,-273.5 47,-273.5 47,-273.5 103,-273.5 103,-273.5 109,-273.5 115,-279.5 115,-285.5 115,-285.5 115,-298.5 115,-298.5 115,-304.5 109,-310.5 103,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M97,-255.5C97,-255.5 53,-255.5 53,-255.5 47,-255.5 41,-249.5 41,-243.5 41,-243.5 41,-230.5 41,-230.5 41,-224.5 47,-218.5 53,-218.5 53,-218.5 97,-218.5 97,-218.5 103,-218.5 109,-224.5 109,-230.5 109,-230.5 109,-243.5 109,-243.5 109,-249.5 103,-255.5 97,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M127.5,-197.26C127.5,-199.26 103.97,-200.88 75,-200.88 46.03,-200.88 22.5,-199.26 22.5,-197.26 22.5,-197.26 22.5,-164.74 22.5,-164.74 22.5,-162.74 46.03,-161.12 75,-161.12 103.97,-161.12 127.5,-162.74 127.5,-164.74 127.5,-164.74 127.5,-197.26 127.5,-197.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.5,-197.26C127.5,-195.27 103.97,-193.65 75,-193.65 46.03,-193.65 22.5,-195.27 22.5,-197.26\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-177.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M113,-143.5C113,-143.5 37,-143.5 37,-143.5 31,-143.5 25,-137.5 25,-131.5 25,-131.5 25,-118.5 25,-118.5 25,-112.5 31,-106.5 37,-106.5 37,-106.5 113,-106.5 113,-106.5 119,-106.5 125,-112.5 125,-118.5 125,-118.5 125,-131.5 125,-131.5 125,-137.5 119,-143.5 113,-143.5\"/>\n",
"<text text-anchor=\"middle\" x=\"75\" y=\"-121.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111c8067d0>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataloader_driver_dr = (\n",
" driver.Builder()\n",
" .with_modules(dataloader_driver_module)\n",
" .with_cache(\n",
" path=CACHE_DIR,\n",
" recompute=[\"raw_data\", \"saved_data\"],\n",
" )\n",
" .build()\n",
")\n",
"\n",
"dataloader_driver_results = dataloader_driver_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(dataloader_driver_results[\"amount_per_country\"].head())\n",
"print()\n",
"dataloader_driver_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `RECOMPUTE` behavior applied to `raw_data` is also applied to the internal `raw_data.loader`."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data.loader': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'saved_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataloader_driver_dr.cache.behaviors[dataloader_driver_dr.cache.last_run_id]"
]
},
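{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mapping above can be filtered to list only the nodes that bypass the cache. The snippet below is a minimal, self-contained sketch: `CachingBehavior` is a stand-in enum that mirrors the values printed above (it is not imported from Hamilton), `behaviors` is a hand-copied version of that output, and `nodes_with_behavior()` is a hypothetical helper.\n",
"\n",
"```python\n",
"from enum import Enum\n",
"\n",
"\n",
"class CachingBehavior(Enum):\n",
"    # Stand-in enum (assumption): only the two members printed above.\n",
"    DEFAULT = 1\n",
"    RECOMPUTE = 2\n",
"\n",
"\n",
"# Hand-copied from the Driver-level output above.\n",
"behaviors = {\n",
"    'amount_per_country': CachingBehavior.DEFAULT,\n",
"    'processed_data': CachingBehavior.DEFAULT,\n",
"    'raw_data.loader': CachingBehavior.RECOMPUTE,\n",
"    'raw_data': CachingBehavior.RECOMPUTE,\n",
"    'saved_data': CachingBehavior.RECOMPUTE,\n",
"    'cutoff_date': CachingBehavior.DEFAULT,\n",
"}\n",
"\n",
"\n",
"def nodes_with_behavior(behaviors, name):\n",
"    # Hypothetical helper: node names whose behavior has the given name, in mapping order.\n",
"    return [node for node, behavior in behaviors.items() if behavior.name == name]\n",
"\n",
"\n",
"recomputed = nodes_with_behavior(behaviors, 'RECOMPUTE')\n",
"print(recomputed)\n",
"```\n",
"\n",
"Because the helper compares on the member name rather than on the enum class, it should also accept the dictionary returned by `dataloader_driver_dr.cache.behaviors[dataloader_driver_dr.cache.last_run_id]`."
]
},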
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `@load_from` and `@save_to`\n",
"### Dataflow-level\n",
"\n",
"Using `@load_from` and `@save_to` removes the need for the `raw_data()` and `saved_data()` functions, respectively. Instead, the loader/saver nodes are created at runtime, meaning we can't directly decorate them with `@cache`.\n",
"\n",
"> At the time of release, the `@cache` decorator must be placed **under** `@load_from` or `@save_to`. This quirk will be fixed because order shouldn't matter.\n",
"\n",
"The `@cache` decorator is applied to `processed_data` and `amount_per_country`. By default, the behavior set on `processed_data` also applies to the generated loader node for `raw_data`. Similarly, the behavior set on `amount_per_country` also applies to the generated `save.amount_per_country` node."
]
},
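{
"cell_type": "markdown",
"metadata": {},
"source": [
"Decorator order can have an effect because Python applies stacked decorators from the bottom up: the one closest to `def` runs first. The sketch below shows this with a hypothetical `record()` decorator factory that only logs when it runs. It illustrates the general Python mechanism and makes no claim about how `@cache`, `@load_from`, or `@save_to` are implemented in Hamilton.\n",
"\n",
"```python\n",
"applied = []  # records the order in which the decorators run\n",
"\n",
"\n",
"def record(label):\n",
"    # Hypothetical factory: the decorator logs its label and returns the function unchanged.\n",
"    def decorator(fn):\n",
"        applied.append(label)\n",
"        return fn\n",
"    return decorator\n",
"\n",
"\n",
"@record('outer')  # applied second: placed on top\n",
"@record('inner')  # applied first: placed directly above def\n",
"def processed_data():\n",
"    return 'ok'\n",
"\n",
"\n",
"print(applied)\n",
"```\n",
"\n",
"In this sketch, a decorator placed on top of the stack can see whatever the one below it has already done to the function, but not the other way around."
]
},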
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 1299.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 1295,-350.5 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-149.5 92,-338.5 213,-338.5 213,-149.5 92,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-131.5C305,-135.91 236.65,-139.5 152.5,-139.5 68.35,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 68.35,-51.5 152.5,-51.5 236.65,-51.5 305,-55.09 305,-59.5 305,-59.5 305,-131.5 305,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-131.5C305,-127.09 236.65,-123.5 152.5,-123.5 68.35,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-127.5C640,-127.5 346,-127.5 346,-127.5 340,-127.5 334,-121.5 334,-115.5 334,-115.5 334,-75.5 334,-75.5 334,-69.5 340,-63.5 346,-63.5 346,-63.5 640,-63.5 640,-63.5 646,-63.5 652,-69.5 652,-75.5 652,-75.5 652,-115.5 652,-115.5 652,-121.5 646,-127.5 640,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-95.5C311.5,-95.5 317.61,-95.5 323.72,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-99 333.99,-95.5 323.99,-92 323.99,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-90.5C815,-90.5 693,-90.5 693,-90.5 687,-90.5 681,-84.5 681,-78.5 681,-78.5 681,-38.5 681,-38.5 681,-32.5 687,-26.5 693,-26.5 693,-26.5 815,-26.5 815,-26.5 821,-26.5 827,-32.5 827,-38.5 827,-38.5 827,-78.5 827,-78.5 827,-84.5 821,-90.5 815,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-72.91C658.46,-72.01 664.71,-71.11 670.79,-70.24\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.47,-73.68 680.87,-68.8 670.48,-66.75 671.47,-73.68\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1291,-94.5C1291,-98.91 1240.8,-102.5 1179,-102.5 1117.2,-102.5 1067,-98.91 1067,-94.5 1067,-94.5 1067,-22.5 1067,-22.5 1067,-18.09 1117.2,-14.5 1179,-14.5 1240.8,-14.5 1291,-18.09 1291,-22.5 1291,-22.5 1291,-94.5 1291,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-94.5C1291,-90.09 1240.8,-86.5 1179,-86.5 1117.2,-86.5 1067,-90.09 1067,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1026,-90.5C1026,-90.5 868,-90.5 868,-90.5 862,-90.5 856,-84.5 856,-78.5 856,-78.5 856,-38.5 856,-38.5 856,-32.5 862,-26.5 868,-26.5 868,-26.5 1026,-26.5 1026,-26.5 1032,-26.5 1038,-32.5 1038,-38.5 1038,-38.5 1038,-78.5 1038,-78.5 1038,-84.5 1032,-90.5 1026,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-58.5C833.23,-58.5 839.39,-58.5 845.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-62 855.98,-58.5 845.98,-55 845.98,-62\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-58.5C1044.29,-58.5 1050.43,-58.5 1056.6,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-62 1066.99,-58.5 1056.99,-55 1056.99,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-45 426.5,-45 426.5,0 559.5,0 559.5,-45\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-31.63C593.53,-36.33 635.11,-42.11 671.03,-47.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.61,-50.58 681,-48.49 671.58,-43.65 670.61,-50.58\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-307 123,-307 123,-270 182,-270 182,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-252C180.5,-252 124.5,-252 124.5,-252 118.5,-252 112.5,-246 112.5,-240 112.5,-240 112.5,-227 112.5,-227 112.5,-221 118.5,-215 124.5,-215 124.5,-215 180.5,-215 180.5,-215 186.5,-215 192.5,-221 192.5,-227 192.5,-227 192.5,-240 192.5,-240 192.5,-246 186.5,-252 180.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-193.76C205,-195.76 181.47,-197.38 152.5,-197.38 123.53,-197.38 100,-195.76 100,-193.76 100,-193.76 100,-161.24 100,-161.24 100,-159.24 123.53,-157.62 152.5,-157.62 181.47,-157.62 205,-159.24 205,-161.24 205,-161.24 205,-193.76 205,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-193.76C205,-191.77 181.47,-190.15 152.5,-190.15 123.53,-190.15 100,-191.77 100,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111aeb6550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module load_from_dataflow_module -d\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import load_from, save_to, cache\n",
"\n",
"DATA = {\n",
"    \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
"    \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
"    \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
"    \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
"    \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@load_from.parquet(path=\"raw_data.parquet\", inject_=\"raw_data\")\n",
"@cache(behavior=\"recompute\")\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
"    \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
"    df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
"    df[\"amound_in_usd\"] = df[\"amount\"]\n",
"    df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71\n",
"    df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
"    df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
"    return df\n",
"\n",
"@save_to.parquet(path=\"saved_data.parquet\")\n",
"@cache(behavior=\"recompute\")\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
"    \"\"\"Sum the amount in USD per country\"\"\"\n",
"    return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visualization displays the internal nodes generated by `@load_from` and `@save_to`."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"         amound_in_usd\n",
"country               \n",
"Brazil         77.9004\n",
"Canada         941.957\n",
"Mexico          46.217\n",
"USA            2959.76\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"358pt\"\n",
" viewBox=\"0.00 0.00 1299.00 358.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 354)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-354 1295,-354 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-98 92,-342 213,-342 213,-98 92,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-326.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-76C640,-76 346,-76 346,-76 340,-76 334,-70 334,-64 334,-64 334,-24 334,-24 334,-18 340,-12 346,-12 346,-12 640,-12 640,-12 646,-12 652,-18 652,-24 652,-24 652,-64 652,-64 652,-70 646,-76 640,-76\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-112C815,-112 693,-112 693,-112 687,-112 681,-106 681,-100 681,-100 681,-60 681,-60 681,-54 687,-48 693,-48 693,-48 815,-48 815,-48 821,-48 827,-54 827,-60 827,-60 827,-100 827,-100 827,-106 821,-112 815,-112\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-65.98C658.46,-66.86 664.71,-67.73 670.79,-68.57\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.48,-72.06 680.87,-69.97 671.45,-65.13 670.48,-72.06\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1291,-116C1291,-120.41 1240.8,-124 1179,-124 1117.2,-124 1067,-120.41 1067,-116 1067,-116 1067,-44 1067,-44 1067,-39.59 1117.2,-36 1179,-36 1240.8,-36 1291,-39.59 1291,-44 1291,-44 1291,-116 1291,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-116C1291,-111.59 1240.8,-108 1179,-108 1117.2,-108 1067,-111.59 1067,-116\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1026,-112C1026,-112 868,-112 868,-112 862,-112 856,-106 856,-100 856,-100 856,-60 856,-60 856,-54 862,-48 868,-48 868,-48 1026,-48 1026,-48 1032,-48 1038,-54 1038,-60 1038,-60 1038,-100 1038,-100 1038,-106 1032,-112 1026,-112\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-80C833.23,-80 839.39,-80 845.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-83.5 855.98,-80 845.98,-76.5 845.98,-83.5\"/>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-80C305,-84.41 236.65,-88 152.5,-88 68.35,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 68.35,0 152.5,0 236.65,0 305,-3.59 305,-8 305,-8 305,-80 305,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-80C305,-75.59 236.65,-72 152.5,-72 68.35,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-44C311.5,-44 317.61,-44 323.72,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-47.5 333.99,-44 323.99,-40.5 323.99,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-80C1044.29,-80 1050.43,-80 1056.6,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-83.5 1066.99,-80 1056.99,-76.5 1056.99,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-139.5 426.5,-139.5 426.5,-94.5 559.5,-94.5 559.5,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-107.62C593.53,-102.78 635.11,-96.84 671.03,-91.71\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.6,-95.16 681,-90.29 670.61,-88.24 671.6,-95.16\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-310.5 123,-310.5 123,-273.5 182,-273.5 182,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-255.5C180.5,-255.5 124.5,-255.5 124.5,-255.5 118.5,-255.5 112.5,-249.5 112.5,-243.5 112.5,-243.5 112.5,-230.5 112.5,-230.5 112.5,-224.5 118.5,-218.5 124.5,-218.5 124.5,-218.5 180.5,-218.5 180.5,-218.5 186.5,-218.5 192.5,-224.5 192.5,-230.5 192.5,-230.5 192.5,-243.5 192.5,-243.5 192.5,-249.5 186.5,-255.5 180.5,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M174.5,-200.5C174.5,-200.5 130.5,-200.5 130.5,-200.5 124.5,-200.5 118.5,-194.5 118.5,-188.5 118.5,-188.5 118.5,-175.5 118.5,-175.5 118.5,-169.5 124.5,-163.5 130.5,-163.5 130.5,-163.5 174.5,-163.5 174.5,-163.5 180.5,-163.5 186.5,-169.5 186.5,-175.5 186.5,-175.5 186.5,-188.5 186.5,-188.5 186.5,-194.5 180.5,-200.5 174.5,-200.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-178.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-142.26C205,-144.26 181.47,-145.88 152.5,-145.88 123.53,-145.88 100,-144.26 100,-142.26 100,-142.26 100,-109.74 100,-109.74 100,-107.74 123.53,-106.12 152.5,-106.12 181.47,-106.12 205,-107.74 205,-109.74 205,-109.74 205,-142.26 205,-142.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-142.26C205,-140.27 181.47,-138.65 152.5,-138.65 123.53,-138.65 100,-140.27 100,-142.26\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-122.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111ae03710>"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_dataflow_dr = (\n",
" driver.Builder().with_modules(load_from_dataflow_module).with_cache(path=CACHE_DIR).build()\n",
")\n",
"\n",
"load_from_dataflow_results = load_from_dataflow_dr.execute(\n",
" [\"amount_per_country\", \"save.amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(load_from_dataflow_results[\"amount_per_country\"].head())\n",
"print()\n",
"load_from_dataflow_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As expected, every node except the input `cutoff_date` receives the `RECOMPUTE` behavior in this case. Note that both internal nodes, `processed_data.load_data.raw_data` and `processed_data.select_data.raw_data`, receive it as well."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'save.amount_per_country': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'amount_per_country': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'processed_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'processed_data.load_data.raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'processed_data.select_data.raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_dataflow_dr.cache.behaviors[load_from_dataflow_dr.cache.last_run_id]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Granular control"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous cells, `@cache` applied the behavior to every node generated from the function decorated with `@load_from` or `@save_to`.\n",
"\n",
"For granular control, use the `target_` parameter of the `@cache` decorator to name the generated nodes that should receive the behavior.\n",
"\n",
"For `@load_from`, we want to target `processed_data.load_data.raw_data`. In general, this node name has the form `f\"{main_node}.load_data.{loaded_node}\"`. In complex scenarios, you should also add `f\"{main_node}.select_data.{loaded_node}\"` to the `target_` parameter for extra safety.\n",
"\n",
"For `@save_to`, we want to target `save.amount_per_country`. In general, this node name has the form `f\"save.{main_node}\"`."
]
},
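{
"cell_type": "markdown",
"metadata": {},
"source": [
"The naming patterns described above can be written out in a few lines of plain Python. This is only a sketch: `loader_node_names` and `saver_node_name` are hypothetical helpers introduced for illustration and are not part of Hamilton's API. They only apply the string patterns given in this section.\n",
"\n",
"```python\n",
"def loader_node_names(main_node: str, loaded_node: str) -> list[str]:\n",
"    # Hypothetical helper: names of the nodes generated by `@load_from`.\n",
"    return [\n",
"        f\"{main_node}.load_data.{loaded_node}\",\n",
"        f\"{main_node}.select_data.{loaded_node}\",\n",
"    ]\n",
"\n",
"\n",
"def saver_node_name(main_node: str) -> str:\n",
"    # Hypothetical helper: name of the node generated by `@save_to`.\n",
"    return f\"save.{main_node}\"\n",
"\n",
"\n",
"print(loader_node_names(\"processed_data\", \"raw_data\"))\n",
"print(saver_node_name(\"amount_per_country\"))\n",
"```\n",
"\n",
"For the module in this tutorial, these are the strings you would pass to `target_`."
]
},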
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 1299.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 1295,-350.5 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-149.5 92,-338.5 213,-338.5 213,-149.5 92,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-131.5C305,-135.91 236.65,-139.5 152.5,-139.5 68.35,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 68.35,-51.5 152.5,-51.5 236.65,-51.5 305,-55.09 305,-59.5 305,-59.5 305,-131.5 305,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-131.5C305,-127.09 236.65,-123.5 152.5,-123.5 68.35,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-127.5C640,-127.5 346,-127.5 346,-127.5 340,-127.5 334,-121.5 334,-115.5 334,-115.5 334,-75.5 334,-75.5 334,-69.5 340,-63.5 346,-63.5 346,-63.5 640,-63.5 640,-63.5 646,-63.5 652,-69.5 652,-75.5 652,-75.5 652,-115.5 652,-115.5 652,-121.5 646,-127.5 640,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-95.5C311.5,-95.5 317.61,-95.5 323.72,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-99 333.99,-95.5 323.99,-92 323.99,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-90.5C815,-90.5 693,-90.5 693,-90.5 687,-90.5 681,-84.5 681,-78.5 681,-78.5 681,-38.5 681,-38.5 681,-32.5 687,-26.5 693,-26.5 693,-26.5 815,-26.5 815,-26.5 821,-26.5 827,-32.5 827,-38.5 827,-38.5 827,-78.5 827,-78.5 827,-84.5 821,-90.5 815,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-72.91C658.46,-72.01 664.71,-71.11 670.79,-70.24\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.47,-73.68 680.87,-68.8 670.48,-66.75 671.47,-73.68\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1291,-94.5C1291,-98.91 1240.8,-102.5 1179,-102.5 1117.2,-102.5 1067,-98.91 1067,-94.5 1067,-94.5 1067,-22.5 1067,-22.5 1067,-18.09 1117.2,-14.5 1179,-14.5 1240.8,-14.5 1291,-18.09 1291,-22.5 1291,-22.5 1291,-94.5 1291,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-94.5C1291,-90.09 1240.8,-86.5 1179,-86.5 1117.2,-86.5 1067,-90.09 1067,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1026,-90.5C1026,-90.5 868,-90.5 868,-90.5 862,-90.5 856,-84.5 856,-78.5 856,-78.5 856,-38.5 856,-38.5 856,-32.5 862,-26.5 868,-26.5 868,-26.5 1026,-26.5 1026,-26.5 1032,-26.5 1038,-32.5 1038,-38.5 1038,-38.5 1038,-78.5 1038,-78.5 1038,-84.5 1032,-90.5 1026,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-58.5C833.23,-58.5 839.39,-58.5 845.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-62 855.98,-58.5 845.98,-55 845.98,-62\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-58.5C1044.29,-58.5 1050.43,-58.5 1056.6,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-62 1066.99,-58.5 1056.99,-55 1056.99,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-45 426.5,-45 426.5,0 559.5,0 559.5,-45\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-31.63C593.53,-36.33 635.11,-42.11 671.03,-47.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.61,-50.58 681,-48.49 671.58,-43.65 670.61,-50.58\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-307 123,-307 123,-270 182,-270 182,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-252C180.5,-252 124.5,-252 124.5,-252 118.5,-252 112.5,-246 112.5,-240 112.5,-240 112.5,-227 112.5,-227 112.5,-221 118.5,-215 124.5,-215 124.5,-215 180.5,-215 180.5,-215 186.5,-215 192.5,-221 192.5,-227 192.5,-227 192.5,-240 192.5,-240 192.5,-246 186.5,-252 180.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-193.76C205,-195.76 181.47,-197.38 152.5,-197.38 123.53,-197.38 100,-195.76 100,-193.76 100,-193.76 100,-161.24 100,-161.24 100,-159.24 123.53,-157.62 152.5,-157.62 181.47,-157.62 205,-159.24 205,-161.24 205,-161.24 205,-193.76 205,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-193.76C205,-191.77 181.47,-190.15 152.5,-190.15 123.53,-190.15 100,-191.77 100,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f11184dc410>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module load_from_granular_module -d\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import load_from, save_to, cache\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@load_from.parquet(path=\"raw_data.parquet\", inject_=\"raw_data\")\n",
"@cache(behavior=\"recompute\", target_=\"processed_data.load_data.raw_data\")\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"@save_to.parquet(path=\"saved_data.parquet\")\n",
"@cache(behavior=\"recompute\", target_=\"save.amount_per_country\")\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.957\n",
"Mexico 46.217\n",
"USA 2959.76\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"358pt\"\n",
" viewBox=\"0.00 0.00 1299.00 358.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 354)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-354 1295,-354 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-98 92,-342 213,-342 213,-98 92,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-326.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-76C640,-76 346,-76 346,-76 340,-76 334,-70 334,-64 334,-64 334,-24 334,-24 334,-18 340,-12 346,-12 346,-12 640,-12 640,-12 646,-12 652,-18 652,-24 652,-24 652,-64 652,-64 652,-70 646,-76 640,-76\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-112C815,-112 693,-112 693,-112 687,-112 681,-106 681,-100 681,-100 681,-60 681,-60 681,-54 687,-48 693,-48 693,-48 815,-48 815,-48 821,-48 827,-54 827,-60 827,-60 827,-100 827,-100 827,-106 821,-112 815,-112\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-65.98C658.46,-66.86 664.71,-67.73 670.79,-68.57\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.48,-72.06 680.87,-69.97 671.45,-65.13 670.48,-72.06\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1291,-116C1291,-120.41 1240.8,-124 1179,-124 1117.2,-124 1067,-120.41 1067,-116 1067,-116 1067,-44 1067,-44 1067,-39.59 1117.2,-36 1179,-36 1240.8,-36 1291,-39.59 1291,-44 1291,-44 1291,-116 1291,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-116C1291,-111.59 1240.8,-108 1179,-108 1117.2,-108 1067,-111.59 1067,-116\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1026,-112C1026,-112 868,-112 868,-112 862,-112 856,-106 856,-100 856,-100 856,-60 856,-60 856,-54 862,-48 868,-48 868,-48 1026,-48 1026,-48 1032,-48 1038,-54 1038,-60 1038,-60 1038,-100 1038,-100 1038,-106 1032,-112 1026,-112\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-80C833.23,-80 839.39,-80 845.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-83.5 855.98,-80 845.98,-76.5 845.98,-83.5\"/>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-80C305,-84.41 236.65,-88 152.5,-88 68.35,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 68.35,0 152.5,0 236.65,0 305,-3.59 305,-8 305,-8 305,-80 305,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-80C305,-75.59 236.65,-72 152.5,-72 68.35,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-44C311.5,-44 317.61,-44 323.72,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-47.5 333.99,-44 323.99,-40.5 323.99,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-80C1044.29,-80 1050.43,-80 1056.6,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-83.5 1066.99,-80 1056.99,-76.5 1056.99,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-139.5 426.5,-139.5 426.5,-94.5 559.5,-94.5 559.5,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-107.62C593.53,-102.78 635.11,-96.84 671.03,-91.71\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.6,-95.16 681,-90.29 670.61,-88.24 671.6,-95.16\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-310.5 123,-310.5 123,-273.5 182,-273.5 182,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-255.5C180.5,-255.5 124.5,-255.5 124.5,-255.5 118.5,-255.5 112.5,-249.5 112.5,-243.5 112.5,-243.5 112.5,-230.5 112.5,-230.5 112.5,-224.5 118.5,-218.5 124.5,-218.5 124.5,-218.5 180.5,-218.5 180.5,-218.5 186.5,-218.5 192.5,-224.5 192.5,-230.5 192.5,-230.5 192.5,-243.5 192.5,-243.5 192.5,-249.5 186.5,-255.5 180.5,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M174.5,-200.5C174.5,-200.5 130.5,-200.5 130.5,-200.5 124.5,-200.5 118.5,-194.5 118.5,-188.5 118.5,-188.5 118.5,-175.5 118.5,-175.5 118.5,-169.5 124.5,-163.5 130.5,-163.5 130.5,-163.5 174.5,-163.5 174.5,-163.5 180.5,-163.5 186.5,-169.5 186.5,-175.5 186.5,-175.5 186.5,-188.5 186.5,-188.5 186.5,-194.5 180.5,-200.5 174.5,-200.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-178.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-142.26C205,-144.26 181.47,-145.88 152.5,-145.88 123.53,-145.88 100,-144.26 100,-142.26 100,-142.26 100,-109.74 100,-109.74 100,-107.74 123.53,-106.12 152.5,-106.12 181.47,-106.12 205,-107.74 205,-109.74 205,-109.74 205,-142.26 205,-142.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-142.26C205,-140.27 181.47,-138.65 152.5,-138.65 123.53,-138.65 100,-140.27 100,-142.26\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-122.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111ae13550>"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_granular_dr = (\n",
" driver.Builder().with_modules(load_from_granular_module).with_cache(path=CACHE_DIR).build()\n",
")\n",
"\n",
"load_from_granular_results = load_from_granular_dr.execute(\n",
" [\"amount_per_country\", \"save.amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(load_from_granular_results[\"amount_per_country\"].head())\n",
"print()\n",
"load_from_granular_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
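"Here we see that the functions decorated with `@load_from` and `@save_to` (`processed_data` and `amount_per_country`) keep the `DEFAULT` behavior. Only the targeted nodes, `processed_data.load_data.raw_data` and `save.amount_per_country`, receive the `RECOMPUTE` behavior specified in `@cache`."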
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'save.amount_per_country': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data.select_data.raw_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data.load_data.raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_granular_dr.cache.behaviors[load_from_granular_dr.cache.last_run_id]"
]
},
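{
"cell_type": "markdown",
"metadata": {},
"source": [
"When a dataflow has many nodes, it can help to group a mapping like the one returned above by behavior. The snippet below is a self-contained sketch: the `CachingBehavior` enum is a stand-in that mirrors only the two values shown in the output, and `behaviors` is copied by hand from that output rather than read from the driver.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"from enum import Enum\n",
"\n",
"\n",
"class CachingBehavior(Enum):\n",
"    # Stand-in for Hamilton's enum, limited to the values shown above.\n",
"    DEFAULT = 1\n",
"    RECOMPUTE = 2\n",
"\n",
"\n",
"# Copied from the output of the previous cell.\n",
"behaviors = {\n",
"    \"amount_per_country\": CachingBehavior.DEFAULT,\n",
"    \"save.amount_per_country\": CachingBehavior.RECOMPUTE,\n",
"    \"processed_data\": CachingBehavior.DEFAULT,\n",
"    \"processed_data.select_data.raw_data\": CachingBehavior.DEFAULT,\n",
"    \"processed_data.load_data.raw_data\": CachingBehavior.RECOMPUTE,\n",
"    \"cutoff_date\": CachingBehavior.DEFAULT,\n",
"}\n",
"\n",
"# Group node names under the name of their behavior.\n",
"nodes_by_behavior = defaultdict(list)\n",
"for node_name, behavior in behaviors.items():\n",
"    nodes_by_behavior[behavior.name].append(node_name)\n",
"\n",
"print(dict(nodes_by_behavior))\n",
"```\n",
"\n",
"In a live session, the same loop should work on the mapping returned by the previous cell, assuming its values expose a `.name` attribute as enum members do."
]
},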
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Driver-level\n",
"\n",
"The next cell presents the same module as before, but without the `@cache` decorator."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 1299.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 1295,-350.5 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-149.5 92,-338.5 213,-338.5 213,-149.5 92,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-131.5C305,-135.91 236.65,-139.5 152.5,-139.5 68.35,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 68.35,-51.5 152.5,-51.5 236.65,-51.5 305,-55.09 305,-59.5 305,-59.5 305,-131.5 305,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-131.5C305,-127.09 236.65,-123.5 152.5,-123.5 68.35,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-127.5C640,-127.5 346,-127.5 346,-127.5 340,-127.5 334,-121.5 334,-115.5 334,-115.5 334,-75.5 334,-75.5 334,-69.5 340,-63.5 346,-63.5 346,-63.5 640,-63.5 640,-63.5 646,-63.5 652,-69.5 652,-75.5 652,-75.5 652,-115.5 652,-115.5 652,-121.5 646,-127.5 640,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-95.5C311.5,-95.5 317.61,-95.5 323.72,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-99 333.99,-95.5 323.99,-92 323.99,-99\"/>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-90.5C815,-90.5 693,-90.5 693,-90.5 687,-90.5 681,-84.5 681,-78.5 681,-78.5 681,-38.5 681,-38.5 681,-32.5 687,-26.5 693,-26.5 693,-26.5 815,-26.5 815,-26.5 821,-26.5 827,-32.5 827,-38.5 827,-38.5 827,-78.5 827,-78.5 827,-84.5 821,-90.5 815,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-72.91C658.46,-72.01 664.71,-71.11 670.79,-70.24\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.47,-73.68 680.87,-68.8 670.48,-66.75 671.47,-73.68\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1291,-94.5C1291,-98.91 1240.8,-102.5 1179,-102.5 1117.2,-102.5 1067,-98.91 1067,-94.5 1067,-94.5 1067,-22.5 1067,-22.5 1067,-18.09 1117.2,-14.5 1179,-14.5 1240.8,-14.5 1291,-18.09 1291,-22.5 1291,-22.5 1291,-94.5 1291,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-94.5C1291,-90.09 1240.8,-86.5 1179,-86.5 1117.2,-86.5 1067,-90.09 1067,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1026,-90.5C1026,-90.5 868,-90.5 868,-90.5 862,-90.5 856,-84.5 856,-78.5 856,-78.5 856,-38.5 856,-38.5 856,-32.5 862,-26.5 868,-26.5 868,-26.5 1026,-26.5 1026,-26.5 1032,-26.5 1038,-32.5 1038,-38.5 1038,-38.5 1038,-78.5 1038,-78.5 1038,-84.5 1032,-90.5 1026,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-58.5C833.23,-58.5 839.39,-58.5 845.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-62 855.98,-58.5 845.98,-55 845.98,-62\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-58.5C1044.29,-58.5 1050.43,-58.5 1056.6,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-62 1066.99,-58.5 1056.99,-55 1056.99,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-45 426.5,-45 426.5,0 559.5,0 559.5,-45\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-31.63C593.53,-36.33 635.11,-42.11 671.03,-47.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.61,-50.58 681,-48.49 671.58,-43.65 670.61,-50.58\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-307 123,-307 123,-270 182,-270 182,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-252C180.5,-252 124.5,-252 124.5,-252 118.5,-252 112.5,-246 112.5,-240 112.5,-240 112.5,-227 112.5,-227 112.5,-221 118.5,-215 124.5,-215 124.5,-215 180.5,-215 180.5,-215 186.5,-215 192.5,-221 192.5,-227 192.5,-227 192.5,-240 192.5,-240 192.5,-246 186.5,-252 180.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-193.76C205,-195.76 181.47,-197.38 152.5,-197.38 123.53,-197.38 100,-195.76 100,-193.76 100,-193.76 100,-161.24 100,-161.24 100,-159.24 123.53,-157.62 152.5,-157.62 181.47,-157.62 205,-159.24 205,-161.24 205,-161.24 205,-193.76 205,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-193.76C205,-191.77 181.47,-190.15 152.5,-190.15 123.53,-190.15 100,-191.77 100,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111aec0e90>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module load_from_driver_module -d\n",
"import pandas as pd\n",
"from hamilton.function_modifiers import load_from, save_to\n",
"\n",
"DATA = {\n",
" \"cities\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Montréal\", \"Vancouver\", \"Houston\", \"Phoenix\", \"Mexico City\", \"Chihuahua City\", \"Rio de Janeiro\"],\n",
" \"date\": [\"2024-09-13\", \"2024-09-12\", \"2024-09-11\", \"2024-09-11\", \"2024-09-09\", \"2024-09-08\", \"2024-09-07\", \"2024-09-06\", \"2024-09-05\", \"2024-09-04\"],\n",
" \"amount\": [478.23, 251.67, 989.34, 742.14, 584.56, 321.85, 918.67, 135.22, 789.12, 432.78],\n",
" \"country\": [\"USA\", \"USA\", \"USA\", \"Canada\", \"Canada\", \"USA\", \"USA\", \"Mexico\", \"Mexico\", \"Brazil\"],\n",
" \"currency\": [\"USD\", \"USD\", \"USD\", \"CAD\", \"CAD\", \"USD\", \"USD\", \"MXN\", \"MXN\", \"BRL\"],\n",
"}\n",
"\n",
"@load_from.parquet(path=\"raw_data.parquet\", inject_=\"raw_data\")\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"@save_to.parquet(path=\"saved_data.parquet\")\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the `.with_cache()` clause, we don't have to spell out the loader's internal node names (`processed_data.load_data.raw_data` and `processed_data.select_data.raw_data`); passing `\"raw_data\"` is enough. For the saver, we must use `\"save.amount_per_country\"` because that is the node name we request in `Driver.execute()`.\n"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.load_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data.select_data.raw_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"processed_data::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n",
"save.amount_per_country::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.957\n",
"Mexico 46.217\n",
"USA 2959.76\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"1299pt\" height=\"358pt\"\n",
" viewBox=\"0.00 0.00 1299.00 358.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 354)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-354 1295,-354 1295,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"92,-98 92,-342 213,-342 213,-98 92,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-326.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data.select_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M640,-76C640,-76 346,-76 346,-76 340,-76 334,-70 334,-64 334,-64 334,-24 334,-24 334,-18 340,-12 346,-12 346,-12 640,-12 640,-12 646,-12 652,-18 652,-24 652,-24 652,-64 652,-64 652,-70 646,-76 640,-76\"/>\n",
"<text text-anchor=\"start\" x=\"345\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.select_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"454.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M815,-112C815,-112 693,-112 693,-112 687,-112 681,-106 681,-100 681,-100 681,-60 681,-60 681,-54 687,-48 693,-48 693,-48 815,-48 815,-48 821,-48 827,-54 827,-60 827,-60 827,-100 827,-100 827,-106 821,-112 815,-112\"/>\n",
"<text text-anchor=\"start\" x=\"692\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"715.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data.select_data.raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>processed_data.select_data.raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M652.12,-65.98C658.46,-66.86 664.71,-67.73 670.79,-68.57\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"670.48,-72.06 680.87,-69.97 671.45,-65.13 670.48,-72.06\"/>\n",
"</g>\n",
"<!-- save.amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>save.amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1291,-116C1291,-120.41 1240.8,-124 1179,-124 1117.2,-124 1067,-120.41 1067,-116 1067,-116 1067,-44 1067,-44 1067,-39.59 1117.2,-36 1179,-36 1240.8,-36 1291,-39.59 1291,-44 1291,-44 1291,-116 1291,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1291,-116C1291,-111.59 1240.8,-108 1179,-108 1117.2,-108 1067,-111.59 1067,-116\"/>\n",
"<text text-anchor=\"start\" x=\"1078\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">save.amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"1105\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M1026,-112C1026,-112 868,-112 868,-112 862,-112 856,-106 856,-100 856,-100 856,-60 856,-60 856,-54 862,-48 868,-48 868,-48 1026,-48 1026,-48 1032,-48 1038,-54 1038,-60 1038,-60 1038,-100 1038,-100 1038,-106 1032,-112 1026,-112\"/>\n",
"<text text-anchor=\"start\" x=\"867\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"908.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M827.21,-80C833.23,-80 839.39,-80 845.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"845.98,-83.5 855.98,-80 845.98,-76.5 845.98,-83.5\"/>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>processed_data.load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M305,-80C305,-84.41 236.65,-88 152.5,-88 68.35,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 68.35,0 152.5,0 236.65,0 305,-3.59 305,-8 305,-8 305,-80 305,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305,-80C305,-75.59 236.65,-72 152.5,-72 68.35,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data.load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"75.5\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>processed_data.load_data.raw_data&#45;&gt;processed_data.select_data.raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M305.42,-44C311.5,-44 317.61,-44 323.72,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"323.99,-47.5 333.99,-44 323.99,-40.5 323.99,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;save.amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;save.amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1038.24,-80C1044.29,-80 1050.43,-80 1056.6,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1056.99,-83.5 1066.99,-80 1056.99,-76.5 1056.99,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"559.5,-139.5 426.5,-139.5 426.5,-94.5 559.5,-94.5 559.5,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"442\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"526\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M559.69,-107.62C593.53,-102.78 635.11,-96.84 671.03,-91.71\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"671.6,-95.16 681,-90.29 670.61,-88.24 671.6,-95.16\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"182,-310.5 123,-310.5 123,-273.5 182,-273.5 182,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M180.5,-255.5C180.5,-255.5 124.5,-255.5 124.5,-255.5 118.5,-255.5 112.5,-249.5 112.5,-243.5 112.5,-243.5 112.5,-230.5 112.5,-230.5 112.5,-224.5 118.5,-218.5 124.5,-218.5 124.5,-218.5 180.5,-218.5 180.5,-218.5 186.5,-218.5 192.5,-224.5 192.5,-230.5 192.5,-230.5 192.5,-243.5 192.5,-243.5 192.5,-249.5 186.5,-255.5 180.5,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M174.5,-200.5C174.5,-200.5 130.5,-200.5 130.5,-200.5 124.5,-200.5 118.5,-194.5 118.5,-188.5 118.5,-188.5 118.5,-175.5 118.5,-175.5 118.5,-169.5 124.5,-163.5 130.5,-163.5 130.5,-163.5 174.5,-163.5 174.5,-163.5 180.5,-163.5 186.5,-169.5 186.5,-175.5 186.5,-175.5 186.5,-188.5 186.5,-188.5 186.5,-194.5 180.5,-200.5 174.5,-200.5\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-178.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M205,-142.26C205,-144.26 181.47,-145.88 152.5,-145.88 123.53,-145.88 100,-144.26 100,-142.26 100,-142.26 100,-109.74 100,-109.74 100,-107.74 123.53,-106.12 152.5,-106.12 181.47,-106.12 205,-107.74 205,-109.74 205,-109.74 205,-142.26 205,-142.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M205,-142.26C205,-140.27 181.47,-138.65 152.5,-138.65 123.53,-138.65 100,-140.27 100,-142.26\"/>\n",
"<text text-anchor=\"middle\" x=\"152.5\" y=\"-122.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f11184fe550>"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_driver_dr = (\n",
" driver.Builder()\n",
" .with_modules(load_from_driver_module)\n",
" .with_cache(path=CACHE_DIR, recompute=[\"raw_data\", \"save.amount_per_country\"])\n",
" .build()\n",
")\n",
"\n",
"load_from_driver_results = load_from_driver_dr.execute(\n",
" [\"amount_per_country\", \"save.amount_per_country\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(load_from_driver_results[\"amount_per_country\"].head())\n",
"print()\n",
"load_from_driver_dr.cache.view_run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
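"Both internal nodes associated with `raw_data` (`processed_data.load_data.raw_data` and `processed_data.select_data.raw_data`) get the expected `RECOMPUTE` behavior, as the next cell confirms. Setting behaviors through `.with_cache()` is generally easier than combining `@cache` with `@load_from`/`@save_to`."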
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'save.amount_per_country': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data.load_data.raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'processed_data.select_data.raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>}"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_from_driver_dr.cache.behaviors[load_from_driver_dr.cache.last_run_id]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `from_.` and `to.`\n",
"The constructs `from_` and `to` define `DataLoader` and `DataSaver` objects at the `Driver` level. As in the previous cells, there are no `raw_data()` or `saved_data()` functions, but this time there are no `@load_from` or `@save_to` decorators either.\n",
"\n",
"Notice in the module visualization that `raw_data` now appears as an \"input\" and the saver node is absent."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"584pt\" height=\"224pt\"\n",
" viewBox=\"0.00 0.00 584.00 224.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 220)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-220 580,-220 580,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"47,-76 47,-208 143,-208 143,-76 47,-76\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-192.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M353,-65C353,-65 231,-65 231,-65 225,-65 219,-59 219,-53 219,-53 219,-13 219,-13 219,-7 225,-1 231,-1 231,-1 353,-1 353,-1 359,-1 365,-7 365,-13 365,-13 365,-53 365,-53 365,-59 359,-65 353,-65\"/>\n",
"<text text-anchor=\"start\" x=\"230\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"253.5\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M564,-65C564,-65 406,-65 406,-65 400,-65 394,-59 394,-53 394,-53 394,-13 394,-13 394,-7 400,-1 406,-1 406,-1 564,-1 564,-1 570,-1 576,-7 576,-13 576,-13 576,-53 576,-53 576,-59 570,-65 564,-65\"/>\n",
"<text text-anchor=\"start\" x=\"405\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"446.5\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M365.21,-33C371.23,-33 377.39,-33 383.57,-33\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"383.98,-36.5 393.98,-33 383.98,-29.5 383.98,-36.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"190,-66 0,-66 0,0 190,0 190,-66\"/>\n",
"<text text-anchor=\"start\" x=\"22\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"99\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n",
"<text text-anchor=\"start\" x=\"15\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"127.5\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M190.07,-33C196.25,-33 202.44,-33 208.56,-33\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"208.76,-36.5 218.76,-33 208.75,-29.5 208.76,-36.5\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"124.5,-176.5 65.5,-176.5 65.5,-139.5 124.5,-139.5 124.5,-176.5\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-154.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M123,-121.5C123,-121.5 67,-121.5 67,-121.5 61,-121.5 55,-115.5 55,-109.5 55,-109.5 55,-96.5 55,-96.5 55,-90.5 61,-84.5 67,-84.5 67,-84.5 123,-84.5 123,-84.5 129,-84.5 135,-90.5 135,-96.5 135,-96.5 135,-109.5 135,-109.5 135,-115.5 129,-121.5 123,-121.5\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-99.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f111ae764d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%cell_to_module from_module -d\n",
"import pandas as pd\n",
"\n",
"def processed_data(raw_data: pd.DataFrame, cutoff_date: str) -> pd.DataFrame:\n",
" \"\"\"Filter out rows before cutoff date and convert currency to USD.\"\"\"\n",
" df = raw_data.loc[raw_data.date > cutoff_date].copy()\n",
" df[\"amound_in_usd\"] = df[\"amount\"]\n",
" df.loc[df.country == \"Canada\", \"amound_in_usd\"] *= 0.71 \n",
" df.loc[df.country == \"Brazil\", \"amound_in_usd\"] *= 0.18\n",
" df.loc[df.country == \"Mexico\", \"amound_in_usd\"] *= 0.05\n",
" return df\n",
"\n",
"def amount_per_country(processed_data: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"Sum the amount in USD per country\"\"\"\n",
" return processed_data.groupby(\"country\")[\"amound_in_usd\"].sum().to_frame()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two ways to use `from_` and `to`:\n",
"- via \"static\" materializers added to the `Driver` using `Builder.with_materializers()`\n",
"- via \"dynamic\" materializers passed to `Driver.materialize()` (similar to `Driver.execute()`)\n",
"\n",
"Both approaches work with the `.with_cache(recompute=...)` clause."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `.with_materializers()`\n",
"Here, we use `.with_materializers()` to add a parquet loader for `raw_data` and a parquet saver for `amount_per_country`. Note that in `to.parquet(id=...)`, the `id` will be the node name of the data saver.\n",
"\n",
"Then, we pass the node names `raw_data` and `saved_data` (the saver `id`) to `.with_cache(recompute=[...])`.\n",
"\n",
"We call them \"static\" materializers because they're attached to the `Driver`, can be visualized, and can be requested directly via `.execute()`."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"931pt\" height=\"355pt\"\n",
" viewBox=\"0.00 0.00 931.00 354.50\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 350.5)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-350.5 927,-350.5 927,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"27.5,-149.5 27.5,-338.5 148.5,-338.5 148.5,-149.5 27.5,-149.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-323.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M501,-90.5C501,-90.5 379,-90.5 379,-90.5 373,-90.5 367,-84.5 367,-78.5 367,-78.5 367,-38.5 367,-38.5 367,-32.5 373,-26.5 379,-26.5 379,-26.5 501,-26.5 501,-26.5 507,-26.5 513,-32.5 513,-38.5 513,-38.5 513,-78.5 513,-78.5 513,-84.5 507,-90.5 501,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"378\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"401.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M712,-90.5C712,-90.5 554,-90.5 554,-90.5 548,-90.5 542,-84.5 542,-78.5 542,-78.5 542,-38.5 542,-38.5 542,-32.5 548,-26.5 554,-26.5 554,-26.5 712,-26.5 712,-26.5 718,-26.5 724,-32.5 724,-38.5 724,-38.5 724,-78.5 724,-78.5 724,-84.5 718,-90.5 712,-90.5\"/>\n",
"<text text-anchor=\"start\" x=\"553\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"594.5\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M513.21,-58.5C519.23,-58.5 525.39,-58.5 531.57,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"531.98,-62 541.98,-58.5 531.98,-55 531.98,-62\"/>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M309,-127.5C309,-127.5 234,-127.5 234,-127.5 228,-127.5 222,-121.5 222,-115.5 222,-115.5 222,-75.5 222,-75.5 222,-69.5 228,-63.5 234,-63.5 234,-63.5 309,-63.5 309,-63.5 315,-63.5 321,-69.5 321,-75.5 321,-75.5 321,-115.5 321,-115.5 321,-121.5 315,-127.5 309,-127.5\"/>\n",
"<text text-anchor=\"start\" x=\"235\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"233\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M321.11,-84.7C332.39,-82.19 344.71,-79.45 356.99,-76.72\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"357.81,-80.13 366.82,-74.54 356.29,-73.29 357.81,-80.13\"/>\n",
"</g>\n",
"<!-- load_data.raw_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M176,-131.5C176,-135.91 136.56,-139.5 88,-139.5 39.44,-139.5 0,-135.91 0,-131.5 0,-131.5 0,-59.5 0,-59.5 0,-55.09 39.44,-51.5 88,-51.5 136.56,-51.5 176,-55.09 176,-59.5 176,-59.5 176,-131.5 176,-131.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176,-131.5C176,-127.09 136.56,-123.5 88,-123.5 39.44,-123.5 0,-127.09 0,-131.5\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-106.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-78.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- load_data.raw_data&#45;&gt;raw_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>load_data.raw_data&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176.03,-95.5C188.01,-95.5 200.08,-95.5 211.39,-95.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"211.55,-99 221.55,-95.5 211.55,-92 211.55,-99\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M923,-94.5C923,-98.91 884.9,-102.5 838,-102.5 791.1,-102.5 753,-98.91 753,-94.5 753,-94.5 753,-22.5 753,-22.5 753,-18.09 791.1,-14.5 838,-14.5 884.9,-14.5 923,-18.09 923,-22.5 923,-22.5 923,-94.5 923,-94.5\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M923,-94.5C923,-90.09 884.9,-86.5 838,-86.5 791.1,-86.5 753,-90.09 753,-94.5\"/>\n",
"<text text-anchor=\"start\" x=\"793\" y=\"-69.3\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"764\" y=\"-41.3\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M724.22,-58.5C730.33,-58.5 736.49,-58.5 742.61,-58.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"742.87,-62 752.87,-58.5 742.87,-55 742.87,-62\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"338,-45 205,-45 205,0 338,0 338,-45\"/>\n",
"<text text-anchor=\"start\" x=\"220.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"304.5\" y=\"-18.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M338.31,-36.73C344.45,-38.06 350.73,-39.41 356.99,-40.77\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"356.3,-44.2 366.81,-42.89 357.78,-37.36 356.3,-44.2\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"117.5,-307 58.5,-307 58.5,-270 117.5,-270 117.5,-307\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-284.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M116,-252C116,-252 60,-252 60,-252 54,-252 48,-246 48,-240 48,-240 48,-227 48,-227 48,-221 54,-215 60,-215 60,-215 116,-215 116,-215 122,-215 128,-221 128,-227 128,-227 128,-240 128,-240 128,-246 122,-252 116,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-229.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M140.5,-193.76C140.5,-195.76 116.97,-197.38 88,-197.38 59.03,-197.38 35.5,-195.76 35.5,-193.76 35.5,-193.76 35.5,-161.24 35.5,-161.24 35.5,-159.24 59.03,-157.62 88,-157.62 116.97,-157.62 140.5,-159.24 140.5,-161.24 140.5,-161.24 140.5,-193.76 140.5,-193.76\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M140.5,-193.76C140.5,-191.77 116.97,-190.15 88,-190.15 59.03,-190.15 35.5,-191.77 35.5,-193.76\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<hamilton.driver.Driver at 0x7f11184dc9d0>"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from hamilton.io.materialization import from_, to\n",
"\n",
"static_from_dr = (\n",
" driver.Builder()\n",
" .with_modules(from_module)\n",
" .with_materializers(\n",
" from_.parquet(path=\"raw_data.parquet\", target=\"raw_data\"),\n",
" to.parquet(\n",
" id=\"saved_data\",\n",
" dependencies=[\"amount_per_country\"],\n",
" path=\"saved_data.parquet\",\n",
" ),\n",
" )\n",
" .with_cache(\n",
" path=CACHE_DIR,\n",
" recompute=[\"raw_data\", \"saved_data\"],\n",
" default_loader_behavior=\"disable\",\n",
" )\n",
" .build()\n",
")\n",
"static_from_dr"
]
},
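{
"cell_type": "markdown",
"metadata": {},
"source": [
"We execute the dataflow with `.execute()`, requesting the node `amount_per_country` and the data saver by its name, `saved_data`. The logs show the loader, `raw_data`, and `saved_data` being executed, while `processed_data` and `amount_per_country` are retrieved from the cache."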
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"load_data.raw_data::adapter::execute_node\n",
"load_data.raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::adapter::execute_node\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"931pt\" height=\"413pt\"\n",
" viewBox=\"0.00 0.00 931.00 413.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 409)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-409 927,-409 927,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"27.5,-98 27.5,-397 148.5,-397 148.5,-98 27.5,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-381.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M309,-76C309,-76 234,-76 234,-76 228,-76 222,-70 222,-64 222,-64 222,-24 222,-24 222,-18 228,-12 234,-12 234,-12 309,-12 309,-12 315,-12 321,-18 321,-24 321,-24 321,-64 321,-64 321,-70 315,-76 309,-76\"/>\n",
"<text text-anchor=\"start\" x=\"235\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"233\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M501,-112C501,-112 379,-112 379,-112 373,-112 367,-106 367,-100 367,-100 367,-60 367,-60 367,-54 373,-48 379,-48 379,-48 501,-48 501,-48 507,-48 513,-54 513,-60 513,-60 513,-100 513,-100 513,-106 507,-112 501,-112\"/>\n",
"<text text-anchor=\"start\" x=\"378\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"401.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M321.11,-54.51C332.39,-56.95 344.71,-59.61 356.99,-62.27\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"356.3,-65.7 366.82,-64.39 357.78,-58.86 356.3,-65.7\"/>\n",
"</g>\n",
"<!-- load_data.raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M176,-80C176,-84.41 136.56,-88 88,-88 39.44,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 39.44,0 88,0 136.56,0 176,-3.59 176,-8 176,-8 176,-80 176,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176,-80C176,-75.59 136.56,-72 88,-72 39.44,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- load_data.raw_data&#45;&gt;raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>load_data.raw_data&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176.03,-44C188.01,-44 200.08,-44 211.39,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"211.55,-47.5 221.55,-44 211.55,-40.5 211.55,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M712,-112C712,-112 554,-112 554,-112 548,-112 542,-106 542,-100 542,-100 542,-60 542,-60 542,-54 548,-48 554,-48 554,-48 712,-48 712,-48 718,-48 724,-54 724,-60 724,-60 724,-100 724,-100 724,-106 718,-112 712,-112\"/>\n",
"<text text-anchor=\"start\" x=\"553\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"594.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M513.21,-80C519.23,-80 525.39,-80 531.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"531.98,-83.5 541.98,-80 531.98,-76.5 531.98,-83.5\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M923,-116C923,-120.41 884.9,-124 838,-124 791.1,-124 753,-120.41 753,-116 753,-116 753,-44 753,-44 753,-39.59 791.1,-36 838,-36 884.9,-36 923,-39.59 923,-44 923,-44 923,-116 923,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M923,-116C923,-111.59 884.9,-108 838,-108 791.1,-108 753,-111.59 753,-116\"/>\n",
"<text text-anchor=\"start\" x=\"793\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"764\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M724.22,-80C730.33,-80 736.49,-80 742.61,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"742.87,-83.5 752.87,-80 742.87,-76.5 742.87,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"338,-139.5 205,-139.5 205,-94.5 338,-94.5 338,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"220.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"304.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M338.31,-102.38C344.45,-101.01 350.73,-99.62 356.99,-98.22\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"357.81,-101.63 366.81,-96.04 356.29,-94.79 357.81,-101.63\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"117.5,-365.5 58.5,-365.5 58.5,-328.5 117.5,-328.5 117.5,-365.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-343.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M116,-310.5C116,-310.5 60,-310.5 60,-310.5 54,-310.5 48,-304.5 48,-298.5 48,-298.5 48,-285.5 48,-285.5 48,-279.5 54,-273.5 60,-273.5 60,-273.5 116,-273.5 116,-273.5 122,-273.5 128,-279.5 128,-285.5 128,-285.5 128,-298.5 128,-298.5 128,-304.5 122,-310.5 116,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M110,-255.5C110,-255.5 66,-255.5 66,-255.5 60,-255.5 54,-249.5 54,-243.5 54,-243.5 54,-230.5 54,-230.5 54,-224.5 60,-218.5 66,-218.5 66,-218.5 110,-218.5 110,-218.5 116,-218.5 122,-224.5 122,-230.5 122,-230.5 122,-243.5 122,-243.5 122,-249.5 116,-255.5 110,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M140.5,-197.26C140.5,-199.26 116.97,-200.88 88,-200.88 59.03,-200.88 35.5,-199.26 35.5,-197.26 35.5,-197.26 35.5,-164.74 35.5,-164.74 35.5,-162.74 59.03,-161.12 88,-161.12 116.97,-161.12 140.5,-162.74 140.5,-164.74 140.5,-164.74 140.5,-197.26 140.5,-197.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M140.5,-197.26C140.5,-195.27 116.97,-193.65 88,-193.65 59.03,-193.65 35.5,-195.27 35.5,-197.26\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-177.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M126,-143.5C126,-143.5 50,-143.5 50,-143.5 44,-143.5 38,-137.5 38,-131.5 38,-131.5 38,-118.5 38,-118.5 38,-112.5 44,-106.5 50,-106.5 50,-106.5 126,-106.5 126,-106.5 132,-106.5 138,-112.5 138,-118.5 138,-118.5 138,-131.5 138,-131.5 138,-137.5 132,-143.5 126,-143.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-121.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f1118505690>"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"static_from_results = static_from_dr.execute(\n",
" [\"amount_per_country\", \"saved_data\"], inputs={\"cutoff_date\": \"2024-09-01\"}\n",
")\n",
"print()\n",
"print(static_from_results[\"amount_per_country\"].head())\n",
"print()\n",
"static_from_dr.cache.view_run()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>,\n",
" 'saved_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'load_data.raw_data': <CachingBehavior.RECOMPUTE: 2>}"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"static_from_dr.cache.behaviors[static_from_dr.cache.last_run_id]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `.materialize()`\n",
"Now, we build a `Driver` without the static materializers. As in the dataflow definition, the visualization shows `raw_data` as an input."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"584pt\" height=\"224pt\"\n",
" viewBox=\"0.00 0.00 584.00 224.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 220)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-220 580,-220 580,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"47,-76 47,-208 143,-208 143,-76 47,-76\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-192.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M353,-65C353,-65 231,-65 231,-65 225,-65 219,-59 219,-53 219,-53 219,-13 219,-13 219,-7 225,-1 231,-1 231,-1 353,-1 353,-1 359,-1 365,-7 365,-13 365,-13 365,-53 365,-53 365,-59 359,-65 353,-65\"/>\n",
"<text text-anchor=\"start\" x=\"230\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"253.5\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M564,-65C564,-65 406,-65 406,-65 400,-65 394,-59 394,-53 394,-53 394,-13 394,-13 394,-7 400,-1 406,-1 406,-1 564,-1 564,-1 570,-1 576,-7 576,-13 576,-13 576,-53 576,-53 576,-59 570,-65 564,-65\"/>\n",
"<text text-anchor=\"start\" x=\"405\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"446.5\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M365.21,-33C371.23,-33 377.39,-33 383.57,-33\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"383.98,-36.5 393.98,-33 383.98,-29.5 383.98,-36.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"190,-66 0,-66 0,0 190,0 190,-66\"/>\n",
"<text text-anchor=\"start\" x=\"22\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"99\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n",
"<text text-anchor=\"start\" x=\"15\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"127.5\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M190.07,-33C196.25,-33 202.44,-33 208.56,-33\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"208.76,-36.5 218.76,-33 208.75,-29.5 208.76,-36.5\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"124.5,-176.5 65.5,-176.5 65.5,-139.5 124.5,-139.5 124.5,-176.5\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-154.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M123,-121.5C123,-121.5 67,-121.5 67,-121.5 61,-121.5 55,-115.5 55,-109.5 55,-109.5 55,-96.5 55,-96.5 55,-90.5 61,-84.5 67,-84.5 67,-84.5 123,-84.5 123,-84.5 129,-84.5 135,-90.5 135,-96.5 135,-96.5 135,-109.5 135,-109.5 135,-115.5 129,-121.5 123,-121.5\"/>\n",
"<text text-anchor=\"middle\" x=\"95\" y=\"-99.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<hamilton.driver.Driver at 0x7f11184d3e10>"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dynamic_from_dr = (\n",
" driver.Builder()\n",
" .with_modules(from_module)\n",
" .with_cache(path=CACHE_DIR, recompute=[\"raw_data\", \"saved_data\"])\n",
" .build()\n",
")\n",
"dynamic_from_dr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method `Driver.materialize()` has a slightly different signature than `Driver.execute()`:\n",
"\n",
"- The first argument collects `DataLoader` and `DataSaver` objects\n",
"- `additional_vars` is equivalent to `final_vars` in `Driver.execute()`\n",
"- It returns a tuple of `(metadata, additional_vars_results)`"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"load_data.raw_data::adapter::execute_node\n",
"load_data.raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"raw_data::adapter::execute_node\n",
"processed_data::result_store::get_result::hit\n",
"processed_data::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"amount_per_country::result_store::get_result::hit\n",
"saved_data::adapter::execute_node\n",
"saved_data::adapter::execute_node\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" amound_in_usd\n",
"country \n",
"Brazil 77.9004\n",
"Canada 941.9570\n",
"Mexico 46.2170\n",
"USA 2959.7600\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 2.43.0 (0)\n",
" -->\n",
"<!-- Title: %3 Pages: 1 -->\n",
"<svg width=\"931pt\" height=\"413pt\"\n",
" viewBox=\"0.00 0.00 931.00 413.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 409)\">\n",
"<title>%3</title>\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-409 927,-409 927,4 -4,4\"/>\n",
"<g id=\"clust1\" class=\"cluster\">\n",
"<title>cluster__legend</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" points=\"27.5,-98 27.5,-397 148.5,-397 148.5,-98 27.5,-98\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-381.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n",
"</g>\n",
"<!-- raw_data -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M309,-76C309,-76 234,-76 234,-76 228,-76 222,-70 222,-64 222,-64 222,-24 222,-24 222,-18 228,-12 234,-12 234,-12 309,-12 309,-12 315,-12 321,-18 321,-24 321,-24 321,-64 321,-64 321,-70 315,-76 309,-76\"/>\n",
"<text text-anchor=\"start\" x=\"235\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">raw_data</text>\n",
"<text text-anchor=\"start\" x=\"233\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>processed_data</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M501,-112C501,-112 379,-112 379,-112 373,-112 367,-106 367,-100 367,-100 367,-60 367,-60 367,-54 373,-48 379,-48 379,-48 501,-48 501,-48 507,-48 513,-54 513,-60 513,-60 513,-100 513,-100 513,-106 507,-112 501,-112\"/>\n",
"<text text-anchor=\"start\" x=\"378\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">processed_data</text>\n",
"<text text-anchor=\"start\" x=\"401.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- raw_data&#45;&gt;processed_data -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>raw_data&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M321.11,-54.51C332.39,-56.95 344.71,-59.61 356.99,-62.27\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"356.3,-65.7 366.82,-64.39 357.78,-58.86 356.3,-65.7\"/>\n",
"</g>\n",
"<!-- load_data.raw_data -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>load_data.raw_data</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M176,-80C176,-84.41 136.56,-88 88,-88 39.44,-88 0,-84.41 0,-80 0,-80 0,-8 0,-8 0,-3.59 39.44,0 88,0 136.56,0 176,-3.59 176,-8 176,-8 176,-80 176,-80\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176,-80C176,-75.59 136.56,-72 88,-72 39.44,-72 0,-75.59 0,-80\"/>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-54.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">load_data.raw_data</text>\n",
"<text text-anchor=\"start\" x=\"11\" y=\"-26.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetReader</text>\n",
"</g>\n",
"<!-- load_data.raw_data&#45;&gt;raw_data -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>load_data.raw_data&#45;&gt;raw_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M176.03,-44C188.01,-44 200.08,-44 211.39,-44\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"211.55,-47.5 221.55,-44 211.55,-40.5 211.55,-47.5\"/>\n",
"</g>\n",
"<!-- amount_per_country -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>amount_per_country</title>\n",
"<path fill=\"#ffc857\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M712,-112C712,-112 554,-112 554,-112 548,-112 542,-106 542,-100 542,-100 542,-60 542,-60 542,-54 548,-48 554,-48 554,-48 712,-48 712,-48 718,-48 724,-54 724,-60 724,-60 724,-100 724,-100 724,-106 718,-112 712,-112\"/>\n",
"<text text-anchor=\"start\" x=\"553\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">amount_per_country</text>\n",
"<text text-anchor=\"start\" x=\"594.5\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n",
"</g>\n",
"<!-- processed_data&#45;&gt;amount_per_country -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>processed_data&#45;&gt;amount_per_country</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M513.21,-80C519.23,-80 525.39,-80 531.57,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"531.98,-83.5 541.98,-80 531.98,-76.5 531.98,-83.5\"/>\n",
"</g>\n",
"<!-- saved_data -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>saved_data</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M923,-116C923,-120.41 884.9,-124 838,-124 791.1,-124 753,-120.41 753,-116 753,-116 753,-44 753,-44 753,-39.59 791.1,-36 838,-36 884.9,-36 923,-39.59 923,-44 923,-44 923,-116 923,-116\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M923,-116C923,-111.59 884.9,-108 838,-108 791.1,-108 753,-111.59 753,-116\"/>\n",
"<text text-anchor=\"start\" x=\"793\" y=\"-90.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">saved_data</text>\n",
"<text text-anchor=\"start\" x=\"764\" y=\"-62.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">PandasParquetWriter</text>\n",
"</g>\n",
"<!-- amount_per_country&#45;&gt;saved_data -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>amount_per_country&#45;&gt;saved_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M724.22,-80C730.33,-80 736.49,-80 742.61,-80\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"742.87,-83.5 752.87,-80 742.87,-76.5 742.87,-83.5\"/>\n",
"</g>\n",
"<!-- _processed_data_inputs -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>_processed_data_inputs</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"338,-139.5 205,-139.5 205,-94.5 338,-94.5 338,-139.5\"/>\n",
"<text text-anchor=\"start\" x=\"220.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">cutoff_date</text>\n",
"<text text-anchor=\"start\" x=\"304.5\" y=\"-112.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n",
"</g>\n",
"<!-- _processed_data_inputs&#45;&gt;processed_data -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>_processed_data_inputs&#45;&gt;processed_data</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M338.31,-102.38C344.45,-101.01 350.73,-99.62 356.99,-98.22\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"357.81,-101.63 366.81,-96.04 356.29,-94.79 357.81,-101.63\"/>\n",
"</g>\n",
"<!-- input -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>input</title>\n",
"<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"117.5,-365.5 58.5,-365.5 58.5,-328.5 117.5,-328.5 117.5,-365.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-343.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n",
"</g>\n",
"<!-- function -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>function</title>\n",
"<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M116,-310.5C116,-310.5 60,-310.5 60,-310.5 54,-310.5 48,-304.5 48,-298.5 48,-298.5 48,-285.5 48,-285.5 48,-279.5 54,-273.5 60,-273.5 60,-273.5 116,-273.5 116,-273.5 122,-273.5 128,-279.5 128,-285.5 128,-285.5 128,-298.5 128,-298.5 128,-304.5 122,-310.5 116,-310.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-288.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n",
"</g>\n",
"<!-- output -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>output</title>\n",
"<path fill=\"#ffc857\" stroke=\"black\" d=\"M110,-255.5C110,-255.5 66,-255.5 66,-255.5 60,-255.5 54,-249.5 54,-243.5 54,-243.5 54,-230.5 54,-230.5 54,-224.5 60,-218.5 66,-218.5 66,-218.5 110,-218.5 110,-218.5 116,-218.5 122,-224.5 122,-230.5 122,-230.5 122,-243.5 122,-243.5 122,-249.5 116,-255.5 110,-255.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-233.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n",
"</g>\n",
"<!-- materializer -->\n",
"<g id=\"node10\" class=\"node\">\n",
"<title>materializer</title>\n",
"<path fill=\"#ffffff\" stroke=\"black\" d=\"M140.5,-197.26C140.5,-199.26 116.97,-200.88 88,-200.88 59.03,-200.88 35.5,-199.26 35.5,-197.26 35.5,-197.26 35.5,-164.74 35.5,-164.74 35.5,-162.74 59.03,-161.12 88,-161.12 116.97,-161.12 140.5,-162.74 140.5,-164.74 140.5,-164.74 140.5,-197.26 140.5,-197.26\"/>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M140.5,-197.26C140.5,-195.27 116.97,-193.65 88,-193.65 59.03,-193.65 35.5,-195.27 35.5,-197.26\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-177.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">materializer</text>\n",
"</g>\n",
"<!-- from cache -->\n",
"<g id=\"node11\" class=\"node\">\n",
"<title>from cache</title>\n",
"<path fill=\"#ffffff\" stroke=\"#f06449\" stroke-width=\"3\" d=\"M126,-143.5C126,-143.5 50,-143.5 50,-143.5 44,-143.5 38,-137.5 38,-131.5 38,-131.5 38,-118.5 38,-118.5 38,-112.5 44,-106.5 50,-106.5 50,-106.5 126,-106.5 126,-106.5 132,-106.5 138,-112.5 138,-118.5 138,-118.5 138,-131.5 138,-131.5 138,-137.5 132,-143.5 126,-143.5\"/>\n",
"<text text-anchor=\"middle\" x=\"88\" y=\"-121.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">from cache</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x7f11184ff7d0>"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
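"# Load `raw_data` from parquet, save `amount_per_country` to parquet,\n",
"# and also return `amount_per_country` in memory via `additional_vars`\n",
"metadata, dynamic_from_results = dynamic_from_dr.materialize(\n",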
" from_.parquet(path=\"raw_data.parquet\", target=\"raw_data\"),\n",
" to.parquet(\n",
" id=\"saved_data\",\n",
" dependencies=[\"amount_per_country\"],\n",
" path=\"saved_data.parquet\",\n",
" ),\n",
" additional_vars=[\"amount_per_country\"],\n",
" inputs={\"cutoff_date\": \"2024-09-01\"},\n",
")\n",
"print()\n",
"print(dynamic_from_results[\"amount_per_country\"].head())\n",
"print()\n",
"dynamic_from_dr.cache.view_run()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'amount_per_country': <CachingBehavior.DEFAULT: 1>,\n",
" 'processed_data': <CachingBehavior.DEFAULT: 1>,\n",
" 'raw_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'cutoff_date': <CachingBehavior.DEFAULT: 1>,\n",
" 'saved_data': <CachingBehavior.RECOMPUTE: 2>,\n",
" 'load_data.raw_data': <CachingBehavior.RECOMPUTE: 2>}"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Caching behavior resolved for each node during the most recent run\n",
"dynamic_from_dr.cache.behaviors[dynamic_from_dr.cache.last_run_id]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}