| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# People Data Labs + Hamilton\n", |
| "This notebook will teach you how to use People Data Labs (PDL) [Company enrichment](https://docs.peopledatalabs.com/docs/company-enrichment-api) data alongside stock market data for financial analysis. We will introduce the Python library [Hamilton](https://hamilton.dagworks.io/en/latest/?badge=latest) to help define data transformations.\n", |
| "\n", |
| "**Content**\n", |
| "1. Data preparation\n", |
| "2. Analytics: Explore the relationship between employee count and stock growth" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## 0. Imports\n", |
| "Make sure you have followed the `README` to install the dependencies and download the data." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 1, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import pandas as pd\n", |
| "from hamilton import driver\n", |
| "from IPython.display import display\n", |
| "\n", |
| "# Loads a \"jupyter magic\" that allows special notebook interactions\n", |
| "%load_ext hamilton.plugins.jupyter_magic" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## 1. Load raw data\n", |
| "Hamilton [uses Python functions to define a dataflow](https://hamilton.dagworks.io/en/latest/concepts/node/) of transformations. Each function defines a node, and its parameter names declare which other nodes or inputs it depends on.\n", |
| "\n", |
| "The next cell starts with the special statement `%%cell_to_module` and contains Python functions that define the steps of our analysis.\n", |
| "\n", |
| "Executing the cell produces a visualization of the resulting flow of operations." |
| ] |
| }, |
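The sketch below illustrates how parameter names can wire functions into a dataflow: each parameter either names another function or a user-supplied input. It is a toy resolver built on the standard library's `inspect` module, not Hamilton's implementation, and `resolve()` plus the string-returning `pdl_data()`/`company_info()` stand-ins are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


# Hypothetical stand-ins mirroring the naming pattern of the data_preparation
# module: each function is a node; each parameter names a node or an input.
def pdl_data(pdl_file: str, data_dir: str = "data/") -> str:
    return f"{data_dir}{pdl_file}"


def company_info(pdl_data: str) -> str:
    return f"company_info from {pdl_data}"


def resolve(name: str, nodes: Dict[str, Callable[..., Any]], inputs: Dict[str, Any]) -> Any:
    """Compute `name` by recursively computing the nodes its parameters refer to.

    Toy version: no caching, so a shared upstream node is recomputed per use.
    """
    if name in inputs:
        return inputs[name]
    func = nodes[name]
    kwargs = {}
    for param in inspect.signature(func).parameters.values():
        if param.name in inputs or param.name in nodes:
            kwargs[param.name] = resolve(param.name, nodes, inputs)
        elif param.default is inspect.Parameter.empty:
            raise ValueError(f"missing required input: {param.name}")
        # Otherwise the parameter keeps its default value (e.g. data_dir).
    return func(**kwargs)


nodes = {"pdl_data": pdl_data, "company_info": company_info}
print(resolve("company_info", nodes, {"pdl_file": "pdl_data.json"}))
# company_info from data/pdl_data.json
```

Note that `company_info` never mentions `pdl_file`; the dependency is reached through `pdl_data`, which is the same reason the graph above shows `pdl_file` and `data_dir` feeding `pdl_data` only.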
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"548pt\" height=\"348pt\"\n", |
| " viewBox=\"0.00 0.00 548.00 348.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 344)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-344 544,-344 544,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"12,-200 12,-332 108,-332 108,-200 12,-200\"/>\n", |
| "<text text-anchor=\"middle\" x=\"60\" y=\"-316.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- employee_count_by_month_df -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>employee_count_by_month_df</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M528,-146C528,-146 297,-146 297,-146 291,-146 285,-140 285,-134 285,-134 285,-94 285,-94 285,-88 291,-82 297,-82 297,-82 528,-82 528,-82 534,-82 540,-88 540,-94 540,-94 540,-134 540,-134 540,-140 534,-146 528,-146\"/>\n", |
| "<text text-anchor=\"start\" x=\"296\" y=\"-124.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_count_by_month_df</text>\n", |
| "<text text-anchor=\"start\" x=\"374\" y=\"-96.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>company_info</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M465.5,-64C465.5,-64 359.5,-64 359.5,-64 353.5,-64 347.5,-58 347.5,-52 347.5,-52 347.5,-12 347.5,-12 347.5,-6 353.5,0 359.5,0 359.5,0 465.5,0 465.5,0 471.5,0 477.5,-6 477.5,-12 477.5,-12 477.5,-52 477.5,-52 477.5,-58 471.5,-64 465.5,-64\"/>\n", |
| "<text text-anchor=\"start\" x=\"358.5\" y=\"-42.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"374\" y=\"-14.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- pdl_data -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>pdl_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M240,-105C240,-105 165,-105 165,-105 159,-105 153,-99 153,-93 153,-93 153,-53 153,-53 153,-47 159,-41 165,-41 165,-41 240,-41 240,-41 246,-41 252,-47 252,-53 252,-53 252,-93 252,-93 252,-99 246,-105 240,-105\"/>\n", |
| "<text text-anchor=\"start\" x=\"169\" y=\"-83.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">pdl_data</text>\n", |
| "<text text-anchor=\"start\" x=\"164\" y=\"-55.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- pdl_data->employee_count_by_month_df -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>pdl_data->employee_count_by_month_df</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M252,-82.56C259.17,-83.97 266.84,-85.49 274.79,-87.05\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"274.13,-90.49 284.61,-88.99 275.48,-83.62 274.13,-90.49\"/>\n", |
| "</g>\n", |
| "<!-- pdl_data->company_info -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>pdl_data->company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M252,-63.44C277.35,-58.44 308.91,-52.22 337.25,-46.64\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"337.99,-50.06 347.12,-44.69 336.63,-43.19 337.99,-50.06\"/>\n", |
| "</g>\n", |
| "<!-- stock_data -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>stock_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M244,-189C244,-189 161,-189 161,-189 155,-189 149,-183 149,-177 149,-177 149,-137 149,-137 149,-131 155,-125 161,-125 161,-125 244,-125 244,-125 250,-125 256,-131 256,-137 256,-137 256,-177 256,-177 256,-183 250,-189 244,-189\"/>\n", |
| "<text text-anchor=\"start\" x=\"160\" y=\"-167.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_data</text>\n", |
| "<text text-anchor=\"start\" x=\"164\" y=\"-139.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>_pdl_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"116,-106 4,-106 4,-40 116,-40 116,-106\"/>\n", |
| "<text text-anchor=\"start\" x=\"22.5\" y=\"-79.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">pdl_file</text>\n", |
| "<text text-anchor=\"start\" x=\"82\" y=\"-79.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "<text text-anchor=\"start\" x=\"19\" y=\"-58.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">data_dir</text>\n", |
| "<text text-anchor=\"start\" x=\"82\" y=\"-58.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs->pdl_data -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>_pdl_data_inputs->pdl_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M116.13,-73C124.86,-73 133.91,-73 142.72,-73\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"142.92,-76.5 152.92,-73 142.92,-69.5 142.92,-76.5\"/>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>_stock_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"120,-190 0,-190 0,-124 120,-124 120,-190\"/>\n", |
| "<text text-anchor=\"start\" x=\"19\" y=\"-163.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">data_dir</text>\n", |
| "<text text-anchor=\"start\" x=\"86\" y=\"-163.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "<text text-anchor=\"start\" x=\"15\" y=\"-142.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">stock_file</text>\n", |
| "<text text-anchor=\"start\" x=\"86\" y=\"-142.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs->stock_data -->\n", |
| "<g id=\"edge4\" class=\"edge\">\n", |
| "<title>_stock_data_inputs->stock_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M120.18,-157C126.23,-157 132.4,-157 138.49,-157\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"138.64,-160.5 148.64,-157 138.64,-153.5 138.64,-160.5\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node7\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"89.5,-300.5 30.5,-300.5 30.5,-263.5 89.5,-263.5 89.5,-300.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"60\" y=\"-278.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node8\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M88,-245.5C88,-245.5 32,-245.5 32,-245.5 26,-245.5 20,-239.5 20,-233.5 20,-233.5 20,-220.5 20,-220.5 20,-214.5 26,-208.5 32,-208.5 32,-208.5 88,-208.5 88,-208.5 94,-208.5 100,-214.5 100,-220.5 100,-220.5 100,-233.5 100,-233.5 100,-239.5 94,-245.5 88,-245.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"60\" y=\"-223.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7f6a48f4fee0>" |
| ] |
| }, |
| "execution_count": 2, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "%%cell_to_module -m data_preparation -d\n", |
| "\n", |
| "from pathlib import Path\n", |
| "import pandas as pd\n", |
| "\n", |
| "\n", |
| "def pdl_data(pdl_file: str, data_dir: str = \"data/\") -> pd.DataFrame:\n", |
| " \"\"\"Load raw People Data Labs data stored locally\"\"\"\n", |
| " return pd.read_json(Path(data_dir, pdl_file))\n", |
| "\n", |
| "\n", |
| "def stock_data(stock_file: str, data_dir: str = \"data/\") -> pd.DataFrame:\n", |
| " \"\"\"Load raw stock data stored locally\"\"\"\n", |
| " return pd.read_json(Path(data_dir, stock_file))\n", |
| "\n", |
| "\n", |
| "def company_info(pdl_data: pd.DataFrame) -> pd.DataFrame:\n", |
| " \"\"\"Select columns containing general company info\"\"\"\n", |
| " columns = [\n", |
| " \"id\", \"ticker\", \"website\", \"name\", \"display_name\", \"legal_name\", \"founded\", \n", |
| " \"industry\", \"type\", \"summary\", \"total_funding_raised\", \"latest_funding_stage\",\n", |
| " \"number_funding_rounds\", \"last_funding_date\", \"inferred_revenue\"\n", |
| " ]\n", |
| " return pdl_data[columns]\n", |
| "\n", |
| "\n", |
| "def employee_count_by_month_df(pdl_data: pd.DataFrame) -> pd.DataFrame:\n", |
| " \"\"\"Normalized employee count data\"\"\"\n", |
| " return (\n", |
| " pd.json_normalize(pdl_data[\"employee_count_by_month\"])\n", |
| " .assign(ticker=pdl_data[\"ticker\"])\n", |
| " .melt(\n", |
| " id_vars=\"ticker\",\n", |
| " var_name=\"year_month\",\n", |
| " value_name=\"employee_count\",\n", |
| " )\n", |
| " .astype({\"year_month\": \"datetime64[ns]\"})\n", |
| " )" |
| ] |
| }, |
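To see what `employee_count_by_month_df` does, here is the same reshaping applied to a tiny, made-up frame. It assumes that `employee_count_by_month` holds one `{"YYYY-MM": count}` dict per company, which is what the `melt` and datetime conversion above imply; the tickers and counts are invented. It uses `pd.to_datetime(..., format="%Y-%m")` as an explicit, stricter stand-in for the `astype` call.

```python
import pandas as pd

# Made-up toy records shaped like the PDL data: one row per company, with
# `employee_count_by_month` assumed to be a {"YYYY-MM": count} dict.
pdl_data = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB"],
        "employee_count_by_month": [
            {"2023-01": 100, "2023-02": 110},
            {"2023-01": 50, "2023-02": 45},
        ],
    }
)

# Wide format: one column per month, plus the ticker column.
wide = pd.json_normalize(pdl_data["employee_count_by_month"].tolist()).assign(
    ticker=pdl_data["ticker"]
)

# Long format: one row per (ticker, month) pair.
long = wide.melt(id_vars="ticker", var_name="year_month", value_name="employee_count")

# Parse "YYYY-MM" strings into timestamps (first day of each month).
long["year_month"] = pd.to_datetime(long["year_month"], format="%Y-%m")
print(long)
```

The long format (one row per ticker and month) is what makes later steps, such as filtering by date or grouping by ticker, straightforward.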
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execute your first dataflow\n", |
| "The next cell creates a Hamilton `Driver`, the object that executes the dataflow.\n", |
| "\n", |
| "We pass it the `data_preparation` module defined in the cell above (with `%%cell_to_module -m data_preparation`)." |
| ] |
| }, |
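Conceptually, a module built from a cell can be thought of as a fresh namespace into which the cell's code is executed. The standard-library sketch below assumes that model (it is not the `%%cell_to_module` source code) and shows one consequence: names imported elsewhere in the notebook, such as `pd`, are not visible inside the module. `source_to_module()` is a hypothetical helper.

```python
import types


def source_to_module(module_name: str, source: str) -> types.ModuleType:
    """Hypothetical helper: execute `source` inside a brand-new module object."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)  # the module's dict becomes the code's globals
    return module


# A self-contained "cell": everything it needs is defined inside it.
cell_source = '''
def greeting(name: str) -> str:
    return f"hello {name}"
'''
data_preparation = source_to_module("data_preparation", cell_source)
print(data_preparation.greeting("pdl"))  # hello pdl

# A "cell" that relies on an import made elsewhere fails when called.
bad_source = '''
def make_frame():
    return pd.DataFrame()  # `pd` was never imported in this module
'''
bad = source_to_module("bad", bad_source)
try:
    bad.make_frame()
except NameError as err:
    print(err)  # name 'pd' is not defined
```

This is also why each cell defined with `%%cell_to_module` needs its own import statements.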
| { |
| "cell_type": "code", |
| "execution_count": 3, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "hamilton_driver = (\n", |
| " driver.Builder()\n", |
| " .with_modules(data_preparation)\n", |
| " .build()\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "We specify the input values and the variables we want to compute.\n", |
| "\n", |
| "In this case, we pass the file name `pdl_file` (which `pdl_data()` looks up in `data_dir`, `data/` by default) and request `company_info`." |
| ] |
| }, |
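Because only `company_info` is requested, only its upstream dependencies need to run: `stock_file` is never required, and `data_dir` falls back to its default. The sketch below computes that set from the dependency edges shown in the first graph. `required_nodes()` is a hypothetical helper illustrating the idea, not a Hamilton API.

```python
from typing import Dict, Iterable, List, Set

# Dependencies copied from the data_preparation graph above:
# node -> the names its function parameters refer to.
DEPENDENCIES: Dict[str, List[str]] = {
    "pdl_data": ["pdl_file", "data_dir"],
    "stock_data": ["stock_file", "data_dir"],
    "company_info": ["pdl_data"],
    "employee_count_by_month_df": ["pdl_data"],
}


def required_nodes(targets: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return the targets plus everything they transitively depend on."""
    required: Set[str] = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in required:
            continue
        required.add(name)
        stack.extend(deps.get(name, []))
    return required


needed = required_nodes(["company_info"], DEPENDENCIES)
# data_dir is listed because pdl_data uses it, even though it has a default.
print(sorted(needed))
# ['company_info', 'data_dir', 'pdl_data', 'pdl_file']
```

The execution graph displayed below highlights the same path, which is why passing `pdl_file` alone is enough.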
| { |
| "cell_type": "code", |
| "execution_count": 4, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"404pt\" height=\"267pt\"\n", |
| " viewBox=\"0.00 0.00 403.50 267.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 263)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-263 399.5,-263 399.5,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8,-64 8,-251 104,-251 104,-64 8,-64\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-235.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- pdl_data -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>pdl_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M224.5,-64C224.5,-64 149.5,-64 149.5,-64 143.5,-64 137.5,-58 137.5,-52 137.5,-52 137.5,-12 137.5,-12 137.5,-6 143.5,0 149.5,0 149.5,0 224.5,0 224.5,0 230.5,0 236.5,-6 236.5,-12 236.5,-12 236.5,-52 236.5,-52 236.5,-58 230.5,-64 224.5,-64\"/>\n", |
| "<text text-anchor=\"start\" x=\"153.5\" y=\"-42.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">pdl_data</text>\n", |
| "<text text-anchor=\"start\" x=\"148.5\" y=\"-14.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>company_info</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M383.5,-64C383.5,-64 277.5,-64 277.5,-64 271.5,-64 265.5,-58 265.5,-52 265.5,-52 265.5,-12 265.5,-12 265.5,-6 271.5,0 277.5,0 277.5,0 383.5,0 383.5,0 389.5,0 395.5,-6 395.5,-12 395.5,-12 395.5,-52 395.5,-52 395.5,-58 389.5,-64 383.5,-64\"/>\n", |
| "<text text-anchor=\"start\" x=\"276.5\" y=\"-42.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"292\" y=\"-14.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- pdl_data->company_info -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>pdl_data->company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M236.7,-32C242.64,-32 248.81,-32 255.02,-32\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"255.47,-35.5 265.47,-32 255.47,-28.5 255.47,-35.5\"/>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>_pdl_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"108.5,-54.5 3.5,-54.5 3.5,-9.5 108.5,-9.5 108.5,-54.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"19\" y=\"-27.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">pdl_file</text>\n", |
| "<text text-anchor=\"start\" x=\"75\" y=\"-27.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs->pdl_data -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>_pdl_data_inputs->pdl_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M108.73,-32C114.74,-32 120.91,-32 127.02,-32\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"127.19,-35.5 137.19,-32 127.19,-28.5 127.19,-35.5\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"85.5,-219.5 26.5,-219.5 26.5,-182.5 85.5,-182.5 85.5,-219.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-197.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M84,-164.5C84,-164.5 28,-164.5 28,-164.5 22,-164.5 16,-158.5 16,-152.5 16,-152.5 16,-139.5 16,-139.5 16,-133.5 22,-127.5 28,-127.5 28,-127.5 84,-127.5 84,-127.5 90,-127.5 96,-133.5 96,-139.5 96,-139.5 96,-152.5 96,-152.5 96,-158.5 90,-164.5 84,-164.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-142.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M78,-109.5C78,-109.5 34,-109.5 34,-109.5 28,-109.5 22,-103.5 22,-97.5 22,-97.5 22,-84.5 22,-84.5 22,-78.5 28,-72.5 34,-72.5 34,-72.5 78,-72.5 78,-72.5 84,-72.5 90,-78.5 90,-84.5 90,-84.5 90,-97.5 90,-97.5 90,-103.5 84,-109.5 78,-109.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-87.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7f6a0a5e7ca0>" |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| }, |
| { |
| "data": { |
| "text/html": [ |
| "<div>\n", |
| "<style scoped>\n", |
| " .dataframe tbody tr th:only-of-type {\n", |
| " vertical-align: middle;\n", |
| " }\n", |
| "\n", |
| " .dataframe tbody tr th {\n", |
| " vertical-align: top;\n", |
| " }\n", |
| "\n", |
| " .dataframe thead th {\n", |
| " text-align: right;\n", |
| " }\n", |
| "</style>\n", |
| "<table border=\"1\" class=\"dataframe\">\n", |
| " <thead>\n", |
| " <tr style=\"text-align: right;\">\n", |
| " <th></th>\n", |
| " <th>id</th>\n", |
| " <th>ticker</th>\n", |
| " <th>website</th>\n", |
| " <th>name</th>\n", |
| " <th>display_name</th>\n", |
| " <th>legal_name</th>\n", |
| " <th>founded</th>\n", |
| " <th>industry</th>\n", |
| " <th>type</th>\n", |
| " <th>summary</th>\n", |
| " <th>total_funding_raised</th>\n", |
| " <th>latest_funding_stage</th>\n", |
| " <th>number_funding_rounds</th>\n", |
| " <th>last_funding_date</th>\n", |
| " <th>inferred_revenue</th>\n", |
| " </tr>\n", |
| " </thead>\n", |
| " <tbody>\n", |
| " <tr>\n", |
| " <th>0</th>\n", |
| " <td>firstbankpr</td>\n", |
| " <td>FBP</td>\n", |
| " <td>1firstbank.com</td>\n", |
| " <td>firstbank</td>\n", |
| " <td>FirstBank</td>\n", |
| " <td>FIRST BANCORP</td>\n", |
| " <td>1948.0</td>\n", |
| " <td>banking</td>\n", |
| " <td>public</td>\n", |
| " <td>backed by a history spanning over 70 years, fi...</td>\n", |
| " <td>NaN</td>\n", |
| " <td>None</td>\n", |
| " <td>NaN</td>\n", |
| " <td>None</td>\n", |
| " <td>$100M-$250M</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>1</th>\n", |
| " <td>motorolasolutions</td>\n", |
| " <td>MSI</td>\n", |
| " <td>motorolasolutions.com</td>\n", |
| " <td>motorola solutions</td>\n", |
| " <td>Motorola Solutions</td>\n", |
| " <td>Motorola Solutions, Inc.</td>\n", |
| " <td>1928.0</td>\n", |
| " <td>telecommunications</td>\n", |
| " <td>public</td>\n", |
| " <td>motorola solutions is a global leader in publi...</td>\n", |
| " <td>1.000000e+09</td>\n", |
| " <td>post_ipo_equity</td>\n", |
| " <td>1.0</td>\n", |
| " <td>2023-01-06</td>\n", |
| " <td>$10B+</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>2</th>\n", |
| " <td>american-equity</td>\n", |
| " <td>AEL-PA</td>\n", |
| " <td>american-equity.com</td>\n", |
| " <td>american equity</td>\n", |
| " <td>American Equity</td>\n", |
| " <td>None</td>\n", |
| " <td>1995.0</td>\n", |
| " <td>insurance</td>\n", |
| " <td>public</td>\n", |
| " <td>american equity investment life insurance comp...</td>\n", |
| " <td>2.530000e+08</td>\n", |
| " <td>post_ipo_equity</td>\n", |
| " <td>2.0</td>\n", |
| " <td>2022-01-07</td>\n", |
| " <td>$250M-$500M</td>\n", |
| " </tr>\n", |
| " </tbody>\n", |
| "</table>\n", |
| "</div>" |
| ], |
| "text/plain": [ |
| " id ticker website name \\\n", |
| "0 firstbankpr FBP 1firstbank.com firstbank \n", |
| "1 motorolasolutions MSI motorolasolutions.com motorola solutions \n", |
| "2 american-equity AEL-PA american-equity.com american equity \n", |
| "\n", |
| " display_name legal_name founded industry \\\n", |
| "0 FirstBank FIRST BANCORP 1948.0 banking \n", |
| "1 Motorola Solutions Motorola Solutions, Inc. 1928.0 telecommunications \n", |
| "2 American Equity None 1995.0 insurance \n", |
| "\n", |
| " type summary \\\n", |
| "0 public backed by a history spanning over 70 years, fi... \n", |
| "1 public motorola solutions is a global leader in publi... \n", |
| "2 public american equity investment life insurance comp... \n", |
| "\n", |
| " total_funding_raised latest_funding_stage number_funding_rounds \\\n", |
| "0 NaN None NaN \n", |
| "1 1.000000e+09 post_ipo_equity 1.0 \n", |
| "2 2.530000e+08 post_ipo_equity 2.0 \n", |
| "\n", |
| " last_funding_date inferred_revenue \n", |
| "0 None $100M-$250M \n", |
| "1 2023-01-06 $10B+ \n", |
| "2 2022-01-07 $250M-$500M " |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| } |
| ], |
| "source": [ |
| "inputs = dict(pdl_file=\"pdl_data.json\")\n", |
| "\n", |
| "results = hamilton_driver.execute([\"company_info\"], inputs=inputs)\n", |
| "\n", |
| "# `display()` can \"print\" multiple values for a single cell\n", |
| "# display the execution path and the result for `company_info`\n", |
| "display(\n", |
| " hamilton_driver.visualize_execution([\"company_info\"], inputs=inputs),\n", |
| " results[\"company_info\"].head(3)\n", |
| ")" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "## 2. Analytics\n", |
| "We're interested in the potential relationship between employee count and stock growth in private companies (series A to D). \n", |
| "\n", |
| "We will need to:\n", |
| "- filter companies by funding stage\n", |
| "- define a time window starting at the last funding round\n", |
| "- compute growth for employee count and stock\n", |
| "\n", |
| "NOTE: Each cell using `%%cell_to_module` becomes its own module, so it must include its own imports (e.g., `import pandas as pd`), even if the package was already imported elsewhere in the notebook." |
| ] |
| }, |
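The three steps can be sketched in plain pandas on made-up data before wiring them into Hamilton functions. The stage labels in `ROUNDS_SELECTION` (`series_a` to `series_d`) are assumed values of `latest_funding_stage`, and the growth rate is defined here, as one reasonable choice, as the last monthly count in the window divided by the first, minus one. The notebook's own functions may differ.

```python
import pandas as pd

# Assumed stage labels; check them against the actual `latest_funding_stage` values.
ROUNDS_SELECTION = ["series_a", "series_b", "series_c", "series_d"]

# Made-up toy data shaped like `company_info` and `employee_count_by_month_df`.
companies = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "latest_funding_stage": ["series_b", "post_ipo_equity", "series_a"],
    "last_funding_date": pd.to_datetime(["2023-02-01", "2022-06-01", "2023-01-01"]),
})
employees = pd.DataFrame({
    "ticker": ["AAA"] * 3 + ["CCC"] * 3,
    "year_month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"] * 2),
    "employee_count": [90, 100, 120, 40, 50, 60],
})

# 1. Filter companies by funding stage.
selected = companies[companies["latest_funding_stage"].isin(ROUNDS_SELECTION)]

# 2. Window: keep monthly counts on or after each company's last funding date.
window = employees.merge(selected[["ticker", "last_funding_date"]], on="ticker")
window = window[window["year_month"] >= window["last_funding_date"]]

# 3. Growth over the window: last observed count / first observed count - 1.
growth = (
    window.sort_values("year_month")
    .groupby("ticker")["employee_count"]
    .agg(lambda s: s.iloc[-1] / s.iloc[0] - 1)
)
print(growth)  # AAA ~0.2 (100 -> 120), CCC 0.5 (40 -> 60)
```

The same windowing and growth computation would apply to stock prices, with `employee_count` swapped for a price column from `stock_data`.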
| { |
| "cell_type": "code", |
| "execution_count": 5, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"1614pt\" height=\"359pt\"\n", |
| " viewBox=\"0.00 0.00 1614.00 359.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 355)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-355 1610,-355 1610,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"66.5,-211 66.5,-343 162.5,-343 162.5,-211 66.5,-211\"/>\n", |
| "<text text-anchor=\"middle\" x=\"114.5\" y=\"-327.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1339,-125C1339,-125 960,-125 960,-125 954,-125 948,-119 948,-113 948,-113 948,-73 948,-73 948,-67 954,-61 960,-61 960,-61 1339,-61 1339,-61 1345,-61 1351,-67 1351,-73 1351,-73 1351,-113 1351,-113 1351,-119 1345,-125 1339,-125\"/>\n", |
| "<text text-anchor=\"start\" x=\"959\" y=\"-103.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1111\" y=\"-75.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- augmented_company_info -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>augmented_company_info</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1594,-125C1594,-125 1392,-125 1392,-125 1386,-125 1380,-119 1380,-113 1380,-113 1380,-73 1380,-73 1380,-67 1386,-61 1392,-61 1392,-61 1594,-61 1594,-61 1600,-61 1606,-67 1606,-73 1606,-73 1606,-113 1606,-113 1606,-119 1600,-125 1594,-125\"/>\n", |
| "<text text-anchor=\"start\" x=\"1391\" y=\"-103.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">augmented_company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"1454.5\" y=\"-75.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge9\" class=\"edge\">\n", |
| "<title>employee_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1351.13,-93C1357.36,-93 1363.54,-93 1369.62,-93\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1369.78,-96.5 1379.78,-93 1369.78,-89.5 1369.78,-96.5\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>selected_companies</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M497.5,-65C497.5,-65 341.5,-65 341.5,-65 335.5,-65 329.5,-59 329.5,-53 329.5,-53 329.5,-13 329.5,-13 329.5,-7 335.5,-1 341.5,-1 341.5,-1 497.5,-1 497.5,-1 503.5,-1 509.5,-7 509.5,-13 509.5,-13 509.5,-53 509.5,-53 509.5,-59 503.5,-65 497.5,-65\"/>\n", |
| "<text text-anchor=\"start\" x=\"340.5\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">selected_companies</text>\n", |
| "<text text-anchor=\"start\" x=\"381\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>employees_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M907,-129C907,-129 622,-129 622,-129 616,-129 610,-123 610,-117 610,-117 610,-77 610,-77 610,-71 616,-65 622,-65 622,-65 907,-65 907,-65 913,-65 919,-71 919,-77 919,-77 919,-117 919,-117 919,-123 913,-129 907,-129\"/>\n", |
| "<text text-anchor=\"start\" x=\"621\" y=\"-107.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employees_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"726\" y=\"-79.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- selected_companies->employees_since_last_funding_round -->\n", |
| "<g id=\"edge4\" class=\"edge\">\n", |
| "<title>selected_companies->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M509.72,-49.65C537.4,-54.81 568.81,-60.67 599.89,-66.47\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"599.49,-69.96 609.96,-68.35 600.77,-63.08 599.49,-69.96\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies->augmented_company_info -->\n", |
| "<g id=\"edge8\" class=\"edge\">\n", |
| "<title>selected_companies->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M509.69,-27.11C676.19,-17.81 1045.13,-5.24 1351,-52 1361.91,-53.67 1373.17,-55.95 1384.34,-58.58\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1383.58,-62 1394.13,-60.98 1385.25,-55.2 1383.58,-62\"/>\n", |
| "</g>\n", |
| "<!-- n_company_by_funding_stage -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>n_company_by_funding_stage</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M533.5,-211C533.5,-211 305.5,-211 305.5,-211 299.5,-211 293.5,-205 293.5,-199 293.5,-199 293.5,-159 293.5,-159 293.5,-153 299.5,-147 305.5,-147 305.5,-147 533.5,-147 533.5,-147 539.5,-147 545.5,-153 545.5,-159 545.5,-159 545.5,-199 545.5,-199 545.5,-205 539.5,-211 533.5,-211\"/>\n", |
| "<text text-anchor=\"start\" x=\"304.5\" y=\"-189.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">n_company_by_funding_stage</text>\n", |
| "<text text-anchor=\"start\" x=\"381\" y=\"-161.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M919.07,-95.4C925.14,-95.33 931.26,-95.27 937.41,-95.2\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"937.81,-98.7 947.77,-95.1 937.73,-91.7 937.81,-98.7\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1321.5,-207C1321.5,-207 977.5,-207 977.5,-207 971.5,-207 965.5,-201 965.5,-195 965.5,-195 965.5,-155 965.5,-155 965.5,-149 971.5,-143 977.5,-143 977.5,-143 1321.5,-143 1321.5,-143 1327.5,-143 1333.5,-149 1333.5,-155 1333.5,-155 1333.5,-195 1333.5,-195 1333.5,-201 1327.5,-207 1321.5,-207\"/>\n", |
| "<text text-anchor=\"start\" x=\"976.5\" y=\"-185.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1111\" y=\"-157.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge6\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M919.07,-128.28C939.56,-132.45 960.64,-136.74 981.34,-140.96\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"980.64,-144.39 991.14,-142.95 982.04,-137.53 980.64,-144.39\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge10\" class=\"edge\">\n", |
| "<title>stock_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1312.98,-142.94C1325.87,-140.04 1338.67,-137.05 1351,-134 1359,-132.02 1367.24,-129.89 1375.51,-127.67\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1376.5,-131.03 1385.24,-125.03 1374.67,-124.27 1376.5,-131.03\"/>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs -->\n", |
| "<g id=\"node7\" class=\"node\">\n", |
| "<title>_selected_companies_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"229,-66 0,-66 0,0 229,0 229,-66\"/>\n", |
| "<text text-anchor=\"start\" x=\"15.5\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">rounds_selection</text>\n", |
| "<text text-anchor=\"start\" x=\"166\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">list</text>\n", |
| "<text text-anchor=\"start\" x=\"25.5\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"138.5\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs->selected_companies -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>_selected_companies_inputs->selected_companies</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M229.19,-33C258.78,-33 290.43,-33 319.21,-33\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"319.27,-36.5 329.27,-33 319.27,-29.5 319.27,-36.5\"/>\n", |
| "</g>\n", |
| "<!-- _n_company_by_funding_stage_inputs -->\n", |
| "<g id=\"node8\" class=\"node\">\n", |
| "<title>_n_company_by_funding_stage_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"219,-201.5 10,-201.5 10,-156.5 219,-156.5 219,-201.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"25.5\" y=\"-174.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"128.5\" y=\"-174.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- _n_company_by_funding_stage_inputs->n_company_by_funding_stage -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>_n_company_by_funding_stage_inputs->n_company_by_funding_stage</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M219.35,-179C239.91,-179 261.77,-179 283.19,-179\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"283.33,-182.5 293.33,-179 283.33,-175.5 283.33,-182.5\"/>\n", |
| "</g>\n", |
| "<!-- _employees_since_last_funding_round_inputs -->\n", |
| "<g id=\"node9\" class=\"node\">\n", |
| "<title>_employees_since_last_funding_round_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"581,-128.5 258,-128.5 258,-83.5 581,-83.5 581,-128.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"273.5\" y=\"-101.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">employee_count_by_month_df</text>\n", |
| "<text text-anchor=\"start\" x=\"490.5\" y=\"-101.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- _employees_since_last_funding_round_inputs->employees_since_last_funding_round -->\n", |
| "<g id=\"edge5\" class=\"edge\">\n", |
| "<title>_employees_since_last_funding_round_inputs->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M581.2,-101.78C587.34,-101.62 593.5,-101.46 599.64,-101.3\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"600.06,-104.79 609.96,-101.03 599.87,-97.79 600.06,-104.79\"/>\n", |
| "</g>\n", |
| "<!-- _stock_growth_rate_since_last_funding_round_inputs -->\n", |
| "<g id=\"node10\" class=\"node\">\n", |
| "<title>_stock_growth_rate_since_last_funding_round_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"858,-197.5 671,-197.5 671,-152.5 858,-152.5 858,-197.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"686.5\" y=\"-170.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">stock_data</text>\n", |
| "<text text-anchor=\"start\" x=\"767.5\" y=\"-170.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- _stock_growth_rate_since_last_funding_round_inputs->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge7\" class=\"edge\">\n", |
| "<title>_stock_growth_rate_since_last_funding_round_inputs->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M858.21,-175C887.73,-175 921.56,-175 955.46,-175\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"955.5,-178.5 965.5,-175 955.5,-171.5 955.5,-178.5\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node11\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"144,-311.5 85,-311.5 85,-274.5 144,-274.5 144,-311.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"114.5\" y=\"-289.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node12\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M142.5,-256.5C142.5,-256.5 86.5,-256.5 86.5,-256.5 80.5,-256.5 74.5,-250.5 74.5,-244.5 74.5,-244.5 74.5,-231.5 74.5,-231.5 74.5,-225.5 80.5,-219.5 86.5,-219.5 86.5,-219.5 142.5,-219.5 142.5,-219.5 148.5,-219.5 154.5,-225.5 154.5,-231.5 154.5,-231.5 154.5,-244.5 154.5,-244.5 154.5,-250.5 148.5,-256.5 142.5,-256.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"114.5\" y=\"-234.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7f6a0a5e7be0>" |
| ] |
| }, |
| "execution_count": 5, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "%%cell_to_module -m analytics -d\n", |
| "\n", |
| "import pandas as pd\n", |
| "\n", |
| "\n", |
| "def n_company_by_funding_stage(company_info: pd.DataFrame) -> pd.DataFrame:\n", |
| "    \"\"\"Get the number of companies per funding stage\"\"\"\n", |
| "    return (\n", |
| "        company_info\n", |
| "        .groupby(\"latest_funding_stage\")[\"latest_funding_stage\"]\n", |
| "        .value_counts()\n", |
| "        .sort_values(ascending=False)\n", |
| "    )\n", |
| "\n", |
| "\n", |
| "def selected_companies(company_info: pd.DataFrame, rounds_selection: list[str]) -> pd.DataFrame:\n", |
| "    \"\"\"Companies with `latest_funding_stage` included in `rounds_selection`\"\"\"\n", |
| "    return company_info.loc[company_info.latest_funding_stage.isin(rounds_selection)]\n", |
| "\n", |
| "\n", |
| "def employees_since_last_funding_round(\n", |
| "    employee_count_by_month_df: pd.DataFrame,\n", |
| "    selected_companies: pd.DataFrame,\n", |
| ") -> pd.DataFrame:\n", |
| "    \"\"\"Select employee count data since the last funding round\"\"\"\n", |
| "    employee_count_by_month_df = employee_count_by_month_df.loc[\n", |
| "        employee_count_by_month_df.ticker.isin(selected_companies.ticker)\n", |
| "    ]\n", |
| "    df = pd.merge(\n", |
| "        left=employee_count_by_month_df,\n", |
| "        right=selected_companies[[\"ticker\", \"last_funding_date\"]],\n", |
| "        on=\"ticker\",\n", |
| "        how=\"left\",\n", |
| "    )\n", |
| "    return df.loc[df.year_month > df.last_funding_date]\n", |
| "\n", |
| "\n", |
| "def _growth_rate(group):\n", |
| "    \"\"\"Aggregation for growth rate, (last - first) / first; data needs to be sorted\"\"\"\n", |
| "    return (group.iloc[-1] - group.iloc[0]) / group.iloc[0]\n", |
| "\n", |
| "\n", |
| "def employee_growth_rate_since_last_funding_round(\n", |
| "    employees_since_last_funding_round: pd.DataFrame,\n", |
| ") -> pd.DataFrame:\n", |
| "    \"\"\"Employee count growth rate since last funding round\"\"\"\n", |
| "    return (\n", |
| "        employees_since_last_funding_round\n", |
| "        .sort_values(by=\"year_month\", ascending=True)\n", |
| "        .groupby(\"ticker\")[\"employee_count\"]\n", |
| "        .aggregate(_growth_rate)\n", |
| "        .sort_values(ascending=False)\n", |
| "        .reset_index()\n", |
| "        .rename(columns={\"employee_count\": \"employee_growth\"})\n", |
| "    )\n", |
| "\n", |
| "\n", |
| "def stock_growth_rate_since_last_funding_round(\n", |
| "    stock_data: pd.DataFrame,\n", |
| "    employees_since_last_funding_round: pd.DataFrame,\n", |
| ") -> pd.DataFrame:\n", |
| "    \"\"\"Stock price growth rate since the last funding round.\n", |
| "    The growth rate is None for tickers with no stock history or no prices in the window.\n", |
| "\n", |
| "    NOTE: We use the minimum date from the employee count history instead of the true\n", |
| "    funding round date to ensure both growth rates cover the same period.\n", |
| "    \"\"\"\n", |
| "    period_start = (\n", |
| "        employees_since_last_funding_round\n", |
| "        .groupby(\"ticker\")[\"year_month\"]\n", |
| "        .min()\n", |
| "        .reset_index()\n", |
| "    )\n", |
| "    df = pd.merge(left=stock_data, right=period_start, on=\"ticker\", how=\"inner\")\n", |
| "\n", |
| "    stock_growth = dict()\n", |
| "    for _, row in df.iterrows():\n", |
| "        history = pd.json_normalize(row[\"historical_price\"])\n", |
| "\n", |
| "        # skip ticker if history is empty; checked before the dtype cast because\n", |
| "        # an empty history has no `date` column\n", |
| "        if history.empty:\n", |
| "            stock_growth[row.ticker] = None\n", |
| "            continue\n", |
| "\n", |
| "        # parse dates and sort chronologically, as `_growth_rate` expects sorted data\n", |
| "        history = history.astype({\"date\": \"datetime64[ns]\"}).sort_values(by=\"date\")\n", |
| "        window = history[history.date > row.year_month]\n", |
| "\n", |
| "        # skip ticker if window is empty\n", |
| "        if window.empty:\n", |
| "            stock_growth[row.ticker] = None\n", |
| "            continue\n", |
| "\n", |
| "        stock_growth[row.ticker] = _growth_rate(window[\"close\"])\n", |
| "\n", |
| "    return (\n", |
| "        pd.DataFrame.from_dict(stock_growth, orient=\"index\")\n", |
| "        .reset_index()\n", |
| "        .rename(columns={\"index\": \"ticker\", 0: \"stock_growth\"})\n", |
| "    )\n", |
| "\n", |
| "\n", |
| "def augmented_company_info(\n", |
| "    selected_companies: pd.DataFrame,\n", |
| "    employee_growth_rate_since_last_funding_round: pd.DataFrame,\n", |
| "    stock_growth_rate_since_last_funding_round: pd.DataFrame,\n", |
| ") -> pd.DataFrame:\n", |
| "    \"\"\"Merge employee count and stock growth with company info\"\"\"\n", |
| "    df = pd.merge(\n", |
| "        selected_companies,\n", |
| "        employee_growth_rate_since_last_funding_round,\n", |
| "        on=\"ticker\",\n", |
| "        how=\"left\",\n", |
| "    )\n", |
| "    df = pd.merge(\n", |
| "        df,\n", |
| "        stock_growth_rate_since_last_funding_round,\n", |
| "        on=\"ticker\",\n", |
| "        how=\"left\",\n", |
| "    )\n", |
| "    return df" |
| ] |
| }, |
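| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "The helper `_growth_rate` computes `(last - first) / first`, so its result depends on row order. The standalone sketch below uses made-up tickers (`AAA`, `BBB`) and made-up employee counts, not PDL data, to mirror the sort-then-aggregate pattern of `employee_growth_rate_since_last_funding_round` and to show what changes when the sort is skipped.\n", |
| "\n", |
| "```python\n", |
| "import pandas as pd\n", |
| "\n", |
| "\n", |
| "def _growth_rate(group: pd.Series) -> float:\n", |
| "    \"\"\"(last - first) / first; assumes `group` is sorted chronologically\"\"\"\n", |
| "    return (group.iloc[-1] - group.iloc[0]) / group.iloc[0]\n", |
| "\n", |
| "\n", |
| "# Hypothetical monthly employee counts, deliberately out of chronological order\n", |
| "employees = pd.DataFrame({\n", |
| "    \"ticker\": [\"AAA\", \"AAA\", \"AAA\", \"BBB\", \"BBB\"],\n", |
| "    \"year_month\": pd.to_datetime(\n", |
| "        [\"2023-03-01\", \"2023-01-01\", \"2023-02-01\", \"2023-01-01\", \"2023-02-01\"]\n", |
| "    ),\n", |
| "    \"employee_count\": [150, 100, 120, 200, 180],\n", |
| "})\n", |
| "\n", |
| "# Sort by month first, then aggregate per ticker (same pattern as the analytics module)\n", |
| "sorted_growth = (\n", |
| "    employees\n", |
| "    .sort_values(by=\"year_month\", ascending=True)\n", |
| "    .groupby(\"ticker\")[\"employee_count\"]\n", |
| "    .aggregate(_growth_rate)\n", |
| ")\n", |
| "\n", |
| "# Without the sort, AAA's first and last rows are not its earliest and latest months\n", |
| "unsorted_growth = employees.groupby(\"ticker\")[\"employee_count\"].aggregate(_growth_rate)\n", |
| "\n", |
| "print(sorted_growth)    # AAA -> 0.5 (100 to 150), BBB -> -0.1 (200 to 180)\n", |
| "print(unsorted_growth)  # AAA -> -0.2 (150 to 120), BBB unchanged\n", |
| "```\n", |
| "\n", |
| "Sorting before `groupby` is enough because pandas keeps the original row order within each group." |
| ] |
| }, |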
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Execute the dataflow\n", |
| "We create another Driver that includes the newly defined `analytics` module.\n", |
| "\n", |
| "Notice that `Builder().with_modules()` can receive more than one module. Indeed, the visualization includes nodes from both `data_preparation` and `analytics`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"1838pt\" height=\"369pt\"\n", |
| " viewBox=\"0.00 0.00 1838.00 369.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 365)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-365 1834,-365 1834,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8,-221 8,-353 104,-353 104,-221 8,-221\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-337.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- employee_count_by_month_df -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>employee_count_by_month_df</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M512,-137C512,-137 281,-137 281,-137 275,-137 269,-131 269,-125 269,-125 269,-85 269,-85 269,-79 275,-73 281,-73 281,-73 512,-73 512,-73 518,-73 524,-79 524,-85 524,-85 524,-125 524,-125 524,-131 518,-137 512,-137\"/>\n", |
| "<text text-anchor=\"start\" x=\"280\" y=\"-115.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_count_by_month_df</text>\n", |
| "<text text-anchor=\"start\" x=\"358\" y=\"-87.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>employees_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1131,-147C1131,-147 846,-147 846,-147 840,-147 834,-141 834,-135 834,-135 834,-95 834,-95 834,-89 840,-83 846,-83 846,-83 1131,-83 1131,-83 1137,-83 1143,-89 1143,-95 1143,-95 1143,-135 1143,-135 1143,-141 1137,-147 1131,-147\"/>\n", |
| "<text text-anchor=\"start\" x=\"845\" y=\"-125.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employees_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"950\" y=\"-97.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employee_count_by_month_df->employees_since_last_funding_round -->\n", |
| "<g id=\"edge6\" class=\"edge\">\n", |
| "<title>employee_count_by_month_df->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M524.07,-107.15C611.62,-108.63 728.94,-110.62 823.63,-112.22\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"823.83,-115.73 833.89,-112.4 823.95,-108.73 823.83,-115.73\"/>\n", |
| "</g>\n", |
| "<!-- company_info -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>company_info</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M449.5,-283C449.5,-283 343.5,-283 343.5,-283 337.5,-283 331.5,-277 331.5,-271 331.5,-271 331.5,-231 331.5,-231 331.5,-225 337.5,-219 343.5,-219 343.5,-219 449.5,-219 449.5,-219 455.5,-219 461.5,-225 461.5,-231 461.5,-231 461.5,-271 461.5,-271 461.5,-277 455.5,-283 449.5,-283\"/>\n", |
| "<text text-anchor=\"start\" x=\"342.5\" y=\"-261.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"358\" y=\"-233.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- selected_companies -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>selected_companies</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M757,-207C757,-207 601,-207 601,-207 595,-207 589,-201 589,-195 589,-195 589,-155 589,-155 589,-149 595,-143 601,-143 601,-143 757,-143 757,-143 763,-143 769,-149 769,-155 769,-155 769,-195 769,-195 769,-201 763,-207 757,-207\"/>\n", |
| "<text text-anchor=\"start\" x=\"600\" y=\"-185.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">selected_companies</text>\n", |
| "<text text-anchor=\"start\" x=\"640.5\" y=\"-157.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info->selected_companies -->\n", |
| "<g id=\"edge4\" class=\"edge\">\n", |
| "<title>company_info->selected_companies</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M461.5,-233.66C496.23,-224.25 539.93,-212.41 578.95,-201.84\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"580.03,-205.17 588.76,-199.18 578.2,-198.42 580.03,-205.17\"/>\n", |
| "</g>\n", |
| "<!-- n_company_by_funding_stage -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>n_company_by_funding_stage</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M793,-289C793,-289 565,-289 565,-289 559,-289 553,-283 553,-277 553,-277 553,-237 553,-237 553,-231 559,-225 565,-225 565,-225 793,-225 793,-225 799,-225 805,-231 805,-237 805,-237 805,-277 805,-277 805,-283 799,-289 793,-289\"/>\n", |
| "<text text-anchor=\"start\" x=\"564\" y=\"-267.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">n_company_by_funding_stage</text>\n", |
| "<text text-anchor=\"start\" x=\"640.5\" y=\"-239.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info->n_company_by_funding_stage -->\n", |
| "<g id=\"edge8\" class=\"edge\">\n", |
| "<title>company_info->n_company_by_funding_stage</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M461.5,-252.37C485.8,-252.89 514.49,-253.5 542.89,-254.11\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"542.85,-257.61 552.92,-254.32 543,-250.61 542.85,-257.61\"/>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1563,-147C1563,-147 1184,-147 1184,-147 1178,-147 1172,-141 1172,-135 1172,-135 1172,-95 1172,-95 1172,-89 1178,-83 1184,-83 1184,-83 1563,-83 1563,-83 1569,-83 1575,-89 1575,-95 1575,-95 1575,-135 1575,-135 1575,-141 1569,-147 1563,-147\"/>\n", |
| "<text text-anchor=\"start\" x=\"1183\" y=\"-125.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1335\" y=\"-97.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- augmented_company_info -->\n", |
| "<g id=\"node10\" class=\"node\">\n", |
| "<title>augmented_company_info</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1818,-147C1818,-147 1616,-147 1616,-147 1610,-147 1604,-141 1604,-135 1604,-135 1604,-95 1604,-95 1604,-89 1610,-83 1616,-83 1616,-83 1818,-83 1818,-83 1824,-83 1830,-89 1830,-95 1830,-95 1830,-135 1830,-135 1830,-141 1824,-147 1818,-147\"/>\n", |
| "<text text-anchor=\"start\" x=\"1615\" y=\"-125.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">augmented_company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"1678.5\" y=\"-97.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge14\" class=\"edge\">\n", |
| "<title>employee_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1575.13,-115C1581.36,-115 1587.54,-115 1593.62,-115\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1593.78,-118.5 1603.78,-115 1593.78,-111.5 1593.78,-118.5\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies->employees_since_last_funding_round -->\n", |
| "<g id=\"edge7\" class=\"edge\">\n", |
| "<title>selected_companies->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M769.35,-157.56C786.64,-154.19 805.26,-150.56 824.07,-146.89\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"824.77,-150.32 833.92,-144.97 823.43,-143.45 824.77,-150.32\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies->augmented_company_info -->\n", |
| "<g id=\"edge13\" class=\"edge\">\n", |
| "<title>selected_companies->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M769.27,-180.77C931.03,-189.52 1282.92,-200.77 1575,-156 1585.91,-154.33 1597.17,-152.05 1608.34,-149.41\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1609.25,-152.79 1618.13,-147.01 1607.58,-145.99 1609.25,-152.79\"/>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1143.07,-115C1149.14,-115 1155.26,-115 1161.41,-115\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1161.77,-118.5 1171.77,-115 1161.77,-111.5 1161.77,-118.5\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node9\" class=\"node\">\n", |
| "<title>stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1545.5,-65C1545.5,-65 1201.5,-65 1201.5,-65 1195.5,-65 1189.5,-59 1189.5,-53 1189.5,-53 1189.5,-13 1189.5,-13 1189.5,-7 1195.5,-1 1201.5,-1 1201.5,-1 1545.5,-1 1545.5,-1 1551.5,-1 1557.5,-7 1557.5,-13 1557.5,-13 1557.5,-53 1557.5,-53 1557.5,-59 1551.5,-65 1545.5,-65\"/>\n", |
| "<text text-anchor=\"start\" x=\"1200.5\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1335\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge12\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1130.92,-82.95C1144.79,-79.89 1158.66,-76.86 1172,-74 1182.51,-71.75 1193.33,-69.45 1204.24,-67.16\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1205.22,-70.53 1214.29,-65.06 1203.78,-63.68 1205.22,-70.53\"/>\n", |
| "</g>\n", |
| "<!-- pdl_data -->\n", |
| "<g id=\"node7\" class=\"node\">\n", |
| "<title>pdl_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M228,-210C228,-210 153,-210 153,-210 147,-210 141,-204 141,-198 141,-198 141,-158 141,-158 141,-152 147,-146 153,-146 153,-146 228,-146 228,-146 234,-146 240,-152 240,-158 240,-158 240,-198 240,-198 240,-204 234,-210 228,-210\"/>\n", |
| "<text text-anchor=\"start\" x=\"157\" y=\"-188.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">pdl_data</text>\n", |
| "<text text-anchor=\"start\" x=\"152\" y=\"-160.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- pdl_data->employee_count_by_month_df -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>pdl_data->employee_count_by_month_df</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M240.06,-157.27C249.6,-153.39 259.57,-149.47 269,-146 274.08,-144.13 279.29,-142.26 284.56,-140.41\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"285.81,-143.68 294.11,-137.09 283.51,-137.07 285.81,-143.68\"/>\n", |
| "</g>\n", |
| "<!-- pdl_data->company_info -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>pdl_data->company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M240.11,-197.96C249.65,-201.73 259.61,-205.57 269,-209 285.96,-215.2 304.34,-221.53 321.69,-227.32\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"320.62,-230.65 331.21,-230.48 322.82,-224.01 320.62,-230.65\"/>\n", |
| "</g>\n", |
| "<!-- stock_data -->\n", |
| "<g id=\"node8\" class=\"node\">\n", |
| "<title>stock_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1030,-65C1030,-65 947,-65 947,-65 941,-65 935,-59 935,-53 935,-53 935,-13 935,-13 935,-7 941,-1 947,-1 947,-1 1030,-1 1030,-1 1036,-1 1042,-7 1042,-13 1042,-13 1042,-53 1042,-53 1042,-59 1036,-65 1030,-65\"/>\n", |
| "<text text-anchor=\"start\" x=\"946\" y=\"-43.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_data</text>\n", |
| "<text text-anchor=\"start\" x=\"950\" y=\"-15.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- stock_data->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge11\" class=\"edge\">\n", |
| "<title>stock_data->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1042.35,-33C1078.42,-33 1128.64,-33 1179.04,-33\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1179.12,-36.5 1189.12,-33 1179.12,-29.5 1179.12,-36.5\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge15\" class=\"edge\">\n", |
| "<title>stock_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1536.98,-65.06C1549.87,-67.96 1562.67,-70.95 1575,-74 1583,-75.98 1591.24,-78.11 1599.51,-80.33\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1598.67,-83.73 1609.24,-82.97 1600.5,-76.97 1598.67,-83.73\"/>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs -->\n", |
| "<g id=\"node11\" class=\"node\">\n", |
| "<title>_selected_companies_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"483.5,-200.5 309.5,-200.5 309.5,-155.5 483.5,-155.5 483.5,-200.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"324.5\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">rounds_selection</text>\n", |
| "<text text-anchor=\"start\" x=\"447.5\" y=\"-173.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">list</text>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs->selected_companies -->\n", |
| "<g id=\"edge5\" class=\"edge\">\n", |
| "<title>_selected_companies_inputs->selected_companies</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M483.57,-177.08C513.52,-176.76 547.36,-176.4 578.32,-176.07\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"578.74,-179.56 588.7,-175.96 578.66,-172.56 578.74,-179.56\"/>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs -->\n", |
| "<g id=\"node12\" class=\"node\">\n", |
| "<title>_pdl_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"112,-211 0,-211 0,-145 112,-145 112,-211\"/>\n", |
| "<text text-anchor=\"start\" x=\"18.5\" y=\"-184.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">pdl_file</text>\n", |
| "<text text-anchor=\"start\" x=\"78\" y=\"-184.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "<text text-anchor=\"start\" x=\"15\" y=\"-163.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">data_dir</text>\n", |
| "<text text-anchor=\"start\" x=\"78\" y=\"-163.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs->pdl_data -->\n", |
| "<g id=\"edge9\" class=\"edge\">\n", |
| "<title>_pdl_data_inputs->pdl_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M112.04,-178C118.14,-178 124.38,-178 130.53,-178\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"130.76,-181.5 140.76,-178 130.76,-174.5 130.76,-181.5\"/>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs -->\n", |
| "<g id=\"node13\" class=\"node\">\n", |
| "<title>_stock_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"739,-66 619,-66 619,0 739,0 739,-66\"/>\n", |
| "<text text-anchor=\"start\" x=\"638\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">data_dir</text>\n", |
| "<text text-anchor=\"start\" x=\"705\" y=\"-39.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "<text text-anchor=\"start\" x=\"634\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">stock_file</text>\n", |
| "<text text-anchor=\"start\" x=\"705\" y=\"-18.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs->stock_data -->\n", |
| "<g id=\"edge10\" class=\"edge\">\n", |
| "<title>_stock_data_inputs->stock_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M739.29,-33C792.49,-33 870.34,-33 924.61,-33\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"924.8,-36.5 934.8,-33 924.8,-29.5 924.8,-36.5\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node14\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"85.5,-321.5 26.5,-321.5 26.5,-284.5 85.5,-284.5 85.5,-321.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-299.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node15\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M84,-266.5C84,-266.5 28,-266.5 28,-266.5 22,-266.5 16,-260.5 16,-254.5 16,-254.5 16,-241.5 16,-241.5 16,-235.5 22,-229.5 28,-229.5 28,-229.5 84,-229.5 84,-229.5 90,-229.5 96,-235.5 96,-241.5 96,-241.5 96,-254.5 96,-254.5 96,-260.5 90,-266.5 84,-266.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-244.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<hamilton.driver.Driver at 0x7f6a0a5e78e0>" |
| ] |
| }, |
| "execution_count": 6, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "analytics_driver = (\n", |
| " driver.Builder()\n", |
| " .with_modules(data_preparation, analytics)\n", |
| " .build()\n", |
| ")\n", |
| "analytics_driver" |
| ] |
| }, |
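| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Before running the dataflow, the per-ticker step of `stock_growth_rate_since_last_funding_round` can be checked on its own: expand the `historical_price` records with `pd.json_normalize`, keep prices strictly after the period start, and compute the growth of `close`. The sketch below is a minimal illustration, assuming records with `date` and `close` keys (the columns the analytics module reads); the helper `window_growth` and the prices are hypothetical. The empty-history check has to run before the `date` cast, because normalizing an empty list yields a DataFrame with no `date` column.\n", |
| "\n", |
| "```python\n", |
| "from typing import Optional\n", |
| "\n", |
| "import pandas as pd\n", |
| "\n", |
| "\n", |
| "def window_growth(records: list, period_start: str) -> Optional[float]:\n", |
| "    \"\"\"Hypothetical helper: close-price growth over prices strictly after `period_start`\"\"\"\n", |
| "    history = pd.json_normalize(records)\n", |
| "    # An empty history has no `date` column, so check before casting dtypes\n", |
| "    if history.empty:\n", |
| "        return None\n", |
| "    # Parse dates and sort chronologically, since growth uses the first and last rows\n", |
| "    history = history.astype({\"date\": \"datetime64[ns]\"}).sort_values(by=\"date\")\n", |
| "    window = history[history.date > pd.Timestamp(period_start)]\n", |
| "    if window.empty:\n", |
| "        return None\n", |
| "    close = window[\"close\"]\n", |
| "    return float((close.iloc[-1] - close.iloc[0]) / close.iloc[0])\n", |
| "\n", |
| "\n", |
| "# Made-up prices, deliberately listed newest first\n", |
| "prices = [\n", |
| "    {\"date\": \"2023-03-01\", \"close\": 12.0},\n", |
| "    {\"date\": \"2023-02-01\", \"close\": 11.0},\n", |
| "    {\"date\": \"2023-01-02\", \"close\": 10.0},\n", |
| "    {\"date\": \"2022-12-01\", \"close\": 8.0},\n", |
| "]\n", |
| "\n", |
| "print(window_growth(prices, \"2023-01-01\"))  # 0.2, i.e. (12 - 10) / 10\n", |
| "print(window_growth(prices, \"2023-06-01\"))  # None: no prices after the start\n", |
| "print(window_growth([], \"2023-01-01\"))      # None: empty history\n", |
| "```\n", |
| "\n", |
| "Returning `None` instead of raising keeps tickers without usable prices in the result, so the left merges in `augmented_company_info` carry them as missing values." |
| ] |
| }, |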
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "data": { |
| "image/svg+xml": [ |
| "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", |
| "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", |
| " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", |
| "<!-- Generated by graphviz version 2.43.0 (0)\n", |
| " -->\n", |
| "<!-- Title: %3 Pages: 1 -->\n", |
| "<svg width=\"1835pt\" height=\"412pt\"\n", |
| " viewBox=\"0.00 0.00 1834.50 412.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", |
| "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 408)\">\n", |
| "<title>%3</title>\n", |
| "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-408 1830.5,-408 1830.5,4 -4,4\"/>\n", |
| "<g id=\"clust1\" class=\"cluster\">\n", |
| "<title>cluster__legend</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" points=\"8,-209 8,-396 104,-396 104,-209 8,-209\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-380.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">Legend</text>\n", |
| "</g>\n", |
| "<!-- company_info -->\n", |
| "<g id=\"node1\" class=\"node\">\n", |
| "<title>company_info</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M446,-282C446,-282 340,-282 340,-282 334,-282 328,-276 328,-270 328,-270 328,-230 328,-230 328,-224 334,-218 340,-218 340,-218 446,-218 446,-218 452,-218 458,-224 458,-230 458,-230 458,-270 458,-270 458,-276 452,-282 446,-282\"/>\n", |
| "<text text-anchor=\"start\" x=\"339\" y=\"-260.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"354.5\" y=\"-232.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- selected_companies -->\n", |
| "<g id=\"node4\" class=\"node\">\n", |
| "<title>selected_companies</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M753.5,-206C753.5,-206 597.5,-206 597.5,-206 591.5,-206 585.5,-200 585.5,-194 585.5,-194 585.5,-154 585.5,-154 585.5,-148 591.5,-142 597.5,-142 597.5,-142 753.5,-142 753.5,-142 759.5,-142 765.5,-148 765.5,-154 765.5,-154 765.5,-194 765.5,-194 765.5,-200 759.5,-206 753.5,-206\"/>\n", |
| "<text text-anchor=\"start\" x=\"596.5\" y=\"-184.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">selected_companies</text>\n", |
| "<text text-anchor=\"start\" x=\"637\" y=\"-156.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info->selected_companies -->\n", |
| "<g id=\"edge4\" class=\"edge\">\n", |
| "<title>company_info->selected_companies</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M458,-232.66C492.73,-223.25 536.43,-211.41 575.45,-200.84\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"576.53,-204.17 585.26,-198.18 574.7,-197.42 576.53,-204.17\"/>\n", |
| "</g>\n", |
| "<!-- n_company_by_funding_stage -->\n", |
| "<g id=\"node5\" class=\"node\">\n", |
| "<title>n_company_by_funding_stage</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M789.5,-288C789.5,-288 561.5,-288 561.5,-288 555.5,-288 549.5,-282 549.5,-276 549.5,-276 549.5,-236 549.5,-236 549.5,-230 555.5,-224 561.5,-224 561.5,-224 789.5,-224 789.5,-224 795.5,-224 801.5,-230 801.5,-236 801.5,-236 801.5,-276 801.5,-276 801.5,-282 795.5,-288 789.5,-288\"/>\n", |
| "<text text-anchor=\"start\" x=\"560.5\" y=\"-266.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">n_company_by_funding_stage</text>\n", |
| "<text text-anchor=\"start\" x=\"637\" y=\"-238.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- company_info->n_company_by_funding_stage -->\n", |
| "<g id=\"edge6\" class=\"edge\">\n", |
| "<title>company_info->n_company_by_funding_stage</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M458,-251.37C482.3,-251.89 510.99,-252.5 539.39,-253.11\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"539.35,-256.61 549.42,-253.32 539.5,-249.61 539.35,-256.61\"/>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node2\" class=\"node\">\n", |
| "<title>employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1559.5,-146C1559.5,-146 1180.5,-146 1180.5,-146 1174.5,-146 1168.5,-140 1168.5,-134 1168.5,-134 1168.5,-94 1168.5,-94 1168.5,-88 1174.5,-82 1180.5,-82 1180.5,-82 1559.5,-82 1559.5,-82 1565.5,-82 1571.5,-88 1571.5,-94 1571.5,-94 1571.5,-134 1571.5,-134 1571.5,-140 1565.5,-146 1559.5,-146\"/>\n", |
| "<text text-anchor=\"start\" x=\"1179.5\" y=\"-124.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1331.5\" y=\"-96.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- augmented_company_info -->\n", |
| "<g id=\"node10\" class=\"node\">\n", |
| "<title>augmented_company_info</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M1814.5,-146C1814.5,-146 1612.5,-146 1612.5,-146 1606.5,-146 1600.5,-140 1600.5,-134 1600.5,-134 1600.5,-94 1600.5,-94 1600.5,-88 1606.5,-82 1612.5,-82 1612.5,-82 1814.5,-82 1814.5,-82 1820.5,-82 1826.5,-88 1826.5,-94 1826.5,-94 1826.5,-134 1826.5,-134 1826.5,-140 1820.5,-146 1814.5,-146\"/>\n", |
| "<text text-anchor=\"start\" x=\"1611.5\" y=\"-124.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">augmented_company_info</text>\n", |
| "<text text-anchor=\"start\" x=\"1675\" y=\"-96.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employee_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge14\" class=\"edge\">\n", |
| "<title>employee_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1571.63,-114C1577.86,-114 1584.04,-114 1590.12,-114\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1590.28,-117.5 1600.28,-114 1590.28,-110.5 1590.28,-117.5\"/>\n", |
| "</g>\n", |
| "<!-- employee_count_by_month_df -->\n", |
| "<g id=\"node3\" class=\"node\">\n", |
| "<title>employee_count_by_month_df</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M508.5,-136C508.5,-136 277.5,-136 277.5,-136 271.5,-136 265.5,-130 265.5,-124 265.5,-124 265.5,-84 265.5,-84 265.5,-78 271.5,-72 277.5,-72 277.5,-72 508.5,-72 508.5,-72 514.5,-72 520.5,-78 520.5,-84 520.5,-84 520.5,-124 520.5,-124 520.5,-130 514.5,-136 508.5,-136\"/>\n", |
| "<text text-anchor=\"start\" x=\"276.5\" y=\"-114.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employee_count_by_month_df</text>\n", |
| "<text text-anchor=\"start\" x=\"354.5\" y=\"-86.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round -->\n", |
| "<g id=\"node6\" class=\"node\">\n", |
| "<title>employees_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1127.5,-146C1127.5,-146 842.5,-146 842.5,-146 836.5,-146 830.5,-140 830.5,-134 830.5,-134 830.5,-94 830.5,-94 830.5,-88 836.5,-82 842.5,-82 842.5,-82 1127.5,-82 1127.5,-82 1133.5,-82 1139.5,-88 1139.5,-94 1139.5,-94 1139.5,-134 1139.5,-134 1139.5,-140 1133.5,-146 1127.5,-146\"/>\n", |
| "<text text-anchor=\"start\" x=\"841.5\" y=\"-124.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">employees_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"946.5\" y=\"-96.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employee_count_by_month_df->employees_since_last_funding_round -->\n", |
| "<g id=\"edge7\" class=\"edge\">\n", |
| "<title>employee_count_by_month_df->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M520.57,-106.15C608.12,-107.63 725.44,-109.62 820.13,-111.22\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"820.33,-114.73 830.39,-111.4 820.45,-107.73 820.33,-114.73\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies->employees_since_last_funding_round -->\n", |
| "<g id=\"edge8\" class=\"edge\">\n", |
| "<title>selected_companies->employees_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M765.85,-156.56C783.14,-153.19 801.76,-149.56 820.57,-145.89\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"821.27,-149.32 830.42,-143.97 819.93,-142.45 821.27,-149.32\"/>\n", |
| "</g>\n", |
| "<!-- selected_companies->augmented_company_info -->\n", |
| "<g id=\"edge13\" class=\"edge\">\n", |
| "<title>selected_companies->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M765.77,-179.77C927.53,-188.52 1279.42,-199.77 1571.5,-155 1582.41,-153.33 1593.67,-151.05 1604.84,-148.41\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1605.75,-151.79 1614.63,-146.01 1604.08,-144.99 1605.75,-151.79\"/>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->employee_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge2\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->employee_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1139.57,-114C1145.64,-114 1151.76,-114 1157.91,-114\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1158.27,-117.5 1168.27,-114 1158.27,-110.5 1158.27,-117.5\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"node9\" class=\"node\">\n", |
| "<title>stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1542,-64C1542,-64 1198,-64 1198,-64 1192,-64 1186,-58 1186,-52 1186,-52 1186,-12 1186,-12 1186,-6 1192,0 1198,0 1198,0 1542,0 1542,0 1548,0 1554,-6 1554,-12 1554,-12 1554,-52 1554,-52 1554,-58 1548,-64 1542,-64\"/>\n", |
| "<text text-anchor=\"start\" x=\"1197\" y=\"-42.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_growth_rate_since_last_funding_round</text>\n", |
| "<text text-anchor=\"start\" x=\"1331.5\" y=\"-14.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- employees_since_last_funding_round->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge12\" class=\"edge\">\n", |
| "<title>employees_since_last_funding_round->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1127.42,-81.95C1141.29,-78.89 1155.16,-75.86 1168.5,-73 1179.01,-70.75 1189.83,-68.45 1200.74,-66.16\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1201.72,-69.53 1210.79,-64.06 1200.28,-62.68 1201.72,-69.53\"/>\n", |
| "</g>\n", |
| "<!-- stock_data -->\n", |
| "<g id=\"node7\" class=\"node\">\n", |
| "<title>stock_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M1026.5,-64C1026.5,-64 943.5,-64 943.5,-64 937.5,-64 931.5,-58 931.5,-52 931.5,-52 931.5,-12 931.5,-12 931.5,-6 937.5,0 943.5,0 943.5,0 1026.5,0 1026.5,0 1032.5,0 1038.5,-6 1038.5,-12 1038.5,-12 1038.5,-52 1038.5,-52 1038.5,-58 1032.5,-64 1026.5,-64\"/>\n", |
| "<text text-anchor=\"start\" x=\"942.5\" y=\"-42.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">stock_data</text>\n", |
| "<text text-anchor=\"start\" x=\"946.5\" y=\"-14.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- stock_data->stock_growth_rate_since_last_funding_round -->\n", |
| "<g id=\"edge11\" class=\"edge\">\n", |
| "<title>stock_data->stock_growth_rate_since_last_funding_round</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1038.85,-32C1074.92,-32 1125.14,-32 1175.54,-32\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1175.62,-35.5 1185.62,-32 1175.62,-28.5 1175.62,-35.5\"/>\n", |
| "</g>\n", |
| "<!-- pdl_data -->\n", |
| "<g id=\"node8\" class=\"node\">\n", |
| "<title>pdl_data</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M224.5,-209C224.5,-209 149.5,-209 149.5,-209 143.5,-209 137.5,-203 137.5,-197 137.5,-197 137.5,-157 137.5,-157 137.5,-151 143.5,-145 149.5,-145 149.5,-145 224.5,-145 224.5,-145 230.5,-145 236.5,-151 236.5,-157 236.5,-157 236.5,-197 236.5,-197 236.5,-203 230.5,-209 224.5,-209\"/>\n", |
| "<text text-anchor=\"start\" x=\"153.5\" y=\"-187.8\" font-family=\"Helvetica,sans-Serif\" font-weight=\"bold\" font-size=\"14.00\">pdl_data</text>\n", |
| "<text text-anchor=\"start\" x=\"148.5\" y=\"-159.8\" font-family=\"Helvetica,sans-Serif\" font-style=\"italic\" font-size=\"14.00\">DataFrame</text>\n", |
| "</g>\n", |
| "<!-- pdl_data->company_info -->\n", |
| "<g id=\"edge1\" class=\"edge\">\n", |
| "<title>pdl_data->company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M236.56,-197.73C246.1,-201.61 256.07,-205.53 265.5,-209 282.5,-215.25 300.97,-221.53 318.4,-227.22\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"317.37,-230.56 327.96,-230.32 319.53,-223.9 317.37,-230.56\"/>\n", |
| "</g>\n", |
| "<!-- pdl_data->employee_count_by_month_df -->\n", |
| "<g id=\"edge3\" class=\"edge\">\n", |
| "<title>pdl_data->employee_count_by_month_df</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M236.56,-156.27C246.1,-152.39 256.07,-148.47 265.5,-145 270.58,-143.13 275.79,-141.26 281.06,-139.41\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"282.31,-142.68 290.61,-136.09 280.01,-136.07 282.31,-142.68\"/>\n", |
| "</g>\n", |
| "<!-- stock_growth_rate_since_last_funding_round->augmented_company_info -->\n", |
| "<g id=\"edge15\" class=\"edge\">\n", |
| "<title>stock_growth_rate_since_last_funding_round->augmented_company_info</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M1533.48,-64.06C1546.37,-66.96 1559.17,-69.95 1571.5,-73 1579.5,-74.98 1587.74,-77.11 1596.01,-79.33\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"1595.17,-82.73 1605.74,-81.97 1597,-75.97 1595.17,-82.73\"/>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs -->\n", |
| "<g id=\"node11\" class=\"node\">\n", |
| "<title>_selected_companies_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"480,-199.5 306,-199.5 306,-154.5 480,-154.5 480,-199.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"321\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">rounds_selection</text>\n", |
| "<text text-anchor=\"start\" x=\"444\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">list</text>\n", |
| "</g>\n", |
| "<!-- _selected_companies_inputs->selected_companies -->\n", |
| "<g id=\"edge5\" class=\"edge\">\n", |
| "<title>_selected_companies_inputs->selected_companies</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M480.07,-176.08C510.02,-175.76 543.86,-175.4 574.82,-175.07\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"575.24,-178.56 585.2,-174.96 575.16,-171.56 575.24,-178.56\"/>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs -->\n", |
| "<g id=\"node12\" class=\"node\">\n", |
| "<title>_stock_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"735.5,-54.5 615.5,-54.5 615.5,-9.5 735.5,-9.5 735.5,-54.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"630.5\" y=\"-27.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">stock_file</text>\n", |
| "<text text-anchor=\"start\" x=\"701.5\" y=\"-27.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _stock_data_inputs->stock_data -->\n", |
| "<g id=\"edge9\" class=\"edge\">\n", |
| "<title>_stock_data_inputs->stock_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M735.79,-32C788.99,-32 866.84,-32 921.11,-32\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"921.3,-35.5 931.3,-32 921.3,-28.5 921.3,-35.5\"/>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs -->\n", |
| "<g id=\"node13\" class=\"node\">\n", |
| "<title>_pdl_data_inputs</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"108.5,-199.5 3.5,-199.5 3.5,-154.5 108.5,-154.5 108.5,-199.5\"/>\n", |
| "<text text-anchor=\"start\" x=\"19\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">pdl_file</text>\n", |
| "<text text-anchor=\"start\" x=\"75\" y=\"-172.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">str</text>\n", |
| "</g>\n", |
| "<!-- _pdl_data_inputs->pdl_data -->\n", |
| "<g id=\"edge10\" class=\"edge\">\n", |
| "<title>_pdl_data_inputs->pdl_data</title>\n", |
| "<path fill=\"none\" stroke=\"black\" d=\"M108.73,-177C114.74,-177 120.91,-177 127.02,-177\"/>\n", |
| "<polygon fill=\"black\" stroke=\"black\" points=\"127.19,-180.5 137.19,-177 127.19,-173.5 127.19,-180.5\"/>\n", |
| "</g>\n", |
| "<!-- input -->\n", |
| "<g id=\"node14\" class=\"node\">\n", |
| "<title>input</title>\n", |
| "<polygon fill=\"#ffffff\" stroke=\"black\" stroke-dasharray=\"5,2\" points=\"85.5,-364.5 26.5,-364.5 26.5,-327.5 85.5,-327.5 85.5,-364.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-342.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">input</text>\n", |
| "</g>\n", |
| "<!-- function -->\n", |
| "<g id=\"node15\" class=\"node\">\n", |
| "<title>function</title>\n", |
| "<path fill=\"#b4d8e4\" stroke=\"black\" d=\"M84,-309.5C84,-309.5 28,-309.5 28,-309.5 22,-309.5 16,-303.5 16,-297.5 16,-297.5 16,-284.5 16,-284.5 16,-278.5 22,-272.5 28,-272.5 28,-272.5 84,-272.5 84,-272.5 90,-272.5 96,-278.5 96,-284.5 96,-284.5 96,-297.5 96,-297.5 96,-303.5 90,-309.5 84,-309.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-287.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">function</text>\n", |
| "</g>\n", |
| "<!-- output -->\n", |
| "<g id=\"node16\" class=\"node\">\n", |
| "<title>output</title>\n", |
| "<path fill=\"#ffc857\" stroke=\"black\" d=\"M78,-254.5C78,-254.5 34,-254.5 34,-254.5 28,-254.5 22,-248.5 22,-242.5 22,-242.5 22,-229.5 22,-229.5 22,-223.5 28,-217.5 34,-217.5 34,-217.5 78,-217.5 78,-217.5 84,-217.5 90,-223.5 90,-229.5 90,-229.5 90,-242.5 90,-242.5 90,-248.5 84,-254.5 78,-254.5\"/>\n", |
| "<text text-anchor=\"middle\" x=\"56\" y=\"-232.3\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">output</text>\n", |
| "</g>\n", |
| "</g>\n", |
| "</svg>\n" |
| ], |
| "text/plain": [ |
| "<graphviz.graphs.Digraph at 0x7f6a0a5e6b90>" |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| }, |
| { |
| "data": { |
| "text/html": [ |
| "<div>\n", |
| "<style scoped>\n", |
| " .dataframe tbody tr th:only-of-type {\n", |
| " vertical-align: middle;\n", |
| " }\n", |
| "\n", |
| " .dataframe tbody tr th {\n", |
| " vertical-align: top;\n", |
| " }\n", |
| "\n", |
| " .dataframe thead th {\n", |
| " text-align: right;\n", |
| " }\n", |
| "</style>\n", |
| "<table border=\"1\" class=\"dataframe\">\n", |
| " <thead>\n", |
| " <tr style=\"text-align: right;\">\n", |
| " <th></th>\n", |
| " <th>id</th>\n", |
| " <th>ticker</th>\n", |
| " <th>website</th>\n", |
| " <th>name</th>\n", |
| " <th>display_name</th>\n", |
| " <th>legal_name</th>\n", |
| " <th>founded</th>\n", |
| " <th>industry</th>\n", |
| " <th>type</th>\n", |
| " <th>summary</th>\n", |
| " <th>total_funding_raised</th>\n", |
| " <th>latest_funding_stage</th>\n", |
| " <th>number_funding_rounds</th>\n", |
| " <th>last_funding_date</th>\n", |
| " <th>inferred_revenue</th>\n", |
| " <th>employee_growth</th>\n", |
| " <th>stock_growth</th>\n", |
| " </tr>\n", |
| " </thead>\n", |
| " <tbody>\n", |
| " <tr>\n", |
| " <th>0</th>\n", |
| " <td>relxtech</td>\n", |
| " <td>RLX</td>\n", |
| " <td>relxtech.com</td>\n", |
| " <td>relx technology</td>\n", |
| " <td>Relx Technology</td>\n", |
| " <td>None</td>\n", |
| " <td>2018.0</td>\n", |
| " <td>consumer electronics</td>\n", |
| " <td>public</td>\n", |
| " <td>-</td>\n", |
| " <td>5.755918e+06</td>\n", |
| " <td>series_c</td>\n", |
| " <td>4.0</td>\n", |
| " <td>2019-08-15</td>\n", |
| " <td>$50M-$100M</td>\n", |
| " <td>0.552632</td>\n", |
| " <td>14.133333</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>1</th>\n", |
| " <td>beike</td>\n", |
| " <td>BEKE</td>\n", |
| " <td>ke.com</td>\n", |
| " <td>贝壳找房ke.com</td>\n", |
| " <td>贝壳找房ke.com</td>\n", |
| " <td>None</td>\n", |
| " <td>2018.0</td>\n", |
| " <td>real estate</td>\n", |
| " <td>public</td>\n", |
| " <td>beiker is a technology-driven new housing serv...</td>\n", |
| " <td>3.602538e+09</td>\n", |
| " <td>series_d</td>\n", |
| " <td>6.0</td>\n", |
| " <td>2020-03-05</td>\n", |
| " <td>$250M-$500M</td>\n", |
| " <td>0.112971</td>\n", |
| " <td>1.653437</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>2</th>\n", |
| " <td>upstart-network</td>\n", |
| " <td>UPST</td>\n", |
| " <td>upstart.com</td>\n", |
| " <td>upstart</td>\n", |
| " <td>Upstart</td>\n", |
| " <td>None</td>\n", |
| " <td>2012.0</td>\n", |
| " <td>financial services</td>\n", |
| " <td>public</td>\n", |
| " <td>founded by ex-googlers, upstart is the first l...</td>\n", |
| " <td>1.440500e+08</td>\n", |
| " <td>series_d</td>\n", |
| " <td>7.0</td>\n", |
| " <td>2019-04-08</td>\n", |
| " <td>$500M-$1B</td>\n", |
| " <td>3.952218</td>\n", |
| " <td>0.220795</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>3</th>\n", |
| " <td>alarm-com</td>\n", |
| " <td>ALRM</td>\n", |
| " <td>alarm.com</td>\n", |
| " <td>alarm.com</td>\n", |
| " <td>Alarm.com</td>\n", |
| " <td>None</td>\n", |
| " <td>2000.0</td>\n", |
| " <td>information technology and services</td>\n", |
| " <td>public</td>\n", |
| " <td>alarm.com is the leading platform for the inte...</td>\n", |
| " <td>1.630000e+08</td>\n", |
| " <td>series_b</td>\n", |
| " <td>2.0</td>\n", |
| " <td>2012-07-24</td>\n", |
| " <td>$250M-$500M</td>\n", |
| " <td>7.906040</td>\n", |
| " <td>-0.154480</td>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <th>4</th>\n", |
| " <td>51talkhq</td>\n", |
| " <td>COE</td>\n", |
| " <td>51talk.com</td>\n", |
| " <td>51talk hq</td>\n", |
| " <td>51Talk HQ</td>\n", |
| " <td>None</td>\n", |
| " <td>2011.0</td>\n", |
| " <td>internet</td>\n", |
| " <td>public</td>\n", |
| " <td>founded in 2011, headquartered in singapore, 5...</td>\n", |
| " <td>6.712601e+07</td>\n", |
| " <td>series_c</td>\n", |
| " <td>5.0</td>\n", |
| " <td>2014-10-23</td>\n", |
| " <td>$100M-$250M</td>\n", |
| " <td>3.103896</td>\n", |
| " <td>2.714286</td>\n", |
| " </tr>\n", |
| " </tbody>\n", |
| "</table>\n", |
| "</div>" |
| ], |
| "text/plain": [ |
| " id ticker website name display_name \\\n", |
| "0 relxtech RLX relxtech.com relx technology Relx Technology \n", |
| "1 beike BEKE ke.com 贝壳找房ke.com 贝壳找房ke.com \n", |
| "2 upstart-network UPST upstart.com upstart Upstart \n", |
| "3 alarm-com ALRM alarm.com alarm.com Alarm.com \n", |
| "4 51talkhq COE 51talk.com 51talk hq 51Talk HQ \n", |
| "\n", |
| " legal_name founded industry type \\\n", |
| "0 None 2018.0 consumer electronics public \n", |
| "1 None 2018.0 real estate public \n", |
| "2 None 2012.0 financial services public \n", |
| "3 None 2000.0 information technology and services public \n", |
| "4 None 2011.0 internet public \n", |
| "\n", |
| " summary total_funding_raised \\\n", |
| "0 - 5.755918e+06 \n", |
| "1 beiker is a technology-driven new housing serv... 3.602538e+09 \n", |
| "2 founded by ex-googlers, upstart is the first l... 1.440500e+08 \n", |
| "3 alarm.com is the leading platform for the inte... 1.630000e+08 \n", |
| "4 founded in 2011, headquartered in singapore, 5... 6.712601e+07 \n", |
| "\n", |
| " latest_funding_stage number_funding_rounds last_funding_date \\\n", |
| "0 series_c 4.0 2019-08-15 \n", |
| "1 series_d 6.0 2020-03-05 \n", |
| "2 series_d 7.0 2019-04-08 \n", |
| "3 series_b 2.0 2012-07-24 \n", |
| "4 series_c 5.0 2014-10-23 \n", |
| "\n", |
| " inferred_revenue employee_growth stock_growth \n", |
| "0 $50M-$100M 0.552632 14.133333 \n", |
| "1 $250M-$500M 0.112971 1.653437 \n", |
| "2 $500M-$1B 3.952218 0.220795 \n", |
| "3 $250M-$500M 7.906040 -0.154480 \n", |
| "4 $100M-$250M 3.103896 2.714286 " |
| ] |
| }, |
| "metadata": {}, |
| "output_type": "display_data" |
| } |
| ], |
| "source": [ |
| "# Values for the dataflow's inputs (dashed boxes in the graph)\n", |
| "inputs = dict(\n", |
| " pdl_file=\"pdl_data.json\",\n", |
| " stock_file=\"stock_data.json\",\n", |
| " rounds_selection=[\"series_a\", \"series_b\", \"series_c\", \"series_d\"]\n", |
| ")\n", |
| "\n", |
| "# Requested outputs; Hamilton only runs the functions they depend on\n", |
| "final_vars = [\n", |
| " \"n_company_by_funding_stage\",\n", |
| " \"augmented_company_info\",\n", |
| "]\n", |
| "\n", |
| "results = analytics_driver.execute(final_vars, inputs=inputs)\n", |
| "\n", |
| "# Show the executed dataflow and a preview of the augmented company table\n", |
| "display(\n", |
| " analytics_driver.visualize_execution(final_vars, inputs=inputs),\n", |
| " results[\"augmented_company_info\"].head(),\n", |
| ")" |
| ] |
| }, |
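| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "The second goal of this notebook is to explore the relationship between employee count and stock growth. A minimal sketch of a first step, not part of the original analysis: compute Spearman's rank correlation between the `employee_growth` and `stock_growth` columns of `augmented_company_info`. The function name `growth_rank_correlation` is hypothetical, and the five preview rows above are hard-coded only so the snippet runs on its own. Spearman's rho is computed here as the Pearson correlation of the ranks, which needs only pandas.\n", |
| "\n", |
| "```python\n", |
| "import pandas as pd\n", |
| "\n", |
| "\n", |
| "def growth_rank_correlation(df: pd.DataFrame) -> float:\n", |
| "    \"\"\"Spearman's rho between employee growth and stock growth (hypothetical helper).\"\"\"\n", |
| "    # Rank each column (ties get their average rank), then take the\n", |
| "    # Pearson correlation of the ranks: this equals Spearman's rho.\n", |
| "    ranks = df[[\"employee_growth\", \"stock_growth\"]].rank()\n", |
| "    return float(ranks[\"employee_growth\"].corr(ranks[\"stock_growth\"]))\n", |
| "\n", |
| "\n", |
| "# Growth columns of the five preview rows shown above\n", |
| "preview = pd.DataFrame(\n", |
| "    {\n", |
| "        \"id\": [\"relxtech\", \"beike\", \"upstart-network\", \"alarm-com\", \"51talkhq\"],\n", |
| "        \"employee_growth\": [0.552632, 0.112971, 3.952218, 7.906040, 3.103896],\n", |
| "        \"stock_growth\": [14.133333, 1.653437, 0.220795, -0.154480, 2.714286],\n", |
| "    }\n", |
| ")\n", |
| "\n", |
| "rho = growth_rank_correlation(preview)\n", |
| "print(f\"Spearman's rho on the preview rows: {rho:.2f}\")  # -0.70\n", |
| "```\n", |
| "\n", |
| "With the full data, call `growth_rank_correlation(results[\"augmented_company_info\"])` instead. Five rows are far too few to support any conclusion; the point here is only the mechanics." |
| ] |
| }, |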
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# Conclusion\n", |
| "Congrats! You have completed the introduction to People Data Labs + Hamilton!\n", |
| "\n", |
| "You now know the basics of Hamilton and how it helps you define data transformations as Python functions. If you haven't already, read the repository's README to learn how to use Hamilton outside of notebooks." |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "### Resources\n", |
| "- [PDL Blog](https://blog.peopledatalabs.com/) and [PDL Recipes](https://docs.peopledatalabs.com/recipes)\n", |
| "- [Interactive Hamilton training](https://www.tryhamilton.dev/hamilton-basics/jumping-in)\n", |
| "- [Hamilton documentation](https://hamilton.dagworks.io/en/latest/concepts/node/)\n", |
| "- More [Hamilton code examples](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples) and integrations with other Python tools" |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": "venv", |
| "language": "python", |
| "name": "python3" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 3 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython3", |
| "version": "3.10.9" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 2 |
| } |