<!DOCTYPE html>
<html lang="en" data-content_root="../">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Data Sources &#8212; Apache Arrow DataFusion documentation</title>
<link href="../_static/styles/theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
<link href="../_static/styles/pydata-sphinx-theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
<link rel="stylesheet"
href="../_static/vendor/fontawesome/5.13.0/css/all.min.css">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=8f2a1f02" />
<link rel="stylesheet" type="text/css" href="../_static/styles/pydata-sphinx-theme.css?v=1140d252" />
<link rel="stylesheet" type="text/css" href="../_static/graphviz.css?v=4ae1632d" />
<link rel="stylesheet" type="text/css" href="../_static/theme_overrides.css?v=dca7052a" />
<link rel="preload" as="script" href="../_static/scripts/pydata-sphinx-theme.js?digest=1999514e3f237ded88cf">
<script src="../_static/documentation_options.js?v=8a448e45"></script>
<script src="../_static/doctools.js?v=9bcbadda"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="DataFrames" href="dataframe/index.html" />
<link rel="prev" title="Concepts" href="basics.html" />
<meta name="docsearch:language" content="en">
<!-- Google Analytics -->
</head>
<body data-spy="scroll" data-target="#bd-toc-nav" data-offset="80">
<div class="container-fluid" id="banner"></div>
<div class="container-xl">
<div class="row">
<!-- Only show if we have sidebars configured, else just a small margin -->
<div class="col-12 col-md-3 bd-sidebar">
<div class="sidebar-start-items">
<a class="navbar-brand" href="../index.html">
<img src="../_static/images/2x_bgwhite_original.png" class="logo" alt="logo">
</a>
<form class="bd-search d-flex align-items-center" action="../search.html" method="get">
<i class="icon fas fa-search"></i>
<input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form>
<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
<div class="bd-toc-item active">
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
LINKS
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference external" href="https://github.com/apache/datafusion-python">
Github and Issue Tracker
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/">
Rust's API Docs
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md">
Code of conduct
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://github.com/apache/datafusion-python/tree/main/examples">
Examples
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
USER GUIDE
</span>
</p>
<ul class="current nav bd-sidenav">
<li class="toctree-l1">
<a class="reference internal" href="introduction.html">
Introduction
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="basics.html">
Concepts
</a>
</li>
<li class="toctree-l1 current active">
<a class="current reference internal" href="#">
Data Sources
</a>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="dataframe/index.html">
DataFrames
</a>
<input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" type="checkbox"/>
<label for="toctree-checkbox-1">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="dataframe/rendering.html">
HTML Rendering in Jupyter
</a>
</li>
</ul>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="common-operations/index.html">
Common Operations
</a>
<input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" type="checkbox"/>
<label for="toctree-checkbox-2">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/views.html">
Registering Views
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/basic-info.html">
Basic Operations
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/select-and-filter.html">
Column Selections
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/expressions.html">
Expressions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/joins.html">
Joins
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/functions.html">
Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/aggregations.html">
Aggregation
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/windows.html">
Window Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="common-operations/udf-and-udfa.html">
User-Defined Functions
</a>
</li>
</ul>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="io/index.html">
IO
</a>
<input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/>
<label for="toctree-checkbox-3">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="io/arrow.html">
Arrow
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="io/avro.html">
Avro
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="io/csv.html">
CSV
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="io/json.html">
JSON
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="io/parquet.html">
Parquet
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="io/table_provider.html">
Custom Table Provider
</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="reference internal" href="configuration.html">
Configuration
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="sql.html">
SQL
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
CONTRIBUTOR GUIDE
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/introduction.html">
Introduction
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/ffi.html">
Python Extensions
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
API
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1 has-children">
<a class="reference internal" href="../autoapi/index.html">
API Reference
</a>
<input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/>
<label for="toctree-checkbox-4">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2 has-children">
<a class="reference internal" href="../autoapi/datafusion/index.html">
datafusion
</a>
<input class="toctree-checkbox" id="toctree-checkbox-5" name="toctree-checkbox-5" type="checkbox"/>
<label for="toctree-checkbox-5">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/catalog/index.html">
datafusion.catalog
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/context/index.html">
datafusion.context
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/dataframe/index.html">
datafusion.dataframe
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/dataframe_formatter/index.html">
datafusion.dataframe_formatter
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/expr/index.html">
datafusion.expr
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/functions/index.html">
datafusion.functions
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/html_formatter/index.html">
datafusion.html_formatter
</a>
</li>
<li class="toctree-l3 has-children">
<a class="reference internal" href="../autoapi/datafusion/input/index.html">
datafusion.input
</a>
<input class="toctree-checkbox" id="toctree-checkbox-6" name="toctree-checkbox-6" type="checkbox"/>
<label for="toctree-checkbox-6">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l4">
<a class="reference internal" href="../autoapi/datafusion/input/base/index.html">
datafusion.input.base
</a>
</li>
<li class="toctree-l4">
<a class="reference internal" href="../autoapi/datafusion/input/location/index.html">
datafusion.input.location
</a>
</li>
</ul>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/io/index.html">
datafusion.io
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/object_store/index.html">
datafusion.object_store
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/plan/index.html">
datafusion.plan
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/record_batch/index.html">
datafusion.record_batch
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/substrait/index.html">
datafusion.substrait
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/unparser/index.html">
datafusion.unparser
</a>
</li>
<li class="toctree-l3">
<a class="reference internal" href="../autoapi/datafusion/user_defined/index.html">
datafusion.user_defined
</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</nav>
</div>
<div class="sidebar-end-items">
</div>
</div>
<div class="d-none d-xl-block col-xl-2 bd-toc">
<div class="toc-item">
<div class="tocsection onthispage pt-5 pb-3">
<i class="fas fa-list"></i> On this page
</div>
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h1 nav-item toc-entry">
<a class="reference internal nav-link" href="#">
Data Sources
</a>
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#local-file">
Local file
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#create-in-memory">
Create in-memory
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#object-store">
Object Store
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#other-dataframe-libraries">
Other DataFrame Libraries
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#delta-lake">
Delta Lake
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#apache-iceberg">
Apache Iceberg
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#custom-table-provider">
Custom Table Provider
</a>
</li>
</ul>
</li>
<li class="toc-h1 nav-item toc-entry">
<a class="reference internal nav-link" href="#catalog">
Catalog
</a>
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#user-defined-catalog-and-schema">
User Defined Catalog and Schema
</a>
</li>
</ul>
</li>
</ul>
</nav>
</div>
<div class="toc-item">
</div>
</div>
<main class="col-12 col-md-9 col-xl-7 py-md-5 pl-md-5 pr-md-4 bd-content" role="main">
<div>
<section id="data-sources">
<span id="user-guide-data-sources"></span><h1>Data Sources<a class="headerlink" href="#data-sources" title="Link to this heading"></a></h1>
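<p>DataFusion offers several ways to load data into a DataFrame for further processing: reading local files, building DataFrames from in-memory Python objects, reading from object stores, importing from other DataFrame libraries, and connecting to table formats such as Delta Lake and Apache Iceberg.</p>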
<section id="local-file">
<h2>Local file<a class="headerlink" href="#local-file" title="Link to this heading"></a></h2>
<p>DataFusion can read several popular file formats, including <a class="reference internal" href="io/parquet.html#io-parquet"><span class="std std-ref">Parquet</span></a>,
<a class="reference internal" href="io/csv.html#io-csv"><span class="std std-ref">CSV</span></a>, <a class="reference internal" href="io/json.html#io-json"><span class="std std-ref">JSON</span></a>, and <a class="reference internal" href="io/avro.html#io-avro"><span class="std std-ref">Avro</span></a>.</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="kn">from</span><span class="w"> </span><span class="nn">datafusion</span><span class="w"> </span><span class="kn">import</span> <span class="n">SessionContext</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">ctx</span> <span class="o">=</span> <span class="n">SessionContext</span><span class="p">()</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;pokemon.csv&quot;</span><span class="p">)</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">DataFrame</span><span class="p">()</span>
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+
| #  | Name                      | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+
| 1  | Bulbasaur                 | Grass  | Poison | 318   | 45 | 49     | 49      | 65      | 65      | 45    | 1          | false     |
| 2  | Ivysaur                   | Grass  | Poison | 405   | 60 | 62     | 63      | 80      | 80      | 60    | 1          | false     |
| 3  | Venusaur                  | Grass  | Poison | 525   | 80 | 82     | 83      | 100     | 100     | 80    | 1          | false     |
| 3  | VenusaurMega Venusaur     | Grass  | Poison | 625   | 80 | 100    | 123     | 122     | 120     | 80    | 1          | false     |
| 4  | Charmander                | Fire   |        | 309   | 39 | 52     | 43      | 60      | 50      | 65    | 1          | false     |
| 5  | Charmeleon                | Fire   |        | 405   | 58 | 64     | 58      | 80      | 65      | 80    | 1          | false     |
| 6  | Charizard                 | Fire   | Flying | 534   | 78 | 84     | 78      | 109     | 85      | 100   | 1          | false     |
| 6  | CharizardMega Charizard X | Fire   | Dragon | 634   | 78 | 130    | 111     | 130     | 85      | 100   | 1          | false     |
| 6  | CharizardMega Charizard Y | Fire   | Flying | 634   | 78 | 104    | 78      | 159     | 115     | 100   | 1          | false     |
| 7  | Squirtle                  | Water  |        | 314   | 44 | 48     | 65      | 50      | 64      | 43    | 1          | false     |
| 8  | Wartortle                 | Water  |        | 405   | 59 | 63     | 80      | 65      | 80      | 58    | 1          | false     |
| 9  | Blastoise                 | Water  |        | 530   | 79 | 83     | 100     | 85      | 105     | 78    | 1          | false     |
| 9  | BlastoiseMega Blastoise   | Water  |        | 630   | 79 | 103    | 120     | 135     | 115     | 78    | 1          | false     |
| 10 | Caterpie                  | Bug    |        | 195   | 45 | 30     | 35      | 20      | 20      | 45    | 1          | false     |
| 11 | Metapod                   | Bug    |        | 205   | 50 | 20     | 55      | 25      | 25      | 30    | 1          | false     |
| 12 | Butterfree                | Bug    | Flying | 395   | 60 | 45     | 50      | 90      | 80      | 70    | 1          | false     |
| 13 | Weedle                    | Bug    | Poison | 195   | 40 | 35     | 30      | 20      | 20      | 50    | 1          | false     |
| 14 | Kakuna                    | Bug    | Poison | 205   | 45 | 25     | 50      | 25      | 25      | 35    | 1          | false     |
| 15 | Beedrill                  | Bug    | Poison | 395   | 65 | 90     | 40      | 45      | 80      | 75    | 1          | false     |
| 15 | BeedrillMega Beedrill     | Bug    | Poison | 495   | 65 | 150    | 40      | 15      | 80      | 145   | 1          | false     |
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+
</pre></div>
</div>
</section>
<section id="create-in-memory">
<h2>Create in-memory<a class="headerlink" href="#create-in-memory" title="Link to this heading"></a></h2>
<p>Sometimes it is convenient to create a small DataFrame from a Python list or dictionary.
DataFusion provides three <code class="docutils literal notranslate"><span class="pre">SessionContext</span></code> methods for this:
<a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext.from_pydict" title="datafusion.context.SessionContext.from_pydict"><code class="xref py py-func docutils literal notranslate"><span class="pre">from_pydict()</span></code></a>,
<a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext.from_pylist" title="datafusion.context.SessionContext.from_pylist"><code class="xref py py-func docutils literal notranslate"><span class="pre">from_pylist()</span></code></a>, and
<a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext.create_dataframe" title="datafusion.context.SessionContext.create_dataframe"><code class="xref py py-func docutils literal notranslate"><span class="pre">create_dataframe()</span></code></a>.</p>
<p><code class="docutils literal notranslate"><span class="pre">from_pydict</span></code> builds a DataFrame from a dictionary that maps column names to lists of values, and <code class="docutils literal notranslate"><span class="pre">from_pylist</span></code> builds one from a list of dictionaries, one per row.
<code class="docutils literal notranslate"><span class="pre">create_dataframe</span></code> instead expects a list
of lists of <a class="reference external" href="https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html">PyArrow Record Batches</a>, where each inner list forms one partition.</p>
<p>The following three examples all create identical DataFrames:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="n">In</span> <span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="kn">import</span><span class="w"> </span><span class="nn">pyarrow</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">pa</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">from_pylist</span><span class="p">([</span>
<span class="o">...</span><span class="p">:</span> <span class="p">{</span> <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">:</span> <span class="mf">10.0</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">:</span> <span class="s2">&quot;alpha&quot;</span> <span class="p">},</span>
<span class="o">...</span><span class="p">:</span> <span class="p">{</span> <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">:</span> <span class="mf">20.0</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">:</span> <span class="s2">&quot;beta&quot;</span> <span class="p">},</span>
<span class="o">...</span><span class="p">:</span> <span class="p">{</span> <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">:</span> <span class="mf">30.0</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">:</span> <span class="s2">&quot;gamma&quot;</span> <span class="p">},</span>
<span class="o">...</span><span class="p">:</span> <span class="p">])</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="o">...</span><span class="p">:</span>
<span class="n">DataFrame</span><span class="p">()</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="n">a</span> <span class="o">|</span> <span class="n">b</span> <span class="o">|</span> <span class="n">c</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="mi">1</span> <span class="o">|</span> <span class="mf">10.0</span> <span class="o">|</span> <span class="n">alpha</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">2</span> <span class="o">|</span> <span class="mf">20.0</span> <span class="o">|</span> <span class="n">beta</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">3</span> <span class="o">|</span> <span class="mf">30.0</span> <span class="o">|</span> <span class="n">gamma</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">from_pydict</span><span class="p">({</span>
<span class="o">...</span><span class="p">:</span> <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
<span class="o">...</span><span class="p">:</span> <span class="s2">&quot;b&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mf">10.0</span><span class="p">,</span> <span class="mf">20.0</span><span class="p">,</span> <span class="mf">30.0</span><span class="p">],</span>
<span class="o">...</span><span class="p">:</span> <span class="s2">&quot;c&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;alpha&quot;</span><span class="p">,</span> <span class="s2">&quot;beta&quot;</span><span class="p">,</span> <span class="s2">&quot;gamma&quot;</span><span class="p">],</span>
<span class="o">...</span><span class="p">:</span> <span class="p">})</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="o">...</span><span class="p">:</span>
<span class="n">DataFrame</span><span class="p">()</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="n">a</span> <span class="o">|</span> <span class="n">b</span> <span class="o">|</span> <span class="n">c</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="mi">1</span> <span class="o">|</span> <span class="mf">10.0</span> <span class="o">|</span> <span class="n">alpha</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">2</span> <span class="o">|</span> <span class="mf">20.0</span> <span class="o">|</span> <span class="n">beta</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">3</span> <span class="o">|</span> <span class="mf">30.0</span> <span class="o">|</span> <span class="n">gamma</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">RecordBatch</span><span class="o">.</span><span class="n">from_arrays</span><span class="p">(</span>
<span class="o">...</span><span class="p">:</span> <span class="p">[</span>
<span class="o">...</span><span class="p">:</span> <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]),</span>
<span class="o">...</span><span class="p">:</span> <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">10.0</span><span class="p">,</span> <span class="mf">20.0</span><span class="p">,</span> <span class="mf">30.0</span><span class="p">]),</span>
<span class="o">...</span><span class="p">:</span> <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="s2">&quot;alpha&quot;</span><span class="p">,</span> <span class="s2">&quot;beta&quot;</span><span class="p">,</span> <span class="s2">&quot;gamma&quot;</span><span class="p">]),</span>
<span class="o">...</span><span class="p">:</span> <span class="p">],</span>
<span class="o">...</span><span class="p">:</span> <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">],</span>
<span class="o">...</span><span class="p">:</span> <span class="p">)</span>
<span class="o">...</span><span class="p">:</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">create_dataframe</span><span class="p">([[</span><span class="n">batch</span><span class="p">]])</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">DataFrame</span><span class="p">()</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="n">a</span> <span class="o">|</span> <span class="n">b</span> <span class="o">|</span> <span class="n">c</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
<span class="o">|</span> <span class="mi">1</span> <span class="o">|</span> <span class="mf">10.0</span> <span class="o">|</span> <span class="n">alpha</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">2</span> <span class="o">|</span> <span class="mf">20.0</span> <span class="o">|</span> <span class="n">beta</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">3</span> <span class="o">|</span> <span class="mf">30.0</span> <span class="o">|</span> <span class="n">gamma</span> <span class="o">|</span>
<span class="o">+---+------+-------+</span>
</pre></div>
</div>
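<p>In <code class="docutils literal notranslate"><span class="pre">create_dataframe()</span></code>, the outer list holds partitions and each inner list holds the record batches of one partition. The following minimal sketch, using hypothetical sample values, spreads three rows across two partitions. Rows from different partitions are not guaranteed to come back in a fixed order, so the sketch sorts the collected values before inspecting them.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()

# Two record batches with the same schema (a: int64, c: string); values are hypothetical
first = pa.RecordBatch.from_arrays(
    [pa.array([1, 2]), pa.array(["alpha", "beta"])],
    names=["a", "c"],
)
second = pa.RecordBatch.from_arrays(
    [pa.array([3]), pa.array(["gamma"])],
    names=["a", "c"],
)

# Outer list: one entry per partition; inner list: the batches in that partition
df = ctx.create_dataframe([[first], [second]])

result = df.to_pydict()
# Row order across partitions is not guaranteed, so sort before inspecting
print(df.count(), sorted(result["a"]))  # 3 [1, 2, 3]
</pre></div>
</div>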
</section>
<section id="object-store">
<h2>Object Store<a class="headerlink" href="#object-store" title="Link to this heading"></a></h2>
<p>DataFusion has support for multiple storage options in addition to local files.
The example below requires an appropriate S3 account with access credentials.</p>
<p>The supported object stores are:</p>
<ul class="simple">
<li><p><a class="reference internal" href="../autoapi/datafusion/object_store/index.html#datafusion.object_store.AmazonS3" title="datafusion.object_store.AmazonS3"><code class="xref py py-class docutils literal notranslate"><span class="pre">AmazonS3</span></code></a></p></li>
<li><p><a class="reference internal" href="../autoapi/datafusion/object_store/index.html#datafusion.object_store.GoogleCloud" title="datafusion.object_store.GoogleCloud"><code class="xref py py-class docutils literal notranslate"><span class="pre">GoogleCloud</span></code></a></p></li>
<li><p><a class="reference internal" href="../autoapi/datafusion/object_store/index.html#datafusion.object_store.Http" title="datafusion.object_store.Http"><code class="xref py py-class docutils literal notranslate"><span class="pre">Http</span></code></a></p></li>
<li><p><a class="reference internal" href="../autoapi/datafusion/object_store/index.html#datafusion.object_store.LocalFileSystem" title="datafusion.object_store.LocalFileSystem"><code class="xref py py-class docutils literal notranslate"><span class="pre">LocalFileSystem</span></code></a></p></li>
<li><p><a class="reference internal" href="../autoapi/datafusion/object_store/index.html#datafusion.object_store.MicrosoftAzure" title="datafusion.object_store.MicrosoftAzure"><code class="xref py py-class docutils literal notranslate"><span class="pre">MicrosoftAzure</span></code></a></p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">datafusion.object_store</span><span class="w"> </span><span class="kn">import</span> <span class="n">AmazonS3</span>
<span class="n">region</span> <span class="o">=</span> <span class="s2">&quot;us-east-1&quot;</span>
<span class="n">bucket_name</span> <span class="o">=</span> <span class="s2">&quot;yellow-trips&quot;</span>
<span class="n">s3</span> <span class="o">=</span> <span class="n">AmazonS3</span><span class="p">(</span>
<span class="n">bucket_name</span><span class="o">=</span><span class="n">bucket_name</span><span class="p">,</span>
<span class="n">region</span><span class="o">=</span><span class="n">region</span><span class="p">,</span>
<span class="n">access_key_id</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&quot;AWS_ACCESS_KEY_ID&quot;</span><span class="p">),</span>
<span class="n">secret_access_key</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&quot;AWS_SECRET_ACCESS_KEY&quot;</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">path</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;s3://</span><span class="si">{</span><span class="n">bucket_name</span><span class="si">}</span><span class="s2">/&quot;</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_object_store</span><span class="p">(</span><span class="s2">&quot;s3://&quot;</span><span class="p">,</span> <span class="n">s3</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_parquet</span><span class="p">(</span><span class="s2">&quot;trips&quot;</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">table</span><span class="p">(</span><span class="s2">&quot;trips&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
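<p>Local files do not need an object store registration; a local path can be passed directly. The following is a minimal sketch, assuming <code class="docutils literal notranslate"><span class="pre">pyarrow</span></code> is installed; the temporary file and its <code class="docutils literal notranslate"><span class="pre">passenger_count</span></code> column are hypothetical stand-ins for the S3 data above.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import os
import tempfile

import pyarrow as pa
import pyarrow.parquet as pq
from datafusion import SessionContext

# Write a small Parquet file to a temporary directory (hypothetical sample data)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "trips.parquet")
pq.write_table(pa.table({"passenger_count": [1, 2, 3]}), path)

ctx = SessionContext()
# No register_object_store call is needed for a local filesystem path
ctx.register_parquet("trips", path)

result = ctx.sql("SELECT SUM(passenger_count) AS total FROM trips").to_pydict()
print(result)  # {'total': [6]}
</pre></div>
</div>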
</section>
<section id="other-dataframe-libraries">
<h2>Other DataFrame Libraries<a class="headerlink" href="#other-dataframe-libraries" title="Link to this heading"></a></h2>
<p>DataFusion can import DataFrames directly from other libraries, such as
<a class="reference external" href="https://pola.rs/">Polars</a> and <a class="reference external" href="https://pandas.pydata.org/">Pandas</a>.
Since DataFusion version 42.0.0, any DataFrame library that supports the Arrow FFI PyCapsule
interface can be imported into DataFusion using the
<a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext.from_arrow" title="datafusion.context.SessionContext.from_arrow"><code class="xref py py-func docutils literal notranslate"><span class="pre">from_arrow()</span></code></a> function. Older versions of Polars may
not support this Arrow interface; in those cases, you can still import via the
<a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext.from_polars" title="datafusion.context.SessionContext.from_polars"><code class="xref py py-func docutils literal notranslate"><span class="pre">from_polars()</span></code></a> function.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">pandas</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">pd</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="s2">&quot;b&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mf">10.0</span><span class="p">,</span> <span class="mf">20.0</span><span class="p">,</span> <span class="mf">30.0</span><span class="p">],</span> <span class="s2">&quot;c&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;alpha&quot;</span><span class="p">,</span> <span class="s2">&quot;beta&quot;</span><span class="p">,</span> <span class="s2">&quot;gamma&quot;</span><span class="p">]</span> <span class="p">}</span>
<span class="n">pandas_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">datafusion_df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">from_arrow</span><span class="p">(</span><span class="n">pandas_df</span><span class="p">)</span>
<span class="n">datafusion_df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">polars</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">pl</span>
<span class="n">polars_df</span> <span class="o">=</span> <span class="n">pl</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">datafusion_df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">from_arrow</span><span class="p">(</span><span class="n">polars_df</span><span class="p">)</span>
<span class="n">datafusion_df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
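<p><code class="docutils literal notranslate"><span class="pre">from_arrow()</span></code> does not need to know which library produced the data; it only needs an object that exposes the Arrow PyCapsule stream method <code class="docutils literal notranslate"><span class="pre">__arrow_c_stream__</span></code>. The sketch below uses a hypothetical <code class="docutils literal notranslate"><span class="pre">ArrowStreamWrapper</span></code> class as a stand-in for another library's DataFrame type; it simply delegates to a <code class="docutils literal notranslate"><span class="pre">pyarrow.Table</span></code>, which implements the same method in recent pyarrow releases.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import pyarrow as pa
from datafusion import SessionContext


class ArrowStreamWrapper:
    """Hypothetical stand-in for a DataFrame type from another library."""

    def __init__(self, table):
        self._table = table

    def __arrow_c_stream__(self, requested_schema=None):
        # Delegate to pyarrow, which returns an "arrow_array_stream" PyCapsule
        return self._table.__arrow_c_stream__(requested_schema)


ctx = SessionContext()
source = ArrowStreamWrapper(
    pa.table({"a": [1, 2, 3], "c": ["alpha", "beta", "gamma"]})
)

# from_arrow() consumes the data through the PyCapsule stream interface
datafusion_df = ctx.from_arrow(source)
result = datafusion_df.to_pydict()
print(result)
</pre></div>
</div>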
</section>
<section id="delta-lake">
<h2>Delta Lake<a class="headerlink" href="#delta-lake" title="Link to this heading"></a></h2>
<p>DataFusion 43.0.0 and later support registering table providers from sources such
as Delta Lake. This requires a recent version of
<a class="reference external" href="https://delta-io.github.io/delta-rs/">deltalake</a> that provides the required interfaces.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">deltalake</span><span class="w"> </span><span class="kn">import</span> <span class="n">DeltaTable</span>
<span class="n">delta_table</span> <span class="o">=</span> <span class="n">DeltaTable</span><span class="p">(</span><span class="s2">&quot;path_to_table&quot;</span><span class="p">)</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_table</span><span class="p">(</span><span class="s2">&quot;my_delta_table&quot;</span><span class="p">,</span> <span class="n">delta_table</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">table</span><span class="p">(</span><span class="s2">&quot;my_delta_table&quot;</span><span class="p">)</span>
<span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
<p>With older versions of <code class="docutils literal notranslate"><span class="pre">deltalake</span></code> (prior to 0.22), you can use the
<a class="reference external" href="https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html">Arrow Dataset</a>
interface to import into DataFusion, but this path does not support features such as filter pushdown,
which can lead to a significant performance difference.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">deltalake</span><span class="w"> </span><span class="kn">import</span> <span class="n">DeltaTable</span>
<span class="n">delta_table</span> <span class="o">=</span> <span class="n">DeltaTable</span><span class="p">(</span><span class="s2">&quot;path_to_table&quot;</span><span class="p">)</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_dataset</span><span class="p">(</span><span class="s2">&quot;my_delta_table&quot;</span><span class="p">,</span> <span class="n">delta_table</span><span class="o">.</span><span class="n">to_pyarrow_dataset</span><span class="p">())</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">table</span><span class="p">(</span><span class="s2">&quot;my_delta_table&quot;</span><span class="p">)</span>
<span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
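<p>The same <code class="docutils literal notranslate"><span class="pre">register_dataset()</span></code> call accepts other <code class="docutils literal notranslate"><span class="pre">pyarrow.dataset.Dataset</span></code> objects, not only the one returned by <code class="docutils literal notranslate"><span class="pre">to_pyarrow_dataset()</span></code>. A minimal sketch, assuming a hypothetical temporary directory that holds a single Parquet file:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import os
import tempfile

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
from datafusion import SessionContext

# Hypothetical sample data written as one Parquet file in a temporary directory
tmp_dir = tempfile.mkdtemp()
pq.write_table(pa.table({"x": [1, 2, 3]}), os.path.join(tmp_dir, "part-0.parquet"))

# Open the directory as a pyarrow Dataset and register it with DataFusion
dataset = ds.dataset(tmp_dir, format="parquet")
ctx = SessionContext()
ctx.register_dataset("my_dataset", dataset)

result = ctx.sql("SELECT SUM(x) AS total FROM my_dataset").to_pydict()
print(result)  # {'total': [6]}
</pre></div>
</div>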
</section>
<section id="apache-iceberg">
<h2>Apache Iceberg<a class="headerlink" href="#apache-iceberg" title="Link to this heading"></a></h2>
<p>DataFusion 45.0.0 and later support registering Apache Iceberg tables as table providers through the Custom Table Provider interface.</p>
<p>This requires either the <a class="reference external" href="https://pypi.org/project/pyiceberg/">pyiceberg</a> library (&gt;=0.10.0) or the <a class="reference external" href="https://pypi.org/project/pyiceberg-core/">pyiceberg-core</a> library (&gt;=0.5.0).</p>
<ul class="simple">
<li><p>The <code class="docutils literal notranslate"><span class="pre">pyiceberg-core</span></code> library exposes Iceberg Rust’s implementation of the Custom Table Provider interface as python bindings.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">pyiceberg</span></code> library utilizes the <code class="docutils literal notranslate"><span class="pre">pyiceberg-core</span></code> python bindings under the hood and provides a native way for Python users to interact with the DataFusion.</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">datafusion</span><span class="w"> </span><span class="kn">import</span> <span class="n">SessionContext</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">pyiceberg.catalog</span><span class="w"> </span><span class="kn">import</span> <span class="n">load_catalog</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">pyarrow</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">pa</span>
<span class="c1"># Load catalog and create/load a table</span>
<span class="n">catalog</span> <span class="o">=</span> <span class="n">load_catalog</span><span class="p">(</span><span class="s2">&quot;catalog&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s2">&quot;in-memory&quot;</span><span class="p">)</span>
<span class="n">catalog</span><span class="o">.</span><span class="n">create_namespace_if_not_exists</span><span class="p">(</span><span class="s2">&quot;default&quot;</span><span class="p">)</span>
<span class="c1"># Create some sample data</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">table</span><span class="p">({</span><span class="s2">&quot;x&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="s2">&quot;y&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]})</span>
<span class="n">iceberg_table</span> <span class="o">=</span> <span class="n">catalog</span><span class="o">.</span><span class="n">create_table</span><span class="p">(</span><span class="s2">&quot;default.test&quot;</span><span class="p">,</span> <span class="n">schema</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">schema</span><span class="p">)</span>
<span class="n">iceberg_table</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="c1"># Register the table with DataFusion</span>
<span class="n">ctx</span> <span class="o">=</span> <span class="n">SessionContext</span><span class="p">()</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_table_provider</span><span class="p">(</span><span class="s2">&quot;test&quot;</span><span class="p">,</span> <span class="n">iceberg_table</span><span class="p">)</span>
<span class="c1"># Query the table using DataFusion</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">table</span><span class="p">(</span><span class="s2">&quot;test&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
<p>Note that the DataFusion integration relies on features from the <a class="reference external" href="https://github.com/apache/iceberg-rust/">Iceberg Rust</a> implementation instead of the <a class="reference external" href="https://github.com/apache/iceberg-python/">PyIceberg</a> implementation.
Features that are available in PyIceberg but not yet in Iceberg Rust will not be available when using DataFusion.</p>
</section>
<section id="custom-table-provider">
<h2>Custom Table Provider<a class="headerlink" href="#custom-table-provider" title="Link to this heading"></a></h2>
<p>You can implement a custom table provider in Rust and expose it to DataFusion through
the interface described in the <a class="reference internal" href="io/table_provider.html#io-custom-table-provider"><span class="std std-ref">Custom Table Provider</span></a>
section. This is an advanced topic, but a
<a class="reference external" href="https://github.com/apache/datafusion-python/tree/main/examples/datafusion-ffi-example">user example</a>
is provided in the DataFusion repository.</p>
</section>
</section>
<section id="catalog">
<h1>Catalog<a class="headerlink" href="#catalog" title="Link to this heading"></a></h1>
<p>A common technique for organizing tables is a three-level hierarchy. DataFusion
supports this organization through the <a class="reference internal" href="../autoapi/datafusion/catalog/index.html#datafusion.catalog.Catalog" title="datafusion.catalog.Catalog"><code class="xref py py-class docutils literal notranslate"><span class="pre">Catalog</span></code></a>,
<a class="reference internal" href="../autoapi/datafusion/catalog/index.html#datafusion.catalog.Schema" title="datafusion.catalog.Schema"><code class="xref py py-class docutils literal notranslate"><span class="pre">Schema</span></code></a>, and <a class="reference internal" href="../autoapi/datafusion/catalog/index.html#datafusion.catalog.Table" title="datafusion.catalog.Table"><code class="xref py py-class docutils literal notranslate"><span class="pre">Table</span></code></a>. By default,
a <a class="reference internal" href="../autoapi/datafusion/context/index.html#datafusion.context.SessionContext" title="datafusion.context.SessionContext"><code class="xref py py-class docutils literal notranslate"><span class="pre">SessionContext</span></code></a> comes with a single Catalog and a single Schema
with the names <code class="docutils literal notranslate"><span class="pre">datafusion</span></code> and <code class="docutils literal notranslate"><span class="pre">public</span></code>, respectively.</p>
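<p>Because unqualified table names resolve against the default catalog and schema, a table registered without a qualifier can also be addressed by its fully qualified <code class="docutils literal notranslate"><span class="pre">catalog.schema.table</span></code> name. A minimal sketch, assuming DataFusion's default schema name <code class="docutils literal notranslate"><span class="pre">public</span></code> and a hypothetical table <code class="docutils literal notranslate"><span class="pre">my_table</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["a"])

# Registered without a qualifier, so it lands in the default catalog and schema
ctx.register_record_batches("my_table", [[batch]])

short = ctx.sql("SELECT SUM(a) AS total FROM my_table").to_pydict()
# The same table, addressed as catalog.schema.table
qualified = ctx.sql(
    "SELECT SUM(a) AS total FROM datafusion.public.my_table"
).to_pydict()
print(short, qualified)  # {'total': [6]} {'total': [6]}
</pre></div>
</div>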
<p>The default implementation keeps catalogs and schemas in memory. You can also add
more in-memory catalogs and schemas, as in the following
example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">datafusion.catalog</span><span class="w"> </span><span class="kn">import</span> <span class="n">Catalog</span><span class="p">,</span> <span class="n">Schema</span>
<span class="n">my_catalog</span> <span class="o">=</span> <span class="n">Catalog</span><span class="o">.</span><span class="n">memory_catalog</span><span class="p">()</span>
<span class="n">my_schema</span> <span class="o">=</span> <span class="n">Schema</span><span class="o">.</span><span class="n">memory_schema</span><span class="p">()</span>
<span class="n">my_catalog</span><span class="o">.</span><span class="n">register_schema</span><span class="p">(</span><span class="s2">&quot;my_schema_name&quot;</span><span class="p">,</span> <span class="n">my_schema</span><span class="p">)</span>
<span class="n">ctx</span><span class="o">.</span><span class="n">register_catalog</span><span class="p">(</span><span class="s2">&quot;my_catalog_name&quot;</span><span class="p">,</span> <span class="n">my_catalog</span><span class="p">)</span>
</pre></div>
</div>
<p>You could then register tables in <code class="docutils literal notranslate"><span class="pre">my_schema</span></code> and access them either through the DataFrame
API or via SQL commands such as <code class="docutils literal notranslate"><span class="pre">&quot;SELECT</span> <span class="pre">*</span> <span class="pre">from</span> <span class="pre">my_catalog_name.my_schema_name.my_table&quot;</span></code>.</p>
<section id="user-defined-catalog-and-schema">
<h2>User Defined Catalog and Schema<a class="headerlink" href="#user-defined-catalog-and-schema" title="Link to this heading"></a></h2>
<p>If the in-memory catalogs are insufficient for your use case, there are two approaches you can take
to implementing a custom catalog and/or schema. In the discussion below, we describe how to
implement a Catalog, but the approach for a Schema is nearly
identical.</p>
<p>DataFusion supports Catalogs written in either Rust or Python. If you write a Catalog in Rust,
you will need to export it as a Python library via PyO3. There is a complete example of a
catalog implemented this way in the
<a class="reference external" href="https://github.com/apache/datafusion-python/tree/main/examples/">examples folder</a>
of our repository. Writing catalog providers in Rust can typically lead to significant
performance improvements over the Python-based approach.</p>
<p>To implement a Catalog in Python, you will need to inherit from the abstract base class
<a class="reference internal" href="../autoapi/datafusion/catalog/index.html#datafusion.catalog.CatalogProvider" title="datafusion.catalog.CatalogProvider"><code class="xref py py-class docutils literal notranslate"><span class="pre">CatalogProvider</span></code></a>. There are examples in the
<a class="reference external" href="https://github.com/apache/datafusion-python/tree/main/python/tests">unit tests</a> of
implementing a basic Catalog in Python where we simply keep a dictionary of the
registered Schemas.</p>
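<p>A dictionary-backed catalog along those lines might look like the sketch below. It assumes that <code class="docutils literal notranslate"><span class="pre">schema_names()</span></code> and <code class="docutils literal notranslate"><span class="pre">schema()</span></code> are the required methods of <code class="docutils literal notranslate"><span class="pre">CatalogProvider</span></code> and that <code class="docutils literal notranslate"><span class="pre">register_schema()</span></code> and <code class="docutils literal notranslate"><span class="pre">deregister_schema()</span></code> are optional hooks; check the API reference for the exact signatures in your version. The class name <code class="docutils literal notranslate"><span class="pre">DictCatalogProvider</span></code> is hypothetical.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>from datafusion.catalog import CatalogProvider, Schema


class DictCatalogProvider(CatalogProvider):
    """Hypothetical catalog that keeps its registered schemas in a dict."""

    def __init__(self):
        self._schemas = {}

    def schema_names(self):
        # Names of every schema currently held by this catalog
        return set(self._schemas)

    def schema(self, name):
        # Return None for unknown names instead of raising
        return self._schemas.get(name)

    def register_schema(self, name, schema):
        self._schemas[name] = schema

    def deregister_schema(self, name, cascade):
        # cascade is accepted for interface compatibility but ignored here
        self._schemas.pop(name, None)


catalog = DictCatalogProvider()
catalog.register_schema("my_schema_name", Schema.memory_schema())
print(catalog.schema_names())  # {'my_schema_name'}
</pre></div>
</div>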
<p>One important note for developers is that a Catalog defined in Python can be accessed in
two different ways. First, we register the catalog with a Rust
wrapper, which allows any Rust-based code to call the Python functions as necessary.
Second, if the user accesses the Catalog via the Python API, we detect this and return
the original Python object that implements the Catalog. This distinction matters
for developers because we do <em>not</em> return a Python wrapper around the Rust wrapper of the
original Python object.</p>
</section>
</section>
</div>
</main>
</div>
</div>
<script src="../_static/scripts/pydata-sphinx-theme.js?digest=1999514e3f237ded88cf"></script>
<!-- Based on pydata_sphinx_theme/footer.html -->
<footer class="footer mt-5 mt-md-0">
<div class="container">
<div class="footer-item">
<p class="copyright">
&copy; Copyright 2019-2024, Apache Software Foundation.<br>
</p>
</div>
<div class="footer-item">
<p class="sphinx-version">
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 8.1.3.<br>
</p>
</div>
<div class="footer-item">
<p>Apache Arrow DataFusion, Arrow DataFusion, Apache, the Apache feather logo, and the Apache Arrow DataFusion project logo</p>
<p>are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
</div>
</div>
</footer>
</body>
</html>