blob: 3d66fd5cf2df75602f5f681bd047bc40d38fd5d9 [file] [log] [blame]
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Tutorial</title>
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="generator" content="Jekyll v3.7.3">
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
<link rel="stylesheet" href="/css/screen.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
<!--[if lt IE 9]>
<script src="/js/html5shiv.min.js"></script>
<script src="/js/respond.min.js"></script>
<![endif]-->
</head>
<body class="wrap">
<header role="banner">
<div class="grid">
<div class="unit center-on-mobiles">
<h1>
<a href="/">
<span class="sr-only">Apache Calcite</span>
<img src="/img/logo.svg" alt="Calcite Logo">
</a>
</h1>
</div>
<nav class="main-nav">
<ul>
<li class="">
<a href="/">Home</a>
</li>
<li class="">
<a href="/downloads/">Download</a>
</li>
<li class="">
<a href="/community/">Community</a>
</li>
<li class="">
<a href="/develop/">Develop</a>
</li>
<li class="">
<a href="/news/">News</a>
</li>
<li class="current">
<a href="/docs/">Docs</a>
</li>
</ul>
</nav>
</div>
</header>
<section class="docs">
<div class="grid">
<div class="docs-nav-mobile unit whole show-on-mobiles">
<select onchange="if (this.value) window.location.href=this.value">
<option value="">Navigate the docs…</option>
<optgroup label="Overview">
</optgroup>
<optgroup label="Advanced">
</optgroup>
<optgroup label="Avatica">
</optgroup>
<optgroup label="Reference">
</optgroup>
<optgroup label="Meta">
</optgroup>
</select>
</div>
<div class="unit four-fifths">
<article>
<h1>Tutorial</h1>
<!--
-->
<p>This is a step-by-step tutorial that shows how to build and connect to
Calcite. It uses a simple adapter that makes a directory of CSV files
appear to be a schema containing tables. Calcite does the rest, and
provides a full SQL interface.</p>
<p>Calcite-example-CSV is a fully functional adapter for
Calcite that reads
text files in
<a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV
(comma-separated values)</a> format. It is remarkable that a couple of
hundred lines of Java code are sufficient to provide full SQL query
capability.</p>
<p>CSV also serves as a template for building adapters to other
data formats. Even though there are not many lines of code, it covers
several important concepts:</p>
<ul>
<li>user-defined schema using SchemaFactory and Schema interfaces;</li>
<li>declaring schemas in a model JSON file;</li>
<li>declaring views in a model JSON file;</li>
<li>user-defined table using the Table interface;</li>
<li>determining the record type of a table;</li>
<li>a simple implementation of Table, using the ScannableTable interface, that
enumerates all rows directly;</li>
<li>a more advanced implementation that implements FilterableTable, and can
filter out rows according to simple predicates;</li>
<li>advanced implementation of Table, using TranslatableTable, that translates
to relational operators using planner rules.</li>
</ul>
<h2 id="download-and-build">Download and build</h2>
<p>You need Java (version 8, 9 or 10) and git.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git clone https://github.com/apache/calcite.git
<span class="nv">$ </span><span class="nb">cd </span>calcite
<span class="nv">$ </span>./mvnw install <span class="nt">-DskipTests</span> <span class="nt">-Dcheckstyle</span>.skip<span class="o">=</span><span class="nb">true</span>
<span class="nv">$ </span><span class="nb">cd </span>example/csv</code></pre></figure>
<h2 id="first-queries">First queries</h2>
<p>Now let’s connect to Calcite using
<a href="https://github.com/julianhyde/sqlline">sqlline</a>, a SQL shell
that is included in this project.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlline
sqlline&gt; <span class="o">!</span>connect jdbc:calcite:model<span class="o">=</span>target/test-classes/model.json admin admin</code></pre></figure>
<p>(If you are running Windows, the command is <code class="highlighter-rouge">sqlline.bat</code>.)</p>
<p>Execute a metadata query:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline&gt; <span class="o">!</span>tables
+------------+--------------+-------------+---------------+----------+------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE |
+------------+--------------+-------------+---------------+----------+------+
| null | SALES | DEPTS | TABLE | null | null |
| null | SALES | EMPS | TABLE | null | null |
| null | SALES | HOBBIES | TABLE | null | null |
| null | metadata | COLUMNS | SYSTEM_TABLE | null | null |
| null | metadata | TABLES | SYSTEM_TABLE | null | null |
+------------+--------------+-------------+---------------+----------+------+</code></pre></figure>
<p>(JDBC experts, note: sqlline’s <code>!tables</code> command is just executing
<a href="https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[])"><code>DatabaseMetaData.getTables()</code></a>
behind the scenes.
It has other commands to query JDBC metadata, such as <code>!columns</code> and <code>!describe</code>.)</p>
<p>As you can see there are 5 tables in the system: tables
<code>EMPS</code>, <code>DEPTS</code> and <code>HOBBIES</code> in the current
<code>SALES</code> schema, and <code>COLUMNS</code> and
<code>TABLES</code> in the system <code>metadata</code> schema. The
system tables are always present in Calcite, but the other tables are
provided by the specific implementation of the schema; in this case,
the <code>EMPS</code> and <code>DEPTS</code> tables are based on the
<code>EMPS.csv</code> and <code>DEPTS.csv</code> files in the
<code>target/test-classes</code> directory.</p>
<p>Let’s execute some queries on those tables, to show that Calcite is providing
a full implementation of SQL. First, a table scan:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline&gt; SELECT <span class="k">*</span> FROM emps<span class="p">;</span>
+--------+--------+---------+---------+----------------+--------+-------+---+
| EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | S |
+--------+--------+---------+---------+----------------+--------+-------+---+
| 100 | Fred | 10 | | | 30 | 25 | t |
| 110 | Eric | 20 | M | San Francisco | 3 | 80 | n |
| 110 | John | 40 | M | Vancouver | 2 | null | f |
| 120 | Wilma | 20 | F | | 1 | 5 | n |
| 130 | Alice | 40 | F | Vancouver | 2 | null | f |
+--------+--------+---------+---------+----------------+--------+-------+---+</code></pre></figure>
<p>Now JOIN and GROUP BY:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline&gt; SELECT d.name, COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span>
<span class="nb">.</span> <span class="nb">.</span> <span class="nb">.</span> .&gt; FROM emps AS e JOIN depts AS d ON e.deptno <span class="o">=</span> d.deptno
<span class="nb">.</span> <span class="nb">.</span> <span class="nb">.</span> .&gt; GROUP BY d.name<span class="p">;</span>
+------------+---------+
| NAME | EXPR<span class="nv">$1</span> |
+------------+---------+
| Sales | 1 |
| Marketing | 2 |
+------------+---------+</code></pre></figure>
<p>Last, the VALUES operator generates a single row, and is a convenient
way to test expressions and SQL built-in functions:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline&gt; VALUES CHAR_LENGTH<span class="o">(</span><span class="s1">'Hello, '</span> <span class="o">||</span> <span class="s1">'world!'</span><span class="o">)</span><span class="p">;</span>
+---------+
| EXPR<span class="nv">$0</span> |
+---------+
| 13 |
+---------+</code></pre></figure>
<p>Calcite has many other SQL features. We don’t have time to cover them
here. Write some more queries to experiment.</p>
<h2 id="schema-discovery">Schema discovery</h2>
<p>Now, how did Calcite find these tables? Remember, core Calcite does not
know anything about CSV files. (As a “database without a storage
layer”, Calcite doesn’t know about any file formats.) Calcite knows about
those tables because we told it to run code in the calcite-example-csv
project.</p>
<p>There are a couple of steps in that chain. First, we define a schema
based on a schema factory class in a model file. Then the schema
factory creates a schema, and the schema creates several tables, each
of which knows how to get data by scanning a CSV file. Last, after
Calcite has parsed the query and planned it to use those tables, Calcite
invokes the tables to read the data as the query is being
executed. Now let’s look at those steps in more detail.</p>
<p>On the JDBC connect string we gave the path of a model in JSON
format. Here is the model:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvSchemaFactory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">directory</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales'</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>The model defines a single schema called ‘SALES’. The schema is
powered by a plugin class,
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory.java">org.apache.calcite.adapter.csv.CsvSchemaFactory</a>,
which is part of the
calcite-example-csv project and implements the Calcite interface
<a href="/apidocs/org/apache/calcite/schema/SchemaFactory.html">SchemaFactory</a>.
Its <code>create</code> method instantiates a
schema, passing in the <code>directory</code> argument from the model file:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="n">Schema</span> <span class="nf">create</span><span class="o">(</span><span class="n">SchemaPlus</span> <span class="n">parentSchema</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">,</span>
<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">operand</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">directory</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">operand</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"directory"</span><span class="o">);</span>
<span class="n">String</span> <span class="n">flavorName</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">operand</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"flavor"</span><span class="o">);</span>
<span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span> <span class="n">flavor</span><span class="o">;</span>
<span class="k">if</span> <span class="o">(</span><span class="n">flavorName</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">flavor</span> <span class="o">=</span> <span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span><span class="o">.</span><span class="na">SCANNABLE</span><span class="o">;</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="n">flavor</span> <span class="o">=</span> <span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span><span class="o">.</span><span class="na">valueOf</span><span class="o">(</span><span class="n">flavorName</span><span class="o">.</span><span class="na">toUpperCase</span><span class="o">());</span>
<span class="o">}</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CsvSchema</span><span class="o">(</span>
<span class="k">new</span> <span class="nf">File</span><span class="o">(</span><span class="n">directory</span><span class="o">),</span>
<span class="n">flavor</span><span class="o">);</span>
<span class="o">}</span></code></pre></figure>
<p>Driven by the model, the schema factory instantiates a single schema
called ‘SALES’. The schema is an instance of
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchema.java">org.apache.calcite.adapter.csv.CsvSchema</a>
and implements the Calcite interface
<a href="/apidocs/org/apache/calcite/schema/Schema.html">Schema</a>.</p>
<p>A schema’s job is to produce a list of tables. (It can also list sub-schemas and
table-functions, but these are advanced features and calcite-example-csv does
not support them.) The tables implement Calcite’s
<a href="/apidocs/org/apache/calcite/schema/Table.html">Table</a>
interface. <code>CsvSchema</code> produces tables that are instances of
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTable.java">CsvTable</a>
and its sub-classes.</p>
<p>Here is the relevant code from <code>CsvSchema</code>, overriding the
<code><a href="/apidocs/org/apache/calcite/schema/impl/AbstractSchema.html#getTableMap()">getTableMap()</a></code>
method in the <code>AbstractSchema</code> base class.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">protected</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Table</span><span class="o">&gt;</span> <span class="nf">getTableMap</span><span class="o">()</span> <span class="o">{</span>
<span class="c1">// Look for files in the directory ending in ".csv", ".csv.gz", ".json",</span>
<span class="c1">// ".json.gz".</span>
<span class="n">File</span><span class="o">[]</span> <span class="n">files</span> <span class="o">=</span> <span class="n">directoryFile</span><span class="o">.</span><span class="na">listFiles</span><span class="o">(</span>
<span class="k">new</span> <span class="nf">FilenameFilter</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accept</span><span class="o">(</span><span class="n">File</span> <span class="n">dir</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">{</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">nameSansGz</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="s">".gz"</span><span class="o">);</span>
<span class="k">return</span> <span class="n">nameSansGz</span><span class="o">.</span><span class="na">endsWith</span><span class="o">(</span><span class="s">".csv"</span><span class="o">)</span>
<span class="o">||</span> <span class="n">nameSansGz</span><span class="o">.</span><span class="na">endsWith</span><span class="o">(</span><span class="s">".json"</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">});</span>
<span class="k">if</span> <span class="o">(</span><span class="n">files</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"directory "</span> <span class="o">+</span> <span class="n">directoryFile</span> <span class="o">+</span> <span class="s">" not found"</span><span class="o">);</span>
<span class="n">files</span> <span class="o">=</span> <span class="k">new</span> <span class="n">File</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
<span class="o">}</span>
<span class="c1">// Build a map from table name to table; each file becomes a table.</span>
<span class="kd">final</span> <span class="n">ImmutableMap</span><span class="o">.</span><span class="na">Builder</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Table</span><span class="o">&gt;</span> <span class="n">builder</span> <span class="o">=</span> <span class="n">ImmutableMap</span><span class="o">.</span><span class="na">builder</span><span class="o">();</span>
<span class="k">for</span> <span class="o">(</span><span class="n">File</span> <span class="n">file</span> <span class="o">:</span> <span class="n">files</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">tableName</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">file</span><span class="o">.</span><span class="na">getName</span><span class="o">(),</span> <span class="s">".gz"</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">tableNameSansJson</span> <span class="o">=</span> <span class="n">trimOrNull</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="s">".json"</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">tableNameSansJson</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">JsonTable</span> <span class="n">table</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JsonTable</span><span class="o">(</span><span class="n">file</span><span class="o">);</span>
<span class="n">builder</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">tableNameSansJson</span><span class="o">,</span> <span class="n">table</span><span class="o">);</span>
<span class="k">continue</span><span class="o">;</span>
<span class="o">}</span>
<span class="n">tableName</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="s">".csv"</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">Table</span> <span class="n">table</span> <span class="o">=</span> <span class="n">createTable</span><span class="o">(</span><span class="n">file</span><span class="o">);</span>
<span class="n">builder</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="n">table</span><span class="o">);</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="o">}</span>
<span class="cm">/** Creates different sub-type of table based on the "flavor" attribute. */</span>
<span class="kd">private</span> <span class="n">Table</span> <span class="nf">createTable</span><span class="o">(</span><span class="n">File</span> <span class="n">file</span><span class="o">)</span> <span class="o">{</span>
<span class="k">switch</span> <span class="o">(</span><span class="n">flavor</span><span class="o">)</span> <span class="o">{</span>
<span class="k">case</span> <span class="nl">TRANSLATABLE:</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CsvTranslatableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span>
<span class="k">case</span> <span class="nl">SCANNABLE:</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CsvScannableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span>
<span class="k">case</span> <span class="nl">FILTERABLE:</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CsvFilterableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span>
<span class="k">default</span><span class="o">:</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">AssertionError</span><span class="o">(</span><span class="s">"Unknown flavor "</span> <span class="o">+</span> <span class="n">flavor</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The schema scans the directory and finds all files whose name ends
with “.csv” and creates tables for them. In this case, the directory
is <code>target/test-classes/sales</code> and contains files
<code>EMPS.csv</code> and <code>DEPTS.csv</code>, which these become
the tables <code>EMPS</code> and <code>DEPTS</code>.</p>
<h2 id="tables-and-views-in-schemas">Tables and views in schemas</h2>
<p>Note how we did not need to define any tables in the model; the schema
generated the tables automatically.</p>
<p>You can define extra tables,
beyond those that are created automatically,
using the <code>tables</code> property of a schema.</p>
<p>Let’s see how to create
an important and useful type of table, namely a view.</p>
<p>A view looks like a table when you are writing a query, but it doesn’t store data.
It derives its result by executing a query.
The view is expanded while the query is being planned, so the query planner
can often perform optimizations like removing expressions from the SELECT
clause that are not used in the final result.</p>
<p>Here is a schema that defines a view:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvSchemaFactory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">directory</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales'</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="err">tables</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FEMALE_EMPS'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'view'</span><span class="p">,</span><span class="w">
</span><span class="err">sql</span><span class="p">:</span><span class="w"> </span><span class="err">'SELECT</span><span class="w"> </span><span class="err">*</span><span class="w"> </span><span class="err">FROM</span><span class="w"> </span><span class="err">emps</span><span class="w"> </span><span class="err">WHERE</span><span class="w"> </span><span class="err">gender</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">\'F\''</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>The line <code>type: 'view'</code> tags <code>FEMALE_EMPS</code> as a view,
as opposed to a regular table or a custom table.
Note that single-quotes within the view definition are escaped using a
back-slash, in the normal way for JSON.</p>
<p>JSON doesn’t make it easy to author long strings, so Calcite supports an
alternative syntax. If your view has a long SQL statement, you can instead
supply a list of lines rather than a single string:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FEMALE_EMPS'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'view'</span><span class="p">,</span><span class="w">
</span><span class="err">sql</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="err">'SELECT</span><span class="w"> </span><span class="err">*</span><span class="w"> </span><span class="err">FROM</span><span class="w"> </span><span class="err">emps'</span><span class="p">,</span><span class="w">
</span><span class="err">'WHERE</span><span class="w"> </span><span class="err">gender</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">\'F\''</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Now we have defined a view, we can use it in queries just as if it were a table:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">&gt;</span> <span class="k">SELECT</span> <span class="n">e</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">d</span><span class="p">.</span><span class="n">name</span> <span class="k">FROM</span> <span class="n">female_emps</span> <span class="k">AS</span> <span class="n">e</span> <span class="k">JOIN</span> <span class="n">depts</span> <span class="k">AS</span> <span class="n">d</span> <span class="k">on</span> <span class="n">e</span><span class="p">.</span><span class="n">deptno</span> <span class="o">=</span> <span class="n">d</span><span class="p">.</span><span class="n">deptno</span><span class="p">;</span>
<span class="o">+</span><span class="c1">--------+------------+</span>
<span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">--------+------------+</span>
<span class="o">|</span> <span class="n">Wilma</span> <span class="o">|</span> <span class="n">Marketing</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">--------+------------+</span></code></pre></figure>
<h2 id="custom-tables">Custom tables</h2>
<p>Custom tables are tables whose implementation is driven by user-defined code.
They don’t need to live in a custom schema.</p>
<p>There is an example in <code>model-with-custom-table.json</code>:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w">
</span><span class="err">tables</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'EMPS'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvTableFactory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">file</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales/EMPS.csv.gz'</span><span class="p">,</span><span class="w">
</span><span class="err">flavor</span><span class="p">:</span><span class="w"> </span><span class="s2">"scannable"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>We can query the table in the usual way:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">&gt;</span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">model</span><span class="o">-</span><span class="k">with</span><span class="o">-</span><span class="n">custom</span><span class="o">-</span><span class="k">table</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span>
<span class="n">sqlline</span><span class="o">&gt;</span> <span class="k">SELECT</span> <span class="n">empno</span><span class="p">,</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">custom_table</span><span class="p">.</span><span class="n">emps</span><span class="p">;</span>
<span class="o">+</span><span class="c1">--------+--------+</span>
<span class="o">|</span> <span class="n">EMPNO</span> <span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">--------+--------+</span>
<span class="o">|</span> <span class="mi">100</span> <span class="o">|</span> <span class="n">Fred</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">110</span> <span class="o">|</span> <span class="n">Eric</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">110</span> <span class="o">|</span> <span class="n">John</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">120</span> <span class="o">|</span> <span class="n">Wilma</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">130</span> <span class="o">|</span> <span class="n">Alice</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">--------+--------+</span></code></pre></figure>
<p>The schema is a regular one, and contains a custom table powered by
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableFactory.java">org.apache.calcite.adapter.csv.CsvTableFactory</a>,
which implements the Calcite interface
<a href="/apidocs/org/apache/calcite/schema/TableFactory.html">TableFactory</a>.
Its <code>create</code> method instantiates a <code>CsvScannableTable</code>,
passing in the <code>file</code> argument from the model file:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="n">CsvTable</span> <span class="nf">create</span><span class="o">(</span><span class="n">SchemaPlus</span> <span class="n">schema</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">,</span>
<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">map</span><span class="o">,</span> <span class="n">RelDataType</span> <span class="n">rowType</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">fileName</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">map</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"file"</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">File</span> <span class="n">file</span> <span class="o">=</span> <span class="k">new</span> <span class="n">File</span><span class="o">(</span><span class="n">fileName</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">RelProtoDataType</span> <span class="n">protoRowType</span> <span class="o">=</span>
<span class="n">rowType</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">?</span> <span class="n">RelDataTypeImpl</span><span class="o">.</span><span class="na">proto</span><span class="o">(</span><span class="n">rowType</span><span class="o">)</span> <span class="o">:</span> <span class="kc">null</span><span class="o">;</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CsvScannableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="n">protoRowType</span><span class="o">);</span>
<span class="o">}</span></code></pre></figure>
<p>Implementing a custom table is often a simpler alternative to implementing
a custom schema. Both approaches might end up creating a similar implementation
of the <code>Table</code> interface, but for the custom table you don’t
need to implement metadata discovery. (<code>CsvTableFactory</code>
creates a <code>CsvScannableTable</code>, just as <code>CsvSchema</code> does,
but the table implementation does not scan the filesystem for .csv files.)</p>
<p>Custom tables require more work for the author of the model (the author
needs to specify each table and its file explicitly) but also give the author
more control (say, providing different parameters for each table).</p>
<h2 id="comments-in-models">Comments in models</h2>
<p>Models can include comments using <code class="highlighter-rouge">/* ... */</code> and <code class="highlighter-rouge">//</code> syntax:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">/*</span><span class="w"> </span><span class="err">Multi-line</span><span class="w">
</span><span class="err">comment.</span><span class="w"> </span><span class="err">*/</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w">
</span><span class="err">//</span><span class="w"> </span><span class="err">Single-line</span><span class="w"> </span><span class="err">comment.</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="err">..</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>(Comments are not standard JSON, but are a harmless extension.)</p>
<h2 id="optimizing-queries-using-planner-rules">Optimizing queries using planner rules</h2>
<p>The table implementations we have seen so far are fine as long as the tables
don’t contain a great deal of data. But if your customer table has, say, a
hundred columns and a million rows, you would rather that the system did not
retrieve all of the data for every query. You would like Calcite to negotiate
with the adapter and find a more efficient way of accessing the data.</p>
<p>This negotiation is a simple form of query optimization. Calcite supports query
optimization by adding <i>planner rules</i>. Planner rules operate by
looking for patterns in the query parse tree (for instance a project on top
of a certain kind of table), and replacing the matched nodes in the tree by
a new set of nodes which implement the optimization.</p>
<p>Planner rules are also extensible, like schemas and tables. So, if you have a
data store that you want to access via SQL, you first define a custom table or
schema, and then you define some rules to make the access efficient.</p>
<p>To see this in action, let’s use a planner rule to access
a subset of columns from a CSV file. Let’s run the same query against two very
similar schemas:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">&gt;</span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">model</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span>
<span class="n">sqlline</span><span class="o">&gt;</span> <span class="k">explain</span> <span class="n">plan</span> <span class="k">for</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">emps</span><span class="p">;</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span>
<span class="o">|</span> <span class="n">PLAN</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span>
<span class="o">|</span> <span class="n">EnumerableCalcRel</span><span class="p">(</span><span class="n">expr</span><span class="o">#</span><span class="mi">0</span><span class="p">..</span><span class="mi">9</span><span class="o">=</span><span class="p">[</span><span class="err">{</span><span class="n">inputs</span><span class="err">}</span><span class="p">],</span> <span class="n">NAME</span><span class="o">=</span><span class="p">[</span><span class="err">$</span><span class="n">t1</span><span class="p">])</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">EnumerableTableScan</span><span class="p">(</span><span class="k">table</span><span class="o">=</span><span class="p">[[</span><span class="n">SALES</span><span class="p">,</span> <span class="n">EMPS</span><span class="p">]])</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span>
<span class="n">sqlline</span><span class="o">&gt;</span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">smart</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span>
<span class="n">sqlline</span><span class="o">&gt;</span> <span class="k">explain</span> <span class="n">plan</span> <span class="k">for</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">emps</span><span class="p">;</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span>
<span class="o">|</span> <span class="n">PLAN</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span>
<span class="o">|</span> <span class="n">EnumerableCalcRel</span><span class="p">(</span><span class="n">expr</span><span class="o">#</span><span class="mi">0</span><span class="p">..</span><span class="mi">9</span><span class="o">=</span><span class="p">[</span><span class="err">{</span><span class="n">inputs</span><span class="err">}</span><span class="p">],</span> <span class="n">NAME</span><span class="o">=</span><span class="p">[</span><span class="err">$</span><span class="n">t1</span><span class="p">])</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">CsvTableScan</span><span class="p">(</span><span class="k">table</span><span class="o">=</span><span class="p">[[</span><span class="n">SALES</span><span class="p">,</span> <span class="n">EMPS</span><span class="p">]])</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">-----------------------------------------------------+</span></code></pre></figure>
<p>What causes the difference in plan? Let’s follow the trail of evidence. In the
<code>smart.json</code> model file, there is just one extra line:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="err">flavor</span><span class="p">:</span><span class="w"> </span><span class="s2">"translatable"</span></code></pre></figure>
<p>This causes a <code>CsvSchema</code> to be created with
<code>flavor = TRANSLATABLE</code>,
and its <code>createTable</code> method creates instances of
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable.java">CsvTranslatableTable</a>
rather than a <code>CsvScannableTable</code>.</p>
<p><code>CsvTranslatableTable</code> implements the
<code><a href="/apidocs/org/apache/calcite/schema/TranslatableTable.html#toRel()">TranslatableTable.toRel()</a></code>
method to create
<a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableScan.java">CsvTableScan</a>.
Table scans are the leaves of a query operator tree.
The usual implementation is
<code><a href="/apidocs/org/apache/calcite/adapter/enumerable/EnumerableTableScan.html">EnumerableTableScan</a></code>,
but we have created a distinctive sub-type that will cause rules to fire.</p>
<p>Here is the rule in its entirety:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">CsvProjectTableScanRule</span> <span class="kd">extends</span> <span class="n">RelOptRule</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">CsvProjectTableScanRule</span> <span class="n">INSTANCE</span> <span class="o">=</span>
<span class="k">new</span> <span class="nf">CsvProjectTableScanRule</span><span class="o">();</span>
<span class="kd">private</span> <span class="nf">CsvProjectTableScanRule</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">super</span><span class="o">(</span>
<span class="n">operand</span><span class="o">(</span><span class="n">Project</span><span class="o">.</span><span class="na">class</span><span class="o">,</span>
<span class="n">operand</span><span class="o">(</span><span class="n">CsvTableScan</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">none</span><span class="o">())),</span>
<span class="s">"CsvProjectTableScanRule"</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">onMatch</span><span class="o">(</span><span class="n">RelOptRuleCall</span> <span class="n">call</span><span class="o">)</span> <span class="o">{</span>
<span class="kd">final</span> <span class="n">Project</span> <span class="n">project</span> <span class="o">=</span> <span class="n">call</span><span class="o">.</span><span class="na">rel</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">CsvTableScan</span> <span class="n">scan</span> <span class="o">=</span> <span class="n">call</span><span class="o">.</span><span class="na">rel</span><span class="o">(</span><span class="mi">1</span><span class="o">);</span>
<span class="kt">int</span><span class="o">[]</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">getProjectFields</span><span class="o">(</span><span class="n">project</span><span class="o">.</span><span class="na">getProjects</span><span class="o">());</span>
<span class="k">if</span> <span class="o">(</span><span class="n">fields</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Project contains expressions more complex than just field references.</span>
<span class="k">return</span><span class="o">;</span>
<span class="o">}</span>
<span class="n">call</span><span class="o">.</span><span class="na">transformTo</span><span class="o">(</span>
<span class="k">new</span> <span class="nf">CsvTableScan</span><span class="o">(</span>
<span class="n">scan</span><span class="o">.</span><span class="na">getCluster</span><span class="o">(),</span>
<span class="n">scan</span><span class="o">.</span><span class="na">getTable</span><span class="o">(),</span>
<span class="n">scan</span><span class="o">.</span><span class="na">csvTable</span><span class="o">,</span>
<span class="n">fields</span><span class="o">));</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="kt">int</span><span class="o">[]</span> <span class="nf">getProjectFields</span><span class="o">(</span><span class="n">List</span><span class="o">&lt;</span><span class="n">RexNode</span><span class="o">&gt;</span> <span class="n">exps</span><span class="o">)</span> <span class="o">{</span>
<span class="kd">final</span> <span class="kt">int</span><span class="o">[]</span> <span class="n">fields</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[</span><span class="n">exps</span><span class="o">.</span><span class="na">size</span><span class="o">()];</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">exps</span><span class="o">.</span><span class="na">size</span><span class="o">();</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
<span class="kd">final</span> <span class="n">RexNode</span> <span class="n">exp</span> <span class="o">=</span> <span class="n">exps</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">i</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">exp</span> <span class="k">instanceof</span> <span class="n">RexInputRef</span><span class="o">)</span> <span class="o">{</span>
<span class="n">fields</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="o">((</span><span class="n">RexInputRef</span><span class="o">)</span> <span class="n">exp</span><span class="o">).</span><span class="na">getIndex</span><span class="o">();</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="k">return</span> <span class="kc">null</span><span class="o">;</span> <span class="c1">// not a simple projection</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">fields</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The constructor declares the pattern of relational expressions that will cause
the rule to fire.</p>
<p>The <code>onMatch</code> method generates a new relational expression and calls
<code><a href="/apidocs/org/apache/calcite/plan/RelOptRuleCall.html#transformTo(org.apache.calcite.rel.RelNode)">RelOptRuleCall.transformTo()</a></code>
to indicate that the rule has fired successfully.</p>
<h2 id="the-query-optimization-process">The query optimization process</h2>
<p>There’s a lot to say about how clever Calcite’s query planner is, but we won’t
say it here. The cleverness is designed to take the burden off you, the writer
of planner rules.</p>
<p>First, Calcite doesn’t fire rules in a prescribed order. The query optimization
process follows many branches of a branching tree, just like a chess playing
program examines many possible sequences of moves. If rules A and B both match a
given section of the query operator tree, then Calcite can fire both.</p>
<p>Second, Calcite uses cost in choosing between plans, but the cost model doesn’t
prevent rules from firing which may seem to be more expensive in the short term.</p>
<p>Many optimizers have a linear optimization scheme. Faced with a choice between
rule A and rule B, as above, such an optimizer needs to choose immediately. It
might have a policy such as “apply rule A to the whole tree, then apply rule B
to the whole tree”, or apply a cost-based policy, applying the rule that
produces the cheaper result.</p>
<p>Calcite doesn’t require such compromises.
This makes it simple to combine various sets of rules.
If, say you want to combine rules to recognize materialized views with rules to
read from CSV and JDBC source systems, you just give Calcite the set of all
rules and tell it to go at it.</p>
<p>Calcite does use a cost model. The cost model decides which plan to ultimately
use, and sometimes to prune the search tree to prevent the search space from
exploding, but it never forces you to choose between rule A and rule B. This is
important, because it avoids falling into local minima in the search space that
are not actually optimal.</p>
<p>Also (you guessed it) the cost model is pluggable, as are the table and query
operator statistics it is based upon. But that can be a subject for later.</p>
<h2 id="jdbc-adapter">JDBC adapter</h2>
<p>The JDBC adapter maps a schema in a JDBC data source as a Calcite schema.</p>
<p>For example, this schema reads from a MySQL “foodmart” database:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>(The FoodMart database will be familiar to those of you who have used
the Mondrian OLAP engine, because it is Mondrian’s main test data
set. To load the data set, follow <a href="https://mondrian.pentaho.com/documentation/installation.php#2_Set_up_test_data">Mondrian’s
installation instructions</a>.)</p>
<p><b>Current limitations</b>: The JDBC adapter currently only pushes
down table scan operations; all other processing (filtering, joins,
aggregations and so forth) occurs within Calcite. Our goal is to push
down as much processing as possible to the source system, translating
syntax, data types and built-in functions as we go. If a Calcite query
is based on tables from a single JDBC database, in principle the whole
query should go to that database. If tables are from multiple JDBC
sources, or a mixture of JDBC and non-JDBC, Calcite will use the most
efficient distributed query approach that it can.</p>
<h2 id="the-cloning-jdbc-adapter">The cloning JDBC adapter</h2>
<p>The cloning JDBC adapter creates a hybrid database. The data is
sourced from a JDBC database but is read into in-memory tables the
first time each table is accessed. Calcite evaluates queries based on
those in-memory tables, effectively a cache of the database.</p>
<p>For example, the following model reads tables from a MySQL
“foodmart” database:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.clone.CloneSchema$Factory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Another technique is to build a clone schema on top of an existing
schema. You use the <code>source</code> property to reference a schema
defined earlier in the model, like this:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w">
</span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w">
</span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w">
</span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w">
</span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w">
</span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.clone.CloneSchema$Factory'</span><span class="p">,</span><span class="w">
</span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">source</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>You can use this approach to create a clone schema on any type of
schema, not just JDBC.</p>
<p>The cloning adapter isn’t the be-all and end-all. We plan to develop
more sophisticated caching strategies, and a more complete and
efficient implementation of in-memory tables, but for now the cloning
JDBC adapter shows what is possible and allows us to try out our
initial implementations.</p>
<h2 id="further-topics">Further topics</h2>
<p>There are many other ways to extend Calcite not yet described in this tutorial.
The <a href="adapter.html">adapter specification</a> describes the APIs involved.</p>
<div class="section-nav">
<div class="left align-right">
<a href="/docs/index.html" class="prev">Previous</a>
</div>
<div class="right align-left">
<a href="/docs/algebra.html" class="next">Next</a>
</div>
</div>
<div class="clear"></div>
</article>
</div>
<div class="unit one-fifth hide-on-mobiles">
<aside>
<h4>Overview</h4>
<ul>
<li class=""><a href="/docs/index.html">Background</a></li>
<li class="current"><a href="/docs/tutorial.html">Tutorial</a></li>
<li class=""><a href="/docs/algebra.html">Algebra</a></li>
</ul>
<h4>Advanced</h4>
<ul>
<li class=""><a href="/docs/adapter.html">Adapters</a></li>
<li class=""><a href="/docs/spatial.html">Spatial</a></li>
<li class=""><a href="/docs/stream.html">Streaming</a></li>
<li class=""><a href="/docs/materialized_views.html">Materialized Views</a></li>
<li class=""><a href="/docs/lattice.html">Lattices</a></li>
</ul>
<h4>Avatica</h4>
<ul>
<li class=""><a href="/docs/avatica_overview.html">Overview</a></li>
<li class=""><a href="/docs/avatica_roadmap.html">Roadmap</a></li>
<li class=""><a href="/docs/avatica_json_reference.html">JSON Reference</a></li>
<li class=""><a href="/docs/avatica_protobuf_reference.html">Protobuf Reference</a></li>
</ul>
<h4>Reference</h4>
<ul>
<li class=""><a href="/docs/reference.html">SQL language</a></li>
<li class=""><a href="/docs/model.html">JSON/YAML models</a></li>
<li class=""><a href="/docs/howto.html">HOWTO</a></li>
</ul>
<h4>Meta</h4>
<ul>
<li class=""><a href="/docs/history.html">History</a></li>
<li class=""><a href="/docs/powered_by.html">Powered by Calcite</a></li>
<li class=""><a href="/apidocs">API</a></li>
<li class=""><a href="/testapidocs">Test API</a></li>
</ul>
</aside>
</div>
<div class="clear"></div>
</div>
</section>
<footer role="contentinfo">
<div id="poweredby">
<a href="http://www.apache.org/">
<span class="sr-only">Apache</span>
<img src="/img/feather.png" width="190" height="77" alt="Apache Logo"></a>
</div>
<div id="copyright">
<p>The contents of this website are Copyright &copy;&nbsp;2019
<a href="https://www.apache.org/">Apache Software Foundation</a>
under the terms of
the <a href="https://www.apache.org/licenses/">
Apache&nbsp;License&nbsp;v2</a>. Apache Calcite and its logo are
trademarks of the Apache Software Foundation.</p>
</div>
</footer>
<script>
var anchorForId = function (id) {
var anchor = document.createElement("a");
anchor.className = "header-link";
anchor.href = "#" + id;
anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
anchor.title = "Permalink";
return anchor;
};
var linkifyAnchors = function (level, containingElement) {
var headers = containingElement.getElementsByTagName("h" + level);
for (var h = 0; h < headers.length; h++) {
var header = headers[h];
if (typeof header.id !== "undefined" && header.id !== "") {
header.appendChild(anchorForId(header.id));
}
}
};
document.onreadystatechange = function () {
if (this.readyState === "complete") {
var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
if (!contentBlock) {
return;
}
for (var level = 1; level <= 6; level++) {
linkifyAnchors(level, contentBlock);
}
}
};
</script>
</body>
</html>