| <!DOCTYPE HTML> |
| <html lang="en-US"> |
| <head> |
| <meta charset="UTF-8"> |
| <title>Tutorial</title> |
| <meta name="viewport" content="width=device-width,initial-scale=1"> |
| <meta name="generator" content="Jekyll v3.7.3"> |
| <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> |
| <link rel="stylesheet" href="/css/screen.css"> |
| <link rel="icon" type="image/x-icon" href="/favicon.ico"> |
| <!--[if lt IE 9]> |
| <script src="/js/html5shiv.min.js"></script> |
| <script src="/js/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| |
| |
| <body class="wrap"> |
| <header role="banner"> |
| <nav class="mobile-nav show-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/downloads/">Download</a> |
| </li> |
| <li class=""> |
| <a href="/community/">Community</a> |
| </li> |
| <li class=""> |
| <a href="/develop/">Develop</a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/docs/">Docs</a> |
| </li> |
| </ul> |
| |
| </nav> |
| <div class="grid"> |
| <div class="unit one-third center-on-mobiles"> |
| <h1> |
| <a href="/"> |
| <span class="sr-only">Apache Calcite</span> |
| <img src="/img/logo.png" width="226" height="140" alt="Calcite Logo"> |
| </a> |
| </h1> |
| </div> |
| <nav class="main-nav unit two-thirds hide-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/downloads/">Download</a> |
| </li> |
| <li class=""> |
| <a href="/community/">Community</a> |
| </li> |
| <li class=""> |
| <a href="/develop/">Develop</a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/docs/">Docs</a> |
| </li> |
| </ul> |
| |
| </nav> |
| </div> |
| </header> |
| |
| |
| <section class="docs"> |
| <div class="grid"> |
| |
| <div class="docs-nav-mobile unit whole show-on-mobiles"> |
| <select onchange="if (this.value) window.location.href=this.value"> |
| <option value="">Navigate the docs…</option> |
| <optgroup label="Overview"> |
| </optgroup> |
| <optgroup label="Advanced"> |
| </optgroup> |
| <optgroup label="Avatica"> |
| </optgroup> |
| <optgroup label="Reference"> |
| </optgroup> |
| <optgroup label="Meta"> |
| </optgroup> |
| |
| </select> |
| </div> |
| |
| |
| <div class="unit four-fifths"> |
| <article> |
| <h1>Tutorial</h1> |
| <!-- |
| |
| --> |
| |
| <p>This is a step-by-step tutorial that shows how to build and connect to |
| Calcite. It uses a simple adapter that makes a directory of CSV files |
| appear to be a schema containing tables. Calcite does the rest, and |
| provides a full SQL interface.</p> |
| |
| <p>Calcite-example-CSV is a fully functional adapter for |
| Calcite that reads |
| text files in |
| <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV |
| (comma-separated values)</a> format. It is remarkable that a couple of |
| hundred lines of Java code are sufficient to provide full SQL query |
| capability.</p> |
| |
| <p>CSV also serves as a template for building adapters to other |
| data formats. Even though there are not many lines of code, it covers |
| several important concepts:</p> |
| |
| <ul> |
| <li>user-defined schema using SchemaFactory and Schema interfaces;</li> |
| <li>declaring schemas in a model JSON file;</li> |
| <li>declaring views in a model JSON file;</li> |
| <li>user-defined table using the Table interface;</li> |
| <li>determining the record type of a table;</li> |
| <li>a simple implementation of Table, using the ScannableTable interface, that |
| enumerates all rows directly;</li> |
| <li>a more advanced implementation that implements FilterableTable, and can |
| filter out rows according to simple predicates;</li> |
| <li>advanced implementation of Table, using TranslatableTable, that translates |
| to relational operators using planner rules.</li> |
| </ul> |
| |
| <h2 id="download-and-build">Download and build</h2> |
| |
| <p>You need Java (version 8, 9 or 10) and git.</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git clone https://github.com/apache/calcite.git |
| <span class="nv">$ </span><span class="nb">cd </span>calcite |
| <span class="nv">$ </span>./mvnw install <span class="nt">-DskipTests</span> <span class="nt">-Dcheckstyle</span>.skip<span class="o">=</span><span class="nb">true</span> |
| <span class="nv">$ </span><span class="nb">cd </span>example/csv</code></pre></figure> |
| |
| <h2 id="first-queries">First queries</h2> |
| |
| <p>Now let’s connect to Calcite using |
| <a href="https://github.com/julianhyde/sqlline">sqlline</a>, a SQL shell |
| that is included in this project.</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlline |
| sqlline> <span class="o">!</span>connect jdbc:calcite:model<span class="o">=</span>target/test-classes/model.json admin admin</code></pre></figure> |
| |
| <p>(If you are running Windows, the command is <code class="highlighter-rouge">sqlline.bat</code>.)</p> |
| |
| <p>Execute a metadata query:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline> <span class="o">!</span>tables |
| +------------+--------------+-------------+---------------+----------+------+ |
| | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE | |
| +------------+--------------+-------------+---------------+----------+------+ |
| | null | SALES | DEPTS | TABLE | null | null | |
| | null | SALES | EMPS | TABLE | null | null | |
| | null | SALES | HOBBIES | TABLE | null | null | |
| | null | metadata | COLUMNS | SYSTEM_TABLE | null | null | |
| | null | metadata | TABLES | SYSTEM_TABLE | null | null | |
| +------------+--------------+-------------+---------------+----------+------+</code></pre></figure> |
| |
| <p>(JDBC experts, note: sqlline’s <code>!tables</code> command is just executing |
| <a href="https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[])"><code>DatabaseMetaData.getTables()</code></a> |
| behind the scenes. |
| It has other commands to query JDBC metadata, such as <code>!columns</code> and <code>!describe</code>.)</p> |
| |
| <p>As you can see there are 5 tables in the system: tables |
| <code>EMPS</code>, <code>DEPTS</code> and <code>HOBBIES</code> in the current |
| <code>SALES</code> schema, and <code>COLUMNS</code> and |
| <code>TABLES</code> in the system <code>metadata</code> schema. The |
| system tables are always present in Calcite, but the other tables are |
| provided by the specific implementation of the schema; in this case, |
| the <code>EMPS</code> and <code>DEPTS</code> tables are based on the |
| <code>EMPS.csv</code> and <code>DEPTS.csv</code> files in the |
| <code>target/test-classes</code> directory.</p> |
| |
| <p>Let’s execute some queries on those tables, to show that Calcite is providing |
| a full implementation of SQL. First, a table scan:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline> SELECT <span class="k">*</span> FROM emps<span class="p">;</span> |
| +--------+--------+---------+---------+----------------+--------+-------+---+ |
| | EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | S | |
| +--------+--------+---------+---------+----------------+--------+-------+---+ |
| | 100 | Fred | 10 | | | 30 | 25 | t | |
| | 110 | Eric | 20 | M | San Francisco | 3 | 80 | n | |
| | 110 | John | 40 | M | Vancouver | 2 | null | f | |
| | 120 | Wilma | 20 | F | | 1 | 5 | n | |
| | 130 | Alice | 40 | F | Vancouver | 2 | null | f | |
| +--------+--------+---------+---------+----------------+--------+-------+---+</code></pre></figure> |
| |
| <p>Now JOIN and GROUP BY:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline> SELECT d.name, COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> |
| <span class="nb">.</span> <span class="nb">.</span> <span class="nb">.</span> .> FROM emps AS e JOIN depts AS d ON e.deptno <span class="o">=</span> d.deptno |
| <span class="nb">.</span> <span class="nb">.</span> <span class="nb">.</span> .> GROUP BY d.name<span class="p">;</span> |
| +------------+---------+ |
| | NAME | EXPR<span class="nv">$1</span> | |
| +------------+---------+ |
| | Sales | 1 | |
| | Marketing | 2 | |
| +------------+---------+</code></pre></figure> |
| |
| <p>Last, the VALUES operator generates a single row, and is a convenient |
| way to test expressions and SQL built-in functions:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash">sqlline> VALUES CHAR_LENGTH<span class="o">(</span><span class="s1">'Hello, '</span> <span class="o">||</span> <span class="s1">'world!'</span><span class="o">)</span><span class="p">;</span> |
| +---------+ |
| | EXPR<span class="nv">$0</span> | |
| +---------+ |
| | 13 | |
| +---------+</code></pre></figure> |
| |
| <p>Calcite has many other SQL features. We don’t have time to cover them |
| here. Write some more queries to experiment.</p> |
| |
| <h2 id="schema-discovery">Schema discovery</h2> |
| |
| <p>Now, how did Calcite find these tables? Remember, core Calcite does not |
| know anything about CSV files. (As a “database without a storage |
| layer”, Calcite doesn’t know about any file formats.) Calcite knows about |
| those tables because we told it to run code in the calcite-example-csv |
| project.</p> |
| |
| <p>There are a couple of steps in that chain. First, we define a schema |
| based on a schema factory class in a model file. Then the schema |
| factory creates a schema, and the schema creates several tables, each |
| of which knows how to get data by scanning a CSV file. Last, after |
| Calcite has parsed the query and planned it to use those tables, Calcite |
| invokes the tables to read the data as the query is being |
| executed. Now let’s look at those steps in more detail.</p> |
| |
| <p>On the JDBC connect string we gave the path of a model in JSON |
| format. Here is the model:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvSchemaFactory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">directory</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales'</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>The model defines a single schema called ‘SALES’. The schema is |
| powered by a plugin class, |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory.java">org.apache.calcite.adapter.csv.CsvSchemaFactory</a>, |
| which is part of the |
| calcite-example-csv project and implements the Calcite interface |
| <a href="/apidocs/org/apache/calcite/schema/SchemaFactory.html">SchemaFactory</a>. |
| Its <code>create</code> method instantiates a |
| schema, passing in the <code>directory</code> argument from the model file:</p> |
| |
| <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="n">Schema</span> <span class="nf">create</span><span class="o">(</span><span class="n">SchemaPlus</span> <span class="n">parentSchema</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">,</span> |
| <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">operand</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">String</span> <span class="n">directory</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">operand</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"directory"</span><span class="o">);</span> |
| <span class="n">String</span> <span class="n">flavorName</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">operand</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"flavor"</span><span class="o">);</span> |
| <span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span> <span class="n">flavor</span><span class="o">;</span> |
| <span class="k">if</span> <span class="o">(</span><span class="n">flavorName</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">flavor</span> <span class="o">=</span> <span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span><span class="o">.</span><span class="na">SCANNABLE</span><span class="o">;</span> |
| <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> |
| <span class="n">flavor</span> <span class="o">=</span> <span class="n">CsvTable</span><span class="o">.</span><span class="na">Flavor</span><span class="o">.</span><span class="na">valueOf</span><span class="o">(</span><span class="n">flavorName</span><span class="o">.</span><span class="na">toUpperCase</span><span class="o">());</span> |
| <span class="o">}</span> |
| <span class="k">return</span> <span class="k">new</span> <span class="nf">CsvSchema</span><span class="o">(</span> |
| <span class="k">new</span> <span class="nf">File</span><span class="o">(</span><span class="n">directory</span><span class="o">),</span> |
| <span class="n">flavor</span><span class="o">);</span> |
| <span class="o">}</span></code></pre></figure> |
| |
| <p>Driven by the model, the schema factory instantiates a single schema |
| called ‘SALES’. The schema is an instance of |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchema.java">org.apache.calcite.adapter.csv.CsvSchema</a> |
| and implements the Calcite interface |
| <a href="/apidocs/org/apache/calcite/schema/Schema.html">Schema</a>.</p> |
| |
| <p>A schema’s job is to produce a list of tables. (It can also list sub-schemas and |
| table-functions, but these are advanced features and calcite-example-csv does |
| not support them.) The tables implement Calcite’s |
| <a href="/apidocs/org/apache/calcite/schema/Table.html">Table</a> |
| interface. <code>CsvSchema</code> produces tables that are instances of |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTable.java">CsvTable</a> |
| and its sub-classes.</p> |
| |
| <p>Here is the relevant code from <code>CsvSchema</code>, overriding the |
| <code><a href="/apidocs/org/apache/calcite/schema/impl/AbstractSchema.html#getTableMap()">getTableMap()</a></code> |
| method in the <code>AbstractSchema</code> base class.</p> |
| |
| <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">protected</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Table</span><span class="o">></span> <span class="nf">getTableMap</span><span class="o">()</span> <span class="o">{</span> |
| <span class="c1">// Look for files in the directory ending in ".csv", ".csv.gz", ".json",</span> |
| <span class="c1">// ".json.gz".</span> |
| <span class="n">File</span><span class="o">[]</span> <span class="n">files</span> <span class="o">=</span> <span class="n">directoryFile</span><span class="o">.</span><span class="na">listFiles</span><span class="o">(</span> |
| <span class="k">new</span> <span class="nf">FilenameFilter</span><span class="o">()</span> <span class="o">{</span> |
| <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accept</span><span class="o">(</span><span class="n">File</span> <span class="n">dir</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">{</span> |
| <span class="kd">final</span> <span class="n">String</span> <span class="n">nameSansGz</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="s">".gz"</span><span class="o">);</span> |
| <span class="k">return</span> <span class="n">nameSansGz</span><span class="o">.</span><span class="na">endsWith</span><span class="o">(</span><span class="s">".csv"</span><span class="o">)</span> |
| <span class="o">||</span> <span class="n">nameSansGz</span><span class="o">.</span><span class="na">endsWith</span><span class="o">(</span><span class="s">".json"</span><span class="o">);</span> |
| <span class="o">}</span> |
| <span class="o">});</span> |
| <span class="k">if</span> <span class="o">(</span><span class="n">files</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"directory "</span> <span class="o">+</span> <span class="n">directoryFile</span> <span class="o">+</span> <span class="s">" not found"</span><span class="o">);</span> |
| <span class="n">files</span> <span class="o">=</span> <span class="k">new</span> <span class="n">File</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span> |
| <span class="o">}</span> |
| <span class="c1">// Build a map from table name to table; each file becomes a table.</span> |
| <span class="kd">final</span> <span class="n">ImmutableMap</span><span class="o">.</span><span class="na">Builder</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Table</span><span class="o">></span> <span class="n">builder</span> <span class="o">=</span> <span class="n">ImmutableMap</span><span class="o">.</span><span class="na">builder</span><span class="o">();</span> |
| <span class="k">for</span> <span class="o">(</span><span class="n">File</span> <span class="n">file</span> <span class="o">:</span> <span class="n">files</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">String</span> <span class="n">tableName</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">file</span><span class="o">.</span><span class="na">getName</span><span class="o">(),</span> <span class="s">".gz"</span><span class="o">);</span> |
| <span class="kd">final</span> <span class="n">String</span> <span class="n">tableNameSansJson</span> <span class="o">=</span> <span class="n">trimOrNull</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="s">".json"</span><span class="o">);</span> |
| <span class="k">if</span> <span class="o">(</span><span class="n">tableNameSansJson</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">JsonTable</span> <span class="n">table</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JsonTable</span><span class="o">(</span><span class="n">file</span><span class="o">);</span> |
| <span class="n">builder</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">tableNameSansJson</span><span class="o">,</span> <span class="n">table</span><span class="o">);</span> |
| <span class="k">continue</span><span class="o">;</span> |
| <span class="o">}</span> |
| <span class="n">tableName</span> <span class="o">=</span> <span class="n">trim</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="s">".csv"</span><span class="o">);</span> |
| <span class="kd">final</span> <span class="n">Table</span> <span class="n">table</span> <span class="o">=</span> <span class="n">createTable</span><span class="o">(</span><span class="n">file</span><span class="o">);</span> |
| <span class="n">builder</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">tableName</span><span class="o">,</span> <span class="n">table</span><span class="o">);</span> |
| <span class="o">}</span> |
| <span class="k">return</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span> |
| <span class="o">}</span> |
| |
| <span class="cm">/** Creates different sub-type of table based on the "flavor" attribute. */</span> |
| <span class="kd">private</span> <span class="n">Table</span> <span class="nf">createTable</span><span class="o">(</span><span class="n">File</span> <span class="n">file</span><span class="o">)</span> <span class="o">{</span> |
| <span class="k">switch</span> <span class="o">(</span><span class="n">flavor</span><span class="o">)</span> <span class="o">{</span> |
| <span class="k">case</span> <span class="nl">TRANSLATABLE:</span> |
| <span class="k">return</span> <span class="k">new</span> <span class="nf">CsvTranslatableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span> |
| <span class="k">case</span> <span class="nl">SCANNABLE:</span> |
| <span class="k">return</span> <span class="k">new</span> <span class="nf">CsvScannableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span> |
| <span class="k">case</span> <span class="nl">FILTERABLE:</span> |
| <span class="k">return</span> <span class="k">new</span> <span class="nf">CsvFilterableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="kc">null</span><span class="o">);</span> |
| <span class="k">default</span><span class="o">:</span> |
| <span class="k">throw</span> <span class="k">new</span> <span class="nf">AssertionError</span><span class="o">(</span><span class="s">"Unknown flavor "</span> <span class="o">+</span> <span class="n">flavor</span><span class="o">);</span> |
| <span class="o">}</span> |
| <span class="o">}</span></code></pre></figure> |
| |
| <p>The schema scans the directory and finds all files whose name ends |
| with “.csv” and creates tables for them. In this case, the directory |
| is <code>target/test-classes/sales</code> and contains files |
| <code>EMPS.csv</code> and <code>DEPTS.csv</code>, which these become |
| the tables <code>EMPS</code> and <code>DEPTS</code>.</p> |
| |
| <h2 id="tables-and-views-in-schemas">Tables and views in schemas</h2> |
| |
| <p>Note how we did not need to define any tables in the model; the schema |
| generated the tables automatically.</p> |
| |
| <p>You can define extra tables, |
| beyond those that are created automatically, |
| using the <code>tables</code> property of a schema.</p> |
| |
| <p>Let’s see how to create |
| an important and useful type of table, namely a view.</p> |
| |
| <p>A view looks like a table when you are writing a query, but it doesn’t store data. |
| It derives its result by executing a query. |
| The view is expanded while the query is being planned, so the query planner |
| can often perform optimizations like removing expressions from the SELECT |
| clause that are not used in the final result.</p> |
| |
| <p>Here is a schema that defines a view:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'SALES'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvSchemaFactory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">directory</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales'</span><span class="w"> |
| </span><span class="p">},</span><span class="w"> |
| </span><span class="err">tables</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FEMALE_EMPS'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'view'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">sql</span><span class="p">:</span><span class="w"> </span><span class="err">'SELECT</span><span class="w"> </span><span class="err">*</span><span class="w"> </span><span class="err">FROM</span><span class="w"> </span><span class="err">emps</span><span class="w"> </span><span class="err">WHERE</span><span class="w"> </span><span class="err">gender</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">\'F\''</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>The line <code>type: 'view'</code> tags <code>FEMALE_EMPS</code> as a view, |
| as opposed to a regular table or a custom table. |
| Note that single-quotes within the view definition are escaped using a |
| back-slash, in the normal way for JSON.</p> |
| |
| <p>JSON doesn’t make it easy to author long strings, so Calcite supports an |
| alternative syntax. If your view has a long SQL statement, you can instead |
| supply a list of lines rather than a single string:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FEMALE_EMPS'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'view'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">sql</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="err">'SELECT</span><span class="w"> </span><span class="err">*</span><span class="w"> </span><span class="err">FROM</span><span class="w"> </span><span class="err">emps'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">'WHERE</span><span class="w"> </span><span class="err">gender</span><span class="w"> </span><span class="err">=</span><span class="w"> </span><span class="err">\'F\''</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>Now we have defined a view, we can use it in queries just as if it were a table:</p> |
| |
| <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">></span> <span class="k">SELECT</span> <span class="n">e</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">d</span><span class="p">.</span><span class="n">name</span> <span class="k">FROM</span> <span class="n">female_emps</span> <span class="k">AS</span> <span class="n">e</span> <span class="k">JOIN</span> <span class="n">depts</span> <span class="k">AS</span> <span class="n">d</span> <span class="k">on</span> <span class="n">e</span><span class="p">.</span><span class="n">deptno</span> <span class="o">=</span> <span class="n">d</span><span class="p">.</span><span class="n">deptno</span><span class="p">;</span> |
| <span class="o">+</span><span class="c1">--------+------------+</span> |
| <span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">--------+------------+</span> |
| <span class="o">|</span> <span class="n">Wilma</span> <span class="o">|</span> <span class="n">Marketing</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">--------+------------+</span></code></pre></figure> |
| |
| <h2 id="custom-tables">Custom tables</h2> |
| |
| <p>Custom tables are tables whose implementation is driven by user-defined code. |
| They don’t need to live in a custom schema.</p> |
| |
| <p>There is an example in <code>model-with-custom-table.json</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">tables</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'EMPS'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.csv.CsvTableFactory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">file</span><span class="p">:</span><span class="w"> </span><span class="err">'target/test-classes/sales/EMPS.csv.gz'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">flavor</span><span class="p">:</span><span class="w"> </span><span class="s2">"scannable"</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>We can query the table in the usual way:</p> |
| |
| <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">></span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">model</span><span class="o">-</span><span class="k">with</span><span class="o">-</span><span class="n">custom</span><span class="o">-</span><span class="k">table</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span> |
| <span class="n">sqlline</span><span class="o">></span> <span class="k">SELECT</span> <span class="n">empno</span><span class="p">,</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">custom_table</span><span class="p">.</span><span class="n">emps</span><span class="p">;</span> |
| <span class="o">+</span><span class="c1">--------+--------+</span> |
| <span class="o">|</span> <span class="n">EMPNO</span> <span class="o">|</span> <span class="n">NAME</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">--------+--------+</span> |
| <span class="o">|</span> <span class="mi">100</span> <span class="o">|</span> <span class="n">Fred</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="mi">110</span> <span class="o">|</span> <span class="n">Eric</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="mi">110</span> <span class="o">|</span> <span class="n">John</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="mi">120</span> <span class="o">|</span> <span class="n">Wilma</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="mi">130</span> <span class="o">|</span> <span class="n">Alice</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">--------+--------+</span></code></pre></figure> |
| |
| <p>The schema is a regular one, and contains a custom table powered by |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableFactory.java">org.apache.calcite.adapter.csv.CsvTableFactory</a>, |
| which implements the Calcite interface |
| <a href="/apidocs/org/apache/calcite/schema/TableFactory.html">TableFactory</a>. |
| Its <code>create</code> method instantiates a <code>CsvScannableTable</code>, |
| passing in the <code>file</code> argument from the model file:</p> |
| |
| <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="n">CsvTable</span> <span class="nf">create</span><span class="o">(</span><span class="n">SchemaPlus</span> <span class="n">schema</span><span class="o">,</span> <span class="n">String</span> <span class="n">name</span><span class="o">,</span> |
| <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">map</span><span class="o">,</span> <span class="n">RelDataType</span> <span class="n">rowType</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">String</span> <span class="n">fileName</span> <span class="o">=</span> <span class="o">(</span><span class="n">String</span><span class="o">)</span> <span class="n">map</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"file"</span><span class="o">);</span> |
| <span class="kd">final</span> <span class="n">File</span> <span class="n">file</span> <span class="o">=</span> <span class="k">new</span> <span class="n">File</span><span class="o">(</span><span class="n">fileName</span><span class="o">);</span> |
| <span class="kd">final</span> <span class="n">RelProtoDataType</span> <span class="n">protoRowType</span> <span class="o">=</span> |
| <span class="n">rowType</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">?</span> <span class="n">RelDataTypeImpl</span><span class="o">.</span><span class="na">proto</span><span class="o">(</span><span class="n">rowType</span><span class="o">)</span> <span class="o">:</span> <span class="kc">null</span><span class="o">;</span> |
| <span class="k">return</span> <span class="k">new</span> <span class="nf">CsvScannableTable</span><span class="o">(</span><span class="n">file</span><span class="o">,</span> <span class="n">protoRowType</span><span class="o">);</span> |
| <span class="o">}</span></code></pre></figure> |
| |
| <p>Implementing a custom table is often a simpler alternative to implementing |
| a custom schema. Both approaches might end up creating a similar implementation |
| of the <code>Table</code> interface, but for the custom table you don’t |
| need to implement metadata discovery. (<code>CsvTableFactory</code> |
| creates a <code>CsvScannableTable</code>, just as <code>CsvSchema</code> does, |
| but the table implementation does not scan the filesystem for .csv files.)</p> |
| |
| <p>Custom tables require more work for the author of the model (the author |
| needs to specify each table and its file explicitly) but also give the author |
| more control (say, providing different parameters for each table).</p> |
| |
| <h2 id="comments-in-models">Comments in models</h2> |
| |
| <p>Models can include comments using <code class="highlighter-rouge">/* ... */</code> and <code class="highlighter-rouge">//</code> syntax:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">/*</span><span class="w"> </span><span class="err">Multi-line</span><span class="w"> |
| </span><span class="err">comment.</span><span class="w"> </span><span class="err">*/</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'CUSTOM_TABLE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">//</span><span class="w"> </span><span class="err">Single-line</span><span class="w"> </span><span class="err">comment.</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="err">..</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>(Comments are not standard JSON, but are a harmless extension.)</p> |
| |
| <h2 id="optimizing-queries-using-planner-rules">Optimizing queries using planner rules</h2> |
| |
| <p>The table implementations we have seen so far are fine as long as the tables |
| don’t contain a great deal of data. But if your customer table has, say, a |
| hundred columns and a million rows, you would rather that the system did not |
| retrieve all of the data for every query. You would like Calcite to negotiate |
| with the adapter and find a more efficient way of accessing the data.</p> |
| |
| <p>This negotiation is a simple form of query optimization. Calcite supports query |
| optimization by adding <i>planner rules</i>. Planner rules operate by |
| looking for patterns in the query parse tree (for instance a project on top |
| of a certain kind of table), and replacing the matched nodes in the tree by |
| a new set of nodes which implement the optimization.</p> |
| |
| <p>Planner rules are also extensible, like schemas and tables. So, if you have a |
| data store that you want to access via SQL, you first define a custom table or |
| schema, and then you define some rules to make the access efficient.</p> |
| |
| <p>To see this in action, let’s use a planner rule to access |
| a subset of columns from a CSV file. Let’s run the same query against two very |
| similar schemas:</p> |
| |
| <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">sqlline</span><span class="o">></span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">model</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span> |
| <span class="n">sqlline</span><span class="o">></span> <span class="k">explain</span> <span class="n">plan</span> <span class="k">for</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">emps</span><span class="p">;</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span> |
| <span class="o">|</span> <span class="n">PLAN</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span> |
| <span class="o">|</span> <span class="n">EnumerableCalcRel</span><span class="p">(</span><span class="n">expr</span><span class="o">#</span><span class="mi">0</span><span class="p">..</span><span class="mi">9</span><span class="o">=</span><span class="p">[</span><span class="err">{</span><span class="n">inputs</span><span class="err">}</span><span class="p">],</span> <span class="n">NAME</span><span class="o">=</span><span class="p">[</span><span class="err">$</span><span class="n">t1</span><span class="p">])</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="n">EnumerableTableScan</span><span class="p">(</span><span class="k">table</span><span class="o">=</span><span class="p">[[</span><span class="n">SALES</span><span class="p">,</span> <span class="n">EMPS</span><span class="p">]])</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span> |
| <span class="n">sqlline</span><span class="o">></span> <span class="o">!</span><span class="k">connect</span> <span class="n">jdbc</span><span class="p">:</span><span class="n">calcite</span><span class="p">:</span><span class="n">model</span><span class="o">=</span><span class="n">target</span><span class="o">/</span><span class="n">test</span><span class="o">-</span><span class="n">classes</span><span class="o">/</span><span class="n">smart</span><span class="p">.</span><span class="n">json</span> <span class="k">admin</span> <span class="k">admin</span> |
| <span class="n">sqlline</span><span class="o">></span> <span class="k">explain</span> <span class="n">plan</span> <span class="k">for</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">emps</span><span class="p">;</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span> |
| <span class="o">|</span> <span class="n">PLAN</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span> |
| <span class="o">|</span> <span class="n">EnumerableCalcRel</span><span class="p">(</span><span class="n">expr</span><span class="o">#</span><span class="mi">0</span><span class="p">..</span><span class="mi">9</span><span class="o">=</span><span class="p">[</span><span class="err">{</span><span class="n">inputs</span><span class="err">}</span><span class="p">],</span> <span class="n">NAME</span><span class="o">=</span><span class="p">[</span><span class="err">$</span><span class="n">t1</span><span class="p">])</span> <span class="o">|</span> |
| <span class="o">|</span> <span class="n">CsvTableScan</span><span class="p">(</span><span class="k">table</span><span class="o">=</span><span class="p">[[</span><span class="n">SALES</span><span class="p">,</span> <span class="n">EMPS</span><span class="p">]])</span> <span class="o">|</span> |
| <span class="o">+</span><span class="c1">-----------------------------------------------------+</span></code></pre></figure> |
| |
| <p>What causes the difference in plan? Let’s follow the trail of evidence. In the |
| <code>smart.json</code> model file, there is just one extra line:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="err">flavor</span><span class="p">:</span><span class="w"> </span><span class="s2">"translatable"</span></code></pre></figure> |
| |
| <p>This causes a <code>CsvSchema</code> to be created with |
| <code>flavor = TRANSLATABLE</code>, |
| and its <code>createTable</code> method creates instances of |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable.java">CsvTranslatableTable</a> |
| rather than a <code>CsvScannableTable</code>.</p> |
| |
| <p><code>CsvTranslatableTable</code> implements the |
| <code><a href="/apidocs/org/apache/calcite/schema/TranslatableTable.html#toRel()">TranslatableTable.toRel()</a></code> |
| method to create |
| <a href="https://github.com/apache/calcite/blob/master/example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableScan.java">CsvTableScan</a>. |
| Table scans are the leaves of a query operator tree. |
| The usual implementation is |
| <code><a href="/apidocs/org/apache/calcite/adapter/enumerable/EnumerableTableScan.html">EnumerableTableScan</a></code>, |
| but we have created a distinctive sub-type that will cause rules to fire.</p> |
| |
| <p>Here is the rule in its entirety:</p> |
| |
| <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">CsvProjectTableScanRule</span> <span class="kd">extends</span> <span class="n">RelOptRule</span> <span class="o">{</span> |
| <span class="kd">public</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">CsvProjectTableScanRule</span> <span class="n">INSTANCE</span> <span class="o">=</span> |
| <span class="k">new</span> <span class="nf">CsvProjectTableScanRule</span><span class="o">();</span> |
| |
| <span class="kd">private</span> <span class="nf">CsvProjectTableScanRule</span><span class="o">()</span> <span class="o">{</span> |
| <span class="kd">super</span><span class="o">(</span> |
| <span class="n">operand</span><span class="o">(</span><span class="n">Project</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> |
| <span class="n">operand</span><span class="o">(</span><span class="n">CsvTableScan</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">none</span><span class="o">())),</span> |
| <span class="s">"CsvProjectTableScanRule"</span><span class="o">);</span> |
| <span class="o">}</span> |
| |
| <span class="nd">@Override</span> |
| <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onMatch</span><span class="o">(</span><span class="n">RelOptRuleCall</span> <span class="n">call</span><span class="o">)</span> <span class="o">{</span> |
| <span class="kd">final</span> <span class="n">Project</span> <span class="n">project</span> <span class="o">=</span> <span class="n">call</span><span class="o">.</span><span class="na">rel</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span> |
| <span class="kd">final</span> <span class="n">CsvTableScan</span> <span class="n">scan</span> <span class="o">=</span> <span class="n">call</span><span class="o">.</span><span class="na">rel</span><span class="o">(</span><span class="mi">1</span><span class="o">);</span> |
| <span class="kt">int</span><span class="o">[]</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">getProjectFields</span><span class="o">(</span><span class="n">project</span><span class="o">.</span><span class="na">getProjects</span><span class="o">());</span> |
| <span class="k">if</span> <span class="o">(</span><span class="n">fields</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span> |
| <span class="c1">// Project contains expressions more complex than just field references.</span> |
| <span class="k">return</span><span class="o">;</span> |
| <span class="o">}</span> |
| <span class="n">call</span><span class="o">.</span><span class="na">transformTo</span><span class="o">(</span> |
| <span class="k">new</span> <span class="nf">CsvTableScan</span><span class="o">(</span> |
| <span class="n">scan</span><span class="o">.</span><span class="na">getCluster</span><span class="o">(),</span> |
| <span class="n">scan</span><span class="o">.</span><span class="na">getTable</span><span class="o">(),</span> |
| <span class="n">scan</span><span class="o">.</span><span class="na">csvTable</span><span class="o">,</span> |
| <span class="n">fields</span><span class="o">));</span> |
| <span class="o">}</span> |
| |
| <span class="kd">private</span> <span class="kt">int</span><span class="o">[]</span> <span class="nf">getProjectFields</span><span class="o">(</span><span class="n">List</span><span class="o"><</span><span class="n">RexNode</span><span class="o">></span> <span class="n">exps</span><span class="o">)</span> <span class="o">{</span> |
| <span class="kd">final</span> <span class="kt">int</span><span class="o">[]</span> <span class="n">fields</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[</span><span class="n">exps</span><span class="o">.</span><span class="na">size</span><span class="o">()];</span> |
| <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">exps</span><span class="o">.</span><span class="na">size</span><span class="o">();</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span> |
| <span class="kd">final</span> <span class="n">RexNode</span> <span class="n">exp</span> <span class="o">=</span> <span class="n">exps</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">i</span><span class="o">);</span> |
| <span class="k">if</span> <span class="o">(</span><span class="n">exp</span> <span class="k">instanceof</span> <span class="n">RexInputRef</span><span class="o">)</span> <span class="o">{</span> |
| <span class="n">fields</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="o">((</span><span class="n">RexInputRef</span><span class="o">)</span> <span class="n">exp</span><span class="o">).</span><span class="na">getIndex</span><span class="o">();</span> |
| <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> |
| <span class="k">return</span> <span class="kc">null</span><span class="o">;</span> <span class="c1">// not a simple projection</span> |
| <span class="o">}</span> |
| <span class="o">}</span> |
| <span class="k">return</span> <span class="n">fields</span><span class="o">;</span> |
| <span class="o">}</span> |
| <span class="o">}</span></code></pre></figure> |
| |
| <p>The constructor declares the pattern of relational expressions that will cause |
| the rule to fire.</p> |
| |
| <p>The <code>onMatch</code> method generates a new relational expression and calls |
| <code><a href="/apidocs/org/apache/calcite/plan/RelOptRuleCall.html#transformTo(org.apache.calcite.rel.RelNode)">RelOptRuleCall.transformTo()</a></code> |
| to indicate that the rule has fired successfully.</p> |
| |
| <h2 id="the-query-optimization-process">The query optimization process</h2> |
| |
| <p>There’s a lot to say about how clever Calcite’s query planner is, but we won’t |
| say it here. The cleverness is designed to take the burden off you, the writer |
| of planner rules.</p> |
| |
| <p>First, Calcite doesn’t fire rules in a prescribed order. The query optimization |
| process follows many branches of a branching tree, just like a chess playing |
| program examines many possible sequences of moves. If rules A and B both match a |
| given section of the query operator tree, then Calcite can fire both.</p> |
| |
| <p>Second, Calcite uses cost in choosing between plans, but the cost model doesn’t |
| prevent rules from firing which may seem to be more expensive in the short term.</p> |
| |
| <p>Many optimizers have a linear optimization scheme. Faced with a choice between |
| rule A and rule B, as above, such an optimizer needs to choose immediately. It |
| might have a policy such as “apply rule A to the whole tree, then apply rule B |
| to the whole tree”, or apply a cost-based policy, applying the rule that |
| produces the cheaper result.</p> |
| |
| <p>Calcite doesn’t require such compromises. |
| This makes it simple to combine various sets of rules. |
| If, say you want to combine rules to recognize materialized views with rules to |
| read from CSV and JDBC source systems, you just give Calcite the set of all |
| rules and tell it to go at it.</p> |
| |
| <p>Calcite does use a cost model. The cost model decides which plan to ultimately |
| use, and sometimes to prune the search tree to prevent the search space from |
| exploding, but it never forces you to choose between rule A and rule B. This is |
| important, because it avoids falling into local minima in the search space that |
| are not actually optimal.</p> |
| |
| <p>Also (you guessed it) the cost model is pluggable, as are the table and query |
| operator statistics it is based upon. But that can be a subject for later.</p> |
| |
| <h2 id="jdbc-adapter">JDBC adapter</h2> |
| |
| <p>The JDBC adapter maps a schema in a JDBC data source as a Calcite schema.</p> |
| |
| <p>For example, this schema reads from a MySQL “foodmart” database:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>(The FoodMart database will be familiar to those of you who have used |
| the Mondrian OLAP engine, because it is Mondrian’s main test data |
| set. To load the data set, follow <a href="https://mondrian.pentaho.com/documentation/installation.php#2_Set_up_test_data">Mondrian’s |
| installation instructions</a>.)</p> |
| |
| <p><b>Current limitations</b>: The JDBC adapter currently only pushes |
| down table scan operations; all other processing (filtering, joins, |
| aggregations and so forth) occurs within Calcite. Our goal is to push |
| down as much processing as possible to the source system, translating |
| syntax, data types and built-in functions as we go. If a Calcite query |
| is based on tables from a single JDBC database, in principle the whole |
| query should go to that database. If tables are from multiple JDBC |
| sources, or a mixture of JDBC and non-JDBC, Calcite will use the most |
| efficient distributed query approach that it can.</p> |
| |
| <h2 id="the-cloning-jdbc-adapter">The cloning JDBC adapter</h2> |
| |
| <p>The cloning JDBC adapter creates a hybrid database. The data is |
| sourced from a JDBC database but is read into in-memory tables the |
| first time each table is accessed. Calcite evaluates queries based on |
| those in-memory tables, effectively a cache of the database.</p> |
| |
| <p>For example, the following model reads tables from a MySQL |
| “foodmart” database:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.clone.CloneSchema$Factory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>Another technique is to build a clone schema on top of an existing |
| schema. You use the <code>source</code> property to reference a schema |
| defined earlier in the model, like this:</p> |
| |
| <figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w"> |
| </span><span class="err">version</span><span class="p">:</span><span class="w"> </span><span class="err">'</span><span class="mf">1.0</span><span class="err">'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">defaultSchema</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">schemas</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">jdbcDriver</span><span class="p">:</span><span class="w"> </span><span class="err">'com.mysql.jdbc.Driver'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUrl</span><span class="p">:</span><span class="w"> </span><span class="err">'jdbc</span><span class="p">:</span><span class="err">mysql</span><span class="p">:</span><span class="err">//localhost/foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcUser</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">jdbcPassword</span><span class="p">:</span><span class="w"> </span><span class="err">'foodmart'</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">},</span><span class="w"> |
| </span><span class="p">{</span><span class="w"> |
| </span><span class="err">name</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART_CLONE'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">type</span><span class="p">:</span><span class="w"> </span><span class="err">'custom'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">factory</span><span class="p">:</span><span class="w"> </span><span class="err">'org.apache.calcite.adapter.clone.CloneSchema$Factory'</span><span class="p">,</span><span class="w"> |
| </span><span class="err">operand</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> |
| </span><span class="err">source</span><span class="p">:</span><span class="w"> </span><span class="err">'FOODMART'</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">}</span><span class="w"> |
| </span><span class="p">]</span><span class="w"> |
| </span><span class="p">}</span></code></pre></figure> |
| |
| <p>You can use this approach to create a clone schema on any type of |
| schema, not just JDBC.</p> |
| |
| <p>The cloning adapter isn’t the be-all and end-all. We plan to develop |
| more sophisticated caching strategies, and a more complete and |
| efficient implementation of in-memory tables, but for now the cloning |
| JDBC adapter shows what is possible and allows us to try out our |
| initial implementations.</p> |
| |
| <h2 id="further-topics">Further topics</h2> |
| |
| <p>There are many other ways to extend Calcite not yet described in this tutorial. |
| The <a href="adapter.html">adapter specification</a> describes the APIs involved.</p> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <div class="section-nav"> |
| <div class="left align-right"> |
| |
| |
| |
| <a href="/docs/index.html" class="prev">Previous</a> |
| |
| </div> |
| <div class="right align-left"> |
| |
| |
| |
| |
| |
| <a href="/docs/algebra.html" class="next">Next</a> |
| |
| </div> |
| </div> |
| <div class="clear"></div> |
| |
| |
| </article> |
| </div> |
| |
| <div class="unit one-fifth hide-on-mobiles"> |
| <aside> |
| |
| <h4>Overview</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/index.html">Background</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class="current"><a href="/docs/tutorial.html">Tutorial</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/algebra.html">Algebra</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Advanced</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/adapter.html">Adapters</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/spatial.html">Spatial</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/stream.html">Streaming</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/materialized_views.html">Materialized Views</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/lattice.html">Lattices</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Avatica</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_overview.html">Overview</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_roadmap.html">Roadmap</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_json_reference.html">JSON Reference</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_protobuf_reference.html">Protobuf Reference</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Reference</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/reference.html">SQL language</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/model.html">JSON/YAML models</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/howto.html">HOWTO</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Meta</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/history.html">History</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/powered_by.html">Powered by Calcite</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/apidocs">API</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/testapidocs">Test API</a></li> |
| |
| |
| </ul> |
| |
| |
| </aside> |
| </div> |
| |
| |
| <div class="clear"></div> |
| |
| </div> |
| </section> |
| |
| |
| <footer role="contentinfo"> |
| <div id="poweredby"> |
| <a href="http://www.apache.org/"> |
| <span class="sr-only">Apache</span> |
| <img src="/img/feather.png" width="190" height="77" alt="Apache Logo"></a> |
| </div> |
| <div id="copyright"> |
| <p>The contents of this website are Copyright © 2019 |
| <a href="https://www.apache.org/">Apache Software Foundation</a> |
| under the terms of |
| the <a href="https://www.apache.org/licenses/"> |
| Apache License v2</a>. Apache Calcite and its logo are |
| trademarks of the Apache Software Foundation.</p> |
| </div> |
| </footer> |
| |
| <script> |
| var anchorForId = function (id) { |
| var anchor = document.createElement("a"); |
| anchor.className = "header-link"; |
| anchor.href = "#" + id; |
| anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>"; |
| anchor.title = "Permalink"; |
| return anchor; |
| }; |
| |
| var linkifyAnchors = function (level, containingElement) { |
| var headers = containingElement.getElementsByTagName("h" + level); |
| for (var h = 0; h < headers.length; h++) { |
| var header = headers[h]; |
| |
| if (typeof header.id !== "undefined" && header.id !== "") { |
| header.appendChild(anchorForId(header.id)); |
| } |
| } |
| }; |
| |
| document.onreadystatechange = function () { |
| if (this.readyState === "complete") { |
| var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0]; |
| if (!contentBlock) { |
| return; |
| } |
| for (var level = 1; level <= 6; level++) { |
| linkifyAnchors(level, contentBlock); |
| } |
| } |
| }; |
| </script> |
| |
| |
| </body> |
| </html> |