| <!DOCTYPE HTML> |
| <html lang="en-US"> |
| <head> |
| <meta charset="UTF-8"> |
| <title>OS adapter and sqlsh</title> |
| <meta name="viewport" content="width=device-width,initial-scale=1"> |
| <meta name="generator" content="Jekyll v3.7.3"> |
| <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> |
| <link rel="stylesheet" href="/css/screen.css"> |
| <link rel="icon" type="image/x-icon" href="/favicon.ico"> |
| <!--[if lt IE 9]> |
| <script src="/js/html5shiv.min.js"></script> |
| <script src="/js/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| |
| |
| <body class="wrap"> |
| <header role="banner"> |
| <nav class="mobile-nav show-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/downloads/">Download</a> |
| </li> |
| <li class=""> |
| <a href="/community/">Community</a> |
| </li> |
| <li class=""> |
| <a href="/develop/">Develop</a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/docs/">Docs</a> |
| </li> |
| </ul> |
| |
| </nav> |
| <div class="grid"> |
| <div class="unit one-third center-on-mobiles"> |
| <h1> |
| <a href="/"> |
| <span class="sr-only">Apache Calcite</span> |
| <img src="/img/logo.png" width="226" height="140" alt="Calcite Logo"> |
| </a> |
| </h1> |
| </div> |
| <nav class="main-nav unit two-thirds hide-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/downloads/">Download</a> |
| </li> |
| <li class=""> |
| <a href="/community/">Community</a> |
| </li> |
| <li class=""> |
| <a href="/develop/">Develop</a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/docs/">Docs</a> |
| </li> |
| </ul> |
| |
| </nav> |
| </div> |
| </header> |
| |
| |
| <section class="docs"> |
| <div class="grid"> |
| |
| <div class="docs-nav-mobile unit whole show-on-mobiles"> |
| <select onchange="if (this.value) window.location.href=this.value"> |
| <option value="">Navigate the docs…</option> |
| <optgroup label="Overview"> |
| </optgroup> |
| <optgroup label="Advanced"> |
| </optgroup> |
| <optgroup label="Avatica"> |
| </optgroup> |
| <optgroup label="Reference"> |
| </optgroup> |
| <optgroup label="Meta"> |
| </optgroup> |
| |
| </select> |
| </div> |
| |
| |
| <div class="unit four-fifths"> |
| <article> |
| <h1>OS adapter and sqlsh</h1> |
| <!-- |
| |
| --> |
| |
| <h1 id="overview">Overview</h1> |
| |
| <p>The OS (operating system) adapter allows you to access data in your operating |
| system and environment using SQL queries.</p> |
| |
| <p>It aims to solve similar problems that have traditionally been solved using UNIX |
| command pipelines, but with the power and type-safety of SQL.</p> |
| |
| <p>The adapter also includes a wrapper called <code class="highlighter-rouge">sqlsh</code> that allows you to execute |
| commands from your favorite shell.</p> |
| |
| <h1 id="security-warning">Security warning</h1> |
| |
| <p>The OS adapter launches processes, and is potentially a security loop-hole. |
| It is included in Calcite’s “plus” module, which is not enabled by default. |
| You must think carefully before enabling it in a security-sensitive situation.</p> |
| |
| <h1 id="compatibility">Compatibility</h1> |
| |
| <p>We try to support all tables on every operating system, and to make sure that |
| the tables have the same columns. But we rely heavily on operating system |
| commands, and these differ widely. So:</p> |
| |
| <ul> |
| <li>These commands only work on Linux and macOS (not Windows, even with Cygwin);</li> |
| <li><code class="highlighter-rouge">vmstat</code> has very different columns between Linux and macOS;</li> |
| <li><code class="highlighter-rouge">files</code> and <code class="highlighter-rouge">ps</code> have the same column names but semantics differ;</li> |
| <li>Other commands work largely the same.</li> |
| </ul> |
| |
| <h1 id="a-simple-example">A simple example</h1> |
| |
| <p>Every bash hacker knows that to find the 3 largest files you type</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-type</span> f <span class="nt">-print0</span> | xargs <span class="nt">-0</span> <span class="nb">ls</span> <span class="nt">-l</span> | sort <span class="nt">-nr</span> <span class="nt">-k</span> 5 | head <span class="nt">-3</span> |
| <span class="nt">-rw-r--r--</span> 1 jhyde jhyde 194877 Jul 16 16:10 ./validate/SqlValidatorImpl.java |
| <span class="nt">-rw-r--r--</span> 1 jhyde jhyde 73826 Jul 4 21:51 ./fun/SqlStdOperatorTable.java |
| <span class="nt">-rw-r--r--</span> 1 jhyde jhyde 39214 Jul 4 21:51 ./type/SqlTypeUtil.java</code></pre></figure> |
| |
| <p>This actually a pipeline of relational operations, each tuple represented |
| by line of space-separated fields. What if we were able to access the list of |
| files as a relation and use it in a SQL query? And what if we could easily |
| execute that SQL query from the shell? This is what <code class="highlighter-rouge">sqlsh</code> does:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>size, path from files where <span class="nb">type</span> <span class="o">=</span> <span class="se">\'</span>f<span class="se">\'</span> order by size desc limit 3 |
| 194877 validate/SqlValidatorImpl.java |
| 73826 fun/SqlStdOperatorTable.java |
| 39214 <span class="nb">type</span>/SqlTypeUtil.java</code></pre></figure> |
| |
| <h1 id="sqlsh">sqlsh</h1> |
| |
| <p><code class="highlighter-rouge">sqlsh</code> launches a connection to Calcite whose default schema is the OS adapter.</p> |
| |
| <p>It uses the JAVA lexical mode, which means that unquoted table and column names |
| remain in the case that they were written. This is consistent with how shells like |
| bash behave.</p> |
| |
| <p>Shell meta-characters such as <code class="highlighter-rouge">*</code>, <code class="highlighter-rouge">></code>, <code class="highlighter-rouge"><</code>, <code class="highlighter-rouge">(</code>, and <code class="highlighter-rouge">)</code> have to be treated with |
| care. Often adding a back-slash will suffice.</p> |
| |
| <h1 id="tables-and-commands">Tables and commands</h1> |
| |
| <p>The OS adapter contains the following tables:</p> |
| |
| <ul> |
| <li><code class="highlighter-rouge">du</code> - Disk usage (based on <code class="highlighter-rouge">du</code> command)</li> |
| <li><code class="highlighter-rouge">ps</code> - Processes (based on <code class="highlighter-rouge">ps</code> command)</li> |
| <li><code class="highlighter-rouge">stdin</code> - Standard input</li> |
| <li><code class="highlighter-rouge">files</code> - Files (based on the <code class="highlighter-rouge">find</code> command)</li> |
| <li><code class="highlighter-rouge">git_commits</code> - Git commits (based on <code class="highlighter-rouge">git log</code>)</li> |
| <li><code class="highlighter-rouge">vmstat</code> - Virtual memory (based on <code class="highlighter-rouge">vmstat</code> command)</li> |
| </ul> |
| |
| <p>Most tables are implemented as views on top of table functions.</p> |
| |
| <p>New data sources are straightforward to add; please contribute yours!</p> |
| |
| <h2 id="example-du">Example: du</h2> |
| |
| <p>How many class files, and what is their total size? In <code class="highlighter-rouge">bash</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>du <span class="nt">-ka</span> <span class="nb">.</span> | <span class="nb">grep</span> <span class="s1">'\.class$'</span> | awk <span class="s1">'{size+=$1} END {print FNR, size}'</span> |
| 4416 27960</code></pre></figure> |
| |
| <p>In <code class="highlighter-rouge">sqlsh</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>count<span class="se">\(\*\)</span>, sum<span class="se">\(</span>size_k<span class="se">\)</span> from du where path like <span class="se">\'</span>%.class<span class="se">\'</span> |
| 4416 27960</code></pre></figure> |
| |
| <p>The back-slashes are necessary because <code class="highlighter-rouge">(</code>, <code class="highlighter-rouge">*</code>, <code class="highlighter-rouge">)</code>, and <code class="highlighter-rouge">'</code> are shell meta-characters.</p> |
| |
| <h2 id="example-files">Example: files</h2> |
| |
| <p>How many files and directories? In <code class="highlighter-rouge">bash</code>, you would use <code class="highlighter-rouge">find</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-printf</span> <span class="s2">"%Y %p</span><span class="se">\n</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'/test/'</span> | cut <span class="nt">-d</span><span class="s1">' '</span> <span class="nt">-f1</span> | sort | uniq <span class="nt">-c</span> |
| 143 d |
| 1336 f</code></pre></figure> |
| |
| <p>In <code class="highlighter-rouge">sqlsh</code>, use the <code class="highlighter-rouge">files</code> table:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span><span class="nb">type</span>, count<span class="se">\(\*\)</span> from files where path like <span class="se">\'</span>%/test/%<span class="se">\'</span> group by <span class="nb">type |
| </span>d 143 |
| f 1336</code></pre></figure> |
| |
| <h2 id="example-ps">Example: ps</h2> |
| |
| <p>Which users have processes running? In <code class="highlighter-rouge">sqlsh</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>distinct ps.<span class="se">\`</span>user<span class="se">\`</span> from ps |
| avahi |
| root |
| jhyde |
| syslog |
| nobody |
| daemon</code></pre></figure> |
| |
| <p>The <code class="highlighter-rouge">ps.</code> qualifier and back-quotes are necessary because USER is a SQL reserved word.</p> |
| |
| <p>Now a ‘top N’ problem: Which three users have the most processes? In <code class="highlighter-rouge">bash</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>ps aux | awk <span class="s1">'{print $1}'</span> | sort | uniq <span class="nt">-c</span> | sort <span class="nt">-nr</span> | head <span class="nt">-3</span></code></pre></figure> |
| |
| <p>In <code class="highlighter-rouge">sqlsh</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="k">select </span>count<span class="se">\(\*\)</span>, ps.<span class="se">\`</span>user<span class="se">\`</span> from ps group by ps.<span class="se">\`</span>user<span class="se">\`</span> order by 1 desc limit 3 |
| 185 root |
| 69 jhyde |
| 2 avahi</code></pre></figure> |
| |
| <h2 id="example-vmstat">Example: vmstat</h2> |
| |
| <p>How’s my memory?</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="nt">-o</span> mysql <span class="k">select</span> <span class="se">\*</span> from vmstat |
| +--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+ |
| | proc_r | proc_b | mem_swpd | mem_free | mem_buff | mem_cache | swap_si | swap_so | io_bi | io_bo | system_in | system_cs | cpu_us | cpu_sy | cpu_id | cpu_wa | cpu_st | |
| +--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+ |
| | 12 | 0 | 54220 | 5174424 | 402180 | 4402196 | 0 | 0 | 15 | 35 | 3 | 2 | 7 | 1 | 92 | 0 | 0 | |
| +--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+ |
| <span class="o">(</span>1 row<span class="o">)</span></code></pre></figure> |
| |
| <h2 id="example-explain">Example: explain</h2> |
| |
| <p>To find out what columns a table has, use <code class="highlighter-rouge">explain</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh explain plan with <span class="nb">type </span><span class="k">for select</span> <span class="se">\*</span> from du |
| size_k BIGINT NOT NULL, |
| path VARCHAR NOT NULL, |
| size_b BIGINT NOT NULL</code></pre></figure> |
| |
| <h2 id="example-git">Example: git</h2> |
| |
| <p>How many commits and distinct authors per year? |
| The <code class="highlighter-rouge">git_commits</code> table is based upon the <code class="highlighter-rouge">git log</code> command.</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash">./sqlsh <span class="k">select </span>floor<span class="se">\(</span>commit_timestamp to year<span class="se">\)</span> as y, count<span class="se">\(\*\)</span>, count<span class="se">\(</span>distinct author<span class="se">\)</span> from git_commits group by y order by 1 |
| 2012-01-01 00:00:00 180 6 |
| 2013-01-01 00:00:00 502 13 |
| 2014-01-01 00:00:00 679 36 |
| 2015-01-01 00:00:00 470 45 |
| 2016-01-01 00:00:00 465 67 |
| 2017-01-01 00:00:00 279 53</code></pre></figure> |
| |
| <p>Note that <code class="highlighter-rouge">group by y</code> is possible because <code class="highlighter-rouge">sqlsh</code> uses Calcite’s |
| <a href="/apidocs/org/apache/calcite/sql/validate/SqlConformance.html#isGroupByAlias--">lenient mode</a>.</p> |
| |
| <h2 id="example-stdin">Example: stdin</h2> |
| |
| <p>Print the stdin, adding a number to each line.</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="o">(</span><span class="nb">echo </span>cats<span class="p">;</span> <span class="nb">echo </span>and dogs<span class="o">)</span> | <span class="nb">cat</span> <span class="nt">-n</span> - |
| 1 cats |
| 2 and dogs</code></pre></figure> |
| |
| <p>In <code class="highlighter-rouge">sqlsh</code>:</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="o">(</span><span class="nb">echo </span>cats<span class="p">;</span> <span class="nb">echo </span>and dogs<span class="o">)</span> | ./sqlsh <span class="k">select</span> <span class="se">\*</span> from stdin |
| 1 cats |
| 2 and dogs</code></pre></figure> |
| |
| <h2 id="example-output-format">Example: output format</h2> |
| |
| <p>The <code class="highlighter-rouge">-o</code> option controls output format.</p> |
| |
| <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="nt">-o</span> mysql <span class="k">select </span>min<span class="se">\(</span>size_k<span class="se">\)</span>, max<span class="se">\(</span>size_k<span class="se">\)</span> from du |
| +--------+--------+ |
| | EXPR<span class="nv">$0</span> | EXPR<span class="nv">$1</span> | |
| +--------+--------+ |
| | 0 | 94312 | |
| +--------+--------+ |
| <span class="o">(</span>1 row<span class="o">)</span></code></pre></figure> |
| |
| <p>Format options:</p> |
| |
| <ul> |
| <li>spaced - spaces between fields (the default)</li> |
| <li>headers - as spaced, but with headers</li> |
| <li>csv - comma-separated values</li> |
| <li>json - JSON, one object per row</li> |
| <li>mysql - an aligned table, in the same format used by MySQL</li> |
| </ul> |
| |
| <h1 id="further-work">Further work</h1> |
| |
| <p>The OS adapter was created in |
| [<a href="https://issues.apache.org/jira/browse/CALCITE-1896">CALCITE-1896</a>] |
| but is not complete.</p> |
| |
| <p>Some ideas for further work:</p> |
| |
| <ul> |
| <li>Allow ‘-‘and ‘.’ in unquoted table names (to match typical file names)</li> |
| <li>Allow ordinal field references, for example ‘$3’. This would help for files |
| that do not have named fields, for instance <code class="highlighter-rouge">stdin</code>, but you could use them |
| even if fields have names. Also ‘$0’ to mean the whole input line.</li> |
| <li>Use the file adapter, e.g. <code class="highlighter-rouge">select * from file.scott.emp</code> would use the |
| <a href="file_adapter.html">file adapter</a> to open the file <code class="highlighter-rouge">scott/emp.csv</code></li> |
| <li>More tables based on git, e.g. branches, tags, files changed in each commit</li> |
| <li><code class="highlighter-rouge">wc</code> function, e.g. <code class="highlighter-rouge">select path, lineCount from git_ls_files cross apply wc(path)</code></li> |
| <li>Move <code class="highlighter-rouge">sqlsh</code> command, or at least the java code underneath it, |
| into <a href="https://github.com/julianhyde/sqlline">sqlline</a></li> |
| </ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| </article> |
| </div> |
| |
| <div class="unit one-fifth hide-on-mobiles"> |
| <aside> |
| |
| <h4>Overview</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/index.html">Background</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/tutorial.html">Tutorial</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/algebra.html">Algebra</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Advanced</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/adapter.html">Adapters</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/spatial.html">Spatial</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/stream.html">Streaming</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/materialized_views.html">Materialized Views</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/lattice.html">Lattices</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Avatica</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_overview.html">Overview</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_roadmap.html">Roadmap</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_json_reference.html">JSON Reference</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/avatica_protobuf_reference.html">Protobuf Reference</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Reference</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/reference.html">SQL language</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/model.html">JSON/YAML models</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/howto.html">HOWTO</a></li> |
| |
| |
| </ul> |
| |
| |
| <h4>Meta</h4> |
| |
| |
| <ul> |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/history.html">History</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/docs/powered_by.html">Powered by Calcite</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/apidocs">API</a></li> |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| <li class=""><a href="/testapidocs">Test API</a></li> |
| |
| |
| </ul> |
| |
| |
| </aside> |
| </div> |
| |
| |
| <div class="clear"></div> |
| |
| </div> |
| </section> |
| |
| |
| <footer role="contentinfo"> |
| <div id="poweredby"> |
| <a href="http://www.apache.org/"> |
| <span class="sr-only">Apache</span> |
| <img src="/img/feather.png" width="190" height="77" alt="Apache Logo"></a> |
| </div> |
| <div id="copyright"> |
| <p>The contents of this website are Copyright © 2019 |
| <a href="https://www.apache.org/">Apache Software Foundation</a> |
| under the terms of |
| the <a href="https://www.apache.org/licenses/"> |
| Apache License v2</a>. Apache Calcite and its logo are |
| trademarks of the Apache Software Foundation.</p> |
| </div> |
| </footer> |
| |
| <script> |
| var anchorForId = function (id) { |
| var anchor = document.createElement("a"); |
| anchor.className = "header-link"; |
| anchor.href = "#" + id; |
| anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>"; |
| anchor.title = "Permalink"; |
| return anchor; |
| }; |
| |
| var linkifyAnchors = function (level, containingElement) { |
| var headers = containingElement.getElementsByTagName("h" + level); |
| for (var h = 0; h < headers.length; h++) { |
| var header = headers[h]; |
| |
| if (typeof header.id !== "undefined" && header.id !== "") { |
| header.appendChild(anchorForId(header.id)); |
| } |
| } |
| }; |
| |
| document.onreadystatechange = function () { |
| if (this.readyState === "complete") { |
| var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0]; |
| if (!contentBlock) { |
| return; |
| } |
| for (var level = 1; level <= 6; level++) { |
| linkifyAnchors(level, contentBlock); |
| } |
| } |
| }; |
| </script> |
| |
| |
| </body> |
| </html> |