blob: 5ab11e864fe021011766981e83f668aff05b00d8 [file] [log] [blame]
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="UTF-8">
<title>OS adapter and sqlsh</title>
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="generator" content="Jekyll v4.2.2">
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
<link rel="stylesheet" href="/css/screen.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body class="wrap">
<header role="banner">
<div class="grid">
<div class="unit center-on-mobiles">
<h1>
<a href="/">
<span class="sr-only">Apache Calcite</span>
<img src="/img/logo.svg" alt="Calcite Logo">
</a>
</h1>
</div>
<nav class="main-nav">
<ul>
<li class="">
<a href="/">Home</a>
</li>
<li class="">
<a href="/downloads/">Download</a>
</li>
<li class="">
<a href="/community/">Community</a>
</li>
<li class="">
<a href="/develop/">Develop</a>
</li>
<li class="">
<a href="/news/">News</a>
</li>
<li class="current">
<a href="/docs/">Docs</a>
</li>
</ul>
</nav>
</div>
</header>
<section class="docs">
<div class="grid">
<div class="docs-nav-mobile unit whole show-on-mobiles">
<select onchange="if (this.value) window.location.href=this.value">
<option value="">Navigate the docs…</option>
<optgroup label="Overview">
</optgroup>
<optgroup label="Advanced">
</optgroup>
<optgroup label="Avatica">
</optgroup>
<optgroup label="Reference">
</optgroup>
<optgroup label="Meta">
</optgroup>
</select>
</div>
<div class="unit four-fifths">
<article>
<h1>OS adapter and sqlsh</h1>
<!--
-->
<h1 id="overview">Overview</h1>
<p>The OS (operating system) adapter allows you to access data in your operating
system and environment using SQL queries.</p>
<p>It aims to solve similar problems that have traditionally been solved using UNIX
command pipelines, but with the power and type-safety of SQL.</p>
<p>The adapter also includes a wrapper called <code class="language-plaintext highlighter-rouge">sqlsh</code> that allows you to execute
commands from your favorite shell.</p>
<h1 id="security-warning">Security warning</h1>
<p>The OS adapter launches processes, and is potentially a security loop-hole.
It is included in Calcite’s “plus” module, which is not enabled by default.
You must think carefully before enabling it in a security-sensitive situation.</p>
<h1 id="compatibility">Compatibility</h1>
<p>We try to support all tables on every operating system, and to make sure that
the tables have the same columns. But we rely heavily on operating system
commands, and these differ widely. So:</p>
<ul>
<li>These commands only work on Linux and macOS (not Windows, even with Cygwin);</li>
<li>
<code class="language-plaintext highlighter-rouge">vmstat</code> has very different columns between Linux and macOS;</li>
<li>
<code class="language-plaintext highlighter-rouge">files</code> and <code class="language-plaintext highlighter-rouge">ps</code> have the same column names but semantics differ;</li>
<li>Other commands work largely the same.</li>
</ul>
<h1 id="a-simple-example">A simple example</h1>
<p>Every bash hacker knows that to find the 3 largest files you type</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-type</span> f <span class="nt">-print0</span> | xargs <span class="nt">-0</span> <span class="nb">ls</span> <span class="nt">-l</span> | <span class="nb">sort</span> <span class="nt">-nr</span> <span class="nt">-k</span> 5 | <span class="nb">head</span> <span class="nt">-3</span>
<span class="nt">-rw-r--r--</span> 1 jhyde jhyde 194877 Jul 16 16:10 ./validate/SqlValidatorImpl.java
<span class="nt">-rw-r--r--</span> 1 jhyde jhyde 73826 Jul 4 21:51 ./fun/SqlStdOperatorTable.java
<span class="nt">-rw-r--r--</span> 1 jhyde jhyde 39214 Jul 4 21:51 ./type/SqlTypeUtil.java</code></pre></figure>
<p>This actually a pipeline of relational operations, each tuple represented
by line of space-separated fields. What if we were able to access the list of
files as a relation and use it in a SQL query? And what if we could easily
execute that SQL query from the shell? This is what <code class="language-plaintext highlighter-rouge">sqlsh</code> does:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>size, path from files where <span class="nb">type</span> <span class="o">=</span> <span class="se">\'</span>f<span class="se">\'</span> order by size desc limit 3
194877 validate/SqlValidatorImpl.java
73826 fun/SqlStdOperatorTable.java
39214 <span class="nb">type</span>/SqlTypeUtil.java</code></pre></figure>
<h1 id="sqlsh">sqlsh</h1>
<p><code class="language-plaintext highlighter-rouge">sqlsh</code> launches a connection to Calcite whose default schema is the OS adapter.</p>
<p>It uses the JAVA lexical mode, which means that unquoted table and column names
remain in the case that they were written. This is consistent with how shells like
bash behave.</p>
<p>Shell meta-characters such as <code class="language-plaintext highlighter-rouge">*</code>, <code class="language-plaintext highlighter-rouge">&gt;</code>, <code class="language-plaintext highlighter-rouge">&lt;</code>, <code class="language-plaintext highlighter-rouge">(</code>, and <code class="language-plaintext highlighter-rouge">)</code> have to be treated with
care. Often adding a back-slash will suffice.</p>
<h1 id="tables-and-commands">Tables and commands</h1>
<p>The OS adapter contains the following tables:</p>
<ul>
<li>
<code class="language-plaintext highlighter-rouge">du</code> - Disk usage (based on <code class="language-plaintext highlighter-rouge">du</code> command)</li>
<li>
<code class="language-plaintext highlighter-rouge">ps</code> - Processes (based on <code class="language-plaintext highlighter-rouge">ps</code> command)</li>
<li>
<code class="language-plaintext highlighter-rouge">stdin</code> - Standard input</li>
<li>
<code class="language-plaintext highlighter-rouge">files</code> - Files (based on the <code class="language-plaintext highlighter-rouge">find</code> command)</li>
<li>
<code class="language-plaintext highlighter-rouge">git_commits</code> - Git commits (based on <code class="language-plaintext highlighter-rouge">git log</code>)</li>
<li>
<code class="language-plaintext highlighter-rouge">vmstat</code> - Virtual memory (based on <code class="language-plaintext highlighter-rouge">vmstat</code> command)</li>
</ul>
<p>Most tables are implemented as views on top of table functions.</p>
<p>New data sources are straightforward to add; please contribute yours!</p>
<h2 id="example-du">Example: du</h2>
<p>How many class files, and what is their total size? In <code class="language-plaintext highlighter-rouge">bash</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">du</span> <span class="nt">-ka</span> <span class="nb">.</span> | <span class="nb">grep</span> <span class="s1">'\.class$'</span> | <span class="nb">awk</span> <span class="s1">'{size+=$1} END {print FNR, size}'</span>
4416 27960</code></pre></figure>
<p>In <code class="language-plaintext highlighter-rouge">sqlsh</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>count<span class="se">\(\*\)</span>, <span class="nb">sum</span><span class="se">\(</span>size_k<span class="se">\)</span> from <span class="nb">du </span>where path like <span class="se">\'</span>%.class<span class="se">\'</span>
4416 27960</code></pre></figure>
<p>The back-slashes are necessary because <code class="language-plaintext highlighter-rouge">(</code>, <code class="language-plaintext highlighter-rouge">*</code>, <code class="language-plaintext highlighter-rouge">)</code>, and <code class="language-plaintext highlighter-rouge">'</code> are shell meta-characters.</p>
<h2 id="example-files">Example: files</h2>
<p>How many files and directories? In <code class="language-plaintext highlighter-rouge">bash</code>, you would use <code class="language-plaintext highlighter-rouge">find</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-printf</span> <span class="s2">"%Y %p</span><span class="se">\n</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="s1">'/test/'</span> | <span class="nb">cut</span> <span class="nt">-d</span><span class="s1">' '</span> <span class="nt">-f1</span> | <span class="nb">sort</span> | <span class="nb">uniq</span> <span class="nt">-c</span>
143 d
1336 f</code></pre></figure>
<p>In <code class="language-plaintext highlighter-rouge">sqlsh</code>, use the <code class="language-plaintext highlighter-rouge">files</code> table:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span><span class="nb">type</span>, count<span class="se">\(\*\)</span> from files where path like <span class="se">\'</span>%/test/%<span class="se">\'</span> group by <span class="nb">type
</span>d 143
f 1336</code></pre></figure>
<h2 id="example-ps">Example: ps</h2>
<p>Which users have processes running? In <code class="language-plaintext highlighter-rouge">sqlsh</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh <span class="k">select </span>distinct ps.<span class="se">\`</span>user<span class="se">\`</span> from ps
avahi
root
jhyde
syslog
nobody
daemon</code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">ps.</code> qualifier and back-quotes are necessary because USER is a SQL reserved word.</p>
<p>Now a ‘top N’ problem: Which three users have the most processes? In <code class="language-plaintext highlighter-rouge">bash</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>ps aux | <span class="nb">awk</span> <span class="s1">'{print $1}'</span> | <span class="nb">sort</span> | <span class="nb">uniq</span> <span class="nt">-c</span> | <span class="nb">sort</span> <span class="nt">-nr</span> | <span class="nb">head</span> <span class="nt">-3</span></code></pre></figure>
<p>In <code class="language-plaintext highlighter-rouge">sqlsh</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="k">select </span>count<span class="se">\(\*\)</span>, ps.<span class="se">\`</span>user<span class="se">\`</span> from ps group by ps.<span class="se">\`</span>user<span class="se">\`</span> order by 1 desc limit 3
185 root
69 jhyde
2 avahi</code></pre></figure>
<h2 id="example-vmstat">Example: vmstat</h2>
<p>How’s my memory?</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="nt">-o</span> mysql <span class="k">select</span> <span class="se">\*</span> from vmstat
+--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+
| proc_r | proc_b | mem_swpd | mem_free | mem_buff | mem_cache | swap_si | swap_so | io_bi | io_bo | system_in | system_cs | cpu_us | cpu_sy | cpu_id | cpu_wa | cpu_st |
+--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+
| 12 | 0 | 54220 | 5174424 | 402180 | 4402196 | 0 | 0 | 15 | 35 | 3 | 2 | 7 | 1 | 92 | 0 | 0 |
+--------+--------+----------+----------+----------+-----------+---------+---------+-------+-------+-----------+-----------+--------+--------+--------+--------+--------+
<span class="o">(</span>1 row<span class="o">)</span></code></pre></figure>
<h2 id="example-explain">Example: explain</h2>
<p>To find out what columns a table has, use <code class="language-plaintext highlighter-rouge">explain</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sqlsh explain plan with <span class="nb">type </span><span class="k">for select</span> <span class="se">\*</span> from <span class="nb">du
</span>size_k BIGINT NOT NULL,
path VARCHAR NOT NULL,
size_b BIGINT NOT NULL</code></pre></figure>
<h2 id="example-git">Example: git</h2>
<p>How many commits and distinct authors per year?
The <code class="language-plaintext highlighter-rouge">git_commits</code> table is based upon the <code class="language-plaintext highlighter-rouge">git log</code> command.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">./sqlsh <span class="k">select </span>floor<span class="se">\(</span>commit_timestamp to year<span class="se">\)</span> as y, count<span class="se">\(\*\)</span>, count<span class="se">\(</span>distinct author<span class="se">\)</span> from git_commits group by y order by 1
2012-01-01 00:00:00 180 6
2013-01-01 00:00:00 502 13
2014-01-01 00:00:00 679 36
2015-01-01 00:00:00 470 45
2016-01-01 00:00:00 465 67
2017-01-01 00:00:00 279 53</code></pre></figure>
<p>Note that <code class="language-plaintext highlighter-rouge">group by y</code> is possible because <code class="language-plaintext highlighter-rouge">sqlsh</code> uses Calcite’s
<a href="/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html#isGroupByAlias--">lenient mode</a>.</p>
<h2 id="example-stdin">Example: stdin</h2>
<p>Print the stdin, adding a number to each line.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="o">(</span><span class="nb">echo </span>cats<span class="p">;</span> <span class="nb">echo </span>and dogs<span class="o">)</span> | <span class="nb">cat</span> <span class="nt">-n</span> -
1 cats
2 and dogs</code></pre></figure>
<p>In <code class="language-plaintext highlighter-rouge">sqlsh</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="o">(</span><span class="nb">echo </span>cats<span class="p">;</span> <span class="nb">echo </span>and dogs<span class="o">)</span> | ./sqlsh <span class="k">select</span> <span class="se">\*</span> from stdin
1 cats
2 and dogs</code></pre></figure>
<h2 id="example-output-format">Example: output format</h2>
<p>The <code class="language-plaintext highlighter-rouge">-o</code> option controls output format.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="nt">-o</span> mysql <span class="k">select </span>min<span class="se">\(</span>size_k<span class="se">\)</span>, max<span class="se">\(</span>size_k<span class="se">\)</span> from <span class="nb">du</span>
+--------+--------+
| EXPR<span class="nv">$0</span> | EXPR<span class="nv">$1</span> |
+--------+--------+
| 0 | 94312 |
+--------+--------+
<span class="o">(</span>1 row<span class="o">)</span></code></pre></figure>
<h2 id="example-jps">Example: jps</h2>
<p>provides a display of all current java process pids In <code class="language-plaintext highlighter-rouge">sqlsh</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./sqlsh <span class="k">select </span>distinct jps.<span class="se">\`</span>pid<span class="se">\`</span>, jps.<span class="se">\`</span>info<span class="se">\`</span> from jps
+--------+---------------------+
| pid | info |
+--------+---------------------+
| 49457 | RemoteMavenServer |
| 48326 | KotlinCompileDaemon |
+--------+---------------------+
<span class="o">(</span>1 row<span class="o">)</span></code></pre></figure>
<p>Format options:</p>
<ul>
<li>spaced - spaces between fields (the default)</li>
<li>headers - as spaced, but with headers</li>
<li>csv - comma-separated values</li>
<li>json - JSON, one object per row</li>
<li>mysql - an aligned table, in the same format used by MySQL</li>
</ul>
<h1 id="further-work">Further work</h1>
<p>The OS adapter was created in
[<a href="https://issues.apache.org/jira/browse/CALCITE-1896">CALCITE-1896</a>]
but is not complete.</p>
<p>Some ideas for further work:</p>
<ul>
<li>Allow ‘-‘and ‘.’ in unquoted table names (to match typical file names)</li>
<li>Allow ordinal field references, for example ‘$3’. This would help for files
that do not have named fields, for instance <code class="language-plaintext highlighter-rouge">stdin</code>, but you could use them
even if fields have names. Also ‘$0’ to mean the whole input line.</li>
<li>Use the file adapter, e.g. <code class="language-plaintext highlighter-rouge">select * from file.scott.emp</code> would use the
<a href="file_adapter.html">file adapter</a> to open the file <code class="language-plaintext highlighter-rouge">scott/emp.csv</code>
</li>
<li>More tables based on git, e.g. branches, tags, files changed in each commit</li>
<li>
<code class="language-plaintext highlighter-rouge">wc</code> function, e.g. <code class="language-plaintext highlighter-rouge">select path, lineCount from git_ls_files cross apply wc(path)</code>
</li>
<li>Move <code class="language-plaintext highlighter-rouge">sqlsh</code> command, or at least the java code underneath it,
into <a href="https://github.com/julianhyde/sqlline">sqlline</a>
</li>
</ul>
</article>
</div>
<div class="unit one-fifth hide-on-mobiles">
<aside>
<h4>Overview</h4>
<ul>
<li class=""><a href="/docs/index.html">Background</a></li>
<li class=""><a href="/docs/tutorial.html">Tutorial</a></li>
<li class=""><a href="/docs/algebra.html">Algebra</a></li>
</ul>
<h4>Advanced</h4>
<ul>
<li class=""><a href="/docs/adapter.html">Adapters</a></li>
<li class=""><a href="/docs/spatial.html">Spatial</a></li>
<li class=""><a href="/docs/stream.html">Streaming</a></li>
<li class=""><a href="/docs/materialized_views.html">Materialized Views</a></li>
<li class=""><a href="/docs/lattice.html">Lattices</a></li>
</ul>
<h4>Avatica</h4>
<ul>
<li class=""><a href="/docs/avatica_overview.html">Overview</a></li>
<li class=""><a href="/docs/avatica_roadmap.html">Roadmap</a></li>
<li class=""><a href="/docs/avatica_json_reference.html">JSON Reference</a></li>
<li class=""><a href="/docs/avatica_protobuf_reference.html">Protobuf Reference</a></li>
</ul>
<h4>Reference</h4>
<ul>
<li class=""><a href="/docs/reference.html">SQL language</a></li>
<li class=""><a href="/docs/model.html">JSON/YAML models</a></li>
<li class=""><a href="/docs/howto.html">HOWTO</a></li>
</ul>
<h4>Meta</h4>
<ul>
<li class=""><a href="/docs/history.html">History</a></li>
<li class=""><a href="/docs/powered_by.html">Powered by Calcite</a></li>
<li class=""><a href="/javadocAggregate">API</a></li>
</ul>
</aside>
</div>
<div class="clear"></div>
</div>
</section>
<footer role="contentinfo">
<div id="poweredby">
<a href="http://www.apache.org/">
<span class="sr-only">Apache</span>
<img src="/img/feather.png" width="190" height="77" alt="Apache Logo"></a>
</div>
<div id="copyright">
<p>The contents of this website are Copyright © 2023
<a href="https://www.apache.org/">Apache Software Foundation</a>
under the terms of
the <a href="https://www.apache.org/licenses/">
Apache License v2</a>. Apache Calcite and its logo are
trademarks of the Apache Software Foundation.
</p>
<p>
<a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a>
</p>
</div>
</footer>
<script>
var anchorForId = function (id) {
var anchor = document.createElement("a");
anchor.className = "header-link";
anchor.href = "#" + id;
anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
anchor.title = "Permalink";
return anchor;
};
var linkifyAnchors = function (level, containingElement) {
var headers = containingElement.getElementsByTagName("h" + level);
for (var h = 0; h < headers.length; h++) {
var header = headers[h];
if (typeof header.id !== "undefined" && header.id !== "") {
header.appendChild(anchorForId(header.id));
}
}
};
document.onreadystatechange = function () {
if (this.readyState === "complete") {
var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
if (!contentBlock) {
return;
}
for (var level = 1; level <= 6; level++) {
linkifyAnchors(level, contentBlock);
}
}
};
</script>
</body>
</html>