blob: f75e141f9579e068755a72cc449d85b853672c17 [file] [log] [blame]
<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8.1 from target/generated-site/markdown/interval_join.md at 2021-12-13
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20211213" />
<meta http-equiv="Content-Language" content="en" />
<title>AsterixDB &#x2013; Interval Joins</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.7.min.js"></script>
</head>
<body class="topBarDisabled">
<div class="container-fluid">
<div id="banner">
<div class="pull-left"><a href="./" id="bannerLeft"><img src="images/asterixlogo.png" alt="AsterixDB"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate">Last Published: 2021-12-13</li>
<li id="projectVersion" class="pull-right">Version: 0.9.7.1</li>
<li class="pull-right"><a href="index.html" title="Documentation Home">Documentation Home</a></li>
</ul>
</div>
<div class="row-fluid">
<div id="leftColumn" class="span2">
<div class="well sidebar-nav">
<ul class="nav nav-list">
<li class="nav-header">Get Started - Installation</li>
<li><a href="ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
<li><a href="ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
<li><a href="aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
<li class="nav-header">AsterixDB Primer</li>
<li><a href="sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
<li class="nav-header">Data Model</li>
<li><a href="datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
<li class="nav-header">Queries</li>
<li><a href="sqlpp/manual.html" title="The SQL++ Query Language"><span class="none"></span>The SQL++ Query Language</a></li>
<li><a href="SQLPP.html" title="Raw SQL++ Grammar"><span class="none"></span>Raw SQL++ Grammar</a></li>
<li><a href="sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
<li class="nav-header">API/SDK</li>
<li><a href="api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
<li><a href="csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
<li class="nav-header">Advanced Features</li>
<li><a href="aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
<li><a href="feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
<li><a href="udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
<li><a href="sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
<li><a href="sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
<li><a href="sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
<li class="active"><a href="#"><span class="none"></span>Support of Interval Joins</a></li>
<li><a href="sqlpp/arrayindex.html" title="Support of Array Indexes"><span class="none"></span>Support of Array Indexes</a></li>
<li class="nav-header">Deprecated</li>
<li><a href="aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
<li><a href="aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
<li><a href="aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
</ul>
<hr />
<div id="poweredBy">
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<a href="./" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="images/asterixlogo.png" /></a>
</div>
</div>
</div>
<div id="bodyColumn" class="span10" >
<!--
! Licensed to the Apache Software Foundation (ASF) under one
! or more contributor license agreements. See the NOTICE file
! distributed with this work for additional information
! regarding copyright ownership. The ASF licenses this file
! to you under the Apache License, Version 2.0 (the
! "License"); you may not use this file except in compliance
! with the License. You may obtain a copy of the License at
!
! http://www.apache.org/licenses/LICENSE-2.0
!
! Unless required by applicable law or agreed to in writing,
! software distributed under the License is distributed on an
! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
! KIND, either express or implied. See the License for the
! specific language governing permissions and limitations
! under the License.
!-->
<h1>Interval Joins</h1>
<div class="section">
<h2><a name="Table_of_Contents"></a><a name="atoc" id="#toc">Table of Contents</a></h2>
<ul>
<li><a href="#Interval_joins">Introduction</a></li>
<li><a href="#Range_hint">Range Hints</a></li>
</ul><!--
! Licensed to the Apache Software Foundation (ASF) under one
! or more contributor license agreements. See the NOTICE file
! distributed with this work for additional information
! regarding copyright ownership. The ASF licenses this file
! to you under the Apache License, Version 2.0 (the
! "License"); you may not use this file except in compliance
! with the License. You may obtain a copy of the License at
!
! http://www.apache.org/licenses/LICENSE-2.0
!
! Unless required by applicable law or agreed to in writing,
! software distributed under the License is distributed on an
! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
! KIND, either express or implied. See the License for the
! specific language governing permissions and limitations
! under the License.
!-->
</div>
<div class="section">
<h2><a name="Interval_Joins"></a><a name="Interval_joins" id="Interval_joins">Interval Joins</a></h2>
<p>This system allows for the 13 types of Allen&#x2019;s interval-join relations. The default, when using these joins, is either Nested Loop, or Hybrid Hash Join. The optimal algorithm will be automatically selected based on the query. Hybrid Hash Join will be selected in cases where the relation includes an equality; these cases are: <tt>interval_starts()</tt>, <tt>interval_started_by()</tt>, <tt>interval_ends()</tt>, <tt>interval_ended_by()</tt>, <tt>interval_meets()</tt>, and <tt>interval_met_by()</tt>. Otherwise, the system will default to nested loop join. To use interval merge join you must include a range hint. Adding a range hint allows for the system to pick interval merge join.</p>
<p>The 13 interval functions are <tt>interval_after()</tt>, <tt>interval_before()</tt>, <tt>interval_covers()</tt>, <tt>interval_covered_by()</tt>, <tt>interval_ends()</tt>, <tt>interval_ended_by()</tt>, <tt>interval_meets()</tt>, <tt>interval_met_by()</tt>, <tt>interval_overlaps()</tt>, <tt>interval_overlapping()</tt>, <tt>interval_overlapped_by()</tt>, <tt>interval_starts()</tt>, and <tt>interval_started_by()</tt>.</p>
<div class="section">
<div class="section">
<div class="section">
<h5><a name="How_to_use_an_interval_join"></a>How to use an interval join</h5>
<div>
<div>
<pre class="source">select f.name as staff, d.name as student
from Staff as f, Students as d
where interval_after(f.employment, d.attendance)
</pre></div></div>
<p>In this scenario, <tt>interval_after()</tt> can be replaced with any of the 13 join functions. Here is what each of the functions represent if A represents the first interval parameter, and B represents the second set interval parameter:</p>
<table border="0" class="table table-striped">
<thead>
<tr class="a">
<th> Function </th>
<th> Condition </th></tr>
</thead><tbody>
<tr class="b">
<td> Before(A, B) and After(B, A) </td>
<td> A.end &lt; B.start </td></tr>
<tr class="a">
<td> Covers(A, B) and Covered_by(B, A) </td>
<td> A.start &lt;= B.start and A.end &gt;= B.end </td></tr>
<tr class="b">
<td> Ends(A, B) and Ended_by(B, A) </td>
<td> A.end = B.end and A.start &gt;= B.start </td></tr>
<tr class="a">
<td> Meets(A, B) and Met_by(B, A) </td>
<td> A.end = B.start </td></tr>
<tr class="b">
<td> Overlaps(A, B) and Overlapped_by(B, A) </td>
<td> A.start &lt; B.start and B.start &gt; A.end and A.end &gt; B.start </td></tr>
<tr class="a">
<td> Overlapping(A, B)</td>
<td> (A.start &gt;= B.start and B.start &lt; A.end) or (B.end &lt;= A.end and B.end &lt; A.start)</td></tr>
<tr class="b">
<td> Starts(A, B) and Started_by(B, A) </td>
<td> A.start = B.start and A.end &lt;= B.end </td></tr>
</tbody>
</table></div></div></div>
<div class="section">
<h3><a name="Using_a_Range_Hint"></a><a name="Range_hint" id="Range_hint"> Using a Range Hint </a></h3>
<p>To use an efficient interval join the data must be partitioned with the details in a range hint. Interval joins with a range hint currently work for intervals types of date, datetime, or time; the range hint type must match the interval type. Adding a range hint directly before the interval join function will cause the system to pick interval merge join for these interval functions: <tt>interval_after()</tt>, <tt>interval_before()</tt>, <tt>interval_covers()</tt>, <tt>interval_covered_by()</tt>, <tt>interval_overlaps()</tt>, <tt>interval_overlapping()</tt>, <tt>interval_overlapped_by()</tt>. The other relations will ignore the range hint and pick Hybrid Hash Join as described earlier.</p>
<p>Here is an example of how interval joins work with a range hint for all the supported data types. Suppose that we have two sets of data, a data set of staff members with an interval for length of employment and an id. The other dataset represents students, which may include an interval for attendance and an id. Each partition receives data based on the split points; The split points in the range hint must be strategically set by the user so that the data divides evenly among partitions. For example, if your query contains 1 split point, and the system is using two partitions, the data before the split point will be sent to the first partition, and the data after the split point will be sent to the second partition. This continues to work respectively based on the number of split points and number of partitions. Ideally, the number of split points should equal the number of partitions - 1.</p>
<div class="section">
<div class="section">
<h5><a name="Range_Hint_Example"></a>Range Hint Example</h5>
<div>
<div>
<pre class="source">/*+ range [&lt;Expression&gt;, ..., ] */
</pre></div></div>
</div>
<div class="section">
<h5><a name="Range_Hint_Example_with_Date"></a>Range Hint Example with Date</h5>
<div>
<div>
<pre class="source">select f.name as staff, d.name as student
from Staff as f, Students as d
where
/*+ range [date(&quot;2003-06-30&quot;), date(&quot;2005-12-31&quot;), date(&quot;2008-06-30&quot;)] */
interval_after(f.employment, d.attendance)
order by f.name, d.name;
</pre></div></div></div></div></div></div>
</div>
</div>
</div>
<hr/>
<footer>
<div class="container-fluid">
<div class="row-fluid">
<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
feather logo, and the Apache AsterixDB project logo are either
registered trademarks or trademarks of The Apache Software
Foundation in the United States and other countries.
All other marks mentioned may be trademarks or registered
trademarks of their respective owners.
</div>
</div>
</div>
</footer>
</body>
</html>