blob: 8ff84120c844f1c127bc51e52387b9391e319a10 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Lightning Fast and Space Efficient Inequality Joins</title>
<meta name="generator" content="Hugo 0.46" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/owl.carousel.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/bootstrap.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/font-awesome.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/airspace-local-fonts.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/airspace.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/style.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/ionicons.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/animate.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/responsive.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/syntax.css" />
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.2.0/css/all.css" integrity="sha384-hWVjflwFxL6sNzntih27bfxkr27PmbbK/iSvJ+a4+0owXq79v+lsFkW54bOGbiDQ" crossorigin="anonymous">
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.10.2.min.js"><\/script>')</script>
<script src="https://rheem-ecosystem.github.io/js/bootstrap.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/owl.carousel.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/plugins.js"></script>
<script src="https://rheem-ecosystem.github.io/js/min/waypoints.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/jquery.counterup.js"></script>
<script src="https://rheem-ecosystem.github.io/js/main.js"></script>
<script>
var doNotTrack = false;
if (!doNotTrack) {
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-124750019-1', 'auto');
ga('send', 'pageview');
}
</script>
<script>
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-124750019-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
</head>
<body>
<header>
<a href="https://github.com/rheem-ecosystem/rheem"><img style="position: absolute; top: 0; left: 0; border: 0; z-index: 12; " src="https://camo.githubusercontent.com/121cd7cbdc3e4855075ea8b558508b91ac463ac2/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f6c6566745f677265656e5f3030373230302e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_left_green_007200.png"></a>
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="https://rheem-ecosystem.github.io/">
<img src="https://rheem-ecosystem.github.io/img/logo.png" alt="Rheem X-Platform Logo" class="logo-rheem">
</a>
</div>
<div class="collapse navbar-collapse move" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://rheem-ecosystem.github.io/">Home</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/documentation/">Documentation</a></li>
<li><a href="/publication/">Publications</a></li>
<li><a href="/people/">People</a></li>
<li><a href="/download/">Download</a></li>
<li><a href="/contact/">Contact</a></li>
</ul>
</div>
</div>
</nav>
</div>
</div>
</div>
</header>
<div class="post">
<section class="section" style="border: 1px dotted #ddd;">
<div class="container">
<div class="row">
<div>
<div class="block">
<h1>Lightning Fast and Space Efficient Inequality Joins</h1>
<div class="post-info-wrapper">
<p class="italic">By <span class="bold">Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang and Panos Kalnis</span> on <span class="bold">2015</span></p>
</div>
<hr />
<p><p>Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there have been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join, to various indices such as B+-tree, R∗-tree and Bitmap, inequality joins have received little attention and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.</p>
</p>
<hr />
<p class="center-text" style="width: 100%">
<a href="https://rheem-ecosystem.github.io/pdf/paper/iejoin.pdf" class="btn btn-main"><i class="fa fa-file-pdf-o"></i> Download</a>
<br>
</p>
</div>
</div>
</div>
</div>
</section>
</div>
<p class="center-text" style="padding: 30px;">
<a class="btn btn-main" href="https://rheem-ecosystem.github.io//publication">Back to Publications</a>
</p>
<section id="call-to-action">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="block">
<h2>Turning a Zoo into a Circus</h2>
<p><a class="nothing" href="https://rheem-ecosystem.github.io/about">Read more</a> on how <b>RHEEM</b> tame the Zoo of existing data processing platforms to work together.</p>
</div>
</div>
</div>
</div>
</section>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="footer-manu">
<ul>
<li><a href="https://www.qcri.org/">About QCRI</a></li>
<li><a href="/contact">Contact us</a></li>
<li><a href="https://github.com/rheem-ecosystem">Fork me</a></li>
<li><a href="https://github.com/rheem-ecosystem/blob/master/LICENSE.TXT">License</a></li>
</ul>
</div>
<p>Copyright &copy; Developed by the dRHEEMers @ QCRI. All rights reserved.</p>
</div>
</div>
</div>
</footer>
</body>
</html>