blob: b4f2cf7a51886dd529c0e9024b4bcd16da5c9c48 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Interoperating a Zoo of Data Processing Platforms Using Rheem</title>
<meta name="generator" content="Hugo 0.46" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/owl.carousel.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/bootstrap.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/font-awesome.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/airspace-local-fonts.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/airspace.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/style.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/ionicons.min.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/animate.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/responsive.css" />
<link rel="stylesheet" href="https://rheem-ecosystem.github.io/css/syntax.css" />
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.2.0/css/all.css" integrity="sha384-hWVjflwFxL6sNzntih27bfxkr27PmbbK/iSvJ+a4+0owXq79v+lsFkW54bOGbiDQ" crossorigin="anonymous">
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.10.2.min.js"><\/script>')</script>
<script src="https://rheem-ecosystem.github.io/js/bootstrap.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/owl.carousel.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/plugins.js"></script>
<script src="https://rheem-ecosystem.github.io/js/min/waypoints.min.js"></script>
<script src="https://rheem-ecosystem.github.io/js/jquery.counterup.js"></script>
<script src="https://rheem-ecosystem.github.io/js/main.js"></script>
<script>
var doNotTrack = false;
if (!doNotTrack) {
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-124750019-1', 'auto');
ga('send', 'pageview');
}
</script>
<script>
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-124750019-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
</head>
<body>
<header>
<a href="https://github.com/rheem-ecosystem/rheem"><img style="position: absolute; top: 0; left: 0; border: 0; z-index: 12; " src="https://camo.githubusercontent.com/121cd7cbdc3e4855075ea8b558508b91ac463ac2/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f6c6566745f677265656e5f3030373230302e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_left_green_007200.png"></a>
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="https://rheem-ecosystem.github.io/">
<img src="https://rheem-ecosystem.github.io/img/logo.png" alt="Rheem X-Platform Logo" class="logo-rheem">
</a>
</div>
<div class="collapse navbar-collapse move" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://rheem-ecosystem.github.io/">Home</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/documentation/">Documentation</a></li>
<li><a href="/publication/">Publications</a></li>
<li><a href="/people/">People</a></li>
<li><a href="/download/">Download</a></li>
<li><a href="/contact/">Contact</a></li>
</ul>
</div>
</div>
</nav>
</div>
</div>
</div>
</header>
<div class="post">
<section class="section" style="border: 1px dotted #ddd;">
<div class="container">
<div class="row">
<div>
<div class="block">
<h1>Interoperating a Zoo of Data Processing Platforms Using Rheem</h1>
<div class="post-info-wrapper">
<p class="italic">By <span class="bold">Yasser Idris and Sebastian Kruse</span> on <span class="bold">2017</span></p>
</div>
<hr />
<p><p>We are witnessing a proliferation of big data, which has lead to a zoo of data processing systems. Each system providing a different set of features. For example, Spark provides scalability to analytic tasks, but Java 8 Streams provides low-latency. Furthermore, complex applications, such as ETL and ML, are now requiring a mixture of platforms to perform tasks efficiently. In such complex data analytics pipelines, the use of multiple data processing system is not only for performance reasons, but also because of data diversity. Datasets often natively reside on different data formats and storage engines. Unfortunately, developers are left alone in the challenging tasks of: (1) choosing the right platform for their applications; and (2) performing tedious and costly data migration and integration tasks to obtain the results.
In this talk, we will present Rheem, an open source scalable cross-platform system that frees developers from these burdens. Rheem provides an abstraction layer on top of Spark (and other processing platforms) with the aim of enabling cross-platform optimization and interoperability. It automatically selects the best data processing platforms for a given task and also handles the cross-platform execution. In particular, we will discuss how Rheem allows Spark to work in tandem with other platforms in order to achieve higher performance. We will also show how easy a developer can write complex applications on top of Rheem to seamlessly use multiple different data processing platforms according to their tasks at hand. Using Rheem developers do not have to worry about the integration or data migration between Spark and other platforms.</p>
</p>
<hr />
<p class="center-text" style="width: 100%">
<a href="https://databricks.com/session/interoperating-a-zoo-of-data-processing-platforms-using-rheem" class="btn btn-main"><i class="fa fa-video"></i> Look</a>
<br>
</p>
</div>
</div>
</div>
</div>
</section>
</div>
<p class="center-text" style="padding: 30px;">
<a class="btn btn-main" href="https://rheem-ecosystem.github.io//publication">Back to Publications</a>
</p>
<section id="call-to-action">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="block">
<h2>Turning a Zoo into a Circus</h2>
<p><a class="nothing" href="https://rheem-ecosystem.github.io/about">Read more</a> on how <b>RHEEM</b> tame the Zoo of existing data processing platforms to work together.</p>
</div>
</div>
</div>
</div>
</section>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="footer-manu">
<ul>
<li><a href="https://www.qcri.org/">About QCRI</a></li>
<li><a href="/contact">Contact us</a></li>
<li><a href="https://github.com/rheem-ecosystem">Fork me</a></li>
<li><a href="https://github.com/rheem-ecosystem/blob/master/LICENSE.TXT">License</a></li>
</ul>
</div>
<p>Copyright &copy; Developed by the dRHEEMers @ QCRI. All rights reserved.</p>
</div>
</div>
</div>
</footer>
</body>
</html>