license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Bored of keep moving your app to the newest data processing platform to achieve high performance? Tired of dealing with a zoo of processing platforms to get the best performance for your analytic tasks? Then, this tutorial is for you!
Indeed, we are witnessing a plethora of innovative data processing platforms in the last few years. While this is generally great, leveraging these new technologies in practice bears quite some challenges, just to name a few, developers must: (i) find among the plethora of processing platforms the best one for their applications; (ii) migrate their applications to newer and faster platforms every now and then; and (iii) orchestrate different platforms so that applications leverage their individual benefits.
We address these issues with Rheem, a system that enables big data analytics over multiple data processing platforms in a seamless manner. It provides a three-layer data processing abstraction with that applications can achieve both platform independence and interoperability across multiple platforms. With Rheem, dRheemers (Rheem developers) can focus on the logics of their applications. Rheem, in turn, takes care of efficiently executing applications by choosing either a single or multiple processing platforms. To achieve this, it comes with a cross-platform optimizer that allows a single task to run faster over multiple platforms than on a single platform.