---
layout: about
title: "About"
img-architecture:
  path: assets/img/architecture.png
  title: Architecture of Apache Wayang
img-plan:
  path: assets/img/plan.png
  title: SGD plans
features:
  - feature:
    title: Cross-platform
    icon: fas fa-bezier-curve
    description-short: Run a single data analytics task on top of any set of data processing platforms.
    description: |
      The most salient feature of Apache Wayang is its cross-platform optimizer. Besides deciding the best processing platform to run any incoming task, Apache Wayang can run a single task on multiple processing platforms. It applies an extensible set of graph transformations to a Wayang plan to find alternative execution plans. Then, it compares all execution plans by using a platform-specific cost model. Cost functions can either be given or learned, and are parameterized with respect to the underlying hardware (e.g., number of computing nodes for distributed operators).

  - feature:
    title: High-Efficiency
    icon: fa fa-clock
    description-short: It selects the best available data processing platform for any incoming query.
    description: |
      Apache Wayang provides a number of optimized operators and a novel query optimization process that allow it to deal efficiently with big (as well as small) datasets. Furthermore, as its data processing abstraction is based on UDFs, Apache Wayang lets applications expose semantic properties about their functions, optimization hints (e.g., number of iterations), constraints (e.g., physical collocation of operators), and alternative plans. The optimizer then uses those artifacts, where available, on a best-effort basis.

  - feature:
    title: Flexibility
    icon: fa fa-puzzle-piece
    description-short: User-defined functions (UDFs) as first-class citizens, enabling extensibility and adaptability.
    description: |
      Apache Wayang provides a set of Wayang operators, which applications use to specify their tasks, as well as a set of execution operators, which processing platforms provide to run application tasks. The key aspect is that Apache Wayang provides a flexible operator mapping structure, allowing developers to add, modify, or delete mappings between Wayang and execution operators. As a result, developers can also add or remove Wayang and execution operators.

  - feature:
    title: Ease-of-Use
    icon: fas fa-child
    description-short: A simple interface that allows developers to focus only on the logic of their application.
    description: |
      Apache Wayang exposes a simple Java API with which developers implement their tasks. Developers focus on the logic of their tasks rather than on low-level details specific to data processing platforms. The figure of the SGD plans above shows the Wayang plan for a scalable gradient descent implementation: this otherwise tedious implementation task becomes much easier!

  - feature:
    title: Cost Saving
    icon: fa fa-piggy-bank
    description-short: Fast development of data analytics applications.
    description: |
      Users do not have to know the intricacies of the underlying platforms: they focus on the logic of their application only. This not only speeds up application development but also removes the need to be an expert in big data infrastructures. Apache Wayang takes care of how and on which data processing platforms to deploy your applications.

  - feature:
    title: Open Source
    icon: fa fa-code-branch
    description-short: All code is on GitHub under the Apache License.
    description: |
      Apache Wayang has been open source from its very beginning and will remain open source. Feel free to download it, try it, and contribute to it. Help us make it better!
---

Apache Wayang has a three-layer data processing abstraction that sits between user applications and data processing platforms, such as Hadoop and Spark. The figure below depicts the Apache Wayang architecture: (i) an application layer that models all application-specific logic; (ii) a core layer that provides the intermediate representation between applications and processing platforms; and (iii) a platform layer that encompasses the underlying processing platforms. The input to the application layer comprises the logical operators provided by users (or generated by a declarative query parser), and its output is a physical plan (WayangPlan). The WayangPlan is then passed to the core layer, where cross-platform optimizations take place to produce an execution plan (ExecutionPlan).
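The step from a platform-agnostic plan to an execution plan can be pictured with a toy sketch. The code below is not Wayang's API: `ToyOptimizer`, `Platform`, `Operator`, and all cost figures are hypothetical, and the linear cost model (a fixed startup cost plus a per-record cost) is an assumption chosen only for illustration. It also ignores the cost of moving data between platforms, which a real cross-platform optimizer would need to consider.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; none of these names come from Apache Wayang.
class ToyOptimizer {

    // A hypothetical platform with an assumed linear cost model.
    static final class Platform {
        final String name;
        final double startupCost;
        final double costPerRecord;

        Platform(String name, double startupCost, double costPerRecord) {
            this.name = name;
            this.startupCost = startupCost;
            this.costPerRecord = costPerRecord;
        }

        double estimate(long records) {
            return startupCost + costPerRecord * records;
        }
    }

    // A platform-agnostic operator with an estimated input cardinality.
    static final class Operator {
        final String name;
        final long inputRecords;

        Operator(String name, long inputRecords) {
            this.name = name;
            this.inputRecords = inputRecords;
        }
    }

    // For each operator, pick the platform with the lowest estimated cost.
    static Map<String, String> choosePlatforms(List<Operator> plan, List<Platform> platforms) {
        Map<String, String> executionPlan = new LinkedHashMap<>();
        for (Operator op : plan) {
            Platform best = platforms.get(0);
            for (Platform candidate : platforms) {
                if (candidate.estimate(op.inputRecords) < best.estimate(op.inputRecords)) {
                    best = candidate;
                }
            }
            executionPlan.put(op.name, best.name);
        }
        return executionPlan;
    }

    public static void main(String[] args) {
        List<Platform> platforms = List.of(
                new Platform("local", 1.0, 0.01),        // cheap to start, slow per record
                new Platform("cluster", 500.0, 0.0001)); // costly to start, fast per record
        List<Operator> plan = List.of(
                new Operator("map", 10_000_000L),
                new Operator("reduce", 1_000L));
        // With these made-up costs, the large "map" goes to the cluster
        // and the small "reduce" stays local.
        System.out.println(choosePlatforms(plan, platforms));
    }
}
```

Under these assumed costs, a single task ends up split across two platforms, which is the kind of outcome the core layer's cross-platform optimization is meant to allow.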

Notice that, in contrast to DBMSs, Apache Wayang decouples the physical and execution levels. This separation allows applications to express physical plans in terms of algorithmic needs only, without being tied to a particular processing platform. The salient features of Apache Wayang are cross-platform task execution, high performance, flexibility, and ease of use.
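One way to picture this decoupling is a lookup table from platform-agnostic operators to the platform-specific operators that can execute them. The following is a minimal sketch under that assumption: `ToyMappingRegistry` and the operator names are hypothetical and are not taken from the Wayang code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; the class and operator names are hypothetical.
class ToyMappingRegistry {

    // Platform-agnostic operator name -> candidate execution operators.
    private final Map<String, List<String>> mappings = new HashMap<>();

    // Register an execution operator as one way to run the given operator.
    void addMapping(String planOperator, String executionOperator) {
        mappings.computeIfAbsent(planOperator, k -> new ArrayList<>())
                .add(executionOperator);
    }

    // Remove a mapping; returns true if it existed.
    boolean removeMapping(String planOperator, String executionOperator) {
        List<String> candidates = mappings.get(planOperator);
        return candidates != null && candidates.remove(executionOperator);
    }

    // Return an immutable snapshot of the current candidates.
    List<String> candidatesFor(String planOperator) {
        return List.copyOf(mappings.getOrDefault(planOperator, List.of()));
    }

    public static void main(String[] args) {
        ToyMappingRegistry registry = new ToyMappingRegistry();
        registry.addMapping("Map", "LocalStreamMap");
        registry.addMapping("Map", "ClusterMap");
        System.out.println(registry.candidatesFor("Map"));

        // Dropping a mapping changes where "Map" may run,
        // without touching the application's plan.
        registry.removeMapping("Map", "ClusterMap");
        System.out.println(registry.candidatesFor("Map"));
    }
}
```

The application only ever names `Map`; which execution operator runs it is decided separately, so mappings can change without rewriting the plan.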