blob: e63f4ef518aa137f63227e4467f0c0cbc21d564b [file] [log] [blame]
<!doctype html>
<html lang="en" dir="ltr" class="docs-wrapper plugin-docs plugin-id-default docs-version-current docs-doc-page docs-doc-id-introduction/benchmark" data-has-hydrated="false">
<head>
<meta charset="UTF-8">
<meta name="generator" content="Docusaurus v3.1.1">
<title data-rh="true">Benchmarking Wayang | Apache Wayang (incubating)</title><meta data-rh="true" name="viewport" content="width=device-width,initial-scale=1"><meta data-rh="true" name="twitter:card" content="summary_large_image"><meta data-rh="true" property="og:url" content="https://wayang.apache.org/docs/introduction/benchmark"><meta data-rh="true" property="og:locale" content="en"><meta data-rh="true" name="docusaurus_locale" content="en"><meta data-rh="true" name="docsearch:language" content="en"><meta data-rh="true" name="docusaurus_version" content="current"><meta data-rh="true" name="docusaurus_tag" content="docs-default-current"><meta data-rh="true" name="docsearch:version" content="current"><meta data-rh="true" name="docsearch:docusaurus_tag" content="docs-default-current"><meta data-rh="true" property="og:title" content="Benchmarking Wayang | Apache Wayang (incubating)"><meta data-rh="true" name="description" content="Wayang is a powerful cross-platform middleware that can utilize and seamlessly integrate various execution platforms, including Postgres, Spark, Flink, Java Streams, and Python. In benchmarking, we found that Wayang demonstrated exceptional performance across multiple use cases, including a popular machine learning algorithm, a core big data benchmark task, and a relational query."><meta data-rh="true" property="og:description" content="Wayang is a powerful cross-platform middleware that can utilize and seamlessly integrate various execution platforms, including Postgres, Spark, Flink, Java Streams, and Python. In benchmarking, we found that Wayang demonstrated exceptional performance across multiple use cases, including a popular machine learning algorithm, a core big data benchmark task, and a relational query."><link data-rh="true" rel="icon" href="/img/wayang-logo.jpg"><link data-rh="true" rel="canonical" href="https://wayang.apache.org/docs/introduction/benchmark"><link data-rh="true" rel="alternate" href="https://wayang.apache.org/docs/introduction/benchmark" hreflang="en"><link data-rh="true" rel="alternate" href="https://wayang.apache.org/docs/introduction/benchmark" hreflang="x-default"><link rel="alternate" type="application/rss+xml" href="/blog/rss.xml" title="Apache Wayang (incubating) RSS Feed">
<link rel="alternate" type="application/atom+xml" href="/blog/atom.xml" title="Apache Wayang (incubating) Atom Feed"><link rel="stylesheet" href="/assets/css/styles.ecf70413.css">
<script src="/assets/js/runtime~main.db1fac0d.js" defer="defer"></script>
<script src="/assets/js/main.f50bad53.js" defer="defer"></script>
</head>
<body class="navigation-with-keyboard">
<script>!function(){function t(t){document.documentElement.setAttribute("data-theme",t)}var e=function(){try{return new URLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}}()||function(){try{return localStorage.getItem("theme")}catch(t){}}();t(null!==e?e:"light")}(),function(){try{const a=new URLSearchParams(window.location.search).entries();for(var[t,e]of a)if(t.startsWith("docusaurus-data-")){var n=t.replace("docusaurus-data-","data-");document.documentElement.setAttribute(n,e)}}catch(t){}}(),document.documentElement.setAttribute("data-announcement-bar-initially-dismissed",function(){try{return"true"===localStorage.getItem("docusaurus.announcement.dismiss")}catch(t){}return!1}())</script><div id="__docusaurus"><div role="region" aria-label="Skip to main content"><a class="skipToContent_fXgn" href="#__docusaurus_skipToContent_fallback">Skip to main content</a></div><div class="announcementBar_mb4j" style="background-color:#fafbfc;color:#091E42" role="banner"><div class="announcementBarPlaceholder_vyr4"></div><div class="content_knG7 announcementBarContent_xLdY">⭐️ If you like Apache Wayang, give it a star on <a target="_blank" href="https://github.com/apache/incubator-wayang">GitHub</a>! ⭐ </div><button type="button" aria-label="Close" class="clean-btn close closeButton_CVFx announcementBarClose_gvF7"><svg viewBox="0 0 15 15" width="14" height="14"><g stroke="currentColor" stroke-width="3.1"><path d="M.75.75l13.5 13.5M14.25.75L.75 14.25"></path></g></svg></button></div><nav aria-label="Main" class="navbar navbar--fixed-top"><div class="navbar__inner"><div class="navbar__items"><button aria-label="Toggle navigation bar" aria-expanded="false" class="navbar__toggle clean-btn" type="button"><svg width="30" height="30" viewBox="0 0 30 30" aria-hidden="true"><path stroke="currentColor" stroke-linecap="round" stroke-miterlimit="10" stroke-width="2" d="M4 7h22M4 15h22M4 23h22"></path></svg></button><a class="navbar__brand" href="/"><div class="navbar__logo"><img src="/img/wayang.png" alt="Wayang Logo" class="themedComponent_mlkZ themedComponent--light_NVdE"><img src="/img/wayang.png" alt="Wayang Logo" class="themedComponent_mlkZ themedComponent--dark_xIcU"></div><b class="navbar__title text--truncate"></b></a></div><div class="navbar__items navbar__items--right"><a class="navbar__item navbar__link" href="/docs/start/download">Download</a><a aria-current="page" class="navbar__item navbar__link navbar__link--active" href="/docs/introduction/about">About</a><a class="navbar__item navbar__link" href="/docs/guide/installation">Developers</a><div class="navbar__item dropdown dropdown--hoverable dropdown--right"><a href="#" aria-haspopup="true" aria-expanded="false" role="button" class="navbar__link">Community</a><ul class="dropdown__menu"><li><a class="dropdown__link" href="/blog/">Blog</a></li><li><a class="dropdown__link" href="/docs/community/mailinglist">Project</a></li></ul></div><div class="navbar__item dropdown dropdown--hoverable dropdown--right"><a href="#" aria-haspopup="true" aria-expanded="false" role="button" class="navbar__link">ASF</a><ul class="dropdown__menu"><li><a href="https://www.apache.org/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Foundation</a></li><li><a href="https://www.apache.org/licenses/" target="_blank" rel="noopener noreferrer" class="dropdown__link">License</a></li><li><a href="https://www.apache.org/events/current-event.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Events</a></li><li><a href="https://privacy.apache.org/policies/privacy-policy-public.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Privacy</a></li><li><a href="https://www.apache.org/security/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Security</a></li><li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Sponsorship</a></li><li><a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Thanks</a></li><li><a href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Code of Conduct</a></li></ul></div><a href="https://github.com/apache/incubator-wayang" target="_blank" rel="noopener noreferrer" class="navbar__item navbar__link header-github-link" aria-label="GitHub repository"></a><div class="toggle_vylO colorModeToggle_DEke"><button class="clean-btn toggleButton_gllP toggleButtonDisabled_aARS" type="button" disabled="" title="Switch between dark and light mode (currently light mode)" aria-label="Switch between dark and light mode (currently light mode)" aria-live="polite"><svg viewBox="0 0 24 24" width="24" height="24" class="lightToggleIcon_pyhR"><path fill="currentColor" d="M12,9c1.65,0,3,1.35,3,3s-1.35,3-3,3s-3-1.35-3-3S10.35,9,12,9 M12,7c-2.76,0-5,2.24-5,5s2.24,5,5,5s5-2.24,5-5 S14.76,7,12,7L12,7z M2,13l2,0c0.55,0,1-0.45,1-1s-0.45-1-1-1l-2,0c-0.55,0-1,0.45-1,1S1.45,13,2,13z M20,13l2,0c0.55,0,1-0.45,1-1 s-0.45-1-1-1l-2,0c-0.55,0-1,0.45-1,1S19.45,13,20,13z M11,2v2c0,0.55,0.45,1,1,1s1-0.45,1-1V2c0-0.55-0.45-1-1-1S11,1.45,11,2z M11,20v2c0,0.55,0.45,1,1,1s1-0.45,1-1v-2c0-0.55-0.45-1-1-1C11.45,19,11,19.45,11,20z M5.99,4.58c-0.39-0.39-1.03-0.39-1.41,0 c-0.39,0.39-0.39,1.03,0,1.41l1.06,1.06c0.39,0.39,1.03,0.39,1.41,0s0.39-1.03,0-1.41L5.99,4.58z M18.36,16.95 c-0.39-0.39-1.03-0.39-1.41,0c-0.39,0.39-0.39,1.03,0,1.41l1.06,1.06c0.39,0.39,1.03,0.39,1.41,0c0.39-0.39,0.39-1.03,0-1.41 L18.36,16.95z M19.42,5.99c0.39-0.39,0.39-1.03,0-1.41c-0.39-0.39-1.03-0.39-1.41,0l-1.06,1.06c-0.39,0.39-0.39,1.03,0,1.41 s1.03,0.39,1.41,0L19.42,5.99z M7.05,18.36c0.39-0.39,0.39-1.03,0-1.41c-0.39-0.39-1.03-0.39-1.41,0l-1.06,1.06 c-0.39,0.39-0.39,1.03,0,1.41s1.03,0.39,1.41,0L7.05,18.36z"></path></svg><svg viewBox="0 0 24 24" width="24" height="24" class="darkToggleIcon_wfgR"><path fill="currentColor" d="M9.37,5.51C9.19,6.15,9.1,6.82,9.1,7.5c0,4.08,3.32,7.4,7.4,7.4c0.68,0,1.35-0.09,1.99-0.27C17.45,17.19,14.93,19,12,19 c-3.86,0-7-3.14-7-7C5,9.07,6.81,6.55,9.37,5.51z M12,3c-4.97,0-9,4.03-9,9s4.03,9,9,9s9-4.03,9-9c0-0.46-0.04-0.92-0.1-1.36 c-0.98,1.37-2.58,2.26-4.4,2.26c-2.98,0-5.4-2.42-5.4-5.4c0-1.81,0.89-3.42,2.26-4.4C12.92,3.04,12.46,3,12,3L12,3z"></path></svg></button></div><div class="navbarSearchContainer_Bca1"><div class="navbar__search"><span aria-label="expand searchbar" role="button" class="search-icon" tabindex="0"></span><input id="search_input_react" type="search" placeholder="Loading..." aria-label="Search" class="navbar__search-input search-bar" disabled=""></div></div></div></div><div role="presentation" class="navbar-sidebar__backdrop"></div></nav><div id="__docusaurus_skipToContent_fallback" class="main-wrapper mainWrapper_z2l0"><div class="docsWrapper_hBAB"><button aria-label="Scroll back to top" class="clean-btn theme-back-to-top-button backToTopButton_sjWU" type="button"></button><div class="docRoot_UBD9"><aside class="theme-doc-sidebar-container docSidebarContainer_YfHR"><div class="sidebarViewport_aRkj"><div class="sidebar_njMd"><nav aria-label="Docs sidebar" class="menu thin-scrollbar menu_SIkG menuWithAnnouncementBar_GW3s"><ul class="theme-doc-sidebar-menu menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link" href="/docs/introduction/about">What is Wayang?</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link menu__link--active" aria-current="page" href="/docs/introduction/benchmark">Benchmarking Wayang</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link" href="/docs/introduction/features">Features</a></li></ul></nav></div></div></aside><main class="docMainContainer_TBSr"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_VOVn"><div class="docItemContainer_Djhp"><article><nav class="theme-doc-breadcrumbs breadcrumbsContainer_Z_bl" aria-label="Breadcrumbs"><ul class="breadcrumbs" itemscope="" itemtype="https://schema.org/BreadcrumbList"><li class="breadcrumbs__item"><a aria-label="Home page" class="breadcrumbs__link" href="/"><svg viewBox="0 0 24 24" class="breadcrumbHomeIcon_YNFT"><path d="M10 19v-5h4v5c0 .55.45 1 1 1h3c.55 0 1-.45 1-1v-7h1.7c.46 0 .68-.57.33-.87L12.67 3.6c-.38-.34-.96-.34-1.34 0l-8.36 7.53c-.34.3-.13.87.33.87H5v7c0 .55.45 1 1 1h3c.55 0 1-.45 1-1z" fill="currentColor"></path></svg></a></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" itemprop="name">Benchmarking Wayang</span><meta itemprop="position" content="1"></li></ul></nav><div class="tocCollapsible_ETCw theme-doc-toc-mobile tocMobile_ITEo"><button type="button" class="clean-btn tocCollapsibleButton_TO0P">On this page</button></div><div class="theme-doc-markdown markdown"><header><h1>Benchmarking Wayang</h1></header><p>Wayang is a powerful cross-platform middleware that can utilize and seamlessly integrate various execution platforms, including Postgres, Spark, Flink, Java Streams, and Python. In benchmarking, we found that Wayang demonstrated exceptional performance across multiple use cases, including a popular machine learning algorithm, a core big data benchmark task, and a relational query.</p>
<p>Apache Wayang has the goal of providing a platform to enable data-agnostic applications and decentralized data processing, the fundamentals of federated learning.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="running-a-relational-query-across-multiple-independent-databases-federation">Running a Relational Query Across Multiple Independent Databases (Federation)<a href="#running-a-relational-query-across-multiple-independent-databases-federation" class="hash-link" aria-label="Direct link to Running a Relational Query Across Multiple Independent Databases (Federation)" title="Direct link to Running a Relational Query Across Multiple Independent Databases (Federation)"></a></h3>
<p>In this use case, we showcase how Apache Wayang addresses the common challenge of analyzing data scattered across diverse systems, a prevalent issue in many data-driven projects. Through Wayang, developers can seamlessly perform analytics on relational data residing in multiple systems. The in-situ data processing approach employed by Wayang introduces data privacy-preserving mechanisms into distributed data analytics tasks.</p>
<p><strong>Datasets</strong>: To test the capabilities of Wayang, we utilized the <a href="https://www.tpc.org/tpch/default5.asp" target="_blank" rel="noopener noreferrer">TPC-H benchmark</a>, which comprises five datasets/relations. The lineitem and orders relations are stored in HDFS, while customer, supplier, and region are in Postgres, and nation is in S3 or the local file system. Our testing covered a range of dataset sizes, from 1GB to 100GB, to ensure our product&#x27;s scalability and robustness.</p>
<p><strong>Query/task</strong>: We evaluate the performance of the complex SQL TPC-H query 5:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"> N_NAME</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">SUM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">L_EXTENDEDPRICE</span><span class="token operator" style="color:#393A34">*</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">1</span><span class="token operator" style="color:#393A34">-</span><span class="token plain">L_DISCOUNT</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> REVENUE</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> CUSTOMER</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ORDERS</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> LINEITEM</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> SUPPLIER</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> NATION</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> REGION</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">WHERE</span><span class="token plain"> C_CUSTKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> O_CUSTKEY </span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> L_ORDERKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> O_ORDERKEY </span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> L_SUPPKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> S_SUPPKEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> C_NATIONKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> S_NATIONKEY </span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> S_NATIONKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> N_NATIONKEY </span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> N_REGIONKEY </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> R_REGIONKEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> R_NAME </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">&#x27;ASIA&#x27;</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> O_ORDERDATE </span><span class="token operator" style="color:#393A34">&gt;=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">&#x27;1994-01-01&#x27;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token operator" style="color:#393A34">AND</span><span class="token plain"> O_ORDERDATE </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> DATEADD</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">YY</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> cast</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">&#x27;1994-01-01&#x27;</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">date</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">GROUP</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">BY</span><span class="token plain"> N_NAME</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">BY</span><span class="token plain"> REVENUE </span><span class="token keyword" style="color:#00009f">DESC</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p><strong>Baselines</strong>: We compared Wayang with two widely-used systems for storing and querying relational data: Spark and Postgres. To conduct a fair comparison, we first moved all datasets to the system being tested (Spark or Postgres).</p>
<p><strong>Results</strong>: The results of our tests, presented in Figure 1, show the execution time in seconds of the SQL query 5. Note that the time required to transfer data to Spark or Postgres was excluded from the results. As seen in Figure 1, Wayang significantly outperformed Postgres while retaining a runtime very close to Spark. It&#x27;s important to note that for Spark to function, we needed twice the time to extract datasets from Postgres and transfer them to Spark.
Wayang achieved such impressive performance by seamlessly combining Postgres with Spark: Wayang&#x27;s query optimizer chose to perform all selections and projections on the data stored in Postgres before extracting data to join with the relations in HDFS. Additionally, the optimizer determined that executing the join between lineitem and supplier in Spark would be beneficial, as it could distribute the computational load of joining to multiple workers. All of this was done without the need for the user to specify where each operation should be executed.</p>
<br>
<img width="75%" alt="" src="/img/benchmarks/disperse_data.png">
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="reducing-execution-costs-for-machine-learning-tasks-using-multiple-systems">Reducing Execution Costs for Machine Learning Tasks Using Multiple Systems<a href="#reducing-execution-costs-for-machine-learning-tasks-using-multiple-systems" class="hash-link" aria-label="Direct link to Reducing Execution Costs for Machine Learning Tasks Using Multiple Systems" title="Direct link to Reducing Execution Costs for Machine Learning Tasks Using Multiple Systems"></a></h3>
<p>To evaluate Apache Wayang&#x27;s ability to accelerate machine learning tasks while optimizing costs by leveraging multiple systems, we conducted a comprehensive benchmark. The benchmark involved storing all data within a single repository, a scenario where selectively offloading data to more powerful engines can significantly enhance performance. To assess this feature, we employed a widely used machine learning algorithm, stochastic gradient descent, to perform classification tasks.<br></p>
<p><strong>Datasets</strong>: We use two real-world datasets that we downloaded from the UCI machine learning repository, namely higgs and rcv1. higgs consists of ~11 million data points with 28 features each and rcv1 contains ~677 thousand data points with ~47 thousand features each. In addition, we construct a synthetic dataset so that we can stress our product with even larger datasets, specifically with 88 million data points of 100 features each.</p>
<p><strong>Query/task</strong>: We test the performance of training classification models for higgs and the synthetic dataset and a regression model rcv1. All three training models use stochastic gradient descent but with a different loss function. We use Hinge loss to simulate support vector machines for the classification tasks and the logistic loss for the regression task.</p>
<p><strong>Baselines</strong>: We compare Wayang against MLlib from Apache Spark and SystemML from IBM, two very popular machine learning libraries.</p>
<p><strong>Results</strong>: Our results, shown in Figure 2, demonstrate that Wayang outperforms both baselines by more than an order of magnitude in runtime performance for large datasets.</p>
<br>
<img width="75%" alt="" src="/img/benchmarks/bench2.png">
<p>Apache Wayang&#x27;s intelligent optimizer dynamically utilizes a hybrid approach, seamlessly blending Spark and local Java execution, to significantly improve the performance of the stochastic gradient descent algorithm. This optimization strategy involves leveraging Spark for efficient preprocessing and data preparation, while transitioning to local Java execution for the subsequent gradient computation phase. During this phase, the dataset shrinks considerably, making it more efficient to process using a single machine. This optimization technique, while not readily apparent without specialized expertise, is seamlessly implemented by Apache Wayang, requiring no additional user intervention.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="optimizing-big-data-analytics-by-adapting-platforms-to-data-and-task-characteristics">Optimizing Big Data Analytics by Adapting Platforms to Data and Task Characteristics<a href="#optimizing-big-data-analytics-by-adapting-platforms-to-data-and-task-characteristics" class="hash-link" aria-label="Direct link to Optimizing Big Data Analytics by Adapting Platforms to Data and Task Characteristics" title="Direct link to Optimizing Big Data Analytics by Adapting Platforms to Data and Task Characteristics"></a></h3>
<p>We demonstrate the platform adaptation capabilities of Apache Wayang, showcasing how it optimizes analytical tasks by tailoring execution to the specific data and task characteristics. Our approach involves dynamically selecting the optimal processing engine based on the nature of the data and the complexity of the analytical operation. This flexibility allows Wayang to deliver enhanced performance and efficiency, particularly for complex analytical workflows that demand the strengths of multiple processing engines.</p>
<p><strong>Datasets</strong>: We use Wikipedia abstracts and store them in HDFS. We vary the dataset size from 1GB to 800GB.</p>
<p><strong>Query/task</strong>: We test our product with a widely popular big data analytical task, namely WordCount. It counts the number of distinct words in a text corpus. Different variations of wordcount are useful in various text mining applications, such as term frequency, word clouds, etc.</p>
<p><strong>Baselines</strong>: We use three different platforms where Wordcount can be executed: Apache Spark, Apache Flink, and a single node Java program. We then set Wayang to be able to automatically choose which of the three platforms to use for each dataset.</p>
<p><strong>Results</strong>: The benchmark results, presented in Figure 3, provide evidence of Apache Wayang&#x27;s platform adaptation prowess, consistently selecting the fastest available platform for various dataset sizes in our WordCount example. By incorporating execution cost modeling into the query optimizer, Wayang eliminates the need for manual platform selection or migration, allowing users to solely focus on their analytical tasks. Our platform adaptation feature seamlessly identifies the optimal platform for each query or task based on data and query characteristics, ensuring faster execution and enhanced efficiency in big data analytics.</p>
<br>
<img width="75%" alt="" src="/img/benchmarks/bench3.png">
<br>
<br></div></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="Docs pages"><a class="pagination-nav__link pagination-nav__link--prev" href="/docs/introduction/about"><div class="pagination-nav__sublabel">Previous</div><div class="pagination-nav__label">What is Wayang?</div></a><a class="pagination-nav__link pagination-nav__link--next" href="/docs/introduction/features"><div class="pagination-nav__sublabel">Next</div><div class="pagination-nav__label">Features</div></a></nav></div></div><div class="col col--3"><div class="tableOfContents_bqdL thin-scrollbar theme-doc-toc-desktop"><ul class="table-of-contents table-of-contents__left-border"><li><a href="#running-a-relational-query-across-multiple-independent-databases-federation" class="table-of-contents__link toc-highlight">Running a Relational Query Across Multiple Independent Databases (Federation)</a></li><li><a href="#reducing-execution-costs-for-machine-learning-tasks-using-multiple-systems" class="table-of-contents__link toc-highlight">Reducing Execution Costs for Machine Learning Tasks Using Multiple Systems</a></li><li><a href="#optimizing-big-data-analytics-by-adapting-platforms-to-data-and-task-characteristics" class="table-of-contents__link toc-highlight">Optimizing Big Data Analytics by Adapting Platforms to Data and Task Characteristics</a></li></ul></div></div></div></div></main></div></div></div><footer class="footer footer--dark"><div class="container container-fluid"><div class="row footer__links"><div class="col footer__col"><div class="footer__title">Community</div><ul class="footer__items clean-list"><li class="footer__item"><a href="https://lists.apache.org/list.html?dev@wayang.apache.org" target="_blank" rel="noopener noreferrer" class="footer__link-item">Mailing list<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://www.youtube.com/@apachewayang" target="_blank" rel="noopener noreferrer" class="footer__link-item">YouTube<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://www.linkedin.com/company/apachewayang" target="_blank" rel="noopener noreferrer" class="footer__link-item">LinkedIn<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://www.reddit.com/r/ApacheWayang" target="_blank" rel="noopener noreferrer" class="footer__link-item">Reddit<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://twitter.com/apachewayang" target="_blank" rel="noopener noreferrer" class="footer__link-item">Twitter<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li></ul></div><div class="col footer__col"><div class="footer__title">Docs</div><ul class="footer__items clean-list"><li class="footer__item"><a class="footer__link-item" href="/docs/start/download">Install</a></li><li class="footer__item"><a class="footer__link-item" href="/docs/introduction/features">Features</a></li><li class="footer__item"><a class="footer__link-item" href="/docs/introduction/benchmark">Benchmark</a></li></ul></div><div class="col footer__col"><div class="footer__title">Repositories</div><ul class="footer__items clean-list"><li class="footer__item"><a href="https://github.com/apache/incubator-wayang" target="_blank" rel="noopener noreferrer" class="footer__link-item">Wayang<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://github.com/apache/incubator-wayang-website" target="_blank" rel="noopener noreferrer" class="footer__link-item">Website<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li></ul></div></div><div class="footer__bottom text--center"><div class="margin-bottom--sm"><a href="https://incubator.apache.org/" rel="noopener noreferrer" class="footerLogoLink_BH7S"><img src="/img/apache-incubator.svg" alt="Apache Incubator logo" class="footer__logo themedComponent_mlkZ themedComponent--light_NVdE" width="200"><img src="/img/apache-incubator.svg" alt="Apache Incubator logo" class="footer__logo themedComponent_mlkZ themedComponent--dark_xIcU" width="200"></a></div><div class="footer__copyright"><div>
<p> Apache Wayang is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. </p>
<p>
Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. <br>
Apache, the names of Apache projects, and the feather logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
</p>
</div></div></div></div></footer></div>
</body>
</html>