<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<script type="text/javascript">
if(window.location.protocol != 'https:') {
location.href = location.href.replace("http://", "https://");
}
</script>
<title>Apache Wayang - Publication</title>
<link rel="icon" href="https://wayang.apache.org/assets/img/logo/favicon-pluma.ico">
<!-- Bootstrap CSS -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.0/dist/css/bootstrap.min.css" integrity="sha384-B0vP5xmATw1+K9KRQjQERJvTumQW0nPEzvF6L/Z6nronJ3oUOFUFpCjEUQouq2+l" crossorigin="anonymous">
<link rel="stylesheet" href="https://wayang.apache.org/assets/css/color.css">
<link rel="stylesheet" href="https://pro.fontawesome.com/releases/v5.10.0/css/all.css" integrity="sha384-AYmEC3Yw5cVb3ZcuHtOA93w35dYTsvhLPVnYs9eStHfGJvOvKxVfELGroGkvsg+p" crossorigin="anonymous"/>
<link rel="stylesheet" href="https://wayang.apache.org/assets/css/monokai.css">
<link rel="stylesheet" href="https://wayang.apache.org/assets/css/home.css">
</head>
<body>
<nav class="navbar navbar-expand-lg navbar-light bg-light sticky-top shadow-lg">
<div class="container d-flex justify-content-between w-100">
<div class="mr-auto p-2 w-100">
<div class="d-flex">
<a class="navbar-brand mr-auto" href="/">
<img style="max-height: 75px" src="https://wayang.apache.org/assets/img/logo/logo_400x160.png"/>
</a>
<button class="navbar-toggler ml-auto align-self-center" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
</div>
</div>
<div>
<div class="p-2 collapse navbar-collapse" id="navbarSupportedContent">
<div class="navbar-nav">
<li class="nav-item ">
<a class="nav-link" href="https://wayang.apache.org/">
Home
</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="https://wayang.apache.org/about">
About
</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="https://wayang.apache.org/community">
Community
</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="https://wayang.apache.org/documentation">
Documentation
</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="https://wayang.apache.org/publications-home">
Publications
</a>
</li>
<li class="nav-item dropdown ">
<a class="nav-link dropdown-toggle" data-toggle="dropdown" href="#" role="button" aria-haspopup="true" aria-expanded="false">
Apache
</a>
<div class="dropdown-menu">
<a class="dropdown-item" href="http://www.apache.org/foundation/how-it-works.html">
Apache Software Foundation
</a>
<a class="dropdown-item" href="http://www.apache.org/licenses/">
Apache License
</a>
<a class="dropdown-item" href="http://www.apache.org/foundation/sponsorship.html">
Sponsorship
</a>
<a class="dropdown-item" href="http://www.apache.org/foundation/thanks.html">
Thanks
</a>
</div>
</li>
</div>
</div>
</div>
</div>
</div>
</nav>
<div class="container-fluid p-0">
<div class="title-post mb-3 mt-n5 d-flex align-items-center shadow" >
<div class="col pt-4" style="text-align: center">
<h1 class="mb-n2 mt-1" style="color: white; font-size: 4em">Publication</h1>
<h2 style="color: white; font-size: 2em">BigDansing: A System for Big Data Cleansing
</h2>
</div>
</div>
<div class="container">
<div class="row justify-content-md-center mb-4">
<div class="col-12 ">
<div class="post-info-wrapper">
<p class="italic">By <span class="bold">Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang and Si Yin</span> in <span class="bold">2015</span></p>
</div>
<hr />
<p>Data cleansing approaches have usually focused on detecting and fixing errors, with little attention to scaling to big datasets. This presents a serious impediment, since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system that tackles efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general-purpose data processing platforms, ranging from DBMSs to MapReduce-like frameworks. A user-friendly programming interface allows users to express data quality rules both declaratively and procedurally, without requiring them to be aware of the underlying distributed platform. BigDansing translates these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized join operators. Experimental results on both synthetic and real datasets show that BigDansing outperforms existing baseline systems by up to more than two orders of magnitude, without sacrificing the quality provided by the repair algorithms.</p>
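<p>To make the cost of "enumerating pairs of tuples" concrete, here is a minimal sketch. It is a generic illustration, not BigDansing's programming interface: it assumes a hypothetical data quality rule (the functional dependency "zipcode determines city") over a small in-memory list of rows, and the function names are hypothetical. It compares a naive check over all pairs of rows with one that first groups rows by the rule's left-hand side (a technique commonly called blocking), so that only rows sharing a zipcode are compared.</p>

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical rule for illustration: rows with the same `lhs` value
# (e.g. zipcode) must have the same `rhs` value (e.g. city).

def violations_naive(rows, lhs, rhs):
    """Check every pair of rows: n * (n - 1) / 2 comparisons."""
    out = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if a[lhs] == b[lhs] and a[rhs] != b[rhs]:
            out.append((i, j))
    return out


def violations_blocked(rows, lhs, rhs):
    """Group row indices by their `lhs` value, then compare only within a group."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[row[lhs]].append(i)
    out = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if rows[i][rhs] != rows[j][rhs]:
                out.append((i, j))
    return sorted(out)


# Hypothetical sample data: rows 1 and 3 disagree with row 0's city spelling.
rows = [
    {"zipcode": "10001", "city": "New York"},
    {"zipcode": "10001", "city": "NYC"},
    {"zipcode": "60601", "city": "Chicago"},
    {"zipcode": "10001", "city": "New York"},
]

print(violations_naive(rows, "zipcode", "city"))    # [(0, 1), (1, 3)]
print(violations_blocked(rows, "zipcode", "city"))  # [(0, 1), (1, 3)]
```

<p>Both functions report the same violating pairs, but the blocked version only compares rows inside each group. This shortcut relies on the rule having an equality condition to group on. Rules built on inequality comparisons do not partition this way, which is one reason inequality joins are listed among the costly computations above.</p>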
<hr />
</div>
<div class="col-10 text-center">
<a href="/assets/pdf/paper/bigdansing.pdf" class="btn btn-outline-info">
<i class="far fa-file-pdf"></i> Download
</a>
</div>
</div>
</div>
</div>
<footer class="footer position-sticky sticky-bottom">
<nav class="navbar navbar-light bg-light" style="background: #A6A6A6;">
<div class="container">
<div class="row">
<div class="col text-center">
<a href="http://incubator.apache.org/" >
<img style="max-height: 15vw" src="https://wayang.apache.org/assets/img/egg-logo.png">
</a>
<br />
<p style="text-align: justify">
Apache Wayang is an effort undergoing Incubation at The Apache Software Foundation (ASF), sponsored by the Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
</p>
<p class="text-center">
Copyright &#169; 2021 The Apache Software Foundation.<br />
Licensed under the Apache License, Version 2.0.<br />
Apache, the Apache Feather logo, and the Apache Incubator project logo are trademarks of The Apache Software Foundation.
</p>
</div>
</div>
</div>
</nav>
</footer>
<script src="https://code.jquery.com/jquery-3.5.1.slim.min.js" integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@4.6.0/dist/js/bootstrap.bundle.min.js" integrity="sha384-Piv4xVNRyMGpqkS2by6br4gNJ7DXjqk09RmUpJ8jgGtD7zP9yug3goQfGII0yAns" crossorigin="anonymous"></script>
<script src="https://wayang.apache.org/assets/js/add_numbers.js"></script>
</body>
</html>