blob: 60d64c583c61eb0746d74222cb43e46b4aaffab3 [file] [log] [blame]
<!DOCTYPE html>
<!-- Start _layouts/front_page.html-->
<html lang="en">
<head>
<!-- Start _include/site_head.html -->
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="datasketches">
<title>DataSketches | </title>
<link rel="shortcut icon" href="/img/favicon.png">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<link href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600'
rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/main.css">
<link rel="stylesheet" href="/css/header.css">
<link rel="stylesheet" href="/css/footer.css">
<link rel="stylesheet" href="/css/syntax.css">
<link rel="stylesheet" href="/css/docs.css">
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML-full">
</script>
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
<!-- End _include/site_head.html -->
</head>
<body>
<!-- Start _include/nav_bar.html -->
<div class="navbar navbar-inverse navbar-static-top ds-nav">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="/" style="padding-top: 0px; padding-bottom: 0px;">
<span class="ds-small-h-logo"></span></a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a href="/docs/Background/TheChallenge.html">
<span class="fa fa-info-circle"></span> DOCUMENTATION</a>
</li>
<li>
<a href="/docs/Community/Downloads.html">
<span class="fa fa-download"></span> DOWNLOAD</a>
</li>
<!--
<li>
<a href="/docs/Architecture/Components.html">
<span class="fa fa-github"></span> GITHUB</a>
</li>
-->
<li>
<a href="/docs/Community/Research.html">
<span class="fa fa-paper-plane"></span> RESEARCH</a>
</li>
<li>
<a href="/docs/Community/index.html" style="padding-top: 0; padding-bottom: 0;">
<img class="ds-small-man" src="/img/datasketches-ManWhite.svg"/>COMMUNITY</a>
</li>
<li>
<ul class="nav navbar-nav navbar-right ds-nav">
<li class="dropdown ds-nav" >
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="padding-top: 0; padding-bottom: 0;"><img class="apache-logo" src="/img/feather.svg"/>Apache <span class="caret"></span></a>
<ul class="dropdown-menu ds-nav">
<li><a href="https://www.apache.org/" target="_blank">Foundation</a></li>
<li><a href="https://www.apache.org/events/current-event" target="_blank">Events</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
<!-- End _include/nav_bar.html -->
<!-- Start /index.md -->
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<link rel="stylesheet" type="text/css" href="css/index.css" />
<link rel="stylesheet" type="text/css" href="css/header.css" />
<main class="ds-masthead">
<div class="container">
<div class="row">
<div class="col-md-8 col-md-offset-2 text-center">
<span class="ds-bootlogo"></span>
<p class="lead" style="font-size: 20px; line-height: 1.0; margin-bottom: 15px; color: #FFFFFF">A software library of
<a href="https://en.wikipedia.org/wiki/Stochastic" style="color: #EDE379"><i>stochastic</i></a>
<a href="https://en.wikipedia.org/wiki/Streaming_algorithm" style="color: #EDE379"><i>streaming algorithms</i></a></p>
<p class="lead" style="font-size: 18px; line-height: 1.0; margin-bottom: 15px; color: #FFFFFF"><i>"A truly excellent example of theoretically-informed algorithm engineering"</i> -- Graham Cormode</p>
</div>
</div>
</div>
</main>
<div class="container">
<div class="row">
<div class="text-justify" style="font-size: 18px; padding-left: 25px; padding-right: 25px">
<p><b>The Business Challenge:</b> Analyzing Big Data Quickly.</p>
<p>In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include <i>count distinct</i>, quantiles, most-frequent items, joins, matrix computations, and graph analysis.</p>
<p>If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms, or <a href="/docs/Background/SketchOrigins.html">sketches</a> that can produce results orders-of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.</p>
<p>For any system that needs to extract useful information from big data these sketches are a required toolkit that should be tightly integrated into their analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.</p>
<p>This project is dedicated to providing a broad selection of sketch algorithms of production quality. Contributions are welcome from those interested in further development of this science and art.</p>
</div>
</div>
<div class="row text-center main-marketing">
<div class="col-md-4">
<a href="/docs/Architecture/LargeScale.html#speed">
<span class="fa fa-fighter-jet fa-4x"></span>
<h2>Fast</h2>
</a>
<p class="text-justify"><a href="/docs/Background/SketchOrigins.html">Sketches</a> are <i>fast</i>.
The sketch algorithms in this library process data in a single pass and are suitable for
both real-time and batch.
Sketches enable streaming computation of set expression cardinalities, quantiles, frequency estimation and more.
In addition, designing a system around sketching allows simplification of system's architecture and reduction in overall compute resources required for these heretofore difficult computational tasks.</p>
</div>
<div class="col-md-4">
<a href="/docs/Architecture/LargeScale.html#specific-sketch-features-for-large-data">
<span class="fa fa-database fa-4x"></span>
<h2>Big Data</h2>
</a>
<p class="text-justify">This library has been specifically designed for production systems that must process massive data.
The library includes adaptors for Apache Hive, Apache Pig, and PostgreSQL (C++). These adaptors also stand as examples for adaptors for other systems.
The sketches in this library are designed to have compatible binary representations across languages (Java, C++, Python) and platforms.
</p>
</div>
<div class="col-md-4">
<a href="/docs/Architecture/KeyFeatures.html#key-algorithms">
<span class="fa fa-bar-chart-o fa-4x"></span>
<h2>Analysis</h2>
</a>
<p class="text-justify">Built-in Theta Sketch set operators (Union, Intersection, Difference)
produce sketches as a result (and not just a number) enabling full set expressions of cardinality,
such as ((A &#8746; B) &#8745; (C &#8746; D)) \ (E &#8746; F).
This capability along with predictable and superior accuracy
(compared with <i>Include/Exclude</i> approaches) enable unprecedented analysis capabilities
for fast queries. </p>
</div>
</div>
</div>
<!-- End /index.md -->
<!-- Start _include/page_footer.html -->
<footer class="ds-footer">
<div class="container">
<div class="text-center">
<p>
<div>Copyright © 2020 <a href="https://www.apache.org">Apache Software Foundation</a>,
Licensed under the Apache License, Version 2.0. All Rights Reserved.<br/>
Apache DataSketches, Apache, the Apache feather logo, and the Apache DataSketches project logos are trademarks of The Apache Software Foundation.<br/>
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
</div>
</p>
</div>
</div>
</footer>
<!-- End _include/page_footer.html -->
</body>
</html>
<!-- End _layouts/front_page.html-->