| <!DOCTYPE html> |
| <!-- Start _layouts/front_page.html--> |
| <html lang="en"> |
| |
| <head> |
| <!-- Start _include/site_head.html --> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content=""> |
| <meta name="author" content="datasketches"> |
| |
| <title>DataSketches | </title> |
| |
| <link rel="shortcut icon" href="/img/favicon.png"> |
| |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css"> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css"> |
| |
| <link href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600' |
| rel='stylesheet' type='text/css'> |
| |
| <link rel="stylesheet" href="/css/main.css"> |
| <link rel="stylesheet" href="/css/header.css"> |
| <link rel="stylesheet" href="/css/footer.css"> |
| <link rel="stylesheet" href="/css/syntax.css"> |
| <link rel="stylesheet" href="/css/docs.css"> |
| |
| |
| <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML-full"> |
| </script> |
| <script src="https://code.jquery.com/jquery.min.js"></script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script> |
| <!-- End _include/site_head.html --> |
| </head> |
| |
| <body> |
| <!-- Start _include/nav_bar.html --> |
| <div class="navbar navbar-inverse navbar-static-top ds-nav"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a href="/" style="padding-top: 0px; padding-bottom: 0px;"> |
| <span class="ds-small-h-logo"></span></a> |
| </div> |
| <div class="navbar-collapse collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li> |
| <a href="/docs/Background/TheChallenge.html"> |
| <span class="fa fa-info-circle"></span> DOCUMENTATION</a> |
| </li> |
| <li> |
| <a href="/docs/Community/Downloads.html"> |
| <span class="fa fa-download"></span> DOWNLOAD</a> |
| </li> |
| <!-- |
| <li> |
| <a href="/docs/Architecture/Components.html"> |
| <span class="fa fa-github"></span> GITHUB</a> |
| </li> |
| --> |
| <li> |
| <a href="/docs/Community/Research.html"> |
| <span class="fa fa-paper-plane"></span> RESEARCH</a> |
| </li> |
| <li> |
| <a href="/docs/Community/index.html" style="padding-top: 0; padding-bottom: 0;"> |
| <img class="ds-small-man" src="/img/datasketches-ManWhite.svg"/>COMMUNITY</a> |
| </li> |
| <li> |
| <ul class="nav navbar-nav navbar-right ds-nav"> |
| <li class="dropdown ds-nav" > |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="padding-top: 0; padding-bottom: 0;"><img class="apache-logo" src="/img/feather.svg"/>Apache <span class="caret"></span></a> |
| <ul class="dropdown-menu ds-nav"> |
| <li><a href="https://www.apache.org/" target="_blank">Foundation</a></li> |
| <li><a href="https://www.apache.org/events/current-event" target="_blank">Events</a></li> |
| <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <!-- End _include/nav_bar.html --> |
| <!-- Start /index.md --> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <link rel="stylesheet" type="text/css" href="css/index.css" /> |
| |
| <link rel="stylesheet" type="text/css" href="css/header.css" /> |
| |
| <main class="ds-masthead"> |
| <div class="container"> |
| <div class="row"> |
| <div class="col-md-8 col-md-offset-2 text-center"> |
| <span class="ds-bootlogo"></span> |
| <p class="lead" style="font-size: 20px; line-height: 1.0; margin-bottom: 15px; color: #FFFFFF">A software library of |
| <a href="https://en.wikipedia.org/wiki/Stochastic" style="color: #EDE379"><i>stochastic</i></a> |
| <a href="https://en.wikipedia.org/wiki/Streaming_algorithm" style="color: #EDE379"><i>streaming algorithms</i></a></p> |
| <p class="lead" style="font-size: 18px; line-height: 1.0; margin-bottom: 15px; color: #FFFFFF"><i>"A truly excellent example of theoretically-informed algorithm engineering"</i> -- Graham Cormode</p> |
| </div> |
| </div> |
| </div> |
| </main> |
| |
| <div class="container"> |
| <div class="row"> |
| <div class="text-justify" style="font-size: 18px; padding-left: 25px; padding-right: 25px"> |
| <p><b>The Business Challenge:</b> Analyzing Big Data Quickly.</p> |
| <p>In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include <i>count distinct</i>, quantiles, most-frequent items, joins, matrix computations, and graph analysis.</p> |
| |
| <p>If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms or <a href="/docs/Background/SketchOrigins.html">sketches</a>, that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.</p> |
| |
| <p>For any system that needs to extract useful information from big data, these sketches are a required toolkit that should be tightly integrated into its analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.</p> |
| |
| <p>This project is dedicated to providing a broad selection of sketch algorithms of production quality. Contributions are welcome from those interested in further development of this science and art.</p> |
| </div> |
| </div> |
| <div class="row text-center main-marketing"> |
| <div class="col-md-4"> |
| <a href="/docs/Architecture/LargeScale.html#speed"> |
| <span class="fa fa-fighter-jet fa-4x"></span> |
| <h2>Fast</h2> |
| </a> |
| <p class="text-justify"><a href="/docs/Background/SketchOrigins.html">Sketches</a> are <i>fast</i>. |
| The sketch algorithms in this library process data in a single pass and are suitable for |
| both real-time and batch processing. |
| Sketches enable streaming computation of set expression cardinalities, quantiles, frequency estimation and more. |
| In addition, designing a system around sketching simplifies the system's architecture and reduces the overall compute resources required for these heretofore difficult computational tasks.</p> |
| </div> |
| |
| <div class="col-md-4"> |
| <a href="/docs/Architecture/LargeScale.html#specific-sketch-features-for-large-data"> |
| <span class="fa fa-database fa-4x"></span> |
| <h2>Big Data</h2> |
| </a> |
| <p class="text-justify">This library has been specifically designed for production systems that must process massive data. |
| The library includes adaptors for Apache Hive, Apache Pig, and PostgreSQL (C++). These adaptors also serve as examples for building adaptors for other systems. |
| The sketches in this library are designed to have compatible binary representations across languages (Java, C++, Python) and platforms. |
| </p> |
| </div> |
| |
| <div class="col-md-4"> |
| <a href="/docs/Architecture/KeyFeatures.html#key-algorithms"> |
| <span class="fa fa-bar-chart-o fa-4x"></span> |
| <h2>Analysis</h2> |
| </a> |
| <p class="text-justify">Built-in Theta Sketch set operators (Union, Intersection, Difference) |
| produce sketches as a result (and not just a number), enabling full set expressions of cardinality, |
| such as ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F). |
| This capability, along with predictable and superior accuracy |
| (compared with <i>Include/Exclude</i> approaches), enables unprecedented analysis capabilities |
| for fast queries.</p> |
| </div> |
| </div> |
| </div> |
| <!-- End /index.md --> |
| |
| <!-- Start _include/page_footer.html --> |
| <footer class="ds-footer"> |
| <div class="container"> |
| <div class="text-center"> |
| <p> |
| <div>Copyright © 2020 <a href="https://www.apache.org">Apache Software Foundation</a>, |
| Licensed under the Apache License, Version 2.0. All Rights Reserved.<br/> |
| Apache DataSketches, Apache, the Apache feather logo, and the Apache DataSketches project logos are trademarks of The Apache Software Foundation.<br/> |
| All other marks mentioned may be trademarks or registered trademarks of their respective owners. |
| </div> |
| </p> |
| </div> |
| </div> |
| </footer> |
| <!-- End _include/page_footer.html --> |
| |
| </body> |
| |
| </html> |
| <!-- End _layouts/front_page.html--> |