| <!DOCTYPE html> |
| <!-- Start _layouts/doc_page.html--> |
| <html lang="en"> |
| |
| <head> |
| <!-- Start _include/site_head.html --> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content=""> |
| <meta name="author" content="datasketches"> |
| |
| <title>DataSketches | </title> |
| |
| <link rel="shortcut icon" href="/img/favicon.png"> |
| |
| <!-- original source: https://maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css --> |
| <link rel="stylesheet" href="/css/font-awesome.min.css"> |
| <!-- original source: https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css --> |
| <link rel="stylesheet" href="/css/bootstrap.min.css"> |
| |
| <link rel="stylesheet" href="/css/fonts.css" type="text/css"> |
| |
| <link rel="stylesheet" href="/css/main.css"> |
| <link rel="stylesheet" href="/css/header.css"> |
| <link rel="stylesheet" href="/css/footer.css"> |
| <link rel="stylesheet" href="/css/syntax.css"> |
| <link rel="stylesheet" href="/css/docs.css"> |
| |
| <script type="text/x-mathjax-config"> |
| MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},showMathMenu:false,showMathMenuMSIE:false,showProcessingMessages:false}); |
| </script> |
| |
| <!-- original source: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMX_HTML-full --> |
| <script type="text/javascript" src="/js/MathJax.js?config=TeX-AMS_HTML"></script> |
| |
| <!-- original source: https://code.jquery.com/jquery.min.js --> |
| <script src="/js/jquery.min.js"></script> |
| |
| <!-- original source: https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js --> |
| <script src="/js/bootstrap.min.js"></script> <!-- 3.2.0--> |
| <!-- End _include/site_head.html --> |
| |
| </head> |
| |
| <body> |
| <!-- Start _include/nav_bar.html --> |
| <div class="navbar navbar-inverse navbar-static-top ds-nav"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a href="/" style="padding-top: 0px; padding-bottom: 0px;"> |
| <span class="ds-small-h-logo"></span></a> |
| </div> |
| <div class="navbar-collapse collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li> |
| <a href="/docs/Background/TheChallenge.html"> |
| <span class="fa fa-info-circle"></span> DOCUMENTATION</a> |
| </li> |
| <li> |
| <a href="/docs/Community/Downloads.html"> |
| <span class="fa fa-download"></span> DOWNLOAD</a> |
| </li> |
| <!-- |
| <li> |
| <a href="/docs/Architecture/Components.html"> |
| <span class="fa fa-github"></span> GITHUB</a> |
| </li> |
| --> |
| <li> |
| <a href="/docs/Community/Research.html"> |
| <span class="fa fa-paper-plane"></span> RESEARCH</a> |
| </li> |
| <li> |
| <a href="/docs/Community/index.html" style="padding-top: 0; padding-bottom: 0;"> |
| <img class="ds-small-man" src="/img/datasketches-ManWhite.svg"/>COMMUNITY</a> |
| </li> |
| <li> |
| <ul class="nav navbar-nav navbar-right ds-nav"> |
| <li class="dropdown ds-nav" > |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="padding-top: 0; padding-bottom: 0;"><img class="apache-logo" src="/img/feather.svg"/>Apache <span class="caret"></span></a> |
| <ul class="dropdown-menu ds-nav"> |
| <li><a href="https://www.apache.org/" target="_blank">Foundation</a></li> |
| <li><a href="https://www.apache.org/events/current-event" target="_blank">Events</a></li> |
| <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> |
| <li><a href="https://privacy.apache.org/policies/privacy-policy-public.html" target="_blank">Privacy Policy</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <!-- End _include/nav_bar.html --> |
| |
| |
| <!-- Start _include/javadocs.html --> |
| <div class="ds-header"> |
| <div class="container"> |
| <h4>API Snapshots: |
| <a href="https://apache.github.io/datasketches-java/4.2.0/">Java Core</a>, |
| <a href="https://apache.github.io/datasketches-cpp/5.0.0/">C++ Core</a>, |
| <a href="https://apache.github.io/datasketches-python/main/">Python</a>, |
| <a href="https://apache.github.io/datasketches-memory/master/">Memory</a>, |
| <a href="/api/pig/snapshot/apidocs/index.html">Pig</a>, |
| <a href="/api/hive/snapshot/apidocs/index.html">Hive</a>, |
| </h4> |
| </div> |
| </div> |
| <!-- End _include/javadocs.html --> |
| |
| |
| <div class="container"> |
| <div class="row"> |
| <!-- Start ToC Block --> |
| <div class="col-md-3"> |
| <div class="searchbox" style="position:relative"> |
| <gcse:searchbox-only></gcse:searchbox-only> |
| </div> |
| <!-- Start _includes/toc.html --> |
| <!-- Computer Generated File, Do Not Edit! --> |
| <link rel="stylesheet" href="/css/toc.css"> |
| <div id="toc" class="nav toc hidden-print"> |
| |
| <p id="background"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_background">Background</a> |
| </p> |
| <div class="collapse" id="collapse_background"> |
| <li><a href="/docs/Background/TheChallenge.html">•The Challenge</a></li> |
| <li><a href="/docs/Background/SketchOrigins.html">•Sketch Origins</a></li> |
| <li><a href="/docs/Background/SketchElements.html">•Sketch Elements</a></li> |
| <li><a href="/docs/Background/Presentations.html">•Presentations</a></li> |
| <li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches_deck.pdf">•Overview Slide Deck</a></li> |
| </div> |
| |
| <p id="architecture-and-design"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_architecture_and_design">Architecture And Design</a> |
| </p> |
| <div class="collapse" id="collapse_architecture_and_design"> |
| <li><a href="/docs/Architecture/MajorSketchFamilies.html">•The Major Sketch Families</a></li> |
| <li><a href="/docs/Architecture/LargeScale.html">•Large Scale Computing</a></li> |
| <li><a href="/docs/Architecture/KeyFeatures.html">•Key Features</a></li> |
| <li><a href="/docs/Architecture/SketchFeaturesMatrix.html">•Sketch Features Matrix</a></li> |
| <li><a href="/docs/Architecture/Components.html">•Components</a></li> |
| <li><a href="/docs/Architecture/SketchesByComponent.html">•Sketches by Component</a></li> |
| <li><a href="/docs/Architecture/SketchCriteria.html">•Sketch Criteria</a></li> |
| |
| <p id="memory-component"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_memory_component">Memory Component</a> |
| </p> |
| <div class="collapse" id="collapse_memory_component"> |
| <li><a href="/docs/Memory/MemoryComponent.html">•Memory Component</a></li> |
| <li><a href="/docs/Memory/MemoryPerformance.html">•Memory Component Performance</a></li> |
| </div> |
| <li><a href="/docs/Architecture/OrderSensitivity.html">•Notes on Order Sensitivity</a></li> |
| <li><a href="/docs/Architecture/Concurrency.html">•Notes on Concurrency</a></li> |
| </div> |
| |
| <p id="sketch-families"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_sketch_families">Sketch Families</a> |
| </p> |
| <div class="collapse" id="collapse_sketch_families"> |
| |
| <p id="distinct-counting"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_distinct_counting">Distinct Counting</a> |
| </p> |
| <div class="collapse" id="collapse_distinct_counting"> |
| <li><a href="/docs/DistinctCountFeaturesMatrix.html">•Features Matrix</a></li> |
| <li><a href="/docs/DistinctCountMeritComparisons.html">•Figures-of-Merit Comparison</a></li> |
| |
| <p id="cpc-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_cpc_sketches">CPC Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_cpc_sketches"> |
| <li><a href="/docs/CPC/CPC.html">•CPC Sketch</a></li> |
| <li><a href="/docs/CPC/CpcPerformance.html">•CPC Sketch Performance</a></li> |
| |
| <p id="cpc-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_cpc_examples">CPC Examples</a> |
| </p> |
| <div class="collapse" id="collapse_cpc_examples"> |
| <li><a href="/docs/CPC/CpcJavaExample.html">•CPC Sketch Java Example</a></li> |
| <li><a href="/docs/CPC/CpcCppExample.html">•CPC Sketch C++ Example</a></li> |
| <li><a href="/docs/CPC/CpcPigExample.html">•CPC Sketch Pig UDFs</a></li> |
| <li><a href="/docs/CPC/CpcHiveExample.html">•CPC Sketch Hive UDFs</a></li> |
| </div> |
| </div> |
| |
| <p id="hyperloglog-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_hyperloglog_sketches">HyperLogLog Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_hyperloglog_sketches"> |
| <li><a href="/docs/HLL/HLL.html">•HLL Sketch</a></li> |
| <li><a href="/docs/HLL/HllMap.html">•HLL Map Sketch</a></li> |
| |
| <p id="hll-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_hll_examples">HLL Examples</a> |
| </p> |
| <div class="collapse" id="collapse_hll_examples"> |
| <li><a href="/docs/HLL/HllJavaExample.html">•HLL Sketch Java Example</a></li> |
| <li><a href="/docs/HLL/HllCppExample.html">•HLL Sketch C++ Example</a></li> |
| <li><a href="/docs/HLL/HllPigUDFs.html">•HLL Sketch Pig UDFs</a></li> |
| <li><a href="/docs/HLL/HllHiveUDFs.html">•HLL Sketch Hive UDFs</a></li> |
| </div> |
| |
| <p id="hll-studies"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_hll_studies">HLL Studies</a> |
| </p> |
| <div class="collapse" id="collapse_hll_studies"> |
| <li><a href="/docs/HLL/HllPerformance.html">•HLL Sketch Performance</a></li> |
| <li><a href="/docs/HLL/Hll_vs_CS_Hllpp.html">•HLL vs Clearspring HLL++</a></li> |
| <li><a href="/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html">•HLL Sketch vs Druid HyperLogLogCollector</a></li> |
| </div> |
| </div> |
| |
| <p id="theta-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketches">Theta Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_theta_sketches"> |
| <li><a href="/docs/Theta/ThetaSketchFramework.html">•Theta Sketch Framework</a></li> |
| |
| <p id="theta-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_examples">Theta Examples</a> |
| </p> |
| <div class="collapse" id="collapse_theta_examples"> |
| <li><a href="/docs/Theta/ConcurrentThetaSketch.html">•Concurrent Theta Sketch</a></li> |
| <li><a href="/docs/Theta/ThetaJavaExample.html">•Theta Sketch Java Example</a></li> |
| <li><a href="/docs/Theta/ThetaSparkExample.html">•Theta Sketch Spark Example</a></li> |
| <li><a href="/docs/Theta/ThetaPigUDFs.html">•Theta Sketch Pig UDFs</a></li> |
| <li><a href="/docs/Theta/ThetaHiveUDFs.html">•Theta Sketch Hive UDFs</a></li> |
| </div> |
| |
| <p id="kmv-tutorial"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_kmv_tutorial">KMV Tutorial</a> |
| </p> |
| <div class="collapse" id="collapse_kmv_tutorial"> |
| <li><a href="/docs/Theta/InverseEstimate.html">•The Inverse Estimate</a></li> |
| <li><a href="/docs/Theta/KMVempty.html">•Empty Sketch</a></li> |
| <li><a href="/docs/Theta/KMVfirstEst.html">•First Estimator</a></li> |
| <li><a href="/docs/Theta/KMVbetterEst.html">•Better Estimator</a></li> |
| <li><a href="/docs/Theta/KMVrejection.html">•Rejection Rules</a></li> |
| <li><a href="/docs/Theta/KMVupdateVkth.html">•Update V(kth) Rule</a></li> |
| </div> |
| |
| <p id="set-operations-and-p-sampling"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_set_operations_and_p-sampling">Set Operations and P-sampling</a> |
| </p> |
| <div class="collapse" id="collapse_set_operations_and_p-sampling"> |
| <li><a href="/docs/Theta/ThetaSketchSetOps.html">•Set Operations</a></li> |
| <li><a href="/docs/Theta/ThetaSetOpsCornerCases.html">•Model & Test Set Operations</a></li> |
| <li><a href="/docs/Theta/ThetaPSampling.html">•<i>p</i>-Sampling</a></li> |
| </div> |
| |
| <p id="accuracy"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_accuracy">Accuracy</a> |
| </p> |
| <div class="collapse" id="collapse_accuracy"> |
| <li><a href="/docs/Theta/ThetaAccuracy.html">•Basic Accuracy</a></li> |
| <li><a href="/docs/Theta/ThetaAccuracyPlots.html">•Accuracy Plots</a></li> |
| <li><a href="/docs/Theta/ThetaErrorTable.html">•Relative Error Table</a></li> |
| <li><a href="/docs/Theta/ThetaSketchSetOpsAccuracy.html">•SetOp Accuracy</a></li> |
| <li><a href="/docs/Theta/AccuracyOfDifferentKUnions.html">•Unions With Different k</a></li> |
| </div> |
| |
| <p id="size"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_size">Size</a> |
| </p> |
| <div class="collapse" id="collapse_size"> |
| <li><a href="/docs/Theta/ThetaSize.html">•Theta Sketch Size</a></li> |
| </div> |
| |
| <p id="speed"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_speed">Speed</a> |
| </p> |
| <div class="collapse" id="collapse_speed"> |
| <li><a href="/docs/Theta/ThetaUpdateSpeed.html">•Update Speed</a></li> |
| <li><a href="/docs/Theta/ThetaMergeSpeed.html">•Merge Speed</a></li> |
| </div> |
| |
| <p id="theta-sketch-theory"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketch_theory">Theta Sketch Theory</a> |
| </p> |
| <div class="collapse" id="collapse_theta_sketch_theory"> |
| <li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">•Theta Sketch Framework (PDF)</a></li> |
| <li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchEquations.pdf">•Theta Sketch Equations (PDF)</a></li> |
| <li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches.pdf">•DataSketches (PDF)</a></li> |
| <li><a href="/docs/Theta/ThetaConfidenceIntervals.html">•Confidence Intervals Notes</a></li> |
| <li><a href="/docs/Theta/ThetaMergingAlgorithm.html">•Merging Algorithm Notes</a></li> |
| <li><a href="/docs/Theta/ThetaReferences.html">•Theta References</a></li> |
| </div> |
| </div> |
| |
| <p id="tuple-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_tuple_sketches">Tuple Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_tuple_sketches"> |
| <li><a href="/docs/Tuple/TupleOverview.html">•Tuple Overview</a></li> |
| |
| <p id="tuple-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_tuple_examples">Tuple Examples</a> |
| </p> |
| <div class="collapse" id="collapse_tuple_examples"> |
| <li><a href="/docs/Tuple/TupleJavaExample.html">•Tuple Java Example</a></li> |
| <li><a href="/docs/Tuple/TupleEngagementExample.html">•Tuple Engagement Example</a></li> |
| <li><a href="/docs/Tuple/TuplePigUDFs.html">•Tuple Pig UDFs</a></li> |
| <li><a href="/docs/Tuple/TupleHiveUDFs.html">•Tuple Hive UDFs</a></li> |
| </div> |
| </div> |
| </div> |
| |
| <p id="most-frequent"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_most_frequent">Most Frequent</a> |
| </p> |
| <div class="collapse" id="collapse_most_frequent"> |
| <li><a href="/docs/Frequency/FrequencySketchesOverview.html">•Frequency Sketches Overview</a></li> |
| |
| <p id="frequent-item-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_frequent_item_sketches">Frequent Item Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_frequent_item_sketches"> |
| <li><a href="/docs/Frequency/FrequentItemsOverview.html">•Frequent Items Overview</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsErrorTable.html">•Frequent Items Error Table</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsReferences.html">•Frequent Items References</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsPerformance.html">•Frequent Items Performance</a></li> |
| |
| <p id="most-frequent-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_most_frequent_examples">Most Frequent Examples</a> |
| </p> |
| <div class="collapse" id="collapse_most_frequent_examples"> |
| <li><a href="/docs/Frequency/FrequentItemsJavaExample.html">•Frequent Items Java Example</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsCppExample.html">•Frequent Items C++ Example</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsPigUDFs.html">•Frequent Items Pig UDFs</a></li> |
| <li><a href="/docs/Frequency/FrequentItemsHiveUDFs.html">•Frequent Items Hive UDFs</a></li> |
| </div> |
| </div> |
| |
| <p id="frequent-distinct-sketches"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_frequent_distinct_sketches">Frequent Distinct Sketches</a> |
| </p> |
| <div class="collapse" id="collapse_frequent_distinct_sketches"> |
| <li><a href="/docs/Frequency/FrequentDistinctTuplesSketch.html">•Frequent Distinct Tuples Sketch</a></li> |
| </div> |
| </div> |
| |
| <p id="quantiles-and-histograms"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_and_histograms">Quantiles And Histograms</a> |
| </p> |
| <div class="collapse" id="collapse_quantiles_and_histograms"> |
| <li><a href="/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html">•Quantiles and Ranks Tutorial</a></li> |
| <li><a href="/docs/Quantiles/QuantilesOverview.html">•Quantiles Overview</a></li> |
| <li><a href="/docs/KLL/KLLSketch.html">•KLL Floats sketch</a></li> |
| <li><a href="/docs/KLL/KLLAccuracyAndSize.html">•KLL Sketch Accuracy and Size</a></li> |
| <li><a href="/docs/REQ/ReqSketch.html">•REQ Floats sketch</a></li> |
| <li><a href="/docs/Quantiles/OrigQuantilesSketch.html">•Original QuantilesSketch</a></li> |
| |
| <p id="quantiles-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_examples">Quantiles Examples</a> |
| </p> |
| <div class="collapse" id="collapse_quantiles_examples"> |
| <li><a href="/docs/Quantiles/QuantilesJavaExample.html">•Quantiles Sketch Java Example</a></li> |
| <li><a href="/docs/KLL/KLLCppExample.html">•KLL Quantiles Sketch C++ Example</a></li> |
| <li><a href="/docs/Quantiles/QuantilesPigUDFs.html">•Quantiles Sketch Pig UDFs</a></li> |
| <li><a href="/docs/Quantiles/QuantilesHiveUDFs.html">•Quantiles Sketch Hive UDFs</a></li> |
| </div> |
| |
| <p id="quantiles-studies"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_studies">Quantiles Studies</a> |
| </p> |
| <div class="collapse" id="collapse_quantiles_studies"> |
| <li><a href="/docs/QuantilesStudies/DruidApproxHistogramStudy.html">•Druid Approximate Histogram</a></li> |
| <li><a href="/docs/QuantilesStudies/MomentsSketchStudy.html">•Moments Sketch Study</a></li> |
| <li><a href="/docs/QuantilesStudies/QuantilesStreamAStudy.html">•Quantiles StreamA Study</a></li> |
| <li><a href="/docs/QuantilesStudies/ExactQuantiles.html">•Exact Quantiles for Studies</a></li> |
| </div> |
| |
| <p id="quantiles-sketch-theory"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_sketch_theory">Quantiles Sketch Theory</a> |
| </p> |
| <div class="collapse" id="collapse_quantiles_sketch_theory"> |
| <li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/Quantiles_KLL.pdf">•Optimal Quantile Approximation in Streams</a></li> |
| <li><a href="/docs/Quantiles/QuantilesReferences.html">•Quantiles References</a></li> |
| </div> |
| </div> |
| |
| <p id="sampling"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_sampling">Sampling</a> |
| </p> |
| <div class="collapse" id="collapse_sampling"> |
| <li><a href="/docs/Sampling/ReservoirSampling.html">•Reservoir Sampling</a></li> |
| <li><a href="/docs/Sampling/ReservoirSamplingPerformance.html">•Reservoir Sampling Performance</a></li> |
| <li><a href="/docs/Sampling/VarOptSampling.html">•VarOpt Sampling</a></li> |
| |
| <p id="sampling-examples"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_sampling_examples">Sampling Examples</a> |
| </p> |
| <div class="collapse" id="collapse_sampling_examples"> |
| <li><a href="/docs/Sampling/ReservoirSamplingJava.html">•Reservoir Sampling Java Example</a></li> |
| <li><a href="/docs/Sampling/ReservoirSamplingPigUDFs.html">•Reservoir Sampling Pig UDFs</a></li> |
| <li><a href="/docs/Sampling/VarOptSamplingJava.html">•VarOpt Sampling Java Example</a></li> |
| <li><a href="/docs/Sampling/VarOptPigUDFs.html">•VarOpt Sampling Pig UDFs</a></li> |
| </div> |
| </div> |
| </div> |
| |
| <p id="system-integrations"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_system_integrations">System Integrations</a> |
| </p> |
| <div class="collapse" id="collapse_system_integrations"> |
| <li><a href="/docs/SystemIntegrations/ApacheDruidIntegration.html">•Using Sketches in ApacheDruid</a></li> |
| <li><a href="/docs/SystemIntegrations/ApacheHiveIntegration.html">•Using Sketches in Apache Hive</a></li> |
| <li><a href="/docs/SystemIntegrations/ApachePigIntegration.html">•Using Sketches in Apache Pig</a></li> |
| <li><a href="/docs/SystemIntegrations/PostgreSQLIntegration.html">•Using Sketches in PostgreSQL</a></li> |
| </div> |
| |
| <p id="community"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_community">Community</a> |
| </p> |
| <div class="collapse" id="collapse_community"> |
| <li><a href="/docs/Community/index.html">•Community</a></li> |
| <li><a href="/docs/Community/Downloads.html">•Downloads</a></li> |
| <li><a href="/docs/Community/NewCommitterProcess.html">•Committer Process</a></li> |
| <li><a href="/docs/Community/ReleaseProcessForCppComponents.html">•Release Process For CPP Components</a></li> |
| <li><a href="/docs/Community/ReleaseProcessForJavaComponents.html">•Release Process For Java Components</a></li> |
| <li><a href="/docs/Community/Transitioning.html">•Transitioning from prior GitHub Site</a></li> |
| </div> |
| |
| <p id="research"> |
| <a data-toggle="collapse" class="menu collapsed" href="#collapse_research">Research</a> |
| </p> |
| <div class="collapse" id="collapse_research"> |
| <li><a href="/docs/Community/Research.html">•Research</a></li> |
| </div> |
| </div> |
| <!-- End _includes/toc.html --> |
| |
| |
| <!-- Start _includes/tocScript.html --> |
| <script> |
| (function () { |
| |
| var findLineItem = function (path) { |
| return document.querySelector(`#toc [href="${path}"]`); |
| }; |
| |
| function findNavItem(path) { |
| return document.querySelector(`.nav [href="${path}"]`); |
| } |
| |
| var highlighLineItem = function (element) { |
| element.classList.add('highlight'); |
| }; |
| |
| var checkHasClass = function (element, className) { |
| return element.className.split(' ').find(function (item) { return item === className || '' }) |
| } |
| |
| var findAllCollapseParents = function (element) { |
| var collapseMenus = []; |
| var elementPointer = element; |
| while (elementPointer !== document.body) { |
| if (checkHasClass(elementPointer, 'collapse')) { |
| collapseMenus.push(elementPointer); |
| } |
| elementPointer = elementPointer.parentElement |
| } |
| return collapseMenus |
| }; |
| |
| var openMenuItem = function (element) { |
| // $(element).collapse('show') would start a transition, adding `in` class instead. |
| element.classList.add('in'); |
| }; |
| |
| var openAllFromList = function (elementList) { |
| elementList.forEach(openMenuItem); |
| }; |
| |
| var highlightAndOpenMenu = function () { |
| // Highlight & expand nav item in the TOC |
| var currentLineItem = findLineItem(document.location.pathname); |
| highlighLineItem(currentLineItem); |
| openAllFromList(findAllCollapseParents(currentLineItem)); |
| |
| // Highlight nav item in top navigation |
| highlighLineItem(findNavItem(document.location.pathname)); |
| }; |
| |
| $(highlightAndOpenMenu); |
| |
| }()); |
| </script> |
| <!-- End _includes/tocScript.html --> |
| |
| </div> |
| <!-- End ToC Block --> |
| <div class="col-md-9 doc-content"> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <h2 id="size-and-byte-array-structures">Size and Byte Array Structures</h2> |
| |
| <p>All the sketches in the <i>theta</i> package share a common byte array data structure and similar |
| strategies for managing use of memory when the sketches are instantiated on the Java heap or |
| when instantiated in off-heap native memory.</p> |
| |
| <p>The byte array structure has two forms <i>Hash Table</i> and <i>Compact</i>.</p> |
| |
| <h3 id="update-sketch-families--hash-table-form">Update Sketch Families – Hash Table Form</h3> |
| <p>The Hash Table form is similar to how the sketch is instantiated on the Java heap. |
| Hash tables consume more space depending on how full the table is. |
| However, updating the sketch is much faster in this form and is the default for all the Update Sketches.</p> |
| |
| <p>There are two Update Sketch families, <i>QuickSelect</i>, and <i>Alpha</i>. |
| The QuickSelect family has both a Java heap and a direct, off-heap variants while the |
| Alpha sketch is designed only for operation on the Java heap.</p> |
| |
| <p>The Update Sketches consist of an internal hash table cache and a few other primitives. |
| How the cache is sized and/or resized can be controlled by the user specified |
| <i>ResizeFactor enum</i> that has the values <i>X1, X2, X4</i>, and <i>X8</i> (default). |
| The meaning of these values is as follows:</p> |
| |
| <ul> |
| <li>X1: A Resize Factor of 1 means the cache will not increase its size. A new sketch will |
| immediately allocate space for twice the given Nominal Entries. |
| For example, if <i>nomEntries is 4096</i> the internal cache will be allocated at 8192 entries. |
| Each entry is 8 bytes.</li> |
| <li>X2: A Resize Factor of 2 means the cache may increase the cache size by factors of 2. |
| A new sketch will initially allocate a very small cache (typically enough for 16 entries), |
| and, when required, will increase the cache size by factors of 2 until the maximum cache |
| size is reached, which is twice the Nominal Entries.</li> |
| <li>X4: A Resize Factor of 4 is similar to X2, except the resizing factor is 4.</li> |
| <li>X8: A Resize Factor of 8 is similar to X2, except the resizing factor is 8. |
| This is the default.</li> |
| </ul> |
| |
| <p>Resizing a cache is costly as a new array needs to be allocated and the hash table rebuilt. |
| Depending on the application, the user can select the Resize Factor to trade-off memory size |
| and overall update speed performance.</p> |
| |
| <p>When an Update Sketch is converted to a byte array using <i>toByteArray()</i>, |
| the internal structure is a 24-byte preamble followed by a non-contiguous |
| hash table array of 8-byte, long data entries. |
| This enables the sketch to be quickly reconstructed from the byte array so that updating |
| of the sketch can be continued.</p> |
| |
| <h3 id="compact-sketch-family--compact-form">Compact Sketch Family – Compact Form</h3> |
| <p>Once the updating of a sketch is completed the HT is no longer needed, so the sketch can be |
| stored in a compact form. |
| The size of this compact form is a simple function of the number of retained hash values |
| (8 bytes) and a small preamble that varies from 8 to 24 bytes depending on the |
| internal state of the sketch. An empty sketch is represented by only 8 bytes. |
| The upper limit of the size of the sketch varies by the type of sketch but is |
| in the range of <i>8<em>k to 16</em>k</i>. |
| <i>k</i> is the configured size of the sketch in nominal entries, |
| and also determines the accuracy of the sketch.</p> |
| |
| <p>Compact Sketches optimize memory usage and sketch merge performance, are inherently <i>read only</i> |
| and can only be created from an existing Update Sketch or Set Operation |
| (<i>Union, Intersection, or AnotB</i>) object. |
| The internal structure of Compact Sketches is an 8, 16, or 24 byte preamble followed |
| by a contiguous data array of 8-byte, long data entries. |
| This data array can be either <i>ordered</i> or <i>unordered</i>. |
| This data structure is the same whether the Compact Sketch is instantiated on the |
| Java heap or in off-heap direct memory. |
| An empty Compact Sketch consumes only 8 bytes.</p> |
| |
| <p>The <i>ordered</i> form is ideal for systems environments where the building of the sketches |
| from data occur in an offline system (like a Hadoop grid), then compacted, ordered and |
| uploaded to a real-time query engine (like Druid) where the compact sketches can be quickly |
| merged to satisfy end-user queries. |
| The ordered form enables <i>early stop</i> enhancements to the merge algorithm |
| that makes the merging extreemly fast (~ 10’s of millions of sketches per second).</p> |
| |
| <p>The <i>unordered</i> form is more desirable for systems environments where the |
| building of the sketches from data occur in real-time and queried in real-time. |
| In this environment there is no need to pay the cost of the sort.</p> |
| |
| <p>Thus, the choice of <i>ordered</i> or <i>unordered</i> is a tradeoff between |
| real-time sketch build & getEstimate() performance and offline sketch-build |
| and real-time merge performance.</p> |
| |
| <p>The Compact Sketch is created four different ways selected by the ordering preference |
| (<i>dstOrdered</i>) and whether the Compact Sketch will reside on-heap or off-heap (<i>dstMemory</i>).</p> |
| |
| <ul> |
| <li>Unordered, On-heap |
| <ul> |
| <li>Update Sketch: <i>compact(false, null)</i></li> |
| <li>Set Operation: <i>getResult(false, null)</i></li> |
| </ul> |
| </li> |
| <li>Ordered, On-heap |
| <ul> |
| <li>Update Sketch: <i>compact(true, null)</i></li> |
| <li>Set Operation: <i>getResult(true, null)</i></li> |
| </ul> |
| </li> |
| <li>Unordered, Off-heap |
| <ul> |
| <li>Update Sketch: <i>compact(false, Memory mem)</i></li> |
| <li>Set Operation: <i>getResult(false, Memory mem)</i></li> |
| </ul> |
| </li> |
| <li>Ordered, Off-heap |
| <ul> |
| <li>Update Sketch: <i>compact(true, Memory mem)</i></li> |
| <li>Set Operation: <i>getResult(true, Memory mem)</i></li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h4 id="compact-sketch-sizing">Compact Sketch Sizing</h4> |
| <p>These tables compute the size of a sketch after it has been converted into Compact Form. |
| The size of a sketch during the build phase is explained above as the sketch starts small and |
| resizes by the configurable <i>Resize Factor</i> up to the in-memory size of <i>2k*8</i> bytes plus |
| a few primitives.</p> |
| |
| <p>Note: a sketch entry = 8 bytes.</p> |
| |
| <h5 id="quick-select-sketch-default">Quick Select Sketch (Default)</h5> |
| <p>The number of valid entries in the Quick Select Sketch after it enters estimation mode |
| statistically varies from <i>k</i> to <i>15k/8</i> with an average of about <i>3k/2</i>. |
| It is a user option to force a rebuild() prior to compacting the sketch in which case the |
| number of valid entries is never larger than <i>k</i>.</p> |
| |
| <table> |
| <thead> |
| <tr> |
| <th> </th> |
| <th>Empty</th> |
| <th>After Rebuild()</th> |
| <th>Estimating Avg</th> |
| <th>Estimating Max</th> |
| </tr> |
| <tr> |
| <th>Nominal Entries (k) : Formula -></th> |
| <th>8</th> |
| <th>k*8 +24</th> |
| <th>k*12 + 24</th> |
| <th>k*15 + 24</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>16</td> |
| <td>8</td> |
| <td>152</td> |
| <td>216</td> |
| <td>264</td> |
| </tr> |
| <tr> |
| <td>32</td> |
| <td>8</td> |
| <td>280</td> |
| <td>408</td> |
| <td>504</td> |
| </tr> |
| <tr> |
| <td>64</td> |
| <td>8</td> |
| <td>536</td> |
| <td>792</td> |
| <td>984</td> |
| </tr> |
| <tr> |
| <td>128</td> |
| <td>8</td> |
| <td>1,048</td> |
| <td>1,560</td> |
| <td>1,944</td> |
| </tr> |
| <tr> |
| <td>256</td> |
| <td>8</td> |
| <td>2,072</td> |
| <td>3,096</td> |
| <td>3,864</td> |
| </tr> |
| <tr> |
| <td>512</td> |
| <td>8</td> |
| <td>4,120</td> |
| <td>6,168</td> |
| <td>7,704</td> |
| </tr> |
| <tr> |
| <td>1,024</td> |
| <td>8</td> |
| <td>8,216</td> |
| <td>12,312</td> |
| <td>15,384</td> |
| </tr> |
| <tr> |
| <td>2,048</td> |
| <td>8</td> |
| <td>16,408</td> |
| <td>24,600</td> |
| <td>30,744</td> |
| </tr> |
| <tr> |
| <td>4,096</td> |
| <td>8</td> |
| <td>32,792</td> |
| <td>49,176</td> |
| <td>61,464</td> |
| </tr> |
| <tr> |
| <td>8,192</td> |
| <td>8</td> |
| <td>65,560</td> |
| <td>98,328</td> |
| <td>122,904</td> |
| </tr> |
| <tr> |
| <td>16,384</td> |
| <td>8</td> |
| <td>131,096</td> |
| <td>196,632</td> |
| <td>245,784</td> |
| </tr> |
| <tr> |
| <td>32,768</td> |
| <td>8</td> |
| <td>262,168</td> |
| <td>393,240</td> |
| <td>491,544</td> |
| </tr> |
| <tr> |
| <td>65,536</td> |
| <td>8</td> |
| <td>524,312</td> |
| <td>786,456</td> |
| <td>983,064</td> |
| </tr> |
| <tr> |
| <td>131,072</td> |
| <td>8</td> |
| <td>1,048,600</td> |
| <td>1,572,888</td> |
| <td>1,966,104</td> |
| </tr> |
| <tr> |
| <td>262,144</td> |
| <td>8</td> |
| <td>2,097,176</td> |
| <td>3,145,752</td> |
| <td>3,932,184</td> |
| </tr> |
| <tr> |
| <td>524,288</td> |
| <td>8</td> |
| <td>4,194,328</td> |
| <td>6,291,480</td> |
| <td>7,864,344</td> |
| </tr> |
| <tr> |
| <td>1,048,576</td> |
| <td>8</td> |
| <td>8,388,632</td> |
| <td>12,582,936</td> |
| <td>15,728,664</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <h5 id="alpha-sketch">Alpha Sketch</h5> |
| <p>The number of valid entries in the Alpha Sketch after it enters estimation mode |
| is a random variable that has a probability distribution with a mean of <i>k</i> |
| and a standard deviation of <i>sqrt(k)</i>. |
| The last column computes the maximum size with a confidence of 99.997% representing |
| plus 4 standard deviations.</p> |
| |
| <table> |
| <thead> |
| <tr> |
| <th> </th> |
| <th>Empty</th> |
| <th>Estimating Avg</th> |
| <th>Std Dev</th> |
| <th>Max @ 99.997%</th> |
| </tr> |
| <tr> |
| <th>Nominal Entries (k) : Formula -></th> |
| <th>8</th> |
| <th>k*8 + 24</th> |
| <th>sqrt(k)</th> |
| <th>(k+4SD)*8 +24</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>512</td> |
| <td>8</td> |
| <td>4,120</td> |
| <td>23</td> |
| <td>4,844</td> |
| </tr> |
| <tr> |
| <td>1,024</td> |
| <td>9</td> |
| <td>8,216</td> |
| <td>32</td> |
| <td>9,240</td> |
| </tr> |
| <tr> |
| <td>2,048</td> |
| <td>10</td> |
| <td>16,408</td> |
| <td>45</td> |
| <td>17,856</td> |
| </tr> |
| <tr> |
| <td>4,096</td> |
| <td>11</td> |
| <td>32,792</td> |
| <td>64</td> |
| <td>34,840</td> |
| </tr> |
| <tr> |
| <td>8,192</td> |
| <td>12</td> |
| <td>65,560</td> |
| <td>91</td> |
| <td>68,456</td> |
| </tr> |
| <tr> |
| <td>16,384</td> |
| <td>13</td> |
| <td>131,096</td> |
| <td>128</td> |
| <td>135,192</td> |
| </tr> |
| <tr> |
| <td>32,768</td> |
| <td>14</td> |
| <td>262,168</td> |
| <td>181</td> |
| <td>267,961</td> |
| </tr> |
| <tr> |
| <td>65,536</td> |
| <td>15</td> |
| <td>524,312</td> |
| <td>256</td> |
| <td>532,504</td> |
| </tr> |
| <tr> |
| <td>131,072</td> |
| <td>16</td> |
| <td>1,048,600</td> |
| <td>362</td> |
| <td>1,060,185</td> |
| </tr> |
| <tr> |
| <td>262,144</td> |
| <td>17</td> |
| <td>2,097,176</td> |
| <td>512</td> |
| <td>2,113,560</td> |
| </tr> |
| <tr> |
| <td>524,288</td> |
| <td>18</td> |
| <td>4,194,328</td> |
| <td>724</td> |
| <td>4,217,498</td> |
| </tr> |
| <tr> |
| <td>1,048,576</td> |
| <td>19</td> |
| <td>8,388,632</td> |
| <td>1,024</td> |
| <td>8,421,400</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <h3 id="set-operation-family">Set Operation Family</h3> |
| |
| <h4 id="union">Union</h4> |
| <p>The <i>Union</i> operation has both a Java heap and a direct, off-heap variant. |
| When a Union operation is converted to a byte array using <i>toByteArray()</i>, |
| the internal structure is a 32-byte preamble followed by a non-contiguous hash |
| table array of 8-byte, long data entries. |
| This enables the Union to be quickly reconstructed from the byte array |
| so that updating of the Union can be continued.</p> |
| |
| <h4 id="intersection">Intersection</h4> |
| <p>The <i>Intersection</i> operation has both a Java heap and a direct, off-heap variant. |
| When an Intersection operation is converted to a byte array using <i>toByteArray()</i>, |
| the internal structure is a 24-byte preamble followed by a non-contiguous hash |
| table array of 8-byte, long data entries. |
| This enables the Intersection to be quickly reconstructed from the byte array |
| so that updating of the Intersection can be continued.</p> |
| |
| <h4 id="a-not-b">A not B</h4> |
| <p>The <i>A not B</i> operation is asymmetric and stateless. |
| Both the <i>A</i> and <i>B</i> arguments are presented and the difference |
| is computed and returned. |
| There is no need for a byte array form.</p> |
| |
| |
| </div> <!-- End content --> |
| </div> <!-- End row --> |
| </div> <!-- End Container --> |
| |
| <!-- Start _include/page_footer.html --> |
| <footer class="ds-footer"> |
| <div class="container"> |
| <div class="text-center"> |
| <p> |
| <div>Copyright © 2024 <a href="https://www.apache.org">Apache Software Foundation</a>, |
| Licensed under the Apache License, Version 2.0. All Rights Reserved. |
| | <a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a><br/> |
| Apache DataSketches, Apache, the Apache feather logo, and the Apache DataSketches project logos are trademarks of The Apache Software Foundation.<br/> |
| All other marks mentioned may be trademarks or registered trademarks of their respective owners. |
| </div> |
| </p> |
| </div> |
| </div> |
| </footer> |
| <!-- End _include/page_footer.html --> |
| |
| </body> |
| |
| </html> |
| <!-- End _layouts/doc_page.html--> |