<!DOCTYPE html>
<!-- Start _layouts/doc_page.html-->
<html lang="en">
<head>
<!-- Start _include/site_head.html -->
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="datasketches">
<title>DataSketches | </title>
<link rel="shortcut icon" href="/img/favicon.png">
<!-- original source: https://maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css -->
<link rel="stylesheet" href="/css/font-awesome.min.css">
<!-- original source: https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css -->
<link rel="stylesheet" href="/css/bootstrap.min.css">
<link rel="stylesheet" href="/css/fonts.css" type="text/css">
<link rel="stylesheet" href="/css/main.css">
<link rel="stylesheet" href="/css/header.css">
<link rel="stylesheet" href="/css/footer.css">
<link rel="stylesheet" href="/css/syntax.css">
<link rel="stylesheet" href="/css/docs.css">
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},showMathMenu:false,showMathMenuMSIE:false,showProcessingMessages:false});
</script>
<!-- original source: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMX_HTML-full -->
<script type="text/javascript" src="/js/MathJax.js?config=TeX-AMS_HTML"></script>
<!-- original source: https://code.jquery.com/jquery.min.js -->
<script src="/js/jquery.min.js"></script>
<!-- original source: https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js -->
<script src="/js/bootstrap.min.js"></script> <!-- 3.2.0-->
<!-- End _include/site_head.html -->
</head>
<body>
<!-- Start _include/nav_bar.html -->
<div class="navbar navbar-inverse navbar-static-top ds-nav">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="/" style="padding-top: 0px; padding-bottom: 0px;">
<span class="ds-small-h-logo"></span></a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a href="/docs/Background/TheChallenge.html">
<span class="fa fa-info-circle"></span> DOCUMENTATION</a>
</li>
<li>
<a href="/docs/Community/Downloads.html">
<span class="fa fa-download"></span> DOWNLOAD</a>
</li>
<!--
<li>
<a href="/docs/Architecture/Components.html">
<span class="fa fa-github"></span> GITHUB</a>
</li>
-->
<li>
<a href="/docs/Community/Research.html">
<span class="fa fa-paper-plane"></span> RESEARCH</a>
</li>
<li>
<a href="/docs/Community/index.html" style="padding-top: 0; padding-bottom: 0;">
<img class="ds-small-man" src="/img/datasketches-ManWhite.svg"/>COMMUNITY</a>
</li>
<li>
<ul class="nav navbar-nav navbar-right ds-nav">
<li class="dropdown ds-nav" >
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="padding-top: 0; padding-bottom: 0;"><img class="apache-logo" src="/img/feather.svg"/>Apache <span class="caret"></span></a>
<ul class="dropdown-menu ds-nav">
<li><a href="https://www.apache.org/" target="_blank">Foundation</a></li>
<li><a href="https://www.apache.org/events/current-event" target="_blank">Events</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="https://privacy.apache.org/policies/privacy-policy-public.html" target="_blank">Privacy Policy</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
<!-- End _include/nav_bar.html -->
<!-- Start _include/javadocs.html -->
<div class="ds-header">
<div class="container">
<h4>API Snapshots:
<a href="https://apache.github.io/datasketches-java/4.2.0/">Java Core</a>,
<a href="https://apache.github.io/datasketches-cpp/5.0.0/">C++ Core</a>,
<a href="https://apache.github.io/datasketches-python/main/">Python</a>,
<a href="https://apache.github.io/datasketches-memory/master/">Memory</a>,
<a href="/api/pig/snapshot/apidocs/index.html">Pig</a>,
<a href="/api/hive/snapshot/apidocs/index.html">Hive</a>,
</h4>
</div>
</div>
<!-- End _include/javadocs.html -->
<div class="container">
<div class="row">
<!-- Start ToC Block -->
<div class="col-md-3">
<div class="searchbox" style="position:relative">
<gcse:searchbox-only></gcse:searchbox-only>
</div>
<!-- Start _includes/toc.html -->
<!-- Computer Generated File, Do Not Edit! -->
<link rel="stylesheet" href="/css/toc.css">
<div id="toc" class="nav toc hidden-print">
<p id="background">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_background">Background</a>
</p>
<div class="collapse" id="collapse_background">
<li><a href="/docs/Background/TheChallenge.html">•The Challenge</a></li>
<li><a href="/docs/Background/SketchOrigins.html">•Sketch Origins</a></li>
<li><a href="/docs/Background/SketchElements.html">•Sketch Elements</a></li>
<li><a href="/docs/Background/Presentations.html">•Presentations</a></li>
<li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches_deck.pdf">•Overview Slide Deck</a></li>
</div>
<p id="architecture-and-design">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_architecture_and_design">Architecture And Design</a>
</p>
<div class="collapse" id="collapse_architecture_and_design">
<li><a href="/docs/Architecture/MajorSketchFamilies.html">•The Major Sketch Families</a></li>
<li><a href="/docs/Architecture/LargeScale.html">•Large Scale Computing</a></li>
<li><a href="/docs/Architecture/KeyFeatures.html">•Key Features</a></li>
<li><a href="/docs/Architecture/SketchFeaturesMatrix.html">•Sketch Features Matrix</a></li>
<li><a href="/docs/Architecture/Components.html">•Components</a></li>
<li><a href="/docs/Architecture/SketchesByComponent.html">•Sketches by Component</a></li>
<li><a href="/docs/Architecture/SketchCriteria.html">•Sketch Criteria</a></li>
<p id="memory-component">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_memory_component">Memory Component</a>
</p>
<div class="collapse" id="collapse_memory_component">
<li><a href="/docs/Memory/MemoryComponent.html">•Memory Component</a></li>
<li><a href="/docs/Memory/MemoryPerformance.html">•Memory Component Performance</a></li>
</div>
<li><a href="/docs/Architecture/OrderSensitivity.html">•Notes on Order Sensitivity</a></li>
<li><a href="/docs/Architecture/Concurrency.html">•Notes on Concurrency</a></li>
</div>
<p id="sketch-families">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_sketch_families">Sketch Families</a>
</p>
<div class="collapse" id="collapse_sketch_families">
<p id="distinct-counting">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_distinct_counting">Distinct Counting</a>
</p>
<div class="collapse" id="collapse_distinct_counting">
<li><a href="/docs/DistinctCountFeaturesMatrix.html">•Features Matrix</a></li>
<li><a href="/docs/DistinctCountMeritComparisons.html">•Figures-of-Merit Comparison</a></li>
<p id="cpc-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_cpc_sketches">CPC Sketches</a>
</p>
<div class="collapse" id="collapse_cpc_sketches">
<li><a href="/docs/CPC/CPC.html">•CPC Sketch</a></li>
<li><a href="/docs/CPC/CpcPerformance.html">•CPC Sketch Performance</a></li>
<p id="cpc-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_cpc_examples">CPC Examples</a>
</p>
<div class="collapse" id="collapse_cpc_examples">
<li><a href="/docs/CPC/CpcJavaExample.html">•CPC Sketch Java Example</a></li>
<li><a href="/docs/CPC/CpcCppExample.html">•CPC Sketch C++ Example</a></li>
<li><a href="/docs/CPC/CpcPigExample.html">•CPC Sketch Pig UDFs</a></li>
<li><a href="/docs/CPC/CpcHiveExample.html">•CPC Sketch Hive UDFs</a></li>
</div>
</div>
<p id="hyperloglog-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_hyperloglog_sketches">HyperLogLog Sketches</a>
</p>
<div class="collapse" id="collapse_hyperloglog_sketches">
<li><a href="/docs/HLL/HLL.html">•HLL Sketch</a></li>
<li><a href="/docs/HLL/HllMap.html">•HLL Map Sketch</a></li>
<p id="hll-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_hll_examples">HLL Examples</a>
</p>
<div class="collapse" id="collapse_hll_examples">
<li><a href="/docs/HLL/HllJavaExample.html">•HLL Sketch Java Example</a></li>
<li><a href="/docs/HLL/HllCppExample.html">•HLL Sketch C++ Example</a></li>
<li><a href="/docs/HLL/HllPigUDFs.html">•HLL Sketch Pig UDFs</a></li>
<li><a href="/docs/HLL/HllHiveUDFs.html">•HLL Sketch Hive UDFs</a></li>
</div>
<p id="hll-studies">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_hll_studies">HLL Studies</a>
</p>
<div class="collapse" id="collapse_hll_studies">
<li><a href="/docs/HLL/HllPerformance.html">•HLL Sketch Performance</a></li>
<li><a href="/docs/HLL/Hll_vs_CS_Hllpp.html">•HLL vs Clearspring HLL++</a></li>
<li><a href="/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html">•HLL Sketch vs Druid HyperLogLogCollector</a></li>
</div>
</div>
<p id="theta-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketches">Theta Sketches</a>
</p>
<div class="collapse" id="collapse_theta_sketches">
<li><a href="/docs/Theta/ThetaSketchFramework.html">•Theta Sketch Framework</a></li>
<p id="theta-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_examples">Theta Examples</a>
</p>
<div class="collapse" id="collapse_theta_examples">
<li><a href="/docs/Theta/ConcurrentThetaSketch.html">•Concurrent Theta Sketch</a></li>
<li><a href="/docs/Theta/ThetaJavaExample.html">•Theta Sketch Java Example</a></li>
<li><a href="/docs/Theta/ThetaSparkExample.html">•Theta Sketch Spark Example</a></li>
<li><a href="/docs/Theta/ThetaPigUDFs.html">•Theta Sketch Pig UDFs</a></li>
<li><a href="/docs/Theta/ThetaHiveUDFs.html">•Theta Sketch Hive UDFs</a></li>
</div>
<p id="kmv-tutorial">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_kmv_tutorial">KMV Tutorial</a>
</p>
<div class="collapse" id="collapse_kmv_tutorial">
<li><a href="/docs/Theta/InverseEstimate.html">•The Inverse Estimate</a></li>
<li><a href="/docs/Theta/KMVempty.html">•Empty Sketch</a></li>
<li><a href="/docs/Theta/KMVfirstEst.html">•First Estimator</a></li>
<li><a href="/docs/Theta/KMVbetterEst.html">•Better Estimator</a></li>
<li><a href="/docs/Theta/KMVrejection.html">•Rejection Rules</a></li>
<li><a href="/docs/Theta/KMVupdateVkth.html">•Update V(kth) Rule</a></li>
</div>
<p id="set-operations-and-p-sampling">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_set_operations_and_p-sampling">Set Operations and P-sampling</a>
</p>
<div class="collapse" id="collapse_set_operations_and_p-sampling">
<li><a href="/docs/Theta/ThetaSketchSetOps.html">•Set Operations</a></li>
<li><a href="/docs/Theta/ThetaSetOpsCornerCases.html">•Model & Test Set Operations</a></li>
<li><a href="/docs/Theta/ThetaPSampling.html"><i>p</i>-Sampling</a></li>
</div>
<p id="accuracy">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_accuracy">Accuracy</a>
</p>
<div class="collapse" id="collapse_accuracy">
<li><a href="/docs/Theta/ThetaAccuracy.html">•Basic Accuracy</a></li>
<li><a href="/docs/Theta/ThetaAccuracyPlots.html">•Accuracy Plots</a></li>
<li><a href="/docs/Theta/ThetaErrorTable.html">•Relative Error Table</a></li>
<li><a href="/docs/Theta/ThetaSketchSetOpsAccuracy.html">•SetOp Accuracy</a></li>
<li><a href="/docs/Theta/AccuracyOfDifferentKUnions.html">•Unions With Different k</a></li>
</div>
<p id="size">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_size">Size</a>
</p>
<div class="collapse" id="collapse_size">
<li><a href="/docs/Theta/ThetaSize.html">•Theta Sketch Size</a></li>
</div>
<p id="speed">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_speed">Speed</a>
</p>
<div class="collapse" id="collapse_speed">
<li><a href="/docs/Theta/ThetaUpdateSpeed.html">•Update Speed</a></li>
<li><a href="/docs/Theta/ThetaMergeSpeed.html">•Merge Speed</a></li>
</div>
<p id="theta-sketch-theory">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketch_theory">Theta Sketch Theory</a>
</p>
<div class="collapse" id="collapse_theta_sketch_theory">
<li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">•Theta Sketch Framework (PDF)</a></li>
<li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchEquations.pdf">•Theta Sketch Equations (PDF)</a></li>
<li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches.pdf">•DataSketches (PDF)</a></li>
<li><a href="/docs/Theta/ThetaConfidenceIntervals.html">•Confidence Intervals Notes</a></li>
<li><a href="/docs/Theta/ThetaMergingAlgorithm.html">•Merging Algorithm Notes</a></li>
<li><a href="/docs/Theta/ThetaReferences.html">•Theta References</a></li>
</div>
</div>
<p id="tuple-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_tuple_sketches">Tuple Sketches</a>
</p>
<div class="collapse" id="collapse_tuple_sketches">
<li><a href="/docs/Tuple/TupleOverview.html">•Tuple Overview</a></li>
<p id="tuple-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_tuple_examples">Tuple Examples</a>
</p>
<div class="collapse" id="collapse_tuple_examples">
<li><a href="/docs/Tuple/TupleJavaExample.html">•Tuple Java Example</a></li>
<li><a href="/docs/Tuple/TupleEngagementExample.html">•Tuple Engagement Example</a></li>
<li><a href="/docs/Tuple/TuplePigUDFs.html">•Tuple Pig UDFs</a></li>
<li><a href="/docs/Tuple/TupleHiveUDFs.html">•Tuple Hive UDFs</a></li>
</div>
</div>
</div>
<p id="most-frequent">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_most_frequent">Most Frequent</a>
</p>
<div class="collapse" id="collapse_most_frequent">
<li><a href="/docs/Frequency/FrequencySketchesOverview.html">•Frequency Sketches Overview</a></li>
<p id="frequent-item-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_frequent_item_sketches">Frequent Item Sketches</a>
</p>
<div class="collapse" id="collapse_frequent_item_sketches">
<li><a href="/docs/Frequency/FrequentItemsOverview.html">•Frequent Items Overview</a></li>
<li><a href="/docs/Frequency/FrequentItemsErrorTable.html">•Frequent Items Error Table</a></li>
<li><a href="/docs/Frequency/FrequentItemsReferences.html">•Frequent Items References</a></li>
<li><a href="/docs/Frequency/FrequentItemsPerformance.html">•Frequent Items Performance</a></li>
<p id="most-frequent-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_most_frequent_examples">Most Frequent Examples</a>
</p>
<div class="collapse" id="collapse_most_frequent_examples">
<li><a href="/docs/Frequency/FrequentItemsJavaExample.html">•Frequent Items Java Example</a></li>
<li><a href="/docs/Frequency/FrequentItemsCppExample.html">•Frequent Items C++ Example</a></li>
<li><a href="/docs/Frequency/FrequentItemsPigUDFs.html">•Frequent Items Pig UDFs</a></li>
<li><a href="/docs/Frequency/FrequentItemsHiveUDFs.html">•Frequent Items Hive UDFs</a></li>
</div>
</div>
<p id="frequent-distinct-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_frequent_distinct_sketches">Frequent Distinct Sketches</a>
</p>
<div class="collapse" id="collapse_frequent_distinct_sketches">
<li><a href="/docs/Frequency/FrequentDistinctTuplesSketch.html">•Frequent Distinct Tuples Sketch</a></li>
</div>
</div>
<p id="quantiles-and-histograms">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_and_histograms">Quantiles And Histograms</a>
</p>
<div class="collapse" id="collapse_quantiles_and_histograms">
<li><a href="/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html">•Quantiles and Ranks Tutorial</a></li>
<li><a href="/docs/Quantiles/QuantilesOverview.html">•Quantiles Overview</a></li>
<li><a href="/docs/KLL/KLLSketch.html">•KLL Floats sketch</a></li>
<li><a href="/docs/KLL/KLLAccuracyAndSize.html">•KLL Sketch Accuracy and Size</a></li>
<li><a href="/docs/REQ/ReqSketch.html">•REQ Floats sketch</a></li>
<li><a href="/docs/Quantiles/OrigQuantilesSketch.html">•Original QuantilesSketch</a></li>
<p id="quantiles-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_examples">Quantiles Examples</a>
</p>
<div class="collapse" id="collapse_quantiles_examples">
<li><a href="/docs/Quantiles/QuantilesJavaExample.html">•Quantiles Sketch Java Example</a></li>
<li><a href="/docs/KLL/KLLCppExample.html">•KLL Quantiles Sketch C++ Example</a></li>
<li><a href="/docs/Quantiles/QuantilesPigUDFs.html">•Quantiles Sketch Pig UDFs</a></li>
<li><a href="/docs/Quantiles/QuantilesHiveUDFs.html">•Quantiles Sketch Hive UDFs</a></li>
</div>
<p id="quantiles-studies">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_studies">Quantiles Studies</a>
</p>
<div class="collapse" id="collapse_quantiles_studies">
<li><a href="/docs/QuantilesStudies/DruidApproxHistogramStudy.html">•Druid Approximate Histogram</a></li>
<li><a href="/docs/QuantilesStudies/MomentsSketchStudy.html">•Moments Sketch Study</a></li>
<li><a href="/docs/QuantilesStudies/QuantilesStreamAStudy.html">•Quantiles StreamA Study</a></li>
<li><a href="/docs/QuantilesStudies/ExactQuantiles.html">•Exact Quantiles for Studies</a></li>
</div>
<p id="quantiles-sketch-theory">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_quantiles_sketch_theory">Quantiles Sketch Theory</a>
</p>
<div class="collapse" id="collapse_quantiles_sketch_theory">
<li><a href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/Quantiles_KLL.pdf">•Optimal Quantile Approximation in Streams</a></li>
<li><a href="/docs/Quantiles/QuantilesReferences.html">•Quantiles References</a></li>
</div>
</div>
<p id="sampling">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_sampling">Sampling</a>
</p>
<div class="collapse" id="collapse_sampling">
<li><a href="/docs/Sampling/ReservoirSampling.html">•Reservoir Sampling</a></li>
<li><a href="/docs/Sampling/ReservoirSamplingPerformance.html">•Reservoir Sampling Performance</a></li>
<li><a href="/docs/Sampling/VarOptSampling.html">•VarOpt Sampling</a></li>
<p id="sampling-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_sampling_examples">Sampling Examples</a>
</p>
<div class="collapse" id="collapse_sampling_examples">
<li><a href="/docs/Sampling/ReservoirSamplingJava.html">•Reservoir Sampling Java Example</a></li>
<li><a href="/docs/Sampling/ReservoirSamplingPigUDFs.html">•Reservoir Sampling Pig UDFs</a></li>
<li><a href="/docs/Sampling/VarOptSamplingJava.html">•VarOpt Sampling Java Example</a></li>
<li><a href="/docs/Sampling/VarOptPigUDFs.html">•VarOpt Sampling Pig UDFs</a></li>
</div>
</div>
</div>
<p id="system-integrations">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_system_integrations">System Integrations</a>
</p>
<div class="collapse" id="collapse_system_integrations">
<li><a href="/docs/SystemIntegrations/ApacheDruidIntegration.html">•Using Sketches in ApacheDruid</a></li>
<li><a href="/docs/SystemIntegrations/ApacheHiveIntegration.html">•Using Sketches in Apache Hive</a></li>
<li><a href="/docs/SystemIntegrations/ApachePigIntegration.html">•Using Sketches in Apache Pig</a></li>
<li><a href="/docs/SystemIntegrations/PostgreSQLIntegration.html">•Using Sketches in PostgreSQL</a></li>
</div>
<p id="community">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_community">Community</a>
</p>
<div class="collapse" id="collapse_community">
<li><a href="/docs/Community/index.html">•Community</a></li>
<li><a href="/docs/Community/Downloads.html">•Downloads</a></li>
<li><a href="/docs/Community/NewCommitterProcess.html">•Committer Process</a></li>
<li><a href="/docs/Community/ReleaseProcessForCppComponents.html">•Release Process For CPP Components</a></li>
<li><a href="/docs/Community/ReleaseProcessForJavaComponents.html">•Release Process For Java Components</a></li>
<li><a href="/docs/Community/Transitioning.html">•Transitioning from prior GitHub Site</a></li>
</div>
<p id="research">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_research">Research</a>
</p>
<div class="collapse" id="collapse_research">
<li><a href="/docs/Community/Research.html">•Research</a></li>
</div>
</div>
<!-- End _includes/toc.html -->
<!-- Start _includes/tocScript.html -->
<script>
(function () {
var findLineItem = function (path) {
return document.querySelector(`#toc [href="${path}"]`);
};
function findNavItem(path) {
return document.querySelector(`.nav [href="${path}"]`);
}
var highlighLineItem = function (element) {
element.classList.add('highlight');
};
var checkHasClass = function (element, className) {
return element.className.split(' ').find(function (item) { return item === className || '' })
}
var findAllCollapseParents = function (element) {
var collapseMenus = [];
var elementPointer = element;
while (elementPointer !== document.body) {
if (checkHasClass(elementPointer, 'collapse')) {
collapseMenus.push(elementPointer);
}
elementPointer = elementPointer.parentElement
}
return collapseMenus
};
var openMenuItem = function (element) {
// $(element).collapse('show') would start a transition, adding `in` class instead.
element.classList.add('in');
};
var openAllFromList = function (elementList) {
elementList.forEach(openMenuItem);
};
var highlightAndOpenMenu = function () {
// Highlight & expand nav item in the TOC
var currentLineItem = findLineItem(document.location.pathname);
highlighLineItem(currentLineItem);
openAllFromList(findAllCollapseParents(currentLineItem));
// Highlight nav item in top navigation
highlighLineItem(findNavItem(document.location.pathname));
};
$(highlightAndOpenMenu);
}());
</script>
<!-- End _includes/tocScript.html -->
</div>
<!-- End ToC Block -->
<div class="col-md-9 doc-content">
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<h1 id="theta-sketch-and-tuple-sketch-set-operation-corner-cases">Theta Sketch and Tuple Sketch Set Operation Corner Cases</h1>
<p>The <em>TupleSketch</em> is an extension of the <em>ThetaSketch</em> and both are part of the <em>Theta Sketch Framework</em><sup>1</sup>.
In this document, the term <em>Theta</em> (upper case) when referencing sketches will refer to both the <em>ThetaSketch</em> and the <em>TupleSketch</em>.<br />
This is not to be confused with the term <em>theta</em> (lower case), which refers to the sketch variable that tracks the sampling probability of the sketch.</p>
<p>Because Theta sketches provide the set operations of <em>intersection</em> and <em>difference</em> (<em>A and not B</em> or just <em>A not B</em>), a number of corner cases arise that require some analysis to determine how the code should handle them.</p>
<p>Theta sketches track three key variables in addition to retained data:</p>
<ul>
<li>
<p><em>theta</em>: This is the current sampling probability of the sketch and mathematically expressed as a real number between 0 and 1 inclusive. In the code it is expressed as a double-precision (64-bit) floating-point value. However, internally in the sketch, this value is expressed as a 64-bit, signed, long integer (usually identified as <em>thetaLong</em> in the code), where the maximum positive value (<em>Long.MAX_VALUE</em>) is interpreted as the double 1.0. In this document we will only refer to the mathematical quantity <em>theta</em>.</p>
</li>
<li>
<p><em>retained entries</em> or <em>count</em>: This is the number of hash values currently retained in the sketch. It can never be less than zero.</p>
</li>
<li>
<p><em>empty</em>:</p>
<ul>
<li>By definition, if <em>empty = true</em>, the number of <em>retained entries</em> must be zero. The value of <em>theta</em> is irrelevant in this case, and can be assumed to be 1.0.</li>
<li>If <em>empty</em> = false, the <em>retained entries</em> can be zero or greater than zero, and <em>theta</em> can be 1.0 or less than 1.0.</li>
</ul>
</li>
</ul>
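<p>The relationship between the internal <em>thetaLong</em> and the mathematical <em>theta</em> described above can be sketched as follows. This is a minimal illustration only: the class <code class="highlighter-rouge">ThetaConversion</code> and the helper <code class="highlighter-rouge">toTheta</code> are hypothetical names, not part of the library.</p>

```java
// Illustrative sketch only; ThetaConversion and toTheta are hypothetical names.
final class ThetaConversion {

    // Interprets a non-negative 64-bit thetaLong as a sampling probability,
    // with Long.MAX_VALUE mapping to the double 1.0.
    static double toTheta(long thetaLong) {
        if (thetaLong < 0) {
            throw new IllegalArgumentException("thetaLong must be non-negative");
        }
        return thetaLong / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(toTheta(Long.MAX_VALUE));     // sketch that still retains everything
        System.out.println(toTheta(Long.MAX_VALUE / 2)); // about half of the hash space
    }
}
```

<p>Because only the ratio matters, the corner-case analysis in this document can ignore the integer representation and ask only whether <em>theta</em> is 1.0 or less than 1.0.</p>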
<p>We have developed a shorthand notation for these three variables to record their state as <em>{theta, retained entries, empty}</em>.
When analyzing the corner cases of the set operations, we only need to know whether <em>theta</em> is 1.0 or less than 1.0, <em>retained entries</em> is zero or greater than zero, and <em>empty</em> is true or false. These are further abbreviated as</p>
<ul>
<li><em>theta</em> can be <em>1.0</em> or <em>&lt;1.0</em></li>
<li><em>retained entries</em> can be either <em>0</em> or <em>&gt;0</em></li>
<li><em>empty</em> can be either <em>T</em> or <em>F</em></li>
</ul>
<p>Each of the above three states can be represented as a boolean variable.
Thus, there are 8 possible combinations of the three variables.</p>
<hr />
<p><sup>1</sup> Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin Thaler. A framework for estimating stream expression cardinalities. In <em>EDBT/ICDT Proceedings 2016</em>, pages 6:1–6:17, 2016.</p>
<h2 id="valid-states-of-a-sketch">Valid States of a Sketch</h2>
<p>Of the eight possible combinations of the three boolean variables and using the above notation, there are four valid states of a <em>Theta</em> sketch.</p>
<h3 id="empty10-0-t">Empty{1.0, 0, T}</h3>
<p>When a new sketch is created, <em>theta</em> is set to 1.0, <em>retained entries</em> is set to zero, and <em>empty</em> is true.
This state can also occur as the result of a set operation, where the operation creates a new sketch to potentially load result data into the sketch but there is no data to load into the sketch.
So it effectively returns a new empty sketch that has been untouched and unaffected by the input arguments to the set operation.</p>
<h3 id="exact10-0-f">Exact{1.0, &gt;0, F}</h3>
<p>All of the <em>Theta</em> sketches have an internal buffer that is effectively a list of hash values of the items received by the sketch.
If the number of distinct input items does not exceed the size of that buffer, the sketch is in <em>exact</em> mode.
There is no probabilistic estimation involved so <em>theta = 1.0</em>, which indicates that all distinct values are in the buffer.
<em>retained entries</em> is the count of those values in the buffer, and the sketch is not <em>empty</em>.</p>
<h3 id="estimation10-0-f">Estimation{&lt;1.0, &gt;0, F}</h3>
<p>Here, the number of distinct inputs to the sketch has exceeded the size of the buffer, so the sketch must start choosing which values to retain and reduces the value of <em>theta</em> accordingly. In this state <em>theta &lt; 1.0</em>, <em>retained entries &gt; 0</em>, and <em>empty = F</em>.</p>
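<p>The transition from <em>Empty</em> through <em>Exact</em> to <em>Estimation</em> can be illustrated with a toy model. The sketch below is a simplified KMV-style stand-in and is not the library's implementation: the class <code class="highlighter-rouge">ToyThetaSketch</code>, its hash function, and the buffer-size parameter <code class="highlighter-rouge">k</code> are all assumptions made for illustration.</p>

```java
import java.util.TreeSet;

// Toy KMV-style model for illustration only; not the library's implementation.
final class ToyThetaSketch {
    private final int k;                                  // assumed buffer size
    private final TreeSet<Double> hashes = new TreeSet<Double>();
    private double theta = 1.0;
    private boolean empty = true;

    ToyThetaSketch(int k) { this.k = k; }

    // Hypothetical hash: maps an item to a pseudo-random double in [0, 1).
    static double hash(long item) {
        long z = item * 0x9E3779B97F4A7C15L;
        z ^= (z >>> 32);
        return (z >>> 11) * (1.0 / (1L << 53));
    }

    void update(long item) {
        empty = false;                                    // the sketch has now seen data
        double h = hash(item);
        if (h >= theta) { return; }                       // rejected by the sampling threshold
        hashes.add(h);                                    // duplicate hashes collapse in the set
        if (hashes.size() > k) {
            theta = hashes.pollLast();                    // buffer overflow: lower theta, drop that hash
        }
    }

    double getTheta() { return theta; }
    int getRetainedEntries() { return hashes.size(); }
    boolean isEmpty() { return empty; }

    public static void main(String[] args) {
        ToyThetaSketch sketch = new ToyThetaSketch(4);
        for (long i = 1; i <= 100; i++) { sketch.update(i); }
        System.out.println(sketch.getTheta() + " " + sketch.getRetainedEntries() + " " + sketch.isEmpty());
    }
}
```

<p>While no more than <code class="highlighter-rouge">k</code> distinct hashes have arrived, the toy sketch stays at <em>theta = 1.0</em> (exact mode). Once the buffer overflows, <em>theta</em> drops below 1.0 and the sketch is in estimation mode.</p>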
<h3 id="degenerate10-0-f2">Degenerate{&lt;1.0, 0, F}<sup>2</sup></h3>
<p>This requires some explanation.</p>
<p>Imagine we have two large data sets, A and B, with only a few items in common.
The exact intersection of these two sets, <em>A∩B</em>, would consist of those few common items.</p>
<p>Now suppose we compute Sketch(A) and Sketch(B).
Because sketches are approximate and the items from each set are chosen at random, there is some probability that one of the sketches may not contain any of the common items.
As a result, the sketch intersection of these two sets, <em>Sketch(A)∩Sketch(B)</em>, which is also approximate, might contain zero retained entries.
Even though the intersection has zero retained entries, the upper bound on the estimated number of distinct values common to both inputs is clearly greater than zero; the sketch intersection simply missed them.
This upper bound can be computed statistically.
It is too complex to discuss further here, but the sketch code actually performs this estimation.</p>
<p>When both input sketches are non-empty, there is a non-zero probability that the intersection will have zero retained entries even though the statistics tell us that the true result may not be empty; we may simply have been unlucky.<br />
We indicate this by setting the result <em>empty = F</em>, and <em>retained entries = 0</em>.
The resulting <em>theta = min(thetaA, thetaB)</em>.
Calling <em>getUpperBound(…)</em> on the resulting intersection will reveal the best estimate of how many values might exist in the intersection of the raw data.
The <em>getLowerBound(…)</em> will be zero because it is also possible that the two sets, A and B, were exactly disjoint.</p>
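<p>A toy model can make the <em>Degenerate</em> result concrete. In the sketch below, hash values are modeled as doubles in [0, 1) and each toy sketch simply lists the hashes it retained. The class and method names are hypothetical, the handling of empty inputs is an assumption of the model, and the statistical bounds computed by the real code are not reproduced.</p>

```java
import java.util.TreeSet;

// Toy model for illustration only; hashes are modeled as doubles in [0, 1).
final class ToyIntersection {

    static final class Toy {
        final double theta;
        final TreeSet<Double> hashes;
        final boolean empty;

        Toy(double theta, TreeSet<Double> hashes, boolean empty) {
            this.theta = theta;
            this.hashes = hashes;
            this.empty = empty;
        }

        // A non-empty toy sketch retaining the given hashes (each assumed < theta).
        static Toy of(double theta, double... retained) {
            TreeSet<Double> set = new TreeSet<Double>();
            for (double h : retained) { set.add(h); }
            return new Toy(theta, set, false);
        }
    }

    static Toy intersect(Toy a, Toy b) {
        if (a.empty || b.empty) {
            // Assumption of this toy model: intersecting with an empty sketch is empty.
            return new Toy(1.0, new TreeSet<Double>(), true);
        }
        double theta = Math.min(a.theta, b.theta);        // theta = min(thetaA, thetaB)
        TreeSet<Double> common = new TreeSet<Double>();
        for (double h : a.hashes) {
            if (h < theta && b.hashes.contains(h)) { common.add(h); }
        }
        // With theta < 1.0 and no common hashes the result stays non-empty: Degenerate.
        // With theta == 1.0 there is no sampling involved, so the result is truly empty.
        boolean empty = (theta == 1.0) && common.isEmpty();
        return new Toy(theta, common, empty);
    }

    public static void main(String[] args) {
        Toy a = Toy.of(0.5, 0.1, 0.3);
        Toy b = Toy.of(0.4, 0.2, 0.35);
        Toy r = intersect(a, b);
        System.out.println(r.theta + " " + r.hashes.size() + " " + r.empty);
    }
}
```

<p>In the usage above, the two estimation-mode inputs share no retained hashes, so the result has zero retained entries and <em>theta</em> equal to the smaller input <em>theta</em>, yet it is not flagged as empty.</p>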
<hr />
<p><sup>2</sup>Note that this degenerate state can also result from an AnotB operation or the Union operation, which will be discussed below.</p>
<h3 id="summary-table-of-the-valid-states-of-a-sketch">Summary Table of the Valid States of a Sketch</h3>
<p>The <em>Has Seen Data</em> column is not an independent variable, but helps with the interpretation of the state.</p>
<p>We can assign a single octal digit ID to each state where</p>
<ul>
<li><em>theta = 1.0 := 4, else 0</em></li>
<li><em>retained entries &gt;0 := 2, else 0</em></li>
<li><em>empty = true := 1, else 0</em></li>
</ul>
<p>The octal digit <code class="highlighter-rouge">ID = ((theta == 1.0) ? 4 : 0) | ((retainedEntries &gt; 0) ? 2 : 0) | (empty ? 1 : 0);</code></p>
<table>
<thead>
<tr>
<th style="text-align: center">Shorthand<br />Notation</th>
<th style="text-align: center">Theta</th>
<th style="text-align: center">Retained<br />Entries</th>
<th style="text-align: center">Empty</th>
<th style="text-align: center">Has Seen<br />Data</th>
<th style="text-align: center">ID</th>
<th style="text-align: left">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">1.0</td>
<td style="text-align: center">0</td>
<td style="text-align: center">T</td>
<td style="text-align: center">F</td>
<td style="text-align: center">5</td>
<td style="text-align: left">Empty Sketch</td>
</tr>
<tr>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">1.0</td>
<td style="text-align: center">&gt;0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">T</td>
<td style="text-align: center">6</td>
<td style="text-align: left">Exact Mode Sketch</td>
</tr>
<tr>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">&lt;1.0</td>
<td style="text-align: center">&gt;0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">T</td>
<td style="text-align: center">2</td>
<td style="text-align: left">Estimation Mode Sketch</td>
</tr>
<tr>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}<sup>3</sup></td>
<td style="text-align: center">&lt;1.0</td>
<td style="text-align: center">0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">T</td>
<td style="text-align: center">0</td>
<td style="text-align: left">Degenerate and valid<br />Intersect or AnotB result</td>
</tr>
</tbody>
</table>
<hr />
<p><sup>3</sup> <em>Degenerate</em>: This can occur as an estimating result of an Intersection of two disjoint sets,
an AnotB of two identical sets, or the Union of two <em>Degenerate</em> sets.</p>
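<p>The ID formula and the four valid rows above can be sketched in Java as follows. The class name <code class="highlighter-rouge">SketchStateId</code> and its two methods are hypothetical names chosen for this illustration; they are not part of the library API.</p>

```java
/** Illustrative only: computes the octal state ID from the three state variables. */
final class SketchStateId {

  /** 4 if theta == 1.0, plus 2 if entries are retained, plus 1 if the empty flag is set. */
  static int stateId(double theta, int retainedEntries, boolean empty) {
    return ((theta == 1.0) ? 4 : 0)
         | ((retainedEntries > 0) ? 2 : 0)
         | (empty ? 1 : 0);
  }

  /** Maps an ID to the shorthand name in the table; the four remaining IDs are invalid. */
  static String stateName(int id) {
    switch (id) {
      case 5: return "Empty";
      case 6: return "Exact";
      case 2: return "Estimation";
      case 0: return "Degenerate";
      default: return "Invalid";
    }
  }

  public static void main(String[] args) {
    System.out.println(stateName(stateId(1.0, 0, true)));   // Empty
    System.out.println(stateName(stateId(0.5, 0, false)));  // Degenerate
  }
}
```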
<h2 id="invalid-states-of-a-sketch">Invalid States of a Sketch</h2>
<p>The remaining four combinations of the variables are invalid and should not occur.</p>
<p>The <em>Has Seen Data</em> column is not an independent variable, but helps with the interpretation of the state.</p>
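<table>
<thead>
<tr>
<th style="text-align: center">Theta</th>
<th style="text-align: center">Retained<br />Entries</th>
<th style="text-align: center">Empty<br />Flag</th>
<th style="text-align: center">Has Seen<br />Data</th>
<th style="text-align: left">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">1.0</td>
<td style="text-align: center">0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">T</td>
<td style="text-align: left">If it has seen data, Empty = F.<sup>4</sup> <br />∴ Theta cannot be = 1.0 AND Entries = 0</td>
</tr>
<tr>
<td style="text-align: center">1.0</td>
<td style="text-align: center">&gt;0</td>
<td style="text-align: center">T</td>
<td style="text-align: center">F</td>
<td style="text-align: left">If it has not seen data, Empty = T. <br />∴ Entries cannot be &gt; 0</td>
</tr>
<tr>
<td style="text-align: center">&lt;1.0</td>
<td style="text-align: center">&gt;0</td>
<td style="text-align: center">T</td>
<td style="text-align: center">F</td>
<td style="text-align: left">If it has not seen data, Empty = T. <br />∴ Theta cannot be &lt; 1.0 OR Entries &gt; 0</td>
</tr>
<tr>
<td style="text-align: center">&lt;1.0</td>
<td style="text-align: center">0</td>
<td style="text-align: center">T</td>
<td style="text-align: center">F</td>
<td style="text-align: left">If it has not seen data, Empty = T.<sup>5</sup> <br />∴ Theta cannot be &lt; 1.0</td>
</tr>
</tbody>
</table>
<p><sup>4</sup>This can occur internally as the result from an intersection of two exact, disjoint sets, or AnotB of two exact, identical sets.
There is no probability distribution, so this is converted internally to EMPTY {1.0, 0, T}. A Union cannot produce this result.</p>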
<p><sup>5</sup>This can occur internally as the initial state of an UpdateSketch if p was set to less than 1.0 by the user and the sketch has not seen any data.
There is no probability distribution because the sketch has not been offered any data, so this is converted internally to EMPTY {1.0, 0, T}.</p>
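<p>Footnotes 4 and 5 describe two combinations that can arise internally and are converted to EMPTY {1.0, 0, T}. A minimal sketch of that conversion is shown below. The <code class="highlighter-rouge">State</code> holder and the <code class="highlighter-rouge">normalize</code> method are hypothetical names for illustration only, and throwing an exception for the other two invalid combinations is an assumption made here, not documented library behavior.</p>

```java
/** Illustrative only: converts the two internally occurring invalid states to EMPTY. */
final class StateNormalizer {

  /** Hypothetical immutable holder for the state variables {theta, entries, empty}. */
  static final class State {
    final double theta;
    final int retainedEntries;
    final boolean empty;

    State(double theta, int retainedEntries, boolean empty) {
      this.theta = theta;
      this.retainedEntries = retainedEntries;
      this.empty = empty;
    }
  }

  static final State EMPTY = new State(1.0, 0, true);

  static State normalize(State s) {
    boolean noEntries = s.retainedEntries == 0;
    // Footnote 4: {1.0, 0, F}, e.g. an intersection of two exact, disjoint sets.
    if (noEntries && s.theta == 1.0 && !s.empty) { return EMPTY; }
    // Footnote 5: {<1.0, 0, T}, e.g. a new sketch with p < 1.0 that has seen no data.
    if (noEntries && s.theta < 1.0 && s.empty) { return EMPTY; }
    // Assumption: the remaining invalid combinations (entries > 0 with Empty = T) are errors.
    if (!noEntries && s.empty) {
      throw new IllegalStateException("Empty flag set but entries are retained");
    }
    return s; // Already one of the four valid states.
  }
}
```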
<h2 id="state-combinations-of-two-sketches-and-set-operation-results">State Combinations of Two Sketches and Set Operation Results</h2>
<p>Each sketch can have four valid states, which means we can have 16 combinations of states of two sketches as expanded in the following table.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Sketch A<br />State</th>
<th style="text-align: center">Sketch B<br />State</th>
<th style="text-align: center">Pair<br />ID</th>
<th style="text-align: center">Intersection<br />Action</th>
<th style="text-align: center">AnotB<br />Action</th>
<th style="text-align: center">Union<br />Action</th>
<th style="text-align: center">Action IDs</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">55</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A=B</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A=B</td>
<td style="text-align: center">E,E,E</td>
</tr>
<tr>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">56</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Sketch B</td>
<td style="text-align: center">E,E,B</td>
</tr>
<tr>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">52</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Sketch B</td>
<td style="text-align: center">E,E,B</td>
</tr>
<tr>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">50</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=A</td>
<td style="text-align: center">Degenerate<br />{ThetaB,0,F}=B</td>
<td style="text-align: center">E,E,DB</td>
</tr>
<tr>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">65</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=B</td>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center">E,A,A</td>
</tr>
<tr>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">66</td>
<td style="text-align: center">Full Intersect</td>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center">Full Union</td>
<td style="text-align: center">I,N,U</td>
</tr>
<tr>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">62</td>
<td style="text-align: center">Full Intersect</td>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center">Full Union</td>
<td style="text-align: center">I,N,U</td>
</tr>
<tr>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">60</td>
<td style="text-align: center">Degenerate<br />{ThetaB,0,F}=B</td>
<td style="text-align: center">Trim A<br />by minTheta</td>
<td style="text-align: center">Trim A<br />by minTheta</td>
<td style="text-align: center">D,TA,TA</td>
</tr>
<tr>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">25</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=B</td>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center">E,A,A</td>
</tr>
<tr>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">26</td>
<td style="text-align: center">Full Intersect</td>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center">Full Union</td>
<td style="text-align: center">I,N,U</td>
</tr>
<tr>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">22</td>
<td style="text-align: center">Full Intersect</td>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center">Full Union</td>
<td style="text-align: center">I,N,U</td>
</tr>
<tr>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">20</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">Trim A<br />by minTheta</td>
<td style="text-align: center">Trim A<br />by minTheta</td>
<td style="text-align: center">D,TA,TA</td>
</tr>
<tr>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">05</td>
<td style="text-align: center">Empty<br />{1.0,0,T}=B</td>
<td style="text-align: center">Degenerate<br />{ThetaA,0,F}=A</td>
<td style="text-align: center">Degenerate<br />{ThetaA,0,F}=A</td>
<td style="text-align: center">E,DA,DA</td>
</tr>
<tr>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">Exact<br />{1.0,&gt;0,F}</td>
<td style="text-align: center">06</td>
<td style="text-align: center">Degenerate<br />{ThetaA,0,F}=A</td>
<td style="text-align: center">Degenerate<br />{ThetaA,0,F}=A</td>
<td style="text-align: center">Trim B<br />by minTheta</td>
<td style="text-align: center">DA,DA,TB</td>
</tr>
<tr>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">Estimation<br />{&lt;1.0,&gt;0,F}</td>
<td style="text-align: center">02</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">Trim B<br />by minTheta</td>
<td style="text-align: center">D,D,TB</td>
</tr>
<tr>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">Degenerate<br />{&lt;1.0,0,F}</td>
<td style="text-align: center">00</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">D,D,D</td>
</tr>
</tbody>
</table>
<p><strong>Column Descriptions:</strong></p>
<ul>
<li>Pair ID: two octal digits, the first digit represents the state of Sketch A, the second digit represents the state of Sketch B.</li>
<li>Sketch A State</li>
<li>Sketch B State</li>
<li>Intersection Action</li>
<li>AnotB Action</li>
<li>Union Action</li>
<li>Action IDs: the action codes for Intersection, AnotB, and Union, in that order.</li>
</ul>
<p>The action IDs are given by the following table along with description and where used:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Action ID</th>
<th style="text-align: center">Action<br />Description</th>
<th style="text-align: center">Intersection</th>
<th style="text-align: center">AnotB</th>
<th style="text-align: center">Union</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">A</td>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">TA</td>
<td style="text-align: center">Trim Sketch A<br />by minTheta</td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">B</td>
<td style="text-align: center">Sketch B</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">TB</td>
<td style="text-align: center">Trim Sketch B<br />by minTheta</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">D</td>
<td style="text-align: center">Degenerate<br />{minTheta,0,F}</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">DA</td>
<td style="text-align: center">Degenerate<br />{ThetaA,0,F}<br />(optional)</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">DB</td>
<td style="text-align: center">Degenerate<br />{ThetaB,0,F}<br />(optional)</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">E</td>
<td style="text-align: center">Empty<br />{1.0,0,T}</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
<td style="text-align: center">✓</td>
</tr>
<tr>
<td style="text-align: center">I</td>
<td style="text-align: center">Full Intersect</td>
<td style="text-align: center">✓</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">N</td>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">U</td>
<td style="text-align: center">Full Union</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center">✓</td>
</tr>
</table>
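<p>The pair table can be read as a lookup from a two-digit octal Pair ID to three action IDs. The sketch below transcribes the Action IDs column of that table into a Java <code class="highlighter-rouge">switch</code>. The class and method names are hypothetical and chosen for this illustration; the library's own model class may be organized differently.</p>

```java
/** Illustrative only: looks up the action IDs for a pair of sketch states. */
final class CornerCaseLookup {

  /** Two octal digits: Sketch A's state ID is the high digit, Sketch B's the low digit. */
  static int pairId(int stateIdA, int stateIdB) {
    return (stateIdA << 3) | stateIdB;
  }

  /** Returns "Intersection,AnotB,Union" action IDs as listed in the pair table. */
  static String actionIds(int pairId) {
    switch (pairId) { // Case labels are Java octal literals (leading zero).
      case 055: return "E,E,E";
      case 056: case 052: return "E,E,B";
      case 050: return "E,E,DB";
      case 065: case 025: return "E,A,A";
      case 066: case 062: case 026: case 022: return "I,N,U";
      case 060: case 020: return "D,TA,TA";
      case 005: return "E,DA,DA";
      case 006: return "DA,DA,TB";
      case 002: return "D,D,TB";
      case 000: return "D,D,D";
      default:
        throw new IllegalArgumentException(
            "Invalid pair ID (octal): " + Integer.toOctalString(pairId));
    }
  }

  public static void main(String[] args) {
    // Exact (6) paired with Degenerate (0) gives Pair ID 60.
    System.out.println(actionIds(pairId(6, 0))); // D,TA,TA
  }
}
```

<p>Grouping the case labels by identical outcome keeps the 16 combinations visible in one place, which makes the sketch easy to check row by row against the table.</p>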
<p>Note that the results of <em>Full Intersect</em>, <em>Full AnotB</em>, or <em>Full Union</em> actions will require further interpretation of the resulting state. For example:</p>
<ul>
<li>If the resulting sketch is <em>{1.0,0,?}</em>, then an <em>Empty{1.0,0,T}</em> is returned.</li>
<li>If the resulting sketch is <em>{&lt;1.0,0,?}</em> then a <em>Degenerate{&lt;1.0,0,F}</em> is returned.</li>
<li>Otherwise, the sketch returned will be an estimating <em>{minTheta, &gt;0, F}</em>, or exact <em>{1.0, &gt;0, F}</em>.</li>
</ul>
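<p>The three interpretation rules above can be sketched as a small Java helper. It assumes the full operation has already produced a resulting theta and retained-entry count; the class and method names are hypothetical, and the returned strings simply reuse the shorthand notation of this page.</p>

```java
/** Illustrative only: classifies the raw result of a Full Intersect, AnotB, or Union. */
final class ResultInterpreter {

  static String interpret(double resultTheta, int resultEntries) {
    if (resultEntries == 0) {
      // {1.0,0,?} becomes Empty; {<1.0,0,?} becomes Degenerate.
      return (resultTheta == 1.0) ? "Empty{1.0,0,T}" : "Degenerate{<1.0,0,F}";
    }
    // Otherwise the result retains entries and is either exact or estimating.
    return (resultTheta == 1.0) ? "Exact{1.0,>0,F}" : "Estimation{minTheta,>0,F}";
  }
}
```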
<h2 id="testing">Testing</h2>
<p>The above information is encoded as a model into the special class
<em><a href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.SetOperationCornerCases.java">org.apache.datasketches.SetOperationCornerCases</a></em>.
This class is made up of enums and static methods to quickly determine for a sketch what actions to take based on the state of the input arguments.
This model is independent of the sketch implementation: it applies whether the set operation is performed on Theta Sketches or on Tuple Sketches and, once translated, can be used in other languages as well.</p>
<p>Before this model was put to use, an extensive set of tests was designed to test any potential implementation against this model.
These tests are slightly different for the Tuple Sketch than the Theta Sketch because the Tuple Sketch has more combinations to test, but the model is the same.</p>
<p>The tests for the Theta Sketch can be found in the class
<em><a href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest.java">org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest</a></em>.</p>
<p>The tests for the Tuple Sketch can be found in the class
<em><a href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest.java">org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest</a></em>.</p>
<p>The details of how this model is used in run-time code can be found in the class <em><a href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.tuple.AnotB.java">org.apache.datasketches.tuple.AnotB.java</a></em>.</p>
</div> <!-- End content -->
</div> <!-- End row -->
</div> <!-- End Container -->
<!-- Start _include/page_footer.html -->
<footer class="ds-footer">
<div class="container">
<div class="text-center">
<p>
<div>Copyright © 2024 <a href="https://www.apache.org">Apache Software Foundation</a>,
Licensed under the Apache License, Version 2.0. All Rights Reserved.
| <a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a><br/>
Apache DataSketches, Apache, the Apache feather logo, and the Apache DataSketches project logos are trademarks of The Apache Software Foundation.<br/>
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
</div>
</p>
</div>
</div>
</footer>
<!-- End _include/page_footer.html -->
</body>
</html>
<!-- End _layouts/doc_page.html-->