Merge pull request #173 from cbalci/pinot-integration
Add documentation for Apache Pinot integration
diff --git a/_includes/toc.html b/_includes/toc.html
index 7986ecf..34e8cec 100644
--- a/_includes/toc.html
+++ b/_includes/toc.html
@@ -290,6 +290,7 @@
<li><a href="{{site.docs_dir}}/SystemIntegrations/ApacheDruidIntegration.html">•Using Sketches in ApacheDruid</a></li>
<li><a href="{{site.docs_dir}}/SystemIntegrations/ApacheHiveIntegration.html">•Using Sketches in Apache Hive</a></li>
<li><a href="{{site.docs_dir}}/SystemIntegrations/ApachePigIntegration.html">•Using Sketches in Apache Pig</a></li>
+ <li><a href="{{site.docs_dir}}/SystemIntegrations/ApachePinotIntegration.html">•Using Sketches in Apache Pinot</a></li>
<li><a href="{{site.docs_dir}}/SystemIntegrations/PostgreSQLIntegration.html">•Using Sketches in PostgreSQL</a></li>
</div>
diff --git a/docs/Architecture/LargeScale.md b/docs/Architecture/LargeScale.md
index 5e4b1de..5d2ce53 100644
--- a/docs/Architecture/LargeScale.md
+++ b/docs/Architecture/LargeScale.md
@@ -82,6 +82,8 @@
* [Apache Pig](https://datasketches.apache.org/docs/SystemIntegrations/ApachePigIntegration.html)
+* [Apache Pinot](https://datasketches.apache.org/docs/SystemIntegrations/ApachePinotIntegration.html)
+
* [PostgreSQL](https://datasketches.apache.org/docs/SystemIntegrations/PostgreSQLIntegration.html)
* [Spark Examples](https://datasketches.apache.org/docs/Theta/ThetaSparkExample.html)
diff --git a/docs/SystemIntegrations/ApachePinotIntegration.md b/docs/SystemIntegrations/ApachePinotIntegration.md
new file mode 100644
index 0000000..e6a7c24
--- /dev/null
+++ b/docs/SystemIntegrations/ApachePinotIntegration.md
@@ -0,0 +1,71 @@
+---
+layout: doc_page
+---
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+### Apache Pinot Integration
+[Apache Pinot](https://pinot.apache.org/) has built-in support for most major sketch families from Apache DataSketches, exposed as aggregation and transformation functions in its SQL dialect.
+
+Example:
+```sql
+select distinctCountThetaSketch(
+ sketchCol,
+ 'nominalEntries=1024',
+ 'country = ''USA'' AND state = ''CA''', 'device = ''mobile''', 'SET_INTERSECT($1, $2)'
+)
+from table
+where country = 'USA' or device = 'mobile...'
+```
+
+### Cardinality Estimation
+* [DistinctCountThetaSketch](https://docs.pinot.apache.org/configuration-reference/functions/distinctcountthetasketch)
+* [CPCSketch](https://docs.pinot.apache.org/users/user-guide-query/query-syntax/how-to-handle-unique-counting#compressed-probability-counting-cpc-sketches)
+* [TupleSketches](https://docs.pinot.apache.org/users/user-guide-query/query-syntax/how-to-handle-unique-counting#tuple-sketches)
+
+### Quantiles
+* [PercentileKLL](https://docs.pinot.apache.org/configuration-reference/functions/percentilekll)
+
+### Frequent Items
+* [FrequentLongsSketch](https://docs.pinot.apache.org/configuration-reference/functions/frequentlongssketch)
+* [FrequentStringsSketch](https://docs.pinot.apache.org/configuration-reference/functions/frequentstringssketch)
+
+<hr>
+### Advanced Integration
+#### Raw Output Mode
+Supported functions have 'raw' variants that return the binary representation of the sketch itself, rather than a final estimate, so that it can be processed further.
+
+Example:
+```sql
+select percentileRawKll(ArrDelayMinutes, 90) as sketch
+from airlineStats
+```
+This returns a Base64-encoded string: `BQEPC...`
+
+The output can be decoded and queried with the DataSketches Java library:
+
+```java
+byte[] decodedBytes = Base64.getDecoder().decode(encoded); // 'encoded' holds the Base64 string returned by the query
+KllDoublesSketch sketch = KllDoublesSketch.wrap(Memory.wrap(decodedBytes));
+
+System.out.println("Min, Median, Max values:");
+System.out.println(Arrays.toString(sketch.getQuantiles(new double[]{0, 0.5, 1})));
+```
+
+#### Pre-built Sketch Ingestion
+Apache Pinot can also ingest pre-built sketch objects, either via Kafka (real-time) or Spark (batch), and merge them during aggregation.
diff --git a/src/main/resources/docgen/toc.json b/src/main/resources/docgen/toc.json
index e5cfd06..6df97ec 100644
--- a/src/main/resources/docgen/toc.json
+++ b/src/main/resources/docgen/toc.json
@@ -208,8 +208,6 @@
},
{ "class":"Dropdown", "desc" : "Quantiles Studies", "array":
[
-
- {"class":"Doc", "desc" : "KLL sketch vs t-digest", "dir" : "QuantilesStudies", "file": "KllSketchVsTDigest" },
{"class":"Doc", "desc" : "Druid Approximate Histogram", "dir" : "QuantilesStudies", "file": "DruidApproxHistogramStudy" },
{"class":"Doc", "desc" : "Moments Sketch Study", "dir" : "QuantilesStudies", "file": "MomentsSketchStudy" },
{"class":"Doc", "desc" : "Quantiles StreamA Study", "dir" : "QuantilesStudies", "file": "QuantilesStreamAStudy" },
@@ -249,6 +247,7 @@
{"class":"Doc", "desc" : "Using Sketches in ApacheDruid", "dir" : "SystemIntegrations", "file": "ApacheDruidIntegration" },
{"class":"Doc", "desc" : "Using Sketches in Apache Hive", "dir" : "SystemIntegrations", "file": "ApacheHiveIntegration" },
{"class":"Doc", "desc" : "Using Sketches in Apache Pig", "dir" : "SystemIntegrations", "file": "ApachePigIntegration" },
+ {"class":"Doc", "desc" : "Using Sketches in Apache Pinot", "dir" : "SystemIntegrations", "file": "ApachePinotIntegration" },
{"class":"Doc", "desc" : "Using Sketches in PostgreSQL", "dir" : "SystemIntegrations", "file": "PostgreSQLIntegration" },
]
},
@@ -270,4 +269,3 @@
},
]
}
-