Apache Datasketches


We define characterization as the task of comprehensively measuring the accuracy or speed performance of our library. These characterization tests are often long-running (some can run for days) and very resource-intensive, which makes them unsuitable for inclusion in unit tests. The code in this repository comprises some of the test suites we use to create some of the plots on our website and to provide evidence for our speed and accuracy claims. This code is shared here so that others can reproduce our characterizations.
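
As a rough illustration of what a speed characterization measures, the following is a minimal, generic sketch and not code from this repository. The class name `SpeedTrialSketch`, the method `nanosPerUpdate`, and the use of a `HashSet` as a stand-in for the structure under test are all assumptions made for the example. It feeds a stream of unique keys into the target and reports the average time per update, averaged over several trials.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class; not part of this repository.
public class SpeedTrialSketch {

  // Returns the average time in nanoseconds per update for a given stream
  // length, averaged over the given number of trials.
  static double nanosPerUpdate(int streamLength, int trials) {
    long totalNanos = 0L;
    long key = 0L; // keys stay unique across all trials
    for (int t = 0; t < trials; t++) {
      // Stand-in for the structure under test (an assumption for this sketch).
      Set<Long> target = new HashSet<>();
      long start = System.nanoTime();
      for (int i = 0; i < streamLength; i++) {
        target.add(key++);
      }
      totalNanos += System.nanoTime() - start;
      // Sanity check: every unique key must have been retained.
      if (target.size() != streamLength) {
        throw new IllegalStateException("unexpected size: " + target.size());
      }
    }
    return (double) totalNanos / ((double) streamLength * trials);
  }

  public static void main(String[] args) {
    // Sweep stream lengths from 2^10 to 2^16 in steps of 4x.
    for (int len = 1 << 10; len <= 1 << 16; len <<= 2) {
      System.out.printf("%d\t%.2f%n", len, nanosPerUpdate(len, 20));
    }
  }
}
```

A real characterization would typically use far more trials and much longer streams, and would account for JVM warm-up, which is part of why such runs take long and do not fit in a unit-test suite.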

The code here is shared “as-is” and does not pretend to have the same level of quality as the primary repositories (java, pig, hive and vector). This code is not archived to Maven Central and will change from time to time as we grow these characterization suites.


DataSketches Library Website

Build Instructions

JDK 8 is the Required Compiler

This DataSketches component is pure Java and you must compile using JDK 8.

Recommended Build Tool

This DataSketches component is structured as a Maven project, and Maven is the recommended build tool.

There are two types of tests: normal unit tests and tests run by the strict profile.

To run normal unit tests:

$ mvn clean test

To run the strict profile tests:

$ mvn clean test -P strict



See the pom.xml for the top-level dependencies.


See the pom.xml file for test dependencies.


Issues for datasketches-checkstyle


Dev mailing list