Apache DataSketches

Characterization Java & C++ Component

We define characterization as the task of comprehensively measuring the accuracy or speed of our library. These characterization tests are often long-running (some can run for days) and very resource-intensive, which makes them unsuitable for inclusion in unit tests. The code in this repository comprises some of the test suites we use to create plots on our website and to provide evidence for our speed and accuracy claims.
This code is shared here so that others can reproduce our characterizations.
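
As a rough illustration of what an accuracy characterization looks like, the following minimal sketch runs many independent trials of an estimator against a known true count and reports the root-mean-square relative error. It is not code from this repository: the class `AccuracyCharacterization`, its methods, and the toy K-Minimum-Values estimator are hypothetical stand-ins that use only the JDK, and uniform random doubles are assumed to play the role of hashes of distinct items.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical example; not part of the DataSketches library or this repository.
public class AccuracyCharacterization {

  // Toy K-Minimum-Values distinct-count estimator (stand-in for a real sketch).
  // Feeds n uniform random values (assumed to be hashes of n distinct items)
  // and keeps only the k smallest in a max-heap.
  static double kmvEstimate(int n, int k, Random rnd) {
    PriorityQueue<Double> smallest = new PriorityQueue<>(k, Collections.reverseOrder());
    for (int i = 0; i < n; i++) {
      double h = rnd.nextDouble();
      if (smallest.size() < k) {
        smallest.add(h);
      } else if (h < smallest.peek()) {
        smallest.poll();   // drop the current k-th smallest value
        smallest.add(h);
      }
    }
    if (smallest.size() < k) {
      return smallest.size();          // fewer than k items seen: count is exact
    }
    return (k - 1) / smallest.peek();  // estimate from the k-th smallest value
  }

  // Runs `trials` independent trials and returns the RMS relative error
  // of the estimate against the known true count n.
  static double rmsRelativeError(int trials, int n, int k, long seed) {
    Random rnd = new Random(seed);
    double sumSq = 0.0;
    for (int t = 0; t < trials; t++) {
      double relErr = kmvEstimate(n, k, rnd) / n - 1.0;
      sumSq += relErr * relErr;
    }
    return Math.sqrt(sumSq / trials);
  }

  public static void main(String[] args) {
    int trials = 200, n = 10_000, k = 256;
    long start = System.nanoTime();
    double rms = rmsRelativeError(trials, n, k, 42L);
    double nsPerUpdate = (System.nanoTime() - start) / ((double) trials * n);
    System.out.printf("n=%d k=%d trials=%d rmsRelErr=%.4f ns/update=%.1f%n",
        n, k, trials, rms, nsPerUpdate);
  }
}
```

A real characterization would sweep `n` over many orders of magnitude and use far more trials per point, which is why such runs can take hours or days.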

The code here is shared “as-is” and does not pretend to have the same level of quality as the primary repositories (java, pig, hive and vector). This code is not archived to Maven Central and will change from time to time as we grow these characterization suites.

Documentation

DataSketches Library Website

Java Core Overview

Java Core Javadocs

Build Instructions (Java)

JDK8 is required to compile

The Java classes of this DataSketches component must be compiled using JDK 8.

Recommended Build Tool

This DataSketches component is structured as a Maven project, and Maven is the recommended build tool.

There are two types of tests: normal unit tests and tests run by the strict profile.

To run normal unit tests:

$ mvn clean test

To run the strict profile tests:

$ mvn clean test -P strict

Dependencies

Run-time

See the pom.xml for the top-level dependencies.

Testing

See the pom.xml file for test dependencies.

Build Instructions (C++)

From within Eclipse

  1. After your project is created, open “Project Properties”.
  2. From the Eclipse C++ Build Menu, check “Generate Makefiles automatically”.
  3. Under “Settings”, select “Compiler”, then “Includes”, and add the include directories for the appropriate sketches and for common.
  4. Under “Optimization” select “-O3” and “-DNDEBUG”.

Resources

Issues for datasketches-characterization

Forum

Dev mailing list