Please visit the main DataSketches website for more information.
If you are interested in making contributions to this site please see our Community page for how to contact us.
We define characterization as the task of comprehensively measuring the accuracy or speed performance of various library components over a wide range of inputs. The output of a characterization test is a large table of numbers, often with hundreds or thousands of rows. These tables are then converted into graphs and charts for close examination of behavior. This is different from benchmarking, which usually focuses on testing the performance of a given configuration against a single threshold, or a small set of thresholds, for a pass / fail result.
These characterization tests are often long-running (some can run for many hours or a few days) and very resource-intensive, which makes them unsuitable for inclusion in unit tests. The running times are long because sketches are stochastic in nature: it takes thousands to millions of trials to reduce the statistical noise enough to produce relatively smooth graphs that we can then analyze.
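To illustrate why so many trials are needed, here is a minimal, self-contained sketch in Java. It does not use the DataSketches library: `noisyEstimate` is a hypothetical stand-in for a sketch, assumed to return the true count perturbed by Gaussian noise with a chosen relative standard error. The class name, method names, and parameter values are all assumptions made for this illustration and are not taken from the actual test suites.

```java
import java.util.Random;

public class TrialNoiseDemo {

  // Hypothetical stand-in for a sketch (an assumption, not a DataSketches API):
  // returns trueCount perturbed by Gaussian noise with relative standard error rse.
  public static double noisyEstimate(long trueCount, double rse, Random rnd) {
    return trueCount * (1.0 + rse * rnd.nextGaussian());
  }

  // Runs the given number of independent trials and returns the mean relative error.
  public static double meanRelativeError(long trueCount, double rse, int trials, Random rnd) {
    double sum = 0.0;
    for (int t = 0; t < trials; t++) {
      double est = noisyEstimate(trueCount, rse, rnd);
      sum += (est / trueCount) - 1.0;
    }
    return sum / trials;
  }

  public static void main(String[] args) {
    Random rnd = new Random(12345L); // fixed seed so that runs are repeatable
    long trueCount = 1_000_000L;     // assumed true count for this illustration
    double rse = 0.02;               // assumed 2% relative standard error per trial

    // One table row per trial count, increasing by a factor of 10 each time.
    System.out.println("trials\tmeanRelErr");
    for (int trials = 10; trials <= 1_000_000; trials *= 10) {
      System.out.printf("%d\t%.6f%n", trials, meanRelativeError(trueCount, rse, trials, rnd));
    }
  }
}
```

For an unbiased estimator like this one, the noise in the mean relative error tends to shrink in proportion to the inverse square root of the number of trials, so each additional decimal digit of smoothness costs roughly 100 times more trials. A real characterization repeats this kind of loop at many input sizes, which is where the long running times come from.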
The code in this repository comprises the test suites we use to create most of the plots on our website and to provide evidence for our speed and accuracy claims. This code is shared here so that others can duplicate our characterizations.
This code is shared “as-is” and does not pretend to have the same level of quality as the primary repositories (Java, C++, Go). This code is not formally released or archived to Maven Central, and it will change from time to time as we grow these characterization suites. At any point in time this code is not necessarily up-to-date with the latest releases and thus may require some tweaking to get it to compile.
Caveat Emptor, and enjoy!
The Java classes of this DataSketches component must be compiled using JDK 8.
This DataSketches component is structured as a Maven project, and Maven is the recommended build tool.
There are two types of tests: normal unit tests and tests run by the strict profile.
To run normal unit tests:
$ mvn clean test
To run the strict profile tests:
$ mvn clean test -P strict
See the pom.xml for the top-level dependencies.
See the pom.xml file for test dependencies.
If you already have Eclipse, you will need to install the CDT extensions; alternatively, you can install Eclipse with CDT only. We had to upgrade our Eclipse to the latest version before we could successfully install the CDT extensions.
We have found it convenient to set up two projects in Eclipse:
After your project is created, open Project Properties
C/C++ Build: In this menu, select Use default build command, Generate Makefiles automatically, and Expand Env. Variable Refs in Makefiles.
C/C++ General
After this setup, you should be able to run Build Project from the top-level Eclipse / Project menu. You may need to deselect the Build Automatically option.
go build
The project has a main function that runs the characterization tests. You can run the tests by running the following command:
./datasketches-characterization-go <job name>
or alternatively:
go run . <job name>
The list of available jobs can be found in the usage of the program:
./datasketches-characterization-go