The easiest way to install the Python wrapper is to run:
pip install git+https://github.com/apache/incubator-datasketches-cpp.git
If you prefer to download the source first, be sure to clone the repo with --recursive
to ensure you get the Python binding library (pybind11):
git clone --recursive https://github.com/apache/incubator-datasketches-cpp.git
cd incubator-datasketches-cpp
pip install .
If you do not have pip installed, you can invoke the setup script directly by replacing the last line above with:
python3 setup.py install
Once installed, the DataSketches library is simple to load in Python:
from datasketches import *
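As a quick sanity check after installation, the following minimal sketch uses only the standard library to test whether the `datasketches` module can be found on the import path. The helper name `datasketches_available` is hypothetical and not part of the library.

```python
import importlib.util


def datasketches_available() -> bool:
    """Return True if the datasketches module can be found on the import path.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    return importlib.util.find_spec("datasketches") is not None


if __name__ == "__main__":
    if datasketches_available():
        print("datasketches is installed")
    else:
        print("datasketches not found; install it with pip first")
```

In longer scripts you may prefer to import only the names you need, for example `from datasketches import kll_floats_sketch`, rather than using a wildcard import.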
The following classes and enums are available:
kll_ints_sketch
kll_floats_sketch
frequent_strings_sketch
frequent_items_error_type.{NO_FALSE_NEGATIVES | NO_FALSE_POSITIVES}
update_theta_sketch
compact_theta_sketch (cannot be instantiated directly)
theta_union
theta_intersection
theta_a_not_b
hll_sketch
hll_union
tgt_hll_type.{HLL_4 | HLL_6 | HLL_8}
cpc_sketch
cpc_union
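To give a feel for how one of these classes might be used, here is a minimal sketch that feeds uniform random values into a `kll_floats_sketch` and asks for the median. It assumes the constructor takes the parameter `k` and that `update` and `get_quantile` follow the C++ KLL API; check these against your installed version. The wrapper function `median_estimate` is hypothetical and returns `None` if the package is not installed.

```python
import random


def median_estimate(n=10000, k=200, seed=1):
    """Feed n uniform values in [0, 1) to a KLL sketch and return its median estimate.

    Hypothetical helper; returns None when datasketches is not installed.
    """
    try:
        from datasketches import kll_floats_sketch
    except ImportError:
        return None

    rng = random.Random(seed)
    sketch = kll_floats_sketch(k)  # k trades sketch size against accuracy (assumed as in C++)
    for _ in range(n):
        sketch.update(rng.random())
    # The median of a uniform [0, 1) stream should be close to 0.5.
    return sketch.get_quantile(0.5)


if __name__ == "__main__":
    print(median_estimate())
```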
The Python API largely mirrors the C++ API, with a few minor exceptions. The primary known difference is that Python on modern platforms does not support unsigned integer values or numeric values with fewer than 64 bits. As a result, you may not be able to produce sketches from within Python that are identical to those produced with Java and C++. Sketches serialized from another language will still load as expected.
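The numeric mismatch can be illustrated with the standard library alone. The sketch below does not touch the DataSketches API; it only shows that a Python float is a 64-bit value that may change when narrowed to 32 bits, and that a Python integer has no unsigned type, so a negative value must be reinterpreted explicitly. The helper names `to_float32` and `as_uint64` are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_uint64(x: int) -> int:
    """Reinterpret a signed Python int as an unsigned 64-bit value (two's complement)."""
    return x & 0xFFFFFFFFFFFFFFFF


# 0.1 has no exact binary representation, so its 32-bit and 64-bit roundings differ.
print(to_float32(0.1) == 0.1)   # False
# 0.5 is exactly representable at both widths.
print(to_float32(0.5) == 0.5)   # True
# -1 maps to the largest unsigned 64-bit value.
print(as_uint64(-1))            # 18446744073709551615
```

Differences of this kind are one reason a sketch built in Python may not be byte-for-byte identical to one built from the same logical input in Java or C++.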
We have also removed the reliance on a builder class for theta sketches, since Python allows named arguments to the constructor rather than strictly positional ones.
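As an illustration of constructing a theta sketch with a named argument instead of a builder, consider the following minimal sketch. The keyword `lg_k` and the methods `update` and `get_estimate` are assumed to match the C++ theta sketch API and may differ between versions. The wrapper function `distinct_estimate` is hypothetical and returns `None` if the package is not installed.

```python
def distinct_estimate(n=1000, lg_k=12):
    """Count n distinct integers with a theta sketch and return the estimate.

    Hypothetical helper; returns None when datasketches is not installed.
    """
    try:
        from datasketches import update_theta_sketch
    except ImportError:
        return None

    # Named argument in place of a builder; lg_k is assumed to be the
    # log2 of the nominal number of retained entries, as in C++.
    sketch = update_theta_sketch(lg_k=lg_k)
    for i in range(n):
        sketch.update(i)
    return sketch.get_estimate()


if __name__ == "__main__":
    print(distinct_estimate())
```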