The release files do not include the required Python binding library (pybind11). If building from a release package, you must ensure that the pybind11 directory points to a local copy of pybind11.
An official PyPI build is planned but not yet available.
If you instead want to gamble (possibly ill-advisedly) on the current state of the master branch being usable, you can run: pip install git+https://github.com/apache/incubator-datasketches-cpp.git
When cloning the source repository, include the pybind11 submodule by passing the --recursive option to the clone command:
git clone --recursive https://github.com/apache/incubator-datasketches-cpp.git
cd incubator-datasketches-cpp
python -m pip install --upgrade pip setuptools wheel numpy
python setup.py build
If you cloned without --recursive, you can add the submodule post-checkout using git submodule update --init --recursive.
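Before building, it can help to confirm that the pybind11 directory is actually populated. The following is a minimal sketch, assuming the submodule lives in a directory named pybind11 at the repository root; the helper name `pybind11_present` is hypothetical and not part of the library.

```python
from pathlib import Path


def pybind11_present(repo_root: str) -> bool:
    """Return True if <repo_root>/pybind11 exists and is not empty."""
    # Assumption: the submodule is checked out at <repo_root>/pybind11.
    candidate = Path(repo_root) / "pybind11"
    return candidate.is_dir() and any(candidate.iterdir())


if __name__ == "__main__":
    if pybind11_present("."):
        print("pybind11 directory looks populated")
    else:
        print("pybind11 is missing; run: git submodule update --init --recursive")
```

An empty pybind11 directory is the usual sign of a clone made without --recursive, so the check treats an empty directory the same as a missing one.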
Assuming you have already checked out the library and any dependent submodules, install by simply replacing the last line of the build commands with python setup.py install.
The Python tests are run with tox. To ensure you have all the needed packages, run the following from the package base directory:
python -m pip install --upgrade pip setuptools wheel numpy tox
tox
Once the library is installed, loading the DataSketches library in Python is simple: import datasketches.
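A quick way to confirm the install worked is to check that the module can be found before using it. This is a minimal sketch: the helper name `datasketches_available` is hypothetical, and the `kll_floats_sketch` calls (`update`, `get_quantile`) are assumed to match the installed version of the bindings.

```python
import importlib.util


def datasketches_available() -> bool:
    """Return True if the datasketches module can be located."""
    return importlib.util.find_spec("datasketches") is not None


def demo() -> None:
    """Build a small quantiles sketch if the library is installed."""
    if not datasketches_available():
        print("datasketches is not installed; build and install it first")
        return

    import datasketches

    # Assumed API: a KLL quantiles sketch with update() and get_quantile().
    sketch = datasketches.kll_floats_sketch(200)
    for i in range(1000):
        sketch.update(float(i))
    print("approximate median:", sketch.get_quantile(0.5))


if __name__ == "__main__":
    demo()
```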
kll_ints_sketch
kll_floats_sketch
frequent_strings_sketch
frequent_items_error_type.{NO_FALSE_NEGATIVES | NO_FALSE_POSITIVES}
update_theta_sketch
compact_theta_sketch (cannot be instantiated directly)
theta_union
theta_intersection
theta_a_not_b
hll_sketch
hll_union
tgt_hll_type.{HLL_4 | HLL_6 | HLL_8}
cpc_sketch
cpc_union
var_opt_sketch
var_opt_union
The Python API largely mirrors the C++ API, with a few minor exceptions. The primary known differences are that Python on modern platforms does not support unsigned integer values or numeric values with fewer than 64 bits. As a result, you may not be able to produce sketches in Python that are identical to those produced with Java and C++. Sketches serialized from another language will still load as expected.
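To make the integer difference concrete, the sketch below uses only the standard library to show how the same bit pattern reads as a different number depending on signedness, and how a Python int can fall outside a fixed-width C++ type. It illustrates the general issue and makes no claim about how the bindings convert values internally; the helper names are hypothetical.

```python
import struct


def as_unsigned_64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


def fits_in_uint32(value: int) -> bool:
    """Check whether a Python int is representable as a C++ uint32_t."""
    return 0 <= value <= 0xFFFFFFFF


# The 64 bits that encode -1 as a signed value read as 2**64 - 1 when unsigned.
print(as_unsigned_64(-1))      # 18446744073709551615

# Python ints have no fixed width, so 2**32 is valid here but not as a uint32_t.
print(fits_in_uint32(2**32))   # False
```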
We have also removed reliance on a builder class for theta sketches, since Python allows named arguments to the constructor rather than strictly positional ones.
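The design choice can be illustrated without the library itself. The sketch below contrasts a builder-style configuration with a constructor that takes keyword arguments. `ToySketch`, `ToySketchBuilder`, the parameter names `lg_k` and `p`, and their default values are stand-ins chosen for illustration, not the actual bindings.

```python
from dataclasses import dataclass


@dataclass
class ToySketch:
    """Hypothetical sketch configured through keyword arguments."""
    lg_k: int = 12   # placeholder default, chosen for illustration
    p: float = 1.0   # placeholder default, chosen for illustration


class ToySketchBuilder:
    """Hypothetical builder, in the style a C++ API might use."""

    def __init__(self):
        self._lg_k = 12
        self._p = 1.0

    def set_lg_k(self, lg_k):
        self._lg_k = lg_k
        return self

    def set_p(self, p):
        self._p = p
        return self

    def build(self):
        return ToySketch(self._lg_k, self._p)


# Builder style: chain setters, then build.
built = ToySketchBuilder().set_lg_k(14).build()

# Keyword style: override only the arguments you care about.
direct = ToySketch(lg_k=14)

print(built == direct)  # True
```

With keyword arguments, unspecified parameters keep their defaults, which is the main convenience a builder provides in languages without named arguments.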