The Relative Error Quantiles (REQ) sketch provides extremely high accuracy at a chosen end of the rank domain: high rank accuracy (HRA) or low rank accuracy (LRA). REQ sketches are quantile sketches: they provide approximate quantiles and ranks for a dataset.
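For example, to favor accuracy at the top of the distribution (such as a p99 latency), one might build the sketch in HRA mode. This is a minimal sketch, not taken from the library documentation. Based on the function name req_sketch_float_build_k_hra and the struct<int, bool>(10, false) argument in the full-signature example on this page, it assumes that the INT field is k and the BOOL field selects HRA (true) or LRA (false). The input values 1..1000 are illustrative.

```sql
# Assumption: the parameter struct is (k, hra); hra = true requests high rank accuracy.
# Build an HRA sketch over the illustrative values 1..1000 and query a high quantile,
# which is the end of the rank domain that HRA mode is meant to keep most accurate.
select bqutil.datasketches.req_sketch_float_get_quantile(
  bqutil.datasketches.req_sketch_float_build_k_hra(value, struct<int, bool>(10, true)),
  0.99, true) as p99
from unnest(generate_array(1, 1000)) as value;
```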
Please visit REQ Sketches for more information about this sketch family.
Please visit the main Apache DataSketches website for more information about the DataSketches library.
If you are interested in making contributions to this project, please see our Community page for how to contact us.
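Creates a sketch that represents the distribution of the given column, using the default parameters.
Merges sketches from the given column, using the default parameters.
Creates a sketch that represents the distribution of the given column, using explicitly given parameters (the _k_hra variant, which takes a struct<int, bool> as in the full-signature example).
Merges sketches from the given column, using explicitly given parameters (the _k_hra variant, which takes a struct<int, bool> as in the full-signature example).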
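Because the build functions aggregate a column and the merge functions combine serialized sketches, a common pattern is to keep one sketch per partition and merge them at query time. The following is a minimal sketch using the default-parameter variants; the inline rows, the day and latency columns, and the daily_sketches table are hypothetical. The exact median of the five illustrative latencies is 12.5, which gives a value to compare the merged result against.

```sql
# Hypothetical input: (day, latency) rows built inline; one sketch is stored per day.
create or replace temp table daily_sketches as
select day, bqutil.datasketches.req_sketch_float_build(latency) as sketch
from unnest(array<struct<day int64, latency float64>>[
  (1, 12.5), (1, 8.0), (2, 30.1), (2, 4.2), (3, 17.9)
])
group by day;

# Merge the per-day sketches into one and query the median across all days.
select bqutil.datasketches.req_sketch_float_get_quantile(
  bqutil.datasketches.req_sketch_float_merge(sketch), 0.5, true) as median_latency
from daily_sketches;

drop table daily_sketches;
```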
Returns the length of the input stream.
Returns the number of retained items (samples) in the sketch.
Returns the minimum value of the input stream.
Returns a summary string that represents the state of the given sketch.
Returns the maximum value of the input stream.
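The summary functions can all be applied to the same sketch in one query. A minimal sketch over the illustrative values 1..100; the minimum and maximum are tracked exactly, while the number of retained items can be smaller than n once the sketch starts compacting.

```sql
# Inspect one sketch built over the illustrative values 1..100.
select
  bqutil.datasketches.req_sketch_float_get_n(sketch) as n,                        # items seen: 100
  bqutil.datasketches.req_sketch_float_get_num_retained(sketch) as num_retained,  # at most n
  bqutil.datasketches.req_sketch_float_get_min_value(sketch) as min_value,        # exact minimum: 1
  bqutil.datasketches.req_sketch_float_get_max_value(sketch) as max_value,        # exact maximum: 100
  bqutil.datasketches.req_sketch_float_to_string(sketch) as summary
from (
  select bqutil.datasketches.req_sketch_float_build(value) as sketch
  from unnest(generate_array(1, 100)) as value
);
```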
Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as an array of cumulative probabilities defined by the given split_points.
Param sketch: the given sketch as BYTES.
Param split_points: an array of M unique, monotonically increasing values (of the same type as the input values to the sketch) that divide the input value domain into M+1 overlapping intervals.
The start of each interval is below the lowest input value retained by the sketch (corresponding to a zero rank or zero probability).
The end of each interval is the associated split-point except for the top interval where the end is the maximum input value of the stream.
Param inclusive: if true and the upper boundary of an interval equals a value retained by the sketch, the interval will include that value. If the lower boundary of an interval equals a value retained by the sketch, the interval will exclude that value.
If false and the upper boundary of an interval equals a value retained by the sketch, the interval will exclude that value. If the lower boundary of an interval equals a value retained by the sketch, the interval will include that value.
Returns: the CDF as a monotonically increasing FLOAT64 array of M+1 cumulative probabilities on the interval [0.0, 1.0]. The top-most probability of the returned array is always 1.0.
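To make the shape of the result concrete, this query passes M = 3 split points and so gets back M+1 = 4 cumulative probabilities. The input 1..20 is illustrative. For this data the exact CDF is 0.25, 0.5, 0.75, 1.0 with inclusive = true (values less than or equal to each split point) and 0.2, 0.45, 0.7, 1.0 with inclusive = false (values strictly less than each split point). Since the full example on this page reports that a 20-item sketch retains all 20 items, the sketch output should match these up to floating-point rounding.

```sql
# M = 3 split points over the illustrative values 1..20 yield M+1 = 4 cumulative probabilities.
select
  bqutil.datasketches.req_sketch_float_get_cdf(sketch, [5.0, 10.0, 15.0], true) as cdf_inclusive,   # exact: 0.25, 0.5, 0.75, 1.0
  bqutil.datasketches.req_sketch_float_get_cdf(sketch, [5.0, 10.0, 15.0], false) as cdf_exclusive   # exact: 0.2, 0.45, 0.7, 1.0
from (
  select bqutil.datasketches.req_sketch_float_build(value) as sketch
  from unnest(generate_array(1, 20)) as value
);
```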
Returns an approximate lower bound of the given normalized rank.
Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses defined by the given split_points.
Param sketch: the given sketch as BYTES.
Param split_points: an array of M unique, monotonically increasing values (of the same type as the input values) that divide the input value domain into M+1 non-overlapping intervals.
Each interval except for the end intervals starts with a split-point and ends with the next split-point in sequence.
The first interval starts below the minimum value of the stream (corresponding to a zero rank or zero probability), and ends with the first split-point.
The last (M+1)th interval starts with the last split-point and ends above the maximum value of the stream (corresponding to a rank or probability of 1.0).
Param inclusive: if true and the upper boundary of an interval equals a value retained by the sketch, the interval will include that value. If the lower boundary of an interval equals a value retained by the sketch, the interval will exclude that value.
If false and the upper boundary of an interval equals a value retained by the sketch, the interval will exclude that value. If the lower boundary of an interval equals a value retained by the sketch, the interval will include that value.
Returns: the PMF as a FLOAT64 array of M+1 probability masses on the interval [0.0, 1.0]. The sum of the probability masses of all M+1 intervals is 1.0.
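The same three split points produce four non-overlapping intervals. The input 1..20 is illustrative. For this data the exact masses are 5/20 for each interval with inclusive = true (intervals ending at and including each split point), and 4/20, 5/20, 5/20, 6/20 with inclusive = false (each split point belongs to the interval it starts). Small floating-point differences from these fractions are possible.

```sql
# Four probability masses for the illustrative values 1..20; each array sums to 1.0.
select
  bqutil.datasketches.req_sketch_float_get_pmf(sketch, [5.0, 10.0, 15.0], true) as pmf_inclusive,   # exact: 0.25 each
  bqutil.datasketches.req_sketch_float_get_pmf(sketch, [5.0, 10.0, 15.0], false) as pmf_exclusive   # exact: 0.2, 0.25, 0.25, 0.3
from (
  select bqutil.datasketches.req_sketch_float_build(value) as sketch
  from unnest(generate_array(1, 20)) as value
);
```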
Returns a value from the sketch that is the best approximation to a value from the original stream with the given normalized rank.
Returns an approximate upper bound of the given normalized rank.
Returns an approximation to the normalized rank, on the interval [0.0, 1.0], of the given value.
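Rank and quantile are inverse lookups. This minimal sketch assumes get_rank follows the same inclusive convention as the split points (counting values less than or equal to the given value when true, strictly less when false). For the illustrative values 1..20 the exact answers are 0.75 and 0.7 for the value 15, and 15 for the normalized rank 0.75.

```sql
# Rank of 15 (inclusive and exclusive) and the quantile at normalized rank 0.75 over 1..20.
select
  bqutil.datasketches.req_sketch_float_get_rank(sketch, 15, true) as rank_le_15,      # exact: 0.75
  bqutil.datasketches.req_sketch_float_get_rank(sketch, 15, false) as rank_lt_15,     # exact: 0.7
  bqutil.datasketches.req_sketch_float_get_quantile(sketch, 0.75, true) as q75        # exact: 15
from (
  select bqutil.datasketches.req_sketch_float_build(value) as sketch
  from unnest(generate_array(1, 20)) as value
);
```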
# using defaults
create or replace temp table req_sketch(sketch bytes);
insert into req_sketch (select bqutil.datasketches.req_sketch_float_build(value) from unnest([1,2,3,4,5,6,7,8,9,10]) as value);
insert into req_sketch (select bqutil.datasketches.req_sketch_float_build(value) from unnest([11,12,13,14,15,16,17,18,19,20]) as value);
select bqutil.datasketches.req_sketch_float_to_string(bqutil.datasketches.req_sketch_float_merge(sketch)) from req_sketch;
# expected 0.5
select bqutil.datasketches.req_sketch_float_get_rank(bqutil.datasketches.req_sketch_float_merge(sketch), 10, true) from req_sketch;
# expected 10
select bqutil.datasketches.req_sketch_float_get_quantile(bqutil.datasketches.req_sketch_float_merge(sketch), 0.5, true) from req_sketch;
# expected 0.5, 0.5
select bqutil.datasketches.req_sketch_float_get_pmf(bqutil.datasketches.req_sketch_float_merge(sketch), [10.0], true) from req_sketch;
# expected 0.5, 1
select bqutil.datasketches.req_sketch_float_get_cdf(bqutil.datasketches.req_sketch_float_merge(sketch), [10.0], true) from req_sketch;
# expected 1
select bqutil.datasketches.req_sketch_float_get_min_value(bqutil.datasketches.req_sketch_float_merge(sketch)) from req_sketch;
# expected 20
select bqutil.datasketches.req_sketch_float_get_max_value(bqutil.datasketches.req_sketch_float_merge(sketch)) from req_sketch;
# expected 20
select bqutil.datasketches.req_sketch_float_get_n(bqutil.datasketches.req_sketch_float_merge(sketch)) from req_sketch;
# expected 20
select bqutil.datasketches.req_sketch_float_get_num_retained(bqutil.datasketches.req_sketch_float_merge(sketch)) from req_sketch;
drop table req_sketch;

# using full signatures
create or replace temp table req_sketch(sketch bytes);
insert into req_sketch (select bqutil.datasketches.req_sketch_float_build_k_hra(value, struct<int, bool>(10, false)) from unnest([1,2,3,4,5,6,7,8,9,10]) as value);
insert into req_sketch (select bqutil.datasketches.req_sketch_float_build_k_hra(value, struct<int, bool>(10, false)) from unnest([11,12,13,14,15,16,17,18,19,20]) as value);
select bqutil.datasketches.req_sketch_float_to_string(bqutil.datasketches.req_sketch_float_merge_k_hra(sketch, struct<int, bool>(10, false))) from req_sketch;
drop table req_sketch;