layout: doc_page

ReqSketch Accuracy with Adversarial Streams

This set of tests characterize the accuracy (or more precisely the rank error) of the ReqSketch using specifically selected adversarial stream patterns. The goal of this suite of tests is to understand how the rank error of the sketch behaves across all ranks with these specific stream patterns. All of these tests are run with the same sketch configuration except for the choice of the adversarial stream pattern.

The design of these tests is different from the tests for the Random Shuffled Streams in one key aspect, we do not shuffle the input stream for each trial. Here, each test has one pattern and runs multiple trials with exactly the same pattern. The plots then reveal the results of the random process of the sketch itself.

For those that are interested in the actual code that run these tests can examine the following links.

  • Code: The code used to generate these characterization studies.
  • Config: The human readable and editable configuration file that instructs the above code with the specific properties used to run the test. These configuration properties are different for each of the following plots and summarized below with each plot.

Test Design

  • Stream Length (SL): 2^20 = 1,048,576 floats
  • Stream Values: Natural numbers, ℕ1, from 1 to SL, expressed as 32-bit floats.
  • Y-axis: The absolute error of the sketch getRank(value) method.
  • X-axis: The normalized rank [0.0, 1.0]
  • Plot Points (PP): 100. Equally spaced points along the X-axis starting at 0 and ending at 1.0.
  • Trial:
    • The stream is generated according to the chosen adversarial pattern and presented to the sketch.
    • Execute estRanks[] = getRanks(trueValues[]) from the sketch where the array of trueValues[] are the 100 equally spaced values along the x-axis that exactly correspond to the above Plot Points. The estRanks[] is the array of the sketch's estimates of the ranks of the given trueValues.
    • Computes the absolute error: estRanks[i] - trueRanks[i] for each Plot Point and submits this error to a corresponding array of standard qantile sketches, with one quantile sketch assigned to each Plot Point.
  • Trials:
    • 2^12 trials are executed. This produces a distribution of error at each Plot Point held by the error quantile skeches.
    • Note: With these tests, since we do not shuffle the stream with each trial, randomization is due to just the internal random process of the sketch itself.
    • Extract 7 quantiles from the error quantile sketches at each Plot Point. These error quantiles correspond to the standard normal distribuion at the median, +/- 1SD, +/- 2SD and +/- 3SD, where SD stands for Standard Deviation from the mean.
  • Plotting:
    • Each of the error quantiles are connected by lines to form contours of the error distribution where the area between the +/- 1SD contours is the size of the 68% confidence interval; between the +/- 2SD coutours is the 95.4% confidence interval; and between the +/- 3SD contours is the 99.7% confidence interval.
    • In addition to the error contours. 6 dashed contours (with colors corresponding to the error contours) represent the a priori estimates of the error at each of the +/- standard deviations computed from the sketch's getRankLowerBound(double, int) and getRankUpperBound(double, int) methods.

Common Configuration for the these tests

  • K=50: the sketch sizing & accuracy parameter
  • HRA: High Rank Accuracy
  • Crit=LT: Comparison criterion: LT = Less-Than
  • SL=2^20: StreamLength
  • Eq Spaced: Equally spaced Plot Points (PP)
  • PP=100: Number of plot points on the x-axis

Results

Plot 1 Adversarial Pattern: Sorted

  • Run Time: 8:18
  • Unique Error Value Count Per Plot Point: Min=2, Max=2

Plot 2 Adversarial Pattern: Reversed

  • Run Time: 4:12
  • Unique Error Value Count Per Plot Point: Min=8, Max=104

Plot 3 Adversarial Pattern: Random

  • Run Time: 14:30
  • Unique Error Value Count Per Plot Point: Min=61, Max=797

Plot 4 Adversarial Pattern: Zoomin

  • Run Time: 12:50
  • Unique Error Value Count Per Plot Point: Min=2, Max=171

Plot 5 Adversarial Pattern: Zoomout

  • Run Time: 17:36
  • Unique Error Value Count Per Plot Point: Min=24, Max=501

Plot 6 Adversarial Pattern: Sqrt

  • Run Time: 7:57
  • Unique Error Value Count Per Plot Point: Min=2, Max=169

Plot 6 Adversarial Pattern: FlipFlop

  • Run Time: 9:12
  • Unique Error Value Count Per Plot Point: Min=2, Max=137

Final Compactor Profile & Size

Because all of these tests are run with the same exact sketch configuration and the same input stream length they all have the same final compactor profile, retained items and Serialization Bytes.

Lg WtItemsNominal SizeSection SizeNum SectionsNum Compactions
0786864182416288
191986418246755
284686418242979
353157624121308
46135762412578
55295762412246
65285762412106
7781576241233
862143236613
93644323664
102503005031
111353005030
  • Retained Items: 6903
  • Serialization Bytes: 28144