The druid-stats extension for Apache Druid provides post-aggregators that compute test statistics, including z-scores and p-values. Please refer to Democratizing Experimentation Data for Product Innovations for the math background and details.
Make sure to include the druid-stats extension in order to use these post-aggregators.
Please refer to Making Sense of the Two-Proportions Test and An Introduction to Statistics: Comparing Two Means for more details.
$$ z = \frac{p_1 - p_2}{\mathrm{S.E.}} $$
The z-score is computed assuming the null hypothesis is true. Here $p_1$ and $p_2$ are the two sample proportions (defined below), $p_1 - p_2$ is their observed difference, $n_1$ and $n_2$ are the two sample sizes, and S.E. is the standard error:
$$ \mathrm{S.E.} = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} $$
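As a worked check of these formulas, plugging in the constant values used in the query example in this section (300 successes out of 500 for sample 1, 450 out of 600 for sample 2) gives approximately:

```latex
$$
\begin{aligned}
p_1 &= 300/500 = 0.6, \qquad p_2 = 450/600 = 0.75 \\
\mathrm{S.E.} &= \sqrt{\frac{0.6 \cdot 0.4}{500} + \frac{0.75 \cdot 0.25}{600}}
              = \sqrt{0.00048 + 0.0003125} = \sqrt{0.0007925} \approx 0.02815 \\
z &= \frac{0.6 - 0.75}{0.02815} \approx -5.33
\end{aligned}
$$
```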
zscore2sample: calculates the z-score using a two-sample z-test, converting binary variables (e.g. success or not) to continuous variables (e.g. conversion rate).
{
  "type": "zscore2sample",
  "name": "<output_name>",
  "successCount1": <post_aggregator> success count of sample 1,
  "sample1Size": <post_aggregator> sample 1 size,
  "successCount2": <post_aggregator> success count of sample 2,
  "sample2Size": <post_aggregator> sample 2 size
}
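The post-aggregator converts the binary variables into continuous variables for the two population proportions. Specifically:
$$ p_1 = \frac{\mathrm{successCount1}}{\mathrm{sample1Size}} $$
$$ p_2 = \frac{\mathrm{successCount2}}{\mathrm{sample2Size}} $$
so $n_1$ and $n_2$ in the standard-error formula are sample1Size and sample2Size.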
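A minimal Python sketch of the zscore2sample arithmetic, assuming plain numeric inputs in place of the post-aggregator fields. The function name `zscore_two_sample` and its input checks are illustrative choices, not Druid's API. It converts the two success counts to proportions and applies the unpooled standard error from the formula above.

```python
import math


def zscore_two_sample(success_count1: float, sample1_size: float,
                      success_count2: float, sample2_size: float) -> float:
    """Two-sample z-score for proportions (hypothetical helper, not Druid's API).

    Mirrors the formulas above: p = successCount / sampleSize for each sample,
    then z = (p1 - p2) / sqrt(p1(1-p1)/n1 + p2(1-p2)/n2).
    """
    # Illustrative sanity checks (an assumption of this sketch, not from the Druid docs).
    for count, size in ((success_count1, sample1_size), (success_count2, sample2_size)):
        if size <= 0 or count < 0 or count > size:
            raise ValueError("need 0 <= successCount <= sampleSize and sampleSize > 0")

    # Convert each binary outcome count into a continuous proportion (conversion rate).
    p1 = success_count1 / sample1_size
    p2 = success_count2 / sample2_size

    # Unpooled standard error of the difference p1 - p2.
    se = math.sqrt(p1 * (1 - p1) / sample1_size + p2 * (1 - p2) / sample2_size)
    return (p1 - p2) / se


print(round(zscore_two_sample(300, 500, 450, 600), 2))  # -5.33
```

If both proportions are exactly 0 or 1, the standard error is zero and this sketch raises ZeroDivisionError; how Druid itself handles that case is not covered here.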
pvalue2tailedZtest: calculates the p-value of a two-sided z-test from a z-score.
{
  "type": "pvalue2tailedZtest",
  "name": "<output_name>",
  "zScore": <zscore post_aggregator>
}
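The two-sided p-value is the probability, under a standard normal distribution, of observing a value at least as extreme as |z|, i.e. p = 2(1 - Φ(|z|)). A minimal Python sketch, assuming a plain float input; the name `pvalue_two_tailed` is hypothetical. It uses the identity 2(1 - Φ(x)) = erfc(x / √2), which avoids the precision loss of subtracting a CDF value close to 1 from 1.

```python
import math


def pvalue_two_tailed(z_score: float) -> float:
    """Two-sided p-value for a z-score (hypothetical helper, not Druid's API).

    Computes 2 * (1 - Phi(|z|)) via the complementary error function:
    2 * (1 - Phi(x)) == erfc(x / sqrt(2)) for the standard normal CDF Phi.
    """
    return math.erfc(abs(z_score) / math.sqrt(2))


print(round(pvalue_two_tailed(1.96), 3))  # 0.05
print(pvalue_two_tailed(0.0))             # 1.0
```

Because the absolute value is taken first, z = -2.5 and z = 2.5 give the same p-value, as expected for a two-sided test.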
The following JSON query example uses the zscore2sample post-aggregator to calculate a z-score and then feeds that z-score into the pvalue2tailedZtest post-aggregator to calculate the p-value:
{
  ...
  "postAggregations": [
    {
      "type": "pvalue2tailedZtest",
      "name": "pvalue",
      "zScore": {
        "type": "zscore2sample",
        "name": "zscore",
        "successCount1": { "type": "constant", "name": "successCountFromPopulation1Sample", "value": 300 },
        "sample1Size": { "type": "constant", "name": "sampleSizeOfPopulation1", "value": 500 },
        "successCount2": { "type": "constant", "name": "successCountFromPopulation2Sample", "value": 450 },
        "sample2Size": { "type": "constant", "name": "sampleSizeOfPopulation2", "value": 600 }
      }
    }
  ]
}
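To see how the nesting in this query resolves, the following Python sketch evaluates the same post-aggregator tree locally. It is a hypothetical illustration, not Druid code: the `evaluate` function, and supporting only the `constant`, `zscore2sample`, and `pvalue2tailedZtest` types, are assumptions made for this example. Inner post-aggregators are evaluated first and their values feed the outer ones, just as the z-score feeds the p-value here.

```python
import json
import math

# The post-aggregator tree from the query example above (the "..." parts omitted).
SPEC = json.loads("""
{
  "type": "pvalue2tailedZtest", "name": "pvalue",
  "zScore": {
    "type": "zscore2sample", "name": "zscore",
    "successCount1": {"type": "constant", "name": "successCountFromPopulation1Sample", "value": 300},
    "sample1Size":   {"type": "constant", "name": "sampleSizeOfPopulation1", "value": 500},
    "successCount2": {"type": "constant", "name": "successCountFromPopulation2Sample", "value": 450},
    "sample2Size":   {"type": "constant", "name": "sampleSizeOfPopulation2", "value": 600}
  }
}
""")


def evaluate(node: dict) -> float:
    """Recursively evaluate a post-aggregator tree (hypothetical local evaluator)."""
    kind = node["type"]
    if kind == "constant":
        return float(node["value"])
    if kind == "zscore2sample":
        # Evaluate the four input post-aggregators, then apply the z-score formula.
        s1, n1 = evaluate(node["successCount1"]), evaluate(node["sample1Size"])
        s2, n2 = evaluate(node["successCount2"]), evaluate(node["sample2Size"])
        p1, p2 = s1 / n1, s2 / n2
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        return (p1 - p2) / se
    if kind == "pvalue2tailedZtest":
        # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
        z = evaluate(node["zScore"])
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError(f"unsupported post-aggregator type: {kind}")


print(round(evaluate(SPEC["zScore"]), 2))  # -5.33
print(evaluate(SPEC) < 1e-6)               # True
```

In a real query Druid evaluates the post-aggregators itself; this sketch only mirrors the arithmetic so the result can be sanity-checked.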