<!-- HTML header for doxygen 1.8.4-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
<title>MADlib: Hypothesis Tests</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { init_search(); });
</script>
<!-- hack in the navigation tree -->
<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
<!-- google analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
ga('send', 'pageview');
</script>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
<span id="projectnumber">1.11</span>
</div>
<div id="projectbrief">User Documentation for MADlib</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__stats__tests.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Hypothesis Tests<div class="ingroups"><a class="el" href="group__grp__stats.html">Statistics</a> &raquo; <a class="el" href="group__grp__inf__stats.html">Inferential Statistics</a></div></div> </div>
</div><!--header-->
<div class="contents">
<div class="toc"><b>Contents</b> <ul>
<li>
<a href="#input">Input</a> </li>
<li>
<a href="#usage">Usage</a> </li>
<li>
<a href="#examples">Examples</a> </li>
<li>
<a href="#literature">Literature</a> </li>
<li>
<a href="#related">Related Topics</a> </li>
</ul>
</div><p>Hypothesis tests are used to decide whether to reject a <em>null hypothesis</em> <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> about the distribution of random variables, given realizations of these random variables. Since it is generally not possible to make statements with certainty, one is interested in the probability <img class="formulaInl" alt="$ p $" src="form_111.png"/> of seeing random variates at least as extreme as the ones observed, assuming that <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is true. If this probability <img class="formulaInl" alt="$ p $" src="form_111.png"/> is small, that is, no larger than a chosen <em>significance level</em>, <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is rejected by the test. Falsifying <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is the canonical goal when employing a hypothesis test. That is, hypothesis tests are typically used to substantiate that the <em>alternative hypothesis</em> <img class="formulaInl" alt="$ H_1 $" src="form_398.png"/> is true instead.</p>
<p>Hypothesis tests may be divided into parametric and non-parametric tests. A parametric test assumes certain distributions and makes inferences about parameters of the distributions (e.g., the mean of a normal distribution). Formally, there is a given domain of possible parameters <img class="formulaInl" alt="$ \Gamma $" src="form_399.png"/> and the null hypothesis <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is the event that the true parameter <img class="formulaInl" alt="$ \gamma_0 \in \Gamma_0 $" src="form_400.png"/>, where <img class="formulaInl" alt="$ \Gamma_0 \subsetneq \Gamma $" src="form_401.png"/>. Non-parametric tests, on the other hand, do not assume any particular distribution of the sample (e.g., a non-parametric test may simply test if two distributions are similar).</p>
<p>The first step of a hypothesis test is to compute a <em>test statistic</em>, which is a function of the random variates, i.e., a random variate itself. A hypothesis test relies on the distribution of the test statistic being (approximately) known. Now, the <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value is the probability of seeing a test statistic at least as extreme as the one observed, assuming that <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is true. In a case where the null hypothesis corresponds to a family of distributions (e.g., in a parametric test where <img class="formulaInl" alt="$ \Gamma_0 $" src="form_402.png"/> is not a singleton set), the <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value is the supremum, over all possible distributions according to the null hypothesis, of these probabilities.</p>
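<p>As a minimal sketch of what a test statistic is, the one-sample Student's t statistic for the null hypothesis "mean = 20" can be computed with built-in PostgreSQL aggregates alone. The values are the US mpg sample used in the Examples section; the subquery alias <code>s</code> is only illustrative, and this is not necessarily how MADlib computes the statistic internally. The result should agree with the <code>statistic</code> (about -6.05) and <code>df</code> (26) columns that <code>t_test_one</code> reports in the Examples section.</p>
<pre class="example">
-- t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n))
SELECT
    (avg(mpg) - 20) / (stddev_samp(mpg) / sqrt(count(mpg)::DOUBLE PRECISION)) AS statistic,
    count(mpg) - 1 AS df  -- degrees of freedom: n - 1
FROM (
    SELECT unnest(ARRAY[18, 15, 18, 16, 17, 15, 14, 14, 21, 10, 10, 11, 9,
                        13, 12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15]::DOUBLE PRECISION[]) AS mpg
) s;
</pre>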
<dl class="section note"><dt>Note</dt><dd>Please refer to <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a> for additional technical information on the MADlib implementation of hypothesis tests, and for detailed function signatures for all tests.</dd></dl>
<p><a class="anchor" id="input"></a></p><dl class="section user"><dt>Input</dt><dd></dd></dl>
<p>Input data is assumed to be normalized with all values stored row-wise. In general, the following inputs are expected.</p>
<p><b>One-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
...
<em>value</em> DOUBLE PRECISION
...
)</pre><p><b>Two-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
...
<em>first</em> BOOLEAN,
<em>value</em> DOUBLE PRECISION
...
)</pre><p> The <code>first</code> column indicates whether a value is from the first sample (if <code>TRUE</code>) or the second sample (if <code>FALSE</code>).</p>
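<p>When the samples are identified by a label rather than a boolean flag, a view can expose the data in the two-sample form. The following is a sketch only; the table <code>cars</code>, its columns <code>origin</code> and <code>mpg</code>, and the view name <code>cars_two_sample</code> are hypothetical.</p>
<pre class="example">
-- Hypothetical source table: cars(origin TEXT, mpg DOUBLE PRECISION)
CREATE VIEW cars_two_sample AS
SELECT
    (origin = 'US') AS first,    -- TRUE: first sample, FALSE: second sample
    mpg AS value
FROM cars
WHERE origin IN ('US', 'Japan')  -- keep exactly two samples
  AND mpg IS NOT NULL;           -- drop missing values, as the examples do
</pre>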
<p><b>Many-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
...
<em>group</em> INTEGER,
<em>value</em> DOUBLE PRECISION
...
)</pre><p><a class="anchor" id="usage"></a></p><dl class="section user"><dt>Usage</dt><dd></dd></dl>
<p>All tests are implemented as aggregate functions. The non-parametric (rank-based) tests are implemented as ordered aggregate functions and therefore require an <code>ORDER BY</code> clause. The simplest forms of usage are given below. Specific function signatures, as described in <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a>, may require more arguments or a different <code>ORDER BY</code> clause.</p>
<ul>
<li>Run a parametric one-sample test: <pre>SELECT <em>test</em>(<em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
<li><code>t_test_one</code> (one-sample or dependent paired Student's t-test)</li>
<li><code>chi2_gof_test</code> (Pearson's chi-squared goodness of fit test, also used for chi-squared independence test as shown in example section below)</li>
</ul>
</li>
<li>Run a parametric two-sample/multi-sample test: <pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
<li><code>f_test</code> (Fisher F-test)</li>
<li><code>t_test_two_pooled</code> (two-sample pooled Student's t-test, i.e., equal variances)</li>
<li><code>t_test_two_unpooled</code> (two-sample unpooled t-test, i.e., unequal variances, also known as Welch's t-test)</li>
<li><code>one_way_anova</code> (one-way analysis of variance, multi-sample)</li>
</ul>
</li>
<li><p class="startli">Run a non-parametric two-sample/multi-sample test: </p><pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em> ORDER BY <em>value</em>) FROM <em>source</em></pre><p> where '<em>test</em>' can be one of</p><ul>
<li><code>ks_test</code> (Kolmogorov-Smirnov test)</li>
<li><code>mw_test</code> (Mann-Whitney test)</li>
<li><code>wsr_test</code> (Wilcoxon signed-rank test, multi-sample)</li>
</ul>
<p class="startli"><b>Note on non-parametric tests:</b> The Kolmogorov-Smirnov two-sample test is based on asymptotic theory. The p-value is obtained by comparing the test statistic with the Kolmogorov distribution. The p-value is also adjusted for data with heavy-tailed distributions, which may give results that differ from those of R's <code>ks.test</code> function. See [3] for a detailed explanation. The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. There are two possible definitions for the statistic; MADlib outputs the minimum of the two and uses it for significance testing. This might give different results for both <code>mw_test</code> and <code>wsr_test</code> compared to statistical functions in other popular packages (such as R's <code>wilcox.test</code> function). See [4] for a detailed explanation.</p>
</li>
</ul>
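<p>Because the tests are aggregate functions, they can in principle be combined with <code>GROUP BY</code> like any other aggregate, running one test per group of rows. A minimal sketch, assuming a hypothetical table <code>measurements(batch, is_control, value)</code>:</p>
<pre class="example">
-- Hypothetical table: measurements(batch INTEGER, is_control BOOLEAN, value DOUBLE PRECISION)
-- One-sample t-test per batch (null hypothesis: mean = 20)
SELECT batch, (madlib.t_test_one(value - 20)).*
FROM measurements
GROUP BY batch;
-- Ordered aggregates keep their ORDER BY clause inside the call
SELECT batch, (madlib.mw_test(is_control, value ORDER BY value)).*
FROM measurements
GROUP BY batch;
</pre>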
<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
<ul>
<li><b>One-sample and two-sample t-test</b> (data is a subset of the mpg data from <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm">NIST/SEMATECH</a>)</li>
</ul>
<pre class="example">
-- Load data
DROP TABLE IF EXISTS auto83b;
CREATE TABLE auto83b (
id SERIAL,
mpg_us DOUBLE PRECISION,
mpg_j DOUBLE PRECISION
);
COPY auto83b (mpg_us, mpg_j) FROM stdin DELIMITER '|';
18|24
15|27
18|27
16|25
17|31
15|35
14|24
14|19
21|31
10|32
10|24
11|26
9| 9
\N|32
\N|37
\N|38
\N|34
\N|34
\N|32
\N|33
\N|32
\N|25
\N|24
\N|37
13|\N
12|\N
18|\N
21|\N
19|\N
21|\N
15|\N
16|\N
15|\N
11|\N
20|\N
21|\N
19|\N
15|\N
\.
</pre><pre class="example">
-- Create table for one sample tests
DROP TABLE IF EXISTS auto83b_one_sample;
CREATE TABLE auto83b_one_sample AS
SELECT mpg_us AS mpg
FROM auto83b
WHERE mpg_us is not NULL;
-- Print table
SELECT * FROM auto83b_one_sample;
</pre><pre class="result">
 mpg
-----
18
15
18
16
17
15
14
14
21
10
10
11
9
13
12
18
21
19
21
15
16
15
11
20
21
19
15
(27 rows)
</pre> <pre class="example">
-- Create table for two sample tests
DROP TABLE IF EXISTS auto83b_two_sample;
CREATE TABLE auto83b_two_sample AS
SELECT TRUE AS is_us, mpg_us AS mpg
FROM auto83b
WHERE mpg_us is not NULL
UNION ALL
SELECT FALSE, mpg_j
FROM auto83b
WHERE mpg_j is not NULL;
-- Print table
SELECT * FROM auto83b_two_sample;
</pre> <pre class="result">
is_us | mpg
-------+-----
t | 18
t | 15
t | 18
t | 16
t | 17
t | 15
t | 14
t | 14
t | 21
t | 10
t | 10
t | 11
t | 9
t | 13
t | 12
t | 18
t | 21
t | 19
t | 21
t | 15
t | 16
t | 15
t | 11
t | 20
t | 21
t | 19
t | 15
f | 24
f | 27
f | 27
f | 25
f | 31
f | 35
f | 24
f | 19
f | 31
f | 32
f | 24
f | 26
f | 9
f | 32
f | 37
f | 38
f | 34
f | 34
f | 32
f | 33
f | 32
f | 25
f | 24
f | 37
(51 rows)
</pre> <pre class="example">
-- One sample tests
SELECT (madlib.t_test_one(mpg - 20)).* FROM auto83b_one_sample; -- null hypothesis (mean = 20) is rejected
</pre><pre class="result">
statistic | df | p_value_one_sided | p_value_two_sided
------------------+----+-------------------+----------------------
-6.0532478722666 | 26 | 0.999998926789141 | 2.14642171769697e-06
</pre><pre class="example">
SELECT (madlib.t_test_one(mpg - 15.7)).* FROM auto83b_one_sample; -- null hypothesis (mean = 15.7) is not rejected
</pre><pre class="result">
statistic | df | p_value_one_sided | p_value_two_sided
---------------------+----+-------------------+-------------------
0.00521831713126531 | 26 | 0.497938118950661 | 0.995876237901321
</pre><pre class="example">
-- Two sample tests
SELECT (madlib.t_test_two_pooled(is_us, mpg)).* FROM auto83b_two_sample;
</pre> <pre class="result">
statistic | df | p_value_one_sided | p_value_two_sided
-------------------+----+-------------------+----------------------
-8.89342267075968 | 49 | 0.999999999995748 | 8.50408632402377e-12
</pre><pre class="example">
SELECT (madlib.t_test_two_unpooled(is_us, mpg)).* FROM auto83b_two_sample;
</pre><pre class="result">
statistic | df | p_value_one_sided | p_value_two_sided
-------------------+------------------+-------------------+----------------------
-8.61746388524314 | 35.1283818346179 | 0.999999999821218 | 3.57563867403599e-10
</pre><ul>
<li><b>F-Test</b> (uses the same data as the t-test above)</li>
</ul>
<pre class="example">
SELECT (madlib.f_test(is_us, mpg)).* FROM auto83b_two_sample;
-- Test result indicates that the two distributions have different variances
</pre> <pre class="result">
statistic | df1 | df2 | p_value_one_sided | p_value_two_sided
-------------------+-----+-----+-------------------+---------------------
0.311786921089247 | 26 | 23 | 0.997559863672441 | 0.00488027265511803
</pre><ul>
<li><b>Chi-squared goodness-of-fit test</b> (<a href="http://www.statsdirect.com/help/default.htm#nonparametric_methods/chisq_goodness_fit.htm">Data source</a>)</li>
</ul>
<pre class="example">
CREATE TABLE chi2_test_blood_group (
id SERIAL,
blood_group VARCHAR,
observed BIGINT,
expected DOUBLE PRECISION
);
INSERT INTO chi2_test_blood_group(blood_group, observed, expected) VALUES
('O', 67, 82.28),
('A', 83, 84.15),
('B', 29, 14.96),
('AB', 8, 5.61);
SELECT (madlib.chi2_gof_test(observed, expected)).* FROM chi2_test_blood_group;
</pre> <pre class="result">
statistic | p_value | df | phi | contingency_coef
------------------+----------------------+----+------------------+-------------------
17.0481013341976 | 0.000690824622923826 | 3 | 2.06446732440826 | 0.899977280680593
</pre><ul>
<li><b>Chi-squared independence test</b> (<a href="http://itl.nist.gov/div898/software/dataplot/refman1/auxillar/chistest.htm">Data source</a>)</li>
</ul>
<p>The Chi-squared independence test uses the Chi-squared goodness-of-fit function, as shown in the example below. The expected value must be computed and passed to the goodness-of-fit function. For MADlib, the expected value is computed as <em>sum of rows * sum of columns</em> for each element of the input matrix. For example, the expected value for element (2,1) would be <em>sum of row 2 * sum of column 1</em>.</p>
<pre class="example">
CREATE TABLE chi2_test_friendly (
id_x SERIAL,
values INTEGER[]
);
INSERT INTO chi2_test_friendly(values) VALUES
(array[5, 29, 14, 16]),
(array[15, 54, 14, 10]),
(array[20, 84, 17, 94]),
(array[68, 119, 26, 7]);</pre><pre class="example">-- Input table is expected to be unpivoted, so unpivot it
CREATE TABLE chi2_test_friendly_unpivoted AS
SELECT id_x, id_y, values[id_y] AS observed
FROM
chi2_test_friendly,
generate_series(1,4) AS id_y;</pre><pre class="example">-- Compute Chi-squared independence statistic, by calculating expected value in the SQL and calling the goodness-of-fit function
SELECT (madlib.chi2_gof_test(observed, expected, deg_freedom)).*
FROM (
-- Compute expected values (degrees of freedom are computed in subquery q)
SELECT
observed,
sum(observed) OVER (PARTITION BY id_x)::DOUBLE PRECISION *
sum(observed) OVER (PARTITION BY id_y) AS expected
FROM chi2_test_friendly_unpivoted
) p, (
SELECT
(count(DISTINCT id_x) - 1) * (count(DISTINCT id_y) - 1) AS deg_freedom
FROM chi2_test_friendly_unpivoted
) q;
</pre> <pre class="result">
statistic | p_value | df | phi | contingency_coef
------------------+----------------------+----+------------------+-------------------
138.289841626008 | 2.32528678709871e-25 | 9 | 2.93991753313346 | 0.946730727519112
</pre><ul>
<li><b>ANOVA test</b> (<a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm">Data source</a>)</li>
</ul>
<pre class="example">
CREATE TABLE nist_anova_test (
id SERIAL,
resistance FLOAT8[]
);
INSERT INTO nist_anova_test(resistance) VALUES
(array[6.9,8.3,8.0]),
(array[5.4,6.8,10.5]),
(array[5.8,7.8,8.1]),
(array[4.6,9.2,6.9]),
(array[4.0,6.5,9.3]);</pre><pre class="example">SELECT (madlib.one_way_anova(level, value)).* FROM (
SELECT level, resistance[level] AS value
FROM
nist_anova_test, (SELECT * FROM generate_series(1,3) level) q1
) q2;
</pre> <pre class="result">
sum_squares_between | sum_squares_within | df_between | df_within | mean_squares_between | mean_squares_within | statistic | p_value
---------------------+--------------------+------------+-----------+----------------------+---------------------+------------------+--------------------
27.8973333333333 | 17.452 | 2 | 12 | 13.9486666666667 | 1.45433333333333 | 9.59110703644281 | 0.0032482226008593
</pre><ul>
<li><b>Kolmogorov-Smirnov test</b> (<a href="http://www.physics.csbsju.edu/stats/KS-test.html">Data source</a>)</li>
</ul>
<pre class="example">
CREATE TABLE ks_sample_1 AS
SELECT
TRUE AS first,
unnest(ARRAY[0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]) AS value
UNION ALL
SELECT
FALSE,
unnest(ARRAY[-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]);</pre><pre class="example">SELECT (madlib.ks_test(first, value,
(SELECT count(value) FROM ks_sample_1 WHERE first),
(SELECT count(value) FROM ks_sample_1 WHERE NOT first)
ORDER BY value)).*
FROM ks_sample_1;
</pre> <pre class="result">
statistic | k_statistic | p_value
-----------+-----------------+--------------------
0.45 | 1.4926782214936 | 0.0232132758544496
</pre><ul>
<li><b>Mann-Whitney test</b> (uses the same data as the t-test)</li>
</ul>
<pre class="example">
SELECT (madlib.mw_test(is_us, mpg ORDER BY mpg)).* from auto83b_two_sample;
-- Note first parameter above is BOOLEAN
</pre> <pre class="result">
statistic | u_statistic | p_value_one_sided | p_value_two_sided
-------------------+-------------+-------------------+----------------------
-5.50097925755249 | 32.5 | 0.999999981115618 | 3.77687645883758e-08
</pre><ul>
<li><b>Wilcoxon signed-rank test</b></li>
</ul>
<pre class="example">
DROP TABLE IF EXISTS test_wsr;
CREATE TABLE test_wsr (
x DOUBLE PRECISION,
y DOUBLE PRECISION
);
COPY test_wsr (x, y) FROM stdin DELIMITER '|';
0.32|0.39
0.4|0.47
0.11|0.11
0.47|0.43
0.32|0.42
0.35|0.3
0.32|0.43
0.63|0.98
0.5|0.86
0.6|0.79
0.38|0.33
0.46|0.45
0.2|0.22
0.31|0.3
0.62|0.6
0.52|0.53
0.77|0.85
0.23|0.21
0.3|0.33
0.7|0.57
0.41|0.43
0.53|0.49
0.19|0.2
0.31|0.35
0.48|0.4
\.
SELECT (madlib.wsr_test(
x - y,
2 * 2^(-52) * greatest(x,y)
ORDER BY abs(x - y)
)).*
FROM test_wsr;
</pre> <pre class="result">
statistic | rank_sum_pos | rank_sum_neg | num | z_statistic | p_value_one_sided | p_value_two_sided
-----------+--------------+--------------+-----+-------------------+-------------------+-------------------
105.5 | 105.5 | 194.5 | 24 | -1.27318365656729 | 0.898523560667509 | 0.202952878664983
</pre><p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
<p>[1] M. Hollander, D. Wolfe: <em>Nonparametric Statistical Methods</em>, 2nd edition, Wiley, 1999</p>
<p>[2] E. Lehmann, J. Romano: <em>Testing Statistical Hypotheses</em>, 3rd edition, Springer, 2005</p>
<p>[3] M. Stephens: <em>Use of the Kolmogorov-Smirnov, Cramer-Von Mises and related statistics without extensive tables</em>, Journal of the Royal Statistical Society. Series B (Methodological) (1970): 115-122.</p>
<p>[4] Wikipedia: Mann–Whitney U test calculation, <a href="http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations">http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations</a></p>
<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
<p>File <a class="el" href="hypothesis__tests_8sql__in.html" title="SQL functions for statistical hypothesis tests. ">hypothesis_tests.sql_in</a> documenting the SQL functions. </p>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>