<!-- HTML header for doxygen 1.8.4-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
<title>MADlib: Hypothesis Tests</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
  $(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
  $(document).ready(function() { init_search(); });
</script>
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
    jax: ["input/TeX","output/HTML-CSS"],
});
</script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
<!-- hack in the navigation tree -->
<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
<!-- google analytics -->
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
  ga('create', 'UA-45382226-1', 'madlib.apache.org');
  ga('send', 'pageview');
</script>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td id="projectlogo"><a href="http://madlib.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
  <td style="padding-left: 0.5em;">
   <div id="projectname">
   <span id="projectnumber">1.21.0</span>
   </div>
   <div id="projectbrief">User Documentation for Apache MADlib</div>
  </td>
   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
        <span class="left">
          <img id="MSearchSelect" src="search/mag_sel.png"
               onmouseover="return searchBox.OnSearchSelectShow()"
               onmouseout="return searchBox.OnSearchSelectHide()"
               alt=""/>
          <input type="text" id="MSearchField" value="Search" accesskey="S"
               onfocus="searchBox.OnSearchFieldFocus(true)" 
               onblur="searchBox.OnSearchFieldFocus(false)" 
               onkeyup="searchBox.OnSearchFieldChange(event)"/>
          </span><span class="right">
            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
          </span>
        </div>
</td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
      <div id="nav-sync" class="sync"></div>
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;" 
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__stats__tests.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0" 
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
  <div class="headertitle">
<div class="title">Hypothesis Tests<div class="ingroups"><a class="el" href="group__grp__stats.html">Statistics</a> &raquo; <a class="el" href="group__grp__inf__stats.html">Inferential Statistics</a></div></div>  </div>
</div><!--header-->
<div class="contents">
<div class="toc"><b>Contents</b> <ul>
<li>
<a href="#input">Input</a> </li>
<li>
<a href="#usage">Usage</a> </li>
<li>
<a href="#examples">Examples</a> </li>
<li>
<a href="#literature">Literature</a> </li>
<li>
<a href="#related">Related Topics</a> </li>
</ul>
</div><p>Hypothesis tests are used to confirm or reject a <em>null hypothesis</em> \( H_0 \) about the distribution of random variables, given realizations of these random variables. Since in general it is not possible to make statements with certainty, one is interested in the probability \( p \) of seeing random variates at least as extreme as the ones observed, assuming that \( H_0 \) is true. If this probability \( p \) is small, \( H_0 \) will be rejected by the test with <em>significance level</em> \( p \). Falsifying \( H_0 \) is the canonic goal when employing a hypothesis test. That is, hypothesis tests are typically used in order to substantiate that instead the <em>alternative hypothesis</em> \( H_1 \) is true.</p>
<p>Hypothesis tests may be divided into parametric and non-parametric tests. A parametric test assumes certain distributions and makes inferences about parameters of the distributions (e.g., the mean of a normal distribution). Formally, there is a given domain of possible parameters \( \Gamma \) and the null hypothesis \( H_0 \) is the event that the true parameter \( \gamma_0 \in \Gamma_0 \), where \( \Gamma_0 \subsetneq \Gamma \). Non-parametric tests, on the other hand, do not assume any particular distribution of the sample (e.g., a non-parametric test may simply test if two distributions are similar).</p>
<p>The first step of a hypothesis test is to compute a <em>test statistic</em>, which is a function of the random variates and therefore itself a random variate. A hypothesis test relies on the distribution of the test statistic being (approximately) known. The \( p \)-value is then the probability of seeing a test statistic at least as extreme as the one observed, assuming that \( H_0 \) is true. When the null hypothesis corresponds to a family of distributions (e.g., in a parametric test where \( \Gamma_0 \) is not a singleton set), the \( p \)-value is the supremum of these probabilities over all distributions consistent with the null hypothesis.</p>
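<p>For a test statistic whose null distribution is standard normal, this definition can be evaluated directly. The following plain-Python sketch is illustrative only; the function names are hypothetical and not part of MADlib. Judging from the example outputs on this page (for instance, the Wilcoxon signed-rank example reports <code>z_statistic</code> = -1.273 together with <code>p_value_one_sided</code> = 0.8985), MADlib's one-sided p-value appears to be the upper-tail probability \( P(Z \ge z) \) and the two-sided p-value \( P(|Z| \ge |z|) \).</p>

```python
import math


def p_value_one_sided(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_value_two_sided(z: float) -> float:
    """Probability P(|Z| >= |z|): a value at least as extreme in either tail."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistic reported in the Wilcoxon signed-rank example on this page
z = -1.27318365656729
print(p_value_one_sided(z))  # about 0.8985
print(p_value_two_sided(z))  # about 0.2030
```

<p>The one-sided value is close to 1 here because \( z \) is negative while the tail considered is the upper one; a small one-sided p-value arises only when \( z \) lies far out in the upper tail.</p>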
<dl class="section note"><dt>Note</dt><dd>Please refer to <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a> for additional technical information on the MADlib implementation of hypothesis tests, and for detailed function signatures for all tests.</dd></dl>
<p><a class="anchor" id="input"></a></p><dl class="section user"><dt>Input</dt><dd></dd></dl>
<p>Input data is assumed to be normalized with all values stored row-wise. In general, the following inputs are expected.</p>
<p><b>One-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
    ...
    <em>value</em> DOUBLE PRECISION
    ...
)</pre><p><b>Two-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
    ...
    <em>first</em> BOOLEAN,
    <em>value</em> DOUBLE PRECISION
    ...
)</pre><p> The <code>first</code> column indicates whether a value is from the first sample (if <code>TRUE</code>) or the second sample (if <code>FALSE</code>).</p>
<p><b>Many-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
    ...
    <em>group</em> INTEGER,
    <em>value</em> DOUBLE PRECISION
    ...
)</pre><p><a class="anchor" id="usage"></a></p><dl class="section user"><dt>Usage</dt><dd></dd></dl>
<p>All tests are implemented as aggregate functions. The non-parametric (rank-based) tests are implemented as ordered aggregate functions and therefore require an <code>ORDER BY</code> clause. The simplest forms of usage are given below. Specific function signatures, as described in <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a>, may require more arguments or a different <code>ORDER BY</code> clause.</p>
<ul>
<li>Run a parametric one-sample test: <pre>SELECT <em>test</em>(<em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
<li><code>t_test_one</code> (one-sample or dependent paired Student's t-test)</li>
<li><code>chi2_gof_test</code> (Pearson's chi-squared goodness-of-fit test, also used for the chi-squared independence test, as shown in the Examples section below)</li>
</ul>
</li>
<li>Run a parametric two-sample/multi-sample test: <pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
<li><code>f_test</code> (Fisher F-test)</li>
<li><code>t_test_two_pooled</code> (two-sample pooled Student's t-test, i.e., equal variances)</li>
<li><code>t_test_two_unpooled</code> (two-sample unpooled t-test, i.e., unequal variances, also known as Welch's t-test)</li>
<li><code>one_way_anova</code> (one-way analysis of variance, multi-sample)</li>
</ul>
</li>
<li><p class="startli">Run a non-parametric two-sample/multi-sample test: </p><pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em> ORDER BY <em>value</em>) FROM <em>source</em></pre><p> where '<em>test</em>' can be one of</p><ul>
<li><code>ks_test</code> (Kolmogorov-Smirnov test)</li>
<li><code>mw_test</code> (Mann-Whitney test)</li>
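<li><code>wsr_test</code> (Wilcoxon signed-rank test; a one-sample test on paired differences, which takes the differences rather than a sample indicator and is ordered by their absolute value, as shown in the Examples section)</li>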
</ul>
<p class="startli"><b>Note on non-parametric tests:</b> The Kolmogorov-Smirnov two-sample test is based on asymptotic theory: the p-value is obtained by comparing the test statistic with the Kolmogorov distribution. The p-value is also adjusted for data with a heavy-tailed distribution, which may give results that differ from those of R's <code>ks.test</code> function. See [3] for a detailed explanation. The literature is not unanimous about the definitions of the Wilcoxon rank-sum and Mann-Whitney tests. There are two possible definitions for the statistic; MADlib outputs the minimum of the two and uses it for significance testing. This may give results for both <code>mw_test</code> and <code>wsr_test</code> that differ from the corresponding functions in other popular packages (such as R's <code>wilcox.test</code> function). See [4] for a detailed explanation.</p>
</li>
</ul>
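<p>The t statistics in the auto83b example below can be cross-checked outside the database. The following plain-Python sketch uses only the standard <code>statistics</code> module and the textbook formulas for the one-sample statistic, the pooled and unpooled (Welch) two-sample statistics, and the Welch-Satterthwaite degrees of freedom. It is an illustration, not MADlib's implementation; the function names are hypothetical, and p-values are omitted because the standard library has no Student's t distribution.</p>

```python
import math
import statistics

# mpg values from the auto83b example below (NULLs removed)
US = [18, 15, 18, 16, 17, 15, 14, 14, 21, 10, 10, 11, 9, 13,
      12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15]
JP = [24, 27, 27, 25, 31, 35, 24, 19, 31, 32, 24, 26, 9, 32,
      37, 38, 34, 34, 32, 33, 32, 25, 24, 37]


def t_one(sample, mu0=0.0):
    """One-sample t statistic for H0: mean == mu0, with n - 1 degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1


def t_two_pooled(x, y):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, df


def t_two_unpooled(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    a = statistics.variance(x) / len(x)
    b = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


print(t_one(US, 20))           # statistic about -6.053, df 26
print(t_two_pooled(US, JP))    # statistic about -8.893, df 49
print(t_two_unpooled(US, JP))  # statistic about -8.617, df about 35.13
```

<p>Passing <code>mu0=20</code> corresponds to the SQL call <code>t_test_one(mpg - 20)</code>, which tests whether the mean of the shifted values is zero.</p>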
<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
<ul>
<li><b>One-sample and two-sample t-test</b> (data is a subset of the mpg data from <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm">NIST/SEMATECH</a>)</li>
</ul>
<pre class="example">
-- Load data
DROP TABLE IF EXISTS auto83b;
CREATE TABLE auto83b (
    id SERIAL,
    mpg_us DOUBLE PRECISION,
    mpg_j DOUBLE PRECISION
);
COPY auto83b (mpg_us, mpg_j) FROM stdin DELIMITER '|';
18|24
15|27
18|27
16|25
17|31
15|35
14|24
14|19
21|31
10|32
10|24
11|26
9| 9
\N|32
\N|37
\N|38
\N|34
\N|34
\N|32
\N|33
\N|32
\N|25
\N|24
\N|37
13|\N
12|\N
18|\N
21|\N
19|\N
21|\N
15|\N
16|\N
15|\N
11|\N
20|\N
21|\N
19|\N
15|\N
\.
</pre><pre class="example">
-- Create table for one sample tests
DROP TABLE IF EXISTS auto83b_one_sample;
CREATE TABLE auto83b_one_sample AS
    SELECT mpg_us AS mpg
    FROM auto83b
    WHERE mpg_us IS NOT NULL;
-- Print table
SELECT * FROM auto83b_one_sample;
</pre><pre class="result">
 mpg
-----
  18
  15
  18
  16
  17
  15
  14
  14
  21
  10
  10
  11
   9
  13
  12
  18
  21
  19
  21
  15
  16
  15
  11
  20
  21
  19
  15
(27 rows)
</pre> <pre class="example">
-- Create table for two sample tests
DROP TABLE IF EXISTS auto83b_two_sample;
CREATE TABLE auto83b_two_sample AS
SELECT TRUE AS is_us, mpg_us AS mpg
FROM auto83b
WHERE mpg_us IS NOT NULL
UNION ALL
SELECT FALSE, mpg_j
FROM auto83b
WHERE mpg_j IS NOT NULL;
-- Print table
SELECT * FROM auto83b_two_sample;
</pre> <pre class="result">
 is_us | mpg
-------+-----
 t     |  18
 t     |  15
 t     |  18
 t     |  16
 t     |  17
 t     |  15
 t     |  14
 t     |  14
 t     |  21
 t     |  10
 t     |  10
 t     |  11
 t     |   9
 t     |  13
 t     |  12
 t     |  18
 t     |  21
 t     |  19
 t     |  21
 t     |  15
 t     |  16
 t     |  15
 t     |  11
 t     |  20
 t     |  21
 t     |  19
 t     |  15
 f     |  24
 f     |  27
 f     |  27
 f     |  25
 f     |  31
 f     |  35
 f     |  24
 f     |  19
 f     |  31
 f     |  32
 f     |  24
 f     |  26
 f     |   9
 f     |  32
 f     |  37
 f     |  38
 f     |  34
 f     |  34
 f     |  32
 f     |  33
 f     |  32
 f     |  25
 f     |  24
 f     |  37
(51 rows)
</pre> <pre class="example">
-- One sample tests
SELECT (madlib.t_test_one(mpg - 20)).* FROM auto83b_one_sample;  -- H0: mean = 20 is rejected (two-sided p-value about 2e-6)
</pre><pre class="result">
     statistic     | df | p_value_one_sided |  p_value_two_sided
 ------------------+----+-------------------+----------------------
  -6.0532478722666 | 26 | 0.999998926789141 | 2.14642171769697e-06
 </pre><pre class="example">
SELECT (madlib.t_test_one(mpg - 15.7)).* FROM auto83b_one_sample;  -- H0: mean = 15.7 is not rejected
</pre><pre class="result">
       statistic      | df | p_value_one_sided | p_value_two_sided
 ---------------------+----+-------------------+-------------------
  0.00521831713126531 | 26 | 0.497938118950661 | 0.995876237901321
</pre><pre class="example">
-- Two sample tests
SELECT (madlib.t_test_two_pooled(is_us, mpg)).* FROM auto83b_two_sample;
</pre> <pre class="result">
     statistic     | df | p_value_one_sided |  p_value_two_sided
 -------------------+----+-------------------+----------------------
  -8.89342267075968 | 49 | 0.999999999995748 | 8.50408632402377e-12
 </pre><pre class="example">
SELECT (madlib.t_test_two_unpooled(is_us, mpg)).* FROM auto83b_two_sample;
</pre><pre class="result">
      statistic     |        df        | p_value_one_sided |  p_value_two_sided
 -------------------+------------------+-------------------+----------------------
  -8.61746388524314 | 35.1283818346179 | 0.999999999821218 | 3.57563867403599e-10
</pre><ul>
<li><b>F-Test</b> (uses the same data as the t-tests above)</li>
</ul>
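<p>The F statistic in the result below can be read as the ratio of the sample variance of the first sample (<code>is_us = TRUE</code>) to that of the second, with \( n_1 - 1 \) and \( n_2 - 1 \) degrees of freedom; this reading is inferred from the example output rather than taken from MADlib's source. A minimal Python cross-check on the same mpg values (the function name is hypothetical):</p>

```python
import statistics

# mpg values from the auto83b example above (NULLs removed)
US = [18, 15, 18, 16, 17, 15, 14, 14, 21, 10, 10, 11, 9, 13,
      12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15]
JP = [24, 27, 27, 25, 31, 35, 24, 19, 31, 32, 24, 26, 9, 32,
      37, 38, 34, 34, 32, 33, 32, 25, 24, 37]


def f_statistic(x, y):
    """Variance ratio s_x^2 / s_y^2 with numerator and denominator degrees of freedom."""
    return statistics.variance(x) / statistics.variance(y), len(x) - 1, len(y) - 1


print(f_statistic(US, JP))  # about (0.3118, 26, 23)
```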
<pre class="example">
SELECT (madlib.f_test(is_us, mpg)).* FROM auto83b_two_sample;
-- The small two-sided p-value indicates that the two samples have different variances
</pre> <pre class="result">
      statistic     | df1 | df2 | p_value_one_sided |  p_value_two_sided
 -------------------+-----+-----+-------------------+---------------------
  0.311786921089247 |  26 |  23 | 0.997559863672441 | 0.00488027265511803
</pre><ul>
<li><b>Chi-squared goodness-of-fit test</b> (<a href="http://www.statsdirect.com/help/default.htm#nonparametric_methods/chisq_goodness_fit.htm">Data source</a>)</li>
</ul>
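<p>Pearson's statistic is \( \chi^2 = \sum_i (O_i - E_i)^2 / E_i \), with \( k - 1 \) degrees of freedom for \( k \) categories. The plain-Python sketch below recomputes it for the blood-group data. Judging from this example and the independence example that follows, <code>phi</code> and <code>contingency_coef</code> appear to be \( \sqrt{\chi^2/n} \) and \( \sqrt{\chi^2/(\chi^2 + n)} \) with \( n \) equal to the number of input rows rather than the total count; treat that as an observation about the output, not a documented definition.</p>

```python
import math

# Blood-group counts from the example below: O, A, B, AB
observed = [67, 83, 29, 8]
expected = [82.28, 84.15, 14.96, 5.61]


def chi2_gof(obs, exp):
    """Pearson's chi-squared goodness-of-fit statistic with k - 1 degrees of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    df = len(obs) - 1
    # Assumption inferred from the example output: n is the number of input rows.
    n = len(obs)
    phi = math.sqrt(stat / n)
    contingency_coef = math.sqrt(stat / (stat + n))
    return stat, df, phi, contingency_coef


print(chi2_gof(observed, expected))  # statistic about 17.048, df 3
```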
<pre class="example">
CREATE TABLE chi2_test_blood_group (
    id SERIAL,
    blood_group VARCHAR,
    observed BIGINT,
    expected DOUBLE PRECISION
);
INSERT INTO chi2_test_blood_group(blood_group, observed, expected) VALUES
    ('O', 67, 82.28),
    ('A', 83, 84.15),
    ('B', 29, 14.96),
    ('AB', 8, 5.61);
SELECT (madlib.chi2_gof_test(observed, expected)).* FROM chi2_test_blood_group;
</pre> <pre class="result">
     statistic     |       p_value        | df |       phi        | contingency_coef
 ------------------+----------------------+----+------------------+-------------------
  17.0481013341976 | 0.000690824622923826 |  3 | 2.06446732440826 | 0.899977280680593
 </pre><ul>
<li><b>Chi-squared independence test</b> (<a href="http://itl.nist.gov/div898/software/dataplot/refman1/auxillar/chistest.htm">Data source</a>)</li>
</ul>
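<p>The chi-squared independence test uses the chi-squared goodness-of-fit function, as shown in the example below. The expected values must be computed in SQL and passed to the goodness-of-fit function. In this example, the expected value for each cell of the input matrix is computed as <em>(sum of its row) * (sum of its column)</em>; for example, the expected value for cell (2,1) is <em>sum of row 2 * sum of column 1</em>. The textbook expected count also divides this product by the total of all cells; that constant factor can be omitted here because <code>chi2_gof_test</code> rescales the expected values to the observed total, as the statistic in the example output confirms.</p>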
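<p>The following plain-Python sketch illustrates this computation on the 4 × 4 table used below: it forms the products of row and column sums as the SQL query does, rescales them to the observed total (which turns each product into the textbook expected count \( R_i C_j / N \)), and evaluates Pearson's statistic with \( (r-1)(c-1) \) degrees of freedom. The rescaling step is an assumption about what <code>chi2_gof_test</code> does internally, inferred from the fact that the example output matches the textbook statistic; the function name is hypothetical.</p>

```python
# 4 x 4 contingency table from the chi2_test_friendly example below
table = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]


def chi2_independence(table):
    """Pearson's chi-squared test of independence for an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    observed = [o for row in table for o in row]       # row-major order
    # Unnormalized expected values, as in the SQL example: row sum * column sum
    raw = [r * c for r in row_sums for c in col_sums]   # same row-major order
    # Rescale to the observed total; this divides every product by N
    scale = sum(observed) / sum(raw)
    expected = [e * scale for e in raw]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df


print(chi2_independence(table))  # statistic about 138.29, df 9
```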
<pre class="example">
CREATE TABLE chi2_test_friendly (
    id_x SERIAL,
    values INTEGER[]
);
INSERT INTO chi2_test_friendly(values) VALUES
    (array[5, 29, 14, 16]),
    (array[15, 54, 14, 10]),
    (array[20, 84, 17, 94]),
    (array[68, 119, 26, 7]);</pre><pre class="example">-- The test expects one row per cell, so unpivot the arrays (id_x = row, id_y = column)
CREATE TABLE chi2_test_friendly_unpivoted AS
SELECT id_x, id_y, values[id_y] AS observed
FROM
    chi2_test_friendly,
    generate_series(1,4) AS id_y;</pre><pre class="example">-- Compute the chi-squared independence statistic: calculate the expected values in SQL, then call the goodness-of-fit function
SELECT (madlib.chi2_gof_test(observed, expected, deg_freedom)).*
FROM (
    -- Compute expected values (row sum * column sum)
    SELECT
        observed,
        sum(observed) OVER (PARTITION BY id_x)::DOUBLE PRECISION *
        sum(observed) OVER (PARTITION BY id_y) AS expected
    FROM chi2_test_friendly_unpivoted
) p, (
    -- Degrees of freedom: (rows - 1) * (columns - 1)
    SELECT
        (count(DISTINCT id_x) - 1) * (count(DISTINCT id_y) - 1) AS deg_freedom
    FROM chi2_test_friendly_unpivoted
) q;
</pre> <pre class="result">
     statistic     |       p_value        | df |       phi        | contingency_coef
 ------------------+----------------------+----+------------------+-------------------
  138.289841626008 | 2.32528678709871e-25 |  9 | 2.93991753313346 | 0.946730727519112
 </pre><ul>
<li><b>ANOVA test</b> (<a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm">Data source</a>)</li>
</ul>
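<p>One-way ANOVA compares the between-group and within-group sums of squares, \( F = \frac{SS_B/(k-1)}{SS_W/(N-k)} \) for \( k \) groups and \( N \) observations. The plain-Python sketch below applies these textbook formulas to the resistance data, grouping values by array position as the SQL query does; it needs Python 3.8 or later for <code>statistics.fmean</code>, the function name is hypothetical, and the p-value is omitted because the standard library has no F distribution.</p>

```python
import statistics

# Rows of nist_anova_test; the array position is the group (level), as in the SQL below
rows = [
    [6.9, 8.3, 8.0],
    [5.4, 6.8, 10.5],
    [5.8, 7.8, 8.1],
    [4.6, 9.2, 6.9],
    [4.0, 6.5, 9.3],
]
groups = [list(level) for level in zip(*rows)]


def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F statistic of a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(values)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, statistic


print(one_way_anova(groups))  # about (27.897, 17.452, 2, 12, 9.591)
```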
<pre class="example">
CREATE TABLE nist_anova_test (
    id SERIAL,
    resistance FLOAT8[]
);
INSERT INTO nist_anova_test(resistance) VALUES
    (array[6.9,8.3,8.0]),
    (array[5.4,6.8,10.5]),
    (array[5.8,7.8,8.1]),
    (array[4.6,9.2,6.9]),
    (array[4.0,6.5,9.3]);</pre><pre class="example">SELECT (madlib.one_way_anova(level, value)).* FROM (
    SELECT level, resistance[level] AS value
    FROM
        nist_anova_test, (SELECT * FROM generate_series(1,3) level) q1
) q2;
</pre> <pre class="result">
  sum_squares_between | sum_squares_within | df_between | df_within | mean_squares_between | mean_squares_within |    statistic     |      p_value
 ---------------------+--------------------+------------+-----------+----------------------+---------------------+------------------+--------------------
     27.8973333333333 |             17.452 |          2 |        12 |     13.9486666666667 |    1.45433333333333 | 9.59110703644281 | 0.0032482226008593
</pre><ul>
<li><b>Kolmogorov-Smirnov test</b> (<a href="http://www.physics.csbsju.edu/stats/KS-test.html">Data source</a>)</li>
</ul>
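<p>The two-sample statistic \( D \) is the largest vertical distance between the two empirical distribution functions. In this example the reported <code>k_statistic</code> matches Stephens' modified statistic [3], \( (\sqrt{n_e} + 0.12 + 0.11/\sqrt{n_e})\,D \) with \( n_e = n_1 n_2/(n_1+n_2) \), and <code>p_value</code> matches the asymptotic Kolmogorov tail probability \( Q(\lambda) = 2\sum_{j\ge1}(-1)^{j-1}e^{-2j^2\lambda^2} \) evaluated at that value. The plain-Python sketch below is a cross-check under those assumptions, not MADlib's code; the function name is hypothetical.</p>

```python
import math
from bisect import bisect_right

# Samples from the ks_sample_1 example below
first = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72,
         0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]
second = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
          7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]


def ks_two_sample(a, b, terms=100):
    """Two-sample KS statistic D, Stephens-modified statistic k, and p-value Q(k)."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    # Both empirical CDFs are step functions, so the supremum of |F1 - F2|
    # is attained at one of the observed values.
    d = max(abs(bisect_right(a, x) / n1 - bisect_right(b, x) / n2) for x in a + b)
    ne = n1 * n2 / (n1 + n2)
    k = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if k < 0.2:
        # Q(k) is within about 1e-12 of 1 here, and the series converges slowly
        return d, k, 1.0
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * (j * k) ** 2) for j in range(1, terms + 1))
    return d, k, p


print(ks_two_sample(first, second))  # about (0.45, 1.4927, 0.0232)
```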
<pre class="example">
CREATE TABLE ks_sample_1 AS
SELECT
    TRUE AS first,
    unnest(ARRAY[0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]) AS value
UNION ALL
SELECT
    FALSE,
    unnest(ARRAY[-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]);</pre><pre class="example">SELECT (madlib.ks_test(first, value,
    (SELECT count(value) FROM ks_sample_1 WHERE first),
    (SELECT count(value) FROM ks_sample_1 WHERE NOT first)
    ORDER BY value)).*
FROM ks_sample_1;
</pre> <pre class="result">
  statistic |   k_statistic   |      p_value
 -----------+-----------------+--------------------
       0.45 | 1.4926782214936 | 0.0232132758544496
</pre><ul>
<li><b>Mann-Whitney test</b> (uses the same data as the t-tests)</li>
</ul>
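<p>The Mann-Whitney \( U \) statistic counts, over all pairs with one value from each sample, how often the first-sample value is larger, with ties counting one half. Following the note on non-parametric tests above, the sketch takes the smaller of the two possible \( U \) values; the normal approximation \( z = (U - n_1 n_2/2)/\sqrt{n_1 n_2 (n_1+n_2+1)/12} \) without a tie correction is an assumption that reproduces the <code>statistic</code> column of this example. The pairwise count costs \( O(n_1 n_2) \) and is meant only for small cross-checks; the function name is hypothetical.</p>

```python
import math

# mpg values from the auto83b example above (NULLs removed)
US = [18, 15, 18, 16, 17, 15, 14, 14, 21, 10, 10, 11, 9, 13,
      12, 18, 21, 19, 21, 15, 16, 15, 11, 20, 21, 19, 15]
JP = [24, 27, 27, 25, 31, 35, 24, 19, 31, 32, 24, 26, 9, 32,
      37, 38, 34, 34, 32, 33, 32, 25, 24, 37]


def mann_whitney(x, y):
    """Smaller Mann-Whitney U value and its normal-approximation z (no tie correction)."""
    n1, n2 = len(x), len(y)
    # U for the first sample: pairs in which the x value is larger; ties count 1/2
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z


print(mann_whitney(US, JP))  # about (32.5, -5.501)
```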
<pre class="example">
SELECT (madlib.mw_test(is_us, mpg ORDER BY mpg)).* FROM auto83b_two_sample;
-- Note: the first argument (is_us) is BOOLEAN
</pre> <pre class="result">
      statistic     | u_statistic | p_value_one_sided |  p_value_two_sided
 -------------------+-------------+-------------------+----------------------
  -5.50097925755249 |        32.5 | 0.999999981115618 | 3.77687645883758e-08
</pre><ul>
<li><b>Wilcoxon signed-rank test</b></li>
</ul>
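<p>The Wilcoxon signed-rank test ranks the nonzero paired differences by absolute value (tied absolute values share their average rank) and sums the ranks of the positive and of the negative differences. In the example below, <code>statistic</code> is the smaller rank sum, <code>num</code> is the number of nonzero differences, and <code>z_statistic</code> matches the normal approximation with the usual tie correction, \( z = (W - n(n+1)/4)/\sqrt{n(n+1)(2n+1)/24 - \sum_t (t^3 - t)/48} \), where \( t \) runs over the sizes of the groups of tied absolute differences. The plain-Python sketch assumes two-decimal data and rounds each difference to two places so that equal differences compare equal in floating point, a role similar to that of the precision argument in the SQL call; the function names are hypothetical.</p>

```python
import math
from collections import Counter
from itertools import groupby

# (x, y) pairs from the test_wsr example below
pairs = [
    (0.32, 0.39), (0.4, 0.47), (0.11, 0.11), (0.47, 0.43), (0.32, 0.42),
    (0.35, 0.3), (0.32, 0.43), (0.63, 0.98), (0.5, 0.86), (0.6, 0.79),
    (0.38, 0.33), (0.46, 0.45), (0.2, 0.22), (0.31, 0.3), (0.62, 0.6),
    (0.52, 0.53), (0.77, 0.85), (0.23, 0.21), (0.3, 0.33), (0.7, 0.57),
    (0.41, 0.43), (0.53, 0.49), (0.19, 0.2), (0.31, 0.35), (0.48, 0.4),
]


def midranks(values):
    """1-based ranks in which tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    for _, grp in groupby(order, key=values.__getitem__):
        idx = list(grp)
        for i in idx:
            ranks[i] = pos + (len(idx) + 1) / 2  # mean of ranks pos+1 .. pos+len(idx)
        pos += len(idx)
    return ranks


def wilcoxon_signed_rank(pairs, decimals=2):
    """Rank sums and tie-corrected z statistic of the Wilcoxon signed-rank test."""
    # Rounding makes equal two-decimal differences compare equal in floating point
    diffs = [round(x - y, decimals) for x, y in pairs]
    diffs = [d for d in diffs if d != 0]  # zero differences are discarded
    n = len(diffs)
    ranks = midranks([abs(d) for d in diffs])
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = n * (n + 1) / 2 - w_pos
    w = min(w_pos, w_neg)
    ties = sum(t ** 3 - t for t in Counter(abs(d) for d in diffs).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    z = (w - n * (n + 1) / 4) / math.sqrt(var)
    return w, w_pos, w_neg, n, z


print(wilcoxon_signed_rank(pairs))  # about (105.5, 105.5, 194.5, 24, -1.273)
```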
<pre class="example">
DROP TABLE IF EXISTS test_wsr;
CREATE TABLE test_wsr (
    x DOUBLE PRECISION,
    y DOUBLE PRECISION
);
COPY test_wsr (x, y) FROM stdin DELIMITER '|';
0.32|0.39
0.4|0.47
0.11|0.11
0.47|0.43
0.32|0.42
0.35|0.3
0.32|0.43
0.63|0.98
0.5|0.86
0.6|0.79
0.38|0.33
0.46|0.45
0.2|0.22
0.31|0.3
0.62|0.6
0.52|0.53
0.77|0.85
0.23|0.21
0.3|0.33
0.7|0.57
0.41|0.43
0.53|0.49
0.19|0.2
0.31|0.35
0.48|0.4
\.

SELECT (madlib.wsr_test(
    x - y,
    2 * 2^(-52) * greatest(x,y)
    ORDER BY abs(x - y)
)).*
FROM test_wsr;
</pre> <pre class="result">
  statistic | rank_sum_pos | rank_sum_neg | num |    z_statistic    | p_value_one_sided | p_value_two_sided
 -----------+--------------+--------------+-----+-------------------+-------------------+-------------------
      105.5 |        105.5 |        194.5 |  24 | -1.27318365656729 | 0.898523560667509 | 0.202952878664983
</pre><p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
<p>[1] M. Hollander, D. Wolfe: <em>Nonparametric Statistical Methods</em>, 2nd edition, Wiley, 1999</p>
<p>[2] E. Lehmann, J. Romano: <em>Testing Statistical Hypotheses</em>, 3rd edition, Springer, 2005</p>
<p>[3] M. Stephens: <em>Use of the Kolmogorov-Smirnov, Cramer-Von Mises and related statistics without extensive tables</em>, Journal of the Royal Statistical Society, Series B (Methodological), pp. 115-122, 1970</p>
<p>[4] Wikipedia: Mann–Whitney U test calculation, <a href="http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations">http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations</a></p>
<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
<p>File <a class="el" href="hypothesis__tests_8sql__in.html" title="SQL functions for statistical hypothesis tests. ">hypothesis_tests.sql_in</a> documenting the SQL functions. </p>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
  <ul>
    <li class="footer">Generated on Thu Feb 23 2023 19:26:41 for MADlib by
    <a href="http://www.doxygen.org/index.html">
    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
  </ul>
</div>
</body>
</html>
