
 <!-- HTML header for doxygen 1.8.4-->
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
 <meta http-equiv="X-UA-Compatible" content="IE=9"/>
 <meta name="generator" content="Doxygen 1.8.13"/>
 <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
 <title>MADlib: Hypothesis Tests</title>
 <link href="tabs.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="jquery.js"></script>
 <script type="text/javascript" src="dynsections.js"></script>
 <link href="navtree.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="resize.js"></script>
 <script type="text/javascript" src="navtreedata.js"></script>
 <script type="text/javascript" src="navtree.js"></script>
 <script type="text/javascript">
   $(document).ready(initResizable);
 </script>
 <link href="search/search.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="search/searchdata.js"></script>
 <script type="text/javascript" src="search/search.js"></script>
 <script type="text/javascript">
   $(document).ready(function() { init_search(); });
 </script>
 <!-- hack in the navigation tree -->
 <script type="text/javascript" src="eigen_navtree_hacks.js"></script>
 <link href="doxygen.css" rel="stylesheet" type="text/css" />
 <link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
 <!-- google analytics -->
 <script>
   (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
   (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
   m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
   })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
   ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
   ga('send', 'pageview');
 </script>
 </head>
 <body>
 <div id="top"><!-- do not remove this div, it is closed by doxygen! -->
 <div id="titlearea">
 <table cellspacing="0" cellpadding="0">
  <tbody>
  <tr style="height: 56px;">
   <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
    <span id="projectnumber">1.11</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
    <td>        <div id="MSearchBox" class="MSearchBoxInactive">
         <span class="left">
           <img id="MSearchSelect" src="search/mag_sel.png"
                onmouseover="return searchBox.OnSearchSelectShow()"
                onmouseout="return searchBox.OnSearchSelectHide()"
                alt=""/>
           <input type="text" id="MSearchField" value="Search" accesskey="S"
                onfocus="searchBox.OnSearchFieldFocus(true)"
                onblur="searchBox.OnSearchFieldFocus(false)"
                onkeyup="searchBox.OnSearchFieldChange(event)"/>
           </span><span class="right">
             <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
           </span>
         </div>
 </td>
  </tr>
  </tbody>
 </table>
 </div>
 <!-- end header part -->
 <!-- Generated by Doxygen 1.8.13 -->
 <script type="text/javascript">
 var searchBox = new SearchBox("searchBox", "search",false,'Search');
 </script>
 </div><!-- top -->
 <div id="side-nav" class="ui-resizable side-nav-resizable">
   <div id="nav-tree">
     <div id="nav-tree-contents">
       <div id="nav-sync" class="sync"></div>
     </div>
   </div>
   <div id="splitbar" style="-moz-user-select:none;"
        class="ui-resizable-handle">
   </div>
 </div>
 <script type="text/javascript">
 $(document).ready(function(){initNavTree('group__grp__stats__tests.html','');});
 </script>
 <div id="doc-content">
 <!-- window showing the filter options -->
 <div id="MSearchSelectWindow"
      onmouseover="return searchBox.OnSearchSelectShow()"
      onmouseout="return searchBox.OnSearchSelectHide()"
      onkeydown="return searchBox.OnSearchSelectKey(event)">
 </div>

 <!-- iframe showing the search results (closed by default) -->
 <div id="MSearchResultsWindow">
 <iframe src="javascript:void(0)" frameborder="0"
         name="MSearchResults" id="MSearchResults">
 </iframe>
 </div>

 <div class="header">
   <div class="headertitle">
 <div class="title">Hypothesis Tests<div class="ingroups"><a class="el" href="group__grp__stats.html">Statistics</a> &raquo; <a class="el" href="group__grp__inf__stats.html">Inferential Statistics</a></div></div>  </div>
 </div><!--header-->
 <div class="contents">
 <div class="toc"><b>Contents</b> <ul>
 <li>
 <a href="#input">Input</a> </li>
 <li>
 <a href="#usage">Usage</a> </li>
 <li>
 <a href="#examples">Examples</a> </li>
 <li>
 <a href="#literature">Literature</a> </li>
 <li>
 <a href="#related">Related Topics</a> </li>
 </ul>
 </div><p>Hypothesis tests are used to decide whether observed data provide enough evidence to reject a <em>null hypothesis</em> <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> about the distribution of random variables, given realizations of these random variables. Since in general it is not possible to make statements with certainty, one is interested in the probability <img class="formulaInl" alt="$ p $" src="form_111.png"/> of seeing random variates at least as extreme as the ones observed, assuming that <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is true. If this probability <img class="formulaInl" alt="$ p $" src="form_111.png"/> is smaller than a <em>significance level</em> chosen in advance (commonly 0.05), <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is rejected by the test. Falsifying <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is the canonical goal when employing a hypothesis test. That is, hypothesis tests are typically used to substantiate that the <em>alternative hypothesis</em> <img class="formulaInl" alt="$ H_1 $" src="form_398.png"/> is true instead.</p>
 <p>Hypothesis tests may be divided into parametric and non-parametric tests. A parametric test assumes certain distributions and makes inferences about parameters of the distributions (e.g., the mean of a normal distribution). Formally, there is a given domain of possible parameters <img class="formulaInl" alt="$ \Gamma $" src="form_399.png"/> and the null hypothesis <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is the event that the true parameter <img class="formulaInl" alt="$ \gamma_0 \in \Gamma_0 $" src="form_400.png"/>, where <img class="formulaInl" alt="$ \Gamma_0 \subsetneq \Gamma $" src="form_401.png"/>. Non-parametric tests, on the other hand, do not assume any particular distribution of the sample (e.g., a non-parametric test may simply test if two distributions are similar).</p>
 <p>The first step of a hypothesis test is to compute a <em>test statistic</em>, which is a function of the random variates, i.e., a random variate itself. A hypothesis test relies on the distribution of the test statistic being (approximately) known. Now, the <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value is the probability of seeing a test statistic at least as extreme as the one observed, assuming that <img class="formulaInl" alt="$ H_0 $" src="form_397.png"/> is true. In a case where the null hypothesis corresponds to a family of distributions (e.g., in a parametric test where <img class="formulaInl" alt="$ \Gamma_0 $" src="form_402.png"/> is not a singleton set), the <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value is the supremum, over all possible distributions according to the null hypothesis, of these probabilities.</p>
 <dl class="section note"><dt>Note</dt><dd>Please refer to <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a> for additional technical information on the MADlib implementation of hypothesis tests, and for detailed function signatures for all tests.</dd></dl>
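 <p>The decision rule described above can be sketched in SQL. This is a minimal illustration rather than a prescribed workflow: the inline sample values, the hypothesized mean of 20 and the 0.05 threshold are assumptions chosen for the example, while <code>madlib.t_test_one</code> and its <code>p_value_two_sided</code> output column are the ones used in the examples section of this page.</p>
 <pre class="example">
 -- Sketch: reject H0 (mean = 20) when the two-sided p-value falls below
 -- a significance level chosen in advance (0.05 here, an assumption).
 SELECT
     statistic,
     p_value_two_sided,
     p_value_two_sided &lt; 0.05 AS reject_h0
 FROM (
     SELECT (madlib.t_test_one(value - 20)).*
     FROM (
         -- Illustrative sample values only
         VALUES (18::DOUBLE PRECISION), (15), (18), (16), (17), (15), (14), (14)
     ) AS source(value)
 ) AS t;
 </pre>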
 <p><a class="anchor" id="input"></a></p><dl class="section user"><dt>Input</dt><dd></dd></dl>
 <p>Input data is assumed to be normalized with all values stored row-wise. In general, the following inputs are expected.</p>
 <p><b>One-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
     ...
     <em>value</em> DOUBLE PRECISION
     ...
 )</pre><p><b>Two-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
     ...
     <em>first</em> BOOLEAN,
     <em>value</em> DOUBLE PRECISION
     ...
 )</pre><p> The <code>first</code> column indicates whether a value is from the first sample (if <code>TRUE</code>) or the second sample (if <code>FALSE</code>).</p>
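 <p>Because <code>first</code> is an ordinary <code>BOOLEAN</code> argument, it does not have to be a stored column; it can be computed per row by a Boolean expression. A minimal sketch, assuming a hypothetical table <code>cars(origin TEXT, mpg DOUBLE PRECISION)</code> that is not part of this page's examples, and using one of the two-sample tests listed under Usage:</p>
 <pre class="example">
 -- Hypothetical table: cars(origin TEXT, mpg DOUBLE PRECISION)
 -- Rows with origin = 'US' form the first sample, rows with
 -- origin = 'Japan' the second; all other rows are filtered out.
 SELECT (madlib.t_test_two_pooled(origin = 'US', mpg)).*
 FROM cars
 WHERE origin IN ('US', 'Japan');
 </pre>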
 <p><b>Many-sample tests</b> expect the following form: </p><pre>{TABLE|VIEW} <em>source</em> (
     ...
     <em>group</em> INTEGER,
     <em>value</em> DOUBLE PRECISION
     ...
 )</pre><p><a class="anchor" id="usage"></a></p><dl class="section user"><dt>Usage</dt><dd></dd></dl>
 <p>All tests are implemented as aggregate functions. The non-parametric (rank-based) tests are implemented as ordered aggregate functions and thus require an <code>ORDER BY</code> clause. In the following, the simplest forms of usage are given. Specific function signatures, as described in <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a>, may require more arguments or a different <code>ORDER BY</code> clause.</p>
 <ul>
 <li>Run a parametric one-sample test: <pre>SELECT <em>test</em>(<em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
 <li><code>t_test_one</code> (one-sample or dependent paired Student's t-test)</li>
 <li><code>chi2_gof_test</code> (Pearson's chi-squared goodness of fit test, also used for chi-squared independence test as shown in example section below)</li>
 </ul>
 </li>
 <li>Run a parametric two-sample/multi-sample test: <pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em>) FROM <em>source</em></pre> where '<em>test</em>' can be one of<ul>
 <li><code>f_test</code> (Fisher F-test)</li>
 <li><code>t_test_two_pooled</code> (two-sample pooled Student's t-test, i.e., equal variances)</li>
 <li><code>t_test_two_unpooled</code> (two-sample unpooled t-test, i.e., unequal variances, also known as Welch's t-test)</li>
 <li><code>one_way_anova</code> (one-way analysis of variance, multi-sample)</li>
 </ul>
 </li>
 <li><p class="startli">Run a non-parametric two-sample/multi-sample test: </p><pre>SELECT <em>test</em>(<em>first/group</em>, <em>value</em> ORDER BY <em>value</em>) FROM <em>source</em></pre><p> where '<em>test</em>' can be one of</p><ul>
 <li><code>ks_test</code> (Kolmogorov-Smirnov test)</li>
 <li><code>mw_test</code> (Mann-Whitney test)</li>
 <li><code>wsr_test</code> (Wilcoxon signed-rank test for paired samples; its call form differs from the one above, see the example section below)</li>
 </ul>
 <p class="startli"><b>Note on non-parametric tests:</b> The Kolmogorov-Smirnov two-sample test is based on asymptotic theory. The p-value is obtained by comparing the test statistic with the Kolmogorov distribution. The p-value is also adjusted for data with heavy-tailed distributions, which may give results that differ from those of R's <code>ks.test</code> function. See [3] for a detailed explanation. The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. There are two possible definitions for the statistic; MADlib outputs the minimum of the two and uses it for significance testing. This might give different results for both <code>mw_test</code> and <code>wsr_test</code> compared to statistical functions in other popular packages (like R's <code>wilcox.test</code> function). See [4] for a detailed explanation.</p>
 </li>
 </ul>
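 <p>Two points from this section can be illustrated together: <code>t_test_one</code> doubles as a dependent paired t-test when it is given per-pair differences, and, since the tests are aggregates, they can be combined with <code>GROUP BY</code> like any other aggregate. The sketch below assumes a hypothetical table <code>paired_scores</code>; its name, columns and values are made up for illustration.</p>
 <pre class="example">
 -- Hypothetical paired measurements (illustrative values only)
 DROP TABLE IF EXISTS paired_scores;
 CREATE TABLE paired_scores (
     cohort INTEGER,
     before_score DOUBLE PRECISION,
     after_score DOUBLE PRECISION
 );
 INSERT INTO paired_scores VALUES
     (1, 12.1, 13.0),
     (1, 11.4, 11.9),
     (1, 13.2, 12.8),
     (1, 10.9, 12.2),
     (2, 14.0, 14.3),
     (2, 12.7, 13.9),
     (2, 13.5, 13.1),
     (2, 11.8, 12.6);
 -- Paired t-test over all rows: H0 is a mean difference of zero
 SELECT (madlib.t_test_one(after_score - before_score)).*
 FROM paired_scores;
 -- The same test computed separately for each cohort
 SELECT cohort, (madlib.t_test_one(after_score - before_score)).*
 FROM paired_scores
 GROUP BY cohort;
 </pre>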
 <p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
 <ul>
 <li><b>One-sample and two-sample t-test</b> (data is subset of mpg data from <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm">NIST/SEMATECH</a>)</li>
 </ul>
 <pre class="example">
 -- Load data
 DROP TABLE IF EXISTS auto83b;
 CREATE TABLE auto83b (
     id SERIAL,
     mpg_us DOUBLE PRECISION,
     mpg_j DOUBLE PRECISION
 );
 COPY auto83b (mpg_us, mpg_j) FROM stdin DELIMITER '|';
 18|24
 15|27
 18|27
 16|25
 17|31
 15|35
 14|24
 14|19
 21|31
 10|32
 10|24
 11|26
 9| 9
 \N|32
 \N|37
 \N|38
 \N|34
 \N|34
 \N|32
 \N|33
 \N|32
 \N|25
 \N|24
 \N|37
 13|\N
 12|\N
 18|\N
 21|\N
 19|\N
 21|\N
 15|\N
 16|\N
 15|\N
 11|\N
 20|\N
 21|\N
 19|\N
 15|\N
 \.
 </pre><pre class="example">
 -- Create table for one sample tests
 DROP TABLE IF EXISTS auto83b_one_sample;
 CREATE TABLE auto83b_one_sample AS
     SELECT mpg_us AS mpg
     FROM auto83b
     WHERE mpg_us is not NULL;
 -- Print table
 SELECT * FROM auto83b_one_sample;
 </pre><pre class="result">
 mpg
   18
   15
   18
   16
   17
   15
   14
   14
   21
   10
   10
   11
    9
   13
   12
   18
   21
   19
   21
   15
   16
   15
   11
   20
   21
   19
   15
 (27 rows)
 </pre> <pre class="example">
 -- Create table for two sample tests
 DROP TABLE IF EXISTS auto83b_two_sample;
 CREATE TABLE auto83b_two_sample AS
     SELECT TRUE AS is_us, mpg_us AS mpg
     FROM auto83b
     WHERE mpg_us IS NOT NULL
     UNION ALL
     SELECT FALSE, mpg_j
     FROM auto83b
     WHERE mpg_j IS NOT NULL;
 -- Print table
 SELECT * FROM auto83b_two_sample;
 </pre> <pre class="result">
  is_us | mpg
 -------+-----
  t     |  18
  t     |  15
  t     |  18
  t     |  16
  t     |  17
  t     |  15
  t     |  14
  t     |  14
  t     |  21
  t     |  10
  t     |  10
  t     |  11
  t     |   9
  t     |  13
  t     |  12
  t     |  18
  t     |  21
  t     |  19
  t     |  21
  t     |  15
  t     |  16
  t     |  15
  t     |  11
  t     |  20
  t     |  21
  t     |  19
  t     |  15
  f     |  24
  f     |  27
  f     |  27
  f     |  25
  f     |  31
  f     |  35
  f     |  24
  f     |  19
  f     |  31
  f     |  32
  f     |  24
  f     |  26
  f     |   9
  f     |  32
  f     |  37
  f     |  38
  f     |  34
  f     |  34
  f     |  32
  f     |  33
  f     |  32
  f     |  25
  f     |  24
  f     |  37
 (51 rows)
 </pre> <pre class="example">
 -- One sample tests
 SELECT (madlib.t_test_one(mpg - 20)).* FROM auto83b_one_sample;  -- null hypothesis (mean = 20) is rejected
 </pre><pre class="result">
      statistic     | df | p_value_one_sided |  p_value_two_sided
  ------------------+----+-------------------+----------------------
   -6.0532478722666 | 26 | 0.999998926789141 | 2.14642171769697e-06
  </pre><pre class="example">
 SELECT (madlib.t_test_one(mpg - 15.7)).* FROM auto83b_one_sample;  -- null hypothesis (mean = 15.7) is not rejected
 </pre><pre class="result">
        statistic      | df | p_value_one_sided | p_value_two_sided
  ---------------------+----+-------------------+-------------------
   0.00521831713126531 | 26 | 0.497938118950661 | 0.995876237901321
 </pre><pre class="example">
 -- Two sample tests
 SELECT (madlib.t_test_two_pooled(is_us, mpg)).* FROM auto83b_two_sample;
 </pre> <pre class="result">
      statistic     | df | p_value_one_sided |  p_value_two_sided
  -------------------+----+-------------------+----------------------
   -8.89342267075968 | 49 | 0.999999999995748 | 8.50408632402377e-12
  </pre><pre class="example">
 SELECT (madlib.t_test_two_unpooled(is_us, mpg)).* FROM auto83b_two_sample;
 </pre><pre class="result">
       statistic     |        df        | p_value_one_sided |  p_value_two_sided
  -------------------+------------------+-------------------+----------------------
   -8.61746388524314 | 35.1283818346179 | 0.999999999821218 | 3.57563867403599e-10
 </pre><ul>
 <li><b>F-Test</b> (Uses same data as above t-test)</li>
 </ul>
 <pre class="example">
 SELECT (madlib.f_test(is_us, mpg)).* FROM auto83b_two_sample;
 -- Test result indicates that the two distributions have different variances
 </pre> <pre class="result">
       statistic     | df1 | df2 | p_value_one_sided |  p_value_two_sided
  -------------------+-----+-----+-------------------+---------------------
   0.311786921089247 |  26 |  23 | 0.997559863672441 | 0.00488027265511803
 </pre><ul>
 <li><b>Chi-squared goodness-of-fit test</b> (<a href="http://www.statsdirect.com/help/default.htm#nonparametric_methods/chisq_goodness_fit.htm">Data source</a>)</li>
 </ul>
 <pre class="example">
 CREATE TABLE chi2_test_blood_group (
     id SERIAL,
     blood_group VARCHAR,
     observed BIGINT,
     expected DOUBLE PRECISION
 );
 INSERT INTO chi2_test_blood_group(blood_group, observed, expected) VALUES
     ('O', 67, 82.28),
     ('A', 83, 84.15),
     ('B', 29, 14.96),
     ('AB', 8, 5.61);
 SELECT (madlib.chi2_gof_test(observed, expected)).* FROM chi2_test_blood_group;
 </pre> <pre class="result">
      statistic     |       p_value        | df |       phi        | contingency_coef
  ------------------+----------------------+----+------------------+-------------------
   17.0481013341976 | 0.000690824622923826 |  3 | 2.06446732440826 | 0.899977280680593
  </pre><ul>
 <li><b>Chi-squared independence test</b> (<a href="http://itl.nist.gov/div898/software/dataplot/refman1/auxillar/chistest.htm">Data source</a>)</li>
 </ul>
 <p>The Chi-squared independence test uses the Chi-squared goodness-of-fit function, as shown in the example below. The expected value needs to be computed and passed to the goodness-of-fit function. For each element of the input matrix, the expected value passed to MADlib is computed as <em>sum of its row * sum of its column</em>. For example, the expected value for element (2,1) would be <em>sum of row 2 * sum of column 1</em>.</p>
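 <p>To make the arithmetic concrete for the matrix used in the example that follows: row 2 is (15, 54, 14, 10) and column 1 is (5, 15, 20, 68), so the value passed for element (2,1) is 93 * 108 = 10044. The textbook expected count would additionally divide by the grand total of 592, giving about 16.97. The example passes the unscaled products, which suggests that the function rescales the expected values to the observed total; treat this as an assumption and check <a class="el" href="hypothesis__tests_8sql__in.html">hypothesis_tests.sql_in</a> to confirm.</p>
 <pre class="example">
 -- Worked arithmetic for element (2,1); plain SQL, no MADlib call
 SELECT
     (15 + 54 + 14 + 10) * (5 + 15 + 20 + 68) AS row_sum_times_col_sum,
     (15 + 54 + 14 + 10) * (5 + 15 + 20 + 68) / 592.0 AS textbook_expected;
 </pre>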
 <pre class="example">
 CREATE TABLE chi2_test_friendly (
     id_x SERIAL,
     values INTEGER[]
 );
 INSERT INTO chi2_test_friendly(values) VALUES
     (array[5, 29, 14, 16]),
     (array[15, 54, 14, 10]),
     (array[20, 84, 17, 94]),
     (array[68, 119, 26, 7]);</pre><pre class="example">-- The test expects unpivoted input (one row per matrix cell), so unpivot the table
 CREATE TABLE chi2_test_friendly_unpivoted AS
 SELECT id_x, id_y, values[id_y] AS observed
 FROM
     chi2_test_friendly,
     generate_series(1,4) AS id_y;</pre><pre class="example">-- Compute Chi-squared independence statistic, by calculating expected value in the SQL and calling the goodness-of-fit function
 SELECT (madlib.chi2_gof_test(observed, expected, deg_freedom)).*
 FROM (
     -- Compute expected values and degrees of freedom
     SELECT
         observed,
         sum(observed) OVER (PARTITION BY id_x)::DOUBLE PRECISION *
         sum(observed) OVER (PARTITION BY id_y) AS expected
     FROM chi2_test_friendly_unpivoted
 ) p, (
     SELECT
         (count(DISTINCT id_x) - 1) * (count(DISTINCT id_y) - 1) AS deg_freedom
     FROM chi2_test_friendly_unpivoted
 ) q;
 </pre> <pre class="result">
      statistic     |       p_value        | df |       phi        | contingency_coef
  ------------------+----------------------+----+------------------+-------------------
   138.289841626008 | 2.32528678709871e-25 |  9 | 2.93991753313346 | 0.946730727519112
  </pre><ul>
 <li><b>ANOVA test</b> (<a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm">Data source</a>)</li>
 </ul>
 <pre class="example">
 CREATE TABLE nist_anova_test (
     id SERIAL,
     resistance FLOAT8[]
 );
 INSERT INTO nist_anova_test(resistance) VALUES
     (array[6.9,8.3,8.0]),
     (array[5.4,6.8,10.5]),
     (array[5.8,7.8,8.1]),
     (array[4.6,9.2,6.9]),
     (array[4.0,6.5,9.3]);</pre><pre class="example">SELECT (madlib.one_way_anova(level, value)).* FROM (
     SELECT level, resistance[level] AS value
     FROM
         nist_anova_test, (SELECT * FROM generate_series(1,3) level) q1
 ) q2;
 </pre> <pre class="result">
   sum_squares_between | sum_squares_within | df_between | df_within | mean_squares_between | mean_squares_within |    statistic     |      p_value
  ---------------------+--------------------+------------+-----------+----------------------+---------------------+------------------+--------------------
      27.8973333333333 |             17.452 |          2 |        12 |     13.9486666666667 |    1.45433333333333 | 9.59110703644281 | 0.0032482226008593
 </pre><ul>
 <li><b>Kolmogorov-Smirnov test</b> (<a href="http://www.physics.csbsju.edu/stats/KS-test.html">Data source</a>)</li>
 </ul>
 <pre class="example">
 CREATE TABLE ks_sample_1 AS
 SELECT
     TRUE AS first,
     unnest(ARRAY[0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]) AS value
 UNION ALL
 SELECT
     FALSE,
     unnest(ARRAY[-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]);</pre><pre class="example">SELECT (madlib.ks_test(first, value,
     (SELECT count(value) FROM ks_sample_1 WHERE first),      -- size of the first sample
     (SELECT count(value) FROM ks_sample_1 WHERE NOT first)   -- size of the second sample
     ORDER BY value)).*
 FROM ks_sample_1;
 </pre> <pre class="result">
   statistic |   k_statistic   |      p_value
  -----------+-----------------+--------------------
        0.45 | 1.4926782214936 | 0.0232132758544496
 </pre><ul>
 <li><b>Mann-Whitney test</b> (uses same data as t-test)</li>
 </ul>
 <pre class="example">
 SELECT (madlib.mw_test(is_us, mpg ORDER BY mpg)).* FROM auto83b_two_sample;
 -- Note first parameter above is BOOLEAN
 </pre> <pre class="result">
       statistic     | u_statistic | p_value_one_sided |  p_value_two_sided
  -------------------+-------------+-------------------+----------------------
   -5.50097925755249 |        32.5 | 0.999999981115618 | 3.77687645883758e-08
 </pre><ul>
 <li><b>Wilcoxon signed-rank test</b></li>
 </ul>
 <pre class="example">
 DROP TABLE IF EXISTS test_wsr;
 CREATE TABLE test_wsr (
     x DOUBLE PRECISION,
     y DOUBLE PRECISION
 );
 COPY test_wsr (x, y) FROM stdin DELIMITER '|';
 0.32|0.39
 0.4|0.47
 0.11|0.11
 0.47|0.43
 0.32|0.42
 0.35|0.3
 0.32|0.43
 0.63|0.98
 0.5|0.86
 0.6|0.79
 0.38|0.33
 0.46|0.45
 0.2|0.22
 0.31|0.3
 0.62|0.6
 0.52|0.53
 0.77|0.85
 0.23|0.21
 0.3|0.33
 0.7|0.57
 0.41|0.43
 0.53|0.49
 0.19|0.2
 0.31|0.35
 0.48|0.4
 \.

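 -- First argument: the paired differences. Second argument: the precision of
 -- each difference (here a small multiple of machine epsilon), presumably
 -- used to decide when two absolute differences count as tied.
 SELECT (madlib.wsr_test(
     x - y,
     2 * 2^(-52) * greatest(x,y)
     ORDER BY abs(x - y)
 )).*
 FROM test_wsr;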
 </pre> <pre class="result">
   statistic | rank_sum_pos | rank_sum_neg | num |    z_statistic    | p_value_one_sided | p_value_two_sided
  -----------+--------------+--------------+-----+-------------------+-------------------+-------------------
       105.5 |        105.5 |        194.5 |  24 | -1.27318365656729 | 0.898523560667509 | 0.202952878664983
 </pre><p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
 <p>[1] M. Hollander, D. Wolfe: <em>Nonparametric Statistical Methods</em>, 2nd edition, Wiley, 1999</p>
 <p>[2] E. Lehmann, J. Romano: <em>Testing Statistical Hypotheses</em>, 3rd edition, Springer, 2005</p>
 <p>[3] M. Stephens: <em>Use of the Kolmogorov-Smirnov, Cramer-Von Mises and related statistics without extensive tables</em>, Journal of the Royal Statistical Society. Series B (Methodological) (1970): 115-122.</p>
 <p>[4] Wikipedia: Mann–Whitney U test calculation, <a href="http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations">http://en.wikipedia.org/wiki/Mann-Whitney_test#Calculations</a></p>
 <p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
 <p>File <a class="el" href="hypothesis__tests_8sql__in.html" title="SQL functions for statistical hypothesis tests. ">hypothesis_tests.sql_in</a> documenting the SQL functions. </p>
 </div><!-- contents -->
 </div><!-- doc-content -->
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
     <li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
   </ul>
 </div>
 </body>
 </html>