| <?xml version="1.0"?> |
| |
| <!-- |
| Copyright 2003-2005 The Apache Software Foundation |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?> |
| <!-- $Revision$ $Date$ --> |
| <document url="stat.html"> |
| <properties> |
| <title>The Commons Math User Guide - Statistics</title> |
| </properties> |
| <body> |
| <section name="1 Statistics"> |
| <subsection name="1.1 Overview" href="overview"> |
| <p> |
| The statistics package provides frameworks and implementations for |
| basic Descriptive statistics, frequency distributions, bivariate regression, |
| and t- and chi-square test statistics. |
| </p> |
| <p> |
| <a href="#1.2 Descriptive statistics">Descriptive statistics</a><br></br> |
| <a href="#1.3 Frequency distributions">Frequency distributions</a><br></br> |
| <a href="#1.4 Simple regression">Simple Regression</a><br></br> |
| <a href="#1.5 Statistical tests">Statistical Tests</a><br></br> |
| </p> |
| </subsection> |
| <subsection name="1.2 Descriptive statistics" href="univariate"> |
| <p> |
| The stat package includes a framework and default implementations for |
| the following Descriptive statistics: |
| <ul> |
| <li>arithmetic and geometric means</li> |
| <li>variance and standard deviation</li> |
| <li>sum, product, log sum, sum of squared values</li> |
| <li>minimum, maximum, median, and percentiles</li> |
| <li>skewness and kurtosis</li> |
| <li>first, second, third and fourth moments</li> |
| </ul> |
| </p> |
| <p> |
| With the exception of percentiles and the median, all of these |
| statistics can be computed without maintaining the full list of input |
| data values in memory. The stat package provides interfaces and |
| implementations that do not require value storage as well as |
| implementations that operate on arrays of stored values. |
| </p> |
| <p> |
| The top level interface is |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/UnivariateStatistic.html"> |
| org.apache.commons.math.stat.descriptive.UnivariateStatistic.</a> |
| This interface, implemented by all statistics, consists of |
| <code>evaluate()</code> methods that take double[] arrays as arguments |
| and return the value of the statistic. This interface is extended by |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/StorelessUnivariateStatistic.html"> |
| StorelessUnivariateStatistic</a>, which adds <code>increment(),</code> |
| <code>getResult()</code> and associated methods to support |
| "storageless" implementations that maintain counters, sums or other |
| state information as values are added using the <code>increment()</code> |
| method. |
| </p> |
| <p> |
| Abstract implementations of the top level interfaces are provided in |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractUnivariateStatistic.html"> |
| AbstractUnivariateStatistic</a> and |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractStorelessUnivariateStatistic.html"> |
| AbstractStorelessUnivariateStatistic</a> respectively. |
| </p> |
| <p> |
| Each statistic is implemented as a separate class, in one of the |
| subpackages (moment, rank, summary) and each extends one of the abstract |
| classes above (depending on whether or not value storage is required to |
| compute the statistic). There are several ways to instantiate and use statistics. |
| Statistics can be instantiated and used directly, but it is generally more convenient |
| (and efficient) to access them using the provided aggregates, |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html"> |
| DescriptiveStatistics</a> and |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html"> |
| SummaryStatistics.</a> |
| </p> |
| <p> |
| <code>DescriptiveStatistics</code> maintains the input data in memory |
| and has the capability of producing "rolling" statistics computed from a |
| "window" consisting of the most recently added values. |
| </p> |
| <p> |
| <code>SummaryStatisics</code> does not store the input data values |
| in memory, so the statistics included in this aggregate are limited to those |
| that can be computed in one pass through the data without access to |
| the full array of values. |
| </p> |
| <p> |
| <table> |
| <tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th> |
| <th>"Rolling" capability?</th></tr><tr><td> |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html"> |
| DescriptiveStatistics</a></td><td>min, max, mean, geometric mean, n, |
| sum, sum of squares, standard deviation, variance, percentiles, skewness, |
| kurtosis, median</td><td>Yes</td><td>Yes</td></tr><tr><td> |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html"> |
| SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, |
| sum, sum of squares, standard deviation, variance</td><td>No</td><td>No</td></tr> |
| </table> |
| </p> |
| <p> |
| There is also a utility class, |
| <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html"> |
| StatUtils</a>, that provides static methods for computing statistics |
| directly from double[] arrays. |
| </p> |
| <p> |
| Here are some examples showing how to compute Descriptive statistics. |
| <dl> |
| <dt>Compute summary statistics for a list of double values</dt> |
| <br></br> |
| <dd>Using the <code>DescriptiveStatistics</code> aggregate |
| (values are stored in memory): |
| <source> |
| // Get a DescriptiveStatistics instance using factory method |
| DescriptiveStatistics stats = DescriptiveStatistics.newInstance(); |
| |
| // Add the data from the array |
| for( int i = 0; i < inputArray.length; i++) { |
| stats.addValue(inputArray[i]); |
| } |
| |
| // Compute some statistics |
| double mean = stats.getMean(); |
| double std = stats.getStandardDeviation(); |
| double median = stats.getMedian(); |
| </source> |
| </dd> |
| <dd>Using the <code>SummaryStatistics</code> aggregate (values are |
| <strong>not</strong> stored in memory): |
| <source> |
| // Get a SummaryStatistics instance using factory method |
| SummaryStatistics stats = SummaryStatistics.newInstance(); |
| |
| // Read data from an input stream, |
| // adding values and updating sums, counters, etc. |
| while (line != null) { |
| line = in.readLine(); |
| stats.addValue(Double.parseDouble(line.trim())); |
| } |
| in.close(); |
| |
| // Compute the statistics |
| double mean = stats.getMean(); |
| double std = stats.getStandardDeviation(); |
| //double median = stats.getMedian(); <-- NOT AVAILABLE |
| </source> |
| </dd> |
| <dd>Using the <code>StatUtils</code> utility class: |
| <source> |
| // Compute statistics directly from the array |
| // assume values is a double[] array |
| double mean = StatUtils.mean(values); |
| double std = StatUtils.variance(values); |
| double median = StatUtils.percentile(50); |
| |
| // Compute the mean of the first three values in the array |
| mean = StatuUtils.mean(values, 0, 3); |
| </source> |
| </dd> |
| <dt>Maintain a "rolling mean" of the most recent 100 values from |
| an input stream</dt> |
| <br></br> |
| <dd>Use a <code>DescriptiveStatistics</code> instance with |
| window size set to 100 |
| <source> |
| // Create a DescriptiveStats instance and set the window size to 100 |
| DescriptiveStatistics stats = DescriptiveStatistics.newInstance(); |
| stats.setWindowSize(100); |
| |
| // Read data from an input stream, |
| // displaying the mean of the most recent 100 observations |
| // after every 100 observations |
| long nLines = 0; |
| while (line != null) { |
| line = in.readLine(); |
| stats.addValue(Double.parseDouble(line.trim())); |
| if (nLines == 100) { |
| nLines = 0; |
| System.out.println(stats.getMean()); |
| } |
| } |
| in.close(); |
| </source> |
| </dd> |
| </dl> |
| </p> |
| </subsection> |
| <subsection name="1.3 Frequency distributions" href="frequency"> |
| <p> |
| <a href="../apidocs/org/apache/commons/math/stat/Frequency.html"> |
| org.apache.commons.math.stat.descriptive.Frequency</a> |
| provides a simple interface for maintaining counts and percentages of discrete |
| values. |
| </p> |
| <p> |
| Strings, integers, longs and chars are all supported as value types, |
| as well as instances of any class that implements <code>Comparable.</code> |
| The ordering of values used in computing cumulative frequencies is by |
| default the <i>natural ordering,</i> but this can be overriden by supplying a |
| <code>Comparator</code> to the constructor. Adding values that are not |
| comparable to those that have already been added results in an |
| <code>IllegalArgumentException.</code> |
| </p> |
| <p> |
| Here are some examples. |
| <dl> |
| <dt>Compute a frequency distribution based on integer values</dt> |
| <br></br> |
| <dd>Mixing integers, longs, Integers and Longs: |
| <source> |
| Frequency f = new Frequency(); |
| f.addValue(1); |
| f.addValue(new Integer(1)); |
| f.addValue(new Long(1)); |
| f.addValue(2); |
| f.addValue(new Integer(-1)); |
| System.out.prinltn(f.getCount(1)); // displays 3 |
| System.out.println(f.getCumPct(0)); // displays 0.2 |
| System.out.println(f.getPct(new Integer(1))); // displays 0.6 |
| System.out.println(f.getCumPct(-2)); // displays 0 |
| System.out.println(f.getCumPct(10)); // displays 1 |
| </source> |
| </dd> |
| <dt>Count string frequencies</dt> |
| <br></br> |
| <dd>Using case-sensitive comparison, alpha sort order (natural comparator): |
| <source> |
| Frequency f = new Frequency(); |
| f.addValue("one"); |
| f.addValue("One"); |
| f.addValue("oNe"); |
| f.addValue("Z"); |
| System.out.println(f.getCount("one")); // displays 1 |
| System.out.println(f.getCumPct("Z")); // displays 0.5 |
| System.out.println(f.getCumPct("Ot")); // displays 0.25 |
| </source> |
| </dd> |
| <dd>Using case-insensitive comparator: |
| <source> |
| Frequency f = new Frequency(String.CASE_INSENSITIVE_ORDER); |
| f.addValue("one"); |
| f.addValue("One"); |
| f.addValue("oNe"); |
| f.addValue("Z"); |
| System.out.println(f.getCount("one")); // displays 3 |
| System.out.println(f.getCumPct("z")); // displays 1 |
| </source> |
| </dd> |
| </dl> |
| </p> |
| </subsection> |
| <subsection name="1.4 Simple regression" href="regression"> |
| <p> |
| <a href="../apidocs/org/apache/commons/math/stat/regression/SimpleRegression.html"> |
| org.apache.commons.math.stat.regression.SimpleRegression</a> |
| provides ordinary least squares regression with one independent variable, |
| estimating the linear model: |
| </p> |
| <p> |
| <code> y = intercept + slope * x </code> |
| </p> |
| <p> |
| Standard errors for <code>intercept</code> and <code>slope</code> are |
| available as well as ANOVA, r-square and Pearson's r statistics. |
| </p> |
| <p> |
| Observations (x,y pairs) can be added to the model one at a time or they |
| can be provided in a 2-dimensional array. The observations are not stored |
| in memory, so there is no limit to the number of observations that can be |
| added to the model. |
| </p> |
| <p> |
| <strong>Usage Notes</strong>: <ul> |
| <li> When there are fewer than two observations in the model, or when |
| there is no variation in the x values (i.e. all x values are the same) |
| all statistics return <code>NaN</code>. At least two observations with |
| different x coordinates are requred to estimate a bivariate regression |
| model.</li> |
| <li> getters for the statistics always compute values based on the current |
| set of observations -- i.e., you can get statistics, then add more data |
| and get updated statistics without using a new instance. There is no |
| "compute" method that updates all statistics. Each of the getters performs |
| the necessary computations to return the requested statistic.</li> |
| </ul> |
| </p> |
| <p> |
| <strong>Implementation Notes</strong>: <ul> |
| <li> As observations are added to the model, the sum of x values, y values, |
| cross products (x times y), and squared deviations of x and y from their |
| respective means are updated using updating formulas defined in |
| "Algorithms for Computing the Sample Variance: Analysis and |
| Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. |
| 1983, American Statistician, vol. 37, pp. 242-247, referenced in |
| Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985. All regression |
| statistics are computed from these sums.</li> |
| <li> Inference statistics (confidence intervals, parameter significance levels) |
| are based on on the assumption that the observations included in the model are |
| drawn from a <a href="http://mathworld.wolfram.com/BivariateNormalDistribution.html"> |
| Bivariate Normal Distribution</a></li> |
| </ul> |
| </p> |
| <p> |
| Here are some examples. |
| <dl> |
| <dt>Estimate a model based on observations added one at a time</dt> |
| <br></br> |
| <dd>Instantiate a regression instance and add data points |
| <source> |
| regression = new SimpleRegression(); |
| regression.addData(1d, 2d); |
| // At this point, with only one observation, |
| // all regression statistics will return NaN |
| |
| regression.addData(3d, 3d); |
| // With only two observations, |
| // slope and intercept can be computed |
| // but inference statistics will return NaN |
| |
| regression.addData(3d, 3d); |
| // Now all statistics are defined. |
| </source> |
| </dd> |
| <dd>Compute some statistics based on observations added so far |
| <source> |
| System.out.println(regression.getIntercept()); |
| // displays intercept of regression line |
| |
| System.out.println(regression.getSlope()); |
| // displays slope of regression line |
| |
| System.out.println(regression.getSlopeStdErr()); |
| // displays slope standard error |
| </source> |
| </dd> |
| <dd>Use the regression model to predict the y value for a new x value |
| <source> |
| System.out.println(regression.predict(1.5d) |
| // displays predicted y value for x = 1.5 |
| </source> |
| More data points can be added and subsequent getXxx calls will incorporate |
| additional data in statistics. |
| </dd> |
| <dt>Estimate a model from a double[][] array of data points</dt> |
| <br></br> |
| <dd>Instantiate a regression object and load dataset |
| <source> |
| double[][] data = { { 1, 3 }, {2, 5 }, {3, 7 }, {4, 14 }, {5, 11 }}; |
| SimpleRegression regression = new SimpleRegression(); |
| regression.addData(data); |
| </source> |
| </dd> |
| <dd>Estimate regression model based on data |
| <source> |
| System.out.println(regression.getIntercept()); |
| // displays intercept of regression line |
| |
| System.out.println(regression.getSlope()); |
| // displays slope of regression line |
| |
| System.out.println(regression.getSlopeStdErr()); |
| // displays slope standard error |
| </source> |
| More data points -- even another double[][] array -- can be added and subsequent |
| getXxx calls will incorporate additional data in statistics. |
| </dd> |
| </dl> |
| </p> |
| </subsection> |
| <subsection name="1.5 Statistical tests" href="tests"> |
| <p> |
| The interfaces and implementations in the |
| <a href="../apidocs/org/apache/commons/math/stat/inference/"> |
| org.apache.commons.math.stat.inference</a> package provide |
| <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm"> |
| Student's t</a> and |
| <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm"> |
| Chi-Square</a> test statistics as well as |
| <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue"> |
| p-values</a> associated with <code>t-</code> and |
| <code>Chi-Square</code> tests. The interfaces are |
| <a href="../apidocs/org/apache/commons/math/stat/inference/TTest.html"> |
| TTest</a> and |
| <a href="../apidocs/org/apache/commons/math/stat/inference/ChiSquareTest.html"> |
| ChiSquareTest</a>, with |
| provided implementations |
| <a href="../apidocs/org/apache/commons/math/stat/inference/TTestImpl.html"> |
| TTestImpl</a> and |
| <a href="../apidocs/org/apache/commons/math/stat/inference/ChiSquareTestImpl.html"> |
| ChiSquareTestImpl</a>. |
| Abstract and default factories are provided, with configuration |
| optional using commons-discovery to specify the concrete factory. The |
| <a href="../apidocs/org/apache/commons/math/stat/inference/TestUtils.html"> |
| TestUtils</a> class provides static methods to get test instances or |
| to compute test statistics directly. The examples below all use the |
| static methods in <code>TestUtils</code> to execute tests. To get |
| test object instances, either use e.g., |
| <code>TestUtils.getTTest()</code> or use the factory directly, e.g., |
| <code>TestFactory.newInstance().createChiSquareTest()</code>. |
| </p> |
| <p> |
| <strong>Implementation Notes</strong> |
| <ul> |
| <li>Both one- and two-sample t-tests are supported. Two sample tests |
| can be either paired or unpaired and the unpaired two-sample tests can |
| be conducted under the assumption of equal subpopulation variances or |
| without this assumption. When equal variances is assumed, a pooled |
| variance estimate is used to compute the t-statistic and the degrees |
| of freedom used in the t-test equals the sum of the sample sizes minus 2. |
| When equal variances is not assumed, the t-statistic uses both sample |
| variances and the |
| <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/gifs/nu3.gif"> |
| Welch-Satterwaite approximation</a> is used to compute the degrees |
| of freedom. Methods to return t-statistics and p-values are provided in each |
| case, as well as boolean-valued methods to perform fixed significance |
| level tests. The names of methods or methods that assume equal |
| subpopulation variances always start with "homoscedastic." Test or |
| test-statistic methods that just start with "t" do not assume equal |
| variances. See the examples below and the API documentation for |
| more details.</li> |
| <li>The validity of the p-values returned by the t-test depends on the |
| assumptions of the parametric t-test procedure, as discussed |
| <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html"> |
| here</a></li> |
| <li>p-values returned by both t- and chi-square tests are exact, based |
| on numerical approximations to the t- and chi-square distributions in the |
| <code>distributions</code> package. </li> |
| <li>p-values returned by t-tests are for two-sided tests and the boolean-valued |
| methods supporting fixed significance level tests assume that the hypotheses |
| are two-sided. One sided tests can be performed by dividing returned p-values |
| (resp. critical values) by 2.</li> |
| <li>Degrees of freedom for chi-square tests are integral values, based on the |
| number of observed or expected counts (number of observed counts - 1) |
| for the goodness-of-fit tests and (number of columns -1) * (number of rows - 1) |
| for independence tests.</li> |
| </ul> |
| </p> |
| <p> |
| <strong>Examples:</strong> |
| <dl> |
| <dt><strong>One-sample <code>t</code> tests</strong></dt> |
| <br></br> |
| <dd>To compare the mean of a double[] array to a fixed value: |
| <source> |
| double[] observed = {1d, 2d, 3d}; |
| double mu = 2.5d; |
| System.out.println(TestUtils.t(mu, observed); |
| </source> |
| The code above will display the t-statisitic associated with a one-sample |
| t-test comparing the mean of the <code>observed</code> values against |
| <code>mu.</code> |
| </dd> |
| <dd>To compare the mean of a dataset described by a |
| <a href="../apidocs/org/apache/commons/math/stat/descriptive/StatisticalSummary.html"> |
| org.apache.commons.math.stat.descriptive.StatisticalSummary</a> to a fixed value: |
| <source> |
| double[] observed ={1d, 2d, 3d}; |
| double mu = 2.5d; |
| SummaryStatistics sampleStats = null; |
| sampleStats = SummaryStatistics.newInstance(); |
| for (int i = 0; i < observed.length; i++) { |
| sampleStats.addValue(observed[i]); |
| } |
| System.out.println(TestUtils.t(mu, observed); |
| </source> |
| </dd> |
| <dd>To compute the p-value associated with the null hypothesis that the mean |
| of a set of values equals a point estimate, against the two-sided alternative that |
| the mean is different from the target value: |
| <source> |
| double[] observed = {1d, 2d, 3d}; |
| double mu = 2.5d; |
| System.out.println(TestUtils.tTest(mu, observed); |
| </source> |
| The snippet above will display the p-value associated with the null |
| hypothesis that the mean of the population from which the |
| <code>observed</code> values are drawn equals <code>mu.</code> |
| </dd> |
| <dd>To perform the test using a fixed significance level, use: |
| <source> |
| TestUtils.tTest(mu, observed, alpha); |
| </source> |
| where <code>0 < alpha < 0.5</code> is the significance level of |
| the test. The boolean value returned will be <code>true</code> iff the |
| null hypothesis can be rejected with confidence <code>1 - alpha</code>. |
| To test, for example at the 95% level of confidence, use |
| <code>alpha = 0.05</code> |
| </dd> |
| <dt><strong>Two-Sample t-tests</strong></dt> |
| <br></br> |
| <dd><strong>Example 1:</strong> Paired test evaluating |
| the null hypothesis that the mean difference between corresponding |
| (paired) elements of the <code>double[]</code> arrays |
| <code>sample1</code> and <code>sample2</code> is zero. |
| <p> |
| To compute the t-statistic: |
| <source> |
| TestUtils.pairedT(sample1, sample2); |
| </source> |
| </p> |
| <p> |
| To compute the p-value: |
| <source> |
| TestUtils.pairedTTest(sample1, sample2); |
| </source> |
| </p> |
| <p> |
| To perform a fixed significance level test with alpha = .05: |
| <source> |
| TestUtils.pairedTTest(sample1, sample2, .05); |
| </source> |
| </p> |
| The last example will return <code>true</code> iff the p-value |
| returned by <code>TestUtils.pairedTTest(sample1, sample2)</code> |
| is less than <code>.05</code> |
| </dd> |
| <dd><strong>Example 2: </strong> unpaired, two-sided, two-sample t-test using |
| <code>StatisticalSummary</code> instances, without assuming that |
| subpopulation variances are equal. |
| <p> |
| First create the <code>StatisticalSummary</code> instances. Both |
| <code>DescriptiveStatistics</code> and <code>SummaryStatistics</code> |
| implement this interface. Assume that <code>summary1</code> and |
| <code>summary2</code> are <code>SummaryStatistics</code> instances, |
| each of which has had at least 2 values added to the (virtual) dataset that |
| it describes. The sample sizes do not have to be the same -- all that is required |
| is that both samples have at least 2 elements. |
| </p> |
| <p><strong>Note:</strong> The <code>SummaryStatistics</code> class does |
| not store the dataset that it describes in memory, but it does compute all |
| statistics necessary to perform t-tests, so this method can be used to |
| conduct t-tests with very large samples. One-sample tests can also be |
| performed this way. |
| (See <a href="#1.2 Descriptive statistics">Descriptive statistics</a> for details |
| on the <code>SummaryStatistics</code> class.) |
| </p> |
| <p> |
| To compute the t-statistic: |
| <source> |
| TestUtils.t(summary1, summary2); |
| </source> |
| </p> |
| <p> |
| To compute the p-value: |
| <source> |
| TestUtils.tTest(sample1, sample2); |
| </source> |
| </p> |
| <p> |
| To perform a fixed significance level test with alpha = .05: |
| <source> |
| TestUtils.tTest(sample1, sample2, .05); |
| </source> |
| </p> |
| <p> |
| In each case above, the test does not assume that the subpopulation |
| variances are equal. To perform the tests under this assumption, |
| replace "t" at the beginning of the method name with "homoscedasticT" |
| </p> |
| </dd> |
| <dt>Computing <code>chi-square</code> test statistics</dt> |
| <br></br> |
| <dd>To compute a chi-square statistic measuring the agreement between a |
| <code>long[]</code> array of observed counts and a <code>double[]</code> |
| array of expected counts, use: |
| <source> |
| long[] observed = {10, 9, 11}; |
| double[] expected = {10.1, 9.8, 10.3}; |
| System.out.println(TestUtils.chiSquare(expected, observed)); |
| </source> |
| the value displayed will be |
| <code>sum((expected[i] - observed[i])^2 / expected[i])</code> |
| </dd> |
| <dd> To get the p-value associated with the null hypothesis that |
| <code>observed</code> conforms to <code>expected</code> use: |
| <source> |
| TestUtils.chiSquareTest(expected, observed); |
| </source> |
| </dd> |
| <dd> To test the null hypothesis that <code>observed</code> conforms to |
| <code>expected</code> with <code>alpha</code> siginficance level |
| (equiv. <code>100 * (1-alpha)%</code> confidence) where <code> |
| 0 < alpha < 1 </code> use: |
| <source> |
| TestUtils.chiSquareTest(expected, observed, alpha); |
| </source> |
| The boolean value returned will be <code>true</code> iff the null hypothesis |
| can be rejected with confidence <code>1 - alpha</code>. |
| </dd> |
| <dd>To compute a chi-square statistic statistic associated with a |
| <a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc45.htm"> |
| chi-square test of independence</a> based on a two-dimensional (long[][]) |
| <code>counts</code> array viewed as a two-way table, use: |
| <source> |
| TestUtils.chiSquareTest(counts); |
| </source> |
| The rows of the 2-way table are |
| <code>count[0], ... , count[count.length - 1]. </code><br></br> |
| The chi-square statistic returned is |
| <code>sum((counts[i][j] - expected[i][j])^2/expected[i][j])</code> |
| where the sum is taken over all table entries and |
| <code>expected[i][j]</code> is the product of the row and column sums at |
| row <code>i</code>, column <code>j</code> divided by the total count. |
| </dd> |
| <dd>To compute the p-value associated with the null hypothesis that |
| the classifications represented by the counts in the columns of the input 2-way |
| table are independent of the rows, use: |
| <source> |
| TestUtils.chiSquareTest(counts); |
| </source> |
| </dd> |
| <dd>To perform a chi-square test of independence with <code>alpha</code> |
| siginficance level (equiv. <code>100 * (1-alpha)%</code> confidence) |
| where <code>0 < alpha < 1 </code> use: |
| <source> |
| TestUtils.chiSquareTest(counts, alpha); |
| </source> |
| The boolean value returned will be <code>true</code> iff the null |
| hypothesis can be rejected with confidence <code>1 - alpha</code>. |
| </dd> |
| </dl> |
| </p> |
| </subsection> |
| </section> |
| </body> |
| </document> |