| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> |
| <meta http-equiv="X-UA-Compatible" content="IE=9"/> |
| <meta name="generator" content="Doxygen 1.8.4"/> |
| <title>MADlib: Huber White Variance</title> |
| <link href="tabs.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="jquery.js"></script> |
| <script type="text/javascript" src="dynsections.js"></script> |
| <link href="navtree.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="resize.js"></script> |
| <script type="text/javascript" src="navtree.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(initResizable); |
| $(window).load(resizeHeight); |
| </script> |
| <link href="search/search.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="search/search.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(function() { searchBox.OnSelectItem(0); }); |
| </script> |
| <script type="text/x-mathjax-config"> |
| MathJax.Hub.Config({ |
| extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], |
| jax: ["input/TeX","output/HTML-CSS"], |
| }); |
| </script><script src="../mathjax/MathJax.js"></script> |
| <link href="doxygen.css" rel="stylesheet" type="text/css" /> |
| </head> |
| <body> |
| <div id="top"><!-- do not remove this div, it is closed by doxygen! --> |
| <div id="titlearea"> |
| <table cellspacing="0" cellpadding="0"> |
| <tbody> |
| <tr style="height: 56px;"> |
| <td style="padding-left: 0.5em;"> |
| <div id="projectname">MADlib |
|  <span id="projectnumber">1.0</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__robust.html"> A newer version is available</a></span> |
| </div> |
| <div id="projectbrief">User Documentation</div> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <!-- end header part --> |
| <!-- Generated by Doxygen 1.8.4 --> |
| <script type="text/javascript"> |
| var searchBox = new SearchBox("searchBox", "search",false,'Search'); |
| </script> |
| <div id="navrow1" class="tabs"> |
| <ul class="tablist"> |
| <li><a href="index.html"><span>Main Page</span></a></li> |
| <li><a href="modules.html"><span>Modules</span></a></li> |
| <li> |
| <div id="MSearchBox" class="MSearchBoxInactive"> |
| <span class="left"> |
| <img id="MSearchSelect" src="search/mag_sel.png" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| alt=""/> |
| <input type="text" id="MSearchField" value="Search" accesskey="S" |
| onfocus="searchBox.OnSearchFieldFocus(true)" |
| onblur="searchBox.OnSearchFieldFocus(false)" |
| onkeyup="searchBox.OnSearchFieldChange(event)"/> |
| </span><span class="right"> |
| <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a> |
| </span> |
| </div> |
| </li> |
| </ul> |
| </div> |
| </div><!-- top --> |
| <div id="side-nav" class="ui-resizable side-nav-resizable"> |
| <div id="nav-tree"> |
| <div id="nav-tree-contents"> |
| <div id="nav-sync" class="sync"></div> |
| </div> |
| </div> |
| <div id="splitbar" style="-moz-user-select:none;" |
| class="ui-resizable-handle"> |
| </div> |
| </div> |
| <script type="text/javascript"> |
| $(document).ready(function(){initNavTree('group__grp__robust.html','');}); |
| </script> |
| <div id="doc-content"> |
| <!-- window showing the filter options --> |
| <div id="MSearchSelectWindow" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| onkeydown="return searchBox.OnSearchSelectKey(event)"> |
| <a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Groups</a></div> |
| |
| <!-- iframe showing the search results (closed by default) --> |
| <div id="MSearchResultsWindow"> |
| <iframe src="javascript:void(0)" frameborder="0" |
| name="MSearchResults" id="MSearchResults"> |
| </iframe> |
| </div> |
| |
| <div class="header"> |
| <div class="headertitle"> |
| <div class="title">Huber White Variance<div class="ingroups"><a class="el" href="group__grp__glm.html">Generalized Linear Models</a></div></div> </div> |
| </div><!--header--> |
| <div class="contents"> |
| <div id="dynsection-0" onclick="return toggleVisibility(this)" class="dynheader closed" style="cursor:pointer;"> |
| <img id="dynsection-0-trigger" src="closed.png" alt="+"/> Collaboration diagram for Huber White Variance:</div> |
| <div id="dynsection-0-summary" class="dynsummary" style="display:block;"> |
| </div> |
| <div id="dynsection-0-content" class="dyncontent" style="display:none;"> |
| <center><table><tr><td><div class="center"><iframe scrolling="no" frameborder="0" src="group__grp__robust.svg" width="363" height="56"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe> |
| </div> |
| </td></tr></table></center> |
| </div> |
| <dl class="section user"><dt>About:</dt><dd>When doing regression analysis, we are sometimes interested in the variance of the computed coefficients \( \boldsymbol c \). While the built-in regression functions provide variance estimates, we may prefer a <em> robust </em> variance estimate.</dd></dl> |
| <p>The robust variance can be expressed in sandwich form: </p> |
| <p class="formulaDsp"> |
| \[ S( \boldsymbol c) = B( \boldsymbol c) M( \boldsymbol c) B( \boldsymbol c) \] |
| </p> |
| <p> where \( B( \boldsymbol c)\) and \( M( \boldsymbol c)\) are matrices. The \( B( \boldsymbol c) \) matrix, also known as the bread, is relatively straightforward and can be computed as </p> |
| <p class="formulaDsp"> |
| \[ B( \boldsymbol c) = n\left(\sum_{i=1}^n -H(y_i, x_i, \boldsymbol c) \right)^{-1} \] |
| </p> |
| <p> where \( H \) is the Hessian matrix.</p> |
| <p>The \( M( \boldsymbol c)\) matrix has several variations, each with different robustness properties. The form implemented here is the Huber-White sandwich operator, which takes the form </p> |
| <p class="formulaDsp"> |
| \[ M_{H} =\frac{1}{n} \sum_{i=1}^n \psi(y_i,x_i, \boldsymbol c)^T \psi(y_i,x_i, \boldsymbol c). \] |
| </p> |
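| <p>As an illustration, the general formulas can be specialized to ordinary least squares. This is a sketch under assumptions that are not stated on this page: \( \psi \) is taken to be the per-observation score written as a row vector, the per-observation objective is \( -\tfrac{1}{2}(y_i - x_i^T \boldsymbol c)^2 \), and \( e_i = y_i - x_i^T \boldsymbol c \) denotes the residual. Then \( \psi(y_i, x_i, \boldsymbol c) = e_i x_i^T \) and \( H(y_i, x_i, \boldsymbol c) = -x_i x_i^T \), so that </p> |
| <p class="formulaDsp"> |
| \[ B( \boldsymbol c) = n\left(\sum_{i=1}^n x_i x_i^T \right)^{-1}, \qquad M_{H} = \frac{1}{n} \sum_{i=1}^n e_i^2 \, x_i x_i^T, \qquad \frac{1}{n} S( \boldsymbol c) = \left(\sum_{i=1}^n x_i x_i^T \right)^{-1} \left(\sum_{i=1}^n e_i^2 \, x_i x_i^T \right) \left(\sum_{i=1}^n x_i x_i^T \right)^{-1}. \] |
| </p> |
| <p>Under these assumptions, \( \tfrac{1}{n} S \) is the usual HC0 covariance estimate of the coefficients; the factor \( \tfrac{1}{n} \) appears because \( S \) is interpreted here as the asymptotic covariance of \( \sqrt{n}\,\hat{\boldsymbol c} \).</p> |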
| <p>The above method for calculating robust variance (Huber-White estimates) is implemented for linear regression, logistic regression, and multinomial logistic regression. It is useful for calculating variances in a dataset with potentially noisy outliers. The Huber-White estimator implemented here is identical to the "HC0" sandwich operator in the R package "sandwich".</p> |
| <p>The interfaces for robust linear, logistic, and multinomial logistic regression are similar, differing only in their optional parameters. The help and usage functions are called identically for all three robust regressions.</p> |
| <p>When multinomial logistic regression is computed before the multinomial robust regression, it uses a default reference category of zero, and the regression coefficients are included in the output table. The regression coefficients in the output are in the same order as in the multinomial logistic regression function, described as follows. For a problem with \( K \) independent variables \( (1, ..., K) \) and \( J \) categories \( (0, ..., J-1) \), let \( {m_{k,j}} \) denote the coefficient for independent variable \( k \) and category \( j \). The output is \( {m_{k_1, j_0}, m_{k_1, j_1} \ldots m_{k_1, j_{J-1}}, m_{k_2, j_0}, m_{k_2, j_1} \ldots m_{k_K, j_{J-1}}} \). This order is NOT CONSISTENT with the one used by the multinomial regression marginal effects function <em>marginal_mlogregr</em>. The inconsistency is deliberate: the interfaces of all multinomial regressions (robust, clustered, ...) will be changed to match the one used by the marginal effects function.</p> |
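| <p>As a small illustration of this ordering, with sizes \( K = 2 \) and \( J = 3 \) chosen arbitrarily, the coefficients would be laid out as </p> |
| <p class="formulaDsp"> |
| \[ m_{1,0},\; m_{1,1},\; m_{1,2},\; m_{2,0},\; m_{2,1},\; m_{2,2} \] |
| </p> |
| <p>so that, reading off the ordering described above, \( m_{k,j} \) would sit at 1-based position \( (k-1)J + j + 1 \).</p> |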
| <dl class="section user"><dt>Input:</dt><dd></dd></dl> |
| <p>The training data is expected to be of the following form: </p> |
| <pre>{TABLE|VIEW} <em>sourceName</em> ( |
| <em>outputTable</em> VARCHAR, |
| <em>regressionType </em> VARCHAR, |
| <em>dependentVariable</em> VARCHAR, |
| <em>independentVariable</em> VARCHAR |
| )</pre><dl class="section user"><dt>Usage:</dt><dd></dd></dl> |
| <p><b> The Full Interface</b></p> |
| <dl class="section warning"><dt>Warning</dt><dd>The <b>'grouping_cols'</b> and <b>'print_warnings'</b> input parameters of <em>robust_variance_mlogregr</em> are placeholders in MADlib V1.0. They will be implemented in a future release.</dd></dl> |
| <pre> |
| SELECT <a class="el" href="robust_8sql__in.html#ac09a7ffa805778b160795f96916995a6">madlib.robust_variance_linregr</a>( |
| <em>'source_table'</em>, -- name of input table, VARCHAR |
| <em>'out_table'</em>, -- name of output table, VARCHAR |
| <em>'dependent_varname'</em>, -- dependent variable, VARCHAR |
| <em>'independent_varname'</em>, -- independent variables, VARCHAR |
| <em>'grouping_cols'</em> -- [OPTIONAL] grouping variables, VARCHAR |
| ); |
| </pre><p> OR </p> |
| <pre> |
| SELECT <a class="el" href="robust_8sql__in.html#af431a81d3e3e448cf2f5f26389e5b882">madlib.robust_variance_logregr</a>( |
| <em>'source_table'</em>, -- name of input table, VARCHAR |
| <em>'out_table'</em>, -- name of output table, VARCHAR |
| <em>'dependent_varname'</em>, -- dependent variable, VARCHAR |
| <em>'independent_varname'</em>, -- independent variables, VARCHAR |
| <em>'grouping_cols'</em>, -- [OPTIONAL] grouping variables, VARCHAR |
| <em>max_iter</em>, -- [OPTIONAL] Integer identifying the maximum number of iterations used by the logistic regression solver. Default is 20. INTEGER |
| <em>'optimizer'</em>, -- [OPTIONAL] String identifying the optimizer used in the logistic regression. See the documentation in the logistic regression for the available options. Default is irls. VARCHAR |
| <em>tolerance</em>, -- [OPTIONAL] Float identifying the tolerance of the logistic regression optimizer. Default is 0.0001. DOUBLE PRECISION |
| <em>print_warnings</em> -- [OPTIONAL] Boolean specifying if the regression fit should print any warning messages. Default is false. BOOLEAN |
| ); |
| </pre><p> OR </p> |
| <pre> |
| SELECT madlib.robust_variance_mlogregr( |
| <em>'source_table'</em>, -- name of input table, VARCHAR |
| <em>'out_table'</em>, -- name of output table, VARCHAR |
| <em>'dependent_varname'</em>, -- dependent variable, VARCHAR |
| <em>'independent_varname'</em>, -- independent variables, VARCHAR |
| <em>ref_category</em>, -- [OPTIONAL] Integer specifying the reference category. Default is 0. |
| <em>'grouping_cols'</em>, -- [OPTIONAL] grouping variables, VARCHAR. Default is NULL. |
| <em>max_iter</em>, -- [OPTIONAL] Integer identifying the maximum iterations used by the logistic regression solver. Default is 20. |
| <em>'optimizer'</em>, -- [OPTIONAL] String identifying the optimizer used in the multinomial logistic regression. See the documentation in the multinomial logistic regression for the available options. Default is irls. |
| <em>tolerance</em>, -- [OPTIONAL] Float identifying the tolerance of the multinomial logistic regression optimizer. Default is 0.0001. |
| <em>print_warnings</em> -- [OPTIONAL] Boolean specifying if the regression fit should print any warning messages. Default is false. |
| ); |
| </pre><p> Here <em>'independent_varname'</em> can be the name of a column that contains an array of numeric values. It can also be a string of the form 'array[1, x1, x2, x3]', where <em>x1</em>, <em>x2</em> and <em>x3</em> are all column names.</p> |
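| <p>As an illustration of the signatures above, the following sketch spells out every optional argument of <em>robust_variance_logregr</em> with the default listed for it. The table and column names are those of the example further down this page; the output table name <em>patients_robust</em> is hypothetical, and passing NULL for <em>'grouping_cols'</em> to mean "no grouping" is an assumption rather than documented behavior. </p> |
| <pre> |
| -- Sketch only: argument order follows the signature listed above. |
| SELECT madlib.robust_variance_logregr( |
|     'patients',                            -- source_table |
|     'patients_robust',                     -- out_table (hypothetical name) |
|     'second_attack',                       -- dependent_varname |
|     'ARRAY[1, treatment, trait_anxiety]',  -- independent_varname |
|     NULL,                                  -- grouping_cols (assumed: no grouping) |
|     20,                                    -- max_iter (documented default) |
|     'irls',                                -- optimizer (documented default) |
|     0.0001,                                -- tolerance (documented default) |
|     FALSE                                  -- print_warnings (documented default) |
| ); |
| </pre> |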
| <p>Output is stored in the <em>out_table</em>: </p> |
| <pre> |
| [ coef | std_err | (z/t)-stats | p_values | |
| +------+---------+-------------+----------+ |
| </pre><dl class="section user"><dt>Examples:</dt><dd><ol type="1"> |
| <li>For function summary information, run: <pre class="fragment">sql> select robust_variance_{linregr OR logregr OR mlogregr}('help'); |
| OR |
| sql> select robust_variance_{linregr OR logregr OR mlogregr}(); |
| OR |
| sql> select robust_variance_{linregr OR logregr OR mlogregr}('?'); |
| </pre></li> |
| <li>For function usage information, run: <pre class="fragment">sql> select robust_variance_{linregr OR logregr OR mlogregr}('usage'); |
| </pre></li> |
| <li>Inspect the sample data set: <pre class="fragment">sql> SELECT * FROM patients; |
| id | second_attack | treatment | trait_anxiety |
| ----+---------------+-----------+--------------- |
| 1 | 1 | 1 | 70 |
| 3 | 1 | 1 | 50 |
| 5 | 1 | 0 | 40 |
| 7 | 1 | 0 | 75 |
| 9 | 1 | 0 | 70 |
| 11 | 0 | 1 | 65 |
| 13 | 0 | 1 | 45 |
| 15 | 0 | 1 | 40 |
| 17 | 0 | 0 | 55 |
| ... |
| </pre></li> |
| <li>Run the logistic regression function and then compute the robust logistic variance of the regression: <pre class="fragment">sql> select robust_variance_logregr('patients', 'newTable', 'second_attack', 'ARRAY[1, treatment, trait_anxiety]'); |
| sql> select * from newTable; |
| coef | {11.962748350258,1.37269168529894,0.00285507335100035} |
| std_err | {3.45872062333141,1.17161925782182,0.053432886418388} |
| z_stats | {-1.839833462942,-0.874094587942144,2.22793348156965} |
| p_values | {0.0657926909738772,0.382066744586027,0.0258849510756295} |
| </pre></li> |
| </ol> |
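| <p>The listing in step 3 shows only part of the table. A minimal sketch for loading just the nine displayed rows might look as follows. The column types are assumptions, the table name <em>patients</em> is the one used by the regression call in step 4, and the rows elided by "..." are not reproduced, so results on this subset will differ from the output shown. </p> |
| <pre class="fragment">-- Assumed schema; only the rows visible in the listing are inserted. |
| CREATE TABLE patients ( |
|     id            INTEGER NOT NULL, |
|     second_attack INTEGER, |
|     treatment     INTEGER, |
|     trait_anxiety INTEGER |
| ); |
| INSERT INTO patients (id, second_attack, treatment, trait_anxiety) VALUES |
|     ( 1, 1, 1, 70), |
|     ( 3, 1, 1, 50), |
|     ( 5, 1, 0, 40), |
|     ( 7, 1, 0, 75), |
|     ( 9, 1, 0, 70), |
|     (11, 0, 1, 65), |
|     (13, 0, 1, 45), |
|     (15, 0, 1, 40), |
|     (17, 0, 0, 55); |
| </pre> |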
| </dd></dl> |
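| <p>Because each output column holds one array entry per coefficient, it can be convenient to expand the arrays into rows. The sketch below assumes the output table <em>newTable</em> from the example above, the column names shown in its output, and a database that provides <em>unnest</em> (for example PostgreSQL 8.4 or later); with arrays of equal length the calls expand in parallel. </p> |
| <pre class="fragment">-- One row per coefficient, in the order of the independent variables. |
| SELECT unnest(coef)     AS coef, |
|        unnest(std_err)  AS std_err, |
|        unnest(z_stats)  AS z_stats, |
|        unnest(p_values) AS p_values |
| FROM newTable; |
| </pre> |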
| <dl class="section user"><dt>Literature:</dt><dd></dd></dl> |
| <p>[1] vce(cluster) function in STATA: <a href="http://www.stata.com/help.cgi?vce_option">http://www.stata.com/help.cgi?vce_option</a></p> |
| <p>[2] clustered estimators in R: <a href="http://people.su.se/~ma/clustering.pdf">http://people.su.se/~ma/clustering.pdf</a></p> |
| <p>[3] Achim Zeileis: Object-oriented Computation of Sandwich Estimators. Research Report Series / Department of Statistics and Mathematics, 37. Department of Statistics and Mathematics, WU Vienna University of Economics and Business, Vienna. <a href="http://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf">http://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf</a></p> |
| <dl class="section see"><dt>See Also</dt><dd>File <a class="el" href="robust_8sql__in.html" title="SQL functions for linear regression. ">robust.sql_in</a> documenting the SQL functions. </dd></dl> |
| </div><!-- contents --> |
| </div><!-- doc-content --> |
| <!-- start footer part --> |
| <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> |
| <ul> |
| <li class="footer">Generated on Tue Sep 10 2013 15:48:04 for MADlib by |
| <a href="http://www.doxygen.org/index.html"> |
| <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li> |
| </ul> |
| </div> |
| </body> |
| </html> |