| <!-- HTML header for doxygen 1.8.4--> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> |
| <meta http-equiv="X-UA-Compatible" content="IE=9"/> |
| <meta name="generator" content="Doxygen 1.8.13"/> |
| <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/> |
| <title>MADlib: Marginal Effects</title> |
| <link href="tabs.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="jquery.js"></script> |
| <script type="text/javascript" src="dynsections.js"></script> |
| <link href="navtree.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="resize.js"></script> |
| <script type="text/javascript" src="navtreedata.js"></script> |
| <script type="text/javascript" src="navtree.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(initResizable); |
| </script> |
| <link href="search/search.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="search/searchdata.js"></script> |
| <script type="text/javascript" src="search/search.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(function() { init_search(); }); |
| </script> |
| <script type="text/x-mathjax-config"> |
| MathJax.Hub.Config({ |
| extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], |
| jax: ["input/TeX","output/HTML-CSS"], |
| }); |
| </script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script> |
| <!-- hack in the navigation tree --> |
| <script type="text/javascript" src="eigen_navtree_hacks.js"></script> |
| <link href="doxygen.css" rel="stylesheet" type="text/css" /> |
| <link href="madlib_extra.css" rel="stylesheet" type="text/css"/> |
| <!-- google analytics --> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| ga('create', 'UA-45382226-1', 'madlib.apache.org'); |
| ga('send', 'pageview'); |
| </script> |
| </head> |
| <body> |
| <div id="top"><!-- do not remove this div, it is closed by doxygen! --> |
| <div id="titlearea"> |
| <table cellspacing="0" cellpadding="0"> |
| <tbody> |
| <tr style="height: 56px;"> |
| <td id="projectlogo"><a href="http://madlib.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/></a></td> |
| <td style="padding-left: 0.5em;"> |
| <div id="projectname"> |
| <span id="projectnumber">1.16</span> |
| </div> |
| <div id="projectbrief">User Documentation for Apache MADlib</div> |
| </td> |
| <td> <div id="MSearchBox" class="MSearchBoxInactive"> |
| <span class="left"> |
| <img id="MSearchSelect" src="search/mag_sel.png" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| alt=""/> |
| <input type="text" id="MSearchField" value="Search" accesskey="S" |
| onfocus="searchBox.OnSearchFieldFocus(true)" |
| onblur="searchBox.OnSearchFieldFocus(false)" |
| onkeyup="searchBox.OnSearchFieldChange(event)"/> |
| </span><span class="right"> |
| <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a> |
| </span> |
| </div> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <!-- end header part --> |
| <!-- Generated by Doxygen 1.8.13 --> |
| <script type="text/javascript"> |
| var searchBox = new SearchBox("searchBox", "search",false,'Search'); |
| </script> |
| </div><!-- top --> |
| <div id="side-nav" class="ui-resizable side-nav-resizable"> |
| <div id="nav-tree"> |
| <div id="nav-tree-contents"> |
| <div id="nav-sync" class="sync"></div> |
| </div> |
| </div> |
| <div id="splitbar" style="-moz-user-select:none;" |
| class="ui-resizable-handle"> |
| </div> |
| </div> |
| <script type="text/javascript"> |
| $(document).ready(function(){initNavTree('group__grp__marginal.html','');}); |
| </script> |
| <div id="doc-content"> |
| <!-- window showing the filter options --> |
| <div id="MSearchSelectWindow" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| onkeydown="return searchBox.OnSearchSelectKey(event)"> |
| </div> |
| |
| <!-- iframe showing the search results (closed by default) --> |
| <div id="MSearchResultsWindow"> |
| <iframe src="javascript:void(0)" frameborder="0" |
| name="MSearchResults" id="MSearchResults"> |
| </iframe> |
| </div> |
| |
| <div class="header"> |
| <div class="headertitle"> |
| <div class="title">Marginal Effects<div class="ingroups"><a class="el" href="group__grp__super.html">Supervised Learning</a> » <a class="el" href="group__grp__regml.html">Regression Models</a></div></div> </div> |
| </div><!--header--> |
| <div class="contents"> |
| <div class="toc"><b>Contents</b> <ul> |
| <li> |
| <a href="#margins">Marginal Effects with Interaction Terms</a> </li> |
| <li> |
| <a href="#examples">Examples</a> </li> |
| <li> |
| <a href="#notes">Notes</a> </li> |
| <li> |
| <a href="#background">Technical Background</a> </li> |
| <li> |
| <a href="#literature">Literature</a> </li> |
| <li> |
| <a href="#related">Related Topics</a> </li> |
| </ul> |
| </div><p>A marginal effect (ME), or partial effect, measures the change in the conditional mean of \( y \) for a change in one of the regressors, say \(X_k\). In the linear regression model, the ME equals the relevant slope coefficient, which greatly simplifies analysis. For nonlinear models, specialized algorithms are required to calculate the ME. The marginal effect reported here is the average of the marginal effects computed at every data point in the source table.</p> |
| <p>MADlib provides marginal effects regression functions for linear, logistic and multinomial logistic regressions.</p> |
| <dl class="section warning"><dt>Warning</dt><dd>The <a class="el" href="marginal_8sql__in.html#a9517d679ee4209126895445cbed51fe3">margins_logregr()</a> and <a class="el" href="marginal_8sql__in.html#ae39ad0e1beca060fd153dba35901a4e7">margins_mlogregr()</a> functions have been deprecated in favor of the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function.</dd></dl> |
| <p><a class="anchor" id="margins"></a></p><dl class="section user"><dt>Marginal Effects with Interaction Terms</dt><dd><pre class="syntax"> |
| margins( model_table, |
| output_table, |
| x_design, |
| source_table, |
| marginal_vars |
| ) |
| </pre> <b>Arguments</b> <dl class="arglist"> |
| <dt>model_table </dt> |
| <dd>VARCHAR. The name of the model table, which is the output of <a class="el" href="logistic_8sql__in.html#a74210a7ef513dfcbdfdd9f3b37bfe428" title="Compute logistic-regression coefficients and diagnostic statistics. ">logregr_train()</a> or <a class="el" href="multilogistic_8sql__in.html#aedc13474e6abbc88451d120ad97e44d4" title="Compute multinomial logistic regression coefficients. ">mlogregr_train()</a>. </dd> |
| <dt>output_table </dt> |
| <dd>VARCHAR. The name of the result table. The output table has the following columns. <table class="output"> |
| <tr> |
| <th>variables </th><td>INTEGER[]. The indices of the basis variables. </td></tr> |
| <tr> |
| <th>margins </th><td>DOUBLE PRECISION[]. The marginal effects. </td></tr> |
| <tr> |
| <th>std_err </th><td>DOUBLE PRECISION[]. An array of the standard errors, computed using the delta method. </td></tr> |
| <tr> |
| <th>z_stats </th><td>DOUBLE PRECISION[]. An array of the z-stats of the marginal effects. </td></tr> |
| <tr> |
| <th>p_values </th><td>DOUBLE PRECISION[]. An array of the Wald p-values of the marginal effects. </td></tr> |
| </table> |
| </dd> |
| <dt>x_design (optional) </dt> |
| <dd><p class="startdd">VARCHAR, default: NULL. The design of independent variables, necessary only if interaction terms or indicator (categorical) terms are present. This parameter is necessary because the independent variables in the underlying regression are not parsed to extract the relationship between variables.</p> |
| <p>Example: The <em>independent_varname</em> in the regression method can be specified in either of the following ways:</p><ul> |
| <li><code> 'array[1, color_blue, color_green, gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight]' </code></li> |
| <li><code> 'x' </code></li> |
| </ul> |
| <p>In the second version, the column <em>x</em> is an array containing data identical to that expressed in the first version, computed in a prior data preparation step. Supply an <em>x_design</em> argument to the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function in the following way:</p><ul> |
| <li><code> '1, i.color_blue.color, i.color_green.color, i.gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight'</code></li> |
| </ul> |
| <p>The variable names (<em>'gpa', 'weight', ...</em>), referred to here as <em>identifiers</em>, should be unique for each basis variable and need not be the same as the original variable name in <em>independent_varname</em>. They should, however, be in the same order as the corresponding variables in <em>independent_varname</em>. The length of <em>x_design</em> is expected to be the same as the length of <em>independent_varname</em>. Each <em>identifier</em> name can contain only alphanumeric characters and the underscore.</p> |
| <p>Indicator (dummy) variables are prefixed with an 'i.' (This is only necessary for the basis term; it is not needed in the interaction terms.) Indicator variables that are obtained from the same categorical variable (for example, 'color_blue' and 'color_green') need to have a common and unique suffix (for example, '.color'). The '.' is used to add the prefix and suffix. If a reference indicator variable is present, it should contain the prefix 'ir.'.</p> |
| <p>To include characters other than alphanumeric characters and underscores in an identifier, enclose the identifier in double quotes. Escape characters are not currently supported. </p> |
| <p class="enddd"></p> |
| </dd> |
| <dt>source_table (optional) </dt> |
| <dd><p class="startdd">VARCHAR, default: NULL. The name of the data table on which to compute marginal effects. If NULL or not provided, the marginal effects are computed on the training table.</p> |
| <p class="enddd"></p> |
| </dd> |
| <dt>marginal_vars (optional) </dt> |
| <dd>VARCHAR, default: NULL. Comma-separated string containing specific variable identifiers to calculate marginal effects on. When NULL, marginal effects for all variables are returned. </dd> |
| </dl> |
| </dd></dl> |
| <dl class="section note"><dt>Note</dt><dd>No output will be provided for the reference indicator variable, since the marginal effect for that variable is undefined. If a reference variable is included in the independent variables and <em>marginal_vars</em>, the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function will ignore that variable for the output. The variable can still be included in the regression and margins, since it will affect the values for other related indicator variables.</dd></dl> |
| <p><a class="anchor" id="logregr_train"></a></p><dl class="section user"><dt>Marginal Effects for Logistic Regression</dt><dd></dd></dl> |
| <dl class="section warning"><dt>Warning</dt><dd>This function has been deprecated in favor of the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function.</dd></dl> |
| <pre class="syntax"> |
| margins_logregr( source_table, |
| output_table, |
| dependent_variable, |
| independent_variable, |
| grouping_cols, |
| marginal_vars, |
| max_iter, |
| optimizer, |
| tolerance, |
| verbose_mode |
| ) |
| </pre><p> <b>Arguments</b> </p><dl class="arglist"> |
| <dt>source_table </dt> |
| <dd>VARCHAR. The name of the data table. </dd> |
| <dt>output_table </dt> |
| <dd><p class="startdd">VARCHAR. The name of the result table. The output table has the following columns. </p><table class="output"> |
| <tr> |
| <th>margins </th><td>DOUBLE PRECISION[]. The marginal effects. </td></tr> |
| <tr> |
| <th>std_err </th><td>DOUBLE PRECISION[]. An array of the standard errors, using the delta method. </td></tr> |
| <tr> |
| <th>z_stats </th><td>DOUBLE PRECISION[]. An array of the z-stats of the marginal effects. </td></tr> |
| <tr> |
| <th>p_values </th><td>DOUBLE PRECISION[]. An array of the Wald p-values of the marginal effects. </td></tr> |
| </table> |
| <p>A summary table named &lt;output_table&gt;_summary is also created, which is the same as the summary table created by the <a class="el" href="logistic_8sql__in.html#a74210a7ef513dfcbdfdd9f3b37bfe428" title="Compute logistic-regression coefficients and diagnostic statistics. ">logregr_train()</a> function. Refer to the documentation for logistic regression for details.</p> |
| <p class="enddd"></p> |
| </dd> |
| <dt>dependent_variable </dt> |
| <dd>VARCHAR. The name of the column for dependent variables. </dd> |
| <dt>independent_variable </dt> |
| <dd>VARCHAR. The name of the column for independent variables. Can be any SQL expression that evaluates to an array. </dd> |
| <dt>grouping_cols (optional) </dt> |
| <dd>VARCHAR, default: NULL. <em>Not currently implemented. Any non-NULL value is ignored.</em> An expression list used to group the input dataset into discrete groups, running one regression per group. Similar to the SQL "GROUP BY" clause. When this value is NULL, no grouping is used and a single result model is generated. </dd> |
| <dt>marginal_vars (optional) </dt> |
| <dd>INTEGER[], default: NULL. An index list (base 1) representing the independent variables to compute marginal effects on. When NULL, computes marginal effects on all variables. </dd> |
| <dt>max_iter (optional) </dt> |
| <dd>INTEGER, default: 20. The maximum number of iterations for the logistic regression. </dd> |
| <dt>optimizer (optional) </dt> |
| <dd>VARCHAR, default: 'irls'. The optimizer to use for the logistic regression: newton/irls, cg, or igd. </dd> |
| <dt>tolerance (optional) </dt> |
| <dd>DOUBLE PRECISION, default: 1e-4. Termination criterion for logistic regression (relative). </dd> |
| <dt>verbose_mode (optional) </dt> |
| <dd>BOOLEAN, default: FALSE. When TRUE, provides verbose output of the results of training. </dd> |
| </dl> |
| <p><a class="anchor" id="mlogregr_train"></a></p><dl class="section user"><dt>Marginal Effects for Multinomial Logistic Regression</dt><dd></dd></dl> |
| <dl class="section warning"><dt>Warning</dt><dd>This function has been deprecated in favor of the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function.</dd></dl> |
| <pre class="syntax"> |
| margins_mlogregr( source_table, |
| out_table, |
| dependent_varname, |
| independent_varname, |
| ref_category, |
| grouping_cols, |
| marginal_vars, |
| optimizer_params, |
| verbose_mode |
| ) |
| </pre><p> <b>Arguments</b> </p><dl class="arglist"> |
| <dt>source_table </dt> |
| <dd>VARCHAR. The name of the data table. </dd> |
| <dt>out_table </dt> |
| <dd><p class="startdd">VARCHAR. The name of the result table. The output table has the following columns. </p><table class="output"> |
| <tr> |
| <th>category </th><td>The category. </td></tr> |
| <tr> |
| <th>ref_category </th><td>The reference category used for modeling. </td></tr> |
| <tr> |
| <th>margins </th><td>DOUBLE PRECISION[]. The marginal effects. </td></tr> |
| <tr> |
| <th>std_err </th><td>DOUBLE PRECISION[]. An array of the standard errors, using the delta method. </td></tr> |
| <tr> |
| <th>z_stats </th><td>DOUBLE PRECISION[]. An array of the z-stats of the marginal effects. </td></tr> |
| <tr> |
| <th>p_values </th><td>DOUBLE PRECISION[]. An array of the Wald p-values of the marginal effects. </td></tr> |
| </table> |
| <p>A summary table named &lt;out_table&gt;_summary is also created, which is the same as the summary table created by the <a class="el" href="multilogistic_8sql__in.html#aedc13474e6abbc88451d120ad97e44d4" title="Compute multinomial logistic regression coefficients. ">mlogregr_train()</a> function. Refer to the documentation for multinomial logistic regression for details.</p> |
| <p class="enddd"></p> |
| </dd> |
| <dt>dependent_varname </dt> |
| <dd>VARCHAR. The name of the column for dependent variables. </dd> |
| <dt>independent_varname </dt> |
| <dd>VARCHAR. The name of the column for independent variables. Can be any SQL expression that evaluates to an array. </dd> |
| <dt>ref_category (optional) </dt> |
| <dd>INTEGER, default: 0. Reference category for the multinomial logistic regression. </dd> |
| <dt>grouping_cols (optional) </dt> |
| <dd>VARCHAR, default: NULL. <em>Not currently implemented. Any non-NULL value is ignored.</em> An expression list used to group the input dataset into discrete groups, running one regression per group. Similar to the SQL "GROUP BY" clause. When this value is NULL, no grouping is used and a single result model is generated. </dd> |
| <dt>marginal_vars (optional) </dt> |
| <dd>INTEGER[], default: NULL. An index list (base 1) representing the independent variables to compute marginal effects on. When NULL, computes marginal effects on all variables. </dd> |
| <dt>optimizer_params (optional) </dt> |
| <dd>TEXT, default: NULL, which uses the default values of optimizer parameters: max_iter=20, optimizer='newton', tolerance=1e-4. It should be a string that contains 'key=value' pairs separated by commas. </dd> |
| <dt>verbose_mode (optional) </dt> |
| <dd>BOOLEAN, default: FALSE. When TRUE, provides verbose output of the results of training. </dd> |
| </dl> |
| <p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl> |
| <ol type="1"> |
| <li>View online help for the marginal effects function. <pre class="example"> |
| SELECT madlib.margins(); |
| </pre></li> |
| <li>Create the sample data set. Use the <code>patients</code> dataset from the <a href="group__grp__logreg.html#examples">Logistic Regression examples</a>. <pre class="example"> |
| SELECT * FROM patients; |
| </pre> Result: <pre class="result"> |
| id | second_attack | treatment | trait_anxiety |
|  ---+---------------+-----------+--------------- |
| 1 | 1 | 1 | 70 |
| 3 | 1 | 1 | 50 |
| 5 | 1 | 0 | 40 |
| 7 | 1 | 0 | 75 |
| 9 | 1 | 0 | 70 |
| 11 | 0 | 1 | 65 |
| 13 | 0 | 1 | 45 |
| 15 | 0 | 1 | 40 |
| 17 | 0 | 0 | 55 |
| 19 | 0 | 0 | 50 |
| 2 | 1 | 1 | 80 |
| 4 | 1 | 0 | 60 |
| 6 | 1 | 0 | 65 |
| 8 | 1 | 0 | 80 |
| 10 | 1 | 0 | 60 |
| 12 | 0 | 1 | 50 |
| 14 | 0 | 1 | 35 |
| 16 | 0 | 1 | 50 |
| 18 | 0 | 0 | 45 |
| 20 | 0 | 0 | 60 |
| </pre></li> |
| <li>Run logistic regression to get the model, compute the marginal effects of all variables, and view the results. <pre class="example"> |
| DROP TABLE IF EXISTS model_table; |
| DROP TABLE IF EXISTS model_table_summary; |
| DROP TABLE IF EXISTS margins_table; |
| SELECT madlib.logregr_train( 'patients', |
| 'model_table', |
| 'second_attack', |
| 'ARRAY[1, treatment, trait_anxiety, treatment^2, treatment * trait_anxiety]' |
| ); |
| SELECT madlib.margins( 'model_table', |
| 'margins_table', |
| 'intercept, treatment, trait_anxiety, treatment^2, treatment*trait_anxiety', |
| NULL, |
| NULL |
| ); |
| \x ON |
| SELECT * FROM margins_table; |
| </pre> Result: <pre class="result"> |
| variables | {intercept, treatment, trait_anxiety} |
| margins | {-0.876046514609573,-0.0648833521465306,0.0177196513589633} |
| std_err | {0.551714275062467,0.373592457067442,0.00458001207971933} |
| z_stats | {-1.58786269307674,-0.173674149247659,3.86890930646828} |
| p_values | {0.112317391159946,0.862121554662231,0.000109323294026272} |
| </pre></li> |
| <li>Compute the marginal effect of the <em>treatment</em> variable only, reusing the previous model, and view the results. The identifiers in 'x_design' (here 'tre' for treatment) need not match the column names. <pre class="example"> |
| DROP TABLE IF EXISTS result_table; |
| SELECT madlib.margins( 'model_table', |
| 'result_table', |
| 'i, tre, tra, tre^2, tre*tra', |
| NULL, |
| 'tre' |
| ); |
| SELECT * FROM result_table; |
| </pre> Result: <pre class="result"> |
| -[ RECORD 1 ]------------------- |
| variables | {tre} |
| margins | {-0.110453283517281} |
| std_err | {0.228981529064089} |
| z_stats | {-0.482367656329023} |
| p_values | {0.629544793219806} |
| </pre></li> |
| <li>Create a sample data set for multinomial logistic regression. (The full dataset has three categories.) Use the dataset from the <a href="group__grp__mlogreg.html#examples">Multinomial Regression example</a>. <pre class="example"> |
| \x OFF |
| SELECT * FROM test3; |
| </pre> Result: <pre class="result"> |
| feat1 | feat2 | cat |
| -------+-------+----- |
| 2 | 33 | 0 |
| 2 | 31 | 1 |
| 2 | 36 | 1 |
| 2 | 31 | 1 |
| 2 | 41 | 1 |
| 2 | 37 | 1 |
| 2 | 44 | 1 |
| 2 | 46 | 1 |
| 2 | 46 | 2 |
| 2 | 39 | 0 |
| 2 | 44 | 1 |
| 2 | 44 | 0 |
| 2 | 67 | 2 |
| 2 | 59 | 2 |
| 2 | 59 | 0 |
| ... |
| </pre></li> |
| <li>Run the regression function and then compute the marginal effects of all variables in the regression. <pre class="example"> |
| DROP TABLE IF EXISTS model_table; |
| DROP TABLE IF EXISTS model_table_summary; |
| DROP TABLE IF EXISTS result_table; |
| SELECT madlib.mlogregr_train('test3', 'model_table', 'cat', |
| 'ARRAY[1, feat1, feat2, feat1*feat2]', |
| 0); |
| SELECT madlib.margins('model_table', |
| 'result_table', |
| 'intercept, feat1, feat2, feat1*feat2'); |
| \x ON |
| SELECT * FROM result_table; |
| </pre> Result: <pre class="result"> |
| -[ RECORD 1 ]+------------------------------------------------------------- |
| category | 1 |
| ref_category | 0 |
| variables | {intercept,feat1,feat2} |
| margins | {2.38176571752675,-0.0545733108729351,-0.0147264917310351} |
| std_err | {0.851299967007829,0.0697049196489632,0.00374946341567828} |
| z_stats | {2.79779843748643,-0.782919070099622,-3.92762646235104} |
| p_values | {0.00514522099923651,0.43367463815468,8.57883141882439e-05} |
| -[ RECORD 2 ]+------------------------------------------------------------- |
| category | 2 |
| ref_category | 0 |
| variables | {intercept,feat1,feat2} |
| margins | {-1.99279068434949,0.0922540608068343,0.0168049205501686} |
| std_err | {0.742790306495022,0.0690712705200096,0.00202015384479213} |
| z_stats | {-2.68284422524683,1.33563578767686,8.31863404536785} |
| p_values | {0.00729989838349161,0.181668346802398,8.89828265128986e-17} |
| </pre></li> |
| </ol> |
| <p><a class="anchor" id="notes"></a> </p><dl class="section note"><dt>Note</dt><dd>The <em>marginal_vars</em> argument is a list of names matching those in 'x_design'. If no 'x_design' is present (i.e. no interaction and no indicator variables), then <em>marginal_vars</em> must be the indices (base 1) of variables in 'independent_varname'. Use <em>NULL</em> to use all independent variables. The <em>independent_varname</em> array in the underlying regression is assumed to start with a lower bound index of 1. Arrays with a different lower bound produce incorrect results.</dd></dl> |
| <p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl> |
| <p>The standard approach to modeling dichotomous/binary variables (so \(y \in \{0, 1\} \)) is to estimate a generalized linear model under the assumption that \( y \) follows some form of Bernoulli distribution. Thus the expected value of \( y \) becomes, </p><p class="formulaDsp"> |
| \[ E[y \mid X] = G(X' \beta), \] |
| </p> |
| <p>where \( G \) is the function, determined by the specified model, that maps the linear predictor to a probability. For logistic regression, \( G \) is the inverse logit function.</p> |
| <p>In logistic regression: </p><p class="formulaDsp"> |
| \[ \begin{aligned} P &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_j x_j)}} = \frac{1}{1 + e^{-z}} \\ \implies \frac{\partial P}{\partial x_k} &= \beta_k \cdot \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \beta_k \cdot P \cdot (1-P) \end{aligned} \] |
| </p> |
| <p>There are several methods for calculating the marginal effects for dichotomous dependent variables. This package uses the average of the marginal effects at every sample observation.</p> |
| <p>This is calculated as follows: </p><p class="formulaDsp"> |
| \[ \frac{\partial y}{\partial x_k} = \beta_k \frac{\sum_{i=1}^n P(y_i = 1)(1-P(y_i = 1))}{n}, \quad \text{where } P(y_i = 1) = G(X^{(i)} \beta) \] |
| </p> |
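<p>The averaging formula above can be illustrated with a minimal Python sketch. It assumes a logistic model without interaction or indicator terms; the function names and the sample coefficients and rows are hypothetical, and this is not MADlib's implementation.</p>

```python
import math


def sigmoid(z):
    """Inverse logit function G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effects(beta, rows):
    """Average marginal effect of each regressor in a logistic model
    without interaction terms.

    beta: coefficients, one per column of the design matrix
    rows: design-matrix rows, each the same length as beta
    """
    n = len(rows)
    # Mean of P_i * (1 - P_i) over all observations.
    scale = 0.0
    for x in rows:
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        scale += p * (1.0 - p)
    scale /= n
    # Each marginal effect is the coefficient times that mean.
    return [b * scale for b in beta]


# Hypothetical coefficients and data (intercept, x1), for illustration only.
beta = [0.0, 2.0]
rows = [[1.0, 0.0], [1.0, 0.0]]
print(average_marginal_effects(beta, rows))  # prints [0.0, 0.5]
```

<p>In this toy input every observation has \( z = 0 \), so \( P = 0.5 \) and \( P(1-P) = 0.25 \); each coefficient is therefore scaled by 0.25.</p>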
| <p>We use the delta method for calculating standard errors on the marginal effects.</p> |
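<p>As a rough illustration of the delta method mentioned above, the following Python sketch propagates a coefficient covariance matrix through the Jacobian of the average marginal effects. It assumes a logistic model without interaction or indicator terms and assumes the covariance matrix <code>cov</code> of the estimated coefficients is already available; all function names and sample values are hypothetical, and this is not MADlib's implementation. The last helper assumes the z-statistic is the margin divided by its standard error with a two-sided normal p-value.</p>

```python
import math


def sigmoid(z):
    """Inverse logit function G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def margins_and_jacobian(beta, rows):
    """Return (margins, jac) where margins[k] = beta[k] * mean(P * (1 - P))
    and jac[k][l] is the derivative of margins[k] with respect to beta[l]."""
    n, k = len(rows), len(beta)
    probs = [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in rows]
    scale = sum(p * (1.0 - p) for p in probs) / n
    # d(scale)/d(beta_l) = mean(P * (1 - P) * (1 - 2P) * x_l)
    dscale = [
        sum(p * (1.0 - p) * (1.0 - 2.0 * p) * x[l] for p, x in zip(probs, rows)) / n
        for l in range(k)
    ]
    margins = [b * scale for b in beta]
    jac = [
        [(scale if r == c else 0.0) + beta[r] * dscale[c] for c in range(k)]
        for r in range(k)
    ]
    return margins, jac


def delta_method_std_err(beta, cov, rows):
    """Standard errors as the square roots of diag(J * cov * J^T)."""
    _, jac = margins_and_jacobian(beta, rows)
    k = len(beta)
    return [
        math.sqrt(sum(jac[r][a] * cov[a][b] * jac[r][b]
                      for a in range(k) for b in range(k)))
        for r in range(k)
    ]


def wald_z_and_p(margin, std_err):
    """z-statistic and two-sided normal p-value for one marginal effect."""
    z = margin / std_err
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs, for illustration only.
beta = [0.3, -0.7]
cov = [[0.04, 0.0], [0.0, 0.16]]
rows = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
print(delta_method_std_err(beta, cov, rows))
```

<p>Applying <code>wald_z_and_p</code> to the first margin and standard error of the logistic regression example gives a z-statistic of about -1.588 and a p-value of about 0.112, in line with the <em>z_stats</em> and <em>p_values</em> shown there.</p>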
| <p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl> |
| <p>[1] mfx function in Stata: <a href="http://www.stata.com/help.cgi?mfx_option">http://www.stata.com/help.cgi?mfx_option</a></p> |
| <p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl> |
| <p>File <a class="el" href="marginal_8sql__in.html" title="SQL functions for linear regression. ">marginal.sql_in</a> documenting the SQL functions.</p> |
| </div><!-- contents --> |
| </div><!-- doc-content --> |
| <!-- start footer part --> |
| <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> |
| <ul> |
| <li class="footer">Generated on Tue Jul 2 2019 22:35:51 for MADlib by |
| <a href="http://www.doxygen.org/index.html"> |
| <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li> |
| </ul> |
| </div> |
| </body> |
| </html> |