| <!-- HTML header for doxygen 1.8.4--> | 
 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | 
 | <html xmlns="http://www.w3.org/1999/xhtml"> | 
 | <head> | 
 | <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> | 
 | <meta http-equiv="X-UA-Compatible" content="IE=9"/> | 
 | <meta name="generator" content="Doxygen 1.8.4"/> | 
 | <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/> | 
 | <title>MADlib: Sketch-based Estimators</title> | 
 | <link href="tabs.css" rel="stylesheet" type="text/css"/> | 
 | <script type="text/javascript" src="jquery.js"></script> | 
 | <script type="text/javascript" src="dynsections.js"></script> | 
 | <link href="navtree.css" rel="stylesheet" type="text/css"/> | 
 | <script type="text/javascript" src="resize.js"></script> | 
 | <script type="text/javascript" src="navtree.js"></script> | 
 | <script type="text/javascript"> | 
 |   $(document).ready(initResizable); | 
 |   $(window).load(resizeHeight); | 
 | </script> | 
 | <link href="search/search.css" rel="stylesheet" type="text/css"/> | 
 | <script type="text/javascript" src="search/search.js"></script> | 
 | <script type="text/javascript"> | 
 |   $(document).ready(function() { searchBox.OnSelectItem(0); }); | 
 | </script> | 
 | <script type="text/x-mathjax-config"> | 
 |   MathJax.Hub.Config({ | 
 |     extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], | 
 |     jax: ["input/TeX","output/HTML-CSS"], | 
 | }); | 
 | </script><script src="../mathjax/MathJax.js"></script> | 
 | <link href="doxygen.css" rel="stylesheet" type="text/css" /> | 
 | <link href="madlib_extra.css" rel="stylesheet" type="text/css"/> | 
 | </head> | 
 | <body> | 
 | <div id="top"><!-- do not remove this div, it is closed by doxygen! --> | 
 | <div id="titlearea"> | 
 | <table cellspacing="0" cellpadding="0"> | 
 |  <tbody> | 
 |  <tr style="height: 56px;"> | 
 |   <td style="padding-left: 0.5em;"> | 
 |    <div id="projectname">MADlib | 
 |     <span id="projectnumber">1.3</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__sketches.html"> A newer version is available</a></span> | 
 |    </div> | 
 |    <div id="projectbrief">User Documentation</div> | 
 |   </td> | 
 |    <td>        <div id="MSearchBox" class="MSearchBoxInactive"> | 
 |         <span class="left"> | 
 |           <img id="MSearchSelect" src="search/mag_sel.png" | 
 |                onmouseover="return searchBox.OnSearchSelectShow()" | 
 |                onmouseout="return searchBox.OnSearchSelectHide()" | 
 |                alt=""/> | 
 |           <input type="text" id="MSearchField" value="Search" accesskey="S" | 
 |                onfocus="searchBox.OnSearchFieldFocus(true)"  | 
 |                onblur="searchBox.OnSearchFieldFocus(false)"  | 
 |                onkeyup="searchBox.OnSearchFieldChange(event)"/> | 
 |           </span><span class="right"> | 
 |             <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a> | 
 |           </span> | 
 |         </div> | 
 | </td> | 
 |  </tr> | 
 |  </tbody> | 
 | </table> | 
 | </div> | 
 | <!-- end header part --> | 
 | <!-- Generated by Doxygen 1.8.4 --> | 
 | <script type="text/javascript"> | 
 | var searchBox = new SearchBox("searchBox", "search",false,'Search'); | 
 | </script> | 
 | </div><!-- top --> | 
 | <div id="side-nav" class="ui-resizable side-nav-resizable"> | 
 |   <div id="nav-tree"> | 
 |     <div id="nav-tree-contents"> | 
 |       <div id="nav-sync" class="sync"></div> | 
 |     </div> | 
 |   </div> | 
 |   <div id="splitbar" style="-moz-user-select:none;"  | 
 |        class="ui-resizable-handle"> | 
 |   </div> | 
 | </div> | 
 | <script type="text/javascript"> | 
 | $(document).ready(function(){initNavTree('group__grp__sketches.html','');}); | 
 | </script> | 
 | <div id="doc-content"> | 
 | <!-- window showing the filter options --> | 
 | <div id="MSearchSelectWindow" | 
 |      onmouseover="return searchBox.OnSearchSelectShow()" | 
 |      onmouseout="return searchBox.OnSearchSelectHide()" | 
 |      onkeydown="return searchBox.OnSearchSelectKey(event)"> | 
 | <a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Groups</a></div> | 
 |  | 
 | <!-- iframe showing the search results (closed by default) --> | 
 | <div id="MSearchResultsWindow"> | 
 | <iframe src="javascript:void(0)" frameborder="0"  | 
 |         name="MSearchResults" id="MSearchResults"> | 
 | </iframe> | 
 | </div> | 
 |  | 
 | <div class="header"> | 
 |   <div class="summary"> | 
 | <a href="#groups">Modules</a>  </div> | 
 |   <div class="headertitle"> | 
 | <div class="title">Sketch-based Estimators<div class="ingroups"><a class="el" href="group__grp__early__stage.html">Early Stage Development</a></div></div>  </div> | 
 | </div><!--header--> | 
 | <div class="contents"> | 
 | <table class="memberdecls"> | 
 | <tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="groups"></a> | 
 | Modules</h2></td></tr> | 
 | <tr class="memitem:group__grp__countmin"><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__countmin.html">CountMin (Cormode-Muthukrishnan)</a></td></tr> | 
 | <tr class="separator:"><td class="memSeparator" colspan="2"> </td></tr> | 
 | <tr class="memitem:group__grp__fmsketch"><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__fmsketch.html">FM (Flajolet-Martin)</a></td></tr> | 
 | <tr class="separator:"><td class="memSeparator" colspan="2"> </td></tr> | 
 | <tr class="memitem:group__grp__mfvsketch"><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__mfvsketch.html">MFV (Most Frequent Values)</a></td></tr> | 
 | <tr class="separator:"><td class="memSeparator" colspan="2"> </td></tr> | 
 | </table> | 
 | <a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2> | 
<dl class="section warning"><dt>Warning</dt><dd><em> This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. The interface and implementation are subject to change. </em></dd></dl>
<p>Sketches (sometimes called "synopsis data structures") are small, randomized, in-memory data structures that capture statistical properties of a large set of values (e.g., a column of a table). A sketch can be built in a single pass over the data and then used to approximate a variety of descriptive statistics.</p>
 | <p>We implement sketches as SQL User-Defined Aggregates (UDAs). Because they are single-pass, small-space and parallelized, a single query can use many sketches to gather summary statistics on many columns of a table efficiently.</p> | 
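<p>To illustrate why a single-pass sketch parallelizes well, here is a minimal Python sketch of a Flajolet-Martin-style distinct-count estimator split into the three steps an aggregate typically needs: a per-row transition step, a merge step for partial states, and a final step. The names <code>transition</code>, <code>merge</code>, <code>final</code> and <code>hash64</code> are hypothetical, a single bitmap is used for brevity, and none of this is MADlib's actual implementation.</p>

```python
import hashlib

NUM_BITS = 64    # width of the hash values and of the bitmap (assumed)
PHI = 0.77351    # correction constant from the Flajolet-Martin analysis


def hash64(value):
    """Deterministic 64-bit hash of the value's string form (illustrative choice)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def trailing_zeros(x):
    """Number of trailing zero bits of x, capped at NUM_BITS - 1."""
    count = 0
    while x % 2 == 0 and count != NUM_BITS - 1:
        x //= 2
        count += 1
    return count


def transition(state, value):
    """Fold one row into the bitmap: set the bit indexed by the hash's trailing zeros."""
    return state | 2 ** trailing_zeros(hash64(value))


def merge(state_a, state_b):
    """Combine two partial bitmaps, e.g. from two segments of a table."""
    return state_a | state_b


def final(state):
    """Estimate the distinct count from the position of the lowest unset bit."""
    r = 0
    while (state // 2 ** r) % 2 == 1:
        r += 1
    return 2 ** r / PHI


def build(values):
    """Single pass over the values, starting from an empty bitmap."""
    state = 0
    for value in values:
        state = transition(state, value)
    return state


rows = list(range(1000))
segment_a = rows[:400] + rows[:100]   # duplicates do not change the bitmap
segment_b = rows[400:]
merged = merge(build(segment_a), build(segment_b))
estimate = final(merged)
```

<p>Because the merge step is a bitwise OR, combining per-segment states gives exactly the same bitmap as one pass over all rows, regardless of how the rows are partitioned. A single bitmap gives a coarse estimate; practical implementations usually average over many bitmaps to reduce variance.</p>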
 | <p>This module currently implements user-defined aggregates based on three main sketch methods:</p> | 
 | <ul> | 
 | <li><em>Count-Min (CM)</em> sketches, which can be used to approximate a number of descriptive statistics including<ul> | 
 | <li><code>COUNT(*)</code> of rows whose column value matches a given value in a set</li> | 
 | <li><code>COUNT(*)</code> of rows whose column value falls in a range (*)</li> | 
 | <li>order statistics including <em>median</em> and <em>centiles</em> (*)</li> | 
 | <li><em>histograms</em>: both <em>equi-width</em> and <em>equi-depth</em> (*)</li> | 
 | </ul> | 
 | </li> | 
 | <li><em>Flajolet-Martin (FM)</em> sketches for approximating <code>COUNT(DISTINCT)</code>.</li> | 
<li><em>Most Frequent Value (MFV)</em> sketches, which output the most frequently occurring values in a column, along with their associated counts.</li>
 | </ul> | 
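<p>As a rough illustration of the first Count-Min use listed above (counting rows that match a given value), the following Python sketch keeps a small grid of counters and answers a point query with the minimum over the hash rows. The sizes <code>DEPTH</code> and <code>WIDTH</code>, the salted SHA-256 hashing, and the function names are assumptions made for this example and do not describe MADlib's internals.</p>

```python
import hashlib

DEPTH = 4     # number of hash rows (assumed value for illustration)
WIDTH = 256   # counters per row (assumed value for illustration)


def bucket(row, value):
    """Map a value to a counter index in the given row using a row-salted hash."""
    digest = hashlib.sha256(f"{row}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % WIDTH


def new_sketch():
    """Create an empty DEPTH x WIDTH grid of counters."""
    return [[0] * WIDTH for _ in range(DEPTH)]


def add(sketch, value):
    """Single-pass update: increment one counter in every row."""
    for row in range(DEPTH):
        sketch[row][bucket(row, value)] += 1


def estimate_count(sketch, value):
    """Point query: collisions only inflate counters, so take the minimum."""
    return min(sketch[row][bucket(row, value)] for row in range(DEPTH))


column = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
sketch = new_sketch()
for value in column:
    add(sketch, value)

approx_a = estimate_count(sketch, "a")
approx_missing = estimate_count(sketch, "z")
```

<p>The estimate never falls below the true count and can exceed it only when other values collide with the queried value in every row, which becomes less likely as the grid grows.</p>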
 | <p><em>Note:</em> Features marked with a star (*) only work for discrete types that can be cast to int8.</p> | 
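<p>One common way to support range counts with Count-Min sketches, and a plausible reason for the integer restriction, is to keep one sketch per dyadic level of an integer key space and to cover a query range with a few aligned power-of-two blocks. The Python sketch below shows the idea on a small assumed domain; <code>LEVELS</code>, <code>dyadic_cover</code> and the other names are hypothetical and are not taken from MADlib.</p>

```python
import hashlib

LEVELS = 8   # keys are assumed to lie in [0, 2**LEVELS); illustrative only
DEPTH = 4    # hash rows per Count-Min sketch (assumed)
WIDTH = 64   # counters per row (assumed)


def bucket(row, key):
    """Row-salted hash of an integer key to a counter index."""
    digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % WIDTH


def new_sketches():
    """One Count-Min grid per dyadic level, 0 (single keys) to LEVELS (whole domain)."""
    return [[[0] * WIDTH for _ in range(DEPTH)] for _ in range(LEVELS + 1)]


def add(sketches, key):
    """Count the key once at every level, under the dyadic block that contains it."""
    for level in range(LEVELS + 1):
        block = key // 2 ** level
        for row in range(DEPTH):
            sketches[level][row][bucket(row, block)] += 1


def block_count(sketches, level, block):
    """Count-Min point query for one dyadic block."""
    return min(sketches[level][row][bucket(row, block)] for row in range(DEPTH))


def dyadic_cover(lo, hi):
    """Split the inclusive range [lo, hi] into aligned power-of-two blocks."""
    pieces = []
    while hi >= lo:
        level = 0
        # Grow the block while it stays aligned at lo and inside the range.
        while (LEVELS > level
               and lo % 2 ** (level + 1) == 0
               and hi >= lo + 2 ** (level + 1) - 1):
            level += 1
        pieces.append((level, lo // 2 ** level))
        lo += 2 ** level
    return pieces


def range_count(sketches, lo, hi):
    """Approximate number of keys in [lo, hi] as a sum of block estimates."""
    return sum(block_count(sketches, level, block)
               for level, block in dyadic_cover(lo, hi))


keys = [3, 5, 5, 7, 10, 42, 42, 42, 100, 200]
sketches = new_sketches()
for key in keys:
    add(sketches, key)

approx = range_count(sketches, 4, 50)
```

<p>Each block estimate is at least the true count for that block, so the range estimate is never too small. Order statistics such as the median can then be approximated by searching for the key at which the range count from the minimum reaches the desired rank.</p>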
<p>The sketch methods are exposed as a set of SQL UDAs (user-defined aggregates) and UDFs (user-defined functions) that can be used directly in SQL queries.</p>
 | </div><!-- contents --> | 
 | </div><!-- doc-content --> | 
 | <!-- start footer part --> | 
 | <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> | 
 |   <ul> | 
 |     <li class="footer">Generated on Thu Jan 9 2014 20:27:18 for MADlib by | 
 |     <a href="http://www.doxygen.org/index.html"> | 
 |     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li> | 
 |   </ul> | 
 | </div> | 
 | </body> | 
 | </html> |