<!-- HTML header for doxygen 1.8.4-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
<title>MADlib: Stemming</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { init_search(); });
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
jax: ["input/TeX","output/HTML-CSS"],
});
</script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
<!-- hack in the navigation tree -->
<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
<!-- google analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-45382226-1', 'madlib.apache.org');
ga('send', 'pageview');
</script>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://madlib.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
<span id="projectnumber">1.18.0</span>
</div>
<div id="projectbrief">User Documentation for Apache MADlib</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__stemmer.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Stemming<div class="ingroups"><a class="el" href="group__grp__datatrans.html">Data Types and Transformations</a></div></div> </div>
</div><!--header-->
<div class="contents">
<div class="toc"><b>Contents</b> <ul>
<li>
<a href="#notes">Implementation Notes</a> </li>
<li>
<a href="#list">List of Stemmer Operations</a> </li>
<li>
<a href="#examples">Examples</a> </li>
<li>
<a href="#related">Related Topics</a> </li>
</ul>
</div><p>This module provides a basic stemming operation for text input. It is a support module for several machine learning algorithms that require a stemmer. Currently, it only supports English words.</p>
<p>The functions in this module are a SQL interface to an implementation of the <a href="http://tartarus.org/~martin/PorterStemmer/">Porter Stemming Algorithm</a>. The original stemming algorithm was written and is maintained by Martin Porter.</p>
<p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Implementation Notes</dt><dd></dd></dl>
<p>Each function described in this module operates on either a text value or a text array.</p>
<p>The functions require TEXT values and return NULL for a NULL input. See the descriptions of the individual functions for details.</p>
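<p>The two input forms and the NULL behavior described above can be sketched as follows. This is a minimal illustration, assuming MADlib is installed in the <code>madlib</code> schema as in the examples on this page; the input words are arbitrary, and <code>string_to_array</code> is the built-in PostgreSQL function, used here only as one possible way to build a text array.</p>
<pre class="example">
-- Scalar form: one text token in, one stem out.
SELECT madlib.stem_token('knitting');

-- Array form: split a phrase on spaces, then stem every element.
SELECT madlib.stem_token_arr(string_to_array('knights knitting knots', ' '));

-- NULL handling: a NULL token is expected to yield a NULL stem.
SELECT madlib.stem_token(NULL::text) IS NULL AS null_in_null_out;

-- A NULL element inside an array is expected to stay NULL in the result.
SELECT madlib.stem_token_arr(ARRAY['knocked', NULL, 'knobs']::text[]);
</pre>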
<p><a class="anchor" id="list"></a></p><dl class="section user"><dt>Stemmer Operations</dt><dd><table class="output">
<tr>
<th><a class="el" href="porter__stemmer_8sql__in.html#aca5bc24a9a8f5c33470b9f0bf0b3c515" title="Returns stem of input token. Returns NULL if input token is NULL. ">stem_token()</a></th><td><p class="starttd">Returns the stem of the token. Returns NULL if input is NULL.</p>
<p class="endtd"></p>
</td></tr>
<tr>
<th><a class="el" href="porter__stemmer_8sql__in.html#a1ac3a2fd645ddf807b36a1328134a4ea" title="Returns stems in an array of input token array. Returns NULL element for corresponding input NULL tok...">stem_token_arr()</a></th><td><p class="starttd">Returns an array containing the stem of each token in the input array. A NULL token yields a NULL element at the corresponding position.</p>
<p class="endtd"><a class="anchor" id="examples"></a></p>
</td></tr>
</table>
</dd></dl>
<dl class="section user"><dt>Examples</dt><dd></dd></dl>
<ol type="1">
<li>Create a table with some words to be stemmed. <pre class="example">
CREATE TABLE token_tbl (
    id integer,
    word text
);
INSERT INTO token_tbl VALUES
(1, 'kneel'),
(2, 'kneeled'),
(3, 'kneeling'),
(4, 'kneels'),
(5, 'knees'),
(6, 'knell'),
(7, 'knelt'),
(8, 'knew'),
(9, 'knick'),
(10, 'knif'),
(11, 'knife'),
(12, 'knight'),
(13, 'knightly'),
(14, 'knights'),
(15, 'knit'),
(16, 'knits'),
(17, 'knitted'),
(18, 'knitting'),
(19, 'knives'),
(20, 'knob'),
(21, 'knobs'),
(22, 'knock'),
(23, 'knocked'),
(24, 'knocker'),
(25, 'knockers'),
(26, 'knocking'),
(27, 'knocks'),
(28, 'knopp'),
(29, 'knot'),
(30, 'knots');
</pre></li>
<li>Return the stem of each word. <pre class="example">
SELECT madlib.stem_token(word) FROM token_tbl ORDER BY id;
</pre> <pre class="result">
stem_token
&#160;------------
kneel
kneel
kneel
kneel
knee
knell
knelt
knew
knick
knif
knife
knight
knight
knight
knit
knit
knit
knit
knive
knob
knob
knock
knock
knocker
knocker
knock
knock
knopp
knot
knot
(30 rows)
</pre></li>
<li>The input can also be processed as an array. <pre class="example">
SELECT madlib.stem_token_arr(array_agg(word order by word)) FROM token_tbl;
</pre> <pre class="result">
stem_token_arr
&#160;-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{kneel,kneel,kneel,kneel,knee,knell,knelt,knew,knick,knif,knife,knight,knight,knight,knit,knit,knit,knit,knive,knob,knob,knock,knock,knocker,knocker,knock,knock,knopp,knot,knot}
(1 row)
</pre></li>
</ol>
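<p>A common follow-up is to group words that share a stem. The query below is a minimal sketch of that idea using only <code>madlib.stem_token()</code> and standard SQL aggregation over the <code>token_tbl</code> table created above; the column aliases <code>stem</code>, <code>num_words</code> and <code>words</code> are illustrative names, not part of the module.</p>
<pre class="example">
-- Count how many distinct input words collapse to each stem,
-- and list the original words next to the stem they map to.
SELECT madlib.stem_token(word)       AS stem,
       count(*)                      AS num_words,
       array_agg(word ORDER BY word) AS words
FROM token_tbl
GROUP BY 1
ORDER BY 1;
</pre>
<p>Based on the results shown above, the words kneel, kneeled, kneeling and kneels would fall into a single group under the stem kneel.</p>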
<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
<p>See file <a class="el" href="porter__stemmer_8sql__in.html" title="implementation of porter stemmer operations in SQL ">porter_stemmer.sql_in</a> for the list of functions and their usage. </p>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Wed Mar 31 2021 20:45:47 for MADlib by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>