<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.4"/>
<title>MADlib: Random Forest</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
jax: ["input/TeX","output/HTML-CSS"],
});
</script><script src="../mathjax/MathJax.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">MADlib
&#160;<span id="projectnumber">1.1</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__rf.html"> A newer version is available</a></span>
</div>
<div id="projectbrief">User Documentation</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.4 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main&#160;Page</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__rf.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Groups</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Random Forest<div class="ingroups"><a class="el" href="group__grp__early__stage.html">Early Stage Development</a></div></div> </div>
</div><!--header-->
<div class="contents">
<div id="dynsection-0" onclick="return toggleVisibility(this)" class="dynheader closed" style="cursor:pointer;">
<img id="dynsection-0-trigger" src="closed.png" alt="+"/> Collaboration diagram for Random Forest:</div>
<div id="dynsection-0-summary" class="dynsummary" style="display:block;">
</div>
<div id="dynsection-0-content" class="dyncontent" style="display:none;">
<center><table><tr><td><div class="center"><iframe scrolling="no" frameborder="0" src="group__grp__rf.svg" width="363" height="40"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe>
</div>
</td></tr></table></center>
</div>
<dl class="section warning"><dt>Warning</dt><dd><em> This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation are subject to change. </em></dd></dl>
<dl class="section user"><dt>About:</dt><dd>A random forest (RF) is an ensemble classifier that consists of many decision trees and outputs the class voted for by the majority of the individual trees.</dd></dl>
<p>It has the following well-known advantages:</p>
<ul>
<li>RF generally produces better accuracy than a single decision tree.</li>
<li>It can be very efficient for large data sets. Trees of an RF can be trained in parallel.</li>
<li>It can handle thousands of input attributes without attribute deletion.</li>
</ul>
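<p>As a minimal sketch of the majority vote described above (plain PostgreSQL, not MADlib's internal implementation), the query below tallies per-tree predictions and keeps the most frequent class for each row. The <em>tree_votes</em> relation and its values are hypothetical and exist only for illustration.</p>
<pre class="fragment">-- Hypothetical per-tree predictions: (row id, tree id, predicted class).
WITH tree_votes(id, tid, class) AS (
    VALUES (1, 1, 'Play'), (1, 2, 'Do not Play'), (1, 3, 'Play'),
           (2, 1, 'Do not Play'), (2, 2, 'Do not Play'), (2, 3, 'Play')
)
-- Count votes per (id, class), then keep the top-voted class for each id.
-- Ties are broken alphabetically by class name (an arbitrary choice here).
SELECT DISTINCT ON (id) id, class, count(*) AS votes
FROM tree_votes
GROUP BY id, class
ORDER BY id, votes DESC, class;
</pre>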
<p>This module provides an implementation of the random forest algorithm described in [1].</p>
<p>The implementation supports:</p>
<ul>
<li>Building random forests</li>
<li>Multiple split criteria, including:<ul>
<li>Information Gain</li>
<li>Gini Coefficient</li>
<li>Gain Ratio</li>
</ul></li>
<li>Random forest Classification/Scoring</li>
<li>Random forest Display</li>
<li>Continuous and Discrete features</li>
<li>Equal frequency discretization for continuous features</li>
<li>Missing value handling</li>
<li>Sampling with replacement</li>
</ul>
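<p>To illustrate what equal frequency discretization means, the following sketch bins a handful of continuous values into three buckets that each hold the same number of rows. It uses the standard <em>ntile</em> window function and a hypothetical inline <em>samples</em> relation; it shows the general technique only and is not a description of how this module discretizes features internally.</p>
<pre class="fragment">-- Hypothetical sample of a continuous feature (humidity readings).
WITH samples(id, humidity) AS (
    VALUES (1, 85), (2, 90), (3, 78), (4, 96), (5, 80), (6, 70)
)
-- ntile(3) assigns each row to one of three equally sized bins,
-- ordered by the feature value.
SELECT id, humidity, ntile(3) OVER (ORDER BY humidity) AS bin
FROM samples
ORDER BY humidity;
</pre>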
<dl class="section user"><dt>Input:</dt><dd></dd></dl>
<p>The <b>training data</b> is expected to be of the following form: </p>
<pre>{TABLE|VIEW} <em>trainingSource</em> (
...
<em>id</em> INT|BIGINT,
<em>feature1</em> SUPPORTED_DATA_TYPE,
<em>feature2</em> SUPPORTED_DATA_TYPE,
<em>feature3</em> SUPPORTED_DATA_TYPE,
....................
<em>featureN</em> SUPPORTED_DATA_TYPE,
<em>class</em> SUPPORTED_DATA_TYPE,
...
)</pre><p>The detailed list of SUPPORTED_DATA_TYPE is: SMALLINT, INT, BIGINT, FLOAT8, REAL, DECIMAL, INET, CIDR, MACADDR, BOOLEAN, CHAR, VARCHAR, TEXT, "char", DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, and INTERVAL.</p>
<p>The <b>data to classify</b> is expected to be of the same form as <b>training data</b>, except that it does not need a class column.</p>
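<p>For instance, a training table matching this layout could be created as sketched below. The table name <em>golf_data</em> and the rows come from the examples on this page; the column types are assumptions picked from the supported list above.</p>
<pre class="fragment">-- Column types are assumptions; any SUPPORTED_DATA_TYPE could be used.
CREATE TABLE golf_data (
    id          INT,      -- row identifier
    outlook     TEXT,     -- discrete feature
    temperature FLOAT8,   -- continuous feature
    humidity    FLOAT8,   -- continuous feature
    windy       TEXT,     -- discrete feature
    class       TEXT      -- label to predict
);

-- First three rows of the example data set.
INSERT INTO golf_data VALUES
    (1, 'sunny',    85, 85, 'false', 'Do not Play'),
    (2, 'sunny',    80, 90, 'true',  'Do not Play'),
    (3, 'overcast', 83, 78, 'false', 'Play');
</pre>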
<dl class="section user"><dt>Usage:</dt><dd><ul>
<li>Run the training algorithm on the source data: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a3981c021e89c0c5f40ab436d96848845">rf_train</a>(
'<em>split_criterion</em>',
'<em>training_table_name</em>',
'<em>result_rf_table_name</em>',
'<em>num_trees</em>',
'<em>features_per_node</em>',
'<em>sampling_percentage</em>',
'<em>continuous_feature_names</em>',
'<em>feature_col_names</em>',
'<em>id_col_name</em>',
'<em>class_col_name</em>',
'<em>how2handle_missing_value</em>',
'<em>max_tree_depth</em>',
'<em>node_prune_threshold</em>',
'<em>node_split_threshold</em>',
'<em>verbosity</em>');
</pre> This will create the random forest output table, which stores an abstract object (representing the model) used for further classification. Column names: <pre>
id | tree_location | feature | probability | ebp_coeff | maxclass | split_gain | live | cat_size | parent_id | lmc_nid | lmc_fval | is_feature_cont | split_value | tid | dp_ids
----+---------------+---------+-------------------+------------------+----------+-------------------+------+----------+-----------+---------+----------+-----------------+-------------+-----+--------
...</pre></li>
<li>Run the classification function using the learned model: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a57cd1d51be539e0da4fff351f8b477fe">rf_classify</a>(
'<em>rf_table_name</em>',
'<em>classification_table_name</em>',
'<em>result_table_name</em>');</pre> This will create the result_table with the classification results. <pre> </pre></li>
<li>Run the scoring function to score the learned model against a validation data set: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a9fd5da138e06924e89541ce4035ce8e1">rf_score</a>(
'<em>rf_table_name</em>',
'<em>validation_table_name</em>',
'<em>verbosity</em>');</pre> This will give a ratio of correctly classified items in the validation set. <pre> </pre></li>
<li>Run the display tree function using the learned model: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#af89e4b67475e2e57039382467fa43747">rf_display</a>(
'<em>rf_table_name</em>');</pre> This will display the trained trees in human-readable format. <pre> </pre></li>
<li>Run the clean tree function as below: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#af33b77b75df225ee65a8acf18705256e">rf_clean</a>(
'<em>rf_table_name</em>');</pre> This will clean up the learned model and all metadata. <pre> </pre></li>
</ul>
</dd></dl>
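<p>Because the trained model is stored in a table with the columns listed above, it can also be inspected with plain SQL. The sketch below counts the nodes of each tree; it assumes a model table named <em>my_rf_model</em> (a hypothetical name) and assumes that <em>tree_location</em> is an array holding the path from the root to the node, as the sample output on this page suggests.</p>
<pre class="fragment">-- my_rf_model is a hypothetical result_rf_table_name.
-- tid identifies the tree; each row is one node of that tree.
SELECT tid,
       count(*) AS num_nodes,
       max(array_upper(tree_location, 1)) AS max_path_length
FROM my_rf_model
GROUP BY tid
ORDER BY tid;
</pre>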
<dl class="section user"><dt>Examples:</dt><dd><ol type="1">
<li>Prepare an input table/view, e.g.: <pre class="fragment">sql&gt; select * from golf_data order by id;
id | outlook | temperature | humidity | windy | class
----+----------+-------------+----------+--------+--------------
1 | sunny | 85 | 85 | false | Do not Play
2 | sunny | 80 | 90 | true | Do not Play
3 | overcast | 83 | 78 | false | Play
4 | rain | 70 | 96 | false | Play
5 | rain | 68 | 80 | false | Play
6 | rain | 65 | 70 | true | Do not Play
7 | overcast | 64 | 65 | true | Play
8 | sunny | 72 | 95 | false | Do not Play
9 | sunny | 69 | 70 | false | Play
10 | rain | 75 | 80 | false | Play
11 | sunny | 75 | 70 | true | Play
12 | overcast | 72 | 90 | true | Play
13 | overcast | 81 | 75 | false | Play
14 | rain | 71 | 80 | true | Do not Play
(14 rows)
</pre></li>
<li>Train the random forest, e.g.: <pre class="fragment">sql&gt; SELECT * FROM MADlib.rf_clean('trained_tree_infogain');
sql&gt; SELECT * FROM MADlib.rf_train(
'infogain', -- split criterion_name
'golf_data', -- input table name
'trained_tree_infogain', -- result tree name
10, -- number of trees
NULL, -- features_per_node
0.632, -- sampling_percentage
'temperature,humidity', -- continuous feature names
'outlook,temperature,humidity,windy', -- feature column names
'id', -- id column name
'class', -- class column name
'explicit', -- how to handle missing value
10, -- max tree depth
0.0, -- node prune threshold
0.0, -- node split threshold
0); -- verbosity
training_time | num_of_samples | num_trees | features_per_node | num_tree_nodes | max_tree_depth | split_criterion | acs_time | acc_time | olap_time | update_time | best_time
----------------+--------------+-----------+-------------------+----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------
00:00:03.60498 | 14 | 10 | 3 | 71 | 6 | infogain | 00:00:00.154991 | 00:00:00.404411 | 00:00:00.736876 | 00:00:00.374084 | 00:00:01.722658
(1 row)
</pre></li>
<li>Check the table records that keep the random forest: <pre class="fragment">sql&gt; select * from trained_tree_infogain order by tid,id;
id | tree_location | feature | probability | ebp_coeff | maxclass | split_gain | live | cat_size | parent_id | lmc_nid | lmc_fval | is_feature_cont | split_value | tid | dp_ids
----+---------------+---------+-------------------+-----------+----------+--------------------+------+----------+-----------+---------+----------+-----------------+-------------+-----+--------
1 | {0} | 3 | 0.777777777777778 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 24 | 1 | f | | 1 |
24 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 1 | | | f | | 1 | {3}
25 | {0,2} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 1 | | | f | | 1 | {3}
26 | {0,3} | 2 | 0.666666666666667 | 1 | 1 | 0.444444444444444 | 0 | 3 | 1 | 42 | 1 | t | 70 | 1 | {3}
42 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 26 | | | f | | 1 |
43 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 2 | 26 | | | f | | 1 |
2 | {0} | 2 | 0.555555555555556 | 1 | 1 | 0.17636684303351 | 0 | 9 | 0 | 11 | 1 | t | 65 | 2 |
11 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 2 | | | f | | 2 |
12 | {0,2} | 4 | 0.714285714285714 | 1 | 1 | 0.217687074829932 | 0 | 7 | 2 | 44 | 1 | f | | 2 |
44 | {0,2,1} | 3 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 12 | 57 | 1 | f | | 2 | {4}
45 | {0,2,2} | 3 | 1 | 1 | 1 | 0 | 0 | 4 | 12 | | | f | | 2 | {4}
57 | {0,2,1,1} | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 44 | | | t | 78 | 2 | {4,3}
58 | {0,2,1,2} | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 44 | | | t | 96 | 2 | {4,3}
59 | {0,2,1,3} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 44 | | | t | 85 | 2 | {4,3}
3 | {0} | 2 | 0.777777777777778 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 27 | 1 | t | 80 | 3 |
27 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 6 | 3 | | | f | | 3 |
28 | {0,2} | 2 | 0.666666666666667 | 1 | 1 | 0.444444444444444 | 0 | 3 | 3 | 46 | 1 | t | 90 | 3 |
46 | {0,2,1} | 4 | 1 | 1 | 1 | 0 | 0 | 2 | 28 | | | f | | 3 |
47 | {0,2,2} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 28 | | | f | | 3 |
4 | {0} | 4 | 0.888888888888889 | 1 | 2 | 0.0493827160493827 | 0 | 9 | 0 | 13 | 1 | f | | 4 |
13 | {0,1} | 3 | 1 | 1 | 2 | 0 | 0 | 6 | 4 | | | f | | 4 | {4}
14 | {0,2} | 3 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 4 | 48 | 1 | f | | 4 | {4}
48 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 14 | | | t | 90 | 4 | {4,3}
49 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 14 | | | t | 80 | 4 | {4,3}
5 | {0} | 2 | 0.888888888888889 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 29 | 1 | t | 90 | 5 |
29 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 8 | 5 | | | f | | 5 |
30 | {0,2} | 3 | 1 | 1 | 1 | 0 | 0 | 1 | 5 | | | f | | 5 |
6 | {0} | 3 | 0.555555555555556 | 1 | 2 | 0.345679012345679 | 0 | 9 | 0 | 15 | 1 | f | | 6 |
15 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 3 | 6 | | | f | | 6 | {3}
16 | {0,2} | 4 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 6 | 51 | 1 | f | | 6 | {3}
17 | {0,3} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 6 | | | f | | 6 | {3}
51 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 16 | | | t | 96 | 6 | {3,4}
52 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 16 | | | t | 70 | 6 | {3,4}
7 | {0} | 4 | 0.666666666666667 | 1 | 2 | 0.253968253968254 | 0 | 9 | 0 | 31 | 1 | f | | 7 |
31 | {0,1} | 2 | 0.857142857142857 | 1 | 2 | 0.102040816326531 | 0 | 7 | 7 | 36 | 1 | t | 80 | 7 | {4}
32 | {0,2} | 3 | 1 | 1 | 1 | 0 | 0 | 2 | 7 | | | f | | 7 | {4}
36 | {0,1,1} | 4 | 1 | 1 | 2 | 0 | 0 | 5 | 31 | | | f | | 7 |
37 | {0,1,2} | 2 | 0.5 | 1 | 2 | 0.5 | 0 | 2 | 31 | 60 | 1 | t | 95 | 7 |
60 | {0,1,2,1} | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 37 | | | f | | 7 |
61 | {0,1,2,2} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 37 | | | f | | 7 |
8 | {0} | 3 | 0.777777777777778 | 1 | 2 | 0.0864197530864197 | 0 | 9 | 0 | 18 | 1 | f | | 8 |
18 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 8 | | | f | | 8 | {3}
19 | {0,2} | 4 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 8 | 38 | 1 | f | | 8 | {3}
20 | {0,3} | 2 | 0.5 | 1 | 2 | 0.5 | 0 | 2 | 8 | 53 | 1 | t | 70 | 8 | {3}
38 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 19 | | | t | 80 | 8 | {3,4}
39 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 19 | | | t | 80 | 8 | {3,4}
53 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 20 | | | f | | 8 |
54 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 20 | | | f | | 8 |
9 | {0} | 3 | 0.555555555555556 | 1 | 2 | 0.327160493827161 | 0 | 9 | 0 | 33 | 1 | f | | 9 |
33 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 9 | | | f | | 9 | {3}
34 | {0,2} | 4 | 0.75 | 1 | 2 | 0.375 | 0 | 4 | 9 | 55 | 1 | f | | 9 | {3}
35 | {0,3} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 9 | | | f | | 9 | {3}
55 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 3 | 34 | | | t | 96 | 9 | {3,4}
56 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 34 | | | t | 70 | 9 | {3,4}
10 | {0} | 3 | 0.666666666666667 | 1 | 2 | 0.277777777777778 | 0 | 9 | 0 | 21 | 1 | f | | 10 |
21 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 10 | | | f | | 10 | {3}
22 | {0,2} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 10 | | | f | | 10 | {3}
23 | {0,3} | 2 | 0.75 | 1 | 1 | 0.375 | 0 | 4 | 10 | 40 | 1 | t | 70 | 10 | {3}
40 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 23 | | | f | | 10 |
41 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 23 | | | f | | 10 |
(60 rows)
</pre></li>
<li>Display the random forest in human-readable format: <pre class="fragment">sql&gt; select * from MADlib.rf_display('trained_tree_infogain');
rf_display
-----------------------------------------------------------------------------------------------------
Tree 1
Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778)
outlook: = overcast : class( Play) num_elements(4) predict_prob(1)
outlook: = rain : class( Play) num_elements(2) predict_prob(1)
outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(0.666666666666667)
humidity: &lt;= 70 : class( Play) num_elements(1) predict_prob(1)
humidity: &gt; 70 : class( Do not Play) num_elements(2) predict_prob(1)
Tree 2
Root Node : class( Do not Play) num_elements(9) predict_prob(0.555555555555556)
humidity: &lt;= 65 : class( Play) num_elements(2) predict_prob(1)
humidity: &gt; 65 : class( Do not Play) num_elements(7) predict_prob(0.714285714285714)
windy: = false : class( Play) num_elements(3) predict_prob(0.666666666666667)
outlook: = overcast : class( Play) num_elements(1) predict_prob(1)
outlook: = rain : class( Play) num_elements(1) predict_prob(1)
outlook: = sunny : class( Do not Play) num_elements(1) predict_prob(1)
windy: = true : class( Do not Play) num_elements(4) predict_prob(1)
Tree 3
Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778)
humidity: &lt;= 80 : class( Play) num_elements(6) predict_prob(1)
humidity: &gt; 80 : class( Do not Play) num_elements(3) predict_prob(0.666666666666667)
humidity: &lt;= 90 : class( Do not Play) num_elements(2) predict_prob(1)
humidity: &gt; 90 : class( Play) num_elements(1) predict_prob(1)
Tree 4
Root Node : class( Play) num_elements(9) predict_prob(0.888888888888889)
windy: = false : class( Play) num_elements(6) predict_prob(1)
windy: = true : class( Play) num_elements(3) predict_prob(0.666666666666667)
outlook: = overcast : class( Play) num_elements(2) predict_prob(1)
outlook: = rain : class( Do not Play) num_elements(1) predict_prob(1)
Tree 5
Root Node : class( Play) num_elements(9) predict_prob(0.888888888888889)
humidity: &lt;= 90 : class( Play) num_elements(8) predict_prob(1)
humidity: &gt; 90 : class( Do not Play) num_elements(1) predict_prob(1)
Tree 6
Root Node : class( Play) num_elements(9) predict_prob(0.555555555555556)
outlook: = overcast : class( Play) num_elements(3) predict_prob(1)
outlook: = rain : class( Play) num_elements(3) predict_prob(0.666666666666667)
windy: = false : class( Play) num_elements(2) predict_prob(1)
windy: = true : class( Do not Play) num_elements(1) predict_prob(1)
outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(1)
Tree 7
Root Node : class( Play) num_elements(9) predict_prob(0.666666666666667)
windy: = false : class( Play) num_elements(7) predict_prob(0.857142857142857)
humidity: &lt;= 80 : class( Play) num_elements(5) predict_prob(1)
humidity: &gt; 80 : class( Play) num_elements(2) predict_prob(0.5)
humidity: &lt;= 95 : class( Do not Play) num_elements(1) predict_prob(1)
humidity: &gt; 95 : class( Play) num_elements(1) predict_prob(1)
windy: = true : class( Do not Play) num_elements(2) predict_prob(1)
Tree 8
Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778)
outlook: = overcast : class( Play) num_elements(4) predict_prob(1)
outlook: = rain : class( Play) num_elements(3) predict_prob(0.666666666666667)
windy: = false : class( Play) num_elements(2) predict_prob(1)
windy: = true : class( Do not Play) num_elements(1) predict_prob(1)
outlook: = sunny : class( Play) num_elements(2) predict_prob(0.5)
humidity: &lt;= 70 : class( Play) num_elements(1) predict_prob(1)
humidity: &gt; 70 : class( Do not Play) num_elements(1) predict_prob(1)
Tree 9
Root Node : class( Play) num_elements(9) predict_prob(0.555555555555556)
outlook: = overcast : class( Play) num_elements(2) predict_prob(1)
outlook: = rain : class( Play) num_elements(4) predict_prob(0.75)
windy: = false : class( Play) num_elements(3) predict_prob(1)
windy: = true : class( Do not Play) num_elements(1) predict_prob(1)
outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(1)
Tree 10
Root Node : class( Play) num_elements(9) predict_prob(0.666666666666667)
outlook: = overcast : class( Play) num_elements(1) predict_prob(1)
outlook: = rain : class( Play) num_elements(4) predict_prob(1)
outlook: = sunny : class( Do not Play) num_elements(4) predict_prob(0.75)
humidity: &lt;= 70 : class( Play) num_elements(1) predict_prob(1)
humidity: &gt; 70 : class( Do not Play) num_elements(3) predict_prob(1)
(10 rows)
</pre></li>
<li>To classify data with the learned model: <pre class="fragment">sql&gt; select * from MADlib.rf_classify(
'trained_tree_infogain', -- name of the trained model
'golf_data', -- name of the table containing data to classify
'classification_result'); -- name of the output table
input_set_size | classification_time
----------------+---------------------
14 | 00:00:02.215017
(1 row)
</pre></li>
<li>Check classification results: <pre class="fragment">sql&gt; select t.id,t.outlook,t.temperature,t.humidity,t.windy,c.class from
classification_result c,golf_data t where t.id=c.id order by id;
id | outlook | temperature | humidity | windy | class
----+----------+-------------+----------+--------+--------------
1 | sunny | 85 | 85 | false | Do not Play
2 | sunny | 80 | 90 | true | Do not Play
3 | overcast | 83 | 78 | false | Play
4 | rain | 70 | 96 | false | Play
5 | rain | 68 | 80 | false | Play
6 | rain | 65 | 70 | true | Do not Play
7 | overcast | 64 | 65 | true | Play
8 | sunny | 72 | 95 | false | Do not Play
9 | sunny | 69 | 70 | false | Play
10 | rain | 75 | 80 | false | Play
11 | sunny | 75 | 70 | true | Do not Play
12 | overcast | 72 | 90 | true | Play
13 | overcast | 81 | 75 | false | Play
14 | rain | 71 | 80 | true | Do not Play
(14 rows)
</pre></li>
<li>Score the data against a validation set: <pre class="fragment">sql&gt; select * from MADlib.rf_score(
'trained_tree_infogain',
'golf_data_validation',
0);
rf_score
-------------------
0.928571428571429
(1 row)
</pre></li>
<li>Clean up the random forest and other auxiliary information: <pre class="fragment">sql&gt; select MADlib.rf_clean('trained_tree_infogain');
rf_clean
----------
t
(1 row)
</pre></li>
</ol>
</dd></dl>
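<p>The ratio of correctly classified items can also be checked by hand. The sketch below is plain SQL rather than a MADlib function; it assumes that <em>classification_result</em> exposes the <em>id</em> and <em>class</em> columns used in the join shown in the examples, and it compares the predictions against the labels in <em>golf_data</em>. In the listings above, 13 of the 14 predicted classes agree with the training labels (row 11 differs).</p>
<pre class="fragment">-- Fraction of rows whose predicted class matches the known label.
SELECT sum(CASE WHEN c.class = t.class THEN 1 ELSE 0 END)::FLOAT8
       / count(*) AS accuracy
FROM classification_result c
JOIN golf_data t ON t.id = c.id;
</pre>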
<dl class="section user"><dt>Literature:</dt><dd></dd></dl>
<p>[1] <a href="http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm">http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm</a></p>
<p>[2] <a href="http://en.wikipedia.org/wiki/Discretization_of_continuous_features">http://en.wikipedia.org/wiki/Discretization_of_continuous_features</a></p>
<dl class="section see"><dt>See Also</dt><dd>File <a class="el" href="rf_8sql__in.html" title="random forest APIs and main control logic written in PL/PGSQL ">rf.sql_in</a> documenting the SQL functions. </dd></dl>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Wed Aug 21 2013 16:09:52 for MADlib by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li>
</ul>
</div>
</body>
</html>