| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> |
| <meta http-equiv="X-UA-Compatible" content="IE=9"/> |
| <meta name="generator" content="Doxygen 1.8.4"/> |
| <title>MADlib: Random Forest</title> |
| <link href="tabs.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="jquery.js"></script> |
| <script type="text/javascript" src="dynsections.js"></script> |
| <link href="navtree.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="resize.js"></script> |
| <script type="text/javascript" src="navtree.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(initResizable); |
| $(window).load(resizeHeight); |
| </script> |
| <link href="search/search.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="search/search.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(function() { searchBox.OnSelectItem(0); }); |
| </script> |
| <script type="text/x-mathjax-config"> |
| MathJax.Hub.Config({ |
| extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], |
| jax: ["input/TeX","output/HTML-CSS"], |
| }); |
| </script><script src="../mathjax/MathJax.js"></script> |
| <link href="doxygen.css" rel="stylesheet" type="text/css" /> |
| </head> |
| <body> |
| <div id="top"><!-- do not remove this div, it is closed by doxygen! --> |
| <div id="titlearea"> |
| <table cellspacing="0" cellpadding="0"> |
| <tbody> |
| <tr style="height: 56px;"> |
| <td style="padding-left: 0.5em;"> |
| <div id="projectname">MADlib |
|  <span id="projectnumber">1.1</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__rf.html"> A newer version is available</a></span> |
| </div> |
| <div id="projectbrief">User Documentation</div> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <!-- end header part --> |
| <!-- Generated by Doxygen 1.8.4 --> |
| <script type="text/javascript"> |
| var searchBox = new SearchBox("searchBox", "search",false,'Search'); |
| </script> |
| <div id="navrow1" class="tabs"> |
| <ul class="tablist"> |
| <li><a href="index.html"><span>Main Page</span></a></li> |
| <li><a href="modules.html"><span>Modules</span></a></li> |
| <li><a href="files.html"><span>Files</span></a></li> |
| <li> |
| <div id="MSearchBox" class="MSearchBoxInactive"> |
| <span class="left"> |
| <img id="MSearchSelect" src="search/mag_sel.png" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| alt=""/> |
| <input type="text" id="MSearchField" value="Search" accesskey="S" |
| onfocus="searchBox.OnSearchFieldFocus(true)" |
| onblur="searchBox.OnSearchFieldFocus(false)" |
| onkeyup="searchBox.OnSearchFieldChange(event)"/> |
| </span><span class="right"> |
| <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a> |
| </span> |
| </div> |
| </li> |
| </ul> |
| </div> |
| </div><!-- top --> |
| <div id="side-nav" class="ui-resizable side-nav-resizable"> |
| <div id="nav-tree"> |
| <div id="nav-tree-contents"> |
| <div id="nav-sync" class="sync"></div> |
| </div> |
| </div> |
| <div id="splitbar" style="-moz-user-select:none;" |
| class="ui-resizable-handle"> |
| </div> |
| </div> |
| <script type="text/javascript"> |
| $(document).ready(function(){initNavTree('group__grp__rf.html','');}); |
| </script> |
| <div id="doc-content"> |
| <!-- window showing the filter options --> |
| <div id="MSearchSelectWindow" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| onkeydown="return searchBox.OnSearchSelectKey(event)"> |
| <a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Groups</a></div> |
| |
| <!-- iframe showing the search results (closed by default) --> |
| <div id="MSearchResultsWindow"> |
| <iframe src="javascript:void(0)" frameborder="0" |
| name="MSearchResults" id="MSearchResults"> |
| </iframe> |
| </div> |
| |
| <div class="header"> |
| <div class="headertitle"> |
| <div class="title">Random Forest<div class="ingroups"><a class="el" href="group__grp__early__stage.html">Early Stage Development</a></div></div> </div> |
| </div><!--header--> |
| <div class="contents"> |
| <div id="dynsection-0" onclick="return toggleVisibility(this)" class="dynheader closed" style="cursor:pointer;"> |
| <img id="dynsection-0-trigger" src="closed.png" alt="+"/> Collaboration diagram for Random Forest:</div> |
| <div id="dynsection-0-summary" class="dynsummary" style="display:block;"> |
| </div> |
| <div id="dynsection-0-content" class="dyncontent" style="display:none;"> |
| <center><table><tr><td><div class="center"><iframe scrolling="no" frameborder="0" src="group__grp__rf.svg" width="363" height="40"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe> |
| </div> |
| </td></tr></table></center> |
| </div> |
| <dl class="section warning"><dt>Warning</dt><dd><em> This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. The interface and implementation are subject to change. </em></dd></dl> |
| <dl class="section user"><dt>About:</dt><dd>A random forest (RF) is an ensemble classifier that consists of many decision trees and outputs the class chosen by a majority vote of the individual trees.</dd></dl> |
| <p>It has the following well-known advantages:</p> |
| <ul> |
| <li>RF generally achieves better accuracy than a single decision tree.</li> |
| <li>It can be very efficient on large data sets, because the trees of an RF can be trained in parallel.</li> |
| <li>It can handle thousands of input attributes without attribute deletion.</li> |
| </ul> |
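| <p>As a minimal sketch of the majority vote described above (not the module's internal implementation), the following plain PostgreSQL query picks the most frequent class per data point. The table <em>tree_votes</em> and its columns are hypothetical and exist only for illustration; ties are broken alphabetically here, which is an arbitrary choice.</p> |
| <pre class="fragment">-- Hypothetical table: one row per (data point, tree) holding that tree's prediction. |
| CREATE TEMP TABLE tree_votes (id INT, tid INT, predicted_class TEXT); |
| INSERT INTO tree_votes VALUES |
| (1, 1, 'Play'), (1, 2, 'Do not Play'), (1, 3, 'Play'), |
| (2, 1, 'Do not Play'), (2, 2, 'Do not Play'), (2, 3, 'Play'); |
| |
| -- Count votes per (data point, class) and keep the class with the most votes. |
| SELECT DISTINCT ON (id) id, predicted_class, count(*) AS votes |
| FROM tree_votes |
| GROUP BY id, predicted_class |
| ORDER BY id, count(*) DESC, predicted_class; |
| </pre> |
| <p>With the sample rows, data point 1 resolves to Play and data point 2 to Do not Play, each with two of three votes.</p> |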
| <p>This module provides an implementation of the random forest algorithm described in [1].</p> |
| <p>The implementation supports:</p> |
| <ul> |
| <li>Building random forests</li> |
| <li>Multiple split criteria, including information gain, Gini coefficient, and gain ratio</li> |
| <li>Random forest Classification/Scoring</li> |
| <li>Random forest Display</li> |
| <li>Continuous and Discrete features</li> |
| <li>Equal frequency discretization for continuous features</li> |
| <li>Missing value handling</li> |
| <li>Sampling with replacement</li> |
| </ul> |
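| <p>For reference, the three split criteria are commonly defined as follows. This is a sketch in textbook notation, and the module's exact formulation may differ. Here \(S\) is the set of training rows at a node, \(p_c\) is the fraction of rows in \(S\) that belong to class \(c\), and \(S_v\) is the subset of \(S\) sent to child \(v\) by a candidate split \(A\).</p> |
| <p class="formulaDsp"> |
| \[ H(S) = -\sum_{c} p_c \log_2 p_c, \qquad \mathrm{InfoGain}(S, A) = H(S) - \sum_{v} \frac{\lvert S_v \rvert}{\lvert S \rvert} \, H(S_v) \] |
| </p> |
| <p class="formulaDsp"> |
| \[ G(S) = 1 - \sum_{c} p_c^2, \qquad \mathrm{GiniGain}(S, A) = G(S) - \sum_{v} \frac{\lvert S_v \rvert}{\lvert S \rvert} \, G(S_v) \] |
| </p> |
| <p class="formulaDsp"> |
| \[ \mathrm{GainRatio}(S, A) = \frac{\mathrm{InfoGain}(S, A)}{-\sum_{v} \frac{\lvert S_v \rvert}{\lvert S \rvert} \log_2 \frac{\lvert S_v \rvert}{\lvert S \rvert}} \] |
| </p> |
| <p>For example, a node holding two rows of one class and one row of another has \(G(S) = 1 - (4/9 + 1/9) = 4/9\), and a split that separates the classes completely yields a Gini gain equal to that value.</p> |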
| <dl class="section user"><dt>Input:</dt><dd></dd></dl> |
| <p>The <b>training data</b> is expected to be of the following form: </p> |
| <pre>{TABLE|VIEW} <em>trainingSource</em> ( |
| ... |
| <em>id</em> INT|BIGINT, |
| <em>feature1</em> SUPPORTED_DATA_TYPE, |
| <em>feature2</em> SUPPORTED_DATA_TYPE, |
| <em>feature3</em> SUPPORTED_DATA_TYPE, |
| .................... |
| <em>featureN</em> SUPPORTED_DATA_TYPE, |
| <em>class</em> SUPPORTED_DATA_TYPE, |
| ... |
| )</pre><p>The supported data types (SUPPORTED_DATA_TYPE) are: SMALLINT, INT, BIGINT, FLOAT8, REAL, DECIMAL, INET, CIDR, MACADDR, BOOLEAN, CHAR, VARCHAR, TEXT, "char", DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, and INTERVAL.</p> |
| <p>The <b>data to classify</b> is expected to be of the same form as <b>training data</b>, except that it does not need a class column.</p> |
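| <p>For instance, a training table for the golf data used in the Examples section could be declared as in the following sketch. The column types are assumptions chosen from the supported list above; the examples do not state the types actually used.</p> |
| <pre class="fragment">-- Assumed column types; any SUPPORTED_DATA_TYPE may be used for features and class. |
| CREATE TABLE golf_data ( |
| id INT, -- unique row identifier |
| outlook TEXT, -- discrete feature |
| temperature FLOAT8, -- continuous feature |
| humidity FLOAT8, -- continuous feature |
| windy TEXT, -- discrete feature ('true' / 'false') |
| class TEXT -- class label; not needed in data to classify |
| ); |
| |
| -- First three rows of the example data set. |
| INSERT INTO golf_data VALUES |
| (1, 'sunny', 85, 85, 'false', 'Do not Play'), |
| (2, 'sunny', 80, 90, 'true', 'Do not Play'), |
| (3, 'overcast', 83, 78, 'false', 'Play'); |
| </pre> |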
| <dl class="section user"><dt>Usage:</dt><dd><ul> |
| <li>Run the training algorithm on the source data: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a3981c021e89c0c5f40ab436d96848845">rf_train</a>( |
| '<em>split_criterion</em>', |
| '<em>training_table_name</em>', |
| '<em>result_rf_table_name</em>', |
| '<em>num_trees</em>', |
| '<em>features_per_node</em>', |
| '<em>sampling_percentage</em>', |
| '<em>continuous_feature_names</em>', |
| '<em>feature_col_names</em>', |
| '<em>id_col_name</em>', |
| '<em>class_col_name</em>', |
| '<em>how2handle_missing_value</em>', |
| '<em>max_tree_depth</em>', |
| '<em>node_prune_threshold</em>', |
| '<em>node_split_threshold</em>', |
| '<em>verbosity</em>'); |
| </pre> This creates the result table (<em>result_rf_table_name</em>), which stores an abstract object (representing the model) used for further classification. Column names: <pre> |
| id | tree_location | feature | probability | ebp_coeff | maxclass | split_gain | live | cat_size | parent_id | lmc_nid | lmc_fval | is_feature_cont | split_value | tid | dp_ids |
| ----+---------------+---------+-------------------+------------------+----------+-------------------+------+----------+-----------+---------+----------+-----------------+-------------+-----+-------- |
| ...</pre></li> |
| <li>Run the classification function using the learned model: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a57cd1d51be539e0da4fff351f8b477fe">rf_classify</a>( |
| '<em>rf_table_name</em>', |
| '<em>classification_table_name</em>', |
| '<em>result_table_name</em>');</pre> This creates the table <em>result_table_name</em>, which holds the classification results. <pre> </pre></li> |
| <li>Run the scoring function to score the learned model against a validation data set: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#a9fd5da138e06924e89541ce4035ce8e1">rf_score</a>( |
| '<em>rf_table_name</em>', |
| '<em>validation_table_name</em>', |
| '<em>verbosity</em>');</pre> This returns the fraction of correctly classified items in the validation set. <pre> </pre></li> |
| <li>Run the display tree function using the learned model: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#af89e4b67475e2e57039382467fa43747">rf_display</a>( |
| '<em>rf_table_name</em>');</pre> This will display the trained trees in a human-readable format. <pre> </pre></li> |
| <li>Run the clean tree function as below: <pre>SELECT * FROM <a class="el" href="rf_8sql__in.html#af33b77b75df225ee65a8acf18705256e">rf_clean</a>( |
| '<em>rf_table_name</em>');</pre> This will clean up the learned model and all metadata. <pre> </pre></li> |
| </ul> |
| </dd></dl> |
| <dl class="section user"><dt>Examples:</dt><dd><ol type="1"> |
| <li>Prepare an input table/view, e.g.: <pre class="fragment">sql> select * from golf_data order by id; |
| id | outlook | temperature | humidity | windy | class |
| ----+----------+-------------+----------+--------+-------------- |
| 1 | sunny | 85 | 85 | false | Do not Play |
| 2 | sunny | 80 | 90 | true | Do not Play |
| 3 | overcast | 83 | 78 | false | Play |
| 4 | rain | 70 | 96 | false | Play |
| 5 | rain | 68 | 80 | false | Play |
| 6 | rain | 65 | 70 | true | Do not Play |
| 7 | overcast | 64 | 65 | true | Play |
| 8 | sunny | 72 | 95 | false | Do not Play |
| 9 | sunny | 69 | 70 | false | Play |
| 10 | rain | 75 | 80 | false | Play |
| 11 | sunny | 75 | 70 | true | Play |
| 12 | overcast | 72 | 90 | true | Play |
| 13 | overcast | 81 | 75 | false | Play |
| 14 | rain | 71 | 80 | true | Do not Play |
| (14 rows) |
| </pre></li> |
| <li>Train the random forest, e.g.: <pre class="fragment">sql> SELECT * FROM MADlib.rf_clean('trained_tree_infogain'); |
| sql> SELECT * FROM MADlib.rf_train( |
| 'infogain', -- split criterion_name |
| 'golf_data', -- input table name |
| 'trained_tree_infogain', -- result tree name |
| 10, -- number of trees |
| NULL, -- features_per_node |
| 0.632, -- sampling_percentage |
| 'temperature,humidity', -- continuous feature names |
| 'outlook,temperature,humidity,windy', -- feature column names |
| 'id', -- id column name |
| 'class', -- class column name |
| 'explicit', -- how to handle missing value |
| 10, -- max tree depth |
| 0.0, -- node prune threshold |
| 0.0, -- node split threshold |
| 0); -- verbosity |
| training_time | num_of_samples | num_trees | features_per_node | num_tree_nodes | max_tree_depth | split_criterion | acs_time | acc_time | olap_time | update_time | best_time |
| ----------------+--------------+-----------+-------------------+----------------+----------------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------- |
| 00:00:03.60498 | 14 | 10 | 3 | 71 | 6 | infogain | 00:00:00.154991 | 00:00:00.404411 | 00:00:00.736876 | 00:00:00.374084 | 00:00:01.722658 |
| (1 row) |
| </pre></li> |
| <li>Check the rows of the table that stores the random forest: <pre class="fragment">sql> select * from trained_tree_infogain order by tid,id; |
| id | tree_location | feature | probability | ebp_coeff | maxclass | split_gain | live | cat_size | parent_id | lmc_nid | lmc_fval | is_feature_cont | split_value | tid | dp_ids |
| ----+---------------+---------+-------------------+-----------+----------+--------------------+------+----------+-----------+---------+----------+-----------------+-------------+-----+-------- |
| 1 | {0} | 3 | 0.777777777777778 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 24 | 1 | f | | 1 | |
| 24 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 1 | | | f | | 1 | {3} |
| 25 | {0,2} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 1 | | | f | | 1 | {3} |
| 26 | {0,3} | 2 | 0.666666666666667 | 1 | 1 | 0.444444444444444 | 0 | 3 | 1 | 42 | 1 | t | 70 | 1 | {3} |
| 42 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 26 | | | f | | 1 | |
| 43 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 2 | 26 | | | f | | 1 | |
| 2 | {0} | 2 | 0.555555555555556 | 1 | 1 | 0.17636684303351 | 0 | 9 | 0 | 11 | 1 | t | 65 | 2 | |
| 11 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 2 | | | f | | 2 | |
| 12 | {0,2} | 4 | 0.714285714285714 | 1 | 1 | 0.217687074829932 | 0 | 7 | 2 | 44 | 1 | f | | 2 | |
| 44 | {0,2,1} | 3 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 12 | 57 | 1 | f | | 2 | {4} |
| 45 | {0,2,2} | 3 | 1 | 1 | 1 | 0 | 0 | 4 | 12 | | | f | | 2 | {4} |
| 57 | {0,2,1,1} | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 44 | | | t | 78 | 2 | {4,3} |
| 58 | {0,2,1,2} | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 44 | | | t | 96 | 2 | {4,3} |
| 59 | {0,2,1,3} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 44 | | | t | 85 | 2 | {4,3} |
| 3 | {0} | 2 | 0.777777777777778 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 27 | 1 | t | 80 | 3 | |
| 27 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 6 | 3 | | | f | | 3 | |
| 28 | {0,2} | 2 | 0.666666666666667 | 1 | 1 | 0.444444444444444 | 0 | 3 | 3 | 46 | 1 | t | 90 | 3 | |
| 46 | {0,2,1} | 4 | 1 | 1 | 1 | 0 | 0 | 2 | 28 | | | f | | 3 | |
| 47 | {0,2,2} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 28 | | | f | | 3 | |
| 4 | {0} | 4 | 0.888888888888889 | 1 | 2 | 0.0493827160493827 | 0 | 9 | 0 | 13 | 1 | f | | 4 | |
| 13 | {0,1} | 3 | 1 | 1 | 2 | 0 | 0 | 6 | 4 | | | f | | 4 | {4} |
| 14 | {0,2} | 3 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 4 | 48 | 1 | f | | 4 | {4} |
| 48 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 14 | | | t | 90 | 4 | {4,3} |
| 49 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 14 | | | t | 80 | 4 | {4,3} |
| 5 | {0} | 2 | 0.888888888888889 | 1 | 2 | 0.197530864197531 | 0 | 9 | 0 | 29 | 1 | t | 90 | 5 | |
| 29 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 8 | 5 | | | f | | 5 | |
| 30 | {0,2} | 3 | 1 | 1 | 1 | 0 | 0 | 1 | 5 | | | f | | 5 | |
| 6 | {0} | 3 | 0.555555555555556 | 1 | 2 | 0.345679012345679 | 0 | 9 | 0 | 15 | 1 | f | | 6 | |
| 15 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 3 | 6 | | | f | | 6 | {3} |
| 16 | {0,2} | 4 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 6 | 51 | 1 | f | | 6 | {3} |
| 17 | {0,3} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 6 | | | f | | 6 | {3} |
| 51 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 16 | | | t | 96 | 6 | {3,4} |
| 52 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 16 | | | t | 70 | 6 | {3,4} |
| 7 | {0} | 4 | 0.666666666666667 | 1 | 2 | 0.253968253968254 | 0 | 9 | 0 | 31 | 1 | f | | 7 | |
| 31 | {0,1} | 2 | 0.857142857142857 | 1 | 2 | 0.102040816326531 | 0 | 7 | 7 | 36 | 1 | t | 80 | 7 | {4} |
| 32 | {0,2} | 3 | 1 | 1 | 1 | 0 | 0 | 2 | 7 | | | f | | 7 | {4} |
| 36 | {0,1,1} | 4 | 1 | 1 | 2 | 0 | 0 | 5 | 31 | | | f | | 7 | |
| 37 | {0,1,2} | 2 | 0.5 | 1 | 2 | 0.5 | 0 | 2 | 31 | 60 | 1 | t | 95 | 7 | |
| 60 | {0,1,2,1} | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 37 | | | f | | 7 | |
| 61 | {0,1,2,2} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 37 | | | f | | 7 | |
| 8 | {0} | 3 | 0.777777777777778 | 1 | 2 | 0.0864197530864197 | 0 | 9 | 0 | 18 | 1 | f | | 8 | |
| 18 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 8 | | | f | | 8 | {3} |
| 19 | {0,2} | 4 | 0.666666666666667 | 1 | 2 | 0.444444444444444 | 0 | 3 | 8 | 38 | 1 | f | | 8 | {3} |
| 20 | {0,3} | 2 | 0.5 | 1 | 2 | 0.5 | 0 | 2 | 8 | 53 | 1 | t | 70 | 8 | {3} |
| 38 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 2 | 19 | | | t | 80 | 8 | {3,4} |
| 39 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 19 | | | t | 80 | 8 | {3,4} |
| 53 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 20 | | | f | | 8 | |
| 54 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 20 | | | f | | 8 | |
| 9 | {0} | 3 | 0.555555555555556 | 1 | 2 | 0.327160493827161 | 0 | 9 | 0 | 33 | 1 | f | | 9 | |
| 33 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 2 | 9 | | | f | | 9 | {3} |
| 34 | {0,2} | 4 | 0.75 | 1 | 2 | 0.375 | 0 | 4 | 9 | 55 | 1 | f | | 9 | {3} |
| 35 | {0,3} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 9 | | | f | | 9 | {3} |
| 55 | {0,2,1} | 2 | 1 | 1 | 2 | 0 | 0 | 3 | 34 | | | t | 96 | 9 | {3,4} |
| 56 | {0,2,2} | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 34 | | | t | 70 | 9 | {3,4} |
| 10 | {0} | 3 | 0.666666666666667 | 1 | 2 | 0.277777777777778 | 0 | 9 | 0 | 21 | 1 | f | | 10 | |
| 21 | {0,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 10 | | | f | | 10 | {3} |
| 22 | {0,2} | 4 | 1 | 1 | 2 | 0 | 0 | 4 | 10 | | | f | | 10 | {3} |
| 23 | {0,3} | 2 | 0.75 | 1 | 1 | 0.375 | 0 | 4 | 10 | 40 | 1 | t | 70 | 10 | {3} |
| 40 | {0,3,1} | 4 | 1 | 1 | 2 | 0 | 0 | 1 | 23 | | | f | | 10 | |
| 41 | {0,3,2} | 4 | 1 | 1 | 1 | 0 | 0 | 3 | 23 | | | f | | 10 | |
| (60 rows) |
| </pre></li> |
| <li>Display the random forest in a human-readable format: <pre class="fragment">sql> select * from MADlib.rf_display('trained_tree_infogain'); |
| rf_display |
| ----------------------------------------------------------------------------------------------------- |
| |
| Tree 1 |
| Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778) |
| outlook: = overcast : class( Play) num_elements(4) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(2) predict_prob(1) |
| outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(0.666666666666667) |
| humidity: <= 70 : class( Play) num_elements(1) predict_prob(1) |
| humidity: > 70 : class( Do not Play) num_elements(2) predict_prob(1) |
| |
| |
| Tree 2 |
| Root Node : class( Do not Play) num_elements(9) predict_prob(0.555555555555556) |
| humidity: <= 65 : class( Play) num_elements(2) predict_prob(1) |
| humidity: > 65 : class( Do not Play) num_elements(7) predict_prob(0.714285714285714) |
| windy: = false : class( Play) num_elements(3) predict_prob(0.666666666666667) |
| outlook: = overcast : class( Play) num_elements(1) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(1) predict_prob(1) |
| outlook: = sunny : class( Do not Play) num_elements(1) predict_prob(1) |
| windy: = true : class( Do not Play) num_elements(4) predict_prob(1) |
| |
| |
| Tree 3 |
| Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778) |
| humidity: <= 80 : class( Play) num_elements(6) predict_prob(1) |
| humidity: > 80 : class( Do not Play) num_elements(3) predict_prob(0.666666666666667) |
| humidity: <= 90 : class( Do not Play) num_elements(2) predict_prob(1) |
| humidity: > 90 : class( Play) num_elements(1) predict_prob(1) |
| |
| |
| Tree 4 |
| Root Node : class( Play) num_elements(9) predict_prob(0.888888888888889) |
| windy: = false : class( Play) num_elements(6) predict_prob(1) |
| windy: = true : class( Play) num_elements(3) predict_prob(0.666666666666667) |
| outlook: = overcast : class( Play) num_elements(2) predict_prob(1) |
| outlook: = rain : class( Do not Play) num_elements(1) predict_prob(1) |
| |
| |
| Tree 5 |
| Root Node : class( Play) num_elements(9) predict_prob(0.888888888888889) |
| humidity: <= 90 : class( Play) num_elements(8) predict_prob(1) |
| humidity: > 90 : class( Do not Play) num_elements(1) predict_prob(1) |
| |
| |
| Tree 6 |
| Root Node : class( Play) num_elements(9) predict_prob(0.555555555555556) |
| outlook: = overcast : class( Play) num_elements(3) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(3) predict_prob(0.666666666666667) |
| windy: = false : class( Play) num_elements(2) predict_prob(1) |
| windy: = true : class( Do not Play) num_elements(1) predict_prob(1) |
| outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(1) |
| |
| |
| Tree 7 |
| Root Node : class( Play) num_elements(9) predict_prob(0.666666666666667) |
| windy: = false : class( Play) num_elements(7) predict_prob(0.857142857142857) |
| humidity: <= 80 : class( Play) num_elements(5) predict_prob(1) |
| humidity: > 80 : class( Play) num_elements(2) predict_prob(0.5) |
| humidity: <= 95 : class( Do not Play) num_elements(1) predict_prob(1) |
| humidity: > 95 : class( Play) num_elements(1) predict_prob(1) |
| windy: = true : class( Do not Play) num_elements(2) predict_prob(1) |
| |
| |
| Tree 8 |
| Root Node : class( Play) num_elements(9) predict_prob(0.777777777777778) |
| outlook: = overcast : class( Play) num_elements(4) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(3) predict_prob(0.666666666666667) |
| windy: = false : class( Play) num_elements(2) predict_prob(1) |
| windy: = true : class( Do not Play) num_elements(1) predict_prob(1) |
| outlook: = sunny : class( Play) num_elements(2) predict_prob(0.5) |
| humidity: <= 70 : class( Play) num_elements(1) predict_prob(1) |
| humidity: > 70 : class( Do not Play) num_elements(1) predict_prob(1) |
| |
| |
| Tree 9 |
| Root Node : class( Play) num_elements(9) predict_prob(0.555555555555556) |
| outlook: = overcast : class( Play) num_elements(2) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(4) predict_prob(0.75) |
| windy: = false : class( Play) num_elements(3) predict_prob(1) |
| windy: = true : class( Do not Play) num_elements(1) predict_prob(1) |
| outlook: = sunny : class( Do not Play) num_elements(3) predict_prob(1) |
| |
| |
| Tree 10 |
| Root Node : class( Play) num_elements(9) predict_prob(0.666666666666667) |
| outlook: = overcast : class( Play) num_elements(1) predict_prob(1) |
| outlook: = rain : class( Play) num_elements(4) predict_prob(1) |
| outlook: = sunny : class( Do not Play) num_elements(4) predict_prob(0.75) |
| humidity: <= 70 : class( Play) num_elements(1) predict_prob(1) |
| humidity: > 70 : class( Do not Play) num_elements(3) predict_prob(1) |
| |
| (10 rows) |
| </pre></li> |
| <li>Classify data with the learned model: <pre class="fragment">sql> select * from MADlib.rf_classify( |
| 'trained_tree_infogain', -- name of the trained model |
| 'golf_data', -- name of the table containing data to classify |
| 'classification_result'); -- name of the output table |
| input_set_size | classification_time |
| ----------------+--------------------- |
| 14 | 00:00:02.215017 |
| (1 row) |
| </pre></li> |
| <li>Check classification results: <pre class="fragment">sql> select t.id,t.outlook,t.temperature,t.humidity,t.windy,c.class from |
| classification_result c,golf_data t where t.id=c.id order by id; |
| id | outlook | temperature | humidity | windy | class |
| ----+----------+-------------+----------+--------+-------------- |
| 1 | sunny | 85 | 85 | false | Do not Play |
| 2 | sunny | 80 | 90 | true | Do not Play |
| 3 | overcast | 83 | 78 | false | Play |
| 4 | rain | 70 | 96 | false | Play |
| 5 | rain | 68 | 80 | false | Play |
| 6 | rain | 65 | 70 | true | Do not Play |
| 7 | overcast | 64 | 65 | true | Play |
| 8 | sunny | 72 | 95 | false | Do not Play |
| 9 | sunny | 69 | 70 | false | Play |
| 10 | rain | 75 | 80 | false | Play |
| 11 | sunny | 75 | 70 | true | Do not Play |
| 12 | overcast | 72 | 90 | true | Play |
| 13 | overcast | 81 | 75 | false | Play |
| 14 | rain | 71 | 80 | true | Do not Play |
| (14 rows) |
| </pre></li> |
| <li>Score the data against a validation set: <pre class="fragment">sql> select * from MADlib.rf_score( |
| 'trained_tree_infogain', |
| 'golf_data_validation', |
| 0); |
| rf_score |
| ------------------- |
| 0.928571428571429 |
| (1 row) |
| </pre></li> |
| <li>Clean up the random forest and other auxiliary information: <pre class="fragment">sql> select MADlib.rf_clean('trained_tree_infogain'); |
| rf_clean |
| ---------- |
| t |
| (1 row) |
| </pre></li> |
| </ol> |
| </dd></dl> |
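| <p>As a rough cross-check on the score, the fraction of matching labels can also be computed by hand from the classification output. This sketch assumes, as in the query of the classification-results step above, that <em>classification_result</em> has <em>id</em> and <em>class</em> columns. It compares against <em>golf_data</em> rather than the <em>golf_data_validation</em> table passed to rf_score.</p> |
| <pre class="fragment">-- Fraction of rows whose predicted class equals the class in the source table. |
| SELECT sum(CASE WHEN c.class = t.class THEN 1 ELSE 0 END)::FLOAT8 / count(*) AS accuracy |
| FROM classification_result c |
| JOIN golf_data t ON t.id = c.id; |
| </pre> |
| <p>For the 14 rows listed in the examples, 13 predictions match the original labels (only row 11 differs), which gives 13/14, or about 0.9286.</p> |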
| <dl class="section user"><dt>Literature:</dt><dd></dd></dl> |
| <p>[1] <a href="http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm">http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm</a></p> |
| <p>[2] <a href="http://en.wikipedia.org/wiki/Discretization_of_continuous_features">http://en.wikipedia.org/wiki/Discretization_of_continuous_features</a></p> |
| <dl class="section see"><dt>See Also</dt><dd>File <a class="el" href="rf_8sql__in.html" title="random forest APIs and main control logic written in PL/PGSQL ">rf.sql_in</a> documenting the SQL functions. </dd></dl> |
| </div><!-- contents --> |
| </div><!-- doc-content --> |
| <!-- start footer part --> |
| <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> |
| <ul> |
| <li class="footer">Generated on Wed Aug 21 2013 16:09:52 for MADlib by |
| <a href="http://www.doxygen.org/index.html"> |
| <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li> |
| </ul> |
| </div> |
| </body> |
| </html> |