| /* ----------------------------------------------------------------------- *//** |
| * |
| * @file assoc_rules.sql_in |
| * |
| * @brief The \ref assoc_rules function computes association rules for a given |
| * set of data. The data is assumed to have two dimensions; items (between which |
| * we are trying to discover associations), and a transaction id. This tranaction |
| * id groups the items by event and could also be a user id, date, etc. depending |
| * on the context of the data. This function assumes the data is stored in two |
| * columns with one transaction id and one item per row. |
| |
| * @date June 2011 |
| * @date August 2012 |
| * |
| * @sa For a brief introduction to the association rules implementation, see the module |
| * description \ref grp_assoc_rules. |
| * |
| *//* ----------------------------------------------------------------------- */ |
| |
| m4_include(`SQLCommon.m4') |
| |
| |
| /** |
| @addtogroup grp_assoc_rules |
| |
| <div class="toc"><b>Contents</b> |
| <ul> |
| <li><a href="#rules">Rules</a></li> |
| <li><a href="#algorithm">Apriori Algorithm</a></li> |
| <li><a href="#syntax">Function Syntax</a></li> |
| <li><a href="#examples">Examples</a></li> |
| <li><a href="#notes">Notes</a></li> |
| <li><a href="#literature">Literature</a></li> |
| <li><a href="#related">Related Topics</a></li> |
| </ul> |
| </div> |
| |
| @brief Computes association rules for a given set of data. |
| |
| This module implements the association rules data mining technique on a |
| transactional data set. Given the names of a table and the columns, minimum |
| support and confidence values, this function generates all single and |
| multidimensional association rules that meet the minimum thresholds. |
| |
| Association rule mining is a widely used technique for discovering relationships |
| between variables in a large data set (e.g., items in a store that are commonly |
| purchased together). The classic market basket analysis example using |
| association rules is the "beer and diapers" rule. According to data mining urban |
| legend, a study of customer purchase behavior in a supermarket found that men |
| often purchased beer and diapers together. After making this discovery, the |
| managers strategically placed beer and diapers closer together on the shelves |
| and saw a dramatic increase in sales. In addition to market basket analysis, |
| association rules are also used in bioinformatics, web analytics, and several |
| other fields. |
| |
| This type of data mining algorithm uses transactional data. Every transaction |
| event has a unique identifier, and each transaction consists of a set of |
| items (or itemset). Purchases are considered binary (either it was purchased or |
| not), and this implementation does not take into consideration the quantity of |
| each item. For the MADlib association rules function, it is assumed that the |
| data is stored in two columns with one item and transaction id per row. |
| Transactions with multiple items will span multiple rows with one row per item. |
| |
| <pre> |
| trans_id | product |
| ---------+--------- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 3 | 1 |
| 3 | 4 |
| 3 | 6 |
| ... |
| </pre> |
| |
| @anchor rules |
| @par Rules |
| |
| Association rules take the form "If X, then Y", where X and Y are non-empty |
| itemsets. X and Y are called the antecedent and consequent, or the left-hand-side |
| and right-hand-side, of the rule respectively. Using our previous example, |
| the association rule may state "If {diapers}, then {beer}" with .2 support and |
| .85 confidence. |
| |
| The following metrics are defined for any given itemset "X". |
| - Count: The number of transactions that contain X |
| |
| - Support: The ratio of transactions that contain X to all transactions, T |
| \f[ |
| S (X) = \frac{Total X}{Total transactions} |
| \f] |
| |
| Given any association rule "If X, then Y", the association rules function will |
| also calculate the following metrics: |
| - Count: The number of transactions that contain X,Y |
| |
| - Support: The ratio of transactions that contain X,Y to all transactions, T |
| \f[ |
| S (X \Rightarrow Y) = \frac{Total(X \cup Y)}{Total transactions} |
| \f] |
| |
| - Confidence: The ratio of transactions that contain \f$ X,Y \f$ to |
| transactions that contain \f$ X \f$. One could view this metric as the |
| conditional probability of \f$ Y \f$ , given \f$ X \f$ . \f$ P(Y|X) \f$ |
| \f[ |
| C (X \Rightarrow Y) = \frac{s(X \cap Y )}{s(X)} |
| \f] |
| |
| - Lift: The ratio of observed support of \f$ X,Y \f$ to the expected support of |
| \f$ X,Y \f$ , assuming \f$ X \f$ and \f$ Y \f$ are independent. |
| \f[ |
| L (X \Rightarrow Y) = \frac{s(X \cap Y )}{s(X) \cdot s(Y)} |
| \f] |
| |
| - Conviction: The ratio of expected support of \f$ X \f$ occurring without |
| \f$ Y \f$ assuming \f$ X \f$ and \f$ \neg Y \f$ are independent, to the |
| observed support of \f$ X \f$ occuring without \f$ Y \f$. If conviction is |
| greater than 1, then this metric shows that incorrect predictions ( \f$ X |
| \Rightarrow Y \f$ ) occur less often than if these two actions were independent. |
| This metric can be viewed as the ratio that the association rule would be |
| incorrect if the actions were independent (i.e. a conviction of 1.5 indicates |
| that if the variables were independent, this rule would be incorrect 50% more |
| often.) |
| |
| \f[ |
| Conv (X \Rightarrow Y) = \frac{1 - S(Y)}{1 - C(X \Rightarrow Y)} |
| \f] |
| |
| @anchor algorithm |
| @par Apriori Algorithm |
| |
| Although there are many algorithms that generate association rules, the classic |
| algorithm is called Apriori [1] which we have implemented in this module. It is a |
| breadth-first search, as opposed to depth-first searches like Eclat. Frequent |
| itemsets of order \f$ n \f$ are generated from sets of order \f$ n - 1 \f$. |
| Using the downward closure property, all sets must have frequent subsets. There |
| are two steps in this algorithm; generating frequent itemsets, and using these |
| itemsets to construct the association rules. A simplified version of the |
| algorithm is as follows, and assumes a minimum level of support and confidence |
| is provided: |
| |
| \e Initial \e step |
| -# Generate all itemsets of order 1. |
| -# Eliminate itemsets that have support less than minimum support. |
| |
| \e Main \e algorithm |
| -# For \f$ n \ge 2 \f$, generate itemsets of order \f$ n \f$ by combining the |
| itemsets of order \f$ n - 1 \f$. |
| This is done by doing the union of two itemsets that have identical items except one. |
| -# Eliminate itemsets that have (n-1) order subsets with insufficient support. |
| -# Eliminate itemsets with insufficient support. |
| -# Repeat until itemsets cannot be generated, or maximum itemset size is exceeded. |
| |
| \e Association \e rule \e generation |
| |
| Given a frequent itemset \f$ A \f$ generated from the Apriori algorithm, and all |
| subsets \f$ B \f$ , we generate rules such that \f$ B \Rightarrow (A - B) \f$ |
| meets minimum confidence requirements. |
| |
| @note Beware of combinatorial explosion. The Apriori algorithm can potentially |
| generate a huge number of rules, even for fairly simple data sets, resulting |
| in run times that are unreasonably long. To avoid this, it is recommended |
| to cap the maximum itemset size to a small number to start with, then |
| increase it gradually. Similarly, <em>max_LHS_size</em> and <em>max_RHS_size</em> |
| limit the number of items on the LHS and RHS of the rules |
| and can significantly reduce run times. |
| <em>Support</em> and <em>confidence</em> values are |
| parameters that can also be used to control rule generation. |
| |
| @anchor syntax |
| @par Function Syntax |
| Association rules has the following syntax: |
| <pre class="syntax"> |
| assoc_rules( support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| verbose, |
| max_itemset_size, |
| max_LHS_size, |
| max_RHS_size |
| );</pre> |
| This generates all association rules that satisfy the specified minimum |
| <em>support</em> and <em>confidence</em>. |
| |
| \b Arguments |
| <dl class="arglist"> |
| <dt>support</dt> |
| <dd>Minimum level of support needed for each itemset to be included in result.</dd> |
| |
| <dt>confidence</dt> |
| <dd>Minimum level of confidence needed for each rule to be included in result.</dd> |
| |
| <dt>tid_col</dt> |
| <dd>Name of the column storing the transaction ids.</dd> |
| |
| <dt>item_col</dt> |
| <dd>Name of the column storing the products.</dd> |
| |
| <dt>input_table</dt> |
| <dd>Name of the table containing the input data. |
| |
| The input data is expected to be of the following form: |
| <pre>{TABLE|VIEW} <em>input_table</em> ( |
| <em>trans_id</em> INTEGER, |
| <em>product</em> TEXT |
| )</pre> |
| |
| The algorithm maps the product names to consecutive integer ids starting at 1. |
| If they are already structured this way, then the ids will not change. |
| </dd> |
| |
| <dt>output_schema</dt> |
| <dd>The name of the schema where the final results will be stored. |
| The schema must be created before calling the function. Alternatively, use |
| <tt>NULL</tt> to output to the current schema. |
| |
| The results containing the rules, support, count, confidence, lift, and |
| conviction are stored in the table \c assoc_rules in the schema |
| specified by \c output_schema. |
| |
| The table has the following columns. |
| <table class="output"> |
| <tr> |
| <th>ruleid</th> |
| <td>integer</td> |
| </tr> |
| <tr> |
| <th>pre</th> |
| <td>text</td> |
| </tr> |
| <tr> |
| <th>post</th> |
| <td>text</td> |
| </tr> |
| <tr> |
| <th>count</th> |
| <td>integer</td> |
| </tr> |
| <tr> |
| <th>support</th> |
| <td>double</td> |
| </tr> |
| <tr> |
| <th>confidence</th> |
| <td>double</td> |
| </tr> |
| <tr> |
| <th>lift</th> |
| <td>double</td> |
| </tr> |
| <tr> |
| <th>conviction</th> |
| <td>double</td> |
| </tr> |
| </table> |
| |
| On Greenplum Database, the table is distributed by the \c ruleid column. |
| |
| The \c pre and \c post columns are the itemsets of left and right hand sides of the |
| association rule respectively. The \c support, \c confidence, \c lift, and |
| \c conviction columns are calculated as described earlier. |
| </dd> |
| |
| <dt>verbose (optional)</dt> |
| <dd>BOOLEAN, default: FALSE. Determines if details are printed for each iteration |
| as the algorithm progresses.</dd> |
| |
| <dt>max_itemset_size (optional)</dt> |
| <dd>INTEGER, default: 10. Determines the maximum size of frequent |
| itemsets that are used for generating association rules. Must be 2 or more. |
| This parameter can be used to reduce run time for data sets where itemset size is large, |
| which is a common situation. If your query is not returning or is running too long, |
| try using a lower value for this parameter.</dd> |
| |
| |
| <dt>max_LHS_size (optional)</dt> |
| <dd>INTEGER, default: NULL. Determines the maximum size of the left hand side |
| of the rule. Must be 1 or more. |
| This parameter can be used to reduce run time.</dd> |
| |
| |
| <dt>max_RHS_size (optional)</dt> |
| <dd>INTEGER, default: NULL. Determines the maximum size of the right hand side |
| of the rule. Must be 1 or more. |
| This parameter can be used to reduce run time. For example, setting to 1 |
| can significantly reduce run time if this makes sense for your use case. |
| (The <em>apriori</em> algorithm in the R package <em>arules</em> [2] only |
| supports a RHS of 1.)</dd> |
| </dl> |
| |
| |
| @anchor examples |
| @examp |
| |
| Let's look at some sample transactional data and generate association rules. |
| |
| -# Create an input dataset: |
| <pre class="example"> |
| DROP TABLE IF EXISTS test_data; |
| CREATE TABLE test_data ( |
| trans_id INT, |
| product TEXT |
| ); |
| INSERT INTO test_data VALUES (1, 'beer'); |
| INSERT INTO test_data VALUES (1, 'diapers'); |
| INSERT INTO test_data VALUES (1, 'chips'); |
| INSERT INTO test_data VALUES (2, 'beer'); |
| INSERT INTO test_data VALUES (2, 'diapers'); |
| INSERT INTO test_data VALUES (3, 'beer'); |
| INSERT INTO test_data VALUES (3, 'diapers'); |
| INSERT INTO test_data VALUES (4, 'beer'); |
| INSERT INTO test_data VALUES (4, 'chips'); |
| INSERT INTO test_data VALUES (5, 'beer'); |
| INSERT INTO test_data VALUES (6, 'beer'); |
| INSERT INTO test_data VALUES (6, 'diapers'); |
| INSERT INTO test_data VALUES (6, 'chips'); |
| INSERT INTO test_data VALUES (7, 'beer'); |
| INSERT INTO test_data VALUES (7, 'diapers'); |
| </pre> |
| |
| -# Let \f$ min(support) = .25 \f$ and \f$ min(confidence) = .5 \f$, and the |
| output schema is set to \c NULL indicating output to the current schema. |
| In this example we set verbose to |
| TRUE so that we have some insight into progress of the function. We |
| can now generate association rules as follows: |
| <pre class="example"> |
| SELECT * FROM madlib.assoc_rules( .25, -- Support |
| .5, -- Confidence |
| 'trans_id', -- Transaction id col |
| 'product', -- Product col |
| 'test_data', -- Input data |
| NULL, -- Output schema |
| TRUE -- Verbose output |
| ); |
| </pre> |
| Result (iteration details not shown): |
| <pre class="result"> |
| output_schema | output_table | total_rules | total_time |
| ---------------+--------------+-------------+----------------- |
| public | assoc_rules | 7 | 00:00:00.569254 |
| (1 row) |
| </pre> |
| The association rules are stored in the assoc_rules table: |
| <pre class="example"> |
| SELECT * FROM assoc_rules |
| ORDER BY support DESC, confidence DESC; |
| </pre> |
| Result: |
| <pre class="result"> |
| ruleid | pre | post | count | support | confidence | lift | conviction |
| --------+-----------------+----------------+-------+-------------------+-------------------+-------------------+------------------- |
| 2 | {diapers} | {beer} | 5 | 0.714285714285714 | 1 | 1 | 0 |
| 6 | {beer} | {diapers} | 5 | 0.714285714285714 | 0.714285714285714 | 1 | 1 |
| 5 | {chips} | {beer} | 3 | 0.428571428571429 | 1 | 1 | 0 |
| 4 | {chips,diapers} | {beer} | 2 | 0.285714285714286 | 1 | 1 | 0 |
| 1 | {chips} | {diapers,beer} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| 7 | {chips} | {diapers} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| 3 | {beer,chips} | {diapers} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| (7 rows) |
| </pre> |
| |
| -# Limit association rules generated from itemsets of size at most 2. This parameter is |
| a good way to reduce long run times. |
| <pre class="example"> |
| SELECT * FROM madlib.assoc_rules( .25, -- Support |
| .5, -- Confidence |
| 'trans_id', -- Transaction id col |
| 'product', -- Product col |
| 'test_data', -- Input data |
| NULL, -- Output schema |
| TRUE, -- Verbose output |
| 2 -- Max itemset size |
| ); |
| </pre> |
| Result (iteration details not shown): |
| <pre class="result"> |
| output_schema | output_table | total_rules | total_time |
| ---------------+--------------+-------------+----------------- |
| public | assoc_rules | 4 | 00:00:00.565176 |
| (1 row) |
| </pre> |
| The association rules are again stored in the assoc_rules table: |
| <pre class="example"> |
| SELECT * FROM assoc_rules |
| ORDER BY support DESC, confidence DESC; |
| </pre> |
| Result: |
| <pre class="result"> |
| ruleid | pre | post | count | support | confidence | lift | conviction |
| --------+-----------+-----------+-------+-------------------+-------------------+-------------------+------------------- |
| 1 | {diapers} | {beer} | 5 | 0.714285714285714 | 1 | 1 | 0 |
| 2 | {beer} | {diapers} | 5 | 0.714285714285714 | 0.714285714285714 | 1 | 1 |
| 3 | {chips} | {beer} | 3 | 0.428571428571429 | 1 | 1 | 0 |
| 4 | {chips} | {diapers} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| (4 rows) |
| </pre> |
| |
| -# Post-processing can now be done on the output table in the case that |
| you want to filter the results. For example, if you want any single item on the left hand side |
| and a particular item on the right hand side: |
| <pre class="example"> |
| SELECT * FROM assoc_rules WHERE array_upper(pre,1) = 1 AND post = array['beer']; |
| </pre> |
| Result: |
| <pre class="result"> |
| ruleid | pre | post | count | support | confidence | lift | conviction |
| --------+-----------+--------+-------+-------------------+------------+------+------------ |
| 1 | {diapers} | {beer} | 5 | 0.714285714285714 | 1 | 1 | 0 |
| 3 | {chips} | {beer} | 3 | 0.428571428571429 | 1 | 1 | 0 |
| (2 rows) |
| </pre> |
| |
| |
| -# Limit the size of right hand side to 1. This parameter is a good way to |
| reduce long run times. |
| <pre class="example"> |
| SELECT * FROM madlib.assoc_rules( .25, -- Support |
| .5, -- Confidence |
| 'trans_id', -- Transaction id col |
| 'product', -- Product col |
| 'test_data', -- Input data |
| NULL, -- Output schema |
| TRUE, -- Verbose output |
| NULL, -- Max itemset size |
| NULL, -- Max LHS size |
| 1 -- Max RHS size |
| ); |
| </pre> |
| Result (iteration details not shown): |
| <pre class="result"> |
| output_schema | output_table | total_rules | total_time |
| ---------------+--------------+-------------+----------------- |
| public | assoc_rules | 6 | 00:00:00.031362 |
| (1 row) |
| </pre> |
| The association rules are again stored in the assoc_rules table: |
| <pre class="example"> |
| SELECT * FROM assoc_rules |
| ORDER BY support DESC, confidence DESC; |
| </pre> |
| Result: |
| <pre class="result"> |
| ruleid | pre | post | count | support | confidence | lift | conviction |
| --------+-----------------+-----------+-------+-------------------+-------------------+-------------------+------------------- |
| 4 | {diapers} | {beer} | 5 | 0.714285714285714 | 1 | 1 | 0 |
| 3 | {beer} | {diapers} | 5 | 0.714285714285714 | 0.714285714285714 | 1 | 1 |
| 1 | {chips} | {beer} | 3 | 0.428571428571429 | 1 | 1 | 0 |
| 6 | {diapers,chips} | {beer} | 2 | 0.285714285714286 | 1 | 1 | 0 |
| 2 | {chips} | {diapers} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| 5 | {beer,chips} | {diapers} | 2 | 0.285714285714286 | 0.666666666666667 | 0.933333333333333 | 0.857142857142857 |
| (6 rows) |
| </pre> |
| |
| |
| @anchor notes |
| @par Notes |
| |
| The association rules function always creates a table named \c assoc_rules. |
| Make a copy of this table before running the function again if you would |
| like to keep multiple association rule tables. This behavior will be improved |
| in a later release. |
| |
| @anchor literature |
| @literature |
| |
| [1] https://en.wikipedia.org/wiki/Apriori_algorithm |
| |
| [2] https://cran.r-project.org/web/packages/arules/arules.pdf |
| |
| @anchor related |
| @par Related Topics |
| |
| File assoc_rules.sql_in documenting the SQL function. |
| |
| */ |
| |
| /* |
| * @brief The result data type for the association rule API |
| * |
| * output_schema the name of the output schema. |
| * output_table the name of the output table. |
| * total_rules the number of total rules. |
| * total_time the running time. |
| */ |
| DROP TYPE IF EXISTS MADLIB_SCHEMA.assoc_rules_results CASCADE; |
| CREATE TYPE MADLIB_SCHEMA.assoc_rules_results AS |
| ( |
| output_schema TEXT, |
| output_table TEXT, |
| total_rules INT, |
| total_time INTERVAL |
| ); |
| |
| /* |
| * @brief Given the text form of a closed frequent pattern (cfp), this function |
| * generates the association rules for that pattern. We use text format |
| * because text values are hash joinable. The output is a set of text |
| * array. For example, assuming the input pattern is '1,2,3'. |
| * The result rules: |
| * array['1', '2,3'] |
| * array['2', '1,3'] |
| * array['3', '1,2'] |
| * array['1,2', '3'] |
| * array['1,3', '2'] |
| * array['2,3', '1'] |
| * Note that two meaningless rules will be excluded: |
| * array['1,2,3', NULL] |
| * array[NULL, '1,2,3'] |
| * |
| * @param arg 1 The text form of a closed frequent pattern. |
| * @param arg 2 The number of items in the pattern. |
| * |
| * @return A set of text array. Each array has two elements, corresponding to |
| * the left and right parts of an association rule. |
| * |
| */ |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.gen_rules_from_cfp |
| ( |
| TEXT, |
| INT, |
| INT, |
| INT |
| ) |
| RETURNS SETOF TEXT[] AS 'MODULE_PATHNAME' |
| LANGUAGE C STRICT IMMUTABLE |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `NO SQL', `'); |
| |
| |
| /** |
| * |
| * @param support minimum level of support needed for each itemset to |
| * be included in result |
| * @param confidence minimum level of confidence needed for each rule to |
| * be included in result |
| * @param tid_col name of the column storing the transaction ids |
| * @param item_col name of the column storing the products |
| * @param input_table name of the table where the data is stored |
| * @param output_schema name of the schema where the final results will be stored |
| * @param verbose determining if output contains comments |
| * |
| * @returns The schema and table name containing association rules, |
| * and total number of rules found. |
| * |
| * This function computes the association rules between products in a data set. |
| * It reads the name of the table, the column names of the product and ids, and |
| * computes ssociation rules using the Apriori algorithm, and subject to the |
| * support and confidence constraints as input by the user. This version of |
| * association rules has verbose functionality. When verbose is true, output of |
| * function includes iteration steps and comments on Apriori algorithm steps. |
| * |
| */ |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules |
| ( |
| support FLOAT8, |
| confidence FLOAT8, |
| tid_col TEXT, |
| item_col TEXT, |
| input_table TEXT, |
| output_schema TEXT, |
| verbose BOOLEAN, |
| max_itemset_size INTEGER, |
| max_lhs_size INTEGER, |
| max_rhs_size INTEGER |
| ) |
| RETURNS MADLIB_SCHEMA.assoc_rules_results |
| AS $$ |
| PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') |
| with AOControl(False): |
| plpy.execute("SET client_min_messages = error;") |
| # schema_madlib comes from PythonFunctionBodyOnly |
| return assoc_rules.assoc_rules(schema_madlib, |
| support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| verbose, |
| max_itemset_size, |
| max_lhs_size, |
| max_rhs_size |
| ); |
| |
| $$ LANGUAGE plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); |
| |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules |
| ( |
| support FLOAT8, |
| confidence FLOAT8, |
| tid_col TEXT, |
| item_col TEXT, |
| input_table TEXT, |
| output_schema TEXT, |
| verbose BOOLEAN, |
| max_itemset_size INTEGER, |
| max_LHS_size INTEGER |
| ) |
| RETURNS MADLIB_SCHEMA.assoc_rules_results |
| AS $$ |
| PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') |
| with AOControl(False): |
| plpy.execute("SET client_min_messages = error;") |
| # schema_madlib comes from PythonFunctionBodyOnly |
| return assoc_rules.assoc_rules(schema_madlib, |
| support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| verbose, |
| max_itemset_size, |
| max_lhs_size, |
| None |
| ); |
| |
| $$ LANGUAGE plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); |
| |
| |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules |
| ( |
| support FLOAT8, |
| confidence FLOAT8, |
| tid_col TEXT, |
| item_col TEXT, |
| input_table TEXT, |
| output_schema TEXT, |
| verbose BOOLEAN, |
| max_itemset_size INTEGER |
| ) |
| RETURNS MADLIB_SCHEMA.assoc_rules_results |
| AS $$ |
| PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') |
| with AOControl(False): |
| plpy.execute("SET client_min_messages = error;") |
| # schema_madlib comes from PythonFunctionBodyOnly |
| return assoc_rules.assoc_rules(schema_madlib, |
| support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| verbose, |
| max_itemset_size, |
| None, |
| None |
| ); |
| |
| $$ LANGUAGE plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); |
| |
| |
| /** |
| * |
| * @brief The short form of the above function with vobose removed. |
| * |
| */ |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules |
| ( |
| support FLOAT8, |
| confidence FLOAT8, |
| tid_col TEXT, |
| item_col TEXT, |
| input_table TEXT, |
| output_schema TEXT |
| ) |
| RETURNS MADLIB_SCHEMA.assoc_rules_results |
| AS $$ |
| |
| PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') |
| plpy.execute("SET client_min_messages = error;") |
| with AOControl(False): |
| # schema_madlib comes from PythonFunctionBodyOnly |
| return assoc_rules.assoc_rules(schema_madlib, |
| support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| False, |
| 10, |
| None, |
| None); |
| |
| $$ LANGUAGE plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); |
| |
| |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules |
| ( |
| support FLOAT8, |
| confidence FLOAT8, |
| tid_col TEXT, |
| item_col TEXT, |
| input_table TEXT, |
| output_schema TEXT, |
| verbose BOOLEAN |
| ) |
| RETURNS MADLIB_SCHEMA.assoc_rules_results |
| AS $$ |
| PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') |
| plpy.execute("SET client_min_messages = error;") |
| with AOControl(False): |
| # schema_madlib comes from PythonFunctionBodyOnly |
| return assoc_rules.assoc_rules(schema_madlib, |
| support, |
| confidence, |
| tid_col, |
| item_col, |
| input_table, |
| output_schema, |
| verbose, |
| 10, |
| None, |
| None); |
| $$ LANGUAGE plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); |
| |
| -------------------------------------------------------------------------- |
| |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules(message TEXT) |
| RETURNS text AS $$ |
| PythonFunction(assoc_rules, assoc_rules, assoc_rules_help_message) |
| $$ language plpythonu |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `'); |
| |
| CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules() |
| RETURNS text AS $$ |
| SELECT MADLIB_SCHEMA.assoc_rules(''); |
| $$ language SQL |
| m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `'); |