| --- |
| { |
| "title": "WINDOW_FUNNEL", |
| "language": "en", |
| "description": "The WINDOWFUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum " |
| } |
| --- |
| |
| ## Description |
| |
| The WINDOW_FUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum number of completed steps in the event chain. This function is particularly useful for conversion funnel analysis, such as analyzing user conversion from website visits to final purchases. |
| |
| The function works according to the algorithm: |
| |
| - The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts. |
| - If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter is not incremented. |
| - If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain. |
| |
| ## Syntax |
| |
| ```sql |
| WINDOW_FUNNEL(<window>, <mode>, <timestamp>, <event_1>[, event_2, ... , event_n]) |
| ``` |
| |
| ## Parameters |
| |
| | Parameter | Description | |
| | -- | -- | |
| | `<window>` | window is the length of time window in seconds | |
| | `<mode>` | There are four modes in total, `default`, `deduplication`, `fixed`, and `increase`. For details, please refer to the **Mode** below. | |
| | `<timestamp>` | timestamp specifies column of DATETIME type, sliding time window works on it | |
| | `<event_n>` | evnet_n is boolean expression like eventID = 1004 | |
| |
| **Mode** |
| |
| - `default`: Defualt mode. |
| |
| - `deduplication`: If the same event holds for the sequence of events, then such repeating event interrupts further processing. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is "A-B-C-B-D". Since event B repeats, the filtered event chain can only be "A-B-C" and the max event level is 3. |
| |
| - `fixed`: Don't allow interventions of other events. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is A->B->D->C, it stops finding A->B->C at the D and the max event level is 2. |
| |
| - `increase`: Apply conditions only to events with strictly increasing timestamps. |
| |
| ## Return Value |
| Returns an integer representing the maximum number of consecutive steps completed within the specified time window. |
| |
| ## Examples |
| |
| ### example1: default mode |
| |
| Using the `default` mode, find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `5` minutes: |
| |
| ```sql |
| CREATE TABLE events( |
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 300, |
| "default", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| order BY |
| user_id; |
| ``` |
| |
| ```text |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| |
| For `uesr_id=100123`, because the time when the `payment` event occurred exceeds the time window, the matched event chain is `login-visit-order`. |
| |
| ### example2: deduplication mode |
| |
| Use the `deduplication` mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour: |
| |
| ```sql |
| CREATE TABLE events( |
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3), |
| (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "deduplication", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| order BY |
| user_id; |
| ``` |
| |
| ```text |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 2 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| For `uesr_id=100123`, after matching the `visit` event, the `login` event appears repeatedly, so the matched event chain is `login-visit`. |
| |
| ### example3: fixed mode |
| |
| Use the `fixed` mode to find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `1` hour: |
| |
| ```sql |
| CREATE TABLE events( |
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:03:00', "HONOR", 4), |
| (100123, 'login2', '2022-05-14 10:04:00', 'HONOR', 3), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "fixed", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| order BY |
| user_id; |
| ``` |
| |
| ```text |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| For `uesr_id=100123`, after matching the `order` event, the event chain is interrupted by the `login2` event, so the matched event chain is `login-visit-order`. |
| |
| ### example4: increase mode |
| |
| Use the `increase` mode to find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `1` hour: |
| |
| ```sql |
| CREATE TABLE events( |
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), |
| (100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "increase", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| order BY |
| user_id; |
| ``` |
| |
| ```text |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| For `uesr_id=100123`, the timestamp of the `payment` event and the timestamp of the `order` event occur in the same second and are not incremented, so the matched event chain is `login-visit-order`. |