The WINDOW_FUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum number of completed steps in the event chain. This function is particularly useful for conversion funnel analysis, such as analyzing user conversion from website visits to final purchases.
The function works according to the algorithm:
WINDOW_FUNNEL(<window>, <mode>, <timestamp>, <event_1>[, event_2, ... , event_n])
| Parameter | Description |
|---|---|
<window> | window is the length of time window in seconds |
<mode> | There are four modes in total, default, deduplication, fixed, and increase. For details, please refer to the Mode below. |
<timestamp> | timestamp specifies column of DATETIME type, sliding time window works on it |
<event_n> | evnet_n is boolean expression like eventID = 1004 |
Mode
- `default`: Defualt mode. - `deduplication`: If the same event holds for the sequence of events, then such repeating event interrupts further processing. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is "A-B-C-B-D". Since event B repeats, the filtered event chain can only be "A-B-C" and the max event level is 3. - `fixed`: Don't allow interventions of other events. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is A->B->D->C, it stops finding A->B->C at the D and the max event level is 2. - `increase`: Apply conditions only to events with strictly increasing timestamps.
Returns an integer representing the maximum number of consecutive steps completed within the specified time window.
Using the default mode, find out the maximum number of consecutive events corresponding to different user_id, with a time window of 5 minutes:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 300, "default", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, because the time when the payment event occurred exceeds the time window, the matched event chain is login-visit-order.
Use the deduplication mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3), (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "deduplication", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 2 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, after matching the visit event, the login event appears repeatedly, so the matched event chain is login-visit.
Use the fixed mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:03:00', "HONOR", 4), (100123, 'login2', '2022-05-14 10:04:00', 'HONOR', 3), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "fixed", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, after matching the order event, the event chain is interrupted by the login2 event, so the matched event chain is login-visit-order.
Use the increase mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), (100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "increase", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, the timestamp of the payment event and the timestamp of the order event occur in the same second and are not incremented, so the matched event chain is login-visit-order.