Searches for event chains in a sliding time window and calculates the maximum number of events that occurred from the chain.
The function works according to the algorithm:
window_funnel(window, mode, timestamp_column, event1, event2, ... , eventN)
Using the default mode, find out the maximum number of consecutive events corresponding to different user_id, with a time window of 5 minutes:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 300, "default", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id; +---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, because the time when the payment event occurred exceeds the time window, the matched event chain is login-visit-order.
Use the deduplication mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3), (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "deduplication", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id; +---------+-------+ | user_id | level | +---------+-------+ | 100123 | 2 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, after matching the visit event, the login event appears repeatedly, so the matched event chain is login-visit.
Use the fixed mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:03:00', "HONOR", 4), (100123, 'login2', '2022-05-14 10:04:00', 'HONOR', 3), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "fixed", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id; +---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, after matching the order event, the event chain is interrupted by the login2 event, so the matched event chain is login-visit-order.
Use the increase mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', "HONOR", 4), (100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "increase", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id; +---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For uesr_id=100123, the timestamp of the payment event and the timestamp of the order event occur in the same second and are not incremented, so the matched event chain is login-visit-order.
WINDOW,FUNCTION,WINDOW_FUNNEL