The WINDOW_FUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum number of completed steps in the event chain. This function is particularly useful for conversion funnel analysis, such as analyzing user conversion from website visits to final purchases.
The function works according to the algorithm:
WINDOW_FUNNEL(<window>, <mode>, <timestamp>, <event_1>[, event_2, ... , event_n])
| Parameter | Description |
|---|---|
<window> | window is the length of time window in seconds |
<mode> | There are four modes in total, default, deduplication, fixed, and increase. For details, please refer to the Mode below. |
<timestamp> | timestamp specifies column of DATETIME type, sliding time window works on it |
<event_n> | evnet_n is boolean expression like eventID = 1004 |
Mode
- `default`: Standard funnel calculation. Doris searches for the longest event chain that matches the specified order within the time window. Events that do not match any condition are ignored. - `deduplication`: Based on `default`, but an event that has already been matched in the current chain cannot appear again. For example, if the condition list is [event1='A', event2='B', event3='C', event4='D'] and the original event chain is `A-B-C-B-D`, the second `B` breaks the chain, so the matched event chain is `A-B-C` and the max level is `3`. - `fixed`: The chain must advance in the specified order and cannot skip intermediate steps. If an event that matches a later condition appears before its immediate predecessor is matched, the chain stops. Events that do not match any condition are ignored and do not break the chain. For example, with [event1='A', event2='B', event3='C', event4='D'], `A-B-D-C` returns `A-B` and level `2`; with `A-B-X-C-D` (`X` matches none of the conditions), Doris returns `A-B-C-D`. - `increase`: Based on `default`, but matched events must have strictly increasing timestamps. If two matched events have the same timestamp, the later event cannot advance the chain.
Returns an integer representing the maximum number of consecutive steps completed within the specified time window.
Using the default mode, find out the maximum number of consecutive events corresponding to different user_id, with a time window of 5 minutes:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 300, "default", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For user_id=100123, because the time when the payment event occurred exceeds the time window, the matched event chain is login-visit-order.
Use the deduplication mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "deduplication", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 2 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For user_id=100123, after matching the visit event, the login event appears repeatedly, so the matched event chain is login-visit.
Use the fixed mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'login2', '2022-05-14 10:03:00', 'HONOR', 3), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4), (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "fixed", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 4 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For user_id=100123, login2 does not match any condition in the funnel, so it does not break the fixed chain. The matched event chain is login-visit-order-payment.
Use the increase mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events( user_id BIGINT, event_name VARCHAR(64), event_timestamp datetime, phone_brand varchar(64), tab_num int ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); INSERT INTO events VALUES (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4), (100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4), (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); SELECT user_id, window_funnel( 3600, "increase", event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'order', event_name = 'payment' ) AS level FROM events GROUP BY user_id order BY user_id;
+---------+-------+ | user_id | level | +---------+-------+ | 100123 | 3 | | 100125 | 3 | | 100126 | 2 | | 100127 | 2 | +---------+-------+
For user_id=100123, the timestamp of the payment event and the timestamp of the order event occur in the same second and are not incremented, so the matched event chain is login-visit-order.