blob: 3394cb246459caeaac1064cfcd99be57c38e3d6c [file] [log] [blame] [view]
---
{
"title": "WINDOW_FUNNEL",
"language": "en",
"description": "The WINDOWFUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum "
}
---
## Description
The WINDOW_FUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum number of completed steps in the event chain. This function is particularly useful for conversion funnel analysis, such as analyzing user conversion from website visits to final purchases.
The function works according to the algorithm:
- The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.
- If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter is not incremented.
- If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain.
## Syntax
```sql
WINDOW_FUNNEL(<window>, <mode>, <timestamp>, <event_1>[, event_2, ... , event_n])
```
## Parameters
| Parameter | Description |
| -- | -- |
| `<window>` | window is the length of time window in seconds |
| `<mode>` | There are four modes in total, `default`, `deduplication`, `fixed`, and `increase`. For details, please refer to the **Mode** below. |
| `<timestamp>` | timestamp specifies column of DATETIME type, sliding time window works on it |
| `<event_n>` | evnet_n is boolean expression like eventID = 1004 |
**Mode**
- `default`: Defualt mode.
- `deduplication`: If the same event holds for the sequence of events, then such repeating event interrupts further processing. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is "A-B-C-B-D". Since event B repeats, the filtered event chain can only be "A-B-C" and the max event level is 3.
- `fixed`: Don't allow interventions of other events. E.g. the array parameter is [event1='A', event2='B', event3='C', event4='D'], and the original event chain is A->B->D->C, it stops finding A->B->C at the D and the max event level is 2.
- `increase`: Apply conditions only to events with strictly increasing timestamps.
## Return Value
Returns an integer representing the maximum number of consecutive steps completed within the specified time window.
## Examples
### example1: default mode
Using the `default` mode, find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `5` minutes:
```sql
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
300,
"default",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
```
```text
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 3 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
```
For `uesr_id=100123`, because the time when the `payment` event occurred exceeds the time window, the matched event chain is `login-visit-order`.
### example2: deduplication mode
Use the `deduplication` mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour:
```sql
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3),
(100123, 'order', '2022-05-14 10:04:00', "HONOR", 4),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"deduplication",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
```
```text
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 2 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
```
For `uesr_id=100123`, after matching the `visit` event, the `login` event appears repeatedly, so the matched event chain is `login-visit`.
### example3: fixed mode
Use the `fixed` mode to find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `1` hour:
```sql
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'order', '2022-05-14 10:03:00', "HONOR", 4),
(100123, 'login2', '2022-05-14 10:04:00', 'HONOR', 3),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"fixed",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
```
```text
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 3 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
```
For `uesr_id=100123`, after matching the `order` event, the event chain is interrupted by the `login2` event, so the matched event chain is `login-visit-order`.
### example4: increase mode
Use the `increase` mode to find out the maximum number of consecutive events corresponding to different `user_id`, with a time window of `1` hour:
```sql
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'order', '2022-05-14 10:04:00', "HONOR", 4),
(100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"increase",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
```
```text
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 3 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
```
For `uesr_id=100123`, the timestamp of the `payment` event and the timestamp of the `order` event occur in the same second and are not incremented, so the matched event chain is `login-visit-order`.