| --- |
| { |
| "title": "WINDOW_FUNCTION_WINDOW_FUNNEL", |
| "language": "en" |
| } |
| --- |
| |
| <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> |
| |
| ## WINDOW FUNCTION WINDOW_FUNNEL |
| ### description |
| |
| Searches for event chains within a sliding time window and returns the maximum number of consecutive events from the chain that were matched. |
| |
| - window is the length of the time window, in seconds. |
| - mode can be one of the following: |
|     - "default": Default mode. |
|     - "deduplication": A repeated occurrence of an event that has already been matched interrupts further processing. E.g. if the array parameter is [event1='A', event2='B', event3='C', event4='D'] and the original event chain is "A-B-C-B-D", event B repeats, so the filtered event chain can only be "A-B-C" and the max event level is 3. |
|     - "fixed": Does not allow other events to intervene. E.g. if the array parameter is [event1='A', event2='B', event3='C', event4='D'] and the original event chain is A->B->D->C, the search for A->B->C stops at D and the max event level is 2. |
|     - "increase": Apply conditions only to events with strictly increasing timestamps. |
| - timestamp_column specifies a column of DATETIME type on which the sliding time window operates. |
| - eventN is a boolean expression such as eventID = 1004. |
| |
| The function works according to the following algorithm: |
| |
| - The function searches for data that satisfies the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts. |
| - If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter is not incremented. |
| - If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain. |
| |
| ```sql |
| window_funnel(window, mode, timestamp_column, event1, event2, ... , eventN) |
| ``` |
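| |
| Because each eventN is an ordinary boolean expression, a condition can combine several columns. The following is a minimal sketch, assuming an ```events``` table with ```user_id```, ```event_name```, ```event_timestamp``` and ```phone_brand``` columns like the one created in the examples below; the 10-minute window and the brand filter are illustrative choices only: |
| |
| ```sql |
| -- Sketch: a three-step funnel in which the first step must also match a brand. |
| -- Assumes the events table defined in the examples of this page. |
| SELECT |
|     user_id, |
|     window_funnel( |
|         600,                                             -- window length in seconds (10 minutes) |
|         "default", |
|         event_timestamp, |
|         event_name = 'login' AND phone_brand = 'HONOR',  -- event1: compound condition |
|         event_name = 'visit',                            -- event2 |
|         event_name = 'order'                             -- event3 |
|     ) AS level |
| FROM |
|     events |
| GROUP BY |
|     user_id |
| ORDER BY |
|     user_id; |
| ``` |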
| |
| ### example |
| |
| #### example1: default mode |
| |
| Use the ```default``` mode to find the maximum number of consecutive events matched for each ```user_id```, with a time window of ```5``` minutes: |
| |
| ```sql |
| CREATE TABLE events( |
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 300, |
| "default", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| ORDER BY |
| user_id; |
| |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| |
| For ```user_id=100123```, the ```payment``` event occurs outside the time window, so the matched event chain is ```login-visit-order```. |
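| |
| The per-user levels are often rolled up into a funnel summary. The following is a sketch of one way to do that, assuming the ```events``` table and data from this example; the aliases ```t``` and ```user_count``` are illustrative names. Given the per-user result shown above, it should report two users at level 2 and two users at level 3: |
| |
| ```sql |
| -- Sketch: count how many users reached each funnel level. |
| -- The inner query is the example above; the outer query groups by level. |
| SELECT |
|     level, |
|     COUNT(*) AS user_count |
| FROM |
|     ( |
|         SELECT |
|             user_id, |
|             window_funnel( |
|                 300, |
|                 "default", |
|                 event_timestamp, |
|                 event_name = 'login', |
|                 event_name = 'visit', |
|                 event_name = 'order', |
|                 event_name = 'payment' |
|             ) AS level |
|         FROM |
|             events |
|         GROUP BY |
|             user_id |
|     ) t |
| GROUP BY |
|     level |
| ORDER BY |
|     level; |
| ``` |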
| |
| #### example2: deduplication mode |
| |
| Use the ```deduplication``` mode to find the maximum number of consecutive events matched for each ```user_id```, with a time window of ```1``` hour: |
| |
| ```sql |
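| -- Drop the table left over from the previous example so that CREATE TABLE succeeds. |
| DROP TABLE IF EXISTS events; |
| CREATE TABLE events( |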
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3), |
| (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "deduplication", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| ORDER BY |
| user_id; |
| |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 2 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| |
| For ```user_id=100123```, the ```login``` event repeats after the ```visit``` event is matched, so the matched event chain is ```login-visit```. |
| |
| #### example3: fixed mode |
| |
| Use the ```fixed``` mode to find the maximum number of consecutive events matched for each ```user_id```, with a time window of ```1``` hour: |
| |
| ```sql |
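| -- Drop the table left over from the previous example so that CREATE TABLE succeeds. |
| DROP TABLE IF EXISTS events; |
| CREATE TABLE events( |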
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:03:00', 'HONOR', 4), |
| (100123, 'login2', '2022-05-14 10:04:00', 'HONOR', 3), |
| (100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "fixed", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| ORDER BY |
| user_id; |
| |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| |
| For ```user_id=100123```, after the ```order``` event is matched, the event chain is interrupted by the ```login2``` event, so the matched event chain is ```login-visit-order```. |
| |
| #### example4: increase mode |
| |
| Use the ```increase``` mode to find the maximum number of consecutive events matched for each ```user_id```, with a time window of ```1``` hour: |
| |
| ```sql |
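| -- Drop the table left over from the previous example so that CREATE TABLE succeeds. |
| DROP TABLE IF EXISTS events; |
| CREATE TABLE events( |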
| user_id BIGINT, |
| event_name VARCHAR(64), |
| event_timestamp datetime, |
| phone_brand varchar(64), |
| tab_num int |
| ) distributed by hash(user_id) buckets 3 properties("replication_num" = "1"); |
| |
| INSERT INTO |
| events |
| VALUES |
| (100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1), |
| (100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2), |
| (100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4), |
| (100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4), |
| (100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1), |
| (100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2), |
| (100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6), |
| (100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1), |
| (100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2), |
| (100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1), |
| (100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5); |
| |
| SELECT |
| user_id, |
| window_funnel( |
| 3600, |
| "increase", |
| event_timestamp, |
| event_name = 'login', |
| event_name = 'visit', |
| event_name = 'order', |
| event_name = 'payment' |
| ) AS level |
| FROM |
| events |
| GROUP BY |
| user_id |
| ORDER BY |
| user_id; |
| |
| +---------+-------+ |
| | user_id | level | |
| +---------+-------+ |
| | 100123 | 3 | |
| | 100125 | 3 | |
| | 100126 | 2 | |
| | 100127 | 2 | |
| +---------+-------+ |
| ``` |
| |
| For ```user_id=100123```, the ```payment``` event and the ```order``` event have the same timestamp, so the timestamps are not strictly increasing and the matched event chain is ```login-visit-order```. |
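| |
| To see how the modes differ on the same data, several ```window_funnel``` calls can be placed side by side in one query. This is a sketch only, assuming the ```events``` table and data from this example; the column aliases are illustrative, and no expected output is shown because the result depends on the data that is loaded: |
| |
| ```sql |
| -- Sketch: compare the "default" and "increase" modes for each user. |
| -- Assumes the events table populated in the increase-mode example above. |
| SELECT |
|     user_id, |
|     window_funnel( |
|         3600, |
|         "default", |
|         event_timestamp, |
|         event_name = 'login', |
|         event_name = 'visit', |
|         event_name = 'order', |
|         event_name = 'payment' |
|     ) AS level_default, |
|     window_funnel( |
|         3600, |
|         "increase", |
|         event_timestamp, |
|         event_name = 'login', |
|         event_name = 'visit', |
|         event_name = 'order', |
|         event_name = 'payment' |
|     ) AS level_increase |
| FROM |
|     events |
| GROUP BY |
|     user_id |
| ORDER BY |
|     user_id; |
| ``` |
| |
| Rows where the two columns differ point to users whose events share a timestamp. |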
| |
| |
| ### keywords |
| |
| WINDOW,FUNCTION,WINDOW_FUNNEL |