blob: 5e6ca2f9a85022decf0fd64c1e95d4d80c570f10 [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# 模式查询
## 1. 语法定义
```SQL
MATCH_RECOGNIZE (
[ PARTITION BY column [, ...] ]
[ ORDER BY column [, ...] ]
[ MEASURES measure_definition [, ...] ]
[ ROWS PER MATCH ]
[ AFTER MATCH skip_to ]
PATTERN ( row_pattern )
[ SUBSET subset_definition [, ...] ]
DEFINE variable_definition [, ...]
)
```
**说明:**
* PARTITION BY : 可选,用于对输入表进行分组,每个分组能独立进行模式匹配。如果未声明该子句,则整个输入表将作为一个整体进行处理。
* ORDER BY :可选,用于确保输入数据按某种顺序进行匹配处理。
* MEASURES :可选,用于指定从匹配到的一段数据中提取哪些信息。
* ROWS PER MATCH :可选,用于指定模式匹配成功后结果集的输出方式。
* AFTER MATCH SKIP :可选,用于指定在识别到一个非空匹配后,下一次模式匹配应从哪一行继续进行。
* PATTERN :用于定义需要匹配的行模式。
* SUBSET :可选,用于将多个基本模式变量所匹配的行合并为一个逻辑集合。
* DEFINE :用于定义行模式的基本模式变量。
更多详细功能介绍请参考:[模式查询](../User-Manual/Pattern-Query_timecho.md)
## 2. 使用示例
以[示例数据](../Reference/Sample-Data.md)为源数据
1. 时间分段查询
将 table1 中的数据按照时间间隔小于等于 24 小时分段,查询每段中的数据总条数,以及开始、结束时间。
查询SQL
```SQL
SELECT start_time, end_time, cnt
FROM table1
MATCH_RECOGNIZE (
ORDER BY time
MEASURES
RPR_FIRST(A.time) AS start_time,
RPR_LAST(time) AS end_time,
COUNT() AS cnt
PATTERN (A B*)
DEFINE B AS (cast(B.time as INT64) - cast(PREV(B.time) as INT64)) <= 86400000
) AS m
```
查询结果
```SQL
+-----------------------------+-----------------------------+---+
| start_time| end_time|cnt|
+-----------------------------+-----------------------------+---+
|2024-11-26T13:37:00.000+08:00|2024-11-26T13:38:00.000+08:00| 2|
|2024-11-27T16:38:00.000+08:00|2024-11-30T14:30:00.000+08:00| 16|
+-----------------------------+-----------------------------+---+
Total line number = 2
```
2. 差值分段查询
将 table2 中的数据按照 humidity 湿度值差值小于 0.1 分段,查询每段中的数据总条数,以及开始、结束时间。
* 查询sql
```SQL
SELECT start_time, end_time, cnt
FROM table2
MATCH_RECOGNIZE (
ORDER BY time
MEASURES
RPR_FIRST(A.time) AS start_time,
RPR_LAST(time) AS end_time,
COUNT() AS cnt
PATTERN (A B*)
DEFINE B AS (B.humidity - PREV(B.humidity )) <=0.1
) AS m;
```
* 查询结果
```SQL
+-----------------------------+-----------------------------+---+
| start_time| end_time|cnt|
+-----------------------------+-----------------------------+---+
|2024-11-26T13:37:00.000+08:00|2024-11-27T00:00:00.000+08:00| 2|
|2024-11-28T08:00:00.000+08:00|2024-11-29T00:00:00.000+08:00| 2|
|2024-11-29T11:00:00.000+08:00|2024-11-30T00:00:00.000+08:00| 2|
+-----------------------------+-----------------------------+---+
Total line number = 3
```
3. 事件统计查询
将 table1 中数据按照设备号分组,统计上海地区湿度大于 35 的开始、结束时间及最大湿度值。
* 查询sql
```SQL
SELECT m.device_id, m.match, m.event_start, m.event_end, m.max_humidity
FROM table1
MATCH_RECOGNIZE (
PARTITION BY device_id
ORDER BY time
MEASURES
MATCH_NUMBER() AS match,
RPR_FIRST(A.time) AS event_start,
RPR_LAST(A.time) AS event_end,
MAX(A.humidity) AS max_humidity
ONE ROW PER MATCH
PATTERN (A+)
DEFINE
A AS A.region= '上海' AND A.humidity> 35
) AS m
```
* 查询结果
```SQL
+---------+-----+-----------------------------+-----------------------------+------------+
|device_id|match| event_start| event_end|max_humidity|
+---------+-----+-----------------------------+-----------------------------+------------+
| 100| 1|2024-11-28T09:00:00.000+08:00|2024-11-29T18:30:00.000+08:00| 45.1|
| 101| 1|2024-11-30T09:30:00.000+08:00|2024-11-30T09:30:00.000+08:00| 35.2|
+---------+-----+-----------------------------+-----------------------------+------------+
Total line number = 2
```