blob: 62cd4691e1784cc647c2d12a4be462455a74f439 [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Pattern Query
## 1. Syntax Definition
```SQL
MATCH_RECOGNIZE (
[ PARTITION BY column [, ...] ]
[ ORDER BY column [, ...] ]
[ MEASURES measure_definition [, ...] ]
[ ROWS PER MATCH ]
[ AFTER MATCH skip_to ]
PATTERN ( row_pattern )
[ SUBSET subset_definition [, ...] ]
DEFINE variable_definition [, ...]
)
```
**Note:**
* PARTITION BY: Optional. Used to group the input table, and each group can perform pattern matching independently. If this clause is not specified, the entire input table will be processed as a single unit.
* ORDER BY: Optional. Used to ensure that input data is processed in a specific order during matching.
* MEASURES: Optional. Used to specify which information to extract from the matched segment of data.
* ROWS PER MATCH: Optional. Used to specify the output method of the result set after successful pattern matching.
* AFTER MATCH SKIP: Optional. Used to specify which row to resume from for the next pattern match after identifying a non-empty match.
* PATTERN: Used to define the row pattern to be matched.
* SUBSET: Optional. Used to merge rows matched by multiple basic pattern variables into a single logical set.
* DEFINE: Used to define the basic pattern variables for the row pattern.
For more detailed introductions to the features, please refer to:[Pattern Query](../User-Manual/Timeseries-Featured-Analysis_timecho.md#_1-pattern-query)
## 2. Usage Examples
Using [Sample Data](../Reference/Sample-Data.md) as the source data
1. Time Segment Query
Segment the data in table1 by time intervals less than or equal to 24 hours, and query the total number of data entries in each segment, as well as the start and end times.
Query SQL
SQL
```SQL
SELECT start_time, end_time, cnt
FROM table1
MATCH_RECOGNIZE (
ORDER BY time
MEASURES
RPR_FIRST(A.time) AS start_time,
RPR_LAST(time) AS end_time,
COUNT() AS cnt
PATTERN (A B*)
DEFINE B AS (cast(B.time as INT64) - cast(PREV(B.time) as INT64)) <= 86400000
) AS m
```
Query Results
SQL
```SQL
+-----------------------------+-----------------------------+---+
| start_time| end_time|cnt|
+-----------------------------+-----------------------------+---+
|2024-11-26T13:37:00.000+08:00|2024-11-26T13:38:00.000+08:00| 2|
|2024-11-27T16:38:00.000+08:00|2024-11-30T14:30:00.000+08:00| 16|
+-----------------------------+-----------------------------+---+
Total line number = 2
```
2. Difference Segment Query
Segment the data in table2 by humidity value differences less than 0.1, and query the total number of data entries in each segment, as well as the start and end times.
* Query SQL
SQL
```SQL
SELECT start_time, end_time, cnt
FROM table2
MATCH_RECOGNIZE (
ORDER BY time
MEASURES
RPR_FIRST(A.time) AS start_time,
RPR_LAST(time) AS end_time,
COUNT() AS cnt
PATTERN (A B*)
DEFINE B AS (B.humidity - PREV(B.humidity )) <=0.1
) AS m;
```
* Query Results
SQL
```SQL
+-----------------------------+-----------------------------+---+
| start_time| end_time|cnt|
+-----------------------------+-----------------------------+---+
|2024-11-26T13:37:00.000+08:00|2024-11-27T00:00:00.000+08:00| 2|
|2024-11-28T08:00:00.000+08:00|2024-11-29T00:00:00.000+08:00| 2|
|2024-11-29T11:00:00.000+08:00|2024-11-30T00:00:00.000+08:00| 2|
+-----------------------------+-----------------------------+---+
Total line number = 3
```
3. Event Statistics Query
Group the data in table1 by device ID, and count the start and end times and maximum humidity value where the humidity in the Shanghai area is greater than 35.
* Query SQL
SQL
```SQL
SELECT m.device_id, m.match, m.event_start, m.event_end, m.max_humidity
FROM table1
MATCH_RECOGNIZE (
PARTITION BY device_id
ORDER BY time
MEASURES
MATCH_NUMBER() AS match,
RPR_FIRST(A.time) AS event_start,
RPR_LAST(A.time) AS event_end,
MAX(A.humidity) AS max_humidity
ONE ROW PER MATCH
PATTERN (A+)
DEFINE
A AS A.region= 'Shanghai' AND A.humidity> 35
) AS m
```
* Query Results
SQL
```SQL
+---------+-----+-----------------------------+-----------------------------+------------+
|device_id|match| event_start| event_end|max_humidity|
+---------+-----+-----------------------------+-----------------------------+------------+
| 100| 1|2024-11-28T09:00:00.000+08:00|2024-11-29T18:30:00.000+08:00| 45.1|
| 101| 1|2024-11-30T09:30:00.000+08:00|2024-11-30T09:30:00.000+08:00| 35.2|
+---------+-----+-----------------------------+-----------------------------+------------+
Total line number = 2
```