blob: f339369692204f87efcd5ea1c54971388a926d96 [file] [log] [blame] [view]
<!--
​ Licensed to the Apache Software Foundation (ASF) under one
​ or more contributor license agreements. See the NOTICE file
​ distributed with this work for additional information
​ regarding copyright ownership. The ASF licenses this file
​ to you under the Apache License, Version 2.0 (the
​ "License"); you may not use this file except in compliance
​ with the License. You may obtain a copy of the License at
​ http://www.apache.org/licenses/LICENSE-2.0
​ Unless required by applicable law or agreed to in writing,
​ software distributed under the License is distributed on an
​ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
​ KIND, either express or implied. See the License for the
​ specific language governing permissions and limitations
​ under the License.
-->
# String Processing
## RegexMatch
### Usage
The function is used to fetch matched contents from text with given regular expression.
**Name:** REGEXMATCH
**Input Series:** Only support a single input series. The data type is TEXT.
**Parameter:**
+ `regex`: The regular expression to match in the text. All grammars supported by Java are acceptable,
for example, `\d+\.\d+\.\d+\.\d+` is expected to match any IPv4 addresses.
+ `group`: The wanted group index in the matched result.
Reference to java.util.regex, group 0 is the whole pattern and
the next ones are numbered with the appearance order of left parentheses.
For example, the groups in `A(B(CD))` are: 0-`A(B(CD))`, 1-`B(CD)`, 2-`CD`.
**Output Series:** Output a single series. The type is TEXT.
**Note:** Those points with null values or not matched with the given pattern will not return any results.
### Examples
Input series:
```
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
```
SQL for query:
```sql
select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1
```
Output series:
```
+-----------------------------+----------------------------------------------------------------------+
| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")|
+-----------------------------+----------------------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 192.168.0.1|
|2021-01-01T00:00:02.000+08:00| 192.168.0.24|
|2021-01-01T00:00:03.000+08:00| 192.168.0.2|
|2021-01-01T00:00:04.000+08:00| 192.168.0.5|
|2021-01-01T00:00:05.000+08:00| 192.168.0.124|
+-----------------------------+----------------------------------------------------------------------+
```
## RegexReplace
### Usage
The function is used to replace the specific regular expression matches with given string.
**Name:** REGEXREPLACE
**Input Series:** Only support a single input series. The data type is TEXT.
**Parameter:**
+ `regex`: The target regular expression to be replaced. All grammars supported by Java are acceptable.
+ `replace`: The string to be put on and back reference notes in Java is also supported,
for example, '$1' refers to group 1 in the `regex` which will be filled with corresponding matched results.
+ `limit`: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.
+ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0.
+ `reverse`: Whether to count all the matches reversely, default to 'false'.
**Output Series:** Output a single series. The type is TEXT.
### Examples
Input series:
```
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
```
SQL for query:
```sql
select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1
```
Output series:
```
+-----------------------------+-----------------------------------------------------------+
| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",|
| | "replace"="cluster-$1", "limit"="1")|
+-----------------------------+-----------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]|
+-----------------------------+-----------------------------------------------------------+
```
## RegexSplit
### Usage
The function is used to split text with given regular expression and return specific element.
**Name:** REGEXSPLIT
**Input Series:** Only support a single input series. The data type is TEXT.
**Parameter:**
+ `regex`: The regular expression used to split the text.
All grammars supported by Java are acceptable, for example, `['"]` is expected to match `'` and `"`.
+ `index`: The wanted index of elements in the split result.
It should be an integer no less than -1, default to -1 which means the length of the result array is returned
and any non-negative integer is used to fetch the text of the specific index starting from 0.
**Output Series:** Output a single series. The type is INT32 when `index` is -1 and TEXT when it's an valid index.
**Note:** When `index` is out of the range of the result array, for example `0,1,2` split with `,` and `index` is set to 3,
no result are returned for that record.
### Examples
Input series:
```
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
```
SQL for query:
```sql
select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1
```
Output series:
```
+-----------------------------+------------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
+-----------------------------+------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 4|
|2021-01-01T00:00:02.000+08:00| 4|
|2021-01-01T00:00:03.000+08:00| 3|
|2021-01-01T00:00:04.000+08:00| 4|
|2021-01-01T00:00:05.000+08:00| 4|
+-----------------------------+------------------------------------------------------+
```
Another SQL for query:
SQL for query:
```sql
select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1
```
Output series:
```
+-----------------------------+-----------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| B-|
|2021-01-01T00:00:02.000+08:00| B+|
|2021-01-01T00:00:04.000+08:00| A|
|2021-01-01T00:00:05.000+08:00| B|
+-----------------------------+-----------------------------------------------------+
```
## StrReplace
### Usage
The function is used to replace the specific substring with given string.
**Name:** STRREPLACE
**Input Series:** Only support a single input series. The data type is TEXT.
**Parameter:**
+ `target`: The target substring to be replaced.
+ `replace`: The string to be put on.
+ `limit`: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.
+ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0.
+ `reverse`: Whether to count all the matches reversely, default to 'false'.
**Output Series:** Output a single series. The type is TEXT.
### Examples
Input series:
```
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
```
SQL for query:
```sql
select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1
```
Output series:
```
+-----------------------------+-----------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",",|
| | "replace"="/", "limit"="2")|
+-----------------------------+-----------------------------------------+
|2021-01-01T00:00:01.000+08:00| A/B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A/A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B/B|
|2021-01-01T00:00:04.000+08:00| A+/A/A+,A|
|2021-01-01T00:00:05.000+08:00| A/B-/B,B|
+-----------------------------+-----------------------------------------+
```
Another SQL for query:
```sql
select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1
```
Output series:
```
+-----------------------------+-----------------------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= |
| | "|", "limit"="1", "offset"="1", "reverse"="true")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| A,B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B,B|
|2021-01-01T00:00:04.000+08:00| A+,A/A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-/B,B|
+-----------------------------+-----------------------------------------------------+
```