| <!-- |
| |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| --> |
| |
| # String Processing |
| |
| ## RegexMatch |
| |
| ### Usage |
| |
| The function is used to fetch matched contents from text with given regular expression. |
| |
| **Name:** REGEXMATCH |
| |
| **Input Series:** Only support a single input series. The data type is TEXT. |
| |
| **Parameter:** |
| |
| + `regex`: The regular expression to match in the text. All grammars supported by Java are acceptable, |
| for example, `\d+\.\d+\.\d+\.\d+` is expected to match any IPv4 addresses. |
| + `group`: The wanted group index in the matched result. |
| Reference to java.util.regex, group 0 is the whole pattern and |
| the next ones are numbered with the appearance order of left parentheses. |
| For example, the groups in `A(B(CD))` are: 0-`A(B(CD))`, 1-`B(CD)`, 2-`CD`. |
| |
| **Output Series:** Output a single series. The type is TEXT. |
| |
| **Note:** Those points with null values or not matched with the given pattern will not return any results. |
| |
| ### Examples |
| |
| Input series: |
| |
| ``` |
| +-----------------------------+-------------------------------+ |
| | Time| root.test.d1.s1| |
| +-----------------------------+-------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| |
| |2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| |
| |2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| |
| |2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| |
| |2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| |
| +-----------------------------+-------------------------------+ |
| ``` |
| |
| SQL for query: |
| |
| ```sql |
| select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+----------------------------------------------------------------------+ |
| | Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| |
| +-----------------------------+----------------------------------------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| 192.168.0.1| |
| |2021-01-01T00:00:02.000+08:00| 192.168.0.24| |
| |2021-01-01T00:00:03.000+08:00| 192.168.0.2| |
| |2021-01-01T00:00:04.000+08:00| 192.168.0.5| |
| |2021-01-01T00:00:05.000+08:00| 192.168.0.124| |
| +-----------------------------+----------------------------------------------------------------------+ |
| ``` |
| |
| ## RegexReplace |
| |
| ### Usage |
| |
| The function is used to replace the specific regular expression matches with given string. |
| |
| **Name:** REGEXREPLACE |
| |
| **Input Series:** Only support a single input series. The data type is TEXT. |
| |
| **Parameter:** |
| |
| + `regex`: The target regular expression to be replaced. All grammars supported by Java are acceptable. |
| + `replace`: The string to be put on and back reference notes in Java is also supported, |
| for example, '$1' refers to group 1 in the `regex` which will be filled with corresponding matched results. |
| + `limit`: The number of matches to be replaced which should be an integer no less than -1, |
| default to -1 which means all matches will be replaced. |
| + `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. |
| + `reverse`: Whether to count all the matches reversely, default to 'false'. |
| |
| **Output Series:** Output a single series. The type is TEXT. |
| |
| ### Examples |
| |
| Input series: |
| |
| ``` |
| +-----------------------------+-------------------------------+ |
| | Time| root.test.d1.s1| |
| +-----------------------------+-------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| |
| |2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| |
| |2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| |
| |2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| |
| |2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| |
| +-----------------------------+-------------------------------+ |
| ``` |
| |
| SQL for query: |
| |
| ```sql |
| select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+-----------------------------------------------------------+ |
| | Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",| |
| | | "replace"="cluster-$1", "limit"="1")| |
| +-----------------------------+-----------------------------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]| |
| |2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]| |
| |2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]| |
| |2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]| |
| |2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]| |
| +-----------------------------+-----------------------------------------------------------+ |
| ``` |
| |
| ## RegexSplit |
| |
| ### Usage |
| |
| The function is used to split text with given regular expression and return specific element. |
| |
| **Name:** REGEXSPLIT |
| |
| **Input Series:** Only support a single input series. The data type is TEXT. |
| |
| **Parameter:** |
| |
| + `regex`: The regular expression used to split the text. |
| All grammars supported by Java are acceptable, for example, `['"]` is expected to match `'` and `"`. |
| + `index`: The wanted index of elements in the split result. |
| It should be an integer no less than -1, default to -1 which means the length of the result array is returned |
| and any non-negative integer is used to fetch the text of the specific index starting from 0. |
| |
| **Output Series:** Output a single series. The type is INT32 when `index` is -1 and TEXT when it's an valid index. |
| |
| **Note:** When `index` is out of the range of the result array, for example `0,1,2` split with `,` and `index` is set to 3, |
| no result are returned for that record. |
| |
| ### Examples |
| |
| Input series: |
| |
| ``` |
| +-----------------------------+---------------+ |
| | Time|root.test.d1.s1| |
| +-----------------------------+---------------+ |
| |2021-01-01T00:00:01.000+08:00| A,B,A+,B-| |
| |2021-01-01T00:00:02.000+08:00| A,A+,A,B+| |
| |2021-01-01T00:00:03.000+08:00| B+,B,B| |
| |2021-01-01T00:00:04.000+08:00| A+,A,A+,A| |
| |2021-01-01T00:00:05.000+08:00| A,B-,B,B| |
| +-----------------------------+---------------+ |
| ``` |
| |
| SQL for query: |
| |
| ```sql |
| select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+------------------------------------------------------+ |
| | Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")| |
| +-----------------------------+------------------------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| 4| |
| |2021-01-01T00:00:02.000+08:00| 4| |
| |2021-01-01T00:00:03.000+08:00| 3| |
| |2021-01-01T00:00:04.000+08:00| 4| |
| |2021-01-01T00:00:05.000+08:00| 4| |
| +-----------------------------+------------------------------------------------------+ |
| ``` |
| |
| Another SQL for query: |
| |
| SQL for query: |
| |
| ```sql |
| select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+-----------------------------------------------------+ |
| | Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")| |
| +-----------------------------+-----------------------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| B-| |
| |2021-01-01T00:00:02.000+08:00| B+| |
| |2021-01-01T00:00:04.000+08:00| A| |
| |2021-01-01T00:00:05.000+08:00| B| |
| +-----------------------------+-----------------------------------------------------+ |
| ``` |
| |
| ## StrReplace |
| |
| ### Usage |
| |
| The function is used to replace the specific substring with given string. |
| |
| **Name:** STRREPLACE |
| |
| **Input Series:** Only support a single input series. The data type is TEXT. |
| |
| **Parameter:** |
| |
| + `target`: The target substring to be replaced. |
| + `replace`: The string to be put on. |
| + `limit`: The number of matches to be replaced which should be an integer no less than -1, |
| default to -1 which means all matches will be replaced. |
| + `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. |
| + `reverse`: Whether to count all the matches reversely, default to 'false'. |
| |
| **Output Series:** Output a single series. The type is TEXT. |
| |
| ### Examples |
| |
| Input series: |
| |
| ``` |
| +-----------------------------+---------------+ |
| | Time|root.test.d1.s1| |
| +-----------------------------+---------------+ |
| |2021-01-01T00:00:01.000+08:00| A,B,A+,B-| |
| |2021-01-01T00:00:02.000+08:00| A,A+,A,B+| |
| |2021-01-01T00:00:03.000+08:00| B+,B,B| |
| |2021-01-01T00:00:04.000+08:00| A+,A,A+,A| |
| |2021-01-01T00:00:05.000+08:00| A,B-,B,B| |
| +-----------------------------+---------------+ |
| ``` |
| |
| SQL for query: |
| |
| ```sql |
| select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+-----------------------------------------+ |
| | Time|strreplace(root.test.d1.s1, "target"=",",| |
| | | "replace"="/", "limit"="2")| |
| +-----------------------------+-----------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| A/B/A+,B-| |
| |2021-01-01T00:00:02.000+08:00| A/A+/A,B+| |
| |2021-01-01T00:00:03.000+08:00| B+/B/B| |
| |2021-01-01T00:00:04.000+08:00| A+/A/A+,A| |
| |2021-01-01T00:00:05.000+08:00| A/B-/B,B| |
| +-----------------------------+-----------------------------------------+ |
| ``` |
| |
| Another SQL for query: |
| |
| ```sql |
| select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 |
| ``` |
| |
| Output series: |
| |
| ``` |
| +-----------------------------+-----------------------------------------------------+ |
| | Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | |
| | | "|", "limit"="1", "offset"="1", "reverse"="true")| |
| +-----------------------------+-----------------------------------------------------+ |
| |2021-01-01T00:00:01.000+08:00| A,B/A+,B-| |
| |2021-01-01T00:00:02.000+08:00| A,A+/A,B+| |
| |2021-01-01T00:00:03.000+08:00| B+/B,B| |
| |2021-01-01T00:00:04.000+08:00| A+,A/A+,A| |
| |2021-01-01T00:00:05.000+08:00| A,B-/B,B| |
| +-----------------------------+-----------------------------------------------------+ |
| ``` |