| <!-- |
| |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| --> |
| |
| ## 数据匹配 |
| |
| ### Cov |
| |
| #### 函数简介 |
| |
| 本函数用于计算两列数值型数据的总体协方差。 |
| |
| **函数名:** COV |
| |
| **输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 |
| |
| **输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体协方差的数据点。 |
| |
| **提示:** |
| |
| + 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; |
| + 如果数据中所有的行都被忽略,函数将会输出`NaN`。 |
| |
| |
| #### 使用示例 |
| |
| 输入序列: |
| |
| ``` |
| +-----------------------------+---------------+---------------+ |
| | Time|root.test.d2.s1|root.test.d2.s2| |
| +-----------------------------+---------------+---------------+ |
| |2020-01-01T00:00:02.000+08:00| 100.0| 101.0| |
| |2020-01-01T00:00:03.000+08:00| 101.0| null| |
| |2020-01-01T00:00:04.000+08:00| 102.0| 101.0| |
| |2020-01-01T00:00:06.000+08:00| 104.0| 102.0| |
| |2020-01-01T00:00:08.000+08:00| 126.0| 102.0| |
| |2020-01-01T00:00:10.000+08:00| 108.0| 103.0| |
| |2020-01-01T00:00:12.000+08:00| null| 103.0| |
| |2020-01-01T00:00:14.000+08:00| 112.0| 104.0| |
| |2020-01-01T00:00:15.000+08:00| 113.0| null| |
| |2020-01-01T00:00:16.000+08:00| 114.0| 104.0| |
| |2020-01-01T00:00:18.000+08:00| 116.0| 105.0| |
| |2020-01-01T00:00:20.000+08:00| 118.0| 105.0| |
| |2020-01-01T00:00:22.000+08:00| 100.0| 106.0| |
| |2020-01-01T00:00:26.000+08:00| 124.0| 108.0| |
| |2020-01-01T00:00:28.000+08:00| 126.0| 108.0| |
| |2020-01-01T00:00:30.000+08:00| NaN| 108.0| |
| +-----------------------------+---------------+---------------+ |
| ``` |
| |
| 用于查询的 SQL 语句: |
| |
| ```sql |
| select cov(s1,s2) from root.test.d2 |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+-------------------------------------+ |
| | Time|cov(root.test.d2.s1, root.test.d2.s2)| |
| +-----------------------------+-------------------------------------+ |
| |1970-01-01T08:00:00.000+08:00| 12.291666666666666| |
| +-----------------------------+-------------------------------------+ |
| ``` |
| |
| ### Dtw |
| |
| #### 函数简介 |
| |
| 本函数用于计算两列数值型数据的 DTW 距离。 |
| |
| **函数名:** DTW |
| |
| **输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 |
| |
| **输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为两个时间序列的 DTW 距离值。 |
| |
| **提示:** |
| |
| + 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; |
| + 如果数据中所有的行都被忽略,函数将会输出 0。 |
| |
| |
| #### 使用示例 |
| |
| 输入序列: |
| |
| ``` |
| +-----------------------------+---------------+---------------+ |
| | Time|root.test.d2.s1|root.test.d2.s2| |
| +-----------------------------+---------------+---------------+ |
| |1970-01-01T08:00:00.001+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.002+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.003+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.004+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.005+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.006+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.007+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.008+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.009+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.010+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.011+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.012+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.013+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.014+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.015+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.016+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.017+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.018+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.019+08:00| 1.0| 2.0| |
| |1970-01-01T08:00:00.020+08:00| 1.0| 2.0| |
| +-----------------------------+---------------+---------------+ |
| ``` |
| |
| 用于查询的 SQL 语句: |
| |
| ```sql |
| select dtw(s1,s2) from root.test.d2 |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+-------------------------------------+ |
| | Time|dtw(root.test.d2.s1, root.test.d2.s2)| |
| +-----------------------------+-------------------------------------+ |
| |1970-01-01T08:00:00.000+08:00| 20.0| |
| +-----------------------------+-------------------------------------+ |
| ``` |
| |
| ### Pearson |
| |
| #### 函数简介 |
| |
| 本函数用于计算两列数值型数据的皮尔森相关系数。 |
| |
| **函数名:** PEARSON |
| |
| **输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 |
| |
| **输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为皮尔森相关系数的数据点。 |
| |
| **提示:** |
| |
| + 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; |
| + 如果数据中所有的行都被忽略,函数将会输出`NaN`。 |
| |
| #### 使用示例 |
| |
| 输入序列: |
| |
| ``` |
| +-----------------------------+---------------+---------------+ |
| | Time|root.test.d2.s1|root.test.d2.s2| |
| +-----------------------------+---------------+---------------+ |
| |2020-01-01T00:00:02.000+08:00| 100.0| 101.0| |
| |2020-01-01T00:00:03.000+08:00| 101.0| null| |
| |2020-01-01T00:00:04.000+08:00| 102.0| 101.0| |
| |2020-01-01T00:00:06.000+08:00| 104.0| 102.0| |
| |2020-01-01T00:00:08.000+08:00| 126.0| 102.0| |
| |2020-01-01T00:00:10.000+08:00| 108.0| 103.0| |
| |2020-01-01T00:00:12.000+08:00| null| 103.0| |
| |2020-01-01T00:00:14.000+08:00| 112.0| 104.0| |
| |2020-01-01T00:00:15.000+08:00| 113.0| null| |
| |2020-01-01T00:00:16.000+08:00| 114.0| 104.0| |
| |2020-01-01T00:00:18.000+08:00| 116.0| 105.0| |
| |2020-01-01T00:00:20.000+08:00| 118.0| 105.0| |
| |2020-01-01T00:00:22.000+08:00| 100.0| 106.0| |
| |2020-01-01T00:00:26.000+08:00| 124.0| 108.0| |
| |2020-01-01T00:00:28.000+08:00| 126.0| 108.0| |
| |2020-01-01T00:00:30.000+08:00| NaN| 108.0| |
| +-----------------------------+---------------+---------------+ |
| ``` |
| |
| 用于查询的 SQL 语句: |
| |
| ```sql |
| select pearson(s1,s2) from root.test.d2 |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+-----------------------------------------+ |
| | Time|pearson(root.test.d2.s1, root.test.d2.s2)| |
| +-----------------------------+-----------------------------------------+ |
| |1970-01-01T08:00:00.000+08:00| 0.5630881927754872| |
| +-----------------------------+-----------------------------------------+ |
| ``` |
| |
| ### PtnSym |
| |
| #### 函数简介 |
| |
| 本函数用于寻找序列中所有对称度小于阈值的对称子序列。对称度通过 DTW 计算,值越小代表序列对称性越高。 |
| |
| **函数名:** PTNSYM |
| |
| **输入序列:** 仅支持一个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 |
| |
| **参数:** |
| |
| + `window`:对称子序列的长度,是一个正整数,默认值为 10。 |
| + `threshold`:对称度阈值,是一个非负数,只有对称度小于等于该值的对称子序列才会被输出。在缺省情况下,所有的子序列都会被输出。 |
| |
| **输出序列:** 输出单个序列,类型为 DOUBLE。序列中的每一个数据点对应于一个对称子序列,时间戳为子序列的起始时刻,值为对称度。 |
| |
| |
| #### 使用示例 |
| |
| 输入序列: |
| |
| ``` |
| +-----------------------------+---------------+ |
| | Time|root.test.d1.s4| |
| +-----------------------------+---------------+ |
| |2021-01-01T12:00:00.000+08:00| 1.0| |
| |2021-01-01T12:00:01.000+08:00| 2.0| |
| |2021-01-01T12:00:02.000+08:00| 3.0| |
| |2021-01-01T12:00:03.000+08:00| 2.0| |
| |2021-01-01T12:00:04.000+08:00| 1.0| |
| |2021-01-01T12:00:05.000+08:00| 1.0| |
| |2021-01-01T12:00:06.000+08:00| 1.0| |
| |2021-01-01T12:00:07.000+08:00| 1.0| |
| |2021-01-01T12:00:08.000+08:00| 2.0| |
| |2021-01-01T12:00:09.000+08:00| 3.0| |
| |2021-01-01T12:00:10.000+08:00| 2.0| |
| |2021-01-01T12:00:11.000+08:00| 1.0| |
| +-----------------------------+---------------+ |
| ``` |
| |
| 用于查询的 SQL 语句: |
| |
| ```sql |
| select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+------------------------------------------------------+ |
| | Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| |
| +-----------------------------+------------------------------------------------------+ |
| |2021-01-01T12:00:00.000+08:00| 0.0| |
| |2021-01-01T12:00:07.000+08:00| 0.0| |
| +-----------------------------+------------------------------------------------------+ |
| ``` |
| |
| ### XCorr |
| |
| #### 函数简介 |
| |
| 本函数用于计算两条时间序列的互相关函数值, |
| 对离散序列而言,互相关函数可以表示为 |
| $$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ |
| 常用于表征两条序列在不同对齐条件下的相似度。 |
| |
| **函数名:** XCORR |
| |
| **输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 |
| |
| **输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点, |
| 其中正中心的值为两条序列按照预先对齐的结果计算的互相关系数(即等于以上公式的$CR(0)$), |
| 前半部分的值表示将后一条输入序列向前平移时计算的互相关系数, |
| 直至两条序列没有重合的数据点(不包含完全分离时的结果$CR(-N)=0.0$), |
| 后半部分类似。 |
| 用公式可表示为(所有序列的索引从1开始计数): |
| $$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ |
| $$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ |
| |
| **提示:** |
| |
| + 两条序列中的`null` 和`NaN` 值会被忽略,在计算中表现为 0。 |
| |
| #### 使用示例 |
| |
| 输入序列: |
| |
| ``` |
| +-----------------------------+---------------+---------------+ |
| | Time|root.test.d1.s1|root.test.d1.s2| |
| +-----------------------------+---------------+---------------+ |
| |2020-01-01T00:00:01.000+08:00| null| 6| |
| |2020-01-01T00:00:02.000+08:00| 2| 7| |
| |2020-01-01T00:00:03.000+08:00| 3| NaN| |
| |2020-01-01T00:00:04.000+08:00| 4| 9| |
| |2020-01-01T00:00:05.000+08:00| 5| 10| |
| +-----------------------------+---------------+---------------+ |
| ``` |
| |
| |
| 用于查询的 SQL 语句: |
| |
| ```sql |
| select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+---------------------------------------+ |
| | Time|xcorr(root.test.d1.s1, root.test.d1.s2)| |
| +-----------------------------+---------------------------------------+ |
| |1970-01-01T08:00:00.001+08:00| 0.0| |
| |1970-01-01T08:00:00.002+08:00| 4.0| |
| |1970-01-01T08:00:00.003+08:00| 9.6| |
| |1970-01-01T08:00:00.004+08:00| 13.4| |
| |1970-01-01T08:00:00.005+08:00| 20.0| |
| |1970-01-01T08:00:00.006+08:00| 15.6| |
| |1970-01-01T08:00:00.007+08:00| 9.2| |
| |1970-01-01T08:00:00.008+08:00| 11.8| |
| |1970-01-01T08:00:00.009+08:00| 6.0| |
| +-----------------------------+---------------------------------------+ |
| ``` |
| |
| ### Top-K DTW |
| |
| #### 函数简介 |
| |
| 本函数通过滑动窗口匹配算法,计算在目标时间序列中与模式时间序列的 DTW 距离最接近的 前 K 个子串。 |
| |
| **函数名:** TOP_K_DTW_SLIDING_WINDOW |
| |
| **输入参数:** |
| + 第一个时间序列:目标时间序列 |
| + 第二个时间序列:模式时间序列 |
| + 参数 k:返回的子串个数 |
| + (可选)参数 batchSize:算法每批次输入的数据行数,默认为 65535 |
| |
| **输出序列:** 输出单个序列,序列中共包含K个数据点,每个数据点包含如下信息: |
| + startTime:子串在目标时间序列中的起始时间戳 |
| + endTime:子串在目标时间序列中的结束时间戳 |
| + distance:子串与模式时间序列的 DTW 距离 |
| |
| #### 使用示例 |
| |
| 首先保证 IoTDB 中已写入目标时间序列和模式时间序列,随后可用下面的 SQL 语句查询: |
| |
| ```sql |
| select top_k_dtw_sliding_window(s, p, 'k'='20') from root.database.device; |
| ``` |
| |
| 输出序列: |
| |
| ``` |
| +-----------------------------+----------------------------------------------------------------------------------+ |
| | Time|top_k_dtw_sliding_window(root.database.device.s, root.database.device.p, "k"="20")| |
| +-----------------------------+----------------------------------------------------------------------------------+ |
| |2000-03-31T14:46:40.000+08:00| DTWPath{startTime=954485200000, endTime=954485224000, distance=0.0}| |
| |2000-03-31T17:33:45.000+08:00| DTWPath{startTime=954495225000, endTime=954495250000, distance=0.0}| |
| |2000-03-31T20:20:51.000+08:00| DTWPath{startTime=954505251000, endTime=954505278000, distance=0.0}| |
| |2000-03-31T23:07:59.000+08:00| DTWPath{startTime=954515279000, endTime=954515301000, distance=0.0}| |
| |2000-04-01T01:55:02.000+08:00| DTWPath{startTime=954525302000, endTime=954525325000, distance=0.0}| |
| |2000-04-01T04:42:06.000+08:00| DTWPath{startTime=954535326000, endTime=954535350000, distance=0.0}| |
| |2000-04-01T07:29:11.000+08:00| DTWPath{startTime=954545351000, endTime=954545376000, distance=0.0}| |
| |2000-04-01T10:16:17.000+08:00| DTWPath{startTime=954555377000, endTime=954555403000, distance=0.0}| |
| |2000-04-01T13:03:24.000+08:00| DTWPath{startTime=954565404000, endTime=954565432000, distance=0.0}| |
| |2000-04-01T15:50:33.000+08:00| DTWPath{startTime=954575433000, endTime=954575461000, distance=0.0}| |
| |2000-04-01T18:37:42.000+08:00| DTWPath{startTime=954585462000, endTime=954585488000, distance=0.0}| |
| |2000-04-01T21:24:49.000+08:00| DTWPath{startTime=954595489000, endTime=954595516000, distance=0.0}| |
| |2000-04-02T00:11:57.000+08:00| DTWPath{startTime=954605517000, endTime=954605541000, distance=0.0}| |
| |2000-04-02T02:59:02.000+08:00| DTWPath{startTime=954615542000, endTime=954615570000, distance=0.0}| |
| |2000-04-02T05:46:11.000+08:00| DTWPath{startTime=954625571000, endTime=954625597000, distance=0.0}| |
| |2000-04-02T08:33:18.000+08:00| DTWPath{startTime=954635598000, endTime=954635623000, distance=0.0}| |
| |2000-04-02T11:20:24.000+08:00| DTWPath{startTime=954645624000, endTime=954645657000, distance=0.0}| |
| |2000-04-02T14:07:38.000+08:00| DTWPath{startTime=954655658000, endTime=954655682000, distance=0.0}| |
| |2000-04-02T16:54:43.000+08:00| DTWPath{startTime=954665683000, endTime=954665708000, distance=0.0}| |
| |2000-04-02T19:41:49.000+08:00| DTWPath{startTime=954675709000, endTime=954675736000, distance=0.0}| |
| +-----------------------------+----------------------------------------------------------------------------------+ |
| ``` |