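This function detects distribution anomalies, that is, values lying more than 1.5 × IQR below the lower quartile or above the upper quartile.
Name: IQR
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- method: When set to "batch", detection runs after all data has been read. When set to "stream", the user must supply the lower and upper quartiles and detection runs in a streaming fashion. The default is "batch".
- q1: The lower quartile, used for streaming detection.
- q3: The upper quartile, used for streaming detection.
Output Series: Output a single series. The type is DOUBLE.
Note: $IQR=Q_3-Q_1$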
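The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IoTDB's implementation: the helper names `quartile` and `iqr_outliers` are hypothetical, and quartiles are estimated by linear interpolation, a convention the text does not specify.

```python
def quartile(sorted_vals, q):
    """Quantile by linear interpolation (an assumed convention)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr_outliers(values, q1=None, q3=None):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].

    Without q1/q3 the quartiles are estimated from all the data ("batch" style);
    with q1/q3 supplied they are used as given ("stream" style).
    """
    if q1 is None or q3 is None:
        ordered = sorted(values)
        q1, q3 = quartile(ordered, 0.25), quartile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Values of root.test.s1 from the example in this section.
data = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0,
        1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 10.0, 2.0, -2.0, 0.0]
print(iqr_outliers(data))  # [10.0]
```

With these assumptions Q1 = -0.25 and Q3 = 1.0, so the accepted interval is [-2.125, 2.875] and only 10.0 falls outside it.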
Input series:
+-----------------------------+------------+
|                         Time|root.test.s1|
+-----------------------------+------------+
|1970-01-01T08:00:00.100+08:00|         0.0|
|1970-01-01T08:00:00.200+08:00|         0.0|
|1970-01-01T08:00:00.300+08:00|         1.0|
|1970-01-01T08:00:00.400+08:00|        -1.0|
|1970-01-01T08:00:00.500+08:00|         0.0|
|1970-01-01T08:00:00.600+08:00|         0.0|
|1970-01-01T08:00:00.700+08:00|        -2.0|
|1970-01-01T08:00:00.800+08:00|         2.0|
|1970-01-01T08:00:00.900+08:00|         0.0|
|1970-01-01T08:00:01.000+08:00|         0.0|
|1970-01-01T08:00:01.100+08:00|         1.0|
|1970-01-01T08:00:01.200+08:00|        -1.0|
|1970-01-01T08:00:01.300+08:00|        -1.0|
|1970-01-01T08:00:01.400+08:00|         1.0|
|1970-01-01T08:00:01.500+08:00|         0.0|
|1970-01-01T08:00:01.600+08:00|         0.0|
|1970-01-01T08:00:01.700+08:00|        10.0|
|1970-01-01T08:00:01.800+08:00|         2.0|
|1970-01-01T08:00:01.900+08:00|        -2.0|
|1970-01-01T08:00:02.000+08:00|         0.0|
+-----------------------------+------------+
SQL for query:
select iqr(s1) from root.test
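Output series:
+-----------------------------+-----------------+
|                         Time|iqr(root.test.s1)|
+-----------------------------+-----------------+
|1970-01-01T08:00:01.700+08:00|             10.0|
+-----------------------------+-----------------+

This function detects anomalies with the dynamic K-Sigma algorithm. Within a window, a data point whose distance from the mean exceeds k times the standard deviation is treated as an anomaly and output.
Name: KSIGMA
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- k: The multiple of the standard deviation used as the anomaly threshold in the dynamic K-Sigma algorithm. The default is 3.
- window: The sliding window size of the dynamic K-Sigma algorithm. The default is 10000.
Output Series: Output a single series. The type is the same as the input series.
Note: k must be greater than 0; otherwise nothing is output.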
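A minimal Python sketch of the K-Sigma rule follows. It is an illustration, not IoTDB's code: the function name `ksigma_outliers` is hypothetical, the whole input is treated as a single window (reasonable when the series is shorter than `window`), NaN values are skipped, and the population standard deviation is assumed because the text does not say which estimator is used.

```python
import math
import statistics


def ksigma_outliers(values, k=3.0):
    """Return values whose distance from the mean exceeds k standard deviations."""
    if k <= 0:
        return []  # documented behaviour: nothing is output unless k > 0
    clean = [v for v in values if not math.isnan(v)]  # assumption: NaN is skipped
    if len(clean) < 2:
        return []
    mean = statistics.fmean(clean)
    std = statistics.pstdev(clean)  # assumption: population standard deviation
    return [v for v in clean if abs(v - mean) > k * std]


# Values of root.test.d1.s1 from the example in this section.
data = [0.0, 50.0, 100.0, 150.0, 200.0, 200.0, 200.0, 200.0,
        200.0, 200.0, 150.0, 100.0, 50.0, 0.0, float("nan")]
print(ksigma_outliers(data, k=1.0))  # [0.0, 50.0, 50.0, 0.0]
```

For this data the mean is about 128.57 and the standard deviation about 74.9, so with k = 1 only the values 0.0 and 50.0 lie far enough from the mean. With the default k = 3 no value qualifies.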
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00|            0.0|
|2020-01-01T00:00:03.000+08:00|           50.0|
|2020-01-01T00:00:04.000+08:00|          100.0|
|2020-01-01T00:00:06.000+08:00|          150.0|
|2020-01-01T00:00:08.000+08:00|          200.0|
|2020-01-01T00:00:10.000+08:00|          200.0|
|2020-01-01T00:00:14.000+08:00|          200.0|
|2020-01-01T00:00:15.000+08:00|          200.0|
|2020-01-01T00:00:16.000+08:00|          200.0|
|2020-01-01T00:00:18.000+08:00|          200.0|
|2020-01-01T00:00:20.000+08:00|          150.0|
|2020-01-01T00:00:22.000+08:00|          100.0|
|2020-01-01T00:00:26.000+08:00|           50.0|
|2020-01-01T00:00:28.000+08:00|            0.0|
|2020-01-01T00:00:30.000+08:00|            NaN|
+-----------------------------+---------------+
SQL for query:
select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
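Output series:
+-----------------------------+---------------------------------+
|Time                         |ksigma(root.test.d1.s1,"k"="1.0")|
+-----------------------------+---------------------------------+
|2020-01-01T00:00:02.000+08:00|                              0.0|
|2020-01-01T00:00:03.000+08:00|                             50.0|
|2020-01-01T00:00:26.000+08:00|                             50.0|
|2020-01-01T00:00:28.000+08:00|                              0.0|
+-----------------------------+---------------------------------+

This function uses the Local Outlier Factor (LOF) method to find density anomalies in the series. Based on the supplied k for the k-th distance and an LOF threshold, it decides whether each input point is an outlier, that is, an anomaly, and outputs the LOF value of every point.
Name: LOF
Input Series: Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- method: The detection method. The default is default, which computes on the input as high-dimensional data. When set to series, a one-dimensional time series is converted into high-dimensional data before computation.
- k: The k-th distance is used to compute the local outlier factor. The default is 3.
- window: The length of the window of data read each time. The default is 10000.
- windowsize: With the series method, the dimension of the converted high-dimensional data, that is, the size of a single window. The default is 5.
Output Series: Output a single time series. The type is DOUBLE.
Note: Incomplete rows are ignored. They take no part in the computation and are not marked as outliers.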
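For orientation, the textbook LOF computation and the series-to-window conversion can be sketched as below. This is an assumption-laden illustration: `lof_scores` and `to_windows` are hypothetical names, the code follows the standard LOF definition rather than IoTDB's source, and its scores are not expected to reproduce the example outputs in this section.

```python
import math


def lof_scores(points, k=3):
    """Textbook LOF for a list of equal-length numeric tuples (needs len(points) > k)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])  # ties are kept
    # Local reachability density: inverse of the mean reachability distance.
    # Assumes the data has no k+1 identical points, which would give a zero sum.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


def to_windows(series, windowsize=5):
    """Turn a 1-D series into overlapping windowsize-dimensional points."""
    return [tuple(series[i:i + windowsize])
            for i in range(len(series) - windowsize + 1)]


# Hypothetical data: a tight cluster plus one distant point.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = lof_scores(pts, k=3)
print(scores.index(max(scores)))  # 5, the distant point
```

A score near 1 means a point is about as dense as its neighbours, while a score well above 1 marks a point in a sparser region. `to_windows` shows why a 20-point series with `windowsize` 5 yields 16 rows of output.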
Input series:
+-----------------------------+---------------+---------------+
|                         Time|root.test.d1.s1|root.test.d1.s2|
+-----------------------------+---------------+---------------+
|1970-01-01T08:00:00.100+08:00|            0.0|            0.0|
|1970-01-01T08:00:00.200+08:00|            0.0|            1.0|
|1970-01-01T08:00:00.300+08:00|            1.0|            1.0|
|1970-01-01T08:00:00.400+08:00|            1.0|            0.0|
|1970-01-01T08:00:00.500+08:00|            0.0|           -1.0|
|1970-01-01T08:00:00.600+08:00|           -1.0|           -1.0|
|1970-01-01T08:00:00.700+08:00|           -1.0|            0.0|
|1970-01-01T08:00:00.800+08:00|            2.0|            2.0|
|1970-01-01T08:00:00.900+08:00|            0.0|           null|
+-----------------------------+---------------+---------------+
SQL for query:
select lof(s1,s2) from root.test.d1 where time<1000
Output series:
+-----------------------------+-------------------------------------+
|                         Time|lof(root.test.d1.s1, root.test.d1.s2)|
+-----------------------------+-------------------------------------+
|1970-01-01T08:00:00.100+08:00|                   3.8274824267668244|
|1970-01-01T08:00:00.200+08:00|                   3.0117631741126156|
|1970-01-01T08:00:00.300+08:00|                    2.838155437762879|
|1970-01-01T08:00:00.400+08:00|                   3.0117631741126156|
|1970-01-01T08:00:00.500+08:00|                     2.73518261244453|
|1970-01-01T08:00:00.600+08:00|                    2.371440975708148|
|1970-01-01T08:00:00.700+08:00|                     2.73518261244453|
|1970-01-01T08:00:00.800+08:00|                   1.7561416374270742|
+-----------------------------+-------------------------------------+
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|1970-01-01T08:00:00.100+08:00|            1.0|
|1970-01-01T08:00:00.200+08:00|            2.0|
|1970-01-01T08:00:00.300+08:00|            3.0|
|1970-01-01T08:00:00.400+08:00|            4.0|
|1970-01-01T08:00:00.500+08:00|            5.0|
|1970-01-01T08:00:00.600+08:00|            6.0|
|1970-01-01T08:00:00.700+08:00|            7.0|
|1970-01-01T08:00:00.800+08:00|            8.0|
|1970-01-01T08:00:00.900+08:00|            9.0|
|1970-01-01T08:00:01.000+08:00|           10.0|
|1970-01-01T08:00:01.100+08:00|           11.0|
|1970-01-01T08:00:01.200+08:00|           12.0|
|1970-01-01T08:00:01.300+08:00|           13.0|
|1970-01-01T08:00:01.400+08:00|           14.0|
|1970-01-01T08:00:01.500+08:00|           15.0|
|1970-01-01T08:00:01.600+08:00|           16.0|
|1970-01-01T08:00:01.700+08:00|           17.0|
|1970-01-01T08:00:01.800+08:00|           18.0|
|1970-01-01T08:00:01.900+08:00|           19.0|
|1970-01-01T08:00:02.000+08:00|           20.0|
+-----------------------------+---------------+
SQL for query:
select lof(s1, "method"="series") from root.test.d1 where time<1000
Output series:
+-----------------------------+--------------------+
|                         Time|lof(root.test.d1.s1)|
+-----------------------------+--------------------+
|1970-01-01T08:00:00.100+08:00|    3.77777777777778|
|1970-01-01T08:00:00.200+08:00|    4.32727272727273|
|1970-01-01T08:00:00.300+08:00|    4.85714285714286|
|1970-01-01T08:00:00.400+08:00|    5.40909090909091|
|1970-01-01T08:00:00.500+08:00|    5.94999999999999|
|1970-01-01T08:00:00.600+08:00|    6.43243243243243|
|1970-01-01T08:00:00.700+08:00|    6.79999999999999|
|1970-01-01T08:00:00.800+08:00|                 7.0|
|1970-01-01T08:00:00.900+08:00|                 7.0|
|1970-01-01T08:00:01.000+08:00|    6.79999999999999|
|1970-01-01T08:00:01.100+08:00|    6.43243243243243|
|1970-01-01T08:00:01.200+08:00|    5.94999999999999|
|1970-01-01T08:00:01.300+08:00|    5.40909090909091|
|1970-01-01T08:00:01.400+08:00|    4.85714285714286|
|1970-01-01T08:00:01.500+08:00|    4.32727272727273|
|1970-01-01T08:00:01.600+08:00|    3.77777777777778|
+-----------------------------+--------------------+

This function detects missing-data anomalies. In some data, missing values are filled by linear interpolation, which leaves perfectly linear segments in the data, and these segments are often long. The function detects missing anomalies by finding such perfectly linear segments.
Name: MISSDETECT
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- minlen: The minimum length of a perfectly linear segment that is marked as anomalous. It is an integer greater than or equal to 10. The default is 10.
Output Series: Output a single series. The type is BOOLEAN, indicating whether each data point is a missing anomaly.
Note: NaN values in the data are ignored.
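The idea of finding perfectly linear segments can be sketched as follows. This is a simplified illustration, not IoTDB's algorithm: `linear_runs` is a hypothetical name, samples are assumed to be evenly spaced in time (so a constant step between consecutive values means a straight line), and the input is assumed to contain no NaN.

```python
import math


def linear_runs(values, minlen=10):
    """Flag every point that lies on a run of at least minlen collinear points."""
    n = len(values)
    flags = [False] * n
    diffs = [values[i + 1] - values[i] for i in range(n - 1)]
    i = 0
    while i < len(diffs):
        j = i
        # Extend the run while the step between consecutive values stays the same.
        while j + 1 < len(diffs) and math.isclose(diffs[j + 1], diffs[i], abs_tol=1e-9):
            j += 1
        # diffs[i..j] are equal, so points i..j+1 lie on one straight line.
        if (j + 1) - i + 1 >= minlen:
            for p in range(i, j + 2):
                flags[p] = True
        i = j + 1
    return flags


# Values of root.test.d2.s2 from the example in this section.
values = [0.0, 1.0, 0.0, 1.0] + [0.0] * 12 + [1.0, 0.0, 1.0, 0.0, 1.0]
print(linear_runs(values, minlen=10))
```

On this data the twelve consecutive zeros form one constant segment of length 12, so those points are flagged and the alternating values around them are not.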
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d2.s2|
+-----------------------------+---------------+
|2021-07-01T12:00:00.000+08:00|            0.0|
|2021-07-01T12:00:01.000+08:00|            1.0|
|2021-07-01T12:00:02.000+08:00|            0.0|
|2021-07-01T12:00:03.000+08:00|            1.0|
|2021-07-01T12:00:04.000+08:00|            0.0|
|2021-07-01T12:00:05.000+08:00|            0.0|
|2021-07-01T12:00:06.000+08:00|            0.0|
|2021-07-01T12:00:07.000+08:00|            0.0|
|2021-07-01T12:00:08.000+08:00|            0.0|
|2021-07-01T12:00:09.000+08:00|            0.0|
|2021-07-01T12:00:10.000+08:00|            0.0|
|2021-07-01T12:00:11.000+08:00|            0.0|
|2021-07-01T12:00:12.000+08:00|            0.0|
|2021-07-01T12:00:13.000+08:00|            0.0|
|2021-07-01T12:00:14.000+08:00|            0.0|
|2021-07-01T12:00:15.000+08:00|            0.0|
|2021-07-01T12:00:16.000+08:00|            1.0|
|2021-07-01T12:00:17.000+08:00|            0.0|
|2021-07-01T12:00:18.000+08:00|            1.0|
|2021-07-01T12:00:19.000+08:00|            0.0|
|2021-07-01T12:00:20.000+08:00|            1.0|
+-----------------------------+---------------+
SQL for query:
select missdetect(s2,'minlen'='10') from root.test.d2
Output series:
+-----------------------------+------------------------------------------+
|                         Time|missdetect(root.test.d2.s2, "minlen"="10")|
+-----------------------------+------------------------------------------+
|2021-07-01T12:00:00.000+08:00|                                     false|
|2021-07-01T12:00:01.000+08:00|                                     false|
|2021-07-01T12:00:02.000+08:00|                                     false|
|2021-07-01T12:00:03.000+08:00|                                     false|
|2021-07-01T12:00:04.000+08:00|                                      true|
|2021-07-01T12:00:05.000+08:00|                                      true|
|2021-07-01T12:00:06.000+08:00|                                      true|
|2021-07-01T12:00:07.000+08:00|                                      true|
|2021-07-01T12:00:08.000+08:00|                                      true|
|2021-07-01T12:00:09.000+08:00|                                      true|
|2021-07-01T12:00:10.000+08:00|                                      true|
|2021-07-01T12:00:11.000+08:00|                                      true|
|2021-07-01T12:00:12.000+08:00|                                      true|
|2021-07-01T12:00:13.000+08:00|                                      true|
|2021-07-01T12:00:14.000+08:00|                                      true|
|2021-07-01T12:00:15.000+08:00|                                      true|
|2021-07-01T12:00:16.000+08:00|                                     false|
|2021-07-01T12:00:17.000+08:00|                                     false|
|2021-07-01T12:00:18.000+08:00|                                     false|
|2021-07-01T12:00:19.000+08:00|                                     false|
|2021-07-01T12:00:20.000+08:00|                                     false|
+-----------------------------+------------------------------------------+

This function finds range anomalies in a time series. Given a lower and an upper bound, it checks whether each input value falls outside the bounds, that is, is anomalous, and outputs all anomalous points as a new time series.
Name: RANGE
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- lower_bound: The lower bound for range anomaly detection.
- upper_bound: The upper bound for range anomaly detection.
Output Series: Output a single series. The type is the same as the input series.
Note: upper_bound must be greater than lower_bound; otherwise nothing is output.
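The range check itself is a simple filter, sketched here in Python. The name `out_of_range` and the sample data are hypothetical. Treating a value equal to a bound as in range and skipping NaN are assumptions, chosen because the example in this section outputs neither 101.0 nor the NaN row.

```python
import math


def out_of_range(points, lower_bound, upper_bound):
    """Return the (time, value) pairs whose value lies outside the bounds."""
    if upper_bound <= lower_bound:
        return []  # documented behaviour: nothing is output for invalid bounds
    return [(t, v) for t, v in points
            if not math.isnan(v) and (v < lower_bound or v > upper_bound)]


# Hypothetical (time, value) pairs for illustration.
points = [(1, 100.0), (2, 101.0), (3, 110.0), (4, 126.0), (5, float("nan"))]
print(out_of_range(points, 101.0, 125.0))  # [(1, 100.0), (4, 126.0)]
```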
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00|          100.0|
|2020-01-01T00:00:03.000+08:00|          101.0|
|2020-01-01T00:00:04.000+08:00|          102.0|
|2020-01-01T00:00:06.000+08:00|          104.0|
|2020-01-01T00:00:08.000+08:00|          126.0|
|2020-01-01T00:00:10.000+08:00|          108.0|
|2020-01-01T00:00:14.000+08:00|          112.0|
|2020-01-01T00:00:15.000+08:00|          113.0|
|2020-01-01T00:00:16.000+08:00|          114.0|
|2020-01-01T00:00:18.000+08:00|          116.0|
|2020-01-01T00:00:20.000+08:00|          118.0|
|2020-01-01T00:00:22.000+08:00|          120.0|
|2020-01-01T00:00:26.000+08:00|          124.0|
|2020-01-01T00:00:28.000+08:00|          126.0|
|2020-01-01T00:00:30.000+08:00|            NaN|
+-----------------------------+---------------+
SQL for query:
select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30
Output series:
+-----------------------------+------------------------------------------------------------------+
|Time                         |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")|
+-----------------------------+------------------------------------------------------------------+
|2020-01-01T00:00:02.000+08:00|                                                             100.0|
|2020-01-01T00:00:28.000+08:00|                                                             126.0|
+-----------------------------+------------------------------------------------------------------+

This function filters anomalous points out of the input series using the two-sided window detection method.
Name: TWOSIDEDFILTER
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Output Series: Output a single series. The type is the same as the input. It is the input series with the anomalous points removed.
Parameters:
- len: The window size used by the two-sided window detection method. It must be a positive integer. The default is 5. For example, when len=3, the algorithm takes a window of length 3 before and after each point and computes the anomaly degree within those windows.
- threshold: The threshold of the anomaly degree, in the range (0,1). The default is 0.3. The higher the threshold, the stricter the criterion for judging a point anomalous.
Input series:
+-----------------------------+------------+
|                         Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00|      2002.0|
|1970-01-01T08:00:01.000+08:00|      1946.0|
|1970-01-01T08:00:02.000+08:00|      1958.0|
|1970-01-01T08:00:03.000+08:00|      2012.0|
|1970-01-01T08:00:04.000+08:00|      2051.0|
|1970-01-01T08:00:05.000+08:00|      1898.0|
|1970-01-01T08:00:06.000+08:00|      2014.0|
|1970-01-01T08:00:07.000+08:00|      2052.0|
|1970-01-01T08:00:08.000+08:00|      1935.0|
|1970-01-01T08:00:09.000+08:00|      1901.0|
|1970-01-01T08:00:10.000+08:00|      1972.0|
|1970-01-01T08:00:11.000+08:00|      1969.0|
|1970-01-01T08:00:12.000+08:00|      1984.0|
|1970-01-01T08:00:13.000+08:00|      2018.0|
|1970-01-01T08:00:37.000+08:00|      1484.0|
|1970-01-01T08:00:38.000+08:00|      1055.0|
|1970-01-01T08:00:39.000+08:00|      1050.0|
|1970-01-01T08:01:05.000+08:00|      1023.0|
|1970-01-01T08:01:06.000+08:00|      1056.0|
|1970-01-01T08:01:07.000+08:00|       978.0|
|1970-01-01T08:01:08.000+08:00|      1050.0|
|1970-01-01T08:01:09.000+08:00|      1123.0|
|1970-01-01T08:01:10.000+08:00|      1150.0|
|1970-01-01T08:01:11.000+08:00|      1034.0|
|1970-01-01T08:01:12.000+08:00|       950.0|
|1970-01-01T08:01:13.000+08:00|      1059.0|
+-----------------------------+------------+
SQL for query:
select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test
Output series:
+-----------------------------+------------+
|                         Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00|      2002.0|
|1970-01-01T08:00:01.000+08:00|      1946.0|
|1970-01-01T08:00:02.000+08:00|      1958.0|
|1970-01-01T08:00:03.000+08:00|      2012.0|
|1970-01-01T08:00:04.000+08:00|      2051.0|
|1970-01-01T08:00:05.000+08:00|      1898.0|
|1970-01-01T08:00:06.000+08:00|      2014.0|
|1970-01-01T08:00:07.000+08:00|      2052.0|
|1970-01-01T08:00:08.000+08:00|      1935.0|
|1970-01-01T08:00:09.000+08:00|      1901.0|
|1970-01-01T08:00:10.000+08:00|      1972.0|
|1970-01-01T08:00:11.000+08:00|      1969.0|
|1970-01-01T08:00:12.000+08:00|      1984.0|
|1970-01-01T08:00:13.000+08:00|      2018.0|
|1970-01-01T08:01:05.000+08:00|      1023.0|
|1970-01-01T08:01:06.000+08:00|      1056.0|
|1970-01-01T08:01:07.000+08:00|       978.0|
|1970-01-01T08:01:08.000+08:00|      1050.0|
|1970-01-01T08:01:09.000+08:00|      1123.0|
|1970-01-01T08:01:10.000+08:00|      1150.0|
|1970-01-01T08:01:11.000+08:00|      1034.0|
|1970-01-01T08:01:12.000+08:00|       950.0|
|1970-01-01T08:01:13.000+08:00|      1059.0|
+-----------------------------+------------+

This function detects distance-based outliers. Within the current window, a point is an outlier if the number of its neighbors within the distance threshold (including the point itself) is less than the density threshold.
Name: OUTLIER
Input Series: Only a single input series is supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- r: The distance threshold for distance-based anomaly detection.
- k: The density threshold for distance-based anomaly detection.
- w: The size of the sliding window.
- s: The step of the sliding window.
Output Series: Output a single series. The type is the same as the input series.
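A brute-force Python sketch of this rule is shown below. It is not IoTDB's implementation: `distance_outliers` is a hypothetical name, a neighbor is assumed to be any value in the same window whose absolute difference is at most r, and a point is reported if it is an outlier in at least one window that contains it.

```python
def distance_outliers(values, r, k, w, s):
    """Return values that have fewer than k neighbours within r in some window."""
    n = len(values)
    flagged = set()
    # Window start positions; a series shorter than w forms a single window.
    for start in range(0, max(n - w, 0) + 1, s):
        window = values[start:start + w]
        for offset, v in enumerate(window):
            # The count includes the point itself, as the definition requires.
            neighbours = sum(1 for u in window if abs(u - v) <= r)
            if neighbours < k:
                flagged.add(start + offset)
    return [values[i] for i in sorted(flagged)]


# Values of root.test.s1 from the example in this section.
data = [56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2,
        62.3, 58.0, 58.9, 52.0, 62.3, 61.0, 64.2, 61.8, 64.0, 63.0]
print(distance_outliers(data, r=5.0, k=4, w=10, s=5))  # [69.0, 52.0]
```

With w = 10 and s = 5 the 20 points are covered by three windows starting at positions 0, 5 and 10. The value 69.0 has only three neighbors within 5.0 (itself, 64.5 and 64.2), and 52.0 has none besides itself, so both fall below k = 4.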
Input series:
+-----------------------------+------------+
|                         Time|root.test.s1|
+-----------------------------+------------+
|2020-01-04T23:59:55.000+08:00|        56.0|
|2020-01-04T23:59:56.000+08:00|        55.1|
|2020-01-04T23:59:57.000+08:00|        54.2|
|2020-01-04T23:59:58.000+08:00|        56.3|
|2020-01-04T23:59:59.000+08:00|        59.0|
|2020-01-05T00:00:00.000+08:00|        60.0|
|2020-01-05T00:00:01.000+08:00|        60.5|
|2020-01-05T00:00:02.000+08:00|        64.5|
|2020-01-05T00:00:03.000+08:00|        69.0|
|2020-01-05T00:00:04.000+08:00|        64.2|
|2020-01-05T00:00:05.000+08:00|        62.3|
|2020-01-05T00:00:06.000+08:00|        58.0|
|2020-01-05T00:00:07.000+08:00|        58.9|
|2020-01-05T00:00:08.000+08:00|        52.0|
|2020-01-05T00:00:09.000+08:00|        62.3|
|2020-01-05T00:00:10.000+08:00|        61.0|
|2020-01-05T00:00:11.000+08:00|        64.2|
|2020-01-05T00:00:12.000+08:00|        61.8|
|2020-01-05T00:00:13.000+08:00|        64.0|
|2020-01-05T00:00:14.000+08:00|        63.0|
+-----------------------------+------------+
SQL for query:
select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
Output series:
+-----------------------------+--------------------------------------------------------+
|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
+-----------------------------+--------------------------------------------------------+
|2020-01-05T00:00:03.000+08:00|                                                    69.0|
|2020-01-05T00:00:08.000+08:00|                                                    52.0|
+-----------------------------+--------------------------------------------------------+

This function trains a VAR prediction model based on master data. It uses the supplied master data to decide whether each data point in the time series is an error value, builds training samples from p+1 consecutive non-error values, trains the VAR model on them, and outputs the trained model parameters.
Name: MasterTrain
Input Series: Multiple input series are supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- p: The order of the model.
- eta: The threshold for deciding that a value is an error. When omitted, the algorithm estimates it automatically using the 3-sigma rule.
Output Series: Output a single series. The type is DOUBLE.
Installation:
- Download the code of the research/master-detector branch to your local machine.
- Run mvn clean package -am -Dmaven.test.skip=true to build the project.
- Copy ./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar to the ./ext/udf/ directory of the IoTDB server.
- Register the function with create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'.
Input series:
+-----------------------------+------------+------------+--------------+--------------+
|                         Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo|
+-----------------------------+------------+------------+--------------+--------------+
|1970-01-01T08:00:00.001+08:00| 39.99982556|  116.327274|   116.3271939|   39.99984748|
|1970-01-01T08:00:00.002+08:00| 39.99983865|  116.327305|   116.3272269|   39.99984748|
|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291|   116.3272634|   39.99984769|
|1970-01-01T08:00:00.004+08:00| 39.99982556|  116.327342|   116.3273015|    39.9998483|
|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744|    116.327339|   39.99984892|
|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117|   116.3273759|   39.99984892|
|1970-01-01T08:00:00.007+08:00|  39.9998259| 116.3274396|   116.3274163|   39.99984953|
|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668|   116.3274525|   39.99985014|
|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026|   116.3274915|   39.99985076|
|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967|   116.3275235|   39.99985137|
|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929|   116.3275611|   39.99985199|
|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745|   116.3275974|    39.9998526|
|1970-01-01T08:00:00.013+08:00|  39.9998259| 116.3275095|   116.3276338|   39.99985384|
|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787|   116.3276695|   39.99985446|
|1970-01-01T08:00:00.015+08:00|  39.9998343| 116.3274693|   116.3277045|   39.99985569|
|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941|   116.3277389|   39.99985631|
|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401|   116.3277747|   39.99985693|
|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713|   116.3278041|   39.99985756|
|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003|   116.3278379|   39.99985818|
|1970-01-01T08:00:00.020+08:00|  39.9998355| 116.3276308|   116.3278723|    39.9998588|
|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107|   116.3279026|   39.99985942|
|1970-01-01T08:00:00.022+08:00|  39.9998404| 116.3276684|          null|          null|
|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016|          null|          null|
|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284|          null|          null|
|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562|          null|          null|
+-----------------------------+------------+------------+--------------+--------------+
SQL for query:
select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+---------------------------------------------------------------------------------------------+
|                         Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")|
+-----------------------------+---------------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                          0.13656607660463288|
|1970-01-01T08:00:00.002+08:00|                                                                           0.8291884323013894|
|1970-01-01T08:00:00.003+08:00|                                                                          0.05012816073171693|
|1970-01-01T08:00:00.004+08:00|                                                                          -0.5495287787485761|
|1970-01-01T08:00:00.005+08:00|                                                                          0.03740486307345578|
|1970-01-01T08:00:00.006+08:00|                                                                           1.0500132150475212|
|1970-01-01T08:00:00.007+08:00|                                                                          0.04583944643116993|
|1970-01-01T08:00:00.008+08:00|                                                                         -0.07863708480736269|
+-----------------------------+---------------------------------------------------------------------------------------------+

This function detects and repairs error values in a time series based on master data. It uses the supplied master data to decide whether each data point in the time series is an error value, predicts the time series with the model trained by MasterTrain, and repairs the error values using both the predictions and the master data.
Name: MasterDetect
Input Series: Multiple input series are supported. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
- p: The order of the model.
- k: The number of nearest neighbors in the master data. It must be a positive integer. When omitted, the algorithm estimates it automatically from the tuple distances of the k nearest neighbors in the master data.
- eta: The threshold for deciding that a value is an error. When omitted, the algorithm estimates it automatically using the 3-sigma rule.
- beta: The threshold for deciding that a value is an anomaly. When omitted, the algorithm estimates it automatically using the 3-sigma rule.
- output_type: The type of result to output, either 'repair' or 'anomaly', that is, the repair result or the anomaly detection result. The default is 'repair'.
- output_column: The index of the column to output. The default is 1, which outputs the repair result of the first column.
Installation:
- Download the code of the research/master-detector branch to your local machine.
- Run mvn clean package -am -Dmaven.test.skip=true to build the project.
- Copy ./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar to the ./ext/udf/ directory of the IoTDB server.
- Register the function with create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'.
Output Series: Output a single series. The type is the same as that of the corresponding input column. The series is the repaired result of that input column.
Input series:
+-----------------------------+------------+------------+--------------+--------------+--------------------+
|                         Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo|     root.test.model|
+-----------------------------+------------+------------+--------------+--------------+--------------------+
|1970-01-01T08:00:00.001+08:00| 39.99982556|  116.327274|   116.3271939|   39.99984748| 0.13656607660463288|
|1970-01-01T08:00:00.002+08:00| 39.99983865|  116.327305|   116.3272269|   39.99984748|  0.8291884323013894|
|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291|   116.3272634|   39.99984769| 0.05012816073171693|
|1970-01-01T08:00:00.004+08:00| 39.99982556|  116.327342|   116.3273015|    39.9998483| -0.5495287787485761|
|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744|    116.327339|   39.99984892| 0.03740486307345578|
|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117|   116.3273759|   39.99984892|  1.0500132150475212|
|1970-01-01T08:00:00.007+08:00|  39.9998259| 116.3274396|   116.3274163|   39.99984953| 0.04583944643116993|
|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668|   116.3274525|   39.99985014|-0.07863708480736269|
|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026|   116.3274915|   39.99985076|                null|
|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967|   116.3275235|   39.99985137|                null|
|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929|   116.3275611|   39.99985199|                null|
|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745|   116.3275974|    39.9998526|                null|
|1970-01-01T08:00:00.013+08:00|  39.9998259| 116.3275095|   116.3276338|   39.99985384|                null|
|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787|   116.3276695|   39.99985446|                null|
|1970-01-01T08:00:00.015+08:00|  39.9998343| 116.3274693|   116.3277045|   39.99985569|                null|
|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941|   116.3277389|   39.99985631|                null|
|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401|   116.3277747|   39.99985693|                null|
|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713|   116.3278041|   39.99985756|                null|
|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003|   116.3278379|   39.99985818|                null|
|1970-01-01T08:00:00.020+08:00|  39.9998355| 116.3276308|   116.3278723|    39.9998588|                null|
|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107|   116.3279026|   39.99985942|                null|
|1970-01-01T08:00:00.022+08:00|  39.9998404| 116.3276684|          null|          null|                null|
|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016|          null|          null|                null|
|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284|          null|          null|                null|
|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562|          null|          null|                null|
+-----------------------------+------------+------------+--------------+--------------+--------------------+
SQL for query:
select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+--------------------------------------------------------------------------------------+
|                         Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')|
+-----------------------------+--------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                            116.327274|
|1970-01-01T08:00:00.002+08:00|                                                                            116.327305|
|1970-01-01T08:00:00.003+08:00|                                                                           116.3273291|
|1970-01-01T08:00:00.004+08:00|                                                                            116.327342|
|1970-01-01T08:00:00.005+08:00|                                                                           116.3273744|
|1970-01-01T08:00:00.006+08:00|                                                                           116.3274117|
|1970-01-01T08:00:00.007+08:00|                                                                           116.3274396|
|1970-01-01T08:00:00.008+08:00|                                                                           116.3274668|
|1970-01-01T08:00:00.009+08:00|                                                                           116.3275026|
|1970-01-01T08:00:00.010+08:00|                                                                           116.3274967|
|1970-01-01T08:00:00.011+08:00|                                                                           116.3274929|
|1970-01-01T08:00:00.012+08:00|                                                                           116.3274745|
|1970-01-01T08:00:00.013+08:00|                                                                           116.3275095|
|1970-01-01T08:00:00.014+08:00|                                                                           116.3274787|
|1970-01-01T08:00:00.015+08:00|                                                                           116.3274693|
|1970-01-01T08:00:00.016+08:00|                                                                           116.3274941|
|1970-01-01T08:00:00.017+08:00|                                                                           116.3275401|
|1970-01-01T08:00:00.018+08:00|                                                                           116.3275713|
|1970-01-01T08:00:00.019+08:00|                                                                           116.3276003|
|1970-01-01T08:00:00.020+08:00|                                                                           116.3276308|
|1970-01-01T08:00:00.021+08:00|                                                                           116.3276338|
|1970-01-01T08:00:00.022+08:00|                                                                           116.3276684|
|1970-01-01T08:00:00.023+08:00|                                                                           116.3277016|
|1970-01-01T08:00:00.024+08:00|                                                                           116.3277284|
|1970-01-01T08:00:00.025+08:00|                                                                           116.3277562|
+-----------------------------+--------------------------------------------------------------------------------------+
SQL for query:
select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+---------------------------------------------------------------------------------------+
|                         Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')|
+-----------------------------+---------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                                  false|
|1970-01-01T08:00:00.002+08:00|                                                                                  false|
|1970-01-01T08:00:00.003+08:00|                                                                                  false|
|1970-01-01T08:00:00.004+08:00|                                                                                  false|
|1970-01-01T08:00:00.005+08:00|                                                                                   true|
|1970-01-01T08:00:00.006+08:00|                                                                                  false|
|1970-01-01T08:00:00.007+08:00|                                                                                  false|
|1970-01-01T08:00:00.008+08:00|                                                                                  false|
|1970-01-01T08:00:00.009+08:00|                                                                                  false|
|1970-01-01T08:00:00.010+08:00|                                                                                  false|
|1970-01-01T08:00:00.011+08:00|                                                                                  false|
|1970-01-01T08:00:00.012+08:00|                                                                                  false|
|1970-01-01T08:00:00.013+08:00|                                                                                  false|
|1970-01-01T08:00:00.014+08:00|                                                                                   true|
|1970-01-01T08:00:00.015+08:00|                                                                                  false|
|1970-01-01T08:00:00.016+08:00|                                                                                  false|
|1970-01-01T08:00:00.017+08:00|                                                                                  false|
|1970-01-01T08:00:00.018+08:00|                                                                                  false|
|1970-01-01T08:00:00.019+08:00|                                                                                  false|
|1970-01-01T08:00:00.020+08:00|                                                                                  false|
|1970-01-01T08:00:00.021+08:00|                                                                                  false|
|1970-01-01T08:00:00.022+08:00|                                                                                  false|
|1970-01-01T08:00:00.023+08:00|                                                                                  false|
|1970-01-01T08:00:00.024+08:00|                                                                                  false|
|1970-01-01T08:00:00.025+08:00|                                                                                  false|
+-----------------------------+---------------------------------------------------------------------------------------+