This function is used to detect anomalies based on the interquartile range (IQR). Points lying more than 1.5 times the IQR below the lower quartile or above the upper quartile are selected.
Name: IQR
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
method: When set to “batch”, the anomaly test is conducted after all data points have been imported; when set to “stream”, the upper and lower quantiles must be provided. The default method is “batch”.
q1: The lower quantile when method is set to “stream”.
q3: The upper quantile when method is set to “stream”.
Output Series: Output a single series. The type is DOUBLE.
Note: $IQR=Q_3-Q_1$
Input series:
+-----------------------------+------------+
|                         Time|root.test.s1|
+-----------------------------+------------+
|1970-01-01T08:00:00.100+08:00|         0.0|
|1970-01-01T08:00:00.200+08:00|         0.0|
|1970-01-01T08:00:00.300+08:00|         1.0|
|1970-01-01T08:00:00.400+08:00|        -1.0|
|1970-01-01T08:00:00.500+08:00|         0.0|
|1970-01-01T08:00:00.600+08:00|         0.0|
|1970-01-01T08:00:00.700+08:00|        -2.0|
|1970-01-01T08:00:00.800+08:00|         2.0|
|1970-01-01T08:00:00.900+08:00|         0.0|
|1970-01-01T08:00:01.000+08:00|         0.0|
|1970-01-01T08:00:01.100+08:00|         1.0|
|1970-01-01T08:00:01.200+08:00|        -1.0|
|1970-01-01T08:00:01.300+08:00|        -1.0|
|1970-01-01T08:00:01.400+08:00|         1.0|
|1970-01-01T08:00:01.500+08:00|         0.0|
|1970-01-01T08:00:01.600+08:00|         0.0|
|1970-01-01T08:00:01.700+08:00|        10.0|
|1970-01-01T08:00:01.800+08:00|         2.0|
|1970-01-01T08:00:01.900+08:00|        -2.0|
|1970-01-01T08:00:02.000+08:00|         0.0|
+-----------------------------+------------+
SQL for query:
select iqr(s1) from root.test
Output series:
+-----------------------------+-----------------+
|                         Time|iqr(root.test.s1)|
+-----------------------------+-----------------+
|1970-01-01T08:00:01.700+08:00|             10.0|
+-----------------------------+-----------------+
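The IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not the UDF's implementation: the function name `iqr_anomalies` is hypothetical, the fences are assumed to be Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, and the quartiles in batch mode are assumed to come from the inclusive estimator of `statistics.quantiles`, which may differ from the estimator the library uses.

```python
from statistics import quantiles


def iqr_anomalies(values, q1=None, q3=None):
    """Return (index, value) pairs lying outside the 1.5 * IQR fences.

    If q1 and q3 are given they are used directly (as in "stream" mode);
    otherwise they are estimated from all values (as in "batch" mode).
    """
    if q1 is None or q3 is None:
        # Assumption: inclusive linear-interpolation quartiles.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Values of root.test.s1 from the example above, in time order.
s1 = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0,
      1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 10.0, 2.0, -2.0, 0.0]
print(iqr_anomalies(s1))  # [(16, 10.0)]
```

Under these assumptions only the value 10.0 (the 17th point) falls outside the fences, which agrees with the output series shown above.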
This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm. Within a sliding window, an input value that deviates from the average by more than k times the standard deviation is output as an anomaly.
Name: KSIGMA
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
k: The multiple of the standard deviation beyond which a value is treated as an anomaly. The default value is 3.
window: The window size of the Dynamic K-Sigma Algorithm. The default value is 10000.
Output Series: Output a single series. The type is the same as the input series.
Note: Anomaly detection is performed only when k is larger than 0. Otherwise, nothing will be output.
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00|            0.0|
|2020-01-01T00:00:03.000+08:00|           50.0|
|2020-01-01T00:00:04.000+08:00|          100.0|
|2020-01-01T00:00:06.000+08:00|          150.0|
|2020-01-01T00:00:08.000+08:00|          200.0|
|2020-01-01T00:00:10.000+08:00|          200.0|
|2020-01-01T00:00:14.000+08:00|          200.0|
|2020-01-01T00:00:15.000+08:00|          200.0|
|2020-01-01T00:00:16.000+08:00|          200.0|
|2020-01-01T00:00:18.000+08:00|          200.0|
|2020-01-01T00:00:20.000+08:00|          150.0|
|2020-01-01T00:00:22.000+08:00|          100.0|
|2020-01-01T00:00:26.000+08:00|           50.0|
|2020-01-01T00:00:28.000+08:00|            0.0|
|2020-01-01T00:00:30.000+08:00|            NaN|
+-----------------------------+---------------+
SQL for query:
select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
Output series:
+-----------------------------+---------------------------------+
|Time                         |ksigma(root.test.d1.s1,"k"="1.0")|
+-----------------------------+---------------------------------+
|2020-01-01T00:00:02.000+08:00|                              0.0|
|2020-01-01T00:00:03.000+08:00|                             50.0|
|2020-01-01T00:00:26.000+08:00|                             50.0|
|2020-01-01T00:00:28.000+08:00|                              0.0|
+-----------------------------+---------------------------------+
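The k-sigma rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the name `ksigma_anomalies` is hypothetical, the series is assumed to be processed in consecutive chunks of `window` points (how the actual UDF advances its window is not described here), NaN values are assumed to be skipped, and the population standard deviation is used.

```python
import math


def ksigma_anomalies(values, k=3.0, window=10000):
    """Return values deviating from their chunk mean by more than k std devs."""
    if k <= 0:
        return []  # assumption: nothing is flagged unless k > 0
    clean = [v for v in values if not math.isnan(v)]  # assumption: NaN is skipped
    anomalies = []
    # Assumption: consecutive, non-overlapping chunks of `window` points.
    for start in range(0, len(clean), window):
        chunk = clean[start:start + window]
        mean = sum(chunk) / len(chunk)
        std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
        anomalies.extend(v for v in chunk if abs(v - mean) > k * std)
    return anomalies


# Values of root.test.d1.s1 from the example above, in time order.
s1 = [0.0, 50.0, 100.0, 150.0, 200.0, 200.0, 200.0, 200.0, 200.0, 200.0,
      150.0, 100.0, 50.0, 0.0, float("nan")]
print(ksigma_anomalies(s1, k=1.0))  # [0.0, 50.0, 50.0, 0.0]
```

For this input the mean is about 128.6 and the standard deviation about 74.9, so with k = 1.0 only the values 0.0 and 50.0 deviate far enough; with the default k = 3 nothing would be flagged.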
This function is used to detect density anomalies in time series. Based on the k-th distance parameter, the function computes the local outlier factor (LOF) of each set of input values, which indicates whether that set is a density anomaly, and outputs it as a DOUBLE value.
Name: LOF
Input Series: Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
method: The detection method. The default value is “default”, used when the input data has multiple dimensions. The alternative is “series”, in which a single input series is transformed into a higher dimension.
k: The k-th distance used to calculate the LOF. The default value is 3.
window: The size of the window used to split the original data points. The default value is 10000.
windowsize: The dimension that the input series is transformed into when method is “series”. The default value is 5.
Output Series: Output a single series. The type is DOUBLE.
Note: Incomplete rows will be ignored. They are neither calculated nor marked as anomaly.
Input series:
+-----------------------------+---------------+---------------+
|                         Time|root.test.d1.s1|root.test.d1.s2|
+-----------------------------+---------------+---------------+
|1970-01-01T08:00:00.100+08:00|            0.0|            0.0|
|1970-01-01T08:00:00.200+08:00|            0.0|            1.0|
|1970-01-01T08:00:00.300+08:00|            1.0|            1.0|
|1970-01-01T08:00:00.400+08:00|            1.0|            0.0|
|1970-01-01T08:00:00.500+08:00|            0.0|           -1.0|
|1970-01-01T08:00:00.600+08:00|           -1.0|           -1.0|
|1970-01-01T08:00:00.700+08:00|           -1.0|            0.0|
|1970-01-01T08:00:00.800+08:00|            2.0|            2.0|
|1970-01-01T08:00:00.900+08:00|            0.0|           null|
+-----------------------------+---------------+---------------+
SQL for query:
select lof(s1,s2) from root.test.d1 where time<1000
Output series:
+-----------------------------+-------------------------------------+
|                         Time|lof(root.test.d1.s1, root.test.d1.s2)|
+-----------------------------+-------------------------------------+
|1970-01-01T08:00:00.100+08:00|                   3.8274824267668244|
|1970-01-01T08:00:00.200+08:00|                   3.0117631741126156|
|1970-01-01T08:00:00.300+08:00|                    2.838155437762879|
|1970-01-01T08:00:00.400+08:00|                   3.0117631741126156|
|1970-01-01T08:00:00.500+08:00|                     2.73518261244453|
|1970-01-01T08:00:00.600+08:00|                    2.371440975708148|
|1970-01-01T08:00:00.700+08:00|                     2.73518261244453|
|1970-01-01T08:00:00.800+08:00|                   1.7561416374270742|
+-----------------------------+-------------------------------------+
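For readers unfamiliar with the local outlier factor, the sketch below follows the textbook definition (k-distance, reachability distance, local reachability density). It is an illustration only: the name `lof_scores` is hypothetical, ties in the k-nearest-neighbor search are broken by input order, and the scores it produces are not expected to reproduce the numbers in the table above, since the UDF's exact formulation and scaling are not described here. Rows containing a missing value are dropped first, following the note that incomplete rows are ignored.

```python
import math


def lof_scores(rows, k=3):
    """Return the textbook local outlier factor of every complete row."""
    points = [r for r in rows if None not in r]  # incomplete rows are ignored
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# (s1, s2) pairs from the example above; the last row is incomplete.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, -1.0),
        (-1.0, -1.0), (-1.0, 0.0), (2.0, 2.0), (0.0, None)]
scores = lof_scores(rows, k=3)
print(max(range(len(scores)), key=scores.__getitem__))  # 7, the point (2.0, 2.0)
```

In the textbook formulation a score near 1 means the point is about as dense as its neighbors, while a clearly larger score marks a point in a sparser region; here the isolated point (2.0, 2.0) receives the largest score.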
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|1970-01-01T08:00:00.100+08:00|            1.0|
|1970-01-01T08:00:00.200+08:00|            2.0|
|1970-01-01T08:00:00.300+08:00|            3.0|
|1970-01-01T08:00:00.400+08:00|            4.0|
|1970-01-01T08:00:00.500+08:00|            5.0|
|1970-01-01T08:00:00.600+08:00|            6.0|
|1970-01-01T08:00:00.700+08:00|            7.0|
|1970-01-01T08:00:00.800+08:00|            8.0|
|1970-01-01T08:00:00.900+08:00|            9.0|
|1970-01-01T08:00:01.000+08:00|           10.0|
|1970-01-01T08:00:01.100+08:00|           11.0|
|1970-01-01T08:00:01.200+08:00|           12.0|
|1970-01-01T08:00:01.300+08:00|           13.0|
|1970-01-01T08:00:01.400+08:00|           14.0|
|1970-01-01T08:00:01.500+08:00|           15.0|
|1970-01-01T08:00:01.600+08:00|           16.0|
|1970-01-01T08:00:01.700+08:00|           17.0|
|1970-01-01T08:00:01.800+08:00|           18.0|
|1970-01-01T08:00:01.900+08:00|           19.0|
|1970-01-01T08:00:02.000+08:00|           20.0|
+-----------------------------+---------------+
SQL for query:
select lof(s1, "method"="series") from root.test.d1 where time<1000
Output series:
+-----------------------------+--------------------+
|                         Time|lof(root.test.d1.s1)|
+-----------------------------+--------------------+
|1970-01-01T08:00:00.100+08:00|    3.77777777777778|
|1970-01-01T08:00:00.200+08:00|    4.32727272727273|
|1970-01-01T08:00:00.300+08:00|    4.85714285714286|
|1970-01-01T08:00:00.400+08:00|    5.40909090909091|
|1970-01-01T08:00:00.500+08:00|    5.94999999999999|
|1970-01-01T08:00:00.600+08:00|    6.43243243243243|
|1970-01-01T08:00:00.700+08:00|    6.79999999999999|
|1970-01-01T08:00:00.800+08:00|                 7.0|
|1970-01-01T08:00:00.900+08:00|                 7.0|
|1970-01-01T08:00:01.000+08:00|    6.79999999999999|
|1970-01-01T08:00:01.100+08:00|    6.43243243243243|
|1970-01-01T08:00:01.200+08:00|    5.94999999999999|
|1970-01-01T08:00:01.300+08:00|    5.40909090909091|
|1970-01-01T08:00:01.400+08:00|    4.85714285714286|
|1970-01-01T08:00:01.500+08:00|    4.32727272727273|
|1970-01-01T08:00:01.600+08:00|    3.77777777777778|
+-----------------------------+--------------------+
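One plausible reading of the “series” method is a sliding-window embedding: each run of windowsize consecutive values becomes one point in a windowsize-dimensional space, and LOF is then computed on those points. The sketch below shows only that transformation. The name `embed_series` is hypothetical and the sliding-window scheme is an assumption, although it is consistent with the example above, where 20 input values and the default windowsize of 5 yield 16 output rows.

```python
def embed_series(values, windowsize=5):
    """Turn a 1-D series into overlapping windowsize-dimensional points."""
    return [
        tuple(values[i:i + windowsize])
        for i in range(len(values) - windowsize + 1)
    ]


s1 = [float(v) for v in range(1, 21)]  # 1.0 .. 20.0, as in the example above
points = embed_series(s1)
print(len(points))  # 16
print(points[0])    # (1.0, 2.0, 3.0, 4.0, 5.0)
```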
This function is used to detect missing anomalies. In some datasets, missing values are filled by linear interpolation. Thus, there are several long perfect linear segments. By discovering these perfect linear segments, missing anomalies are detected.
Name: MISSDETECT
Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
Parameter:
minlen: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10.
Output Series: Output a single series. The type is BOOLEAN. Each data point that is a missing anomaly is labeled true.
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d2.s2|
+-----------------------------+---------------+
|2021-07-01T12:00:00.000+08:00|            0.0|
|2021-07-01T12:00:01.000+08:00|            1.0|
|2021-07-01T12:00:02.000+08:00|            0.0|
|2021-07-01T12:00:03.000+08:00|            1.0|
|2021-07-01T12:00:04.000+08:00|            0.0|
|2021-07-01T12:00:05.000+08:00|            0.0|
|2021-07-01T12:00:06.000+08:00|            0.0|
|2021-07-01T12:00:07.000+08:00|            0.0|
|2021-07-01T12:00:08.000+08:00|            0.0|
|2021-07-01T12:00:09.000+08:00|            0.0|
|2021-07-01T12:00:10.000+08:00|            0.0|
|2021-07-01T12:00:11.000+08:00|            0.0|
|2021-07-01T12:00:12.000+08:00|            0.0|
|2021-07-01T12:00:13.000+08:00|            0.0|
|2021-07-01T12:00:14.000+08:00|            0.0|
|2021-07-01T12:00:15.000+08:00|            0.0|
|2021-07-01T12:00:16.000+08:00|            1.0|
|2021-07-01T12:00:17.000+08:00|            0.0|
|2021-07-01T12:00:18.000+08:00|            1.0|
|2021-07-01T12:00:19.000+08:00|            0.0|
|2021-07-01T12:00:20.000+08:00|            1.0|
+-----------------------------+---------------+
SQL for query:
select missdetect(s2,'minlen'='10') from root.test.d2
Output series:
+-----------------------------+------------------------------------------+
|                         Time|missdetect(root.test.d2.s2, "minlen"="10")|
+-----------------------------+------------------------------------------+
|2021-07-01T12:00:00.000+08:00|                                     false|
|2021-07-01T12:00:01.000+08:00|                                     false|
|2021-07-01T12:00:02.000+08:00|                                     false|
|2021-07-01T12:00:03.000+08:00|                                     false|
|2021-07-01T12:00:04.000+08:00|                                      true|
|2021-07-01T12:00:05.000+08:00|                                      true|
|2021-07-01T12:00:06.000+08:00|                                      true|
|2021-07-01T12:00:07.000+08:00|                                      true|
|2021-07-01T12:00:08.000+08:00|                                      true|
|2021-07-01T12:00:09.000+08:00|                                      true|
|2021-07-01T12:00:10.000+08:00|                                      true|
|2021-07-01T12:00:11.000+08:00|                                      true|
|2021-07-01T12:00:12.000+08:00|                                      true|
|2021-07-01T12:00:13.000+08:00|                                      true|
|2021-07-01T12:00:14.000+08:00|                                      true|
|2021-07-01T12:00:15.000+08:00|                                      true|
|2021-07-01T12:00:16.000+08:00|                                     false|
|2021-07-01T12:00:17.000+08:00|                                     false|
|2021-07-01T12:00:18.000+08:00|                                     false|
|2021-07-01T12:00:19.000+08:00|                                     false|
|2021-07-01T12:00:20.000+08:00|                                     false|
+-----------------------------+------------------------------------------+
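The idea of finding perfectly linear segments can be sketched as below. This is a simplified illustration rather than the UDF's algorithm: the name `miss_detect` and the tolerance `tol` are hypothetical, and timestamps are assumed to be evenly spaced, so a segment is linear exactly when the step between consecutive values stays constant.

```python
def miss_detect(values, minlen=10, tol=1e-9):
    """Label points that belong to a linear run of at least minlen points."""
    n = len(values)
    labels = [False] * n
    if n < 2:
        return labels
    start = 0  # index of the first point of the current linear run
    for i in range(2, n + 1):
        # The run continues through point i if the step size is unchanged.
        same_step = i < n and abs(
            (values[i] - values[i - 1]) - (values[i - 1] - values[i - 2])
        ) <= tol
        if not same_step:
            if i - start >= minlen:  # the run covers points start .. i-1
                for j in range(start, i):
                    labels[j] = True
            start = i - 1  # the next run begins at the last point of this one
    return labels


# Values of root.test.d2.s2 from the example above, in time order.
s2 = [0.0, 1.0, 0.0, 1.0] + [0.0] * 12 + [1.0, 0.0, 1.0, 0.0, 1.0]
labels = miss_detect(s2, minlen=10)
print(labels.count(True))  # 12
```

On the example input the only run long enough is the constant stretch of twelve zeros from 12:00:04 to 12:00:15, which matches the points labeled true in the output series above.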
This function is used to detect range anomalies in time series. Given lower-bound and upper-bound parameters, the function judges whether an input value falls outside the range (a range anomaly) and outputs a new time series containing the anomalies.
Name: RANGE
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
lower_bound: The lower bound of range anomaly detection.
upper_bound: The upper bound of range anomaly detection.
Output Series: Output a single series. The type is the same as the input.
Note: Only when upper_bound is larger than lower_bound, the anomaly detection will be performed. Otherwise, nothing will be output.
Input series:
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00|          100.0|
|2020-01-01T00:00:03.000+08:00|          101.0|
|2020-01-01T00:00:04.000+08:00|          102.0|
|2020-01-01T00:00:06.000+08:00|          104.0|
|2020-01-01T00:00:08.000+08:00|          126.0|
|2020-01-01T00:00:10.000+08:00|          108.0|
|2020-01-01T00:00:14.000+08:00|          112.0|
|2020-01-01T00:00:15.000+08:00|          113.0|
|2020-01-01T00:00:16.000+08:00|          114.0|
|2020-01-01T00:00:18.000+08:00|          116.0|
|2020-01-01T00:00:20.000+08:00|          118.0|
|2020-01-01T00:00:22.000+08:00|          120.0|
|2020-01-01T00:00:26.000+08:00|          124.0|
|2020-01-01T00:00:28.000+08:00|          126.0|
|2020-01-01T00:00:30.000+08:00|            NaN|
+-----------------------------+---------------+
SQL for query:
select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30
Output series:
+-----------------------------+------------------------------------------------------------------+
|Time                         |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")|
+-----------------------------+------------------------------------------------------------------+
|2020-01-01T00:00:02.000+08:00|                                                             100.0|
|2020-01-01T00:00:28.000+08:00|                                                             126.0|
+-----------------------------+------------------------------------------------------------------+
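The range check itself is simple enough to sketch directly. In the snippet below the name `range_anomalies` and the sample readings are hypothetical, and the bounds are assumed to be inclusive, so a value equal to a bound is not reported.

```python
import math


def range_anomalies(values, lower_bound, upper_bound):
    """Return the values that fall outside [lower_bound, upper_bound]."""
    if upper_bound <= lower_bound:
        return []  # detection runs only when upper_bound > lower_bound
    # NaN compares false against both bounds, so it is never reported.
    return [v for v in values if v < lower_bound or v > upper_bound]


readings = [99.5, 101.0, 110.0, 125.0, 130.2, math.nan]  # hypothetical data
print(range_anomalies(readings, 101.0, 125.0))  # [99.5, 130.2]
```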
The function is used to filter anomalies of a numeric time series based on two-sided window detection.
Name: TWOSIDEDFILTER
Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
Output Series: Output a single series. The type is the same as the input. It is the input without anomalies.
Parameter:
len: The size of the window, which is a positive integer. By default, it's 5. When len=3, the algorithm examines a forward window and a backward window, each of length 3, and calculates the outlierness of the current point.
threshold: The threshold of outlierness, which is a floating-point number in (0,1). By default, it's 0.3. The strictness of anomaly detection is proportional to the threshold.
Input series:
+-----------------------------+------------+
|                         Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00|      2002.0|
|1970-01-01T08:00:01.000+08:00|      1946.0|
|1970-01-01T08:00:02.000+08:00|      1958.0|
|1970-01-01T08:00:03.000+08:00|      2012.0|
|1970-01-01T08:00:04.000+08:00|      2051.0|
|1970-01-01T08:00:05.000+08:00|      1898.0|
|1970-01-01T08:00:06.000+08:00|      2014.0|
|1970-01-01T08:00:07.000+08:00|      2052.0|
|1970-01-01T08:00:08.000+08:00|      1935.0|
|1970-01-01T08:00:09.000+08:00|      1901.0|
|1970-01-01T08:00:10.000+08:00|      1972.0|
|1970-01-01T08:00:11.000+08:00|      1969.0|
|1970-01-01T08:00:12.000+08:00|      1984.0|
|1970-01-01T08:00:13.000+08:00|      2018.0|
|1970-01-01T08:00:37.000+08:00|      1484.0|
|1970-01-01T08:00:38.000+08:00|      1055.0|
|1970-01-01T08:00:39.000+08:00|      1050.0|
|1970-01-01T08:01:05.000+08:00|      1023.0|
|1970-01-01T08:01:06.000+08:00|      1056.0|
|1970-01-01T08:01:07.000+08:00|       978.0|
|1970-01-01T08:01:08.000+08:00|      1050.0|
|1970-01-01T08:01:09.000+08:00|      1123.0|
|1970-01-01T08:01:10.000+08:00|      1150.0|
|1970-01-01T08:01:11.000+08:00|      1034.0|
|1970-01-01T08:01:12.000+08:00|       950.0|
|1970-01-01T08:01:13.000+08:00|      1059.0|
+-----------------------------+------------+
SQL for query:
select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test
Output series:
+-----------------------------+------------+
|                         Time|root.test.s0|
+-----------------------------+------------+
|1970-01-01T08:00:00.000+08:00|      2002.0|
|1970-01-01T08:00:01.000+08:00|      1946.0|
|1970-01-01T08:00:02.000+08:00|      1958.0|
|1970-01-01T08:00:03.000+08:00|      2012.0|
|1970-01-01T08:00:04.000+08:00|      2051.0|
|1970-01-01T08:00:05.000+08:00|      1898.0|
|1970-01-01T08:00:06.000+08:00|      2014.0|
|1970-01-01T08:00:07.000+08:00|      2052.0|
|1970-01-01T08:00:08.000+08:00|      1935.0|
|1970-01-01T08:00:09.000+08:00|      1901.0|
|1970-01-01T08:00:10.000+08:00|      1972.0|
|1970-01-01T08:00:11.000+08:00|      1969.0|
|1970-01-01T08:00:12.000+08:00|      1984.0|
|1970-01-01T08:00:13.000+08:00|      2018.0|
|1970-01-01T08:01:05.000+08:00|      1023.0|
|1970-01-01T08:01:06.000+08:00|      1056.0|
|1970-01-01T08:01:07.000+08:00|       978.0|
|1970-01-01T08:01:08.000+08:00|      1050.0|
|1970-01-01T08:01:09.000+08:00|      1123.0|
|1970-01-01T08:01:10.000+08:00|      1150.0|
|1970-01-01T08:01:11.000+08:00|      1034.0|
|1970-01-01T08:01:12.000+08:00|       950.0|
|1970-01-01T08:01:13.000+08:00|      1059.0|
+-----------------------------+------------+
This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the neighbor distance threshold is less than the neighbor count threshold, the point is detected as an outlier.
Name: OUTLIER
Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
r: The neighbor distance threshold.
k: The neighbor count threshold.
w: The window size.
s: The slide size.
Output Series: Output a single series. The type is the same as the input.
Input series:
+-----------------------------+------------+
|                         Time|root.test.s1|
+-----------------------------+------------+
|2020-01-04T23:59:55.000+08:00|        56.0|
|2020-01-04T23:59:56.000+08:00|        55.1|
|2020-01-04T23:59:57.000+08:00|        54.2|
|2020-01-04T23:59:58.000+08:00|        56.3|
|2020-01-04T23:59:59.000+08:00|        59.0|
|2020-01-05T00:00:00.000+08:00|        60.0|
|2020-01-05T00:00:01.000+08:00|        60.5|
|2020-01-05T00:00:02.000+08:00|        64.5|
|2020-01-05T00:00:03.000+08:00|        69.0|
|2020-01-05T00:00:04.000+08:00|        64.2|
|2020-01-05T00:00:05.000+08:00|        62.3|
|2020-01-05T00:00:06.000+08:00|        58.0|
|2020-01-05T00:00:07.000+08:00|        58.9|
|2020-01-05T00:00:08.000+08:00|        52.0|
|2020-01-05T00:00:09.000+08:00|        62.3|
|2020-01-05T00:00:10.000+08:00|        61.0|
|2020-01-05T00:00:11.000+08:00|        64.2|
|2020-01-05T00:00:12.000+08:00|        61.8|
|2020-01-05T00:00:13.000+08:00|        64.0|
|2020-01-05T00:00:14.000+08:00|        63.0|
+-----------------------------+------------+
SQL for query:
select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
Output series:
+-----------------------------+--------------------------------------------------------+
|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
+-----------------------------+--------------------------------------------------------+
|2020-01-05T00:00:03.000+08:00|                                                    69.0|
|2020-01-05T00:00:08.000+08:00|                                                    52.0|
+-----------------------------+--------------------------------------------------------+
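The neighbor-counting rule can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the name `distance_outliers` is hypothetical, windows are assumed to be count-based (w points, advancing by s points), a point is not counted as its own neighbor, and a neighbor is any other point in the window whose value differs by at most r.

```python
def distance_outliers(values, r, k, w, s):
    """Return indices of points with fewer than k neighbours within distance r."""
    flagged = set()
    last_start = max(len(values) - w, 0)
    for start in range(0, last_start + 1, s):
        window = values[start:start + w]
        for offset, v in enumerate(window):
            # Count the other points of the window that lie within distance r.
            neighbours = sum(
                1 for j, u in enumerate(window)
                if j != offset and abs(u - v) <= r
            )
            if neighbours < k:
                flagged.add(start + offset)
    return sorted(flagged)


# Values of root.test.s1 from the example above, in time order.
s1 = [56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2,
      62.3, 58.0, 58.9, 52.0, 62.3, 61.0, 64.2, 61.8, 64.0, 63.0]
outliers = [s1[i] for i in distance_outliers(s1, r=5.0, k=4, w=10, s=5)]
print(outliers)  # [69.0, 52.0]
```

With r = 5.0 and k = 4, the value 69.0 has only two neighbors in its windows and 52.0 has none, so both are reported, in line with the output series above.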
This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points.
Name: MasterTrain
Input Series: Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
p: The order of the model.
eta: The distance threshold. By default, it will be estimated based on the 3-sigma rule.
Output Series: Output a single series. The type is the same as the input.
Installation
Install IoTDB from the research/master-detector branch.
Run mvn clean package -am -Dmaven.test.skip=true.
Copy ./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar to ./ext/udf/.
Run create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain' in client.
Input series:
+-----------------------------+------------+------------+--------------+--------------+
|                         Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo|
+-----------------------------+------------+------------+--------------+--------------+
|1970-01-01T08:00:00.001+08:00| 39.99982556|  116.327274|   116.3271939|   39.99984748|
|1970-01-01T08:00:00.002+08:00| 39.99983865|  116.327305|   116.3272269|   39.99984748|
|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291|   116.3272634|   39.99984769|
|1970-01-01T08:00:00.004+08:00| 39.99982556|  116.327342|   116.3273015|    39.9998483|
|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744|    116.327339|   39.99984892|
|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117|   116.3273759|   39.99984892|
|1970-01-01T08:00:00.007+08:00|  39.9998259| 116.3274396|   116.3274163|   39.99984953|
|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668|   116.3274525|   39.99985014|
|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026|   116.3274915|   39.99985076|
|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967|   116.3275235|   39.99985137|
|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929|   116.3275611|   39.99985199|
|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745|   116.3275974|    39.9998526|
|1970-01-01T08:00:00.013+08:00|  39.9998259| 116.3275095|   116.3276338|   39.99985384|
|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787|   116.3276695|   39.99985446|
|1970-01-01T08:00:00.015+08:00|  39.9998343| 116.3274693|   116.3277045|   39.99985569|
|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941|   116.3277389|   39.99985631|
|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401|   116.3277747|   39.99985693|
|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713|   116.3278041|   39.99985756|
|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003|   116.3278379|   39.99985818|
|1970-01-01T08:00:00.020+08:00|  39.9998355| 116.3276308|   116.3278723|    39.9998588|
|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107|   116.3279026|   39.99985942|
|1970-01-01T08:00:00.022+08:00|  39.9998404| 116.3276684|          null|          null|
|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016|          null|          null|
|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284|          null|          null|
|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562|          null|          null|
+-----------------------------+------------+------------+--------------+--------------+
SQL for query:
select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+---------------------------------------------------------------------------------------------+
|                         Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")|
+-----------------------------+---------------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                          0.13656607660463288|
|1970-01-01T08:00:00.002+08:00|                                                                           0.8291884323013894|
|1970-01-01T08:00:00.003+08:00|                                                                          0.05012816073171693|
|1970-01-01T08:00:00.004+08:00|                                                                          -0.5495287787485761|
|1970-01-01T08:00:00.005+08:00|                                                                          0.03740486307345578|
|1970-01-01T08:00:00.006+08:00|                                                                           1.0500132150475212|
|1970-01-01T08:00:00.007+08:00|                                                                          0.04583944643116993|
|1970-01-01T08:00:00.008+08:00|                                                                         -0.07863708480736269|
+-----------------------------+---------------------------------------------------------------------------------------------+
This function is used to detect and repair errors in time series based on master data. The VAR model is trained by MasterTrain.
Name: MasterDetect
Input Series: Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
Parameters:
p: The order of the model.
k: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
eta: The distance threshold. By default, it will be estimated based on the 3-sigma rule.
eta: The detection threshold. By default, it will be estimated based on the 3-sigma rule.
output_type: The type of output. ‘repair’ for repairing and ‘anomaly’ for anomaly detection.
output_column: The repaired column to output. Defaults to 1, which means the repair result of the first column is output.
Output Series: Output a single series. The type is the same as the input.
Installation
Install IoTDB from the research/master-detector branch.
Run mvn clean package -am -Dmaven.test.skip=true.
Copy ./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar to ./ext/udf/.
Run create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect' in client.
Input series:
+-----------------------------+------------+------------+--------------+--------------+--------------------+
|                         Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo|     root.test.model|
+-----------------------------+------------+------------+--------------+--------------+--------------------+
|1970-01-01T08:00:00.001+08:00| 39.99982556|  116.327274|   116.3271939|   39.99984748| 0.13656607660463288|
|1970-01-01T08:00:00.002+08:00| 39.99983865|  116.327305|   116.3272269|   39.99984748|  0.8291884323013894|
|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291|   116.3272634|   39.99984769| 0.05012816073171693|
|1970-01-01T08:00:00.004+08:00| 39.99982556|  116.327342|   116.3273015|    39.9998483| -0.5495287787485761|
|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744|    116.327339|   39.99984892| 0.03740486307345578|
|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117|   116.3273759|   39.99984892|  1.0500132150475212|
|1970-01-01T08:00:00.007+08:00|  39.9998259| 116.3274396|   116.3274163|   39.99984953| 0.04583944643116993|
|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668|   116.3274525|   39.99985014|-0.07863708480736269|
|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026|   116.3274915|   39.99985076|                null|
|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967|   116.3275235|   39.99985137|                null|
|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929|   116.3275611|   39.99985199|                null|
|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745|   116.3275974|    39.9998526|                null|
|1970-01-01T08:00:00.013+08:00|  39.9998259| 116.3275095|   116.3276338|   39.99985384|                null|
|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787|   116.3276695|   39.99985446|                null|
|1970-01-01T08:00:00.015+08:00|  39.9998343| 116.3274693|   116.3277045|   39.99985569|                null|
|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941|   116.3277389|   39.99985631|                null|
|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401|   116.3277747|   39.99985693|                null|
|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713|   116.3278041|   39.99985756|                null|
|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003|   116.3278379|   39.99985818|                null|
|1970-01-01T08:00:00.020+08:00|  39.9998355| 116.3276308|   116.3278723|    39.9998588|                null|
|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107|   116.3279026|   39.99985942|                null|
|1970-01-01T08:00:00.022+08:00|  39.9998404| 116.3276684|          null|          null|                null|
|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016|          null|          null|                null|
|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284|          null|          null|                null|
|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562|          null|          null|                null|
+-----------------------------+------------+------------+--------------+--------------+--------------------+
SQL for query:
select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+--------------------------------------------------------------------------------------+
|                         Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')|
+-----------------------------+--------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                            116.327274|
|1970-01-01T08:00:00.002+08:00|                                                                            116.327305|
|1970-01-01T08:00:00.003+08:00|                                                                           116.3273291|
|1970-01-01T08:00:00.004+08:00|                                                                            116.327342|
|1970-01-01T08:00:00.005+08:00|                                                                           116.3273744|
|1970-01-01T08:00:00.006+08:00|                                                                           116.3274117|
|1970-01-01T08:00:00.007+08:00|                                                                           116.3274396|
|1970-01-01T08:00:00.008+08:00|                                                                           116.3274668|
|1970-01-01T08:00:00.009+08:00|                                                                           116.3275026|
|1970-01-01T08:00:00.010+08:00|                                                                           116.3274967|
|1970-01-01T08:00:00.011+08:00|                                                                           116.3274929|
|1970-01-01T08:00:00.012+08:00|                                                                           116.3274745|
|1970-01-01T08:00:00.013+08:00|                                                                           116.3275095|
|1970-01-01T08:00:00.014+08:00|                                                                           116.3274787|
|1970-01-01T08:00:00.015+08:00|                                                                           116.3274693|
|1970-01-01T08:00:00.016+08:00|                                                                           116.3274941|
|1970-01-01T08:00:00.017+08:00|                                                                           116.3275401|
|1970-01-01T08:00:00.018+08:00|                                                                           116.3275713|
|1970-01-01T08:00:00.019+08:00|                                                                           116.3276003|
|1970-01-01T08:00:00.020+08:00|                                                                           116.3276308|
|1970-01-01T08:00:00.021+08:00|                                                                           116.3276338|
|1970-01-01T08:00:00.022+08:00|                                                                           116.3276684|
|1970-01-01T08:00:00.023+08:00|                                                                           116.3277016|
|1970-01-01T08:00:00.024+08:00|                                                                           116.3277284|
|1970-01-01T08:00:00.025+08:00|                                                                           116.3277562|
+-----------------------------+--------------------------------------------------------------------------------------+
SQL for query:
select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test
Output series:
+-----------------------------+---------------------------------------------------------------------------------------+
|                         Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')|
+-----------------------------+---------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00|                                                                                  false|
|1970-01-01T08:00:00.002+08:00|                                                                                  false|
|1970-01-01T08:00:00.003+08:00|                                                                                  false|
|1970-01-01T08:00:00.004+08:00|                                                                                  false|
|1970-01-01T08:00:00.005+08:00|                                                                                   true|
|1970-01-01T08:00:00.006+08:00|                                                                                   true|
|1970-01-01T08:00:00.007+08:00|                                                                                  false|
|1970-01-01T08:00:00.008+08:00|                                                                                  false|
|1970-01-01T08:00:00.009+08:00|                                                                                  false|
|1970-01-01T08:00:00.010+08:00|                                                                                  false|
|1970-01-01T08:00:00.011+08:00|                                                                                  false|
|1970-01-01T08:00:00.012+08:00|                                                                                  false|
|1970-01-01T08:00:00.013+08:00|                                                                                  false|
|1970-01-01T08:00:00.014+08:00|                                                                                   true|
|1970-01-01T08:00:00.015+08:00|                                                                                  false|
|1970-01-01T08:00:00.016+08:00|                                                                                  false|
|1970-01-01T08:00:00.017+08:00|                                                                                  false|
|1970-01-01T08:00:00.018+08:00|                                                                                  false|
|1970-01-01T08:00:00.019+08:00|                                                                                  false|
|1970-01-01T08:00:00.020+08:00|                                                                                  false|
|1970-01-01T08:00:00.021+08:00|                                                                                  false|
|1970-01-01T08:00:00.022+08:00|                                                                                  false|
|1970-01-01T08:00:00.023+08:00|                                                                                  false|
|1970-01-01T08:00:00.024+08:00|                                                                                  false|
|1970-01-01T08:00:00.025+08:00|                                                                                  false|
+-----------------------------+---------------------------------------------------------------------------------------+