|  | --- | 
|  | id: moving-average-query | 
|  | title: "Moving Average Query" | 
|  | --- | 
|  |  | 
|  | <!-- | 
|  | ~ Licensed to the Apache Software Foundation (ASF) under one | 
|  | ~ or more contributor license agreements.  See the NOTICE file | 
|  | ~ distributed with this work for additional information | 
|  | ~ regarding copyright ownership.  The ASF licenses this file | 
|  | ~ to you under the Apache License, Version 2.0 (the | 
|  | ~ "License"); you may not use this file except in compliance | 
|  | ~ with the License.  You may obtain a copy of the License at | 
|  | ~ | 
|  | ~   http://www.apache.org/licenses/LICENSE-2.0 | 
|  | ~ | 
|  | ~ Unless required by applicable law or agreed to in writing, | 
|  | ~ software distributed under the License is distributed on an | 
|  | ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | 
|  | ~ KIND, either express or implied.  See the License for the | 
|  | ~ specific language governing permissions and limitations | 
|  | ~ under the License. | 
|  | --> | 
|  |  | 
|  |  | 
|  | ## Overview | 
|  | **Moving Average Query** is an extension which provides support for [Moving Average](https://en.wikipedia.org/wiki/Moving_average) and other Aggregate [Window Functions](https://en.wikibooks.org/wiki/Structured_Query_Language/Window_functions) in Druid queries. | 
|  |  | 
|  | These Aggregate Window Functions consume standard Druid Aggregators and outputs additional windowed aggregates called [Averagers](#averagers). | 
|  |  | 
|  | #### High level algorithm | 
|  |  | 
|  | Moving Average encapsulates the [groupBy query](../../querying/groupbyquery.md) (Or [timeseries](../../querying/timeseriesquery.md) in case of no dimensions) in order to rely on the maturity of these query types. | 
|  |  | 
|  | It runs the query in two main phases: | 
|  |  | 
|  | 1. Runs an inner [groupBy](../../querying/groupbyquery.md) or [timeseries](../../querying/timeseriesquery.md) query to compute Aggregators (i.e. daily count of events). | 
|  | 2. Passes over aggregated results in Broker, in order to compute Averagers (i.e. moving 7 day average of the daily count). | 
|  |  | 
|  | #### Main enhancements provided by this extension: | 
|  | 1. Functionality: Extending druid query functionality (i.e. initial introduction of Window Functions). | 
|  | 2. Performance: Improving performance of such moving aggregations by eliminating multiple segment scans. | 
|  |  | 
|  | #### Further reading | 
|  | [Moving Average](https://en.wikipedia.org/wiki/Moving_average) | 
|  |  | 
|  | [Window Functions](https://en.wikibooks.org/wiki/Structured_Query_Language/Window_functions) | 
|  |  | 
|  | [Analytic Functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) | 
|  |  | 
|  |  | 
|  | ## Operations | 
|  |  | 
|  | ### Installation | 
|  | Use [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../configuration/extensions.md#community-extensions) on all Druid broker and router nodes. | 
|  |  | 
|  | ```bash | 
|  | java -classpath "<your_druid_dir>/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:druid-moving-average-query:{VERSION} | 
|  | ``` | 
|  |  | 
|  | ### Enabling | 
|  | After installation, to enable this extension, just add `druid-moving-average-query` to `druid.extensions.loadList` in broker and routers' `runtime.properties` file and then restart broker and router nodes. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```bash | 
|  | druid.extensions.loadList=["druid-moving-average-query"] | 
|  | ``` | 
|  |  | 
|  | ## Configuration | 
|  | There are currently no configuration properties specific to Moving Average. | 
|  |  | 
|  | ## Limitations | 
|  | * movingAverage is missing support for the following groupBy properties: `subtotalsSpec`, `virtualColumns`. | 
|  | * movingAverage is missing support for the following timeseries properties: `descending`. | 
|  | * movingAverage is missing support for [SQL-compatible null handling](https://github.com/apache/druid/issues/4349) (So setting druid.generic.useDefaultValueForNull in configuration will give an error). | 
|  |  | 
|  | ## Query spec | 
|  | * Most properties in the query spec derived from  [groupBy query](../../querying/groupbyquery.md) / [timeseries](../../querying/timeseriesquery.md), see documentation for these query types. | 
|  |  | 
|  | |property|description|required?| | 
|  | |--------|-----------|---------| | 
|  | |queryType|This String should always be "movingAverage"; this is the first thing Druid looks at to figure out how to interpret the query.|yes| | 
|  | |dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../../querying/datasource.md) for more information.|yes| | 
|  | |dimensions|A JSON list of [DimensionSpec](../../querying/dimensionspecs.md) (Notice that property is optional)|no| | 
|  | |limitSpec|See [LimitSpec](../../querying/limitspec.md)|no| | 
|  | |having|See [Having](../../querying/having.md)|no| | 
|  | |granularity|A period granularity; See [Period Granularities](../../querying/granularities.md#period-granularities)|yes| | 
|  | |filter|See [Filters](../../querying/filters.md)|no| | 
|  | |aggregations|Aggregations forms the input to Averagers; See [Aggregations](../../querying/aggregations.md)|yes| | 
|  | |postAggregations|Supports only aggregations as input; See [Post Aggregations](../../querying/post-aggregations.md)|no| | 
|  | |intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes| | 
|  | |context|An additional JSON Object which can be used to specify certain flags.|no| | 
|  | |averagers|Defines the moving average function; See [Averagers](#averagers)|yes| | 
|  | |postAveragers|Support input of both averagers and aggregations; Syntax is identical to postAggregations (See [Post Aggregations](../../querying/post-aggregations.md))|no| | 
|  |  | 
|  | ## Averagers | 
|  |  | 
|  | Averagers are used to define the Moving-Average function. Averagers are not limited to an average - they can also provide other types of window functions such as MAX()/MIN(). | 
|  |  | 
|  | ### Properties | 
|  |  | 
|  | These are properties which are common to all Averagers: | 
|  |  | 
|  | |property|description|required?| | 
|  | |--------|-----------|---------| | 
|  | |type|Averager type; See [Averager types](#averager-types)|yes| | 
|  | |name|Averager name|yes| | 
|  | |fieldName|Input name (An aggregation name)|yes| | 
|  | |buckets|Number of lookback buckets (time periods), including current one. Must be >0|yes| | 
|  | |cycleSize|Cycle size; Used to calculate day-of-week option; See [Cycle size (Day of Week)](#cycle-size-day-of-week)|no, defaults to 1| | 
|  |  | 
|  |  | 
|  | ### Averager types: | 
|  |  | 
|  | * [Standard averagers](#standard-averagers): | 
|  | * doubleMean | 
|  | * doubleMeanNoNulls | 
|  | * doubleSum | 
|  | * doubleMax | 
|  | * doubleMin | 
|  | * longMean | 
|  | * longMeanNoNulls | 
|  | * longSum | 
|  | * longMax | 
|  | * longMin | 
|  |  | 
|  | #### Standard averagers | 
|  |  | 
|  | These averagers offer four functions: | 
|  |  | 
|  | * Mean (Average) | 
|  | * MeanNoNulls (Ignores empty buckets). | 
|  | * Sum | 
|  | * Max | 
|  | * Min | 
|  |  | 
|  | **Ignoring nulls**: | 
|  | Using a MeanNoNulls averager is useful when the interval starts at the dataset beginning time. | 
|  | In that case, the first records will ignore missing buckets and average won't be artificially low. | 
|  | However, this also means that empty days in a sparse dataset will also be ignored. | 
|  |  | 
|  | Example of usage: | 
|  |  | 
|  | ```json | 
|  | { "type" : "doubleMean", "name" : <output_name>, "fieldName": <input_name> } | 
|  | ``` | 
|  |  | 
|  | ### Cycle size (Day of Week) | 
|  | This optional parameter is used to calculate over a single bucket within each cycle instead of all buckets. | 
|  | A prime example would be weekly buckets, resulting in a Day of Week calculation. (Other examples: Month of year, Hour of day). | 
|  |  | 
|  | I.e. when using these parameters: | 
|  |  | 
|  | * *granularity*: period=P1D (daily) | 
|  | * *buckets*: 28 | 
|  | * *cycleSize*: 7 | 
|  |  | 
|  | Within each output record, the averager will compute the result over the following buckets: current (#0), #7, #14, #21. | 
|  | Whereas without specifying cycleSize it would have computed over all 28 buckets. | 
|  |  | 
|  | ## Examples | 
|  |  | 
|  | All examples are based on the Wikipedia dataset provided in the Druid [tutorials](../../tutorials/index.md). | 
|  |  | 
|  | ### Basic example | 
|  |  | 
|  | Calculating a 7-buckets moving average for Wikipedia edit deltas. | 
|  |  | 
|  | Query syntax: | 
|  |  | 
|  | ```json | 
|  | { | 
|  | "queryType": "movingAverage", | 
|  | "dataSource": "wikipedia", | 
|  | "granularity": { | 
|  | "type": "period", | 
|  | "period": "PT30M" | 
|  | }, | 
|  | "intervals": [ | 
|  | "2015-09-12T00:00:00Z/2015-09-13T00:00:00Z" | 
|  | ], | 
|  | "aggregations": [ | 
|  | { | 
|  | "name": "delta30Min", | 
|  | "fieldName": "delta", | 
|  | "type": "longSum" | 
|  | } | 
|  | ], | 
|  | "averagers": [ | 
|  | { | 
|  | "name": "trailing30MinChanges", | 
|  | "fieldName": "delta30Min", | 
|  | "type": "longMean", | 
|  | "buckets": 7 | 
|  | } | 
|  | ] | 
|  | } | 
|  | ``` | 
|  |  | 
|  | Result: | 
|  |  | 
|  | ```json | 
|  | [ { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T00:30:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 30490, | 
|  | "trailing30MinChanges" : 4355.714285714285 | 
|  | } | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T01:00:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 96526, | 
|  | "trailing30MinChanges" : 18145.14285714286 | 
|  | } | 
|  | }, { | 
|  | ... | 
|  | ... | 
|  | ... | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T23:00:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 119100, | 
|  | "trailing30MinChanges" : 198697.2857142857 | 
|  | } | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T23:30:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 177882, | 
|  | "trailing30MinChanges" : 193890.0 | 
|  | } | 
|  | } | 
|  | ``` | 
|  |  | 
|  | ### Post averager example | 
|  |  | 
|  | Calculating a 7-buckets moving average for Wikipedia edit deltas, plus a ratio between the current period and the moving average. | 
|  |  | 
|  | Query syntax: | 
|  |  | 
|  | ```json | 
|  | { | 
|  | "queryType": "movingAverage", | 
|  | "dataSource": "wikipedia", | 
|  | "granularity": { | 
|  | "type": "period", | 
|  | "period": "PT30M" | 
|  | }, | 
|  | "intervals": [ | 
|  | "2015-09-12T22:00:00Z/2015-09-13T00:00:00Z" | 
|  | ], | 
|  | "aggregations": [ | 
|  | { | 
|  | "name": "delta30Min", | 
|  | "fieldName": "delta", | 
|  | "type": "longSum" | 
|  | } | 
|  | ], | 
|  | "averagers": [ | 
|  | { | 
|  | "name": "trailing30MinChanges", | 
|  | "fieldName": "delta30Min", | 
|  | "type": "longMean", | 
|  | "buckets": 7 | 
|  | } | 
|  | ], | 
|  | "postAveragers" : [ | 
|  | { | 
|  | "name": "ratioTrailing30MinChanges", | 
|  | "type": "arithmetic", | 
|  | "fn": "/", | 
|  | "fields": [ | 
|  | { | 
|  | "type": "fieldAccess", | 
|  | "fieldName": "delta30Min" | 
|  | }, | 
|  | { | 
|  | "type": "fieldAccess", | 
|  | "fieldName": "trailing30MinChanges" | 
|  | } | 
|  | ] | 
|  | } | 
|  | ] | 
|  | } | 
|  | ``` | 
|  |  | 
|  | Result: | 
|  |  | 
|  | ```json | 
|  | [ { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T22:00:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 144269, | 
|  | "trailing30MinChanges" : 204088.14285714287, | 
|  | "ratioTrailing30MinChanges" : 0.7068955500319539 | 
|  | } | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T22:30:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 242860, | 
|  | "trailing30MinChanges" : 214031.57142857142, | 
|  | "ratioTrailing30MinChanges" : 1.134692411867141 | 
|  | } | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T23:00:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 119100, | 
|  | "trailing30MinChanges" : 198697.2857142857, | 
|  | "ratioTrailing30MinChanges" : 0.5994042624782422 | 
|  | } | 
|  | }, { | 
|  | "version" : "v1", | 
|  | "timestamp" : "2015-09-12T23:30:00.000Z", | 
|  | "event" : { | 
|  | "delta30Min" : 177882, | 
|  | "trailing30MinChanges" : 193890.0, | 
|  | "ratioTrailing30MinChanges" : 0.9174377224199288 | 
|  | } | 
|  | } ] | 
|  | ``` | 
|  |  | 
|  |  | 
|  | ### Cycle size example | 
|  |  | 
|  | Calculating an average of every first 10-minutes of the last 3 hours: | 
|  |  | 
|  | Query syntax: | 
|  |  | 
|  | ```json | 
|  | { | 
|  | "queryType": "movingAverage", | 
|  | "dataSource": "wikipedia", | 
|  | "granularity": { | 
|  | "type": "period", | 
|  | "period": "PT10M" | 
|  | }, | 
|  | "intervals": [ | 
|  | "2015-09-12T00:00:00Z/2015-09-13T00:00:00Z" | 
|  | ], | 
|  | "aggregations": [ | 
|  | { | 
|  | "name": "delta10Min", | 
|  | "fieldName": "delta", | 
|  | "type": "doubleSum" | 
|  | } | 
|  | ], | 
|  | "averagers": [ | 
|  | { | 
|  | "name": "trailing10MinPerHourChanges", | 
|  | "fieldName": "delta10Min", | 
|  | "type": "doubleMeanNoNulls", | 
|  | "buckets": 18, | 
|  | "cycleSize": 6 | 
|  | } | 
|  | ] | 
|  | } | 
|  | ``` |