| <?xml version="1.0"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!-- |
| See https://jira.codehaus.org/browse/MSITE-646, the links to JavaDoc methods have to |
| have the hash URL-escaped. |
| --> |
| <document> |
| <properties> |
| <title>Aggregator</title> |
| <author email="me@liviutudor.com">Aggregator</author> |
| </properties> |
| <body> |
| |
| <section name="Overview"> |
| <p> |
| The <a href="apidocs/org/apache/commons/functor/aggregator/package-summary.html">aggregator package</a> |
| provides a component which can be "fed" data from various sources and apply various |
| computations to this data to "aggregate" it. The design allows for a pluggable |
| implementation to be used for the purpose of aggregation and also provides flexibility |
| with regards to storing the data being fed to this component. |
| </p> |
| <p> |
| There are 2 implementations as far as storage is concerned: <i>nostore</i> and |
| <code>List</code>-backed store. The former should be used when the aggregation |
| formula is simple enough to be applied on the fly without having to traverse a series |
| of data -- e.g. summing the data -- while the latter can perform more advanced processing -- e.g. |
| statistical calculations: median, average, percentile etc. |
| </p> |
| </section> |
| |
| <section name="Storage"> |
| <p> |
| The <code>List</code>-backed store implementation stores the whole data series in a |
| growing <code>List</code> such that when data is aggregated one can traverse the |
| series and apply complex formulas (find the <a href="http://en.wikipedia.org/wiki/Median" |
| target="_blank">median</a> of a set of numeric values for instance). |
| </p> |
| <p> |
| The framework allows for any <code>List</code>-based implementation to be plugged in by |
| implementing <a href="apidocs/org/apache/commons/functor/aggregator/AbstractListBackedAggregator.html%23createList()">createList()</a>. |
| There is also an <code>ArrayList</code>-based implementation provided in |
| <a href="apidocs/org/apache/commons/functor/aggregator/ArrayListBackedAggregator.html">ArrayListBackedAggregator</a> |
| which can be used out of the box. |
| </p> |
| <p> |
| While the <i>store</i> implementation stores the data in a list, the <i>nostore</i> one |
| stores just a single object -- every time data is fed into the <code>Aggregator</code>, |
| the data stored in this object and the data |
| <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23add(T)">passed in</a> |
| are "aggregated" there and then using the supplied |
| <a href="apidocs/org/apache/commons/functor/BinaryFunction.html">formula</a> and the result of this operation |
| is stored back in the object this implementation uses to store data. |
| </p> |
| <p> |
| This has the implication that unlike the list-backed storage implementation (where the |
| result of aggregating the data is not known until <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23evaluate()">evaluate()</a> |
| is called), with the <i>nostore</i> implementation the result is known (and saved back) at |
| any moment. This makes evaluation cheaper, but it comes at the cost of a |
| slower <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23add(T)">add()</a> |
| operation, as the aggregation formula is applied on every call. Also, remember that not all |
| formulas can be implemented using the <i>nostore</i> implementation (see above). |
| </p> |
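| <p> |
| The trade-off between the two storage strategies can be illustrated with a minimal, |
| self-contained sketch. It uses only JDK types, and the class names |
| (<code>NoStoreSketch</code>, <code>ListBackedSketch</code>, <code>StorageDemo</code>) are |
| hypothetical stand-ins, not the classes of the aggregator package: |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical stand-in for the nostore strategy: only one value is kept.
final class NoStoreSketch<T> {
    private final BinaryOperator<T> formula;
    private T current;

    NoStoreSketch(BinaryOperator<T> formula, T initialValue) {
        this.formula = formula;
        this.current = initialValue;
    }

    // The formula runs on every add(), so add() carries the aggregation cost.
    void add(T data) {
        current = formula.apply(current, data);
    }

    // The result is already known; evaluate() just returns it.
    T evaluate() {
        return current;
    }
}

// Hypothetical stand-in for the List-backed strategy: the whole series is kept.
final class ListBackedSketch<T> {
    private final Function<List<T>, T> formula;
    private final List<T> series = new ArrayList<>();

    ListBackedSketch(Function<List<T>, T> formula) {
        this.formula = formula;
    }

    // add() only appends; no formula is applied yet.
    void add(T data) {
        series.add(data);
    }

    // The formula traverses the stored series only when a result is requested.
    T evaluate() {
        return formula.apply(series);
    }
}

final class StorageDemo {
    // Arithmetic mean of the series; 0.0 for an empty series (an assumption of this sketch).
    static double mean(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return xs.isEmpty() ? 0.0 : sum / xs.size();
    }

    public static void main(String[] args) {
        NoStoreSketch<Double> sum = new NoStoreSketch<>(Double::sum, 0.0);
        ListBackedSketch<Double> avg = new ListBackedSketch<>(StorageDemo::mean);
        for (double d : new double[] {1.0, 2.0, 6.0}) {
            sum.add(d);
            avg.add(d);
        }
        System.out.println(sum.evaluate()); // running total, known after each add()
        System.out.println(avg.evaluate()); // mean, computed at evaluation time
    }
}
```

| <p> |
| A sum fits the single-value strategy because each new element can be folded into the |
| running total, whereas the mean in this sketch needs the stored series at evaluation time. |
| </p> |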
| </section> |
| |
| <section name="Flushing Data"> |
| <p> |
| The data an aggregator handles can be "reset" at any point -- in the case of the |
| <i>store</i> implementation this means emptying the <code>List</code>; in the case |
| of <i>nostore</i> this means setting the "start value" to a "neutral" value -- for |
| instance 0 for implementations which involve additions (<code>Y + 0 = Y</code>), 1 for |
| multiplication (<code>Y * 1 = Y</code>) etc. This operation ultimately resets the |
| aggregator to the initial state it was in when created in the first place. This feature |
| allows for a caller to schedule regular resets to free up resources etc. |
| </p> |
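| <p> |
| As a rough illustration of resetting to a "neutral" value, the sketch below keeps a single |
| running value and restores the start value on <code>reset()</code>. The class |
| <code>ResettableFold</code> is hypothetical and uses only JDK types; it is not the |
| library's implementation: |
| </p> |

```java
import java.util.function.BinaryOperator;

// Hypothetical nostore-style holder whose reset() restores the neutral start value.
final class ResettableFold {
    private final BinaryOperator<Double> formula;
    private final double neutral;
    private double current;

    ResettableFold(BinaryOperator<Double> formula, double neutral) {
        this.formula = formula;
        this.neutral = neutral;
        this.current = neutral;
    }

    void add(double data) {
        current = formula.apply(current, data);
    }

    double evaluate() {
        return current;
    }

    // Back to the state the object had right after construction.
    void reset() {
        current = neutral;
    }

    public static void main(String[] args) {
        ResettableFold sum = new ResettableFold(Double::sum, 0.0);         // Y + 0 = Y
        ResettableFold product = new ResettableFold((a, b) -> a * b, 1.0); // Y * 1 = Y
        sum.add(3.0);
        sum.add(4.0);
        product.add(3.0);
        product.add(4.0);
        System.out.println(sum.evaluate() + " " + product.evaluate());
        sum.reset();
        product.reset();
        System.out.println(sum.evaluate() + " " + product.evaluate());
    }
}
```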
| </section> |
| |
| <section name="Timers"> |
| <p> |
| Retrieving the data at regular intervals and then flushing/resetting the aggregator was an |
| envisaged scenario for the aggregator -- so the base class |
| <a href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html">AbstractTimedAggregator</a> |
| offers support to start a timer internally to do that. The class offers a <i>listener</i> |
| mechanism where classes can register to receive <a href="apidocs/org/apache/commons/functor/aggregator/TimedAggregatorListener.html">timer notifications</a> (if timer support was configured) and after all listeners have been |
| notified the aggregator is <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23reset()">reset</a>. |
| The result of |
| <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23evaluate()">evaluate()</a> |
| (the result of aggregating data) is passed in to the listeners. This allows for an object |
| to simply register itself as a timer listener to the aggregator and only react to the |
| timer notifications (e.g. write the result to a file for offline analysis etc) while the |
| aggregator itself manages all the internals. |
| </p> |
| <p> |
| When the data is being flushed/reset, a |
| <a href="apidocs/org/apache/commons/functor/aggregator/TimedAggregatorListener.html">TimedAggregatorListener</a> |
| can be registered to receive a notification. The notification is sent <b>after</b> the data is reset. |
| Prior to resetting the data, <a href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23evaluate()">evaluate()</a> |
| is called, and the result of the evaluation is sent to the listener. |
| </p> |
| <p> |
| This allows for data aggregated periodically to be processed (e.g. logged, written into a database etc) without |
| worrying about what happens in between. |
| </p> |
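| <p> |
| The flush cycle described in the previous two paragraphs (evaluate, reset, then notify with |
| the evaluated result) might look roughly like the sketch below. It is a hypothetical, |
| JDK-only illustration: <code>FlushCycleSketch</code> and <code>onTick()</code> are invented |
| names, listeners are modelled as plain <code>Consumer</code> objects, and the tick is |
| triggered by hand rather than by a real timer: |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of one timer tick: evaluate, reset, then notify.
final class FlushCycleSketch {
    private final List<Double> series = new ArrayList<>();
    private final List<Consumer<Double>> listeners = new ArrayList<>();

    synchronized void addListener(Consumer<Double> listener) {
        listeners.add(listener);
    }

    synchronized void add(double d) {
        series.add(d);
    }

    // The aggregation used in this sketch is a plain sum.
    synchronized double evaluate() {
        double total = 0.0;
        for (double d : series) {
            total += d;
        }
        return total;
    }

    // What a scheduled task would call at each interval.
    synchronized void onTick() {
        double result = evaluate(); // capture the value before the data is discarded
        series.clear();             // reset
        for (Consumer<Double> listener : listeners) {
            listener.accept(result); // listeners receive the pre-reset result
        }
    }

    public static void main(String[] args) {
        FlushCycleSketch agg = new FlushCycleSketch();
        agg.addListener(value -> System.out.println("flushed: " + value));
        agg.add(1.0);
        agg.add(2.0);
        agg.onTick();
        System.out.println("after reset: " + agg.evaluate());
    }
}
```

| <p> |
| Because the result is captured before the reset, a listener can log or persist it without |
| reading the aggregator again. |
| </p> |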
| <p> |
| There are 2 ways the |
| <a href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html">AbstractTimedAggregator</a> |
| can be configured with timer support: |
| </p> |
| <p> |
| <ul> |
| <li><b>Using a per-instance timer</b>: by default, when using the |
| <a href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html%23AbstractTimedAggregator(long)">1-parameter |
| constructor</a> with a value > 0, the aggregator will create a |
| <code>Timer</code> object internally and schedule the regular task of resetting the |
| aggregator under this <code>Timer</code>. While this helps prevent memory leaks |
| (both <code>Aggregator</code> and <code>Timer</code> will be recycled together) it |
| does add a timer thread to the system for each aggregator object created. |
| If your system creates a lot of <code>AbstractTimedAggregator</code> instances with |
| timer support, you might want to consider using the shared timer (see below).</li> |
| <li><b>Using a shared timer</b> : the <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html">AbstractTimedAggregator</a> |
| class creates a <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html%23MAIN_TIMER">static |
| timer instance</a> which can be shared amongst all instances of this class, if using |
| <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html%23AbstractTimedAggregator(long, boolean)">the |
| constructor with 2 params</a>. When doing so, each such instance will schedule its |
| regular reset task against this timer. While this reduces the memory footprint and |
| threading on the system, it can lead to memory leaks if this is not handled correctly. |
| Therefore please make sure you invoke <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html%23stop()">stop()</a> |
| on such aggregators when finished working with them.</li> |
| </ul> |
| </p> |
| <p> |
| It is recommended that you always invoke <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html%23stop()">stop()</a> |
| at the end of the lifecycle of an aggregator, whether it uses a shared timer, a |
| per-instance timer or no timer at all. Calling <code>stop()</code> on an aggregator with |
| no timer support has no effect, but doing so consistently keeps the code uniform and clean. |
| </p> |
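| <p> |
| The difference between the two timer options, and why <code>stop()</code> matters for the |
| shared one, can be sketched with <code>java.util.Timer</code> directly. Everything here is |
| an assumption made for illustration: the class <code>TimerChoiceSketch</code>, its |
| constructor parameters and the <code>SHARED</code> field are hypothetical and do not |
| mirror the internals of <code>AbstractTimedAggregator</code>: |
| </p> |

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch: names are illustrative, not the library's fields or constructors.
final class TimerChoiceSketch {
    // One daemon timer thread shared by every instance that opts in.
    private static final Timer SHARED = new Timer("shared-flush-timer", true);

    private final Timer ownTimer; // non-null only for a per-instance timer
    private final TimerTask task; // null when no timer support is configured
    private final boolean shared;

    TimerChoiceSketch(long intervalMillis, boolean useShared, Runnable flush) {
        if (intervalMillis > 0) {
            shared = useShared;
            task = new TimerTask() {
                @Override
                public void run() {
                    flush.run();
                }
            };
            ownTimer = useShared ? null : new Timer("own-flush-timer", true);
            Timer target = useShared ? SHARED : ownTimer;
            target.scheduleAtFixedRate(task, intervalMillis, intervalMillis);
        } else {
            shared = false;
            task = null;
            ownTimer = null;
        }
    }

    // Safe to call in every configuration; returns true if a scheduled task was cancelled.
    boolean stop() {
        boolean cancelled = task != null && task.cancel();
        if (ownTimer != null) {
            ownTimer.cancel(); // ends the private timer thread
        } else if (shared) {
            SHARED.purge();    // drop the cancelled task so the shared queue no longer references it
        }
        return cancelled;
    }

    public static void main(String[] args) {
        Runnable flush = () -> System.out.println("flush");
        TimerChoiceSketch perInstance = new TimerChoiceSketch(5000L, false, flush);
        TimerChoiceSketch sharedTimer = new TimerChoiceSketch(5000L, true, flush);
        TimerChoiceSketch noTimer = new TimerChoiceSketch(0L, false, flush);
        System.out.println(perInstance.stop());
        System.out.println(sharedTimer.stop());
        System.out.println(noTimer.stop());
    }
}
```

| <p> |
| In this sketch a task scheduled on the shared timer keeps a reference to its owner until it |
| is cancelled, which is the kind of leak that calling <code>stop()</code> is meant to avoid. |
| </p> |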
| </section> |
| |
| <section name="Examples"> |
| <p> |
| An example of usage for this component (though not necessarily the only one) could be in |
| a highly transactional system where, for instance, the average transaction time can be an |
| indication of the system health. While a fully-fledged monitoring system can monitor, say, |
| the system load, file system etc., you might also want your monitoring system to keep an |
| eye on the average transaction time. One way to do so is to have an <a |
| href="apidocs/org/apache/commons/functor/aggregator/AbstractTimedAggregator.html">AbstractTimedAggregator</a> |
| which captures each transaction execution time and, every 5 seconds say, computes |
| the arithmetic mean (average) and writes it to a log file (which your monitoring |
| system can then tail). To do so you would use a piece of code like this: |
| </p> |
| |
| <source> |
| public class AggregatorTesting implements TimedAggregatorListener<Double> { |
|     Aggregator<Double> agg; |
| |
|     public AggregatorTesting() { |
|         AbstractTimedAggregator<Double> aggregator = new ArrayListBackedAggregator<Double>( |
|                 new DoubleMeanValueAggregatorFunction(), 5000L); |
|         aggregator.addTimerListener(this); |
|         this.agg = aggregator; |
|     } |
| |
|     @Override |
|     public void onTimer(AbstractTimedAggregator<Double> aggregator) { |
|         double aggregated = aggregator.evaluate(); |
|         /* log the value etc. */ |
|     } |
| |
|     private void add(double d) { |
|         agg.add(d); |
|     } |
| |
|     public static void main(String[] args) { |
|         AggregatorTesting t = new AggregatorTesting(); |
|         /* add data */ |
|         t.add(1.0); |
|         t.add(2.0); |
|         /* .. */ |
|     } |
| } |
| </source> |
| |
| <p>Now let's assume that you occasionally see the odd spike in transaction times -- which, |
| for the purpose of this example, we'll assume is normal. As such you are not concerned with |
| these spikes, so you want a formula that is not skewed by them. Chances are a median |
| would be more appropriate in this case, and the code above needs only a |
| small change: |
| </p> |
| |
| <source> |
| ... |
| AbstractTimedAggregator<Double> aggregator = new ArrayListBackedAggregator<Double>(new DoubleMedianValueAggregatorFunction(), |
| 5000L); |
| ... |
| </source> |
| |
| <p>Or maybe you want to be more precise and ensure that, let's say, 95% of your transactions |
| take less than a certain execution time, so you replace the median formula with a |
| percentile one: |
| </p> |
| |
| <source> |
| ... |
| AbstractTimedAggregator<Double> aggregator = new ArrayListBackedAggregator<Double>(new DoublePercentileAggregatorFunction(95), |
| 5000L); |
| ... |
| </source> |
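| <p> |
| To make the 95% statement concrete, here is a small JDK-only sketch of a percentile |
| computed over a stored series. It assumes the nearest-rank definition of a percentile, |
| which may differ from what <code>DoublePercentileAggregatorFunction</code> actually does; |
| <code>PercentileSketch</code> is a hypothetical name: |
| </p> |

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PercentileSketch {
    // Nearest-rank percentile: the smallest value such that at least p percent
    // of the series is less than or equal to it. Expects 0 < p <= 100.
    static double percentile(List<Double> series, double p) {
        if (series.isEmpty() || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        List<Double> sorted = new ArrayList<>(series); // leave the caller's list untouched
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0); // 1-based rank
        return sorted.get(rank - 1);
    }

    public static void main(String[] args) {
        List<Double> transactionTimes = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            transactionTimes.add((double) i);
        }
        // 95% of the recorded times are at or below the printed value.
        System.out.println(percentile(transactionTimes, 95));
    }
}
```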
| |
| <p>Or maybe your health indicator is the number of transactions going through the system |
| every 5 seconds. In this case you can use a <i>nostore</i> aggregator with a sum formula |
| like this: |
| </p> |
| |
| <source> |
| public class AggregatorTesting implements TimedAggregatorListener<Double> { |
|     Aggregator<Double> agg; |
| |
|     public AggregatorTesting() { |
|         AbstractTimedAggregator<Double> aggregator = new AbstractNoStoreAggregator<Double>( |
|                 new DoubleSumAggregatorBinaryFunction(), 5000L) { |
|             @Override |
|             protected Double initialValue() { |
|                 return 0.0; |
|             } |
|         }; |
|         aggregator.addTimerListener(this); |
|         this.agg = aggregator; |
|     } |
| |
|     @Override |
|     public void onTimer(AbstractTimedAggregator<Double> aggregator) { |
|         double aggregated = aggregator.evaluate(); |
|         /* log the value etc. */ |
|     } |
| |
|     private void add(double d) { |
|         agg.add(d); |
|     } |
| |
|     public static void main(String[] args) { |
|         AggregatorTesting t = new AggregatorTesting(); |
|         /* add data */ |
|         t.add(1.0); |
|         t.add(2.0); |
|         /* .. */ |
|     } |
| } |
| </source> |
| |
| <p>(Bear in mind that there is also a |
| <a href="apidocs/org/apache/commons/functor/aggregator/functions/IntegerCountAggregatorBinaryFunction.html">IntegerCountAggregatorBinaryFunction</a> too!) |
| </p> |
| |
| <p>This has the advantage of a lower memory footprint as well (see above). If your health check |
| indicator is based on the maximum transaction time over a 5-second interval, then you simply |
| replace the <a href="apidocs/org/apache/commons/functor/aggregator/AbstractNoStoreAggregator.html%23getAggregationFunction()">aggregation |
| function</a> with a max value implementation: |
| </p> |
| |
| <source> |
| ... |
| AbstractTimedAggregator<Double> aggregator = new AbstractNoStoreAggregator<Double>( |
| new DoubleMaxAggregatorBinaryFunction(), 5000L) { |
| ... |
| </source> |
| |
| <p>Or you can simply roll your own code -- if using the <i>nostore</i> implementation, |
| all you need to do is implement a |
| <a href="apidocs/org/apache/commons/functor/BinaryFunction.html">BinaryFunction</a> and pass |
| it to the <a href="apidocs/org/apache/commons/functor/aggregator/AbstractNoStoreAggregator.html%23AbstractNoStoreAggregator(org.apache.commons.functor.BinaryFunction)">Aggregator |
| constructor</a>. This function will receive the already-stored |
| aggregated value as the first parameter, and data just passed in (via <a |
| href="apidocs/org/apache/commons/functor/aggregator/Aggregator.html%23add(T)">add()</a>) as the |
| second parameter; the result of this function will be stored back in the aggregator. |
| </p> |
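| <p> |
| The parameter order matters as soon as the formula is not symmetric. The sketch below |
| illustrates this with a weighted smoothing formula, using the JDK's |
| <code>BinaryOperator</code> as a stand-in for <code>BinaryFunction</code>. The names |
| <code>CustomFormulaSketch</code>, <code>SMOOTH</code> and <code>fold</code> are |
| hypothetical and exist only for this illustration: |
| </p> |

```java
import java.util.function.BinaryOperator;

final class CustomFormulaSketch {
    // Hypothetical formula: weighted smoothing that favours the stored value.
    // First argument: the value already stored; second: the value just added.
    static final BinaryOperator<Double> SMOOTH =
            (stored, incoming) -> 0.75 * stored + 0.25 * incoming;

    // Feeds the values one by one and stores each result back,
    // the way a nostore aggregator is described to work.
    static double fold(BinaryOperator<Double> formula, double initial, double... data) {
        double stored = initial;
        for (double d : data) {
            stored = formula.apply(stored, d);
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(fold(SMOOTH, 0.0, 4.0, 8.0));
    }
}
```

| <p> |
| Swapping the two parameters inside the formula would weight the incoming value more heavily |
| and give a different result for the same data. |
| </p> |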
| <p> |
| For the list-based aggregators, use an instance of |
| <a href="apidocs/org/apache/commons/functor/aggregator/AbstractListBackedAggregator.html">AbstractListBackedAggregator</a> |
| and pass in an instance of <a href="apidocs/org/apache/commons/functor/UnaryFunction.html">UnaryFunction</a> |
| which accepts a <code>List</code> and returns a single object. |
| </p> |
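| <p> |
| A function that "accepts a <code>List</code> and returns a single object" could be shaped |
| like the median sketch below. It uses the JDK's <code>Function</code> as a stand-in for |
| <code>UnaryFunction</code>, and returning <code>null</code> for an empty series is an |
| assumption of this sketch; <code>MedianSketch</code> is a hypothetical name: |
| </p> |

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class MedianSketch {
    // Hypothetical list formula: sorts a copy so the stored series is left untouched.
    static final Function<List<Double>, Double> MEDIAN = series -> {
        if (series.isEmpty()) {
            return null; // assumption: no data means no result
        }
        List<Double> sorted = new ArrayList<>(series);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    };

    public static void main(String[] args) {
        // One spike (100.0) among otherwise small transaction times.
        List<Double> times = Arrays.asList(5.0, 1.0, 100.0);
        System.out.println(MEDIAN.apply(times));
    }
}
```

| <p> |
| Unlike a mean, the median of this series is not pulled towards the single spike, which is |
| why it suits the spike scenario discussed earlier in this section. |
| </p> |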
| <p> |
| Have a look at the |
| <a href="apidocs/org/apache/commons/functor/aggregator/functions/package-summary.html">org.apache.commons.functor.aggregator.functions |
| package</a>, which already includes a set of functions for many of these tasks. |
| </p> |
| |
| <p><b>Note on class naming</b> : You will notice in the |
| <a href="apidocs/org/apache/commons/functor/aggregator/functions/package-summary.html">org.apache.commons.functor.aggregator.functions |
| package</a> there are functions with similar names -- e.g. |
| <a href="apidocs/org/apache/commons/functor/aggregator/functions/DoubleSumAggregatorFunction.html">DoubleSumAggregatorFunction</a> |
| and <a href="apidocs/org/apache/commons/functor/aggregator/functions/DoubleSumAggregatorBinaryFunction.html">DoubleSumAggregatorBinaryFunction</a>. |
| The naming convention is that if the function takes 2 |
| parameters (i.e. it is an instance of <a href="apidocs/org/apache/commons/functor/BinaryFunction.html">BinaryFunction</a>) then the |
| name of the class in this package will contain <code>BinaryFunction</code>; otherwise, if it is an instance |
| of <a href="apidocs/org/apache/commons/functor/UnaryFunction.html">UnaryFunction</a>, the |
| name will only contain <code>Function</code>. |
| </p> |
| <p> |
| The reason behind it is that instances of <code>BinaryFunction</code> can be used with |
| <a href="apidocs/org/apache/commons/functor/aggregator/AbstractNoStoreAggregator.html">AbstractNoStoreAggregator</a> |
| whereas the instances of <code>UnaryFunction</code> can be used |
| with <a href="apidocs/org/apache/commons/functor/aggregator/AbstractListBackedAggregator.html">AbstractListBackedAggregator</a>. |
| </p> |
| </section> |
| </body> |
| </document> |