blob: 6912dc4b6284c6495218c68162dd3f16bbf16c91 [file] [log] [blame]
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Accumulo Aggregation</title>
<link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
</head>
<body>
<h1>Apache Accumulo Documentation : Aggregation</h1>
<p>Accumulo does aggregation differently than traditional RDBMSs. Instead of aggregating at query time, it aggregates at ingest time. Doing aggregation at ingest time is ideal for large amounts of data and has three distinct advantages. First, it reduces the actual amount of data stored. Second, it makes queries run faster. Third, it removes the need to do a lookup at insert time in many cases, which can greatly speed up ingest.
<p>Aggregation in accumulo is easy to use. You simply specify which columns or column family you want to aggregate at table creation time. Allowing an aggregation function to apply to a whole column family is an interesting twist that gives the user great flexibility. The example below demonstrates this flexibility.
<p><pre>
Shell - Apache Accumulo Interactive Shell
- version: 1.3.6
- instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49
-
- type 'help' for a list of available commands
-
user@instance:9999&gt; createtable perDayCounts -a day=org.apache.accumulo.core.iterators.aggregation.StringSummation
user@instance:9999 perDayCounts&gt; insert foo day 20080101 1
insert successful
user@instance:9999 perDayCounts&gt; insert foo day 20080101 1
insert successful
user@instance:9999 perDayCounts&gt; insert foo day 20080103 1
insert successful
user@instance:9999 perDayCounts&gt; insert bar day 20080101 1
insert successful
user@instance:9999 perDayCounts&gt; insert bar day 20080101 1
insert successful
user@instance:9999 perDayCounts&gt; scan
bar day:20080101 [] 2
foo day:20080101 [] 2
foo day:20080103 [] 1
user@instance:9999 perDayCounts&gt;
</pre>
<p>Implementing a new aggregation function is a snap. Simply write some Java code that implements <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/Aggregator.html'>org.apache.accumulo.core.iterators.aggregation.Aggregator</a>. A good example to look at is <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/StringSummation.html'>StringSummation</a> which sums numbers encoded as ascii strings. However, one could easily write a much more efficient summation aggregator that operates on numbers encoded in twos complement.
<p>To deploy a new aggregator, jar it up and put the jar in accumulo/lib. To see an example look at <a href='examples/README.aggregation'>README.aggregation</a>
<p>If you would like to see what aggregators a table has you can use the config command like in the following example.
<p><pre>
user@instance:9999 perDayCounts&gt; config -t perDayCounts agg
---------+------------------------------------------+-----------------------------------------
SCOPE | NAME | VALUE
---------+------------------------------------------+-----------------------------------------
table | table.iterator.majc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
table | table.iterator.majc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
table | table.iterator.minc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
table | table.iterator.minc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
table | table.iterator.scan.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
table | table.iterator.scan.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
---------+------------------------------------------+-----------------------------------------
user@instance:9999 perDayCounts&gt;
</pre>
<p>You can add aggregators to an existing table using the following command in the accumulo shell.
<p><pre>
config -t &lt;tablename&gt; &lt;columnFamily[:columnQual]&gt;=&lt;aggregation class&gt;
</pre>
</body>
</html>