| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <html> |
| <head> |
| <title>Accumulo Aggregation</title> |
| <link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/> |
| </head> |
| <body> |
| |
| <h1>Apache Accumulo Documentation : Aggregation</h1> |
| |
| <p>Accumulo does aggregation differently than traditional RDBMSs. Instead of aggregating at query time, it aggregates at ingest time. Doing aggregation at ingest time is ideal for large amounts of data and has three distinct advantages. First, it reduces the actual amount of data stored. Second, it makes queries run faster. Third, it removes the need to do a lookup at insert time in many cases, which can greatly speed up ingest. |
| |
| <p>Aggregation in Accumulo is easy to use. You specify which columns or column families to aggregate at table creation time. Allowing an aggregation function to apply to a whole column family gives the user great flexibility, because every column qualifier under that family is aggregated independently. The example below demonstrates this flexibility. |
| |
| <p><pre> |
| Shell - Apache Accumulo Interactive Shell |
| - version: 1.3.6 |
| - instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49 |
| - |
| - type 'help' for a list of available commands |
| - |
| user@instance:9999> createtable perDayCounts -a day=org.apache.accumulo.core.iterators.aggregation.StringSummation |
| user@instance:9999 perDayCounts> insert foo day 20080101 1 |
| insert successful |
| user@instance:9999 perDayCounts> insert foo day 20080101 1 |
| insert successful |
| user@instance:9999 perDayCounts> insert foo day 20080103 1 |
| insert successful |
| user@instance:9999 perDayCounts> insert bar day 20080101 1 |
| insert successful |
| user@instance:9999 perDayCounts> insert bar day 20080101 1 |
| insert successful |
| user@instance:9999 perDayCounts> scan |
| bar day:20080101 [] 2 |
| foo day:20080101 [] 2 |
| foo day:20080103 [] 1 |
| user@instance:9999 perDayCounts> |
| </pre> |
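| <p>One way to picture what the session above does is the toy model below. It is not Accumulo code: the class <code>PerDayCountsSketch</code> and its methods are hypothetical names invented for illustration, and an in-memory list stands in for a table. Inserts append without any lookup, and values that share a row and column are summed only when the data is read, which is roughly how an aggregator such as StringSummation is assumed to behave at scan or compaction time. |

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of ingest-time aggregation. Hypothetical names; not Accumulo code.
class PerDayCountsSketch {
    // Raw entries stored as {key, value}; inserts never read existing data.
    private final List<String[]> entries = new ArrayList<>();

    // Append an entry without looking up any previous value for the key.
    void insert(String row, String family, String qualifier, String value) {
        entries.add(new String[] {row + " " + family + ":" + qualifier, value});
    }

    // Combine entries that share a key by summing their decimal-string values.
    Map<String, String> scan() {
        Map<String, String> result = new TreeMap<>(); // sorted by key, like a scan
        for (String[] e : entries) {
            result.merge(e[0], e[1], (a, b) ->
                Long.toString(Long.parseLong(a) + Long.parseLong(b)));
        }
        return result;
    }

    public static void main(String[] args) {
        PerDayCountsSketch table = new PerDayCountsSketch();
        table.insert("foo", "day", "20080101", "1");
        table.insert("foo", "day", "20080101", "1");
        table.insert("foo", "day", "20080103", "1");
        table.insert("bar", "day", "20080101", "1");
        table.insert("bar", "day", "20080101", "1");
        // Print each aggregated entry in sorted key order.
        table.scan().forEach((k, v) -> System.out.println(k + " [] " + v));
    }
}
```

| <p>Running <code>main</code> prints one line per distinct row and column, with the two inserts for <code>foo day:20080101</code> and for <code>bar day:20080101</code> each collapsed into a single entry. |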
| |
| |
| <p>Implementing a new aggregation function is a snap. Simply write a Java class that implements <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/Aggregator.html'>org.apache.accumulo.core.iterators.aggregation.Aggregator</a>. A good example to look at is <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/StringSummation.html'>StringSummation</a>, which sums numbers encoded as ASCII strings. However, one could easily write a much more efficient summation aggregator that operates on numbers encoded in two's complement. |
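| <p>As a rough sketch of the two's complement idea, the code below sums 8-byte big-endian longs instead of parsing strings. To stay self-contained it does not use the Accumulo classes: <code>AggregatorSketch</code> is a hypothetical stand-in interface that takes <code>byte[]</code> values, and its <code>reset</code>, <code>collect</code>, and <code>aggregate</code> methods are an assumption about the general shape of an aggregator, so consult the Aggregator Javadoc linked above for the real signatures. <code>LongSummationSketch</code>, <code>encode</code>, and <code>decode</code> are likewise illustrative names. |

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an aggregator interface; byte[] replaces the real value type.
interface AggregatorSketch {
    void reset();               // clear state before a new key
    void collect(byte[] value); // fold in one value for the current key
    byte[] aggregate();         // produce the combined value
}

// Sums values encoded as 8-byte big-endian two's complement longs.
class LongSummationSketch implements AggregatorSketch {
    private long sum = 0;

    public void reset() {
        sum = 0;
    }

    public void collect(byte[] value) {
        // Decode directly from bytes; no string parsing is needed.
        sum += decode(value);
    }

    public byte[] aggregate() {
        return encode(sum);
    }

    // Encode a long as 8 bytes (ByteBuffer defaults to big-endian).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decode the first 8 bytes of the array as a long.
    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```

| <p>The fixed-width encoding avoids converting to and from decimal text on every value, which is the efficiency gain the paragraph above alludes to. Values written this way are not human-readable in a scan, so this is a trade-off rather than a strict improvement. |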
| |
| <p>To deploy a new aggregator, package it in a jar and put the jar in accumulo/lib. For an example, see <a href='examples/README.aggregation'>README.aggregation</a>. |
| |
| <p>To see which aggregators a table has, use the config command as in the following example. |
| |
| <p><pre> |
| user@instance:9999 perDayCounts> config -t perDayCounts agg |
| ---------+------------------------------------------+----------------------------------------- |
| SCOPE | NAME | VALUE |
| ---------+------------------------------------------+----------------------------------------- |
| table | table.iterator.majc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator |
| table | table.iterator.majc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation |
| table | table.iterator.minc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator |
| table | table.iterator.minc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation |
| table | table.iterator.scan.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator |
| table | table.iterator.scan.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation |
| ---------+------------------------------------------+----------------------------------------- |
| user@instance:9999 perDayCounts> |
| </pre> |
| |
| <p>You can add aggregators to an existing table using the following command in the Accumulo shell. |
| |
| <p><pre> |
| config -t &lt;tablename&gt; &lt;columnFamily[:columnQual]&gt;=&lt;aggregation class&gt; |
| </pre> |
| |
| </body> |
| </html> |