docs/aggregation.html - accumulo - Git at Google

 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <html>
 <head>
 <title>Accumulo Aggregation</title>
 <link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
 </head>
 <body>

 <h1>Apache Accumulo Documentation : Aggregation</h1>

 <p>Accumulo does aggregation differently than traditional RDBMSs.  Instead of aggregating at query time, it aggregates at ingest time.  Doing aggregation at ingest time is ideal for large amounts of data and has three distinct advantages.  First, it reduces the actual amount of data stored.  Second, it makes queries run faster.  Third, it removes the need to do a lookup at insert time in many cases, which can greatly speed up ingest.

 <p>Aggregation in accumulo is easy to use.  You simply specify which columns or column family you want to aggregate at table creation time.  Allowing an aggregation function to apply to a whole column family is an interesting twist that gives the user great flexibility.  The example below demonstrates this flexibility.

 <p><pre>
 Shell - Apache Accumulo Interactive Shell
 - version: 1.3.6
 - instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49
 -
 - type 'help' for a list of available commands
 -
 user@instance:9999&gt; createtable perDayCounts -a day=org.apache.accumulo.core.iterators.aggregation.StringSummation
 user@instance:9999 perDayCounts&gt; insert foo day 20080101 1
 insert successful
 user@instance:9999 perDayCounts&gt; insert foo day 20080101 1
 insert successful
 user@instance:9999 perDayCounts&gt; insert foo day 20080103 1
 insert successful
 user@instance:9999 perDayCounts&gt; insert bar day 20080101 1
 insert successful
 user@instance:9999 perDayCounts&gt; insert bar day 20080101 1
 insert successful
 user@instance:9999 perDayCounts&gt; scan
 bar day:20080101 [] 2
 foo day:20080101 [] 2
 foo day:20080103 [] 1
 user@instance:9999 perDayCounts&gt;
 </pre>


 <p>Implementing a new aggregation function is a snap.  Simply write some Java code that implements <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/Aggregator.html'>org.apache.accumulo.core.iterators.aggregation.Aggregator</a>. A good example to look at is <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/StringSummation.html'>StringSummation</a> which sums numbers encoded as ascii strings.  However, one could easily write a much more efficient summation aggregator that operates on numbers encoded in twos complement.

 <p>To deploy a new aggregator, jar it up and put the jar in accumulo/lib.  To see an example look at <a href='examples/README.aggregation'>README.aggregation</a>

 <p>If you would like to see what aggregators a table has you can use the config command like in the following example.

 <p><pre>
 user@instance:9999 perDayCounts&gt; config -t perDayCounts agg
 ---------+------------------------------------------+-----------------------------------------
 SCOPE    | NAME                                     | VALUE
 ---------+------------------------------------------+-----------------------------------------
 table    | table.iterator.majc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
 table    | table.iterator.majc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
 table    | table.iterator.minc.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
 table    | table.iterator.minc.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
 table    | table.iterator.scan.agg................. | 10,org.apache.accumulo.core.iterators.AggregatingIterator
 table    | table.iterator.scan.agg.opt.day......... | org.apache.accumulo.core.iterators.aggregation.StringSummation
 ---------+------------------------------------------+-----------------------------------------
 user@instance:9999 perDayCounts&gt;
 </pre>

 <p>You can add aggregators to an existing table using the following command in the accumulo shell.

 <p><pre>
   config -t &lt;tablename&gt; &lt;columnFamily[:columnQual]&gt;=&lt;aggregation class&gt;
 </pre>

 </body>
 </html>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	<html>
	<head>
	<title>Accumulo Aggregation</title>
	<link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
	</head>
	<body>

	<h1>Apache Accumulo Documentation : Aggregation</h1>

	<p>Accumulo does aggregation differently than traditional RDBMSs. Instead of aggregating at query time, it aggregates at ingest time. Doing aggregation at ingest time is ideal for large amounts of data and has three distinct advantages. First, it reduces the actual amount of data stored. Second, it makes queries run faster. Third, it removes the need to do a lookup at insert time in many cases, which can greatly speed up ingest.

	<p>Aggregation in accumulo is easy to use. You simply specify which columns or column family you want to aggregate at table creation time. Allowing an aggregation function to apply to a whole column family is an interesting twist that gives the user great flexibility. The example below demonstrates this flexibility.

	<p><pre>
	Shell - Apache Accumulo Interactive Shell
	- version: 1.3.6
	- instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49
	-
	- type 'help' for a list of available commands
	-
	user@instance:9999> createtable perDayCounts -a day=org.apache.accumulo.core.iterators.aggregation.StringSummation
	user@instance:9999 perDayCounts> insert foo day 20080101 1
	insert successful
	user@instance:9999 perDayCounts> insert foo day 20080101 1
	insert successful
	user@instance:9999 perDayCounts> insert foo day 20080103 1
	insert successful
	user@instance:9999 perDayCounts> insert bar day 20080101 1
	insert successful
	user@instance:9999 perDayCounts> insert bar day 20080101 1
	insert successful
	user@instance:9999 perDayCounts> scan
	bar day:20080101 [] 2
	foo day:20080101 [] 2
	foo day:20080103 [] 1
	user@instance:9999 perDayCounts>
	</pre>


	<p>Implementing a new aggregation function is a snap. Simply write some Java code that implements <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/Aggregator.html'>org.apache.accumulo.core.iterators.aggregation.Aggregator</a>. A good example to look at is <a href='apidocs/org/apache/accumulo/core/iterators/aggregation/StringSummation.html'>StringSummation</a> which sums numbers encoded as ascii strings. However, one could easily write a much more efficient summation aggregator that operates on numbers encoded in twos complement.

	<p>To deploy a new aggregator, jar it up and put the jar in accumulo/lib. To see an example look at <a href='examples/README.aggregation'>README.aggregation</a>

	<p>If you would like to see what aggregators a table has you can use the config command like in the following example.

	<p><pre>
	user@instance:9999 perDayCounts> config -t perDayCounts agg
	---------+------------------------------------------+-----------------------------------------
	SCOPE \| NAME \| VALUE
	---------+------------------------------------------+-----------------------------------------
	table \| table.iterator.majc.agg................. \| 10,org.apache.accumulo.core.iterators.AggregatingIterator
	table \| table.iterator.majc.agg.opt.day......... \| org.apache.accumulo.core.iterators.aggregation.StringSummation
	table \| table.iterator.minc.agg................. \| 10,org.apache.accumulo.core.iterators.AggregatingIterator
	table \| table.iterator.minc.agg.opt.day......... \| org.apache.accumulo.core.iterators.aggregation.StringSummation
	table \| table.iterator.scan.agg................. \| 10,org.apache.accumulo.core.iterators.AggregatingIterator
	table \| table.iterator.scan.agg.opt.day......... \| org.apache.accumulo.core.iterators.aggregation.StringSummation
	---------+------------------------------------------+-----------------------------------------
	user@instance:9999 perDayCounts>
	</pre>

	<p>You can add aggregators to an existing table using the following command in the accumulo shell.

	<p><pre>
	config -t <tablename> <columnFamily[:columnQual]>=<aggregation class>
	</pre>

	</body>
	</html>