_posts/release/2020-01-19-accumulo-2.1.0.md

title: Apache Accumulo 2.1.0
draft: true

** DRAFT RELEASE NOTES **

Notable Changes

Compaction Changes

Significant changes were made to how Accumulo compacts files in this release. See {% dlink administration/compaction %} for details; some highlights are listed below.

Multiple concurrent compactions per tablet on disjoint files are now supported. Previously only a single compaction could run on a tablet at a time. This allows tablets that are running long compactions on large files to concurrently compact new smaller files as they arrive.
Multiple compaction thread pools per tablet server are now supported. Previously only a single thread pool existed within a tablet server for compactions. With a single thread pool, quick compactions can be starved when all threads are busy with long compactions. Now compactions with little data can be processed by dedicated thread pools.
Accumulo's default algorithm for selecting files to compact was modified to select the smallest set of files that meets the compaction ratio criteria instead of the largest set. This change makes tablets more aggressive about reducing their number of files while still doing logarithmic compaction work. This change also enables efficient compaction of new small files that arrive during a long-running compaction.
Dedicated compaction thread pools for tables are now supported through configuration. The default configuration for Accumulo sets up dedicated thread pools for compacting the Accumulo metadata table.
Merging minor compactions were dropped. These were added to Accumulo to address the problem of new files arriving while a long-running compaction was in progress. Merging minor compactions could cause O(N^2) compaction work. The new compaction changes in this release can satisfy this use case while doing a logarithmic amount of work.
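
The difference between selecting the largest and the smallest set of files that meets the compaction ratio can be illustrated with a simplified sketch. This is not Accumulo's implementation: the class and method names are hypothetical, files are reduced to plain sizes, and the criterion is assumed to be "the sum of the file sizes in the set is larger than the ratio multiplied by the largest file size in the set".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified illustration of compaction-ratio based file selection.
public class CompactionSelectionSketch {

  // Assumed criterion: sum of sizes > ratio * size of the largest file in the set.
  // The list must be sorted ascending so the last element is the largest.
  public static boolean meetsRatio(List<Long> sortedSizes, double ratio) {
    if (sortedSizes.size() < 2) {
      return false; // compacting fewer than two files reduces nothing
    }
    long sum = 0;
    for (long size : sortedSizes) {
      sum += size;
    }
    long largest = sortedSizes.get(sortedSizes.size() - 1);
    return sum > ratio * largest;
  }

  // "Largest set" style: start with every file and drop the largest
  // file until the remaining set meets the ratio.
  public static List<Long> selectLargestSet(List<Long> sizes, double ratio) {
    List<Long> sorted = new ArrayList<>(sizes);
    Collections.sort(sorted);
    while (sorted.size() > 1 && !meetsRatio(sorted, ratio)) {
      sorted.remove(sorted.size() - 1);
    }
    return sorted.size() > 1 ? sorted : new ArrayList<>();
  }

  // "Smallest set" style: grow a set from the smallest files upward
  // and stop at the first set that meets the ratio.
  public static List<Long> selectSmallestSet(List<Long> sizes, double ratio) {
    List<Long> sorted = new ArrayList<>(sizes);
    Collections.sort(sorted);
    for (int end = 2; end <= sorted.size(); end++) {
      List<Long> prefix = sorted.subList(0, end);
      if (meetsRatio(prefix, ratio)) {
        return new ArrayList<>(prefix);
      }
    }
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    // Three large files plus three small files that arrived recently.
    List<Long> sizes = Arrays.asList(100L, 100L, 100L, 1L, 1L, 1L);
    System.out.println(selectSmallestSet(sizes, 2.0)); // prints [1, 1, 1]
    System.out.println(selectLargestSet(sizes, 2.0)); // prints [1, 1, 1, 100, 100, 100]
  }
}
```

In this example the smallest-set selection rewrites only the three small files, while the largest-set selection would rewrite all six, including the large files.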

CompactionStrategy was deprecated in favor of new public APIs. See its [javadoc]({% jurl org.apache.accumulo.tserver.compaction.CompactionStrategy %}) for more information.

Fixed GC Metadata hotspots

Prior to this release, Accumulo stored GC file candidates in the metadata table using rows of the form ~del<URI>. This row schema led to uneven load on the metadata table and left behind metadata tablets that were never used again. In {% ghi 1043 %} the row format was changed to ~del<hash(URI)><URI>, resulting in even load on the metadata table and an even spread of data across its tablets. After upgrading, there may still be splits in the metadata table that use the old row format. These splits can be merged away, as shown in the example below, which starts with splits generated from both the old and new row schemas. The old splits with the prefix ~delhdfs are merged away.

root@uno> getsplits -t accumulo.metadata 
2<
~
~del55
~dela7
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000a0.rf
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000kb.rf
root@uno> merge -t accumulo.metadata -b ~delhdfs -e ~delhdfs~
root@uno> getsplits -t accumulo.metadata 
2<
~
~del55
~dela7
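
The effect of the row format change described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and CRC32 is used as a stand-in hash because the hash function Accumulo actually uses is not stated in these notes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of the old and new GC candidate row formats.
public class DeleteRowSketch {

  static final String PREFIX = "~del";

  // Old format: ~del<URI>. Rows sort by URI, so candidates with similar
  // paths cluster in the same part of the metadata table.
  public static String oldRow(String uri) {
    return PREFIX + uri;
  }

  // New format: ~del<hash(URI)><URI>. The hash prefix spreads rows across
  // the ~del range regardless of how similar the URIs are.
  // CRC32 is an assumption made for this sketch, not Accumulo's hash.
  public static String newRow(String uri) {
    CRC32 crc = new CRC32();
    crc.update(uri.getBytes(StandardCharsets.UTF_8));
    // Fixed-width hex keeps the hash portion the same length for every row.
    return PREFIX + String.format("%08x", crc.getValue()) + uri;
  }

  public static void main(String[] args) {
    String base = "hdfs://localhost:8020/accumulo/tables/2/default_tablet/";
    for (String file : new String[] {"F00000a0.rf", "F00000kb.rf"}) {
      System.out.println(oldRow(base + file));
      System.out.println(newRow(base + file));
    }
  }
}
```

With the old format both example rows share the long ~delhdfs://... prefix and sort next to each other. With a hash in front of the URI, where a row sorts is determined by the hash rather than by the path.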