Apache Accumulo 2.0.0 is a significant release. These release notes are still a work in progress. The notes are fairly complete in terms of features added to 2.0, but some features may still be missing. Also, the notes will be updated in the future to point to documentation for new features in addition to issues.
A fluent API for creating Accumulo clients was introduced in ACCUMULO-4784 and #634. The `Connector` and `ZooKeeperInstance` objects have been deprecated and replaced by `AccumuloClient`, which is created from the `Accumulo` entry point. The new API also deprecates `ClientConfiguration` and introduces its own properties file, `accumulo-client.properties`, that ships with the Accumulo tarball. The new API has the following benefits over the old API:

* The new API does not require `ZooKeeperInstance` to be created first before creating a client.
* Clients can be created using the fluent builder, a `Properties` object, or `accumulo-client.properties`.
* Clients can be created with default settings for `BatchWriter`, `Scanner`, etc.

See the client documentation for more information on how to use the new API.

Accumulo 2.x expects at least Java 8 and Hadoop 3. It is built against Java 8 and Hadoop 3, and the binary tarball targets a Java 8 and Hadoop 3 system. See {% jira ACCUMULO-4826 %}, {% ghi 531 %}, and {% jira ACCUMULO-4299 %}. Running with Java 11 is also supported, but Java 11 is not required.
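
As a rough illustration of the properties-based client creation described for the new client API earlier in these notes, the sketch below builds a `Properties` object using only the Java standard library. The class name `ClientPropsSketch`, the helper `connectionProps`, and all values are hypothetical. The property keys are assumed to match those in `accumulo-client.properties`, so check them against the file shipped in the tarball. The Accumulo call itself appears only in a comment because it requires the Accumulo 2.0 client library on the classpath.

```java
import java.util.Properties;

// Sketch only: assembles connection settings a client could be created from.
final class ClientPropsSketch {

  // Keys are assumed to follow accumulo-client.properties naming; values are placeholders.
  static Properties connectionProps(String instance, String zookeepers,
                                    String user, String password) {
    Properties props = new Properties();
    props.setProperty("instance.name", instance);
    props.setProperty("instance.zookeepers", zookeepers);
    props.setProperty("auth.type", "password");
    props.setProperty("auth.principal", user);
    props.setProperty("auth.token", password);
    return props;
  }

  public static void main(String[] args) {
    Properties props =
        connectionProps("myinstance", "zk1:2181,zk2:2181", "myuser", "mypassword");

    // With the Accumulo 2.0 client library available, the client would then be
    // created roughly like this (not compiled as part of this sketch):
    //   try (AccumuloClient client = Accumulo.newClient().from(props).build()) {
    //     // use client.createScanner(...), client.createBatchWriter(...), etc.
    //   }

    // Print the settings, masking the password.
    props.stringPropertyNames().stream().sorted().forEach(key ->
        System.out.println(key + "=" + (key.equals("auth.token") ? "***" : props.getProperty(key))));
  }
}
```

Because the client is closeable, wrapping it in try-with-resources (as in the comment) releases its resources when the block exits.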
Accumulo's scripts and configuration were refactored in ACCUMULO-4490 to make Accumulo easier to use. The number of scripts in the `bin` directory of the Accumulo release tarball has been reduced from 20 scripts to the four scripts below:

* `accumulo` - mostly left alone except for improved usage
* `accumulo-service` - manage Accumulo processes as services
* `accumulo-cluster` - manage Accumulo on a cluster. Replaces `start-all.sh` and `stop-all.sh`
* `accumulo-util` - combines many utility scripts into one script.

Read this blog post for more information on this change.
A new bulk import API was added in 2.0 that has a very different implementation. This new API supports the following new functionality.
The shell command for bulk loading supports both the old and the new API. To use the new API from the shell, simply omit the failure directory argument. TODO link to javadoc for new API. See {% ghi 436 %}, {% ghi 472 %}, and {% ghi 570 %}.
[Summaries]({% durl development/summaries %}) enables continually generating statistics about a table with user defined functions. This feature can inform a user about what is in their table and be used by compaction strategies to make decisions. For example, using this feature it would be possible to compact all tablets where deletes are more than 25% of the data. Another example use case is optimizing filtering compactions by enabling smart selection of files with pertinent data. Examples of filtering compactions are age off and removal of non-compliant data.
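
The 25% delete example above boils down to a ratio check over per-tablet counters. The sketch below is a minimal, standalone illustration of that decision. It does not use Accumulo's summary or compaction interfaces, and the names `DeleteRatioSketch` and `shouldCompact` are hypothetical.

```java
// Sketch only: the kind of decision a compaction strategy could make from
// summary statistics. Not Accumulo's actual summary or compaction API.
final class DeleteRatioSketch {

  // Returns true when deletes make up more than `threshold` of all entries.
  static boolean shouldCompact(long deletes, long total, double threshold) {
    if (total <= 0) {
      return false; // no data, nothing to compact
    }
    return (double) deletes / total > threshold;
  }

  public static void main(String[] args) {
    System.out.println(shouldCompact(30, 100, 0.25)); // true: 30% deletes
    System.out.println(shouldCompact(25, 100, 0.25)); // false: exactly 25% is not "more than"
  }
}
```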
[Scan executors]({% durl administration/scan-executors %}) support prioritizing and dedicating scan resources. Each executor has a configurable number of threads and an optional custom prioritizer. Tables can be configured in a flexible way to dispatch scans to different executors.
All new pluggable components introduced in 2.0 were placed under a new SPI package. The SPI package is analyzed by Apilyzer at build time to ensure plugins only use SPI and API types. This prevents plugins from using internal Accumulo types that are inherently unstable over time. Plugins created before 2.0 do use internal types and are less stable. The new pluggable interfaces should be much more stable.
An official Accumulo Docker image was created in ACCUMULO-4706 to make it easier for users to run Accumulo in Docker. To support running in Docker, a few changes were made to Accumulo:

* The `--upload-accumulo-site` option was added to `accumulo init` to set properties in accumulo-site.xml to ZooKeeper during initialization.
* The `-o <key>=<value>` option was added to the `accumulo` command to override configuration that could not be set in ZooKeeper.

Accumulo's documentation has been refactored with the following improvements:
Accumulo's documentation was also reviewed, and changes were made to improve accuracy and remove out-of-date documentation.
The Accumulo examples were moved out of the accumulo repo to the accumulo-examples repo, which has the following benefits:
The log4j configuration of Accumulo services was improved in ACCUMULO-4588 with the following changes:
ACCUMULO-4726 removed the ability to use `Value.equals(byte[])` to check whether the contents of a `Value` object were equal to a given byte array. To perform that check, you must now use the newly added `Value.contentEquals(byte[])` method. This corrects the behavior of the `equals` method so that it conforms to the API contract documented in the javadoc inherited from its superclass. However, it will break any code that relied on the undocumented and broken behavior to compare `Value` objects with byte arrays. Such comparisons will now always return `false` instead of `true`, even if the contents are equal.
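
To make the `equals` versus `contentEquals` distinction concrete, here is a minimal sketch using a hypothetical stand-in class, `BytesValue`. It is not Accumulo's `Value` implementation. It only mirrors the behavior described above: `equals` follows the standard `Object` contract, and byte-array comparison goes through a separate method.

```java
import java.util.Arrays;

// Hypothetical stand-in for a byte-wrapping value type; NOT Accumulo's Value class.
final class BytesValue {
  private final byte[] data;

  BytesValue(byte[] data) {
    this.data = data.clone();
  }

  // Follows the Object.equals contract: only another BytesValue can be equal.
  @Override
  public boolean equals(Object other) {
    return other instanceof BytesValue && Arrays.equals(data, ((BytesValue) other).data);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(data);
  }

  // Explicit content comparison against a raw byte array.
  boolean contentEquals(byte[] bytes) {
    return Arrays.equals(data, bytes);
  }

  public static void main(String[] args) {
    byte[] raw = {1, 2, 3};
    BytesValue value = new BytesValue(raw);
    System.out.println(value.equals(raw));                 // false: a byte[] is not a BytesValue
    System.out.println(value.contentEquals(raw));          // true: same bytes
    System.out.println(value.equals(new BytesValue(raw))); // true: same type, same bytes
  }
}
```

Code that previously passed a byte array to `equals` would compile unchanged but silently start returning `false`, so callers are worth auditing when upgrading.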
`String` and qualifier of type `byte[]` is much easier to write using this new API.

View the Upgrading Accumulo documentation for guidance.