---
title: Apache Accumulo 2.0.0
sortableversion: '02.00.00-final'
archived_critical: true
---
Apache Accumulo 2.0.0 contains significant changes from 1.9 and earlier
versions. It is the first major release since adopting [semver] and is the
culmination of more than 3 years' worth of work by more than 40 contributors
from the Accumulo community. The following release notes highlight some of the
changes. If anything is missing from this list, please [contact] the developers
to have it included.
## Notable Changes
### New API for creating connections to Accumulo
A fluent API for creating Accumulo clients was introduced in [ACCUMULO-4784] and [#634].
The `Connector` and `ZooKeeperInstance` objects have been deprecated and replaced by
`AccumuloClient` which is created from the `Accumulo` entry point. The new API also deprecates
`ClientConfiguration` and introduces its own properties file called `accumulo-client.properties`
that ships with the Accumulo tarball. The new API has the following benefits over the old API:
* All connection information can be specified in a properties file to create the client. This was not
possible with the old API.
* The new API does not require `ZooKeeperInstance` to be created first before creating a client.
* The new client is closeable and does not rely on shared static resource management
* Clients can be created using a new Java builder, `Properties` object, or `accumulo-client.properties`
* Clients can now be created with default settings for `BatchWriter`, `Scanner`, etc.
* Create scanners with default authorizations. {% ghi 744 %}
See the [client documentation][clients] for more information on how to use the new API.
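As a rough illustration of the properties-based approach, the sketch below loads connection settings with plain `java.util.Properties`. The class name `ClientPropertiesSketch` and all values are placeholders invented for this example. The final step of handing the properties to the `Accumulo` entry point appears only as a comment, because it needs the Accumulo client jars on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ClientPropertiesSketch {

  // Placeholder contents standing in for an accumulo-client.properties file.
  // All values are made up for illustration.
  static final String EXAMPLE_FILE = String.join("\n",
      "instance.name=myinstance",
      "instance.zookeepers=zk1:2181,zk2:2181",
      "auth.type=password",
      "auth.principal=myuser",
      "auth.token=mypassword");

  // Parse properties-file text into a Properties object.
  static Properties load(String contents) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(contents));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    Properties props = load(EXAMPLE_FILE);
    // With accumulo-core on the classpath, the properties would be passed to
    // the new entry point, roughly like this (not compiled here):
    //   try (AccumuloClient client = Accumulo.newClient().from(props).build()) {
    //     // use client
    //   }
    System.out.println("Connecting to instance " + props.getProperty("instance.name"));
  }
}
```

Because the client is closeable, wrapping it in try-with-resources (as in the comment) releases its resources when the block exits.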
### Hadoop 3, Java 8 & 11
Accumulo 2.x expects at least Java 8 and Hadoop 3. It is built against Java 8
and Hadoop 3 and the binary tarball is targeted to work with a Java 8 and
Hadoop 3 system. See {% jira ACCUMULO-4826 %}, {% ghi 531 %}, and {% jira
ACCUMULO-4299 %}. Running with Java 11 is also supported, but Java 11 is not
required.
### Simplified Accumulo scripts and configuration files
Accumulo's scripts and configuration were refactored in [ACCUMULO-4490] to make Accumulo
easier to use. The number of scripts in the `bin` directory of the Accumulo release tarball
has been reduced from 20 scripts to the four scripts below:
* `accumulo` - mostly left alone except for improved usage
* `accumulo-service` - manage Accumulo processes as services
* `accumulo-cluster` - manage Accumulo on a cluster. Replaces `start-all.sh` and `stop-all.sh`
* `accumulo-util` - combines many utility scripts into one script.
Read [this blog post][script-post] for more information on this change.
### New Bulk Import API
A new bulk import API was added in 2.0 that has a very different implementation. This new API supports the following new functionality:
* Bulk import to an offline table.
* Load plans that specify where files go in a table which avoids opening the
files for inspection.
* Inspection of files on the client side. Inspection of all files is done
before the FATE operation starts. This results in fewer namenode operations
and fail-fast behavior for bad files (a failure directory is no longer needed).
* A new improved algorithm to load files into tablets. This new algorithm
scans the metadata table and makes asynchronous load calls to all tablets.
This queues load operations on all tablets at around the same time. The
async RPC calls and beforehand inspection make the bulk load FATE operation
much shorter.
The shell command for doing bulk load supports the old and new API. To use the
new API from the shell simply omit the failure directory argument.
For the API, use the [new fluent API][newImportDir].
See {% ghi 436 %}, {% ghi 472 %}, and {% ghi 570 %}.
### Summaries
[Summaries]({% durl development/summaries %}) enables continually generating
statistics about a table with user defined functions. This feature can inform
a user about what is in their table and be used by compaction strategies to
make decisions. For example, using this feature it would be possible to
compact all tablets where deletes are more than 25% of the data. Another
example use case is optimizing filtering compactions by enabling smart
selection of files with pertinent data. Examples of filtering compactions are
age off and removal of non-compliant data.
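The 25% delete example comes down to a simple ratio check over summary statistics. The sketch below uses only the Java standard library. The class `DeleteRatioSketch`, the method `shouldCompact`, and the statistic names `total` and `deletes` are assumptions made for illustration, not Accumulo's summarizer or compaction strategy API.

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteRatioSketch {

  // Decide whether deletes exceed the given fraction of all entries.
  // The keys "total" and "deletes" are hypothetical statistic names.
  static boolean shouldCompact(Map<String, Long> stats, double threshold) {
    long total = stats.getOrDefault("total", 0L);
    long deletes = stats.getOrDefault("deletes", 0L);
    // An empty summary gives no evidence, so do not compact.
    return total > 0 && (double) deletes / total > threshold;
  }

  public static void main(String[] args) {
    Map<String, Long> stats = new HashMap<>();
    stats.put("total", 1000L);
    stats.put("deletes", 300L);
    // 300 of 1000 entries are deletes, which exceeds the 25% threshold.
    System.out.println(shouldCompact(stats, 0.25));
  }
}
```

In a real deployment the statistics would come from a configured summarizer and the decision would live in a compaction strategy; see the summaries documentation for the actual interfaces.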
### Scan Executors
[Scan executors]({% durl administration/scan-executors %}) support prioritizing
and dedicating scan resources. Each executor has a configurable number of
threads and an optional custom prioritizer. Tables can be configured in a
flexible way to dispatch scans to different executors.
### SPI package
All new pluggable components introduced in 2.0 were placed under a new SPI
package. The SPI package is analyzed by [Apilyzer] at build time to ensure
plugins only use SPI and API types. This prevents plugins from using internal
Accumulo types that are inherently unstable over time. Plugins created before
2.0 do use internal types and are less stable. The new pluggable interfaces
should be much more stable.
### Official Accumulo docker image was created
An [official Accumulo docker image][accumulo-docker] was created in [ACCUMULO-4706] to make
it easier for users to run Accumulo in Docker. To support running in Docker, a few changes were
made to Accumulo:
* The `--upload-accumulo-site` option was added to `accumulo init` to copy properties from accumulo-site.xml
to ZooKeeper during initialization.
* The `-o <key>=<value>` option was added to the `accumulo` command to override configuration that could
not be set in ZooKeeper.
### Updated and improved Accumulo documentation
Accumulo's documentation has been refactored with the following improvements:
* Documentation source now lives in [accumulo-website repo][website-repo] so changes
are now immediately viewable.
* Improved navigation using a new sidebar
* Better linking to Javadocs, between documentation pages, and to configuration properties.
Accumulo's documentation was also reviewed and changes were made to improve accuracy and remove
out of date documentation.
### Moved Accumulo Examples to its own repo
The Accumulo examples were moved out of the accumulo repo to the [accumulo-examples repo][accumulo-examples]
which has the following benefits:
* The Accumulo examples are no longer released with Accumulo and can be continuously improved.
* The Accumulo API version used by the examples can be updated right before Accumulo is released
to test for any changes to the API that break semver.
### Simplified Accumulo logging configuration
The log4j configuration of Accumulo services was improved in [ACCUMULO-4588] with the following changes:
* Logging is now configured using standard log4j JVM property 'log4j.configuration' in accumulo-env.sh.
* Tarball ships with fewer log4j config files (3 rather than 6) which are all log4j properties files.
* Log4j XML can still be used by editing accumulo-env.sh
* Removed auditLog.xml and added audit log configuration to log4j-service properties files
* Accumulo conf/ directory no longer has an examples/ directory. Configuration files ship in conf/ and are
used by default.
* Accumulo monitor by default will bind to 0.0.0.0 but will advertise hostname looked up in Java for log
forwarding
* Switched to use full hostnames rather than short hostnames for logging
### Removed comparison of Value with byte[] in Value.equals()
The ability to use `Value.equals(byte[])` to check whether the contents of a
`Value` object equal a given byte array was removed in [ACCUMULO-4726]. To perform
that check, you must now use the newly added `Value.contentEquals(byte[])`
method. This corrects the behavior of the `equals` method so that it conforms
to the API contract documented in the javadoc inherited from its superclass.
However, it will break any code that was relying on the undocumented and broken
behavior to compare `Value` objects with byte arrays. Such comparisons will now
always return `false` instead of `true`, even if the contents are equal.
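The behavior change can be illustrated with a small stand-in class. `SimpleValue` below is a hypothetical simplification written for this example; it is not Accumulo's `Value` class and only mirrors the `equals`/`contentEquals` split described above.

```java
import java.util.Arrays;

public class ValueEqualsSketch {

  // Hypothetical stand-in for Value; not the real Accumulo class.
  static final class SimpleValue {
    private final byte[] bytes;

    SimpleValue(byte[] bytes) {
      this.bytes = bytes.clone();
    }

    // Follows the Object.equals contract: only another SimpleValue can be equal.
    @Override
    public boolean equals(Object other) {
      return other instanceof SimpleValue
          && Arrays.equals(bytes, ((SimpleValue) other).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }

    // Explicit comparison against a raw byte array.
    boolean contentEquals(byte[] other) {
      return Arrays.equals(bytes, other);
    }
  }

  public static void main(String[] args) {
    byte[] raw = new byte[] {1, 2, 3};
    SimpleValue value = new SimpleValue(raw);
    System.out.println(value.equals(raw));                  // false
    System.out.println(value.contentEquals(raw));           // true
    System.out.println(value.equals(new SimpleValue(raw))); // true
  }
}
```

Code that previously passed a byte array to `equals` compiles unchanged, so a search for such call sites is the practical way to find comparisons that need to move to `contentEquals`.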
### Removed default dynamic reloading classpath directory (lib/ext)
In {% ghi 1179 %}, the default directory for dynamic class reloading (lib/ext)
was removed and the default value for the deprecated property
`general.dynamic.classpaths` was set to blank. This was done as part of a plan
to phase out class loading behaviors that are tightly coupled to Accumulo, in
favor of more user-pluggable class loading features that are easier to maintain
separately from Accumulo's core code.
To continue to use this feature until it is removed, you must set this property
to a value. However, it is recommended to add your non-dynamic user class paths
to the `CLASSPATH` environment in `accumulo-env.sh` instead, or to leverage the
per-table context class paths feature, depending on your use case. For
reference, the previous default value was `$ACCUMULO_HOME/lib/ext/[^.].*.jar`.
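To see which file names the previous default would pick up, the sketch below applies the last path component, `[^.].*.jar`, as a Java regular expression. Treating that component as a regex matched against whole file names is an assumption made here for illustration, and the class name `ExtJarPatternSketch` and the file names are invented.

```java
import java.util.regex.Pattern;

public class ExtJarPatternSketch {

  // Last path component of the previous default value, treated as a regex
  // (an assumption for this sketch).
  static final Pattern OLD_DEFAULT = Pattern.compile("[^.].*.jar");

  // True if the whole file name matches the pattern.
  static boolean matches(String fileName) {
    return OLD_DEFAULT.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    System.out.println(matches("mylib.jar"));   // true
    System.out.println(matches(".hidden.jar")); // false: name starts with a dot
    System.out.println(matches("mylibXjar"));   // true: the dot before "jar" is unescaped
  }
}
```

Under this reading, the pattern skips hidden files but is looser than a literal `.jar` suffix check, which is worth keeping in mind if you set the property to a similar value.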
### Other Notable Changes
* [ACCUMULO-3652] - Replaced string concatenation in log statements with slf4j
where applicable. Removed tserver TLevel logging class.
* [ACCUMULO-4449] - Removed 'slave' terminology and replaced with 'tserver' in
most cases. The former 'slaves' config file is now named 'tservers'. Added checks to
scripts to fail if 'slaves' file is present.
* {% jira ACCUMULO-4808 %} - Can now create table with splits and offline. Specifying splits
at table creation time can be much faster than adding splits after creation.
* {% jira ACCUMULO-4463 %} - Caching is now pluggable.
* {% jira ACCUMULO-4177 %} - New built in cache implementation based on TinyLFU.
* {% jira ACCUMULO-4376 %} {% jira ACCUMULO-4746 %} - Mutation and Key Fluent APIs allow easy mixing of types. For example, a family of type `String` and a qualifier of type `byte[]` are much easier to write using this new API.
* {% jira ACCUMULO-4771 %} - The Accumulo monitor was completely rewritten.
* {% jira ACCUMULO-4732 %} - Specify iterators and locality groups at table creation time.
* {% jira ACCUMULO-4612 %} - Use percentages for memory related configuration.
* {% jira ACCUMULO-1787 %} - Two tier compaction strategy. Support compacting small files with snappy and large files with gzip.
* {% ghi 560 %} - Provide new Crypto interface & impl
* {% ghi 536 %} - Removed mock Accumulo.
* {% ghi 438 %} - Added support for ZStandard compression
* {% ghi 404 %} - Added basic Grafana dashboard example.
* {% ghi 1102 %} {% ghi 1100 %} {% ghi 1037 %} - Removed lock contention in different areas. These locks caused threads working on unrelated tasks to impede each other.
* {% ghi 1033 %} - Optimized the default compaction strategy. In some cases Accumulo would rewrite data O(N^2) times over repeated compactions. With this change the amount of rewriting is always logarithmic.
* Many performance improvements mentioned in the 1.9.X release notes are also available in 2.0.
* Scanners close server side sessions on close {% ghi 813 %} {% ghi 905 %}
This release also includes bug fixes from 1.9.3 which was released after
2.0.0-alpha-1 and 2.0.0-alpha-2.
## Upgrading
View the [Upgrading Accumulo documentation][upgrade] for guidance.
## Useful Links
* [All tickets on GitHub related to this release][milestone]
[milestone]: https://github.com/apache/accumulo/milestone/12
[#634]: https://github.com/apache/accumulo/issues/634
[ACCUMULO-3652]: https://issues.apache.org/jira/browse/ACCUMULO-3652
[ACCUMULO-4449]: https://issues.apache.org/jira/browse/ACCUMULO-4449
[ACCUMULO-4490]: https://issues.apache.org/jira/browse/ACCUMULO-4490
[ACCUMULO-4588]: https://issues.apache.org/jira/browse/ACCUMULO-4588
[ACCUMULO-4706]: https://issues.apache.org/jira/browse/ACCUMULO-4706
[ACCUMULO-4726]: https://issues.apache.org/jira/browse/ACCUMULO-4726
[ACCUMULO-4733]: https://issues.apache.org/jira/browse/ACCUMULO-4733
[ACCUMULO-4737]: https://issues.apache.org/jira/browse/ACCUMULO-4737
[ACCUMULO-4784]: https://issues.apache.org/jira/browse/ACCUMULO-4784
[Apilyzer]: https://code.revelc.net/apilyzer-maven-plugin/
[accumulo-docker]: https://github.com/apache/accumulo-docker
[accumulo-examples]: https://github.com/apache/accumulo-examples
[clients]: /docs/2.x/getting-started/clients
[contact]: /contact-us
[newImportDir]: {% jurl org.apache.accumulo.core.client.admin.TableOperations %}#importDirectory(java.lang.String)
[script-post]: /blog/2016/11/16/simpler-scripts-and-config.html
[semver]: https://semver.org/spec/v2.0.0.html
[upgrade]: /docs/2.x/administration/upgrading
[website-repo]: https://github.com/apache/accumulo-website