| --- |
| title: Apache Accumulo 2.0.0 |
| sortableversion: '02.00.00-final' |
| archived_critical: true |
| --- |
| |
| Apache Accumulo 2.0.0 contains significant changes from 1.9 and earlier |
| versions. It is the first major release since adopting [semver] and is the |
| culmination of more than 3 years worth of work by more than 40 contributors |
| from the Accumulo community. The following release notes highlight some of the |
| changes. If anything is missing from this list, please [contact] the developers |
| to have it included. |
| |
| ## Notable Changes |
| |
| ### New API for creating connections to Accumulo |
| |
| A fluent API for creating Accumulo clients was introduced in [ACCUMULO-4784] and [#634]. |
| The `Connector` and `ZooKeeperInstance` objects have been deprecated and replaced by |
| `AccumuloClient` which is created from the `Accumulo` entry point. The new API also deprecates |
| `ClientConfiguration` and introduces its own properties file called `accumulo-client.properties` |
| that ships with the Accumulo tarball. The new API has the following benefits over the old API: |
| * All connection information can be specifed in properties file to create the client. This was not |
| possible with old API. |
| * The new API does not require `ZooKeeperInstance` to be created first before creating a client. |
| * The new client is closeable and does not rely on shared static resource management |
| * Clients can be created using a new Java builder, `Properties` object, or `accumulo-client.properties` |
| * Clients can now be created with default settings for `BatchWriter`, `Scanner`, etc. |
| * Create scanners with default authorizations. {% ghi 744 %} |
| |
| See the [client documentation][clients] for more information on how to use the new API. |
| |
| ### Hadoop 3 Java 8 & 11. |
| |
| Accumulo 2.x expects at least Java 8 and Hadoop 3. It is built against Java 8 |
| and Hadoop 3 and the binary tarball is targeted to work with a Java 8 and |
| Hadoop 3 system. See {% jira ACCUMULO-4826 %}, {% ghi 531 %}, and {% jira |
| ACCUMULO-4299 %}. Running with Java 11 is also supported, but Java 11 is not |
| required. |
| |
| ### Simplified Accumulo scripts and configuration files |
| |
| Accumulo's scripts and configuration were refactored in [ACCUMULO-4490] to make Accumulo |
| easier to use. The number of scripts in the `bin` directory of the Accumulo release tarball |
| has been reduced from 20 scripts to the four scripts below: |
| |
| * `accumulo` - mostly left alone except for improved usage |
| * `accumulo-service` - manage Accumulo processes as services |
| * `accumulo-cluster` - manage Accumulo on cluster. Replaces `start-all.sh` and `stop-all.sh` |
| * `accumulo-util` - combines many utility scripts into one script. |
| |
| Read [this blog post][script-post] for more information on this change. |
| |
| ### New Bulk Import API |
| |
| A new bulk import API was added in 2.0 that has very different implementation. This new API supports the following new functionality. |
| |
| * Bulk import to an offline table. |
| * Load plans that specify where files go in a table which avoids opening the |
| files for inspection. |
| * Inspection of file on the client side. Inspection of all files is done |
| before the FATE operation starts. This results in less namenode operations |
| and fail-fast for bad files (no longer need a fail directory). |
| * A new improved algorithm to load files into tablets. This new algorithm |
| scans the metadata table and makes asynchronous load calls to all tablets. |
| This queues load operations on all tablets at around the same time. The |
| async RPC calls and beforehand inspection make the bulk load FATE operation |
| much shorter. |
| |
| The shell command for doing bulk load supports the old and new API. To use the |
| new API from the shell simply omit the failure directory argument. |
| For the API, use the [new fluent API][newImportDir]. |
| See {% ghi 436 %}, {% ghi 472 %}, and {% ghi 570 %}. |
| |
| ### Summaries |
| |
| [Summaries]({% durl development/summaries %}) enables continually generating |
| statistics about a table with user defined functions. This feature can inform |
| a user about what is in their table and be used by compaction strategies to |
| make decisions. For example, using this feature it would be possible to |
| compact all tablets where deletes are more than 25% of the data. Another |
| example use case is optimizing filtering compactions by enabling smart |
| selection of files with pertinent data. Examples of filtering compactions are |
| age off and removal of non-compliant data. |
| |
| ### Scan Executors |
| |
| [Scan executors]({% durl administration/scan-executors %}) support prioritizing |
| and dedicating scan resources. Each executor has a configurable number of |
| threads and an optional custom prioritizer. Tables can be configured in a |
| flexible way to dispatch scans to different executors. |
| |
| ### SPI package |
| |
| All new pluggable components introduced in 2.0 were placed under a new SPI |
| package. The SPI package is analyzed by [Apilyzer] at build time to ensure |
| plugins only use SPI and API types. This prevents plugins from using internal |
| Accumulo types that are inherently unstable over time. Plugins created before |
| 2.0 do use internal types and are less stable. The new pluggable interfaces |
| should be much more stable. |
| |
| ### Official Accumulo docker image was created |
| |
| An [official Accumulo docker images][accumulo-docker] was created in [ACCUMULO-4706] to make |
| it easier for users to run Accumulo in Docker. To support running in Docker, a few changes were |
| made to Accumulo: |
| |
| * The `--upload-accumulo-site` option was added to `accumulo init` to set properties in accumulo-site.xml |
| to Zookeeper during initialization. |
| * The `-o <key>=<value>` option was added to the `accumulo` command to override configuration that could |
| not be set in Zookeeper. |
| |
| ### Updated and improved Accumulo documentation |
| |
| Accumulo's documentation has been refactored with the following improvements: |
| |
| * Documentation source now lives in [accumulo-website repo][website-repo] so changes |
| are now immediately viewable. |
| * Improved navigation using a new sidebar |
| * Better linking to Javadocs, between documentation pages, and to configuration properties. |
| |
| Accumulo's documentation was also reviewed and changes were made to improve accuracy and remove |
| out of date documentation. |
| |
| ### Moved Accumulo Examples to its own repo |
| |
| The Accumulo examples were moved out the accumulo repo to the [accumulo-examples repo][accumulo-examples] |
| which has the following benefits: |
| |
| * The Accumulo examples are no longer released with Accumulo and can be continuously improved. |
| * The Accumulo API version used by the examples can be updated right before Accumulo is released |
| to test for any changes to the API that break semver. |
| |
| ### Simplified Accumulo logging configuration |
| |
| The log4j configuration of Accumulo services was improved in [ACCUMULO-4588] with the following changes: |
| |
| * Logging is now configured using standard log4j JVM property 'log4j.configuration' in accumulo-env.sh. |
| * Tarball ships with fewer log4j config files (3 rather than 6) which are all log4j properties files. |
| * Log4j XML can still be used by editing accumulo-env.sh |
| * Removed auditLog.xml and added audit log configuration to log4j-service properties files |
| * Accumulo conf/ directory no longer has an examples/ directory. Configuration files ship in conf/ and are |
| used by default. |
| * Accumulo monitor by default will bind to 0.0.0.0 but will advertise hostname looked up in Java for log |
| forwarding |
| * Switched to use full hostnames rather than short hostnames for logging |
| |
| ### Removed comparison of Value with byte[] in Value.equals() |
| |
| Replaced the ability to use `Value.equals(byte[])` to check if the contents of a |
| `Value` object was equal to a given byte array in [ACCUMULO-4726]. To perform |
| that check, you must now use the newly added `Value.contentEquals(byte[])` |
| method. This corrects the behavior of the `equals` method so that it conforms |
| to the API contract documented in the javadoc inherited from its superclass. |
| However, it will break any code that was relying on the undocumented and broken |
| behavior to compare `Value` objects with byte arrays. Such comparisons will now |
| always return `false` instead of `true`, even if the contents are equal. |
| |
| ### Removed default dynamic reloading classpath directory (lib/ext) |
| |
| In {% ghi 1179 %}, the default directory for dynamic class reloading (lib/ext) |
| was removed and the default value for the deprecated property |
| `general.dynamic.classpaths` was set to blank. This was done as part of a plan |
| to phase out class loading behaviors that are tightly coupled to Accumulo, in |
| favor of more user-pluggable class loading features that are easier to maintain |
| separately from Accumulo's core code. |
| |
| To continue to use this feature until it is removed, you must set this property |
| to a value. However, it is recommended to add your non-dynamic user class paths |
| to the `CLASSPATH` environment in `accumulo-env.sh` instead, or to leverage the |
| per-table context class paths feature, depending on your use case. For |
| reference, the previous default value was `$ACCUMULO_HOME/lib/ext/[^.].*.jar`. |
| |
| ### Other Notable Changes |
| |
| * [ACCUMULO-3652] - Replaced string concatenation in log statements with slf4j |
| where applicable. Removed tserver TLevel logging class. |
| * [ACCUMULO-4449] - Removed 'slave' terminology and replaced with 'tserver' in |
| most cases. The former 'slaves' config file is now named 'tservers'. Added checks to |
| scripts to fail if 'slaves' file is present. |
| * {% jira ACCUMULO-4808 %} - Can now create table with splits and offline. Specifying splits |
| at table creation time can be much faster than adding splits after creation. |
| * {% jira ACCUMULO-4463 %} - Caching is now pluggable. |
| * {% jira ACCUMULO-4177 %} - New built in cache implementation based on TinyLFU. |
| * {% jira ACCUMULO-4376 %} {% jira ACCUMULO-4746 %} - Mutation and Key Fluent APIs allow easy mixing of types. For example a family of type `String` and qualifier of type `byte[]` is much easier to write using this new API. |
| * {% jira ACCUMULO-4771 %} - The Accumulo monitor was completely rewritten. |
| * {% jira ACCUMULO-4732 %} - Specify iterators and locality groups at table creation time. |
| * {% jira ACCUMULO-4612 %} - Use percentages for memory related configuration. |
| * {% jira ACCUMULO-1787 %} - Two tier compaction strategy. Support compacting small files with snappy and large files with gzip. |
| * {% ghi 560 %} - Provide new Crypto interface & impl |
| * {% ghi 536 %} - Removed mock Accumulo. |
| * {% ghi 438 %} - Added support for ZStandard compression |
| * {% ghi 404 %} - Added basic Grafana dashboard example. |
| * {% ghi 1102 %} {% ghi 1100 %} {% ghi 1037 %} - Removed lock contention in different areas. These locks caused threads working unrelated task to impede each other. |
| * {% ghi 1033 %} - Optimized the default compaction strategy. In some cases the Accumulo would rewrite data O(N^2) times over repeated compactions. With this change the amount of rewriting is always logarithmic. |
| * Many performance improvements mentioned in the 1.9.X release notes are also available in 2.0. |
| * Scanners close server side sessions on close {% ghi 813 %} {% ghi 905 %} |
| |
| This release also includes bug fixes from 1.9.3 which was released after |
| 2.0.0-alpha-1 and 2.0.0-alpha-2. |
| |
| ## Upgrading |
| |
| View the [Upgrading Accumulo documentation][upgrade] for guidance. |
| |
| ## Useful Links |
| |
| * [All tickets on GitHub related to this release][milestone] |
| |
| |
| [milestone]: https://github.com/apache/accumulo/milestone/12 |
| [#634]: https://github.com/apache/accumulo/issues/634 |
| [ACCUMULO-3652]: https://issues.apache.org/jira/browse/ACCUMULO-3652 |
| [ACCUMULO-4449]: https://issues.apache.org/jira/browse/ACCUMULO-4449 |
| [ACCUMULO-4490]: https://issues.apache.org/jira/browse/ACCUMULO-4490 |
| [ACCUMULO-4588]: https://issues.apache.org/jira/browse/ACCUMULO-4588 |
| [ACCUMULO-4706]: https://issues.apache.org/jira/browse/ACCUMULO-4706 |
| [ACCUMULO-4726]: https://issues.apache.org/jira/browse/ACCUMULO-4726 |
| [ACCUMULO-4733]: https://issues.apache.org/jira/browse/ACCUMULO-4733 |
| [ACCUMULO-4737]: https://issues.apache.org/jira/browse/ACCUMULO-4737 |
| [ACCUMULO-4784]: https://issues.apache.org/jira/browse/ACCUMULO-4784 |
| [Apilyzer]: https://code.revelc.net/apilyzer-maven-plugin/ |
| [accumulo-docker]: https://github.com/apache/accumulo-docker |
| [accumulo-examples]: https://github.com/apache/accumulo-examples |
| [clients]: /docs/2.x/getting-started/clients |
| [contact]: /contact-us |
| [newImportDir]: {% jurl org.apache.accumulo.core.client.admin.TableOperations %}#importDirectory(java.lang.String) |
| [script-post]: /blog/2016/11/16/simpler-scripts-and-config.html |
| [semver]: https://semver.org/spec/v2.0.0.html |
| [upgrade]: /docs/2.x/administration/upgrading |
| [website-repo]: https://github.com/apache/accumulo-website |