Apache Accumulo 1.9.3 contains bug fixes for Write Ahead Logs and compaction. Users of 1.9.2 are encouraged to upgrade.
This release fixes Write Ahead Log (WAL) issues that slow or prevent recovery and in some cases lead to data loss. The fixes reduce the number of WALs referenced by a tserver, improve error handling, and improve cleanup.
Eliminates a race condition that could result in data loss during recovery. If the GC deletes unreferenced WALs from ZK while the master is reading recovery WALs from ZK, the master may skip WALs it should not, resulting in data loss. Fixed in #866.
Opening a new WAL in DFS may fail, but the WAL may still be advertised in ZK. This could result in a missing WAL during recovery, preventing tablets from loading. There is no data loss in this case, just WAL references that should not exist. Reported in #949 and fixed in #1005 and #1057.
tserver failures could result in many empty WALs that unnecessarily slow recovery. This was fixed in #823 #845.
Some write patterns caused tservers to unnecessarily reference a lot of WALs, which could slow any recovery. In #854 #860 the max WALs referenced was limited regardless of the write pattern, avoiding long recovery times.
During tablet recovery, filter out logs that do not define the tablet. #881
If a tserver fails sorting, a marker file is written to the recovery directory. This marker prevents any subsequent recovery attempts from succeeding. Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in #961 #1048.
Improve performance of serializing mutations to a WAL by avoiding frequent synchronization. #669
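The log filtering during tablet recovery mentioned above (#881) can be pictured with a small sketch. This is not Accumulo's recovery code: the class, the method name, and the map from WAL name to the set of tablets it defines are all hypothetical, chosen only to illustrate skipping logs that never define the tablet being recovered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WalFilterSketch {

    /**
     * Returns the names of the logs that define the given tablet.
     * Hypothetical model: each WAL name maps to the set of tablet
     * identifiers for which the log contains a definition.
     */
    static List<String> logsDefining(String tablet, Map<String, Set<String>> definedTabletsByLog) {
        List<String> relevant = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : definedTabletsByLog.entrySet()) {
            // A log that never defines the tablet cannot hold mutations for it,
            // so recovery of this tablet does not need to read it.
            if (entry.getValue().contains(tablet)) {
                relevant.add(entry.getKey());
            }
        }
        return relevant;
    }
}
```

Under this model, recovery of a tablet only touches the logs returned by `logsDefining`, however many WALs the failed tserver referenced in total.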
Stop locking during compaction. Compactions acquired the tablet lock between each key value. This created unnecessary contention with other operations like scans and bulk imports. The synchronization was removed in #1031 and #1032.
Only re-queue compaction when there is activity. #759
If the 7 digit base 36 number used to name files attempted to go to 8 digits, then compactions would fail. This was fixed in #562.
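The 7 digit boundary behind the file naming fix (#562) is plain arithmetic: seven base 36 digits can represent values up to 36^7 - 1 = 78,364,164,095. The sketch below shows only that rollover. The class and the `encode` helper are hypothetical and are not Accumulo's file naming code.

```java
public class Base36FileNames {

    /** Largest value that fits in seven base 36 digits: 36^7 - 1. */
    static final long MAX_SEVEN_DIGITS = 78364164095L;

    /**
     * Hypothetical helper: formats a counter as a base 36 string,
     * left-padded with zeros to at least seven characters.
     */
    static String encode(long counter) {
        String digits = Long.toString(counter, Character.MAX_RADIX); // radix 36
        StringBuilder padded = new StringBuilder();
        for (int i = digits.length(); i < 7; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(MAX_SEVEN_DIGITS));     // zzzzzzz
        System.out.println(encode(MAX_SEVEN_DIGITS + 1)); // 10000000
    }
}
```

Any code that assumes the name is exactly seven characters long breaks at the second value, which has eight.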
Added master metrics to provide a snapshot of current FATE operations. The metrics added:
The number of child operations provides a light-weight surrogate for FATE transaction progression between snapshots. The metrics are controlled with the following properties:
When enabled, the metrics are published to JMX and can optionally be configured using standard hadoop metrics2 configuration files.
Versions of libstdc++ 8.2 and higher triggered errors within the native map code. This release fixes issues #767, #769, #1064, and #1070.
The split code assumed that if a tablet had files, it had data in those files. There are some edge cases where this is not true. The split code was updated to handle this in #998 and #999.
Accumulo has a configurable limit on the max number of files open in a tserver for all scans. When too many files are open, scans must wait. In #978 and #981 scans that wait too long for files now log a message.
The Accumulo client code that checks whether tables exist had a race condition. The race was fixed in #768 and #973.
Mini Accumulo made some assumptions about classloaders that were no longer true in Java 11. This caused Mini to fail in Java 11. In #924 Mini was updated to work with Java 11, while still working with Java 7 and 8.
If snappy was configured and the snappy libraries were not available then minor compactions could hang forever. In #920 and #925 this was fixed and minor compactions will proceed when a different compression is configured.
Improperly configured locality groups could cause a tablet to become inoperative. This was fixed in #819 and #840.
There was a race condition in bulk import that could result in files being imported after a bulk import transaction had completed. In the worst case these files were already compacted and garbage collected. This would cause a tablet to have a reference to a file that did not exist. No data would have been lost, but it would cause scans to fail. The race was fixed in #800 and #837.
This addresses an issue with the HostRegexTableLoadBalancer when the default pool is empty. In that case the load balancer did not assign the tablets at all. It now selects a random pool to assign the tablets to. This behavior is on by default in the HostRegexTableLoadBalancer but can be disabled via the HostRegexTableLoadBalancer configuration setting table.custom.balancer.host.regex.HostTableLoadBalancer.ALL. Fixed in #691 and backported to 1.9 in #710.
The packaged binary tarball contains an updated libthrift, version 0.9.3-1, to address a Thrift CVE. Issue #1029.
View the [Upgrading Accumulo documentation][upgrade] for guidance.
[examples]: {{ site.baseurl }}/1.9/examples/
[fluo]: https://fluo.apache.org
[javadoc]: {{ site.baseurl }}/1.9/apidocs/
[prev_notes]: {{ site.baseurl }}/release/accumulo-1.9.2/
[upgrade]: {{ site.baseurl }}/docs/2.x/administration/upgrading
[user_manual]: {{ site.baseurl }}/1.9/accumulo_user_manual.html
[vote-emails]: https://lists.apache.org/thread.html/62a490ee3005ef2ec1f3865f6a9539efc082abc49c90892b49005eed@%3Cdev.accumulo.apache.org%3E