Fix checksum calculation bug when the payload is a CompositeByteBuf with readerIndex > 0 (#4196)

* Add a test that reproduces a bug in checksum calculation

* Revert "Fixed unnecessary copy to heap (#2701)" changes to ByteBufList

This partially reverts commit 3c9c7102538909fd3764ea7314e7618d6d9458fd.

* Remove CompositeBuffer unwrapping in DigestManager

* Rename update -> internalUpdate so that unwrapping logic can be added to update

* Remove unnecessary unwrapping logic in Java9IntHash

* Add safe way to handle CompositeByteBuf
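
A minimal sketch of the failure mode named in the title, assuming a composite payload can be modelled as an array of plain byte[] components plus a reader index. This is an analogy only: it does not use Netty's CompositeByteBuf API, and the class and method names are hypothetical. The point it illustrates is that the reader index has to be consumed across component boundaries before any bytes reach the checksum.

```java
import java.util.zip.CRC32C;

// Hypothetical stand-in for a composite buffer: components plus a reader index.
final class CompositeChecksumSketch {

    // Checksums only the readable bytes, i.e. everything at or after readerIndex.
    static long checksum(byte[][] components, int readerIndex) {
        CRC32C crc = new CRC32C();
        int skip = readerIndex; // bytes still to skip before the readable region
        for (byte[] component : components) {
            if (skip >= component.length) {
                // The whole component lies before the reader index.
                skip -= component.length;
                continue;
            }
            // Feed only the readable tail of this component.
            crc.update(component, skip, component.length - skip);
            skip = 0;
        }
        return crc.getValue();
    }
}
```

Under these assumptions, checksumming the components "hello " and "world" with a reader index of 2 gives the same value as checksumming the contiguous bytes of "llo world".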

* Add license header

* Fix checkstyle

* Refactor ByteBuf visitor solution

* Fix checkstyle

* Reformat

* Refactor recursive visiting

* Revisit equals, hashCode and toString

* Refactor test case

* Add support for UnpooledHeapByteBuf.getBytes which passes an array

* Add support for visiting buffers backed by byte[] arrays

- getBytes calls setBytes with a byte[] argument for
  heap ByteBufs

* Move ByteBufVisitor to org.apache.bookkeeper.util package

* Update javadoc

* Refactor to use stateless visitor so that instance can be shared

* Improve test so that a single scenario can be used for debugging

* Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)

- Java9IntHash uses the private methods updateBytes and
  updateDirectByteBuffer of the java.util.zip.CRC32C class.
  The interface contract of those methods doesn't match how
  Java9IntHash uses them. This PR addresses that by introducing
  a separate initial value for the accumulated value, so that it
  matches the logic in the java.util.zip.CRC32C.reset method.
  There's also a separate method that finalizes the accumulated
  value into a checksum, matching the logic of the
  java.util.zip.CRC32C.getValue method (which uses the bitwise
  complement operator ~).

- At a quick glance, the previous logic might appear equivalent.
  However, it isn't: I have a failing test that this change fixes.
  I haven't yet added a Java9IntHash-level unit test case to show how
  it differs. It must be related to integer value overflow. For the CRC32C function,
  I believe it cannot be assumed that func(x) == ~func(~x) holds in all cases,
  which is the assumption the previous code was making.
  It probably holds for many inputs, but not all; it would break in overflow
  cases.

* Fix checkstyle

* Fix checkstyle

* Fix missing depth increment that is needed to prevent StackOverflowError

* Properly handle the depth increase and decrease
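
A hedged sketch of the kind of depth bookkeeping the two items above describe, not the actual ByteBufVisitor code. The Node type, the MAX_DEPTH limit and the choice to throw when the limit is reached are all assumptions made for illustration. The depth is incremented before descending and decremented in a finally block, so the counter is balanced even when visiting fails.

```java
import java.util.List;

// Hypothetical recursive visitor with a balanced depth counter.
final class DepthGuardSketch {
    static final int MAX_DEPTH = 10; // assumed limit, for illustration only

    // Hypothetical nested buffer: a leaf holds bytes, a composite holds children.
    static final class Node {
        final byte[] bytes;        // non-null for a leaf
        final List<Node> children; // non-null for a composite

        Node(byte[] bytes) { this.bytes = bytes; this.children = null; }
        Node(List<Node> children) { this.bytes = null; this.children = children; }
    }

    int depth;         // current nesting level
    long visitedBytes; // total leaf bytes seen so far

    void visit(Node node) {
        if (node.children == null) {
            visitedBytes += node.bytes.length;
            return;
        }
        if (depth >= MAX_DEPTH) {
            // Fail fast instead of recursing until the stack overflows.
            throw new IllegalStateException("composite nesting exceeds " + MAX_DEPTH);
        }
        depth++; // the increment that must not be forgotten
        try {
            for (Node child : node.children) {
                visit(child);
            }
        } finally {
            depth--; // always undo the increment, even on failure
        }
    }
}
```

Without the increment the guard never triggers. Without the decrement, sibling composites at the same level would be counted as ever deeper nesting.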

* Remove unnecessary condition

* Use more efficient way to read bytes to the target array

* Don't use ByteBufVisitor if it's not necessary

* Revert "Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)"

This reverts commit 272e962930a31cbc237c5e7c0bd0c93213520ba4.

* Fix issue in resume byte[] version that was added

- Input and output should be complemented. An explanation has been added to the
  resume ByteBuf method.
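
A minimal sketch of why resuming needs a complement on both sides, assuming the standard CRC-32C definition (reflected polynomial 0x82F63B78, initial register of all ones, final complement). It uses a slow bitwise implementation rather than the private java.util.zip.CRC32C methods that Java9IntHash calls. The class name and the rawUpdate helper are hypothetical.

```java
import java.util.zip.CRC32C;

// Illustrative bitwise CRC-32C; not the Java9IntHash implementation.
final class Crc32cResumeSketch {
    private static final int POLY = 0x82F63B78; // reflected Castagnoli polynomial

    // Raw register update: applies no initial or final complement.
    static int rawUpdate(int crc, byte[] data, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            crc ^= data[i] & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                // Shift right; XOR in the polynomial if the low bit was set.
                crc = (crc >>> 1) ^ (POLY & -(crc & 1));
            }
        }
        return crc;
    }

    // Resumes from a finalized checksum value (0 for "no data yet").
    static int resume(int current, byte[] data, int offset, int length) {
        current = ~current;                                 // back to raw register state
        current = rawUpdate(current, data, offset, length); // continue the calculation
        return ~current;                                    // finalize again
    }

    // Reference value from the public JDK API, as an unsigned 32-bit long.
    static long reference(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

With this sketch, calling resume(0, data, 0, n) and then resuming from the result over the remaining bytes yields the same value as a single pass, and `result & 0xFFFFFFFFL` matches reference(data).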

* Polish ByteBufVisitor

- reuse GetBytesCallbackByteBuf instance for handling the root ByteBuf instance

* Use extracted method

* Fix bug with array handling

* Polish ByteBufVisitor

* Optimize the buffer copying in the case where the backing array or memory address cannot be accessed

- read-only buffers will need to be copied before reading
  - use ByteBuf.copy for direct buffers with pooled allocator when the algorithm can accept
    a memory address buffer
- use the 64kB threadlocal byte[] buffer for copying all other inputs
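
A hedged sketch of the last fallback above, using java.nio.ByteBuffer instead of Netty's ByteBuf so that it stays self-contained. The class name, field names and chunking loop are assumptions, not the code from this change. A read-only ByteBuffer reports hasArray() == false, so its content is copied in chunks through a reusable 64kB thread-local byte[] before it is fed to the checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Hypothetical fallback path for buffers whose backing array is not accessible.
final class ThreadLocalCopySketch {
    private static final int COPY_BUFFER_SIZE = 64 * 1024;

    // One reusable scratch array per thread avoids allocating on every call.
    private static final ThreadLocal<byte[]> COPY_BUFFER =
            ThreadLocal.withInitial(() -> new byte[COPY_BUFFER_SIZE]);

    static long checksum(ByteBuffer input) {
        CRC32C crc = new CRC32C();
        ByteBuffer src = input.duplicate(); // leave the caller's position untouched
        if (src.hasArray()) {
            // Fast path: read straight from the backing array.
            crc.update(src.array(), src.arrayOffset() + src.position(), src.remaining());
            return crc.getValue();
        }
        // Slow path: copy chunk by chunk through the thread-local scratch array.
        byte[] chunk = COPY_BUFFER.get();
        while (src.hasRemaining()) {
            int n = Math.min(chunk.length, src.remaining());
            src.get(chunk, 0, n);
            crc.update(chunk, 0, n);
        }
        return crc.getValue();
    }
}
```

Working on a duplicate keeps the caller's position and limit unchanged, which matters when the same buffer is written out after its checksum has been computed.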

* Check if memory address is accepted

* Improve comments about complement (current = ~current) in resume

* Print thread dump when build is cancelled

* Filter empty buffers and arrays in ByteBufVisitor
17 files changed
tree: 29bb901873046dbf50022447bc750a1a4853d431
  1. .github/
  2. .test-infra/
  3. bin/
  4. bookkeeper-benchmark/
  5. bookkeeper-common/
  6. bookkeeper-common-allocator/
  7. bookkeeper-dist/
  8. bookkeeper-http/
  9. bookkeeper-proto/
  10. bookkeeper-server/
  11. bookkeeper-slogger/
  12. buildtools/
  13. circe-checksum/
  14. conf/
  15. cpu-affinity/
  16. deploy/
  17. dev/
  18. docker/
  19. metadata-drivers/
  20. microbenchmarks/
  21. native-io/
  22. shaded/
  23. site3/
  24. src/
  25. stats/
  26. stream/
  27. tests/
  28. testtools/
  29. tools/
  30. .asf.yaml
  31. .dlc.json
  32. .gitignore
  33. CONTRIBUTING.md
  34. LICENSE
  35. NOTICE
  36. pom.xml
  37. README.md
README.md

Apache BookKeeper

Apache BookKeeper is a scalable, fault-tolerant and low-latency storage service optimized for append-only workloads.

It is suitable for the following scenarios:

  • WAL (Write-Ahead-Logging), e.g. HDFS NameNode, Pravega.
  • Message Store, e.g. Apache Pulsar.
  • Offset/Cursor Store, e.g. Apache Pulsar.
  • Object/Blob Store, e.g. storing state machine snapshots.

Get Started

  • Check out the project website.
  • Concepts: Start with the basic concepts of Apache BookKeeper. This will help you to fully understand the other parts of the documentation.
  • Follow the Installation guide to set up BookKeeper.

Documentation

Please visit the Documentation from the project website for more information.

Get In Touch

Report a Bug

For filing bugs, suggesting improvements, or requesting new features, help us out by opening a GitHub issue.

Need Help?

Subscribe or mail the user@bookkeeper.apache.org list - Ask questions, find answers, and also help other users.

Subscribe or mail the dev@bookkeeper.apache.org list - Join development discussions, propose new ideas and connect with contributors.

Join us on Slack - This is the most immediate way to connect with Apache BookKeeper committers and contributors.

Contributing

We feel that a welcoming open community is important and welcome contributions.

Contributing Code

  1. See our installation guide to set up your local environment.

  2. Take a look at our open issues: GitHub Issues.

  3. Review our coding style and follow our pull requests to learn more about our conventions.

  4. Make your changes according to our contributing guide.