ARROW-11984: [C++][Gandiva] Implement SHA1 and SHA256 functions

Implement SHA1 and SHA256 functions in the Gandiva module.
The implementation uses OpenSSL to compute the SHA digests for numeric and string values.
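
As a rough illustration of what such hash functions compute, here is a minimal Python sketch using the standard `hashlib` module. It is not the Gandiva code. The helper names (`sha_hex`, `sha_hex_from_string`, `sha_hex_from_int64`) are hypothetical, and hashing a numeric value through its fixed-width little-endian bytes is an assumption made for illustration only.

```python
import hashlib
import struct


def sha_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Return the lowercase hex digest of `data` for "sha1" or "sha256"."""
    return hashlib.new(algorithm, data).hexdigest()


def sha_hex_from_string(text: str, algorithm: str = "sha256") -> str:
    """Hash the UTF-8 bytes of a string (hypothetical helper)."""
    return sha_hex(text.encode("utf-8"), algorithm)


def sha_hex_from_int64(value: int, algorithm: str = "sha256") -> str:
    """Hash a signed 64-bit integer.

    Assumption: the value is hashed as its 8-byte little-endian
    representation. The actual Gandiva byte layout may differ.
    """
    return sha_hex(struct.pack("<q", value), algorithm)


if __name__ == "__main__":
    print(sha_hex_from_string("abc", "sha1"))
    print(sha_hex_from_string("abc", "sha256"))
    print(sha_hex_from_int64(42, "sha256"))
```

A SHA1 hex digest is always 40 characters long and a SHA256 hex digest is always 64, which is the kind of size check several of the squashed commits below refer to.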

JIRA issue: https://issues.apache.org/jira/browse/ARROW-11984

Closes #9707 from jpedroantunes/feature/add-sha256-functions and squashes the following commits:

4315a5621 <Anthony Louis> Fix cmake file formatter
f30e8039c <Anthony Louis> Fix OpenSSL version message
343ef7497 <Anthony Louis> Fix static linking for OpenSSL in linux
245404f87 <Anthony Louis> Set option to use static libraries from OpenSSL
6c4eadf3a <Anthony Louis> Fix message for openssl
b7525f01a <Anthony Louis> Change the docker image for ubuntu
5cb4212b8 <Anthony Louis> Change least version for openssl
fb0b8b45c <Anthony Louis> Add openssl as whitelist library
313daf099 <Anthony Louis> Add openssl include dir
beaf1809c <Anthony Louis> Apply formatter changes
1aa40af7a <Anthony Louis> Fix test assert
e22b9782c <Anthony Louis> Add check for null values inside the test
9e87e1bab <Anthony Louis> Add gandiva export to functions
af64113f5 <Anthony Louis> Fix casting problems
792e9f70e <Anthony Louis> Fix errors in unit tests
8f95d4583 <Anthony Louis> Fix formatting problems
826d1cb08 <Anthony Louis> Add tests for hash utils
9f250045c <Anthony Louis> Add gandiva export in decimal functions
80c56923e <Anthony Louis> Change integration tests asserts types
db28d7fd6 <Anthony Louis> Add comments for in line methods
9a6ecd328 <Anthony Louis> Change class to first class functions
220735e0d <Anthony Louis> Fix buffer update way
25f45d90b <Anthony Louis> Commenting out unnecessary parameters
4afdc0fc7 <Anthony Louis> Add check for the hash response size during the processing
258c93d46 <Anthony Louis> Add check for hash size after processing
6238bfc3c <Anthony Louis> Change name from sha128 to sha1
aa94ffdc8 <Anthony Louis> Remove null-char at the string final pos
a7092d931 <Anthony Louis> Add hash values for null
899ccb6ca <Anthony Louis> Fix problems for wrong types in windows
4bf483cec <Anthony Louis> Fix problems for address sanitizer
a3200b993 <Anthony Louis> Fix linter errors
adadbdd84 <Anthony Louis> Add tests for the decimal hash
2339345be <Anthony Louis> Add functions for sha in decimal types
61122220b <Anthony Louis> Fix name for repeated macros
ea58b145d <João Pedro> Fix linter suggestions on c++ files
d044ab492 <Anthony Louis> Refactor names for the function registry hashes
a150c7056 <Anthony Louis> Refactor names for sha functions macros
fbc08d993 <Anthony Louis> Fix problems for tests with var len variables
a3605ec98 <Anthony Louis> Add a variable to limit buffer for response
154a1d419 <Anthony Louis> Add a flag that indicates that method can return an error
f7d1c17d6 <Anthony Louis> Refactor methods names to follow cpp patterns
7a0ca5c5f <Anthony Louis> Add file to find OpenSSL library
d55e42504 <Anthony Louis> Add licenses in created files
9abbd9196 <João Pedro> Fix missing ) in macro on function registry common for gandiva
821260bf6 <João Pedro> Add base sha128 method definition on global mapper for gandiva function stubs
56ca07d7d <João Pedro> Add base macros for sha128 definition
4dc3b68e9 <João Pedro> Add method definition for sha128 on gandiva stub file
e479d4a6f <João Pedro> Remove unused unit test for sha128 on gandiva
3f4d0fa3e <João Pedro> Add base tests for hash 128 gdv function
d05681a09 <João Pedro> Standardize pointer definition on hash utils
8e2cfee21 <Anthony Louis> Add integration tests for the functions over strings
d836d27c0 <Anthony Louis> Add macros for all types for sha256
28fb10582 <Anthony Louis> Add macros for all numeric and date types for sha256 functions
e28e175dc <Anthony Louis> Add the result length inside the hash functions
6c906e7f0 <Anthony Louis> Add support to function length inside the hash functions
be7ebee19 <Anthony Louis> Add integration tests for sha hash functions
5ddcf0569 <frank400> Implement test of sha1 and sha256 with an empty string
f3e1d13e9 <Anthony Louis> Implements a generic method to retrieve SHA hashes
af05c2117 <frank400> Implements the function hash_using_SHA128
3466df0b6 <frank400> Add tests to the function gdv_fn_sha128_from_numeric and gdv_fn_hash_sha128_from_string
9b617c421 <frank400> Add stubs for the method hash_using_SHA128
9355de12e <frank400> Add tests to the sha256 method
832db207b <Anthony Louis> Add tests for hash in strings
e01c9264a <Anthony Louis> Create tests for the gdv_fn_sha256_from_numeric function
513b9a05a <Anthony Louis> Implements and expose hash functions
1218ce6bc <Anthony Louis> Fix the parameter types for hash_utils
0de6580e6 <Anthony Louis> Add hash utils files inside the CmakeLists
87f0fe0e3 <Anthony Louis> Port initial methods from old repository

Lead-authored-by: Anthony Louis <anthony@simbioseventures.com>
Co-authored-by: João Pedro <joaop@simbioseventures.com>
Co-authored-by: frank400 <j.victorhuguenin2018@gmail.com>
Signed-off-by: Praveen <praveen@dremio.com>
15 files changed
tree: cda688326a61493593eff9231c8486d84a4f82f4
  1. .github/
  2. c_glib/
  3. ci/
  4. cpp/
  5. csharp/
  6. dev/
  7. docs/
  8. format/
  9. go/
  10. java/
  11. js/
  12. julia/
  13. matlab/
  14. python/
  15. r/
  16. ruby/
  17. rust/
  18. .asf.yaml
  19. .clang-format
  20. .clang-tidy
  21. .clang-tidy-ignore
  22. .dir-locals.el
  23. .dockerignore
  24. .env
  25. .gitattributes
  26. .gitignore
  27. .gitmodules
  28. .hadolint.yaml
  29. .pre-commit-config.yaml
  30. .readthedocs.yml
  31. .travis.yml
  32. appveyor.yml
  33. CHANGELOG.md
  34. cmake-format.py
  35. CODE_OF_CONDUCT.md
  36. CONTRIBUTING.md
  37. docker-compose.yml
  38. header
  39. LICENSE.txt
  40. NOTICE.txt
  41. README.md
  42. run-cmake-format.py
README.md

Apache Arrow

Powering In-Memory Analytics

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.

Major components of the project include:

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow libraries contain many distinct software components:

  • Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
  • Fast, language-agnostic metadata messaging layer (using Google's Flatbuffers library)
  • Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
  • IO interfaces to local and remote filesystems
  • Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
  • Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
  • Conversions to and from other in-memory data structures
  • Readers and writers for various widely-used file formats (such as Parquet, CSV)

Implementation status

The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git master.

How to Contribute

Please read our latest project contribution guide.

Getting involved

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: