commit | b43a6898f9a0be44d283a24ef52011530eb718dc | |
---|---|---|
author | João Pedro <joaop@simbioseventures.com> | Mon Apr 12 21:23:03 2021 +0530 |
committer | Praveen <praveen@dremio.com> | Mon Apr 12 21:23:03 2021 +0530 |
tree | bc43608a3099bc1b5fcfdd40c63a475844fb93d8 | |
parent | 632b2c1d2e0ca1ace51f52061c078f693338f3e9 | |
ARROW-12146: [C++][Gandiva] Implement CONVERT_FROM(expression, replacement char) function

Implement CONVERT_FROM(expression, ‘UTF8’, replacement char)

Converts the byte data in expression to UTF-8. Expression can be a literal string or a field name. Will replace any invalid UTF-8 characters with the replacement character.

Obs.: Actually we will only support a single byte replacement char

Closes #9844 from jpedroantunes/feature/convert-replace-utf8 and squashes the following commits:

bef6eafda <João Pedro> Add optimization for returning original string if no invalid chars were found
e7c6a71db <João Pedro> Refactor memcpy unnecessary for single byte
7aac875e7 <João Pedro> Add handler for cases with 0 char len on replace char
6544583f0 <João Pedro> Apply proper identation on types.h and string_ops.cc in gandiva
c66efb8e4 <João Pedro> Apply corrections and optimization on convert replace function
d815f854c <João Pedro> Add validation for MSBs on convert replace utf8 Gandiva function
8e44d413d <João Pedro> Add validation for defined char length greater than 1 on convert replace
a2ea61bee <João Pedro> Adapt convert_from method to support single char on replacement (defined with dremio team)
7d4cec02c <João Pedro> Adapt convert_from method to support multiple char on replacement
1a1734b9a <João Pedro> Change string ops test for defining int variables instead of size_t
b96dfc750 <João Pedro> Fix lint problems on string ops and test files
8f9a4bde0 <João Pedro> Fix identation on string files on gandiva module
875a1dd87 <João Pedro> Add integration test for convert replace utf8 method
536fd3a63 <João Pedro> Add definition of convert replace str method to types.h
c950c8a45 <João Pedro> Add base tests for convert replace invalid chars
2a5fe944e <João Pedro> Add base logic for convert replace utf8 invalid chars

Authored-by: João Pedro <joaop@simbioseventures.com>
Signed-off-by: Praveen <praveen@dremio.com>
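The behaviour described in the commit message (replace each invalid UTF-8 byte with a single-byte replacement character) can be illustrated with a minimal standalone sketch. This is not the Gandiva implementation: the names `Utf8SequenceLength` and `ReplaceInvalidUtf8` are hypothetical, and the validation is simplified to checking the most significant bits of the lead and continuation bytes, so overlong encodings and surrogate code points are not rejected here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Gandiva): expected length in bytes of a
// UTF-8 sequence, judged only from the high bits of its lead byte.
// Returns 0 when the byte cannot start a sequence.
static std::size_t Utf8SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // stray continuation or invalid lead
}

// Hypothetical sketch: copy `input`, substituting `replacement` for every
// byte that does not begin a structurally valid UTF-8 sequence.
// Assumption: only a single-byte replacement is supported, matching the
// restriction stated in the commit message.
std::string ReplaceInvalidUtf8(const std::string& input, char replacement) {
  std::string out;
  out.reserve(input.size());
  std::size_t i = 0;
  while (i < input.size()) {
    const unsigned char lead = static_cast<unsigned char>(input[i]);
    const std::size_t len = Utf8SequenceLength(lead);
    // The sequence must fit in the remaining input.
    bool valid = len != 0 && i + len <= input.size();
    // Every continuation byte must have the form 10xxxxxx.
    for (std::size_t k = 1; valid && k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(input[i + k]);
      valid = (c & 0xC0) == 0x80;
    }
    if (valid) {
      out.append(input, i, len);  // keep the whole sequence unchanged
      i += len;
    } else {
      out.push_back(replacement);  // replace only the offending byte
      i += 1;
    }
  }
  return out;
}
```

Under these assumptions, a call such as `ReplaceInvalidUtf8(raw_bytes, '?')` leaves well-formed input unchanged and substitutes `?` for each offending byte. The sketch always builds a new string, whereas the squashed commits mention an optimization that returns the original string when no invalid characters are found.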
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
The reference Arrow libraries contain many distinct software components:
The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git master.
Please read our latest project contribution guide.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: