tree 74fae4f9f5c4a8fc87a3ec44763478a55d273d50
parent aafec07e086463fc7ed72c704e9f7e367460618a
author wiedld <wiedld@users.noreply.github.com> 1735736591 -0500
committer GitHub <noreply@github.com> 1735736591 -0500
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsFcBAABCAAQBQJndT0PCRC1aQ7uu5UhlAAAXNUQAFpGKl+eTUAps4WU2AnBJwka
 AwNxCLX0Rda35VqiRB0JDRO77mnecodyoVQwtHFs1Ib9Ryvm9teKcPWA2h9tZbi6
 sLyLOxLlWF/sfks4BNM3Vux+u00M02QSBEl0h1DWjuyAcaHBAF99I5UteGRd8gp8
 AtBluESqd/q/7Fn4DmeJAnrH882A2TagscRJLMjte/tjofzzP7AmazX+TPtH6Lu/
 gifKvwUUXQr70LlJggrwZuLXXjad8oiabpIem7NcEzBuUd4/hmZtYdbVWfoL9Hjy
 gxlisNQPz8xtecScA8Yhn6uz+XF5wSUiYB1P+38A8n5gqNfcnfDo06wZXe8JCsym
 bsOtA4gj3I/x3y5ThPMcqSilmLe8+WHP+JjH8ibKNF21URXByqnFz1hZjiUO1qYr
 lHjwKLtItpPsaDd6BWlJz45IdB7ChzvcUqN/Z4A6O0zDJmHaVrFDpsLRXV5tydEZ
 CCTsKWxSL4zZO9aqAOKQxwTFtJt/P6zr1laTDWzy1cRMCX/aKQ3HOUbMhahVKOgF
 9WS01lTt8rMHZVM029Kz1TbNS5XX+aVS8oTEXDkOfdEx5pppo1euJQsvlGkBHybm
 2L+AZMVgKUUzAWYCrtl371kWlNz0FnJ+tB4ZSGQxguJf1MX4eOZZq9HHiaTnOWnl
 WZIH1GF4aJleg6hvcyqK
 =N21F
 -----END PGP SIGNATURE-----
 

Supporting writing schema metadata when writing Parquet in parallel (#13866)

* refactor: make ParquetSink tests a bit more readable

* chore(11770): add new ParquetOptions.skip_arrow_metadata

* test(11770): demonstrate that the single threaded ParquetSink is already writing the arrow schema in the kv_meta, and allow disablement

* refactor(11770): replace  with new method, since the kv_metadata is inherent to TableParquetOptions and therefore the API should make it explicit whether or not the arrow schema is included

* fix(11770): fix parallel ParquetSink to encode arrow schema into the file metadata, based on the ParquetOptions

* refactor(11770): provide deprecation warning for TryFrom

* test(11770): update tests with new default to include arrow schema

* refactor: including partitioning of arrow schema inserted into kv_metadata

* test: update tests for new config prop, as well as the new file partition offsets based upon larger metadata

* chore: avoid cloning in tests, and update code docs

* refactor: return to the WriterPropertiesBuilder::TryFrom<TableParquetOptions>, and separately add the arrow_schema to the kv_metadata on the TableParquetOptions

* refactor: require the arrow_schema key to be present in the kv_metadata, if it is required by the configuration

* chore: update configs.md

* test: update tests to handle the (default) required arrow schema in the kv_metadata

* chore: add reference to arrow-rs upstream PR
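
The rule described in the bullets above (the arrow schema must be present in the kv_metadata unless `skip_arrow_metadata` is set) can be sketched roughly as follows. This is a minimal, standalone illustration and not the DataFusion code: `SketchOptions`, `add_arrow_schema`, and `validate` are hypothetical names, and the encoded schema is passed as an opaque string. Only the `ARROW:schema` key name is taken from arrow-rs.

```rust
use std::collections::HashMap;

/// Key under which arrow-rs stores the encoded Arrow schema in Parquet
/// key-value metadata.
const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";

/// Hypothetical stand-in for the relevant parts of the table-level Parquet
/// options (an assumption for illustration, not the real struct).
#[derive(Default)]
struct SketchOptions {
    /// When true, the writer does not require or embed the Arrow schema.
    skip_arrow_metadata: bool,
    /// User-provided key-value metadata written to the file footer.
    kv_metadata: HashMap<String, Option<String>>,
}

impl SketchOptions {
    /// Store an already-encoded schema under the Arrow schema key.
    fn add_arrow_schema(&mut self, encoded_schema: &str) {
        self.kv_metadata.insert(
            ARROW_SCHEMA_META_KEY.to_string(),
            Some(encoded_schema.to_string()),
        );
    }

    /// Fail if the configuration requires the schema but it is absent.
    fn validate(&self) -> Result<(), String> {
        let has_schema = self.kv_metadata.contains_key(ARROW_SCHEMA_META_KEY);
        if !self.skip_arrow_metadata && !has_schema {
            return Err(format!(
                "missing required key {ARROW_SCHEMA_META_KEY} in kv_metadata"
            ));
        }
        Ok(())
    }
}
```

With the default `skip_arrow_metadata = false`, `validate` rejects options that lack the schema key; calling `add_arrow_schema` first, or setting `skip_arrow_metadata = true`, makes it pass.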