)]}'
{
  "log": [
    {
      "commit": "be664614ab684ec34809361f7e868cc8c3919552",
      "tree": "301b090f3ee10eb6e52f2f90b706c15db0dd4fa0",
      "parents": [
        "11a58ac3ec7086960376ed1b5da7dc75cad9da0d"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Fri Jun 12 22:23:09 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jun 13 12:23:09 2026 +1000"
      },
      "message": "Support writing REE arrays directly to Parquet (#10064)\n\n# Which issue does this PR close?\nThis PR works towards an initial solution closing #8016\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #8016.\n\n# Rationale for this change\nCurrently `arrow_writer` does not support writing Run End Encoded\ncolumns out to Parquet. This PR works towards solving this by first\nexpanding the REE array to its value type and then writing it out to Parquet.\nOnce it is possible to write REE to Parquet, we can work on optimizing it\nby keeping the compact representation intact.\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n`arrow_writer()` now supports writing Run End Encoded (REE) arrays to\nParquet by hydrating them to their underlying value type before\nencoding. This is an initial, correctness-first implementation. 
A\nfollow-up can/should optimize to preserve the compacted structure.\n\n**parquet/src/arrow/arrow_writer/mod.rs**: generate a value-type\narrow-column writer \u0026 test\n**parquet/src/arrow/arrow_writer/levels.rs**: core writer logic updated\nto detect REE columns and expand them to their flat value type before\nthe existing write path.\n**parquet/src/arrow/schema/mod.rs**: schema conversion updated to map\nRunEndEncodedType to an appropriate Parquet physical type.\n**parquet/benches/arrow_writer.rs**: REE write benchmarks added with low\nand high null density scenarios, now unblocked by the implementation.\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\nYes\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\nUsers will be able to write their REE columns out to Parquet using\n`arrow_writer`\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "11a58ac3ec7086960376ed1b5da7dc75cad9da0d",
      "tree": "7902ed7b44c767caf5a53ccca3a087d31e8a130a",
      "parents": [
        "c4a831a1c81be3c20dff510d73954fa6712d90d5"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sat Jun 13 03:19:10 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jun 13 11:19:10 2026 +1000"
      },
      "message": "chore: update pyo3 dependency to 0.29 (#10134)\n\n# Which issue does this PR close?\n\nNone, just a dependency update.\n\n# Rationale for this change\n\npyo3 has security vulnerability:\nhttps://rustsec.org/advisories/RUSTSEC-2026-0176.html\n\nThis PR updates to 0.29 to resolve this vulnerability.\n\n# What changes are included in this PR?\n\nUpdate all crates that use the pyo3 dependency to 0.29\n\n# Are these changes tested?\n\nUpdated and run against existing integration test suite.\n\n# Are there any user-facing changes?\n\nNo\n\n---------\n\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c4a831a1c81be3c20dff510d73954fa6712d90d5",
      "tree": "a9cb42cffa0941d9293a8b4b25ba44ca1f28c12d",
      "parents": [
        "1ba5d486a45b234d5bdc5e91850c241d5a7bccec"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Fri Jun 12 05:49:19 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 12 14:49:19 2026 +1000"
      },
      "message": "ci: Split miri tests into 4 parallel shards (#10067)\n\n# Which issue does this PR close?\n\n- Closes #NNN.\n\n# Rationale for this change\n\nMiri currently takes just under an hour to run, with most of it being\nthe actual tests.\n\n# What changes are included in this PR?\n\nThis PR modifies the script that runs miri to optionally use nextest\u0027s\n[partitioning](https://nexte.st/docs/ci-features/partitioning/) feature,\nand makes use of it in CI with 4 partitions. This should reduce the\noverall miri runtime to just over 15 minutes with a minimal increase in\nCI resource usage.\n\nThis is also scalable if the number of tests keeps increasing, changing\nthe number of partitions is trivial, picking 4 here is an arbitrary\nchoice.\n\n# Are these changes tested?\n\nTested the script locally.\n\n# Are there any user-facing changes?\n\nNo"
    },
    {
      "commit": "1ba5d486a45b234d5bdc5e91850c241d5a7bccec",
      "tree": "1107166d961ab4dd4beacc2de1649aacf5b4822b",
      "parents": [
        "7567b7aacf3c4f7ea21d78dda2054f5312ef80aa"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Thu Jun 11 17:17:42 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 17:17:42 2026 -0400"
      },
      "message": "[Variant] Preserve `UUID` extension type metadata for Parquet writer (#10015)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #8420.\n\n# Rationale for this change\n\nShredding into `FixedSizeBinary(16)` means we\u0027re shredding into the `UUID`\nParquet logical type. `shred_variant` currently doesn\u0027t preserve\nextension type metadata for the typed value field.\n\nUUID is the only valid `Variant` shredding type that requires an arrow\nextension type.\nhttps://github.com/apache/parquet-format/blob/master/VariantShredding.md\n\nEarlier in\n[#](https://github.com/apache/arrow-rs/issues/8665#issuecomment-3423325006)\n@scovich mentioned:\n\n\u003e Yeah, as long as `shred_variant` only takes a `DataType` instead of a\n`Field`, we are forced to assume 16-byte fixed binary is UUID. If it\naccepted a `Field`, we should additionally require the UUID extension\ntype. Otherwise, we potentially run into problems because Decimal128 can\n_also_ use 16-byte fixed binary!\n\nThis is an argument proposing to use `Field` instead of `DataType` for\nthe `as_type` parameter in `shred_variant`. This should not be an issue\nbecause arrow has a `Decimal128Type` to represent the `Decimal128` logical\nParquet type. This way there\u0027s no ambiguity in using the\n`FixedSizeBinary(16)` arrow type to represent `UUID`. Switching\n`as_type` to `Field` is unnecessary.\n\n\u003c!--\nWhy are you proposing this change? 
If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n- `VariantArray::from_parts/ShreddedVariantFieldArray::from_parts` now\nadd `UUID` extension type metadata to the typed_value `Field` if\n`DataType` is `FixedSizeBinary(16)`\n- Uncommented `UUID` extension part metadata validation in a unit test.\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\n\n- Yes, unit test.\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\n\n- Shredded `UUID` typed value fields now preserve `UUID` extension type\nmetadata.\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\n---------\n\nCo-authored-by: Ryan Johnson \u003cscovich@users.noreply.github.com\u003e"
    },
    {
      "commit": "7567b7aacf3c4f7ea21d78dda2054f5312ef80aa",
      "tree": "f5ea021e0f566a9daf30c9468c7b6dfa996dead6",
      "parents": [
        "a6596e4097b4104ee604f20fab67260d2b2e556d"
      ],
      "author": {
        "name": "mwish",
        "email": "maplewish117@gmail.com",
        "time": "Fri Jun 12 05:12:28 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 17:12:28 2026 -0400"
      },
      "message": "optimize(concat): concat map implementation (#10048)\n\n# Which issue does this PR close?\n\n- Closes #10047 .\n\n# Rationale for this change\n\nImplement concat for map\n\n# What changes are included in this PR?\n\nImplement concat for map\n\n# Are these changes tested?\n\nYes\n\n# Are there any user-facing changes?\n\nNo\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "a6596e4097b4104ee604f20fab67260d2b2e556d",
      "tree": "8dcbcf3c1772acb3c2283fd97118acb69ea1394d",
      "parents": [
        "549d77cb229eebc78153b9e254021e994f88a209"
      ],
      "author": {
        "name": "Raz Luvaton",
        "email": "16746759+rluvaton@users.noreply.github.com",
        "time": "Fri Jun 12 00:11:31 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 17:11:31 2026 -0400"
      },
      "message": "feat: support `MapArray` in lengths kernel (#10121)\n\n# Which issue does this PR close?\n\nN/A\n\n# Rationale for this change\n\n`MapArray` is not very different from `ListArray`, which is already supported in the\n`lengths` kernel\n\n# What changes are included in this PR?\n\nAdded `MapArray` support and tests\n\n# Are these changes tested?\nYes\n\n# Are there any user-facing changes?\n\n`lengths` now supports `MapArray`"
    },
    {
      "commit": "549d77cb229eebc78153b9e254021e994f88a209",
      "tree": "63efa3de74c43e8dcb9efddbab05a2083c33b964",
      "parents": [
        "8d2333cd0ac40a0aa6d9986feb3b2386647e5289"
      ],
      "author": {
        "name": "Raz Luvaton",
        "email": "16746759+rluvaton@users.noreply.github.com",
        "time": "Fri Jun 12 00:11:11 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 17:11:11 2026 -0400"
      },
      "message": "feat(arrow_array): add helper function to create MapArray from `Vec\u003cOption\u003cVec\u003c(Key, Option\u003cValue\u003e)\u003e\u003e\u003e` for tests (#10123)\n\n# Which issue does this PR close?\n\nN/A\n\n# Rationale for this change\n\nWriting tests that use `MapArray` currently requires a very\nverbose way of building the `MapArray` with the specific values you want.\n\nAdding this helper allows arrow tests and user tests to be\ncleaner.\n\n# What changes are included in this PR?\n\nAdded the function and updated some of the tests in the repo that use the\n`MapBuilder` (excluding those that test the builder itself) to use the\nnew method, showcasing how much cleaner it looks\n\n# Are these changes tested?\nYes\n\n# Are there any user-facing changes?\n\nA new function"
    },
    {
      "commit": "8d2333cd0ac40a0aa6d9986feb3b2386647e5289",
      "tree": "056ee8e4f4cea7456525dd1d76e6197719ff93aa",
      "parents": [
        "826b808b2792235c27a22ba917a7abe93cfbb221"
      ],
      "author": {
        "name": "Raz Luvaton",
        "email": "16746759+rluvaton@users.noreply.github.com",
        "time": "Fri Jun 12 00:09:43 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 17:09:43 2026 -0400"
      },
      "message": "chore: update `Bytes` visibility to correctly reflect the actual visibility (#10115)\n\n# Which issue does this PR close?\n\nN/A\n\n# Rationale for this change\n\nAlthough `Bytes` and some of its functions are marked as `pub`, the type is\nnever exposed outside the crate.\nThis change makes the declared visibility match, so the code is less confusing to read\n\n# What changes are included in this PR?\n\nChanged `pub` to `pub(crate)` in the `Bytes` impl\n\n# Are these changes tested?\n\nExisting tests\n\n# Are there any user-facing changes?\n\nNo, since it was never exposed anyway"
    },
    {
      "commit": "826b808b2792235c27a22ba917a7abe93cfbb221",
      "tree": "2e82edafafe94aaca0af6a3e45cbf0bdf53cd493",
      "parents": [
        "cecbc72edbb4cdc93ef6e78d38493afb91ada02e"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Thu Jun 11 12:38:15 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 09:38:15 2026 -0700"
      },
      "message": "Add `StructArray::field_` APIs symmetric to `StructArray::column_` ones (#10110)\n\n# Which issue does this PR close?\n\n- Closes #10092.\n\n# Rationale for this change\n\ncheck issue\n\n# What changes are included in this PR?\n\n- Add two `field_` APIs symmetric to `column_` ones.\n- Reuse `Fields::find` in `column_by_name` to avoid a `Vec` alloc.\n- Fix doc pointing to an old Jira issue. Now points to #9205\n- `MapArray::entries_fields` avoid a `Vec` alloc.\n\n# Are these changes tested?\n\n- Yes, unit tests\n\n# Are there any user-facing changes?\n\nNew `StructArray` APIs."
    },
    {
      "commit": "cecbc72edbb4cdc93ef6e78d38493afb91ada02e",
      "tree": "ce3ccf2baf7a1761dba9a63d0a9d4b389badd2b7",
      "parents": [
        "301eb26bb92362e531bd6c39980292ebcae8e8aa"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Thu Jun 11 10:48:11 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 11 10:48:11 2026 -0400"
      },
      "message": "removed clippy ignore statment (#10111)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\nResolves\nhttps://github.com/apache/arrow-rs/pull/10044#discussion_r3381759987\nfrom #10044\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\nCode in this file is hard to navigate and it is unclear what is happening.\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\nThis PR introduces `IpcMetadataBuilder`, a struct that groups the nodes\nand buffers vecs previously passed separately into `write_array_data()`,\nand removes the redundant num_rows/null_count parameters by deriving\nthem from `array_data` directly. Together these reduce\n`write_array_data()` from 10 arguments to 7, eliminating the\n`#[allow(clippy::too_many_arguments)]` suppression. Doc comments are\nadded to clarify the two-channel output model between\n`IpcMetadataBuilder` (flatbuffer header metadata) and `IpcBodySink` (raw\nArrow data bytes).\n# Are these changes tested?\nyes\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\nno\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "301eb26bb92362e531bd6c39980292ebcae8e8aa",
      "tree": "dcdd580f6b96be9089399b2e76d954dc298f7fc7",
      "parents": [
        "e106c1534095cbfe1201a770c8ccca5252fdbb88"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Wed Jun 10 14:54:51 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 10 14:54:51 2026 -0400"
      },
      "message": "Reduce copies in Arrow IPC writer (#10044)\n\n# Which issue does this PR close?\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #10029.\n[A document that provides a bit of\ncontext](https://github.com/user-attachments/files/28477762/Arrow.flight.speed.up.2.pdf)\n\n\n# Rationale for this change\nCompression is the most compute and memory intensive part of the\narrow-ipc encoding pipeline. It runs per buffer, not per record batch.\nFor a Flight stream of 10 batches with 5 primitive arrays each, that is\n100 compression calls minimum, [more for string and struct\narrays](https://arrow.apache.org/docs/format/Columnar.html#compression).\nEach of those calls produced an owned compressed Vec that was then\ncopied a second time into a flat arrow_data accumulator before being\nwritten to the output. For the uncompressed path the situation was the\nsame: Arc-backed buffer slices that required no compression were still\ncopied into that accumulator unnecessarily.\n\nSeparately, the original **write_message()** function flushed after\nevery dictionary and every record batch, causing repeated small OS write\ncalls per batch. ( **for non vector backed writer implementations** )\nThe goal was to eliminate both problems: stop copying buffers that do\nnot need to be copied, and stop flushing on every message.\n\n\u003c!--\nWhy are you proposing this change? 
If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n- Introduced EncodedBuffer, an enum that wraps either a raw Arc-backed\nBuffer for the uncompressed path or an owned Vec for the compressed\npath, so both can be held in a uniform collection without an extra copy\ninto a flat accumulator\n- Changed write_array_data to push EncodedBuffer segments instead of\ncopying bytes into arrow_data\n- FileWriter and StreamWriter both now call **write_batch_direct()**,\neliminating the flush-per-message behavior and the intermediate copy on\nthe hot path\n\n# Are these changes tested?\nThese changes are intended to be completely seamless. I didn\u0027t write new\nunit tests for the code as nothing externally changed. All tests still\npass\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n## benchmarks\n[**main** -\u003e `cargo bench --bench ipc_writer -- \"StreamWriter/write_10$\"\n--sample-size 100`]\n[**my branch** -\u003e `cargo bench --bench ipc_writer --\n\"StreamWriter/write_10$\" --sample-size 100` ]\n\u003cimg width\u003d\"1832\" height\u003d\"982\" alt\u003d\"Image 6-1-26 at 3 19 PM\"\nsrc\u003d\"https://github.com/user-attachments/assets/8e6253a4-8a53-4d03-bdab-d0321edc2561\"\n/\u003e\n\n\n[**main** -\u003e `cargo bench --bench ipc_writer -- --sample-size 1000`]\n[**my branch** -\u003e `cargo bench --bench ipc_writer -- --sample-size\n1000`]\n\u003cimg width\u003d\"1944\" height\u003d\"1000\" alt\u003d\"Image 6-1-26 at 3 20 PM\"\nsrc\u003d\"https://github.com/user-attachments/assets/dc8015e8-ed60-487c-aa66-06f5d35499fe\"\n/\u003e\n\n\n# Are there any user-facing changes?\nno\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "e106c1534095cbfe1201a770c8ccca5252fdbb88",
      "tree": "ce42066f751c4d04217f69bd21fe4744896308f6",
      "parents": [
        "dda2d2de83055dba7e2638ee3237d326279e8948"
      ],
      "author": {
        "name": "ClSlaid",
        "email": "cailue@apache.org",
        "time": "Thu Jun 11 02:35:33 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 10 14:35:33 2026 -0400"
      },
      "message": "arrow-select: Optimize `BatchCoalescer::push_batch_with_filter` for low selectivity filters and inlined Utf8View/BinaryView (#9755)\n\n## Summary\n- fuse the sparse inline `BinaryView` filter and coalescing paths so\nprimitive columns and inline views can be appended directly without\nmaterialising an intermediate filtered `RecordBatch`\n- reuse optimised filter indices and null-mask handling for coalescing,\nwhile preserving the existing fallback paths for dense and non-inline\n`BinaryView` inputs\n- add focused tests and benchmarks for single-column and mixed\n`BinaryView` filter cases related to `#9143`\n\n## Verification\n- `cargo test -p arrow-select coalesce --lib`\n- `cargo clippy -p arrow-select --lib --tests -- -D warnings`\n- `cargo clippy -p arrow --bench coalesce_kernels --features test_utils\n-- -D warnings`\n- `cargo bench -p arrow --bench coalesce_kernels --features test_utils\n-- --noplot single_binaryview`\n- `cargo bench -p arrow --bench coalesce_kernels --features test_utils\n-- --noplot mixed_binaryview`\n\n## Benchmark Results\nMeasured against a clean `origin/main` worktree with the same\n`BinaryView` benchmark additions. 
The figures below compare\nrepresentative median times from the baseline worktree and this branch.\n\n### Mixed primitive + BinaryView\n- `mixed_binaryview (max_string_len\u003d8), 8192, nulls: 0, selectivity:\n0.001`: `23.16 ms` -\u003e `8.51 ms`\n- `mixed_binaryview (max_string_len\u003d8), 8192, nulls: 0, selectivity:\n0.01`: `2.37 ms` -\u003e `1.31 ms`\n- `mixed_binaryview (max_string_len\u003d8), 8192, nulls: 0.1, selectivity:\n0.001`: `31.70 ms` -\u003e `14.33 ms`\n- `mixed_binaryview (max_string_len\u003d8), 8192, nulls: 0.1, selectivity:\n0.01`: `3.92 ms` -\u003e `2.44 ms`\n\n### Single BinaryView\n- `single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `4.86 ms` -\u003e\n`4.90 ms` (roughly flat, slightly slower)\n- `single_binaryview (max_string_len\u003d8), 8192, nulls: 0, selectivity:\n0.001`: `34.72 ms` -\u003e `19.33 ms`\n- `single_binaryview (max_string_len\u003d8), 8192, nulls: 0, selectivity:\n0.01`: `3.46 ms` -\u003e `2.03 ms`\n- `single_binaryview (max_string_len\u003d8), 8192, nulls: 0.1, selectivity:\n0.01`: `5.93 ms` -\u003e `3.97 ms`\n- `single_binaryview (max_string_len\u003d8), 8192, nulls: 0, selectivity:\n0.8`: `597 µs` -\u003e `619 µs` (regression)\n- `single_binaryview (max_string_len\u003d8), 8192, nulls: 0.1, selectivity:\n0.8`: `1.78 ms` -\u003e `1.79 ms` (roughly flat, slightly slower)\n\nIn short, this change substantially improves the mixed primitive +\ninline `BinaryView` path that motivated `#9143`, while the single-column\n`BinaryView` benchmarks still show trade-offs: sparse inline cases\nimprove, but dense inline cases are slightly slower and the non-inline\nsingle-column path is effectively unchanged.\n\nCloses #9143.\n\n---------\n\nSigned-off-by: cl \u003ccailue@apache.org\u003e\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "dda2d2de83055dba7e2638ee3237d326279e8948",
      "tree": "bb7170f4dae34ea56cd21747d818cf1fb3755bc0",
      "parents": [
        "c3e0684179d2e3059a3bd99ea13cc7ccb0411f46"
      ],
      "author": {
        "name": "mwish",
        "email": "maplewish117@gmail.com",
        "time": "Thu Jun 11 02:31:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 10 14:31:50 2026 -0400"
      },
      "message": "perf(interleave): Optimize list interleave_list when child is primitive (#10025)\n\n# Which issue does this PR close?\n\n- Closes #10022.\n\n# Rationale for this change\n\nOptimize interleave_list when child is primitive type.\n\n# What changes are included in this PR?\n\n1. Special path when child is primitive type.\n2. new `interleave_list_primitive_child` function\n\n# Are these changes tested?\n\nCovered by existing\n\n# Are there any user-facing changes?\n\nno"
    },
    {
      "commit": "c3e0684179d2e3059a3bd99ea13cc7ccb0411f46",
      "tree": "73e7b0138481b4215ae0bb5071ce120bd12a377e",
      "parents": [
        "d4c1b44e2b4de4b864dd9d15cf20de6979d62a88"
      ],
      "author": {
        "name": "mwish",
        "email": "maplewish117@gmail.com",
        "time": "Wed Jun 10 19:30:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 10 07:30:45 2026 -0400"
      },
      "message": "bench(parquet): add nested list writer benchmarks (#10084)\n\n# Which issue does this PR close?\n\n- Closes #10083 .\n\n# Rationale for this change\n\nAdd benchmarks for list types with nested repetition levels:\n- `list_nested`: List\u003cList\u003cInt32\u003e\u003e\n- `list_struct_with_list`: List\u003cStruct\u003ca:Int32, b:Float32,\nc:List\u003cInt32\u003e\u003e\u003e\n\nThese exercise the per-slot (non-batched) write path where\nchild_has_no_nested_rep() returns false, providing a baseline for future\noptimizations.\n\n# What changes are included in this PR?\n\nAdd some benchmarks\n\n# Are these changes tested?\n\nThey\u0027re already tests\n\n# Are there any user-facing changes?\n\nNo\n\nCo-authored-by: Claude Opus 4 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "d4c1b44e2b4de4b864dd9d15cf20de6979d62a88",
      "tree": "2a507ceace335988ddd188e1e3c588683d948875",
      "parents": [
        "481223f8750957d47d34990fe1ae2a1a5aa0515a"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Wed Jun 10 02:31:18 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 10 16:31:18 2026 +1000"
      },
      "message": "replace github .md templates with.yaml (#10096)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #10095.\n\n# Rationale for this change\n\n- check issue\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n- Replaced old .md issue templates with new .yaml templates following\nDataFusion.\n- config.yaml is new, but has a convenient link to discussions since we\nget rid of `question` template\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\nn/a\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\n\n- Yes, better UX for issue creation.\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "481223f8750957d47d34990fe1ae2a1a5aa0515a",
      "tree": "ea3177454db652aca462c71dac977501b8fd2b1c",
      "parents": [
        "7aa97829e6fe7e94941c96127645f9c60c4abfa2"
      ],
      "author": {
        "name": "Neetika Mittal",
        "email": "7251919+mneetika@users.noreply.github.com",
        "time": "Tue Jun 09 18:50:21 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 09 13:50:21 2026 -0400"
      },
      "message": "feat(parquet-variant): add Dictionary and REE variant_to_arrow support (#10014)\n\n# Which issue does this PR close?\n\n- Closes #10013\n- Related to #6736\n\n# Rationale for this change\n\n`variant_get` / `variant_to_arrow` can already convert Variant values\ninto many native Arrow array layouts, but requesting\n`DataType::Dictionary` or `DataType::RunEndEncoded` was not supported.\n\nThis PR adds support for those output encodings without changing Variant\nshredding semantics. `Dictionary` and `RunEndEncoded` are produced as\nArrow result arrays only; they are not introduced as valid Parquet\nVariant shredded `typed_value` layouts.\n\n# What changes are included in this PR?\n\n1. Adds an encoded output builder in `variant_to_arrow` for\n`DataType::Dictionary` and `DataType::RunEndEncoded`.\n2. Builds the logical child value array using the existing\nVariant-to-Arrow builders, then delegates the final Dictionary/REE\nencoding to Arrow\u0027s existing cast kernels.\n3. Adds `variant_get` regression coverage for string dictionary, numeric\ndictionary, and run-end encoded outputs.\n\n# Are these changes tested?\n\nYes:\n\n- `cargo fmt --check`\n- `cargo test -p parquet-variant-compute`\n- `cargo test -p parquet-variant`\n- `cargo clippy --workspace --all-targets`\n\n# Are there any user-facing changes?\n\nYes. `variant_get` with `as_type` set to `DataType::Dictionary` or\n`DataType::RunEndEncoded` can now return those Arrow array encodings.\n\nCo-authored-by: Neetika Mittal \u003cmneetika@users.noreply.github.com\u003e"
    },
    {
      "commit": "7aa97829e6fe7e94941c96127645f9c60c4abfa2",
      "tree": "617d934882c5884adc5999564e3cd287d4a0070e",
      "parents": [
        "6fae4eae8569799442728923b40930536ea3ccd2"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Tue Jun 09 13:47:53 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 09 13:47:53 2026 -0400"
      },
      "message": "Run arrow tests for --lib and --tests on CI builds (#9814)\n\n# Which issue does this PR close?\n\n- Closes #9815.\n\n# Rationale for this change\n\nAs noted in\nhttps://github.com/apache/arrow-rs/pull/9813#discussion_r3140956707,\nRust debug builds panic on arithmetic overflow / underflow but release\nbuilds do not (they simply overflow / underflow). This means that some\ncode paths may panic in debug builds that would have silently failed in\nrelease builds.\n\nAs we harden the security posture of arrow-rs I would like to start\ntesting in release mode too, to ensure overflows such as\nhttps://github.com/apache/arrow-rs/pull/9813 can be properly validated.\n\n# What changes are included in this PR?\n\nAdd some new release-mode tests: `linux-release-test:` et al.\n\n# Are these changes tested?\n\nThey are only tests, no code changes\n\n# Are there any user-facing changes?\n\nNo"
    },
    {
      "commit": "6fae4eae8569799442728923b40930536ea3ccd2",
      "tree": "1a9fbfe8fc46fa33d991dc6a7eaa7ef497ef0400",
      "parents": [
        "f01ff3d47bb93a1dfeba61f2d1b3ac771a78f651"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Mon Jun 08 16:22:10 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 08 16:22:10 2026 -0400"
      },
      "message": "refactor(parquet): bundle array reader recursion args into `ReaderArgs` (#10089)\n\n# Which issue does this PR close?\n\n- Spun off from #9848\n- Contributes to #9731\n\n# Rationale for this change\n\nThe recursive `build_reader` / `build_*_reader` methods in the array\nreader builder thread `field` and `mask` through every call.\n\n# What changes are included in this PR?\n\nBundle them into a small `Copy` `ReaderArgs` struct so the recursive\nsignatures stay compact and there is a single, documented home for\nper-field reader options added in the future. This is a mechanical,\nbehavior-preserving change: `build_array_reader` constructs the args at\nthe entry point, group readers recurse with `args.with_field(child)`,\nand leaf readers read `args.field` and `args.mask`.\n\n# Are these changes tested?\n\nAll tests passing.\n\n# Are there any user-facing changes?\n\nNo."
    },
    {
      "commit": "f01ff3d47bb93a1dfeba61f2d1b3ac771a78f651",
      "tree": "bea009bf7e23815d055981fdcd7da363ce1f498e",
      "parents": [
        "9d5d79be99fbe7671fce1b4dd55606510f93d5d0"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Fri Jun 05 19:53:18 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 14:53:18 2026 -0400"
      },
      "message": "Implement From\u003ci128\u003e for i256 (#10081)\n\n# Which issue does this PR close?\n\n- Closes #10080.\n\n# Rationale for this change\n\n`From` is already implemented for all other signed integer primitives.\nI ran into the gap while working on decimal aggregations in DataFusion, which this\nwill make much simpler.\n\n# What changes are included in this PR?\n\nAdds an additional trait implementation for i256. I\u0027ve also considered\ndeprecating `i256::from_i128` as a public function, but figured I\u0027ll see\nwhat reviewers think.\n\n# Are these changes tested?\n\nJust exposes an additional path for existing functionality.\n\n# Are there any user-facing changes?\n\nNo\n\nSigned-off-by: Adam Gutglick \u003cadam@spiraldb.com\u003e"
    },
    {
      "commit": "9d5d79be99fbe7671fce1b4dd55606510f93d5d0",
      "tree": "a4f531abd96f069b97e4930e4c8174fbbceee123",
      "parents": [
        "e5e66fa05ce986fa6a3f61c36c11ff6b476cb76c"
      ],
      "author": {
        "name": "Adrian Garcia Badaracco",
        "email": "1755071+adriangb@users.noreply.github.com",
        "time": "Fri Jun 05 13:47:26 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 13:47:26 2026 -0400"
      },
      "message": "test(parquet): drop confusing `main` reference in page-roundtrip test comment (#10072)\n\n# Which issue does this PR close?\n\nFollow-up to #9972.\n\n# Rationale for this change\n\nA test comment added in #9972 described granular mode as writing \"more\npages than `main`\". As noted in [review\nfeedback](https://github.com/apache/arrow-rs/pull/9972#discussion_r3357602241),\ncomparing to `main` is confusing now that the PR has merged — that code\n*is* main. This rephrases the comment to compare against the default\nbatched path instead, which the same comment already references.\n\n# What changes are included in this PR?\n\n- Reword one test comment in\n`test_arrow_writer_granular_mode_roundtrip`. No behavior change.\n\n# Are there any user-facing changes?\n\nNo.\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "e5e66fa05ce986fa6a3f61c36c11ff6b476cb76c",
      "tree": "ac5f5ab50b656ac7bedccc943104200a425ee1be",
      "parents": [
        "9f96a8f5c3f47c6303c7d17ac7f00ad4ad0df513"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Fri Jun 05 03:06:25 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 06:06:25 2026 -0400"
      },
      "message": "Add test for `parquet-testing/bad_data/ARROW-GH-47662.parquet` (#10077)\n\n# Which issue does this PR close?\n\n- Issue raised in #9110\n\n# Rationale for this change\nAdd a \"bad_data\" test for a newly added file in parquet-testing\n\n# What changes are included in this PR?\n\nAdds a new test so the `bad_data` unit test doesn\u0027t fail.\n\n# Are these changes tested?\n\nYes\n\n# Are there any user-facing changes?\n\nNo, only tests"
    },
    {
      "commit": "9f96a8f5c3f47c6303c7d17ac7f00ad4ad0df513",
      "tree": "b0c5ab52167fc8a551f6aa712f298895908c58c7",
      "parents": [
        "d7ef673275d6b7f124d5f588146acb6c49d59a0a"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu Jun 04 16:56:59 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 16:56:59 2026 -0400"
      },
      "message": "Prepare for `59.0.0` release (#10063)\n\n# Which issue does this PR close?\n\n- Part of https://github.com/apache/arrow-rs/issues/9110\n\n# Rationale for this change\n\nThis prepares for the `59.0.0` (major) release of the Rust Arrow /\nParquet crates.\n\n# What changes are included in this PR?\n\n1. Update version to `59.0.0`\n2. Update CHANGELOG. See rendered preview here:\nhttps://github.com/alamb/arrow-rs/blob/alamb/make_release_59.0.0/CHANGELOG.md\n\n# Are these changes tested?\n\nBy CI\n\n# Are there any user-facing changes?\n\nyes"
    },
    {
      "commit": "d7ef673275d6b7f124d5f588146acb6c49d59a0a",
      "tree": "d2a64aff80b6a82c0e4327a61b959d39c05f4f3c",
      "parents": [
        "8042ea288e084107b602f9e25a850314942567b6"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Thu Jun 04 13:48:17 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:48:17 2026 -0400"
      },
      "message": "Bump max throughput in `flight` benchmark before blocking (#10070)\n\n# Which issue does this PR close?\n\n- Closes #10029.\n\n# Rationale for this change\nIncrease the duplex buffer from 1 MB to 64 MB to eliminate artificial\nback-pressure in the roundtrip benchmarks.\nSee the rationale in this\n[comment](https://github.com/apache/arrow-rs/pull/10044#issuecomment-4623736500)\n\n# What changes are included in this PR?\nBumps `max_buf_size` to 64 MB.\n\n# Are these changes tested?\nn/a\n\n# Are there any user-facing changes?\nn/a"
    },
    {
      "commit": "8042ea288e084107b602f9e25a850314942567b6",
      "tree": "ba7250bc8428adbfd389be512b6275f7ad28fd9e",
      "parents": [
        "1e8ea5eb6857bad5247908c1c27aa51e62aa9006"
      ],
      "author": {
        "name": "Adrian Garcia Badaracco",
        "email": "1755071+adriangb@users.noreply.github.com",
        "time": "Thu Jun 04 12:24:25 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:24:25 2026 -0400"
      },
      "message": "Pluggable page spilling API for the Parquet ArrowWriter (PageStore) (#10020)\n\n- closes https://github.com/apache/arrow-rs/issues/10071\n\n## Problem description\n\nWe currently buffer entire row groups in memory. From [our own\ndocs](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#memory-usage-and-limiting):\n\n\u003e The nature of Parquet requires buffering of an entire row group before\nit can be flushed to the underlying writer.\n\nFor our production workload where we have ~400 columns with large data\nskews (some much larger than others) this causes \u003e\u003d12GBs of memory\nconsumed *just to write Parquet*.\n\nWhen `ArrowWriter` writes a row group, record batches arrive with all\ncolumns\ninterleaved, but each Parquet column chunk must be contiguous on disk.\nSo every\ncolumn\u0027s *compressed* pages are buffered for the whole row group and\nonly\nspliced into the output at flush. Peak `ArrowWriter` memory is therefore\n≈ Σ(compressed bytes of every column chunk) for one row group, and it\ngrows with\nthe row group size.\n\nToday the only lever against this is\n[ArrowWriter::in_progress_size](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#method.in_progress_size)\n+ flushing\nsmaller row groups — which trades away compression and read-time\n(page/row-group)\npruning. This has negative consequences for encoding efficiency, read\nperformance, etc. Parquet already has pages, we don\u0027t need one column to\nforce the layout of another. Ideally what we\u0027d want in a case like this\nis a large (lets just say 1M row) row group with ~4 1MB pages for the\n`id: i32` column and N ~1MB pages for the `large_text` column. 
Reading\nthe small id column has no data fragmentation penalty, no page index\nbloat penalty, etc.\n\n## Related issues\n\nSome of the issues I could find\n\n- https://github.com/apache/arrow-rs/issues/5828\n- https://github.com/apache/arrow-rs/issues/5450\n- https://github.com/apache/arrow-rs/issues/5484\n\n## Proposed solution\n\nIntroduce a trait for pluggable buffering. In particular we would like\nto implement spilling (spill buffered completed pages to disk). If this\nworks well it can be upstreamed / made easily configurable and usable\nfor all arrow users. I am not adding an implementation here to avoid\ndiscussing those APIs (is it a temp dir, how does it get configured,\netc.).\n\n## What changes are included in this PR?\n\nA small, intentionally \"dumb\" key/value store trait and its wiring, in\nfour\nstacked commits:\n\n1. **`PageStore` + `PageKey` + `PageStoreFactory` + `InMemoryPageStore`,\nwired\ninto `ArrowWriter`.** The store maps an opaque, store-allocated\n`PageKey` to a\nblob of bytes and knows nothing about pages, dictionaries, ordering, or\noffsets — the caller keeps the handles and decides what they mean. The\ndefault `InMemoryPageStore` (a `Vec\u003cBytes\u003e`) adds no overhead and\nproduces\nequivalent, valid output — byte-for-byte identical to the previous\nbuffering\n   for non-dictionary columns; dictionary columns differ only in the\n`page_encoding_stats` ordering (see commit 4). A `PageStoreFactory` is\nthreaded\n   through `ArrowWriterOptions::with_page_store_factory` →\n   `ArrowRowGroupWriterFactory` → `ArrowColumnWriterFactory`.\n2. **Stream column chunks out of the store at splice.** Replaces the\nmaterialize-then-copy splice with a `Read` that takes each page blob\nback out\nof the store in write order *as it is consumed* and releases it\nimmediately,\nso the splice never holds more than one page in memory at a time\n(essential\nfor a spilling backend on skewed schemas). 
`append_column` is unchanged\nfor\n   external `ChunkReader` callers.\n3. **Public `PageKey::new`/`get`** (so external backends can mint their\nown\nhandles) **+ a memory regression test** (in-tree thread-local tracking\n   allocator) with a temp-file backend.\n4. **Spill dictionary-column data pages too.** Dictionary-encoded\ncolumns\nbuffered every completed data page in `GenericColumnWriter.data_pages`\nuntil\n`close()` (the dictionary page must be written first but isn\u0027t final\nuntil all\n   values are seen), so those pages never reached the store. A new\n`PageWriter::defers_dictionary_ordering()` lets a writer that buffers\nthe\nwhole chunk and splices later (the Arrow path) accept data pages\n*before* the\n   dictionary page and order them itself; the column writer then streams\ndictionary-column data pages straight through. `ArrowPageWriter` holds\nthe\n(bounded, ≤ `dict_page_size_limit`) dictionary page in memory — it now\narrives\nlast — and emits it first at splice, where the production-order page\noffsets\nare rewritten to the dictionary-first layout. Because the data page is\nnow\nproduced before the dictionary page, the chunk\u0027s `page_encoding_stats`\nlists\n   the data-page entry before the dictionary-page entry (the reverse of\n`master`); the Parquet spec defines this list as an unordered set, so\nthe\n   output stays valid. The column-at-a-time\n`SerializedFileWriter` path is unchanged (it commits bytes live and\nstill\nbuffers, which is inherent there). 
This commit also fixes\n`memory_size()` to\n   report bytes the writer actually holds *resident* (via\n`PageStore::memory_size` / `PageWriter::buffered_memory_size`) rather\nthan\n   bytes written, so it drops to ~0 once pages are spilled off-heap.\n\n## Are these changes tested?\n\nYes:\n\n- A **byte-identical round-trip test** using a custom `PageStore` with\nsparse,\nnon-contiguous, `HashMap`-backed handles, proving the writer relies only\non\nthe opaque-handle contract across dictionary and non-dictionary columns\nand\n  multiple row groups.\n- A **dictionary round-trip test with the offset index disabled**,\ncovering the\npath where only the chunk-level dictionary/data page offsets are\nrewritten.\n- Unit tests for the in-memory backend contract and its resident-byte\nreporting.\n- An **always-on memory regression test**\n(`page_store_bounds_write_memory` in\n`parquet/tests/arrow_writer.rs`, using an in-tree thread-local tracking\nallocator) measuring peak heap, for both a skewed wide row group (~16\nMiB) and\n  a low-cardinality dictionary column (~4.2M rows):\n\n  | scenario | in-memory store | temp-file spill |\n  |---|---|---|\n  | skewed ~16 MiB row group | ~18.3 MiB | ~4.2 MiB |\n  | dictionary column, 4.2M rows | ~2.69 MiB | ~0.48 MiB |\n\n  i.e. 
the spilling backend bounds peak write memory by the in-flight\nencoder/dictionary buffers rather than the row group size, for both the\npage\n  buffer and the dictionary-column data pages.\n\n## Are there any user-facing changes?\n\nNew, additive public API (default behavior unchanged):\n\n- `ArrowWriterOptions::with_page_store_factory`\n- `PageStore`, `PageKey`, `PageStoreFactory`, `InMemoryPageStore`,\n`InMemoryPageStoreFactory` (re-exported from\n`parquet::arrow::arrow_writer`,\n  defined in `parquet::column::page_store`).\n- New defaulted `PageWriter` trait methods\n`defers_dictionary_ordering()` and\n`buffered_memory_size()` (both default to the previous behavior), and a\n  defaulted `PageStore::memory_size()`.\n\n## Not covered (by design)\n\n- The **column-at-a-time `SerializedFileWriter` path** still buffers\ndictionary-column data pages: it commits bytes to the file live, so the\ndictionary-first ordering must be resolved during encoding. That path\nalready\n  has minimal memory otherwise.\n- The in-flight **encoder buffer** and **dictionary** themselves stay\nresident\n(already bounded by the page/dict size limits), as do **bloom filters**.\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)\n\n---------\n\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "1e8ea5eb6857bad5247908c1c27aa51e62aa9006",
      "tree": "6b15796451df563410b8147ae91a4b386cc3b7c1",
      "parents": [
        "2a1d40d35c3dd5cc274dd97baa413a5384006fb2"
      ],
      "author": {
        "name": "Adrian Garcia Badaracco",
        "email": "1755071+adriangb@users.noreply.github.com",
        "time": "Thu Jun 04 11:55:52 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:55:52 2026 -0400"
      },
      "message": "fix(parquet): bound data page byte size for large variable-width values (#9972)\n\n- closes https://github.com/apache/arrow-rs/issues/10061\n\nWe write large values into our parquet files (e.g. a 5MB LLM prompt). A\nnaive write will cause massive pages (we\u0027ve seen up to 2GB) at default\nwrite settings. The main knob to control this is\n[`write_batch_size`](https://docs.rs/datafusion/latest/datafusion/common/config/struct.ParquetOptions.html#structfield.write_batch_size)\nwhich defaults to 1024. But if each row is 5MB that\u0027s 5GB. On the other\nhand setting this to something small like 32 kills write performance and\nis completely unnecessary for other fixed width columns.\n\nThe writer even documents this (`parquet/src/column/writer/mod.rs`):\n\n\u003e We check for DataPage limits only after we have inserted the values.\nIf a user writes a large number of values, the DataPage size can be well\nabove the limit.\n\nThis PR makes the mini-batch size byte-budget aware:\n\n- For each chunk, compute `bytes_per_value` from the values about to be\nwritten and pick `sub_batch_size \u003d page_byte_limit / bytes_per_value`\n(clamped ≥ 1).\n- For typical small values — numeric columns, short strings —\n`sub_batch_size` ≥ chunk size, so we stay on the existing batched fast\npath with zero behavior change.\n- Only when individual values are large enough that a full chunk would\nblow the page does the sub-batch shrink — to one row per mini-batch in\nthe limit, matching the format minimum of one record per page.\n\n## Implementation notes\n\nSkip the byte-size check while parquet dictionary encoding is active:\n`estimated_value_bytes` returns plain-encoded size but a dict-encoded\ndata page only stores small RLE indices, so the estimate would\nspuriously shrink pages. 
Dict fallback bounds dict-encoded pages\nindependently.\n\nFor repeated/nested columns the sub-batch steps record-by-record (rep \u003d\u003d\n0 boundaries) so a record never spans data pages, matching the parquet\nformat rule.\n\n### Regression test\n\n`test_column_writer_caps_page_size_for_large_byte_array_values` writes\n64 × 64 KiB BYTE_ARRAY values with a 16 KiB page byte limit. Before this\nfix that produced a single ~4 MiB page; after, it\u0027s one page per value\n(~64 pages, all within ~2× the value size).\n\n### Bench results\n\n5-run medians, criterion `arrow_writer` bench, default writer\nproperties, on a noisy laptop (run-to-run variance ~±1.6%):\n\n| bench | Δ vs main |\n|---|---|\n| `primitive/default` (i32 25% null) | −1.0% |\n| `primitive_non_null/default` | −0.0% |\n| `bool_non_null/default` | −1.2% |\n| `string/default` | +0.6% |\n| `short_string_non_null/default` (new, 1M × 8 B) | +0.2% |\n| `large_string_non_null/default` (new, 1024 × 256 KiB) | +1.2% |\n| `string_non_null/default` | −2.1% |\n| `string_dictionary/default` | +0.4% |\n| `list_primitive/default` | +0.5% |\n| `list_primitive_non_null/default` | +0.1% |\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)\n\n---------\n\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "2a1d40d35c3dd5cc274dd97baa413a5384006fb2",
      "tree": "7669b88d4da5a340523d89fd01709a7e578e6558",
      "parents": [
        "97f4b1460cd32b5b8c3b91ed554238e12e3e68cf"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Wed Jun 03 21:17:25 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 16:17:25 2026 -0400"
      },
      "message": "Reduce Miri runtime even more (#9650)\n\n# Rationale for this change\n\nFollow-up to #9629. As noted there, Miri runtime is dominated by the\nfollowing 3 tests:\n1. `test_from_bitwise_binary_op`\n2. `sort::tests::fuzz_partition_validity`\n3. `sort::tests::test_fuzz_random_strings`\n\n# What changes are included in this PR?\n\nUnder Miri, this PR reduces the number of variations tested in 1 and\nignores the latter two.\n\n# Are these changes tested?\n\nThey are tests!\n\n# Are there any user-facing changes?\n\nNo\n\n---------\n\nCo-authored-by: Jeffrey Vo \u003cjeffrey.vo.australia@gmail.com\u003e"
    },
    {
      "commit": "97f4b1460cd32b5b8c3b91ed554238e12e3e68cf",
      "tree": "a1e0d025c46e909ad77858494e00c5ff3e668f83",
      "parents": [
        "6c397977687380f5e3d13d3d061b0369fe31cb3a"
      ],
      "author": {
        "name": "theirix",
        "email": "theirix@gmail.com",
        "time": "Wed Jun 03 14:45:53 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 09:45:53 2026 -0400"
      },
      "message": "arrow-buffer: i256: implement ilog (#9453)\n\n# Which issue does this PR close?\n\n- Closes #9452.\n\n# Rationale for this change\n\nImplementation of integer logarithm. There is no matching `num_traits`\ntrait, but this implementation provides a good motivation for such a\ntrait.\n\n# What changes are included in this PR?\n\n- No external dependencies\n- Checked methods (log, log2, log10)\n- Unchecked methods (panic by design)\n\n# Are these changes tested?\n\n- Unit tests\n\n# Are there any user-facing changes?\n\nNo\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "6c397977687380f5e3d13d3d061b0369fe31cb3a",
      "tree": "03f0f355eb09e2d56ab16fb3e8bf5edb2a52a3e7",
      "parents": [
        "9949226f0ab644c701e6ed283db32989bbf6b006"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Wed Jun 03 09:38:18 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 09:38:18 2026 -0400"
      },
      "message": "Improve email created by create_tarball.sh script (#9944)\n\n# Which issue does this PR close?\n\n\n# Rationale for this change\n\nThere are several issues I found with the email template created by the\ncurrent script:\n1. It uses links to tags (rather than git shas) which can potentially be\nchanged\n2. It does not include the actual SHA values (only a link to a place to\ndownload the sha values) which means in theory it is not clear what\nexact artifact is being voted on\n3. It does not include a link to the issue used to do release\ncoordination\n\n# What changes are included in this PR?\n\nFix the above issues.\n\nExample new output:\n```\n---------------------------------------------------------\nTo: dev@arrow.apache.org\nSubject: [VOTE][RUST] Release Apache Arrow Rust 57.3.1 RC1\n\nHi,\n\nI would like to propose a release of Apache Arrow Rust Implementation, version 57.3.1.\n\nThis release candidate is based on commit: da8975cfacdf8623892a7937dc5c5e6515a05483 [1].\nThe SHA256 of the release candidate is: 067a4c47c515d57b283f431d426c46c0f48601a2017202a490d2a234e0cd2fb4\n\nThe proposed release tarball and signatures are hosted at [2].\n\nThe changelog is located at [3].\n\nThe release tracking issue is: [4]\n\nPlease download, verify checksums and signatures, run the unit tests,\nand vote on the release. There is a script [4] that automates some of\nthe verification.\n\nThe vote will be open for at least 72 hours.\n\n[ ] +1 Release this as Apache Arrow Rust 57.3.1\n[ ] +0\n[ ] -1 Do not release this as Apache Arrow Rust 57.3.1 because...\n\n[1]: https://github.com/apache/arrow-rs/tree/da8975cfacdf8623892a7937dc5c5e6515a05483\n[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-57.3.1-rc1\n[3]: https://github.com/apache/arrow-rs/blob/da8975cfacdf8623892a7937dc5c5e6515a05483/CHANGELOG.md\n[4]: https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh\n[5]: RELEASE_ISSUE\n```\n# Are these changes tested?\n\nI tested this script while creating release candidates for 57.3.1 and\n56.2.1 and it worked well\n- https://github.com/apache/arrow-rs/issues/9858\n- https://github.com/apache/arrow-rs/issues/9857 \n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "9949226f0ab644c701e6ed283db32989bbf6b006",
      "tree": "4a6c66af86023d227f03fa0a3df0b9a6900f0a99",
      "parents": [
        "58bdc7d557df323b5c15b5779f8ea23b05bf0ccf"
      ],
      "author": {
        "name": "mwish",
        "email": "maplewish117@gmail.com",
        "time": "Wed Jun 03 21:33:22 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 09:33:22 2026 -0400"
      },
      "message": "perf(parquet): LevelInfoBuilder batch write when no repetition childs (#10037)\n\n# Which issue does this PR close?\n\n- Closes #10023.\n\n# Rationale for this change\n\nThe Parquet writer writes list elements one by one, which is extremely slow.\nThis patch batches the writes.\n\n# What changes are included in this PR?\n\nBatches writes when writing a list at the maximum repetition level.\n\n# Are these changes tested?\n\nCovered by existing tests.\n\n# Are there any user-facing changes?\n\nNo"
    },
    {
      "commit": "58bdc7d557df323b5c15b5779f8ea23b05bf0ccf",
      "tree": "0b60c397f14e0ff8cc30d577df155a20de3af744",
      "parents": [
        "4e7a2fa553e2e5e7385f6c4e77984a354d40c813"
      ],
      "author": {
        "name": "theirix",
        "email": "theirix@gmail.com",
        "time": "Wed Jun 03 14:18:47 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 09:18:47 2026 -0400"
      },
      "message": "arrow-buffer: i256: Implement num_traits wrapping shift (#9418)\n\n# Which issue does this PR close?\n\n- Closes #9417\n\n# Rationale for this change\n\nA follow-up to #8976\n\nImplement some missing traits -\n[WrappingShl](https://docs.rs/num-traits/latest/num_traits/ops/wrapping/trait.WrappingShl.html)\nand\n[WrappingShr](https://docs.rs/num-traits/latest/num_traits/ops/wrapping/trait.WrappingShr.html)\n\n# What changes are included in this PR?\n\n- num_traits\u0027 WrappingShl implementation for `usize`\n- `Shl` and `Shr` trait implementation for all scalar numeric types, not\nonly for `u8`\n\n# Are these changes tested?\n\n- Unit tests\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "4e7a2fa553e2e5e7385f6c4e77984a354d40c813",
      "tree": "95a451da87130429b829fde4d867b7cad3040fa9",
      "parents": [
        "f001b222483ad2a18e6bb4563c3505e9f4280f66"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Wed Jun 03 14:15:20 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 09:15:20 2026 -0400"
      },
      "message": "Improve `take_bytes` perf in the null cases between 10-25% (#9625)\n\n# Which issue does this PR close?\n\n\n- Closes #NNN.\n\n# Rationale for this change\n\nJust improves performance, I was profiling some things downstream and\ngot curious about how it works.\n\n# What changes are included in this PR?\n\nThe main idea is to use a two-pass approach:\n1. Compute byte offsets and collect (start, end) byte ranges\n2. Copy byte data via raw pointer writes (`copy_byte_ranges`)\n\nThis PR also reduces the branching from four cases (one for each nullability\ncombination) to only two.\n\n\n# Are these changes tested?\n\nExisting tests\n\n# Are there any user-facing changes?\n\nNone\n\n---------\n\nSigned-off-by: Adam Gutglick \u003cadam@spiraldb.com\u003e\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "f001b222483ad2a18e6bb4563c3505e9f4280f66",
      "tree": "89e197689fea2c0f6bd9298e05de9287b8514375",
      "parents": [
        "f03e1bc028630561af7d4adfc000916337ec6ce5"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Wed Jun 03 07:34:00 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 07:34:00 2026 -0400"
      },
      "message": "[#10029][benchmarks] arrow-flight roundtrip as well as encode/decode  (#10031)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Contributes towards closing #10029.\n\n# Rationale for this change\nProvides benchmarks for arrow-flight crate. benchmarks for round trip as\nwell as encode/decode individually.\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\nAdds three criterion benches under arrow-flight/benchmarks/\n(roundtrip.rs, flight_encode.rs, flight_decode.rs), each sweeping a\ntunable matrix of rows, cols, and column types (fixed Int64, variable\nStringArray, nested List, dict DictionaryArray) built via a shared\n  common::build_batch helper.  \n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\nn/a\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\nno\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "f03e1bc028630561af7d4adfc000916337ec6ce5",
      "tree": "41e770e556f439860d3c8ac46a96f61f8f6415f3",
      "parents": [
        "259cff297c5a6b1015ad9ee02ebeb61b53f39a70"
      ],
      "author": {
        "name": "pchintar",
        "email": "89355405+pchintar@users.noreply.github.com",
        "time": "Wed Jun 03 06:06:16 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 10:36:16 2026 +1000"
      },
      "message": "fix(ipc): handle duplicate projection indices in IPC reader (#9952)\n\n# Which issue does this PR close?\n\n- Closes #9950 .\n\n# Rationale for this change\n\nThe current IPC reader does not correctly handle duplicate projection\nindices.\n\n`Schema::project`(in `arrow-schema/src/schema.rs`) and\n`RecordBatch::project`(in `arrow-array/src/record_batch.rs`) both map\neach requested index directly, preserve the projection order and allow\nduplicate indices such as:\n\n```rust id\u003d\"n4pq0f\"\nvec![1, 1]\n```\n\nHowever, the IPC reader currently uses:\n\n```rust id\u003d\"gjklyo\"\nprojection.iter().position(|p| p \u003d\u003d \u0026idx)\n```\n\nwhich only returns the first matching entry. As a result, only one\ncolumn is decoded even though the projected schema contains multiple\nfields, leading to schema/column count mismatches when constructing the\n`RecordBatch`.\n\nThis also affects reordered duplicate projections such as:\n\n```rust id\u003d\"jlwmku\"\nvec![2, 0, 2]\n```\n\n# What changes are included in this PR?\n\n* Updated IPC projection handling in `arrow-ipc/src/reader.rs` to\npreserve all matching projection entries\n* Reused the decoded array for duplicate projection indices instead of\ndecoding the same field multiple times\n* Preserved projection order for reordered duplicate projections\n\n# Are these changes tested?\n\nYes.\n\nAdded `test_projection_duplicate_indices`, which verifies:\n\n* duplicate projections (`vec![1, 1]`)\n* reordered duplicate projections (`vec![2, 0, 2]`)\n\nThe test compares IPC projection results against `RecordBatch::project`.\n\nThe test fails before the fix and passes after it.\n\nAll existing `arrow-ipc` tests also pass `cargo test -p arrow-ipc --lib`\n\n# Are there any user-facing changes?\n\nNo."
    },
    {
      "commit": "259cff297c5a6b1015ad9ee02ebeb61b53f39a70",
      "tree": "d2d48e187d5a203781e8b344696f84f8e1af623e",
      "parents": [
        "cfc2b88d1d4e95e2de0dc5eeba4c23694058a903"
      ],
      "author": {
        "name": "Adam Reeve",
        "email": "adam.reeve@gr-oss.io",
        "time": "Wed Jun 03 11:49:20 2026 +1200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 19:49:20 2026 -0400"
      },
      "message": "feat(parquet): add uses_key_retriever method to FileDecryptionProperties (#9895)\n\n# Which issue does this PR close?\n\n- Closes #9721.\n\n# Are these changes tested?\n\nYes, includes new unit tests.\n\n# Are there any user-facing changes?\n\nYes, there is a new public API method."
    },
    {
      "commit": "cfc2b88d1d4e95e2de0dc5eeba4c23694058a903",
      "tree": "756c51960ab0e0ff3c860e8e85f49515bde2ad64",
      "parents": [
        "1ae246942d09888633338bc623dcc53b07ba9d75"
      ],
      "author": {
        "name": "ClSlaid",
        "email": "cailue@apache.org",
        "time": "Wed Jun 03 05:09:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 17:09:37 2026 -0400"
      },
      "message": "Add coalesce inline-view filter benchmarks (#10050)\n\nThis is a benchmark-only companion patch for\nhttps://github.com/apache/arrow-rs/pull/9755.\n\nIt keeps the functional changes out of this PR and only adds benchmark\ncoverage in `arrow/benches/coalesce_kernels.rs` so the coalesce\ninline-view filter work can be tested independently.\n\nBenchmark coverage included:\n\n- filter and take coalesce benchmarks\n- primitive schemas\n- single-column `Utf8View` and `BinaryView`\n- mixed primitive + `Utf8View` and primitive + `BinaryView` schemas\n- filter cases for short inline strings with `max_string_len\u003d8`\n- filter/take cases for longer view strings, including\n`max_string_len\u003d20`, `30`, and `128` depending on scenario\n\nCoverage note:\n\n- The filter benchmarks cover the main short-inline path targeted by\n#9755 for both `Utf8View` and `BinaryView`.\n- The take benchmarks cover `Utf8View`/`BinaryView` and mixed schemas,\nbut do not add `max_string_len\u003d8` take variants. This patch keeps the\nbenchmark changes aligned with the benchmark patch currently carried by\n#9755.\n\nValidation:\n\n```text\ncargo fmt --package arrow\ncargo bench --bench coalesce_kernels -- --list\ngit diff --check\n```"
    },
    {
      "commit": "1ae246942d09888633338bc623dcc53b07ba9d75",
      "tree": "735b49a8575cb39a9ab10240bd7c91a1cefe6428",
      "parents": [
        "870fb06f73c45d9d09565d7b83cd83139a74424f"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Tue Jun 02 11:12:12 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:12:12 2026 -0700"
      },
      "message": "chore: Remove some deprecated Arrow functions from the public API (#10040)\n\n# Which issue does this PR close?\n\nNone\n\n- related to #9110 \n\n# Rationale for this change\nHousecleaning for 59.0.0\n\n# What changes are included in this PR?\nRemove some deprecated functions from public Arrow APIs.\n\n# Are these changes tested?\nCovered by existing tests\n\n# Are there any user-facing changes?\nYes, deprecated public functions are removed\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "870fb06f73c45d9d09565d7b83cd83139a74424f",
      "tree": "3ad01bcdda0e0d91e9153bf55d374a3b79933ca0",
      "parents": [
        "57eeb266af09f7fee47b6d4265ed2bdff6746929"
      ],
      "author": {
        "name": "Soiman Vasile",
        "email": "vasilecristiansoiman@gmail.com",
        "time": "Tue Jun 02 18:36:07 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:36:07 2026 -0400"
      },
      "message": "test: add overflow tests for MutableBuffer (#9958)\n\n### Description\nWhile analyzing the memory allocation logic in `MutableBuffer`, I\nidentified that the `with_capacity` and `reserve` methods correctly use\n`.expect()` guards to prevent integer overflows.\n\nHowever, I couldnt find test cases for this `.expect()` guard in the\ncurrent test suite. This PR adds 2 `#[should_panic]` tests in\n`mutable.rs` to verify that the API correctly panics\n**Changes:**\n* Added `test_mutable_new_capacity_overflow` to cover\n`MutableBuffer::new`\n* Added `test_mutable_reserve_overflow` to cover\n`MutableBuffer::reserve`\n\n###  Rationale\nAdding these tests ensures that the safety guards in `arrow-buffer`\nremain intact and provides explicit coverage for edge cases involving\nnear-`usize::MAX` allocations.\n\n### Tests\n-  `test_mutable_new_capacity_overflow`\n- `test_mutable_reserve_overflow`\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "57eeb266af09f7fee47b6d4265ed2bdff6746929",
      "tree": "b34f923cbbb7b2d193e02166064849f4d48e698e",
      "parents": [
        "6881f73b0b3621f2263cb57608a8eff58efe58cc"
      ],
      "author": {
        "name": "Frederic Branczyk",
        "email": "fbranczyk@gmail.com",
        "time": "Tue Jun 02 14:55:53 2026 +0000"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:55:53 2026 -0400"
      },
      "message": "arrow-cast: Add ability to cast plain struct to dictionary (#10039)\n\n# Which issue does this PR close?\n\n- Closes #10038\n\n# What changes are included in this PR?\n\nA naive implementation of casting plain structs to dictionaries, that\ndoesn\u0027t perform any deduplication.\n\n# Are these changes tested?\n\nUnit tests added.\n\n# Are there any user-facing changes?\n\nNo, just a new feature.\n\n@alamb @Jefffrey"
    },
    {
      "commit": "6881f73b0b3621f2263cb57608a8eff58efe58cc",
      "tree": "d77465b9cdf4cc81c80d4e217b74b2ab423bcf55",
      "parents": [
        "51ffd8c873feb3fbc1659ebffcc2c72fca796e94"
      ],
      "author": {
        "name": "František Zatloukal",
        "email": "Zatloukal.Frantisek@gmail.com",
        "time": "Tue Jun 02 17:55:40 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:55:40 2026 -0400"
      },
      "message": "Adjust Variant size expectation for s390x architecture (#10027)\n\n# Which issue does this PR close?\n- Closes #10026 .\n\n# Rationale for this change\n\nThe Variant enum has a different size on s390x (72 bytes) compared to\nother 64-bit architectures (80 bytes) due to architecture-specific\nalignment and padding requirements.\n\n# What changes are included in this PR?\n\nThis change adds a conditional compilation check to expect 72 bytes on\ns390x while maintaining the existing 80-byte expectation for other\n64-bit platforms. This ensures the size check passes on s390x without\ncompromising the performance validation on other architectures.\n\n# Are these changes tested?\n\nBuild-time test only\n\n# Are there any user-facing changes?\n\nN/A\n\nSigned-off-by: František Zatloukal \u003cfzatlouk@redhat.com\u003e"
    },
    {
      "commit": "51ffd8c873feb3fbc1659ebffcc2c72fca796e94",
      "tree": "3b28ec1c1f9f485f7ca40b9a2defdee444fdf3cf",
      "parents": [
        "38778f0110725eb36f38f9fa2d43d2060b8cf928"
      ],
      "author": {
        "name": "theirix",
        "email": "theirix@gmail.com",
        "time": "Tue Jun 02 15:52:05 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:52:05 2026 -0400"
      },
      "message": "fix: better error handling for negative size of FixedSizeBinary (#10042)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #10033.\n\n# Rationale for this change\n\nRelated to https://github.com/apache/datafusion/pull/22297, where using\n`FixedSizeBinary(-N)` caused failures. Actually, it will still panic,\nbut with a proper error. It could be a good idea to introduce\n`try_new_null` to carry an error gracefully - thoughts?\n\n# What changes are included in this PR?\n\n- Return a `Result` on negative byte width when possible\n- Panic explicitly with a proper error message otherwise\n- Avoid silent overflow with a direct `len as usize` cast\n- Reject negative FSB when parsing from tokens\n\n# Are these changes tested?\n\n- Tests are passing\n\n# Are there any user-facing changes?\n\nNo"
    },
    {
      "commit": "38778f0110725eb36f38f9fa2d43d2060b8cf928",
      "tree": "550aac3699223745bf98023d68a5b13f68e5448f",
      "parents": [
        "c58f0b1f09fb455ec95d3d87a409c5cfc92940f1"
      ],
      "author": {
        "name": "quantumish",
        "email": "freifeld.david@gmail.com",
        "time": "Mon Jun 01 15:32:35 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:32:35 2026 -0400"
      },
      "message": "Replace `From\u003cVec\u003c_\u003e\u003e` impls with `TryFrom`s for `FixedSizeBinaryArray` (#10019)\n\n# Which issue does this PR close?\n\n- Closes #10018.\n\n# Rationale for this change\n\nThere isn\u0027t a clear way to fix the `From\u003cVec\u003c_\u003e\u003e` implementations for\n`FixedSizeBinaryArray` that wouldn\u0027t be confusing, so making them\n`TryFrom` is a better fit since they are in genuine use across e.g.\ntests within the Arrow library as well as a terser way of calling\n`FixedSizeBinaryArray::try_from_iter` or\n`FixedSizeBinaryArray::try_from_sparse_iter`.\n\n# What changes are included in this PR?\n\n- Converts `From\u003cVec\u003c\u0026[u8]\u003e\u003e`, `From\u003cVec\u003c\u0026[u8; N]\u003e\u003e`, and\n`From\u003cVec\u003cOption\u003c\u0026[u8]\u003e\u003e\u003e` implementations for `FixedSizeBinaryArray` to\n`TryFrom` implementations.\n- Adds a `TryFrom\u003cVec\u003cOption\u003c\u0026[u8; N]\u003e\u003e\u003e` implementation for the missing\ncombination of types.\n- Updates various test cases within the arrow/parquet libraries to use\n`try_from().unwrap()` instead of `from()`.\n\n# Are these changes tested?\n\nThis is sort of a transparent change in that only the API for expressing\nfailure cases has changed rather than the actual failure cases. All\nexisting tests surrounding conversion failures have been updated to\ncheck whether a conversion has correctly failed.\n\n# Are there any user-facing changes?\n\nYes, this is a breaking API change since user-facing trait\nimplementations have been replaced with different trait implementations."
    },
    {
      "commit": "c58f0b1f09fb455ec95d3d87a409c5cfc92940f1",
      "tree": "7b9fca5fdd2376a145a20aef1eccc2a8f9465ce8",
      "parents": [
        "511ad068ae2b7f511e16215d9d3c96fb0f334f2c"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Mon Jun 01 07:32:15 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:32:15 2026 -0400"
      },
      "message": "chore: Remove some deprecated functions from parquet crate (#10035)\n\n# Which issue does this PR close?\nNone\n\n# Rationale for this change\nTidying up for 59.0.0. \n\n# What changes are included in this PR?\nRemoves public APIs that were due to be removed per the [deprecation\npolicy](https://github.com/apache/arrow-rs#deprecation-guidelines).\n\n# Are these changes tested?\nShould be covered by existing tests\n\n# Are there any user-facing changes?\nYes, public functions have been removed, including\n`ParquetMetaDataReader::with_page_indexes`, `read_columns_indexes`, and\n`read_offset_indexes`."
    },
    {
      "commit": "511ad068ae2b7f511e16215d9d3c96fb0f334f2c",
      "tree": "810e5dc1631f9d3b5284bd1d3fcb2ff6fc2257ae",
      "parents": [
        "9450f1e9cc17061049309e98b0cbccca91e7dfb7"
      ],
      "author": {
        "name": "Dan Mattheiss",
        "email": "dmatth1@users.noreply.github.com",
        "time": "Sat May 30 23:18:11 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 20:18:11 2026 -0700"
      },
      "message": "bench(parquet): add Sbbf check/insert benchmarks (#10041)\n\nAdds `bench_check` and `bench_insert` benchmarks\nfor`Sbbf::{check,insert}`. Originally benchmarks were part of #10011 but\nwere split out to follow Contributing guidelines\n\n# Are these changes tested?\n\nBenchmarks compiled and ran using `cargo bench -p parquet --bench\nbloom_filter`.\n\n# Are there any user-facing changes?\n\nNo."
    },
    {
      "commit": "9450f1e9cc17061049309e98b0cbccca91e7dfb7",
      "tree": "83fc608a460f1c991e969710f7636c1644d40a98",
      "parents": [
        "1377761779afb1ce70a7a9a9038a308d2ff1ab88"
      ],
      "author": {
        "name": "Matt Butrovich",
        "email": "mbutrovich@users.noreply.github.com",
        "time": "Fri May 29 15:42:00 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 15:42:00 2026 -0400"
      },
      "message": "Call `align_buffers()` in `from_ffi`, remove redundant call from `arrow-pyarrow` (#10030)\n\n# Which issue does this PR close?\n\n- Closes #10028.\n\n# Rationale for this change\n\n`from_ffi` / `from_ffi_and_data_type` (and therefore\n`ArrowArrayStreamReader`) panic inside `ScalarBuffer::\u003ci128\u003e::from` when\nan FFI producer hands over a `Decimal128` buffer that is 8-byte aligned\nbut not 16-byte aligned. The producer is spec-conformant — the C Data\nInterface only recommends 8-byte alignment — but `align_of::\u003ci128\u003e() \u003d\u003d\n16` since Rust 1.77 on x86 (always on ARM), so arrow-rs\u0027s typed arrays\nrequire 16. JVM producers like arrow-java\u0027s `NettyAllocationManager` hit\nthis regularly.\n\nThe IPC reader already handles this by calling\n`ArrayData::align_buffers()` on import (default of\n`IpcReadOptions::require_alignment`, see #5554), and `arrow-pyarrow` was\npatched the same way for #6471 / apache/arrow#43552. The C Data\nInterface entry points were the missing piece.\n\n# What changes are included in this PR?\n\n- `arrow::ffi::from_ffi` and `from_ffi_and_data_type`: call\n`data.align_buffers()` after `consume()`. No-op when buffers are already\naligned; depends on #6462 making `align_buffers` recursive over child\ndata.\n- `arrow-pyarrow`: drop the now-redundant `array_data.align_buffers()`\ncall; it\u0027s covered by `from_ffi`.\n\n# Are these changes tested?\n\nYes. New regression test `test_decimal128_under_aligned_round_trip` in\n`arrow-array/src/ffi.rs` constructs an 8-aligned-not-16-aligned\n`Decimal128` buffer via `Buffer::from_vec(...).slice(8)`, imports\nthrough `from_ffi`, and asserts the resulting `Decimal128Array` values\nare correct. The test panics without the fix with the exact error from\n#10028.\n\n# Are there any user-facing changes?\n\nNo API changes. 
Behavior change: `from_ffi` / `from_ffi_and_data_type`\n(and `ArrowArrayStreamReader::next`) now silently realign under-aligned\nbuffers instead of panicking. Already-aligned producers are unaffected;\nmisaligned producers that previously panicked now succeed with a\none-time copy of the offending buffer."
    },
    {
      "commit": "1377761779afb1ce70a7a9a9038a308d2ff1ab88",
      "tree": "ab73c53ac8702431afd1da38df35f49d9edcfd68",
      "parents": [
        "e470187b93b13bf9821e67d5b2348a1d89612a39"
      ],
      "author": {
        "name": "eunsang",
        "email": "thecynicdog0328@gmail.com",
        "time": "Fri May 29 03:05:56 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 11:05:56 2026 -0700"
      },
      "message": "Validate FIXED_LEN_BYTE_ARRAY length for DECIMAL and INTERVAL types (#9985)\n\n## Which issue does this PR close?\n\n- Closes #9984.\n\n## Rationale for this change\n\n`from_fixed_len_byte_array` in `parquet/src/arrow/schema/primitive.rs`\ndoes not validate `type_length`. While `PrimitiveTypeBuilder::build()`\nenforces these constraints during schema construction\n(`schema/types.rs:477` for INTERVAL, `:565-580` for DECIMAL), schemas\ndecoded directly from Thrift bypass that validation path entirely. As a\nresult:\n\n- `DECIMAL` with a `type_length` outside `1..\u003d32` was silently routed\nthrough `Decimal128` / `Decimal256` using invalid parameters.\n- `INTERVAL` with a `type_length !\u003d 12` silently returned\n`Interval(DayTime)` regardless.\n\nThe same function already rejects `FLOAT16` when `type_length !\u003d 2`.\nThis PR mirrors that pattern for DECIMAL and INTERVAL, closing the TODO\nintroduced in #1682.\n\n## What changes are included in this PR?\n\n- Added a `check_decimal_length` helper to reject `type_length` values\noutside `1..\u003d32` for both `LogicalType::Decimal` and\n`ConvertedType::DECIMAL`.\n- Added an inline `type_length \u003d\u003d 12` check for\n`ConvertedType::INTERVAL`.\n\n## Are these changes tested?\n\nYes. Added five new tests in\n`parquet/src/arrow/schema/primitive.rs::tests` covering:\n- Invalid lengths (`{-1, 0, 33}` for DECIMAL, `{0, 11, 13}` for\nINTERVAL)\n- Valid lengths (16 → `Decimal128`, 32 → `Decimal256`, 12 →\n`Interval(DayTime)`)\n\nTo exercise the reader-side check, the tests construct a valid\n`Type::PrimitiveType` via the builder and directly modify the\n`type_length` on the resulting enum, simulating a malformed schema\ndecoded from Thrift.\n\n## Are there any user-facing changes?\n\nNo public API changes. 
The only behavior change is on the reader side:\nschemas with an out-of-range `type_length` for DECIMAL or INTERVAL will\nnow return a `ParquetError::General` instead of silently producing a\nmismatched Arrow type."
    },
    {
      "commit": "e470187b93b13bf9821e67d5b2348a1d89612a39",
      "tree": "783588ca8641241d8bd24e3d5a476ee856a57586",
      "parents": [
        "2eeb805b8a8b8ca67788917ec2f5220eb3e6f958"
      ],
      "author": {
        "name": "Neil Conway",
        "email": "neil.conway@gmail.com",
        "time": "Wed May 27 16:16:41 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 16:16:41 2026 -0400"
      },
      "message": "fix: Reject empty strings when casting strings to decimal (#10010)\n\n# Which issue does this PR close?\n\n- Closes #10009.\n\n# Rationale for this change\n\nWhen casting string values to decimal, `parse_string_to_decimal_native`\ntreated empty strings and whitespace-only strings as valid input,\nresulting in a decimal with a value of 0. This is inconsistent with\n`parse_decimal` and how parsing and string -\u003e numeric casts work for\nfloating point types: in all of those cases, empty strings and\nwhitespace-only strings are rejected.\n\n# What changes are included in this PR?\n\n* Change `parse_string_to_decimal_native` to reject empty strings and\nwhitespace-only strings\n* Add test coverage\n\n# Are these changes tested?\n\nYes, new tests added.\n\n# Are there any user-facing changes?\n\nYes, this changes the behavior of string -\u003e decimal casts. The previous\nbehavior is (IMO) clearly incorrect but it is possible that some user\ncode relies upon it."
    },
    {
      "commit": "2eeb805b8a8b8ca67788917ec2f5220eb3e6f958",
      "tree": "0c9ff0976f1a49b208a79cb7d701d8a01ac37354",
      "parents": [
        "40ebcf03a12cd5934d4b9cf6ccd6e970eb658366"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Wed May 27 03:39:21 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 17:39:21 2026 +1000"
      },
      "message": "Implement AnyRee (#9959)\n\n# Which issue does this PR close?\ncloses #9909.\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9909.\n\n# Rationale for this change\nmakes the API simpler to work with \u0026 less code duplication\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\nReplace the per-key-type RunEndEncoded match arms in length/bit_length\n(arrow-string) and date_part (arrow-arith) with a single dispatch\nthrough the new `AsArray::as_any_ree_opt/as_any_ree` returning \u0026dyn\nAnyRunEndArray, mirroring the existing dictionary handling. This removes\nthe\nnow-unused `ree_map!` macro, leaving one trait-object code path for all\nInt16/Int32/Int64 run-end types.\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\nyes\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\nno\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "40ebcf03a12cd5934d4b9cf6ccd6e970eb658366",
      "tree": "a83bf14a12fa6a993eadab1023cb7f8cc4f547b4",
      "parents": [
        "bbbe8a60b950d70b0f59991ee2099eb17f65ceb8"
      ],
      "author": {
        "name": "RIchard Baah",
        "email": "137434454+Rich-T-kid@users.noreply.github.com",
        "time": "Wed May 27 03:37:42 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 17:37:42 2026 +1000"
      },
      "message": "benchmarks for writing REE arrays to parquet (#9936)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9935.\n\n# Rationale for this change\nthere is no way to currently tell which approach to writing out REE\ncolumns to parquet is more performant. This PR aims to solve that.\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\nAdded a `create_string_ree_bench_batch()` function that builds record\nbatches of REE data — it plugs into the existing benchmark structure.\n\nFor controlling the shape of the generated REE arrays, I currently have\ntwo constants, `MIN_RUN` and `MAX_RUN`, that bound the run length. The\nintent is to let benchmarks cover long uniform runs as well as shorter /\nmore sparse data, rather than only one shape.\n\nAn alternative would be a small params struct with defaults that callers\ncan override — happy to switch to that if it\u0027s preferred, but that would\nrequire changing other callsites\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\nyes\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\nno\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "bbbe8a60b950d70b0f59991ee2099eb17f65ceb8",
      "tree": "dc952a158fe3fe6412634f232b1bc32119469e2d",
      "parents": [
        "2b2a95ac966ba73b3d020259d43111d2f76c557b"
      ],
      "author": {
        "name": "Adrian Garcia Badaracco",
        "email": "1755071+adriangb@users.noreply.github.com",
        "time": "Tue May 26 18:40:23 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 26 16:40:23 2026 -0700"
      },
      "message": "bench(parquet): add short and large string `arrow_writer` benchmarks (#10021)\n\n# Which issue does this PR close?\n\nSplit out of #9972 per [this review\ncomment](https://github.com/apache/arrow-rs/pull/9972#discussion_r3307256819).\n\n# Rationale for this change\n\n#9972 makes the parquet writer\u0027s mini-batch sizing byte-budget aware so\nlarge variable-width values don\u0027t produce oversized data pages. To\nmeasure that change against a stable baseline — and in particular to see\nthe difference in the large-string case — these benchmarks belong on\n`main` first.\n\n# What changes are included in this PR?\n\nAdds two BYTE_ARRAY write cases to the `arrow_writer` criterion bench:\n\n- **`short_string_non_null`** — 1M fixed-width 8-byte strings. The\nsmall-value hot path, where byte-budget-based sub-batch sizing should\nalways resolve to the full chunk (no granular splitting, no regression).\n- **`large_string_non_null`** — 1024 × 256 KiB strings (256 MiB total).\nThe large-value case: with the default 1 MiB page byte limit each value\nneeds its own page, and a `write_batch_size` of 1024 would otherwise\nbuffer all 256 MiB before the post-write size check runs.\n\nNo library code changes — benchmarks only.\n\n# Are there any user-facing changes?\n\nNo.\n\n🤖 Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "2b2a95ac966ba73b3d020259d43111d2f76c557b",
      "tree": "f78b91135aaaaa188c3203d86b1cc0395e912530",
      "parents": [
        "e28fd0d0f1f45e5608928eb52af3b12ffe9fee2e"
      ],
      "author": {
        "name": "ClSlaid",
        "email": "cailue@apache.org",
        "time": "Wed May 27 03:36:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 26 15:36:47 2026 -0400"
      },
      "message": "arrow: add oversized coalesce take benchmarks (#9799)\n\n## Summary\n- add `coalesce_kernels` take benchmarks alongside the existing filter\ncoverage for primitive, view, utf8, and dictionary schemas\n- add oversized repeated-index benchmark cases that stress the\nmaterialized `push_batch_with_indices` fallback with a configured\n`biggest_coalesce_batch_size`\n- extend the benchmark harness so take benchmarks can use explicit\noutput lengths and optionally drain all completed batches while running\n\n## Verification\n- cargo fmt -p arrow\n- cargo clippy -p arrow --bench coalesce_kernels --features test_utils\n-- -D warnings\n- cargo bench -p arrow --bench coalesce_kernels --features test_utils --\n\u0027extra_large_repeat\u0027\n\nSigned-off-by: 蔡略 \u003ccailue@apache.org\u003e"
    },
    {
      "commit": "e28fd0d0f1f45e5608928eb52af3b12ffe9fee2e",
      "tree": "9db8da9c4e7785e2dc9476c92b691ea2b4d0b733",
      "parents": [
        "c4e154f38b487b3bdefd2d3c7c429bf12a2bcb81"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Mon May 25 07:19:27 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 25 07:19:27 2026 -0400"
      },
      "message": "Add `DatePart` enum 1-indexed variants (#9965)\n\n# Which issue does this PR close?\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9964.\n\n# Rationale for this change\n\nRemove extra computations done on the datafusion side for temporal\nfunctions.\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n- Added two new `DatePart` variant\n- Closed some tests lists drift\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\n\n- Unit tests\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\n# Are there any user-facing changes?\n\nNo\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c4e154f38b487b3bdefd2d3c7c429bf12a2bcb81",
      "tree": "354e14f2abc193379541e1ed8c9542db935d6055",
      "parents": [
        "aa61e0724612e8eb9e576de2903827e9d6c23a8d"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Mon May 25 07:18:33 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 25 07:18:33 2026 -0400"
      },
      "message": "Support Shredded Lists/Array in `variant_get` (#8354)\n\n# Which issue does this PR close?\n\n- closes #9443.\n\n# Rationale for this change\n\nWe should be able to `variant_get` using Indices to path through\n`VariantArray`s\n\n# What changes are included in this PR?\n\n# Are these changes tested?\n\nYes, unit tested.\n\n# Are there any user-facing changes?\n\n---------\n\nCo-authored-by: Congxian Qiu \u003cqcx978132955@gmail.com\u003e\nCo-authored-by: Ryan Johnson \u003cscovich@users.noreply.github.com\u003e"
    },
    {
      "commit": "aa61e0724612e8eb9e576de2903827e9d6c23a8d",
      "tree": "87f496aed91e436a3e17e35ac73d679aac36f256",
      "parents": [
        "1c679ef8a7d0481596a29736a0bdea10f7eb4fed"
      ],
      "author": {
        "name": "Liam Bao",
        "email": "liam.zw.bao@gmail.com",
        "time": "Mon May 25 07:18:05 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 25 07:18:05 2026 -0400"
      },
      "message": "[arrow-select] Replace `ArrayData` with direct `Array` construction in filter kernels (#9986)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Part of #9298.\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n- Replaces several `ArrayDataBuilder` paths in\n`arrow-select/src/filter.rs` with direct typed array constructors.\n- Adds a small helper for filtered null buffers that reuses the\nalready-computed null count.\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\nCovered by exsiting tests\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\nNo\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "1c679ef8a7d0481596a29736a0bdea10f7eb4fed",
      "tree": "d31705c8d3dceb155908484955c14fa46a931b81",
      "parents": [
        "8acab7b5371470deff6c211899295d2bb3030dfc"
      ],
      "author": {
        "name": "Sergei",
        "email": "sv.sokolov@gmail.com",
        "time": "Mon May 25 18:00:30 2026 +0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 25 07:00:30 2026 -0400"
      },
      "message": "fix(parquet): exclude single-leaf struct roots from predicate cache (#9983)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9982 .\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n## Root cause\n\n`ProjectionMask::without_nested_types` (`parquet/src/arrow/mod.rs:427`)\ndecides which leaves the predicate cache may cover. The check before\nthis fix was:\n\n```rust\nif root_leaf_counts[root_idx] \u003d\u003d 1 \u0026\u0026 !root.is_list() {\n    included_leaves.push(leaf_idx);\n}\n```\n\nPR #8866 added `!root.is_list()` to exclude lists, but a **struct** root\nwith a single leaf still satisfies the condition and gets cached.\n\n## Fix (1 line)\n\n`parquet/src/arrow/mod.rs:455`:\n\n```diff\n-                if root_leaf_counts[root_idx] \u003d\u003d 1 \u0026\u0026 !root.is_list() {\n+                if root_leaf_counts[root_idx] \u003d\u003d 1 \u0026\u0026 root.is_primitive() {\n                     included_leaves.push(leaf_idx);\n                 }\n```\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n## Tests added on the branch\n\n### 1. Reproducer integration test\n**File:** `parquet/tests/arrow_reader/predicate_cache.rs`\n**Name:** `test_async_predicate_on_single_leaf_nullable_struct`\n\nBuilds an in-memory Parquet file with `OPTIONAL group b { REQUIRED\nBYTE_ARRAY aa (UTF8); }`, writes two rows (parent NULL, parent\nnon-NULL), then runs the same `IS NULL` row filter through the async\nreader twice: once with the default cache, once with\n`with_max_predicate_cache_size(0)`. It asserts that\n\n- the uncached control yields exactly 1 row (`address` NULL row\nmatches);\n- the cached run yields the same row count as the uncached one.\n\n**Pre-fix:** panic at `struct_array.rs:142`.\n**Post-fix:** passes (1 row in both cases).\n\n### 2. Unit test\n**File:** `parquet/src/arrow/mod.rs` (test module)\n**Name:** `test_projection_mask_without_nested_single_leaf_struct`\n\nDirectly checks `ProjectionMask::without_nested_types` against a schema\nwith `OPTIONAL group address { REQUIRED BYTE_ARRAY street; } REQUIRED\nINT32 id`, for three input masks (single nested leaf, mixed, all\nleaves). 
All three expected outputs reflect that the struct\u0027s leaf is\nnow considered nested.\n\n**Pre-fix:** would return `Some([street_leaf])` for the single-leaf-only\nmask.\n**Post-fix:** returns `None` for the single-leaf-only mask; returns\n`Some([id])` for mixed.\n\n## Verification matrix\n\n| Test | Pre-fix | Post-fix |\n|---|---|---|\n| `test_projection_mask_without_nested_single_leaf_struct` (new unit) |\nwould FAIL | PASS |\n| `test_async_predicate_on_single_leaf_nullable_struct` (new\nintegration) | PANIC | PASS |\n| `predicate_cache::test_default_read` | PASS | PASS |\n| `predicate_cache::test_async_cache_with_filters` | PASS | PASS |\n| `predicate_cache::test_sync_cache_with_filters` | PASS | PASS |\n| `predicate_cache::test_cache_disabled_with_filters` | PASS | PASS |\n| `predicate_cache::test_cache_projection_excludes_nested_columns` |\nPASS | PASS |\n| `test_projection_mask_without_nested_*` (5 existing) | PASS | PASS |\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "8acab7b5371470deff6c211899295d2bb3030dfc",
      "tree": "74eb00a68a01bba67640b8fb34cb698290295120",
      "parents": [
        "7b335b7712089c1270ab1a989336908a240cc812"
      ],
      "author": {
        "name": "Neil Conway",
        "email": "neil.conway@gmail.com",
        "time": "Sun May 24 21:57:45 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 25 11:57:45 2026 +1000"
      },
      "message": "feat: Implement decimal \u003c-\u003e float16 casts (#10008)\n\n# Which issue does this PR close?\n\n- Closes #9123.\n\n# Rationale for this change\n\nArrow supports casts between decimal and float64/float32; for\nconsistency and completeness, we should also support casts between\ndecimal and float16.\n\nIn DataFusion, this will be particularly useful: once\nhttps://github.com/apache/datafusion/issues/14612 is fixed,\n`arrow_cast(0.0, \u0027Float16\u0027)` will no longer work, unless we first add\nsupport for decimal -\u003e float16 casts in arrow-rs.\n\n# What changes are included in this PR?\n\n* Add support for decimal -\u003e float16 cast\n* Add support for float16 -\u003e decimal cast\n* Add unit tests for new behavior\n* Update docs/comment on supported casts\n\n# Are these changes tested?\n\nYes; new tests added.\n\n# Are there any user-facing changes?\n\nYes; new casts are now supported. Otherwise no changes."
    },
    {
      "commit": "7b335b7712089c1270ab1a989336908a240cc812",
      "tree": "93f25d63635f0a6d0fb6e9fb848243a84fe9acf3",
      "parents": [
        "df89537ea98dd2b8228fdcb94cfd6a6b92935081"
      ],
      "author": {
        "name": "BoazC-MSFT",
        "email": "boazc@microsoft.com",
        "time": "Fri May 22 21:26:57 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 14:26:57 2026 -0400"
      },
      "message": "fix: prevent panic in record reader when row group metadata overcounts num_rows (#9993)\n\n# Which issue does this PR close?\n\n- Closes #9992.\n\n# Rationale for this change\n\nThe record reader (`RowIter` / `get_row_iter`) panics with `index out of\nbounds` when a Parquet file\u0027s row group metadata declares more rows than\na column chunk actually contains. This happens in production when\nreading third-party Parquet files with mismatched metadata. Instead of\npanicking, the reader should return an error.\n\n# What changes are included in this PR?\n\nThree layers of fix in `parquet/src/record/`:\n\n**triplet.rs - fix the inconsistent internal state:**\n- Reset `curr_triplet_index` to 0 in the exhaustion path of `read_next`,\nso the stale index from the previous batch never persists alongside\nempty buffers.\n- Return 0 from `current_def_level` and `current_rep_level` when\n`has_next` is false, as defense-in-depth against any caller that skips\nthe `has_next` check.\n\n**reader.rs - return errors instead of panicking:**\n- Add `has_next()` guards before consuming column data in all\n`read_field` variants: `PrimitiveReader`, `OptionReader`,\n`RepeatedReader`, and `KeyValueReader`. When a column is exhausted\nmid-iteration, `read_field` now returns `Err(\"Unexpected end of column\ndata\")` which propagates through `ReaderIter::next` as `Some(Err(...))`.\n\n# Are these changes tested?\n\nYes. 
Five new tests:\n\n- `test_current_def_level_safe_after_exhaustion` - drives a\n`TripletIter` to exhaustion on an optional column and asserts\n`current_def_level()` returns 0 instead of panicking.\n- `test_current_rep_level_safe_after_exhaustion` - same for\n`current_rep_level()` on a repeated column.\n- `test_reader_iter_returns_error_when_num_records_exceeds_data` -\nexercises the full `ReaderIter` stack with an optional field (via\n`nulls.snappy.parquet`).\n-\n`test_reader_iter_returns_error_for_repeated_field_when_num_records_exceeds_data`\n- same for a repeated primitive field (via\n`repeated_primitive_no_list.parquet`).\n-\n`test_reader_iter_returns_error_for_map_field_when_num_records_exceeds_data`\n- same for a map field projected alone (via `map_no_value.parquet`).\n\nEach integration test inflates `num_records` by 1 beyond actual data,\nasserts all real rows return `Ok`, and asserts the extra iteration\nreturns `Err` containing \"Unexpected end of column data\".\n\n# Are there any user-facing changes?\n\nCallers of `get_row_iter` or `RowIter` that previously hit a panic on\ncorrupt/truncated files will now receive an `Err` from the iterator\ninstead. No API signature changes."
    },
    {
      "commit": "df89537ea98dd2b8228fdcb94cfd6a6b92935081",
      "tree": "3423d89599fba16f781483b3a8d0b6a6e776f69e",
      "parents": [
        "5372e8acf70ba6a9455d8b98edddb7368e4c71f5"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Fri May 22 14:24:20 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 14:24:20 2026 -0400"
      },
      "message": "[Variant] remove `BorrowedShreddingState` (#9791)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9790.\n\n# Rationale for this change\n\nCheck issue\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n- Drop `BorrowedShreddingState`\n- Replace it with `ShreddingState`\n- ~~Removed the lifetimes in `unshred_variant` as they required helpers\nto cover recursive `ShreddingState` handling.~~\n- ~~Lifetimes removal introduces clone on `NullBuffer`. Extra 3 usize\n(24 bytes) per `Array`. Only used in `NullUnshredVariantBuilder`~~\nRemoved the only place where `NullBuffer` was stored. No regression.\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\n# Are these changes tested?\n\nYes, unit tests.\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n--\u003e\n\n# Are there any user-facing changes?\n\nNo.\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "5372e8acf70ba6a9455d8b98edddb7368e4c71f5",
      "tree": "e8315681393e0d3e64470b7f187d8508afa751db",
      "parents": [
        "8c88bd49b5c4c019b3473b652267b46464d2e58f"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Fri May 22 19:23:31 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 14:23:31 2026 -0400"
      },
      "message": "Fix parquet-variant build on wasm targets (#9978)\n\n# Which issue does this PR close?\n\n- Closes #9977\n\n# Rationale for this change\n\nSeems like a trivial fix to get it building on more targets.\n\n# What changes are included in this PR?\n\n1. enables a feature for `uuid` that is required to build on WASM (only\nwhen building for WASM).\n2. Change the const size assertions to take pointer_width into account\n\n# Are these changes tested?\n\nI\u0027ve tested the change locally (for both WASM targets), not sure if its\nworth it to add to CI that currently only tests the top-level `arrow`\ncrate on WASM\n\n# Are there any user-facing changes?\n\nNone"
    },
    {
      "commit": "8c88bd49b5c4c019b3473b652267b46464d2e58f",
      "tree": "99609e80b5d43a517c3ec1cef293ea99ccadc07f",
      "parents": [
        "edfb9aba45e3fba6fdc52d525c8d4b4132a0d857"
      ],
      "author": {
        "name": "Kevin Choubacha",
        "email": "2043529+choubacha@users.noreply.github.com",
        "time": "Fri May 22 11:22:26 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 14:22:26 2026 -0400"
      },
      "message": "Adds is_null function to RowAccessor (#9979)\n\n# Which issue does this PR close?\n\n- Closes #8076.\n\n# Rationale for this change\n\nWhen dealing with parquet files directly, having a built in option for\nchecking for nulls is useful when reading the rows out of the file. This\nfunction allows gating calls to the other accessors so that they can be\nmapped to None instead of relying on errors or accessing the field vec\nitself.\n\n# Are these changes tested?\n\nYes, a test was added.\n\n# Are there any user-facing changes?\n\nThis does add to the Row api. The trait function has a documentation\nblock."
    },
    {
      "commit": "edfb9aba45e3fba6fdc52d525c8d4b4132a0d857",
      "tree": "8c3075af8e41cd69fdbc548b016cb93be2de1c71",
      "parents": [
        "2f923f72989ab9df0cb02c749891c5ab3093f743"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Fri May 22 08:33:47 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 08:33:47 2026 -0700"
      },
      "message": "Use Thrift macro to generate Parquet `LogicalType` serialization code (#9997)\n\n# Which issue does this PR close?\n\n- Closes #9995.\n\n# Rationale for this change\nSee issue. Improve code maintainability by using thrift macro to\ngenerate `LogicalType` serialization code.\n\n# What changes are included in this PR?\n\nAdds a new macro to generate code for a Thrift `union` that needs to be\nforward compatible. Does this by adding a catchall `_Unknown` variant\nfor unknown field ids.\n\n# Are there any user-facing changes?\nYes this is a breaking API change because the `LogicalType` enum will\nnow use tuple variants rather than struct. This also makes public some\nstructs that were previously private.\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "2f923f72989ab9df0cb02c749891c5ab3093f743",
      "tree": "7f02c3014d6ccb7c4ed974eb0587723897cd896e",
      "parents": [
        "c46f419b4f87319637869021d46823abeb33ce1f"
      ],
      "author": {
        "name": "Swanand Mulay",
        "email": "73115739+swanandx@users.noreply.github.com",
        "time": "Fri May 22 12:28:39 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 22 16:58:39 2026 +1000"
      },
      "message": "fix(arrow-cast): support full Date32 range when parsing extended-year dates (#9961)\n\n`Date32Type::parse` previously used `chrono::NaiveDate`, which caps at\nroughly +-262,143 years and rejected valid ISO 8601 extended-year inputs\nlike `+2739877-01-03`\n\nAs Gregorian repeats in 400-year era (146,097 days), we find the current\nera and then calculate \u0026 validate the date in current era. We recover\nthe absolute day count by adding era * 146,097.\n\n\nClaude code\u0027s help was taken to come up with this.\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9960 \n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\nSupporting full range of date\u0027s allows other dependents like delta-rs to\nparse data written/managed by other engines like Spark which support\nfull Date32\n\n\nchanging `parse_date()` signature is also other option but would need\nchanges with Date64 as well.\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n\ncalculating number of days without converting it full extended year to\nNaiveDate. And tests for it.\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\nAdded tests; also relying on existing tests for verification.\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nPossibly: some inputs that previously produced None or an error now\nparse successfully.\n\nSigned-off-by: Swanand Mulay \u003c73115739+swanandx@users.noreply.github.com\u003e"
    },
    {
      "commit": "c46f419b4f87319637869021d46823abeb33ce1f",
      "tree": "a1a8b9e6b7d653350874cefe27d3b7bdf00b0841",
      "parents": [
        "f7907211873fef5f80ae22b1c3779dd24041e940"
      ],
      "author": {
        "name": "mwish",
        "email": "maplewish117@gmail.com",
        "time": "Fri May 22 02:17:11 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 21 14:17:11 2026 -0400"
      },
      "message": "fix(cast): Trying to fix cast losting schema problem (#10005)\n\n# Which issue does this PR close?\n\n\n- Closes #10004 .\n\n# Rationale for this change\n\nPreviously, just `data_type` is considered. Now the field is taking into\naccount.\n\n# What changes are included in this PR?\n\nPreviously, just `data_type` is considered. Now the field is taking into\naccount.\n\n# Are these changes tested?\n\nYes\n\n# Are there any user-facing changes?\n\nMaybe cast would be a bit more strict"
    },
    {
      "commit": "f7907211873fef5f80ae22b1c3779dd24041e940",
      "tree": "ea05fa1ccfb0ef9a5a996516096797a77c59cb1e",
      "parents": [
        "4b80f0e1587b003aa01082fc3f8b15873800f219"
      ],
      "author": {
        "name": "Minh Vu",
        "email": "vuhoangminh97@gmail.com",
        "time": "Thu May 21 18:03:54 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 21 09:03:54 2026 -0700"
      },
      "message": "fix(parquet): validate INT96 column metadata statistics (#10003)\n\n# Which issue does this PR close?\n\nCloses #10002.\n\n# Rationale for this change\n\nMalformed Parquet footer metadata can contain INT96 statistics whose\nencoded min or max value is longer than 12 bytes. The footer metadata\nconversion path checked that INT96 statistics were at least 12 bytes,\nbut then asserted they were exactly 12 bytes. That allowed malformed\ninput to panic instead of returning an error.\n\nThe page-statistics path already returns an error for non-12-byte INT96\nstatistics, so this change makes the footer metadata path behave\nconsistently.\n\n# What changes are included in this PR?\n\nThis PR replaces the INT96 min/max length assertions in footer metadata\nstatistics conversion with explicit `ParquetError` returns.\n\nIt also adds a regression test covering overlong INT96 min and max\nvalues in column metadata statistics.\n\n# Are these changes tested?\n\nYes. I ran:\n\n- `cargo fmt --all`\n- `cargo +stable fmt --all -- --check`\n- `cargo fmt -p parquet -- --check --config skip_children\u003dtrue $(find\n./parquet -name \"*.rs\" ! -name format.rs)`\n- `cargo test -p parquet --lib\nfile::metadata::thrift::tests::test_convert_stats_returns_error_for_overlong_int96_statistics`\n- `cargo test -p parquet --lib file::metadata::thrift::tests`\n- `cargo test -p parquet`\n- `cargo check -p parquet --all-targets`\n- `cargo clippy -p parquet --all-targets --all-features -- -D warnings`\n\n# Are there any user-facing changes?\n\nMalformed INT96 column metadata statistics now return an error instead\nof panicking."
    },
    {
      "commit": "4b80f0e1587b003aa01082fc3f8b15873800f219",
      "tree": "b6b6532d87f5974317a7dc70b92a6f46b93e5214",
      "parents": [
        "dc821a93e9eb0b49b77cbea12915076f7142bedf"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Wed May 20 17:08:20 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 17:08:20 2026 -0400"
      },
      "message": "feat(parquet): generalize value encoder inputs (#9955)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Spawn off from #9653 \n- Contributes to #9731\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\nSee #9731\n\n# What changes are included in this PR?\n\nChanges `byte_array` encoder methods (`FallbackEncoder::encode`,\n`DictEncoder::encode`, etc) and all `get_*_array_slice` functions from\n`\u0026[usize]` to `impl ExactSizeIterator\u003cItem \u003d usize\u003e`.\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\nAll tests passing.\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nNone.\n\nSigned-off-by: Hippolyte Barraud \u003chippolyte.barraud@datadoghq.com\u003e"
    },
    {
      "commit": "dc821a93e9eb0b49b77cbea12915076f7142bedf",
      "tree": "ba2ac730e545d7e2d619f37f65649a25d04c882f",
      "parents": [
        "73c513afd36b0c6a3b41ef9dc135aee568184255"
      ],
      "author": {
        "name": "Rishab Joshi",
        "email": "8187657+rishvin@users.noreply.github.com",
        "time": "Wed May 20 13:44:00 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 16:44:00 2026 -0400"
      },
      "message": "Add support for FixedSizeList to variant_to_arrow (#9663)\n\n- Closes #9531 \n\n# Rationale for this change\n\nAdd support for `FixedSizeList` when invoking `variant_to_arrow`.\n\n\n# What changes are included in this PR?\n- Introduces a new builder `VariantToFixedSizeListArrowRowBuilder`.\n- Adds test cases for shredding and getting variant by `FixedSizeList`.\n\n# Are these changes tested?\nBy adding few test cases.\n\n# Are there any user-facing changes?\nN/A.\n\n---------\n\nCo-authored-by: Konstantin Tarasov \u003c33369833+sdf-jkl@users.noreply.github.com\u003e"
    },
    {
      "commit": "73c513afd36b0c6a3b41ef9dc135aee568184255",
      "tree": "58a686e6f0b742e53ebe75c32f77b50a750387d6",
      "parents": [
        "39c8814f04150f4b8f1c3b30d7914022e7c2a292"
      ],
      "author": {
        "name": "RyanStewart",
        "email": "47729789+RyanJamesStewart@users.noreply.github.com",
        "time": "Wed May 20 11:49:47 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 14:49:47 2026 -0400"
      },
      "message": "Bulk-fill definition levels for majority-null leaf columns (#9967)\n\n## Which issue does this PR close?\n\n- Contributes to #9731.\n\n## AI assistance\n\nImplementation drafted with AI assistance and iterated against the\nbenchmarks below. I\u0027ve reviewed and own the code, including the gate\nthreshold which I picked from the sweep in [Threshold\n(`BULK_FILL_MIN_LEN`)](#threshold-bulk_fill_min_len). Per the project\u0027s\n[CONTRIBUTING guidance on AI-generated\nsubmissions](https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md#ai-generated-submissions).\n\n## Rationale for this change\n\nWhen writing a nullable leaf (primitive) Arrow array, `write_leaf`\nbuilds the definition-level buffer one element at a time, mapping each\nnull bit to a level. For columns that are mostly null this does\n~`num_rows` of branchy work and allocates a `num_rows`-element level\nbuffer even though almost every produced level is the same value. #9954\nadds an O(1) fast path for the *entirely* null case; this PR covers the\n*sparse* (mostly-but-not-entirely null) case it doesn\u0027t handle, the\nliteral subject of #9731 (\"a column that is 99% null … ~100x more work\nthan necessary\").\n\n## What changes are included in this PR?\n\nA single popcount pass over the null mask\n(`Buffer::count_set_bits_offset`, O(`num_rows`/64)) counts the valid\nvalues in the range. When the slice is majority-null, the\ndefinition-level buffer is bulk-filled with the null level (a vectorized\n`Vec::resize` memset) and only the non-null positions (from\n`NullBuffer::valid_indices()`) are overwritten. The existing per-row\npath is kept for non-majority-null slices, so balanced and null-light\ncolumns are unaffected. Both branches share the same `let range_nulls \u003d\nnulls.slice(range.start, len)` slicing idiom; the slow path uses\n`range_nulls.iter()` for the def-level map and\n`range_nulls.valid_indices().map(|i| i + range.start)` for\n`non_null_indices`, with no `unsafe`. 
Output is byte-identical: the\nlevel *values* are unchanged, just produced via memset+scatter (fast\npath) or via the high-level `NullBuffer` iterators (slow path) instead\nof a manual `BitIndexIterator` walk.\n\n## Threshold (`BULK_FILL_MIN_LEN`)\n\nThe bulk-fill fast path is gated on two conditions:\n\n- `len \u003e\u003d BULK_FILL_MIN_LEN` (currently 64). Per-call\nslice/popcount/iterator overhead only amortizes on sizable sub-ranges.\nList/struct paths call `write_leaf` many times with tiny ranges (avg\nlist length 1-5); paying any per-call popcount there would regress them.\nA threshold sweep at T \u003d {0, 16, 32, 64, 128, 256} on Ryzen 9 9950X\nshows the regression floor settles by T\u003d32, and the choice of 64 gives\n~12x margin over the average list length without losing the\nflat-primitive wins.\n- `nulls.null_count() * 2 \u003e\u003d nulls.len()`. The cached `null_count()` is\nO(1), so this check is free. We use the buffer-wide density as a\nheuristic for the sub-range; for full-array writes (the primary target,\nflat primitive columns) it\u0027s exact.\n\nEven when the gate skips the fast path, evaluating it across\nhigh-frequency call sites (~10K calls in some list benchmarks) is a\nsmall structural cost (~1-2% on list-sparse cases). The wins on the\ntargeted shapes (-35% sparse-primitive, -66% all-null primitive) far\noutweigh that. Reducing the cost further would require hoisting the\ndecision into the caller.\n\n## Are these changes tested?\n\nExisting tests cover this path: `cargo test -p parquet --features arrow\n--lib arrow_writer` is green (136 tests, full of nulls and roundtrips);\nfull `cargo test -p parquet --features arrow` green modulo the\npre-existing `PARQUET_TEST_DATA` submodule failures (unrelated, same on\n`main`). `cargo clippy -p parquet --features arrow --lib` and `cargo fmt\n--check` clean. 
The `unsafe get_unchecked_mut` flagged in the original\nrevision was replaced via `NullBuffer::valid_indices()`; the slow-path\nalso dropped its `unsafe value_unchecked` for the same reason.\n\n## Are there any user-facing changes?\n\nNone.\n\n## Benchmarks\n\n`cargo bench -p parquet --bench arrow_writer`, 1M rows × 7 nullable\nprimitive columns, local Ryzen 9 9950X:\n\n```\nprimitive_sparse_99pct_null/default   11.88 ms -\u003e 9.13 ms   (-23%)   \u003c- the case #9731 calls out\nprimitive_all_null/default             5.65 ms -\u003e 2.33 ms   (-59%)   (subsumed by #9954\u0027s O(1) path if that lands first)\nstruct_sparse_99pct_null/default       5.67 ms -\u003e 5.32 ms   (-6%)\nstruct_all_null/default                1.52 ms -\u003e 1.31 ms   (-14%)\nlist_primitive_sparse_99pct_null, primitive (25% null), primitive_non_null, bool, string:  within noise (no regression)\n```\n\nThe CI benchmark bot (GKE `c4a-highmem-16`, Neoverse-V2) on the\npost-fixup revision shows the same shape with stronger relative wins on\nthe targeted cases:\n\n```\nprimitive_all_null/default              2.47x (11.0ms -\u003e 4.4ms)\nprimitive_sparse_99pct_null/default     1.60x (16.8ms -\u003e 10.5ms)\nprimitive_all_null/{bloom_filter,cdc,parquet_2,zstd,zstd_parquet_2}    1.38x to 2.48x\nprimitive_sparse_99pct_null/{...}        1.28x to 1.59x\nlist_primitive*, list_primitive_sparse_99pct_null*:                    1.00x to 1.01x (within noise)\n```\n\nMicrobench of the definition-level fill in isolation: 10.3x @ 100%-null,\n8.6x @ 99%, 5.2x @ 90%, 1.9x @ 50%, 0.93x @ 10%, 0.81x @ 0%. Crossover ≈\n12-15% null, clean win above ~25%; the `\u003e\u003d 50% null` guard is\nconservative.\n\nThis is the *materialization*-cost half of #9731 (~30% of the 99%-null\nwrite); the *walk*-cost half, a run-length input to the level encoder so\nthe column writer doesn\u0027t even iterate all `num_rows` levels, is the\nlarger structural change #9653 is heading toward. 
This PR is\ndeliberately small and isolated so it lands independently of and rebases\ncleanly under that work.\n\n---------\n\nCo-authored-by: Ryan Stewart \u003cnoreply@example.com\u003e"
    },
    {
      "commit": "39c8814f04150f4b8f1c3b30d7914022e7c2a292",
      "tree": "8b160bc6cb801ab8601bc5b91bfe772fa4ecb42f",
      "parents": [
        "e7c37ded17298172ebfbe349d9044ef100f82e40"
      ],
      "author": {
        "name": "Raz Luvaton",
        "email": "16746759+rluvaton@users.noreply.github.com",
        "time": "Wed May 20 21:44:42 2026 +0300"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 14:44:42 2026 -0400"
      },
      "message": "feat: extract `has_false` and `has_true` from BooleanArray to `BooleanBuffer` and reuse for no nulls (#9987)\n\n# Which issue does this PR close?\n\nN/A\n\n# Rationale for this change\n\nSo we can use the useful helpers without creating temp `BooleanArray`\n\n# What changes are included in this PR?\n\nExtract non null variants for computing `has_false` and `has_true` from\n`BooleanArray` to `BooleanBuffer` and call them instead, and copied the\ntests for not nullable\n\n# Are these changes tested?\nYes\n\n# Are there any user-facing changes?\n2 new functions\n\n\n-----\n\nCc @alamb as talked in:\n- https://github.com/apache/datafusion/pull/22158#discussion_r3241321436\n\n---------\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "e7c37ded17298172ebfbe349d9044ef100f82e40",
      "tree": "73272e5b47555930f1dce589f217f5645896454e",
      "parents": [
        "7c6eb2cbd958369fc2d44b5947bcfba480b6dbf8"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Wed May 20 11:44:15 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 14:44:15 2026 -0400"
      },
      "message": "Add helper functions to create `LogicalType` struct variants (#9996)\n\n# Which issue does this PR close?\n\n- Part of #9995.\n\n# Rationale for this change\nBefore switching `LogicalType` from struct variants to tuple variants,\nadd some helper functions that will hide some of the increase in\ncomplexity.\n\n# What changes are included in this PR?\nAdds functions to the `LogicalType` impl for creating instances of the\nnon-unit variants (`Integer`, `Decimal`, `Time`, `Timestamp`, `Variant`,\n`Geometry`, `Geography`).\n\n# Are these changes tested?\nShould be covered by existing tests.\n\n# Are there any user-facing changes?\nAdds to the `LogicalType` API"
    },
    {
      "commit": "7c6eb2cbd958369fc2d44b5947bcfba480b6dbf8",
      "tree": "2bce7d61b07e0d26a6a98e2bae832eef5cb64ec2",
      "parents": [
        "accb1cf12a33878f179f8bb3c78c1a740fb97216"
      ],
      "author": {
        "name": "Adrian Garcia Badaracco",
        "email": "1755071+adriangb@users.noreply.github.com",
        "time": "Wed May 20 11:11:36 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 14:11:36 2026 -0400"
      },
      "message": "feat(parquet): Add `ParquetPushDecoder::into_builder` to allow swapping projections / row filters at row group boundaries (#9968)\n\nThis is the decoder piece of the work presented at the NYC DataFusion\nmeetup.\n\nThe idea is that we\u0027ll be able to adaptively promote and demote filters\ninto row filters based on runtime selectivity stats.\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "accb1cf12a33878f179f8bb3c78c1a740fb97216",
      "tree": "3f3703eaf86ea07edeaf5d546af0a17bd94e3bca",
      "parents": [
        "d48c3057e1e8113463fe5d3cee3ede6db092eaf2"
      ],
      "author": {
        "name": "hsiang-c",
        "email": "137842490+hsiang-c@users.noreply.github.com",
        "time": "Wed May 20 10:51:53 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 13:51:53 2026 -0400"
      },
      "message": "[parquet] Allow more encryption algorithms (#9203)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9202.\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n \n- Iceberg\n[spec](https://iceberg.apache.org/gcm-stream-spec/#encryption-algorithm)\nsupports AES key sizes of 128, 192 and 256 bits. Iceberg Rust depends on\n`arrow-rs` for Parquet I/O, I\u0027d like to start supporting AES 256 with\nthis PR.\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n- `RingGcmBlockEncryptor` and `RingGcmBlockDecryptor` will pick AES-128\nor AES-256 based on key size\n- Refactor `encryption_async.rs` and `encryption.rs` to test both\nAES-128 and AES-256 encrypted parquet files\n\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n--\u003e\n\nYes, unit test and on AES-256 encrypted Parquet files defined in\nhttps://github.com/apache/parquet-testing/tree/master/data/aes256\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nNo"
    },
    {
      "commit": "d48c3057e1e8113463fe5d3cee3ede6db092eaf2",
      "tree": "06cfdf6ab3c67bb5feba72cf35fbd81df4faf6b9",
      "parents": [
        "da09872193bb3805a3177678bf9005ddeea2333d"
      ],
      "author": {
        "name": "Alfonso Subiotto Marqués",
        "email": "alfonso.subiotto@polarsignals.com",
        "time": "Wed May 20 14:56:04 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 08:56:04 2026 -0400"
      },
      "message": "fix(arrow-schema): allow empty metadata value for UUID extension type (#10001)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #10000\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\nSome ipc writers such as arrow-go unconditionally write an empty\nmetadata value, which breaks deserialization.\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\nEmpty check\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\nYes\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nSigned-off-by: Alfonso Subiotto Marques \u003calfonso.subiotto@polarsignals.com\u003e"
    },
    {
      "commit": "da09872193bb3805a3177678bf9005ddeea2333d",
      "tree": "2b075a41b25e01a952a9838d51cc8e0763b82c21",
      "parents": [
        "ae27070e71ececd9daefccc99b92394f361c87d4"
      ],
      "author": {
        "name": "Vegard Stikbakke",
        "email": "vegard.stikbakke@gmail.com",
        "time": "Wed May 20 03:44:03 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 11:44:03 2026 +1000"
      },
      "message": "Implement native interleave for ListView (#9558)\n\nThis PR adds a native implementation of interleave for the ListView type\nwhich uses a good heuristic thanks to @asubiotto, either\n1. copy each row\u0027s elements and put them all into a new flat array, or \n2. concatenate all source value array (and adjust offsets).\n\nThe latter is best when there is sharing of elements.\n\nCloses #9342.\n\n---------\n\nSigned-off-by: Alfonso Subiotto Marques \u003calfonso.subiotto@polarsignals.com\u003e\nCo-authored-by: Alfonso Subiotto Marques \u003calfonso.subiotto@polarsignals.com\u003e"
    },
    {
      "commit": "ae27070e71ececd9daefccc99b92394f361c87d4",
      "tree": "643fe514f9e1a07b483f14c7bc49360087161291",
      "parents": [
        "fd1c5b391e169762a0981870c4e94baa3372d7a3"
      ],
      "author": {
        "name": "Congxian Qiu",
        "email": "qcx978132955@gmail.com",
        "time": "Tue May 19 02:55:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 18 14:55:20 2026 -0400"
      },
      "message": "[Variant] Align cast logic for from/to_decimal for variant (#9689)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9688 .\n\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\n- Extract some logic in arrow-cast\n- Reuse the extracted logic in arrow-cast and parquet-variant\n\n# Are these changes tested?\nReuse the existing tests in arrow-test\n\n# Are there any user-facing changes?\n\nYes, changed the docs\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e"
    },
    {
      "commit": "fd1c5b391e169762a0981870c4e94baa3372d7a3",
      "tree": "8a948f5a58dfe09ab19b21e34a1ea753308c8d90",
      "parents": [
        "30185d6116c7ab6c50aa19d9c32defcf2937f720"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Fri May 15 08:19:21 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 15 08:19:21 2026 -0700"
      },
      "message": "Safely ignore Parquet fields with unimplemented Thrift types (#9974)\n\n# Which issue does this PR close?\n\n- Closes #9973.\n\n# Rationale for this change\n\nThe thrift decoder should be able to skip fields with unimplemented\nthrift types `set`, `map`, and `uuid`.\n\n# What changes are included in this PR?\n\nFlesh out the thrift enums and add code to the skip function to handle\nthe above types.\n\n# Are these changes tested?\n\nYes, test is added\n\n# Are there any user-facing changes?\n\nNo, changes are to internal APIs"
    },
    {
      "commit": "30185d6116c7ab6c50aa19d9c32defcf2937f720",
      "tree": "19faa4b9388e16a90c72813a37886011ba40e4b5",
      "parents": [
        "2e8e0c750b930c5bc3138d434c6007ffb7c22e61"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Fri May 15 07:24:13 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 15 10:24:13 2026 -0400"
      },
      "message": "perf: Remove `bool_val` from Parquet Thrift `FieldIdentifier` (#9945)\n\n# Which issue does this PR close?\n\n- Closes #9946.\n\n# Rationale for this change\nRemove some unnecessary branching from a hot path in the Thrift parser,\nimproving performance and making for easier to read code.\n\n# What changes are included in this PR?\nRemoves the `bool_val` field from `FieldIdentifier`.\n\n# Are these changes tested?\nCovered by existing tests\n\n# Are there any user-facing changes?\nNo, this is an internal-only API"
    },
    {
      "commit": "2e8e0c750b930c5bc3138d434c6007ffb7c22e61",
      "tree": "b60373afa37b481521f39ccafe9dfbbe0931f58a",
      "parents": [
        "f1ef71aa5aab3d5f3e3ff46f63dd198cf5da7b38"
      ],
      "author": {
        "name": "Dewey Dunnington",
        "email": "dewey@wherobots.com",
        "time": "Thu May 14 16:14:17 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 17:14:17 2026 -0400"
      },
      "message": "Support ListView/BinaryView/RunEndEncoded types in integration test JSON parser (#9888)\n\n# Which issue does this PR close?\n\nSupporting unskipping more types in the Rust IPC/C Data tests for\nhttps://github.com/apache/arrow/pull/49910 /\nhttps://github.com/apache/arrow/issues/49744 .\n\n# Rationale for this change\n\nView types and decimal 32/64 are supported in Rust but aren\u0027t supported\nin the integration test JSON implementation (so they fail when the\nintegration test tries to check them).\n\n# What changes are included in this PR?\n\nIntegration test JSON now supports how these values are represented.\n\n# Are these changes tested?\n\nYes. I\u0027ve added to the embedded integration.json for the new types and\nI\u0027ve run the apache/arrow PR against this branch with these types no\nlonger being skipped.\n\n# Are there any user-facing changes?\n\nNo\n\n---------\n\nCo-authored-by: Copilot \u003ccopilot@github.com\u003e"
    },
    {
      "commit": "f1ef71aa5aab3d5f3e3ff46f63dd198cf5da7b38",
      "tree": "13ec7ec6cc468d9d12db32dc1a99b58593bca52a",
      "parents": [
        "44599851b90629c12f8211a0b467f5c59b84b9d5"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 14 16:42:58 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 16:42:58 2026 -0400"
      },
      "message": "[arrow-array] use usize arithmetic in FixedSizeBinaryArray, aggressive overflow checks (#9910)\n\n# Which issue does this PR close?\n\n- Closes #9906.\n\n# Rationale for this change\n\n`FixedSizeBinaryArray` still stores its public fixed width as `i32`,\nwhich means internal address calculations rely on repeated conversions\nbetween `i32` and pointer-sized offsets. We recently had issue with some\ni32 based arithmetic overflowing (see\nhttps://github.com/apache/arrow-rs/issues/9898)\n\nTo avoid inadvertently using i32 arithmetic, this PR proposes to change\nthe internal representation of the FixedSizeBinaryArray to use `usize`\nand compute byte positions using `usize` ( pointer-sized arithmetic)\ndirectly, with checked conversions only at the public API boundaries\nthat still require `i32`.\n\nI am quite pleased it is a net reduction in lines of code (admittedly\nmost of that was the checks added in\nhttps://github.com/apache/arrow-rs/pull/9872\n\n# What changes are included in this PR?\n\n- Store the fixed-width element size as `value_size: usize` inside\n`FixedSizeBinaryArray`.\n- Rewrite internal position calculations in accessors and slicing to use\n`usize` arithmetic.\n- Remove the old `validate_lengths` invariant that existed only to keep\ninternal `i32` offset arithmetic in range.\n- Remove implicit `as` casts from the implementation and replace them\nwith checked conversions or typed bindings.\n\n# Are these changes tested?\n\nThese changes are covered by CI.\n\n# Are there any user-facing changes?\n\nNo.\n\n---------\n\nCo-authored-by: Adam Reeve \u003cadreeve@gmail.com\u003e\nCo-authored-by: Adam Reeve \u003cadam.reeve@gr-oss.io\u003e"
    },
    {
      "commit": "44599851b90629c12f8211a0b467f5c59b84b9d5",
      "tree": "7f4ed886705d5bb0e204ed680498925f15362899",
      "parents": [
        "2108f20db1f6bc300bc6e1deacc0fca299e7feda"
      ],
      "author": {
        "name": "dependabot[bot]",
        "email": "49699333+dependabot[bot]@users.noreply.github.com",
        "time": "Thu May 14 16:12:30 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 16:12:30 2026 -0400"
      },
      "message": "chore(deps): bump peaceiris/actions-gh-pages from 4.0.0 to 4.1.0 (#9966)\n\nBumps\n[peaceiris/actions-gh-pages](https://github.com/peaceiris/actions-gh-pages)\nfrom 4.0.0 to 4.1.0.\n\u003cdetails\u003e\n\u003csummary\u003eRelease notes\u003c/summary\u003e\n\u003cp\u003e\u003cem\u003eSourced from \u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/releases\"\u003epeaceiris/actions-gh-pages\u0027s\nreleases\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003ch2\u003eactions-github-pages v4.1.0\u003c/h2\u003e\n\u003cp\u003eSee \u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/blob/v4.0.0/CHANGELOG.md\"\u003eCHANGELOG.md\u003c/a\u003e\nfor more details.\u003c/p\u003e\n\u003ch2\u003eWhat\u0027s Changed\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eActions examples: update to modern versions of actions by \u003ca\nhref\u003d\"https://github.com/clintonsteiner\"\u003e\u003ccode\u003e@​clintonsteiner\u003c/code\u003e\u003c/a\u003e\nin \u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/pull/1117\"\u003epeaceiris/actions-gh-pages#1117\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003echore: update Node runtime and dependencies by \u003ca\nhref\u003d\"https://github.com/peaceiris\"\u003e\u003ccode\u003e@​peaceiris\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/pull/1147\"\u003epeaceiris/actions-gh-pages#1147\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eci: harden GitHub Actions workflows by \u003ca\nhref\u003d\"https://github.com/peaceiris\"\u003e\u003ccode\u003e@​peaceiris\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/pull/1156\"\u003epeaceiris/actions-gh-pages#1156\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2\u003eNew 
Contributors\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/clintonsteiner\"\u003e\u003ccode\u003e@​clintonsteiner\u003c/code\u003e\u003c/a\u003e\nmade their first contribution in \u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/pull/1117\"\u003epeaceiris/actions-gh-pages#1117\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eFull Changelog\u003c/strong\u003e: \u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/compare/v4.0.0...v4.1.0\"\u003ehttps://github.com/peaceiris/actions-gh-pages/compare/v4.0.0...v4.1.0\u003c/a\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003eChangelog\u003c/summary\u003e\n\u003cp\u003e\u003cem\u003eSourced from \u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/blob/main/CHANGELOG.md\"\u003epeaceiris/actions-gh-pages\u0027s\nchangelog\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003ch1\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/compare/v4.0.0...v4.1.0\"\u003e4.1.0\u003c/a\u003e\n(2026-05-12)\u003c/h1\u003e\n\u003ch3\u003echore\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eadd .codex/ (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/94ae2d2c73d9417ae30f61ddead523dc54d56dab\"\u003e94ae2d2\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003eadd hasInstallScript true (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/494ec9b2cc029a46119b4e13ff65f91eacbe1cf3\"\u003e494ec9b\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003eupdate Node runtime and dependencies (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1147\"\u003e#1147\u003c/a\u003e)\n(\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/954f6bf8259a6185f366f5cf13baee63745e0f79\"\u003e954f6bf\u003c/a\u003e),\ncloses 
\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1147\"\u003e#1147\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3\u003eci\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003echange automerge to false (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/4b09552702d0b65573696410d4707c765da2630b\"\u003e4b09552\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003eharden GitHub Actions workflows (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1156\"\u003e#1156\u003c/a\u003e)\n(\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/aa0466c1792bb558ed327a96629c4dd4ec390e48\"\u003eaa0466c\u003c/a\u003e),\ncloses \u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1156\"\u003e#1156\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3\u003edocs\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eadd repository guidelines (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/a1f94b504729eaee11b94d0f21ef5630241e8a52\"\u003ea1f94b5\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003ebump to v4 from v3 (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/a16b61f0780be556cf97931905d261429ee79342\"\u003ea16b61f\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003efix note style (\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/0b7567fde6f7517edcc13d8ffa2d89cd8734d47c\"\u003e0b7567f\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003eupdate versions of actions (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1117\"\u003e#1117\u003c/a\u003e)\n(\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/aa83d0c2cfc3d813560e13068d3152aa21490171\"\u003eaa83d0c\u003c/a\u003e),\ncloses 
\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1117\"\u003e#1117\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003eCommits\u003c/summary\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/84c30a85c19949d7eee79c4ff27748b70285e453\"\u003e\u003ccode\u003e84c30a8\u003c/code\u003e\u003c/a\u003e\nchore(release): 4.1.0\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/6fa0f50907221d627dfc1f22925e09fc46a95139\"\u003e\u003ccode\u003e6fa0f50\u003c/code\u003e\u003c/a\u003e\nchore(release): Add build assets\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/3b7506a0311b775872374907835d53bcfbbb7464\"\u003e\u003ccode\u003e3b7506a\u003c/code\u003e\u003c/a\u003e\nchore(deps): update dependency trim-newlines to v5 (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1158\"\u003e#1158\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/aa0466c1792bb558ed327a96629c4dd4ec390e48\"\u003e\u003ccode\u003eaa0466c\u003c/code\u003e\u003c/a\u003e\nci: harden GitHub Actions workflows (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1156\"\u003e#1156\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/31835fbbe39cd0ffade1ab81fac14a532b529633\"\u003e\u003ccode\u003e31835fb\u003c/code\u003e\u003c/a\u003e\nchore(deps): update actions/labeler action to v6 
(\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1153\"\u003e#1153\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/f4f1bc416d16988941232658cea5c06368f3373b\"\u003e\u003ccode\u003ef4f1bc4\u003c/code\u003e\u003c/a\u003e\nchore(deps): update peaceiris/actions-mdbook action to v2 (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1161\"\u003e#1161\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/a5e49793f6bdcb5cae6355701f7370ac849c8f20\"\u003e\u003ccode\u003ea5e4979\u003c/code\u003e\u003c/a\u003e\nchore(deps): update dependency ubuntu to v24 (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1159\"\u003e#1159\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/6cc3bac1ca327126c11b95063230514c80197c9c\"\u003e\u003ccode\u003e6cc3bac\u003c/code\u003e\u003c/a\u003e\nchore(deps): update github/codeql-action action to v4 (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1160\"\u003e#1160\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/0d6e9f4a6f26532ada0e15a7e783b34f9faad71a\"\u003e\u003ccode\u003e0d6e9f4\u003c/code\u003e\u003c/a\u003e\nchore(deps): update actions/setup-node action to v6 (\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1154\"\u003e#1154\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/commit/d70c101088107fa90acab16aa67e6db280eda929\"\u003e\u003ccode\u003ed70c101\u003c/code\u003e\u003c/a\u003e\nchore(deps): update actions/upload-artifact action to v7 
(\u003ca\nhref\u003d\"https://redirect.github.com/peaceiris/actions-gh-pages/issues/1155\"\u003e#1155\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003eAdditional commits viewable in \u003ca\nhref\u003d\"https://github.com/peaceiris/actions-gh-pages/compare/v4.0.0...v4.1.0\"\u003ecompare\nview\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/details\u003e\n\u003cbr /\u003e\n\n\n[![Dependabot compatibility\nscore](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name\u003dpeaceiris/actions-gh-pages\u0026package-manager\u003dgithub_actions\u0026previous-version\u003d4.0.0\u0026new-version\u003d4.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)\n\nDependabot will resolve any conflicts with this PR as long as you don\u0027t\nalter it yourself. You can also trigger a rebase manually by commenting\n`@dependabot rebase`.\n\n[//]: # (dependabot-automerge-start)\n[//]: # (dependabot-automerge-end)\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003eDependabot commands and options\u003c/summary\u003e\n\u003cbr /\u003e\n\nYou can trigger Dependabot actions by commenting on this PR:\n- `@dependabot rebase` will rebase this PR\n- `@dependabot recreate` will recreate this PR, overwriting any edits\nthat have been made to it\n- `@dependabot show \u003cdependency name\u003e ignore conditions` will show all\nof the ignore conditions of the specified dependency\n- `@dependabot ignore this major version` will close this PR and stop\nDependabot creating any more for this major version (unless you reopen\nthe PR or upgrade to it yourself)\n- `@dependabot ignore this minor version` will close this PR and stop\nDependabot creating any more for this minor version (unless you reopen\nthe PR or upgrade to it yourself)\n- `@dependabot ignore this dependency` will close this PR and stop\nDependabot creating any more for this dependency (unless you reopen the\nPR or upgrade to it 
yourself)\n\n\n\u003c/details\u003e\n\nSigned-off-by: dependabot[bot] \u003csupport@github.com\u003e\nCo-authored-by: dependabot[bot] \u003c49699333+dependabot[bot]@users.noreply.github.com\u003e"
    },
    {
      "commit": "2108f20db1f6bc300bc6e1deacc0fca299e7feda",
      "tree": "a271c37a9a2fd0465c7d1262548028bf4febcb1f",
      "parents": [
        "28d7537c26de0ad3320cc44bc47521d1c66f02dd"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Thu May 14 15:22:12 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 15:22:12 2026 -0400"
      },
      "message": "feat(parquet): add all-null fast paths for level building (#9954)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Spawn off from #9653 \n- Contributes to #9731\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\nSee #9731\n\n# What changes are included in this PR?\n\nWhen an entire list, struct, fixed-size list, or leaf array is null,\nskip per-row iteration and emit bulk uniform def/rep levels via\n`extend_uniform_levels` in O(1).\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\n\nAll tests passing + additional all null unit tests.\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nNone.\n\nSigned-off-by: Hippolyte Barraud \u003chippolyte.barraud@datadoghq.com\u003e"
    },
    {
      "commit": "28d7537c26de0ad3320cc44bc47521d1c66f02dd",
      "tree": "c89bb234229b71f17441e4c67739d3c2436790ee",
      "parents": [
        "821f42f7ead5c17280b984ca9ae33bab131e70c2"
      ],
      "author": {
        "name": "Yu-Chuan Hung",
        "email": "86523891+CuteChuanChuan@users.noreply.github.com",
        "time": "Fri May 15 02:19:02 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 14:19:02 2026 -0400"
      },
      "message": "refactor: make `BloomFilterProperties` fpp/ndv private with accessors (#9969)\n\n# Which issue does this PR close?\n\n- Follow-up to #9877; completes the remaining scope of #9667 (making\n`BloomFilterProperties` fields non-public and exposing accessors).\n\n# Rationale for this change\n\n#9877 introduced `BloomFilterPropertiesBuilder`. With the builder in\nplace, the public `fpp` / `ndv` fields constrain future evolution.\nLocking them down behind accessors makes the public API smaller and more\nevolvable, matching the original direction outlined in #9667.\n\n# What changes are included in this PR?\n\n- Make `BloomFilterProperties::fpp` and `BloomFilterProperties::ndv`\nprivate fields\n- Add `pub fn fpp(\u0026self) -\u003e f64` and `pub fn ndv(\u0026self) -\u003e u64` accessor\nmethods\n- Move the per-field documentation verbatim onto the accessor methods so\nit remains rendered by rustdoc (private fields are not rendered)\n\n# Are these changes tested?\nYes — the existing test suite covers the affected paths\n\n# Are there any user-facing changes?\n**Yes — this is a breaking change to the public API.**\n\n- External struct-literal construction no longer compiles:\n```rust\n// Before — no longer works\nBloomFilterProperties { fpp: 0.01, ndv: 10_000 }\n\n// Use the builder added in #9877 instead:\nBloomFilterProperties::builder()\n  .with_fpp(0.01)\n  .with_max_ndv(10_000)\n  .build()\n```\n- Direct field reads (props.fpp, props.ndv) no longer compile; use\nprops.fpp() / props.ndv() instead."
    },
    {
      "commit": "821f42f7ead5c17280b984ca9ae33bab131e70c2",
      "tree": "726a7ac163d3469d75273c0001991a40e6cd867c",
      "parents": [
        "86d3401273a4765f1e16b1938e9ab877d8171dd2"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 14 08:39:16 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 08:39:16 2026 -0400"
      },
      "message": "Add docs for `BitReader` (#9948)\n\n# Which issue does this PR close?\n\n- Related to https://github.com/apache/arrow-rs/pull/9372\n\n# Rationale for this change\n\nWhile reviewing the ALP implementation from @sdf-jkl , I ran into this\nstruct which I haven\u0027t really used before.\n- https://github.com/apache/arrow-rs/pull/9372\n\nNow that I have read it, I wanted to capture that information as doc\ncomments (for my future self and hopefully for others)\n\n# What changes are included in this PR?\n\nAdd documentation comments to `BitReader`\n\n# Are these changes tested?\n\nJust docs, \n\n# Are there any user-facing changes?\nJust docs on an internal struct,"
    },
    {
      "commit": "86d3401273a4765f1e16b1938e9ab877d8171dd2",
      "tree": "edc60c6326571bcd083ba3745823d0e46d930488",
      "parents": [
        "1ffd202aeef1af2a3b0ab2b4ae604a9edbac6e09"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 14 08:37:59 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 14 08:37:59 2026 -0400"
      },
      "message": "Add docs for `BitWriter` (#9949)\n\n# Which issue does this PR close?\n\n- Related to https://github.com/apache/arrow-rs/pull/9372\n\n\n# Rationale for this change\n\nSimilarly to https://github.com/apache/arrow-rs/pull/9948. I ran into\nBitWriter as part of reviewing code from @sdf-jkl and wanted to document\nmy findings (so I didn\u0027t have to re-read the code each time)\n- https://github.com/apache/arrow-rs/pull/9372\n\n\n\n# What changes are included in this PR?\n\nAdd docs \n\n# Are these changes tested?\n\nBy CI\n# Are there any user-facing changes?\n\nNo -- this is docs to an internal structure"
    },
    {
      "commit": "1ffd202aeef1af2a3b0ab2b4ae604a9edbac6e09",
      "tree": "3910f1926a6dc30b79a3983c7f43a735a264dbf8",
      "parents": [
        "48fa8a7a45567b9ab47c461771b968ba0d37812f"
      ],
      "author": {
        "name": "Jörn Horstmann",
        "email": "git@jhorstmann.net",
        "time": "Wed May 13 22:50:26 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 13 13:50:26 2026 -0700"
      },
      "message": "Remove deprecated parquet::format module and thrift dependency (#9962)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Closes #9953.\n\n# Rationale for this change\n\nRemoval of the `parquet::format` was planned for the 59.0 Release\n(#9110) according to code comments. There is also now a security\nadvisory against the Apache Thrift Rust implementation\n(https://github.com/advisories/GHSA-2F9F-GQ7V-9H6M), for which there is\nno fixed release yet on `crates.io`.\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\n# What changes are included in this PR?\n\n- Removal of the Apache Thrift dependency\n- Removal of the deprecated `parquet::format` module\n- Changes to the `parquet-layout` binary to remove printing of page\ndetails, since that still depended on the deprecated code\n\n# Are these changes tested?\n\nExisting tests pass as the code was unused.\n\n# Are there any user-facing changes?\n\nBreaking api change, since `parquet::format` was a public module.\n\nThe output of the `parquet-layout` binary changes."
    },
    {
      "commit": "48fa8a7a45567b9ab47c461771b968ba0d37812f",
      "tree": "ba86bdc14d206fdc51d0ec2de59787fb228f14b2",
      "parents": [
        "7abb2255f2de045ca8dcd12e1e46377e11de5b9f"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Tue May 12 11:03:34 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 12 11:03:34 2026 -0400"
      },
      "message": "feat(parquet): separate push decoder frontier state from row-group decoding (#9804)\n\n# Which issue does this PR close?\n\n- Prerequisite to #9697\n\n# Rationale for this change\n\n#9697 aims to make staged buffer management in the push decoder more\nexplicit. In doing so, it exposes a structural problem: the logic for\ndeciding whether a row group is still live, skipped, or unreachable is\nspread across several parts of the decoder.\n\nThis matters because row-group-level buffer release depends on a single\nquestion having a clear answer: can this row group ever need bytes\nagain? That answer depends on the queued row groups, the remaining\nselection, the running offset/limit budget, and whether predicates\nrequire the decoder to stay conservative. Today, that state is split\nacross multiple components, which makes the release policy difficult to\ncentralize cleanly.\n\n# What changes are included in this PR?\n\nThis PR introduces a clearer ownership boundary in the push decoder:\n\n- cross-row-group scan state is now handled by a dedicated\nfrontier/look-ahead mechanism\n- the row-group builder is reduced to current-row-group decode work only\n- offset/limit accounting and row-group selection advancement are\ncentralized around that frontier/builder split\n\nThis does not implement row-group-level buffer release directly, but it\nestablishes the structure needed for that follow-up work. It should also\nmake future pruning rules easier to add and maintain.\n\n# Are these changes tested?\n\nAll existing tests pass, and the refactor adds focused coverage for the\nextracted budget logic and the frontier-driven `try_next_reader` path.\n\n# Are there any user-facing changes?\n\nNone.\n\n---------\n\nSigned-off-by: Hippolyte Barraud \u003chippolyte.barraud@datadoghq.com\u003e"
    },
    {
      "commit": "7abb2255f2de045ca8dcd12e1e46377e11de5b9f",
      "tree": "d0a83e88157a38c0b7a152712e0ee867932c466a",
      "parents": [
        "6ce4bc899644c1b0816072f3260d835a0c14d148"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Thu May 07 14:25:50 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 14:25:50 2026 -0400"
      },
      "message": "bench(parquet): add `ListArray` benchmarks for runtime and peak memory (#9846)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Contributes to #9731\n- Dependency of #9848\n\n# Rationale for this change\n\nSee #9848\n\nExisting benchmarks have some gaps in the types of columns they\nexercise. Additionally, I would like to improve the memory efficiency of\nthe read/decode path in terms of RSS requirements, especially for sparse\ninputs and we currently do not have any infrastructure to measure that.\n\n# What changes are included in this PR?\n\nExtend the existing `arrow_reader` runtime benchmarks with `Int32` and\n`FixedBinary32` list columns alongside the existing `StringList`, with\nparameterized null density (0%, 50%, 90%, 99%). The prior benchmarks\nonly covered string lists, which didn\u0027t surface costs specific to\nfixed-width and primitive element types.\n\nAdd a new `arrow_reader_peak_memory` benchmark that measures peak heap\nusage during `ListArrayReader::consume_batch` using a thread-local\ntracking allocator. It captures how RSS-efficient we are when\nmaterializing a column into its final Arrow in-memory representation.\n\n# Are these changes tested?\n\nAll tests passing.\n\n# Are there any user-facing changes?\n\nNone.\n\nSigned-off-by: Hippolyte Barraud \u003chippolyte.barraud@datadoghq.com\u003e\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "6ce4bc899644c1b0816072f3260d835a0c14d148",
      "tree": "8f7dee67514b963350a9b1343f0d96bcbf80f9eb",
      "parents": [
        "3c71d928b65da7a85e96ce9e81d6c0a67d4be193"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Thu May 07 11:24:08 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 11:24:08 2026 -0700"
      },
      "message": "Validate encoded Thrift lists match the schema (#9924)\n\n# Which issue does this PR close?\n\n- Part of #9923.\n\n# Rationale for this change\n\nA first attempt at adding some validation. This will check that the\nencoded list element type matches what is expected from the Parquet\nschema.\n\n# What changes are included in this PR?\nAdds an `ELEMENT_TYPE` to the `ReadThrift` trait for use in validating\ndata types in `read_thrift_vec`.\n\n# Are these changes tested?\n\nShould be covered by existing. These changes also cause an earlier error\ndetection in an existing test of malformed data.\n\n# Are there any user-facing changes?\n\nNo, just improves error handling"
    },
    {
      "commit": "3c71d928b65da7a85e96ce9e81d6c0a67d4be193",
      "tree": "e3422c4d5c7be04d20df587743f17732450a70b8",
      "parents": [
        "c1507ad20a3dad44353bd9fb4c489785298d10d8"
      ],
      "author": {
        "name": "Alfonso Subiotto Marqués",
        "email": "alfonso.subiotto@polarsignals.com",
        "time": "Thu May 07 20:12:08 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 14:12:08 2026 -0400"
      },
      "message": "perf[arrow-select]: add specialized REE interleave (#9856)\n\nBenchmarks for this PR are in #9849. They have been separated out so we\ncan compare this PR to main once the benchmarks have merged.\n\nThe specialized interleave works by preserving run ends as much as\npossible by coalescing groups of adjacent logical indices pointing to\nthe same source and calling interleave on the run end values.\n\nFuture work could additionally coalesce values across sources, but this\nrequires a value equality check.\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- None\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\ninterleave_fallback on REE arrays is slow\n\n# What changes are included in this PR?\n\n\u003c!--\nThere is no need to duplicate the description in the issue here but it\nis sometimes worth providing a summary of the individual changes in this\nPR.\n--\u003e\nA specialized REE interleave implementation\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. 
Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n\nIf this PR claims a performance improvement, please include evidence\nsuch as benchmark results.\n--\u003e\nYes, by existing tests.\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nSigned-off-by: Alfonso Subiotto Marques \u003calfonso.subiotto@polarsignals.com\u003e"
    },
    {
      "commit": "c1507ad20a3dad44353bd9fb4c489785298d10d8",
      "tree": "ff97f4ea4ad3ea87f5bc2ba817daed03161db3ba",
      "parents": [
        "aa3c9d3ecc35891793d962015fd2cdff708ac63f"
      ],
      "author": {
        "name": "Rostislav Rumenov",
        "email": "rostislav.rumenov@gmail.com",
        "time": "Thu May 07 20:10:50 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 14:10:50 2026 -0400"
      },
      "message": "generic channel support for FlightClient (#9933)\n\nAllow FlightServiceClient to be parameterized over the underlying\nchannel type, so\n  users can wrap a tonic channel with custom interceptors or services.\nMotivation: Annotating outbound Flight requests with metadata (e.g.\ninjecting\nOpenTelemetry trace context into headers) currently requires forking or\nwrapping at\na higher level. Making the channel generic lets callers compose tower\nlayers/interceptors idiomatically and propagate distributed tracing\ncontext without\n  bespoke plumbing.\n\n---------\n\nCo-authored-by: Rostislav Rumenov \u003crostislav.rumenov@qube-rt.com\u003e"
    },
    {
      "commit": "aa3c9d3ecc35891793d962015fd2cdff708ac63f",
      "tree": "865dcf208ef54abc90c0ff938602c50eb93f95e7",
      "parents": [
        "5d464b58f9fc183141e4501df6759e17d4c71507"
      ],
      "author": {
        "name": "Yu-Chuan Hung",
        "email": "86523891+CuteChuanChuan@users.noreply.github.com",
        "time": "Fri May 08 02:03:15 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 14:03:15 2026 -0400"
      },
      "message": "feat(parquet): add BloomFilterPropertiesBuilder (#9877)\n\n# Which issue does this PR close?\n- Closes #9667.\n\n# Rationale for this change\nNo builder exists for `BloomFilterProperties`, so callers write\n`BloomFilterProperties { fpp, ndv }` literals — pinning field layout to\nthe API and skipping fpp validation. `WriterPropertiesBuilder` also has\nno setter that takes a built `BloomFilterProperties`.\n\n# What changes are included in this PR?\n- `BloomFilterPropertiesBuilder` (`with_fpp`, `with_max_ndv`, `build`,\n`try_build`). Two entry points per discussion in the issue:\n`BloomFilterProperties::builder()` and\n`BloomFilterPropertiesBuilder::new()`.\n- `WriterPropertiesBuilder::set_bloom_filter_properties` + per-column\nvariant. NDV from the passed-in struct is honoured (no row-group-size\noverride). For dynamic NDV, keep using `set_bloom_filter_enabled` /\n`set_bloom_filter_fpp`.\n- Renamed `set_bloom_filter_ndv` → `set_bloom_filter_max_ndv` (also\nper-column). Old names are `#[deprecated(since \u003d \"59.0.0\")]` aliases.\n\n\n# Are these changes tested?\nYes — 10 new unit tests + a doc-test.\n\n# Are there any user-facing changes?\nAdditive only. `set_bloom_filter_ndv` (and per-column) emit a\ndeprecation warning pointing to `_max_ndv`."
    },
    {
      "commit": "5d464b58f9fc183141e4501df6759e17d4c71507",
      "tree": "8d3f19914d305978e6131bbe1d9cba09faa98afe",
      "parents": [
        "76c381ff0a2cd3f0b5f6690b3491428ffa0806a0"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Thu May 07 10:56:58 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 13:56:58 2026 -0400"
      },
      "message": "Add `CompressionCodec` Thrift enum for Parquet metadata (#9864)\n\n# Which issue does this PR close?\n\n- Part of #9863.\n\n# Rationale for this change\nSee issue for more, but the idea is to separate Parquet metadata\nstructures from those used for configuration. This can reduce memory\nused by the metadata, and also allows use of the thrift macros, reducing\nmaintenance burden.\n\n# What changes are included in this PR?\nThis adds a new `CompressionCodec` enum for use in the Parquet metadata,\nand means to convert between `CompressionCodec` and `Compression`.\n\n# Are these changes tested?\n\nShould be covered by existing tests, but new test of the interchange is\nalso added.\n\n# Are there any user-facing changes?\nNo"
    },
    {
      "commit": "76c381ff0a2cd3f0b5f6690b3491428ffa0806a0",
      "tree": "5406557f36244582c4d038a0a6e5900b86dd7a1b",
      "parents": [
        "e45354a4137ce92380953dcc23ffbf909cfa3539"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 07 13:56:33 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 13:56:33 2026 -0400"
      },
      "message": "Remove redundant benchmarks in `cast_kernels` (#9789)\n\n# Which issue does this PR close?\n\n- Follow on to https://github.com/apache/arrow-rs/pull/9729\n\n# Rationale for this change\n\n#9729, added many new cases to `cast_kernels` but many of these are\nredundant and increase the benchmark runtime without providing\nproportional value in coverage.\n\nThis PR reduces the redundancy by:\n1. Keeps one representative benchmark for each major physical code path\n(e.g., `i128` vs `i256` storage).\n2. Removes redundant combinations of target types (e.g., casting\n`decimal128` to every integer width when `int64` is sufficient).\n3. Consolidating invalid/error path testing into a single representative\ncase.\n4. Reducing the total number of benchmark cases from over 60 new\nadditions to 10 high-value cases.\n\n# What changes are included in this PR?\n\n- Pruned redundant decimal-to-integer/float and string/float-to-decimal\nbenchmarks in `arrow/benches/cast_kernels.rs`.\n- Added `create_primitive_array_range` helper to\n`arrow/src/util/bench_util.rs`\n\nCompared to main before PR #9729, the following benchmarks will be new\nafter my PR #9789 is merged:\n\n  1. New Decimal Casting Benchmarks\nThese cases cover the core performance paths for casting to and from\ndecimals using representative physical storage types (i128 and i256):\n\n   * cast string to decimal128(38, 3)\n   * cast float64 to decimal128(32, 3)\n   * cast invalid float64 to to decimal128(32, 3) (Error path testing)\n   * cast decimal128 to float64\n   * cast decimal128 to int64\n   * cast decimal256 to float64\n   * cast decimal256 to int64\n* cast decimal128 to decimal128 512 with lower scale (infallible)\n(specifically testing the fast path for infallible\n\n# Are these changes tested?\n\nCI covers verification. \n\n# Are there any user-facing changes?\n\nNo."
    },
    {
      "commit": "e45354a4137ce92380953dcc23ffbf909cfa3539",
      "tree": "666f1d3e314651f7d295fe94abb06e2879bfff03",
      "parents": [
        "c025c48f284f1c1829cd1c469ac0232d0ec9c79f"
      ],
      "author": {
        "name": "Adam Gutglick",
        "email": "adam@spiraldb.com",
        "time": "Thu May 07 18:56:13 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 13:56:13 2026 -0400"
      },
      "message": "Remove deprecated legacy `like` kernels in `arrow-string` (#9674)\n\n# Which issue does this PR close?\n\n- Closes #9675 \n\n# Rationale for this change\n\nThe legacy kernels have been deprecated for 2 years and hidden from\ndocs, and its just more code to build.\n\n# What changes are included in this PR?\n\n1. Removing long deprecated legacy like kernels\n2. Keeps all tests, just folded them into the existing pattern. This\nalso increases the effective coverage - testing scalar comparison both\nfor Utf8 scalars and Dict scalars\n\n# Are these changes tested?\n\nNo functional changes, but refactors existing tests\n\n# Are there any user-facing changes?\n\nOnly for users using long hidden and deprecated functions, but I this\nisn\u0027t a meaningfully public API. The functionality is also still\navailable with very minor changes.\n\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "c025c48f284f1c1829cd1c469ac0232d0ec9c79f",
      "tree": "e08ff7ac3b02755f2aa47fb5702e814e0c5da87f",
      "parents": [
        "13f5f940645fd17527a8397b296f04dc1f7da1d7"
      ],
      "author": {
        "name": "Ed Seidl",
        "email": "etseidl@users.noreply.github.com",
        "time": "Thu May 07 10:17:59 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 13:17:59 2026 -0400"
      },
      "message": "[Parquet]: GH-563: Make `path_in_schema` optional (#9678)\n\n# Which issue does this PR close?\n\nnone\n\n# Rationale for this change\nThis is a proof of concept implementation for\nhttps://github.com/apache/parquet-format/issues/563\n\n# What changes are included in this PR?\n\nSince version 57.0.0, this crate has been tolerant of a missing\n`path_in_schema`. This PR adds options to cease writing the field as\nwell. The option defaults to continuing to write the field.\n\nSee related discussion on parquet mailing list:\nhttps://lists.apache.org/thread/czm2bk45wwtkhhpqxqvmx9dk5wkwk1kt\n\n# Are these changes tested?\n\nYes\n\n# Are there any user-facing changes?\n\nNo, this only adds an optional behavior change that defaults to no\nchange\n\n# Related PRs\n- https://github.com/apache/parquet-format/issues/563\n- https://github.com/apache/parquet-format/pull/564\n- https://github.com/apache/parquet-java/pull/3470"
    },
    {
      "commit": "13f5f940645fd17527a8397b296f04dc1f7da1d7",
      "tree": "c7a42db6e6a05e29351f7be99febf17cef805531",
      "parents": [
        "c2021f1c4a15c4b420cb78f1bc9dc371d6e30ccc"
      ],
      "author": {
        "name": "Hippolyte Barraud",
        "email": "hippolyte.barraud@datadoghq.com",
        "time": "Thu May 07 13:16:51 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 13:16:51 2026 -0400"
      },
      "message": "feat(parquet): compact level representation with generic writer dispatch (#9831)\n\n# Which issue does this PR close?\n\n\u003c!--\nWe generally require a GitHub issue to be filed for all bug fixes and\nenhancements and this helps us generate change logs for our releases.\nYou can link an issue to this PR using the GitHub syntax.\n--\u003e\n\n- Spawn off from #9653 \n- Contributes to #9731\n\n# Rationale for this change\n\n\u003c!--\nWhy are you proposing this change? If this is already explained clearly\nin the issue then this section is not needed.\nExplaining clearly why changes are proposed helps reviewers understand\nyour changes and offer better suggestions for fixes.\n--\u003e\n\nSee #9731\n\n# What changes are included in this PR?\n\nRepresent definition and repetition levels as `LevelData`/`LevelDataRef`\nwith `Absent`, `Materialized`, and `Uniform` variants, and thread this\nthrough Arrow level generation, CDC chunking, and the generic column\nwriter.\n\nUniform level runs, such as required fields and all-null pages, can now\nbe encoded without materializing dense `Vec\u003ci16\u003e` buffers. Add bulk run\nsupport to `LevelEncoder`/`RleEncoder` so repeated levels are encoded in\namortized O(1) after the RLE warmup, while preserving histogram, row\ncount, null count, page splitting, and CDC chunk accounting.\n\n# Are these changes tested?\n\n\u003c!--\nWe typically require tests for all PRs in order to:\n1. Prevent the code from being accidentally broken by subsequent changes\n2. Serve as another way to document the expected behavior of the code\n\nIf tests are not included in your PR, please explain why (for example,\nare they covered by existing tests)?\n--\u003e\n\nAll tests passing. 
Coverage exercises bulk RLE level encoding,\ncompact/uniform `LevelData` slicing and writer roundtrips across Parquet\nv1/v2, and CDC/Arrow writer behavior including all-null and nested-level\ncases.\n\n# Are there any user-facing changes?\n\n\u003c!--\nIf there are user-facing changes then we may require documentation to be\nupdated before approving the PR.\n\nIf there are any breaking changes to public APIs, please call them out.\n--\u003e\n\nNone.\n\n---------\n\nSigned-off-by: Hippolyte Barraud \u003chippolyte.barraud@datadoghq.com\u003e\nCo-authored-by: Andrew Lamb \u003candrew@nerdnetworks.org\u003e"
    },
    {
      "commit": "c2021f1c4a15c4b420cb78f1bc9dc371d6e30ccc",
      "tree": "02c9c8250c1226cfc3e5c9a76d6c9cd1895d9174",
      "parents": [
        "913bab26ba9bed8fc2bc1acda300cc52345b0da1"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 07 10:48:21 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 10:48:21 2026 -0400"
      },
      "message": "Fix MSRV check by checking in Cargo.lock (#9941)\n\nNOTE: All of this PR is `Cargo.lock`. I swear it is easy to review...\n\n# Which issue does this PR close?\n\n- Closes #9938.\n- Modeled on #9902.\n\n# Rationale for this change\n\nThe guidance on lock files from the Cargo folks changed a while ago:\nhttps://blog.rust-lang.org/2023/08/29/committing-lockfiles/\n\nThe MSRV check is failing on `main` because dependency resolution\ncurrently uses the latest compatible versions from crates.io for each\npackage. The newest `tonic` release now requires a newer Rust version\nthan the workspace MSRV.\n\nHere is the reported CI failure:\nhttps://github.com/apache/arrow-rs/actions/runs/25472344356/job/74738606768\n\n```text\nerror: rustc 1.85.0 is not supported by the following packages:\n  tonic@0.14.6 requires rustc 1.88\n  tonic-prost@0.14.6 requires rustc 1.88\n```\n\n# What changes are included in this PR?\n\nThis PR checks in a root `Cargo.lock` so CI verifies MSRV against the\ndependency set we control, rather than the tip of all dependency ranges.\nThe generated lockfile pins the `tonic` 0.14 crates to `0.14.5`, which\nsupports Rust 1.85.\n\nNote this does not change code in the crates. It only pins dependency\nresolution for workspace builds and CI.\n\nThis will result in more dependabot PRs to explicitly update the crate\nversions, but I think that is a good thing. I think the existing config\nfile will work fine:\nhttps://github.com/apache/arrow-rs/blob/main/.github/dependabot.yml\n\n# Are these changes tested?\n\nYes, by CI.\n\nThis passed locally with Rust 1.85.0.\n\n# Are there any user-facing changes?\n\nNo. This only checks in the root lockfile used for dependency resolution\nin workspace builds and CI."
    },
    {
      "commit": "913bab26ba9bed8fc2bc1acda300cc52345b0da1",
      "tree": "fd572195e3750d1253846d1cd6e9f88e24a141b1",
      "parents": [
        "3384f649cc07212631111fd2c7e34da750721ec5"
      ],
      "author": {
        "name": "Andrew Lamb",
        "email": "andrew@nerdnetworks.org",
        "time": "Thu May 07 10:21:39 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 10:21:39 2026 -0400"
      },
      "message": "Prepare for `58.3.0` release (#9893)\n\n# Which issue does this PR close?\n\n- Part of https://github.com/apache/arrow-rs/issues/9859\n\n# Rationale for this change\n\nEven though we just did a release from 58, I want to get a release out\nthat has these changes:\n- https://github.com/apache/arrow-rs/pull/9872\n- https://github.com/apache/arrow-rs/pull/9813\n\n# What changes are included in this PR?\n\n1. Update version to 58.3.0\n2. Update CHANGELOG. See Rendered preview here:\nhttps://github.com/alamb/arrow-rs/blob/alamb/prepare_58.3.0/CHANGELOG.md\n\n# Are these changes tested?\n\nBy CI\n# Are there any user-facing changes?\n\nyes"
    },
    {
      "commit": "3384f649cc07212631111fd2c7e34da750721ec5",
      "tree": "2538ab84cc29ec072e2200f2e78dcda1bb3a4ac1",
      "parents": [
        "cc5a25649d38f94dbaa6ad9994b6af812d061803"
      ],
      "author": {
        "name": "dependabot[bot]",
        "email": "49699333+dependabot[bot]@users.noreply.github.com",
        "time": "Thu May 07 08:38:51 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 08:38:51 2026 -0400"
      },
      "message": "chore(deps): bump actions/labeler from 6.0.1 to 6.1.0 (#9932)\n\nBumps [actions/labeler](https://github.com/actions/labeler) from 6.0.1\nto 6.1.0.\n\u003cdetails\u003e\n\u003csummary\u003eRelease notes\u003c/summary\u003e\n\u003cp\u003e\u003cem\u003eSourced from \u003ca\nhref\u003d\"https://github.com/actions/labeler/releases\"\u003eactions/labeler\u0027s\nreleases\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003cblockquote\u003e\n\u003ch2\u003ev6.1.0\u003c/h2\u003e\n\u003ch2\u003eEnhancements\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eAdd changed-files-labels-limit and max-files-changed configuration\noptions to cap the number of labels added by \u003ca\nhref\u003d\"https://github.com/bluca\"\u003e\u003ccode\u003e@​bluca\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/923\"\u003eactions/labeler#923\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2\u003eBug Fixes\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eImprove Labeler Action documentation and permission error handling\nby \u003ca\nhref\u003d\"https://github.com/chiranjib-swain\"\u003e\u003ccode\u003e@​chiranjib-swain\u003c/code\u003e\u003c/a\u003e\nin \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/897\"\u003eactions/labeler#897\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003ePreserve manually added labels during workflow runs and refine label\nsynchronization logic by \u003ca\nhref\u003d\"https://github.com/chiranjib-swain\"\u003e\u003ccode\u003e@​chiranjib-swain\u003c/code\u003e\u003c/a\u003e\nin \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/917\"\u003eactions/labeler#917\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2\u003eDependency Updates\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eUpgrade brace-expansion from 1.1.11 to 1.1.12 and document breaking\nchanges in v6 by 
\u003ca\nhref\u003d\"https://github.com/dependabot\"\u003e\u003ccode\u003e@​dependabot\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/877\"\u003eactions/labeler#877\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eUpgrade minimatch from 10.0.1 to 10.2.3 by \u003ca\nhref\u003d\"https://github.com/dependabot\"\u003e\u003ccode\u003e@​dependabot\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/926\"\u003eactions/labeler#926\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003eUpgrade dependencies (\u003ccode\u003e@​actions/core\u003c/code\u003e,\n\u003ccode\u003e@​actions/github\u003c/code\u003e, js-yaml, minimatch, \u003ca\nhref\u003d\"https://github.com/typescript-eslint\"\u003e\u003ccode\u003e@​typescript-eslint\u003c/code\u003e\u003c/a\u003e)\nby \u003ca href\u003d\"https://github.com/Copilot\"\u003e\u003ccode\u003e@​Copilot\u003c/code\u003e\u003c/a\u003e in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/934\"\u003eactions/labeler#934\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2\u003eNew Contributors\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/chiranjib-swain\"\u003e\u003ccode\u003e@​chiranjib-swain\u003c/code\u003e\u003c/a\u003e\nmade their first contribution in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/897\"\u003eactions/labeler#897\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href\u003d\"https://github.com/bluca\"\u003e\u003ccode\u003e@​bluca\u003c/code\u003e\u003c/a\u003e made\ntheir first contribution in \u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/923\"\u003eactions/labeler#923\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href\u003d\"https://github.com/Copilot\"\u003e\u003ccode\u003e@​Copilot\u003c/code\u003e\u003c/a\u003e made\ntheir first contribution in 
\u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/pull/934\"\u003eactions/labeler#934\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eFull Changelog\u003c/strong\u003e: \u003ca\nhref\u003d\"https://github.com/actions/labeler/compare/v6...v6.1.0\"\u003ehttps://github.com/actions/labeler/compare/v6...v6.1.0\u003c/a\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003eCommits\u003c/summary\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/f27b608878404679385c85cfa523b85ccb86e213\"\u003e\u003ccode\u003ef27b608\u003c/code\u003e\u003c/a\u003e\nchore: upgrade dependencies (\u003ccode\u003e@​actions/core\u003c/code\u003e,\n\u003ccode\u003e@​actions/github\u003c/code\u003e, js-yaml, minimat...\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/c5dadc2a45784a4b6adfcd20fea3465da3a5f904\"\u003e\u003ccode\u003ec5dadc2\u003c/code\u003e\u003c/a\u003e\nAdd \u0027changed-files-labels-limit\u0027 and \u0027max-files-changed\u0027 configs to\nallow cap...\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/e52e4fb63ed5cd0e07abaad9826b2a893ccb921f\"\u003e\u003ccode\u003ee52e4fb\u003c/code\u003e\u003c/a\u003e\nBump minimatch from 10.0.1 to 10.2.3 (\u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/issues/926\"\u003e#926\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/77a4082b841706ac431479b7e2bb11216ffef250\"\u003e\u003ccode\u003e77a4082\u003c/code\u003e\u003c/a\u003e\nFix: Preserve manually added labels during workflow run and refine label\nsync...\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/25abb3cad4f14b7ac27968a495c37798860a5a1a\"\u003e\u003ccode\u003e25abb3c\u003c/code\u003e\u003c/a\u003e\nImprove Labeler Action Documentation and 
Error Handling for Permissions\n(\u003ca\nhref\u003d\"https://redirect.github.com/actions/labeler/issues/897\"\u003e#897\u003c/a\u003e)\u003c/li\u003e\n\u003cli\u003e\u003ca\nhref\u003d\"https://github.com/actions/labeler/commit/395c8cfdb1e1e691cc4bad0dd315820af8eb67fd\"\u003e\u003ccode\u003e395c8cf\u003c/code\u003e\u003c/a\u003e\nBump brace-expansion from 1.1.11 to 1.1.12 and document breaking changes\nin v...\u003c/li\u003e\n\u003cli\u003eSee full diff in \u003ca\nhref\u003d\"https://github.com/actions/labeler/compare/v6.0.1...v6.1.0\"\u003ecompare\nview\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/details\u003e\n\u003cbr /\u003e\n\n\n[![Dependabot compatibility\nscore](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name\u003dactions/labeler\u0026package-manager\u003dgithub_actions\u0026previous-version\u003d6.0.1\u0026new-version\u003d6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)\n\nDependabot will resolve any conflicts with this PR as long as you don\u0027t\nalter it yourself. 
You can also trigger a rebase manually by commenting\n`@dependabot rebase`.\n\n[//]: # (dependabot-automerge-start)\n[//]: # (dependabot-automerge-end)\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003eDependabot commands and options\u003c/summary\u003e\n\u003cbr /\u003e\n\nYou can trigger Dependabot actions by commenting on this PR:\n- `@dependabot rebase` will rebase this PR\n- `@dependabot recreate` will recreate this PR, overwriting any edits\nthat have been made to it\n- `@dependabot show \u003cdependency name\u003e ignore conditions` will show all\nof the ignore conditions of the specified dependency\n- `@dependabot ignore this major version` will close this PR and stop\nDependabot creating any more for this major version (unless you reopen\nthe PR or upgrade to it yourself)\n- `@dependabot ignore this minor version` will close this PR and stop\nDependabot creating any more for this minor version (unless you reopen\nthe PR or upgrade to it yourself)\n- `@dependabot ignore this dependency` will close this PR and stop\nDependabot creating any more for this dependency (unless you reopen the\nPR or upgrade to it yourself)\n\n\n\u003c/details\u003e\n\nSigned-off-by: dependabot[bot] \u003csupport@github.com\u003e\nCo-authored-by: dependabot[bot] \u003c49699333+dependabot[bot]@users.noreply.github.com\u003e"
    },
    {
      "commit": "cc5a25649d38f94dbaa6ad9994b6af812d061803",
      "tree": "f161bb7198d65b8c28c73618459d2c26f14055fd",
      "parents": [
        "97ff1984910656fcd76be7a2a44b92b032d3b300"
      ],
      "author": {
        "name": "Konstantin Tarasov",
        "email": "33369833+sdf-jkl@users.noreply.github.com",
        "time": "Thu May 07 08:38:21 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 08:38:21 2026 -0400"
      },
      "message": "impl `FromStr` for `DatePart` (#9931)\n\n# Which issue does this PR close?\n\n- Closes #9930.\n\n# Rationale for this change\n\nCheck issue\n\n# What changes are included in this PR?\n\nimpl `FromStr` for `DatePart`\n\n# Are these changes tested?\n\nYes, added unit tests\n\n# Are there any user-facing changes?\nNo"
    },
    {
      "commit": "97ff1984910656fcd76be7a2a44b92b032d3b300",
      "tree": "dc6d12f5b7d11b3a1fc866165b8e20cac04cf1ba",
      "parents": [
        "ded985c95e6d132567710319d21e1901973ea16f"
      ],
      "author": {
        "name": "theirix",
        "email": "theirix@gmail.com",
        "time": "Thu May 07 03:18:46 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 07 12:18:46 2026 +1000"
      },
      "message": "feat(arrow-string): concat_elements for view, fixed binary (#9876)\n\n# Which issue does this PR close?\n\n- Closes #9875.\n\n# Rationale for this change\n\nThe `concat_elements` module lacks versions for binary view and fixed-size\nbinaries. It\u0027s worth having them here.\n\n# What changes are included in this PR?\n\n- Kernel for `BinaryViewArray`\n- Kernel for `FixedSizeBinaryArray`\n- Dispatching logic under `concat_elements_dyn`\n- Unit tests\n\n# Are these changes tested?\n\nNew unit tests\n\n# Are there any user-facing changes?"
    },
    {
      "commit": "ded985c95e6d132567710319d21e1901973ea16f",
      "tree": "d565d98afec6a657d950a9cf3d64a846346a86be",
      "parents": [
        "7f6524def267f5c5be73b7d5320185ea9f3bb91f"
      ],
      "author": {
        "name": "masumi-ryugo",
        "email": "masumi.ryugo@gmail.com",
        "time": "Thu May 07 00:06:53 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 06 11:06:53 2026 -0400"
      },
      "message": "fix(arrow-csv): bound RecordDecoder::flush offset accumulation (#9886)\n\nCloses #9885.\n\n## What\n\n`RecordDecoder::flush` walks the per-row offsets emitted by\n`csv_core::Reader` and accumulates them so each end offset is absolute\nover `self.data` after the loop. The accumulator was a plain `usize` and\nthe loop body did `*x +\u003d offset`, which on malformed input that drives\n`csv_core` to emit row-relative offsets large enough to wrap a `usize`:\n\n- panics with `attempt to add with overflow` in debug builds (and the\ncargo-fuzz `csv_reader` harness that found this is built with\n`--debug-assertions`);\n- silently wraps to a wildly out-of-bounds index in release builds,\nwhich then trips an unrelated `assert!` / `unwrap` somewhere downstream.\n\n## Fix\n\nSwitch the accumulator to `checked_add` and surface the overflow as\n`ArrowError::CsvError` instead. The body of the loop becomes a normal\n`for` loop because `?` doesn\u0027t compose with the previous closure form.\n\n```rust\nlet mut row_offset: usize \u003d 0;\nfor row in self.offsets[1..self.offsets_len].chunks_exact_mut(self.num_columns) {\n    let offset \u003d row_offset;\n    for x in row.iter_mut() {\n        *x \u003d x.checked_add(offset).ok_or_else(|| {\n            ArrowError::CsvError(\n                \"CSV record offsets overflowed usize while flushing\".to_string(),\n            )\n        })?;\n        row_offset \u003d *x;\n    }\n}\n```\n\n## Repro\n\nThe cargo-fuzz `csv_reader` harness from\n[`fuzz/initial-harnesses`](https://github.com/masumi-ryugo/arrow-rs/tree/fuzz/initial-harnesses)\n(per #5332) reproduces this from an empty corpus in single-digit\nminutes. 
The minimized repro is 72 bytes:\n\n```\n0000  2e 22 3f 0a 31 0a 3f 3f  0a 3c 50 50 0a 3f 0a 31  |.\"?.1.??.\u003cPP.?.1|\n0010  0a 3f 38 0a 3c 0a 3f 0a  3c 50 50 0a 3f 0a 31 0a  |.?8.\u003c.?.\u003cPP.?.1.|\n0020  3f 38 0a 0a 2e 22 3f 0a  31 0a 3f 3f 0a ce ce ce  |?8...\"?.1.??....|\n0030  b1 ce ce ce ce ce ce ce  ce 31 0a 3f 38 0a 3c 0a  |.........1.?8.\u003c.|\n0040  3f 0a 3c 0a 3f 0a 3f 69                            |?.\u003c.?.?i|\n```\n\nBefore this PR (run on `main` HEAD against the cargo-fuzz harness):\n```\nthread \u0027\u003cunnamed\u003e\u0027 panicked at arrow-csv/src/reader/records.rs:207:21:\nattempt to add with overflow\n```\n\nAfter this PR the same 72 bytes pass through the fuzz target in 40 ms\nwith exit 0; the API now returns `ArrowError::CsvError(...)` for callers\nto handle.\n\n## Tests\n\nAdds\n`reader::records::tests::test_flush_offset_overflow_does_not_panic`,\nwhich feeds the 72-byte fuzz repro through `RecordDecoder::decode` +\n`flush` and asserts the loop terminates cleanly instead of panicking.\nThe existing 4 tests in that module continue to pass.\n\n## Alternatives considered\n\n- **Cap by `self.data_len`**: each emitted offset is supposed to be ≤\n`self.data_len`, so an explicit cap would also turn the overflow into a\nclean error. I went with `checked_add` because it\u0027s the more targeted\nchange — it doesn\u0027t add a new invariant on `csv_core`\u0027s output, only\nrefuses to compute something that would have been arithmetically\nnonsensical anyway.\n- **Use `saturating_add`**: would silently truncate the offset and then\nmis-slice `self.data`, producing a confusing `Encountered invalid UTF-8\ndata` error or panic deeper in the call stack. Worse signal.\n\nxref #5332 #9883 #9884\n\n---------\n\nCo-authored-by: masumi ryugo \u003c280057467+masumi-ryugo@users.noreply.github.com\u003e\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    }
  ],
  "next": "7f6524def267f5c5be73b7d5320185ea9f3bb91f"
}
